SELECTED SOLUTIONS TO 
PRINCIPLES OF DIGITAL 
COMMUNICATION 
Cambridge Press 2008 
by ROBERT G. GALLAGER 
A complete set of solutions is available from Cambridge Press for instructors teaching a class 
using this text. This is a subset of solutions that I feel would be valuable for those studying the 
subject on their own. 
Chapter 2 
Exercise 2.2: 
(a) V +W is a random variable, so its expectation, by definition, is 
E[V + W] = ∑_{v∈V} ∑_{w∈W} (v + w) p_VW(v,w)
         = ∑_{v∈V} ∑_{w∈W} v p_VW(v,w) + ∑_{v∈V} ∑_{w∈W} w p_VW(v,w)
         = ∑_{v∈V} v ∑_{w∈W} p_VW(v,w) + ∑_{w∈W} w ∑_{v∈V} p_VW(v,w)
         = ∑_{v∈V} v p_V(v) + ∑_{w∈W} w p_W(w)
         = E[V] + E[W].
(b) Once again, working from first principles, 
E[V · W] = ∑_{v∈V} ∑_{w∈W} (v · w) p_VW(v,w)
         = ∑_{v∈V} ∑_{w∈W} (v · w) p_V(v) p_W(w)   (using independence)
         = ∑_{v∈V} v p_V(v) ∑_{w∈W} w p_W(w)
         = E[V] · E[W].
(c) To discover a case where E[V · W] ≠ E[V] · E[W], first try the simplest kind of example where
V and W are binary with the joint pmf 
pVW(0, 1) = pVW(1, 0) = 1/2; pVW(0, 0) = pVW(1, 1) = 0. 
Clearly, V and W are not independent. Also, E[V · W] = 0 whereas E[V ] = E[W] = 1/2 and 
hence E[V ] · E[W] = 1/4. 
The second case requires some experimentation. One approach is to choose a joint distribution 
such that E[V ·W] = 0 and E[V ] = 0. A simple solution is then given by the pmf, 
pVW(−1, 0) = pVW(0, 1) = pVW(1, 0) = 1/3. 
Again, V and W are not independent. Clearly, E[V ·W] = 0. Also, E[V ] = 0 (what is E[W]?). 
Hence, E[V ·W] = E[V ] · E[W]. 
(d) 
σ²_{V+W} = E[(V + W)²] − (E[V + W])²
         = E[V²] + E[W²] + 2E[V · W] − (E[V] + E[W])²
         = E[V²] + E[W²] + 2E[V] · E[W] − E[V]² − E[W]² − 2E[V] · E[W]
         = E[V²] − E[V]² + E[W²] − E[W]²
         = σ²_V + σ²_W.
Exercise 2.4: 
(a) Since X1 and X2 are iid, symmetry implies that Pr(X1 > X2) = Pr(X2 > X1). These two 
events are mutually exclusive and the event X1 = X2 has 0 probability. Thus Pr(X1 > X2) and 
Pr(X1 < X2) sum to 1, so must each be 1/2. Thus Pr(X1 ≥ X2) = Pr(X2 ≥ X1) = 1/2. 
(b) Invoking the symmetry among X1, X2 and X3, we see that each has the same probability of 
being the smallest of the three (the probability of a tie is 0). These three events are mutually 
exclusive and their probabilities must add up to 1. Therefore each event occurs with probability 
1/3. 
(c) The event {N > n} is the same as the event {X1 is the minimum among the n iid 
random variables X1, X2, · · · , Xn}. By extending the argument in part (b), we see that 
Pr(X1 is the smallest of X1, . . . , Xn) = 1/n. Finally, Pr{N ≥ n} = Pr{N > n − 1} = 1/(n − 1) for
n ≥ 2.
(d) Since N is a non-negative integer random variable (taking on values from 2 to ∞), we can
use Exercise 2.3(a) as follows:

E[N] = ∑_{n=1}^{∞} Pr{N ≥ n}
     = Pr{N ≥ 1} + ∑_{n=2}^{∞} Pr{N ≥ n}
     = 1 + ∑_{n=2}^{∞} 1/(n − 1)
     = 1 + ∑_{n=1}^{∞} 1/n.
Since the series ∑_{n=1}^{∞} 1/n diverges, we conclude that E[N] = ∞.
(e) Since the alphabet has a finite number of letters,1 Pr(X1 = X2) is no longer 0 and depends 
on the particular probability distribution. Thus, although, Pr(X1 ≥ X2) = Pr(X2 ≥ X1) by 
symmetry, neither can be found without knowing the distribution. 
Out of the alphabet letters with nonzero probability, let amin be a letter of minimum numeric
value. If X1 = amin, then no subsequent rv X2, X3, . . . can have a smaller value, so N = ∞ in
this case. Since the event X1 = amin occurs with positive probability, E[N] = ∞.
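As a quick added sanity check (not part of the original solution), the following Python sketch estimates Pr{X1 is the smallest of X1, . . . , Xn} by simulation for continuous iid rvs; the estimates should be close to 1/n.

import random

def prob_x1_smallest(n, trials=200000):
    # Estimate Pr{X1 is the minimum of n iid Uniform(0,1) samples}.
    count = 0
    for _ in range(trials):
        xs = [random.random() for _ in range(n)]
        if xs[0] == min(xs):  # ties occur with probability 0 for continuous rvs
            count += 1
    return count / trials

for n in (2, 3, 5, 10):
    print(n, prob_x1_smallest(n), "vs 1/n =", 1 / n)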
Exercise 2.6: 
(a) Assume the contrary; i.e., there is a suffix-free code that is not uniquely decodable. Then that 
code must contain two distinct sequences of source letters, say, x1, x2, . . . , xn and x′1, x′2, . . . , x′m
such that,
C(x1)C(x2) . . . C(xn) = C(x′1)C(x′2) . . . C(x′m).
Then one of the following must hold:
• C(xn) = C(x′m)
• C(xn) is a suffix of C(x′m)
• C(x′m) is a suffix of C(xn).
In the last two cases we arrive at a contradiction since the code is hypothesized to be suffix-free.
In the first case, xn must equal x′m because of the suffix freedom. Simply delete that final letter
from each sequence and repeat the argument. Since the sequences are distinct, the final letter 
must differ after some number of repetitions of the above argument, and at that point one of 
the latter two cases holds and a contradiction is reached. 
Hence, suffix-free codes are uniquely decodable. 
(b) Any prefix-free code becomes a suffix-free code if the ordering of symbols in each codeword 
is reversed. About the simplest such example is {0,01,11} which can be seen to be a suffix-free 
code (with codeword lengths {1, 2, 2}) but not a prefix-free code. 
A codeword in the above code cannot necessarily be decoded as soon as its last bit arrives at the decoder.
To illustrate a rather extreme case, consider the following output produced by the encoder, 
0111111111 . . . 
Assuming that source letters {a,b,c} map to {0,01,11}, we cannot distinguish between the two 
possible source sequences, 
acccccccc . . . 
and 
bcccccccc . . . , 
Footnote 1: The same results can be obtained with some extra work for a countably infinite discrete alphabet.
till the end of the string is reached. Hence, in this case the decoder might have to wait for an 
arbitrarily long time before decoding. 
(c) There cannot be any code with codeword lengths (1, 2, 2) that is both prefix free and suffix 
free. Without loss of generality, set C1 = 0. Then a prefix-free code cannot use either the 
codewords 00 and 01 for C2 or C3, and thus must use 10 and 11, which is not suffix free. 
Exercise 2.7: Consider the set of codeword lengths (1,2,2) and arrange them as (2,1,2). Then, 
u1=0 is represented as 0.00. Next, u2 = 1/4 = 0.01 must be represented using 1 bit after the 
binary point, which is not possible. Hence, the algorithm fails. 
Exercise 2.9: 
(a) Assume, as usual, that pj > 0 for each j. From Eqs. (2.8) and (2.9) 
H[X] − L = ∑_{j=1}^{M} p_j log(2^{−l_j}/p_j) ≤ ∑_{j=1}^{M} p_j [2^{−l_j}/p_j − 1] log e = 0.
As is evident from Figure 2.7, the inequality is strict unless 2−lj = pj for each j. Thus if 
H[X] = L, it follows that 2−lj = pj for each j. 
(b) First consider Figure 2.4, repeated below, assuming that Pr(a) = 1/2 and Pr(b) = Pr(c) = 
1/4. The first order node 0 corresponds to the letter a and has probability 1/2. The first order 
node 1 corresponds to the occurrence of either letter b or c, and thus has probability 1/2.
[Figure 2.4, repeated: the binary code tree with first-order nodes 0 (letter a) and 1, and the labelled higher-order nodes aa, ab, ac, ba, bb, bc, ca, cb, cc.] The corresponding code is:
aa → 00, ab → 011, ac → 010, ba → 110, bb → 1111, bc → 1110, ca → 100, cb → 1011, cc → 1010.
Similarly, the second order node 00 corresponds to aa, which has probability 1/4, and the second 
order node 01 corresponds to either ab or ac, which have cumulative probability 1/4. In the 
same way, 10 and 11 correspond to b and c, with probabilities 1/4 each. One can proceed with
higher order nodes in the same way, but what is the principle behind this? 
In general, when an infinite binary tree is used to represent an unending sequence of letters
from an iid source where each letter j has probability p_j = 2^{−ℓ_j} and codeword length ℓ_j, we see that each
node corresponding to an initial sequence of letters x1, . . . , xn has a probability ∏_i 2^{−ℓ_{x_i}} equal
to the product of the individual letter probabilities and an order equal to ∑_i ℓ_{x_i}. Thus each
node labelled by a subsequence of letters has a probability 2^{−ℓ} where ℓ is the order of that node.
The other nodes (those unlabelled in the example above) have a probability equal to the sum of 
the immediately following labelled nodes. This probability is again 2−` for an `th order node, 
which can be established by induction if one wishes to be formal. 
Exercise 2.11: (a) For n = 2,

(∑_{j=1}^{M} 2^{−l_j})² = (∑_{j1=1}^{M} 2^{−l_{j1}}) (∑_{j2=1}^{M} 2^{−l_{j2}}) = ∑_{j1=1}^{M} ∑_{j2=1}^{M} 2^{−(l_{j1}+l_{j2})}.
The same approach works for arbitrary n. 
(b) Each source n-tuple x^n = (a_{j1}, a_{j2}, . . . , a_{jn}) is encoded into a concatenation
C(a_{j1})C(a_{j2}) . . . C(a_{jn}) of binary digits of aggregate length l(x^n) = l_{j1} + l_{j2} + · · · + l_{jn}. Since
there is one n-tuple x^n for each choice of a_{j1}, a_{j2}, . . . , a_{jn}, the result of part (a) can be rewritten
as

(∑_{j=1}^{M} 2^{−l_j})^n = ∑_{x^n} 2^{−l(x^n)}.   (1)
(c) Rewriting (1) in terms of the number Ai of concatenations of n codewords of aggregate length 
i,

(∑_{j=1}^{M} 2^{−l_j})^n = ∑_{i=1}^{n·lmax} A_i 2^{−i}.
This uses the fact that since each codeword has length at most lmax, each concatenation has 
length at most nlmax. 
(d) From unique decodability, each of these concatenations must be different, so there are at
most 2^i concatenations of aggregate length i, i.e., A_i ≤ 2^i. Thus, since the above sum contains
at most n·lmax terms,

(∑_{j=1}^{M} 2^{−l_j})^n ≤ n·lmax.   (2)
(e) Note that

[n·lmax]^{1/n} = exp(ln(n·lmax)/n) → exp(0) = 1

as n → ∞. Since (2) must be satisfied for all n, the Kraft inequality must be satisfied.
Exercise 2.13: 
(a) In the Huffman algorithm, we start by combining p3 and p4. Since we have p1 = p3+p4 ≥ p2, 
we can combine p1 and p2 in the next step, leading to all codewords of length 2. We can also 
combine the supersymbol obtained by combining symbols 3 and 4 with symbol 2, yielding 
codewords of lengths 1,2,3 and 3 respectively. 
(b) Note that p3 ≤ p2 and p4 ≤ p2 so p3 + p4 ≤ 2p2. Thus 
p1 = p3 + p4 ≤ 2p2 which implies p1 + p3 + p4 ≤ 4p2. 
Since p2 = 1 − p1 − p3 − p4, the latter equation implies that 1 − p2 ≤ 4p2, or p2 ≥ 0.2. From the
former equation, p1 ≤ 2p2 = 2(1 − 2p1), which shows that p1 ≤ 0.4. These bounds can be met by also
choosing p3 = p4 = 0.2. Thus pmax = 0.4.
(c) Reasoning similarly to part (b), p2 ≤ p1 and p2 = 1 − p1 − p3 − p4 = 1 − 2p1. Thus 
1 − 2p1 ≤ p1 so p1 ≥ 1/3, i.e., pmin = 1/3. This bound is achievable by choosing p1 = p2 = 1/3 
and p3 = p4 = 1/6. 
(d) The argument in part (b) remains the same if we assume p1 ≤ p3+p4 rather than p1 = p3+p4, 
i.e., p1 ≤ p3 + p4 implies that p1 ≤ pmax. Thus assuming p1 > pmax implies that p1 > p3 + p4. 
Thus the supersymbol obtained by combining symbols 3 and 4 will be combined with symbol 
2 (or perhaps with symbol 1 if p2 = p1). Thus the codeword for symbol 1 (or perhaps the 
codeword for symbol 2) will have length 1. 
(e) The lengths of any optimal prefix free code must be either (1, 2, 3, 3) or (2, 2, 2, 2). If p1 > 
pmax, then, from (b), p1 > p3 + p4, so the lengths (1, 2, 3, 3) yield a lower average length than 
(2, 2, 2, 2). 
(f) The argument in part (c) remains almost the same if we start with the assumption that 
p1 ≥ p3 + p4. In this case p2 = 1 − p1 − p3 − p4 ≥ 1 − 2p1. Combined with p1 ≥ p2, we again
have p1 ≥ pmin. Thus if p1 < pmin, we must have p3 + p4 > p1 ≥ p2. We then must combine p1 
and p2 in the second step of the Huffman algorithm, so each codeword will have length 2. 
(g) It turns out that pmax is still 2/5. To see this, first note that if p1 = 2/5, p2 = p3 = 1/5 
and all other symbols have an aggregate probability of 1/5, then the Huffman code construction 
combines the least likely symbols until they are tied together into a supersymbol of probability 
1/5. The completion of the algorithm, as in part (b), can lead to either one codeword of length 
1 or 3 codewords of length 2 and the others of longer length. If p1 > 2/5, then at each stage of 
the algorithm, two nodes of aggregate probability less than 2/5 are combined, leaving symbol 
1 unattached until only 4 nodes remain in the reduced symbol set. The argument in (d) then 
guarantees that the code will have one codeword of length 1. 
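To make parts (b) and (c) concrete, here is an added Python sketch (not part of the original solution) that computes binary Huffman codeword lengths with a heap; the two pmfs below are the boundary cases pmax = 0.4 and pmin = 1/3 discussed above.

import heapq
from itertools import count

def huffman_lengths(probs):
    # Return binary Huffman codeword lengths for the given pmf.
    tiebreak = count()
    heap = [(p, next(tiebreak), [i]) for i, p in enumerate(probs)]
    heapq.heapify(heap)
    lengths = [0] * len(probs)
    while len(heap) > 1:
        p1, _, s1 = heapq.heappop(heap)
        p2, _, s2 = heapq.heappop(heap)
        for i in s1 + s2:
            lengths[i] += 1        # every symbol in the two merged subtrees moves one level deeper
        heapq.heappush(heap, (p1 + p2, next(tiebreak), s1 + s2))
    return lengths

# p1 = 0.4 = pmax: (2,2,2,2) and (1,2,3,3) are both optimal; tie-breaking picks one of them.
print(huffman_lengths([0.4, 0.2, 0.2, 0.2]))
# p1 = 1/3 = pmin with p3 = p4 = 1/6: (2,2,2,2) and (1,2,3,3) again have the same expected length.
print(huffman_lengths([1/3, 1/3, 1/6, 1/6]))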
Exercise 2.15: 
(a) This is the same as Lemma 2.5.1. 
(b) Since p1 < pM−1 + pM, we see that p1 < p′_{M−1}, where p′_{M−1} is the probability of the node
in the reduced code tree corresponding to letters M − 1 and M in the original alphabet. Thus,
by part (a), l1 ≥ l′_{M−1} = lM − 1.
(c) Consider an arbitrary minimum-expected-length code tree. This code tree must be full (by 
Lemma 2.5.2), so suppose that symbol k is the sibling of symbol M in this tree. If k = 1, 
then l1 = lM, and otherwise, p1 < pM + pk, so l1 must be at least as large as the length of the 
immediate parent of M, showing that l1 ≥ lM − 1. 
(d) and (e) We have shown that the shortest and longest length differ by at most 1, with some
number m ≥ 1 lengths equal to l1 and the remaining M − m lengths equal to l1 + 1. It follows that
2^{l1+1} = 2m + (M − m) = M + m. From this it follows that l1 = ⌊log2(M)⌋ and m = 2^{l1+1} − M.
Exercise 2.16: 
(a) Grow the code tree so that it remains a full ternary tree at each step. The smallest full tree has 3 leaves. For
the next largest full tree, convert one of the leaves into an intermediate node and grow 3 leaves
from that node. We lose 1 leaf but gain 3, a net gain of 2 leaves at each growth extension. Thus, M = 3 + 2n
(for n a non-negative integer).
(b) It is clear that for optimality, all the unused leaves in the tree must have the same length 
as the longest codeword. For M even, combine the 2 lowest probabilities into a node at the 
first step, then combine the 3 lowest probability nodes for all the rest of the steps until the root 
node. If M is odd, a full ternary tree is possible, so combine the 3 lowest probability nodes at 
each step. 
(c) If {a, b, c, d, e, f} have symbol probabilities {0.3, 0.2, 0.2, 0.1, 0.1, 0.1} respectively, then the 
ternary Huffman code will be {a → 0, b → 1, c → 20, d → 21, e → 220, f → 221}. 
Exercise 2.18: 
(a) Applying the Huffman coding algorithm to the code with M +1 symbols with pM+1 = 0, we 
combine symbol M +1 with symbol M and the reduced code has M symbols with probabilities 
p1, . . . , pM. The Huffman code for this reduced set of symbols is simply the code for the original 
set of symbols with symbol M + 1 eliminated. Thus the code including symbol M + 1 is the
reduced code modified by a unit length increase in the codeword for symbol M. Thus L = L′ + pM
where L′ is the expected length for the code with M symbols.
(b) All n of the zero probability symbols are combined together in the Huffman algorithm, and 
the reduced code from this combination is then the same as the code with M + 1 symbols in 
part (a). Thus L = L′ + pM again.
Exercise 2.19: 
(a) The entropies H(X), H(Y ), and H(XY ) can be expressed as

H(XY ) = −∑_{x∈X, y∈Y} p_XY(x, y) log p_XY(x, y)
H(X) = −∑_{x∈X, y∈Y} p_XY(x, y) log p_X(x)
H(Y ) = −∑_{x∈X, y∈Y} p_XY(x, y) log p_Y(y).
It is assumed that all symbol pairs x, y of zero probability have been removed from this sum, 
and thus all x (y) for which pX(x) = 0 ( pY (y) = 0) are consequently removed. Combining these 
equations, 
H(XY ) − H(X) − H(Y ) = ∑_{x∈X, y∈Y} p_XY(x, y) log [p_X(x) p_Y(y) / p_XY(x, y)].
(b) Using the standard inequality log x ≤ (x − 1) log e, 
H(XY ) − H(X) − H(Y ) ≤ ∑_{x∈X, y∈Y} p_XY(x, y) [p_X(x) p_Y(y) / p_XY(x, y) − 1] log e = 0.
Thus H(X, Y ) ≤ H(X)+H(Y ). Note that this inequality is satisfied with equality if and only if 
X and Y are independent. 
(c) For n symbols, X1, . . . ,Xn, let Y be the ‘super-symbol’ X2, . . . ,Xn. Then using (b), 
H(X1, . . . ,Xn) = H(X1, Y ) ≤ H(X1) + H(Y ) = H(X1) + H(X2, . . . ,Xn). 
Iterating this gives the desired result. 
An alternate approach generalizes part (b) in the following way: 
H(X1, . . . , Xn) − ∑_i H(Xi) = ∑_{x1,...,xn} p(x1, . . . , xn) log [p(x1) · · · p(xn) / p(x1, . . . , xn)] ≤ 0,

where we have used log x ≤ (x − 1) log e again.
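As an added illustration of part (b), the sketch below builds a generic joint pmf and a product pmf and compares H(XY ) with H(X) + H(Y ); the inequality is strict in the first case and an equality in the second.

import random
from math import log2

def entropy(p):
    return -sum(q * log2(q) for q in p if q > 0)

def compare(joint):
    # joint is a matrix of pXY(x, y); return (H(XY), H(X) + H(Y)).
    px = [sum(row) for row in joint]
    py = [sum(col) for col in zip(*joint)]
    return entropy([p for row in joint for p in row]), entropy(px) + entropy(py)

random.seed(1)
w = [[random.random() for _ in range(3)] for _ in range(4)]
total = sum(sum(row) for row in w)
dependent = [[v / total for v in row] for row in w]      # a generic (dependent) joint pmf
px, py = [0.5, 0.3, 0.2], [0.25, 0.75]
independent = [[a * b for b in py] for a in px]          # product pmf: the equality case
print(compare(dependent))
print(compare(independent))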
Exercise 2.20:
(a) Y is 1 if X = 1, which occurs with probability p1. Y is 0 otherwise. Thus 
H(Y ) = −p1 log(p1) − (1 − p1) log(1 − p1) = Hb(p1). 
(b) Given Y =1, X = 1 with probability 1, so H(X | Y =1) = 0. 
(c) Given Y =0, X=1 has probability 0, so X has M−1 possible choices with non-zero probability.
The maximum entropy for an alphabet of size M−1 is log(M−1), so H(X|Y =0) ≤ log(M−1). This
upper bound is met with equality if Pr(X=j | X≠1) = 1/(M−1) for all j ≠ 1. Since
Pr(X=j | X≠1) = pj/(1 − p1), this upper bound on H(X | Y =0) is achieved when p2 = p3 =
· · · = pM. Combining this with part (b),

H(X | Y ) = p1 H(X | Y =1) + (1−p1) H(X | Y =0) ≤ (1−p1) log(M − 1).
(d) Note that 
H(XY ) = H(Y ) + H(X|Y ) ≤ Hb(p1) + (1−p1) log(M−1) 
and this is met with equality for p2 = · · · = pM. There are now two reasonable approaches. One
is to note that H(XY ) can also be expressed as H(X) + H(Y |X). Since Y is uniquely specified 
by X, H(Y |X) = 0, 
H(X) = H(XY ) ≤ Hb(p1) + (1 − p1) log(M − 1), (3) 
with equality when p2 = p3 = · · · = pM. The other approach is to observe that H(X) ≤ H(XY ), 
which again leads to (3), but this does not immediately imply that equality is met for p2 = · · · = pM.
Equation (3) is the Fano bound of information theory; it is useful when p1 is very close to 1 and 
plays a key role in the noisy channel coding theorem. 
(e) The same bound applies to each symbol by replacing p1 by pj for any j, 1 ≤ j ≤ M. Thus 
it also applies to pmax. 
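The Fano-type bound (3) is easy to check numerically. The following added sketch evaluates both sides for a hypothetical alphabet size M = 5 and p1 = 0.9, once with the equal-tail pmf (equality) and once with an unequal tail (strict inequality).

from math import log2

def entropy(p):
    return -sum(q * log2(q) for q in p if q > 0)

def fano_bound(p1, M):
    return entropy([p1, 1 - p1]) + (1 - p1) * log2(M - 1)

M, p1 = 5, 0.9
uniform_tail = [p1] + [(1 - p1) / (M - 1)] * (M - 1)   # p2 = ... = pM: equality in (3)
skewed_tail = [p1, 0.07, 0.02, 0.005, 0.005]           # same p1, unequal tail: strict inequality
print(entropy(uniform_tail), fano_bound(p1, M))
print(entropy(skewed_tail), fano_bound(p1, M))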
Exercise 2.22: One way to generate a source code for (X1, X2, X3) is to concatenate a Huffman
code for (X1, X2) with a Huffman code for X3. The expected length of the resulting code for
(X1, X2, X3) is 2Lmin,2 + Lmin. The expected length per source letter of this code is
(2Lmin,2 + Lmin)/3. The expected length per source letter of the optimal code for (X1, X2, X3) can be no
worse, so

Lmin,3 ≤ (2/3) Lmin,2 + (1/3) Lmin.
Exercise 2.23: (Run Length Coding) 
(a) Let C and C′ be the codes mapping source symbols to intermediate integers and intermediate
integers to output bits respectively. If C′ is uniquely decodable, then the intermediate integers
can be decoded from the received bit stream, and if C is also uniquely decodable, the original
source bits can be decoded.
The lengths specified for C′ satisfy Kraft and thus this code can be made prefix-free and thus
uniquely decodable. For example, mapping 8 → 1 and each other integer to 0 followed by its 3
bit binary representation is prefix-free.
C is a variable to fixed length code, mapping {b, ab, a2b, . . . , a7b, a8} to the integers 0 to 8. This 
set of strings forms a full prefix-free set, and thus any binary string can be parsed into these 
‘codewords’, which are then mapped to the integers 0 to 8. The integers can then be decoded 
into the ‘codewords’ which are then concatenated into the original binary sequence. In general, a 
variable to fixed length code is uniquely decodable if the encoder can parse, which is guaranteed 
if that set of ‘codewords’ is full and prefix-free. 
(b) Each occurrence of source letter b causes 4 bits to leave the encoder immediately. In addition,
each subsequent run of 8 a's causes 1 extra bit to leave the encoder. Thus, for each b, the encoder
emits 4 bits with probability 1; it emits an extra bit with probability (0.9)^8; it emits yet a further
bit with probability (0.9)^16 and so forth. Letting Y be the number of output bits per input b,

E(Y ) = 4 + (0.9)^8 + (0.9)^16 + · · · = 4 + (0.9)^8 / (1 − (0.9)^8) = 4.756.
(c) To count the number of b’s out of the source, let Bi = 1 if the ith source letter is b and 
Bi = 0 otherwise. Then E(Bi) = 0.1 and σ2B 
i = 0.09. Let AB = (1/n) 
Pn i=1 Bi be the number of 
b’s per input in a run of n = 1020 inputs. This has mean 0.1 and variance (0.9) · 10−21, which is 
close to 0.1 with very high probability. As the number of trials increase, it is closer to 0.1 with 
still higher probability. 
(d) The total number of output bits corresponding to the essentially 10^19 b's in the 10^20 source
letters is with high probability close to 4.756 · 10^19 (1 + ε) for small ε. Thus,

L ≈ (0.1) · [4 + (0.9)^8 / (1 − (0.9)^8)] = 0.4756.
Renewal theory provides a more convincing way to fully justify this solution. 
Note that the achieved L is impressive considering that the entropy of the source is 
−(0.9) log(0.9) − (0.1) log(0.1) = 0.469 bits/source symbol. 
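The figure of 0.4756 bits per source symbol can be checked by direct simulation. Below is an added sketch (with the intermediate code C′ as described above: a completed run of 8 a's costs 1 output bit, and a b preceded by 0 to 7 a's costs 4 output bits); the printed rate should be near 0.4756.

import random

def bits_per_symbol(n_symbols, p_a=0.9, seed=0):
    # Run-length encode an iid a/b source with Pr(a) = p_a.
    rng = random.Random(seed)
    bits = 0
    run = 0                      # a's seen since the last emitted integer
    for _ in range(n_symbols):
        if rng.random() < p_a:   # source letter a
            run += 1
            if run == 8:         # the codeword a^8: emit the integer 8 (1 bit) and restart the count
                bits += 1
                run = 0
        else:                    # source letter b: emit the run length 0..7 (4 bits)
            bits += 4
            run = 0
    return bits / n_symbols

print(bits_per_symbol(2000000))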
Exercise 2.25: 
(a) Note that W takes on values −log(2/3) with probability 2/3 and −log(1/3) with probability
1/3. Thus E(W) = log 3 − 2/3. Note that E(W) = H(X). The fluctuation of W around its mean
is −1/3 with probability 2/3 and 2/3 with probability 1/3. Thus σ²_W = 2/9.
(b) The bound on the probability of the typical set, as derived using the Chebyshev inequality,
and stated in (2.2) is:

Pr(X^n ∈ T_ε) ≥ 1 − σ²_W/(nε²) = 1 − 1/45.
(c) To count the number of a's out of the source, let the rv Yi(Xi) be 1 for Xi = a and 0 for
Xi = b. The Yi(Xi)'s are iid with mean Ȳ = 2/3 and σ²_Y = 2/9. Na(X^n) is given by

Na = ∑_{i=1}^{n} Yi(Xi),

which has mean 2n/3 and variance 2n/9.
(d) Since the n-tuple X^n is iid, the sample outcome w(x^n) = ∑_i w(xi). Let na be the sample
value of Na corresponding to x^n. Since w(a) = −log 2/3 and w(b) = −log 1/3, we have

w(x^n) = na(−log 2/3) + (n − na)(−log 1/3) = n log 3 − na
W(X^n) = n log 3 − Na.

In other words, W̃(X^n), the fluctuation of W(X^n) around its mean, is the negative of the
fluctuation of Na(X^n); that is W̃(X^n) = −Ñ_a(X^n).
(e) The typical set is given by:

T^n_ε = {x^n : |w(x^n)/n − E[W]| < ε}
     = {x^n : |w̃(x^n)/n| < ε}
     = {x^n : |ñ_a(x^n)/n| < ε}
     = {x^n : 10^5 (2/3 − ε) < na(x^n) < 10^5 (2/3 + ε)},

where we have used w̃(x^n) = −ñ_a(x^n). Thus, α = 10^5 (2/3 − ε) and β = 10^5 (2/3 + ε).
(f) From part (c), N̄a = 2n/3 and σ²_{Na} = 2n/9.
The CLT says that for n large, the sum of n iid random variables (rvs) has a distribution
function close to Gaussian within several standard deviations from the mean. As n increases,
the range and accuracy of the approximation increase. In this case, α and β are 10^3 below
and above the mean respectively. The standard deviation is √(2 · 10^5/9), so α and β are about
6.7 standard deviations from the mean. The probability that a Gaussian rv is more than 6.7
standard deviations from the mean is about (1.6) · 10^{−10}.
This is not intended as an accurate approximation, but only to demonstrate the weakness of the 
Chebyshev bound, which is useful in bounding but not for numerical approximation. 
Exercise 2.26: Any particular string x^n which has i a's and n−i b's has probability (2/3)^i (1/3)^{n−i}.
This is maximized when i = n = 10^5, and the corresponding probability is 10^{−17,609}. Those strings
with a single b have a probability 1/2 as large, and those with 2 b's have a probability 1/4 as
large. Since there are (n choose i) different sequences that have exactly i a's and n − i b's,

Pr{Na = i} = (n choose i) (2/3)^i (1/3)^{n−i}.
Evaluating for i = n, n−1, and n−2 for n = 10^5:

Pr{Na = n} = (2/3)^n ≈ 10^{−17,609}
Pr{Na = n−1} = 10^5 (2/3)^{n−1} (1/3) ≈ 10^{−17,604}
Pr{Na = n−2} = (10^5 choose 2) (2/3)^{n−2} (1/3)² ≈ 10^{−17,600}.
What this says is that the probability of any given string with na a's decreases as na decreases,
while the aggregate probability of all strings with na a's increases as na decreases toward N̄a (for na large compared
to N̄a). We saw in the previous exercise that the typical set is the set where na is close to N̄a,
and we now see that the most probable individual strings have fantastically small probability,
both individually and in the aggregate, and thus can be ignored.
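The orders of magnitude above are easy to confirm with logarithms; the following added sketch computes log10 Pr{Na = i} for i = n, n−1, n−2 with n = 10^5.

from math import log10, lgamma, log

def log10_binom(n, k):
    # log10 of the binomial coefficient (n choose k)
    return (lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)) / log(10)

n = 10 ** 5
for i in (n, n - 1, n - 2):
    log10_p = log10_binom(n, i) + i * log10(2 / 3) + (n - i) * log10(1 / 3)
    print(i, round(log10_p, 1))
# prints approximately -17609.1, -17604.4, and -17600.0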
Exercise 2.28: 
(a) The probability of an n-tuple x^n = (x1, . . . , xn) is p_{X^n}(x^n) = ∏_{k=1}^{n} p_X(xk). This product
includes Nj(x^n) terms xk for which xk is the letter j, and this is true for each j in the alphabet.
Thus

p_{X^n}(x^n) = ∏_{j=1}^{M} p_j^{Nj(x^n)}.   (4)
(b) Taking the log of (4),

−log p_{X^n}(x^n) = ∑_j Nj(x^n) log(1/pj).   (5)

Using the definition of S^n_ε, all x^n ∈ S^n_ε must satisfy

∑_j n pj (1 − ε) log(1/pj) < ∑_j Nj(x^n) log(1/pj) < ∑_j n pj (1 + ε) log(1/pj)
nH(X)(1 − ε) < ∑_j Nj(x^n) log(1/pj) < nH(X)(1 + ε).

Combining this with (5), every x^n ∈ S^n_ε satisfies

H(X)(1 − ε) < −log p_{X^n}(x^n)/n < H(X)(1 + ε).   (6)
(c) With ε′ = H(X)ε, (6) shows that for all x^n ∈ S^n_ε,

H(X) − ε′ < −log p_{X^n}(x^n)/n < H(X) + ε′.

By (2.25) in the text, this is the defining equation of T^n_{ε′}, so all x^n in S^n_ε are also in T^n_{ε′}.
(d) For each j in the alphabet, the WLLN says that for any given ε > 0 and δ > 0, and for all
sufficiently large n,

Pr(|Nj(x^n)/n − pj| ≥ ε) ≤ δ/M.   (7)

For all sufficiently large n, (7) is satisfied for all j, 1 ≤ j ≤ M. For all such large enough n, each
x^n is either in S^n_ε or is a member of the event that |Nj(x^n)/n − pj| ≥ ε for some j. The probability of
the union of the events that |Nj(x^n)/n − pj| ≥ ε for some j is upper bounded by δ, so Pr(S^n_ε) ≥ 1 − δ.
(e) The proof here is exactly the same as that of Theorem 2.7.1. Part (b) gives upper and lower
bounds on Pr(x^n) for x^n ∈ S^n_ε, and (d) shows that 1 − δ ≤ Pr(S^n_ε) ≤ 1, which together give the
desired bounds on the number of elements in S^n_ε.
Exercise 2.30: 
(a) First note that the chain is ergodic (i.e., it is aperiodic and all states can be reached from all
other states). Thus steady state probabilities q(s) exist and satisfy the equations ∑_s q(s) = 1
and q(s) = ∑_{s′} q(s′) Q(s|s′). For the given chain, these latter equations are

q(1) = q(1)(1/2) + q(2)(1/2) + q(4)(1)
q(2) = q(1)(1/2)
q(3) = q(2)(1/2)
q(4) = q(3)(1).

Solving by inspection, q(1) = 1/2, q(2) = 1/4, and q(3) = q(4) = 1/8.
(b) To calculate H(X1) we first calculate the pmf pX1(x) for each x ∈ X. Using the steady state
probabilities q(s) for S0, we have pX1(x) = ∑_s q(s) Pr{X1=x | S0=s}. Since X1=a occurs with
probability 1/2 from both S0=1 and S0=2 and occurs with probability 1 from S0=4,

pX1(a) = q(1)(1/2) + q(2)(1/2) + q(4) = 1/2.

Similarly, pX1(b) = pX1(c) = 1/4. Hence the pmf of X1 is {1/2, 1/4, 1/4} and H(X1) = 3/2.
(c) The pmf of X1 conditioned on S0 = 1 is {1/2, 1/2}. Hence, H(X1|S0=1) = 1. Similarly,
H(X1|S0=2) = 1. There is no uncertainty from states 3 and 4, so H(X1|S0=3) = H(X1|S0=4) = 0.
Since H(X1|S0) is defined as ∑_s Pr(S0=s) H(X1|S0=s), we have

H(X1|S0) = q(1)H(X1|S0=1) + q(2)H(X1|S0=2) = 3/4,

which is less than H(X1) as expected.
(d) We can achieve L = H(X1|S0) by achieving L(s) = H(X1|s) for each state s ∈ S. To do 
that, we use an optimal prefix-free code for each state. 
For S0 = 1, the code {a → 0, b → 1} is optimal with L(S0=1) = 1 = H(X1|S0=1). 
Similarly, for S0=2 {a → 0, c → 1} is optimal with L(S0=2) = 1 = H(X1|S0=2). 
Since H(X1|S0=3) = H(X1|S0=4) = 0, we do not use any code at all for the states 3 and 4. 
In other words, our encoder does not transmit any bits for symbols that result from transitions 
from these states. 
Now we explain why the decoder can track the state after time 0. The decoder is assumed 
to know the initial state. When in states 1 or 2, the next codeword from the corresponding 
prefix-free code uniquely determines the next state. When state 3 is entered, the next state
must be 4 since there is a single deterministic transition out of state 3 that goes to state 4 (and 
this is known without receiving the next codeword). Similarly, when state 4 is entered, the next 
state must be 1. When states 3 or 4 are entered, the next received codeword corresponds to the 
subsequent transition out of state 1. In this manner, the decoder can keep track of the state. 
(e) The question is slightly ambiguous. The intended meaning is how many source symbols
x1, x2, . . . , xk must be observed before the new state sk is known, but one could possibly interpret
it as determining the initial state s0.
To determine the new state, note that the symbol a always drives the chain to state 1 and the
symbol b always drives it to state 2. The symbol c, however, could lead to either state 3 or 4.
In this case, the subsequent symbol could be c, leading to state 4 with certainty, or could be a, 
leading to state 1. Thus at most 2 symbols are needed to determine the new state. 
Determining the initial state, on the other hand, is not always possible. The symbol a could 
come from states 1, 2, or 4, and no future symbols can resolve this ambiguity. 
A more interesting problem is to determine the state, and thus to start decoding correctly, when 
the initial state is unknown at the decoder. For the code above, this is easy, since whenever 
a 0 appears in the encoded stream, the corresponding symbol is a and the next state is 1,
permitting correct decoding from then on. This problem, known as the synchronizing problem, 
is quite challenging even for memoryless sources. 
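Assuming the transition and emission structure implied by parts (a), (b), and (e) (state 1 goes to state 1 with symbol a or to state 2 with symbol b, each with probability 1/2; state 2 goes to state 1 with a or to state 3 with c, each with probability 1/2; state 3 goes to state 4 with c; state 4 goes to state 1 with a), the following added sketch recomputes q(s), H(X1), and H(X1|S0).

from math import log2

# (next state, probability, emitted symbol) for each current state.
chain = {
    1: [(1, 0.5, 'a'), (2, 0.5, 'b')],
    2: [(1, 0.5, 'a'), (3, 0.5, 'c')],
    3: [(4, 1.0, 'c')],
    4: [(1, 1.0, 'a')],
}

# Solve q(s) = sum_{s'} q(s') Q(s|s') by power iteration.
q = {s: 0.25 for s in chain}
for _ in range(200):
    new = {s: 0.0 for s in chain}
    for s, transitions in chain.items():
        for t, p, _ in transitions:
            new[t] += q[s] * p
    q = new
print("steady state:", q)                 # {1: 0.5, 2: 0.25, 3: 0.125, 4: 0.125}

def entropy(ps):
    return -sum(p * log2(p) for p in ps if p > 0)

# pmf of X1 in steady state and its entropy.
px = {}
for s, transitions in chain.items():
    for _, p, x in transitions:
        px[x] = px.get(x, 0.0) + q[s] * p
print("H(X1) =", entropy(px.values()))    # 1.5

# H(X1|S0): each state's symbol entropy equals the entropy of its transition probabilities here,
# since the symbols leaving any one state are distinct.
print("H(X1|S0) =", sum(q[s] * entropy([p for _, p, _ in tr]) for s, tr in chain.items()))   # 0.75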
Exercise 2.31: We know from (2.37) in the text that H(XY ) = H(Y ) + H(X | Y ) for any 
random symbols X and Y . For any k-tuple X1, . . . ,Xk of random symbols, we can view Xk 
as the symbol X above and view the k − 1 tuple Xk−1,Xk−2, . . . ,X1 as the symbol Y above, 
getting 
H(Xk,Xk−1 . . . ,X1) = H(Xk | Xk−1, . . . ,X1) + H(Xk−1, . . . ,X1). 
Since this expresses the entropy of each k-tuple in terms of a k−1-tuple, we can iterate, getting 
H(Xn, Xn−1, . . . , X1) = H(Xn | Xn−1, . . . , X1) + H(Xn−1, . . . , X1)
= H(Xn | Xn−1, . . . , X1) + H(Xn−1 | Xn−2, . . . , X1) + H(Xn−2, . . . , X1)
= · · · = ∑_{k=2}^{n} H(Xk | Xk−1, . . . , X1) + H(X1).
Exercise 2.32: 
(a) We must show that H(S2|S1S0) = H(S2|S1). Viewing the pair of random symbols S1S0 as a 
random symbol in its own right, the definition of conditional entropy is 
H(S2|S1S0) = ∑_{s1,s0} Pr(S1=s1, S0=s0) H(S2|S1=s1, S0=s0)
           = ∑_{s1s0} Pr(s1s0) H(S2|s1s0),   (8)

where we will use the above abbreviations throughout for clarity. By the Markov property,
Pr(S2=s2|s1s0) = Pr(S2=s2|s1) for all symbols s0, s1, s2. Thus

H(S2|s1s0) = ∑_{s2} −Pr(S2=s2|s1s0) log Pr(S2=s2|s1s0)
           = ∑_{s2} −Pr(S2=s2|s1) log Pr(S2=s2|s1) = H(S2|s1).

Substituting this in (8), we get

H(S2|S1S0) = ∑_{s1s0} Pr(s1s0) H(S2|s1)
           = ∑_{s1} Pr(s1) H(S2|s1) = H(S2|S1).   (9)
(b) Using the result of Exercise 2.31, 
H(S0, S1, . . . , Sn) = ∑_{k=1}^{n} H(Sk | Sk−1, . . . , S0) + H(S0).

Viewing S0 as one symbol and the n-tuple S1, . . . , Sn as another,

H(S0, . . . , Sn) = H(S1, . . . , Sn | S0) + H(S0).

Combining these two equations,

H(S1, . . . , Sn | S0) = ∑_{k=1}^{n} H(Sk | Sk−1, . . . , S0).   (10)

Applying the same argument as in part (a), we see that

H(Sk | Sk−1, . . . , S0) = H(Sk | Sk−1).

Substituting this into (10),

H(S1, . . . , Sn | S0) = ∑_{k=1}^{n} H(Sk | Sk−1).
(c) If the chain starts in steady state, each successive state has the same steady state pmf, so 
each of the terms above are the same and 
H(S1, . . . , Sn|S0) = nH(S1|S0). 
(d) By definition of a Markov source, the state S0 and the next source symbol X1 uniquely 
determine the next state S1 (and vice-versa). Also, given state S1, the next symbol X2 uniquely 
determines the next state S2. Thus, Pr(x1x2|s0) = Pr(s1s2|s0) where x1x2 are the sample 
values of X1X2 in one-to-one correspondence to the sample values s1s2 of S1S2, all conditional 
on S0 = s0. 
Hence the joint pmf of X1X2 conditioned on S0=s0 is the same as the joint pmf for S1S2 
conditioned on S0=s0. The result follows. 
(e) Combining the results of (c) and (d) verifies (2.40) in the text. 
Exercise 2.33: Lempel-Ziv parsing of the given string can be done as follows: 
Step 1: window = 00011101, parsed block = 001 (u = 7, n = 3), remaining string = 0101100
Step 2: window = 00011101001, parsed block = 0101 (u = 2, n = 4), remaining string = 100
Step 3: window = 000111010010101, parsed block = 100 (u = 8, n = 3)

The string is parsed in three steps. In each step, the already-processed prefix (the window) is shown
first and the newly parsed block follows it. The (n, u) pairs resulting from these steps are respectively
(3,7), (4,2), and (3,8).
Using the unary-binary code for n, which maps 3 → 011 and 4 → 00100, and a standard 3-bit 
map for u, 1 ≤ u ≤ 8, the encoded sequence is 011, 111, 00100, 010, 011, 000 (transmitted without 
commas). 
Note that for small examples, as in this case, LZ77 may not be very efficient. In general, the 
algorithm requires much larger window sizes to compress efficiently. 
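As an added illustration, a minimal sketch of the matching step (assuming matches start at most 8 positions back and may extend into the not-yet-parsed data, as LZ77 allows) reproduces the (n, u) pairs found above.

def lz77_parse(s, window=8):
    # Greedy parsing: repeatedly emit (n, u) for the longest match of the upcoming data
    # starting u positions back (1 <= u <= window); matches may overlap the current position.
    pairs = []
    pos = window                     # the first `window` bits form the initial window
    while pos < len(s):
        best_n, best_u = 0, 1
        for u in range(1, window + 1):
            n = 0
            while pos + n < len(s) and s[pos + n] == s[pos + n - u]:
                n += 1
            if n > best_n:
                best_n, best_u = n, u
        pairs.append((best_n, best_u))
        pos += max(best_n, 1)        # a zero-length match would still consume one bit
    return pairs

print(lz77_parse("000111010010101100"))   # [(3, 7), (4, 2), (3, 8)]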
Chapter 3 
Exercise 3.3: 
(a) Given a1 and a2, the Lloyd-Max conditions assert that b should be chosen half way between 
them, i.e., b = (a1+a2)/2. This insures that all points are mapped into the closest quantization 
point. If the probability density is zero in some region around (a1 + a2)/2, then it makes no 
difference where b is chosen within this region, since those points can not affect the MSE. 
(b) Note that y(x)/Q(x) is the expected value of U conditional on U ≥ x. Thus, given b, the
MSE choice for a2 is y(b)/Q(b). Similarly, a1 is (E[U] − y(b))/(1 − Q(b)). Using the symmetry
condition, E[U] = 0, so

a1 = −y(b)/(1 − Q(b)),   a2 = y(b)/Q(b).   (11)
(c) Because of the symmetry,

Q(0) = ∫_0^∞ f(u) du = ∫_0^∞ f(−u) du = ∫_{−∞}^0 f(u) du = 1 − Q(0).

This implicitly assumes that there is no impulse of probability density at the origin, since such
an impulse would cause the integrals to be ill-defined. Thus, with b = 0, (11) implies that
a1 = −a2.
(d) Part (c) shows that for b = 0, a1 = −a2 satisfies step 2 in the Lloyd-Max algorithm, and 
then b = 0 = (a1 + a2)/2 then satisfies step 3. 
(e) The solution in part (d) for the density below is b = 0, a1 = −2/3, and a2 = 2/3. Another 
solution is a2 = 1, a1 = −1/2 and b = 1/3. The final solution is the mirror image of the second, 
namely a1 = −1, a2 = 1/2, and b = −1/3. 
[Figure: the density f(u) consists of three rectangular pulses, each of width ε and height 1/(3ε), centered at −1, 0, and 1.]
(f) The MSE for the first solution above (b = 0) is 2/9. That for each of the other solutions 
is 1/6. These latter two solutions are optimal. On reflection, choosing the separation point b 
in the middle of one of the probability pulses seems like a bad idea, but the main point of the 
problem is that finding optimal solutions to MSE problems is often messy, despite the apparent 
simplicity of the Lloyd-Max algorithm. 
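To see the dependence on the starting point concretely, here is an added sketch of the Lloyd-Max iteration applied to a fine discretization of the three-pulse density of part (e) (with ε = 0.01, an assumption made only for this illustration); the two initializations converge to the two MSE values found above.

def pulse(center, eps, m=100):
    # m equally weighted sample points spread over a pulse of width eps
    return [center + eps * ((i + 0.5) / m - 0.5) for i in range(m)]

def lloyd_max(points, a, iters=200):
    # Lloyd-Max iteration for a 2-level quantizer on equiprobable sample points.
    a1, a2 = a
    for _ in range(iters):
        b = (a1 + a2) / 2                       # step 1: threshold midway between the levels
        low = [x for x in points if x < b]
        high = [x for x in points if x >= b]
        a1 = sum(low) / len(low)                # step 2: each level becomes the conditional mean of its region
        a2 = sum(high) / len(high)
    mse = sum(min((x - a1) ** 2, (x - a2) ** 2) for x in points) / len(points)
    return (a1, a2), mse

eps = 0.01
pts = pulse(-1, eps) + pulse(0, eps) + pulse(1, eps)
print(lloyd_max(pts, a=(-0.9, 0.9)))   # converges to roughly (-2/3, 2/3) with MSE close to 2/9
print(lloyd_max(pts, a=(-0.6, 1.2)))   # converges to roughly (-1/2, 1) with MSE close to 1/6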
Exercise 3.4: 
(a) Using the hint, we minimize 
MSE(Δ1, Δ2) + λ f(Δ1, Δ2) = (1/12)[Δ1² f1 L1 + Δ2² f2 L2] + λ [L1/Δ1 + L2/Δ2]

over both Δ1 and Δ2. The function is convex over Δ1 and Δ2, so we simply take the derivative
with respect to each and set it equal to 0, i.e.,

(1/6) Δ1 f1 L1 − λ L1/Δ1² = 0;   (1/6) Δ2 f2 L2 − λ L2/Δ2² = 0.

Rearranging,

6λ = Δ1³ f1 = Δ2³ f2,

which means that for each choice of λ, Δ1 f1^{1/3} = Δ2 f2^{1/3}.
(b) We see from part (a) that Δ1/Δ2 = (f2/f1)^{1/3} is fixed independent of M. Holding this ratio
fixed, MSE is proportional to Δ1² and M is proportional to 1/Δ1. Thus M²·MSE is independent
of Δ1 (for the fixed ratio Δ1/Δ2).

M²·MSE = (1/12)[f1 L1 + (Δ2²/Δ1²) f2 L2] [L1 + L2 Δ1/Δ2]²
       = (1/12)[f1 L1 + f1^{2/3} f2^{1/3} L2] [L1 + L2 f2^{1/3}/f1^{1/3}]²
       = (1/12)[f1^{1/3} L1 + f2^{1/3} L2]³.
(c) If the algorithm starts with M1 points uniformly spaced over the first region and M2 points 
uniformly spaced in the second region, then it is in equilibrium and never changes. 
(d) If the algorithm starts with one additional point in the central region of zero probability
density, and if that point is more than Δ1/2 away from region 1 and Δ2/2 away from region
2, then the central point is unused (with probability 1). Since the conditional mean over the
region mapping into that central point is not well defined, it is not clear what the algorithm
will do. If it views that conditional mean as being in the center of the region mapped into that
point, then the algorithm is in equilibrium. The point of parts (c) and (d) is to point out that
the Lloyd-Max algorithm is not very good at finding a global optimum. 
(e) The probability that the sample point lies in region j (j = 1, 2) is fj Lj. The mean square
error, using Mj points in region j and conditional on lying in region j, is Lj²/(12Mj²). Thus, the
MSE with Mj points in region j is

MSE = f1 L1³/(12 M1²) + f2 L2³/(12 M2²).
This can be minimized numerically over integer M1 subject to M1 + M2 = M. This was 
minimized in part (b) without the integer constraint, and thus the solution here is slightly 
larger than that there, except in the special cases where the non-integer solution happens to be 
integer. 
(f) With given Δ1 and Δ2, the probability of each point in region j, j = 1, 2, is fjΔj and the
number of such points is Lj/Δj (assumed to be integer). Thus the entropy is

H(V ) = (L1/Δ1)(f1Δ1) ln(1/(f1Δ1)) + (L2/Δ2)(f2Δ2) ln(1/(f2Δ2))
      = −L1 f1 ln(f1Δ1) − L2 f2 ln(f2Δ2).
(g) We use the same Lagrange multiplier approach as in part (a), now using the entropy H(V )
as the constraint.

MSE(Δ1, Δ2) + λ H(Δ1, Δ2) = (1/12)[Δ1² f1 L1 + Δ2² f2 L2] − λ f1 L1 ln(f1Δ1) − λ f2 L2 ln(f2Δ2).

Setting the derivatives with respect to Δ1 and Δ2 equal to zero,

(1/6) Δ1 f1 L1 − λ f1 L1/Δ1 = 0;   (1/6) Δ2 f2 L2 − λ f2 L2/Δ2 = 0.

This leads to 6λ = Δ1² = Δ2², so that Δ1 = Δ2. This is the same type of approximation as before
since it ignores the constraint that L1/Δ1 and L2/Δ2 must be integers.
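As an added numeric cross-check of part (b), one can pick hypothetical values for f1, L1, f2, L2, sweep Δ1, set Δ2 by the optimal ratio from part (a), and confirm that M²·MSE stays equal to (1/12)[f1^{1/3} L1 + f2^{1/3} L2]³.

f1, L1 = 1.0, 0.5      # hypothetical density and length of region 1
f2, L2 = 4.0, 0.125    # chosen so that f1*L1 + f2*L2 = 1

predicted = (f1 ** (1 / 3) * L1 + f2 ** (1 / 3) * L2) ** 3 / 12

for d1 in (0.01, 0.02, 0.05):
    d2 = d1 * (f1 / f2) ** (1 / 3)             # optimal ratio: Delta1/Delta2 = (f2/f1)^(1/3)
    mse = (d1 ** 2 * f1 * L1 + d2 ** 2 * f2 * L2) / 12
    M = L1 / d1 + L2 / d2
    print(M * M * mse, "vs", predicted)        # the two values agree for every d1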
Exercise 3.6: 
(a) The probability of the quantization region R is A = Δ(1/2 + x + Δ/2). To simplify the algebraic
messiness, shift U to U − x − Δ/2, which, conditional on R, lies in [−Δ/2, Δ/2]. Let Y denote
this shifted conditional variable. As shown below, fY(y) = (1/A)[y + (x + 1/2 + Δ/2)].

E[Y ] = ∫_{−Δ/2}^{Δ/2} (y/A)[y + (x + 1/2 + Δ/2)] dy
      = ∫_{−Δ/2}^{Δ/2} (y²/A) dy + ∫_{−Δ/2}^{Δ/2} (y/A)[x + 1/2 + Δ/2] dy = Δ³/(12A),

since, by symmetry, the final integral above is 0.
[Figure: the density fU(u) increases linearly across the quantization region of width Δ starting at x; the shifted conditional density is fY(y) = (1/A)(y + x + 1/2 + Δ/2) for y ∈ [−Δ/2, Δ/2].]
Since Y is the shift of U conditioned on R,

E[U|R] = x + Δ/2 + E[Y ] = x + Δ/2 + Δ³/(12A).
That is, the conditional mean is slightly larger than the center of the region R because of the 
increasing density in the region. 
(b) Since the variance of a rv is invariant to shifts, MSE = σ²_{U|R} = σ²_Y. Also, note from symmetry
that ∫_{−Δ/2}^{Δ/2} y³ dy = 0. Thus

E[Y ²] = ∫_{−Δ/2}^{Δ/2} (y²/A)[y + (x + 1/2 + Δ/2)] dy = [(x + 1/2 + Δ/2)/A] · Δ³/12 = Δ²/12.

MSE = σ²_Y = E[Y ²] − (E[Y ])² = Δ²/12 − [Δ³/(12A)]².

MSE − Δ²/12 = −[Δ³/(12A)]² = −Δ⁴/(144 (x + 1/2 + Δ/2)²).
(c) The quantizer output V is a discrete random variable whose entropy H[V ] is

H(V ) = ∑_{j=1}^{M} ∫_{(j−1)Δ}^{jΔ} −fU(u) log[f̄(u)Δ] du = ∫_0^1 −fU(u) log[f̄(u)] du − log Δ,

where f̄(u) denotes the average of fU over the quantization interval containing u, and the differential entropy of U is by definition

h[U] = ∫_0^1 −fU(u) log[fU(u)] du.

Thus,

h[U] − log Δ − H[V ] = ∫_0^1 fU(u) log[f̄(u)/fU(u)] du.
(d) Using the inequality ln x ≤ x − 1,

∫_0^1 fU(u) log[f̄(u)/fU(u)] du ≤ log e ∫_0^1 fU(u)[f̄(u)/fU(u) − 1] du
                                = log e [∫_0^1 f̄(u) du − ∫_0^1 fU(u) du] = 0.

Thus, the difference h[U] − log Δ − H[V ] is non-positive (not non-negative).
(e) Approximating ln x by (x−1) − (x−1)²/2 for x = f̄(u)/fU(u) and recognizing from part (d)
that the integral for the linear term is 0, we get

∫_0^1 fU(u) log[f̄(u)/fU(u)] du ≈ −(1/2) log e ∫_0^1 fU(u) [f̄(u)/fU(u) − 1]² du   (12)
                                = −(1/2) log e ∫_0^1 [f̄(u) − fU(u)]²/fU(u) du.   (13)

Now fU(u) varies by at most Δ over any single region, and f̄(u) lies between the minimum and
maximum of fU(u) in that region. Thus |f̄(u) − fU(u)| ≤ Δ. Since fU(u) ≥ 1/2, the integrand above
is at most 2Δ², so the right side of (13) is at most Δ² log e.
Exercise 3.7: 
(a) Note that 1/(u (ln u)²) is the derivative of −1/ln u and thus integrates to 1 over the given interval.
(b)

h(U) = ∫_e^∞ (1/(u (ln u)²)) [ln u + 2 ln(ln u)] du = ∫_e^∞ 1/(u ln u) du + ∫_e^∞ 2 ln(ln u)/(u (ln u)²) du.

The first integrand above is the derivative of ln(ln u) and thus the integral is infinite. The second
integrand is positive for large enough u, and therefore h(U) is infinite.
(c) The hint establishes the result directly. 
Exercise 3.8: 
(a) As suggested in the hint² (and using common sense in any region where f(x) = 0),

−D(f‖g) = ∫ f(x) ln[g(x)/f(x)] dx ≤ ∫ f(x)[g(x)/f(x) − 1] dx = ∫ g(x) dx − ∫ f(x) dx = 0.

Thus D(f‖g) ≥ 0.
(b)

D(f‖φ) = ∫ f(x) ln[f(x)/φ(x)] dx
        = −h(f) + ∫ f(x) [ln √(2πσ²) + x²/(2σ²)] dx
        = −h(f) + ln √(2πeσ²).

(c) Combining parts (a) and (b), h(f) ≤ ln √(2πeσ²). Since D(φ‖φ) = 0, this inequality is satisfied
with equality for a Gaussian rv ∼ N(0, σ²).
Exercise 3.9: 
(a) For the same reason as for sources with probability densities, each representation point aj 
must be chosen as the conditional mean of the set of symbols in Rj . Specifically, 
aj = (∑_{i∈Rj} pi ri) / (∑_{i∈Rj} pi).
Footnote 2: A useful feature of divergence is that it exists whether or not a density exists; it can be defined over any
quantization of the sample space and it increases as the quantization becomes finer, thus approaching a limit
(which might be finite or infinite).
(b) The symbol ri has a squared error |ri − aj |2 if mapped into Rj and thus into aj . Thus ri 
must be mapped into the closest aj and thus the region Rj must contain all source symbols 
that are closer to aj than to any other representation point. The quantization intervals are not 
uniquely determined by this rule since Rj can end and Rj+1 can begin at any point between 
the largest source symbol closest to aj and the smallest source symbol closest to aj+1. 
(c) For ri midway between aj and aj+1, the squared error is |ri − aj |2 = |ri − aj+1|2 no matter 
whether ri is mapped into aj or aj+1. 
(d) In order for the case of part (c) to achieve MMSE, it is necessary for aj and aj+1 to each 
be the conditional mean of the set of points in the corresponding region. Now assume that aj 
is the conditional mean of Rj under the assumption that ri is part of Rj . Switching ri to Rj+1 
will not change the MSE (as seen in part (c)), but it will change Rj and will thus change the 
conditional mean of Rj . Moving aj to that new conditional mean will reduce the MSE. The 
same argument applies if ri is viewed as being in Rj+1 or even if it is viewed as being partly in 
Rj and partly in Rj+1. 
Chapter 4 
Exercise 4.2: 
From (4.1) in the text, we have u(t) = ∑_{k=−∞}^{∞} û_k e^{2πikt/T} for t ∈ [−T/2, T/2]. Substituting this
into ∫_{−T/2}^{T/2} u(t)u*(t) dt, we have

∫_{−T/2}^{T/2} |u(t)|² dt = ∫_{−T/2}^{T/2} ∑_{k=−∞}^{∞} û_k e^{2πikt/T} ∑_{ℓ=−∞}^{∞} û*_ℓ e^{−2πiℓt/T} dt
                        = ∑_{k=−∞}^{∞} ∑_{ℓ=−∞}^{∞} û_k û*_ℓ ∫_{−T/2}^{T/2} e^{2πi(k−ℓ)t/T} dt
                        = ∑_{k=−∞}^{∞} ∑_{ℓ=−∞}^{∞} û_k û*_ℓ T δ_{k,ℓ},

where δ_{k,ℓ} equals 1 if k = ℓ and 0 otherwise. Thus,

∫_{−T/2}^{T/2} |u(t)|² dt = T ∑_{k=−∞}^{∞} |û_k|².
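As an added numerical check of this Parseval relation, the sketch below approximates the Fourier series coefficients of an arbitrary test segment on [−T/2, T/2] and compares ∫|u(t)|² dt with T ∑ |û_k|²; the two agree up to quadrature and truncation error.

import cmath

T, N, K = 2.0, 4000, 50           # period, number of time samples, number of coefficients kept

def u(t):
    # an arbitrary smooth test function on [-T/2, T/2]
    return t * (1 - abs(t))

dt = T / N
ts = [-T / 2 + (i + 0.5) * dt for i in range(N)]

energy = sum(abs(u(t)) ** 2 for t in ts) * dt
coeffs = [sum(u(t) * cmath.exp(-2j * cmath.pi * k * t / T) for t in ts) * dt / T
          for k in range(-K, K + 1)]

print(energy, "vs", T * sum(abs(c) ** 2 for c in coeffs))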
Exercise 4.4: 
(a) Note that sa(k) − sa(k − 1) = ak ≥ 0, so the sequence sa(1), sa(2), . . . , is non-decreasing. 
A standard result in elementary analysis states that a bounded non-decreasing sequence must 
have a limit. The limit is the least upper bound of the sequence {sa(k); k ≥ 1}. 
(b) Let Jk = max{j(1), j(2), . . . , j(k)}, i.e., Jk is the largest index in a_{j(1)}, . . . , a_{j(k)}. Then

∑_{ℓ=1}^{k} b_ℓ = ∑_{ℓ=1}^{k} a_{j(ℓ)} ≤ ∑_{j=1}^{Jk} a_j ≤ Sa.

By the same argument as in part (a), ∑_{ℓ=1}^{k} b_ℓ has a limit as k → ∞ and the limit, say Sb, is at
most Sa.
(c) Using the inverse permutation to define the sequence {ak} from the sequence {bk}, the same 
argument as in part (b) shows that Sa ≤ Sb. Thus Sa = Sb and the limit is independent of the 
order of summation. 
(d) The simplest example is the sequence {1, −1, 1, −1, . . . }. The partial sums here alternate
between 1 and 0, so do not converge at all. Also, in a sequence taking two odd terms for each
even term, the series goes to ∞. A more common (but complicated) example is the alternating
harmonic series. This converges to ln 2, but taking two odd terms for each even term, the rearranged
series approaches (3/2) ln 2.
Exercise 4.5: 
(a) For E = I1 ∪I2, with the left end points satisfying a1 ≤ a2, there are three cases to consider. 
• a2 < b1. In this case, all points in I1 and all points in I2 lie between a1 and max{b1, b2}. 
Conversely all points in (a1, max{b1, b2}) lie in either I1 or I2. Thus E is a single interval 
which might or might not include each end point. 
• a2 > b1. In this case, I1 and I2 are disjoint. 
• a2 = b1. If I1 is open on the right and I2 is open on the left, then I1 and I2 are separated 
by the single point a2 = b1. Otherwise E is a single interval. 
(b) Let Ek = I1∪I2∪· · ·∪Ik and let Jk be the final interval in the separated interval representation 
of Ek. We have seen how to find J2 from E2 and note that the starting point of J2 is either a1 
or a2. Assume that in general the starting point of Jk is aj for some j, 1 ≤ j ≤ k. 
Assuming that the starting points are ordered a1 ≤ a2 ≤ · · · , we see that ak+1 is greater than 
or equal to the starting point of Jk. Thus Jk ∪ Ik+1 is either a single interval or two separated 
intervals by the argument in part (a). Thus Ek+1, in separated interval form, is Ek, with Jk 
replaced either by two separated intervals, the latter starting with ak+1, or by a single interval 
starting with the same starting point as Jk. Either way the starting point of Jk+1 is aj for some 
j, 1 ≤ j ≤ k+1, verifying the initial assumption by induction. 
(c) Each interval Jk created above starts with an interval starting point a1, . . . , and ends with 
an interval ending point b1, . . . , and therefore all the separated intervals start and end with such 
points. 
(d) Let I′1 ∪ · · · ∪ I′ℓ be the union of disjoint intervals arising from the above algorithm and let
I″1 ∪ · · · ∪ I″i be any other ordered union of separated intervals. Let k be the smallest integer for
which I′k ≠ I″k. Then the starting points or the ending points of these intervals are different, or
one of the two intervals is open and one closed on one end. In all of these cases, there is at least
one point that is in one of the unions and not the other.
Exercise 4.6:
(a) If we assume that the intervals {Ij; 1 ≤ j < ∞} are ordered in terms of starting points, then
the argument in Exercise 4.5 immediately shows that the set of separated intervals stays the
same as each new interval Ik+1 is added except for the possible addition of a new interval
at the right or the expansion of the right-most interval. However, with a countably infinite set
of intervals, it is not necessarily possible to order the intervals in terms of starting points (e.g.,
suppose the left end points are the set of rationals in (0,1)). However, in the general case, in going
from Bk to Bk+1, a single interval Ik+1 is added to Bk. This can add a new separated interval,
or extend one of the existing separated intervals, or combine two or more adjacent separated
intervals. In each of these cases, each of the separated intervals in Bk (including Ij,k) either
stays the same or is expanded. Thus Ij,k ⊆ Ij,k+1.
(b) Since Ij,k ⊆ Ij,k+1, the left end points of the sequence {Ij,k; k ≥ j} form a monotonic decreasing
sequence and thus have a limit (including the possibility of −∞). Similarly the right end points
are monotonically increasing, and thus have a limit (possibly +∞). Thus lim_{k→∞} Ij,k exists as
an interval I′j that might be infinite on either end. Note now that any point in the interior of I′j
must be in Ij,k for some k. The same is true for the left (right) end point of I′j if I′j is closed on
the left (right). Thus I′j must be in B for each j.
(c) From Exercise 4.5, we know that for each k ≥ 1, the set of intervals {I1,k, I2,k, . . . , Ik,k} is
a separated set whose union is Bk. Thus, for each ℓ, j ≤ k, either Iℓ,k = Ij,k or Iℓ,k and Ij,k are
separated. If Iℓ,k = Ij,k, then the fact that Ij,k ⊆ Ij,k+1 ensures that Iℓ,k+1 = Ij,k+1, and thus,
in the limit, I′ℓ = I′j. If Iℓ,k and Ij,k are separated, then, as explained in part (a), the addition
of Ik+1 either maintains the separation or combines Iℓ,k and Ij,k into a single interval. Thus, as
k increases, either Iℓ,k and Ij,k remain separated or become equal.
(d) The sequence {I′j; j ≥ 1} is countable, and after removing repetitions it is still countable. It
is a separated sequence of intervals from (c). From (b), ∪_{j=1}^{∞} I′j ⊆ B. Also, since B = ∪_j Ij ⊆ ∪_j I′j,
we see that B = ∪_j I′j.
(e) Let {I′j; j ≥ 1} be the above sequence of separated intervals and let {I″j; j ≥ 1} be any other
sequence of separated intervals such that ∪_j I″j = B. For each j ≥ 1, let c′j be the center point
of I′j. Since c′j is in B, c′j ∈ I″k for some k ≥ 1. Assume first that I′j is open on the left. Letting
a′j be the left end point of I′j, the interval (a′j, c′j] must be contained in I″k. Since a′j ∉ B, a′j
must be the left end point of I″k and I″k must be open on the left. Similarly, if I′j is closed on
the left, a′j is the left end point of I″k and I″k is closed on the left. Using the same analysis on
the right end point of I′j, we see that I′j = I″k. Thus the sequence {I″j; j ≥ 1} contains each
interval in {I′j; j ≥ 1}. The same analysis applied to each interval in {I″j; j ≥ 1} shows that
{I′j; j ≥ 1} contains each interval in {I″j; j ≥ 1}, and thus the two sequences are the same except
for possibly different orderings.
Exercise 4.7: 
(a) and (b) For any finite unions of intervals E1 and E2, (4.87) in the text states that

μ(E1) + μ(E2) = μ(E1 ∪ E2) + μ(E1 ∩ E2) ≥ μ(E1 ∪ E2),

where the final inequality follows from the non-negativity of measure and is satisfied with equality
if E1 and E2 are disjoint. For part (a), let I1 = E1 and I2 = E2 and for part (b), let Bk = E1 and
Ik+1 = E2.
(c) For k = 2, part (a) shows that μ(Bk) ≤ μ(I1) + μ(I2). Using this as the initial step of the
induction and using part (b) for the inductive step shows that μ(Bk) ≤ ∑_{j=1}^{k} μ(Ij), with equality
in the disjoint case.
(d) First assume that μ(B) is finite (this is always the case for measure over the interval
[−T/2, T/2]). Then since Bk is non-decreasing in k,

μ(B) = lim_{k→∞} μ(Bk) ≤ lim_{k→∞} ∑_{j=1}^{k} μ(Ij).

Alternatively, if μ(B) = ∞, then lim_{k→∞} ∑_{j=1}^{k} μ(Ij) = ∞ also.
Exercise 4.8: Let Bn = ∪_{j=1}^{∞} I_{n,j}. Then B = ∪_{n,j} I_{n,j}. The collection of intervals {I_{n,j}; 1 ≤ n <
∞, 1 ≤ j < ∞} is a countable collection of intervals since the set of pairs of positive integers is
countable.
Exercise 4.12: 
(a) By combining parts (a) and (c) of Exercise 4.11, {t : u(t) > β} is measurable for all β. 
Thus, {t : −u(t) < −β} is measurable for all β, so −u(t) is measurable. Next, for β > 0, 
{t : |u(t)| < β} = {t : u(t) < β} ∩ {t : u(t) > −β}, which is measurable. 
(b) {t : u(t) < β} = {t : g(u(t)) < g(β)}, so if u(t) is measurable, then g(u(t)) is also.
(c) Since exp(·) is increasing, exp[u(t)] is measurable by part (b). Part (a) shows that |u(t)| is 
measurable if u(t) is. Both the squaring function and the log function are increasing for positive 
values, so u²(t) = |u(t)|² and log(|u(t)|) are measurable.
Exercise 4.13: 
(a) Let y(t) = u(t) + v(t). We will show that {t : y(t) < β} is measurable for all real β. Let
ε > 0 be arbitrary and k ∈ Z be arbitrary. Then, for any given t,

(k − 1)ε ≤ u(t) < kε and v(t) < β − kε =⇒ y(t) < β.

This means that the set of t for which the left side holds is included in the set of t for which the
right side holds, so

{t : (k − 1)ε ≤ u(t) < kε} ∩ {t : v(t) < β − kε} ⊆ {t : y(t) < β}.

This subset inequality holds for each integer k and thus must hold for the union over k,

∪_k [{t : (k − 1)ε ≤ u(t) < kε} ∩ {t : v(t) < β − kε}] ⊆ {t : y(t) < β}.

Finally this must hold for all ε > 0, so we choose a sequence 1/n for n ≥ 1, yielding

∪_{n≥1} ∪_k [{t : (k − 1)/n ≤ u(t) < k/n} ∩ {t : v(t) < β − k/n}] ⊆ {t : y(t) < β}.

The set on the left is a countable union of measurable sets and thus is measurable. It is also
equal to {t : y(t) < β}, since any t in this set also satisfies y(t) < β − 1/n for sufficiently large n.
(b) This can be shown by an adaptation of the argument in (a). If u(t) and v(t) are positive 
functions, it can also be shown by observing that ln u(t) and ln v(t) are measurable. Thus the 
sum is measurable by part (a) and exp[ln u(t) + ln v(t)] is measurable. 
Exercise 4.14: The hint says it all. 
Exercise 4.15: (a) Restrict attention to t ∈ [−T/2, T/2] throughout. First we show that
vm(t) = inf_{n≥m} un(t) is measurable for all m ≥ 1. For any given t, if un(t) ≥ V for all n ≥ m,
then V is a lower bound to un(t) over n ≥ m, and thus the greatest such lower bound satisfies
vm(t) ≥ V. Similarly, vm(t) ≥ V implies that un(t) ≥ V for all n ≥ m. Thus,

{t : vm(t) ≥ V } = ∩_{n=m}^{∞} {t : un(t) ≥ V }.
Using Exercise 4.11, the measurability of un implies that {t : un(t) ≥ V } is measurable for 
each n. The countable intersection above is therefore measurable, and thus, using the result of 
Exercise 4.11 again, vm(t) is measurable for each m. 
Next, if vm(t) ≥ V then vm′(t) ≥ V for all m′ > m. This means that vm(t) is a non-decreasing
function of m for each t, and thus lim_m vm(t) exists for each t. This also means that

{t : lim_{m→∞} vm(t) ≥ V } = ∪_{m=1}^{∞} [ ∩_{n=m}^{∞} {t : un(t) ≥ V } ].

This is a countable union of measurable sets and is thus measurable, showing that lim inf un(t)
is measurable.
(b) If lim infn un(t) = V1 for a given t, then limm vm(t) = V1, which implies that for the given 
t, the sequence {un(t); n ≥ 1} has a subsequence that approaches V1 as a limit. Similarly, if 
lim supn un(t) = V2 for that t, then the sequence {un(t), n ≥ 1} has a subsequence approaching 
V2. If V1 < V2, then limn un(t) does not exist for that t, since the sequence oscillates infinitely 
between V1 and V2. If V1 = V2, the limit does exist and equals V1. 
(c) Using the same argument as in part (a), with inf and sup interchanged,

{t : lim sup_n un(t) ≤ V } = ∪_{m=1}^{∞} [ ∩_{n=m}^{∞} {t : un(t) ≤ V } ]
is also measurable, and thus lim sup un(t) is measurable. It follows from this, with the help of 
Exercise 4.13 (a), that lim supn un(t) − lim infn un(t) is measurable. Using part (b), limn un(t) 
exists if and only if this difference equals 0. Thus the set of points on which limn un(t) exists is 
measurable and the function that is this limit when it exists and 0 otherwise is measurable. 
Exercise 4.16: As seen below, un(t) is a rectangular pulse taking the value 2^n from 1/2^{n+1} to
3/2^{n+1}. It follows that for any t ≤ 0, un(t) = 0 for all n. For any fixed t > 0, we can visually see
that for n large enough, un(t) = 0. Since un(t) is 0 for all t greater than 3/2^{n+1}, then for any fixed
t > 0, un(t) = 0 for all n > log2(3/t) − 1. Thus lim_{n→∞} un(t) = 0 for all t.
Since lim_{n→∞} un(t) = 0 for all t, it follows that ∫ lim_{n→∞} un(t) dt = 0. On the other hand,
∫ un(t) dt = 1 for all n, so lim_{n→∞} ∫ un(t) dt = 1.
[Figure: the pulses u1(t), u2(t), u3(t), with supports [1/4, 3/4], [1/8, 3/8], and [1/16, 3/16] respectively.]
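An added numeric illustration of why the limit and the integral cannot be interchanged here:

def u_n(t, n):
    # rectangular pulse of height 2^n on the interval [1/2^(n+1), 3/2^(n+1)]
    return 2.0 ** n if 1 / 2 ** (n + 1) <= t <= 3 / 2 ** (n + 1) else 0.0

ts = [0.6, 0.1, 0.02]                  # arbitrary fixed points t > 0
for n in (1, 3, 6, 12):
    integral = 2 ** n * (3 / 2 ** (n + 1) - 1 / 2 ** (n + 1))   # equals 1 for every n
    print(n, [u_n(t, n) for t in ts], integral)
# at each fixed t the values u_n(t) are eventually 0, yet the integral of u_n is 1 for all n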
Exercise 4.17: 
(a) Since u(t) is real valued,

|∫ u(t) dt| = |∫ u⁺(t) dt − ∫ u⁻(t) dt|
            ≤ |∫ u⁺(t) dt| + |∫ u⁻(t) dt|
            = ∫ |u⁺(t)| dt + ∫ |u⁻(t)| dt
            = ∫ u⁺(t) dt + ∫ u⁻(t) dt = ∫ |u(t)| dt.
(b) As in the hint we select α such that α∫u(t)dt is non-negative and real and |α| = 1. Now
let αu(t) = v(t) + jw(t) where v(t) and w(t) are the real and imaginary parts of αu(t). Since
α∫u(t)dt is real, we have ∫w(t)dt = 0 and α∫u(t)dt = ∫v(t)dt. Note also that |v(t)| ≤ |αu(t)|.
Hence

|∫ u(t) dt| = |α ∫ u(t) dt| = |∫ v(t) dt|
            ≤ ∫ |v(t)| dt      (part a)
            ≤ ∫ |αu(t)| dt
            = ∫ |α| |u(t)| dt = ∫ |u(t)| dt.
Exercise 4.18: 
(a) The meaning of u(t) = v(t) a.e. is that μ{t : |u(t) − v(t)| > 0} = 0. It follows that
∫ |u(t) − v(t)|² dt = 0. Thus u(t) and v(t) are L2 equivalent.
(b) If u(t) and v(t) are L2 equivalent, then ∫ |u(t) − v(t)|² dt = 0. Now suppose that μ{t : |u(t) −
v(t)|² > ε} is non-zero for some ε > 0. Then

∫ |u(t) − v(t)|² dt ≥ ε μ{t : |u(t) − v(t)|² > ε} > 0,

which contradicts the assumption that u(t) and v(t) are L2 equivalent.
(c) The set {t : |u(t) − v(t)| > 0} can be expressed as

{t : |u(t) − v(t)| > 0} = ∪_{n≥1} {t : |u(t) − v(t)| > 1/n}.
Since each term on the right has zero measure, the countable union also has zero measure. Thus 
{t : |u(t) − v(t)| > 0} has zero measure and u(t) = v(t) a.e. 
Exercise 4.21: 
(a) By expanding the magnitude squared within the given integral as a product of the function
and its complex conjugate, we get

∫ |u(t) − ∑_{m=−n}^{n} ∑_{k=−ℓ}^{ℓ} û_{k,m} θ_{k,m}(t)|² dt = ∫ |u(t)|² dt − ∑_{m=−n}^{n} ∑_{k=−ℓ}^{ℓ} T|û_{k,m}|².   (14)

Since each increase in n (or similarly in ℓ) subtracts additional non-negative terms, the given
integral is non-increasing in n and ℓ.
(b) and (c) The set of terms T|û_{k,m}|² for k ∈ Z and m ∈ Z is a countable set of non-negative
terms with a sum bounded by ∫ |u(t)|² dt, which is finite since u(t) is L2. Thus, using the result
of Exercise 4.4, the sum over this set of terms is independent of the ordering of the summation.
Any scheme for increasing n and ℓ in any order relative to each other in (14) is just an example
of this more general ordering and must converge to the same quantity.
Since um(t) = u(t) rect(t/T − m) satisfies ∫ |um(t)|² dt = T ∑_k |û_{k,m}|² by Theorem 4.4.1 of the
text, it is clear that the limit of (14) as n, ℓ → ∞ is 0, so the limit is the same for any ordering.
There is a subtlety above which is important to understand, but not so important as far as
developing the notation to avoid the subtlety. The easiest way to understand (14) is by understanding
that ∫ |um(t)|² dt = T ∑_k |û_{k,m}|², which suggests taking the limit k → ±∞ for each
value of m in (14). This does not correspond to a countable ordering of (k,m). This can be
straightened out with epsilons and deltas, but is better left to the imagination of the reader.
Exercise 4.22: 
(a) First note that:
$$\sum_{m=-n}^{n} u_m(t) = \begin{cases} 0, & |t| > (n+1/2)T,\\ 2u(t), & t = (m+1/2)T,\ |m| < n,\\ u(t), & \text{otherwise.}\end{cases}$$
$$\int\Big|u(t) - \sum_{m=-n}^{n} u_m(t)\Big|^2 dt = \int_{-\infty}^{(-n-1/2)T}|u(t)|^2\,dt + \int_{(n+1/2)T}^{\infty}|u(t)|^2\,dt.$$
By the definition of an L2 function over an infinite time interval, each of the integrals on the right approaches 0 with increasing n.
(b) Let $u_m^{\ell}(t) = \sum_{k=-\ell}^{\ell}\hat u_{k,m}\theta_{k,m}(t)$. Note that $\sum_{m=-n}^{n} u_m^{\ell}(t) = 0$ for |t| > (n + 1/2)T. We can now write the given integral as:
$$\int_{|t|>(n+1/2)T}|u(t)|^2\,dt + \int_{-(n+1/2)T}^{(n+1/2)T}\Big|u(t) - \sum_{m=-n}^{n} u_m^{\ell}(t)\Big|^2 dt. \qquad (15)$$
As in part (a), the first integral vanishes as n → ∞.
(c) Since the û_{k,m} are the Fourier series coefficients of u_m(t), we know u_m(t) = l.i.m._{ℓ→∞} u_m^{ℓ}(t). Hence, for each n, the second integral goes to zero as ℓ → ∞. Thus, for any ε > 0, we can choose n so that the first term is less than ε/2 and then choose ℓ large enough that the second term is less than ε/2. Thus the limit of (15) as n, ℓ → ∞ is 0.
Exercise 4.23: The LHS of (4.40) is the convolution u(t) ∗ v(t), a function of t, and its Fourier transform is
$$\mathcal{F}\{u(t)*v(t)\} = \int_{-\infty}^{\infty}\left(\int_{-\infty}^{\infty}u(\tau)v(t-\tau)\,d\tau\right)e^{-2\pi ift}\,dt = \int_{-\infty}^{\infty}u(\tau)\left(\int_{-\infty}^{\infty}v(t-\tau)e^{-2\pi ift}\,dt\right)d\tau$$
$$= \int_{-\infty}^{\infty}u(\tau)\left(\int_{-\infty}^{\infty}v(r)e^{-2\pi if(\tau+r)}\,dr\right)d\tau = \int_{-\infty}^{\infty}u(\tau)e^{-2\pi if\tau}\,d\tau\int_{-\infty}^{\infty}v(r)e^{-2\pi ifr}\,dr = \hat u(f)\,\hat v(f).$$
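The convolution theorem just derived can be sanity-checked on a grid; the Python sketch below is purely illustrative (the two Gaussian pulses are arbitrary test functions, not from the exercise) and compares the DFT of a sampled convolution with the product of the individual DFTs:

```python
import numpy as np

dt = 0.01
t = np.arange(0, 81.92, dt)                  # 8192 samples
u = np.exp(-(t - 20.0) ** 2)                 # two smooth, rapidly decaying test pulses
v = np.exp(-0.5 * (t - 25.0) ** 2)

w = np.convolve(u, v)[:t.size] * dt          # grid approximation of (u * v)(t)

U, V, W = np.fft.fft(u), np.fft.fft(v), np.fft.fft(w)
# Convolution theorem: FT(u*v) = FT(u) FT(v); on the grid this reads W ~ dt * U * V
print(np.max(np.abs(W - dt * U * V)))        # tiny (only numerical edge effects)
```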
Exercise 4.24: 
(a)
$$\int_{|t|>T}\Big|u(t)e^{-2\pi ift} - u(t)e^{-2\pi i(f-\delta)t}\Big|\,dt = \int_{|t|>T}\Big|u(t)e^{-2\pi ift}\big(1 - e^{2\pi i\delta t}\big)\Big|\,dt = \int_{|t|>T}|u(t)|\,\big|1 - e^{2\pi i\delta t}\big|\,dt \le 2\int_{|t|>T}|u(t)|\,dt \quad\text{for all } f > 0,\ \delta > 0.$$
Since u(t) is L1, ∫_{−∞}^{∞}|u(t)| dt is finite. Thus, for T large enough, we can make ∫_{|t|>T}|u(t)| dt as small as we wish. In particular, we can let T be sufficiently large that 2∫_{|t|>T}|u(t)| dt is less than ε/2. The result follows.
(b) For all f,
$$\int_{|t|\le T}\Big|u(t)e^{-2\pi ift} - u(t)e^{-2\pi i(f-\delta)t}\Big|\,dt = \int_{|t|\le T}|u(t)|\,\big|1 - e^{2\pi i\delta t}\big|\,dt.$$
For the T selected in part (a), we can make |1 − e^{2πiδt}| arbitrarily small for all |t| ≤ T by choosing δ to be small enough. Also, since u(t) is L1, ∫_{|t|≤T}|u(t)| dt is finite. Thus, by choosing δ small enough, we can make ∫_{|t|≤T}|u(t)| |1 − e^{2πiδt}| dt < ε/2.
Exercise 4.26: Exercise 4.11 shows that the sum of two measurable functions is measurable, 
so the question concerns the energy in au(t) + bv(t). Note that for each t, |au(t) + bv(t)|² ≤ 2|a|²|u(t)|² + 2|b|²|v(t)|². Thus since ∫|u(t)|² dt < ∞ and ∫|v(t)|² dt < ∞, it follows that ∫|au(t) + bv(t)|² dt < ∞.
If {t : u(t) ≤ β} is a union of disjoint intervals, then {t : u(t − T) ≤ β} is that same union of intervals, each shifted by T, and therefore it has the same measure. In the general case, any cover of {t : u(t) ≤ β}, if shifted by T, is a cover of {t : u(t − T) ≤ β}. Thus, for all β, μ{t : u(t) ≤ β} = μ{t : u(t − T) ≤ β}. Similarly, if {t : u(t) ≤ β} is a union of intervals, then {t : u(t/T) ≤ β} is that same set of intervals expanded by a factor of T. This generalizes to arbitrary measurable sets as before. Thus μ{t : u(t) ≤ β} = (1/T) μ{t : u(t/T) ≤ β}.
Exercise 4.29: The statement of the exercise contains a misprint — the transform ˆu(f) is 
limited to |f| ≤ 1/2 (thus making the sampling theorem applicable) rather than the function 
being time-limited. For the given sampling coefficients, we have 
$$u(t) = \sum_k u(k)\,\mathrm{sinc}(t-k) = \sum_{k=-n}^{n}(-1)^k\,\mathrm{sinc}(t-k),$$
$$u\big(n+\tfrac12\big) = \sum_{k=-n}^{n}(-1)^k\,\mathrm{sinc}\big(n+\tfrac12-k\big) = \sum_{k=-n}^{n}\frac{(-1)^k(-1)^{n-k}}{\pi[n-k+\tfrac12]}. \qquad (16)$$
Since n is even, (−1)^k(−1)^{n−k} = (−1)^n = 1. Substituting j for n − k, we then have
$$u\big(n+\tfrac12\big) = \sum_{j=0}^{2n}\frac{1}{\pi(j+\tfrac12)}. \qquad (17)$$
The approximation $\sum_{k=m_1}^{m_2}\frac{1}{k+1/2}\approx \ln\frac{m_2+1}{m_1}$ comes from approximating the sum by an integral and is quite accurate for m_1 ≫ 0. To apply this approximation to (17), we must at least omit the term j = 0, and this gives us the approximation
$$u\big(n+\tfrac12\big) \approx \frac{2}{\pi} + \frac{1}{\pi}\ln(2n+1).$$
This goes to infinity logarithmically in n as n → ∞. The approximation can be improved by removing the first few terms from (17) before applying the approximation, but the term ln(2n + 1) remains.
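As an illustrative numerical check (not part of the original solution), the exact sum (17) can be compared with the logarithmic approximation:

```python
import numpy as np

def u_half(n):
    # u(n + 1/2) from (17): sum_{j=0}^{2n} 1 / (pi (j + 1/2))
    j = np.arange(0, 2 * n + 1)
    return float(np.sum(1.0 / (np.pi * (j + 0.5))))

for n in [10, 100, 1000, 10000]:
    approx = 2.0 / np.pi + np.log(2 * n + 1) / np.pi
    print(n, round(u_half(n), 4), round(approx, 4))   # both grow like (1/pi) ln(2n+1)
```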
We can evaluate u(n+m+1/2) and u(n−m−1/2) by the same procedure as in (16). In particular,
$$u\big(n+m+\tfrac12\big) = \sum_{k=-n}^{n}(-1)^k\,\mathrm{sinc}\big(n+m+\tfrac12-k\big) = \sum_{k=-n}^{n}\frac{(-1)^k(-1)^{n+m-k}}{\pi[n+m-k+\tfrac12]} = \sum_{j=m}^{2n+m}\frac{(-1)^{n+m}}{\pi[j+\tfrac12]}.$$
$$u\big(n-m-\tfrac12\big) = \sum_{k=-n}^{n}\frac{(-1)^k(-1)^{n-m-k}}{\pi[n-m-k-\tfrac12]} = \sum_{j=-m}^{2n-m}\frac{(-1)^{n-m}}{\pi[j-\tfrac12]}.$$
Taking magnitudes,
$$\Big|u\big(n+m+\tfrac12\big)\Big| = \sum_{j=m}^{2n+m}\frac{1}{\pi[j+\tfrac12]}; \qquad \Big|u\big(n-m-\tfrac12\big)\Big| = \Bigg|\sum_{j=-m}^{2n-m}\frac{1}{\pi[j-\tfrac12]}\Bigg|.$$
All terms in the first expression above are positive, whereas those in the second expression are 
negative for j ≤ 0. We break this second expression into positive and negative terms: 
$$\Big|u\big(n-m-\tfrac12\big)\Big| = \Bigg|\sum_{j=-m}^{0}\frac{1}{\pi[j-\tfrac12]} + \sum_{j=1}^{2n-m}\frac{1}{\pi[j-\tfrac12]}\Bigg| = \Bigg|\sum_{k=-m}^{0}\frac{1}{\pi[k-\tfrac12]} + \sum_{j=0}^{2n-m-1}\frac{1}{\pi[j+\tfrac12]}\Bigg|.$$
For each j, 0 ≤ j ≤ m, the term in the second sum above is the negative of the term in the first 
sum with j = −k. Cancelling these terms out, 
$$\Big|u\big(n-m-\tfrac12\big)\Big| = \sum_{j=m+1}^{2n-m-1}\frac{1}{\pi[j+\tfrac12]}.$$
This is a sum of positive terms and is a subset of the positive terms in |u(n+m+1/2)|, establishing that |u(n−m−1/2)| ≤ |u(n+m+1/2)|. What is happening here is that for points inside [−n, n],
)|. What is happening here is that for points inside [−n, n], 
the sinc functions from the samples on one side of the point cancel out the sinc functions from 
the samples on the other side. 
The particular samples in this exercise have been chosen to illustrate that truncating the samples 
of a bandlimited function and truncating the function can have very different effects. Here 
the function with truncated samples oscillates wildly (at least logarithmically in n), with the 
oscillations larger outside of the interval than inside. Thus most of the energy in the function 
resides outside of the region where the samples are nonzero. 
Exercise 4.31:
(a) Note that g(t) = p²(t) where p(t) = sinc(Wt). Thus ĝ(f) is the convolution of p̂(f) with itself. Since p̂(f) = (1/W)rect(f/W), we can convolve graphically to get the triangle function below.

(Figure: ĝ(f) is a triangle of height 1/W with support [−W, W]; the corresponding time pulse g(t) has g(0) = 1, with W = 1/(2T).)
(b) Since u(t) = Σ_k u(kT) sinc(2Wt − k), it follows that v(t) = Σ_k u(kT) sinc(2Wt − k) ∗ g(t). Letting h(t) = sinc(t/T) ∗ g(t), we see that ĥ(f) = T rect(Tf) ĝ(f). Since rect(Tf) = 1 over the range where ĝ(f) is non-zero, ĥ(f) = T ĝ(f). Thus h(t) = T g(t). It follows that
$$v(t) = \sum_k T\,u(kT)\,g(t-kT). \qquad (18)$$
(c) Note that g(t) ≥ 0 for all t. This is the feature of g(t) that makes it useful in generating 
amplitude limited pulses. Thus, since u(kT) ≥ 0 for each k, each term in the sum is non-negative, 
and v(t) is non-negative. 
(d) The obvious but incomplete way to see that Σ_k sinc(t/T − k) = 1 is to observe that each
sample of the constant function 1 is 1, so this is just the sampling expansion of a constant. 
Unfortunately, u(t) = 1 is not L2, so the sampling theorem does not apply. The problem is more 
than nit-picking, since, for example, the sampling expansion of a sequence of alternating 1’s and 
-1’s does not converge (as can be seen from Exercise 4.29). The desired result follows here from 
noting that both the sampling expansion and the constant function 1 are periodic in T and both 
are L2 over one period. Taking the Fourier series over a period establishes the equality. 
(e) To evaluate Σ_k g(t − kT), consider (18) with each u(kT) = 1. For this choice, it follows that Σ_k g(t − kT) = v(t)/T. To evaluate v(t) for this choice, note that u(t) = 1 and v(t) = u(t) ∗ g(t), so that v(t) can be regarded as the output when the constant 1 is passed through the filter g(t). The output is then constant also and equal to ∫g(t) dt = ĝ(0) = 1/W. Thus Σ_k g(t − kT) = 1/(TW) = 2.
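A quick numerical confirmation (illustrative sketch only): with T = 1 and W = 1/(2T) = 1/2, g(t) = sinc²(Wt) = sinc²(t/2), and the shifted sum should be 1/(TW) = 2 at every t.

```python
import numpy as np

T, W = 1.0, 0.5                         # W = 1/(2T)
g = lambda t: np.sinc(W * t) ** 2       # np.sinc(x) = sin(pi x)/(pi x), so g = sinc^2(Wt)

ks = np.arange(-20000, 20001)           # truncated but long sum; tail terms decay like 1/k^2
for t in [0.0, 0.3, 0.77, 1.5]:
    total = float(np.sum(g(t - ks * T)))
    print(t, round(total, 4))           # each value is close to 2
```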
(f) Note that v(t) = Σ_k u(kT) T g(t − kT) is non-decreasing, for each t, in each sample u(kT). Thus v(t) ≤ Σ_k T g(t − kT), which as we have seen is simply 2T.
(h) Since g is real and non-negative and each |u(kT)| ≤ 1,
$$|v(t)| \le \sum_k |u(kT)|\,T\,g(t-kT) \le 2T \quad\text{for all } t.$$
We will find in Chapter 6 that g(t) is not a very good modulation waveform at a sample 
separation T, but it could be used at a sample separation 2T. 
Exercise 4.33: Consider the sequence of functions v_m(t) = rect(t − m) for m ∈ Z⁺, i.e., time-spaced rectangular pulses. For every t, lim_{m→∞} rect(t − m) = 0, so this sequence converges pointwise to 0. However, ∫|rect(t − m) − rect(t − n)|² dt = 2 for all n ≠ m, so L2 convergence is impossible.
Exercise 4.37: 
(a)
$$\int|\hat s(f)|\,df = \int\Big|\sum_m \hat u\big(f+\tfrac mT\big)\mathrm{rect}(fT)\Big|\,df \le \int\sum_m\Big|\hat u\big(f+\tfrac mT\big)\mathrm{rect}(fT)\Big|\,df = \int|\hat u(f)|\,df,$$
which shows that ŝ(f) is L1 if û(f) is.
(b) The following sketch makes it clear that û(f) is L1 and L2. In particular,
$$\int|\hat u(f)|\,df = \int|\hat u(f)|^2\,df = 2\sum_{k\ge 1}\frac{1}{k^2} < \infty.$$

(Sketch: û(f) consists of unit-height pulses, one centered at each nonzero integer ±k with width 1/k² (widths 1, 1/4, 1/9, . . .); ŝ(f) is the corresponding aliased spectrum on [−1/2, 1/2], a staircase taking the values 2, 4, 6, . . . on successively smaller intervals around f = 0.)
It can be seen from the sketch of ŝ(f) that ŝ(f) = 2 from 1/8 to 1/2 and from −1/2 to −1/8, which is a set of measure 3/4. In general, for arbitrary integer k > 0, it can be seen that ŝ(f) = 2k from 1/(2(k+1)²) to 1/(2k²) and from −1/(2k²) to −1/(2(k+1)²). Thus ŝ(f) = 2k over a set of measure (2k+1)/(k²(k+1)²). It follows that
$$\int|\hat s(f)|^2\,df = \lim_{n\to\infty}\sum_{k=1}^{n}(2k)^2\,\frac{2k+1}{k^2(k+1)^2} = \lim_{n\to\infty}\sum_{k=1}^{n}\frac{4(2k+1)}{(k+1)^2} \ge \lim_{n\to\infty}\sum_{k=1}^{n}\frac{4(k+1)}{(k+1)^2} = \lim_{n\to\infty}\sum_{k=1}^{n}\frac{4}{k+1} = \infty.$$
(c) Note that û(f) = 1 for every positive integer value of f, and thus (for positive ε) û(f)f^{1+ε} approaches ∞ along the integers. It is 0 for other arbitrarily large values of f, and thus no limit exists.
Exercise 4.38:
$$\int_{-\infty}^{\infty}|u(t)|^2\,dt = 2\left(1 + \frac{1}{2^2} + \frac{1}{3^2} + \cdots\right).$$
This sum is finite so u(t) is L2. Now we'll show that
$$s(t) = \sum_k u(k)\,\mathrm{sinc}(t-k) = \sum_k \mathrm{sinc}(t-k)$$
is neither L1 nor L2. Taking the Fourier Transform of s(t),
$$\hat s(f) = \sum_k \mathrm{rect}(f)e^{-2\pi ifk} = \mathrm{rect}(f)\sum_k e^{-2\pi ifk}.$$
To show that s(t) is not L1,
$$\int_{-\infty}^{\infty}|s(t)|\,dt = \int_{-\infty}^{\infty}s(t)\,dt \quad\text{(since } s(t)\ge 0 \text{ for all } t\text{)} = \hat s(0) = \sum_k 1 = \infty.$$
To show that s(t) is not L2,
$$\int_{-\infty}^{\infty}|s(t)|^2\,dt = \int_{-\infty}^{\infty}\Big|\sum_k \mathrm{sinc}(t-k)\Big|^2\,dt = \infty.$$
Since u(k) is equal to 1 for every integer k, Σ_k u²(k) = ∞. The sampling theorem energy equation does not apply here (∫|u(t)|² dt ≠ T Σ_k |u(kT)|²) because û(f) is not band-limited.
Chapter 5 
Exercise 5.1: The first algorithm starts with a set of vectors S = {v1, . . . , vm} that span 
V but are dependent. A vector vk ∈ S is selected that is a linear combination of the other 
vectors in S. vk is removed from S, forming a reduced set S0. Now S0 still spans V since each 
v ∈ V is a linear combination of vectors in S, and vk in that expansion can be replaced by 
its representation using the other vectors. If S0 is independent, we are done, and if not, the 
previous step is repeated with S0 replacing S. Since the size of S is reduced by 1 on each such 
step, the algorithm terminates with an independent spanning set, i.e., a basis. 
The second algorithm starts with an independent set S = {v1, . . . , vm} of vectors that do 
not span the space. An arbitrary nonzero vector vm+1 ∈ V is then selected that is not a 
linear combination of S (this is possible since S does not span V). It can be seen that S0 = 
{v1, . . . , vm+1} is an independent set. If S0 spans V, we are done, and if not, the previous step is 
repeated with S0 replacing S. With each repetition of this step, the independent set is increased 
by 1 vector until it eventually spans V. 
It is not immediately clear that the second algorithm ever terminates. To prove this and also 
prove that all bases of a finite dimensional vector space have the same number of elements, we 
describe a third algorithm. Let Sind = v1, . . . , vm be an arbitrary set of independent vectors and 
let Ssp = {u1, . . . , un} be a finite spanning set for V (which must exist by the finite dimensional 
assumption). Then, for k = 1, . . .m, successively add vk to Ssp and remove one of the original 
vectors uj of Ssp so that the remaining set, say S0sp is still a spanning set. This is always possible 
since the added element must be a linear combination of a spanning set, so the augmented set is 
linearly dependent. One of the original elements of Ssp can be removed (while maintaining the 
spanning property) since the newly added vector is not a linear combination of the previously 
added vectors. A contradiction occurs if m > n, i.e., if the independent set is larger than the 
spanning set, since no more than the n original vectors in the spanning set can be removed. 
We have just shown that every spanning set contains at least as many members as any independent set. Since every basis is both a spanning set and an independent set, this means that every basis contains the same number of elements, say b. Since every independent set contains at most b elements, algorithm 2 must terminate with a basis when S reaches b vectors.
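For concreteness, here is a small numpy sketch of the first algorithm (illustrative only; it works with concrete vectors in R^n and uses numerical rank in place of the abstract dependence argument, and the example spanning set is made up for the demonstration):

```python
import numpy as np

def prune_to_basis(vectors):
    """Algorithm 1 sketch: repeatedly drop a vector that is a linear
    combination of the others until an independent spanning set remains."""
    S = [np.asarray(v, dtype=float) for v in vectors]
    changed = True
    while changed:
        changed = False
        for i in range(len(S)):
            rest = [v for j, v in enumerate(S) if j != i]
            # S[i] is a combination of the rest iff removing it keeps the rank
            if rest and np.linalg.matrix_rank(np.array(rest)) == np.linalg.matrix_rank(np.array(S)):
                S.pop(i)
                changed = True
                break
    return S

spanning = [[1, 0, 0], [0, 1, 0], [1, 1, 0], [2, 0, 0]]  # dependent spanning set of a 2-D subspace
basis = prune_to_basis(spanning)
print(len(basis))   # 2: an independent spanning set, i.e., a basis, remains
```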
Exercise 5.3: Let the n vectors that uniquely span a vector space V be called v1, v2, . . . , vn. 
We will prove that the n vectors are linearly independent using proof by contradiction. Assume 
v1, v2, . . . , vn are linearly dependent. Then Σ_{j=1}^{n} α_j v_j = 0 for some set of scalars α_1, α_2, . . . , α_n where not all the α_j equal zero. Say α_k ≠ 0. We can express v_k as a linear combination of the other n − 1 vectors {v_j}_{j≠k}:
$$v_k = \sum_{j\ne k}\frac{-\alpha_j}{\alpha_k}\,v_j.$$
Thus v_k has two representations in terms of {v_1, . . . , v_n}. One is that above, and the other is v_k = Σ_j β_j v_j where β_k = 1 and β_j = 0 for j ≠ k. Thus the representation is non-unique, demonstrating the contradiction.
demonstrating the contradiction. 
It follows that if n vectors uniquely span a vector space, they are also independent and thus 
form a basis. From Theorem 5.1.1, the dimension of V is then n. 
Exercise 5.6:
$$\|v+u\|^2 = \langle v+u,\, v+u\rangle = \langle v,\, v+u\rangle + \langle u,\, v+u\rangle \qquad\text{by axiom (b)}$$
$$= \langle v,v\rangle + \langle v,u\rangle + \langle u,v\rangle + \langle u,u\rangle \qquad\text{by axiom (b)}$$
$$\le |\langle v,v\rangle| + |\langle v,u\rangle| + |\langle u,v\rangle| + |\langle u,u\rangle| \le \|v\|^2 + \|v\|\,\|u\| + \|u\|\,\|v\| + \|u\|^2 = (\|v\| + \|u\|)^2,$$
where the last inequality uses the Schwarz inequality. So ‖v + u‖ ≤ ‖v‖ + ‖u‖.
Exercise 5.8: 
(a) By direct substitution of u(t) = Σ_{k,m} û_{k,m} θ_{k,m}(t) and v*(t) = Σ_{k,m} v̂*_{k,m} θ*_{k,m}(t) into the inner product definition,
$$\langle u, v\rangle = \int_{-\infty}^{\infty}u(t)v^*(t)\,dt = \int_{-\infty}^{\infty}\sum_{k,m}\hat u_{k,m}\theta_{k,m}(t)\sum_{k',m'}\hat v^*_{k',m'}\theta^*_{k',m'}(t)\,dt = \sum_{k,m}\hat u_{k,m}\sum_{k',m'}\hat v^*_{k',m'}\int_{-\infty}^{\infty}\theta_{k,m}(t)\theta^*_{k',m'}(t)\,dt = T\sum_{k,m}\hat u_{k,m}\hat v^*_{k,m}.$$
(b) For any real numbers a and b, 0 ≤ (a − b)² = a² − 2ab + b². It follows that ab ≤ ½a² + ½b². Applying this to |û_{k,m}| and |v̂_{k,m}|, we see that
$$|\hat u_{k,m}\hat v^*_{k,m}| = |\hat u_{k,m}|\,|\hat v^*_{k,m}| \le \tfrac12|\hat u_{k,m}|^2 + \tfrac12|\hat v_{k,m}|^2.$$
Thus, using part (a),
$$|\langle u, v\rangle| \le T\sum_{k,m}|\hat u_{k,m}\hat v^*_{k,m}| \le \frac T2\sum_{k,m}|\hat u_{k,m}|^2 + \frac T2\sum_{k,m}|\hat v_{k,m}|^2.$$
Since u and v are L2, the latter sums above are finite, so |⟨u, v⟩| is also finite.
(c) It is necessary for inner products in an inner-product space to be finite since, by definition 
of a complex inner-product space, the inner product must be a complex number, and the set 
of complex numbers (just like the set of real numbers) does not include ∞. This seems like a
technicality, but it is central to the special properties held by finite energy functions. 
Exercise 5.9: 
(a) For V to be a vector subspace, it is necessary for v = 0 to be an element of V, and this 
is only possible in the special case where ku1k = ku2k. Even in this case, however, V is not a 
vector space. This will be shown at the end of part (b). It will be seen in studying detection in 
Chapter 8 that V is an important set of vectors, subspace or not. 
(b) V can be rewritten as V = {v : ‖v − u1‖² = ‖v − u2‖²}. Expanding these energy differences for k = 1, 2,
$$\|v - u_k\|^2 = \|v\|^2 - \langle v, u_k\rangle - \langle u_k, v\rangle + \|u_k\|^2 = \|v\|^2 + \|u_k\|^2 - 2\Re(\langle v, u_k\rangle).$$
It follows that v ∈ V if and only if
$$\|v\|^2 + \|u_1\|^2 - 2\Re(\langle v, u_1\rangle) = \|v\|^2 + \|u_2\|^2 - 2\Re(\langle v, u_2\rangle).$$
Rearranging terms, v ∈ V if and only if
$$\Re(\langle v,\, u_2 - u_1\rangle) = \frac{\|u_2\|^2 - \|u_1\|^2}{2}. \qquad (19)$$
Now to complete part (a), assume ‖u2‖² = ‖u1‖² (which is necessary for V to be a vector space) and assume u1 ≠ u2 to avoid the trivial case where V is all of L2. Now let v = i(u2 − u1). Then ⟨v, u2 − u1⟩ is pure imaginary so that v ∈ V. But iv is not in V since ⟨iv, u2 − u1⟩ = −‖u2 − u1‖² ≠ 0. In a vector subspace, multiplication by a scalar (in this case i) yields another element of the subspace, so V is not a subspace except in the trivial case where u1 = u2.
(c) Substituting (u1 + u2)/2 for v, we see that ‖v − u1‖ = ‖(u2 − u1)/2‖ and ‖v − u2‖ = ‖(u1 − u2)/2‖, so ‖v − u1‖ = ‖v − u2‖ and consequently v ∈ V.
(d) The geometric situation is clearer if the underlying class of functions is the class of real L2 functions. In that case V is a subspace whenever ‖u1‖ = ‖u2‖. If ‖u1‖ ≠ ‖u2‖, then V is a hyperplane. In general, a hyperplane H is defined in terms of a vector u and a subspace S as H = {v : v = u + s for some s ∈ S}. In R², a hyperplane is a straight line, not necessarily through the origin, and in R³, a hyperplane is either a plane or a line, neither necessarily including the origin. For complex L2, V is not a hyperplane. Part of the reason for this exercise is to see that real L2 and complex L2, while similar in many aspects, are very different in other aspects, especially those involving vector subspaces.
Exercise 5.12: 
(a) To show that S⊥ is a subspace of V, we need to show that for any v1, v2 ∈ S⊥, αv1+βv2 ∈ S⊥ 
for all scalars α, β. If v1, v2 ∈ S⊥, then for all w ∈ S, ⟨αv1 + βv2, w⟩ = α⟨v1, w⟩ + β⟨v2, w⟩ = 0 + 0. Thus αv1 + βv2 ∈ S⊥ and S⊥ is a subspace of V.
(b) By the Projection Theorem, for any u ∈ V, there is a unique vector u_{|S} ∈ S such that ⟨u − u_{|S}, s⟩ = 0 for all s ∈ S. So u_{⊥S} = u − u_{|S} ∈ S⊥ and we have a unique decomposition of u into u = u_{|S} + u_{⊥S}.
(c) Let n_V and n_S (where n_S < n_V) denote the dimensions of V and S respectively. Start with a set of n_V independent vectors s1, s2, . . . , s_{n_V} ∈ V, chosen so that the first n_S of them, i.e., s1, s2, . . . , s_{n_S}, are in S. The first n_S orthonormal vectors obtained by the Gram-Schmidt procedure will be a basis for S. The next n_V − n_S orthonormal vectors obtained by the procedure will be a basis for S⊥.
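A small numerical sketch of part (c) (illustrative only; it uses numpy's QR factorization in place of hand Gram-Schmidt, and the vectors below are made-up examples in R⁴ with S spanned by the first two):

```python
import numpy as np

s1, s2 = np.array([1., 1., 0., 0.]), np.array([1., 0., 1., 0.])   # a basis of S (n_S = 2)
s3, s4 = np.array([0., 0., 0., 1.]), np.array([1., 2., 3., 4.])   # completed to an independent set

A = np.column_stack([s1, s2, s3, s4])
Q, _ = np.linalg.qr(A)                    # QR orthonormalizes the columns in order (Gram-Schmidt)
basis_S, basis_Sperp = Q[:, :2], Q[:, 2:]  # first n_S columns span S, the rest span S-perp

# every vector of basis_Sperp is orthogonal to everything in S:
print(np.round(basis_S.T @ basis_Sperp, 10))   # 2x2 block of zeros
```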
Exercise 5.14: 
(a) Assume throughout this part that m, n are positive integers, m > n. We will show, as case 1, that if the left end, a_m, of the pulse g_m(t) satisfies a_m < a_n, then a_m + 2^{−m−1} < a_n, i.e., the pulses do not overlap at all. As case 2, we will show that if a_m ∈ (a_n, a_n + 2^{−n−1}), then a_m + 2^{−m−1} ∈ [a_n, a_n + 2^{−n−1}], i.e., the pulses overlap completely.
Case 1: Let d_m be the denominator of the rational number a_m (in reduced form). Thus (since a_n d_n and a_m d_m are integers), it follows that if a_m < a_n, then also a_m + 1/(d_n d_m) ≤ a_n. Since d_n ≤ d_m ≤ m for m ≥ 3, we have a_m + 1/m² ≤ a_n for m ≥ 3. Since 1/m² > 2^{−m−1} for m ≥ 3, it follows that a_m + 2^{−m−1} ≤ a_n. Thus, if a_m < a_n, g_m and g_n do not overlap for any m > 3.
Since g2 does not overlap g1 by inspection, there can be no partial overlap for any am < an. 
Case 2: Apologies! This is very tedious. Assume that a_m ∈ (a_n, a_n + 2^{−n−1}). By the same argument as above,
$$a_m \ge a_n + \frac{1}{d_n d_m} \qquad\text{and}\qquad a_m + \frac{1}{d_m d'_n} \le a_n + 2^{-n-1}, \qquad (20)$$
where d'_n is the denominator of a_n + 2^{−n−1}. Combining these inequalities,
$$\frac{1}{d_n d_m} < 2^{-n-1}. \qquad (21)$$
We now separate case 2 into three subcases. First, from inspection of Figure 5.3 in the text, there are no partial overlaps for m < 8. Next consider m ≥ 8 and n ≤ 4. From the right side of (20), there can be no partial overlap if
$$2^{-m-1} \le \frac{1}{d_m d'_n} \qquad\text{(condition for no partial overlap)}. \qquad (22)$$
From direct evaluation, we see that d'_n ≤ 48 for n ≤ 4. Now d_m 2^{−m−1} is 5/512 for m = 8 and is decreasing for m ≥ 8. Since 5/512 < 1/48, there is no partial overlap for n ≤ 4, m ≥ 8.
Next we consider the general case where n ≥ 5. From (21), we now derive a general condition on how small m can be for m, n pairs that satisfy the conditions of case 2. Since m ≥ d_m for m ≥ 3, we have
$$m > \frac{2^{n+1}}{d_n}. \qquad (23)$$
For n ≥ 5, 2^{n+1}/d_n ≥ 2n + 2, so the general case reduces to n ≥ 5 and m ≥ 2n + 2.
Next consider the condition for no partial overlap in (22). Since d'_n ≤ 2^{n+1} d_n ≤ 2^{n+1} n and d_m ≤ m, the following condition also implies no partial overlap:
$$m\,2^{-m-1} \le \frac{2^{-n-1}}{n}. \qquad (24)$$
The left side of (24) is decreasing in m, so if we can establish (24) for m = 2n+2, it is established for all m ≥ 2n+2. The left side, for m = 2n+2, is (2n+2)2^{−2n−3}. Thus all that remains is to show that (2n + 2)n ≤ 2^{n+2}. This, however, is obvious for n ≥ 5.
Exercise 5.15: Using the same notation as in the proof of Theorem 4.5.1,
$$u^{(n)}(t) = \sum_{m=-n}^{n}\sum_{k=-n}^{n}\hat u_{k,m}\theta_{k,m}(t), \qquad \hat u^{(n)}(f) = \sum_{m=-n}^{n}\sum_{k=-n}^{n}\hat u_{k,m}\psi_{k,m}(f).$$
Since ψ_{k,m}(f) is the Fourier transform of θ_{k,m}(t) for each k, m, the coefficients û_{k,m} are the same in each expansion. In the same way,
$$v^{(n)}(t) = \sum_{m=-n}^{n}\sum_{k=-n}^{n}\hat v_{k,m}\theta_{k,m}(t), \qquad \hat v^{(n)}(f) = \sum_{m=-n}^{n}\sum_{k=-n}^{n}\hat v_{k,m}\psi_{k,m}(f).$$
It is elementary, using the orthonormality of the θ_{k,m} and the orthonormality of the ψ_{k,m}, to see that for all n > 0,
$$\langle u^{(n)}, v^{(n)}\rangle = \sum_{m=-n}^{n}\sum_{k=-n}^{n}\hat u_{k,m}\hat v^*_{k,m} = \langle \hat u^{(n)}, \hat v^{(n)}\rangle. \qquad (25)$$
Thus our problem is to show that this same relationship holds in the limit n → ∞. We know (from Theorem 4.5.1) that l.i.m._{n→∞} u^{(n)} = u, with the corresponding limits for v^{(n)}, û^{(n)}, and v̂^{(n)}. Using the Schwarz inequality on the second line below, and Bessel's inequality on the third,
$$|\langle u^{(n)}, v\rangle - \langle u^{(n)}, v^{(n)}\rangle| = |\langle u^{(n)},\, v - v^{(n)}\rangle| \le \|u^{(n)}\|\,\|v - v^{(n)}\| \le \|u\|\,\|v - v^{(n)}\|.$$
Since lim_{n→∞} ‖v − v^{(n)}‖ = 0, we see that lim_{n→∞} |⟨u^{(n)}, v⟩ − ⟨u^{(n)}, v^{(n)}⟩| = 0. In the same way, lim_{n→∞} |⟨u^{(n)}, v⟩ − ⟨u, v⟩| = 0. Combining these limits, and going through the same operations on the transform side,
$$\lim_{n\to\infty}\langle u^{(n)}, v^{(n)}\rangle = \langle u, v\rangle, \qquad \lim_{n\to\infty}\langle \hat u^{(n)}, \hat v^{(n)}\rangle = \langle \hat u, \hat v\rangle. \qquad (26)$$
Combining (25) and (26), we get Parseval's relation for L2 functions, ⟨u, v⟩ = ⟨û, v̂⟩.
Exercise 5.16: 
(a) Colloquially, lim_{|f|→∞} û(f)|f|^{1+ε} = 0 means that |û(f)| |f|^{1+ε} becomes and stays increasingly small as |f| becomes large. More technically, it means that for any δ > 0, there is an A(δ) such that |û(f)| |f|^{1+ε} ≤ δ for all f such that |f| ≥ A(δ). Choosing δ = 1 and A = A(1), we see that |û(f)| ≤ |f|^{−1−ε} for |f| ≥ A.
(b)
$$\int_{-\infty}^{\infty}|\hat u(f)|\,df = \int_{|f|>A}|\hat u(f)|\,df + \int_{|f|\le A}|\hat u(f)|\,df \le 2\int_A^{\infty}f^{-1-\epsilon}\,df + \int_{-A}^{A}|\hat u(f)|\,df = \frac{2A^{-\epsilon}}{\epsilon} + \int_{-A}^{A}|\hat u(f)|\,df.$$
Since û(f) is L2, its truncated version to [−A, A] is also L1, so the second integral is finite, showing that û(f) (untruncated) is also L1. In other words, one role of the ε above is to make û(f) decrease quickly enough with increasing f to maintain the L1 property.
(c) Recall that ŝ^{(n)}(f) = Σ_{|m|≤n} ŝ_m(f) where ŝ_m(f) = û(f − m)rect(f). Assuming A to be an integer and m' > A, |ŝ_{m'}(f)| ≤ (m' − 1)^{−1−ε}. Thus for f ∈ (−1/2, 1/2],
$$|\hat s^{(n)}(f)| \le \Big|\sum_{|m|\le A}\hat u(f-m)\Big| + \sum_{|m'|>A}(|m'|-1)^{-1-\epsilon} = \Big|\sum_{|m|\le A}\hat u(f-m)\Big| + \sum_{m\ge A}2m^{-1-\epsilon}. \qquad (27)$$
The factor of 2 above was omitted by error from the exercise statement. Note that since the final sum converges, this is independent of n and is thus an upper bound on |ŝ(f)|. Now visualize the 2A + 1 terms in the first sum above as a vector, say a. Let 1 be the vector of 2A + 1 ones, so that ⟨a, 1⟩ = Σ_k a_k. Applying the Schwarz inequality to this, |Σ_k a_k| ≤ ‖a‖ ‖1‖. Substituting this into (27),
$$|\hat s(f)| \le \sqrt{(2A+1)\sum_{|m|\le A}|\hat u(f+m)|^2} + \sum_{m\ge A}2m^{-1-\epsilon}. \qquad (28)$$
(d) Note that for any complex numbers a and b, |a + b|² ≤ |a + b|² + |a − b|² = 2|a|² + 2|b|². Applying this to (28),
$$|\hat s(f)|^2 \le (4A+2)\sum_{|m|\le A}|\hat u(f+m)|^2 + 2\Big(\sum_{m\ge A}2m^{-1-\epsilon}\Big)^2.$$
Since ŝ(f) is nonzero only in [−1/2, 1/2], we can demonstrate that ŝ(f) is L2 by showing that the integral of |ŝ(f)|² over [−1/2, 1/2] is finite. The integral of the first term above is 4A + 2 times the integral of |û(f)|² from −A − 1/2 to A + 1/2 and is finite since û(f) is L2. The integral of the second term is simply the second term itself, which is finite.
Chapter 6
Exercise 6.1: Let Uk be a standard M-PAM random variable where the M points each have probability 1/M. Consider the analogy with a uniform M-level quantizer used on a uniformly distributed rv U over the interval [−Md/2, Md/2].

(Figure: the M = 6 PAM constellation; the points a1, . . . , a6 are spaced d apart and the corresponding quantization regions R1, . . . , R6 each have width d.)
Let Q be the quantization error for the quantizer and U_k be the quantization point. Thus U = U_k + Q. Observe that for each quantization point the quantization error is uniformly distributed over [−d/2, d/2]. This means that Q is zero mean and statistically independent of the quantization point U_k. It follows that
$$E[U^2] = E[(Q+U_k)^2] = E[U_k^2] + E[Q^2] = E[U_k^2] + \frac{d^2}{12}.$$
On the other hand, since U is uniformly distributed, E[U²] = (dM)²/12. It then follows that
$$E[U_k^2] = \frac{d^2(M^2-1)}{12}.$$
Verifying the formula for M = 4:
$$E_S = \frac{2\big[(\tfrac d2)^2 + (\tfrac{3d}2)^2\big]}{4} = \frac54 d^2, \qquad \frac{d^2(M^2-1)}{12} = \frac54 d^2.$$
Verifying the formula for M = 8:
$$E_S = \frac{2\big[(\tfrac d2)^2 + (\tfrac{3d}2)^2 + (\tfrac{5d}2)^2 + (\tfrac{7d}2)^2\big]}{8} = \frac{21}4 d^2, \qquad \frac{d^2(M^2-1)}{12} = \frac{21}4 d^2.$$
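A brute-force check of E[U_k²] = d²(M²−1)/12 (an illustrative sketch only):

```python
import numpy as np

def pam_energy(M, d=1.0):
    # standard M-PAM points: +-d/2, +-3d/2, ..., +-(M-1)d/2, each with probability 1/M
    points = d * (np.arange(M) - (M - 1) / 2.0)
    return float(np.mean(points ** 2))

for M in [2, 4, 8, 16]:
    print(M, pam_energy(M), (M ** 2 - 1) / 12.0)   # the two values agree for every M
```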
Exercise 6.3: 
(a) Since the received signal is decoded to the closest PAM signal, the intervals decoded to each 
signal are indicated below. 
(Figure: the 4-PAM constellation; the signal points a1, a2, a3, a4 are at −3d/2, −d/2, d/2, 3d/2, and the decision regions R1, . . . , R4 are separated by thresholds at −d, 0, d.)

Thus if U_k = a1 is transmitted, an error occurs if Z_k ≥ d/2. The probability of this is Q(d/2) where
$$Q(x) = \int_x^{\infty}\frac{1}{\sqrt{2\pi}}\exp(-z^2/2)\,dz.$$
If Uk = a2 is transmitted, an error occurs if either Zk ≥ d/2 or Zk < −d/2, so, using the 
symmetry of the Gaussian density, the probability of an error in this case is 2Q(d/2). In the 
same way, the error probability is 2Q(d/2) for a3 and Q(d/2) for a4. Thus the overall error 
probability is (3/2)Q(d/2). 
(b) Now suppose the third point is moved to d/2 + ε. This moves the decision boundary between R3 and R4 by ε/2 and similarly moves the decision boundary between R2 and R3 by ε/2. The error probability then becomes
$$P_e(\epsilon) = \frac12\left[Q\Big(\frac d2\Big) + Q\Big(\frac{d+\epsilon}{2}\Big) + Q\Big(\frac{d-\epsilon}{2}\Big)\right],$$
$$\frac{dP_e(\epsilon)}{d\epsilon} = \frac14\left[\frac{1}{\sqrt{2\pi}}\exp\Big(\!-\frac{(d-\epsilon)^2}{8}\Big) - \frac{1}{\sqrt{2\pi}}\exp\Big(\!-\frac{(d+\epsilon)^2}{8}\Big)\right].$$
This is equal to 0 at ε = 0, as can be seen by symmetry without actually taking the derivative.
(c) With the third signal point at d/2 + ε, the signal energy is
$$E_S = \frac14\left[\Big(\frac d2\Big)^2 + \Big(\frac{d+\epsilon}{2}\Big)^2 + 2\Big(\frac{3d}{2}\Big)^2\right].$$
The derivative of this with respect to ε is (d + ε)/8.
(d) This means that to first order in ≤, the energy can be reduced by reducing a3 without 
changing Pe. Thus moving the two inner points slightly inward provides better energy efficiency 
for 4-PAM. This is quite counter-intuitive. The difference between optimizing the points in 
4-PAM and using standard PAM is not very significant, however. At 10 dB signal to noise 
ratio, the optimal placement of points (which requires considerably more computation) makes 
the ratio of outer points to inner points 3.15 instead of 3, but it reduces error probability by 
less than 1%. 
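The effect described in parts (b)-(d) can be checked numerically. The sketch below is illustrative only; it takes the noise variance to be 1 (so the error probabilities are the Q(·) values used in part (a)) and d = 2 as an arbitrary example value:

```python
from math import erfc, sqrt

Q = lambda x: 0.5 * erfc(x / sqrt(2.0))   # Gaussian tail for unit-variance noise

def Pe(d, eps):
    # error probability with the third point moved to d/2 + eps, from part (b)
    return 0.5 * (Q(d / 2) + Q((d + eps) / 2) + Q((d - eps) / 2))

def Es(d, eps):
    # average signal energy with the third point moved to d/2 + eps, from part (c)
    return 0.25 * ((d / 2) ** 2 + ((d + eps) / 2) ** 2 + 2 * (1.5 * d) ** 2)

d = 2.0
for eps in [-0.05, 0.0, 0.05]:
    print(eps, round(Pe(d, eps), 6), round(Es(d, eps), 4))
# Pe is flat (to first order) at eps = 0 while Es decreases for eps < 0:
# nudging the inner point inward saves energy at essentially no cost in error probability.
```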
Exercise 6.4: 
(a) If for each j,
$$\int_{-\infty}^{\infty}u(t)d_j(t)\,dt = \int_{-\infty}^{\infty}\sum_{k=1}^{\infty}u_k\,p(t-kT)\,d_j(t)\,dt = \sum_{k=1}^{\infty}u_k\int_{-\infty}^{\infty}p(t-kT)\,d_j(t)\,dt = u_j,$$
then it must be that ∫_{−∞}^{∞} p(t−kT)d_j(t) dt = ⟨p(t − kT), d_j(t)⟩ has the value one for k = j and the value zero for all k ≠ j. That is, d_j(t) must be orthogonal to p(t − kT) for all k ≠ j.
(b) Since ⟨p(t − kT), d_0(t)⟩ = 1 for k = 0 and equals zero for k ≠ 0, it follows by shifting each function by jT that ⟨p(t − (k − j)T), d_0(t)⟩ equals 1 for j = k and 0 for j ≠ k. It follows that d_j(t) = d_0(t − jT).
(c) In this exercise, to avoid ISI (intersymbol interference), we pass u(t) through a bank of filters d_0(−t), d_1(−t), . . . , d_j(−t), . . . , and the output of each filter at time t = 0 is u_0, u_1, . . . , u_j , . . . respectively. To see this, note that the output of the j-th filter in the filter bank is
$$r_j(t) = \sum_{k=1}^{\infty}u_k\int_{-\infty}^{\infty}p(\tau-kT)\,d_j(-t+\tau)\,d\tau.$$
At time t = 0,
$$r_j(0) = \sum_{k=1}^{\infty}u_k\int_{-\infty}^{\infty}p(\tau-kT)\,d_j(\tau)\,d\tau = u_j.$$
Thus, for every j, to retrieve u_j from u(t), we filter u(t) through d_j(−t) and look at the output at t = 0.
However, from part (b), d_j(t) = d_0(t − jT) (the j-th filter is just the first filter delayed by jT). Rather than processing in parallel through a filter bank and looking at the value at t = 0, we can process serially by filtering u(t) through d_0(−t) and looking at the output every T. To verify this, note that the output after filtering u(t) through d_0(−t) is
$$r(t) = \sum_{k=1}^{\infty}u_k\int_{-\infty}^{\infty}p(\tau-kT)\,d_0(-t+\tau)\,d\tau,$$
and so for every j,
$$r(jT) = \sum_{k=1}^{\infty}u_k\int_{-\infty}^{\infty}p(\tau-kT)\,d_0(\tau-jT)\,d\tau = u_j.$$
Filtering the received signal through d_0(−t) and looking at the values at jT for every j is the same operation as filtering the signal through q(t) and then sampling at jT. Thus, q(t) = d_0(−t).
Exercise 6.6: 
(a) g(t) must be ideal Nyquist, i.e., g(0) = 1 and g(kT) = 0 for all non-zero integer k. The 
existence of the channel filter does not change the requirement for the overall cascade of filters. 
The Nyquist criterion is given in the previous problem as Eq. (??). 
(b) It is possible, as shown below. There is no ISI if the Nyquist criterion Σ_m ĝ(f + 2m) = 1/2 for |f| ≤ 1 is satisfied. Since ĝ(f) = p̂(f)ĥ(f)q̂(f), we know that ĝ(f) is zero wherever ĥ(f) = 0. In particular, ĝ(f) must be 0 for |f| > 5/4 (and thus for f ≥ 2). Thus we can use the band-edge symmetry condition, ĝ(f) + ĝ(2 − f) = 1/2 over 0 ≤ f ≤ 1. Since ĝ(f) = 0 for 3/4 < f ≤ 1, it is necessary that ĝ(f) = 1/2 for 1 < f ≤ 5/4. Similarly, since ĝ(f) = 0 for f > 5/4, we must satisfy ĝ(f) = 1/2 for |f| < 3/4. Thus, to satisfy the Nyquist criterion, ĝ(f) is uniquely specified as below.

(Sketch: ĝ(f) = 1/2 for |f| ≤ 3/4 and for 1 < |f| ≤ 5/4, and ĝ(f) = 0 elsewhere.)

In the regions where ĝ(f) = 1/2, we must choose q̂(f) = 1/[2p̂(f)ĥ(f)]. Elsewhere ĝ(f) = 0 because ĥ(f) = 0, and thus q̂(f) is arbitrary. More specifically, we must choose q̂(f) to satisfy
$$\hat q(f) = \begin{cases} 0.5, & |f| \le 0.5;\\[2pt] \dfrac{1}{3-2|f|}, & 0.5 < |f| \le 0.75;\\[2pt] \dfrac{1}{3-2|f|}, & 1 \le |f| \le 5/4. \end{cases}$$
It makes no difference what q̂(f) is elsewhere as it will be multiplied by zero there.
(c) Since ĥ(f) = 0 for f > 3/4, it is necessary that ĝ(f) = 0 for |f| > 3/4. Thus, for all integers m, ĝ(f + 2m) is 0 for 3/4 < f < 1 and the Nyquist criterion cannot be met.
(d) If for some frequency f, p̂(f)ĥ(f) ≠ 0, it is possible for ĝ(f) to have an arbitrary value by choosing q̂(f) appropriately. On the other hand, if p̂(f)ĥ(f) = 0 for some f, then ĝ(f) = 0. Thus, to avoid ISI, it is necessary that for each 0 ≤ f ≤ 1/(2T), there is some integer m such that ĥ(f + m/T)p̂(f + m/T) ≠ 0. Equivalently, it is necessary that Σ_m |ĥ(f + m/T)p̂(f + m/T)| ≠ 0 for all f.
There is one peculiarity here that you were not expected to deal with. If ˆp(f)ˆh 
(f) goes through 
zero at f0 with some given slope, and that is the only f that can be used to satisfy the Nyquist 
criterion, then even if we ignore the point f0, the response ˆq(f) would approach infinity fast 
enough in the vicinity of f0 that ˆq(f) would not be L2. 
This overall problem shows that under ordinary conditions (i.e., non-zero filter transforms), there is no problem in choosing q̂(f) to avoid intersymbol interference. Later, when noise is taken into account, it will be seen that it is undesirable for q̂(f) to be very large where p̂(f) is small, since this amplifies the noise in frequency regions where there is very little signal.
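As a numerical sanity check of part (b) (an illustration only, not part of the solution): the ĝ(f) constructed above — 1/2 for |f| ≤ 3/4 and for 1 < |f| ≤ 5/4, zero elsewhere — should be ideal Nyquist for T = 1/2, i.e. its inverse transform should satisfy g(0) = 1 and g(kT) = 0 for nonzero integers k.

```python
import numpy as np

def g(t):
    # inverse transform of the piecewise-constant ghat above (ghat is real and even)
    f1 = np.linspace(0.0, 0.75, 20001)
    f2 = np.linspace(1.0, 1.25, 20001)
    return 2 * np.trapz(0.5 * np.cos(2 * np.pi * f1 * t), f1) \
         + 2 * np.trapz(0.5 * np.cos(2 * np.pi * f2 * t), f2)

T = 0.5
print([round(g(k * T), 6) for k in range(0, 6)])   # approximately [1.0, 0, 0, 0, 0, 0]
```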
Exercise 6.8: 
(a) With α = 1, the flat part of ĝ(f) disappears. Using T = 1 and the familiar formula cos²x = (1 + cos 2x)/2, ĝ_1(f) becomes
$$\hat g_1(f) = \frac12\left[1 + \cos\Big(\frac{\pi f}{2}\Big)\right]\mathrm{rect}\Big(\frac f2\Big).$$
Writing cos x = (1/2)[e^{ix} + e^{−ix}] and using the frequency shift rule for Fourier transforms, we get
$$g_1(t) = \mathrm{sinc}(2t) + \frac12\,\mathrm{sinc}(2t+1) + \frac12\,\mathrm{sinc}(2t-1)$$
$$= \frac{\sin(2\pi t)}{2\pi t} + \frac12\frac{\sin(\pi(2t+1))}{\pi(2t+1)} + \frac12\frac{\sin(\pi(2t-1))}{\pi(2t-1)} = \frac{\sin(2\pi t)}{2\pi t} - \frac12\frac{\sin(2\pi t)}{\pi(2t+1)} - \frac12\frac{\sin(2\pi t)}{\pi(2t-1)}$$
$$= \frac{\sin(2\pi t)}{2\pi}\left[\frac1t - \frac{1}{2t+1} - \frac{1}{2t-1}\right] = \frac{\sin(2\pi t)}{2\pi t(1-4t^2)} = \frac{\sin(\pi t)\cos(\pi t)}{\pi t(1-4t^2)} = \frac{\mathrm{sinc}(t)\cos(\pi t)}{1-4t^2}.$$
This agrees with (6.18) in the text for α = 1, T = 1. Note that the denominator is 0 at t = ±0.5. The numerator is also 0, and it can be seen from the first equation above that the limiting value as t → ±0.5 is 1/2. Note also that this approaches 0 with increasing t as 1/t³, much faster than sinc(t).
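A quick numerical confirmation of the closed form (illustrative only): it agrees with the three-sinc expression above, and the limiting value at t = ±1/2 is 1/2.

```python
import numpy as np

def g1_sincs(t):
    return np.sinc(2 * t) + 0.5 * np.sinc(2 * t + 1) + 0.5 * np.sinc(2 * t - 1)

def g1_closed(t):
    return np.sinc(t) * np.cos(np.pi * t) / (1 - 4 * t ** 2)

t = np.linspace(-3, 3, 1201)
t = t[np.abs(np.abs(t) - 0.5) > 1e-9]              # avoid the removable singularity at +-1/2
print(np.max(np.abs(g1_sincs(t) - g1_closed(t))))  # ~ 0: the two forms agree
print(g1_sincs(np.array([0.5, -0.5])))             # limiting value 1/2 at t = +-1/2
```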
(b) It is necessary to use the result of Exercise 6.6 here. As shown there, the inverse transform of a real symmetric waveform ĝ_α(f) that satisfies the Nyquist criterion for T = 1 and has a rolloff of α ≤ 1 is equal to sinc(t)v(t). Here v(t) is lowpass limited to α/2 and its transform v̂(f) is given by the following:
$$\hat v(f + 1/2) = \frac{d\hat g(f)}{df} \qquad\text{for } -\frac{1+\alpha}{2} < f < -\frac{1-\alpha}{2}.$$
That is, we take the derivative of the leading edge of ĝ(f), from −(1 + α)/2 to −(1 − α)/2, and shift by 1/2 to get v̂(f). Using the middle expression in (6.17) of the text, and using the fact that cos²(x) = (1 + cos 2x)/2,
$$\hat v(f + 1/2) = \frac12\frac{d}{df}\left[1 + \cos\left(\frac{\pi(-f - (1-\alpha)/2)}{\alpha}\right)\right]$$
for f in the interval (−(1 + α)/2, −(1 − α)/2). Shifting by letting s = f + 1/2,
$$\hat v(s) = \frac12\frac{d}{ds}\cos\left[\frac{-\pi s}{\alpha} + \frac{\pi}{2}\right] = \frac12\frac{d}{ds}\sin\left[\frac{\pi s}{\alpha}\right] = \frac{\pi}{2\alpha}\cos\left[\frac{\pi s}{\alpha}\right]$$
for s ∈ (−α/2, α/2). Multiplying this by rect(s/α) gives us an expression for v̂(s) everywhere. Using cos x = (1/2)(e^{ix} + e^{−ix}) allows us to take the inverse transform of v̂(s), getting
$$v(t) = \frac{\pi}{4}\left[\mathrm{sinc}(\alpha t + 1/2) + \mathrm{sinc}(\alpha t - 1/2)\right] = \frac{\pi}{4}\left[\frac{\sin(\pi\alpha t + \pi/2)}{\pi\alpha t + \pi/2} + \frac{\sin(\pi\alpha t - \pi/2)}{\pi\alpha t - \pi/2}\right].$$
Using the identity sin(x + π/2) = cos x again, this becomes
$$v(t) = \frac14\left[\frac{\cos(\pi\alpha t)}{\alpha t + 1/2} - \frac{\cos(\pi\alpha t)}{\alpha t - 1/2}\right] = \frac{\cos(\pi\alpha t)}{1 - 4\alpha^2 t^2}.$$
Since g(t) = sinc(t/T)v(t), the above result for v(t) corresponds with (6.18) for T = 1.
(c) The result for arbitrary T follows simply by scaling. 
Exercise 6.9: 
(a) The figure is incorrectly drawn in the exercise statement and should be as follows: 
(Figure: the corrected ĝ_k(f): a central pulse together with 2k narrow side pulses of height 1/k placed outside |f| = 1/2, with the positions chosen so that, when folded modulo 1, the pulses on the right land on (−1/2, −1/4) and the pulses on the left land on (1/4, 1/2).)
In folding these pulses together to check the Nyquist criterion, note that each pulse on the positive side of the figure folds onto the interval from −1/2 to −1/4, and each pulse on the left folds onto 1/4 to 1/2. Since there are k of them, each of height 1/k, they add up to satisfy the Nyquist criterion.
(b) In the limit k → ∞, the height of each pulse goes to 0, so the pointwise limit is simply the
middle pulse. Since there are 2k pulses, each of energy 1/(4k2), the energy difference between 
that pointwise limit and ˆgk(f) is 1/(2k), which goes to 0 with k. Thus the pointwise limit and 
the L2 limit both converge to a function that does not satisfy the Nyquist criterion for T = 1 
and is not remotely close to a function satisfying the Nyquist condition. Note also that one 
could start with any central pulse and construct a similar example such that the limit satisfies 
the Nyquist criterion. 
Exercise 6.11: 
(a) Note that
$$x_k(t) = 2\Re\{\exp(2\pi i(f_k+f_c)t)\} = 2\cos[2\pi(f_k+f_c)t].$$
The cosine function is even, and thus x1(t) = x2(t) if f1 + fc = −f2 − fc. This is the only possibility for equality unless f1 = f2. Thus, the only f2 ≠ f1 for which x1(t) = x2(t) is f2 = −2fc − f1. Since f1 > −fc, this requires f2 < −fc, which is why this situation cannot arise when fk ∈ [−fc, fc) for each k.
(b) For any ˆu1(f), one can find a function ˆu2(f) by the transformation f2 = −2fc − f1 in 
(a). Thus without the knowledge that u1(t) is lowpass limited to some B < fc, the ambiguous 
frequency components in u1(t) cannot be differentiated from those of u2(t) by observing x(t). 
If u(t) is known only to be bandlimited to some band B greater than fc, then the frequencies between −B and B − 2fc are ambiguous.
An easy way to see the problem here is to visualize ˆu(f) both moved up by fc and down by fc. 
The bands overlap if B > fc and the overlapped portion can not be retrieved without additional 
knowledge about u(t). 
(c) The ambiguity is obvious by repeating the argument in (a). Now, since y(t) has some nonzero 
bandwidth, ambiguity might be possible in other ways also. We have already seen, however, 
that if u(t) has a bandwidth less than fc, then u(t) can be uniquely retrieved from x(t) in the 
absence of noise. 
(d) For u(t) real, x(t) = 2u(t) cos(2πfct), so u(t) can be retrieved by dividing x(t) by 2 cos(2πfct) 
except at those points of measure 0 where the cosine function is 0. This is not a reasonable 
approach, especially in the presence of noise, but it points out that the PAM case is essentially 
different from the QAM case 
(e) Since u∗(t) exp(2πifct) has energy at positive frequencies, the use of a Hilbert filter does 
not have an output equal to u(t) exp(2πifct), and thus u(t) does not result from shifting this 
output down by fc. In the same way, the bands at 2fc and −2fc that result from DSB-QC 
demodulation mix with those at 0 frequency, so cannot be removed by an ordinary LTI filter. 
For QAM, this problem is to be expected since u(t) cannot be uniquely generated by any means 
at all. 
For PAM it is surprising, since it says that these methods are not general. Since all time-limited 
waveforms are unbounded in frequency, it says that there is a fundamental theoretical problem 
with the standard methods of demodulation. This is not a problem in practice, since fc is usually 
so much larger than the nominal bandwidth of u(t) that this problem is of no significance. 
Exercise 6.13: 
(a) Since u(t) is real, φ1(t) = ℜ{u(t)e^{2πif_ct}} = u(t)cos(2πf_ct), and since v(t) is pure imaginary, φ2(t) = ℜ{v(t)e^{2πif_ct}} = [iv(t)]sin(2πf_ct). Note that [iv(t)] is real. Thus we must show that
$$\int u(t)\cos(2\pi f_ct)\,[iv(t)]\sin(2\pi f_ct)\,dt = \frac12\int u(t)[iv(t)]\sin(4\pi f_ct)\,dt = 0.$$
Since u(t) and v(t) are lowpass limited to B/2, their product (which corresponds to convolution in the frequency domain) is lowpass limited to B < 2f_c. Rewriting the sin(4πf_ct) above in terms of complex exponentials, and recognizing the resulting integral as the Fourier transform of u(t)[iv(t)] at ±2f_c, we see that the above integral is indeed zero.
(b) Almost anything works here, and a simple choice is u(t) = [iv(t)] = rect(8fct − 1/2). 
Exercise 6.15: 
(a)
$$\int_{-\infty}^{\infty}\sqrt2\,p(t-jT)\cos(2\pi f_ct)\,\sqrt2\,p^*(t-kT)\cos(2\pi f_ct)\,dt = \int_{-\infty}^{\infty}p(t-jT)p^*(t-kT)\,[1+\cos(4\pi f_ct)]\,dt$$
$$= \int_{-\infty}^{\infty}p(t-jT)p^*(t-kT)\,dt + \int_{-\infty}^{\infty}p(t-jT)p^*(t-kT)\cos(4\pi f_ct)\,dt = \delta_{jk} + \frac12\int_{-\infty}^{\infty}p(t-jT)p^*(t-kT)\Big[e^{4\pi if_ct} + e^{-4\pi if_ct}\Big]\,dt.$$
The remaining task is to show that the integral above is 0. Let g_{jk}(t) = p(t − jT)p^*(t − kT). Note that ĝ_{jk}(f) is the convolution of the transform of p(t − jT) and that of p^*(t − kT). Since p is lowpass limited to f_c, g_{jk} is lowpass limited to 2f_c, and thus the integral (which calculates the Fourier transform of g_{jk} at 2f_c and −2f_c) is zero.
Prin digcommselectedsoln
Prin digcommselectedsoln
Prin digcommselectedsoln
Prin digcommselectedsoln
Prin digcommselectedsoln
Prin digcommselectedsoln
Prin digcommselectedsoln
Prin digcommselectedsoln
Prin digcommselectedsoln
Prin digcommselectedsoln
Prin digcommselectedsoln
Prin digcommselectedsoln
Prin digcommselectedsoln
Prin digcommselectedsoln
Prin digcommselectedsoln
Prin digcommselectedsoln
Prin digcommselectedsoln
Prin digcommselectedsoln
Prin digcommselectedsoln
Prin digcommselectedsoln
Prin digcommselectedsoln
Prin digcommselectedsoln
Prin digcommselectedsoln
Prin digcommselectedsoln
Prin digcommselectedsoln
Prin digcommselectedsoln
Prin digcommselectedsoln
Prin digcommselectedsoln

More Related Content

What's hot

Longest Common Subsequence
Longest Common SubsequenceLongest Common Subsequence
Longest Common Subsequence
Swati Swati
 
Calculus Assignment Help
Calculus Assignment HelpCalculus Assignment Help
Calculus Assignment Help
Maths Assignment Help
 
Third-kind Chebyshev Polynomials Vr(x) in Collocation Methods of Solving Boun...
Third-kind Chebyshev Polynomials Vr(x) in Collocation Methods of Solving Boun...Third-kind Chebyshev Polynomials Vr(x) in Collocation Methods of Solving Boun...
Third-kind Chebyshev Polynomials Vr(x) in Collocation Methods of Solving Boun...
IOSR Journals
 
Continuity and Uniform Continuity
Continuity and Uniform ContinuityContinuity and Uniform Continuity
Continuity and Uniform Continuity
DEVTYPE
 
Higher order derivatives for N -body simulations
Higher order derivatives for N -body simulationsHigher order derivatives for N -body simulations
Higher order derivatives for N -body simulations
Keigo Nitadori
 
Mcq differential and ordinary differential equation
Mcq differential and ordinary differential equationMcq differential and ordinary differential equation
Mcq differential and ordinary differential equation
Sayyad Shafi
 
maths
maths maths
maths
sidpatel143
 
The 2 Goldbach's Conjectures with Proof
The 2 Goldbach's Conjectures with Proof The 2 Goldbach's Conjectures with Proof
The 2 Goldbach's Conjectures with Proof
nikos mantzakouras
 
Longest Common Subsequence
Longest Common SubsequenceLongest Common Subsequence
Longest Common Subsequence
Krishma Parekh
 
Differential Equations Homework Help
Differential Equations Homework HelpDifferential Equations Homework Help
Differential Equations Homework Help
Math Homework Solver
 
Hypothesis of Riemann's (Comprehensive Analysis)
 Hypothesis of Riemann's (Comprehensive Analysis) Hypothesis of Riemann's (Comprehensive Analysis)
Hypothesis of Riemann's (Comprehensive Analysis)
nikos mantzakouras
 
Unit i ppt (1)
Unit i   ppt (1)Unit i   ppt (1)
Unit i ppt (1)
SILVIYA jillu
 
Imc2016 day2-solutions
Imc2016 day2-solutionsImc2016 day2-solutions
Imc2016 day2-solutions
Christos Loizos
 
Permutations and Combinations IIT JEE+Olympiad Lecture 4
Permutations and Combinations IIT JEE+Olympiad Lecture 4Permutations and Combinations IIT JEE+Olympiad Lecture 4
Permutations and Combinations IIT JEE+Olympiad Lecture 4
Parth Nandedkar
 
Calculus AB - Slope of secant and tangent lines
Calculus AB - Slope of secant and tangent linesCalculus AB - Slope of secant and tangent lines
Calculus AB - Slope of secant and tangent linesKenyon Hundley
 
Conformal mapping
Conformal mappingConformal mapping
Dr. majeed &humam paper
Dr. majeed &humam paperDr. majeed &humam paper
Dr. majeed &humam paper
Alexander Decker
 
Chapter-4: More on Direct Proof and Proof by Contrapositive
Chapter-4: More on Direct Proof and Proof by ContrapositiveChapter-4: More on Direct Proof and Proof by Contrapositive
Chapter-4: More on Direct Proof and Proof by Contrapositive
nszakir
 

What's hot (19)

Longest Common Subsequence
Longest Common SubsequenceLongest Common Subsequence
Longest Common Subsequence
 
Calculus Assignment Help
Calculus Assignment HelpCalculus Assignment Help
Calculus Assignment Help
 
Third-kind Chebyshev Polynomials Vr(x) in Collocation Methods of Solving Boun...
Third-kind Chebyshev Polynomials Vr(x) in Collocation Methods of Solving Boun...Third-kind Chebyshev Polynomials Vr(x) in Collocation Methods of Solving Boun...
Third-kind Chebyshev Polynomials Vr(x) in Collocation Methods of Solving Boun...
 
Continuity and Uniform Continuity
Continuity and Uniform ContinuityContinuity and Uniform Continuity
Continuity and Uniform Continuity
 
Higher order derivatives for N -body simulations
Higher order derivatives for N -body simulationsHigher order derivatives for N -body simulations
Higher order derivatives for N -body simulations
 
Chapter 1 (maths 3)
Chapter 1 (maths 3)Chapter 1 (maths 3)
Chapter 1 (maths 3)
 
Mcq differential and ordinary differential equation
Mcq differential and ordinary differential equationMcq differential and ordinary differential equation
Mcq differential and ordinary differential equation
 
maths
maths maths
maths
 
The 2 Goldbach's Conjectures with Proof
The 2 Goldbach's Conjectures with Proof The 2 Goldbach's Conjectures with Proof
The 2 Goldbach's Conjectures with Proof
 
Longest Common Subsequence
Longest Common SubsequenceLongest Common Subsequence
Longest Common Subsequence
 
Differential Equations Homework Help
Differential Equations Homework HelpDifferential Equations Homework Help
Differential Equations Homework Help
 
Hypothesis of Riemann's (Comprehensive Analysis)
 Hypothesis of Riemann's (Comprehensive Analysis) Hypothesis of Riemann's (Comprehensive Analysis)
Hypothesis of Riemann's (Comprehensive Analysis)
 
Unit i ppt (1)
Unit i   ppt (1)Unit i   ppt (1)
Unit i ppt (1)
 
Imc2016 day2-solutions
Imc2016 day2-solutionsImc2016 day2-solutions
Imc2016 day2-solutions
 
Permutations and Combinations IIT JEE+Olympiad Lecture 4
Permutations and Combinations IIT JEE+Olympiad Lecture 4Permutations and Combinations IIT JEE+Olympiad Lecture 4
Permutations and Combinations IIT JEE+Olympiad Lecture 4
 
Calculus AB - Slope of secant and tangent lines
Calculus AB - Slope of secant and tangent linesCalculus AB - Slope of secant and tangent lines
Calculus AB - Slope of secant and tangent lines
 
Conformal mapping
Conformal mappingConformal mapping
Conformal mapping
 
Dr. majeed &humam paper
Dr. majeed &humam paperDr. majeed &humam paper
Dr. majeed &humam paper
 
Chapter-4: More on Direct Proof and Proof by Contrapositive
Chapter-4: More on Direct Proof and Proof by ContrapositiveChapter-4: More on Direct Proof and Proof by Contrapositive
Chapter-4: More on Direct Proof and Proof by Contrapositive
 

Viewers also liked

Buy 1 bhk and 2 bhk flats in Moshi
Buy 1 bhk and 2 bhk flats in MoshiBuy 1 bhk and 2 bhk flats in Moshi
Buy 1 bhk and 2 bhk flats in Moshi
Hameed Shakoor
 
Nerworking es xi
Nerworking es xiNerworking es xi
Nerworking es xi
Alvaro del Pozo Hernández
 
Spare parts control for maintenance purposes
Spare parts control for maintenance purposesSpare parts control for maintenance purposes
Spare parts control for maintenance purposes
Himalaya Kanwar
 
Time-Variant Distortions in OFDM
Time-Variant Distortions in OFDMTime-Variant Distortions in OFDM
Time-Variant Distortions in OFDMAhmed Alshomi
 
digital signal-processing-lab-manual
digital signal-processing-lab-manualdigital signal-processing-lab-manual
digital signal-processing-lab-manual
Ahmed Alshomi
 
My photography project
My photography projectMy photography project
My photography projectSheyzaad12345
 
On the Home Front Powerpoint
On the Home Front Powerpoint On the Home Front Powerpoint
On the Home Front Powerpoint tvosullivan
 
ภูมิปัญญาไทย
ภูมิปัญญาไทยภูมิปัญญาไทย
ภูมิปัญญาไทย
Mpiizzysm
 

Viewers also liked (10)

Buy 1 bhk and 2 bhk flats in Moshi
Buy 1 bhk and 2 bhk flats in MoshiBuy 1 bhk and 2 bhk flats in Moshi
Buy 1 bhk and 2 bhk flats in Moshi
 
Nerworking es xi
Nerworking es xiNerworking es xi
Nerworking es xi
 
Spare parts control for maintenance purposes
Spare parts control for maintenance purposesSpare parts control for maintenance purposes
Spare parts control for maintenance purposes
 
Time-Variant Distortions in OFDM
Time-Variant Distortions in OFDMTime-Variant Distortions in OFDM
Time-Variant Distortions in OFDM
 
digital signal-processing-lab-manual
digital signal-processing-lab-manualdigital signal-processing-lab-manual
digital signal-processing-lab-manual
 
My photography project
My photography projectMy photography project
My photography project
 
On the Home Front Powerpoint
On the Home Front Powerpoint On the Home Front Powerpoint
On the Home Front Powerpoint
 
ภูมิปัญญาไทย
ภูมิปัญญาไทยภูมิปัญญาไทย
ภูมิปัญญาไทย
 
Usmstan2011
Usmstan2011Usmstan2011
Usmstan2011
 
Konsep anatomi
Konsep anatomiKonsep anatomi
Konsep anatomi
 

Similar to Prin digcommselectedsoln

Imc2017 day1-solutions
Imc2017 day1-solutionsImc2017 day1-solutions
Imc2017 day1-solutions
Christos Loizos
 
Calculus Assignment Help
Calculus Assignment HelpCalculus Assignment Help
Calculus Assignment Help
Maths Assignment Help
 
Differential Equations Assignment Help
Differential Equations Assignment HelpDifferential Equations Assignment Help
Differential Equations Assignment Help
Maths Assignment Help
 
Calculus Homework Help
Calculus Homework HelpCalculus Homework Help
Calculus Homework Help
Math Homework Solver
 
Calculus Assignment Help
Calculus Assignment HelpCalculus Assignment Help
Calculus Assignment Help
Maths Assignment Help
 
Probability and stochastic processes 3rd edition Quiz Solutions
Probability and stochastic processes 3rd edition Quiz SolutionsProbability and stochastic processes 3rd edition Quiz Solutions
Probability and stochastic processes 3rd edition Quiz Solutions
Christopher Whitworth
 
Imc2020 day1&amp;2 problems&amp;solutions
Imc2020 day1&amp;2 problems&amp;solutionsImc2020 day1&amp;2 problems&amp;solutions
Imc2020 day1&amp;2 problems&amp;solutions
Christos Loizos
 
Analysis Of Algorithms Ii
Analysis Of Algorithms IiAnalysis Of Algorithms Ii
Analysis Of Algorithms IiSri Prasanna
 
Linear Algebra.pptx
Linear Algebra.pptxLinear Algebra.pptx
Linear Algebra.pptx
Maths Assignment Help
 
Matlab lab manual
Matlab lab manualMatlab lab manual
Matlab lab manual
nmahi96
 
matrix theory and linear algebra.pptx
matrix theory and linear algebra.pptxmatrix theory and linear algebra.pptx
matrix theory and linear algebra.pptx
Maths Assignment Help
 
Mid semexam | Theory of Computation | Akash Anand | MTH 401A | IIT Kanpur
Mid semexam | Theory of Computation | Akash Anand | MTH 401A | IIT KanpurMid semexam | Theory of Computation | Akash Anand | MTH 401A | IIT Kanpur
Mid semexam | Theory of Computation | Akash Anand | MTH 401A | IIT Kanpur
Vivekananda Samiti
 
Capitulo 4, 7ma edición
Capitulo 4, 7ma ediciónCapitulo 4, 7ma edición
Capitulo 4, 7ma edición
Sohar Carr
 
Inmo 2010 problems and solutions
Inmo 2010 problems and solutionsInmo 2010 problems and solutions
Inmo 2010 problems and solutionsaskiitians
 
Applications of Differential Calculus in real life
Applications of Differential Calculus in real life Applications of Differential Calculus in real life
Applications of Differential Calculus in real life
OlooPundit
 
Soluciones quiz
Soluciones quizSoluciones quiz
Stochastic Processes Homework Help
Stochastic Processes Homework HelpStochastic Processes Homework Help
Stochastic Processes Homework Help
Excel Homework Help
 
Kittel c. introduction to solid state physics 8 th edition - solution manual
Kittel c.  introduction to solid state physics 8 th edition - solution manualKittel c.  introduction to solid state physics 8 th edition - solution manual
Kittel c. introduction to solid state physics 8 th edition - solution manual
amnahnura
 
A05330107
A05330107A05330107
A05330107
IOSR-JEN
 
Chapter 3 REGULAR EXPRESSION.pdf
Chapter 3 REGULAR EXPRESSION.pdfChapter 3 REGULAR EXPRESSION.pdf
Chapter 3 REGULAR EXPRESSION.pdf
dawod yimer
 

Similar to Prin digcommselectedsoln (20)

Imc2017 day1-solutions
Imc2017 day1-solutionsImc2017 day1-solutions
Imc2017 day1-solutions
 
Calculus Assignment Help
Calculus Assignment HelpCalculus Assignment Help
Calculus Assignment Help
 
Differential Equations Assignment Help
Differential Equations Assignment HelpDifferential Equations Assignment Help
Differential Equations Assignment Help
 
Calculus Homework Help
Calculus Homework HelpCalculus Homework Help
Calculus Homework Help
 
Calculus Assignment Help
Calculus Assignment HelpCalculus Assignment Help
Calculus Assignment Help
 
Probability and stochastic processes 3rd edition Quiz Solutions
Probability and stochastic processes 3rd edition Quiz SolutionsProbability and stochastic processes 3rd edition Quiz Solutions
Probability and stochastic processes 3rd edition Quiz Solutions
 
Imc2020 day1&amp;2 problems&amp;solutions
Imc2020 day1&amp;2 problems&amp;solutionsImc2020 day1&amp;2 problems&amp;solutions
Imc2020 day1&amp;2 problems&amp;solutions
 
Analysis Of Algorithms Ii
Analysis Of Algorithms IiAnalysis Of Algorithms Ii
Analysis Of Algorithms Ii
 
Linear Algebra.pptx
Linear Algebra.pptxLinear Algebra.pptx
Linear Algebra.pptx
 
Matlab lab manual
Matlab lab manualMatlab lab manual
Matlab lab manual
 
matrix theory and linear algebra.pptx
matrix theory and linear algebra.pptxmatrix theory and linear algebra.pptx
matrix theory and linear algebra.pptx
 
Mid semexam | Theory of Computation | Akash Anand | MTH 401A | IIT Kanpur
Mid semexam | Theory of Computation | Akash Anand | MTH 401A | IIT KanpurMid semexam | Theory of Computation | Akash Anand | MTH 401A | IIT Kanpur
Mid semexam | Theory of Computation | Akash Anand | MTH 401A | IIT Kanpur
 
Capitulo 4, 7ma edición
Capitulo 4, 7ma ediciónCapitulo 4, 7ma edición
Capitulo 4, 7ma edición
 
Inmo 2010 problems and solutions
Inmo 2010 problems and solutionsInmo 2010 problems and solutions
Inmo 2010 problems and solutions
 
Applications of Differential Calculus in real life
Applications of Differential Calculus in real life Applications of Differential Calculus in real life
Applications of Differential Calculus in real life
 
Soluciones quiz
Soluciones quizSoluciones quiz
Soluciones quiz
 
Stochastic Processes Homework Help
Stochastic Processes Homework HelpStochastic Processes Homework Help
Stochastic Processes Homework Help
 
Kittel c. introduction to solid state physics 8 th edition - solution manual
Kittel c.  introduction to solid state physics 8 th edition - solution manualKittel c.  introduction to solid state physics 8 th edition - solution manual
Kittel c. introduction to solid state physics 8 th edition - solution manual
 
A05330107
A05330107A05330107
A05330107
 
Chapter 3 REGULAR EXPRESSION.pdf
Chapter 3 REGULAR EXPRESSION.pdfChapter 3 REGULAR EXPRESSION.pdf
Chapter 3 REGULAR EXPRESSION.pdf
 

Prin digcommselectedsoln

  • 1. SELECTED SOLUTIONS TO PRINCIPLES OF DIGITAL COMMUNICATION Cambridge Press 2008 by ROBERT G. GALLAGER A complete set of solutions is available from Cambridge Press for instructors teaching a class using this text. This is a subset of solutions that I feel would be valuable for those studying the subject on their own. Chapter 2 Exercise 2.2: (a) V +W is a random variable, so its expectation, by definition, is E[V +W] = X v∈V X w∈W (v + w)pVW(v,w) = X v∈V X w∈W v pVW(v,w) + X v∈V X w∈W w pVW(v,w) = X v∈V v X w∈W pVW(v,w) + X w∈W w X v∈V pVW(v,w) = X v∈V v pV (v) + X w∈W w pW(w) = E[V ] + E[W]. (b) Once again, working from first principles, E[V ·W] = X v∈V X w∈W (v · w)pVW(v,w) = X v∈V X w∈W (v · w)pV (v)pW(w) (Using independence) = X v∈V v pV (v) X w∈W w pW(w) = E[V ] · E[W]. (c) To discover a case where E[V ·W]6= E[V ] ·E[W], first try the simplest kind of example where V and W are binary with the joint pmf pVW(0, 1) = pVW(1, 0) = 1/2; pVW(0, 0) = pVW(1, 1) = 0. 1
  • 2. Clearly, V and W are not independent. Also, E[V · W] = 0 whereas E[V ] = E[W] = 1/2 and hence E[V ] · E[W] = 1/4. The second case requires some experimentation. One approach is to choose a joint distribution such that E[V ·W] = 0 and E[V ] = 0. A simple solution is then given by the pmf, pVW(−1, 0) = pVW(0, 1) = pVW(1, 0) = 1/3. Again, V and W are not independent. Clearly, E[V ·W] = 0. Also, E[V ] = 0 (what is E[W]?). Hence, E[V ·W] = E[V ] · E[W]. (d) σ2V +W = E[(V +W)2] − (E[V +W])2 2W 2V = E[V 2] + E[W2] + E[2V ·W] − (E[V ] + E[W])2 = E[V 2] + E[W2] + 2E[V ] · E[W] − E[V ]2 − E[W]2 − 2E[V ] · E[W] = E[V 2] − E[V ]2 + E[W2] − E[W]2 = σ+ σ. Exercise 2.4: (a) Since X1 and X2 are iid, symmetry implies that Pr(X1 > X2) = Pr(X2 > X1). These two events are mutually exclusive and the event X1 = X2 has 0 probability. Thus Pr(X1 > X2) and Pr(X1 < X2) sum to 1, so must each be 1/2. Thus Pr(X1 ≥ X2) = Pr(X2 ≥ X1) = 1/2. (b) Invoking the symmetry among X1, X2 and X3, we see that each has the same probability of being the smallest of the three (the probability of a tie is 0). These three events are mutually exclusive and their probabilities must add up to 1. Therefore each event occurs with probability 1/3. (c) The event {N > n} is the same as the event {X1 is the minimum among the n iid random variables X1, X2, · · · , Xn}. By extending the argument in part (b), we see that Pr(X1 is the smallest of X1, . . . ,Xn) = 1/n. Finally, Pr {N ≥ n} = Pr {N > n − 1}= 1 n−1 for n ≥ 2. (d) Since N is a non-negative integer random variable (taking on values from 2 to 1), we can use Exercise 2.3(a) as follows: E[N] = 1X n=1 Pr {N ≥ n} = Pr {N ≥ 1} + 1X n=2 Pr {N ≥ n} = 1 + 1X n=2 1 n − 1 = 1 + 1X n=1 1 n . 2
  • 3. Since the series P 1n=1 1n diverges, we conclude that E[N] = 1. (e) Since the alphabet has a finite number of letters,1 Pr(X1 = X2) is no longer 0 and depends on the particular probability distribution. Thus, although, Pr(X1 ≥ X2) = Pr(X2 ≥ X1) by symmetry, neither can be found without knowing the distribution. Out of the alphabet letters with nonzero probability, let amin be a letter of minimum numeric value. If X1 = amin, then no subsequent rv X2,X3, . . . can have a smaller value, so N = 1 in this case. Since the event X1 = amin occurs with positive probability, E[N] = 1. Exercise 2.6: (a) Assume the contrary; i.e., there is a suffix-free code that is not uniquely decodable. Then that code must contain two distinct sequences of source letters, say, x1, x2, . . . , xn and x01, x02, . . . , x0m such that, C(x1)C(x2) . . . C(xn) = C(x01)C(x02) . . . C(x0m). Then one of the following must hold: • C(xn) = C(x0m) • C(xn) is a suffix of C(x0m) • C(x0m) is a suffix of C(xn). In the last two cases we arrive at a contradiction since the code is hypothesized to be suffix-free. In the first case, xn must equal x0m because of the suffix freedom. Simply delete that final letter from each sequence and repeat the argument. Since the sequences are distinct, the final letter must differ after some number of repetitions of the above argument, and at that point one of the latter two cases holds and a contradiction is reached. Hence, suffix-free codes are uniquely decodable. (b) Any prefix-free code becomes a suffix-free code if the ordering of symbols in each codeword is reversed. About the simplest such example is {0,01,11} which can be seen to be a suffix-free code (with codeword lengths {1, 2, 2}) but not a prefix-free code. A codeword in the above code cannot be decoded as soon as its last bit arrives at the decoder. To illustrate a rather extreme case, consider the following output produced by the encoder, 0111111111 . . . Assuming that source letters {a,b,c} map to {0,01,11}, we cannot distinguish between the two possible source sequences, acccccccc . . . and bcccccccc . . . , 1The same results can be obtained with some extra work for a countably infinite discrete alphabet. 3
until the end of the string is reached. Hence, in this case the decoder might have to wait for an arbitrarily long time before decoding.

(c) There cannot be any code with codeword lengths (1, 2, 2) that is both prefix-free and suffix-free. Without loss of generality, set C1 = 0. Then a prefix-free code cannot use either of the codewords 00 and 01 for C2 or C3, and thus must use 10 and 11, which is not suffix-free (0 is a suffix of 10).

Exercise 2.7:

Consider the set of codeword lengths (1, 2, 2) and arrange them as (2, 1, 2). Then u1 = 0 is represented as 0.00. Next, u2 = 1/4 = 0.01 must be represented using 1 bit after the binary point, which is not possible. Hence, the algorithm fails.

Exercise 2.9:

(a) Assume, as usual, that pj > 0 for each j. From Eqs. (2.8) and (2.9),

$$H[X] - L = \sum_{j=1}^{M} p_j\log\frac{2^{-l_j}}{p_j} \le \sum_{j=1}^{M} p_j\left[\frac{2^{-l_j}}{p_j} - 1\right]\log e = \left[\sum_{j=1}^{M} 2^{-l_j} - 1\right]\log e \le 0,$$

where the final inequality is the Kraft inequality. As is evident from Figure 2.7, the first inequality is strict unless $2^{-l_j} = p_j$ for each j. Thus if H[X] = L, it follows that $2^{-l_j} = p_j$ for each j.

(b) First consider Figure 2.4, repeated below, assuming that Pr(a) = 1/2 and Pr(b) = Pr(c) = 1/4. The first-order node 0 corresponds to the letter a and has probability 1/2. The first-order node 1 corresponds to the occurrence of either letter b or c, and thus has probability 1/2.

[Figure: the binary code tree of Figure 2.4, with codewords aa → 00, ab → 011, ac → 010, ba → 110, bb → 1111, bc → 1110, ca → 100, cb → 1011, cc → 1010.]

Similarly, the second-order node 00 corresponds to aa, which has probability 1/4, and the second-order node 01 corresponds to either ab or ac, which have cumulative probability 1/4. In the same way, 10 and 11 correspond to b and c, with probabilities 1/4 each.

One can proceed with higher-order nodes in the same way, but what is the principle behind this? In general, when an infinite binary tree is used to represent an unending sequence of letters from an iid source where each letter j has probability $p_j = 2^{-\ell_j}$, we see that each node corresponding to an initial sequence of letters $x_1,\dots,x_n$ has a probability $\prod_i 2^{-\ell_{x_i}}$ equal to the product of the individual letter probabilities, and an order equal to $\sum_i \ell_{x_i}$. Thus each node labelled by a subsequence of letters has a probability $2^{-\ell}$, where $\ell$ is the order of that node. The other nodes (those unlabelled in the example above) have a probability equal to the sum of the immediately following labelled nodes. This probability is again $2^{-\ell}$ for an $\ell$th-order node, which can be established by induction if one wishes to be formal.
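As a quick sanity check on the code {0, 01, 11} discussed in Exercise 2.6(b), the short script below verifies suffix-freeness, the failure of prefix-freeness, the Kraft sum, and the decoding ambiguity of a long 0111...1 stream. This is my own illustration, not part of the original solution; the helper names and the letter assignment are assumptions.

```python
# Sketch: checks on the code {0, 01, 11} from Exercise 2.6(b).
def is_prefix_free(words):
    return not any(a != b and b.startswith(a) for a in words for b in words)

def is_suffix_free(words):
    return not any(a != b and b.endswith(a) for a in words for b in words)

code = {'a': '0', 'b': '01', 'c': '11'}          # assumed letter assignment
words = list(code.values())
print(is_prefix_free(words))                     # False: '0' is a prefix of '01'
print(is_suffix_free(words))                     # True
print(sum(2.0 ** -len(w) for w in words))        # Kraft sum = 1.0

encode = lambda s: ''.join(code[x] for x in s)
print(encode('a' + 'c' * 4))                     # 011111111
print(encode('b' + 'c' * 4))                     # 0111111111 (shares the prefix above,
                                                 # so the decoder must wait for the end)
```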
Exercise 2.11:

(a) For n = 2,

$$\left(\sum_{j=1}^{M} 2^{-l_j}\right)^2 = \left(\sum_{j_1=1}^{M} 2^{-l_{j_1}}\right)\left(\sum_{j_2=1}^{M} 2^{-l_{j_2}}\right) = \sum_{j_1=1}^{M}\sum_{j_2=1}^{M} 2^{-(l_{j_1}+l_{j_2})}.$$

The same approach works for arbitrary n.

(b) Each source n-tuple $x^n = (a_{j_1}, a_{j_2}, \dots, a_{j_n})$ is encoded into a concatenation $C(a_{j_1})C(a_{j_2})\cdots C(a_{j_n})$ of binary digits of aggregate length $l(x^n) = l_{j_1} + l_{j_2} + \cdots + l_{j_n}$. Since there is one n-tuple $x^n$ for each choice of $a_{j_1}, a_{j_2}, \dots, a_{j_n}$, the result of part (a) can be rewritten as

$$\left(\sum_{j=1}^{M} 2^{-l_j}\right)^n = \sum_{x^n} 2^{-l(x^n)}. \qquad (1)$$

(c) Rewriting (1) in terms of the number $A_i$ of concatenations of n codewords of aggregate length i,

$$\left(\sum_{j=1}^{M} 2^{-l_j}\right)^n = \sum_{i=1}^{n\,l_{\max}} A_i\, 2^{-i}.$$

This uses the fact that since each codeword has length at most $l_{\max}$, each concatenation has length at most $n\,l_{\max}$.

(d) From unique decodability, each of these concatenations must be different, so there are at most $2^i$ concatenations of aggregate length i, i.e., $A_i \le 2^i$. Thus, since the above sum contains at most $n\,l_{\max}$ terms,

$$\left(\sum_{j=1}^{M} 2^{-l_j}\right)^n \le n\,l_{\max}. \qquad (2)$$

(e) Note that

$$[n\,l_{\max}]^{1/n} = \exp\left[\frac{\ln(n\,l_{\max})}{n}\right] \longrightarrow \exp(0) = 1 \quad\text{as } n\to\infty.$$

Since (2) must be satisfied for all n, the Kraft inequality $\sum_j 2^{-l_j} \le 1$ must be satisfied.

Exercise 2.13:

(a) In the Huffman algorithm, we start by combining p3 and p4. Since we have p1 = p3 + p4 ≥ p2, we can combine p1 and p2 in the next step, leading to all codewords of length 2. We can also combine the supersymbol obtained by combining symbols 3 and 4 with symbol 2, yielding codewords of lengths 1, 2, 3 and 3 respectively.

(b) Note that p3 ≤ p2 and p4 ≤ p2, so p3 + p4 ≤ 2p2. Thus p1 = p3 + p4 ≤ 2p2, which implies p1 + p3 + p4 ≤ 4p2.
Since p2 = 1 − p1 − p3 − p4, the latter inequality implies that 1 − p2 ≤ 4p2, or p2 ≥ 0.2. From the former inequality, then, p1 ≤ 2p2 ≤ 0.4. These bounds can be met by choosing p2 = p3 = p4 = 0.2 (and thus p1 = 0.4). Thus p_max = 0.4.

(c) Reasoning similarly to part (b), p2 ≤ p1 and p2 = 1 − p1 − p3 − p4 = 1 − 2p1. Thus 1 − 2p1 ≤ p1, so p1 ≥ 1/3, i.e., p_min = 1/3. This bound is achievable by choosing p1 = p2 = 1/3 and p3 = p4 = 1/6.

(d) The argument in part (b) remains the same if we assume p1 ≤ p3 + p4 rather than p1 = p3 + p4; i.e., p1 ≤ p3 + p4 implies that p1 ≤ p_max. Thus assuming p1 > p_max implies that p1 > p3 + p4. Thus the supersymbol obtained by combining symbols 3 and 4 will be combined with symbol 2 (or perhaps with symbol 1 if p2 = p1). Thus the codeword for symbol 1 (or perhaps the codeword for symbol 2) will have length 1.

(e) The lengths of any optimal prefix-free code must be either (1, 2, 3, 3) or (2, 2, 2, 2). If p1 > p_max, then, from (b), p1 > p3 + p4, so the lengths (1, 2, 3, 3) yield a lower average length than (2, 2, 2, 2).

(f) The argument in part (c) remains almost the same if we start with the assumption that p1 ≥ p3 + p4. In this case p2 = 1 − p1 − p3 − p4 ≥ 1 − 2p1. Combined with p1 ≥ p2, we again have p1 ≥ p_min. Thus if p1 < p_min, we must have p3 + p4 > p1 ≥ p2. We then must combine p1 and p2 in the second step of the Huffman algorithm, so each codeword will have length 2.

(g) It turns out that p_max is still 2/5. To see this, first note that if p1 = 2/5, p2 = p3 = 1/5 and all other symbols have an aggregate probability of 1/5, then the Huffman code construction combines the least likely symbols until they are tied together into a supersymbol of probability 1/5. The completion of the algorithm, as in part (b), can lead either to one codeword of length 1, or to 3 codewords of length 2 with the others of longer length. If p1 > 2/5, then at each stage of the algorithm, two nodes of aggregate probability less than 2/5 are combined, leaving symbol 1 unattached until only 4 nodes remain in the reduced symbol set. The argument in (d) then guarantees that the code will have one codeword of length 1.

Exercise 2.15:

(a) This is the same as Lemma 2.5.1.

(b) Since p1 < p_{M−1} + p_M, we see that p1 < p′_{M−1}, where p′_{M−1} is the probability of the node in the reduced code tree corresponding to letters M−1 and M in the original alphabet. Thus, by part (a), l1 ≥ l′_{M−1} = l_M − 1.

(c) Consider an arbitrary minimum-expected-length code tree. This code tree must be full (by Lemma 2.5.2), so suppose that symbol k is the sibling of symbol M in this tree. If k = 1, then l1 = l_M; otherwise, p1 < p_M + p_k, so l1 must be at least as large as the length of the immediate parent of M, showing that l1 ≥ l_M − 1.

(d) and (e) We have shown that the shortest and longest lengths differ by at most 1, with some number m ≥ 1 of lengths equal to l1 and the remaining M − m lengths equal to l1 + 1. It follows that $2^{l_1+1} = 2m + (M-m) = M + m$. From this it follows that $l_1 = \lfloor\log_2 M\rfloor$ and $m = 2^{l_1+1} - M$.

Exercise 2.16:

(a) Grow a full ternary tree step by step, keeping the tree full at each step. The smallest full tree has 3 leaves. For the next largest full tree, convert one of the leaves into an intermediate node and grow 3 leaves
from that node. We lose 1 leaf but gain 3 from the new node, a net gain of 2 leaves at each growth extension. Thus M = 3 + 2n for n a non-negative integer.

(b) It is clear that for optimality, all the unused leaves in the tree must have the same length as the longest codeword. For M even, combine the 2 lowest-probability symbols into a node at the first step, then combine the 3 lowest-probability nodes at each subsequent step until the root node is reached. If M is odd, a full ternary tree is possible, so combine the 3 lowest-probability nodes at each step.

(c) If {a, b, c, d, e, f} have symbol probabilities {0.3, 0.2, 0.2, 0.1, 0.1, 0.1} respectively, then the ternary Huffman code will be {a → 0, b → 1, c → 20, d → 21, e → 220, f → 221}.

Exercise 2.18:

(a) Applying the Huffman coding algorithm to the code with M+1 symbols with p_{M+1} = 0, we combine symbol M+1 with symbol M, and the reduced code has M symbols with probabilities p1, ..., pM. The Huffman code for this reduced set of symbols is simply the code for the original set of symbols with symbol M+1 eliminated. Thus the code including symbol M+1 is the reduced code modified by a unit length increase in the codeword for symbol M. Thus L = L′ + p_M, where L′ is the expected length for the code with M symbols.

(b) All n of the zero-probability symbols are combined together in the Huffman algorithm, and the reduced code from this combination is then the same as the code with M+1 symbols in part (a). Thus L = L′ + p_M again.

Exercise 2.19:

(a) The entropies H(X), H(Y), and H(XY) can be expressed as

$$H(XY) = -\sum_{x\in\mathcal{X},\,y\in\mathcal{Y}} p_{XY}(x,y)\log p_{XY}(x,y),$$
$$H(X) = -\sum_{x\in\mathcal{X},\,y\in\mathcal{Y}} p_{XY}(x,y)\log p_X(x),$$
$$H(Y) = -\sum_{x\in\mathcal{X},\,y\in\mathcal{Y}} p_{XY}(x,y)\log p_Y(y).$$

It is assumed that all symbol pairs x, y of zero probability have been removed from these sums, and thus all x (y) for which pX(x) = 0 (pY(y) = 0) are consequently removed. Combining these equations,

$$H(XY) - H(X) - H(Y) = \sum_{x\in\mathcal{X},\,y\in\mathcal{Y}} p_{XY}(x,y)\log\frac{p_X(x)p_Y(y)}{p_{XY}(x,y)}.$$

(b) Using the standard inequality log x ≤ (x−1) log e,

$$H(XY) - H(X) - H(Y) \le \sum_{x\in\mathcal{X},\,y\in\mathcal{Y}} p_{XY}(x,y)\left[\frac{p_X(x)p_Y(y)}{p_{XY}(x,y)} - 1\right]\log e = 0.$$

Thus H(X,Y) ≤ H(X) + H(Y). Note that this inequality is satisfied with equality if and only if X and Y are independent.
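As a small numerical illustration of part (b) (my own check, not from the original solution; the joint pmf below is an arbitrary dependent example), H(XY) comes out strictly smaller than H(X) + H(Y):

```python
# Sketch: verify H(XY) <= H(X) + H(Y) for a dependent joint pmf (example pmf is assumed).
from math import log2

p_xy = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.2, (1, 1): 0.3}

def H(probs):
    return -sum(p * log2(p) for p in probs if p > 0)

p_x = [sum(p for (x, _), p in p_xy.items() if x == v) for v in (0, 1)]
p_y = [sum(p for (_, y), p in p_xy.items() if y == v) for v in (0, 1)]
print(round(H(p_xy.values()), 3), "<=", round(H(p_x) + H(p_y), 3))   # 1.846 <= 1.971
```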
(c) For n symbols X1, ..., Xn, let Y be the 'super-symbol' X2, ..., Xn. Then using (b),

$$H(X_1,\dots,X_n) = H(X_1, Y) \le H(X_1) + H(Y) = H(X_1) + H(X_2,\dots,X_n).$$

Iterating this gives the desired result. An alternate approach generalizes part (b) in the following way:

$$H(X_1,\dots,X_n) - \sum_i H(X_i) = \sum_{x_1,\dots,x_n} p(x_1,\dots,x_n)\log\frac{p(x_1)\cdots p(x_n)}{p(x_1,\dots,x_n)} \le 0,$$

where we have used log x ≤ (x−1) log e again.

Exercise 2.20:

(a) Y is 1 if X = 1, which occurs with probability p1. Y is 0 otherwise. Thus

$$H(Y) = -p_1\log(p_1) - (1-p_1)\log(1-p_1) = H_b(p_1).$$

(b) Given Y = 1, X = 1 with probability 1, so H(X | Y=1) = 0.

(c) Given Y = 0, X = 1 has probability 0, so X has M−1 possible choices with non-zero probability. The maximum entropy for an alphabet of M−1 terms is log(M−1), so H(X | Y=0) ≤ log(M−1). This upper bound is met with equality if Pr(X=j | X≠1) = 1/(M−1) for all j ≠ 1. Since Pr(X=j | X≠1) = pj/(1−p1), this upper bound on H(X | Y=0) is achieved when p2 = p3 = ... = pM. Combining this with part (b),

$$H(X\mid Y) = p_1 H(X\mid Y{=}1) + (1-p_1)H(X\mid Y{=}0) \le (1-p_1)\log(M-1).$$

(d) Note that

$$H(XY) = H(Y) + H(X|Y) \le H_b(p_1) + (1-p_1)\log(M-1),$$

and this is met with equality for p2 = ... = pM. There are now two reasonable approaches. One is to note that H(XY) can also be expressed as H(X) + H(Y|X). Since Y is uniquely specified by X, H(Y|X) = 0, so

$$H(X) = H(XY) \le H_b(p_1) + (1-p_1)\log(M-1), \qquad (3)$$

with equality when p2 = p3 = ... = pM. The other approach is to observe that H(X) ≤ H(XY), which again leads to (3), but this does not immediately imply that equality is met for p2 = ... = pM. Equation (3) is the Fano bound of information theory; it is useful when p1 is very close to 1 and plays a key role in the noisy-channel coding theorem.

(e) The same bound applies to each symbol by replacing p1 by pj for any j, 1 ≤ j ≤ M. Thus it also applies to p_max.

Exercise 2.22:

One way to generate a source code for (X1, X2, X3) is to concatenate a Huffman code for (X1, X2) with a Huffman code for X3. With $L_{\min,2}$ and $L_{\min}$ denoting the minimum expected codeword length per source letter for pairs and for single letters respectively, the expected length of the resulting code for (X1, X2, X3) is $2L_{\min,2} + L_{\min}$. The expected length per source letter of this code is
$(2L_{\min,2} + L_{\min})/3$. The expected length per source letter of the optimal code for (X1, X2, X3), $L_{\min,3}$, can be no worse, so

$$L_{\min,3} \le \frac{2}{3}L_{\min,2} + \frac{1}{3}L_{\min}.$$

Exercise 2.23: (Run-Length Coding)

(a) Let C and C′ be the codes mapping source symbols to intermediate integers and intermediate integers to output bits respectively. If C′ is uniquely decodable, then the intermediate integers can be decoded from the received bit stream, and if C is also uniquely decodable, the original source bits can be decoded. The lengths specified for C′ satisfy the Kraft inequality and thus this code can be made prefix-free and thus uniquely decodable. For example, mapping 8 → 1 and each other integer to 0 followed by its 3-bit binary representation is prefix-free.

C is a variable-to-fixed-length code, mapping {b, ab, a²b, ..., a⁷b, a⁸} to the integers 0 to 8. This set of strings forms a full prefix-free set, and thus any binary string can be parsed into these 'codewords', which are then mapped to the integers 0 to 8. The integers can then be decoded into the 'codewords', which are then concatenated into the original binary sequence. In general, a variable-to-fixed-length code is uniquely decodable if the encoder can parse, which is guaranteed if that set of 'codewords' is full and prefix-free.

(b) Each occurrence of source letter b causes 4 bits to leave the encoder immediately. In addition, each subsequent run of 8 a's causes 1 extra bit to leave the encoder. Thus, for each b, the encoder emits 4 bits with probability 1; it emits an extra bit with probability (0.9)⁸; it emits yet a further bit with probability (0.9)¹⁶, and so forth. Letting Y be the number of output bits per input b,

$$E[Y] = 4 + (0.9)^8 + (0.9)^{16} + \cdots = 4 + \frac{(0.9)^8}{1-(0.9)^8} = 4.756.$$

(c) To count the number of b's out of the source, let Bi = 1 if the ith source letter is b and Bi = 0 otherwise. Then E[Bi] = 0.1 and $\sigma^2_{B_i} = 0.09$. Let $A_B = (1/n)\sum_{i=1}^{n} B_i$ be the number of b's per input in a run of n = 10²⁰ inputs. This has mean 0.1 and variance (0.9)·10⁻²¹, so it is close to 0.1 with very high probability. As the number of trials increases, it is closer to 0.1 with still higher probability.

(d) The total number of output bits corresponding to the (essentially) 10¹⁹ b's in the 10²⁰ source letters is, with high probability, close to 4.756·10¹⁹(1 + ε) for small ε. Thus

$$L \approx (0.1)\left[4 + \frac{(0.9)^8}{1-(0.9)^8}\right] = 0.4756.$$

Renewal theory provides a more convincing way to fully justify this solution. Note that the achieved L is impressive considering that the entropy of the source is −(0.9)log(0.9) − (0.1)log(0.1) = 0.469 bits/source symbol.

Exercise 2.25:

(a) Note that W takes on the value −log(2/3) with probability 2/3 and −log(1/3) with probability 1/3. Thus E[W] = log 3 − 2/3. Note that E[W] = H(X). The fluctuation of W around its mean is −1/3 with probability 2/3 and +2/3 with probability 1/3. Thus $\sigma^2_W = \frac{2}{3}\cdot\frac{1}{9} + \frac{1}{3}\cdot\frac{4}{9} = \frac{2}{9}$.
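A two-line numerical check of part (a) follows (my own verification, not from the original solution, using base-2 logarithms as in the text):

```python
# Sketch: E[W] = H(X) = log 3 - 2/3 and sigma_W^2 = 2/9 for p_X = (2/3, 1/3).
from math import log2

p = [2/3, 1/3]
w = [-log2(q) for q in p]                       # sample values of W = -log p_X(X)
EW = sum(q * wi for q, wi in zip(p, w))
varW = sum(q * (wi - EW) ** 2 for q, wi in zip(p, w))
print(round(EW, 4), round(log2(3) - 2/3, 4))    # both 0.9183 (= H(X))
print(round(varW, 4), round(2/9, 4))            # both 0.2222
```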
(b) The bound on the probability of the typical set, as derived using the Chebyshev inequality and stated in (2.2), is:

$$\Pr(X^n \in T_\epsilon) \ge 1 - \frac{\sigma^2_W}{n\epsilon^2} = 1 - \frac{1}{45}.$$

(c) To count the number of a's out of the source, let the rv Yi(Xi) be 1 for Xi = a and 0 for Xi = b. The Yi(Xi)'s are iid with mean 2/3 and variance $\sigma^2_Y = 2/9$. Then Na(X^n) is given by

$$N_a = \sum_{i=1}^{n} Y_i(X_i),$$

which has mean 2n/3 and variance 2n/9.

(d) Since the n-tuple X^n is iid, the sample outcome satisfies $w(x^n) = \sum_i w(x_i)$. Let na be the sample value of Na corresponding to x^n. Since w(a) = −log 2/3 and w(b) = −log 1/3, we have

$$w(x^n) = n_a(-\log 2/3) + (n - n_a)(-\log 1/3) = n\log 3 - n_a, \qquad W(X^n) = n\log 3 - N_a.$$

In other words, $\tilde{W}(X^n)$, the fluctuation of W(X^n) around its mean, is the negative of the fluctuation of Na(X^n); that is, $\tilde{W}(X^n) = -\tilde{N}_a(X^n)$.

(e) The typical set is given by:

$$T_\epsilon^n = \left\{x^n : \left|\frac{w(x^n)}{n} - E[W]\right| < \epsilon\right\} = \left\{x^n : \left|\frac{\tilde{w}(x^n)}{n}\right| < \epsilon\right\} = \left\{x^n : \left|\frac{\tilde{n}_a(x^n)}{n}\right| < \epsilon\right\} = \left\{x^n : 10^5\Big(\frac{2}{3} - \epsilon\Big) < n_a(x^n) < 10^5\Big(\frac{2}{3} + \epsilon\Big)\right\},$$

where we have used $\tilde{w}(x^n) = -\tilde{n}_a(x^n)$. Thus $\alpha = 10^5(\frac{2}{3} - \epsilon)$ and $\beta = 10^5(\frac{2}{3} + \epsilon)$.

(f) From part (c), $\overline{N}_a = 2n/3$ and $\sigma^2_{N_a} = 2n/9$. The CLT says that for n large, the sum of n iid random variables (rvs) has a distribution function close to Gaussian within several standard deviations of the mean. As n increases, the range and accuracy of the approximation increase. In this case, α and β are 10³ below and above the mean respectively. The standard deviation is $\sqrt{2\cdot 10^5/9}$, so α and β are about 6.7 standard deviations from the mean. The probability that a Gaussian rv is more than 6.7 standard deviations from the mean is about (1.6)·10⁻¹⁰. This is not intended as an accurate approximation, but only to demonstrate the weakness of the Chebyshev bound, which is useful for bounding but not for numerical approximation.

Exercise 2.26:

Any particular string x^n which has i a's and n−i b's has probability $(\frac{2}{3})^i(\frac{1}{3})^{n-i}$. This is maximized when i = n = 10⁵, and the corresponding probability is about 10⁻¹⁷⁶⁰⁹. Those strings with a single b have a probability 1/2 as large, and those with 2 b's have a probability 1/4 as large. Since there are $\binom{n}{i}$ different sequences that have exactly i a's and n−i b's,

$$\Pr\{N_a = i\} = \binom{n}{i}\left(\frac{2}{3}\right)^i\left(\frac{1}{3}\right)^{n-i}.$$
Evaluating for i = n, n−1, and n−2 with n = 10⁵:

$$\Pr\{N_a = n\} = \left(\frac{2}{3}\right)^n \approx 10^{-17{,}609},$$
$$\Pr\{N_a = n-1\} = 10^5\left(\frac{2}{3}\right)^{n-1}\left(\frac{1}{3}\right) \approx 10^{-17{,}604},$$
$$\Pr\{N_a = n-2\} = \binom{10^5}{2}\left(\frac{2}{3}\right)^{n-2}\left(\frac{1}{3}\right)^2 \approx 10^{-17{,}600}.$$

What this says is that the probability of any given string with na a's decreases as na decreases, while the aggregate probability of all strings with na a's increases as na decreases (for na large compared to $\overline{N}_a$). We saw in the previous exercise that the typical set is the set where na is close to $\overline{N}_a$, and we now see that the most probable individual strings have fantastically small probability, both individually and in the aggregate, and thus can be ignored.

Exercise 2.28:

(a) The probability of an n-tuple $x^n = (x_1,\dots,x_n)$ is $p_{X^n}(x^n) = \prod_{k=1}^{n} p_X(x_k)$. This product includes $N_j(x^n)$ terms $x_k$ for which $x_k$ is the letter j, and this is true for each j in the alphabet. Thus

$$p_{X^n}(x^n) = \prod_{j=1}^{M} p_j^{N_j(x^n)}. \qquad (4)$$

(b) Taking the log of (4),

$$-\log p_{X^n}(x^n) = \sum_j N_j(x^n)\log\frac{1}{p_j}. \qquad (5)$$

Using the definition of $S^n_\epsilon$, all $x^n \in S^n_\epsilon$ must satisfy

$$\sum_j n p_j(1-\epsilon)\log\frac{1}{p_j} < \sum_j N_j(x^n)\log\frac{1}{p_j} < \sum_j n p_j(1+\epsilon)\log\frac{1}{p_j},$$

i.e.,

$$nH(X)(1-\epsilon) < \sum_j N_j(x^n)\log\frac{1}{p_j} < nH(X)(1+\epsilon).$$

Combining this with (5), every $x^n \in S^n_\epsilon$ satisfies

$$H(X)(1-\epsilon) < \frac{-\log p_{X^n}(x^n)}{n} < H(X)(1+\epsilon). \qquad (6)$$

(c) With $\epsilon' = H(X)\epsilon$, (6) shows that for all $x^n \in S^n_\epsilon$,

$$H(X) - \epsilon' < \frac{-\log p_{X^n}(x^n)}{n} < H(X) + \epsilon'.$$

By (2.25) in the text, this is the defining equation of $T^n_{\epsilon'}$, so all $x^n$ in $S^n_\epsilon$ are also in $T^n_{\epsilon'}$.
(d) For each j in the alphabet, the WLLN says that for any given ε > 0 and δ > 0, and for all sufficiently large n,

$$\Pr\left(\left|\frac{N_j(X^n)}{n} - p_j\right| \ge \epsilon\right) \le \frac{\delta}{M}. \qquad (7)$$

For all sufficiently large n, (7) is satisfied for all j, 1 ≤ j ≤ M. For all such large enough n, each $x^n$ is either in $S^n_\epsilon$ or is a member of the event that $|N_j(x^n)/n - p_j| \ge \epsilon$ for some j. The union of the events that $|N_j(x^n)/n - p_j| \ge \epsilon$ for some j has probability upper bounded by δ, so $\Pr(S^n_\epsilon) \ge 1 - \delta$.

(e) The proof here is exactly the same as that of Theorem 2.7.1. Part (b) gives upper and lower bounds on $\Pr(x^n)$ for $x^n \in S^n_\epsilon$, and (d) shows that $1 - \delta \le \Pr(S^n_\epsilon) \le 1$, which together give the desired bounds on the number of elements in $S^n_\epsilon$.

Exercise 2.30:

(a) First note that the chain is ergodic (i.e., it is aperiodic and all states can be reached from all other states). Thus steady-state probabilities q(s) exist and satisfy the equations $\sum_s q(s) = 1$ and $q(s) = \sum_{s'} q(s')Q(s|s')$. For the given chain, these latter equations are

$$q(1) = q(1)(1/2) + q(2)(1/2) + q(4),$$
$$q(2) = q(1)(1/2),$$
$$q(3) = q(2)(1/2),$$
$$q(4) = q(3).$$

Solving by inspection, q(1) = 1/2, q(2) = 1/4, and q(3) = q(4) = 1/8.

(b) To calculate H(X1), we first calculate the pmf $p_{X_1}(x)$ for each x ∈ X. Using the steady-state probabilities q(s) for S0, we have $p_{X_1}(x) = \sum_s q(s)\Pr\{X_1{=}x \mid S_0{=}s\}$. Since X1 = a occurs with probability 1/2 from both S0 = 1 and S0 = 2, and occurs with probability 1 from S0 = 4,

$$p_{X_1}(a) = q(1)\frac{1}{2} + q(2)\frac{1}{2} + q(4) = \frac{1}{2}.$$

Similarly, $p_{X_1}(b) = p_{X_1}(c) = 1/4$. Hence the pmf of X1 is $\{\frac12, \frac14, \frac14\}$ and H(X1) = 3/2.

(c) The pmf of X1 conditioned on S0 = 1 is {1/2, 1/2}. Hence H(X1|S0=1) = 1. Similarly, H(X1|S0=2) = 1. There is no uncertainty from states 3 and 4, so H(X1|S0=3) = H(X1|S0=4) = 0. Since H(X1|S0) is defined as $\sum_s \Pr(S_0{=}s)H(X_1|S_0{=}s)$, we have

$$H(X_1|S_0) = q(1)H(X_1|S_0{=}1) + q(2)H(X_1|S_0{=}2) = \frac{3}{4},$$

which is less than H(X1), as expected.

(d) We can achieve L = H(X1|S0) by achieving L(s) = H(X1|s) for each state s ∈ S. To do that, we use an optimal prefix-free code for each state. For S0 = 1, the code {a → 0, b → 1} is optimal with L(S0=1) = 1 = H(X1|S0=1). Similarly, for S0 = 2, {a → 0, c → 1} is optimal with L(S0=2) = 1 = H(X1|S0=2). Since H(X1|S0=3) = H(X1|S0=4) = 0, we do not use any code at all for states 3 and 4. In other words, our encoder does not transmit any bits for symbols that result from transitions from these states.
Now we explain why the decoder can track the state after time 0. The decoder is assumed to know the initial state. When in states 1 or 2, the next codeword from the corresponding prefix-free code uniquely determines the next state. When state 3 is entered, the next state must be 4, since there is a single deterministic transition out of state 3 that goes to state 4 (and this is known without receiving the next codeword). Similarly, when state 4 is entered, the next state must be 1. When states 3 or 4 are entered, the next received codeword corresponds to the subsequent transition out of state 1. In this manner, the decoder can keep track of the state.

(e) The question is slightly ambiguous. The intended meaning is how many source symbols x1, x2, ..., xk must be observed before the new state sk is known, but one could possibly interpret it as determining the initial state s0.

To determine the new state, note that the symbol a always drives the chain to state 1 and the symbol b always drives it to state 2. The symbol c, however, could lead to either state 3 or 4. In this case, the subsequent symbol could be c, leading to state 4 with certainty, or could be a, leading to state 1. Thus at most 2 symbols are needed to determine the new state.

Determining the initial state, on the other hand, is not always possible. The symbol a could come from states 1, 2, or 4, and no future symbols can resolve this ambiguity. A more interesting problem is to determine the state, and thus to start decoding correctly, when the initial state is unknown at the decoder. For the code above, this is easy, since whenever a 0 appears in the encoded stream, the corresponding symbol is a and the next state is 1, permitting correct decoding from then on. This problem, known as the synchronizing problem, is quite challenging even for memoryless sources.

Exercise 2.31:

We know from (2.37) in the text that H(XY) = H(Y) + H(X | Y) for any random symbols X and Y. For any k-tuple X1, ..., Xk of random symbols, we can view Xk as the symbol X above and view the (k−1)-tuple Xk−1, Xk−2, ..., X1 as the symbol Y above, getting

$$H(X_k, X_{k-1},\dots,X_1) = H(X_k \mid X_{k-1},\dots,X_1) + H(X_{k-1},\dots,X_1).$$

Since this expresses the entropy of each k-tuple in terms of a (k−1)-tuple, we can iterate, getting

$$H(X_n, X_{n-1},\dots,X_1) = H(X_n \mid X_{n-1},\dots,X_1) + H(X_{n-1},\dots,X_1)$$
$$= H(X_n \mid X_{n-1},\dots,X_1) + H(X_{n-1} \mid X_{n-2},\dots,X_1) + H(X_{n-2},\dots,X_1)$$
$$= \cdots = \sum_{k=2}^{n} H(X_k \mid X_{k-1},\dots,X_1) + H(X_1).$$

Exercise 2.32:

(a) We must show that H(S2|S1 S0) = H(S2|S1). Viewing the pair of random symbols S1 S0 as a random symbol in its own right, the definition of conditional entropy is

$$H(S_2|S_1 S_0) = \sum_{s_1,s_0}\Pr(S_1,S_0 = s_1,s_0)\,H(S_2|S_1{=}s_1, S_0{=}s_0) = \sum_{s_1 s_0}\Pr(s_1 s_0)H(S_2|s_1 s_0), \qquad (8)$$
where we will use the above abbreviations throughout for clarity. By the Markov property, Pr(S2=s2 | s1 s0) = Pr(S2=s2 | s1) for all symbols s0, s1, s2. Thus

$$H(S_2|s_1 s_0) = \sum_{s_2} -\Pr(S_2{=}s_2|s_1 s_0)\log\Pr(S_2{=}s_2|s_1 s_0) = \sum_{s_2} -\Pr(S_2{=}s_2|s_1)\log\Pr(S_2{=}s_2|s_1) = H(S_2|s_1).$$

Substituting this in (8), we get

$$H(S_2|S_1 S_0) = \sum_{s_1 s_0}\Pr(s_1 s_0)H(S_2|s_1) = \sum_{s_1}\Pr(s_1)H(S_2|s_1) = H(S_2|S_1). \qquad (9)$$

(b) Using the result of Exercise 2.31,

$$H(S_0, S_1,\dots,S_n) = \sum_{k=1}^{n} H(S_k \mid S_{k-1},\dots,S_0) + H(S_0).$$

Viewing S0 as one symbol and the n-tuple S1, ..., Sn as another,

$$H(S_0,\dots,S_n) = H(S_1,\dots,S_n \mid S_0) + H(S_0).$$

Combining these two equations,

$$H(S_1,\dots,S_n \mid S_0) = \sum_{k=1}^{n} H(S_k \mid S_{k-1},\dots,S_0). \qquad (10)$$

Applying the same argument as in part (a), we see that $H(S_k \mid S_{k-1},\dots,S_0) = H(S_k \mid S_{k-1})$. Substituting this into (10),

$$H(S_1,\dots,S_n \mid S_0) = \sum_{k=1}^{n} H(S_k \mid S_{k-1}).$$

(c) If the chain starts in steady state, each successive state has the same steady-state pmf, so each of the terms above is the same and $H(S_1,\dots,S_n|S_0) = nH(S_1|S_0)$.

(d) By definition of a Markov source, the state S0 and the next source symbol X1 uniquely determine the next state S1 (and vice versa). Also, given state S1, the next symbol X2 uniquely determines the next state S2. Thus Pr(x1 x2 | s0) = Pr(s1 s2 | s0), where x1 x2 are the sample values of X1 X2 in one-to-one correspondence with the sample values s1 s2 of S1 S2, all conditional on S0 = s0. Hence the joint pmf of X1 X2 conditioned on S0 = s0 is the same as the joint pmf of S1 S2 conditioned on S0 = s0. The result follows.

(e) Combining the results of (c) and (d) verifies (2.40) in the text.

Exercise 2.33:

Lempel-Ziv parsing of the given string can be done as follows:
Step 1: with window 00011101, the next parsed block is 001 (n = 3); it matches the substring starting u = 7 positions back in the window.
Step 2: with window 00011101001, the next parsed block is 0101 (n = 4); it matches the substring starting u = 2 positions back.
Step 3: with window 000111010010101, the next parsed block is 100 (n = 3); it matches the substring starting u = 8 positions back.

The string is parsed in three steps; in each step, the match is found inside the current window. The (n, u) pairs resulting from these steps are respectively (3,7), (4,2), and (3,8). Using the unary-binary code for n, which maps 3 → 011 and 4 → 00100, and a standard 3-bit map for u, 1 ≤ u ≤ 8, the encoded sequence is 011, 111, 00100, 010, 011, 000 (transmitted without commas).

Note that for small examples, as in this case, LZ77 may not be very efficient. In general, the algorithm requires much larger window sizes to compress efficiently.

Chapter 3

Exercise 3.3:

(a) Given a1 and a2, the Lloyd-Max conditions assert that b should be chosen halfway between them, i.e., b = (a1 + a2)/2. This ensures that all points are mapped into the closest quantization point. If the probability density is zero in some region around (a1 + a2)/2, then it makes no difference where b is chosen within this region, since those points cannot affect the MSE.

(b) Note that y(x)/Q(x) is the expected value of U conditional on U ≥ x. Thus, given b, the MMSE choice for a2 is y(b)/Q(b). Similarly, a1 is (E[U] − y(b))/(1 − Q(b)). Using the symmetry condition, E[U] = 0, so

$$a_1 = \frac{-y(b)}{1-Q(b)}, \qquad a_2 = \frac{y(b)}{Q(b)}. \qquad (11)$$

(c) Because of the symmetry,

$$Q(0) = \int_0^\infty f(u)\,du = \int_0^\infty f(-u)\,du = \int_{-\infty}^0 f(u)\,du = 1 - Q(0).$$

This implicitly assumes that there is no impulse of probability density at the origin, since such an impulse would cause the integrals to be ill-defined. Thus, with b = 0, (11) implies that a1 = −a2.

(d) Part (c) shows that for b = 0, a1 = −a2 satisfies step 2 of the Lloyd-Max algorithm, and then b = 0 = (a1 + a2)/2 satisfies step 3.

(e) The solution in part (d) for the density below is b = 0, a1 = −2/3, and a2 = 2/3. Another solution is a2 = 1, a1 = −1/2, and b = 1/3. The final solution is the mirror image of the second, namely a1 = −1, a2 = 1/2, and b = −1/3.
[Figure: the density f(u) consists of three rectangular pulses, each of width ε and height 1/(3ε), centered at −1, 0, and +1.]

(f) The MSE for the first solution above (b = 0) is 2/9. That for each of the other solutions is 1/6. These latter two solutions are optimal. On reflection, choosing the separation point b in the middle of one of the probability pulses seems like a bad idea, but the main point of the problem is that finding optimal solutions to MSE problems is often messy, despite the apparent simplicity of the Lloyd-Max algorithm.

Exercise 3.4:

(a) Using the hint, we minimize

$$\text{MSE}(\Delta_1,\Delta_2) + \lambda\left[\frac{L_1}{\Delta_1} + \frac{L_2}{\Delta_2}\right] = \frac{1}{12}\left[\Delta_1^2 f_1 L_1 + \Delta_2^2 f_2 L_2\right] + \lambda\left[\frac{L_1}{\Delta_1} + \frac{L_2}{\Delta_2}\right]$$

over both Δ1 and Δ2. The function is convex in Δ1 and Δ2, so we simply take the derivative with respect to each and set it equal to 0, i.e.,

$$\frac{1}{6}\Delta_1 f_1 L_1 - \lambda\frac{L_1}{\Delta_1^2} = 0; \qquad \frac{1}{6}\Delta_2 f_2 L_2 - \lambda\frac{L_2}{\Delta_2^2} = 0.$$

Rearranging, $6\lambda = \Delta_1^3 f_1 = \Delta_2^3 f_2$, which means that for each choice of λ, $\Delta_1 f_1^{1/3} = \Delta_2 f_2^{1/3}$.

(b) We see from part (a) that $\Delta_1/\Delta_2 = (f_2/f_1)^{1/3}$ is fixed, independent of M. Holding this ratio fixed, MSE is proportional to $\Delta_1^2$ and M is proportional to $1/\Delta_1$. Thus M²·MSE is independent of Δ1 (for the fixed ratio Δ1/Δ2):

$$M^2\,\text{MSE} = \frac{1}{12}\left[f_1 L_1 + \frac{\Delta_2^2}{\Delta_1^2} f_2 L_2\right]\left[L_1 + L_2\frac{\Delta_1}{\Delta_2}\right]^2 = \frac{1}{12}\left[f_1 L_1 + f_1^{2/3} f_2^{1/3} L_2\right]\left[L_1 + L_2\frac{f_2^{1/3}}{f_1^{1/3}}\right]^2 = \frac{1}{12}\left[f_1^{1/3} L_1 + f_2^{1/3} L_2\right]^3.$$

(c) If the algorithm starts with M1 points uniformly spaced over the first region and M2 points uniformly spaced over the second region, then it is in equilibrium and never changes.

(d) If the algorithm starts with one additional point in the central region of zero probability density, and if that point is more than Δ1/2 away from region 1 and Δ2/2 away from region 2, then the central point is unused (with probability 1). Since the conditional mean over the region mapping into that central point is not well defined, it is not clear what the algorithm will do. If it views that conditional mean as being in the center of the region mapped into that
point, then the algorithm is in equilibrium. The point of parts (c) and (d) is that the Lloyd-Max algorithm is not very good at finding a global optimum.

(e) The probability that the sample point lies in region j (j = 1, 2) is $f_j L_j$. The mean squared error, using Mj points in region j and conditional on lying in region j, is $L_j^2/(12M_j^2)$. Thus the MSE with Mj points in region j is

$$\text{MSE} = \frac{f_1 L_1^3}{12 M_1^2} + \frac{f_2 L_2^3}{12 M_2^2}.$$

This can be minimized numerically over integer M1 subject to M1 + M2 = M. This was minimized in part (b) without the integer constraint, and thus the solution here is slightly larger than the one there, except in the special cases where the non-integer solution happens to be integer.

(f) With given Δ1 and Δ2, the probability of each point in region j (j = 1, 2) is $f_j\Delta_j$ and the number of such points is $L_j/\Delta_j$ (assumed to be integer). Thus the entropy is

$$H(V) = \frac{L_1}{\Delta_1}(f_1\Delta_1)\ln\frac{1}{f_1\Delta_1} + \frac{L_2}{\Delta_2}(f_2\Delta_2)\ln\frac{1}{f_2\Delta_2} = -L_1 f_1\ln(f_1\Delta_1) - L_2 f_2\ln(f_2\Delta_2).$$

(g) We use the same Lagrange multiplier approach as in part (a), now using the entropy H(V) as the constraint:

$$\text{MSE}(\Delta_1,\Delta_2) + \lambda H(\Delta_1,\Delta_2) = \frac{1}{12}\left[\Delta_1^2 f_1 L_1 + \Delta_2^2 f_2 L_2\right] - \lambda f_1 L_1\ln(f_1\Delta_1) - \lambda f_2 L_2\ln(f_2\Delta_2).$$

Setting the derivatives with respect to Δ1 and Δ2 equal to zero,

$$\frac{1}{6}\Delta_1 f_1 L_1 - \lambda\frac{f_1 L_1}{\Delta_1} = 0; \qquad \frac{1}{6}\Delta_2 f_2 L_2 - \lambda\frac{f_2 L_2}{\Delta_2} = 0.$$

This leads to $6\lambda = \Delta_1^2 = \Delta_2^2$, so that Δ1 = Δ2. This is the same type of approximation as before, since it ignores the constraint that L1/Δ1 and L2/Δ2 must be integers.

Exercise 3.6:

(a) The probability of the quantization region R is $A = \Delta(\frac12 + x + \frac{\Delta}{2})$. To simplify the algebraic messiness, shift U to U − x − Δ/2, which, conditional on R, lies in [−Δ/2, Δ/2]. Let Y denote this shifted conditional variable. As illustrated in the sketch below, $f_Y(y) = \frac{1}{A}[y + (x + \frac12 + \frac{\Delta}{2})]$. Thus

$$E[Y] = \int_{-\Delta/2}^{\Delta/2}\frac{y}{A}\left[y + \Big(x + \frac12 + \frac{\Delta}{2}\Big)\right]dy = \int_{-\Delta/2}^{\Delta/2}\frac{y^2}{A}\,dy + \int_{-\Delta/2}^{\Delta/2}\frac{y}{A}\left[x + \frac12 + \frac{\Delta}{2}\right]dy = \frac{\Delta^3}{12A},$$

since, by symmetry, the final integral above is 0.

[Figure: the density $f_U(u) = u + \frac12$ over the region R = [x, x+Δ], and the shifted conditional density $f_Y(y) = \frac{1}{A}(y + x + \frac12 + \frac{\Delta}{2})$ for |y| ≤ Δ/2.]
Since Y is the shift of U conditioned on R,

$$E[U|R] = x + \frac{\Delta}{2} + E[Y] = x + \frac{\Delta}{2} + \frac{\Delta^3}{12A}.$$

That is, the conditional mean is slightly larger than the center of the region R because of the increasing density in the region.

(b) Since the variance of a rv is invariant to shifts, MSE = $\sigma^2_{U|R} = \sigma^2_Y$. Also, note from symmetry that $\int_{-\Delta/2}^{\Delta/2} y^3\,dy = 0$. Thus

$$E[Y^2] = \int_{-\Delta/2}^{\Delta/2}\frac{y^2}{A}\left[y + \Big(x + \frac12 + \frac{\Delta}{2}\Big)\right]dy = \frac{x + \frac12 + \frac{\Delta}{2}}{A}\cdot\frac{\Delta^3}{12} = \frac{\Delta^2}{12},$$

$$\text{MSE} = \sigma^2_Y = E[Y^2] - (E[Y])^2 = \frac{\Delta^2}{12} - \left[\frac{\Delta^3}{12A}\right]^2, \qquad \text{MSE} - \frac{\Delta^2}{12} = -\left[\frac{\Delta^3}{12A}\right]^2 = -\frac{\Delta^4}{144(x + \frac12 + \frac{\Delta}{2})^2}.$$

(c) The quantizer output V is a discrete random variable whose entropy H[V] is

$$H[V] = \sum_{j=1}^{M}\int_{(j-1)\Delta}^{j\Delta} -f_U(u)\log[\bar f(u)\Delta]\,du = \int_0^1 -f_U(u)\log[\bar f(u)]\,du - \log\Delta,$$

where $\bar f(u)$ denotes the average of $f_U$ over the quantization interval containing u, so that $\bar f(u)\Delta$ is the probability of that interval. The differential entropy of U is by definition

$$h[U] = \int_0^1 -f_U(u)\log[f_U(u)]\,du.$$

Thus,

$$h[U] - \log\Delta - H[V] = \int_0^1 f_U(u)\log\frac{\bar f(u)}{f_U(u)}\,du.$$

(d) Using the inequality ln x ≤ x − 1,

$$\int_0^1 f_U(u)\log\frac{\bar f(u)}{f_U(u)}\,du \le \log e\int_0^1 f_U(u)\left[\frac{\bar f(u)}{f_U(u)} - 1\right]du = \log e\left[\int_0^1\bar f(u)\,du - \int_0^1 f_U(u)\,du\right] = 0.$$

Thus the difference h[U] − logΔ − H[V] is non-positive (not non-negative).

(e) Approximating ln x by (x−1) − (x−1)²/2 for x = $\bar f(u)/f_U(u)$, and recognizing from part (d) that the integral for the linear term is 0, we get

$$\int_0^1 f_U(u)\log\frac{\bar f(u)}{f_U(u)}\,du \approx -\frac12\log e\int_0^1 f_U(u)\left[\frac{\bar f(u)}{f_U(u)} - 1\right]^2 du \qquad (12)$$
$$= -\frac12\log e\int_0^1\frac{[\bar f(u) - f_U(u)]^2}{f_U(u)}\,du. \qquad (13)$$
Now $f_U(u)$ varies by at most Δ over any single region, and $\bar f(u)$ lies between the minimum and maximum of $f_U(u)$ in that region. Thus $|\bar f(u) - f_U(u)| \le \Delta$. Since $f_U(u) \ge 1/2$, the integrand above is at most 2Δ², so the magnitude of the right side of (13) is at most Δ² log e.

Exercise 3.7:

(a) Note that $\frac{1}{u(\ln u)^2}$ is the derivative of $-1/\ln u$ and thus integrates to 1 over the interval [e, ∞).

(b)

$$h(U) = \int_e^\infty\frac{1}{u(\ln u)^2}\left[\ln u + 2\ln(\ln u)\right]du = \int_e^\infty\frac{du}{u\ln u} + \int_e^\infty\frac{2\ln(\ln u)}{u(\ln u)^2}\,du.$$

The first integrand above is the derivative of ln(ln u) and thus the integral is infinite. The second integrand is positive for large enough u, and therefore h(U) is infinite.

(c) The hint establishes the result directly.

Exercise 3.8:

(a) As suggested in the hint (and using common sense in any region where f(x) = 0),

$$-D(f\|g) = \int f(x)\ln\frac{g(x)}{f(x)}\,dx \le \int f(x)\left[\frac{g(x)}{f(x)} - 1\right]dx = \int g(x)\,dx - \int f(x)\,dx = 0.$$

Thus D(f‖g) ≥ 0. (A useful feature of divergence is that it exists whether or not a density exists; it can be defined over any quantization of the sample space and it increases as the quantization becomes finer, thus approaching a limit, which might be finite or infinite.)

(b)

$$D(f\|\phi) = \int f(x)\ln\frac{f(x)}{\phi(x)}\,dx = -h(f) + \int f(x)\left[\ln\sqrt{2\pi\sigma^2} + \frac{x^2}{2\sigma^2}\right]dx = -h(f) + \ln\sqrt{2\pi e\sigma^2}.$$

(c) Combining parts (a) and (b), $h(f) \le \ln\sqrt{2\pi e\sigma^2} = \frac12\ln(2\pi e\sigma^2)$. Since D(φ‖φ) = 0, this inequality is satisfied with equality for a Gaussian rv ∼ N(0, σ²).

Exercise 3.9:

(a) For the same reason as for sources with probability densities, each representation point aj must be chosen as the conditional mean of the set of symbols in Rj. Specifically,

$$a_j = \frac{\sum_{i\in R_j} p_i r_i}{\sum_{i\in R_j} p_i}.$$
(b) The symbol ri has a squared error |ri − aj|² if mapped into Rj and thus into aj. Thus ri must be mapped into the closest aj, and thus the region Rj must contain all source symbols that are closer to aj than to any other representation point. The quantization intervals are not uniquely determined by this rule, since Rj can end and Rj+1 can begin at any point between the largest source symbol closest to aj and the smallest source symbol closest to aj+1.

(c) For ri midway between aj and aj+1, the squared error is |ri − aj|² = |ri − aj+1|², no matter whether ri is mapped into aj or aj+1.

(d) In order for the case of part (c) to achieve MMSE, it is necessary for aj and aj+1 to each be the conditional mean of the set of points in the corresponding region. Now assume that aj is the conditional mean of Rj under the assumption that ri is part of Rj. Switching ri to Rj+1 will not change the MSE (as seen in part (c)), but it will change Rj and will thus change the conditional mean of Rj. Moving aj to that new conditional mean will reduce the MSE. The same argument applies if ri is viewed as being in Rj+1, or even if it is viewed as being partly in Rj and partly in Rj+1.
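The two steps that Exercises 3.3, 3.4 and 3.9 reason about can be written down in a few lines. The following is a sketch of the Lloyd-Max iteration for a discrete source (my own illustration; the source values, probabilities and starting points are arbitrary assumptions), and, as parts (c) and (d) of Exercise 3.4 warn, it only finds a local optimum.

```python
# Sketch: Lloyd-Max iteration for a discrete source (the setting of Exercise 3.9).
def lloyd_max(r, p, a, iters=100):
    """r: source values, p: their probabilities, a: initial representation points."""
    a = list(a)
    for _ in range(iters):
        # Step 1: map each source symbol to its nearest representation point.
        regions = [[] for _ in a]
        for ri, pi in zip(r, p):
            j = min(range(len(a)), key=lambda k: (ri - a[k]) ** 2)
            regions[j].append((ri, pi))
        # Step 2: move each representation point to the conditional mean of its region.
        for j, reg in enumerate(regions):
            mass = sum(pi for _, pi in reg)
            if mass > 0:
                a[j] = sum(ri * pi for ri, pi in reg) / mass
    mse = sum(pi * min((ri - aj) ** 2 for aj in a) for ri, pi in zip(r, p))
    return a, mse

r = [-1.0, -0.2, 0.3, 1.1]
p = [0.25, 0.25, 0.25, 0.25]
print(lloyd_max(r, p, a=[-0.5, 0.5]))   # converges to a = [-0.6, 0.7], MSE = 0.16
```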
Chapter 4

Exercise 4.2:

From (4.1) in the text, we have $u(t) = \sum_{k=-\infty}^{\infty}\hat{u}_k e^{2\pi i k t/T}$ for t ∈ [−T/2, T/2]. Substituting this into $\int_{-T/2}^{T/2} u(t)u^*(t)\,dt$, we have

$$\int_{-T/2}^{T/2}|u(t)|^2 dt = \int_{-T/2}^{T/2}\sum_{k=-\infty}^{\infty}\hat{u}_k e^{2\pi i k t/T}\sum_{\ell=-\infty}^{\infty}\hat{u}^*_\ell e^{-2\pi i\ell t/T}\,dt = \sum_{k=-\infty}^{\infty}\sum_{\ell=-\infty}^{\infty}\hat{u}_k\hat{u}^*_\ell\int_{-T/2}^{T/2} e^{2\pi i(k-\ell)t/T}\,dt = \sum_{k=-\infty}^{\infty}\sum_{\ell=-\infty}^{\infty}\hat{u}_k\hat{u}^*_\ell\,T\delta_{k,\ell},$$

where $\delta_{k,\ell}$ equals 1 if k = ℓ and 0 otherwise. Thus,

$$\int_{-T/2}^{T/2}|u(t)|^2 dt = T\sum_{k=-\infty}^{\infty}|\hat{u}_k|^2.$$

Exercise 4.4:

(a) Note that $s_a(k) - s_a(k-1) = a_k \ge 0$, so the sequence $s_a(1), s_a(2), \dots$ is non-decreasing. A standard result in elementary analysis states that a bounded non-decreasing sequence must have a limit. The limit is the least upper bound of the sequence $\{s_a(k);\ k\ge 1\}$.

(b) Let $J_k = \max\{j(1), j(2), \dots, j(k)\}$, i.e., Jk is the largest index in $a_{j(1)},\dots,a_{j(k)}$. Then

$$\sum_{\ell=1}^{k} b_\ell = \sum_{\ell=1}^{k} a_{j(\ell)} \le \sum_{j=1}^{J_k} a_j \le S_a.$$

By the same argument as in part (a), $\sum_{\ell=1}^{k} b_\ell$ has a limit as k → ∞ and the limit, say Sb, is at most Sa.

(c) Using the inverse permutation to define the sequence {ak} from the sequence {bk}, the same argument as in part (b) shows that Sa ≤ Sb. Thus Sa = Sb and the limit is independent of the order of summation.

(d) The simplest example is the sequence {1, −1, 1, −1, ...}. The partial sums here alternate between 1 and 0, so they do not converge at all. Rearranged so as to take two odd-indexed (positive) terms for each even-indexed (negative) term, the partial sums go to ∞. A more common (but more complicated) example is the alternating harmonic series: it converges, but rearranging it to take two odd-indexed terms for each even-indexed term changes the value of the sum.

Exercise 4.5:

(a) For E = I1 ∪ I2, with the left end points satisfying a1 ≤ a2, there are three cases to consider.
• a2 < b1. In this case, all points in I1 and all points in I2 lie between a1 and max{b1, b2}. Conversely, all points in (a1, max{b1, b2}) lie in either I1 or I2. Thus E is a single interval which might or might not include each end point.

• a2 > b1. In this case, I1 and I2 are disjoint.

• a2 = b1. If I1 is open on the right and I2 is open on the left, then I1 and I2 are separated by the single point a2 = b1. Otherwise E is a single interval.

(b) Let Ek = I1 ∪ I2 ∪ ⋯ ∪ Ik and let Jk be the final interval in the separated-interval representation of Ek. We have seen how to find J2 from E2, and note that the starting point of J2 is either a1 or a2. Assume that in general the starting point of Jk is aj for some j, 1 ≤ j ≤ k. Assuming that the starting points are ordered a1 ≤ a2 ≤ ⋯, we see that ak+1 is greater than or equal to the starting point of Jk. Thus Jk ∪ Ik+1 is either a single interval or two separated intervals, by the argument in part (a). Thus Ek+1, in separated-interval form, is Ek with Jk replaced either by two separated intervals, the latter starting with ak+1, or by a single interval starting with the same starting point as Jk. Either way the starting point of Jk+1 is aj for some j, 1 ≤ j ≤ k+1, verifying the initial assumption by induction.

(c) Each interval Jk created above starts with an interval starting point a1, ..., and ends with an interval ending point b1, ..., and therefore all the separated intervals start and end with such points.

(d) Let $I'_1\cup\cdots\cup I'_\ell$ be the union of disjoint intervals arising from the above algorithm and let $I''_1\cup\cdots\cup I''_i$ be any other ordered union of separated intervals. Let k be the smallest integer for which $I'_k \ne I''_k$. Then the starting points or the ending points of these intervals are different, or one of the two intervals is open and one closed on one end. In all of these cases, there is at least one point that is in one of the unions and not the other.

Exercise 4.6:

(a) If we assume that the intervals {Ij; 1 ≤ j < ∞} are ordered in terms of starting points, then the argument in Exercise 4.5 immediately shows that the set of separated intervals stays the same as each new interval Ik+1 is added, except for the possible addition of a new interval at the right or the expansion of the right-most interval. However, with a countably infinite set of intervals, it is not necessarily possible to order the intervals in terms of starting points (e.g., suppose the left end points are the set of rationals in (0,1)).

In the general case, in going from Bk to Bk+1, a single interval Ik+1 is added to Bk. This can add a new separated interval, or extend one of the existing separated intervals, or combine two or more adjacent separated intervals. In each of these cases, each of the separated intervals in Bk (including Ij,k) either stays the same or is expanded. Thus Ij,k ⊆ Ij,k+1.

(b) Since Ij,k ⊆ Ij,k+1, the left end points of the sequence {Ij,k; k ≥ j} form a monotonically decreasing sequence and thus have a limit (including the possibility of −∞). Similarly the right end points are monotonically increasing, and thus have a limit (possibly +∞). Thus $\lim_{k\to\infty} I_{j,k}$ exists as an interval $I'_j$ that might be infinite on either end. Note now that any point in the interior of $I'_j$ must be in Ij,k for some k. The same is true for the left (right) end point of $I'_j$ if $I'_j$ is closed on the left (right). Thus $I'_j \subseteq B$ for each j.

(c) From Exercise 4.5, we know that for each k ≥ 1, the set of intervals {I1,k, I2,k, ...
, Ik,k} is a separated set whose union is Bk. Thus, for each ℓ, j ≤ k, either $I_{\ell,k} = I_{j,k}$ or $I_{\ell,k}$ and $I_{j,k}$ are
separated. If $I_{\ell,k} = I_{j,k}$, then the fact that $I_{j,k}\subseteq I_{j,k+1}$ ensures that $I_{\ell,k+1} = I_{j,k+1}$, and thus, in the limit, $I'_\ell = I'_j$. If $I_{\ell,k}$ and $I_{j,k}$ are separated, then, as explained in part (a), the addition of Ik+1 either maintains the separation or combines $I_{\ell,k}$ and $I_{j,k}$ into a single interval. Thus, as k increases, either $I_{\ell,k}$ and $I_{j,k}$ remain separated or become equal.

(d) The sequence $\{I'_j;\ j\ge 1\}$ is countable, and after removing repetitions it is still countable. It is a separated sequence of intervals by (c). From (b), $\cup_{j=1}^{\infty} I'_j \subseteq B$. Also, since $B = \cup_j I_j \subseteq \cup_j I'_j$, we see that $B = \cup_j I'_j$.

(e) Let $\{I'_j;\ j\ge 1\}$ be the above sequence of separated intervals and let $\{I''_j;\ j\ge 1\}$ be any other sequence of separated intervals such that $\cup_j I''_j = B$. For each j ≥ 1, let $c'_j$ be the center point of $I'_j$. Since $c'_j$ is in B, $c'_j \in I''_k$ for some k ≥ 1. Assume first that $I'_j$ is open on the left. Letting $a'_j$ be the left end point of $I'_j$, the interval $(a'_j, c'_j]$ must be contained in $I''_k$. Since $a'_j\notin B$, $a'_j$ must be the left end point of $I''_k$ and $I''_k$ must be open on the left. Similarly, if $I'_j$ is closed on the left, $a'_j$ is the left end point of $I''_k$ and $I''_k$ is closed on the left. Using the same analysis on the right end point of $I'_j$, we see that $I'_j = I''_k$. Thus the sequence $\{I''_j;\ j\ge 1\}$ contains each interval in $\{I'_j;\ j\ge 1\}$. The same analysis applied to each interval in $\{I''_j;\ j\ge 1\}$ shows that $\{I'_j;\ j\ge 1\}$ contains each interval in $\{I''_j;\ j\ge 1\}$, and thus the two sequences are the same except for possibly different orderings.

Exercise 4.7:

(a) and (b) For any finite unions of intervals E1 and E2, (4.87) in the text states that

$$\mu(E_1) + \mu(E_2) = \mu(E_1\cup E_2) + \mu(E_1\cap E_2) \ge \mu(E_1\cup E_2),$$

where the final inequality follows from the non-negativity of measure and is satisfied with equality if E1 and E2 are disjoint. For part (a), let I1 = E1 and I2 = E2, and for part (b), let Bk = E1 and Ik+1 = E2.

(c) For k = 2, part (a) shows that μ(B2) ≤ μ(I1) + μ(I2). Using this as the initial step of the induction and using part (b) for the inductive step shows that $\mu(B_k) \le \sum_{j=1}^{k}\mu(I_j)$, with equality in the disjoint case.

(d) First assume that μ(B) is finite (this is always the case for measure over the interval [−T/2, T/2]). Then since Bk is non-decreasing in k,

$$\mu(B) = \lim_{k\to\infty}\mu(B_k) \le \lim_{k\to\infty}\sum_{j=1}^{k}\mu(I_j).$$

Alternatively, if μ(B) = ∞, then $\lim_{k\to\infty}\sum_{j=1}^{k}\mu(I_j) = \infty$ also.

Exercise 4.8:

Let $B_n = \cup_{j=1}^{\infty} I_{n,j}$. Then $B = \cup_{n,j} I_{n,j}$. The collection of intervals {I_{n,j}; 1 ≤ n < ∞, 1 ≤ j < ∞} is a countable collection of intervals, since the set of pairs of positive integers is countable.

Exercise 4.12:

(a) By combining parts (a) and (c) of Exercise 4.11, {t : u(t) > β} is measurable for all β. Thus {t : −u(t) < −β} is measurable for all β, so −u(t) is measurable. Next, for β > 0, {t : |u(t)| < β} = {t : u(t) < β} ∩ {t : u(t) > −β}, which is measurable.
(b) {t : u(t) < β} = {t : g(u(t)) < g(β)}, so if u(t) is measurable, then g(u(t)) is also measurable.

(c) Since exp(·) is increasing, exp[u(t)] is measurable by part (b). Part (a) shows that |u(t)| is measurable if u(t) is. Both the squaring function and the log function are increasing for positive arguments, so u²(t) = |u(t)|² and log|u(t)| are measurable.

Exercise 4.13:

(a) Let y(t) = u(t) + v(t). We will show that {t : y(t) < β} is measurable for all real β. Let ε > 0 be arbitrary and k ∈ Z be arbitrary. Then, for any given t,

$$(k-1)\epsilon \le u(t) < k\epsilon \ \text{ and }\ v(t) < \beta - k\epsilon \implies y(t) < \beta.$$

This means that the set of t for which the left side holds is included in the set of t for which the right side holds, so

$$\{t : (k-1)\epsilon \le u(t) < k\epsilon\}\cap\{t : v(t) < \beta - k\epsilon\} \subseteq \{t : y(t) < \beta\}.$$

This subset inequality holds for each integer k and thus must hold for the union over k,

$$\bigcup_k\Big[\{t : (k-1)\epsilon \le u(t) < k\epsilon\}\cap\{t : v(t) < \beta - k\epsilon\}\Big] \subseteq \{t : y(t) < \beta\}.$$

Finally this must hold for all ε > 0, so we choose the sequence ε = 1/n for n ≥ 1, yielding

$$\bigcup_{n\ge 1}\bigcup_k\Big[\{t : (k-1)/n \le u(t) < k/n\}\cap\{t : v(t) < \beta - k/n\}\Big] \subseteq \{t : y(t) < \beta\}.$$

The set on the left is a countable union of measurable sets and thus is measurable. It is also equal to {t : y(t) < β}, since any t in this set also satisfies y(t) < β − 1/n for sufficiently large n.

(b) This can be shown by an adaptation of the argument in (a). If u(t) and v(t) are positive functions, it can also be shown by observing that ln u(t) and ln v(t) are measurable. Thus the sum is measurable by part (a), and exp[ln u(t) + ln v(t)] = u(t)v(t) is measurable.

Exercise 4.14: The hint says it all.

Exercise 4.15:

(a) Restrict attention to t ∈ [−T/2, T/2] throughout. First we show that $v_m(t) = \inf_{n\ge m} u_n(t)$ is measurable for all m ≥ 1. For any given t, if un(t) ≥ V for all n ≥ m, then V is a lower bound to un(t) over n ≥ m, and thus the greatest such lower bound is at least V, i.e., vm(t) ≥ V. Conversely, vm(t) ≥ V implies that un(t) ≥ V for all n ≥ m. Thus,

$$\{t : v_m(t) \ge V\} = \bigcap_{n=m}^{\infty}\{t : u_n(t) \ge V\}.$$

Using Exercise 4.11, the measurability of un implies that {t : un(t) ≥ V} is measurable for each n. The countable intersection above is therefore measurable, and thus, using the result of Exercise 4.11 again, vm(t) is measurable for each m.
Next, if vm(t) ≥ V then vm′(t) ≥ V for all m′ > m. This means that vm(t) is a non-decreasing function of m for each t, and thus limm vm(t) exists for each t. This also means that

$$\{t : \lim_{m\to\infty} v_m(t) \ge V\} = \bigcup_{m=1}^{\infty}\left[\bigcap_{n=m}^{\infty}\{t : u_n(t) \ge V\}\right].$$

This is a countable union of measurable sets and is thus measurable, showing that lim inf un(t) is measurable.

(b) If lim inf_n un(t) = V1 for a given t, then limm vm(t) = V1, which implies that for the given t, the sequence {un(t); n ≥ 1} has a subsequence that approaches V1 as a limit. Similarly, if lim sup_n un(t) = V2 for that t, then the sequence {un(t); n ≥ 1} has a subsequence approaching V2. If V1 < V2, then limn un(t) does not exist for that t, since the sequence oscillates infinitely between V1 and V2. If V1 = V2, the limit does exist and equals V1.

(c) Using the same argument as in part (a), with inf and sup interchanged,

$$\{t : \limsup_n u_n(t) \le V\} = \bigcap_{m=1}^{\infty}\left[\bigcup_{n=m}^{\infty}\{t : u_n(t) \le V\}\right]$$

is also measurable, and thus lim sup un(t) is measurable. It follows from this, with the help of Exercise 4.13(a), that lim sup_n un(t) − lim inf_n un(t) is measurable. Using part (b), limn un(t) exists if and only if this difference equals 0. Thus the set of points on which limn un(t) exists is measurable, and the function that is this limit where it exists and 0 otherwise is measurable.

Exercise 4.16:

As sketched below, un(t) is a rectangular pulse taking the value 2ⁿ from $1/2^{n+1}$ to $3/2^{n+1}$. It follows that for any t ≤ 0, un(t) = 0 for all n. For any fixed t > 0, un(t) = 0 for all n large enough: since un(t) is 0 for all t greater than $3/2^{n+1}$, we have un(t) = 0 for all n > log₂(3/t) − 1. Thus $\lim_{n\to\infty} u_n(t) = 0$ for all t.

Since $\lim_{n\to\infty} u_n(t) = 0$ for all t, it follows that $\int\lim_{n\to\infty} u_n(t)\,dt = 0$. On the other hand, $\int u_n(t)\,dt = 1$ for all n, so $\lim_{n\to\infty}\int u_n(t)\,dt = 1$.

[Figure: the pulses u1(t), u2(t), u3(t), supported on (1/4, 3/4), (1/8, 3/8), and (1/16, 3/16) respectively, with heights 2, 4, and 8.]

Exercise 4.17:

(a) Since u(t) is real valued,

$$\left|\int u(t)\,dt\right| = \left|\int u^+(t)\,dt - \int u^-(t)\,dt\right| \le \left|\int u^+(t)\,dt\right| + \left|\int u^-(t)\,dt\right| = \int u^+(t)\,dt + \int u^-(t)\,dt = \int|u(t)|\,dt.$$
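Referring back to Exercise 4.16 above, the failure of the limit/integral interchange is easy to see numerically. The sketch below is my own crude Riemann-sum illustration (the grid spacing is an assumption), not part of the original solution.

```python
# Sketch: u_n(t) -> 0 pointwise, yet the integral of u_n stays equal to 1.
def u(n, t):
    # rectangular pulse of height 2**n on (1/2**(n+1), 3/2**(n+1))
    return 2.0 ** n if 1 / 2 ** (n + 1) < t < 3 / 2 ** (n + 1) else 0.0

dt = 1e-5
grid = [k * dt for k in range(1, 100_000)]      # covers (0, 1)
for n in (2, 6, 10):
    integral = sum(u(n, t) for t in grid) * dt
    print(n, u(n, 0.3), round(integral, 2))     # u_n(0.3) becomes 0, the integral stays ~1
```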
(b) As in the hint, we select α such that $\alpha\int u(t)\,dt$ is non-negative and real and |α| = 1. Now let αu(t) = v(t) + jw(t), where v(t) and w(t) are the real and imaginary parts of αu(t). Since $\alpha\int u(t)\,dt$ is real, we have $\int w(t)\,dt = 0$ and $\alpha\int u(t)\,dt = \int v(t)\,dt$. Note also that |v(t)| ≤ |αu(t)|. Hence

$$\left|\int u(t)\,dt\right| = \left|\alpha\int u(t)\,dt\right| = \left|\int v(t)\,dt\right| \le \int|v(t)|\,dt \quad\text{(part (a))}$$
$$\le \int|\alpha u(t)|\,dt = \int|\alpha|\,|u(t)|\,dt = \int|u(t)|\,dt.$$

Exercise 4.18:

(a) The meaning of u(t) = v(t) a.e. is that μ{t : |u(t) − v(t)| > 0} = 0. It follows that $\int|u(t)-v(t)|^2\,dt = 0$. Thus u(t) and v(t) are L2-equivalent.

(b) If u(t) and v(t) are L2-equivalent, then $\int|u(t)-v(t)|^2\,dt = 0$. Now suppose that μ{t : |u(t) − v(t)|² > ε} is non-zero for some ε > 0. Then $\int|u(t)-v(t)|^2\,dt \ge \epsilon\,\mu\{t : |u(t)-v(t)|^2 > \epsilon\} > 0$, which contradicts the assumption that u(t) and v(t) are L2-equivalent.

(c) The set {t : |u(t) − v(t)| > 0} can be expressed as

$$\{t : |u(t)-v(t)| > 0\} = \bigcup_{n\ge 1}\{t : |u(t)-v(t)| > 1/n\}.$$

Since each term on the right has zero measure, the countable union also has zero measure. Thus {t : |u(t) − v(t)| > 0} has zero measure and u(t) = v(t) a.e.

Exercise 4.21:

(a) By expanding the magnitude squared within the given integral as a product of the function and its complex conjugate, we get

$$\int\Big|u(t) - \sum_{m=-n}^{n}\sum_{k=-\ell}^{\ell}\hat{u}_{k,m}\theta_{k,m}(t)\Big|^2 dt = \int|u(t)|^2\,dt - \sum_{m=-n}^{n}\sum_{k=-\ell}^{\ell} T|\hat{u}_{k,m}|^2. \qquad (14)$$

Since each increase in n (or similarly in ℓ) subtracts additional non-negative terms, the given integral is non-increasing in n and ℓ.

(b) and (c) The set of terms $T|\hat{u}_{k,m}|^2$ for k ∈ Z and m ∈ Z is a countable set of non-negative terms with a sum bounded by $\int|u(t)|^2\,dt$, which is finite since u(t) is L2. Thus, using the result of Exercise 4.4, the sum over this set of terms is independent of the ordering of the summation. Any scheme for increasing n and ℓ in any order relative to each other in (14) is just an example of this more general ordering and must converge to the same quantity.
Since $u_m(t) = u(t)\,\text{rect}(t/T - m)$ satisfies $\int|u_m(t)|^2\,dt = T\sum_k|\hat{u}_{k,m}|^2$ by Theorem 4.4.1 of the text, it is clear that the limit of (14) as n, ℓ → ∞ is 0, so the limit is the same for any ordering.

There is a subtlety above which is important to understand, but not important enough to justify developing the notation needed to avoid it. The easiest way to understand (14) is by understanding that $\int|u_m(t)|^2\,dt = T\sum_k|\hat{u}_{k,m}|^2$, which suggests taking the limit k → ±∞ for each value of m in (14). This does not correspond to a countable ordering of (k, m). This can be straightened out with epsilons and deltas, but is better left to the imagination of the reader.

Exercise 4.22:

(a) First note that

$$\sum_{m=-n}^{n} u_m(t) = \begin{cases} 0 & |t| > (n+1/2)T,\\ 2u(t) & t = (m+1/2)T,\ |m| < n,\\ u(t) & \text{otherwise.}\end{cases}$$

$$\int\Big|u(t) - \sum_{m=-n}^{n} u_m(t)\Big|^2\,dt = \int_{-\infty}^{(-n-1/2)T}|u(t)|^2\,dt + \int_{(n+1/2)T}^{\infty}|u(t)|^2\,dt.$$

By the definition of an L2 function over an infinite time interval, each of the integrals on the right approaches 0 with increasing n.

(b) Let $u^\ell_m(t) = \sum_{k=-\ell}^{\ell}\hat{u}_{k,m}\theta_{k,m}(t)$. Note that $\sum_{m=-n}^{n} u^\ell_m(t) = 0$ for |t| > (n+1/2)T. We can now write the given integral as

$$\int_{|t|>(n+1/2)T}|u(t)|^2\,dt + \int_{-(n+1/2)T}^{(n+1/2)T}\Big|u(t) - \sum_{m=-n}^{n} u^\ell_m(t)\Big|^2\,dt. \qquad (15)$$

As in part (a), the first integral vanishes as n → ∞.

(c) Since $\hat{u}_{k,m}$ are the Fourier series coefficients of um(t), we know $u_m(t) = \text{l.i.m.}_{\ell\to\infty}\,u^\ell_m(t)$. Hence, for each n, the second integral goes to zero as ℓ → ∞. Thus, for any ε > 0, we can choose n so that the first term is less than ε/2 and then choose ℓ large enough that the second term is less than ε/2. Thus the limit of (15) as n, ℓ → ∞ is 0.

Exercise 4.23:

The LHS of (4.40) is a function of t; its Fourier transform is

$$\mathcal{F}(u(t)*v(t)) = \int_{-\infty}^{\infty}\left(\int_{-\infty}^{\infty} u(\tau)v(t-\tau)\,d\tau\right)e^{-2\pi i f t}\,dt = \int_{-\infty}^{\infty} u(\tau)\left(\int_{-\infty}^{\infty} v(t-\tau)e^{-2\pi i f t}\,dt\right)d\tau$$
$$= \int_{-\infty}^{\infty} u(\tau)\left(\int_{-\infty}^{\infty} v(r)e^{-2\pi i f(\tau+r)}\,dr\right)d\tau = \int_{-\infty}^{\infty} u(\tau)e^{-2\pi i f\tau}\,d\tau\int_{-\infty}^{\infty} v(r)e^{-2\pi i f r}\,dr = \hat{u}(f)\hat{v}(f).$$
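A discrete analogue of this identity is easy to check numerically. The sketch below is my own illustration (the example sequences are arbitrary): with zero-padding, linear convolution corresponds to the product of DFTs.

```python
# Sketch: DFT of a (zero-padded) convolution equals the product of the DFTs.
import numpy as np

u = np.array([1.0, -2.0, 0.5, 3.0])
v = np.array([0.25, 1.0, -1.0])
n = len(u) + len(v) - 1                       # length needed to avoid circular wrap-around
lhs = np.fft.fft(np.convolve(u, v), n)
rhs = np.fft.fft(u, n) * np.fft.fft(v, n)
print(np.allclose(lhs, rhs))                  # True
```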
Exercise 4.24:

(a)

$$\int_{|t|>T}\left|u(t)e^{-2\pi i f t} - u(t)e^{-2\pi i(f-\delta)t}\right|dt = \int_{|t|>T}\left|u(t)e^{-2\pi i f t}\big(1 - e^{2\pi i\delta t}\big)\right|dt = \int_{|t|>T}|u(t)|\left|1 - e^{2\pi i\delta t}\right|dt \le 2\int_{|t|>T}|u(t)|\,dt$$

for all f > 0, δ > 0. Since u(t) is L1, $\int_{-\infty}^{\infty}|u(t)|\,dt$ is finite. Thus, for T large enough, we can make $\int_{|t|>T}|u(t)|\,dt$ as small as we wish. In particular, we can let T be sufficiently large that $2\int_{|t|>T}|u(t)|\,dt$ is less than ε/2. The result follows.

(b) For all f,

$$\int_{|t|\le T}\left|u(t)e^{-2\pi i f t} - u(t)e^{-2\pi i(f-\delta)t}\right|dt = \int_{|t|\le T}|u(t)|\left|1 - e^{2\pi i\delta t}\right|dt.$$

For the T selected in part (a), we can make $|1 - e^{2\pi i\delta t}|$ arbitrarily small for all |t| ≤ T by choosing δ small enough. Also, since u(t) is L1, $\int_{|t|\le T}|u(t)|\,dt$ is finite. Thus, by choosing δ small enough, we can make $\int_{|t|\le T}|u(t)|\,|1 - e^{2\pi i\delta t}|\,dt < \epsilon/2$.

Exercise 4.26:

Exercise 4.11 shows that the sum of two measurable functions is measurable, so the question concerns the energy in au(t) + bv(t). Note that for each t, $|au(t)+bv(t)|^2 \le 2|a|^2|u(t)|^2 + 2|b|^2|v(t)|^2$. Thus since $\int|u(t)|^2\,dt < \infty$ and $\int|v(t)|^2\,dt < \infty$, it follows that $\int|au(t)+bv(t)|^2\,dt < \infty$.

If {t : u(t) ≤ β} is a union of disjoint intervals, then {t : u(t−T) ≤ β} is that same union of intervals, each shifted by T, and therefore it has the same measure. In the general case, any cover of {t : u(t) ≤ β}, if shifted by T, is a cover of {t : u(t−T) ≤ β}. Thus, for all β, μ{t : u(t) ≤ β} = μ{t : u(t−T) ≤ β}.

Similarly, if {t : u(t) ≤ β} is a union of intervals, then {t : u(t/T) ≤ β} is that same set of intervals expanded by a factor of T. This generalizes to arbitrary measurable sets as before. Thus μ{t : u(t) ≤ β} = (1/T)·μ{t : u(t/T) ≤ β}.

Exercise 4.29:

The statement of the exercise contains a misprint: the transform û(f) is limited to |f| ≤ 1/2 (thus making the sampling theorem applicable) rather than the function being time-limited. For the given sampling coefficients, we have

$$u(t) = \sum_k u(k)\,\text{sinc}(t-k) = \sum_{k=-n}^{n}(-1)^k\,\text{sinc}(t-k),$$

$$u\Big(n+\frac12\Big) = \sum_{k=-n}^{n}(-1)^k\,\text{sinc}\Big(n+\frac12-k\Big) = \sum_{k=-n}^{n}\frac{(-1)^k(-1)^{n-k}}{\pi[n-k+\frac12]}. \qquad (16)$$

Since n is even, $(-1)^k(-1)^{n-k} = (-1)^n = 1$. Substituting j for n−k, we then have

$$u\Big(n+\frac12\Big) = \sum_{j=0}^{2n}\frac{1}{\pi(j+\frac12)}. \qquad (17)$$
The approximation $\sum_{k=m_1}^{m_2}\frac{1}{k+1/2}\approx\ln\frac{m_2+1}{m_1}$ comes from approximating the sum by an integral and is quite accurate for m1 ≫ 0. To apply this approximation to (17), we must at least omit the term j = 0, and this gives us the approximation

$$u\Big(n+\frac12\Big) \approx \frac{2}{\pi} + \frac{1}{\pi}\ln(2n+1).$$

This goes to infinity logarithmically in n as n → ∞. The approximation can be improved by removing the first few terms from (17) before applying the approximation, but the term ln(2n+1) remains.

We can evaluate u(n+m+1/2) and u(n−m−1/2) by the same procedure as in (16). In particular,

$$u\Big(n+m+\frac12\Big) = \sum_{k=-n}^{n}(-1)^k\,\text{sinc}\Big(n+m+\frac12-k\Big) = \sum_{k=-n}^{n}\frac{(-1)^k(-1)^{n+m-k}}{\pi[n+m-k+\frac12]} = \sum_{j=m}^{2n+m}\frac{(-1)^{n+m}}{\pi[j+\frac12]},$$

$$u\Big(n-m-\frac12\Big) = \sum_{k=-n}^{n}\frac{(-1)^k(-1)^{n-m-k}}{\pi[n-m-k-\frac12]} = \sum_{j=-m}^{2n-m}\frac{(-1)^{n-m}}{\pi[j-\frac12]}.$$

Taking magnitudes,

$$\Big|u\Big(n+m+\frac12\Big)\Big| = \sum_{j=m}^{2n+m}\frac{1}{\pi[j+\frac12]}; \qquad \Big|u\Big(n-m-\frac12\Big)\Big| = \Big|\sum_{j=-m}^{2n-m}\frac{1}{\pi[j-\frac12]}\Big|.$$

All terms in the first expression above are positive, whereas those in the second expression are negative for j ≤ 0. We break this second expression into positive and negative terms:

$$\sum_{j=-m}^{2n-m}\frac{1}{\pi[j-\frac12]} = \sum_{k=-m}^{0}\frac{1}{\pi[k-\frac12]} + \sum_{j=1}^{2n-m}\frac{1}{\pi[j-\frac12]} = \sum_{k=-m}^{0}\frac{1}{\pi[k-\frac12]} + \sum_{j=0}^{2n-m-1}\frac{1}{\pi[j+\frac12]}.$$

For each j, 0 ≤ j ≤ m, the term in the second sum above is the negative of the term in the first sum with k = −j. Cancelling these terms out,

$$\Big|u\Big(n-m-\frac12\Big)\Big| = \sum_{j=m+1}^{2n-m-1}\frac{1}{\pi[j+\frac12]}.$$

This is a sum of positive terms and is a subset of the positive terms in |u(n+m+1/2)|, establishing that |u(n−m−1/2)| ≤ |u(n+m+1/2)|.

What is happening here is that for points inside [−n, n], the sinc functions from the samples on one side of the point cancel out the sinc functions from the samples on the other side. The particular samples in this exercise have been chosen to illustrate that truncating the samples of a bandlimited function and truncating the function can have very different effects. Here the function with truncated samples oscillates wildly (growing at least logarithmically in n), with the oscillations larger outside of the interval than inside. Thus most of the energy in the function resides outside of the region where the samples are nonzero.
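The logarithmic growth of the truncated expansion at the half-sample points is easy to confirm numerically. The sketch below is my own check of (17) against the approximation above; both columns grow by about ln(10)/π ≈ 0.73 per tenfold increase in n, though they differ by a bounded constant because the integral approximation is crude for the small-j terms.

```python
# Sketch: the sum (17) versus 2/pi + ln(2n+1)/pi.
from math import pi, log

for n in (10, 100, 1000, 10000):
    exact = sum(1 / (pi * (j + 0.5)) for j in range(2 * n + 1))
    approx = 2 / pi + log(2 * n + 1) / pi
    print(n, round(exact, 3), round(approx, 3))
```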
Exercise 4.31:

(a) Note that g(t) = p²(t) where p(t) = sinc(Wt). Thus ĝ(f) is the convolution of p̂(f) with itself. Since $\hat{p}(f) = \frac{1}{W}\text{rect}(\frac{f}{W})$, we can convolve graphically to get the triangle function

$$\hat{g}(f) = \frac{1}{W}\Big(1 - \frac{|f|}{W}\Big) \ \text{ for } |f|\le W, \qquad \hat{g}(f) = 0 \ \text{ otherwise},$$

where W = 1/(2T).

(b) Since $u(t) = \sum_k u(kT)\,\text{sinc}(2Wt-k)$, it follows that $v(t) = \sum_k u(kT)\,[\text{sinc}(2Wt-k)*g(t)]$. Letting h(t) = sinc(t/T) ∗ g(t), we see that $\hat{h}(f) = T\,\text{rect}(Tf)\,\hat{g}(f)$. Since rect(Tf) = 1 over the range where ĝ(f) is non-zero, $\hat{h}(f) = T\hat{g}(f)$. Thus h(t) = Tg(t). It follows that

$$v(t) = \sum_k T\,u(kT)\,g(t-kT). \qquad (18)$$

(c) Note that g(t) ≥ 0 for all t. This is the feature of g(t) that makes it useful in generating amplitude-limited pulses. Thus, since u(kT) ≥ 0 for each k, each term in the sum is non-negative, and v(t) is non-negative.

(d) The obvious but incomplete way to see that $\sum_k\text{sinc}(t/T-k) = 1$ is to observe that each sample of the constant function 1 is 1, so this is just the sampling expansion of a constant. Unfortunately, u(t) = 1 is not L2, so the sampling theorem does not apply. The problem is more than nit-picking, since, for example, the sampling expansion of a sequence of alternating 1's and −1's does not converge (as can be seen from Exercise 4.29). The desired result follows here from noting that both the sampling expansion and the constant function 1 are periodic in T and both are L2 over one period. Taking the Fourier series over a period establishes the equality.

(e) To evaluate $\sum_k g(t-kT)$, consider (18) with each u(kT) = 1. For this choice, it follows that $\sum_k g(t-kT) = v(t)/T$. To evaluate v(t) for this choice, note that u(t) = 1 and v(t) = u(t) ∗ g(t), so that v(t) can be regarded as the output when the constant 1 is passed through the filter g(t). The output is then constant also and equal to $\int g(t)\,dt = \hat{g}(0) = 1/W$. Thus $\sum_k g(t-kT) = 1/(TW) = 2$.

(f) Note that $v(t) = \sum_k u(kT)\,Tg(t-kT)$ is non-decreasing, for each t, in each sample u(kT). Thus $v(t) \le \sum_k Tg(t-kT)$, which as we have seen is simply 2T.

(h) Since g is real and non-negative and each |u(kT)| ≤ 1,

$$|v(t)| \le \sum_k|u(kT)|\,Tg(t-kT) \le 2T \quad\text{for all } t.$$

We will find in Chapter 6 that g(t) is not a very good modulation waveform at a sample separation T, but it could be used at a sample separation 2T.

Exercise 4.33:

Consider the sequence of functions vm(t) = rect(t − m) for m ∈ Z⁺, i.e., time-spaced rectangular pulses. For every t, $\lim_{m\to\infty}\text{rect}(t-m) = 0$, so this sequence converges pointwise to 0. However, $\int|\text{rect}(t-m) - \text{rect}(t-n)|^2\,dt = 2$ for all n ≠ m, so L2 convergence is impossible.
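Referring back to part (e) of Exercise 4.31, the identity $\sum_k g(t-kT) = 1/(TW) = 2$ can be checked numerically. The sketch below is my own check; the values of W and T, the truncation K, and the test points are assumptions.

```python
# Sketch: sum_k sinc^2(W(t - kT)) is ~2 for all t when T = 1/(2W).
from math import sin, pi

def sinc(x):
    return 1.0 if x == 0 else sin(pi * x) / (pi * x)

W, T, K = 1.0, 0.5, 5000
for t in (0.0, 0.13, 0.37):
    total = sum(sinc(W * (t - k * T)) ** 2 for k in range(-K, K + 1))
    print(t, round(total, 4))     # ~2.0 at each t (finite-K truncation error ~1e-4)
```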
Exercise 4.37:
(a)
$$\int |\hat s(f)|\,df = \int \Bigl|\sum_m \hat u\bigl(f + \tfrac{m}{T}\bigr)\mathrm{rect}(fT)\Bigr|\,df \le \int \sum_m \Bigl|\hat u\bigl(f + \tfrac{m}{T}\bigr)\mathrm{rect}(fT)\Bigr|\,df = \int |\hat u(f)|\,df,$$
which shows that $\hat s(f)$ is L1 if $\hat u(f)$ is.
(b) The following sketch makes it clear that $\hat u(f)$ is L1 and L2.
[Sketch of $\hat u(f)$ (unit-height rectangles of shrinking width around the integers) and of the aliased transform $\hat s(f)$.]
In particular,
$$\int |\hat u(f)|\,df = \int |\hat u(f)|^2\,df = 2\sum_{k\ge 1} \frac{1}{k^2} < \infty.$$
It can be seen from the sketch of $\hat s(f)$ that $\hat s(f) = 2$ from $\tfrac18$ to $\tfrac12$ and from $-\tfrac12$ to $-\tfrac18$, which is a set of measure $3/4$. In general, for arbitrary integer $k > 0$, it can be seen that $\hat s(f) = 2k$ from $\tfrac{1}{2(k+1)^2}$ to $\tfrac{1}{2k^2}$ and from $-\tfrac{1}{2k^2}$ to $-\tfrac{1}{2(k+1)^2}$. Thus $\hat s(f) = 2k$ over a set of measure $\tfrac{2k+1}{k^2(k+1)^2}$. It follows that
$$\int |\hat s(f)|^2\,df = \lim_{n\to\infty}\sum_{k=1}^{n} (2k)^2\,\frac{2k+1}{k^2(k+1)^2} = \lim_{n\to\infty}\sum_{k=1}^{n} \frac{4(2k+1)}{(k+1)^2} \ge \lim_{n\to\infty}\sum_{k=1}^{n} \frac{4(k+1)}{(k+1)^2} = \sum_{k=1}^{\infty} \frac{4}{k+1} = \infty.$$
(c) Note that $\hat u(f) = 1$ for every positive integer value of $f$, and thus (for positive $\epsilon$) $\hat u(f)\,f^{1+\epsilon}$ grows without bound along the positive integers. It is 0 for other arbitrarily large values of $f$, and thus no limit exists.
Exercise 4.38:
$$\int_{-\infty}^{\infty} |u(t)|^2\,dt = 2\Bigl(1 + \frac{1}{2^2} + \frac{1}{3^2} + \cdots\Bigr).$$
This sum is finite, so $u(t)$ is L2. Now we show that
$$s(t) = \sum_k u(k)\,\mathrm{sinc}(t - k) = \sum_k \mathrm{sinc}(t - k)$$
is neither L1 nor L2. Taking the Fourier transform of $s(t)$,
$$\hat s(f) = \sum_k \mathrm{rect}(f)\,e^{-2\pi i f k} = \mathrm{rect}(f)\sum_k e^{-2\pi i f k}.$$
To show that $s(t)$ is not L1,
$$\int_{-\infty}^{\infty} |s(t)|\,dt = \int_{-\infty}^{\infty} s(t)\,dt \quad(\text{since } s(t) \ge 0 \text{ for all } t) \;=\; \hat s(0) = \sum_k 1 = \infty.$$
To show that $s(t)$ is not L2,
$$\int_{-\infty}^{\infty} |s(t)|^2\,dt = \int_{-\infty}^{\infty} \Bigl|\sum_k \mathrm{sinc}(t - k)\Bigr|^2\,dt = \infty.$$
Since $u(k)$ is equal to 1 for every integer $k$, $\sum_k u^2(k) = \infty$. The sampling theorem energy equation does not apply here ($\int |u(t)|^2\,dt \ne T\sum_k |u(kT)|^2$) because $\hat u(f)$ is not band-limited.
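The divergence of $\int |s(t)|^2\,dt$ can also be seen numerically: the partial sums $\sum_{|k|\le n}\mathrm{sinc}(t-k)$ are close to 1 over most of $[-n, n]$, so the energy computed over that interval grows roughly in proportion to $n$. The sketch below is an added illustration (not part of the original solution); the grid resolution is an arbitrary choice.

```python
import numpy as np

def energy_of_partial_sum(n, pts_per_unit=20):
    """Energy over [-n, n] of s_n(t) = sum over |k| <= n of sinc(t - k)."""
    t = np.linspace(-n, n, 2 * n * pts_per_unit + 1)
    k = np.arange(-n, n + 1)
    s = np.sum(np.sinc(t[:, None] - k[None, :]), axis=1)   # partial sinc expansion
    return np.sum(s ** 2) * (t[1] - t[0])                   # simple Riemann sum

for n in (10, 50, 200):
    print(n, round(energy_of_partial_sum(n), 1))            # grows roughly like 2n
```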
Chapter 5
Exercise 5.1: The first algorithm starts with a set of vectors $S = \{v_1, \ldots, v_m\}$ that span $\mathcal{V}$ but are dependent. A vector $v_k \in S$ is selected that is a linear combination of the other vectors in $S$. Removing $v_k$ from $S$ forms a reduced set $S'$. Now $S'$ still spans $\mathcal{V}$, since each $v \in \mathcal{V}$ is a linear combination of vectors in $S$, and $v_k$ in that expansion can be replaced by its representation in terms of the other vectors. If $S'$ is independent, we are done; if not, the previous step is repeated with $S'$ replacing $S$. Since the size of $S$ is reduced by 1 on each such step, the algorithm terminates with an independent spanning set, i.e., a basis.
The second algorithm starts with an independent set $S = \{v_1, \ldots, v_m\}$ of vectors that do not span the space. An arbitrary nonzero vector $v_{m+1} \in \mathcal{V}$ is then selected that is not a linear combination of the vectors in $S$ (this is possible since $S$ does not span $\mathcal{V}$). It can be seen that $S' = \{v_1, \ldots, v_{m+1}\}$ is an independent set. If $S'$ spans $\mathcal{V}$, we are done; if not, the previous step is repeated with $S'$ replacing $S$. With each repetition of this step, the independent set is increased by one vector until it eventually spans $\mathcal{V}$.
It is not immediately clear that the second algorithm ever terminates. To prove this, and also to prove that all bases of a finite-dimensional vector space have the same number of elements, we describe a third algorithm. Let $S_{\mathrm{ind}} = \{v_1, \ldots, v_m\}$ be an arbitrary set of independent vectors and let $S_{\mathrm{sp}} = \{u_1, \ldots, u_n\}$ be a finite spanning set for $\mathcal{V}$ (which must exist by the finite-dimensional assumption). Then, for $k = 1, \ldots, m$, successively add $v_k$ to $S_{\mathrm{sp}}$ and remove one of the original vectors $u_j$ of $S_{\mathrm{sp}}$ so that the remaining set, say $S'_{\mathrm{sp}}$, is still a spanning set. This is always possible since the added element must be a linear combination of a spanning set, so the augmented set is linearly dependent. One of the original elements of $S_{\mathrm{sp}}$ can be removed (while maintaining the spanning property) since the newly added vector is not a linear combination of the previously added vectors. A contradiction occurs if $m > n$, i.e., if the independent set is larger than the spanning set, since no more than the $n$ original vectors in the spanning set can be removed.
We have just shown that every spanning set contains at least as many members as any independent set. Since every basis is both a spanning set and an independent set, this means that every basis contains the same number of elements, say $b$. Since every independent set contains at most $b$ elements, algorithm 2 must terminate with a basis when $S$ reaches $b$ vectors.
Exercise 5.3: Let the $n$ vectors that uniquely span a vector space $\mathcal{V}$ be called $v_1, v_2, \ldots, v_n$. We prove that these $n$ vectors are linearly independent by contradiction. Assume $v_1, v_2, \ldots, v_n$ are linearly dependent. Then $\sum_{j=1}^{n} \alpha_j v_j = 0$ for some set of scalars $\alpha_1, \ldots, \alpha_n$, not all zero. Say $\alpha_k \ne 0$. We can then express $v_k$ as a linear combination of the other $n-1$ vectors $\{v_j\}_{j\ne k}$:
$$v_k = \sum_{j\ne k} \frac{-\alpha_j}{\alpha_k}\,v_j.$$
Thus $v_k$ has two representations in terms of $\{v_1, \ldots, v_n\}$. One is that above, and the other is $v_k = \sum_j \beta_j v_j$ where $\beta_k = 1$ and $\beta_j = 0$ for $j \ne k$. Thus the representation is non-unique, demonstrating the contradiction. It follows that if $n$ vectors uniquely span a vector space, they are also independent and thus form a basis. From Theorem 5.1.1, the dimension of $\mathcal{V}$ is then $n$.
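The reduction step of the first algorithm in Exercise 5.1 is easy to sketch numerically: repeatedly drop any vector that is a linear combination of the remaining ones, detected by a rank test. This is an added illustration under finite-dimensional assumptions (the helper name and the example vectors are arbitrary), not a statement about the text's own algorithm beyond what is described above.

```python
import numpy as np

def reduce_to_basis(vectors, tol=1e-10):
    """Repeatedly drop a vector that is in the span of the others,
    until the remaining set is independent (algorithm 1 of Exercise 5.1)."""
    vecs = [np.asarray(v, dtype=float) for v in vectors]
    changed = True
    while changed:
        changed = False
        for i in range(len(vecs)):
            others = vecs[:i] + vecs[i + 1:]
            if others and (np.linalg.matrix_rank(np.column_stack(others), tol=tol)
                           == np.linalg.matrix_rank(np.column_stack(vecs), tol=tol)):
                vecs.pop(i)          # vecs[i] was dependent on the others
                changed = True
                break
    return vecs

# Example: four vectors spanning R^2 reduce to a basis of two vectors.
spanning = [[1, 0], [0, 1], [1, 1], [2, 3]]
print(len(reduce_to_basis(spanning)))    # -> 2
```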
Exercise 5.6:
$$\|v + u\|^2 = \langle v+u,\, v+u\rangle = \langle v,\, v+u\rangle + \langle u,\, v+u\rangle \quad\text{(by axiom (b))}$$
$$= \langle v,v\rangle + \langle v,u\rangle + \langle u,v\rangle + \langle u,u\rangle \quad\text{(by axiom (b))}$$
$$\le |\langle v,v\rangle| + |\langle v,u\rangle| + |\langle u,v\rangle| + |\langle u,u\rangle|$$
$$\le \|v\|^2 + \|v\|\,\|u\| + \|u\|\,\|v\| + \|u\|^2 = (\|v\| + \|u\|)^2,$$
where the last inequality uses the Schwarz inequality. So $\|v + u\| \le \|v\| + \|u\|$.
Exercise 5.8:
(a) By direct substitution of $u(t) = \sum_{k,m} \hat u_{k,m}\theta_{k,m}(t)$ and $v^*(t) = \sum_{k,m} \hat v^*_{k,m}\theta^*_{k,m}(t)$ into the inner product definition,
$$\langle u, v\rangle = \int_{-\infty}^{\infty} u(t)v^*(t)\,dt = \int_{-\infty}^{\infty} \sum_{k,m} \hat u_{k,m}\theta_{k,m}(t) \sum_{k',m'} \hat v^*_{k',m'}\theta^*_{k',m'}(t)\,dt = \sum_{k,m} \hat u_{k,m} \sum_{k',m'} \hat v^*_{k',m'} \int_{-\infty}^{\infty} \theta_{k,m}(t)\theta^*_{k',m'}(t)\,dt = T\sum_{k,m} \hat u_{k,m}\hat v^*_{k,m}.$$
(b) For any real numbers $a$ and $b$, $0 \le (a - b)^2 = a^2 - 2ab + b^2$. It follows that $ab \le \tfrac12 a^2 + \tfrac12 b^2$. Applying this to $|\hat u_{k,m}|$ and $|\hat v_{k,m}|$, we see that
$$|\hat u_{k,m}\hat v^*_{k,m}| = |\hat u_{k,m}|\,|\hat v^*_{k,m}| \le \tfrac12 |\hat u_{k,m}|^2 + \tfrac12 |\hat v_{k,m}|^2.$$
Thus, using part (a),
$$|\langle u, v\rangle| \le T\sum_{k,m} |\hat u_{k,m}\hat v^*_{k,m}| \le \frac{T}{2}\sum_{k,m} |\hat u_{k,m}|^2 + \frac{T}{2}\sum_{k,m} |\hat v_{k,m}|^2.$$
Since $u$ and $v$ are L2, the latter sums above are finite, so $|\langle u, v\rangle|$ is also finite.
(c) It is necessary for inner products in an inner-product space to be finite since, by definition of a complex inner-product space, the inner product must be a complex number, and the set of complex numbers (just like the set of real numbers) does not include $\infty$. This seems like a technicality, but it is central to the special properties held by finite-energy functions.
Exercise 5.9:
(a) For $\mathcal{V}$ to be a vector subspace, it is necessary for $v = 0$ to be an element of $\mathcal{V}$, and this is only possible in the special case where $\|u_1\| = \|u_2\|$. Even in this case, however, $\mathcal{V}$ is not a vector space; this will be shown at the end of part (b). It will be seen in studying detection in Chapter 8 that $\mathcal{V}$ is an important set of vectors, subspace or not.
(b) $\mathcal{V}$ can be rewritten as $\mathcal{V} = \{v : \|v - u_1\|^2 = \|v - u_2\|^2\}$. Expanding these energy differences for $k = 1, 2$,
$$\|v - u_k\|^2 = \|v\|^2 - \langle v, u_k\rangle - \langle u_k, v\rangle + \|u_k\|^2 = \|v\|^2 + \|u_k\|^2 - 2\Re(\langle v, u_k\rangle).$$
It follows that $v \in \mathcal{V}$ if and only if
$$\|v\|^2 + \|u_1\|^2 - 2\Re(\langle v, u_1\rangle) = \|v\|^2 + \|u_2\|^2 - 2\Re(\langle v, u_2\rangle).$$
Rearranging terms, $v \in \mathcal{V}$ if and only if
$$\Re(\langle v,\, u_2 - u_1\rangle) = \frac{\|u_2\|^2 - \|u_1\|^2}{2}. \qquad (19)$$
Now to complete part (a), assume $\|u_2\|^2 = \|u_1\|^2$ (which is necessary for $\mathcal{V}$ to be a vector space) and assume $u_1 \ne u_2$ to avoid the trivial case where $\mathcal{V}$ is all of L2. Now let $v = i(u_2 - u_1)$. Then $\langle v, u_2 - u_1\rangle$ is pure imaginary, so $v \in \mathcal{V}$. But $iv$ is not in $\mathcal{V}$ since $\langle iv, u_2 - u_1\rangle = -\|u_2 - u_1\|^2 \ne 0$. In a vector subspace, multiplication by a scalar (in this case $i$) yields another element of the subspace, so $\mathcal{V}$ is not a subspace except in the trivial case where $u_1 = u_2$.
(c) Substituting $(u_1 + u_2)/2$ for $v$, we see that $\|v - u_1\| = \|(u_2 - u_1)/2\|$ and $\|v - u_2\| = \|(u_1 - u_2)/2\|$, so $\|v - u_1\| = \|v - u_2\|$ and consequently $v \in \mathcal{V}$.
(d) The geometric situation is clearer if the underlying class of functions is the class of real L2 functions. In that case $\mathcal{V}$ is a subspace whenever $\|u_1\| = \|u_2\|$. If $\|u_1\| \ne \|u_2\|$, then $\mathcal{V}$ is a hyperplane. In general, a hyperplane $\mathcal{H}$ is defined in terms of a vector $u$ and a subspace $\mathcal{S}$ as $\mathcal{H} = \{v : v = u + s \text{ for some } s \in \mathcal{S}\}$. In $\mathbb{R}^2$ a hyperplane is a straight line, not necessarily through the origin, and in $\mathbb{R}^3$ a hyperplane is either a plane or a line, neither necessarily including the origin. For complex L2, $\mathcal{V}$ is not a hyperplane. Part of the reason for this exercise is to see that real L2 and complex L2, while similar in many respects, are very different in others, especially in matters involving vector subspaces.
Exercise 5.12:
(a) To show that $\mathcal{S}^\perp$ is a subspace of $\mathcal{V}$, we need to show that for any $v_1, v_2 \in \mathcal{S}^\perp$, $\alpha v_1 + \beta v_2 \in \mathcal{S}^\perp$ for all scalars $\alpha, \beta$. If $v_1, v_2 \in \mathcal{S}^\perp$, then for all $w \in \mathcal{S}$,
$$\langle \alpha v_1 + \beta v_2,\, w\rangle = \alpha\langle v_1, w\rangle + \beta\langle v_2, w\rangle = 0 + 0 = 0.$$
Thus $\alpha v_1 + \beta v_2 \in \mathcal{S}^\perp$ and $\mathcal{S}^\perp$ is a subspace of $\mathcal{V}$.
(b) By the Projection Theorem, for any $u \in \mathcal{V}$ there is a unique vector $u_{|\mathcal{S}} \in \mathcal{S}$ such that $\langle u - u_{|\mathcal{S}},\, s\rangle = 0$ for all $s \in \mathcal{S}$. So $u_{\perp\mathcal{S}} = u - u_{|\mathcal{S}} \in \mathcal{S}^\perp$ and we have a unique decomposition of $u$ into $u = u_{|\mathcal{S}} + u_{\perp\mathcal{S}}$.
(c) Let $\dim(\mathcal{V})$ and $\dim(\mathcal{S})$ (where $\dim(\mathcal{S}) < \dim(\mathcal{V})$) denote the dimensions of $\mathcal{V}$ and $\mathcal{S}$ respectively. Start with a set of $\dim(\mathcal{V})$ independent vectors $s_1, s_2, \ldots \in \mathcal{V}$, chosen so that the first $\dim(\mathcal{S})$ of them are in $\mathcal{S}$. The first $\dim(\mathcal{S})$ orthonormal vectors obtained by the Gram-Schmidt procedure will be a basis for $\mathcal{S}$, and the next $\dim(\mathcal{V}) - \dim(\mathcal{S})$ orthonormal vectors obtained by the procedure will be a basis for $\mathcal{S}^\perp$ (as sketched below).
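The construction in part (c) of Exercise 5.12 can be illustrated with a short Gram-Schmidt sketch in $\mathbb{R}^3$. This is an added example (the particular vectors and the subspace $\mathcal{S}$ are arbitrary choices), not part of the original solution.

```python
import numpy as np

def gram_schmidt(vectors):
    """Orthonormalize the given independent vectors, in order."""
    basis = []
    for v in np.asarray(vectors, dtype=float):
        for b in basis:
            v = v - np.dot(v, b) * b          # remove the component along b
        basis.append(v / np.linalg.norm(v))
    return np.array(basis)

# V = R^3; S = span{(1,1,0), (1,0,0)}; a third vector completes an independent set.
vecs = [[1, 1, 0], [1, 0, 0], [1, 1, 1]]
q = gram_schmidt(vecs)
# q[0], q[1] form an orthonormal basis for S; q[2] spans S-perp.
print(np.round(q, 3))
print(np.round(q[2] @ np.array(vecs[:2]).T, 6))   # -> [0. 0.], i.e. q[2] is orthogonal to S
```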
Exercise 5.14:
(a) Assume throughout this part that $m, n$ are positive integers with $m > n$. We will show, as case 1, that if the left end, $a_m$, of the pulse $g_m(t)$ satisfies $a_m < a_n$, then $a_m + 2^{-m-1} < a_n$, i.e., the pulses do not overlap at all. As case 2, we will show that if $a_m \in (a_n, a_n + 2^{-n-1})$, then $a_m + 2^{-m-1} \in [a_n, a_n + 2^{-n-1}]$, i.e., the pulses overlap completely.
Case 1: Let $d_m$ be the denominator of the rational number $a_m$ (in reduced form). Since $a_n d_n$ and $a_m d_m$ are integers, it follows that if $a_m < a_n$, then also $a_m + \frac{1}{d_n d_m} \le a_n$. Since $d_n \le d_m \le m$ for $m \ge 3$, we have $a_m + \frac{1}{m^2} \le a_n$ for $m \ge 3$. Since $1/m^2 > 2^{-m-1}$ for $m \ge 3$, it follows that $a_m + 2^{-m-1} \le a_n$. Thus, if $a_m < a_n$, $g_m$ and $g_n$ do not overlap for any $m \ge 3$. Since $g_2$ does not overlap $g_1$ by inspection, there can be no partial overlap for any $a_m < a_n$.
Case 2: Apologies! This is very tedious. Assume that $a_m \in (a_n, a_n + 2^{-n-1})$. By the same argument as above,
$$a_m \ge a_n + \frac{1}{d_n d_m} \qquad\text{and}\qquad a_m + \frac{1}{d_m d'_n} \le a_n + 2^{-n-1}, \qquad (20)$$
where $d'_n$ is the denominator of $a_n + 2^{-n-1}$. Combining these inequalities,
$$\frac{1}{d_n d_m} < 2^{-n-1}. \qquad (21)$$
We now separate case 2 into three subcases. First, from inspection of Figure 5.3 in the text, there are no partial overlaps for $m < 8$. Next consider $m \ge 8$ and $n \le 4$. From the right side of (20), there can be no partial overlap if
$$2^{-m-1} \le \frac{1}{d_m d'_n} \qquad\text{(condition for no partial overlap).} \qquad (22)$$
From direct evaluation, we see that $d'_n \le 48$ for $n \le 4$. Now $d_m 2^{-m-1}$ is $5/512$ for $m = 8$ and is decreasing for $m \ge 8$. Since $5/512 < 1/48$, there is no partial overlap for $n \le 4$, $m \ge 8$.
Next we consider the general case where $n \ge 5$. From (21), we now derive a general condition on how small $m$ can be for $m, n$ pairs that satisfy the conditions of case 2. Since $m \ge d_m$ for $m \ge 3$, we have
$$m > \frac{2^{n+1}}{d_n}. \qquad (23)$$
For $n \ge 5$, $2^{n+1}/d_n \ge 2n + 2$, so the general case reduces to $n \ge 5$ and $m \ge 2n + 2$. Next consider the condition for no partial overlap in (22). Since $d'_n \le 2^{n+1}d_n \le 2^{n+1}n$ and $d_m \le m$, the following condition also implies no partial overlap:
$$m\,2^{-m-1} \le \frac{2^{-n-1}}{n}. \qquad (24)$$
The left side of (24) is decreasing in $m$, so if we can establish (24) for $m = 2n + 2$, it is established for all $m \ge 2n + 2$. The left side for $m = 2n + 2$ is $(2n+2)2^{-2n-3}$. Thus all that remains is to show that $(2n + 2)n \le 2^{n+2}$. This, however, is obvious for $n \ge 5$.
Exercise 5.15: Using the same notation as in the proof of Theorem 4.5.1,
$$u^{(n)}(t) = \sum_{m=-n}^{n}\sum_{k=-n}^{n} \hat u_{k,m}\,\theta_{k,m}(t), \qquad \hat u^{(n)}(f) = \sum_{m=-n}^{n}\sum_{k=-n}^{n} \hat u_{k,m}\,\psi_{k,m}(f).$$
Since $\psi_{k,m}(f)$ is the Fourier transform of $\theta_{k,m}(t)$ for each $k, m$, the coefficients $\hat u_{k,m}$ are the same in each expansion. In the same way,
$$v^{(n)}(t) = \sum_{m=-n}^{n}\sum_{k=-n}^{n} \hat v_{k,m}\,\theta_{k,m}(t), \qquad \hat v^{(n)}(f) = \sum_{m=-n}^{n}\sum_{k=-n}^{n} \hat v_{k,m}\,\psi_{k,m}(f).$$
It is elementary, using the orthonormality of the $\theta_{k,m}$ and the orthonormality of the $\psi_{k,m}$, to see that for all $n > 0$,
$$\langle u^{(n)}, v^{(n)}\rangle = \sum_{m=-n}^{n}\sum_{k=-n}^{n} \hat u_{k,m}\hat v^*_{k,m} = \langle \hat u^{(n)}, \hat v^{(n)}\rangle. \qquad (25)$$
Thus our problem is to show that this same relationship holds in the limit $n \to \infty$. We know (from Theorem 4.5.1) that $\mathrm{l.i.m.}_{n\to\infty}\,u^{(n)} = u$, with the corresponding limits for $v^{(n)}$, $\hat u^{(n)}$, and $\hat v^{(n)}$. Using the Schwarz inequality on the second step below, and Bessel's inequality on the third,
$$|\langle u^{(n)}, v\rangle - \langle u^{(n)}, v^{(n)}\rangle| = |\langle u^{(n)},\, v - v^{(n)}\rangle| \le \|u^{(n)}\|\,\|v - v^{(n)}\| \le \|u\|\,\|v - v^{(n)}\|.$$
Since $\lim_{n\to\infty} \|v - v^{(n)}\| = 0$, we see that $\lim_{n\to\infty} |\langle u^{(n)}, v\rangle - \langle u^{(n)}, v^{(n)}\rangle| = 0$. In the same way, $\lim_{n\to\infty} |\langle u^{(n)}, v\rangle - \langle u, v\rangle| = 0$. Combining these limits, and going through the same operations on the transform side,
$$\lim_{n\to\infty}\langle u^{(n)}, v^{(n)}\rangle = \langle u, v\rangle, \qquad \lim_{n\to\infty}\langle \hat u^{(n)}, \hat v^{(n)}\rangle = \langle \hat u, \hat v\rangle. \qquad (26)$$
Combining (25) and (26), we get Parseval's relation for L2 functions, $\langle u, v\rangle = \langle \hat u, \hat v\rangle$.
Exercise 5.16:
(a) Colloquially, $\lim_{|f|\to\infty} \hat u(f)|f|^{1+\epsilon} = 0$ means that $\bigl|\hat u(f)\bigr|\,|f|^{1+\epsilon}$ becomes and stays increasingly small as $|f|$ becomes large. More technically, it means that for any $\delta > 0$, there is an $A(\delta)$ such that $\bigl|\hat u(f)\bigr|\,|f|^{1+\epsilon} \le \delta$ for all $f$ such that $|f| \ge A(\delta)$. Choosing $\delta = 1$ and $A = A(1)$, we see that $|\hat u(f)| \le |f|^{-1-\epsilon}$ for $|f| \ge A$.
(b)
$$\int_{-\infty}^{\infty} |\hat u(f)|\,df = \int_{|f|>A} |\hat u(f)|\,df + \int_{|f|\le A} |\hat u(f)|\,df \le 2\int_{A}^{\infty} f^{-1-\epsilon}\,df + \int_{-A}^{A} |\hat u(f)|\,df = \frac{2A^{-\epsilon}}{\epsilon} + \int_{-A}^{A} |\hat u(f)|\,df.$$
Since $\hat u(f)$ is L2, its truncated version to $[-A, A]$ is also L1, so the second integral is finite, showing that $\hat u(f)$ (untruncated) is also L1. In other words, one role of the $\epsilon$ above is to make $\hat u(f)$ decrease quickly enough with increasing $f$ to maintain the L1 property.
(c) Recall that $\hat s^{(n)}(f) = \sum_{|m|\le n} \hat s_m(f)$ where $\hat s_m(f) = \hat u(f - m)\,\mathrm{rect}(f)$. Assuming $A$ to be an integer and $m' > A$, $|\hat s_{m'}(f)| \le (m' - 1)^{-1-\epsilon}$. Thus for $f \in (-1/2, 1/2]$,
$$|\hat s^{(n)}(f)| \le \Bigl|\sum_{|m|\le A} \hat u(f - m)\Bigr| + \sum_{|m'| > A} (|m'| - 1)^{-1-\epsilon} = \Bigl|\sum_{|m|\le A} \hat u(f - m)\Bigr| + \sum_{m\ge A} 2m^{-1-\epsilon}. \qquad (27)$$
(The factor of 2 above was omitted by error from the exercise statement.) Note that since the final sum converges, this bound is independent of $n$ and is thus an upper bound on $|\hat s(f)|$. Now visualize the $2A + 1$ terms in the first sum above as a vector, say $\mathbf{a}$. Let $\vec{1}$ be the vector of $2A + 1$ ones, so that $\langle \mathbf{a}, \vec{1}\rangle = \sum_k a_k$. Applying the Schwarz inequality to this, $|\sum_k a_k| \le \|\mathbf{a}\|\,\|\vec{1}\|$. Substituting this into (27),
$$|\hat s(f)| \le \sqrt{(2A + 1)\sum_{|m|\le A} |\hat u(f+m)|^2} \;+\; \sum_{m\ge A} 2m^{-1-\epsilon}. \qquad (28)$$
(d) Note that for any complex numbers $a$ and $b$, $|a + b|^2 \le |a + b|^2 + |a - b|^2 = 2|a|^2 + 2|b|^2$. Applying this to (28),
$$|\hat s(f)|^2 \le (4A + 2)\sum_{|m|\le A} |\hat u(f+m)|^2 + \Bigl(\sum_{m\ge A} 2m^{-1-\epsilon}\Bigr)^2.$$
Since $\hat s(f)$ is nonzero only in $[-1/2, 1/2]$, we can demonstrate that $\hat s(f)$ is L2 by showing that the integral of $|\hat s(f)|^2$ over $[-1/2, 1/2]$ is finite. The integral of the first term above is $4A + 2$ times the integral of $|\hat u(f)|^2$ from $-A - 1/2$ to $A + 1/2$ and is finite since $\hat u(f)$ is L2. The integral of the second term is simply the second term itself, which is finite.
Chapter 6
Exercise 6.1: Let $U_k$ be a standard M-PAM random variable where the M points each have probability $1/M$. Consider the analogy with a uniform M-level quantizer used on a uniformly distributed rv $U$ over the interval $[-Md/2, Md/2]$.
[Figure: M = 6 example, with quantization regions $R_1, \ldots, R_6$, each of width $d$, and quantization points $a_1, \ldots, a_6$ at their centers.]
Let $Q$ be the quantization error for the quantizer and $U_k$ be the quantization point. Thus $U = U_k + Q$. Observe that for each quantization point the quantization error is uniformly distributed over $[-d/2, d/2]$. This means that $Q$ is zero mean and statistically independent of the quantization point $U_k$. It follows that
$$E[U^2] = E[(Q + U_k)^2] = E[U_k^2] + E[Q^2] = E[U_k^2] + \frac{d^2}{12}.$$
On the other hand, since $U$ is uniformly distributed, $E[U^2] = (dM)^2/12$. It then follows that
$$E[U_k^2] = \frac{d^2(M^2 - 1)}{12}.$$
Verifying the formula for M = 4:
$$E_S = \frac{2\bigl[(\tfrac{d}{2})^2 + (\tfrac{3d}{2})^2\bigr]}{4} = \frac{5}{4}d^2, \qquad \frac{d^2(M^2 - 1)}{12} = \frac{5}{4}d^2.$$
Verifying the formula for M = 8:
$$E_S = \frac{2\bigl[(\tfrac{d}{2})^2 + (\tfrac{3d}{2})^2 + (\tfrac{5d}{2})^2 + (\tfrac{7d}{2})^2\bigr]}{8} = \frac{21}{4}d^2, \qquad \frac{d^2(M^2 - 1)}{12} = \frac{21}{4}d^2.$$
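The formula $E[U_k^2] = d^2(M^2-1)/12$ can be confirmed for any M by averaging over the standard equiprobable PAM constellation $\{\pm d/2, \pm 3d/2, \ldots\}$. The brief sketch below is an added illustration (d = 1 is an arbitrary choice).

```python
import numpy as np

d = 1.0
for M in (2, 4, 8, 16):
    points = d * (np.arange(M) - (M - 1) / 2)     # standard M-PAM: ..., -3d/2, -d/2, d/2, 3d/2, ...
    empirical = np.mean(points ** 2)              # each point has probability 1/M
    formula = d ** 2 * (M ** 2 - 1) / 12
    print(M, empirical, formula)                  # the two columns agree
```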
Exercise 6.3:
(a) Since the received signal is decoded to the closest PAM signal, the intervals decoded to each signal are indicated below.
[Figure: 4-PAM decision regions $R_1, \ldots, R_4$ for the signal points $a_1 = -3d/2$, $a_2 = -d/2$, $a_3 = d/2$, $a_4 = 3d/2$, with decision boundaries at $-d$, $0$, and $d$.]
Thus if $U_k = a_1$ is transmitted, an error occurs if $Z_k \ge d/2$. The probability of this is $Q(d/2)$ where
$$Q(x) = \int_x^{\infty} \frac{1}{\sqrt{2\pi}}\exp(-z^2/2)\,dz.$$
If $U_k = a_2$ is transmitted, an error occurs if either $Z_k \ge d/2$ or $Z_k < -d/2$, so, using the symmetry of the Gaussian density, the probability of an error in this case is $2Q(d/2)$. In the same way, the error probability is $2Q(d/2)$ for $a_3$ and $Q(d/2)$ for $a_4$. Thus the overall error probability is $(3/2)Q(d/2)$.
(b) Now suppose the third point is moved to $d/2 + \epsilon$. This moves the decision boundary between $R_3$ and $R_4$ by $\epsilon/2$ and similarly moves the decision boundary between $R_2$ and $R_3$ by $\epsilon/2$. The error probability then becomes
$$P_e(\epsilon) = \frac12\Bigl[Q\Bigl(\frac{d}{2}\Bigr) + Q\Bigl(\frac{d+\epsilon}{2}\Bigr) + Q\Bigl(\frac{d-\epsilon}{2}\Bigr)\Bigr].$$
$$\frac{dP_e(\epsilon)}{d\epsilon} = \frac14\Bigl[\frac{1}{\sqrt{2\pi}}\exp\Bigl(-\frac{(d-\epsilon)^2}{8}\Bigr) - \frac{1}{\sqrt{2\pi}}\exp\Bigl(-\frac{(d+\epsilon)^2}{8}\Bigr)\Bigr].$$
This is equal to 0 at $\epsilon = 0$, as can be seen by symmetry without actually taking the derivative.
(c) With the third signal point at $d/2 + \epsilon$, the signal energy is
$$E_S = \frac14\Bigl[\Bigl(\frac{d}{2}\Bigr)^2 + \Bigl(\frac{d}{2} + \epsilon\Bigr)^2 + 2\Bigl(\frac{3d}{2}\Bigr)^2\Bigr].$$
The derivative of this with respect to $\epsilon$ is $\tfrac12(d/2 + \epsilon)$, which is positive (equal to $d/4$) at $\epsilon = 0$.
(d) This means that, to first order in $\epsilon$, the energy can be reduced by reducing $a_3$ without changing $P_e$. Thus moving the two inner points slightly inward provides better energy efficiency for 4-PAM. This is quite counter-intuitive. The difference between optimizing the points in 4-PAM and using standard PAM is not very significant, however. At 10 dB signal-to-noise ratio, the optimal placement of points (which requires considerably more computation) makes the ratio of outer points to inner points 3.15 instead of 3, but it reduces error probability by less than 1%.
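The conclusion of parts (b)-(d) can be checked numerically: $P_e(\epsilon)$ is flat to first order at $\epsilon = 0$ while the signal energy is not, so a slightly negative $\epsilon$ saves energy at essentially no cost in error probability. The sketch below is an added illustration (unit-variance noise and d = 2 are arbitrary choices) and uses the complementary error function for the Q function.

```python
from math import erfc, sqrt

Q = lambda x: 0.5 * erfc(x / sqrt(2))            # Q(x) for N(0,1) noise

def Pe(d, eps):
    return 0.5 * (Q(d / 2) + Q((d + eps) / 2) + Q((d - eps) / 2))

def Es(d, eps):
    # points at -3d/2, -d/2, d/2 + eps, 3d/2, each with probability 1/4
    return 0.25 * ((d / 2) ** 2 + (d / 2 + eps) ** 2 + 2 * (3 * d / 2) ** 2)

d, h = 2.0, 1e-3
dPe = (Pe(d, h) - Pe(d, -h)) / (2 * h)           # ~0: Pe is flat to first order in eps
dEs = (Es(d, h) - Es(d, -h)) / (2 * h)           # ~d/4 > 0: the energy is not
print(dPe, dEs)
```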
Exercise 6.4:
(a) If for each $j$,
$$\int_{-\infty}^{\infty} u(t)d_j(t)\,dt = \int_{-\infty}^{\infty} \sum_{k=1}^{\infty} u_k\,p(t-kT)d_j(t)\,dt = \sum_{k=1}^{\infty} u_k \int_{-\infty}^{\infty} p(t-kT)d_j(t)\,dt = u_j,$$
then it must be that $\int_{-\infty}^{\infty} p(t-kT)d_j(t)\,dt = \langle p(t - kT), d_j(t)\rangle$ has the value one for $k = j$ and the value zero for all $k \ne j$. That is, $d_j(t)$ must be orthogonal to $p(t - kT)$ for all $k \ne j$.
(b) Since $\langle p(t - kT), d_0(t)\rangle = 1$ for $k = 0$ and equals zero for $k \ne 0$, it follows by shifting each function by $jT$ that $\langle p(t - (k - j)T), d_0(t)\rangle$ equals 1 for $j = k$ and 0 for $j \ne k$. It follows that $d_j(t) = d_0(t - jT)$.
(c) In this exercise, to avoid ISI (intersymbol interference), we pass $u(t)$ through a bank of filters $d_0(-t), d_1(-t), \ldots, d_j(-t), \ldots$, and the outputs of the filters at time $t = 0$ are $u_0, u_1, \ldots, u_j, \ldots$ respectively. To see this, note that the output of the $j$-th filter in the filter bank is
$$r_j(t) = \sum_{k=1}^{\infty} u_k \int_{-\infty}^{\infty} p(\tau - kT)\,d_j(-t + \tau)\,d\tau.$$
At time $t = 0$,
$$r_j(0) = \sum_{k=1}^{\infty} u_k \int_{-\infty}^{\infty} p(\tau - kT)\,d_j(\tau)\,d\tau = u_j.$$
Thus, for every $j$, to retrieve $u_j$ from $u(t)$, we filter $u(t)$ through $d_j(-t)$ and look at the output at $t = 0$. However, from part (b), $d_j(t) = d_0(t - jT)$; the $j$-th filter is just the first filter delayed by $jT$. Rather than processing in parallel through a filter bank and looking at the value at $t = 0$, we can process serially by filtering $u(t)$ through $d_0(-t)$ and looking at the output every $T$. To verify this, note that the output after filtering $u(t)$ through $d_0(-t)$ is
$$r(t) = \sum_{k=1}^{\infty} u_k \int_{-\infty}^{\infty} p(\tau - kT)\,d_0(-t + \tau)\,d\tau,$$
and so, for every $j$,
$$r(jT) = \sum_{k=1}^{\infty} u_k \int_{-\infty}^{\infty} p(\tau - kT)\,d_0(\tau - jT)\,d\tau = u_j.$$
Filtering the received signal through $d_0(-t)$ and looking at the values at $jT$ for every $j$ is the same operation as filtering the signal through $q(t)$ and then sampling at $jT$. Thus, $q(t) = d_0(-t)$.
Exercise 6.6:
(a) $g(t)$ must be ideal Nyquist, i.e., $g(0) = 1$ and $g(kT) = 0$ for all non-zero integer $k$. The existence of the channel filter does not change the requirement for the overall cascade of filters. The Nyquist criterion is stated in the previous exercise.
(b) It is possible, as shown below. There is no ISI if the Nyquist criterion $\sum_m \hat g(f + 2m) = \tfrac12$ for $|f| \le 1$ is satisfied. Since $\hat g(f) = \hat p(f)\hat h(f)\hat q(f)$, we know that $\hat g(f)$ is zero wherever $\hat h(f) = 0$. In particular, $\hat g(f)$ must be 0 for $|f| > 5/4$ (and thus for $f \ge 2$). Thus we can use the band-edge symmetry condition, $\hat g(f) + \hat g(2 - f) = 1/2$ over $0 \le f \le 1$. Since $\hat g(f) = 0$ for $3/4 < f \le 1$, it is necessary that $\hat g(f) = 1/2$ for $1 < f \le 5/4$. Similarly, since $\hat g(f) = 0$ for $f > 5/4$, we must satisfy $\hat g(f) = 1/2$ for $|f| < 3/4$. Thus, to satisfy the Nyquist criterion, $\hat g(f)$ is uniquely specified as below.
[Figure: $\hat g(f) = 1/2$ for $|f| \le 3/4$ and for $1 < |f| \le 5/4$, and $\hat g(f) = 0$ elsewhere.]
In the regions where $\hat g(f) = 1/2$, we must choose $\hat q(f) = 1/[2\hat p(f)\hat h(f)]$. Elsewhere $\hat g(f) = 0$ because $\hat h(f) = 0$, and thus $\hat q(f)$ is arbitrary. More specifically, we must choose $\hat q(f)$ to satisfy
$$\hat q(f) = \begin{cases} 0.5, & |f| \le 0.5;\\[2pt] \dfrac{1}{3 - 2|f|}, & 0.5 < |f| \le 0.75;\\[2pt] \dfrac{1}{3 - 2|f|}, & 1 \le |f| \le 5/4. \end{cases}$$
It makes no difference what $\hat q(f)$ is elsewhere, as it will be multiplied by zero there.
(c) Since $\hat h(f) = 0$ for $f > 3/4$, it is necessary that $\hat g(f) = 0$ for $|f| > 3/4$. Thus, for all integers $m$, $\hat g(f + 2m)$ is 0 for $3/4 < f < 1$ and the Nyquist criterion cannot be met.
(d) If for some frequency $f$, $\hat p(f)\hat h(f) \ne 0$, it is possible for $\hat g(f)$ to have an arbitrary value by choosing $\hat q(f)$ appropriately. On the other hand, if $\hat p(f)\hat h(f) = 0$ for some $f$, then $\hat g(f) = 0$. Thus, to avoid ISI, it is necessary that for each $0 \le f \le 1/(2T)$ there is some integer $m$ such that $\hat h(f + m/T)\hat p(f + m/T) \ne 0$. Equivalently, it is necessary that $\sum_m \hat h(f + m/T)\hat p(f + m/T) \ne 0$ for all $f$.
There is one peculiarity here that you were not expected to deal with. If $\hat p(f)\hat h(f)$ goes through zero at $f_0$ with some given slope, and that is the only frequency that can be used to satisfy the Nyquist criterion, then even if we ignore the point $f_0$ itself, the response $\hat q(f)$ would approach infinity fast enough in the vicinity of $f_0$ that $\hat q(f)$ would not be L2.
This overall problem shows that under ordinary conditions (i.e., non-zero filter transforms), there is no problem in choosing $\hat q(f)$ to avoid intersymbol interference. Later, when noise is taken into account, it will be seen that it is undesirable for $\hat q(f)$ to be very large where $\hat p(f)$ is small, since this amplifies the noise in frequency regions where there is very little signal.
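As a quick check that the $\hat g(f)$ constructed in part (b) really satisfies the $T = 1/2$ Nyquist criterion $\sum_m \hat g(f+2m) = 1/2$ for $|f| \le 1$, one can evaluate the folded sum on a grid. This is an added sketch; the grid and the small offset (which avoids the measure-zero band-edge points where interval-endpoint conventions matter) are arbitrary choices.

```python
import numpy as np

def g_hat(f):
    """g_hat(f) = 1/2 for |f| <= 3/4 and for 1 < |f| <= 5/4, else 0 (part (b))."""
    f = np.abs(f)
    return np.where((f <= 0.75) | ((f > 1.0) & (f <= 1.25)), 0.5, 0.0)

f = np.linspace(-1, 1, 2001) + 1e-6       # offset keeps the grid off the band edges
m = np.arange(-3, 4)
folded = np.sum(g_hat(f[:, None] + 2 * m[None, :]), axis=1)
print(np.allclose(folded, 0.5))           # True: the Nyquist criterion holds
```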
Exercise 6.8:
(a) With $\alpha = 1$, the flat part of $\hat g(f)$ disappears. Using $T = 1$ and the familiar formula $\cos^2 x = (1 + \cos 2x)/2$, $\hat g_1(f)$ becomes
$$\hat g_1(f) = \frac12\bigl[1 + \cos(\pi f)\bigr]\,\mathrm{rect}\Bigl(\frac{f}{2}\Bigr).$$
Writing $\cos x = \tfrac12[e^{ix} + e^{-ix}]$ and using the frequency-shift rule for Fourier transforms, we get
$$g_1(t) = \mathrm{sinc}(2t) + \frac12\,\mathrm{sinc}(2t + 1) + \frac12\,\mathrm{sinc}(2t - 1)$$
$$= \frac{\sin(2\pi t)}{2\pi t} + \frac12\,\frac{\sin(\pi(2t + 1))}{\pi(2t + 1)} + \frac12\,\frac{\sin(\pi(2t - 1))}{\pi(2t - 1)}$$
$$= \frac{\sin(2\pi t)}{2\pi t} - \frac12\,\frac{\sin(2\pi t)}{\pi(2t + 1)} - \frac12\,\frac{\sin(2\pi t)}{\pi(2t - 1)}$$
$$= \frac{\sin(2\pi t)}{2\pi}\Bigl[\frac{1}{t} - \frac{1}{2t + 1} - \frac{1}{2t - 1}\Bigr] = \frac{\sin(2\pi t)}{2\pi t(1 - 4t^2)} = \frac{\sin(\pi t)\cos(\pi t)}{\pi t(1 - 4t^2)} = \frac{\mathrm{sinc}(t)\cos(\pi t)}{1 - 4t^2}.$$
This agrees with (6.18) in the text for $\alpha = 1$, $T = 1$. Note that the denominator is 0 at $t = \pm 0.5$. The numerator is also 0, and it can be seen from the first equation above that the limiting value as $t \to \pm 0.5$ is $1/2$. Note also that this approaches 0 with increasing $t$ as $1/t^3$, much faster than $\mathrm{sinc}(t)$.
(b) It is necessary to use the result of Exercise 6.6 here. As shown there, the inverse transform of a real symmetric waveform $\hat g_\alpha(f)$ that satisfies the Nyquist criterion for $T = 1$ and has a rolloff of $\alpha \le 1$ is equal to $\mathrm{sinc}(t)\,v(t)$. Here $v(t)$ is lowpass-limited to $\alpha/2$ and its transform $\hat v(f)$ is given by
$$\hat v(f + 1/2) = \frac{d\hat g(f)}{df} \qquad\text{for } -\frac{1+\alpha}{2} < f < -\frac{1-\alpha}{2}.$$
That is, we take the derivative of the leading edge of $\hat g(f)$, from $-(1+\alpha)/2$ to $-(1-\alpha)/2$, and shift it up by $1/2$ to get $\hat v(f)$. Using the middle expression in (6.17) of the text, and using the fact that $\cos^2(x) = (1 + \cos 2x)/2$,
$$\hat v(f + 1/2) = \frac12\,\frac{d}{df}\Bigl[1 + \cos\Bigl(\frac{\pi(-f - (1-\alpha)/2)}{\alpha}\Bigr)\Bigr]$$
for $f$ in the interval $(-(1+\alpha)/2,\, -(1-\alpha)/2)$. Shifting by letting $s = f + \tfrac12$,
$$\hat v(s) = \frac12\,\frac{d}{ds}\cos\Bigl(\frac{\pi}{2} - \frac{\pi s}{\alpha}\Bigr) = \frac12\,\frac{d}{ds}\sin\Bigl(\frac{\pi s}{\alpha}\Bigr) = \frac{\pi}{2\alpha}\cos\Bigl(\frac{\pi s}{\alpha}\Bigr) \qquad\text{for } s \in (-\alpha/2,\, \alpha/2).$$
Multiplying this by $\mathrm{rect}(s/\alpha)$ gives us an expression for $\hat v(s)$ everywhere. Using $\cos x = \tfrac12(e^{ix} + e^{-ix})$ allows us to take the inverse transform of $\hat v(s)$, getting
$$v(t) = \frac{\pi}{4}\bigl[\mathrm{sinc}(\alpha t + 1/2) + \mathrm{sinc}(\alpha t - 1/2)\bigr] = \frac{\pi}{4}\Bigl[\frac{\sin(\pi\alpha t + \pi/2)}{\pi\alpha t + \pi/2} + \frac{\sin(\pi\alpha t - \pi/2)}{\pi\alpha t - \pi/2}\Bigr].$$
Using the identity $\sin(x + \pi/2) = \cos x$ again, this becomes
$$v(t) = \frac14\Bigl[\frac{\cos(\pi\alpha t)}{\alpha t + 1/2} - \frac{\cos(\pi\alpha t)}{\alpha t - 1/2}\Bigr] = \frac{\cos(\pi\alpha t)}{1 - 4\alpha^2 t^2}.$$
Since $g(t) = \mathrm{sinc}(t)\,v(t)$, the above result for $v(t)$ corresponds with (6.18) for $T = 1$.
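Relating to part (a) above, the closed form $g_1(t) = \mathrm{sinc}(t)\cos(\pi t)/(1 - 4t^2)$ can be spot-checked by numerically inverse-transforming $\hat g_1(f) = \frac12[1 + \cos(\pi f)]$ over $|f| \le 1$. This is a sketch added for illustration only; the quadrature grid and the test points (chosen away from $t = \pm 0.5$) are arbitrary.

```python
import numpy as np

f = np.linspace(-1, 1, 20001)
g_hat = 0.5 * (1 + np.cos(np.pi * f))            # raised cosine, alpha = 1, T = 1
df = f[1] - f[0]

def g_numeric(t):
    # Riemann-sum approximation of the inverse Fourier transform at time t
    return float(np.sum(g_hat * np.exp(2j * np.pi * f * t)).real * df)

closed_form = lambda t: np.sinc(t) * np.cos(np.pi * t) / (1 - 4 * t ** 2)

for t in (0.1, 0.3, 1.7, 3.2):
    print(t, round(g_numeric(t), 6), round(closed_form(t), 6))   # the columns agree
```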
(c) The result for arbitrary $T$ follows simply by scaling.
Exercise 6.9:
(a) The figure is incorrectly drawn in the exercise statement and should be as follows:
[Corrected figure: $\hat g_k(f)$ consists of a central pulse of unit height together with $k$ pulses of height $1/k$ and width $1/4$ on each side of it.]
In folding these pulses together to check the Nyquist criterion, note that each pulse on the positive side of the figure folds onto the interval from $-1/2$ to $-1/4$, and each pulse on the left folds onto $1/4$ to $1/2$. Since there are $k$ of them, each of height $1/k$, they add up to satisfy the Nyquist criterion.
(b) In the limit $k \to \infty$, the height of each outer pulse goes to 0, so the pointwise limit is simply the central pulse. Since there are $2k$ outer pulses, each of energy $1/(4k^2)$, the energy difference between that pointwise limit and $\hat g_k(f)$ is $1/(2k)$, which goes to 0 with $k$. Thus the pointwise limit and the L2 limit both converge to a function that does not satisfy the Nyquist criterion for $T = 1$ and is not remotely close to a function satisfying the Nyquist condition. Note also that one could start with any central pulse and construct a similar example such that the limit satisfies the Nyquist criterion.
Exercise 6.11:
(a) Note that $x_k(t) = 2\Re\{\exp(2\pi i(f_k + f_c)t)\} = 2\cos[2\pi(f_k + f_c)t]$. The cosine function is even, and thus $x_1(t) = x_2(t)$ if $f_1 + f_c = -(f_2 + f_c)$. This is the only possibility for equality unless $f_1 = f_2$. Thus, the only $f_2 \ne f_1$ for which $x_1(t) = x_2(t)$ is $f_2 = -2f_c - f_1$. Since $f_1 > -f_c$, this requires $f_2 < -f_c$, which is why this situation cannot arise when $f_k \in [-f_c, f_c)$ for each $k$.
(b) For any $\hat u_1(f)$, one can find a function $\hat u_2(f)$ by the transformation $f_2 = -2f_c - f_1$ in (a). Thus, without the knowledge that $u_1(t)$ is lowpass-limited to some $B < f_c$, the ambiguous frequency components in $u_1(t)$ cannot be differentiated from those of $u_2(t)$ by observing $x(t)$. If $u(t)$ is known only to be bandlimited to some band $B$ greater than $f_c$, then the frequencies between $-B$ and $B - 2f_c$ are ambiguous. An easy way to see the problem here is to visualize $\hat u(f)$ both moved up by $f_c$ and down by $f_c$. The bands overlap if $B > f_c$, and the overlapped portion cannot be retrieved without additional knowledge about $u(t)$.
(c) The ambiguity is obvious by repeating the argument in (a). Now, since $y(t)$ has some nonzero bandwidth, ambiguity might be possible in other ways also. We have already seen, however, that if $u(t)$ has a bandwidth less than $f_c$, then $u(t)$ can be uniquely retrieved from $x(t)$ in the absence of noise.
(d) For $u(t)$ real, $x(t) = 2u(t)\cos(2\pi f_c t)$, so $u(t)$ can be retrieved by dividing $x(t)$ by $2\cos(2\pi f_c t)$ except at those points of measure 0 where the cosine function is 0. This is not a reasonable approach, especially in the presence of noise, but it points out that the PAM case is essentially different from the QAM case.
(e) Since $u^*(t)\exp(2\pi i f_c t)$ has energy at positive frequencies, the use of a Hilbert filter does not have an output equal to $u(t)\exp(2\pi i f_c t)$, and thus $u(t)$ does not result from shifting this output down by $f_c$. In the same way, the bands at $2f_c$ and $-2f_c$ that result from DSB-QC demodulation mix with those at 0 frequency, so they cannot be removed by an ordinary LTI filter. For QAM, this problem is to be expected since $u(t)$ cannot be uniquely recovered by any means at all. For PAM it is surprising, since it says that these methods are not general. Since all time-limited waveforms are unbounded in frequency, it says that there is a fundamental theoretical problem with the standard methods of demodulation. This is not a problem in practice, since $f_c$ is usually so much larger than the nominal bandwidth of $u(t)$ that the issue is of no significance.
  • 45. (e) Since u∗(t) exp(2πifct) has energy at positive frequencies, the use of a Hilbert filter does not have an output equal to u(t) exp(2πifct), and thus u(t) does not result from shifting this output down by fc. In the same way, the bands at 2fc and −2fc that result from DSB-QC demodulation mix with those at 0 frequency, so cannot be removed by an ordinary LTI filter. For QAM, this problem is to be expected since u(t) cannot be uniquely generated by any means at all. For PAM it is surprising, since it says that these methods are not general. Since all time-limited waveforms are unbounded in frequency, it says that there is a fundamental theoretical problem with the standard methods of demodulation. This is not a problem in practice, since fc is usually so much larger than the nominal bandwidth of u(t) that this problem is of no significance. Exercise 6.13: (a) Since u(t) is real, φ1(t) = <{u(t)e2πifct} = u(t) cos(2πfct), and since v(t) is pure imaginary, φ2(t) = <{v(t)e2πifct} = [iv(t)] sin(2πifct). Note that [iv(t)] is real. Thus we must show that Z u(t) cos(2πfct)[iv(t)] sin(2πfct) dt = Z u(t)[iv(t)] sin(4πfct) dt = 0. Since u(t) and v(t) are lowpass limited to B/2, their product (which corresponds to convolution in the frequency domain) is lowpass limited to B < 2fc. Rewriting the sin(4πfct) above in terms of complex exponentials, and recognizing the resulting integral as the Fourier transform of u(t)[iv(t)] at ±2fc, we see that the above integral is indeed zero. (b) Almost anything works here, and a simple choice is u(t) = [iv(t)] = rect(8fct − 1/2). Exercise 6.15: (a) Z 1 −1 √2p(t − jT) cos(2πfct)√2p∗(t − kT) cos(2πfct)dt = Z 1 −1 p(t − jT)p∗(t − kT)[1 + cos(4πfct)]dt = Z 1 −1 p(t − jT)p∗(t − kT)dt + Z 1 −1 p(t − jT)p∗(t − kT) cos(4πfct)dt = δjk + 1 2 Z 1 −1 p(t − jT)p∗(t − kT) h e4πifct + e−4πifct i dt. The remaining task is to show that the integral above is 0. Let gjk(t) = p(t − jT)g∗(t − kT). Note that ˆgjk(f) is the convolution of the transform of p(t − jT) and that of p∗(t − kT). Since p is lowpass limited to fc, gjk is lowpass limited to 2fc, and thus the integral (which calculates the Fourier transform of gjk at 2fc and −2fc) is zero. 45