Information Theory
Coding Theorems for Discrete Memoryless Systems
This book is widely regarded as a classic in the field of information theory, providing
deep insights and expert treatment of the key theoretical issues. It includes in-depth
coverage of the mathematics of reliable information transmission, both in two-terminal
and multi-terminal network scenarios. Updated and considerably expanded, this new
edition presents unique discussions of information-theoretic secrecy and of zero-error
information theory, including substantial connections of the latter with extremal com-
binatorics. The presentations of all core subjects are self-contained, even the advanced
topics, which helps readers to understand the important connections between seemingly
different problems. Finally, 320 end-of-chapter problems, together with helpful solving
hints, allow readers to develop a full command of the mathematical techniques. This
is an ideal resource for graduate students and researchers in electrical and electronic
engineering, computer science, and applied mathematics.
Imre Csiszár is a Research Professor at the Alfréd Rényi Institute of Mathematics of
the Hungarian Academy of Sciences, where he has worked since 1961. He is also Pro-
fessor Emeritus of the University of Technology and Economics, Budapest, a Fellow
of the IEEE, and former President of the Hungarian Mathematical Society. He has
received numerous awards, including the Shannon Award of the IEEE Information
Theory Society (1996).
János Körner is a Professor of Computer Science at the Sapienza University of Rome,
Italy, where he has worked since 1992. Prior to this, he was a member of the Institute
of Mathematics of the Hungarian Academy of Sciences for over 20 years, and he also
worked at AT&T Bell Laboratories, Murray Hill, New Jersey, for two years.
The field of applied mathematics known as Information Theory owes its origins and
early development to three pioneers: Shannon (USA), Kolmogorov (Russia) and Rényi
(Hungary). This book, authored by two of Rényi’s leading disciples, represents the
elegant and precise development of the subject by the Hungarian School. This sec-
ond edition contains new research of the authors on applications to secrecy theory and
zero-error capacity with connections to combinatorial mathematics.
Andrew Viterbi, USC
Information Theory: Coding Theorems for Discrete Memoryless Systems, by Imre
Csiszár and János Körner, is a classic of modern information theory. “Classic” since
its first edition appeared in 1979. “Modern” since the mathematical techniques and
the results treated are still fundamentally up to date today. This new edition was long
overdue. Beyond the original material, it contains two new chapters on zero-error infor-
mation theory and connections to extremal combinatorics, and on information theoretic
security, a topic that has garnered very significant attention in the last few years. This
book is an indispensable reference for researchers and graduate students working in the
exciting and ever-growing area of information theory.
Giuseppe Caire, USC
The first edition of the Csiszár and Körner book on information theory is a classic, in
constant use by most mathematically-oriented information theorists. The second edition
expands the first with two new chapters, one on zero-error information theory and one
on information theoretic security. These use the same consistent set of tools as edition
1 to organize and prove the central results of these currently important areas. In addi-
tion, there are many new problems added to the original chapters, placing many newer
research results into a consistent formulation.
Robert Gallager, MIT
The classic treatise on the fundamental limits of discrete memoryless sources and
channels –an indispensable tool for every information theorist.
Sergio Verdú, Princeton
Information Theory
Coding Theorems for Discrete Memoryless Systems
IMRE CSISZÁR
Alfréd Rényi Institute of Mathematics, Hungarian Academy of Sciences, Hungary
JÁNOS KÖRNER
Sapienza University of Rome, Italy
CAMBRIDGE UNIVERSITY PRESS
Cambridge, New York, Melbourne, Madrid, Cape Town,
Singapore, São Paulo, Delhi, Tokyo, Mexico City
Cambridge University Press
The Edinburgh Building, Cambridge CB2 8RU, UK
Published in the United States of America by Cambridge University Press, New York
www.cambridge.org
Information on this title: www.cambridge.org/9780521196819
First edition © Akadémiai Kiadó, Budapest 1981
Second edition © Cambridge University Press 2011
This publication is in copyright. Subject to statutory exception
and to the provisions of relevant collective licensing agreements,
no reproduction of any part may take place without the written
permission of Cambridge University Press.
First published 1981
Second edition 2011
Printed in the United Kingdom at the University Press, Cambridge
A catalog record for this publication is available from the British Library
ISBN 978-0-521-19681-9 Hardback
Cambridge University Press has no responsibility for the persistence or
accuracy of URLs for external or third-party internet websites referred to
in this publication, and does not guarantee that any content on such
websites is, or will remain, accurate or appropriate.
To the memory of Alfréd Rényi,
the outstanding mathematician
who established information theory in Hungary
[Dependence graph of the text; numbers refer to chapters]
Contents
Preface to the first edition
Preface to the second edition
Basic notation and conventions
Introduction
Part I Information measures in simple coding problems
1 Source coding and hypothesis testing; information measures
2 Types and typical sequences
3 Formal properties of Shannon’s information measures
4 Non-block source coding
5 Blowing up lemma: a combinatorial digression
Part II Two-terminal systems
6 The noisy channel coding problem
7 Rate-distortion trade-off in source coding and the source–channel transmission problem
8 Computation of channel capacity and Δ-distortion rates
9 A covering lemma and the error exponent in source coding
10 A packing lemma and the error exponent in channel coding
11 The compound channel revisited: zero-error information theory and extremal combinatorics
12 Arbitrarily varying channels
Part III Multi-terminal systems
13 Separate coding of correlated sources
14 Multiple-access channels
15 Entropy and image size characterization
16 Source and channel networks
17 Information-theoretic security
References
Name index
Index of symbols and abbreviations
Subject index
Preface to the first edition
Information theory was created by Claude E. Shannon for the study of certain quan-
titative aspects of information, primarily as an analysis of the impact of coding on
information transmission. Research in this field has resulted in several mathematical
theories. Our subject is the stochastic theory, often referred to as the Shannon theory,
which directly descends from Shannon’s pioneering work.
This book is intended for graduate students and research workers in mathematics
(probability and statistics), electrical engineering and computer science. It aims to
present a well-integrated mathematical discipline, including substantial new develop-
ments of the 1970s. Although applications in engineering and science are not covered,
we hope to have presented the subject so that a sound basis for applications has also
been provided. A heuristic discussion of mathematical models of communication sys-
tems is given in the Introduction, which also offers a general outline of the intuitive
background for the mathematical problems treated in the book.
As the title indicates, this book deals with discrete memoryless systems. In other
words, our mathematical models involve independent random variables with finite
range. Idealized as these models are from the point of view of most applications, their
study reveals the characteristic phenomena of information theory without burdening
the reader with the technicalities needed in the more complex cases. In fact, the reader
needs no other prerequisites than elementary probability and a reasonable mathematical
maturity. By limiting our scope to the discrete memoryless case, it was possible to use
a unified, basically combinatorial approach. Compared with other methods, this often
led to stronger results and yet simpler proofs. The combinatorial approach also seems
to lead to a deeper understanding of the subject.
The dependence graph of the text is shown on p. vi.
There are several ways to build up a course using this book. A one-semester graduate
course can be made up of Chapters 1, 2, 6, 7 and the first half of Chapter 13. A challeng-
ing short course is provided by Chapters 2, 9 and 10. In both cases, the technicalities
from Chapter 3 should be used when necessary. For students with some information
theory background, a course on multi-terminal Shannon theory can be based on Part III,
using Chapters 2 and 6 as preliminaries. The problems offer a lot of opportunities for
creative work for the students. It should be noted, however, that illustrative examples are
scarce; thus the teacher is also supposed to do some homework of his own by supplying
such examples.
Every chapter consists of text followed by a Problems section. The text covers the
main ideas and proof techniques, with a sample of the results they yield. The selection
of the latter was influenced both by didactic considerations and the authors’ research
interests. Many results of equal importance are given in the Problem sections. While
the text is self-contained, there are several points at which the reader is advised to
supplement formal understanding by consulting specific problems. This suggestion is
indicated by the Problem number in the margin of the text. For all but a few problems
sufficient hints are given to enable a serious student familiar with the corresponding
text to give a solution. The exceptions, marked by an asterisk, serve mainly for
supplementary information; these problems are not necessarily more difficult than the
others, but their solution requires methods not treated in the text.
In the text the origins of the results are not mentioned, but credits to authors are
given at the end of each chapter. Concerning the Problems, an appropriate attribution
accompanies each Problem. An absence of references indicates that the assertion is
either folklore or else an unpublished result of the authors. Results were attributed on
the basis of publications in journals or books with complete proofs. The number after
the author’s name indicates the year of appearance of the publication. Conference talks,
theses and technical reports are quoted only if – to our knowledge – their authors have
never published their result in another form. In such cases, the word “unpublished” is
attached to the reference year, to indicate that the latter does not include the usual delay
of “regular” publications.
We are indebted to our friends Rudy Ahlswede, Péter Gács and Katalin Marton for
fruitful discussions which contributed to many of our ideas. Our thanks are due to
R. Ahlswede, P. Bártfai, J. Beck, S. Csibi, P. Gács, S. I. Gelfand, J. Komlós, G. Longo,
K. Marton, A. Sgarro and G. Tusnády for reading various parts of the manuscript. Some
of them have saved us from vicious errors.
The patience of Mrs. Éva Várnai in typing and retyping the ever-changing manuscript
should be remembered, as well as the spectacular pace of her doing it.
Special mention should be made of the friendly assistance of Sándor Csibi who
helped us to overcome technical difficulties with the preparation of the manuscript. Last
but not least, we are grateful to Eugene Lukács for his constant encouragement without
which this project might not have been completed.
Preface to the second edition
When the first edition of this book went to print, information theory was only 30 years
old. At that time we covered a large part of the topic indicated in the title, a goal that is
no longer realistic. An additional 30 years have passed, the Internet revolution occurred,
and information theory has grown in breadth, volume and impact. Nevertheless, we feel
that, despite many new developments, our original book has not lost its relevance since
the material therein is still central to the field.
The main novelty of this second edition is the addition of two new chapters. These
cover zero-error problems and their connections to combinatorics (Chapter 11) and
information-theoretic security (Chapter 17). Of several new research directions that
emerged in the 30 years between the two editions, we chose to highlight these two
because of personal research interests. As a matter of fact, these topics started to intrigue
us when writing the first edition; back then, this led us to a last-minute addition of
problems on secrecy.
Except for the new chapters, new results are presented only in the form of problems.
These either directly complete the original material or, occasionally, illustrate a new
research area. We made only minor changes, mainly corrections, to the text of the origi-
nal chapters. (Hence the words recent and new refer to the time of the first edition, unless
the context indicates otherwise.) We have updated the history part of each chapter and,
in particular, we have included pointers to new developments. We have not broadened
the original scope of the book. Readers interested in a wider perspective may consult
Cover and Thomas (2006).
In the preface to the first edition we suggested several ways in which to construct
courses using this book. In addition, either of the new Chapters 11 or 17 can be used for
a short graduate course.
As in the first edition, this book is dedicated to the memory of Alfréd Rényi,
whose mathematical heritage continues to influence information theory and to
inspire us.
Special thanks are due to Miklós Simonovits, who, sacrificing his precious research
time, assisted us to overcome TeX-nical difficulties as only the most selfless friend
would do. We are indebted to our friends Prakash Narayan and Gábor Simonyi, as well
as to the Ph.D. students Lóránt Farkas, Tamás Kói, Sirin Nitinawarat and Himanshu
Tyagi for a careful reading of parts of the manuscript.
Basic notation and conventions
≜ equal by definition
iff if and only if
□ end of a theorem, definition, remark, etc.
■ end of a proof
A, B, . . . , X, Y, Z sets (finite unless stated otherwise; infinite sets will be
usually denoted by script capitals)
∅ void set
x ∈ X x is an element of the set X; as a rule, elements of a set will
be denoted by the same letter as the set
X ≜ {x_1, . . . , x_k} X is a set having elements x_1, . . . , x_k
|X| number of elements of the set X
x = (x_1, . . . , x_n) or x = x_1 . . . x_n   vector (finite sequence) of elements of a set X
X × Y Cartesian product of the sets X and Y
Xn nth Cartesian power of X, i.e., the set of n-length sequences
of elements of X
X∗ set of all finite sequences of elements of X
A ⊂ X A is a (not necessarily proper) subset of X
A − B the set of those elements x ∈ A which are not in B
Ā complement of a set A ⊂ X, i.e., Ā ≜ X − A (will be used only if a finite ground set X is specified)
A ◦ B symmetric difference: A ◦ B ≜ (A − B) ∪ (B − A)
f : X → Y mapping of X into Y
f^{−1}(y) the inverse image of y ∈ Y, i.e., f^{−1}(y) ≜ {x : f(x) = y}
|| f || number of elements of the range of the mapping f
PD abbreviation of “probability distribution”
P ≜ {P(x) : x ∈ X} PD on X
P(A) probability of the set A ⊂ X for the PD P, i.e., P(A) ≜ ∑_{x∈A} P(x)
P × Q direct product of the PDs P on X and Q on Y, i.e., P × Q ≜ {P(x)Q(y) : x ∈ X, y ∈ Y}
P^n nth power of the PD P, i.e., P^n(x) ≜ ∏_{i=1}^n P(x_i)
support of P the set {x : P(x) > 0}
W : X → Y, W = {W(y|x) : x ∈ X, y ∈ Y} stochastic matrix with rows indexed by elements of X and columns indexed by elements of Y; i.e., W(·|x) is a PD on Y for every x ∈ X
W(B|x) probability of the set B ⊂ Y for the PD W(·|x)
W^n : X^n → Y^n nth direct power of W, i.e., W^n(y|x) ≜ ∏_{i=1}^n W(y_i|x_i)
RV abbreviation for “random variable”
X, Y, Z RVs ranging over finite sets
X^n = (X_1, . . . , X_n) or X^n = X_1 . . . X_n   alternative notations for the vector-valued RV with components X_1, . . . , X_n
Pr {X ∈ A} probability of the event that the RV X takes a value in the
set A
P_X distribution of the RV X, defined by P_X(x) ≜ Pr{X = x}
P_{Y|X=x} conditional distribution of Y given X = x, i.e., P_{Y|X=x}(y) ≜ Pr{Y = y|X = x}; not defined if P_X(x) = 0
P_{Y|X} the stochastic matrix with rows P_{Y|X=x}, called the conditional distribution of Y given X; here x ranges over the support of P_X
P_{Y|X} = W means that P_{Y|X=x} = W(·|x) if P_X(x) > 0, involving no assumption on the remaining rows of W
E X expectation of the real-valued RV X
var(X) variance of the real-valued RV X
X −◦− Y −◦− Z means that these RVs form a Markov chain in this order
(a, b), [a, b], [a, b) open, closed resp. left-closed interval with endpoints a < b
|r|+ positive part of the real number r, i.e., |r|+ ≜ max(r, 0)
⌊r⌋ largest integer not exceeding r
⌈r⌉ smallest integer not less than r
min[a, b], max[a, b] the smaller resp. larger of the numbers a and b
r ≤ s means for vectors r = (r_1, . . . , r_n), s = (s_1, . . . , s_n) of the n-dimensional Euclidean space that r_i ≤ s_i, i = 1, . . . , n
A convex closure of a subset A of a Euclidean space, i.e., the
smallest closed convex set containing A
exp, log are understood to the base 2
ln natural logarithm
a log(a/b) equals zero if a = 0 and +∞ if a > b = 0
h(r) the binary entropy function: h(r) ≜ −r log r − (1 − r) log(1 − r), r ∈ [0, 1]
Most asymptotic results in this book are established with uniform convergence. Our
way of specifying the extent of uniformity is to indicate in the statement of results all
those parameters involved in the problem upon which threshold indices depend. In this
context, e.g., n0 = n0(|X|, ε, δ) means some threshold index which could be explicitly
given as a function of |X|, ε, δ alone.
Preliminaries on random variables and probability distributions
As we shall deal with RVs ranging over finite sets, the measure-theoretic foundations of
probability theory will never really be needed. Still, in a formal sense, when speaking
of RVs it is understood that a Kolmogorov probability space (Ω, F, μ) is given (i.e., Ω
is some set, F is a σ-algebra of its subsets, and μ is a probability measure on F). Then
a RV with values in a finite set X is a mapping X : Ω → X such that X^{−1}(x) ∈ F for
every x ∈ X. The probability of an event defined in terms of RVs means the μ-measure
of the corresponding subset of Ω, e.g.,
Pr{X ∈ A} ≜ μ({ω : X(ω) ∈ A}).
Throughout this book, it will be assumed that the underlying probability space
(Ω, F, μ) is “rich enough” in the following sense. To any pair of finite sets X, Y, any
RV X with values in X and any distribution P on X × Y whose marginal on X coincides
with PX , there exists a RV Y with values in Y such that PXY = P. This assumption is
certainly fulfilled, e.g., if Ω is the unit interval, F is the family of its Borel subsets, and
μ is the Lebesgue measure.
The set of all PDs on a finite set X will be identified with the subset of the |X|-
dimensional Euclidean space, consisting of all vectors with non-negative components
summing up to unity. Linear combinations of PDs and convexity are understood accord-
ingly. For example, the convexity of a real-valued function f (P) of PDs on X means
that
f(αP_1 + (1 − α)P_2) ≤ α f(P_1) + (1 − α) f(P_2)
for every P1, P2 and α ∈ (0, 1). Similarly, topological terms for PDs on X refer to the
metric topology defined by Euclidean distance. In particular, the convergence Pn → P
means that Pn(x) → P(x) for every x ∈ X.
The set of all stochastic matrices W : X → Y is identified with a subset of the
|X||Y|-dimensional Euclidean space in an analogous manner. Convexity and topological
concepts for stochastic matrices are understood accordingly.
Finally, for any distribution P on X and any stochastic matrix W : X → Y we denote
by PW the distribution on Y defined as the matrix product of the (row) vector P and
the matrix W, i.e.,
(PW)(y) ≜ ∑_{x∈X} P(x)W(y|x) for every y ∈ Y.
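The following short numerical sketch (ours, not part of the book) illustrates the conventions just described: a PD is stored as a vector, a stochastic matrix W has rows W(·|x), and PW is an ordinary vector–matrix product. All numerical values are arbitrary illustrative choices.

```python
import numpy as np

P = np.array([0.5, 0.3, 0.2])          # a PD on X = {0, 1, 2}
W = np.array([[0.9, 0.1],              # W(.|0): a PD on Y = {0, 1}
              [0.2, 0.8],              # W(.|1)
              [0.5, 0.5]])             # W(.|2)

# sanity checks: P sums to 1, every row of W is a PD
assert np.isclose(P.sum(), 1.0) and np.allclose(W.sum(axis=1), 1.0)

PW = P @ W                             # (PW)(y) = sum_x P(x) W(y|x)
print(PW)                              # the output distribution on Y
```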
Introduction
Information is a fashionable concept with many facets, among which the quantitative
one–our subject–is perhaps less striking than fundamental. At the intuitive level, for
our purposes, it suffices to say that information is some knowledge of predetermined
type contained in certain data or pattern and wanted at some destination. Actually, this
concept will not explicitly enter the mathematical theory. However, throughout the book
certain functionals of random variables will be conveniently interpreted as measures of
the amount of information provided by the phenomena modeled by these variables. Such
information measures are characteristic tools of the analysis of optimal performance of
codes, and they have turned out to be useful in other branches of mathematics as well.
Intuitive background
The mathematical discipline of information theory, created by C. E. Shannon (1948) on
an engineering background, still has a special relation to communication engineering,
the latter being its major field of application and the source of its problems and moti-
vation. We believe that some familiarity with the intuitive communication background
is necessary for a more than formal understanding of the theory, let alone for doing fur-
ther research. The heuristics, underlying most of the material in this book, can be best
explained on Shannon’s idealized model of a communication system (which can also
be regarded as a model of an information storage system). The important question of
how far the models treated are related to, and the results obtained are relevant for, real
systems will not be addressed. In this respect we note that although satisfactory math-
ematical modeling of real systems is often very difficult, it is widely recognized that
significant insight into their capabilities is given by phenomena discovered on appar-
ently overidealized models. Familiarity with the mathematical methods and techniques
of proof is a valuable tool for system designers in judging how these phenomena apply
in concrete cases.
Shannon’s famous block diagram of a (two-terminal) communication system is
shown in Fig. I.1. Before turning to the mathematical aspects of Shannon’s model, let
us take a glance at the objects to be modeled.
The source of information may be nature, a human being, a computer, etc. The data
or pattern containing the information at the source is called the message; it may consist
of observations on a natural phenomenon, a spoken or written sentence, a sequence of
Figure I.1 Shannon’s block diagram: Source → Encoder → Channel → Decoder → Destination
binary digits, etc. Part of the information contained in the message (e.g., the shape of
characters of a handwritten text) may be immaterial to the particular destination. Small
distortions of the relevant information might be tolerated as well. These two aspects
are jointly reflected in a fidelity criterion for the reproduction of the message at the
destination. For example, for a person watching a color TV program on a black-and-
white set, the information contained in the colors must be considered immaterial and
the fidelity criterion is met if the picture is not perceivably worse than it would be by a
good black-and-white transmission. Clearly, the fidelity criterion of a person watching
the program in color would be different.
The source and destination are separated in space or time. The communication or
storing device available for bridging over this separation is called the channel. As a
rule, the channel does not work perfectly and thus its output may significantly differ
from the input. This phenomenon is referred to as channel noise. While the properties
of the source and channel are considered unalterable, characteristic to Shannon’s model
is the liberty of transforming the message before it enters the channel. Such a transfor-
mation, called encoding, is always necessary if the message is not a possible input of the
channel (e.g., a written sentence cannot be directly radioed). More importantly, encod-
ing is an effective tool of reducing the cost of transmission and of combating channel
noise (trivial examples are abbreviations such as cable addresses in telegrams on the
one hand, and spelling names on telephone on the other). Of course, these two goals are
conflicting and a compromise must be found. If the message has been encoded before
entering the channel – and often even if not – a suitable processing of the channel out-
put is necessary in order to retrieve the information in a form needed at the destination;
this processing is called decoding. The devices performing encoding and decoding are
the encoder and decoder of Fig. I.1. The rules determining their operation constitute the
code. A code accomplishes reliable transmission if the joint operation of encoder, chan-
nel and decoder results in reproducing the source messages at the destination within the
prescribed fidelity criterion.
Informal description of the basic mathematical model
Shannon developed information theory as a mathematical study of the problem of
reliable transmission at a possibly low cost (for a given source, channel and fidelity
criteria). For this purpose mathematical models of the objects in Fig. I.1 had to be
introduced. The terminology of the following models reflects the point of view of com-
munication between terminals separated in space. Appropriately interchanging the roles
of time and space, these models are equally suitable for describing data storage.
Having in mind a source which keeps producing information, its output is visual-
ized as an infinite sequence of symbols (e.g., Latin characters, binary digits, etc.). For
an observer, the successive symbols cannot be predicted. Rather, they seem to appear
randomly according to probabilistic laws representing potentially available prior knowl-
edge about the nature of the source (e.g., in the case of an English text we may think of
language statistics, such as letter or word frequencies, etc.). For this reason the source
is identified with a discrete-time stochastic process. The first k random variables of the
source process represent a random message of length k; realizations thereof are called
messages of length k. The theory is largely of asymptotic character: we are interested
in the transmission of long messages. This justifies restricting our attention to messages
of equal length, although, e.g., in an English text, the first k letters need not repre-
sent a meaningful piece of information; the point is that a sentence cut at the tail is
of negligible length compared to a large k. In non-asymptotic investigations, however,
the structure of messages is of secondary importance. Then it is mathematically more
convenient to regard them as realizations of an arbitrary random variable, the so-called
random message (which may be identified with a finite segment of the source process
or even with the whole process, etc.). Hence we shall often speak of messages (and their
transformation) without specifying a source.
An obvious way of taking advantage of a stochastic model is to disregard undesirable
events of small probability. The simplest fidelity criterion of this kind is that the proba-
bility of error, i.e., the overall probability of not receiving the message accurately at the
destination, should not exceed a given small number. More generally, viewing the mes-
sage and its reproduction at the destination as realizations of stochastically dependent
random variables, a fidelity criterion is formulated as a global requirement involving
their joint distribution. Usually, one introduces a numerical measure of the loss result-
ing from a particular reproduction of a message. In information theory this is called a
distortion measure. A typical fidelity criterion is that the expected distortion be less than
a threshold, or that the probability of a distortion transgressing this threshold be small.
The channel is supposed to be capable of successively transmitting symbols from a
given set, the input alphabet. There is a starting point of the transmission and each of
the successive uses of the channel consists of putting in one symbol and observing the
corresponding symbol at the output. In the ideal case of a noiseless channel the output
is identical to the input; in general, however, they may differ and the output need not be
uniquely determined by the input. Also, the output alphabet may differ from the input
alphabet. Following the stochastic approach, it is assumed that for every finite sequence
of input symbols there exists a probability distribution on output sequences of the same
length. This distribution governs the successive outputs if the elements of the given
sequence are successively transmitted from the start of transmission on, as the begin-
ning of a potentially infinite sequence. This assumption implies that no output symbol
is affected by possible later inputs, and it amounts to certain consistency requirements
among the mentioned distributions. The family of these distributions represents all pos-
sible knowledge about the channel noise, prior to transmission. This family defines the
channel as a mathematical object.
The encoder maps messages into sequences of channel input symbols in a not neces-
sarily one-to-one way. Mathematically, this very mapping is the encoder. The images of
messages are referred to as codewords. For convenience, attention is usually restricted
to encoders with fixed codeword length, mapping the messages into channel input
sequences of length n, say. Similarly, from a purely mathematical point of view, a
decoder is a mapping of output sequences of the channel into reproductions of mes-
sages. By a code we shall mean, as a rule, an encoder–decoder pair or, in specific
problems, a mathematical object effectively determining this pair.
A random message, an encoder, a channel and a decoder define a joint probability
distribution over messages, channel input and output sequences, and reproductions of
the messages at the destination. In particular, it can be decided whether a given fidelity
criterion is met. If it is, we speak of reliable transmission of the random message. The
cost of transmission is not explicitly included in the above mathematical model. As a
rule, one implicitly assumes that its main factor is the cost of channel use, the latter
being proportional to the length of the input sequence. (In the case of telecommunica-
tion this length determines the channel’s operation time and, in the case of data storage,
the occupied space, provided that each symbol requires the same time or space, respec-
tively.) Hence, for a given random message, channel and fidelity criterion, the problem
consists in finding the smallest codeword length n for which reliable transmission can
be achieved.
We are basically interested in the reliable transmission of long messages of a given
source using fixed-length-to-fixed-length codes, i.e., encoders mapping messages of
length k into channel input sequences of length n and decoders mapping channel out-
put sequences of length n into reproduction sequences of length k. The average number
n/k of channel symbols used for the transmission of one source symbol is a measure
of the performance of the code, and it will be called the transmission ratio. The goal
is to determine the limit of the minimum transmission ratio (LMTR) needed for reli-
able transmission, as the message length k tends to infinity. Implicit in this problem
statement is that fidelity criteria are given for all sufficiently large k. Of course, for the
existence of a finite LMTR, let alone for its computability, proper conditions on source,
channel and fidelity criteria are needed.
The intuitive problem of transmission of long messages can also be approached in
another – more ambitious – manner, incorporating into the model certain constraints on
the complexity of encoder and decoder, along with the requirement that the transmis-
sion be indefinitely continuable. Any fixed-length-to-fixed-length code, designed for
transmitting messages of length k by n channel symbols, say, may be used for non-
terminating transmission as follows. The infinite source output sequence is partitioned
into consecutive blocks of length k. The encoder mapping is applied to each block
separately and the channel input sequence is the succession of the obtained blocks of
length n. The channel output sequence is partitioned accordingly and is decoded block-
wise by the given decoder. This method defines a code for non-terminating transmission.
The transmission ratio is n/k; the block lengths k and n constitute a rough measure of
complexity of the code. If the channel has no “input memory,” i.e., the transmission of
the individual blocks is not affected by previous inputs, and if the source and channel
are time-invariant, then each source block will be reproduced within the same fidelity
criterion as the first one. Suppose, in addition, that the fidelity criteria for messages of
different length have the following property: if successive blocks and their reproductions
individually meet the fidelity criterion, then so does their juxtaposition. Then, by this
very coding, messages of potentially infinite length are reliably transmitted, and one can
speak of reliable non-terminating transmission. Needless to say, this blockwise coding
is a very special way of realizing non-terminating transmission. Still, within a very
general class of codes for reliable non-terminating transmission, in order to minimize
the transmission ratio1 under conditions such as above, it suffices to restrict attention
to blockwise codes. In such cases the present minimum equals the previous LMTR
and the two approaches to the intuitive problem of transmission of long messages are
equivalent.
While in this book we basically adopt the first approach, a major reason of consid-
ering mainly fixed-length-to-fixed-length codes consists in their appropriateness also
for non-terminating transmission. These codes themselves are often called block codes
without specifically referring to non-terminating transmission.
Measuring information
A remarkable feature of the LMTR problem, discovered by Shannon and established
in great generality by further research, is a phenomenon suggesting the heuristic inter-
pretation that information, like liquids, “has volume but no shape,” i.e., the amount of
information is measurable by a scalar. Just as the time necessary for conveying the liq-
uid content of a large container through a pipe (at a given flow velocity) is determined
by the ratio of the volume of the liquid to the cross-sectional area of the pipe, the LMTR
equals the ratio of two numbers, one depending on the source and fidelity criterion, the
other depending on the channel. The first number is interpreted as a measure of the
amount of information needed, on average, for the reproduction of one source symbol,
whereas the second is a measure of the channel’s capacity, i.e., of how much informa-
tion is transmissible on average by one channel use. It is customary to take as a standard
the simplest channel that can be used for transmitting information, namely the noise-
less channel with two input symbols, 0 and 1, say. The capacity of this binary noiseless
channel, i.e., the amount of information transmissible by one binary digit, is considered
the unit of the amount of information, called 1 bit. Accordingly, the amount of informa-
tion needed on average for the reproduction of one symbol of a given source (relative
to a given fidelity criterion) is measured by the LMTR for this source and the binary
noiseless channel. In particular, if the most demanding fidelity criterion is imposed,
which within a stochastic theory is that of a small probability of error, the correspond-
ing LMTR provides a measure of the total amount of information carried, on average,
by one source symbol.
1 The relevance of this minimization problem to data storage is obvious. In typical communication situations,
however, the transmission ratio of non-terminating transmission cannot be chosen freely. Rather, it is deter-
mined by the rates at which the source produces and the channel transmits symbols. Then one question is
whether a given transmission ratio admits reliable transmission, but this is mathematically equivalent to the
above minimization problem.
The above ideas naturally suggest the need for a measure of the amount of infor-
mation individually contained in a single source output. In view of our source model,
this means to associate some information content with an arbitrary random variable.
One relies on the intuitive postulate that the observation of a collection of independent
random variables yields an amount of information equal to the sum of the information
contents of the individual variables. Accordingly, one defines the entropy (information
content) of a random variable as the amount of information carried, on average, by one
symbol of a source which consists of a sequence of independent copies of the random
variable in question. This very entropy is also a measure of the amount of uncertainty
concerning this random variable before its observation.
We have sketched a way of assigning information measures to sources and chan-
nels in connection with the LMTR problem and arrived, in particular, at the concept
of entropy of a single variable. There is also an opposite way: starting from entropy,
which can be expressed by a simple formula, one can build up more complex func-
tionals of probability distributions. On the basis of heuristic considerations (quite
independent of the above communication model), these functionals can be interpreted
as information measures corresponding to different connections of random variables.
The operational significance of these information measures is not a-priori evident.
Still, under general conditions the solution of the LMTR problem can be given in
terms of these quantities. More precisely, the corresponding theorems assert that the
operationally defined information measures for source and channel can be given by
such functionals, just as intuition suggests. This consistency underlines the importance
of entropy-based information measures, both from a formal and a heuristic point of
view.
The relevance of these functionals, corresponding to their heuristic meaning, is not
restricted to communication or storage problems. Still, there are also other functionals
which can be interpreted as information measures with an operational significance not
related to coding.
Multi-terminal systems
Shannon’s block diagram (Fig. I.1) models one-way communication between two termi-
nals. The communication link it describes can be considered as an artificially isolated
elementary part of a large communication system involving exchange of information
among many participants. Such an isolation is motivated by the implicit assumptions
that
(i) the source and channel are in some sense independent of the remainder of the
system, the effects of the environment being taken into account only as channel
noise,
(ii) if exchange of information takes place in both directions, they do not affect each
other.
Note that dropping assumption (ii) is meaningful even in the case of communica-
tion between two terminals. Then the new phenomenon arises that transmission in one
direction has the byproduct of feeding back information on the result of transmission
in the opposite direction. This feedback can conceivably be exploited for improv-
ing the performance of the code; this, however, will necessitate a modification of the
mathematical concept of the encoder.
Problems involving feedback will be discussed in this book only casually. On the
other hand, the whole of Part III will be devoted to problems arising from dropping
assumption (i). This leads to models of multi-terminal systems with several sources,
channels and destinations, such that the stochastic interdependence of individual sources
and channels is taken into account. A heuristic description of such mathematical mod-
els at this point would lead too far. However, we feel that readers familiar with the
mathematics of two-terminal systems treated in Parts I and II will have no difficulty in
understanding the motivation for the multi-terminal models of Part III.
Part I
Information measures in simple
coding problems
1 Source coding and hypothesis testing; information measures
A (discrete) source is a sequence {X_i}_{i=1}^∞ of random variables (RVs) taking values
in a finite set X called the source alphabet. If the X_i's are independent and have the
same distribution P, we speak of a discrete memoryless source (DMS) with generic
distribution P.
A k-to-n binary block code is a pair of mappings
f : X^k → {0, 1}^n,   ϕ : {0, 1}^n → X^k.
For a given source, the probability of error of the code (f, ϕ) is
e(f, ϕ) ≜ Pr{ϕ(f(X^k)) ≠ X^k},
where X^k stands for the k-length initial string of the sequence {X_i}_{i=1}^∞. We are interested
in finding codes with small ratio n/k and small probability of error. (→ Problem 1.1)
More exactly, for every k let n(k, ε) be the smallest n for which there exists a k-to-n
binary block code satisfying e(f, ϕ) ≤ ε; we want to determine lim_{k→∞} n(k, ε)/k. (→ Problem 1.2)
THEOREM 1.1 For a DMS with generic distribution P = {P(x) : x ∈ X}
lim_{k→∞} n(k, ε)/k = H(P)   for every ε ∈ (0, 1),   (1.1)
where H(P) ≜ −∑_{x∈X} P(x) log P(x). □
COROLLARY 1.1
0 ≤ H(P) ≤ log |X|.   (1.2) □
Proof The existence of a k-to-n binary block code with e(f, ϕ) ≤ ε is equivalent to
the existence of a set A ⊂ X^k with P^k(A) ≥ 1 − ε, |A| ≤ 2^n (let A be the set of those
sequences x ∈ X^k which are reproduced correctly, i.e., ϕ(f(x)) = x). Denote by s(k, ε)
the minimum cardinality of sets A ⊂ X^k with P^k(A) ≥ 1 − ε. It suffices to show that
lim_{k→∞} (1/k) log s(k, ε) = H(P)   (ε ∈ (0, 1)).   (1.3)
To this end, let B(k, δ) be the set of those sequences x ∈ X^k which have probability
exp{−k(H(P) + δ)} ≤ P^k(x) ≤ exp{−k(H(P) − δ)}.
We first show that P^k(B(k, δ)) → 1 as k → ∞, for every δ > 0. In fact, consider the
real-valued RVs
Y_i ≜ −log P(X_i);
these are well defined with probability 1 even if P(x) = 0 for some x ∈ X. The Y_i's are
independent, identically distributed and have expectation H(P). Thus by the weak law
of large numbers
lim_{k→∞} Pr{ |(1/k) ∑_{i=1}^k Y_i − H(P)| ≤ δ } = 1   for every δ > 0.
As X^k ∈ B(k, δ) iff |(1/k) ∑_{i=1}^k Y_i − H(P)| ≤ δ, the convergence relation means that
lim_{k→∞} P^k(B(k, δ)) = 1   for every δ > 0,   (1.4)
as claimed. The definition of B(k, δ) implies that
|B(k, δ)| ≤ exp{k(H(P) + δ)}.
Thus (1.4) gives for every δ > 0
lim sup_{k→∞} (1/k) log s(k, ε) ≤ lim sup_{k→∞} (1/k) log |B(k, δ)| ≤ H(P) + δ.   (1.5)
On the other hand, for every set A ⊂ X^k with P^k(A) ≥ 1 − ε, (1.4) implies
P^k(A ∩ B(k, δ)) ≥ (1 − ε)/2
for sufficiently large k. Hence, by the definition of B(k, δ),
|A| ≥ |A ∩ B(k, δ)| ≥ ∑_{x∈A∩B(k,δ)} P^k(x) exp{k(H(P) − δ)} ≥ ((1 − ε)/2) exp{k(H(P) − δ)},
proving that for every δ > 0
lim inf_{k→∞} (1/k) log s(k, ε) ≥ H(P) − δ.
This and (1.5) establish (1.3). The corollary is immediate. ■
For intuitive reasons expounded in the Introduction, the limit H(P) in Theorem 1.1 is
interpreted as a measure of the information content of (or the uncertainty about) a RV X
with distribution P_X = P. It is called the entropy of the RV X or of the distribution P:
H(X) = H(P) ≜ −∑_{x∈X} P(x) log P(x).
This definition is often referred to as Shannon’s formula.
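As a quick numerical illustration of Theorem 1.1 and its proof (this sketch is ours, not the book's; the distribution and the parameters k, δ are arbitrary), the fragment below computes H(P) by Shannon's formula and estimates by simulation the probability of the set B(k, δ) used in the proof.

```python
import numpy as np

rng = np.random.default_rng(0)
P = np.array([0.7, 0.2, 0.1])
H = -np.sum(P * np.log2(P))                    # entropy in bits (log to base 2)
print("H(P) =", H)

k, delta, trials = 200, 0.1, 10_000
X = rng.choice(len(P), size=(trials, k), p=P)  # i.i.d. samples from P^k
per_symbol = -np.log2(P[X]).mean(axis=1)       # (1/k) * (-log P^k(x))
in_B = np.abs(per_symbol - H) <= delta         # is x in B(k, delta)?
print("P^k(B(k, delta)) ~", in_B.mean())       # tends to 1 as k grows, cf. (1.4)
```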
The mathematical essence of Theorem 1.1 is formula (1.3). It gives the asymptotics
for the minimum size of sets of large probability in X^k. We now generalize (1.3) to the
case when the elements of X^k have unequal weights and the size of subsets is measured
by total weight rather than cardinality.
Let us be given a sequence of positive-valued “mass functions” M_1(x), M_2(x), . . .
on X and set
M(x) ≜ ∏_{i=1}^k M_i(x_i)   for x = x_1 · · · x_k ∈ X^k.
For an arbitrary sequence of X-valued RVs {X_i}_{i=1}^∞ consider the minimum of the
M-mass
M(A) ≜ ∑_{x∈A} M(x)
of those sets A ⊂ X^k which contain X^k with high probability: let s(k, ε) denote the
minimum of M(A) for sets A ⊂ X^k of probability
P_{X^k}(A) ≥ 1 − ε.
The previous s(k, ε) is a special case obtained if all the functions M_i(x) are identically
equal to 1.
THEOREM 1.2 If the X_i's are independent with distributions P_i ≜ P_{X_i} and
|log M_i(x)| ≤ c for every i and x ∈ X then, setting
E_k ≜ (1/k) ∑_{i=1}^k ∑_{x∈X} P_i(x) log (M_i(x)/P_i(x)),
we have for every 0 < ε < 1
lim_{k→∞} [ (1/k) log s(k, ε) − E_k ] = 0.
More precisely, for every δ, ε ∈ (0, 1),
| (1/k) log s(k, ε) − E_k | ≤ δ   if k ≥ k_0 = k_0(|X|, c, ε, δ).   (1.6) □
Proof Consider the real-valued RVs
Y_i ≜ log (M_i(X_i)/P_i(X_i)).
Since the Y_i's are independent and E[(1/k) ∑_{i=1}^k Y_i] = E_k, Chebyshev's inequality gives
for any δ′ > 0
Pr{ |(1/k) ∑_{i=1}^k Y_i − E_k| ≥ δ′ } ≤ (1/(k²δ′²)) ∑_{i=1}^k var(Y_i) ≤ (1/(kδ′²)) max_i var(Y_i).
This means that for the set
B(k, δ′) ≜ { x : x ∈ X^k, E_k − δ′ ≤ (1/k) log (M(x)/P_{X^k}(x)) ≤ E_k + δ′ }
we have
P_{X^k}(B(k, δ′)) ≥ 1 − η_k,   where η_k ≜ (1/(kδ′²)) max_i var(Y_i).
Since by the definition of B(k, δ′)
M(B(k, δ′)) = ∑_{x∈B(k,δ′)} M(x) ≤ ∑_{x∈B(k,δ′)} P_{X^k}(x) exp[k(E_k + δ′)] ≤ exp[k(E_k + δ′)],
it follows that
(1/k) log s(k, ε) ≤ (1/k) log M(B(k, δ′)) ≤ E_k + δ′   if η_k ≤ ε.
On the other hand, we have P_{X^k}(A ∩ B(k, δ′)) ≥ 1 − ε − η_k for any set A ⊂ X^k with
P_{X^k}(A) ≥ 1 − ε. Thus for every such A, again by the definition of B(k, δ′),
M(A) ≥ M(A ∩ B(k, δ′)) ≥ ∑_{x∈A∩B(k,δ′)} P_{X^k}(x) exp{k(E_k − δ′)} ≥ (1 − ε − η_k) exp[k(E_k − δ′)],
implying
(1/k) log s(k, ε) ≥ (1/k) log(1 − ε − η_k) + E_k − δ′.
Setting δ′ ≜ δ/2, these results imply (1.6) provided that
η_k = (4/(kδ²)) max_i var(Y_i) ≤ ε   and   (1/k) log(1 − ε − η_k) ≥ −δ/2.
By the assumption |log M_i(x)| ≤ c, the last relations hold if k ≥ k_0(|X|, c, ε, δ). ■
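A small numerical sketch (ours, not the book's) of the quantity E_k in Theorem 1.2 for constant P_i = P and M_i = M: with M ≡ 1 it reduces to H(P), and with M = Q (a PD) it equals −D(P||Q), anticipating Corollary 1.2 below. The distributions are arbitrary illustrative choices.

```python
import numpy as np

def E(P, M):
    """E_k of Theorem 1.2 with P_i = P and M_i = M for all i (in bits)."""
    return np.sum(P * np.log2(M / P))

P = np.array([0.5, 0.3, 0.2])
Q = np.array([0.25, 0.25, 0.5])
print(E(P, np.ones_like(P)))   # equals H(P): minimum sets have ~exp{kH(P)} elements
print(E(P, Q))                 # equals -D(P||Q): minimum Q-mass decays like exp{-kD}
```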
An important corollary of Theorem 1.2 relates to testing statistical hypotheses. Suppose
that a probability distribution of interest for the statistician is given by either
P = {P(x) : x ∈ X} or Q = {Q(x) : x ∈ X}. She or he has to decide between P and
Q on the basis of a sample of size k, i.e., the result of k independent drawings from
the unknown distribution. A (non-randomized) test is characterized by a set A ⊂ X^k, in
the sense that if the sample X_1 . . . X_k belongs to A, the statistician accepts P and else
accepts Q. (→ Problem 1.3) In most practical situations of this kind, the role of the two hypotheses is not
symmetric. It is customary to prescribe a bound ε for the tolerated probability of wrong
decision if P is the true distribution. Then the task is to minimize the probability of a
wrong decision if hypothesis Q is true. The latter minimum is
β(k, ε) ≜ min_{A⊂X^k : P^k(A) ≥ 1−ε} Q^k(A).   (→ Problem 1.4)
COROLLARY 1.2 For any 0 < ε < 1,
lim_{k→∞} (1/k) log β(k, ε) = −∑_{x∈X} P(x) log (P(x)/Q(x)). □
Proof If Q(x) > 0 for each x ∈ X, set P_i ≜ P, M_i ≜ Q in Theorem 1.2. If P(x) >
Q(x) = 0 for some x ∈ X, the P-probability of the set of all k-length sequences containing
this x tends to 1. This means that β(k, ε) = 0 for sufficiently large k, so that
both sides of the asserted equality are −∞. ■
It follows from Corollary 1.2 that the sum on the right-hand side is non-negative.
It measures how much the distribution Q differs from P in the sense of statistical
distinguishability, and is called informational divergence or I-divergence:
D(P||Q) ≜ ∑_{x∈X} P(x) log (P(x)/Q(x)).
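The sketch below (again ours, with arbitrary distributions) computes D(P||Q) and checks numerically the concentration behind Corollary 1.2: under P, the per-sample log-likelihood ratio (1/k) log(P^k(X^k)/Q^k(X^k)) is close to D(P||Q) for large k.

```python
import numpy as np

rng = np.random.default_rng(1)
P = np.array([0.6, 0.3, 0.1])
Q = np.array([0.2, 0.3, 0.5])
D = np.sum(P * np.log2(P / Q))                 # informational divergence (bits)
print("D(P||Q) =", D)

k = 1000
x = rng.choice(len(P), size=k, p=P)            # a sample of size k drawn under P
llr = np.log2(P[x] / Q[x]).mean()              # (1/k) sum_i log P(x_i)/Q(x_i)
print("empirical per-sample LLR ~", llr)       # close to D(P||Q)
```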
Another common name given to this quantity is relative entropy. Intuitively, one can
say that the larger D(P||Q) is, the more information for discriminating between the
hypotheses P and Q can be obtained from one observation. Hence D(P||Q) is also
called the information for discrimination. The amount of information measured by
D(P||Q) is, however, conceptually different from entropy, since it has no immediate
coding interpretation.
On the space of infinite sequences of elements of X one can build up product measures
both from P and Q. If P ≠ Q, the two product measures are mutually orthogonal;
D(P||Q) is a (non-symmetric) measure of how fast their restrictions to k-length strings
approach orthogonality.
REMARK Both entropy and informational divergence have a form of expectation:
H(X) = E(−log P(X)),   D(P||Q) = E log (P(X)/Q(X)),
where X is a RV with distribution P. It is convenient to interpret −log P(x), resp.
log (P(x)/Q(x)), as a measure of the amount of information, resp. the weight of evidence
in favor of P against Q, provided by a particular value x of X. These quantities are
important ingredients of the mathematical framework of information theory, but have
less direct operational meaning than their expectations. □
The entropy of a pair of RVs (X, Y) with finite ranges X and Y needs no new def-
inition, since the pair can be considered a single RV with range X × Y. For brevity,
instead of H((X, Y)) we shall write H(X, Y); similar notation will be used for any
finite collection of RVs.
The intuitive interpretation of entropy suggests considering as further information
measures certain expressions built up from entropies. The difference H(X, Y) − H(X)
measures the additional amount of information provided by Y if X is already known.
It is called the conditional entropy of Y given X:
H(Y|X) ≜ H(X, Y) − H(X).
Expressing the entropy difference by Shannon's formula we obtain
H(Y|X) = −∑_{x∈X} ∑_{y∈Y} P_{XY}(x, y) log (P_{XY}(x, y)/P_X(x)) = ∑_{x∈X} P_X(x) H(Y|X = x),   (1.7)
where
H(Y|X = x) ≜ −∑_{y∈Y} P_{Y|X}(y|x) log P_{Y|X}(y|x).
Thus H(Y|X) is the expectation of the entropy of the conditional distribution of Y
given X = x. This gives further support to the above intuitive interpretation of conditional
entropy. Intuition also suggests that the conditional entropy cannot exceed the
unconditional one. (→ Problem 1.5)
LEMMA 1.3
H(Y|X) ≤ H(Y). □
Proof
H(Y) − H(Y|X) = H(Y) − H(X, Y) + H(X) = ∑_{x∈X} ∑_{y∈Y} P_{XY}(x, y) log (P_{XY}(x, y)/(P_X(x)P_Y(y))) = D(P_{XY} || P_X × P_Y) ≥ 0. ■
REMARK For certain values of x, H(Y|X = x) may be larger than H(Y). □
The entropy difference in the preceding proof measures the decrease of uncertainty
about Y caused by the knowledge of X. In other words, it is a measure of the amount
of information about Y contained in X. Note the remarkable fact that this difference is
symmetric in X and Y. It is called mutual information:
I(X ∧ Y) ≜ H(Y) − H(Y|X) = H(X) − H(X|Y) = D(P_{XY} || P_X × P_Y).   (1.8)
Of course, the amount of information contained in X about itself is just the entropy:
I(X ∧ X) = H(X).
Mutual information is a measure of stochastic dependence of the RVs X and Y. The
fact that I(X ∧ Y) equals the informational divergence of the joint distribution of X
and Y from what it would be if X and Y were independent reinforces this interpretation.
There is no compelling reason other than tradition to denote mutual information by a
different symbol than entropy. We keep this tradition, although our notation I(X ∧ Y)
differs slightly from the more common I(X; Y).
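The identities (1.7) and (1.8) are easy to check numerically; the following sketch (ours, with an arbitrary joint PMF) computes H(X), H(Y|X) and I(X ∧ Y) and verifies that the latter equals D(P_XY || P_X × P_Y).

```python
import numpy as np

def H(p):                                  # entropy of a PMF given as an array
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

P_XY = np.array([[0.3, 0.1],
                 [0.1, 0.5]])              # joint distribution of (X, Y)
P_X, P_Y = P_XY.sum(axis=1), P_XY.sum(axis=0)

H_Y_given_X = H(P_XY.ravel()) - H(P_X)     # H(Y|X) = H(X,Y) - H(X)
I = H(P_Y) - H_Y_given_X                   # I(X ^ Y) = H(Y) - H(Y|X)
D = np.sum(P_XY * np.log2(P_XY / np.outer(P_X, P_Y)))
print(I, D)                                # the two values agree, cf. (1.8)
```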
Discussion
Theorem 1.1 says that the minimum number of binary digits needed – on average – to
represent one symbol of a DMS with generic distribution P equals the entropy H(P).
This fact – and similar ones discussed later on – are our basis for interpreting H(X)
as a measure of the amount of information contained in the RV X, resp. of the uncer-
tainty about this RV. In other words, in this book we adopt an operational or pragmatic
approach to the concept of information. Alternatively, one could start from the intu-
itive concept of information and set up certain postulates which an information measure
should fulfil. Some representative results of this axiomatic approach are treated in
Problems 1.11–1.14.
Our starting point, Theorem 1.1, has been proved here in the conceptually simplest
way. The key idea is that, for large k, all sequences in a subset of X^k with probability
close to 1, namely B(k, δ), have “nearly equal” probabilities in an exponential sense.
This proof easily extends also to non-DM cases (not in the scope of this book).
On the other hand, in order to treat DM models at depth, another – purely combina-
torial – approach will be more suitable. The preliminaries to this approach will be given
in Chapter 2.
Theorem 1.2 demonstrates the intrinsic relationship of the basic source coding and
hypothesis testing problems. The interplay of information theory and mathematical
statistics goes much further; its more substantial examples are beyond the scope of this
book. □
Problems
1.1. (a) Check that the problem of determining lim_{k→∞} (1/k) n(k, ε) for a discrete
source is just the formal statement of the LMTR problem (see the Introduction) for
the given source and the binary noiseless channel, with the probability of error fidelity
criterion.
(b) Show that for a DMS and a noiseless channel with arbitrary alphabet size m
the LMTR is H(P)/log m, where P is the generic distribution of the source.
1.2. Given an encoder f : X^k → {0, 1}^n, show that the probability of error e(f, ϕ)
is minimized iff the decoder ϕ : {0, 1}^n → X^k has the property that ϕ(y) is a
sequence of maximum probability among those x ∈ X^k for which f(x) = y.
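For small alphabets and block lengths the optimal decoder of Problem 1.2 can be built by brute force; the sketch below (ours; the encoder and all parameter values are toy choices, not from the book) keeps, for each codeword, a maximum-probability preimage.

```python
import itertools
import numpy as np

P = np.array([0.6, 0.4])                     # generic distribution on X = {0, 1}
k = 3
f = lambda x: x[0] * 2 + x[1]                # toy encoder: keep the first two symbols

prob = lambda x: float(np.prod(P[list(x)]))  # P^k(x)
phi = {}                                     # decoder table: codeword -> reproduction
for x in itertools.product(range(2), repeat=k):
    y = f(x)
    if y not in phi or prob(x) > prob(phi[y]):
        phi[y] = x                           # keep a most probable preimage of y

error = sum(prob(x) for x in itertools.product(range(2), repeat=k) if phi[f(x)] != x)
print("e(f, phi) =", error)                  # here 0.4 = Pr{X_3 = 1}
```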
1.3. A randomized test introduces a chance element into the decision between the
hypotheses P and Q in the sense that if the result of k successive drawings
is x ∈ X^k, one accepts the hypothesis P with probability π(x), say. Define
the analog of β(k, ε) for randomized tests and show that it still satisfies
Corollary 1.2.
1.4. (Neyman–Pearson lemma) Show that for any given bound 0 < ε < 1 on the
probability of wrong decision if P is true, the best randomized test is given by
π(x) = 1 if P^k(x) > c_k Q^k(x),   π(x) = γ_k if P^k(x) = c_k Q^k(x),   π(x) = 0 if P^k(x) < c_k Q^k(x),
where c_k and γ_k are appropriate constants. Observe that the case k = 1 contains
the general one, and there is no need to restrict attention to independent
drawings.
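For k = 1 a test of the Neyman–Pearson form in Problem 1.4 can be constructed greedily: reject P on the symbols with the smallest likelihood ratio P(x)/Q(x) until the error budget ε is used up, randomizing on the boundary symbol. A sketch of this construction (ours, with arbitrary numbers):

```python
import numpy as np

P = np.array([0.5, 0.3, 0.2])
Q = np.array([0.1, 0.3, 0.6])
eps = 0.25                            # allowed probability of wrong decision under P

pi = np.ones_like(P)                  # pi[x] = probability of accepting P given x
budget = eps
for x in np.argsort(P / Q):           # reject P first where P(x)/Q(x) is smallest
    if P[x] <= budget:
        pi[x] = 0.0
        budget -= P[x]
    else:
        pi[x] = 1.0 - budget / P[x]   # randomized decision on the threshold symbol
        break

alpha = np.sum(P * (1 - pi))          # error under P: equals eps here by construction
beta = np.sum(Q * pi)                 # error under Q: minimized among such tests
print(alpha, beta)
```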
1.5. (a) Let {X_i}_{i=1}^∞ be a sequence of independent RVs with common range X but
with arbitrary distributions. As in Theorem 1.1, denote by n(k, ε) the smallest n
for which there exists a k-to-n binary block code having probability of error ≤ ε
for the source {X_i}_{i=1}^∞. Show that for every ε ∈ (0, 1) and δ > 0
| n(k, ε)/k − (1/k) ∑_{i=1}^k H(X_i) | ≤ δ   if k ≥ k_0(|X|, ε, δ).
Hint Use Theorem 1.2 with M_i(x) = 1.
(b) Let {(X_i, Y_i)}_{i=1}^∞ be a sequence of independent replicas of a pair of RVs
(X, Y) and suppose that X^k should be encoded and decoded in the knowledge of Y^k.
Let ñ(k, ε) be the smallest n for which there exists an encoder
f : X^k × Y^k → {0, 1}^n and a decoder ϕ : {0, 1}^n × Y^k → X^k such that the
probability of error is Pr{ϕ(f(X^k, Y^k), Y^k) ≠ X^k} ≤ ε.
Show that
lim_{k→∞} ñ(k, ε)/k = H(X|Y)   for every ε ∈ (0, 1).
Hint Use part (a) for the conditional distributions of the X_i's given various
realizations y of Y^k.
1.6. (Random selection of codes) Let F(k, n) be the class of all mappings f: X^k → {0, 1}^n. Given a source {X_i}_{i=1}^∞, consider the class of codes (f, ϕ_f), where f ranges over F(k, n) and ϕ_f: {0, 1}^n → X^k is defined so as to minimize e(f, ϕ); see Problem 1.2. Show that for a DMS with generic distribution P we have

         (1/|F(k, n)|) Σ_{f∈F(k,n)} e(f, ϕ_f) → 0

     if k and n tend to infinity so that inf n/k > H(P).
     Hint Consider a random mapping F of X^k into {0, 1}^n, assigning to each x ∈ X^k one of the 2^n binary sequences of length n with equal probabilities 2^{−n}, independently of each other and of the source RVs. Let Φ: {0, 1}^n → X^k be the random mapping taking the value ϕ_f if F = f. Then

         (1/|F(k, n)|) Σ_{f∈F(k,n)} e(f, ϕ_f) = Pr{Φ(F(X^k)) ≠ X^k} = Σ_{x∈X^k} P^k(x) Pr{Φ(F(x)) ≠ x}.

     Here

         Pr{Φ(F(x)) ≠ x} ≤ 2^{−n} |{x′ : P^k(x′) ≥ P^k(x)}|,

     and this is less than 2^{−n+k(H(P)+δ)} if P^k(x) ≥ 2^{−k(H(P)+δ)}.
1.7. (a) (Linear source codes) Let X be a Galois field (i.e., any finite field) and consider X^k as a vector space over this field. A linear source code is a pair of mappings f: X^k → X^n and ϕ: X^n → X^k such that f is a linear mapping (ϕ is arbitrary). Show that for a DMS with generic distribution P there exist linear source codes with n/k → H(P)/log |X| and e(f, ϕ) → 0. Compare this result with Problem 1.1(b). (Implicit in Elias (1955), cf. Wyner (1974).)
     Hint Verify that the class of all linear mappings f: X^k → X^n satisfies the condition in (b) below.
     (b) Extend the result of Problem 1.6 to the case when the role of {0, 1} is played by any finite set Y, and F(k, n) is any class of mappings f: X^k → Y^n satisfying

         (1/|F(k, n)|) |{f ∈ F(k, n): f(x) = f(x′)}| ≤ |Y|^{−n}   for x ≠ x′.

     (Such a class of mappings is called a universal hash family; see Carter and Wegman (1979).)
     Hint If |Y| = 2, the hint to Problem 1.6 applies verbatim for the random mapping F selected from the present F(k, n) by the uniform distribution. If |Y| > 2, the crucial bound on Pr{Φ(F(x)) ≠ x} will hold with |Y|^{−n} instead of 2^{−n}; accordingly, the assertion follows if in the hypothesis H(P) is replaced by H(P)/log |Y|.
1.8.∗ Show that the s(k, ε) of Theorem 1.2 satisfies

         | log s(k, ε) − E_k − √k λ S_k + (1/2) log k | ≤ 140/δ^8

     whenever

         δ ≤ min(S_k, 1/R_k),   δ ≤ ε ≤ 1 − δ,   √k ≥ 140/δ^8.

     Here S_k ≜ ((1/k) Σ_{i=1}^k var(Y_i))^{1/2}, R_k ≜ ((1/k) Σ_{i=1}^k E|Y_i − EY_i|^3)^{1/3}, and λ is determined by Φ(λ) = 1 − ε, where Φ denotes the distribution function of the standard normal distribution; E_k and the Y_i are the same as in the text. (See Strassen (1964).)
1.9. In hypothesis testing problems it sometimes makes sense to speak of "prior probabilities" Pr{P is true} = p_0 and Pr{Q is true} = q_0 = 1 − p_0. On the basis of a sample x ∈ X^k, the posterior probabilities are then calculated as

         Pr{P is true | X^k = x} ≜ p_k(x) = p_0 P^k(x) / (p_0 P^k(x) + q_0 Q^k(x)),
         Pr{Q is true | X^k = x} ≜ q_k(x) = 1 − p_k(x).

     Show that if P is true then p_k(X^k) → 1 and (1/k) log q_k(X^k) → −D(P‖Q) with probability 1, no matter what p_0 ∈ (0, 1) was.
1.10. The interpretation of entropy as a measure of uncertainty suggests that "more uniform" distributions have larger entropy. For two distributions P and Q on X we call P more uniform than Q, in symbols P ≻ Q, if for the non-increasing orderings p_1 ≥ p_2 ≥ ··· ≥ p_n, q_1 ≥ q_2 ≥ ··· ≥ q_n (n = |X|) of their probabilities, Σ_{i=1}^k p_i ≤ Σ_{i=1}^k q_i for every 1 ≤ k ≤ n. Show that P ≻ Q implies H(P) ≥ H(Q); compare this result with (1.2).
      (More generally, P ≻ Q implies Σ_{i=1}^n ψ(p_i) ≤ Σ_{i=1}^n ψ(q_i) for every convex function ψ; see Karamata (1932).)
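A quick sanity check of the claim in Problem 1.10: averaging two coordinates of a distribution Q is a doubly stochastic operation, so the result P is more uniform than Q. The sketch below (illustrative only; all numerical choices are arbitrary) verifies the partial-sum condition and H(P) ≥ H(Q) on randomly generated examples.

```python
import random
from math import log2

def entropy(p):
    return -sum(x * log2(x) for x in p if x > 0)

def more_uniform(p, q):
    # partial-sum condition: the k largest P-probabilities never sum to more
    # than the k largest Q-probabilities
    ps, qs = sorted(p, reverse=True), sorted(q, reverse=True)
    sp = sq = 0.0
    for a, b in zip(ps, qs):
        sp, sq = sp + a, sq + b
        if sp > sq + 1e-12:
            return False
    return True

random.seed(0)
for _ in range(1000):
    n = random.randint(2, 6)
    q = [random.random() for _ in range(n)]
    q = [x / sum(q) for x in q]
    # averaging two coordinates of q ("Robin Hood" transfer) yields a more
    # uniform distribution p
    i, j = random.sample(range(n), 2)
    p = list(q)
    p[i] = p[j] = (q[i] + q[j]) / 2
    assert more_uniform(p, q)
    assert entropy(p) >= entropy(q) - 1e-9
print("H(P) >= H(Q) held whenever P was more uniform than Q")
```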
      Postulational characterizations of entropy (Problems 1.11–1.14)
      In the following problems, H_m(p_1, ..., p_m), m = 2, 3, ..., designates a sequence of real-valued functions defined for non-negative p_i's with sum 1, such that H_m is invariant under permutations of the p_i's. Some simple postulates on H_m will be formulated which ensure that

         H_m(p_1, ..., p_m) = −Σ_{i=1}^m p_i log p_i,   m = 2, 3, ....   (∗)

      In particular, we shall say that {H_m} is
      (i) expansible if H_{m+1}(p_1, ..., p_m, 0) = H_m(p_1, ..., p_m);
      (ii) additive if
         H_m(p_1, ..., p_m) + H_n(q_1, ..., q_n) = H_{mn}(p_1 q_1, ..., p_1 q_n, ..., p_m q_1, ..., p_m q_n);
      (iii) subadditive if H_m(p_1, ..., p_m) + H_n(q_1, ..., q_n) ≥ H_{mn}(r_11, ..., r_mn) whenever
         Σ_{j=1}^n r_ij = p_i,   Σ_{i=1}^m r_ij = q_j;
      (iv) branching if there exist functions J_m(x, y) (with x, y ≥ 0, x + y ≤ 1, m = 3, 4, ...) such that
         H_m(p_1, ..., p_m) − H_{m−1}(p_1 + p_2, p_3, ..., p_m) = J_m(p_1, p_2);
      (v) recursive if it is branching with
         J_m(p_1, p_2) = (p_1 + p_2) H_2(p_1/(p_1 + p_2), p_2/(p_1 + p_2)),   m = 3, 4, ...;
      (vi) normalized if H_2(1/2, 1/2) = 1.
      For a complete exposition of this subject, we refer to Aczél and Daróczy (1975).
1.11. Show that if {H_m} is recursive, normalized and H_2(p, 1 − p) is a continuous function of p then (∗) holds. (See Faddeev (1956); the first "axiomatic" characterization of entropy, using somewhat stronger postulates, was given by Shannon (1948).)
      Hint The key step is to prove H_m(1/m, ..., 1/m) = log m. To this end, check that f(m) ≜ H_m(1/m, ..., 1/m) is additive, i.e., f(mn) = f(m) + f(n), and that f(m + 1) − f(m) → 0 as m → ∞. Show that these properties and f(2) = 1 imply f(m) = log m. (The last implication is a result of Erdős (1946); for a simple proof, see Rényi (1961).)
1.12.∗ (a) Show that if H_m(p_1, ..., p_m) = Σ_{i=1}^m g(p_i) with a continuous function g(p), and {H_m} is additive and normalized, then (∗) holds. (Chaundy and McLeod (1960).)
      (b) Show that if {H_m} is expansible and branching then H_m(p_1, ..., p_m) = Σ_{i=1}^m g(p_i), with g(0) = 0. (Ng (1974).)
1.13.∗ (a) Show that if {H_m} is expansible, additive, subadditive, normalized and H_2(p, 1 − p) → 0 as p → 0 then (∗) holds.
      (b) If {H_m} is expansible, additive and subadditive, show that there exist constants A ≥ 0, B ≥ 0 such that

         H_m(p_1, ..., p_m) = A(−Σ_{i=1}^m p_i log p_i) + B log |{i : p_i > 0}|.

      (Forte (1975), Aczél, Forte and Ng (1974).)
1.14.∗ Suppose that H_m(p_1, ..., p_m) = −log ψ^{−1}(Σ_{i=1}^m p_i ψ(p_i)) with some strictly monotonic continuous function ψ on (0, 1] such that tψ(t) → 0 as t → 0. Show that if {H_m} is additive and normalized then either (∗) holds or

         H_m(p_1, ..., p_m) = (1/(1 − α)) log Σ_{i=1}^m p_i^α   with some α > 0, α ≠ 1.

      (Conjectured by Rényi (1961) and proved by Daróczy (1964). The preceding expression is called Rényi's entropy of order α. A similar expression was used earlier by Schützenberger (1954) as "pseudo information.")
1.15. For P = (p_1, ..., p_m), denote by H_α(P) the Rényi entropy of order α if α ≠ 1, α > 0, and the Shannon entropy H(P) if α = 1. Show that H_α(P) is a continuous, non-increasing function of α, whose limits as α → 0, resp. α → +∞, are

         H_0(P) ≜ log |{i : p_i > 0}|,   H_∞(P) ≜ min_i (−log p_i),

      called the maxentropy, resp. minentropy, of P.
      Hint Check that log Σ_{i=1}^m p_i^α is a convex function of α.
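The behaviour described in Problem 1.15 is easy to see numerically. The sketch below (illustrative only; the distribution and the grid of orders α are arbitrary, and logs are taken to base 2) evaluates H_α(P) and checks that it is non-increasing in α.

```python
from math import inf, log2

def renyi_entropy(p, alpha):
    """H_alpha(P) in bits; alpha = 1 gives the Shannon entropy."""
    p = [x for x in p if x > 0]
    if alpha == 0:
        return log2(len(p))                  # maxentropy: log of support size
    if alpha == 1:
        return -sum(x * log2(x) for x in p)
    if alpha == inf:
        return -log2(max(p))                 # minentropy
    return log2(sum(x ** alpha for x in p)) / (1 - alpha)

P = [0.5, 0.25, 0.125, 0.125]                # illustrative distribution
previous = None
for a in (0, 0.25, 0.5, 0.9, 1, 1.1, 2, 5, 20, inf):
    h = renyi_entropy(P, a)
    assert previous is None or h <= previous + 1e-9   # non-increasing in alpha
    previous = h
    print(f"alpha = {a!s:>5}: H_alpha(P) = {h:.4f} bits")
```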
1.16. (Fisher's information) Let {P_ϑ} be a family of distributions on a finite set X, where ϑ is a real parameter ranging over an open interval. Suppose that the probabilities P_ϑ(x) are positive and that they are continuously differentiable functions of ϑ. Write

         I(ϑ) ≜ Σ_{x∈X} (1/P_ϑ(x)) (∂P_ϑ(x)/∂ϑ)^2.

      (a) Show that for every ϑ

         lim_{ϑ′→ϑ} (1/(ϑ′ − ϑ)^2) D(P_{ϑ′}‖P_ϑ) = (1/ln 4) I(ϑ)

      (Kullback and Leibler (1951)).
      (b) Show that every unbiased estimator f of ϑ from a sample of size n, i.e., every real-valued function f on X^n such that E_ϑ f(X^n) = ϑ for each ϑ, satisfies

         var_ϑ(f(X^n)) ≥ 1/(n I(ϑ)).

      Here E_ϑ and var_ϑ denote expectation, resp. variance, in the case when X^n has distribution P^n_ϑ.
      (Fisher (1925) introduced I(ϑ) as a measure of the information contained in one observation from P_ϑ for estimating ϑ. His motivation was that the maximum likelihood estimator of ϑ from a sample of size n has asymptotic variance 1/(nI(ϑ_0)) if ϑ = ϑ_0. The assertion of (b) is a special case of the Cramér–Rao inequality; see, e.g., Schmetterer (1974).)
      Hint (a) directly follows by L'Hospital's rule. For (b), it suffices to consider the case n = 1. But

         Σ_{x∈X} P_ϑ(x) [(1/P_ϑ(x)) ∂P_ϑ(x)/∂ϑ]^2 · Σ_{x∈X} P_ϑ(x)(f(x) − ϑ)^2 ≥ 1

      follows from Cauchy's inequality, since

         Σ_{x∈X} (∂P_ϑ(x)/∂ϑ) · (f(x) − ϑ) = (∂/∂ϑ) Σ_{x∈X} P_ϑ(x) f(x) = 1.
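Part (a) of Problem 1.16 can be checked numerically on a concrete one-parameter family. The sketch below is a minimal illustration, assuming a Bernoulli(ϑ) family and approximating the derivative of P_ϑ(x) numerically; it compares D(P_{ϑ+h}‖P_ϑ)/h² for shrinking h with I(ϑ)/ln 4 (divergences measured in bits).

```python
from math import log, log2

# Illustrative one-parameter family (not from the text): Bernoulli(theta),
# i.e. P_theta(1) = theta, P_theta(0) = 1 - theta.
def P(theta):
    return {0: 1 - theta, 1: theta}

def D(p, q):
    # informational divergence in bits
    return sum(p[x] * log2(p[x] / q[x]) for x in p if p[x] > 0)

def fisher_information(theta, h=1e-6):
    # I(theta) via a numerical derivative of P_theta(x)
    p, p2 = P(theta), P(theta + h)
    return sum((p2[x] - p[x]) ** 2 / (h ** 2 * p[x]) for x in p)

theta = 0.3
limit = fisher_information(theta) / log(4)     # I(theta) / ln 4
for step in (0.1, 0.01, 0.001):
    ratio = D(P(theta + step), P(theta)) / step ** 2
    print(f"step {step:5}: D/step^2 = {ratio:.5f}  (I(theta)/ln 4 = {limit:.5f})")
```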
Story of the results
The basic concepts of information theory are due to Shannon (1948). In particular, he
proved Theorem 1.1, introduced the information measures entropy, conditional entropy
and mutual information, and established their basic properties. The name entropy has
been borrowed from physics, as entropy in the sense of statistical physics is expressed
by a similar formula, due to Boltzmann (1877). The very idea of measuring informa-
tion regardless of its content dates back to Hartley (1928), who assigned to a symbol
out of m alternatives the amount of information log m. An information measure in a
specific context was used by Fisher (1925), as in Problem 1.16. Informational diver-
gence was introduced by Kullback and Leibler (1951) (under the name information for
discrimination; they used the term divergence for its symmetrized version). Corollary
1.2 is known as Stein’s lemma; it appears in Chernoff (1956), attributed to C. Stein.
Theorem 1.2 is a common generalization of Theorem 1.1 and Corollary 1.2; a stronger
result of this kind was given by Strassen (1964), see Problem 1.8. For a nice discus-
sion of the pragmatic and axiomatic approaches to information measures, see Rényi
(1965).
Addition. For more on the interplay of information theory and statistics, see Kullback
(1959), Rissanen (1989), and Csiszár and Shields (2004).
2 Types and typical sequences
Most of the proof techniques used in this book will be based on a few simple
combinatorial lemmas, summarized below.
Drawing k times independently with distribution Q from a finite set X, the probability
of obtaining the sequence x ∈ Xk depends only on how often the various elements of
X occur in x. In fact, denoting by N(a|x) the number of occurrences of a ∈ X in x,
we have
         Q^k(x) = Π_{a∈X} Q(a)^{N(a|x)}.   (2.1)
DEFINITION 2.1 The type of a sequence x ∈ X^k is the distribution P_x on X defined by

         P_x(a) ≜ (1/k) N(a|x)   for every a ∈ X.

For any distribution P on X, the set of sequences of type P in X^k is denoted by T^k_P or simply T_P. A distribution P on X is called a type of sequences in X^k if T^k_P ≠ ∅.
Sometimes the term "type" will also be used for the sets T^k_P ≠ ∅ when this does not lead to ambiguity. These sets are also called type classes or composition classes.
REMARK In mathematical statistics, if x ∈ Xk is a sample of size k consisting of
the results of k observations, the type of x is called the empirical distribution of the
sample x. 
By (2.1), the Qk-probability of a subset of TP is determined by its cardinality. Hence
the Qk-probability of any subset A of Xk can be calculated by combinatorial counting
arguments, looking at the intersections of A with the various sets TP separately. In doing
so, it will be relevant that the number of different types in Xk is much smaller than the
number of sequences x ∈ Xk.
LEMMA 2.2 (Type counting) The number of different types of sequences in X^k is less than (k + 1)^{|X|}.
Proof For every a ∈ X, N(a|x) can take k + 1 different values. 
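The objects just defined are easy to mirror in code. The following Python sketch (an illustration, with an arbitrary alphabet and sequence) computes the type of a sequence, then brute-forces the number of distinct types in X^k and compares it with the bound of Lemma 2.2 and with the exact count stated later in Problem 2.1.

```python
from collections import Counter
from itertools import product
from math import comb

X = ('a', 'b', 'c')
x = ('a', 'c', 'a', 'b', 'a', 'c')
k = len(x)

# The type P_x of x assigns to each letter its relative frequency N(a|x)/k.
P_x = {a: Counter(x)[a] / k for a in X}
print("type of x:", P_x)

# Brute-force the set of distinct types of sequences in X^k and compare with
# the bound (k+1)^|X| of Lemma 2.2 (and with the exact count of Problem 2.1).
types = {tuple(Counter(s)[a] for a in X) for s in product(X, repeat=k)}
print(len(types), "types;",
      "exact count:", comb(k + len(X) - 1, len(X) - 1),
      "; bound (k+1)^|X| =", (k + 1) ** len(X))
```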
The next lemma explains the role of entropy from a combinatorial point of view, via
the asymptotics of a multinomial coefficient.
LEMMA 2.3 For any type P of sequences in X^k

         (k + 1)^{−|X|} exp[kH(P)] ≤ |T_P| ≤ exp[kH(P)].

Proof Since (2.1) implies

         P^k(x) = exp[−kH(P)]   if x ∈ T_P,

we have

         |T_P| = P^k(T_P) exp[kH(P)].

Hence it is enough to prove that

         P^k(T_P) ≥ (k + 1)^{−|X|}.

This will follow by the Type counting lemma if we show that the P^k-probability of T_P̂ is maximized for P̂ = P.
By (2.1) we have

         P^k(T_P̂) = |T_P̂| · Π_{a∈X} P(a)^{k P̂(a)} = (k! / Π_{a∈X} (k P̂(a))!) Π_{a∈X} P(a)^{k P̂(a)}

for every type P̂ of sequences in X^k. It follows that

         P^k(T_P̂)/P^k(T_P) = Π_{a∈X} [(kP(a))! / (k P̂(a))!] P(a)^{k(P̂(a)−P(a))}.

Applying the obvious inequality n!/m! ≤ n^{n−m}, this gives

         P^k(T_P̂)/P^k(T_P) ≤ Π_{a∈X} k^{k(P(a)−P̂(a))} = 1.
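As a numerical check of Lemma 2.3 (an illustration, with exp and log taken to base 2 and an arbitrarily chosen type), |T_P| is the multinomial coefficient k!/Π_a (kP(a))!, which the sketch below compares with the two bounds.

```python
from math import factorial, log2

# An illustrative type P of sequences of length k = 12 (kP(a) must be integers).
k = 12
counts = {'a': 6, 'b': 4, 'c': 2}
P = {a: n / k for a, n in counts.items()}
H = -sum(p * log2(p) for p in P.values())

# |T_P| is the multinomial coefficient k! / prod_a (kP(a))!
size = factorial(k)
for n in counts.values():
    size //= factorial(n)

# Lemma 2.3 with exp interpreted as 2^(.) and entropy in bits.
lower = (k + 1) ** (-len(counts)) * 2 ** (k * H)
upper = 2 ** (k * H)
print(f"{lower:.1f} <= |T_P| = {size} <= {upper:.1f}")
assert lower <= size <= upper * (1 + 1e-9)
```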
If X and Y are two finite sets, the joint type of a pair of sequences x ∈ X^k and y ∈ Y^k is defined as the type of the sequence {(x_i, y_i)}_{i=1}^k ∈ (X × Y)^k. In other words, it is the distribution P_{x,y} on X × Y defined by

         P_{x,y}(a, b) ≜ (1/k) N(a, b|x, y)   for every a ∈ X, b ∈ Y.

Joint types will often be given in terms of the type of x and a stochastic matrix V: X → Y such that

         P_{x,y}(a, b) = P_x(a) V(b|a)   for every a ∈ X, b ∈ Y.   (2.2)

Note that the joint type P_{x,y} uniquely determines V(b|a) for those a ∈ X which do occur in the sequence x. For conditional probabilities of sequences y ∈ Y^k, given a sequence x ∈ X^k, the matrix V of (2.2) will play the same role as the type of y does for unconditional probabilities.
DEFINITION 2.4 We say that y ∈ Y^k has conditional type V given x ∈ X^k if

         N(a, b|x, y) = N(a|x) V(b|a)   for every a ∈ X, b ∈ Y.

For any given x ∈ X^k and stochastic matrix V: X → Y, the set of sequences y ∈ Y^k having conditional type V given x will be called the V-shell of x, denoted by T^k_V(x) or simply T_V(x).
REMARK The conditional type of y given x is not uniquely determined if some a ∈ X
do not occur in x. Still, the set TV (x) containing y is unique. 
Note that conditional type is a generalization of types. In fact, if all the components of the sequence x are equal (say x) then the V-shell of x coincides with the set of sequences of type V(·|x) in Y^k.
In order to formulate the basic size and probability estimates for V-shells, it will be convenient to introduce some notations. The average of the entropies of the rows of a stochastic matrix V: X → Y with respect to a distribution P on X will be denoted by

         H(V|P) ≜ Σ_{x∈X} P(x) H(V(·|x)).   (2.3)

The analogous average of the informational divergences of the corresponding rows of two stochastic matrices V: X → Y and W: X → Y will be denoted by

         D(V‖W|P) ≜ Σ_{x∈X} P(x) D(V(·|x)‖W(·|x)).   (2.4)

Note that H(V|P) is the conditional entropy H(Y|X) of RVs X and Y such that X has distribution P and Y has conditional distribution V given X. The quantity D(V‖W|P) is called the conditional informational divergence. A counterpart of Lemma 2.3 for V-shells is
LEMMA 2.5 For every x ∈ X^k and stochastic matrix V: X → Y such that T_V(x) is non-void, we have

         (k + 1)^{−|X||Y|} exp[kH(V|P_x)] ≤ |T_V(x)| ≤ exp[kH(V|P_x)].

Proof This is an easy consequence of Lemma 2.3. In fact, |T_V(x)| depends on x only through the type of x. Hence we may assume that x is the juxtaposition of sequences x_a, a ∈ X, where x_a consists of N(a|x) identical elements a. In this case T_V(x) is the Cartesian product of the sets of sequences of type V(·|a) in Y^{N(a|x)}, with a running over those elements of X which occur in x.
Thus Lemma 2.3 gives

         Π_{a∈X} (N(a|x) + 1)^{−|Y|} exp[N(a|x) H(V(·|a))] ≤ |T_V(x)| ≤ Π_{a∈X} exp[N(a|x) H(V(·|a))],

whence the assertion follows by (2.3).
LEMMA 2.6 For every type P of sequences in X^k and distribution Q on X

         Q^k(x) = exp[−k(D(P‖Q) + H(P))]   if x ∈ T_P,   (2.5)
         (k + 1)^{−|X|} exp[−kD(P‖Q)] ≤ Q^k(T_P) ≤ exp[−kD(P‖Q)].   (2.6)

Similarly, for every x ∈ X^k and stochastic matrices V: X → Y, W: X → Y such that T_V(x) is non-void,

         W^k(y|x) = exp[−k(D(V‖W|P_x) + H(V|P_x))]   if y ∈ T_V(x),   (2.7)
         (k + 1)^{−|X||Y|} exp[−kD(V‖W|P_x)] ≤ W^k(T_V(x)|x) ≤ exp[−kD(V‖W|P_x)].   (2.8)

Proof Equation (2.5) is just a rewriting of (2.1). Similarly, (2.7) is a rewriting of the identity

         W^k(y|x) = Π_{a∈X, b∈Y} W(b|a)^{N(a,b|x,y)}.

The remaining assertions now follow from Lemmas 2.3 and 2.5.
The quantity D(P‖Q) + H(P) = −Σ_{x∈X} P(x) log Q(x) appearing in (2.5) is sometimes called inaccuracy.
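The bounds (2.6) can likewise be checked directly for small parameters. The sketch below (illustrative only; the type P, the distribution Q and the base-2 convention for exp and log are arbitrary choices) computes Q^k(T_P) exactly and compares it with the two sides of (2.6).

```python
from math import factorial, log2

k = 12
counts = {'a': 6, 'b': 4, 'c': 2}                 # a type P of sequences in X^k
P = {a: n / k for a, n in counts.items()}
Q = {'a': 0.2, 'b': 0.3, 'c': 0.5}                # an arbitrary distribution Q

D = sum(p * log2(p / Q[a]) for a, p in P.items())  # D(P||Q) in bits

# exact Q^k(T_P) = |T_P| * prod_a Q(a)^{kP(a)}
size = factorial(k)
for n in counts.values():
    size //= factorial(n)
prob = size
for a, n in counts.items():
    prob *= Q[a] ** n

# the two sides of (2.6), with exp interpreted as 2^(.)
lower = (k + 1) ** (-len(P)) * 2 ** (-k * D)
upper = 2 ** (-k * D)
print(f"{lower:.3e} <= Q^k(T_P) = {prob:.3e} <= {upper:.3e}")
assert lower <= prob <= upper * (1 + 1e-9)
```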
For Q ≠ P, the Q^k-probability of the set T^k_P is exponentially small (for large k); cf. Lemma 2.6. It can be seen that even P^k(T^k_P) → 0 as k → ∞. Thus sets of large probability must contain sequences of different types. Dealing with such sets, the continuity of the entropy function plays a relevant role. The next lemma gives more precise information on this continuity.
The variation distance of two distributions P and Q on X is

         d(P, Q) ≜ Σ_{x∈X} |P(x) − Q(x)|.

(Some authors use the term for the half of this.)
LEMMA 2.7 If d(P, Q) = Θ ≤ 1/2 then

         |H(P) − H(Q)| ≤ −Θ log (Θ/|X|).

For a sharpening of this lemma, see Problem 3.10.
Proof Write ϑ(x) ≜ |P(x) − Q(x)|. Since f(t) ≜ −t log t is concave and f(0) = f(1) = 0, we have for every 0 ≤ t ≤ 1 − τ, 0 ≤ τ ≤ 1/2,

         |f(t) − f(t + τ)| ≤ max(f(τ), f(1 − τ)) = −τ log τ.

Hence for 0 ≤ Θ ≤ 1/2

         |H(P) − H(Q)| ≤ Σ_{x∈X} |f(P(x)) − f(Q(x))| ≤ −Σ_{x∈X} ϑ(x) log ϑ(x)
            = Θ (−Σ_{x∈X} (ϑ(x)/Θ) log (ϑ(x)/Θ) − log Θ) ≤ Θ log |X| − Θ log Θ,

where the last step follows from Corollary 1.1.
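The continuity bound of Lemma 2.7 can be exercised on random pairs of distributions. The sketch below (illustrative; the alphabet size and number of trials are arbitrary, logs are base 2) checks |H(P) − H(Q)| ≤ −d log(d/|X|) whenever the sampled variation distance d is at most 1/2.

```python
import random
from math import log2

def entropy(p):
    return -sum(v * log2(v) for v in p if v > 0)

random.seed(1)
size = 5                                          # |X|
for _ in range(10000):
    P = [random.random() for _ in range(size)]
    Q = [random.random() for _ in range(size)]
    sP, sQ = sum(P), sum(Q)
    P = [v / sP for v in P]
    Q = [v / sQ for v in Q]
    d = sum(abs(p - q) for p, q in zip(P, Q))     # variation distance
    if 0 < d <= 0.5:
        bound = -d * log2(d / size)               # -Theta log(Theta/|X|)
        assert abs(entropy(P) - entropy(Q)) <= bound + 1e-9
print("|H(P) - H(Q)| <= -d(P,Q) log(d(P,Q)/|X|) held in every sampled case")
```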
DEFINITION 2.8 For any distribution P on X, a sequence x ∈ X^k is called P-typical with constant δ if

         |(1/k) N(a|x) − P(a)| ≤ δ   for every a ∈ X

and, in addition, no a ∈ X with P(a) = 0 occurs in x. The set of such sequences will be denoted by T^k_{[P]δ} or simply T_{[P]δ}. If X is a RV with values in X, we refer to P_X-typical sequences as X-typical, and write T^k_{[X]δ} or T_{[X]δ} for T^k_{[P_X]δ}.
REMARK T^k_{[P]δ} is the union of the sets T^k_{P̂} for those types P̂ of sequences in X^k which satisfy

         |P̂(a) − P(a)| ≤ δ   for every a ∈ X

and P̂(a) = 0 whenever P(a) = 0.
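Definition 2.8 translates directly into a membership test. The sketch below (illustrative; distribution, block length and δ are arbitrary choices) checks whether a randomly drawn sequence is P-typical with constant δ.

```python
import random
from collections import Counter

def is_typical(x, P, delta):
    """Definition 2.8: every letter frequency within delta of P(a), and no
    letter outside the support of P occurs in x."""
    k, counts = len(x), Counter(x)
    if any(P.get(a, 0) == 0 for a in counts):
        return False
    return all(abs(counts[a] / k - p) <= delta for a, p in P.items())

random.seed(2)
P = {'a': 0.5, 'b': 0.3, 'c': 0.2}                 # illustrative distribution
k, delta = 2000, 0.02
x = random.choices(list(P), weights=list(P.values()), k=k)
print("x is P-typical with constant delta:", is_typical(x, P, delta))
```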
DEFINITION 2.9 For a stochastic matrix W: X → Y, a sequence y ∈ Y^k is W-typical under the condition x ∈ X^k (or W-generated by the sequence x ∈ X^k) with constant δ if

         |(1/k) N(a, b|x, y) − (1/k) N(a|x) W(b|a)| ≤ δ   for every a ∈ X, b ∈ Y,

and, in addition, N(a, b|x, y) = 0 whenever W(b|a) = 0. The set of such sequences y will be denoted by T^k_{[W]δ}(x) or simply by T_{[W]δ}(x). Further, if X and Y are RVs with values in X resp. Y and P_{Y|X} = W, then we shall speak of Y|X-typical or Y|X-generated sequences and write T^k_{[Y|X]δ}(x) or T_{[Y|X]δ}(x) for T^k_{[W]δ}(x).
Sequences Y|X-generated by an x ∈ X^k are defined only if the condition P_{Y|X} = W uniquely determines W(·|a) for a ∈ X with N(a|x) > 0, that is, if no a ∈ X with P_X(a) = 0 occurs in the sequence x; this automatically holds if x is X-typical.
The set T^k_{[XY]δ} of (X, Y)-typical pairs (x, y) ∈ X^k × Y^k is defined by applying Definition 2.8 to (X, Y) in the role of X. When the pair (x, y) is typical, we say that x and y are jointly typical.
LEMMA 2.10 If x ∈ T^k_{[X]δ} and y ∈ T^k_{[Y|X]δ′}(x) then (x, y) ∈ T^k_{[XY]δ+δ′} and, consequently, y ∈ T^k_{[Y]δ″} for δ″ ≜ (δ + δ′)|X|.
For reasons which will be obvious from Lemmas 2.12 and 2.13, typical sequences will be used with δ depending on k such that

         δ_k → 0,   √k · δ_k → ∞   as k → ∞.   (2.9)

Throughout this book, we adopt the following convention.
CONVENTION 2.11 (Delta-convention) To every set X resp. ordered pair of sets (X, Y) there is given a sequence {δ_k}_{k=1}^∞ satisfying (2.9). Typical sequences are understood with these δ_k. The sequences {δ_k} are considered as fixed, and in all assertions dependence on them will be suppressed. Accordingly, the constant δ will be omitted from the notation, i.e., we shall write T^k_{[P]}, T^k_{[W]}(x), etc. In most applications, some simple relations between these sequences {δ_k} will also be needed. In particular, whenever we need that typical sequences should generate typical ones, we assume that the corresponding δ_k are chosen according to Lemma 2.10.
LEMMA 2.12 There exists a sequence ε_k → 0 depending only on |X| and |Y| (see the delta-convention) so that for every distribution P on X and stochastic matrix W: X → Y

         P^k(T^k_{[P]}) ≥ 1 − ε_k,
         W^k(T^k_{[W]}(x)|x) ≥ 1 − ε_k   for every x ∈ X^k.

REMARK More explicitly,

         P^k(T^k_{[P]δ}) ≥ 1 − |X|/(4kδ^2),   W^k(T^k_{[W]δ}(x)|x) ≥ 1 − |X||Y|/(4kδ^2),

for every δ > 0, and here the terms subtracted from 1 could be replaced even by 2|X| e^{−2kδ^2} resp. 2|X||Y| e^{−2kδ^2}.
Proof It suffices to prove the inequalities of the Remark. Clearly, the second inequality implies the first one as a special case (choose in the second inequality a one-point set for X). Now if x = x_1 ... x_k, let Y_1, Y_2, ..., Y_k be independent RVs with distributions P_{Y_i} = W(·|x_i). Then the RV N(a, b|x, Y^k) has binomial distribution with expectation N(a|x)W(b|a) and variance

         N(a|x)W(b|a)(1 − W(b|a)) ≤ (1/4) N(a|x) ≤ k/4.

Thus by Chebyshev's inequality

         Pr{|N(a, b|x, Y^k) − N(a|x)W(b|a)| ≥ kδ} ≤ 1/(4kδ^2)

for every a ∈ X, b ∈ Y. Hence the inequality with 1 − |X||Y|/(4kδ^2) follows. The claimed sharper bound is obtained similarly, employing Hoeffding's inequality (see Problem 3.18(b)) instead of Chebyshev's.
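The Chebyshev bound of the Remark can be compared with a Monte Carlo estimate. In the sketch below (illustrative parameter choices only), the empirical frequency of P-typical sequences is indeed well above 1 − |X|/(4kδ²).

```python
import random
from collections import Counter

random.seed(3)
P = {'a': 0.6, 'b': 0.3, 'c': 0.1}
letters, weights = list(P), list(P.values())
k, delta, trials = 1000, 0.05, 2000

def typical(x):
    # per-letter frequency within delta of P(a)
    c = Counter(x)
    return all(abs(c[a] / k - P[a]) <= delta for a in P)

hits = sum(typical(random.choices(letters, weights=weights, k=k))
           for _ in range(trials))
estimate = hits / trials
chebyshev_bound = 1 - len(P) / (4 * k * delta ** 2)   # bound from the Remark
print(f"P^k(T_[P]delta) ~ {estimate:.3f} >= {chebyshev_bound:.3f} (Chebyshev bound)")
```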
LEMMA 2.13 There exists a sequence ε_k → 0 depending only on |X| and |Y| (see the delta-convention) so that for every distribution P on X and stochastic matrix W: X → Y

         |(1/k) log |T^k_{[P]}| − H(P)| ≤ ε_k

and

         |(1/k) log |T^k_{[W]}(x)| − H(W|P)| ≤ ε_k   for every x ∈ T^k_{[P]}.

Proof The first assertion immediately follows from Lemma 2.3 and the uniform continuity of the entropy function (Lemma 2.7). The second assertion, containing the first one as a special case, follows similarly from Lemmas 2.5 and 2.7. To be formal, observe that, by the type counting lemma, T^k_{[W]}(x) is the union of at most (k + 1)^{|X||Y|} disjoint V-shells T_V(x). By Definitions 2.4 and 2.9, all the underlying V satisfy

         |P_x(a)V(b|a) − P_x(a)W(b|a)| ≤ δ′_k   for every a ∈ X, b ∈ Y,   (2.10)

where {δ′_k} is the sequence corresponding to the pair of sets X, Y by the delta-convention. By (2.10) and Lemma 2.7, the entropies of the joint distributions on X × Y determined by P_x and V resp. by P_x and W differ by at most −|X||Y| δ′_k log δ′_k (if |X||Y| δ′_k ≤ 1/2), and thus also

         |H(V|P_x) − H(W|P_x)| ≤ −|X||Y| δ′_k log δ′_k.

On account of Lemma 2.5, it follows that

         (k + 1)^{−|X||Y|} exp[k(H(W|P_x) + |X||Y| δ′_k log δ′_k)] ≤ |T^k_{[W]}(x)| ≤ (k + 1)^{|X||Y|} exp[k(H(W|P_x) − |X||Y| δ′_k log δ′_k)].   (2.11)

Finally, since x is P-typical, i.e.,

         |P_x(a) − P(a)| ≤ δ_k   for every a ∈ X,

we have by Corollary 1.1

         |H(W|P_x) − H(W|P)| ≤ |X| δ_k log |Y|.

Substituting this into (2.11), the assertion follows.
The last basic lemma of this chapter asserts that no "large probability set" can be substantially smaller than T_{[P]} resp. T_{[W]}(x).
LEMMA 2.14 Given 0 < η < 1, there exists a sequence ε_k → 0 depending only on η, |X| and |Y| such that
(i) if A ⊂ X^k, P^k(A) ≥ η then (1/k) log |A| ≥ H(P) − ε_k;
(ii) if B ⊂ Y^k, W^k(B|x) ≥ η then (1/k) log |B| ≥ H(W|P_x) − ε_k.
COROLLARY 2.14 There exists a sequence ε′_k → 0 depending only on η, |X|, |Y| (see the delta-convention) such that if B ⊂ Y^k and W^k(B|x) ≥ η for some x ∈ T_{[P]} then

         (1/k) log |B| ≥ H(W|P) − ε′_k.

Proof It is sufficient to prove (ii). By Lemma 2.12, the condition W^k(B|x) ≥ η implies

         W^k(B ∩ T_{[W]}(x)|x) ≥ η/2

for k ≥ k_0(η, |X|, |Y|). Recall that T_{[W]}(x) is the union of disjoint V-shells T_V(x) satisfying (2.10); see the proof of Lemma 2.13. Since W^k(y|x) is constant within a V-shell of x, it follows that

         |B ∩ T_V(x)| ≥ (η/2) |T_V(x)|

for at least one V: X → Y satisfying (2.10). Now the proof can be completed using Lemmas 2.5 and 2.7 just as in the proof of the previous lemma.
Observe that the preceding three lemmas contain a proof of Theorem 1.1. Namely,
the fact that about kH(P) binary digits are sufficient for encoding k-length messages
of a DMS with generic distribution P, is a consequence of Lemmas 2.12 and 2.13,
while the necessity of this many binary digits follows from Lemma 2.14. Most coding
theorems in this book will be proved using typical sequences in a similar manner. The
merging of several nearby types has the advantage of facilitating computations. When
dealing with the more refined questions of the speed of convergence of error proba-
bilities, however, the method of typical sequences will become inappropriate. In such
problems, we shall have to consider each type separately, relying on the first part of this
chapter. Although this will not occur until Chapter 9, as an immediate illustration of the
more subtle method we now refine the basic source coding result, Theorem 1.1.
THEOREM 2.15 For any finite set X and R > 0 there exists a sequence of k-to-n_k binary block codes (f_k, ϕ_k) with

         n_k/k → R

such that for every DMS with alphabet X and arbitrary generic distribution P, the probability of error satisfies

         e(f_k, ϕ_k) ≤ exp[−k( inf_{Q: H(Q)≥R} D(Q‖P) − η_k )]   (2.12)

with

         η_k ≜ (log(k + 1)/k) |X|.

This result is asymptotically sharp for every particular DMS, in the sense that for any sequence of k-to-n_k binary block codes, n_k/k → R implies

         lim inf_{k→∞} (1/k) log e(f_k, ϕ_k) ≥ − inf_{Q: H(Q)≥R} D(Q‖P).   (2.13)

The infimum in (2.12) and (2.13) is finite iff R ≤ log s(P), and then it equals the minimum subject to H(Q) ≥ R.
Here s(P) denotes the size of the support of P, that is, the number of those a ∈ X for which P(a) > 0.
REMARK This result sharpens Theorem 1.1 in two ways. First, for a DMS with generic distribution P and R > H(P), it gives the precise asymptotics, in the exponential sense, of the probability of error of the best codes with n_k/k → R (the result is also true, but uninteresting, for R ≤ H(P)). Second, it shows that this optimal performance can be achieved by codes not depending on the generic distribution of the source. The remaining assertion of Theorem 1.1, namely that for n_k/k → R < H(P) the probability of error tends to 1, can be sharpened similarly.
Proof of Theorem 2.15 Write

         A_k ≜ ∪_{Q: H(Q)≤R} T_Q.

Then, by Lemmas 2.2 and 2.3,

         |A_k| ≤ (k + 1)^{|X|} exp(kR);   (2.14)

further, by Lemmas 2.2 and 2.6,

         P^k(X^k − A_k) ≤ (k + 1)^{|X|} exp[−k min_{Q: H(Q)>R} D(Q‖P)].   (2.15)

Let us encode the sequences in A_k in a one-to-one way and all others by a fixed codeword, say. Equation (2.14) shows that this can be done with binary codewords of length n_k satisfying n_k/k → R. For the resulting code, (2.15) gives (2.12), with

         η_k ≜ (log(k + 1)/k) |X|.

The last assertion of Theorem 2.15 is obvious, and implies that it suffices to prove (2.13) for R ≤ log s(P). The number of sequences in X^k correctly reproduced by a k-to-n_k binary block code is at most 2^{n_k}. Thus, by Lemma 2.3, for every type Q of sequences in X^k satisfying

         (k + 1)^{−|X|} exp[kH(Q)] ≥ 2^{n_k+1},   (2.16)

at least half of the sequences in T_Q will not be reproduced correctly. On account of Lemma 2.6, it follows that

         e(f_k, ϕ_k) ≥ (1/2)(k + 1)^{−|X|} exp[−kD(Q‖P)]

for every type Q satisfying (2.16). Hence

         e(f_k, ϕ_k) ≥ (1/2)(k + 1)^{−|X|} exp[−k min_{Q: H(Q)≥R+ε_k} D(Q‖P)],

where Q runs over types of sequences in X^k and

         ε_k ≜ n_k/k − R + 1/k + (log(k + 1)/k) |X|.

Using that R ≤ log s(P), for large k the last minimum changes little if Q is not restricted to types, and ε_k is omitted.
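For a binary source the exponent in (2.12) and (2.13) reduces to a one-dimensional minimization, which the sketch below approximates by grid search (illustrative source bias and rate; exp and log are taken to base 2).

```python
from math import log2

def h(q):                       # binary entropy in bits
    return 0.0 if q in (0.0, 1.0) else -q * log2(q) - (1 - q) * log2(1 - q)

def d(q, p):                    # binary divergence D(Q||P) in bits
    total = 0.0
    if q > 0:
        total += q * log2(q / p)
    if q < 1:
        total += (1 - q) * log2((1 - q) / (1 - p))
    return total

p, R = 0.05, 0.5                # illustrative source bias and code rate
# crude grid search for min { D(Q||P) : H(Q) >= R } over binary Q
grid = (i / 100000 for i in range(100001))
exponent = min(d(q, p) for q in grid if h(q) >= R)
print(f"source P = Bernoulli({p}), rate R = {R}:"
      f" error exponent ~ {exponent:.4f} bits per symbol")
```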
Discussion
The simple combinatorial lemmas concerning types are the basis of the proof of most
coding theorems treated in this book. Merging “nearby” types, i.e., the formalism of
typical sequences, has the advantage of shortening computations. In the literature, there
are several concepts of typical sequences. Often one merges more types than we have
done in Definition 2.8; in particular, the entropy-typical sequences of Problem 2.5 are
widely used. The latter kind of typicality has the advantage that it easily generalizes to
models with memory and with abstract alphabets. For the discrete memoryless systems
treated in this book, the adopted concept of typicality often leads to stronger results.
Still, the formalism of typical sequences has a limited scope, for it does not allow eval-
uation of convergence rates of error probabilities. This is illustrated by the fact that
typical sequences led to a simple proof of Theorem 1.1 while to prove Theorem 2.15
types had to be considered individually.
The technique of estimating probabilities without merging types is also more appro-
priate for the purpose of deriving universal coding theorems. Intuitively, universal
coding means that codes have to be constructed in complete ignorance of the proba-
bility distributions governing the system; then the performance of the code is evaluated
by the whole spectrum of its performance indices for the various possible distributions.
Theorem 2.15 is the first universal coding result in this book. It is clear that two codes
are not necessarily comparable from the point of view of universal coding. In view of
this, it is somewhat surprising that for the class of DMSs with a fixed alphabet X there
exist codes universally optimal in the sense that for every DMS they have asymptotically
the same probability of error as the best code designed for that particular DMS. 
Problems
2.1. Show that the exact number of types of sequences in X^k equals the binomial coefficient

         (k + |X| − 1 choose |X| − 1).

     Draw the conclusion that the lower bounds in Lemmas 2.3, 2.5 and 2.6 can be sharpened replacing the power of (k + 1) by this number.
2.2. Prove that the size of T^k_P is of order of magnitude k^{−(s(P)−1)/2} exp{kH(P)}, where s(P) is the number of elements a ∈ X with P(a) > 0. More precisely, show that

         log |T^k_P| = kH(P) − ((s(P) − 1)/2) log(2πk) − (1/2) Σ_{a: P(a)>0} log P(a) − (ϑ(k, P)/(12 ln 2)) s(P),

     where 0 ≤ ϑ(k, P) ≤ 1.
     Hint Use Robbins' sharpening of Stirling's formula:

         √(2π) n^{n+1/2} e^{−n+1/(12(n+1))} ≤ n! ≤ √(2π) n^{n+1/2} e^{−n+1/(12n)}

     (see, e.g., Feller (1968), p. 54), noting that P(a) ≥ 1/k whenever P(a) > 0.
2.3. Clearly, every y ∈ Y^k in the V-shell of an x ∈ X^k has the same type Q, where

         Q(b) ≜ Σ_{a∈X} P_x(a) V(b|a).

     (a) Show that T_V(x) ≠ T_Q even if all the rows of the matrix V are equal to Q (unless x consists of identical elements).
     (b) Show that if P_x = P then

         (k + 1)^{−|X||Y|} exp[−kI(P, V)] ≤ |T_V(x)|/|T_Q| ≤ (k + 1)^{|Y|} exp[−kI(P, V)],

     where I(P, V) ≜ H(Q) − H(V|P) is the mutual information of RVs X and Y such that P_X = P and P_{Y|X} = V. In particular, if all rows of V are equal to Q then the size of T_V(x) is not "exponentially smaller" than that of T_Q.
2.4. Prove that the first resp. second condition of (2.9) is necessary for Lemmas 2.13
resp. 2.12 to hold.
2.5. (Entropy-typical sequences) Let us say that a sequence x ∈ X^k is entropy-P-typical with constant δ if

         |−(1/k) log P^k(x) − H(P)| ≤ δ;

     further, y ∈ Y^k is entropy-W-typical under the condition x if

         |−(1/k) log W^k(y|x) − H(W|P_x)| ≤ δ.

     (a) Check that entropy-typical sequences also satisfy the assertions of Lemmas 2.12 and 2.13 (if δ = δ_k is chosen as in the delta-convention).
     Hint These properties were implicitly used in the proofs of Theorems 1.1 and 1.2.
     (b) Show that typical sequences – with constants chosen according to the delta-convention – are also entropy-typical, with some constants δ′_k = c_P · δ_k resp. δ′_k = c_W · δ_k. On the other hand, entropy-typical sequences are not necessarily typical with constants of the same order of magnitude.
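Part (b) of Problem 2.5 can be illustrated numerically. In the sketch below (arbitrary distribution and parameters), the constant c_P = Σ_a(−log P(a)) is one valid choice obtained from the triangle inequality; the text only asserts that some constant c_P works.

```python
import random
from collections import Counter
from math import log2

random.seed(4)
P = {'a': 0.5, 'b': 0.3, 'c': 0.2}                 # illustrative distribution
H = -sum(p * log2(p) for p in P.values())
k, delta = 1000, 0.02

x = random.choices(list(P), weights=list(P.values()), k=k)
c = Counter(x)

letter_typical = all(abs(c[a] / k - p) <= delta for a, p in P.items())
# -(1/k) log P^k(x) = -(1/k) sum_a N(a|x) log P(a)
rate = -sum(c[a] * log2(P[a]) for a in c) / k
# one valid (illustrative) choice of the constant c_P in part (b)
c_P = sum(-log2(p) for p in P.values())
entropy_typical = abs(rate - H) <= c_P * delta

print(f"letter-typical: {letter_typical};  -(1/k) log P^k(x) = {rate:.4f},"
      f" H(P) = {H:.4f};  entropy-typical with constant c_P*delta: {entropy_typical}")
```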
Another Random Document on
Scribd Without Any Related Topics
knew that only her bodily presence had been removed. She still lived
in our midst—we heard the ring of her voice in the words we read, in
the words our hearts told us she would say; we even heard the ring
of her laugh! And to-day you may be sure that the woman-pioneer
who had the faith to plant the first college for women in America,
lives by that faith, not only in her own Mount Holyoke, but in the
larger lives of all the women who have profited by her labors.
THE PRINCESS OF WELLESLEY:
ALICE FREEMAN PALMER
Our echoes roll from soul to soul,
And grow forever and forever.
Tennyson.
T
THE PRINCESS OF WELLESLEY
HIS is the story of a princess of our own time and our own
America—a princess who, while little more than a girl herself,
was chosen to rule a kingdom of girls. It is a little like the story of
Tennyson's Princess, with her woman's kingdom, and very much
like the happy, old-fashioned fairy-tale.
We have come to think it is only in fairy-tales that a golden
destiny finds out the true, golden heart, and, even though she
masquerades as a goose-girl, discovers the kingly child and brings
her to a waiting throne. We are tempted to believe that the chance
of birth and the gifts of wealth are the things that spell opportunity
and success. But this princess was born in a little farm-house, to a
daily round of hard work and plain living. That it was also a life of
high thinking and rich enjoyment of what each day brought, proved
her indeed a kingly child.
Give me health and a day, and I will make the pomp of
emperors ridiculous! said the sage of Concord. So it was with little
Alice Freeman. As she picked wild strawberries on the hills, and
climbed the apple-tree to lie for a blissful minute in a nest of
swaying blossoms under the blue sky, she was, as she said, happy
all over. The trappings of royalty can add nothing to one who knows
how to be royally happy in gingham.
But Alice was not always following the pasture path to her
friendly brook, or running across the fields with the calling wind, or
dancing with her shadow in the barn-yard, where even the prosy
hens stopped pecking corn for a minute to watch. She had work to
do for Mother. When she was only four, she could dry the dishes
without dropping one; and when she was six, she could be trusted
to keep the three toddlers younger than herself out of mischief.
My little daughter is learning to be a real little mother, said
Mrs. Freeman, as she went about her work of churning and baking
without an anxious thought.
Alice Freeman Palmer
It was Sister Alice who pointed out the robin's nest, and found
funny turtles and baby toads to play with. She took the little brood
with her to hunt eggs in the barn and to see the ducks sail around
like a fleet of boats on the pond. When Ella and Fred were wakened
by a fearsome noise at night, they crept up close to their little
mother, who told them a story about the funny screech-owl in its
hollow-tree home.
It is the ogre of mice and bats, but not of little boys and girls,
she said.
It sounds funny now, Alice, they whispered. It's all right when
we can touch you.
When Alice was seven a change came in the home. The father
and mother had some serious talks, and then it was decided that
Father should go away for a time, for two years, to study to be a
doctor.
It is hard to be chained to one kind of life when all the time you
are sure that you have powers and possibilities that have never had
a chance to come out in the open, she heard her father say one
evening. I have always wanted to be a doctor; I can never be more
than a half-hearted farmer.
You must go to Albany now, James, said the dauntless wife. I
can manage the farm until you get through your course at the
medical college; and then, when you are doing work into which you
can put your whole heart, a better time must come for all of us.
How can you possibly get along? he asked in amazement.
How can I leave you for two years to be a farmer, and father and
mother, too?
There is a little bank here, she said, taking down a jar from a
high shelf in the cupboard and jingling its contents merrily. I have
been saving bit by bit for just this sort of thing. And Alice will help
me, she added, smiling at the child who had been standing near
looking from father to mother in wide-eyed wonder. You will be the
little mother while I take father's place for a time, won't you, Alice?
It will be cruelly hard on you all, said the father, soberly. I
cannot make it seem right.
Think how much good you can do afterward, urged his wife.
The time will go very quickly when we are all thinking of that. It is
not hard to endure for a little for the sake of 'a gude time coming'—a
better time not only for us, but for many besides. For I know you will
be the true sort of doctor, James.
Alice never quite knew how they did manage during those two
years, but she was quite sure that work done for the sake of a good
to come is all joy.
I owe much of what I am to my milkmaid days, she said.
She was always sorry for children who do not grow up with the
sights and sounds of the country. One is very near to all the simple,
real things of life on a farm, she used to say. There is a dewy
freshness about the early out-of-door experiences, and a warm
wholesomeness about tasks that are a part of the common lot. A
country child develops, too, a responsibility—a power to do and to
contrive—that the city child, who sees everything come ready to
hand from a near-by store, cannot possibly gain. However much
some of my friends may deplore my own early struggle with poverty
and hard work, I can heartily echo George Eliot's boast:
But were another childhood-world
my share,
I would be born a little sister there.
When Alice was ten years old, the family moved from the farm
to the village of Windsor, where Dr. Freeman entered upon his life as
a doctor, and where Alice's real education began. From the time she
was four she had, for varying periods, sat on a bench in the district
school, but for the most part she had taught herself. At Windsor
Academy she had the advantage of a school of more than average
efficiency.
Words do not tell what this old school and place meant to me
as a girl, she said years afterward. Here we gathered abundant
Greek, Latin, French, and mathematics; here we were taught
truthfulness, to be upright and honorable; here we had our first
loves, our first ambitions, our first dreams, and some of our first
disappointments. We owe a large debt to Windsor Academy for the
solid groundwork of education that it laid.
More important than the excellent curriculum and wholesome
associations, however, was the influence of a friendship with one of
the teachers, a young Harvard graduate who was supporting himself
while preparing for the ministry. He recognized the rare nature and
latent powers of the girl of fourteen, and taught her the delights of
friendship with Nature and with books, and the joy of a mind trained
to see and appreciate. He gave her an understanding of herself, and
aroused the ambition, which grew into a fixed resolve, to go to
college. But more than all, he taught her the value of personal
influence.
It is people that count, she used to say. The truth and beauty
that are locked up in books and in nature, to which only a few have
the key, begin really to live when they are made over into human
character. Disembodied ideas may mean little or nothing; it is when
they are 'made flesh' that they can speak to our hearts and minds.
As Alice drove about with her father when he went to see his
patients and saw how this true doctor of the old school was a
physician to the mind as well as the body of those who turned to
him for help, she came to a further realization of the truth: It is
people that count.
It must be very depressing to have to associate with bodies and
their ills all the time, she ventured one day when her father seemed
more than usually preoccupied. She never forgot the light that shone
in his eyes as he turned and looked at her.
We can't begin to minister to the body until we understand that
spirit is all, he said. What we are pleased to call body is but one
expression—and a most marvelous expression—of the hidden life
that impels
All thinking things, all objects of all
thought,
And rolls through all things.
It seemed to Alice that this might be a favorable time to broach
the subject of college. He looked at her in utter amazement; few
girls thought of wanting more than a secondary education in those
days, and there were still fewer opportunities for them.
Why, daughter, he exclaimed, a little more Latin and
mathematics won't make you a better home-maker! Why should you
set your heart on this thing?
I must go, Father, she answered steadily. It is not a sudden
notion; I have realized for a long time that I cannot live my life—the
life that I feel I have it within me to live—without this training. I
want to be a teacher—the best kind of a teacher—just as you
wanted to be a doctor.
But, my dear child, he protested, much troubled, it will be as
much as we can manage to see one of you through college, and that
one should be Fred, who will have a family to look out for one of
these days.
If you let me have this chance, Father, said Alice, earnestly, I'll
promise that you will never regret it. I'll help to give Fred his chance,
and see that the girls have the thing they want as well.
In the end Alice had her way. It seemed as if the strength of her
single-hearted longing had power to compel a reluctant fate. In
June, 1872, when but a little over seventeen, she went to Ann Arbor
to take the entrance examinations for the University of Michigan, a
careful study of catalogues having convinced her that the standard
of work was higher there than in any college then open to women.
A disappointment met her at the outset. Her training at Windsor,
good as it was, did not prepare her for the university requirements.
Conditions loomed mountain high, and the examiners
recommended that she spend another year in preparation. Her
intelligence and character had won the interest of President Angell,
however, and he asked that she be granted a six-weeks' trial. His
confidence in her was justified; for she not only proved her ability to
keep up with her class, but steadily persevered in her double task
until all conditions were removed.
The college years were a glory instead of a grind, in spite of
the ever-pressing necessity for strict economy in the use of time and
money. Her sense of values—the ability to see large things large
and small things small, which has been called the best measure of
education,—showed a wonderful harmony of powers. While the mind
was being stored with knowledge and the intellect trained to clear,
orderly thinking, there was never a too-muchness in this direction
that meant a not-enoughness in the realm of human relationships.
Always she realized that it is people that count, and her supreme
test of education as of life was its consecrated serviceableness.
President Angell in writing of her said:
One of her most striking characteristics in college
was her warm and demonstrative sympathy with her
circle of friends. Her soul seemed bubbling over with
joy, which she wished to share with the other girls.
While she was therefore in the most friendly relations
with all those girls then in college, she was the radiant
center of a considerable group whose tastes were
congenial with her own. Without assuming or striving
for leadership, she could not but be to a certain degree
a leader among these, some of whom have attained
positions only less conspicuous for usefulness than her
own. Wherever she went, her genial, outgoing spirit
seemed to carry with her an atmosphere of
cheerfulness and joy.
In the middle of her junior year, news came from her father of a
more than usual financial stress, owing to a flood along the
Susquehanna, which had swept away his hope of present gain from
a promising stretch of woodland. It seemed clear to Alice that the
time had come when she must make her way alone. Through the
recommendation of President Angell she secured a position as
teacher of Latin and Greek in the High School at Ottawa, Illinois,
where she taught for five months, receiving enough money to carry
her through the remainder of her college course. The omitted junior
work was made up partly during the summer vacation and partly in
connection with the studies of the senior year. An extract from a
letter home will tell how the busy days went:
This is the first day of vacation. I have been so
busy this year that it seems good to get a change,
even though I do keep right on here at work. For some
time I have been giving a young man lessons in Greek
every Saturday. I have had two junior speeches
already, and there are still more. Several girls from
Flint tried to have me go home with them for the
vacation, but I made up my mind to stay and do what
I could for myself and the other people here. A young
Mr. M. is going to recite to me every day in Virgil; so
with teaching and all the rest I sha'n't have time to be
homesick, though it will seem rather lonely when the
other girls are gone and I don't hear the college bell
for two weeks.
Miss Freeman's early teaching showed the vitalizing spirit that
marked all of her relations with people.
She had a way of making you feel 'all dipped in sunshine,' one
of her girls said.
Everything she taught seemed a part of herself, another
explained. It wasn't just something in a book that she had to teach
and you had to learn. She made every page of our history seem a
part of present life and interests. We saw and felt the things we
talked about.
The fame of this young teacher's influence traveled all the way
from Michigan, where she was principal of the Saginaw High School,
to Massachusetts. Mr. Henry Durant, the founder of Wellesley, asked
her to come to the new college as teacher of mathematics. She
declined the call, however, and, a year later, a second and more
urgent invitation. Her family had removed to Saginaw, where Dr.
Freeman was slowly building up a practice, and it would mean
leaving a home that needed her. The one brother was now in the
university; Ella was soon to be married; and Stella, the youngest,
who was most like Alice in temperament and tastes, was looking
forward hopefully to college.
But at the time when Dr. Freeman was becoming established
and the financial outlook began to brighten, the darkest days that
the family had ever known were upon them. Stella, the chief joy and
hope of them all, fell seriously ill. The little mother loved this
starlike girl as her own child, and looked up to her as one who
would reach heights her feet could never climb. When she died it
seemed to Alice that she had lost the one chance for a perfectly
understanding and inspiring comradeship that life offered. At this
time a third call came to Wellesley,—as head of the department of
history,—and hoping that a new place with new problems would give
her a fresh hold on joy, she accepted.
Into her college work the young woman of twenty-four put all
the power and richness of her radiant personality. She found peace
and happiness in untiring effort, and her girls found in her the most
inspiring teacher they had ever known. She went to the heart of the
history she taught, and she went to the hearts of her pupils.
She seemed to care for each of us—to find each as interesting
and worth while as if there were no other person in the world, one
of her students said.
Mr. Durant had longed to find just such a person to build on the
foundation he had laid. It was in her first year that he pointed her
out to one of the trustees.
Do you see that little dark-eyed girl? She will be the next
president of Wellesley, he said.
Surely she is much too young and inexperienced for such a
responsibility, protested the other, looking at him in amazement.
As for the first, it is a fault we easily outgrow, said Mr. Durant,
dryly, and as for her inexperience—well, I invite you to visit one of
her classes.
The next year, on the death of Mr. Durant, she was made acting
president of the college, and the year following she inherited the
title and honors, as well as the responsibilities and opportunities, of
the office. The Princess had come into her kingdom.
The election caused a great stir among the students, particularly
the irrepressible seniors. It was wonderful and most inspiring that
their splendid Miss Freeman, who was the youngest member of the
faculty, should have won this honor. Why, she was only a girl like
themselves! The time of strict observances and tiresome regulations
of every sort was at an end. Miss Freeman seemed to sense the
prevailing mood, and, without waiting for a formal assembly, asked
the seniors to meet her in her rooms. In they poured, overflowing
chairs, tables, and ranging themselves about on the floor in
animated, expectant groups. The new head of the college looked at
them quietly for a minute before she began to speak.
I have sent for you seniors, she said at last seriously, to ask
your advice. You may have heard that I have been called to the
position of acting president of your college. I am, of course, too
young; and the duties are, as you know, too heavy for the strongest
to carry alone. If I must manage alone, there is only one course—to
decline. It has, however, occurred to me that my seniors might be
willing to help by looking after the order of the college and leaving
me free for administration. Shall I accept? Shall we work things out
together?
The hearty response made it clear that the princess was to rule
not only by divine right, but also by the glad consent of the
governed. Perhaps it was her youth and charm and the romance of
her brilliant success that won for her the affectionate title of The
Princess; perhaps it was her undisputed sway in her kingdom of
girls. It was said that her radiant, outgoing spirit was felt in the
atmosphere of the place and in all the graduates. Her spirit became
the Wellesley spirit.
What did she do besides turning all of you into an adoring band
of Freeman-followers? a Wellesley woman was asked.
The reply came without a moment's hesitation: She had the
life-giving power of a true creator, one who can entertain a vision of
the ideal, and then work patiently bit by bit to 'carve it in the marble
real.' She built the Wellesley we all know and love, making it
practical, constructive, fine, generous, human, spiritual.
For six years the Princess of Wellesley ruled her kingdom wisely.
She raised the standard of work, enlisted the interest and support of
those in a position to help, added to the buildings and equipment,
and won the enthusiastic cooperation of students, faculty, and
public. Then, one day, she voluntarily stepped down from her
throne, leaving others to go on with the work she had begun. She
married Professor George Herbert Palmer of Harvard, and, (quite in
the manner of the fairy-tale) lived happily ever after.
What a disappointment! some of her friends said. That a
woman of such unusual powers and gifts should deliberately leave a
place of large usefulness and influence to shut herself up in the
concerns of a single home!
There is nothing better than the making of a true home, said
Alice Freeman Palmer. I shall not be shut away from the concerns of
others, but more truly a part of them. 'For love is fellow-service,' I
believe.
The home near Harvard Yard was soon felt to be the most free
and perfect expression of her generous nature. Its happiness made
all life seem happier. Shy undergraduates and absorbed students
who had withdrawn overmuch within themselves and their pet
problems found there a thaw after their winter of discontent.
Wellesley girls—even in those days before automobiles—did not feel
fifteen miles too great a distance to go for a cup of tea and a half-
hour by the fire.
College Hall, destroyed by fire in
1914
Tower Court, which stands on
the site of College Hall
Many were surprised that Mrs. Palmer never seemed worn by
the unstinted giving of herself to the demands of others on her time
and sympathy. The reason was that their interests were her
interests. Her spirit was indeed outgoing; there was no wall
hedging in a certain number of things and people as hers, with the
rest of the world outside. As we have seen, people counted with her
supremely; and the ideas which moved her were those which she
found embodied in the joys and sorrows of human hearts.
Mrs. Palmer wrote of her days at this time:
I don't know what will happen if life keeps on
growing so much better and brighter each year. How
does your cup manage to hold so much? Mine is
running over, and I keep getting larger cups; but I
can't contain all my blessings and gladness. We are
both so well and busy that the days are never half long
enough.
Life held, indeed, a full measure of opportunities for service.
Wellesley claimed her as a member of its executive committee, and
other colleges sought her counsel. When Chicago University was
founded, she was induced to serve as its Dean of Women until the
opportunities for girls there were wisely established. She worked
energetically raising funds for Radcliffe and her own Wellesley.
Throughout the country her wisdom as an educational expert was
recognized, and her advice sought in matters of organization and
administration. For several years, as a member of the Massachusetts
State Board of Education, she worked early and late to improve the
efficiency and influence of the normal schools. She was a public
servant who brought into all her contact with groups and masses of
people the simple directness and intimate charm that marked her
touch with individuals.
How is it that you are able to do so much more than other
people? asked a tired, nervous woman, who stopped Mrs. Palmer
for a word at the close of one of her lectures.
Because, she answered, with the sudden gleam of a smile, I
haven't any nerves nor any conscience, and my husband says I
haven't any backbone.
It was true that she never worried. She had early learned to live
one day at a time, without looking before and after. And nobody
knew better than Mrs. Palmer the renewing power of joy. She could
romp with some of her very small friends in the half-hour before an
important meeting; go for a long walk or ride along country lanes
when a vexing problem confronted her; or spend a quiet evening by
the fire reading aloud from one of her favorite poets at the end of a
busy day.
For fifteen years Mrs. Palmer lived this life of joyful, untiring
service. Then, at the time of her greatest power and usefulness, she
died. The news came as a personal loss to thousands. Just as
Wellesley had mourned her removal to Cambridge, so a larger world
mourned her earthly passing. But her friends soon found that it was
impossible to grieve or to feel for a moment that she was dead. The
echoes of her life were living echoes in the world of those who knew
her.
There are many memorials speaking in different places of her
work. In the chapel at Wellesley, where it seems to gather at every
hour a golden glory of light, is the lovely transparent marble by
Daniel Chester French, eternally bearing witness to the meaning of
her influence with her girls. In the tower at Chicago the chimes
make music, joyfully to recall, her labors there. But more lasting
than marble or bronze is the living memorial in the hearts and minds
made better by her presence. For it is, indeed, people that count,
and in the richer lives of many the enkindling spirit of Alice Freeman
Palmer still lives.
OUR LADY OF THE RED CROSS:
CLARA BARTON
Who gives himself with his alms
feeds three,—
Himself, his hungering neighbor, and
Me.
The Vision of Sir Launfal.—Lowell.
A
OUR LADY OF THE RED CROSS
CHRISTMAS baby! Now isn't that the best kind of a Christmas
gift for us all? cried Captain Stephen Barton, who took the
interesting flannel bundle from the nurse's arms and held it out
proudly to the assembled family.
No longed-for heir to a waiting kingdom could have received a
more royal welcome than did that little girl who appeared at the
Barton home in Oxford, Massachusetts, on Christmas Day, 1821. Ten
years had passed since a child had come to the comfortable farm-
house, and the four big brothers and sisters were very sure that they
could not have had a more precious gift than this Christmas baby.
No one doubted that she deserved a distinguished name, but it was
due to Sister Dorothy, who was a young lady of romantic seventeen
and something of a reader, that she was called Clarissa Harlowe,
after a well-known heroine of fiction. The name which this heroine
of real life actually bore and made famous, however, was Clara
Barton; for the Christmas baby proved to be a gift not only to a little
group of loving friends, but also to a great nation and to humanity.
The sisters and brothers were teachers rather than playmates for
Clara, and her education began so early that she had no recollection
of the way they led her toddling steps through the beginnings of
book-learning. On her first day at school she announced to the
amazed teacher who tried to put a primer into her hands that she
could spell the artichoke words. The teacher had other surprises
besides the discovery that this mite of three was acquainted with
three-syllabled lore.
Brother Stephen, who was a wizard with figures, had made the
sums with which he covered her slate seem a fascinating sort of play
at a period when most infants are content with counting the fingers
of one hand. All other interests, however, paled before the stories
that her father told her of great men and their splendid deeds.
Captain Barton was amused one day at the discovery that his
precocious daughter, who always eagerly encored his tales of
conquerors and leaders, thought of their greatness in images of
quite literal and realistic bigness. A president must, for instance, be
as large as a house, and a vice-president as spacious as a barn door
at the very least. But these somewhat crude conceptions did not put
a check on the epic recitals of the retired officer, who, in the
intervals of active service in plowed fields or in pastures where his
thoroughbreds grazed with their mettlesome colts, liked to live over
the days when he served under Mad Anthony Wayne in the
Revolutionary War, and had a share in the thrilling adventures of the
Western frontier.
Clara was only five years old when Brother David taught her to
ride. "Learning to ride is just learning a horse," said this daring
youth, who was the Buffalo Bill of the surrounding country.
"How can I learn a horse, David?" quavered the child, as the
high-spirited animals came whinnying to the pasture bars at her
brother's call.
"Catch hold of his mane, Clara, and just feel the horse a part of
yourself—the big half for the time being," said David, as he put her
on the back of a colt that was broken only to bit and halter, and,
easily springing on his favorite, held the reins of both in one hand,
while he steadied the small sister with the other by seizing hold of
one excited foot.
They went over the fields at a gallop that first day, and soon
little Clara and her mount understood each other so well that her
riding feats became almost as far-famed as those of her brother. The
time came when her skill and confidence on horseback—her power
to feel the animal she rode a part of herself and keep her place in
any sort of saddle through night-long gallops—meant the saving of
many lives.
David taught her many other practical things that helped to
make her steady and self-reliant in the face of emergencies. She
learned, for instance, to drive a nail straight, and to tie a knot that
would hold. Eye and hand were trained to work together with quick
decision that made for readiness and efficiency in dealing with a
situation, whether it meant the packing of a box, or first-aid
measures after an accident on the skating-pond.
She was always an outdoor child, with dogs, horses, and ducks
for playfellows. The fuzzy ducklings were the best sort of dolls.
Sometimes when wild ducks visited the pond and all her waddling
favorites began to flap their wings excitedly, it seemed that her
young heart felt, too, the call of large, free spaces.
"The only real fun is to do things," she used to say.
She rode after the cows, helped in the milking and churning, and
followed her father about, dropping potatoes in their holes or
helping weed the garden. Once, when the house was being painted,
she begged to be allowed to assist in the work, even learning to
grind the pigments and mix the colors. The family was at first
amused and then amazed at the persistency of her application as
day after day she donned her apron and fell to work.
They were not less astonished when she wanted to learn the
work of the weavers in her brothers' satinet mills. At first, her
mother refused this extraordinary request; but Stephen, who
understood the intensity of her craving to do things, took her part;
and at the end of her first week at the flying shuttle Clara had the
satisfaction of finding that her cloth was passed as first-quality
goods. Her career as a weaver was of short duration, however,
owing to a fire which destroyed the mills.
The young girl was as enthusiastic in play as at work. Whether it
was a canter over the fields on Billy while her dog, Button, dashed
along at her side, his curly white tail bobbing ecstatically, or a coast
down the rolling hills in winter, she entered into the sport of the
moment with her whole heart.
When there was no outlet for her superabundant energy, she
was genuinely unhappy. Then it was that a self-consciousness and
morbid sensitiveness became so evident that it was a source of real
concern to her friends.
"People say that I must have been born brave," said Clara
Barton. "Why, I seem to remember nothing but terrors in my early
days. I was a shrinking little bundle of fears—fears of thunder, fears
of strange faces, fears of my strange self." It was only when thought
and feeling were merged in the zest of some interesting activity that
she lost her painful shyness and found herself.
When she was eleven years old she had her first experience as a
nurse. A fall which gave David a serious blow on the head, together
with the bungling ministrations of doctors, who, when in doubt, had
recourse only to the heroic treatment of bleeding and leeches,
brought the vigorous young brother to a protracted invalidism. For
two years Clara was his constant and devoted attendant. She
schooled herself to remain calm, cheerful, and resourceful in the
presence of suffering and exacting demands. When others gave way
to fatigue or nerves, her wonderful instinct for action kept her,
child though she was, at her post. Her sympathy expressed itself in
untiring service.
In the years that followed her brother's recovery Clara became a
real problem to herself and her friends. The old blighting
sensitiveness made her school-days restless and unhappy in spite of
her alert mind and many interests.
At length her mother, at her wit's end because of this baffling,
morbid strain in her remarkable daughter, was advised by a man of
sane judgment and considerable understanding of child nature, to
throw responsibility upon her and give her a school to teach.
It happened, therefore, that when Clara Barton was fifteen she
put down her skirts, put up her hair, and entered upon her
successful career as a teacher. She liked the children and believed in
them, entering enthusiastically into their concerns, and opening the
way to new interests. When asked how she managed the discipline
of the troublesome ones, she said, "The children give no trouble; I
never have to discipline at all," quite unconscious of the fact that her
vital influence gave her a control that made assertion of authority
unnecessary.
"When the boys found that I was as strong as they were and
could teach them something on the playground, they thought that
perhaps we might discover together a few other worth-while things
in school hours," she said.
For eighteen years Clara Barton was a teacher. Always learning
herself while teaching others, she decided in 1852 to enter Clinton
Liberal Institute in New York as a pupil for graduation, for there was
then no college whose doors were open to women. When she had
all that the Institute could give her, she looked about for new fields
for effort.
In Bordentown, New Jersey, she found there was a peculiar need
for some one who would bring to her task pioneer zeal as well as the
passion for teaching. At that time there were no public schools in the
town or, indeed, in the State.
"The people who pose as respectable are too proud and too
prejudiced to send their boys and girls to a free pauper school, and
in the meantime all the children run wild," Miss Barton was told.
"We have tried again and again," said a discouraged young
pedagogue. "It is impossible to do anything in this place."
"Give me three months, and I will teach free," said Clara Barton.
This was just the sort of challenge she loved. There was
something to be done. She began with six unpromising gamins in a
dilapidated, empty building. In a month her quarters proved too
narrow. Each youngster became an enthusiastic and effectual
advertisement. As always, her success lay in an understanding of her
pupils as individuals, and a quickening interest that brought out the
latent possibilities of each. The school of six grew in a year to one of
six hundred, and the thoroughly converted citizens built an eight-
room school-house where Miss Barton remained as principal and
teacher until a breakdown of her voice made a complete rest
necessary.
The weak throat soon made it evident that her teaching days
were over; but she found at the same time in Washington, where
she had gone for recuperation, a new work.
"Living is doing," she said. "Even while we say there is nothing
we can do, we stumble over the opportunities for service that we are
passing by in our tear-blinded self-pity."
The over-sensitive girl had learned her lesson well. Life offered
moment by moment too many chances for action for a single worker
to turn aside to bemoan his own particular condition.
The retired teacher became a confidential secretary in the office
of the Commissioner of Patents. Great confusion existed in the
Patent Office at that time because some clerks had betrayed the
secrets of certain inventions. Miss Barton was the first woman to be
employed in a Government department; and while ably handling the
critical situation that called for all her energy and resourcefulness,
she had to cope not only with the scarcely veiled enmity of those
fellow-workers who were guilty or jealous, but also with the open
antagonism of the rank and file of the clerks, who were indignant
because a woman had been placed in a position of responsibility and
influence. She endured covert slander and deliberate disrespect,
letting her character and the quality of her work speak for
themselves. They spoke so eloquently that when a change in
political control caused her removal, she was before long recalled to
straighten out the tangle that had ensued.
At the outbreak of the Civil War Miss Barton was, therefore, at
the very storm-center.
The early days of the conflict found her binding up the wounds
of the Massachusetts boys who had been attacked by a mob while
passing through Baltimore, and who for a time were quartered in the
Capitol. Some of these recruits were boys from Miss Barton's own
town who had been her pupils, and all were dear to her because
they were offering their lives for the Union. We find her with other
volunteer nurses caring for the injured, feeding groups who
gathered about her in the Senate Chamber, and, from the desk of
the President of the Senate, reading them the home news from the
Worcester papers.
Meeting the needs as they presented themselves in that time of
general panic and distress, she sent to the Worcester Spy appeals
for money and supplies. Other papers took up the work, and soon
Miss Barton had to secure space in a large warehouse to hold the
provisions that poured in.
Not for many days, however, did she remain a steward of
supplies. When she met the transports which brought the wounded
into the city, her whole nature revolted at the sight of the untold
suffering and countless deaths which were resulting from delay in
caring for the injured. Her flaming ardor, her rare executive ability,
and her tireless persistency won for her the confidence of those in
command, and, though it was against all traditions, to say nothing of
iron-clad army regulations, she obtained permission to go with her
stores of food, bandages, and medicines to the firing-line, where
relief might be given on the battle-field at the time of direst need.
The girl who had been a bundle of fears had grown into the
woman who braved every danger and any suffering to carry help to
her fellow-countrymen.
People who spoke of her rare initiative and practical judgment
had little comprehension of the absolute simplicity and directness of
her methods. She managed the sulky, rebellious drivers of her army-
wagons, who had little respect for orders that placed a woman in
control, in the same way that she had managed children in school.
Without relaxing her firmness, she spoke to them courteously, and
called them to share the warm dinner she had prepared and spread
out in appetizing fashion. When, after clearing away the dishes, she
was sitting alone by the fire, the men returned in an awkward, self-
conscious group.
"We didn't come to get warm," said their spokesman, as she
kindly moved to make room for them at the flames, "we come to tell
you we are ashamed. The truth is we didn't want to come. We know
there is fighting ahead, and we've seen enough of that for men who
don't carry muskets, only whips; and then we've never seen a train
under charge of a woman before, and we couldn't understand it.
We've been mean and contrary all day, and you've treated us as if
we'd been the general and his staff, and given us the best meal
we've had in two years. We want to ask your forgiveness, and we
sha'n't trouble you again."
She found that a comfortable bed had been arranged for her in
her ambulance, a lantern was hanging from the roof, and when next
morning she emerged from her shelter, a steaming breakfast
awaited her and a devoted corps of assistants stood ready for
orders.
"I had cooked my last meal for my drivers," said Clara Barton.
"These men remained with me six months through frost and snow
and march and camp and battle; they nursed the sick, dressed the
wounded, soothed the dying, and buried the dead; and, if possible,
they grew kinder and gentler every day."
An incident that occurred at Antietam is typical of her quiet
efficiency. According to her directions, the wounded were being fed
with bread and crackers moistened in wine, when one of her
assistants came to report that the entire supply was exhausted,
while many helpless ones lay on the field unfed. Miss Barton's quick
eye had noted that the boxes from which the wine was taken had
fine Indian meal as packing. Six large kettles were at once
unearthed from the farm-house in which they had taken quarters,
and soon her men were carrying buckets of hot gruel for miles over
the fields where lay hundreds of wounded and dying. Suddenly, in
the midst of her labors, Miss Barton came upon the surgeon in
charge sitting alone, gazing at a small piece of tallow candle which
flickered uncertainly in the middle of the table.
"Tired, Doctor?" she asked sympathetically.
"Tired indeed!" he replied bitterly; "tired of such heartless
neglect and carelessness. What am I to do for my thousand
wounded men with night here and that inch of candle all the light I
have or can get?"
Miss Barton took him by the arm and led him to the door, where
he could see near the barn scores of lanterns gleaming like stars.
"What is that!" he asked amazedly.
"The barn is lighted," she replied, "and the house will be
directly."
"Where did you get them!" he gasped.
"Brought them with me."
"How many have you?"
"All you want—four boxes."
The surgeon looked at her for a moment as if he were waking
from a dream; and then, as if it were the only answer he could
make, fell to work. And so it was invariably that she won her
complete command of people as she did of situations, by always
proving herself equal to the emergency of the moment.
Though, as she said in explaining the tardiness of a letter, "my
hands complain a little of unaccustomed hardships," she never
complained of any ill, nor allowed any danger or difficulty to
interrupt her work.
"What are my puny ailments beside the agony of our poor
shattered boys lying helpless on the field?" she said. And so, while
doctors and officers wondered at her unlimited capacity for prompt
and effective action, the men who had felt her sympathetic touch
and effectual aid loved and revered her as "The Angel of the
Battlefield."
One incident well illustrates the characteristic confidence with
which she moved about amid scenes of terror and panic. At
Fredericksburg, when every street was a firing-line and every house
a hospital, she was passing along when she had to step aside to
allow a regiment of infantry to sweep by. At that moment General
Patrick caught sight of her, and, thinking she was a bewildered
resident of the city who had been left behind in the general exodus,
leaned from his saddle and said reassuringly:
"You are alone and in great danger, madam. Do you want
protection?"
Miss Barton thanked him with a smile, and said, looking about at
the ranks, "I believe I am the best-protected woman in the United
States."
The soldiers near overheard and cried out, "That's so! that's so!"
And the cheer that they gave was echoed by line after line until a
mighty shout went up as for a victory.
The courtly old general looked about comprehendingly, and,
bowing low, said as he galloped away, "I believe you are right,
madam."
Clara Barton was present on sixteen battle-fields; she was eight
months at the siege of Charleston, and served for a considerable
period in the hospitals of Richmond.
Clara Barton
When the war was ended and the survivors of the great armies
were marching homeward, her heart was touched by the distress in
many homes where sons and fathers and brothers were among
those listed as missing. In all, there were 80,000 men of whom no
definite report could be given to their friends. She was assisting
President Lincoln in answering the hundreds of heartbroken letters,
imploring news, which poured in from all over the land when his
tragic death left her alone with the task. Then, as no funds were
available to finance a thorough investigation of every sort of record
of States, hospitals, prisons, and battle-fields, she maintained out of
her own means a bureau to prosecute the search.
Four years were spent in this great labor, during which time Miss
Barton made many public addresses, the proceeds of which were
devoted to the cause. One evening in the winter of 1868, while in
the midst of a lecture, her voice suddenly left her. This was the
beginning of a complete nervous collapse. The hardships and
prolonged strain had, in spite of her robust constitution and iron will,
told at last on the endurance of that loyal worker.
When able to travel, she went to Geneva, Switzerland, in the
hope of winning back her health and strength. Soon after her arrival
she was visited by the president and members of the International
Committee for the Relief of the Wounded in War, who came to learn
why the United States had refused to sign the Treaty of Geneva,
providing for the relief of sick and wounded soldiers. Of all the
civilized nations, our great republic alone most unaccountably held
aloof.
Miss Barton at once set herself to learn all she could about the
ideals and methods of the International Red Cross, and during the
Franco-Prussian War she had abundant opportunity to see and
experience its practical working on the battle-field.
At the outbreak of the war in 1870 she was urged to go as a
leader, taking the same part that she had borne in the Civil War.
"I had not strength to trust for that," said Clara Barton, "and
declined with thanks, promising to follow in my own time and way;
and I did follow within a week. As I journeyed on," she continued, "I
saw the work of these Red Cross societies in the field accomplishing
in four months under their systematic organization what we failed to
accomplish in four years without it—no mistakes, no needless
suffering, no waste, no confusion, but order, plenty, cleanliness, and
comfort wherever that little flag made its way—a whole continent
marshaled under the banner of the Red Cross. As I saw all this and
joined and worked in it, you will not wonder that I said to myself 'if I
live to return to my country, I will try to make my people understand
the Red Cross and that treaty.'"
Months of service in caring for the wounded and the helpless
victims of siege and famine were followed by a period of nervous
exhaustion from which she but slowly crept back to her former hold
on health. At last she was able to return to America to devote herself
to bringing her country into line with the Red Cross movement. She
found that traditionary prejudice against entangling alliances with
other powers, together with a singular failure to comprehend the
vital importance of the matter, militated against the great cause.
"Why should we make provision for the wounded?" it was said.
"We shall never have another war; we have learned our lesson."
It came to Miss Barton then that the work of the Red Cross
should be extended to disasters, such as fires, floods, earthquakes,
and epidemics—great public calamities which require, like war,
prompt and well-organized help.
Years of devoted missionary work with preoccupied officials and
a heedless, short-sighted public at length bore fruit. After the
Geneva Treaty received the signature of President Arthur on March
1, 1882, it was promptly ratified by the Senate, and the American
National Red Cross came into being, with Clara Barton as its first
president. Through her influence, too, the International Congress of
Berne adopted the American Amendment, which dealt with the
extension of the Red Cross to relief measures in great calamities
occurring in times of peace.
The story of her life from this time on is one with the story of
the work of the Red Cross during the stress of such disasters as the
Mississippi River floods, the Texas famine in 1885, the Charleston
earthquake in 1886, the Johnstown flood in 1889, the Russian
famine in 1892, and the Spanish-American War. The prompt,
efficient methods followed in the relief of the flood sufferers along
the Mississippi in 1884 may serve to illustrate the sane, constructive
character of her work.
Supply centers were established, and a steamer chartered to ply
back and forth carrying help and hope to the distracted human
creatures who stood wringing their hands on a frozen, fireless shore
—with every coal-pit filled with water. For three weeks she patrolled
the river, distributing food, clothing, and fuel, caring for the sick,
and, in order to establish at once normal conditions of life, providing
the people with many thousands of dollars' worth of building
material, seeds, and farm implements, thus making it possible for
them to help themselves and in work find a cure for their benumbing
distress.
Our Lady of the Red Cross lived past her ninetieth birthday, but
her real life is measured by deeds, not days. It was truly a long one,
rich in the joy of service. She abundantly proved the truth of the
words: "We gain in so far as we give. If we would find our life, we
must be willing to lose it."
Imre Csiszár has received numerous awards, including the Shannon Award of the IEEE Information Theory Society (1996). János Körner is a Professor of Computer Science at the Sapienza University of Rome, Italy, where he has worked since 1992. Prior to this, he was a member of the Institute of Mathematics of the Hungarian Academy of Sciences for over 20 years, and he also worked at AT&T Bell Laboratories, Murray Hill, New Jersey, for two years.
The field of applied mathematics known as Information Theory owes its origins and early development to three pioneers: Shannon (USA), Kolmogorov (Russia) and Rényi (Hungary). This book, authored by two of Rényi's leading disciples, represents the elegant and precise development of the subject by the Hungarian School. This second edition contains new research of the authors on applications to secrecy theory and zero-error capacity with connections to combinatorial mathematics.
Andrew Viterbi, USC

Information Theory: Coding Theorems for Discrete Memoryless Systems, by Imre Csiszár and János Körner, is a classic of modern information theory. "Classic" since its first edition appeared in 1979. "Modern" since the mathematical techniques and the results treated are still fundamentally up to date today. This new edition was long overdue. Beyond the original material, it contains two new chapters on zero-error information theory and connections to extremal combinatorics, and on information theoretic security, a topic that has garnered very significant attention in the last few years. This book is an indispensable reference for researchers and graduate students working in the exciting and ever-growing area of information theory.
Giuseppe Caire, USC

The first edition of the Csiszár and Körner book on information theory is a classic, in constant use by most mathematically-oriented information theorists. The second edition expands the first with two new chapters, one on zero-error information theory and one on information theoretic security. These use the same consistent set of tools as edition 1 to organize and prove the central results of these currently important areas. In addition, there are many new problems added to the original chapters, placing many newer research results into a consistent formulation.
Robert Gallager, MIT

The classic treatise on the fundamental limits of discrete memoryless sources and channels – an indispensable tool for every information theorist.
Sergio Verdu, Princeton
Information Theory: Coding Theorems for Discrete Memoryless Systems
IMRE CSISZÁR, Alfréd Rényi Institute of Mathematics, Hungarian Academy of Sciences, Hungary
JÁNOS KÖRNER, Sapienza University of Rome, Italy
CAMBRIDGE UNIVERSITY PRESS
Cambridge, New York, Melbourne, Madrid, Cape Town, Singapore, São Paulo, Delhi, Tokyo, Mexico City
Cambridge University Press, The Edinburgh Building, Cambridge CB2 8RU, UK
Published in the United States of America by Cambridge University Press, New York
www.cambridge.org
Information on this title: www.cambridge.org/9780521196819
First edition © Akadémiai Kiadó, Budapest 1981
Second edition © Cambridge University Press 2011
This publication is in copyright. Subject to statutory exception and to the provisions of relevant collective licensing agreements, no reproduction of any part may take place without the written permission of Cambridge University Press.
First published 1981
Second edition 2011
Printed in the United Kingdom at the University Press, Cambridge
A catalog record for this publication is available from the British Library
ISBN 978-0-521-19681-9 Hardback
Cambridge University Press has no responsibility for the persistence or accuracy of URLs for external or third-party internet websites referred to in this publication, and does not guarantee that any content on such websites is, or will remain, accurate or appropriate.
To the memory of Alfréd Rényi, the outstanding mathematician who established information theory in Hungary
[Figure: Dependence graph of the text; numbers refer to chapters.]
Contents

Preface to the first edition
Preface to the second edition
Basic notation and conventions
Introduction

Part I  Information measures in simple coding problems
1  Source coding and hypothesis testing; information measures
2  Types and typical sequences
3  Formal properties of Shannon's information measures
4  Non-block source coding
5  Blowing up lemma: a combinatorial digression

Part II  Two-terminal systems
6  The noisy channel coding problem
7  Rate-distortion trade-off in source coding and the source–channel transmission problem
8  Computation of channel capacity and Δ-distortion rates
9  A covering lemma and the error exponent in source coding
10  A packing lemma and the error exponent in channel coding
11  The compound channel revisited: zero-error information theory and extremal combinatorics
12  Arbitrarily varying channels

Part III  Multi-terminal systems
13  Separate coding of correlated sources
14  Multiple-access channels
15  Entropy and image size characterization
16  Source and channel networks
17  Information-theoretic security

References
Name index
Index of symbols and abbreviations
Subject index
    Preface to thefirst edition Information theory was created by Claude E. Shannon for the study of certain quan- titative aspects of information, primarily as an analysis of the impact of coding on information transmission. Research in this field has resulted in several mathematical theories. Our subject is the stochastic theory, often referred to as the Shannon theory, which directly descends from Shannon’s pioneering work. This book is intended for graduate students and research workers in mathematics (probability and statistics), electrical engineering and computer science. It aims to present a well-integrated mathematical discipline, including substantial new develop- ments of the 1970s. Although applications in engineering and science are not covered, we hope to have presented the subject so that a sound basis for applications has also been provided. A heuristic discussion of mathematical models of communication sys- tems is given in the Introduction, which also offers a general outline of the intuitive background for the mathematical problems treated in the book. As the title indicates, this book deals with discrete memoryless systems. In other words, our mathematical models involve independent random variables with finite range. Idealized as these models are from the point of view of most applications, their study reveals the characteristic phenomena of information theory without burdening the reader with the technicalities needed in the more complex cases. In fact, the reader needs no other prerequisites than elementary probability and a reasonable mathematical maturity. By limiting our scope to the discrete memoryless case, it was possible to use a unified, basically combinatorial approach. Compared with other methods, this often led to stronger results and yet simpler proofs. The combinatorial approach also seems to lead to a deeper understanding of the subject. The dependence graph of the text is shown on p. vi. There are several ways to build up a course using this book. A one-semester graduate course can be made up of Chapters 1, 2, 6, 7 and the first half of Chapter 13. A challeng- ing short course is provided by Chapters 2, 9 and 10. In both cases, the technicalities from Chapter 3 should be used when necessary. For students with some information theory background, a course on multi-terminal Shannon theory can be based on Part III, using Chapters 2 and 6 as preliminaries. The problems offer a lot of opportunities for creative work for the students. It should be noted, however, that illustrative examples are scarce; thus the teacher is also supposed to do some homework of his own by supplying such examples.
    x Preface tothe first edition Every chapter consists of text followed by a Problems section. The text covers the main ideas and proof techniques, with a sample of the results they yield. The selection of the latter was influenced both by didactic considerations and the authors’ research interests. Many results of equal importance are given in the Problem sections. While the text is self-contained, there are several points at which the reader is advised to supplement formal understanding by consulting specific problems. This suggestion is indicated by the Problem number in the margin of the text. For all but a few problems sufficient hints are given to enable a serious student familiar with the corresponding text to give a solution. The exceptions, marked by an asterisk, serve mainly for supplementary information; these problems are not necessarily more difficult than the others, but their solution requires methods not treated in the text. In the text the origins of the results are not mentioned, but credits to authors are given at the end of each chapter. Concerning the Problems, an appropriate attribution accompanies each Problem. An absence of references indicates that the assertion is either folklore or else an unpublished result of the authors. Results were attributed on the basis of publications in journals or books with complete proofs. The number after the author’s name indicates the year of appearance of the publication. Conference talks, theses and technical reports are quoted only if – to our knowledge – their authors have never published their result in another form. In such cases, the word “unpublished” is attached to the reference year, to indicate that the latter does not include the usual delay of “regular” publications. We are indebted to our friends Rudy Ahlswede, Péter Gács and Katalin Marton for fruitful discussions which contributed to many of our ideas. Our thanks are due to R. Ahlswede, P. Bártfai, J. Beck, S. Csibi, P. Gács, S. I. Gelfand, J. Komlós, G. Longo, K. Marton, A. Sgarro and G. Tusnády for reading various parts of the manuscript. Some of them have saved us from vicious errors. The patience of Mrs. Éva Várnai in typing and retyping the ever-changing manuscript should be remembered, as well as the spectacular pace of her doing it. Special mention should be made of the friendly assistance of Sándor Csibi who helped us to overcome technical difficulties with the preparation of the manuscript. Last but not least, we are grateful to Eugene Lukács for his constant encouragement without which this project might not have been completed.
    Preface to thesecond edition When the first edition of this book went to print, information theory was only 30 years old. At that time we covered a large part of the topic indicated in the title, a goal that is no longer realistic. An additional 30 years have passed, the Internet revolution occurred, and information theory has grown in breadth, volume and impact. Nevertheless, we feel that, despite many new developments, our original book has not lost its relevance since the material therein is still central to the field. The main novelty of this second edition is the addition of two new chapters. These cover zero-error problems and their connections to combinatorics (Chapter 11) and information-theoretic security (Chapter 17). Of several new research directions that emerged in the 30 years between the two editions, we chose to highlight these two because of personal research interests. As a matter of fact, these topics started to intrigue us when writing the first edition; back then, this led us to a last-minute addition of problems on secrecy. Except for the new chapters, new results are presented only in the form of problems. These either directly complete the original material or, occasionally, illustrate a new research area. We made only minor changes, mainly corrections, to the text of the origi- nal chapters. (Hence the words recent and new refer to the time of the first edition, unless the context indicates otherwise.) We have updated the history part of each chapter and, in particular, we have included pointers to new developments. We have not broadened the original scope of the book. Readers interested in a wider perspective may consult Cover and Thomas (2006). In the preface to the first edition we suggested several ways in which to construct courses using this book. In addition, either of the new Chapters 11 or 17 can be used for a short graduate course. As in the first edition, this book is dedicated to the memory of Alfréd Rényi, whose mathematical heritage continues to influence information theory and to inspire us. Special thanks are due to Miklós Simonovits, who, sacrificing his precious research time, assisted us to overcome TeX-nical difficulties as only the most selfless friend would do. We are indebted to our friends Prakash Narayan and Gábor Simonyi, as well as to the Ph.D. students Lóránt Farkas, Tamás Kói, Sirin Nitinawarat and Himanshu Tyagi for a careful reading of parts of the manuscript.
    Basic notation andconventions equal by definition iff if and only if end of a theorem, definition, remark, etc. end of a proof A, B, . . . , X, Y, Z sets (finite unless stated otherwise; infinite sets will be usually denoted by script capitals) ∅ void set x ∈ X x is an element of the set X; as a rule, elements of a set will be denoted by the same letter as the set X {x1, . . . , xk} X is a set having elements x1, . . . , xk |X| number of elements of the set X x = (x1, . . . , xn) x = x1 . . . xn vector (finite sequence) of elements of a set X X × Y Cartesian product of the sets X and Y Xn nth Cartesian power of X, i.e., the set of n-length sequences of elements of X X∗ set of all finite sequences of elements of X A ⊂ X A is a (not necessarily proper) subset of X A − B the set of those elements x ∈ A which are not in B Ā complement of a set A ⊂ X, i.e., Ā X − A (will be used only if a finite ground set X is specified) A ◦ B symmetric difference: A ◦ B (A − B) ∪ (B − A) f : X → Y mapping of X into Y f −1(y) the inverse image of y ∈ Y, i.e., f −1(y) {x : f (x) = y} || f || number of elements of the range of the mapping f PD abbreviation of “probability distribution” P {P(x) : x ∈ X} PD on X P(A) probability of the set A ⊂ X for the PD P, i.e., P(A) x∈A P(x) P × Q direct product of the PDs P on X and Q on Y, i.e., P × Q {P(x)Q(y) : x ∈ X, y ∈ Y} Pn nth power of the PD P, i.e., Pn(x) n i=1 P(xi ) support of P the set {x : P(x) 0}
    Notation xiii W :X → Y W = {W(y|x) : x ∈ X, y ∈ Y} ⎫ ⎬ ⎭ stochastic matrix with rows indexed by elements of X and columns indexed by elements of Y; i.e., W(·|x) is a PD on Y for every x ∈ X W(B|x) probability of the set B ⊂ Y for the PD W(·|x) Wn : Xn → Yn nth direct power of W, i.e., Wn(y|x) n i=1 W(yi |xi ) RV abbreviation for “random variable” X, Y, Z RVs ranging over finite sets Xn = (X1, . . . , Xn) Xn = X1 . . . Xn alternative notations for the vector-valued RV with compo- nents X1, . . ., Xn Pr {X ∈ A} probability of the event that the RV X takes a value in the set A PX distribution of the RV X, defined by PX (x) Pr {X = x} PY|X=x conditional distribution of Y given X = x, i.e., PY|X=x (y) Pr {Y = y|X = x}; not defined if PX (x) = 0 PY|X the stochastic matrix with rows PY|X=x , called the con- ditional distribution of Y given X; here x ranges over the support of PX PY|X = W means that PY|X=x = W(·|x) if PX (x) 0, involving no assumption on the remaining rows of W E X expectation of the real-valued RV X var(X) variance of the real-valued RV X X o — Y o — Z means that these RVs form a Markov chain in this order (a, b), [a, b], [a, b) open, closed resp. left-closed interval with endpoints a b |r|+ positive part of the real number r, i.e., |r|+ max (r, 0) r largest integer not exceeding r r smallest integer not less than r min[a, b], max[a, b] the smaller resp. larger of the numbers a and b r s means for vectors r = (r1, . . . ,rn), s = (s1, . . . , sn) of the n-dimensional Euclidean space that ri si , i = 1, . . . , n A convex closure of a subset A of a Euclidean space, i.e., the smallest closed convex set containing A exp, log are understood to the base 2 ln natural logarithm a log(a/b) equals zero if a = 0 and +∞ if a b = 0 h(r) the binary entropy function h(r) −r log r − (1 − r) log(1 − r), r ∈ [0, 1] Most asymptotic results in this book are established with uniform convergence. Our way of specifying the extent of uniformity is to indicate in the statement of results all those parameters involved in the problem upon which threshold indices depend. In this context, e.g., n0 = n0(|X|, ε, δ) means some threshold index which could be explicitly given as a function of |X|, ε, δ alone.
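As a quick illustration of the conventions just listed (exp and log understood to the base 2, and a log(a/b) taken to be zero when a = 0), the binary entropy function h(r) can be computed as in the following minimal Python sketch; the function name and the sample values of r are our own illustrative choices, not part of the book.

```python
import math

def binary_entropy(r: float) -> float:
    """h(r) = -r*log2(r) - (1-r)*log2(1-r), with the convention 0*log 0 = 0."""
    if not 0.0 <= r <= 1.0:
        raise ValueError("r must lie in [0, 1]")
    total = 0.0
    for p in (r, 1.0 - r):
        if p > 0.0:            # convention: 0 * log 0 = 0
            total -= p * math.log2(p)
    return total

for r in (0.0, 0.11, 0.5, 1.0):
    print(f"h({r}) = {binary_entropy(r):.4f} bits")   # h(0.5) = 1, h(0) = h(1) = 0
```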
    xiv Notation Preliminaries onrandom variables and probability distributions As we shall deal with RVs ranging over finite sets, the measure-theoretic foundations of probability theory will never really be needed. Still, in a formal sense, when speaking of RVs it is understood that a Kolmogorov probability space (, F, μ) is given (i.e., is some set, F is a σ-algebra of its subsets, and μ is a probability measure on F). Then a RV with values in a finite set X is a mapping X : → X such that X−1(x) ∈ F for every x ∈ X. The probability of an event defined in terms of RVs means the μ-measure of the corresponding subset of , e.g., Pr {X ∈ A} μ({ω : X(ω) ∈ A}). Throughout this book, it will be assumed that the underlying probability space (, F, μ) is “rich enough” in the following sense. To any pair of finite sets X, Y, any RV X with values in X and any distribution P on X × Y whose marginal on X coincides with PX , there exists a RV Y with values in Y such that PXY = P. This assumption is certainly fulfilled, e.g., if is the unit interval, F is the family of its Borel subsets, and μ is the Lebesgue measure. The set of all PDs on a finite set X will be identified with the subset of the |X|- dimensional Euclidean space, consisting of all vectors with non-negative components summing up to unity. Linear combinations of PDs and convexity are understood accord- ingly. For example, the convexity of a real-valued function f (P) of PDs on X means that f (αP1 + (1 − α)P2) αf (P1) + (1 − α) f (P2) for every P1, P2 and α ∈ (0, 1). Similarly, topological terms for PDs on X refer to the metric topology defined by Euclidean distance. In particular, the convergence Pn → P means that Pn(x) → P(x) for every x ∈ X. The set of all stochastic matrices W : X → Y is identified with a subset of the |X||Y|-dimensional Euclidean space in an analogous manner. Convexity and topological concepts for stochastic matrices are understood accordingly. Finally, for any distribution P on X and any stochastic matrix W : X → Y we denote by PW the distribution on Y defined as the matrix product of the (row) vector P and the matrix W, i.e., (PW)(y) x∈X P(x)W(y|x) for every y ∈ Y.
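Because PDs are identified with probability vectors and channels W : X → Y with stochastic matrices, the output distribution PW defined above is a plain vector–matrix product. The following small numpy sketch is only our illustration with made-up numbers (the input distribution P and the crossover probability 0.1 are invented for the example):

```python
import numpy as np

# Input distribution P on X = {0, 1}, as a row vector.
P = np.array([0.25, 0.75])

# Stochastic matrix W : X -> Y; row x is the conditional PD W(.|x).
# Here: a binary symmetric channel with crossover probability 0.1.
W = np.array([[0.9, 0.1],
              [0.1, 0.9]])

# Output distribution (PW)(y) = sum over x of P(x) W(y|x), i.e. the row vector P times W.
PW = P @ W
print(PW)                          # [0.3 0.7]
assert np.isclose(PW.sum(), 1.0)   # PW is again a PD
```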
    Introduction Information is afashionable concept with many facets, among which the quantitative one–our subject–is perhaps less striking than fundamental. At the intuitive level, for our purposes, it suffices to say that information is some knowledge of predetermined type contained in certain data or pattern and wanted at some destination. Actually, this concept will not explicitly enter the mathematical theory. However, throughout the book certain functionals of random variables will be conveniently interpreted as measures of the amount of information provided by the phenomena modeled by these variables. Such information measures are characteristic tools of the analysis of optimal performance of codes, and they have turned out to be useful in other branches of mathematics as well. Intuitive background The mathematical discipline of information theory, created by C. E. Shannon (1948) on an engineering background, still has a special relation to communication engineering, the latter being its major field of application and the source of its problems and moti- vation. We believe that some familiarity with the intuitive communication background is necessary for a more than formal understanding of the theory, let alone for doing fur- ther research. The heuristics, underlying most of the material in this book, can be best explained on Shannon’s idealized model of a communication system (which can also be regarded as a model of an information storage system). The important question of how far the models treated are related to, and the results obtained are relevant for, real systems will not be addressed. In this respect we note that although satisfactory math- ematical modeling of real systems is often very difficult, it is widely recognized that significant insight into their capabilities is given by phenomena discovered on appar- ently overidealized models. Familiarity with the mathematical methods and techniques of proof is a valuable tool for system designers in judging how these phenomena apply in concrete cases. Shannon’s famous block diagram of a (two-terminal) communication system is shown in Fig. I.1. Before turning to the mathematical aspects of Shannon’s model, let us take a glance at the objects to be modeled. The source of information may be nature, a human being, a computer, etc. The data or pattern containing the information at the source is called the message; it may consist of observations on a natural phenomenon, a spoken or written sentence, a sequence of
    xvi Introduction Source EncoderChannel Decoder Destination Figure I.1 binary digits, etc. Part of the information contained in the message (e.g., the shape of characters of a handwritten text) may be immaterial to the particular destination. Small distortions of the relevant information might be tolerated as well. These two aspects are jointly reflected in a fidelity criterion for the reproduction of the message at the destination. For example, for a person watching a color TV program on a black-and- white set, the information contained in the colors must be considered immaterial and the fidelity criterion is met if the picture is not perceivably worse than it would be by a good black-and-white transmission. Clearly, the fidelity criterion of a person watching the program in color would be different. The source and destination are separated in space or time. The communication or storing device available for bridging over this separation is called the channel. As a rule, the channel does not work perfectly and thus its output may significantly differ from the input. This phenomenon is referred to as channel noise. While the properties of the source and channel are considered unalterable, characteristic to Shannon’s model is the liberty of transforming the message before it enters the channel. Such a transfor- mation, called encoding, is always necessary if the message is not a possible input of the channel (e.g., a written sentence cannot be directly radioed). More importantly, encod- ing is an effective tool of reducing the cost of transmission and of combating channel noise (trivial examples are abbreviations such as cable addresses in telegrams on the one hand, and spelling names on telephone on the other). Of course, these two goals are conflicting and a compromise must be found. If the message has been encoded before entering the channel – and often even if not – a suitable processing of the channel out- put is necessary in order to retrieve the information in a form needed at the destination; this processing is called decoding. The devices performing encoding and decoding are the encoder and decoder of Fig. I.1. The rules determining their operation constitute the code. A code accomplishes reliable transmission if the joint operation of encoder, chan- nel and decoder results in reproducing the source messages at the destination within the prescribed fidelity criterion. Informal description of the basic mathematical model Shannon developed information theory as a mathematical study of the problem of reliable transmission at a possibly low cost (for a given source, channel and fidelity criteria). For this purpose mathematical models of the objects in Fig. I.1 had to be introduced. The terminology of the following models reflects the point of view of com- munication between terminals separated in space. Appropriately interchanging the roles of time and space, these models are equally suitable for describing data storage. Having in mind a source which keeps producing information, its output is visual- ized as an infinite sequence of symbols (e.g., Latin characters, binary digits, etc.). For
    Introduction xvii an observer,the successive symbols cannot be predicted. Rather, they seem to appear randomly according to probabilistic laws representing potentially available prior knowl- edge about the nature of the source (e.g., in the case of an English text we may think of language statistics, such as letter or word frequencies, etc.). For this reason the source is identified with a discrete-time stochastic process. The first k random variables of the source process represent a random message of length k; realizations thereof are called messages of length k. The theory is largely of asymptotic character: we are interested in the transmission of long messages. This justifies restricting our attention to messages of equal length, although, e.g., in an English text, the first k letters need not repre- sent a meaningful piece of information; the point is that a sentence cut at the tail is of negligible length compared to a large k. In non-asymptotic investigations, however, the structure of messages is of secondary importance. Then it is mathematically more convenient to regard them as realizations of an arbitrary random variable, the so-called random message (which may be identified with a finite segment of the source process or even with the whole process, etc.). Hence we shall often speak of messages (and their transformation) without specifying a source. An obvious way of taking advantage of a stochastic model is to disregard undesirable events of small probability. The simplest fidelity criterion of this kind is that the proba- bility of error, i.e., the overall probability of not receiving the message accurately at the destination, should not exceed a given small number. More generally, viewing the mes- sage and its reproduction at the destination as realizations of stochastically dependent random variables, a fidelity criterion is formulated as a global requirement involving their joint distribution. Usually, one introduces a numerical measure of the loss result- ing from a particular reproduction of a message. In information theory this is called a distortion measure. A typical fidelity criterion is that the expected distortion be less than a threshold, or that the probability of a distortion transgressing this threshold be small. The channel is supposed to be capable of successively transmitting symbols from a given set, the input alphabet. There is a starting point of the transmission and each of the successive uses of the channel consists of putting in one symbol and observing the corresponding symbol at the output. In the ideal case of a noiseless channel the output is identical to the input; in general, however, they may differ and the output need not be uniquely determined by the input. Also, the output alphabet may differ from the input alphabet. Following the stochastic approach, it is assumed that for every finite sequence of input symbols there exists a probability distribution on output sequences of the same length. This distribution governs the successive outputs if the elements of the given sequence are successively transmitted from the start of transmission on, as the begin- ning of a potentially infinite sequence. This assumption implies that no output symbol is affected by possible later inputs, and it amounts to certain consistency requirements among the mentioned distributions. The family of these distributions represents all pos- sible knowledge about the channel noise, prior to transmission. This family defines the channel as a mathematical object. 
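For the expected-distortion fidelity criterion mentioned above, a per-letter distortion measure such as Hamming distortion makes the requirement concrete: the average fraction of incorrectly reproduced letters must stay below a threshold. The short sketch below is our own toy illustration (the example sequences and the threshold 0.3 are invented):

```python
def average_distortion(message, reproduction):
    """Average per-letter Hamming distortion: fraction of positions that differ."""
    assert len(message) == len(reproduction)
    return sum(m != r for m, r in zip(message, reproduction)) / len(message)

message      = [0, 1, 1, 0, 1, 0, 0, 1]
reproduction = [0, 1, 0, 0, 1, 0, 1, 1]          # two letters reproduced wrongly

d = average_distortion(message, reproduction)
print(d)             # 0.25
print(d <= 0.3)      # True: this reproduction meets a threshold-type fidelity criterion
```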
The encoder maps messages into sequences of channel input symbols in a not neces- sarily one-to-one way. Mathematically, this very mapping is the encoder. The images of messages are referred to as codewords. For convenience, attention is usually restricted
    xviii Introduction to encoderswith fixed codeword length, mapping the messages into channel input sequences of length n, say. Similarly, from a purely mathematical point of view, a decoder is a mapping of output sequences of the channel into reproductions of mes- sages. By a code we shall mean, as a rule, an encoder–decoder pair or, in specific problems, a mathematical object effectively determining this pair. A random message, an encoder, a channel and a decoder define a joint probability distribution over messages, channel input and output sequences, and reproductions of the messages at the destination. In particular, it can be decided whether a given fidelity criterion is met. If it is, we speak of reliable transmission of the random message. The cost of transmission is not explicitly included in the above mathematical model. As a rule, one implicitly assumes that its main factor is the cost of channel use, the latter being proportional to the length of the input sequence. (In the case of telecommunica- tion this length determines the channel’s operation time and, in the case of data storage, the occupied space, provided that each symbol requires the same time or space, respec- tively.) Hence, for a given random message, channel and fidelity criterion, the problem consists in finding the smallest codeword length n for which reliable transmission can be achieved. We are basically interested in the reliable transmission of long messages of a given source using fixed-length-to-fixed-length codes, i.e., encoders mapping messages of length k into channel input sequences of length n and decoders mapping channel out- put sequences of length n into reproduction sequences of length k. The average number n/k of channel symbols used for the transmission of one source symbol is a measure of the performance of the code, and it will be called the transmission ratio. The goal is to determine the limit of the minimum transmission ratio (LMTR) needed for reli- able transmission, as the message length k tends to infinity. Implicit in this problem statement is that fidelity criteria are given for all sufficiently large k. Of course, for the existence of a finite LMTR, let alone for its computability, proper conditions on source, channel and fidelity criteria are needed. The intuitive problem of transmission of long messages can also be approached in another – more ambitious – manner, incorporating into the model certain constraints on the complexity of encoder and decoder, along with the requirement that the transmis- sion be indefinitely continuable. Any fixed-length-to-fixed-length code, designed for transmitting messages of length k by n channel symbols, say, may be used for non- terminating transmission as follows. The infinite source output sequence is partitioned into consecutive blocks of length k. The encoder mapping is applied to each block separately and the channel input sequence is the succession of the obtained blocks of length n. The channel output sequence is partitioned accordingly and is decoded block- wise by the given decoder. This method defines a code for non-terminating transmission. The transmission ratio is n/k; the block lengths k and n constitute a rough measure of complexity of the code. If the channel has no “input memory,” i.e., the transmission of the individual blocks is not affected by previous inputs, and if the source and channel are time-invariant, then each source block will be reproduced within the same fidelity criterion as the first one. 
Suppose, in addition, that the fidelity criteria for messages of different length have the following property: if successive blocks and their reproductions
individually meet the fidelity criterion, then so does their juxtaposition. Then, by this very coding, messages of potentially infinite length are reliably transmitted, and one can speak of reliable non-terminating transmission. Needless to say, this blockwise coding is a very special way of realizing non-terminating transmission. Still, within a very general class of codes for reliable non-terminating transmission, in order to minimize the transmission ratio¹ under conditions such as above, it suffices to restrict attention to blockwise codes. In such cases the present minimum equals the previous LMTR and the two approaches to the intuitive problem of transmission of long messages are equivalent. While in this book we basically adopt the first approach, a major reason for considering mainly fixed-length-to-fixed-length codes consists in their appropriateness also for non-terminating transmission. These codes themselves are often called block codes without specifically referring to non-terminating transmission.

Measuring information

A remarkable feature of the LMTR problem, discovered by Shannon and established in great generality by further research, is a phenomenon suggesting the heuristic interpretation that information, like liquids, "has volume but no shape," i.e., the amount of information is measurable by a scalar. Just as the time necessary for conveying the liquid content of a large container through a pipe (at a given flow velocity) is determined by the ratio of the volume of the liquid to the cross-sectional area of the pipe, the LMTR equals the ratio of two numbers, one depending on the source and fidelity criterion, the other depending on the channel. The first number is interpreted as a measure of the amount of information needed, on average, for the reproduction of one source symbol, whereas the second is a measure of the channel's capacity, i.e., of how much information is transmissible on average by one channel use. It is customary to take as a standard the simplest channel that can be used for transmitting information, namely the noiseless channel with two input symbols, 0 and 1, say. The capacity of this binary noiseless channel, i.e., the amount of information transmissible by one binary digit, is considered the unit of the amount of information, called 1 bit. Accordingly, the amount of information needed on average for the reproduction of one symbol of a given source (relative to a given fidelity criterion) is measured by the LMTR for this source and the binary noiseless channel. In particular, if the most demanding fidelity criterion is imposed, which within a stochastic theory is that of a small probability of error, the corresponding LMTR provides a measure of the total amount of information carried, on average, by one source symbol.

¹ The relevance of this minimization problem to data storage is obvious. In typical communication situations, however, the transmission ratio of non-terminating transmission cannot be chosen freely. Rather, it is determined by the rates at which the source produces and the channel transmits symbols. Then one question is whether a given transmission ratio admits reliable transmission, but this is mathematically equivalent to the above minimization problem.
The above ideas naturally suggest the need for a measure of the amount of information individually contained in a single source output. In view of our source model, this means to associate some information content with an arbitrary random variable. One relies on the intuitive postulate that the observation of a collection of independent random variables yields an amount of information equal to the sum of the information contents of the individual variables. Accordingly, one defines the entropy (information content) of a random variable as the amount of information carried, on average, by one symbol of a source which consists of a sequence of independent copies of the random variable in question. This very entropy is also a measure of the amount of uncertainty concerning this random variable before its observation.

We have sketched a way of assigning information measures to sources and channels in connection with the LMTR problem and arrived, in particular, at the concept of entropy of a single variable. There is also an opposite way: starting from entropy, which can be expressed by a simple formula, one can build up more complex functionals of probability distributions. On the basis of heuristic considerations (quite independent of the above communication model), these functionals can be interpreted as information measures corresponding to different connections of random variables. The operational significance of these information measures is not a priori evident. Still, under general conditions the solution of the LMTR problem can be given in terms of these quantities. More precisely, the corresponding theorems assert that the operationally defined information measures for source and channel can be given by such functionals, just as intuition suggests. This consistency underlines the importance of entropy-based information measures, both from a formal and a heuristic point of view. The relevance of these functionals, corresponding to their heuristic meaning, is not restricted to communication or storage problems. Still, there are also other functionals which can be interpreted as information measures with an operational significance not related to coding.

Multi-terminal systems

Shannon's block diagram (Fig. I.1) models one-way communication between two terminals. The communication link it describes can be considered as an artificially isolated elementary part of a large communication system involving exchange of information among many participants. Such an isolation is motivated by the implicit assumptions that (i) the source and channel are in some sense independent of the remainder of the system, the effects of the environment being taken into account only as channel noise, (ii) if exchange of information takes place in both directions, they do not affect each other.
Note that dropping assumption (ii) is meaningful even in the case of communication between two terminals. Then the new phenomenon arises that transmission in one direction has the byproduct of feeding back information on the result of transmission in the opposite direction. This feedback can conceivably be exploited for improving the performance of the code; this, however, will necessitate a modification of the mathematical concept of the encoder. Problems involving feedback will be discussed in this book only casually.

On the other hand, the whole of Part III will be devoted to problems arising from dropping assumption (i). This leads to models of multi-terminal systems with several sources, channels and destinations, such that the stochastic interdependence of individual sources and channels is taken into account. A heuristic description of such mathematical models at this point would lead too far. However, we feel that readers familiar with the mathematics of two-terminal systems treated in Parts I and II will have no difficulty in understanding the motivation for the multi-terminal models of Part III.
Part I

Information measures in simple coding problems
1 Source coding and hypothesis testing; information measures

A (discrete) source is a sequence {X_i}_{i=1}^∞ of random variables (RVs) taking values in a finite set X called the source alphabet. If the X_i's are independent and have the same distribution P, we speak of a discrete memoryless source (DMS) with generic distribution P. A k-to-n binary block code is a pair of mappings f : X^k → {0, 1}^n, ϕ : {0, 1}^n → X^k. For a given source, the probability of error of the code (f, ϕ) is

e(f, ϕ) ≜ Pr{ϕ(f(X^k)) ≠ X^k},

where X^k stands for the k-length initial string of the sequence {X_i}_{i=1}^∞. We are interested in finding codes with small ratio n/k and small probability of error. More exactly, for every k let n(k, ε) be the smallest n for which there exists a k-to-n binary block code satisfying e(f, ϕ) ≤ ε; we want to determine lim_{k→∞} n(k, ε)/k.

THEOREM 1.1 For a DMS with generic distribution P = {P(x) : x ∈ X}

lim_{k→∞} n(k, ε)/k = H(P) for every ε ∈ (0, 1),   (1.1)

where H(P) ≜ −∑_{x∈X} P(x) log P(x).

COROLLARY 1.1 0 ≤ H(P) ≤ log |X|.   (1.2)

Proof The existence of a k-to-n binary block code with e(f, ϕ) ≤ ε is equivalent to the existence of a set A ⊂ X^k with P^k(A) ≥ 1 − ε, |A| ≤ 2^n (let A be the set of those sequences x ∈ X^k which are reproduced correctly, i.e., ϕ(f(x)) = x). Denote by s(k, ε) the minimum cardinality of sets A ⊂ X^k with P^k(A) ≥ 1 − ε. It suffices to show that

lim_{k→∞} (1/k) log s(k, ε) = H(P)  (ε ∈ (0, 1)).   (1.3)

To this end, let B(k, δ) be the set of those sequences x ∈ X^k which have probability

exp{−k(H(P) + δ)} ≤ P^k(x) ≤ exp{−k(H(P) − δ)}.
We first show that P^k(B(k, δ)) → 1 as k → ∞, for every δ > 0. In fact, consider the real-valued RVs Y_i ≜ −log P(X_i); these are well defined with probability 1 even if P(x) = 0 for some x ∈ X. The Y_i's are independent, identically distributed and have expectation H(P). Thus by the weak law of large numbers

lim_{k→∞} Pr{ |(1/k) ∑_{i=1}^k Y_i − H(P)| ≤ δ } = 1 for every δ > 0.

As X^k ∈ B(k, δ) iff |(1/k) ∑_{i=1}^k Y_i − H(P)| ≤ δ, the convergence relation means that

lim_{k→∞} P^k(B(k, δ)) = 1 for every δ > 0,   (1.4)

as claimed. The definition of B(k, δ) implies that |B(k, δ)| ≤ exp{k(H(P) + δ)}. Thus (1.4) gives for every δ > 0

lim sup_{k→∞} (1/k) log s(k, ε) ≤ lim_{k→∞} (1/k) log |B(k, δ)| ≤ H(P) + δ.   (1.5)

On the other hand, for every set A ⊂ X^k with P^k(A) ≥ 1 − ε, (1.4) implies

P^k(A ∩ B(k, δ)) ≥ (1 − ε)/2 for sufficiently large k.

Hence, by the definition of B(k, δ),

|A| ≥ |A ∩ B(k, δ)| ≥ ∑_{x∈A∩B(k,δ)} P^k(x) · exp{k(H(P) − δ)} ≥ ((1 − ε)/2) exp{k(H(P) − δ)},

proving that for every δ > 0

lim inf_{k→∞} (1/k) log s(k, ε) ≥ H(P) − δ.

This and (1.5) establish (1.3). The corollary is immediate.

For intuitive reasons expounded in the Introduction, the limit H(P) in Theorem 1.1 is interpreted as a measure of the information content of (or the uncertainty about) a RV X with distribution P_X = P. It is called the entropy of the RV X or of the distribution P:

H(X) = H(P) ≜ −∑_{x∈X} P(x) log P(x).

This definition is often referred to as Shannon's formula.
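As a quick numerical aside (not part of the original text), Shannon's formula is easy to evaluate directly. The following minimal Python sketch uses a made-up three-symbol distribution and base-2 logarithms; it computes H(P) in bits and checks the bounds of Corollary 1.1.

```python
import math

def entropy(P):
    # H(P) = -sum_x P(x) log2 P(x); terms with P(x) = 0 contribute 0.
    return -sum(p * math.log2(p) for p in P.values() if p > 0)

# Hypothetical generic distribution of a DMS over a three-letter alphabet.
P = {"a": 0.5, "b": 0.25, "c": 0.25}
H = entropy(P)
print(H)                             # 1.5 bits
print(0 <= H <= math.log2(len(P)))   # Corollary 1.1: 0 <= H(P) <= log |X|
```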
The mathematical essence of Theorem 1.1 is formula (1.3). It gives the asymptotics for the minimum size of sets of large probability in X^k. We now generalize (1.3) for the case when the elements of X^k have unequal weights and the size of subsets is measured by total weight rather than cardinality.

Let us be given a sequence of positive-valued "mass functions" M_1(x), M_2(x), . . . on X and set

M(x) ≜ ∏_{i=1}^k M_i(x_i) for x = x_1 · · · x_k ∈ X^k.

For an arbitrary sequence of X-valued RVs {X_i}_{i=1}^∞ consider the minimum of the M-mass M(A) ≜ ∑_{x∈A} M(x) of those sets A ⊂ X^k which contain X^k with high probability: let s(k, ε) denote the minimum of M(A) for sets A ⊂ X^k of probability P_{X^k}(A) ≥ 1 − ε. The previous s(k, ε) is a special case obtained if all the functions M_i(x) are identically equal to 1.

THEOREM 1.2 If the X_i's are independent with distributions P_i ≜ P_{X_i} and |log M_i(x)| ≤ c for every i and x ∈ X then, setting

E_k ≜ (1/k) ∑_{i=1}^k ∑_{x∈X} P_i(x) log (M_i(x)/P_i(x)),

we have for every 0 < ε < 1

lim_{k→∞} [ (1/k) log s(k, ε) − E_k ] = 0.

More precisely, for every δ, ε ∈ (0, 1),

| (1/k) log s(k, ε) − E_k | ≤ δ if k ≥ k_0 = k_0(|X|, c, ε, δ).   (1.6)

Proof Consider the real-valued RVs Y_i ≜ log (M_i(X_i)/P_i(X_i)). Since the Y_i's are independent and E[(1/k) ∑_{i=1}^k Y_i] = E_k, Chebyshev's inequality gives for any δ′ > 0

Pr{ |(1/k) ∑_{i=1}^k Y_i − E_k| ≥ δ′ } ≤ (1/(k²δ′²)) ∑_{i=1}^k var(Y_i) ≤ (1/(kδ′²)) max_i var(Y_i).
This means that for the set

B(k, δ′) ≜ { x ∈ X^k : E_k − δ′ ≤ (1/k) log (M(x)/P_{X^k}(x)) ≤ E_k + δ′ }

we have P_{X^k}(B(k, δ′)) ≥ 1 − η_k, where η_k ≜ (1/(kδ′²)) max_i var(Y_i). Since by the definition of B(k, δ′)

M(B(k, δ′)) = ∑_{x∈B(k,δ′)} M(x) ≤ ∑_{x∈B(k,δ′)} P_{X^k}(x) exp[k(E_k + δ′)] ≤ exp[k(E_k + δ′)],

it follows that

(1/k) log s(k, ε) ≤ (1/k) log M(B(k, δ′)) ≤ E_k + δ′ if η_k ≤ ε.

On the other hand, we have

P_{X^k}(A ∩ B(k, δ′)) ≥ 1 − ε − η_k

for any set A ⊂ X^k with P_{X^k}(A) ≥ 1 − ε. Thus for every such A, again by the definition of B(k, δ′),

M(A) ≥ M(A ∩ B(k, δ′)) ≥ ∑_{x∈A∩B(k,δ′)} P_{X^k}(x) exp{k(E_k − δ′)} ≥ (1 − ε − η_k) exp[k(E_k − δ′)],

implying

(1/k) log s(k, ε) ≥ (1/k) log(1 − ε − η_k) + E_k − δ′.

Setting δ′ ≜ δ/2, these results imply (1.6) provided that

η_k = (4/(kδ²)) max_i var(Y_i) ≤ ε and (1/k) log(1 − ε − η_k) ≥ −δ/2.

By the assumption |log M_i(x)| ≤ c, the last relations hold if k ≥ k_0(|X|, c, ε, δ).

An important corollary of Theorem 1.2 relates to testing statistical hypotheses. Suppose that a probability distribution of interest for the statistician is given by either P = {P(x) : x ∈ X} or Q = {Q(x) : x ∈ X}. She or he has to decide between P and Q on the basis of a sample of size k, i.e., the result of k independent drawings from the unknown distribution. A (non-randomized) test is characterized by a set A ⊂ X^k, in the sense that if the sample X_1 . . . X_k belongs to A, the statistician accepts P and else accepts Q. In most practical situations of this kind, the role of the two hypotheses is not symmetric. It is customary to prescribe a bound ε for the tolerated probability of wrong decision if P is the true distribution. Then the task is to minimize the probability of a wrong decision if hypothesis Q is true. The latter minimum is

β(k, ε) ≜ min_{A ⊂ X^k : P^k(A) ≥ 1−ε} Q^k(A).
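As an illustrative aside (not part of the original text), β(k, ε) can be approximated by brute force for a tiny alphabet and block length. The Python sketch below builds a test set greedily in decreasing order of the likelihood ratio P^k(x)/Q^k(x) — in the spirit of the Neyman–Pearson lemma of Problem 1.4 — which gives a near-optimal deterministic test and hence an upper bound on β(k, ε). The distributions P and Q and all constants are arbitrary choices; the printed exponent −(1/k) log β slowly approaches the divergence identified in the corollary that follows.

```python
import itertools, math

def beta_upper(P, Q, k, eps):
    # Q^k-mass of a near-optimal deterministic test set A with P^k(A) >= 1 - eps,
    # built greedily in decreasing order of P^k(x)/Q^k(x); assumes Q(x) > 0 for all x.
    X = list(P)
    seqs = list(itertools.product(X, repeat=k))
    pk = lambda x, D: math.prod(D[a] for a in x)
    seqs.sort(key=lambda x: pk(x, P) / pk(x, Q), reverse=True)
    p_mass = q_mass = 0.0
    for x in seqs:
        if p_mass >= 1 - eps:
            break
        p_mass += pk(x, P)
        q_mass += pk(x, Q)
    return q_mass

P = {"0": 0.6, "1": 0.4}
Q = {"0": 0.3, "1": 0.7}
D = sum(P[a] * math.log2(P[a] / Q[a]) for a in P)   # D(P||Q) in bits
for k in (4, 8, 12):
    print(k, -math.log2(beta_upper(P, Q, k, 0.1)) / k, "vs", D)
```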
COROLLARY 1.2 For any 0 < ε < 1,

lim_{k→∞} (1/k) log β(k, ε) = −∑_{x∈X} P(x) log (P(x)/Q(x)).

Proof If Q(x) > 0 for each x ∈ X, set P_i ≜ P, M_i ≜ Q in Theorem 1.2. If P(x) > Q(x) = 0 for some x ∈ X, the P-probability of the set of all k-length sequences containing this x tends to 1. This means that β(k, ε) = 0 for sufficiently large k, so that both sides of the asserted equality are −∞.

It follows from Corollary 1.2 that the sum on the right-hand side is non-negative. It measures how much the distribution Q differs from P in the sense of statistical distinguishability, and is called informational divergence or I-divergence:

D(P‖Q) ≜ ∑_{x∈X} P(x) log (P(x)/Q(x)).

Another common name given to this quantity is relative entropy. Intuitively, one can say that the larger D(P‖Q) is, the more information for discriminating between the hypotheses P and Q can be obtained from one observation. Hence D(P‖Q) is also called the information for discrimination. The amount of information measured by D(P‖Q) is, however, conceptually different from entropy, since it has no immediate coding interpretation. On the space of infinite sequences of elements of X one can build up product measures both from P and Q. If P ≠ Q, the two product measures are mutually orthogonal; D(P‖Q) is a (non-symmetric) measure of how fast their restrictions to k-length strings approach orthogonality.

REMARK Both entropy and informational divergence have a form of expectation:

H(X) = E(−log P(X)),  D(P‖Q) = E log (P(X)/Q(X)),

where X is a RV with distribution P. It is convenient to interpret −log P(x), resp. log P(x)/Q(x), as a measure of the amount of information, resp. the weight of evidence in favor of P against Q provided by a particular value x of X. These quantities are important ingredients of the mathematical framework of information theory, but have less direct operational meaning than their expectations.

The entropy of a pair of RVs (X, Y) with finite ranges X and Y needs no new definition, since the pair can be considered a single RV with range X × Y. For brevity, instead of H((X, Y)) we shall write H(X, Y); similar notation will be used for any finite collection of RVs. The intuitive interpretation of entropy suggests to consider as further information measures certain expressions built up from entropies. The difference H(X, Y) − H(X) measures the additional amount of information provided by Y if X is already known.
It is called the conditional entropy of Y given X:

H(Y|X) ≜ H(X, Y) − H(X).

Expressing the entropy difference by Shannon's formula we obtain

H(Y|X) = −∑_{x∈X} ∑_{y∈Y} P_{XY}(x, y) log (P_{XY}(x, y)/P_X(x)) = ∑_{x∈X} P_X(x) H(Y|X = x),   (1.7)

where

H(Y|X = x) = −∑_{y∈Y} P_{Y|X}(y|x) log P_{Y|X}(y|x).

Thus H(Y|X) is the expectation of the entropy of the conditional distribution of Y given X = x. This gives further support to the above intuitive interpretation of conditional entropy. Intuition also suggests that the conditional entropy cannot exceed the unconditional one.

LEMMA 1.3 H(Y|X) ≤ H(Y).

Proof

H(Y) − H(Y|X) = H(Y) − H(X, Y) + H(X) = ∑_{x∈X} ∑_{y∈Y} P_{XY}(x, y) log (P_{XY}(x, y)/(P_X(x)P_Y(y))) = D(P_{XY} ‖ P_X × P_Y) ≥ 0.

REMARK For certain values of x, H(Y|X = x) may be larger than H(Y).

The entropy difference in the preceding proof measures the decrease of uncertainty about Y caused by the knowledge of X. In other words, it is a measure of the amount of information about Y contained in X. Note the remarkable fact that this difference is symmetric in X and Y. It is called mutual information:

I(X ∧ Y) = H(Y) − H(Y|X) = H(X) − H(X|Y) = D(P_{XY} ‖ P_X × P_Y).   (1.8)

Of course, the amount of information contained in X about itself is just the entropy: I(X ∧ X) = H(X). Mutual information is a measure of stochastic dependence of the RVs X and Y. The fact that I(X ∧ Y) equals the informational divergence of the joint distribution of X and Y from what it would be if X and Y were independent reinforces this interpretation. There is no compelling reason other than tradition to denote mutual information by a different symbol than entropy. We keep this tradition, although our notation I(X ∧ Y) differs slightly from the more common I(X; Y).
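To make these definitions concrete, here is a small Python sketch (an illustrative addition, with a made-up joint distribution of two binary RVs). It computes H(Y|X) as H(X, Y) − H(X), cf. (1.7), and checks that the expressions for I(X ∧ Y) in (1.8) agree.

```python
import math

def entropy(p):
    return -sum(v * math.log2(v) for v in p.values() if v > 0)

def marginals(pxy):
    px, py = {}, {}
    for (x, y), v in pxy.items():
        px[x] = px.get(x, 0.0) + v
        py[y] = py.get(y, 0.0) + v
    return px, py

def mutual_information(pxy):
    # I(X ^ Y) = D(P_XY || P_X x P_Y), cf. (1.8)
    px, py = marginals(pxy)
    return sum(v * math.log2(v / (px[x] * py[y]))
               for (x, y), v in pxy.items() if v > 0)

# A hypothetical joint distribution P_XY of two binary RVs.
PXY = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}
PX, PY = marginals(PXY)
H_XY, H_X, H_Y = entropy(PXY), entropy(PX), entropy(PY)
I = mutual_information(PXY)
print(H_XY - H_X)                 # conditional entropy H(Y|X)
print(I, H_Y - (H_XY - H_X))      # both sides of (1.8) coincide
```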
    Source coding andhypothesis testing 9 Discussion Theorem 1.1 says that the minimum number of binary digits needed – on average – to represent one symbol of a DMS with generic distribution P equals the entropy H(P). This fact – and similar ones discussed later on – are our basis for interpreting H(X) as a measure of the amount of information contained in the RV X, resp. of the uncer- tainty about this RV. In other words, in this book we adopt an operational or pragmatic approach to the concept of information. Alternatively, one could start from the intu- itive concept of information and set up certain postulates which an information measure should fulfil. Some representative results of this axiomatic approach are treated in Problems 1.11–1.14. Our starting point, Theorem 1.1, has been proved here in the conceptually simplest way. The key idea is that, for large k, all sequences in a subset of Xk with probability close to 1, namely B(k, δ), have “nearly equal” probabilities in an exponential sense. This proof easily extends also to non-DM cases (not in the scope of this book). On the other hand, in order to treat DM models at depth, another – purely combina- torial – approach will be more suitable. The preliminaries to this approach will be given in Chapter 2. Theorem 1.2 demonstrates the intrinsic relationship of the basic source coding and hypothesis testing problems. The interplay of information theory and mathematical statistics goes much further; its more substantial examples are beyond the scope of this book. Problems 1.1. (a) Check that the problem of determining limk→∞ 1 k n(k, ε) for a discrete source is just the formal statement of the LMTR problem (see the Intro- duction) for the given source and the binary noiseless channel, with the probability of error fidelity criterion. (b) Show that for a DMS and a noiseless channel with arbitrary alphabet size m the LMTR is H(P)/ log m, where P is the generic distribution of the source. 1.2. Given an encoder f : Xk → {0, 1}n, show that the probability of error e( f, ϕ) is minimized iff the decoder ϕ : {0, 1}n → Xk has the property that ϕ(y) is a sequence of maximum probability among those x ∈ Xk for which f (x) = y. 1.3. A randomized test introduces a chance element into the decision between the hypotheses P and Q in the sense that if the result of k successive drawings is x ∈ Xk, one accepts the hypothesis P with probability π(x), say. Define the analog of β(k, ε) for randomized tests and show that it still satisfies Corollary 1.2. 1.4. (Neyman–Pearson lemma) Show that for any given bound 0 ε 1 on the probability of wrong decision if P is true, the best randomized test is given by π(x) = ⎧ ⎪ ⎪ ⎨ ⎪ ⎪ ⎩ 1 if Pk(x) ck Qk(x) γk if Pk(x) = ck Qk(x) 0 if Pk(x) ck Qk(x),
    10 Information measuresin simple coding problems where ck and γk are appropriate constants. Observe that the case k = 1 contains the general one, and there is no need to restrict attention to independent drawings. 1.5. (a) Let {Xi }∞ i=1 be a sequence of independent RVs with common range X but with arbitrary distributions. As in Theorem 1.1, denote by n(k, ε) the small- est n for which there exists a k-to-n binary block code having probability of error ε for the source {Xi }∞ i=1. Show that for every ε ∈ (0, 1) and δ 0 n(k, ε) k − 1 k k i=1 H(Xi ) δ if k k0(|X|, ε, δ). Hint Use Theorem 1.2 with Mi (x) = 1. (b) Let {(Xi , Yi )}∞ i=1 be a sequence of independent replicas of a pair of RVs (X, Y) and suppose that Xk should be encoded and decoded in the knowl- edge of Yk. Let ñ(k, ε) be the smallest n for which there exists an encoder f : Xk × Yk → {0, 1}n and a decoder ϕ : {0, 1}n × Yk → Xk such that the probability of error is Pr{ϕ( f (Xk, Yk), Yk) = Xk} ε. Show that lim k→∞ ñ(k, ε) k = H(X|Y) for every ε ∈ (0, 1). Hint Use part (a) for the conditional distributions of the Xi ’s given various realizations y of Yk. 1.6. (Random selection of codes) Let F(k, n) be the class of all mappings f : Xk → {0, 1}n. Given a source {Xi }∞ i=1, consider the class of codes ( f, ϕ f ), where f ranges over F(k, n) and ϕ f : {0, 1}n → Xk is defined so as to minimize e( f, ϕ); see Problem 1.2. Show that for a DMS with generic distribution P we have 1 |F(k, n)| f ∈F(k,n) e( f, ϕ f ) → 0, if k and n tend to infinity, so that inf n k H(P). Hint Consider a random mapping F of Xk into {0, 1}n, assigning to each x ∈ Xk one of the 2n binary sequences of length n with equal probabilities 2−n, indepen- dently of each other and of the source RVs. Let Φ : {0, 1}n → Xk be the random mapping taking the value ϕ f if F = f . Then 1 |F(k, n)| f ∈F(k,n) e( f, ϕ f ) = Pr{Φ(F(Xk )) = Xk } = x∈Xk Pk (x)Pr{Φ(F(x)) = x}.
    Source coding andhypothesis testing 11 Here Pr{Φ(F(x)) = x} 2−n |{x : Pk (x ) Pk (x)}| and this is less than 2−n+k(H(P)+δ) if Pk(x) 2−k(H(P)+δ). 1.7. (a) (Linear source codes) Let X be a Galois field (i.e., any finite field) and con- sider Xk as a vector space over this field. A linear source code is a pair of mappings f : Xk → Xn and ϕ : Xn → Xk such that f is a linear mapping (ϕ is arbitrary). Show that for a DMS with generic distribution P there exist linear source codes with n/k → H(P)/ log |X| and e( f, ϕ) → 0. Compare this result with Problem 1.l(b). (Implicit in Elias (1955), cf. Wyner (1974).) Hint Verify that the class of all linear mappings f : Xk → Xn satisfies the condition in (b) below. (b) Extend the result of Problem 1.6 to the case when the role of {0, 1} is played by any finite set Y, and F(k, n) is any class of mappings f : Xk → Yn satisfying 1 |F(k, n)| { f ∈ F(k, n): f (x) = f (x )} |Y|−n for x = x . (Such a class of mappings is called a universal hash family; see Carter and Wegman (1979).) Hint If |Y| = 2, the hint to Problem 1.6 applies verbatim for the random map- ping F selected from the present F(k, n), by the uniform distribution. If |Y| 2, the crucial bound on Pr{Φ(F(x)) = x} will hold with |Y|−n instead of 2−n; accordingly, the assertion follows if in the hypothesis H(P) is replaced by H(P)/ log |Y|. 1.8.∗ Show that the s(k, ε) of Theorem 1.2 satisfies log s(k, ε) − Ek − √ kλSk + 1 2 log k 140 δ8 whenever δ min Sk, 1 Rk , δ ε 1 − δ, √ k 140 δ8 . Here Sk 1 k k i=1 var (Yi ) 1/2 , Rk = 1 k k i=1 E|Yi −EYi |3 1/3 and λ is determined by Φ(λ) = 1 − ε, where Φ denotes the distribution function of the standard normal distribution; Ek and Yi are the same as in the text. (See Strassen (1964).) 1.9. In hypothesis testing problems it sometimes makes sense to speak of “prior prob- abilities” Pr{P is true} = p0 and Pr{Q is true} = q0 = 1 − p0. On the basis of a sample x ∈ Xk, the posterior probabilities are then calculated as
    12 Information measuresin simple coding problems Pr{P is true |Xk = x} pk(x) = p0 Pk(x) p0 Pk(x) + q0 Qk(x) , Pr{Q is true |Xk = x} qk(x) = 1 − pk(x). Show that if P is true then pk(Xk) → 1 and 1 k log qk(Xk) → −D(PQ) with probability 1, no matter what was p0 ∈ (0, 1). 1.10. The interpretation of entropy as a measure of uncertainty suggests that “more uniform” distributions have larger entropy. For two distributions P and Q on X we call P more uniform than Q, in symbols P Q, if for the non-increasing ordering p1 p2 · · · pn, q1 q2 · · · qn (n = |X|) of their proba- bilities, k i=1 pi k i=1 qi for every 1 k n. Show that P Q implies H(P) H(Q); compare this result with (1.2). (More generally, P Q implies k i=1 ψ(pi ) k i=1 ψ(qi ) for every con- vex function ψ; see Karamata (1932).) Postulational characterizations of entropy (Problems 1.11–1.14) In the following problems, Hm(p1, . . . , pm), m = 2, 3, . . ., designates a sequence of real-valued functions defined for non-negative pi ’s with sum 1 such that Hm is invariant under permutations of the pi ’s. Some simple postulates on Hm will be formulated which ensure that Hm(p1, . . . , pm) = − m i=1 pi log pi , m = 2, 3 . . . (∗) In particular, we shall say that {Hm} is (i) expansible if Hm+1(p1, . . . , pm, 0) = Hm(p1, . . . , pm); (ii) additive if Hm(p1, . . . , pm) + Hn(q1, . . . , qn) = Hmn(p1q1, . . . , p1qn, . . . , pmq1, . . . , pmqn); (iii) subadditive if Hm(p1, . . . , pn) + Hn(q1, . . . , qn) Hmn(r11, . . . ,rmn) whenever n j=1 ri j = pi , m i=1 ri j = qj ; (iv) branching if there exist functions Jm(x, y) (with x, y 0, x + y 1, m = 3, 4, . . .) such that Hm(p1, . . . , pm) − Hm−1(p1 + p2, . . . , pm) = Jm(p1, p2); (v) recursive if it is branching with Jm(p1, p2) = (p1 + p2)H2 p1 p1 + p2 , p2 p1 + p2 , m = 3, 4, . . . ; (vi) normalized if H2 1 2 , 1 2 = 1. For a complete exposition of this subject, we refer to Aczél and Daróczy (1975).
    Source coding andhypothesis testing 13 1.11. Show that if {Hm} is recursive, normalized and H2(p, 1 − p) is a continu- ous function of p then (∗) holds. (See Faddeev (1956); the first “axiomatic” characterization of entropy, using somewhat stronger postulates, was given by Shannon (1948).) Hint The key step is to prove Hm 1 m , . . . , 1 m = log m. To this end, check that f (m) Hm 1 m , . . . , 1 m is additive, i.e., f (mn) = f (m) + f (n), and that f (m + 1) − f (m) → 0 as m → ∞. Show that these properties and f (2) = 1 imply f (m) = log m. (The last implication is a result of Erdös (1946); for a simple proof, see Rényi (1961).) 1.12.∗ (a) Show that if Hm(p1, . . . , pm) = m i=1 g(pi ) with a continuous function g(p), and {Hm} is additive and normalized, then (∗) holds. (Chaundy and McLeod (1960)). (b) Show that if {Hm} is expansible and branching then Hm(p1, . . . , pm) = m i=1 g(pi ), with g(0) = 0 (Ng, (1974).) 1.13.∗ (a) Show that if {Hm} is expansible, additive, subadditive, normalized and H2(p, 1 − p) → 0 as p → 0 then (∗) holds. (b) If {Hm} is expansible, additive and subadditive, show that there exist constants A 0, B 0 such that Hm(p1, . . . , pm) = A − m i=1 pi log pi + B log |{i : pi 0}|. (Forte (1975), Aczél–Forte–Ng (1974).) 1.14.∗ Suppose that Hm(p1, . . . , pm) = − log −1 m i=1 pi (pi ) with some strictly monotonic continuous function on (0,1] such that t(t) → 0(0) 0 as t → 0. Show that if {Hm} is additive and normalized then either (∗) holds or Hm(p1, . . . , pm) = 1 1 − α log m i=1 pα i with some α 0, α = 1. (Conjectured by Rényi (1961) and proved by Daróczy (1964). The preceding expression is called Rényi’s entropy of order α. A similar expression was used earlier by Schützenberger (1954) as “pseudo information.”) 1.15. For P = (p1, . . . , pm), denote by Hα(P) the Rényi entropy of order α if α = 1, α 0, and the Shannon entropy H(P) if α = 1. Show that Hα(P) is a continuous, non-increasing function of α, whose limits as α → 0, resp. α → +∞, are H0(P) log |{i : pi 0}| , H∞(P) min (− log pi ), called the maxentropy, resp. minentropy, of P. Hint Check that log m i=1 pα i is a convex function of α. 1.16. (Fisher’s information) Let {Pϑ } be a family of distributions on a finite set X, where ϑ is a real parameter ranging over an open interval. Suppose that the probabilities Pϑ (x) are positive and that they are continuously differentiable functions of ϑ. Write
    14 Information measuresin simple coding problems I (ϑ) x∈X 1 Pϑ (x) ∂ ∂ϑ Pϑ (x) 2 . (a) Show that for every ϑ lim ϑ→ϑ 1 (ϑ − ϑ)2 D(Pϑ Pϑ ) = 1 ln 4 I (ϑ) (Kullback and Leibler, 1951). (b) Show that every unbiased estimator f of ϑ from a sample of size n, i.e., every real-valued function f on Xn such that Eϑ f (Xn) = ϑ for each ϑ, satisfies varϑ ( f (Xn )) 1 nI (ϑ) . Here Eϑ and varϑ denote expectation, resp. variance, in the case when Xn has distribution Pn ϑ . (Fisher (1925) introduced I (ϑ) as a measure of the information contained in one observation from Pϑ for estimating ϑ. His motivation was that the maximum likelihood estimator of ϑ from a sample of size n has asymptotic variance 1/(nI (ϑ0)) if ϑ = ϑ0. The assertion of (b) is a special case of the Cramér–Rao inequality, see e.g., Schmetterer (1974).) Hint (a) directly follows by L’Hospital’s rule. For (b), it suffices to consider the case n = 1. But x∈X Pϑ (x) 1 Pϑ (x) ∂ ∂ϑ Pϑ (x) 2 · x∈X Pϑ (x)( f (x) − ϑ)2 1 follows from Cauchy’s inequality, since x∈X ∂ ∂ϑ Pϑ (x) · ( f (x) − ϑ) = ϑ ∂ϑ x∈X Pϑ (x) f (x) = 1. Story of the results The basic concepts of information theory are due to Shannon (1948). In particular, he proved Theorem 1.1, introduced the information measures entropy, conditional entropy and mutual information, and established their basic properties. The name entropy has been borrowed from physics, as entropy in the sense of statistical physics is expressed by a similar formula, due to Boltzmann (1877). The very idea of measuring informa- tion regardless of its content dates back to Hartley (1928), who assigned to a symbol out of m alternatives the amount of information log m. An information measure in a specific context was used by Fisher (1925), as in Problem 1.16. Informational diver- gence was introduced by Kullback and Leibler (1951) (under the name information for discrimination; they used the term divergence for its symmetrized version). Corollary 1.2 is known as Stein’s lemma; it appears in Chernoff (1956), attributed to C. Stein. Theorem 1.2 is a common generalization of Theorem 1.1 and Corollary 1.2; a stronger
result of this kind was given by Strassen (1964), see Problem 1.8. For a nice discussion of the pragmatic and axiomatic approaches to information measures, see Rényi (1965).

Addition. For more on the interplay of information theory and statistics, see Kullback (1959), Rissanen (1989), and Csiszár and Shields (2004).
2 Types and typical sequences

Most of the proof techniques used in this book will be based on a few simple combinatorial lemmas, summarized below. Drawing k times independently with distribution Q from a finite set X, the probability of obtaining the sequence x ∈ X^k depends only on how often the various elements of X occur in x. In fact, denoting by N(a|x) the number of occurrences of a ∈ X in x, we have

Q^k(x) = ∏_{a∈X} Q(a)^{N(a|x)}.   (2.1)

DEFINITION 2.1 The type of a sequence x ∈ X^k is the distribution P_x on X defined by

P_x(a) ≜ (1/k) N(a|x) for every a ∈ X.

For any distribution P on X, the set of sequences of type P in X^k is denoted by T^k_P or simply T_P. A distribution P on X is called a type of sequences in X^k if T^k_P ≠ ∅. Sometimes the term "type" will also be used for the sets T^k_P ≠ ∅ when this does not lead to ambiguity. These sets are also called type classes or composition classes.

REMARK In mathematical statistics, if x ∈ X^k is a sample of size k consisting of the results of k observations, the type of x is called the empirical distribution of the sample x.

By (2.1), the Q^k-probability of a subset of T_P is determined by its cardinality. Hence the Q^k-probability of any subset A of X^k can be calculated by combinatorial counting arguments, looking at the intersections of A with the various sets T_P separately. In doing so, it will be relevant that the number of different types in X^k is much smaller than the number of sequences x ∈ X^k.

LEMMA 2.2 (Type counting) The number of different types of sequences in X^k is less than (k + 1)^|X|.

Proof For every a ∈ X, N(a|x) can take k + 1 different values.

The next lemma explains the role of entropy from a combinatorial point of view, via the asymptotics of a multinomial coefficient.
LEMMA 2.3 For any type P of sequences in X^k

(k + 1)^{−|X|} exp[kH(P)] ≤ |T_P| ≤ exp[kH(P)].

Proof Since (2.1) implies

P^k(x) = exp[−kH(P)] if x ∈ T_P,

we have

|T_P| = P^k(T_P) exp[kH(P)] ≤ exp[kH(P)].

Hence it is enough to prove that

P^k(T_P) ≥ (k + 1)^{−|X|}.

This will follow by the Type counting lemma if we show that the P^k-probability of T_P̂ is maximized for P̂ = P. By (2.1) we have

P^k(T_P̂) = |T_P̂| · ∏_{a∈X} P(a)^{k P̂(a)} = (k! / ∏_{a∈X} (k P̂(a))!) ∏_{a∈X} P(a)^{k P̂(a)}

for every type P̂ of sequences in X^k. It follows that

P^k(T_P̂) / P^k(T_P) = ∏_{a∈X} ((kP(a))! / (k P̂(a))!) · P(a)^{k(P̂(a)−P(a))}.

Applying the obvious inequality n!/m! ≤ n^{n−m}, this gives

P^k(T_P̂) / P^k(T_P) ≤ ∏_{a∈X} k^{k(P(a)−P̂(a))} = 1.

If X and Y are two finite sets, the joint type of a pair of sequences x ∈ X^k and y ∈ Y^k is defined as the type of the sequence {(x_i, y_i)}_{i=1}^k ∈ (X × Y)^k. In other words, it is the distribution P_{x,y} on X × Y defined by

P_{x,y}(a, b) ≜ (1/k) N(a, b|x, y) for every a ∈ X, b ∈ Y.

Joint types will often be given in terms of the type of x and a stochastic matrix V : X → Y such that

P_{x,y}(a, b) = P_x(a) V(b|a) for every a ∈ X, b ∈ Y.   (2.2)

Note that the joint type P_{x,y} uniquely determines V(b|a) for those a ∈ X which do occur in the sequence x. For conditional probabilities of sequences y ∈ Y^k, given a sequence x ∈ X^k, the matrix V of (2.2) will play the same role as the type of y does for unconditional probabilities.
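As an illustrative aside (not part of the original text), the type formalism is easy to experiment with numerically. The Python sketch below, with an arbitrary binary alphabet, block length and sample sequence, computes the type of a sequence, counts the distinct types in X^k against the bound of Lemma 2.2, and checks the bounds of Lemma 2.3 by exhaustive enumeration; exponentials and logarithms are taken to base 2.

```python
import math
from collections import Counter
from itertools import product

def type_of(x):
    # Empirical distribution P_x(a) = N(a|x)/k of the sequence x.
    k = len(x)
    return {a: n / k for a, n in Counter(x).items()}

def entropy(p):
    return -sum(v * math.log2(v) for v in p.values() if v > 0)

X, k = ("a", "b"), 10
x = "aaaabbbbbb"                     # a sequence of type (0.4, 0.6)
P = type_of(x)

all_types = {tuple(sorted(type_of("".join(y)).items()))
             for y in product(X, repeat=k)}
print(len(all_types), "<", (k + 1) ** len(X))          # Lemma 2.2

size_TP = sum(type_of("".join(y)) == P for y in product(X, repeat=k))
H = entropy(P)
print((k + 1) ** (-len(X)) * 2 ** (k * H) <= size_TP <= 2 ** (k * H))  # Lemma 2.3
```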
    18 Information measuresin simple coding problems DEFINITION 2.4 We say that y ∈ Yk has conditional type V given x ∈ Xk if N(a, b|x, y) = N(a|x)V (b|a) for every a ∈ X, b ∈ Y. For any given x ∈ Yk and stochastic matrix V : X → Y, the set of sequences y ∈ Yk having conditional type V given x will be called the V-shell of x, denoted by Tk V (x) or simply TV (x). REMARK The conditional type of y given x is not uniquely determined if some a ∈ X do not occur in x. Still, the set TV (x) containing y is unique. Note that conditional type is a generalization of types. In fact, if all the components of ➞ 2.3 the sequence x are equal (say x) then the V -shell of x coincides with the set of sequences of type V (·|x) in Yk. In order to formulate the basic size and probability estimates for V -shells, it will be convenient to introduce some notations. The average of the entropies of the rows of a stochastic matrix V : X → Y with respect to a distribution P on X will be denoted by H(V |P) = x∈X P(x)H(V (·|x)). (2.3) The analogous average of the informational divergences of the corresponding rows of two stochastic matrices V : X → Y and W : X → Y will be denoted by D(V W|P) = x∈X P(x)D(V (·|x)W(·|x)). (2.4) Note that H(V |P) is the conditional entropy H(Y|X) of RVs X and Y such that X has distribution P and Y has conditional distribution V given X. The quantity D(V W|P) is called the conditional informational divergence. A counterpart of Lemma 2.3 for V -shells is LEMMA 2.5 For every x ∈ Xk and stochastic matrix V : X → Y such that TV (x) is non-void, we have (k + 1)−|X||Y| exp [kH(V |Px)] |TV (x)| exp [kH(V |Px)]. Proof This is an easy consequence of Lemma 2.2. In fact, |TV (x)| depends on x only through the type of x. Hence we may assume that x is the juxtaposition of sequences xa, a ∈ X, where xa consists of N(a|x) identical elements a. In this case TV (x) is the Cartesian product of the sets of sequences of type V (·|a) in YN(a|x), with a running over those elements of X which occur in x. Thus Lemma 2.3 gives a∈X (N(a|x) + 1)−|Y| exp [N(a|x)H(V (·|a))] |TV (x)| a∈X exp [N(a|x)H(V (·|a))], whence the assertion follows by (2.3).
    Types and typicalsequences 19 LEMMA 2.6 For every type P of sequences in Xk and distribution Q on X Qk (x) = exp [−k(D(PQ) + H(P))] if x ∈ TP, (2.5) (k + 1)−|X| exp [−kD(PQ)] Qk (TP) exp [−kD(PQ)]. (2.6) Similarly, for every x ∈ Xk and stochastic matrices V : X → Y, W : X → Y such that TV (x) is non-void, Wk (y|x) = exp [−k(D(V W|Px) + H(V |Px))] if y ∈ TV (x), (2.7) (k + 1)−|XY| exp [−kD(V W|Px)] Wk (TV (x)|x) exp [−kD(V W|Px)]. (2.8) Proof Equation (2.5) is just a rewriting of (2.1). Similarly, (2.7) is a rewriting of the identity Wk (y|x) = a∈X, b∈Y W(b|a)N(a,b|x,y) . The remaining assertions now follow from Lemmas 2.3 and 2.5. The quantity D(PQ) + H(P) = − x∈X P(x) log Q(x) appearing in (2.5) is sometimes called inaccuracy. For Q = P, the Qk-probability of the set Tk P is exponentially small (for large k); cf. Lemma 2.6. It can be seen that even Pk(Tk P) → 0 as k → ∞. Thus sets of large ➞ 2.2 probability must contain sequences of different types. Dealing with such sets, the con- tinuity of the entropy function plays a relevant role. The next lemma gives more precise information on this continuity. The variation distance of two distributions P and Q on X is d(P, Q) x∈X |P(x) − Q(x)|. (Some authors use the term for the half of this.) LEMMA 2.7 If d(P, Q) = 1/2 then |H(P) − H(Q)| − log |X| . For a sharpening of this lemma, see Problem 3.10. Proof Write ϑ(x) |P(x) − Q(x)|. Since f (t) −t log t is concave and f (0) = f (1) = 0, we have for every 0 t 1 − τ, 0 τ 1/2, | f (t) − f (t + τ)| max( f (τ), f (1 − τ)) = −τ log τ.
Hence for Θ ≜ d(P, Q) ≤ 1/2

|H(P) − H(Q)| ≤ ∑_{x∈X} |f(P(x)) − f(Q(x))| ≤ −∑_{x∈X} ϑ(x) log ϑ(x) = −Θ ∑_{x∈X} (ϑ(x)/Θ) log (ϑ(x)/Θ) − Θ log Θ ≤ Θ log |X| − Θ log Θ,

where the last step follows from Corollary 1.1.

DEFINITION 2.8 For any distribution P on X, a sequence x ∈ X^k is called P-typical with constant δ if

|(1/k) N(a|x) − P(a)| ≤ δ for every a ∈ X

and, in addition, no a ∈ X with P(a) = 0 occurs in x. The set of such sequences will be denoted by T^k_[P]δ or simply T_[P]δ. If X is a RV with values in X, we refer to P_X-typical sequences as X-typical, and write T^k_[X]δ or T_[X]δ for T^k_[P_X]δ.

REMARK T^k_[P]δ is the union of the sets T^k_P̂ for those types P̂ of sequences in X^k which satisfy |P̂(a) − P(a)| ≤ δ for every a ∈ X and P̂(a) = 0 whenever P(a) = 0.

DEFINITION 2.9 For a stochastic matrix W : X → Y, a sequence y ∈ Y^k is W-typical under the condition x ∈ X^k (or W-generated by the sequence x ∈ X^k) with constant δ if

|(1/k) N(a, b|x, y) − (1/k) N(a|x) W(b|a)| ≤ δ for every a ∈ X, b ∈ Y,

and, in addition, N(a, b|x, y) = 0 whenever W(b|a) = 0. The set of such sequences y will be denoted by T^k_[W]δ(x) or simply by T_[W]δ(x). Further, if X and Y are RVs with values in X resp. Y and P_{Y|X} = W, then we shall speak of Y|X-typical or Y|X-generated sequences and write T^k_[Y|X]δ(x) or T_[Y|X]δ(x) for T^k_[W]δ(x).

Sequences Y|X-generated by an x ∈ X^k are defined only if the condition P_{Y|X} = W uniquely determines W(·|a) for a ∈ X with N(a|x) > 0, that is, if no a ∈ X with P_X(a) = 0 occurs in the sequence x; this automatically holds if x is X-typical. The set T^k_[XY]δ of (X, Y)-typical pairs (x, y) ∈ X^k × Y^k is defined applying Definition 2.8 to (X, Y) in the role of X. When the pair (x, y) is typical, we say that x and y are jointly typical.

LEMMA 2.10 If x ∈ T^k_[X]δ and y ∈ T^k_[Y|X]δ′(x) then (x, y) ∈ T^k_[XY]δ+δ′ and, consequently, y ∈ T^k_[Y]δ″ for δ″ ≜ (δ + δ′)|X|.
For reasons which will be obvious from Lemmas 2.12 and 2.13, typical sequences will be used with δ depending on k such that

δ_k → 0, √k · δ_k → ∞ as k → ∞.   (2.9)

Throughout this book, we adopt the following convention.

CONVENTION 2.11 (Delta-convention) To every set X resp. ordered pair of sets (X, Y) there is given a sequence {δ_k}_{k=1}^∞ satisfying (2.9). Typical sequences are understood with these δ_k. The sequences {δ_k} are considered as fixed, and in all assertions dependence on them will be suppressed. Accordingly, the constant δ will be omitted from the notation, i.e., we shall write T^k_[P], T^k_[W](x), etc.

In most applications, some simple relations between these sequences {δ_k} will also be needed. In particular, whenever we need that typical sequences should generate typical ones, we assume that the corresponding δ_k are chosen according to Lemma 2.10.

LEMMA 2.12 There exists a sequence ε_k → 0 depending only on |X| and |Y| (see the delta-convention) so that for every distribution P on X and stochastic matrix W : X → Y

P^k(T^k_[P]) ≥ 1 − ε_k,  W^k(T^k_[W](x)|x) ≥ 1 − ε_k for every x ∈ X^k.

REMARK More explicitly,

P^k(T^k_[P]δ) ≥ 1 − |X|/(4kδ²),  W^k(T^k_[W]δ(x)|x) ≥ 1 − |X||Y|/(4kδ²)

for every δ > 0, and here the terms subtracted from 1 could be replaced even by 2|X| e^{−2kδ²} resp. 2|X||Y| e^{−2kδ²}.

Proof It suffices to prove the inequalities of the Remark. Clearly, the second inequality implies the first one as a special case (choose in the second inequality a one-point set for X). Now if x = x_1 . . . x_k, let Y_1, Y_2, . . ., Y_k be independent RVs with distributions P_{Y_i} = W(·|x_i). Then the RV N(a, b|x, Y^k) has binomial distribution with expectation N(a|x)W(b|a) and variance

N(a|x) W(b|a)(1 − W(b|a)) ≤ (1/4) N(a|x) ≤ k/4.

Thus by Chebyshev's inequality

Pr{ |N(a, b|x, Y^k) − N(a|x)W(b|a)| ≥ kδ } ≤ 1/(4kδ²) for every a ∈ X, b ∈ Y.

Hence the inequality with 1 − |X||Y|/(4kδ²) follows. The claimed sharper bound is obtained similarly, employing Hoeffding's inequality (see Problem 3.18(b)) instead of Chebyshev's.
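The inequality in the Remark to Lemma 2.12 can be checked by simulation. The following Python sketch is an illustrative addition; the distribution, block length, constant δ and number of trials are arbitrary choices. It estimates P^k(T^k_[P]δ) by Monte Carlo and compares the estimate with the bound 1 − |X|/(4kδ²).

```python
import random
from collections import Counter

def is_typical(x, P, delta):
    # P-typical with constant delta in the sense of Definition 2.8.
    k = len(x)
    counts = Counter(x)
    if any(P.get(a, 0) == 0 for a in counts):   # a zero-probability symbol occurs
        return False
    return all(abs(counts.get(a, 0) / k - P[a]) <= delta for a in P)

P = {"a": 0.5, "b": 0.3, "c": 0.2}
k, delta, trials = 2000, 0.05, 2000
symbols, weights = zip(*P.items())
hits = sum(is_typical(random.choices(symbols, weights, k=k), P, delta)
           for _ in range(trials))
print(hits / trials, ">=", 1 - len(P) / (4 * k * delta ** 2))
```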
    22 Information measuresin simple coding problems LEMMA 2.13 There exists a sequence εk → 0 depending only on |X| and |Y| (see the delta-convention) so that for every distribution P on X and stochastic matrix W : X → Y 1 k log |Tk [P]| − H(P) εk and 1 k log |Tk [W](x)| − H(W|P) εk for every x ∈ Tk [P]. Proof The first assertion immediately follows from Lemma 2.3 and the uniform con- tinuity of the entropy function (Lemma 2.7). The second assertion, containing the first one as a special case, follows similarly from Lemmas 2.5 and 2.7. To be formal, observe that, by the type counting lemma, Tk [W](x) is the union of at most (k + 1)|XY| disjoint V -shells TV (x). By Definitions 2.4 and 2.9, all the underlying V satisfy |Px(a)V (b|a) − Px(a)W(b|a)| δ k for every a ∈ X, b ∈ Y, (2.10) where {δ k} is the sequence corresponding to the pair of sets X, Y by the delta-convention. By (2.10) and Lemma 2.7, the entropies of the joint distributions on X × Y determined by Px and V resp. by Px and W differ by at most −|XY|δ k log δ k (if |XY|δ k 1/2) and thus also |H(V |Px) − H(W|Px)| −|XY|δ k log δ k. On account of Lemma 2.5, it follows that (k + 1)−|XY| exp [k(H(W|Px) + |XY|δ k log δ k)] |Tk [W](x)| (k + 1)|XY| exp [k(H(W|Px) − |XY|δ k log δ k)]. (2.11) Finally, since x is P-typical, i.e., |Px(a) − P(a)| δk for every a ∈ X, we have by Corollary 1.1 |H(W|Px) − H(W|P)| δklog|Y|. Substituting this into (2.11), the assertion follows. The last basic lemma of this chapter asserts that no “large probability set” can be substantially smaller than T[P] resp.T[W](x). LEMMA 2.14 Given 0 η 1, there exists a sequence εk → 0 depending only on η, |X| and |Y| such that (i) if A ⊂ Xk, Pk(A) η then 1 k log|A| H(P) − εk; (ii) if B ⊂ Yk, Wk(B|x) η then 1 k log|B| H(W|Px) − εk.
    Types and typicalsequences 23 COROLLARY 2.14 There exists a sequence ε k → 0 depending only on η, |X|, |Y| (see the delta-convention) such that if B ⊂ Yk and Wk(B|x) η for some x ∈ T[P] then 1 k log|B| H(W|P) − ε k. Proof It is sufficient to prove (ii). By Lemma 2.12, the condition Wk(B|x) η implies Wk (B ∩ T[W](x)|x) η 2 for k k0(η, |X|, |Y|). Recall that T[W](x) is the union of disjoint V -shells TV(x) satis- fying (2.10); see the proof of Lemma 2.13. Since Wk(y|x) is constant within a V -shell of x, it follows that |B ∩ TV (x)| η 2 |TV (x)| for at least one V : X → Y satisfying (2.10). Now the proof can be completed using Lemmas 2.5 and 2.7 just as in the proof of the previous lemma. Observe that the preceding three lemmas contain a proof of Theorem 1.1. Namely, ➞ 2.5 the fact that about kH(P) binary digits are sufficient for encoding k-length messages of a DMS with generic distribution P, is a consequence of Lemmas 2.12 and 2.13, while the necessity of this many binary digits follows from Lemma 2.14. Most coding theorems in this book will be proved using typical sequences in a similar manner. The merging of several nearby types has the advantage of facilitating computations. When dealing with the more refined questions of the speed of convergence of error proba- bilities, however, the method of typical sequences will become inappropriate. In such problems, we shall have to consider each type separately, relying on the first part of this chapter. Although this will not occur until Chapter 9, as an immediate illustration of the more subtle method we now refine the basic source coding result, Theorem 1.1. THEOREM 2.15 For any finite set X and R 0 there exists a sequence of k-to-nk binary block codes ( fk, ϕk) with nk k → R such that for every DMS with alphabet X and arbitrary generic distribution P, the probability of error satisfies e( fk, ϕk) exp −k inf Q:H(Q)R D(Q||P) − ηk (2.12) with ηk log(k + 1) k |X|.
    24 Information measuresin simple coding problems This result is asymptotically sharp for every particular DMS, in the sense that for any sequence of k-to-nk binary block codes, nk/k → R implies lim k→∞ 1 k log e( fk, ϕk) − inf Q:H(Q)R D(Q||P). (2.13) The infimum in (2.12) and (2.13) is finite iff R log s(P), and then it equals the minimum subject to H(Q) R. Here s(P) denotes the size of the support of P, that is, the number of those a ∈ X for which P(a) 0. REMARK This result sharpens Theorem 1.1 in two ways. First, for a DMS with generic distribution P, and R H(P), it gives the precise asymptotics, in the expo- nential sense, of the probability of error of the best codes with nk/k → R (the result is also true, but uninteresting, for R H(P)). Second, it shows that this optimal per- formance can be achieved by codes not depending on the generic distribution of the source. The remaining assertion of Theorem 1.1, namely that for nk/k → R H(P) the probability of error tends to 1, can be sharpened similarly. ➞ 2.6 Proof of Theorem 2.15. Write Ak Q:H(Q)R TQ. Then, by Lemmas 2.2 and 2.3, |Ak| (k + 1)|X| exp(kR); (2.14) further, by Lemmas 2.2 and 2.6, Pk (Xk − Ak) (k + 1)|X| exp − k min Q:H(Q)R D(Q||P) . (2.15) Let us encode the sequences in Ak in a one-to-one way and all others by a fixed codeword, say. Equation (2.14) shows that this can be done with binary codewords of length nk satisfying nk/k → R. For the resulting code, (2.15) gives (2.12), with ηk log(k + 1) k |X|. The last assertion of Theorem 2.15 is obvious, and implies that it suffices to prove (2.13) for R log s(P). The number of sequences in Xk correctly reproduced by a k-to-nk binary block code is at most 2nk . Thus, by Lemma 2.3, for every type Q of sequences in Xk satisfying (k + 1)−|X| exp [kH(Q)] 2nk+1 , (2.16) at least half of the sequences in TQ will not be reproduced correctly. On account of Lemma 2.6, it follows that e( fk, ϕk) 1 2 (k + 1)−|X| exp [−kD(Q||P)]
    Types and typicalsequences 25 for every type Q satisfying (2.16). Hence e( fk, ϕk) 1 2 (k + 1)−|X| exp −k min Q:H(Q)R+εk D(Q||P) , where Q runs over types of sequences in Xk and εk nk k − R + 1 k + log (k + 1) k |X|. Using that R log s(P), for large k the last minimum changes little if Q is not ➞ 2.7 restricted to types, and εk is omitted. Discussion The simple combinatorial lemmas concerning types are the basis of the proof of most coding theorems treated in this book. Merging “nearby” types, i.e., the formalism of typical sequences, has the advantage of shortening computations. In the literature, there are several concepts of typical sequences. Often one merges more types than we have done in Definition 2.8; in particular, the entropy-typical sequences of Problem 2.5 are widely used. The latter kind of typicality has the advantage that it easily generalizes to models with memory and with abstract alphabets. For the discrete memoryless systems treated in this book, the adopted concept of typicality often leads to stronger results. Still, the formalism of typical sequences has a limited scope, for it does not allow eval- uation of convergence rates of error probabilities. This is illustrated by the fact that typical sequences led to a simple proof of Theorem 1.1 while to prove Theorem 2.15 types had to be considered individually. The technique of estimating probabilities without merging types is also more appro- priate for the purpose of deriving universal coding theorems. Intuitively, universal coding means that codes have to be constructed in complete ignorance of the proba- bility distributions governing the system; then the performance of the code is evaluated by the whole spectrum of its performance indices for the various possible distributions. Theorem 2.15 is the first universal coding result in this book. It is clear that two codes are not necessarily comparable from the point of view of universal coding. In view of this, it is somewhat surprising that for the class of DMSs with a fixed alphabet X there exist codes universally optimal in the sense that for every DMS they have asymptotically the same probability of error as the best code designed for that particular DMS. Problems 2.1. Show that the exact number of types of sequences in Xk equals k + |X| − 1 |X| − 1 . Draw the conclusion that the lower bounds in Lemmas 2.3, 2.5 and 2.6 can be sharpened replacing the power of (k + 1) by this number.
    26 Information measuresin simple coding problems 2.2. Prove that the size of Tk P is of order of magnitude k−(s(P)−1)/2 exp{kH(P)}, where s(P) is the number of elements a ∈ X with P(a) 0. More precisely, show that log |Tk P| = kH(P) − s(P) − 1 2 log (2πk) − 1 2 a:P(a)0 logP(a) − ϑ(k, P) 12 ln 2 s(P), where 0 ϑ(k, P) 1. Hint Use Robbins’ sharpening of Stirling’s formula: √ 2πnn+ 1 2 e−n+ 1 12(n+1) n! √ 2πnn+ 1 2 e−n+ 1 12n (see e.g. Feller (1968), p. 54), noting that P(a) 1/k whenever P(a) 0. 2.3. Clearly, every y ∈ Yk in the V -shell of an x ∈ Xk has the same type Q where Q(b) a∈X Px(a)V (b|a). (a) Show that TV(x) = TQ even if all the rows of the matrix V are equal to Q (unless x consists of identical elements). (b) Show that if Px = P then (k + 1)−|X||Y| exp [−kI (P, V )] |TV (x)| |TQ| (k + 1)|Y| exp [−kI (P, V )], where I (P, V ) H(Q) − H(V |P) is the mutual information of RVs X and Y such that PX = P and PY|X = V . In particular, if all rows of V are equal to Q then the size of TV(x) is not “exponentially smaller” than that of TQ. 2.4. Prove that the first resp. second condition of (2.9) is necessary for Lemmas 2.13 resp. 2.12 to hold. 2.5. (Entropy-typical sequences) Let us say that a sequence x ∈ Xk is entropy-P- typical with constant δ if − 1 k log Pk (x) − H(P) δ; further, y ∈ Yk is entropy-W-typical under the condition x if − 1 k logWk (y|x) − H(W|Px) δ. (a) Check that entropy-typical sequences also satisfy the assertions of Lemmas 2.12 and 2.13 (if δ = δk is chosen as in the delta-convention). Hint These properties were implicitly used in the proofs of Theorems 1.1 and 1.2. (b) Show that typical sequences – with constants chosen according to the delta- convention – are also entropy-typical, with some constants δ k = cP · δk resp. δ k = cW · δk. On the other hand, entropy-typical sequences are not necessar- ily typical with constants of the same order of magnitude.
    Another Random Documenton Scribd Without Any Related Topics
  • 56.
    knew that onlyher bodily presence had been removed. She still lived in our midst—we heard the ring of her voice in the words we read, in the words our hearts told us she would say; we even heard the ring of her laugh! And to-day you may be sure that the woman-pioneer who had the faith to plant the first college for women in America, lives by that faith, not only in her own Mount Holyoke, but in the larger lives of all the women who have profited by her labors.
  • 57.
    THE PRINCESS OFWELLESLEY: ALICE FREEMAN PALMER Our echoes roll from soul to soul, And grow forever and forever. Tennyson.
  • 58.
    T THE PRINCESS OFWELLESLEY HIS is the story of a princess of our own time and our own America—a princess who, while little more than a girl herself, was chosen to rule a kingdom of girls. It is a little like the story of Tennyson's Princess, with her woman's kingdom, and very much like the happy, old-fashioned fairy-tale. We have come to think it is only in fairy-tales that a golden destiny finds out the true, golden heart, and, even though she masquerades as a goose-girl, discovers the kingly child and brings her to a waiting throne. We are tempted to believe that the chance of birth and the gifts of wealth are the things that spell opportunity and success. But this princess was born in a little farm-house, to a daily round of hard work and plain living. That it was also a life of high thinking and rich enjoyment of what each day brought, proved her indeed a kingly child. Give me health and a day, and I will make the pomp of emperors ridiculous! said the sage of Concord. So it was with little Alice Freeman. As she picked wild strawberries on the hills, and climbed the apple-tree to lie for a blissful minute in a nest of swaying blossoms under the blue sky, she was, as she said, happy all over. The trappings of royalty can add nothing to one who knows how to be royally happy in gingham. But Alice was not always following the pasture path to her friendly brook, or running across the fields with the calling wind, or dancing with her shadow in the barn-yard, where even the prosy hens stopped pecking corn for a minute to watch. She had work to do for Mother. When she was only four, she could dry the dishes
  • 59.
    without dropping one;and when she was six, she could be trusted to keep the three toddlers younger than herself out of mischief. My little daughter is learning to be a real little mother, said Mrs. Freeman, as she went about her work of churning and baking without an anxious thought. Alice Freeman Palmer It was Sister Alice who pointed out the robin's nest, and found funny turtles and baby toads to play with. She took the little brood with her to hunt eggs in the barn and to see the ducks sail around like a fleet of boats on the pond. When Ella and Fred were wakened by a fearsome noise at night, they crept up close to their little
  • 60.
    mother, who toldthem a story about the funny screech-owl in its hollow-tree home. It is the ogre of mice and bats, but not of little boys and girls, she said. It sounds funny now, Alice, they whispered. It's all right when we can touch you. When Alice was seven a change came in the home. The father and mother had some serious talks, and then it was decided that Father should go away for a time, for two years, to study to be a doctor. It is hard to be chained to one kind of life when all the time you are sure that you have powers and possibilities that have never had a chance to come out in the open, she heard her father say one evening. I have always wanted to be a doctor; I can never be more than a half-hearted farmer. You must go to Albany now, James, said the dauntless wife. I can manage the farm until you get through your course at the medical college; and then, when you are doing work into which you can put your whole heart, a better time must come for all of us. How can you possibly get along? he asked in amazement. How can I leave you for two years to be a farmer, and father and mother, too? There is a little bank here, she said, taking down a jar from a high shelf in the cupboard and jingling its contents merrily. I have been saving bit by bit for just this sort of thing. And Alice will help me, she added, smiling at the child who had been standing near looking from father to mother in wide-eyed wonder. You will be the little mother while I take father's place for a time, won't you, Alice? It will be cruelly hard on you all, said the father, soberly. I cannot make it seem right. Think how much good you can do afterward, urged his wife. The time will go very quickly when we are all thinking of that. It is
not hard to endure for a little for the sake of 'a gude time coming'—a better time not only for us, but for many besides. For I know you will be the true sort of doctor, James. Alice never quite knew how they did manage during those two years, but she was quite sure that work done for the sake of a good to come is all joy. I owe much of what I am to my milkmaid days, she said. She was always sorry for children who do not grow up with the sights and sounds of the country. One is very near to all the simple, real things of life on a farm, she used to say. There is a dewy freshness about the early out-of-door experiences, and a warm wholesomeness about tasks that are a part of the common lot. A country child develops, too, a responsibility—a power to do and to contrive—that the city child, who sees everything come ready to hand from a near-by store, cannot possibly gain. However much some of my friends may deplore my own early struggle with poverty and hard work, I can heartily echo George Eliot's boast:

But were another childhood-world my share,
I would be born a little sister there.

When Alice was ten years old, the family moved from the farm to the village of Windsor, where Dr. Freeman entered upon his life as a doctor, and where Alice's real education began. From the time she was four she had, for varying periods, sat on a bench in the district school, but for the most part she had taught herself. At Windsor Academy she had the advantage of a school of more than average efficiency. Words do not tell what this old school and place meant to me as a girl, she said years afterward. Here we gathered abundant Greek, Latin, French, and mathematics; here we were taught truthfulness, to be upright and honorable; here we had our first
loves, our first ambitions, our first dreams, and some of our first disappointments. We owe a large debt to Windsor Academy for the solid groundwork of education that it laid. More important than the excellent curriculum and wholesome associations, however, was the influence of a friendship with one of the teachers, a young Harvard graduate who was supporting himself while preparing for the ministry. He recognized the rare nature and latent powers of the girl of fourteen, and taught her the delights of friendship with Nature and with books, and the joy of a mind trained to see and appreciate. He gave her an understanding of herself, and aroused the ambition, which grew into a fixed resolve, to go to college. But more than all, he taught her the value of personal influence. It is people that count, she used to say. The truth and beauty that are locked up in books and in nature, to which only a few have the key, begin really to live when they are made over into human character. Disembodied ideas may mean little or nothing; it is when they are 'made flesh' that they can speak to our hearts and minds. As Alice drove about with her father when he went to see his patients and saw how this true doctor of the old school was a physician to the mind as well as the body of those who turned to him for help, she came to a further realization of the truth: It is people that count. It must be very depressing to have to associate with bodies and their ills all the time, she ventured one day when her father seemed more than usually preoccupied. She never forgot the light that shone in his eyes as he turned and looked at her. We can't begin to minister to the body until we understand that spirit is all, he said. What we are pleased to call body is but one expression—and a most marvelous expression—of the hidden life
that impels
All thinking things, all objects of all thought,
And rolls through all things.

It seemed to Alice that this might be a favorable time to broach the subject of college. He looked at her in utter amazement; few girls thought of wanting more than a secondary education in those days, and there were still fewer opportunities for them. Why, daughter, he exclaimed, a little more Latin and mathematics won't make you a better home-maker! Why should you set your heart on this thing? I must go, Father, she answered steadily. It is not a sudden notion; I have realized for a long time that I cannot live my life—the life that I feel I have it within me to live—without this training. I want to be a teacher—the best kind of a teacher—just as you wanted to be a doctor. But, my dear child, he protested, much troubled, it will be as much as we can manage to see one of you through college, and that one should be Fred, who will have a family to look out for one of these days. If you let me have this chance, Father, said Alice, earnestly, I'll promise that you will never regret it. I'll help to give Fred his chance, and see that the girls have the thing they want as well. In the end Alice had her way. It seemed as if the strength of her single-hearted longing had power to compel a reluctant fate. In June, 1872, when but a little over seventeen, she went to Ann Arbor to take the entrance examinations for the University of Michigan, a careful study of catalogues having convinced her that the standard of work was higher there than in any college then open to women. A disappointment met her at the outset. Her training at Windsor, good as it was, did not prepare her for the university requirements. Conditions loomed mountain high, and the examiners
recommended that she spend another year in preparation. Her intelligence and character had won the interest of President Angell, however, and he asked that she be granted a six-weeks' trial. His confidence in her was justified; for she not only proved her ability to keep up with her class, but steadily persevered in her double task until all conditions were removed. The college years were a glory instead of a grind, in spite of the ever-pressing necessity for strict economy in the use of time and money. Her sense of values—the ability to see large things large and small things small, which has been called the best measure of education,—showed a wonderful harmony of powers. While the mind was being stored with knowledge and the intellect trained to clear, orderly thinking, there was never a too-muchness in this direction that meant a not-enoughness in the realm of human relationships. Always she realized that it is people that count, and her supreme test of education as of life was its consecrated serviceableness. President Angell in writing of her said: One of her most striking characteristics in college was her warm and demonstrative sympathy with her circle of friends. Her soul seemed bubbling over with joy, which she wished to share with the other girls. While she was therefore in the most friendly relations with all those girls then in college, she was the radiant center of a considerable group whose tastes were congenial with her own. Without assuming or striving for leadership, she could not but be to a certain degree a leader among these, some of whom have attained positions only less conspicuous for usefulness than her own. Wherever she went, her genial, outgoing spirit seemed to carry with her an atmosphere of cheerfulness and joy. In the middle of her junior year, news came from her father of a more than usual financial stress, owing to a flood along the
Susquehanna, which had swept away his hope of present gain from a promising stretch of woodland. It seemed clear to Alice that the time had come when she must make her way alone. Through the recommendation of President Angell she secured a position as teacher of Latin and Greek in the High School at Ottawa, Illinois, where she taught for five months, receiving enough money to carry her through the remainder of her college course. The omitted junior work was made up partly during the summer vacation and partly in connection with the studies of the senior year. An extract from a letter home will tell how the busy days went: This is the first day of vacation. I have been so busy this year that it seems good to get a change, even though I do keep right on here at work. For some time I have been giving a young man lessons in Greek every Saturday. I have had two junior speeches already, and there are still more. Several girls from Flint tried to have me go home with them for the vacation, but I made up my mind to stay and do what I could for myself and the other people here. A young Mr. M. is going to recite to me every day in Virgil; so with teaching and all the rest I sha'n't have time to be homesick, though it will seem rather lonely when the other girls are gone and I don't hear the college bell for two weeks. Miss Freeman's early teaching showed the vitalizing spirit that marked all of her relations with people. She had a way of making you feel 'all dipped in sunshine,' one of her girls said. Everything she taught seemed a part of herself, another explained. It wasn't just something in a book that she had to teach and you had to learn. She made every page of our history seem a part of present life and interests. We saw and felt the things we talked about.
The fame of this young teacher's influence traveled all the way from Michigan, where she was principal of the Saginaw High School, to Massachusetts. Mr. Henry Durant, the founder of Wellesley, asked her to come to the new college as teacher of mathematics. She declined the call, however, and, a year later, a second and more urgent invitation. Her family had removed to Saginaw, where Dr. Freeman was slowly building up a practice, and it would mean leaving a home that needed her. The one brother was now in the university; Ella was soon to be married; and Stella, the youngest, who was most like Alice in temperament and tastes, was looking forward hopefully to college. But at the time when Dr. Freeman was becoming established and the financial outlook began to brighten, the darkest days that the family had ever known were upon them. Stella, the chief joy and hope of them all, fell seriously ill. The little mother loved this starlike girl as her own child, and looked up to her as one who would reach heights her feet could never climb. When she died it seemed to Alice that she had lost the one chance for a perfectly understanding and inspiring comradeship that life offered. At this time a third call came to Wellesley,—as head of the department of history,—and hoping that a new place with new problems would give her a fresh hold on joy, she accepted. Into her college work the young woman of twenty-four put all the power and richness of her radiant personality. She found peace and happiness in untiring effort, and her girls found in her the most inspiring teacher they had ever known. She went to the heart of the history she taught, and she went to the hearts of her pupils. She seemed to care for each of us—to find each as interesting and worth while as if there were no other person in the world, one of her students said. Mr. Durant had longed to find just such a person to build on the foundation he had laid. It was in her first year that he pointed her out to one of the trustees.
Do you see that little dark-eyed girl? She will be the next president of Wellesley, he said. Surely she is much too young and inexperienced for such a responsibility, protested the other, looking at him in amazement. As for the first, it is a fault we easily outgrow, said Mr. Durant, dryly, and as for her inexperience—well, I invite you to visit one of her classes. The next year, on the death of Mr. Durant, she was made acting president of the college, and the year following she inherited the title and honors, as well as the responsibilities and opportunities, of the office. The Princess had come into her kingdom. The election caused a great stir among the students, particularly the irrepressible seniors. It was wonderful and most inspiring that their splendid Miss Freeman, who was the youngest member of the faculty, should have won this honor. Why, she was only a girl like themselves! The time of strict observances and tiresome regulations of every sort was at an end. Miss Freeman seemed to sense the prevailing mood, and, without waiting for a formal assembly, asked the seniors to meet her in her rooms. In they poured, overflowing chairs, tables, and ranging themselves about on the floor in animated, expectant groups. The new head of the college looked at them quietly for a minute before she began to speak. I have sent for you seniors, she said at last seriously, to ask your advice. You may have heard that I have been called to the position of acting president of your college. I am, of course, too young; and the duties are, as you know, too heavy for the strongest to carry alone. If I must manage alone, there is only one course—to decline. It has, however, occurred to me that my seniors might be willing to help by looking after the order of the college and leaving me free for administration. Shall I accept? Shall we work things out together? The hearty response made it clear that the princess was to rule not only by divine right, but also by the glad consent of the
governed. Perhaps it was her youth and charm and the romance of her brilliant success that won for her the affectionate title of The Princess; perhaps it was her undisputed sway in her kingdom of girls. It was said that her radiant, outgoing spirit was felt in the atmosphere of the place and in all the graduates. Her spirit became the Wellesley spirit. What did she do besides turning all of you into an adoring band of Freeman-followers? a Wellesley woman was asked. The reply came without a moment's hesitation: She had the life-giving power of a true creator, one who can entertain a vision of the ideal, and then work patiently bit by bit to 'carve it in the marble real.' She built the Wellesley we all know and love, making it practical, constructive, fine, generous, human, spiritual. For six years the Princess of Wellesley ruled her kingdom wisely. She raised the standard of work, enlisted the interest and support of those in a position to help, added to the buildings and equipment, and won the enthusiastic cooperation of students, faculty, and public. Then, one day, she voluntarily stepped down from her throne, leaving others to go on with the work she had begun. She married Professor George Herbert Palmer of Harvard, and (quite in the manner of the fairy-tale) lived happily ever after. What a disappointment! some of her friends said. That a woman of such unusual powers and gifts should deliberately leave a place of large usefulness and influence to shut herself up in the concerns of a single home! There is nothing better than the making of a true home, said Alice Freeman Palmer. I shall not be shut away from the concerns of others, but more truly a part of them. 'For love is fellow-service,' I believe. The home near Harvard Yard was soon felt to be the most free and perfect expression of her generous nature. Its happiness made all life seem happier. Shy undergraduates and absorbed students who had withdrawn overmuch within themselves and their pet
problems found there a thaw after their winter of discontent. Wellesley girls—even in those days before automobiles—did not feel fifteen miles too great a distance to go for a cup of tea and a half-hour by the fire.

[Illustrations: College Hall, destroyed by fire in 1914; Tower Court, which stands on the site of College Hall]

Many were surprised that Mrs. Palmer never seemed worn by the unstinted giving of herself to the demands of others on her time and sympathy. The reason was that their interests were her interests. Her spirit was indeed outgoing; there was no wall
hedging in a certain number of things and people as hers, with the rest of the world outside. As we have seen, people counted with her supremely; and the ideas which moved her were those which she found embodied in the joys and sorrows of human hearts. Mrs. Palmer wrote of her days at this time: I don't know what will happen if life keeps on growing so much better and brighter each year. How does your cup manage to hold so much? Mine is running over, and I keep getting larger cups; but I can't contain all my blessings and gladness. We are both so well and busy that the days are never half long enough. Life held, indeed, a full measure of opportunities for service. Wellesley claimed her as a member of its executive committee, and other colleges sought her counsel. When Chicago University was founded, she was induced to serve as its Dean of Women until the opportunities for girls there were wisely established. She worked energetically raising funds for Radcliffe and her own Wellesley. Throughout the country her wisdom as an educational expert was recognized, and her advice sought in matters of organization and administration. For several years, as a member of the Massachusetts State Board of Education, she worked early and late to improve the efficiency and influence of the normal schools. She was a public servant who brought into all her contact with groups and masses of people the simple directness and intimate charm that marked her touch with individuals. How is it that you are able to do so much more than other people? asked a tired, nervous woman, who stopped Mrs. Palmer for a word at the close of one of her lectures. Because, she answered, with the sudden gleam of a smile, I haven't any nerves nor any conscience, and my husband says I haven't any backbone.
It was true that she never worried. She had early learned to live one day at a time, without looking before and after. And nobody knew better than Mrs. Palmer the renewing power of joy. She could romp with some of her very small friends in the half-hour before an important meeting; go for a long walk or ride along country lanes when a vexing problem confronted her; or spend a quiet evening by the fire reading aloud from one of her favorite poets at the end of a busy day. For fifteen years Mrs. Palmer lived this life of joyful, untiring service. Then, at the time of her greatest power and usefulness, she died. The news came as a personal loss to thousands. Just as Wellesley had mourned her removal to Cambridge, so a larger world mourned her earthly passing. But her friends soon found that it was impossible to grieve or to feel for a moment that she was dead. The echoes of her life were living echoes in the world of those who knew her. There are many memorials speaking in different places of her work. In the chapel at Wellesley, where it seems to gather at every hour a golden glory of light, is the lovely transparent marble by Daniel Chester French, eternally bearing witness to the meaning of her influence with her girls. In the tower at Chicago the chimes make music, joyfully to recall her labors there. But more lasting than marble or bronze is the living memorial in the hearts and minds made better by her presence. For it is, indeed, people that count, and in the richer lives of many the enkindling spirit of Alice Freeman Palmer still lives.
OUR LADY OF THE RED CROSS: CLARA BARTON

Who gives himself with his alms feeds three,—
Himself, his hungering neighbor, and Me.
The Vision of Sir Launfal.—Lowell.
OUR LADY OF THE RED CROSS

A Christmas baby! Now isn't that the best kind of a Christmas gift for us all? cried Captain Stephen Barton, who took the interesting flannel bundle from the nurse's arms and held it out proudly to the assembled family. No longed-for heir to a waiting kingdom could have received a more royal welcome than did that little girl who appeared at the Barton home in Oxford, Massachusetts, on Christmas Day, 1821. Ten years had passed since a child had come to the comfortable farm-house, and the four big brothers and sisters were very sure that they could not have had a more precious gift than this Christmas baby. No one doubted that she deserved a distinguished name, but it was due to Sister Dorothy, who was a young lady of romantic seventeen and something of a reader, that she was called Clarissa Harlowe, after a well-known heroine of fiction. The name which this heroine of real life actually bore and made famous, however, was Clara Barton; for the Christmas baby proved to be a gift not only to a little group of loving friends, but also to a great nation and to humanity. The sisters and brothers were teachers rather than playmates for Clara, and her education began so early that she had no recollection of the way they led her toddling steps through the beginnings of book-learning. On her first day at school she announced to the amazed teacher who tried to put a primer into her hands that she could spell the artichoke words. The teacher had other surprises besides the discovery that this mite of three was acquainted with three-syllabled lore.
Brother Stephen, who was a wizard with figures, had made the sums with which he covered her slate seem a fascinating sort of play at a period when most infants are content with counting the fingers of one hand. All other interests, however, paled before the stories that her father told her of great men and their splendid deeds. Captain Barton was amused one day at the discovery that his precocious daughter, who always eagerly encored his tales of conquerors and leaders, thought of their greatness in images of quite literal and realistic bigness. A president must, for instance, be as large as a house, and a vice-president as spacious as a barn door at the very least. But these somewhat crude conceptions did not put a check on the epic recitals of the retired officer, who, in the intervals of active service in plowed fields or in pastures where his thoroughbreds grazed with their mettlesome colts, liked to live over the days when he served under Mad Anthony Wayne in the Revolutionary War, and had a share in the thrilling adventures of the Western frontier. Clara was only five years old when Brother David taught her to ride. Learning to ride is just learning a horse, said this daring youth, who was the Buffalo Bill of the surrounding country. How can I learn a horse, David? quavered the child, as the high-spirited animals came whinnying to the pasture bars at her brother's call. Catch hold of his mane, Clara, and just feel the horse a part of yourself—the big half for the time being, said David, as he put her on the back of a colt that was broken only to bit and halter, and, easily springing on his favorite, held the reins of both in one hand, while he steadied the small sister with the other by seizing hold of one excited foot. They went over the fields at a gallop that first day, and soon little Clara and her mount understood each other so well that her riding feats became almost as far-famed as those of her brother. The time came when her skill and confidence on horseback—her power
to feel the animal she rode a part of herself and keep her place in any sort of saddle through night-long gallops—meant the saving of many lives. David taught her many other practical things that helped to make her steady and self-reliant in the face of emergencies. She learned, for instance, to drive a nail straight, and to tie a knot that would hold. Eye and hand were trained to work together with quick decision that made for readiness and efficiency in dealing with a situation, whether it meant the packing of a box, or first-aid measures after an accident on the skating-pond. She was always an outdoor child, with dogs, horses, and ducks for playfellows. The fuzzy ducklings were the best sort of dolls. Sometimes when wild ducks visited the pond and all her waddling favorites began to flap their wings excitedly, it seemed that her young heart felt, too, the call of large, free spaces. The only real fun is to do things, she used to say. She rode after the cows, helped in the milking and churning, and followed her father about, dropping potatoes in their holes or helping weed the garden. Once, when the house was being painted, she begged to be allowed to assist in the work, even learning to grind the pigments and mix the colors. The family was at first amused and then amazed at the persistency of her application as day after day she donned her apron and fell to work. They were not less astonished when she wanted to learn the work of the weavers in her brothers' satinet mills. At first, her mother refused this extraordinary request; but Stephen, who understood the intensity of her craving to do things, took her part; and at the end of her first week at the flying shuttle Clara had the satisfaction of finding that her cloth was passed as first-quality goods. Her career as a weaver was of short duration, however, owing to a fire which destroyed the mills. The young girl was as enthusiastic in play as at work. Whether it was a canter over the fields on Billy while her dog, Button, dashed
along at her side, his curly white tail bobbing ecstatically, or a coast down the rolling hills in winter, she entered into the sport of the moment with her whole heart. When there was no outlet for her superabundant energy, she was genuinely unhappy. Then it was that a self-consciousness and morbid sensitiveness became so evident that it was a source of real concern to her friends. People say that I must have been born brave, said Clara Barton. Why, I seem to remember nothing but terrors in my early days. I was a shrinking little bundle of fears—fears of thunder, fears of strange faces, fears of my strange self. It was only when thought and feeling were merged in the zest of some interesting activity that she lost her painful shyness and found herself. When she was eleven years old she had her first experience as a nurse. A fall which gave David a serious blow on the head, together with the bungling ministrations of doctors, who, when in doubt, had recourse only to the heroic treatment of bleeding and leeches, brought the vigorous young brother to a protracted invalidism. For two years Clara was his constant and devoted attendant. She schooled herself to remain calm, cheerful, and resourceful in the presence of suffering and exacting demands. When others gave way to fatigue or nerves, her wonderful instinct for action kept her, child though she was, at her post. Her sympathy expressed itself in untiring service. In the years that followed her brother's recovery Clara became a real problem to herself and her friends. The old blighting sensitiveness made her school-days restless and unhappy in spite of her alert mind and many interests. At length her mother, at her wit's end because of this baffling, morbid strain in her remarkable daughter, was advised by a man of sane judgment and considerable understanding of child nature, to throw responsibility upon her and give her a school to teach.
It happened, therefore, that when Clara Barton was fifteen she put down her skirts, put up her hair, and entered upon her successful career as a teacher. She liked the children and believed in them, entering enthusiastically into their concerns, and opening the way to new interests. When asked how she managed the discipline of the troublesome ones, she said, The children give no trouble; I never have to discipline at all, quite unconscious of the fact that her vital influence gave her a control that made assertion of authority unnecessary. When the boys found that I was as strong as they were and could teach them something on the playground, they thought that perhaps we might discover together a few other worth-while things in school hours, she said. For eighteen years Clara Barton was a teacher. Always learning herself while teaching others, she decided in 1852 to enter Clinton Liberal Institute in New York as a pupil for graduation, for there was then no college whose doors were open to women. When she had all that the Institute could give her, she looked about for new fields for effort. In Bordentown, New Jersey, she found there was a peculiar need for some one who would bring to her task pioneer zeal as well as the passion for teaching. At that time there were no public schools in the town or, indeed, in the State. The people who pose as respectable are too proud and too prejudiced to send their boys and girls to a free pauper school, and in the meantime all the children run wild, Miss Barton was told. We have tried again and again, said a discouraged young pedagogue. It is impossible to do anything in this place. Give me three months, and I will teach free, said Clara Barton. This was just the sort of challenge she loved. There was something to be done. She began with six unpromising gamins in a dilapidated, empty building. In a month her quarters proved too narrow. Each youngster became an enthusiastic and effectual
advertisement. As always, her success lay in an understanding of her pupils as individuals, and a quickening interest that brought out the latent possibilities of each. The school of six grew in a year to one of six hundred, and the thoroughly converted citizens built an eight-room school-house where Miss Barton remained as principal and teacher until a breakdown of her voice made a complete rest necessary. The weak throat soon made it evident that her teaching days were over; but she found at the same time in Washington, where she had gone for recuperation, a new work. Living is doing, she said. Even while we say there is nothing we can do, we stumble over the opportunities for service that we are passing by in our tear-blinded self-pity. The over-sensitive girl had learned her lesson well. Life offered moment by moment too many chances for action for a single worker to turn aside to bemoan his own particular condition. The retired teacher became a confidential secretary in the office of the Commissioner of Patents. Great confusion existed in the Patent Office at that time because some clerks had betrayed the secrets of certain inventions. Miss Barton was the first woman to be employed in a Government department; and while ably handling the critical situation that called for all her energy and resourcefulness, she had to cope not only with the scarcely veiled enmity of those fellow-workers who were guilty or jealous, but also with the open antagonism of the rank and file of the clerks, who were indignant because a woman had been placed in a position of responsibility and influence. She endured covert slander and deliberate disrespect, letting her character and the quality of her work speak for themselves. They spoke so eloquently that when a change in political control caused her removal, she was before long recalled to straighten out the tangle that had ensued. At the outbreak of the Civil War Miss Barton was, therefore, at the very storm-center.
The early days of the conflict found her binding up the wounds of the Massachusetts boys who had been attacked by a mob while passing through Baltimore, and who for a time were quartered in the Capitol. Some of these recruits were boys from Miss Barton's own town who had been her pupils, and all were dear to her because they were offering their lives for the Union. We find her with other volunteer nurses caring for the injured, feeding groups who gathered about her in the Senate Chamber, and, from the desk of the President of the Senate, reading them the home news from the Worcester papers. Meeting the needs as they presented themselves in that time of general panic and distress, she sent to the Worcester Spy appeals for money and supplies. Other papers took up the work, and soon Miss Barton had to secure space in a large warehouse to hold the provisions that poured in. Not for many days, however, did she remain a steward of supplies. When she met the transports which brought the wounded into the city, her whole nature revolted at the sight of the untold suffering and countless deaths which were resulting from delay in caring for the injured. Her flaming ardor, her rare executive ability, and her tireless persistency won for her the confidence of those in command, and, though it was against all traditions, to say nothing of iron-clad army regulations, she obtained permission to go with her stores of food, bandages, and medicines to the firing-line, where relief might be given on the battle-field at the time of direst need. The girl who had been a bundle of fears had grown into the woman who braved every danger and any suffering to carry help to her fellow-countrymen. People who spoke of her rare initiative and practical judgment had little comprehension of the absolute simplicity and directness of her methods. She managed the sulky, rebellious drivers of her army-wagons, who had little respect for orders that placed a woman in control, in the same way that she had managed children in school. Without relaxing her firmness, she spoke to them courteously, and
called them to share the warm dinner she had prepared and spread out in appetizing fashion. When, after clearing away the dishes, she was sitting alone by the fire, the men returned in an awkward, self-conscious group. We didn't come to get warm, said their spokesman, as she kindly moved to make room for them at the flames, we come to tell you we are ashamed. The truth is we didn't want to come. We know there is fighting ahead, and we've seen enough of that for men who don't carry muskets, only whips; and then we've never seen a train under charge of a woman before, and we couldn't understand it. We've been mean and contrary all day, and you've treated us as if we'd been the general and his staff, and given us the best meal we've had in two years. We want to ask your forgiveness, and we sha'n't trouble you again. She found that a comfortable bed had been arranged for her in her ambulance, a lantern was hanging from the roof, and when next morning she emerged from her shelter, a steaming breakfast awaited her and a devoted corps of assistants stood ready for orders. I had cooked my last meal for my drivers, said Clara Barton. These men remained with me six months through frost and snow and march and camp and battle; they nursed the sick, dressed the wounded, soothed the dying, and buried the dead; and, if possible, they grew kinder and gentler every day. An incident that occurred at Antietam is typical of her quiet efficiency. According to her directions, the wounded were being fed with bread and crackers moistened in wine, when one of her assistants came to report that the entire supply was exhausted, while many helpless ones lay on the field unfed. Miss Barton's quick eye had noted that the boxes from which the wine was taken had fine Indian meal as packing. Six large kettles were at once unearthed from the farm-house in which they had taken quarters, and soon her men were carrying buckets of hot gruel for miles over the fields where lay hundreds of wounded and dying. Suddenly, in
the midst of her labors, Miss Barton came upon the surgeon in charge sitting alone, gazing at a small piece of tallow candle which flickered uncertainly in the middle of the table. Tired, Doctor? she asked sympathetically. Tired indeed! he replied bitterly; tired of such heartless neglect and carelessness. What am I to do for my thousand wounded men with night here and that inch of candle all the light I have or can get? Miss Barton took him by the arm and led him to the door, where he could see near the barn scores of lanterns gleaming like stars. What is that! he asked amazedly. The barn is lighted, she replied, and the house will be directly. Where did you get them! he gasped. Brought them with me. How many have you? All you want—four boxes. The surgeon looked at her for a moment as if he were waking from a dream; and then, as if it were the only answer he could make, fell to work. And so it was invariably that she won her complete command of people as she did of situations, by always proving herself equal to the emergency of the moment. Though, as she said in explaining the tardiness of a letter, my hands complain a little of unaccustomed hardships, she never complained of any ill, nor allowed any danger or difficulty to interrupt her work. What are my puny ailments beside the agony of our poor shattered boys lying helpless on the field? she said. And so, while doctors and officers wondered at her unlimited capacity for prompt and effective action, the men who had felt her sympathetic touch
and effectual aid loved and revered her as The Angel of the Battlefield. One incident well illustrates the characteristic confidence with which she moved about amid scenes of terror and panic. At Fredericksburg, when every street was a firing-line and every house a hospital, she was passing along when she had to step aside to allow a regiment of infantry to sweep by. At that moment General Patrick caught sight of her, and, thinking she was a bewildered resident of the city who had been left behind in the general exodus, leaned from his saddle and said reassuringly: You are alone and in great danger, madam. Do you want protection? Miss Barton thanked him with a smile, and said, looking about at the ranks, I believe I am the best-protected woman in the United States. The soldiers near overheard and cried out, That's so! that's so! And the cheer that they gave was echoed by line after line until a mighty shout went up as for a victory. The courtly old general looked about comprehendingly, and, bowing low, said as he galloped away, I believe you are right, madam. Clara Barton was present on sixteen battle-fields; she was eight months at the siege of Charleston, and served for a considerable period in the hospitals of Richmond.
[Portrait: Clara Barton]

When the war was ended and the survivors of the great armies were marching homeward, her heart was touched by the distress in many homes where sons and fathers and brothers were among those listed as missing. In all, there were 80,000 men of whom no definite report could be given to their friends. She was assisting President Lincoln in answering the hundreds of heartbroken letters, imploring news, which poured in from all over the land when his tragic death left her alone with the task. Then, as no funds were available to finance a thorough investigation of every sort of record of States, hospitals, prisons, and battle-fields, she maintained out of her own means a bureau to prosecute the search. Four years were spent in this great labor, during which time Miss Barton made many public addresses, the proceeds of which were devoted to the cause. One evening in the winter of 1868, while in the midst of a lecture, her voice suddenly left her. This was the beginning of a complete nervous collapse. The hardships and prolonged strain had, in spite of her robust constitution and iron will, told at last on the endurance of that loyal worker.
When able to travel, she went to Geneva, Switzerland, in the hope of winning back her health and strength. Soon after her arrival she was visited by the president and members of the International Committee for the Relief of the Wounded in War, who came to learn why the United States had refused to sign the Treaty of Geneva, providing for the relief of sick and wounded soldiers. Of all the civilized nations, our great republic alone most unaccountably held aloof. Miss Barton at once set herself to learn all she could about the ideals and methods of the International Red Cross, and during the Franco-Prussian War she had abundant opportunity to see and experience its practical working on the battle-field. At the outbreak of the war in 1870 she was urged to go as a leader, taking the same part that she had borne in the Civil War. I had not strength to trust for that, said Clara Barton, and declined with thanks, promising to follow in my own time and way; and I did follow within a week. As I journeyed on, she continued, I saw the work of these Red Cross societies in the field accomplishing in four months under their systematic organization what we failed to accomplish in four years without it—no mistakes, no needless suffering, no waste, no confusion, but order, plenty, cleanliness, and comfort wherever that little flag made its way—a whole continent marshaled under the banner of the Red Cross. As I saw all this and joined and worked in it, you will not wonder that I said to myself 'if I live to return to my country, I will try to make my people understand the Red Cross and that treaty.' Months of service in caring for the wounded and the helpless victims of siege and famine were followed by a period of nervous exhaustion from which she but slowly crept back to her former hold on health. At last she was able to return to America to devote herself to bringing her country into line with the Red Cross movement. She found that traditionary prejudice against entangling alliances with other powers, together with a singular failure to comprehend the vital importance of the matter, militated against the great cause.
Why should we make provision for the wounded? it was said. We shall never have another war; we have learned our lesson. It came to Miss Barton then that the work of the Red Cross should be extended to disasters, such as fires, floods, earthquakes, and epidemics—great public calamities which require, like war, prompt and well-organized help. Years of devoted missionary work with preoccupied officials and a heedless, short-sighted public at length bore fruit. After the Geneva Treaty received the signature of President Arthur on March 1, 1882, it was promptly ratified by the Senate, and the American National Red Cross came into being, with Clara Barton as its first president. Through her influence, too, the International Congress of Berne adopted the American Amendment, which dealt with the extension of the Red Cross to relief measures in great calamities occurring in times of peace. The story of her life from this time on is one with the story of the work of the Red Cross during the stress of such disasters as the Mississippi River floods, the Texas famine in 1885, the Charleston earthquake in 1886, the Johnstown flood in 1889, the Russian famine in 1892, and the Spanish-American War. The prompt, efficient methods followed in the relief of the flood sufferers along the Mississippi in 1884 may serve to illustrate the sane, constructive character of her work. Supply centers were established, and a steamer chartered to ply back and forth carrying help and hope to the distracted human creatures who stood wringing their hands on a frozen, fireless shore—with every coal-pit filled with water. For three weeks she patrolled the river, distributing food, clothing, and fuel, caring for the sick, and, in order to establish at once normal conditions of life, providing the people with many thousands of dollars' worth of building material, seeds, and farm implements, thus making it possible for them to help themselves and in work find a cure for their benumbing distress.
Our Lady of the Red Cross lived past her ninetieth birthday, but her real life is measured by deeds, not days. It was truly a long one, rich in the joy of service. She abundantly proved the truth of the words: We gain in so far as we give. If we would find our life, we must be willing to lose it.