Foundations of Science 9: 105–134, 2004. DOI: 10.1023/B:FODA.0000025034.53313.7c
OLIMPIA LOMBARDI
WHAT IS INFORMATION?
ABSTRACT. The main aim of this work is to contribute to the elucidation of the
concept of information by comparing three different views about this matter: the
view of Fred Dretske’s semantic theory of information, the perspective adopted by
Peter Kosso in his interaction-information account of scientific observation, and
the syntactic approach of Thomas Cover and Joy Thomas. We will see that these
views involve very different concepts of information, each one useful in its own
field of application. This comparison will allow us to argue in favor of a termino-
logical ‘cleansing’: it is necessary to make a terminological distinction among
the different concepts of information, in order to avoid conceptual confusions
when the word ‘information’ is used to elucidate related concepts such as knowledge,
observation or entropy.
KEY WORDS: communication, information, knowledge, observation, prob-
ability
‘We live in the age of information’: this sentence has become a
commonplace of our times. Our everyday language includes the
word ‘information’ in a variety of different contexts. It seems that
we all know precisely what information is. Moreover, the explosion
in telecommunications and computer sciences endows the concept
of information with a scientific prestige that supposedly makes any
further explanation unnecessary. This apparent self-evidence
has entered the philosophical literature: philosophers usually handle
the concept of information with no careful discussion.
However, the understanding of the meaning of the word ‘infor-
mation’ is far from being so simple. The supposed agreement hides
the fact that many different senses of the same word coexist. The
flexibility of the concept of information makes it a diffuse notion
with a wide but vague application. If we ask: ‘What is information?’,
we will obtain as many different definitions as answers.
The main aim of this work is to contribute to the elucida-
tion of the concept of information by comparing three different
views about this matter: the view of Fred Dretske’s semantic theory
of information, the perspective adopted by Peter Kosso in his
interaction-information account of scientific observation, and the
syntactic approach of Thomas Cover and Joy Thomas. We will see
that these views involve very different concepts of information, each
one useful in its own field of application. This comparison will allow
us to argue in favor of a terminological ‘cleansing’: it is necessary
to make a terminological distinction among the different concepts of
information, in order to avoid conceptual confusions when the word
‘information’ is used to elucidate related concepts such as knowledge,
observation or entropy.
1. BASIC CONCEPTS OF SHANNON’S THEORY
We will begin our discussion by presenting some basic concepts of
Shannon's theory, because all three views that we will analyze
adopt this theory as their formal basis. The concepts introduced in
this section will be necessary for our further discussion.
The Theory of Information was formulated to solve certain
specific technological problems. In the early 1940s, it was thought
that increasing the transmission rate of information over a communi-
cation channel would increase the probability of error. With his
paper “A Mathematical Theory of Communication”, Claude
Shannon (1948) surprised the communication theory community by
proving that this was not true as long as the communication rate was
below the channel capacity; this capacity can be easily computed
from the characteristics of the channel. This paper was immediately
followed by many works of application to fields such as radio, television
and telephony. At present, Shannon’s theory has become a basic
element of communication engineers' training.
Communication requires a source S, a receiver R and a channel
CH.
If S has a range of possible states s1, . . ., sn, whose probabilities of
occurrence are p(s1), . . ., p(sn), the amount of information generated
at the source by the occurrence of si is:1
$I(s_i) = \log \frac{1}{p(s_i)}$   (1.1)
where ‘log’ is the logarithm to the base 2.2 The choice of a
logarithmic base amounts to a choice of a unit for measuring infor-
mation. If the base 2 is used, the resulting unit is called ‘bit’ –
a contraction of binary unit –.3 One bit is the amount of infor-
mation obtained when one of two equally likely alternatives is
specified.
However, the theory is not concerned with the occurrence of
specific events, but with the communication process as a whole.
Hence, the average amount of information generated at the source
is defined as the average of the I(si) weighted by the probability of
occurrence of each state:
$I(S) = \sum_i p(s_i)\, I(s_i) = \sum_i p(s_i) \log \frac{1}{p(s_i)}$   (1.2)
I(S) has its maximum value equal to log n, when all the p(si) have
the same value, p(si) = 1/n.
By analogy, if R has a range of possible states r1, . . ., rm, whose
probabilities of occurrence are p(r1), . . ., p(rm), the amount of
information received at the receiver by the occurrence of ri is:
$I(r_i) = \log \frac{1}{p(r_i)}$   (1.3)
And the average amount of information received at the receiver is
defined as:
$I(R) = \sum_i p(r_i)\, I(r_i) = \sum_i p(r_i) \log \frac{1}{p(r_i)}$   (1.4)
The relationship between I(S) and I(R) can be represented by the
usual diagram in which the transinformation I(S, R) is the overlap
between I(S) and I(R), where:
• I(S, R): transinformation. Average amount of information
generated at S and received at R.
• E: equivocation. Average amount of information generated at S
but not received at R.
• N: noise. Average amount of information received at R but not
generated at S.
As the diagram shows, I(S, R) can be computed as:
$I(S, R) = I(S) - E = I(R) - N$   (1.5)
E and N are measures of the amount of dependence between the
source S and the receiver R:
• If S and R are totally independent, the values of E and N are
maximum (E = I(S) and N = I(R)), and the value of I(S, R) is
minimum (I(S, R) = 0).
• If the dependence between S and R is maximum, the values of
E and N are minimum (E = N = 0), and the value of I(S, R) is
maximum (I(S, R) = I(S) = I(R)).
The values of E and N are not only functions of the source and the
receiver, but also of the communication channel. The introduction of
the communication channel leads directly to the possibility of errors
arising in the process of transmission: the channel CH is defined by
the matrix [p(rj /si)], where p(rj /si) is the conditional probability of
the occurrence of rj given that si occurred, and the elements in any
row must sum to 1. Thus, the definitions of E and N are:
$E = \sum_j p(r_j) \sum_i p(s_i/r_j) \log \frac{1}{p(s_i/r_j)} = \sum_{i,j} p(r_j, s_i) \log \frac{1}{p(s_i/r_j)}$   (1.6)
$N = \sum_i p(s_i) \sum_j p(r_j/s_i) \log \frac{1}{p(r_j/s_i)} = \sum_{i,j} p(s_i, r_j) \log \frac{1}{p(r_j/s_i)}$   (1.7)
where p(si, rj ) = p(rj , si) is the joint probability of si and rj (p(si,
rj ) = p(si)p(rj /si); p(rj , si) = p(rj )p(si/rj)). The channel capacity is
given by:
$C = \max I(S, R)$   (1.8)
where the maximum is taken over all the possible distributions p(si)
at the source.4
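To fix ideas, the quantities just defined can be computed directly from a source distribution and a channel matrix. The following sketch is not part of the original paper; the particular probabilities, the bit as unit, and the brute-force search used to approximate the capacity are illustrative assumptions.

```python
import numpy as np

# Sketch: Shannon quantities for a discrete source and a channel [p(r_j / s_i)].
# The source distribution and channel matrix are illustrative assumptions.
p_s = np.array([0.5, 0.3, 0.2])                   # p(s_i)
channel = np.array([[0.9, 0.1, 0.0],              # row i: p(r_j / s_i); rows sum to 1
                    [0.0, 0.8, 0.2],
                    [0.0, 0.1, 0.9]])

def avg_info(p):
    """Average amount of information in bits: sum of p log 1/p (equation (1.2))."""
    p = p[p > 0]
    return float(np.sum(p * np.log2(1.0 / p)))

def shannon_quantities(p_s, channel):
    p_sr = p_s[:, None] * channel                 # joint p(s_i, r_j)
    p_r = p_sr.sum(axis=0)                        # p(r_j)
    I_S, I_R = avg_info(p_s), avg_info(p_r)
    H_joint = avg_info(p_sr.ravel())
    E = H_joint - I_R                             # equivocation, equals equation (1.6)
    N = H_joint - I_S                             # noise, equals equation (1.7)
    return I_S, I_R, E, N, I_S - E                # I(S, R) = I(S) - E, equation (1.5)

I_S, I_R, E, N, I_SR = shannon_quantities(p_s, channel)
print(f"I(S)={I_S:.3f}  I(R)={I_R:.3f}  E={E:.3f}  N={N:.3f}  I(S,R)={I_SR:.3f}")

# Channel capacity (equation (1.8)): maximum of I(S, R) over source distributions,
# approximated here by a crude random search over the probability simplex.
rng = np.random.default_rng(0)
C = max(shannon_quantities(p, channel)[4] for p in rng.dirichlet(np.ones(3), size=20000))
print(f"C ≈ {C:.3f} bits per use of the channel")
```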
The strong relationship between the characteristics of the channel
and the values of E and N allows us to define two special types of
communication channels:
• Equivocation-free channel (E = 0): a channel defined by a
matrix with one, and only one, non-zero element in each
column.
• Noise-free channel (N = 0): a channel defined by a matrix with
one, and only one, non-zero element in each row.
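A short companion sketch (again with illustrative matrices, and not taken from the paper) checks these two special channel types numerically: a matrix with a single non-zero element in each column yields E = 0, and a matrix with a single non-zero element in each row yields N = 0.

```python
import numpy as np

# Sketch: the two special channel types, with illustrative matrices.
def avg_info(p):
    p = p[p > 0]
    return float(np.sum(p * np.log2(1.0 / p)))

def equivocation_and_noise(p_s, channel):
    p_sr = p_s[:, None] * channel
    p_r = p_sr.sum(axis=0)
    H_joint = avg_info(p_sr.ravel())
    return H_joint - avg_info(p_r), H_joint - avg_info(p_s)   # (E, N)

p_s = np.array([0.5, 0.3, 0.2])
eq_free = np.array([[0.8, 0.2, 0.0, 0.0],     # one non-zero element in each column
                    [0.0, 0.0, 1.0, 0.0],
                    [0.0, 0.0, 0.0, 1.0]])
noise_free = np.array([[1.0, 0.0],            # one non-zero element in each row
                       [1.0, 0.0],
                       [0.0, 1.0]])

E, N = equivocation_and_noise(p_s, eq_free)
print(f"equivocation-free channel: E={E:.6f}  N={N:.6f}")   # E = 0, N > 0
E, N = equivocation_and_noise(p_s, noise_free)
print(f"noise-free channel:        E={E:.6f}  N={N:.6f}")   # N = 0, E > 0
```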
2. INFORMATION AND KNOWLEDGE: DRETSKE’S SEMANTIC
THEORY OF INFORMATION
A concept usually connected with the notion of information is
the concept of knowledge: it is assumed that information provides
knowledge, that it modifies the state of knowledge of those who
receive it. Some authors even define the measure of information in
terms of knowledge; this is the case of D.A. Bell in his well-known
textbook, where he states that information “is measured as a differ-
ence between the state of knowledge of the recipient before and
after the communication of information” (Bell, 1957, p. 7).
In his Knowledge and the Flow of Information, Fred Dretske
presents an attempt to apply a semantic concept of information to
questions in the theory of knowledge. By identifying knowledge
with information-caused belief, he distinguishes between sensory
processes and cognitive processes – between seeing and recognizing
– in terms of the different ways in which information is coded, and
analyzes the capacity of holding beliefs and developing concepts.
But this is not the part of his work with which we are concerned;
we are interested in his interpretation of the concept of informa-
tion. Dretske adopts Shannon’s theory as a starting point, but he
introduces two new elements into his approach. First, he proposes a
change in the basic formulas of the theory. Second, he supplements
the resulting formal theory with a semantic dimension. Let us begin
with the first point.
According to Dretske, one of the respects in which Shannon’s
theory is unprepared to deal with semantic issues is that semantic
notions apply to particular messages, while the theory of informa-
tion deals with average amounts of information. Since Dretske is
concerned with seeking an information-based theory of knowledge,
he is interested in the informational content of particular messages
and not on average amounts of information: “if information theory
is to tell us anything about the informational content of signals,
it must forsake its concern with averages and tell us something
about the information contained in particular messages and
signals. For it is only particular messages and signals that have
a content” (Dretske, 1981, p. 48). In order to focus on the infor-
mation contained in particular messages, Dretske changes the usual
interpretation about the relevant quantities of the theory: instead
of considering the average amount of information I(S) as the basic
quantity (equation (1.2)), he focuses on the amount of information
generated at the source by the occurrence of sa (equation (1.1)):
$I(s_a) = \log \frac{1}{p(s_a)}$   (2.1)
and instead of adopting the transinformation I(S, R) as the relevant
quantity, he defines a new ‘individual’ transinformation I(sa, ra),
amount of information carried by a particular signal ra about sa, by
analogy with equation (1.5) (Dretske, 1981, p. 52):5
$I(s_a, r_a) = I(s_a) - E(r_a)$   (2.2)
where:
$E(r_a) = \sum_i p(s_i/r_a) \log \frac{1}{p(s_i/r_a)}$   (2.3)
According to Dretske (1981, p. 24), E(ra) is the contribution of ra
to the equivocation E because, given the definition of E (equation
(1.6)), it follows that:
$E = \sum_j p(r_j) \sum_i p(s_i/r_j) \log \frac{1}{p(s_i/r_j)} = \sum_j p(r_j)\, E(r_j)$   (2.4)
Dretske foresees that he will be accused of misrepresenting or
misunderstanding the theory of information. For this reason, he
emphasizes that “the above formulas are now being assigned a
significance, given an interpretation, that they do not have in
standard applications of communication theory. They are now
being used to define the amount of information associated with
particular events and signals” (Dretske, 1981, p. 52). And he
immediately adds that, even though such an interpretation is foreign
to standard applications of the theory, it is “perfectly consistent”
with the orthodox uses of these formulas.
Dretske’s aim of adapting the standard theory of information
to make it capable of dealing with the information contained in
particular messages is very valuable. The problem is that the formal
resources to reach this goal have deep technical difficulties. The
least of them is talking about the ‘signal ra’ in the definition of
I(sa, ra) (equation (2.2)): ra is not a signal but one of the states
of the receiver. I(sa, ra) should be defined as the amount of infor-
mation about the state sa of the source contained in the state ra of
the receiver. It is even more troubling that Dretske uses the same
subscript ‘a’ to refer to the state of the source and to the state of the
receiver, as if some specific relationship linked certain pairs (s, r). In
order to make the definition of the new individual transinformation
(equation (2.2)) completely general, I(si, rj ) should be defined as the
amount of information about the state si of S received at R through
the occurrence of its state rj :
$I(s_i, r_j) = I(s_i) - E(r_j)$   (2.5)
where E(rj ) would be (by analogy with equation (2.3)):
$E(r_j) = \sum_i p(s_i/r_j) \log \frac{1}{p(s_i/r_j)}$   (2.6)
However, we have not reached the central difficulty yet. When
Dretske’s proposal is formally ‘cleaned’ in this way, its main tech-
nical problem shows up. If – as Dretske supposes – I(si, rj ) were
the ‘individual’ correlate of the transinformation I(S, R), then I(S,
R) should be computed as the average of the I(si, rj ). According to
the definition of the average of a function of two variables:
$I(S, R) = \sum_{i,j} p(s_i, r_j)\, I(s_i, r_j)$   (2.7)
where I(S, R) = I(S) − E (equation (1.5)), with the standard defini-
tions of I(S) and E (equations (1.2) and (1.6)). The technical problem
is that the identity (2.7) does not hold with Dretske’s formulas.
Indeed, a simple algebraic argument shows that we cannot obtain:
$I(S, R) = I(S) - E = \sum_i p(s_i) \log \frac{1}{p(s_i)} - \sum_{i,j} p(r_j, s_i) \log \frac{1}{p(s_i/r_j)}$   (2.8)
from the right-hand term of (2.7) when (2.5) and (2.6) are used.6
Therefore, we cannot accept Dretske’s response to the critics who
accuse him of misunderstanding Shannon’s theory: his ‘interpre-
tation’ of the formulas by means of these new quantities is not
compatible with the formal structure of the theory.
It might be argued that this is a minor formal detail, but this
detail has relevant conceptual consequences. When Dretske defines
E(rj ), that is, the contribution of rj to the equivocation E as a
summation over the si (equation (2.6)), he makes the error of
supposing that this individual contribution is only a function of the
particular state rj of the receiver. But the equivocation E is a
magnitude that essentially depends on the communication channel.
Then, any individual contribution to E must preserve such a depend-
ence. The understanding of this conceptual point allows us to
retain Dretske’s proposal by appropriately correcting his formal
approach. In order to introduce such a correction, we must define
the individual contribution of the pair (si, rj ) to the equivocation E
as:
$E(s_i, r_j) = \log \frac{1}{p(s_i/r_j)}$   (2.9)
With this definition, it holds that the average of E(si, rj ) is equal to
E:
$E = \sum_{i,j} p(r_j, s_i) \log \frac{1}{p(s_i/r_j)} = \sum_{i,j} p(r_j, s_i)\, E(s_i, r_j)$   (2.10)
Now we can correctly rewrite equation (2.5) as:
$I(s_i, r_j) = I(s_i) - E(s_i, r_j)$   (2.11)
where the average of I(si, rj ) is the transinformation I(S, R):
$I(S, R) = I(S) - E = \sum_{i,j} p(s_i, r_j)\, I(s_i, r_j)$   (2.12)
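As a quick numerical check of equations (2.9)–(2.12) – a sketch with arbitrary illustrative probabilities, not part of the original argument – the corrected individual quantities do average back to the standard ones:

```python
import numpy as np

# Sketch with arbitrary probabilities: the corrected individual quantities
# E(s_i, r_j) and I(s_i, r_j) average back to E and I(S, R).
p_s = np.array([0.6, 0.4])
channel = np.array([[0.7, 0.3],                  # p(r_j / s_i)
                    [0.2, 0.8]])
p_sr = p_s[:, None] * channel                    # joint p(s_i, r_j)
p_r = p_sr.sum(axis=0)
p_s_given_r = p_sr / p_r                         # p(s_i / r_j)

I_i = np.log2(1.0 / p_s)                         # I(s_i), equation (2.1)
E_ij = np.log2(1.0 / p_s_given_r)                # E(s_i, r_j), equation (2.9)
I_ij = I_i[:, None] - E_ij                       # I(s_i, r_j), equation (2.11)

E = float(np.sum(p_sr * E_ij))                   # average of E(s_i, r_j): equation (2.10)
I_S = float(np.sum(p_s * I_i))
I_SR = float(np.sum(p_sr * I_ij))                # average of I(s_i, r_j): equation (2.12)

# Cross-check against an independent computation of the transinformation.
I_SR_check = float(np.sum(p_sr * np.log2(p_sr / np.outer(p_s, p_r))))
assert abs(I_SR - (I_S - E)) < 1e-9 and abs(I_SR - I_SR_check) < 1e-9
print(f"E = {E:.4f}   I(S) = {I_S:.4f}   I(S, R) = {I_SR:.4f}")
```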
This modified version of the formulas makes it possible to reach
Dretske’s goal, that is, to adapt the standard theory of information
to deal with the information contained in particular messages. We
can now return to Dretske’s argument. When does the occurrence of
the state rj at the receiver give us the knowledge of the occurrence
of the state si at the source? The occurrence of the state rj tells
us that si has occurred when the amount of information I(si, rj ) is
equal to the amount of information I(si) generated at the source by
the occurrence of si. This means that there was no loss of informa-
tion through the individual communication, that is, the value of the
individual contribution E(si, rj ) to the equivocation is zero (Dretske,
1981, p. 55); according to equation (2.11):
$E(s_i, r_j) = 0 \Rightarrow I(s_i, r_j) = I(s_i)$
But, now, the value of E(si, rj ) must be obtained with the correct
formula (2.9). At this point, it should be emphasized again that,
contrary to Dretske’s assumption, the individual contribution to the
equivocation is a function of the communication channel and not only
of the receiver. In other words, it is not the state rj which individu-
ally contributes to the equivocation E but the pair (si, rj ), with
its associated probabilities p(si) and p(rj ), and the corresponding
conditional probability p(rj /si) of the channel. This means that we
can get completely reliable information – we can get knowledge
– about the source even through a very improbable state of the
receiver, provided that the channel is appropriately designed.
But Dretske does not stop here. In spite of having begun
from the formal theory of information, he immediately reminds us of
Shannon’s remark: “[the] semantic aspects of communication are
irrelevant to the engineering problem. The significant aspect is
that the actual message is one selected from a set of possible
messages” (Shannon, 1948, p. 379). Shannon’s theory is purely
quantitative: it only deals with amounts of information, but ignores
questions related to informational content. The main contribution
of Dretske is his semantic theory of information, which tries to
capture what he considers the nuclear sense of the term ‘informa-
tion’: “A state of affairs contains information about X to just that
extent to which a suitable placed observer could learn some-
thing about X by consulting it” (Dretske, 1981, p. 45). Dretske
defines the informational content of a state r in the following terms
(p. 65):
A state r carries the information that S is F = The condi-
tional probability of S’s being F, given r (and k), is 1 (but,
given k alone, less than 1).
where k stands for what the receiver already knows about the
possibilities existing at the source.
However, contrary to what might be supposed, the semantic character
of this proposal does not rely on such a definition of informational
content. Of course, this definition cannot be stated in terms of
Shannon's original theory, because it only deals with average
amounts of information. But it can be formulated with the new
quantities referring to the amount of information contained in partic-
ular states. In fact, the concept of informational content can be –
more precisely – defined as follows:
A state rB of the receiver contains the information about
the occurrence of the state sA of the source iff p(sA/rB) =
1 but p(sA) < 1, given the knowledge of the probability
distribution over the possible states of the source.
where sA stands for S’s being F. If the right formulas are used, we
can guarantee that:
• If p(sA) < 1, then I(sA) > 0 (equation (2.1)), that is, there is a
positive amount of information generated at the source by the
occurrence of sA.
• If p(sA/rB) = 1, then E(sA, rB) = 0 (equation (2.9)), that is, the
individual contribution of the pair (sA, rB) to the equivocation
E is zero. And if E(sA, rB) = 0, then I(sA, rB) = I(sA) (equation
(2.11)).
In other words, the definition says that rB contains the information
about the occurrence of sA iff the amount of information about
sA received through the occurrence of rB is equal to the positive
amount of information generated by the occurrence of sA. Dretske
tries to express a similar idea when he says: “if the conditional
probability of S’s being F (given r) is 1, then the equivocation
of this signal must be 0 and (in accordance with formula 1.5)
the signal must carry as much information about S, I(S, R), as is
generated by S’s being F, I(sF)” (Dretske, 1981, p. 65), where his
formula 1.5 is I(S, R) = I(S) − E. The problem is that this is wrong:
p(sA/rB) = 1 does not imply that E = 0 and I(S, R) = I(S) (see
equation (1.6)). Why does he use these formulas, which refer to
average amounts of information, instead of using the new formulas
which refer to the amount of information contained in particular
messages, for whose necessity he has argued so strongly? The reason
lies again in his formal error: with his definition of E(rB) (equa-
tion (2.6)), p(sA/rB) = 1 does not make the individual contribution to
the equivocation E equal to zero and, then, he cannot guarantee that
I(sA, rB) = I(sA). Only when the new formulas are properly
corrected can the idea roughly expressed by Dretske be stated with
precision. In summary, Dretske’s definition of informational content
says nothing that cannot be said in terms of the theory of informa-
tion adapted, in the right way, to deal with particular amounts of
information.
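The point can be illustrated with a small numerical sketch (the channel below is an illustrative assumption, not an example from the book): a receiver state rB reachable only from sA satisfies p(sA/rB) = 1, so E(sA, rB) = 0 and I(sA, rB) = I(sA), even though the average equivocation E of the channel is not zero.

```python
import numpy as np

# Sketch: r_1 (= r_B) is reachable only from s_1 (= s_A), so p(s_A/r_B) = 1,
# while r_2 is ambiguous between s_2 and s_3. Probabilities are illustrative.
p_s = np.array([0.5, 0.3, 0.2])                  # p(s_A) = 0.5 < 1
channel = np.array([[1.0, 0.0],                  # s_1 -> r_1 only
                    [0.0, 1.0],                  # s_2 -> r_2
                    [0.0, 1.0]])                 # s_3 -> r_2
p_sr = p_s[:, None] * channel
p_r = p_sr.sum(axis=0)
p_s_given_r = p_sr / p_r

I_sA = np.log2(1.0 / p_s[0])                     # I(s_A) > 0
E_AB = np.log2(1.0 / p_s_given_r[0, 0])          # E(s_A, r_B) = 0 since p(s_A/r_B) = 1
I_AB = I_sA - E_AB                               # = I(s_A): r_B carries the information that s_A occurred

nz = p_sr > 0                                    # average equivocation over the whole channel
E_avg = float(np.sum(p_sr[nz] * np.log2(1.0 / p_s_given_r[nz])))

print(f"I(s_A) = {I_sA:.3f}   E(s_A, r_B) = {E_AB:.3f}   I(s_A, r_B) = {I_AB:.3f}")
print(f"average equivocation E = {E_avg:.3f}  (non-zero although p(s_A/r_B) = 1)")
```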
If the semantic character of Dretske’s proposal is not based on
the definition of informational content, on what is it based? For
Dretske, information qualifies as a semantic concept in virtue of
the intentionality inherent in its transmission. And the ultimate
source of this intentionality is the nomic character of the regular-
ities on which the transmission of information depends. The channel
probabilities p(rj /si) do not represent a set of mere de facto correla-
tions; they are determined by a network of lawful connections
between the states of the source and the states of the receiver:
“The conditional probabilities used to compute noise, equivo-
cation, and amount of transmitted information (and therefore
the conditional probabilities defining the informational content
of the signal) are all determined by the lawful relations that
exist between source and signal. Correlations are irrelevant
unless these correlations are a symptom of lawful connections”
(Dretske, 1981, p. 77). It is true that, in many technological applica-
tions of information theory, statistical data are used to determine the
relevant probabilities. Nevertheless, even in these cases it is assumed
that the statistical correlations are not accidental, but manifesta-
tions of underlying lawful regularities. Indeed, there is normally
an elaborate body of theory that stands behind the attributions of
probabilities. In short, the source of the semantic character of infor-
mation is its intentionality; and information inherits its intentional
properties from the lawful regularities on which it depends.7
Dretske emphasizes the semantic character of information
because it is precisely this character that relates information to
knowledge. Even if the properties F and G are perfectly correlated
– whatever is F is G and vice-versa –, this does not assure us that
we can know that ‘x is G’ by knowing that ‘x is F’. If the correlation
between F and G is a mere coincidence, there is no information
in x's being F about x's being G; the first fact tells us nothing about
the second one. In other words, mere correlations, and even
exceptionless accidental uniformities, do not supply knowledge.
This fact about information explains why we are sometimes in a
position to know that x is F without being able to tell whether x
is G, despite the fact that every F is G. Only on the basis of this
semantic dimension of information can we affirm that “informa-
tion is a commodity that, given the right recipient, is capable of
yielding knowledge” (Dretske, 1981, p. 47).
Dretske’s definition of informational content also refers to what
the receiver already knows about the possibilities existing at the
source (k). In formal terms, we can say that the amount of infor-
mation about the occurrence of the state sA received through the
occurrence of the state rB, I(sA, rB), depends not only on the
communication channel but also on the characteristics of the source
S: in particular, it is a function of the probabilities p(s1), . . . , p(sn).
But the definition of the source is not absolute; on the contrary, it
depends on the knowledge about the source available at the receiver
end before the transmission. In other words, the background knowl-
edge is relevant to the received information only to the extent that
it affects the value of the amount of information generated at the
source by the occurrence of a specific state. This fact implies a rela-
tivization of the informational content with respect to the knowledge
available before the transmission. Usually, the relative character of
information is not explicitly considered in technical books about
the subject, where the background knowledge is tacitly taken into
account in the definition of the source; an exception is Bell, who
admits that “the datum point of information is then the whole
body of knowledge possessed at the receiving end before the
communication” (Bell, 1957, p. 7).
However, the relative character of information does not make
the concept less objective or not amenable to precise quantification.
In this sense, Dretske disagrees with Daniel Dennett, who claims
that “the information received by people when they are spoken
to depends on what they already know and is not amenable to
precise quantification” (Dennett, 1969, p. 187). He follows Dennett
in relativizing the information contained in a message but, unlike
Dennett, he correctly asserts that this does not mean that we cannot
precisely quantify such information: if the knowledge available at
the receiver end is accurately determined, the received information
can be quantified with precision. Dretske also stresses the fact that
the background knowledge must not be conceived as a subjective
factor, but rather as a frame of reference with respect to which infor-
mation is defined. The relative character of objective magnitudes is
usual in sciences; in this sense, information is not different from
velocity or simultaneity: only when there is a shift of reference
systems does the need arise to make explicit the relative nature of
the quantity under consideration; but such relativity does not mean
non-objectivity. From Dretske’s viewpoint, only to the extent that
information is conceived as an objective magnitude can the concept of
information be fruitfully applied to questions in the theory of
knowledge.
3. KOSSO’S INTERACTION-INFORMATION ACCOUNT OF
SCIENTIFIC OBSERVATION
During recent decades, some authors have abandoned the linguistic
approach to the problem of scientific observation – an approach
shared by the positivistic tradition and by its anti-positivistic critics
– to focus on the study of the observational instruments and
methods used in natural sciences. From this perspective, Dudley
Shapere, Harold Brown and Peter Kosso agree in their attempt to
elucidate the scientific use of the term ‘observable’ by means of
the concept of information. Thus, Shapere proposes the following
analysis: “x is directly observed (observable) if: (i) information
is received (can be received) by an appropriate receptor; and (ii)
that information is (can be) transmitted directly, i.e., without
interference, to the receptor from the entity x (which is the
source of information)” (Shapere, 1988, p. 492). In turn, Brown
defines observation in science in the following way: “To observe
an item I is to gain information about I from the examination of
another item I*, where I* is an item that we (epistemically) see
and I is a member of the causal chain that produced I*” (Brown,
1987, p. 93). By using a terminology more familiar to physicists,
Kosso replaces the idea of a causal chain with the idea of interaction;
but again, the concept of information becomes central when he
defines: “The ordered pair ⟨object x, property P⟩ is observ-
able to the extent that there can be an interaction (or a chain of
interactions) between x and an observing apparatus such that
the information ‘that x is P’ is transmitted to the apparatus and
eventually conveyed to a human scientist” (Kosso, 1989, p. 32).
But, how is ‘information’ interpreted in this context? Shapere and
Brown do not explain its meaning, as if the concept of information
lacked interpretative difficulties. On the contrary, Kosso admits that
the concept requires further discussion: “a lot hangs on the notion
of information, and it is only by clarifying this that a full under-
standing of the observing apparatus is made clear and that the
necessary condition of interaction is augmented with sufficient,
epistemic conditions” (Kosso, 1989, pp. 35–36). For this purpose,
he follows Dretske’s semantic theory of information, and adapts it
for elucidating scientific observation: he seeks a concept of infor-
mation that makes room for observation of what is already known.
Then, Kosso introduces the following modification: he calls ‘new
information’ what Dretske calls ‘information’, and adds the descrip-
tion of redundant information for the case in which the observed
fact ‘x is P’ is included in k, that is, in the body of the previous
knowledge about the source.
With this conceptual framework, Kosso analyzes several
examples of observation in physical sciences, classifying them in
examples of entities that are unobservable in principle – when the
physical theory which describes the entity explicitly precludes its
being observed –, examples of unperceivable entities – which can
interact with some non-human device but cannot interact with a
human sense organ – and examples of perceivable entities – which
can interact in an informational way with the human being –. Never-
theless, here we are not interested in this part of Kosso's study,
but in the concept of information that lies behind his interaction-
information account of observation in physical sciences. Kosso
explicitly says that he borrows heavily from Dretske’s semantic
theory, as if his work were an application of the semantic concept of
information provided by Dretske. However, a careful examination
of both perspectives reveals some differences between them.
The first difference is that, unlike Dretske, Kosso does not exploit
the formal resources of information theory. This is clear when,
following Dretske, he stresses that it is a mistake to simply identify
the flow of information with a propagation of a causal influence.
His argument proposes a case where the state s of the source causes
the state a of the receiver, but a can also be caused by another state
s′ of the source; in this case, the occurrence of a at the receiver
does not distinguish between the possible states s and s′ and, then,
does not allow us to know which was the state of the source. On
this basis, Kosso concludes: “causal interaction is not sufficient
for the conveyance of information” (Kosso, 1989, p. 38). But this
idea can be precisely expressed with the formal theory: if Kosso’s
example is formally represented, we can see that the individual
contribution E(s, a) to the equivocation is not zero and, then, the
amount of information I(s) generated at the source by the occur-
rence of s is not equal to the individual transinformation I(s, a)
(equation (2.11)). This means that, even if the occurrence of a
tells us that either s or s′ has occurred, it does not give us the
knowledge of which of the two states occurred at the source; but it
is precisely such knowledge that we need for making a scientific
observation. In other words, if – following Dretske – Kosso used
the formal resources of the theory of information, he could formu-
late the concept of scientific observation in more precise terms. In
fact, from an informational approach to scientific observation we
can characterize observation in science as a process of transmission
of information from the observed entity to the receiver through an
equivocation-free channel. This characterization allows us to under-
stand why noise does not prevent observation and to argue for the
conceptual advantages of the informational account of observation
over the causal account. However, these matters are beyond the
purposes of this paper.8
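For concreteness, Kosso's case can be written down in the formal terms used above; the following sketch (with illustrative probabilities assumed) shows that when two equiprobable source states both lead to the same receiver state a, the individual contribution E(s, a) equals I(s) and the individual transinformation I(s, a) vanishes.

```python
import numpy as np

# Sketch of Kosso's case: two equiprobable source states s, s' both cause the
# same receiver state a (illustrative probabilities).
p_src = np.array([0.5, 0.5])                  # p(s), p(s')
channel = np.array([[1.0],                    # s  -> a
                    [1.0]])                   # s' -> a
p_joint = p_src[:, None] * channel            # p(source state, a)
p_a = p_joint.sum(axis=0)[0]                  # p(a) = 1
p_s_given_a = p_joint[0, 0] / p_a             # p(s/a) = 1/2

I_s = np.log2(1.0 / p_src[0])                 # I(s) = 1 bit generated at the source
E_sa = np.log2(1.0 / p_s_given_a)             # E(s, a) = 1 bit: individual equivocation
I_sa = I_s - E_sa                             # I(s, a) = 0 bits, equation (2.11)

print(f"I(s) = {I_s:.1f} bit   E(s, a) = {E_sa:.1f} bit   I(s, a) = {I_sa:.1f} bit")
# a is caused by the source but carries no information about which of s, s' occurred.
```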
The second difference between Kosso’s and Dretske’s views
is a conceptual divergence about the very nature of information.
Although Dretske claims that the communication channel is defined
by a network of nomic connections between the states of the source
and the states of the receiver, he explicitly declares that a physical
link between source and receiver is not necessary for the transmis-
sion of information. In this sense, he considers the following case
(Dretske, 1981, pp. 38–39):
A source S is transmitting information to both receivers RA and RB
via some physical channel. RA and RB are isolated from one another
in the sense that there is no physical interaction between them.
But Dretske considers that, even though RA and RB are physically
isolated from one another, there is an informational link between
them. According to Dretske, it is correct to say that there is a
communication channel between RA and RB because it is possible to
learn something about RB by looking at RA and vice versa. Nothing
at RA causes anything at RB or vice versa; yet RA contains informa-
tion about RB and RB about RA. Dretske stresses the fact that the
correlations between the events occurring at both receivers are not
accidental, but they are functions of the common nomic dependen-
cies of RA and RB on S. However, for him this is an example of
an informational link between two points, despite the absence of a
physical channel between them. Dretske adds that the receiver RB
may be farther from the source than RA and, then, the events at RB
may occur later in time than those at RA, but this is irrelevant for
evaluating the informational link between them: even though the
events at RB occur later, RA carries information about what will
happen at RB. In short: “from a theoretical point of view [. . .]
the communication channel may be thought of as simply the
set of depending relations between S and R. If the statistical
relations defining equivocation and noise between S and R are
appropriate, then there is a channel between these two points,
and information passes between them, even if there is no direct
physical link joining S with R” (Dretske, 1981, p. 38).
As we have seen, in his interaction-information account of
scientific observation, Kosso asserts that interaction is not a suffi-
cient condition for information flow. But he also claims that: “obser-
vation must involve interaction. Interaction between x and an
observing apparatus is a necessary condition for observation”
(Kosso, 1989, pp. 34–35). This last requirement for observation
does not seem to be added to the demand of an information flow
between the observed entity and the receiver; on the contrary, it
seems a result of the very concept of information adopted by
Kosso, when he claims that “information is transferred between
states through interaction. The object in state s which has
informational content (s is P) interacts with something else,
the observing apparatus or some intermediate informational
medium, with the result that this latter object is left in a state
A which has the information (s is P) whereas it did not have that
information before the interaction” (Kosso, 1989, p. 37). This
quote suggests that Kosso would not agree with Dretske regarding
the example of the source transmitting to two receivers: certainly
Kosso would not accept that we can observe the events at RB by
looking at RA; but surely he would neither accept that information
flows from RA to RB with no physical link between them. If this
is right, despite its own assumption, Kosso does not completely
agree with Dretske’s view about information: instead of conceiving
the concept of information as a semantic concept, his concep-
tion approaches the perspective most usually adopted in physical
sciences, where an unavoidable link between flow of information
and propagation of signals is required. Physicists and engineers
accept the well-known dictum ‘no information without represen-
tation’: the transmission of information between two points of the
physical space necessarily requires an information-bearing signal,
that is, a physical process propagating from one point to the other.
This perspective is adopted when the correlations between spatially
separate quantum systems are considered: any analysis of the EPR
experiment9 stresses the fact that there is no information flowing
between the two particles, because the propagation of a superlu-
minal signal from one particle to the other is impossible. From this
view, information is a physical entity, which can be generated, accu-
mulated, stored, processed, converted from one form to another, and
transmitted from one place to another. Precisely due to the physical
nature of information, the dynamics of its flow is ruled by natural
laws; in particular, it is constrained by relativistic limitations. The
extreme versions of this view conceive information as a physical
entity with the same ontological status as energy, and whose essen-
tial property is to manifest itself as structure when added to matter
(cfr. Stonier, 1990).
This kind of situation, where the correlations between two
points A and B are explained by lawful regularities but there is
no signal propagation between them,10 shows that Dretske and
Kosso are using two different concepts of information. According
to the semantic concept, information is defined by its capability of
providing knowledge. From this view, the possibility of controlling
the states at A to send information to B is not a necessary condi-
tion for defining an information channel between A and B: the only
requirement for an informational link between both points is the
possibility of knowing the state at A by looking at B. According to
the physical concept, information is a physical entity whose essen-
tial feature is its capability of being generated at one point of the
physical space and transmitted to another point. This view requires
an information-bearing signal that can be modified at the transmitter
end in order to carry information to the receiver end. Therefore, if
there is no physical link between A and B, it is impossible to define
an information channel between them: we cannot control the states
at A to send information to B.
The divergence between the semantic view and the physical view
of information acquires great relevance when the concept of infor-
mation is applied to philosophical problems. In particular, when the
concept is used to elucidate the notion of scientific observation,
this interpretative divergence becomes explicit in the case of the
so-called ‘negative experiments’. Negative experiments were origi-
nally proposed as a theoretical tool for analyzing the measurement
problem in quantum mechanics (cfr. Jammer, 1974, pp. 495–496);
but here we will only use them to show the consequences of the
choice between both concepts of information. In a negative exper-
iment, it is assumed that an event has been observed by noting
the absence of some other event; this is the case of neutral weak
currents, which are observed by noticing the absence of charged
muons (cfr. Brown, 1987, pp. 70–75). But the conceptual core of
negative experiments can be understood by means of a very simple
example. Let us consider a tube at whose middle point a particle is
emitted at t0 towards one of the ends of the tube. Let us also suppose
that we place a detection device at the right end A in order to know
in which direction the particle was emitted.
If after the appropriate time t1 – depending on the velocity of the
particle and the length of the tube – the device indicates no detection,
we can conclude that the particle was emitted towards the left side
of the tube. At this point, we can guarantee two facts:
• there is a perfect anticorrelation between both ends of the tube.
Then, by looking at the state – presence or absence of the
particle – at the right end A, we can know the state – absence or
presence, respectively – at the left end B.
• the instantaneous propagation of a signal between A and B at t1
is physically impossible.
The question is: have we observed the direction of the emitted
particle?
From an informational account of scientific observation, the
answer depends on the view about information adopted for elucidat-
ing the notion of observation:
• if the semantic view is adopted, a communication channel
between both ends of the tube can be defined. Then, there is
a flow of information from B to A, which allows us to observe
the presence of the particle at B, even though there is no signal
propagating from B to A.
• if the physical view is adopted, there is no information flow
from B to A because there is not, and there cannot be, a signal
instantaneously propagating between B and A at t1. Then, we
do not observe the presence of the particle at B.
In other words, according to the semantic view of information,
by looking at the detector we simultaneously observe two events,
presence-at-B and absence-at-A. On the contrary, the physical view
leads us to a narrower concept of observation than the previous
one: by looking at the detector we observe the state at A – presence
or absence –, but we do not observe the state at B; such a state is
inferred.
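The negative experiment can also be put in these formal terms. The sketch below (emission probabilities of 1/2 towards each end are an assumption) treats the left end B as the source and the detector end A as the receiver; the perfect anticorrelation defines an equivocation-free 'channel' from B to A, which is precisely what the semantic view counts as an information link and the physical view does not.

```python
import numpy as np

# Sketch of the tube: B (left end) as 'source', A (detector end) as 'receiver';
# emission towards each end with probability 1/2 is an assumption.
p_B = np.array([0.5, 0.5])               # particle present at B / absent at B
channel = np.array([[1.0, 0.0],          # present at B -> no detection at A
                    [0.0, 1.0]])         # absent at B  -> detection at A
p_BA = p_B[:, None] * channel
p_A = p_BA.sum(axis=0)

def avg_info(p):
    p = p[p > 0]
    return float(np.sum(p * np.log2(1.0 / p)))

E = avg_info(p_BA.ravel()) - avg_info(p_A)   # equivocation of the B -> A 'channel'
I_BA = avg_info(p_B) - E                     # transinformation

print(f"E = {E:.3f} bit   I(B, A) = {I_BA:.3f} bit")
# E = 0: looking at A determines the state at B (semantic view: information flows),
# although no signal propagates from B to A at t1 (physical view: it does not).
```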
This discussion shows that it is possible to agree on the formal
theory of information and even on some interpretative points but,
despite this, to dissent on the very nature of information. Informa-
tion may be conceived as a semantic item, whose essential property
is its capability of providing knowledge. But information may also
be regarded as a physical entity ruled and constrained by natural
laws.
4. THE SYNTACTIC APPROACH OF COVER AND THOMAS
The physical view of information has been the most widespread
view in physical sciences. Perhaps this fact was due to the specific
technological problems which led to the original theory of Shannon:
the main interest of communication engineers was, and still is,
to optimize the transmission of information by means of physical
signals, whose energy and bandwidth are constrained by techno-
logical and economic limitations. In fact, the physical view of
information is the most usual in the textbooks on the subject used
in engineers' training. However, this situation is changing in recent
times: one can see that some very popular textbooks introduce infor-
mation theory in a completely syntactic way, with no mention of
sources, receivers or signals. Only when the syntactic concepts and
their mathematical properties have been presented is the theory
applied to the traditional case of signal transmission.
Perhaps the best example of this approach is the presentation
offered by Thomas Cover and Joy Thomas in their book Elements of
Information Theory (1991).11 From the very beginning of the book,
the authors clearly explain their perspective: “Information theory
answers two fundamental questions in communication theory:
what is the ultimate data compression [. . .] and what is the ulti-
mate transmission rate of communication [. . .]. For this reason
some consider information theory to be a subset of communi-
cation theory. We will argue that it is much more. Indeed, it
has fundamental contributions to make in statistical physics
(thermodynamics), computer sciences (Kolmogorov complexity
or algorithmic complexity), statistical inference (Occam’s
Razor: ‘The simplest explanation is best’) and to probability
and statistics (error rates for optimal hypothesis testing and
estimation)” (Cover and Thomas, 1991, p. 1). On the basis of
this general purpose, they define the basic concepts of information
theory in terms of random variables and probability distributions
over their possible values. Let X and Y be two discrete random vari-
ables with alphabets A and B, and probability mass functions p(x) =
Pr(X = x), x ∈ A and p(y) = Pr(Y = y), y ∈ B respectively. In general,
they call ‘entropy’ what we called ‘average amount of information’.
Thus, the entropy H(X) of a discrete random variable is defined by:
$H(X) = \sum_x p(x) \log \frac{1}{p(x)}$   (4.1)
Next, Cover and Thomas extend the definition of entropy to a pair
of discrete random variables: the joint entropy H(X, Y) of X and Y
with a joint distribution p(x, y) is defined as:
$H(X, Y) = \sum_{x,y} p(x, y) \log \frac{1}{p(x, y)}$   (4.2)
And the conditional entropy H(X/Y) of X given Y is defined as:
$H(X/Y) = \sum_{x,y} p(x, y) \log \frac{1}{p(x/y)}$   (4.3)
The naturalness of these definitions from the viewpoint of proba-
bility theory is exhibited by the fact that the entropy of a pair of
random variables is the entropy of one of them plus the conditional
entropy of the other:
$H(X, Y) = H(X) + H(Y/X)$   (4.4)
Thus, what we had originally called ‘equivocation’ E (equation (1.6))
here becomes the conditional entropy H(X/Y), and what we called
‘noise’ N (equation (1.7)) becomes H(Y/X). Cover and Thomas also define the
relative entropy D(p//q) between two probability mass functions
p(x) and q(x) as:
$D(p//q) = \sum_x p(x) \log \frac{p(x)}{q(x)}$   (4.5)
The relative entropy D(p//q) is a measure of the inefficiency of
assuming that the distribution is q when the true distribution is p;
then, D(p//q) is always non-negative, and is zero if and only if p
= q.12 With these elements, Cover and Thomas define the mutual
information I(X, Y) – which we called ‘transinformation’ – as the
relative entropy between the joint distribution p(x, y) and the product
distribution p(x)p(y):
$I(X, Y) = \sum_{x,y} p(x, y) \log \frac{p(x, y)}{p(x)\,p(y)}$   (4.6)
Thus, the mutual information is the reduction in the uncertainty of
X due to the knowledge of Y.
On the basis of these concepts, Cover and Thomas demonstrate
the relationships among entropy, joint entropy, conditional entropy
and mutual information, and express them in the following well-known relations:
$I(X, Y) = H(X) - H(X/Y) = H(Y) - H(Y/X)$   (4.7)
$H(X, Y) = H(X) + H(Y) - I(X, Y)$   (4.8)
where the first of these formulas is the analogue of equation (1.5),
which was expressed in terms of equivocation and noise. Since here
the concepts are introduced in terms of random variables and their
correlations, the authors can extend the definitions to the case of
more than two random variables. For example, they define (1991,
pp. 21–23) the entropy H(X1, . . ., Xn) of a collection of random vari-
ables, the conditional mutual information I(X, Y/Z) of the random
variables X and Y given Z, and the conditional relative entropy
D(p(y/x)//q(y/x)).
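The syntactic character of these definitions is easy to see in code: everything is computed from a joint probability distribution, with no mention of sources, signals or nomic connections. The sketch below (the joint distribution is an arbitrary illustrative choice, kept strictly positive for simplicity) computes the quantities of equations (4.1)–(4.6) and checks the identities (4.4), (4.7) and (4.8).

```python
import numpy as np

# Sketch: the syntactic quantities computed from an arbitrary joint distribution.
p_xy = np.array([[0.30, 0.10],
                 [0.05, 0.25],
                 [0.10, 0.20]])                          # p(x, y), strictly positive, sums to 1
p_x, p_y = p_xy.sum(axis=1), p_xy.sum(axis=0)

def H(p):
    p = np.asarray(p, dtype=float).ravel()
    return float(np.sum(p * np.log2(1.0 / p)))           # entropy, equation (4.1)

def D(p, q):
    p, q = np.ravel(p), np.ravel(q)
    return float(np.sum(p * np.log2(p / q)))             # relative entropy, equation (4.5)

H_X, H_Y, H_XY = H(p_x), H(p_y), H(p_xy)
H_X_given_Y = float(np.sum(p_xy * np.log2(1.0 / (p_xy / p_y))))            # equation (4.3)
H_Y_given_X = float(np.sum(p_xy * np.log2(1.0 / (p_xy / p_x[:, None]))))
I_XY = D(p_xy, np.outer(p_x, p_y))                       # mutual information, equation (4.6)

assert abs(H_XY - (H_X + H_Y_given_X)) < 1e-9            # equation (4.4)
assert abs(I_XY - (H_X - H_X_given_Y)) < 1e-9            # equation (4.7)
assert abs(I_XY - (H_Y - H_Y_given_X)) < 1e-9
assert abs(H_XY - (H_X + H_Y - I_XY)) < 1e-9             # equation (4.8)
print(f"H(X)={H_X:.3f}  H(Y)={H_Y:.3f}  H(X,Y)={H_XY:.3f}  I(X,Y)={I_XY:.3f}")
```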
This brief summary of the way in which Cover and Thomas
present the theory of information shows that this approach adopts
a syntactic concept of information. From this perspective, the defini-
tion of information has nothing to do with communication, trans-
mission and reception of messages, nor with the knowledge of an
event obtained by looking at another event: here, the only ‘objects’
of the theory are random variables and their correlations. As we have
seen, even though Dretske admits the possibility of a communica-
tion channel with no physical substratum, he nevertheless requires
that the conditional probabilities defining the channel result from the
nomic dependence between the states of the source and the states of
the receiver. But from the perspective of Cover and Thomas, the
concept of information loses even this semantic ingredient: it is
legitimate to define the mutual information of two variables even
if there is no nomic relationship between them and their condi-
tional probabilities are computed exclusively by means of de facto
correlations. For example, if last month's results of the Sydney lottery
partially coincide with the results obtained in the Mexico lottery
during the same period, there is a positive mutual informa-
tion between the two sequences of completely independent events. If
the concept of information is so deprived of the intentional character
required by Dretske, any link between information and knowledge
vanishes: as we have seen, when the correlation between two vari-
ables is merely accidental, the value of one of them tells us nothing
about the value of the other. In short, from this syntactic view,
we lose the basic intuition according to which information modi-
fies the state of knowledge of those who receive such information.
This might seem too high a price to pay for retaining the syntactic
approach, despite its elegance and mathematical precision.
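The lottery example can be made concrete with a small simulation (an illustrative sketch; the number of draws and of possible outcomes are arbitrary assumptions): the empirical mutual information computed from the observed joint frequencies of two independently generated sequences is typically strictly positive, even though the true mutual information is zero.

```python
import random
from collections import Counter
from math import log2

# Sketch: two independent sequences of draws (stand-ins for the two lotteries);
# the number of draws and of possible outcomes are arbitrary assumptions.
rng = random.Random(42)
n, k = 30, 10
x = [rng.randrange(k) for _ in range(n)]      # 'Sydney' results
y = [rng.randrange(k) for _ in range(n)]      # 'Mexico' results, independent of x

def empirical_mutual_information(xs, ys):
    """Plug-in estimate of I(X, Y) from observed joint frequencies, in bits."""
    n = len(xs)
    pxy, px, py = Counter(zip(xs, ys)), Counter(xs), Counter(ys)
    return sum((c / n) * log2((c / n) / ((px[a] / n) * (py[b] / n)))
               for (a, b), c in pxy.items())

print(f"empirical I(X, Y) = {empirical_mutual_information(x, y):.3f} bits "
      f"(the true value for independent lotteries is 0)")
```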
However, the position of Cover and Thomas has its own advan-
tages. By turning information into a syntactic concept, this approach
makes the theory of information applicable to a variety of fields.
Among them, communication by means of physical signals is only
one of the many applications. Thus, after the chapter that introduces
the basic concepts of the theory, Cover and Thomas devote the next
chapters of their book to explain how such concepts answer very
different problems.
A concept usually associated with information is the concept of
thermodynamic entropy. Boltzmann's well-known equation for
the entropy of a macrostate M,
$S_B(M) = k \log W$
– where k is Boltzmann's constant and W is the number of microstates
compatible with M – is isomorphic with the equation for the
contribution of xi to the entropy H(X) of the variable X. In fact, if the
microstates are equiprobable, the probability of the macrostate M is
1/W. But what is the relationship between informational entropy
and thermodynamic entropy? In his thoughtful discussion on this
point, Jeffrey Wicken makes a valuable argumentative effort to
stress the difference between thermodynamic entropy and Shannon
entropy as used in communication theory; in this context, he claims
that: “while the Shannon equation is symbolically isomorphic
with the Boltzmann equation, the meanings of the respective
equations bear little in common” (Wicken, 1987, p. 179). This is
certainly true if one adopts the physical interpretation of the concept
of information. But from a syntactic interpretation, Wicken’s claim
loses its original sense, not because both concepts have the same
meaning, but because the concept of information, as a purely
syntactic concept, completely lacks semantic content.
In this field, Cover and Thomas go further by formulating a
version of the Second Law of Thermodynamics in informational
terms. In particular, they explain the increase of the coarse grained
entropy proposed by Gibbs:
$S_{cg} = k \sum_i P_i \log \frac{1}{P_i}$
where Pi is the probability corresponding to a cell i resulting from
a coarse grained partition of the phase space. Let pn(x) be the prob-
ability distribution over the cells at time tn, and let us suppose that
such a distribution evolves as a Markov chain. Cover and Thomas
(1991, pp. 34–35) demonstrate that the relative entropy between
pn(x) and the uniform stationary distribution p(x) = α – which
represents thermodynamic equilibrium – monotonically decreases
with time. Now, by using the definition of relative entropy (equation
(4.5)), we have:
$D(p_n//p) = \sum_x p_n(x) \log \frac{p_n(x)}{p(x)} = \sum_x p_n(x) \log \frac{1}{p(x)} - \sum_x p_n(x) \log \frac{1}{p_n(x)}$
Hence, by equation (4.1):
$D(p_n//p) = \log \frac{1}{\alpha} - H(X_n)$
Therefore, the monotonic decrease in the relative entropy implies
the monotonic increase in the informational entropy H(Xn), which
here represents the coarse grained thermodynamic entropy Scg.
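The informational version of the Second Law can be illustrated with a short simulation (a sketch under the assumption of a doubly stochastic transition matrix, so that the stationary distribution over the cells is uniform): the relative entropy to the uniform distribution decreases monotonically, and H(Xn) increases accordingly.

```python
import numpy as np

# Sketch: coarse-grained evolution as a Markov chain with a doubly stochastic
# transition matrix (an assumption), so the stationary distribution is uniform.
T = np.array([[0.5, 0.3, 0.2],
              [0.2, 0.5, 0.3],
              [0.3, 0.2, 0.5]])            # rows and columns sum to 1
alpha = 1.0 / 3.0
uniform = np.full(3, alpha)                # stationary distribution p(x) = alpha

def H(p):
    p = p[p > 0]
    return float(np.sum(p * np.log2(1.0 / p)))

def D(p, q):
    mask = p > 0
    return float(np.sum(p[mask] * np.log2(p[mask] / q[mask])))

p = np.array([1.0, 0.0, 0.0])              # initial distribution far from equilibrium
for step in range(6):
    # D decreases monotonically; H(X_n) = log(1/alpha) - D increases monotonically.
    print(f"n={step}  D(p_n//p)={D(p, uniform):.4f}  H(X_n)={H(p):.4f}")
    p = p @ T
```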
These results presented by Cover and Thomas show that to accept
the conceptual difference between thermodynamic entropy and
Shannon entropy does not lead to the conclusion that the syntactic concept
of information is useless in thermodynamics. On the contrary,
Boltzmann’s entropy and coarse grained entropy can be fruitfully
treated by means of the concepts supplied by the syntactic theory of
information.
Another discipline where the syntactic approach to information
shows its applicability is computer science, in particular, the field
of algorithmic complexity. Let X be a finite length binary string; the
algorithmic complexity (Kolmogorov complexity) of X is defined
as:
$K(X) = \min_p l(p)$
where p is a Turing machine program that prints X and halts, and
l(p) is the length of p. Then, K(X) is the shortest description length
over all the descriptions supplied by a Turing machine. Intuitively,
a string has maximum algorithmic complexity when the shortest
program that prints it has approximately the same length as the
string itself. Cover and Thomas (1991, p. 154) demonstrate that the
expected value of the algorithmic complexity of a sequence X is
close to its informational entropy H(X). Thus, a well-known result
about data compression can be seen in a new light: H(X) is a
lower bound on the average length of the shortest description of the
sequence X; but H(X) is also close to the algorithmic complexity
of X. Therefore, through the concept of informational entropy, the
algorithmic complexity of a sequence becomes a measure of its
incompressibility.
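Since K(X) itself is not computable, the connection between entropy and incompressibility is usually illustrated indirectly. In the sketch below (an illustration, not a result from the book; the length of the zlib output is used as a crude stand-in for a description length), the entropy of a Bernoulli(q) binary string lower-bounds the average number of bits per symbol achievable by any lossless code, and the compressed length tracks it roughly.

```python
import math
import random
import zlib

# Sketch: zlib output length as a crude stand-in for description length.
def bernoulli_string(q, n, seed=0):
    rng = random.Random(seed)
    return ''.join('1' if rng.random() < q else '0' for _ in range(n))

def entropy_per_symbol(q):
    """H(q) in bits per symbol for a Bernoulli(q) source."""
    if q in (0.0, 1.0):
        return 0.0
    return -(q * math.log2(q) + (1.0 - q) * math.log2(1.0 - q))

n = 100_000
for q in (0.5, 0.9, 0.99):
    x = bernoulli_string(q, n)
    compressed_bits = 8 * len(zlib.compress(x.encode('ascii'), 9))
    print(f"q={q:.2f}   H={entropy_per_symbol(q):.3f} bits/symbol   "
          f"zlib ≈ {compressed_bits / n:.3f} bits/symbol")
```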
Cover and Thomas extend these results in order to elucidate the
controversial principle of simplicity, according to which, if there
are many explanations consistent with the observed data, one must
choose the simplest one. They demonstrate (1991, pp. 160–161)
that, if p is a program that produces the string X, the probability
of p is 2^{-l(p)}; hence, short programs are much more probable than
longer ones. If X is interpreted as the sequence of observed data and
p is the explanatory algorithm for such data, then this result can be
used to justify the choice of the shortest – the simplest – explanation of the
data. Although one may disagree with this interpretation of Occam’s
Razor, it must be admitted that this is an interesting and precise
elucidation of the concept of simplicity usually invoked in scientific
research.
These are only some examples of the many applications of the
syntactic concept of information. Other fields where the concept is
useful are the generalization of gambling processes, the theory of
optimal investment in the stock market and the computing of error
rates for optimal hypothesis testing. From this syntactic perspective,
communication theory is only an application – of course, a very
important one – of the theory of information: “While it is clear
that Shannon was motivated by problems in communication
theory, we treat information theory as a field of its own with
applications to communication theory and statistics” (Cover and
Thomas, 1991, p. viii). In summary, from the syntactic approach the
concept of information acquires a generality that makes it a powerful
formal tool for science. However, this generality is obtained at
the cost of losing its links of meaning with concepts such as knowledge
or communication. From this view, the word ‘information’ does
not belong to the language of factual sciences or to ordinary
language: it has no semantic content. The concept of information is a
scientific but completely formal concept, whose ‘meaning’ only has
a syntactic dimension; its generality derives from this exclusively
syntactic nature. Therefore, the theory of information becomes a
mathematical theory, a chapter of the theory of probability: only
when its concepts are semantically interpreted can the theory be
applied to very different fields.
5. WHAT IS INFORMATION?
From our previous discussion it is clear that there is not a single
answer to this question. We have shown that there are different
concepts of information, each one of them useful for different
purposes. The semantic concept strongly links information to
knowledge: information is essentially something capable of yielding
knowledge; this concept is useful for cognitive and semantic studies.
The physical concept is the one used in communication theory:
here information is a physical entity that can be generated, trans-
mitted and received for practical purposes. The syntactic concept
is a formal notion with no reference: in this sense, the theory of
information is a mathematical theory, in particular, a chapter of the
theory of probability.
The question is: what are the relationships among these three
concepts? When we talk about three concepts of information we
do not mean that we are facing three rival views, among which
we must choose the correct one. All three concepts are legiti-
mate when properly used. The relationship between the syntactic
concept and the other two is the relationship between a mathematical
object and its interpretations. The wave equation may represent
the mechanical motion of a material medium or the dynamics of
an electromagnetic wave: both cases share nothing but their
syntactic structure. Analogously, the informational entropy H(X)
and the mutual information I(X, Y), as syntactic concepts, have
no reference: their syntactic ‘meaning’ is given by the role they play
in the mathematical theory to which they belong. But when these
syntactic concepts are interpreted, they acquire referential content.
In the semantic theory of information, the relevant quantities are
not the average quantities but their individual correlates I(si) and
I(si, rj ): when both amounts of semantic information are equal, the
occurrence of the state of affairs rj gives us the knowledge of the
occurrence of the state of affairs si. In communication theory, H(S)
measures the average amount of the physical information generated
at the source S, and this physical information is transmitted to the
receiver by means of a carrier signal. But these are not the only
possible interpretations. In computer science, if X is interpreted as a
finite-length binary string, H(X) can be related to the algorithmic
complexity of X. If, in thermodynamics, X is interpreted as a macro-
state compatible with W equiprobable microstates, H(X) represents
Boltzmann’s thermodynamic entropy of X; understanding the
relationship between the syntactic concept of information and its
interpretations serves to evaluate the usually obscure extrapolations
from communication theory to thermodynamics.
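A small computation may help to see in what sense these quantities are purely formal. The following sketch (in Python, with invented probability values; it is an illustration, not material from the original text) computes H(X), H(Y) and I(X, Y) from a bare joint distribution, and shows that for W equiprobable states H reduces to log2 W, the quantity that, under the thermodynamic interpretation, corresponds – up to a constant factor – to Boltzmann’s entropy:

    import math

    def entropy(dist):
        # H = sum p * log2(1/p), computed from the probabilities alone.
        return sum(p * math.log2(1.0 / p) for p in dist.values() if p > 0)

    # An arbitrary joint distribution for two binary variables; the labels
    # X and Y carry no interpretation here.
    p_xy = {("x1", "y1"): 0.4, ("x1", "y2"): 0.1,
            ("x2", "y1"): 0.1, ("x2", "y2"): 0.4}

    def marginal(joint, index):
        out = {}
        for pair, p in joint.items():
            out[pair[index]] = out.get(pair[index], 0.0) + p
        return out

    p_x, p_y = marginal(p_xy, 0), marginal(p_xy, 1)

    # Mutual information via the standard identity I(X, Y) = H(X) + H(Y) - H(X, Y).
    i_xy = entropy(p_x) + entropy(p_y) - entropy(p_xy)

    # For a 'macrostate' with W equiprobable microstates, H equals log2(W);
    # Boltzmann's S = k ln W differs from it only by a constant factor.
    W = 8
    h_w = entropy({i: 1.0 / W for i in range(W)})

    print(entropy(p_x), entropy(p_y), i_xy, h_w, math.log2(W))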
This discussion suggests that there is a severe terminological
problem here. Usually, various meanings are subsumed under the
term ‘information’, and many disagreements result from the lack of a
terminology precise enough to distinguish the different concepts of
information. Therefore, a terminological cleansing is required in
order to avoid this situation. My own proposal is to use the word
‘information’ only for the physical concept: this option preserves
not only the generally accepted links between information and
knowledge, but also the well-established meaning that the concept
of information has in physical sciences. I think that this terminolo-
gical choice retains the pragmatic dimension of the concept to the
extent that it agrees with the vast majority of the uses of the term.
But, what about the semantic and the syntactic concepts? Perhaps
Dretske’s main goal of applying the concept of information to ques-
tions in the theory of knowledge can also be achieved by means
of the physical concept, without commitment to non-physical
information channels. Regarding the syntactic view, it would be
necessary to find a new name that expresses the purely mathema-
tical nature of the theory, avoiding confusions between the formal
concepts and their interpretations. Of course, this terminological
cleansing is not an easy task, because it entails a struggle against
the imprecise application of a vague notion of information in many
contexts. Nevertheless, this becomes a valuable task when we want
to avoid conceptual confusions and futile disputes regarding the
nature of information.
NOTES
1. Here we work with discrete situations, but the definitions can be extended to
the continuous case (cfr. Cover and Thomas, 1991, pp. 224–225).
2. In his original paper, Shannon (1948, p. 349) discusses the reason for the
choice of a logarithmic function and, in particular, of the logarithm to the
base 2 for measuring information.
3. If the natural logarithm is used, the resulting unit of information is called
‘nat’ – a contraction of natural unit –. If the logarithm to base 10 is used,
then the unit of information is the Hartley. The existence of different units
for measuring information shows the importance of distinguishing between
the amount of information associated with an event and the number of binary
symbols necessary to codify the event.
4. Shannon’s Second Theorem demonstrates that the channel capacity is the
maximum rate at which we can send information over the channel and recover
the information at the receiver with a vanishingly small probability of error
(cfr., for instance, Abramson, 1963, pp. 165–182); a numerical sketch of the
capacity of a simple channel appears after these notes.
5. Dretske uses Is(r) for the transinformation and Is(ra) for the new individual
transinformation. We have adapted Dretske’s terminology in order to bring it
closer to the most usual terminology in this field.
6. In fact, the right-hand term of (2.7), when (2.5) and (2.6) are used, is:
Σi,j p(si, rj)I(si, rj) = Σi,j p(si, rj)[I(si) − E(rj)] = Σi,j p(si, rj) log 1/p(si)
− Σi,j p(si, rj) Σk p(sk/rj) log 1/p(sk/rj). Perhaps Dretske made the mistake
referred to above by misusing the subindices of the summations.
7. Dretske says that, in this context, it is not relevant to discuss where the inten-
tional character of laws comes from: “For our purpose it is not important
where natural laws acquire this puzzling property. What is important is
that they have it” (Dretske, 1981, p. 77).
8. I have argued for this view of scientific observation elsewhere (Lombardi,
“Observación e Información”, future publication in Analogia): if we want
every state of the receiver to let us know which state of the observed
entity occurred, the so-called “backward probabilities” p(si/rj) (cfr.
Abramson, 1963, p. 99) must have the value 0 or 1, and this is what happens
in an equivocation-free channel (see the numerical sketch after these notes).
This explains why noise does not prevent observation: indeed, practical
situations usually involve noisy channels, and much technological effort is
devoted to designing appropriate filters that block the noise-bearing spurious
signals. I have also argued that, unlike the informational account of
observation, the causal account does not allow us to recognize (i) situations
that are observationally equivalent but causally different, and (ii) situations
that are physically – and, therefore, causally – identical but informationally
different and which, for this reason, represent different cases of observation.
9. The experiment described in the well-known article by Einstein, Podolsky and
Rosen (1935).
10. Note that this kind of situation does not always involve a common cause. In
Dretske’s example of the source transmitting to two receivers, the correlations
between RA and RB can be explained by a common cause at S. But it is usually
accepted that quantum EPR correlations cannot be explained by means of a
common-cause argument (cfr. Hughes, 1989). However, in both cases the
correlations depend on underlying nomic regularities.
11. This does not mean that Cover and Thomas are absolutely original. For
example, Reza (1961, p. 1) considers information theory as a new chapter
of the theory of probability; however, his presentation of the subject follows
the orthodox line of exposition in terms of communication and signal trans-
mission. An author who adopts a completely syntactic approach is Khinchin
(1957); nevertheless, his text is not as rich in applications as the book by Cover
and Thomas and was not so widely used.
12. In the definition of D(p//q), the convention – based on continuity arguments
– that 0 log 0/q = 0 and p log p/0 = ∞ is used. D(p//q) is also referred to
as the ‘distance’ between the distributions p and q; however, it is not a true
distance between distributions, since it is not symmetric and does not satisfy
the triangle inequality (see the numerical illustration after these notes).
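As a numerical companion to Note 4, the following sketch (in Python, with an invented crossover probability; an illustration under these assumptions, not material from the text) maximizes I(S, R) over the input distribution of a binary symmetric channel and recovers the well-known capacity, namely 1 minus the binary entropy of the crossover probability:

    import math

    def binary_entropy(p):
        # H(p) in bits, with the 0 log 0 = 0 convention.
        if p in (0.0, 1.0):
            return 0.0
        return p * math.log2(1.0 / p) + (1 - p) * math.log2(1.0 / (1 - p))

    def transinformation(p_s1, flip):
        # I(S, R) = I(R) - N for a binary symmetric channel with crossover
        # probability 'flip' and source distribution (p_s1, 1 - p_s1).
        p_r1 = p_s1 * (1 - flip) + (1 - p_s1) * flip
        return binary_entropy(p_r1) - binary_entropy(flip)

    flip = 0.1
    capacity = max(transinformation(p / 1000.0, flip) for p in range(1001))
    print(capacity, 1 - binary_entropy(flip))  # both close to 0.531 bits per symbol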
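The claim in Note 8 can also be checked directly. The sketch below (an invented two-state source and channel matrix, offered only as an illustration) builds a channel that is noisy but equivocation-free – each column of the matrix has a single non-zero element – and verifies that every backward probability p(si/rj) equals 0 or 1, so that each state of the receiver identifies the state of the source despite the presence of noise:

    # Invented example: two source states, four receiver states.
    p_s = {"s1": 0.3, "s2": 0.7}

    # Channel matrix p(r/s): each row has several non-zero entries (noise),
    # but each column has exactly one non-zero entry (no equivocation).
    channel = {"s1": {"r1": 0.7, "r2": 0.3, "r3": 0.0, "r4": 0.0},
               "s2": {"r1": 0.0, "r2": 0.0, "r3": 0.6, "r4": 0.4}}

    # Joint probabilities p(s, r) and marginals p(r).
    p_joint = {(s, r): p_s[s] * channel[s][r] for s in p_s for r in channel[s]}
    p_r = {}
    for (s, r), p in p_joint.items():
        p_r[r] = p_r.get(r, 0.0) + p

    # Backward probabilities p(s/r): every value printed is 0.0 or 1.0.
    for (s, r), p in sorted(p_joint.items()):
        print(s, r, p / p_r[r])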
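Finally, the asymmetry mentioned in Note 12 can be exhibited with two invented distributions over the same three outcomes (again, only an illustration): D(p//q) and D(q//p) take different values.

    import math

    def relative_entropy(a, b):
        # D(a//b) = sum a_i * log2(a_i / b_i), with the 0 log 0/q = 0 convention.
        return sum(ai * math.log2(ai / bi) for ai, bi in zip(a, b) if ai > 0)

    p = [0.8, 0.1, 0.1]
    q = [0.4, 0.4, 0.2]

    print(relative_entropy(p, q), relative_entropy(q, p))  # approximately 0.5 and 0.6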
REFERENCES
Abramson, N.: 1963, Information Theory and Coding. New York: McGraw-Hill.
Bell, D.A.: 1957, Information Theory and its Engineering Applications. London:
Pitman & Sons.
Brown, H.I.: 1987, Observation and Objectivity. New York/Oxford: Oxford
University Press.
Cover, T. and J.A. Thomas: 1991, Elements of Information Theory. New York:
John Wiley & Sons.
Dennett, D.C.: 1969, Content and Consciousness. London: Routledge & Kegan
Paul.
Dretske, F.: 1981, Knowledge and the Flow of Information. Cambridge, MA: MIT
Press.
Einstein, A., B. Podolsky and N. Rosen: 1935, Can Quantum-Mechanical
Description of Physical Reality be Considered Complete? Physical Review 47:
777–780.
Hughes, R.I.G.: 1989, The Structure and Interpretation of Quantum Mechanics.
Cambridge, MA: Harvard University Press.
Jammer, M.: 1974, The Philosophy of Quantum Mechanics. New York: John
Wiley & Sons.
Khinchin, A.I.: 1957, Mathematical Foundations of Information Theory. New
York: Dover Publications.
Kosso, P.: 1989, Observability and Observation in Physical Science. Dordrecht:
Kluwer Academic Publishers.
Reza, F.M.: 1961, Introduction to Information Theory. New York: McGraw-Hill.
Shannon, C.: 1948, A Mathematical Theory of Communication. Bell System
Technical Journal 27: 379–423.
Shapere, D.: 1982, The Concept of Observation in Science and Philosophy.
Philosophy of Science 49: 485–525.
Stonier, T.: 1990, Information and the Internal Structure of the Universe. London:
Springer-Verlag.
Wicken, J.S.: 1987, Entropy and Information: Suggestions for Common
Language. Philosophy of Science 54: 176–193.
Universidad Nacional de Quilmes-CONICET
Crisólogo Larralde 3440
6◦D, 1430, Ciudad de Buenos Aires
Argentina
E-mail: olimpiafilo@arnet.com.ar
View publication stats

More Related Content

Similar to What_is_Information.pdf

Information Retrieval using Semantic Similarity
Information Retrieval using Semantic SimilarityInformation Retrieval using Semantic Similarity
Information Retrieval using Semantic SimilaritySaswat Padhi
 
Introduction to Information Theory and Coding.pdf
Introduction to Information Theory and Coding.pdfIntroduction to Information Theory and Coding.pdf
Introduction to Information Theory and Coding.pdf
Jimma University
 
What's at Stake in the Information Debate?
What's at Stake in the Information Debate?What's at Stake in the Information Debate?
What's at Stake in the Information Debate?
Craig Simon
 
Privacy Things: Systematic Approach to Privacy and Personal Identifiable Info...
Privacy Things: Systematic Approach to Privacy and Personal Identifiable Info...Privacy Things: Systematic Approach to Privacy and Personal Identifiable Info...
Privacy Things: Systematic Approach to Privacy and Personal Identifiable Info...
IJCSIS Research Publications
 
General introduction to logic
General introduction to logicGeneral introduction to logic
General introduction to logic
MUHAMMAD RIAZ
 
Measuring Social Complexity and the Emergence of Cooperation from Entropic Pr...
Measuring Social Complexity and the Emergence of Cooperation from Entropic Pr...Measuring Social Complexity and the Emergence of Cooperation from Entropic Pr...
Measuring Social Complexity and the Emergence of Cooperation from Entropic Pr...
IJEAB
 
Allerton
AllertonAllerton
Allerton
mustafa sarac
 
Transmission Of Multimedia Data Over Wireless Ad-Hoc Networks
Transmission Of Multimedia Data Over Wireless Ad-Hoc NetworksTransmission Of Multimedia Data Over Wireless Ad-Hoc Networks
Transmission Of Multimedia Data Over Wireless Ad-Hoc Networks
Jan Champagne
 
Visual mapping sentence_a_methodological (1)
Visual mapping sentence_a_methodological (1)Visual mapping sentence_a_methodological (1)
Visual mapping sentence_a_methodological (1)
Natalia Djahi
 
Lri Owl And Ontologies 04 04
Lri Owl And Ontologies 04 04Lri Owl And Ontologies 04 04
Lri Owl And Ontologies 04 04Rinke Hoekstra
 
INFORMATION_THEORY.pdf
INFORMATION_THEORY.pdfINFORMATION_THEORY.pdf
INFORMATION_THEORY.pdf
temmy7
 
Unit I.pptx INTRODUCTION TO DIGITAL COMMUNICATION
Unit I.pptx INTRODUCTION TO DIGITAL COMMUNICATIONUnit I.pptx INTRODUCTION TO DIGITAL COMMUNICATION
Unit I.pptx INTRODUCTION TO DIGITAL COMMUNICATION
rubini Rubini
 
A model theory of induction
A model theory of inductionA model theory of induction
A model theory of induction
Duwan Arismendy
 
AI3391 Artificial intelligence Unit IV Notes _ merged.pdf
AI3391 Artificial intelligence Unit IV Notes _ merged.pdfAI3391 Artificial intelligence Unit IV Notes _ merged.pdf
AI3391 Artificial intelligence Unit IV Notes _ merged.pdf
Asst.prof M.Gokilavani
 
Information Theory - Introduction
Information Theory  -  IntroductionInformation Theory  -  Introduction
Information Theory - Introduction
Burdwan University
 
The laboratoryandthemarketinee bookchapter10pdf_merged
The laboratoryandthemarketinee bookchapter10pdf_mergedThe laboratoryandthemarketinee bookchapter10pdf_merged
The laboratoryandthemarketinee bookchapter10pdf_merged
JeenaDC
 
Information theory & coding PPT Full Syllabus.pptx
Information theory & coding PPT Full Syllabus.pptxInformation theory & coding PPT Full Syllabus.pptx
Information theory & coding PPT Full Syllabus.pptx
prernaguptaec
 
UNIT-2.pdf
UNIT-2.pdfUNIT-2.pdf
Linked sensor data
Linked sensor dataLinked sensor data
Linked sensor data
Tamiris Sousa
 
A PHYSICAL THEORY OF INFORMATION VS. A MATHEMATICAL THEORY OF COMMUNICATION
A PHYSICAL THEORY OF INFORMATION VS. A MATHEMATICAL THEORY OF COMMUNICATIONA PHYSICAL THEORY OF INFORMATION VS. A MATHEMATICAL THEORY OF COMMUNICATION
A PHYSICAL THEORY OF INFORMATION VS. A MATHEMATICAL THEORY OF COMMUNICATION
ijistjournal
 

Similar to What_is_Information.pdf (20)

Information Retrieval using Semantic Similarity
Information Retrieval using Semantic SimilarityInformation Retrieval using Semantic Similarity
Information Retrieval using Semantic Similarity
 
Introduction to Information Theory and Coding.pdf
Introduction to Information Theory and Coding.pdfIntroduction to Information Theory and Coding.pdf
Introduction to Information Theory and Coding.pdf
 
What's at Stake in the Information Debate?
What's at Stake in the Information Debate?What's at Stake in the Information Debate?
What's at Stake in the Information Debate?
 
Privacy Things: Systematic Approach to Privacy and Personal Identifiable Info...
Privacy Things: Systematic Approach to Privacy and Personal Identifiable Info...Privacy Things: Systematic Approach to Privacy and Personal Identifiable Info...
Privacy Things: Systematic Approach to Privacy and Personal Identifiable Info...
 
General introduction to logic
General introduction to logicGeneral introduction to logic
General introduction to logic
 
Measuring Social Complexity and the Emergence of Cooperation from Entropic Pr...
Measuring Social Complexity and the Emergence of Cooperation from Entropic Pr...Measuring Social Complexity and the Emergence of Cooperation from Entropic Pr...
Measuring Social Complexity and the Emergence of Cooperation from Entropic Pr...
 
Allerton
AllertonAllerton
Allerton
 
Transmission Of Multimedia Data Over Wireless Ad-Hoc Networks
Transmission Of Multimedia Data Over Wireless Ad-Hoc NetworksTransmission Of Multimedia Data Over Wireless Ad-Hoc Networks
Transmission Of Multimedia Data Over Wireless Ad-Hoc Networks
 
Visual mapping sentence_a_methodological (1)
Visual mapping sentence_a_methodological (1)Visual mapping sentence_a_methodological (1)
Visual mapping sentence_a_methodological (1)
 
Lri Owl And Ontologies 04 04
Lri Owl And Ontologies 04 04Lri Owl And Ontologies 04 04
Lri Owl And Ontologies 04 04
 
INFORMATION_THEORY.pdf
INFORMATION_THEORY.pdfINFORMATION_THEORY.pdf
INFORMATION_THEORY.pdf
 
Unit I.pptx INTRODUCTION TO DIGITAL COMMUNICATION
Unit I.pptx INTRODUCTION TO DIGITAL COMMUNICATIONUnit I.pptx INTRODUCTION TO DIGITAL COMMUNICATION
Unit I.pptx INTRODUCTION TO DIGITAL COMMUNICATION
 
A model theory of induction
A model theory of inductionA model theory of induction
A model theory of induction
 
AI3391 Artificial intelligence Unit IV Notes _ merged.pdf
AI3391 Artificial intelligence Unit IV Notes _ merged.pdfAI3391 Artificial intelligence Unit IV Notes _ merged.pdf
AI3391 Artificial intelligence Unit IV Notes _ merged.pdf
 
Information Theory - Introduction
Information Theory  -  IntroductionInformation Theory  -  Introduction
Information Theory - Introduction
 
The laboratoryandthemarketinee bookchapter10pdf_merged
The laboratoryandthemarketinee bookchapter10pdf_mergedThe laboratoryandthemarketinee bookchapter10pdf_merged
The laboratoryandthemarketinee bookchapter10pdf_merged
 
Information theory & coding PPT Full Syllabus.pptx
Information theory & coding PPT Full Syllabus.pptxInformation theory & coding PPT Full Syllabus.pptx
Information theory & coding PPT Full Syllabus.pptx
 
UNIT-2.pdf
UNIT-2.pdfUNIT-2.pdf
UNIT-2.pdf
 
Linked sensor data
Linked sensor dataLinked sensor data
Linked sensor data
 
A PHYSICAL THEORY OF INFORMATION VS. A MATHEMATICAL THEORY OF COMMUNICATION
A PHYSICAL THEORY OF INFORMATION VS. A MATHEMATICAL THEORY OF COMMUNICATIONA PHYSICAL THEORY OF INFORMATION VS. A MATHEMATICAL THEORY OF COMMUNICATION
A PHYSICAL THEORY OF INFORMATION VS. A MATHEMATICAL THEORY OF COMMUNICATION
 

Recently uploaded

PIMS Job Advertisement 2024.pdf Islamabad
PIMS Job Advertisement 2024.pdf IslamabadPIMS Job Advertisement 2024.pdf Islamabad
PIMS Job Advertisement 2024.pdf Islamabad
AyyanKhan40
 
Normal Labour/ Stages of Labour/ Mechanism of Labour
Normal Labour/ Stages of Labour/ Mechanism of LabourNormal Labour/ Stages of Labour/ Mechanism of Labour
Normal Labour/ Stages of Labour/ Mechanism of Labour
Wasim Ak
 
Acetabularia Information For Class 9 .docx
Acetabularia Information For Class 9  .docxAcetabularia Information For Class 9  .docx
Acetabularia Information For Class 9 .docx
vaibhavrinwa19
 
S1-Introduction-Biopesticides in ICM.pptx
S1-Introduction-Biopesticides in ICM.pptxS1-Introduction-Biopesticides in ICM.pptx
S1-Introduction-Biopesticides in ICM.pptx
tarandeep35
 
TESDA TM1 REVIEWER FOR NATIONAL ASSESSMENT WRITTEN AND ORAL QUESTIONS WITH A...
TESDA TM1 REVIEWER  FOR NATIONAL ASSESSMENT WRITTEN AND ORAL QUESTIONS WITH A...TESDA TM1 REVIEWER  FOR NATIONAL ASSESSMENT WRITTEN AND ORAL QUESTIONS WITH A...
TESDA TM1 REVIEWER FOR NATIONAL ASSESSMENT WRITTEN AND ORAL QUESTIONS WITH A...
EugeneSaldivar
 
BÀI TẬP BỔ TRỢ TIẾNG ANH GLOBAL SUCCESS LỚP 3 - CẢ NĂM (CÓ FILE NGHE VÀ ĐÁP Á...
BÀI TẬP BỔ TRỢ TIẾNG ANH GLOBAL SUCCESS LỚP 3 - CẢ NĂM (CÓ FILE NGHE VÀ ĐÁP Á...BÀI TẬP BỔ TRỢ TIẾNG ANH GLOBAL SUCCESS LỚP 3 - CẢ NĂM (CÓ FILE NGHE VÀ ĐÁP Á...
BÀI TẬP BỔ TRỢ TIẾNG ANH GLOBAL SUCCESS LỚP 3 - CẢ NĂM (CÓ FILE NGHE VÀ ĐÁP Á...
Nguyen Thanh Tu Collection
 
Top five deadliest dog breeds in America
Top five deadliest dog breeds in AmericaTop five deadliest dog breeds in America
Top five deadliest dog breeds in America
Bisnar Chase Personal Injury Attorneys
 
A Strategic Approach: GenAI in Education
A Strategic Approach: GenAI in EducationA Strategic Approach: GenAI in Education
A Strategic Approach: GenAI in Education
Peter Windle
 
CACJapan - GROUP Presentation 1- Wk 4.pdf
CACJapan - GROUP Presentation 1- Wk 4.pdfCACJapan - GROUP Presentation 1- Wk 4.pdf
CACJapan - GROUP Presentation 1- Wk 4.pdf
camakaiclarkmusic
 
World environment day ppt For 5 June 2024
World environment day ppt For 5 June 2024World environment day ppt For 5 June 2024
World environment day ppt For 5 June 2024
ak6969907
 
Assignment_4_ArianaBusciglio Marvel(1).docx
Assignment_4_ArianaBusciglio Marvel(1).docxAssignment_4_ArianaBusciglio Marvel(1).docx
Assignment_4_ArianaBusciglio Marvel(1).docx
ArianaBusciglio
 
PCOS corelations and management through Ayurveda.
PCOS corelations and management through Ayurveda.PCOS corelations and management through Ayurveda.
PCOS corelations and management through Ayurveda.
Dr. Shivangi Singh Parihar
 
Landownership in the Philippines under the Americans-2-pptx.pptx
Landownership in the Philippines under the Americans-2-pptx.pptxLandownership in the Philippines under the Americans-2-pptx.pptx
Landownership in the Philippines under the Americans-2-pptx.pptx
JezreelCabil2
 
Lapbook sobre os Regimes Totalitários.pdf
Lapbook sobre os Regimes Totalitários.pdfLapbook sobre os Regimes Totalitários.pdf
Lapbook sobre os Regimes Totalitários.pdf
Jean Carlos Nunes Paixão
 
Natural birth techniques - Mrs.Akanksha Trivedi Rama University
Natural birth techniques - Mrs.Akanksha Trivedi Rama UniversityNatural birth techniques - Mrs.Akanksha Trivedi Rama University
Natural birth techniques - Mrs.Akanksha Trivedi Rama University
Akanksha trivedi rama nursing college kanpur.
 
How to Add Chatter in the odoo 17 ERP Module
How to Add Chatter in the odoo 17 ERP ModuleHow to Add Chatter in the odoo 17 ERP Module
How to Add Chatter in the odoo 17 ERP Module
Celine George
 
Executive Directors Chat Leveraging AI for Diversity, Equity, and Inclusion
Executive Directors Chat  Leveraging AI for Diversity, Equity, and InclusionExecutive Directors Chat  Leveraging AI for Diversity, Equity, and Inclusion
Executive Directors Chat Leveraging AI for Diversity, Equity, and Inclusion
TechSoup
 
Digital Artefact 1 - Tiny Home Environmental Design
Digital Artefact 1 - Tiny Home Environmental DesignDigital Artefact 1 - Tiny Home Environmental Design
Digital Artefact 1 - Tiny Home Environmental Design
amberjdewit93
 
The simplified electron and muon model, Oscillating Spacetime: The Foundation...
The simplified electron and muon model, Oscillating Spacetime: The Foundation...The simplified electron and muon model, Oscillating Spacetime: The Foundation...
The simplified electron and muon model, Oscillating Spacetime: The Foundation...
RitikBhardwaj56
 
Group Presentation 2 Economics.Ariana Buscigliopptx
Group Presentation 2 Economics.Ariana BuscigliopptxGroup Presentation 2 Economics.Ariana Buscigliopptx
Group Presentation 2 Economics.Ariana Buscigliopptx
ArianaBusciglio
 

Recently uploaded (20)

PIMS Job Advertisement 2024.pdf Islamabad
PIMS Job Advertisement 2024.pdf IslamabadPIMS Job Advertisement 2024.pdf Islamabad
PIMS Job Advertisement 2024.pdf Islamabad
 
Normal Labour/ Stages of Labour/ Mechanism of Labour
Normal Labour/ Stages of Labour/ Mechanism of LabourNormal Labour/ Stages of Labour/ Mechanism of Labour
Normal Labour/ Stages of Labour/ Mechanism of Labour
 
Acetabularia Information For Class 9 .docx
Acetabularia Information For Class 9  .docxAcetabularia Information For Class 9  .docx
Acetabularia Information For Class 9 .docx
 
S1-Introduction-Biopesticides in ICM.pptx
S1-Introduction-Biopesticides in ICM.pptxS1-Introduction-Biopesticides in ICM.pptx
S1-Introduction-Biopesticides in ICM.pptx
 
TESDA TM1 REVIEWER FOR NATIONAL ASSESSMENT WRITTEN AND ORAL QUESTIONS WITH A...
TESDA TM1 REVIEWER  FOR NATIONAL ASSESSMENT WRITTEN AND ORAL QUESTIONS WITH A...TESDA TM1 REVIEWER  FOR NATIONAL ASSESSMENT WRITTEN AND ORAL QUESTIONS WITH A...
TESDA TM1 REVIEWER FOR NATIONAL ASSESSMENT WRITTEN AND ORAL QUESTIONS WITH A...
 
BÀI TẬP BỔ TRỢ TIẾNG ANH GLOBAL SUCCESS LỚP 3 - CẢ NĂM (CÓ FILE NGHE VÀ ĐÁP Á...
BÀI TẬP BỔ TRỢ TIẾNG ANH GLOBAL SUCCESS LỚP 3 - CẢ NĂM (CÓ FILE NGHE VÀ ĐÁP Á...BÀI TẬP BỔ TRỢ TIẾNG ANH GLOBAL SUCCESS LỚP 3 - CẢ NĂM (CÓ FILE NGHE VÀ ĐÁP Á...
BÀI TẬP BỔ TRỢ TIẾNG ANH GLOBAL SUCCESS LỚP 3 - CẢ NĂM (CÓ FILE NGHE VÀ ĐÁP Á...
 
Top five deadliest dog breeds in America
Top five deadliest dog breeds in AmericaTop five deadliest dog breeds in America
Top five deadliest dog breeds in America
 
A Strategic Approach: GenAI in Education
A Strategic Approach: GenAI in EducationA Strategic Approach: GenAI in Education
A Strategic Approach: GenAI in Education
 
CACJapan - GROUP Presentation 1- Wk 4.pdf
CACJapan - GROUP Presentation 1- Wk 4.pdfCACJapan - GROUP Presentation 1- Wk 4.pdf
CACJapan - GROUP Presentation 1- Wk 4.pdf
 
World environment day ppt For 5 June 2024
World environment day ppt For 5 June 2024World environment day ppt For 5 June 2024
World environment day ppt For 5 June 2024
 
Assignment_4_ArianaBusciglio Marvel(1).docx
Assignment_4_ArianaBusciglio Marvel(1).docxAssignment_4_ArianaBusciglio Marvel(1).docx
Assignment_4_ArianaBusciglio Marvel(1).docx
 
PCOS corelations and management through Ayurveda.
PCOS corelations and management through Ayurveda.PCOS corelations and management through Ayurveda.
PCOS corelations and management through Ayurveda.
 
Landownership in the Philippines under the Americans-2-pptx.pptx
Landownership in the Philippines under the Americans-2-pptx.pptxLandownership in the Philippines under the Americans-2-pptx.pptx
Landownership in the Philippines under the Americans-2-pptx.pptx
 
Lapbook sobre os Regimes Totalitários.pdf
Lapbook sobre os Regimes Totalitários.pdfLapbook sobre os Regimes Totalitários.pdf
Lapbook sobre os Regimes Totalitários.pdf
 
Natural birth techniques - Mrs.Akanksha Trivedi Rama University
Natural birth techniques - Mrs.Akanksha Trivedi Rama UniversityNatural birth techniques - Mrs.Akanksha Trivedi Rama University
Natural birth techniques - Mrs.Akanksha Trivedi Rama University
 
How to Add Chatter in the odoo 17 ERP Module
How to Add Chatter in the odoo 17 ERP ModuleHow to Add Chatter in the odoo 17 ERP Module
How to Add Chatter in the odoo 17 ERP Module
 
Executive Directors Chat Leveraging AI for Diversity, Equity, and Inclusion
Executive Directors Chat  Leveraging AI for Diversity, Equity, and InclusionExecutive Directors Chat  Leveraging AI for Diversity, Equity, and Inclusion
Executive Directors Chat Leveraging AI for Diversity, Equity, and Inclusion
 
Digital Artefact 1 - Tiny Home Environmental Design
Digital Artefact 1 - Tiny Home Environmental DesignDigital Artefact 1 - Tiny Home Environmental Design
Digital Artefact 1 - Tiny Home Environmental Design
 
The simplified electron and muon model, Oscillating Spacetime: The Foundation...
The simplified electron and muon model, Oscillating Spacetime: The Foundation...The simplified electron and muon model, Oscillating Spacetime: The Foundation...
The simplified electron and muon model, Oscillating Spacetime: The Foundation...
 
Group Presentation 2 Economics.Ariana Buscigliopptx
Group Presentation 2 Economics.Ariana BuscigliopptxGroup Presentation 2 Economics.Ariana Buscigliopptx
Group Presentation 2 Economics.Ariana Buscigliopptx
 

What_is_Information.pdf

  • 1. See discussions, stats, and author profiles for this publication at: https://www.researchgate.net/publication/226335374 What is Information? Article in Foundations of Science · January 2004 DOI: 10.1023/B:FODA.0000025034.53313.7c CITATIONS 23 READS 49,598 1 author: Some of the authors of this publication are also working on these related projects: VII Conference On Quantum Foundations - Website: https://sites.google.com/site/viijornadasfundamentoscuantica/ View project Problemas filosóficos de la dinámica científica View project Olimpia Lombardi CONICET-National Scientific and Technical Research Council 161 PUBLICATIONS 1,856 CITATIONS SEE PROFILE All content following this page was uploaded by Olimpia Lombardi on 13 March 2014. The user has requested enhancement of the downloaded file.
  • 2. OLIMPIA LOMBARDI WHAT IS INFORMATION? ABSTRACT. The main aim of this work is to contribute to the elucidation of the concept of information by comparing three different views about this matter: the view of Fred Dretske’s semantic theory of information, the perspective adopted by Peter Kosso in his interaction-information account of scientific observation, and the syntactic approach of Thomas Cover and Joy Thomas. We will see that these views involve very different concepts of information, each one useful in its own field of application. This comparison will allow us to argue in favor of a termino- logical ‘cleansing’: it is necessary to make a terminological distinction among the different concepts of information, in order to avoid conceptual confusions when the word ‘information’ is used to elucidate related concepts as knowledge, observation or entropy. KEY WORDS: communication, information, knowledge, observation, prob- ability ‘We live in the age of information’: this sentence has become a commonplace of our times. Our everyday language includes the word ‘information’ in a variety of different contexts. It seems that we all precisely know what information is. Moreover, the explosion in telecommunications and computer sciences endows the concept of information with a scientific prestige that makes supposedly unnecessary any further explanation. This apparent self-evidence has entered the philosophical literature: philosophers usually handle the concept of information with no careful discussion. However, the understanding of the meaning of the word ‘infor- mation’ is far from being so simple. The supposed agreement hides the fact that many different senses of the same word coexist. The flexibility of the concept of information makes it a diffuse notion with a wide but vague application. If we ask: ‘What is information?’, we will obtain as many different definitions as answers. The main aim of this work is to contribute to the elucida- tion of the concept of information by comparing three different views about this matter: the view of Fred Dretske’s semantic theory Foundations of Science 9: 105–134, 2004. © 2004 Kluwer Academic Publishers. Printed in the Netherlands.
  • 3. 106 OLIMPIA LOMBARDI of information, the perspective adopted by Peter Kosso in his interaction-information account of scientific observation, and the syntactic approach of Thomas Cover and Joy Thomas. We will see that these views involve very different concepts of information, each one useful in its own field of application. This comparison will allow us to argue in favor of a terminological ‘cleansing’: it is necessary to make a terminological distinction among the different concepts of information, in order to avoid conceptual confusions when the word ‘information’ is used to elucidate related concepts as knowledge, observation or entropy. 1. BASIC CONCEPTS OF SHANNON’S THEORY We will begin our discussion by presenting some basic concepts of Shannon’s theory, because all the three views that we will analyze adopt this theory as their formal basis. The concepts introduced in this section will be necessary for our further discussion. The Theory of Information was formulated to solve certain specific technological problems. In the early 1940s, it was thought that increasing the transmission rate of information over a communi- cation channel would increase the probability of error. With his paper “The Mathematical Theory of Communication”, Claude Shannon (1948) surprised the communication theory community by proving that this was not true as long as the communication rate was below the channel capacity; this capacity can be easily computed from the characteristics of the channel. This paper was immediately followed by many works of application to fields as radio, television and telephony. At present, Shannon’s theory has become a basic element of the communication engineers training. Communication requires a source S, a receiver R and a channel CH: If S has a range of possible states s1, . . ., sn, whose probabilities of occurrence are p(s1), . . ., p(sn), the amount of information generated at the source by the occurrence of si is:1
  • 4. WHAT IS INFORMATION? 107 I(si) = log1/p(si) (1.1) Where ‘log’ is the logarithm to the base 2.2 The choice of a logarithmic base amounts to a choice of a unit for measuring infor- mation. If the base 2 is used, the resulting unit is called ‘bit’ – a contraction of binary unit –.3 One bit is the amount of infor- mation obtained when one of two equally likely alternatives is specified. However, the theory is not concerned with the occurrence of specific events, but with the communication process as a whole. Hence, the average amount of information generated at the source is defined as the average of the I(si) weighted by the probability of occurrence of each state: I(S) = p(si)I(si) = p(si)log1/p(si) (1.2) I(S) has its maximum value equal to log n, when all the p(si) have the same value, p(si) = 1/n. By analogy, if R has a range of possible states r1, . . ., rm, whose probabilities of occurrence are p(r1), . . ., p(rm), the amount of information received at the receiver by the occurrence of ri is: I(ri) = log1/p(ri) (1.3) And the average amount of information received at the receiver is defined as: I(R) = p(ri)I(ri) = p(ri)log1/p(ri) (1.4) The relationship between I(S) and I(R) can be represented by the following diagram: where:
  • 5. 108 OLIMPIA LOMBARDI • I(S, R) transinformation. Average amount of information generated at S and received at R. • E: equivocation. Average amount of information generated at S but not received at R. • N: noise. Average amount of information received at R but not generated at S. As the diagram shows, I(S, R) can be computed as: I(S, R) = I(S) − E = I(R) − N (1.5) E and N are measures of the amount of dependence between the source S and de receiver R: • If S and R are totally independent, the values of E and N are maximum (E = I(S) and N = I(R)), and the value of I(S, R) is minimum (I(S, R) = 0). • If the dependence between S and R is maximum, the values of E and N are minimum (E = N = 0), and the value of I(S, R) is maximum (I(S, R) = I(S) = I(R)). The values of E and N are not only function of the source and the receiver, but also of the communication channel. The introduction of the communication channel leads directly to the possibility of errors arising in the process of transmission: the channel CH is defined by the matrix [p(rj /si)], where p(rj /si) is the conditional probability of the occurrence of rj given that si occurred, and the elements in any row must sum to 1. Thus, the definitions of E and N are: E = p(rj )p(si/rj )log1/p(si/rj ) (1.6) = p(rj , si)log1/p(si/rj ) N = p(si)p(rj /si)log1/p(rj /si) (1.7) = p(si, rj )log1/p(rj /si) where p(si, rj ) = p(rj , si) is the joint probability of si and rj (p(si, rj ) = p(si)p(rj /si); p(rj , si) = p(rj )p(si/rj)). The channel capacity is given by: C = max I(S, R) (1.8) where the maximum is taken over all the possible distributions p(si) at the source.4
  • 6. WHAT IS INFORMATION? 109 The strong relationship between the characteristics of the channel and the values of E and N allows us to define two special types of communication channels: • Equivocation-free channel (E = 0): a channel defined by a matrix with one, and only one, non-zero element in each column. • Noise-free channel (N = 0): a channel defined by a matrix with one, and only one, non-zero element in each row. 2. INFORMATION AND KNOWLEDGE: DRETSKE’S SEMANTIC THEORY OF INFORMATION A concept usually connected with the notion of information is the concept of knowledge: it is assumed that information provides knowledge, that it modifies the state of knowledge of those who receive it. Some authors even define the measure of information in terms of knowledge; this is the case of D.A. Bell in his well-known textbook, where he states that information “is measured as a differ- ence between the state of knowledge of the recipient before and after the communication of information” (Bell, 1957, p. 7). In his Knowledge and the Flow of Information, Fred Dretske presents an attempt to apply a semantic concept of information to questions in the theory of knowledge. By identifying knowledge and information-caused belief, he distinguishes between sensory processes and cognitive processes – between seeing and recognizing – in terms of the different ways in which information is coded, and analyzes the capacity of holding beliefs and developing concepts. But this is not the part of his work with which we are concerned; we are interested on his interpretation of the concept of informa- tion. Dretske adopts Shannon’s theory as a starting point, but he introduces two new elements into his approach. First, he proposes a change in the basic formulas of the theory. Second, he supplements the resulting formal theory with a semantic dimension. Let us begin with the first point. According to Dretske, one of the respects in which Shannon’s theory is unprepared to deal with semantic issues is that semantic notions apply to particular messages, while the theory of informa- tion deals with average amounts of information. Since Dretske is
  • 7. 110 OLIMPIA LOMBARDI concerned with seeking an information-based theory of knowledge, he is interested on the informational content of particular messages and not on average amounts of information: “if information theory is to tell us anything about the informational content of signals, it must forsake its concern with averages and tell us something about the information contained in particular messages and signals. For it is only particular messages and signals that have a content” (Dretske, 1981, p. 48). In order to focus on the infor- mation contained in particular messages, Dretske changes the usual interpretation about the relevant quantities of the theory: instead of considering the average amount of information I(S) as the basic quantity (equation (1.2)), he focuses on the amount of information generated at the source by the occurrence of sa (equation (1.1)): I(sa) = log1/p(sa) (2.1) and instead of adopting the transinformation I(S, R) as the relevant quantity, he defines a new ‘individual’ transinformation I(sa, ra), amount of information carried by a particular signal ra about sa, by analogy with equation (1.5) (Dretske, 1981, p. 52):5 I(sa, ra) = I(sa) − E(ra) (2.2) where: E(ra) = p(si/ra)log1/p(si/ra) (2.3) According to Dretske (1981, p. 24), E(ra) is the contribution of ra to the equivocation E because, given the definition of E (equation (1.6)), results that: E = p(rj )p(si/rj )log1/p(si/rj ) = p(rj )E(rj ) (2.4) Dretske foresees that he will be accused of misrepresenting or misunderstanding the theory of information. For this reason, he emphasizes that “the above formulas are now be assigned a significance, given an interpretation, that they do not have in standard applications of communication theory. They are now being used to define the amount of information associated with particular events and signals” (Dretske, 1981, p. 52). And he immediately adds that, even though such an interpretation is foreign
  • 8. WHAT IS INFORMATION? 111 to standard applications of the theory, it is “perfectly consistent” with the orthodox uses of these formulas. Dretske’s aim of adapting the standard theory of information to make it capable of dealing with the information contained in particular messages is very valuable. The problem is that the formal resources to reach this goal have deep technical difficulties. The least of them is talking about the ‘signal ra’ in the definition of I(sa, ra) (equation (2.2)): ra is not a signal but one of the states of the receiver. I(sa, ra) should be defined as the amount of infor- mation about the state sa of the source contained in the state ra of the receiver. It is even more troubling that Dretske uses the same subindex ‘a’ to refer to the state of the source and to the state of the receiver, as if some specific relationship linked certain pairs (s, r). In order to make the definition of the new individual transinformation (equation (2.2)) completely general, I(si, rj ) should be defined as the amount of information about the state si of S received at R through the occurrence of its state rj : I(si, rj) = I(si) − E(rj ) (2.5) where E(rj ) would be (by analogy with equation (2.3)): E(rj ) = p(si/rj )log1/p(si/rj ) (2.6) However, we have not reached the central difficulty yet. When Dretske’s proposal is formally ‘cleaned’ in this way, its main tech- nical problem shows up. If – as Dretske supposes – I(si, rj ) were the ‘individual’ correlate of the transinformation I(S, R), then I(S, R) should be computed as the average of the I(si, rj ). According to the definition of the average of a function of two variables: I(S, R) = p(si, rj )I(si, rj) (2.7) where I(S, R) = I(S) − E (equation (1.5)), with the standard defini- tions of I(S) and E (equations (1.2) and (1.6)). The technical problem is that the identity (2.7) does not hold with Dretske’s formulas. Indeed, a simple algebraic argument shows that we cannot obtain: I(S, R) = I(S) − E = p(si)log1/p(si) (2.8) −p(rj , si)log1/p(si/rj )
  • 9. 112 OLIMPIA LOMBARDI from the right-hand term of (2.7) when (2.5) and (2.6) are used.6 Therefore, we cannot accept Dretske’s response to the critics who accuse him of misunderstanding Shannon’s theory: his ‘interpre- tation’ of the formulas by means of these new quantities is not compatible with the formal structure of the theory. It might be argued that this is a minor formal detail, but this detail has relevant conceptual consequences. When Dretske defines E(rj ), that is, the contribution of rj to the equivocation E as a summation over the si (equation (2.6)), he makes the error of supposing that this individual contribution is only function of the particular state rj of the receiver. But the equivocation E is a magnitude that essentially depends on the communication channel. Then, any individual contribution to E must preserve such a depend- ence. The understanding of this conceptual point allows us to retain Dretske’s proposal by appropriately correcting his formal approach. In order to introduce such a correction, we must define the individual contribution of the pair (si, rj ) to the equivocation E as: E(si, rj ) = log1/p(si/rj ) (2.9) With this definition, it holds that the average of E(si, rj ) is equal to E: E = p(rj , si)log1/p(si/rj ) = p(rj , si)E(si, rj ) (2.10) Now we can correctly rewrite equation (2.5) as: I(si, rj) = I(si) − E(si, rj) (2.11) where the average of I(si, rj ) is the transinformation I(S, R): I(S, R) = I(S) − E = p(si, rj )I(si, rj ) (2.12) This modified version of the formulas makes possible to reach Dretske’s goal, that is, to adapt the standard theory of information to deal with the information contained in particular messages. We can now return to Dretske’s argument. When does the occurrence of the state rj at the receiver give us the knowledge of the occurrence of the state si at the source? The occurrence of the state rj tells us that si has occurred when the amount of information I(si, rj ) is
  • 10. WHAT IS INFORMATION? 113 equal to the amount of information I(si) generated at the source by the occurrence of si. This means that there was no loss of informa- tion through the individual communication, that is, the value of the individual contribution E(si, rj ) to the equivocation is zero (Dretske, 1981, p. 55); according to equation (2.11): E(si, rj ) = 0 ⇒ I(si, rj ) = I(si) But, now, the value of E(si, rj ) must be obtained with the correct formula (2.9). At this point, it should be emphasized again that, contrary to Dretske’s assumption, the individual contribution to the equivocation is function of the communication channel and not only of the receiver. In other words, it is not the state rj which individu- ally contributes to the equivocation E but the pair (si, rj ), with its associated probabilities p(si) and p(rj ), and the corresponding conditional probability p(rj /si) of the channel. This means that we can get completely reliable information – we can get knowledge – about the source even through a very low probable state of the receiver, provided that the channel is appropriately designed. But Dretske does not stop here. In spite of having begun from the formal theory of information, he immediately reminds us Shannon’s remark: “[the] semantic aspects of communication are irrelevant to the engineering problem. The significant aspect is that the actual message is one selected from a set of possible messages” (Shannon, 1948, p. 379). Shannon’s theory is purely quantitative: it only deals with amounts of information, but ignores questions related to informational content. The main contribution of Dretske is his semantic theory of information, which tries to capture what he considers the nuclear sense of the term ‘informa- tion’: “A state of affairs contains information about X to just that extent to which a suitable placed observer could learn some- thing about X by consulting it” (Dretske, 1981, p. 45). Dretske defines the informational content of a state r in the following terms (p. 65): A state r carries the information that S is F = The condi- tional probability of S’s being F, given r (and k), is 1 (but, given k alone, less than 1). where k stands for what the receiver already knows about the possibilities existing at the source.
  • 11. 114 OLIMPIA LOMBARDI However, unlike what may be supposed, the semantic character of this proposal does not rely on such a definition of informational content. Of course, this definition cannot be stated in terms of the original Shannon’s theory, because it only deals with average amounts of information. But it can be formulated with the new quantities referred to the amount of information contained in partic- ular states. In fact, the concept of informational content can be – more precisely – defined as follows: A state rB of the receiver contains the information about the occurrence of the state sA of the source iff p(sA/rB) = 1 but p(sA) 1, given the knowledge of the probability distribution over the possible states of the source. where sA stands for S’s being F. If the right formulas are used, we can guarantee that: • If p(sA) 1, then I(sA) 0 (equation (2.1)), that is, there is a positive amount of information generated at the source by the occurrence of sA. • If p(sA/rB) = 1, then E(sA, rB) = 0 (equation (2.9)), that is, the individual contribution of the pair (sA, rB) to the equivocation E is zero. And if E(sA, rB) = 0, then I(sA, rB) = I(sA) (equation (2.11)). In other words, the definition says that rB contains the information about the occurrence of sA iff the amount of information about sA received through the occurrence of rB is equal to the positive amount of information generated by the occurrence of sA. Dretske tries to express a similar idea when he says: “if the conditional probability of S’s being F (given r) is 1, then the equivocation of this signal must be 0 and (in accordance with formula 1.5) the signal must carry as much information about S, I(S, R), as is generated by S’s being F, I(sF)” (Dretske, 1981, p. 65), where his formula 1.5 is I(S, R) = I(S) − E. The problem is that this is wrong: p(sA/rB) = 1 does not implies that E = 0 and I(S, R) = I(S) (see equation (1.6)). Why does he uses these formulas, which refer to average amounts of information, instead of using the new formulas which refer to the amount of information contained in particular messages, for which necessity he has strongly argued? The reason relies again on his formal error: with his definition of E(rB) (equa-
  • 12. WHAT IS INFORMATION? 115 tion (2.6)), p(sA/rB) = 1 does not make the individual contribution to the equivocation E equal to zero and, then, he cannot guarantee that I(sA, rB) = I(sA). Only when the new formulas are properly corrected, the idea roughly expressed by Dretske can be stated with precision. In summary, Dretske’s definition of informational content says nothing that cannot be said in terms of the theory of informa- tion adapted, in the right way, to deal with particular amounts of information. If the semantic character of Dretske’s proposal is not based on the definition of informational content, where is it based on? For Dretske, information qualifies as a semantic concept in virtue of the intentionality inherent in its transmission. And the ultimate source of this intentionality is the nomic character of the regular- ities on which the transmission of information depends. The channel probabilities p(rj /si) do not represent a set of mere de facto correla- tions; they are determined by a network of lawful connections between the states of the source and the states of the receiver: “The conditional probabilities used to compute noise, equivo- cation, and amount of transmitted information (and therefore the conditional probabilities defining the informational content of the signal) are all determined by the lawful relations that exist between source and signal. Correlations are irrelevant unless these correlations are a symptom of lawful connections” (Dretske, 1981, p. 77). It is true that, in many technological applica- tions of information theory, statistical data are used to determine the relevant probabilities. Nevertheless, even in these cases it is assumed that the statistical correlations are not accidental, but manifesta- tions of underlying lawful regularities. Indeed, there is normally an elaborate body of theory that stands behind the attributions of probabilities. In short, the source of the semantic character of infor- mation is its intentionality; and information inherits its intentional properties from the lawful regularities on which it depends.7 Dretske emphasizes the semantic character of information because it is precisely this character what relates information to knowledge. Even if the properties F and G are perfectly correlated – whatever is F is G and vice-versa –, this does not assure us that we can know that ‘x is G’ by knowing that ‘x is F’. If the correlation between F and G is a mere coincidence, there is no information
  • 13. 116 OLIMPIA LOMBARDI in x’ being F about x’ being G; the first fact tells us nothing about the second one. In other words, the mere correlations, and even the exceptionless accidental uniformity, do not supply knowledge. This fact about information explains why we are sometimes in a position to know that x is F without being able to tell whether x is G, despite the fact that every F is G. Only on the basis of this semantic dimension of information we can affirm that “informa- tion is a commodity that, given the right recipient, is capable of yielding knowledge” (Dretske, 1981, p. 47). Dretske’s definition of informational content also refers to what the receiver already knows about the possibilities existing at the source (k). In formal terms, we can say that the amount of infor- mation about the occurrence of the state sA received through the occurrence of the state rB, I(sA, rB), depends not only on the communication channel but also on the characteristics of the source S: in particular, it is a function of the probabilities p(s1), . . . , p(sn). But the definition of the source is not absolute; on the contrary, it depends on the knowledge about the source available at the receiver end before the transmission. In other words, the background knowl- edge is relevant to the received information only to the extent that it affects the value of the amount of information generated at the source by the occurrence of a specific state. This fact implies a rela- tivization of the informational content with respect to the knowledge available before the transmission. Usually, the relative character of information is not explicitly considered in technical books about the subject, where the background knowledge is tacitly taken into account in the definition of the source; an exception is Bell, who admits that “the datum point of information is then the whole body of knowledge possessed at the receiving end before the communication” (Bell, 1957, p. 7). However, the relative character of information does not make the concept less objective or not amenable to precise quantification. In this sense, Dretske disagrees with Daniel Dennet, who claims that “the information received by people when they are spoken to depends on what they already know and is not amenable to precise quantification” (Dennet, 1969, p. 187). He follows Dennet in relativizing the information contained in a message but, unlike Dennet, he correctly asserts that this does not mean that we cannot
  • 14. WHAT IS INFORMATION? 117 precisely quantify such information: if the knowledge available at the receiver end is accurately determined, the received information can be quantified with precision. Dretske also stresses the fact that the background knowledge must not be conceived as a subjective factor, but rather as a frame of reference regarding to which infor- mation is defined. The relative character of objective magnitudes is usual in sciences; in this sense, information is not different from velocity or simultaneity: only when there is a shift of reference systems does the need arise to make explicit the relative nature of the quantity under consideration; but such relativity does not mean non-objectivity. From Dretske’s viewpoint, only to the extent that information is conceived as an objective magnitude, the concept of information can be fruitfully applied to questions in the theory of knowledge. 3. KOSSO’S INTERACTION-INFORMATION ACCOUNT OF SCIENTIFIC OBSERVATION During the last decades, some authors have abandoned the linguistic approach to the problem of scientific observation – an approach shared by the positivistic tradition and by its anti-positivistic critics –, to focus on the study of the observational instruments and methods used in natural sciences. From this perspective, Dudley Shapere, Harold Brown and Peter Kosso agree in their attempt to elucidate the scientific use of the term ‘observable’ by means of the concept of information. Thus, Shapere proposes the following analysis: “x is directly observed (observable) if: (i) information is received (can be received) by an appropriate receptor; and (ii) that information is (can be) transmitted directly, i.e., without interference, to the receptor from the entity x (which is the source of information)” (Shapere, 1988, p. 492). In turn, Brown defines observation in science in the following way: “To observe an item I is to gain information about I from the examination of another item I*, where I* is an item that we (epistemically) see and I is a member of the causal chain that produced I*” (Brown, 1987, p. 93). By using a terminology more familiar to physicists, Kosso replaces the idea of causal chain by the idea of interaction; but again, the concept of information becomes central when he
  • 15. 118 OLIMPIA LOMBARDI defines: “The ordered pair object x, property P is observ- able to the extent that there can be an interaction (or a chain of interactions) between x and an observing apparatus such that the information ‘that x is P’ is transmitted to the apparatus and eventually conveyed to a human scientist” (Kosso, 1989, p. 32). But, how is ‘information’ interpreted in this context? Shapere and Brown do not explain its meaning, as if the concept of information lacked interpretative difficulties. On the contrary, Kosso admits that the concept requires further discussion: “a lot hangs on the notion of information, and it is only by clarifying this that a full under- standing of the observing apparatus is made clear and that the necessary condition of interaction is augmented with sufficient, epistemic conditions” (Kosso, 1989, pp. 35–36). For this purpose, he follows Dretske’s semantic theory of information, and adapts it for elucidating scientific observation: he seeks a concept of infor- mation that makes room for observation of what is already known. Then, Kosso introduces the following modification: he calls ‘new information’ what Dretske calls ‘information’, and adds the descrip- tion of redundant information for the case in which the observed fact ‘x is P’ is included in k, that is, in the body of the previous knowledge about the source. With this conceptual framework, Kosso analyzes several examples of observation in physical sciences, classifying them in examples of entities that are unobservable in principle – when the physical theory which describes the entity explicitly precludes its being observed –, examples of unperceivable entities – which can interact with some non-human device but cannot interact with a human sense organ – and examples of perceivable entities – which can interact in an informational way with the human being –. Never- theless, here we are not interested on this part of Kosso’s study, but on the concept of information that lies behind his interaction- information account of observation in physical sciences. Kosso explicitly says that he borrows heavily from Dretske’s semantic theory, as if his work were an application of the semantic concept of information provided by Dretske. However, a careful examination of both perspectives shows up some differences between them. The first difference is that, unlike Dretske, Kosso does not exploit the formal resources of the information theory. This is clear when,
  • 16. WHAT IS INFORMATION? 119 following Dretske, he stresses that it is a mistake to simply identify the flow of information with a propagation of a causal influence. His argument proposes a case where the state s of the source causes the state a of the receiver, but a can also be caused by another state s of the source; in this case, the occurrence of a at the receiver does not distinguish between the possible states s and s and, then, does not allow us to know which was the state of the source. On this basis, Kosso concludes: “causal interaction is not sufficient for the conveyance of information” (Kosso, 1989, p. 38). But this idea can be precisely expressed with the formal theory: if Kosso’s example is formally represented, we can see that the individual contribution E(s, a) to the equivocation is not zero and, then, the amount of information I(s) generated at the source by the occur- rence of s is not equal to the individual transinformation I(s, a) (equation (2.11)). This means that, even if the occurrence of a tells us that either s or s have occurred, it does not give us the knowledge of which of both states occurred at the source; but it is precisely such knowledge what we need for making a scientific observation. In other words, if – following Dretske – Kosso used the formal resources of the theory of information, he could formu- late the concept of scientific observation in more precise terms. In fact, from an informational approach to scientific observation we can characterize observation in science as a process of transmission of information from the observed entity to the receiver through an equivocation-free channel. This characterization allows us to under- stand why noise does not prevent observation and to argue for the conceptual advantages of the informational account of observation over the causal account. However, these matters are beyond the purposes of this paper.8 The second difference between Kosso’s and Dretske’s views is a conceptual divergence about the very nature of information. Although Dretske claims that the communication channel is defined by a network of nomic connections between the states of the source and the states of the receiver, he explicitly declares that a physical link between source and receiver is not necessary for the transmis- sion of information. In this sense, he considers the following case (Dretske, 1981, pp. 38–39):
The second difference between Kosso’s and Dretske’s views is a conceptual divergence about the very nature of information. Although Dretske claims that the communication channel is defined by a network of nomic connections between the states of the source and the states of the receiver, he explicitly declares that a physical link between source and receiver is not necessary for the transmission of information. In this sense, he considers the following case (Dretske, 1981, pp. 38–39): a source S is transmitting information to both receivers RA and RB via some physical channel, while RA and RB are isolated from one another in the sense that there is no physical interaction between them.

But Dretske considers that, even though RA and RB are physically isolated from one another, there is an informational link between them. According to Dretske, it is correct to say that there is a communication channel between RA and RB because it is possible to learn something about RB by looking at RA and vice versa. Nothing at RA causes anything at RB or vice versa; yet RA contains information about RB and RB about RA. Dretske stresses the fact that the correlations between the events occurring at both receivers are not accidental: they are functions of the common nomic dependencies of RA and RB on S. However, for him this is an example of an informational link between two points, despite the absence of a physical channel between them. Dretske adds that the receiver RB may be farther from the source than RA and, then, the events at RB may occur later in time than those at RA, but this is irrelevant for evaluating the informational link between them: even though the events at RB occur later, RA carries information about what will happen at RB. In short: “from a theoretical point of view [. . .] the communication channel may be thought of as simply the set of dependency relations between S and R. If the statistical relations defining equivocation and noise between S and R are appropriate, then there is a channel between these two points, and information passes between them, even if there is no direct physical link joining S with R” (Dretske, 1981, p. 38).

As we have seen, in his interaction-information account of scientific observation, Kosso asserts that interaction is not a sufficient condition for information flow. But he also claims that “observation must involve interaction. Interaction between x and an
observing apparatus is a necessary condition for observation” (Kosso, 1989, pp. 34–35). This last requirement for observation does not seem to be added to the demand of an information flow between the observed entity and the receiver; on the contrary, it seems a result of the very concept of information adopted by Kosso, when he claims that “information is transferred between states through interaction. The object in state s which has informational content (s is P) interacts with something else, the observing apparatus or some intermediate informational medium, with the result that this latter object is left in a state A which has the information (s is P) whereas it did not have that information before the interaction” (Kosso, 1989, p. 37). This quote suggests that Kosso would not agree with Dretske regarding the example of the source transmitting to two receivers: certainly Kosso would not accept that we can observe the events at RB by looking at RA; but neither, surely, would he accept that information flows from RA to RB with no physical link between them. If this is right, despite his own assumption, Kosso does not completely agree with Dretske’s view about information: instead of conceiving information as a semantic concept, his conception approaches the perspective most usually adopted in the physical sciences, where an unavoidable link between the flow of information and the propagation of signals is required.

Physicists and engineers accept the well-known dictum ‘no information without representation’: the transmission of information between two points of physical space necessarily requires an information-bearing signal, that is, a physical process propagating from one point to the other. This perspective is adopted when the correlations between spatially separated quantum systems are considered: any analysis of the EPR experiment9 stresses the fact that there is no information flowing between the two particles, because the propagation of a superluminal signal from one particle to the other is impossible. From this view, information is a physical entity, which can be generated, accumulated, stored, processed, converted from one form to another, and transmitted from one place to another. Precisely due to the physical nature of information, the dynamics of its flow is ruled by natural laws; in particular, it is constrained by relativistic limitations. The extreme versions of this view conceive information as a physical
entity with the same ontological status as energy, and whose essential property is to manifest itself as structure when added to matter (cfr. Stonier, 1990).

This kind of situation, where the correlations between two points A and B are explained by lawful regularities but there is no signal propagation between them,10 shows that Dretske and Kosso are using two different concepts of information. According to the semantic concept, information is defined by its capability of providing knowledge. From this view, the possibility of controlling the states at A to send information to B is not a necessary condition for defining an information channel between A and B: the only requirement for an informational link between both points is the possibility of knowing the state at A by looking at B. According to the physical concept, information is a physical entity whose essential feature is its capability of being generated at one point of physical space and transmitted to another point. This view requires an information-bearing signal that can be modified at the transmitter end in order to carry information to the receiver end. Therefore, if there is no physical link between A and B, it is impossible to define an information channel between them: we cannot control the states at A to send information to B.

The divergence between the semantic view and the physical view of information acquires great relevance when the concept of information is applied to philosophical problems. In particular, when the concept is used to elucidate the notion of scientific observation, this interpretative divergence becomes explicit in the case of the so-called ‘negative experiments’. Negative experiments were originally proposed as a theoretical tool for analyzing the measurement problem in quantum mechanics (cfr. Jammer, 1974, pp. 495–496); but here we will only use them to show the consequences of the choice between both concepts of information. In a negative experiment, it is assumed that an event has been observed by noting the absence of some other event; this is the case of neutral weak currents, which are observed by noticing the absence of charged muons (cfr. Brown, 1987, pp. 70–75). But the conceptual core of negative experiments can be understood by means of a very simple example. Let us suppose a tube from whose middle point a particle is emitted at t0 towards one of the ends of the tube. Let us also suppose
that we place a detection device at the right end A in order to know in which direction the particle was emitted. If after the appropriate time t1 – depending on the velocity of the particle and the length of the tube – the device indicates no detection, we can conclude that the particle was emitted towards the left side of the tube. At this point, we can guarantee two facts:
• there is a perfect anticorrelation between both ends of the tube. Then, by looking at the state – presence or absence of the particle – at the right end A, we can know the state – absence or presence, respectively – at the left end B.
• the instantaneous propagation of a signal between A and B at t1 is physically impossible.
The question is: have we observed the direction of the emitted particle? From an informational account of scientific observation, the answer depends on the view about information adopted for elucidating the notion of observation:
• if the semantic view is adopted, a communication channel between both ends of the tube can be defined. Then, there is a flow of information from B to A, which allows us to observe the presence of the particle at B, even though there is no signal propagating from B to A.
• if the physical view is adopted, there is no information flow from B to A because there is not, and there cannot be, a signal instantaneously propagating between B and A at t1. Then, we do not observe the presence of the particle at B.
In other words, according to the semantic view of information, by looking at the detector we simultaneously observe two events, presence-at-B and absence-at-A. On the contrary, the physical view leads us to a concept of observation narrower than the previous one: by looking at the detector we observe the state at A – presence or absence –, but we do not observe the state at B; such a state is inferred.
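The following sketch (a toy calculation, assuming only that the particle is equally likely to be emitted towards either end; the names are illustrative and not the paper’s notation) shows why, on the semantic view, the two ends of the tube behave as an equivocation-free channel: the conditional entropy of B given A vanishes, so the state at A removes all uncertainty about the state at B even though no signal propagates between them at t1.

```python
# A minimal sketch of the negative-experiment channel, assuming the particle is
# emitted left or right with equal probability. The two tube ends are perfectly
# anticorrelated: particle-at-B exactly when there is no detection at A.
import math

p_B = {"present": 0.5, "absent": 0.5}                  # state at the left end B
p_A_given_B = {"present": {"absent": 1.0},             # particle went left -> nothing at A
               "absent": {"present": 1.0}}             # particle went right -> detected at A

p_joint = {(b, a): p_B[b] * pa
           for b, cond in p_A_given_B.items() for a, pa in cond.items()}
p_A = {}
for (b, a), p in p_joint.items():
    p_A[a] = p_A.get(a, 0.0) + p

def H(dist):
    return sum(p * math.log2(1 / p) for p in dist.values() if p > 0)

H_B = H(p_B)
H_B_given_A = sum(p_joint[(b, a)] * math.log2(p_A[a] / p_joint[(b, a)])
                  for (b, a) in p_joint)               # equivocation H(B/A)
I_BA = H_B - H_B_given_A

print(H_B, H_B_given_A, I_BA)   # 1.0 0.0 1.0: an equivocation-free 'channel' from B to A
```

The physical view does not dispute this arithmetic; what it denies is that a correlation unsupported by a propagating signal counts as a flow of information.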
This discussion shows that it is possible to agree on the formal theory of information and even on some interpretative points but, despite this, to dissent on the very nature of information. Information may be conceived as a semantic item, whose essential property is its capability of providing knowledge. But information may also be regarded as a physical entity ruled and constrained by natural laws.

4. THE SYNTACTIC APPROACH OF COVER AND THOMAS

The physical view of information has been the most widespread view in the physical sciences. Perhaps this fact was due to the specific technological problems which led to the original theory of Shannon: the main interest of communication engineers was, and still is, to optimize the transmission of information by means of physical signals, whose energy and bandwidth are constrained by technological and economic limitations. In fact, the physical view of information is the most usual one in the textbooks on the subject used in engineers’ training. However, this situation has been changing in recent times: one can see that some very popular textbooks introduce information theory in a completely syntactic way, with no mention of sources, receivers or signals. Only when the syntactic concepts and their mathematical properties have been presented is the theory applied to the traditional case of signal transmission. Perhaps the best example of this approach is the presentation offered by Thomas Cover and Joy Thomas in their book Elements of Information Theory (1991).11

Just from the beginning of this book, the authors clearly explain their perspective: “Information theory answers two fundamental questions in communication theory: what is the ultimate data compression [. . .] and what is the ultimate transmission rate of communication [. . .]. For this reason some consider information theory to be a subset of communication theory. We will argue that it is much more. Indeed, it has fundamental contributions to make in statistical physics (thermodynamics), computer sciences (Kolmogorov complexity or algorithmic complexity), statistical inference (Occam’s
Razor: ‘The simplest explanation is best’) and to probability and statistics (error rates for optimal hypothesis testing and estimation)” (Cover and Thomas, 1991, p. 1).

On the basis of this general purpose, they define the basic concepts of information theory in terms of random variables and probability distributions over their possible values. Let X and Y be two discrete random variables with alphabets A and B, and probability mass functions p(x) = Pr(X = x), x ∈ A and p(y) = Pr(Y = y), y ∈ B, respectively. In general, they call ‘entropy’ what we called ‘average amount of information’. Thus, the entropy H(X) of a discrete random variable is defined by:

H(X) = Σx p(x) log 1/p(x) (4.1)

Next, Cover and Thomas extend the definition of entropy to a pair of discrete random variables: the joint entropy H(X, Y) of X and Y with a joint distribution p(x, y) is defined as:

H(X, Y) = Σx,y p(x, y) log 1/p(x, y) (4.2)

And the conditional entropy H(X/Y) of X given Y is defined as:

H(X/Y) = Σx,y p(x, y) log 1/p(x/y) (4.3)

The naturalness of these definitions from the viewpoint of probability theory is exhibited by the fact that the entropy of a pair of random variables is the entropy of one of them plus the conditional entropy of the other:

H(X, Y) = H(X) + H(Y/X) (4.4)

Thus, what we had originally called ‘equivocation’ E (equation (1.6)) here becomes the conditional entropy H(X/Y), and what we had called ‘noise’ N (equation (1.7)) becomes H(Y/X). Cover and Thomas also define the relative entropy D(p//q) between two probability mass functions p(x) and q(x) as:

D(p//q) = Σx p(x) log [p(x)/q(x)] (4.5)

The relative entropy D(p//q) is a measure of the inefficiency of assuming that the distribution is q when the true distribution is p; then, D(p//q) is always non-negative, and is zero if and only if p = q.12
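A small numerical illustration may help to fix definitions (4.1)–(4.5). The joint distribution below is invented purely for the example (it does not come from the paper), and base-2 logarithms are used throughout, so all quantities are measured in bits; the chain rule (4.4) is checked directly, and the relative entropy is non-negative and vanishes only when the two distributions coincide.

```python
# A small numerical illustration of definitions (4.1)-(4.5). The joint distribution
# over X in {0, 1} and Y in {0, 1} is invented for the example; logs are base 2 (bits).
import math

p_xy = {(0, 0): 0.4, (0, 1): 0.2, (1, 0): 0.1, (1, 1): 0.3}                 # joint p(x, y)
p_x = {x: sum(p for (a, b), p in p_xy.items() if a == x) for x in (0, 1)}   # marginal p(x)
p_y = {y: sum(p for (a, b), p in p_xy.items() if b == y) for y in (0, 1)}   # marginal p(y)

def H(dist):
    """Entropy of a distribution {outcome: probability}, as in (4.1)/(4.2)."""
    return sum(p * math.log2(1 / p) for p in dist.values() if p > 0)

def D(p, q):
    """Relative entropy D(p//q) of (4.5), for two distributions on the same outcomes."""
    return sum(p[k] * math.log2(p[k] / q[k]) for k in p if p[k] > 0)

# Conditional entropy H(Y/X), i.e. definition (4.3) with the roles of X and Y exchanged.
H_Y_given_X = sum(p * math.log2(p_x[x] / p) for (x, y), p in p_xy.items() if p > 0)

print(H(p_xy), H(p_x) + H_Y_given_X)   # chain rule (4.4): the two numbers coincide
print(D(p_x, p_y), D(p_x, p_x))        # non-negative; zero only when the distributions coincide
```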
With these elements, Cover and Thomas define the mutual information I(X, Y) – which we called ‘transinformation’ – as the relative entropy between the joint distribution p(x, y) and the product distribution p(x)p(y):

I(X, Y) = Σx,y p(x, y) log [p(x, y)/p(x)p(y)] (4.6)

Thus, the mutual information is the reduction in the uncertainty of X due to the knowledge of Y. On the basis of these concepts, Cover and Thomas demonstrate the relationships among entropy, joint entropy, conditional entropy and mutual information, and express them in the following well-known relations (usually represented by a Venn-like diagram):

I(X, Y) = H(X) − H(X/Y) = H(Y) − H(Y/X) (4.7)

H(X, Y) = H(X) + H(Y) − I(X, Y) (4.8)

where the first of the two formulas is the analogue of equation (1.5), which was expressed in terms of equivocation and noise. Since here the concepts are introduced in terms of random variables and their correlations, the authors can extend the definitions to the case of more than two random variables. For example, they define (1991, pp. 21–23) the entropy H(X1, . . ., Xn) of a collection of random variables, the conditional mutual information I(X, Y/Z) of the random variables X and Y given Z, and the conditional relative entropy D(p(y/x)//q(y/x)).

This brief summary of the way in which Cover and Thomas present the theory of information shows that this approach adopts a syntactic concept of information.
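Continuing with the same invented joint distribution, the next sketch checks numerically that the three ways of writing the mutual information agree: as the relative entropy of equation (4.6), as the entropy difference of equation (4.7), and as the combination of entropies in equation (4.8).

```python
# A quick check of identities (4.6)-(4.8) on the made-up joint distribution used above:
# the mutual information computed as D(p(x,y) // p(x)p(y)) coincides with H(X) - H(X/Y)
# and with H(X) + H(Y) - H(X, Y).
import math

p_xy = {(0, 0): 0.4, (0, 1): 0.2, (1, 0): 0.1, (1, 1): 0.3}
p_x = {x: sum(p for (a, b), p in p_xy.items() if a == x) for x in (0, 1)}
p_y = {y: sum(p for (a, b), p in p_xy.items() if b == y) for y in (0, 1)}

def H(dist):
    return sum(p * math.log2(1 / p) for p in dist.values() if p > 0)

# (4.6): mutual information as the relative entropy between joint and product distributions.
I_xy = sum(p * math.log2(p / (p_x[x] * p_y[y])) for (x, y), p in p_xy.items() if p > 0)

H_X, H_Y, H_XY = H(p_x), H(p_y), H(p_xy)
print(round(I_xy, 6))                  # (4.6)
print(round(H_X - (H_XY - H_Y), 6))    # (4.7): H(X) - H(X/Y)
print(round(H_X + H_Y - H_XY, 6))      # (4.8) rearranged for I(X, Y)
```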
From this perspective, the definition of information has nothing to do with communication, transmission and reception of messages, nor with the knowledge of an event obtained by looking at another event: here, the only ‘objects’ of the theory are random variables and their correlations. As we have seen, even though Dretske admits the possibility of a communication channel with no physical substratum, he nevertheless requires that the conditional probabilities defining the channel result from the nomic dependence between the states of the source and the states of the receiver. But from the perspective of Cover and Thomas, the concept of information loses even this semantic ingredient: it is legitimate to define the mutual information of two variables even if there is no nomic relationship between them and their conditional probabilities are computed exclusively by means of de facto correlations. For example, if last month’s results of the Sydney lottery partially coincide with the results obtained in the Mexico lottery during the same period, there is a positive mutual information between both sequences of completely independent events. If the concept of information is so deprived of the intentional character required by Dretske, any link between information and knowledge vanishes: as we have seen, when the correlation between two variables is merely accidental, the value of one of them tells us nothing about the value of the other. In short, from this syntactic view we lose the basic intuition according to which information modifies the state of knowledge of those who receive such information. This might seem too high a price to pay for retaining the syntactic approach, despite its elegance and mathematical precision.

However, the position of Cover and Thomas has its own advantages. By turning information into a syntactic concept, this approach makes the theory of information applicable to a variety of fields. Among them, communication by means of physical signals is only one of the many applications. Thus, after the chapter that introduces the basic concepts of the theory, Cover and Thomas devote the next chapters of their book to explaining how such concepts answer very different problems.

A concept usually associated with information is the concept of thermodynamic entropy. The well-known Boltzmann equation for the entropy of a macrostate M,

SB(M) = k log W
– where k is Boltzmann’s constant, and W is the number of microstates compatible with M – is isomorphic with the equation for the contribution of xi to the entropy H(X) of the variable X. In fact, if the microstates are equiprobable, the probability of the macrostate M is 1/W. But what is the relationship between informational entropy and thermodynamic entropy? In his thoughtful discussion of this point, Jeffrey Wicken makes a valuable argumentative effort to stress the difference between thermodynamic entropy and Shannon entropy as used in communication theory; in this context, he claims that: “while the Shannon equation is symbolically isomorphic with the Boltzmann equation, the meanings of the respective equations bear little in common” (Wicken, 1987, p. 179). This is certainly true if one adopts the physical interpretation of the concept of information. But from a syntactic interpretation, Wicken’s claim loses its original sense, not because both concepts have the same meaning, but because the concept of information, as a purely syntactic concept, completely lacks semantic content.

In this field, Cover and Thomas go further by formulating a version of the Second Law of Thermodynamics in informational terms. In particular, they explain the increase of the coarse-grained entropy proposed by Gibbs:

Scg = k Σi Pi log 1/Pi

where Pi is the probability corresponding to a cell i resulting from a coarse-grained partition of the phase space. Let pn(x) be the probability distribution over the cells at time tn, and let us suppose that such a distribution evolves as a Markov chain. Cover and Thomas (1991, pp. 34–35) demonstrate that the relative entropy between pn(x) and the uniform stationary distribution p(x) = α – which represents thermodynamic equilibrium – monotonically decreases with time. Now, by using the definition of relative entropy (equation (4.5)), we have:

D(pn//p) = Σx pn(x) log [pn(x)/p(x)] = Σx pn(x) log 1/p(x) − Σx pn(x) log 1/pn(x)

Hence, by equation (4.1):

D(pn//p) = log(1/α) − H(Xn)

Therefore, the monotonic decrease in the relative entropy implies the monotonic increase in the informational entropy H(Xn), which here represents the coarse-grained thermodynamic entropy Scg.
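The monotonic behaviour just described can be illustrated with a short simulation (again a sketch, not Cover and Thomas’s proof: the transition matrix and the initial distribution are arbitrary choices made for the example). Any doubly stochastic transition matrix has the uniform distribution as its stationary distribution, so the relative entropy to the uniform distribution should decrease at every step while the entropy H(Xn) increases towards its maximum value log(1/α):

```python
# A small simulation of the informational Second Law sketched above: for a doubly
# stochastic Markov chain (uniform stationary distribution), D(pn // uniform) decreases
# monotonically while the entropy H(Xn) increases. The transition matrix and the
# initial distribution are arbitrary choices for illustration.
import numpy as np

T = np.array([[0.7, 0.2, 0.1],      # doubly stochastic: rows and columns sum to 1
              [0.2, 0.6, 0.2],
              [0.1, 0.2, 0.7]])
p = np.array([0.9, 0.05, 0.05])     # far-from-equilibrium initial distribution
u = np.full(3, 1/3)                 # uniform stationary distribution (alpha = 1/3)

def H(q):
    q = q[q > 0]
    return float(-(q * np.log2(q)).sum())

def D(q, r):
    mask = q > 0
    return float((q[mask] * np.log2(q[mask] / r[mask])).sum())

for n in range(6):
    print(n, round(D(p, u), 4), round(H(p), 4))   # D decreases, H increases
    p = p @ T                                     # one step of the Markov chain
```

Each printed line shows the step n, D(pn//u) and H(Xn); the two columns move in opposite directions, as the identity D(pn//p) = log(1/α) − H(Xn) requires.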
These results presented by Cover and Thomas show that accepting the conceptual difference between thermodynamic entropy and Shannon entropy does not lead to the conclusion that the syntactic concept of information is useless in thermodynamics. On the contrary, Boltzmann’s entropy and the coarse-grained entropy can be fruitfully treated by means of the concepts supplied by the syntactic theory of information.

Another discipline where the syntactic approach to information shows its applicability is computer science, in particular, the field of algorithmic complexity. Let X be a finite-length binary string; the algorithmic complexity (Kolmogorov complexity) of X is defined as:

K(X) = minp l(p)

where p is a Turing machine program that prints X and halts, and l(p) is the length of p. Then, K(X) is the shortest description length over all the descriptions supplied by a Turing machine. Intuitively, a string has maximum algorithmic complexity when the shortest program that prints it has approximately the same length as the string itself. Cover and Thomas (1991, p. 154) demonstrate that the expected value of the algorithmic complexity of a sequence X is close to its informational entropy H(X). Thus, a well-known result about data compression can be seen in a new light: H(X) is a lower bound on the average length of the shortest description of the sequence X; but H(X) is also close to the algorithmic complexity of X. Therefore, through the concept of informational entropy, the algorithmic complexity of a sequence becomes a measure of its incompressibility.
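Kolmogorov complexity itself is not computable, but the intuition that entropy measures incompressibility can be loosely illustrated with a general-purpose compressor (a crude upper-bound proxy only; the strings below are invented for the example). A highly regular string, which a very short program could print, compresses to a small fraction of its length, while a string of random bytes barely compresses at all:

```python
# A crude illustration of incompressibility. Kolmogorov complexity is not computable,
# but the length of a general-purpose compressor's output gives a rough upper-bound
# proxy: a highly regular string compresses far below its length, while a string of
# independent random bytes (maximum entropy per symbol) barely compresses at all.
import os
import zlib

regular = b"01" * 5000           # low-complexity string: a short program prints it
random_like = os.urandom(10000)  # stands in for an incompressible string

for name, s in [("regular", regular), ("random-like", random_like)]:
    compressed = zlib.compress(s, 9)
    print(name, len(s), "->", len(compressed), "bytes")
# Typical output: the regular string shrinks to a few dozen bytes, while the
# random-like one stays close to 10000 bytes (plus a small overhead).
```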
Cover and Thomas extend these results in order to elucidate the controversial principle of simplicity, according to which, if there are many explanations consistent with the observed data, one must choose the simplest one. They demonstrate (1991, pp. 160–161) that, if p is a program that produces the string X, the probability of p is 2^(−l(p)); hence, short programs are much more probable than longer ones. If X is interpreted as the sequence of observed data and p is the explanatory algorithm for such data, then this result can be used to justify the choice of the shortest – the simplest – explanation of the data. Although one may disagree with this interpretation of Occam’s Razor, it must be admitted that this is an interesting and precise elucidation of the concept of simplicity usually invoked in scientific research.

These are only some examples of the many applications of the syntactic concept of information. Other fields where the concept is useful are the generalization of gambling processes, the theory of optimal investment in the stock market and the computation of error rates for optimal hypothesis testing. From this syntactic perspective, communication theory is only an application – of course, a very important one – of the theory of information: “While it is clear that Shannon was motivated by problems in communication theory, we treat information theory as a field of its own with applications to communication theory and statistics” (Cover and Thomas, 1991, p. viii). In summary, from the syntactic approach the concept of information acquires a generality that makes it a powerful formal tool for science. However, this generality is obtained at the cost of losing its meaning links with concepts such as knowledge or communication. From this view, the word ‘information’ does not belong to the language of the factual sciences or to ordinary language: it has no semantic content. The concept of information is a scientific but completely formal concept, whose ‘meaning’ only has a syntactic dimension; its generality derives from this exclusively syntactic nature. Therefore, the theory of information becomes a mathematical theory, a chapter of the theory of probability: only when its concepts are semantically interpreted can the theory be applied to very different fields.

5. WHAT IS INFORMATION?

From our previous discussion it is clear that there is not a single answer to this question. We have shown that there are different concepts of information, each one of them useful for different purposes. The semantic concept strongly links information to knowledge: information is essentially something capable of yielding knowledge; this concept is useful for cognitive and semantic studies.
The physical concept is the one used in communication theory: here information is a physical entity that can be generated, transmitted and received for practical purposes. The syntactic concept is a formal notion with no reference: in this sense, the theory of information is a mathematical theory, in particular, a chapter of the theory of probability.

The question is: what are the relationships among these three concepts? When we talk about three concepts of information we do not mean that we are facing three rival views, among which we must choose the correct one. All three concepts are legitimate when properly used. The relationship between the syntactic concept and the other two is the relationship between a mathematical object and its interpretations. The wave equation may represent the mechanical motion of a material medium or the dynamics of an electromagnetic wave: both cases share nothing but their syntactic structure. Analogously, the informational entropy H(X) and the mutual information I(X, Y), as syntactic concepts, have no reference: their syntactic ‘meaning’ is given by the role they play in the mathematical theory to which they belong. But when these syntactic concepts are interpreted, they acquire referential content. In the semantic theory of information, the relevant quantities are not the average quantities but their individual correlates I(si) and I(si, rj): when both amounts of semantic information are equal, the occurrence of the state of affairs rj gives us the knowledge of the occurrence of the state of affairs si. In communication theory, H(S) measures the average amount of the physical information generated at the source S, and this physical information is transmitted to the receiver by means of a carrier signal. But these are not the only possible interpretations. In computer science, if X is interpreted as a finite-length binary string, H(X) can be related to the algorithmic complexity of X. If, in thermodynamics, X is interpreted as a macrostate compatible with W equiprobable microstates, H(X) represents Boltzmann’s thermodynamic entropy of X; the understanding of the relationship between the syntactic concept of information and its interpretations serves to evaluate the usually obscure extrapolations from communication theory to thermodynamics.

This discussion suggests that there is a severe terminological problem here. Usually, various meanings are subsumed under the
term ‘information’, and many disagreements result from the lack of a terminology precise enough to distinguish the different concepts of information. Therefore, a terminological cleansing is required in order to avoid this situation. My own proposal is to use the word ‘information’ only for the physical concept: this option preserves not only the generally accepted links between information and knowledge, but also the well-established meaning that the concept of information has in the physical sciences. I think that this terminological choice retains the pragmatic dimension of the concept to the extent that it agrees with the vast majority of the uses of the term. But what about the semantic and the syntactic concepts? Perhaps Dretske’s main goal of applying the concept of information to questions in the theory of knowledge can also be achieved by means of the physical concept, without commitment to non-physical information channels. Regarding the syntactic view, it would be necessary to find a new name that expresses the purely mathematical nature of the theory, avoiding confusions between the formal concepts and their interpretations. Of course, this terminological cleansing is not an easy task, because it entails a struggle against the imprecise application of a vague notion of information in many contexts. Nevertheless, it becomes a valuable task when we want to avoid conceptual confusions and futile disputes regarding the nature of information.

NOTES

1. Here we work with discrete situations, but the definitions can be extended to the continuous case (cfr. Cover and Thomas, 1991, pp. 224–225).
2. In his original paper, Shannon (1948, p. 349) discusses the reason for the choice of a logarithmic function and, in particular, of the logarithm to the base 2 for measuring information.
3. If the natural logarithm is used, the resulting unit of information is called ‘nat’ – a contraction of ‘natural unit’. If the logarithm to base 10 is used, then the unit of information is the Hartley. The existence of different units for measuring information shows the importance of distinguishing between the amount of information associated with an event and the number of binary symbols necessary to codify the event.
4. Shannon’s Second Theorem demonstrates that the channel capacity is the maximum rate at which we can send information over the channel and recover the information at the receiver with a vanishingly low probability of error (cfr., for instance, Abramson, 1963, pp. 165–182).
5. Dretske uses Is(r) for the transinformation and Is(ra) for the new individual transinformation. We have adapted Dretske’s terminology in order to bring it closer to the most usual terminology in this field.
6. In fact, the right-hand term of (2.7), when (2.5) and (2.6) are used, is: Σij p(si, rj) I(si, rj) = Σij p(si, rj)[I(si) − E(rj)] = Σij p(si, rj) log 1/p(si) − Σij p(si, rj) Σk p(sk/rj) log 1/p(sk/rj). Perhaps Dretske made the referred mistake by misusing the subindices of the summations.
7. Dretske says that, in this context, it is not relevant to discuss where the intentional character of laws comes from: “For our purpose it is not important where natural laws acquire this puzzling property. What is important is that they have it” (Dretske, 1981, p. 77).
8. I have argued for this view of scientific observation elsewhere (Lombardi, “Observación e Información”, future publication in Analogia): if we want every state of the receiver to let us know which state of the observed entity occurred, it is necessary that the so-called ‘backward probabilities’ p(si/rj) (cfr. Abramson, 1963, p. 99) have the value 0 or 1, and this happens in an equivocation-free channel. This explains why noise does not prevent observation: indeed, practical situations usually include noisy channels, and much technological effort is devoted to designing appropriate filters to block the noise-bearing spurious signal. I have also argued that, unlike the informational account of observation, the causal account does not allow us to recognize (i) situations observationally equivalent but causally different, and (ii) situations physically – and, therefore, causally – identical but informationally different, which, for this reason, represent different cases of observation.
9. The experiment included in the well-known article of Einstein, Podolsky and Rosen (1935).
10. Note that this kind of situation does not always involve a common cause. In Dretske’s example of the source transmitting to two receivers, the correlations between RA and RB can be explained by a common cause at S. But it is usually accepted that quantum EPR-correlations cannot be explained by means of a common cause argument (cfr. Hughes, 1989). However, in both cases the correlations depend on underlying nomic regularities.
11. This does not mean that Cover and Thomas are absolutely original. For example, Reza (1961, p. 1) considers information theory as a new chapter of the theory of probability; however, his presentation of the subject follows the orthodox way of presentation in terms of communication and signal transmission. An author who adopts a completely syntactic approach is Khinchin (1957); nevertheless, his text is not as rich in applications as the book of Cover and Thomas and was not so widely used.
12. In the definition of D(p//q), the convention – based on continuity arguments – that 0 log 0/q = 0 and p log p/0 = ∞ is used. D(p//q) is also referred to as the ‘distance’ between the distributions p and q; however, it is not a true distance between distributions since it is not symmetric and does not satisfy the triangle inequality.
REFERENCES

Abramson, N.: 1963, Information Theory and Coding. New York: McGraw-Hill.
Bell, D.A.: 1957, Information Theory and its Engineering Applications. London: Pitman & Sons.
Brown, H.I.: 1987, Observation and Objectivity. New York/Oxford: Oxford University Press.
Cover, T. and J.A. Thomas: 1991, Elements of Information Theory. New York: John Wiley & Sons.
Dennett, D.C.: 1969, Content and Consciousness. London: Routledge & Kegan Paul.
Dretske, F.: 1981, Knowledge and the Flow of Information. Cambridge, MA: MIT Press.
Einstein, A., B. Podolsky and N. Rosen: 1935, Can Quantum-Mechanical Description of Physical Reality be Considered Complete? Physical Review 47: 777–780.
Hughes, R.I.G.: 1989, The Structure and Interpretation of Quantum Mechanics. Cambridge, MA: Harvard University Press.
Jammer, M.: 1974, The Philosophy of Quantum Mechanics. New York: John Wiley & Sons.
Khinchin, A.I.: 1957, Mathematical Foundations of Information Theory. New York: Dover Publications.
Kosso, P.: 1989, Observability and Observation in Physical Science. Dordrecht: Kluwer Academic Publishers.
Reza, F.M.: 1961, Introduction to Information Theory. New York: McGraw-Hill.
Shannon, C.: 1948, A Mathematical Theory of Communication. Bell System Technical Journal 27: 379–423.
Shapere, D.: 1982, The Concept of Observation in Science and Philosophy. Philosophy of Science 49: 485–525.
Stonier, T.: 1990, Information and the Internal Structure of the Universe. London: Springer-Verlag.
Wicken, J.S.: 1987, Entropy and Information: Suggestions for Common Language. Philosophy of Science 54: 176–193.

Universidad Nacional de Quilmes-CONICET
Crisólogo Larralde 3440, 6°D, 1430, Ciudad de Buenos Aires
Argentina
E-mail: olimpiafilo@arnet.com.ar