Annenberg School for Communication Departmental Papers (ASC) University of Pennsylvania Year 2009Mathematical Theory of Communication Klaus Krippendorﬀ University of Pennsylvania, kkrippendorﬀ@asc.upenn.eduThis paper is posted at ScholarlyCommons.http://repository.upenn.edu/asc papers/169
Corrected pages 614-618 in S. W. Littlejohn & K. A. Foss (Eds.). Encyclopedia of Communication Theory. Los Angeles: Sage, 2009.MATHEMATICAL THEORY and two different messages should provide more information than either by itself..OF COMMUNICATION To define quantities associated with selecting messages, in his 2nd theorem, Shannon proved thatClaude Shannon’s mathematical theory of the logarithm function was the only one thatcommunication concerns quantitative limits of conforms to the above intuitions. Logarithmsmediated communication. The theory has a history increase monotonically with the number ofin cryptography and of measuring telephone traffic. alternatives available for selection, and are additiveParalleling work by U.S. cybernetician Norbert when alternatives are multiplicative. While the baseWiener and Soviet logician Andrei N. Kolmogorov, of this logarithm is arbitrary, Shannon set it to two,the theory was first published after declassification thereby acknowledging that the choice among twoin 1948. Due to Wilbur Schramm’s initiative, it equally likely alternatives—answering a yes or noappeared in 1949 as a book with a brief commentary question or turning a switch on or off—is the mostby Warren Weaver. The theory provided a scientific elementary choice conceivable. His basic measure,foundation to the emerging discipline of called entropy H iscommunication but is now recognized as addressingonly parts of the field. H( X ) xX px log2 px For Shannon, “the fundamental problem ofcommunication is reproducing at one point either where px is the probability of message x occurring inexactly or approximately a message selected at the set of possible messages X. The minus signanother point” (Shannon & Weaver, 1949, p. 3). assures that entropies are positive quantities. WithShannon did not want to confound his theory by NX as the size of the set X of possible messages, H’spsychological issues and considered meanings range isirrelevant to the problem of using, analyzing, and 0 ≤ H(X) ≤ log2NX.designing mediated communication. The key toShannon’s theory is that messages are distinguished H averages the number of binary choices neededby selecting them from a set of possible messages – to select one message from a larger set, or thewhatever criteria determine that choice. His theory number of binary digits, bits for short, needed tohas 22 theorems and seven appendices. Its basic enumerate that set. H is interpretable as a measureidea is outlined as follows. of uncertainty, variation, disorder, ignorance, or lack of information. When alternatives are equally likely, The Basic Measure No alternative = no choice, px=1 and H = 0 bitsArguably, informed choices are made to a degreebetter than chance and selecting a correct answer Two alternativesfrom among many possible answers to a question is = one binary choice, px=0.5 and H = 1 bitmore difficult and requires more information than Four alternativesselecting one from among few. For example, = two binary choices, px=0.25 and H = 2 bitsguessing the name of a person is more difficult thanguessing its gender. So, its name would provide Eight alternativesmore information than its gender, the former often = three binary choices, px=0.125 and H = 3 bitsimplying information about the latter. Intuitively, …communication that eliminates all alternatives N alternativesconveys more information than one that leaves some = log2N binary choices, px=1/N and H = log2N bitsof them uncertain. Furthermore, two identicalmessages should provide the information of any one 2N alternatives –N = N binary choices, px=2 and H = N bits.
Entropies and Communication T(S:R) = H(R) – HS(R), the uncertainty at receiverThe additivity of H gives rise to a calculus of R minus noisecommunication. For a sender S and a receiver R one = H(S) – HR(S), the uncertainty at sender Scan measure three basic entropies. minus equivocation1. The uncertainty of messages s at sender S, occurring with probability ps = H(S) + H(R) – H(SR), the sum of the uncertainties at S and R minus the total H ( S ) sS ps log 2 ps . = H(SR) – HS(R) – HR(S), the total uncertainty minus noise and equivocation2. The uncertainty of messages r at receiver R, occurring with probability pr The algebraic relationships between these quantities are visualized in the center of Figure 1. H ( R ) rR pr log 2 pr . Accordingly,3. The total uncertainty of s-r pairs of messages in Communication is the extent a sender is able a S×R table, occurring with probability psr to limit the receiver’s choices, and H ( SR ) sS rR psr log 2 psr . Information is the extent a receiver knows the sender’s choices. These lead to the sender-unrelated uncertainty Both are interpretations of the same quantity,entering a communication channel, colloquially T(S:R), the amount of information transmitted. Bothcalled noise, and expressed as: express differences between two uncertainties, with and without knowing the choices made at the other H S ( R ) H ( SR ) H ( S ), point of a communication channel. They increasethe uncertainty in the sender lost due to with the number of choices or the improbability ofsimplifications or omissions during communication, the alternatives reduced in the cause of transmission.called equivocation, and expressed as T(S:R) quantifies a symmetrical relationship, not the property of a message. H R ( S ) H ( SR ) H ( R ).and the amount of information transmitted betweensender and receiver, which can be obtained in atleast four ways:
Relation to Thermodynamics Cryptographers pursue two tasks, (1) finding a code to decipher intercepted messages whose apparentThe well-known second law of thermodynamics gibberish is presumed to transmit valuablestates that for any closed system, utilizable energy information, and (2) developing codes by whichdifferences such as of temperature, pressure, and messages with sensitive information may bechemical potential decrease over time. Only outside intercepted by unauthorized persons but cannot beresources may counteract this natural tendency. read by them. During WWII, Shannon proposedThermodynamic processes converge to a state of unbreakable codes, now outdated and replaced bymaximum entropy at which all utilizable energy is pairs of encoding and decoding algorithms whoseexhausted and everything stops. Shannon’s theory reconstruction exceeds computational limits.of communication has been considered a moregeneral formulation of this law. It states that noise Hence, the mathematical theory ofor disorder can only increase, in communication communication also addresses limits on the abilityterms eroding information, i.e., the ability of the to find codes to reproduce a sender’s originals. Asreceiver to relate what is received to what was sent. such, the transmission of choices and theWithout outside intervention, the process converges reproduction of original messages are prerequisitesto where only noise prevails, equivocation has of all mediated communication. The readability ofirrecoverably omitted all details of the original, and reproduced messages is a cultural issue, however,communication has ceased. One can experience the and goes beyond the theory. To understand eachbeginning of this process by repeatedly others’ messages, communicators must be literate inphotographing a photograph or making Xerox copies each others’ language communities.of Xerox copies ad infinitum. After each iteration,the grain of an image becomes rougher, distinctions Redundancybecome blurred, ultimately disappear, and thechance to recover the original becomes increasingly Redundancy or inefficient transmission is theunlikely. difference between the capacity of a communication channel and how much of it is utilized Coding R(S:R) = Max[T(S:R)] – T(S:R).Figure 1 distinguishes between the transmission of Redundancy may be due to (1) unused channelinformation and the reproduction of original capacity, (2) duplicate transmission of messages, andmessages. Transmission involves translating (3) restrictions on the set of possible messages, formessages from one medium to another, not example, by a grammar or specialized vocabulary.necessarily readable along its way. For example, the Redundancy seems wasteful but is of considerablesignals that transmit a document to a fax machine importance in human communication.may be overheard without making sense, yet they Much of Shannon’s theory concerns the abilityconvey all the information needed to reproduce the to devise codes that identify or correct corruptedoriginal or a close approximation of it. Fax communications. Such codes depend on themachines embody a code. A code is a formal rule by existence of redundancy. This can be experienced,which patterns are translated from one medium to for example, when proofreading text. Identifyinganother. To reproduce an original message, typos is possible only when a language does nothowever, the last code must invert the aggregate of employ all combinatorially possible words.the preceding ones. “Informition” is not an English word and assuming th According to Shannon’s 11 theorem, when the the writer is using Standard English, it can bechannel capacity identified as an error and corrected without uncertainty. English has been estimated to be about Max[T(S:R)] H(S), 70% redundant, which makes speech quite resistant to corruptions in the form of unclear pronunciationone can devise a code that reconstructs – up to a or acoustical interferences, and writing proofsmall error – the original messages from what was readable text, amenable to spellcheckers. Manytransmitted. technical communication processes avoid costly The distinction between transmission and redundancy, often to the detriment of their users.reproduction is central for cryptography. Telephone numbers, Passwords, and Zip Codes, for
example, are designed without it and tolerate no destroyed by a storm, without further distinguishinghuman error. who these people are and identifying the pieces that In his 10th theorem, Shannon proved that the led one to conclude they belonged to a house thecorrectability of corrupted communication channels storm destroyed. In communication theoreticalis limited by the amount of redundancy available in terms, mapping a sender’s information into fewereither the same or an additional channel. If meaningful categories amounts to equivocation.redundancy is unavailable, communication erodes – However, the information lost to equivocation mayanalogue to the second law of thermodynamics. not be meaningless entirely. It may contribute toWhen noise occurs, accurate communication must the aesthetic appreciation of high fidelity images.be paid for by additional redundancy. Conversely, taking a photograph and emailing it as an attachment requires only a few decisions. The camera captures far more, which its receiver may Digitalization well appreciate. Generally, humans select amongA good deal of Shannon’s work addresses the and compose messages from chunks of commonlyproblem of measuring communication of continuous understood packages of information, not individualvariables—sound, images, and motion—with bits.entropies that are defined for discrete phenomena. Weaver sought to extend Shannon’s theory, byMeanwhile, technology has caught up with identifying three levels of communication: (A) theShannon’s methods of digitalizing and quantifying technical problem of accurately reproducingcontinuous phenomena. Today we are constantly symbols, (B) the semantic problem of accuratelyconfronted with Shannon’s quantities. When buying reproducing the desired meanings of symbols, anda computer, we need to know its memory capacity (C) the effectiveness problem of causing desiredand speed, when attaching a file to an email, we conduct as the result of conveying desiredneed to be concerned for its size, and when signing meanings. Weaver conceived different codes asup for internet service, we need to be sure of a high operating on each level. However, the idea oftransmission rate. The bits of Shannon’s measures choices among alternatives that make a difference inor bytes in computer terms— 1 byte = 8 bits —have people’s lives underlies all three levels ofbecome indispensable in contemporary life. communication. Human Communicators Four MisconceptionsThe publication of Shannon’s theory encouraged Some claim Shannon’s theory is one of signalmany researchers to treat humans as channels of transmission. However, his is a content-freecommunication and measure their capacity of mathematical calculus. The physicality of messagesprocessing information. George A. Miller suggested has little to do with the quantities it defines. Easythat people reliably handle no more than seven calculation may favor its application to discreteplus/minus two bits simultaneously. There are phenomena, digital media for example, but hisestimates that reading comprehension cannot exceed quantifications are applicable to wherever its16 bits/sec. Such limits are soft, however. axioms are met and users are facing the question ofGenerally, experiences of information processing what they can or cannot transmit.overload cause stress, which results in errors that Communication literature typically bypassesreduce the capacity of humans to process Shannon’s mathematical conceptions, andinformation. interprets a simple schematic drawing that Human information processing capacities Shannon and Weaver (1949, p. 5 and 98) used totypically are a fraction of the amounts of contextualize the theory as “Shannon’s linearinformation various media transmit to us, say on communication model.” True, Shannon wastelevision screens. This does not render information concerned with the transmission of informationmeasures irrelevant to understanding human and reproduction of messages from one point tocommunication. Miller observed that humans another. This did not preclude extending theprocess information in chunks of familiar theory to circular communication structurescategories. So, on a high resolution photograph one (Krippendorff, 1986). Shannon’s theory providesmay recognize people and the ruins of a house
a versatile calculus, not a particularcommunication model. The pervasive use of the content metaphors inthe discourse of communication research easilymisleads communication theorists to interpretShannon’s entropies as measuring the informationcontent of messages. In fact, anticipating suchconfusions, Shannon refused to call his calculusinformation theory, and named his H-measures“entropies.” For him information is not an entitycontained in a message, but manifest in patterns thatare maintained during highly variable processes ofcommunication. Finally, the theory does not presume thatcommunicators share the same repertoire ofmessages, for example, having the samepreconceptions or speaking the same language. Byquantifying how choices made at one point affectthose made at another, the theory assertsfundamental limits on all levels of humancommunication. Klaus KrippendorffSee also Cybernetics, Information Theory, Uncertainty Reduction TheoryFurther ReadingsKrippendorff, K. (1986). Information Theory; Structural Models for Qualitative Data. Beverly Hills, CA: Sage.Shannon, C. E. & Weaver, W. (1949). The Mathematical Theory of Communication. Urbana, IL: University of Illinois Press.