
Presentation notes
Slide 1: Focus on the engineering paradigm in IR. Start with some historical context.

Slide 2: I think several of you will have heard of this man, even if you don't recognize him by photo.

Slide 3: Claude Shannon (1916–2001), an American mathematician, electronic engineer, and cryptographer who is credited as the founder of "information theory" and who made major contributions to the fields of electrical engineering and computer science. He did graduate work at MIT, where he worked with...

Slide 4: Vannevar Bush, and for the purposes of our story, more specifically with Bush's differential analyzer: an analog computer used in World War II to calculate ballistics tables. Containing more than a thousand gears, the machine took up an entire room. It was tediously programmed by physically changing the gears with a screwdriver and wrench, and the output was displayed as graphs. Shannon, however, was a tinkerer by nature; he liked working with the machine and would program the calculator with other scientists' equations. While studying the relay switches on the differential analyzer as they went about formulating an equation, Shannon noted that the switches were always either open or closed: on or off. This led him to think about a mathematical way to describe the open and closed states, and he recalled the logical theories of the mathematician George Boole (which he had studied as an undergrad at Michigan).
Slide 5: George Boole, who in the mid-1800s advanced what he called the logic of thought, in which all equations were reduced to a binary system consisting of zeros and ones. By reducing information to a series of ones and zeros, Shannon wrote, information could be processed using on-off switches. He also suggested that these switches could be connected in such a way as to perform more complex equations, going beyond simple 'yes' and 'no' statements to 'and', 'or', and 'not' operations. In his thesis, "A Symbolic Analysis of Relay and Switching Circuits," Shannon proved that Boolean algebra could be used to simplify the arrangement of the relays that were the building blocks of the electromechanical automatic telephone exchanges of the day.

Slide 6: If you're building a long-distance telephone system, a more efficient and systematic way to arrange these relays is a huge deal. Before this, the switching circuits had been designed by individuals ad hoc. This was a huge early success for Shannon.

Slide 7: The work was also fundamental to digital circuit design and the development of computers once it became widely known in the electrical engineering community during and after World War II. It was while working at Bell Labs that Shannon introduced the term 'bit' (short for binary digit) as a measure of information. A bit is the amount of information stored by a digital device or other physical system that exists in one of two possible distinct states: open or closed, punched or not. More precisely, you can define a bit as the information that is gained when the value of such a binary variable becomes known. You could now quantify how much information was in a message by how many bits of information it contained. During the war he joined Bell Labs, where he wrote…
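Shannon's thesis result can be made concrete with a small sketch (my own illustration, not from the slides): two relay arrangements are interchangeable exactly when they compute the same Boolean function, so an algebraic identity such as (A AND B) OR (A AND NOT B) = A licenses replacing a two-branch relay network with a single relay. A quick exhaustive truth-table check:

```python
from itertools import product

# A relay network computes a Boolean function of its switch states
# (open = False, closed = True). Boolean algebra can prove two
# networks equivalent without building either one.

def two_branch(a, b):
    # Two parallel branches, each containing two relays in series.
    return (a and b) or (a and not b)

def single_relay(a, b):
    # Boolean simplification says one relay suffices.
    return a

# Exhaustively verify the identity over all four switch states.
assert all(two_branch(a, b) == single_relay(a, b)
           for a, b in product([False, True], repeat=2))
print("networks equivalent")
```

Exhaustive checking works here because a network of n switches has only 2^n states; for the telephone exchanges of the day, the algebraic manipulation itself was the practical payoff.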
Slide 8: The Mathematical Theory of Communication, which was declassified and published in 1948. And this brings us to Shannon's conclusion, as an enormously successful engineer who had written what is arguably the most important master's thesis of the 20th century, who crossed paths with Einstein as a Research Fellow, and who worked with Turing during the war. Viewing information from a communication perspective, he wrote: "Frequently the messages have meaning; that is they refer to or are correlated according to some system with certain physical or conceptual entities. These semantic aspects of communication are irrelevant to the engineering problem." Shannon founded the field of information theory, and his work underlies a lot of modern natural language processing.

Slide 9: This should also bring to mind something we've seen in class: "As pointed out earlier, the method to be developed here is a probabilistic one based on the physical properties of written texts. No consideration is to be given to the meaning of words or the arguments expressed by word combinations." This reduces information retrieval to an engineering problem. Argue: there is a critical difference between these two. It's important to remember what engineering problem Shannon was working on: how to electrically transmit messages quickly and cheaply. The phone line doesn't care about the meaning of a conversation. Your word processor doesn't care about the meaning of words.
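The claim that a message's information content can be counted in bits, with no reference to meaning, can be sketched numerically (a minimal illustration of Shannon's definition, not from the slides; `entropy_bits` is my own helper name). Learning the value of a binary variable that takes value with probability p yields -log2(p) bits; the expected value over all outcomes is the entropy:

```python
import math

def entropy_bits(probs):
    """Expected information, in bits, gained when the outcome of a
    discrete random variable with these probabilities becomes known."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# A fair coin carries exactly 1 bit per toss.
print(entropy_bits([0.5, 0.5]))

# A biased coin is more predictable, so each toss carries less
# than 1 bit; meaning never enters the calculation.
print(entropy_bits([0.9, 0.1]))
```

Note that the formula sees only the probabilities of the symbols, never what they refer to; this is precisely the sense in which the semantic aspects are irrelevant to the engineering problem.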
Slide 10: Is the paradigm of science and engineering applicable to information studies, and particularly to IR?

Slide 11: To illustrate this, Wilson quotes a sentence from Gibbon's The History of the Decline and Fall of the Roman Empire: "The Latins of Constantinople were on all sides encompassed and pressed: their sole hope, the last delay of their ruin, was in the division of their Greek and Bulgarian enemies; and of this hope they were deprived by the superior arms and policy of John Vataces, emperor of Nice."

"Classification by subjects would be an exceedingly useful method if it were practicable, but experience shows it to be a logical absurdity." W. Stanley Jevons, 1905