Transcript of "Should a computer compete for language?"
Should a computer compete for language?
An exploration of a computer acquiring natural language using brain modelling, evolution and affective computing

Why should a computer acquire natural language?

I think it is useful to pursue the quest of teaching a computer natural language. It is useful because it would help us cope with the information overload and filter failure we face in our world. Filter failure is a concept that Clay Shirky, a professor at the Interactive Telecommunications Program, coined for the information explosion of the Internet era. [1] He points out that there have been more books than anybody could read since the sixteenth century, so information overload is not the new problem we are facing with the Internet. The problem we face today is that the natural filters that used to exist have disappeared. An encyclopedia, for example, could only have a limited number of pages, and a television station can only air one programme at a time. Look at the modern variants of these media: Wikipedia wants to collect all human knowledge and has over 3.5 million articles, and YouTube gets 24 hours of footage uploaded every minute. The filters of television and paper encyclopedias have ceased to exist, and the floodgates are open.

Putting human filters back in does not seem useful, because the amount of data only grows and the human brain is not keeping up. A computer might: it is able to run for days or months just analysing texts for relevant content. A computer then becomes a personal filter between the enormous amount of data that is available and the material that is relevant to a certain user, understanding which information in an argument is relevant. A computer is able to filter for a specific person, rather than imposing the taste and opinion of human filters. I think the quest of teaching a computer natural language is relevant, and even necessary to manage the enormous flood of data.

Our Internet is more and more tailor-made for our interests.
This seems to solve the problems mentioned above, but it can actually make them worse: we will come to live more and more in a filter bubble. [2] Our world is tailor-made for us without us knowing what is filtered out. Google will give you different results based on some 60 parameters without you even being logged in to Google. This is a major issue if you take into account that even news via your social network is filtered by Facebook. You will only see progressive news as a progressive voter and only conservative news as a conservative voter. Your views will never be challenged by an algorithmic gatekeeper, unless that gatekeeper understood the content and which views oppose each other. Then it could give you a regular, solid argument that opposes your view, or even completely different texts on the Internet that do not cover the subjects you like but share the same style of writing.

If we do not tackle this problem, we will all float in our own filter bubbles without ever finding anything that opposes our views. The idea of a free Internet where every voice is equal is gone, and a great foundation is laid for more extremist views to grow.

[1] Shirky, C. (2008)
[2] Pariser, E. (2011)
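The personal filter described above can be made concrete with a very rough sketch. It scores a text by simple bag-of-words overlap with a user's interests; the profile, texts and scoring scheme are invented for illustration, and real language understanding would of course go far beyond word matching.

```python
from collections import Counter
import math

def relevance(text: str, interests: list[str]) -> float:
    """Score a text against a user's interests: count how often each
    interest word occurs (dampened with log1p so one repeated word
    does not dominate), averaged over the interest list."""
    words = Counter(w.lower().strip(".,!?") for w in text.split())
    hits = sum(math.log1p(words[i]) for i in interests if i in words)
    return hits / len(interests)

# Hypothetical user profile and incoming texts.
interests = ["language", "brain", "computer"]
texts = [
    "The brain acquires language through exposure.",
    "Stock markets fell sharply on Monday.",
]

# The "personal filter": rank incoming texts by relevance to this user.
ranked = sorted(texts, key=lambda t: relevance(t, interests), reverse=True)
```

Such a filter is itself still built on hard-coded word matching; the argument that follows is that a genuinely useful filter would need to understand content, not just match words.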
How are we going to teach a computer natural language?

The problems with natural language are big. There are an infinite number of grammatically correct sentences in a language, but an even greater number of sentences that are incorrect. And the words used within these languages are not even clearly defined. Take the concept of running: a person can run, an engine can run, a river runs and a nose can run. The same word can mean many things in different sentences. If you want to describe the grammatical rules and the meaning of every word in a language, you will get into trouble, because a word can mean so many things in a different context.

We need breakthrough innovations to tackle the problem of natural language in computers. So how are we going to find these breakthrough innovations? [3] First we are going to put the problem of natural language in a new and different context, away from the idea of Turing machines, data and building specific programs. We need to cope with input that is noisy but predictable, and look to heuristics instead of hard-coded rules. After explaining this context I will explain Hierarchical Temporal Memory (HTM), which can cope with the parameters mentioned above and is modelled on the neocortex. [4] This is the brain area where, for example, higher vision, hearing and language are processed.
Next to this algorithm I suggest some concepts from affective computing to change the state of the system according to changes in the input.

Evolution and memes in the context of language

Our human brain is the only device that has been able to acquire language. The human brain is a product of evolution, so can we learn something from evolution that makes it easier for a computer to acquire language? [5] There are three forces that govern evolution:

1. Variation: In the context of genetic evolution these are the mutations in the DNA when it is copied. These variations make sure that novel qualities can arise that might be beneficial to the animal the DNA is present in. Think of a mutation that makes a bacterium resistant to penicillin.

2. Selection: Selection is the force that prevents all variations from surviving. This can be anything in the environment that prohibits the transfer of genes, either by killing an animal or by preventing its reproduction. In the bacteria example the selection criterion may be a penicillin-rich environment: selection would give the advantage to the bacteria that carry the penicillin-resistant piece of DNA over the non-resistant bacteria.

3. Heredity: The traits that made a certain DNA mutation survive should pass on to the next generation. This seems obvious, but it is an essential part of evolution: if a trait could not be passed from one generation to the next, the whole process of variation and selection could not be exploited.

Why are these three processes so important? Because they also play a key role in our brains. Next to genetic evolution, the human brain undergoes a memetic evolution. Culture, customs and language are not transferred via genes but via so-called memes. So what is a meme? A meme is anything that can be copied between brains: literally any concept, behaviour or word. The same process applies as with genetic evolution. There is variation in memes; a good example is the party game where a sentence is passed along by whispering it in each other's ears. The more people you add to the line, the more the sentence is transformed. You can see this as variation. Selection: some ideas stick in people's minds and others do not. The limited capacity of a human brain, and the time it would take to pass on all the ideas you have, also make sure that some memes are selected above others. And heredity in the world of memetics is ensured by the sharing of ideas between brains.

So does this memetic evolution affect language? It does: words get new meanings and new words are added. Some words become old-fashioned, or even complete languages die out because they are not taught to a new generation. Just like a species of animal, a language must adapt to its environment or go the way of the dodo. So a computer that learns language must be as adaptable and ever-changing as language itself. Language is not a data set, it is a process, so the acquisition of language should be process-focused.

[3] Baldwin, C. Y. (2009)
[4] Hawkins, J. (2007)
[5] Dawkins, R. (1978)
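The interplay of variation, selection and heredity described above can be sketched as a toy evolutionary loop. Everything here is invented for illustration: a "trait" is just a string, the environment's ideal is the word "language", and the mutation rate and population size are arbitrary.

```python
import random

random.seed(1)
TARGET = "language"                      # hypothetical ideal trait
ALPHABET = "abcdefghijklmnopqrstuvwxyz"

def fitness(candidate: str) -> int:
    # Selection criterion: how many letters fit the "environment".
    return sum(a == b for a, b in zip(candidate, TARGET))

def mutate(parent: str) -> str:
    # Variation: copying is imperfect; each letter may change.
    return "".join(random.choice(ALPHABET) if random.random() < 0.1 else c
                   for c in parent)

# Heredity: each generation is copied (imperfectly) from the fittest.
best = "".join(random.choice(ALPHABET) for _ in TARGET)
for generation in range(1000):
    children = [best] + [mutate(best) for _ in range(19)]
    best = max(children, key=fitness)    # selection keeps the winner
    if best == TARGET:
        break
```

In the memetic reading, `mutate` plays the role of the whisper game and `fitness` that of ideas sticking in minds; the point is only that these three forces together are enough to drive adaptation.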
And it should involve competition, just as our brains themselves arose from genetic and memetic evolutionary competition.

Imperfect data and time in the context of language

Our brains, the world and our senses are full of noise. Still, our brains are very capable of coping with these imperfections. Compare this to the world of computers and you see that cracking language will be hard with traditional Turing-based computing. Knock out one bit in computer memory and it will not function; change a bit in a file and the file is corrupted. After a heavy night of drinking that knocks out some brain cells, our brain still functions. We can understand people in a crowded room with a lot of other people talking. Our brains are built to cope with imperfect data, while a computer is much better at tasks where accuracy is required, like calculus. Why is this?

I think this has to do with two things, the first of which is heuristics versus rules. A computer copes well with rules that must be followed in exactly the same way every time. This is great for doing calculus, but not so great for tasks in the natural world. Take, for example, making an algorithm that can distinguish between a cat and a dog: you can feed a computer thousands of images of cats and dogs, but it will not be able to find a general rule. A computer that learns language should be able to cope with heuristics. Heuristics are rules of thumb, not hard-coded if-else statements, and are therefore better able to cope with imperfect data.

You could argue that this is not really necessary, since the human brain understands text only if it is grammatically correct. But you can add a lot of noise to a text and a brain will still be able to decipher it. The text below shows how, even with a lot of noise added, a human is still able to get the meaning from it. [6]

"Aoccdrnig to a rscheearch at Cmabrigde Uinervtisy, it deosnt mttaer in waht oredr the ltteers in a wrod are, the olny iprmoetnt tihng is taht the frist and lsat ltteer be at the rghit pclae. The rset can be a toatl mses and you can sitll raed it wouthit porbelm. Tihs is bcuseae the huamn mnid deos not raed ervey lteter by istlef, but the wrod as a wlohe."

We do not read every individual letter but the word as a whole. The shape and height of different letters also let us read much faster.

"AN ALL-UPPERCASE TEXT IS MUCH MORE UNPLEASANT TO READ, than text with lowercase letters. We recognize different letters by their height."

For a computer there is no real difference between uppercase and lowercase letters as a data structure. As long as the data is valid, there is no problem for a computer to interpret it; it is the scrambled text above that would cause a problem. This is the fundamental difference between humans and computers. For a computer, validity is more important than structure; for a human, structure is much more important than validity. We can see past a spelling error, but knock out all the line breaks, tabs and spaces in a piece of computer code and it becomes unreadable for us. For a computer it is the other way around: the spelling mistake makes the code unusable, while the line breaks, spaces and tabs serve no function. Hard-coded rules demand validity above overall structure; heuristics demand overall structure but can handle much messier and more imperfect input.

A brain model for language

Hierarchical Temporal Memory is a way of modelling the human neocortex.
The neocortex is the place where higher vision, hearing and language are processed. This is interesting because coping with noisy input is exactly what this part of the brain does, and it is responsible for language. So if there is a way of using brain structures to understand language, it is at the neocortex we should look.

Hierarchical Temporal Memory (HTM) is based on concepts that are interesting for computers understanding language. The main one is that it is temporal, meaning that things are recognized in sequence. Language is sequential: it exists in sentences, and sentences exist in paragraphs. So a temporal system makes sense for understanding language.

[6] Rawlinson, G. E. (1976)
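To illustrate the temporal idea, that what comes next is recognized from the sequence seen so far, here is a toy next-word predictor. The corpus and names are invented, and a real HTM is far more sophisticated than counting word pairs.

```python
from collections import defaultdict, Counter

# Hypothetical training sentences; a real system would see far more text.
corpus = [
    "a nose can run",
    "an engine can run",
    "a river can run",
]

# Learn temporal transitions: which word tends to follow which.
transitions = defaultdict(Counter)
for sentence in corpus:
    words = sentence.split()
    for current, nxt in zip(words, words[1:]):
        transitions[current][nxt] += 1

def predict(word: str) -> str:
    """Most likely next word, given the sequences seen so far."""
    return transitions[word].most_common(1)[0][0]
```

Even this crude sequence memory echoes the earlier point about "run": the word is predictable from its temporal context even though its meaning shifts.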
The model is hierarchical, meaning it forms a pyramid-like structure with a broad base and a narrow top (see the image below). This is perfect for language, because such a system could first take up words (input image), then a paragraph (level 0), move up through the hierarchy, and eventually answer the questions "what is this text about?" (level 1) and "is it interesting for user X?" (level 2).

So how does this HTM work? It is currently used mostly for computer vision systems, but because the neocortex uses the same structure for language processing as for vision, this is no problem. The system takes a group of pixels and watches them; if a pattern change occurs it fires up to the next layer, but it also fires within its own layer to knock out competing cells. In this way information travels up the hierarchy and only the most efficient representation "survives". So the system competes with itself on each layer of the hierarchy, and the most effective pattern survives, just as in the brain different groups of brain cells compete and only the most effective paths survive. This way of representing the world in entities and connections is how our brain works, so it is effective for interpreting the products of those brains, such as texts. A text can be represented hierarchically too, with words at the bottom, sentences in the next layer, paragraphs in the layer on top of that, and the full text at the top. The text gets summarized into a few nodes or words at the top level. This is how the brain stores information: by finding a common denominator. If you give people a list of fruits without the word "fruit" itself and quiz them later on the content of the list, they are sure the word "fruit" was in it, because "fruit" is the common label of the individual items. With the HTM system a computer is able to make this same "mistake".
It does not try to explain the complete data set of a text, but tries to find the common denominator, or the subject, of the text.

The HTM method does not search for clear mathematical outlines but is useful for noisy, heuristic problems. It is well suited to the problems raised in the chapters above: it works well with noisy data, and it competes with itself on every step of the hierarchy, so an "evolutionary" process is facilitated.

Conclusion
To overcome the flood of information that comes in via the Internet and all other forms of media, we need to install new filtering systems. A computer that could understand a text and judge whether it is relevant for a user could be a solution. To achieve this we need a breakthrough innovation, because Turing-machine-based algorithms will not solve the problem of language. So we need to look to the only device that has solved the problem: the human brain. The model we use should, like the human brain, be able to cope with ambiguity and noise. Next to that, it should compete in a "Darwinian" struggle. Hierarchical Temporal Memory has these attributes and is based on the neocortex, the part of the human brain where language resides. To solve the problem of teaching a computer language, this is the model to use.

References

Baldwin, C. Y., von Hippel, E. A. (2009). "Modeling a Paradigm Shift: From Producer Innovation to User and Open Collaborative Innovation." Working paper, Cambridge, 23 December 2009. http://papers.ssrn.com/sol3/papers.cfm?abstract_id=1502864

Dawkins, R. (1978). "The Selfish Gene." Oxford University Press, USA.

Hawkins, J. (2011). "Hierarchical Temporal Memory." Numenta Inc., September 2011. http://www.numenta.com/htm-overview/education/HTM_CorticalLearningAlgorithms.pdf

Hawkins, J., George, D. (2007). "Hierarchical Temporal Memory: Concepts, Theory, and Terminology." Numenta Inc., 27 March 2007. http://www.numenta.com/htm-overview/education/Numenta_HTM_Concepts.pdf

Pariser, E. (2011). "The Filter Bubble." Penguin Press HC, 12 May 2011.

Shirky, C. (2008). "It's Not Information Overload. It's Filter Failure." Web 2.0 Expo NY, 19 September 2008. http://blip.tv/web2expo/web-2-0-expo-ny-clay-shirky-shirky-com-it-s-not-information-overload-it-s-filter-failure-1283699

Rawlinson, G. E. (1976). "The significance of letter position in word recognition." Unpublished PhD thesis, Psychology Department, University of Nottingham, Nottingham, UK.