
Deep misconceptions and the myth of data driven NLU


Early efforts to find theoretically elegant formal models for various linguistic phenomena did not result in any noticeable progress, despite nearly three decades of intensive research (late 1950s through the late 1980s). As the various formal (and in most cases mere symbol manipulation) systems seemed to reach a deadlock, disillusionment in the brittle logical approach to language processing grew, and a number of researchers and practitioners in natural language processing (NLP) started to abandon theoretical elegance in favor of attaining some quick results using empirical (data-driven) approaches.

All seemed natural and expected. In the absence of theoretically elegant models that could explain a number of NL phenomena, it was quite reasonable to find researchers shifting their efforts to finding practical solutions for urgent problems using empirical methods. By the mid 1990s, a data-driven statistical revolution that was already brewing took the field of NLP by storm, putting aside all efforts that were rooted in over 200 years of work in logic, metaphysics, grammars and formal semantics.

We believe, however, that this trend has overstepped the noble cause of using empirical methods to find reasonably working solutions for practical problems. In fact, the data-driven approach to NLP is now believed by many to be a plausible approach to building systems that can truly understand ordinary spoken language. This is not only a misguided trend, but a very damaging development that will hinder significant progress in the field. In this regard, we hope this study will help start a sane, and an overdue, semantic (counter) revolution.



  1. Deep Misconceptions and the Myth of Data-Driven Language Understanding: On Putting Logical Semantics Back to Work
  2. IMMANUEL KANT: "Everything in nature, in the inanimate as well as in the animate world, happens according to some rules, though we do not always know them." RICHARD MONTAGUE: "I reject the contention that an important theoretical difference exists between formal and natural languages." JERRY HOBBS: "One can assume a theory of the world that is isomorphic to the way we talk about it… in this case, semantics becomes very nearly trivial."
  3. a spectre is haunting NLP (Copyright © 2017 WALID S. SABA, February 7, 2017)
  5. about the resurgence of, and the currently dominant paradigm in, 'AI' …
  6. the availability of huge amounts of data, coupled with advances in computer hardware and distributed computing, resulted in some advances in certain types of (data-centric) problems (image, speech, fraud detection, text categorization, etc.)
  7. But … many problems in AI require understanding that is beyond discovering patterns in data
  8. Identifying an adult female in an image is a data-centric problem that might be suitable for data-driven image recognition systems. However, inferring which of the two is a photo of a teacher and which a photo of a mother requires information that is not (always!) in the data.
  9. which picture would a data-driven image recognition system pick out for a query like 'musical band'?
  10. which picture would a data-driven image recognition system pick out for a query like 'musical band'? And for 'musician', a person who plays a musical instrument?
  11. So what is at issue here? The issue is that, ontologically, there are no musicians, teachers, lawyers, or even mothers! What exists, ontologically (metaphysically), are humans, and a concept such as 'musician' is a logical concept that might be true of a certain human. Quantitative/data-driven approaches can only reason with (detect, infer, recognize) objects that are of an ontological type; they cannot detect logical concepts, which form the majority of the objects of (human) thought.
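The ontological/logical split can be made concrete in code. A minimal sketch, with all names illustrative (not from the study): the ontological type is a class, while logical concepts are mere predicates that may hold of an instance of that type.

```python
from dataclasses import dataclass

# Ontological type: the kind of thing that exists.
@dataclass
class Human:
    name: str
    plays_instrument: bool = False
    teaches: bool = False

# Logical concepts: predicates that may be true of a Human,
# not separate kinds of entity in the ontology.
def musician(x: Human) -> bool:
    return x.plays_instrument

def teacher(x: Human) -> bool:
    return x.teaches

olga = Human("Olga", plays_instrument=True)
print(musician(olga), teacher(olga))  # True False
```

Nothing in the ontology changes when Olga stops playing; only the truth value of the logical concept does.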
  12. ONTOLOGICAL CONCEPTS: human … LOGICAL CONCEPTS: lawyer, dancer, teacher, mother, ...
  13. failure to distinguish between logical and ontological concepts is not only a flaw in data-driven approaches; logical/formal semantics also failed to provide adequate models for natural language, and for exactly the same reason
  14. Notwithstanding achievements in data-centric tasks (e.g., image and speech recognition, or numerically specifiable and finite-space problems, such as the game of Go), statistical and other data-driven models (e.g., neural networks) cannot model human language comprehension because these models cannot explain, model, or account for very important phenomena in ordinary spoken language, such as: • Non-Observable (thus Non-Learnable) Information • Intensionality and Compositionality • Inferential Capacity
  15. what this study is not about: Criticisms of the statistical data-driven approach to language understanding are very often automatically associated with the Chomskyan school of linguistics. At best, this is a misinformed judgement (although in many cases, it is ill-informed). There is a long history of work in logical semantics (a tradition that forms the background to the proposals we will make here) that has very little to do (if anything at all) with Chomskyan linguistics. Notwithstanding Chomsky's (in our opinion valid) Poverty of the Stimulus (POS) argument, an argument that clearly supports the claim of some kind of innate linguistic abilities, we believe that Chomskyans put too much emphasis on syntax and grammar (which ironically made their theory vulnerable to criticism from the statistical and data-driven school). Instead, we think that syntax and grammar are just the external artifacts used to express internal, logically coherent, semantic, and compositionally and productively (i.e., recursively) constructed thoughts, something that is perhaps analogous to Jerry Fodor's Language of Thought (LOT). Here we should also mention that we agree somewhat with M. C. Corballis ('The Recursive Mind') that it is thought that brought about the external tool we call language, and not the other way around.
  16. what this study is not about: Another association that criticism of the statistical and data-driven approaches to NLU often conjures up is that of building large knowledge bases with brittle rule-based inference engines. This is perhaps the biggest misunderstanding, held not only by many in the statistical and data-driven camp, but also by previously over-enthused knowledge engineers who mistakenly believed at one point that all that was required to crack the NLU problem was to keep adding more knowledge and more rules. We also do not subscribe to such theories. In fact, regarding the above, we agree with an observation once made by the late John McCarthy (at IJCAI 1995) that building ad hoc systems by simply adding more knowledge and more rules will result in building systems that we don't even understand. Ockham's Razor, as well as observing the linguistic skills of 5-year-olds, should both tell us that the conceptual structures that might be needed in language understanding should not, in principle, require all that complexity. As will become apparent later in this study, the conceptual structures that speakers of ordinary spoken language have access to are not as massive and overwhelming as is commonly believed. Instead, it will be shown that the key is in the nature of that conceptual structure and the computational processes involved.
  17. what this study is not about: FINALLY, our concern here is in introducing a plausible model for natural language understanding (NLU). If your concern is natural language processing (NLP), as it is used, for example, in applications such as these: word-sense disambiguation (WSD); entity extraction/named-entity recognition (NER); spam filtering, categorization, classification; semantic/topic-based search; word co-occurrence/concept clustering; sentiment analysis; topic identification; automated tagging; document clustering; summarization; etc., then it is best if we part ways at this point, since this is not at all our concern here. There are many NLP and text processing systems that already do a reasonable job on such data-level tasks. In fact, I am part of a team that developed a semantic technology that does an excellent job on almost all of the above, but that system (and similar systems) are light years away from doing anything remotely related to what can be called natural language understanding (NLU), which is our concern here.
  18. what this study is about: (1) WE WILL ARGUE THAT purely data-driven extensional models that ignore intensionality, compositionality and inferential capacities in natural language are inappropriate, even when the relevant data is available, since higher-level reasoning (the kind that's needed in NLU) requires intensional reasoning beyond simple data values. (2) WE WILL ARGUE THAT many language phenomena are not learnable from data because (i) in most situations what is to be learned is not even observable in the data (or is not explicitly stated but is implicitly assumed as 'shared knowledge' by a language community); or (ii) in many situations there's no statistical significance in the data, as the relevant probabilities are all equal. (3) WE WILL ARGUE THAT the most plausible explanation for a number of phenomena in natural language is rooted in logical semantics, ontology, and the computational notions of polymorphism, type unification, and type casting; and we will do this by proposing solutions to a number of challenging and well-known problems in language understanding.
  19. more specifically ... We will propose a plausible model rooted in logical semantics, ontology, and the computational notions of polymorphism, type casting and type unification. Our proposal provides a plausible framework for modelling various phenomena in natural language, and specifically phenomena that require reasoning beyond the surface structure (external data). To give a hint of the kind of reasoning we have in mind, consider the following sentences: (1) a. Jon enjoyed the movie b. Jon enjoyed watching the movie (2) a. A small leather suitcase was found unattended b. A leather small suitcase was found unattended (3) a. The ham sandwich wants another beer b. The person eating the ham sandwich wants another beer (4) a. Dr. Spok told Jon he should soon be done with writing the thesis b. Dr. Spok told Jon he should soon be done with reading the thesis. Our model will explain why (1a) is understood by all speakers of ordinary language as (1b); why speakers in multiple languages find (2a) more natural to say than (2b); why we all understand (3a) as (3b); and why we effortlessly resolve 'he' in (4a) with Jon and 'he' in (4b) with Dr. Spok. Before we do so, however, we will discuss some serious flaws in proposing a statistical and data-driven approach to NLU.
  20. understanding language by analyzing data? what if the relevant information is not even in the data?
  21. analyzing missing text? it's not even in the data: Challenges in the computational comprehension of ordinary text are often due to quite a bit of missing text, text which is not explicitly stated but is often assumed as shared knowledge among a community of language users. Consider for example the sentences in (1): (1) a. Don't worry, Simon is a rock. b. The truck in front of us is annoying me. c. Carlos likes to play bridge. d. Mary enjoyed the apple pie. e. Jon owns a house on every street in the village. Clearly, speakers of ordinary English understand the above as (2) a. Don't worry, Simon is [as solid as] a rock. b. The [person driving the] truck in front of us is annoying me. c. Carlos likes to play [the game] bridge. d. Mary enjoyed [eating] the apple pie. e. Jon owns a [different] house on every street in the village. Since such sentences are quite common and are not at all exotic, farfetched, or contrived, any model for NLU must clearly somehow 'uncover' this [missing text] for a proper understanding of what is being said. What is certain here is that data-driven approaches are helpless in this regard, since a crucial part of understanding NL text is not only interpreting the data, but 'discovering' what is missing from the data.
  22. analyzing missing text? it's not even in the data: Again, let us consider the sentences below, where there is some [missing text] that is not explicitly stated in everyday discourse, but is often implicitly assumed: a. Don't worry, Simon is [as solid as] a rock. b. The [person driving the] truck in front of us is annoying me. c. Carlos likes to play [the game] bridge. d. Mary enjoyed [eating] the apple pie. e. Jon owns a [different] house on every street in the village. Although the above seem to have a common denominator, namely some missing text that is often implicitly assumed, it is somewhat surprising that in looking at the literature one finds that the missing text phenomenon has been studied quite independently and under different labels, such as metaphor (a), metonymy (b), lexical ambiguity (c), ellipsis (d), and quantifier scope ambiguity (e).
  23. In ordinary spoken language there's more than missing (and implicitly assumed) text … When surface data probabilities are all equally likely, we often resort to our shared (commonsense) knowledge in resolving certain types of ambiguities (e.g., in reference resolution)
  24. probabilities are all equal (it's not even in the data): One of the most obvious challenges to statistical and data-driven NLU are situations where there does not seem to be any statistical significance in the observed data that can help in making the right inferences. As an example, consider the sentences in (1) and (2). (1) The trophy did not fit in the brown suitcase because it was too a. big b. small (2) Dr. Spok told Jon that he should soon be done a. writing his thesis b. reading his thesis. For a speaker of ordinary language, the decision as to what 'it' in (1) and 'he' in (2) refer to is immediately obvious, even for a 5-year-old. On the other hand, a statistical data-driven approach would be helpless in making such decisions, since the only difference between the sentence-pairs in (1) and (2) are words that co-occur with equal probabilities (this is so because antonyms or opposites, such as big/small, night/day, hot/cold, read/write, open/close, etc. have been shown to co-occur in text with equal frequency). Clearly, then, references such as those in (1) and (2) must be resolved using information that is not (directly) in the data.
  25. probabilities are all equal (it's not even in the data): In the absence of any statistical significance in the data, we have suggested above that references such as those in sentences (1) and (2) are resolved by relying on other information that is not (directly) in the data. It might still be suggested, however, that a learning algorithm can create statistical significance between (1a) and (1b), for example, if probabilities of some composites in the sentence (as opposed to the atomic units) are considered. What this would essentially require is creating a composite feature for every possible relation. In (1), we would need at least the following: trophy-fit-in-suitcase-small trophy-fit-in-suitcase-big trophy-not-fit-in-suitcase-small trophy-not-fit-in-suitcase-big. Note here that since data-driven approaches also do not admit the existence of a type hierarchy (or any knowledge structure, for that matter), i.e., there's nothing that says that a Trophy and a Radio are both subtypes of an Artifact, and that Purse and Suitcase are both subtypes of some Container, where the 'fit' relation applies similarly to both, other features (e.g., radio-fit-in-purse-small) would also be needed to learn how to resolve the reference 'it' in (1).
  26. probabilities are all equal (it's not even in the data): Again, in the absence of a type hierarchy (or some other source of information), statistical significance can only be salvaged if composite features are constructed for every possible relation in a meaningful sentence. Such a story leads us to something like this: trophy-fit-in-suitcase-small trophy-fit-in-suitcase-big trophy-not-fit-in-suitcase-small trophy-not-fit-in-suitcase-big radio-fit-in-purse-small radio-fit-in-purse-big radio-not-fit-in-purse-small radio-not-fit-in-purse-big etc. Although the point can be made with the above, the story in reality is much worse, as there are more 'nodes' that must be combined in these features to capture statistical significance. For example, if 'because' were changed to 'although' in (1b), then 'it' would suddenly refer to the trophy. Nevertheless, the question now is how many such features would eventually be needed, if every meaningful sentence requires a handful of composite features to capture all statistical correlations? Fodor and Pylyshyn (1988) hint that the number is on the order of the number of seconds in the history of the universe, citing an experiment conducted by the psycholinguist George Miller.
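The explosion is easy to quantify. A toy calculation (the vocabulary below is hypothetical and tiny) showing how the number of composite features multiplies with every slot that must be encoded:

```python
from itertools import product

# Hypothetical mini-vocabulary; real vocabularies are orders of magnitude larger.
objects     = ["trophy", "radio", "ball"]
polarities  = ["fit", "not-fit"]
containers  = ["suitcase", "purse", "box"]
sizes       = ["small", "big"]
connectives = ["because", "although"]

features = ["-".join(f)
            for f in product(objects, polarities, containers, sizes, connectives)]
print(len(features))  # 3 * 2 * 3 * 2 * 2 = 72, and every new slot multiplies again
```

Even this toy setting needs 72 composite features; adding one more binary distinction doubles the count, which is the combinatorial point being made above.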
  27. a top-down explanation (it's not even in the data): Incidentally, in the absence of any external knowledge structures, the combinatorially implausible explosion in the number of features needed by a statistical data-driven (i.e., bottom-up) learner would also be needed by a top-down learner, one that learns by being told (or by instruction). Specifically, a top-down learner would ask for n clarifications in every sentence, requiring therefore a total of n^m clarifications for a paragraph with m sentences. The reader can now easily work out how many clarifications would be required for a top-down learner to understand just a small paragraph [1]. The point here is that whether the learner tries to discover what is missing bottom-up (from the data) or top-down (by being told), the infinity lurking in language (due to the recursive productivity of thoughts) makes learning various language phenomena from data alone a computationally implausible theory. [1] The reason a top-down learner would need (n x n), as opposed to (n + n), clarifications for two consecutive sentences where each requires n is that the preferences of one sentence are subject to revision in the context of the previous and/or the following sentence. This is so because, linguistically, it is paragraphs, not sentences, that are the smallest linguistic units that can be fully interpreted on their own and should not (in theory) require any additional text to be fully understood. See The Semantics of Paragraphs (Zadrozny & Jenssen, 19xx) for an excellent treatment of the subject.
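The same point as a quick calculation (n = 5 clarifications per sentence is an arbitrary assumption for illustration):

```python
n = 5  # hypothetical clarifications needed per sentence
for m in (1, 2, 3, 10):
    # n ** m, not n * m: each sentence's preferences can be revised
    # in the context of its neighbouring sentences
    print(f"{m} sentence(s): {n ** m} clarifications")
```

At ten sentences the count is already 5^10 = 9,765,625 clarifications, which is the computational implausibility claimed above.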
  28. pragmatic probabilities (it's not even in the data): Our argument against statistical data-driven approaches in NLU is not meant to dismiss the role of statistical/probabilistic reasoning in language understanding. That would, of course, be unwise. Our argument is about which probabilities are relevant in language understanding. Consider, for example, the following: (1) The town councilors refused to give the demonstrators a permit because they advocated violence and anarchy. (2) A young teenager fired several shots at a policeman. Eyewitnesses say he immediately fled away. While the most likely reading for (1) has 'they' referring to the demonstrators, one can imagine a scenario where a group of anarchist town councilors refused to give the demonstrators a permit specifically to incite violence and anarchy. Similarly, while the most likely reading for (2) is the one where 'he' refers to the young teenager, one can imagine a scenario where a slightly wounded policeman fled away to escape further injuries. Obviously such occurrences are rare, and thus, in the absence of other information, the pragmatic probability of the usual reading wins over with speakers of ordinary language. What is important to note here is that the likelihoods we are speaking of are a function of pragmatics and have nothing to do with anything observed in the data.
  29. pragmatic probabilities (it's not even in the data): To summarize this argument, consider the four levels below. At the data level, references can be resolved during syntactic analysis using simple NUMBER or GENDER data. At the information level, the resolution requires semantic (type) information, for example that corporations, and not lawsuits, settle a case out of court; note also that at this level not all the possibilities remain available once the type constraints are applied. It is exactly at the pragmatic level where probabilistic/statistical reasoning factors in, since at this level the referents are all possible, yet some are more probable than others (e.g., it is more likely that the one who fell down is the one who was shot). DATA LEVEL, references resolved by syntax: John informed Mary that he passed the exam. John told Steve and Diane that they were invited to the party. INFORMATION LEVEL, references resolved by semantics: There are a number of lawsuits between Apple and Samsung, and a. both say they are more about values than patents and money b. both say they are ready to settle out of court. KNOWLEDGE LEVEL, references resolved by pragmatics: A young teenager fired several shots at a policeman. Eyewitnesses say he immediately fled away. INTENTIONAL LEVEL, references cannot be resolved (intention not clear): John told Bill that he has been nominated to head the committee.
  30. innate preferences? it's not even in the data: Perhaps chief among the "it's not even in the data" phenomena is that of Adjective-Ordering Restrictions (AORs), a phenomenon that can be explained by the examples below: (1) a. Carlos is a polite young man b. #Carlos is a young polite man (2) a. A small brown suitcase was found unattended b. #A brown small suitcase was found unattended. The readings in (1a) and (2a) are clearly preferred by speakers of ordinary spoken language over the readings in (1b) and (2b), although there are no rules that speakers of ordinary language seem to be following. What makes the AORs phenomenon even more intriguing is the fact that these preferences are also consistently made across multiple languages. First of all, this phenomenon presents a paradigmatic challenge to the statistical and data-driven story about language learning, as it does not seem that speakers come to have these preferences by observing and analyzing data. Furthermore, there does not seem to be a pattern in the observed data suggesting which adjectives should precede or follow other adjectives. For example, while it is preferred that 'small' precede 'brown' in (2), in (3) 'small' is no longer preferred as the first adjective: (3) A beautiful small suitcase was found unattended
  31. innate preferences? it's not even in the data: The most crucial challenge to data-driven NLU as it relates to adjective-ordering restrictions is to explain how 'beautiful' in (4a) could be describing Olga's dancing as well as Olga as a person, while this reading is not available in (4b): (4) a. Olga is a tall beautiful dancer b. Olga is a beautiful tall dancer. We will see later why 'beautiful' in (4b) can no longer modify Olga's dancing (an abstract entity of type Activity) after it was polymorphically cast into describing a physical object. For now we want to note, however, that while various investigations on large corpora have not yielded any plausible explanation as to what seems to govern these adjective-ordering restrictions, we argue that even if some patterns were to be discovered, the more important question is: what is behind this phenomenon, i.e., what is it that makes us have these ordering preferences, and across multiple languages? In our opinion, what is behind this phenomenon must be much deeper than the outside (observable) data of any language. In fact, we believe that a plausible account for this phenomenon must shed some light on the conceptual structures and the processes that are operating in language. As stated above, a plausible explanation for this puzzle, one that is rooted in ontology, polymorphism, type unification and type casting, will be suggested later in this study.
  32. We have thus far argued that, in the absence of some process or other source of information, a number of phenomena in natural language understanding cannot be observed, captured, or learned by simply analyzing the external linguistic data alone. Whether it is adjective-ordering restrictions, which seem to be not only data-independent but even language-independent, or the missing (not explicitly stated) text that must somehow be discovered and interpreted, or situations where probabilities in the data are statistically insignificant, it is clear that data-driven approaches to NLU are inappropriate. Before we get into our proposals, however, we will next have a small discussion about intensions and how data alone, even if available, is not enough in high-level reasoning, the kind that is needed in NLU.
  33. no matter how big, data is (in the end) just data: extensions and intensions
  34. data and intensions: What do we mean when we write an equality like this? (1) (A ∧ (B ∨ C)) = ((A ∧ B) ∨ (A ∧ C)) Clearly, as objects (e.g., as logical gates) the two expressions in (1) are not the same. For example, a logical circuit corresponding to the expression on the left-hand side has only two gates, while a circuit for the expression on the right-hand side would have three. It would seem then that at some level equality in data only is not enough, and saying two objects are the same is different from saying they are equal (in their data value). In some contexts, as will be seen shortly, these differences are crucial. What is crucial to our discussion here is that data-driven approaches deal with data only; that is, equality in that paradigm is equality of one attribute, namely the final value. Thus, if it does turn out that equality of data alone is not enough in high-level reasoning (e.g., in NLU), then data-driven approaches to NLU would also (or, again) clearly be inappropriate. Let us therefore take a closer look at the equality most of us know, and the related notions of intensions and extensions, notions that some of the most penetrating minds in mathematical logic have studied for nearly two centuries.
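The two sides of (1) can be checked mechanically: they agree on every truth assignment (extensional equality) while differing as objects, here in gate count. A minimal sketch:

```python
from itertools import product

# A ∧ (B ∨ C)  vs  (A ∧ B) ∨ (A ∧ C)
lhs = lambda a, b, c: a and (b or c)          # 2 gates: one OR, one AND
rhs = lambda a, b, c: (a and b) or (a and c)  # 3 gates: two ANDs, one OR

# Extensionally equal: the same value on all 8 possible inputs ...
assert all(lhs(*v) == rhs(*v) for v in product([False, True], repeat=3))

# ... yet not the same object: the corresponding circuits differ in gate count.
print("equal on all inputs, but 2 gates vs 3 gates")
```

The assertion exhausts the whole input space, which is exactly what "equal in data value" means here; nothing in that check sees the difference in structure.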
  35. data and intensions: Our grade school teachers once told us that √256 = 16. Can we always equate, and replace, the data value 16 by the data value √256? Let's see …
  36. data and intensions: Mary taught her little brother that 7 + 9 = 16. Now if we blindly follow what our grade school teachers told us, namely that √256 = 16, we should be able to replace 16 by √256 without any problem. But if we do that, we would then be able to alter reality and come up with a snapshot of some strange reality: Mary taught her little brother that 7 + 9 = √256. What happened? Were we taught the wrong thing when we were told that √256 = 16? Not exactly, but we were also not told the whole story. I guess our grade school teachers did not know we would end up working in AI and NLU. If they did, they would have told us that extensional (data-only) equality is not sufficient in high-level reasoning, and if equated with sameness at that level it can easily lead to false conclusions.
  37. data and intensions: equality and sameness. The four objects below are in fact equal, including √256 and 16, but in regard to one attribute only, namely their data value. As objects, however, they are not the same, as they differ in many other attributes, for example in the number of operators and the number of operands. Note however that the attributes value, no-of-operators, and no-of-operands are still not enough to establish true intensional equality between these objects, as demonstrated by the objects (a) and (b). At a minimum, true (intensional) equality between these objects would require the equality of at least four attributes: value, no-of-operators, no-of-operands, syntax-tree. In many domains where the only relevant attribute is the data (value), working with extensional (data) equality only might be enough. In tasks that require high-level reasoning, such as NLU, however, this will lead to contradictions and false conclusions, as the example of Mary and her little brother clearly demonstrates.
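The four attributes can be spelled out in a sketch (the attribute names mirror the slide; the class itself is illustrative): extensional equality compares the value alone, intensional equality compares the whole object.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Expr:
    value: int          # the extension: the data value
    n_operators: int
    n_operands: int
    syntax_tree: tuple  # the structure that carries the intension

sixteen    = Expr(value=16, n_operators=0, n_operands=1, syntax_tree=("16",))
seven_plus = Expr(value=16, n_operators=1, n_operands=2, syntax_tree=("+", "7", "9"))

assert sixteen.value == seven_plus.value  # equal (extensionally) ...
assert sixteen != seven_plus              # ... but not the same object
```

The dataclass-generated equality compares all four fields, which is the intensional notion; projecting onto `.value` recovers the extensional one.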
  38. data and intensions: As an aside … Reducing equality of objects to equality of one extensional attribute, namely the data value, is what is behind the so-called adversarial examples in deep neural networks, where small perturbations in the image (the kind that would not cause the human eye to make a different classification) will cause the network to classify the image in a completely different category. The same is true in the converse case, where a completely meaningless image (a blob of pixels) is classified with high certainty as a real-life object. That is, behind both of these phenomena is something similar to the fact that √256 is not always (and in all contexts) the same as 9 + 7, although certain calculations involving these data values might produce the same output value (bottom line: extensional data-only equality is not enough in high-level reasoning).
  39. 39. Copyright © 2017 WALID S. SABA Beyond grade school, we were told in high school that two functions, f and g, are equal (are the same) if for every input they produce the same output. In notation, this was expressed as f = g ≡ (∀x)(f(x) = g(x)). But this is not entirely true – or, our high school teachers did not tell us the whole truth: if two functions are equal whenever they agree on their input-output pairings, then MergeSort and InsertionSort would be the same objects, since for any sequence MergeSort(sequence) = InsertionSort(sequence). But computer scientists know that although their external values are always the same (that is, they are extensionally equal), MergeSort and InsertionSort are not the same objects, as they differ in many other (and very important) attributes – for example in their space and time complexity. yet another example data and intensions
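The point can be made concrete. In this Python sketch (the step counters are rough proxies for work done, not exact complexity measures), the two sorts always agree on their input-output pairings yet differ in how much work they do:

```python
def insertion_sort(seq):
    """O(n^2) in the worst case: shift each element left into place."""
    out, steps = list(seq), 0
    for i in range(1, len(out)):
        j = i
        while j > 0 and out[j - 1] > out[j]:
            out[j - 1], out[j] = out[j], out[j - 1]
            j, steps = j - 1, steps + 1
    return out, steps

def merge_sort(seq):
    """O(n log n): split, sort halves, merge; counts comparisons."""
    steps = 0
    def sort(s):
        nonlocal steps
        if len(s) <= 1:
            return s
        mid = len(s) // 2
        left, right = sort(s[:mid]), sort(s[mid:])
        merged = []
        while left and right:
            steps += 1
            merged.append(left.pop(0) if left[0] <= right[0] else right.pop(0))
        return merged + left + right
    return sort(list(seq)), steps

data = [5, 2, 9, 1, 7, 3, 8, 4, 6]
(a, sa), (b, sb) = insertion_sort(data), merge_sort(data)
print(a == b)  # True — extensionally equal on this input (and every input)
```

The outputs always coincide (extensional equality) while the step counts diverge, which is the intensional difference the slide appeals to.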
  40. 40. Copyright © 2017 WALID S. SABA data and reasoning Here we consider an example where working with extensions (data values) only and ignoring intensions can easily lead to absurd conclusions. Consider the facts shown in the table below. Now, according to the above, the teacher of Alexander the Great = Aristotle. Notice now that if we simply replace 'the teacher of Alexander the Great' with a value that is only extensionally equal to it, we can get an absurdity from a very meaningful sentence, as shown below. data and intensions
  41. 41. Copyright © 2017 WALID S. SABA Let us now consider examples illustrating why intensionality cannot be ignored in natural language understanding. Suppose we have a question-answering system that was to return the names of: (1) all the tall presidents of the United States? (2) all the former presidents of the United States? A simple method for answering (1) would be to get two sets, the set of names of all tall people and the set of names of all presidents of the United States, and simply return their intersection as the result. What about the query in (2), however? Clearly we cannot do the same, because we cannot, as in the case of tall, represent former by a set (an extension) of all former things. If we did, then Ronald Reagan, for example, would have been a 'former president' even while serving his term as president, because he would have been in both sets: the set of presidents, and the set of 'former things', as he was also a former actor. The point here is that unlike tall, which is an extensional adjective that can semantically be represented by a set (the set of all tall things), former is an intensional adjective that logically operates on a concept, returning a subset of that concept as a result. data and intensions data, intensions and reasoning
  42. 42. Copyright © 2017 WALID S. SABA Let us elaborate on this subject some more. The following is a plausible meaning for (1) and (2) above: (1) tall presidents of the United States ⇒ { x | is-president-of-the-us(x) ∧ is-tall(x) } (2) former presidents of the United States ⇒ { x | is-president-of-the-us(x) ∧ F(x, president) } What the above says is: (1) 'tall presidents of the United States' refers to any x that is in the set of presidents and also in the set of tall things; and (2) 'former presidents of the United States' refers to any x that is in the set of presidents and of which some F is also true. Clearly, what F does with an x is something to the effect of making sure that x was, at some point in time, but is not now, a president. The point here is that unlike is-tall(x), F is not a set, and has no extensional value, but is a logical expression that takes a concept and applies some condition, returning a subset of the original concept. All of this is not available in data-driven NLU, where both 'tall' and 'former' are adjectives that equally modify nouns, which, as we have seen, can result in contradictions when executed on real data. data and intensions data, intensions and reasoning
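The contrast between the two adjectives can be sketched as follows (Python; the sets and the history relation are invented toy data, purely for illustration): is-tall denotes a set, so intersection suffices, while former must operate on the history of a concept.

```python
# Extensional adjective: a set; intersection answers query (1).
presidents = {"Washington", "Lincoln", "Reagan", "Obama"}
tall_things = {"Lincoln", "Obama", "Eiffel Tower"}
tall_presidents = presidents & tall_things

# Intensional adjective: there is no set of 'former things'; instead,
# former(C) is an operator over a (toy) history of who held which role.
history = {  # (person, role, currently-holds-it?) — invented data
    ("Reagan", "actor", False),
    ("Reagan", "president", False),
    ("Obama", "president", False),
    ("Biden", "president", True),
}

def former(concept):
    """former(C) = those x that were C at some point but are not C now."""
    return {x for (x, c, now) in history if c == concept and not now}

print(sorted(tall_presidents))       # intersection of two extensions
print(sorted(former("president")))   # operator applied to a concept
```

Note that a sitting office-holder is correctly excluded by former, whereas a naive "set of former things" would wrongly include anyone who was ever a former anything.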
  43. 43. Copyright © 2017 WALID S. SABA One misguided attempt at salvaging the data-only solution would be to maintain a set for the compound 'former presidents'. This escape attempt is doomed, however, since composite sets for 'previous president', 'former senator', 'former governor', 'previous governor', etc. would also then have to be added and maintained. In fact, insisting on a data-only solution for intensional adjectives would essentially mean maintaining a set for every construction of the form [Adj1 Adj2 Noun], [Adj1 Adj2 Noun1 Noun2], … where any adjective Adji is an intensional adjective. This is exactly the same situation we encountered previously (pages 12-15), where composite features for every possible relation were needed to resolve references in a data-driven model. In both cases, such alternatives are neither computationally nor psychologically plausible. data and intensions data, intensions and reasoning
  44. 44. Copyright © 2017 WALID S. SABA Another major problem with data-driven/statistical approaches to NLU is their complete denial of compositionality in computing the meaning of larger linguistic units as a function of the meaning of their constituents. To illustrate, consider the sentences below. (1) Jon bought an old copy of Das Kapital. (2) Jon read an old copy of Das Kapital. Although (1) and (2) refer to the same object, namely a book entitled 'Das Kapital', the reference in (1) is to a physical object that can be bought (and thus sold, burned, etc.), while in (2) the reference is to the content and ideas in that book. Thus, 'Das Kapital' may refer to different features or properties of the book, depending on the context, where the context could extend over several sentences. For example, consider (3): (3) Jon read Das Kapital. He then burned it because he did not agree with anything it espouses. In (3), we are (at the same time) using 'Das Kapital' to refer to an abstract object (namely the content of Das Kapital), when Jon read it and then disagreed with its content, and to a physical object, one that can be burned. We will see later on how a strongly-typed system will discover the existence of all the potential types of objects that 'Das Kapital' can refer to (a physical object that can be burned, an abstract object that can be read and disagreed with, etc.) data and intensions compositionality
  45. 45. Copyright © 2017 WALID S. SABA In natural language we can speak of anything we can conceive or imagine, existent or non-existent. We can thus speak of and refer to an event that did not exist, as in (1) John cancelled the trip. It was planned for next Saturday. In (1), we are speaking about and referring to an event (a trip) that did not actually happen, thus a trip that never existed. We can also refer to or speak of objects that do not exist, as in (2) John painted a yellow bear. In (2), what is 'yellow' is not an actual bear, but a depiction of some object, namely a bear. Reference to abstract and nonexistent objects can be quite involved, especially in mixed contexts where the initial reference is to an object that does not necessarily exist, but whose existence subsequent context implies. For example, consider the following: (3) John's book proposal was not well received. But it later became a bestseller when it was published. In (3), the reference was initially to a book proposal, which does not imply the existence of the book, although subsequent context implies the concrete existence of a book. Such inferences cannot be made with a simple analysis of the external data. data and intensions yellow bears?
  46. 46. Copyright © 2017 WALID S. SABA Data-driven approaches typically ignore functional words (prepositions, quantifiers, etc.), and for a good reason: the probabilities of these words are equal in all contexts! But such words cannot be ignored, as they are what logically glues the various components of a sentence into a coherent whole. Consider for example the determiner 'a', the smallest word in English, in the following sentences: (1) A paper on genetics was published by every student of Dr. Miller (2) A paper on genetics was referenced by every student of Dr. Miller While 'a paper on genetics' may refer to a single and specific paper in (2), this is not likely in (1), where 'a' is most likely under the scope of 'every'. That is, the most likely meaning of (1) is the one implied by (3) Every student of Dr. Miller published a paper on genetics Resolving such quantifier scope ambiguities is clearly beyond data-driven approaches and is a function of pragmatic world knowledge (e.g., while it is possible for several students to reference a single paper, it is not likely that all of Dr. Miller's students published the same paper…) We shall later on see how a strongly-typed ontology of commonsense concepts can be used to make such inferences. data and intensions functional words
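The two scope readings can be made explicit in a toy model (Python; the students and papers are invented illustration data). The wide-scope reading of 'a' asserts one specific paper for all students; the narrow-scope reading lets each student have their own.

```python
# Toy model of who published/referenced which paper (invented data).
students = {"ann", "bob", "carol"}
published = {("ann", "p1"), ("bob", "p2"), ("carol", "p3")}   # distinct papers
referenced = {("ann", "p9"), ("bob", "p9"), ("carol", "p9")}  # one shared paper
papers = {p for (_, p) in published | referenced}

def wide_scope_a(rel, papers):
    """∃p ∀s rel(s, p): one specific paper for every student."""
    return any(all((s, p) in rel for s in students) for p in papers)

def narrow_scope_a(rel):
    """∀s ∃p rel(s, p): each student has some (possibly different) paper."""
    return all(any(s == s2 for (s2, _) in rel) for s in students)

print(wide_scope_a(referenced, papers))  # True: plausible reading of (2)
print(wide_scope_a(published, papers))   # False: implausible for (1)
print(narrow_scope_a(published))         # True: the likely reading of (1)
```

The model makes the slide's point computable: on plausible world knowledge, only the narrow-scope reading survives for 'published', while 'referenced' supports the wide-scope one.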
  47. 47. Copyright © 2017 WALID S. SABA We have (hopefully) demonstrated that purely quantitative (statistical, data-driven) approaches are not plausible models for natural language understanding, for two main reasons: 1. The relevant information is often not even present in the data, or in many cases there is no statistical significance in the data to make the proper inferences. Attempts to remedy this lead to a combinatorial explosion in the size of the feature set that would have to be assumed, which renders these attempts computationally implausible. 2. It was shown that even when the data is available, reasoning with data only and ignoring intensions and logical definitions can easily lead to absurdities and contradictions. While statistical and data-driven models may not be appropriate for high-level reasoning tasks in language understanding, we believe that these models have a lot to offer in some linguistic and data-centric tasks. Chief among these are part-of-speech (POS) tagging, statistical parsing, and collecting and analyzing corpus linguistic data to 'enable' and automate some of the tasks needed in building a system that can truly understand ordinary spoken languages. We are now in a position to start describing our proposal. data-driven NLU? so where are we now?
  48. 48. Copyright © 2017 WALID S. SABA ontological vs. logical concepts
  49. 49. Copyright © 2017 WALID S. SABA We will start with our proposal by first introducing the general framework, and we will do so gradually. The material presented from here on assumes some exposure to logic, although we will try to simplify our presentation as much as can possibly be done. One of the major features of our framework is the crucial idea of distinguishing between what can be called ontological concepts, or first-intension concepts, as Cocchiarella (19xx) calls them, and logical concepts (or second-intension concepts). The difference between these two types of concepts can be illustrated by the following examples: (1) R2: heavy(x :: physical) R3: hungry(x :: animal) R4: articulate(x :: human) R5: make(x :: human, y :: artifact) R6: imminent(x :: event) R7: beautiful(x :: entity) What the above says is: heavy is a property that can be said of any object x that is of type physical; that we say hungry of objects that are of type animal; that articulate applies to objects that are of type human; that we can speak of the make relation between an object of type human and an object of type artifact; that we can say imminent of objects that are of type event; and, finally, that we can say beautiful of any entity. the framework ontological vs. logical concepts
  50. 50. Copyright © 2017 WALID S. SABA the framework ontological vs. logical concepts It is also assumed that the types associated with predicates in (1), e.g. artifact, event, human, entity, etc., exist in a subsumption hierarchy as shown in the fragment hierarchy below, and where the dotted arrows indicate the existence of intermediate types. The fact that an object of type human is ultimately an object of type entity is expressed as human ⊑ entity. Furthermore, a property such as heavy can be said of objects of type human and objects of type artifact, since human ⊑ physical, artifact ⊑ physical, and heavy(x :: physical).
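The fragment hierarchy and the type constraints R2-R7 can be sketched in code (Python; the particular parent links are assumptions standing in for the dotted arrows of the missing figure):

```python
# A fragment of the assumed subsumption hierarchy (child -> parent).
parent = {
    "human": "animal", "animal": "physical", "artifact": "physical",
    "physical": "entity", "event": "entity",
}

def subsumed_by(t, s):
    """t ⊑ s : an object of type t is ultimately an object of type s."""
    while t is not None:
        if t == s:
            return True
        t = parent.get(t)
    return False

# Type constraints on logical concepts, as in R2-R7 above.
applies_to = {"heavy": "physical", "hungry": "animal", "articulate": "human",
              "imminent": "event", "beautiful": "entity"}

def sensible(prop, obj_type):
    """Is it meaningful (true or false) to say prop of an object of obj_type?"""
    return subsumed_by(obj_type, applies_to[prop])

print(sensible("heavy", "human"))      # True: human ⊑ physical
print(sensible("imminent", "human"))   # False: a human is not an event
```

This captures polymorphism: a property stated at type physical automatically applies to all subtypes of physical.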
  51. 51. Copyright © 2017 WALID S. SABA As mentioned earlier, a strongly-typed ontology is assumed throughout this study. Usually this conjures up thoughts of massive amounts of knowledge that have to be hand-coded and engineered by experts. This is not at all what we are assuming here. In fact, the ontological structure we are assuming (and will discuss later on) is not massive at all, since most everyday concepts are actually just instances of the basic ontological types. For example, there's nothing meaningful (i.e., sensible, regardless of whether it is true or false) in language that we can say about a 'racing car' that we cannot say about a car. Thus, as far as language understanding is concerned, the ontological type car belongs to the ontology, and 'racing car' is just an instance concept. With such an analysis, most everyday concepts are just instances of basic ontological types. This issue is related to a comment that J. Fodor once made, something to the effect that "to be a concept, is to be locked to a word in the language". This is also in line with Fred Sommers' idea of applicability in his proposal about The Tree of Language. Gottlob Frege's idea of how a word gets its meaning, namely from all the different ways it can be used in language, is also consistent with the ontological structure we assume, which was discovered by reverse engineering language itself. That is, what we can say about concepts tells us what structure lies behind. We will discuss the details of the ontology later on; for now, we will simply assume that this ontological structure exists. about the ontological structure the framework
  52. 52. Copyright © 2017 WALID S. SABA According to the above, in our framework we assume a Platonic universe that includes everything we can talk about in ordinary discourse, including abstract objects such as events, states, properties, etc. These ontological concepts exist as types in a strongly-typed ontology, and the logical concepts are all the properties of, or the relations that can hold between, these ontological concepts. In addition to logical and ontological concepts there are proper nouns, which are the names of objects; objects that can be of any type. We use the notation (∃1Sheba :: thing) to state that there is a unique object named Sheba, an object that is of type thing. With this basic machinery, let's consider the interpretation of the simple sentence 'Sheba is a thief', where 〚s〛 stands for 'the meaning of s', ⇒ is used to mean 'is interpreted as', and thief(x :: human) states that the property thief applies to objects that must be of type human: (2) 〚Sheba is a thief〛 ⇒ (∃1Sheba :: thing)(thief(Sheba :: human)) Thus 'Sheba is a thief' is interpreted as follows: there is some unique object named Sheba, an object that is initially assumed to be a thing, such that the property thief is true of Sheba. ontological vs. logical concepts the framework
  53. 53. Copyright © 2017 WALID S. SABA Note that in our interpretation (repeated below) Sheba is now associated with more than one type in a single scope. (2) 〚Sheba is a thief〛 ⇒ (∃1Sheba :: thing)(thief(Sheba :: human)) Initially unknown, and thus assumed to be an object of type thing, Sheba was later assigned the type human, when described by the property (or when in the context of being a) thief. In these situations a type unification must occur, and this is done as follows, (Sheba :: (thing • human)) → (Sheba :: human) where (s • t) denotes a type unification between the types s and t, and where → stands for 'unifies to'. Note that the unification of thing and human resulted in human since human ⊑ thing; that is, since an object that is of type human is ultimately an object of type thing. The final interpretation of 'Sheba is a thief' is now the following: (2) 〚Sheba is a thief〛 ⇒ (∃1Sheba :: human)(thief(Sheba)) In the final analysis 'Sheba is a thief' is simply interpreted as: there is a unique object named Sheba, an object that (we now know) must be of type human, and that object is a thief. type unification – the basics the framework
  54. 54. Copyright © 2017 WALID S. SABA Although we have interpreted a very simple sentence, we have already seen the power of embedding ontological types (that exist in some strongly-typed hierarchy) into the powerful machinery of logical semantics. Specifically, it was the type constraint on the property thief(x :: human), namely that it applies to objects that must be of type human, that allowed us to discover the fact that Sheba must be a human. Admittedly, this is a very trivial 'discovery' and in a very simple context. However, the power of type unification and the hidden information it will uncover will be more appreciated as we move on to more involved contexts. Suppose black(x :: physical) and own(x :: human, y :: entity). That is, we are assuming that black can be said of all objects of type physical, and that objects of type human can own any object of type entity. With that, let us now consider the following: (3) 〚Sara owns a black cat〛 ⇒ (∃1Sara :: thing)(∃c :: cat)(black(c :: physical) ∧ own(Sara :: human, c :: entity)) Thus 'Sara owns a black cat' is interpreted as follows: there is a unique thing named Sara, and some object c of type cat, such that c is black (and thus here it must be of type physical), and Sara owns c, where in this context Sara must be an object of type human and c an object of type entity. type unification – the basics the framework
  55. 55. Copyright © 2017 WALID S. SABA Our interpretation of 'Sara owns a black cat' is repeated below. (3) 〚Sara owns a black cat〛 ⇒ (∃1Sara :: thing)(∃c :: cat)(black(c :: physical) ∧ own(Sara :: human, c :: entity)) Note now that, depending on the context they are mentioned in, Sara is assigned two types, and the object c is assigned three types. The type unifications that must occur in this situation are the following: (Sara :: (thing • human)) → (Sara :: human) (c :: ((physical • entity) • cat)) → (c :: (physical • cat)) → (c :: cat) Note that the type unification ((physical • entity) • cat) is associative, so the order in which the two type unifications are done does not matter. The final interpretation of 'Sara owns a black cat' is therefore given by: (3) 〚Sara owns a black cat〛 ⇒ (∃1Sara :: human)(∃c :: cat)(black(c) ∧ own(Sara, c)) That is, there is a unique object named Sara, which is of type human, and some cat c, and Sara owns c. type unification – the basics the framework
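The unification steps above can be sketched as a small Python routine (the hierarchy fragment is an assumption; (s • t) is modeled as returning the more specific of two types when one subsumes the other):

```python
# Assumed fragment of the hierarchy (child -> parent).
parent = {"human": "animal", "cat": "animal", "animal": "physical",
          "artifact": "physical", "physical": "entity", "entity": "thing"}

def subsumed_by(t, s):
    """t ⊑ s : t is ultimately an s."""
    while t is not None:
        if t == s:
            return True
        t = parent.get(t)
    return False

def unify(s, t):
    """(s • t): the more specific type when one subsumes the other, else None (⊥)."""
    if subsumed_by(s, t):
        return s
    if subsumed_by(t, s):
        return t
    return None  # failed unification

# 'Sara owns a black cat':
print(unify("thing", "human"))                     # Sara :: human
print(unify(unify("physical", "entity"), "cat"))   # c :: cat
```

Because unify always returns the more specific of two comparable types, the nested unifications are associative, matching the remark in the slide.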
  57. 57. Copyright © 2017 WALID S. SABA As mentioned in our introduction, in our framework ontological concepts include abstract objects such as states, processes, events, properties, etc. Let us now consider one of these categories, namely activities. In our framework a concept such as dancer(x) is true of some x according to the following: (∀x :: human)(dancer(x) ≡ (∃d :: activity)(dancing(d) ∧ agent(d, x))) That is, any object x of type human is a dancer iff there is some object d of type activity such that d is a dancing activity, and x is the agent of d. Note that according to the above, there are at least two objects that are part of the meaning of 'dancer': in particular, some object x of type human, and some dancing activity d. Thus, in saying 'beautiful dancer', for example, one could be using 'beautiful' to describe the dancer, or the dancing activity itself. Consider now the interpretation below, assuming that beautiful(x :: entity); that is, assuming beautiful is a property that can be said of any entity: (4) 〚Sara is a beautiful dancer〛 ⇒ (∃1Sara :: thing)(∃a :: activity)(dancing(a) ∧ agent(a :: activity, Sara :: human) ∧ (beautiful(a :: entity) ∨ beautiful(Sara :: entity))) abstract objects the framework
  58. 58. Copyright © 2017 WALID S. SABA 〚Sara is a beautiful dancer〛 ⇒ (∃1Sara :: thing)(∃a :: activity)(dancing(a) ∧ agent(a :: activity, Sara :: human) ∧ (beautiful(a :: entity) ∨ beautiful(Sara :: entity))) Thus 'Sara is a beautiful dancer' is interpreted as follows: there's a unique object named Sara, some activity a, such that a is a dancing activity, and Sara is the agent of a (and as such must be an object of type human), and either the dancing is beautiful, or Sara is (or, of course, both). Note now that there are a number of type unifications that must occur: (Sara :: ((thing • human) • entity)) → (Sara :: (human • entity)) → (Sara :: human) (a :: (activity • entity)) → (a :: activity) After all is said and done, the interpretation of (4) is the following: (4) 〚Sara is a beautiful dancer〛 ⇒ (∃1Sara :: human)(∃a :: activity)(dancing(a) ∧ agent(a, Sara) ∧ (beautiful(a) ∨ beautiful(Sara))) Note that the ambiguity of what beautiful is describing is still represented in our final interpretation. abstract objects the framework
  59. 59. Copyright © 2017 WALID S. SABA Thus far our type unifications have always succeeded. In some cases, however, a type unification between two types s and t could fail, and we write this as (s • t) → ⊥ Let us see where this might occur and what it would result in. Consider the interpretation of 'Sara is a blonde dancer', where we assume blonde(x :: human); that is, we are assuming that blonde is a property that applies to objects that must be of type human. (5) 〚Sara is a blonde dancer〛 ⇒ (∃1Sara :: thing)(∃a :: activity)(dancing(a) ∧ agent(a :: activity, Sara :: human) ∧ (blonde(a :: human) ∨ blonde(Sara :: human))) The type unifications needed for Sara are quite simple: (Sara :: ((thing • human) • human)) → (Sara :: (human • human)) → (Sara :: human) The type unification needed for the activity a, however, is not as straightforward. Before we continue, let us plug in the type unification of Sara to see where we're at. failed type unifications the framework
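The way a failed unification prunes a reading can be sketched by extending the same machinery (Python; placing activity under an assumed abstract branch of the hierarchy is our assumption, not from the slides):

```python
# Assumed fragment of the hierarchy (child -> parent).
parent = {"human": "animal", "animal": "physical", "physical": "entity",
          "activity": "abstract", "abstract": "entity", "entity": "thing"}

def subsumed_by(t, s):
    while t is not None:
        if t == s:
            return True
        t = parent.get(t)
    return False

def unify(s, t):
    """(s • t): the more specific type if comparable, else None (⊥)."""
    if subsumed_by(s, t):
        return s
    if subsumed_by(t, s):
        return t
    return None

# 'Sara is a blonde dancer': blonde(x :: human) may target Sara or the activity a.
readings = {"blonde(Sara)": unify("human", "human"),
            "blonde(a)": unify("activity", "human")}
surviving = [r for r, t in readings.items() if t is not None]
print(surviving)  # only the reading where Sara, not the dancing, is blonde
```

The (activity • human) unification fails, so the reading in which the dancing is blonde is discarded, and the sentence is disambiguated automatically.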
  60. 60. Copyright © 2017 WALID S. SABA a brief detour
  61. 61. Copyright © 2017 WALID S. SABA Before we continue with our proposal, we would like to illustrate the utility of separating concepts into logical and ontological concepts. We will do this here by proposing a solution to the so-called Paradox of the Ravens. Introduced in the 1940's by the logician (and once an assistant of Rudolf Carnap) Carl Gustav Hempel, the Paradox of the Ravens (or Hempel's Paradox, or the Paradox of Confirmation) has continued to occupy logicians, statisticians, and philosophers of science to this day. The paradox arises when one considers what constitutes evidence for a statement (or hypothesis). To illustrate what the Paradox of the Ravens is, consider the following: (H1) All ravens are black (H2) All non-black things are not ravens That is, we have the hypothesis H1 that 'All ravens are black'. This hypothesis, however, is logically equivalent to the hypothesis H2 that 'All non-black things are not ravens', as shown below. Since H1 and H2 are logically equivalent, any evidence/observation that confirms H1 must also confirm H2, and vice versa. While it sounds reasonable that observing black ravens should confirm H1, observing a white ball or a red sofa, which does confirm H2, also confirms the logically equivalent hypothesis that all ravens are black, which does not sound plausible. what paradox of the ravens? a temporary diversion
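The logical equivalence driving the paradox is easy to verify by enumeration (a small Python check, not from the slides):

```python
from itertools import product

def h1(raven, black):
    """All ravens are black: raven(x) → black(x)."""
    return (not raven) or black

def h2(raven, black):
    """All non-black things are not ravens: ¬black(x) → ¬raven(x)."""
    return black or (not raven)

# H1 and H2 agree on every possible object, hence are logically equivalent:
print(all(h1(r, b) == h2(r, b) for r, b in product([True, False], repeat=2)))
```

Since the two hypotheses are satisfied by exactly the same objects, any observation consistent with one is consistent with the other, which is precisely what lets a red sofa "confirm" that all ravens are black.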
  62. 62. Copyright © 2017 WALID S. SABA what paradox of the ravens? Observing black ravens confirms hypothesis H1, namely that 'All ravens are black' – the case in (a). Observing non-black objects that are not ravens, as in (b), however, confirms hypothesis H2 (that all non-black things are not ravens). But H2 is logically equivalent to H1, leaving us with the unpleasant conclusion that observing red apples, blue suede shoes, or brown briefcases confirms the hypothesis that 'All ravens are black'. (a) (b)
  63. 63. Copyright © 2017 WALID S. SABA a temporary diversion what paradox of the ravens? Many solutions have been proposed to the Paradox of the Ravens, ranging from accepting the paradox (that observing red apples and other non-black non-ravens does confirm the hypothesis 'All ravens are black') to proposals in the Bayesian tradition that try to measure the 'degree' of confirmation. The Bayesian proposals essentially amount to proposing that observing a red apple does confirm the hypothesis 'All ravens are black', but it does so very minimally, and certainly much less than the observation of a black raven confirms 'All ravens are black'. Clearly, this is not a satisfactory solution, since observing a red flower should not contribute at all to the confirmation of 'All ravens are black'. Worse, in the Bayesian analysis, the observation of black but non-raven objects actually negatively confirms (or disconfirms) the hypothesis that 'All ravens are black'. One logician who stands out in suggesting an explanation for the Paradox of the Ravens is W. V. Quine, who suggested (in 'Natural Kinds') that there is no paradox in the first place, since universal statements of the form All Fs are Gs can only be confirmed on what he called natural kinds, and 'non-black things' and 'non-ravens' are not natural kinds. Basically, for Quine, members of a natural kind must share most of their properties, and there's hardly anything similar between all 'non-black things', or all non-ravens. While statistical/Bayesian and other logical proposals still have not suggested a reasonable explanation for the Ravens Paradox, we believe that the line of thought Quine was pursuing is the most appropriate. However, Quine's natural kinds were not well-defined. In fact, what Quine was alluding to, probably, was that there is a difference between what we have called here logical concepts and ontological concepts.
  64. 64. Copyright © 2017 WALID S. SABA a temporary diversion what paradox of the ravens? The so-called Paradox of the Ravens exists simply because of mistakenly representing both ontological and logical concepts by predicates, although, ontologically, these two types of concepts are quite different. First, let us discuss some predicates and how we usually represent them in first-order logic. Consider the following: Suppose now that we would like to add types to our variables. That is, we would like our logical expressions to be, in computer programming terminology, strongly-typed. Suppose, further, that we would also like our predicates to be polymorphic; that is, that they apply to objects of a certain type and all of its subtypes. That is, if a predicate applies to objects of type vehicle, then it applies to all subtypes of vehicle (e.g., car, truck, bus, …). Given this, what are the appropriate types that one might associate with the variables of the predicates above? Here are some possible type assignments:
  65. 65. Copyright © 2017 WALID S. SABA a temporary diversion what paradox of the ravens? What the above suggests is that, ignoring metaphor for the moment, the predicate black applies to objects that are of type physical. In other words, black is meaningless (or nonsensical) when applied to (or said of) objects that are not of type physical. Similarly, the above says that imminent is said of objects that are of type event (and, of course, all its subtypes, so we can say 'an imminent trip', an 'imminent meeting', an 'imminent election', etc.). In the same vein, the above says that sympathetic is said of objects that must be of type human, and that hungry applies to objects of type animal. But how about the predicates in (5) and (6)? What are the most appropriate types that can be associated with the variables in the predicates dog(x) and guitar(x); that is, of what types of objects can these predicates be meaningful? The only plausible answer seems to be the following: (5) and (6) are obvious tautologies, since, for example, the predicate dog applied to an object of type dog is always true. Clearly, then, (5) and (6) are quite different from the predicates in (1) through (4): while the predicates in (1) through (4) are logical concepts, dog and guitar are not predicates/logical concepts, but ontological concepts that correspond to types in a strongly-typed ontology. With this background, let us now go back to the so-called Paradox of the Ravens.
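The contrast can be sketched in code (Python; the extensions are invented): a logical concept like hungry is an informative typed predicate that can be false of some objects, while typing the variable of dog forces a tautology, which is why dog is better treated as a type than as a predicate.

```python
# A logical concept: an informative predicate over objects of type animal.
def hungry(x, x_type="animal"):
    return x in {"rex", "felix"}     # invented extension, for illustration

# An ontological concept: the only sensible typing of dog's variable is
# dog itself, so dog(x :: dog) can never be false — a tautology.
def dog(x, x_type="dog"):
    return True

print(hungry("rex"))    # True
print(hungry("spot"))   # False — informative: it could have gone either way
print(dog("rex"))       # True, and could never be otherwise
```

An informative predicate partitions its type into positives and negatives; the "predicate" dog partitions nothing, which is the slide's argument for moving dog and guitar into the ontology.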
  76. 76. Copyright © 2017 WALID S. SABA salient properties/relations the framework
  79. 79. Copyright © 2017 WALID S. SABA ontological semantics: contents the road ahead
  80. 80. Copyright © 2017 WALID S. SABA the proposal word-sense disambiguation
  83. 83. Copyright © 2017 WALID S. SABA Let us now look at situations where lexical ambiguities translate into ambiguities in both logical and ontological concepts. Consider the sentences in (10) and (11): (10) Melinda ran for twenty minutes. (11) The program ran for twenty minutes. First of all, there is a clear ambiguity in the meaning of 'program', as it could refer to a computer program (i.e., a process), or to the program of some event, among other meanings. Second, it is clear that the running of Melinda in (10) is different from the running of the program in (11). Let us consider the simpler of these two cases, namely the ambiguity in (10), assuming that there are (at least) two kinds of running activities, one whose agent is a (legged) animal, and one whose agent is a process: What the above says is the following: there's a unique object named Melinda, some twenty minutes that Melinda ran, and either a running activity of some human, or the running of some process. the proposal word-sense disambiguation
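The type-driven disambiguation suggested above can be sketched as follows (Python; the sense lexicon and the hierarchy links are assumptions for illustration):

```python
# Two assumed senses of 'run', each with a type constraint on its agent.
run_senses = {"run-as-locomotion": "animal", "run-as-execution": "process"}

# Assumed fragment of the hierarchy (child -> parent).
parent = {"human": "animal", "animal": "physical", "process": "abstract"}

def subsumed_by(t, s):
    while t is not None:
        if t == s:
            return True
        t = parent.get(t)
    return False

def disambiguate(agent_type):
    """Keep only the senses whose agent-type constraint the subject satisfies."""
    return [sense for sense, req in run_senses.items()
            if subsumed_by(agent_type, req)]

print(disambiguate("human"))    # 'Melinda ran ...'     → locomotion sense
print(disambiguate("process"))  # 'The program ran ...' → execution sense
```

The type of the subject alone eliminates the incompatible sense, so the ambiguity in (10) and (11) is resolved without any appeal to co-occurrence statistics.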
  86. 86. Copyright © 2017 WALID S. SABA fragment of the ontology
  95. 95. the proposal The corner table wants another beer Tables have ‘wants’, and they drink beer?!
  99. 99. Copyright © 2017 WALID S. SABA To be continued ...
