
The Myth of Data-Driven Natural Language Understanding

A short presentation on the difference between NLP and NLU, and on why the data-driven approach, while useful for some NLP tasks, is irrelevant to NLU.

  1. the myth of DATA-DRIVEN NATURAL LANGUAGE UNDERSTANDING
  2. Everything in nature, in the inanimate as well as in the animate world, happens according to some rules, though we do not always know them.
  3. yes, in recent years there has been quite a bit of misguided and overstated hype in artificial intelligence (AI)
  4. but what’s behind the recent resurgence of AI?
  5. the availability of huge amounts of data, coupled with advances in computer hardware and distributed computing, resulted in some advances in certain types of (data-centric) problems (image, speech, fraud detection, text categorization, etc.)
  6. but …
  7-8. musicians? a musical band? where’s the musical band?
  9-11. A data-driven image recognition system can recognize concepts (objects) such as adult-female, human, or cat. But can a data-driven image recognition system recognize an object of type teacher, lawyer, or accountant?
  12. While the purely data-driven (quantitative, statistical, ML) approaches are limited even in domains that are naturally data-centric (such as image recognition), these approaches are especially inadequate in tasks that require high-level reasoning, and in particular in natural language understanding
  13-14. A BRIEF HISTORY Early efforts to find theoretically elegant models for various linguistic phenomena did not result in any noticeable progress, despite nearly three decades of intensive research (the 1950’s through the late 1980’s). As the various formal (and in most cases mere symbol-manipulation) systems seemed to reach a deadlock, disillusionment with the brittle logical approach to language processing grew. A number of researchers and practitioners in natural language processing (NLP) started to abandon theoretical elegance in favor of attaining some quick results using empirical (data-driven, statistical, and machine learning) approaches. [timeline graphic spanning 2500 BC to the 1990’s]
  15. A BRIEF HISTORY This (data-driven/statistical/ML) revolution has continued to dominate for nearly three decades now, stimulated in recent years by the relative success of deep learning (DL) techniques in image and other ‘pattern recognition’ applications. In recent years, data-driven NLP has gone beyond the ‘noble’ cause of using empirical methods to find reasonably working solutions to practical problems; in fact, the data-driven approach to NLP is now believed by many to be a plausible approach to building systems that can truly understand ordinary spoken language. We believe this to be utterly misguided, and that this trend/belief will hinder real progress in NLU; as Ken Church (2007) has put it, this is a situation of a ‘pendulum swung too far’. [timeline label: the Statistical Revolution]
  16. NLU VS NLP There are many applications that can benefit from data-driven NLP (filtering, shallow translation, classification/categorization, search, etc.), but these approaches are, in our view, irrelevant to NLU. We believe that the data-driven approach to NLU is utterly misguided, and below we present four technical reasons why this is so.
  17. DATA-DRIVEN NLU IS A MYTH There are (at least) four technical reasons why the data-driven approach to NLU is utterly misguided. Data-driven approaches: cannot account for function words; cannot ‘uncover’ missing (implicitly assumed) text; fail when there is no statistical significance; and cannot account for intensionality.
  18-21. FUNCTION WORDS (reason 1) Function words are what ‘glues’ together, and thereby determines, the final meaning. This is crucial since different interpretations/orderings of function words yield different results (when translating NL queries to database queries, for example):
  S1 = Every city that has an ethnic minority
  S2 = A city that has every ethnic minority  (S1 ≠ S2)
  S3 = Every writer that writes for the White House
  S4 = Every writer that writes about the White House  (S3 ≠ S4)
  Data-driven approaches cannot account for function words because their probabilities are meaningless (function words occur in all contexts with roughly equal probability), and so data-driven approaches cannot account for meaning.
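
To make the S1/S2 contrast concrete, here is a minimal Python sketch over an invented toy relation (the cities, minorities, and table contents below are all assumptions for illustration): swapping the function words ‘every’ and ‘a’ swaps the quantifier in the resulting query and changes the answer set.

    # Toy relation: which city has which ethnic minority (invented data).
    cities     = {"Springfield", "Shelbyville", "Ogdenville"}
    minorities = {"Kurds", "Basques"}
    has = {("Springfield", "Kurds"), ("Springfield", "Basques"),
           ("Shelbyville", "Kurds")}

    # S1: "every city that has an ethnic minority"
    # -> all cities having at least one minority.
    s1 = {c for c in cities if any((c, m) in has for m in minorities)}

    # S2: "a city that has every ethnic minority"
    # -> the cities having all the minorities.
    s2 = {c for c in cities if all((c, m) in has for m in minorities)}

    print(sorted(s1))  # ['Shelbyville', 'Springfield']
    print(sorted(s2))  # ['Springfield']
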
  22-24. MISSING TEXT (reason 2) We cannot analyze what’s not even in the ‘data’. What we usually say, and what we mean:
  (1) Don’t worry, Simon is a rock.  →  Don’t worry, Simon is [as solid as] a rock.
  (2) The ham sandwich wants another beer.  →  The [person eating the] ham sandwich wants another beer.
  (3) Carlos likes to play bridge.  →  Carlos likes to play [the game] bridge.
  (4) Mary enjoyed the movie.  →  Mary enjoyed [watching] the movie.
  (5) Carl owns a house on every street in the village.  →  Carl owns a [different] house on every street in the village.
  (6) Jon owns Das Kapital but he never read it.  →  Jon owns [the book] Das Kapital but he never read it[s content].
  Since most of the ‘understanding’ in NLU is about discovering the [missing text] – text that we leave out and implicitly assume to be part of our shared background knowledge – data-driven approaches are inadequate: we cannot find what’s not there.
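
A minimal sketch of the point that counting over surface text cannot recover the implicit material: the bracketed words simply never occur in the data (the toy corpus below is invented for illustration).

    # All a data-driven system ever sees is the surface text (toy corpus).
    corpus = ("Don't worry, Simon is a rock. "
              "Mary enjoyed the movie. "
              "Carlos likes to play bridge.")

    # The implicitly assumed material has zero occurrences, so no amount
    # of counting, smoothing, or embedding over this data can surface it.
    for missing in ("as solid as", "watching", "the game"):
        print(f"'{missing}': {corpus.count(missing)} occurrences")  # all 0
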
  25. STATISTICAL INSIGNIFICANCE (reason 3)
  (1) The trophy did not fit in the brown suitcase because it was too [a. big / b. small]
  (2) Dr. Smith told Jon that he should soon finish [a. writing his thesis / b. reading his thesis]
  In most cases there is no statistical significance in the data to make the correct inferences: antonyms/opposites are known to co-occur in similar contexts with the same frequency, so in the data above there is no statistical significance to decide what ‘it’ and ‘he’ refer to!
  26-28. STATISTICAL INSIGNIFICANCE (reason 3, continued) Let’s see how many examples we would have to see (learn from) to capture statistical significance, if we insist on treating language as ‘data’. What ‘it’ can refer to in (1) is affected by many components: e.g., replace ‘because’ by ‘although’, ‘did not’ by ‘did’, ‘trophy’ by ‘laptop’, and so on. In total, there are at least 40,000,000 combinations that affect what ‘it’ in (1) refers to. From these 40,000,000 combinations, how many examples would we need to annotate and feed to our learning algorithms just to ‘learn’ what ‘it’ refers to in syntactic patterns like (1)? To account for statistical insignificance, it would seem that data-driven approaches would require a number of training examples that is neither computationally nor cognitively plausible!
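
As a back-of-the-envelope check on the 40,000,000 figure, here is a sketch that multiplies out one assumed count per interchangeable slot; the individual counts are our own illustrative guesses, not numbers from the slides.

    from math import prod

    # Assumed choices per slot in the pattern
    # "<noun1> <polarity> fit in the <noun2> <connective> it was too <adj>".
    # Every count below is an illustrative guess.
    slots = {
        "noun1 (trophy, laptop, ...)":          1000,
        "noun2 (suitcase, drawer, ...)":        1000,
        "connective (because, although, ...)":     4,
        "polarity (did / did not)":                2,
        "final adjective (big, small, ...)":       5,
    }
    print(f"{prod(slots.values()):,}")  # 40,000,000
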
  29-32. ACCOUNTING FOR INTENSIONS (reason 4) The phrases highlighted in blue on these slides (the examples appear in graphics not captured in this transcript) are equal as data values, but in NL, if we blindly interchange them, we can easily arrive at false conclusions, absurdities, and contradictions. Data-driven approaches operate at the data level only and cannot account for intensions, even though in NL the intensions of two equal data values might be different.
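
A sketch of how a purely extensional substitution licenses a false conclusion inside a belief context; the phrases, values, and the knowledge set below are invented for illustration.

    # Two phrases with the same extension (the same data value).
    extension = {"eight": 8, "the number of planets": 8}

    # An intensional context: claims Jon actually knows to be true.
    jon_knows = {"eight is greater than seven"}

    # A purely extensional system treats equal values as interchangeable,
    # so it substitutes one phrase for the other inside the belief context:
    rewritten = "eight is greater than seven".replace(
        "eight", "the number of planets")

    print(extension["eight"] == extension["the number of planets"])  # True
    print(rewritten in jon_knows)  # False: the substitution changed the claim
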
  33-35. ACCOUNTING FOR INTENSIONS (reason 4, continued) Simple word embeddings (vector representations of words), which data-driven approaches are forced to use, cannot account for the differences in the intensions of nouns and adjectives of various ontological categories. Nouns and adjectives were not all created equal!
  36. ACCOUNTING FOR INTENSIONS (reason 4, continued) Jon wants a pink elephant ⇒ it does not follow that a pink elephant exists. Jon owns a pink elephant ⇒ some pink elephant necessarily exists. The notion of intension, which extensional data-driven models cannot account for, is also important when it comes to textual entailment (which is behind every linguistic construct).
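
A sketch of the want/own asymmetry encoded as a hand-written entailment rule; the classification of verbs into extensional and intensional is an assumption supplied by us, precisely the kind of knowledge that is not in the surface data.

    # Hand-coded (assumed) verb classification: extensional verbs commit
    # to the existence of their object, intensional verbs do not.
    EXISTENTIAL = {"owns": True, "wants": False}

    def entails_existence(subject, verb, obj):
        """Does '<subject> <verb> a <obj>' entail 'a <obj> exists'?"""
        return EXISTENTIAL[verb]

    print(entails_existence("Jon", "owns", "pink elephant"))   # True
    print(entails_existence("Jon", "wants", "pink elephant"))  # False
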
  37. an interim summary: data-driven NLU is a myth because data-driven approaches: cannot account for function words; cannot ‘uncover’ missing (implicitly assumed) text; fail when there is no statistical significance; and cannot account for intensionality.
  38. BUT LOGICAL SEMANTICS ALSO FAILED Logical semantics may have faltered due to the use of ‘logic as calculus’ – a symbolic system devoid of any content – as opposed to using ‘logic as a language’ with ontological content. An ontological semantics that is ‘connected to our knowledge of the world’ can be developed by rectifying a major oversight in logical semantics, namely distinguishing between two fundamentally very different types of concepts: ontological concepts (types in a strongly-typed ontology) and logical concepts (properties of, and relations between, objects of various ontological types).
  39. ONTOLOGICAL SEMANTICS Assume a theory of the world that is isomorphic to the way we talk about it: embed in our semantics an ontological structure that resembles the metaphysical reality implicit in our spoken language. ‘Most of the questions of philosophers arise from our failure to understand the logic of our language.’ (L. Wittgenstein)
  40. ONTOLOGICAL SEMANTICS The theoretical foundations of this work lie in various disparate places. Conceptual Realism (N. B. Cocchiarella): while the logic that won the day is ‘logic as a calculus’ – an abstract symbol-manipulation system devoid of any content – in a ‘logic as a language’ logic has content, and ontological content in particular. A ‘logic as a language’ might therefore be the answer to Hobbs’ suggestion of embedding in our semantics ‘a theory of the world that is isomorphic to the way we talk about it’.
  41. ONTOLOGICAL SEMANTICS The theoretical foundations of this work lie in various disparate places. I. Kant: we know any object only through the predicates that we can say or think of it, e.g., articulate(x) ⇒ (x :: human), imminent(x) ⇒ (x :: event). This is not too far from saying that the meaning of a word is determined by its context of use (Frege), or by the company it keeps (Firth).
  42. ONTOLOGICAL SEMANTICS The theoretical foundations of this work lie in various disparate places. F. Sommers (the Language Tree): two objects are of the same type if they can be sensibly predicated of exactly the same set of predicates. While we make a car, manufacturing is a specific way of making, and thus car ⊑ artifact: make(x) ⇒ (x :: artifact), manufacture(x) ⇒ (x :: car).
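
A sketch of the Kant/Sommers idea as predicate-driven type inference: each predicate carries an (assumed) type signature for its argument, and a small subsumption hierarchy (car ⊑ artifact) lets multiple constraints combine into the most specific consistent type. The mini ontology and signatures below are invented for illustration.

    # Tiny invented ontology: child type -> parent type.
    ISA = {"car": "artifact", "artifact": "thing",
           "human": "thing", "event": "thing"}

    # Assumed type signatures: predicate -> required type of its argument.
    SIGNATURE = {"articulate": "human", "imminent": "event",
                 "make": "artifact", "manufacture": "car"}

    def subsumes(general, specific):
        """True if `specific` equals `general` or is a descendant of it."""
        while specific is not None:
            if specific == general:
                return True
            specific = ISA.get(specific)
        return False

    def infer_type(predicates):
        """Most specific type consistent with every predicate said of x."""
        constraints = [SIGNATURE[p] for p in predicates]
        for t in constraints:
            if all(subsumes(other, t) for other in constraints):
                return t
        return None  # the predicates impose incompatible type constraints

    print(infer_type(["articulate"]))              # human
    print(infer_type(["make", "manufacture"]))     # car  (car ⊑ artifact)
    print(infer_type(["articulate", "imminent"]))  # None (type clash)
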
  43-44. In ‘Logic as a Language’ (or ontological semantics), ontology is concerned with ‘first intentions’ concepts – concepts abstracted from reality – while logic is concerned with ‘second intentions’ concepts – concepts abstracted from the content of first intentions.
  45. the Big Picture
  46. ONTOLOGICAL SEMANTICS The details of an ‘ontological semantics’ that combines the power of logical semantics with reasoning over a simple ontological structure – one that reflects the commonsense knowledge implicit in our ordinary spoken language – will follow. In the meantime, click below to take a look at a recent paper:
