
Grammarly AI-NLP Club #2 - Recent advances in applied chatbot technology - Jordi Carrera Ventura

Speaker: Jordi Carrera Ventura, Artificial Intelligence technologist at Telefónica R&D

Summary: Chatbots (aka conversational agents, spoken dialogue systems) allow users to interface with computers using natural language by simply asking questions or issuing commands.
Given a query, the chatbot builds a semantic representation of the input, transforms it into a logical statement, and performs all the necessary actions to fulfill the user's intent. Sometimes this simply means calculating an exact answer or retrieving a fact from a database, whereas other times it means building a contextual model and running a full-fledged conversation flow while keeping track of anaphora and cross-references.
Besides the direct applications of chatbots in IoT (Amazon’s Alexa, Apple's Siri) and IT (the historical field of Information Retrieval as a whole can be seen as a sub-problem of spoken dialogue systems), chatbots' main appeal for technologists is their location at the intersection of all major Natural Language Processing technologies and many of the deepest questions in Cognitive Science today: semantic parsing, entity recognition, knowledge representation, and coreference resolution.
In this talk, I will explore those questions in the context of an applied industry setting, and I will introduce a framework suitable for addressing them, together with an overview of the state-of-the-art in chatbot technology and some original techniques.

Grammarly AI-NLP Club #2 - Recent advances in applied chatbot technology - Jordi Carrera Ventura

  1. 1. Recent advances in applied chatbot technology to be presented at AI Ukraine 2016 Jordi Carrera Ventura NLP scientist Telefónica Research & Development
  2. 2. Outline Overview Industrial state-of-the-art Current challenges Recent advances Conclusions
  3. 3. Overview
  4. 4. Chatbots are a form of conversational artificial intelligence (AI) in which a user interacts with a virtual agent through natural language messaging in • a messaging interface like Slack or Facebook Messenger • or a voice interface like Amazon Echo or Google Assistant. Chatbot1: a bot that lives in a chat (an automation routine inside a UI). Conversational agent / Virtual Assistant1: a bot that takes natural language as input and returns it as output. Chatbot2: a virtual assistant that lives in a chat. User> what are you? 1 Generally assumed to have broader coverage and more advanced AI than chatbots2.
  5. 5. Usually intended to • get quick answers to specific questions over some pre-defined repository of knowledge • perform transactions in a faster and more natural way. Can be used to • surface contextually relevant information • help a user complete an online transaction • serve as a helpdesk agent to resolve a customer’s issue without ever involving a human. Virtual assistants for customer support, e-commerce, expense management, booking flights, booking meetings, data science... Use cases
  6. 6. Advantages of chatbots From: https://medium.com/@csengerszab/messenger-chatbots-is-it-a-new- revolution-6f550110c940
  7. 7. Actually, as an interface, chatbots are not the best. Nothing beats a spreadsheet for most things. For data already tabulated, a chatbot may actually slow things down. A well-designed UI can allow a user to do anything in a few taps, as opposed to a few whole sentences in natural language. But chatbots can generally be regarded as universal: every channel supports text. Chatbots as a UX are effective as a protocol, like email.
  8. 8. What's really important... Success conditions • Talk like a person. • The "back-end" MUST NOT BE a person. • Provide a single interface to access all other services. • Useful in industrial processes: response automation given specific, highly-repetitive linguistic inputs. Not conversations, but ~auto-replies linked to more refined email filters. • Knowledge representation.
  9. 9. What's really important... Conversational technology should really be about • dynamically finding the best possible way to browse a large repository of information/actions. • finding the shortest path to any relevant action or piece of information (to avoid the plane dashboard effect). • surfacing implicit data in unstructured content ("bocadillo de calamares in Madrid"). Rather than going open-domain, taking the closed-domain and going deep into it.
  10. 10. What's really important... Conversational technology should really be about • dynamically finding the best possible way to browse a large repository of information/actions. • finding the shortest path to any relevant action or piece of information (to avoid the plane dashboard effect). • surfacing implicit data in unstructured content ("bocadillo de calamares in Madrid"). Rather than going open-domain, taking the closed-domain and going deep into it.
  11. 11. ... and the hype
  12. 12. ... and the hype
  13. 13. ... and the hype
  14. 14. ... and the hype
  15. 15. ... and the hype
  16. 16. ... and the hype
  17. 17. ... and the hype
  18. 18. ... and the hype
  19. 19. ... and the hype
  20. 20. ... and the hype That was IBM Watson. 🙄
  21. 21. ... and the hype Getting started with bots is as simple as using any of a handful of new bot platforms that aim to make bot creation easy... ... sophisticated bots require an understanding of natural language processing (NLP) and other areas of artificial intelligence. The optimists
  22. 22. ... and the hype Artificial intelligence has advanced remarkably in the last three years, and is reaching the point where computers can have moderately complex conversations with users in natural language <perform simple actions as response to primitive linguistic signals>. The optimists
  23. 23. ... and the hype Most of the value of deep learning today is in narrow domains where you can get a lot of data. Here’s one example of something it cannot do: have a meaningful conversation. There are demos, and if you cherry-pick the conversation, it looks like it’s having a meaningful conversation, but if you actually try it yourself, it quickly goes off the rails. Andrew Ng (Chief Scientist, AI, Baidu)
  24. 24. ... and the hype Many companies start off by outsourcing their conversations to human workers and promise that they can “automate” it once they’ve collected enough data. That’s likely to happen only if they are operating in a pretty narrow domain – like a chat interface to call an Uber for example. Anything that’s a bit more open domain (like sales emails) is beyond what we can currently do. However, we can also use these systems to assist human workers by proposing and correcting responses. That’s much more feasible. Andrew Ng (Chief Scientist, AI, Baidu)
  25. 25. Industrial state-of-the-art
  26. 26. Lack of suitable metrics Industrial state-of-the-art Liu, Chia-Wei, et al. "How NOT to evaluate your dialogue system: An empirical study of unsupervised evaluation metrics for dialogue response generation." arXiv preprint arXiv: 1603.08023 (2016).
  27. 27. By Intento (https://inten.to) Dataset SNIPS.ai 2017 NLU Benchmark https://github.com/snipsco/nlu-benchmark Example intents SearchCreativeWork (e.g. Find me the I, Robot television show) GetWeather (e.g. Is it windy in Boston, MA right now?) BookRestaurant (e.g. I want to book a highly rated restaurant for me and my boyfriend tomorrow night) PlayMusic (e.g. Play the last track from Beyoncé off Spotify) AddToPlaylist (e.g. Add Diamonds to my roadtrip playlist) RateBook (e.g. Give 6 stars to Of Mice and Men) SearchScreeningEvent (e.g. Check the showtimes for Wonder Woman in Paris) Industrial state-of-the-art Benchmark
  28. 28. Methodology English language. Removed duplicates that differ by number of whitespaces, quotes, lettercase, etc. Resulting dataset parameters 7 intents, 15.6K utterances (~2K per intent) 3-fold 80/20 cross-validation. Most providers do not offer programmatic interfaces for adding training data. Industrial state-of-the-art Benchmark
  29. 29. Industrial state-of-the-art
     Framework | Since | F1 | False positives | Response time | Price
     IBM Watson Conversation | 2015 | 99.7 | 100% | 0.35 | 2.5
     API.ai (Google 2016) | 2010 | 99.6 | 40% | 0.28 | Free
     Microsoft LUIS | 2015 | 99.2 | 100% | 0.21 | 0.75
     Amazon Lex | 2016 | 96.5 | 82% | 0.43 | 0.75
     Recast.ai | 2016 | 97 | 75% | 2.06 | N/A
     wit.ai (Facebook) | 2013 | 97.4 | 72% | 0.96 | Free
     SNIPS | 2017 | 97.5 | 26% | 0.36 | Per device
  30. 30. Logarithmic scale!
  31. 31. Current challenges
  32. 32. So, let's be real... Welcome to Macy's blah-blah... ! hi im a guy looking for leda jackets We've got what you're looking for! <image product_id=1 item_id=1> that's a nice one but, i want to see different jackets .... how can i do that
  33. 33. So, let's be real... We've got what you're looking for! <image product_id=1 item_id=2> where can i find a pair of shoes that match this yak etc. We've got what you're looking for! <image product_id=1 item_id=3> pair of shoes (, you talking dishwasher)
  34. 34. So, let's be real... Welcome to Macy's blah-blah... ! hi im a guy looking for leda jackets lexical variants: synonyms, paraphrases
  35. 35. So, let's be real... lexical variants: synonyms, paraphrases Intent
  36. 36. So, let's be real... lexical variants: synonyms, paraphrases Intent Entity
  37. 37. So, let's be real... lexical variants: synonyms, paraphrases Intent This can be normalized, e.g. regular expressions Entity
  38. 38. So, let's be real... lexical variants: synonyms, paraphrases Intent This can also be normalized, e.g. regular expressions Entity
  39. 39. So, let's be real... lexical variants: synonyms, paraphrases Intent This can also be normalized, e.g. regular expressions Entity 1-week project on a regular-expression-based filter. From 89% to 95% F1 with only a 5% positive rate. 7-label domain classification task using a RandomForest classifier over TFIDF-weighted vectors.
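As a rough illustration of the baseline described in this slide (not the actual Telefónica pipeline), a domain classifier over TFIDF-weighted vectors with a RandomForest can be sketched in a few lines of scikit-learn; the labels and utterances below are invented placeholders (and fewer than the 7 labels mentioned above, purely for brevity):

    # Sketch: domain classification with TFIDF features + RandomForest.
    # Labels and training utterances are illustrative placeholders only.
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.pipeline import make_pipeline

    train_utterances = [
        "my internet connection does not work",
        "my smartphone does not work",
        "i want to check my last invoice",
        "how do i top up my prepaid card",
    ]
    train_labels = ["connectivity", "mobile_support", "billing", "prepaid"]

    clf = make_pipeline(
        TfidfVectorizer(ngram_range=(1, 2), lowercase=True),
        RandomForestClassifier(n_estimators=200, random_state=0),
    )
    clf.fit(train_utterances, train_labels)
    print(clf.predict(["my router does not work"]))  # e.g. ['connectivity']

A regular-expression prefilter of the kind mentioned above would simply run before this classifier and short-circuit the highly repetitive inputs it recognizes.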
  40. 40. So, let's be real... lexical variants: synonyms, paraphrases Intent My internet connection does not work. My smartphone does not work. My landline does not work. My router does not work. My SIM does not work. My payment method does not work. My saved movies does not work.
  41. 41. So, let's be real... lexical variants: synonyms, paraphrases Intent My internet connection does not work. My smartphone does not work. My landline does not work. My router does not work. My SIM does not work. My payment method does not work. My saved movies does not work. ConnectionDebugWizard(*) TechnicalSupport(Mobile) TechnicalSupport(Landline) RouterDebugWizard() VerifyDeviceSettings AccountVerificationProcess TVDebugWizard
  42. 42. So, let's be real... lexical variants: synonyms, paraphrases Intent My internet connection does not work. My smartphone does not work. My landline does not work. My router does not work. My SIM does not work. My payment method does not work. My saved movies does not work. ConnectionDebugWizard(*) TechnicalSupport(Mobile) TechnicalSupport(Landline) RouterDebugWizard() VerifyDeviceSettings AccountVerificationProcess TVDebugWizard Since we associate each intent to an action, the ability to discriminate actions presupposes training as many separate intents, with just as many ambiguities.
  43. 43. So, let's be real... lexical variants: synonyms, paraphrases Intent My internet connection does not work. My smartphone does not work. My landline does not work. My router does not work. My SIM does not work. My payment method does not work. My saved movies does not work. ConnectionDebugWizard(*) TechnicalSupport(Mobile) TechnicalSupport(Landline) RouterDebugWizard() VerifyDeviceSettings AccountVerificationProcess TVDebugWizard It also presupposes exponentially increasing amounts of training data as we add higher-precision entities to our model (every entity * every intent that applies)
  44. 44. So, let's be real... lexical variants: synonyms, paraphrases Intent My internet connection does not work. My smartphone does not work. My landline does not work. My router does not work. My SIM does not work. My payment method does not work. My saved movies does not work. ConnectionDebugWizard(*) TechnicalSupport(Mobile) TechnicalSupport(Landline) RouterDebugWizard() VerifyDeviceSettings AccountVerificationProcess TVDebugWizard The entity resolution step (which would have helped us make the right decision) is nested downstream in our pipeline. Any errors from the previous stage will propagate down.
  45. 45. So, let's be real... Welcome to Macy's blah-blah... ! hi im a guy looking for leda jackets We've got what you're looking for! <image product_id=1 item_id=1> that's a nice one but, I want to see different jackets .... how can i do that lexical variants: synonyms, paraphrases formal variants (typos, ASR errors) multiword chunking/parsing morphological variants
  46. 46. So, let's be real... Welcome to Macy's blah-blah... ! hi im a guy looking for leda jackets We've got what you're looking for! <image product_id=1 item_id=1> lexical variants: synonyms, paraphrases formal variants (typos, ASR errors) multiword chunking/parsing morphological variants entity recognition hey i need to buy a men's red leather jacket with straps for my friend that's a nice one but, I want to see different jackets .... how can i do that
  47. 47. So, let's be real... where can i find a <jacket with straps> looking for a <black mens jacket> do you have anything in <leather> <present> for a friend }Our training dataset buy <red jacket>
  48. 48. So, let's be real... where can i buy <a men's red leather jacket with straps> for my friend where can i find a <jacket with straps> looking for a <black mens jacket> do you have anything in <leather> <present> for a friend hi i need <a men's red leather jacket with straps> that my friend can wear }Our training dataset }Predictions at runtime buy <red jacket>
  49. 49. So, let's be real... where can i buy <a men's red leather jacket with straps> for my friend where can i find a <jacket with straps> looking for a <black mens jacket> do you have anything in <leather> <present> for a friend hi i need <a men's red leather jacket with straps> that my friend can wear }Our training dataset }Predictions at runtime Incomplete training data will cause entity segmentation issues buy <red jacket>
  50. 50. So, let's be real... buy <red jacket> where can i find a <jacket with straps> looking for a <black mens jacket> do you have anything in <leather> <present> for a friend }Our training dataset But more importantly, in the intent-entity paradigm we are usually unable to train two entity types with the same distribution: where can i buy a <red leather jacket> where can i buy a <red leather wallet> jacket, 0.5 wallet, 0.5 jacket, 0.5 wallet, 0.5
  51. 51. So, let's be real... buy <red jacket> where can i find a <jacket with straps> looking for a <black mens jacket> do you have anything in <leather> <present> for a friend }Our training dataset ... and have no way of detecting entities used in isolation: pair of shoes MakePurchase ReturnPurchase ['START', 3-gram, 'END'] GetWeather Greeting Goodbye
  52. 52. that's a nice one but, I want to see different jackets .... how can i do that So, let's be real... Welcome to Macy's blah-blah... ! hi im a guy looking for leda jackets We've got what you're looking for! <image product_id=1 item_id=1> lexical variants: synonyms, paraphrases formal variants (typos, ASR errors) multiword chunking/parsing morphological variants entity recognition hey i need to buy a men's red leather jacket with straps for my friend searchItemByText("leather jacket") searchItemByText( "to buy a men's ... pfff TLDR... my friend" ) [ { "name": "leather jacket", "product_category_id": 12, "gender_id": 0, "materials_id": [68, 34, ...], "features_id": [], }, ... { "name": "leather jacket with straps", "product_category_id": 12, "gender_id": 1, "materials_id": [68, 34, ...], "features_id": [8754], }, ]
  53. 53. that's a nice one but, I want to see different jackets .... how can i do that So, let's be real... Welcome to Macy's blah-blah... ! hi im a guy looking for leda jackets We've got what you're looking for! <image product_id=1 item_id=1> lexical variants: synonyms, paraphrases formal variants (typos, ASR errors) multiword chunking/parsing morphological variants entity recognition hey i need to buy a men's red leather jacket with straps for my friend searchItemByText("leather jacket") searchItemByText( "red leather jacket with straps" ) [ { "name": "leather jacket", "product_category_id": 12, "gender_id": 0, "materials_id": [68, 34, ...], "features_id": [], }, ... { "name": "leather jacket with straps", "product_category_id": 12, "gender_id": 1, "materials_id": [68, 34, ...], "features_id": [8754], }, ] NO IDs, NO reasonable expectation of a cognitively realistic answer and NO "conversational technology".
  54. 54. So, let's be real... hey i need to buy a men's red leather jacket with straps for my friend Straps Jackets Leather items Since the conditions express an AND logical operator and we will search for an item fulfilling all of them, we could actually search for the terms in the conjunction in any order.
  55. 55. So, let's be real... hey i need to buy a men's red leather jacket with straps for my friend Given • any known word, denoted by w, • the vocabulary V of all known words, such that w_i ∈ V for any i, • a search space D, where d denotes a document in the collection D such that d ∈ D and d = {w_i, w_i+1, ..., w_n}, with w_x ∈ V for every i ≤ x ≤ n, • a function filter(D, w_x) that returns the subset D_wx of D such that w_x ∈ d for every d ∈ D_wx, and • query q = {"men's", 'red', 'leather', 'jacket', 'straps'},
  56. 56. So, let's be real... hey i need to buy a men's red leather jacket with straps for my friend do filter(filter(filter(filter( filter(D, 'leather'), 'straps'), "men's"), 'jacket'),'red' ) = do filter(filter(filter(filter( filter(D, 'straps'), 'red'), 'jacket'), 'leather'),"men's" ) = do filter(filter(filter(filter( filter(D, "men's"), 'leather'), 'red'), 'straps'),'jacket' ) =
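To make the point above concrete, here is a toy sketch (hypothetical catalogue strings, not a real product database) showing that purely conjunctive filtering returns the same result set regardless of the order in which the terms are applied:

    # Toy illustration: conjunctive term filtering is order-independent.
    D = [
        "men's red leather jacket with straps",
        "black leather jacket",
        "red leather wallet",
        "leather straps for men's watches",
    ]

    def filter_docs(docs, term):
        """Keep only the documents containing the term."""
        return [d for d in docs if term in d]

    order_a = ["leather", "straps", "men's", "jacket", "red"]
    order_b = ["straps", "red", "jacket", "leather", "men's"]

    result_a, result_b = D, D
    for term in order_a:
        result_a = filter_docs(result_a, term)
    for term in order_b:
        result_b = filter_docs(result_b, term)

    assert set(result_a) == set(result_b)  # same set, whatever the order
    print(result_a)  # ["men's red leather jacket with straps"]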
  57. 57. So, let's be real... hey i need to buy a men's red leather jacket with straps for my friend Given query q = {"men's", 'red', 'leather', 'jacket', 'straps'}, find the x: x ∈ D that satisfies: containsSubstring(x, 'leather') ^ containsSubstring(x, 'straps') ^ ... containsSubstring(x, "men's") As a result, the system will lack the notion of what counts as a partial match. We cannot use more complex predicates because, in this scenario, the system has no understanding of the internal structure of the phrase, its semantic dependencies, or its syntactic dependencies.
  58. 58. So, let's be real... hey i need to buy a men's red leather jacket with straps for my friend The following partial matches will be virtually indistinguishable: • leather straps for men's watches (3/5) • men's vintage red leather shoes (3/5) • red wallet with leather straps (3/5) • red leather jackets with hood (3/5)
  59. 59. So, let's be real... hey i need to buy a men's red leather jacket with straps for my friend do filter(filter(filter(filter( filter(D, "men's"), 'red'), 'leather'), 'jacket'),'straps' ) Given • any known word, denoted by w, • the vocabulary V of all known words, such that w_i ∈ V for any i, • a search space D, where d denotes a document in the collection D such that d ∈ D and d = {w_i, w_i+1, ..., w_n}, with w_x ∈ V for every i ≤ x ≤ n, • a function filter(D, w_x) that returns the subset D_wx of D such that w_x ∈ d for every d ∈ D_wx, and • query q = {"men's", 'red', 'leather', 'jacket', 'straps'}, This is our problem: we're using a function that treats every w_x equally.
  60. 60. So, let's be real... hey i need to buy a men's red leather jacket with straps for my friend do filter(filter(filter(filter( filter(D, "men's"), 'red'), 'leather'), 'jacket'),'straps' ) Given • any known word, denoted by w, • the vocabulary V of all known words, such that w_i ∈ V for any i, • a search space D, where d denotes a document in the collection D such that d ∈ D and d = {w_i, w_i+1, ..., w_n}, with w_x ∈ V for every i ≤ x ≤ n, • a function filter(D, w_x) that returns the subset D_wx of D such that w_x ∈ d for every d ∈ D_wx, and • query q = {"men's", 'red', 'leather', 'jacket', 'straps'}, We need a function with at least one additional parameter, r_x: filter(D, w_x, r_x), where r_x denotes the depth at which the dependency tree headed by w_x attaches to the node being currently evaluated.
  61. 61. So, let's be real... Your query i need to buy a men's red leather jacket with straps for my friend Tagging i/FW need/VBP to/TO buy/VB a/DT men/NNS 's/POS red/JJ leather/NN jacket/NN with/IN straps/NNS for/IN my/PRP$ friend/NN Parse (ROOT (S (NP (FW i)) (VP (VBP need) (S (VP (TO to) (VP (VB buy) (NP (NP (DT a) (NNS men) (POS 's)) (JJ red) (NN leather) (NN jacket)) (PP (IN with) (NP (NP (NNS straps)) (PP (IN for) (NP (PRP$ my) (NN friend))))))))))) http://nlp.stanford.edu:8080/parser/index.jsp (as of September 19th 2017)
  62. 62. So, let's be real... Your query i need to buy a men's red leather jacket with straps for my friend Tagging i/FW need/VBP to/TO buy/VB a/DT men/NNS 's/POS red/JJ leather/NN jacket/NN with/IN straps/NNS for/IN my/PRP$ friend/NN Parse (ROOT (S (NP (FW i)) (VP (VBP need) (S (VP (TO to) (VP (VB buy) (NP (NP (DT a) ((NNS men) (POS 's)) ((JJ red) (NN leather)) (NN jacket))) (PP (IN with) (NP (NP (NNS straps)) (PP (IN for) (NP (PRP$ my) (NN friend))))))))))) http://nlp.stanford.edu:8080/parser/index.jsp (as of September 19th 2017)
  63. 63. So, let's be real... Your query i need to buy a men's red leather jacket with straps for my friend Tagging i/FW need/VBP to/TO buy/VB a/DT men/NNS 's/POS red/JJ leather/NN jacket/NN with/IN straps/NNS for/IN my/PRP$ friend/NN Parse (ROOT (S (NP (FW i)) (VP (VBP need) (S (VP (TO to) (VP (VB buy) (NP (NP (DT a) ((NNS men) (POS 's)) ((JJ red) (NN leather)) (NN jacket))) (PP (IN with) (NP (NP (NNS straps)) (PP (IN for) (NP (PRP$ my) (NN friend))))))))))) http://nlp.stanford.edu:8080/parser/index.jsp (as of September 19th 2017) (red, leather) (leather, jacket) (jacket, buy)
  64. 64. So, let's be real... hey i need to buy a men's red leather jacket with straps for my friend do filter(filter(filter(filter( filter(D, "men's"), 'red'), 'leather'), 'jacket'),'straps' ) Given • any known word, denoted by w, • the vocabulary V of all known words, such that w_i ∈ V for any i, • a search space D, where d denotes a document in the collection D such that d ∈ D and d = {w_i, w_i+1, ..., w_n}, with w_x ∈ V for every i ≤ x ≤ n, • a function filter(D, w_x) that returns the subset D_wx of D such that w_x ∈ d for every d ∈ D_wx, and • query q = {"men's", 'red', 'leather', 'jacket', 'straps'}, The output of this function will no longer be a subset of search results but a ranked list. This list can be naively ranked by the score Σ_x 1/r_x (summing over the matched terms), although much more advanced scoring functions can be used.
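A minimal sketch of this depth-aware ranking: each matched term contributes 1/r_x to a candidate's score, with r_x the depth at which the term attaches in the candidate's dependency tree. The depth values below are hard-coded stand-ins for real parser output:

    # Sketch: rank candidates by the sum of 1/r_x over matched query terms.
    q = ["red", "leather", "jacket"]

    # Hypothetical dependency depths (1 = head of the phrase, larger = deeper).
    candidates = {
        "black leather jacket": {"jacket": 1, "leather": 2, "black": 3},
        "red leather wallet":   {"wallet": 1, "leather": 2, "red": 3},
    }

    def score(query, depths):
        return sum(1.0 / depths[t] for t in query if t in depths)

    ranking = sorted(candidates, key=lambda c: score(q, candidates[c]), reverse=True)
    for c in ranking:
        print(c, round(score(q, candidates[c]), 2))
    # black leather jacket 1.5
    # red leather wallet 0.83

These are the same 1.5 and 0.83 scores that appear in the worked example on the following slides.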
  65. 65. So, let's be real... hey i need to buy a men's red leather jacket with straps for my friend Given: q: "red leather jacket" a1: "black leather jacket" a2: "red leather wallet"
  66. 66. So, let's be real... hey i need to buy a men's red leather jacket with straps for my friend >>> t1 = dependencyTree(a1) >>> t1.depth("black") 3 >>> t1.depth("jacket") 1 Given: q: "red leather jacket" a1: "black leather jacket" a2: "red leather wallet"
  67. 67. So, let's be real... hey i need to buy a men's red leather jacket with straps for my friend >>> results = filter(D, "jacket", t1.depth("jacket")) >>> results [ ( Jacket1, 1.0 ), ( Jacket2, 1.0 ), ..., ( Jacketn, 1.0 ) ] Given: q: "red leather jacket" a1: "black leather jacket" a2: "red leather wallet"
  68. 68. So, let's be real... hey i need to buy a men's red leather jacket with straps for my friend >>> results = filter(results, "leather", t1.depth("leather")) >>> results [ ( Jacket1, 1.5 ), ( Jacket2, 1.5 ), ..., ( Jacketn - x, 1.5 ) ] Given: q: "red leather jacket" a1: "black leather jacket" a2: "red leather wallet"
  69. 69. So, let's be real... hey i need to buy a men's red leather jacket with straps for my friend Given: q: "red leather jacket" a1: "black leather jacket" a2: "red leather wallet" >>> results = filter(results, "red", t1.depth("red")) >>> results [ ( Jacket1, 1.5 ), ( Jacket2, 1.5 ), ..., ( Jacketn - x, 1.5 ) ]
  70. 70. So, let's be real... hey i need to buy a men's red leather jacket with straps for my friend >>> t2 = dependencyTree(a2) >>> t2.depth("red") 3 >>> t2.depth("jacket") None Given: q: "red leather jacket" a1: "black leather jacket" a2: "red leather wallet"
  71. 71. So, let's be real... hey i need to buy a men's red leather jacket with straps for my friend >>> results = filter(D, "jacket", t2.depth("jacket")) >>> results [] Given: q: "red leather jacket" a1: "black leather jacket" a2: "red leather wallet"
  72. 72. So, let's be real... hey i need to buy a men's red leather jacket with straps for my friend >>> results = filter(results, "leather", t2.depth("leather")) >>> results [ ( Wallet1, 0.5 ), ( Wallet2, 0.5 ), ..., ( Strapn, 0.5 ) ] Given: q: "red leather jacket" a1: "black leather jacket" a2: "red leather wallet"
  73. 73. So, let's be real... hey i need to buy a men's red leather jacket with straps for my friend Given: q: "red leather jacket" a1: "black leather jacket" a2: "red leather wallet" >>> results = filter(results, "red", t2.depth("red")) >>> results [ ( Wallet1, 0.83 ), ( Wallet2, 0.83 ), ..., ( Strapn - x, 0.83 ) ]
  74. 74. So, let's be real... hey i need to buy a men's red leather jacket with straps for my friend Results for q2: "red leather wallet" [ ( Wallet1, 0.83 ), ( Wallet2, 0.83 ), ..., ( Strapn - x, 0.83 ) ] Results for q1: "black leather jacket" [ ( Jacket1, 1.5 ), ( Jacket2, 1.5 ), ..., ( Jacketn - x, 1.5 ) ]
  75. 75. So, let's be real... hey i need to buy a men's red leather jacket with straps for my friend However, the dependency tree is not always available, and it often contains parsing issues (cf. Stanford parser output). An evaluation of parser robustness over noisy input reports performances down to an average of 80% across datasets and parsers. Hashemi, Homa B., and Rebecca Hwa. "An Evaluation of Parser Robustness for Ungrammatical Sentences." EMNLP. 2016.
  76. 76. So, let's be real... hey i need to buy a men's red leather jacket with straps for my friend In order to make this decision [ [red leather] jacket ]👍
  77. 77. So, let's be real... hey i need to buy a men's red leather jacket with straps for my friend [ [red leather] jacket ] [ red [leather jacket] ]👍 In order to make this decision
  78. 78. So, let's be real... hey i need to buy a men's red leather jacket with straps for my friend [ [red leather] jacket ] [ red [leather jacket] ] [ men's [leather jacket] ]👍 In order to make this decision
  79. 79. So, let's be real... hey i need to buy a men's red leather jacket with straps for my friend [ [red leather] jacket ] [ red [leather jacket] ] [ men's [leather jacket] ] 😱 In order to make this decision [ [men's leather] jacket ]
  80. 80. So, let's be real... hey i need to buy a men's red leather jacket with straps for my friend [ [red leather] jacket ] [ red [leather jacket] ] [ men's [leather jacket] ] [ [men's leather] jacket ]😱 In order to make this decision ... the parser already needs as much information as we can provide regarding head- modifier attachments.
  81. 81. So, let's be real... hey i need to buy a men's red leather jacket with straps for my friend [ [red leather] jacket ] [ red [leather jacket] ] [ men's [leather jacket] ] [ [men's leather] jacket ] In order to make this decision 2 2 1 2 1 1 2 1 1 2 2 1
  82. 82. So, let's be real... hey i need to buy a men's red leather jacket with straps for my friend In order to make this decision Same pattern, different score. How to assign a probability such that less plausible results are buried at the bottom of the hypotheses space? (not only for display purposes, but for dynamic programming reasons) [ [red leather] jacket ] [ red [leather jacket] ] [ men's [leather jacket] ] [ [men's leather] jacket ] 2 2 1 2 1 1 2 1 1 2 2 1
  83. 83. So, let's be real... hey i need to buy a men's red leather jacket with straps for my friend In order to make this decision p(red | jacket ) ~ p(red | leather ) p(men's | jacket ) > [ [red leather] jacket ] [ red [leather jacket] ] [ men's [leather jacket] ] [ [men's leather] jacket ] 2 2 1 2 1 1 2 1 1 2 2 1 p(men's | leather )
  84. 84. So, let's be real... hey i need to buy a men's red leather jacket with straps for my friend That's a lot of data to model. Where will we get it from? [ [red leather] jacket ] [ red [leather jacket] ] [ men's [leather jacket] ] [ [men's leather] jacket ] 2 2 1 2 1 1 2 1 1 2 2 1
  85. 85. So, let's be real... hey i need to buy a men's red leather jacket with straps for my friend That's a lot of data to model. Where will we get it from? p(Color | ClothingItem ) ~ p(Color | Material ) p(Person | ClothingItem ) > [ [red leather] jacket ] [ red [leather jacket] ] [ men's [leather jacket] ] [ [men's leather] jacket ] 2 2 1 2 1 1 2 1 1 2 2 1 p(Person | Material )
  86. 86. So, let's be real... hey i need to buy a men's red leather jacket with straps for my friend If our taxonomy is well built, semantic associations acquired for a significant number of members of each class will propagate to new, unseen members of that class. With a sufficiently rich NER detection pipeline and a semantic relation database large enough, we can rely on simple inference for a vast majority of the disambiguations. More importantly, we can weight candidate hypotheses dynamically in order to make semantic parsing tractable over many possible syntactic trees.
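A small sketch of this kind of taxonomy-driven inference (the taxonomy and the observed head-modifier pairs are invented for illustration): attachments are counted at the class level, so an unseen word inherits the attachment preferences of its class.

    from collections import defaultdict

    # Toy taxonomy mapping surface terms to semantic classes.
    taxonomy = {
        "red": "Color", "black": "Color",
        "leather": "Material", "suede": "Material",
        "jacket": "ClothingItem", "wallet": "ClothingItem",
        "men's": "Person",
    }

    # Toy (modifier, head) pairs harvested from previously parsed text.
    observed_pairs = [
        ("red", "jacket"), ("black", "jacket"), ("red", "wallet"),
        ("leather", "jacket"), ("leather", "wallet"), ("men's", "jacket"),
    ]

    counts, head_totals = defaultdict(int), defaultdict(int)
    for mod, head in observed_pairs:
        cm, ch = taxonomy[mod], taxonomy[head]
        counts[(cm, ch)] += 1
        head_totals[ch] += 1

    def p_attach(mod_class, head_class):
        """p(modifier class | head class) estimated from class-level counts."""
        total = head_totals[head_class]
        return counts[(mod_class, head_class)] / total if total else 0.0

    # "suede" was never observed, but as a Material it inherits the class behaviour.
    print(p_attach(taxonomy["suede"], taxonomy["wallet"]))   # Material | ClothingItem
    print(p_attach(taxonomy["men's"], taxonomy["leather"]))  # Person | Material -> 0.0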
  87. 87. So, let's be real... Welcome to Macy's blah-blah... ! hi im a guy looking for leda jackets We've got what you're looking for! <image product_id=1 item_id=1> that's a nice one but, I want to see different jackets .... how can i do that lexical variants: synonyms, paraphrases formal variants (typos, ASR errors) multiword chunking/parsing morphological variants entity recognition entity linking hierarchical structure
  88. 88. So, let's be real... Welcome to Macy's blah-blah... ! hi im a guy looking for leda jackets We've got what you're looking for! <image product_id=1 item_id=1> that's a nice one but, i want to see different jackets .... how can i do that
  89. 89. So, let's be real... Welcome to Macy's blah-blah... ! hi im a guy looking for leda jackets We've got what you're looking for! <image product_id=1 item_id=1> that's a nice one but, i want to see different jackets .... how can i do that LastResult is a nice jacket but, i want to see different jackets .... how can i LastCommand
  90. 90. So, let's be real... that one do that LastResult jacket LastCommand that's a nice one but, i want to see different jackets .... how can i do that LastResult is a nice jacket but, i want to see different jackets .... how can i LastCommand
  91. 91. So, let's be real... that LastResult IF: pPoS-tagging(that_DT | * be_VBZ, START *) > pPoS-tagging(that_IN | * be_VBZ, START *) that_DT 's_VBZ great_JJ !_PUNCT that_DT 's_VBZ the_DT last_JJ... he said that_IN i_PRP... he said that_IN trump_NNP... THEN IF: pisCoreferent(LastValueReturned | that_DT) > u # may also need to condition on the # dependency: (x, nice) that's a nice one but, i want to see different jackets .... how can i do that LastResult is a nice jacket but, i want to see different jackets .... how can i LastCommand
  92. 92. So, let's be real... one jacket Find the LastResult in the context that maximizes: pisCoreferent( LastResult.type=x | one_CD, # trigger for the function, trivial condition QualitativeEvaluation(User, x), ContextDistance(LastResult.node) ) that's a nice one but, i want to see different jackets .... how can i do that LastResult is a nice jacket but, i want to see different jackets .... how can i LastCommand
  93. 93. So, let's be real... one jacket Particularly important for decision-tree re-entry that's a nice one but, i want to see different jackets .... how can i do that LastResult is a nice jacket but, i want to see different jackets .... how can i LastCommand Find the LastResult in the context that maximizes: pisCoreferent( LastResult.type=x | one_CD, # trigger for the function, trivial condition QualitativeEvaluation(User, x), ContextDistance(LastResult.node) )
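As a loose sketch of the scoring above (the features and weights are invented, not the actual model), antecedent candidates for "one" can be ranked by type compatibility with the qualitative evaluation and by distance in the dialogue context:

    # Toy antecedent scoring for "that's a nice one".
    context = [
        # (turns_ago, result_type) for results previously shown to the user
        (3, "StoreGreeting"),
        (1, "Jacket"),   # the LastResult displayed in the conversation
    ]

    def coref_score(result_type, turns_ago, evaluated_types=("Jacket",)):
        type_score = 1.0 if result_type in evaluated_types else 0.1
        distance_score = 1.0 / (1 + turns_ago)   # decay with context distance
        return type_score * distance_score

    best = max(context, key=lambda c: coref_score(c[1], c[0]))
    print(best)   # (1, 'Jacket'): "one" resolves to the last jacket shown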
  94. 94. So, let's be real... no! forget the last thing i said go back! third option in the menu i want to modify the address I gave you
  95. 95. So, let's be real... QualitativeEvaluation(Useru, ClothingItemi) BrowseCatalogue(Useru, filter=[ClothingItemn, j |J|]) HowTo(Useru, Commandc |Context|) }A single utterance that's a nice one but, i want to see different jackets .... how can i do that LastResult is a nice jacket but, i want to see different jackets .... how can i LastCommand
  96. 96. So, let's be real... }A single utterance that's a nice one but, i want to see different jackets .... how can i do that LastResult is a nice jacket but, i want to see different jackets .... how can i LastCommand QualitativeEvaluation(Useru, ClothingItemi) BrowseCatalogue(Useru, filter=[ClothingItemn, j |J|]) HowTo(Useru, Commandc |Context|)
  97. 97. So, let's be real... }random.choice(I) x if x ∈ I ^ length(x) = argmax(L) I = [ i1, i2, i3 ] L = [ | i1 |, | i2 |, | i3 | ] P = [ p( i1 ), p( i2 ), p( i3 )] x if x ∈ I ^ p(x) = argmax(P) that's a nice one but, i want to see different jackets .... how can i do that LastResult is a nice jacket but, i want to see different jackets .... how can i LastCommand QualitativeEvaluation(Useru, ClothingItemi) BrowseCatalogue(Useru, filter=[ClothingItemn, j |J|]) HowTo(Useru, Commandc |Context|)
  98. 98. So, let's be real... } random.choice(I) x if x ∈ I ^ length(x) = argmax(L) I = [ i1, i2, i3 ] L = [ | i1 |, | i2 |, | i3 | ] P = [ p( i1 | q ), p( i2 | q ), p( i3 | q ) ] x if x ∈ I ^ p(x | q) = argmax(P) that's a nice one but, i want to see different jackets .... how can i do that LastResult is a nice jacket but, i want to see different jackets .... how can i LastCommand However, this approach will still suffer from segmentation issues due to passing many more arguments than expected for resolving any specific intent.
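In code, the progression sketched in the last two slides is just a change of the selection key, from a random or length-based tie-break to an argmax over p(i | q); the probabilities below are made up for illustration:

    import random

    # Candidate interpretations of the utterance (placeholders).
    I = ["QualitativeEvaluation", "BrowseCatalogue", "HowTo"]

    choice_random = random.choice(I)     # random.choice(I)
    choice_longest = max(I, key=len)     # argmax over the lengths L

    p_given_q = {"QualitativeEvaluation": 0.2,   # p(i | q), invented values
                 "BrowseCatalogue": 0.5,
                 "HowTo": 0.3}
    choice_prob = max(I, key=lambda i: p_given_q[i])
    print(choice_prob)   # 'BrowseCatalogue'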
  99. 99. So, let's be real... GetInfo(Useru, argmax(Commandc |Context| ) ) SetPreference(Useru, SetRating(ClothingItemi, k, 1) ) that's a nice one but, i want to see different jackets .... how can i do that LastResult is a nice jacket but, i want to see different jackets .... how can i LastCommand ∀(ClothingItemn, m |N| |M| | n = i, k = * ) } A standard API will normally provide these definitions as the declaration of its methods.
  100. 100. So, let's be real... GetInfo(Useru, argmax(Commandc |Context| ) ) SetPreference(Useru, SetRating(ClothingItemi, k, 1) ) that's a nice one but, i want to see different jackets .... how can i do that LastResult is a nice jacket but, i want to see different jackets .... how can i LastCommand ∀(ClothingItemn, m |N| |M| | n = i, k = * ) } We can map the argument signatures to the nodes of the taxonomy we are using for linguistic inference.
  101. 101. So, let's be real... GetInfo(Useru, argmax(Commandc |Context| ) ) SetPreference(Useru, SetRating(ClothingItemi, k, 1) ) that's a nice one but, i want to see different jackets .... how can i do that LastResult is a nice jacket but, i want to see different jackets .... how can i LastCommand ∀(ClothingItemn, m |N| |M| | n = i, k = * ) }The system will then be able to link API calls to entity parses dynamically.
  102. 102. So, let's be real... GetInfo(Useru, argmax(Commandc |Context| ) ) SetPreference(Useru, SetRating(ClothingItemi, k, 1) ) that's a nice one but, i want to see different jackets .... how can i do that LastResult is a nice jacket but, i want to see different jackets .... how can i LastCommand ∀(ClothingItemn, m |N| |M| | n = i, k = * ) } Over time, new methods added to the API will be automatically supported by the pre-existing linguistic engine.
  103. 103. So, let's be real... GetInfo(Useru, argmax(Commandc |Context| ) ) SetPreference(Useru, SetRating(ClothingItemi, k, 1) ) that's a nice one but, i want to see different jackets .... how can i do that LastResult is a nice jacket but, i want to see different jackets .... how can i LastCommand ∀(ClothingItemn, m |N| |M| | n = i, k = * ) } Over time, new methods added to the API will be automatically supported by the pre-existing linguistic engine. 😁
  104. 104. So, let's be real... If the system can reasonably solve nested semantic dependencies hierarchically, it can also be extended to handle multi-intent utterances in a natural way.
  105. 105. So, let's be real... We need: • semantic association measures between the leaves of the tree in order to attach them to the right nodes and be able to stop growing a tree at the right level, • a syntactic tree to build the dependencies, either using a parser or a PCFG/LFG grammar (both may take input from the previous step, which will provide the notion of verbal valence: Person Browse Product), • for tied analyses, an interface with the user to request clarification,
  106. 106. So, let's be real... Convert to abstract entity types Activate applicable dependencies/grammar rules (fewer and less ambiguous thanks to the previous step) i need to pay my bill and i cannot log into my account i need to PAYMENT my INVOICE and i ISSUE ACCESS ACCOUNT Resolve attachments and rank candidates by semantic association, keep top-n hypotheses i need to [ PAYMENT my INVOICE ] and i [ ISSUE [ AccessAccount() ] ] i need to [ Payment(x) ] and i [ Issue(y) ]
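A minimal sketch of those two steps (the type lexicon and rule inventory are toy stand-ins): rewrite the utterance with abstract entity types, then activate only the rules whose argument types are actually present, which shrinks the hypothesis space before attachment resolution:

    # Step 1: convert surface tokens to abstract entity types (toy lexicon).
    type_lexicon = {"pay": "PAYMENT", "bill": "INVOICE",
                    "log": "ACCESS", "account": "ACCOUNT"}

    utterance = "i need to pay my bill and i cannot log into my account"
    abstracted = " ".join(type_lexicon.get(tok, tok) for tok in utterance.split())
    print(abstracted)
    # i need to PAYMENT my INVOICE and i cannot ACCESS into my ACCOUNT

    # Step 2: only rules whose required types occur in the utterance are activated.
    rules = {
        "Payment(x)": {"PAYMENT", "INVOICE"},
        "AccessIssue(y)": {"ACCESS", "ACCOUNT"},
        "BookFlight(x)": {"FLIGHT", "DATE"},
    }
    tokens = set(abstracted.split())
    active = [name for name, required in rules.items() if required <= tokens]
    print(active)   # ['Payment(x)', 'AccessIssue(y)']: two intents in one utterance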
  107. 107. So, let's be real... AND usually joins ontologically coordinate elements, e.g. books and magazines versus *humans and Donald Trump i need to [ Payment(x) ] and i [ Issue(y) ]
  108. 108. So, let's be real... Even if p0 = p( Payment | Issue ) > u, we will still split the intents because here we have py = p( Payment | Issue, y ), and normally py < p0 for any irrelevant case (i.e., the posterior probability of the candidate attachment will not beat the baseline of its own prior probability) i need to [ Payment(x) ] and i [ Issue(y) ]
  109. 109. So, let's be real... I.e., Payment may have been attached to Issue if no other element had been attached to it before. Resolution order, as given by attachment probabilities, matters. i need to [ Payment(x) ] and i [ Issue(y) ]
  110. 110. So, let's be real... i need to [ Payment(x) ] i [ Issue(y) ] i need to [ Payment(x) ] and i [ Issue(y) ] Multi-intents solved 😎
  111. 111. Summary of desiderata 1. Normalization 1. Ability to handle input indeterminacy by activating several candidate hypotheses 2. Ability to handle synonyms 3. Ability to handle paraphrases 2. NER 1. Transversal (cross-intent) 2. No nesting under classification (feedback loop instead) 3. Contextually unbounded 4. Entity-linked 3. Semantic parsing 1. Taxonomy-driven supersense inference 2. Semantically-informed syntactic trees 3. Ability to handle decision-tree re-entries
  112. 112. Summary of desiderata 1. Normalization 1. Ability to handle input indeterminacy by activating several candidate hypotheses 2. Ability to handle synonyms 3. Ability to handle paraphrases 2. NER 1. Transversal (cross-intent) 2. No nesting under classification (feedback loop instead) 3. Contextually unbounded 4. Entity-linked 3. Semantic parsing 1. Taxonomy-driven supersense inference 2. Semantically-informed syntactic trees 3. Ability to handle decision-tree re-entries Word embeddings, domain thesaurus, fuzzy matching Inverted indexes, character- based neural networks for NER
  113. 113. Summary of desiderata 1. Normalization 1. Ability to handle input indeterminacy by activating several candidate hypotheses 2. Ability to handle synonyms 3. Ability to handle paraphrases 2. NER 1. Transversal (cross-intent) 2. No nesting under classification (feedback loop instead) 3. Contextually unbounded 4. Entity-linked 3. Semantic parsing 1. Taxonomy-driven supersense inference 2. Semantically-informed syntactic trees 3. Ability to handle decision-tree re-entries Word embeddings, domain thesaurus, fuzzy matching Inverted indexes, character- based neural networks for NER implicitly solved by Supersenses + Neural enquirer implicitly solved by Neural enquirer (layered/modular)
  114. 114. Recent advances
  115. 115. 3.1 Supersense inference GFT (Google Fine-Grained) taxonomy Yogatama, Dani, Daniel Gillick, and Nevena Lazic. "Embedding Methods for Fine Grained Entity Type Classification." ACL (2). 2015.
  116. 116. 3.1 Supersense inference FIGER taxonomy Yogatama, Dani, Daniel Gillick, and Nevena Lazic. "Embedding Methods for Fine Grained Entity Type Classification." ACL (2). 2015.
  117. 117. 3.1 Supersense inference Yogatama, Dani, Daniel Gillick, and Nevena Lazic. "Embedding Methods for Fine Grained Entity Type Classification." ACL (2). 2015.
     FB = Freebase
     owem = OurWordEmbeddingsModel
     T = OurTaxonomy1
     for every document d in D2:
         for every sent s in d:
             sentence_vector = None
             for every position, term, tag tuple in TermExtractor3(s):
                 if not sentence_vector:
                     sentence_vector = vectorizeSentence(s, position)
                 owem[ T[tag] ].update( sentence_vector )
     1 FB labels are manually mapped to fine-grained labels. 2 D = 133,000 news documents. 3 TermExtractor = parser + chunker + entity resolver that assigns Freebase types to entities.
  118. 118. 3.1 Supersense inference Yogatama, Dani, Daniel Gillick, and Nevena Lazic. "Embedding Methods for Fine Grained Entity Type Classification." ACL (2). 2015.
     FB = Freebase
     owem = OurWordEmbeddingsModel
     T = OurTaxonomy1
     for every document d in D2:
         for every sent s in d:
             sentence_vector = None
             for every position, term, tag tuple in TermExtractor3(s):
                 if not sentence_vector:
                     sentence_vector = vectorizeSentence(s, position)
                 owem[ T[tag] ].update( sentence_vector )
     1 FB labels are manually mapped to fine-grained labels. 2 D = 133,000 news documents. 3 TermExtractor = parser + chunker + entity resolver that assigns Freebase types to entities.
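A rough, runnable rendering of the aggregation loop above, with the heavy components (entity tagger, sentence vectorizer, Freebase-to-taxonomy mapping) replaced by trivial placeholders; it only shows how context vectors accumulate per fine-grained type:

    import numpy as np
    from collections import defaultdict

    DIM = 50

    def vectorize_sentence(tokens, position):
        """Placeholder for a real context encoder around `position`."""
        seed = abs(hash(" ".join(tokens) + str(position))) % (2**32)
        return np.random.default_rng(seed).normal(size=DIM)

    def term_extractor(tokens):
        """Placeholder tagger returning (position, term, Freebase-like tag)."""
        tags = {"paris": "/location/citytown", "obama": "/people/person"}
        return [(i, t, tags[t]) for i, t in enumerate(tokens) if t in tags]

    # Manual mapping from Freebase-like tags to fine-grained taxonomy labels.
    taxonomy = {"/location/citytown": "LOCATION/city",
                "/people/person": "PERSON/politician"}

    documents = [[["obama", "visited", "paris", "yesterday"]]]  # docs -> sents -> tokens

    sums, counts = defaultdict(lambda: np.zeros(DIM)), defaultdict(int)
    for d in documents:
        for s in d:
            for position, term, tag in term_extractor(s):
                sums[taxonomy[tag]] += vectorize_sentence(s, position)
                counts[taxonomy[tag]] += 1

    type_embeddings = {label: sums[label] / counts[label] for label in sums}
    print(sorted(type_embeddings))   # ['LOCATION/city', 'PERSON/politician']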
  119. 119. 3.1 Supersense inference Yogatama, Dani, Daniel Gillick, and Nevena Lazic. "Embedding Methods for Fine Grained Entity Type Classification." ACL (2). 2015.
  120. 120. 3.2 Semantic trees Yin, Pengcheng, et al. "Neural enquirer: Learning to query tables with natural language." arXiv preprint arXiv:1512.00965 (2015). q = "Which city hosted the longest game before the game in Beijing?" KB = knowledge base consisting of tables that store facts encoder = bi-directional Recurrent Neural Network v = encoder(q) # vector of the encoded query kb_encoder = KB table encoder
  121. 121. 3.2 Semantic trees Yin, Pengcheng, et al. "Neural enquirer: Learning to query tables with natural language." arXiv preprint arXiv:1512.00965 (2015). q = "Which city hosted the longest game before the game in Beijing?" KB = knowledge base consisting of tables that store facts encoder = bi-directional Recurrent Neural Network v = encoder(q) # vector of the encoded query kb_encoder = KB table encoder fn = embedding of the name of the n-th column wmn = embedding of the value in the n-th column of the m-th row [x ; y] = vector concatenation W = weights (what we will e.g. SGD out of the data) b = bias
  122. 122. 3.2 Semantic trees Yin, Pengcheng, et al. "Neural enquirer: Learning to query tables with natural language." arXiv preprint arXiv:1512.00965 (2015). q = "Which city hosted the longest game before the game in Beijing?" KB = knowledge base consisting of tables that store facts encoder = bi-directional Recurrent Neural Network v = encoder(q) # vector of the encoded query kb_encoder = KB table encoder fn = embedding of the name of the n-th column wmn = embedding of the value in the n-th column of the m-th row [x ; y] = vector concatenation W = weights (what we will e.g. SGD out of the data) b = bias normalizeBetween0and1( penaltiesOrRewards * # the weights that minimize the loss for this column [ {sydney... australia... kangaroo...} + {city... town... place...} ] + somePriorExpectations )
  123. 123. 3.2 Semantic trees Yin, Pengcheng, et al. "Neural enquirer: Learning to query tables with natural language." arXiv preprint arXiv:1512.00965 (2015). "olympic games near kangaroos" 👍 q = "Which city hosted the longest game before the game in Beijing?" KB = knowledge base consisting of tables that store facts encoder = bi-directional Recurrent Neural Network v = encoder(q) # vector of the encoded query kb_encoder = KB table encoder fn = embedding of the name of the n-th column wmn = embedding of the value in the n-th column of the m-th row [x ; y] = vector concatenation W = weights (what we will e.g. SGD out of the data) b = bias normalizeBetween0and1( penaltiesOrRewards * [ {sydney... australia... kangaroo...} + {city... town... place...} ] + somePriorExpectations )
  124. 124. 3.2 Semantic trees Yin, Pengcheng, et al. "Neural enquirer: Learning to query tables with natural language." arXiv preprint arXiv:1512.00965 (2015). "olympic games near kangaroos" Risk of false positives? Already as probabilities!! 👍 q = "Which city hosted the longest game before the game in Beijing?" KB = knowledge base consisting of tables that store facts encoder = bi-directional Recurrent Neural Network v = encoder(q) # vector of the encoded query kb_encoder = KB table encoder fn = embedding of the name of the n-th column wmn = embedding of the value in the n-th column of the m-th row [x ; y] = vector concatenation W = weights (what we will e.g. SGD out of the data) b = bias normalizeBetween0and1( penaltiesOrRewards * [ {sydney... australia... kangaroo...} + {city... town... place...} ] + somePriorExpectations )
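As a very loose sketch (not the paper's exact formulation), the column-selection step described above can be thought of as a softmax over column-name embeddings scored against the encoded query; the embeddings here are random placeholders for the learned ones:

    import numpy as np

    def softmax(x):
        e = np.exp(x - x.max())
        return e / e.sum()

    DIM = 8
    rng = np.random.default_rng(0)

    q_vec = rng.normal(size=DIM)                      # stands in for encoder(q)
    columns = ["city", "year", "duration"]
    f = {c: rng.normal(size=DIM) for c in columns}    # stands in for f_n

    scores = np.array([q_vec @ f[c] for c in columns])
    attention = softmax(scores)
    for c, a in zip(columns, attention):
        print(f"p(column={c} | q) ~ {a:.2f}")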
  125. 125. 3.2 Semantic trees Yin, Pengcheng, et al. "Neural enquirer: Learning to query tables with natural language." arXiv preprint arXiv:1512.00965 (2015). Olympic Games City Year Duration Barcelona 1992 15 Atlanta 1996 18 Sydney 2000 17 Athens 2004 14 Beijing 2008 16 London 2012 16 Rio de Janeiro 2016 18
  126. 126. 3.2 Semantic trees Yin, Pengcheng, et al. "Neural enquirer: Learning to query tables with natural language." arXiv preprint arXiv:1512.00965 (2015). Which city hosted the 2012 Olympic Games? Olympic Games City Year Duration Barcelona 1992 15 Atlanta 1996 18 Sydney 2000 17 Athens 2004 14 Beijing 2008 16 London 2012 16 Rio de Janeiro 2016 18
  127. 127. Yin, Pengcheng, et al. "Neural enquirer: Learning to query tables with natural language." arXiv preprint arXiv:1512.00965 (2015). Which city hosted the 2012 Olympic Games? Olympic Games City Year Duration Barcelona 1992 15 Atlanta 1996 18 Sydney 2000 17 Athens 2004 14 Beijing 2008 16 London 2012 16 Rio de Janeiro 2016 18 Q(i) = T(i) = y(i) = RNN-encoded = p(vector("column:city") | [ vector("which"), vector("city")... ] ) > p(vector("column:city") | [ vector("city"), vector("which")... ] ) 3.2 Semantic trees
  128. 128. Yin, Pengcheng, et al. "Neural enquirer: Learning to query tables with natural language." arXiv preprint arXiv:1512.00965 (2015). Which city hosted the 2012 Olympic Games?Q(i) = RNN-encoded = p(vector("column:city") | [ vec("which"), vec("city")... ] ) > p(vector("column:city") | [ vec("city"), vec("which")... ] )Executor layer L p(vector("column:year") | [ ... vec("2012"), vec("olympic") ... ] ) > p(vector("column:year") | [ ... vec("olympic"), vec("2012") ... ] )Executor layer L - 1 Executor layer L - 2 p(vector("FROM") | [ ... vec("olympic"), vec("games") ... ] ) > p(vector("FROM") | [ ... vec("games"), vec("olympic") ... ] ) As many as SQL operations 3.2 Semantic trees
  129. 129. Yin, Pengcheng, et al. "Neural enquirer: Learning to query tables with natural language." arXiv preprint arXiv:1512.00965 (2015). Which city hosted the 2012 Olympic Games?Q(i) = RNN-encoded = p(vector("column:city") | [ vec("which"), vec("city")... ] ) > p(vector("column:city") | [ vec("city"), vec("which")... ] )Executor layer L p(vector("column:year") | [ ... vec("2012"), vec("olympic") ... ] ) > p(vector("column:year") | [ ... vec("olympic"), vec("2012") ... ] )Executor layer L - 1 Executor layer L - 2 p(vector("FROM") | [ ... vec("olympic"), vec("games") ... ] ) > p(vector("FROM") | [ ... vec("games"), vec("olympic") ... ] ) Each executor applies the same set of weights to all rows (since it models a single operation) 3.2 Semantic trees
  130. 130. Yin, Pengcheng, et al. "Neural enquirer: Learning to query tables with natural language." arXiv preprint arXiv:1512.00965 (2015). Which city hosted the 2012 Olympic Games?Q(i) = RNN-encoded = p(vector("column:city") | [ vec("which"), vec("city")... ] ) > p(vector("column:city") | [ vec("city"), vec("which")... ] )Executor layer L p(vector("column:year") | [ ... vec("2012"), vec("olympic") ... ] ) > p(vector("column:year") | [ ... vec("olympic"), vec("2012") ... ] )Executor layer L - 1 Executor layer L - 2 p(vector("FROM") | [ ... vec("olympic"), vec("games") ... ] ) > p(vector("FROM") | [ ... vec("games"), vec("olympic") ... ] ) The logic of max, min functions is built-in as column-wise operations. The system learns over which ranges they apply 3.2 Semantic trees
  131. 131. Yin, Pengcheng, et al. "Neural enquirer: Learning to query tables with natural language." arXiv preprint arXiv:1512.00965 (2015). Which city hosted the 2012 Olympic Games?Q(i) = RNN-encoded = p(vector("column:city") | [ vec("which"), vec("city")... ] ) > p(vector("column:city") | [ vec("city"), vec("which")... ] )Executor layer L p(vector("column:year") | [ ... vec("2012"), vec("olympic") ... ] ) > p(vector("column:year") | [ ... vec("olympic"), vec("2012") ... ] )Executor layer L - 1 Executor layer L - 2 p(vector("FROM") | [ ... vec("olympic"), vec("games") ... ] ) > p(vector("FROM") | [ ... vec("games"), vec("olympic") ... ] ) The logic of max, min functions is built-in as column-wise operations. The system learns over which ranges they apply (before, after are just temporal versions of argmax and argmin) 3.2 Semantic trees
  132. 132. Yin, Pengcheng, et al. "Neural enquirer: Learning to query tables with natural language." arXiv preprint arXiv:1512.00965 (2015). Which city hosted the 2012 Olympic Games?Q(i) = RNN-encoded = p(vector("column:city") | [ vec("which"), vec("city")... ] ) > p(vector("column:city") | [ vec("city"), vec("which")... ] )Executor layer L p(vector("column:year") | [ ... vec("2012"), vec("olympic") ... ] ) > p(vector("column:year") | [ ... vec("olympic"), vec("2012") ... ] )Executor layer L - 1 Executor layer L - 2 p(vector("FROM") | [ ... vec("olympic"), vec("games") ... ] ) > p(vector("FROM") | [ ... vec("games"), vec("olympic") ... ] ) Candidate vector at layer L for row m of our database Memory layer with the last output vector generated by executor L - 1 Evaluate a candidate: 3.2 Semantic trees
  133. 133. Yin, Pengcheng, et al. "Neural enquirer: Learning to query tables with natural language." arXiv preprint arXiv:1512.00965 (2015). Which city hosted the 2012 Olympic Games?Q(i) = RNN-encoded = p(vector("column:city") | [ vec("which"), vec("city")... ] ) > p(vector("column:city") | [ vec("city"), vec("which")... ] )Executor layer L p(vector("column:year") | [ ... vec("2012"), vec("olympic") ... ] ) > p(vector("column:year") | [ ... vec("olympic"), vec("2012") ... ] )Executor layer L - 1 Executor layer L - 2 p(vector("FROM") | [ ... vec("olympic"), vec("games") ... ] ) > p(vector("FROM") | [ ... vec("games"), vec("olympic") ... ] ) Row m, R = our table Set of columns As an attention mechanism, cumulatively weight the vector by the output weights from executor L - 1 Evaluate a candidate: 3.2 Semantic trees
  134. 134. Yin, Pengcheng, et al. "Neural enquirer: Learning to query tables with natural language." arXiv preprint arXiv:1512.00965 (2015). Which city hosted the 2012 Olympic Games?Q(i) = RNN-encoded = p(vector("column:city") | [ vec("which"), vec("city")... ] ) > p(vector("column:city") | [ vec("city"), vec("which")... ] )Executor layer L p(vector("column:year") | [ ... vec("2012"), vec("olympic") ... ] ) > p(vector("column:year") | [ ... vec("olympic"), vec("2012") ... ] )Executor layer L - 1 Executor layer L - 2 p(vector("FROM") | [ ... vec("olympic"), vec("games") ... ] ) > p(vector("FROM") | [ ... vec("games"), vec("olympic") ... ] ) Row m, R = our table Set of columns This allows the system to restrict the reference at each step. Evaluate a candidate: 3.2 Semantic trees
  135. 135. Yin, Pengcheng, et al. "Neural enquirer: Learning to query tables with natural language." arXiv preprint arXiv:1512.00965 (2015). Which city hosted the 2012 Olympic Games?Q(i) = RNN-encoded = p(vector("column:city") | [ vec("which"), vec("city")... ] ) > p(vector("column:city") | [ vec("city"), vec("which")... ] )Executor layer L p(vector("column:year") | [ ... vec("2012"), vec("olympic") ... ] ) > p(vector("column:year") | [ ... vec("olympic"), vec("2012") ... ] )Executor layer L - 1 Executor layer L - 2 p(vector("FROM") | [ ... vec("olympic"), vec("games") ... ] ) > p(vector("FROM") | [ ... vec("games"), vec("olympic") ... ] ) Row m, R = our table Set of columns As we weight the vector in the direction of each successive operation, it gets closer to any applicable answer in our DB. Evaluate a candidate: 3.2 Semantic trees
  136. 136. Yin, Pengcheng, et al. "Neural enquirer: Learning to query tables with natural language." arXiv preprint arXiv:1512.00965 (2015). SbS (step-by-step training) is N2N (end-to-end training) with bias. SEMPRE is a toolkit for training semantic parsers, which map natural language utterances to denotations (answers) via intermediate logical forms. Here's an example for querying databases. https://nlp.stanford.edu/software/sempre/ 3.2 Semantic trees
  137. 137. Yin, Pengcheng, et al. "Neural enquirer: Learning to query tables with natural language." arXiv preprint arXiv:1512.00965 (2015). SbS (step-by-step training) is N2N (end-to-end training) with bias. SEMPRE is a toolkit for training semantic parsers, which map natural language utterances to denotations (answers) via intermediate logical forms. Here's an example for querying databases. https://nlp.stanford.edu/software/sempre/ 😳 3.2 Semantic trees
  138. 138. 3.3 Re-entry Task: given a set of facts or a story, answer questions on those facts in a logically consistent way. Problem: Theoretically, it could be achieved by a language modeler such as a recurrent neural network (RNN). However, their memory (encoded by hidden states and weights) is typically too small, and is not compartmentalized enough to accurately remember facts from the past (knowledge is compressed into dense vectors). RNNs are known to have difficulty in memorization tasks, i.e., outputting the same input sequence they have just read. Weston, Jason, Sumit Chopra, and Antoine Bordes. "Memory networks." arXiv preprint arXiv:1410.3916 (2014).
  139. 139. Task: given a set of facts or a story, answer questions on those facts in a logically consistent way. Weston, Jason, Sumit Chopra, and Antoine Bordes. "Memory networks." arXiv preprint arXiv:1410.3916 (2014). Jason went to the kitchen. Jason picked up the milk. Jason travelled to the office. Jason left the milk there. Jason went to the bathroom. jason go kitchen jason get milk jason go office jason drop milk jason go bathroom Where is the milk? milk? 3.3 Re-entry
  140. 140. F = set of facts (e.g. recent context of conversation; sentences) M = memory for fact f in F: # any linguistic pre-processing may apply store f in the next available memory slot of M Memory networks Weston, Jason, Sumit Chopra, and Antoine Bordes. "Memory networks." arXiv preprint arXiv:1410.3916 (2014). At the reading stage... 3.3 Re-entry
  141. 141. F = set of facts (e.g. recent context of conversation; sentences) M = memory O = inference module (oracle) x = input q = query about our known facts k = number of candidate facts to be retrieved m = a supporting fact (typically, the one that maximizes support) Memory networks Weston, Jason, Sumit Chopra, and Antoine Bordes. "Memory networks." arXiv preprint arXiv:1410.3916 (2014). At the inference stage... 3.3 Re-entry
  142. 142. Memory networks Weston, Jason, Sumit Chopra, and Antoine Bordes. "Memory networks." arXiv preprint arXiv:1410.3916 (2014). At the inference stage... jason go kitchen jason get milk jason go office jason drop milk jason go bathroom F = set of facts (e.g. recent context of conversation; sentences) M = memory O = inference module (oracle) x = input q = query about our known facts k = number of candidate facts to be retrieved m = a supporting fact (typically, the one that maximizes support) 3.3 Re-entry
  143. 143. Memory networks Weston, Jason, Sumit Chopra, and Antoine Bordes. "Memory networks." arXiv preprint arXiv:1410.3916 (2014). At the inference stage, t1... milk jason go kitchen jason get milk jason go office jason drop milk jason go bathroom Candidate space for k = 2 jason go kitchen jason get milk jason go office jason drop milk jason go bathroom argmax over that space milk, [ ] 3.3 Re-entry
  144. 144. Memory networks Weston, Jason, Sumit Chopra, and Antoine Bordes. "Memory networks." arXiv preprint arXiv:1410.3916 (2014). At the inference stage, t2... milk jason go kitchen jason get milk jason go office jason drop milk jason go bathroom Candidate space for k = 2 jason go kitchen jason get milk jason go office jason drop milk jason go bathroom argmax over that space milk, jason, drop 3.3 Re-entry
  145. 145. Memory networks Weston, Jason, Sumit Chopra, and Antoine Bordes. "Memory networks." arXiv preprint arXiv:1410.3916 (2014). At the inference stage, t2... jason go kitchen jason get milk jason go office jason drop milk jason go bathroom jason go kitchen jason get milk jason go office jason drop milk jason go bathroom Why not? jason, get, milk = does not contribute as much new information as jason go office 3.3 Re-entry
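A toy sketch of this two-hop lookup, with the learned match score replaced by plain word overlap (the real model scores memories in an embedding space trained with a margin ranking loss, as noted on the next slide):

    # Toy 2-hop memory lookup in the spirit of the oracle O above.
    memory = [
        "jason go kitchen",
        "jason get milk",
        "jason go office",
        "jason drop milk",
        "jason go bathroom",
    ]
    query = ["where", "is", "the", "milk"]

    def overlap(terms, fact):
        """Naive stand-in for the learned relevance score."""
        return len(set(terms) & set(fact.split()))

    # Hop 1: the memory that best supports the query (recency breaks ties).
    m1 = max(range(len(memory)), key=lambda i: overlap(query, memory[i]) + 0.01 * i)

    # Hop 2: the memory that best supports the query plus the first fact.
    rest = [i for i in range(len(memory)) if i != m1]
    m2 = max(rest, key=lambda i: overlap(query + memory[m1].split(), memory[i]))

    print(memory[m1], "|", memory[m2])
    # With this naive scorer: "jason drop milk | jason get milk"; a trained
    # scorer would instead favour "jason go office", which supplies the
    # location information needed to answer the question.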
  146. 146. Memory networks Weston, Jason, Sumit Chopra, and Antoine Bordes. "Memory networks." arXiv preprint arXiv:1410.3916 (2014). Training is performed with a margin ranking loss and stochastic gradient descent (SGD). Simple model. Easy to integrate with any pipeline already working with structured facts (e.g. Neural Inquirer). Adds a temporal dimension to it. Answering the question about the location of the milk requires comprehension of the actions picked up and left. 3.3 Re-entry
  147. 147. 3.3 Re-entry
     Model | QACNN | CBT-NE | CBT-CN | Date
     Memory networks: Weston, Jason, Sumit Chopra, and Antoine Bordes. "Memory networks." arXiv preprint arXiv:1410.3916 (2014). | 69.4 | 66.6 | 63.0 |
     Kadlec, Rudolf, et al. "Text understanding with the attention sum reader network." arXiv preprint arXiv:1603.01547 (2016). | 75.4 | 71.0 | 68.9 | 4 Mar 16
     Sordoni, Alessandro, et al. "Iterative alternating neural attention for machine reading." arXiv preprint arXiv:1606.02245 (2016). | 76.1 | 72.0 | 71.0 | 7 Jun 16
     Trischler, Adam, et al. "Natural language comprehension with the epireader." arXiv preprint arXiv:1606.02270 (2016). | 74.0 | 71.8 | 70.6 | 7 Jun 16
     Dhingra, Bhuwan, et al. "Gated-attention readers for text comprehension." arXiv preprint arXiv:1606.01549 (2016). | 77.4 | 71.9 | 69.0 | 5 Jun 16
  148. 148. Results on MovieQA: response accuracy (%) by knowledge source (KB, IE, Wikipedia). Key-Value Memory Networks: KB 93.9, IE 68.3, Wikipedia 76.2. Memory Networks: KB 78.5, IE 63.4, Wikipedia 69.9. Reference baselines on the chart: No Knowledge (embeddings) and a Standard QA System on the KB (54.4%, 93.5%, 17% shown as reference values). Source: Antoine Bordes, Facebook AI Research, LXMLS Lisbon July 28, 2016
  149. 149. Conclusions
  150. 150. Summary of desiderata 3. Semantic parsing 1. Taxonomy-driven supersense inference 2. Semantically-informed syntactic trees 3. Ability to handle decision-tree re-entries Extend the neuralized architecture's domain by integrating support for supersenses. Enable inference at training and test time. Enable semantic parsing using a neuralized architecture with respect to some knowledge base serving as ground truth. Use memory networks to effectively keep track of the entities throughout the conversation and ensure logical consistency. Extend memory networks by integrating support for supersenses. Enable inference at training and test time.
  151. 151. Summary of desiderata Conversational technology should really be about • dynamically finding the best possible way to browse a large repository of information/actions. • finding the shortest path to any relevant action or piece of information (to avoid the plane dashboard effect). • surfacing implicit data in unstructured content ("bocadillo de calamares in Madrid"). Rather than going open-domain, taking the closed-domain and going deep into it. Remember?
  152. 152. thanks!
