Recent advances
in applied chatbot technology
to be presented at AI Ukraine 2016
Jordi Carrera Ventura

NLP scientist

Telefónica Research & Development
Outline
Overview
Industrial state-of-the-art
Current challenges
Recent advances
Conclusions
Overview
Chatbots are a form of conversational artificial intelligence (AI) in which a user
interacts with a virtual agent through natural language messaging in
• a messaging interface like Slack or Facebook Messenger
• or a voice interface like Amazon Echo or Google Assistant.
Chatbot (sense 1)
A bot that lives in a chat (an automation routine inside a UI).
Conversational agent / Virtual assistant*
A bot that takes natural language as input and returns it as output.
Chatbot (sense 2)
A virtual assistant that lives in a chat.
User> what are you?
* Generally assumed to have broader coverage and more advanced AI than chatbots in sense 2.
Usually intended to
• get quick answers to specific questions over some pre-defined
repository of knowledge
• perform transactions in a faster and more natural way.
Can be used to
• surface contextually relevant information
• help a user complete an online transaction
• serve as a helpdesk agent to resolve a customer’s issue without
ever involving a human.
Virtual assistants for
Customer support, e-commerce, expense management, booking
flights, booking meetings, data science...
Use cases
Advantages of chatbots
From: https://medium.com/@csengerszab/messenger-chatbots-is-it-a-new-revolution-6f550110c940
Actually,
As an interface, chatbots are not the best.

Nothing beats a spreadsheet for most things. For data
already tabulated, a chatbot may actually slow things down.

A well-designed UI can allow a user to do anything in a few
taps, as opposed to a few whole sentences in natural
language.

But chatbots can generally be regarded as universal: every
channel supports text. As a UX, chatbots are effective as a
protocol, like email.
What's really important...
Success conditions

• Talk like a person.

• The "back-end" MUST NOT BE a person.
• Provide a single interface to access all other services.

• Useful in industrial processes

Response automation given specific, highly-repetitive
linguistic inputs. Not conversations, but ~auto-replies
linked to more refined email filters.

• Knowledge representation.
What's really important...
Conversational technology should really be about

• dynamically finding the best possible way to browse a large
repository of information/actions.

• finding the shortest path to any relevant action or piece of
information (to avoid the plane dashboard effect).

• surfacing implicit data in unstructured content ("bocadillo de
calamares in Madrid"). Rather than going open-domain, taking
the closed-domain and going deep into it.
... and the hype
[A series of press screenshots hyping conversational AI.]
That was IBM Watson. 🙄
... and the hype
Getting started with bots is as simple as using any of a handful of
new bot platforms that aim to make bot creation easy...
... sophisticated bots require an understanding of natural language
processing (NLP) and other areas of artificial intelligence.
The optimists
... and the hype
Artificial intelligence has advanced remarkably in the last three
years, and is reaching the point where computers can have
moderately complex conversations with users in natural language
<perform simple actions in response to primitive linguistic
signals>.
The optimists
... and the hype
Most of the value of deep learning today is in narrow domains where you
can get a lot of data. Here’s one example of something it cannot do: have
a meaningful conversation. There are demos, and if you cherry-pick the
conversation, it looks like it’s having a meaningful conversation, but if you
actually try it yourself, it quickly goes off the rails.
Andrew Ng (Chief Scientist, AI, Baidu)
... and the hype
Many companies start off by outsourcing their conversations to human
workers and promise that they can “automate” it once they’ve collected
enough data.
That’s likely to happen only if they are operating in a pretty narrow
domain – like a chat interface to call an Uber for example.
Anything that’s a bit more open domain (like sales emails) is beyond what
we can currently do.
However, we can also use these systems to assist human workers by
proposing and correcting responses. That’s much more feasible.
Andrew Ng (Chief Scientist, AI, Baidu)
Industrial state-of-the-art
Lack of suitable metrics
Industrial state-of-the-art
Liu, Chia-Wei, et al. "How NOT to evaluate your dialogue system: An empirical study of unsupervised evaluation metrics for dialogue response generation." arXiv preprint arXiv:1603.08023 (2016).
By Intento (https://inten.to)
Dataset
SNIPS.ai 2017 NLU Benchmark
https://github.com/snipsco/nlu-benchmark
Example intents
SearchCreativeWork (e.g. Find me the I, Robot television show)
GetWeather (e.g. Is it windy in Boston, MA right now?)
BookRestaurant (e.g. I want to book a highly rated restaurant for me and my boyfriend
tomorrow night)
PlayMusic (e.g. Play the last track from Beyoncé off Spotify)
AddToPlaylist (e.g. Add Diamonds to my roadtrip playlist)
RateBook (e.g. Give 6 stars to Of Mice and Men)
SearchScreeningEvent (e.g. Check the showtimes for Wonder Woman in Paris)
Industrial state-of-the-art
Benchmark
Methodology
English language.
Removed near-duplicates that differ only in whitespace, quotes,
letter case, etc.
Resulting dataset parameters
7 intents, 15.6K utterances (~2K per intent)
3-fold 80/20 cross-validation.
Most providers do not offer programmatic interfaces for adding training
data.
Industrial state-of-the-art
Benchmark
Framework                 Since  F1    False positives  Response time (s)  Price
IBM Watson Conversation   2015   99.7  100%             0.35               2.5
API.ai (Google 2016)      2010   99.6  40%              0.28               Free
Microsoft LUIS            2015   99.2  100%             0.21               0.75
Amazon Lex                2016   96.5  82%              0.43               0.75
Recast.ai                 2016   97.0  75%              2.06               N/A
wit.ai (Facebook)         2013   97.4  72%              0.96               Free
SNIPS                     2017   97.5  26%              0.36               Per device
Industrial state-of-the-art
[Chart: note the logarithmic scale!]
Current challenges
So, let's be real...
Welcome to Macy's blah-blah... !
hi im a guy looking for leda jackets
We've got what you're looking for!
<image product_id=1 item_id=1>
that's a nice one but, i want to see
different jackets .... how can i do that
So, let's be real...
We've got what you're looking for!
<image product_id=1 item_id=2>
where can i find a pair of shoes that
match this yak etc.
We've got what you're looking for!
<image product_id=1 item_id=3>
pair of shoes
(, you talking dishwasher)
So, let's be real...
Welcome to Macy's blah-blah... !
hi im a guy looking for leda jackets
lexical variants:
synonyms, paraphrases
Intent: this can be normalized, e.g. with regular expressions.
Entity: this can also be normalized, e.g. with regular expressions.
A 1-week project on a regular-expression-based filter took a
7-label domain classification task (a RandomForest classifier over
TFIDF-weighted vectors) from 89% to 95% F1 with only a 5% positive rate.
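A minimal sketch of that kind of setup, assuming scikit-learn; the patterns, labels, and training data below are illustrative stand-ins, not the original system.

```python
# Hypothetical sketch: a regex-based normalization filter in front of a
# TFIDF + RandomForest domain classifier.
import re

from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline

# Invented normalization rules; the real filter was a week of curated patterns.
PATTERNS = [
    (re.compile(r"\bwi[- ]?fi\b", re.I), "wifi"),
    (re.compile(r"\bdoesn'?t\b", re.I), "does not"),
]

def normalize(utterance: str) -> str:
    for pattern, replacement in PATTERNS:
        utterance = pattern.sub(replacement, utterance)
    return utterance.lower()

# Toy stand-in for the 7-label training set.
texts = [
    "my wifi does not work", "my router keeps rebooting",
    "i want to upgrade my plan", "how much is the premium bundle",
    "my bill looks wrong", "i was charged twice",
]
labels = ["tech", "tech", "sales", "sales", "billing", "billing"]

clf = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2)),          # TFIDF-weighted vectors
    RandomForestClassifier(n_estimators=100, random_state=0),
)
clf.fit([normalize(t) for t in texts], labels)
print(clf.predict([normalize("my wi-fi doesn't work")]))  # -> ['tech']
```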
So, let's be real...
lexical variants:
synonyms, paraphrases
Intent
My internet connection does not work.  →  ConnectionDebugWizard(*)
My smartphone does not work.           →  TechnicalSupport(Mobile)
My landline does not work.             →  TechnicalSupport(Landline)
My router does not work.               →  RouterDebugWizard()
My SIM does not work.                  →  VerifyDeviceSettings
My payment method does not work.       →  AccountVerificationProcess
My saved movies do not work.           →  TVDebugWizard

Since we associate each intent with an action, the ability to
discriminate actions presupposes training as many separate
intents, with just as many ambiguities.

It also presupposes exponentially increasing amounts of training
data as we add higher-precision entities to our model
(every entity * every intent that applies).

The entity resolution step (which would have helped us make the
right decision) is nested downstream in our pipeline. Any errors
from the previous stage will propagate down.
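A hypothetical two-stage sketch of that failure mode (all names and rules here are invented): the entity resolver only ever sees the branch the intent classifier chose, so an upstream mistake cannot be undone downstream.

```python
# Two-stage pipeline: intent classification upstream, entity resolution
# nested downstream. An upstream error silently propagates.

def classify_intent(utterance: str) -> str:
    # Stand-in classifier: keyword heuristics instead of a trained model.
    if "internet" in utterance or "connection" in utterance:
        return "ConnectionDebugWizard"
    return "TechnicalSupport"

def resolve_entities(utterance: str, intent: str) -> dict:
    # Nested downstream: only entity types registered for `intent` are tried.
    entity_types = {
        "ConnectionDebugWizard": ["router", "modem"],
        "TechnicalSupport": ["smartphone", "landline", "SIM"],
    }[intent]
    return {e: (e in utterance) for e in entity_types}

utterance = "my SIM does not work when i use mobile internet"
intent = classify_intent(utterance)        # -> "ConnectionDebugWizard" (wrong)
print(resolve_entities(utterance, intent)) # "SIM" is never even considered
```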
So, let's be real...
Welcome to Macy's blah-blah... !
hi im a guy looking for leda jackets
We've got what you're looking for!
<image product_id=1 item_id=1>
that's a nice one but, I want to see
different jackets .... how can i do that
hey i need to buy a men's red leather
jacket with straps for my friend
lexical variants: synonyms, paraphrases
formal variants (typos, ASR errors)
multiword chunking/parsing
morphological variants
entity recognition
So, let's be real...
Our training dataset:
buy <red jacket>
where can i find a <jacket with straps>
looking for a <black mens jacket>
do you have anything in <leather>
<present> for a friend
Predictions at runtime:
where can i buy <a men's red leather jacket with straps> for my friend
hi i need <a men's red leather jacket with straps> that my friend can wear
Incomplete training data will cause entity segmentation issues.
So, let's be real...
But more importantly, in the intent-entity paradigm we are usually
unable to train two entity types with the same distribution:
where can i buy a <red leather jacket>  →  jacket 0.5, wallet 0.5
where can i buy a <red leather wallet>  →  jacket 0.5, wallet 0.5
So, let's be real...
... and have no way of detecting entities used in isolation:
pair of shoes
MakePurchase? ReturnPurchase? GetWeather? Greeting? Goodbye?
(the classifier only sees features like ['START', 3-gram, 'END'])
So, let's be real...
hey i need to buy a men's red leather
jacket with straps for my friend
searchItemByText("leather jacket")
searchItemByText(
"to buy a men's ... pfff TLDR... my friend"
)
[
{ "name": "leather jacket",
"product_category_id": 12,
"gender_id": 0,
"materials_id": [68, 34, ...],
"features_id": [],
},
...
{ "name": "leather jacket with straps",
"product_category_id": 12,
"gender_id": 1,
"materials_id": [68, 34, ...],
"features_id": [8754],
},
]
So, let's be real...
With proper segmentation we would instead call:
searchItemByText(
"red leather jacket with straps"
)
But even then:
NO IDs,
NO reasonable expectation of a
cognitively realistic answer and
NO "conversational technology".
So, let's be real...
hey i need to buy a men's red leather
jacket with straps for my friend
Straps
Jackets
Leather items
Since the conditions express
a logical AND and we will
search for an item fulfilling
all of them, we could
actually search for the terms
of the conjunction in any
order.
So, let's be real...
hey i need to buy a men's red leather
jacket with straps for my friend
Given

• any known word, denoted by w,

• the vocabulary V of all known words, such that w_i ∈ V for any i,

• a search space D, where d denotes a document in the collection D,
such that d ∈ D and d = {w_i, w_{i+1}, ..., w_n}, with w_x ∈ V for all i ≤ x ≤ n,
• a function filter(D, w_x) that returns the subset D_wx ⊆ D such that
w_x ∈ d for every d ∈ D_wx, and
• the query q = {"men's", 'red', 'leather', 'jacket', 'straps'},
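A minimal runnable sketch of the definition above (doc_filter is a hypothetical name, documents are plain strings): nested filters implement the conjunction, and the application order does not change the result.

```python
# filter(D, w) keeps the documents containing w; a conjunctive query is
# just nested calls, in any order.
from functools import reduce

def doc_filter(D, w):
    """Return the subset of D whose documents contain the word w."""
    return [d for d in D if w in d]

D = [
    "men's red leather jacket with straps",
    "leather straps for men's watches",
    "red wallet with leather straps",
]
q = ["men's", "red", "leather", "jacket", "straps"]

# Any application order yields the same subset.
assert (reduce(doc_filter, q, D)
        == reduce(doc_filter, reversed(q), D))
print(reduce(doc_filter, q, D))  # only the full match survives
```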
So, let's be real...
hey i need to buy a men's red leather
jacket with straps for my friend
filter(filter(filter(filter(filter(D, 'leather'), 'straps'), "men's"), 'jacket'), 'red') =

filter(filter(filter(filter(filter(D, 'straps'), 'red'), 'jacket'), 'leather'), "men's") =

filter(filter(filter(filter(filter(D, "men's"), 'leather'), 'red'), 'straps'), 'jacket')
So, let's be real...
hey i need to buy a men's red leather
jacket with straps for my friend
Given query q = {"men's", 'red', 'leather', 'jacket', 'straps'},

find the x : x ∈ D that satisfies:

containsSubstring(x, 'leather') ∧

containsSubstring(x, 'straps') ∧

...

containsSubstring(x, "men's")
As a result, it will lack the notion of what counts as a partial match.
We cannot use more complex predicates because, in this scenario,
the system has no understanding of the internal structure of the
phrase, its semantic dependencies, or its syntactic dependencies.
So, let's be real...
hey i need to buy a men's red leather
jacket with straps for my friend
The following partial matches will be

virtually indistinguishable:

• leather straps for men's watches (3/5)

• men's vintage red leather shoes (3/5)

• red wallet with leather straps (3/5)

• red leather jackets with hood (3/5)
So, let's be real...
hey i need to buy a men's red leather
jacket with straps for my friend
filter(filter(filter(filter(filter(D, "men's"), 'red'), 'leather'), 'jacket'), 'straps')

This is our problem: we're using a function
that treats every w_x equally.

We need a function with at least one additional parameter, r_x:

filter(D, w_x, r_x)

where r_x denotes the depth at which the dependency tree headed
by w_x attaches to the node currently being evaluated.
So, let's be real...
Your query
i need to buy a men's red leather jacket with straps for my friend
Tagging
i/FW need/VBP to/TO buy/VB a/DT men/NNS 's/POS red/JJ leather/NN jacket/NN with/IN
straps/NNS for/IN my/PRP$ friend/NN
Parse
(ROOT
(S
(NP (FW i))
(VP (VBP need)
(S
(VP (TO to)
(VP (VB buy)
(NP
(NP (DT a) (NNS men) (POS 's))
(JJ red) (NN leather) (NN jacket))
(PP (IN with)
(NP
(NP (NNS straps))
(PP (IN for)
(NP (PRP$ my) (NN friend)))))))))))
http://nlp.stanford.edu:8080/parser/index.jsp (as of September 19th 2017)
So, let's be real...
Your query
i need to buy a men's red leather jacket with straps for my friend
Tagging
i/FW need/VBP to/TO buy/VB a/DT men/NNS 's/POS red/JJ leather/NN jacket/NN with/IN
straps/NNS for/IN my/PRP$ friend/NN
Parse
(ROOT
(S
(NP (FW i))
(VP (VBP need)
(S
(VP (TO to)
(VP (VB buy)
(NP
(NP (DT a) ((NNS men) (POS 's))
((JJ red) (NN leather)) (NN jacket)))
(PP (IN with)
(NP
(NP (NNS straps))
(PP (IN for)
(NP (PRP$ my) (NN friend)))))))))))
http://nlp.stanford.edu:8080/parser/index.jsp (as of September 19th 2017)
(red, leather)

(leather, jacket)

(jacket, buy)
So, let's be real...
hey i need to buy a men's red leather
jacket with straps for my friend
The output of this function will no longer be a subset of the
search results but a ranked list, which can be naively ranked by

∑_{x=1}^{|t|} (1 / r_x)

although much more advanced scoring functions can be used.
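To make this concrete, here is a toy runnable version of filter(D, w_x, r_x) with the 1/r_x scoring; depth_filter and the catalogue strings are invented stand-ins, consistent with the walk-through that follows.

```python
# Hypothetical depth-aware filter: each matched term adds 1 / r_x to the
# item's score, where r_x is the attachment depth of the subtree headed
# by w_x.

def depth_filter(results, word, depth):
    """Keep (item, score) pairs matching `word`, crediting 1/depth each."""
    if depth is None:                 # term absent from this candidate's tree
        return results                # no filtering, no credit
    return [(item, score + 1.0 / depth)
            for item, score in results
            if word in item]

catalogue = [(d, 0.0) for d in (
    "black leather jacket", "red leather wallet", "red leather jacket")]

results = depth_filter(catalogue, "jacket", 1)   # head noun at depth 1
results = depth_filter(results, "leather", 2)    # modifier at depth 2
results = depth_filter(results, "red", 3)        # modifier of the modifier
print(results)   # only 'red leather jacket' survives: 1 + 1/2 + 1/3 ≈ 1.83
```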
So, let's be real...
hey i need to buy a men's red leather
jacket with straps for my friend
Given:
q: "red leather jacket"
a1: "black leather jacket"
a2: "red leather wallet"
>>> t1 = dependencyTree(a1)
>>> t1.depth("black")
3
>>> t1.depth("jacket")
1

>>> results = filter(D, "jacket", t1.depth("jacket"))
>>> results
[ ( Jacket1, 1.0 ), ( Jacket2, 1.0 ), ..., ( Jacketn, 1.0 ) ]

>>> results = filter(results, "leather", t1.depth("leather"))
>>> results
[ ( Jacket1, 1.5 ), ( Jacket2, 1.5 ), ..., ( Jacketn - x, 1.5 ) ]

>>> results = filter(results, "red", t1.depth("red"))
>>> results
[ ( Jacket1, 1.5 ), ( Jacket2, 1.5 ), ..., ( Jacketn - x, 1.5 ) ]

>>> t2 = dependencyTree(a2)
>>> t2.depth("red")
3
>>> t2.depth("jacket")
None

>>> results = filter(D, "jacket", t2.depth("jacket"))
>>> results
[]

>>> results = filter(results, "leather", t2.depth("leather"))
>>> results
[ ( Wallet1, 0.5 ), ( Wallet2, 0.5 ), ..., ( Strapn, 0.5 ) ]

>>> results = filter(results, "red", t2.depth("red"))
>>> results
[ ( Wallet1, 0.83 ), ( Wallet2, 0.83 ), ..., ( Strapn - x, 0.83 ) ]

Results for a1: "black leather jacket"
[ ( Jacket1, 1.5 ), ( Jacket2, 1.5 ), ..., ( Jacketn - x, 1.5 ) ]

Results for a2: "red leather wallet"
[ ( Wallet1, 0.83 ), ( Wallet2, 0.83 ), ..., ( Strapn - x, 0.83 ) ]
So, let's be real...
hey i need to buy a men's red leather
jacket with straps for my friend
However, the dependency tree is not always available, and it
often contains parsing issues (cf. Stanford parser output).
An evaluation of parser robustness over noisy input reports
performances down to an average of 80% across datasets
and parsers.
Hashemi, Homa B., and Rebecca Hwa. "An Evaluation of Parser Robustness for
Ungrammatical Sentences." EMNLP. 2016.
So, let's be real...
hey i need to buy a men's red leather
jacket with straps for my friend
In order to make this decision:
[ [red leather] jacket ]    2 2 1  👍
[ red [leather jacket] ]    2 1 1  👍
[ men's [leather jacket] ]  2 1 1  👍
[ [men's leather] jacket ]  2 2 1  😱
... the parser already needs as much information as we can
provide regarding head-modifier attachments.
So, let's be real...
hey i need to buy a men's red leather
jacket with straps for my friend
Same pattern, different score. How do we assign probabilities
such that less plausible readings are buried at the bottom of the
hypothesis space (not only for display purposes, but for
dynamic-programming reasons)?

p(red | jacket) ~ p(red | leather)
p(men's | jacket) > p(men's | leather)
So, let's be real...
hey i need to buy a men's red leather
jacket with straps for my friend
That's a lot of data to model. Where will we get it from?

p(Color | ClothingItem) ~ p(Color | Material)
p(Person | ClothingItem) > p(Person | Material)
So, let's be real...
hey i need to buy a men's red leather
jacket with straps for my friend
If our taxonomy is well built, semantic associations
acquired for a significant number of members of each
class will propagate to new, unseen members of that class.
With a sufficiently rich NER pipeline and a sufficiently large
semantic relation database, we can rely on simple inference
for the vast majority of disambiguations.
More importantly, we can weight candidate hypotheses
dynamically in order to make semantic parsing tractable
over many possible syntactic trees.
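A small sketch of what that inference could look like, with an invented taxonomy and made-up counts: attachment statistics are stored at the class level, so unseen words inherit them through their class.

```python
# Back off head-modifier attachment statistics from lexical pairs to
# taxonomy classes; all counts and class names are illustrative.
from collections import Counter

TAXONOMY = {"red": "Color", "black": "Color",
            "leather": "Material", "suede": "Material",
            "jacket": "ClothingItem", "wallet": "ClothingItem",
            "men's": "Person"}

# Class-level attachment counts acquired from parsed data (toy numbers).
attach = Counter({("Color", "ClothingItem"): 90,
                  ("Color", "Material"): 85,
                  ("Material", "ClothingItem"): 95,
                  ("Person", "ClothingItem"): 70,
                  ("Person", "Material"): 5})

def p_attach(modifier: str, head: str) -> float:
    """p(modifier class attaches to head class), normalized per modifier class."""
    m, h = TAXONOMY[modifier], TAXONOMY[head]
    total = sum(c for (m2, _), c in attach.items() if m2 == m)
    return attach[(m, h)] / total if total else 0.0

# Works for "suede", never seen with "wallet", via their classes:
print(p_attach("suede", "wallet"))   # Material modifying ClothingItem: high
print(p_attach("men's", "leather"))  # Person modifying Material: low
```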
So, let's be real...
Welcome to Macy's blah-blah... !
hi im a guy looking for leda jackets
We've got what you're looking for!
<image product_id=1 item_id=1>
that's a nice one but, I want to see
different jackets .... how can i do that
lexical variants:
synonyms, paraphrases formal variants
(typos, ASR errors)
multiword
chunking/parsing
morphological
variants
entity recognition
entity linking
hierarchical structure
So, let's be real...
Welcome to Macy's blah-blah... !
hi im a guy looking for leda jackets
We've got what you're looking for!
<image product_id=1 item_id=1>
that's a nice one but, i want to see
different jackets .... how can i do that
LastResult is a nice jacket but, i want
to see different jackets .... how can i LastCommand
So, let's be real...
that → LastResult
one → jacket
do that → LastCommand
that's a nice one but, i want to see
different jackets .... how can i do that
LastResult is a nice jacket but, i want
to see different jackets .... how can i LastCommand
So, let's be real...
that → LastResult
IF:

p_PoS-tagging(that_DT | * be_VBZ, START *) > p_PoS-tagging(that_IN | * be_VBZ, START *)

that_DT 's_VBZ great_JJ !_PUNCT
that_DT 's_VBZ the_DT last_JJ...
he said that_IN i_PRP...
he said that_IN trump_NNP...

THEN IF:

p_isCoreferent(LastValueReturned | that_DT) > u  # may also need to condition on
                                                 # the dependency: (x, nice)
So, let's be real...
one → jacket
Find the LastResult in the context that maximizes:

p_isCoreferent(
    LastResult.type = x |
    one_CD,                          # trigger for the function, trivial condition
    QualitativeEvaluation(User, x),
    ContextDistance(LastResult.node)
)

Particularly important for decision-tree re-entry.
So, let's be real...
no! forget the last thing i said
go back!
third option in the menu
i want to modify the address I gave you
So, let's be real...
that's a nice one but, i want to see
different jackets .... how can i do that
LastResult is a nice jacket but, i want
to see different jackets .... how can i LastCommand
A single utterance resolves to several candidate intents:
QualitativeEvaluation(User_u, ClothingItem_i)
BrowseCatalogue(User_u, filter=[ClothingItem_{n,j}, j ≤ |J|])
HowTo(User_u, Command_c, c ≤ |Context|)
So, let's be real...
Given candidate intents I = [ i1, i2, i3 ], their lengths
L = [ |i1|, |i2|, |i3| ], and their probabilities
P = [ p(i1 | q), p(i2 | q), p(i3 | q) ], we can pick:

• random.choice(I)
• the x : x ∈ I ∧ length(x) = argmax(L)
• the x : x ∈ I ∧ p(x | q) = argmax(P)

However, this approach will still suffer from
segmentation issues due to passing many
more arguments than expected for resolving
any specific intent.
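A toy illustration of the three selection strategies above; the probabilities are placeholders for a trained model's p(i | q), and every name is hypothetical.

```python
import random

I = ["QualitativeEvaluation", "BrowseCatalogue", "HowTo"]
L = [len(i) for i in I]                    # candidate lengths
P = {"QualitativeEvaluation": 0.2,         # placeholder p(i | q) values
     "BrowseCatalogue": 0.5,
     "HowTo": 0.3}

by_chance = random.choice(I)               # random.choice(I)
by_length = I[L.index(max(L))]             # x : length(x) = argmax(L)
by_probability = max(I, key=P.get)         # x : p(x | q) = argmax(P)
print(by_chance, by_length, by_probability)
```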
So, let's be real...
that's a nice one but, i want to see
different jackets .... how can i do that
LastResult is a nice jacket but, i want
to see different jackets .... how can i LastCommand

GetInfo(User_u, argmax(Command_c, c ≤ |Context|))
SetPreference(User_u, SetRating(ClothingItem_i, k, 1))
∀(ClothingItem_{n,m}, n ≤ |N|, m ≤ |M| | n = i, k = *)

A standard API will normally provide these definitions as the
declaration of its methods.

We can map the argument signatures to the nodes of the taxonomy
we are using for linguistic inference.

The system will then be able to link API calls to entity parses
dynamically.

Over time, new methods added to the API will be automatically
supported by the pre-existing linguistic engine. 😁
So, let's be real...
If the system can reasonably solve nested semantic
dependencies hierarchically,

it can also be extended to handle multi-intent utterances in a
natural way.
So, let's be real...
We need:

• semantic association measures between the leaves of the tree in
order to attach them to the right nodes and be able to stop growing
a tree at the right level,

• a syntactic tree to build the dependencies, either using a parser or a
PCFG/LFG grammar (both may take input from the previous step,
which will provide the notion of verbal valence: Person Browse
Product),

• for tied analyses, an interface with the user to request clarification.
So, let's be real...
i need to pay my bill and i cannot log into my account

Convert to abstract entity types:
i need to PAYMENT my INVOICE and i ISSUE ACCESS ACCOUNT

Activate applicable dependencies/grammar rules
(fewer and less ambiguous thanks to the previous step):
i need to [ PAYMENT my INVOICE ] and i [ ISSUE [ AccessAccount() ] ]

Resolve attachments and rank candidates by semantic association,
keep top-n hypotheses:
i need to [ Payment(x) ] and i [ Issue(y) ]
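A rough, runnable sketch of these steps; the supersense lexicon and the split heuristic are toy stand-ins for the taxonomy and grammar rules.

```python
# Map surface tokens to abstract entity types, then split coordinated
# clauses into separate intents.
SUPERSENSES = {"pay": "PAYMENT", "bill": "INVOICE",
               "cannot": "ISSUE", "log": "ACCESS", "account": "ACCOUNT"}

def abstract(utterance: str) -> list:
    return [SUPERSENSES.get(tok, tok) for tok in utterance.split()]

def split_intents(tokens: list) -> list:
    """Split on 'and' once the current clause contains a supersense."""
    clauses, current = [], []
    for tok in tokens:
        if tok == "and" and any(t.isupper() for t in current):
            clauses.append(current)
            current = []
        else:
            current.append(tok)
    clauses.append(current)
    return clauses

tokens = abstract("i need to pay my bill and i cannot log into my account")
for clause in split_intents(tokens):
    print(clause)
# ['i', 'need', 'to', 'PAYMENT', 'my', 'INVOICE']
# ['i', 'ISSUE', 'ACCESS', 'into', 'my', 'ACCOUNT']
```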
So, let's be real...
AND usually joins

ontologically coordinate elements,

e.g. books and magazines versus
*humans and Donald Trump
i need to [ Payment(x) ] and i [ Issue(y) ]
So, let's be real...
i need to [ Payment(x) ] and i [ Issue(y) ]
Even if

p_0 = p(Payment | Issue) > u,

we will still split the intents, because here we have

p_y = p(Payment | Issue, y),

and normally p_y < p_0 for any irrelevant case (i.e., the
posterior probability of the candidate attachment will not beat
the baseline of its own prior probability).
So, let's be real...
I.e., Payment may have been attached to Issue if no other element
had been attached to it before.

Resolution order, as given by attachment probabilities, matters.
i need to [ Payment(x) ] and i [ Issue(y) ]
So, let's be real...
i need to [ Payment(x) ] i [ Issue(y) ]
i need to [ Payment(x) ] and i [ Issue(y) ]
Multi-intents solved 😎
Summary of desiderata
1. Normalization
1. Ability to handle input indeterminacy by activating several
candidate hypotheses
2. Ability to handle synonyms
3. Ability to handle paraphrases
→ Word embeddings, domain thesaurus, fuzzy matching

2. NER
1. Transversal (cross-intent)
2. No nesting under classification (feedback loop instead)
3. Contextually unbounded
4. Entity-linked
→ Inverted indexes, character-based neural networks for NER

3. Semantic parsing
1. Taxonomy-driven supersense inference
2. Semantically-informed syntactic trees
→ implicitly solved by supersenses + Neural Enquirer
3. Ability to handle decision-tree re-entries
→ implicitly solved by Neural Enquirer (layered/modular)
Recent advances
3.1 Supersense inference
GFT (Google Fine-Grained) taxonomy
Yogatama, Dani, Daniel Gillick, and Nevena Lazic. "Embedding Methods
for Fine Grained Entity Type Classification." ACL (2). 2015.
3.1 Supersense inference
FIGER taxonomy
Yogatama, Dani, Daniel Gillick, and Nevena Lazic. "Embedding Methods
for Fine Grained Entity Type Classification." ACL (2). 2015.
3.1 Supersense inference
T = OurTaxonomy¹
FB = Freebase
owem = OurWordEmbeddingsModel

for every document d in D²:
    for every sentence s in d:
        for every (position, term, tag) tuple in TermExtractor³(s):
            sentence_vector = vectorizeSentence(s, position)
            owem[ T[tag] ].update( sentence_vector )

¹ FB labels are manually mapped to fine-grained labels.
² D = 133,000 news documents.
³ TermExtractor = parser + chunker + entity resolver that assigns Freebase types to entities.

Yogatama, Dani, Daniel Gillick, and Nevena Lazic. "Embedding Methods
for Fine Grained Entity Type Classification." ACL (2). 2015.
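A hypothetical runnable reduction of the loop above; vectorizeSentence is replaced by a toy hashed bag-of-context-words, and owem by a running centroid per tag, which is only one plausible reading of update().

```python
import numpy as np

DIM = 16

def vectorize_sentence(tokens, position):
    """Toy context vector: hashed bag of the words around the mention."""
    v = np.zeros(DIM)
    for i, tok in enumerate(tokens):
        if i != position:
            v[hash(tok) % DIM] += 1.0
    return v

class Centroid:
    """Running mean of all context vectors seen for one taxonomy tag."""
    def __init__(self):
        self.v, self.n = np.zeros(DIM), 0
    def update(self, x):
        self.n += 1
        self.v += (x - self.v) / self.n   # incremental mean

owem = {}  # tag -> running centroid, standing in for OurWordEmbeddingsModel
corpus = [
    (["jordan", "scored", "40", "points", "last", "night"],
     [("jordan", 0, "/person/athlete")]),   # (term, position, tag) mentions
]

for tokens, mentions in corpus:
    for term, position, tag in mentions:
        owem.setdefault(tag, Centroid()).update(
            vectorize_sentence(tokens, position))

print(owem["/person/athlete"].v)
```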
3.1 Supersense inference
Yogatama, Dani, Daniel Gillick, and Nevena Lazic. "Embedding Methods
for Fine Grained Entity Type Classification." ACL (2). 2015.
3.2 Semantic trees
Yin, Pengcheng, et al. "Neural enquirer: Learning to query tables with
natural language." arXiv preprint arXiv:1512.00965 (2015).

q = "Which city hosted the longest game before the game in Beijing?"
KB = knowledge base consisting of tables that store facts
encoder = bi-directional Recurrent Neural Network
v = encoder(q)  # vector of the encoded query
kb_encoder = KB table encoder
f_n = embedding of the name of the n-th column
w_mn = embedding of the value in the n-th column of the m-th row
[x ; y] = vector concatenation
W = weights (what we will e.g. SGD out of the data)
b = bias

normalizeBetween0and1(
    penaltiesOrRewards *  # the weights that minimize the loss for this column
    [ {sydney... australia... kangaroo...} + {city... town... place...} ] +
    somePriorExpectations
)
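As a sketch of how such a column scorer can be wired up; the dimensions, weights, and the additive-attention form below are assumptions for illustration, not the paper's exact equations.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8
v = rng.normal(size=d)                    # encoder(q): the encoded query
f = {"city": rng.normal(size=d),          # f_n: column-name embeddings
     "year": rng.normal(size=d),
     "duration": rng.normal(size=d)}
W = rng.normal(size=(d, 2 * d))           # learned weights
b = rng.normal(size=d)                    # bias
u = rng.normal(size=d)                    # learned scoring vector

def column_attention(v, f, W, b, u):
    """Score each column name against the query and normalize (softmax)."""
    names = list(f)
    logits = np.array([u @ np.tanh(W @ np.concatenate([f[n], v]) + b)
                       for n in names])
    weights = np.exp(logits - logits.max())
    return dict(zip(names, weights / weights.sum()))

print(column_attention(v, f, W, b, u))    # attention over columns, sums to 1
```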
3.2 Semantic trees
Yin, Pengcheng, et al. "Neural enquirer: Learning to query tables with
natural language." arXiv preprint arXiv:1512.00965 (2015).
"olympic games near kangaroos" 👍
Risk of false positives? The match scores already come out as probabilities!
3.2 Semantic trees
Yin, Pengcheng, et al. "Neural enquirer: Learning to query tables with
natural language." arXiv preprint arXiv:1512.00965 (2015).
Q(i) = Which city hosted the 2012 Olympic Games?
T(i) = the table below; y(i) = the answer.

Olympic Games
City            Year  Duration
Barcelona       1992  15
Atlanta         1996  18
Sydney          2000  17
Athens          2004  14
Beijing         2008  16
London          2012  16
Rio de Janeiro  2016  18

RNN-encoded: p(vector("column:city") | [ vector("which"), vector("city"), ... ]) >
             p(vector("column:city") | [ vector("city"), vector("which"), ... ])
3.2 Semantic trees
Yin, Pengcheng, et al. "Neural enquirer: Learning to query tables with
natural language." arXiv preprint arXiv:1512.00965 (2015).
Q(i) = Which city hosted the 2012 Olympic Games?

Executor layer L:      p(vector("column:city") | [ vec("which"), vec("city"), ... ]) >
                       p(vector("column:city") | [ vec("city"), vec("which"), ... ])
Executor layer L - 1:  p(vector("column:year") | [ ... vec("2012"), vec("olympic") ... ]) >
                       p(vector("column:year") | [ ... vec("olympic"), vec("2012") ... ])
Executor layer L - 2:  p(vector("FROM") | [ ... vec("olympic"), vec("games") ... ]) >
                       p(vector("FROM") | [ ... vec("games"), vec("olympic") ... ])

There are as many executor layers as SQL operations.

Each executor applies the same set of weights to all rows (since it
models a single operation).

The logic of the max and min functions is built in as column-wise
operations; the system learns over which ranges they apply (before
and after are just temporal versions of argmax and argmin).

To evaluate a candidate, the model combines the candidate vector at
layer L for row m of our table R (over its set of columns) with a
memory layer holding the last output vector generated by executor L - 1.

As an attention mechanism, it cumulatively weights the vector by the
output weights from executor L - 1. This allows the system to restrict
the reference at each step: as we weight the vector in the direction
of each successive operation, it gets closer to any applicable answer
in our DB.
3.2 Semantic trees
Yin, Pengcheng, et al. "Neural enquirer: Learning to query tables with
natural language." arXiv preprint arXiv:1512.00965 (2015).
SbS (step-by-step training) is N2N (end-to-end training) with bias.
SEMPRE is a toolkit for training semantic parsers, which map
natural language utterances to denotations (answers) via
intermediate logical forms. Here's an example for querying
databases. https://nlp.stanford.edu/software/sempre/
😳
3.3 Re-entry
Task: given a set of facts or a story, answer questions on those
facts in a logically consistent way.

Problem: in theory, this could be achieved by a language
model such as a recurrent neural network (RNN).

However, their memory (encoded by hidden states and weights)
is typically too small, and is not compartmentalized enough to
accurately remember facts from the past (knowledge is
compressed into dense vectors).

RNNs are known to have difficulty in memorization tasks, i.e.,
outputting the same input sequence they have just read.
Weston, Jason, Sumit Chopra, and Antoine Bordes. "Memory
networks." arXiv preprint arXiv:1410.3916 (2014).
Task: given a set of facts or a story, answer questions on those
facts in a logically consistent way.

Weston, Jason, Sumit Chopra, and Antoine Bordes. "Memory
networks." arXiv preprint arXiv:1410.3916 (2014).
Jason went to the kitchen.
Jason picked up the milk.
Jason travelled to the office.
Jason left the milk there.
Jason went to the bathroom.
jason go kitchen
jason get milk
jason go office
jason drop milk
jason go bathroom
Where is the milk? → milk?
3.3 Re-entry
Memory networks
Weston, Jason, Sumit Chopra, and Antoine Bordes. "Memory
networks." arXiv preprint arXiv:1410.3916 (2014).
At the reading stage...

F = set of facts (e.g. recent context of conversation; sentences)
M = memory

for fact f in F:  # any linguistic pre-processing may apply
    store f in the next available memory slot of M
3.3 Re-entry
Memory networks
Weston, Jason, Sumit Chopra, and Antoine Bordes. "Memory
networks." arXiv preprint arXiv:1410.3916 (2014).
At the inference stage...

F = set of facts (e.g. recent context of conversation; sentences)
M = memory
O = inference module (oracle)
x = input
q = query about our known facts
k = number of candidate facts to be retrieved
m = a supporting fact (typically, the one that maximizes support)
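A toy version of the O module's k-hop retrieval; the bag-of-words overlap scorer is an invented stand-in for the learned scoring function, and ties are broken in favor of more recent memories.

```python
M = ["jason go kitchen", "jason get milk", "jason go office",
     "jason drop milk", "jason go bathroom"]

def score(query_words, memory):
    return len(query_words & set(memory.split()))

def infer(q, M, k=2):
    supporting, query_words = [], set(q.split())
    for _ in range(k):
        m = max((c for c in M if c not in supporting),
                key=lambda c: (score(query_words, c), M.index(c)))
        supporting.append(m)
        query_words |= set(m.split())    # condition the next hop on m
    return supporting

print(infer("where is the milk ?", M))
# -> ['jason drop milk', 'jason get milk'] with this toy scorer; a trained
#    scorer instead learns to pick 'jason go office' at the second hop,
#    which is what makes the location answerable.
```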
3.3 Re-entry
Memory networks
Weston, Jason, Sumit Chopra, and Antoine Bordes. "Memory
networks." arXiv preprint arXiv:1410.3916 (2014).
At the inference stage, over the stored facts:
jason go kitchen
jason get milk
jason go office
jason drop milk
jason go bathroom
3.3 Re-entry
Memory networks
Weston, Jason, Sumit Chopra, and Antoine Bordes. "Memory
networks." arXiv preprint arXiv:1410.3916 (2014).
At the inference stage, t1... query features: milk, [ ]
Candidate space for k = 2:
jason go kitchen
jason get milk
jason go office
jason drop milk
jason go bathroom
argmax over that space → jason drop milk
3.3 Re-entry
Memory networks
Weston, Jason, Sumit Chopra, and Antoine Bordes. "Memory
networks." arXiv preprint arXiv:1410.3916 (2014).
At the inference stage, t2... query features: milk, jason, drop
Candidate space for k = 2: the same five facts.
argmax over that space → jason go office
3.3 Re-entry
Memory networks
Weston, Jason, Sumit Chopra, and Antoine Bordes. "Memory
networks." arXiv preprint arXiv:1410.3916 (2014).
Why not "jason get milk"? Because it does not contribute as much
new information as "jason go office".
3.3 Re-entry
Memory networks
Weston, Jason, Sumit Chopra, and Antoine Bordes. "Memory
networks." arXiv preprint arXiv:1410.3916 (2014).
Training is performed with a margin ranking loss and stochastic
gradient descent (SGD). 

Simple model.

Easy to integrate with any pipeline already working with
structured facts (e.g. Neural Enquirer); it adds a temporal
dimension to such a pipeline.

Answering the question about the location of the milk requires
comprehension of the actions picked up and left.
3.3 Re-entry
Memory networks
Weston, Jason, Sumit Chopra, and Antoine Bordes. "Memory
networks." arXiv preprint arXiv:1410.3916 (2014).
Model                                                          QACNN  CBT-NE  CBT-CN  Date
(unattributed)                                                 69.4   66.6    63.0
Kadlec, Rudolf, et al. "Text understanding with the
attention sum reader network." arXiv:1603.01547 (2016).        75.4   71.0    68.9    4 Mar 16
Sordoni, Alessandro, et al. "Iterative alternating neural
attention for machine reading." arXiv:1606.02245 (2016).       76.1   72.0    71.0    7 Jun 16
Trischler, Adam, et al. "Natural language comprehension
with the epireader." arXiv:1606.02270 (2016).                  74.0   71.8    70.6    7 Jun 16
Dhingra, Bhuwan, et al. "Gated-attention readers for text
comprehension." arXiv:1606.01549 (2016).                       77.4   71.9    69.0    5 Jun 16
3.3 Re-entry
Results on MovieQA
[Bar chart: response accuracy (%) of Memory Networks vs Key-Value
Memory Networks over KB, IE, and Wikipedia knowledge sources,
against a no-knowledge (embeddings) baseline and a standard QA
system on the KB.]
Source: Antoine Bordes, Facebook AI Research, LXMLS Lisbon July 28, 2016
Conclusions

Summary of desiderata

1. Taxonomy-driven supersense inference
Extend the neuralized architecture's domain by integrating support
for supersenses. Enable inference at training and test time.

2. Semantically-informed syntactic trees (semantic parsing)
Enable semantic parsing using a neuralized architecture with respect
to some knowledge base serving as ground truth.

3. Ability to handle decision-tree re-entries
Use memory networks to effectively keep track of the entities
throughout the conversation and ensure logical consistency.
Extend memory networks by integrating support for supersenses.
Enable inference at training and test time.
Summary of desiderata

Remember? Conversational technology should really be about

• dynamically finding the best possible way to browse a large
repository of information/actions.

• finding the shortest path to any relevant action or piece of
information (to avoid the plane dashboard effect).

• surfacing implicit data in unstructured content ("bocadillo de
calamares in Madrid"). Rather than going open-domain, taking
the closed-domain and going deep into it.
thanks!

Grammarly AI-NLP Club #2 - Recent advances in applied chatbot technology - Jordi Carrera Ventura

  • 1. Recent advances in applied chatbot technology to be presented at AI Ukraine 2016 Jordi Carrera Ventura NLP scientist Telefónica Research & Development
  • 4. Chatbots are a form of conversational artificial intelligence (AI) in which a user interacts with a virtual agent through natural language messaging in • a messaging interface like Slack or Facebook Messenger • or a voice interface like Amazon Echo or Google Assistant. Chatbot1 A bot that lives in a chat (an automation routine inside a UI). Conversational agent Virtual Assistant1 A bot that takes natural language as input and returns it as output. Chatbot2 A virtual assistant that lives in a chat. User> what are you? 1 Generally assumed to have broader coverage and more advanced AI than chatbots2.
  • 5. Usually intended to • get quick answers to a specific questions over some pre-defined repository of knowledge • perform transactions in a faster and more natural way. Can be used to • surface contextually relevant information • help a user complete an online transaction • serve as a helpdesk agent to resolve a customer’s issue without ever involving a human. Virtual assistants for Customer support, e-commerce, expense management, booking flights, booking meetings, data science... Use cases
  • 6. Advantages of chatbots From: https://medium.com/@csengerszab/messenger-chatbots-is-it-a-new- revolution-6f550110c940
  • 7. Actually, As an interface, chatbots are not the best. Nothing beats a spreadsheet for most things. For data already tabulated, a chatbot may actually slow things down. A well-designed UI can allow a user to do anything in a few taps, as opposed to a few whole sentences in natural language. But chatbots can generally be regarded as universal -every channel supports text. Chatbots as a UX are effective as a protocol, like email.
  • 8. What's really important... Success conditions • Talk like a person. • The "back-end" MUST NOT BE a person. • Provide a single interface to access all other services. • Useful in industrial processes Response automation given specific, highly-repetitive linguistic inputs. Not conversations, but ~auto-replies linked to more refined email filters. • Knowledge representation.
  • 9. What's really important... Conversational technology should really be about • dynamically finding the best possible way to browse a large repository of information/actions. • find the shortest path to any relevant action or piece of information (to avoid the plane dashboard effect). • surfacing implicit data in unstructured content ("bocadillo de calamares in Madrid"). Rather than going open-domain, taking the closed-domain and going deep into it.
  • 10. What's really important... Conversational technology should really be about • dynamically finding the best possible way to browse a large repository of information/actions. • find the shortest path to any relevant action or piece of information (to avoid the plane dashboard effect). • surfacing implicit data in unstructured content ("bocadillo de calamares in Madrid"). Rather than going open-domain, taking the closed-domain and going deep into it.
  • 11. ... and the hype
  • 12. ... and the hype
  • 13. ... and the hype
  • 14. ... and the hype
  • 15. ... and the hype
  • 16. ... and the hype
  • 17. ... and the hype
  • 18. ... and the hype
  • 19. ... and the hype
  • 20. ... and the hype That was IBM Watson. 🙄
  • 21. ... and the hype Getting started with bots is as simple as using any of a handful of new bot platforms that aim to make bot creation easy... ... sophisticated bots require an understanding of natural language processing (NLP) and other areas of artificial intelligence. The optimists
  • 22. ... and the hype Artificial intelligence has advanced remarkably in the last three years, and is reaching the point where computers can have moderately complex conversations with users in natural language <perform simple actions as response to primitive linguistic signals>. The optimists
  • 23. ... and the hype Most of the value of deep learning today is in narrow domains where you can get a lot of data. Here’s one example of something it cannot do: have a meaningful conversation. There are demos, and if you cherry-pick the conversation, it looks like it’s having a meaningful conversation, but if you actually try it yourself, it quickly goes off the rails. Andrew Ng (Chief Scientist, AI, Baidu)
  • 24. ... and the hype Many companies start off by outsourcing their conversations to human workers and promise that they can “automate” it once they’ve collected enough data. That’s likely to happen only if they are operating in a pretty narrow domain – like a chat interface to call an Uber for example. Anything that’s a bit more open domain (like sales emails) is beyond what we can currently do. However, we can also use these systems to assist human workers by proposing and correcting responses. That’s much more feasible. Andrew Ng (Chief Scientist, AI, Baidu)
  • 26. Lack of suitable metrics Industrial state-of-the-art Liu, Chia-Wei, et al. "How NOT to evaluate your dialogue system: An empirical study of unsupervised evaluation metrics for dialogue response generation." arXiv preprint arXiv: 1603.08023 (2016).
  • 27. By Intento (https://inten.to) Dataset SNIPS.ai 2017 NLU Benchmark https://github.com/snipsco/nlu-benchmark Example intents SearchCreativeWork (e.g. Find me the I, Robot television show) GetWeather (e.g. Is it windy in Boston, MA right now?) BookRestaurant (e.g. I want to book a highly rated restaurant for me and my boyfriend tomorrow night) PlayMusic (e.g. Play the last track from Beyoncé off Spotify) AddToPlaylist (e.g. Add Diamonds to my roadtrip playlist) RateBook (e.g. Give 6 stars to Of Mice and Men) SearchScreeningEvent (e.g. Check the showtimes for Wonder Woman in Paris) Industrial state-of-the-art Benchmark
  • 28. Methodology English language. Removed duplicates that differ by number of whitespaces, quotes, lettercase, etc. Resulting dataset parameters 7 intents, 15.6K utterances (~2K per intent) 3-fold 80/20 cross-validation. Most providers do not offer programmatic interfaces for adding training data. Industrial state-of-the-art Benchmark
  • 29. Framework Since F1 False positives Response time Price IBM Watson Conversation 2015 99.7 100% 0.35 2.5 API.ai (Google 2016) 2010 99.6 40% 0.28 Free Microsoft LUIS 2015 99.2 100% 0.21 0.75 Amazon Lex 2016 96.5 82% 0.43 0.75 Recast.ai 2016 97 75% 2.06 N/A wit.ai (Facebook) 2013 97.4 72% 0.96 Free SNIPS 2017 97.5 26% 0.36 Per device Industrial state-of-the-art
  • 32. So, let's be real... Welcome to Macy's blah-blah... ! hi im a guy looking for leda jackets We've got what you're looking for! <image product_id=1 item_id=1> that's a nice one but, i want to see different jackets .... how can i do that
  • 33. So, let's be real... We've got what you're looking for! <image product_id=1 item_id=2> where can i find a pair of shoes that match this yak etc. We've got what you're looking for! <image product_id=1 item_id=3> pair of shoes (, you talking dishwasher)
  • 34. So, let's be real... Welcome to Macy's blah-blah... ! hi im a guy looking for leda jackets lexical variants: synonyms, paraphrases
  • 35. So, let's be real... lexical variants: synonyms, paraphrases Intent
  • 36. So, let's be real... lexical variants: synonyms, paraphrases Intent Entity
  • 37. So, let's be real... lexical variants: synonyms, paraphrases Intent This can be normalized, e.g. regular expressions Entity
  • 38. So, let's be real... lexical variants: synonyms, paraphrases Intent This can also be normalized, e.g. regular expressions Entity
  • 39. So, let's be real... lexical variants: synonyms, paraphrases Intent This can also be normalized, e.g. regular expressions Entity 1-week project on a regular-expression-based filter. From 89% to 95% F1 with only a 5% positive rate. 7-label domain classification task using a RandomForest classifier over TFIDF-weighted vectors.
  • 40. So, let's be real... lexical variants: synonyms, paraphrases Intent My internet connection does not work. My smartphone does not work. My landline does not work. My router does not work. My SIM does not work. My payment method does not work. My saved movies does not work.
  • 41. So, let's be real... lexical variants: synonyms, paraphrases Intent My internet connection does not work. My smartphone does not work. My landline does not work. My router does not work. My SIM does not work. My payment method does not work. My saved movies does not work. ConnectionDebugWizard(*) TechnicalSupport(Mobile) TechnicalSupport(Landline) RouterDebugWizard() VerifyDeviceSettings AccountVerificationProcess TVDebugWizard
  • 42. So, let's be real... lexical variants: synonyms, paraphrases Intent My internet connection does not work. My smartphone does not work. My landline does not work. My router does not work. My SIM does not work. My payment method does not work. My saved movies does not work. ConnectionDebugWizard(*) TechnicalSupport(Mobile) TechnicalSupport(Landline) RouterDebugWizard() VerifyDeviceSettings AccountVerificationProcess TVDebugWizard Since we associate each intent to an action, the ability to discriminate actions presupposes training as many separate intents, with just as many ambiguities.
  • 43. So, let's be real... lexical variants: synonyms, paraphrases Intent My internet connection does not work. My smartphone does not work. My landline does not work. My router does not work. My SIM does not work. My payment method does not work. My saved movies does not work. ConnectionDebugWizard(*) TechnicalSupport(Mobile) TechnicalSupport(Landline) RouterDebugWizard() VerifyDeviceSettings AccountVerificationProcess TVDebugWizard It also presupposes exponentially increasing amounts of training data as we add higher-precision entities to our model (every entity * every intent that applies)
  • 44. So, let's be real... lexical variants: synonyms, paraphrases Intent My internet connection does not work. My smartphone does not work. My landline does not work. My router does not work. My SIM does not work. My payment method does not work. My saved movies does not work. ConnectionDebugWizard(*) TechnicalSupport(Mobile) TechnicalSupport(Landline) RouterDebugWizard() VerifyDeviceSettings AccountVerificationProcess TVDebugWizard The entity resolution step (which would have helped us make the right decision) is nested downstream in our pipeline. Any errors from the previous stage will propagate down.
  • 45. So, let's be real... Welcome to Macy's blah-blah... ! hi im a guy looking for leda jackets We've got what you're looking for! <image product_id=1 item_id=1> that's a nice one but, I want to see different jackets .... how can i do that lexical variants: synonyms, paraphrases formal variants (typos, ASR errors) multiword chunking/parsing morphological variants
  • 46. So, let's be real... Welcome to Macy's blah-blah... ! hi im a guy looking for leda jackets We've got what you're looking for! <image product_id=1 item_id=1> lexical variants: synonyms, paraphrases formal variants (typos, ASR errors) multiword chunking/parsing morphological variants entity recognition hey i need to buy a men's red leather jacket with straps for my friend that's a nice one but, I want to see different jackets .... how can i do that
  • 47. So, let's be real... where can i find a <jacket with straps> looking for a <black mens jacket> do you have anything in <leather> <present> for a friend }Our training dataset buy <red jacket>
  • 48. So, let's be real... where can i buy <a men's red leather jacket with straps> for my friend where can i find a <jacket with straps> looking for a <black mens jacket> do you have anything in <leather> <present> for a friend hi i need <a men's red leather jacket with straps> that my friend can wear }Our training dataset }Predictions at runtime buy <red jacket>
  • 49. So, let's be real... where can i buy <a men's red leather jacket with straps> for my friend where can i find a <jacket with straps> looking for a <black mens jacket> do you have anything in <leather> <present> for a friend hi i need <a men's red leather jacket with straps> that my friend can wear }Our training dataset }Predictions at runtime Incomplete training data will cause entity segmentation issues buy <red jacket>
  • 50. So, let's be real... buy <red jacket> where can i find a <jacket with straps> looking for a <black mens jacket> do you have anything in <leather> <present> for a friend }Our training dataset But more importantly, in the intent-entity paradigm we are usually unable to train two entity types with the same distribution: where can i buy a <red leather jacket> where can i buy a <red leather wallet> jacket, 0.5 wallet, 0.5 jacket, 0.5 wallet, 0.5
  • 51. So, let's be real... buy <red jacket> where can i find a <jacket with straps> looking for a <black mens jacket> do you have anything in <leather> <present> for a friend }Our training dataset ... and have no way of detecting entities used in isolation: pair of shoes MakePurchase ReturnPurchase ['START', 3-gram, 'END'] GetWeather Greeting Goodbye
  • 52. that's a nice one but, I want to see different jackets .... how can i do that So, let's be real... Welcome to Macy's blah-blah... ! hi im a guy looking for leda jackets We've got what you're looking for! <image product_id=1 item_id=1> lexical variants: synonyms, paraphrases formal variants (typos, ASR errors) multiword chunking/parsing morphological variants entity recognition hey i need to buy a men's red leather jacket with straps for my friend searchItemByText("leather jacket") searchItemByText( "to buy a men's ... pfff TLDR... my friend" ) [ { "name": "leather jacket", "product_category_id": 12, "gender_id": 0, "materials_id": [68, 34, ...], "features_id": [], }, ... { "name": "leather jacket with straps", "product_category_id": 12, "gender_id": 1, "materials_id": [68, 34, ...], "features_id": [8754], }, ]
  • 53. that's a nice one but, I want to see different jackets .... how can i do that So, let's be real... Welcome to Macy's blah-blah... ! hi im a guy looking for leda jackets We've got what you're looking for! <image product_id=1 item_id=1> lexical variants: synonyms, paraphrases formal variants (typos, ASR errors) multiword chunking/parsing morphological variants entity recognition hey i need to buy a men's red leather jacket with straps for my friend searchItemByText("leather jacket") searchItemByText( "red leather jacket with straps" ) [ { "name": "leather jacket", "product_category_id": 12, "gender_id": 0, "materials_id": [68, 34, ...], "features_id": [], }, ... { "name": "leather jacket with straps", "product_category_id": 12, "gender_id": 1, "materials_id": [68, 34, ...], "features_id": [8754], }, ] NO IDs, NO reasonable expectation of a cognitively realistic answer and NO "conversational technology".
  • 54. So, let's be real... hey i need to buy a men's red leather jacket with straps for my friend Straps Jackets Leather items Since the conditions express an AND logical operator and we will search for an item fulfilling all of them, we could actually search for the terms in the conjunction in any order.
  • 55. So, let's be real... hey i need to buy a men's red leather jacket with straps for my friend Given • any known word, denoted by w, • the vocabulary V of all known words, such that wi ∈ V for any i, • search space D, where d denotes every document in a collection D such that d ∈ D and d = {wi, wi + 1, ..., wn}, such that wi ≤ x ≤ n ∈ V, • a function filter(D, wx) that returns a subset of D, Dwx where wx ∈ d is true for every d: d ∈ Dwx, and • query q = {"men's", 'red', 'leather', 'jacket', 'straps'},
  • 56. So, let's be real... hey i need to buy a men's red leather jacket with straps for my friend do filter(filter(filter(filter( filter(D, 'leather'), 'straps'), "men's"), 'jacket'),'red' ) = do filter(filter(filter(filter( filter(D, 'straps'), 'red'), 'jacket'), 'leather'),"men's" ) = do filter(filter(filter(filter( filter(D, "men's"), 'leather'), 'red'), 'straps'),'jacket' ) =
  • 57. So, let's be real... hey i need to buy a men's red leather jacket with straps for my friend Given query q = {"men's", 'red', 'leather', 'jacket', 'straps'}, find the x: x ∈ q that satisfies: containsSubstring(x, 'leather') ^ containsSubstring(x, 'straps') ^ ... containsSubstring(x, "men's") As a result, it will lack the notion of what counts as a partial match. We cannot use more complex predicates because, in this scenario, the system has no understanding of the internal structure of the phrase, its semantic dependencies, or its syntactic dependencies.
  • 58. So, let's be real... hey i need to buy a men's red leather jacket with straps for my friend The following partial matches will be virtually indistinguishable: • leather straps for men's watches (3/5) • men's vintage red leather shoes (3/5) • red wallet with leather straps (3/5) • red leather jackets with hood (3/5)
  • 59. So, let's be real... hey i need to buy a men's red leather jacket with straps for my friend do filter(filter(filter(filter( filter(D, "men's"), 'red'), 'leather'), 'jacket'),'straps' ) Given • any known word, denoted by w, • the vocabulary V of all known words, such that wi ∈ V for any i, • search space D, where d denotes every document in a collection D such that d ∈ D and d = {wi, wi + 1, ..., wn}, such that wi ≤ x ≤ n ∈ V, • a function filter(D, wx) that returns a subset of D, Dwx where wx ∈ d is true for every d: d ∈ Dwx, and • query q = {"men's", 'red', 'leather', 'jacket', 'straps'}, This is our problem: we're using a function that treats equally every wx.
  • 60. So, let's be real... hey i need to buy a men's red leather jacket with straps for my friend do filter(filter(filter(filter( filter(D, "men's"), 'red'), 'leather'), 'jacket'),'straps' ) Given • any known word, denoted by w, • the vocabulary V of all known words, such that wi ∈ V for any i, • search space D, where d denotes every document in a collection D such that d ∈ D and d = {wi, wi + 1, ..., wn}, such that wi ≤ x ≤ n ∈ V, • a function filter(D, wx) that returns a subset of D, Dwx where wx ∈ d is true for every d: d ∈ Dwx, and • query q = {"men's", 'red', 'leather', 'jacket', 'straps'}, We need a function with at least one additional parameter, rx filter(D, wx, rx) where rx denotes the depth at which the dependency tree headed by wx attaches to the node being currently evaluated.
  • 61. So, let's be real... Your query i need to buy a men's red leather jacket with straps for my friend Tagging i/FW need/VBP to/TO buy/VB a/DT men/NNS 's/POS red/JJ leather/NN jacket/NN with/IN straps/NNS for/IN my/PRP$ friend/NN Parse (ROOT (S (NP (FW i)) (VP (VBP need) (S (VP (TO to) (VP (VB buy) (NP (NP (DT a) (NNS men) (POS 's)) (JJ red) (NN leather) (NN jacket)) (PP (IN with) (NP (NP (NNS straps)) (PP (IN for) (NP (PRP$ my) (NN friend))))))))))) http://nlp.stanford.edu:8080/parser/index.jsp (as of September 19th 2017)
  • 62. So, let's be real... Your query i need to buy a men's red leather jacket with straps for my friend Tagging i/FW need/VBP to/TO buy/VB a/DT men/NNS 's/POS red/JJ leather/NN jacket/NN with/IN straps/NNS for/IN my/PRP$ friend/NN Parse (ROOT (S (NP (FW i)) (VP (VBP need) (S (VP (TO to) (VP (VB buy) (NP (NP (DT a) ((NNS men) (POS 's)) ((JJ red) (NN leather)) (NN jacket))) (PP (IN with) (NP (NP (NNS straps)) (PP (IN for) (NP (PRP$ my) (NN friend))))))))))) http://nlp.stanford.edu:8080/parser/index.jsp (as of September 19th 2017)
  • 63. So, let's be real... Your query i need to buy a men's red leather jacket with straps for my friend Tagging i/FW need/VBP to/TO buy/VB a/DT men/NNS 's/POS red/JJ leather/NN jacket/NN with/IN straps/NNS for/IN my/PRP$ friend/NN Parse (ROOT (S (NP (FW i)) (VP (VBP need) (S (VP (TO to) (VP (VB buy) (NP (NP (DT a) ((NNS men) (POS 's)) ((JJ red) (NN leather)) (NN jacket))) (PP (IN with) (NP (NP (NNS straps)) (PP (IN for) (NP (PRP$ my) (NN friend))))))))))) http://nlp.stanford.edu:8080/parser/index.jsp (as of September 19th 2017) (red, leather) (leather, jacket) (jacket, buy)
  • 64. So, let's be real... hey i need to buy a men's red leather jacket with straps for my friend do filter(filter(filter(filter( filter(D, "men's"), 'red'), 'leather'), 'jacket'),'straps' ) Given • any known word, denoted by w, • the vocabulary V of all known words, such that wi ∈ V for any i, • search space D, where d denotes every document in a collection D such that d ∈ D and d = {wi, wi + 1, ..., wn}, such that wi ≤ x ≤ n ∈ V, • a function filter(D, wx) that returns a subset of D, Dwx where wx ∈ d is true for every d: d ∈ Dwx, and • query q = {"men's", 'red', 'leather', 'jacket', 'straps'}, The output of this function will no longer be a subset of search results but a ranked list. This list can be naively ranked by ∑x |t| (1 / rx) although much more advanced scoring functions can be used.
  • 65. So, let's be real... hey i need to buy a men's red leather jacket with straps for my friend Given: q: "red leather jacket" a1: "black leather jacket" a2: "red leather wallet"
  • 66. So, let's be real... hey i need to buy a men's red leather jacket with straps for my friend >>> t1 = dependencyTree(a1) >>> t1.depth("black") 3 >>> t1.depth("jacket") 1 Given: q: "red leather jacket" a1: "black leather jacket" a2: "red leather wallet"
  • 67. So, let's be real... hey i need to buy a men's red leather jacket with straps for my friend >>> results = filter(D, "jacket", t1.depth("jacket")) >>> results [ ( Jacket1, 1.0 ), ( Jacket2, 1.0 ), ..., ( Jacketn, 1.0 ) ] Given: q: "red leather jacket" a1: "black leather jacket" a2: "red leather wallet"
  • 68. So, let's be real... hey i need to buy a men's red leather jacket with straps for my friend >>> results = filter(results, "leather", t1.depth("leather")) >>> results [ ( Jacket1, 1.5 ), ( Jacket2, 1.5 ), ..., ( Jacketn - x, 1.5 ) ] Given: q: "red leather jacket" a1: "black leather jacket" a2: "red leather wallet"
  • 69. So, let's be real... hey i need to buy a men's red leather jacket with straps for my friend Given: q: "red leather jacket" a1: "black leather jacket" a2: "red leather wallet" >>> results = filter(results, "red", t1.depth("red")) >>> results [ ( Jacket1, 1.5 ), ( Jacket2, 1.5 ), ..., ( Jacketn - x, 1.5 ) ]
  • 70. So, let's be real... hey i need to buy a men's red leather jacket with straps for my friend >>> t2 = dependencyTree(a2) >>> t2.depth("red") 3 >>> t2.depth("jacket") None Given: q: "red leather jacket" a1: "black leather jacket" a2: "red leather wallet"
  • 71. So, let's be real... hey i need to buy a men's red leather jacket with straps for my friend >>> results = filter(D, "jacket", t2.depth("jacket")) >>> results [] Given: q: "red leather jacket" a1: "black leather jacket" a2: "red leather wallet"
  • 72. So, let's be real... hey i need to buy a men's red leather jacket with straps for my friend >>> results = filter(results, "leather", t2.depth("leather")) >>> results [ ( Wallet1, 0.5 ), ( Wallet2, 0.5 ), ..., ( Strapn, 0.5 ) ] Given: q: "red leather jacket" a1: "black leather jacket" a2: "red leather wallet"
  • 73. So, let's be real... hey i need to buy a men's red leather jacket with straps for my friend Given: q: "red leather jacket" a1: "black leather jacket" a2: "red leather wallet" >>> results = filter(results, "red", t2.depth("red")) >>> results [ ( Wallet1, 0.83 ), ( Wallet2, 0.83 ), ..., ( Strapn - x, 0.83 ) ]
  • 74. So, let's be real... hey i need to buy a men's red leather jacket with straps for my friend Results for q2: "red leather wallet" [ ( Wallet1, 0.83 ), ( Wallet2, 0.83 ), ..., ( Strapn - x, 0.83 ) ] Results for q1: "black leather jacket" [ ( Jacket1, 1.5 ), ( Jacket2, 1.5 ), ..., ( Jacketn - x, 1.5 ) ]
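The 1.5 and 0.83 scores above follow from adding 1/depth for every query term found in the candidate's dependency tree. A toy sketch that reproduces the arithmetic, with the depths hard-coded as stand-ins for dependencyTree(a).depth(w):

depths_a1 = {"jacket": 1, "leather": 2, "black": 3}  # tree of a1: "black leather jacket"
depths_a2 = {"wallet": 1, "leather": 2, "red": 3}    # tree of a2: "red leather wallet"

def score(query_terms, depths):
    # every query term found in the candidate tree contributes 1/depth
    return sum(1.0 / depths[w] for w in query_terms if w in depths)

q = ["red", "leather", "jacket"]
print(round(score(q, depths_a1), 2))  # 1.5  (jacket 1/1 + leather 1/2)
print(round(score(q, depths_a2), 2))  # 0.83 (leather 1/2 + red 1/3)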
  • 75. So, let's be real... hey i need to buy a men's red leather jacket with straps for my friend However, the dependency tree is not always available, and it often contains parsing issues (cf. the Stanford parser output above). An evaluation of parser robustness over noisy input reports performance dropping to an average of 80% across datasets and parsers. Hashemi, Homa B., and Rebecca Hwa. "An Evaluation of Parser Robustness for Ungrammatical Sentences." EMNLP. 2016.
  • 76. So, let's be real... hey i need to buy a men's red leather jacket with straps for my friend In order to make this decision [ [red leather] jacket ]👍
  • 77. So, let's be real... hey i need to buy a men's red leather jacket with straps for my friend [ [red leather] jacket ] [ red [leather jacket] ]👍 In order to make this decision
  • 78. So, let's be real... hey i need to buy a men's red leather jacket with straps for my friend [ [red leather] jacket ] [ red [leather jacket] ] [ men's [leather jacket] ]👍 In order to make this decision
  • 79. So, let's be real... hey i need to buy a men's red leather jacket with straps for my friend [ [red leather] jacket ] [ red [leather jacket] ] [ men's [leather jacket] ] 😱 In order to make this decision [ [men's leather] jacket ]
  • 80. So, let's be real... hey i need to buy a men's red leather jacket with straps for my friend [ [red leather] jacket ] [ red [leather jacket] ] [ men's [leather jacket] ] [ [men's leather] jacket ]😱 In order to make this decision ... the parser already needs as much information as we can provide regarding head- modifier attachments.
  • 81. So, let's be real... hey i need to buy a men's red leather jacket with straps for my friend In order to make this decision [ [red leather] jacket ] [ red [leather jacket] ] [ men's [leather jacket] ] [ [men's leather] jacket ] (per-token attachment depths annotated on the slide)
  • 82. So, let's be real... hey i need to buy a men's red leather jacket with straps for my friend In order to make this decision Same pattern, different score. How do we assign a probability such that less plausible results are buried at the bottom of the hypothesis space? (not only for display purposes, but for dynamic-programming reasons) [ [red leather] jacket ] [ red [leather jacket] ] [ men's [leather jacket] ] [ [men's leather] jacket ]
  • 83. So, let's be real... hey i need to buy a men's red leather jacket with straps for my friend In order to make this decision p(red | jacket) ~ p(red | leather), but p(men's | jacket) > p(men's | leather) [ [red leather] jacket ] [ red [leather jacket] ] [ men's [leather jacket] ] [ [men's leather] jacket ]
  • 84. So, let's be real... hey i need to buy a men's red leather jacket with straps for my friend That's a lot of data to model. Where will we get it from? [ [red leather] jacket ] [ red [leather jacket] ] [ men's [leather jacket] ] [ [men's leather] jacket ]
  • 85. So, let's be real... hey i need to buy a men's red leather jacket with straps for my friend That's a lot of data to model. Where will we get it from? p(Color | ClothingItem) ~ p(Color | Material), but p(Person | ClothingItem) > p(Person | Material) [ [red leather] jacket ] [ red [leather jacket] ] [ men's [leather jacket] ] [ [men's leather] jacket ]
  • 86. So, let's be real... hey i need to buy a men's red leather jacket with straps for my friend If our taxonomy is well built, semantic associations acquired for a significant number of members of each class will propagate to new, unseen members of that class. With a sufficiently rich NER pipeline and a large enough semantic relation database, we can rely on simple inference for the vast majority of disambiguations. More importantly, we can weight candidate hypotheses dynamically in order to make semantic parsing tractable over many possible syntactic trees.
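A minimal sketch of that inference, assuming a hand-written token-to-supersense map and illustrative class-level association scores (none of these numbers come from the deck):

supersense = {"red": "Color", "men's": "Person", "leather": "Material",
              "jacket": "ClothingItem"}

# p(modifier-class | head-class), e.g. estimated from a parsed corpus
assoc = {("Color", "Material"): 0.4, ("Color", "ClothingItem"): 0.4,
         ("Person", "ClothingItem"): 0.3, ("Person", "Material"): 0.01}

def plausibility(modifier, head):
    # back off to a small default for unseen class pairs
    return assoc.get((supersense[modifier], supersense[head]), 0.001)

# the two competing bracketings differ only in where "men's" attaches
candidates = {
    "[ men's [leather jacket] ]": plausibility("men's", "jacket"),
    "[ [men's leather] jacket ]": plausibility("men's", "leather"),
}
for parse, p in sorted(candidates.items(), key=lambda kv: -kv[1]):
    print(round(p, 3), parse)  # the implausible parse sinks to the bottom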
  • 87. So, let's be real... Welcome to Macy's blah-blah... ! hi im a guy looking for leda jackets We've got what you're looking for! <image product_id=1 item_id=1> that's a nice one but, I want to see different jackets .... how can i do that lexical variants: synonyms, paraphrases; formal variants: typos, ASR errors; multiword chunking/parsing; morphological variants; entity recognition; entity linking; hierarchical structure
  • 88. So, let's be real... Welcome to Macy's blah-blah... ! hi im a guy looking for leda jackets We've got what you're looking for! <image product_id=1 item_id=1> that's a nice one but, i want to see different jackets .... how can i do that
  • 89. So, let's be real... Welcome to Macy's blah-blah... ! hi im a guy looking for leda jackets We've got what you're looking for! <image product_id=1 item_id=1> that's a nice one but, i want to see different jackets .... how can i do that LastResult is a nice jacket but, i want to see different jackets .... how can i LastCommand
  • 90. So, let's be real... that one do that LastResult jacket LastCommand that's a nice one but, i want to see different jackets .... how can i do that LastResult is a nice jacket but, i want to see different jackets .... how can i LastCommand
  • 91. So, let's be real... that → LastResult. IF: p_PoS-tagging(that_DT | * be_VBZ, START *) > p_PoS-tagging(that_IN | * be_VBZ, START *) (cf. that_DT 's_VBZ great_JJ !_PUNCT; that_DT 's_VBZ the_DT last_JJ...; he said that_IN i_PRP...; he said that_IN trump_NNP...) THEN IF: p_isCoreferent(LastValueReturned | that_DT) > u # may also need to condition on the dependency: (x, nice) that's a nice one but, i want to see different jackets .... how can i do that LastResult is a nice jacket but, i want to see different jackets .... how can i LastCommand
  • 92. So, let's be real... one jacket Find the LastResult in the context that maximizes: pisCoreferent( LastResult.type=x | one_CD, # trigger for the function, trivial condition QualitativeEvaluation(User, x), ContextDistance(LastResult.node) ) that's a nice one but, i want to see different jackets .... how can i do that LastResult is a nice jacket but, i want to see different jackets .... how can i LastCommand
  • 93. So, let's be real... one jacket Particularly important for decision-tree re-entry that's a nice one but, i want to see different jackets .... how can i do that LastResult is a nice jacket but, i want to see different jackets .... how can i LastCommand Find the LastResult in the context that maximizes: pisCoreferent( LastResult.type=x | one_CD, # trigger for the function, trivial condition QualitativeEvaluation(User, x), ContextDistance(LastResult.node) )
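A hedged sketch of the p_isCoreferent maximization above; the feature weights, the exponential distance decay and the toy context are assumptions standing in for a learned model:

import math

def coreference_score(candidate_type, distance, liked):
    trigger = 1.0                       # one_CD fired: trivial condition
    evaluation = 1.0 if liked else 0.5  # QualitativeEvaluation(User, x)
    decay = math.exp(-0.5 * distance)   # ContextDistance(LastResult.node)
    return trigger * evaluation * decay

# (LastResult.type, distance back in the context, did the user praise it?)
context = [("jacket", 1, True), ("wallet", 4, False)]
best = max(context, key=lambda c: coreference_score(*c))
print(best[0])  # -> "jacket": the antecedent "one" most plausibly refers to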
  • 94. So, let's be real... "no!"; "forget the last thing i said"; "go back!"; "third option in the menu"; "i want to modify the address I gave you"
  • 95. So, let's be real... QualitativeEvaluation(User_u, ClothingItem_i) BrowseCatalogue(User_u, filter=[ClothingItem_{n,j}, j ∈ J]) HowTo(User_u, Command_c, c ∈ Context) } A single utterance that's a nice one but, i want to see different jackets .... how can i do that LastResult is a nice jacket but, i want to see different jackets .... how can i LastCommand
  • 96. So, let's be real... QualitativeEvaluation(User_u, ClothingItem_i) BrowseCatalogue(User_u, filter=[ClothingItem_{n,j}, j ∈ J]) HowTo(User_u, Command_c, c ∈ Context) } A single utterance that's a nice one but, i want to see different jackets .... how can i do that LastResult is a nice jacket but, i want to see different jackets .... how can i LastCommand
  • 97. So, let's be real... Given I = [ i1, i2, i3 ], L = [ |i1|, |i2|, |i3| ] and P = [ p(i1), p(i2), p(i3) ], we can pick: random.choice(I), or x if x ∈ I ∧ length(x) = max(L), or x if x ∈ I ∧ p(x) = max(P) that's a nice one but, i want to see different jackets .... how can i do that LastResult is a nice jacket but, i want to see different jackets .... how can i LastCommand QualitativeEvaluation(User_u, ClothingItem_i) BrowseCatalogue(User_u, filter=[ClothingItem_{n,j}, j ∈ J]) HowTo(User_u, Command_c, c ∈ Context)
  • 98. So, let's be real... Now with the posteriors P = [ p(i1 | q), p(i2 | q), p(i3 | q) ], pick x if x ∈ I ∧ p(x | q) = max(P). However, this approach will still suffer from segmentation issues due to passing many more arguments than expected for resolving any specific intent. that's a nice one but, i want to see different jackets .... how can i do that LastResult is a nice jacket but, i want to see different jackets .... how can i LastCommand QualitativeEvaluation(User_u, ClothingItem_i) BrowseCatalogue(User_u, filter=[ClothingItem_{n,j}, j ∈ J]) HowTo(User_u, Command_c, c ∈ Context)
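The three selection policies side by side, as a sketch (the interpretation names and probabilities are illustrative):

import random

I = ["QualitativeEvaluation", "BrowseCatalogue", "HowTo"]                 # interpretations
P = {"QualitativeEvaluation": 0.2, "BrowseCatalogue": 0.7, "HowTo": 0.1}  # p(i | q)

print(random.choice(I))             # policy 1: random.choice(I)
print(max(I, key=len))              # policy 2: the longest interpretation
print(max(I, key=lambda i: P[i]))   # policy 3: max of the posterior p(i | q)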
  • 99. So, let's be real... GetInfo(User_u, argmax(Command_c, c ∈ Context)) SetPreference(User_u, SetRating(ClothingItem_i, k, 1)) ∀ ClothingItem_{n,m}, n ∈ N, m ∈ M | n = i, k = * } A standard API will normally provide these definitions as the declaration of its methods. that's a nice one but, i want to see different jackets .... how can i do that LastResult is a nice jacket but, i want to see different jackets .... how can i LastCommand
  • 100. So, let's be real... GetInfo(User_u, argmax(Command_c, c ∈ Context)) SetPreference(User_u, SetRating(ClothingItem_i, k, 1)) ∀ ClothingItem_{n,m}, n ∈ N, m ∈ M | n = i, k = * } We can map the argument signatures to the nodes of the taxonomy we are using for linguistic inference. that's a nice one but, i want to see different jackets .... how can i do that LastResult is a nice jacket but, i want to see different jackets .... how can i LastCommand
  • 101. So, let's be real... GetInfo(User_u, argmax(Command_c, c ∈ Context)) SetPreference(User_u, SetRating(ClothingItem_i, k, 1)) ∀ ClothingItem_{n,m}, n ∈ N, m ∈ M | n = i, k = * } The system will then be able to link API calls to entity parses dynamically. that's a nice one but, i want to see different jackets .... how can i do that LastResult is a nice jacket but, i want to see different jackets .... how can i LastCommand
  • 102. So, let's be real... GetInfo(User_u, argmax(Command_c, c ∈ Context)) SetPreference(User_u, SetRating(ClothingItem_i, k, 1)) ∀ ClothingItem_{n,m}, n ∈ N, m ∈ M | n = i, k = * } Over time, new methods added to the API will be automatically supported by the pre-existing linguistic engine. that's a nice one but, i want to see different jackets .... how can i do that LastResult is a nice jacket but, i want to see different jackets .... how can i LastCommand
  • 103. So, let's be real... GetInfo(User_u, argmax(Command_c, c ∈ Context)) SetPreference(User_u, SetRating(ClothingItem_i, k, 1)) ∀ ClothingItem_{n,m}, n ∈ N, m ∈ M | n = i, k = * } Over time, new methods added to the API will be automatically supported by the pre-existing linguistic engine. 😁 that's a nice one but, i want to see different jackets .... how can i do that LastResult is a nice jacket but, i want to see different jackets .... how can i LastCommand
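One way to realize this, sketched under assumptions: a hypothetical ShopAPI whose method signatures carry taxonomy nodes as type annotations, so entity parses can be linked to calls via introspection, and newly added methods are picked up automatically:

import inspect

class ShopAPI:
    # hypothetical API; the string annotations double as taxonomy nodes
    def SetRating(self, item: "ClothingItem", rating: "Rating") -> None: ...
    def BrowseCatalogue(self, filter: "ClothingItem") -> list: ...

taxonomy = {"jacket": "ClothingItem"}  # token -> taxonomy node (illustrative)

def methods_accepting(node):
    # any method with a parameter annotated with this taxonomy node
    hits = []
    for name, fn in inspect.getmembers(ShopAPI, inspect.isfunction):
        if any(p.annotation == node
               for p in inspect.signature(fn).parameters.values()):
            hits.append(name)
    return hits

print(methods_accepting(taxonomy["jacket"]))  # ['BrowseCatalogue', 'SetRating']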
  • 104. So, let's be real... If the system can reasonably solve nested semantic dependencies hierarchically, it can also be extended to handle multi-intent utterances in a natural way.
  • 105. So, let's be real... We need: • semantic association measures between the leaves of the tree, in order to attach them to the right nodes and be able to stop growing a tree at the right level, • a syntactic tree to build the dependencies, either using a parser or a PCFG/LFG grammar (both may take input from the previous step, which will provide the notion of verbal valence: Person Browse Product), • for tied analyses, an interface with the user to request clarification.
  • 106. So, let's be real... 1) Convert to abstract entity types: i need to pay my bill and i cannot log into my account → i need to PAYMENT my INVOICE and i ISSUE ACCESS ACCOUNT. 2) Activate applicable dependencies/grammar rules (fewer and less ambiguous thanks to the previous step). 3) Resolve attachments and rank candidates by semantic association, keep top-n hypotheses: i need to [ PAYMENT my INVOICE ] and i [ ISSUE [ AccessAccount() ] ] → i need to [ Payment(x) ] and i [ Issue(y) ]
  • 107. So, let's be real... AND usually joins ontologically coordinate elements, e.g. books and magazines versus *humans and Donald Trump i need to [ Payment(x) ] and i [ Issue(y) ]
  • 108. So, let's be real... Even if p_0 = p( Payment | Issue ) > u, we will still split the intents, because here we have p_y = p( Payment | Issue, y ), and normally p_y < p_0 for any irrelevant case (i.e., the posterior probability of the candidate attachment will not beat the baseline of its own prior probability) i need to [ Payment(x) ] and i [ Issue(y) ]
  • 109. So, let's be real... I.e., Payment may have been attached to Issue if no other element had been attached to it before. Resolution order, as given by attachment probabilities, matters. i need to [ Payment(x) ] and i [ Issue(y) ]
  • 110. So, let's be real... i need to [ Payment(x) ] i [ Issue(y) ] i need to [ Payment(x) ] and i [ Issue(y) ] Multi-intents solved 😎
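A sketch of the split decision, with illustrative probabilities: the coordination stays a single intent only if the posterior of the attachment survives against its own prior:

p0 = 0.30  # p(Payment | Issue): prior plausibility of the attachment
py = 0.05  # p(Payment | Issue, y): posterior once Issue's slot y is filled
u = 0.20   # acceptance threshold

def split_intents(p0, py, u):
    # attach only if the posterior still beats the attachment's own prior
    attach = p0 > u and py >= p0
    return "one intent" if attach else "two intents"

print(split_intents(p0, py, u))
# -> "two intents": i need to [ Payment(x) ] / i [ Issue(y) ]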
  • 111. Summary of desiderata 1. Normalization 1. Ability to handle input indeterminacy by activating several candidate hypotheses 2. Ability to handle synonyms 3. Ability to handle paraphrases 2. NER 1. Transversal (cross-intent) 2. No nesting under classification (feedback loop instead) 3. Contextually unbounded 4. Entity-linked 3. Semantic parsing 1. Taxonomy-driven supersense inference 2. Semantically-informed syntactic trees 3. Ability to handle decision-tree re-entries
  • 112. Summary of desiderata 1. Normalization 1. Ability to handle input indeterminacy by activating several candidate hypotheses 2. Ability to handle synonyms 3. Ability to handle paraphrases 2. NER 1. Transversal (cross-intent) 2. No nesting under classification (feedback loop instead) 3. Contextually unbounded 4. Entity-linked 3. Semantic parsing 1. Taxonomy-driven supersense inference 2. Semantically-informed syntactic trees 3. Ability to handle decision-tree re-entries Word embeddings, domain thesaurus, fuzzy matching Inverted indexes, character-based neural networks for NER
  • 113. Summary of desiderata 1. Normalization 1. Ability to handle input indeterminacy by activating several candidate hypotheses 2. Ability to handle synonyms 3. Ability to handle paraphrases 2. NER 1. Transversal (cross-intent) 2. No nesting under classification (feedback loop instead) 3. Contextually unbounded 4. Entity-linked 3. Semantic parsing 1. Taxonomy-driven supersense inference 2. Semantically-informed syntactic trees 3. Ability to handle decision-tree re-entries Word embeddings, domain thesaurus, fuzzy matching Inverted indexes, character-based neural networks for NER implicitly solved by Supersenses + Neural enquirer implicitly solved by Neural enquirer (layered/modular)
  • 115. 3.1 Supersense inference GFT (Google Fine-Grained) taxonomy Yogatama, Dani, Daniel Gillick, and Nevena Lazic. "Embedding Methods for Fine Grained Entity Type Classification." ACL (2). 2015.
  • 116. 3.1 Supersense inference FIGER taxonomy Yogatama, Dani, Daniel Gillick, and Nevena Lazic. "Embedding Methods for Fine Grained Entity Type Classification." ACL (2). 2015.
  • 117. 3.1 Supersense inference
FB = Freebase
owem = OurWordEmbeddingsModel
T = OurTaxonomy (1)
for every document d in D (2):
    for every sent s in d:
        sentence_vector = None
        for every (position, term, tag) tuple in TermExtractor (3) applied to s:
            if not sentence_vector:
                sentence_vector = vectorizeSentence(s, position)
            owem[ T[tag] ].update( sentence_vector )
(1) FB labels are manually mapped to fine-grained labels. (2) D = 133,000 news documents. (3) TermExtractor = parser + chunker + entity resolver that assigns Freebase types to entities. Yogatama, Dani, Daniel Gillick, and Nevena Lazic. "Embedding Methods for Fine Grained Entity Type Classification." ACL (2). 2015.
  • 119. 3.1 Supersense inference Yogatama, Dani, Daniel Gillick, and Nevena Lazic. "Embedding Methods for Fine Grained Entity Type Classification." ACL (2). 2015.
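A hedged sketch of the inference step this enables: each taxonomy label accumulates a centroid of the sentence vectors it has absorbed, and a new mention is typed by the nearest centroid (toy vectors and labels, cosine as the similarity):

import numpy as np

# label -> accumulated centroid of its sentence vectors (toy values)
owem = {"/person/athlete": np.array([0.9, 0.1, 0.0]),
        "/location/city":  np.array([0.1, 0.8, 0.1])}

def infer_supersense(mention_vector):
    def cos(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
    return max(owem, key=lambda label: cos(owem[label], mention_vector))

print(infer_supersense(np.array([0.2, 0.7, 0.1])))  # -> "/location/city"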
  • 120. 3.2 Semantic trees Yin, Pengcheng, et al. "Neural enquirer: Learning to query tables with natural language." arXiv preprint arXiv:1512.00965 (2015). q = "Which city hosted the longest game before the game in Beijing?" KB = knowledge base consisting of tables that store facts encoder = bi-directional Recurrent Neural Network v = encoder(q) # vector of the encoded query kb_encoder = KB table encoder
  • 121. 3.2 Semantic trees Yin, Pengcheng, et al. "Neural enquirer: Learning to query tables with natural language." arXiv preprint arXiv:1512.00965 (2015). q = "Which city hosted the longest game before the game in Beijing?" KB = knowledge base consisting of tables that store facts encoder = bi-directional Recurrent Neural Network v = encoder(q) # vector of the encoded query kb_encoder = KB table encoder f_n = embedding of the name of the n-th column w_mn = embedding of the value in the n-th column of the m-th row [x ; y] = vector concatenation W = weights (what we learn from the data, e.g. via SGD) b = bias
  • 122. 3.2 Semantic trees Yin, Pengcheng, et al. "Neural enquirer: Learning to query tables with natural language." arXiv preprint arXiv:1512.00965 (2015). q = "Which city hosted the longest game before the game in Beijing?" KB = knowledge base consisting of tables that store facts encoder = bi-directional Recurrent Neural Network v = encoder(q) # vector of the encoded query kb_encoder = KB table encoder f_n = embedding of the name of the n-th column w_mn = embedding of the value in the n-th column of the m-th row [x ; y] = vector concatenation W = weights (what we learn from the data, e.g. via SGD) b = bias normalizeBetween0and1( penaltiesOrRewards * # the weights that minimize the loss for this column [ {sydney... australia... kangaroo...} + {city... town... place...} ] + somePriorExpectations )
  • 123. 3.2 Semantic trees Yin, Pengcheng, et al. "Neural enquirer: Learning to query tables with natural language." arXiv preprint arXiv:1512.00965 (2015). "olympic games near kangaroos" 👍 q = "Which city hosted the longest game before the game in Beijing?" KB = knowledge base consisting of tables that store facts encoder = bi-directional Recurrent Neural Network v = encoder(q) # vector of the encoded query kb_encoder = KB table encoder f_n = embedding of the name of the n-th column w_mn = embedding of the value in the n-th column of the m-th row [x ; y] = vector concatenation W = weights (what we learn from the data, e.g. via SGD) b = bias normalizeBetween0and1( penaltiesOrRewards * [ {sydney... australia... kangaroo...} + {city... town... place...} ] + somePriorExpectations )
  • 124. 3.2 Semantic trees Yin, Pengcheng, et al. "Neural enquirer: Learning to query tables with natural language." arXiv preprint arXiv:1512.00965 (2015). "olympic games near kangaroos" Risk of false positives? The scores already come out as probabilities! 👍 q = "Which city hosted the longest game before the game in Beijing?" KB = knowledge base consisting of tables that store facts encoder = bi-directional Recurrent Neural Network v = encoder(q) # vector of the encoded query kb_encoder = KB table encoder f_n = embedding of the name of the n-th column w_mn = embedding of the value in the n-th column of the m-th row [x ; y] = vector concatenation W = weights (what we learn from the data, e.g. via SGD) b = bias normalizeBetween0and1( penaltiesOrRewards * [ {sydney... australia... kangaroo...} + {city... town... place...} ] + somePriorExpectations )
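A sketch of that column-scoring idea: a linear layer over the concatenation [f_n ; v], squashed into normalized weights. W, b and all embeddings here are random stand-ins, not trained values:

import numpy as np

rng = np.random.default_rng(0)
d = 4
v = rng.normal(size=d)                  # encoder(q): the query vector
f = {"City": rng.normal(size=d),        # f_n: column-name embeddings
     "Year": rng.normal(size=d),
     "Duration": rng.normal(size=d)}
W = rng.normal(size=2 * d)              # weights we would SGD out of the data
b = 0.0

def softmax(xs):
    xs = np.asarray(xs, dtype=float)
    e = np.exp(xs - xs.max())
    return e / e.sum()

names = list(f)
weights = softmax([W @ np.concatenate([f[n], v]) + b for n in names])
print(dict(zip(names, np.round(weights, 2))))  # normalized column attention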
  • 125. 3.2 Semantic trees Yin, Pengcheng, et al. "Neural enquirer: Learning to query tables with natural language." arXiv preprint arXiv:1512.00965 (2015). Olympic Games City Year Duration Barcelona 1992 15 Atlanta 1996 18 Sydney 2000 17 Athens 2004 14 Beijing 2008 16 London 2012 16 Rio de Janeiro 2016 18
  • 126. 3.2 Semantic trees Yin, Pengcheng, et al. "Neural enquirer: Learning to query tables with natural language." arXiv preprint arXiv:1512.00965 (2015). Which city hosted the 2012 Olympic Games? Olympic Games City Year Duration Barcelona 1992 15 Atlanta 1996 18 Sydney 2000 17 Athens 2004 14 Beijing 2008 16 London 2012 16 Rio de Janeiro 2016 18
  • 127. Yin, Pengcheng, et al. "Neural enquirer: Learning to query tables with natural language." arXiv preprint arXiv:1512.00965 (2015). Which city hosted the 2012 Olympic Games? Olympic Games City Year Duration Barcelona 1992 15 Atlanta 1996 18 Sydney 2000 17 Athens 2004 14 Beijing 2008 16 London 2012 16 Rio de Janeiro 2016 18 Q(i) = the query, T(i) = the table, y(i) = the answer. RNN-encoded: p(vector("column:city") | [ vector("which"), vector("city")... ] ) > p(vector("column:city") | [ vector("city"), vector("which")... ] ) 3.2 Semantic trees
  • 128. Yin, Pengcheng, et al. "Neural enquirer: Learning to query tables with natural language." arXiv preprint arXiv:1512.00965 (2015). Which city hosted the 2012 Olympic Games? Q(i), RNN-encoded: Executor layer L: p(vector("column:city") | [ vec("which"), vec("city")... ] ) > p(vector("column:city") | [ vec("city"), vec("which")... ] ) Executor layer L - 1: p(vector("column:year") | [ ... vec("2012"), vec("olympic") ... ] ) > p(vector("column:year") | [ ... vec("olympic"), vec("2012") ... ] ) Executor layer L - 2: p(vector("FROM") | [ ... vec("olympic"), vec("games") ... ] ) > p(vector("FROM") | [ ... vec("games"), vec("olympic") ... ] ) As many executor layers as SQL operations. 3.2 Semantic trees
  • 129. Yin, Pengcheng, et al. "Neural enquirer: Learning to query tables with natural language." arXiv preprint arXiv:1512.00965 (2015). Which city hosted the 2012 Olympic Games? Q(i), RNN-encoded: Executor layer L: p(vector("column:city") | [ vec("which"), vec("city")... ] ) > p(vector("column:city") | [ vec("city"), vec("which")... ] ) Executor layer L - 1: p(vector("column:year") | [ ... vec("2012"), vec("olympic") ... ] ) > p(vector("column:year") | [ ... vec("olympic"), vec("2012") ... ] ) Executor layer L - 2: p(vector("FROM") | [ ... vec("olympic"), vec("games") ... ] ) > p(vector("FROM") | [ ... vec("games"), vec("olympic") ... ] ) Each executor applies the same set of weights to all rows (since it models a single operation). 3.2 Semantic trees
  • 130. Yin, Pengcheng, et al. "Neural enquirer: Learning to query tables with natural language." arXiv preprint arXiv:1512.00965 (2015). Which city hosted the 2012 Olympic Games? Q(i), RNN-encoded: Executor layer L: p(vector("column:city") | [ vec("which"), vec("city")... ] ) > p(vector("column:city") | [ vec("city"), vec("which")... ] ) Executor layer L - 1: p(vector("column:year") | [ ... vec("2012"), vec("olympic") ... ] ) > p(vector("column:year") | [ ... vec("olympic"), vec("2012") ... ] ) Executor layer L - 2: p(vector("FROM") | [ ... vec("olympic"), vec("games") ... ] ) > p(vector("FROM") | [ ... vec("games"), vec("olympic") ... ] ) The logic of max, min functions is built-in as column-wise operations. The system learns over which ranges they apply. 3.2 Semantic trees
  • 131. Yin, Pengcheng, et al. "Neural enquirer: Learning to query tables with natural language." arXiv preprint arXiv:1512.00965 (2015). Which city hosted the 2012 Olympic Games? Q(i), RNN-encoded: Executor layer L: p(vector("column:city") | [ vec("which"), vec("city")... ] ) > p(vector("column:city") | [ vec("city"), vec("which")... ] ) Executor layer L - 1: p(vector("column:year") | [ ... vec("2012"), vec("olympic") ... ] ) > p(vector("column:year") | [ ... vec("olympic"), vec("2012") ... ] ) Executor layer L - 2: p(vector("FROM") | [ ... vec("olympic"), vec("games") ... ] ) > p(vector("FROM") | [ ... vec("games"), vec("olympic") ... ] ) The logic of max, min functions is built-in as column-wise operations. The system learns over which ranges they apply (before, after are just temporal versions of argmax and argmin). 3.2 Semantic trees
  • 132. Yin, Pengcheng, et al. "Neural enquirer: Learning to query tables with natural language." arXiv preprint arXiv:1512.00965 (2015). Which city hosted the 2012 Olympic Games? Q(i), RNN-encoded: Executor layer L: p(vector("column:city") | [ vec("which"), vec("city")... ] ) > p(vector("column:city") | [ vec("city"), vec("which")... ] ) Executor layer L - 1: p(vector("column:year") | [ ... vec("2012"), vec("olympic") ... ] ) > p(vector("column:year") | [ ... vec("olympic"), vec("2012") ... ] ) Executor layer L - 2: p(vector("FROM") | [ ... vec("olympic"), vec("games") ... ] ) > p(vector("FROM") | [ ... vec("games"), vec("olympic") ... ] ) Evaluate a candidate: (formula on the slide: the candidate vector at layer L for row m of our database, combined with the memory layer holding the last output vector generated by executor L - 1). 3.2 Semantic trees
  • 133. Yin, Pengcheng, et al. "Neural enquirer: Learning to query tables with natural language." arXiv preprint arXiv:1512.00965 (2015). Which city hosted the 2012 Olympic Games? Q(i), RNN-encoded: Executor layer L: p(vector("column:city") | [ vec("which"), vec("city")... ] ) > p(vector("column:city") | [ vec("city"), vec("which")... ] ) Executor layer L - 1: p(vector("column:year") | [ ... vec("2012"), vec("olympic") ... ] ) > p(vector("column:year") | [ ... vec("olympic"), vec("2012") ... ] ) Executor layer L - 2: p(vector("FROM") | [ ... vec("olympic"), vec("games") ... ] ) > p(vector("FROM") | [ ... vec("games"), vec("olympic") ... ] ) Evaluate a candidate: as an attention mechanism, cumulatively weight the vector by the output weights from executor L - 1 (formula annotations on the slide: row m, R = our table, the set of columns). 3.2 Semantic trees
  • 134. Yin, Pengcheng, et al. "Neural enquirer: Learning to query tables with natural language." arXiv preprint arXiv:1512.00965 (2015). Which city hosted the 2012 Olympic Games? Q(i), RNN-encoded: Executor layer L: p(vector("column:city") | [ vec("which"), vec("city")... ] ) > p(vector("column:city") | [ vec("city"), vec("which")... ] ) Executor layer L - 1: p(vector("column:year") | [ ... vec("2012"), vec("olympic") ... ] ) > p(vector("column:year") | [ ... vec("olympic"), vec("2012") ... ] ) Executor layer L - 2: p(vector("FROM") | [ ... vec("olympic"), vec("games") ... ] ) > p(vector("FROM") | [ ... vec("games"), vec("olympic") ... ] ) Evaluate a candidate: this allows the system to restrict the reference at each step (formula annotations on the slide: row m, R = our table, the set of columns). 3.2 Semantic trees
  • 135. Yin, Pengcheng, et al. "Neural enquirer: Learning to query tables with natural language." arXiv preprint arXiv:1512.00965 (2015). Which city hosted the 2012 Olympic Games? Q(i), RNN-encoded: Executor layer L: p(vector("column:city") | [ vec("which"), vec("city")... ] ) > p(vector("column:city") | [ vec("city"), vec("which")... ] ) Executor layer L - 1: p(vector("column:year") | [ ... vec("2012"), vec("olympic") ... ] ) > p(vector("column:year") | [ ... vec("olympic"), vec("2012") ... ] ) Executor layer L - 2: p(vector("FROM") | [ ... vec("olympic"), vec("games") ... ] ) > p(vector("FROM") | [ ... vec("games"), vec("olympic") ... ] ) Evaluate a candidate: as we weight the vector in the direction of each successive operation, it gets closer to any applicable answer in our DB (formula annotations on the slide: row m, R = our table, the set of columns). 3.2 Semantic trees
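A toy sketch of the cumulative row attention just described: each executor scores every row and multiplies in the previous layer's weights, so the candidate set narrows step by step (all numbers are illustrative):

import numpy as np

rows = ["Barcelona 1992", "Atlanta 1996", "Sydney 2000", "London 2012"]

def normalize(w):
    return w / w.sum()

# executor L-1 ("year = 2012"): per-row compatibility scores (illustrative)
attn = normalize(np.array([0.01, 0.01, 0.01, 0.97]))

# executor L ("select city"): its own row scores, cumulatively weighted by
# the attention handed over from the previous executor
attn = normalize(np.array([0.25, 0.25, 0.25, 0.25]) * attn)

print(rows[int(np.argmax(attn))])  # -> "London 2012"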
  • 136. Yin, Pengcheng, et al. "Neural enquirer: Learning to query tables with natural language." arXiv preprint arXiv:1512.00965 (2015). SbS (step-by-step training) is N2N (end-to-end training) with bias. SEMPRE is a toolkit for training semantic parsers, which map natural language utterances to denotations (answers) via intermediate logical forms. Here's an example for querying databases. https://nlp.stanford.edu/software/sempre/ 3.2 Semantic trees
  • 137. Yin, Pengcheng, et al. "Neural enquirer: Learning to query tables with natural language." arXiv preprint arXiv:1512.00965 (2015). SbS (step-by-step training) is N2N (end-to-end training) with bias. SEMPRE is a toolkit for training semantic parsers, which map natural language utterances to denotations (answers) via intermediate logical forms. Here's an example for querying databases. https://nlp.stanford.edu/software/sempre/ 😳 3.2 Semantic trees
  • 138. 3.3 Re-entry Task: given a set of facts or a story, answer questions on those facts in a logically consistent way. Problem: Theoretically, it could be achieved by a language modeler such as a recurrent neural network (RNN). However, their memory (encoded by hidden states and weights) is typically too small, and is not compartmentalized enough to accurately remember facts from the past (knowledge is compressed into dense vectors). RNNs are known to have difficulty in memorization tasks, i.e., outputting the same input sequence they have just read. Weston, Jason, Sumit Chopra, and Antoine Bordes. "Memory networks." arXiv preprint arXiv:1410.3916 (2014).
  • 139. Task: given a set of facts or a story, answer questions on those facts in a logically consistent way. Weston, Jason, Sumit Chopra, and Antoine Bordes. "Memory networks." arXiv preprint arXiv:1410.3916 (2014). Jason went to the kitchen. Jason picked up the milk. Jason travelled to the office. Jason left the milk there. Jason went to the bathroom. jason go kitchen jason get milk jason go office jason drop milk jason go bathroom Where is the milk? milk? 3.3 Re-entry
  • 140. F = set of facts (e.g. recent context of conversation; sentences) M = memory for fact f in F: # any linguistic pre-processing may apply store f in the next available memory slot of M Memory networks Weston, Jason, Sumit Chopra, and Antoine Bordes. "Memory networks." arXiv preprint arXiv:1410.3916 (2014). At the reading stage... 3.3 Re-entry
  • 141. F = set of facts (e.g. recent context of conversation; sentences) M = memory O = inference module (oracle) x = input q = query about our known facts k = number of candidate facts to be retrieved m = a supporting fact (typically, the one that maximizes support) Memory networks Weston, Jason, Sumit Chopra, and Antoine Bordes. "Memory networks." arXiv preprint arXiv:1410.3916 (2014). At the inference stage... 3.3 Re-entry
  • 142. Memory networks Weston, Jason, Sumit Chopra, and Antoine Bordes. "Memory networks." arXiv preprint arXiv:1410.3916 (2014). At the inference stage... jason go kitchen jason get milk jason go office jason drop milk jason go bathroom F = set of facts (e.g. recent context of conversation; sentences) M = memory O = inference module (oracle) x = input q = query about our known facts k = number of candidate facts to be retrieved m = a supporting fact (typically, the one that maximizes support) 3.3 Re-entry
  • 143. Memory networks Weston, Jason, Sumit Chopra, and Antoine Bordes. "Memory networks." arXiv preprint arXiv:1410.3916 (2014). At the inference stage, t1... milk jason go kitchen jason get milk jason go office jason drop milk jason go bathroom Candidate space for k = 2 jason go kitchen jason get milk jason go office jason drop milk jason go bathroom argmax over that space milk, [ ] 3.3 Re-entry
  • 144. Memory networks Weston, Jason, Sumit Chopra, and Antoine Bordes. "Memory networks." arXiv preprint arXiv:1410.3916 (2014). At the inference stage, t2... milk jason go kitchen jason get milk jason go office jason drop milk jason go bathroom Candidate space for k = 2 jason go kitchen jason get milk jason go office jason drop milk jason go bathroom argmax over that space milk, jason, drop 3.3 Re-entry
  • 145. Memory networks Weston, Jason, Sumit Chopra, and Antoine Bordes. "Memory networks." arXiv preprint arXiv:1410.3916 (2014). At the inference stage, t2... jason go kitchen jason get milk jason go office jason drop milk jason go bathroom jason go kitchen jason get milk jason go office jason drop milk jason go bathroom Why not "jason get milk"? Because it does not contribute as much new information as "jason go office" 3.3 Re-entry
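A toy sketch of the k = 2 hop selection; the learned scoring function is replaced here by crude heuristics (word overlap with a recency tie-break for hop 1, the latest preceding "go" fact for hop 2), just to make the control flow concrete:

facts = ["jason go kitchen", "jason get milk", "jason go office",
         "jason drop milk", "jason go bathroom"]
q = {"milk"}

# hop 1: the most recent fact sharing words with the query
i1, m1 = max(enumerate(facts),
             key=lambda t: (len(q & set(t[1].split())), t[0]))

# hop 2: where was the object last carried to? the latest "go" fact before m1
i2, m2 = max((t for t in enumerate(facts[:i1]) if "go" in t[1].split()),
             key=lambda t: t[0])

print(m1, "->", m2)  # jason drop milk -> jason go office: the milk is in the office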
  • 146. Memory networks Weston, Jason, Sumit Chopra, and Antoine Bordes. "Memory networks." arXiv preprint arXiv:1410.3916 (2014). Training is performed with a margin ranking loss and stochastic gradient descent (SGD). Simple model. Easy to integrate with any pipeline already working with structured facts (e.g. Neural Inquirer). Adds a temporal dimension to it. Answering the question about the location of the milk requires comprehension of the actions picked up and left. 3.3 Re-entry
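The margin ranking loss mentioned above, in a few lines of numpy: push the true supporting fact's score above any sampled negative's by at least a margin:

import numpy as np

def margin_ranking_loss(score_pos, score_neg, margin=0.1):
    # zero once the true fact outscores the negative by at least the margin
    return np.maximum(0.0, margin - score_pos + score_neg)

print(margin_ranking_loss(0.9, 0.3))   # 0.0: already separated
print(margin_ranking_loss(0.4, 0.38))  # 0.08: SGD still has work to do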
  • 147. Memory networks 3.3 Re-entry Results (accuracy %) on QACNN / CBT-NE / CBT-CN: Weston, Jason, Sumit Chopra, and Antoine Bordes. "Memory networks." arXiv preprint arXiv:1410.3916 (2014): 69.4 / 66.6 / 63.0. Kadlec, Rudolf, et al. "Text understanding with the attention sum reader network." arXiv preprint arXiv:1603.01547 (2016): 75.4 / 71.0 / 68.9 (4 Mar 16). Sordoni, Alessandro, et al. "Iterative alternating neural attention for machine reading." arXiv preprint arXiv:1606.02245 (2016): 76.1 / 72.0 / 71.0 (7 Jun 16). Trischler, Adam, et al. "Natural language comprehension with the epireader." arXiv preprint arXiv:1606.02270 (2016): 74.0 / 71.8 / 70.6 (7 Jun 16). Dhingra, Bhuwan, et al. "Gated-attention readers for text comprehension." arXiv preprint arXiv:1606.01549 (2016): 77.4 / 71.9 / 69.0 (5 Jun 16).
  • 148. Results on MovieQA (response accuracy, %) by knowledge source: Memory Networks: KB 78.5, IE 63.4, Wikipedia 69.9. Key-Value Memory Networks: KB 93.9, IE 68.3, Wikipedia 76.2. Baselines shown on the slide: No Knowledge (embeddings) 54.4%, Standard QA System on KB 93.5%. Source: Antoine Bordes, Facebook AI Research, LXMLS Lisbon July 28, 2016
  • 150. Summary of desiderata 3. Semantic parsing 1. Taxonomy-driven supersense inference → extend the neuralized architecture's domain by integrating support for supersenses; enable inference at training and test time. 2. Semantically-informed syntactic trees → enable semantic parsing using a neuralized architecture with respect to some knowledge base serving as ground truth. 3. Ability to handle decision-tree re-entries → use memory networks to effectively keep track of the entities throughout the conversation and ensure logical consistency; extend memory networks by integrating support for supersenses; enable inference at training and test time.
  • 151. Summary of desiderata Conversational technology should really be about • dynamically finding the best possible way to browse a large repository of information/actions. • finding the shortest path to any relevant action or piece of information (to avoid the plane dashboard effect). • surfacing implicit data in unstructured content ("bocadillo de calamares in Madrid"). Rather than going open-domain, taking the closed-domain and going deep into it. Remember?