4. Background: Conversational Agents
What they can do:
- automated interaction with customers
- virtual assistants
What content they can provide:
- chit-chat (small talk)
- goal-oriented dialogue
- knowledge-based dialogue
8. Background: intent classification
[1] Liu, B. and Lane, I. (2016). Attention-based recurrent neural network models for joint intent detection and slot
filling. Proceedings of The 17th Annual Meeting of the International Speech Communication Association.
9. Background: slot filling
[1] Liu, B. and Lane, I. (2016). Attention-based recurrent neural network models for joint intent detection and slot
filling. Proceedings of The 17th Annual Meeting of the International Speech Communication Association.
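As an illustrative sketch (not the architecture of [1]): slot filling assigns one BIO label per token, while intent classification assigns one label to the whole utterance. The sentence, slot types, and intent name below are hypothetical:

```python
# Hypothetical joint intent detection + slot filling output.
tokens = ["find", "a", "bike", "near", "piazza", "castello"]
slots  = ["O", "O", "O", "O", "B-location", "I-location"]  # BIO tagging
intent = "bike_search"  # a single label for the whole utterance

def extract_slots(tokens, slots):
    """Collect (slot_type, text) pairs by grouping B-/I- tags of the same type."""
    result, current = [], None
    for tok, tag in zip(tokens, slots):
        if tag.startswith("B-"):
            if current:
                result.append(current)
            current = (tag[2:], [tok])
        elif tag.startswith("I-") and current and current[0] == tag[2:]:
            current[1].append(tok)
        else:
            if current:
                result.append(current)
            current = None
    if current:
        result.append(current)
    return [(t, " ".join(ws)) for t, ws in result]

print(extract_slots(tokens, slots))  # [('location', 'piazza castello')]
```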
10. Background: Word Embeddings
Distributional Semantics [2]: words used in similar
contexts have similar meaning
- each word corresponds to a vector of reals
- small dimensionality (50 to 300)
- semantic distribution in a multidimensional space
[2] Harris, Z. S. (1970). Distributional structure. In Papers in structural and transformational linguistics (pp.
775-794). Springer, Dordrecht.
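A minimal sketch of the idea, with hypothetical 3-dimensional vectors (real embeddings use 50 to 300 dimensions learned from large corpora):

```python
import math

# Hypothetical toy vectors; real embeddings are learned from text corpora.
embeddings = {
    "bici":       [0.9, 0.1, 0.0],
    "bicicletta": [0.8, 0.2, 0.1],
    "piazza":     [0.1, 0.9, 0.2],
}

def cosine(u, v):
    """Cosine similarity: near 1.0 for similar directions, 0.0 for orthogonal."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Words used in similar contexts end up close together in the space.
print(cosine(embeddings["bici"], embeddings["bicicletta"]) >
      cosine(embeddings["bici"], embeddings["piazza"]))  # True
```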
15. Approach: Word Embeddings for Italian language
recomputation of Italian Wikipedia embeddings
with proper tokenization (with respect to [6])
“Voglio una bici vicino a piazza castello, grazie”
↓
[“Voglio”, “una”, “bici”, “vicino”, “a”, “piazza”, “castello”, “,”, “grazie”]
[6] Berardi, G., Esuli, A., & Marcheggiani, D. (2015). Word Embeddings Go to Italy: A Comparison of Models and
Training Datasets. In IIR.
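The tokenization step can be sketched with a simple regex-based tokenizer (an illustration, not the exact preprocessing pipeline used here):

```python
import re

def tokenize(sentence):
    """Split into word tokens and keep punctuation as separate tokens."""
    return re.findall(r"\w+|[^\w\s]", sentence)

print(tokenize("Voglio una bici vicino a piazza castello, grazie"))
# ['Voglio', 'una', 'bici', 'vicino', 'a', 'piazza', 'castello', ',', 'grazie']
```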
17. Results: the datasets
available:
- ATIS (single-turn) [3]
- nlu-benchmark (single-turn) [4]
- kvret (multi-turn) [5]
collected:
- bikes Italian (single-turn)
- bikes English (single-turn)
[3] Hemphill, C., Godfrey, J., Doddington, G. (1990). The ATIS spoken language systems pilot corpus. DARPA Speech
and Natural Language Workshop
[4] https://github.com/snipsco/nlu-benchmark
[5] Eric, M. and Manning, C. (2017). Key-value retrieval networks for task-oriented dialogue. SIGDIAL 2017: Session
on Natural Language Generation for Dialog Systems
18. Results: multi-turn intent classification
results on the kvret dataset [5]:

intent   RNN    agent words   F1       epoch number
✓        LSTM   ✓             0.9987   7
✓        LSTM   ✘             0.9987   8
✓        GRU    ✓             0.9975   14
✘        -      ✓             0.9951   5
✓        GRU    ✘             0.9585   9
✘ [1]    -      ✘             0.8524   8
[1] Liu, B. and Lane, I. (2016). Attention-based recurrent neural network models for joint intent detection and slot
filling. Proceedings of The 17th Annual Meeting of the International Speech Communication Association.
[5] Eric, M. and Manning, C. (2017). Key-value retrieval networks for task-oriented dialogue. SIGDIAL 2017: Session
on Natural Language Generation for Dialog Systems
19. Results: Italian Word Embeddings
analogy test [7]:
- semantic (capital-country, nationality adjective, currency, family)
- syntactic (m-f, singular-plural, tenses, comparatives, superlatives)

Word Embeddings                            accuracy
Italian values from [6] on Wikipedia       44.81%
Computed Italian values on Wikipedia       58.14%

[6] Berardi, G., Esuli, A., & Marcheggiani, D. (2015). Word Embeddings Go to Italy: A Comparison of Models and
Training Datasets. In IIR.
[7] Mikolov, T., Yih, W. T., & Zweig, G. (2013). Linguistic regularities in continuous space word representations. In
Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics:
Human Language Technologies (pp. 746-751).
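The analogy test of [7] can be sketched with hypothetical toy vectors, built so that each capital equals its country plus a shared offset (real tests use embeddings learned from Wikipedia):

```python
import math

# Hypothetical toy vectors: capital = country + a shared "capital" offset.
vectors = {
    "francia": [1.0, 0.0, 0.0],
    "italia":  [0.0, 1.0, 0.0],
    "parigi":  [1.0, 0.0, 1.0],
    "roma":    [0.0, 1.0, 1.0],
}

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) *
                  math.sqrt(sum(b * b for b in v)))

def analogy(a, b, c):
    """Answer 'a is to b as c is to ?' via the offset vector b - a + c."""
    target = [vb - va + vc
              for va, vb, vc in zip(vectors[a], vectors[b], vectors[c])]
    # Rank every word except the three query words by cosine similarity.
    candidates = {w: v for w, v in vectors.items() if w not in (a, b, c)}
    return max(candidates, key=lambda w: cosine(target, candidates[w]))

print(analogy("francia", "parigi", "italia"))  # roma
```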
20. Results: the difference on the global tasks (Italian)
Measured on the bike-sharing dataset using the approach of [1].
Word Embeddings                                        intent classification F1   slot filling F1
Italian values from [6] on Wikipedia, 730k vectors     0.8421                     0.5666
Computed Italian values on Wikipedia, 758k vectors     0.8947                     0.6153
[6] Berardi, G., Esuli, A., & Marcheggiani, D. (2015). Word Embeddings Go to Italy: A Comparison of Models and
Training Datasets. In IIR.
[1] Liu, B. and Lane, I. (2016). Attention-based recurrent neural network models for joint intent detection and slot
filling. Proceedings of The 17th Annual Meeting of the International Speech Communication Association.
21. Results: the difference of embeddings on the two tasks (English)
Embeddings                                        intent classification F1        slot filling F1
                                                  ATIS    nlu-bench.  bikes en.   ATIS    nlu-bench.  bikes en.
Trainable, random initialization                  0.9740  0.9928      0.9428      0.9425  0.9177      0.9000
[8] precomputed, 685k keys, 20k unique vectors    0.9660  0.9928      0.9714      0.9588  0.8970      0.9375
[8] precomputed, 685k keys, 685k unique vectors   0.9860  0.9928      0.9714      0.9649  0.9170      0.9689
[1] Liu, B. and Lane, I. (2016). Attention-based recurrent neural network models for joint intent detection and slot
filling. Proceedings of The 17th Annual Meeting of the International Speech Communication Association.
[8] https://spacy.io/models/en
Measured using the approach of [1].
22. Conclusions
- results on the multi-turn dataset show the
importance of dialogue context
- results for the word embeddings show the
importance of choosing them properly