Skill-based Conversational Agent
Idris Yusupov, Yury Kuratov
i.yusupov@phystech.edu, yurii.kuratov@phystech.edu
Moscow Institute of Physics and Technology
Plan
1. Intro to Conversational Agents
2. The Conversational Intelligence Challenge @ NIPS (convai.io)
3. Skill-based Conversational Agent
What is a Conversational Agent?
● Siri (Apple)
● Cortana (Microsoft)
● Alexa (Amazon)
● Chat-Bots (Telegram, Facebook messenger)
Two dimensions of Conversational Agents
[Figure: Pizza Bot, Siri, and ConvAI Bot placed on a chart; one axis is labeled "Artificial Intelligence"]
ConvAI (The Conversational Intelligence Challenge)
ConvAI
- 6 teams (McGill, MIPT, University of Wroclaw, …)
- Human evaluation qualification round (July 2017)
  - 1st place: 2.386 of 5 (overall dialog quality)
  - 2nd place (ours): 2.318 of 5
- Released dataset: http://convai.io/data/
  - about 2k dialogs
- NIPS Final (December 2017)
- Talk with the bots and help collect data: http://t.me/ConvaiBot
Conversation about the text
- Skill: a narrow model.
- What skills are required to discuss the text?
  - Question generation (factoid, common, …)
  - Question answering (factoid, common, …)
  - Chit-chat skill
  - Summarization skill
  - Personality skill (name, birthday, …)
  - …
- Models for skills:
  - Seq2Seq
  - Retrieval models
  - Templates
  - Rules
  - …
Conversation about the text
- A finite state machine (FSM) can model the conversation
- Drawback: an FSM is hard to maintain
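
To make the maintenance problem concrete, here is a minimal sketch of an FSM-style dialog manager. The state names, event names, and transition table are hypothetical and are not taken from the authors' bot; the sketch only shows that every (state, event) pair has to be wired by hand.

```python
# Hypothetical states and events; the slides do not describe the real FSM.
TRANSITIONS = {
    ("greeting", "user_replied"): "ask_question",
    ("ask_question", "user_answered"): "check_answer",
    ("ask_question", "user_asked"): "answer_question",
    ("check_answer", "done"): "chit_chat",
    ("answer_question", "done"): "chit_chat",
    ("chit_chat", "user_asked"): "answer_question",
    ("chit_chat", "user_replied"): "ask_question",
}


class DialogFSM:
    """Tiny finite state machine that tracks which skill should speak next."""

    def __init__(self, start="greeting"):
        self.state = start

    def step(self, event):
        # Unknown (state, event) pairs keep the current state. Handling them
        # properly means adding more rows to TRANSITIONS, so the table grows
        # quickly as skills are added.
        self.state = TRANSITIONS.get((self.state, event), self.state)
        return self.state


fsm = DialogFSM()
trace = [fsm.step(e) for e in ["user_replied", "user_asked", "done", "user_replied"]]
print(trace)
```

Even with four skills the table already needs seven hand-written transitions, and adding one more skill usually means revisiting several existing rows.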
Our approach
- Focus on skill implementation
What is done. Skills
- Seq2Seq, OpenNMT
- Question generation (SQuAD)
- Chit-chat (Facebook news)
- Chit-chat (Open Subtitles)
- Question Answering (BiDAF)
- Greeting skill
- Skill for asking common questions
- Skill for checking the correctness of the user's answer
What is done. Skill classifier
Evolution of skill classifier
● Baseline (done)
○ no conversational data
○ use a classifier to select skills
● Model with scorer (in progress)
○ we have some data from the human evaluation round
○ 2k dialogs (all bots)
○ we are mostly interested in our bot's scores
● Model with improved scorer (to be done)
○ data from Mechanical Turk
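
The baseline idea (a classifier that maps the user's utterance to a skill) can be sketched as follows. This is a toy word-overlap classifier, not the authors' model: the slides do not say which classifier or training examples were used, so the skill names, the example phrases, and the `chit_chat` fallback are all assumptions made for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical hand-written examples; no conversational data is used,
# which matches the baseline setting but not necessarily its actual data.
EXAMPLES = [
    ("hello hi hey good morning", "greeting"),
    ("what who when where why how", "question_answering"),
    ("ask me a question quiz me", "question_generation"),
    ("i think i feel movies music weather", "chit_chat"),
]


def tokenize(text):
    """Lowercase and split on whitespace after dropping basic punctuation."""
    for ch in "?!,.":
        text = text.replace(ch, " ")
    return text.lower().split()


def train(examples):
    """Count how often each word appears in the examples of each skill."""
    counts = defaultdict(Counter)
    for text, skill in examples:
        counts[skill].update(tokenize(text))
    return counts


def select_skill(counts, utterance, default="chit_chat"):
    """Pick the skill whose example words overlap most with the utterance."""
    tokens = tokenize(utterance)
    scores = {skill: sum(c[t] for t in tokens) for skill, c in counts.items()}
    best = max(scores, key=scores.get)
    # Fall back to the default skill when nothing matches (an assumption).
    return best if scores[best] > 0 else default


counts = train(EXAMPLES)
print(select_skill(counts, "Hello!"))
print(select_skill(counts, "Who is the author?"))
```

With a classifier in this role, adding a skill means adding examples for it rather than rewiring transitions between states.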
What is done. Dialog evaluation scorer
- Two evaluation scorers were built using the ConvAI human evaluation dataset
- Current utterance quality scorer: [context, utterance] => (poor, good)
  - Word-level GRU, sequence length 50
- Overall dialog quality scorer: [overall dialog] => (poor, neutral, good)
  - Word-level GRU, the sequence is the whole dialog
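
As an illustration of the input side of the utterance quality scorer, the sketch below turns a [context, utterance] pair into a fixed-length sequence of 50 word ids, which is the kind of input a word-level GRU consumes. The GRU itself is not shown. The separator token, the reserved ids, and the choice to keep the most recent words when truncating are assumptions; the slides only give the sequence length.

```python
MAX_LEN = 50     # sequence length stated on the slide
PAD, UNK = 0, 1  # hypothetical reserved ids for padding and unknown words
SEP = "<sep>"    # hypothetical separator between context and utterance


def build_vocab(texts):
    """Assign an integer id to every word seen in the given texts."""
    vocab = {"<pad>": PAD, "<unk>": UNK, SEP: 2}
    for text in texts:
        for word in text.lower().split():
            vocab.setdefault(word, len(vocab))
    return vocab


def encode(context, utterance, vocab, max_len=MAX_LEN):
    """Encode context and utterance as exactly max_len word ids."""
    words = context.lower().split() + [SEP] + utterance.lower().split()
    ids = [vocab.get(w, UNK) for w in words]
    ids = ids[-max_len:]  # keep the most recent words (an assumption)
    return ids + [PAD] * (max_len - len(ids))  # right-pad short inputs


vocab = build_vocab(["how are you", "fine thanks"])
x = encode("how are you", "fine thanks", vocab)
print(len(x), x[:6])
```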
Future work
● Improve the classifier by using the current utterance quality scorer
● Set up human dialog evaluation (Amazon Mechanical Turk, Telegram)
● Bot-to-bot conversations using the dialog scorer
● Improve existing skills
● Add new skills (summarization, retrieval-based models)
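
One possible way to use the utterance quality scorer for skill selection is to let every skill propose a response and keep the one the scorer rates highest. The slides do not specify how the scorer will be combined with the classifier, so this is only a sketch under that assumption; `toy_scorer` is a hypothetical stand-in for the trained GRU scorer, and the candidate responses are made up.

```python
def choose_response(context, candidates, scorer):
    """Return the (skill, response) pair with the highest scorer output."""
    return max(candidates.items(), key=lambda item: scorer(context, item[1]))


def toy_scorer(context, utterance):
    # Stand-in for the trained utterance quality scorer: the fraction of the
    # utterance's words that also occur in the context.
    ctx = set(context.lower().split())
    words = utterance.lower().split()
    return sum(w in ctx for w in words) / len(words) if words else 0.0


context = "the text is about jupiter and its moons"
candidates = {
    "chit_chat": "i like pizza",
    "question_generation": "how many moons does jupiter have",
}
skill, response = choose_response(context, candidates, toy_scorer)
print(skill, "->", response)
```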
Summary
- Skill: a narrow model (question generation, question answering, chit-chat, …)
- A conversational agent requires management of such skills
- Management can be done with an FSM, but FSMs are hard to maintain
- Our approach helps to get rid of the FSM and focus on skill implementation
- Main idea: use a classifier that decides which skill to use
- Future work may lead to interesting results
- Talk with our Telegram bot here: https://t.me/IdrisConvaioTestBot
References
1. The Conversational Intelligence Challenge: http://convai.io/
2. Our bot: https://t.me/IdrisConvaioTestBot
3. Seo, M. et al. (2016). Bidirectional attention flow for machine comprehension. arXiv preprint arXiv:1611.01603.
4. Zhou, Q., Yang, N., Wei, F., Tan, C., Bao, H., & Zhou, M. (2017). Neural Question Generation from Text: A Preliminary Study. arXiv preprint arXiv:1704.01792.
5. Bordes, A., & Weston, J. (2016). Learning end-to-end goal-oriented dialog. arXiv preprint arXiv:1605.07683.
6. Serban, I. V. et al. (2017). A Deep Reinforcement Learning Chatbot. arXiv preprint arXiv:1709.02349.
7. Lewis, M. et al. (2017). Deal or No Deal? End-to-End Learning for Negotiation Dialogues. arXiv preprint arXiv:1706.05125.
