Analysis of a State Machine-Based
Interactive Dialogue Management System
Ruixiang Qi
Content
´ Research Background
´ Methods
´ Analysis and Evaluation
´ Conclusion
What is a spoken dialogue system?
The Alexa Prize
Evaluated by user ratings
´ Random Users
´ Completely Open Domain
´ Return Users
Dialogue System Architecture
Sports Topic Handler
´ Conversation flow controlling (State Machine)
´ Real-time information-based dialogues (Database)
´ Open Domain Question Answering
Conversation Flow Controlling
´ Content
´ Topic Switching
´ Logic (Multi-turn Conversation)
Data Driven Approach
Pros:
1. Does not require expert knowledge
Cons:
1. Requires a lot of data
2. Not logical (not suitable for multi-turn conversation)
Example:
Scraped 1.3 million conversations from Twitter to generate responses on social media
State Machine Approach
Pros:
1. Can generate multi-turn dialogues (with a database)
Cons:
1. Based on fixed data, structures, and ontologies
2. Not real-time
Example:
A static database within the London restaurant domain and the travel domain.
Related Works
Static Data (Not User-Specific)
Monotone Conversation (Binary)
Does Not Support Returning Users
Emora: Do you like sports?
User: Yes
Emora: What is your favorite sport?
User: Basketball
Emora: Do you like the NBA?
User: Yes
…
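A conversation flow like the one above can be sketched as a small state machine; the states, prompts, and transitions below are illustrative, not the actual Emora implementation:

```python
# Minimal state-machine dialogue sketch (hypothetical states; not the
# actual Emora implementation). Each state maps a user answer to the
# next state; unexpected answers fall back to the "*" transition.
STATES = {
    "ask_sports": {
        "prompt": "Do you like sports?",
        "yes": "ask_favorite",
        "no": "exit",
    },
    "ask_favorite": {
        "prompt": "What is your favorite sport?",
        "basketball": "ask_nba",
        "*": "exit",          # unexpected answer -> leave the topic
    },
    "ask_nba": {
        "prompt": "Do you like the NBA?",
        "yes": "talk_nba",
        "no": "exit",
    },
}

def step(state, user_answer):
    """Return the next state for a user answer, falling back to '*'."""
    transitions = STATES[state]
    return transitions.get(user_answer.lower(), transitions.get("*", "exit"))

state = "ask_sports"
for answer in ["yes", "basketball", "yes"]:
    state = step(state, answer)
print(state)  # -> talk_nba
```

The `"*"` fallback is exactly where such flows break: any answer outside the enumerated transitions abandons the topic, which is the "monotone conversation" limitation the slides point out.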
Real-time information-based dialogues
(a daily-updated database)
´ Talk about real-time information (most recent games)
´ Recommend user-specific topics
Summary of Tables
Connect Database to State Machine
´ Direct look-up:
´ Example: Michael Jordan plays for the Chicago Bulls
´ User attribute table
´ Example: already covered topics (recent games);
return user
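The two look-up patterns can be sketched against a toy SQLite database (the table and field names below are assumptions for illustration, not the actual schema):

```python
# Sketch of wiring a daily-updated database into the state machine.
# Table and field names are illustrative assumptions.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE players (name TEXT, team TEXT)")
conn.execute("INSERT INTO players VALUES ('Michael Jordan', 'Chicago Bulls')")
conn.execute("CREATE TABLE user_attributes (user_id TEXT, covered_topic TEXT)")

def direct_lookup(player):
    """Direct look-up: fill a slot in the response from the database."""
    row = conn.execute(
        "SELECT team FROM players WHERE name = ?", (player,)).fetchone()
    return f"{player} plays for {row[0]}." if row else None

def already_covered(user_id, topic):
    """User-attribute table: skip topics a returning user has already seen."""
    return conn.execute(
        "SELECT 1 FROM user_attributes WHERE user_id = ? AND covered_topic = ?",
        (user_id, topic)).fetchone() is not None

def mark_covered(user_id, topic):
    conn.execute("INSERT INTO user_attributes VALUES (?, ?)", (user_id, topic))

print(direct_lookup("Michael Jordan"))  # Michael Jordan plays for Chicago Bulls.
mark_covered("u1", "recent_games")
print(already_covered("u1", "recent_games"))  # True
```

Refreshing the `players` table daily is what turns the otherwise static state machine into a source of real-time, user-specific dialogue.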
Static Data (Not User-Specific)
Monotone Conversation (Binary)
Does Not Support Returning Users
Convert open-domain conversation to
closed domain
Yes/No Questions:
Expect positive/negative answers
´ Example:
´ Do you want to know the best defensive player of that game?
´ Do you like NBA?
´ Do you like sports?
´ Pros: Easy to handle
´ Cons: Boring and fixed conversation
Questions with finite possible answers:
Expect many possible answers, but the number of possible answers is finite
´ Example:
´ Who is your favorite NBA player?
´ Which is your favorite NBA team?
´ Pros: Easy to handle
´ Cons: Requires manual work (creating states); the conversation breaks easily
(when the user answers in an unexpected way)
Questions with infinite possible answers
Completely open-domain questions: expect infinitely many possible answers
´ Example:
´ Why do you like Michael Jordan?
´ What do you think about the NBA shutdown due to coronavirus?
In theory, impossible to handle with a state machine
What do you think about the NBA shutdown
due to coronavirus?
´ Disappointed, sad
´ Understandable, it’s for the best
´ Doesn’t care
´ Happy
´ Shocked
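One workaround, implied by the reaction list above, is to fold open-ended answers into a finite set of sentiment buckets so the state machine can branch on them. A minimal keyword-matching sketch (the cue words are illustrative assumptions, not the actual Emora classifier):

```python
# Illustrative keyword matcher that folds open-ended answers into the
# finite reaction buckets above (a workaround, not a full NLU model).
BUCKETS = {
    "disappointed": ["disappointed", "sad", "upset"],
    "understandable": ["understandable", "for the best", "makes sense"],
    "doesn't care": ["don't care", "whatever"],
    "happy": ["happy", "glad", "great"],
    "shocked": ["shocked", "can't believe", "surprised"],
}

def classify(utterance):
    """Return the first bucket whose cue words match, else None."""
    text = utterance.lower()
    for bucket, cues in BUCKETS.items():
        if any(cue in text for cue in cues):
            return bucket
    return None  # unmatched -> fall back to a generic transition

print(classify("I was really sad about it"))    # disappointed
print(classify("It's for the best, honestly"))  # understandable
```

The `None` branch matters: anything the cue lists miss still needs a generic transition, which is why this only approximates, rather than solves, the infinite-answer case.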
Trending News
´ Death of Kobe Bryant
´ Super Bowl
´ Coronavirus outbreak
Better Conversation Flow
´ Disadvantages:
´ Needs expert knowledge in the domain
´ Requires manual work
(developing and maintaining the database, adding trending news)
Static Data (Not User-Specific)
Monotone Conversation (Binary)
Does Not Support Returning Users
´ Answer any fact-based question
´ Does not need to support multi-turn questions
Open Domain Question Answering
Related works
´ Using a knowledge base (Freebase, DBpedia)
´ “Answer Selection Task”
´ “Reading Comprehension Task”
´ “End-to-end” question answering
Knowledge base
´ An encyclopedia designed for machines to read
´ Needs manual work to construct
´ Does not cover all questions (the structure of the database is fixed)
Answer Selection Task
´ Given a question and a set of candidate answers, answer selection is the
task of identifying which of the candidates answers the question correctly.
´ Example: WikiQA : given a question and a set of candidate sentences, the
task is to choose the correct sentence.
Reading Comprehension Task
´ Retrieving the answer to a query from a given paragraph
´ Example: SQuAD (Stanford Question Answering Dataset)
“End-to-end” question answering
´ Retrieving the answer to a query from a very large corpus (the entire
Wikipedia)
´ DrQA (Danqi Chen, 2017)
´ Document Reader
´ Document Retriever
SQuAD goal: Answer a query, given a paragraph
DrQA
QA-on-Wikipedia goal: Retrieve the paragraph from the whole of Wikipedia that is most relevant to the given query
QA-on-a-single-article goal: Retrieve the paragraph from a single Wikipedia article that is most relevant to the given query
´ State of the art (according to the SQuAD website leaderboard):
´ ELECTRA: F1: 91.4, EM: 88.7
´ BERT: F1: 83.1, EM: 80.4
´ Human: F1: 89.5, EM: 86.8
Document Reader (SQuAD)
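The F1 and EM metrics quoted on the leaderboard above can be sketched in a few lines; this is a simplified version of the SQuAD-style scoring (the official scorer additionally strips punctuation and articles before comparing):

```python
# Simplified SQuAD-style metrics: exact match (EM) and token-level F1.
# The official evaluation script also normalizes punctuation/articles.
from collections import Counter

def exact_match(pred, gold):
    """EM: 1 if the lowercased strings are identical, else 0."""
    return int(pred.strip().lower() == gold.strip().lower())

def token_f1(pred, gold):
    """F1: harmonic mean of token-level precision and recall."""
    p, g = pred.lower().split(), gold.lower().split()
    common = sum((Counter(p) & Counter(g)).values())
    if common == 0:
        return 0.0
    precision, recall = common / len(p), common / len(g)
    return 2 * precision * recall / (precision + recall)

print(exact_match("Chicago Bulls", "chicago bulls"))   # 1
print(token_f1("Chicago Bulls", "the Chicago Bulls"))  # 0.8
```

F1 rewards partial overlaps that EM counts as complete misses, which is why the two leaderboard numbers always differ.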
´ Given an article (10-20 paragraphs), find the paragraph most relevant to the query
´ Dataset: a reformatted version of SQuAD
Document Retriever
´ Old task: given a question, find the best-matching text span in a paragraph
´ (paragraph, question, text span)
´ New task: given a question, find the best-matching paragraph in an article
´ (article, question, paragraph)
´ Possible problem with this dataset: for each query there might be multiple
paragraphs that contain the correct answer, yet only one of them is regarded
as correct.
Term Frequency–Inverse Document Frequency (TF-IDF)
´ The combined score of term frequency and inverse document frequency is
used to calculate the relevance of each query word to each candidate
paragraph
´ Term frequency: the number of occurrences of a term in the paragraph
´ Inverse document frequency: how much information the word provides
(rarer words provide more)
Example
´ D: {Emory is a good university} {Emily is a good college student} {Apple is a
good type of fruit}
´ Query: {which university is good}
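Scoring the toy corpus above can be sketched as follows, using plain counts and a smoothed IDF (DrQA's actual retriever uses a more elaborate weighting with hashed bigram features):

```python
# TF-IDF retrieval sketch for the toy corpus above.
# Plain counts with a smoothed IDF; illustrative, not DrQA's exact weighting.
import math
from collections import Counter

docs = [
    "emory is a good university",
    "emily is a good college student",
    "apple is a good type of fruit",
]
query = "which university is good"

tokenized = [d.split() for d in docs]
N = len(tokenized)

def idf(term):
    """Inverse document frequency: rare terms carry more information."""
    df = sum(term in doc for doc in tokenized)
    return math.log((N + 1) / (df + 1)) + 1   # smoothed to avoid div-by-zero

def score(doc_tokens, query_str):
    """Sum tf * idf over the query terms that appear in the document."""
    tf = Counter(doc_tokens)
    return sum(tf[t] * idf(t) for t in query_str.split() if t in tf)

best = max(range(N), key=lambda i: score(tokenized[i], query))
print(docs[best])  # emory is a good university
```

The common words "is" and "good" appear in every document, so their IDF is low and they barely separate the candidates; the rare word "university" is what pulls the first document to the top.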
Problems with Document retrieval
´ Difficult to find a reasonable threshold for the number of paragraphs to retrieve.
Problems with Document reader
´ High Latency
Conclusion on Open-Domain Question
Answering
´ The method does not fit the context of a chatbot, which has a strict
low-latency requirement
Experiment and Analysis on Sports
Component
´ Consistent positive correlation between module updates and user ratings
´ Consistent negative correlation between bugs and user ratings
Methods
´ Analyzing the logged information
Unweighted user ratings
• The average rating of conversations not
influenced by the sports topic handler at all
and the average rating of those influenced
by it are calculated separately.
• Possible bias: some conversations' ratings
are only partially affected by the sports
component.
Weighted user ratings
• When calculating the user ratings of the
sports topic handler, the proportion of the
conversation handled by the sports topic
handler is used as a weight.
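The two rating schemes can be sketched with toy numbers (the ratings and sports-handled fractions below are hypothetical, not actual log data):

```python
# Sketch of the unweighted vs. weighted rating computation.
# Each conversation: (user rating, fraction of turns handled by sports).
# Toy numbers; the real field names and values come from the logs.
conversations = [
    (5.0, 1.0),   # fully handled by the sports topic handler
    (3.0, 0.5),   # half sports, half other components
    (1.0, 0.1),   # barely touched the sports topic handler
]

# Unweighted: every conversation that touched sports counts equally.
unweighted = sum(r for r, _ in conversations) / len(conversations)

# Weighted: each rating counts in proportion to its sports share,
# reducing the bias from partially-affected conversations.
weighted = (sum(r * w for r, w in conversations)
            / sum(w for _, w in conversations))

print(unweighted)  # 3.0
print(weighted)    # 4.125
```

In this toy case the low rating from the barely-sports conversation drags the unweighted average down, while the weighted average mostly reflects the fully-sports conversation, which is exactly the bias the weighting is meant to correct.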
Sports Rating vs. Non-sports Rating (Jan)
Database update:
extracted team rankings and top team
players from the database
Conversation flow update:
added variations in conversations; bug fixes
Trending news update:
added small talk about Kobe Bryant's death
Unusually low traffic to the sports topic handler
Sports Rating vs. Non-sports Rating (Feb)
Trending news:
support for small talk about the Super Bowl;
conversation about each NBA game
Database-related talk:
new conversation about player age;
fixed wording at the end of the conversation
Support for return-user conversation
Bug in return-user conversation
Better transition to other components
Limited topics covered
Low traffic to the sports topic handler
Unweighted user-ratings vs. Weighted
user-ratings (Feb)
Sports Topic Handler Traffic
Jan 15: Sports added to the Emora suggestion list
Error Analysis
´ 5.0: 2055
´ 4.0-5.0: 1087
´ 3.0-4.0: 1057
´ 2.0-3.0: 1057
´ 1.0-2.0: 1227
Qualitative Error Analysis
´ Situation 1: Uncovered topics.
User: let's talk about sports
Emora: I love sports. My favorite sport is basketball.
What is your favorite sport?
User: volleyball
Emora: That is a great sport. I am sorry, I only watch
basketball. Do you play or watch basketball?
User: No
Qualitative Error Analysis
´ Situation 2: Bugs and unexpected utterances.
User: I watch n. b. a.
Emora: oh sorry, i did not pay attention. could you
clarify which sport you just mentioned?
Qualitative Error Analysis
´ Situation 3: Ignored user response
´ Chatbot: Do you watch nba or nfl ?
´ User: i watch both.
´ Chatbot: oh sorry, I just started to watch sport recently. That is why I
currently could only talk about basketball or football in the sports domain
for now.
Conclusion
´ Associating a state machine with a database can generate real-time,
user-specific, multi-turn dialogues, which boosts the perceived quality
(user ratings) of a chatbot, but it requires manual work and expert
knowledge.
´ End-to-end question answering does not fit the context of a chatbot
because of its high latency and low accuracy.
References
´ See
https://docs.google.com/document/d/1Rh_6AiaJ5vBRyTfZAN3SwbDQHHvY
WMwrVtCb50aaui8/edit?usp=sharing
Thanks!
