Analysis of a State Machine-based
Interactive Dialogue Management System
Ruixiang Qi
Content
• Research Background
• Methods
• Analysis and Evaluation
• Conclusion
What is a spoken dialogue system?
Alexa Prize
Evaluated by user ratings
• Random users
• Completely open domain
• Return users
Dialogue System Architecture
Sports Topic Handler
• Conversation flow control (state machine)
• Real-time, information-based dialogues (database)
• Open-domain question answering
Conversation Flow Controlling
• Content
• Topic switching
• Logic (multi-turn conversation)
Data-Driven Approach
Pros:
1. Does not require expert knowledge
Cons:
1. Requires a lot of data
2. Not logical (not suitable for multi-turn conversation)
Example:
Scraped 1.3 million conversations from Twitter to generate responses in social media
State Machine Approach
Pros:
1. Can generate multi-turn dialogues (with a database)
Cons:
1. Based on fixed data, structures, and ontologies
2. Not real-time
Example:
A static database within the London restaurant domain and the travel domain.
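The state machine approach can be sketched in a few lines. This is a minimal illustration rather than the implementation of any system mentioned in the slides; the state names, prompts, and the `step` and `run` helpers are hypothetical.

```python
# Hypothetical states and prompts, for illustration only.
PROMPTS = {
    "ask_sports": "Do you like sports?",
    "ask_favorite": "What is your favorite sport?",
    "ask_nba": "Do you like the NBA?",
    "exit": "Okay, let's talk about something else.",
}

# (current state, recognized user answer) -> next state
TRANSITIONS = {
    ("ask_sports", "yes"): "ask_favorite",
    ("ask_sports", "no"): "exit",
    ("ask_favorite", "basketball"): "ask_nba",
}


def step(state: str, utterance: str) -> str:
    """Return the next state; unrecognized input falls through to 'exit'."""
    answer = utterance.strip().lower()
    return TRANSITIONS.get((state, answer), "exit")


def run(utterances: list[str], start: str = "ask_sports") -> list[str]:
    """Replay user utterances and collect the prompts the system would say."""
    state = start
    prompts = [PROMPTS[state]]
    for utterance in utterances:
        state = step(state, utterance)
        prompts.append(PROMPTS[state])
        if state == "exit":
            break
    return prompts
```

Because every transition is written by hand, the flow stays coherent over several turns, but any answer outside the table (for example "volleyball") drops out of the scripted path.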
Related Works
Static data (not user-specific)
Monotone conversation (binary)
No support for returning users
Emora: Do you like sports?
User: Yes
Emora: What is your favorite sport?
User: Basketball
Emora: Do you like NBA?
User: Yes
...
Real-time, information-based dialogues (a daily-updated database)
• Talk about real-time information (most recent games)
• Recommend user-specific topics
Summary of Tables
Connect Database to State Machine
• Direct look-up:
• Example: Michael Jordan plays for the Chicago Bulls
• User attribute table
• Example: already covered topics (recent games); return user
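The two ways of connecting the database to the state machine can be illustrated with an in-memory SQLite database. This is only a sketch under assumed names: the `player` and `user_attribute` tables and their columns are hypothetical and are not the schema summarized in the slides.

```python
import sqlite3

# Hypothetical schema, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE player (name TEXT PRIMARY KEY, team TEXT);
    CREATE TABLE user_attribute (user_id TEXT, covered_topic TEXT);
    INSERT INTO player VALUES ('Michael Jordan', 'Chicago Bulls');
    """
)


def team_of(player: str):
    """Direct look-up: fetch a fact that a state can insert into its response."""
    row = conn.execute(
        "SELECT team FROM player WHERE name = ?", (player,)
    ).fetchone()
    return row[0] if row else None


def mark_covered(user_id: str, topic: str) -> None:
    """Record in the user attribute table that a topic was already discussed."""
    conn.execute("INSERT INTO user_attribute VALUES (?, ?)", (user_id, topic))


def already_covered(user_id: str, topic: str) -> bool:
    """Check the user attribute table so a return user is not told the same thing twice."""
    row = conn.execute(
        "SELECT 1 FROM user_attribute WHERE user_id = ? AND covered_topic = ?",
        (user_id, topic),
    ).fetchone()
    return row is not None
```

A state would call `team_of` to fill a slot in its response and `already_covered` before choosing whether to bring up recent games again.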
Convert open-domain conversation to closed domain
Yes/no questions:
Expect positive/negative answers
• Example:
• Do you want to know the best defensive player of that game?
• Do you like NBA?
• Do you like sports?
• Pros: Easy to handle
• Cons: Boring and fixed conversation
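Handling a yes/no question reduces to deciding whether the answer is positive or negative. A minimal sketch, assuming small hand-written word lists (the lists and the `classify_yes_no` name are hypothetical):

```python
import re

# Hypothetical word lists; a real system would need broader coverage.
POSITIVE = {"yes", "yeah", "yep", "sure", "ok", "okay"}
NEGATIVE = {"no", "nope", "nah", "not"}


def classify_yes_no(utterance: str) -> str:
    """Return 'positive', 'negative', or 'unknown' for a yes/no answer."""
    tokens = set(re.findall(r"[a-z']+", utterance.lower()))
    # Negative cues are checked first so that "not really" is not read as a yes.
    if tokens & NEGATIVE:
        return "negative"
    if tokens & POSITIVE:
        return "positive"
    return "unknown"
```

The "unknown" branch is where an answer such as "I watch both" ends up, which the state machine has to handle separately.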
Questions with finite possible answers:
Expect many possible answers, but the number of possible answers is finite
• Example:
• Who is your favorite NBA player?
• Which is your favorite NBA team?
• Pros: Easy to handle
• Cons: Requires manual work (creating states); the conversation breaks easily (when the user answers in an unexpected way)
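A question with a finite answer set can be handled by mapping each expected answer to its own state. The sketch below assumes a tiny hand-written mapping; the team keys, state names, and `next_state` function are hypothetical.

```python
# Hypothetical answer set; in practice it could be filled from the database.
TEAM_STATES = {
    "lakers": "talk_lakers",
    "bulls": "talk_bulls",
    "celtics": "talk_celtics",
}


def next_state(utterance: str, fallback: str = "clarify") -> str:
    """Pick the state whose key appears in the utterance, else a fallback state."""
    text = utterance.lower()
    for key, state in TEAM_STATES.items():
        if key in text:
            return state
    return fallback
```

Each entry is a state someone has to write, and an answer that is not in the mapping lands in the fallback, which is where such a conversation breaks.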
Questions with infinite possible answers
Completely open-domain questions: expect infinitely many possible answers
• Example:
• Why do you like Michael Jordan?
• What do you think about the NBA shutdown due to coronavirus?
In theory, impossible for a state machine to handle
What do you think about the NBA shutdown due to coronavirus?
• Disappointed, sad
• Understandable, it’s for the best
• Doesn’t care
• Happy
• Shocked
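One way to read the list above is that an open question is narrowed to a handful of expected reactions, each of which can get its own state. A naive keyword-based sketch, with hypothetical keyword lists and function name:

```python
# Hypothetical keyword lists for the five expected reaction categories.
REACTIONS = {
    "disappointed": ("disappointed", "sad", "miss", "upset"),
    "understanding": ("understandable", "for the best", "makes sense", "safe"),
    "indifferent": ("don't care", "do not care", "whatever"),
    "happy": ("happy", "glad"),
    "shocked": ("shocked", "surprised", "can't believe"),
}


def classify_reaction(utterance: str) -> str:
    """Return the first category with a matching keyword, else 'other'."""
    text = utterance.lower()
    for label, keywords in REACTIONS.items():
        if any(keyword in text for keyword in keywords):
            return label
    return "other"
```

Substring matching is deliberately simple and will misfire on words such as "unhappy"; it only illustrates how an open question becomes a closed set of branches.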
Trending News
• Death of Kobe Bryant
• Super Bowl
• Coronavirus outbreak
Better Conversation Flow
• Disadvantages:
• Needs expert knowledge in the domain
• Requires manual work (developing and maintaining the database, adding trending news)
Open Domain Question Answering
• Answer any fact-based question
• Does not need to support multi-turn questions
Related works
• Using a knowledge base (Freebase, DBpedia)
• “Answer selection” task
• “Reading comprehension” task
• “End-to-end” question answering
Knowledge base
• An encyclopedia designed for machines to read
• Needs manual work to construct
• Does not cover all questions (the structure of the database is fixed)
Answer Selection Task
• Given a question and a set of candidate answers, answer selection is the task of identifying which of the candidates answers the question correctly.
• Example: WikiQA: given a question and a set of candidate sentences, the task is to choose the correct sentence.
Reading Comprehension Task
• Retrieving the answer to a query from a given paragraph
• Example: SQuAD (Stanford Question Answering Dataset)
“End-to-end” question answering
• Retrieving the answer to a query from a very large corpus (the entire Wikipedia)
• DrQA (Danqi Chen, 2017)
• Document Reader
• Document Retriever
DrQA
• SQuAD goal: answer a query, given a paragraph
• QA on Wikipedia goal: retrieve the paragraph from the whole of Wikipedia that is most relevant to the given query
• QA on Wikipedia goal: retrieve the paragraph from a single Wikipedia article that is most relevant to the given query
Document Reader (SQuAD)
• State of the art (according to the SQuAD website leaderboard):
• ELECTRA: F1 91.4, EM 88.7
• BERT: F1 83.1, EM 80.4
• Human: F1 89.5, EM 86.8
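The F1 and EM (exact match) figures reported for the document reader compare a predicted answer string with a reference answer. The following is a simplified sketch of how such scores are commonly computed for a single prediction; it is not the official SQuAD evaluation script, and the function names are chosen here for illustration.

```python
import re
import string
from collections import Counter


def normalize(text: str) -> list[str]:
    """Lowercase, drop punctuation and English articles, split on whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return text.split()


def exact_match(prediction: str, truth: str) -> float:
    """1.0 if the normalized answers are identical, else 0.0."""
    return float(normalize(prediction) == normalize(truth))


def f1_score(prediction: str, truth: str) -> float:
    """Harmonic mean of token-level precision and recall."""
    pred, gold = normalize(prediction), normalize(truth)
    overlap = sum((Counter(pred) & Counter(gold)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(gold)
    return 2 * precision * recall / (precision + recall)
```

Leaderboard numbers are averages of such per-question scores over a test set, expressed as percentages.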
Document Retriever
• Given an article (10-20 paragraphs), find the paragraph most relevant to the query
• Dataset: a reformatted version of SQuAD
• Old task: given a question, find the best-matching text span in a paragraph
• (paragraph, question, text span)
• New task: given a question, find the best-matching paragraph in an article
• (article, question, paragraph)
• Possible problem with this dataset: for each query, there might be multiple paragraphs that contain the correct answer, yet only one of them is regarded as correct.
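The reformatting from the old task to the new one can be sketched as a grouping step. The record layout below (`ReaderExample` and its fields) is an assumption made for illustration and does not follow the actual SQuAD file format.

```python
from dataclasses import dataclass


@dataclass
class ReaderExample:
    """Hypothetical (paragraph, question, text span) record tagged with its article."""
    article_id: str
    paragraph: str
    question: str
    text_span: str


def to_retriever_examples(
    examples: list[ReaderExample],
) -> list[tuple[list[str], str, int]]:
    """Convert reader examples into (article paragraphs, question, gold paragraph index)."""
    articles: dict[str, list[str]] = {}
    for ex in examples:
        paragraphs = articles.setdefault(ex.article_id, [])
        if ex.paragraph not in paragraphs:
            paragraphs.append(ex.paragraph)
    return [
        (articles[ex.article_id], ex.question, articles[ex.article_id].index(ex.paragraph))
        for ex in examples
    ]
```

Only the paragraph the question was originally written for is marked as gold, so another paragraph in the same article that also contains the answer counts as wrong, which is the problem noted above.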
Term Frequency–Inverse Document Frequency (TF-IDF)
• The combined score of term frequency and inverse document frequency is used to calculate the relevance of each word in the query to each paragraph
• Term frequency: the number of occurrences of a term in the paragraph
• Inverse document frequency: how much information the word provides
Example
• D: {Emory is a good university} {Emily is a good college student} {Apple is a good type of fruit} {which university is good}
• Query: {which university is good}
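The example can be worked through in code. The slides do not say which TF-IDF variant was used, so this sketch assumes raw term counts and an unsmoothed inverse document frequency, log(N / df); the function names are chosen here for illustration.

```python
import math
from collections import Counter

DOCS = [
    "Emory is a good university",
    "Emily is a good college student",
    "Apple is a good type of fruit",
    "which university is good",
]
QUERY = "which university is good"


def tokenize(text: str) -> list[str]:
    return text.lower().split()


def tfidf_scores(query: str, docs: list[str]) -> list[float]:
    """Score each paragraph as the sum of tf * idf over the query terms."""
    tokenized = [tokenize(doc) for doc in docs]
    n_docs = len(docs)
    # Document frequency: in how many paragraphs each term occurs.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = sum(
            tf[term] * math.log(n_docs / df[term])
            for term in tokenize(query)
            if df[term] > 0
        )
        scores.append(score)
    return scores


scores = tfidf_scores(QUERY, DOCS)
```

Under this variant, "is" and "good" occur in every paragraph and contribute nothing. The fourth paragraph ranks first because it alone contains "which", followed by the first paragraph, which shares "university"; the other two score zero.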
Problems with Document Retriever
• Difficult to find a reasonable threshold for the number of paragraphs to retrieve.
Problems with Document Reader
• High latency
Conclusion on Open-Domain Question Answering
• The method does not fit into the context of a chat-bot, which has a strict low-latency requirement
Experiment and Analysis on Sports Component
• Consistent positive correlation between modular updates and user ratings
• Consistent negative correlation between bugs and user ratings
Methods
• Analyzing the logged information
Unweighted user ratings
• Average ratings are calculated separately for conversations that are not influenced by the sports topic handler at all and for those that are.
• Possible bias: some conversations’ ratings are only partially affected by the sports component
Weighted user ratings
• When calculating the user ratings of the sports topic handler, the proportion of the conversation that is handled by the sports topic handler is used as a weight.
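The two rating schemes can be written out as follows. The slides do not give the exact formula, so this sketch assumes a standard weighted mean, and the sample conversations are made-up numbers, not values from the logs.

```python
# Each conversation: (user rating, share of turns handled by the sports topic handler).
# Made-up numbers for illustration; not taken from the logged data.
conversations = [(5.0, 0.5), (3.0, 0.25), (4.0, 0.0), (2.0, 0.0)]


def unweighted_ratings(convs):
    """Mean rating of conversations with any sports turns and of those with none."""
    sports = [rating for rating, share in convs if share > 0]
    other = [rating for rating, share in convs if share == 0]
    return sum(sports) / len(sports), sum(other) / len(other)


def weighted_sports_rating(convs):
    """Mean rating weighted by the share of each conversation handled by sports."""
    total_weight = sum(share for _, share in convs)
    return sum(rating * share for rating, share in convs) / total_weight


sports_avg, non_sports_avg = unweighted_ratings(conversations)
weighted_avg = weighted_sports_rating(conversations)
```

With the weighting, a conversation that spent half of its turns in the sports handler counts twice as much toward the sports rating as one that spent a quarter there, which reduces the bias from partially affected conversations.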
Sports Rating vs. Non-sports Rating (Jan)
• Database update: extract team rankings and top players from the database
• Conversation flow update: added variations in conversations; bug fixes
• Trending news update: added small talk about Kobe Bryant’s death
• Unusually low traffic to the sports topic handler
Sports Rating vs. Non-sports Rating (Feb)
• Trending news: support small talk about the Super Bowl; conversation about each game in the NBA
• Database-related talk: new conversation about player age; fixed wording at the end of the conversation
• Support return-user conversation
• Bug in return-user conversation
• Better transition to other components
• Limited topics covered
• Low traffic to the sports topic handler
Unweighted user ratings vs. weighted user ratings (Feb)
Sports Topic Handler Traffic
Jan 15: Sports added to Emora suggestion list
Error Analysis
• 5.0: 2055
• 4.0-5.0: 1087
• 3.0-4.0: 1057
• 2.0-3.0: 1057
• 1.0-2.0: 1227
Qualitative Error Analysis
• Situation 1: Uncovered topics.
User: let’s talk about sports
Emora: I love sports. My favorite sport is basketball. What is your favorite sport?
User: volleyball
Emora: That is a great sport. I am sorry I only watch basketball. Do you play or watch basketball?
User: No
Qualitative Error Analysis
• Situation 2: Bugs and unexpected utterances.
User: I watch n. b. a.
Emora: oh sorry, i did not pay attention. could you clarify which sport you just mentioned?
Qualitative Error Analysis
• Situation 3: User response ignored
• Chatbot: Do you watch nba or nfl?
• User: i watch both.
• Chatbot: oh sorry, I just started to watch sport recently. That is why I currently could only talk about basketball or football in the sports domain for now.
Conclusion
• Associating a state machine with a database can generate real-time, user-specific, multi-turn dialogues, which boosts the perceived quality (user ratings) of a chat-bot. But it requires manual work and expert knowledge.
• End-to-end question answering does not fit into the context of a chat-bot because of high latency and low accuracy.
References
• See https://docs.google.com/document/d/1Rh_6AiaJ5vBRyTfZAN3SwbDQHHvYWMwrVtCb50aaui8/edit?usp=sharing
Thanks!

 
Free Complete Python - A step towards Data Science
Free Complete Python - A step towards Data ScienceFree Complete Python - A step towards Data Science
Free Complete Python - A step towards Data Science
 
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
 
Introduction to CHERI technology - Cybersecurity
Introduction to CHERI technology - CybersecurityIntroduction to CHERI technology - Cybersecurity
Introduction to CHERI technology - Cybersecurity
 
Generative AI Deep Dive: Advancing from Proof of Concept to Production
Generative AI Deep Dive: Advancing from Proof of Concept to ProductionGenerative AI Deep Dive: Advancing from Proof of Concept to Production
Generative AI Deep Dive: Advancing from Proof of Concept to Production
 
Assuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyesAssuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyes
 
How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...
 
Welocme to ViralQR, your best QR code generator.
Welocme to ViralQR, your best QR code generator.Welocme to ViralQR, your best QR code generator.
Welocme to ViralQR, your best QR code generator.
 

Analysis of a State Machine-based Interactive Dialogue Management System

  • 1. Analysis of a State Machine-based Interactive Dialogue Management System Ruixiang Qi
  • 2. Content ´ Research Background ´ Methods ´ Analysis and Evaluation ´ Conclusion
  • 3. What is a spoken dialogue system
  • 4. Alexa Prize Evaluated by user ratings ´ Random Users ´ Completely Open Domain ´ Return Users
  • 6. Sports Topic Handler ´ Conversation flow controlling (State Machine) ´ Real-time information based dialogues (Database) ´ Open Domain Question Answering
  • 7. Conversation Flow Controlling ´ Content ´ Topic Switching ´ Logic (Multi-turn Conversation)
  • 8. Related Works Data Driven Approach Pros: 1. Does not require expert knowledge Cons: 1. Requires a lot of data 2. Not logical (not suitable for multi-turn conversation) Example: Scraped 1.3 million conversations from Twitter to generate responses in social media State Machine Approach Pros: 1. Can generate multi-turn dialogues (with a database) Cons: 1. Based on fixed data, structures and ontologies 2. Not real-time Example: A static database within the London restaurant domain and the travel domain.
  • 9.
  • 10. Static Data (Not User Specific) Monotone Conversation (Binary) Does Not Support Returning Users Emora: Do you like sports? User: Yes Emora: What is your favorite sport? User: Basketball Emora: Do you like NBA? User: Yes ……..
  • 11. Real-time information based dialogues (A daily-updated Database) ´ Talk about real-time information (most recent games) ´ Recommend user-specific topics
  • 13. Connect Database to State Machine ´ Direct Look Up: ´ Example: Michael Jordan plays for the Chicago Bulls ´ User Attribute Table ´ Example: Already covered topics (recent games); Return User Static Data (Not User Specific) Monotone Conversation (Binary) Does Not Support Returning Users
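The two access patterns on this slide, direct look-up and a per-user attribute table, can be sketched as follows. This is a minimal illustration and not the Emora implementation: the table names (`player_team`, `user_attribute`), the `respond` helper, and the in-memory SQLite database are all assumptions made for the example.

```python
import sqlite3

def build_demo_db():
    """Create an in-memory database with two hypothetical tables."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE player_team (player TEXT PRIMARY KEY, team TEXT)")
    conn.execute("CREATE TABLE user_attribute (user_id TEXT, covered_topic TEXT)")
    conn.execute("INSERT INTO player_team VALUES ('Michael Jordan', 'Chicago Bulls')")
    conn.commit()
    return conn

def lookup_team(conn, player):
    """Direct look-up: return the player's team, or None if the player is unknown."""
    row = conn.execute(
        "SELECT team FROM player_team WHERE player = ?", (player,)
    ).fetchone()
    return row[0] if row else None

def is_covered(conn, user_id, topic):
    """User attribute table: has this topic already been discussed with this user?"""
    row = conn.execute(
        "SELECT 1 FROM user_attribute WHERE user_id = ? AND covered_topic = ?",
        (user_id, topic),
    ).fetchone()
    return row is not None

def mark_covered(conn, user_id, topic):
    """Record that a topic has been discussed with this user."""
    conn.execute("INSERT INTO user_attribute VALUES (?, ?)", (user_id, topic))
    conn.commit()

def respond(conn, user_id, player):
    """A single state's response: consult both tables before answering."""
    team = lookup_team(conn, player)
    if team is None:
        return "I don't know that player yet."
    topic = f"team_of:{player}"
    if is_covered(conn, user_id, topic):
        return f"We already talked about {player}. Want to hear about a recent game instead?"
    mark_covered(conn, user_id, topic)
    return f"{player} plays for {team}."
```

Because covered topics are stored per user, a returning user gets a different response from a first-time user for the same input.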
  • 14. Convert open domain conversation to closed domain
  • 15. Yes/No Questions: Expect positive/negative answers ´ Example: ´ Do you want to know the best defensive player of that game? ´ Do you like NBA? ´ Do you like sports? ´ Pros : Easy to handle ´ Cons : Boring and fixed conversation
  • 16. Questions with finite possible answers: Expect many possible answers but the number of possible answers is finite ´ Example: ´ Who is your favorite NBA player? ´ Which is your favorite NBA team? ´ Pros : Easy to handle ´ Cons : Requires manual work (create states), conversation easily breakable (user answers in an unexpected way)
  • 17. Questions with infinite possible answers Completely open domain question : Expect infinitely many possible answers ´ Example: ´ Why do you like Michael Jordan? ´ What do you think about the NBA shutdown due to coronavirus? In theory, impossible to handle with a state machine
  • 18. What do you think about the NBA shutdown due to coronavirus? ´ Disappointed , Sad ´ Understandable, it’s for the best ´ Doesn’t care ´ Happy ´ Shocked
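The three question types above (yes/no, finite answers, and an open question folded into a handful of expected reactions) can be sketched as one small state machine. This is a hedged illustration only: the keyword sets, the team list, and every state name are hypothetical, and simple keyword matching stands in for whatever language understanding the real system uses.

```python
import re

# All keyword sets and state names below are hypothetical, for illustration only.
YES_WORDS = {"yes", "yeah", "yep", "sure"}
NO_WORDS = {"no", "nope", "nah"}
TEAMS = {"lakers", "bulls", "celtics"}  # finite answer set (illustrative subset)
OPINION_KEYWORDS = {  # open question folded into the five reactions on the slide
    "sad": {"sad", "disappointed"},
    "understanding": {"understandable", "best"},
    "indifferent": {"care", "whatever"},
    "happy": {"happy", "glad"},
    "shocked": {"shocked", "surprised"},
}

def tokens(utterance):
    """Lowercase the utterance and return its words as a set."""
    return set(re.findall(r"[a-z']+", utterance.lower()))

def classify_yes_no(utterance):
    """Yes/no question: expect a positive or negative answer."""
    words = tokens(utterance)
    if words & YES_WORDS:
        return "yes"
    if words & NO_WORDS:
        return "no"
    return "unknown"

def classify_team(utterance):
    """Finite-answer question: return a known team name, or None."""
    matches = tokens(utterance) & TEAMS
    return sorted(matches)[0] if matches else None

def classify_opinion(utterance):
    """Open question: map the answer onto a fixed set of reactions."""
    words = tokens(utterance)
    for label, keywords in OPINION_KEYWORDS.items():
        if words & keywords:
            return label
    return "other"

def next_state(state, utterance):
    """Pick the next state from the current state and the user's utterance."""
    if state == "ask_like_nba":
        answer = classify_yes_no(utterance)
        return {"yes": "ask_favorite_team", "no": "switch_topic"}.get(answer, "clarify")
    if state == "ask_favorite_team":
        return "talk_about_team" if classify_team(utterance) else "clarify"
    if state == "ask_shutdown_opinion":
        return "react_" + classify_opinion(utterance)
    return "clarify"
```

The `clarify` and `react_other` fallbacks mark where the conversation breaks when the user answers in an unexpected way, which is the weakness the slides point out.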
  • 19. Trending News ´ Death of Kobe Bryant ´ Super Bowl ´ Coronavirus outbreak
  • 20.
  • 21. Better Conversation Flow ´ Disadvantage : ´ Needs Expert Knowledge in the Domain ´ Requires Manual Work (Develop and Maintain Database, Adding trending news) Static Data (Not User Specific ) Monotone Conversation (Binary ) Not Support Returning User
  • 22. Open Domain Question Answering ´ Answer any fact-based question ´ Does not need to support multi-turn questions
  • 23. Related works ´ Using a knowledge base (Freebase, DBpedia) ´ “Answer Selection Task” ´ “Reading Comprehension Task” ´ “End-to-end” question answering
  • 24. Knowledge base ´ An encyclopedia that is designed for machines to read ´ Needs manual work to construct ´ Does not cover all questions (the structure of the database is fixed)
  • 25. Answer Selection Task ´ Given a question and a set of candidate answers, answer selection is the task of identifying which of the candidates answers the question correctly. ´ Example: WikiQA : given a question and a set of candidate sentences, the task is to choose the correct sentence.
  • 26. Reading Comprehension Task ´ Retrieving the answer to a query from a given paragraph ´ Example : SQuAD (Stanford Question Answering Dataset)
  • 27. “End-to-end” question answering ´ Retrieving the answer to a query from a very large corpus (the entire Wikipedia) ´ DrQA (Danqi Chen, 2017)
  • 28. DrQA ´ Document Reader (SQuAD) Goal : Answer a query, given a paragraph ´ Document Retriever (QA on Wikipedia) Goal : Retrieve the paragraph from the whole of Wikipedia that is most relevant to the given query ´ QA on Wikipedia Goal : Retrieve the paragraph from a single Wikipedia article that is most relevant to the given query
  • 29. Document Reader (SQuAD) ´ State of the art (according to the SQuAD website leaderboard): ´ ELECTRA : F1 : 91.4 EM : 88.7 ´ BERT : F1 : 83.1 EM : 80.4 ´ Human : F1 : 89.5 EM : 86.8
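For readers unfamiliar with the two metrics on this slide, the sketch below shows how exact match (EM) and token-level F1 are commonly computed for SQuAD-style answers. It follows the usual normalization (lowercasing, dropping punctuation and the articles a/an/the) but is a simplified illustration written for this transcript, not the official evaluation script; in particular it assumes a single gold answer and ignores unanswerable questions.

```python
import re
import string
from collections import Counter

def normalize(text):
    """Lowercase, strip punctuation and articles, and collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())

def exact_match(prediction, gold):
    """EM: 1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize(prediction) == normalize(gold))

def f1_score(prediction, gold):
    """F1: harmonic mean of token-level precision and recall."""
    pred_tokens = normalize(prediction).split()
    gold_tokens = normalize(gold).split()
    overlap = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)
```

F1 gives partial credit when the predicted span overlaps the gold answer without matching it exactly, which is why the F1 figures on the slide are higher than the EM figures.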
  • 30. Document Retriever ´ Given an article (10-20 paragraphs), find the paragraph most relevant to the query ´ Dataset : a reformatted version of SQuAD
  • 31. ´ Old Task : Given a question, find the best-matching text span in the paragraph ´ (paragraph, question, text span) ´ New Task : Given a question, find the best-matching paragraph in the article ´ (article, question, paragraph) ´ Possible problem with this dataset : For each query, there might be multiple paragraphs that contain the correct answer, yet only one of them is regarded as correct.
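The reformatting from (paragraph, question, text span) to (article, question, paragraph) can be sketched as below. The input layout (`data` → `paragraphs` → `context` / `qas` → `question`) follows the published SQuAD JSON files; the function names and the output dictionary keys are hypothetical choices for this example, not the author's code.

```python
def to_retrieval_examples(squad):
    """Turn SQuAD-style data into (article, question, paragraph index) examples."""
    examples = []
    for article in squad["data"]:
        paragraphs = [p["context"] for p in article["paragraphs"]]
        for index, paragraph in enumerate(article["paragraphs"]):
            for qa in paragraph["qas"]:
                examples.append({
                    "article": paragraphs,        # all paragraphs of the article
                    "question": qa["question"],
                    "paragraph_index": index,     # the single paragraph labeled correct
                })
    return examples

def paragraphs_containing(answer_text, paragraphs):
    """List every paragraph index whose text contains the answer string."""
    return [i for i, text in enumerate(paragraphs) if answer_text in text]

# Tiny made-up article to show the labeling problem noted on the slide.
sample = {"data": [{"title": "demo", "paragraphs": [
    {"context": "Michael Jordan played for the Chicago Bulls.",
     "qas": [{"question": "Which team did Michael Jordan play for?"}]},
    {"context": "The Chicago Bulls won six titles in the 1990s.", "qas": []},
]}]}
```

In `sample`, the question is labeled with paragraph 0 only, yet `paragraphs_containing("Chicago Bulls", ...)` also finds paragraph 1. A string match like this only shows that the answer text occurs elsewhere, not that the other paragraph truly answers the question.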
  • 32. Term Frequency–Inverse Document Frequency (TF-IDF) ´ The combined term frequency and inverse document frequency score measures the relevance of each query word to each candidate paragraph ´ Term Frequency : the number of occurrences of a term in the paragraph ´ Inverse Document Frequency : how much information the word provides (words that appear in fewer paragraphs score higher)
  • 33. Example ´ D: {Emory is a good university} {Emily is a good college student} {Apple is a good type of fruit} {which university is good } ´ Query : {which university is good }
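The example above can be worked through with a small TF-IDF scorer. This sketch assumes one common variant (raw term counts multiplied by `log(N / df)`, summed over the query words); the slides do not say which weighting the author used, and the function names are hypothetical.

```python
import math
from collections import Counter

def tokenize(text):
    """Lowercase and split on whitespace."""
    return text.lower().split()

def tf_idf_scores(query, documents):
    """Score each document against the query with raw-count TF times log IDF."""
    doc_tokens = [tokenize(d) for d in documents]
    n_docs = len(doc_tokens)
    # Document frequency: in how many documents does each term appear?
    df = Counter()
    for toks in doc_tokens:
        df.update(set(toks))
    scores = []
    for toks in doc_tokens:
        tf = Counter(toks)
        score = 0.0
        for term in tokenize(query):
            if df[term]:  # skip query terms that occur in no document
                score += tf[term] * math.log(n_docs / df[term])
        scores.append(score)
    return scores

documents = [
    "Emory is a good university",
    "Emily is a good college student",
    "Apple is a good type of fruit",
    "which university is good",
]
scores = tf_idf_scores("which university is good", documents)
```

Under this variant, "is" and "good" appear in all four documents, so their IDF is log(4/4) = 0 and they contribute nothing. "university" appears in two documents and "which" in one, so the fourth document scores highest, the first comes second, and the other two score zero.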
  • 34. Problems with Document retrieval ´ Difficult to find a reasonable threshold of the number of paragraphs to be retrieved.
  • 35. Problems with Document reader ´ High Latency
  • 36. Conclusion on Open-Domain Question Answering ´ The method does not fit into the context of a chat-bot that has a strict requirement of low latency
  • 37. Experiment and Analysis on Sports Component ´ Consistent Positive correlation between modular updates and user ratings ´ Consistent Negative correlation between bugs and user ratings
  • 38. Methods ´ Analyzing the logged information
  • 39. Unweighted user-ratings • Average ratings are calculated separately for conversations that are not influenced by the sports topic handler at all and for those that are. • Possible bias: some conversations’ ratings are only partially affected by the sports component Weighted user-ratings • When calculating the user ratings of the sports topic handler, the proportion of the conversation that is handled by the sports topic handler is used as a weight.
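The difference between the two measures can be sketched as follows. The record layout (`rating`, `sports_turns`, `total_turns`) and the function names are hypothetical, and reading "used as a weight" as a weighted average over conversations is an assumption; the slides do not give the exact formula.

```python
def mean(values):
    """Average of a list, or None when the list is empty."""
    return sum(values) / len(values) if values else None

def unweighted_ratings(conversations):
    """Average rating of conversations with and without any sports turns."""
    sports = [c["rating"] for c in conversations if c["sports_turns"] > 0]
    other = [c["rating"] for c in conversations if c["sports_turns"] == 0]
    return mean(sports), mean(other)

def weighted_sports_rating(conversations):
    """Average rating weighted by the share of turns handled by the sports handler."""
    weighted_sum = 0.0
    total_weight = 0.0
    for c in conversations:
        weight = c["sports_turns"] / c["total_turns"]
        weighted_sum += weight * c["rating"]
        total_weight += weight
    return weighted_sum / total_weight if total_weight else None

# Made-up log records for illustration.
logs = [
    {"rating": 5.0, "sports_turns": 10, "total_turns": 10},  # entirely sports
    {"rating": 2.0, "sports_turns": 1, "total_turns": 10},   # barely touched sports
    {"rating": 4.0, "sports_turns": 0, "total_turns": 10},   # no sports at all
]
```

With these made-up records the unweighted sports average is 3.5, while the weighted figure is about 4.73, because the low rating came from a conversation that the sports handler barely touched.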
  • 40. Sports Rating vs. Non-sports Rating (Jan) ´ Database update: Extract team ranking and team top players from database ´ Conversation flow update: Added variations in conversations; bug fixes ´ Trending news update: Added small talk about Kobe Bryant’s death ´ Unusually low traffic to sports topic handler
  • 41. Sports Rating vs. Non-sports Rating (Feb) ´ Trending news: Support small talk about Super Bowl; conversation about each game in NBA ´ Database-related talk: New conversation about player age; fixed wording at the end of the conversation ´ Support return user conversation ´ Bug in return user conversation ´ Better transition to other component ´ Limited topics covered ´ Low traffic to sports topic handler
  • 42. Unweighted user-ratings vs. Weighted user-ratings (Feb)
  • 43. Sports Topic Handler Traffic Jan 15 : Sports Added to Emora Suggestion List
  • 44.
  • 45. Error Analysis ´ 5.0 : 2055 ´ 4.0-5.0 : 1087 ´ 3.0-4.0: 1057 ´ 2.0-3.0: 1057 ´ 1.0-2.0: 1227
  • 46. Qualitative Error Analysis ´ Situation 1 : Uncovered topics. User: let’s talk about sports Emora: I love sports. My favorite sport is basketball. What is your favorite sport? User: volleyball Emora: That is a great sport. I am sorry I only watch basketball. Do you play or watch basketball? User: No
  • 47. Qualitative Error Analysis ´ Situation 2 : Bugs and unexpected utterance. User: I watch n. b. a. Emora: oh sorry, i did not pay attention. could you clarify which sport you just mentioned?
  • 48. Qualitative Error Analysis ´ Situation 3 : Ignore user response ´ Chatbot: Do you watch nba or nfl ? ´ User: i watch both. ´ Chatbot: oh sorry, I just started to watch sport recently. That is why I currently could only talk about basketball or football in the sports domain for now.
  • 49. Conclusion ´ Associating a state machine with a database can generate real-time, user-specific, multi-turn dialogues, which boosts the perceived quality (user ratings) of a chat-bot. However, it requires manual work and expert knowledge ´ End-to-end question answering does not fit into the context of a chat-bot because of high latency and low accuracy.