www.uam.es
Mutation testing
for task-oriented chatbots
Pablo Gómez-Abajo, Sara Pérez-Soler, Pablo C. Cañizares,
Esther Guerra, Juan de Lara
{Pablo.GomezA, Sara.PerezS, Pablo.Cerro, Esther.Guerra, Juan.deLara}@uam.es
Modelling & Software Engineering Research Group
Universidad Autónoma de Madrid, Spain
18th – 21st June 2024
Motivation
• Conversational agents or chatbots are increasingly used to access
all sorts of services using natural language
• Like any other software, chatbots need to be tested
• Usually by defining test scenarios
• However
• There is currently a lack of methods to assess the quality of such
test scenarios
• The result is a high risk of buggy chatbots
2/25
What is a task-oriented chatbot?
• A task-oriented chatbot is a software application that users interact with in natural language
and that is designed to solve a specific task
• e.g., booking a ticket, ordering a pizza, setting a medical appointment
• Via text or speech recognition
• In recent years, the use of chatbots has increased
…and many more
• Since 2022, we also have open-domain chatbots (ChatGPT, etc.) which engage in conversations
on any topic, and which we do not cover in this work
3/25
How do chatbots work?
4/25
[Diagram: the user sends an NL phrase to the chatbot, and the chatbot returns a chatbot response.]
How do chatbots work?
5/25
[Diagram: (1) the user sends an NL phrase; (2) the chatbot matches it against its intents (intent1 … intentn) and selects intenti; (3) it extracts params; (4) it builds the chatbot response, possibly using an external service.]
How do chatbots work?
6/25
1. The user sends a natural language
message to the chatbot
Utterances (user says):
• Hi there!
• I need to fly from Madrid to Salerno on Wednesday at 12 PM
• Good bye!
How do chatbots work?
7/25
1. The user sends a natural language
message to the chatbot
2. The chatbot tries to match the
message with an intention
How do chatbots work?
8/25
Intent: Match the user interaction with an intention
User says       Intent
Hi there!       Greet   (intent matched)
How do chatbots work?
9/25
Intent: Match the user interaction with an intention
User says                                                      Intent
I need to fly from Madrid to Salerno on Wednesday at 12 PM     Book a flight   (intent matched)
How? By providing training phrases: a set of examples that users can use to
express an intention. They are required for matching inputs with intents
How do chatbots work?
10/25
Training phrases: a set of examples that users can use to express an intention
● Must be provided with the intent
Training phrase   Intent
Hi there!         Greet
Hello             Greet
Hi                Greet
Hey               Greet
[Diagram: the user says "Hi there" and the intent is matched.]
How do chatbots work?
11/25
Training phrases: a set of examples that users can use to express an intention
● Must be provided with the intent
Training phrase                                          Intent
Airplane ticket from Madrid to Rome tomorrow at 1 pm     Book a flight
Flight from Madrid to Napoli on 17/06/2024 at 11:30      Book a flight
[Diagram: the user says "I need to fly" and the intent is matched.]
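To make the matching step concrete, here is a minimal Python sketch that picks the intent whose training phrase is most similar to the user's message. The token-overlap (Jaccard) score, the `match_intent` function and the 0.2 threshold are illustrative assumptions, not part of the slides; real platforms use trained language-understanding models instead.

```python
import re

# Training phrases per intent, taken from the slides above.
TRAINING = {
    "Greet": ["Hi there!", "Hello", "Hi", "Hey"],
    "Book a flight": [
        "Airplane ticket from Madrid to Rome tomorrow at 1 pm",
        "Flight from Madrid to Napoli on 17/06/2024 at 11:30",
    ],
}


def tokens(text):
    """Lowercase alphanumeric tokens of a phrase, as a set."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a, b):
    """Token-overlap similarity in [0, 1]; a stand-in for a real NLU score."""
    ta, tb = tokens(a), tokens(b)
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def match_intent(utterance, training=TRAINING, threshold=0.2):
    """Return (intent, score) of the best training phrase, or (None, score) below the threshold."""
    best_intent, best_score = None, 0.0
    for intent, phrases in training.items():
        for phrase in phrases:
            score = jaccard(utterance, phrase)
            if score > best_score:
                best_intent, best_score = intent, score
    if best_score < threshold:
        return None, best_score
    return best_intent, best_score
```

With these training phrases, "Hi there!" matches Greet, the flight request from the earlier slides matches Book a flight, and "Good bye!" matches nothing because no intent provides a similar phrase.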
How do chatbots work?
12/25
3. Chatbot extracts information from the message or asks for missing information
User says                                                      Intent
I need to fly from Madrid to Salerno on Wednesday at 12 PM     Book a flight
to:Salerno
How do chatbots work?
13/25
3. Chatbot extracts information from the message or asks for missing information
User says                                                      Intent
I need to fly from Madrid to Salerno on Wednesday at 12 PM     Book a flight
At this point, the chatbot extracts key information from the input: parameters
Parameter   Value            Entity
from        Madrid           City
to          Salerno          City
when        Wed. at 12 PM    Time
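The extraction step can be sketched with regular expressions, as below. This is only an illustration: the `CITY` literal list, the three patterns and the `extract_params` helper are assumptions made for this example, and chatbot platforms normally resolve entities with their own NLU components.

```python
import re

# Hypothetical "City" entity: the literals the chatbot recognises.
CITY = ["Madrid", "Salerno", "Rome", "Napoli"]
CITY_RE = "|".join(CITY)

# One illustrative pattern per parameter of the "Book a flight" intent.
PATTERNS = {
    "from": re.compile(rf"\bfrom\s+({CITY_RE})\b", re.IGNORECASE),
    "to": re.compile(rf"\bto\s+({CITY_RE})\b", re.IGNORECASE),
    "when": re.compile(
        r"\bon\s+(\w+\s+at\s+\d{1,2}(?::\d{2})?\s*(?:AM|PM)?)", re.IGNORECASE
    ),
}


def extract_params(utterance):
    """Return the parameters found and the names of those still missing."""
    found = {}
    for name, pattern in PATTERNS.items():
        match = pattern.search(utterance)
        if match:
            found[name] = match.group(1)
    missing = [name for name in PATTERNS if name not in found]
    return found, missing
```

The `missing` list corresponds to the "asks for missing information" part of step 3: a chatbot would prompt the user once for each name in it.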
How do chatbots work?
14/25
4. Build the response and send it back to the user
● Responses to the user:
○ text, images
● External service queries
○ External REST API
○ Database, etc.
User says                                                      Action
I need to fly from Madrid to Salerno on Wednesday at 12 PM     The price of the ticket is 150$. Provide a card nº and billing name
Both responses to the user and external service queries are called actions
Testing chatbots
15/25
Test case input   Test case output
Hi there!         Hi! How can I help you?
… complete conversations
[Diagram: the user says "Hi there!" and the chatbot answers "Hi! How can I help you?"]
Testing chatbots
16/25
We use Botium and Rasa test as the testing tools for the chatbots

Single test interaction (convo file, one conversation step):
#me
Hi there!
#bot
What day do you want to come in?

Combination of multiple tests (convo file that refers to utterance sets):
#me
GREET_UTTERANCES_USER
#bot
GREET_RESPONSES_USER

Multiple user utterances (utterances):
GREET_UTTERANCES_USER
Hi there!
Hi
Hello
Hey

Possible responses (responses):
GREET_RESPONSES_USER
Hi! How can I help you?
Hello, what do you need?
Greetings! This is the flight ticket assistant Antony, how can i help you?
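The semantics of a convo that refers to utterance sets can be sketched in a few lines of Python: every user utterance in the set is sent, and any response in the response set is accepted. This is not Botium's API; `expand`, `run_convo` and `toy_bot` are hypothetical names, and the sketch only handles a single user/bot exchange.

```python
# Utterance sets from the slide; a name in a convo step refers to one of them.
UTTERANCES = {
    "GREET_UTTERANCES_USER": ["Hi there!", "Hi", "Hello", "Hey"],
    "GREET_RESPONSES_USER": [
        "Hi! How can I help you?",
        "Hello, what do you need?",
        "Greetings! This is the flight ticket assistant Antony, how can i help you?",
    ],
}

# A convo as (speaker, text-or-set-name) steps.
CONVO = [("me", "GREET_UTTERANCES_USER"), ("bot", "GREET_RESPONSES_USER")]


def expand(text):
    """Resolve a set name to its phrases; a literal phrase stands for itself."""
    return UTTERANCES.get(text, [text])


def run_convo(convo, bot):
    """Return (utterance, reply) pairs that failed; handles one me/bot exchange only."""
    (_, said), (_, expected) = convo
    accepted = expand(expected)
    failures = []
    for utterance in expand(said):
        reply = bot(utterance)
        if reply not in accepted:
            failures.append((utterance, reply))
    return failures


def toy_bot(message):
    """Hypothetical chatbot under test: greets anything starting with 'hi' or 'he'."""
    if message.lower().startswith(("hi", "he")):
        return "Hi! How can I help you?"
    return "Sorry?"
```

An empty result from `run_convo(CONVO, toy_bot)` means the test passed for all four greeting utterances.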
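Testing chatbots
17/25
… and complex conversations
[Diagram: a branching conversation. The user says "Hi there!" and the chatbot answers "Hi! How can I help you?". The user then says either "I need to fly from …", answered with "The price of the ticket …", or "I lost my baggage", answered with "Please, provide the flight ticket id".]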
Mutation testing for chatbots
18/25
Intent "Order a coffee"
User says:
• What kinds of coffee are available?
• What kinds of coffee can I order?
• What can I drink here?
• Tell me what drinks there are
Action: You can take an espresso or an americano

Intent "Order a wine"
User says:
• What kinds of wine are available?
• What kinds of wine can I order?
• What can I drink here?
• Tell me what drinks there are
Action: You can take an Italian wine or a French wine

[Diagram: the NL phrase "Tell me what kinds of coffee I can drink here" goes through the chatbot pipeline (match intent among Order a coffee, Order a wine, …, intentn; extract params; build response; external service), and the matched intent is marked.]
Mutation testing for chatbots
18/25
Semantic similarity
Scores shown on the slide (four values, next to the four training phrases): 0.512, 0.538, 0.475, 0.474
Order a coffee: Keeps the two most different phrases
[Diagram: the same two intents and pipeline as before, with the NL phrase "Tell me what kinds of coffee I can drink here".]
Mutation testing for chatbots
18/25
Intent "Order a coffee" (mutant: only the two most different phrases are kept)
User says:
• What can I drink here?
• Tell me what drinks there are
Action: You can take an espresso or an americano

Intent "Order a wine"
User says:
• What kinds of wine are available?
• What kinds of wine can I order?
• What can I drink here?
• Tell me what drinks there are
Action: You can take an Italian wine or a French wine

[Diagram: the same pipeline with the NL phrase "Tell me what kinds of coffee I can drink here", the labels Order a wine, Intent matched and Order a coffee, and a Test-suite box.]
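The "keeps the two most different phrases" mutation (K2Pmin in the operator catalogue) can be sketched as follows. The sketch assumes that the four scores on the slide belong to the four "Order a coffee" phrases in slide order, which is consistent with the two phrases that survive in the mutant; the dictionary layout and the `k2p_min` function are hypothetical and do not reproduce the authors' tooling.

```python
import copy

# Scores as read from the slide, paired with phrases in slide order (an assumption).
COFFEE_SCORES = {
    "What kinds of coffee are available?": 0.512,
    "What kinds of coffee can I order?": 0.538,
    "What can I drink here?": 0.475,
    "Tell me what drinks there are": 0.474,
}

# Simplified chatbot: intent name -> training phrases.
chatbot = {
    "Order a coffee": list(COFFEE_SCORES),
    "Order a wine": [
        "What kinds of wine are available?",
        "What kinds of wine can I order?",
        "What can I drink here?",
        "Tell me what drinks there are",
    ],
}


def k2p_min(model, intent, scores):
    """Return a mutant where `intent` keeps only its 2 lowest-scoring phrases."""
    mutant = copy.deepcopy(model)
    phrases = mutant[intent]
    keep = set(sorted(phrases, key=lambda p: scores[p])[:2])
    # Preserve the original phrase order among the survivors.
    mutant[intent] = [p for p in phrases if p in keep]
    return mutant


mutant = k2p_min(chatbot, "Order a coffee", COFFEE_SCORES)
```

After the mutation, both intents share exactly the same two generic phrases, so a test that sends a coffee-specific phrase is a good candidate for detecting the mutant.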
Mutation operators for chatbots
19/25
Emulation of common errors of chatbot developers

Operators for training phrases
DPmax    Deletes the most representative phrase of an intent
DPmin    Deletes the most different phrase of an intent
DPWP     Deletes training phrases with required parameter
DPWL     Deletes training phrases with literal
K2Pmax   Keeps the 2 most representative phrases
K2Pmin   Keeps the 2 most different phrases
MPmax    Moves the most representative phrase to the most similar intent
MPmin    Moves the most different phrase to the most different intent

Operators for intents
DIP      Deletes intent parameter
DPP      Deletes parameter prompt
SPO      Sets required parameter to optional
DFI      Deletes fallback intent

Operators for entities
CRE      Changes regular expression
DLE      Deletes literal from entity

Operators for actions
DA       Deletes actions
DPR      Deletes a parameter used in a response
SO       Swaps outputs

Operators for conversation flows
DCS      Deletes conversation step
DCB      Deletes conversation bifurcation
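Two of the intent operators, DFI and SPO, are sketched below on a plain Python dictionary. The model layout, the parameter prompts and the function names are hypothetical and chosen only for illustration; according to the slides the actual operators are written with WODEL over a chatbot model.

```python
import copy

# Hypothetical, simplified chatbot model.
chatbot = {
    "intents": {
        "Book a flight": {
            "fallback": False,
            "parameters": {
                "from": {"entity": "City", "required": True, "prompt": "Where do you fly from?"},
                "to": {"entity": "City", "required": True, "prompt": "Where do you fly to?"},
            },
        },
        "Default fallback": {"fallback": True, "parameters": {}},
    }
}


def dfi(model):
    """DFI: delete the fallback intents; no mutant if there is none to delete."""
    kept = {name: i for name, i in model["intents"].items() if not i["fallback"]}
    if len(kept) == len(model["intents"]):
        return []  # operator not applicable to this chatbot
    mutant = copy.deepcopy(model)
    mutant["intents"] = copy.deepcopy(kept)
    return [mutant]


def spo(model):
    """SPO: one mutant per required parameter, with that parameter made optional."""
    mutants = []
    for intent_name, intent in model["intents"].items():
        for param_name, param in intent["parameters"].items():
            if param["required"]:
                mutant = copy.deepcopy(model)
                mutant["intents"][intent_name]["parameters"][param_name]["required"] = False
                mutants.append(mutant)
    return mutants
```

Each function returns a list because an operator may apply zero, one or several times to the same chatbot, and every application yields a separate mutant.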
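Mutation testing for chatbots
20/25
[Diagram: the mutation testing process, supported by WODEL-TEST, in five numbered steps]
1. parse: the Dialogflow chatbot is parsed into a chatbot model that «conforms to» the CONGA meta-model
2. annotate: the chatbot model becomes an annotated chatbot model (Tensorflow) that «conforms to» the annotation meta-model
3. mutate: the mutation operators (WODEL) produce chatbot model mutants
4. generate: a chatbot impl. is generated from each mutant
5. test: the test suites are run against the chatbot impl., and the mutation analysis produces a report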
RQ1: How applicable are the defined mutation operators?
RQ2: How effective are the defined mutation operators?
21/25
[Bar chart: mutation score by mutation operator, split into killed and alive mutants. Values shown on the slide, without their operator labels: 39%, 48%, 67%, 60%, 77%, 73%, 78%, 80%, 67%, 0%, 0%, 40%, 50%, 76%, 14%, 89%, 87%, 96%]
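RQ3: How effective is the mutation testing process?
22/25
[Bar chart: mutation score by test suite kind, split into killed and alive mutants. Values in the order shown on the slide:]
Test suite kind     Value
Botium automatic    45%
Botium by hand      94%
Rasa test           20%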
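For reference, the mutation score is commonly computed as the percentage of non-equivalent mutants that the test suite kills. The slides do not spell out the formula, so the sketch below uses this standard definition, and the counts in the usage note are made up for illustration.

```python
def mutation_score(killed, total, equivalent=0):
    """Percentage of non-equivalent mutants killed by a test suite."""
    considered = total - equivalent
    if considered <= 0:
        raise ValueError("no non-equivalent mutants to score")
    if not 0 <= killed <= considered:
        raise ValueError("killed must be between 0 and the number of scored mutants")
    return 100.0 * killed / considered
```

For example, a hypothetical suite that kills 47 of 50 mutants scores `mutation_score(47, 50)`, that is 94.0, and discounting 2 equivalent mutants out of 12 with 9 killed gives `mutation_score(9, 12, equivalent=2)`, that is 90.0.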
RQ4: How efficient is the mutation testing process?
23/25
The mutation testing process of 67% of the chatbots was completed in less than 90 minutes
[Bar chart: one bar per chatbot, axis from 0% to 35%. Chatbot names and values are paired here in the order in which they appear on the slide:]
Chatbot                        Value
Covid19_tracer                 0.1%
bikeShop                       0.2%
e2e-bot                        0.3%
Spaceonova                     1.0%
personal-bot                   1.2%
yassinelamarti                 1.4%
Rasa-demo                      1.6%
256644                         1.6%
h4h-chatbot                    1.7%
diagrams2ai                    2.6%
dusbot                         4.9%
legal-alien-chatbot            8.4%
Email-WhatsApp-Integration     12.8%
lankbanfinance                 27.5%
Data-mining                    34.7%
Conclusions
• Technology-independent approach for mutation testing (MuT) of chatbots with
• A catalogue of 19 mutation operators for
• Training phrases, intents, entities, chatbot actions and conversation flows
• Support for test scenarios from Botium and Rasa test
• Experiment with 15 chatbots and 29 test suites
• Positive results regarding applicability, effectiveness and efficiency
• Room for improvement in 86% of the test suites
• Running times of MuT for chatbots are costly but acceptable
• Less than 90 minutes for 67% of the chatbots
24/25
Future work
• Automate the detection of semantically equivalent mutants
• e.g., using confidence decrease heuristics
• Automate the synthesis of tests able to kill the alive mutants
• Adapt our approach to LLM-based agents
25/25
Thank you!
./ Wodel-Test
Dataset
Tool demo

More Related Content

Similar to Mutation Testing for Task-Oriented Chatbots

WSO2Con US 2013 - Thinking of you. Customizing the store of the WSO2 API Mana...
WSO2Con US 2013 - Thinking of you. Customizing the store of the WSO2 API Mana...WSO2Con US 2013 - Thinking of you. Customizing the store of the WSO2 API Mana...
WSO2Con US 2013 - Thinking of you. Customizing the store of the WSO2 API Mana...
WSO2
 
Free software basics
Free software basicsFree software basics
Free software basics
Vitor Pamplona
 
Blending Delicious User Experiences For Windows Phone 7 (by VerySoftware)
Blending Delicious User Experiences For Windows Phone 7 (by VerySoftware)Blending Delicious User Experiences For Windows Phone 7 (by VerySoftware)
Blending Delicious User Experiences For Windows Phone 7 (by VerySoftware)
VerySoftware
 
Building A Lead Generating Chatbot
Building A Lead Generating ChatbotBuilding A Lead Generating Chatbot
Building A Lead Generating Chatbot
Whitehat Inbound Marketing Agency
 
How To Easy Essay Topics. Online assignment writing service.
How To Easy Essay Topics. Online assignment writing service.How To Easy Essay Topics. Online assignment writing service.
How To Easy Essay Topics. Online assignment writing service.
Melissa Lofton
 
Chatbot and AI Design Principles
Chatbot and AI Design PrinciplesChatbot and AI Design Principles
Chatbot and AI Design Principles
Mauricio Perez
 
Chat bots: what, why and (a bit of) how?
Chat bots: what, why and (a bit of) how?Chat bots: what, why and (a bit of) how?
Chat bots: what, why and (a bit of) how?
Radu Irava
 
Aboutdistanceconversion blogspot com
Aboutdistanceconversion blogspot comAboutdistanceconversion blogspot com
Aboutdistanceconversion blogspot com
Gabriel Barlow
 
How to Teach and Learn with ChatGPT - BETT 2023
How to Teach and Learn with ChatGPT - BETT 2023How to Teach and Learn with ChatGPT - BETT 2023
How to Teach and Learn with ChatGPT - BETT 2023
Dominik Lukes
 
How To Make Your College Admission Essay Stand Out
How To Make Your College Admission Essay Stand OutHow To Make Your College Admission Essay Stand Out
How To Make Your College Admission Essay Stand Out
Michelle Wilson
 
Clever Messenger Review
Clever Messenger Review Clever Messenger Review
Clever Messenger Review
New World Trade 2022
 
Build an Application from Idea to Release
Build an Application from Idea to ReleaseBuild an Application from Idea to Release
Build an Application from Idea to Release
ideatoipo
 
Conversational UI Design and Research at UXSEA Summit 2018
Conversational UI Design and Research at UXSEA Summit 2018Conversational UI Design and Research at UXSEA Summit 2018
Conversational UI Design and Research at UXSEA Summit 2018
Kuldeep Kulshreshtha
 
BAM Congres 2017: Mark Herman - When and how to start an interactive dialogue...
BAM Congres 2017: Mark Herman - When and how to start an interactive dialogue...BAM Congres 2017: Mark Herman - When and how to start an interactive dialogue...
BAM Congres 2017: Mark Herman - When and how to start an interactive dialogue...
BAM - Belgian Association of Marketing
 
Small, simple and smelly: What we can learn from examining end-user artifacts?
Small, simple and smelly: What we can learn from examining end-user artifacts?Small, simple and smelly: What we can learn from examining end-user artifacts?
Small, simple and smelly: What we can learn from examining end-user artifacts?
Felienne Hermans
 
Iadvize frenchwebwebinar-201702bots-170207091635
Iadvize frenchwebwebinar-201702bots-170207091635Iadvize frenchwebwebinar-201702bots-170207091635
Iadvize frenchwebwebinar-201702bots-170207091635
Olivier PROVOT ◆ BU Manager
 
Chatbots & expérience client : comment les chatbots transforment l'expérience...
Chatbots & expérience client : comment les chatbots transforment l'expérience...Chatbots & expérience client : comment les chatbots transforment l'expérience...
Chatbots & expérience client : comment les chatbots transforment l'expérience...
iAdvize
 
Your big idea.pptx
Your big idea.pptxYour big idea.pptx
Your big idea.pptx
SandeepKumar608872
 
How To Write An Anecdote In An Essay
How To Write An Anecdote In An EssayHow To Write An Anecdote In An Essay
How To Write An Anecdote In An Essay
Laura Benitez
 
Tips from a retired facebook app developer
Tips from a retired facebook app developerTips from a retired facebook app developer
Tips from a retired facebook app developer
Aymeric Gaurat-Apelli
 

Similar to Mutation Testing for Task-Oriented Chatbots (20)

WSO2Con US 2013 - Thinking of you. Customizing the store of the WSO2 API Mana...
WSO2Con US 2013 - Thinking of you. Customizing the store of the WSO2 API Mana...WSO2Con US 2013 - Thinking of you. Customizing the store of the WSO2 API Mana...
WSO2Con US 2013 - Thinking of you. Customizing the store of the WSO2 API Mana...
 
Free software basics
Free software basicsFree software basics
Free software basics
 
Blending Delicious User Experiences For Windows Phone 7 (by VerySoftware)
Blending Delicious User Experiences For Windows Phone 7 (by VerySoftware)Blending Delicious User Experiences For Windows Phone 7 (by VerySoftware)
Blending Delicious User Experiences For Windows Phone 7 (by VerySoftware)
 
Building A Lead Generating Chatbot
Building A Lead Generating ChatbotBuilding A Lead Generating Chatbot
Building A Lead Generating Chatbot
 
How To Easy Essay Topics. Online assignment writing service.
How To Easy Essay Topics. Online assignment writing service.How To Easy Essay Topics. Online assignment writing service.
How To Easy Essay Topics. Online assignment writing service.
 
Chatbot and AI Design Principles
Chatbot and AI Design PrinciplesChatbot and AI Design Principles
Chatbot and AI Design Principles
 
Chat bots: what, why and (a bit of) how?
Chat bots: what, why and (a bit of) how?Chat bots: what, why and (a bit of) how?
Chat bots: what, why and (a bit of) how?
 
Aboutdistanceconversion blogspot com
Aboutdistanceconversion blogspot comAboutdistanceconversion blogspot com
Aboutdistanceconversion blogspot com
 
How to Teach and Learn with ChatGPT - BETT 2023
How to Teach and Learn with ChatGPT - BETT 2023How to Teach and Learn with ChatGPT - BETT 2023
How to Teach and Learn with ChatGPT - BETT 2023
 
How To Make Your College Admission Essay Stand Out
How To Make Your College Admission Essay Stand OutHow To Make Your College Admission Essay Stand Out
How To Make Your College Admission Essay Stand Out
 
Clever Messenger Review
Clever Messenger Review Clever Messenger Review
Clever Messenger Review
 
Build an Application from Idea to Release
Build an Application from Idea to ReleaseBuild an Application from Idea to Release
Build an Application from Idea to Release
 
Conversational UI Design and Research at UXSEA Summit 2018
Conversational UI Design and Research at UXSEA Summit 2018Conversational UI Design and Research at UXSEA Summit 2018
Conversational UI Design and Research at UXSEA Summit 2018
 
BAM Congres 2017: Mark Herman - When and how to start an interactive dialogue...
BAM Congres 2017: Mark Herman - When and how to start an interactive dialogue...BAM Congres 2017: Mark Herman - When and how to start an interactive dialogue...
BAM Congres 2017: Mark Herman - When and how to start an interactive dialogue...
 
Small, simple and smelly: What we can learn from examining end-user artifacts?
Small, simple and smelly: What we can learn from examining end-user artifacts?Small, simple and smelly: What we can learn from examining end-user artifacts?
Small, simple and smelly: What we can learn from examining end-user artifacts?
 
Iadvize frenchwebwebinar-201702bots-170207091635
Iadvize frenchwebwebinar-201702bots-170207091635Iadvize frenchwebwebinar-201702bots-170207091635
Iadvize frenchwebwebinar-201702bots-170207091635
 
Chatbots & expérience client : comment les chatbots transforment l'expérience...
Chatbots & expérience client : comment les chatbots transforment l'expérience...Chatbots & expérience client : comment les chatbots transforment l'expérience...
Chatbots & expérience client : comment les chatbots transforment l'expérience...
 
Your big idea.pptx
Your big idea.pptxYour big idea.pptx
Your big idea.pptx
 
How To Write An Anecdote In An Essay
How To Write An Anecdote In An EssayHow To Write An Anecdote In An Essay
How To Write An Anecdote In An Essay
 
Tips from a retired facebook app developer
Tips from a retired facebook app developerTips from a retired facebook app developer
Tips from a retired facebook app developer
 

More from Pablo Gómez Abajo

Wodel-Edu: A tool for the generation and evaluation of diagram-based exercises
Wodel-Edu: A tool for the generation and evaluation of diagram-based exercisesWodel-Edu: A tool for the generation and evaluation of diagram-based exercises
Wodel-Edu: A tool for the generation and evaluation of diagram-based exercises
Pablo Gómez Abajo
 
Automated engineering of domain-specific metamorphic testing environments
Automated engineering of domain-specific metamorphic testing environmentsAutomated engineering of domain-specific metamorphic testing environments
Automated engineering of domain-specific metamorphic testing environments
Pablo Gómez Abajo
 
Wodel-Test: A Model-Based Framework for Language-Independent Mutation Testing
Wodel-Test: A Model-Based Framework for Language-Independent Mutation TestingWodel-Test: A Model-Based Framework for Language-Independent Mutation Testing
Wodel-Test: A Model-Based Framework for Language-Independent Mutation Testing
Pablo Gómez Abajo
 
Wodel-Test: A Model-Based Framework for Language-Independent Mutation Testing...
Wodel-Test: A Model-Based Framework for Language-Independent Mutation Testing...Wodel-Test: A Model-Based Framework for Language-Independent Mutation Testing...
Wodel-Test: A Model-Based Framework for Language-Independent Mutation Testing...
Pablo Gómez Abajo
 
Wodel-Edu: An MDE Solution for the Generation and Evaluation of Diagram-based...
Wodel-Edu: An MDE Solution for the Generation and Evaluation of Diagram-based...Wodel-Edu: An MDE Solution for the Generation and Evaluation of Diagram-based...
Wodel-Edu: An MDE Solution for the Generation and Evaluation of Diagram-based...
Pablo Gómez Abajo
 
Generation of mutation testing tools with Wodel-Test
Generation of mutation testing tools with Wodel-TestGeneration of mutation testing tools with Wodel-Test
Generation of mutation testing tools with Wodel-Test
Pablo Gómez Abajo
 
Programación de macros en Microsoft Excel VBA
Programación de macros en Microsoft Excel VBAProgramación de macros en Microsoft Excel VBA
Programación de macros en Microsoft Excel VBA
Pablo Gómez Abajo
 
PhD defense presentation
PhD defense presentationPhD defense presentation
PhD defense presentation
Pablo Gómez Abajo
 
Seed Model Synthesis for Testing Model-based Mutation Operators
Seed Model Synthesis for Testing Model-based Mutation OperatorsSeed Model Synthesis for Testing Model-based Mutation Operators
Seed Model Synthesis for Testing Model-based Mutation Operators
Pablo Gómez Abajo
 
Mutation Testing for DSLs (Tool Demo)
Mutation Testing for DSLs (Tool Demo)Mutation Testing for DSLs (Tool Demo)
Mutation Testing for DSLs (Tool Demo)
Pablo Gómez Abajo
 
Towards a model-driven engineering solution for language independent mutation...
Towards a model-driven engineering solution for language independent mutation...Towards a model-driven engineering solution for language independent mutation...
Towards a model-driven engineering solution for language independent mutation...
Pablo Gómez Abajo
 
Wodel: A DSL for Model Mutation; and Wodel-Edu: its Application to the Automa...
Wodel: A DSL for Model Mutation; and Wodel-Edu: its Application to the Automa...Wodel: A DSL for Model Mutation; and Wodel-Edu: its Application to the Automa...
Wodel: A DSL for Model Mutation; and Wodel-Edu: its Application to the Automa...
Pablo Gómez Abajo
 
A DSL for Model Mutation and its Applications to Different Domains
A DSL for Model Mutation and its Applications to Different DomainsA DSL for Model Mutation and its Applications to Different Domains
A DSL for Model Mutation and its Applications to Different Domains
Pablo Gómez Abajo
 
Un framework para la generación automática de ejercicios mediante técnicas de...
Un framework para la generación automática de ejercicios mediante técnicas de...Un framework para la generación automática de ejercicios mediante técnicas de...
Un framework para la generación automática de ejercicios mediante técnicas de...
Pablo Gómez Abajo
 
Wodel: A Domain-Specific Language for Model Mutation
Wodel: A Domain-Specific Language for Model MutationWodel: A Domain-Specific Language for Model Mutation
Wodel: A Domain-Specific Language for Model Mutation
Pablo Gómez Abajo
 

More from Pablo Gómez Abajo (15)

Wodel-Edu: A tool for the generation and evaluation of diagram-based exercises
Wodel-Edu: A tool for the generation and evaluation of diagram-based exercisesWodel-Edu: A tool for the generation and evaluation of diagram-based exercises
Wodel-Edu: A tool for the generation and evaluation of diagram-based exercises
 
Automated engineering of domain-specific metamorphic testing environments
Automated engineering of domain-specific metamorphic testing environmentsAutomated engineering of domain-specific metamorphic testing environments
Automated engineering of domain-specific metamorphic testing environments
 
Wodel-Test: A Model-Based Framework for Language-Independent Mutation Testing
Wodel-Test: A Model-Based Framework for Language-Independent Mutation TestingWodel-Test: A Model-Based Framework for Language-Independent Mutation Testing
Wodel-Test: A Model-Based Framework for Language-Independent Mutation Testing
 
Wodel-Test: A Model-Based Framework for Language-Independent Mutation Testing...
Wodel-Test: A Model-Based Framework for Language-Independent Mutation Testing...Wodel-Test: A Model-Based Framework for Language-Independent Mutation Testing...
Wodel-Test: A Model-Based Framework for Language-Independent Mutation Testing...
 
Wodel-Edu: An MDE Solution for the Generation and Evaluation of Diagram-based...
Wodel-Edu: An MDE Solution for the Generation and Evaluation of Diagram-based...Wodel-Edu: An MDE Solution for the Generation and Evaluation of Diagram-based...
Wodel-Edu: An MDE Solution for the Generation and Evaluation of Diagram-based...
 
Generation of mutation testing tools with Wodel-Test
Generation of mutation testing tools with Wodel-TestGeneration of mutation testing tools with Wodel-Test
Generation of mutation testing tools with Wodel-Test
 
Programación de macros en Microsoft Excel VBA
Programación de macros en Microsoft Excel VBAProgramación de macros en Microsoft Excel VBA
Programación de macros en Microsoft Excel VBA
 
PhD defense presentation
PhD defense presentationPhD defense presentation
PhD defense presentation
 
Seed Model Synthesis for Testing Model-based Mutation Operators
Seed Model Synthesis for Testing Model-based Mutation OperatorsSeed Model Synthesis for Testing Model-based Mutation Operators
Seed Model Synthesis for Testing Model-based Mutation Operators
 
Mutation Testing for DSLs (Tool Demo)
Mutation Testing for DSLs (Tool Demo)Mutation Testing for DSLs (Tool Demo)
Mutation Testing for DSLs (Tool Demo)
 
Towards a model-driven engineering solution for language independent mutation...
Towards a model-driven engineering solution for language independent mutation...Towards a model-driven engineering solution for language independent mutation...
Towards a model-driven engineering solution for language independent mutation...
 
Wodel: A DSL for Model Mutation; and Wodel-Edu: its Application to the Automa...
Wodel: A DSL for Model Mutation; and Wodel-Edu: its Application to the Automa...Wodel: A DSL for Model Mutation; and Wodel-Edu: its Application to the Automa...
Wodel: A DSL for Model Mutation; and Wodel-Edu: its Application to the Automa...
 
A DSL for Model Mutation and its Applications to Different Domains
A DSL for Model Mutation and its Applications to Different DomainsA DSL for Model Mutation and its Applications to Different Domains
A DSL for Model Mutation and its Applications to Different Domains
 
Un framework para la generación automática de ejercicios mediante técnicas de...
Un framework para la generación automática de ejercicios mediante técnicas de...Un framework para la generación automática de ejercicios mediante técnicas de...
Un framework para la generación automática de ejercicios mediante técnicas de...
 
Wodel: A Domain-Specific Language for Model Mutation
Wodel: A Domain-Specific Language for Model MutationWodel: A Domain-Specific Language for Model Mutation
Wodel: A Domain-Specific Language for Model Mutation
 

Mutation Testing for Task-Oriented Chatbots

  • 1. www.uam.es Mutation testing for task-oriented chatbots Pablo Gómez-Abajo, Sara Pérez-Soler, Pablo C. Cañizares, Esther Guerra, Juan de Lara {Pablo.GomezA, Sara.PerezS, Pablo.Cerro, Esther.Guerra, Juan.deLara}@uam.es Modelling & Software Engineering Research Group Universidad Autónoma de Madrid, Spain 18th – 21st June 2024
  • 2. Motivation • Conversational agents, or chatbots, are increasingly used to access all sorts of services using natural language • Like any other software, chatbots need to be tested, usually by defining test scenarios • However, there is currently a lack of methods to assess the quality of such test scenarios • The result is a high risk of buggy chatbots
  • 4. What is a task-oriented chatbot? • A task-oriented chatbot is a software application operated through natural language and designed to solve a specific task • e.g., booking a ticket, ordering a pizza, setting a medical appointment • Via text or speech recognition • In recent years, the use of chatbots has increased • Since 2022, there are also open-domain chatbots (ChatGPT, etc.), which engage in conversations on any topic and which this work does not cover
  • 5. How do chatbots work? [Diagram: the user sends an NL phrase to the chatbot; the chatbot matches an intent (intent1 … intentn), extracts parameters, builds a response, possibly using an external service, and returns the chatbot response]
  • 7. How do chatbots work? 1. The user sends a natural language message to the chatbot. Utterances (user says): "Hi there!"; "I need to fly from Madrid to Salerno on Wednesday at 12 PM"; "Good bye!"
  • 8. How do chatbots work? 1. The user sends a natural language message to the chatbot 2. The chatbot tries to match the message with an intention
  • 10. How do chatbots work? Intent: match the user interaction with an intention. User says: "Hi there!" → Intent matched: Greet
  • 12. How do chatbots work? Intent: match the user interaction with an intention. User says: "I need to fly from Madrid to Salerno on Wednesday at 12 PM" → Intent matched: Book a flight. How? By providing training phrases: a set of examples that users can use to express an intention. They are required for matching inputs with intents
  • 16. How do chatbots work? Training phrases: a set of examples that users can use to express an intention; they must be provided with the intent. Training phrases for the intent Greet: "Hi there!"; "Hello"; "Hi"; "Hey"
  • 17. How do chatbots work? Training phrases: a set of examples that users can use to express an intention; they must be provided with the intent. Training phrases for the intent Book a flight: "Airplane ticket from Madrid to Rome tomorrow at 1 pm"; "Flight from Madrid to Napoli on 17/06/2024 at 11:30"
  • 18. How do chatbots work? 3. The chatbot extracts information from the message or asks for missing information. User says: "I need to fly from Madrid to Salerno on Wednesday at 12 PM" → Intent: Book a flight. At this point, the chatbot extracts key information from the input, the parameters: from: Madrid; to: Salerno; when: Wed. at 12 PM. The parameter types (City, Time) are entities
  • 23. How do chatbots work? 4. The chatbot builds the response and sends it back to the user ● Responses to the user: text, images ● External service queries: external REST API, database, etc. Both user responses and external service queries are actions. User says: "I need to fly from Madrid to Salerno on Wednesday at 12 PM" → Action: "The price of the ticket is 150$. Provide a card nº and billing name"
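
The matching and extraction steps described on the slides above can be illustrated with a minimal Python sketch. It is not the mechanism of any particular chatbot platform: the token-overlap (Jaccard) score is a simple stand-in for the NLP matching a real platform performs, and the names `match_intent`, `extract_params`, the `Fallback` label and the 0.2 threshold are assumptions made for this example. Only the training phrases come from the slides.

```python
import re

# Training phrases taken from the slide examples.
TRAINING_PHRASES = {
    "Greet": ["Hi there!", "Hello", "Hi", "Hey"],
    "Book a flight": [
        "Airplane ticket from Madrid to Rome tomorrow at 1 pm",
        "Flight from Madrid to Napoli on 17/06/2024 at 11:30",
    ],
}


def tokens(text):
    """Lower-case word and number tokens of a phrase, as a set."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a, b):
    """Token-overlap similarity in [0, 1]; a stand-in for semantic similarity."""
    ta, tb = tokens(a), tokens(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def match_intent(utterance, threshold=0.2):
    """Return the intent whose training phrase is most similar to the utterance."""
    best_intent, best_score = None, 0.0
    for intent, phrases in TRAINING_PHRASES.items():
        for phrase in phrases:
            score = jaccard(utterance, phrase)
            if score > best_score:
                best_intent, best_score = intent, score
    # Below the (assumed) threshold, fall back instead of guessing.
    return best_intent if best_score >= threshold else "Fallback"


def extract_params(utterance):
    """Very naive parameter extraction for the 'from X to Y' pattern."""
    m = re.search(r"from (\w+) to (\w+)", utterance, re.IGNORECASE)
    return {"from": m.group(1), "to": m.group(2)} if m else {}


utterance = "I need to fly from Madrid to Salerno on Wednesday at 12 PM"
print(match_intent(utterance), extract_params(utterance))
```

Because matching depends entirely on the training phrases, removing or moving phrases changes which intent is matched, which is what the mutation operators for training phrases exploit.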
  • 24. Testing chatbots: a test case pairs a user input with the expected chatbot output, e.g. input "Hi there!" → output "Hi! How can I help you?", up to complete conversations
  • 25. Testing chatbots: we use Botium and Rasa-test to test the chatbots
      • Convo file (conversation step), single test interaction: #me Hi there! #bot What day do you want to come in?
      • Convo file, combination of multiple tests: #me GREET_UTTERANCES_USER #bot GREET_RESPONSES_USER
      • Utterances GREET_UTTERANCES_USER (multiple user utterances): Hi there!; Hi; Hello; Hey
      • Responses GREET_RESPONSES_USER (possible responses): Hi! How can I help you?; Hello, what do you need?; Greetings! This is the flight ticket assistant Antony, how can I help you?
  • 26. Testing chatbots … and complex conversations: "Hi there!" → "Hi! How can I help you?"; "I need to fly from …" → "The price of the ticket …"; "I lost my baggage" → "Please, provide the flight ticket id"
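
The idea of a conversation test with several admissible utterances and responses can be sketched in Python as follows. This is not Botium's or Rasa's API: `toy_bot`, `run_convo` and `expand` are hypothetical helpers written for this illustration, and only the utterance and response lists are taken from the slide.

```python
# Utterance and response sets from the slide.
GREET_UTTERANCES_USER = ["Hi there!", "Hi", "Hello", "Hey"]
GREET_RESPONSES_USER = {
    "Hi! How can I help you?",
    "Hello, what do you need?",
    "Greetings! This is the flight ticket assistant Antony, how can I help you?",
}


def toy_bot(utterance):
    """Hypothetical chatbot under test: it only knows how to greet."""
    if utterance.strip().lower() in {"hi there!", "hi", "hello", "hey"}:
        return "Hi! How can I help you?"
    return "Sorry, I did not understand that."


def run_convo(bot, steps):
    """Run one conversation; each step is (user utterance, acceptable responses)."""
    for user_says, acceptable in steps:
        if bot(user_says) not in acceptable:
            return False  # the bot answered something unexpected
    return True


def expand(utterances, responses):
    """Turn one step with N alternative utterances into N one-step conversations."""
    return [[(u, responses)] for u in utterances]


results = [
    run_convo(toy_bot, convo)
    for convo in expand(GREET_UTTERANCES_USER, GREET_RESPONSES_USER)
]
print(results)
```

A test passes when every bot reply is one of the acceptable responses, so a single step referencing an utterance set stands for as many concrete tests as there are utterances in the set.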
  • 27. Mutation testing for chatbots
      • Intent "Order a coffee". User says: "What kinds of coffee are available?"; "What kinds of coffee can I order?"; "What can I drink here?"; "Tell me what drinks there are". Action: "You can take an espresso or an americano"
      • Intent "Order a wine". User says: "What kinds of wine are available?"; "What kinds of wine can I order?"; "What can I drink here?"; "Tell me what drinks there are". Action: "You can take an Italian wine or a French wine"
      • User input: "Tell me what kinds of coffee I can drink here" → Intent matched: Order a coffee
      • Semantic similarity values shown on the slide: 0.512, 0.538, 0.475, 0.474
      • Mutation of "Order a coffee": keeps the two most different phrases, leaving "What can I drink here?" and "Tell me what drinks there are"
      • The test suite is run against the mutated chatbot
  • 34. Mutation operators for chatbots: emulation of common errors of chatbot developers
      • Operators for training phrases
          • DPmax: deletes the most representative phrase of an intent
          • DPmin: deletes the most different phrase of an intent
          • DPWP: deletes training phrases with required parameter
          • DPWL: deletes training phrases with literal
          • K2Pmax: keeps the 2 most representative phrases
          • K2Pmin: keeps the 2 most different phrases
          • MPmax: moves the most representative phrase to the most similar intent
          • MPmin: moves the most different phrase to the most different intent
      • Operators for intents
          • DIP: deletes intent parameter
          • DPP: deletes parameter prompt
          • SPO: sets required parameter to optional
          • DFI: deletes fallback intent
      • Operators for entities
          • CRE: changes regular expression
          • DLE: deletes literal from entity
      • Operators for actions
          • DA: deletes actions
          • DPR: deletes a parameter used in a response
          • SO: swaps outputs
      • Operators for conversation flows
          • DCS: deletes conversation step
          • DCB: deletes conversation bifurcation
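
Two of the training-phrase operators in the catalogue above, K2Pmin and DPmax, can be sketched in Python as follows. The slides do not define how "most representative" and "most different" are computed, so this sketch assumes that a phrase's representativeness is its mean similarity to the other phrases of the same intent. It also uses token overlap (Jaccard) as a stand-in for semantic similarity. The function names are hypothetical.

```python
import re


def tokens(text):
    """Lower-case word and number tokens of a phrase, as a set."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a, b):
    """Token-overlap similarity; stand-in for a semantic similarity measure."""
    ta, tb = tokens(a), tokens(b)
    return len(ta & tb) / len(ta | tb) if ta and tb else 0.0


def representativeness(phrase, phrases):
    """Assumed metric: mean similarity of a phrase to the intent's other phrases."""
    others = [p for p in phrases if p is not phrase]
    if not others:
        return 0.0
    return sum(jaccard(phrase, p) for p in others) / len(others)


def k2p_min(phrases):
    """K2Pmin sketch: keep only the 2 most different phrases of an intent."""
    ranked = sorted(phrases, key=lambda p: representativeness(p, phrases))
    keep = ranked[:2]
    return [p for p in phrases if p in keep]  # preserve the original order


def dp_max(phrases):
    """DPmax sketch: delete the most representative phrase of an intent."""
    top = max(phrases, key=lambda p: representativeness(p, phrases))
    return [p for p in phrases if p is not top]


order_a_coffee = [
    "What kinds of coffee are available?",
    "What kinds of coffee can I order?",
    "What can I drink here?",
    "Tell me what drinks there are",
]
print(k2p_min(order_a_coffee))
print(dp_max(order_a_coffee))
```

Under these assumptions, K2Pmin on the "Order a coffee" phrases keeps the two generic phrases, which is the same outcome as the mutant shown in the coffee and wine example.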
  • 36. RQ1: How applicable are the defined mutation operators? RQ2: How effective are the defined mutation operators? [Bar chart: mutation score by mutation operator, Alive vs. Killed. Values as extracted, without their operator labels: 39%, 48%, 67%, 60%, 77%, 73%, 78%, 80%, 67%, 0%, 0%, 40%, 50%, 76%, 14%, 89%, 87%, 96%]
  • 38. RQ3: How effective is the mutation testing process? [Bar chart: mutation score by test suite kind, Alive vs. Killed. Botium automatic: 45%; Botium by hand: 94%; Rasa test: 20%]
  • 40. RQ4: How efficient is the mutation testing process? The mutation testing process of 67% of the chatbots was completed in less than 90 minutes. [Bar chart, axis from 0% to 35%. Chatbots: Covid19_tracer, bikeShop, e2e-bot, Spaceonova, personal-bot, yassinelamarti, Rasa-demo, 256644, h4h-chatbot, diagrams2ai, dusbot, legal-alien-chatbot, Email-WhatsApp-Integration, lankbanfinance, Data-mining. Values as extracted: 0.1%, 0.2%, 0.3%, 1.0%, 1.2%, 1.4%, 1.6%, 1.6%, 1.7%, 2.6%, 4.9%, 8.4%, 12.8%, 27.5%, 34.7%]
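
The "Alive" and "Killed" labels in the charts above follow the usual mutation testing vocabulary: a mutant is killed when at least one test fails on it, and the mutation score is the fraction of killed mutants. A minimal Python sketch of that computation follows. The function names and the sample outcomes are made up for illustration, and equivalent mutants are ignored here.

```python
def is_killed(test_outcomes):
    """A mutant is killed if at least one test fails on it (False = test failed)."""
    return not all(test_outcomes)


def mutation_score(outcomes_by_mutant):
    """Fraction of mutants killed by the test suite, in [0, 1]."""
    if not outcomes_by_mutant:
        raise ValueError("no mutants were generated")
    killed = sum(is_killed(outcomes) for outcomes in outcomes_by_mutant.values())
    return killed / len(outcomes_by_mutant)


# Hypothetical results: one list of pass/fail outcomes per mutant.
outcomes = {
    "K2Pmin_1": [True, False],   # one test fails: killed
    "DPmax_1": [True, True],     # all tests pass: alive
    "DFI_1": [False, False],     # killed
    "SO_1": [True, False],       # killed
}
print(f"mutation score: {mutation_score(outcomes):.0%}")
```

A low score means many mutants stay alive, which points to test scenarios that would miss the corresponding developer errors.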
  • 42. Conclusions • Technology-independent approach for mutation testing (MuT) of chatbots, with a catalogue of 19 mutation operators for training phrases, intents, entities, chatbot actions and conversation flows • Support for test scenarios from Botium and Rasa-test • Experiment with 15 chatbots and 29 test suites • Positive results regarding applicability, effectiveness and efficiency • Room for improvement in 86% of the test suites • Running times of MuT for chatbots are costly but acceptable: less than 90 minutes for 67% of the chatbots
  • 43. Future work • Automate the detection of semantically equivalent mutants, e.g., using confidence decrease heuristics • Automate the synthesis of tests able to kill the mutants that remain alive • Adapt our approach to LLM-based agents
  • 44. Thank you! Links on the slide: Wodel-Test, Dataset, Tool demo