https://langdevcon.org
Seville 17-19 October, 2024
Mutation testing for DSLs
The case of task-oriented chatbots
Pablo Gómez-Abajo
Modelling & Software Engineering Research Group
Universidad Autónoma de Madrid
Introduction
▪ DSLs are increasingly used to solve problems in specific domains
▪ Like any other programming language, DSLs need to be tested
▪ Usually by creating and using test suites
▪ Mutation testing (MuT) is a common technique for improving the quality of such test suites
What is mutation testing?
▪ A software testing approach to assess the quality of test suites
▪ Injects syntactic changes into a program using a set of mutation operators
▪ The injected mutations emulate common programming faults
▪ Helps improve both the test suites and the set of mutation operators
[Figure: the original program is turned into mutants; the test suite is run on them and classifies them as alive mutants or killed mutants, which yields the mutation score; alive mutants call for additional test cases.]
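The process above can be sketched in a few lines of Python. This is a minimal illustration and not part of Wodel-Test: the `absolute` function, its three hand-written mutants, and the helper names are assumptions made for the example. The score uses the usual definition, killed mutants divided by non-equivalent mutants.

```python
from typing import Callable, Sequence

Program = Callable[[int], int]


def absolute(x: int) -> int:
    """Hypothetical program under test."""
    return x if x >= 0 else -x


# Hand-written mutants, each emulating one small programming fault.
MUTANTS: list[Program] = [
    lambda x: x if x >= 0 else x,   # unary minus removed
    lambda x: x if x <= 0 else -x,  # ">=" replaced by "<="
    lambda x: x if x > 0 else -x,   # ">=" replaced by ">" (equivalent: same result for 0)
]


def is_killed(original: Program, mutant: Program, tests: Sequence[int]) -> bool:
    """A mutant is killed when some test input gives a result that differs from the original."""
    return any(original(x) != mutant(x) for x in tests)


def mutation_score(original: Program, mutants: Sequence[Program],
                   tests: Sequence[int], equivalent: int = 0) -> float:
    """Killed mutants divided by the number of non-equivalent mutants."""
    killed = sum(is_killed(original, m, tests) for m in mutants)
    return killed / (len(mutants) - equivalent)
```

With the single test input `3`, only the second mutant is killed; adding `-3` also kills the first one. The third mutant can never be killed, which is why equivalent mutants are usually excluded from the denominator.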
Mutation testing for automata
[Figure: a seed model (an automaton over the symbols 0 and 1) and a mutant model (MT) derived from it, each run against the same test suite.]
▪ Test suite with the inputs 01 and 00: the mutant model gives the same results as the seed model, so the mutant is alive
▪ Test suite extended with the input 10: the mutant model gives a different result than the seed model on 10, so the mutant is killed
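The alive/killed outcome can be reproduced with a small sketch. The two automata below are hypothetical (the ones drawn on the slides are not recoverable from the text); they were chosen only so that the inputs 01 and 00 cannot tell them apart while 10 can.

```python
def accepts(dfa: dict, word: str) -> bool:
    """Run a deterministic automaton on a word and report acceptance."""
    state = dfa["start"]
    for symbol in word:
        state = dfa["delta"][(state, symbol)]
    return state in dfa["accepting"]


# Hypothetical seed model: accepts words with an odd number of 1s.
SEED = {
    "start": "A",
    "accepting": {"B"},
    "delta": {("A", "0"): "A", ("A", "1"): "B",
              ("B", "0"): "B", ("B", "1"): "A"},
}

# Hypothetical mutant: the target of one transition is changed (B --0--> A).
MUTANT = {**SEED, "delta": {**SEED["delta"], ("B", "0"): "A"}}


def is_killed(seed: dict, mutant: dict, tests: list[str]) -> bool:
    """The mutant is killed if any test input yields a different verdict than the seed."""
    return any(accepts(seed, t) != accepts(mutant, t) for t in tests)
```

Here `is_killed(SEED, MUTANT, ["01", "00"])` is false (the mutant is alive), and adding the input `"10"` makes it true (the mutant is killed).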
Motivation
▪ However, existing MuT tools are
▪ Specific to a single language
▪ Hand-coded
▪ Costly to maintain
▪ To alleviate these drawbacks, we propose Wodel-Test
▪ A model-based solution to engineer language-specific MuT tools
Wodel-Test
▪ A model-based solution to engineer mutation testing tools
▪ MuT tools for automata, logic circuits, Java, ATL, chatbots, etc.
[Figure: Wodel-Test architecture. The MuT tool creator provides the MuT tool specification: language support (meta-model, M2T transf., T2M transf.), mutation support (mutation operators in WODEL) and execution support (program compilation, test execution); equivalence criteria also appear in the diagram. Wodel-Test generates the MuT tool, which takes the program under test and the test cases as input from the tester and produces a MuT report.]
MuT tool for chatbots
▪ We have used Wodel-Test to engineer a MuT tool for task-oriented
chatbots
▪ The solution uses the intent-based chatbot meta-model created by
S. Pérez-Soler et al. [1]
[1] S. Pérez-Soler, E. Guerra, and J. de Lara. Model-driven chatbot development. In ER, volume 12400 of
LNCS, pages 207–222. Springer, 2020
What is a task-oriented chatbot?
▪ A task-oriented chatbot is a software application that is used through natural language and designed to solve a specific task
▪ e.g., booking a ticket, ordering a pizza, making a medical appointment
▪ Via text or speech recognition
▪ In recent years, the use of chatbots has increased
▪ Since 2022, there are also open-domain chatbots (ChatGPT, etc.), which engage in conversations on any topic and which we do not cover in this work
How do chatbots work?
[Figure: the user sends an NL phrase to the chatbot; the chatbot matches it against its intents (intent1 … intentn), extracts params, builds the response, possibly calling an external service, and returns the chatbot response. The figure numbers these steps 1 to 4.]
1. The user sends a natural language message to the chatbot
Utterances (user says)
Hi there!
I need to fly from Madrid to Seville on Thursday at 8 AM
Good bye!
2. The chatbot tries to match the message with an intention
How do chatbots work?
Intent: matches the user interaction with an intention
User says                                                 Intent
Hi there!                                                 Greet
I need to fly from Madrid to Seville on Thursday at 8 AM  Book a flight
How? By providing training phrases: a set of examples that users can use to express an intention
● Must be provided with the intent
● Required for matching inputs with intents
Training phrase                                             Intent
Hi there!                                                   Greet
Hello                                                       Greet
Hi                                                          Greet
Hey                                                         Greet
Airplane ticket from Madrid to Barcelona tomorrow at 10 AM  Book a flight
Flight from Madrid to Bilbao on 19/10/2024 at 11:30         Book a flight
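As a rough illustration of how training phrases drive intent matching, the sketch below scores an utterance against each intent's phrases and picks the best one. It is an assumption-laden stand-in: real platforms use trained language models, whereas this example uses plain word overlap (Jaccard similarity), and the names `match_intent` and `threshold` are hypothetical.

```python
import re

TRAINING_PHRASES = {
    "Greet": ["Hi there!", "Hello", "Hi", "Hey"],
    "Book a flight": [
        "Airplane ticket from Madrid to Barcelona tomorrow at 10 AM",
        "Flight from Madrid to Bilbao on 19/10/2024 at 11:30",
    ],
}


def tokens(text: str) -> set[str]:
    """Lower-cased words and numbers of a phrase."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def similarity(a: str, b: str) -> float:
    """Jaccard word overlap, used here instead of a real semantic model."""
    ta, tb = tokens(a), tokens(b)
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0


def match_intent(utterance: str, threshold: float = 0.2):
    """Return the intent whose closest training phrase scores best, or None (fallback)."""
    best_intent, best_score = None, 0.0
    for intent, phrases in TRAINING_PHRASES.items():
        score = max(similarity(utterance, p) for p in phrases)
        if score > best_score:
            best_intent, best_score = intent, score
    return best_intent if best_score >= threshold else None
```

Even this crude measure matches "I need to fly from Madrid to Seville on Thursday at 8 AM" to Book a flight, because it shares several words with the flight training phrases. Removing or moving training phrases changes these scores, which is what the mutation operators for training phrases exploit.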
How do chatbots work?
3. The chatbot extracts information from the message or asks for missing information
User says                                                 Intent
I need to fly from Madrid to Seville on Thursday at 8 AM  Book a flight
At this point, the chatbot extracts key information from the input: parameters
from: Madrid    to: Seville    when: Thu. at 8 AM
Entities: City (from, to) and Time (when)
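A minimal sketch of step 3, assuming a City entity defined by a list of literals and a Time entity defined by a regular expression. The entity contents, the parameter names and both helper functions are hypothetical; chatbot platforms ship their own entity extractors.

```python
import re

CITY_ENTITY = ["Madrid", "Seville", "Barcelona", "Bilbao"]  # literal-based entity (assumed values)
CITIES = "|".join(map(re.escape, CITY_ENTITY))

# Regex-based Time entity: an optional weekday followed by a clock time.
TIME_ENTITY = re.compile(
    r"\b((?:(?:Mon|Tues|Wednes|Thurs|Fri|Satur|Sun)day\s+at\s+)?"
    r"\d{1,2}(?::\d{2})?\s?(?:AM|PM))\b",
    re.IGNORECASE,
)


def extract_parameters(utterance: str) -> dict[str, str]:
    """Fill the parameters from, to and when from the utterance, where present."""
    params = {}
    origin = re.search(rf"\bfrom\s+({CITIES})\b", utterance, re.IGNORECASE)
    if origin:
        params["from"] = origin.group(1)
    destination = re.search(rf"\bto\s+({CITIES})\b", utterance, re.IGNORECASE)
    if destination:
        params["to"] = destination.group(1)
    when = TIME_ENTITY.search(utterance)
    if when:
        params["when"] = when.group(1)
    return params


def missing_parameters(params: dict[str, str],
                       required=("from", "to", "when")) -> list[str]:
    """Required parameters the chatbot still has to ask the user for."""
    return [name for name in required if name not in params]
```

For the flight utterance above, all three parameters are found; for "I need to fly to Seville" the chatbot would still have to prompt for `from` and `when`.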
How do chatbots work?
4. The chatbot builds the response and sends it back to the user
● Responses to the user:
○ text, images
● External service queries
○ External REST API
○ Database, etc.
User says                                                 Action
I need to fly from Madrid to Seville on Thursday at 8 AM  The price of the ticket is 120$. Provide a card nº and billing name
Both user responses and external service queries are actions
Testing chatbots
Testcase input    Testcase output
Hi there!         Hi! How can I help you?
… and complex conversations
Testing chatbots
We test the chatbots with test suites written for Botium and Rasa-test
Convo file (conversation step), single test interaction:
#me
Hi there!
#bot
What day do you want to come in?
Convo file, combination of multiple tests:
#me
GREET_UTTERANCES_USER
#bot
GREET_RESPONSES_USER
Utterances (multiple user utterances):
GREET_UTTERANCES_USER
Hi there!
Hi
Hello
Hey
Responses (possible responses):
GREET_RESPONSES_USER
Hi! How can I help you?
Hello, what do you need?
Greetings! This is the flight ticket assistant Antony, how can i help you?
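One plausible reading of how such a conversation step is evaluated is sketched below. It is not Botium's implementation or API: `expand`, `run_convo_step` and `fake_bot` are hypothetical names, and the bot is a stub that always gives the same reply.

```python
UTTERANCES = {
    "GREET_UTTERANCES_USER": ["Hi there!", "Hi", "Hello", "Hey"],
    "GREET_RESPONSES_USER": [
        "Hi! How can I help you?",
        "Hello, what do you need?",
        "Greetings! This is the flight ticket assistant Antony, how can i help you?",
    ],
}


def expand(text: str) -> list[str]:
    """Resolve an utterance reference to its phrases; literal text stays as it is."""
    return UTTERANCES.get(text, [text])


def run_convo_step(bot, me: str, expected_bot: str) -> bool:
    """Send every variant of the #me step; pass only if each reply is an accepted #bot response."""
    accepted = expand(expected_bot)
    return all(bot(utterance) in accepted for utterance in expand(me))


def fake_bot(utterance: str) -> str:
    """Hypothetical chatbot stub used only for this example."""
    return "Hi! How can I help you?"
```

With this stub, the step that uses the two references passes, because every greeting variant gets one of the accepted responses, while a step expecting the literal reply "What day do you want to come in?" fails.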
Testing chatbots
[Figure: a longer conversation between the user and the chatbot.]
User: Hi there!
Chatbot: Hi! How can I help you?
User: I need to fly from …
Chatbot: The price of the ticket …
User: I lost my baggage
Chatbot: Please provide the flight ticket id
… and complex conversations
Mutation testing for chatbots
[Figure: the chatbot matches the user's NL phrase against its intents (Order a coffee, …, Order a wine, …, intentn), extracts params, builds the response, possibly calling an external service, and returns the chatbot response.]
Intent: Order a coffee
User says                             Semantic similarity
What kinds of coffee are available?   0.522
What kinds of coffee can I order?     0.538
What can I drink here?                0.475
Tell me what drinks there are         0.474
Action: You can take an espresso or an americano
Intent: Order a wine
User says
What kinds of wine are available?
What kinds of wine can I order?
What can I drink here?
Tell me what drinks there are
Action: You can take a Spanish wine or a French wine
▪ User phrase: Tell me what kinds of coffee I can drink here
▪ Seed chatbot: the intent matched is Order a coffee
▪ Mutation on Order a coffee: keeps the two most different phrases (What can I drink here? and Tell me what drinks there are)
▪ Mutant chatbot: the user phrase is matched again against the mutated intents, and the test suite checks the result
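The "keeps the two most different phrases" mutation can be sketched as follows. Two assumptions are made loudly here: word overlap (Jaccard) stands in for the semantic similarity used in the talk, and "most different" is read as "lowest mean similarity to the other phrases of the same intent". The function name `k2p_min` is hypothetical.

```python
import re


def tokens(phrase: str) -> set[str]:
    """Lower-cased words of a phrase."""
    return set(re.findall(r"[a-z']+", phrase.lower()))


def similarity(a: str, b: str) -> float:
    """Jaccard word overlap, a stand-in for a semantic similarity model."""
    ta, tb = tokens(a), tokens(b)
    return len(ta & tb) / len(ta | tb)


def k2p_min(phrases: list[str]) -> list[str]:
    """Keep the two phrases with the lowest mean similarity to the rest of the intent."""
    if len(phrases) <= 2:
        return list(phrases)

    def mean_similarity(i: int) -> float:
        others = [similarity(phrases[i], p) for j, p in enumerate(phrases) if j != i]
        return sum(others) / len(others)

    keep = sorted(range(len(phrases)), key=mean_similarity)[:2]
    return [phrases[i] for i in sorted(keep)]  # preserve the original order


ORDER_A_COFFEE = [
    "What kinds of coffee are available?",
    "What kinds of coffee can I order?",
    "What can I drink here?",
    "Tell me what drinks there are",
]
```

On the coffee intent this sketch keeps "What can I drink here?" and "Tell me what drinks there are", the same two phrases kept on the slides. Both phrases also belong to Order a wine, so the mutated intent no longer mentions coffee at all.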
Mutation operators for chatbots
Operators for training phrases
DPmax   Deletes the most representative phrase of an intent
DPmin   Deletes the most different phrase of an intent
DPWP    Deletes training phrases with required parameter
DPWL    Deletes training phrases with literal
K2Pmax  Keeps the 2 most representative phrases
K2Pmin  Keeps the 2 most different phrases
MPmax   Moves the most representative phrase to the most similar intent
MPmin   Moves the most different phrase to the most different intent
Operators for intents
DIP     Deletes intent parameter
DPP     Deletes parameter prompt
SPO     Sets required parameter to optional
DFI     Deletes fallback intent
Operators for entities
CRE     Changes regular expression
DLE     Deletes literal from entity
Operators for actions
DA      Deletes actions
DPR     Deletes a parameter used in a response
SO      Swaps outputs
Operators for conversation flows
DCS     Deletes conversation step
DCB     Deletes conversation bifurcation
Emulation of common errors made by chatbot developers
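To make two of these operators concrete, here is a sketch of SPO and DFI applied to a chatbot represented as a plain dictionary. The dictionary layout and the function names are assumptions for illustration only; in the talk the operators are written in WODEL and applied to models conforming to a chatbot meta-model.

```python
import copy

# Hypothetical in-memory chatbot model.
SEED_CHATBOT = {
    "intents": {
        "Book a flight": {
            "fallback": False,
            "parameters": {
                "from": {"entity": "City", "required": True,
                         "prompt": "Where do you fly from?"},
                "to": {"entity": "City", "required": True,
                       "prompt": "Where do you fly to?"},
            },
        },
        "Fallback": {"fallback": True, "parameters": {}},
    }
}


def spo(model: dict, intent: str, parameter: str) -> dict:
    """SPO: return a mutant in which one required parameter becomes optional."""
    mutant = copy.deepcopy(model)  # never modify the seed model
    mutant["intents"][intent]["parameters"][parameter]["required"] = False
    return mutant


def dfi(model: dict) -> dict:
    """DFI: return a mutant without the fallback intent."""
    mutant = copy.deepcopy(model)
    mutant["intents"] = {name: intent for name, intent in mutant["intents"].items()
                         if not intent["fallback"]}
    return mutant
```

Each call yields one mutant and leaves the seed model untouched, so several operators can be applied independently to the same seed.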
Mutation testing for chatbots
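[Figure: the WODEL-TEST process for chatbots in five steps. (1) parse: the Dialogflow chatbot is parsed into a chatbot model that conforms to the CONGA meta-model. (2) annotate: the result is an annotated chatbot model that conforms to the annotation meta-model; Tensorflow appears at this step. (3) mutate: the mutation operators (WODEL) produce chatbot model mutants. (4) generate: a chatbot impl. is generated. (5) test: the test suites are run on the chatbot impl. and produce the mutation analysis report.]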
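RQ1: How applicable are the defined mutation operators?
RQ2: How effective are the defined mutation operators?
[Figure: bar chart of the mutation score by mutation operator, with alive and killed mutants. Percentages on the chart, in slide order: 39%, 48%, 67%, 60%, 77%, 73%, 78%, 80%, 67%, 0%, 0%, 40%, 50%, 76%, 14%, 89%, 87%, 96%.]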
RQ3: How effective is the MuT process?
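[Figure: bar chart of the mutation score by test suite kind, with alive and killed mutants. Test suite kinds: Botium automatic, Botium by hand, Rasa test. Percentages on the chart, in slide order: 45%, 94%, 20%.]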
RQ4: How efficient is the MuT process?
[Figure: bar chart of times per chatbot, as percentages on an axis from 0% to 35%. Percentages on the chart, in slide order: 0.1%, 0.2%, 0.3%, 1.0%, 1.2%, 1.4%, 1.6%, 1.6%, 1.7%, 2.6%, 4.9%, 8.4%, 12.8%, 27.5%, 34.7%. Chatbots, in slide order: Covid19_tracer, bikeShop, e2e-bot, Spaceonova, personal-bot, yassinelamarti, Rasa-demo, 256644, h4h-chatbot, diagrams2ai, dusbot, legal-alien-chatbot, Email-WhatsApp-Integration, lankbanfinance, Data-mining.]
The mutation testing process of 67% of the chatbots was completed in less than 90 minutes
Conclusions
▪ Wodel-Test eases the engineering of MuT tools for DSLs
▪ Wodel-Test is a better option when we need to
▪ Access the source code of the mutants
▪ Reason about which mutants reduce the mutation score and why
▪ Test new mutation operators
Future work
▪ Automate the detection of semantically equivalent mutants
▪ e.g., for chatbots, using heuristics based on confidence decrease
▪ Automate the synthesis of tests that kill the alive mutants
▪ Optimize the MuT process → Parallelize mutant generation
▪ Chatbots: adapt our approach to LLM-based agents
