How to Effectively Test Your Chatbot | Rasa Summit

QA has long been underrated, yet it deserves the same attention as development. Chatbot QA is considered especially challenging because you rarely know where the bot may break when you only run your flows (stories) sequentially. Most companies and tools check only flows that are coded in a fixed format, and these often break during testing. Bots can also break when they are migrated to a new version. This presentation discusses how to test bots: building a coverage matrix for your stories, mining the logs efficiently for information, and, most importantly, deciding what to test and which components to test. Presented by Soumya Mukherjee, Director QA, DevOps & AIML at APTY.IO, at the 2021 Rasa Summit: https://rasa.com/summit/


How to “Effectively” “Test” your Chatbot
Soumya Mukherjee
Director QA, DevOps & AIML
Apty.IO

How are we doing our QA today?
• Testing is a black box for testers
• Testing in organizations is mostly manual
• Conversational flow testing
• Small talk
• Fallback checks
• Integrations
• Automation is done at the UI and API layers
• Testing is mostly done on the same data that was used for training
• Models are trained by engineers and are not monitored by QA
• Analytics tools for monitoring are available, but QA needs technical expertise to use them
• Result: more than 90% of the time the bot breaks (no one understands when it will break); most bots fall back and get stuck, and once a bot is stuck it stays stuck

What are the issues in QA?
• Bots keep evolving, and continuous story creation is a problem
• No tool manages story coverage
• Your training data may not correspond to new stories, or vice versa (a mismatch); most organizations keep training on the same data
• Most automation tools offer record and playback (my stories are already written; how to port them is the question)
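
The mismatch between stories and training data, and the lack of story coverage tracking, can be checked mechanically. The following is a minimal sketch, assuming the stories and the NLU training intents have already been loaded into plain Python structures. The function name `coverage_report`, the data shapes, and the sample intents are hypothetical and are not part of any Rasa API.

```python
from typing import Dict, List, Set


def coverage_report(
    stories: Dict[str, List[str]],
    training_intents: Set[str],
    tested_stories: Set[str],
) -> Dict[str, object]:
    """Compare stories against training data and against the tests that ran.

    `stories` maps a story name to the intents it uses (assumed shape).
    """
    story_intents = {intent for steps in stories.values() for intent in steps}
    covered = set(stories) & tested_stories
    return {
        # Intents used in stories that have no training examples.
        "missing_training_data": sorted(story_intents - training_intents),
        # Intents with training examples that no story exercises.
        "unused_intents": sorted(training_intents - story_intents),
        # Stories that no test has run yet.
        "untested_stories": sorted(set(stories) - tested_stories),
        # Fraction of stories covered by at least one test.
        "story_coverage": len(covered) / len(stories) if stories else 0.0,
    }


# Hypothetical sample data for illustration only.
stories = {
    "pay_fee": ["greet", "ask_fee", "confirm"],
    "bus_route": ["greet", "ask_bus_route"],
}
report = coverage_report(
    stories,
    training_intents={"greet", "ask_fee", "confirm", "goodbye"},
    tested_stories={"pay_fee"},
)
print(report)
```

Running a report like this on every change to the stories or the training data would surface a mismatch before the bot is retrained on stale data.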

What are the issues in QA?
• No unified, centralized dashboard where QA can check everything (it is all quite scattered)
• Intent matching
• Entity testing: slot identification
• Entity testing: entity validation
• Confidence score
• Confusion matrix along with precision/recall/F1-score
• No easy way to reset a failed bot!
• Bot versioning is a mess, and A/B testing becomes difficult
• Multilingual bot QA is a challenge (you have to make 2 separate bots)
• A high confidence score is also a problem, as your bot will keep predicting the same thing (if the data is the same for multiple intents, it will predict the one with the highest confidence score, which may be incorrect)
How to make sure your bot never breaks?
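
The metrics named on this slide can be computed from a list of expected and predicted intents. The sketch below is illustrative only: it uses the standard definitions of precision, recall and F1, and adds a simple margin check for the confidence problem. The function names, the `margin` threshold of 0.1, and the sample intents are assumptions rather than anything taken from the talk.

```python
from collections import Counter
from typing import Dict, List, Tuple


def confusion_counts(expected: List[str], predicted: List[str]) -> Counter:
    """Count (expected, predicted) intent pairs, i.e. a sparse confusion matrix."""
    return Counter(zip(expected, predicted))


def per_intent_scores(
    expected: List[str], predicted: List[str]
) -> Dict[str, Tuple[float, float, float]]:
    """Return {intent: (precision, recall, f1)} from the pair counts."""
    counts = confusion_counts(expected, predicted)
    scores = {}
    for intent in sorted(set(expected) | set(predicted)):
        tp = counts[(intent, intent)]
        fp = sum(n for (e, p), n in counts.items() if p == intent and e != intent)
        fn = sum(n for (e, p), n in counts.items() if e == intent and p != intent)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        total = precision + recall
        f1 = 2 * precision * recall / total if total else 0.0
        scores[intent] = (precision, recall, f1)
    return scores


def is_ambiguous(ranking: Dict[str, float], margin: float = 0.1) -> bool:
    """Flag a prediction whose top two confidence scores are closer than `margin`."""
    top = sorted(ranking.values(), reverse=True)[:2]
    return len(top) == 2 and top[0] - top[1] < margin


# Hypothetical evaluation data for illustration only.
expected = ["bus_fee", "bus_fee", "tuition_fee", "tuition_fee"]
predicted = ["bus_fee", "tuition_fee", "tuition_fee", "tuition_fee"]
scores = per_intent_scores(expected, predicted)
for intent, (p, r, f1) in scores.items():
    print(f"{intent}: precision={p:.2f} recall={r:.2f} f1={f1:.2f}")
print(is_ambiguous({"bus_fee": 0.48, "tuition_fee": 0.45}))
```

The margin check is one possible way to spot intents whose training data overlaps; a winning score alone does not show that the runner-up was almost as likely.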

How to make your test effective?
• Create scenarios for the happy path, contextual questions, digressions, domain-specific questions, and stateless conversations
• Map proper entities for common scenarios (for example, bus fee and tuition fee); the flow should change with the entities in the stories
• Automated tests should consume all stories and run them every time as part of regression testing
• Visualize story coverage
• For manual testing, use a bot emulation product (like Rasa X or Botfront)
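
A regression run that consumes every story could look roughly like the sketch below. It assumes each story can be expressed as a list of (user message, expected bot reply) pairs and that the bot is reachable through a callable taking a session id and a message. Both are assumptions made for illustration: `run_regression`, `fake_bot`, and the reply names are hypothetical, and a real runner would call the deployed bot instead of the stand-in.

```python
from typing import Callable, Dict, List, Tuple

Story = List[Tuple[str, str]]  # (user message, expected bot reply)


def run_regression(
    stories: Dict[str, Story],
    respond: Callable[[str, str], str],
) -> Dict[str, List[str]]:
    """Replay every story in its own session and collect mismatches per story."""
    failures: Dict[str, List[str]] = {}
    for name, turns in stories.items():
        session_id = f"regression-{name}"  # fresh session so state cannot leak
        for step, (message, expected) in enumerate(turns, start=1):
            actual = respond(session_id, message)
            if actual != expected:
                failures.setdefault(name, []).append(
                    f"step {step}: expected {expected!r}, got {actual!r}"
                )
                break  # later turns depend on this one, so stop the story here
    return failures


def fake_bot(session_id: str, message: str) -> str:
    """Stand-in bot for the demo; it only knows two messages."""
    replies = {"hi": "utter_greet", "bus fee?": "utter_bus_fee"}
    return replies.get(message, "utter_fallback")


# Hypothetical stories for illustration only.
stories = {
    "happy_path": [("hi", "utter_greet"), ("bus fee?", "utter_bus_fee")],
    "tuition": [("hi", "utter_greet"), ("tuition fee?", "utter_tuition_fee")],
}
failures = run_regression(stories, fake_bot)
print(failures)
```

Giving each story its own session id is a deliberate choice here: a bot that got stuck in one story should not contaminate the next one.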

Other KPIs to track
• Activity Volume
• Bounce rate
• Retention rate
• Open sessions count
• Session times (conversation length)
• Goal completion rate
• User feedback (sentiments)
• Fallback rate (Confusion rate, reset rate & Human takeover rate)
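
Several of these KPIs can be derived from session logs. The following sketch assumes each session has already been summarized into a small record; the `Session` fields and the definition of a bounce (a session with at most one user message) are assumptions chosen for illustration, not definitions from the talk.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Session:
    turns: int             # number of user messages in the session
    fallbacks: int         # turns answered by the fallback
    goal_completed: bool   # user reached the end of a flow
    handed_to_human: bool  # conversation was taken over by a human


def kpi_summary(sessions: List[Session]) -> Dict[str, float]:
    """Aggregate per-session records into a few of the KPIs listed above."""
    total_sessions = len(sessions)
    total_turns = sum(s.turns for s in sessions)
    if total_sessions == 0 or total_turns == 0:
        return {}
    return {
        "fallback_rate": sum(s.fallbacks for s in sessions) / total_turns,
        "goal_completion_rate": sum(s.goal_completed for s in sessions) / total_sessions,
        "human_takeover_rate": sum(s.handed_to_human for s in sessions) / total_sessions,
        # Assumed definition: a bounce is a session with at most one user message.
        "bounce_rate": sum(s.turns <= 1 for s in sessions) / total_sessions,
        "avg_session_length": total_turns / total_sessions,
    }


# Hypothetical session records for illustration only.
sessions = [
    Session(turns=5, fallbacks=1, goal_completed=True, handed_to_human=False),
    Session(turns=1, fallbacks=0, goal_completed=False, handed_to_human=False),
    Session(turns=4, fallbacks=2, goal_completed=False, handed_to_human=True),
    Session(turns=10, fallbacks=1, goal_completed=True, handed_to_human=False),
]
summary = kpi_summary(sessions)
print(summary)
```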

Thanks
@QASoumya
Linkedin.com/in/mukherjeesoumya

6. How to make your test effective?
• Central dashboarding including:
• Confusion matrix, precision, recall and F1-score
• Cumulative accuracy profile
• Cross-validation results
• Perform exhaustive testing (bot resiliency) and integration checks across platforms and webhooks
• Perform fault tolerance testing through performance testing (bot response, session management) and security testing (API interaction, typing speed check, punctuation, typos)
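
The punctuation and typo checks mentioned above can be automated by perturbing known utterances and confirming that the predicted intent does not change. This is a minimal sketch under stated assumptions: the perturbations (upper-casing, extra or stripped punctuation, one swapped pair of adjacent characters) are just examples, and `classify` stands for whatever function returns the bot's predicted intent; neither comes from the talk or from a specific library.

```python
import random
from typing import Callable, List, Tuple


def perturb(utterance: str, rng: random.Random) -> List[str]:
    """Return noisy variants: changed case, extra and stripped punctuation, one typo."""
    variants = [utterance.upper(), utterance + "???", utterance.rstrip("?.!")]
    if len(utterance) > 1:
        # Simulate a typo by swapping one random pair of adjacent characters.
        i = rng.randrange(len(utterance) - 1)
        chars = list(utterance)
        chars[i], chars[i + 1] = chars[i + 1], chars[i]
        variants.append("".join(chars))
    return variants


def robustness_failures(
    samples: List[Tuple[str, str]],
    classify: Callable[[str], str],
    seed: int = 0,
) -> List[Tuple[str, str]]:
    """Return (variant, expected intent) pairs that the classifier got wrong."""
    rng = random.Random(seed)  # fixed seed keeps the run reproducible
    failures = []
    for utterance, intent in samples:
        for variant in perturb(utterance, rng):
            if classify(variant) != intent:
                failures.append((variant, intent))
    return failures


def keyword_classifier(text: str) -> str:
    """Stand-in classifier for the demo; a real run would query the bot's NLU."""
    return "ask_bus_fee" if "bus" in text.lower() else "fallback"


# Hypothetical sample for illustration only.
samples = [("what is the bus fee?", "ask_bus_fee")]
for variant, intent in robustness_failures(samples, keyword_classifier):
    print(f"expected {intent} for {variant!r}")
```

Whether the swapped-character variant breaks the stand-in classifier depends on where the swap lands, which is why the seed is fixed: a failing variant can then be reproduced and added to the training data or the regression suite.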
