Swiss Testing Day / DevOps Fusion 2019
Behavior-Driven Development for
Conversational Applications
Florian Georg, IBM
IBM Watson & Cloud Platform
2
About Me
Florian Georg
IBM Cloud & Cognitive Technical Leader
Cloud Platform Architect - A.I. and Data Science
IBM Switzerland
3
AGENDA
• Conversational Applications
• Case Study: UBS Companion
• Behaviour Driven Development and Live Specifications
• Summary
4
CONVERSATIONAL APPLICATIONS
• Natural language conversations between human
and machine
• Virtual Assistants, Customer Care, Information
Systems, First Level support ...
• Self-improvement through ongoing (re)-training and
feedback loops.
5
Concepts
Intents and Entities
#Intents
6
Concepts
Dialog
User Interface
Back-end
system
Watson Assistant
Other Cognitive Services
Dialog Skill
Search Skill
Assistant
Layer
14
http://ubs.com/beta
UBS Companion
An Exploration on
human digital
assistants
15
Motion Capturing
by FaceMe.com
16
Setup
Avatar
Client
Client Advisor
17
Service Line
World View
Friendly, non-human avatar
FIN
Non-Human Avatar
18
World View
19
„House View“:
Curated, consolidated
economic outlook and
advice.
„Serious“ Advisor
Dr. Daniel Kalt
Human-like Avatar
22
MANUAL TESTING
When building a conversational application, typical first steps
include repeated, manual testing of conversations as they are
modelled.
Problems
Manual conversation testing lacks
• Repeatability
• Consistency
• Automation
• Speed
• Visibility into internals (e.g. confidence levels, context
variables...)
23
UNIT TESTS
Teams have implemented unit tests against API calls,
comparing actual vs. expected responses.
Problems
• Written and maintained by the technical team
• Technical tests are source code, i.e. not suitable for
communication with business stakeholders and
domain experts
24
BEHAVIOR-DRIVEN DEVELOPMENT (BDD)
“Behavior-driven development combines the general techniques
and principles of TDD with ideas from domain-driven design and
object-oriented analysis and design to provide software
development and management teams with shared tools and a
shared process to collaborate on software development.”
https://en.wikipedia.org/wiki/Behavior-driven_development
25
UBIQUITOUS LANGUAGE
“A ubiquitous language is a (semi-)formal language
that is shared by all members of a software
development team — both software developers
and non-technical personnel. The language in
question is both used and developed by all team
members as a common means of discussing the
domain of the software in question.”
https://en.wikipedia.org/wiki/Behavior-driven_development
26
CUCUMBER AND GHERKIN
• Cucumber: Test Automation Framework.
We use the JavaScript / Node implementation
https://github.com/cucumber/cucumber-js
• Gherkin: Domain-specific Language (DSL) to
describe feature specifications using a
semi-structured, more "natural language" for
non-technical stakeholders
28
OUR APPROACH
Describe conversation scenarios using an (almost)
natural language
• "Happy path" conversation regression testing
• Corner cases / Digression / "Navigation"
• Assert minimum #intent confidence
• Assert on context variables (white box)
• Goal:
create, discuss and test conversation specifications
collaboratively with technical and non-technical domain
experts.
29
SCENARIO SPECIFICATION
• Feature: Name of feature under test
• Background: setup stage before each scenario/test run
• Scenario: A single "test case"
• Steps: Indicate a sequence of executable steps in the
test case.
• The understood phrases are domain specific to our tool
(see next slides).
Feature: Customer Service – Sample

  Background:
    Given the conversation workspace is "Customer Service Assistant"
    And I start a new conversation

  # Strict text matching
  Scenario: Get directions to store
    When I ask "give me directions"
    Then Watson will respond "We're located by Union Square on the corner of 13th and Broadway"
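A minimal sketch of what an "I ask ..." step might do behind the scenes, assuming the Watson Assistant v1 message format (an `input.text` field plus a `context` object that is carried from turn to turn). The `Conversation` class and the injected `send` function are hypothetical names, not taken from the tool; a real step would call the service asynchronously over HTTP instead of using the stub shown here.

```javascript
// Hypothetical helper that keeps conversation state between steps.
class Conversation {
  // `send` is an injected function (workspace, payload) => response.
  // In a real test run it would call the Watson Assistant API (assumption).
  constructor(workspace, send) {
    this.workspace = workspace;
    this.send = send;
    this.context = {}; // context variables returned by the last turn
    this.lastResponse = null;
  }

  // Corresponds to a step such as: When I ask "give me directions"
  ask(text) {
    const payload = { input: { text }, context: this.context };
    this.lastResponse = this.send(this.workspace, payload);
    // Carry the returned context into the next turn.
    this.context = this.lastResponse.context || {};
    return this.lastResponse;
  }
}

// Stub service used only for illustration: echoes the input and counts turns.
function fakeSend(workspace, payload) {
  const turn = (payload.context.turn || 0) + 1;
  return {
    output: { text: [`echo: ${payload.input.text}`] },
    context: { turn },
  };
}

const conversation = new Conversation("Customer Service Assistant", fakeSend);
conversation.ask("give me directions");
conversation.ask("thanks");
console.log(conversation.lastResponse.output.text[0], conversation.context);
```

Injecting `send` keeps the step logic testable without network access; the later assertion steps would then inspect `lastResponse` and `context`.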
30
TEXT MATCHING
• Assert "strict" response text:
• Watson will respond "<STRING>"
• Match response text against a regular
expression:
• Watson will say something like
"<REG_EXP>"
• partial matches, ignores case
• Multiple output texts get concatenated
• Note: "I ask ..." - steps will trigger an API
request to the Watson Assistant service
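The two matching modes above can be sketched as plain functions. This is an illustration under assumptions, not the tool's implementation: the function names are hypothetical, the single-space separator used for concatenation is a guess (the slide does not say how output texts are joined), and the sample response texts are made up.

```javascript
// Multiple output texts are concatenated before matching.
// The single-space separator is an assumption.
function joinOutputs(outputTexts) {
  return outputTexts.join(" ");
}

// Strict matching: the whole response must equal the expected string.
function respondsExactly(outputTexts, expected) {
  return joinOutputs(outputTexts) === expected;
}

// Regexp matching: partial match anywhere in the response, ignoring case.
function respondsLike(outputTexts, pattern) {
  return new RegExp(pattern, "i").test(joinOutputs(outputTexts));
}

// Made-up sample response consisting of two output texts.
const outputs = [
  "From Times Square take the N train.",
  "We're located by Union Square on the corner of 13th and Broadway",
];

const strictResult = respondsExactly(outputs, "We're located by Union Square");
const regexpResult = respondsLike(outputs, "from .* take the .* we're located by");
console.log(strictResult, regexpResult);
```

The strict check fails here because the concatenated response contains more than the expected string, while the case-insensitive partial pattern matches.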
  # Regexp text matching
  Scenario: Get directions to store from Landmark
    When I ask "How do I find you coming from Times Square?"
    Then Watson will respond something like "from …* take the …* We're located by …"
31
INTENT DETECTION
• Assert that a specific #intent is
detected:
• Watson will detect that my intent is
"<intent name>"
• Assert a minimum confidence score:
• [and] have a confidence of at least
<percentage>
  # Recognize intent
  Scenario: Get connected to a human agent
    When I ask "agent, please"
    Then Watson will detect that my intent is "Connect_to_Agent"
    And have a confidence of at least 95%
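The intent assertions could be implemented roughly as follows. The sketch assumes a Watson Assistant v1 style response in which `intents` is an array sorted by descending confidence, with confidence values between 0 and 1; the function names and the sample response are hypothetical.

```javascript
// Returns the top-ranked intent, or null if none was detected.
function topIntent(response) {
  return (response.intents && response.intents[0]) || null;
}

// Watson will detect that my intent is "<intent name>"
function detectsIntent(response, name) {
  const top = topIntent(response);
  return top !== null && top.intent === name;
}

// [and] have a confidence of at least <percentage>
function hasConfidenceAtLeast(response, percentage) {
  const top = topIntent(response);
  return top !== null && top.confidence * 100 >= percentage;
}

// Made-up sample response for illustration.
const response = {
  intents: [
    { intent: "Connect_to_Agent", confidence: 0.97 },
    { intent: "Some_Other_Intent", confidence: 0.02 },
  ],
};

console.log(detectsIntent(response, "Connect_to_Agent"));
console.log(hasConfidenceAtLeast(response, 95));
```

Only the top-ranked intent is checked, so a scenario fails if the expected intent is detected but outranked by another one.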
32
TEST FOR CONTEXT VARIABLES
• Assert $context_variable:
• "context_variable" will have a value of
"<expected_value>"
• "context_variable" will be "<expected_value>"
• "context_variable" will contain
"<expected_substring>"
• "context_variable" will match
"<expected_pattern>"
• Supports simple data types (Numbers, Strings)
and JSON
• {...}: JSON objects: deep equal test or partial
matching (JSON.stringify)
  # Check context variables
  Scenario: Make an appointment
    When I ask "make an appointment"
    Then Watson will respond like "What day ?"
    And when I say "next monday"
    Then "date" will have a value of "2019-03-25"
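A possible shape for the context-variable assertions, as a sketch only. The function names are hypothetical, the sample context is made up, and the object comparison is simplified: it relies on `JSON.stringify`, so it is sensitive to key order, which a true deep-equal test would not be.

```javascript
// Serializes non-string values so they can be searched as text.
function asText(value) {
  return typeof value === "string" ? value : JSON.stringify(value);
}

// "<name>" will have a value of "<expected>" / will be "<expected>"
// `expected` is assumed to be already parsed when it is a JSON object.
function hasValue(context, name, expected) {
  const actual = context[name];
  if (actual === undefined) return false;
  if (typeof actual === "object" && actual !== null) {
    return JSON.stringify(actual) === JSON.stringify(expected);
  }
  // Step arguments arrive as strings, so compare string forms.
  return String(actual) === String(expected);
}

// "<name>" will contain "<expected_substring>"
function containsValue(context, name, substring) {
  const text = asText(context[name]);
  return text !== undefined && text.includes(substring);
}

// "<name>" will match "<expected_pattern>"
function matchesValue(context, name, pattern) {
  const text = asText(context[name]);
  return text !== undefined && new RegExp(pattern).test(text);
}

// Made-up context for illustration.
const context = {
  date: "2019-03-25",
  guests: 2,
  slot: { day: "monday", confirmed: false },
};

console.log(hasValue(context, "date", "2019-03-25"));
console.log(containsValue(context, "slot", '"day":"monday"'));
console.log(matchesValue(context, "date", "^\\d{4}-\\d{2}-\\d{2}$"));
```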
33
SCENARIO OUTLINES
• Test scenarios using <placeholders>
• Examples: table containing the placeholder
values
• One row = one test case
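Cucumber performs the outline expansion itself; the sketch below only illustrates the semantics of "one row = one test case". The function name `expandOutline`, the step templates, and the intent name `Store_Directions` are hypothetical.

```javascript
// Expands outline step templates once per example row by substituting
// each <placeholder> with the value from the matching column.
function expandOutline(stepTemplates, examples) {
  return examples.map((row) =>
    stepTemplates.map((step) =>
      step.replace(/<([^>]+)>/g, (whole, key) =>
        // Leave unknown placeholders untouched.
        Object.prototype.hasOwnProperty.call(row, key) ? row[key] : whole
      )
    )
  );
}

const steps = [
  'When I ask "<utterance>"',
  'Then Watson will detect that my intent is "<intent>"',
];

// Each object corresponds to one row of the Examples table.
const examples = [
  { utterance: "agent, please", intent: "Connect_to_Agent" },
  { utterance: "give me directions", intent: "Store_Directions" },
];

const testCases = expandOutline(steps, examples);
console.log(testCases);
```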
34
GOOD PRACTICES
Some good practices that proved
useful in building real-world
conversational applications
35
“GUARD RAILS”
• Detect deterioration and regressions, e.g.
caused by continuous (re-)training, workspace
migrations etc.
• Assert that "happy path" conversation flows
consistently work as expected
• Test correct dialog flow, e.g. digressions and
drill down/out conversation paths
• Test corner cases and fuzzy / problematic
intents
36
WHITE-BOX TEST
• Test "internals"
• Assert value of context variables after some
specific dialog steps
• Useful for checking correct filling of slots
• Confidence baselines for a set of related
inputs / intent examples
• Integration with other systems
• Not really "BDD" - test on implementation
details
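One way to read "confidence baselines" is to store the confidence observed for each utterance and flag drops after (re-)training. The sketch below assumes exactly that; the function name, the tolerance value, and the sample numbers are all made up for illustration.

```javascript
// Compares current intent confidences against a stored baseline and
// reports every utterance whose confidence dropped by more than `tolerance`
// or that is missing from the current results.
function findRegressions(baseline, current, tolerance) {
  const regressions = [];
  for (const [utterance, expected] of Object.entries(baseline)) {
    const actual = current[utterance];
    if (actual === undefined || expected - actual > tolerance) {
      regressions.push({ utterance, expected, actual });
    }
  }
  return regressions;
}

// Made-up confidences before and after a retraining run.
const baseline = { "agent, please": 0.97, "talk to a human": 0.92 };
const afterRetraining = { "agent, please": 0.96, "talk to a human": 0.71 };

const regressions = findRegressions(baseline, afterRetraining, 0.05);
console.log(regressions);
```

A tolerance leaves room for the small fluctuations that retraining tends to cause, which fits the lesson that test cases need some "relaxing".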
37
WEB APPLICATION: DEPLOYMENT & SETUP
• Docker image
• Manifests for IBM Cloud Kubernetes
service
• Persistent volume storage for
configuration and feature
specification (= test cases) data
38
RUN TEST
• Editor with syntax highlighting
• Run test
• Save & save as...
• Stderr/Stdout streamed to client
using Server-sent Events (SSE)
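Streaming runner output over SSE comes down to framing each chunk in the `text/event-stream` format: optional `event:` field, one `data:` field per line, and a blank line to end the message. The sketch below shows that framing only; the function name and the event names `stdout` / `stderr` are assumptions, and the sample output text is made up.

```javascript
// Formats one chunk of test-runner output as a Server-Sent Events message.
// Each line of the chunk becomes its own "data:" field; the trailing blank
// line terminates the event.
function toSseEvent(eventName, chunk) {
  const dataLines = chunk
    .split(/\r?\n/)
    .map((line) => `data: ${line}`)
    .join("\n");
  return `event: ${eventName}\n${dataLines}\n\n`;
}

const message = toSseEvent("stdout", "1 scenario (1 passed)\n3 steps (3 passed)");
console.log(JSON.stringify(message));
```

On the server, each such message would be written to a response with the `Content-Type: text/event-stream` header; the browser can then subscribe with `EventSource`.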
39
EMBED INTO CI/CD PIPELINE
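The script on this slide hands the connection settings to the test run via `--world-parameters`. cucumber-js parses that JSON and passes it to the World constructor as `parameters`. A minimal sketch of a World class that consumes these values follows; the class name `ConversationWorld` and the validation are hypothetical, and registering it with `setWorldConstructor` is omitted so the example runs without the cucumber package.

```javascript
// Hypothetical World class holding the connection settings for all steps.
class ConversationWorld {
  constructor({ parameters }) {
    const required = ["username", "password", "url", "version"];
    const missing = required.filter((key) => !parameters[key]);
    if (missing.length > 0) {
      // Fail early with a readable message instead of a failing API call.
      throw new Error(`Missing world parameters: ${missing.join(", ")}`);
    }
    this.credentials = {
      username: parameters.username,
      password: parameters.password,
    };
    this.url = parameters.url;
    this.version = parameters.version;
  }
}

// Simulates what the runner passes after parsing the command line,
// using the placeholder values from the script.
const parsed = JSON.parse(
  '{"username": "xxxxxx", "password": "yyyyyy", ' +
    '"url": "https://gateway-fra.watsonplatform.net/conversation/api", ' +
    '"version": "2018-02-16"}'
);
const world = new ConversationWorld({ parameters: parsed });
console.log(world.url, world.version);
```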
#!/bin/bash
# Placeholder credentials and service settings
USERNAME=xxxxxx
PASSWORD=yyyyyy
URL=https://gateway-fra.watsonplatform.net/conversation/api
VERSION=2018-02-16
PROFILE=default
# Pass the connection settings to the step definitions as world parameters
./node_modules/.bin/cucumber-js --profile "$PROFILE" \
  --world-parameters "{\"username\": \"$USERNAME\", \"password\": \"$PASSWORD\", \"url\": \"$URL\", \"version\": \"$VERSION\"}"
41
Summary
... and Lessons Learned
• Test-first: Describe the "happy path"
• Test-later: Automate test cases to avoid regressions
• Developer "white box" tests: confidence levels and "intents" behind utterances
• You will need domain experts, not just developers + requirements / chatlogs
• Test cases will evolve and need some "relaxing"
• Separate test vs. deployment instances, e.g. using API-call tagging
• Digressions, jumps and navigation (drill down / out) tend to be brittle
• Iteratively learning machines: automated testing and analytics are a must
Thank You!

Behavior-Driven-Development (BDD) for Conversational Applications

  • 1. Swiss Testing Day / DevOps Fusion 2019 Behavior-Driven Development for Conversational Applications Florian Georg, IBM IBM Watson & Cloud Platform
  • 2. 2 About Me Florian Georg IBM Cloud & Cognitive Technical Leader Cloud Platform Architect - A.I. and Data Science IBM Switzerland
  • 3. 3 AGENDA • Conversational Applications • Case Study: UBS Companion • Behaviour Driven Development and Live Specifications • Summary
  • 4. 4 CONVERSATIONAL APPLICATIONS • Natural language conversations between human and machine • Virtual Assistants, Customer Care, Information Systems, First Level support ... • Self-improvement through ongoing (re)-training and feedback loops.
  • 7. User Interface Back-end system Watson Assistant Other Cognitive Services Dialog Skill Search Skill Assistant Layer
  • 8. 13 AGENDA • Conversational Applications • Case Study: UBS Companion • Behaviour Driven Development and Live Specifications • Summary
  • 12. 17 Service Line World View Friendly, non-human avatar FIN Non-Human Avatar
  • 14. 19 „House View“: Curated, consolidated economic outlook and advise. „Serious“ Advisor Dr. Daniel Kalt Human-like Avatar
  • 15. 21 AGENDA • Conversational Applications • Case Study: UBS Companion • Behaviour Driven Development and Live Specifications • Summary
  • 16. 22 MANUAL TESTING When building a conversational application, typical first steps include repeated, manual testing of conversations as they are modelled. Problems Manual conversation testing lacks • Repeatability • Consistency • Automation • Speed • internals are not easily exposed for inspection (e.g. confidence levels, context variables...)
  • 17. 23 UNIT TESTS Teams have implemented Unit tests against API calls comparing actual vs. expected responses Problems • written and maintained by technical team • technical tests are source code, e.g. not suitable for communication with business stakeholders and domain experts
  • 18. 24 BEHAVIOR-DRIVEN DEVELOPMENT (BDD) “ Behavior-driven development combines the general techniques and principles of TDD with ideas from domain-driven design and object- oriented analysis and design to provide software development and management teams with shared tools and a shared process to collaborate on software development.” https://en.wikipedia.org/wiki/Behavior-driven_development
  • 19. 25 UBIQUITOUS LANGUAGE “A ubiquitous language is a (semi-)formal language that is shared by all members of a software development team — both software developers and non-technical personnel. The language in question is both used and developed by all team members as a common means of discussing the domain of the software in question.” https://en.wikipedia.org/wiki/Behavior-driven_development
  • 20. 26 CUCUMBER AND GHERKIN • Cucumber: Test Automation Framework. We use the JavaScript / Node implementation https://github.com/cucumber/cucumber-js • Gherkin: Domain-specific Language (DSL) to describe feature specifications using a semi- structured, more "natural language" for non- technical stakeholders
  • 21. 28 OUR APPROACH Describe conversation scenarios using an (almost) natural language • "Happy path" conversation regression testing • Corner cases / Digression / "Navigation" • Assert minimum #intent confidence • Assert on context variables (white box) • Goal: create, discuss and test conversation specifications collaboratively with technical and non-technical domain experts.
  • 22. 29 SCENARIO SPECIFICATION • Feature: Name of feature under test • Background: setup stage before each scenario/test run • Scenario: A single "test case" • Steps: Indicate a sequence of executable steps in the test case. • The understood phrases are domain specific to our tool (see next slides).
      Feature: Customer Service – Sample
        Background:
          Given the conversation workspace is "Customer Service Assistant"
          And I start a new conversation
        # Strict text matching
        Scenario: Get directions to store
          When I ask "give me directions"
          Then Watson will respond "We're located by Union Square on the corner of 13th and Broadway"
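The mapping from step phrases to executable code can be sketched in plain JavaScript. This is a minimal illustration, not the cucumber-js API and not the tool's actual implementation: `registerStep`, `runStep`, and the `world.send` function are hypothetical names, and joining output texts with a space is an assumption.

```javascript
// Minimal sketch of mapping Gherkin step text to handlers.
// registerStep/runStep are hypothetical; cucumber-js provides its own
// Given/When/Then functions for this purpose.
const steps = [];

function registerStep(pattern, handler) {
  steps.push({ pattern, handler });
}

function runStep(text, world) {
  // Strip the leading Gherkin keyword (Given/When/Then/And/But).
  const body = text.replace(/^\s*(Given|When|Then|And|But)\s+/, '');
  for (const { pattern, handler } of steps) {
    const match = pattern.exec(body);
    if (match) {
      // Captured groups become handler arguments.
      return handler(world, ...match.slice(1));
    }
  }
  throw new Error(`Undefined step: ${text}`);
}

registerStep(/^the conversation workspace is "(.*)"$/, (world, name) => {
  world.workspace = name;
});

registerStep(/^I start a new conversation$/, (world) => {
  world.context = {};
});

registerStep(/^I ask "(.*)"$/, (world, utterance) => {
  // A real implementation would call the Watson Assistant API here.
  // This sketch relies on a hypothetical world.send supplied by the caller.
  world.response = world.send(utterance, world.context);
});

registerStep(/^Watson will respond "(.*)"$/, (world, expected) => {
  // Assumption: multiple output texts are joined with a single space.
  const actual = world.response.output.text.join(' ');
  if (actual !== expected) {
    throw new Error(`Expected "${expected}" but got "${actual}"`);
  }
});
```

Each scenario line is passed to `runStep` in order with a shared `world` object, so state such as the last response carries over from the "When" step to the "Then" step.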
  • 23. 30 TEXT MATCHING • Assert "strict" response text: • Watson will respond "<STRING>" • Match response text against a regular expression: • Watson will say something like "<REG_EXP>" • partial matches, ignores case • Multiple output texts get concatenated • Note: "I ask ..." steps will trigger an API request to the Watson Assistant service
      # Regexp text matching
      Scenario: Get directions to store from Landmark
        When I ask "How do I find you coming from Times Square?"
        Then Watson will respond something like "from …* take the …* We're located by …"
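The difference between strict and regular-expression matching can be sketched as follows. The function names are hypothetical, the response shape (`output.text` as an array of strings) follows the Watson Assistant v1 message response, and the space used to concatenate multiple output texts is an assumption.

```javascript
// Sketch of the two text-matching modes; names are hypothetical.
// Assumption: multiple output texts are joined with a single space.
function joinOutput(response) {
  return response.output.text.join(' ');
}

// Strict mode: the concatenated output must equal the expected string.
function respondsExactly(response, expected) {
  return joinOutput(response) === expected;
}

// Regexp mode: the 'i' flag ignores case, and an unanchored pattern
// accepts partial matches anywhere in the concatenated output.
function respondsLike(response, pattern) {
  return new RegExp(pattern, 'i').test(joinOutput(response));
}
```

Strict matching is useful for short, fixed answers, while the regexp mode tolerates wording changes around the parts that matter.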
  • 24. 31 INTENT DETECTION • Assert that a specific #intent is detected: • Watson will detect that my intent is "<intent name>" • Assert a minimum confidence score: • [and] have a confidence of at least <percentage>
      # Recognize intent
      Scenario: Get connected to a human agent
        When I ask "agent, please"
        Then Watson will detect that my intent is "Connect_to_Agent"
        And have a confidence of at least 95%
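An intent assertion of this kind could look roughly like the sketch below. It assumes the Watson Assistant v1 response shape, where `intents` is an array of `{ intent, confidence }` objects sorted by descending confidence and `confidence` lies between 0 and 1. The function name and the returned object are hypothetical.

```javascript
// Sketch: check the top intent and an optional minimum confidence.
// Assumes `intents` is sorted by descending confidence (0..1).
function detectIntent(response, expectedIntent, minPercent = 0) {
  const top = (response.intents || [])[0];
  if (!top) {
    return { ok: false, reason: 'no intent detected' };
  }
  if (top.intent !== expectedIntent) {
    return { ok: false, reason: `expected #${expectedIntent}, got #${top.intent}` };
  }
  // Compare on the 0..1 scale to avoid rounding surprises from multiplying.
  if (top.confidence < minPercent / 100) {
    return {
      ok: false,
      reason: `confidence ${Math.round(top.confidence * 100)}% is below ${minPercent}%`,
    };
  }
  return { ok: true, reason: '' };
}
```

Returning a reason string alongside the verdict keeps failure messages readable for non-technical reviewers.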
  • 25. 32 TEST FOR CONTEXT VARIABLES • Assert $context_variable: • "context_variable" will have a value of "<expected_value>" • "context_variable" will be "<expected_value>" • "context_variable" will contain "<expected_substring>" • "context_variable" will match "<expected_pattern>" • Supports simple data types (Numbers, Strings) and JSON • {...}: JSON objects: deep equal test or partial matching (JSON.stringify)
      # Check context variables
      Scenario: Make an appointment
        When I ask "make an appointment"
        Then Watson will respond like "What day ?"
        And when I say "next monday"
        Then "date" will have a value of "2019-03-25"
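The context checks listed above can be approximated with a few small helpers. This is a sketch under assumptions: the helper names are hypothetical, the expected value arrives as text from the feature file and is parsed as JSON where possible, and partial matching on objects works on their `JSON.stringify` form as the slide suggests.

```javascript
const { isDeepStrictEqual } = require('util');

// Expected values come from the feature file as text. Try JSON first
// (numbers, objects, arrays) and fall back to the raw string.
function parseExpected(text) {
  try {
    return JSON.parse(text);
  } catch (err) {
    return text;
  }
}

// Strings are used as-is; everything else is compared via JSON.stringify.
function asText(value) {
  if (value === undefined) return '';
  return typeof value === 'string' ? value : JSON.stringify(value);
}

// "will have a value of" / "will be": deep equality on the parsed value,
// or plain string equality when the context value is itself a string.
function contextEquals(context, name, expectedText) {
  const actual = context[name];
  return (
    isDeepStrictEqual(actual, parseExpected(expectedText)) ||
    (typeof actual === 'string' && actual === expectedText)
  );
}

// "will contain": substring test on the text form.
function contextContains(context, name, substring) {
  return asText(context[name]).includes(substring);
}

// "will match": regular expression test on the text form.
function contextMatches(context, name, pattern) {
  return new RegExp(pattern).test(asText(context[name]));
}
```

Matching on the stringified form is order-sensitive for object keys, so deep equality is the safer choice whenever the full object is known.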
  • 26. 33 SCENARIO OUTLINES • Test scenarios using <placeholders> • Examples: table containing the placeholder values • One row = one test case
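The "one row = one test case" idea can be illustrated with a short sketch that expands an outline into concrete step lists. Cucumber performs this expansion itself; `expandOutline` is a hypothetical helper shown only to make the mechanism visible, and the second example utterance is invented for illustration.

```javascript
// Sketch: expand a scenario outline into concrete scenarios,
// one per row of the Examples table.
function expandOutline(outlineSteps, header, rows) {
  return rows.map((row) =>
    outlineSteps.map((step) =>
      // Replace every <placeholder> with the value from the current row.
      header.reduce(
        (text, name, i) => text.split(`<${name}>`).join(row[i]),
        step
      )
    )
  );
}

// Usage: two rows produce two test cases for the same intent.
const scenarios = expandOutline(
  [
    'When I ask "<utterance>"',
    'Then Watson will detect that my intent is "<intent>"',
  ],
  ['utterance', 'intent'],
  [
    ['agent, please', 'Connect_to_Agent'],
    ['I want to talk to a person', 'Connect_to_Agent'], // hypothetical utterance
  ]
);
```

Outlines suit intent regression checks well, because adding a new utterance means adding one table row rather than a new scenario.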
  • 27. 34 GOOD PRACTICES Some good practices that proved useful in building real world conversational
  • 28. 35 “GUARD RAILS” • Detect deterioration and regressions, e.g. caused by continuous (re-)training, workspace migrations etc. • Assert that "happy path" conversation flows consistently work as expected • Test correct dialog flow, e.g. digressions and drill down/out conversation paths • Test corner cases and fuzzy / problematic intents
  • 29. 36 WHITE-BOX TEST • Test "internals" • Assert value of context variables after some specific dialog steps • Useful for checking correct filling of slots • Confidence baselines for a set of related inputs / intent examples • Integration with other systems • Not really "BDD": tests on implementation details
  • 30. 37 WEB APPLICATION: DEPLOYMENT & SETUP • Docker image • Manifests for IBM Cloud Kubernetes service • Persistent volume storage for configuration and feature specification (= test cases) data
  • 31. 38 RUN TEST • Editor with syntax highlighting • Run test • Save & save as... • Stderr/Stdout streamed to client using Server-sent Events (SSE)
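Streaming test output over Server-sent Events could be wired up roughly as below. This is a sketch and not the web application's actual code: `toSseFrame` and `streamProcess` are hypothetical names, and the event names `stdout`, `stderr`, and `exit` are assumptions. `child` is expected to behave like a Node `child_process` handle and `res` like an `http.ServerResponse`.

```javascript
// Sketch: format a chunk of process output as one SSE frame.
// Each line of the payload gets its own "data:" field, and a blank
// line terminates the event.
function toSseFrame(eventName, chunk) {
  const lines = String(chunk).split(/\r?\n/);
  const data = lines.map((line) => `data: ${line}`).join('\n');
  return `event: ${eventName}\n${data}\n\n`;
}

// Sketch: forward stdout/stderr of a running test process to the client.
function streamProcess(child, res) {
  res.writeHead(200, {
    'Content-Type': 'text/event-stream',
    'Cache-Control': 'no-cache',
  });
  child.stdout.on('data', (chunk) => res.write(toSseFrame('stdout', chunk)));
  child.stderr.on('data', (chunk) => res.write(toSseFrame('stderr', chunk)));
  child.on('close', (code) => {
    // Report the exit code as a final event, then close the stream.
    res.write(toSseFrame('exit', code));
    res.end();
  });
}
```

On the browser side, an `EventSource` can subscribe to the named events and append each payload to the output pane as it arrives.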
  • 32. 39 EMBED INTO CI/CD PIPELINE
      #!/bin/bash
      USERNAME=xxxxxx
      PASSWORD=yyyyyy
      URL=https://gateway-fra.watsonplatform.net/conversation/api
      VERSION=2018-02-16
      PROFILE=default
      ./node_modules/.bin/cucumber-js --profile $PROFILE --world-parameters '{"username": "'$USERNAME'", "password": "'$PASSWORD'", "url": "'$URL'", "version":"'$VERSION'"}'
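Splicing credentials into a JSON string by hand breaks as soon as a value contains a quote or a space. One possible alternative, sketched here, is to build the `--world-parameters` argument with `JSON.stringify` in a small Node wrapper. `buildCucumberArgs` is a hypothetical helper; the environment variable names mirror the script above.

```javascript
// Sketch: build the cucumber-js argument list from environment variables.
// JSON.stringify takes care of escaping special characters in the values.
function buildCucumberArgs(env) {
  const worldParameters = {
    username: env.USERNAME,
    password: env.PASSWORD,
    url: env.URL,
    version: env.VERSION,
  };
  return [
    '--profile', env.PROFILE || 'default',
    '--world-parameters', JSON.stringify(worldParameters),
  ];
}

// Possible usage in a pipeline step (not executed here):
//   const { spawnSync } = require('child_process');
//   const result = spawnSync('./node_modules/.bin/cucumber-js',
//     buildCucumberArgs(process.env), { stdio: 'inherit' });
//   process.exit(result.status);
```

Passing the arguments as an array also avoids shell quoting entirely, and reading the credentials from the environment keeps them out of the script itself.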
  • 33. 40 AGENDA • Conversational Applications • Case Study: UBS Companion • Behaviour Driven Development and Live Specifications • Summary
  • 34. 41 Summary ... and Lessons Learned • Test-first: Describe “happy path” • Test-later: Automate test cases to avoid regressions • Developer “white box” tests: confidence levels and “intents” behind utterances • You will need domain experts, not just developers + requirements / chatlogs • Test cases will evolve and need some “relaxing” • Separate test vs. deployment instances, e.g. using API-call tagging • Digressions, jumps and navigation (drill down / out) tend to be brittle • Iteratively learning machines: automated testing and analytics are a must