My talk at Swiss Testing Day 2019: Use BDD, Cucumber and IBM Watson Assistant to build, specify and test conversational applications (like Chatbots). Case study with the UBS Innovation Lab, IBM and FaceMe on Virtual Avatars.
CONVERSATIONAL APPLICATIONS
• Natural language conversations between human
and machine
• Virtual Assistants, Customer Care, Information
Systems, First-Level Support ...
• Self-improvement through ongoing (re)-training and
feedback loops.
MANUAL TESTING
When building a conversational application, typical first steps
include repeated, manual testing of conversations as they are
modelled.
Problems
Manual conversation testing lacks
• Repeatability
• Consistency
• Automation
• Speed
• Visibility into internals (e.g. confidence levels,
context variables ...)
UNIT TESTS
Teams have implemented unit tests against API calls,
comparing actual vs. expected responses.
Problems
• Written and maintained by the technical team
• Technical tests are source code, i.e. not suitable for
communication with business stakeholders and
domain experts
BEHAVIOR-DRIVEN DEVELOPMENT (BDD)
“Behavior-driven development combines the general techniques
and principles of TDD with ideas from domain-driven design and
object-oriented analysis and design to provide software
development and management teams with shared tools and a
shared process to collaborate on software development.”
https://en.wikipedia.org/wiki/Behavior-driven_development
UBIQUITOUS LANGUAGE
“A ubiquitous language is a (semi-)formal language
that is shared by all members of a software
development team — both software developers and
non-technical personnel. The language in question is
both used and developed by all team members as a
common means of discussing the domain of the
software in question.”
https://en.wikipedia.org/wiki/Behavior-driven_development
CUCUMBER AND GHERKIN
• Cucumber: test automation framework.
We use the JavaScript / Node implementation:
https://github.com/cucumber/cucumber-js
• Gherkin: a domain-specific language (DSL) for
describing feature specifications in a semi-structured,
more "natural" language accessible to non-technical
stakeholders
OUR APPROACH
Describe conversation scenarios in (almost) natural
language
• "Happy path" conversation regression testing
• Corner cases / digression / "navigation"
• Assert minimum #intent confidence
• Assert on context variables (white box)
• Goal:
create, discuss and test conversation specifications
collaboratively with technical and non-technical domain
experts.
SCENARIO SPECIFICATION
• Feature: Name of feature under test
• Background: setup stage before each scenario/test run
• Scenario: A single "test case"
• Steps: the sequence of executable steps in the
test case
• The step phrases understood are specific to our tool
(see next slides)
Feature: Customer Service – Sample
Background:
Given the conversation workspace is "Customer Service Assistant"
And I start a new conversation
# Strict text matching
Scenario: Get directions to store
When I ask "give me directions"
Then Watson will respond "We're located by Union Square on the
corner of 13th and Broadway"
TEXT MATCHING
• Assert "strict" response text:
• Watson will respond "<STRING>"
• Match response text against a regular
expression:
• Watson will say something like
"<REG_EXP>"
• partial matches, ignores case
• Multiple output texts get concatenated
• Note: "I ask ..." steps trigger an API
request to the Watson Assistant service
# Regexp text matching
Scenario: Get directions to store from Landmark
When I ask "How do I find you coming from Times Square?"
Then Watson will respond something like
"from …* take the …* We're located by …"
INTENT DETECTION
• Assert that a specific #intent is
detected:
• Watson will detect that my intent is
"<intent name>"
• Assert a minimum confidence score:
• [and] have a confidence of at least
<percentage>
# Recognize intent
Scenario: Get connected to a human agent
When I ask "agent, please"
Then Watson will detect that my intent is "Connect_to_Agent"
And have a confidence of at least 95%
TEST FOR CONTEXT VARIABLES
• Assert $context_variable:
• "context_variable" will have a value of
"<expected_value>"
• "context_variable" will be "<expected_value>"
• "context_variable" will contain
"<expected_substring>"
• "context_variable" will match
"<expected_pattern>"
• Supports simple data types (numbers, strings)
and JSON
• JSON objects ({...}): deep-equal test or partial
matching (via JSON.stringify)
# Check context variables
Scenario: Make an appointment
When I ask "make an appointment"
Then Watson will respond like "What day ?"
And when I say "next monday"
Then "date" will have a value of "2019-03-25"
SCENARIO OUTLINES
• Test scenarios using <placeholders>
• Examples: a table containing the placeholder
values
• One row = one test case (see the sketch below)
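A minimal sketch of such an outline, built from the step phrases on the
previous slides; the utterances, the intent name "Make_Appointment" and
the confidence value are illustrative, not taken from the actual test suite:
# Scenario outline: one row = one test case (values are illustrative)
Scenario Outline: Recognize appointment requests
When I ask "<utterance>"
Then Watson will detect that my intent is "Make_Appointment"
And have a confidence of at least 80%
Examples:
| utterance |
| make an appointment |
| I need to see an advisor |
| can I book a meeting? |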
“GUARD RAILS”
• Detect deterioration and regressions, e.g.
caused by continuous (re-)training, workspace
migrations etc.
• Assert that "happy path" conversation flows
consistently work as expected
• Test correct dialog flow, e.g. digressions and
drill-down/out conversation paths (see the
sketch below)
• Test corner cases and fuzzy / problematic
intents
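As an illustration, a digression check can be phrased with the same step
vocabulary; the dialog content, the intent name "Opening_Hours" and the
response pattern below are hypothetical:
# Digression: leave and return to the appointment flow (hypothetical dialog)
Scenario: Digress to opening hours while booking an appointment
When I ask "make an appointment"
Then Watson will respond like "What day ?"
And when I say "wait, what are your opening hours?"
Then Watson will detect that my intent is "Opening_Hours"
And Watson will respond something like "open .*"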
WHITE-BOX TEST
• Test "internals"
• Assert the value of context variables after
specific dialog steps (see the sketch below)
• Useful for checking correct filling of slots
• Confidence baselines for a set of related
inputs / intent examples
• Integration with other systems
• Not really "BDD": tests depend on implementation
details
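A sketch of such a slot-filling check, reusing the context-variable steps
shown earlier; the utterance, slot names and expected values are made up
for illustration:
# White box: assert slot filling via context variables (illustrative)
Scenario: Fill appointment slots in one utterance
When I ask "make an appointment next monday at 10 am"
Then "date" will have a value of "2019-03-25"
And "time" will have a value of "10:00"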
WEB APPLICATION: DEPLOYMENT & SETUP
• Docker image
• Manifests for IBM Cloud Kubernetes
service
• Persistent volume storage for
configuration and feature
specification (= test case) data
RUN TEST
• Editor with syntax highlighting
• Run test
• Save & save as...
• stdout/stderr streamed to the client
using Server-Sent Events (SSE)
Summary
... and Lessons Learned
• Test-first: describe the "happy path"
• Test-later: automate test cases to avoid regressions
• Developer "white box" tests: confidence levels and "intents" behind utterances
• You will need domain experts, not just developers plus requirements / chat logs
• Test cases will evolve and need some "relaxing"
• Separate test vs. deployment instances, e.g. using API-call tagging
• Digressions, jumps and navigation (drill down / out) tend to be brittle
• Iteratively learning machines: automated testing and analytics are a must