With each passing day, our relationship with computers grows more personal. The touch of a human hand has replaced the mouse, and conversational interfaces now seem set to replace all manner of button or conventional interface. Is this pure hype, or a true step change in the evolution of personal computing?
In this workshop we will look at the current state of conversational interfaces, the challenges and benefits they bring, and where things are heading.
DESIGNING FOR—AND BEYOND
—BOTS AND AGENTS
September 23 @nextconf
“Machines should work; people should think”, an excerpt from
the Jim Henson Company’s 1967 video “Paperwork Explosion”.
"The dream of conversational interfaces
is that they will finally allow humans to
talk to computers in a way that puts the
onus on the software—not the user—to
figure out how to get things done.”
— FastCompany, Conversational Interfaces, explained
A conversational interface is a program that you primarily
interact with through a back-and-forth dialog—using either voice
or text—instead of a more traditional graphical UI.
…at least, that’s how we think of them today.
What is a conversational UI (CUI)?
The two most common types of CUI are currently (text-based)
chatbots and (mostly voice-based) AI assistants. But there
are already many variations on this theme.
What kind of CUIs are there?
“The introduction of bots to
Facebook and other platforms has
been overhyped—and the bots
themselves often aren't very
good…[many] aren’t nearly as good
as the native apps they were
designed to replace.”
— Facebook Messenger chief David Marcus
Is this bot thing just hype?
Right now…maybe. :) There *is*
a lot of hype, and many bots
are barely useful.
But it’s important to consider
why bots and AI assistants
exist today, as this can help us
understand where they go in
Chatbots are not a new invention, and neither are AI assistants.
The much-hated Clippy was
annoying because it
promised a smart, helpful
assistant, yet wasn’t
sophisticated enough to
deliver on that promise.
Developed at MIT, the most famous Eliza script was
DOCTOR, a simulation of a Rogerian psychotherapist.
We’ve been here before…
The reason conversational
interfaces may finally go
mainstream is that we’ve
reached a combination of
human and technological
tipping points that have
created new opportunities
•cloud computing + data
The past few years have seen big
advances in artificial intelligence
and machine learning technologies.
These technologies enable key
aspects of CUIs, such as automatic
speech recognition (which converts
voice to text) and natural language
processing (which determines an
input’s meaning).
An example of language parsing and processing using
Facebook’s open source wit.ai
Cloud computing + data
The widespread availability of
low-cost, “infinite storage”
through cloud computing led to a
big data explosion, and greatly
reduced the cost of the intensive
computation needed to run
(Many popular machine learning
APIs are in fact now combined
with a cloud offering).
cloud-based machine learning and
AWS cloud computing and cloud-based
cloud-based machine learning
Mobile is everywhere
Number of mobile internet device subscriptions worldwide (in billions)
Mobile now reaches half the
worldwide population, with
the largest recent and
projected gains in Asia and
countries outside Europe
and N. America.
This demographic change is
important as a mobile is
often the ﬁrst or only
computer these new
internet users will own.
For many mobile-ﬁrst users, social and messaging apps are a primary window
onto the internet. In fact—many even believe these apps are the internet.
kik (N America)
And if you use mobile, you use messaging
Source: Why Southeast Asia is Leading the world’s most disruptive business models
• find a social vendor
• browse products
• inquire via messaging (often using another app)
• get payment details (digital or otherwise)
• ship and confirm payment
These messaging apps were in fact the ﬁrst prototypes of ‘conversational commerce’—
ad-hoc experiences assembled by users to meet a need.
“Most smartphone users download zero apps per month” - Quartz
Fewer apps used per month
of time spent on mobile is within
ﬁve non-native apps
Most download zero apps per month
These trends are colliding with a growing app fatigue. Although time
spent in apps is up, most people primarily use just a few apps, and
many of these are messaging apps.
“Only ﬁve apps see heavy use” - TechCrunch
AI assistants are services whose
job is to serve as an enabler for
different types of interactions.
Their primary means of input
tends to be voice, but a user’s
mobile is often used to output
more complex data and
(can be voice + screen)
Most assistants have a collection of core behaviours—such as
fetching the time, setting an alarm, or sending an email—but most
are also platforms.
Just a few of ‘Ok Google’s’ core behaviours
With each new brand that
creates a service for the
platform, the assistant (and
therefore its users) gains a
new set of skills*.
*Amazon (shown right) actually calls these
skills. Other platforms will have different names
Third party ‘skills’
Bots are small services
that you ‘chat’ with
through a text interface
such as Facebook
Messenger or SMS.
Chatbots (…or Bots)
The Taco Bell tacobot for Slack
Some bots are standalone products,
while others aim to provide a subset of
tasks from a larger service.
In this sense, bots are similar to the
‘skills’ found within assistants: single-
domain micro-applications that help
users complete a range of tasks related
to an activity—such as booking a flight
or ﬁnding an apartment.
Trim is a personal ﬁnance bot with a very
simple value proposition—help you save
money by keeping an eye on where and
how you spend.
The Expedia bot enables users to search for
hotels, and book them using expedia.com.
There are already quite a few
hybrid approaches. Facebook
M, for example, is an AI
assistant that uses text chat
instead of voice.
More importantly, however, it’s
one of a growing number of
services that combine
automation with ‘humans in
the loop’.
“Hi! I’m M, your
personal assistant in
Facebook M has human
trainers who silently
supervise, and take over
assistants get to know
their clients to better
curate products to their
Clara, a scheduling AI
is supported by
Hopefully not :-)
There are many contexts where
we will still need a more traditional
graphical UI—either because the
task is just too graphical in nature,
or just because a bot doesn’t
really add to the experience.
Will everything become a bot or CUI?
These apps may however
soon have bots of their own.
interfaces are starting to
appear within more complex
apps that could beneﬁt from
smart, human-guided use of
Embedded, assistive AIs
While not (yet) conversational, the Google Sheets Explore
panel acts as an assistant that proactively suggests
alternate data renderings for your spreadsheet.
An AI whose job is to watch
•to proactively problem solve,
•suggest more effective
ways to complete a task,
•provide a more ‘human’
interface through which to
collaborate (with other
people, or other bots).
Crystal provides ‘personality profiles’ for contacts, and
helps you better communicate with them.
…hence all the hype :)
The promise of conversational apps appears huge:
•more human and personal than a GUI
•faster and simpler to use…if the context is right
•low commitment, ephemeral…closer to the web than apps
•mobile ‘native’…born of, and uniquely suited to mobile
e.g. interaction models, contexts of use, use of sensors
…maybe one day a
merely a tool
Although bots are zero-install
(and ‘skills’ for assistant
platforms are broadly similar),
users still have to know the
service exists before they can
enable or interact with it.
In this sense, we’ve somewhat
replaced the app store discovery
problem with a bot store
discovery problem :(
Thankfully, some platforms
already offer tools that make it
easy to share a bot or embed
just-in-time discovery within
(This will hopefully become standard
practice, and make bots more similar to
websites than traditional apps).
Just in time discovery plugins
Facebook web plugins enable users
to initiate a chat conversation, or
pass information to Messenger for
Share a bot
Share Telegram and
Facebook Messenger bots
using a hyperlink*.
*A URL opens in any browser, but Messenger and Telegram bots only
function within those apps. A shame that there isn’t further interoperability.
Just in time discovery isn’t
limited to digital platforms. A
key enabler within WeChat is QR
codes—which are often used to
initiate or complete an offline-to-
online (O2O) interaction.
kik, Facebook Messenger and
Snapchat offer similar 2D codes,
which users can scan to follow a
brand, or initiate a conversation.
...in Korea, grocery
stores are embedded on
subway platforms where
users scan QR codes to
buy items that are
delivered just-in-time for
KLM embeds Messenger plugins
at various stages:
• ticket purchase,
• boarding pass retrieval
Users who opt-in, then receive their
conﬁrmation, check-in notice, boarding pass
and flight status updates via Messenger.
Today (and for the foreseeable
future) bots and AI assistants will
remain pretty simple. Today’s
services are good at answering
simple questions, and are best
suited to completing simple,
If your bot promises more than this,
it will likely disappoint, and this is as
much due to human factors as
CUI proponents often
compare them to gesture
and touch based interfaces.
Interfaces that are ‘natural’
because most people
already know how to scroll,
swipe, speak or type.
‘Natural’ but not
While they may at ﬁrst glance
seem intuitive, ‘natural’
interaction models often
share similar challenges.
If, for example, a gesture is
completely new, it will have
to be taught, and may be
hard to discover on its own.
Dash by Bragi “a discrete personal
assistant right in your ear”
Gesture: activate touch lock
Gesture: deactivate touch lock
Similarly, if you don’t know
what a bot or AI assistant can
do, or how to properly ask, you
can waste a lot of time
The simpler the bot, the easier
it will be for users to quickly
build a conceptual model of
what it can do.
This is particularly critical for voice-only services
as there’s no screen to refer to.
The majority of bots are also still
powered by rules (not that
different from the decision trees
we’ve used for years in telephone
And although chats look like a
conversation, the bot is simply ‘slot-
ﬁlling’—asking the necessary questions
to formulate a query with set
It can only understand certain
questions, and respond with specific,
pre-chosen commands. If a user says
the wrong thing, it won’t know what she
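The slot-filling pattern described above can be sketched in a few lines. This is a minimal illustration, not any particular bot's implementation: the slot names, prompts, and validation rules are all hypothetical, and a real bot would sit behind a messaging platform rather than a plain function call.

```python
import re

# Hypothetical validators: each returns a cleaned value, or None if the
# input does not match what the bot is able to understand.
def parse_city(text):
    text = text.strip()
    return text.title() if re.fullmatch(r"[A-Za-z ]+", text) else None

def parse_date(text):
    text = text.strip()
    return text if re.fullmatch(r"\d{4}-\d{2}-\d{2}", text) else None

def parse_nights(text):
    text = text.strip()
    return int(text) if re.fullmatch(r"[0-9]+", text) and int(text) > 0 else None

# The "slots" the bot must fill before it can run its query (illustrative).
SLOTS = [
    ("city", "Which city are you travelling to?", parse_city),
    ("check_in", "What date do you want to check in (YYYY-MM-DD)?", parse_date),
    ("nights", "How many nights will you stay?", parse_nights),
]

def next_turn(filled, user_text=None):
    """Advance the dialog by one turn and return (bot_reply, filled_slots)."""
    filled = dict(filled)
    pending = [slot for slot in SLOTS if slot[0] not in filled]
    if user_text is not None and pending:
        name, prompt, parse = pending[0]
        value = parse(user_text)
        if value is None:
            # The bot only understands answers that fit the current slot.
            return "Sorry, I didn't understand that. " + prompt, filled
        filled[name] = value
        pending = pending[1:]
    if pending:
        return pending[0][1], filled
    summary = "Searching hotels in {city} from {check_in} for {nights} night(s)."
    return summary.format(**filled), filled
```

Calling `next_turn({})` returns the first prompt; each later call passes the user's answer and the slots filled so far. An answer that does not fit the current slot (for example "tomorrow" for the date) simply triggers the same question again, which is the brittleness the slide describes.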
Bots that use elements of
machine learning may go a step
further, as they can begin to
Users can therefore be less
speciﬁc with their commands, and
the system can generate its own
its vocabulary over time.
Next up…machine learning
Image: Isazi consulting
*to a degree; you can’t yet expect full fluency from any of these systems
The most useful and successful
bots (even fairly complex ones)
have one job.
They also solve real,
demonstrable problems (and
ideally, something for which a
much better alternative doesn’t
Give the bot one job
This extremely simple bot identiﬁes images.
The problem the bot solves
should be easy to convey, simple
to understand, and (hopefully)
include steps that users may be
able to guess on their own.
Bots that leverage mobile
(camera, sensors, notiﬁcations
etc.) to simplify tasks, will often
be particularly useful.
Energy company account bot
• receive monthly bills
• check balance
• get monthly reminders to submit
a meter reading
• snap a photo of the meter to
send your reading (or type it in)
Use any means available to
help users quickly understand
what they can do.
Most bots are zero-install, but users
still see a bit of information before
they begin a chat.
Facebook Messenger for example,
provides an introductory screen where
you can set basic assumptions:
• how fast does the bot respond?
• what does the bot do?
• what can you ask?
• what personal data will it see?
It’s also good practice to welcome users with a few prompts describing
the most likely starting point, and what information the bot will need
to complete that request.
I might get
This is my job
Start like this
Here are terms I
can ﬁlter by
Can I interest you in
this useful thing?
The more constrained or well understood the task—for example booking a train
ticket—the more likely users will make correct assumptions of their own. This
is less likely if your bot does something new or bespoke to your service.
A known/ﬁxed task?
Trim, the personal ﬁnance bot “can show you a few ways to save money”. Because
‘saving money’ isn’t binary…it must then explain what this means.
Platforms such as Facebook
Messenger, Telegram, and
Slack also enable you to
include custom buttons and
keyboards (in Telegram only)
that allow for faster and
more accurate input.
Facebook quick reply buttons
Telegram custom keyboard
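As a rough sketch of what these constrained inputs look like to a developer, the functions below build the message bodies a bot might send. The field names follow the Telegram Bot API (`reply_markup`, `keyboard`) and the Messenger Send API (`quick_replies`) as publicly documented, but treat them as assumptions and check the current platform reference before relying on them. The helper names, IDs, and option labels are hypothetical, and nothing is sent over the network here.

```python
import json

def telegram_keyboard(chat_id, text, options):
    """Body for a Telegram sendMessage call with a one-row custom keyboard."""
    return {
        "chat_id": chat_id,
        "text": text,
        "reply_markup": {
            # One inner list per keyboard row; here all options share a row.
            "keyboard": [[{"text": option} for option in options]],
            "resize_keyboard": True,
            "one_time_keyboard": True,
        },
    }

def messenger_quick_replies(recipient_id, text, options):
    """Body for a Messenger Send API call with text quick-reply buttons."""
    return {
        "recipient": {"id": recipient_id},
        "message": {
            "text": text,
            "quick_replies": [
                {
                    "content_type": "text",
                    "title": option,
                    # Payload format is our own choice (hypothetical).
                    "payload": option.upper().replace(" ", "_"),
                }
                for option in options
            ],
        },
    }

if __name__ == "__main__":
    body = messenger_quick_replies("USER_ID", "Submit a meter reading?", ["Yes", "Not now"])
    print(json.dumps(body, indent=2))
```

Because the user taps an option instead of typing, the bot receives one of a known set of values and does not have to parse free text.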
RESTRICTING TASKS AT PLATFORM-LEVEL
Apple has restricted third-party apps
within Siri to six domains: ride booking,
messaging, photo and video, payments,
VoIP and workouts.
This helps set expectations, as users are
(a bit) less likely to ask Siri for something
outside these categories.
Users also enjoy better UX as Apple can
gradually release and optimize
vocabularies for each domain.
Third-party apps in Siri
Bots shouldn’t attempt
to replace what is best
left to a traditional
(…and if they do, they maybe shouldn’t use
Poncho the weather bot as a role model)
glanceable, easy to understand
despite high information density
They also shouldn’t
attempt to replace things
that humans are really
Computers are really good at…
• data retrieval, sorting, ﬁltering
• complex maths,
• parsing vast datasets
• doing this over and over (they won’t get bored or frustrated)
Computers are getting better at…
• analyzing human sentiment
• understanding intent outside set domains or vocabularies
• determining content and context of images, video etc.
Computers are incapable of…
• emotional intelligence
• human reasoning
• (un-scripted) persuasion
• actual conversation!
(…a partial list in all cases)
There are also very basic
aspects of ‘real’ human
conversation that computers
still struggle with.
This includes maintaining the
scope of a conversation,
together, and differentiating a
new question from a follow-on
question.
Source: @jonesabi
This can be particularly aggravating with text chat, as there’s a visual
record of the conversation. It’s therefore easy for users to assume the
bot ‘knows’ everything that’s been said.
In the case of Facebook M and personal
assistants like x.ai, providing human
assistance in tandem with automation
may be purely tactical.
"M is a human-trained system:
Human operators evaluate the
AI's suggested responses, and
then they produce responses
while the AI observes and
learns from them.”
— Facebook AI Research
Other reasons to involve humans
Take over complex tasks
that can’t be automated
• “plan a birthday party”
Offer services that can’t
yet be automated
• APIs often don’t yet exist
for one AI or service to
interface with another
Generate usage data
• clarify key use cases to
inform the product
• to train the AI
BUILT-IN HUMAN ROUTING BASED ON CONTEXT
Edward’s design was informed by a
deep understanding of typical
guest queries. The goal was to
automate the most common and
routine queries, to free up front desk
staff for face to face interactions.
what cuisine does your restaurant serve?
Tuesday, 8:30 pm
please send me some ice
Tuesday, 8:00 pm
please don’t clean my room today
Tuesday, 7:45 am
what time do I need to check out?
Wednesday, 7:00 am
can you send me more towels?
Tuesday, 7:12 am
I’d like a paper delivered to my room
Monday, 6:00 pm
Hi…i’m Edward, Radisson Blu
Edwardian’s virtual host
Monday, 4:09 pm
“We were intrigued to ﬁnd out
how many different questions
a guest can have during a stay:
153 to be precise”
— Tobias Goebel, Aspect software
Edward, the virtual host
Edward handles routine questions,
and automatically routes more
complex requests to appropriate staff.
Source: Aspect software
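One way to picture this kind of routing is a small dispatcher that answers what it can and hands everything else to a named team. The sketch below is purely illustrative and is not how Edward is built: the intents, keywords, canned answers, and staff queues are all invented for the example.

```python
import re

# Hypothetical intents, each triggered when all of its keywords appear.
INTENTS = [
    ("checkout_time", {"check", "out"}),
    ("cuisine", {"cuisine"}),
    ("towels", {"towels"}),
    ("ice", {"ice"}),
]

# Routine questions the bot answers itself (answers are made up).
ANSWERS = {
    "checkout_time": "Check-out is at 11:00 am.",
    "cuisine": "Our restaurant serves modern British food.",
}

# Requests that need a person, mapped to the team that handles them.
STAFF_QUEUES = {
    "towels": "housekeeping",
    "ice": "room service",
}

def route(message):
    """Return ("bot", answer) or ("staff", queue_name) for a guest message."""
    words = set(re.findall(r"[a-z]+", message.lower()))
    for intent, keywords in INTENTS:
        if keywords <= words:  # every keyword is present as a whole word
            if intent in ANSWERS:
                return ("bot", ANSWERS[intent])
            return ("staff", STAFF_QUEUES[intent])
    # Anything unrecognised goes to a human rather than a dead end.
    return ("staff", "front desk")
```

The fallback matters as much as the matches: a message the bot cannot classify is routed to the front desk instead of producing an "I don't understand" loop.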
Universal template for
Despite your best efforts,
users will get stuck, or
need help that’s beyond
the bot’s capabilities.
Always build in easy and
intuitive ways for users to
quit a task, start over, or
speak to a person*.
*even if the response is not immediate
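A simple way to guarantee these exits is to check every incoming message against a short list of global commands before any task-specific handling runs. The sketch below assumes a per-user `session` dictionary; the command words, replies, and session keys are hypothetical.

```python
# Global commands recognised at any point in a conversation (illustrative).
ESCAPES = {
    "quit": "quit",
    "cancel": "quit",
    "stop": "quit",
    "start over": "restart",
    "restart": "restart",
    "human": "human",
    "agent": "human",
    "help": "human",
}

def handle(message, session):
    """Process one message, giving escape commands priority over the task."""
    command = ESCAPES.get(message.strip().lower())
    if command == "quit":
        session.clear()
        return "OK, I've cancelled that."
    if command == "restart":
        session.clear()
        session["step"] = 0
        return "No problem, let's start again."
    if command == "human":
        session["handoff"] = True
        return "I'll pass you to a person. A reply may take a little while."
    # Otherwise carry on with the normal task flow (stubbed out here).
    session["step"] = session.get("step", 0) + 1
    return "(continuing the task)"
```

Checking the escape list first means a user who types "start over" halfway through a booking is never misread as answering the current question.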
Shown when you ﬁrst open the bot.
Nice! But you may forget it’s there.
I tried this, to see what would happen,
and was pleasantly surprised. Nice!
From Bot to human…
Users see this when they
directly message customer
service out of hours.
From human to bot…
A few nice examples…
“Pretending that bots are humans is
impersonal. If customers are in conversation
with an entity that they think is a person, but
then realise through inevitable technical
limitations that it is in fact a bot, how do you
imagine they will feel?
And how could that feeling ever be good for
— Paul Adams, Bots vs. humans
While it’s good practice
to enable users to
switch from human to
process may not be in
your best interest.
This is down to trust, but also our tendency to anthropomorphize;
to attribute human characteristics to animals, inanimate objects,
or natural phenomena.
“[iRobot] regularly received calls asking for help to
ﬁx “Rosie” or “Seamus” or “Floorence”. Customers
expressed concern when iRobot told them to mail
in their Roomba, and receive a new one in return—
as they might with another small appliance.
…They didn’t want a new vacuum…they wanted
“Rosie” to be ﬁxed—or more to the point, healed.”
- Colin Angle, CEO of iRobot
Anthropomorphism isn’t completely understood, but can occur even
if the object has no recognizable human form.
…it can even occur
when a ‘thing’ has no
physical form at all.
As people are likely to
attribute human qualities to
your bot regardless, you
should consider what kind of
personality you’d like it to have.
“Bots are personas, whether or not
it’s intended. Every participant will
project an identity onto the bot, its
gender and personality — whether
or not it has been created
intentionally by the design team.”
— Chatbots ultimate prototyping tool, IDEO
Personality can be tricky to get
right. A common problem is to
misjudge how much personality
may be too much—and in what
Jokes may be OK for this weather
bot, but would be exasperating if
this were an airline bot with a flight
Facebook is trying to seem friendly,
but if the context is wrong, it just feels
weird (Zach is Scott’s son).
“…we had to outlaw Howdy’s bots
from asking rhetorical questions
‘because people expect to respond
to them, even though the bot was
just being polite’.”
— FastCompany, Designing chatbot personalities
Cultural and social norms
Politeness can be deeply cultural, and
consumers in certain markets may feel
particularly compelled to respond.
Culture, social norms, and the user’s personal context are also factors.
People often experiment with a
bot (either to understand what
it can do, or just for fun).
Anticipating these questions is
a nice way to develop the bot’s
personality in a more neutral
context (i.e. users aren’t actively
trying to ‘get things done’…so
may be more open to chit chat).
Google Assistant, within Allo
Communicating with services
on a private device, and in a
more personal context, also
changes our expectations.
Any brand or organization
entering this space should
consider whether this may
create entirely new, and
Source: Washington Post (March 2016)
Siri’s response to
‘I was raped’…
“I don’t know what that
means. If you like, I can
search the Web for ‘I was
Samsung S Voice:
‘I am depressed’…
“Maybe it’s time for you to
take a break and get a
change of scenery.”
Society is still coming to terms
with what this means, and
where the responsibility may
lie in these complex, and very
A complicating factor is that,
some software is no longer
taught what to say—it simply
decides on its own*.
*based on input from millions of users with
many thanks to the
amazing photographers on
Presentation deck available @