- Learn how to "usability test" AI interactions with humans and measure success
- Understand the two distinct ways humans construct commands to AI systems, and how physiological measurements can capture human reactions to the systems' responses
Description
John Whalen explores the concept of cognitive design, describing how humans structure their commands to AI systems (syntax, word usage, prosody) and how to measure human reactions to AI responses using biometrics (facial emotion recognition, heart rate, GSR). Along the way, John shares insights into how to optimally architect the customer experience.
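Biometric reactions like these can be quantified directly from the sensor stream. As a minimal sketch (not from the talk), assuming GSR samples in microsiemens at a fixed sampling rate, the hypothetical function below flags whether a skin-conductance response follows an AI reply; the 0.05 uS threshold and the 1-5 s response window are conventional illustrative choices, not values from the study.

```python
# Minimal sketch: flag a skin-conductance response (SCR) after an AI reply.
# Assumes `gsr` is a list of conductance samples (microsiemens) at a fixed
# rate; threshold and window are illustrative assumptions, not study values.

def scr_after_stimulus(gsr, sample_rate_hz, stimulus_idx,
                       window_s=(1.0, 5.0), threshold_us=0.05):
    """Return True if conductance rises at least `threshold_us` above the
    value at stimulus onset within the response window after the stimulus."""
    baseline = gsr[stimulus_idx]  # conductance at stimulus onset
    start = stimulus_idx + int(window_s[0] * sample_rate_hz)
    end = stimulus_idx + int(window_s[1] * sample_rate_hz)
    window = gsr[start:end + 1]
    return bool(window) and max(window) - baseline >= threshold_us

# Synthetic trace: flat baseline, then a small conductance rise ~1-2 s
# after the stimulus at index 7 (4 Hz sampling).
trace = [2.00] * 8 + [2.00, 2.01, 2.03, 2.08, 2.12, 2.10, 2.06, 2.03] + [2.01] * 8
print(scr_after_stimulus(trace, sample_rate_hz=4, stimulus_idx=7))  # True
```

A real pipeline would also smooth the signal and reject motion artifacts; this only shows the shape of the per-trial measurement.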
John offers an overview of the results of an evaluation of four major AI systems (Siri, Cortana, Alexa, and Google Assistant), tested by the young and the old, those new to AI systems and those who use these tools every day, native and non-native speakers, and techies and non-techies. Each participant was asked to interact with the systems to request facts, complex information, jokes, commands, and calendar information while the evaluators recorded the commands, the AI's responses, and the participants' physiological reactions to those responses (facial emotion, heart rate, and GSR).
There were several intriguing findings:
- There were two distinct ways humans constructed commands for the AI systems.
- The testers’ favorite AI systems were not always the ones that performed the best in terms of giving correct answers.
- There was a distinct physiological signature associated with a positive experience.
John explains how these findings can help you determine how you should measure the success of your AI system or chatbot and suggests new ways to predict market success that go beyond AI answer accuracy.
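The split between answer accuracy and user preference can be made concrete with a small aggregation over a trial log. The sketch below is hypothetical: the record format, names, and numbers are invented for illustration and are not the study's raw data.

```python
# Hypothetical trial log: one record per command, noting which assistant
# answered and whether the answer was correct, plus one favorite-assistant
# vote per tester. All values here are invented for illustration.

from collections import defaultdict

trials = [
    {"assistant": "Siri", "correct": True},
    {"assistant": "Siri", "correct": True},
    {"assistant": "Siri", "correct": False},
    {"assistant": "Alexa", "correct": True},
    {"assistant": "Alexa", "correct": False},
    {"assistant": "Alexa", "correct": False},
]
favorites = ["Alexa", "Alexa", "Siri"]  # one vote per tester

def accuracy_by_assistant(trials):
    """Fraction of correct answers per assistant."""
    hits, totals = defaultdict(int), defaultdict(int)
    for t in trials:
        totals[t["assistant"]] += 1
        hits[t["assistant"]] += t["correct"]
    return {a: hits[a] / totals[a] for a in totals}

def preference_share(favorites):
    """Share of testers naming each assistant as their favorite."""
    return {a: favorites.count(a) / len(favorites) for a in set(favorites)}

print(accuracy_by_assistant(trials))  # one assistant leads on accuracy...
print(preference_share(favorites))    # ...while another leads on preference
```

Tracking both metrics side by side, rather than accuracy alone, is the measurement shift the talk argues for.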
3. John Whalen
- PhD, Cognitive Science, Johns Hopkins University: cognitive neuroscience, linguistics, neural networks/ML, vision science
- Postdoc at UCLA during the dot-com boom
- Professor in Psychology, Univ. of Delaware: biometrics, numerical cognition
- Founder and UX Lead, Brilliant Experience: user insights, digital strategy, UX/CX
- CXO, 10Pearls: user insights, digital strategy, AI + UX
6. An end-to-end digital experience and enterprise software application partner.
- Digital Experience & Enterprise Mobile Partner
- Supplier of the Year: Building Digital Marketplace
- 50 on Fire – Hottest Companies Award
8. Agenda
1. Why AI UX is so different
2. Designing for how people think
3. Introducing studies
4. Examples & results breakdown
5. Summary & implications
19. Use brain science to build intelligent experiences
Six facets of cognition, each with a question to evaluate:
- Language: What words and word order were in the command?
- Wayfinding: Could the user understand their choices and where they are in the system?
- Memory: Did the AI match expectations?
- Emotion: Did the AI generate a positive emotional response?
- Decision Making: Did the answer help to solve a problem?
- Vision/Attention: Did the AI focus the user on the answer?
@johnwhalen #TheAIConf
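One way to operationalize the six questions above is a simple per-interaction rubric. The dimension labels follow the slide; the 0/1 judgments and the scoring function are an illustrative assumption, not the study's actual method.

```python
# Hypothetical scoring sheet for the six cognitive dimensions: each
# interaction gets a yes/no judgment per dimension, aggregated into a
# single score. The labels follow the slide; the scheme is illustrative.

SIX_MINDS = ["language", "wayfinding", "memory",
             "emotion", "decision_making", "vision_attention"]

def score_interaction(judgments):
    """judgments: dict mapping each dimension to True/False.
    Returns the fraction of dimensions judged positive."""
    missing = [d for d in SIX_MINDS if d not in judgments]
    if missing:
        raise ValueError(f"missing dimensions: {missing}")
    return sum(bool(judgments[d]) for d in SIX_MINDS) / len(SIX_MINDS)

example = {
    "language": True,          # command parsed as worded
    "wayfinding": True,        # user knew their options
    "memory": True,            # reply matched expectations
    "emotion": False,          # no positive emotional response
    "decision_making": True,   # answer solved the problem
    "vision_attention": True,  # user's focus landed on the answer
}
print(score_interaction(example))  # 5 of 6 dimensions positive
```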
35. Answer accuracy by age: 63%, 68%, 66%, 65%
Note: command style variance
36. Answer accuracy by assistant
Siri: 76%, Cortana: 68%, Google Assistant: 61%, Alexa: 61%
37. Note: The most accurate AI tools are not the most preferred. Emotion matters!
38. Accuracy and preference by assistant
Google Assistant: 61% accuracy, 38% preferred
Alexa: 61% accuracy, 33% preferred
Cortana: 68% accuracy, 5% preferred
Siri: 76% accuracy, 5% preferred
39. Preferred Google Assistant:
- "Annoying when they are being human. I don't feel anything with them."
- "It is weird that they have a personality and can already tell a joke."
- "Just give me the answer."
- "I want general information and answers fast."
40. Preferred Alexa:
- "Alexa feels like it is addressing me back. Feels like I am interacting with a person."
- "Prefer Alexa because it tells me it's not sure."
- "I like when it answers the question the way I worded it."
- "Responses were direct. Fun to banter back and forth."
41. 4b. Study 2 Results: Personal + Business Tasks
42. Task types & AI assistants
Study 2 (personal + business tasks): Google Home, Alexa, Siri, Hound
50. Answer accuracy by assistant
Hound: 77%, Google Assistant: 74%, Siri: 59%, Alexa: 50%
51. Answer accuracy by task type
Personal tasks: 72%, business tasks: 58%
52. Accuracy and preference by assistant
Alexa: 50% accuracy, 35% preferred
Google Assistant: 74% accuracy, 21% preferred
Siri: 59% accuracy, 21% preferred
Hound: 77% accuracy, 14% preferred
53. Preferred Hound:
- "I do love that when you ask it a question it provides you an answer in two options."
- "It remembered what I was talking about."
- "I liked the way it asked more than one question to get to the answer."
- "It was better at the back and forth."
57. 1. The most accurate tools are not the most preferred.
Alexa: 50% accuracy, 35% favorite
Google Assistant: 74% accuracy, 21% favorite
Siri: 59% accuracy, 21% favorite
Hound: 77% accuracy, 14% favorite
62. Summary of Findings
1. The most accurate tools are not the most preferred. Emotion matters!
2. Biometrics can detect emotional preference.
3. There is an opportunity for business assistants.
4. Humanize the experience. We need to design for how people think.