Where's Jarvis? The Future of Voice Recognition and Natural Language User Interfaces UXPA 2016

Crispin Reedy, Voice User Experience Designer at Versay
#UXPA2016 Session Survey: http://www.uxpa2016.org/sessionsurvey?sessionid=321 © 2016 Versay Solutions
Where’s Jarvis?
The Future of Voice Recognition and Natural Language User Interfaces.
Crispin Reedy, Versay Solutions
@crispinTX crispinreedy.com
#UXPA2016
From the session description
• What is voice recognition?
• What is natural language understanding?
• What are the common technologies in the market today?
• How does this fit with IoT?
• What are design considerations / methods to evaluate these types of interfaces?
• Implied: Should I speech-enable my ___?
• Bonus Q: Why doesn’t it work the way we want it to, and when will it?
Should I Speech-Enable My ___?
Iron Man 2: Marvel Studios, Paramount Pictures
Star Trek Voyager: Paramount Television
“Tomato soup”
“Tomato soup. Ok, what kind?”
“Just plain”
“Coming right up!”
Annotations:
• Implicit confirmation
• Second-level open-ended prompting
• Cultural context: plain = hot
Terms & Technologies
• Speech Recognition
• Natural Language Understanding
• Voice Verification (Biometrics)
• Text to Speech
Speech Recognition “ASR”
“See the cat.”
Natural Language Understanding
• Extracting meaning from natural text
“Hello, yes, I’d like to pay my water bill. Can you help me with that?”
Intent = BillPay
Entity (Bill Type) = Water
Voice Verification
“My voice is my password.”
“Authenticated. Welcome, Mr. Smith.” ✓
Text To Speech
What Is Good TTS?
• Phonemes change based on location
• “Cat”
• “Alligator”
• Elision
• “I’m. Awaiting. You.”
• “I’m awaiting you.”
• Intonation
• “Do you want coffee?”
• “Do you want soda, tea, or coffee?”
• Most TTS isn’t “Movie Quality”
IMDB
SSML Example
SSML
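The SSML example on this slide was an image and did not survive extraction. As a stand-in, here is a minimal sketch that builds a small SSML document with Python's standard library. The `speak`, `break`, and `emphasis` elements and the namespace come from the W3C SSML specification; the sentences, the attribute values, and the helper name `build_ssml` are illustrative assumptions, not the speaker's example.

```python
import xml.etree.ElementTree as ET

SSML_NS = "http://www.w3.org/2001/10/synthesis"  # namespace from the W3C SSML spec


def build_ssml() -> str:
    """Return a small SSML document as a string (content is hypothetical)."""
    speak = ET.Element(
        "speak", {"version": "1.0", "xmlns": SSML_NS, "xml:lang": "en-US"}
    )
    # Plain text is spoken with the engine's default elision and intonation.
    speak.text = "I'm awaiting you."
    # <break> asks the engine for an explicit pause between the two sentences.
    pause = ET.SubElement(speak, "break", {"time": "300ms"})
    pause.tail = "Do you want soda, tea, or "
    # <emphasis> asks the engine to stress the last item in the list.
    stress = ET.SubElement(speak, "emphasis", {"level": "strong"})
    stress.text = "coffee"
    stress.tail = "?"
    return ET.tostring(speak, encoding="unicode")


if __name__ == "__main__":
    print(build_ssml())
```

How a given TTS engine renders the pause and the emphasis varies, so markup like this is a request to the engine rather than a guarantee.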
Speech Recognition
• Hands-free command / control
• Dictation
• Input text
• Small form factor device, etc.
Text To Speech
• Output text dynamically
• Respond to input
• Useful when no display is available
Natural Language Understanding
• Necessary for all language-based input
• Extract meaning
• Parse large volumes of text
Voice Verification
• Security
[Diagram. Components: Verification (voice prints), ASR, NLU, Application, Data, TTS. Steps: Sign-In, Interaction, Request, Action, Meaning, Access Data, Output.]
[Same diagram, extended. Added inputs: Touch, Keyboard, Visual. Added tasks: Manage I/O Modality; Determine Meaning in Context. Callout: Context!]
ASR
[Diagram. Knowledge layers: World Knowledge, Semantics, Syntax, Lexicon, Morphology, Phonetics, Acoustics, Linguistics, Physiology. Units: Concepts, Phrases, Words, Phonemes, Sounds. Regions labeled ASR and NLU.]
Speech is ambiguous
Language is ambiguous
Everything is ambiguous
[Chart axes]
• Speaker Independence: Speaker Dependent, Multiple Speakers, Speaker Independent
• Speech: Isolated Words, Connected Words, Natural Speech
• Vocabulary Size: 10 words, 1000 words, 100,000 words, Unlimited
• Label: Humanlike
AUDREY: Automatic Digit Recognizer, Bell Labs, 1952
[Figure: hidden Markov model]
• X: states
• y: possible observations
• a: state transition probabilities
• b: output probabilities
"HiddenMarkovModel" by Tdunningvectorization: Wikimedia
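The figure legend above names the four parts of a hidden Markov model: states X, observations y, transition probabilities a, and output probabilities b. A minimal sketch of how those parts combine is Viterbi decoding, which finds the most likely sequence of hidden states for a sequence of observations. Everything concrete here is an assumption made for illustration: the three phoneme-like states for "cat", the two observation labels, and all of the probabilities. It is not a description of AUDREY or of any particular recognition engine.

```python
def viterbi(observations, states, start_p, trans_a, emit_b):
    """Return the most likely hidden-state path for the observations."""
    # best[t][s] = (probability of the best path ending in s at time t, previous state)
    best = [{s: (start_p[s] * emit_b[s][observations[0]], None) for s in states}]
    for obs in observations[1:]:
        column = {}
        for s in states:
            column[s] = max(
                (best[-1][p][0] * trans_a[p][s] * emit_b[s][obs], p) for p in states
            )
        best.append(column)
    # Start from the most probable final state and follow the back-pointers.
    path = [max(states, key=lambda s: best[-1][s][0])]
    for column in reversed(best[1:]):
        path.append(column[path[-1]][1])
    path.reverse()
    return path


# Toy left-to-right model for the word "cat" (all values are made up).
STATES = ["k", "ae", "t"]
START_P = {"k": 1.0, "ae": 0.0, "t": 0.0}
TRANS_A = {
    "k": {"k": 0.5, "ae": 0.5, "t": 0.0},
    "ae": {"k": 0.0, "ae": 0.6, "t": 0.4},
    "t": {"k": 0.0, "ae": 0.0, "t": 1.0},
}
EMIT_B = {
    "k": {"burst": 0.8, "voiced": 0.2},
    "ae": {"burst": 0.1, "voiced": 0.9},
    "t": {"burst": 0.7, "voiced": 0.3},
}

if __name__ == "__main__":
    frames = ["burst", "voiced", "voiced", "burst"]
    print(viterbi(frames, STATES, START_P, TRANS_A, EMIT_B))
```

Real acoustic models work on numeric feature vectors and far larger state spaces, but the bookkeeping is the same idea: keep the best-scoring path into each state at each step, then trace back.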
Training
[Diagram: Speech Recognition Engine with its Acoustic Model, SLM and/or Grammar, and Pronunciation Model]
[Diagram: recognition flow]
• Utterance (Noise Levels? Barge-In?)
• Feature Extraction
• Endpointing
• Speech Recognition Engine (Grammar or SLM)
• Recognition Event: Probabilities, n-best list, Literal return, Tokens
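The recognition event described above carries an n-best list of hypotheses, each with a literal return, a token, and a probability. One common way an application can use that event is to accept, confirm, or reprompt depending on the top score. The sketch below assumes a hypothetical `Hypothesis` record and hypothetical thresholds (0.8 and 0.5); real engines expose this data in their own formats, and thresholds are normally tuned per application.

```python
from dataclasses import dataclass


@dataclass
class Hypothesis:
    literal: str       # the words the engine thinks it heard
    token: str         # the value the grammar or SLM maps those words to
    confidence: float  # engine score, assumed here to be in [0, 1]


def decide(n_best, accept_at=0.8, confirm_at=0.5):
    """Map an n-best list to a dialog action: accept, confirm, or reprompt."""
    if not n_best:
        return ("reprompt", None)  # nothing recognized at all
    top = max(n_best, key=lambda h: h.confidence)
    if top.confidence >= accept_at:
        return ("accept", top.token)   # confident enough to act
    if top.confidence >= confirm_at:
        return ("confirm", top.token)  # ask "Did you say ...?"
    return ("reprompt", None)          # too uncertain, ask again


if __name__ == "__main__":
    event = [
        Hypothesis("pay my water bill", "BillPay", 0.91),
        Hypothesis("play my water bill", "PlayMedia", 0.40),
    ]
    print(decide(event))
```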
Early Commercial Adoptions
• Interactive Voice Response
• “Those Phone Menus”
• Server-based ASR
• Nuance
• Microsoft
• Voice-Enabled Handheld Devices
• Industrial / Productivity applications
• Device-based ASR
• Network not needed
Note: Call center is still an important customer touchpoint!
Today’s Speech Agents vs. APIs
• Siri / Apple APIs
• Cortana / Cortana APIs
• Google Now / Google Voice Actions
• Amazon Echo (Alexa) / AVS API
• Jibo
• Ubi / Ubi Kit
• Assistant.ai / Api.ai
Alexa Skill vs. Amazon Voice Service
Amazon.com
Alexa Skill Example
Amazon.com
CapitalOne.com
NLU
Natural Language Understanding
• Parsing input to extract meaning
• Covers a large field
• Commands
• Automatic classification of emails
• Newspaper articles, large chunks of text
• Bots
• Conversational agents
• Messaging apps
• Personal assistants
• Input could be via speech or via text
Levels of Meaning
“How can I help you?”
• Too Broad / Ambiguous: “I’m having a problem with my account.”
• Too Much: “Well, I was looking at my bill, because I do that every week, and I was reviewing everything on there, and I saw…”
• Just Right: “I’m seeing an unusual charge on my bill.”
NLU Tasks
http://www.conversational-technologies.com/nldemos/nlDemos.html
Intents and Entities
• “I’d like to transfer $50 from my checking account to my savings account.”
• ACTION = Transfer (Intent)
• FROM_ACCOUNT = Checking (Entity)
• TO_ACCOUNT = Savings (Entity)
• AMOUNT = $50 (Entity)
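The intent and entities listed above can be sketched as a small parser. This version uses one hand-written regular expression, which is an assumption made to keep the example short: the NLU services discussed in this talk train statistical models on many example utterances instead. The names `TRANSFER` and `parse` are hypothetical.

```python
import re

# One hand-crafted pattern for the "transfer" intent (illustration only).
TRANSFER = re.compile(
    r"transfer\s+\$(?P<amount>\d+(?:\.\d{2})?)"
    r"\s+from\s+my\s+(?P<src>\w+)\s+account"
    r"\s+to\s+my\s+(?P<dst>\w+)\s+account",
    re.IGNORECASE,
)


def parse(utterance):
    """Return the intent and entities for a transfer request, or None."""
    match = TRANSFER.search(utterance)
    if match is None:
        return None  # utterance does not fit this single pattern
    return {
        "ACTION": "Transfer",
        "FROM_ACCOUNT": match.group("src").capitalize(),
        "TO_ACCOUNT": match.group("dst").capitalize(),
        "AMOUNT": "$" + match.group("amount"),
    }


if __name__ == "__main__":
    text = "I'd like to transfer $50 from my checking account to my savings account."
    print(parse(text))
```

A pattern like this breaks as soon as the caller rephrases ("move fifty bucks into savings"), which is the gap that trained NLU models are meant to cover.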
NLU APIs
• API.ai
• Alexa
• Microsoft LUIS
• Wit.ai
• Google Voice Actions
• Etc.
Today’s NLU APIs
• Microsoft LUIS (part of Project Oxford)
Microsoft.com
Today’s NLU APIs
API.ai
• API.ai
The Future Is Here
• DNN (Deep Neural Networks)
• Being applied to both ASR and NLU problems
• Requires large amounts of data to train the models
What’s The Glue Here?
Consistency Across Contexts?
“Omnichannel CX”
Data Is Everywhere
State Chart XML?
ASR vs. NLU: Wrap Up
ASR
• Spoken aloud
• Requires some NLU even if it’s hand-crafted (tagging)
• Useful in hands-free, eyes-free contexts
NLU
• Focuses on meaning extraction
• Could be used for chat bots, etc.
• Machine learning to train models
Design Considerations
• What are you trying to build?
• What’s your platform?
• Existing guidelines / research
• User testing is key
• Especially if you’re trying to do something complicated
Should I Speech-Enable My ___?
What’s Your ASR/NLU Platform?
• Write an app (skill) for an agent such as Cortana / Alexa
• Use cloud APIs to add ASR / NLU to your app / device / page / gadget
• Download software and use full-featured capabilities for more robust recognition on a specific device
• Build your own
Network Availability
• Simply irritating… or totally unusable?
“What’s on my calendar today?”
“Sorry, I can’t complete that request right now.”
Appropriate Modality?
• Voice Only? Voice + Display?
• Is it possible for the user to switch modalities?
• Or would switching potentially be dangerous?
“How long is the flight from Dallas to Seattle?”
“I’ve got a few results to show you.”
Is State Maintained?
• Does your platform support a multiple-stage interaction?
• Does it remember what you did previously?
“Who is Barack Obama?”
“Barack Obama is the 44th president of the United States.”
“How old is he?”
“I’m sorry, I don’t understand your question.”
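The failed follow-up above is what happens when no state is carried between turns. A minimal sketch of the alternative, assuming a hypothetical `DialogState` class that remembers the last person asked about and substitutes that name for a pronoun in the next question. The two patterns are deliberately tiny; real platforms handle reference resolution with much richer context tracking.

```python
import re

PRONOUNS = re.compile(r"\b(he|she|they)\b", re.IGNORECASE)
WHO_IS = re.compile(r"^who is (?P<name>.+?)\?*$", re.IGNORECASE)


class DialogState:
    """Remembers the last person mentioned so a follow-up can refer back."""

    def __init__(self):
        self.last_person = None

    def rewrite(self, question):
        """Return the question with pronouns replaced by the remembered name."""
        match = WHO_IS.match(question.strip())
        if match:
            # A new "who is" question updates the remembered person.
            self.last_person = match.group("name")
            return question
        if self.last_person is not None:
            # A function replacement avoids backslash handling in the name.
            return PRONOUNS.sub(lambda _: self.last_person, question)
        return question  # no context yet, so the pronoun stays unresolved


if __name__ == "__main__":
    state = DialogState()
    print(state.rewrite("Who is Barack Obama?"))
    print(state.rewrite("How old is he?"))
```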
Wake-Up Words
• How many of these “Agents” will we be talking to?
“Jibo, take a picture.”
“Alexa, play music.”
“OK Google, set the temperature to 77 degrees.”
System Personality
• Are you writing for an “Agent” who has an existing style?
• What if your skill or app doesn’t match that style?
• If not, should you create one?
“Hi, I’m Julie!”
Context
• Real-world context
• Digital context
• How much does your app know about where you are and what it can do?
“When I get home, remind me to take out the trash.”
“I’m sorry, your calendar doesn’t support location-based reminders.”
What Are You Trying To Recognize?
• Long utterances work better than short ones
• Letter names require extra work
“Start a session”
“Got it”
And So Much More….
• What will you do when the recognizer just can’t get it?
“I want my…. BARK BARK BARK Timmy STOP THAT NOW GET DOWN!”
????
Existing Guidelines / Research
• Caveat: Best practices evolved in one modality (e.g. voice-only) may not apply the same way in another (e.g. combined voice + touch)
• But they could be adapted
• Association for Voice Interaction Design (AVIxD.org)
• Wiki
• Peer-Reviewed Journal
• Virtual “Brown Bags”
• Academic Sources, Books
AVIxD.org
CUI Working Group is actively recruiting!
Specific Example: “Help”
• Voice XML Standard (2004): “Help” should be a global command
• AVIxD Wiki (2014): Stop using “Help” as a global
• Agent API Doc (2015): Offer “Help”
Specific Example: “Help”
• Designers who tune applications have seen that the word “help” is a known “False Attractor”
• Other things that you say which are short get recognized as “help”
• People don’t voluntarily come up with “help” unless they are prompted
• Give callers a context-specific command only where help may truly be needed, and call it something besides “help”
• System: Say or enter your account number, or say, where do I find it.
Special Case: Car
• “Distracted Driver” is a hot topic!
• Richard Young, Wayne State University
• Paper: “Safe Interaction For Drivers”
• “Visual-Manual Mode” – What we do today
• “Auditory-Vocal Mode” – Speech only. NO GUI.
• “Mixed Mode” – Speech and GUI being used together
• Finding: If you give someone a graphic interface, they’re going to look at it
• And take their eyes off the road
Design Documents
Usability Studies / Research
• Special Challenges
• Technical setup
• Phone tap / Recording both sides
Warner Bros.
Early Stage Voice Only Prototype
Should I Speech-Enable My ___?
What’s the Use Case?
• Enabling application
• User can’t do it any other way
• New tasks
• Enhancing application
• User can do it now
• But speech makes it better
• Faster
• Safer
API-Based
Device-Based
Roll Your Own / Open-Source
• Flexibility
• Power
• Customization
• Time
• Difficulty
Cloud vs. Downloadable / Embedded
• Easy to get started
• Lightweight
• Not much specialized knowledge
• Customizable
• Probably better recognition
• Can be device-specific
• More features
• Higher powered
• May require specialized knowledge
– Speech scientist
Open Source ASR
• CMU Sphinx
• pocketsphinx
• Kaldi
• http://kaldi-asr.org/
• Github
• New updates include some pretty interesting stuff (DNN)
• Requires:
• Corpus
• Tech know-how
Should I Speech-Enable My ___?
Should I Speech-Enable My ___?
Maybe
Iron Man 2: Marvel Studios, Paramount Pictures
Where’s Jarvis?
Where’s Jarvis?
Gesture Based Interface
Artificial Intelligence
Voice Based Interface
Where’s Jarvis?
ASR
NLU
Voice Design
Context
Resources
• Handout / Web page
Mitigating Common CloudStack Instance Deployment Failures - Jithin Raju - Sha...
ShapeBlue183 views

Editor's Notes

  1. Voice User Interface Designer. 10 years in the field. English major, former coder; got interested in UX. President of the Association for Voice Interaction Design. Consultant for Versay Solutions. 2 weeks in a row for conferences.
  2. Jarvis: Audio and gestural. Perfect recognition; no error recovery needed. Great voice quality. Connected to vast amounts of data. Understands all the parts of the model: “Lose the landscape.” Context-sensitive; aware of the space around him. Sense of humor: “Am I to include the Belgian Waffle stands?” Takes initiative: “What is it you’re trying to achieve, sir?”
  3. Replicator: Good recognition. No error recovery needed. Good voice quality (understandable). Connected to data, perhaps too much so? Context-sensitive, but was this enough? A design failure (not a tech failure), specifically around excessive disambiguation.
  4. A Better Replicator Conversation
  5. “Speech to Text”: spoken language to machine-readable format.
  6. Not necessarily tied to speech recognition
  7. Also called voiceprints, biometrics, voice authentication, etc. Not going to discuss this one in a lot of detail today, but it’s important that you understand the difference between these technologies. Recognizes a person, not necessarily what they are saying. You can have ASR without Voice Verification, and vice versa.
  8. Human voice talent. Hundreds of hours of recording. Digitized. Phonemes: concatenated speech synthesis.
  9. Dynamic speech synthesis. Many commercial products are available: API-based or downloadable. Quality varies. If possible, record audio: TTS has improved considerably, but is still noticeable, and high-quality TTS may not be available in all situations. If you have a lot of dynamic data, TTS is useful. You can mix recorded audio and TTS. You may have to use TTS: voice agent (Alexa, Cortana, etc.), API-based. Some of them do let you mark up your TTS with SSML. More phonemes = higher-quality voice; this also means a bigger download and install (if on device). Exceptions (addresses, names) can be iffy and may require a lot of work to handle well: “St. James St.” should be read as “Saint James Street”. Punctuation. Your data needs to be clean and ready to voice back; acronyms and incomplete sentences will not sound good. It is possible to build a custom voice, but it takes a lot of work!
  10. Speech Synthesis Markup Language. XML-based W3C standard. Not universally supported. Tags that allow you to produce more natural-sounding output: emphasis, break, voice, prosody, pitch.
  11. World Knowledge: Concepts of the world around us, e.g. tables have four legs, what is left and right, what is a car, etc. This is the level before language. Semantics: The first level of language. Knowledge can be represented in structured, meaningful elements. Example: semantics of a party invitation. Syntax: The rules that govern putting words together to form meaningful units. Lexicon: What words mean. Morphology: How words change their form to perform differently in a language, e.g. horse / horses. Phonetics: Phonemes and how words are built. Acoustics: What phonemes sound like and how to create them.
  12. Speech is never stationary. Coarticulation. Noisy environments. Accents. Different speakers have voices with different acoustic qualities. Goats. Challenges vary depending on what you are going to recognize. Spelling (short utterances) can be difficult even for humans: phonetic alphabet (military).
  13. Humans can deduce meaning from context and unknown words. “How can I help you?” “I’m having a problem with my account.” “I’d like that one. No, not the green one, the red one.” “Time flies like an arrow. Fruit flies like a banana.”
  14. All modern speech recognition is probabilistic. GUI: Button clicked? true / false. VUI: There is an 85% chance that the button was clicked.
  15. Three Dimensions of Speech Problems
  16. AUDREY: Davis, Biddulph, and Balashek, Bell Labs, 1952. Analog. Isolated digit recognition. Pause between digits. Speaker-dependent. Speech recognition with vacuum tubes: how very steampunk. Her name was AUDREY (Automatic Digit Recognizer). Let that sink in a minute.
  17. 1980s: The Power of Statistics. The recognition of connected speech becomes a search for the best path in a large network. Problem of finding the probabilities. Statistical Language Models: Not all sequences of words are equally probable. Rank all permissible sentences in terms of probability. “Correct” grammar is not applicable. Restricted by domain. Hidden Markov Models (HMM): Unified probabilistic model for speech.
  18. You’re Only As Good As What You’re Trained On. Corpora: Collections of speech used to train a recognizer. Acoustic and/or Pronunciation Model: Associates sounds with symbols and words. Created from a general speech corpus and a phonetic and orthographic transcription. Statistical Language Model (SLM): A probability distribution over sequences of words. Created from a domain-specific speech corpus and a tagged transcription to extract meaning.
  19. Speech Agent: The “Person” who [...] Distributed speech recognition: Collection and compression of speech is on the device. The language models are typically on the network. Phone can be speaker-dependent: it trains itself on your voice and on the acoustic environments you are in most often. Many companies are providing APIs to use their speech recognition.
  20. “Alexa, ask Capital One: what’s my current credit card balance?”
  21. Observations to make: Represents the entirety of a VUI experience. Placement of the Spanish prompt would vary depending on the type of call. Confirmation is variable. Confirmation prompt is general.
  22. What do you need it for? What kind of device will you be running it on? Connectivity? Can you use cloud-based ASR? How much control do you need over the application / user interface?
  23. Jarvis: Audio and gestural. Perfect recognition; no error recovery needed. Great voice quality. Connected to vast amounts of data. Understands all the parts of the model: “Lose the landscape.” Context-sensitive; aware of the space around him. Sense of humor: “Am I to include the Belgian Waffle stands?” Takes initiative: “What is it you’re trying to achieve, sir?”
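
Note 9 points out that addresses and names can be iffy for TTS, using “St. James St.” as the example. The sketch below is one hypothetical way to clean such text before voicing it back; the function name `expand_st` and the capitalization heuristic are assumptions for illustration, not something from the talk or from any TTS product.

```python
import re


def expand_st(text: str) -> str:
    """Expand the ambiguous abbreviation "St." before sending text to TTS.

    Heuristic (an assumption, not a rule from any TTS engine):
    "St." directly before a capitalized word is read as "Saint";
    any other "St." is read as "Street".
    """
    # "St." followed by whitespace and a capital letter -> "Saint"
    text = re.sub(r"\bSt\.(?=\s+[A-Z])", "Saint", text)
    # Every remaining "St." -> "Street"
    text = re.sub(r"\bSt\.", "Street", text)
    return text


print(expand_st("St. James St."))
```

The heuristic is deliberately naive: an input such as “12 Main St. Apt 4” would be expanded incorrectly, which is the kind of exception the note warns may require a lot of work to handle well.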
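
Note 10 lists the SSML tags emphasis, break, voice, and prosody. As a minimal sketch of what such markup can look like, the following builds a small SSML string with Python's standard library. The prompt wording and the voice name `hypothetical-voice` are made up for illustration, and the bare `<speak>` root omits the namespace and language attributes that the W3C specification defines; whether a given engine accepts this form is an assumption.

```python
import xml.etree.ElementTree as ET


def build_ssml(amount: str) -> str:
    """Return an SSML string that reads back an account balance."""
    speak = ET.Element("speak")

    # <prosody> adjusts rate and pitch for everything it wraps.
    prosody = ET.SubElement(speak, "prosody", {"rate": "medium", "pitch": "low"})
    prosody.text = "Your current balance is "

    # <emphasis> stresses the one piece of dynamic data in the sentence.
    emphasis = ET.SubElement(prosody, "emphasis", {"level": "strong"})
    emphasis.text = amount
    emphasis.tail = "."

    # <break> inserts a pause before the follow-up question.
    ET.SubElement(speak, "break", {"time": "500ms"})

    # <voice> requests a different speaker; valid names are engine-specific.
    voice = ET.SubElement(speak, "voice", {"name": "hypothetical-voice"})
    voice.text = "Is there anything else I can help you with?"

    return ET.tostring(speak, encoding="unicode")


print(build_ssml("$42.50"))
```

Because SSML is not universally supported, a real application would check which of these tags its TTS engine honors before relying on them.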
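
Note 14 contrasts a GUI's true / false button click with a VUI's “85% chance that the button was clicked”. One common way to design around that is to branch on the recognizer's confidence score. The sketch below assumes a hypothetical `RecognitionResult` with a confidence between 0.0 and 1.0; the threshold values are illustrative assumptions and would normally be tuned against real call data.

```python
from dataclasses import dataclass


@dataclass
class RecognitionResult:
    text: str
    confidence: float  # 0.0 to 1.0, as reported by a hypothetical recognizer


def next_action(result: RecognitionResult,
                accept_at: float = 0.80,
                confirm_at: float = 0.45) -> str:
    """Pick a dialog move from the recognizer's confidence.

    Thresholds are illustrative assumptions, not recommended values.
    """
    if result.confidence >= accept_at:
        return "accept"      # act on it, perhaps with implicit confirmation
    if result.confidence >= confirm_at:
        return "confirm"     # ask "Did you say ...?"
    return "reprompt"        # too uncertain; ask again


print(next_action(RecognitionResult("pay my water bill", 0.85)))
```

The middle band is where confirmation prompts earn their keep: the system neither trusts nor discards the result, it checks with the caller.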
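
Notes 17 and 18 describe a statistical language model as a probability distribution over word sequences, used to rank candidate sentences and restricted by domain. The toy bigram model below is a sketch of that idea only: the four-sentence bill-pay corpus and the candidate hypotheses are invented for illustration, and add-one smoothing is a simplification chosen here, not a method named in the talk.

```python
import math
from collections import Counter


def train(sentences):
    """Count bigrams and their left-hand contexts in a tiny training corpus."""
    contexts, bigrams, vocab = Counter(), Counter(), set()
    for sentence in sentences:
        words = ["<s>"] + sentence.lower().split() + ["</s>"]
        contexts.update(words[:-1])            # every word that has a successor
        bigrams.update(zip(words, words[1:]))  # adjacent word pairs
        vocab.update(words[1:])                # every word that can be predicted
    return contexts, bigrams, len(vocab)


def log_probability(sentence, model):
    """Score a sentence with add-one smoothing so unseen pairs are not fatal."""
    contexts, bigrams, vocab_size = model
    words = ["<s>"] + sentence.lower().split() + ["</s>"]
    return sum(
        math.log((bigrams[(prev, word)] + 1) / (contexts[prev] + vocab_size))
        for prev, word in zip(words, words[1:])
    )


def rank(candidates, model):
    """Order candidate transcriptions from most to least probable."""
    return sorted(candidates, key=lambda s: log_probability(s, model), reverse=True)


# Invented domain corpus: what callers to a bill-pay line might say.
corpus = [
    "pay my bill",
    "pay my water bill",
    "check my balance",
    "i want to pay my bill",
]
model = train(corpus)

# Acoustically similar hypotheses; the domain model prefers the in-domain one.
best = rank(["pay my pill", "pay my bill"], model)[0]
print(best)
```

Because the model has only seen bill-pay utterances, “pay my bill” outranks the similar-sounding “pay my pill”, which illustrates why a recognizer is only as good as what it is trained on.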