Moving from the design of unimodal speech-enabled applications to multimodal smartphone applications is not always straightforward. Bouzid highlights fundamental differences between these contexts and shows how we can transfer what we have learned in the telephony-based IVR space to build highly usable voice solutions on the smartphone platform. Kelly discusses how understanding users, their environments, their devices, and the available software tools can allow organizations to create powerful, engaging multimodal designs.
3. Why Spoken Conversation?
§ Speech is Natural
§ Conversation is Natural
§ Speech is efficient: speaking
requires less effort than typing
§ Use cases
• Dictation
• When searching is easier than
selecting
• Several interactions that require simple
responses
• Hands are busy
• Eyes are busy
• Short questions from device
• Short responses from user
• Sharing a spoken joke with friends
4. Why Conversation?
Multi-step Interactions aimed at solving a problem/
accomplishing something
User: What is Chipotle trading at?
App: Chipotle Mexican Grill is at $321.56. Up just a tad.
User: What’s the highest it has been in the last three months?
App: July 10 was highest in the last 3 months, trading at
$344.21.
User: Buy 100 shares.
App: You have Schwab and Fidelity. Which would you like?
User: Schwab.
App: Got it. I see you have an account ending in 2234. Use that
account?
User: Yes.
App: OK. 100 shares at Market or at a Specific Price?
User: Market.
App: Got it. That trade has been placed for 100 shares at market.
I will send you an email confirmation when the shares are
purchased.
5. Telephony VUI
§ IVR is intrusive: Caller called to
speak to a human (Serving not
the caller but the business)
§ Only the Audio Mode: For input
(Speech, DTMF), For output
(Voice and sounds)
§ Clear interaction End Points
§ Interaction is Time Metered
(Utility business model)
§ User must give their Full
Attention to IVR
§ Personalization potential: low
§ Low sound quality: Played to
caller and spoken by caller
6. Smartphone NUI
§ User engages UI Voluntarily: I want to speak
to my assistant
§ Multi-modality available: user gets to input in
more than one mode
§ Start and End points are Fuzzy
§ Interaction is Task focused: engage to
accomplish a specific task (Task Completion
business model)
§ Multiple Tasks on at the same time
§ User Not Trapped in the interaction: may
Pause and return at their leisure. Pausing is
natural
§ Personalization potential: high
§ User seems to tolerate Delays more on
Smartphone than IVR
§ High Sound Quality: Played to user and spoken
by user
7. Telephony VUI vs Smartphone NUI
                          Telephony            Smartphone
Engagement Type           Compulsory           Voluntary
Interaction Modes         Exclusively Audio    Multi-modal
Interaction Unit          Time                 Task
Interaction End Points    Clear                Fuzzy
Attention Monopoly        High                 Low
Toleration for Delays     Low                  High
Sound Quality             Low                  High
8. VUI Strategies
§ Pausing:
– In telephony need full VUI
– In Smartphone, just stop and then resume
§ Latency
– Telephony: Percolation sounds
– Smartphone: Visuals
§ Error Strategies
– VUI: No Input (NI) / No Match (NM)
§ Pause if no input on Smartphone
§ Use multimodal (MM) input if no match on Smartphone
§ More room to help with handling errors:
§ Display what user can say
§ Offer tutorials to user
– Non-VUI
§ Telephony: Hold the caller until transaction is done.
§ Smartphone: Asynchronous alerting. Don’t need to hold the user. Send alert when
transaction done.
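The error strategies above can be sketched as a small dispatch on platform and error type. This is only an illustrative sketch: the names `Platform`, `RecognitionError`, and `error_strategy`, and the action labels it returns, are hypothetical. The voice re-prompt used for telephony is an assumption; the slides say only that telephony needs a full VUI for these cases.

```python
from enum import Enum


class Platform(Enum):
    TELEPHONY = "telephony"
    SMARTPHONE = "smartphone"


class RecognitionError(Enum):
    NO_INPUT = "NI"   # the user said nothing
    NO_MATCH = "NM"   # the user said something the recognizer could not match


def error_strategy(platform: Platform, error: RecognitionError) -> str:
    """Return a hypothetical action label for a recognition error."""
    if platform is Platform.TELEPHONY:
        # Assumption: audio is the only channel, so re-prompt by voice.
        return "reprompt_by_voice"
    if error is RecognitionError.NO_INPUT:
        # On a smartphone the user is not trapped; treat silence as a pause.
        return "pause"
    # No match on a smartphone: switch to a visual mode (menu or keyboard).
    return "show_visual_options"


if __name__ == "__main__":
    for platform in Platform:
        for error in RecognitionError:
            print(platform.value, error.value, "->", error_strategy(platform, error))
```

Keeping the decision in one function makes it easy to see where the smartphone gains options that telephony does not have.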
9. UI Strategy Differences
Behavior                  Telephony VUI                Smartphone NUI
Pausing                   Taxing on caller             Leverage Visual
Latency                   Limited to Audio             Leverage Visual
Error Strategies          Taxing on caller             Leverage Visual
Web Service Completion    Hold the caller until done   Message the user when done
10. Conversational NUI
- Transaction requires multiple pieces
of information
- Complex requests that can be
efficiently formulated in a sentence:
“What’s the highest it has been in the
last three months?”
- Short responses from user:
“Schwab,” “Yes,” “Market.”
- Short commands from user: “Buy 100
shares.”
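A transaction that needs multiple pieces of information is often modeled as slot filling: the app asks for whatever is still missing, and the user answers briefly. The sketch below is a minimal illustration based on the stock-trade dialog; the slot names and the `next_prompt` helper are hypothetical, and the prompts are paraphrased from the example conversation.

```python
from typing import Dict, Optional

# Hypothetical slots for the stock-trade example, in the order they are asked.
PROMPTS: Dict[str, str] = {
    "broker": "You have Schwab and Fidelity. Which would you like?",
    "account": "I see you have an account ending in 2234. Use that account?",
    "order_type": "100 shares at Market or at a Specific Price?",
}


def next_prompt(slots: Dict[str, Optional[str]]) -> Optional[str]:
    """Return the prompt for the first unfilled slot, or None when all are filled."""
    for name, prompt in PROMPTS.items():
        if slots.get(name) is None:
            return prompt
    return None


if __name__ == "__main__":
    slots: Dict[str, Optional[str]] = {}
    # Short user responses, as in the example dialog.
    for name, answer in [("broker", "Schwab"), ("account", "Yes"), ("order_type", "Market")]:
        print("App: ", next_prompt(slots))
        print("User:", answer)
        slots[name] = answer
    print("All slots filled:", next_prompt(slots) is None)
```

Because each turn fills only one slot, short responses such as "Schwab" or "Yes" are enough to move the transaction forward.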
11. How Visual helps Audio
• Redundancy
• Visual Confirmation
• No match issues: present a
menu to select an option, or give
a keyboard to type
• Help: visual help more
effective than spoken help
• Complementary info: Show
bill/show device
• When visual is needed:
location in bill
• Summary of info. collected
• Enable user to quickly correct
info provided earlier
12. Pause Case
§ Pause Point: where in the
conversation did the pause occur
§ Age of Pause: how long ago?
– If resuming a book order from 5
minutes ago: ask whether the user
wants to continue
– If resuming a book order from 5
hours ago: ask whether the user
wants to continue, and provide a
summary of where they left off
– If resuming a book order from 5
days ago: start from scratch
(selections may be obsolete,
etc.)
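The age-of-pause rule can be sketched as a function from elapsed time to a resume behavior. The slides give 5 minutes, 5 hours, and 5 days only as examples, so the cutoffs below (`SUMMARY_AFTER`, `RESTART_AFTER`) are assumptions chosen to separate those three cases, and the action labels are hypothetical.

```python
from datetime import timedelta

# Assumed cutoffs; the slides do not specify exact thresholds.
SUMMARY_AFTER = timedelta(hours=1)
RESTART_AFTER = timedelta(days=1)


def resume_action(age_of_pause: timedelta) -> str:
    """Pick a resume behavior from how long ago the conversation was paused."""
    if age_of_pause >= RESTART_AFTER:
        # Earlier selections may be obsolete, so start over.
        return "restart"
    if age_of_pause >= SUMMARY_AFTER:
        # Ask whether to continue and remind the user where they left off.
        return "confirm_with_summary"
    # Recent pause: simply ask whether to continue.
    return "confirm"


if __name__ == "__main__":
    for age in (timedelta(minutes=5), timedelta(hours=5), timedelta(days=5)):
        print(age, "->", resume_action(age))
```

A fuller version would also store the pause point, so that the summary can state where in the conversation the user stopped.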