Speaking commands into actions isn’t a futuristic fantasy anymore - it’s the present. Voice as a user interface has arrived and it can elevate the search and discovery experience. In this webinar, you'll learn about the complexities behind how voice search works and how it can provide immense value to your business.
8. Levels of Voice
Voice-Only
Voice in, voice out.
Voice-First
Voice is the primary mode.
Voice-Added
Voice is just one input or output mechanism.
Input and Output Precision
9. The right result, for the right user, at the right time, presented in the right way.
10. “It [was] obvious that there were major design flaws. Callers over the telephone were overwhelmed by the same volume and organization of mail headers that worked so effectively in the graphical interface.”
From the book Voice Interaction Design
11. Default Solutions
“You get what you pay for” is true here, too.
LIMITED
But, hey, it’s free
iOS/Android
Alexa/Google Assistant
Google Chrome
Speech to Text
Established Providers
MORE CONTROL
With a cost
Google Cloud
Watson Speech to Text
Twilio
Up-and-Comers
UNIQUE APPROACH
Assembly AI
Snips
15. There Will Be Misunderstanding
Speech-to-Text Isn’t Perfect
The best speech-to-text hovers just over 95% accurate; this is better than humans but not good enough.
Searchable Items Are Enums
They represent a constrained set of items, which causes problems when a user looks for something outside that list.
16. There Will Be Misunderstanding
Senders
Lizzy Carl
Y. L. Stiff
Mark Dwayne
Lorenzo Gomez
Lisa Fong
Emmanuelle Smith
Amy Hall
Michael Banian
Emilie Pulisic
“What’s the latest email from Mark Twain?”
17. There Will Be Misunderstanding
Nonetheless, A Match Must be Found
Let’s examine some commonly recommended approaches...
21. What… Is Fuzzy Matching?
Levenshtein Distance
Also known as the “minimum edit distance.”
Represents the minimum number of edits needed to change one string (i.e., a “word”) into another.
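22. How It Works: Levenshtein Distance
CALCULATE THE MINIMUM NUMBER OF EDITS NECESSARY
“Dwayne” to “Twain” (3 edits):
Dwayne: 0 edits
Twayne: 1 edit (substitute D with T)
Twaine: 2 edits (substitute y with i)
Twain: 3 edits (delete e)
“Banian” to “Twain” (4 edits):
Banian: 0 edits
Tanian: 1 edit (substitute B with T)
Twanian: 2 edits (insert w)
Twaian: 3 edits (delete n)
Twain: 4 edits (delete a)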
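The edit counts above can be reproduced with a short dynamic-programming routine. This is a minimal sketch, not the implementation used in the webinar: the function names `levenshtein` and `closestMatch` are hypothetical, and lowercasing before comparison is an assumption. The sender list is the one from the earlier slide.

```javascript
// Levenshtein distance using two rolling rows of the edit-distance table.
function levenshtein(a, b) {
  // previous[j] = distance between the first (i - 1) chars of a and the first j chars of b.
  let previous = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const current = [i]; // turning i chars of a into an empty string takes i deletions
    for (let j = 1; j <= b.length; j++) {
      const substitutionCost = a[i - 1] === b[j - 1] ? 0 : 1;
      current[j] = Math.min(
        previous[j] + 1,                    // delete a char from a
        current[j - 1] + 1,                 // insert a char into a
        previous[j - 1] + substitutionCost  // substitute (or keep a matching char)
      );
    }
    previous = current;
  }
  return previous[b.length];
}

// Hypothetical helper: return the candidate with the smallest distance to the query.
// Lowercasing both sides is an assumption, so that case does not count as an edit.
function closestMatch(query, candidates) {
  let best = null;
  for (const candidate of candidates) {
    const distance = levenshtein(query.toLowerCase(), candidate.toLowerCase());
    if (best === null || distance < best.distance) {
      best = { candidate, distance };
    }
  }
  return best;
}

const senders = [
  "Lizzy Carl", "Y. L. Stiff", "Mark Dwayne", "Lorenzo Gomez", "Lisa Fong",
  "Emmanuelle Smith", "Amy Hall", "Michael Banian", "Emilie Pulisic",
];

console.log(levenshtein("Dwayne", "Twain")); // 3
console.log(levenshtein("Banian", "Twain")); // 4
console.log(closestMatch("Mark Twain", senders).candidate); // Mark Dwayne
```

Because the sender list is a constrained set, comparing the spoken name against every entry is cheap, and the smallest distance gives a best guess even when speech-to-text returns a name that is not in the list.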
24. The Experiment
(1) Phoneticize all attributes across all records for all algorithms
(2) Create a list of 20 queries
(3) Run queries with real people, with each algorithm, with Levenshtein distance
(4) Have them specify which results (1 per algorithm) are relevant
32. Filterable Values In Spoken Queries To Find Result
Slot Values From NLU
Slot values sent directly from Alexa, Dialogflow, etc.
Query Scanning
Simplest option; works well without NLU, but can be a bit of a blunt-force approach.
I.e., does this query contain a filterable value?
Built-in Search Engine Tooling
For example, Algolia’s query rules, which apply filters or other rules based on free-form textual queries.
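// Filterable values to look for in the spoken query.
const colors = ["red", "blue", "green"];
// Lowercase the query so matching is case-insensitive.
const normalizedQuery = query.toLowerCase();
// Keep each color that appears somewhere in the query.
const matched = colors.filter(color => normalizedQuery.includes(color));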
33. “Find me this week’s emails about the latest budget proposal from my team.”
Filter to reduce searchable items
type:emails
34. Personalization and Context for Voice Search
The Right Result for the Right User
Use the users’ affinities to filter results, further reducing the haystack.
Most search engines can “boost” results rather than filtering so you don’t reduce the searchable records too far.
At the Right Time
Contextual information can filter or boost.
Information such as:
● Recent requests
● Time or date
● Number of user requests
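The difference between filtering and boosting can be sketched in a few lines. This is an illustration only: the `score` and `topics` fields, the `boostByAffinity` name, and the boost amount are all assumptions, and real search engines expose boosting through their own ranking settings.

```javascript
// Hypothetical search results with a base relevance score.
const results = [
  { id: "a", score: 10, topics: ["sales"] },
  { id: "b", score: 8, topics: ["budget"] },
  { id: "c", score: 3, topics: ["budget"] },
];

// Boosting: raise the score of results that match the user's affinities,
// then re-sort. Nothing is removed, so non-matching results stay reachable.
function boostByAffinity(items, affinities, boost) {
  return items
    .map(item => {
      const hit = item.topics.some(topic => affinities.includes(topic));
      return { ...item, score: item.score + (hit ? boost : 0) };
    })
    .sort((x, y) => y.score - x.score);
}

const boosted = boostByAffinity(results, ["budget"], 5);
console.log(boosted.map(item => item.id)); // [ 'b', 'a', 'c' ]
```

A filter on the same affinity would have dropped result "a" entirely; the boost only moves it down one place.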
35. “Find me this week’s emails about the latest budget proposal from my team.”
sent_before:now
sent_after:Sunday
team_id:9
Filter to reduce searchable items
type:emails
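Applied to a set of records, the filters above might look like the following sketch. The record shape (`type`, `teamId`, `sentAt`), the `applyFilters` helper, and the fixed timestamps standing in for "now" and "Sunday" are assumptions made for illustration.

```javascript
// Stand-ins for the time window; a real system would compute these at query time.
const now = Date.UTC(2018, 5, 13);
const sunday = Date.UTC(2018, 5, 10);

// Hypothetical searchable items.
const items = [
  { id: 1, type: "email", teamId: 9, sentAt: Date.UTC(2018, 5, 12) },
  { id: 2, type: "email", teamId: 9, sentAt: Date.UTC(2018, 4, 1) },     // before Sunday
  { id: 3, type: "email", teamId: 4, sentAt: Date.UTC(2018, 5, 11) },    // different team
  { id: 4, type: "calendar", teamId: 9, sentAt: Date.UTC(2018, 5, 12) }, // not an email
];

// Keep only the records that satisfy every filter, shrinking the haystack
// before any text matching happens.
function applyFilters(records, filters) {
  return records.filter(record =>
    record.type === filters.type &&
    record.teamId === filters.teamId &&
    record.sentAt >= filters.sentAfter &&
    record.sentAt <= filters.sentBefore
  );
}

const haystack = applyFilters(items, {
  type: "email",
  teamId: 9,
  sentAfter: sunday,
  sentBefore: now,
});
console.log(haystack.map(record => record.id)); // [ 1 ]
```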
38. Analytics and Synonyms for Voice Search
Analytics
People are going to use different words for the same concept (pop, soda, Coke).
Prep goes a long way, but no amount of prep or assumptions beats real data.
Speech-to-text will also misunderstand people; that needs to be handled.
Synonyms
Synonyms allow a user to search with one term but match another, equivalent term.
These can “mean the same thing” or correct for errors.
39. “Find me this week’s emails about the latest proposal from my team.”
sent_before:now
sent_after:Sunday
team_id:9
Filter to reduce searchable items
type:email
budget
spending
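One way to picture synonym handling is as a lookup that expands a query term into its group of equivalents before matching. This is a rough sketch under stated assumptions: the synonym groups, the `expandTerm` and `matchesWithSynonyms` names, and the word-splitting approach are all hypothetical, and search engines generally offer synonyms as a built-in setting.

```javascript
// Hypothetical synonym groups. The last group corrects a speech-to-text error
// rather than linking words with the same meaning.
const synonymGroups = [
  ["pop", "soda", "coke"],
  ["budget", "spending"],
  ["twain", "dwayne"],
];

// Expand a term into every term in its group (or just itself if it has none).
function expandTerm(term, groups) {
  const normalized = term.toLowerCase();
  const group = groups.find(candidates => candidates.includes(normalized));
  return group ? [...group] : [normalized];
}

// True when the text contains the query term or any of its synonyms as a whole word.
function matchesWithSynonyms(queryTerm, text, groups) {
  const words = text.toLowerCase().split(/\W+/);
  return expandTerm(queryTerm, groups).some(term => words.includes(term));
}

console.log(matchesWithSynonyms("budget", "Q3 spending review", synonymGroups)); // true
console.log(matchesWithSynonyms("budget", "Lunch on Friday", synonymGroups));    // false
```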
42. Sort by what’s important. Maybe the latest?
Subject | Sent
RE: RE: Budget Proposal for Q3 | Today
RE: RE: RE: RE: Budget Proposal for Q3 | Today
Question on your budget proposal | Last week
2016 budget proposal | 2 years ago
RE: Budget Proposal for Q3 | Yesterday
43. Sort by what’s important. Maybe the latest?
Subject | Sent
RE: RE: Budget Proposal for Q3 | Today
RE: RE: RE: RE: Budget Proposal for Q3 | Today
RE: Budget Proposal for Q3 | Yesterday
Question on your budget proposal | Last week
2016 budget proposal | 2 years ago
↓
44. Then grab all you need. (Voice-first often needs just one result.)
Subject | Sent
RE: RE: Budget Proposal for Q3 | Today
RE: RE: RE: RE: Budget Proposal for Q3 | Today
RE: Budget Proposal for Q3 | Yesterday
Question on your budget proposal | Last week
2016 budget proposal | 2 years ago
↓
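Sorting by recency and then taking only what the interface needs could be sketched as follows. The subjects come from the table above; the `sentAt` timestamps and the `newestFirst` helper are hypothetical, chosen so the order matches the sorted slide.

```javascript
// Subjects from the slide, with made-up timestamps standing in for
// "Today", "Yesterday", "Last week", and "2 years ago".
const matches = [
  { subject: "RE: RE: Budget Proposal for Q3", sentAt: Date.UTC(2018, 5, 13, 15) },
  { subject: "RE: RE: RE: RE: Budget Proposal for Q3", sentAt: Date.UTC(2018, 5, 13, 9) },
  { subject: "Question on your budget proposal", sentAt: Date.UTC(2018, 5, 6) },
  { subject: "2016 budget proposal", sentAt: Date.UTC(2016, 5, 1) },
  { subject: "RE: Budget Proposal for Q3", sentAt: Date.UTC(2018, 5, 12) },
];

// Sort a copy newest-first, then keep only the first `limit` results.
function newestFirst(records, limit) {
  return [...records]
    .sort((x, y) => y.sentAt - x.sentAt)
    .slice(0, limit);
}

// A voice-first response often needs just one result.
console.log(newestFirst(matches, 1)[0].subject); // RE: RE: Budget Proposal for Q3
```

A voice-added interface with a screen could call the same helper with a larger limit and show the whole list.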
45. Train the User to Reduce the Haystack
Understand What’s Best Understood
Some terms are better understood by speech to text than others.
“Turn off (on?) the lights”
“Turn out the lights”
You’ll discover this in your testing and in your analytics.
People Respond In-Kind
People will respond with the same vocabulary as their conversation partners.
Be consistent in the output to guide users to the more understandable word choices.
47. No Good Results?
Do You Really Want to Show Results?
Sometimes, asking for clarification is better than showing a best guess.
Sometimes, showing a best guess is better than asking for clarification.
Different Answer for Different Use Cases
Different use cases have different levels of tolerance for a best guess.
“How easy is it to back out of this choice?” “Will any damage be done?”
Different Media, Different Choices
The input and output influence the decision, too.
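The choice between a best guess and a clarifying question can be expressed as a small decision function. Everything here is an assumption for illustration: the `chooseResponse` name, the `maxDistance` tolerance, and the `easyToUndo` flag (a stand-in for “How easy is it to back out of this choice?”) are hypothetical, and the right thresholds depend on the use case.

```javascript
// match: { candidate, distance } for the best fuzzy match, or null if none was found.
// useCase: { maxDistance, easyToUndo }, hypothetical per-use-case settings.
function chooseResponse(match, useCase) {
  if (match === null) {
    return { action: "clarify" };
  }
  if (match.distance === 0) {
    // Exact match: answer directly.
    return { action: "answer", value: match.candidate };
  }
  if (match.distance <= useCase.maxDistance && useCase.easyToUndo) {
    // Close enough, and a wrong guess is cheap to back out of.
    return { action: "answer", value: match.candidate };
  }
  // Otherwise ask, offering the best guess as a suggestion.
  return { action: "clarify", suggestion: match.candidate };
}

const guess = { candidate: "Mark Dwayne", distance: 3 };
console.log(chooseResponse(guess, { maxDistance: 3, easyToUndo: true }).action);  // answer
console.log(chooseResponse(guess, { maxDistance: 3, easyToUndo: false }).action); // clarify
```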
50. Voice-Added and Voice-First
VOICE-ADDED
Aids the user in achieving a goal on-screen; the user can always fall back to the screen when voice is unsuited to the task.
Can better handle “close” matches.
52. Voice-Added and Voice-First
VOICE-FIRST
Return just the right information.
The right information for that customer, for that time, for that query.
Make it easy for the customer to make corrections.
53. “Find me this week’s emails about the latest proposal from my team.”
sent_before:now
sent_after:Sunday
team_id:9
Mark every remaining word as optional
Sort by number of matching words
type:email
budget
spending
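Treating the remaining words as optional and ranking by how many of them match might be sketched like this. The `rankByMatchingWords` helper, the record shape, and the sample subjects are assumptions; search engines that support optional words handle this internally.

```javascript
// Hypothetical records left over after filtering.
const candidates = [
  { subject: "Question on your budget" },
  { subject: "Lunch on Friday" },
  { subject: "Budget Proposal for Q3" },
];

// Every query word is optional: a record needs at least one match to be kept,
// and records matching more words rank higher.
function rankByMatchingWords(queryWords, records) {
  return records
    .map(record => {
      const words = new Set(record.subject.toLowerCase().split(/\W+/));
      const matchCount = queryWords.filter(word => words.has(word.toLowerCase())).length;
      return { record, matchCount };
    })
    .filter(entry => entry.matchCount > 0)
    .sort((x, y) => y.matchCount - x.matchCount)
    .map(entry => entry.record);
}

const ranked = rankByMatchingWords(["latest", "budget", "proposal"], candidates);
console.log(ranked.map(record => record.subject));
// [ 'Budget Proposal for Q3', 'Question on your budget' ]
```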
54. Natural Language Queries
Better Analytics
Optional Words
Remove Stop Words
Faceting
Ignore Plurals
Personalization
Recent Context
Interaction Insights Context Usage
Relevance
56. Takeaways
Levenshtein Distance Yes; Phonetic Algorithms No
Levenshtein distance is a low-cost way to improve relevance
Synonyms to handle variety or misunderstandings further improve results
Reduce the Haystack
Use filters within the spoken query, plus personalization and context
Decide If This Calls For a Result
Voice-added experiences are more tolerant of approximate results
Voice-first and voice-only call for user confirmation and clarification