Voice search lessons

ALGOLIA WEBINAR
Voice Search Lessons

Voice Usage
71% would rather use voice than keyboard to search
32% search via voice daily
Source: PwC

Voice doesn’t have to be a back-and-forth
(though it can be)

Alexa and Google Assistant are only part of voice
(Don’t forget mobile)

Levels of Voice
Voice-Only: Voice in, voice out.
Voice-First: Voice is the primary mode.
Voice-Added: Voice is just one input or output mechanism.

Input and Output Precision

The right result, for the right user,
at the right time, presented in the
right way.

“It [was] obvious that there were
major design flaws. Callers over the
telephone were overwhelmed by the
same volume and organization of mail
headers that worked so effectively in
the graphical interface.”
From the book Voice Interaction Design

Speech to Text
“You get what you pay for” is true here, too.

Default Solutions (LIMITED; but, hey, it’s free)
● iOS/Android
● Alexa/Google Assistant
● Google Chrome

Established Providers (MORE CONTROL; with a cost)
● Google Cloud
● Watson Speech to Text
● Twilio

Up and Comers (UNIQUE APPROACH)
● Assembly AI
● Snips

There Will Be
Misunderstanding
Speech-to-Text Isn’t Perfect
The best speech-to-text hovers just over 95% accurate; this is better than humans
but not as precise as typing.

There Will Be
Misunderstanding
“What’s the latest email from Mark Dwayne?”

There Will Be
Misunderstanding
Speech-to-text is better than humans, but not good enough.
Searchable Items Are Enums
They represent a constrained set of items, which causes problems when a user looks
for something outside that list.

There Will Be
Misunderstanding
Senders
Lizzy Carl
Y. L. Stiff
Mark Dwayne
Lorenzo Gomez
Lisa Fong
Emmanuelle Smith
Amy Hall
Michael Banian
Emilie Pulisic
“What’s the latest email from Mark Twain?”

There Will Be
Misunderstanding
Nonetheless, A Match Must be Found
Let’s examine some commonly recommended approaches...

“Hey, you’ve gotta use fuzzy
matching.”

What… Is Fuzzy
Matching?
Phonetic Algorithms
These algorithms transform words into approximate phonetic representations.
A sample:
● Soundex
● Fuzzy Soundex
● Lein
● Metaphone
● Double Metaphone
● Metaphone 3 (commercially licensed)
● NYSIIS
● ONCA
● Roger Root
● Phonex
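
To make "approximate phonetic representation" concrete, here is a minimal sketch of classic American Soundex, the simplest entry in the list above. The function name `soundex` and the lookup table are illustrative choices, not code from the talk or from any particular library.

```javascript
// Consonant groups used by classic American Soundex.
const SOUNDEX_CODES = {
  b: 1, f: 1, p: 1, v: 1,
  c: 2, g: 2, j: 2, k: 2, q: 2, s: 2, x: 2, z: 2,
  d: 3, t: 3,
  l: 4,
  m: 5, n: 5,
  r: 6,
};

function soundex(name) {
  const letters = name.toLowerCase().replace(/[^a-z]/g, "");
  if (letters.length === 0) return "";

  // The code starts with the first letter, kept as-is.
  let code = letters[0].toUpperCase();
  let previous = SOUNDEX_CODES[letters[0]] || 0;

  for (const ch of letters.slice(1)) {
    // h and w are skipped and do not separate two equal codes.
    if (ch === "h" || ch === "w") continue;
    // Vowels and y have no digit, but they do separate equal codes.
    const digit = SOUNDEX_CODES[ch] || 0;
    if (digit !== 0 && digit !== previous) code += digit;
    previous = digit;
    if (code.length === 4) break;
  }
  // Pad short codes with zeros to a fixed length of four.
  return code.padEnd(4, "0");
}

console.log(soundex("Dwayne")); // D500
console.log(soundex("Twain"));  // T500
```

The two names sound alike but still encode differently here (D500 versus T500), so an exact match on the phonetic code alone would miss "Mark Dwayne" when the speech-to-text returns "Mark Twain".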

Different Encodings
[Slide graphic: sample codes produced by the different phonetic algorithms, including D500, TWN, DWAYN, DWAN, D5, 01200, 376000 and DWN.]

What… Is Fuzzy
Matching?
Levenshtein Distance
Also known as the “minimum edit distance.”
Represents the minimum number of edits needed to change one string (i.e. “word”)
into another.

How It Works: Levenshtein Distance
CALCULATE THE MINIMUM NUMBER OF EDITS NECESSARY

Dwayne → Twain    Edits so far
Dwayne            0
Twayne            1
Twayne            1
Twayne            1
Twaine            2
Twaine            2
Twain_            3
Twain             3

Banian → Twain    Edits so far
Banian            0
Tanian            1
Twanian           2
Twanian           2
Twa_ian           3
Twa_ian           3
Twa_i_n           4
Twain             4
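
The walkthrough above can be reproduced in a few lines. This is a minimal sketch of the standard dynamic-programming formulation, not the implementation used in the experiment; `levenshtein` and `closestMatch` are hypothetical helper names, and the sender list is the one from the earlier slide.

```javascript
// Minimum number of single-character insertions, deletions and
// substitutions needed to turn string a into string b.
function levenshtein(a, b) {
  // previous[j] = distance between the first i-1 chars of a and first j chars of b.
  let previous = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const current = [i];
    for (let j = 1; j <= b.length; j++) {
      const substitution = previous[j - 1] + (a[i - 1] === b[j - 1] ? 0 : 1);
      current[j] = Math.min(
        previous[j] + 1,    // delete a character from a
        current[j - 1] + 1, // insert a character into a
        substitution        // replace (or keep) a character
      );
    }
    previous = current;
  }
  return previous[b.length];
}

// Pick the candidate with the smallest distance to what was heard.
function closestMatch(heard, candidates) {
  let best = null;
  for (const candidate of candidates) {
    const distance = levenshtein(heard.toLowerCase(), candidate.toLowerCase());
    if (best === null || distance < best.distance) best = { candidate, distance };
  }
  return best;
}

const senders = [
  "Lizzy Carl", "Y. L. Stiff", "Mark Dwayne", "Lorenzo Gomez", "Lisa Fong",
  "Emmanuelle Smith", "Amy Hall", "Michael Banian", "Emilie Pulisic",
];

console.log(levenshtein("Dwayne", "Twain")); // 3
console.log(levenshtein("Banian", "Twain")); // 4
console.log(closestMatch("Mark Twain", senders)); // { candidate: 'Mark Dwayne', distance: 3 }
```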

Okay,
let’s stop assuming.
Let’s test it.

The Experiment
1. Phoneticize all attributes across all records for all algorithms
2. Create a list of 20 queries
3. Run queries with real people, with each algorithm, with Levenshtein distance
4. Have them specify which results (1 per algorithm) are relevant

Algorithm Performance
THE RAW TEXT CONSISTENTLY OUT-PERFORMED EVERY ALGORITHM
OVER 75% ACCURACY WITHOUT RELEVANCE OPTIMIZATION

Then… How Do We
Build Relevance?

“Find me this week’s emails
about the latest budget proposal
from my team.”
Our initial query

“... from my team.”
Remove inconsequential (stop) words
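
A minimal sketch of stop-word removal follows. The stop list is a hypothetical one chosen for this example; the talk does not say which words were treated as stop words, and a real list would be tuned to the domain.

```javascript
// Hypothetical stop list: words that carry no search meaning in this domain.
const STOP_WORDS = new Set(["find", "me", "about", "the", "a", "an", "of", "for", "please"]);

function removeStopWords(query) {
  // Lowercase, then split into words, keeping apostrophes inside words.
  const words = query.toLowerCase().match(/[a-z0-9'’]+/g) || [];
  return words.filter(word => !STOP_WORDS.has(word));
}

console.log(removeStopWords("Find me the latest budget proposal").join(" "));
// latest budget proposal
```

Words such as "this week’s" and "from my team" are deliberately left alone in this sketch, because they can be turned into filters in the next step.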

Filter First
The entire haystack (100%) → a filtered, more manageable haystack

How to Reduce the Haystack
1. Look for filterable values in query
2. Apply personalization
3. Take overall context into account

Filterable Values In Spoken Queries To Find Result
Slot Values From NLU
Slot values sent directly from Alexa, Dialogflow, etc.
Query Scanning
Simplest option; works well without NLU, but can be a bit blunt-force.
I.e., does this query have a filterable value?
// Assumes `query` holds the transcribed text of the spoken request.
const colors = [`red`, `blue`, `green`];
const normalizedQuery = query.toLowerCase();
// Keep every known color that appears anywhere in the query.
const matched = colors.filter(color => normalizedQuery.includes(color));
Built-in Search Engine Tooling
For example, Algolia’s query rules, which apply filters or other rules from free-form
textual queries.

“... from my team.”
Filter to reduce searchable items
type:emails
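
One way to get from a spoken query to a filter such as type:emails is plain query scanning. The sketch below assumes a hypothetical table of filterable terms (`FILTER_TERMS`) and a hypothetical helper name (`scanForFilters`); only the type:emails filter itself comes from the slide.

```javascript
// Hypothetical mapping from spoken words to filters.
const FILTER_TERMS = [
  { term: "emails", filter: "type:emails" },
  { term: "documents", filter: "type:documents" },
  { term: "meetings", filter: "type:meetings" },
];

function scanForFilters(query) {
  const words = query.toLowerCase().match(/[a-z0-9'’]+/g) || [];
  const filters = [];
  const remaining = [];
  for (const word of words) {
    // Whole-word comparison, so "red" would not match inside "hundred".
    const hit = FILTER_TERMS.find(({ term }) => term === word);
    if (hit) filters.push(hit.filter);
    else remaining.push(word);
  }
  // The filter words are removed from the text that is still searched.
  return { filters, remaining: remaining.join(" ") };
}

console.log(scanForFilters("latest budget proposal emails from my team"));
// { filters: [ 'type:emails' ], remaining: 'latest budget proposal from my team' }
```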

Personalization and Context for Voice Search
The Right Result for the Right User
Use the users’ affinities to filter results, further reducing the haystack.
Most search engines can “boost” results rather than filtering so you don’t reduce the
searchable records too far.
At the Right Time
Contextual information can filter or boost.
Information such as:
● Recent requests
● Time or date
● Number of user requests
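
The difference between filtering and boosting can be sketched as follows. The record fields (`textScore`, `teamId`, `sender`), the user profile shape and the boost weights are all assumptions made for this example; a search engine would normally apply boosts through its own ranking configuration.

```javascript
// Boosting re-orders results instead of removing them.
function rankWithBoosts(records, user) {
  return records
    .map(record => {
      let score = record.textScore;
      // Affinity boost (hypothetical weight): the user's own team.
      if (record.teamId === user.teamId) score += 2;
      // Context boost (hypothetical weight): senders from recent requests.
      if (user.recentSenders.includes(record.sender)) score += 1;
      return { ...record, score };
    })
    .sort((a, b) => b.score - a.score);
}

const ranked = rankWithBoosts(
  [
    { id: "a", textScore: 3, teamId: 4, sender: "Lisa Fong" },
    { id: "b", textScore: 2, teamId: 9, sender: "Mark Dwayne" },
    { id: "c", textScore: 2, teamId: 9, sender: "Amy Hall" },
  ],
  { teamId: 9, recentSenders: ["Mark Dwayne"] }
);

console.log(ranked.map(record => record.id)); // [ 'b', 'c', 'a' ]
```

Record "a" drops to the bottom but stays in the list, which is the point of boosting: the haystack is re-ordered without being reduced too far.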

“... from my team.”
sent_before:now
sent_after:Sunday
team_id:9
type:emails
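
The filters on this slide could be assembled along these lines. This is a sketch under stated assumptions: the week starts on Sunday (as the slide implies), dates are handled in UTC, and the helper names `startOfWeekUtc` and `buildFilters` are hypothetical. The `name:value` strings mirror the slide's notation, not a specific engine's filter syntax.

```javascript
// Most recent Sunday at 00:00 UTC, relative to the given moment.
function startOfWeekUtc(now) {
  const start = new Date(now.getTime());
  start.setUTCDate(start.getUTCDate() - start.getUTCDay()); // getUTCDay(): 0 = Sunday
  start.setUTCHours(0, 0, 0, 0);
  return start;
}

// Combine the query-derived type, the user's team and "this week" into filters.
function buildFilters({ type, teamId, now }) {
  return [
    `type:${type}`,
    `team_id:${teamId}`,
    `sent_after:${startOfWeekUtc(now).toISOString()}`,
    `sent_before:${now.toISOString()}`,
  ];
}

const weekFilters = buildFilters({
  type: "emails",
  teamId: 9,
  now: new Date("2024-05-15T12:00:00Z"), // a Wednesday
});
console.log(weekFilters);
```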


Raw Performance
[Chart: queries where STT correctly understood and the result was good, compared with overall good results.]

Analytics and Synonyms for Voice Search
Analytics
People are going to use different words for the same concept (pop, soda, Coke).
Prep goes a long way, but no amount of prep and assumptions is better than data.
Speech to text will also misunderstand people; that needs to be handled.
Synonyms
Synonyms allow a user to search with one term, but match another equivalent term.
These can “mean the same thing,” or correct for errors.
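
Both kinds of synonym can be sketched with two small tables. The groups and the correction below are hypothetical examples built from terms that appear in this deck; in practice they would come from analytics, and most search engines provide their own synonym configuration instead of code like this.

```javascript
// Two-way synonyms: every term in a group matches every other term.
const SYNONYM_GROUPS = [
  ["budget", "spending"],
  ["pop", "soda", "coke"],
];

// One-way corrections for known speech-to-text errors:
// a misheard term also searches for the term that was probably meant.
const ONE_WAY_CORRECTIONS = new Map([["twain", ["dwayne"]]]);

function expandTerm(term) {
  const word = term.toLowerCase();
  const alternatives = new Set([word]);
  for (const group of SYNONYM_GROUPS) {
    if (group.includes(word)) group.forEach(item => alternatives.add(item));
  }
  for (const item of ONE_WAY_CORRECTIONS.get(word) || []) alternatives.add(item);
  return [...alternatives];
}

console.log(expandTerm("budget")); // [ 'budget', 'spending' ]
console.log(expandTerm("Twain"));  // [ 'twain', 'dwayne' ]
console.log(expandTerm("Dwayne")); // [ 'dwayne' ]
```

The correction is one-way on purpose in this sketch: a search for "Dwayne" should not start matching "Twain".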

“... about the latest proposal from my team.”
sent_before:now
sent_after:Sunday
team_id:9
type:email
Synonyms: budget, spending

Probably more
than one result,
so you need
some way to
pick the best

Sort them, then
take what you
need

Sort by what’s important. Maybe the latest?

Subject                                   Sent
RE: RE: Budget Proposal for Q3            Today
RE: RE: RE: RE: Budget Proposal for Q3    Today
Question on your budget proposal          Last week
2016 budget proposal                      2 years ago
RE: Budget Proposal for Q3                Yesterday

[The table, sorted by the Sent column (↓).]

Then grab all you need.
(Voice-first often needs just one result.)
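
"Sort, then take what you need" is a one-liner in most languages. In this sketch the `sentAt` timestamps are hypothetical stand-ins for the relative dates on the slide, and `pickLatest` is an illustrative helper name.

```javascript
// Sort newest first, then keep only as many results as the interface needs.
function pickLatest(results, count = 1) {
  return [...results]
    .sort((a, b) => b.sentAt - a.sentAt)
    .slice(0, count);
}

const emails = [
  { subject: "RE: RE: Budget Proposal for Q3", sentAt: Date.parse("2024-05-15T09:00:00Z") },
  { subject: "RE: RE: RE: RE: Budget Proposal for Q3", sentAt: Date.parse("2024-05-15T11:30:00Z") },
  { subject: "Question on your budget proposal", sentAt: Date.parse("2024-05-08T10:00:00Z") },
  { subject: "RE: Budget Proposal for Q3", sentAt: Date.parse("2024-05-14T16:00:00Z") },
];

console.log(pickLatest(emails)[0].subject); // RE: RE: RE: RE: Budget Proposal for Q3
console.log(pickLatest(emails, 3).length);  // 3
```

A voice-first response would read out only the first item; a voice-added screen could pass a larger `count` and list several.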

Train the User to
Reduce the
Haystack
Understand What’s Best Understood
Some terms are better understood by speech to text than others.
“Turn off (on?) the lights”
“Turn out the lights”
You’ll discover this in your testing and in your analytics.
People Respond In-Kind
People will respond with the same vocabulary as their conversation partners.
Be consistent in the output to guide users to the more understandable word choices.

But What If There Is
No Perfect Result?

No Good Results?
Do You Really Want to Show Results?
Sometimes, asking for clarification is better than showing a best guess.
Sometimes, showing a best guess is better than asking for clarification.
Different Answer for Different Use Cases
Different use cases have different levels of tolerance for a best guess.
“How easy is it to back out of this choice?” “Will any damage be done?”
Different Media, Different Choices
The input and output influence the decision, too.
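
The trade-off above can be written down as a small decision rule. Everything numeric here is an assumption for illustration: the talk gives no thresholds, and `chooseResponse`, the confidence score and the `reversible` flag are hypothetical names.

```javascript
// Hypothetical minimum confidence needed to show a best guess, per medium.
const MIN_CONFIDENCE = {
  "voice-added": 0.5, // a screen is available as a fallback
  "voice-first": 0.7,
  "voice-only": 0.8,  // no screen at all
};

function chooseResponse({ confidence, mode, reversible }) {
  // "How easy is it to back out of this choice?" Hard-to-undo actions
  // demand more confidence (hypothetical penalty).
  const threshold = MIN_CONFIDENCE[mode] + (reversible ? 0 : 0.15);
  return confidence >= threshold ? "show-best-guess" : "ask-for-clarification";
}

console.log(chooseResponse({ confidence: 0.6, mode: "voice-added", reversible: true }));
// show-best-guess
console.log(chooseResponse({ confidence: 0.6, mode: "voice-only", reversible: true }));
// ask-for-clarification
```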

Media
1. Voice-Only
2. Voice-First
3. Voice-Added

Voice-Added and Voice-First

VOICE-ADDED
Aids the user in achieving a goal on-screen; the user can always fall back to the screen when voice is unsuited to the task.
Can better handle “close” matches.

VOICE-FIRST
Voice is the primary (or only) input and output
mechanism. Less precision than with a display and
keyboard input.

VOICE-FIRST
Return just the right information.
The right information for that customer, for that time, for
that query.
Easily allow customer to make corrections.

“... about the latest proposal from my team.”
sent_before:now
sent_after:Sunday
team_id:9
type:email
budget
spending
Mark every word that’s left optional
Sort by number of matching words
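
Treating the remaining words as optional and sorting by how many of them match could look like this. It is a sketch only: `rankByMatchingWords` is a hypothetical helper, the records are made up for the example, and a search engine with an optional-words feature would do this work itself.

```javascript
// Every query word is optional: a record needs at least one match,
// and records matching more words rank higher.
function rankByMatchingWords(queryWords, records) {
  return records
    .map(record => {
      const subjectWords = new Set(record.subject.toLowerCase().match(/[a-z0-9]+/g) || []);
      const matches = queryWords.filter(word => subjectWords.has(word)).length;
      return { ...record, matches };
    })
    .filter(record => record.matches > 0)
    .sort((a, b) => b.matches - a.matches);
}

const rankedBySubject = rankByMatchingWords(
  ["latest", "budget", "proposal"],
  [
    { id: 1, subject: "RE: Budget Proposal for Q3" },
    { id: 2, subject: "Team offsite agenda" },
    { id: 3, subject: "Latest budget proposal" },
  ]
);

console.log(rankedBySubject.map(record => record.id)); // [ 3, 1 ]
```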

Natural Language Queries
Better Analytics
Optional Words
Remove Stop Words
Faceting
Ignore Plurals
Personalization
Recent Context
Interaction Insights Context Usage
Relevance

Takeaways
Levenshtein Distance Yes; Phonetic Algorithms No
Levenshtein distance is a low-cost way to improve relevance
Synonyms to handle variety or misunderstandings further improve results
Reduce the Haystack
Use filters within the spoken query, plus personalization and context
Decide If This Calls For a Result
Voice-added experiences are more tolerant of approximate results
Voice-first and voice-only call for user confirmation and clarification

Thank you!
algolia.com/solutions/voice-search
