NLify: Lightweight Spoken Natural Language Interfaces via Exhaustive Paraphrasing
1. NLify: Lightweight Spoken Natural Language Interfaces via Exhaustive Paraphrasing
Seungyeop Han (U. of Washington), Matthai Philipose, Yun-Cheng Ju (Microsoft)
2. Speech-Based UIs are Here (Ubicomp 2013)
Today: "Siri, …"  Today: "Hey Glass, …"  Tomorrow: "Hey Microwave, …"
3. Keyphrases Don't Scale
App1: "What time is it?" … App3: "Next bus to Seattle" … App26: "When is the next meeting" … App50: "Tomorrow's weather" …
Say "What time is the next meeting" and nothing matches: Keyphrase Hell.
Use Spoken Natural Language instead.
4. Spoken Natural Language (SNL) Today: First-Party Applications
"Hey, Siri. Do you love me?" -> Speech Recognition -> Text: "Hey Siri…" -> Language Processing -> … -> "I'm not allowed, Seungyeop"
• Personal assistant model
• Large speech engine (20-600GB)
• Experts mapping speech to a few domains
8. Challenges
• Developers are not SNL experts
• Applications are developed independently
• Cloud-based SNL does not scale as a UI
  – UI capability must not rely on connectivity
  – UI events must have minimal cost
10. Specifying Spoken Keyphrase UIs

<CommandPrefix>Magic Memo</CommandPrefix>
<Command Name="newMemo">
  <ListenFor>Enter [a] [new] memo</ListenFor>
  <ListenFor>Make [a] [new] memo</ListenFor>
  <ListenFor>Start [a] [new] memo</ListenFor>
  <Feedback>Entering a new memo</Feedback>
  <Navigate Target="/Newmemo.xaml"/>
</Command>
...

How does natural language differ from keyphrases?
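The bracketed words in the ListenFor templates above mark optional words. A minimal Python sketch (my own illustration, not part of any toolkit shown here) of how one such template expands into the fixed keyphrase set the recognizer must match exactly:

```python
from itertools import product

def expand(template: str) -> list[str]:
    """Expand a ListenFor-style template where [word] marks an
    optional word, e.g. "Enter [a] [new] memo"."""
    tokens = template.split()
    # Each token contributes either (word,) or (word, "") if optional.
    choices = [
        (tok[1:-1], "") if tok.startswith("[") and tok.endswith("]") else (tok,)
        for tok in tokens
    ]
    phrases = []
    for combo in product(*choices):
        phrase = " ".join(w for w in combo if w)
        if phrase not in phrases:
            phrases.append(phrase)
    return phrases

print(expand("Enter [a] [new] memo"))
# → ['Enter a new memo', 'Enter a memo', 'Enter new memo', 'Enter memo']
```

With three ListenFor lines of four variants each, "newMemo" accepts exactly twelve fixed keyphrases, which is precisely why keyphrases don't scale to natural phrasings.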
11. Difference 1: Local Variation
• Missing words
• Repeated words
• Re-arranged words
• New combinations of phrases
"When is the next meeting?" -> "When is next meeting?" / "When is the next.. next meeting?" / "When the next meeting is?" / "What time is the next meeting?"
12. Difference 2: Paraphrases
show me the current time, what is the time, what is the current time, may i know the time, please give time, show me the time, show me the clock, tell me what time it is, what is time, current time, tell what time it is, list the time, what time, what time it is now, show current time, what time please, show time, what is the time now, current time please, say the time, find the current time please, what time is it, what is current time, tell me time, what's the time, tell current time, what time is it now, what time is it currently, check time, the time now, tell me the current time, what's time, time now, tell me the time, can you please tell me what time it is, tell me current time, give me the time, time please, show me the time now
13. Specifying SNL Systems
"what time is it?" -> Speech Recognition -> Language Processing -> whattime()
Lots of rules, little data: encode local variation in grammar; encode domain knowledge on paraphrases in models (e.g., CRFs).
Few rules, lots of data: use statistical language models that require little anticipation of local noise; use data-driven models that require little domain knowledge.
14. Exhaustive Paraphrasing by Automated Crowdsourcing
Examples from developers:
  Handler: whattime()
  Description: When you want to know the time
  Examples: What time is it now / What's the time / Tell me the time
A crowdsourcing task (description, examples, directions) is automatically generated from this, amplifying it to:
  Handler: whattime()
  Description: When you want to know the time
  Examples: What time is it now / What's the time / Tell me the time / Current time / Find the current time please / Time now / Give me time / …
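The automatically generated crowdsourcing task can be sketched as a plain template over the developer's three fields (handler, description, seed examples). The wording of the task below is hypothetical; only the three fields come from the slide:

```python
def make_paraphrase_task(handler, description, examples):
    """Format one crowdsourcing task (hypothetical template) asking
    workers to paraphrase a developer's seed examples."""
    lines = [
        f"Task: suggest a new way to say a phone command ({handler}).",
        f"When to use it: {description}",
        "Example phrasings:",
        *[f"  - {ex}" for ex in examples],
        "Directions: write one new phrasing that means the same thing",
        "but is worded differently from every example above.",
    ]
    return "\n".join(lines)

print(make_paraphrase_task(
    "whattime()",
    "When you want to know the time",
    ["What time is it now", "What's the time", "Tell me the time"],
))
```

Each accepted worker answer becomes one more amplified example for the same handler.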
15. Compiling SNL Models
Dev time: the developer writes Seed Examples:
  .What is the date @d
  .Tell me the date @d
  …
amplify: an Internet crowdsourcing service turns these into Amplified Examples:
  .What is the date @d
  .Tell me the date @d
  .What date is it @d
  .Give me the date @d
  .@d is what date
  …
Install time: compile the amplified examples into Statistical Models: an SLM for speech recognition (SAPI) and a nearest neighbor model (TFIDF + NN).
Run time: an utterance such as "Tell me when it's @T=20 min …" is recognized and dispatched as an NLNotifyEvent e to the app's nlwidget.
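The "TFIDF + NN" run-time stage can be illustrated with a minimal nearest-neighbor intent matcher over TF-IDF bag-of-words vectors. This is a simplified sketch of the idea, not the paper's exact model; the intent names and seed phrases are illustrative:

```python
import math
from collections import Counter

class NearestNeighborIntents:
    """Minimal TF-IDF + nearest-neighbor intent matcher: score the
    utterance against every stored example by cosine similarity and
    return the intent of the closest one."""

    def __init__(self, examples):  # examples: [(phrase, intent)]
        self.docs = [(Counter(p.lower().split()), intent) for p, intent in examples]
        n = len(self.docs)
        # Document frequency: each example counts once per distinct word.
        df = Counter(w for bag, _ in self.docs for w in bag)
        self.idf = {w: math.log(n / df[w]) + 1.0 for w in df}

    def _vec(self, bag):
        return {w: c * self.idf.get(w, 1.0) for w, c in bag.items()}

    def classify(self, utterance):
        q = self._vec(Counter(utterance.lower().split()))
        qn = math.sqrt(sum(v * v for v in q.values())) or 1.0
        best, best_sim = None, -1.0
        for bag, intent in self.docs:
            d = self._vec(bag)
            dot = sum(q[w] * d.get(w, 0.0) for w in q)
            dn = math.sqrt(sum(v * v for v in d.values())) or 1.0
            sim = dot / (qn * dn)
            if sim > best_sim:
                best, best_sim = intent, sim
        return best

nn = NearestNeighborIntents([
    ("what time is it", "whattime"),
    ("tell me the time", "whattime"),
    ("what is the date today", "whatdate"),
    ("give me the date", "whatdate"),
])
print(nn.classify("please tell me the time now"))  # → whattime
```

A model this simple can be rebuilt from scratch in seconds, which is what makes compiling at install time feasible.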
16. SNL Models for Multiple Apps
The same pipeline runs over every installed app's Amplified Examples, compiled on the phone into shared Statistical Models (SLM + nearest neighbor, TFIDF + NN) that dispatch NLNotifyEvents to each app's nlwidget at run time:
Application 1:
  .What is the date @d
  .Tell me the date @d
  .What date is it @d
  .Give me the date @d
  .@d is what date
  …
Application 2:
  .How much is @com
  .Get me quote for @com
  .What's the price for @com
  …
… Application N
• Apps developed separately => "late assembly" of models
• Limited time for learning at install time => simple (e.g., NN) models
• Users no longer say anything but what they have installed => "natural language shortcut" mental model
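"Late assembly" can be sketched as pooling every installed app's amplified examples into one example list, over which a simple model is rebuilt at install time. App names, intents, and phrases below are illustrative, not taken from the paper:

```python
# Sketch of "late assembly": each app ships its amplified examples;
# the phone concatenates them into one pooled example set, and a
# simple model (e.g. nearest neighbor) is rebuilt over the pool.

installed = {
    "Clock":   [("what time is it", "whattime"), ("what is the date", "whatdate")],
    "Finance": [("how much is MSFT", "quote"), ("get me quote for MSFT", "quote")],
}

def assemble(apps):
    """Pool (phrase, intent) examples from all installed apps, tagging
    each intent with its owning app so dispatch stays unambiguous."""
    pool = []
    for app, examples in apps.items():
        pool.extend((phrase, (app, intent)) for phrase, intent in examples)
    return pool

print(len(assemble(installed)))  # → 4

# Installing a new app only appends its examples and triggers a cheap rebuild:
installed["Bus"] = [("when is the next 545 to Seattle", "nextbus")]
print(len(assemble(installed)))  # → 5
```

Because rebuilding is just re-pooling plus fitting a trivial model, install-time cost stays within the "limited time for learning" budget noted above.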
24. Evaluation
• How good are SNL recognition rates?
• How does performance scale with commands?
• How do design decisions impact recognition?
• How practical is on-phone implementation?
• What is the developer experience?
25. Evaluation Dataset

Domain    | Intent & Slots                 | Example
Clock     | FindTime()                     | What time is it?
          | FindDate(day)                  | What's the date today?
Calendar  | CheckNextMtg()                 | What's my next meeting?
Bus       | FindNextBus(route, dest)       | When is the next 20 to Seattle?
Finance   | FindStockPrice(company)        | How much is Microsoft stock?
          | CalculateTip(Money, NumPeople) | How much is the tip for $20 for three people?
Condition | FindWeather(day)               | How is the weather tomorrow?
Contacts  | FindOfficeLocation(person)     | Where is Janet Smith's office?
          | FindGroup(person)              | Which group does Matthai work in?
…

Across 27 different commands, collected 1,612 paraphrases and 3,505 audio samples.
26. Evaluation Dataset
Training: Seed, 5 paraphrases/intent, by the authors; amplified via crowdsourcing ($.03/paraphrase) to Crowd, ~60 paraphrases/intent.
Testing: Audio, 130 utterances/intent, by 20 subjects, asked "What would you say to the phone to do the described task?" with an example.
27. Overall Recognition Performance
• Absolute recognition rate is good (avg: 85%, std: 7%)
• Significant relative improvement from Seed (69%)
29. Design Decisions Impact Recognition Rates
• The more exhaustive the paraphrasing, the better
• The statistical model improves recognition rate by 16% vs. the deterministic model
[Chart: recognition rate (0-100%) vs. fraction of training set used (20-100%)]
30. Feasibility of Running on Mobiles
• NLify is competitive with a large vocabulary model
• Memory usage is acceptable: maximum memory for 27 intents was 32M
• Power consumption is very close to the listening loop
[Figure 7: benefit of statistical modeling, (a) intent recognition, (b) slot recognition. Figure 8: comparison to a large vocabulary model; average SLM: 85%, LV: 80%. Rates improve noticeably between the 80% and 100% training configurations, indicating that rates have likely not topped out; improvement is spread across many functions, indicating that more templates are broadly beneficial; and there is a big difference between the 20% and 80% marks, so even had the developer added an additional dozen seeds, crowdsourcing would still have been beneficial.]
31. Developer Study w/ 5 Devs
Asked to add NLify into their existing programs.

Description                        | Sample commands                | Original LOC | Time Taken
Control a night light              | "turn off the light"           | 200          | 30 mins
Get sentiment on Twitter          | "review this"                  | 2000         | 30 mins
Query, control location disclosure | "where is Alice?"              | 2800         | 40 mins
Query weather                      | "weather tomorrow?"            | 3800         | 70 mins
Query bus service                  | "when is next 545 to Seattle?" | 8300         | 3 days

(+) How well did NLify's capabilities match your needs?
(-) Did the cost/benefit of NLify scale?
(-) How long do you think you can afford to wait for crowdsourcing?
32. Conclusions
It is feasible to build mobile SNL systems where:
• Developers are not SNL experts
• Applications are developed independently
• All UI processing happens on the phone
Fast, compact, automatically generated models enabled by exhaustive paraphrasing are the key.
33. For Data and Code
Check Matthai's homepage: http://research.microsoft.com/en-us/people/matthaip/
Or e-mail the authors on/after October 1.