NLify: Lightweight Spoken Natural Language Interfaces via Exhaustive Paraphrasing
1. NLify: Lightweight Spoken Natural Language Interfaces via Exhaustive Paraphrasing
Seungyeop Han (U. of Washington), Matthai Philipose, Yun-Cheng Ju (Microsoft)
2. Speech-Based UIs are Here (Ubicomp 2013)
Today: "Siri, …"  Today: "Hey Glass, …"  Tomorrow: "Hey Microwave, …"
3. Keyphrases Don't Scale
App1: "What time is it?" … App3: "Next bus to Seattle" … App26: "When is the next meeting" … App50: "Tomorrow's weather" …
Say "What time is the next meeting" and nothing matches: Keyphrase Hell.
Use Spoken Natural Language instead.
4. Spoken Natural Language (SNL) Today: First-Party Applications
"Hey, Siri. Do you love me?" -> Speech Recognition -> Text: "Hey Siri…" -> Language Processing -> … -> "I'm not allowed, Seungyeop"
• Personal assistant model
• Large speech engine (20-600GB)
• Experts mapping speech to a few domains
8. Challenges
• Developers are not SNL experts
• Applications are developed independently
• Cloud-based SNL does not scale as a UI
  – UI capability must not rely on connectivity
  – UI events must have minimal cost
10. Specifying Spoken Keyphrase UIs

<CommandPrefix>Magic Memo</CommandPrefix>
<Command Name="newMemo">
  <ListenFor>Enter [a] [new] memo</ListenFor>
  <ListenFor>Make [a] [new] memo</ListenFor>
  <ListenFor>Start [a] [new] memo</ListenFor>
  <Feedback>Entering a new memo</Feedback>
  <Navigate Target="/Newmemo.xaml"/>
</Command>
...

How does natural language differ from keyphrases?
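The bracketed words in the ListenFor templates above mark optional words. A minimal Python sketch (my own illustration, not part of any toolkit shown here) of how one such template expands into the fixed keyphrase set the recognizer must match exactly:

```python
from itertools import product

def expand(template: str) -> list[str]:
    """Expand a ListenFor-style template where [word] marks an
    optional word, e.g. "Enter [a] [new] memo"."""
    tokens = template.split()
    # Each token contributes either (word,) or (word, "") if optional.
    choices = [
        (tok[1:-1], "") if tok.startswith("[") and tok.endswith("]") else (tok,)
        for tok in tokens
    ]
    phrases = []
    for combo in product(*choices):
        phrase = " ".join(w for w in combo if w)
        if phrase not in phrases:
            phrases.append(phrase)
    return phrases

print(expand("Enter [a] [new] memo"))
# → ['Enter a new memo', 'Enter a memo', 'Enter new memo', 'Enter memo']
```

With three ListenFor lines of four variants each, "newMemo" accepts exactly twelve fixed keyphrases, which is precisely why keyphrases don't scale to natural phrasings.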
11. Difference 1: Local Variation
• Missing words
• Repeated words
• Re-arranged words
• New combinations of phrases
"When is the next meeting?" -> "When is next meeting?" / "When is the next.. next meeting?" / "When the next meeting is?" / "What time is the next meeting?"
12. Difference 2: Paraphrases
show me the current time, what is the time, what is the current time, may i know the time, please give time, show me the time, show me the clock, tell me what time it is, what is time, current time, tell what time it is, list the time, what time, what time it is now, show current time, what time please, show time, what is the time now, current time please, say the time, find the current time please, what time is it, what is current time, tell me time, what's the time, tell current time, what time is it now, what time is it currently, check time, the time now, tell me the current time, what's time, time now, tell me the time, can you please tell me what time it is, tell me current time, give me the time, time please, show me the time now
13. Specifying SNL Systems
"what time is it?" -> Speech Recognition -> Language Processing -> whattime()
Lots of rules, little data: encode local variation in grammar; encode domain knowledge on paraphrases in models (e.g., CRFs).
Few rules, lots of data: use statistical language models that require little anticipation of local noise; use data-driven models that require little domain knowledge.
14. Exhaustive Paraphrasing by Automated Crowdsourcing
Examples from developers:
  Handler: whattime()
  Description: When you want to know the time
  Examples: What time is it now / What's the time / Tell me the time
A crowdsourcing task (description, examples, directions) is automatically generated from this, amplifying it to:
  Handler: whattime()
  Description: When you want to know the time
  Examples: What time is it now / What's the time / Tell me the time / Current time / Find the current time please / Time now / Give me time / …
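The automatically generated crowdsourcing task can be sketched as a plain template over the developer's three fields (handler, description, seed examples). The wording of the task below is hypothetical; only the three fields come from the slide:

```python
def make_paraphrase_task(handler, description, examples):
    """Format one crowdsourcing task (hypothetical template) asking
    workers to paraphrase a developer's seed examples."""
    lines = [
        f"Task: suggest a new way to say a phone command ({handler}).",
        f"When to use it: {description}",
        "Example phrasings:",
        *[f"  - {ex}" for ex in examples],
        "Directions: write one new phrasing that means the same thing",
        "but is worded differently from every example above.",
    ]
    return "\n".join(lines)

print(make_paraphrase_task(
    "whattime()",
    "When you want to know the time",
    ["What time is it now", "What's the time", "Tell me the time"],
))
```

Each accepted worker answer becomes one more amplified example for the same handler.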
15. Compiling SNL Models
Dev time: the developer writes Seed Examples:
  .What is the date @d
  .Tell me the date @d
  …
amplify: an Internet crowdsourcing service turns these into Amplified Examples:
  .What is the date @d
  .Tell me the date @d
  .What date is it @d
  .Give me the date @d
  .@d is what date
  …
Install time: compile the amplified examples into Statistical Models: an SLM for speech recognition (SAPI) and a nearest neighbor model (TFIDF + NN).
Run time: an utterance such as "Tell me when it's @T=20 min …" is recognized and dispatched as an NLNotifyEvent e to the app's nlwidget.
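The "TFIDF + NN" run-time stage can be illustrated with a minimal nearest-neighbor intent matcher over TF-IDF bag-of-words vectors. This is a simplified sketch of the idea, not the paper's exact model; the intent names and seed phrases are illustrative:

```python
import math
from collections import Counter

class NearestNeighborIntents:
    """Minimal TF-IDF + nearest-neighbor intent matcher: score the
    utterance against every stored example by cosine similarity and
    return the intent of the closest one."""

    def __init__(self, examples):  # examples: [(phrase, intent)]
        self.docs = [(Counter(p.lower().split()), intent) for p, intent in examples]
        n = len(self.docs)
        # Document frequency: each example counts once per distinct word.
        df = Counter(w for bag, _ in self.docs for w in bag)
        self.idf = {w: math.log(n / df[w]) + 1.0 for w in df}

    def _vec(self, bag):
        return {w: c * self.idf.get(w, 1.0) for w, c in bag.items()}

    def classify(self, utterance):
        q = self._vec(Counter(utterance.lower().split()))
        qn = math.sqrt(sum(v * v for v in q.values())) or 1.0
        best, best_sim = None, -1.0
        for bag, intent in self.docs:
            d = self._vec(bag)
            dot = sum(q[w] * d.get(w, 0.0) for w in q)
            dn = math.sqrt(sum(v * v for v in d.values())) or 1.0
            sim = dot / (qn * dn)
            if sim > best_sim:
                best, best_sim = intent, sim
        return best

nn = NearestNeighborIntents([
    ("what time is it", "whattime"),
    ("tell me the time", "whattime"),
    ("what is the date today", "whatdate"),
    ("give me the date", "whatdate"),
])
print(nn.classify("please tell me the time now"))  # → whattime
```

A model this simple can be rebuilt from scratch in seconds, which is what makes compiling at install time feasible.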
16. SNL Models for Multiple Apps
The same pipeline runs over every installed app's Amplified Examples, compiled on the phone into shared Statistical Models (SLM + nearest neighbor, TFIDF + NN) that dispatch NLNotifyEvents to each app's nlwidget at run time:
Application 1:
  .What is the date @d
  .Tell me the date @d
  .What date is it @d
  .Give me the date @d
  .@d is what date
  …
Application 2:
  .How much is @com
  .Get me quote for @com
  .What's the price for @com
  …
… Application N
• Apps developed separately => "late assembly" of models
• Limited time for learning at install time => simple (e.g., NN) models
• Users no longer say anything but what they have installed => "natural language shortcut" mental model
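"Late assembly" can be sketched as pooling every installed app's amplified examples into one example list, over which a simple model is rebuilt at install time. App names, intents, and phrases below are illustrative, not taken from the paper:

```python
# Sketch of "late assembly": each app ships its amplified examples;
# the phone concatenates them into one pooled example set, and a
# simple model (e.g. nearest neighbor) is rebuilt over the pool.

installed = {
    "Clock":   [("what time is it", "whattime"), ("what is the date", "whatdate")],
    "Finance": [("how much is MSFT", "quote"), ("get me quote for MSFT", "quote")],
}

def assemble(apps):
    """Pool (phrase, intent) examples from all installed apps, tagging
    each intent with its owning app so dispatch stays unambiguous."""
    pool = []
    for app, examples in apps.items():
        pool.extend((phrase, (app, intent)) for phrase, intent in examples)
    return pool

print(len(assemble(installed)))  # → 4

# Installing a new app only appends its examples and triggers a cheap rebuild:
installed["Bus"] = [("when is the next 545 to Seattle", "nextbus")]
print(len(assemble(installed)))  # → 5
```

Because rebuilding is just re-pooling plus fitting a trivial model, install-time cost stays within the "limited time for learning" budget noted above.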
24. Evaluation
• How good are SNL recognition rates?
• How does performance scale with commands?
• How do design decisions impact recognition?
• How practical is on-phone implementation?
• What is the developer experience?
25. Evaluation Dataset

Domain    | Intent & Slots                 | Example
Clock     | FindTime()                     | What time is it?
          | FindDate(day)                  | What's the date today?
Calendar  | CheckNextMtg()                 | What's my next meeting?
Bus       | FindNextBus(route, dest)       | When is the next 20 to Seattle?
Finance   | FindStockPrice(company)        | How much is Microsoft stock?
          | CalculateTip(Money, NumPeople) | How much is the tip for $20 for three people?
Condition | FindWeather(day)               | How is the weather tomorrow?
Contacts  | FindOfficeLocation(person)     | Where is Janet Smith's office?
          | FindGroup(person)              | Which group does Matthai work in?
…

Across 27 different commands, collected 1,612 paraphrases and 3,505 audio samples.
26. Evaluation Dataset
Training: Seed, 5 paraphrases/intent, by the authors; amplified via crowdsourcing ($.03/paraphrase) to Crowd, ~60 paraphrases/intent.
Testing: Audio, 130 utterances/intent, by 20 subjects, asked "What would you say to the phone to do the described task?" with an example.
27. Overall Recognition Performance
• Absolute recognition rate is good (avg: 85%, std: 7%)
• Significant relative improvement from Seed (69%)
29. Design Decisions Impact Recognition Rates
• The more exhaustive the paraphrasing, the better
• The statistical model improves recognition rate by 16% vs. the deterministic model
[Chart: recognition rate (0-100%) vs. fraction of training set used (20-100%)]
30. Feasibility of Running on Mobiles
• NLify is competitive with a large vocabulary model
• Memory usage is acceptable: maximum memory for 27 intents was 32M
• Power consumption is very close to the listening loop
[Figure 7: benefit of statistical modeling, (a) intent recognition, (b) slot recognition. Figure 8: comparison to a large vocabulary model; average SLM: 85%, LV: 80%. Rates improve noticeably between the 80% and 100% training configurations, indicating that rates have likely not topped out; improvement is spread across many functions, indicating that more templates are broadly beneficial; and there is a big difference between the 20% and 80% marks, so even had the developer added an additional dozen seeds, crowdsourcing would still have been beneficial.]
31. Developer Study w/ 5 Devs
Asked to add NLify into their existing programs.

Description                        | Sample commands                | Original LOC | Time Taken
Control a night light              | "turn off the light"           | 200          | 30 mins
Get sentiment on Twitter          | "review this"                  | 2000         | 30 mins
Query, control location disclosure | "where is Alice?"              | 2800         | 40 mins
Query weather                      | "weather tomorrow?"            | 3800         | 70 mins
Query bus service                  | "when is next 545 to Seattle?" | 8300         | 3 days

(+) How well did NLify's capabilities match your needs?
(-) Did the cost/benefit of NLify scale?
(-) How long do you think you can afford to wait for crowdsourcing?
32. Conclusions
It is feasible to build mobile SNL systems where:
• Developers are not SNL experts
• Applications are developed independently
• All UI processing happens on the phone
Fast, compact, automatically generated models enabled by exhaustive paraphrasing are the key.
33. For Data and Code
Check Matthai's homepage: http://research.microsoft.com/en-us/people/matthaip/
Or e-mail the authors on/after October 1.