NLify: Lightweight Spoken Natural Language Interfaces via Exhaustive Paraphrasing


  1. NLify: Lightweight Spoken Natural Language Interfaces via Exhaustive Paraphrasing
     Seungyeop Han (U. of Washington)
     Matthai Philipose, Yun-Cheng Ju (Microsoft)
  2. Speech-Based UIs are Here
     Ubicomp 2013
     • Today: Siri, …
     • Today: Hey Glass, …
     • Tomorrow: Hey Microwave, …
  3. Keyphrases Don't Scale
     • Apps (App1, App2, App3, …, App26, …, App50), each with its own keyphrase: What time is it? / Next bus to Seattle / Tomorrow's weather / When is the next meeting / …
     • User says: "What time is the next meeting"
     • Keyphrase Hell
     • Use Spoken Natural Language
  4. Spoken Natural Language (SNL) Today: First-party Applications
     • Personal assistant model
     • Large speech engine (20-600GB)
     • Experts mapping speech to a few domains
     Pipeline: "Hey, Siri. Do you love me?" -> Speech Recognition -> Text: "Hey Siri…" -> Language Processing -> … -> "I'm not allowed, Seungyeop"
  5. NLify: Scaling Spoken NL Interfaces
     • 1st party app (e.g., Xbox, Siri): multiple PhDs, 10s of developers; # apps: 10
     • 3rd party app (e.g., Intuit, Spotify): 0 PhDs, 1-3 developers; # apps: 10,000
     • end-user macro (e.g., …): 0 PhDs, 0 developers; # apps: 10,000,000
  6. Goal
     Make programming spoken natural language interfaces as easy and robust as programming graphical user interfaces
  7. Outline
     • Motivation / Goal
     • System Design
     • Demonstration
     • Evaluation
     • Conclusion
  8. Challenges
     • Developers are not SNL experts
     • Applications are developed independently
     • Cloud-based SNL does not scale as UI
       – UI capability must not rely on connectivity
       – UI events must have minimal cost
  9. Specifying GUIs
     • Intuitive definition of UI
     • handler linking to code
  10. Specifying Spoken Keyphrase UIs
      <CommandPrefix>Magic Memo</CommandPrefix>
      <Command Name="newMemo">
        <ListenFor>Enter [a] [new] memo</ListenFor>
        <ListenFor>Make [a] [new] memo</ListenFor>
        <ListenFor>Start [a] [new] memo</ListenFor>
        <Feedback>Entering a new memo</Feedback>
        <Navigate Target="/Newmemo.xaml" />
      </Command>
      ...
      How does natural language differ from keyphrases?
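A small sketch of why a keyphrase grammar like the one on slide 10 falls short of natural language. It assumes the square brackets in a ListenFor pattern mark optional words, so each pattern expands to a fixed, enumerable set of phrases. The names `expand_listen_for`, `covered_phrases`, and `PATTERNS` are hypothetical helpers for illustration, not part of any speech API.

```python
import itertools
import re

# The three ListenFor patterns from the slide.
PATTERNS = [
    "Enter [a] [new] memo",
    "Make [a] [new] memo",
    "Start [a] [new] memo",
]


def expand_listen_for(pattern):
    """Return every phrase a pattern matches, treating [word] as optional."""
    tokens = re.findall(r"\[[^\]]+\]|\S+", pattern)
    choices = []
    for tok in tokens:
        if tok.startswith("[") and tok.endswith("]"):
            # Optional word: either absent ("") or present.
            choices.append(["", tok[1:-1]])
        else:
            choices.append([tok])
    return [
        " ".join(word for word in combo if word)
        for combo in itertools.product(*choices)
    ]


def covered_phrases(patterns):
    """Lower-cased set of all phrases the keyphrase grammar accepts."""
    return {p.lower() for pat in patterns for p in expand_listen_for(pat)}
```

Under this reading, the three patterns accept only twelve word sequences in total. A natural variant such as "create a new memo" is outside that set, which is the gap the following slides address.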
  11. Difference 1: Local Variation
      Base phrase: When is the next meeting?
      • Missing words: When is next meeting?
      • Repeated words: When is the next.. next meeting?
      • Re-arranged words: When the next meeting is?
      • New combinations of phrases: What time is the next meeting?
  12. Difference 2: Paraphrases
      show me the current time; what is the time; time; what is the current time;
      may i know the time please; give time; show me the time; show me the clock;
      tell me what time it is; what is time; current time; tell what time it is;
      list the time; what time; what time it is now; show current time;
      what time please; show time; what is the time now; current time please;
      say the time; find the current time please; what time is it; what is current time;
      what time is it tell me; time current; what's the time; tell current time;
      what time is it now; what time is it currently; check time; the time now;
      tell me the current time; what's time; time now; tell me the time;
      can you please tell me what time it is; tell me current time; give me the time;
      time please; show me the time now
  13. Specifying SNL Systems
      Pipeline: "what time is it?" -> Speech Recognition -> Language Processing -> whattime()
      • Few rules, lots of data
        – Speech Recognition: use statistical language models that require little anticipation of local noise
        – Language Processing: use data-driven models that require little domain knowledge
      • Lots of rules, little data
        – Speech Recognition: encode local variation in grammar
        – Language Processing: encode domain knowledge on paraphrases in models, e.g. CRFs
  14. Exhaustive Paraphrasing by Automated Crowdsourcing
      Examples from developers:
        Handler: whattime()
        Description: When you want to know the time
        Examples: What time is it now; What's the time; Tell me the time
      Automatically generated crowdsourcing task: description, example, directions
      After crowdsourcing:
        Handler: whattime()
        Description: When you want to know the time
        Examples: What time is it now; What's the time; Tell me the time; Current time; Find the current time please; Time now; Give me time; …
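The amplification step on slide 14 can be sketched roughly as follows: a developer's handler, description, and seed examples are turned into a crowdsourcing task, and the workers' answers are merged back into the example list. This is only an illustration under stated assumptions. `IntentSpec`, `render_task`, and `amplify` are hypothetical names, the second sentence of the directions is invented wording, and no real crowdsourcing service is called.

```python
import string
from dataclasses import dataclass, field


@dataclass
class IntentSpec:
    handler: str                 # name of the handler the app will run
    description: str             # when the user would want this intent
    examples: list = field(default_factory=list)


def render_task(spec, example_index=0):
    """Build the text of one crowdsourcing task: description, example, directions."""
    example = spec.examples[example_index]
    return (
        f"Task: {spec.description}\n"
        f"Example: {example}\n"
        "Directions: What would you say to the phone to do the described task? "
        "Give a phrasing that differs from the example."  # assumed wording
    )


def normalize(phrase):
    """Lower-case, drop ASCII punctuation, and collapse whitespace."""
    stripped = phrase.lower().translate(str.maketrans("", "", string.punctuation))
    return " ".join(stripped.split())


def amplify(spec, crowd_responses):
    """Return a new spec whose examples include the de-duplicated crowd responses."""
    seen = {normalize(e) for e in spec.examples}
    merged = list(spec.examples)
    for response in crowd_responses:
        key = normalize(response)
        if key and key not in seen:
            seen.add(key)
            merged.append(response.strip())
    return IntentSpec(spec.handler, spec.description, merged)
```

De-duplicating on a normalized form keeps the amplified set from filling up with copies of the seed examples that differ only in case or punctuation.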
  15. Compiling SNL Models
      • dev time: Seed Examples (.What is the date @d, .Tell me the date @d, …) -> amplify via Internet crowdsourcing service -> Amplified Examples (.What is the date @d, .Tell me the date @d, .What date is it @d, .Give me the date @d, .@d is what date, …)
      • install time: compile Amplified Examples -> Statistical Models (SLM, Nearest neighbor model)
      • run time: "Tell me when it's @T=20 min …" -> SAPI -> TFIDF + NN -> NLNotifyEvent e -> nlwidget
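The "TFIDF + NN" stage on slide 15 can be illustrated with a minimal nearest-neighbor intent matcher over TF-IDF vectors. This is a sketch under assumptions, not NLify's implementation: the class name `NearestNeighborIntents`, the smoothed IDF formula, and the toy `TEMPLATES` list are all chosen for illustration, and slot handling (the `@d` placeholders) is left out.

```python
import math
from collections import Counter

# Toy amplified examples as (intent, text) pairs; illustrative only.
TEMPLATES = [
    ("FindTime", "what time is it"),
    ("FindTime", "tell me the time"),
    ("FindDate", "what is the date"),
    ("FindDate", "tell me the date"),
    ("CheckNextMtg", "when is the next meeting"),
]


def tokenize(text):
    return text.lower().replace("?", "").split()


class NearestNeighborIntents:
    def __init__(self, templates):
        self.labels = [intent for intent, _ in templates]
        docs = [tokenize(text) for _, text in templates]
        n = len(docs)
        # Document frequency: in how many templates each word occurs.
        df = Counter(tok for doc in docs for tok in set(doc))
        # Smoothed IDF so that very common words keep a small positive weight.
        self.idf = {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}
        self.vectors = [self._vectorize(doc) for doc in docs]

    def _vectorize(self, tokens):
        """Unit-length TF-IDF vector; words unseen in training are ignored."""
        tf = Counter(tok for tok in tokens if tok in self.idf)
        vec = {tok: freq * self.idf[tok] for tok, freq in tf.items()}
        norm = math.sqrt(sum(w * w for w in vec.values()))
        return {tok: w / norm for tok, w in vec.items()} if norm else {}

    def classify(self, utterance):
        """Return (intent, cosine similarity) of the closest template."""
        query = self._vectorize(tokenize(utterance))
        best_label, best_score = None, 0.0
        for label, vec in zip(self.labels, self.vectors):
            score = sum(w * vec.get(tok, 0.0) for tok, w in query.items())
            if score > best_score:
                best_label, best_score = label, score
        return best_label, best_score
```

For example, `NearestNeighborIntents(TEMPLATES).classify("give me the date")` picks `FindDate`, because the closest template shares the rarer words "me" and "date". An utterance with no known words returns `(None, 0.0)`, which a caller could treat as "not recognized".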
  16. SNL Models for Multiple Apps
      • dev time: Amplified Examples from Application 1 (.What is the date @d, .Tell me the date @d, .What date is it @d, .Give me the date @d, .@d is what date, …), Application 2 (.How much is @com, .Get me quote for @com, .What's the price for @com, …), …, Application N
      • install time: compile -> Statistical Models (SLM, Nearest neighbor model)
      • run time: "Tell me when it's @T=20 min …" -> SAPI -> TFIDF + NN -> NLNotifyEvent e -> nlwidget
      • Apps developed separately => "late assembly" of models
      • Limited time for learning at install time => simple (e.g., NN) models
      • Users no longer say anything but what they have installed => "natural language shortcut" mental model
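A rough sketch of what "late assembly" on slide 16 could look like: each app ships its own amplified examples, and the combined training set is rebuilt whenever an app is installed or removed. The class `InstalledApps` and its methods are hypothetical, and the "compile" step here only flattens the examples. In a real system this flattened list would be what an install-time trainer (for instance an SLM plus a nearest-neighbor model) consumes.

```python
class InstalledApps:
    """Keeps per-app example templates and rebuilds the shared training set."""

    def __init__(self):
        self._apps = {}

    def install(self, app_name, templates):
        """Register an app's (handler, template) pairs and recompile."""
        self._apps[app_name] = list(templates)
        return self.compile()

    def uninstall(self, app_name):
        """Drop an app's templates and recompile."""
        self._apps.pop(app_name, None)
        return self.compile()

    def compile(self):
        """Flatten all templates, qualifying each handler with its app name."""
        return [
            (f"{app}.{handler}", text)
            for app in sorted(self._apps)
            for handler, text in self._apps[app]
        ]
```

Because the combined set is derived only from what is currently installed, the recognizer never has to distinguish commands for apps the user does not have, which matches the "natural language shortcut" mental model on the slide.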
  17. Outline
      • Motivation / Goal
      • System Design
      • Demo: SNL interfaces in 4 easy steps
      • Evaluation
      • Conclusion
  18. Step 1: Add NLify DLL
  19. Step 2: Providing Examples
  20. Step 3: Writing a Handler
  21. Step 4: Adding a GUI Element
  22. Enjoy :)
  23. Outline
      • Motivation / Goal
      • System Design
      • Demonstration
      • Evaluation
      • Conclusion
  24. Evaluation
      • How good are SNL recognition rates?
      • How does performance scale with commands?
      • How do design decisions impact recognition?
      • How practical is on-phone implementation?
      • What is the developer experience?
  25. Evaluation Dataset
      Domain     Intent & Slots                  Example
      Clock      FindTime()                      What time is it?
                 FindDate(day)                   What's the date today?
      Calendar   CheckNextMtg()                  What's my next meeting?
      Bus        FindNextBus(route, dest)        When is the next 20 to Seattle?
      Finance    FindStockPrice(company)         How much is Microsoft stock?
                 CaculateTip(Money, NumPeople)   How much is the tip for $20 for three people
      Condition  FindWeather(day)                How is the weather tomorrow?
      Contacts   FindOfficeLocation(person)      Where is the Janet Smith's office?
                 FindGroup(person)               Which group does Matthai work in?
      …
      Across 27 different commands, collected 1612 paraphrases, 3505 audio samples
  26. Evaluation Dataset
      • Seed: 5 paraphrases/intent, by authors
      • Amplify via crowdsourcing: $.03/paraphrase
      • Crowd: ~60 paraphrases/intent, by crowd
      • Audio: 130 utterances/intent, by 20 subjects
      • Asking "What would you say to the phone to do the described task" with an example
      • Training / Testing
  27. Overall Recognition Performance
      • Absolute recognition rate is good (avg: 85%, std: 7%)
      • Significant relative improvement from Seed (69%)
  28. Performance Scales Well with Number of Commands
  29. Design Decisions Impact Recognition Rates
      • The more exhaustive the paraphrasing, the better
        [Figure: Recognition Rate (0% to 100%) vs. Training Set (20% to 100%)]
      • Statistical model improves recognition rate by 16% vs. deterministic model
  30. Feasibility of Running on Mobiles
      • NLify is competitive with a large vocabulary model
      • Memory usage is acceptable: maximum memory for 27 intents was 32 MB
      • Power consumption very close to listening loop
      [Figure 7. Benefit of statistical modeling: (a) intent recognition, (b) slots recognition]
      [Figure 8. Comparison to a large vocabulary model. Average: SLM 85%, LV 80%]
      Paper excerpt shown on the slide: "…[im]prove noticeably between the 80 and 100% configurations, indicating that rates have likely not topped out; improvement is spread across many functions, indicating that more templates are broadly beneficial; and there is a big difference between the 20% and the 80% mark. The last point indicates that even had the developer added an additional dozen seeds, crowdsourcing would still have been beneficial. Given that templates may provide good coverage across para…"
  31. Developer Study w/ 5 Devs
      Each developer was asked to add NLify to an existing program.
      Description                          Sample commands                  Original LOC   Time Taken
      Control a night light                "turn off the light"             200            30 mins
      Get sentiment on Twitter             "review this"                    2000           30 mins
      Query, control location disclosure   "where is Alice?"                2800           40 mins
      Query weather                        "weather tomorrow?"              3800           70 mins
      Query bus service                    "when is next 545 to Seattle?"   8300           3 days
      (+) How well did NLify's capabilities match your needs?
      (-) Did the cost/benefit of NLify scale?
      (-) How long do you think you can afford to wait for crowdsourcing?
  32. Conclusions
      It is feasible to build mobile SNL systems, where:
      • Developers are not SNL experts
      • Applications are developed independently
      • All UI processing happens on the phone
      Fast, compact, automatically generated models enabled by exhaustive paraphrasing are the key.
  33. For Data and Code
      Check Matthai's Homepage: http://-us/people/matthaip/
      Or e-mail the authors
      On/after October 1.