Dekang Lin at AI Frontiers: Adding Conversation to GUIs

Most AI assistants on mobile phones use a conversational user interface (CUI) that mimics a chat app and translates user requests into API calls to backend services. I will present Conversational GUI (CGUI), which provides a thin layer of conversational interaction on top of the existing GUIs of mobile apps by translating user requests into sequences of GUI actions, such as clicks and swipes, that users would otherwise have to perform themselves. CGUI avoids rebuilding existing user experiences in a chat window. More importantly, it makes it possible for end users, instead of software engineers, to create new skills by providing pairs of a natural language expression and a demonstration of the GUI actions.
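
As a rough illustration of the idea of pairing a natural language expression with demonstrated GUI actions, here is a minimal Python sketch. It is not Naturali's implementation: the `Skill` and `GuiAction` classes, the `{dest}` slot syntax, the action kinds, and the UI element names are all hypothetical, chosen only to show how an utterance could be matched against a taught expression and turned into an action sequence.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class GuiAction:
    kind: str       # hypothetical action kinds: "click", "swipe", "type"
    target: str     # hypothetical identifier of a UI element
    text: str = ""  # text for "type" actions; may contain {slot} placeholders


@dataclass(frozen=True)
class Skill:
    expression: str  # taught expression, e.g. "uber ride to {dest}"
    actions: tuple   # GUI actions recorded from the user's demonstration


def compile_expression(expression):
    """Turn 'uber ride to {dest}' into a regex with one named group per slot."""
    parts = re.split(r"\{(\w+)\}", expression)
    pattern = ""
    for i, part in enumerate(parts):
        # Even positions are literal text, odd positions are slot names.
        pattern += re.escape(part) if i % 2 == 0 else f"(?P<{part}>.+)"
    return re.compile(pattern, re.IGNORECASE)


def translate(utterance, skills):
    """Return the GUI actions of the first matching skill, or None."""
    for skill in skills:
        match = compile_expression(skill.expression).fullmatch(utterance.strip())
        if match:
            slots = match.groupdict()
            return [
                GuiAction(a.kind, a.target, a.text.format(**slots))
                for a in skill.actions
            ]
    return None


# One skill, as if taught by a single demonstration (element names are made up).
skills = [
    Skill(
        "uber ride to {dest}",
        (
            GuiAction("click", "where_to_box"),
            GuiAction("type", "where_to_box", "{dest}"),
            GuiAction("click", "first_suggestion"),
        ),
    )
]

plan = translate("uber ride to crowne plaza sfo", skills)
```

Exact template matching is the simplest possible choice here; a real system would need a far more tolerant interpretation step, and would also have to take the current app context into account.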

Published in: Data & Analytics

  1. 1. Adding Conversation to GUIs Dekang Lin Naturali 1
  2. 2. A Tale of Two Uber Rides uber ride to crowne plaza sfo
  3. 3. Naturali A Beijing-based startup company Upgrade apps with a speech interface Naturali Sesami ✦ Translate speech inputs to action sequences in apps and execute them on users’ behalf. ✦ Chinese version launched on LeTV phones as a system app on April 12, 2017 ✦ Available as a third-party app on all Android phones since Aug. 2017
  4. 4. Advantages of Speech Speed ✦ voice input is three times as fast as typing Hands-free: ✦ send messages, play music, order food ✦ turn on hotspot: 5 clicks Mind-free: ✦ where is my luggage?
  5. 5. Voice Assistants Chat window Fulfillment by backend API calls
  6. 6. Chat + API: the downsides Chat assistants displace apps, but chat is not the best mode of interaction for everything: editing, browsing, viewing. Nonetheless, there are plenty of needs for voice interaction. who has access to this?
  7. 7. Who has access? Just ask
  8. 8. Chat + API: the downsides Re-invention of user experience inside the chat window: ✦ usually not as good as specialized apps, ✦ requires a great deal of repeated development effort
  9. 9. Chat + API: the downsides Re-invention of user experience inside the chat window: ✦ usually not as good as specialized apps, ✦ requires a great deal of development effort
  10. 10. Chat + API: the downsides Economic interests of the assistant and the backend services may not be aligned.
  11. 11. Naturali Sesami A thin, transparent translation layer over apps. ✦ voice ➜ front end UI actions Seamless integration of speech and graphics ✦ Existing GUI interactions are still available ✦ Making voice interaction available on any app page
  12. 12. Use Yelp to find greek food near Santa Clara Convention Center
  13. 13. Voice to Actions in Three Steps Speech Recognition: sound → text ✦ data Semantic Interpretation: text → intent ✦ knowledge Plan Generation: intent → actions ✦ grounding
  14. 14. Speech Recognition: sound → text Third party services Open source tools
  15. 15. Naturali Speech End-to-end DNN: CNN+LSTM+Attention+CTC ✦ built from scratch with TensorFlow ✦ trained with thousands of hours of transcribed speech Personalized and contextualized language model: ✦ contact names ✦ app specific vocabulary
  16. 16. Semantic Interpretation: text → intent An intent identifies a task and the necessary information (parameters) for the task Example: ✦ task: FlightSearch ✦ parameters: (to, from, date, airline, class)
  17. 17. Entities and Types Persons: singers/directors/contacts Locations: cities/POIs/addresses Apps and Games Media: songs/shows/movies/books Time and Date Food Sports teams ……
  18. 18. Recognizing Thousands of Types It is not an option to use manually labeled training examples. An alternative is to use naturally annotated data: ✦ Hearst patterns: NP_type such as NP_inst ✦ Other examples: navigate to NP_loc
  19. 19. Multi-round Conversation Complex intents may not be articulated in one shot ✦ FlightSearch(to, from, date, airline, class) A multi-round conversation incrementally collects information from the user and guides the user in the process.
  20. 20. Dialog Management
  21. 21. Composite Intents Messenger chat with Alex and say let’s meet on saturday ✦ OpenMessenger ✦ ChatWithPerson ✦ SendMessage get a uber black ride to SFO ✦ UberRide ✦ SetDest ✦ SelectUberBlack
  22. 22. Messenger Chat
  23. 23. Plan Generation: intent → actions Grounding: establishes the connection between the inside (the assistant) and the outside (apps and devices). Example: ✦ intent: {"task": "FlightStatus", "number": "UA888", "date": "2017-11-04"} ✦ action: select * from flight_db where airline = "United Airlines" and flight_num = "888" and year = 2017 and month = 11 and day = 4
  24. 24. Actions on Google grounding
  25. 25. What is my data usage?
  26. 26. Teaching a New Skill
  27. 27. Grounding by Crowd Sourcing Skills = (context, expression, actions)
  28. 28. Crowd Sourced Skills Skills are immediately usable by the creator. ✦ The user may share the skills with others, e.g., tech support for parents Vetted skills can be made available to the public
  29. 29. Summary Voice interaction is inevitable. Naturali Sesami translates user requests into sequences of actions in apps. Sesami grows by crowd sourcing skills. Join us! ✦ jobs@naturali.ai
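
The grounding example on slide 23 (a FlightStatus intent turned into a database query) can be sketched in Python roughly as follows. This is only an illustration under stated assumptions: the `ground_flight_status` function, the `AIRLINES` lookup table, the column types, and the sample row with its `status` value are invented for the example; only the intent fields and the shape of the query come from the slide.

```python
import sqlite3
from datetime import date

# Hypothetical airline-code lookup; only the UA entry is implied by the slide.
AIRLINES = {"UA": "United Airlines"}


def ground_flight_status(intent):
    """Map a FlightStatus intent to a parameterized SQL query and its arguments."""
    code, flight_num = intent["number"][:2], intent["number"][2:]
    day = date.fromisoformat(intent["date"])
    sql = (
        "SELECT * FROM flight_db "
        "WHERE airline = ? AND flight_num = ? "
        "AND year = ? AND month = ? AND day = ?"
    )
    return sql, (AIRLINES[code], flight_num, day.year, day.month, day.day)


# In-memory database with one made-up row, purely so the query can run.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE flight_db (airline TEXT, flight_num TEXT, "
    "year INTEGER, month INTEGER, day INTEGER, status TEXT)"
)
conn.execute(
    "INSERT INTO flight_db VALUES ('United Airlines', '888', 2017, 11, 4, 'on time')"
)

intent = {"task": "FlightStatus", "number": "UA888", "date": "2017-11-04"}
sql, params = ground_flight_status(intent)
rows = conn.execute(sql, params).fetchall()
```

Using `?` placeholders instead of string concatenation keeps user-derived values out of the SQL text, which matters when the parameters originate from recognized speech.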