Samsung voice intelligence outline

"Building a next generation Speech & NLU Engine: In pursuit of a Multi-modal experience for Bixby"

  1. Taking the Road Less Travelled: In pursuit of a Multi-modal experience for Bixby (Dr. Vikram Vij, Samsung R&D Bangalore, India)
  2. SRI-B’s Footsteps Towards Voice Intelligence • 2013: Team Foundation for Voice Intelligence • 2015: US Launch Voice Assistance • 2017: Bixby Global Launch
  3. Bixby Introduction • Bixby is an intelligent, personalized voice interface for your phone. • It lets you seamlessly switch between voice and touch modes. • Just speak using natural language and Bixby will execute your command. • Launch date: 19th July 2017 (US), 22nd Aug 2017 (Global) • Expanded to more than 200 countries • More than 75 domains supported (e.g. Camera, Gallery, Messages, WhatsApp, YouTube, Uber)
  4. Bixby | Human Computer Interface Revolution • "With English Support, Samsung's Bixby Impresses Vs. Siri And Google Assistant" • "Bixby is perhaps in the most precarious spot, as it’s going to be competing directly against Google Assistant on some devices." • "Bixby’s capabilities sound quite impressive thanks to its integration with other Samsung apps" • "Galaxy S8's voice sidekick can do things Siri can't"
  5. Traditional Flow of Voice Assistance [Diagram: "Text to Mom" → NLU Platform (Machine Learning Models: Domain Classifier, Intent Classifier, Slot Tagger) → Command; additional diagram label: "mom"]
  6. Bixby v1.0 in a Minimalistic View [Diagram: voice packet → ASR → text input → NLU → command]
  7. Bixby ASR - Fundamentals [Diagram: voice packet → ASR System (Feature Extraction, Decoder, Acoustic Model, Language Model(s)) → ASR Hypothesis → Inverse Text Normalization]
  8. Bixby ASR – Multi Accent • Countries shown: United States, China, India, United Kingdom, Spain, South Korea, Australia, Canada • Accent categories: DEFAULT, ACCENTED • Accent Determination inputs: On-Boarding Utterances, SIM Card Information, Keyboard Language, Contact Details
  9. Bixby: The Multi-Modal Point of View • Touch Interface: ① Home ② Settings ③ Connections ③ Data Usage • Voice Interface: “Show me the mobile data usage”
  10. Bixby: The Multi-Modal Point of View (cont’d) [Diagram: Touch UI screen flow alongside the Voice UI command “Find Hawaii photos in Gallery”, with a Context label at each of four steps]
  11. Leap Required for NLU toward Multi-Modality • Traditional NLU vs. Multi-Modal NLU • Context Awareness • Massive Number of Contexts: thousands of states • Varying Set of Commands: various device models (Note8, S8, TabS), apps, locales, …
  12. Challenge of Massive Contextual Input Space
  13. Challenge of RNN vs CNN
  14. Challenge of Filter Size
  15. Data Governance
  16. Approach for Massive Contextual Input Space
  17. Challenge of Variable Output Space
  18. Approach for Variable Output Space
  19. Challenge for Indian Market • Hindi is targeted as the language of experimentation. • Indian languages such as Hindi are often mixed with English within a single command, e.g. "camera खुला करो". • We have developed a bilingual model for the Hindi classifier.
  20. Bixby 1.0 is a Success [Chart: Total Bixby Users and Active Users; series: Total Users, DAU; x-axis: Aug'17 to Feb'18; y-axis: 0 to 8,000,000; annotations: US Launch 19th July 2017, Global Launch 22nd Aug 2017]
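
The traditional flow on slide 5 (utterance text in, then Domain Classifier, Intent Classifier and Slot Tagger, then a command out) can be illustrated with a toy sketch. This is not Bixby's implementation: the slides describe machine learning models, whereas the keyword tables below are hypothetical stand-ins, and every name here (`Command`, `DOMAIN_KEYWORDS`, `INTENT_KEYWORDS`, `understand`, the intent and slot labels) is an assumption made for illustration only.

```python
from dataclasses import dataclass, field


@dataclass
class Command:
    """Structured result of the three NLU stages (hypothetical shape)."""
    domain: str
    intent: str
    slots: dict = field(default_factory=dict)


# Hypothetical keyword tables standing in for trained ML models.
DOMAIN_KEYWORDS = {
    "Messages": ("text", "message", "sms"),
    "Camera": ("camera", "photo", "selfie"),
}

INTENT_KEYWORDS = {
    "Messages": {"send_message": ("text", "send", "message")},
    "Camera": {
        "take_photo": ("take", "capture"),
        "open_camera": ("open", "launch"),
    },
}


def classify_domain(tokens):
    """Stage 1: pick the first domain with a matching keyword."""
    for domain, keywords in DOMAIN_KEYWORDS.items():
        if any(tok in keywords for tok in tokens):
            return domain
    return "Unknown"


def classify_intent(domain, tokens):
    """Stage 2: pick an intent, restricted to the chosen domain."""
    for intent, keywords in INTENT_KEYWORDS.get(domain, {}).items():
        if any(tok in keywords for tok in tokens):
            return intent
    return "unknown"


def tag_slots(intent, tokens):
    """Stage 3: extract slot values; here only a recipient after 'to'."""
    slots = {}
    if intent == "send_message" and "to" in tokens:
        recipient = " ".join(tokens[tokens.index("to") + 1:])
        if recipient:
            slots["recipient"] = recipient
    return slots


def understand(utterance):
    """Run the three stages in sequence and return a Command."""
    tokens = utterance.lower().split()
    domain = classify_domain(tokens)
    intent = classify_intent(domain, tokens)
    return Command(domain, intent, tag_slots(intent, tokens))


if __name__ == "__main__":
    print(understand("Text to Mom"))
    print(understand("Open camera"))
```

Note that nothing in this sketch looks at the current screen or device state. That missing input is the gap slide 11 points to when it contrasts traditional NLU with multi-modal, context-aware NLU.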