
The Voice Interface Revolution


A modern take on the design and development of voice-interface applications, with comprehensive recommendations and near-future forecasts for the Internet of Things and home automation.

Published in: Software

  1. THE VOICE INTERFACE REVOLUTION_
  2. GENERAL TRAITS + INTRODUCTION
  3. WHY NOW?_ ‣ EXPLOSION OF WEB SERVICES AND IoT ‣ HARDWARE NOW SUPPORTS FAR-FIELD VOICE INPUT PROCESSING ‣ THE SCIENCE BEHIND THE SCENES IS NOW ACCESSIBLE ‣ AUTOMATIC SPEECH RECOGNITION, NATURAL LANGUAGE UNDERSTANDING, TEXT TO SPEECH ‣ ARTIFICIAL INTELLIGENCE IS MAKING VOICE INTERFACES SMARTER ‣ PERSONALIZATION TO USER SPEECH, CONTEXTS AND PREFERENCES
  4. BENEFITS_ ‣ MOST NATURAL INTERFACE FOR HUMANS ‣ INSTANT VALUE FOR QUICK DEMANDS ‣ SUITABLE FOR NON-TECHNICAL USERS ‣ LOW HARDWARE NEEDS, NO SCREEN REQUIRED ‣ ACCESSIBILITY FOR USERS WITH LOW VISION OR HAND DISABILITIES
  5. DRAWBACKS_ ‣ ERRORS IN SPEECH RECOGNITION ‣ SPEECH RECOGNITION/GENERATION ACCURACY VARIES AMONG LANGUAGES ‣ SUSCEPTIBILITY TO BACKGROUND NOISE ‣ NOT ACCESSIBLE FOR DEAF/MUTE USERS ‣ SPOKEN RESPONSES EXPOSE DATA SLOWLY
  6. USE CASES_ ‣ SUITABLE ‣ QUICK INFORMATION DEMANDS WITH FEW PARAMETERS ‣ NON-CRITICAL TRANSACTIONS WITH FEW PARAMETERS ‣ RESPONSES WITH A SMALL AMOUNT OF DATA ‣ NOT SUITABLE ‣ QUESTIONS OR TRANSACTIONS WITH MANY PARAMETERS ‣ CRITICAL TRANSACTIONS, DUE TO THE POSSIBILITY OF ERRORS ‣ RESPONSES WITH A LARGE AMOUNT OF DATA
  7. DESIGN + INTERFACE
  8. BASICS_ ‣ CHOOSE YOUR CHANNEL WISELY ‣ CUSTOM APPLICATION ‣ GENERAL ASSISTANT ‣ STUDY YOUR DOMAIN ‣ VOICE-ONLY? ‣ THE BEST OPTION IS USUALLY A COMBINED GRAPHICAL AND VOICE INTERFACE
  9. RECOMMENDATIONS_ ‣ SHORT INTERACTIONS ‣ SHORTER THAN TEXT-BASED EXPERIENCES ‣ NO FUNNELS LONGER THAN TWO STEPS ‣ MEANINGFUL RESPONSES WITH VALUE ‣ TRANSPARENCY ‣ ENGAGEMENT ‣ TAIL QUESTIONS ‣ NOTIFICATIONS
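Short, valuable responses and tail questions can be combined in the response layer. The TypeScript sketch below reads out only the first few results and, when more remain, closes with a tail question instead of a long list. The function name `buildVoiceListResponse`, the default of three items and the wording are assumptions for illustration, not a prescribed format.

```typescript
// Hypothetical helper: turn a list of results into one short spoken answer.
// It reads at most `maxItems` entries and, if more remain, ends with a tail
// question instead of reading everything out.
function buildVoiceListResponse(items: string[], maxItems: number = 3): string {
  if (items.length === 0) {
    return "I couldn't find anything. Do you want to try something else?";
  }
  const spoken = items.slice(0, Math.max(1, maxItems));
  // Natural joining: "a", "a and b", "a, b and c"
  const last = spoken[spoken.length - 1];
  const joined =
    spoken.length === 1 ? last : `${spoken.slice(0, -1).join(", ")} and ${last}`;
  const remaining = items.length - spoken.length;
  if (remaining > 0) {
    // Tail question: offer the rest rather than reading a long list aloud
    return `Here are ${spoken.length} of ${items.length} results: ${joined}. Do you want to hear the rest?`;
  }
  return `I found ${joined}. Anything else?`;
}
```

Capping the spoken list keeps the turn short (slide 9), while the tail question leaves the user in control of whether to hear more.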
  10. ENTRANCE_ ‣ GUIDED FIRST INTERACTION ‣ A SINGLE ENTRY POINT ALLOWS MORE CONTROL ‣ QUICK WELCOME
  11. CUT THE BULLSHIT_ ‣ VALUE OVER SMALL TALK ‣ VALUE OVER PERSONALITY ‣ VALUE OVER HUMOUR ‣ BE HONEST
  12. CONVERSATION BASICS_ ‣ Turn-taking ‣ Threading ‣ Leveraging the inherent efficiency of language ‣ Anticipating variable user behaviour ‣ Understanding cooperative behaviour ‣ Cooperative principle ‣ Paul Grice’s Maxims ‣ Use everyday language ‣ Instilling user confidence
  13. GRICE’S MAXIMS_ ‣ The maxim of quantity, where one tries to be as informative as possible, giving as much information as is needed and no more. ‣ The maxim of quality, where one tries to be truthful and does not give information that is false or not supported by evidence. ‣ The maxim of relation, where one tries to be relevant and says things that are pertinent to the discussion. ‣ The maxim of manner, where one tries to be as clear, as brief and as orderly as possible, avoiding obscurity and ambiguity.
  14. DEVELOPMENT + TECHNICAL
  15. BASICS_ ‣ SPEECH RECOGNITION/GENERATION ‣ AUTOMATIC IN GENERAL ASSISTANTS ‣ SERVICE- OR LIBRARY-BASED IN CUSTOM ASSISTANTS ‣ THE CORE COMPONENT IS THE DIALOG ENGINE ‣ GOOGLE DIALOGFLOW ‣ MICROSOFT BOT FRAMEWORK ‣ IBM WATSON ASSISTANT ‣ YOUR OWN
  16. EXAMPLE DIALOG ENGINE DIAGRAM_ ‣ NLU platform that receives requests and converts them into intents and parameters
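To make the NLU stage concrete, here is a deliberately simplified TypeScript sketch that turns an utterance into an intent name plus parameters using regular expressions. It is a stand-in under stated assumptions, not how Dialogflow or other NLU platforms work internally (they use trained models); the hotel-concierge intents `book.room` and `order.food` and their patterns are hypothetical.

```typescript
// Toy rule-based stand-in for an NLU platform: maps a raw utterance to an
// intent name plus extracted parameters. Intents and patterns are hypothetical.
interface IntentMatch {
  intent: string;
  parameters: Record<string, string>;
}

interface IntentRule {
  intent: string;
  pattern: RegExp;
  paramNames: string[]; // capture group i + 1 fills paramNames[i]
}

const rules: IntentRule[] = [
  {
    intent: "book.room",
    pattern: /book an? (single|double|suite)(?: room)? for (\d+) nights?/i,
    paramNames: ["roomType", "nights"],
  },
  {
    intent: "order.food",
    pattern: /order (?:an? |some )?(.+?)(?: to my room)?$/i,
    paramNames: ["dish"],
  },
];

function matchIntent(utterance: string): IntentMatch {
  const text = utterance.trim();
  for (const rule of rules) {
    const m = rule.pattern.exec(text);
    if (m === null) continue;
    const parameters: Record<string, string> = {};
    for (let i = 0; i < rule.paramNames.length; i++) {
      const value = m[i + 1];
      // Optional groups that did not participate in the match are undefined
      if (value !== undefined) parameters[rule.paramNames[i]] = value.toLowerCase();
    }
    return { intent: rule.intent, parameters };
  }
  // Nothing matched: the equivalent of a fallback intent
  return { intent: "fallback", parameters: {} };
}
```

The output shape (intent plus parameter map) is what the rest of the dialog engine consumes, whichever NLU service actually produces it.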
  17. RECOMMENDATIONS_ ‣ NODE.JS AS BACKEND TECHNOLOGY ‣ IDEAL FOR PaaS AND EVEN FaaS ‣ OWN USER SYSTEM ‣ MIXED CONTEXT STRATEGY: ‣ KEEP CONVERSATIONS IN MEMORY ‣ KEEP MEANINGFUL, ACTIONABLE DATA IN A DATABASE ‣ PRINCIPLES OF MODULARITY AND COMPONENTIZATION
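The mixed context strategy can be sketched in TypeScript as two separate stores: an in-memory conversation store whose entries expire, and a repository interface for durable, actionable data. The class and method names, the TTL-based expiry and the in-memory repository (standing in for a real database) are all assumptions for illustration.

```typescript
// Sketch of the mixed context strategy. All names are hypothetical; the
// in-memory repository only stands in for a database-backed one.
interface ConversationState {
  lastIntent?: string;
  slots: Record<string, string>;
  updatedAt: number; // epoch milliseconds
}

interface BookingRepository {
  saveBooking(userId: string, booking: Record<string, string>): Promise<void>;
}

class ConversationStore {
  private readonly sessions = new Map<string, ConversationState>();
  private readonly ttlMs: number;

  constructor(ttlMs: number) {
    this.ttlMs = ttlMs;
  }

  get(sessionId: string, now: number = Date.now()): ConversationState {
    const state = this.sessions.get(sessionId);
    // Missing or expired sessions restart empty; nothing durable lives here
    if (state === undefined || now - state.updatedAt > this.ttlMs) {
      const fresh: ConversationState = { slots: {}, updatedAt: now };
      this.sessions.set(sessionId, fresh);
      return fresh;
    }
    return state;
  }

  update(sessionId: string, patch: Partial<ConversationState>, now: number = Date.now()): void {
    const state = this.get(sessionId, now);
    this.sessions.set(sessionId, { ...state, ...patch, updatedAt: now });
  }
}

class InMemoryBookingRepository implements BookingRepository {
  readonly rows: Array<{ userId: string; booking: Record<string, string> }> = [];

  async saveBooking(userId: string, booking: Record<string, string>): Promise<void> {
    this.rows.push({ userId, booking: { ...booking } });
  }
}

// Once the conversation yields something actionable, persist it and clear
// the transient slots
async function completeBooking(
  store: ConversationStore,
  repo: BookingRepository,
  sessionId: string,
  userId: string,
): Promise<void> {
  const state = store.get(sessionId);
  await repo.saveBooking(userId, state.slots);
  store.update(sessionId, { slots: {} });
}
```

Losing the in-memory store (a restart, or a new FaaS instance) only costs an unfinished conversation, which is why durable results go through the repository.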
  18. PATTERNS_ ‣ ADAPTER ‣ IDEAL FOR CUSTOM INPUT SOURCES: ‣ OWN/THIRD-PARTY WEBHOOK ‣ MESSAGING SYSTEM LIBRARY ‣ IDEAL FOR CUSTOM OUTPUT TARGETS ‣ MIDDLEWARE ‣ FOR USER INPUT ‣ FOR OUTPUT GENERATION
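A minimal TypeScript sketch of both patterns: a channel adapter converts its own payload into a common message (and a reply back into its own format), and input middleware functions run in order before the dialog engine sees the message. The payload shape, the `chat` channel name and both middleware functions are hypothetical examples.

```typescript
// Adapter + middleware sketch. Payload shapes and middleware are hypothetical.
interface UserMessage {
  channel: string;
  userId: string;
  text: string;
}

interface ChannelAdapter<P, R> {
  toMessage(payload: P): UserMessage; // input entry
  fromReply(reply: string): R; // output exit
}

// Hypothetical webhook payload and reply format of a chat channel
interface ChatPayload {
  sender: { id: string };
  message: { text: string };
}
interface ChatReply {
  message: { text: string };
}

const chatAdapter: ChannelAdapter<ChatPayload, ChatReply> = {
  toMessage: (p) => ({ channel: "chat", userId: p.sender.id, text: p.message.text }),
  fromReply: (reply) => ({ message: { text: reply } }),
};

type Middleware = (msg: UserMessage) => UserMessage;

// Collapse whitespace and lower-case the text before intent matching
const normalizeText: Middleware = (msg) => ({
  ...msg,
  text: msg.text.trim().replace(/\s+/g, " ").toLowerCase(),
});

// Mask long digit runs (e.g. card numbers) before anything is logged or stored
const redactLongNumbers: Middleware = (msg) => ({
  ...msg,
  text: msg.text.replace(/\d{12,}/g, "[redacted]"),
});

// Apply each middleware in order to the result of the previous one
function runPipeline(msg: UserMessage, steps: Middleware[]): UserMessage {
  return steps.reduce((current, step) => step(current), msg);
}
```

Adding a new channel then means writing one adapter, while the middleware chain and the dialog engine stay unchanged.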
  19. SPEECH RECOGNITION_ ‣ GOOGLE'S WORD ERROR RATE IS DOWN TO 4.9% THANKS TO AI TECHNIQUES SUCH AS DEEP LEARNING ‣ THREE MODELS WORK TOGETHER IN A GRAPH: ‣ ACOUSTIC: WAVEFORM TO INDIVIDUAL SOUND FRAGMENTS ‣ PRONUNCIATION: SOUNDS TO WORDS ‣ LANGUAGE: WORDS TO SENTENCES ‣ THE STANDARD DATASET FOR MEASURING ACCURACY IS NIST 2000 SWITCHBOARD
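Error figures like the 4.9% above are word error rates: WER = (S + D + I) / N, where S, D and I are the substitutions, deletions and insertions in the minimum word-level edit between the reference transcript and the recognizer output, and N is the number of reference words. A minimal TypeScript sketch using the standard edit-distance table; real scoring tools typically normalize casing and punctuation first, which this sketch does not do.

```typescript
// Word error rate via word-level Levenshtein distance.
function wordErrorRate(reference: string, hypothesis: string): number {
  const ref = reference.trim().split(/\s+/).filter((w) => w.length > 0);
  const hyp = hypothesis.trim().split(/\s+/).filter((w) => w.length > 0);
  if (ref.length === 0) throw new Error("reference must contain at least one word");

  // dist[i][j] = edits needed to turn the first i reference words
  // into the first j hypothesis words
  const dist: number[][] = [];
  for (let i = 0; i <= ref.length; i++) {
    dist.push(new Array<number>(hyp.length + 1).fill(0));
    dist[i][0] = i; // i deletions
  }
  for (let j = 0; j <= hyp.length; j++) dist[0][j] = j; // j insertions

  for (let i = 1; i <= ref.length; i++) {
    for (let j = 1; j <= hyp.length; j++) {
      const substitution = ref[i - 1] === hyp[j - 1] ? 0 : 1;
      dist[i][j] = Math.min(
        dist[i - 1][j] + 1, // deletion
        dist[i][j - 1] + 1, // insertion
        dist[i - 1][j - 1] + substitution, // match or substitution
      );
    }
  }
  return dist[ref.length][hyp.length] / ref.length;
}
```

For example, "turn on the kitchen light" against the reference "turn on the kitchen lights" is one substitution in five words, a WER of 0.2 (20%).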
  20. SPEECH RECOGNITION APIS_ ‣ GOOGLE CLOUD SPEECH ‣ Converts audio to text, synchronously and asynchronously, in 80+ languages with a high degree of accuracy ‣ https://cloud.google.com/speech/docs ‣ MICROSOFT LUIS ‣ Interprets intents and extracts entities from text (a language-understanding service), including prebuilt, pre-trained ones ‣ https://www.luis.ai/home ‣ IBM WATSON SPEECH-TO-TEXT ‣ https://www.ibm.com/watson/services/speech-to-text/ ‣ AMAZON TRANSCRIBE ‣ https://aws.amazon.com/es/transcribe/
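As an illustration of the synchronous mode of Google Cloud Speech, the TypeScript sketch below builds (but does not send) a request for the v1 REST method `speech:recognize` and extracts the top transcript from a response. It assumes 16 kHz, 16-bit linear PCM audio that is already base64-encoded, and it omits authentication entirely; the helper names are hypothetical, while the field names follow the v1 REST API.

```typescript
// Request/response subset for Cloud Speech-to-Text v1 `speech:recognize`.
// Authentication (API key or OAuth token) is deliberately left out.
interface RecognizeRequest {
  config: {
    encoding: "LINEAR16";
    sampleRateHertz: number;
    languageCode: string;
  };
  audio: { content: string }; // base64-encoded audio bytes
}

interface RecognizeResponse {
  results?: Array<{ alternatives: Array<{ transcript: string; confidence?: number }> }>;
}

const RECOGNIZE_URL = "https://speech.googleapis.com/v1/speech:recognize";

// Hypothetical helper: assemble the JSON body to POST to RECOGNIZE_URL
function buildRecognizeRequest(
  audioBase64: string,
  languageCode: string = "en-US",
  sampleRateHertz: number = 16000,
): { url: string; body: RecognizeRequest } {
  return {
    url: RECOGNIZE_URL,
    body: {
      config: { encoding: "LINEAR16", sampleRateHertz, languageCode },
      audio: { content: audioBase64 },
    },
  };
}

// Hypothetical helper: join the top alternative of each result
function topTranscript(res: RecognizeResponse): string {
  return (res.results ?? [])
    .map((r) => (r.alternatives.length > 0 ? r.alternatives[0].transcript.trim() : ""))
    .filter((t) => t.length > 0)
    .join(" ");
}
```

Sending the body is a plain JSON POST with credentials attached; when no speech is recognized the response may carry no `results`, which `topTranscript` treats as an empty string.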
  21. GOOGLE CLOUD API_ ‣ NATURAL LANGUAGE ‣ Provides natural language understanding technologies to developers, such as sentiment analysis, entity recognition, entity sentiment analysis and text annotation. ‣ https://cloud.google.com/natural-language/docs/reference/rest ‣ TRANSLATION ‣ Translates between 80+ languages and detects the language of the input text. ‣ https://cloud.google.com/translate/docs/reference/rest
  22. GOOGLE ASSISTANT DEVELOPMENT + EXAMPLE
  23. ACTIONS ON GOOGLE_ ‣ Platform for building actions that users invoke to fulfill a need ‣ Easy way: Dialogflow integration ‣ Custom way: Actions SDK ‣ How it works: ‣ The user requests an action: “Talk to my Hotel Concierge” ‣ The Assistant asks Actions on Google to invoke that particular app ‣ The conversation between the user and the app begins ‣ Subsequent user input is sent directly to the app until the app fulfills the intent and the conversation ends
  24. INTENTS_ ‣ Represent a mapping between what a user says and what action your software should take ‣ User Says (Expressions) ‣ Natural language expressions annotated with parameters that are linked to entities ‣ Actions ‣ Trigger name with associated parameters to perform an action in the app ‣ Response ‣ Simple text or rich responses, depending on the platform ‣ Contexts ‣ Pass information from other intents or from external sources; input contexts are prerequisites
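When an intent's response comes from your own backend (fulfillment), the action name and parameters of the matched intent arrive in a webhook request. A minimal TypeScript sketch of the dispatch step, using a hand-typed subset of the Dialogflow ES (v2) webhook JSON (`queryResult.action`, `queryResult.parameters`, `fulfillmentText`); the action names and reply texts are hypothetical, and HTTP handling is left out.

```typescript
// Hand-typed subset of the Dialogflow ES (v2) webhook JSON; only the fields
// this sketch reads or writes are declared.
interface WebhookRequest {
  session: string;
  queryResult: {
    queryText: string;
    action?: string;
    parameters: Record<string, unknown>;
  };
}

interface WebhookResponse {
  fulfillmentText: string;
}

type ActionHandler = (params: Record<string, unknown>) => string;

// Hypothetical handlers keyed by the action name configured on each intent
const handlers = new Map<string, ActionHandler>([
  ["room.book", (p) => `Booking a ${String(p["roomType"])} room for ${String(p["nights"])} nights.`],
  ["room.cancel", () => "Your booking has been cancelled."],
]);

function handleWebhook(req: WebhookRequest): WebhookResponse {
  const handler = handlers.get(req.queryResult.action ?? "");
  if (handler === undefined) {
    // Unknown action: say so honestly rather than guessing
    return { fulfillmentText: "Sorry, I can't help with that yet." };
  }
  return { fulfillmentText: handler(req.queryResult.parameters) };
}
```

Keying handlers by action name keeps the backend independent of how many "User Says" expressions map to the same action.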
  25. ENTITIES_ ‣ Significant data extracted from user input as parameter values ‣ Entities are associated with particular actions ‣ There are three types: ‣ System ‣ Pre-built entities provided by API.AI (now Dialogflow) to make common concepts easy to handle (colors, locations, …) ‣ Developer ‣ Custom entities created from a reference value plus synonyms ‣ User entities ‣ Defined for the session, for instance specific playlists
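A developer entity's reference value plus synonyms can be sketched in TypeScript as a lookup that finds the longest synonym present in an utterance and returns its reference value, which is what the extracted parameter would carry. This is a simplification: real NLU platforms also handle fuzzy matches and inflections, and the "room type" entity below is hypothetical.

```typescript
// Developer-entity sketch: reference value + synonyms. Entity is hypothetical.
interface EntityEntry {
  value: string; // reference value
  synonyms: string[];
}

function buildSynonymIndex(entries: EntityEntry[]): Map<string, string> {
  const index = new Map<string, string>();
  for (const entry of entries) {
    // The reference value is also a valid way to say it
    for (const term of [entry.value, ...entry.synonyms]) {
      index.set(term.toLowerCase(), entry.value);
    }
  }
  return index;
}

function extractEntity(index: Map<string, string>, utterance: string): string | undefined {
  // Replace anything that is not an ASCII letter, digit or space, then pad
  // with spaces so terms only match whole words
  const cleaned = utterance.toLowerCase().replace(/[^a-z0-9 ]/g, " ").replace(/\s+/g, " ").trim();
  const text = ` ${cleaned} `;
  // Longer terms first, so "double room" wins over a shorter overlapping term
  const terms = Array.from(index.keys()).sort((a, b) => b.length - a.length);
  for (const term of terms) {
    if (text.indexOf(` ${term} `) !== -1) return index.get(term);
  }
  return undefined;
}

const roomTypeEntity: EntityEntry[] = [
  { value: "double", synonyms: ["double room", "room for two"] },
  { value: "suite", synonyms: ["presidential suite", "luxury room"] },
];
```

Resolving synonyms to one reference value means the backend only ever has to handle "double" or "suite", however the user phrased it.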
  26. CONTEXTS_ ‣ Persisted information that can be used across intents ‣ It can be internal, like a particular movie the user is asking about ‣ Or external, like user data retrieved from a user system ‣ Lifespan: ‣ By default they last for 5 requests or 10 minutes ‣ Input context: ‣ Restricts an intent so it is matched only when certain contexts are set ‣ For example, when you need specific information to perform an action ‣ Output context: ‣ Tied to the user session and set by the intent ‣ Automatically added to follow-up intents
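The lifespan and input-context rules above can be modeled in a few lines of TypeScript: each active context carries a remaining lifespan in requests plus an expiry time, and an intent with input contexts is eligible only while all of them are active. The defaults are taken from the slide (5 requests, 10 minutes) and may differ between Dialogflow versions; the class and context names are hypothetical, and this is a model of the behavior, not Dialogflow's implementation.

```typescript
// Context bookkeeping sketch. Defaults follow the slide; names are hypothetical.
const DEFAULT_LIFESPAN = 5; // requests
const DEFAULT_TTL_MS = 10 * 60 * 1000; // 10 minutes

interface ActiveContext {
  name: string;
  lifespan: number; // requests left
  expiresAt: number; // epoch ms
  parameters: Record<string, string>;
}

class ContextManager {
  private readonly contexts = new Map<string, ActiveContext>();

  // Output context set by a matched intent
  set(name: string, parameters: Record<string, string>, now: number, lifespan: number = DEFAULT_LIFESPAN): void {
    this.contexts.set(name, { name, lifespan, expiresAt: now + DEFAULT_TTL_MS, parameters });
  }

  // Called once per user request: age every context and drop dead ones
  tick(now: number): void {
    this.contexts.forEach((ctx, name) => {
      ctx.lifespan -= 1;
      if (ctx.lifespan <= 0 || now >= ctx.expiresAt) this.contexts.delete(name);
    });
  }

  isActive(name: string): boolean {
    return this.contexts.has(name);
  }

  // Input contexts: an intent is eligible only if all of them are active
  canMatch(inputContexts: string[]): boolean {
    return inputContexts.every((c) => this.isActive(c));
  }
}
```

Whichever limit is reached first, request count or time, removes the context, so intents gated on it stop matching.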
  27. EVENTS & DIALOGS_ ‣ Events are a feature that lets you invoke intents by an event name instead of a user query ‣ Dialogs ‣ Linear ‣ With slot filling, you define required parameters with prompts and order them; the agent keeps asking for them until it has all the information. ‣ Non-linear ‣ Complex dialogs are built through context routing: removing the output context in an intent's response and adding a new output context that is matched by the next question
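Linear slot filling reduces to a simple rule: walk the ordered list of required parameters and prompt for the first one that is still missing; once none are missing, the intent can be fulfilled. A minimal TypeScript sketch of that rule, with hypothetical slot names and prompts for a hotel booking.

```typescript
// Slot-filling sketch. Slot names and prompts are hypothetical.
interface Slot {
  name: string;
  prompt: string;
}

type SlotTurn =
  | { done: false; prompt: string }
  | { done: true; values: Record<string, string> };

function nextSlotTurn(slots: Slot[], filled: Record<string, string>): SlotTurn {
  for (const slot of slots) {
    // Ask for the first required parameter that is still missing
    if (!filled[slot.name]) {
      return { done: false, prompt: slot.prompt };
    }
  }
  return { done: true, values: { ...filled } };
}

const bookingSlots: Slot[] = [
  { name: "roomType", prompt: "What type of room would you like?" },
  { name: "checkIn", prompt: "What day will you check in?" },
  { name: "nights", prompt: "How many nights will you stay?" },
];
```

Because the next prompt is recomputed from the filled values on every turn, a user who gives several parameters in one sentence simply skips the prompts they already answered.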
  28. GENERAL ASSISTANTS + CHANNELS
  29. GENERAL ASSISTANTS MAP_
  30. GOOGLE ASSISTANT_ ‣ ACTIONS ON GOOGLE ALLOWS BUILDING APPS ‣ GOOGLE HOME, HOME MINI, ANDROID, ANDROID AUTO, WEAR OS ‣ FUTURE REACH OF UP TO 80% OF THE WORLD MOBILE MARKET ‣ CUSTOM DIALOG ENGINE: DIALOGFLOW ‣ BEST SPEECH RECOGNITION ‣ BEST POSSIBLE FUTURE INTEGRATION WITH OTHER GOOGLE SERVICES
  31. ALEXA_ ‣ ALEXA SKILLS FOR THIRD-PARTY INTEGRATIONS ‣ AMAZON ECHO, ECHO DOT ‣ LARGEST SALES CHANNEL ‣ LARGEST CURRENT MARKET SHARE THANKS TO EARLY TIME-TO-MARKET ‣ IoT INTEGRATION THROUGH ALEXA VOICE SERVICE
  32. HOME DEVICES_ ‣ SMART SPEAKERS WILL BE THE CENTER OF HOME USAGE ‣ HOME AUTOMATION ‣ IoT DEVICES ‣ GENERAL USE CASES: ‣ PURCHASES ‣ INFORMATION DEMANDS ‣ AGENDA ‣ CONTROL OVER OTHER DEVICES
  33. US HOME DEVICES MARKET SHARE_
  34. FUTURE + ADVANCED
  35. CURRENT USAGE & PREDICTIONS_ ‣ 50% OF ALL SEARCHES WILL BE VOICE-BASED BY 2020 ‣ 22M SMART SPEAKERS IN THE US BY 2020 ‣ 400M DEVICES WITH ACCESS TO GOOGLE ASSISTANT THIS YEAR ‣ A GOOGLE HOME IS SOLD EVERY SECOND IN THE US ‣ 40% OF ADULTS USE VOICE SEARCH
  36. THANK YOU
