The rise of voice platforms - Comparing voice-related APIs


Voice-first devices are a massively growing market. Amazon Echo and Google Home are the first to create an open ecosystem and offer basic integration possibilities. The AI software that delivers this experience is available as APIs and can be used to build sophisticated custom solutions. The key to success is speech-to-text quality. This talk compares different APIs and shares and demonstrates best practices for using speech recognition APIs.

Published in: Technology

  1. Comparing voice-related APIs, Christian Rebernik (@crebernik7791)
  2. Voice First Footprint: in 2017 there will be 33 million devices.
     ● Source: The Voice 2017 Report, VoiceLabs analysis combined with research from CIRP, KPCB and InfoScout
  3. Voice adoption: the ‘Voice First’ era has already started.
     ● Alexa in 4% of US households (end of 2016)
     ● Siri handles over 2bn commands a week
     ● 20% of Google searches on Android handsets are input by voice
     (Devices shown: Alexa, Google Home, Ding Dong)
  4. Voice Devices: creating an open ecosystem
     ● Amazon Echo: Skills and Alexa Voice Service
     ● Google Home: Google Assistant Actions
  5. Speech Recognition API: Developing for Amazon Alexa
     ● Limited understanding: Amazon Echo is built for predefined options (e.g. no custom notes). The session ends after 8 seconds.
     ● Predefined wake word: the wake word defines the customer experience. Only 4 wake words are available, and one must be part of every conversation.
     ● No notifications and no presence: you can't alert the user to an event, and you can't react to situations such as the user arriving home ("welcome home").
     ● No audio / no identification: anybody (guests, etc.) can use Alexa and access all information.
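The short session on slide 5 is usually handled in the skill's response: a reply can keep the session open and supply a reprompt that Alexa speaks if the user stays silent. This does not remove the time limit; it only gives the user a second chance to answer. Below is a minimal sketch, assuming a custom skill that answers with plain-text speech. The field names (`version`, `outputSpeech`, `reprompt`, `shouldEndSession`) follow the Alexa Skills Kit JSON response format; the helper name `build_alexa_response` and the example texts are hypothetical.

```python
import json


def build_alexa_response(speech_text, reprompt_text=None):
    """Build a minimal Alexa Skills Kit response body (hypothetical helper).

    Without a reprompt the session ends after the speech is played; with a
    reprompt the session stays open and Alexa asks again if the user is silent.
    """
    response = {
        "outputSpeech": {"type": "PlainText", "text": speech_text},
        # Keep the session open only when we expect an answer.
        "shouldEndSession": reprompt_text is None,
    }
    if reprompt_text is not None:
        response["reprompt"] = {
            "outputSpeech": {"type": "PlainText", "text": reprompt_text}
        }
    return {"version": "1.0", "response": response}


if __name__ == "__main__":
    body = build_alexa_response(
        "In which room should I switch on the light?",
        reprompt_text="Please name a room, for example the kitchen.",
    )
    print(json.dumps(body, indent=2))
```

A skill backend would return this body from its request handler; answering without a reprompt ends the conversation immediately.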
  6. Technology Stack: components enabling voice user interfaces
     ● Applications: implemented use cases leveraging the hardware and AI software
     ● AI Software: software that interprets speech, enables conversations and provides a natural voice
     ● Hardware: devices the consumer interacts with, like Amazon Echo or Google Home
  7. AI overview: 120 companies in speech recognition (Source: Venture Scanner, Contact)
  8. Speech Recognition API: real-time speech-to-text APIs

     | Feature                        | Google (4)                          | IBM (3)                    | Microsoft (2)                        |
     |--------------------------------|-------------------------------------|----------------------------|--------------------------------------|
     | Status                         | Beta                                | Beta/Production            | Preview                              |
     | Languages (incl. dialects) (1) | 43 (89)                             | 8 (14)                     | 6 (7)                                |
     | Cost/min                       | 0,024 € (0,006 / 15 sec)            | 0,02 €                     | 0,06 € (1000 calls of 15 sec for $4) |
     | Speaker detection              | No                                  | English (8 kHz)            | No                                   |
     | Audio formats                  | FLAC, Linear16, MULAW, AMR, AMR_WB  | FLAC, PCM, WAV, OGG, MULAW | PCM single channel, Siren, SirenSR   |
     | Noise friendly                 | Yes                                 | Unknown                    | Unknown                              |
     | Word hints                     | Yes                                 | No                         | No                                   |

     1) Language support (languages supported, including dialects)
     2) Microsoft:
     3) IBM:
     4) Google:
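Google's price on slide 8 is quoted per 15-second block; converting a block price into a per-minute figure is simple arithmetic, and it reproduces the 0,024 per minute in the table. The sketch below also estimates the cost of a single request under a labeled assumption: each request is rounded up to whole 15-second blocks, so a short voice command is billed as a full block. Function names are hypothetical, and the result is in whatever currency the input price uses.

```python
import math


def cost_per_minute(price_per_block, block_seconds):
    """Convert a price quoted per billing block into a price per minute."""
    return price_per_block * 60 / block_seconds


def billed_cost(audio_seconds, price_per_block, block_seconds=15):
    """Cost of one request, ASSUMING it is rounded up to whole billing blocks."""
    blocks = math.ceil(audio_seconds / block_seconds)
    return blocks * price_per_block


if __name__ == "__main__":
    print(cost_per_minute(0.006, 15))  # Google column: about 0.024 per minute
    print(billed_cost(3, 0.006))       # a 3-second command is billed as one block
    print(billed_cost(16, 0.006))      # 16 seconds spill into a second block
```

Under this assumption, many short commands cost noticeably more than the per-minute price suggests, which matters for voice-first use cases built on brief utterances.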
  9. Speech Recognition API: Best practices
     ● High audio capture quality: use lossless coding. Capture audio at 16,000 Hz or higher, using the native sample rate.
     ● No additional noise reduction: the APIs include noise reduction, and duplicate noise reduction can reduce quality. Echo and noise have a huge impact on speech recognition quality.
     ● User education: teach users to stay close to the microphone.
     ● One speaker per stream: in multi-speaker settings, try to separate the audio streams, as current APIs are built for dictation.
     ● Provide context: context matters a lot. Provide word hints to help the system detect words correctly.
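Several of the best practices on slide 9 can be checked before audio is sent. Below is a minimal sketch, assuming Google's Speech-to-Text v1 REST method `speech:recognize` as the target and a 16-bit mono WAV recording as input. It rejects recordings below 16,000 Hz or with more than one channel, keeps the native sample rate, sends uncompressed (lossless) LINEAR16 samples, and passes word hints as `speechContexts` phrases. The request-body field names follow that REST API; the helpers `build_recognize_request` and `make_test_wav` are hypothetical, and no network call or authentication is shown. FLAC would save bandwidth while staying lossless, but the Python standard library cannot encode it.

```python
import base64
import io
import wave

MIN_SAMPLE_RATE_HZ = 16000  # from the slide: capture at 16,000 Hz or higher


def build_recognize_request(wav_bytes, language_code, hints):
    """Build a JSON body for a speech:recognize request from a WAV file.

    Raises ValueError if the recording violates the best practices above.
    """
    with wave.open(io.BytesIO(wav_bytes), "rb") as wav:
        rate = wav.getframerate()
        channels = wav.getnchannels()
        sample_width = wav.getsampwidth()
        pcm = wav.readframes(wav.getnframes())

    if rate < MIN_SAMPLE_RATE_HZ:
        raise ValueError(f"sample rate {rate} Hz is below {MIN_SAMPLE_RATE_HZ} Hz")
    if channels != 1:
        raise ValueError("expected one speaker per mono stream")
    if sample_width != 2:
        raise ValueError("LINEAR16 requires 16-bit samples")

    return {
        "config": {
            "encoding": "LINEAR16",   # uncompressed, i.e. lossless PCM
            "sampleRateHertz": rate,  # the native rate; no resampling
            "languageCode": language_code,
            # Word hints: phrases the recognizer should favour.
            "speechContexts": [{"phrases": list(hints)}],
        },
        "audio": {"content": base64.b64encode(pcm).decode("ascii")},
    }


def make_test_wav(rate=16000, channels=1, seconds=0.1):
    """Create a silent 16-bit WAV file in memory for trying the function."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(2)
        wav.setframerate(rate)
        wav.writeframes(b"\x00\x00" * channels * int(rate * seconds))
    return buf.getvalue()


if __name__ == "__main__":
    request = build_recognize_request(make_test_wav(), "en-US", ["Alexa", "kitchen"])
    print(request["config"])
```

Rejecting bad recordings early, rather than resampling or mixing them down, keeps the decision with the capture side, where the slide's advice (native rate, one speaker per stream) can actually be applied.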
  10. Problems in real life: voice is in its early days
     ● Speech-to-text quality
     ● Speaker recognition
     ● Language mixing
     ● Punctuation
  11. Demo: Voice interaction in IoT
  12. We are building a voice-first company and are looking for support:
     ● Technical Research
     ● Deep Learning & NLP Scientists
     ● Software Engineers
     Christian Rebernik, Contact: