Comparing voice-related APIs
Christian Rebernik
@crebernik7791
Voice First Footprint
In 2017 there will be 33 million devices
● The Voice 2017 Report - VoiceLabs analysis combined with research from CIRP, KPCB and InfoScout
Voice adoption
The ‘Voice First’ era has already started
● Alexa in 4% of US households (end of 2016)
● Siri handles over 2bn commands a week
● 20% of Google searches on Android handsets are input by voice
Alexa, Google Home, Ding Dong
Voice Devices
Creating an open ecosystem
Amazon Echo
Skills and Alexa Voice Service
Google Home
Google Assistant Actions
Speech Recognition API
Developing for Amazon Alexa
● Limited understanding
Amazon Echo is built for predefined options (e.g. no custom notes). The session ends after 8 seconds.
● The predefined wake word defines the customer experience
Only 4 wake words are available, and one must be used in every conversation.
● No notifications and no presence
You can’t alert the user of an event, and you cannot react to events such as the user arriving home.
● No audio / no identification
Anybody can use Alexa (guests, etc.) and access all information.
Technology Stack
Components enabling Voice User Interfaces
● Applications
Implemented use cases leveraging the hardware and AI software.
● AI Software
Software that interprets speech, enables conversations and provides a natural voice.
● Hardware
Devices the consumer interacts with, like Amazon Echo or Google Home.
AI overview
120 companies in Speech Recognition
Ventures Scanner, Contact info@venturescanner.com
Speech Recognition API
Real-time speech-to-text APIs

                      Google [4]                IBM [3]                Microsoft [2]
Status                Beta                      Beta/Production        Preview
Language support [1]  43 (89)                   8 (14)                 6 (7)
Cost/min              0.024 € (0.006 / 15 sec)  0.02 €                 0.06 € (1000 calls of 15 sec each for $4)
Speaker detection     No                        English (8 kHz)        No
Audio formats         FLAC, Linear16, MULAW,    FLAC, PCM, WAV, OGG,   PCM single channel, Siren,
                      AMR, AMR_WB               MULAW                  SirenSR
Noise friendly        Yes                       Unknown                Unknown
Word hints            Yes                       No                     No

1) Language support: languages supported (languages supported including dialects)
2) Microsoft: https://www.microsoft.com/cognitive-services/en-us/speech-api
3) IBM: http://www.ibm.com/watson/developercloud/speech-to-text.html
4) Google: https://cloud.google.com/speech/
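
The Google figures on this slide relate a price per 15-second block to a price per minute. A minimal sketch of that arithmetic follows; the function names are hypothetical, and the rule that usage is rounded up to whole blocks is an assumption, not something stated on the slide.

```python
import math


def cost_per_minute(price_per_block: float, block_seconds: int) -> float:
    """Convert a per-block price into a per-minute price."""
    return price_per_block * (60 / block_seconds)


def estimate_cost(duration_seconds: float, price_per_block: float,
                  block_seconds: int) -> float:
    """Estimate the cost of one recording.

    Assumption: the provider bills whole blocks and rounds up.
    """
    blocks = math.ceil(duration_seconds / block_seconds)
    return blocks * price_per_block


if __name__ == "__main__":
    # Slide figures for Google: 0.006 per 15 seconds.
    print(f"per minute: {cost_per_minute(0.006, 15):.3f}")
    print(f"40 s clip:  {estimate_cost(40, 0.006, 15):.3f}")
```

Under the round-up assumption, short utterances cost a full block each, so many brief voice commands can cost more than their total duration suggests.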
Speech Recognition API
Best practices
● High audio capturing quality
Use lossless coding. Capture audio at 16,000 Hz or higher. Use the native sample rate.
● No additional noise
The APIs include noise reduction. Duplicate noise reduction can reduce the quality. Echo and noise have a huge impact on speech recognition quality.
● User education
Educate the user to be close to the microphone.
● One speaker per stream
For multi-speaker settings, try to separate the audio streams, as the current APIs are built for dictation.
● Provide context
Context matters a lot. Provide word hints to help the system detect words correctly.
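
Two of these practices can be checked in code before any audio is sent: the capture settings and the word hints. The sketch below is illustrative only. The helper names are hypothetical, a mono channel is used as a rough proxy for "one speaker per stream", and the request field names (`sampleRateHertz`, `speechContexts`, `phrases`) follow the Google Cloud Speech-to-Text v1 REST format as an assumption that should be checked against the current documentation. No network call is made.

```python
import base64
import io
import wave

MIN_SAMPLE_RATE_HZ = 16_000  # slide: capture at 16,000 Hz or higher


def check_capture(wav_bytes: bytes) -> list:
    """Return a list of best-practice violations found in a WAV recording."""
    problems = []
    with wave.open(io.BytesIO(wav_bytes), "rb") as wav:
        if wav.getframerate() < MIN_SAMPLE_RATE_HZ:
            problems.append(f"sample rate {wav.getframerate()} Hz is too low")
        if wav.getnchannels() != 1:
            # Assumption: mono audio stands in for one speaker per stream.
            problems.append(f"{wav.getnchannels()} channels, expected 1")
        if wav.getsampwidth() != 2:
            problems.append("expected 16-bit samples for LINEAR16")
    return problems


def build_request(wav_bytes: bytes, language_code: str, hints: list) -> dict:
    """Build a recognize request body that carries word hints.

    Assumption: field names match the Google Cloud Speech-to-Text v1 REST API.
    """
    with wave.open(io.BytesIO(wav_bytes), "rb") as wav:
        sample_rate = wav.getframerate()  # keep the native sample rate
        pcm = wav.readframes(wav.getnframes())
    return {
        "config": {
            "encoding": "LINEAR16",
            "sampleRateHertz": sample_rate,
            "languageCode": language_code,
            "speechContexts": [{"phrases": list(hints)}],
        },
        "audio": {"content": base64.b64encode(pcm).decode("ascii")},
    }
```

A caller would run `check_capture` first and only build the request when it returns an empty list, passing domain terms such as product or device names as hints.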
Problem
Real life - Voice is in the early days
Speech-to-text quality
Speaker recognition
Language mixing
Punctuation
Demo
Voice interaction in IoT
We are building a voice-first company
and are looking for support
- Technical Research
- Deep Learning & NLP Scientist
- Software Engineers
Christian Rebernik
Contact: christian@6voices.com

The rise of voice platforms - Comparing voice related API's
