The State of Speech Recognition on Mobile
Watch the video with slide synchronization on InfoQ.com!
http://www.infoq.com/presentations/state-speech-recognition-mobile
Presented at QCon New York 
www.qconnewyork.com 
The future won't be like Star Trek. 
Scott Adams, creator of Dilbert
Why do I care about speech rec?
+ 
= Cape Bretoner
Here's a conversation between two Cape 
Bretoners 
P1: jeet? 
P2: naw, jew? 
P1: naw, t'rly t'eet bye.
And here's the translation 
P1: jeet? 
P1: Did you eat? 
P2: naw, jew? 
P2: No, did you? 
P1: naw, t'rly t'eet bye. 
P1: No, it's too early to eat, buddy.
Regular Alphabet 
26 letters 
Cape Breton 
Alphabet 
12 letters!
Alright, enough about me
What is speech 
recognition?
Speech recognition is the 
process of translating the 
spoken word into text.
The process of speech rec 
includes...
Record and digitize the audio 
data
Perform end pointing 
(trimming)
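End pointing can be sketched as an energy check over short frames of audio. This is only a minimal illustration: the helper name `trimSilence`, the frame size and the threshold are all assumptions made up for this example, and production recognizers use far more robust voice activity detection.

```javascript
// Hypothetical helper: trims leading and trailing silence from PCM samples.
// `samples` is an array of floats in [-1, 1]; `frameSize` and `threshold`
// are illustrative values, not taken from any particular recognizer.
function trimSilence(samples, frameSize, threshold) {
  // Mean squared amplitude of one frame starting at `start`.
  function frameEnergy(start) {
    var end = Math.min(start + frameSize, samples.length);
    var sum = 0;
    for (var i = start; i < end; i++) {
      sum += samples[i] * samples[i];
    }
    return sum / (end - start);
  }

  var first = -1;
  var last = -1;
  for (var start = 0; start < samples.length; start += frameSize) {
    if (frameEnergy(start) >= threshold) {
      if (first === -1) { first = start; }
      last = Math.min(start + frameSize, samples.length);
    }
  }
  // No frame was loud enough: treat the whole clip as silence.
  return first === -1 ? [] : samples.slice(first, last);
}
```

Everything between the first and last loud frame is kept, so short pauses inside the utterance survive the trim.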
Split data into phonemes
What is a phoneme? 
A phoneme is any of the perceptually
distinct units of sound in a specified
language that distinguish one
word from another.
The English language has 44 
distinct sounds 
Source: English language phoneme chart
By comparison, the Rotokas 
speakers in Papua New Guinea 
have 11 phonemes. 
But the !Xóõ speakers who 
mostly live in Botswana have 
112 phonemes.
Apply the phonemes to the 
recognition model. This is a 
massive lexicon which takes 
into account all of the different 
ways words can be 
pronounced.
Analyze the results against the 
grammar
Return a confidence weighted 
result 
[
  {
    "confidence": 0.97335243225098,
    "transcript": "hello"
  },
  {
    "confidence": 0.19940405040800,
    "transcript": "hell low"
  },
  {
    "confidence": 0.19910827091000,
    "transcript": "how low"
  }
]
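Once you have a list like that, your app still has to decide what to trust. Here is a minimal sketch of picking the most confident alternative and rejecting it when it falls under a cutoff. The function name `bestTranscript` and the cutoff value are assumptions for illustration, not part of any API.

```javascript
// Hypothetical helper: returns the transcript with the highest confidence,
// or null when nothing reaches `minConfidence` (an app-chosen cutoff).
function bestTranscript(alternatives, minConfidence) {
  var best = null;
  for (var i = 0; i < alternatives.length; i++) {
    if (best === null || alternatives[i].confidence > best.confidence) {
      best = alternatives[i];
    }
  }
  if (best !== null && best.confidence >= minConfidence) {
    return best.transcript;
  }
  return null;
}

// The result list from the slide above.
var alternatives = [
  { confidence: 0.97335243225098, transcript: "hello" },
  { confidence: 0.19940405040800, transcript: "hell low" },
  { confidence: 0.19910827091000, transcript: "how low" }
];
var heard = bestTranscript(alternatives, 0.5);
```

Returning null rather than a low-confidence guess lets the app ask the user to repeat themselves.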
Basically...
We want it to be like this 
but more often than not... 
Why is that? 
When two people talk,
comprehension rates are better
than 97%
A really good English language
speech recognition system is
right 92% of the time
Where does that extra 5% in 
error rate come from? 
Vocabulary size and confusability 
Speaker dependence vs independence 
Isolated or continuous speech 
Initiated vs spontaneous speech 
Adverse conditions
Mobile Speech Recognition
OS             Application   SDK
Android        Google Now    Java API
iOS            Siri          Many 3rd party Obj-C SDKs
Windows Phone  Cortana       C# API
So how do we 
add speech rec 
to our app?
You may look at the W3C
Web Speech API Specification
but only Chrome on the 
desktop has implemented that 
spec
But that's okay!
The spec looks like this: 
interface SpeechRecognition : EventTarget {
    // recognition parameters
    attribute SpeechGrammarList grammars;
    attribute DOMString lang;
    attribute boolean continuous;
    attribute boolean interimResults;
    attribute unsigned long maxAlternatives;
    attribute DOMString serviceURI;

    // methods to drive the speech interaction
    void start();
    void stop();
    void abort();
};
With additional event handlers
to control behaviour:
attribute EventHandler onaudiostart; 
attribute EventHandler onsoundstart; 
attribute EventHandler onspeechstart; 
attribute EventHandler onspeechend; 
attribute EventHandler onsoundend; 
attribute EventHandler onaudioend; 
attribute EventHandler onresult; 
attribute EventHandler onnomatch; 
attribute EventHandler onerror; 
attribute EventHandler onstart; 
attribute EventHandler onend;
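To make those attributes and handlers concrete, here is a minimal sketch that sets a few of them on a recognizer. The helper name `configureRecognition`, its callback parameters and the chosen values ("en-US", three alternatives) are assumptions for illustration. The recognizer is passed in so the same code works with whatever object your platform provides.

```javascript
// Hypothetical helper: applies a few of the spec's attributes and handlers
// to a recognizer object supplied by the caller.
function configureRecognition(recognition, onText, onFailure) {
  recognition.lang = "en-US";          // language to recognize
  recognition.continuous = false;      // stop after one utterance
  recognition.interimResults = false;  // only deliver final results
  recognition.maxAlternatives = 3;     // ask for up to three guesses

  recognition.onresult = function(event) {
    if (event.results.length > 0) {
      // Top alternative of the first result.
      onText(event.results[0][0].transcript);
    }
  };
  recognition.onnomatch = function() {
    onFailure("no-match");
  };
  recognition.onerror = function(event) {
    // The spec's error event carries an `error` code string.
    onFailure(event.error);
  };
  return recognition;
}
```

In a page you would pass in a freshly constructed recognizer and then call `start()` on it.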
Let's recognize some speech 
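// Chrome exposes the constructor with a webkit prefix.
var Recognition = window.SpeechRecognition || window.webkitSpeechRecognition;
var recognition = new Recognition();
recognition.onresult = function(event) {
  if (event.results.length > 0) {
    // Show the top alternative of the first result as plain text.
    var test1 = document.getElementById("test1");
    test1.textContent = event.results[0][0].transcript;
  }
};
recognition.start();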
Click to Speak 
Replace me...
So that's pretty 
cool...
...if taking dictation gets you 
going
But I want to do 
something more 
exciting with the 
result
Let's do something a little less 
trivial 
recognition.onresult = function(event) {
  // Normalize the top transcript so " Jazz" still matches "jazz".
  var result = event.results[0][0].transcript.trim().toLowerCase();
  var music = document.getElementById("music");
  switch (result) {
    case "jazz":
      music.src = "jazz.mp3";
      music.play();
      break;
    case "rock":
      music.src = "rock.mp3";
      music.play();
      break;
    case "stop":
    default:
      // Anything unrecognized stops playback.
      music.pause();
  }
};
Click to Speak
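The top guess is not always the right one, so a command handler can also look through the other alternatives for a word it knows. A minimal sketch follows. The name `findCommand` and the `KNOWN_COMMANDS` list are assumptions for illustration, and it only helps if the recognizer was asked for more than one alternative.

```javascript
// Hypothetical command list for the music demo.
var KNOWN_COMMANDS = ["jazz", "rock", "stop"];

// Hypothetical helper: `result` is one recognition result, an array-like
// list of alternatives that each carry a `transcript`.
function findCommand(result) {
  for (var i = 0; i < result.length; i++) {
    var heard = result[i].transcript.trim().toLowerCase();
    if (KNOWN_COMMANDS.indexOf(heard) !== -1) {
      return heard;
    }
  }
  // Nothing recognizable: fall back to the safe command.
  return "stop";
}
```

Inside `onresult` you would call it as `findCommand(event.results[0])` and switch on the returned string.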
Which seems 
much cooler to 
me
Let's ask the web a question 
Click to Speak
Works pretty 
good... 
...but ugly!
Let's style our 
button with some 
CSS
<a class="speechinput">
  <img src="images/mic.png">
</a>

#speechinput input {
  cursor: pointer;
  margin: auto;
  margin: 15px;
  color: transparent;
  background-color: transparent;
  border: 5px;
  width: 15px;
  -webkit-transform: scale(3.0, 3.0);
}
And we'll add some color using 
Speech 
Bubbles 
by Nicholas Gallagher 
Pure-CSS-Speech-Bubbles
Then pull it all 
together!
But wait, why am 
I using my eyes 
like a sucker?
We'll output the answer using 
SpeechSynthesis
The SpeechSynthesis spec 
looks like this: 
interface SpeechSynthesis {
    readonly attribute boolean pending;
    readonly attribute boolean speaking;
    readonly attribute boolean paused;

    void speak(SpeechSynthesisUtterance utterance);
    void cancel();
    void pause();
    void resume();
    SpeechSynthesisVoiceList getVoices();
};
The SpeechSynthesisUtterance 
spec looks like this: 
interface SpeechSynthesisUtterance : EventTarget {
    attribute DOMString text;
    attribute DOMString lang;
    attribute DOMString voiceURI;
    attribute float volume;
    attribute float rate;
    attribute float pitch;
};
With additional event handlers
to control behaviour:
attribute EventHandler onstart; 
attribute EventHandler onend; 
attribute EventHandler onerror; 
attribute EventHandler onpause; 
attribute EventHandler onresume; 
attribute EventHandler onmark; 
attribute EventHandler onboundary;
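Putting the two interfaces together, speaking an answer might look like the sketch below. The helper name `speak` and its parameters are assumptions for illustration. The synthesizer and the utterance constructor are passed in so the snippet does not care where they come from.

```javascript
// Hypothetical helper: builds an utterance and hands it to the synthesizer.
// `synth` is a SpeechSynthesis-like object and `Utterance` is a
// SpeechSynthesisUtterance-like constructor, both supplied by the caller.
function speak(text, synth, Utterance, onDone) {
  var utterance = new Utterance();
  utterance.text = text;
  utterance.lang = "en-US";  // illustrative choice
  utterance.rate = 1.0;      // normal speaking rate
  utterance.onend = onDone;  // fires when speech has finished
  synth.speak(utterance);
  return utterance;
}
```

In a browser that supports the spec you would call it as `speak("hello", window.speechSynthesis, window.SpeechSynthesisUtterance, callback)`.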
Plugin repos
SpeechRecognitionPlugin - 
https://github.com/macdonst/SpeechRecognitionPlugin 
SpeechSynthesisPlugin - 
https://github.com/macdonst/SpeechSynthesisPlugin
Availability
OS             Recognition         Synthesis
Android        ✓                   ✓
iOS*           Active development  Native to iOS 7.0
Windows Phone  ×                   ×
* Working with Julio César (@jcesarmobile) to get iOS done
Getting started 
cordova create speech com.example.speech speech
cd speech
cordova platform add android
cordova plugin add https://github.com/macdonst/SpeechRecognitionPlugin
cordova plugin add https://github.com/macdonst/SpeechSynthesisPlugin
cordova run android
For more information on hybrid
applications
Check out Christophe Coenraets'
presentation on
Creating Native-Like Mobile
Apps with AngularJS, Ionic and
Cordova, 3:00pm today right
here in Salon C.
But wait, one 
more thing...
Speech recognition and speech 
synthesis are not well 
supported in the emulator 
and sometimes developing on 
the device can be a bit of a 
pain.
That's why I coded 
speechshim.js 
https://github.com/macdonst/SpeechShim
Chrome + speechshim.js 
= 
W3C Web Speech API on your 
desktop
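Whichever environment you end up in, it helps to check what is actually there before calling it. The sketch below is plain feature detection and says nothing about how speechshim.js works internally. The helper name `findSpeechRecognition` is an assumption for illustration, while the webkit-prefixed name is the one Chrome uses.

```javascript
// Hypothetical helper: returns the recognizer constructor found on the
// given global object, or null when speech recognition is unavailable.
function findSpeechRecognition(globalObject) {
  return globalObject.SpeechRecognition ||
         globalObject.webkitSpeechRecognition ||
         null;
}
```

In a page you would call `findSpeechRecognition(window)` and hide the microphone button when it returns null.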