© 2015 Apio Systems, Inc. Confidential 1
Jared Sheehan @ Driversiti
Speech Recognition as a User Interface

Who am I
Glass explorer, speech recognition enthusiast and big Android nerd
Android Lead @Driversiti - driving safety for the mobile generation
Speech Recognition application for the Amazon Fire Phone
Suite of applications - AIM Android, Engadget Android, Distro Android, TechCrunch Android, AOL HD, AIM BlackBerry
Meetup evangelist – “DC Android Meetup Group” – Join today!

Overview
What is voice/speech recognition?
What awesome stuff can you do with it?
How it works…
Demo!
Questions and answers

Hello Computer…

Definition

What can you do with SR?
Technology that allows spoken input into software systems.
You speak to your computer, tablet, phone or device and it uses what you said as input to trigger some sort of action.
It can replace other methods of input such as clicking, swiping, typing or selecting in other ways.
It is a means to make devices and software more user-friendly and to increase productivity.
It is used extensively as a form of accessibility assistance.

ASR - Dictation
Automatic speech recognition (ASR), also called dictation
Translates speech input into words, sentences and punctuation.
Audio is input through a microphone and streamed to a recognition engine, either on the device or on a remote server.
The result is usually returned as a string with a confidence level.
Very easy integration with Android: there are two ways to do it.
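
A minimal sketch of how a caller might consume such a result, assuming the recognizer hands back several candidate strings, each with a confidence between 0 and 1. The `Hypothesis` and `BestHypothesis` classes are hypothetical names for illustration and are not part of any speech API.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical container for one recognition candidate; not an Android or vendor type.
final class Hypothesis {
    final String text;
    final double confidence; // assumed to lie in [0, 1]

    Hypothesis(String text, double confidence) {
        this.text = text;
        this.confidence = confidence;
    }
}

final class BestHypothesis {
    // Returns the candidate with the highest confidence, or null for an empty list.
    static Hypothesis best(List<Hypothesis> candidates) {
        Hypothesis top = null;
        for (Hypothesis h : candidates) {
            if (top == null || h.confidence > top.confidence) {
                top = h;
            }
        }
        return top;
    }

    public static void main(String[] args) {
        List<Hypothesis> results = Arrays.asList(
                new Hypothesis("call mum", 0.62),
                new Hypothesis("call mom", 0.91),
                new Hypothesis("cold mom", 0.17));
        System.out.println(best(results).text); // prints "call mom"
    }
}
```

An application would usually also compare the winning confidence against a threshold and ask the user to repeat when it is too low.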

How does it work?
A user speaks into a recording device of some sort.
Speech recognition begins with the digital sampling of speech, followed by acoustic signal processing of the audio.
Several techniques, including dynamic time warping (DTW), hidden Markov models (HMMs) and neural networks (NNs), can be used to match the processed audio against known speech patterns.
Most systems use language-specific knowledge to tune the models.
Next is the actual recognition of phonemes, groups of phonemes and words.
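
To make one of these techniques concrete, here is a minimal sketch of dynamic time warping over two one-dimensional feature sequences. Real recognizers compare multi-dimensional acoustic feature vectors; the single-number features, the absolute-difference cost and the class name `Dtw` are simplifying assumptions.

```java
import java.util.Arrays;

final class Dtw {
    // Classic O(n*m) dynamic time warping distance with |a - b| as the local cost.
    static double distance(double[] a, double[] b) {
        int n = a.length;
        int m = b.length;
        double[][] cost = new double[n + 1][m + 1];
        for (double[] row : cost) {
            Arrays.fill(row, Double.POSITIVE_INFINITY);
        }
        cost[0][0] = 0.0;
        for (int i = 1; i <= n; i++) {
            for (int j = 1; j <= m; j++) {
                double local = Math.abs(a[i - 1] - b[j - 1]);
                // Extend the cheapest of: match, stretch a, stretch b.
                double bestPrevious = Math.min(cost[i - 1][j - 1],
                        Math.min(cost[i - 1][j], cost[i][j - 1]));
                cost[i][j] = local + bestPrevious;
            }
        }
        return cost[n][m];
    }

    public static void main(String[] args) {
        double[] template = {1, 2, 3, 2, 1};
        double[] spokenSlowly = {1, 1, 2, 2, 3, 3, 2, 2, 1, 1};
        double[] different = {3, 3, 3, 3, 3};
        System.out.println(distance(template, spokenSlowly)); // prints 0.0
        System.out.println(distance(template, different));    // prints 6.0
    }
}
```

The slowly spoken sequence has the same shape as the template, so warping the time axis aligns them at zero cost. This tolerance to speaking rate is why DTW was used for isolated-word matching.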

Speech Recognition system architecture

Into the weeds
Speaker dependence
Speaker independence
Isolated word
Continuous speech
How good is your system? Hint: Word Error Rate
Is that all it does?
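
Word Error Rate (WER) is commonly computed as (substitutions + deletions + insertions) divided by the number of words in the reference transcript. The sketch below assumes whitespace tokenization and case-insensitive comparison; the class and method names are illustrative only.

```java
import java.util.Locale;

final class WordErrorRate {
    // WER = word-level edit distance between reference and hypothesis / reference length.
    static double compute(String reference, String hypothesis) {
        String[] ref = tokens(reference);
        String[] hyp = tokens(hypothesis);
        if (ref.length == 0) {
            throw new IllegalArgumentException("reference must contain at least one word");
        }
        int[][] d = new int[ref.length + 1][hyp.length + 1];
        for (int i = 0; i <= ref.length; i++) d[i][0] = i; // all deletions
        for (int j = 0; j <= hyp.length; j++) d[0][j] = j; // all insertions
        for (int i = 1; i <= ref.length; i++) {
            for (int j = 1; j <= hyp.length; j++) {
                int substitution = d[i - 1][j - 1] + (ref[i - 1].equals(hyp[j - 1]) ? 0 : 1);
                int deletion = d[i - 1][j] + 1;
                int insertion = d[i][j - 1] + 1;
                d[i][j] = Math.min(substitution, Math.min(deletion, insertion));
            }
        }
        return (double) d[ref.length][hyp.length] / ref.length;
    }

    private static String[] tokens(String s) {
        String t = s.trim().toLowerCase(Locale.ROOT);
        return t.isEmpty() ? new String[0] : t.split("\\s+");
    }

    public static void main(String[] args) {
        // One substituted word out of four reference words.
        System.out.println(compute("turn on the lights", "turn on the light")); // prints 0.25
    }
}
```

Because insertions count as errors, WER can exceed 1.0 when the hypothesis is much longer than the reference.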

Dictation is cool, but not that cool
Next step is understanding what the user wants to do
Then act on it
Generally, the ASR results are passed into an intent recognition system along with additional contextual information.
Contextual information can include where the utterance is coming from (mobile phone, computer), what app the user is in, their location, etc.
That information is used to determine the user’s intent and execute the request.
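
As a rough illustration of how context can change the outcome, the sketch below maps the same utterance to different intents depending on the app in use. The keyword rules stand in for the statistical models a production system would use, and every name here (`IntentSketch`, `UserIntent`, `RequestContext`) is hypothetical.

```java
import java.util.Locale;

final class IntentSketch {
    enum UserIntent { SEND_SMS, STOP_MUSIC, STOP_NAVIGATION, UNKNOWN }

    // Hypothetical context passed along with the ASR text.
    static final class RequestContext {
        final String device;        // e.g. "phone" or "computer"
        final String foregroundApp; // e.g. "music" or "maps"

        RequestContext(String device, String foregroundApp) {
            this.device = device;
            this.foregroundApp = foregroundApp;
        }
    }

    // Simple keyword rules; only foregroundApp is consulted in this sketch.
    static UserIntent recognize(String utterance, RequestContext context) {
        String text = utterance.trim().toLowerCase(Locale.ROOT);
        if (text.startsWith("text ") || text.startsWith("send a message")) {
            return UserIntent.SEND_SMS;
        }
        if (text.equals("stop")) {
            // The same word maps to different intents depending on the app in use.
            if ("music".equals(context.foregroundApp)) return UserIntent.STOP_MUSIC;
            if ("maps".equals(context.foregroundApp)) return UserIntent.STOP_NAVIGATION;
        }
        return UserIntent.UNKNOWN;
    }

    public static void main(String[] args) {
        RequestContext inMusic = new RequestContext("phone", "music");
        RequestContext inMaps = new RequestContext("phone", "maps");
        System.out.println(recognize("Stop", inMusic));               // prints STOP_MUSIC
        System.out.println(recognize("Stop", inMaps));                // prints STOP_NAVIGATION
        System.out.println(recognize("Text Alex I'm late", inMaps));  // prints SEND_SMS
    }
}
```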

Intent recognition
Recognizing speech is only part of the process. How does Google Now know that I want to send an SMS message to a friend? How does Siri know when I want to know how tall Kobe Bryant is?
ASR is only the first step in true speech as a user interface. To successfully help users perform useful actions, we must understand their intent. How do we do this?
Three systems: ASR, intent recognition (IR) and a dialog engine
The dialog engine takes the output from the IR system and sends responses and actionable information to the caller.
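
The dialog engine's role can be sketched as a function from the IR output to a response plus an optional action. This is a toy under stated assumptions: the intent name `SEND_SMS`, the slot names `recipient` and `message`, and the action string format are all invented for illustration.

```java
import java.util.HashMap;
import java.util.Map;

final class DialogSketch {
    // Hypothetical output of the intent recognition (IR) stage.
    static final class RecognizedIntent {
        final String name;
        final Map<String, String> slots;

        RecognizedIntent(String name, Map<String, String> slots) {
            this.name = name;
            this.slots = slots;
        }
    }

    // What the dialog engine hands back to the caller.
    static final class DialogResult {
        final String response; // text to speak or display
        final String action;   // null when more information is needed

        DialogResult(String response, String action) {
            this.response = response;
            this.action = action;
        }
    }

    // Asks a follow-up question for each missing slot, otherwise returns an action.
    static DialogResult respond(RecognizedIntent intent) {
        if ("SEND_SMS".equals(intent.name)) {
            String recipient = intent.slots.get("recipient");
            String message = intent.slots.get("message");
            if (recipient == null) {
                return new DialogResult("Who do you want to text?", null);
            }
            if (message == null) {
                return new DialogResult("What do you want to say to " + recipient + "?", null);
            }
            return new DialogResult("Sending your message to " + recipient + ".",
                    "SEND_SMS|" + recipient + "|" + message);
        }
        return new DialogResult("Sorry, I can't help with that yet.", null);
    }

    public static void main(String[] args) {
        Map<String, String> slots = new HashMap<>();
        slots.put("recipient", "Alex");
        System.out.println(respond(new RecognizedIntent("SEND_SMS", slots)).response);
        // prints "What do you want to say to Alex?"
        slots.put("message", "Running late");
        System.out.println(respond(new RecognizedIntent("SEND_SMS", slots)).action);
        // prints "SEND_SMS|Alex|Running late"
    }
}
```

Returning a question when a slot is missing is what turns a one-shot command into a dialog: the caller speaks the response, listens again, and feeds the next utterance back through ASR and IR.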

Android Speech APIs
http://developer.android.com/reference/android/speech/package-summary.html
Relatively easy implementation
<uses-permission android:name="android.permission.RECORD_AUDIO" />
Two APIs: one with a UI and one without
InputMethodServices (keyboards) use the no-UI version

RecognizerIntent
UI is supplied for you
Fire the intent and get a result
Again, very easy to use

SpeechRecognizer
UI is not supplied for you
Results arrive through a RecognitionListener callback and can be streamed directly to an EditText
Still “fairly” easy to use

Google Now – On to intent recognition systems…

Google Now on Tap

Apple – Siri

Amazon – Fire Phone, Fire TV and Echo

Microsoft – Cortana

Speech providers – Google, Nuance, IBM Watson

Google Voice Interaction API

Nuance Speech SDK
Dragon Mobile SDK – free up to 20k transactions per month
Upload custom vocabularies
Developer: uploads a new song and music vocabulary
Utterance: “Eminem” gets a higher probability than “M&M”
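
The effect of a custom vocabulary can be illustrated with a toy rescoring step. This is not the Nuance SDK API: the class, the method and the boost factor are hypothetical, and a real recognizer would bias its language model rather than multiply final scores.

```java
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

final class VocabularyBoost {
    // Multiplies the score of any hypothesis found in the custom vocabulary,
    // then returns the text of the highest-scoring hypothesis (null if none).
    static String best(Map<String, Double> scores, Set<String> vocabulary, double boost) {
        String bestText = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (Map.Entry<String, Double> entry : scores.entrySet()) {
            double score = entry.getValue();
            if (vocabulary.contains(entry.getKey().toLowerCase(Locale.ROOT))) {
                score *= boost;
            }
            if (score > bestScore) {
                bestScore = score;
                bestText = entry.getKey();
            }
        }
        return bestText;
    }

    public static void main(String[] args) {
        Map<String, Double> scores = new LinkedHashMap<>();
        scores.put("M&M", 0.55);
        scores.put("Eminem", 0.45);

        Set<String> musicVocabulary = new HashSet<>();
        musicVocabulary.add("eminem"); // vocabulary entries are stored in lower case

        System.out.println(best(scores, new HashSet<String>(), 2.0)); // prints "M&M"
        System.out.println(best(scores, musicVocabulary, 2.0));       // prints "Eminem"
    }
}
```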

User Interface examples - Google Glass

User Interface examples - Google Glass continued…

User Interface examples - Google Glass continued…

Enough talk!

Show me code!

jared.sheehan@driversiti.com
http://www.meetup.com/DCAndroid/
Tweet: @jayroo5245
THANK YOU


Editor's Notes

  1. What else is there?