Speech Recognition as a User Interface

© 2015 Apio Systems, Inc. Confidential
Jared Sheehan @ Driversiti

Who am I
Glass explorer, speech recognition enthusiast and big Android nerd
Android Lead @Driversiti - driving safety for the mobile generation
Speech Recognition application for the Amazon Fire Phone
Suite of applications - AIM Android, Engadget Android, Distro Android, TechCrunch
Android, AOL HD, AIM Blackberry
Meetup evangelist – “DC Android Meetup Group” – Join today!

Overview
What is voice/speech recognition?
What awesome stuff can you do with it?
How it works…
Demo!
Question and Answer

Hello Computer…

Definition
Technology that allows spoken input into software systems.
You speak to your computer, tablet, phone or device and it uses what you said as input to
trigger some sort of action.

What can you do with SR?
Replace other methods of input such as clicking, swiping, typing or selecting in other ways.
Make devices and software more user-friendly and increase productivity.
Provide accessibility assistance, where speech recognition (SR) is used extensively.

ASR - Dictation
Automatic speech recognition (ASR), also called dictation
Translates speech input into words, sentences and punctuation.
Audio is captured through a microphone and streamed to a recognizer, on the device or on a server.
The result is usually returned as a string with a confidence level.
Very easy integration with Android – two ways to do it.
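
Since a recognizer usually returns several candidate strings with confidence levels, the first thing an app does is pick one. Here is a minimal sketch in plain Java, assuming the candidates arrive as a list of transcripts plus a parallel array of scores. The class and method names are hypothetical and are not part of any Android API.

```java
import java.util.List;

// Hypothetical helper (not an Android class): picks the most confident transcript.
final class BestHypothesis {
    private BestHypothesis() {}

    /**
     * @param transcripts candidate transcripts from the recognizer
     * @param confidences one score per transcript, in the same order (higher is better)
     * @return the transcript with the highest score, or null if there are no candidates
     */
    static String pick(List<String> transcripts, float[] confidences) {
        if (transcripts == null || transcripts.isEmpty()) {
            return null;
        }
        // Assumption: if scores are missing or misaligned, trust the recognizer's ordering.
        if (confidences == null || confidences.length != transcripts.size()) {
            return transcripts.get(0);
        }
        int best = 0;
        for (int i = 1; i < confidences.length; i++) {
            if (confidences[i] > confidences[best]) {
                best = i;
            }
        }
        return transcripts.get(best);
    }
}
```

Falling back to the first candidate when scores are unavailable is a design choice for this sketch; a real app might prefer to ask the user to repeat.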

How does it work?
A user speaks into a recording device of some sort.
Speech recognition begins with digital sampling of the speech, followed by acoustic signal
processing of the audio.
Several techniques, including dynamic time warping (DTW), hidden Markov models (HMMs)
and neural networks (NNs), can be used to achieve the desired results.
Most systems use language-specific knowledge to tune the models.
Next is the actual recognition of phonemes, groups of phonemes and words.
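
Of the techniques named above, dynamic time warping is the easiest to show in a few lines. The sketch below is illustrative only: it assumes each utterance has already been reduced to a one-dimensional sequence of feature values, whereas real systems compare multi-dimensional acoustic feature vectors. The class name is hypothetical.

```java
import java.util.Arrays;

// Illustrative DTW over 1-D sequences; real recognizers use multi-dimensional features.
final class Dtw {
    private Dtw() {}

    /** Returns the minimum cumulative alignment cost between sequences a and b. */
    static double distance(double[] a, double[] b) {
        int n = a.length;
        int m = b.length;
        // cost[i][j] = best cost of aligning the first i samples of a with the first j of b.
        double[][] cost = new double[n + 1][m + 1];
        for (double[] row : cost) {
            Arrays.fill(row, Double.POSITIVE_INFINITY);
        }
        cost[0][0] = 0.0;
        for (int i = 1; i <= n; i++) {
            for (int j = 1; j <= m; j++) {
                double local = Math.abs(a[i - 1] - b[j - 1]);
                // Extend the cheapest of: diagonal match, stretch a, or stretch b.
                double previous = Math.min(cost[i - 1][j - 1],
                        Math.min(cost[i - 1][j], cost[i][j - 1]));
                cost[i][j] = local + previous;
            }
        }
        return cost[n][m];
    }
}
```

Because the alignment may stretch either sequence, the same word spoken slowly and quickly can still score as a close match, which is why DTW suited early isolated-word recognizers.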

Speech Recognition system architecture

Into the weeds
Speaker dependence
Speaker independence
Continuous Speech
How good is your system? Hint: Word Error Rate
Isolated word
Is that all it does??
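
The Word Error Rate hint above can be made concrete. WER is commonly defined as (substitutions + deletions + insertions) divided by the number of words in the reference transcript, which is a word-level edit distance. A minimal sketch, assuming whitespace tokenization and case-insensitive comparison (both are simplifications, and the class name is hypothetical):

```java
import java.util.Locale;

// Word Error Rate as word-level edit distance divided by reference length.
final class WordErrorRate {
    private WordErrorRate() {}

    static double compute(String reference, String hypothesis) {
        String[] ref = tokenize(reference);
        String[] hyp = tokenize(hypothesis);
        if (ref.length == 0) {
            throw new IllegalArgumentException("reference must contain at least one word");
        }
        // dist[i][j] = edits needed to turn the first i reference words into the first j hypothesis words.
        int[][] dist = new int[ref.length + 1][hyp.length + 1];
        for (int i = 0; i <= ref.length; i++) {
            dist[i][0] = i; // all deletions
        }
        for (int j = 0; j <= hyp.length; j++) {
            dist[0][j] = j; // all insertions
        }
        for (int i = 1; i <= ref.length; i++) {
            for (int j = 1; j <= hyp.length; j++) {
                int substitution = dist[i - 1][j - 1] + (ref[i - 1].equals(hyp[j - 1]) ? 0 : 1);
                int deletion = dist[i - 1][j] + 1;
                int insertion = dist[i][j - 1] + 1;
                dist[i][j] = Math.min(substitution, Math.min(deletion, insertion));
            }
        }
        return (double) dist[ref.length][hyp.length] / ref.length;
    }

    // Simplifying assumption: lowercase and split on whitespace, no punctuation handling.
    private static String[] tokenize(String text) {
        String trimmed = text.trim().toLowerCase(Locale.ROOT);
        return trimmed.isEmpty() ? new String[0] : trimmed.split("\\s+");
    }
}
```

For example, with the reference "send a text to mom" and the hypothesis "send text to tom", there is one deletion and one substitution against five reference words, so the WER is 0.4. Note that WER can exceed 1.0 when the hypothesis contains many inserted words.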

Dictation is cool, but not that cool
Next step is understanding what the user wants to do
Then act on it
Generally, the ASR results are passed into an intent recognition system along with additional
contextual information.
Contextual information can include where the utterance is coming from (mobile phone,
computer), what app the user is in, their location, etc.
That information is used to determine the user’s intent and execute the request.

Intent recognition
Recognizing speech is only part of the process. How does Google Now know that I want to
send an SMS message to a friend? How does Siri know when I want to know how tall
Kobe Bryant is?
ASR is only the first step in true speech as a user interface. To successfully help users
perform useful actions we must understand their intent. How do we do this?
Three systems: ASR, intent recognition (IR) and a dialog engine.
The dialog engine takes the output from the IR system and sends responses and
actionable information to the caller.
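
To make the hand-off between the three systems concrete, here is a toy sketch of the last two stages. It assumes the ASR step has already produced a text string, and it uses simple keyword rules where production systems use trained language-understanding models. Every name here (VoicePipelineSketch, UserIntent, the intent labels, the foregroundApp context value) is hypothetical.

```java
import java.util.Locale;

// Toy intent recognition plus dialog engine; keyword rules stand in for real models.
final class VoicePipelineSketch {
    private VoicePipelineSketch() {}

    /** Output of intent recognition: an intent label and one free-text slot. */
    static final class UserIntent {
        final String name;
        final String slot;

        UserIntent(String name, String slot) {
            this.name = name;
            this.slot = slot;
        }
    }

    /** Intent recognition: maps ASR text plus context to an intent. */
    static UserIntent recognizeIntent(String asrText, String foregroundApp) {
        String text = asrText.trim().toLowerCase(Locale.ROOT);
        if (text.startsWith("text ")) {
            return new UserIntent("SEND_SMS", text.substring("text ".length()));
        }
        if (text.startsWith("how tall is ")) {
            return new UserIntent("LOOKUP_HEIGHT", text.substring("how tall is ".length()));
        }
        // Context decides an ambiguous utterance: "play" only resumes music inside a music app.
        if (text.equals("play") && "music".equals(foregroundApp)) {
            return new UserIntent("RESUME_MUSIC", "");
        }
        return new UserIntent("UNKNOWN", text);
    }

    /** Dialog engine: turns the recognized intent into a response for the caller. */
    static String respond(UserIntent intent) {
        switch (intent.name) {
            case "SEND_SMS":
                return "What do you want to say to " + intent.slot + "?";
            case "LOOKUP_HEIGHT":
                return "Looking up the height of " + intent.slot + ".";
            case "RESUME_MUSIC":
                return "Resuming playback.";
            default:
                return "Sorry, I did not understand \"" + intent.slot + "\".";
        }
    }
}
```

Calling respond(recognizeIntent("text mom", "home")) yields a follow-up question, which is the kind of multi-turn behavior a dialog engine exists to manage.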

Android Speech APIs
http://developer.android.com/reference/android/speech/package-summary.html
Relatively easy implementation
<uses-permission android:name="android.permission.RECORD_AUDIO" />
Two APIs: one that supplies a UI and one with no UI
InputMethodServices (keyboards) use the no-UI version

Recognizer Intent
UI is supplied for you
Fire the intent and get a result
Again very easy to use

SpeechRecognizer
UI is not supplied for you
Results are delivered through RecognitionListener callbacks, so the app can stream them into an EditText
Still “fairly” easy to use

Google Now – On to intent recognition systems…

Google Now – On tap

Apple – Siri

Amazon – Fire Phone, Fire TV and Echo

Microsoft – Cortana

Speech providers – Google, Nuance, IBM Watson

Google Voice Interaction API

Nuance Speech SDK
Dragon Mobile SDK – free up to 20k transactions per month
Upload custom vocabularies
Developer: uploads a new song and music vocabulary
Utterance: “Eminem” gets a higher probability than “M&M”
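
The effect of a custom vocabulary can be imitated by re-scoring candidates on the client. The sketch below is not how the Nuance SDK works internally, and it uses no Nuance API. It only illustrates the idea, assuming the recognizer hands back candidate strings with scores and that a fixed boost for in-vocabulary terms is acceptable. All names and the boost value are hypothetical.

```java
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Illustration only: boosts candidates that appear in a domain vocabulary.
final class VocabularyRescorer {
    private VocabularyRescorer() {}

    /**
     * @param hypotheses candidate transcript mapped to its recognizer score
     * @param vocabulary lowercase domain terms, for example artist names
     * @param boost      amount added to the score of any in-vocabulary candidate
     * @return the candidate with the highest adjusted score, or null if there are none
     */
    static String best(Map<String, Double> hypotheses, Set<String> vocabulary, double boost) {
        String best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (Map.Entry<String, Double> entry : hypotheses.entrySet()) {
            double score = entry.getValue();
            if (vocabulary.contains(entry.getKey().toLowerCase(Locale.ROOT))) {
                score += boost;
            }
            if (score > bestScore) {
                bestScore = score;
                best = entry.getKey();
            }
        }
        return best;
    }
}
```

With candidates "M&M" at 0.55 and "Eminem" at 0.45, a music vocabulary containing "eminem" and a boost of 0.2 flips the winner to "Eminem"; with an empty vocabulary "M&M" still wins.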

User Interface examples - Google Glass

User Interface examples - Google Glass continued…

Show me code!

jared.sheehan@driversiti.com
http://www.meetup.com/DCAndroid/
Tweet: @jayroo5245
THANK YOU
