Speech Input API For Android
Alex Gruenstein
Outline
•  Android built-in speech features
•  Speech recognition primer
•  How to: integrate speech input directly in your Android application
Voice Search
•  Speak any Google search query
•  Supported on Android, iPhone/iPod/iPad, BlackBerry, Nokia S60
•  15 languages:
• English (US, UK, Indian, Australian), Japanese, Mandarin, Korean, Taiwanese, French, Italian, German, Spanish, Russian, Polish, Czech
•  Video
Voice Actions
•  Beyond search
•  Send text to Clare Homberlyn: “Hey, are you coming home?”
•  Send e-mail: “I’m running late.”
•  Navigate to the Museum of Modern Art
•  Listen to The Beatles
•  Go to Wikipedia
•  Video
Android Voice Input
•  Speak anywhere you would normally type.
•  Status updates, Twitter, SMS, e-mail, etc.
•  Video
Speech Recognition
Google’s Speech Recognizer
The Google speech server holds one set of models per language:
•  US English: Acoustic Model, Dictionary, Search Language Model, Dictation Language Model
•  Japanese: Acoustic Model, Dictionary, Search Language Model, Dictation Language Model
•  …
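The diagram amounts to a lookup: each language has its own acoustic model and dictionary, plus two language models, one for search and one for dictation. A minimal Java sketch of that selection, assuming hypothetical classes `ModelSet` and `SpeechServer` with plain strings standing in for the models; it illustrates the layout in the diagram, not how Google's server is actually built.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the four models the diagram lists for one language.
final class ModelSet {
    final String acousticModel;
    final String dictionary;
    final String searchLanguageModel;
    final String dictationLanguageModel;

    ModelSet(String acousticModel, String dictionary,
             String searchLanguageModel, String dictationLanguageModel) {
        this.acousticModel = acousticModel;
        this.dictionary = dictionary;
        this.searchLanguageModel = searchLanguageModel;
        this.dictationLanguageModel = dictationLanguageModel;
    }
}

// Hypothetical registry: one ModelSet per language tag.
final class SpeechServer {
    enum Task { SEARCH, DICTATION }

    private final Map<String, ModelSet> modelsByLanguage = new HashMap<>();

    void register(String languageTag, ModelSet models) {
        modelsByLanguage.put(languageTag, models);
    }

    // Returns the search or dictation language model for a language;
    // the acoustic model and dictionary are the same for both tasks.
    String languageModelFor(String languageTag, Task task) {
        ModelSet models = modelsByLanguage.get(languageTag);
        if (models == null) {
            throw new IllegalArgumentException("Unsupported language: " + languageTag);
        }
        return task == Task.SEARCH ? models.searchLanguageModel : models.dictationLanguageModel;
    }
}
```

Keeping the acoustic model and dictionary shared per language while swapping only the language model reflects the diagram, where the two language models sit side by side under the same acoustic model and dictionary.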
Layered Stochastic Models
Acoustic Model: audio -> phonetic units (audio frames t0, t1, …)
•  P(t1 -> “eh”) = .7
•  P(t1 -> “iy”) = .3
Dictionary: words -> phonetic units
•  P(read -> r eh d) = .6
•  P(read -> r iy d) = .4
Language Model: probability of word sequences
•  P(“read a book”) > P(“read a flower”)
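To see how the layers combine, here is a toy Java calculation using only the slide's numbers for the word “read”. It assumes, purely for illustration, that frame t1 covers the vowel (the middle phone) of each pronunciation, and it multiplies the dictionary probability by the acoustic probability of that vowel. The class `LayeredScoring` is hypothetical, and this is a simplified sketch of the idea, not Google's decoder.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Toy illustration of chaining the acoustic model and dictionary layers for "read".
final class LayeredScoring {
    // Acoustic model (slide numbers): P(audio frame t1 -> phonetic unit).
    static final Map<String, Double> ACOUSTIC_T1 = new LinkedHashMap<>();
    static {
        ACOUSTIC_T1.put("eh", 0.7);
        ACOUSTIC_T1.put("iy", 0.3);
    }

    // Dictionary (slide numbers): P(read -> pronunciation).
    static Map<String, Double> pronunciationsOfRead() {
        Map<String, Double> prons = new LinkedHashMap<>();
        prons.put("r eh d", 0.6);
        prons.put("r iy d", 0.4);
        return prons;
    }

    // Multiplies the dictionary probability by the acoustic probability of the middle phone,
    // assuming (for this toy only) that frame t1 is that vowel.
    static double score(String pronunciation, double pronunciationProb) {
        String vowel = pronunciation.split(" ")[1];
        return pronunciationProb * ACOUSTIC_T1.getOrDefault(vowel, 0.0);
    }

    // Adds up the scores of all pronunciations of the word.
    static double wordScore(Map<String, Double> pronunciations) {
        double total = 0.0;
        for (Map.Entry<String, Double> e : pronunciations.entrySet()) {
            total += score(e.getKey(), e.getValue());
        }
        return total;
    }

    public static void main(String[] args) {
        Map<String, Double> prons = pronunciationsOfRead();
        for (Map.Entry<String, Double> e : prons.entrySet()) {
            System.out.println(String.format(Locale.ROOT, "%s: %.2f",
                    e.getKey(), score(e.getKey(), e.getValue())));
        }
        System.out.println(String.format(Locale.ROOT, "read (all pronunciations): %.2f",
                wordScore(prons)));
    }
}
```

Running `main` gives 0.42 for “r eh d” and 0.12 for “r iy d” (0.54 in total), so the acoustic evidence favors the “r eh d” reading. A real recognizer scores every frame, typically works in log space, and also multiplies in the language model probability, which this toy leaves out.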
Language Model: Estimated with Data
•  The language model is estimated using logs of billions of Google searches.
•  Counts of short sequences of words are used to estimate the probability of any sentence:
•  “san francisco golden gate bridge” ->
•  “san francisco golden”
•  “francisco golden gate”
•  “golden gate bridge”
•  Counting and probability smoothing require many hours on thousands of computers!
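The counting step can be sketched in Java: split each query into overlapping three-word sequences (trigrams), count them, and estimate P(w3 | w1 w2) as the trigram's count divided by the count of its two-word context. This is a minimal, unsmoothed maximum-likelihood sketch built around a hypothetical `TrigramCounts` class; the smoothing the slide mentions, which keeps unseen sequences from getting probability zero, is deliberately left out.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class TrigramCounts {
    private final Map<String, Integer> trigramCounts = new HashMap<>();
    private final Map<String, Integer> contextCounts = new HashMap<>();

    // Splits a query into overlapping three-word sequences.
    static List<String> trigrams(String sentence) {
        String[] words = sentence.trim().split("\\s+");
        List<String> out = new ArrayList<>();
        for (int i = 0; i + 2 < words.length; i++) {
            out.add(words[i] + " " + words[i + 1] + " " + words[i + 2]);
        }
        return out;
    }

    // Counts each trigram and its two-word context (w1 w2 followed by any word).
    void add(String query) {
        for (String t : trigrams(query)) {
            trigramCounts.merge(t, 1, Integer::sum);
            String context = t.substring(0, t.lastIndexOf(' '));
            contextCounts.merge(context, 1, Integer::sum);
        }
    }

    // Unsmoothed estimate P(w3 | w1 w2) = count(w1 w2 w3) / count(w1 w2 *).
    double probability(String w1, String w2, String w3) {
        int context = contextCounts.getOrDefault(w1 + " " + w2, 0);
        if (context == 0) {
            return 0.0;
        }
        return trigramCounts.getOrDefault(w1 + " " + w2 + " " + w3, 0) / (double) context;
    }
}
```

For example, `trigrams("san francisco golden gate bridge")` yields the three sequences listed on the slide, and after adding “golden gate bridge”, “golden gate park”, and “golden gate bridge tolls”, `probability("golden", "gate", "bridge")` returns 2/3.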
How to: Integrate speech input directly in your Android application
Android Speech Input API
•  Android’s open platform makes it simple to access Google’s speech recognizer programmatically from your application.
•  (Or any other recognizer that registers to handle RecognizerIntent.)
•  The API makes it simple to:
•  Prompt the user to start speaking,
•  Stream the audio to Google’s servers,
•  Retrieve the recognition hypotheses.
Example code
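import android.app.Activity;
import android.content.Intent;
import android.speech.RecognizerIntent;
import android.view.View;
import java.util.ArrayList;

public class SpeechActivity extends Activity {
    private static final int SPEECH_REQUEST_CODE = 0;

    // Called when someone clicks a button in your app
    public void onClick(View button) {
        // Create a recognition request
        Intent intent = new Intent(RecognizerIntent.ACTION_RECOGNIZE_SPEECH);
        // Set the language model
        intent.putExtra(RecognizerIntent.EXTRA_LANGUAGE_MODEL,
                RecognizerIntent.LANGUAGE_MODEL_FREE_FORM);
        // Send the request to display a prompt, record audio, and return a result
        startActivityForResult(intent, SPEECH_REQUEST_CODE);
    }

    // Called when speech recognition is finished
    @Override
    protected void onActivityResult(int requestCode, int resultCode, Intent intent) {
        super.onActivityResult(requestCode, resultCode, intent);
        // Ignore other requests, and results the user cancelled or that failed
        if (requestCode != SPEECH_REQUEST_CODE || resultCode != RESULT_OK || intent == null) {
            return;
        }
        // Get the n-best list, best hypothesis first
        ArrayList<String> nbest =
                intent.getStringArrayListExtra(RecognizerIntent.EXTRA_RESULTS);
        if (nbest != null && !nbest.isEmpty()) {
            // Do something with the best result, e.g. "golden gate bridge"
            doSomething(nbest.get(0));
        }
    }

    // Placeholder for your app's own handling of the recognized text
    private void doSomething(String text) {
    }
}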
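Because the result is an n-best list with the best hypothesis first, an app is not limited to the top entry. A minimal pure-Java sketch (no Android SDK needed) of two hypothetical helpers for the `nbest` list: one returns the top hypothesis safely, the other returns the highest-ranked hypothesis that matches a command the app supports. `NBestPicker` and its methods are illustrative names, not part of the Android API.

```java
import java.util.List;
import java.util.Locale;
import java.util.Set;

final class NBestPicker {
    // Returns the top-ranked hypothesis, or null when the list is missing or empty.
    static String best(List<String> nbest) {
        return (nbest == null || nbest.isEmpty()) ? null : nbest.get(0);
    }

    // Walks the list in rank order and returns the first hypothesis whose lower-cased form
    // is one of the (lower-case) commands the app supports, or null if none matches.
    static String firstMatching(List<String> nbest, Set<String> commands) {
        if (nbest == null) {
            return null;
        }
        for (String hypothesis : nbest) {
            if (commands.contains(hypothesis.toLowerCase(Locale.ROOT))) {
                return hypothesis;
            }
        }
        return null;
    }
}
```

With an n-best list of “play the beetles”, “play the beatles” and a command set containing only “play the beatles”, `best` returns the first entry while `firstMatching` returns the second, which is the one the app can act on.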
Parameters
•  Language (EXTRA_LANGUAGE), as an IETF BCP 47 language tag, e.g.
• ja-JP (Japanese)
• en-US (US English)
•  If not set, the phone’s default language is used.
•  Language model hints (EXTRA_LANGUAGE_MODEL)
•  Web search (LANGUAGE_MODEL_WEB_SEARCH): good for short queries, business names, and cities; the types of things people search for on Google.
•  Free form (LANGUAGE_MODEL_FREE_FORM): for dictation, such as sending e-mail or SMS.
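Both parameters are plain strings on the request intent. A small pure-Java sketch, assuming a hypothetical `RecognizerParams` helper: it formats a `java.util.Locale` as a BCP 47 tag and picks a language-model hint. The two string values are those of the Android constants `RecognizerIntent.LANGUAGE_MODEL_WEB_SEARCH` and `RecognizerIntent.LANGUAGE_MODEL_FREE_FORM`, copied only so the sketch runs without the Android SDK; in an app, use the constants themselves.

```java
import java.util.Locale;

final class RecognizerParams {
    // Same values as RecognizerIntent.LANGUAGE_MODEL_WEB_SEARCH / LANGUAGE_MODEL_FREE_FORM,
    // duplicated here so this sketch compiles without the Android SDK.
    static final String LANGUAGE_MODEL_WEB_SEARCH = "web_search";
    static final String LANGUAGE_MODEL_FREE_FORM = "free_form";

    // Formats a locale as a BCP 47 tag for EXTRA_LANGUAGE, e.g. Locale.JAPAN -> "ja-JP".
    static String languageTag(Locale locale) {
        return locale.toLanguageTag();
    }

    // Hypothetical policy: search-style input uses the search model, dictation uses free form.
    static String languageModelFor(boolean isSearchQuery) {
        return isSearchQuery ? LANGUAGE_MODEL_WEB_SEARCH : LANGUAGE_MODEL_FREE_FORM;
    }

    public static void main(String[] args) {
        System.out.println(languageTag(Locale.JAPAN));
        System.out.println(languageTag(Locale.US));
        System.out.println(languageModelFor(false));
    }
}
```

On Android the tag would then be attached with, for example, `intent.putExtra(RecognizerIntent.EXTRA_LANGUAGE, RecognizerParams.languageTag(Locale.JAPAN))`; leaving `EXTRA_LANGUAGE` unset falls back to the device's default language.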
Google Speech Technology
•  More than just mobile phones…
•  Automatic subtitles for YouTube videos
•  Voicemail transcription for Google Voice
•  1-800-GOOG-411: free telephone directory assistance
What’s next?
•  Video
•  http://www.google.co.jp/intl/ja/landing/animaru/

Google Developer Day 2010 Japan: Speech Input API for Android (Alex Gruenstein, Yusuke Konishi)
