SlideShare a Scribd company logo
1 of 28
Watch the video with slide 
synchronization on InfoQ.com! 
http://www.infoq.com/presentations 
/speech-ui-mobile 
InfoQ.com: News & Community Site 
• 750,000 unique visitors/month 
• Published in 4 languages (English, Chinese, Japanese and Brazilian 
Portuguese) 
• Post content from our QCon conferences 
• News 15-20 / week 
• Articles 3-4 / week 
• Presentations (videos) 12-15 / week 
• Interviews 2-3 / week 
• Books 1 / month
Presented at QCon London 
www.qconlondon.com 
Purpose of QCon 
- to empower software development by facilitating the spread of 
knowledge and innovation 
Strategy 
- practitioner-driven conference designed for YOU: influencers of 
change and innovation in your teams 
- speakers and topics driving the evolution and innovation 
- connecting and catalyzing the influencers and innovators 
Highlights 
- attended by more than 12,000 delegates since 2007 
- held in 9 cities worldwide
An Unseen Interface 
Creating Speech-driven UI For Your App That Makes Users Happy 
by Halle Winkler, @politepix http://www.politepix.com 
:D
What is a speech-driven UI?
A speech-driven UI uses either 
speech recognition as an input 
method, speech synthesis as 
an information source for the 
user, or both together. 
...but it can also be multi-modal.
How does speech 
recognition work? 
The elements of speech recognition are: 
1. An acoustic model 
2. A lexicon 
3. A language model (probability) or grammar (ruleset for states) 
4. A decoder
What kind of apps benefit 
from speech UIs? 
Large Vocabulary Tasks: server, built-in vocabulary 
(UITextView, Android.speech, Nuance, AT&T, iSpeech) 
Tasks in which free-form dictation is useful 
Tasks which relate specifically to language 
Command and control tasks: offline, you generally define 
vocabulary (OpenEars or other CMU Sphinx or Julius 
implementations, some Android.speech devices and OSes) 
Interfaces where the user is looking somewhere else 
Interfaces where speech provides a new input or output 
Interfaces that are more fun with speech 
Interfaces where it’s easier to speak than type 
Interfaces where it’s easier to listen than read 
Interfaces where a heavy obstacle is removed
Why offline? 
The interface is always available to your user 
Speed is as fast or faster as a network API – and it's quantifiable! 
Interface design and implementation is simpler and more 
predictable without an asynchronous network dependency 
The user is not giving away any of their data
How is a speech UI 
different from a 
visual UI? 
What are the dimensions on which a 
visual UI is rendered? 
What are the dimensions on which a speech UI is rendered? 
A speech UI is rendered on the dimension of time. 
People value their time exquisitely.
Do people understand each other 
perfectly all the time? 
Why not? 
Accents 
Lack of shared vocabulary/Dialect 
Noise 
Distractions 
Interruptions 
Hearing difficulties 
Distance 
Language errors 
! 
Human speech interactions have frequent comprehension faults 
Emotional intelligence makes us incredibly fault-tolerant
Automated speech 
recognition is subject to all 
the same issues as human 
speech recognition, but 
without the emotional 
intelligence
We have to stack 
the deck in our 
(users’) favor.
Short is good. 
Don't bite off more than you can chew – small (read "fast") steps 
forward means small (read "fast") steps backwards 
! 
Use keyword detection to launch events 
! 
Switch between small vocabularies that each relate to one domain 
This results in accuracy, speed, and a large vocabulary!
Short is bad. 
Phonemes are the smallest unit of speech 
Words with few of them have a lot of rhymes 
Contextless rhyming is our enemy 
Medium-sized, crunchy granola words are our friends
My app, my rules 
Some apps need to recognize words 
or phrases in ways that can be 
expressed by rules. 
Or be flexible 
Some apps need to do probability-based detection 
There are probability-based language models for 
expressing this such as ARPA models
Out of vocabulary 
Your app also has to behave well when 
people aren't speaking to it!
Mic distance and 
vocabulary 
The more distance, the less vocabulary
Test, test, test. 
And obtain appropriate test material.
Case study 1: Recipe App 
A natural implementation of offline speech recognition
What are our interface 
considerations? 
• What are we buying with our time? Hands-free operation, moving locus 
• Hands-free doesn't mean eyes-free! We can provide visual info 
• Operational distance is pretty far 
• Instead of NLP, offline grammar 
• Secret weapon: we know all the words in a recipe in advance 
• Fault tolerance: one level of complexity, don't confirm; return! 
• Challenges: noise, moving locus, reflection, competing speech 
!
Case study 2: Marco Polo 
A dialog management tag game: one user checks in 
a single location and the other user receives 
volume-based speech feedback about their 
proximity to the target when they say “Marco”
UX Considerations 
• What are we buying with our time: play! 
• For a single word, language model is fast and sufficient 
• Acoustic environment and OOV semi-important 
• This is a single-mode interface – an actual dialog 
manager 
• Extra development time should be put into increasing 
voice dynamic range
Case study 3: 
TalkCheater 
An app to whisper sweet presentation notes in your ear
UX Considerations 
• What are we buying? Eye contact, moving locus, 
enhanced human capabilities 
• Is this a speech recognition app? 
• Does this have a visual or a touch interface? 
• The body is the interface 
• Fault tolerance, always important but most 
important in a high-value scenario 
• Volume 
• Speaking speed of synthesized speech
Talk to me @politepix 
and the OpenEars 
forums. I will tell you 
all the things.
Watch the video with slide synchronization on 
InfoQ.com! 
http://www.infoq.com/presentations/speech-ui- 
mobile

More Related Content

More from C4Media

Software Teams and Teamwork Trends Report Q1 2020
Software Teams and Teamwork Trends Report Q1 2020Software Teams and Teamwork Trends Report Q1 2020
Software Teams and Teamwork Trends Report Q1 2020C4Media
 
Understand the Trade-offs Using Compilers for Java Applications
Understand the Trade-offs Using Compilers for Java ApplicationsUnderstand the Trade-offs Using Compilers for Java Applications
Understand the Trade-offs Using Compilers for Java ApplicationsC4Media
 
Kafka Needs No Keeper
Kafka Needs No KeeperKafka Needs No Keeper
Kafka Needs No KeeperC4Media
 
High Performing Teams Act Like Owners
High Performing Teams Act Like OwnersHigh Performing Teams Act Like Owners
High Performing Teams Act Like OwnersC4Media
 
Does Java Need Inline Types? What Project Valhalla Can Bring to Java
Does Java Need Inline Types? What Project Valhalla Can Bring to JavaDoes Java Need Inline Types? What Project Valhalla Can Bring to Java
Does Java Need Inline Types? What Project Valhalla Can Bring to JavaC4Media
 
Service Meshes- The Ultimate Guide
Service Meshes- The Ultimate GuideService Meshes- The Ultimate Guide
Service Meshes- The Ultimate GuideC4Media
 
Shifting Left with Cloud Native CI/CD
Shifting Left with Cloud Native CI/CDShifting Left with Cloud Native CI/CD
Shifting Left with Cloud Native CI/CDC4Media
 
CI/CD for Machine Learning
CI/CD for Machine LearningCI/CD for Machine Learning
CI/CD for Machine LearningC4Media
 
Fault Tolerance at Speed
Fault Tolerance at SpeedFault Tolerance at Speed
Fault Tolerance at SpeedC4Media
 
Architectures That Scale Deep - Regaining Control in Deep Systems
Architectures That Scale Deep - Regaining Control in Deep SystemsArchitectures That Scale Deep - Regaining Control in Deep Systems
Architectures That Scale Deep - Regaining Control in Deep SystemsC4Media
 
ML in the Browser: Interactive Experiences with Tensorflow.js
ML in the Browser: Interactive Experiences with Tensorflow.jsML in the Browser: Interactive Experiences with Tensorflow.js
ML in the Browser: Interactive Experiences with Tensorflow.jsC4Media
 
Build Your Own WebAssembly Compiler
Build Your Own WebAssembly CompilerBuild Your Own WebAssembly Compiler
Build Your Own WebAssembly CompilerC4Media
 
User & Device Identity for Microservices @ Netflix Scale
User & Device Identity for Microservices @ Netflix ScaleUser & Device Identity for Microservices @ Netflix Scale
User & Device Identity for Microservices @ Netflix ScaleC4Media
 
Scaling Patterns for Netflix's Edge
Scaling Patterns for Netflix's EdgeScaling Patterns for Netflix's Edge
Scaling Patterns for Netflix's EdgeC4Media
 
Make Your Electron App Feel at Home Everywhere
Make Your Electron App Feel at Home EverywhereMake Your Electron App Feel at Home Everywhere
Make Your Electron App Feel at Home EverywhereC4Media
 
The Talk You've Been Await-ing For
The Talk You've Been Await-ing ForThe Talk You've Been Await-ing For
The Talk You've Been Await-ing ForC4Media
 
Future of Data Engineering
Future of Data EngineeringFuture of Data Engineering
Future of Data EngineeringC4Media
 
Automated Testing for Terraform, Docker, Packer, Kubernetes, and More
Automated Testing for Terraform, Docker, Packer, Kubernetes, and MoreAutomated Testing for Terraform, Docker, Packer, Kubernetes, and More
Automated Testing for Terraform, Docker, Packer, Kubernetes, and MoreC4Media
 
Navigating Complexity: High-performance Delivery and Discovery Teams
Navigating Complexity: High-performance Delivery and Discovery TeamsNavigating Complexity: High-performance Delivery and Discovery Teams
Navigating Complexity: High-performance Delivery and Discovery TeamsC4Media
 
High Performance Cooperative Distributed Systems in Adtech
High Performance Cooperative Distributed Systems in AdtechHigh Performance Cooperative Distributed Systems in Adtech
High Performance Cooperative Distributed Systems in AdtechC4Media
 

More from C4Media (20)

Software Teams and Teamwork Trends Report Q1 2020
Software Teams and Teamwork Trends Report Q1 2020Software Teams and Teamwork Trends Report Q1 2020
Software Teams and Teamwork Trends Report Q1 2020
 
Understand the Trade-offs Using Compilers for Java Applications
Understand the Trade-offs Using Compilers for Java ApplicationsUnderstand the Trade-offs Using Compilers for Java Applications
Understand the Trade-offs Using Compilers for Java Applications
 
Kafka Needs No Keeper
Kafka Needs No KeeperKafka Needs No Keeper
Kafka Needs No Keeper
 
High Performing Teams Act Like Owners
High Performing Teams Act Like OwnersHigh Performing Teams Act Like Owners
High Performing Teams Act Like Owners
 
Does Java Need Inline Types? What Project Valhalla Can Bring to Java
Does Java Need Inline Types? What Project Valhalla Can Bring to JavaDoes Java Need Inline Types? What Project Valhalla Can Bring to Java
Does Java Need Inline Types? What Project Valhalla Can Bring to Java
 
Service Meshes- The Ultimate Guide
Service Meshes- The Ultimate GuideService Meshes- The Ultimate Guide
Service Meshes- The Ultimate Guide
 
Shifting Left with Cloud Native CI/CD
Shifting Left with Cloud Native CI/CDShifting Left with Cloud Native CI/CD
Shifting Left with Cloud Native CI/CD
 
CI/CD for Machine Learning
CI/CD for Machine LearningCI/CD for Machine Learning
CI/CD for Machine Learning
 
Fault Tolerance at Speed
Fault Tolerance at SpeedFault Tolerance at Speed
Fault Tolerance at Speed
 
Architectures That Scale Deep - Regaining Control in Deep Systems
Architectures That Scale Deep - Regaining Control in Deep SystemsArchitectures That Scale Deep - Regaining Control in Deep Systems
Architectures That Scale Deep - Regaining Control in Deep Systems
 
ML in the Browser: Interactive Experiences with Tensorflow.js
ML in the Browser: Interactive Experiences with Tensorflow.jsML in the Browser: Interactive Experiences with Tensorflow.js
ML in the Browser: Interactive Experiences with Tensorflow.js
 
Build Your Own WebAssembly Compiler
Build Your Own WebAssembly CompilerBuild Your Own WebAssembly Compiler
Build Your Own WebAssembly Compiler
 
User & Device Identity for Microservices @ Netflix Scale
User & Device Identity for Microservices @ Netflix ScaleUser & Device Identity for Microservices @ Netflix Scale
User & Device Identity for Microservices @ Netflix Scale
 
Scaling Patterns for Netflix's Edge
Scaling Patterns for Netflix's EdgeScaling Patterns for Netflix's Edge
Scaling Patterns for Netflix's Edge
 
Make Your Electron App Feel at Home Everywhere
Make Your Electron App Feel at Home EverywhereMake Your Electron App Feel at Home Everywhere
Make Your Electron App Feel at Home Everywhere
 
The Talk You've Been Await-ing For
The Talk You've Been Await-ing ForThe Talk You've Been Await-ing For
The Talk You've Been Await-ing For
 
Future of Data Engineering
Future of Data EngineeringFuture of Data Engineering
Future of Data Engineering
 
Automated Testing for Terraform, Docker, Packer, Kubernetes, and More
Automated Testing for Terraform, Docker, Packer, Kubernetes, and MoreAutomated Testing for Terraform, Docker, Packer, Kubernetes, and More
Automated Testing for Terraform, Docker, Packer, Kubernetes, and More
 
Navigating Complexity: High-performance Delivery and Discovery Teams
Navigating Complexity: High-performance Delivery and Discovery TeamsNavigating Complexity: High-performance Delivery and Discovery Teams
Navigating Complexity: High-performance Delivery and Discovery Teams
 
High Performance Cooperative Distributed Systems in Adtech
High Performance Cooperative Distributed Systems in AdtechHigh Performance Cooperative Distributed Systems in Adtech
High Performance Cooperative Distributed Systems in Adtech
 

Recently uploaded

Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...shyamraj55
 
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphSIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphNeo4j
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
Azure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & ApplicationAzure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & ApplicationAndikSusilo4
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitecturePixlogix Infotech
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhisoniya singh
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024Scott Keck-Warren
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksSoftradix Technologies
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Paola De la Torre
 
Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxOnBoard
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure servicePooja Nehwal
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 

Recently uploaded (20)

Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
 
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphSIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
Azure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & ApplicationAzure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & Application
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC Architecture
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other Frameworks
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101
 
Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping Elbows
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptx
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 

An Unseen Interface

  • 1.
  • 2. Watch the video with slide synchronization on InfoQ.com! http://www.infoq.com/presentations /speech-ui-mobile InfoQ.com: News & Community Site • 750,000 unique visitors/month • Published in 4 languages (English, Chinese, Japanese and Brazilian Portuguese) • Post content from our QCon conferences • News 15-20 / week • Articles 3-4 / week • Presentations (videos) 12-15 / week • Interviews 2-3 / week • Books 1 / month
  • 3. Presented at QCon London www.qconlondon.com Purpose of QCon - to empower software development by facilitating the spread of knowledge and innovation Strategy - practitioner-driven conference designed for YOU: influencers of change and innovation in your teams - speakers and topics driving the evolution and innovation - connecting and catalyzing the influencers and innovators Highlights - attended by more than 12,000 delegates since 2007 - held in 9 cities worldwide
  • 4. An Unseen Interface Creating Speech-driven UI For Your App That Makes Users Happy by Halle Winkler, @politepix http://www.politepix.com :D
  • 5. What is a speech-driven UI?
  • 6. A speech-driven UI uses either speech recognition as an input method, speech synthesis as an information source for the user, or both together. ...but it can also be multi-modal.
  • 7. How does speech recognition work? The elements of speech recognition are: 1. An acoustic model 2. A lexicon 3. A language model (probability) or grammar (ruleset for states) 4. A decoder
  • 8. What kind of apps benefit from speech UIs? Large Vocabulary Tasks: server, built-in vocabulary (UITextView, Android.speech, Nuance, AT&T, iSpeech) Tasks in which free-form dictation is useful Tasks which relate specifically to language Command and control tasks: offline, you generally define vocabulary (OpenEars or other CMU Sphinx or Julius implementations, some Android.speech devices and OSes) Interfaces where the user is looking somewhere else Interfaces where speech provides a new input or output Interfaces that are more fun with speech Interfaces where it’s easier to speak than type Interfaces where it’s easier to listen than read Interfaces where a heavy obstacle is removed
  • 9. Why offline? The interface is always available to your user Speed is as fast or faster as a network API – and it's quantifiable! Interface design and implementation is simpler and more predictable without an asynchronous network dependency The user is not giving away any of their data
  • 10. How is a speech UI different from a visual UI? What are the dimensions on which a visual UI is rendered? What are the dimensions on which a speech UI is rendered? A speech UI is rendered on the dimension of time. People value their time exquisitely.
  • 11. Do people understand each other perfectly all the time? Why not? Accents Lack of shared vocabulary/Dialect Noise Distractions Interruptions Hearing difficulties Distance Language errors ! Human speech interactions have frequent comprehension faults Emotional intelligence makes us incredibly fault-tolerant
  • 12. Automated speech recognition is subject to all the same issues as human speech recognition, but without the emotional intelligence
  • 13. We have to stack the deck in our (users’) favor.
  • 14. Short is good. Don't bite off more than you can chew – small (read "fast") steps forward means small (read "fast") steps backwards ! Use keyword detection to launch events ! Switch between small vocabularies that each relate to one domain This results in accuracy, speed, and a large vocabulary!
  • 15. Short is bad. Phonemes are the smallest unit of speech Words with few of them have a lot of rhymes Contextless rhyming is our enemy Medium-sized, crunchy granola words are our friends
  • 16. My app, my rules Some apps need to recognize words or phrases in ways that can be expressed by rules. Or be flexible Some apps need to do probability-based detection There are probability-based language models for expressing this such as ARPA models
  • 17. Out of vocabulary Your app also has to behave well when people aren't speaking to it!
  • 18. Mic distance and vocabulary The more distance, the less vocabulary
  • 19. Test, test, test. And obtain appropriate test material.
  • 20. Case study 1: Recipe App A natural implementation of offline speech recognition
  • 21. What are our interface considerations? • What are we buying with our time? Hands-free operation, moving locus • Hands-free doesn't mean eyes-free! We can provide visual info • Operational distance is pretty far • Instead of NLP, offline grammar • Secret weapon: we know all the words in a recipe in advance • Fault tolerance: one level of complexity, don't confirm; return! • Challenges: noise, moving locus, reflection, competing speech !
  • 22. Case study 2: Marco Polo A dialog management tag game: one user checks in a single location and the other user receives volume-based speech feedback about their proximity to the target when they say “Marco”
  • 23. UX Considerations • What are we buying with our time: play! • For a single word, language model is fast and sufficient • Acoustic environment and OOV semi-important • This is a single-mode interface – an actual dialog manager • Extra development time should be put into increasing voice dynamic range
  • 24. Case study 3: TalkCheater An app to whisper sweet presentation notes in your ear
  • 25. UX Considerations • What are we buying? Eye contact, moving locus, enhanced human capabilities • Is this a speech recognition app? • Does this have a visual or a touch interface? • The body is the interface • Fault tolerance, always important but most important in a high-value scenario • Volume • Speaking speed of synthesized speech
  • 26. Talk to me @politepix and the OpenEars forums. I will tell you all the things.
  • 27.
  • 28. Watch the video with slide synchronization on InfoQ.com! http://www.infoq.com/presentations/speech-ui- mobile