Speech recognition challenges

Transcript

  • 1. Speech Recognition Challenges Presenter: Alexandru Chica
  • 2. Contents Speech User Interface basic concepts •Speech recognition •Speech synthesis Speech Recognition Challenges •Accuracy •User responsiveness •Performance •Reliability •Fault tolerance
  • 3. Speech User Interface basic concepts Speech Recognition •The translation of spoken words into written text. Diagram: "#spit&S#" (phonetic representation of speech) → algorithm → "speech". Algorithms: •Statistical processing •Hidden Markov Models •Dynamic Time Warping. Types of speech recognition: •Command and control •Dictation
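Dynamic Time Warping, named on slide 3, can be illustrated with a minimal sketch. It assumes each utterance has already been reduced to a sequence of single numeric features (real recognizers compare multi-dimensional feature vectors), and the function name `dtw_distance` is chosen here for illustration only.

```python
def dtw_distance(a, b):
    """Dynamic time warping distance between two 1-D numeric sequences.

    Assumption: the per-frame cost is the absolute difference; real
    systems would use a distance between feature vectors instead.
    """
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = minimal accumulated cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of: step in a, step in b, step in both
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


# The same "word" spoken more slowly (one frame repeated) still matches exactly.
print(dtw_distance([1, 2, 3], [1, 2, 2, 3]))
```

The warping lets a stretched or compressed utterance align with a stored template, which is why the technique suits small command vocabularies.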
  • 4. Speech User Interface basic concepts Speech Recognition Components •Audio input (front end) •Grammars – contain the commands that can be spoken by the user •Acoustic models – language-dependent, used to "define" the language features •Recognition algorithms (back end). Diagram: Audio input / feature extraction (front end) → Grammars, acoustic models, recognition algorithms (back end) → result
  • 5. Speech User Interface basic concepts Speech Recognition APIs Microsoft SAPI IBM: Embedded ViaVoice Nuance: VoCon VoiceBox Speech Recognition
  • 6. Speech User Interface basic concepts Speech Synthesis •The translation of written text into spoken text. Diagram: "speech" → g2p (grapheme-to-phoneme) → "#spit&S#"
  • 7. Speech User Interface basic concepts Speech Synthesis APIs Microsoft SAPI SoftVoice TTS Apple PlainTalk Nuance: Vocalizer SVOX TTS eSpeak
  • 8. Speech User Interface basic concepts - Usage In car: •Control media player / radio stations •Control navigation •Control phone book and phone activities •Find POI locations (POI: point of interest) •E-mail/SMS reading On the web: •HTML5 speech input •Google Search with voice input •Reading of web page content
  • 9. Speech Recognition Challenges – Accuracy Audio Input Problem: Audio signal quality Impact: loss of recognition accuracy Solution 1: Echo cancellation Solution 2: Beamforming
  • 10. Speech Recognition Challenges – Accuracy Audio Input Problem: Talk-over (the user speaks while the TTS prompt is still playing) Impact: loss of recognition accuracy Solution: Barge-in (the user can interrupt the TTS prompt). Diagram: TTS, User
  • 11. Speech Recognition Challenges – User responsiveness Speech Recognition Problem: resources are not ready when the user starts to speak the command Solution: Delayed speech recognition. Diagram labels: Resource loading / Back-end processing; Front-end processing; Delayed Speech Recognition
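One plausible reading of slide 11 is that the front end keeps capturing audio while back-end resources are still loading, and the buffered audio is handed over once they are ready. The sketch below illustrates that reading only; `DelayedRecognizer`, `backend_factory`, and the callback names are hypothetical and do not come from the slides or from any speech API.

```python
from collections import deque


class DelayedRecognizer:
    """Buffers audio frames until the (hypothetical) back end is ready."""

    def __init__(self, backend_factory):
        # backend_factory: assumed callable that returns a frame consumer
        self._factory = backend_factory
        self._backend = None
        self._pending = deque()

    def on_audio_frame(self, frame):
        if self._backend is None:
            # Resources not loaded yet: keep the frame instead of dropping it
            self._pending.append(frame)
        else:
            self._backend(frame)

    def on_resources_ready(self):
        self._backend = self._factory()
        # Replay everything the user said while resources were loading
        while self._pending:
            self._backend(self._pending.popleft())


received = []
recognizer = DelayedRecognizer(lambda: received.append)
recognizer.on_audio_frame("frame-1")   # user starts speaking early
recognizer.on_audio_frame("frame-2")
recognizer.on_resources_ready()        # back end catches up
recognizer.on_audio_frame("frame-3")
print(received)
```

With this arrangement the start of the command is not lost, at the price of a short delay before the result is available.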
  • 12. Speech Recognition Challenges – User responsiveness Speech Recognition Problem: synchronization with multiple applications (media, phone, navigation) Solution: apply concurrent design patterns •Active Object •Monitor •Double-checked locking
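Of the patterns listed on slide 12, the Monitor is the simplest to sketch: one object owns both the shared state and the lock that protects it. The example assumes that only one application (media, phone, navigation) may hold the speech session at a time; `SpeechSessionMonitor` and its method names are illustrative, not taken from the presentation.

```python
import threading


class SpeechSessionMonitor:
    """Monitor pattern: state and its synchronization live in one object."""

    def __init__(self):
        self._cond = threading.Condition()
        self._owner = None  # name of the application holding the session

    def acquire(self, app):
        with self._cond:
            # Wait until no other application owns the speech session
            while self._owner is not None:
                self._cond.wait()
            self._owner = app

    def release(self, app):
        with self._cond:
            if self._owner != app:
                raise RuntimeError(f"{app} does not own the speech session")
            self._owner = None
            self._cond.notify_all()


monitor = SpeechSessionMonitor()
monitor.acquire("media")
monitor.release("media")
monitor.acquire("phone")  # succeeds because media released the session
monitor.release("phone")
```

Callers never touch the lock directly, which keeps the synchronization rules in one place as more applications are added.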
  • 13. Speech Recognition Challenges – Performance Grammars Use cases: • Command & Control grammars • 200 – 500 commands •Navigation grammars • 100k+ static data •Music grammars • 10k+ dynamic data
  • 14. Speech Recognition Challenges – Performance Grammars (1) Problem: Grammar size too big Impact: • increased loading times of files from disk to memory Solution: Grammar optimization •merging of similar command tokens
  • 15. Speech Recognition Challenges – Performance Grammars (2) •removal / replacement of recursion rules
  • 16. Speech Recognition Challenges – Performance Grammars (3) Problem: Grammar token collisions Impact: • loss of recognition accuracy Solution: •replacement of collision-prone tokens with synonyms •adding special pronunciation tokens to colliding words Examples: sum – sun – sung; bet – bed
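Collision-prone tokens such as those on slide 16 could be flagged automatically before a grammar ships. The sketch below uses edit distance over spelling as a crude stand-in for acoustic similarity, which is an assumption made here; a real check would compare phonetic transcriptions. The function names are hypothetical.

```python
from itertools import combinations


def edit_distance(s, t):
    """Levenshtein distance between two strings (row-by-row DP)."""
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, start=1):
        curr = [i]
        for j, ct in enumerate(t, start=1):
            curr.append(min(prev[j] + 1,                # delete from s
                            curr[j - 1] + 1,            # insert into s
                            prev[j - 1] + (cs != ct)))  # substitute
        prev = curr
    return prev[-1]


def collision_candidates(tokens, max_distance=1):
    """Return token pairs that differ by at most max_distance edits."""
    return [(a, b) for a, b in combinations(tokens, 2)
            if edit_distance(a, b) <= max_distance]


print(collision_candidates(["sum", "sun", "sung", "bet", "bed"]))
```

Pairs reported by such a check are candidates for the two fixes named on the slide: swapping in a synonym or attaching a special pronunciation.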
  • 17. Speech Recognition Challenges – Performance Dynamic Grammars Problem: synchronization with USB devices, phones, navigation databases takes too much time Solution 1: implementation of a caching mechanism
  • 18. Speech Recognition Challenges – Performance Use an ID3 parser to read titles, artists, composers, genre, album, etc. from MP3 files. Example: Title: One, Artist: U2, Album: Achtung Baby, Genre: rock ... Diagram: phoneme transcriptions → cache → dynamic grammar; add to slot: title <DYN_TITLE>, artist <DYN_ARTIST>
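The caching idea from slides 17 and 18 can be sketched as a lookup table placed in front of the expensive phoneme transcription step, so that re-synchronizing a device only transcribes entries not seen before. Everything named here (`PhonemeCache`, `fill_slots`, the `g2p` callable, and the fake transcription format) is a hypothetical illustration; only the slot names `<DYN_TITLE>` and `<DYN_ARTIST>` and the sample metadata come from the slides.

```python
class PhonemeCache:
    """Caches transcriptions produced by an assumed g2p callable."""

    def __init__(self, g2p):
        self._g2p = g2p
        self._cache = {}
        self.misses = 0  # number of times the slow g2p step actually ran

    def transcribe(self, text):
        key = text.strip().lower()
        if key not in self._cache:
            self.misses += 1
            self._cache[key] = self._g2p(key)
        return self._cache[key]


def fill_slots(tracks, cache):
    """Build dynamic grammar slot contents from parsed track metadata."""
    slots = {"<DYN_TITLE>": [], "<DYN_ARTIST>": []}
    for track in tracks:
        slots["<DYN_TITLE>"].append((track["title"], cache.transcribe(track["title"])))
        slots["<DYN_ARTIST>"].append((track["artist"], cache.transcribe(track["artist"])))
    return slots


# Stand-in for a real grapheme-to-phoneme engine (assumption for the demo)
cache = PhonemeCache(lambda text: "#" + text + "#")
tracks = [{"title": "One", "artist": "U2"},
          {"title": "Zoo Station", "artist": "U2"}]
fill_slots(tracks, cache)
fill_slots(tracks, cache)  # second sync is served entirely from the cache
print(cache.misses)
```

A production cache would also need persistence and invalidation when the device contents change, which this sketch leaves out.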
  • 19. Speech Recognition Challenges – Performance Dynamic Grammars Solution 2: split the processing in two, and dispatch part of the work to a different processor. Use an ID3 parser to read titles, artists, composers, genre, album, etc. from MP3 files. Example: Title: One, Artist: U2, Album: Achtung Baby, Genre: rock ... Diagram labels: CPU1, CPU2, Preprocessing step, dynamic grammar; add to slot: title <DYN_TITLE>, artist <DYN_ARTIST>
  • 20. Speech Recognition Challenges – Reliability Reliability: the ability of the system to keep operating over time Problem: the system has to operate correctly over long periods of time Solution 1: automated tests Solution 2: drive tests
  • 21. Speech Recognition Challenges – Fault tolerance Problem: Recovery from system failures must be possible Solution: • The system is modeled in a modular manner, with components that communicate via an internal car area network. • Individual components can be restarted without affecting other system components
  • 22. Speech Recognition Challenges TTS & ASR Demo
  • 23. Speech Recognition Challenges Questions ?
  • 24. Speech Recognition Challenges Thank You
