Speech recognition challenges

  1. Speech Recognition Challenges
     Presenter: Alexandru Chica
  2. Contents
     Speech User Interface basic concepts
       • Speech recognition
       • Speech synthesis
     Speech Recognition Challenges
       • Accuracy
       • User responsiveness
       • Performance
       • Reliability
       • Fault tolerance
  3. Speech User Interface basic concepts – Speech Recognition
       • The translation of spoken text into written text
     Diagram: "#spit&S#" (phonetic representation of speech) → algorithm → "speech"
       • Statistical processing
       • Hidden Markov Models
       • Dynamic Time Warping
     Types of speech recognition:
       • Command and control
       • Dictation
  4. Speech User Interface basic concepts – Speech Recognition Components
       • Audio input (front end)
       • Grammars – contain the commands that can be spoken by the user
       • Acoustic models – language-dependent, used to “define” the language features
       • Recognition algorithms (back end)
     Diagram: audio input and feature extraction (front end) → grammars, acoustic models and recognition algorithms (back end) → result
  5. Speech User Interface basic concepts – Speech Recognition APIs
       • Microsoft SAPI
       • IBM: Embedded ViaVoice
       • Nuance: VoCon
       • VoiceBox Speech Recognition
  6. Speech User Interface basic concepts – Speech Synthesis
       • The translation of written text into spoken text
     Diagram: "speech" → g2p (grapheme-to-phoneme) → "#spit&S#"
  7. Speech User Interface basic concepts – Speech Synthesis APIs
       • Microsoft SAPI
       • SoftVoice TTS
       • Apple PlainTalk
       • Nuance: Vocalizer
       • SVOX TTS
       • eSpeak
  8. Speech User Interface basic concepts – Usage
     In car:
       • Control media player / radio stations
       • Control navigation
       • Control phone book and phone activities
       • Find POI locations (POI: points of interest)
       • E-mail/SMS reading
     On the web:
       • HTML5 speech input
       • Google Search with voice input
       • Reading of web page content
  9. Speech Recognition Challenges – Accuracy: Audio Input
     Problem: audio signal quality
     Impact: loss of recognition accuracy
     Solution 1: echo cancellation
     Solution 2: beamforming
  10. Speech Recognition Challenges – Accuracy: Audio Input
      Problem: talk-over (the user speaks while the TTS prompt is still playing)
      Impact: loss of recognition accuracy
      Solution: barge-in
      Diagram labels: TTS, User
  11. Speech Recognition Challenges – User responsiveness: Speech Recognition
      Problem: resources are not ready when the user starts to speak the command
      Solution: delayed speech recognition
      Diagram: front-end processing runs while resource loading / back-end processing catches up (delayed speech recognition)
  12. Speech Recognition Challenges – User responsiveness: Speech Recognition
      Problem: synchronization with multiple applications (media, phone, navigation)
      Solution: apply concurrent design patterns
        • Active Object
        • Monitor
        • Double-checked locking
  13. Speech Recognition Challenges – Performance: Grammars
      Use cases:
        • Command & Control grammars: 200 – 500 commands
        • Navigation grammars: 100k+ static data
        • Music grammars: 10k+ dynamic data
  14. Speech Recognition Challenges – Performance: Grammars (1)
      Problem: grammar size is too big
      Impact:
        • Increased loading times of files from disk to memory
      Solution: grammar optimization
        • Merging of similar command tokens
  15. Speech Recognition Challenges – Performance: Grammars (2)
        • Removal / replacement of recursion rules
  16. Speech Recognition Challenges – Performance: Grammars (3)
      Problem: grammar token collisions
      Impact:
        • Loss of recognition accuracy
      Solution:
        • Replacement of collision-prone tokens with synonyms
        • Adding special pronunciation tokens to collision words
      Examples: sum – sun – sung, bet – bed
  17. Speech Recognition Challenges – Performance: Dynamic Grammars
      Problem: synchronization with USB devices, phones and navigation databases takes too much time
      Solution 1: implementation of a caching mechanism
  18. Speech Recognition Challenges – Performance
        • Use an ID3 parser to read titles, artists, composers, genre, album, etc. from MP3 files
        • Example tags: Title: One, Artist: U2, Album: Achtung Baby, Genre: rock ...
      Diagram: phoneme transcriptions → cache → dynamic grammar (add to slot: title <DYN_TITLE>, artist <DYN_ARTIST>)
  19. Speech Recognition Challenges – Performance: Dynamic Grammars
      Solution 2: split the processing in two and dispatch part of the work to a different processor
        • Use an ID3 parser to read titles, artists, composers, genre, album, etc. from MP3 files
        • Example tags: Title: One, Artist: U2, Album: Achtung Baby, Genre: rock ...
      Diagram: ID3 parsing, a preprocessing step and the dynamic grammar update (add to slot: title <DYN_TITLE>, artist <DYN_ARTIST>) are divided between CPU1 and CPU2
  20. Speech Recognition Challenges – Reliability
      Reliability: the ability of the system to keep operating over time
      Problem: the system has to operate correctly over long periods of time
      Solution 1: automated tests
      Solution 2: drive tests
  21. Speech Recognition Challenges – Fault tolerance
      Problem: recovery from system failures must be possible
      Solution:
        • The system is modeled in a modular manner, with components that communicate via the internal car area network
        • Individual components can be restarted without affecting other system components
  22. Speech Recognition Challenges – TTS & ASR Demo
  23. Speech Recognition Challenges – Questions?
  24. Speech Recognition Challenges – Thank You
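
Note on slide 3 (Dynamic Time Warping): the slide only names the technique, so the following is a minimal sketch rather than anything taken from the deck. It assumes each utterance has already been reduced to a sequence of single numbers; real recognizers compare vectors of acoustic features, and the function name `dtw_distance` is made up for illustration.

```python
def dtw_distance(a, b):
    """Return the dynamic-time-warping alignment cost of two numeric sequences.

    Assumption: one scalar feature per frame, compared by absolute difference.
    """
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = cheapest alignment of the first i frames of a
    # with the first j frames of b.
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(a[i - 1] - b[j - 1])
            # A frame may be matched, or either sequence may be stretched.
            cost[i][j] = step + min(cost[i - 1][j],      # stretch b
                                    cost[i][j - 1],      # stretch a
                                    cost[i - 1][j - 1])  # advance both
    return cost[n][m]


# The same "word" spoken more slowly (one frame repeated) still aligns at zero cost.
template = [1, 2, 3]
slow_utterance = [1, 2, 2, 3]
print(dtw_distance(template, slow_utterance))
```

The point of the warping step is that speaking rate does not change the score, which is why the technique suits matching a spoken command against stored templates.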
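
Note on slide 11 (delayed speech recognition): one way to read the slide is that the front end keeps capturing audio while the back end is still loading, and the buffered audio is replayed once the resources are ready. The sketch below illustrates that reading only. The class `DelayedRecognizer`, its method names and the frame limit are hypothetical, and the code is single-threaded; a real system would need the synchronization patterns listed on slide 12.

```python
from collections import deque


class DelayedRecognizer:
    """Buffers audio frames until a back end is available (illustrative only)."""

    def __init__(self, max_buffered_frames=500):
        # Bounded buffer: if loading takes too long, the oldest frames are dropped.
        self._pending = deque(maxlen=max_buffered_frames)
        self._backend = None

    def on_audio_frame(self, frame):
        """Called by the front end for every captured frame."""
        if self._backend is None:
            self._pending.append(frame)   # back end not ready: keep the frame
        else:
            self._backend(frame)          # back end ready: pass it straight on

    def on_backend_ready(self, backend):
        """Called once resource loading has finished; replays buffered audio."""
        self._backend = backend
        while self._pending:
            backend(self._pending.popleft())


# Usage: the user starts speaking before the back end has loaded.
received = []
recognizer = DelayedRecognizer()
recognizer.on_audio_frame("frame-1")
recognizer.on_audio_frame("frame-2")
recognizer.on_backend_ready(received.append)   # stand-in for a real back end
recognizer.on_audio_frame("frame-3")
print(received)
```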
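
Note on slide 16 (grammar token collisions): the deck does not say how collision-prone tokens are found. As a rough sketch, assuming spelling is an acceptable stand-in for pronunciation (a real check would compare phoneme transcriptions), tokens that differ by a single edit can be flagged for review. Both function names are invented for this example.

```python
from itertools import combinations


def edit_distance(s, t):
    """Levenshtein distance between two strings, computed row by row."""
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, start=1):
        curr = [i]
        for j, ct in enumerate(t, start=1):
            curr.append(min(prev[j] + 1,                # delete from s
                            curr[j - 1] + 1,            # insert into s
                            prev[j - 1] + (cs != ct)))  # substitute or match
        prev = curr
    return prev[-1]


def collision_candidates(tokens, max_distance=1):
    """Return token pairs whose spellings are within max_distance edits."""
    return [(a, b) for a, b in combinations(tokens, 2)
            if edit_distance(a, b) <= max_distance]


# The slide's own examples, plus one unrelated command token.
print(collision_candidates(["sum", "sun", "sung", "bet", "bed", "navigate"]))
```

Pairs reported this way are candidates for the two fixes on the slide: replacing one token with a synonym, or attaching a special pronunciation token.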
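
Note on slides 17 and 18 (caching for dynamic grammars): the slides show phoneme transcriptions passing through a cache before being added to the `<DYN_TITLE>` and `<DYN_ARTIST>` slots. A minimal sketch of that idea follows. The class `PhonemeCache`, the function `fill_slots` and the `g2p` callable are assumptions made for illustration; the deck does not describe the actual cache design or any vendor API.

```python
class PhonemeCache:
    """Remembers transcriptions so that re-syncing a device skips repeated g2p work."""

    def __init__(self, g2p):
        self._g2p = g2p       # hypothetical grapheme-to-phoneme function
        self._cache = {}
        self.misses = 0       # how many times g2p actually had to run

    def transcribe(self, text):
        key = text.strip().lower()
        if key not in self._cache:
            self.misses += 1
            self._cache[key] = self._g2p(key)
        return self._cache[key]


def fill_slots(tracks, cache):
    """Build the dynamic grammar slot contents from parsed tag data."""
    slots = {"<DYN_TITLE>": [], "<DYN_ARTIST>": []}
    for track in tracks:
        slots["<DYN_TITLE>"].append((track["title"], cache.transcribe(track["title"])))
        slots["<DYN_ARTIST>"].append((track["artist"], cache.transcribe(track["artist"])))
    return slots


# Usage with a placeholder g2p; the second track is invented test data.
cache = PhonemeCache(g2p=lambda text: "#" + text + "#")
tracks = [{"title": "One", "artist": "U2"},
          {"title": "Another Title", "artist": "U2"}]
fill_slots(tracks, cache)
first_sync_misses = cache.misses
fill_slots(tracks, cache)   # plugging the same device in again
print(first_sync_misses, cache.misses)
```

Keying the cache on normalized text means a repeated artist costs one transcription, and a second synchronization of the same device costs none.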
