Speech Recognition Applications - Jan Šedivý

Presentation from World Usability Day, 8 November 2012


  1. Voice: The New UI for Mobile Devices. Jan Šedivý, World Usability Day 2012
  2. Fred Jelinek (1932-2010). During 21 years at IBM Research and nearly two decades at Johns Hopkins, he pioneered the statistical methods that enable modern computers to understand spoken language. “He envisioned applying the mathematics of probability to the problem of processing speech and language,” said Sanjeev Khudanpur, a Johns Hopkins associate
  3. WHY SPEECH RECOGNITION?
  4. Speech reco benefits
     Speech is rich:
       • Speech is much richer than two mouse buttons
       • Disambiguation, dialog
       • “Show me all emails from David about Linux server”
       • “Call David”: David Smith or Stone? Home or cell?
     Text:
       • Speech expresses not only text entry but also C&C, search, URI entry
       • Speech entry is part of the keyboard entry
       • “Command box”, general source of information
     WYSIWYG == What You Say Is What You Get
  5. Elements of success
     Best accuracy:
       • Access to huge content: Internet, YouTube, maps, music, pictures, SMS, email…
       • Train on all available data: contacts, location names, addresses, email, document content, history, personalization, and other sensors: GPS, accelerometers, camera, compass
       • Computationally expensive: huge clusters of computers to speed up training
     Great UI design:
       • Speech reco must not introduce any friction to the interface
       • Keyboard, touch screen, multi-touch, speaker, microphone
       • OS control, part of the OS, noise reduction, AD converter
       • Use all sensors available on the phone to inject extra information into the app
  6. WHERE IS SPEECH RECOGNITION USEFUL?
  7. Speech recognition areas
       • Command control, digit dictation
       • Creation of texts, dictation
       • Telephony IVR
       • Automotive
       • Mobile devices
       • Voice search
     Speech is the most natural way we communicate
  8. The main areas in time perspective (timeline figure; axis: 1995, 2000, 2005): PC (C&C, dictation); Telephony; Automotive; Mobile devices UI
  9. A little more history
       • 1993: IBM Personal Dictation System (IBM PC, audio adapter card)
       • 1996: VoiceType (Win 95, dictation, isolated words, email, …)
       • 1996: Nuance deployed its first commercial speech application
       • 1997: Dragon Systems unveiled its NaturallySpeaking
       • 1999: VoiceXML
       • 2000: Telephony applications, IVR
       • 2002: Car control (control car equipment, make a phone call, select music, dictate an address to the navigation)
       • 2003: Microsoft includes speech in Office 2003
       • 2007: Growth of mobile phones/devices
       • 2008: Google launches speech in Search on the iPhone
       • 2009: Nuance acquires rights to IBM's speech technology patents
       • 2011: iOS 5, Siri
  10. HOW SPEECH RECOGNITION WORKS
  11. Speech recognition: high level
      Front End (feature extraction):
        • Digitize audio: AD converter
        • FFT, Non-lin, DFFT
      Back End (classification):
        • Labeling: triphones, prototypes
        • Search: LM, HMM, Viterbi
      Application API: text output
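The front-end steps named on this slide (FFT, a non-linearity, a second transform) resemble a classic cepstrum computation. The following is a minimal Python sketch under that assumption, not the implementation behind the slides: it reads "Non-lin" as the log of the magnitude spectrum and "DFFT" as an inverse DFT, and it uses a naive DFT in place of an FFT. The sample rate, frame length, synthetic test tone, and the choice of 13 coefficients are illustrative values; only the 10 ms frame step echoes the "each 10 ms" on slide 14.

```python
import cmath
import math


def dft(x):
    """Naive O(n^2) discrete Fourier transform; stands in for an FFT."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def real_cepstrum(frame, num_coeffs):
    """Return the first num_coeffs real-cepstrum coefficients of one frame."""
    n = len(frame)
    # Hamming window to reduce spectral leakage at the frame edges.
    windowed = [
        s * (0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)))
        for i, s in enumerate(frame)
    ]
    spectrum = dft(windowed)
    # Non-linearity: log of the magnitude spectrum, floored to avoid log(0).
    log_mag = [math.log(max(abs(c), 1e-10)) for c in spectrum]
    # Inverse DFT of the log spectrum; the result is real up to rounding.
    cepstrum = []
    for t in range(num_coeffs):
        acc = sum(
            log_mag[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)
        )
        cepstrum.append(acc.real / n)
    return cepstrum


def frames(samples, frame_len, hop):
    """Split samples into overlapping frames of frame_len, advancing by hop."""
    return [
        samples[start:start + frame_len]
        for start in range(0, len(samples) - frame_len + 1, hop)
    ]


# Assumed setup: 8 kHz synthetic tone, 25 ms frames (200 samples) every 10 ms (80 samples).
SAMPLE_RATE = 8000
signal = [math.sin(2 * math.pi * 440 * t / SAMPLE_RATE) for t in range(800)]
features = [real_cepstrum(f, 13) for f in frames(signal, 200, 80)]
print(len(features), "frames of", len(features[0]), "coefficients")
```

Each entry of `features` is one feature vector per frame; a real front end would typically add further processing before handing the vectors to the labeler.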
  12. APPLICATIONS DEVELOPMENT CHRONOLOGICALLY
  13. IBM speech recognition: the early days
        • Large vocabulary, dictation (1990…)
        • Office correspondence task: Tangora
        • Written in Fortran
        • IBM RISC System/6000, AIX
      Tangora: Albert Tangora (July 2, 1903 – April 7, 1978) set the world speed record for sustained typing on a manual keyboard for one hour, 147 words per minute, on October 22, 1923.
  14. How to get reco running on a PC (1994)
      Front End:
        • Add-on board with ASIC
        • Integer version on CPU
      Hierarchical labeler:
        • Input: 39-dimensional feature vector of cepstrum coefficients every 10 ms
        • Output: the 100 most likely prototypes out of 30k, diagonal Gaussians
      Search:
        • Statistical LM: high compression, log
        • Viterbi search, Hidden Markov Models
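The Viterbi search over a hidden Markov model mentioned on this slide can be illustrated on a toy model. The sketch below is a generic log-domain Viterbi in Python, not the engine described here; the two states ("sil", "speech"), the observation labels ("lo", "hi"), and all probabilities are made up for illustration.

```python
import math


def to_log(table):
    """Convert a nested dict of probabilities to log-probabilities."""
    return {k: {j: math.log(p) for j, p in row.items()} for k, row in table.items()}


def viterbi(obs, states, log_start, log_trans, log_emit):
    """Return (best_log_prob, best_state_path) for an observation sequence."""
    # best[s] = log-probability of the best path that ends in state s.
    best = {s: log_start[s] + log_emit[s][obs[0]] for s in states}
    backpointers = []
    for o in obs[1:]:
        prev, best, pointers = best, {}, {}
        for s in states:
            # Pick the predecessor that maximizes the path score into s.
            p = max(states, key=lambda r: prev[r] + log_trans[r][s])
            best[s] = prev[p] + log_trans[p][s] + log_emit[s][o]
            pointers[s] = p
        backpointers.append(pointers)
    # Trace back from the best final state.
    last = max(states, key=lambda s: best[s])
    path = [last]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return best[last], path


# Hypothetical toy model: silence vs. speech, observed as low/high energy labels.
states = ["sil", "speech"]
log_start = {"sil": math.log(0.8), "speech": math.log(0.2)}
log_trans = to_log({
    "sil": {"sil": 0.7, "speech": 0.3},
    "speech": {"sil": 0.2, "speech": 0.8},
})
log_emit = to_log({
    "sil": {"lo": 0.9, "hi": 0.1},
    "speech": {"lo": 0.2, "hi": 0.8},
})

score, path = viterbi(["lo", "hi", "hi", "lo"], states, log_start, log_trans, log_emit)
print(path)  # ['sil', 'speech', 'speech', 'sil']
```

Working with log-probabilities turns products into sums and avoids numeric underflow on long utterances, which fits the "log" note on the slide's language-model bullet.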
  15. How to get reco running on Embedded (1999)
      Easy port to Embedded:
        • Resource-efficient speech recognition engine
        • Written in C/C++
        • Integer implementation, GCC compiler
        • Simple API to customize for any platform
      Basic reco:
        • Grammar support for command control applications
        • Special emphasis on digit recognition
        • Robust front end for noisy environments
      Cars applications:
        • Command control
        • Digit and name dialing
        • Navigation control
        • On-board entertainment control
  16. MOBILE DEVICES
  17. 7 billion people. Over 5.3 billion people, or 77% of the world’s population, are now on mobile, according to WIPRO.
  18. Mobile operating system preferences (Sparkwiz 2012)
  19. ECSS 2010, 10/12/2010
  20. Mobile Internet Access
  21. Factors accelerating better mobile apps
        • Basic phone
        • More powerful CPU, more memory
        • Connectivity, Internet
        • Much better UI, multi-touch screen
      Rapid growth of mobile phones/devices is driving the adoption of speech recognition
  22. Why is reco so important for mobile?
        • Small screen
        • Limited keyboard
        • Difficult text entry
        • Difficult to navigate
        • Slow, unreliable connectivity (latency)
      Speech is fundamentally changing the mobile user experience
  23. LATEST APPLICATIONS
  24. Google Now, Google search. Some Android phones: two mics
  25. iOS Siri
  26. Poor performance in the Czech Republic
  27. iOS Siri versus Google search
        • Siri are "natural language processing" apps that use statistical …
        • Siri is deep in iOS: start apps, make calls, set meetings
        • Google is deep in the search engine
        • You can't launch apps with Google, but you can dictate an email or a text message
        • Google is faster (much faster)
        • Future: combination of AI and different UI
  28. FUTURE
  29. Future challenges
        • Better recognition, ROBUSTNESS (noisy conditions, dictation)
        • Better UI integration (speech button)
        • Multiple languages (how would a German native search for an address in France?)
        • Switching between multiple languages
        • UI combining multiple modalities (voice, text, video, sensors)
        • Work on dictated text correction
        • Better integration of speech reco into special applications
  30. QUESTIONS & THANK YOU
