Moshe Yudkowsky's Presentation at Emerging Communication Conference & Awards 2009 Europe

  • Other topics: APIs, IDEs, grammar-building tools, VUI tools
  • 1. Ask the person next to you a question as if you were an airline reservations system. Find out which city he wants to fly to.
    2. Ask the person next to you for a Twitter update on the conference.
  • Google, for example, does voicemail transcription, poorly.
  • Practical deployment configurations
  • The telco server is also hosted. The voice of the user (the “utterance”) must have a good, clean path to the recognition system.
  • Known text: address book, firmware
    Complex: dictation, add-on
  • Not practical in the network: who is using the phone?
  • We have reviewed the hardware and the types of recognition. I will now review some more specific details about recognition.
  • Not magic. You still have to manage the data; enroll users; deal with users who are locked out; etc.

    1. 2009 | Westergasfabriek | Amsterdam | http://eComm.ec
    2. Practical Edge of Speech Technology: Moshe Yudkowsky, www.Disaggregate.com
    3. “Practical” is relative: affordable, on schedule, achievable.
    4. Core technology. Engines: speech recognition (ASR), text-to-speech (TTS), biometrics, thynometrics (emotions). Analytics: data mining, problem discovery.
    5. Two 20-second exercises. Exercise 1: travel agency automated reservations. Exercise 2: a Twitter update of the eComm conference.
    8. Lessons. Exercise 1: everyone has the same, simple answers. Uses: call centers; standard device commands. Speaker independent. Exercise 2: highly personal answers. Uses: dictation; voice search. Speaker dependent or independent.
    15. Network hardware for speaker-independent recognition
    16. Network-based systems: your equipment (“premises”)
    17. Network-based systems: “hosted”
    18. Local hardware
    19. Device-based systems: local recognition, with ASR results on the device. Handles known text, and complex, personal text.
    20. Device-based systems, hybrid: voice goes to the server, and data (the results) come back to the device. ASR: speaker independent (?)
    21. Engines: speech recognition (ASR). Summary: you can do almost anything, but the more you do, the more you pay.
    22. Telephony ASR is excellent. Inexpensive: “What city?” → “Amsterdam.” Very expensive: “What is wrong with your phone?” → “I dropped it on the floor, and the screen is cracked, and now I can’t see anything.”
    23. Cautions: there is no such thing as “speech to text”; speaker dependent comes closest. Voicemail to text is human-assisted. Some telephone ASR is also human-assisted.
    24. Speaker dependent: desktop computers can do excellent transcription but need corrections. Hand-held devices have more memory and power → better ASR.
    25. Engines: text-to-speech (TTS). Summary: available in many languages; reasonable quality; sometimes difficult to understand.
    26. TTS requires language understanding and specific jargon translation: “Mr.” → “Mister”; “bbl” → “Be Back Later”; “287 m” → “about 300 meters”. Custom voices are available.
    32. Engines: biometrics (speaker identification, speaker verification, speaker characterization). Summary: speaker verification is practical but still rare; speaker identification and characterization are practical and secret.
    33. Speaker verification (is that really you?): available and practical. Rare in the US, more prevalent in Australia, Israel, and Canada. Roadblocks: valid fear; fear of biometrics; love of fingerprints; only part of a complete solution.
    34. Speaker identification (who are you?); speaker characterization (what are you?)
    35. Analytics: data mining, problem discovery. Summary: surprisingly useful; expensive.
    36. Not a real-time process. Word searches; “speech to text”. Emotion detection by ASR (swearing) and by thynometrics (pitch, volume).
    37. About Disaggregate: Moshe Yudkowsky, Disaggregate, 2952 W. Fargo, Chicago, IL 60645, +1 773 764 8727, www.Disaggregate.com
    38. Sponsors: headline, platinum, and gold
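Slide 22 contrasts “What city?” with “What is wrong with your phone?”. The cost difference comes down to how many answers the recognizer has to consider. Below is a minimal Python sketch of the cheap case. It assumes the engine returns an n-best list of (text, confidence) pairs. The grammar entries, the `interpret_city` helper, and the 0.6 confidence floor are hypothetical illustrations, not any vendor's API. Real engines apply the grammar during recognition, so filtering text afterwards, as done here, is a simplification.

```python
from typing import Optional

# Hypothetical in-grammar phrases for the prompt "What city?", each mapped to
# a normalized slot value. A small, closed list like this keeps recognition cheap.
CITY_GRAMMAR = {
    "amsterdam": "Amsterdam",
    "amsterdam schiphol": "Amsterdam",
    "chicago": "Chicago",
    "tel aviv": "Tel Aviv",
}

# Assumed confidence floor; below it the dialog should re-prompt the caller.
MIN_CONFIDENCE = 0.6


def interpret_city(hypotheses: list[tuple[str, float]]) -> Optional[str]:
    """Pick the most confident in-grammar city from n-best (text, confidence) pairs.

    Returns None when nothing in the grammar clears the floor, which a
    directed dialog would treat as "Sorry, which city?".
    """
    for text, confidence in sorted(hypotheses, key=lambda h: h[1], reverse=True):
        if confidence < MIN_CONFIDENCE:
            break  # the list is sorted, so every later hypothesis is weaker
        # Normalize case and whitespace before looking the phrase up.
        city = CITY_GRAMMAR.get(" ".join(text.lower().split()))
        if city is not None:
            return city
    return None
```

An open-ended answer such as “I dropped it on the floor” simply falls outside the grammar. Handling it needs the expensive machinery the slide warns about.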
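The jargon-translation examples on the TTS slide amount to a text-normalization pass that runs before the TTS engine sees the text. Here is a minimal Python sketch. It assumes a hypothetical two-entry lexicon taken from the slide. It also assumes a made-up rule: distances of 100 m or more are rounded to the nearest 100 m. Production TTS front ends use far larger, context-aware rule sets.

```python
import re

# Hypothetical lexicon built from the slide's examples; a real TTS front end
# would need a much larger, domain-specific table.
ABBREVIATIONS = {
    "Mr.": "Mister",
    "bbl": "be back later",
}

# A whole number followed by the unit "m", e.g. "287 m" (but not "287 minutes").
DISTANCE = re.compile(r"\b(\d+)\s*m\b")


def speak_distance(match: re.Match) -> str:
    """Rewrite an exact distance as a rounded, speakable phrase."""
    meters = int(match.group(1))
    if meters < 100:
        return f"{meters} meters"
    # Assumed rule: round to the nearest 100 m, since listeners rarely need more.
    return f"about {round(meters, -2)} meters"


def normalize(text: str) -> str:
    """Expand jargon and rewrite distances before the text reaches the TTS engine."""
    for short, spoken in ABBREVIATIONS.items():
        # Replace the abbreviation only when it stands alone, not inside a longer word.
        text = re.sub(rf"(?<!\w){re.escape(short)}(?!\w)", spoken, text)
    return DISTANCE.sub(speak_distance, text)
```

For example, “Mr. Cohen says bbl; the hotel is 287 m away” becomes “Mister Cohen says be back later; the hotel is about 300 meters away”.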
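The speaker-verification slides, and the “Not magic” note, imply bookkeeping around the engine: enrolling users, making accept/reject decisions, and handling users who are locked out. The sketch below assumes the engine reduces each utterance to a fixed-length voiceprint vector. It scores that vector against the enrolled one with cosine similarity. The scoring method, the 0.8 threshold, the three-strike lockout, and the `VerificationStore` class are all hypothetical choices for illustration, not a description of any particular product.

```python
import math
from dataclasses import dataclass, field

# Hypothetical parameters; real systems tune these per deployment.
ACCEPT_THRESHOLD = 0.8  # minimum cosine similarity to accept an identity claim
MAX_FAILURES = 3        # consecutive failures before the account is locked


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length voiceprint vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


@dataclass
class VerificationStore:
    """Enrolled voiceprints and consecutive-failure counts, keyed by claimed identity."""
    voiceprints: dict[str, list[float]] = field(default_factory=dict)
    failures: dict[str, int] = field(default_factory=dict)

    def enroll(self, user: str, voiceprint: list[float]) -> None:
        self.voiceprints[user] = voiceprint
        self.failures[user] = 0

    def is_locked(self, user: str) -> bool:
        return self.failures.get(user, 0) >= MAX_FAILURES

    def verify(self, user: str, sample: list[float]) -> bool:
        """Accept only enrolled, unlocked users whose sample matches the voiceprint."""
        if user not in self.voiceprints or self.is_locked(user):
            return False
        if cosine(self.voiceprints[user], sample) >= ACCEPT_THRESHOLD:
            self.failures[user] = 0
            return True
        self.failures[user] += 1
        return False

    def unlock(self, user: str) -> None:
        """Reset a locked-out user, e.g. after an agent confirms identity another way."""
        self.failures[user] = 0
```

The lockout and unlock paths are the “deal with users who are locked out” work the note mentions. In a real deployment they would also need an audit trail and a fallback channel.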
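Slide 36 describes offline analytics that combine ASR output (swearing) with acoustic cues such as volume. Here is a minimal Python sketch of the keyword and volume parts. It assumes a transcript is already available and that the recorded audio has been split into frames of samples. The word list and the “twice the median loudness” rule are hypothetical, and pitch tracking is left out.

```python
import math

# Hypothetical flag list; a real deployment would maintain its own lexicon.
FLAG_WORDS = {"damn", "hell"}
# Assumed rule: a call is "loud" if its loudest frame exceeds twice the median RMS.
LOUDNESS_FACTOR = 2.0


def rms(frame: list[float]) -> float:
    """Root-mean-square level of one frame of audio samples."""
    if not frame:
        return 0.0
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def flag_call(transcript: str, frames: list[list[float]]) -> dict[str, bool]:
    """Batch (not real-time) check of one recorded call for swearing and raised voices."""
    words = {w.strip(".,!?;:").lower() for w in transcript.split()}
    levels = sorted(rms(f) for f in frames)
    # Middle element of the sorted levels (upper middle when the count is even).
    median = levels[len(levels) // 2] if levels else 0.0
    loud = median > 0 and levels[-1] > LOUDNESS_FACTOR * median
    return {"swearing": bool(words & FLAG_WORDS), "loud": loud}
```

Because this runs over stored calls after the fact, it matches the slide's point that analytics is not a real-time process.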
