2009   |   Westergasfabriek   |   Amsterdam   |   http://eComm.ec
Practical Edge of Speech Technology

Moshe Yudkowsky
www.Disaggregate.com
Moshe Yudkowsky's Presentation at Emerging Communication Conference & Awards 2009 Europe


  1. 2009 | Westergasfabriek | Amsterdam | http://eComm.ec
  2. Practical Edge of Speech Technology. Moshe Yudkowsky, www.Disaggregate.com
  3. “Practical” is Relative: Affordable, Schedule, Achievable
  4. Core Technology. Engines: Speech Recognition (ASR), Text-to-Speech (TTS), Biometrics, Thynometrics (emotions). Analytics: Data mining, problem discovery
  5. Two 20-second Exercises
  6. Two 20-second Exercises. Exercise 1: Travel Agency Automated Reservations
  7. Two 20-second Exercises. Exercise 1: Travel Agency Automated Reservations. Exercise 2: Twitter Update of eComm Conference
  8. Lessons: Exercise 1, Exercise 2
  9. Lessons. Exercise 1: Everyone has the same & simple answers. Call centers;
  10. Lessons. Exercise 1: Everyone has the same & simple answers. Call centers; standard device commands. Speaker
  11. Lessons. Exercise 1: Everyone has the same & simple answers. Call centers; standard device commands. Speaker Independent. Exercise 2: Speaker
  12. Lessons. Exercise 1: Everyone has the same & simple answers. Call centers; standard device commands. Speaker Independent. Exercise 2: Highly Personal Answers. Speaker
  13. Lessons. Exercise 1: Everyone has the same & simple answers. Call centers; standard device commands. Speaker Independent. Exercise 2: Highly Personal Answers. Dictation; voice search. Speaker, Speaker
  14. Lessons. Exercise 1: Everyone has the same & simple answers. Call centers; standard device commands. Speaker Independent. Exercise 2: Highly Personal Answers. Dictation; voice search. Speaker Dependent or Speaker
  15. Network Hardware for Speaker Independent
  16. Network-based systems: Your equipment (“Premises”)
  17. Network-based systems: “Hosted”
  18. Local Hardware
  19. Device-based systems: Local Recognition (ASR, Results). Known text. Complex, personal text
  20. Device-based systems: Hybrid (Voice, Results, ASR). Voice to server, data back to device. Speaker independent (?)
  21. Engines: Speech Recognition (ASR). Summary: You can do almost anything, but the more you do, the more you pay.
  22. Telephony ASR is excellent. Inexpensive: “What city?” → “Amsterdam”. Very expensive: “What is wrong with your phone?” → “I dropped it on the floor, and the screen is cracked, and now I can’t see anything.”
  23. Cautions. No such thing as “speech to text”. Speaker dependent comes closest. Voicemail to text: human assisted. Some telephone ASR is also human assisted.
  24. Speaker Dependent. Desktop computers can do excellent transcription, need corrections. Hand-held devices have more memory & power → better ASR
  25. Engines: Text-to-speech (TTS). Summary: Available in many languages, reasonable quality, sometimes difficult to understand.
  26.
  27. TTS requires language understanding and specific jargon translation:
  28. TTS requires language understanding and specific jargon translation: “Mr.” → “Mister”
  29. TTS requires language understanding and specific jargon translation: “Mr.” → “Mister”; “bbl” → “Be Back Later”
  30. TTS requires language understanding and specific jargon translation: “Mr.” → “Mister”; “bbl” → “Be Back Later”; “287 m” → “about 300 meters”
  31. TTS requires language understanding and specific jargon translation: “Mr.” → “Mister”; “bbl” → “Be Back Later”; “287 m” → “about 300 meters”. Custom voices available
  32. Engines: Biometrics (Speaker Identification, Speaker Verification, Speaker Characterization). Summary: Speaker verification practical but still rare; speaker identification & characterization practical and secret
  33. Speaker Verification (is that really you?). Available, practical. Rare in the US, more prevalent in Australia, Israel, and Canada. Roadblocks: valid fear; fear of biometrics; love of fingerprints; only part of complete solution
  34. Speaker Identification (who are you?). Speaker Characterization (what are you?)
  35. Analytics: Data mining, problem discovery. Summary: Surprisingly useful, expensive
  36. Not a real-time process. Word searches, “speech to text”. Emotion detection by ASR (swearing) and by thynometrics (pitch, volume)
  37. About Disaggregate: Moshe Yudkowsky, Disaggregate, 2952 W. Fargo, Chicago, IL 60645, +1 773 764 8727, www.Disaggregate.com
  38. Headline Sponsor, Platinum Sponsors, Gold Sponsors. 2009 | Westergasfabriek | Amsterdam | http://eComm.ec
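
The TTS slides note that text must be translated before it can be spoken: “Mr.” becomes “Mister”, “bbl” becomes “Be Back Later”, and “287 m” becomes “about 300 meters”. The following is a minimal sketch of that kind of text normalization, not the method of any particular TTS engine. The lookup tables, the function names `normalize` and `round_for_speech`, and the choice to round to one significant digit are all assumptions made for illustration.

```python
import re

# Hypothetical lookup tables; a real TTS front end would need far larger,
# domain-specific dictionaries.
ABBREVIATIONS = {"Mr.": "Mister"}
JARGON = {"bbl": "be back later"}
UNITS = {"m": "meters"}


def round_for_speech(value: int) -> int:
    """Round to one significant digit so the spoken number is easy to follow."""
    if value < 10:
        return value
    magnitude = 10 ** (len(str(value)) - 1)
    # Integer arithmetic, rounding halves upward.
    return (value + magnitude // 2) // magnitude * magnitude


def normalize(text: str) -> str:
    """Expand abbreviations, jargon, and "<number> <unit>" quantities."""
    words = []
    for token in text.split():
        words.append(ABBREVIATIONS.get(token) or JARGON.get(token.lower()) or token)
    text = " ".join(words)

    def speak_quantity(match: re.Match) -> str:
        value, unit = int(match.group(1)), match.group(2)
        rounded = round_for_speech(value)
        prefix = "about " if rounded != value else ""
        return f"{prefix}{rounded} {UNITS[unit]}"

    unit_pattern = "|".join(map(re.escape, UNITS))
    return re.sub(rf"\b(\d+) ({unit_pattern})\b", speak_quantity, text)


if __name__ == "__main__":
    print(normalize("Mr. Smith is 287 m away, bbl"))
```

Each table here covers a single domain. Supporting another jargon (chat abbreviations, medical shorthand, street addresses) means adding another table, which is one reason the slides describe TTS as requiring language understanding rather than plain letter-to-sound conversion.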

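The analytics slides describe emotion detection as a non-real-time process that combines ASR (spotting swearing) with thynometrics (pitch, volume). The sketch below shows one way such a batch pass over recorded calls could be structured. Everything specific in it is an assumption: the `Call` record, the word list, and the ratio thresholds are hypothetical and stand in for whatever a real analytics product measures.

```python
from dataclasses import dataclass
from statistics import mean

# Hypothetical word list and thresholds, chosen only for illustration.
SWEAR_WORDS = {"damn", "hell"}
PITCH_RATIO_LIMIT = 1.3   # mean pitch more than 30% above the caller's baseline
VOLUME_RATIO_LIMIT = 1.5  # mean volume more than 50% above the baseline


@dataclass
class Call:
    transcript: str           # text assumed to come from an ASR engine
    pitch_hz: list            # per-frame pitch estimates
    volume: list              # per-frame loudness estimates
    baseline_pitch_hz: float  # the caller's normal pitch
    baseline_volume: float    # the caller's normal loudness


def swearing_detected(call: Call) -> bool:
    """Word search over the ASR transcript."""
    words = {w.strip(".,!?").lower() for w in call.transcript.split()}
    return bool(words & SWEAR_WORDS)


def voice_agitated(call: Call) -> bool:
    """Compare mean pitch and volume against the caller's baseline."""
    pitch_ratio = mean(call.pitch_hz) / call.baseline_pitch_hz
    volume_ratio = mean(call.volume) / call.baseline_volume
    return pitch_ratio > PITCH_RATIO_LIMIT or volume_ratio > VOLUME_RATIO_LIMIT


def flag_calls(calls: list) -> list:
    """Batch pass over recorded calls; returns indexes worth a human review."""
    return [i for i, c in enumerate(calls)
            if swearing_detected(c) or voice_agitated(c)]


if __name__ == "__main__":
    calm = Call("my phone works fine", [120, 125], [1.0, 1.1], 120, 1.0)
    angry_words = Call("this damn phone!", [120], [1.0], 120, 1.0)
    raised_voice = Call("it is broken", [180, 170], [1.0], 120, 1.0)
    print(flag_calls([calm, angry_words, raised_voice]))
```

The two signals are kept as separate functions because they come from different engines: the word search depends on transcript quality, while the pitch and volume check works even when recognition fails.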