Moshe Yudkowsky's Presentation at Emerging Communication Conference & Awards 2009 Europe
  • Other topics: APIs, IDE, Grammar building tools, VUI tools
  • 1. Ask the person next to you a question as if you were an airline reservations system. Find out what city he wants to fly to. 2. Ask the person next to you for a Twitter update of the conference.
  • Google, for example, does Voice mail transcriptions - poorly.
  • Practical deployment configurations
  • The telco server is also hosted. The voice of the user (the “utterance”) must have a good, clean path to the recognition system.
  • Known text: address book, firmware. Complex: dictation, add-on.
  • Not practical in the network: who is using the phone?
  • We have reviewed the hardware and the types of recognition. I will now review some more specific details about recognition.
  • Not magic. You still have to manage the data; enroll users; deal with users who are locked out; etc.

Presentation Transcript

  • 2009 | Westergasfabriek | Amsterdam | http://eComm.ec
  • Practical Edge of Speech Technology. Moshe Yudkowsky, www.Disaggregate.com
  • “Practical” is Relative: Affordable, Schedule, Achievable
  • Core Technology. Engines: Speech Recognition (ASR), Text-to-Speech (TTS), Biometrics, Thynometrics (emotions). Analytics: Data mining, problem discovery
  • Two 20-second Exercises. Exercise 1: Travel Agency Automated Reservations. Exercise 2: Twitter Update of eComm Conference.
  • Lessons. Exercise 1: Everyone has the same & simple answers; call centers, standard device commands; speaker independent. Exercise 2: Highly personal answers; dictation, voice search; speaker dependent or speaker independent.
  • Network Hardware for Speaker Independent
  • Network-based systems: Your equipment (“Premises”)
  • Network-based systems: “Hosted”
  • Local Hardware
  • Device-based systems: Local Recognition. ASR on the device returns results. Known text; complex, personal text.
  • Device-based systems: Hybrid. Voice to server, data back to device; ASR on the server returns results. Speaker independent (?)
  • Engines: Speech Recognition (ASR). Summary: You can do almost anything, but the more you do, the more you pay.
  • Telephony ASR is excellent. Inexpensive: “What city?” / “Amsterdam”. Very expensive: “What is wrong with your phone?” / “I dropped it on the floor, and the screen is cracked, and now I can’t see anything.”
  • Cautions: No such thing as “speech to text”; speaker dependent comes closest. Voicemail to text: human assisted. Some telephone ASR is also human assisted.
  • Speaker Dependent: Desktop computers can do excellent transcription, need corrections. Hand-held devices have more memory & power → better ASR.
  • Engines: Text-to-speech (TTS). Summary: Available in many languages, reasonable quality, sometimes difficult to understand.
  • TTS requires language understanding and specific jargon translation: “Mr.” → “Mister”; “bbl” → “Be Back Later”; “287 m” → “about 300 meters”. Custom voices available.
  • Engines: Biometrics (Speaker Identification, Speaker Verification, Speaker Characterization). Summary: Speaker verification practical but still rare; speaker identification & characterization practical and secret.
  • Speaker Verification (is that really you?): Available, practical. Rare in the US, more prevalent in Australia, Israel, and Canada. Roadblocks: valid fear; fear of biometrics; love of fingerprints; only part of complete solution.
  • Speaker Identification (who are you?). Speaker Characterization (what are you?).
  • Analytics: Data mining, problem discovery. Summary: Surprisingly useful, expensive.
  • Not a real-time process. Word searches, “speech to text”. Emotion detection by ASR (swearing) and by thynometrics (pitch, volume).
  • About Disaggregate: Moshe Yudkowsky, Disaggregate, 2952 W. Fargo, Chicago, IL 60645, +1 773 764 8727, www.Disaggregate.com
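
The TTS slide's point that text must be translated before it is spoken (“Mr.” → “Mister”, “bbl” → “Be Back Later”, “287 m” → “about 300 meters”) can be illustrated with a small normalization pass. This is only a sketch under stated assumptions, not how any particular TTS engine works: the function names, the two lookup tables, and the rule of rounding distances to one significant digit are all hypothetical, chosen to reproduce the three examples on the slide.

```python
import re

# Hypothetical lookup tables (assumptions for illustration); a real TTS
# front end would need much larger, domain-specific dictionaries.
ABBREVIATIONS = {"Mr.": "Mister"}
JARGON = {"bbl": "be back later"}


def round_to_one_significant_digit(value: int) -> int:
    """Round a non-negative integer to one significant digit (287 -> 300)."""
    if value == 0:
        return 0
    magnitude = 10 ** (len(str(value)) - 1)
    return int(round(value / magnitude)) * magnitude


def normalize_for_tts(text: str) -> str:
    """Rewrite written forms into something a speech synthesizer can say."""
    # 1. Expand abbreviations by plain string replacement.
    for written, spoken in ABBREVIATIONS.items():
        text = text.replace(written, spoken)

    # 2. Expand chat jargon, matching whole words only, in any letter case.
    jargon_pattern = r"\b(" + "|".join(map(re.escape, JARGON)) + r")\b"
    text = re.sub(
        jargon_pattern,
        lambda m: JARGON[m.group(1).lower()],
        text,
        flags=re.IGNORECASE,
    )

    # 3. Turn exact distances in meters into a rounded, speakable phrase.
    text = re.sub(
        r"\b(\d+)\s*m\b",
        lambda m: f"about {round_to_one_significant_digit(int(m.group(1)))} meters",
        text,
    )
    return text


print(normalize_for_tts("Mr. Smith is 287 m away, bbl"))
```

Even this toy version shows why the slide calls for language understanding: a bare “m” after a number is assumed here to mean meters, which fails for minutes or millions, so a production system needs context that simple pattern matching cannot supply.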