Using Asterisk to create “Her”
CAN YOU SPEAK MAGIC? 
Allison Smith Ben Klang 
as “Her”
ALL ABOUT “HER” 
Allison
HOW DOES THIS WORK IN ASTERISK 
•We have access to the same core tech
  •ASR: Automatic Speech Recognition
  •NLU: Natural Language Understanding
  •TTS: Text-to-Speech
  •API: Application Programming Interfaces
•But it’s not just about the tech
  •It has to be useful
  •It has to be usable
USABILITY: “HER” PERSONALITY 
CREATING “HER” PERSONALITY 
•What kind of assistant is she?
  •Straight, no-nonsense
  •Bubbly, friendly
  •Sassy, smart-mouthed
  •Relaxed, laid back
  •Energetic, excited
  •Sultry, provocative
WHY PERSONALITY MATTERS 
HOW DOES “SHE” WORK? 
INSIDE “HER” 
[Diagram of the components: Input/Output Channel (Voice), ASR (Recognizing), NLU (Understanding), API (Researching), TTS (Responding)]
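As a rough illustration of how the stages in the diagram might hand off to one another, here is a minimal Python sketch. Every function name and all of the canned data are hypothetical placeholders, not part of Asterisk or of any vendor API; a real deployment would call an ASR engine, an NLU service, a back-end API and a TTS engine at the marked points.

```python
from dataclasses import dataclass, field


@dataclass
class Intent:
    """Hypothetical NLU result: a named intent plus any extracted slots."""
    name: str
    slots: dict = field(default_factory=dict)


def recognize(audio: bytes) -> str:
    """ASR stage (stub): pretend the audio decodes directly to text."""
    return audio.decode("utf-8")


def understand(text: str) -> Intent:
    """NLU stage (stub): keyword spotting stands in for real NLU."""
    if "weather" in text.lower():
        return Intent("get_weather")
    return Intent("unknown", {"text": text})


def research(intent: Intent) -> str:
    """API stage (stub): canned answers stand in for back-end calls."""
    answers = {"get_weather": "It is sunny today."}
    return answers.get(intent.name, "Sorry, I did not catch that.")


def respond(answer: str) -> bytes:
    """TTS stage (stub): return bytes where synthesized audio would go."""
    return answer.encode("utf-8")


def handle_turn(audio: bytes) -> bytes:
    """Run one conversational turn through all four stages in order."""
    return respond(research(understand(recognize(audio))))
```

Keeping each stage behind its own function means any one of them can be swapped (a different ASR vendor, a different voice) without touching the others.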
RECOGNIZING 
•Different kinds of ASR
  •Dictation / Transcription
  •Grammar-based
  •Hotword
  •Biometrics / Identity
•DTMF has its place
•The Media Connection
  •MRCP
  •HTTP APIs
RECOGNIZING INTERFACES 
•MRCP
  + Streaming recognition = fastest response
  + MRCPv2 is SIP-based
  – Somewhat more complex
  – Mobile-app unfriendly
•HTTP API
  + Mobile-friendly
  + Simple API
  – Record-and-upload = slower response
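The latency trade-off above can be made concrete with a deliberately simple model. This is only a back-of-the-envelope sketch: the function names, the parameters and the idea of a single "real-time factor" are assumptions for illustration, and real figures depend on the engine, the network and the codec.

```python
def streaming_delay(finalize_s: float) -> float:
    """Seconds the caller waits after they stop speaking when audio was
    streamed to the recognizer during the utterance (the MRCP style).
    Most of the decoding already happened, so only finalization remains."""
    return finalize_s


def upload_delay(utterance_s: float, upload_s: float,
                 realtime_factor: float) -> float:
    """Seconds the caller waits after they stop speaking when the whole
    recording is uploaded first and recognized afterwards (the HTTP style).
    realtime_factor is processing time per second of audio (assumed)."""
    return upload_s + utterance_s * realtime_factor
```

With made-up numbers, a 4-second utterance, a 0.5-second upload and a real-time factor of 0.25 give `upload_delay(4.0, 0.5, 0.25) == 1.5` seconds, against whatever small finalization time the streaming recognizer needs.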
ASR Vendors / MRCP HTTP Grammar Dictation Hotword
Nuance ✓ ✓ ✓ ✓ ✓
Lumenvox ✓ ✓
Vestec ✓ ✓ ✓
AT&T Watson ✓ ✓ ✓
Google ✓ ✓
GRAMMAR-BASED RECOGNITION
What can I help you with?
Book a flight
Where are you flying from?
Atlanta
Where would you like to go?
Chicago
Tell me the month and day you want to leave.
August fifth
Tell me the month and day you want to return.
August eighth
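The dialog above can be sketched as one prompt per slot, each with its own small grammar. This is a toy illustration, not a real grammar format such as SRGS: the city list, the truncated ordinal list and all function names are hypothetical, and typed strings stand in for recognized speech.

```python
from typing import Optional, Tuple

CITIES = {"atlanta", "chicago", "las vegas"}
MONTHS = set("january february march april may june july august "
             "september october november december".split())
# Truncated on purpose: a real grammar would cover first through thirty-first.
ORDINALS = {word: n for n, word in enumerate(
    "first second third fourth fifth sixth seventh eighth ninth tenth".split(),
    start=1)}


def match_city(utterance: str) -> Optional[str]:
    """Accept only an exact city name from the grammar."""
    text = utterance.strip().lower()
    return text.title() if text in CITIES else None


def match_month_day(utterance: str) -> Optional[Tuple[str, int]]:
    """Accept only '<month> <ordinal>', e.g. 'August fifth'."""
    words = utterance.strip().lower().split()
    if len(words) == 2 and words[0] in MONTHS and words[1] in ORDINALS:
        return words[0].title(), ORDINALS[words[1]]
    return None


PROMPTS = [
    ("origin", "Where are you flying from?", match_city),
    ("destination", "Where would you like to go?", match_city),
    ("depart", "Tell me the month and day you want to leave.", match_month_day),
    ("return", "Tell me the month and day you want to return.", match_month_day),
]


def run_dialog(replies) -> dict:
    """Fill one slot per prompt, re-asking while a reply is out of grammar.
    Raises StopIteration if the replies run out before every slot is filled."""
    replies = iter(replies)
    slots = {}
    for slot, _prompt, grammar in PROMPTS:
        value = None
        while value is None:
            value = grammar(next(replies))
        slots[slot] = value
    return slots
```

Anything outside the grammar, such as "um, Chicago I guess", is simply rejected and the prompt is repeated, which is why this style needs one narrow question per turn.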
NATURAL LANGUAGE 
“Hm, I want to go to AstriCon in Las Vegas on 
October 21st for three days, and I want the last flight out.” 
✓ Destination
✓ Departing Date
✓ Returning Date
+ Extra Constraint
? Origin
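To show how several slots might be pulled from that single sentence, here is a toy extractor built on regular expressions. It is a sketch under loud assumptions: real NLU engines are statistical rather than regex-based, the city list and number words are hypothetical, the year must be supplied by the caller because the utterance does not state one, and "for three days" is assumed to mean returning three days after departure.

```python
import re
from datetime import date, timedelta

MONTHS = {name: n for n, name in enumerate(
    "January February March April May June July August "
    "September October November December".split(), start=1)}
NUMBER_WORDS = {"one": 1, "two": 2, "three": 3, "four": 4, "five": 5}
CITIES = ["Las Vegas", "Chicago", "Atlanta"]  # hypothetical known cities
DATE_RE = re.compile(r"\b(" + "|".join(MONTHS) + r") (\d{1,2})(?:st|nd|rd|th)?\b")


def extract_slots(utterance: str, year: int) -> dict:
    """Fill whatever slots the sentence mentions; leave the rest as None."""
    slots = {"origin": None, "destination": None, "depart": None,
             "return": None, "constraint": None}
    for city in CITIES:
        if re.search(rf"\b(?:to|in) {city}\b", utterance):
            slots["destination"] = city
        if re.search(rf"\bfrom {city}\b", utterance):
            slots["origin"] = city
    when = DATE_RE.search(utterance)
    if when:
        slots["depart"] = date(year, MONTHS[when.group(1)], int(when.group(2)))
    stay = re.search(r"\bfor (\w+) days?\b", utterance)
    if stay and slots["depart"] and stay.group(1) in NUMBER_WORDS:
        # Assumption: return date = departure date + length of stay.
        slots["return"] = slots["depart"] + timedelta(days=NUMBER_WORDS[stay.group(1)])
    if re.search(r"\blast flight\b", utterance):
        slots["constraint"] = "last flight"
    return slots


def missing(slots: dict) -> list:
    """Names of the slots the assistant still has to ask about."""
    return [name for name, value in slots.items() if value is None]
```

Run on the sentence from the slide, `missing(...)` returns `["origin"]`, so the only follow-up question needed is where the caller is flying from.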
Send a tweet…
Check in at…
What is the weather today?
Get me a table for two…
Who won the game last night?
What is Google trading at?
When is my next appointment?
ZZZZZZzzzzzz…… 
How much have we sold so far this month?
How many sales reps are still in homes?
How many callers are in the queue right now?
Add my manager to this call
When is my next open appointment slot?
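Business questions like these typically come down to mapping an understood intent onto a call into your own systems. The sketch below shows one way that routing could look; the intent names, the handler functions and the in-memory data are all hypothetical stand-ins for a real queue or sales back end.

```python
from typing import Callable, Dict

# Hypothetical in-memory stand-ins for a call queue and a sales system.
FAKE_QUEUE = ["caller-1", "caller-2", "caller-3"]
FAKE_SALES = [1200.0, 800.0, 450.5]


def callers_in_queue() -> str:
    """Answer 'How many callers are in the queue right now?'"""
    return f"There are {len(FAKE_QUEUE)} callers in the queue right now."


def sales_this_month() -> str:
    """Answer 'How much have we sold so far this month?'"""
    return f"We have sold ${sum(FAKE_SALES):,.2f} so far this month."


# One entry per intent the NLU stage is expected to produce (assumed names).
HANDLERS: Dict[str, Callable[[], str]] = {
    "queue_length": callers_in_queue,
    "monthly_sales": sales_this_month,
}


def answer_intent(intent: str) -> str:
    """Route an intent to its handler, with a spoken fallback."""
    handler = HANDLERS.get(intent)
    if handler is None:
        return "Sorry, I can't help with that yet."
    return handler()
```

Adding a new question then means adding one handler and one table entry, while the speech side stays unchanged.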
TEXT-TO-SPEECH 
•Choose your voice carefully 
•Voice DBs’ quality varies widely 
•Tone of voice imparts as much as content 
•Mix TTS with recorded audio 
•Consider context of user 
•Check prosody (rate, pitch, volume) 
•Structure answers similarly to questions 
•Give option to repeat 
•Speech Synthesis Markup Language (SSML)
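Several of these points (mixing recorded audio with TTS, controlling rate, pitch and volume) can be expressed in SSML. The following sketch builds a small SSML document with Python's standard library; the audio URL, the fallback text and the specific prosody values are illustrative assumptions, and support for individual SSML elements varies between TTS engines.

```python
import xml.etree.ElementTree as ET

SSML_NS = "http://www.w3.org/2001/10/synthesis"


def build_ssml(recorded_intro_url: str, answer: str) -> str:
    """Return SSML that plays a recorded intro, pauses, then speaks the answer."""
    speak = ET.Element("speak", {
        "version": "1.0",
        "xmlns": SSML_NS,
        "xml:lang": "en-US",
    })
    # Recorded audio first; the element text is spoken if the file cannot play.
    audio = ET.SubElement(speak, "audio", {"src": recorded_intro_url})
    audio.text = "Here is what I found."
    ET.SubElement(speak, "break", {"time": "300ms"})
    # Prosody: slow the answer down and make it slightly higher and louder.
    prosody = ET.SubElement(speak, "prosody", {
        "rate": "slow", "pitch": "+5%", "volume": "loud",
    })
    prosody.text = answer
    return ET.tostring(speak, encoding="unicode")
```

Building the markup with an XML library rather than string concatenation means characters such as `&` or `<` in the answer text are escaped automatically.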
BEYOND VOICE: GETTING VISUAL
MULTI-MODE APPS 
•Request information by voice
•Receive information via screen
  •SMS
  •Web browser (WebRTC!)
•Allow continued input from alternate source
  •Respond via mouse click *or* voice
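One way to picture a multi-mode exchange is a shared session that both the voice channel and the screen channel read from and write to. The sketch below is purely illustrative: the `Session` class, its method names and the channel labels are hypothetical, and delivery over SMS or a WebRTC page is reduced to appending to a list.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Session:
    """Hypothetical per-user session shared by the voice and screen channels."""
    options: List[str] = field(default_factory=list)
    outbox: List[Tuple[str, str]] = field(default_factory=list)  # (channel, payload)

    def offer(self, options: List[str]) -> None:
        """Speak a short summary and push the full list to the screen."""
        self.options = options
        self.outbox.append(("voice", f"I sent {len(options)} options to your screen."))
        self.outbox.append(("screen", "\n".join(options)))

    def choose(self, source: str, value: str) -> str:
        """Accept the follow-up from a mouse click (an index) or voice (a phrase)."""
        if source == "click":
            return self.options[int(value)]
        if source == "voice":
            for option in self.options:
                if value.lower() in option.lower():
                    return option
        raise ValueError(f"no option matches {value!r} from {source}")
```

Because both input paths resolve to the same option, the rest of the application does not need to know whether the user clicked or spoke.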
QUESTIONS? 
PS: ALLISON WANTS TO BE THE NEXT SIRI!