17. Prosody for natural-sounding reading
Bi-directional recurrent network
Inputs:
• Phonetic features
• Linguistic features
• Semantic word vectors
Targets for segment: pitch, duration, intensity
Stockholm Summit 5/3/2017 (C) Amazon.com 17
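The bi-directional idea on the prosody slide can be illustrated with a minimal sketch: one recurrent pass reads the segments left to right, a second reads them right to left, and the two hidden states for each segment are combined to predict pitch, duration, and intensity. Everything below is an assumption made for illustration (plain tanh recurrent layers, random untrained weights, four made-up input features per segment, the names `rnn_pass` and `predict_prosody`); it is not the model used in production.

```python
import math
import random

def make_matrix(rows, cols, rng):
    """Small random weights; a trained model would learn these instead."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

def matvec(matrix, vector):
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]

def rnn_pass(inputs, w_in, w_rec):
    """Run a tanh recurrent layer over a sequence and return every hidden state."""
    hidden = [0.0] * len(w_rec)
    states = []
    for x in inputs:
        pre = [a + b for a, b in zip(matvec(w_in, x), matvec(w_rec, hidden))]
        hidden = [math.tanh(p) for p in pre]
        states.append(hidden)
    return states

def predict_prosody(segments, hidden_size=8, seed=0):
    """Predict (pitch, duration, intensity) per segment from both directions."""
    rng = random.Random(seed)
    n_in = len(segments[0])
    w_in_f = make_matrix(hidden_size, n_in, rng)
    w_rec_f = make_matrix(hidden_size, hidden_size, rng)
    w_in_b = make_matrix(hidden_size, n_in, rng)
    w_rec_b = make_matrix(hidden_size, hidden_size, rng)
    w_out = make_matrix(3, 2 * hidden_size, rng)

    forward = rnn_pass(segments, w_in_f, w_rec_f)
    # The backward pass reads the reversed sequence; reverse again to realign.
    backward = rnn_pass(segments[::-1], w_in_b, w_rec_b)[::-1]

    predictions = []
    for f, b in zip(forward, backward):
        pitch, duration, intensity = matvec(w_out, f + b)
        predictions.append(
            {"pitch": pitch, "duration": duration, "intensity": intensity}
        )
    return predictions

# Hypothetical feature vectors, one per segment (phonetic, linguistic,
# and semantic features would be concatenated here).
segments = [[0.2, 0.7, 0.1, 0.5], [0.9, 0.1, 0.3, 0.4], [0.4, 0.4, 0.8, 0.6]]
predictions = predict_prosody(segments)
```

Because of the backward pass, the prediction for the first segment depends on the segments that follow it, which is what lets prosody reflect the whole sentence rather than only the text read so far.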
18. Long-form example
“Over a lunch of diet cokes and lobster salad one balmy fall day in
Boston, Joseph Martin, the genial, white-haired, former dean of
Harvard medical school, told me how many hours of pain education
Harvard med students get during four years of medical school.”
Audio samples: Before | After
19. Polly: Life-like speech service
• Converts text to life-like speech
• Low latency, real time
• 47 voices
• 24 languages
• Fully managed
https://aws.amazon.com/polly/
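As a rough illustration of "converts text to life-like speech", the sketch below wraps the Polly `synthesize_speech` call exposed by the AWS SDK for Python (boto3). The helper name `synthesize_speech_bytes`, the default voice "Joanna", and the MP3 output format are choices made for this example, not something the slide specifies. The client is passed in so the function can be exercised without AWS credentials.

```python
def synthesize_speech_bytes(polly_client, text, voice_id="Joanna"):
    """Return the synthesized audio for `text` as raw MP3 bytes.

    `polly_client` is expected to behave like boto3.client("polly").
    """
    response = polly_client.synthesize_speech(
        Text=text,
        OutputFormat="mp3",
        VoiceId=voice_id,
    )
    # The audio arrives as a stream; read it fully into memory.
    return response["AudioStream"].read()

# Usage (assumes boto3 is installed and AWS credentials are configured):
#
#   import boto3
#   polly = boto3.client("polly")
#   audio = synthesize_speech_bytes(polly, "Hello from Stockholm!")
#   with open("hello.mp3", "wb") as f:
#       f.write(audio)
```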
22. Transfer learning from English to German
Hidden layer 1
Hidden layer 2
Last hidden layer
Output layer (two phone sets): æI ɑɜ ʊ … e | æI ɑɜ u: … œ
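One common way to read the diagram is that the hidden layers trained on English are reused for German, while the output layer is replaced because the phone inventory differs. The sketch below shows only that weight-copying step, under stated assumptions: the layer sizes, the phone counts (40 and 45), and the names `build_model` and `transfer` are all hypothetical, and no training is performed.

```python
import random

def init_layer(n_in, n_out, rng):
    """Random weight matrix with n_out rows and n_in columns."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]

def build_model(n_features, hidden_sizes, n_phones, rng):
    """Feed-forward acoustic model: stacked hidden layers plus an output layer."""
    sizes = [n_features] + hidden_sizes
    hidden = [init_layer(a, b, rng) for a, b in zip(sizes, sizes[1:])]
    return {"hidden": hidden, "output": init_layer(sizes[-1], n_phones, rng)}

def transfer(source_model, n_target_phones, rng):
    """Copy the hidden layers and attach a fresh output layer for the new phone set."""
    hidden = [[row[:] for row in layer] for layer in source_model["hidden"]]
    last_hidden_size = len(hidden[-1])
    return {
        "hidden": hidden,
        "output": init_layer(last_hidden_size, n_target_phones, rng),
    }

rng = random.Random(0)
# Hypothetical sizes: 20 input features, three hidden layers, 40 English phones.
english = build_model(20, [16, 16, 16], 40, rng)
# Hypothetical German phone set with 45 entries.
german = transfer(english, 45, rng)
```

After this step the German model would be trained further on German data; the copied hidden layers give it a starting point that already encodes general acoustic structure.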
23. The cocktail party problem
“Alexa!” (spoken over background chatter: blah, blah, blah…)
24. The cocktail party problem
“… play some jazz!” (spoken over background chatter: …blah, blah, blah, blah…)
25. Anchored speech detection
Roland Maas, Sree Hari Krishnan Parthasarathi, Brian King, Ruitong Huang, Björn Hoffmeister. “Anchored Speech Detection.” INTERSPEECH. 2016.
Alexa, play some jazz!
Wake word (“Alexa”): the “anchor”, handled by the encoder
Request (“play some jazz!”): speech consistent with the anchor, handled by the decoder
26. Anchored speech detection
Roland Maas, Sree Hari Krishnan Parthasarathi, Brian King, Ruitong Huang, Björn Hoffmeister. “Anchored Speech Detection.” INTERSPEECH. 2016.
Alexa, play some jazz!
LSTM encoder: speech features from wake word → anchor embedding
LSTM decoder: speech features from request + anchor embedding → endpoint decision at each time t
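The encoder-decoder flow on this slide can be sketched as follows: the encoder reads the wake-word frames and its final state serves as the anchor embedding; the decoder then reads each request frame together with that embedding and emits a per-frame score. This is only a toy under loud assumptions: a plain tanh recurrent cell stands in for the LSTM, the weights are random and untrained, the feature size and the names `encode_anchor` and `decode_request` are invented here, and the sigmoid score is one plausible reading of "endpoint decision", not the paper's exact formulation.

```python
import math
import random

def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

def matvec(matrix, vector):
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]

def rnn_step(weights, x, hidden):
    """One tanh recurrent step over the concatenated input and previous state."""
    return [math.tanh(v) for v in matvec(weights, x + hidden)]

def encode_anchor(wake_frames, w_enc, hidden_size):
    """Summarize the wake-word frames into a fixed-size anchor embedding."""
    hidden = [0.0] * hidden_size
    for frame in wake_frames:
        hidden = rnn_step(w_enc, frame, hidden)
    return hidden

def decode_request(request_frames, anchor, w_dec, w_out, hidden_size):
    """Score each request frame for consistency with the anchor (0 to 1)."""
    hidden = [0.0] * hidden_size
    scores = []
    for frame in request_frames:
        hidden = rnn_step(w_dec, frame + anchor, hidden)
        logit = sum(w * h for w, h in zip(w_out, hidden))
        scores.append(1.0 / (1.0 + math.exp(-logit)))
    return scores

rng = random.Random(0)
n_features, hidden_size = 3, 4  # hypothetical sizes
w_enc = rand_matrix(hidden_size, n_features + hidden_size, rng)
w_dec = rand_matrix(hidden_size, n_features + hidden_size + hidden_size, rng)
w_out = [rng.uniform(-0.1, 0.1) for _ in range(hidden_size)]

wake_frames = [[0.1, 0.5, 0.3], [0.2, 0.4, 0.6]]                      # "Alexa"
request_frames = [[0.3, 0.3, 0.1], [0.7, 0.2, 0.2], [0.1, 0.9, 0.4]]  # "play some jazz!"

anchor = encode_anchor(wake_frames, w_enc, hidden_size)
scores = decode_request(request_frames, anchor, w_dec, w_out, hidden_size)
```

Conditioning every decoder step on the anchor is the design point: speech that does not match the voice that said the wake word can be scored differently from the speaker's own request.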
27. Lex: Build Natural, Conversational Interactions In Voice & Text
• Voice & Text “Chatbots”
• Alexa technology
• Deploy to mobile, IoT & Chat services
Enterprise Connectors:
• Salesforce
• Microsoft Dynamics
• Marketo
• Zendesk
• Quickbooks
• Hubspot
Sample Use Cases:
• Check Weather
• Order tickets
• Check inventory status
• Control devices
https://aws.amazon.com/lex/
29. Longer-form talk at AWS re:Invent 2016
https://www.youtube.com/watch?v=TYRckcVm4WE
Deep Learning in Alexa (MAC202)
30. AWS Deep Learning AMI
One-Click GPU or CPU Deep Learning
Up to ~40k CUDA cores
Apache MXNet
TensorFlow
Theano
Caffe
Torch
Keras
Pre-configured CUDA drivers, MKL
Anaconda, Python3
Ubuntu and Amazon Linux
+ CloudFormation template
+ Container Image
32. $2.5M inaugural competition to advance the field of Conversational AI
CHALLENGE
Create a socialbot that can converse coherently and engagingly on popular topics for 20 minutes
ALEXA, LET’S TALK ABOUT AI
33. The Alexa Fund
$100 MM venture capital fund to invest in early-stage and growth-stage companies
We seek to support best-of-breed entrepreneurs and companies that can innovate on the Alexa service
o Products or services which introduce new and compelling voice use cases to Alexa through hardware or software
o Enabling technologies that can enhance the capabilities of the Alexa service itself, including natural language understanding (NLU), automatic speech recognition (ASR), artificial intelligence (AI), and text-to-speech (TTS)
37. Speech recognition
Sound
→ Signal processing → Feature vectors: [4.7, 2.3, -1.4, …]
→ Acoustic model → Phonetic probabilities: [0.1, 0.1, 0.4, …]
→ Decoder (inference) → Words: increase to 70 degrees
→ Post processing → Text: Increase to 70°
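The last stage of the pipeline, post processing, turns the decoder's word sequence into display text. A minimal sketch of that step, using only the slide's own example, might look like the following. The function name `post_process` and the single rewrite rule (a number followed by "degrees" becomes a degree sign) are assumptions for illustration; a real system would apply many more formatting rules.

```python
import re

def post_process(words):
    """Format recognized words for display: degree sign and sentence capitalization."""
    # Rewrite "<number> degrees" as "<number>°".
    text = re.sub(r"(\d+) degrees\b", r"\1°", words)
    # Capitalize the first character, leaving the rest untouched.
    return text[:1].upper() + text[1:]

print(post_process("increase to 70 degrees"))  # prints "Increase to 70°"
```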