3. Serverless
At the Edge, On IoT Devices
Algorithms
Tons of CPUs
Elastic capacity
Software
Lowering Cost on Data Storage
Data
PBs of existing data
Algorithms and Neural Networks
7. Amazon Polly: Life-like Speech Service
Converts text to life-like speech
50 voices
24 languages
Low latency, real time
Fully managed
8. Amazon Polly: Language Portfolio
Americas:
• Brazilian Portuguese
• Canadian French
• English (US)
• Spanish (US)
A-PAC:
• Australian English
• Indian English
• Japanese
EMEA:
• British English
• Danish
• Dutch
• French
• German
• Icelandic
• Italian
• Norwegian
• Polish
• Portuguese
• Romanian
• Russian
• Spanish
• Swedish
• Turkish
• Welsh
• Welsh English
10. Amazon Polly: Text In, Life-like Speech Out
Text in: “The temperature in WA is 75°F”
Amazon Polly
Speech out: “The temperature in Washington is 75 degrees Fahrenheit”
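The expansion on this slide can be illustrated with a toy sketch. This is not how Polly works internally; the lookup tables and the function name `expand_for_speech` are assumptions made up for illustration, and they cover only the examples in this deck.

```python
import re

# Hypothetical lookup tables, for illustration only.
STATE_NAMES = {"WA": "Washington"}
UNIT_NAMES = {"°F": "degrees Fahrenheit", "°C": "degrees Celsius"}


def expand_for_speech(text: str) -> str:
    """Expand a few abbreviations and unit symbols into speakable words."""
    # Replace standalone two-letter uppercase codes that appear in the table.
    text = re.sub(
        r"\b[A-Z]{2}\b",
        lambda m: STATE_NAMES.get(m.group(0), m.group(0)),
        text,
    )
    # Replace "<number>°F" or "<number>°C" with the number and spelled-out unit.
    text = re.sub(
        r"(\d+)\s*(°[FC])",
        lambda m: f"{m.group(1)} {UNIT_NAMES[m.group(2)]}",
        text,
    )
    return text


print(expand_for_speech("The temperature in WA is 75°F"))
```

With the real service none of this is needed: the text is sent unchanged and the expansion happens on the service side.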
11. Polly: A Focus On Voice Quality & Pronunciation
1. Automatic, Accurate Text Processing
“Today in Mumbai, India, it’s 31°C”
‘"We live for the music" live from the Madison Square Garden.’
12. Polly: A Focus On Voice Quality & Pronunciation
1. Automatic, Accurate Text Processing
2. Intelligible and Easy to Understand
13. Polly: A Focus On Voice Quality & Pronunciation
1. Automatic, Accurate Text Processing
2. Intelligible and Easy to Understand
3. Add Semantic Meaning to Text
“Richard’s number is 2122341237”
“Richard’s number is 2122341237”
Telephone Number
14. Polly: A Focus On Voice Quality & Pronunciation
1. Automatic, Accurate Text Processing
2. Intelligible and Easy to Understand
3. Add Semantic Meaning to Text
4. Speech Effect: Whisper
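Points 3 and 4 are typically expressed with SSML markup. The slides do not show the markup itself, so the sketch below assumes the `say-as` element with `interpret-as="telephone"` and the `amazon:effect` element with `name="whispered"` that Polly documents; the helper function names are made up for illustration.

```python
from xml.sax.saxutils import escape


def say_as_telephone(number: str) -> str:
    """Mark a digit string so it is read as a telephone number."""
    return f'<say-as interpret-as="telephone">{escape(number)}</say-as>'


def whispered(text: str) -> str:
    """Wrap text in the whisper speech effect."""
    return f'<amazon:effect name="whispered">{escape(text)}</amazon:effect>'


def speak(*parts: str) -> str:
    """Join already-escaped fragments into one SSML document."""
    return "<speak>" + "".join(parts) + "</speak>"


ssml = speak(escape("Richard’s number is "), say_as_telephone("2122341237"))
print(ssml)
```

Without the `say-as` hint, a synthesizer has to guess whether 2122341237 is a quantity or a phone number; the markup removes that ambiguity.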
15. TEXT
Market grew by > 20%.
TEXT PROCESSING
WORDS
Market grew by more than twenty percent
PHONEMES
ˈmɑɹ.kət ˈgɹu baɪ ˈmoʊɹ ˈðæn ˈtwɛn.ti pɚ.ˈsɛnt
PROSODY CONTOUR
UNIT SELECTION AND ADAPTATION
Speech units inventory
PROSODY MODIFICATION
STREAMING
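The first two stages of this pipeline (text to words, words to phonemes) can be sketched as a toy lookup. The pronunciations are copied from the slide; the tables, the function names, and the dictionary approach are assumptions for illustration only, since the speaker notes indicate the real system uses machine learning for this step.

```python
import re
from typing import List

# Toy expansion tables that cover only the slide's example sentence.
SYMBOL_WORDS = {">": "more than", "%": "percent"}
NUMBER_WORDS = {"20": "twenty"}

# Pronunciations copied from the slide's PHONEMES row.
LEXICON = {
    "market": "ˈmɑɹ.kət", "grew": "ˈgɹu", "by": "baɪ", "more": "ˈmoʊɹ",
    "than": "ˈðæn", "twenty": "ˈtwɛn.ti", "percent": "pɚ.ˈsɛnt",
}


def to_words(text: str) -> List[str]:
    """Text processing: split into tokens and spell out symbols and numbers."""
    words: List[str] = []
    for token in re.findall(r"[A-Za-z]+|\d+|[>%]", text):
        if token in SYMBOL_WORDS:
            words.extend(SYMBOL_WORDS[token].split())
        else:
            # Numbers missing from the toy table are passed through unchanged.
            words.append(NUMBER_WORDS.get(token, token))
    return words


def to_phonemes(words: List[str]) -> List[str]:
    """Look up each word in the toy lexicon (raises KeyError if unknown)."""
    return [LEXICON[word.lower()] for word in words]


words = to_words("Market grew by > 20%.")
print(" ".join(words))
print(" ".join(to_phonemes(words)))
```

The later stages (prosody contour, unit selection, streaming) operate on audio and are not covered by this sketch.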
17. GoAnimate is a cloud-based, animated video creation platform.
“Amazon Polly gives GoAnimate users the ability to immediately give voice to the characters they animate using our platform.”
Alvin Hung
CEO, GoAnimate
• Multi-language communication
  • Training or HR professionals who have to create content in many languages
• Video preproduction
  • Video makers who need to iterate and fine-tune before the text-to-speech is eventually replaced by a professional voiceover
• K–12 education
  • Students who make videos and don’t have access to professional voices or time for or knowledge of voiceover
With Polly, GoAnimate gives voice to the characters in their animations
18. Royal National Institute of Blind People creates and distributes accessible information in the form of synthesized content
“Amazon Polly delivers incredibly lifelike voices which captivate and engage our readers.”
John Worsfold
Solutions Implementation Manager, RNIB
• RNIB delivers the largest library of audiobooks in the UK for nearly 2 million people with sight loss
• Naturalness of generated speech is critical to captivate and engage readers
• No restrictions on speech redistribution enable RNIB to create and distribute accessible information in the form of synthesized content
RNIB provides the largest library in the UK for people with sight loss
27. Amazon Rekognition Customers
• Digital Asset Management
• Media and Entertainment
• Travel and Hospitality
• Influencer Marketing
• Systems Integration
• Digital Advertising
• Consumer Storage
• Law Enforcement
• Public Safety
• eCommerce
• Education
29. The Advent Of Conversational Interactions
1st Gen: Machine-oriented interactions
30. The Advent Of Conversational Interactions
1st Gen: Machine-oriented interactions
2nd Gen: Control-oriented & translated
31. The Advent Of Conversational Interactions
1st Gen: Machine-oriented interactions
2nd Gen: Control-oriented & translated
3rd Gen: Intent-oriented
32. Lex: Build Natural, Conversational Interactions In Voice & Text
Voice & Text “Chatbots”
Powers Alexa
Voice interactions on mobile, web & devices
Text interaction with Slack & Messenger
Enterprise Connectors (with more coming)
• Salesforce
• Microsoft Dynamics
• Marketo
• Zendesk
• QuickBooks
• HubSpot
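A text interaction with a Lex bot might look roughly like the sketch below, which assumes the `lex-runtime` client and its `post_text` call from the AWS SDK for Python (boto3). The bot name, alias, and user ID are hypothetical placeholders, and the network call is kept in a separate function so the request can be inspected without AWS credentials.

```python
def build_post_text_request(bot_name, bot_alias, user_id, text):
    """Assemble the keyword arguments for a Lex runtime post_text call."""
    return {
        "botName": bot_name,
        "botAlias": bot_alias,
        "userId": user_id,
        "inputText": text,
    }


def send_text(request):
    """Send the request to Lex; needs boto3 and AWS credentials to run."""
    import boto3  # imported here so the builder works without the SDK

    client = boto3.client("lex-runtime")
    response = client.post_text(**request)
    # The response carries the recognized intent, the slots filled so far,
    # and the bot's next message to the user.
    return response.get("intentName"), response.get("slots"), response.get("message")


# "OrderFlowers", "prod", and "user-1" are hypothetical values.
request = build_post_text_request("OrderFlowers", "prod", "user-1", "I want roses")
print(request)
```

The intent and slots in the response are what make the interaction intent-oriented: the application reacts to what the user wants rather than to a fixed command syntax.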
35. Customer Testimonials: HubSpot
“Through Amazon's Lex, we're adding sophisticated natural language processing capabilities that helps
GrowthBot provide a more intuitive UI for our users. Amazon Lex lets us take advantage of advanced A.I.
and machine learning without having to code the algorithms ourselves.”
“HubSpot's GrowthBot is an all-in-one chatbot which helps marketers and sales
people be more productive by providing access to relevant data and services using a
conversational interface. With GrowthBot, marketers can get help creating content,
researching competitors, and monitoring their analytics.”
36. Customer Testimonials: Capital One
“A highly scalable solution, it also offers potential to speed time to market for a new generation of voice
and text interactions such as our recently launched Capital One skill for Alexa.”
“As a heavy user of AWS, Amazon Lex’s seamless integration with
other AWS services like AWS Lambda and AWS DynamoDB is really
appealing.”
37. Elastic GPUs On EC2
Instance Families
M4: General Purpose
T2: Burstable
C5: Compute intensive
R4: Memory intensive
X1: Large memory
I3: High I/O
D2: Dense storage
P2: GPU General Purpose
G2: Graphics intensive
F1: FPGAs
Lightsail: Simple VPS
38. Amazon EC2 P2 Instances
Up to:
40 thousand parallel processing cores
70 teraflops (single precision)
over 23 teraflops (double precision)
Instance Size   GPUs   GPU Peer to Peer   vCPUs   Memory (GiB)   Network Bandwidth*
p2.xlarge       1      -                  4       61             1.25Gbps
p2.8xlarge      8      Y                  32      488            10Gbps
p2.16xlarge     16     Y                  64      732            20Gbps
*In a placement group
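As a small worked example of reading the table, the sketch below picks the smallest P2 size that meets a GPU and memory requirement. The figures are copied from the slide; the `P2Size` type and `smallest_p2` helper are hypothetical names used only for illustration.

```python
from typing import NamedTuple, Optional


class P2Size(NamedTuple):
    name: str
    gpus: int
    vcpus: int
    memory_gib: int


# Values copied from the slide's table, ordered smallest to largest.
P2_SIZES = [
    P2Size("p2.xlarge", 1, 4, 61),
    P2Size("p2.8xlarge", 8, 32, 488),
    P2Size("p2.16xlarge", 16, 64, 732),
]


def smallest_p2(min_gpus: int = 1, min_memory_gib: int = 0) -> Optional[P2Size]:
    """Return the first (smallest) size meeting both minimums, or None."""
    for size in P2_SIZES:
        if size.gpus >= min_gpus and size.memory_gib >= min_memory_gib:
            return size
    return None


print(smallest_p2(min_gpus=4))
```

A job that needs four GPUs lands on p2.8xlarge, because the table has no size between one and eight GPUs.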
39. F1 Instances: Bringing Hardware Acceleration To All
Develop, simulate, debug & compile your code
Package as FPGA Images
F1 Instance with your custom logic running on an FPGA
FPGA Images Available In AWS Marketplace
40. Deep Learning Frameworks
MXNet, Caffe, TensorFlow, Theano, and Torch
Pre-installed components to speed productivity, such as NVIDIA drivers, cuDNN, Anaconda, Python 2 & 3
AWS Integration
Deep Learning AMI
41. Deep Learning Framework Comparison
Criterion | Apache MXNet | TensorFlow | Cognitive Toolkit
Industry Owner | N/A – Apache Community | Google | Microsoft
Programmability | Imperative and Declarative | Declarative only | Declarative only
Language Support | R, Python, Scala, Julia, C++, JavaScript, Go, MATLAB and more | Python, C++, experimental Go and Java | Python, C++, BrainScript
Code Length, AlexNet (Python) | 44 sloc | 107 sloc using TF.Slim | 214 sloc
Memory Footprint (LSTM) | 2.6GB | 7.2GB | N/A
42. DEMO
Real-time detection and tracking on TX1
~10 frames/sec at 640x480 resolution
Autonomous driving at night
43. Some Quick Links on Deep Learning
Recommendation Engine: https://github.com/amzn/amazon-dsstne
It’s a sparse tensor network, suited for recommendations.
MXNet Learning: https://becominghuman.ai/an-introduction-to-the-mxnet-api-part-1-848febdcf8ab
This is a 6-part series that walks through learning MXNet.
The basics are pretty simple, but the service has deep functionality.
You can send the service a simple string of text, and it will generate life-like voice in your choice of 47 different voices. But it’s not naive about the context of the text. For example, the text here contains ‘WA’ and ‘degree F’, which would sound strange if spoken out loud literally, like I just had to. Instead, Polly will automatically expand the text strings ‘WA’ and ‘degree F’ to ‘Washington’ and ‘degrees Fahrenheit’ to create more life-like speech. The developer doesn’t have to do anything: just send the text and get life-like voice back.
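In code, "just send the text" might look like the following minimal sketch, assuming the AWS SDK for Python (boto3) and its Polly `synthesize_speech` call. The voice `Joanna`, the MP3 output format, and the helper names are illustrative choices, not something these notes specify; the network call sits in its own function so the request can be built without AWS credentials.

```python
def build_speech_request(text, voice_id="Joanna", output_format="mp3"):
    """Assemble the keyword arguments for Polly's synthesize_speech call."""
    return {
        "Text": text,
        "TextType": "text",  # plain text; the service handles the expansion
        "VoiceId": voice_id,
        "OutputFormat": output_format,
    }


def synthesize_to_file(request, path):
    """Call Polly and save the audio; needs boto3 and AWS credentials to run."""
    import boto3  # imported here so the builder works without the SDK

    polly = boto3.client("polly")
    response = polly.synthesize_speech(**request)
    with open(path, "wb") as audio_file:
        audio_file.write(response["AudioStream"].read())


request = build_speech_request("The temperature in WA is 75°F")
print(request)
# synthesize_to_file(request, "temperature.mp3")  # uncomment with credentials set up
```

Note that the input string still contains ‘WA’ and ‘°F’ unchanged; no preprocessing is done on the client side.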
19 min.
DEEP LEARNING FOR G2P and PROSODY CONTOUR
For phonemes – about use of Machine Learning
For Contour – again – LSTM: We took audio
Mention units adaptation to make sure that units match each other
STORY BACKGROUND
SOLUTION AND BENEFITS
ADDITIONAL INFORMATION
It’s clear we love us some compute
Workloads are not vanilla - they are of different sizes and constraints
Like building a house: you don’t use just one tool, you use lots of tools in your toolbox.
It’s also true with any of these building block services; the right tool for the job you need to get done.
You can also add graphics processing that looks and operates just like a GPU. You can use the same OpenGL code that your application or game already uses and have it rendered on a GPU. This is a cost-effective fit if you only need a small part of a GPU (the smallest option starts at just 1/8th of a GPU), or if you would like to add graphics processing capabilities to instances that are optimized for I/O, storage, or memory workloads (scaling all the way up to connecting one or more full GPUs).
1/ Develop, simulate, debug & compile your code
2/ HW development kit and FPGA image
3/ Create your own FPGA acceleration that you package into FPGA image
4/ Upload FPGA image
5/ MP