Hands-on with Amazon AI
Julien Simon
Principal Technical Evangelist
julsimon@amazon.fr
@julsimon
Artificial Intelligence At Amazon
Thousands Of Employees Across The Company Focused on AI
Discovery & Search
Fulfilment & Logistics
Enhance Existing Products
Define New Categories Of Products
Bring Machine Learning To All
Amazon AI: Three New Deep Learning Services
Polly: Life-like Speech
Rekognition: Image Analysis
Lex: Conversational Engine
Amazon Polly
What is Amazon Polly
•  A service that converts text into lifelike speech
•  Offers 47 lifelike voices across 24 languages
•  Low latency responses enable developers to build real-time systems
•  Developers can store, replay and distribute generated speech
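A minimal sketch of calling Polly from Python, assuming the boto3 SDK is installed and AWS credentials are configured. The helper names, the default voice "Joanna" and the output path are illustrative choices, not part of the original slides.

```python
def build_synthesize_request(text, voice_id="Joanna", output_format="mp3"):
    """Build the keyword arguments for Polly's SynthesizeSpeech call."""
    return {"Text": text, "VoiceId": voice_id, "OutputFormat": output_format}


def synthesize_to_file(text, path, **kwargs):
    """Call Polly and store the audio so it can be replayed or distributed."""
    import boto3  # imported lazily: needs the AWS SDK and valid credentials

    polly = boto3.client("polly")
    response = polly.synthesize_speech(**build_synthesize_request(text, **kwargs))
    with open(path, "wb") as audio_file:
        audio_file.write(response["AudioStream"].read())
    return path
```

For example, synthesize_to_file("Today in Las Vegas, NV it's 54°F.", "vegas.mp3") would write the generated speech to a local MP3 file.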
Amazon Polly: Quality
Natural sounding speech

A subjective measure of how close TTS output is to human speech.

Accurate text processing
Ability of the system to interpret common text formats such as abbreviations, numerical sequences, homographs etc.
Today in Las Vegas, NV it's 54°F.
"We live for the music", live from the Madison Square Garden. 

Highly intelligible

A measure of how comprehensible speech is.
“Peter Piper picked a peck of pickled peppers.”
Amazon Polly: Language Portfolio
Americas:
•  Brazilian Portuguese
•  Canadian French
•  English (US) 
•  Spanish (US)
A-PAC:
•  Australian English 
•  Indian English 
•  Japanese
EMEA:
•  British English
•  Danish
•  Dutch 
•  French
•  German
•  Icelandic
•  Italian
•  Norwegian 
•  Polish 
•  Portuguese
•  Romanian
•  Russian
•  Spanish
•  Swedish
•  Turkish
•  Welsh
•  Welsh English
Amazon Polly features: SSML
Speech Synthesis Markup Language (SSML) is a W3C recommendation, an XML-based markup language for speech synthesis applications
<speak>
My name is Kuklinski. It is spelled
<prosody rate='x-slow'>
<say-as interpret-as="characters">Kuklinski</say-as>
</prosody>
</speak>
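The SSML document above can also be generated programmatically, which keeps the markup well-formed. This is a sketch using only the Python standard library; the function name spell_out_ssml is hypothetical.

```python
import xml.etree.ElementTree as ET


def spell_out_ssml(sentence, word, rate="x-slow"):
    """Return SSML that reads a sentence, then slowly spells out one word."""
    speak = ET.Element("speak")
    speak.text = sentence + " "
    prosody = ET.SubElement(speak, "prosody", {"rate": rate})
    say_as = ET.SubElement(prosody, "say-as", {"interpret-as": "characters"})
    say_as.text = word
    return ET.tostring(speak, encoding="unicode")


ssml = spell_out_ssml("My name is Kuklinski. It is spelled", "Kuklinski")
```

To have Polly interpret the markup, the string would be passed as Text together with TextType="ssml" in the SynthesizeSpeech request.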
Amazon Polly features: Lexicons
Enables developers to customize the pronunciation of words or phrases
My daughter’s name is Kaja.
<lexeme>
<grapheme>Kaja</grapheme>
<grapheme>kaja</grapheme>
<grapheme>KAJA</grapheme>
<phoneme>"kaI.@</phoneme>
</lexeme>
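Polly lexicons are uploaded as complete W3C Pronunciation Lexicon Specification (PLS) documents, so the lexeme above needs a lexicon wrapper. The sketch below builds one with the standard library. The language "en-US" and the helper name build_lexicon are assumptions; the phoneme string is the X-SAMPA value from the slide.

```python
import xml.etree.ElementTree as ET

PLS_NAMESPACE = "http://www.w3.org/2005/01/pronunciation-lexicon"


def build_lexicon(graphemes, phoneme, lang="en-US", alphabet="x-sampa"):
    """Wrap one lexeme (several spellings, one pronunciation) in a PLS document."""
    lexicon = ET.Element("lexicon", {
        "version": "1.0",
        "xmlns": PLS_NAMESPACE,
        "alphabet": alphabet,
        "xml:lang": lang,  # assumption: the slide does not name a language
    })
    lexeme = ET.SubElement(lexicon, "lexeme")
    for spelling in graphemes:
        ET.SubElement(lexeme, "grapheme").text = spelling
    ET.SubElement(lexeme, "phoneme").text = phoneme
    return ET.tostring(lexicon, encoding="unicode")


lexicon_xml = build_lexicon(["Kaja", "kaja", "KAJA"], '"kaI.@')
```

With boto3, the result could be stored with polly.put_lexicon(Name=..., Content=lexicon_xml) and then referenced through the LexiconNames parameter of synthesize_speech.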
Text-to-speech pipeline (diagram)
TEXT: Market grew by > 20%.
WORDS: Market grew by more than twenty percent
PHONEMES: ˈmɑɹ.kət ˈgɹu baɪ ˈmoʊɹ ˈðæn ˈtwɛn.ti pɚ.ˈsɛnt
Stages shown: TEXT PROCESSING, PROSODY CONTOUR, UNIT SELECTION AND ADAPTATION, PROSODY MODIFICATION, STREAMING
Speech units inventory
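The text processing step (symbols and digits expanded into words) can be illustrated with a toy normalizer. This is only a sketch of the idea for the example sentence on the slide, not how Polly is implemented; it handles the ">" sign before a number and percentages from 0 to 99.

```python
import re

ONES = ("zero one two three four five six seven eight nine ten eleven twelve "
        "thirteen fourteen fifteen sixteen seventeen eighteen nineteen").split()
TENS = "twenty thirty forty fifty sixty seventy eighty ninety".split()


def number_to_words(n):
    """Spell out an integer between 0 and 99."""
    if not 0 <= n < 100:
        raise ValueError("this toy normalizer only covers 0-99")
    if n < 20:
        return ONES[n]
    tens, ones = divmod(n, 10)
    word = TENS[tens - 2]
    return word if ones == 0 else f"{word}-{ONES[ones]}"


def normalize(text):
    """Expand '> N' into 'more than N' and 'N%' into 'N percent' in words."""
    text = re.sub(r">\s*(?=\d)", "more than ", text)
    return re.sub(r"(\d+)%",
                  lambda m: number_to_words(int(m.group(1))) + " percent",
                  text)


print(normalize("Market grew by > 20%."))
```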
Polly Demo
Amazon Lex
Developer Challenges
Speech Recognition
Language Understanding
Business Logic
Disparate Systems
Authentication
Messaging platforms
Scale
Testing
Security
Availability
Mobile
Conversational interfaces need to combine a large number of sophisticated algorithms and technologies
Text and Speech Language Understanding 
Speech Recognition
Natural Language Understanding
Powered by the same Deep Learning technology as Alexa
Deployment to Chat Services
Amazon Lex
Facebook Messenger
Mobile
One-Click Deployment
Authentication
Rich Formatting (cards with a description and buttons or options)
Lex Bot Structure
Utterances: Spoken or typed phrases that invoke your intent
Intents: An intent performs an action in response to natural language user input
Slots: Input data required to fulfill the intent
Fulfillment: Fulfillment mechanism for your intent
Example intent: BookHotel
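The four building blocks can be written down as a single intent definition. The sketch below builds a request dictionary in the shape expected by the Lex model-building PutIntent operation; the slot names, prompts, the built-in slot types AMAZON.US_CITY and AMAZON.DATE, and the helper name are assumptions chosen to match the hotel example.

```python
def slot(name, slot_type, prompt, priority):
    """Describe one required slot and the prompt used to elicit it."""
    return {
        "name": name,
        "slotType": slot_type,
        "slotConstraint": "Required",
        "priority": priority,
        "valueElicitationPrompt": {
            "maxAttempts": 2,
            "messages": [{"contentType": "PlainText", "content": prompt}],
        },
    }


BOOK_HOTEL = {
    "name": "BookHotel",
    # Utterances: phrases that invoke the intent ({City} marks a slot).
    "sampleUtterances": [
        "I want to make my hotel reservations",
        "I want to book a hotel in {City}",
    ],
    # Slots: input data required to fulfill the intent.
    "slots": [
        slot("City", "AMAZON.US_CITY", "What city do you want to book?", 1),
        slot("CheckIn", "AMAZON.DATE", "What date do you check in?", 2),
        slot("CheckOut", "AMAZON.DATE", "What date do you check out?", 3),
    ],
    # Fulfillment: here the parsed intent is simply returned to the client.
    "fulfillmentActivity": {"type": "ReturnIntent"},
}
```

With credentials in place, the definition could be submitted with boto3.client("lex-models").put_intent(**BOOK_HOTEL).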
Utterances
I’d like to book a hotel
I want to make my hotel reservations
I want to book a hotel in New York City

Can you help me book my hotel?
Slots
Slot          Type   Values
Destination   City   New York City, Seattle, London, …
Check In      Date   Valid dates
Check Out     Date   Valid dates
Slot Elicitation
User: I’d like to book a hotel
Bot: Sure, what city do you want to book?
User: New York City
Bot: What date do you check in?
User: Nov 30th
Slots filled: City = New York City, Check In = 11/30/2016
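Slot elicitation amounts to asking for the first required slot that is still empty. The toy loop below illustrates that logic in plain Python; it is a sketch of the concept, not the Lex implementation, and the slot names and prompts are taken from the hotel example.

```python
# Prompts in elicitation order (dicts preserve insertion order).
PROMPTS = {
    "City": "Sure, what city do you want to book?",
    "CheckIn": "What date do you check in?",
}


def next_prompt(slots):
    """Return (slot_name, prompt) for the first unfilled slot, or (None, None)."""
    for name, prompt in PROMPTS.items():
        if slots.get(name) is None:
            return name, prompt
    return None, None


slots = {"City": None, "CheckIn": None}
for answer in ["New York City", "11/30/2016"]:
    name, prompt = next_prompt(slots)
    print(f"Bot: {prompt}")
    print(f"User: {answer}")
    slots[name] = answer
```

Once next_prompt returns (None, None), every required slot is filled and the intent is ready for fulfillment.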
Fulfillment
AWS Lambda Integration: Intents and slots passed to AWS Lambda function for business logic implementation.
Return to Client: User input parsed to derive intents and slot values. Output returned to client for further processing.
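For the AWS Lambda option, the function receives the intent and slot values and returns a response that closes the conversation. The sketch below assumes the Lex (V1) Lambda event and response format; the slot names City and CheckIn and the message wording are illustrative, and no real booking is made.

```python
def lambda_handler(event, context):
    """Fulfill the BookHotel intent from the slots that Lex collected."""
    slots = event["currentIntent"]["slots"]

    # Business logic would go here (call a booking system, etc.).
    message = f"Your hotel in {slots['City']} is booked for {slots['CheckIn']}"

    return {
        "sessionAttributes": event.get("sessionAttributes") or {},
        "dialogAction": {
            "type": "Close",
            "fulfillmentState": "Fulfilled",
            "message": {"contentType": "PlainText", "content": message},
        },
    }
```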
End-to-end flow (diagram)
Speech input: “Book a Hotel in NYC”
Automatic Speech Recognition: “Book a Hotel in NYC”
Natural Language Understanding (Intent/Slot Model, Utterances): Hotel Booking, New York City
Hotel Booking slots: City = New York City, Check In = Nov 30th, Check Out = Dec 2nd
Prompt: “Can I go ahead with the booking?”
Confirmation: “Your hotel is booked for Nov 30th”
Polly: “Your hotel is booked for Nov 30th”
Amazon Lex - Technology
Amazon Lex: Automatic Speech Recognition (ASR) and Natural Language Understanding (NLU)
Same technology that powers Alexa
Speech API, Language API
Developers: Console, SDK (Intents, Slots, Prompts, Utterances)
End-Users: Multi-Platform Clients (Mobile, IoT, Web, Chat) via the API
Input: Speech or Text
Response: Speech (via Polly TTS) or Text
Fulfillment: Action (AWS Lambda), AWS Services
Authentication & Visibility: Cognito, CloudTrail, CloudWatch
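From the client side, sending text input to a bot is a single runtime API call. The sketch below assumes boto3 and the Lex (V1) runtime PostText operation; the bot name "BookTrip" and alias "prod" are hypothetical placeholders.

```python
def build_post_text_request(text, user_id, bot_name="BookTrip", bot_alias="prod"):
    """Build the keyword arguments for the Lex runtime PostText call."""
    return {
        "botName": bot_name,    # hypothetical bot name
        "botAlias": bot_alias,  # hypothetical alias
        "userId": user_id,
        "inputText": text,
    }


def send_text(text, user_id):
    """Send one user message and return the dialog state and the bot's reply."""
    import boto3  # imported lazily: needs the AWS SDK and valid credentials

    client = boto3.client("lex-runtime")
    response = client.post_text(**build_post_text_request(text, user_id))
    return response.get("dialogState"), response.get("message")
```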
Lex Demo
Amazon Rekognition
Amazon Rekognition
Deep learning-based image recognition service
Search, verify, and organize millions of images
Object and Scene Detection
Facial Analysis
Face Comparison
Facial Recognition
Amazon Rekognition API
DetectLabels
Object and Scene Detection
Detect objects, scenes, and concepts in images
Amazon Rekognition API
DetectLabels
{
"Confidence": 94.62968444824219,
"Name": "adventure"
},
{
"Confidence": 94.62968444824219,
"Name": "boat"
},
{
"Confidence": 94.62968444824219,
"Name": "rafting"
},
. . .
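A short sketch of calling DetectLabels and keeping only confident labels, assuming boto3 and an image stored in S3. The bucket and key arguments and the helper names are placeholders; the sample values are the ones shown on the slide.

```python
def labels_above(response, min_confidence=90.0):
    """Return the label names whose confidence reaches the threshold."""
    return [label["Name"] for label in response["Labels"]
            if label["Confidence"] >= min_confidence]


def detect_labels(bucket, key, max_labels=10):
    """Call Rekognition DetectLabels on an image stored in S3."""
    import boto3  # imported lazily: needs the AWS SDK and valid credentials

    rekognition = boto3.client("rekognition")
    return rekognition.detect_labels(
        Image={"S3Object": {"Bucket": bucket, "Name": key}},
        MaxLabels=max_labels,
    )


sample = {"Labels": [
    {"Confidence": 94.62968444824219, "Name": "adventure"},
    {"Confidence": 94.62968444824219, "Name": "boat"},
    {"Confidence": 94.62968444824219, "Name": "rafting"},
]}
print(labels_above(sample))
```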
Amazon Rekognition API
Facial Analysis
Detect face and key facial characteristics
DetectFaces
Amazon Rekognition API
[
{
"BoundingBox": {
"Height": 0.3449999988079071,
"Left": 0.09666666388511658,
"Top": 0.27166667580604553,
"Width": 0.23000000417232513
},
"Confidence": 100,
"Emotions": [
{"Confidence": 99.1335220336914,
"Type": "HAPPY"},
{"Confidence": 3.3275485038757324,
"Type": "CALM"},
{"Confidence": 0.31517744064331055,
"Type": "SAD"}
],
"Eyeglasses": {"Confidence": 99.8050537109375,
"Value": false},
"EyesOpen": {"Confidence": 99.99979400634766,
"Value": true},
"Gender": {"Confidence": 100,
"Value": "Female"}
}
]
DetectFaces
Demographic Data
Facial Landmarks
Sentiment Expressed
Image Quality
Facial Analysis
Brightness: 25.84
Sharpness: 160
General Attributes
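The DetectFaces output is easy to post-process. The sketch below picks the most likely emotion and converts the bounding box, which Rekognition expresses as ratios of the image dimensions, into pixels. The helper names and the 600x600 image size are assumptions; the sample values come from the response shown earlier.

```python
def dominant_emotion(face):
    """Return the emotion type with the highest confidence."""
    return max(face["Emotions"], key=lambda e: e["Confidence"])["Type"]


def pixel_box(face, image_width, image_height):
    """Convert the ratio-based bounding box to (left, top, width, height) pixels."""
    box = face["BoundingBox"]
    return (round(box["Left"] * image_width),
            round(box["Top"] * image_height),
            round(box["Width"] * image_width),
            round(box["Height"] * image_height))


face = {
    "BoundingBox": {"Height": 0.3449999988079071, "Left": 0.09666666388511658,
                    "Top": 0.27166667580604553, "Width": 0.23000000417232513},
    "Emotions": [{"Confidence": 99.1335220336914, "Type": "HAPPY"},
                 {"Confidence": 3.3275485038757324, "Type": "CALM"},
                 {"Confidence": 0.31517744064331055, "Type": "SAD"}],
}
print(dominant_emotion(face), pixel_box(face, 600, 600))
```

With boto3, the full attribute set is requested by passing Attributes=["ALL"] to detect_faces; the faces are returned under the FaceDetails key.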
CompareFaces
Amazon Rekognition API
Face Comparison
Face-based user verification
Amazon Rekognition API
CompareFaces
{
"FaceMatches": [
{"Face": {"BoundingBox": {
"Height": 0.2683333456516266,
"Left": 0.5099999904632568,
"Top": 0.1783333271741867,
"Width": 0.17888888716697693},
"Confidence": 99.99845123291016},
"Similarity": 96
},
{"Face": {"BoundingBox": {
"Height": 0.2383333295583725,
"Left": 0.6233333349227905,
"Top": 0.3016666769981384,
"Width": 0.15888889133930206},
"Confidence": 99.71249389648438},
"Similarity": 0
}
],
"SourceImageFace": {"BoundingBox": {
"Height": 0.23983436822891235,
"Left": 0.28333333134651184,
"Top": 0.351423978805542,
"Width": 0.1599999964237213},
"Confidence": 99.99344635009766}
}
Face Comparison
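For face-based user verification, the CompareFaces response can be reduced to a yes/no decision by checking the similarity scores. The threshold of 90 and the function name are illustrative assumptions; the sample is abridged from the response above.

```python
def is_verified(response, threshold=90.0):
    """True if any face in the target image matches the source face closely enough."""
    return any(match["Similarity"] >= threshold
               for match in response["FaceMatches"])


sample = {
    "FaceMatches": [
        {"Face": {"Confidence": 99.99845123291016}, "Similarity": 96},
        {"Face": {"Confidence": 99.71249389648438}, "Similarity": 0},
    ],
    "SourceImageFace": {"Confidence": 99.99344635009766},
}
print(is_verified(sample))
```

With boto3, the response would come from rekognition.compare_faces(SourceImage=..., TargetImage=..., SimilarityThreshold=...).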
Amazon Rekognition API
Face Recognition
Index and Search faces in a collection
Index
Search
Collection
IndexFaces
SearchFacesByImage
Amazon Rekognition API
IndexFaces: each Face is transformed into a vector<float> and stored in the Collection with its Face ID
f7a3a278-2a59-5102-a549-a12ab1a8cae8 & v1
02e56305-1579-5b39-ba57-9afb0fd8782d & v2
4c55926e-69b3-5c80-8c9b-78ea01d30690 & v3
Collection:
{
f7a3a278-2a59-5102-a549-a12ab1a8cae8,
02e56305-1579-5b39-ba57-9afb0fd8782d,
4c55926e-69b3-5c80-8c9b-78ea01d30690
}
Amazon Rekognition API
SearchFacesByImage: Face, nearest neighbor search in the Collection, Face ID
Collection:
{
f7a3a278-2a59-5102-a549-a12ab1a8cae8,
02e56305-1579-5b39-ba57-9afb0fd8782d,
4c55926e-69b3-5c80-8c9b-78ea01d30690
}
Face Recognition
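The index-then-search idea can be illustrated with a toy nearest neighbor search over face vectors. This is a conceptual sketch only: the three-dimensional vectors, the Euclidean metric and the distance cutoff are invented for illustration, and the slides do not describe the actual vector format or metric used by Rekognition.

```python
import math


def nearest_face(collection, query_vector, max_distance=0.5):
    """Return the Face ID whose stored vector is closest to the query, if close enough."""
    best_id, best_distance = None, float("inf")
    for face_id, vector in collection.items():
        distance = math.dist(vector, query_vector)  # Euclidean distance
        if distance < best_distance:
            best_id, best_distance = face_id, distance
    return best_id if best_distance <= max_distance else None


# "Indexing": Face ID -> feature vector (values are made up).
collection = {
    "f7a3a278-2a59-5102-a549-a12ab1a8cae8": [0.1, 0.9, 0.3],
    "02e56305-1579-5b39-ba57-9afb0fd8782d": [0.8, 0.2, 0.5],
    "4c55926e-69b3-5c80-8c9b-78ea01d30690": [0.4, 0.4, 0.9],
}

print(nearest_face(collection, [0.75, 0.25, 0.5]))
```

In the real service, the equivalent steps are the IndexFaces and SearchFacesByImage API calls against a named collection.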
Rekognition Demo
Julien Simon
julsimon@amazon.fr
@julsimon 
Your feedback 
is important to us!

Amazon AI (March 2017)
