Building Speech Enabled Products with Amazon Polly & Amazon Lex

Introduction to Amazon Polly and Amazon Lex

What to Expect from the Session
• Introduction to Amazon Polly
• Features and Functionality
• Pricing
• Use Cases

Why we built Polly
• Apps using voice to communicate with end users are becoming more common
• Naturalness of generated speech is a key element of user experience
• Integration of speech varies across use cases

What is Polly?
• A service that converts text into lifelike speech
• Low-latency responses enable developers to build real-time systems
• Developers can store, replay and distribute generated speech
• Offers 47 lifelike voices and 24 languages
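Because generated speech can be stored and replayed, an application can avoid synthesizing the same text twice. The sketch below is one possible caching scheme, not a Polly feature: the names `get_speech`, `cache_key` and the `tts-cache` directory are hypothetical, and `polly` is assumed to be any object with a boto3-style `synthesize_speech` method (for example `boto3.client("polly")`).

```python
import hashlib
from pathlib import Path


def cache_key(text: str, voice_id: str, output_format: str) -> str:
    """Build a stable file name from every input that changes the audio."""
    digest = hashlib.sha256(f"{voice_id}|{output_format}|{text}".encode("utf-8"))
    return f"{digest.hexdigest()}.{output_format}"


def get_speech(polly, text: str, voice_id: str = "Joanna",
               output_format: str = "mp3", cache_dir: str = "tts-cache") -> bytes:
    """Return audio bytes, calling Polly only when no cached copy exists."""
    path = Path(cache_dir) / cache_key(text, voice_id, output_format)
    if path.exists():
        return path.read_bytes()  # replay previously generated speech
    response = polly.synthesize_speech(
        Text=text, OutputFormat=output_format, VoiceId=voice_id)
    audio = response["AudioStream"].read()
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_bytes(audio)  # store it for later reuse or distribution
    return audio
```

Keying on voice and format as well as text keeps a "Joanna" MP3 from being served when a different voice or format is requested.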

Polly – Wide Selection of Voices and Languages
Americas:
• Brazilian Portuguese
• Canadian French
• English (US)
• Spanish (US)
APAC:
• Australian English
• Indian English
• Japanese
EMEA:
• Danish
• Dutch
• British English
• French
• German
• Icelandic
• Italian
• Norwegian
• Polish
• Portuguese
• Romanian
• Russian
• Spanish
• Swedish
• Turkish
• Welsh
• Welsh English

Polly – Quality
Natural sounding speech
A subjective measure of how close TTS output is to human speech.
Accurate text processing
Ability of the system to interpret common text formats such as abbreviations, numerical sequences, homographs, etc.
Today in Las Vegas, NV it's 90°F.
"We live for the music", live from the Madison Square Garden.
Highly intelligible
A measure of how comprehensible speech is.
“Peter Piper picked a peck of pickled peppers.”
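Polly's own text processing is not public. As a toy illustration of what "accurate text processing" means, the rule-based normalizer below expands the state abbreviation and temperature from the first example into the words a speaker would say. The lookup tables and the `normalize` function are hypothetical, and homographs such as "live" in the second example need context that simple rules like these cannot provide.

```python
import re

# Hypothetical lookup tables; a production TTS front end has far richer rules.
STATE_NAMES = {"NV": "Nevada", "NY": "New York"}
UNITS = {"°F": "degrees Fahrenheit", "°C": "degrees Celsius"}


def normalize(text: str) -> str:
    """Expand a few non-standard words into the words a speaker would say."""
    # A two-letter state code after "City, " becomes the full state name.
    text = re.sub(r", ([A-Z]{2})\b",
                  lambda m: ", " + STATE_NAMES.get(m.group(1), m.group(1)),
                  text)
    # A number followed by a temperature unit, e.g. "90°F".
    text = re.sub(r"(\d+)(°[FC])",
                  lambda m: f"{m.group(1)} {UNITS[m.group(2)]}",
                  text)
    return text


print(normalize("Today in Las Vegas, NV it's 90°F."))
# Today in Las Vegas, Nevada it's 90 degrees Fahrenheit.
```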

Polly features: Simple-to-use API
Amazon Polly provides an API that enables you to quickly integrate speech synthesis into your application. You send the text you want converted into speech to the Polly API, and Polly immediately returns the audio stream so that your application can stream it directly or store it in a standard audio file format, such as MP3.
Sampling Rate
"Hi. My name is Kuklinski."
Sample Code
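# Requires boto3 and AWS credentials allowed to call Amazon Polly
from boto3 import client

polly = client("polly", region_name="us-east-1")
response = polly.synthesize_speech(
    Text="Hi. My name is Kuklinski.",
    OutputFormat="mp3",
    VoiceId="Joanna")

# AudioStream is a streaming body; write its bytes to an MP3 file
with open("kuklinski.mp3", "wb") as audio_file:
    audio_file.write(response["AudioStream"].read())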
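The sampling-rate comparison can be reproduced with the `SampleRate` parameter of `synthesize_speech`, which takes the rate in hertz as a string. The sketch below assumes a boto3 Polly client (or any object with the same method); `synthesize_at_rate` is a hypothetical helper, and the allow-list is a deliberately conservative subset, so check the current Polly API reference for the full set of accepted values.

```python
# Conservative allow-list of SampleRate values (Hz, as strings) per output
# format; check the current Polly API reference, which may accept more.
VALID_SAMPLE_RATES = {
    "mp3": {"8000", "16000", "22050"},
    "ogg_vorbis": {"8000", "16000", "22050"},
    "pcm": {"8000", "16000"},
}


def synthesize_at_rate(polly, text: str, sample_rate: int,
                       output_format: str = "mp3",
                       voice_id: str = "Joanna") -> bytes:
    """Synthesize text at a specific sampling rate and return the audio bytes."""
    rate = str(sample_rate)  # the API expects the rate as a string
    if rate not in VALID_SAMPLE_RATES[output_format]:
        raise ValueError(f"{rate} Hz is not supported for {output_format}")
    response = polly.synthesize_speech(
        Text=text, OutputFormat=output_format,
        VoiceId=voice_id, SampleRate=rate)
    return response["AudioStream"].read()
```

Calling it in a loop over 8000, 16000 and 22050 with "Hi. My name is Kuklinski." and saving each result makes the quality difference between telephone-grade and higher sampling rates easy to hear.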

Polly features: SSML
Speech Synthesis Markup Language (SSML) is a W3C Recommendation: an XML-based markup language for speech synthesis applications.
<speak>
My name is Kuklinski. It is spelled
<prosody rate='x-slow'>
<say-as interpret-as="characters">Kuklinski</say-as>
</prosody>
</speak>
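When the name comes from user input, building the SSML above by plain string concatenation can produce broken markup (for example, a name containing "&"). A minimal sketch, assuming the same SSML as the slide, escapes the text first and checks that the result parses; `spelled_name_ssml` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def spelled_name_ssml(name: str) -> str:
    """Wrap a name in the slide's SSML: say it, then spell it slowly."""
    safe = escape(name)  # escape &, < and > so user text cannot break the markup
    return ("<speak>"
            f"My name is {safe}. It is spelled "
            "<prosody rate='x-slow'>"
            f'<say-as interpret-as="characters">{safe}</say-as>'
            "</prosody>"
            "</speak>")


ssml = spelled_name_ssml("Kuklinski")
ET.fromstring(ssml)  # raises ParseError if the markup is not well-formed
```

Polly only interprets the tags when told the input is SSML, e.g. `polly.synthesize_speech(Text=ssml, TextType="ssml", OutputFormat="mp3", VoiceId="Joanna")`.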

Polly features: Lexicons
Enables developers to customize the pronunciation of words or phrases
My daughter’s name is Kaja.
<lexeme>
<grapheme>Kaja</grapheme>
<grapheme>kaja</grapheme>
<grapheme>KAJA</grapheme>
<phoneme>"kaI.@</phoneme>
</lexeme>
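The `<lexeme>` above is one entry of a W3C Pronunciation Lexicon Specification (PLS) document, which also needs a `<lexicon>` root declaring the PLS namespace, the phonetic alphabet (here X-SAMPA) and the language. The sketch below builds such a document and uploads it with boto3's `put_lexicon`; the helper names and the lexicon name "kaja" are hypothetical, and `polly` is assumed to be a boto3 Polly client or an object with the same method.

```python
import xml.etree.ElementTree as ET

PLS_NS = "http://www.w3.org/2005/01/pronunciation-lexicon"
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def build_lexicon(entries: dict, lang: str = "en-US",
                  alphabet: str = "x-sampa") -> str:
    """Build a PLS document; entries maps a phoneme to the spellings it covers."""
    ET.register_namespace("", PLS_NS)  # serialize PLS as the default namespace
    root = ET.Element(f"{{{PLS_NS}}}lexicon",
                      {"version": "1.0", "alphabet": alphabet, XML_LANG: lang})
    for phoneme, graphemes in entries.items():
        lexeme = ET.SubElement(root, f"{{{PLS_NS}}}lexeme")
        for grapheme in graphemes:
            ET.SubElement(lexeme, f"{{{PLS_NS}}}grapheme").text = grapheme
        ET.SubElement(lexeme, f"{{{PLS_NS}}}phoneme").text = phoneme
    return ET.tostring(root, encoding="unicode")


def register_lexicon(polly, name: str, content: str) -> None:
    """Upload the lexicon so synthesize_speech can refer to it by name."""
    polly.put_lexicon(Name=name, Content=content)


kaja = build_lexicon({'"kaI.@': ["Kaja", "kaja", "KAJA"]})
```

Once uploaded, the lexicon is applied per request, e.g. `polly.synthesize_speech(Text="My daughter’s name is Kaja.", OutputFormat="mp3", VoiceId="Joanna", LexiconNames=["kaja"])`. Polly restricts lexicon names to short alphanumeric strings.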

Polly is cost-effective
• Pay-as-you-go
• $4 for 1M characters
• Free Tier: 5M characters per month for the first year
• You can store and reuse generated speech
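The pricing above reduces to simple arithmetic: characters beyond any Free Tier allowance are billed at $4 per million. The sketch below uses the rates quoted on this slide as defaults (current pricing may differ); `estimate_monthly_cost` is a hypothetical helper.

```python
def estimate_monthly_cost(characters: int,
                          price_per_million: float = 4.00,
                          free_characters: int = 0) -> float:
    """Estimated USD cost for one month of synthesis at the slide's rates."""
    billable = max(0, characters - free_characters)  # Free Tier covers the rest
    return billable / 1_000_000 * price_per_million


# 10M characters in a Free Tier month: 5M billable at $4 per 1M
print(estimate_monthly_cost(10_000_000, free_characters=5_000_000))  # -> 20.0
```

Because billing is per character synthesized, storing and replaying audio for repeated text keeps it from being billed again.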

Use cases for text-to-speech
• Multi-language communication
  • Training or HR professionals who have to create content in many languages
• Video preproduction
  • Video makers who need to iterate and fine-tune before the text-to-speech is eventually replaced by a professional voiceover
• K–12 education
  • Students who make videos and don’t have access to professional voices, or lack the time or knowledge to record a voiceover

Duolingo voices its language learning service using Polly
Duolingo is a free language learning service where users help translate the web and rate translations.
“With Amazon Polly our users benefit from the most lifelike Text-to-Speech voices available on the market.”
Severin Hacker, CTO, Duolingo
• Spoken language is crucial for language learning
• Accurate pronunciation matters
• Faster iteration thanks to TTS
• As good as natural human speech

GoAnimate is a cloud-based, animated video creation platform.
“Amazon Polly gives GoAnimate users the ability to immediately give voice to the characters they animate using our platform.”
Alvin Hung, CEO, GoAnimate
With Polly, GoAnimate gives voice to the characters in their animations

Amazon Lex: New service for building conversational interfaces using voice and text

What to Expect from the Session
• Introduction to Amazon Lex
• Features and Functionality
• Case Studies
• Building a Slack chat bot using Amazon Lex + AWS Lambda

Advent of Conversational Interactions
1st Gen: Punch Cards & Memory Registers
2nd Gen: Pointers & Sliders
3rd Gen: Conversational Interfaces

Conversational Access
On-Demand
Accessible
Efficient
Natural

Developer Challenges
• Speech Recognition
• Language Understanding
• Business Logic
• Disparate Systems
• Authentication
• Messaging Platforms
• Scale Testing
• Security
• Availability
• Mobile
Conversational interfaces need to combine a large number of sophisticated algorithms and technologies.

Amazon Lex - Features
• Text and speech language understanding: powered by the same technology as Alexa
• Enterprise SaaS connectors: connect to enterprise systems
• Deployment to chat services
• Efficient and intuitive tools to build conversations; scales automatically
• Versioning and alias support

Text and Speech Language Understanding
Speech Recognition + Natural Language Understanding
Powered by the same Deep Learning technology as Alexa

Easy to Build
Efficient and intuitive tools to build conversations

Deployment to Chat Services
[Figure: an Amazon Lex bot deployed to Facebook Messenger and mobile, rendering a card with a description, buttons and options]
• One-Click Deployment
• Authentication
• Rich Formatting

Versioning and Alias Support
Versioning (v1, v2, v3, latest)
• Supported for intents, slot types and bots
• Enables a multi-developer environment
• Roll back to previous versions
Aliases (for example, Dev points to v1, Stage to v2, Prod to v3)
• Deploy different aliases to different platforms
• Run different stacks for dev, stage and prod environments
• Target different user groups with different aliases
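Clients address a bot by name plus alias, so switching environments means switching the alias a client sends. The sketch below assumes the Lex (V1) runtime API exposed by boto3 as `boto3.client("lex-runtime")` and its `post_text` call; the `ALIASES` mapping, the `ask_bot` helper and the bot name "BookTrip" are hypothetical.

```python
# Hypothetical mapping from deployment environment to a Lex bot alias name.
ALIASES = {"dev": "Dev", "stage": "Stage", "prod": "Prod"}


def ask_bot(lex_runtime, bot_name: str, environment: str,
            user_id: str, text: str):
    """Send one text turn to whichever alias serves this environment."""
    response = lex_runtime.post_text(
        botName=bot_name,
        botAlias=ALIASES[environment],  # KeyError for an unknown environment
        userId=user_id,
        inputText=text,
    )
    # "message" is the bot's reply; "dialogState" says whether Lex still
    # needs input (e.g. "ElicitSlot") or the intent is done (e.g. "Fulfilled").
    return response.get("message"), response.get("dialogState")
```

Usage might look like `ask_bot(boto3.client("lex-runtime"), "BookTrip", "stage", "user-42", "book a hotel")`; promoting a new version to production then only requires repointing the Prod alias, not changing client code.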

AWS Mobile Hub Integration
Add conversational bots (Amazon Lex) to mobile apps through AWS Mobile Hub and the AWS Mobile SDKs, alongside features to:
• Authenticate users
• Analyze user behavior
• Track retention
• Store and share media
• Synchronize data
• More …

Amazon Lex – Use Cases
Informational Bots
Chatbots for everyday consumer requests
• News updates
• Weather information
• Game scores …
Application Bots
Build powerful interfaces to mobile applications
• Book tickets
• Order food
• Manage bank accounts …
Enterprise Productivity Bots
Streamline enterprise work activities and improve efficiencies
• Check sales numbers
• Marketing performance
• Inventory status …
Internet of Things (IoT) Bots
Enable conversational interfaces for device interactions
• Wearables
• Appliances
• Auto …

Lex Bot Structure
Utterances
Spoken or typed phrases that invoke your intent
Intents
An intent performs an action in response to natural language user input (example: BookHotel)
Slots
Slots are input data required to fulfill the intent
Fulfillment
The mechanism that carries out your intent once its slots are filled, such as an AWS Lambda function
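A minimal sketch of Lambda fulfillment for the BookHotel intent, assuming the Lex (V1) Lambda event and response format of the time: the event carries `currentIntent` with its `name` and `slots`, and the function answers with a `dialogAction`. The slot names, prompts and reply text are hypothetical, and Lex V2 uses a different event shape.

```python
# Hypothetical slot names and prompts for a BookHotel intent.
PROMPTS = {
    "Location": "Which city?",
    "CheckInDate": "What day do you check in?",
    "Nights": "How many nights?",
}


def _plain(text: str) -> dict:
    return {"contentType": "PlainText", "content": text}


def lambda_handler(event, context):
    """Fulfill a BookHotel intent using the Lex (V1) Lambda event format."""
    intent = event["currentIntent"]
    slots = intent["slots"]
    session = event.get("sessionAttributes") or {}

    # Safety net: ask for the first required slot that is still empty.
    for slot, prompt in PROMPTS.items():
        if not slots.get(slot):
            return {"sessionAttributes": session,
                    "dialogAction": {"type": "ElicitSlot",
                                     "intentName": intent["name"],
                                     "slots": slots,
                                     "slotToElicit": slot,
                                     "message": _plain(prompt)}}

    # All slots are filled: the real booking call would go here.
    reply = (f"Booked {slots['Nights']} night(s) in {slots['Location']} "
             f"starting {slots['CheckInDate']}.")
    return {"sessionAttributes": session,
            "dialogAction": {"type": "Close",
                             "fulfillmentState": "Fulfilled",
                             "message": _plain(reply)}}
```

When the function is configured only as the fulfillment hook, Lex has normally collected every slot already, so the `Close` branch is the common path.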

Save, Build and Publish
Save
Saving your bot preserves the current state on the server.
Build
Building your bot makes its latest changes available for testing.
Publish
Publishing your bot creates a version of your bot and provides an alias to your clients.
Test
Test your bot in a chat window on the console.
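Publishing can also be scripted. The sketch below assumes the Lex (V1) model-building API exposed by boto3 as `boto3.client("lex-models")`, whose `create_bot_version` snapshots the bot's latest built state and whose `put_alias` points an alias at a version. The `publish` helper and the bot and alias names in the usage line are hypothetical.

```python
from typing import Optional


def publish(lex_models, bot_name: str, alias_name: str,
            alias_checksum: Optional[str] = None) -> str:
    """Snapshot the bot's latest state as a new version and point an alias at it."""
    version = lex_models.create_bot_version(name=bot_name)["version"]
    params = {"name": alias_name, "botName": bot_name, "botVersion": version}
    if alias_checksum:
        # Updating an alias that already exists requires its current checksum.
        params["checksum"] = alias_checksum
    lex_models.put_alias(**params)
    return version
```

For example, `publish(boto3.client("lex-models"), "BookTrip", "Prod")` would create the next version and attach the Prod alias to it; for an existing alias, the checksum can be read from `get_bot_alias` first.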

Monitoring
Track your bot with metrics such as:
• Request Latency
• Missed Utterance Count

Customer Testimonials: Capital One
“A highly scalable solution, it also offers potential to speed time to market for a new generation of voice and text interactions such as our recently launched Capital One skill for Alexa.”
“As a heavy user of AWS, Amazon Lex’s seamless integration with other AWS services like AWS Lambda and AWS DynamoDB is really appealing.”

Customer Testimonials: HubSpot
“Through Amazon's Lex, we're adding sophisticated natural language processing capabilities that helps GrowthBot provide a more intuitive UI for our users. Amazon Lex lets us take advantage of advanced A.I. and machine learning without having to code the algorithms ourselves.”
“HubSpot's GrowthBot is an all-in-one chatbot which helps marketers and sales people be more productive by providing access to relevant data and services using a conversational interface. With GrowthBot, marketers can get help creating content, researching competitors, and monitoring their analytics.”

Demo
A “DevOps” chat bot integrated with Slack using Lex + Lambda
