2. What to Expect from the Session
• Introduction to Amazon Polly
• Features and Functionality
• Pricing
• Use Cases
3. Why we built Polly
• Apps using voice to communicate with end-users are
becoming more common
• Naturalness of generated speech is a key element of
user experience
• Integration of speech varies across use cases
4. What is Polly
• A service that converts text into lifelike speech
• Low latency responses enable developers to build
real-time systems
• Developers can store, replay and distribute generated
speech
• Offers 47 lifelike voices and 24 languages
5. Polly – Wide Selection of Voices and Languages
Americas:
• Brazilian Portuguese
• Canadian French
• English (US)
• Spanish (US)
A-PAC:
• Australian English
• Indian English
• Japanese
EMEA:
• Danish
• Dutch
• British English
• French
• German
• Icelandic
• Italian
• Norwegian
• Polish
• Portuguese
• Romanian
• Russian
• Spanish
• Swedish
• Turkish
• Welsh
• Welsh English
6. Polly – Quality
Natural sounding speech
A subjective measure of how close TTS output is to human speech.
Accurate text processing
Ability of the system to interpret common text formats such as abbreviations, numerical
sequences, homographs etc.
Today in Las Vegas, NV it's 90°F.
"We live for the music", live from the Madison Square Garden.
Highly intelligible
A measure of how comprehensible speech is.
“Peter Piper picked a peck of pickled peppers.”
8. Polly features: Simple-to-use API
Amazon Polly provides an API that enables you to quickly integrate speech
synthesis into your application. You simply send the text you want converted
into speech to the Polly API, and Polly immediately returns the audio stream to
your application so your application can begin streaming it directly or store it in
a standard audio file format, such as MP3.
Sampling Rate
"Hi. My name is Kuklinski."
Sample Code
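from boto3 import client

# Create a Polly client (AWS credentials must already be configured)
polly = client("polly", region_name="us-east-1")

# Synthesize the text with the Joanna voice and request MP3 output
response = polly.synthesize_speech(
    Text="Hi. My name is Kuklinski.",
    OutputFormat="mp3",
    VoiceId="Joanna",
)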
9. Polly features: SSML
Speech Synthesis Markup Language
is a W3C recommendation, an XML-based markup language for speech
synthesis applications
<speak>
My name is Kuklinski. It is spelled
<prosody rate='x-slow'>
<say-as interpret-as="characters">Kuklinski</say-as>
</prosody>
</speak>
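The SSML above can be sent to Polly in the same way as plain text. What follows is a minimal sketch, assuming boto3 is installed and AWS credentials are configured. Setting `TextType="ssml"` asks `synthesize_speech` to interpret the input as markup. The helper names `build_spelling_ssml` and `synthesize_ssml` are illustrative and do not come from the slides.

```python
from xml.sax.saxutils import escape


def build_spelling_ssml(name: str) -> str:
    """Return SSML that says the name, then spells it out slowly."""
    safe = escape(name)  # escape &, < and > so the markup stays well-formed
    return (
        "<speak>"
        f"My name is {safe}. It is spelled "
        '<prosody rate="x-slow">'
        f'<say-as interpret-as="characters">{safe}</say-as>'
        "</prosody>"
        "</speak>"
    )


def synthesize_ssml(ssml: str, voice_id: str = "Joanna") -> bytes:
    """Send SSML to Polly and return the MP3 bytes (needs AWS credentials)."""
    import boto3  # imported lazily so the SSML builder works without boto3

    polly = boto3.client("polly", region_name="us-east-1")
    response = polly.synthesize_speech(
        Text=ssml,
        TextType="ssml",  # interpret Text as SSML rather than plain text
        OutputFormat="mp3",
        VoiceId=voice_id,
    )
    return response["AudioStream"].read()


if __name__ == "__main__":
    print(build_spelling_ssml("Kuklinski"))
```

Building the markup in a separate function keeps it testable offline; only `synthesize_ssml` talks to the service.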
10. Polly features: Lexicons
Enables developers to customize the pronunciation of
words or phrases
My daughter’s name is Kaja.
<lexeme>
<grapheme>Kaja</grapheme>
<grapheme>kaja</grapheme>
<grapheme>KAJA</grapheme>
<phoneme>"kaI.@</phoneme>
</lexeme>
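For Polly to use the lexeme above, it has to sit inside a complete pronunciation lexicon (PLS) document that is uploaded to the account. The sketch below shows one way this might look, assuming boto3 and configured AWS credentials. The lexicon name `KajaLexicon`, the `x-sampa` alphabet and the `en-US` language tag are assumptions; the slide shows only the lexeme itself.

```python
import xml.etree.ElementTree as ET
from typing import List

PLS_NS = "http://www.w3.org/2005/01/pronunciation-lexicon"

# The lexeme from the slide wrapped in a PLS <lexicon> root element.
# alphabet and xml:lang are assumptions, not taken from the slide.
LEXICON_XML = """<?xml version="1.0" encoding="UTF-8"?>
<lexicon version="1.0"
    xmlns="http://www.w3.org/2005/01/pronunciation-lexicon"
    alphabet="x-sampa" xml:lang="en-US">
  <lexeme>
    <grapheme>Kaja</grapheme>
    <grapheme>kaja</grapheme>
    <grapheme>KAJA</grapheme>
    <phoneme>"kaI.@</phoneme>
  </lexeme>
</lexicon>
"""


def list_graphemes(lexicon_xml: str) -> List[str]:
    """Parse the lexicon locally and return every spelling it covers."""
    root = ET.fromstring(lexicon_xml.encode("utf-8"))
    return [g.text for g in root.iter(f"{{{PLS_NS}}}grapheme")]


def upload_and_speak(text: str, lexicon_name: str = "KajaLexicon") -> bytes:
    """Store the lexicon, then synthesize text with it applied."""
    import boto3  # imported lazily so the parsing helper works offline

    polly = boto3.client("polly", region_name="us-east-1")
    polly.put_lexicon(Name=lexicon_name, Content=LEXICON_XML)
    response = polly.synthesize_speech(
        Text=text,
        OutputFormat="mp3",
        VoiceId="Joanna",
        LexiconNames=[lexicon_name],  # apply the stored lexicon
    )
    return response["AudioStream"].read()


if __name__ == "__main__":
    print(list_graphemes(LEXICON_XML))
```

Parsing the document locally before uploading is a cheap way to catch malformed XML early.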
13. Polly is cost-effective
• Pay-as-you-go
• $4 for 1M characters
• Free Tier of 5M characters/month - first year
• You can store and reuse generated speech
15. Use cases for text-to-speech
• Multi-language communication
• Training or HR professionals who have to create content in
many languages
• Video preproduction
• Video makers who need to iterate and fine-tune before the
text-to-speech is eventually replaced by a professional
voiceover
• K–12 education
• Students who make videos and don’t have access to
professional voices or time for or knowledge of voiceover
16. Duolingo voices its language learning service using Polly
Duolingo is a free language learning service where
users help translate the web and rate translations.
“With Amazon Polly our users
benefit from the most lifelike
Text-to-Speech voices
available on the market.”
Severin Hacker
CTO, Duolingo
• Spoken language crucial for
language learning
• Accurate pronunciation matters
• Faster iteration thanks to TTS
• As good as natural human speech
17. GoAnimate is a cloud-based, animated video creation
platform.
“Amazon Polly gives
GoAnimate users the ability
to immediately give voice to
the characters they animate
using our platform.”
Alvin Hung
CEO, GoAnimate
• Multi-language communication
• Training or HR professionals who
have to create content in many
languages
• Video preproduction
• Video makers who need to iterate
and fine-tune before the text-to-
speech is eventually replaced by a
professional voiceover
• K–12 education
• Students who make videos and
don’t have access to professional
voices or time for or knowledge of
voiceover
With Polly, GoAnimate gives voice to the characters in their animations
18. Amazon Lex: New service for building
conversational interfaces using voice and
text
19. What to Expect from the Session
• Introduction to Amazon Lex
• Features and Functionalities
• Case Studies
• Building a Slack chat bot using Amazon Lex + AWS
Lambda
23. Amazon Lex - Features
Text and Speech language understanding: Powered by the same technology as
Alexa
Enterprise SaaS Connectors: Connect to enterprise systems
Deployment to chat services
Efficient and intuitive tools to build conversations; scales automatically
Versioning and alias support
24. Text and Speech Language Understanding
Speech
Recognition
Natural Language
Understanding
Powered by the same Deep Learning technology as Alexa
27. Versioning and Alias Support
Versioning
v1 v2 v3 latest
• Supported for Intents, Slots and Bots
• Enables multi-developer environment
• Rollback to previous versions
Alias
v1 Dev
v2 Stage
v3 Prod
• Deploy different aliases to different platforms
• Run different stacks for dev, stage and prod environments
• Target different user groups with different aliases
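The relationship between versions and aliases can be pictured as a small lookup table. The following is a conceptual sketch in plain Python, not the Lex API: the class `AliasTable` and its methods are hypothetical names used only to illustrate how clients keep calling a stable alias while the version behind it changes.

```python
from typing import Dict


class AliasTable:
    """Maps stable alias names to immutable, numbered bot versions."""

    def __init__(self) -> None:
        self._aliases: Dict[str, str] = {}

    def point(self, alias: str, version: str) -> None:
        """Create an alias or move an existing one to another version."""
        self._aliases[alias] = version

    def resolve(self, alias: str) -> str:
        """Return the version that a client calling this alias will reach."""
        return self._aliases[alias]


# Mirror the slide: v1 -> Dev, v2 -> Stage, v3 -> Prod.
table = AliasTable()
table.point("Dev", "v1")
table.point("Stage", "v2")
table.point("Prod", "v3")

if __name__ == "__main__":
    print(table.resolve("Prod"))
    # Rolling back production only moves the alias; clients are unchanged.
    table.point("Prod", "v2")
    print(table.resolve("Prod"))
```

Because clients address the alias rather than a version number, a rollback or promotion is a single change on the bot side.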
28. AWS Mobile Hub Integration
Authenticate users
Analyze user behavior
Store and share media
Synchronize data
More…
Track retention
Conversational Bots
Lex
AWS Mobile SDKs
AWS Mobile Hub
29. Amazon Lex – Use Cases
Informational Bots
Chatbots for everyday consumer requests
• News updates
• Weather information
• Game scores…
Application Bots
Build powerful interfaces to mobile applications
• Book tickets
• Order food
• Manage bank accounts…
Enterprise Productivity Bots
Streamline enterprise work activities and improve efficiencies
• Check sales numbers
• Marketing performance
• Inventory status…
Internet of Things (IoT) Bots
Enable conversational interfaces for device interactions
• Wearables
• Appliances
• Auto…
30. Lex Bot Structure
Utterances
Spoken or typed phrases that invoke
your intent
Intents (example: BookHotel)
An intent performs an action in
response to natural language user
input
Slots
Slots are input data required to fulfill
the intent
Fulfillment
Fulfillment mechanism for your intent
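The four building blocks above can be modeled as a small data structure. This is an illustrative sketch in plain Python, not the Lex API. Only the intent name `BookHotel` comes from the slide; the sample utterances, the slot names `Location` and `Nights`, their prompts and the fulfillment label are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Slot:
    name: str    # input the intent needs before it can be fulfilled
    prompt: str  # question the bot asks when the slot is still empty


@dataclass
class Intent:
    name: str
    utterances: List[str]  # phrases that invoke the intent
    slots: List[Slot] = field(default_factory=list)
    fulfillment: str = "return slot values to the client"


def next_prompt(intent: Intent, filled: Dict[str, str]) -> Optional[str]:
    """Return the prompt for the first empty slot, or None when all are set."""
    for slot in intent.slots:
        if not filled.get(slot.name):
            return slot.prompt
    return None


book_hotel = Intent(
    name="BookHotel",
    utterances=["Book a hotel", "I want to make a hotel reservation"],
    slots=[
        Slot("Location", "Which city will you be staying in?"),
        Slot("Nights", "How many nights will you be staying?"),
    ],
    fulfillment="AWS Lambda function",
)

if __name__ == "__main__":
    print(next_prompt(book_hotel, {"Location": "Las Vegas"}))
```

Once `next_prompt` returns `None`, every slot is filled and the intent is ready to be handed to its fulfillment mechanism.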
31. Save, Build and Publish
Save
Saving your bot
preserves the current
state on the server
Build
Building your bot
creates versions
that you can test.
Test
Test your bot in a
chat window on the
Console
Publish
Publishing your bot will create a
version of your bot and provide
an alias to your clients
33. Customer Testimonials: Capital One
“A highly scalable solution, it also offers potential to speed time to market for a new generation of voice
and text interactions such as our recently launched Capital One skill for Alexa.”
“As a heavy user of AWS, Amazon Lex’s seamless integration with
other AWS services like AWS Lambda and AWS DynamoDB is really
appealing.”
34. Customer Testimonials: HubSpot
“Through Amazon's Lex, we're adding sophisticated natural language processing capabilities that helps
GrowthBot provide a more intuitive UI for our users. Amazon Lex lets us take advantage of advanced A.I.
and machine learning without having to code the algorithms ourselves.”
“HubSpot's GrowthBot is an all-in-one chatbot which helps marketers and sales
people be more productive by providing access to relevant data and services using a
conversational interface. With GrowthBot, marketers can get help creating content,
researching competitors, and monitoring their analytics.”
I will start with a few words on what Amazon Polly is and why we built it, then talk a bit about the features and functionality of the service. Then I will dive deeper into the speech synthesis technology to show how the system works. I will show how easy it is to get started with Polly and how we deal with various kinds of text. I will end with a description of the pricing of the service.
And then, last but not least, we will have a demo of real-life use cases from 2 Amazon Polly customers who agreed to join us on stage today.
So I am sure it will be interesting for you.
When we talked to our customers, the feedback we consistently got was that Text-to-Speech has become a standard element of the UI in their applications. More and more applications use speech as one of the mediums for communicating with end users. One of the critical drivers for adoption of TTS is the naturalness of the generated speech (and so the naturalness of the interaction for end users). Another important element is that customers use speech in a variety of ways, sometimes streaming it directly to their users and sometimes storing or caching it on their premises. Customers want a solution that is easy to integrate with and quick to deploy. Amazon Polly addresses all those needs.
It is a service that converts text to lifelike speech.
It is fully managed and continuously improved: you don't have to worry about updating your TTS on-device. Any time you call Polly you get the most accurate and natural speech.
Polly can speak in 24 languages and offers a variety of 47 voices.
Low-latency responses enable developers to build real-time solutions like conversational systems.
Amazon Polly includes 47 lifelike voices and supports 24 languages. The regions marked in orange on this map are regions where one of our languages is commonly used, so those are regions where you can deliver your app using Amazon Polly.
Rephrase what is it really it is.
The first functionality I would like to present today is SSML. It is an XML-based markup language which enables developers to annotate the text they send to Amazon Polly and in this way control the output. With SSML developers can control various aspects of speech. They can change pitch and speech rate, add an extra pause (for example before the punch line of a joke), or control how the TTS system interprets the text.
Here is an example which would be especially useful for me. Spelling your last name in a foreign language is difficult (at least for me), and each time I check in at a hotel I need to spell out my last name, and I struggle with that. TTS could be very helpful in such a case.
The second functionality of Polly I would like to present is the ability to customize speech output. If we would like to customize the pronunciation of a specific word (for example a foreign name, a nickname, or a product name we just came up with) and make that behavior consistent across my account (in my application), I can use a lexicon to achieve that. This is my favorite example, since this really is my daughter's name. And each time an English speaker tries to pronounce it, it sounds like this:
....
That is perfectly OK in English, because that's how native English speakers pronounce J. But in reality my daughter's name is Polish, so I would like it to be pronounced with the Polish J. If I add such an entry to my lexicon, every time my app finds Kaja written in any of those ways it will pronounce it
...
4 dollars per 1M characters means that:
synthesizing a single average news article costs 3 cents,
synthesizing “A Christmas Carol” by Charles Dickens costs 66 cents, and
synthesizing “The Two Towers” (which is 15 hours of speech) costs 3 dollars and 10 cents.
The Free Tier is 5M characters per month for the first year. Again, this is quite a lot of text.
It means that in a month you can synthesize ~1600 email messages,
770 news articles, or
the first 3 parts of A Song of Ice and Fire (A Game of Thrones, A Clash of Kings and A Storm of Swords).
Useful links: http://cesspit.net/drupal/node/1869/
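The arithmetic behind these figures is simple enough to sketch in code. The example below assumes the pay-as-you-go price of $4 per 1M characters and the 5M-character monthly Free Tier quoted above. The function names are illustrative, and the 165,000-character input is a hypothetical length chosen so that the result matches the 66-cent figure, not a measured character count.

```python
PRICE_PER_MILLION_USD = 4.00          # pay-as-you-go price quoted above
FREE_TIER_CHARS_PER_MONTH = 5_000_000  # Free Tier quoted above (first year)


def cost_usd(characters: int, price: float = PRICE_PER_MILLION_USD) -> float:
    """Cost of synthesizing the given number of characters, in dollars."""
    return characters * price / 1_000_000


def monthly_cost_usd(characters: int, free_tier: bool = True) -> float:
    """Monthly cost after subtracting the Free Tier allowance, if it applies."""
    allowance = FREE_TIER_CHARS_PER_MONTH if free_tier else 0
    billable = max(0, characters - allowance)
    return cost_usd(billable)


if __name__ == "__main__":
    # Hypothetical text length that corresponds to a 66-cent bill.
    print(round(cost_usd(165_000), 2))
    # 6M characters in a Free Tier month: only 1M characters are billed.
    print(monthly_cost_usd(6_000_000))
```

Since generated speech can be stored and reused, the cost applies once per synthesis request rather than once per playback.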
STORY BACKGROUND
Duolingo provides a free language-learning app that uses crowd sourcing to translate web content as users learn.
In 2012, Apple named the Duolingo app iPhone App of the Year.
Duolingo has to be able to scale to manage new users and in addition, expand the service to offer new languages.
SOLUTION AND BENEFITS
Learned about Amazon DynamoDB at re:Invent 2012.
DynamoDB is Duolingo’s largest and most active data store.
The company also uses Amazon Relational Database Service (Amazon RDS) running MySQL with provisioned IOPS storage.
Elastic Load Balancing distributes web and mobile traffic across approximately 170 Amazon Elastic Compute Cloud (Amazon EC2) instances.
Duolingo uses Amazon DynamoDB, Amazon EC2, Elastic Load Balancing, Amazon SNS, Amazon SQS, Amazon VPC, Amazon CloudFront and Amazon CloudWatch.
ADDITIONAL INFORMATION
https://aws.amazon.com/solutions/case-studies/duolingo
Adding conversational interfaces
One example is chat bots
Lex: architecture, features, benefits and a walkthrough of building a Facebook bot on Lex and Lambda.
MOBILE HUB AND Enterprise SaaS connectors: Mobile and enterprise integrations
Integration with Mobile Hub
Deployment to Chat services
Versioning
First one: deep learning
Design for builders
Deploy to chat services
Versioning & aliases
Enterprise connectors
Lex is integrated with AWS Mobile Hub.
Mobile Hub has several other integrations that make building apps easier, such as User Authentication, Analytics, etc.
Lex is one more such integration, making it easy to add conversational interfaces to your apps.
DEMO:
Next I will provide a complete walkthrough of building a serverless chatbot using Lex and Lambda
Part A: Lambda as bot backend
Part B: Lex bot with NLU and management
Part C: Deployment to Facebook
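As a rough idea of what Part A involves, here is a minimal sketch of a Lambda function acting as a bot backend. It assumes the original Lex (V1) Lambda event and response format, in which the intent arrives under `currentIntent` and the function replies with a `dialogAction`. The `BookHotel` handling and the slot names `Location` and `Nights` are hypothetical and are not taken from the demo itself.

```python
from typing import Any, Dict


def close(message: str, fulfilled: bool = True) -> Dict[str, Any]:
    """Build a response that ends the conversation with a text message."""
    return {
        "dialogAction": {
            "type": "Close",
            "fulfillmentState": "Fulfilled" if fulfilled else "Failed",
            "message": {"contentType": "PlainText", "content": message},
        }
    }


def lambda_handler(event: Dict[str, Any], context: Any) -> Dict[str, Any]:
    """Entry point that Lambda invokes with the event passed on by Lex."""
    intent = event["currentIntent"]
    slots = intent.get("slots") or {}

    if intent["name"] == "BookHotel":
        # Hypothetical slots; a real backend would call a booking system here.
        nights = slots.get("Nights")
        location = slots.get("Location")
        return close(f"Booked {nights} nights in {location}.")

    return close("Sorry, I cannot help with that yet.", fulfilled=False)


if __name__ == "__main__":
    sample_event = {
        "currentIntent": {
            "name": "BookHotel",
            "slots": {"Location": "Las Vegas", "Nights": "2"},
        }
    }
    print(lambda_handler(sample_event, None))
```

The handler can be exercised locally with a hand-written event, as in the `__main__` block, before it is connected to a bot.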