Amazon Polly is a service that turns text into lifelike speech. It lets you build applications that talk, enabling entirely new categories of speech-enabled products. In this webinar, you’ll get an overview of how Polly uses advanced deep learning technologies to synthesize speech that sounds like a human voice. You’ll also learn how you can use Polly’s 47 lifelike voices and support for 24 languages to build speech-enabled applications that work in many different countries.
Learning Objectives:
• Learn about the capabilities and features of Amazon Polly
• Learn about the benefits of Amazon Polly
• Learn about the different use cases
• Learn how to get started using Amazon Polly
• Learn how Polly speech audio can be distributed without restriction
• Get an overview of SSML
• Understand what is included in the AWS Free Tier and how to estimate usage costs
2. What to Expect from the Session
Introduction to Amazon Polly
Features and functionalities
Text-to-Speech: Under the Hood
Getting started
Workshop & Demo
Pricing
Q&A
4. Why we built Polly
Apps using voice to communicate with end-users are becoming more common every day
Naturalness of generated speech is a key element of user experience
Integration of speech varies across use cases
5. What is Polly
A service that converts text into lifelike speech
Offers 47 lifelike voices and 24 languages
Low latency responses enable developers to build real-time systems
Developers can store, replay and distribute generated speech
6. Polly – Quality
Natural sounding speech
A subjective measure of how close TTS output is to human speech.
Accurate text processing
Ability of the system to interpret common text formats such as abbreviations, numerical sequences, and homographs.
Today in Las Vegas, NV it's 90°F.
"We live for the music", live from the Madison Square Garden.
Highly intelligible
A measure of how comprehensible speech is.
“Peter Piper picked a peck of pickled peppers.”
7. Polly – Language Portfolio
Americas:
Brazilian Portuguese
Canadian French
English (US)
Spanish (US)
A-PAC:
Australian English
Indian English
Japanese
EMEA:
Danish
Dutch
British English
French
German
Icelandic
Italian
Norwegian
Polish
Portuguese
Romanian
Russian
Spanish
Swedish
Turkish
Welsh
Welsh English
9. Polly features: SSML
Speech Synthesis Markup Language (SSML) is a W3C recommendation: an XML-based markup language for speech synthesis applications
<speak>
My name is Kuklinski. It is spelled
<prosody rate='x-slow'>
<say-as interpret-as="characters">Kuklinski</say-as>
</prosody>
</speak>
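The SSML document above can also be assembled programmatically. The following is a minimal sketch, assuming the text comes from user input and therefore needs XML escaping; the helper name `spell_out_ssml` is hypothetical and is not part of any Polly SDK.

```python
from xml.sax.saxutils import escape, quoteattr


def spell_out_ssml(intro, word, rate="x-slow"):
    """Build an SSML string that reads `intro`, then spells `word` slowly.

    Hypothetical helper for illustration: it only produces the markup and
    does not call any speech service.
    """
    return (
        "<speak>"
        f"{escape(intro)} "                       # escape &, <, > in plain text
        f"<prosody rate={quoteattr(rate)}>"       # quoteattr adds the quotes
        f'<say-as interpret-as="characters">{escape(word)}</say-as>'
        "</prosody>"
        "</speak>"
    )


ssml = spell_out_ssml("My name is Kuklinski. It is spelled", "Kuklinski")
print(ssml)
```

Escaping matters because a stray `&` or `<` in the input would otherwise make the SSML document malformed.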
10. Polly features: Lexicons
Enables developers to customize the pronunciation of words or phrases
My daughter’s name is Kaja.
<lexeme>
<grapheme>Kaja</grapheme>
<grapheme>kaja</grapheme>
<grapheme>KAJA</grapheme>
<phoneme>"kaI.@</phoneme>
</lexeme>
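The lexeme above is only a fragment; in the W3C Pronunciation Lexicon Specification (PLS) it sits inside a `<lexicon>` root element. The sketch below wraps the slide's entry in such a root using Python's standard library. The function name `build_lexicon` is hypothetical, and the `x-sampa` alphabet and `en-US` language defaults are assumptions, since the slide does not state them.

```python
import xml.etree.ElementTree as ET

PLS_NS = "http://www.w3.org/2005/01/pronunciation-lexicon"
XML_NS = "http://www.w3.org/XML/1998/namespace"


def build_lexicon(entries, alphabet="x-sampa", lang="en-US"):
    """Return a PLS document as a string.

    `entries` is a list of (graphemes, phoneme) pairs, where `graphemes`
    is a list of spellings sharing one pronunciation.
    The alphabet and lang defaults are assumptions for illustration.
    """
    ET.register_namespace("", PLS_NS)  # serialize PLS as the default namespace
    root = ET.Element(
        f"{{{PLS_NS}}}lexicon",
        {"version": "1.0", "alphabet": alphabet, f"{{{XML_NS}}}lang": lang},
    )
    for graphemes, phoneme in entries:
        lexeme = ET.SubElement(root, f"{{{PLS_NS}}}lexeme")
        for spelling in graphemes:
            ET.SubElement(lexeme, f"{{{PLS_NS}}}grapheme").text = spelling
        ET.SubElement(lexeme, f"{{{PLS_NS}}}phoneme").text = phoneme
    return ET.tostring(root, encoding="unicode")


lexicon_xml = build_lexicon([(["Kaja", "kaja", "KAJA"], '"kaI.@')])
print(lexicon_xml)
```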
12. Main Challenges of Text-to-Speech
Goal: Convert text into intelligible, accurate, and natural speech
Challenges
• Homographs: words written identically that have different pronunciation
I live in Las Vegas vs This presentation broadcasts live from Las Vegas
• Text normalization: disambiguation of abbreviations, acronyms, units
‘St.’ expanded as ‘street’ or ‘saint’
• Conversion of text to phonemes (Grapheme-to-Phoneme) in languages with complex mapping such as English, e.g. tough, through, though
• Foreign words (déjà vu), proper names (François Hollande), slang (ASAP, LOL) etc.
13. TEXT: Market grew by > 20%.
WORDS: Market grew by more than twenty percent
PHONEMES: ˈmɑɹ.kət ˈgɹu baɪ ˈmoʊɹ ˈðæn ˈtwɛn.ti pɚ.ˈsɛnt
Stages shown in the diagram: TEXT PROCESSING, PROSODY CONTOUR, UNIT SELECTION AND ADAPTATION, PROSODY MODIFICATION, STREAMING
Speech units inventory
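The TEXT to WORDS step for the example "Market grew by > 20%." can be illustrated with a toy normalizer. This is a minimal sketch under strong assumptions: it handles only the ">" sign, the "%" sign, and whole numbers from 0 to 99, and it is not how Polly's text processing is implemented. The names `number_to_words` and `normalize` are hypothetical.

```python
import re

ONES = [
    "zero", "one", "two", "three", "four", "five", "six", "seven", "eight",
    "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
    "sixteen", "seventeen", "eighteen", "nineteen",
]
TENS = ["", "", "twenty", "thirty", "forty", "fifty",
        "sixty", "seventy", "eighty", "ninety"]


def number_to_words(n):
    """Spell out an integer in the range 0..99 (toy coverage only)."""
    if not 0 <= n <= 99:
        raise ValueError("only 0..99 is supported in this sketch")
    if n < 20:
        return ONES[n]
    tens, ones = divmod(n, 10)
    return TENS[tens] if ones == 0 else f"{TENS[tens]}-{ONES[ones]}"


def normalize(text):
    """Expand a few written symbols into the words a speaker would say."""
    text = re.sub(r"(\d+)\s*%", r"\1 percent", text)   # "20%" -> "20 percent"
    text = re.sub(r"\s*>\s*", " more than ", text)     # ">"   -> "more than"
    # Spell out standalone one- or two-digit numbers; longer ones are left as-is.
    text = re.sub(r"\b\d{1,2}\b",
                  lambda m: number_to_words(int(m.group())), text)
    return text


print(normalize("Market grew by > 20%."))
```

A production system would also need context to disambiguate cases such as ‘St.’, which simple substitution rules like these cannot resolve.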
14. Unit Selection
Conversion of phoneme sequence to waveform
Database of recorded audio
Unit – diphone
Coverage of diphones and various features
e.g. Allophonic variation
• Pin vs Spin vs limping
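To make the "Unit – diphone" idea concrete, the sketch below turns a phoneme sequence into the list of diphones (adjacent phoneme pairs) that would need to be looked up in the recorded database. The function name `to_diphones` and the `sil` silence marker at utterance boundaries are assumptions for illustration; the slides do not specify a labeling scheme.

```python
def to_diphones(phonemes, boundary="sil"):
    """Return the diphone labels for a phoneme sequence.

    A diphone spans the transition between two adjacent phonemes, so a
    sequence of N phonemes padded with silence on both sides yields
    N + 1 diphones. The "sil" marker is a hypothetical convention.
    """
    padded = [boundary, *phonemes, boundary]
    return [f"{left}-{right}" for left, right in zip(padded, padded[1:])]


print(to_diphones(["p", "ɪ", "n"]))        # "pin"
print(to_diphones(["s", "p", "ɪ", "n"]))   # "spin"
```

Both words contain the diphone `p-ɪ`, yet the /p/ is pronounced differently in each (aspirated in "pin", unaspirated in "spin"). This is why coverage of plain diphones alone is not enough and features such as allophonic variation must also be represented in the database.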
15. Recording Data for TTS
Process shown in the diagram: tons of text → automatic selection of texts → recording script → few weeks of recordings
Recording script:
• Covers all combinations of diphones and significant features in a language
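The slides do not say how the automatic selection of texts works. One common way to pick a small script that still covers every unit is greedy set cover, sketched below as an assumption rather than as Polly's actual method. The function name `greedy_select` and the toy sentence and unit labels are hypothetical.

```python
def greedy_select(candidates):
    """Pick sentences until every unit seen in the corpus is covered.

    `candidates` maps a sentence id to the set of units (for example,
    diphone labels) it contains. At each step the sentence adding the
    most not-yet-covered units is chosen; ties go to the earliest entry.
    Returns (chosen sentence ids, units left uncovered).
    """
    remaining = set().union(*candidates.values())  # everything we want covered
    pool = dict(candidates)
    chosen = []
    while remaining and pool:
        best = max(pool, key=lambda sid: len(pool[sid] & remaining))
        gain = pool[best] & remaining
        if not gain:          # no sentence adds anything new
            break
        chosen.append(best)
        remaining -= gain
        del pool[best]
    return chosen, remaining


corpus = {
    "s1": {"a-b", "b-c"},
    "s2": {"b-c"},
    "s3": {"c-d", "d-e", "a-b"},
}
script, uncovered = greedy_select(corpus)
print(script, uncovered)
```

Greedy selection does not guarantee the smallest possible script, but it is simple and tends to keep the recording effort low, which matters when the script translates into weeks of studio time.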
16. an error occurred while searching for your route
because snaps weren't all so obedient anymore,
now we say apple again. and we say apple,
general electric soars today. information on general electric
quick breads, zucchini, holiday, crock pot, cake,
so are you still keeping tabs on your old team,
that weighs more than four tons, disrupts the herring's swim
…
An apple a day, keeps …
26. Polly is cost-effective
Pay-as-you-go
$4 per 1M characters
Free Tier: 5M characters per month for the first year
You can store and reuse generated speech
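The pricing above lends itself to a quick estimate. The sketch below uses only the figures stated on this slide ($4 per 1M characters, 5M free characters per month in the first year); current prices may differ, and billing details such as rounding are ignored. The function name `estimate_monthly_cost` is hypothetical.

```python
PRICE_PER_MILLION_USD = 4          # from the slide: $4 per 1M characters
FREE_TIER_CHARS = 5_000_000        # from the slide: 5M characters per month


def estimate_monthly_cost(characters, in_free_tier=True):
    """Estimate one month's cost in USD for `characters` synthesized.

    Assumes the free allowance is simply subtracted from the monthly
    total while the free tier applies; rounding rules are not modeled.
    """
    if characters < 0:
        raise ValueError("characters must be non-negative")
    free = FREE_TIER_CHARS if in_free_tier else 0
    billable = max(0, characters - free)
    return billable * PRICE_PER_MILLION_USD / 1_000_000


print(estimate_monthly_cost(6_000_000))                       # first year
print(estimate_monthly_cost(6_000_000, in_free_tier=False))   # afterwards
```

Because generated speech can be stored and reused, text that is synthesized once and replayed many times is only counted once in such an estimate.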