This document summarizes Amazon Web Services machine learning language services, including Amazon Transcribe, Translate, Polly, Comprehend, and Lex. It provides examples of how companies like Duolingo, Hotels.com, RingDNA, and ClearView Social use these services. The document also discusses how to get started with Amazon's machine learning stack and language APIs.
2. A Long History of ML at Amazon
Personalized
recommendations
Inventing entirely
new customer
experiences
Fulfillment
automation and
inventory
management
Drones Voice driven
interactions
5. The Amazon ML Stack
Platform Services
Application Services
Frameworks & Interfaces
Caffe2 CNTK
Apache
MXNet
PyTorch TensorFlow Torch Keras Gluon
AWS Deep Learning AMIs
Amazon SageMaker AWS DeepLens
Rekognition Transcribe Translate Polly Comprehend Lex
6. Powerful language capabilities
Amazon Transcribe
Automatic conversion of speech into accurate, grammatically correct
text
Amazon Translate Natural and fluent language translation
Amazon Polly Turn text into lifelike speech using deep learning
Amazon Comprehend Discover insights and relationships in text
Amazon Lex Conversational interfaces for text-based and voice-based applications
7. Benefits of ML Language Services
• Easily add intelligence to apps
with pre-trained APIs for speech, transcription, translation, language analysis, and chatbot
functionality
• Connect to comprehensive analytics
including data warehousing, business intelligence, batch processing, stream processing, and
workflow orchestration
• Integrate with the most complete big data platform
including the data lake and database tools to run machine learning workloads
8. Common Language Use Cases
Information Bots
Education
Accessibility
Knowledge Management
Voice of Customer
Applications
Customer Service/
Call Centers
Enterprise
Digital Assistant
Semantic Search
Captioning Workflows
LocalizationPersonalization
10. Amazon Polly
Turn text into lifelike speech using deep learning
Wide selection of voices
and languages
Synchronize speech Fine-grained control Unlimited replay
11. <speak xml:lang="en-US">
The price of this book is <prosody rate="60%">45€</prosody>
</speak>
A Focus OnVoice Quality & Pronunciation
Support for Speech Synthesis Markup Language (SSML)Version 1.0
https://www.w3.org/TR/speech-synthesis
12. Duolingo
Duolingo is the most popular language-
learning platform and the most
downloaded education app in the world,
with more than 170 million users.
They have run six A/B tests, testing an
Amazon Polly voice against a voice from
otherTTS providers.
For all of these experiments, the
winning condition was theAmazon
Polly voice
https://aws.amazon.com/blogs/machine-
learning/powering-language-learning-on-
duolingo-with-amazon-polly/
15. Languages
Available now
English to/from:
- Spanish
- Portuguese
- German
- French
- Arabic
- Simplified Chinese
Coming soon
- Japanese
- Russian
- Italian
- Traditional Chinese
- Turkish
- Czech
16. Hotels.com
MachineTranslation
« At Hotels.com, we are committed to offering all of our customers the most
relevant and up to date information about their destination.To achieve that,
we operate 90 localized websites in 41 languages.We have more than 25M
customer reviews and more are coming in every day, making a great
candidate for machine translation. Having evaluated AmazonTranslate and
several other solutions, we believe that AmazonTranslate presents a quick,
efficient and most importantly, accurate solution »
Matt Fryer,VP and Chief Data Science Officer, Hotels.com
18. AmazonTranscribe
Automatic conversion of speech into accurate, grammatically
correct text
Support for
telephony audio
Timestamp
generation
Intelligent punctuation
and formatting
Recognize multiple
speakers
Custom
vocabulary
Multiple
languages
19. ringDNA
RingDNA is an end-to-end communications
platform for sales teams.
Hundreds of enterprise organizations use
RingDNA to increase productivity, engage in
smarter sales conversations, gain predictive
sales insights and improve their win rate.
Speech toText
"A critical component of RingDNA’s Conversation AI
requires best of breed speech-to-text to deliver
transcriptions of every phone call. RingDNA is excited
about Amazon Transcribe since it provides high-quality
speech recognition at scale, helping us to better
transcribe every call to text"
Howard Brown, CEO & Founder, RingDNA
https://www.youtube.com/watch?v=1ZJ_f1bDdog
23. ClearView Social
ClearView Social enables a company’s
employees to share approved content on
LinkedIn,Twitter, and other social networks.
To eliminate the reliance on manual tagging,
ClearView Social turned to Amazon Comprehend
to find insights and relationships in text.
https://aws.amazon.com/blogs/machine-learning/clearview-social-
uses-amazon-comprehend-to-measure-the-impact-of-social-sharing/
Natural Language Processing
"We use Amazon Comprehend to read an article and
extract topics, which are automatically tagged using
machine learning. This automatic tagging helps
customers easily estimate the market value of their
engagement according to the current bid prices from the
Google AdWords API"
Bill Boulden, CTO,ClearView Social
26. Amazon Lex
Building conversational interfaces into any application using voice and
text
Integrated
development in the
AWS console
Trigger
Lambda
functions
Multi-step
conversations
One-click deployment Fully managed
27. Lex Use Case: Digital Assistant to Book a Hotel
Book hotel
NYC
“Book a hotel in
NYC”
Automatic speech recognition
Hotel booking
New York City
Natural language understanding Intent/slot
model
UtterancesHotel booking
City New York City
Check in November 30
Check out December 2
“Your hotel is booked for
November 30.”
Amazon Polly
Confirmation: “Your hotel is
booked for November 30.”
“Can I go ahead
with the booking?”
a
in
28. Liberty Mutual
https://www.youtube.com/watch?v=TeLvFqLW_0A
Chatbots
«With the growing importance of cognitive
technologies in every area of business, we were
delighted to see Amazon Lex, with its Speech
Recognition and Natural Language Understanding
technologies, added to the broad portfolio of
services AWS provides. Amazon Lex integrates
easily into our existing applications, as well as our
new cloud-native serverless architectures. »
Gillian Armstrong,Technologist, Liberty Mutual
At Amazon, we’ve been making huge investments in ML since 1995. It’s important to remember that many of the capabilities customers experience with Amazon are driven by machine learning. Our recommendations engine is driven by ML. The paths that optimize robotic picking routes in our fulfillment centers are driven by ML; and our supply chain, forecasting, and capacity planning are informed by ML algorithms. Our drone initiative, Prime Air, and the computer vision technology in our new retail experience, Amazon Go, are driven by deep learning.
We have thousands of engineers at Amazon committed to machine learning and deep learning, and it’s a big part of our heritage. Across every industry, motivated, technically skilled companies are starting to meaningfully use ML in their businesses too, and more are choosing AWS as the place to do it than anybody else.
We’re excited to have a broad set of customers running ML on AWS today.
[ADD: Conde Nas – using P2 and TF to detect hand bags https://technology.condenast.com/story/handbag-brand-and-color-detection]
NOTE: MTurk and Amazon ML
In the language space, Amazon Transcribe is an automatic speech recognition (ASR) service that makes it easy for developers to add speech to text capability to their applications.
Amazon Translate is a neural machine translation service that delivers fast, high-quality, and affordable language translation.
Amazon Polly allows customers to easily turn text into life-like speech across a large number of voices and languages. Duolingo, the world’s most popular language-learning platform with more than 170 million users, ran several voice tests of Amazon Polly against their previous TTS providers, and Amazon Polly voice won in every scenario.
Amazon Comprehend is a natural language processing (NLP) service that uses machine learning to find insights and relationships in text. Amazon Comprehend can indentify the language of the text; extract key phrases, places, people, brands, or events; understand how positive or negative the text is; and automatically organize a collection of text files by topic.
Finally, Amazon Lex uses the automatic speech recognition and natural language understanding technology that fuels Amazon Alexa to allow developers to quickly build intelligent conversational applications. Liberty Mutual uses Lex to provide a chatbot across a broad range of devices to help its employees quickly get answers to questions and complete common tasks.
Machine Learning Language services extract actionable insights from data in real time
Easily add intelligence to apps with pre-trained APIs for speech, transcription, translation, language analysis, and chatbot functionality
Connect to comprehensive analytics: Connect to services including data warehousing, business intelligence, batch processing, stream processing, and workflow orchestration
Platform integration: Integrate with the most complete platform for big data including the data lake and database tools to run machine learning workloads
ML Language use cases:
Voice of Customer Applications - Analyze what customer are saying about your brand, products and services
Information Bots - Build chatbots for everyday requests like news updates, weather info, sport scores
Education - Improve the usability of learning tools by adding speech to visual experiences across a wide variety of languages
Localization - Use real-time machine translation to fast-track localization projects and enable multilingual interaction
Semantic Search - Make search smarter by searching by keyphrase, sentiment and topic
Enterprise Digital Assistant - Streamline manual activities like checking sales performance or inventory status
Accessibility - Create and distribute information in the form of synthesized speech for visually impaired people
Knowledge Management - Organize and categorize documents for optimal discovery of insights
Customer Service/Call Centers - Improve customer engagements with voice and conversational customer experiences in contact centers
Captioning Workflows - Generate time-stamped subtitles that can be translated and displayed along with video content to improve access
Personalization - Use insights to personalize customer experiences and increase loyalty
Polly also support Speech Synthesis Markup Language (SSML) Version 1.0
The Voice Browser Working Group has sought to develop standards to enable access to the Web using spoken interaction.
You have customers all over the world who speak several different languages, so you want to translate it into a lot of different languages. Here again, the way that people have traditionally solved this problem is that they've hired translation agencies, which are expensive and time-consuming. They only pick out their most-important pieces to do, and they leave all that value on the table. Amazon Translate, which automatically translates text between languages, helps to solve this problem.
Translate is great for use cases that require real-time translation. These are things like live customer support or business communications with social media. You can also translate an entire bucket at one time in a batch operation.
Soon, you’ll be able to use the service to recognize the source language on the fly, so you don't even have to specify what language you're trying to translate from. And like all our other services, you will find this to be very cost-effective.
One of the things that's been interesting is that there's so much data now that's being locked up in audio and video files. The problem is that it’s really hard to search audio well. The best way to do it is to convert it from audio to text.
Traditionally how people have done this is that they've hired manual transcription agencies. They're expensive and they're time-consuming. So people typically only pick out the very most important things they want to transcribe, and they leave all the rest on the table. All this data and all this value is sitting out there, not being taken advantage of and leveraged. Amazon Transcribe solves this problem.
Transcribe does long-form automatic speech recognition. It can analyze any WAV or MP3 audio file and return text. It's super-useful for all kinds of things, like call logs and subtitles for videos or capturing what's said in a presentation or a meeting. We‘ve started with English and Spanish, but we'll have many more languages coming soon.
One of the things that we do with this service which is different from other transcription services is it won't show up to you as just one long, uninterrupted string of text like you'll find in other transcription services. Instead, we use machine learning to add in punctuation and grammatical formatting so the text you get back is immediately usable. Then we time-stamp every word so you can align subtitles to the video so that it's much easier to index.
The service supports high-quality audio, but also because so much of the audio data today is generated from phones, you have to be able to deal with lower-quality, low-bit-rate audio. And we uniquely support that as well.
Very soon, you’ll also be able to distinguish between multiple speakers, and add your own custom libraries and vocabularies because there are certain words that you may use in a different way than others that you want the service to understand so it could be used quickly the way you mean it.
Comprehend can understand documents, social networking posts, and really any other data in AWS. You simply provide the data that you're storing in your data lake in S3 via the Comprehend API.
Then Comprehend uses natural language processing to give you highly accurate information about what it contains in four categories: entities, which are really people and places and dates and brands and quantities; key phrases that add significance to the language; what language is being used; and sentiment, whether or not people are feeling positively or negatively about that content.
I'll give you a practical example, with Hotels.com, which is an entity of Expedia. They have thousands and thousands of customer views and comments on the various hotels that people have used their site to stay in, and what they liked or didn't like.
Before, it just looks like this big blob of data. It's really hard to pull out what matters and doesn't matter. They’re using Comprehend to understand, what are the unique characteristics that people really like about each hotel? Whether it's that they have a great dinner menu or great bathroom features or whatever it is. And what do they not like? So that when they're making recommendations to their users, they have the right way to differentiate and distinguish each hotel so people can make the right choices for themselves.
The other thing that Comprehend has, which you won't find anywhere else, is the ability not just to look at a single document at a time, but to look at millions of documents in order to identify the topics within these documents.
It's something that we call topic modeling. And it's incredibly useful if you think about it. If you're a publisher with thousands of articles, it makes it easy for you to just to use topic model and to figure out how to show just the business articles to the customers you think are interested in business. Or just the political articles to the people you think are just interested in politics. Or if you're a healthcare provider, you may want to group all these documents based on diagnosis or based on symptoms.
This is a much easier way to understand what is in your data and then group them in a way that you could actually display news that's useful for your customers.
Comprehend does this in an incredibly efficient manner. Just to give you an example, if you tried to take 300 documents, each about a megabyte each in size, Comprehend can build a topic model in just 45 minutes for $1.80. It completely changes the meaning that you can get from your content.
With Lex, any application running on the web, a mobile app, or a device, can send natural language - as both text or speech - to Lex using an API or SDK. Lex will apply ASR and NLU to the incoming message to understand the intent of the user, so to understand what the question is, and map that to a Lambda function which will process the information, and then form a response, which will be passed back to the user as either text, or will use Polly automatically to generate a voice response.
There's an integrated development console that lets you define what phrases you want to elicit what responses or additional questions...in either text or speech (where lambda triggers the response back)…
The service has an integrated development console - including the ability to be able to quickly specify example phrases which Lex will expand to capture more possible input from your users, and build sophisticated multi-step conversations - and you can build and test these models, or bots, right in the AWS console.
You can then deploy the application to a custom platform or to Facebook Messenger, Slack, Kik, and Twilio.
We also provide a collection of enterprise connectors: Salesforce, Microsoft Dynamics, Marketo, Zendesk, Quickbooks, and Hubspot
Which help you build new interfaces around existing enterprise data, and the whole thing is fully managed.