Amazon Rekognition Image and Video make it easy to add image and video analysis to your applications. You just provide an image or video to the Rekognition API, and the service can identify the objects, people, text, scenes, and activities, as well as detect any inappropriate content. Amazon Rekognition also provides highly accurate facial analysis and facial recognition, even with live stream video. You can detect, analyze, and compare faces for a wide variety of user verification, cataloging, people counting, and public safety use cases.
For example, Marinus Analytics is using Amazon Rekognition to identify, locate, and rescue missing persons. Marinus added facial recognition to Traffic Jam, its suite of tools for law enforcement agencies working on these types of investigations, and has effectively turned facial recognition technology against the vast secret networks of human traffickers.
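To make this concrete, here is a minimal sketch of calling Rekognition's label detection with the AWS SDK for Python (boto3). The S3 bucket and key are hypothetical, and the `top_labels` helper is our own convenience, not part of the service:

```python
def top_labels(response, min_confidence=80.0):
    """Filter a Rekognition DetectLabels response down to labels at or
    above a confidence threshold, highest confidence first."""
    labels = [
        (label["Name"], label["Confidence"])
        for label in response.get("Labels", [])
        if label["Confidence"] >= min_confidence
    ]
    return sorted(labels, key=lambda pair: pair[1], reverse=True)


def detect_image_labels(bucket, key):
    """Ask Rekognition to label an image already stored in S3."""
    import boto3  # imported here so top_labels stays dependency-free
    client = boto3.client("rekognition")
    response = client.detect_labels(
        Image={"S3Object": {"Bucket": bucket, "Name": key}},
        MaxLabels=10,
        MinConfidence=70,
    )
    return top_labels(response)
```

The same client exposes face, text, and unsafe-content detection through sibling operations that follow this request/response shape.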
There is so much data now that's being locked up in audio and video files. The problem is that it’s really hard to search audio well. The best way to do it is to convert it from audio to text.
Traditionally, people have solved this by hiring manual transcription agencies, which is expensive and time-consuming. So they typically pick out only the most important things to transcribe and leave the rest on the table. All that data, and all that value, sits out there unleveraged. Amazon Transcribe solves this problem.
Transcribe does long-form automatic speech recognition. It can analyze any WAV or MP3 audio file and return text. It has many uses, including call logs, subtitles for videos, and capturing what's said in a presentation or a meeting. We started with English and Spanish, but we'll have many more languages coming soon.
Unlike other transcription services, Transcribe doesn't return the text as one long, uninterrupted string. Instead, we use machine learning to add punctuation and grammatical formatting so the text is immediately usable. We then time-stamp every word, so you can align subtitles to the video and index into the audio.
The service supports high-quality audio and, because so much of the audio data today is generated from phones that produce lower-quality, low-bit-rate audio, we uniquely support this as well.
Very soon, you’ll also be able to distinguish between multiple speakers, and add your own custom vocabularies to manage domain-specific words and terms that have different meanings.
To bring this to life, here are a couple of examples of how Transcribe is working in practice today:
In media and entertainment, Transcribe is used to extract text from rich media as an alternative to closed captioning;
Call centers run Transcribe on phone calls to understand behavior, identify training opportunities, and improve call routing.
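The flow described above can be sketched against the Transcribe API: you start an asynchronous job over an audio file in S3, then use the per-word timestamps in the finished transcript to build subtitle lines. The job name and media URI are hypothetical, and `caption_for_window` is our own helper over the documented transcript item format:

```python
def caption_for_window(items, start, end):
    """Join the words from a Transcribe result whose start_time falls in
    [start, end) seconds -- the building block for one subtitle line."""
    words = [
        item["alternatives"][0]["content"]
        for item in items
        if item.get("type") == "pronunciation"  # skip punctuation items
        and start <= float(item["start_time"]) < end
    ]
    return " ".join(words)


def start_transcription(job_name, media_uri):
    """Kick off an asynchronous transcription job; poll
    get_transcription_job later to fetch the finished transcript."""
    import boto3  # lazy import keeps the caption helper dependency-free
    client = boto3.client("transcribe")
    client.start_transcription_job(
        TranscriptionJobName=job_name,
        LanguageCode="en-US",
        MediaFormat="mp3",
        Media={"MediaFileUri": media_uri},
    )
```

Because the job is asynchronous, long recordings don't tie up the caller; the transcript JSON arrives with word-level timing already attached.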
Our customers’ customers can be located all over the world and speak many different languages, so our customers want to translate content into those languages. Here again, the traditional solution has been to hire translation agencies, which are expensive in both time and money. Customers translate only their most important content to manage the costs, and they leave all that other value on the table. Amazon Translate, which automatically translates text between languages, helps to solve this problem.
Translate is great for use cases that require real-time translation, such as live customer support and business communications over social media. You can also translate the contents of an entire Amazon S3 bucket in a single batch operation.
Soon, our customers will be able to use Translate to recognize the source language on-the-fly, so that they won’t need to specify the language of origin. And, as is the case with our other services, this one is very cost-effective.
As an example of how Translate can benefit our customers, the auto-translation capabilities will help them expand their brands globally.
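A minimal sketch of the real-time path with boto3: each call to `translate_text` translates one piece of text between a source and target language. The chunking helper and its size budget are our own assumption (services like this impose a per-request size limit, so long documents are split first):

```python
def chunk_by_bytes(sentences, max_bytes=4500):
    """Group sentences into chunks that stay under a per-request size
    budget (the exact service limit here is an assumption)."""
    chunks, current = [], ""
    for sentence in sentences:
        candidate = (current + " " + sentence).strip()
        if current and len(candidate.encode("utf-8")) > max_bytes:
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


def translate_chunks(sentences, source="en", target="es"):
    """Translate each chunk with the real-time TranslateText API."""
    import boto3  # lazy import keeps chunk_by_bytes dependency-free
    client = boto3.client("translate")
    return [
        client.translate_text(
            Text=chunk,
            SourceLanguageCode=source,
            TargetLanguageCode=target,
        )["TranslatedText"]
        for chunk in chunk_by_bytes(sentences)
    ]
```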
Amazon Polly provides the ability to generate natural-sounding, accurate, and intelligible speech in 52 voices across 25 languages, available globally in 14 AWS regions. The naturalness of the voice is very important: it is the key quality measure for synthesized speech and the leading edge of where this science can go. For customers looking to replace their Interactive Voice Response (IVR) systems, or to add synthetic voices to other applications, Polly is a big leap ahead in quality. This is an exciting area for spoken responses.
We have two ways for customers to interact with Polly. One is to generate text dynamically and call the service to speak it. The other is caching: you create a scripted output once, have Polly render it, save the result as an MP3, and then replay that MP3. This yields enormous cost savings if you play the same messages repeatedly. Polly also provides the foundation for disconnected devices: customers can build all of the instructions into a script and load the resulting MP3s inside the device itself, making it a self-contained system. One example is an IoT manufacturer that is building Polly into a security system, using Alexa as well. They built dynamic, interactive tutorials with voice projected from the security device itself. This is an interesting alternative way of embedding voice at the edge.
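The caching pattern can be sketched as follows; this is a local-disk variant under our own naming (the `cache_key` scheme and output directory are hypothetical), with the dynamic path being the same `synthesize_speech` call made on every request:

```python
import hashlib


def cache_key(text, voice_id):
    """Stable filename for a cached Polly rendering, so a repeated
    message is synthesized only once."""
    digest = hashlib.sha256(f"{voice_id}:{text}".encode("utf-8")).hexdigest()
    return f"{digest}.mp3"


def synthesize_to_file(text, voice_id="Joanna", out_dir="."):
    """Render text to an MP3 with Polly, reusing a cached copy if present."""
    import os
    import boto3  # lazy imports keep cache_key dependency-free
    path = os.path.join(out_dir, cache_key(text, voice_id))
    if os.path.exists(path):
        return path  # cached: no second API call for the same message
    client = boto3.client("polly")
    response = client.synthesize_speech(
        Text=text, OutputFormat="mp3", VoiceId=voice_id
    )
    with open(path, "wb") as f:
        f.write(response["AudioStream"].read())
    return path
```

The same pre-rendered MP3s are what a disconnected device would carry onboard.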
Amazon Comprehend is a natural language processing service that uses machine learning to find insights and relationships in text. Amazon Comprehend identifies the language of the text; extracts key phrases, places, people, brands, or events; understands how positive or negative the text is; and automatically organizes a collection of text files by topic.
Our customers are using Amazon Comprehend to identify key topics, entities, and sentiments in social media and news streams, and to enhance their ability to access and aggregate unstructured data from the vast document libraries that exist within their organizations.
Hotels.com has thousands of customer reviews and comments submitted by people who stay at the properties. Historically, it has been difficult to find what matters in all this data. By using Amazon Comprehend, Hotels.com is able to uncover the unique characteristics that people like or don’t like about each hotel. Consequently, the company is better able to make recommendations to its users.
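A review-analysis pipeline of that shape can be sketched with Comprehend's sentiment and key-phrase operations. The review texts and the aggregation helper are our own illustration, not the Hotels.com implementation:

```python
from collections import Counter


def tally_sentiment(responses):
    """Count DetectSentiment labels across a batch of reviews."""
    return Counter(r["Sentiment"] for r in responses)


def analyze_reviews(reviews, language="en"):
    """Run sentiment and key-phrase detection over review strings."""
    import boto3  # lazy import keeps tally_sentiment dependency-free
    client = boto3.client("comprehend")
    results = []
    for text in reviews:
        sentiment = client.detect_sentiment(Text=text, LanguageCode=language)
        phrases = client.detect_key_phrases(Text=text, LanguageCode=language)
        results.append({
            "Sentiment": sentiment["Sentiment"],
            "KeyPhrases": [p["Text"] for p in phrases["KeyPhrases"]],
        })
    return results
```

Tallying the sentiment labels and grouping the key phrases is what surfaces the characteristics guests repeatedly praise or complain about.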
Amazon Lex uses the automatic speech recognition and natural language understanding technology that fuels Amazon Alexa to allow developers to quickly build intelligent conversational applications, such as chatbots. With Lex, any application running on the web, a mobile app, or a device, can process natural language using an API or SDK.
Lex will apply automatic speech recognition (referred to as ASR) and natural language understanding (referred to as NLU) to the incoming message to understand the intent of the user. In turn, this inbound request is mapped to a Lambda function which processes the information, forms a response, and passes it back to the user as either voice, using Polly, or text.
Amazon Lex has an integrated development console that allows users to build multi-step conversations that can be tested in the AWS console and then deployed to a custom platform, Facebook Messenger, Slack, Kik, and Twilio.
Amazon provides a collection of enterprise connectors that link Amazon Lex to Salesforce, Microsoft Dynamics, Marketo, Zendesk, Quickbooks, and HubSpot. These connectors help customers build new interfaces around existing enterprise data.
As a fully managed service, Amazon Lex scales automatically, so you don’t need to worry about managing infrastructure.
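A single conversational turn looks like this through the Lex runtime API: each user utterance is posted with a bot name, alias, and user ID (the ones below are hypothetical), and the returned dialog state says whether the bot still needs more input. The `conversation_open` helper is our own shorthand over the documented states:

```python
def conversation_open(dialog_state):
    """True while Lex still expects more input from the user."""
    return dialog_state in ("ElicitIntent", "ElicitSlot", "ConfirmIntent")


def send_to_bot(bot_name, bot_alias, user_id, text):
    """Send one user utterance to a Lex bot and return its reply."""
    import boto3  # lazy import keeps conversation_open dependency-free
    client = boto3.client("lex-runtime")
    response = client.post_text(
        botName=bot_name,
        botAlias=bot_alias,
        userId=user_id,
        inputText=text,
    )
    # "message" can be absent once the intent is ready for fulfillment
    return response.get("message"), response["dialogState"]
```

An application simply loops, calling `send_to_bot` for each user message until `conversation_open` returns False, at which point the backing Lambda function fulfills the intent.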