Have you always wanted to add predictive capabilities to your application, but haven’t been able to find the time or the right technology to get started? In this session, learn how a smart application for predictive customer service can be built in the AWS cloud. We will walk through the process of labeling data, setting up a real-time data ingestion pipeline, and using machine learning to make real-time predictions for messages arriving via social media channels. Afterwards, you will be able to replicate everything shown on your own, using the provided sample code and training dataset.
Alex Ingerman leads the product management team for Amazon Machine Learning. He joined Amazon in 2012, after working on products including web-scale search, content recommendation systems, immersive data exploration environments, and enterprise email and content servers. Alex holds a Bachelor of Science degree in Computer Science, and a Master of Science degree in Medical Engineering.
2. Agenda
• What is predictive customer service?
• Using machine learning to find important social media conversations
• Building an end-to-end application to act on these conversations
3. Application details
Goal: build a smart application for social media listening in the cloud
Full source code and documentation are on GitHub:
http://bit.ly/AmazonMLCodeSample
Services used: Amazon Kinesis, AWS Lambda, Amazon Machine Learning, Amazon SNS, Amazon Mechanical Turk
8. Why do we need machine learning for this?
The social media stream is high-volume, and most messages do not need action from customer service (they are not CS-actionable)
9. Amazon Machine Learning in one slide
• Easy to use, managed machine learning service built for developers
• Robust, powerful machine learning technology based on Amazon’s internal systems
• Create models using your data already stored in the AWS cloud
• Deploy models to production in seconds
11. Formulating the business problem
We would like to…
Instantly find new tweets mentioning @awscloud, ingest and analyze each one to predict whether a customer service agent should act on it, and, if so, send that tweet to customer service agents.
13. Establishing the data flow
We would like to…
Instantly find new tweets mentioning @awscloud, ingest and analyze each one to predict whether a customer service agent should act on it, and, if so, send that tweet to customer service agents.
Data flow: Twitter API → Amazon Kinesis → AWS Lambda → Amazon Machine Learning → Amazon SNS
19. Picking the machine learning strategy
Question we want to answer:
Is this tweet customer service-actionable, or not?
Our dataset:
Text and metadata from past tweets mentioning @awscloud
Machine learning approach:
Create a binary classification model to answer a yes/no question, and provide a confidence score
21. Retrieve past tweets
The Twitter API can be used to search for tweets containing our company’s handle (e.g., @awscloud)
import twitter
# twitter_credentials: dict of keyword arguments for twitter.Api (API keys and access tokens)
twitter_api = twitter.Api(**twitter_credentials)
twitter_handle = 'awscloud'
# Tweets that mention the handle, excluding tweets sent from the handle itself
search_query = '@' + twitter_handle + ' -from:' + twitter_handle
results = twitter_api.GetSearch(term=search_query, count=100, result_type='recent')
# We can go further back in time by issuing additional search requests
Good news: data is well-structured and clean
Bad news: tweets are not categorized (labeled) for us
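The remark about going further back in time can be sketched as a paging loop. This is a minimal sketch, not the sample repository's code: `collect_tweets`, `pages`, and `page_size` are hypothetical names, and it assumes the search callable accepts a `max_id` keyword (as python-twitter's `GetSearch` does) and returns objects with an `id` attribute.

```python
def collect_tweets(search, query, pages=3, page_size=100):
    """Page backwards through search results, oldest tweets last.

    `search` is any callable with GetSearch-style keywords
    (term, count, result_type, max_id); this is an assumption of the sketch.
    """
    collected = []
    max_id = None  # None means "start from the most recent tweets"
    for _ in range(pages):
        batch = search(term=query, count=page_size,
                       result_type='recent', max_id=max_id)
        if not batch:
            break  # nothing older is available
        collected.extend(batch)
        # Ask the next request only for tweets older than the oldest one seen.
        max_id = min(status.id for status in batch) - 1
    return collected
```

With the objects from the slide, a call might look like `collect_tweets(twitter_api.GetSearch, search_query, pages=10)`.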
23. Labeling past tweets
Why label tweets?
(Many) machine learning algorithms work by discovering patterns connecting data points and labels
How many tweets need to be labeled?
Several thousand to start with
Can I pay someone to do this?
Yes! Amazon Mechanical Turk is a marketplace for tasks that require human intelligence
25. Amazon ML process, in a nutshell
1. Create your datasources
Two API calls to create your training and evaluation data
Sanity-check your data in the service console
2. Create your ML model
One API call to build a model, with smart defaults or custom settings
3. Evaluate your ML model
One API call to compute your model’s quality metric
4. Adjust your ML model
Use the console to align performance trade-offs to your business goals
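Steps 1 to 3 above can be sketched as four API calls. This is a rough sketch under stated assumptions, not the repository's code: the function name, the datasource and evaluation IDs, the S3 locations, and the 70/30 split are all hypothetical, and the keyword names follow the boto 2 Machine Learning client as best understood, so check them against your installed version.

```python
import json


def build_tweet_model(ml, s3_data, s3_schema, ml_model_id="ml-tweets",
                      train_percent=70):
    """Create two datasources, one binary model, and one evaluation.

    `ml` is assumed to be a boto 2 Machine Learning connection.
    All IDs below are illustrative placeholders.
    """
    def spec(begin, end):
        # DataRearrangement selects a percentage slice of the same S3 file.
        return {
            "DataLocationS3": s3_data,
            "DataSchemaLocationS3": s3_schema,
            "DataRearrangement": json.dumps(
                {"splitting": {"percentBegin": begin, "percentEnd": end}}),
        }

    ids = {"train": "ds-tweets-train", "eval": "ds-tweets-eval",
           "model": ml_model_id, "evaluation": "ev-tweets"}

    # Step 1: two calls, one datasource for training and one for evaluation.
    ml.create_data_source_from_s3(data_source_id=ids["train"],
                                  data_spec=spec(0, train_percent),
                                  compute_statistics=True)
    ml.create_data_source_from_s3(data_source_id=ids["eval"],
                                  data_spec=spec(train_percent, 100),
                                  compute_statistics=False)
    # Step 2: one call to build a binary classification model.
    ml.create_ml_model(ml_model_id=ids["model"], ml_model_type="BINARY",
                       training_data_source_id=ids["train"])
    # Step 3: one call to evaluate the model on the held-out data.
    ml.create_evaluation(evaluation_id=ids["evaluation"],
                         ml_model_id=ids["model"],
                         evaluation_data_source_id=ids["eval"])
    return ids
```

Passing the client in as an argument keeps the sketch testable without AWS credentials; in practice `ml` would be the result of `boto.connect_machinelearning()`.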
28. Reminder: Our data flow
Twitter API → Amazon Kinesis → AWS Lambda → Amazon Machine Learning → Amazon SNS
29. Create an Amazon ML endpoint for retrieving real-time predictions
import boto
ml = boto.connect_machinelearning()
ml.create_realtime_endpoint("ml-tweets")
# Endpoint information can be retrieved using the get_ml_model() method. Sample output:
# "EndpointInfo": {
#     "CreatedAt": 1424378682.266,
#     "EndpointStatus": "READY",
#     "EndpointUrl": "https://realtime.machinelearning.us-east-1.amazonaws.com",
#     "PeakRequestsPerSecond": 200}
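Because the endpoint is not usable until its status is READY, a caller may want to poll for it. The following is a minimal sketch: `wait_for_endpoint` and its parameters are hypothetical, and it assumes `get_ml_model()` returns a dictionary containing the `EndpointInfo` structure shown in the sample output above.

```python
import time


def wait_for_endpoint(ml, ml_model_id, attempts=30, delay_seconds=10,
                      sleep=time.sleep):
    """Poll get_ml_model() until the real-time endpoint reports READY.

    Returns the endpoint URL, or raises RuntimeError after `attempts` polls.
    """
    for _ in range(attempts):
        info = ml.get_ml_model(ml_model_id).get("EndpointInfo", {})
        if info.get("EndpointStatus") == "READY":
            return info["EndpointUrl"]
        sleep(delay_seconds)  # endpoint not ready yet; wait before retrying
    raise RuntimeError("endpoint for %s was not READY after %d attempts"
                       % (ml_model_id, attempts))
```

The `sleep` argument is injectable only so the loop can be exercised without real delays.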
30. Create an Amazon Kinesis stream for receiving tweets
import boto
kinesis = boto.connect_kinesis()
kinesis.create_stream(stream_name='tweetStream', shard_count=1)
# Each open shard can support up to 5 read transactions per second, up to a
# maximum total of 2 MB of data read per second. Each shard can support up to
# 1000 records written per second, up to a maximum total of 1 MB data written
# per second.
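Once the stream exists, each incoming tweet has to be written to it. Here is a small sketch of that producer step, assuming a boto 2 Kinesis connection and a tweet dictionary with a `sid` (status ID) field like the JSON example later in the deck; `send_tweet` is a hypothetical helper name.

```python
import json


def send_tweet(kinesis, stream_name, tweet):
    """Serialize one tweet as JSON and put it on the Kinesis stream.

    The tweet's status ID is used as the partition key, which spreads
    records across shards if the stream is later given more than one.
    """
    data = json.dumps(tweet)
    partition_key = str(tweet["sid"])
    return kinesis.put_record(stream_name, data, partition_key)
```

With the connection from the slide, a call might look like `send_tweet(kinesis, 'tweetStream', tweet)`.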
31. Set up AWS Lambda to coordinate the data flow
The Lambda function is our application’s backbone. We will:
1. Write the code that will process and route tweets
2. Configure the Lambda execution policy (what is it allowed to do?)
3. Add the Kinesis stream as the data source for the Lambda function
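For step 2, the execution policy answers "what is the function allowed to do?". The document below is an illustrative guess at the set of actions this data flow needs (read from Kinesis, call Amazon ML, publish to SNS, write logs); it is not the policy from the sample repository, and the `"*"` resources are a simplification that should be narrowed to specific ARNs in real use.

```python
import json

# Illustrative execution policy; action list and "*" resources are assumptions.
LAMBDA_EXECUTION_POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {   # Read tweets from the Kinesis stream
            "Effect": "Allow",
            "Action": ["kinesis:DescribeStream", "kinesis:ListStreams",
                       "kinesis:GetShardIterator", "kinesis:GetRecords"],
            "Resource": "*",
        },
        {   # Look up the real-time endpoint and request predictions
            "Effect": "Allow",
            "Action": ["machinelearning:GetMLModel",
                       "machinelearning:Predict"],
            "Resource": "*",
        },
        {   # Notify customer service agents
            "Effect": "Allow",
            "Action": ["sns:Publish"],
            "Resource": "*",
        },
        {   # Write the function's own logs
            "Effect": "Allow",
            "Action": ["logs:CreateLogGroup", "logs:CreateLogStream",
                       "logs:PutLogEvents"],
            "Resource": "*",
        },
    ],
}


def policy_document():
    """Return the policy as a JSON string, ready to attach to an IAM role."""
    return json.dumps(LAMBDA_EXECUTION_POLICY, indent=2)
```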
32. Create Lambda functions
// These are our function's signatures and globals only. See GitHub repository for full source.
var AWS = require('aws-sdk');
var ml = new AWS.MachineLearning();
var endpointUrl = '';
var mlModelId = 'ml-tweets';
var snsTopicArn = 'arn:aws:sns:{region}:{awsAccountId}:{snsTopic}';
var snsMessageSubject = 'Respond to tweet';
var snsMessagePrefix = 'ML model ' + mlModelId + ': Respond to this tweet: https://twitter.com/0/status/';
var processRecords = function() {…} // Base64 decode the Kinesis payload and parse JSON
var callPredict = function(tweetData) {…} // Call Amazon ML real-time prediction API
var updateSns = function(tweetData) {…} // Publish CS-actionable tweets to SNS topic
var checkRealtimeEndpoint = function(err, data) {…} // Get Amazon ML endpoint URI
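The flow those signatures describe (decode, predict, publish) can be sketched in Python. This is not a translation of the repository's Node.js code: `decode_kinesis_records` and `route_tweets` are hypothetical names, `predict` and `notify` stand in for the Amazon ML and SNS calls, and treating label "1" as CS-actionable is an assumption. The event layout (`Records[].kinesis.data`, base64-encoded) is the shape Lambda uses for Kinesis event sources.

```python
import base64
import json


def decode_kinesis_records(event):
    """Base64-decode each Kinesis payload in a Lambda event and parse JSON."""
    tweets = []
    for record in event.get("Records", []):
        raw = base64.b64decode(record["kinesis"]["data"])
        tweets.append(json.loads(raw.decode("utf-8")))
    return tweets


def route_tweets(event, predict, notify):
    """Forward only the tweets that `predict` labels "1".

    `predict(tweet)` should return the predicted label as a string;
    `notify(tweet)` should publish the tweet to customer service.
    """
    forwarded = []
    for tweet in decode_kinesis_records(event):
        if predict(tweet) == "1":  # assumption: "1" means CS-actionable
            notify(tweet)
            forwarded.append(tweet)
    return forwarded
```

Injecting `predict` and `notify` keeps the routing logic separate from the service calls, so it can be checked locally with stand-in functions.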
42. Amazon ML real-time predictions test
Here is the same tweet…as a JSON blob:
{
"statuses_count": "8617",
"description": "Software Developer",
"friends_count": "96",
"text": "`scala-aws-s3` A Simple Amazon #S3 Wrapper for #Scala 1.10.20 available : https://t.co/q76PLTovFg",
"verified": "False",
"geo_enabled": "True",
"uid": "3800711",
"favourites_count": "36",
"screen_name": "turutosiya",
"followers_count": "640",
"user.name": "Toshiya TSURU",
"sid": "647222291672100864"
}
43. Amazon ML real-time predictions test
Let’s use the AWS Command Line Interface to request a prediction for this tweet:
aws machinelearning predict \
    --predict-endpoint https://realtime.machinelearning.us-east-1.amazonaws.com \
    --ml-model-id ml-tweets \
    --record '<json_blob>'
{
"Prediction": {
"predictedLabel": "0",
"predictedScores": {
"0": 0.012336540967226028
},
"details": {
"PredictiveModelType": "BINARY",
"Algorithm": "SGD"
}
}
}
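A response like the one above can be reduced to a yes/no decision in a few lines. This is a sketch only: the helper names are hypothetical, and it assumes that label "1" means the tweet is CS-actionable and that `predictedScores` is keyed by the predicted label, as in the sample output.

```python
def parse_prediction(response):
    """Return (label, score) from an Amazon ML real-time prediction response."""
    prediction = response["Prediction"]
    label = prediction["predictedLabel"]
    score = prediction["predictedScores"][label]
    return label, score


def is_actionable(response):
    """True when the model's label is "1" (assumed to mean CS-actionable)."""
    label, _ = parse_prediction(response)
    return label == "1"
```

For the sample response shown above, `is_actionable` returns `False`, so this tweet would not be forwarded to customer service.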
45. Generalizing to more feedback channels
Amazon Kinesis → AWS Lambda → Model 1 / Model 2 / Model 3 → Amazon SNS
46. What’s next?
Try the service:
http://aws.amazon.com/machine-learning/
Download the Social Media Listening application code:
http://bit.ly/AmazonMLCodeSample
Get in touch!
ingerman@amazon.com