Production and Beyond: Deploying and Managing Machine Learning Models
Rajat Arya, Senior Product Manager
Alice Zheng, Director of Data Science
What is Production?
• Deployment: easily serve live predictions.
• Evaluation: measuring the quality of deployed models.
• Monitoring: tracking model quality & operations.
• Management: choosing between deployed models.
Lifecycle of ML in Production
Evaluation
Monitoring
Deployment
Management
The Setup
Suppose we are building a website with product
recommendations, trained using Amazon reviews.
• 34.6M reviews
• 2.4M products
• 6.6M users
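As a concrete starting point, here is a minimal sketch of training such a recommender in GraphLab Create, the stack used throughout this deck; the file path and column names are assumptions about how the review data is laid out.

import graphlab as gl

# Load the review history (the file path and column names are hypothetical).
reviews = gl.SFrame.read_csv('amazon_reviews.csv')

# Train a factorization-based recommender on (user, product, rating) triples.
model = gl.recommender.factorization_recommender.create(
    reviews, user_id='user', item_id='product', target='rating')

# Top-10 product recommendations for one user.
print(model.recommend(users=['some-user-id'], k=10))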
Deployment System
[Diagram: historical data feeds batch training of the model; the model makes real-time predictions on live data; feedback from the predictions flows back into the historical data.]
Deployment System
[Diagram: the same pipeline viewed as a request/response loop: user input goes to the batch-trained model, which returns recommendations as real-time predictions.]
Batch Training: DIY
• Distributed: use the entire cluster efficiently
• Scalable: scale nodes up or down
• Co-located with data: no data transfer
• Easy to schedule, launch, and monitor: operational metrics, dashboards, alarming
Dato Distributed
[Diagram: the batch-training half of the pipeline (historical data, batch training, model) handled by Dato Distributed.]
Dato Distributed Architecture
[Diagram: GraphLab Create code submitted as jobs to a Dato Distributed cluster.]
Dato Distributed
With one line of code, launch a long-running cluster of machines for parallel / distributed execution of jobs from GraphLab Create.
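A rough sketch of what that looks like. The cluster name, S3 paths, and nightly_train function are hypothetical, and the exact deploy API signatures may differ between GraphLab Create versions.

import graphlab as gl

# Hypothetical training function to run remotely; it retrains the recommender
# on the latest historical data and saves the result.
def nightly_train():
    reviews = gl.SFrame('s3://my-bucket/amazon-reviews')
    model = gl.recommender.factorization_recommender.create(
        reviews, user_id='user', item_id='product', target='rating')
    model.save('s3://my-bucket/models/recommender')

# One call launches a long-running cluster in EC2 ...
cluster = gl.deploy.ec2_cluster.create(
    name='nightly-training',
    s3_path='s3://my-bucket/dato-distributed',
    ec2_config=gl.deploy.Ec2Config(instance_type='m3.xlarge'),
    num_hosts=3)

# ... and one call submits the training function as a distributed job.
job = gl.deploy.job.create(nightly_train, environment=cluster)
print(job.get_status())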
Deployment System
[Diagram: the same pipeline, now highlighting the real-time side: the model serving predictions on live data.]
Real-time Predictions: DIY
• Ease of integration: REST endpoints, language independence
• Low latency: fast predictions, with caching
• Fault tolerant: replicated models, alarming, metrics
• Scalable: scale up or down
• Maintainable: easy to update / deploy models
[Diagram: the model serving real-time predictions on live data.]
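To make those DIY requirements concrete, here is a minimal sketch of a hand-rolled REST prediction endpoint with a naive cache. Flask and the model path are assumptions, not part of the deck's stack, and a production service would also need replication, metrics, and alarming.

import graphlab as gl
from flask import Flask, jsonify

app = Flask(__name__)
model = gl.load_model('s3://my-bucket/models/recommender')  # hypothetical path
cache = {}  # naive in-process cache; a real deployment would use a shared cache

@app.route('/recommend/<user_id>')
def recommend(user_id):
    if user_id not in cache:
        recs = model.recommend(users=[user_id], k=10)
        cache[user_id] = list(recs['product'].astype(str))
    return jsonify(recommendations=cache[user_id])

if __name__ == '__main__':
    app.run(port=8080)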
Dato Predictive Services Architecture
[Diagram: clients (website, mobile, browser, etc.) call a REST API backed by a distributed cache and the deployed model, all managed by Dato Predictive Services.]
Dato Predictive Services
With one line of code, launch a fault-tolerant, scalable, robust, and maintainable cluster that puts a service-oriented architecture on top of machine learning models.
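A hedged sketch of the corresponding workflow with the Predictive Services API. The service name, S3 paths, and exact method and parameter names are assumptions from that era's API and may differ across versions.

import graphlab as gl

# Hypothetical names and paths; parameter order and names may differ by version.
ec2 = gl.deploy.Ec2Config(instance_type='m3.xlarge', region='us-west-2')
ps = gl.deploy.predictive_service.create(
    'recommender-service', ec2, 's3://my-bucket/predictive-service', num_hosts=3)

# Expose the trained recommender behind the service's REST API.
model = gl.load_model('s3://my-bucket/models/recommender')
ps.add('recommend', model)
ps.apply_changes()

# Any client can now hit the REST endpoint; from Python the same call is:
print(ps.query('recommend', method='recommend',
               data={'users': ['some-user-id'], 'k': 10}))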
Dato Predictive Services Benchmark
Amazon review dataset (34.6M reviews, 6.6M users, 2.4M products).
Measured end-to-end latency under a moderate generated load:
• Average: < 65 ms
• P99: < 100 ms
Setup: 3-node deployment on AWS m3.xlarge instances (4 cores, 15 GB RAM each).
Recommendation System
[Diagram: the complete pipeline: historical data, batch training, the model, and real-time predictions on live data.]
Lifecycle of ML in Production
Evaluation
Monitoring
Deployment
Management
What happens after (initial) deployment
After deployment
• Evaluation & monitoring: evaluate and track metrics over time.
• Management: react to feedback from deployed models.
ML in production - 101
[Diagram: historical data feeds batch training of the model, which makes real-time predictions on live data; feedback from the predictions flows back into the historical data.]
ML in production - 101
[Diagram: the same pipeline with a second model (Model 2) trained and serving predictions alongside the first.]
Key questions
• When to update a model?
• How to choose between existing models?
• Answer: continuous evaluation and testing
What is evaluation?
Evaluation = predictions + metric
Two decisions: What data? Which metric?
Evaluating a recommender
[Diagram: offline, predictions on historical data are scored with a ranking loss; online, predictions on live data are scored by user engagement.]
Evaluating a recommender
• Offline evaluation (ranking loss on historical data): decides when to update the model.
• Online evaluation (user engagement on live data): decides how to choose between models.
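As an illustration of the two metric flavors, a small self-contained sketch; the metric choices and numbers are illustrative, not from the deck.

def precision_at_k(recommended, relevant, k=10):
    """Offline ranking metric on held-out historical data."""
    hits = len(set(recommended[:k]) & set(relevant))
    return hits / float(k)

def click_through_rate(impressions, clicks):
    """Online engagement metric computed from live traffic logs."""
    return clicks / float(impressions) if impressions else 0.0

# Offline: did the model rank items the user later bought near the top?
print(precision_at_k(['p1', 'p2', 'p3'], relevant=['p2', 'p9'], k=3))  # ~0.33
# Online: of the users shown recommendations, how many clicked?
print(click_through_rate(impressions=2000, clicks=200))               # 0.10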
Updating ML models
Why update?
• Trends and user tastes change over time
• Model performance drops
When to update?
• Track statistics of data over time
• Monitor both offline & online metrics on live data
• Update when offline metric diverges from online metrics
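A toy sketch of that divergence rule; the threshold, the daily cadence, and the assumption that the offline and online metrics live on a comparable scale are all simplifications.

def should_retrain(offline_scores, online_scores, window=7, tolerance=0.05):
    """Flag retraining when the gap between the offline metric (estimated on
    historical data) and the online metric (observed on live data) grows.
    Assumes both metrics are tracked daily and are on a comparable scale."""
    gaps = [abs(off - on) for off, on in
            zip(offline_scores[-window:], online_scores[-window:])]
    return sum(gaps) / len(gaps) > tolerance

offline = [0.31, 0.30, 0.31, 0.30, 0.29, 0.30, 0.30]  # e.g. offline precision@10
online  = [0.30, 0.28, 0.26, 0.24, 0.22, 0.20, 0.18]  # e.g. online engagement rate
print(should_retrain(offline, online))  # True: time to schedule a retrain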
Choosing between ML models
Strategy 1: A/B testing. Select the best model and use it all the time.
• Group A (Model 1): 2000 visits, 10% CTR
• Group B (Model 2): 2000 visits, 30% CTR
• Outcome: everybody gets Model 2.
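As a sketch of how the winner in Strategy 1 would actually be called, a two-proportion z-test on the click counts above; the choice of test and significance handling are assumptions, since the slide gives only the raw rates.

from math import sqrt, erf

def two_proportion_z_test(clicks_a, visits_a, clicks_b, visits_b):
    """Two-sided z-test for a difference in click-through rates."""
    p_a = clicks_a / float(visits_a)
    p_b = clicks_b / float(visits_b)
    p_pool = (clicks_a + clicks_b) / float(visits_a + visits_b)
    se = sqrt(p_pool * (1 - p_pool) * (1.0 / visits_a + 1.0 / visits_b))
    z = (p_b - p_a) / se
    p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))
    return z, p_value

# Group A (Model 1): 2000 visits at 10% CTR; Group B (Model 2): 2000 visits at 30% CTR.
z, p = two_proportion_z_test(clicks_a=200, visits_a=2000, clicks_b=600, visits_b=2000)
print(z, p)  # z is about 15.8, p-value effectively 0: Model 2 wins decisively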
Choosing between ML models
Strategy 2: multi-armed bandits. A statistician walks into a casino…
• Slot machine with pay-off $1:$1000: play this 85% of the time
• Slot machine with pay-off $1:$200: play this 10% of the time
• Slot machine with pay-off $1:$500: play this 5% of the time
Choosing between ML models
A statistician walks into an ML production environment…
• Model 1 (pay-off $1:$1000): use this 85% of the time (exploitation)
• Model 2 (pay-off $1:$200): use this 10% of the time (exploration)
• Model 3 (pay-off $1:$500): use this 5% of the time (exploration)
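A minimal epsilon-greedy sketch of the bandit idea, routing each request to a model and learning the split from click feedback. The 85/10/5 split above is one hand-tuned policy; epsilon-greedy is a simple automated alternative, and the reward model here is just clicks.

import random

class EpsilonGreedyBandit:
    """Route each incoming request to one of the deployed models, mostly
    exploiting the best-performing one and occasionally exploring the others."""

    def __init__(self, n_models, epsilon=0.15):
        self.epsilon = epsilon
        self.pulls = [0] * n_models      # requests served by each model
        self.rewards = [0.0] * n_models  # clicks credited to each model

    def choose(self):
        if random.random() < self.epsilon:            # explore
            return random.randrange(len(self.pulls))
        rates = [r / p if p else 0.0 for r, p in zip(self.rewards, self.pulls)]
        return max(range(len(rates)), key=rates.__getitem__)  # exploit

    def update(self, model_idx, clicked):
        self.pulls[model_idx] += 1
        self.rewards[model_idx] += 1.0 if clicked else 0.0

bandit = EpsilonGreedyBandit(n_models=3)
model_idx = bandit.choose()            # pick a model for this request
# ... serve recommendations from models[model_idx], then log the outcome:
bandit.update(model_idx, clicked=True)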
MAB vs. A/B testing
Why MAB?
• Continuous optimization, “set and forget”
• Maximize overall reward
Why A/B test?
• Simple to understand
• Single winner
• Tricky to do right
Other production considerations
• Versioning
• Logging
• Provenance
• Dashboards
• Reports
“Machine Learning: The High Interest Credit Card of Technical Debt,” D. Sculley et al., Google, 2014
“Two big challenges in machine learning,” Leon Bottou, ICML 2015 invited talk
Conclusions
• Lifecycle of ML in production: deployment, evaluation, monitoring, management.
• Deployment: Dato Distributed & Dato Predictive Services.
• Evaluation, monitoring & management: A/B testing, multi-armed bandits & much more.
• Dato: a one-stop shop for all stages of the ML lifecycle, with a simple, platform-agnostic interface.
@datoinc, #DataSmt


Editor's Notes

  • #9 For both hyper-parameter tuning and model training, the system we are looking for should be: distributed (do embarrassingly parallel things in parallel; do ML things that distribute well in parallel); scalable (scale nodes up or down); co-located with the data (execution happens where the data lives); and easy to schedule, launch, and monitor (same code as the model; operational metrics, dashboards, alarming; easy integration with Dato Predictive Services).
  • #10 Why does this architecture meet the requirements? It is distributed, scalable, co-located with the data, and easy to schedule, launch, and monitor.
  • #11 Dato Distributed - one command to launch a long-running cluster of machines to do parallel / distributed execution of Jobs from GraphLab Create. These clusters can be launched in the cloud in AWS EC2 or on-premise in Hadoop YARN or Spark clusters.
  • #14 Meets the requirements: ease of integration, low latency, fault tolerance, scalability, maintainability.
  • #15 Dato Predictive Services - with one line we deploy a fault-tolerant, scalable, robust, and maintainable cluster to put a service-oriented architecture on machine learning models. We can choose to deploy in AWS EC2, or on-premise in our Hadoop YARN or Spark cluster environments.  
  • #16 We launch a 3-node Predictive Service deployment on AWS EC2 instances (m3.xlarge, with 4 cores and 15 GB RAM each) and deploy our recommender model. I should mention that the underlying dataset has 6.6M users and 2.4M products. We measure operational metrics for getting a real-time set of recommendations for a given user, and see average latency < 65 ms. And of course average round-trip latency is insufficient for a production system, so we also measure the 99th-percentile (P99) latency, and see it is < 100 ms.
  • #39 Same requirements as #9: for hyper-parameter tuning and model training, the system should be distributed, scalable, co-located with the data, and easy to schedule, launch, and monitor.
  • #42 For periodic training and distributed feature engineering we decide to use our on-premise Spark cluster, since our data is already stored on HDFS and we already use Spark DataFrames in our data engineering pipeline. So we schedule a nightly job to take the historical data and train our FactorizationRecommender model as a Spark Job, and as part of that job the trained model is updated on the Predictive Service deployment we launched earlier. And with Dato Distributed installed on the Spark cluster, our data scientists can now run distributed hyperparameter tuning regularly, and find an optimal set of parameters for the FactorizationRecommender model.
  • #45 So far so good: we get together in a meeting with the web team to get the frontend to start using the new recommendation system we've developed. In that meeting, one of the data scientists on the team mentions, 'I didn't tune the hyperparameters on that recommender model; there is probably an opportunity to get better results from the model.' One of the software engineers asks, 'How regularly will the model be trained?' Uh-oh, we hadn't thought of those things. The web team puts the frontend REST API work on their next sprint, and the data science team goes back to think about how to incorporate hyperparameter tuning and frequent model training into the recommender application.