What happens after (initial) deployment
ML production life cycle
Evaluation
Monitoring
Deployment
Management
After deployment
Evaluate and track metrics over time.
React to feedback from deployed models.
Monitoring, Management, Evaluation
ML in production - 101
[Diagram: Historical Data -> (batch training) -> Model; Live Data -> Model -> (real-time predictions) -> Predictions; Feedback]
ML in production - 101
[Diagram: Historical Data -> (batch training) -> Model and Model 2; Live Data -> (real-time predictions) -> Predictions]
Key questions
• When to update a model?
• How to choose between existing models?
• Answer: continuous evaluation and testing
What is evaluation?
Predictions + Metric = Evaluation
• Predictions: what data?
• Metric: which metric?
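A minimal sketch of evaluation as "predictions plus a metric". The toy `threshold_model`, the sample data, and the choice of accuracy as the metric are all assumptions made for illustration; they are not taken from the talk.

```python
def accuracy(targets, predictions):
    """Fraction of predictions that match the known targets."""
    correct = sum(1 for t, p in zip(targets, predictions) if t == p)
    return correct / len(targets)


def evaluate(model, inputs, targets, metric):
    """Run the model on the chosen data, then score its predictions with the chosen metric."""
    predictions = [model(x) for x in inputs]
    return metric(targets, predictions)


def threshold_model(x):
    """Hypothetical stand-in for a trained model: predict 1 when the input is at least 0.5."""
    return 1 if x >= 0.5 else 0


# "What data?" -> the inputs and targets; "Which metric?" -> the metric function.
inputs = [0.1, 0.4, 0.6, 0.9]
targets = [0, 1, 1, 1]
score = evaluate(threshold_model, inputs, targets, accuracy)
print(score)
```

Swapping in a different data set or a different metric function changes the evaluation without touching the model.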
Evaluating a recommender
[Diagram: Historical Data -> Model; Live Data -> Model -> Predictions; metrics: ranking loss, user engagement]
Evaluating a recommender
[Diagram: Historical Data -> Model; Live Data -> Model -> Predictions; metrics: ranking loss, user engagement]
Offline evaluation (e.g. ranking loss): when to update the model
Online evaluation (e.g. user engagement): choosing between models
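To make the offline/online distinction concrete, here is a small sketch with one metric of each kind. Precision at k stands in for an offline ranking metric and click-through rate for online user engagement; both choices, and the sample items and counts, are assumptions for illustration rather than the metrics used in the talk.

```python
def precision_at_k(recommended, relevant, k):
    """Offline ranking metric: share of the top-k recommended items the user actually liked."""
    top_k = recommended[:k]
    hits = sum(1 for item in top_k if item in relevant)
    return hits / k


def click_through_rate(impressions, clicks):
    """Online engagement metric: share of shown recommendations that were clicked."""
    return clicks / impressions if impressions else 0.0


# Offline: compare a ranked list against held-out items with known relevance.
offline_score = precision_at_k(["a", "b", "c", "d"], {"a", "c", "e"}, k=3)

# Online: measure what live users did with the recommendations they saw.
online_score = click_through_rate(impressions=2000, clicks=200)

print(offline_score, online_score)
```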
Updating ML models
Why update?
• Trends and user tastes change over time
• Model performance drops
When to update?
• Track statistics of data over time
• Monitor both offline & online metrics on live data
• Update when the offline metric diverges from the online metric
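The checks above could be sketched roughly as follows. The function names, the drift score, the way divergence is measured (difference in relative change), and the thresholds are all assumptions for illustration; a production system would tune these against its own data.

```python
from statistics import mean, stdev


def mean_shift_score(baseline, window):
    """Distance between the live-window mean and the baseline mean,
    in baseline standard errors. Assumes the baseline is not constant."""
    standard_error = stdev(baseline) / len(window) ** 0.5
    return abs(mean(window) - mean(baseline)) / standard_error


def relative_change(series):
    """Change from the first to the last value, relative to the first."""
    return (series[-1] - series[0]) / abs(series[0])


def metrics_diverged(offline_history, online_history, tolerance=0.10):
    """True when the offline and online metrics have moved apart by more than `tolerance`."""
    gap = abs(relative_change(offline_history) - relative_change(online_history))
    return gap > tolerance


# Track a data statistic over time: a feature's values at training time vs. a recent live window.
training_values = [10, 12, 11, 9, 10, 11, 12, 10]
live_values = [15, 16, 14, 15]
drift = mean_shift_score(training_values, live_values)

# Monitor both metrics: the offline metric looks stable while online engagement keeps falling.
offline_metric = [0.80, 0.81, 0.80, 0.79]
online_metric = [0.10, 0.09, 0.08, 0.07]
should_update = drift > 3.0 or metrics_diverged(offline_metric, online_metric)
print(drift, should_update)
```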
Choosing between ML models
Group A: Model 1, 2000 visits, 10% CTR
Group B: Model 2, 2000 visits, 30% CTR
Outcome: everybody gets Model 2
Strategy 1: A/B testing. Select the best model and use it all the time.
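Before declaring a winner, it is worth checking that the CTR gap is larger than chance would explain. The sketch below applies a standard two-proportion z-test to the slide's numbers (2000 visits per group, 10% vs. 30% CTR); the choice of test and the function name are assumptions, not something the talk prescribes.

```python
from math import erfc, sqrt


def two_proportion_z_test(clicks_a, visits_a, clicks_b, visits_b):
    """Return (z, two-sided p-value) for the difference between two click-through rates."""
    rate_a = clicks_a / visits_a
    rate_b = clicks_b / visits_b
    # Pooled rate under the null hypothesis that both groups share one CTR.
    pooled = (clicks_a + clicks_b) / (visits_a + visits_b)
    standard_error = sqrt(pooled * (1 - pooled) * (1 / visits_a + 1 / visits_b))
    z = (rate_b - rate_a) / standard_error
    # Two-sided tail probability of a standard normal variable.
    p_value = erfc(abs(z) / sqrt(2))
    return z, p_value


# 10% of 2000 visits = 200 clicks; 30% of 2000 visits = 600 clicks.
z, p_value = two_proportion_z_test(200, 2000, 600, 2000)
print(f"z = {z:.2f}, p = {p_value:.3g}")
```

With a gap this large the p-value is far below any usual threshold; with smaller gaps or fewer visits the same check can easily come out inconclusive.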
Choosing between ML models
A statistician walks into a casino…
Multi-armed bandits
• Pay-off $1:$1000: play this 85% of the time
• Pay-off $1:$200: play this 10% of the time
• Pay-off $1:$500: play this 5% of the time
Choosing between ML models
A statistician walks into an ML production environment
• Model 1, pay-off $1:$1000: use this 85% of the time (exploitation)
• Model 2, pay-off $1:$200: use this 10% of the time (exploration)
• Model 3, pay-off $1:$500: use this 5% of the time (exploration)
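A minimal sketch of one simple bandit strategy, epsilon-greedy, applied to routing traffic between models. It is not necessarily the algorithm behind the slide's 85/10/5 split: here the router exploits the current best model 85% of the time and otherwise picks a model uniformly at random. The class name, the simulated click-through rates, and the seeds are assumptions for illustration.

```python
import random


class EpsilonGreedyRouter:
    """Route each request to a model, mostly exploiting the best one seen so far."""

    def __init__(self, n_models, epsilon=0.15, seed=None):
        self.epsilon = epsilon
        self.counts = [0] * n_models      # requests served per model
        self.rewards = [0.0] * n_models   # total reward observed per model
        self.rng = random.Random(seed)

    def mean_reward(self, index):
        return self.rewards[index] / self.counts[index] if self.counts[index] else 0.0

    def choose(self):
        if self.rng.random() < self.epsilon:
            # Exploration: try any model, including ones that currently look worse.
            return self.rng.randrange(len(self.counts))
        # Exploitation: use the model with the best observed mean reward.
        return max(range(len(self.counts)), key=self.mean_reward)

    def update(self, index, reward):
        self.counts[index] += 1
        self.rewards[index] += reward


# Simulated environment: hypothetical true click-through rates for three models.
true_ctr = [0.10, 0.30, 0.20]
users = random.Random(1)
router = EpsilonGreedyRouter(n_models=3, epsilon=0.15, seed=0)

for _ in range(5000):
    chosen = router.choose()
    clicked = 1.0 if users.random() < true_ctr[chosen] else 0.0
    router.update(chosen, clicked)

print(router.counts)
```

Unlike an A/B test, there is no separate decision step: traffic shifts toward the better model as evidence accumulates, and the exploration share keeps checking whether that is still true.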
MAB vs. A/B testing
Why MAB?
• Continuous optimization, “set and forget”
• Maximize overall reward
Why A/B test?
• Simple to understand
• Single winner
• Tricky to do right
Other production considerations
• Versioning
• Logging
• Provenance
• Dashboards
• Reports
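Versioning, logging, and provenance can start as simply as recording which model produced each prediction and which data it was trained on. The sketch below is one possible record layout; the field names, the `fingerprint` helper, and the example values are all hypothetical.

```python
import hashlib
import json
import time


def fingerprint(data: bytes) -> str:
    """Short content hash used to identify the exact training data (provenance)."""
    return hashlib.sha256(data).hexdigest()[:12]


def prediction_record(model_name, model_version, training_data_id, features, prediction):
    """Build one JSON log line tying a prediction to its model version and training data."""
    return json.dumps(
        {
            "timestamp": time.time(),
            "model_name": model_name,
            "model_version": model_version,
            "training_data_id": training_data_id,
            "features": features,
            "prediction": prediction,
        },
        sort_keys=True,
    )


# Hypothetical example values.
data_id = fingerprint(b"user_id,item_id,rating\n1,42,5\n")
line = prediction_record("recommender", "1.2.0", data_id, {"user_id": 1}, [42, 7, 13])
print(line)
```

Log lines in this shape can feed the dashboards and reports listed above, since every metric can be grouped by model version.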
“Machine Learning: The High Interest Credit Card of Technical Debt,” D. Sculley et al., Google, 2014
“Two big challenges in machine learning,” Léon Bottou, ICML 2015 invited talk
Conclusions
Evaluation
Monitoring
Deployment
Management
Dato Distributed & Dato Predictive Services
A/B testing, multi-armed bandits & much more
Dato – one stop shop for all stages of the ML life cycle
Simple, platform agnostic interface
@datoinc, #DataSmt

Production and Beyond: Deploying and Managing Machine Learning Models