MLflow: A Platform for Production Machine Learning
1. MLflow: A Platform for Production Machine Learning
Matei Zaharia
Databricks and Stanford University
@matei_zaharia
2. ML in Production is Different from ML Research
ML Research & Courses
• Focus: designing a good model
• Data is provided and ready to use (e.g. benchmark dataset)
• No need to deploy, monitor, retrain
• Tools for model design & evaluation (e.g. TensorFlow, PyTorch, …)
ML Products
• Focus: reliably solving a business problem
• Data is often the top challenge (for models, try many common ones)
• Must continuously deploy, monitor & retrain models to maintain quality
• Need new tools to enable this process! (reproducibility, monitoring, …)
3. Response: ML Platforms
Facebook FBLearner, Uber Michelangelo, Google TFX, …
+ Standardize the data prep / training / deploy cycle: if you work within the platform, you get these!
– Limited to a few algorithms or frameworks
– Tied to each company’s infrastructure
Can we provide similar benefits in an open manner?
4. Open source machine learning platform
• Works with any ML library, algorithm, language, etc.
• Open interface design (use with any code you already have)
• Tracking: record and query experiments (code, data, confs, results)
• Projects: packaging format for reproducible runs and workflows
• Models: general format that standardizes deployment paths
• Model Registry (new): centralized model management, review & sharing
5. Community
158 contributors from >50 companies
• Integrated in RStudio, Azure ML, Faculty.ai, Neptune, Splice
900k downloads/month on PyPI
12. Interesting MLflow Use Cases
1) Massive number of independent models
• Company wants to train a separate model for each {facility, chemical processing machine, household, …}
• Solution: large Spark job that runs an AutoML library for each task + MLflow for managing & selecting models
• ML scientists can’t look at each model ⇒ need hands-free ML!
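The "hands-free" selection step above can be illustrated with a small sketch. It is only a stand-in for the setup the slide describes: it uses no Spark, AutoML library, or MLflow, and the entity names, data points, and the two candidate models are hypothetical. In the real setup, each candidate would presumably be logged as a tracked run rather than kept in a dict.

```python
# Toy stand-in for use case 1: one model per entity, chosen automatically.
# All names and data below are hypothetical illustration values.
from statistics import mean


def fit_mean(train):
    """Candidate 1: always predict the training mean of y."""
    m = mean(y for _, y in train)
    return lambda x: m


def fit_line(train):
    """Candidate 2: least-squares line y = a*x + b."""
    xs = [x for x, _ in train]
    ys = [y for _, y in train]
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx if sxx else 0.0
    b = my - a * mx
    return lambda x: a * x + b


CANDIDATES = {"mean": fit_mean, "line": fit_line}


def mse(model, data):
    """Mean squared error of a model on (x, y) pairs."""
    return mean((model(x) - y) ** 2 for x, y in data)


def select_models(data_by_entity, holdout=2):
    """Fit every candidate per entity; keep the one with the lowest validation MSE."""
    best = {}
    for entity, points in data_by_entity.items():
        train, valid = points[:-holdout], points[-holdout:]
        scores = {name: mse(fit(train), valid) for name, fit in CANDIDATES.items()}
        winner = min(scores, key=scores.get)
        best[entity] = {"model": winner, "val_mse": scores[winner]}
    return best


data_by_entity = {
    "facility_a": [(0, 1.0), (1, 3.0), (2, 5.0), (3, 7.0), (4, 9.0), (5, 11.0)],  # linear trend
    "facility_b": [(0, 4.0), (1, 4.0), (2, 4.0), (3, 4.0), (4, 4.0), (5, 4.0)],  # flat
}
selected = select_models(data_by_entity)
for entity, info in selected.items():
    print(entity, info)
```

No person inspects any individual model here: the validation metric alone decides which candidate is kept for each entity, which is the property the use case needs at scale.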
14. Interesting MLflow Use Cases
2) Big data analytics on model training results
• ML developer wants to analyze the result of multiple runs interactively, possibly slicing across data points
• Solution: Pandas & SQL interfaces to MLflow tracking data
df = mlflow.search_runs([experiment_id], "metrics.loss < 2.5")
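By default `mlflow.search_runs` returns a pandas DataFrame whose columns follow the `metrics.<name>` / `params.<name>` naming used in the filter string. The sketch below mimics that layout with plain Python dicts so the filtering and slicing logic can be read without a tracking server. The run IDs, the `lr` parameter, and all values are made up for illustration.

```python
# Plain-Python stand-in for slicing tracking data. The run records below are
# hypothetical illustration values, not real MLflow output.
from collections import defaultdict

runs = [
    # Parameter values are kept as strings, as MLflow stores them.
    {"run_id": "r1", "params.lr": "0.1", "metrics.loss": 3.1},
    {"run_id": "r2", "params.lr": "0.01", "metrics.loss": 2.2},
    {"run_id": "r3", "params.lr": "0.01", "metrics.loss": 1.9},
    {"run_id": "r4", "params.lr": "0.001", "metrics.loss": 2.4},
]

# Same condition as the filter string "metrics.loss < 2.5".
good = [r for r in runs if r["metrics.loss"] < 2.5]

# Slice the surviving runs by a parameter and keep the best loss per value.
best_loss_by_lr = defaultdict(lambda: float("inf"))
for r in good:
    lr = r["params.lr"]
    best_loss_by_lr[lr] = min(best_loss_by_lr[lr], r["metrics.loss"])

print([r["run_id"] for r in good])
print(dict(best_loss_by_lr))
```

With the real DataFrame, the same two steps would typically be a boolean mask followed by a group-by on the parameter column.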
15. Conclusion
Turning ML into reliable products is hard and requires a new class of systems (ML Platforms)
Try MLflow at mlflow.org
Join the MLOps workshop at MLSys 2020