MLflow: A Platform for Production Machine Learning
Matei Zaharia
Databricks and Stanford University
@matei_zaharia
ML in Production is Different from ML Research

ML Products:
• Focus: reliably solving a business problem
• Data is often the top challenge (for models, try many common ones)
• Must continuously deploy, monitor & retrain models to maintain quality
• Need new tools to enable this process! (reproducibility, monitoring, …)

ML Research & Courses:
• Focus: designing a good model
• Data is provided and ready to use (e.g. benchmark dataset)
• No need to deploy, monitor, or retrain
• Tools for model design & evaluation (e.g. TensorFlow, PyTorch, …)
Response: ML Platforms
Facebook FBLearner, Uber Michelangelo, Google TFX, …
+ Standardize the data prep / training / deploy cycle: if you work within the platform, you get these benefits
– Limited to a few algorithms or frameworks
– Tied to each company’s infrastructure
Can we provide similar benefits in an open manner?
MLflow: an open source machine learning platform
• Works with any ML library, algorithm, language, etc.
• Open interface design (use with any code you already have)

Components:
• Tracking: record and query experiments (code, data, configs, results)
• Projects: packaging format for reproducible runs and workflows
• Models: general format that standardizes deployment paths
• Model Registry (new): centralized model management, review & sharing
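As a taste of the Models component, any logged model loads behind the same Python interface regardless of the framework that produced it. A minimal sketch, assuming run_id refers to a run that logged a model under the artifact path "model":

import mlflow.pyfunc

# The same URI scheme and predict() call work for Keras, sklearn, and so on.
model = mlflow.pyfunc.load_model(f"runs:/{run_id}/model")
predictions = model.predict(input_df)  # input_df is a pandas DataFrame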
Community
• 158 contributors from >50 companies
• Integrated in RStudio, Azure ML, Faculty.ai, Neptune, Splice
• 900k downloads/month on PyPI
MLflow Tracking
Track parameters, metrics, output files & code version:

import mlflow

data = load_text(file)
ngrams = extract_ngrams(data, N=n)
model = train_model(ngrams, learning_rate=lr)
score = compute_accuracy(model)

mlflow.log_param("data_file", file)
mlflow.log_param("n", n)
mlflow.log_param("learning_rate", lr)
mlflow.log_metric("score", score)
mlflow.keras.log_model(model, "model")  # "model" is the artifact path

Browse the logged runs in the web UI:

$ mlflow ui
Alternatively, for supported libraries a single call records the same parameters, metrics, and model automatically, replacing the manual logging above:

mlflow.keras.autolog()
MLflow UI: Inspecting Runs
MLflow Model Registry
GitHub-like environment for organizing & reviewing models

[Diagram: model developers push models into the Model Registry, where reviewers and CI/CD tools vet them before downstream users and REST serving consume them]
Released in MLflow 1.4
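A minimal sketch of the Registry workflow through the Python API (the model name is hypothetical, and the stage-transition call assumes a recent MLflow release):

import mlflow
from mlflow.tracking import MlflowClient

# Register a logged model under a central name; each call creates a new version.
result = mlflow.register_model(f"runs:/{run_id}/model", "ChurnModel")

# After review, promote the new version to Production.
client = MlflowClient()
client.transition_model_version_stage(
    name="ChurnModel", version=result.version, stage="Production")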
Interesting MLflow Use Cases
1) Massive number of independent models
• Company wants to train a separate model for each {facility, chemical processing machine, household, …}
• Solution: a large Spark job that runs an AutoML library for each task, plus MLflow for managing & selecting models (see the sketch below)
• ML scientists can’t look at each model ⇒ need hands-free ML!

Example: millions of models trained on terabytes of data per day
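A minimal sketch of this pattern in PySpark (train_best_model is a hypothetical stand-in for the AutoML library, df is assumed to be a Spark DataFrame keyed by device_id, and applyInPandas requires Spark 3.x):

import mlflow
import pandas as pd

def train_one(pdf: pd.DataFrame) -> pd.DataFrame:
    # Fit one model on a single device's data (hypothetical AutoML helper).
    device = pdf["device_id"].iloc[0]
    model, score = train_best_model(pdf)
    # Log each model as its own MLflow run so the best ones can be queried later.
    with mlflow.start_run():
        mlflow.log_param("device_id", device)
        mlflow.log_metric("score", score)
        mlflow.sklearn.log_model(model, "model")
    return pd.DataFrame({"device_id": [device], "score": [score]})

# One model per device, trained in parallel across the cluster.
results = (df.groupBy("device_id")
             .applyInPandas(train_one, schema="device_id string, score double"))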
Interesting MLflow Use Cases
2) Big data analytics on model training results
• ML developer wants to analyze the results of multiple runs interactively, possibly slicing across data points
• Solution: Pandas & SQL interfaces to MLflow tracking data, e.g.:

df = mlflow.search_runs([experiment_id], "metrics.loss < 2.5")
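The returned object is an ordinary pandas DataFrame, so runs can be sliced and aggregated interactively. A short sketch (column names follow MLflow's params./metrics. prefixing; the grouping key is just an example):

import mlflow

# One row per run matching the filter, with params and metrics as columns.
df = mlflow.search_runs([experiment_id], "metrics.loss < 2.5")

# e.g. find the best loss achieved for each learning rate tried.
best = df.groupby("params.learning_rate")["metrics.loss"].min()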
Conclusion
Turning ML into reliable products is hard and requires a new class of systems (ML platforms)
Try MLflow at mlflow.org
Join the MLOps workshop at MLSys 2020
