MLflow: Accelerating the End-to-End ML Lifecycle
Matei Zaharia
@matei_zaharia
About Databricks
Founded in 2013 by the original creators of Apache Spark
Unified analytics platform for data
science, engineering and AI
• Optimized Spark and ML runtime
• Collaborative & secure workspace
Apache Spark, Spark and Apache are trademarks of the Apache Software Foundation
Some of Our Customers
Financial Services, Healthcare & Pharma, Media & Entertainment, Technology, Public Sector, Retail & CPG, Consumer Services, Energy & Industrial IoT, Marketing & AdTech, Data & Analytics Services
Correlate the EMRs of 50,000 patients with their DNA
Provide recommendations to sales
using NLP and deep learning
Curb abusive behavior
across gamers globally
Machine Learning Development is Complex
ML Lifecycle
[Diagram: Raw Data (Delta) → Data Prep → Training (μ, λ, θ tuning) → Deploy, with scaling challenges at every stage, plus model exchange and governance across the lifecycle]
Example
“I build 100s of models/day to lift revenue, using any library: MLlib, PyTorch, R, etc. There’s no easy way to see what data went into a model from a week ago and rebuild it.”
-- Chief scientist at an ad tech firm

Example
“Our company has 100 teams using ML worldwide. We can’t share work across them: when a new team tries to run some code, it doesn’t even give the same result.”
-- Large consumer electronics firm
Custom ML Platforms
Facebook FBLearner, Uber Michelangelo, Google TFX
+ Standardize the data prep / training / deploy cycle: if you work within the platform, you get these benefits!
– Limited to a few algorithms or frameworks
– Tied to one company’s infrastructure
Can we provide similar benefits in an open manner?
Introducing MLflow (www.mlflow.org)
Open source machine learning platform
• Works with any ML library & language
• Runs the same way everywhere & cross-cloud
• Scales to big data with Apache Spark
Launched this June
• Already >50 contributors and many new features!
MLflow Design Philosophy
1. API-first, “open-interface” platform
• Allow submitting runs, models, etc. from any library & language
– Example: a “model” can just be a Python lambda function (see the sketch after this list)
• Key enabler: built around REST APIs and a CLI
2. Modular design
• Let users mix & match components in their existing workflows
• Key enabler: multiple components that can be used separately
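For example, a minimal sketch of the “model as a Python function” idea using the mlflow.pyfunc API (the PythonModel class shown here was added to MLflow after this talk, and AddN is an invented example):

import mlflow
import mlflow.pyfunc

# An MLflow "model" can be arbitrary Python code wrapped in the pyfunc interface
class AddN(mlflow.pyfunc.PythonModel):
    def __init__(self, n):
        self.n = n

    def predict(self, context, model_input):
        # model_input arrives as a pandas DataFrame; add n to every value
        return model_input.apply(lambda column: column + self.n)

with mlflow.start_run():
    # Downstream tools only see the generic python_function flavor
    mlflow.pyfunc.log_model(artifact_path="add_n_model", python_model=AddN(n=5))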
MLflow Components
• Tracking: record and query experiments (code, data, config, results)
• Projects: package workflows into reproducible and reusable steps
• Models: model packaging format for diverse deployment tools
Model Development without MLflow
data = load_text(file)
ngrams = extract_ngrams(data, N=n)
model = train_model(ngrams, learning_rate=lr)
score = compute_accuracy(model)
print("For n=%d, lr=%f: accuracy=%f" % (n, lr, score))
pickle.dump(model, open("model.pkl", "wb"))
For n=2, lr=0.1: accuracy=0.71
For n=2, lr=0.2: accuracy=0.79
For n=2, lr=0.5: accuracy=0.83
For n=2, lr=0.9: accuracy=0.79
For n=3, lr=0.1: accuracy=0.83
For n=3, lr=0.2: accuracy=0.82
For n=4, lr=0.5: accuracy=0.75
...
What if I expand
the input data?
What if I tune this
other parameter?
What if I upgrade
my ML library?
What version of
my code was this
result from?
Model Deployment without MLflow
Code & Models
DATA SCIENTIST → PRODUCTION ENGINEER
Please deploy this
scikit-learn model!
Please deploy this
Spark model!
Please deploy this
R model!
Please deploy this
TensorFlow model!
Please deploy this
ArXiv paper!
…
Model Development with MLflow
data = load_text(file)
ngrams = extract_ngrams(data, N=n)
model = train_model(ngrams, learning_rate=lr)
score = compute_accuracy(model)
mlflow.log_param("data_file", file)
mlflow.log_param("n", n)
mlflow.log_param("learning_rate", lr)
mlflow.log_metric("score", score)
mlflow.sklearn.log_model(model, "model")
Track parameters, metrics, output files & code version
Search runs using the UI ($ mlflow ui) or the API
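A self-contained sketch of the same tracking pattern, for reference; the scikit-learn pipeline and the use of C as a stand-in for the learning rate are illustrative choices, not from the talk, and only the mlflow.* calls mirror the slide:

import mlflow
import mlflow.sklearn
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

def train_and_log(texts, labels, n, lr):
    X_train, X_test, y_train, y_test = train_test_split(texts, labels, random_state=0)
    model = make_pipeline(
        TfidfVectorizer(ngram_range=(1, n)),      # n-gram features
        LogisticRegression(C=lr, max_iter=1000),  # C stands in for the learning rate here
    )
    with mlflow.start_run():
        model.fit(X_train, y_train)
        score = model.score(X_test, y_test)
        mlflow.log_param("n", n)
        mlflow.log_param("learning_rate", lr)
        mlflow.log_metric("score", score)
        mlflow.sklearn.log_model(model, "model")
    return score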
MLflow UI: Inspecting Runs
MLflow UI: Comparing Runs
Packaging Code: MLflow Projects
[Diagram: a project spec packages code, config, and dependencies; the same project can run via local execution or on a remote cluster]
$ mlflow run git://...
Example MLflow Project
my_project/
├── MLproject
├── conda.yaml
├── main.py
└── model.py

MLproject file:
conda_env: conda.yaml
entry_points:
  main:
    parameters:
      training_data: path
      lambda: {type: float, default: 0.1}
    command: python main.py {training_data} {lambda}

Run it from the CLI or the Python API:
$ mlflow run git://<my_project>
mlflow.run("git://<my_project>", ...)
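A minimal, hypothetical main.py consistent with the command line declared above (the argument parsing and placeholder metric are invented for illustration):

# main.py: receives {training_data} and {lambda} from the MLproject command
import sys
import mlflow

def main(training_data, lam):
    # When launched via `mlflow run`, a run already exists and the project
    # parameters are logged; start_run() attaches to that active run
    with mlflow.start_run():
        # ... load training_data, fit a model with regularization strength lam ...
        mlflow.log_metric("score", 0.0)  # placeholder metric for this sketch

if __name__ == "__main__":
    main(sys.argv[1], float(sys.argv[2]))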
Packaging Models: MLflow Models
[Diagram: training apps (e.g. MLlib, ...) save models in a standard packaging format that supports multiple "flavors" (e.g. Python function, ONNX); inference code, batch & stream scoring, and REST serving tools consume whichever flavor they understand]
Example MLflow Model
my_model/
├── MLmodel
└── estimator/
    ├── saved_model.pb
    └── variables/
        ...

MLmodel file:
run_id: 769915006efd4c4bbd662461
time_created: 2018-06-28T12:34
flavors:
  tensorflow:
    saved_model_dir: estimator
    signature_def_key: predict
  python_function:
    loader_module: mlflow.tensorflow

tensorflow flavor: usable by tools that understand the TensorFlow model format
python_function flavor: usable by any tool that can run Python (Docker, Spark, etc.!)

$ mlflow pyfunc serve -r <run_id>
spark_udf = pyfunc.spark_udf(<run_id>)
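As a hedged illustration of consuming the python_function flavor with today's API (the model URI form and the input DataFrame below are assumptions; the exact calls have shifted slightly since the 0.8-era syntax on the slide):

import mlflow.pyfunc
import pandas as pd

# Load through the generic pyfunc interface, regardless of the underlying library
model_uri = "runs:/<run_id>/model"  # assumed artifact path; substitute the real run ID
model = mlflow.pyfunc.load_model(model_uri)
predictions = model.predict(pd.DataFrame({"feature": [1.0, 2.0, 3.0]}))

# The same model can also be applied for batch scoring as a Spark UDF:
# from pyspark.sql import SparkSession
# spark = SparkSession.builder.getOrCreate()
# udf = mlflow.pyfunc.spark_udf(spark, model_uri)
# scored = spark_df.withColumn("prediction", udf("feature"))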
Model Deployment with MLflow
DATA SCIENTIST ↔ PRODUCTION ENGINEER
Please deploy this
MLflow Model!
OK, it’s up in our REST
server & Spark!
Please run this
MLflow Project
nightly for updates!
Don’t even tell me
what ArXiv paper
that’s from...
MLflow Development Status
Many new features since our release in June
• Model packaging for MLlib, H2O, TensorFlow, PyTorch, Keras
• R API (contributed by RStudio)
• Java & Scala API
• Storage backends: Azure, AWS, Google, (S)FTP
Just released MLflow 0.8.0
• Multi-step workflow UI
• Compact table view
• Azure ML Serving Workspace deploys
Early MLflow Use Cases
Early users include a European energy company, a marketplace, and an online retailer:
• MLflow Tracking: build & monitor hundreds of models for entities in an energy grid
• MLflow Projects: package and run reproducible deep learning jobs in the cloud
• MLflow Models: package and deploy a recommendation model + custom logic
Conclusion
Machine learning platforms can simplify ML development
for both data scientists and engineers
To get started with MLflow, just pip install mlflow
Docs & tutorials at mlflow.org
+ See Jules Damji’s talk tomorrow at 11!
