Montreal | March 27, 2020
Platform for Complete Machine
Learning Lifecycle
Jules S. Damji
@2twitme
PyData Montreal
Apache Spark Developer & Community Advocate @ Databricks
Developer Advocate @ Hortonworks
Software engineering @ Sun Microsystems, Netscape, @Home,
Excite@Home, VeriSign, Scalix, Centrify, LoudCloud/Opsware, ProQuest
Program Co-chair Spark + AI Summit
https://www.linkedin.com/in/dmatrix
@2twitme
$ whoami
Accelerate innovation by unifying data science,
engineering and business to solve data problems
• Original creators of
• 2000+ global companies use our platform across big data &
machine learning lifecycle
VISION
WHO WE ARE
SOLUTION: Unified Data Analytics Platform
Koalas
Outline
• Overview of ML development challenges
• How MLflow tackles these
• MLflow Components
• MLflow Tracking, Projects, Models &
Registry
• Managed MLflow Demo
• Q & A
Machine Learning
Development is Complex
Traditional Software Machine Learning
Goal: Optimize a metric (e.g., accuracy)
• Constantly experiment to improve it
Quality depends on input data
and tuning parameters
Compare + combine many libraries,
models & algorithms for the same task
Goal: Meet a functional specification
Quality depends only on code
Typically pick one software stack
ML Lifecycle
[Diagram: Raw Data → Data Prep (Delta) → Training → Deploy, with tuning (μ, λ, θ) and scale at each stage, plus Model Exchange and Governance spanning the loop]
Custom ML Platforms
Some Big Data Companies
+Standardize the data prep / training / deploy loop:
if you work with the platform, you get these!
–Limited to a few algorithms or frameworks
–Tied to one company’s infrastructure
–Out of luck if you leave the company…
Can we provide similar benefits in an open manner?
Introducing
Open machine learning platform
• Works with popular ML libraries & languages
• Runs the same way anywhere (e.g., any cloud or locally)
• Designed to be useful for 1 or 1000+ person orgs
• Simple. Modular. Easy to use.
• Offers a positive developer experience to get started!
MLflow Design Philosophy
“API-first”
• Submit runs, log models, metrics, etc. from popular libraries & languages
• Abstract “model” lambda function that MLflow can then deploy in many places (Docker, Azure ML, Spark UDF)
• Open interface allows easy integration from the community
Key enabler: built around programmatic APIs, REST APIs & CLI
Modular design
• Allow different components to be used individually (e.g., use MLflow’s project format but not its deployment tools)
• Not monolithic, but distinct and selective
Key enabler: distinct components (Tracking/Projects/Models/Registry)
MLflow Components
Tracking
Record and query
experiments: code,
data, config, and
results
Projects
Package data
science code in a
format that enables
reproducible runs
on any platform
Models
Deploy machine
learning models in
diverse serving
environments
Model
Registry
Store, annotate
and manage
models in a
central repository
new
mlflow.org | github.com/mlflow | twitter.com/MLflow | databricks.com/mlflow
Key Concepts in MLflow Tracking
Parameters: key-value inputs to your code
Metrics: numeric values (can update over time)
Tags and Notes: information about a run
Artifacts: files, data, and models
Source: what code ran?
Version: which version of the code ran?
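The concepts above can be made concrete with a toy run record (illustrative only, not MLflow's internal representation); the metric values echo the example scores used later in this talk:

```python
# Toy run record (illustrative only; not MLflow's internal format), mapping
# each tracking concept to the kind of data it holds.
run = {
    "params":    {"n": 2, "learn_rate": 0.1},    # key-value inputs to your code
    "metrics":   {"score": [0.71, 0.79, 0.83]},  # numeric values, can update over time
    "tags":      {"phase": "baseline"},          # information about a run
    "artifacts": ["model.pkl", "plot.png"],      # files, data, and models
    "source":    "train.py",                     # what code ran
    "version":   "git:a1b2c3d",                  # which version of the code
}

# Comparing runs then becomes a data query, e.g. the best score seen so far:
best = max(run["metrics"]["score"])
print(best)
```

Once runs are stored as structured records like this, "which run was best?" is a query instead of archaeology.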
Model Development without MLflow
data = load_text(file)
ngrams = extract_ngrams(data, N=n)
model = train_model(ngrams, learning_rate=lr)
score = compute_accuracy(model)
print("For n=%d, lr=%f: accuracy=%f" % (n, lr, score))
pickle.dump(model, open("model.pkl", "wb"))
For n=2, lr=0.1: accuracy=0.71
For n=2, lr=0.2: accuracy=0.79
For n=2, lr=0.5: accuracy=0.83
For n=2, lr=0.9: accuracy=0.79
For n=3, lr=0.1: accuracy=0.83
For n=3, lr=0.2: accuracy=0.82
For n=4, lr=0.5: accuracy=0.75
...
What if I expand
the input data?
What if I tune this
other parameter?
What if I upgrade
my ML library?
What version of
my code was this
result from?
MLflow Tracking API: Simple & Pythonic!
Tracking
Record and query
experiments: code,
configs, results,
etc.
import mlflow
import mlflow.tensorflow

# log model's tuning parameters
with mlflow.start_run() as run:
    mlflow.log_param("layers", layers)
    mlflow.log_param("alpha", alpha)
    # log metrics and model
    mlflow.log_metric("mse", model.mse())
    mlflow.log_artifact("plot", model.plot(test_df))
    mlflow.tensorflow.log_model(model)
$ mlflow ui
Model Development with MLflow is Simple!
data = load_text(file)
ngrams = extract_ngrams(data, N=n)
model = train_model(ngrams, learning_rate=lr)
score = compute_accuracy(model)

with mlflow.start_run() as run:
    mlflow.log_param("data_file", file)
    mlflow.log_param("n", n)
    mlflow.log_param("learn_rate", lr)
    mlflow.log_metric("score", score)
    mlflow.sklearn.log_model(model)
Track parameters, metrics,
output files & code version
Search using UI or API
[Diagram: Notebooks, Local Apps, and Cloud Jobs log to MLflow Tracking through the Python, Java, R, or REST APIs; the Tracking Server exposes a UI and an API over its stored Parameters, Metrics, Artifacts, Metadata, and Models, and Spark can read tracking data as a data source]

$ export MLFLOW_TRACKING_URI=<URI>
mlflow.set_tracking_uri(URI)
MLflow Tracking Backend Stores
1. Entity (Metadata) Store
• FileStore (local filesystem)
• SQLStore (via SQLAlchemy)
• PostgreSQL, MySQL, SQLite
2. Artifact Store
• S3 backed store
• Azure Blob storage
• Google Cloud storage
• DBFS artifact repo
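Which entity store MLflow uses follows from the tracking URI's scheme. A small sketch of that mapping (this paraphrases the documented behavior; it is not MLflow's actual resolution code):

```python
# Illustrative sketch: which backend store a tracking URI selects.
# (Paraphrases MLflow's behavior; not its actual resolution code.)
def entity_store_for(tracking_uri: str) -> str:
    scheme = tracking_uri.split("://", 1)[0] if "://" in tracking_uri else ""
    if scheme in ("sqlite", "postgresql", "mysql"):
        return "SQLStore (via SQLAlchemy)"   # e.g. sqlite:///mlflow.db
    if scheme in ("http", "https"):
        return "remote tracking server"      # REST calls to a tracking server
    return "FileStore (local filesystem)"    # default: an ./mlruns directory

print(entity_store_for("sqlite:///mlflow.db"))
print(entity_store_for("./mlruns"))
```

In practice you only set MLFLOW_TRACKING_URI (or call mlflow.set_tracking_uri) and the right store is picked for you.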
MLflow Components: Tracking | Projects | Models | Model Registry
MLflow Projects Motivation
Diverse set of tools
Diverse set of environments
Challenge: ML results difficult to reproduce
Projects
Package data science
code in a format that
enables reproducible
runs on any platform
MLflow Projects
[Diagram: a Project Spec packages Code, Config, Data, and Dependencies for both Local and Remote Execution]
Example MLflow Project
my_project/
├── MLproject
├── conda.yaml
├── main.py
└── model.py

MLproject file:
conda_env: conda.yaml
entry_points:
  main:
    parameters:
      training_data: path
      lambda: {type: float, default: 0.1}
    command: python main.py {training_data} {lambda}

$ mlflow run git://<my_project> -P lambda=0.2
$ mlflow run . -e main -P lambda=0.2
mlflow.run("git://<my_project>", ...)
Example MLflow Project
my_project/
├── MLproject
├── conda.yaml
├── main.py
└── model.py

conda.yaml file:
name: mlflow-env
channels:
  - defaults
dependencies:
  - python=3.7.3
  - scikit-learn=0.20.3
  - pip:
    - mlflow
    - cloudpickle==0.8.0
MLflow Components: Tracking | Projects | Models | Model Registry
MLflow Model Motivation
[Diagram: each ML framework needs its own inference code, batch & stream scoring, and serving tools — an N×M integration problem]

MLflow Models: a standard format for ML models
[Diagram: a Model Format with multiple Flavors sits between the ML frameworks and the downstream inference code, batch & stream scoring, and serving tools]
Example MLflow Model
my_model/
├── MLmodel
└── estimator/
    ├── saved_model.pb
    └── variables/

MLmodel file:
run_id: 769915006efd4c4bbd662461
time_created: 2018-06-28T12:34
flavors:
  tensorflow:
    saved_model_dir: estimator
    signature_def_key: predict
  python_function:
    loader_module: mlflow.tensorflow

mlflow.tensorflow.log_model(...)

tensorflow flavor: usable by tools that understand the TensorFlow model format
python_function flavor: usable by any tool that can run Python (Docker, Spark, etc.!)
Model Flavors Example
Train a model: mlflow.keras.log_model(...)
The saved Model Format exposes two flavors:
Flavor 1 (Pyfunc):
predict = mlflow.pyfunc.load_model(...)
predict(pandas.input_dataframe)
Flavor 2 (Keras):
model = mlflow.keras.load_model(...)
model.predict(keras.Input(...))
Model Flavors Example
predict = mlflow.pyfunc.load_model(model_uri)
predict(pandas.input_dataframe)
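The pyfunc flavor works because it reduces every model to one contract: predict(input) → output. A framework-free toy illustration of that idea (all names here are hypothetical, not MLflow APIs):

```python
# Toy illustration of the python_function ("pyfunc") idea: wrap any framework's
# model behind a uniform predict() so generic serving tools need no
# framework-specific code. All names here are hypothetical, not MLflow APIs.
class ToyPyfuncModel:
    def __init__(self, raw_model, preprocess):
        self.raw_model = raw_model
        self.preprocess = preprocess

    def predict(self, rows):
        # One uniform entry point, whatever library produced raw_model
        return [self.raw_model(self.preprocess(r)) for r in rows]

# A stand-in "framework" model: doubles a scaled feature
model = ToyPyfuncModel(raw_model=lambda x: 2 * x,
                       preprocess=lambda row: row["feature"] / 10)
print(model.predict([{"feature": 10}, {"feature": 25}]))
```

Because serving tools only ever see predict(), the same deployment path (Docker, Spark UDF, REST) works for a Keras, scikit-learn, or TensorFlow model.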
MLflow Components: Tracking | Projects | Models | Model Registry
The Model Management Problem
When you’re working on one ML app alone, storing your
models in files is manageable
MODEL
DEVELOPER classifier_v1.h5
classifier_v2.h5
classifier_v3_sept_19.h5
classifier_v3_new.h5
…
The Model Management Problem
When you work in a large organization with many models and
many data teams, model management becomes a major
challenge:
• Where can I find the best version of this model?
• How was this model trained?
• How can I track docs for each model?
• How can I review models?
MODEL
DEVELOPER
REVIEWER
MODEL
USER
???
Model Registry
VISION: Centralized and collaborative model lifecycle management
[Diagram: the Tracking Server stores Parameters, Metrics, Artifacts, Metadata, and Models; Data Scientists register models into the Model Registry, where versions move through Staging, Production, and Archived stages and Deployment Engineers consume them]
MLflow Model Registry
Repository of named, versioned
models with comments & tags
Track each model’s stage: none,
staging, production, or archived
Easily load a specific version
Provides Model lineage & activities
Model Registry Workflow API
Model Registry
MODEL
DEVELOPER
DOWNSTREAM
USERS
AUTOMATED JOBS
REST SERVING
REVIEWERS,
CI/CD TOOLS
mlflow.register_model("runs:/e148278107274..832/artifacts/model",
                      "WeatherForecastModel")

client = mlflow.tracking.MlflowClient()
client.transition_model_version_stage(name="WeatherForecastModel",
                                      version=5,
                                      stage="Production")

model_uri = "models:/{model_name}/production".format(
    model_name="WeatherForecastModel")
model_prod = mlflow.sklearn.load_model(model_uri)
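The models:/ URIs used when loading registered models follow a simple scheme — registered name plus either a version number or a stage. A small helper sketch that builds them (this helper is hypothetical, not an MLflow API):

```python
from typing import Optional

# Hypothetical helper (not an MLflow API): build "models:/" URIs for loading
# a registered model by explicit version or by stage (e.g. "production").
def registry_uri(name: str, version: Optional[int] = None,
                 stage: Optional[str] = None) -> str:
    if version is not None:
        return f"models:/{name}/{version}"   # pin an exact version
    if stage is not None:
        return f"models:/{name}/{stage}"     # track whatever is in that stage
    raise ValueError("need a version or a stage")

print(registry_uri("WeatherForecastModel", version=5))
print(registry_uri("WeatherForecastModel", stage="production"))
```

Pinning a version gives reproducible deployments; pointing at a stage lets downstream jobs pick up newly promoted models without a code change.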
What Did We Talk About?
Modular Components greatly simplify the ML lifecycle
• Available APIs: Python, Java & R (Soon Scala)
• Easy to install and use
• Develop & Deploy locally and track locally or remotely
• Visualize experiments and compare runs
• Centrally register and manage model lifecycle
[Chart: Project Contributors over Time — # of contributors vs. months since project launch, MLflow compared with Apache Spark; MLflow today at 179 contributors]
Learning More About MLflow
• pip install mlflow to get started
• Find docs & examples at mlflow.org
• https://github.com/mlflow/mlflow
• tinyurl.com/mlflow-slac
• dbricks.co/mlflow-tutorials
MLflow Demo
Fun & Fake Demo
Serious & Real Demo
Merci Montréal!
Q & A
jules@databricks.com
@2twitme
https://www.linkedin.com/in/dmatrix/

"Managing the Complete Machine Learning Lifecycle with MLflow"