Montreal | March 27, 2020
Platform for Complete Machine
Learning Lifecycle
Jules S. Damji
@2twitme
PyData Montreal
Apache Spark Developer & Community Advocate @ Databricks
Developer Advocate @ Hortonworks
Software engineering @ Sun Microsystems, Netscape, @Home,
Excite@Home, VeriSign, Scalix, Centrify, LoudCloud/Opsware, ProQuest
Program Co-chair Spark + AI Summit
https://www.linkedin.com/in/dmatrix
@2twitme
$ whoami
Accelerate innovation by unifying data science,
engineering and business to solve data problems
• Original creators of
• 2000+ global companies use our platform across big data &
machine learning lifecycle
VISION
WHO WE ARE
SOLUTION: Unified Data Analytics Platform
Koalas
Outline
• Overview of ML development challenges
• How MLflow tackles these
• MLflow Components
• MLflow Tracking, Projects, Models &
Registry
• Managed MLflow Demo
• Q & A
Machine Learning
Development is Complex
Traditional Software Machine Learning
Goal: Optimize a metric (e.g., accuracy)
• Constantly experiment to improve it
Quality depends on input data
and tuning parameters
Compare + combine many libraries,
models & algorithms for the same task
Goal: Meet a functional specification
Quality depends only on code
Typically pick one software stack
ML Lifecycle
[Diagram: Raw Data → Data Prep (Delta) → Training → Deploy, with tuning (μ, λ, θ) and scale at each stage, plus Model Exchange and Governance spanning the loop]
Custom ML Platforms
Some Big Data Companies
+Standardize the data prep / training / deploy loop:
if you work with the platform, you get these!
–Limited to a few algorithms or frameworks
–Tied to one company’s infrastructure
–Out of luck if you leave the company…
Can we provide similar benefits in an open manner?
Introducing
Open machine learning platform
• Works with popular ML libraries & languages
• Runs the same way anywhere (e.g., any cloud or locally)
• Designed to be useful for 1 or 1000+ person orgs
• Simple. Modular. Easy to use.
• Offers a positive developer experience to get started!
MLflow Design Philosophy
“API-first”
• Submit runs, log models, metrics, etc. from popular libraries & languages
• Abstract “model” lambda function that MLflow can then deploy in many places (Docker, Azure ML, Spark UDF)
• Open interface allows easy integration from the community
Key enabler: built around programmatic APIs, REST APIs & CLI
Modular design
• Allow different components to be used individually (e.g., use MLflow’s project format but not its deployment tools)
• Not monolithic, but distinct and selective
Key enabler: distinct components (Tracking/Projects/Models/Registry)
MLflow Components
Tracking
Record and query
experiments: code,
data, config, and
results
Projects
Package data
science code in a
format that enables
reproducible runs
on any platform
Models
Deploy machine
learning models in
diverse serving
environments
Model
Registry
Store, annotate
and manage
models in a
central repository
new
mlflow.org | github.com/mlflow | twitter.com/MLflow | databricks.com/mlflow
Key Concepts in MLflow Tracking
Parameters: key-value inputs to your code
Metrics: numeric values (can update over time)
Tags and Notes: information about a run
Artifacts: files, data, and models
Source: what code ran?
Version: which version of the code ran?
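The concepts above can be made concrete with a toy run record (illustrative only, not MLflow's internal representation); the metric values echo the example scores used later in this talk:

```python
# Toy run record (illustrative only; not MLflow's internal format), mapping
# each tracking concept to the kind of data it holds.
run = {
    "params":    {"n": 2, "learn_rate": 0.1},    # key-value inputs to your code
    "metrics":   {"score": [0.71, 0.79, 0.83]},  # numeric values, can update over time
    "tags":      {"phase": "baseline"},          # information about a run
    "artifacts": ["model.pkl", "plot.png"],      # files, data, and models
    "source":    "train.py",                     # what code ran
    "version":   "git:a1b2c3d",                  # which version of the code
}

# Comparing runs then becomes a data query, e.g. the best score seen so far:
best = max(run["metrics"]["score"])
print(best)
```

Once runs are stored as structured records like this, "which run was best?" is a query instead of archaeology.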
Model Development without MLflow
data = load_text(file)
ngrams = extract_ngrams(data, N=n)
model = train_model(ngrams, learning_rate=lr)
score = compute_accuracy(model)
print("For n=%d, lr=%f: accuracy=%f" % (n, lr, score))
pickle.dump(model, open("model.pkl", "wb"))
For n=2, lr=0.1: accuracy=0.71
For n=2, lr=0.2: accuracy=0.79
For n=2, lr=0.5: accuracy=0.83
For n=2, lr=0.9: accuracy=0.79
For n=3, lr=0.1: accuracy=0.83
For n=3, lr=0.2: accuracy=0.82
For n=4, lr=0.5: accuracy=0.75
...
What if I expand
the input data?
What if I tune this
other parameter?
What if I upgrade
my ML library?
What version of
my code was this
result from?
MLflow Tracking API: Simple & Pythonic!
Tracking
Record and query
experiments: code,
configs, results,
etc.
import mlflow
import mlflow.tensorflow

# log model's tuning parameters
with mlflow.start_run() as run:
    mlflow.log_param("layers", layers)
    mlflow.log_param("alpha", alpha)
    # log metrics and model
    mlflow.log_metric("mse", model.mse())
    mlflow.log_artifact("plot", model.plot(test_df))
    mlflow.tensorflow.log_model(model)
$ mlflow ui
Model Development with MLflow is Simple!
data = load_text(file)
ngrams = extract_ngrams(data, N=n)
model = train_model(ngrams, learning_rate=lr)
score = compute_accuracy(model)

with mlflow.start_run() as run:
    mlflow.log_param("data_file", file)
    mlflow.log_param("n", n)
    mlflow.log_param("learn_rate", lr)
    mlflow.log_metric("score", score)
    mlflow.sklearn.log_model(model)
Track parameters, metrics,
output files & code version
Search using UI or API
[Diagram: Notebooks, Local Apps, and Cloud Jobs log to MLflow Tracking through the Python, Java, R, or REST APIs; the Tracking Server exposes a UI and an API over its stored Parameters, Metrics, Artifacts, Metadata, and Models, and Spark can read tracking data as a data source]

$ export MLFLOW_TRACKING_URI=<URI>
mlflow.set_tracking_uri(URI)
MLflow Tracking Backend Stores
1. Entity (Metadata) Store
• FileStore (local filesystem)
• SQLStore (via SQLAlchemy)
• PostgreSQL, MySQL, SQLite
2. Artifact Store
• S3 backed store
• Azure Blob storage
• Google Cloud storage
• DBFS artifact repo
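Which entity store MLflow uses follows from the tracking URI's scheme. A small sketch of that mapping (this paraphrases the documented behavior; it is not MLflow's actual resolution code):

```python
# Illustrative sketch: which backend store a tracking URI selects.
# (Paraphrases MLflow's behavior; not its actual resolution code.)
def entity_store_for(tracking_uri: str) -> str:
    scheme = tracking_uri.split("://", 1)[0] if "://" in tracking_uri else ""
    if scheme in ("sqlite", "postgresql", "mysql"):
        return "SQLStore (via SQLAlchemy)"   # e.g. sqlite:///mlflow.db
    if scheme in ("http", "https"):
        return "remote tracking server"      # REST calls to a tracking server
    return "FileStore (local filesystem)"    # default: an ./mlruns directory

print(entity_store_for("sqlite:///mlflow.db"))
print(entity_store_for("./mlruns"))
```

In practice you only set MLFLOW_TRACKING_URI (or call mlflow.set_tracking_uri) and the right store is picked for you.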
MLflow Components: Tracking | Projects | Models | Model Registry
MLflow Projects Motivation
Diverse set of tools
Diverse set of environments
Challenge: ML results difficult to reproduce
Projects
Package data science
code in a format that
enables reproducible
runs on any platform
MLflow Projects
[Diagram: a Project Spec packages Code, Config, Data, and Dependencies for both Local and Remote Execution]
Example MLflow Project
my_project/
├── MLproject
├── conda.yaml
├── main.py
└── model.py

MLproject file:
conda_env: conda.yaml
entry_points:
  main:
    parameters:
      training_data: path
      lambda: {type: float, default: 0.1}
    command: python main.py {training_data} {lambda}

$ mlflow run git://<my_project> -P lambda=0.2
$ mlflow run . -e main -P lambda=0.2
mlflow.run("git://<my_project>", ...)
Example MLflow Project
my_project/
├── MLproject
├── conda.yaml
├── main.py
└── model.py

conda.yaml file:
name: mlflow-env
channels:
  - defaults
dependencies:
  - python=3.7.3
  - scikit-learn=0.20.3
  - pip:
    - mlflow
    - cloudpickle==0.8.0
MLflow Components: Tracking | Projects | Models | Model Registry
MLflow Model Motivation
[Diagram: each ML framework needs its own inference code, batch & stream scoring, and serving tools — an N×M integration problem]

MLflow Models: a standard format for ML models
[Diagram: a Model Format with multiple Flavors sits between the ML frameworks and the downstream inference code, batch & stream scoring, and serving tools]
Example MLflow Model
my_model/
├── MLmodel
└── estimator/
    ├── saved_model.pb
    └── variables/

MLmodel file:
run_id: 769915006efd4c4bbd662461
time_created: 2018-06-28T12:34
flavors:
  tensorflow:
    saved_model_dir: estimator
    signature_def_key: predict
  python_function:
    loader_module: mlflow.tensorflow

mlflow.tensorflow.log_model(...)

tensorflow flavor: usable by tools that understand the TensorFlow model format
python_function flavor: usable by any tool that can run Python (Docker, Spark, etc.!)
Model Flavors Example
Train a model: mlflow.keras.log_model(...)
The saved Model Format exposes two flavors:
Flavor 1 (Pyfunc):
predict = mlflow.pyfunc.load_model(...)
predict(pandas.input_dataframe)
Flavor 2 (Keras):
model = mlflow.keras.load_model(...)
model.predict(keras.Input(...))
Model Flavors Example
predict = mlflow.pyfunc.load_model(model_uri)
predict(pandas.input_dataframe)
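The pyfunc flavor works because it reduces every model to one contract: predict(input) → output. A framework-free toy illustration of that idea (all names here are hypothetical, not MLflow APIs):

```python
# Toy illustration of the python_function ("pyfunc") idea: wrap any framework's
# model behind a uniform predict() so generic serving tools need no
# framework-specific code. All names here are hypothetical, not MLflow APIs.
class ToyPyfuncModel:
    def __init__(self, raw_model, preprocess):
        self.raw_model = raw_model
        self.preprocess = preprocess

    def predict(self, rows):
        # One uniform entry point, whatever library produced raw_model
        return [self.raw_model(self.preprocess(r)) for r in rows]

# A stand-in "framework" model: doubles a scaled feature
model = ToyPyfuncModel(raw_model=lambda x: 2 * x,
                       preprocess=lambda row: row["feature"] / 10)
print(model.predict([{"feature": 10}, {"feature": 25}]))
```

Because serving tools only ever see predict(), the same deployment path (Docker, Spark UDF, REST) works for a Keras, scikit-learn, or TensorFlow model.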
MLflow Components: Tracking | Projects | Models | Model Registry
The Model Management Problem
When you’re working on one ML app alone, storing your
models in files is manageable
MODEL
DEVELOPER classifier_v1.h5
classifier_v2.h5
classifier_v3_sept_19.h5
classifier_v3_new.h5
…
The Model Management Problem
When you work in a large organization with many models and
many data teams, model management becomes a major
challenge:
• Where can I find the best version of this model?
• How was this model trained?
• How can I track docs for each model?
• How can I review models?
MODEL
DEVELOPER
REVIEWER
MODEL
USER
???
Model Registry
VISION: Centralized and collaborative model lifecycle management
[Diagram: the Tracking Server stores Parameters, Metrics, Artifacts, Metadata, and Models; Data Scientists register models into the Model Registry, where versions move through Staging, Production, and Archived stages and Deployment Engineers consume them]
MLflow Model Registry
Repository of named, versioned
models with comments & tags
Track each model’s stage: none,
staging, production, or archived
Easily load a specific version
Provides Model lineage & activities
Model Registry Workflow API
Model Registry
MODEL
DEVELOPER
DOWNSTREAM
USERS
AUTOMATED JOBS
REST SERVING
REVIEWERS,
CI/CD TOOLS
mlflow.register_model("runs:/e148278107274..832/artifacts/model",
                      "WeatherForecastModel")

client = mlflow.tracking.MlflowClient()
client.transition_model_version_stage(name="WeatherForecastModel",
                                      version=5,
                                      stage="Production")

model_uri = "models:/{model_name}/production".format(
    model_name="WeatherForecastModel")
model_prod = mlflow.sklearn.load_model(model_uri)
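The models:/ URIs used when loading registered models follow a simple scheme — registered name plus either a version number or a stage. A small helper sketch that builds them (this helper is hypothetical, not an MLflow API):

```python
from typing import Optional

# Hypothetical helper (not an MLflow API): build "models:/" URIs for loading
# a registered model by explicit version or by stage (e.g. "production").
def registry_uri(name: str, version: Optional[int] = None,
                 stage: Optional[str] = None) -> str:
    if version is not None:
        return f"models:/{name}/{version}"   # pin an exact version
    if stage is not None:
        return f"models:/{name}/{stage}"     # track whatever is in that stage
    raise ValueError("need a version or a stage")

print(registry_uri("WeatherForecastModel", version=5))
print(registry_uri("WeatherForecastModel", stage="production"))
```

Pinning a version gives reproducible deployments; pointing at a stage lets downstream jobs pick up newly promoted models without a code change.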
What Did We Talk About?
Modular Components greatly simplify the ML lifecycle
• Available APIs: Python, Java & R (Soon Scala)
• Easy to install and use
• Develop & Deploy locally and track locally or remotely
• Visualize experiments and compare runs
• Centrally register and manage model lifecycle
[Chart: Project Contributors over Time — # of contributors vs. months since project launch, MLflow compared with Apache Spark; MLflow today at 179 contributors]
Learning More About MLflow
• pip install mlflow to get started
• Find docs & examples at mlflow.org
• https://github.com/mlflow/mlflow
• tinyurl.com/mlflow-slac
• dbricks.co/mlflow-tutorials
MLflow Demo
Fun & Fake Demo
Serious & Real Demo
Merci Montréal!
Q & A
jules@databricks.com
@2twitme
https://www.linkedin.com/in/dmatrix/

"Managing the Complete Machine Learning Lifecycle with MLflow"