Standardising the Machine Learning Lifecycle on MLflow
Thunder Shiviah & Michael Shtelma
SAIS EUROPE - 2019
[Figure: the small "ML Code" box sits among much larger surrounding boxes: Configuration, Data Collection, Data Verification, Feature Extraction, Machine Resource Management, Analysis Tools, Process Management Tools, Serving Infrastructure, and Monitoring]
“Hidden Technical Debt in Machine Learning Systems,” Google NeurIPS 2015
Figure 1: Only a small fraction of real-world ML systems is composed of the ML code, as shown by the small green box in the middle. The required surrounding infrastructure is vast and complex.
Hardest Part of ML isn’t ML, it’s Data
Data & ML Tech and People are in Silos
DATA ENGINEERS × DATA SCIENTISTS
ML Lifecycle is Manual, Inconsistent, and Disconnected
Prep Data → Build Model → Deploy Model
● Ad hoc approach to tracking experiments
● Very hard to reproduce experiments
● Low-level integrations for data and ML
● Difficult to track the data used for a model
● Multiple tightly coupled deployment options
● Different monitoring approach for each framework
The need for
standardization
What is MLflow?
Unveiled in June 2018, MLflow is an open source framework to manage the
complete Machine Learning Lifecycle.
Tracking
Record and query
experiments: code,
data, config, results
Projects
Packaging format
for reproducible runs
on any platform
Models
General model format
that supports diverse
deployment tools
MLflow Momentum at a glance
Rapid Community Adoption
● Time to 74 contributors: Spark = 3 years; MLflow = 8 months
Experiment Tracking
MLflow Tracking
[Diagram: Notebooks, Local Apps, and Cloud Jobs log runs to the Tracking Server over the Python or REST API; the server exposes a UI and an API]
Key Concepts in Tracking
• Parameters: key-value inputs to your code
• Metrics: numeric values (can update over time)
• Artifacts: arbitrary files, including models
• Source: what code ran?
Experiment Tracking with Managed MLflow
Record runs, and keep track of model parameters, results, code, and data from each experiment in one place.
Provides:
● Pre-configured MLflow tracking server
● Databricks Workspace & Notebooks UI integration
● S3, Azure Blob Storage, or Google Cloud Storage for artifact storage
● Experiment management via role-based Access Control Lists (ACLs)
Reproducible Projects
MLflow Projects
[Diagram: a Project Spec bundles Code, Data, and Config, and can run via Local Execution or Remote Execution]
Example MLflow Project
my_project/
├── MLproject
├── conda.yaml
├── main.py
└── model.py

MLproject:
conda_env: conda.yaml
entry_points:
  main:
    parameters:
      training_data: path
      lambda: {type: float, default: 0.1}
    command: python main.py {training_data} {lambda}

$ mlflow run git://<my_project>
mlflow.run("git://<my_project>", ...)
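The {training_data} and {lambda} placeholders in the entry-point command are filled in from the declared parameters at run time. Conceptually this is plain string templating; a hypothetical sketch (not MLflow's actual implementation, and the data path is made up):

```python
# How an entry-point command template expands into a shell command.
# Mirrors MLproject parameter substitution conceptually only.
template = "python main.py {training_data} {lambda}"
params = {"training_data": "data/train.csv", "lambda": 0.1}  # default used if omitted
cmd = template.format(**params)
print(cmd)
```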
Reproducible Projects with Managed MLflow
Build composable projects,
capture dependencies and code
history for reproducible results,
and share projects with peers.
Provides:
● Support for Git, Conda, and
other file storage systems
● Remote execution via command line as a Databricks Job
Model Deployment
MLflow Models
[Diagram: a single Model Format holds multiple flavors (Flavor 1, Flavor 2); models come from Run Sources and feed Inference Code, Batch & Stream Scoring, and Cloud Serving Tools]
Simple model flavors usable by many tools
Example MLflow Model
my_model/
├── MLmodel
└── estimator/
    ├── saved_model.pb
    └── variables/

MLmodel:
run_id: 769915006efd4c4bbd662461
time_created: 2018-06-28T12:34
flavors:
  tensorflow:
    saved_model_dir: estimator
    signature_def_key: predict
  python_function:
    loader_module: mlflow.tensorflow

The tensorflow flavor is usable by tools that understand the TensorFlow model format; the python_function flavor is usable by any tool that can run Python (Docker, Spark, etc.).
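The flavors mechanism is just structured metadata: a deployment tool parses the MLmodel file and picks whichever flavor it understands. A minimal sketch of that dispatch, assuming PyYAML is available; the YAML mirrors the example above:

```python
import yaml

# The MLmodel metadata from the example above (abridged).
MLMODEL = """\
run_id: 769915006efd4c4bbd662461
flavors:
  tensorflow:
    saved_model_dir: estimator
    signature_def_key: predict
  python_function:
    loader_module: mlflow.tensorflow
"""

meta = yaml.safe_load(MLMODEL)
# A TensorFlow-aware tool reads its native flavor directly...
tf_dir = meta["flavors"]["tensorflow"]["saved_model_dir"]
# ...while a generic tool falls back to the python_function flavor.
loader = meta["flavors"]["python_function"]["loader_module"]
print(tf_dir, loader)
```

This is why one saved model can serve both a TensorFlow-specific scoring path and a generic Python one without being re-exported.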
Model Deployment with Managed MLflow
Quickly deploy models to any
platform based on your needs,
locally or in the cloud, from
experimentation to production.
Supports:
● Databricks Jobs and Clusters for
Production Model Operations
● Batch inference on Databricks (Apache Spark)
● REST endpoints via Docker containers, Azure ML, or SageMaker
What’s next?
Multi-step workflow GUI
https://databricks.com/sparkaisummit/north-america/2019-spark-summit-ai-keynotes-2#keynote-e
Model registry & deployment tracking
https://databricks.com/sparkaisummit/north-america/2019-spark-summit-ai-keynotes-2#keynote-e
Demo and exercise
Questions ?
Thank you!

Managing the Complete Machine Learning Lifecycle with MLflow
