SlideShare a Scribd company logo
: Accelerating the
End-to-End ML Lifecycle
Matei Zaharia
@matei_zaharia
2
About Databricks
Founded in 2013 by the original creators of Apache Spark
Unified analytics platform for data
science, engineering and AI
• Optimized Spark and ML runtime
• Collaborative & secure workspace
Apache Spark, Spark and Apache are trademarks of the Apache Software Foundation
Some of Our Customers
Financial	Services Healthcare	&	Pharma Media	&	Entertainment Technology
Public	Sector Retail	&	CPG		 Consumer	Services Energy	&	Industrial	IoTMarketing	&	AdTech
Data	&	Analytics	Services
Some of Our Customers
Financial	Services Healthcare	&	Pharma Media	&	Entertainment Technology
Public	Sector Retail	&	CPG		 Consumer	Services Energy	&	Industrial	IoTMarketing	&	AdTech
Data	&	Analytics	Services
Correlate EMR of 50,000 patients
compared with their DNA
Some of Our CustomersFinancial	Services Healthcare	&	Pharma Media	&	Entertainment Technology
Public	Sector Retail	&	CPG		 Consumer	Services Energy	&	Industrial	IoTMarketing	&	AdTech
Data	&	Analytics	Services
Provide recommendations to sales
using NLP and deep learning
Some of Our Customers
Financial	Services Healthcare	&	Pharma Media	&	Entertainment Technology
Public	Sector Retail	&	CPG		 Consumer	Services Energy	&	Industrial	IoTMarketing	&	AdTech
Data	&	Analytics	Services
Curb abusive behavior
across gamers globally
Machine Learning
Development is Complex
ML Lifecycle
8
Delta
Data Prep
Training
Deploy
Raw Data
μ
λ θ Tuning
Scale
μ
λ θ Tuning
Scale
Scale
Scale
Model
Exchange
Governance
“I build 100s of models/day to lift revenue, using any library:
MLlib, PyTorch, R, etc. There’s no easy way to see what data
went in a model from a week ago and rebuild it.”
-- Chief scientist at ad tech firm
Example
Example
“Our company has 100 teams using ML worldwide. We can’t
share work across them: when a new team tries to run some
code, it doesn’t even give the same result.”
-- Large consumer electronics firm
Custom ML Platforms
Facebook FBLearner, Uber Michelangelo, Google TFX
+Standardize the data prep / training / deploy cycle:
if you work within the platform, you get these!
– Limited to a few algorithms or frameworks
– Tied to one company’s infrastructure
Can we provide similar benefits in an open manner?
www.mlflow.org
Introducing
Open source machine learning platform
• Works with any ML library & language
• Runs the same way everywhere & cross-cloud
• Scales to big data with Apache Spark
Launched this June
• Already >50 contributors and many new features!
MLflow Design Philosophy
1. API-first, “open-interface” platform
• Allow submitting runs, models, etc from any library & language
– Example: a “model” can just be a Python lambda function
• Key enabler: built around REST APIs and CLI
2. Modular design
• Let users mix & match components in their existing workflows
• Key enabler: multiple components that can be used separately
MLflow Components
14
Tracking
Record and query
experiments: code,
data, config, results
Projects
Package workflows
into reproducible
and reusable steps
Models
Model packaging
format for diverse
deployment tools
Model Development without MLflow
data = load_text(file)
ngrams = extract_ngrams(data, N=n)
model = train_model(ngrams,
learning_rate=lr)
score = compute_accuracy(model)
print(“For n=%d, lr=%f: accuracy=%f”
% (n, lr, score))
pickle.dump(model, open(“model.pkl”))
For n=2, lr=0.1: accuracy=0.71
For n=2, lr=0.2: accuracy=0.79
For n=2, lr=0.5: accuracy=0.83
For n=2, lr=0.9: accuracy=0.79
For n=3, lr=0.1: accuracy=0.83
For n=3, lr=0.2: accuracy=0.82
For n=4, lr=0.5: accuracy=0.75
...
What if I expand
the input data?
What if I tune this
other parameter?
What if I upgrade
my ML library?
What version of
my code was this
result from?
Model Deployment without MLflow
Code & Models
DATA
SCIENTIST
PRODUCTION
ENGINEER
Please deploy this
SciKit model!
Please deploy this
Spark model!
Please deploy this
R model!
Please deploy this
TensorFlow model!
Please deploy this
ArXiv paper!
…
Model Development with MLflow
data = load_text(file)
ngrams = extract_ngrams(data, N=n)
model = train_model(ngrams,
learning_rate=lr)
score = compute_accuracy(model)
print(“For n=%d, lr=%f: accuracy=%f”
% (n, lr, score))
pickle.dump(model, open(“model.pkl”))
$ mlflow ui
Model Development with MLflow
data = load_text(file)
ngrams = extract_ngrams(data, N=n)
model = train_model(ngrams,
learning_rate=lr)
score = compute_accuracy(model)
mlflow.log_param(“data_file”, file)
mlflow.log_param(“n”, n)
mlflow.log_param(“learning_rate”, lr)
mlflow.log_metric(“score”, score)
mlflow.sklearn.log_model(model)
Track parameters, metrics,
output files & code version
Search using UI or API
MLflow UI: Inspecting Runs
MLflow UI: Comparing Runs
Project Spec
Code ConfigDeps
Local Execution
Remote Cluster
Packaging Code: MLflow Projects
$ mlflow run git://...
Example MLflow Project
my_project/
├── MLproject
│
│
│
│
│
├── conda.yaml
├── main.py
└── model.py
...
conda_env: conda.yaml
entry_points:
main:
parameters:
training_data: path
lambda: {type: float, default: 0.1}
command: python main.py {training_data} {lambda}
$ mlflow run git://<my_project>
mlflow.run(“git://<my_project>”, ...)
Model Format
ONNX Flavor
Python Flavor
Training Apps
Batch & Stream Scoring
REST Serving Tools
Packaging Models: MLflow Models
Packaging Format
. . .
Inference Code
MLlib
Example MLflow Model
my_model/
├── MLmodel
│
│
│
│
│
└── estimator/
├── saved_model.pb
└── variables/
...
Usable by tools that understand
TensorFlow model format
Usable by any tool that can run
Python (Docker, Spark, etc!)
run_id: 769915006efd4c4bbd662461
time_created: 2018-06-28T12:34
flavors:
tensorflow:
saved_model_dir: estimator
signature_def_key: predict
python_function:
loader_module: mlflow.tensorflow
$ mlflow pyfunc serve -r <run_id>
spark_udf = pyfunc.spark_udf(<run_id>)
Model Deployment with MLflow
DATA
SCIENTIST
PRODUCTION
ENGINEER
Please deploy this
MLflow Model!
OK, it’s up in our REST
server & Spark!
Please run this
MLflow Project
nightly for updates!
Don’t even tell me
what ArXiv paper
that’s from...
MLflow Development Status
Many new features since our release in June
• Model packaging for MLlib, H2O, TensorFlow, PyTorch, Keras
• R API (contributed by RStudio)
• Java & Scala API
• Storage backends: Azure, AWS, Google, (S)FTP
Just released MLflow 0.8.0
• Multi-step workflow UI
• Compact table view
• Azure ML Serving Workspace deploys
Early MLflow Use Cases
Uses MLflow Tracking to build & monitor
hundreds of models for entities in energy grid
Uses MLflow Projects to package and run
reproducible deep learning jobs in the cloud
Uses MLflow Models to package and deploy a
recommendation model + custom logic
European
Energy Company
Marketplace
Online Retailer
Conclusion
Machine learning platforms can simplify ML development
for both data scientists and engineers
To get started with MLflow, just pip install mlflow
Docs & tutorials at mlflow.org
+See Jules Damji’s talk tomorrow at 11!

More Related Content

What's hot

MLOps for production-level machine learning
MLOps for production-level machine learningMLOps for production-level machine learning
MLOps for production-level machine learning
cnvrg.io AI OS - Hands-on ML Workshops
 
MLflow: A Platform for Production Machine Learning
MLflow: A Platform for Production Machine LearningMLflow: A Platform for Production Machine Learning
MLflow: A Platform for Production Machine Learning
Matei Zaharia
 
MLFlow: Platform for Complete Machine Learning Lifecycle
MLFlow: Platform for Complete Machine Learning Lifecycle MLFlow: Platform for Complete Machine Learning Lifecycle
MLFlow: Platform for Complete Machine Learning Lifecycle
Databricks
 
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
DataWorks Summit
 
Mlflow with databricks
Mlflow with databricksMlflow with databricks
Mlflow with databricks
Liangjun Jiang
 
MLOps Using MLflow
MLOps Using MLflowMLOps Using MLflow
MLOps Using MLflow
Databricks
 
MLOps - The Assembly Line of ML
MLOps - The Assembly Line of MLMLOps - The Assembly Line of ML
MLOps - The Assembly Line of ML
Jordan Birdsell
 
From Data Science to MLOps
From Data Science to MLOpsFrom Data Science to MLOps
From Data Science to MLOps
Carl W. Handlin
 
MLOps Virtual Event: Automating ML at Scale
MLOps Virtual Event: Automating ML at ScaleMLOps Virtual Event: Automating ML at Scale
MLOps Virtual Event: Automating ML at Scale
Databricks
 
Using MLOps to Bring ML to Production/The Promise of MLOps
Using MLOps to Bring ML to Production/The Promise of MLOpsUsing MLOps to Bring ML to Production/The Promise of MLOps
Using MLOps to Bring ML to Production/The Promise of MLOps
Weaveworks
 
The A-Z of Data: Introduction to MLOps
The A-Z of Data: Introduction to MLOpsThe A-Z of Data: Introduction to MLOps
The A-Z of Data: Introduction to MLOps
DataPhoenix
 
MLOps.pptx
MLOps.pptxMLOps.pptx
MLOps.pptx
AllenPeter7
 
MLflow Model Serving
MLflow Model ServingMLflow Model Serving
MLflow Model Serving
Databricks
 
Databricks Overview for MLOps
Databricks Overview for MLOpsDatabricks Overview for MLOps
Databricks Overview for MLOps
Databricks
 
MLOps in action
MLOps in actionMLOps in action
MLOps in action
Pieter de Bruin
 
Productionzing ML Model Using MLflow Model Serving
Productionzing ML Model Using MLflow Model ServingProductionzing ML Model Using MLflow Model Serving
Productionzing ML Model Using MLflow Model Serving
Databricks
 
How to Build a ML Platform Efficiently Using Open-Source
How to Build a ML Platform Efficiently Using Open-SourceHow to Build a ML Platform Efficiently Using Open-Source
How to Build a ML Platform Efficiently Using Open-Source
Databricks
 
What is MLOps
What is MLOpsWhat is MLOps
What is MLOps
Henrik Skogström
 
MLOps and Data Quality: Deploying Reliable ML Models in Production
MLOps and Data Quality: Deploying Reliable ML Models in ProductionMLOps and Data Quality: Deploying Reliable ML Models in Production
MLOps and Data Quality: Deploying Reliable ML Models in Production
Provectus
 
Ml ops intro session
Ml ops   intro sessionMl ops   intro session
Ml ops intro session
Avinash Patil
 

What's hot (20)

MLOps for production-level machine learning
MLOps for production-level machine learningMLOps for production-level machine learning
MLOps for production-level machine learning
 
MLflow: A Platform for Production Machine Learning
MLflow: A Platform for Production Machine LearningMLflow: A Platform for Production Machine Learning
MLflow: A Platform for Production Machine Learning
 
MLFlow: Platform for Complete Machine Learning Lifecycle
MLFlow: Platform for Complete Machine Learning Lifecycle MLFlow: Platform for Complete Machine Learning Lifecycle
MLFlow: Platform for Complete Machine Learning Lifecycle
 
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
 
Mlflow with databricks
Mlflow with databricksMlflow with databricks
Mlflow with databricks
 
MLOps Using MLflow
MLOps Using MLflowMLOps Using MLflow
MLOps Using MLflow
 
MLOps - The Assembly Line of ML
MLOps - The Assembly Line of MLMLOps - The Assembly Line of ML
MLOps - The Assembly Line of ML
 
From Data Science to MLOps
From Data Science to MLOpsFrom Data Science to MLOps
From Data Science to MLOps
 
MLOps Virtual Event: Automating ML at Scale
MLOps Virtual Event: Automating ML at ScaleMLOps Virtual Event: Automating ML at Scale
MLOps Virtual Event: Automating ML at Scale
 
Using MLOps to Bring ML to Production/The Promise of MLOps
Using MLOps to Bring ML to Production/The Promise of MLOpsUsing MLOps to Bring ML to Production/The Promise of MLOps
Using MLOps to Bring ML to Production/The Promise of MLOps
 
The A-Z of Data: Introduction to MLOps
The A-Z of Data: Introduction to MLOpsThe A-Z of Data: Introduction to MLOps
The A-Z of Data: Introduction to MLOps
 
MLOps.pptx
MLOps.pptxMLOps.pptx
MLOps.pptx
 
MLflow Model Serving
MLflow Model ServingMLflow Model Serving
MLflow Model Serving
 
Databricks Overview for MLOps
Databricks Overview for MLOpsDatabricks Overview for MLOps
Databricks Overview for MLOps
 
MLOps in action
MLOps in actionMLOps in action
MLOps in action
 
Productionzing ML Model Using MLflow Model Serving
Productionzing ML Model Using MLflow Model ServingProductionzing ML Model Using MLflow Model Serving
Productionzing ML Model Using MLflow Model Serving
 
How to Build a ML Platform Efficiently Using Open-Source
How to Build a ML Platform Efficiently Using Open-SourceHow to Build a ML Platform Efficiently Using Open-Source
How to Build a ML Platform Efficiently Using Open-Source
 
What is MLOps
What is MLOpsWhat is MLOps
What is MLOps
 
MLOps and Data Quality: Deploying Reliable ML Models in Production
MLOps and Data Quality: Deploying Reliable ML Models in ProductionMLOps and Data Quality: Deploying Reliable ML Models in Production
MLOps and Data Quality: Deploying Reliable ML Models in Production
 
Ml ops intro session
Ml ops   intro sessionMl ops   intro session
Ml ops intro session
 

Similar to mlflow: Accelerating the End-to-End ML lifecycle

databricks ml flow demonstration using automatic features engineering
databricks ml flow demonstration using automatic features engineeringdatabricks ml flow demonstration using automatic features engineering
databricks ml flow demonstration using automatic features engineering
Mohamed MEJDOUBI
 
DevOps for DataScience
DevOps for DataScienceDevOps for DataScience
DevOps for DataScience
Stepan Pushkarev
 
Serverless machine learning operations
Serverless machine learning operationsServerless machine learning operations
Serverless machine learning operations
Stepan Pushkarev
 
ML Best Practices: Prepare Data, Build Models, and Manage Lifecycle (AIM396-S...
ML Best Practices: Prepare Data, Build Models, and Manage Lifecycle (AIM396-S...ML Best Practices: Prepare Data, Build Models, and Manage Lifecycle (AIM396-S...
ML Best Practices: Prepare Data, Build Models, and Manage Lifecycle (AIM396-S...
Amazon Web Services
 
Utilisation de MLflow pour le cycle de vie des projet Machine learning
Utilisation de MLflow pour le cycle de vie des projet Machine learningUtilisation de MLflow pour le cycle de vie des projet Machine learning
Utilisation de MLflow pour le cycle de vie des projet Machine learning
Paris Data Engineers !
 
MLflow: Infrastructure for a Complete Machine Learning Life Cycle
MLflow: Infrastructure for a Complete Machine Learning Life CycleMLflow: Infrastructure for a Complete Machine Learning Life Cycle
MLflow: Infrastructure for a Complete Machine Learning Life Cycle
Databricks
 
03_aiops-1.pptx
03_aiops-1.pptx03_aiops-1.pptx
03_aiops-1.pptx
FarazulHoda2
 
Hydrosphere.io Platform for AI/ML Operations Automation
Hydrosphere.io Platform for AI/ML Operations AutomationHydrosphere.io Platform for AI/ML Operations Automation
Hydrosphere.io Platform for AI/ML Operations Automation
Rustem Zakiev
 
MLflow: Infrastructure for a Complete Machine Learning Life Cycle
MLflow: Infrastructure for a Complete Machine Learning Life CycleMLflow: Infrastructure for a Complete Machine Learning Life Cycle
MLflow: Infrastructure for a Complete Machine Learning Life Cycle
Databricks
 
EPAM ML/AI Accelerator - ODAHU
EPAM ML/AI Accelerator - ODAHUEPAM ML/AI Accelerator - ODAHU
EPAM ML/AI Accelerator - ODAHU
Dmitrii Suslov
 
Vertex AI: Pipelines for your MLOps workflows
Vertex AI: Pipelines for your MLOps workflowsVertex AI: Pipelines for your MLOps workflows
Vertex AI: Pipelines for your MLOps workflows
Márton Kodok
 
Scaling up Machine Learning Development
Scaling up Machine Learning DevelopmentScaling up Machine Learning Development
Scaling up Machine Learning Development
Matei Zaharia
 
DevBCN Vertex AI - Pipelines for your MLOps workflows
DevBCN Vertex AI - Pipelines for your MLOps workflowsDevBCN Vertex AI - Pipelines for your MLOps workflows
DevBCN Vertex AI - Pipelines for your MLOps workflows
Márton Kodok
 
Python
PythonPython
MLFlow 1.0 Meetup
MLFlow 1.0 Meetup MLFlow 1.0 Meetup
MLFlow 1.0 Meetup
Databricks
 
Apache ® Spark™ MLlib 2.x: How to Productionize your Machine Learning Models
Apache ® Spark™ MLlib 2.x: How to Productionize your Machine Learning ModelsApache ® Spark™ MLlib 2.x: How to Productionize your Machine Learning Models
Apache ® Spark™ MLlib 2.x: How to Productionize your Machine Learning Models
Anyscale
 
Bhadale group of companies - Our project works
Bhadale group of companies - Our project worksBhadale group of companies - Our project works
Bhadale group of companies - Our project works
Vijayananda Mohire
 
Architecting an Open Source AI Platform 2018 edition
Architecting an Open Source AI Platform   2018 editionArchitecting an Open Source AI Platform   2018 edition
Architecting an Open Source AI Platform 2018 edition
David Talby
 
MLOps Virtual Event | Building Machine Learning Platforms for the Full Lifecycle
MLOps Virtual Event | Building Machine Learning Platforms for the Full LifecycleMLOps Virtual Event | Building Machine Learning Platforms for the Full Lifecycle
MLOps Virtual Event | Building Machine Learning Platforms for the Full Lifecycle
Databricks
 
ODSC East 2020 Accelerate ML Lifecycle with Kubernetes and Containerized Da...
ODSC East 2020   Accelerate ML Lifecycle with Kubernetes and Containerized Da...ODSC East 2020   Accelerate ML Lifecycle with Kubernetes and Containerized Da...
ODSC East 2020 Accelerate ML Lifecycle with Kubernetes and Containerized Da...
Abhinav Joshi
 

Similar to mlflow: Accelerating the End-to-End ML lifecycle (20)

databricks ml flow demonstration using automatic features engineering
databricks ml flow demonstration using automatic features engineeringdatabricks ml flow demonstration using automatic features engineering
databricks ml flow demonstration using automatic features engineering
 
DevOps for DataScience
DevOps for DataScienceDevOps for DataScience
DevOps for DataScience
 
Serverless machine learning operations
Serverless machine learning operationsServerless machine learning operations
Serverless machine learning operations
 
ML Best Practices: Prepare Data, Build Models, and Manage Lifecycle (AIM396-S...
ML Best Practices: Prepare Data, Build Models, and Manage Lifecycle (AIM396-S...ML Best Practices: Prepare Data, Build Models, and Manage Lifecycle (AIM396-S...
ML Best Practices: Prepare Data, Build Models, and Manage Lifecycle (AIM396-S...
 
Utilisation de MLflow pour le cycle de vie des projet Machine learning
Utilisation de MLflow pour le cycle de vie des projet Machine learningUtilisation de MLflow pour le cycle de vie des projet Machine learning
Utilisation de MLflow pour le cycle de vie des projet Machine learning
 
MLflow: Infrastructure for a Complete Machine Learning Life Cycle
MLflow: Infrastructure for a Complete Machine Learning Life CycleMLflow: Infrastructure for a Complete Machine Learning Life Cycle
MLflow: Infrastructure for a Complete Machine Learning Life Cycle
 
03_aiops-1.pptx
03_aiops-1.pptx03_aiops-1.pptx
03_aiops-1.pptx
 
Hydrosphere.io Platform for AI/ML Operations Automation
Hydrosphere.io Platform for AI/ML Operations AutomationHydrosphere.io Platform for AI/ML Operations Automation
Hydrosphere.io Platform for AI/ML Operations Automation
 
MLflow: Infrastructure for a Complete Machine Learning Life Cycle
MLflow: Infrastructure for a Complete Machine Learning Life CycleMLflow: Infrastructure for a Complete Machine Learning Life Cycle
MLflow: Infrastructure for a Complete Machine Learning Life Cycle
 
EPAM ML/AI Accelerator - ODAHU
EPAM ML/AI Accelerator - ODAHUEPAM ML/AI Accelerator - ODAHU
EPAM ML/AI Accelerator - ODAHU
 
Vertex AI: Pipelines for your MLOps workflows
Vertex AI: Pipelines for your MLOps workflowsVertex AI: Pipelines for your MLOps workflows
Vertex AI: Pipelines for your MLOps workflows
 
Scaling up Machine Learning Development
Scaling up Machine Learning DevelopmentScaling up Machine Learning Development
Scaling up Machine Learning Development
 
DevBCN Vertex AI - Pipelines for your MLOps workflows
DevBCN Vertex AI - Pipelines for your MLOps workflowsDevBCN Vertex AI - Pipelines for your MLOps workflows
DevBCN Vertex AI - Pipelines for your MLOps workflows
 
Python
PythonPython
Python
 
MLFlow 1.0 Meetup
MLFlow 1.0 Meetup MLFlow 1.0 Meetup
MLFlow 1.0 Meetup
 
Apache ® Spark™ MLlib 2.x: How to Productionize your Machine Learning Models
Apache ® Spark™ MLlib 2.x: How to Productionize your Machine Learning ModelsApache ® Spark™ MLlib 2.x: How to Productionize your Machine Learning Models
Apache ® Spark™ MLlib 2.x: How to Productionize your Machine Learning Models
 
Bhadale group of companies - Our project works
Bhadale group of companies - Our project worksBhadale group of companies - Our project works
Bhadale group of companies - Our project works
 
Architecting an Open Source AI Platform 2018 edition
Architecting an Open Source AI Platform   2018 editionArchitecting an Open Source AI Platform   2018 edition
Architecting an Open Source AI Platform 2018 edition
 
MLOps Virtual Event | Building Machine Learning Platforms for the Full Lifecycle
MLOps Virtual Event | Building Machine Learning Platforms for the Full LifecycleMLOps Virtual Event | Building Machine Learning Platforms for the Full Lifecycle
MLOps Virtual Event | Building Machine Learning Platforms for the Full Lifecycle
 
ODSC East 2020 Accelerate ML Lifecycle with Kubernetes and Containerized Da...
ODSC East 2020   Accelerate ML Lifecycle with Kubernetes and Containerized Da...ODSC East 2020   Accelerate ML Lifecycle with Kubernetes and Containerized Da...
ODSC East 2020 Accelerate ML Lifecycle with Kubernetes and Containerized Da...
 

More from Databricks

DW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptx
Databricks
 
Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1
Databricks
 
Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2
Databricks
 
Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2
Databricks
 
Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4
Databricks
 
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
Databricks
 
Democratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized PlatformDemocratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized Platform
Databricks
 
Learn to Use Databricks for Data Science
Learn to Use Databricks for Data ScienceLearn to Use Databricks for Data Science
Learn to Use Databricks for Data Science
Databricks
 
Why APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML MonitoringWhy APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML Monitoring
Databricks
 
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch FixThe Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
Databricks
 
Stage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI IntegrationStage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI Integration
Databricks
 
Simplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorchSimplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorch
Databricks
 
Scaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on KubernetesScaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on Kubernetes
Databricks
 
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark PipelinesScaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
Databricks
 
Sawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature AggregationsSawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature Aggregations
Databricks
 
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen SinkRedis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Databricks
 
Re-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and SparkRe-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and Spark
Databricks
 
Raven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction QueriesRaven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction Queries
Databricks
 
Processing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache SparkProcessing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache Spark
Databricks
 
Massive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta LakeMassive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta Lake
Databricks
 

More from Databricks (20)

DW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptx
 
Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1
 
Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2
 
Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2
 
Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4
 
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
 
Democratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized PlatformDemocratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized Platform
 
Learn to Use Databricks for Data Science
Learn to Use Databricks for Data ScienceLearn to Use Databricks for Data Science
Learn to Use Databricks for Data Science
 
Why APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML MonitoringWhy APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML Monitoring
 
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch FixThe Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
 
Stage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI IntegrationStage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI Integration
 
Simplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorchSimplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorch
 
Scaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on KubernetesScaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on Kubernetes
 
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark PipelinesScaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
 
Sawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature AggregationsSawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature Aggregations
 
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen SinkRedis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
 
Re-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and SparkRe-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and Spark
 
Raven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction QueriesRaven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction Queries
 
Processing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache SparkProcessing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache Spark
 
Massive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta LakeMassive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta Lake
 

Recently uploaded

The Ipsos - AI - Monitor 2024 Report.pdf
The  Ipsos - AI - Monitor 2024 Report.pdfThe  Ipsos - AI - Monitor 2024 Report.pdf
The Ipsos - AI - Monitor 2024 Report.pdf
Social Samosa
 
Sample Devops SRE Product Companies .pdf
Sample Devops SRE  Product Companies .pdfSample Devops SRE  Product Companies .pdf
Sample Devops SRE Product Companies .pdf
Vineet
 
Jio cinema Retention & Engagement Strategy.pdf
Jio cinema Retention & Engagement Strategy.pdfJio cinema Retention & Engagement Strategy.pdf
Jio cinema Retention & Engagement Strategy.pdf
inaya7568
 
一比一原版(UMN文凭证书)明尼苏达大学毕业证如何办理
一比一原版(UMN文凭证书)明尼苏达大学毕业证如何办理一比一原版(UMN文凭证书)明尼苏达大学毕业证如何办理
一比一原版(UMN文凭证书)明尼苏达大学毕业证如何办理
nyfuhyz
 
UofT毕业证如何办理
UofT毕业证如何办理UofT毕业证如何办理
UofT毕业证如何办理
exukyp
 
06-12-2024-BudapestDataForum-BuildingReal-timePipelineswithFLaNK AIM
06-12-2024-BudapestDataForum-BuildingReal-timePipelineswithFLaNK AIM06-12-2024-BudapestDataForum-BuildingReal-timePipelineswithFLaNK AIM
06-12-2024-BudapestDataForum-BuildingReal-timePipelineswithFLaNK AIM
Timothy Spann
 
一比一原版英属哥伦比亚大学毕业证(UBC毕业证书)学历如何办理
一比一原版英属哥伦比亚大学毕业证(UBC毕业证书)学历如何办理一比一原版英属哥伦比亚大学毕业证(UBC毕业证书)学历如何办理
一比一原版英属哥伦比亚大学毕业证(UBC毕业证书)学历如何办理
z6osjkqvd
 
在线办理(英国UCA毕业证书)创意艺术大学毕业证在读证明一模一样
在线办理(英国UCA毕业证书)创意艺术大学毕业证在读证明一模一样在线办理(英国UCA毕业证书)创意艺术大学毕业证在读证明一模一样
在线办理(英国UCA毕业证书)创意艺术大学毕业证在读证明一模一样
v7oacc3l
 
一比一原版(UO毕业证)渥太华大学毕业证如何办理
一比一原版(UO毕业证)渥太华大学毕业证如何办理一比一原版(UO毕业证)渥太华大学毕业证如何办理
一比一原版(UO毕业证)渥太华大学毕业证如何办理
bmucuha
 
writing report business partner b1+ .pdf
writing report business partner b1+ .pdfwriting report business partner b1+ .pdf
writing report business partner b1+ .pdf
VyNguyen709676
 
社内勉強会資料_Hallucination of LLMs               .
社内勉強会資料_Hallucination of LLMs               .社内勉強会資料_Hallucination of LLMs               .
社内勉強会資料_Hallucination of LLMs               .
NABLAS株式会社
 
Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...
Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...
Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...
Aggregage
 
4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...
4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...
4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...
Social Samosa
 
一比一原版卡尔加里大学毕业证(uc毕业证)如何办理
一比一原版卡尔加里大学毕业证(uc毕业证)如何办理一比一原版卡尔加里大学毕业证(uc毕业证)如何办理
一比一原版卡尔加里大学毕业证(uc毕业证)如何办理
oaxefes
 
Open Source Contributions to Postgres: The Basics POSETTE 2024
Open Source Contributions to Postgres: The Basics POSETTE 2024Open Source Contributions to Postgres: The Basics POSETTE 2024
Open Source Contributions to Postgres: The Basics POSETTE 2024
ElizabethGarrettChri
 
Orchestrating the Future: Navigating Today's Data Workflow Challenges with Ai...
Orchestrating the Future: Navigating Today's Data Workflow Challenges with Ai...Orchestrating the Future: Navigating Today's Data Workflow Challenges with Ai...
Orchestrating the Future: Navigating Today's Data Workflow Challenges with Ai...
Kaxil Naik
 
一比一原版爱尔兰都柏林大学毕业证(本硕)ucd学位证书如何办理
一比一原版爱尔兰都柏林大学毕业证(本硕)ucd学位证书如何办理一比一原版爱尔兰都柏林大学毕业证(本硕)ucd学位证书如何办理
一比一原版爱尔兰都柏林大学毕业证(本硕)ucd学位证书如何办理
hqfek
 
原版制作(unimelb毕业证书)墨尔本大学毕业证Offer一模一样
原版制作(unimelb毕业证书)墨尔本大学毕业证Offer一模一样原版制作(unimelb毕业证书)墨尔本大学毕业证Offer一模一样
原版制作(unimelb毕业证书)墨尔本大学毕业证Offer一模一样
ihavuls
 
How To Control IO Usage using Resource Manager
How To Control IO Usage using Resource ManagerHow To Control IO Usage using Resource Manager
How To Control IO Usage using Resource Manager
Alireza Kamrani
 
University of New South Wales degree offer diploma Transcript
University of New South Wales degree offer diploma TranscriptUniversity of New South Wales degree offer diploma Transcript
University of New South Wales degree offer diploma Transcript
soxrziqu
 

Recently uploaded (20)

The Ipsos - AI - Monitor 2024 Report.pdf
The  Ipsos - AI - Monitor 2024 Report.pdfThe  Ipsos - AI - Monitor 2024 Report.pdf
The Ipsos - AI - Monitor 2024 Report.pdf
 
Sample Devops SRE Product Companies .pdf
Sample Devops SRE  Product Companies .pdfSample Devops SRE  Product Companies .pdf
Sample Devops SRE Product Companies .pdf
 
Jio cinema Retention & Engagement Strategy.pdf
Jio cinema Retention & Engagement Strategy.pdfJio cinema Retention & Engagement Strategy.pdf
Jio cinema Retention & Engagement Strategy.pdf
 
一比一原版(UMN文凭证书)明尼苏达大学毕业证如何办理
一比一原版(UMN文凭证书)明尼苏达大学毕业证如何办理一比一原版(UMN文凭证书)明尼苏达大学毕业证如何办理
一比一原版(UMN文凭证书)明尼苏达大学毕业证如何办理
 
UofT毕业证如何办理
UofT毕业证如何办理UofT毕业证如何办理
UofT毕业证如何办理
 
06-12-2024-BudapestDataForum-BuildingReal-timePipelineswithFLaNK AIM
06-12-2024-BudapestDataForum-BuildingReal-timePipelineswithFLaNK AIM06-12-2024-BudapestDataForum-BuildingReal-timePipelineswithFLaNK AIM
06-12-2024-BudapestDataForum-BuildingReal-timePipelineswithFLaNK AIM
 
一比一原版英属哥伦比亚大学毕业证(UBC毕业证书)学历如何办理
一比一原版英属哥伦比亚大学毕业证(UBC毕业证书)学历如何办理一比一原版英属哥伦比亚大学毕业证(UBC毕业证书)学历如何办理
一比一原版英属哥伦比亚大学毕业证(UBC毕业证书)学历如何办理
 
在线办理(英国UCA毕业证书)创意艺术大学毕业证在读证明一模一样
在线办理(英国UCA毕业证书)创意艺术大学毕业证在读证明一模一样在线办理(英国UCA毕业证书)创意艺术大学毕业证在读证明一模一样
在线办理(英国UCA毕业证书)创意艺术大学毕业证在读证明一模一样
 
一比一原版(UO毕业证)渥太华大学毕业证如何办理
一比一原版(UO毕业证)渥太华大学毕业证如何办理一比一原版(UO毕业证)渥太华大学毕业证如何办理
一比一原版(UO毕业证)渥太华大学毕业证如何办理
 
writing report business partner b1+ .pdf
writing report business partner b1+ .pdfwriting report business partner b1+ .pdf
writing report business partner b1+ .pdf
 
社内勉強会資料_Hallucination of LLMs               .
社内勉強会資料_Hallucination of LLMs               .社内勉強会資料_Hallucination of LLMs               .
社内勉強会資料_Hallucination of LLMs               .
 
Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...
Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...
Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...
 
4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...
4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...
4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...
 
一比一原版卡尔加里大学毕业证(uc毕业证)如何办理
一比一原版卡尔加里大学毕业证(uc毕业证)如何办理一比一原版卡尔加里大学毕业证(uc毕业证)如何办理
一比一原版卡尔加里大学毕业证(uc毕业证)如何办理
 
Open Source Contributions to Postgres: The Basics POSETTE 2024
Open Source Contributions to Postgres: The Basics POSETTE 2024Open Source Contributions to Postgres: The Basics POSETTE 2024
Open Source Contributions to Postgres: The Basics POSETTE 2024
 
Orchestrating the Future: Navigating Today's Data Workflow Challenges with Ai...
Orchestrating the Future: Navigating Today's Data Workflow Challenges with Ai...Orchestrating the Future: Navigating Today's Data Workflow Challenges with Ai...
Orchestrating the Future: Navigating Today's Data Workflow Challenges with Ai...
 
一比一原版爱尔兰都柏林大学毕业证(本硕)ucd学位证书如何办理
一比一原版爱尔兰都柏林大学毕业证(本硕)ucd学位证书如何办理一比一原版爱尔兰都柏林大学毕业证(本硕)ucd学位证书如何办理
一比一原版爱尔兰都柏林大学毕业证(本硕)ucd学位证书如何办理
 
原版制作(unimelb毕业证书)墨尔本大学毕业证Offer一模一样
原版制作(unimelb毕业证书)墨尔本大学毕业证Offer一模一样原版制作(unimelb毕业证书)墨尔本大学毕业证Offer一模一样
原版制作(unimelb毕业证书)墨尔本大学毕业证Offer一模一样
 
How To Control IO Usage using Resource Manager
How To Control IO Usage using Resource ManagerHow To Control IO Usage using Resource Manager
How To Control IO Usage using Resource Manager
 
University of New South Wales degree offer diploma Transcript
University of New South Wales degree offer diploma TranscriptUniversity of New South Wales degree offer diploma Transcript
University of New South Wales degree offer diploma Transcript
 

mlflow: Accelerating the End-to-End ML lifecycle

  • 1. : Accelerating the End-to-End ML Lifecycle Matei Zaharia @matei_zaharia
  • 2. 2 About Databricks Founded in 2013 by the original creators of Apache Spark Unified analytics platform for data science, engineering and AI • Optimized Spark and ML runtime • Collaborative & secure workspace Apache Spark, Spark and Apache are trademarks of the Apache Software Foundation
  • 3. Some of Our Customers Financial Services Healthcare & Pharma Media & Entertainment Technology Public Sector Retail & CPG Consumer Services Energy & Industrial IoTMarketing & AdTech Data & Analytics Services
  • 4. Some of Our Customers Financial Services Healthcare & Pharma Media & Entertainment Technology Public Sector Retail & CPG Consumer Services Energy & Industrial IoTMarketing & AdTech Data & Analytics Services Correlate EMR of 50,000 patients compared with their DNA
  • 5. Some of Our CustomersFinancial Services Healthcare & Pharma Media & Entertainment Technology Public Sector Retail & CPG Consumer Services Energy & Industrial IoTMarketing & AdTech Data & Analytics Services Provide recommendations to sales using NLP and deep learning
  • 6. Some of Our Customers Financial Services Healthcare & Pharma Media & Entertainment Technology Public Sector Retail & CPG Consumer Services Energy & Industrial IoTMarketing & AdTech Data & Analytics Services Curb abusive behavior across gamers globally
  • 8. ML Lifecycle 8 Delta Data Prep Training Deploy Raw Data μ λ θ Tuning Scale μ λ θ Tuning Scale Scale Scale Model Exchange Governance
  • 9. “I build 100s of models/day to lift revenue, using any library: MLlib, PyTorch, R, etc. There’s no easy way to see what data went in a model from a week ago and rebuild it.” -- Chief scientist at ad tech firm Example
  • 10. Example “Our company has 100 teams using ML worldwide. We can’t share work across them: when a new team tries to run some code, it doesn’t even give the same result.” -- Large consumer electronics firm
  • 11. Custom ML Platforms Facebook FBLearner, Uber Michelangelo, Google TFX +Standardize the data prep / training / deploy cycle: if you work within the platform, you get these! – Limited to a few algorithms or frameworks – Tied to one company’s infrastructure Can we provide similar benefits in an open manner?
  • 12. www.mlflow.org Introducing Open source machine learning platform • Works with any ML library & language • Runs the same way everywhere & cross-cloud • Scales to big data with Apache Spark Launched this June • Already >50 contributors and many new features!
  • 13. MLflow Design Philosophy 1. API-first, “open-interface” platform • Allow submitting runs, models, etc from any library & language – Example: a “model” can just be a Python lambda function • Key enabler: built around REST APIs and CLI 2. Modular design • Let users mix & match components in their existing workflows • Key enabler: multiple components that can be used separately
  • 14. MLflow Components 14 Tracking Record and query experiments: code, data, config, results Projects Package workflows into reproducible and reusable steps Models Model packaging format for diverse deployment tools
  • 15. Model Development without MLflow data = load_text(file) ngrams = extract_ngrams(data, N=n) model = train_model(ngrams, learning_rate=lr) score = compute_accuracy(model) print(“For n=%d, lr=%f: accuracy=%f” % (n, lr, score)) pickle.dump(model, open(“model.pkl”)) For n=2, lr=0.1: accuracy=0.71 For n=2, lr=0.2: accuracy=0.79 For n=2, lr=0.5: accuracy=0.83 For n=2, lr=0.9: accuracy=0.79 For n=3, lr=0.1: accuracy=0.83 For n=3, lr=0.2: accuracy=0.82 For n=4, lr=0.5: accuracy=0.75 ... What if I expand the input data? What if I tune this other parameter? What if I upgrade my ML library? What version of my code was this result from?
  • 16. Model Deployment without MLflow Code & Models DATA SCIENTIST PRODUCTION ENGINEER Please deploy this SciKit model! Please deploy this Spark model! Please deploy this R model! Please deploy this TensorFlow model! Please deploy this ArXiv paper! …
  • 17. Model Development with MLflow data = load_text(file) ngrams = extract_ngrams(data, N=n) model = train_model(ngrams, learning_rate=lr) score = compute_accuracy(model) print(“For n=%d, lr=%f: accuracy=%f” % (n, lr, score)) pickle.dump(model, open(“model.pkl”))
  • 18. $ mlflow ui Model Development with MLflow data = load_text(file) ngrams = extract_ngrams(data, N=n) model = train_model(ngrams, learning_rate=lr) score = compute_accuracy(model) mlflow.log_param(“data_file”, file) mlflow.log_param(“n”, n) mlflow.log_param(“learning_rate”, lr) mlflow.log_metric(“score”, score) mlflow.sklearn.log_model(model) Track parameters, metrics, output files & code version Search using UI or API
  • 21. Project Spec Code ConfigDeps Local Execution Remote Cluster Packaging Code: MLflow Projects $ mlflow run git://...
  • 22. Example MLflow Project my_project/ ├── MLproject │ │ │ │ │ ├── conda.yaml ├── main.py └── model.py ... conda_env: conda.yaml entry_points: main: parameters: training_data: path lambda: {type: float, default: 0.1} command: python main.py {training_data} {lambda} $ mlflow run git://<my_project> mlflow.run(“git://<my_project>”, ...)
  • 23. Model Format ONNX Flavor Python Flavor Training Apps Batch & Stream Scoring REST Serving Tools Packaging Models: MLflow Models Packaging Format . . . Inference Code MLlib
  • 24. Example MLflow Model my_model/ ├── MLmodel │ │ │ │ │ └── estimator/ ├── saved_model.pb └── variables/ ... Usable by tools that understand TensorFlow model format Usable by any tool that can run Python (Docker, Spark, etc!) run_id: 769915006efd4c4bbd662461 time_created: 2018-06-28T12:34 flavors: tensorflow: saved_model_dir: estimator signature_def_key: predict python_function: loader_module: mlflow.tensorflow $ mlflow pyfunc serve -r <run_id> spark_udf = pyfunc.spark_udf(<run_id>)
  • 25. Model Deployment with MLflow DATA SCIENTIST PRODUCTION ENGINEER Please deploy this MLflow Model! OK, it’s up in our REST server & Spark! Please run this MLflow Project nightly for updates! Don’t even tell me what ArXiv paper that’s from...
  • 26. MLflow Development Status Many new features since our release in June • Model packaging for MLlib, H2O, TensorFlow, PyTorch, Keras • R API (contributed by RStudio) • Java & Scala API • Storage backends: Azure, AWS, Google, (S)FTP Just released MLflow 0.8.0 • Multi-step workflow UI • Compact table view • Azure ML Serving Workspace deploys
  • 27. Early MLflow Use Cases Uses MLflow Tracking to build & monitor hundreds of models for entities in energy grid Uses MLflow Projects to package and run reproducible deep learning jobs in the cloud Uses MLflow Models to package and deploy a recommendation model + custom logic European Energy Company Marketplace Online Retailer
  • 28. Conclusion Machine learning platforms can simplify ML development for both data scientists and engineers To get started with MLflow, just pip install mlflow Docs & tutorials at mlflow.org +See Jules Damji’s talk tomorrow at 11!