SlideShare a Scribd company logo
Utilisation de pour le
cycle de vie des projets
Machine Learning
Arduino Cascella, Solutions Architect
Outline
• Introductions
• MLflow Intro
• Overview of Components
• MLflow 1.0 & Roadmap
33
Arduino Cascella
Solutions Architect
arduino@databricks.com
@ArduinoCascella
About me
Accelerate innovation by unifying data science,
engineering and business
• Original creators of
• 2000+ global companies use our platform across big
data & machine learning lifecycle
VISION
WHO WE
ARE
Unified Analytics PlatformSOLUTION
Machine Learning
Development is
Complex...
Source: https://xkcd.com/1838/
ML Lifecycle Challenges
Delta
Tuning Model Mgmt
Raw Data ETL TrainFeaturize Score/Serve
Batch + Realtime
Monitor
Alert, Debug
Deploy
RetrainUpdate Features
Zoo of Ecosystem Frameworks
Collaboration Scale Governance
An open source platform for the machin
learning lifecycle
New Data
Custom ML Platforms
Facebook FBLearner, Uber Michelangelo, Google TFX
+ Standardize the data prep / training / deploy loop:
if you work with the platform, you get these!
– Limited to a few algorithms or frameworks
– Tied to one company’s infrastructure
Can we provide similar benefits in an open manner?
Introducing
Open machine learning platform
• Works with any ML library & language
• Runs the same way anywhere (e.g., any cloud)
• Designed to be useful for 1 or 1000+ person orgs
MLflow Design Philosophy
1. “API-first”, open platform
• Allow submitting runs, models, etc from any library & language
• Example: a “model” can just be a lambda function that MLflow can
then deploy in many places (Docker, Azure ML, Spark UDF, …)
Key enabler: built around REST APIs and CLI
MLflow Design Philosophy
2. Modular design
• Let people use different components individually (e.g., use
MLflow’s project format but not its deployment tools)
• Easy to integrate into existing ML platforms & workflows
Key enabler: distinct components (Tracking/Projects/Models)
MLflow Community Growth
600k 100+ 40
Comparison: Apache Spark took 3 years to get to 100 contributors,
and has 1.2M downloads/month on PyPI
Some Users & Contributors
Supported Integrations: June ‘18
13
Supported Integrations: June ‘19
14
MLflow Components
15
Tracking
Record and query
experiments: code,
data, config, results
Projects
Packaging format
for reproducible runs
on any platform
Models
General model format
that supports diverse
deployment tools
Keeping Track of Experiments
Source: https://fr.wikipedia.org/wiki/Fichier:Jean_Mi%C3%A9lot,_Brussels.jpg
Key Concepts in Tracking
Parameters: key-value inputs to your code
Metrics: numeric values (can update over time)
Artifacts: arbitrary files, including data and models
Source: what code ran?
Tags & Comments
MLflow Tracking API: Simple!
18
Tracking
Record and query
experiments: code,
configs, results,
…etc
import mlflow
# log model’s tuning parameters
with mlflow.start_run():
mlflow.log_param("layers", layers)
mlflow.log_param("alpha", alpha)
# log model’s metrics
mlflow.log_metric("mse", model.mse())
mlflow.log_artifact("plot", model.plot(test_df))
mlflow.tensorflow.log_model(model)
Notebooks
Local Apps
Cloud Jobs
Tracking Server
UI
API
MLflow Tracking
Python or
REST API
20
mlflow.org github.com/mlflow twitter.com/MLflowdatabricks.com/mlflow
Demo: tracking
Source: https://www.digital-science.com/blog/guest/digital-science-doodles-data-reproducibility/
Need for reproducible Data Science...
MLflow Projects Motivation
Diverse set of tools
Diverse set of environments
Projects
Packaging format
for reproducible
runs
on any platform
Result: It is difficult to productionize and share.
Project Spec
Code DataConfig
Local Execution
Remote Execution
MLflow Projects
Example MLflow Project
my_project/
├── MLproject
│
│
│
│
│
├── conda.yaml
├── main.py
└── model.py
...
conda_env: conda.yaml
entry_points:
main:
parameters:
training_data: path
lambda: {type: float, default: 0.1}
command: python main.py {training_data} {lambda}
$ mlflow run git://<my_project>
mlflow.run(“git://<my_project>”, ...)
25
mlflow.org github.com/mlflow twitter.com/MLflowdatabricks.com/mlflow
Demo: reproducible projects
Model Format
Flavor 2Flavor 1
Run Sources
Inference Code
Batch & Stream Scoring
Cloud Serving Tools
MLflow Models
Simple model flavors
usable by many tools
Example MLflow Model
my_model/
├── MLmodel
│
│
│
│
│
└── estimator/
├── saved_model.pb
└── variables/
...
Usable by tools that understand
TensorFlow model format
Usable by any tool that can run
Python (Docker, Spark, etc!)
run_id: 769915006efd4c4bbd662461
time_created: 2018-06-28T12:34
flavors:
tensorflow:
saved_model_dir: estimator
signature_def_key: predict
python_function:
loader_module: mlflow.tensorflow
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
About Those Flavors
Models can generate several flavors
• Spark ML model generates spark, mleap and pyfunc flavors
Generic flavors provide abstraction layer
• pyfunc can be served locally, as spark_udf, on Azure ML, …
• By generating pyfunc flavor we get all above
Different Flavors Can be Loaded By Different Languages
• pyfunc in python, mleap in Java, crate in R, Keras in python and R
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
PyFunc - Generic Python Model
PyFunc is saved as “just a directory”:
./model_dir/
./MLmodel: configuration
<code>: code packaged with the model (specified in the MLmodel file)
<data>: data packaged with the model (specified in the MLmodel file)
<env>: Conda environment definition (specified in the MLmodel file)
Model loader specified in MLmodel file:
Arbitrary python code loaded dynamically at runtime
Loaded Pyfunc is “just an object” with a predict method:
predict(pandas.DataFrame) -> pandas.DataFrame | numpy.array
30
mlflow.org github.com/mlflow twitter.com/MLflowdatabricks.com/mlflow
Demo: easy to deploy models
MLflow Components
31
Tracking
Record and query
experiments: code,
data, config, results
Projects
Packaging format
for reproducible runs
on any platform
Models
General model format
that supports diverse
deployment tools
mlflow.org github.com/mlflow twitter.com/MLflowdatabricks.com/mlflow
What’s new with
What Does the 1.0 Release Mean?
API stability of the original components
• Safe to build apps and integrations around them long term
Time to start adding some new features!
33
Selected New Features in MLflow 1.0
• Support for logging metrics per user-defined step
• HDFS support for artifacts
• ONNX Model Flavor [experimental]
• Deploying an MLflow Model as a Docker Image [experimental]
Support for logging metrics per user-defined step
Metrics logged at the end of a run, e.g.:
● Overall accuracy
● Overall AUC
● Overall loss
Metrics logged while training, e.g.:
● Accuracy per minibatch
● AUC per minibatch
● Loss per minibatch
Currently visualized by logging order:
Support for logging metrics per user-defined step
New step argument for log_metric
● Define the x coordinate for the metric
● Define ordering and scale of the horizontal axis in visualizations
log_metric ("exp", 1, 10)
log_metric ("exp", 2, 1000)
log_metric ("exp", 4, 10000)
log_metric ("exp", 8, 100000)
log_metric ("exp", 16, 1000000)
log_metric(key, value, step=None)
Improved Search
Search API supports a simplified version of the SQL WHERE clause, e.g.:
params.model = "LogisticRegression" and metrics.error <= 0.05
Improved Search
Search API supports a simplified version of the SQL WHERE clause, e.g.:
params.model = "LogisticRegression" and metrics.error <= 0.05
all_experiments = [exp.experiment_id for
exp in MlflowClient().list_experiments()]
runs = MlflowClient().search_runs(
all_experiments,
"params.model='LogisticRegression'"
" and metrics.error<=0.05",
ViewType.ALL)
Python API Example
Improved Search
Search API supports a simplified version of the SQL WHERE clause, e.g.:
Python API Example UI Example
all_experiments = [exp.experiment_id for
exp in MlflowClient().list_experiments()]
runs = MlflowClient().search_runs(
all_experiments,
"params.model='LogisticRegression'"
" and metrics.error<=0.05",
ViewType.ALL)
params.model = "LogisticRegression" and metrics.error <= 0.05
HDFS Support for Artifacts
mlflow.log_artifact(local_path, artifact_path=None)
AWS S3 Azure Blob
Store
Google Cloud
Storage
HDFS● DBFS
● NFS
● FTP
● SFTP
Supported Artifact Stores
ONNX Model Flavor
[Experimental]
ONNX models export both
• ONNX native format
• Pyfunc
mlflow.onnx.load_model(model_uri)
mlflow.onnx.log_model(onnx_model, artifact_path, conda_env=None)
mlflow.onnx.save_model(onnx_model, path, conda_env=None,
mlflow_model=<mlflow.models.Model object>)
Supported Model Flavors
Scikit TensorFlow MLlib H2O PyTorch Keras MLeap
Python
Function
R FunctionONNX
Docker Build
[Experimental]
$ mlflow models build-docker -m "runs:/some-run-uuid/my-model" -n "my-image-name"
$ docker run -p 5001:8080 "my-image-name"
Builds a Docker image whose default entrypoint serves the
specified MLflow model at port 8080 within the container.
43
mlflow.org github.com/mlflow twitter.com/MLflowdatabricks.com/mlflow
beyond 1.0
What users want to see next
What’s coming soon
• New component: Model Registry
• Version-controlled registry of models
• Model lifecycle management
• Model monitoring
Model registry & deployment tracking
https://databricks.com/sparkaisummit/north-america/2019-spark-summit-ai-keynotes-2#keynote-e
What’s coming soon
• New component: Model Registry
• Version-controlled registry of models
• Model lifecycle management
• Model monitoring
• Auto-logging from common frameworks
What’s coming soon
• New component: Model Registry
• Version-controlled registry of models
• Model lifecycle management
• Model monitoring
• Auto-logging from common frameworks
• Parallel coordinates plot
What’s coming soon
• New component: Model Registry
• Version-controlled registry of models
• Model lifecycle management
• Model monitoring
• Auto-logging from common frameworks
• Parallel coordinates plot
• Delta Lake integration (Delta.io) for Data Versioning
Learning More About MLflow
pip install mlflow to get started
Find docs & examples at mlflow.org
https://mlflow.org/docs/latest/tutorial.html
tinyurl.com/mlflow-slack
50
Rendez-vous à Amsterdam !
51
52
mlflow.org github.com/mlflow twitter.com/MLflowdatabricks.com/mlflow
Thank You

More Related Content

Similar to Utilisation de MLflow pour le cycle de vie des projet Machine learning

"Managing the Complete Machine Learning Lifecycle with MLflow"
"Managing the Complete Machine Learning Lifecycle with MLflow""Managing the Complete Machine Learning Lifecycle with MLflow"
"Managing the Complete Machine Learning Lifecycle with MLflow"
Databricks
 
Whats new in_mlflow
Whats new in_mlflowWhats new in_mlflow
Whats new in_mlflow
Databricks
 
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
DataWorks Summit
 
mlflow: Accelerating the End-to-End ML lifecycle
mlflow: Accelerating the End-to-End ML lifecyclemlflow: Accelerating the End-to-End ML lifecycle
mlflow: Accelerating the End-to-End ML lifecycle
Databricks
 
MLFlow 1.0 Meetup
MLFlow 1.0 Meetup MLFlow 1.0 Meetup
MLFlow 1.0 Meetup
Databricks
 
MLflow: Infrastructure for a Complete Machine Learning Life Cycle with Mani ...
 MLflow: Infrastructure for a Complete Machine Learning Life Cycle with Mani ... MLflow: Infrastructure for a Complete Machine Learning Life Cycle with Mani ...
MLflow: Infrastructure for a Complete Machine Learning Life Cycle with Mani ...
Databricks
 
MLOps Virtual Event | Building Machine Learning Platforms for the Full Lifecycle
MLOps Virtual Event | Building Machine Learning Platforms for the Full LifecycleMLOps Virtual Event | Building Machine Learning Platforms for the Full Lifecycle
MLOps Virtual Event | Building Machine Learning Platforms for the Full Lifecycle
Databricks
 
Scaling up Machine Learning Development
Scaling up Machine Learning DevelopmentScaling up Machine Learning Development
Scaling up Machine Learning Development
Matei Zaharia
 
MLFlow: Platform for Complete Machine Learning Lifecycle
MLFlow: Platform for Complete Machine Learning Lifecycle MLFlow: Platform for Complete Machine Learning Lifecycle
MLFlow: Platform for Complete Machine Learning Lifecycle
Databricks
 
MLOps pipelines using MLFlow - From training to production
MLOps pipelines using MLFlow - From training to productionMLOps pipelines using MLFlow - From training to production
MLOps pipelines using MLFlow - From training to production
Fabian Hadiji
 
How to Lower the Cost of Deploying Analytics: An Introduction to the Portable...
How to Lower the Cost of Deploying Analytics: An Introduction to the Portable...How to Lower the Cost of Deploying Analytics: An Introduction to the Portable...
How to Lower the Cost of Deploying Analytics: An Introduction to the Portable...
Robert Grossman
 
Advanced MLflow: Multi-Step Workflows, Hyperparameter Tuning and Integrating ...
Advanced MLflow: Multi-Step Workflows, Hyperparameter Tuning and Integrating ...Advanced MLflow: Multi-Step Workflows, Hyperparameter Tuning and Integrating ...
Advanced MLflow: Multi-Step Workflows, Hyperparameter Tuning and Integrating ...
Databricks
 
ML Best Practices: Prepare Data, Build Models, and Manage Lifecycle (AIM396-S...
ML Best Practices: Prepare Data, Build Models, and Manage Lifecycle (AIM396-S...ML Best Practices: Prepare Data, Build Models, and Manage Lifecycle (AIM396-S...
ML Best Practices: Prepare Data, Build Models, and Manage Lifecycle (AIM396-S...
Amazon Web Services
 
Improving the Life of Data Scientists: Automating ML Lifecycle through MLflow
Improving the Life of Data Scientists: Automating ML Lifecycle through MLflowImproving the Life of Data Scientists: Automating ML Lifecycle through MLflow
Improving the Life of Data Scientists: Automating ML Lifecycle through MLflow
Databricks
 
MLflow with Databricks
MLflow with DatabricksMLflow with Databricks
MLflow with Databricks
Liangjun Jiang
 
Mlflow with databricks
Mlflow with databricksMlflow with databricks
Mlflow with databricks
Liangjun Jiang
 
GDG Cloud Southlake #16: Priyanka Vergadia: Scalable Data Analytics in Google...
GDG Cloud Southlake #16: Priyanka Vergadia: Scalable Data Analytics in Google...GDG Cloud Southlake #16: Priyanka Vergadia: Scalable Data Analytics in Google...
GDG Cloud Southlake #16: Priyanka Vergadia: Scalable Data Analytics in Google...
James Anderson
 
PyCon Sweden 2022 - Dowling - Serverless ML with Hopsworks.pdf
PyCon Sweden 2022 - Dowling - Serverless ML with Hopsworks.pdfPyCon Sweden 2022 - Dowling - Serverless ML with Hopsworks.pdf
PyCon Sweden 2022 - Dowling - Serverless ML with Hopsworks.pdf
Jim Dowling
 
Managing the Machine Learning Lifecycle with MLflow
Managing the Machine Learning Lifecycle with MLflowManaging the Machine Learning Lifecycle with MLflow
Managing the Machine Learning Lifecycle with MLflow
Databricks
 
Simplifying Model Management with MLflow
Simplifying Model Management with MLflowSimplifying Model Management with MLflow
Simplifying Model Management with MLflow
Databricks
 

Similar to Utilisation de MLflow pour le cycle de vie des projet Machine learning (20)

"Managing the Complete Machine Learning Lifecycle with MLflow"
"Managing the Complete Machine Learning Lifecycle with MLflow""Managing the Complete Machine Learning Lifecycle with MLflow"
"Managing the Complete Machine Learning Lifecycle with MLflow"
 
Whats new in_mlflow
Whats new in_mlflowWhats new in_mlflow
Whats new in_mlflow
 
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
 
mlflow: Accelerating the End-to-End ML lifecycle
mlflow: Accelerating the End-to-End ML lifecyclemlflow: Accelerating the End-to-End ML lifecycle
mlflow: Accelerating the End-to-End ML lifecycle
 
MLFlow 1.0 Meetup
MLFlow 1.0 Meetup MLFlow 1.0 Meetup
MLFlow 1.0 Meetup
 
MLflow: Infrastructure for a Complete Machine Learning Life Cycle with Mani ...
 MLflow: Infrastructure for a Complete Machine Learning Life Cycle with Mani ... MLflow: Infrastructure for a Complete Machine Learning Life Cycle with Mani ...
MLflow: Infrastructure for a Complete Machine Learning Life Cycle with Mani ...
 
MLOps Virtual Event | Building Machine Learning Platforms for the Full Lifecycle
MLOps Virtual Event | Building Machine Learning Platforms for the Full LifecycleMLOps Virtual Event | Building Machine Learning Platforms for the Full Lifecycle
MLOps Virtual Event | Building Machine Learning Platforms for the Full Lifecycle
 
Scaling up Machine Learning Development
Scaling up Machine Learning DevelopmentScaling up Machine Learning Development
Scaling up Machine Learning Development
 
MLFlow: Platform for Complete Machine Learning Lifecycle
MLFlow: Platform for Complete Machine Learning Lifecycle MLFlow: Platform for Complete Machine Learning Lifecycle
MLFlow: Platform for Complete Machine Learning Lifecycle
 
MLOps pipelines using MLFlow - From training to production
MLOps pipelines using MLFlow - From training to productionMLOps pipelines using MLFlow - From training to production
MLOps pipelines using MLFlow - From training to production
 
How to Lower the Cost of Deploying Analytics: An Introduction to the Portable...
How to Lower the Cost of Deploying Analytics: An Introduction to the Portable...How to Lower the Cost of Deploying Analytics: An Introduction to the Portable...
How to Lower the Cost of Deploying Analytics: An Introduction to the Portable...
 
Advanced MLflow: Multi-Step Workflows, Hyperparameter Tuning and Integrating ...
Advanced MLflow: Multi-Step Workflows, Hyperparameter Tuning and Integrating ...Advanced MLflow: Multi-Step Workflows, Hyperparameter Tuning and Integrating ...
Advanced MLflow: Multi-Step Workflows, Hyperparameter Tuning and Integrating ...
 
ML Best Practices: Prepare Data, Build Models, and Manage Lifecycle (AIM396-S...
ML Best Practices: Prepare Data, Build Models, and Manage Lifecycle (AIM396-S...ML Best Practices: Prepare Data, Build Models, and Manage Lifecycle (AIM396-S...
ML Best Practices: Prepare Data, Build Models, and Manage Lifecycle (AIM396-S...
 
Improving the Life of Data Scientists: Automating ML Lifecycle through MLflow
Improving the Life of Data Scientists: Automating ML Lifecycle through MLflowImproving the Life of Data Scientists: Automating ML Lifecycle through MLflow
Improving the Life of Data Scientists: Automating ML Lifecycle through MLflow
 
MLflow with Databricks
MLflow with DatabricksMLflow with Databricks
MLflow with Databricks
 
Mlflow with databricks
Mlflow with databricksMlflow with databricks
Mlflow with databricks
 
GDG Cloud Southlake #16: Priyanka Vergadia: Scalable Data Analytics in Google...
GDG Cloud Southlake #16: Priyanka Vergadia: Scalable Data Analytics in Google...GDG Cloud Southlake #16: Priyanka Vergadia: Scalable Data Analytics in Google...
GDG Cloud Southlake #16: Priyanka Vergadia: Scalable Data Analytics in Google...
 
PyCon Sweden 2022 - Dowling - Serverless ML with Hopsworks.pdf
PyCon Sweden 2022 - Dowling - Serverless ML with Hopsworks.pdfPyCon Sweden 2022 - Dowling - Serverless ML with Hopsworks.pdf
PyCon Sweden 2022 - Dowling - Serverless ML with Hopsworks.pdf
 
Managing the Machine Learning Lifecycle with MLflow
Managing the Machine Learning Lifecycle with MLflowManaging the Machine Learning Lifecycle with MLflow
Managing the Machine Learning Lifecycle with MLflow
 
Simplifying Model Management with MLflow
Simplifying Model Management with MLflowSimplifying Model Management with MLflow
Simplifying Model Management with MLflow
 

More from Paris Data Engineers !

Spark tools by Jonathan Winandy
Spark tools by Jonathan WinandySpark tools by Jonathan Winandy
Spark tools by Jonathan Winandy
Paris Data Engineers !
 
Delta Lake OSS: Create reliable and performant Data Lake by Quentin Ambard
Delta Lake OSS: Create reliable and performant Data Lake by Quentin AmbardDelta Lake OSS: Create reliable and performant Data Lake by Quentin Ambard
Delta Lake OSS: Create reliable and performant Data Lake by Quentin Ambard
Paris Data Engineers !
 
SCIO : Apache Beam API
SCIO : Apache Beam APISCIO : Apache Beam API
SCIO : Apache Beam API
Paris Data Engineers !
 
Apache Beam de A à Z
 Apache Beam de A à Z Apache Beam de A à Z
Apache Beam de A à Z
Paris Data Engineers !
 
REX : pourquoi et comment développer son propre scheduler
REX : pourquoi et comment développer son propre schedulerREX : pourquoi et comment développer son propre scheduler
REX : pourquoi et comment développer son propre scheduler
Paris Data Engineers !
 
Deeplearning in production
Deeplearning in productionDeeplearning in production
Deeplearning in production
Paris Data Engineers !
 
Introduction à Apache Pulsar
 Introduction à Apache Pulsar Introduction à Apache Pulsar
Introduction à Apache Pulsar
Paris Data Engineers !
 
10 things i wish i'd known before using spark in production
10 things i wish i'd known before using spark in production10 things i wish i'd known before using spark in production
10 things i wish i'd known before using spark in production
Paris Data Engineers !
 
Change Data Capture with Data Collector @OVH
Change Data Capture with Data Collector @OVHChange Data Capture with Data Collector @OVH
Change Data Capture with Data Collector @OVH
Paris Data Engineers !
 
Building highly reliable data pipeline @datadog par Quentin François
Building highly reliable data pipeline @datadog par Quentin FrançoisBuilding highly reliable data pipeline @datadog par Quentin François
Building highly reliable data pipeline @datadog par Quentin François
Paris Data Engineers !
 
Scala pour le Data Engineering par Jonathan Winandy
Scala pour le Data Engineering par Jonathan WinandyScala pour le Data Engineering par Jonathan Winandy
Scala pour le Data Engineering par Jonathan Winandy
Paris Data Engineers !
 

More from Paris Data Engineers ! (11)

Spark tools by Jonathan Winandy
Spark tools by Jonathan WinandySpark tools by Jonathan Winandy
Spark tools by Jonathan Winandy
 
Delta Lake OSS: Create reliable and performant Data Lake by Quentin Ambard
Delta Lake OSS: Create reliable and performant Data Lake by Quentin AmbardDelta Lake OSS: Create reliable and performant Data Lake by Quentin Ambard
Delta Lake OSS: Create reliable and performant Data Lake by Quentin Ambard
 
SCIO : Apache Beam API
SCIO : Apache Beam APISCIO : Apache Beam API
SCIO : Apache Beam API
 
Apache Beam de A à Z
 Apache Beam de A à Z Apache Beam de A à Z
Apache Beam de A à Z
 
REX : pourquoi et comment développer son propre scheduler
REX : pourquoi et comment développer son propre schedulerREX : pourquoi et comment développer son propre scheduler
REX : pourquoi et comment développer son propre scheduler
 
Deeplearning in production
Deeplearning in productionDeeplearning in production
Deeplearning in production
 
Introduction à Apache Pulsar
 Introduction à Apache Pulsar Introduction à Apache Pulsar
Introduction à Apache Pulsar
 
10 things i wish i'd known before using spark in production
10 things i wish i'd known before using spark in production10 things i wish i'd known before using spark in production
10 things i wish i'd known before using spark in production
 
Change Data Capture with Data Collector @OVH
Change Data Capture with Data Collector @OVHChange Data Capture with Data Collector @OVH
Change Data Capture with Data Collector @OVH
 
Building highly reliable data pipeline @datadog par Quentin François
Building highly reliable data pipeline @datadog par Quentin FrançoisBuilding highly reliable data pipeline @datadog par Quentin François
Building highly reliable data pipeline @datadog par Quentin François
 
Scala pour le Data Engineering par Jonathan Winandy
Scala pour le Data Engineering par Jonathan WinandyScala pour le Data Engineering par Jonathan Winandy
Scala pour le Data Engineering par Jonathan Winandy
 

Recently uploaded

Enchancing adoption of Open Source Libraries. A case study on Albumentations.AI
Enchancing adoption of Open Source Libraries. A case study on Albumentations.AIEnchancing adoption of Open Source Libraries. A case study on Albumentations.AI
Enchancing adoption of Open Source Libraries. A case study on Albumentations.AI
Vladimir Iglovikov, Ph.D.
 
Climate Impact of Software Testing at Nordic Testing Days
Climate Impact of Software Testing at Nordic Testing DaysClimate Impact of Software Testing at Nordic Testing Days
Climate Impact of Software Testing at Nordic Testing Days
Kari Kakkonen
 
GraphSummit Singapore | The Art of the Possible with Graph - Q2 2024
GraphSummit Singapore | The Art of the  Possible with Graph - Q2 2024GraphSummit Singapore | The Art of the  Possible with Graph - Q2 2024
GraphSummit Singapore | The Art of the Possible with Graph - Q2 2024
Neo4j
 
PCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase TeamPCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase Team
ControlCase
 
zkStudyClub - Reef: Fast Succinct Non-Interactive Zero-Knowledge Regex Proofs
zkStudyClub - Reef: Fast Succinct Non-Interactive Zero-Knowledge Regex ProofszkStudyClub - Reef: Fast Succinct Non-Interactive Zero-Knowledge Regex Proofs
zkStudyClub - Reef: Fast Succinct Non-Interactive Zero-Knowledge Regex Proofs
Alex Pruden
 
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...
James Anderson
 
20240605 QFM017 Machine Intelligence Reading List May 2024
20240605 QFM017 Machine Intelligence Reading List May 202420240605 QFM017 Machine Intelligence Reading List May 2024
20240605 QFM017 Machine Intelligence Reading List May 2024
Matthew Sinclair
 
Removing Uninteresting Bytes in Software Fuzzing
Removing Uninteresting Bytes in Software FuzzingRemoving Uninteresting Bytes in Software Fuzzing
Removing Uninteresting Bytes in Software Fuzzing
Aftab Hussain
 
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
名前 です男
 
GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...
GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...
GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...
Neo4j
 
National Security Agency - NSA mobile device best practices
National Security Agency - NSA mobile device best practicesNational Security Agency - NSA mobile device best practices
National Security Agency - NSA mobile device best practices
Quotidiano Piemontese
 
Mind map of terminologies used in context of Generative AI
Mind map of terminologies used in context of Generative AIMind map of terminologies used in context of Generative AI
Mind map of terminologies used in context of Generative AI
Kumud Singh
 
By Design, not by Accident - Agile Venture Bolzano 2024
By Design, not by Accident - Agile Venture Bolzano 2024By Design, not by Accident - Agile Venture Bolzano 2024
By Design, not by Accident - Agile Venture Bolzano 2024
Pierluigi Pugliese
 
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
Neo4j
 
UiPath Test Automation using UiPath Test Suite series, part 5
UiPath Test Automation using UiPath Test Suite series, part 5UiPath Test Automation using UiPath Test Suite series, part 5
UiPath Test Automation using UiPath Test Suite series, part 5
DianaGray10
 
RESUME BUILDER APPLICATION Project for students
RESUME BUILDER APPLICATION Project for studentsRESUME BUILDER APPLICATION Project for students
RESUME BUILDER APPLICATION Project for students
KAMESHS29
 
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...
Neo4j
 
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Albert Hoitingh
 
Uni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems Copilot event_05062024_C.Vlachos.pdfUni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems S.M.S.A.
 
A tale of scale & speed: How the US Navy is enabling software delivery from l...
A tale of scale & speed: How the US Navy is enabling software delivery from l...A tale of scale & speed: How the US Navy is enabling software delivery from l...
A tale of scale & speed: How the US Navy is enabling software delivery from l...
sonjaschweigert1
 

Recently uploaded (20)

Enchancing adoption of Open Source Libraries. A case study on Albumentations.AI
Enchancing adoption of Open Source Libraries. A case study on Albumentations.AIEnchancing adoption of Open Source Libraries. A case study on Albumentations.AI
Enchancing adoption of Open Source Libraries. A case study on Albumentations.AI
 
Climate Impact of Software Testing at Nordic Testing Days
Climate Impact of Software Testing at Nordic Testing DaysClimate Impact of Software Testing at Nordic Testing Days
Climate Impact of Software Testing at Nordic Testing Days
 
GraphSummit Singapore | The Art of the Possible with Graph - Q2 2024
GraphSummit Singapore | The Art of the  Possible with Graph - Q2 2024GraphSummit Singapore | The Art of the  Possible with Graph - Q2 2024
GraphSummit Singapore | The Art of the Possible with Graph - Q2 2024
 
PCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase TeamPCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase Team
 
zkStudyClub - Reef: Fast Succinct Non-Interactive Zero-Knowledge Regex Proofs
zkStudyClub - Reef: Fast Succinct Non-Interactive Zero-Knowledge Regex ProofszkStudyClub - Reef: Fast Succinct Non-Interactive Zero-Knowledge Regex Proofs
zkStudyClub - Reef: Fast Succinct Non-Interactive Zero-Knowledge Regex Proofs
 
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...
 
20240605 QFM017 Machine Intelligence Reading List May 2024
20240605 QFM017 Machine Intelligence Reading List May 202420240605 QFM017 Machine Intelligence Reading List May 2024
20240605 QFM017 Machine Intelligence Reading List May 2024
 
Removing Uninteresting Bytes in Software Fuzzing
Removing Uninteresting Bytes in Software FuzzingRemoving Uninteresting Bytes in Software Fuzzing
Removing Uninteresting Bytes in Software Fuzzing
 
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
 
GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...
GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...
GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...
 
National Security Agency - NSA mobile device best practices
National Security Agency - NSA mobile device best practicesNational Security Agency - NSA mobile device best practices
National Security Agency - NSA mobile device best practices
 
Mind map of terminologies used in context of Generative AI
Mind map of terminologies used in context of Generative AIMind map of terminologies used in context of Generative AI
Mind map of terminologies used in context of Generative AI
 
By Design, not by Accident - Agile Venture Bolzano 2024
By Design, not by Accident - Agile Venture Bolzano 2024By Design, not by Accident - Agile Venture Bolzano 2024
By Design, not by Accident - Agile Venture Bolzano 2024
 
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
 
UiPath Test Automation using UiPath Test Suite series, part 5
UiPath Test Automation using UiPath Test Suite series, part 5UiPath Test Automation using UiPath Test Suite series, part 5
UiPath Test Automation using UiPath Test Suite series, part 5
 
RESUME BUILDER APPLICATION Project for students
RESUME BUILDER APPLICATION Project for studentsRESUME BUILDER APPLICATION Project for students
RESUME BUILDER APPLICATION Project for students
 
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...
 
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
 
Uni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems Copilot event_05062024_C.Vlachos.pdfUni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems Copilot event_05062024_C.Vlachos.pdf
 
A tale of scale & speed: How the US Navy is enabling software delivery from l...
A tale of scale & speed: How the US Navy is enabling software delivery from l...A tale of scale & speed: How the US Navy is enabling software delivery from l...
A tale of scale & speed: How the US Navy is enabling software delivery from l...
 

Utilisation de MLflow pour le cycle de vie des projet Machine learning

  • 1. Utilisation de pour le cycle de vie des projets Machine Learning Arduino Cascella, Solutions Architect
  • 2. Outline • Introductions • MLflow Intro • Overview of Components • MLflow 1.0 & Roadmap
  • 4. Accelerate innovation by unifying data science, engineering and business • Original creators of • 2000+ global companies use our platform across big data & machine learning lifecycle VISION WHO WE ARE Unified Analytics PlatformSOLUTION
  • 6. ML Lifecycle Challenges Delta Tuning Model Mgmt Raw Data ETL TrainFeaturize Score/Serve Batch + Realtime Monitor Alert, Debug Deploy RetrainUpdate Features Zoo of Ecosystem Frameworks Collaboration Scale Governance An open source platform for the machin learning lifecycle New Data
  • 7. Custom ML Platforms Facebook FBLearner, Uber Michelangelo, Google TFX + Standardize the data prep / training / deploy loop: if you work with the platform, you get these! – Limited to a few algorithms or frameworks – Tied to one company’s infrastructure Can we provide similar benefits in an open manner?
  • 8. Introducing Open machine learning platform • Works with any ML library & language • Runs the same way anywhere (e.g., any cloud) • Designed to be useful for 1 or 1000+ person orgs
  • 9. MLflow Design Philosophy 1. “API-first”, open platform • Allow submitting runs, models, etc from any library & language • Example: a “model” can just be a lambda function that MLflow can then deploy in many places (Docker, Azure ML, Spark UDF, …) Key enabler: built around REST APIs and CLI
  • 10. MLflow Design Philosophy 2. Modular design • Let people use different components individually (e.g., use MLflow’s project format but not its deployment tools) • Easy to integrate into existing ML platforms & workflows Key enabler: distinct components (Tracking/Projects/Models)
  • 11. MLflow Community Growth 600k 100+ 40 Comparison: Apache Spark took 3 years to get to 100 contributors, and has 1.2M downloads/month on PyPI
  • 12. Some Users & Contributors
  • 15. MLflow Components 15 Tracking Record and query experiments: code, data, config, results Projects Packaging format for reproducible runs on any platform Models General model format that supports diverse deployment tools
  • 16. Keeping Track of Experiments Source: https://fr.wikipedia.org/wiki/Fichier:Jean_Mi%C3%A9lot,_Brussels.jpg
  • 17. Key Concepts in Tracking Parameters: key-value inputs to your code Metrics: numeric values (can update over time) Artifacts: arbitrary files, including data and models Source: what code ran? Tags & Comments
  • 18. MLflow Tracking API: Simple! 18 Tracking Record and query experiments: code, configs, results, …etc import mlflow # log model’s tuning parameters with mlflow.start_run(): mlflow.log_param("layers", layers) mlflow.log_param("alpha", alpha) # log model’s metrics mlflow.log_metric("mse", model.mse()) mlflow.log_artifact("plot", model.plot(test_df)) mlflow.tensorflow.log_model(model)
  • 19. Notebooks Local Apps Cloud Jobs Tracking Server UI API MLflow Tracking Python or REST API
  • 22. MLflow Projects Motivation Diverse set of tools Diverse set of environments Projects Packaging format for reproducible runs on any platform Result: It is difficult to productionize and share.
  • 23. Project Spec Code DataConfig Local Execution Remote Execution MLflow Projects
  • 24. Example MLflow Project my_project/ ├── MLproject │ │ │ │ │ ├── conda.yaml ├── main.py └── model.py ... conda_env: conda.yaml entry_points: main: parameters: training_data: path lambda: {type: float, default: 0.1} command: python main.py {training_data} {lambda} $ mlflow run git://<my_project> mlflow.run(“git://<my_project>”, ...)
  • 26. Model Format Flavor 2Flavor 1 Run Sources Inference Code Batch & Stream Scoring Cloud Serving Tools MLflow Models Simple model flavors usable by many tools
  • 27. Example MLflow Model my_model/ ├── MLmodel │ │ │ │ │ └── estimator/ ├── saved_model.pb └── variables/ ... Usable by tools that understand TensorFlow model format Usable by any tool that can run Python (Docker, Spark, etc!) run_id: 769915006efd4c4bbd662461 time_created: 2018-06-28T12:34 flavors: tensorflow: saved_model_dir: estimator signature_def_key: predict python_function: loader_module: mlflow.tensorflow
  • 28. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. About Those Flavors Models can generate several flavors • Spark ML model generates spark, mleap and pyfunc flavors Generic flavors provide abstraction layer • pyfunc can be served locally, as spark_udf, on Azure ML, … • By generating pyfunc flavor we get all above Different Flavors Can be Loaded By Different Languages • pyfunc in python, mleap in Java, crate in R, Keras in python and R
  • 29. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. PyFunc - Generic Python Model PyFunc is saved as “just a directory”: ./model_dir/ ./MLmodel: configuration <code>: code packaged with the model (specified in the MLmodel file) <data>: data packaged with the model (specified in the MLmodel file) <env>: Conda environment definition (specified in the MLmodel file) Model loader specified in MLmodel file: Arbitrary python code loaded dynamically at runtime Loaded Pyfunc is “just an object” with a predict method: predict(pandas.DataFrame) -> pandas.DataFrame | numpy.array
  • 31. MLflow Components 31 Tracking Record and query experiments: code, data, config, results Projects Packaging format for reproducible runs on any platform Models General model format that supports diverse deployment tools mlflow.org github.com/mlflow twitter.com/MLflowdatabricks.com/mlflow
  • 33. What Does the 1.0 Release Mean? API stability of the original components • Safe to build apps and integrations around them long term Time to start adding some new features! 33
  • 34. Selected New Features in MLflow 1.0 • Support for logging metrics per user-defined step • HDFS support for artifacts • ONNX Model Flavor [experimental] • Deploying an MLflow Model as a Docker Image [experimental]
  • 35. Support for logging metrics per user-defined step Metrics logged at the end of a run, e.g.: ● Overall accuracy ● Overall AUC ● Overall loss Metrics logged while training, e.g.: ● Accuracy per minibatch ● AUC per minibatch ● Loss per minibatch Currently visualized by logging order:
  • 36. Support for logging metrics per user-defined step New step argument for log_metric ● Define the x coordinate for the metric ● Define ordering and scale of the horizontal axis in visualizations log_metric ("exp", 1, 10) log_metric ("exp", 2, 1000) log_metric ("exp", 4, 10000) log_metric ("exp", 8, 100000) log_metric ("exp", 16, 1000000) log_metric(key, value, step=None)
  • 37. Improved Search Search API supports a simplified version of the SQL WHERE clause, e.g.: params.model = "LogisticRegression" and metrics.error <= 0.05
  • 38. Improved Search Search API supports a simplified version of the SQL WHERE clause, e.g.: params.model = "LogisticRegression" and metrics.error <= 0.05 all_experiments = [exp.experiment_id for exp in MlflowClient().list_experiments()] runs = MlflowClient().search_runs( all_experiments, "params.model='LogisticRegression'" " and metrics.error<=0.05", ViewType.ALL) Python API Example
  • 39. Improved Search Search API supports a simplified version of the SQL WHERE clause, e.g.: Python API Example UI Example all_experiments = [exp.experiment_id for exp in MlflowClient().list_experiments()] runs = MlflowClient().search_runs( all_experiments, "params.model='LogisticRegression'" " and metrics.error<=0.05", ViewType.ALL) params.model = "LogisticRegression" and metrics.error <= 0.05
  • 40. HDFS Support for Artifacts mlflow.log_artifact(local_path, artifact_path=None) AWS S3 Azure Blob Store Google Cloud Storage HDFS● DBFS ● NFS ● FTP ● SFTP Supported Artifact Stores
  • 41. ONNX Model Flavor [Experimental] ONNX models export both • ONNX native format • Pyfunc mlflow.onnx.load_model(model_uri) mlflow.onnx.log_model(onnx_model, artifact_path, conda_env=None) mlflow.onnx.save_model(onnx_model, path, conda_env=None, mlflow_model=<mlflow.models.Model object>) Supported Model Flavors Scikit TensorFlow MLlib H2O PyTorch Keras MLeap Python Function R FunctionONNX
  • 42. Docker Build [Experimental] $ mlflow models build-docker -m "runs:/some-run-uuid/my-model" -n "my-image-name" $ docker run -p 5001:8080 "my-image-name" Builds a Docker image whose default entrypoint serves the specified MLflow model at port 8080 within the container.
  • 44. What users want to see next
  • 45. What’s coming soon • New component: Model Registry • Version-controlled registry of models • Model lifecycle management • Model monitoring
  • 46. Model registry & deployment tracking https://databricks.com/sparkaisummit/north-america/2019-spark-summit-ai-keynotes-2#keynote-e
  • 47. What’s coming soon • New component: Model Registry • Version-controlled registry of models • Model lifecycle management • Model monitoring • Auto-logging from common frameworks
  • 48. What’s coming soon • New component: Model Registry • Version-controlled registry of models • Model lifecycle management • Model monitoring • Auto-logging from common frameworks • Parallel coordinates plot
  • 49. What’s coming soon • New component: Model Registry • Version-controlled registry of models • Model lifecycle management • Model monitoring • Auto-logging from common frameworks • Parallel coordinates plot • Delta Lake integration (Delta.io) for Data Versioning
  • 50. Learning More About MLflow pip install mlflow to get started Find docs & examples at mlflow.org https://mlflow.org/docs/latest/tutorial.html tinyurl.com/mlflow-slack 50