Using MLflow for the Machine Learning Project Lifecycle

MLflow is an open source project for managing the lifecycle of machine learning projects (from experimentation through deployment) so that they integrate better with the ecosystem around them.
In this presentation we walk through the components of MLflow and demonstrate its use both on the Databricks platform and from a local IDE.


  1. Using MLflow for the Machine Learning Project Lifecycle. Arduino Cascella, Solutions Architect
  2. Outline • Introductions • MLflow Intro • Overview of Components • MLflow 1.0 & Roadmap
  3. About me: Arduino Cascella, Solutions Architect. arduino@databricks.com @ArduinoCascella
  4. VISION: Accelerate innovation by unifying data science, engineering and business. WHO WE ARE: • Original creators of • 2000+ global companies use our platform across the big data & machine learning lifecycle. SOLUTION: Unified Analytics Platform
  5. Machine Learning Development is Complex... Source: https://xkcd.com/1838/
  6. ML Lifecycle Challenges: Raw Data → ETL → Featurize → Train (Tuning) → Score/Serve (Batch + Realtime) → Monitor (Alert, Debug) → Deploy → Update Features → Retrain with New Data. Challenges: a zoo of ecosystem frameworks, collaboration, scale, governance. MLflow: an open source platform for the machine learning lifecycle.
  7. Custom ML Platforms: Facebook FBLearner, Uber Michelangelo, Google TFX. + Standardize the data prep / training / deploy loop: if you work with the platform, you get these! – Limited to a few algorithms or frameworks. – Tied to one company's infrastructure. Can we provide similar benefits in an open manner?
  8. Introducing MLflow: an open machine learning platform • Works with any ML library & language • Runs the same way anywhere (e.g., any cloud) • Designed to be useful for 1 or 1000+ person orgs
  9. MLflow Design Philosophy 1. "API-first", open platform • Allow submitting runs, models, etc. from any library & language • Example: a "model" can just be a lambda function that MLflow can then deploy in many places (Docker, Azure ML, Spark UDF, …). Key enabler: built around REST APIs and the CLI
  10. MLflow Design Philosophy 2. Modular design • Let people use different components individually (e.g., use MLflow's project format but not its deployment tools) • Easy to integrate into existing ML platforms & workflows. Key enabler: distinct components (Tracking/Projects/Models)
  11. MLflow Community Growth: 600k, 100+, 40. Comparison: Apache Spark took 3 years to get to 100 contributors, and has 1.2M downloads/month on PyPI
  12. Some Users & Contributors
  13. Supported Integrations: June '18
  14. Supported Integrations: June '19
  15. MLflow Components: Tracking: record and query experiments (code, data, config, results). Projects: packaging format for reproducible runs on any platform. Models: general model format that supports diverse deployment tools.
  16. Keeping Track of Experiments. Source: https://fr.wikipedia.org/wiki/Fichier:Jean_Mi%C3%A9lot,_Brussels.jpg
  17. Key Concepts in Tracking: Parameters: key-value inputs to your code. Metrics: numeric values (can update over time). Artifacts: arbitrary files, including data and models. Source: what code ran? Tags & Comments.
  18. MLflow Tracking API: Simple! Tracking: record and query experiments (code, configs, results, etc.). import mlflow # log the model's tuning parameters with mlflow.start_run(): mlflow.log_param("layers", layers) mlflow.log_param("alpha", alpha) # log the model's metrics mlflow.log_metric("mse", model.mse()) mlflow.log_artifact("plot", model.plot(test_df)) mlflow.tensorflow.log_model(model)
  19. MLflow Tracking architecture: Notebooks, Local Apps, and Cloud Jobs log to a Tracking Server (UI + API) through the Python or REST API.
  20. Demo: tracking. mlflow.org github.com/mlflow twitter.com/MLflow databricks.com/mlflow
  21. Need for reproducible Data Science... Source: https://www.digital-science.com/blog/guest/digital-science-doodles-data-reproducibility/
  22. MLflow Projects Motivation: a diverse set of tools and a diverse set of environments. Result: it is difficult to productionize and share. Projects: packaging format for reproducible runs on any platform.
  23. MLflow Projects: a Project Spec (Code, Config, Data) supports both Local Execution and Remote Execution.
  24. Example MLflow Project: my_project/ contains MLproject, conda.yaml, main.py, model.py. MLproject contents: conda_env: conda.yaml; entry_points: main: parameters: training_data: path, lambda: {type: float, default: 0.1}; command: python main.py {training_data} {lambda}. Run with $ mlflow run git://<my_project> or mlflow.run("git://<my_project>", ...)
  25. Demo: reproducible projects
  26. MLflow Models: a model format with multiple flavors, fed by run sources and inference code, consumed by batch & stream scoring and cloud serving tools. Simple model flavors usable by many tools.
  27. Example MLflow Model: my_model/ contains an MLmodel file and estimator/ (saved_model.pb, variables/). MLmodel contents: run_id: 769915006efd4c4bbd662461; time_created: 2018-06-28T12:34; flavors: tensorflow: saved_model_dir: estimator, signature_def_key: predict; python_function: loader_module: mlflow.tensorflow. The tensorflow flavor is usable by tools that understand the TensorFlow model format; the python_function flavor is usable by any tool that can run Python (Docker, Spark, etc.!)
  28. About Those Flavors: models can generate several flavors • a Spark ML model generates the spark, mleap and pyfunc flavors. Generic flavors provide an abstraction layer • pyfunc can be served locally, as a spark_udf, on Azure ML, … • by generating the pyfunc flavor we get all of the above. Different flavors can be loaded by different languages • pyfunc in Python, mleap in Java, crate in R, Keras in Python and R.
  29. PyFunc: the Generic Python Model. A PyFunc model is saved as "just a directory": ./model_dir/ with ./MLmodel (configuration), <code> (code packaged with the model), <data> (data packaged with the model), and <env> (Conda environment definition), each specified in the MLmodel file. The model loader specified in the MLmodel file is arbitrary Python code loaded dynamically at runtime. A loaded PyFunc is "just an object" with a predict method: predict(pandas.DataFrame) -> pandas.DataFrame | numpy.array
  30. Demo: easy to deploy models
  31. MLflow Components: Tracking: record and query experiments (code, data, config, results). Projects: packaging format for reproducible runs on any platform. Models: general model format that supports diverse deployment tools.
  32. What's new with MLflow
  33. What Does the 1.0 Release Mean? API stability for the original components • Safe to build apps and integrations around them long term. Time to start adding some new features!
  34. Selected New Features in MLflow 1.0 • Support for logging metrics per user-defined step • HDFS support for artifacts • ONNX Model Flavor [experimental] • Deploying an MLflow Model as a Docker Image [experimental]
  35. Support for logging metrics per user-defined step. Metrics logged at the end of a run, e.g.: overall accuracy, overall AUC, overall loss. Metrics logged while training, e.g.: accuracy per minibatch, AUC per minibatch, loss per minibatch. Currently visualized by logging order.
  36. Support for logging metrics per user-defined step. New step argument for log_metric: log_metric(key, value, step=None) ● defines the x coordinate for the metric ● defines the ordering and scale of the horizontal axis in visualizations. Example: log_metric("exp", 1, 10), log_metric("exp", 2, 1000), log_metric("exp", 4, 10000), log_metric("exp", 8, 100000), log_metric("exp", 16, 1000000)
  37. Improved Search: the Search API supports a simplified version of the SQL WHERE clause, e.g.: params.model = "LogisticRegression" and metrics.error <= 0.05
  38. Improved Search, Python API example: all_experiments = [exp.experiment_id for exp in MlflowClient().list_experiments()]; runs = MlflowClient().search_runs(all_experiments, "params.model='LogisticRegression' and metrics.error<=0.05", ViewType.ALL)
  39. Improved Search, UI example: the same filter, params.model = "LogisticRegression" and metrics.error <= 0.05, works in the runs search box.
  40. HDFS Support for Artifacts: mlflow.log_artifact(local_path, artifact_path=None). Supported artifact stores: AWS S3, Azure Blob Store, Google Cloud Storage, HDFS, DBFS, NFS, FTP, SFTP.
  41. ONNX Model Flavor [Experimental]: ONNX models export both the ONNX native format and pyfunc. mlflow.onnx.load_model(model_uri); mlflow.onnx.log_model(onnx_model, artifact_path, conda_env=None); mlflow.onnx.save_model(onnx_model, path, conda_env=None, mlflow_model=<mlflow.models.Model object>). Supported model flavors: Scikit-learn, TensorFlow, MLlib, H2O, PyTorch, Keras, MLeap, Python Function, R Function, ONNX.
  42. Docker Build [Experimental]: $ mlflow models build-docker -m "runs:/some-run-uuid/my-model" -n "my-image-name" then $ docker run -p 5001:8080 "my-image-name". Builds a Docker image whose default entrypoint serves the specified MLflow model at port 8080 within the container.
  43. MLflow beyond 1.0
  44. What users want to see next
  45. What's coming soon • New component: Model Registry • Version-controlled registry of models • Model lifecycle management • Model monitoring
  46. Model registry & deployment tracking: https://databricks.com/sparkaisummit/north-america/2019-spark-summit-ai-keynotes-2#keynote-e
  47. What's coming soon • New component: Model Registry • Version-controlled registry of models • Model lifecycle management • Model monitoring • Auto-logging from common frameworks
  48. What's coming soon • New component: Model Registry • Version-controlled registry of models • Model lifecycle management • Model monitoring • Auto-logging from common frameworks • Parallel coordinates plot
  49. What's coming soon • New component: Model Registry • Version-controlled registry of models • Model lifecycle management • Model monitoring • Auto-logging from common frameworks • Parallel coordinates plot • Delta Lake integration (Delta.io) for data versioning
  50. Learning More About MLflow: pip install mlflow to get started. Find docs & examples at mlflow.org and https://mlflow.org/docs/latest/tutorial.html. Join the community Slack: tinyurl.com/mlflow-slack
  51. See you in Amsterdam!
  52. Thank You
