Successfully reported this slideshow.
We use your LinkedIn profile and activity data to personalize ads and to show you more relevant ads. You can change your ad preferences anytime.

Introduction to MLflow


Published on

ML development brings many new complexities beyond the traditional software development lifecycle. Unlike in traditional software development, ML developers want to try multiple algorithms, tools and parameters to get the best results, and they need to track this information to reproduce work. In addition, developers need to use many distinct systems to productionize models. To address these problems, many companies are building custom “ML platforms” that automate this lifecycle, but even these platforms are limited to a few supported algorithms and to each company’s internal infrastructure. In this talk, I present MLflow, a new open source project from Databricks that aims to design an open ML platform where organizations can use any ML library and development tool of their choice to reliably build and share ML applications. MLflow introduces simple abstractions to package reproducible projects, track results, and encapsulate models that can be used with many existing tools, accelerating the ML lifecycle for organizations of any size.

Published in: Software

Introduction to MLflow

  1. 1. Introduction to Matei Zaharia & Aaron Davidson June28, 2018
  2. 2. Outline ML development challenges How MLflow tackles these Demo Roadmap
  3. 3. Machine Learning Development is Complex
  4. 4. ML Lifecycle 4 Delta Data Prep Training Deploy Raw Data μ λ θ Tuning Scale μ λ θ Tuning Scale Scale Scale Model Exchange Governance
  5. 5. ML Development Challenges 100s of software tools to leverage Hard to track & reproduceresults: code, data, params,etc Hard to productionizemodels Needs large scale for best results
  6. 6. Introducing Open machine learningplatform • Works with any ML library& language • Runs the same wayanywhere (e.g. any cloud) • Designed to be useful for 1 or 100,000 person orgs
  7. 7. MLflow Design Philosophy 1. “API-first”, open platform • Allow submittingruns,models,etc from anylibrary & language • Example: a “model” can justbe a lambdafunction thatMLflow can thendeploy in many places (Docker, AzureML, Spark UDF, …) Key enabler: built aroundREST APIs and CLI
  8. 8. MLflow Design Philosophy 2. Modular design • Let people use different components individually(e.g.,use MLflow’s project format but not its deployment tools) • Easy to integrateinto existing ML platforms & workflows Key enabler: distinct components (Tracking/Projects/Models)
  9. 9. Why Open Source? Everyone is solvinga similarproblem Lots of benefits in having a common API across orgs • Can open source & share individualworkflow steps • ML tool developers can easily reach lots of users – E.g. a new ML library canuse MLflow Models to reach many serving tools
  10. 10. MLflow Components 10 Tracking Record and query experiments: code, data, config, results Projects Packaging format for reproducible runs on any platform Models General model format that supports diverse deployment tools
  11. 11. Notebooks LocalApps CloudJobs Tracking Server UI API MLflow Tracking Python or REST API
  12. 12. Key Concepts in Tracking Parameters: key-value inputs to your code Metrics: numeric values (can update over time) Artifacts: arbitrary files, including models Source: what code ran?
  13. 13. Project Spec Code DataConfig LocalExecution Remote Execution MLflow Projects
  14. 14. Example MLflow Project my_project/ ├── MLproject │ │ │ │ │ ├── conda.yaml ├── └── ... conda_env: conda.yaml entry_points: main: parameters: training_data: path lambda: {type: float, default: 0.1} command: python {training_data} {lambda} $ mlflow run git://<my_project>“git://<my_project>”, ...)
  15. 15. Model Format Flavor 2Flavor 1 Run Sources InferenceCode Batch & Stream Scoring Cloud ServingTools MLflow Models Simple model flavors usableby many tools
  16. 16. Example MLflow Model my_model/ ├── MLmodel │ │ │ │ │ └── estimator/ ├── saved_model.pb └── variables/ ... Usable by tools that understand TensorFlowmodel format Usable by any tool that can run Python (Docker,Spark,etc!) run_id: 769915006efd4c4bbd662461 time_created: 2018-06-28T12:34 flavors: tensorflow: saved_model_dir: estimator signature_def_key: predict python_function: loader_module: mlflow.tensorflow
  17. 17. Demo
  18. 18. Roadmap
  19. 19. Current Status MLflow is still alpha, so expect things to break • But send input or patches on GitHub! Just made0.2.1 release • TensorFlowintegration (model logging & serving) • More robust server (multi-worker setup and S3 artifact store) • Doc, example and API improvements
  20. 20. Longer-Term Roadmap 1. Improvingcurrent components • Pluggable execution backends for • Database-backed tracking store (already a pluggable API) • Model metadata (e.g. required input schema) • Easier support for multi-step workflows
  21. 21. Longer-Term Roadmap 2. MLflow Data component • Let MLflowprojects load data from diverse formats (e.g. CSV vs Parquet) so you don’t have to pick a format in advance • Will build on Spark’s Data SourceAPI
  22. 22. Longer-Term Roadmap 3. Hyperparametertuning • Integrate with common hyperparameter tuning libraries • Make it easier to launch & track many runs in parallel (already possible but kind of awkward)
  23. 23. Longer-Term Roadmap 4. Language and libraryintegrations • Java and R are high on our list for APIs • Built-in Spark MLlib and PyTorchintegrations • Demonstrate how to use MLflow with other libraries (it’s easy) Let us know if you have other roadmap ideas!
  24. 24. Contributingto MLflow Submit issues and patches on GitHub • We’re using it for all our development & issue tracking • See CONTRIBUTING.rstfor how to run dev builds Join our mailinglist: Join our Slack:
  25. 25. Conclusion Powerful workflow tools can simplifythe ML lifecycle • Improve usability for both data scientists and engineers • Same way that software dev lifecycle tools simplify dev MLflow is a lightweight, open platform that integrates easily into existing workflows