Accelerating Production Machine Learning with MLflow with Matei Zaharia

Successfully building and deploying a machine learning model can be difficult to do once. Enabling other data scientists (or yourself, one month later) to reproduce your pipeline, to compare the results of different versions, to track what’s running where, and to redeploy and roll back updated models is much harder.

In this talk, I’ll introduce MLflow, a new open source project from Databricks that simplifies the machine learning lifecycle. MLflow provides APIs for tracking experiment runs between multiple users within a reproducible environment, and for managing the deployment of models to production. MLflow is designed to be an open, modular platform, in the sense that you can use it with any existing ML library and development process. MLflow was launched in June 2018 and has already seen significant community contributions, with 45 contributors and new features including multiple language APIs, integrations with popular ML libraries, and storage backends. I’ll go through some of the newly released features and explain how to get started with MLflow.
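
To make the tracking idea concrete, here is a minimal sketch of what recording and comparing runs involves. It is plain Python and does not use MLflow: the `RunLog` class and its `record` and `best` methods are hypothetical stand-ins, not MLflow's API or implementation. The parameter and accuracy values are taken from the example output on slide 7.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Run:
    """One experiment run: its inputs (params) and its results (metrics)."""
    params: Dict[str, float] = field(default_factory=dict)
    metrics: Dict[str, float] = field(default_factory=dict)


class RunLog:
    """Hypothetical in-memory stand-in for a tracking store (not MLflow)."""

    def __init__(self) -> None:
        self.runs: List[Run] = []

    def record(self, params: Dict[str, float], metrics: Dict[str, float]) -> Run:
        """Store a copy of the run's parameters and metrics."""
        run = Run(dict(params), dict(metrics))
        self.runs.append(run)
        return run

    def best(self, metric: str) -> Run:
        """Return the run with the highest value of `metric`."""
        candidates = [r for r in self.runs if metric in r.metrics]
        return max(candidates, key=lambda r: r.metrics[metric])


log = RunLog()
# A few of the runs printed on slide 7.
log.record({"n": 2, "learning_rate": 0.1}, {"accuracy": 0.71})
log.record({"n": 2, "learning_rate": 0.2}, {"accuracy": 0.79})
log.record({"n": 2, "learning_rate": 0.5}, {"accuracy": 0.83})
log.record({"n": 4, "learning_rate": 0.5}, {"accuracy": 0.75})

best = log.best("accuracy")
print(best.params, best.metrics)
```

Keeping parameters and metrics together per run is what lets the question "which settings produced this result?" be answered by a query instead of by scrolling through printed output.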


  1. MLflow: Accelerating Production Machine Learning. Matei Zaharia, @matei_zaharia
  2. Machine Learning Development is Complex
  3. ML Lifecycle (diagram labels: Delta, Raw Data, Data Prep, Training, Deploy, Tuning (μ λ θ), Scale, Model Exchange, Governance)
  4. Custom ML Platforms: Facebook FBLearner, Uber Michelangelo, Google TFX
     • Standardize the data prep / training / deploy cycle: if you work within the platform, you get these!
     • Limited to a few algorithms or frameworks
     • Tied to one company’s infrastructure
     Can we provide similar benefits in an open manner?
  5. Introducing MLflow (www.mlflow.org): open source machine learning platform
     • Works with any ML library & language
     • Runs the same way everywhere & cross-cloud
     • Scales to big data with Apache Spark
     Launched this June
     • Already 48 contributors and many new features!
     This Talk: MLflow Overview + New Announcements
  6. How Does MLflow Work?
     • Tracking: record and query experiments (code, data, config, results)
     • Projects: package workflows into reproducible and reusable steps
     • Models: model packaging format for diverse deployment tools
  7. Model Development without MLflow
         data = load_text(file)
         ngrams = extract_ngrams(data, N=n)
         model = train_model(ngrams, learning_rate=lr)
         score = compute_accuracy(model)
         print("For n=%d, lr=%f: accuracy=%f" % (n, lr, score))
         pickle.dump(model, open("model.pkl", "wb"))
     Output:
         For n=2, lr=0.1: accuracy=0.71
         For n=2, lr=0.2: accuracy=0.79
         For n=2, lr=0.5: accuracy=0.83
         For n=2, lr=0.9: accuracy=0.79
         For n=3, lr=0.1: accuracy=0.83
         For n=3, lr=0.2: accuracy=0.82
         For n=4, lr=0.5: accuracy=0.75
         ...
     • What if I expand the input data?
     • What if I tune this other parameter?
     • What if I upgrade my ML library?
     • What version of my code was this result from?
  8. Model Deployment without MLflow
     DATA SCIENTIST to PRODUCTION ENGINEER (Code & Models):
     • Please deploy this SciKit model!
     • Please deploy this Spark model!
     • Please deploy this R model!
     • Please deploy this TensorFlow model!
     • Please deploy this ArXiv paper!
     • …
  9. Model Development with MLflow
         data = load_text(file)
         ngrams = extract_ngrams(data, N=n)
         model = train_model(ngrams, learning_rate=lr)
         score = compute_accuracy(model)
         print("For n=%d, lr=%f: accuracy=%f" % (n, lr, score))
         pickle.dump(model, open("model.pkl", "wb"))
  10. Model Development with MLflow
          data = load_text(file)
          ngrams = extract_ngrams(data, N=n)
          model = train_model(ngrams, learning_rate=lr)
          score = compute_accuracy(model)
          mlflow.log_param("data_file", file)
          mlflow.log_param("n", n)
          mlflow.log_param("learning_rate", lr)
          mlflow.log_metric("score", score)
          mlflow.sklearn.log_model(model)
      • Track parameters, metrics, output files & code version
      • Search using UI or API
          $ mlflow ui
  11. MLflow UI: Inspecting Runs
  12. MLflow UI: Comparing Runs
  13. Packaging Code: MLflow Projects (diagram labels: Project Spec, Code, Config, Deps, Local Execution, Remote Cluster)
          $ mlflow run git://...
  14. Packaging Models: MLflow Models (diagram labels: Packaging Format, Model Format, ONNX Flavor, Python Flavor, . . ., Training Apps, MLlib, Inference Code, Batch & Stream Scoring, REST Serving Tools)
  15. Model Deployment with MLflow
      • DATA SCIENTIST: Please deploy this MLflow Model!
      • PRODUCTION ENGINEER: OK, it’s up in our REST server & Spark!
      • DATA SCIENTIST: Please run this MLflow Project nightly for updates!
      • PRODUCTION ENGINEER: Don’t even tell me what ArXiv paper that’s from...
  16. MLflow Development Status: many new features since our release in June
      • Model packaging for MLlib, H2O, TensorFlow, PyTorch, Keras
      • Storage on Azure, AWS, Google, SFTP
      • Java and Scala API
      • New examples and UI features
      Just released MLflow 0.7.0 today!
  17. Major Announcement in MLflow 0.7.0: RStudio partnered with Databricks to add an MLflow R API. See Kevin Kuo’s talk on this at 14:40 today!
  18. Conclusion
      Workflow platforms can greatly simplify ML development
      • Improve usability for both data scientists and engineers
      Get started at mlflow.org, and come see our other talks:
      • MLflow R API: 14:40 today
      • ML Factories: 11:40 Thursday
      • 1h Deep Dive: 14:00 Thursday
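
The "flavor" idea from slide 14 can be sketched in a few lines: a packaged model advertises one or more formats, and each deployment tool picks one it understands. This is a hypothetical illustration in plain Python. `PackagedModel`, `pick_flavor`, `load`, and the toy averaging model are assumptions made for the example, not MLflow's actual packaging format or API.

```python
from typing import Callable, Dict, List, Sequence

# A predictor takes a feature vector and returns a single score.
Predictor = Callable[[List[float]], float]


class PackagedModel:
    """Hypothetical model package exposing one loader per flavor."""

    def __init__(self, flavors: Dict[str, Callable[[], Predictor]]) -> None:
        self.flavors = flavors

    def pick_flavor(self, supported: Sequence[str]) -> str:
        """Return the first flavor, in the tool's preference order, that the package offers."""
        for name in supported:
            if name in self.flavors:
                return name
        raise ValueError(f"no supported flavor among {list(self.flavors)}")

    def load(self, flavor: str) -> Predictor:
        """Build the predictor for the chosen flavor."""
        return self.flavors[flavor]()


def load_python_function() -> Predictor:
    # Toy model (assumption): the prediction is the mean of the input features.
    return lambda features: sum(features) / len(features)


model = PackagedModel({"python_function": load_python_function})

# A serving tool that prefers ONNX but can fall back to a generic Python flavor.
flavor = model.pick_flavor(["onnx", "python_function"])
predict = model.load(flavor)
print(flavor, predict([1.0, 2.0, 3.0]))
```

The point of the design is that the training side and the serving side only need to agree on flavor names, so a new deployment tool can be added without changing how models are produced.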
