
Continuous Delivery of ML-Enabled Pipelines on Databricks using MLflow

ML development brings many new complexities beyond the traditional software development lifecycle. Unlike traditional software projects, ML projects cannot simply be left alone once they have been successfully delivered and deployed: they must be continuously monitored to verify that model performance still satisfies all requirements.
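
The monitoring requirement above can be pictured as a recurring check that compares a deployed model's latest metrics with agreed thresholds. The sketch below is a minimal illustration, not something presented in the talk: the `Requirement` type, the metric names `accuracy` and `latency_ms`, and the threshold values are all hypothetical, and a real setup would read the metrics from its own evaluation job.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Requirement:
    """One performance requirement on a monitored metric (hypothetical schema)."""
    metric: str
    threshold: float
    higher_is_better: bool = True

    def is_met(self, value: float) -> bool:
        # "Higher is better" metrics must reach the threshold;
        # "lower is better" metrics (e.g. latency) must not exceed it.
        if self.higher_is_better:
            return value >= self.threshold
        return value <= self.threshold


def violated_requirements(metrics: Dict[str, float], requirements: List[Requirement]) -> List[str]:
    """Return the names of metrics that are missing or fail their requirement."""
    violations = []
    for req in requirements:
        value = metrics.get(req.metric)
        # A missing metric counts as a violation: the requirement cannot be confirmed.
        if value is None or not req.is_met(value):
            violations.append(req.metric)
    return violations


if __name__ == "__main__":
    requirements = [
        Requirement("accuracy", 0.90),
        Requirement("latency_ms", 200.0, higher_is_better=False),
    ]
    latest = {"accuracy": 0.87, "latency_ms": 150.0}
    print(violated_requirements(latest, requirements))  # ['accuracy']
```

A scheduled job could run such a check after every evaluation and raise an alert whenever the returned list is not empty.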

  1. Continuous delivery of ML pipelines on Databricks using MLflow and CICD Templates. Michael Shtelma, Databricks; Thunder Shiviah, Databricks
  2. Agenda
     ▪ The challenges of implementing CICD for ML (CD4ML) pipelines
     ▪ The CICD challenges that force ML teams to choose between Databricks notebooks and local IDEs
     ▪ Introducing Databricks Labs CICD Templates
     ▪ How CICD Templates solves ML teams' production challenges
     ▪ Demo and next steps
  3. Continuous Delivery for Machine Learning (CD4ML) is a software engineering approach in which a cross-functional team produces machine learning applications based on code, data, and models in small and safe increments that can be reproduced and reliably released at any time, in short adaptation cycles. (Sato, Wider, and Windheuser, 2019)
  4. What challenges do ML teams face when they try to implement CD4ML?
  5. ML teams struggle to combine traditional CICD tools with Databricks notebooks
     1. Benefits of Databricks notebooks
        ▪ Easy to use
        ▪ Scalable
        ▪ Provide access to ML tools such as MLflow for model logging and serving
     2. Challenges
        ▪ Non-trivial to hook into traditional software development tools such as CI tools or local IDEs
     3. Result: teams find themselves choosing between
        ▪ traditional IDE-based workflows, while struggling to test and deploy at scale, or
        ▪ Databricks notebooks or other cloud notebooks, while struggling to ensure reliable testing and deployment via CICD pipelines
  6. What’s the solution?
  7. CICD Templates gives you the benefits of traditional CICD workflows and the scale of Databricks clusters. CICD Templates allows you to
     ● create a production pipeline from a template in a few steps,
     ● hook it automatically into GitHub Actions,
     ● run tests and deployments on Databricks on every git commit or any other trigger you define, and
     ● see the test status directly in GitHub, so you know whether your commit broke the build.
  8. A scalable CICD pipeline in 5 easy steps
     1. Install and customize the project with a single command.
     2. Create a new GitHub repo containing your Databricks host and token secrets.
     3. Initialize git in your project and commit the code.
     4. Push your new CICD Templates project to the repo. Your tests start running automatically on Databricks, and when they succeed or fail you get a green checkmark or a red x next to your commit.
     5. You’re done! You now have a fully scalable CICD pipeline.
  9. CICD Templates executes tests and deployments directly on Databricks while storing packages, logged models, and other artifacts in MLflow.
  10. Push Flow
  11. Release Flow
  12. Folder Structure
  13. Demo: CICD Templates
  14. Summary
      ▪ The challenges of implementing CD4ML
      ▪ The CICD challenges that force ML teams to choose between Databricks notebooks and local IDEs
      ▪ Introducing Databricks Labs CICD Templates
      ▪ How CICD Templates solves ML teams' production challenges
      Next steps: search for Databricks Labs cicd-templates or go directly to https://github.com/databrickslabs/cicd-templates to get started.
      Contact: michael.shtelma@databricks.com, thunder.shiviah@databricks.com
  15. Feedback: Your feedback is important to us. Don’t forget to rate and review the sessions.
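
Slide 3 defines a CD4ML increment as built from code, data, and models and reproducible at any time. One way to make that property checkable is to derive a single fingerprint from all three inputs, so that two runs with the same fingerprint used the same code version, data, and training parameters. This is a minimal sketch under stated assumptions, not part of CICD Templates: the code version is a git commit string, the data is hashed from raw bytes, and the model parameters are a JSON-serializable dict. The commit ID and parameter names in the demo are hypothetical.

```python
import hashlib
import json


def data_digest(data: bytes) -> str:
    """SHA-256 of the raw training data (a stand-in for hashing a dataset snapshot)."""
    return hashlib.sha256(data).hexdigest()


def increment_fingerprint(git_commit: str, data: bytes, params: dict) -> str:
    """Combine code version, data, and model parameters into one reproducible ID."""
    payload = {
        "code": git_commit,
        "data": data_digest(data),
        "params": params,
    }
    # sort_keys (applied at every nesting level) makes the hash independent of
    # dict insertion order; compact separators keep the encoding canonical.
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


if __name__ == "__main__":
    a = increment_fingerprint("3f2c1ab", b"x,y\n1,2\n", {"max_depth": 5, "eta": 0.1})
    b = increment_fingerprint("3f2c1ab", b"x,y\n1,2\n", {"eta": 0.1, "max_depth": 5})
    print(a == b)  # True: same code, data, and params, only the key order differs
```

Recording such a fingerprint next to each trained model makes it easy to tell whether a later run is a true reproduction or a new increment.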
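
Slide 5 describes teams choosing between notebooks and IDE-based workflows. A common way to soften that trade-off, offered here as an illustration rather than as the talk's prescribed approach, is to keep pipeline logic in plain importable functions that a notebook calls and that a local IDE or CI runner can unit-test. The `fill_missing` transformation below is hypothetical and works on plain Python rows so the sketch runs without a cluster; a real pipeline would typically operate on Spark DataFrames.

```python
import unittest
from typing import Any, Dict, List


def fill_missing(rows: List[Dict[str, Any]], column: str, default: Any) -> List[Dict[str, Any]]:
    """Return copies of `rows` where a missing or None `column` is set to `default`.

    A pure function with no notebook globals: a Databricks notebook can import
    and call it, and a local IDE or CI runner can unit-test it.
    """
    result = []
    for row in rows:
        new_row = dict(row)  # shallow copy, so the caller's rows are not mutated
        if new_row.get(column) is None:
            new_row[column] = default
        result.append(new_row)
    return result


class FillMissingTest(unittest.TestCase):
    def test_replaces_none_and_missing(self):
        rows = [{"price": None}, {}, {"price": 3.5}]
        self.assertEqual(
            fill_missing(rows, "price", 0.0),
            [{"price": 0.0}, {"price": 0.0}, {"price": 3.5}],
        )

    def test_does_not_mutate_input(self):
        rows = [{"price": None}]
        fill_missing(rows, "price", 0.0)
        self.assertIsNone(rows[0]["price"])
```

Saved as a module in the project, these tests run with `python -m unittest` locally or in CI, and the same function stays available to any notebook that imports the package.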
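
Slide 8 keeps the Databricks host and token as repository secrets, and slide 9 runs the tests on Databricks rather than on the CI machine. The sketch below shows one way a CI step could use those secrets, assuming they are exposed as the environment variables `DATABRICKS_HOST` and `DATABRICKS_TOKEN` (the names the Databricks CLI reads) and that tests are submitted as a one-time run through the Jobs API endpoint `POST /api/2.0/jobs/runs/submit`. The run name, script path, and cluster spec are hypothetical placeholders, and this is not a description of how CICD Templates works internally.

```python
import json
import os
import urllib.request
from typing import Mapping, Optional, Tuple


def load_databricks_config(env: Optional[Mapping[str, str]] = None) -> Tuple[str, str]:
    """Read the workspace URL and access token, failing fast if either is missing."""
    env = os.environ if env is None else env
    host = env.get("DATABRICKS_HOST", "").rstrip("/")
    token = env.get("DATABRICKS_TOKEN", "")
    if not host or not token:
        raise RuntimeError("DATABRICKS_HOST and DATABRICKS_TOKEN must both be set")
    return host, token


def build_test_run_request(
    host: str, token: str, python_file: str, new_cluster: dict
) -> urllib.request.Request:
    """Build, but do not send, a one-time Jobs API run that executes a test script.

    `new_cluster` is passed through unchanged; its contents (runtime version,
    node type, worker count) depend on your workspace.
    """
    payload = {
        "run_name": "cicd-tests",  # hypothetical run name
        "new_cluster": new_cluster,
        "spark_python_task": {"python_file": python_file},
    }
    return urllib.request.Request(
        url=f"{host}/api/2.0/jobs/runs/submit",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",  # personal access token auth
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Sending the request with `urllib.request.urlopen(req)` would start the run; the Jobs API answers with a `run_id` that a CI step could poll until the run finishes, failing the build if the tests fail. The CI machine then only orchestrates, while the heavy work happens on a Databricks cluster.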
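
Slides 10 and 11 distinguish a push flow from a release flow, but their diagrams did not survive in this transcript. Assuming, in line with slide 7, that a push runs the tests and a release deploys, a CI entry point could pick its action from the triggering event and report the outcome through its exit code, which is what produces the green checkmark or red x from slide 8. The event names match the values GitHub Actions puts in `GITHUB_EVENT_NAME` (`push`, `release`); the `run_tests` and `deploy` callables are hypothetical stand-ins, and re-testing before a release is our assumption.

```python
import sys
from typing import Callable


def dispatch(event_name: str, run_tests: Callable[[], bool], deploy: Callable[[], bool]) -> int:
    """Choose the CI action for a GitHub event and return an exit code (0 = success)."""
    if event_name == "push":
        # Push flow: every commit is tested.
        ok = run_tests()
    elif event_name == "release":
        # Release flow (assumption): re-run the tests and deploy only if they pass.
        # `and` short-circuits, so deploy() is never called after a failing test run.
        ok = run_tests() and deploy()
    else:
        print(f"no action configured for event {event_name!r}", file=sys.stderr)
        return 0
    return 0 if ok else 1
```

In a workflow step this could be invoked as `sys.exit(dispatch(os.environ["GITHUB_EVENT_NAME"], run_tests, deploy))`, where `run_tests` might submit the Databricks test run and `deploy` might promote the model; a non-zero exit code is what marks the commit as failed.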
