Successfully reported this slideshow.
We use your LinkedIn profile and activity data to personalize ads and to show you more relevant ads. You can change your ad preferences anytime.

How to Utilize MLflow and Kubernetes to Build an Enterprise ML Platform


Published on

In large enterprises, large solutions are sometimes required to tackle even the smallest tasks and ML is no different. At Comcast we are building a comprehensive, configuration based, continuously integrated and deployed platform for data pipeline transformations, model development and deployment. This is accomplished using a range of tools and frameworks such as Databricks, MLflow, Apache Spark and others. With a Databricks environment used by hundreds of researchers and petabytes of data, scale is critical to Comcast, so making it all work together in a frictionless experience is a high priority. The platform consists of a number of components: an abstraction for data pipelines and transformation to allow our data scientists the freedom to combine the most appropriate algorithms from different frameworks , experiment tracking, project and model packaging using MLflow and model serving via the Kubeflow environment on Kubernetes. The architecture, progress and current state of the platform will be discussed as well as the challenges we had to overcome to make this platform work at Comcast scale. As a machine learning practitioner, you will gain knowledge in: an example of data pipeline abstraction; ways to package and track your ML project and experiments at scale; and how Comcast uses Kubeflow on Kubernetes to bring everything together.

Published in: Data & Analytics
  • Login to see the comments

How to Utilize MLflow and Kubernetes to Build an Enterprise ML Platform

  1. 1. WIFI SSID:SparkAISummit | Password: UnifiedAnalytics
  2. 2. Nick Pinckernell, Comcast Applied AI Research Utilizing MLFlow and Kubernetes to build an Enterprise ML Platform #UnifiedAnalytics #SparkAISummit
  3. 3. Topics TOPIC WHY? Example of data pipeline abstraction Modular components and reuse are important for abstracting complex systems Ways to package and track ML project and experiments Consistency and reproducibility is key for scale How Comcast uses Kubeflow to serve and deploy models and pipelines A tangible example to help you brainstorm about your organizations requirements 3#UnifiedAnalytics #SparkAISummit
  4. 4. Challenges and motivations • Before, there was no – model management or tracking – standardization for model packaging or deployments • Cumbersome deployment process – Deployment required code rewrite from research to operations – Days or weeks to deploy • Response and tradeoff: restrict model complexity 4#UnifiedAnalytics #SparkAISummit
  5. 5. Requirements Minimum requirements from our organization • Zero code refactoring or rewriting between research ready models and production • Easier experiment and model tracking • Researchers need to deploy their own models • A/B testing for quick model enhancement testing in production • Ability to modularize and inject custom metrics and workflows at each step 5#UnifiedAnalytics #SparkAISummit
  6. 6. Solution – existing technologies 6#UnifiedAnalytics #SparkAISummit Research Model serving Images, containers
  7. 7. Data pipeline abstraction • Determine use cases • Identify commonalities for modularization • Abstract interfaces • Automate configuration 7#UnifiedAnalytics #SparkAISummit
  8. 8. Pipeline abstraction 8#UnifiedAnalytics #SparkAISummit
  9. 9. Pipeline abstraction 9#UnifiedAnalytics #SparkAISummit
  10. 10. Pipeline abstraction 10#UnifiedAnalytics #SparkAISummit
  11. 11. Pipeline abstraction 11#UnifiedAnalytics #SparkAISummit
  12. 12. Pipeline abstraction 12#UnifiedAnalytics #SparkAISummit
  13. 13. Pipeline abstraction 13#UnifiedAnalytics #SparkAISummit
  14. 14. Pipeline abstraction 14#UnifiedAnalytics #SparkAISummit
  15. 15. Pipeline abstraction 15#UnifiedAnalytics #SparkAISummit
  16. 16. Pipeline abstraction 16#UnifiedAnalytics #SparkAISummit
  17. 17. Pipeline abstraction 17#UnifiedAnalytics #SparkAISummit
  18. 18. Pipeline abstraction 18#UnifiedAnalytics #SparkAISummit
  19. 19. Pipeline abstraction 19#UnifiedAnalytics #SparkAISummit
  20. 20. Seldon inference graphs Allows for complex graphs • A/B testing • Ensembles • Multi-armed bandit • Custom combinations 20#UnifiedAnalytics #SparkAISummit
  21. 21. Packaging and tracking 1. Researchers code and train models with Databricks, Spark 2. Experiments tracked with MLFlow 3. Packaging and model tracking with MLFlow and Kubeflow • MLFLow standard packaging formats • scikit-learn • h2o • TensorFlow • more 21#UnifiedAnalytics #SparkAISummit
  22. 22. An MLFlow experiment 22#UnifiedAnalytics #SparkAISummit
  23. 23. MLFlow – multiple experiments 23#UnifiedAnalytics #SparkAISummit
  24. 24. MLFlow – multiple experiments 24#UnifiedAnalytics #SparkAISummit
  25. 25. MLFlow packaging 25#UnifiedAnalytics #SparkAISummit
  26. 26. Research and model flow 26#UnifiedAnalytics #SparkAISummit
  27. 27. Research and model flow 27#UnifiedAnalytics #SparkAISummit
  28. 28. Research and model flow – at scale 28#UnifiedAnalytics #SparkAISummit
  29. 29. Model serving with Kubeflow Considerations and requirements • Resilient • Highly available • Rate limiting • Shadow deployments • Auto-scaling (WIP) 29#UnifiedAnalytics #SparkAISummit Ambassador
  30. 30. Throughput Static number of replicas Determined after • Constant and burst load testing with Locust 30#UnifiedAnalytics #SparkAISummit
  31. 31. DEMO A demonstration of • MLFlow experiments – Serving the chosen model • Implementation of components – Consumer pod – Model pod – Producer logic (to simulate real requests) 31#UnifiedAnalytics #SparkAISummit
  32. 32. Choosing the run 32#UnifiedAnalytics #SparkAISummit
  33. 33. Choosing the model 33#UnifiedAnalytics #SparkAISummit
  34. 34. Implementing the model 34#UnifiedAnalytics #SparkAISummit
  35. 35. Implementing the consumer 35#UnifiedAnalytics #SparkAISummit
  36. 36. Implementing the producer 36#UnifiedAnalytics #SparkAISummit
  37. 37. Deploy the model • Define the YAML / JSON Seldon deployment • Build the image – s2i build -E environment_rest . seldonio/seldon-core-s2i-python3:0.6-SNAPSHOT sklearn-iris-mlflow:0.3 • Deploy – kubectl create -f sklearn_iris_deployment.json -n kubeflow 37#UnifiedAnalytics #SparkAISummit
  38. 38. 38#UnifiedAnalytics #SparkAISummit Grafana metrics