Apply MLOps at Scale

This session is a continuation of “Automated Production Ready ML at Scale” from the last Spark AI Summit Europe. In this session you will learn how H&M is evolving its reference architecture to cover the entire MLOps stack and address common challenges in AI and machine learning products, such as development efficiency, end-to-end traceability, and speed to production.

  1. Apply MLOps At Scale. Keven (Qi) Wang, Lead AI Architect @ H&M. LinkedIn: https://www.linkedin.com/in/kevenqiwang/ Medium: https://medium.com/@kevenwang_33862
  2. Agenda: AI journey @H&M (quick facts and use cases); Reference Architecture gen1 (ML process and ML training); Reference Architecture gen2 (MLOps and Operationalize AI)
  3. AI journey @H&M: Quick facts and use cases
  4. General Information: 74 markets; more than 5000 stores; over 177,000 employees; sales including VAT of SEK 210 billion (2018); e-commerce in 51 markets
  5. Our Journey. 2016, Exploration: run initial PoCs; test AA appetite & applicability. 2017, Initiation: industrialize early use cases; define organization and capability needs; establish the IT / data environment. 2018, Establish AA & AI function: roll-out & hand-over of successful pilots; establish AA-WoW, team, governance. 2019, AA Leader: increasingly data- & algo-driven retail business; analytical support across entire value chain; strong internal AA teams; engage in partnership with strong AI players. 2022, AI Leader of the Fashion Industry: lead the frontier of AI at scale in delivering customer value; global leader in developing talent pools and supporting AI hubs and networks; AI-powered tools and capabilities supporting core processes and business decisions in all functions; world-leading ecosystem of cutting-edge AI partners. Diagram labels: Today; Algo library; IT platform; Business impact.
  6. H&M use cases: Analytics and Data Platform; Logistics; Production; Sales; Marketing; Design / Buying; Assortment quantification; Fashion Forecast; Allocation; Markdown Online; Markdown Store; Personalized Promotions, Recommendations & Journeys; Movebox; Knowledge & Best Practice; AI exploration and Research; Rapid Dev enablement; AI platform
  7. AI @ H&M quick facts: 100+ co-located FTEs; growing # of colleagues; 30+ different nationalities; several nationalities; combined teams; sprints; standups; product mgmt.; epics; algo; cloud; new ways of working; consultants; HAAL; Azure Databricks
  8. Reference Architecture gen1: ML process and ML training
  9. Starting point – fragmented architecture
  10. ML Process and Tooling. Model training: data acquisition -> data preparation -> feature engineering -> model training -> model repository. Model deployment: unseen data acquisition -> data preparation -> transform data into features -> model prediction -> results. Also shown: training orchestration; deployment orchestration; data storage; Data Lake Store; model and data versioning; automated e2e feedback loop; e2e monitoring.
  11. Interactive model development. Components: PyCharm; Azure Databricks; Triggering CI; Orchestrator; Model repository; Container Registry; Kubernetes. Steps: 1 Code commit; 2 Code static check, unit test, packaging; 3.1 Push to DBFS; 3.2 Trigger pipeline; 4.1 Job execution; 4.2 Log model info; 4.3 Commit model; 5.1 Fetch model; 5.2 Build container image; 6 Push image; 7 Auto deploy.
  12. Automated model training pipeline 1. Scenario set: Scenario 1 (geo location l1, product type p1, time t1); Scenario 2 (geo location l2, product type p2, time t2); Scenario 3 (geo location l3, product type p3, time t3); Scenario i (geo location li, product type pi, time ti). Each scenario runs: Source data -> Prep data -> Feature engine… -> Train -> Optimize. Compute shown: three Databricks clusters, two VMs, one container.
  13. Automated model training pipeline 2. A DAG contains scenario sets; each scenario set contains scenario tasks (Scenario 1, Scenario 2, Scenario 3, Scenario i), and each task runs: Source data -> Prep data -> Feature engine… -> Train -> Optimize. Components shown: Databricks clusters; Azure Kubernetes Service; Container Registry; Kubernetes Pod; Airflow Webserver; Airflow Scheduler; Airflow MetaDB; Airflow logs; Airflow DAGs; Persistent Volume; Azure File share.
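
The scenario fan-out on slides 12 and 13 can be sketched in plain Python. This is a minimal illustration, not H&M's implementation: the `Scenario` class, the `run_scenario` helper, and the placeholder stage bodies are all assumptions, and a thread pool stands in for the separate Databricks clusters the slides show.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Dict, List

# Stage names taken from the slide; "Feature engine…" is elided there,
# so the identifier below is only a guess at the full label.
STAGES = ["source_data", "prep_data", "feature_engineering", "train", "optimize"]


@dataclass(frozen=True)
class Scenario:
    """One (geo location, product type, time) combination (hypothetical type)."""
    geo_location: str
    product_type: str
    time_period: str


def run_scenario(scenario: Scenario) -> Dict[str, object]:
    """Run every stage in order for a single scenario."""
    completed: List[str] = []
    for stage in STAGES:
        # Placeholder: a real stage would read data, build features, fit a model...
        completed.append(stage)
    return {"scenario": scenario, "completed": completed}


def run_scenario_set(scenarios: List[Scenario], max_workers: int = 3) -> List[Dict[str, object]]:
    """Scenarios are independent, so they can run in parallel.

    Threads are a stand-in for the per-scenario clusters in the slides.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(run_scenario, scenarios))  # map() preserves input order


if __name__ == "__main__":
    scenario_set = [Scenario("l1", "p1", "t1"), Scenario("l2", "p2", "t2")]
    for result in run_scenario_set(scenario_set):
        print(result["scenario"], result["completed"])
```

Because each scenario carries its own parameters and shares no state with the others, the same function can be scheduled as one task per scenario in an orchestrator.
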
  14. Trick for the Airflow dependency challenge. Little trick with python_callable: call the actual Python method without importing the module. For more detail, check this blog post: https://medium.com/@kevenwang_33862/machine-learning-in-production-2-large-scale-ml-training-889cde94f26d
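
One way to call a function without importing its module up front is to defer the import until the callable runs. The sketch below shows that idea with the standard library only; it is an assumption about the general technique, not necessarily what the linked blog post does, and `lazy_callable` is a hypothetical helper name.

```python
import importlib
from typing import Any, Callable


def lazy_callable(module_name: str, function_name: str) -> Callable[..., Any]:
    """Return a wrapper that imports `module_name` only when it is called.

    Creating the wrapper never touches the target module, so the file that
    defines it can be loaded on a machine where the module is not installed.
    """
    def _call(*args: Any, **kwargs: Any) -> Any:
        module = importlib.import_module(module_name)  # deferred import
        return getattr(module, function_name)(*args, **kwargs)

    _call.__name__ = function_name
    return _call


# Stand-in for a heavy training module: the standard library's `math`.
sqrt_task = lazy_callable("math", "sqrt")

if __name__ == "__main__":
    print(sqrt_task(9.0))
```

In Airflow, a wrapper like this could be passed as `python_callable` to a `PythonOperator`, so the DAG file parses on the scheduler even when the training code's dependencies exist only in the worker image.
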
  15. General Information: Evolve to scale and industrialize across H&M. Make AI available for product teams across H&M Group; facilitate scalability and specialization; continue to build world-class AI products, engines and core components. We have proven the value use case by use case. Now: to reach the next level we need to industrialize and scale AI across H&M.
  16. Reference Architecture gen2: MLOps and Operationalize AI
  17. General Information. MLOps: version compatibility; reproducibility; approval process; model format; experiment strategy; feedback loop; model traceability; model metadata; deployment strategy; scalability
  18. MLOps tech stack
  19. Model development - Interactive vs. Automated ▪ AI product lifecycle ▪ Notebook and Python modules ▪ Container as first-class citizen ▪ Airflow vs. Kubeflow
  20. Model serving – deployment strategy. Release strategies: Router -> Model 1.1; Router (canary) -> Model 1.1, Model 1.2; Router (shadow) -> Model 1.1, Model 1.2. Experiment strategy, A/B test: Router -> Model A1, Model A2, Model A3. Experiment strategy, multi-armed bandit: Router -> Model A1, Model A2, Model A3, with a Reward System.
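
The routing strategies named on slide 20 can be illustrated with small stand-alone routers. This is a sketch under assumptions: the class names are hypothetical, the models are toy functions, and epsilon-greedy is just one simple multi-armed bandit policy chosen for illustration (the slide does not say which policy is used).

```python
import random
from typing import Callable, Dict, List, Optional

Model = Callable[[float], float]


class CanaryRouter:
    """Send a fixed fraction of requests to the canary model."""

    def __init__(self, stable: Model, canary: Model, canary_fraction: float,
                 rng: Optional[random.Random] = None) -> None:
        self.stable = stable
        self.canary = canary
        self.canary_fraction = canary_fraction
        self.rng = rng or random.Random()

    def predict(self, x: float) -> float:
        use_canary = self.rng.random() < self.canary_fraction
        return (self.canary if use_canary else self.stable)(x)


class ShadowRouter:
    """Serve the live model; run the shadow model only to record its output."""

    def __init__(self, live: Model, shadow: Model) -> None:
        self.live = live
        self.shadow = shadow
        self.shadow_log: List[float] = []

    def predict(self, x: float) -> float:
        self.shadow_log.append(self.shadow(x))  # never returned to the caller
        return self.live(x)


class EpsilonGreedyRouter:
    """Epsilon-greedy bandit: explore with probability epsilon, else exploit."""

    def __init__(self, models: Dict[str, Model], epsilon: float = 0.1,
                 rng: Optional[random.Random] = None) -> None:
        self.models = models
        self.epsilon = epsilon
        self.rng = rng or random.Random()
        self.counts = {name: 0 for name in models}
        self.mean_reward = {name: 0.0 for name in models}

    def choose(self) -> str:
        if self.rng.random() < self.epsilon:
            return self.rng.choice(list(self.models))
        return max(self.mean_reward, key=self.mean_reward.get)

    def reward(self, name: str, value: float) -> None:
        """Update the running mean reward reported by the reward system."""
        self.counts[name] += 1
        self.mean_reward[name] += (value - self.mean_reward[name]) / self.counts[name]


def model_1_1(x: float) -> float:
    return x + 1.0


def model_1_2(x: float) -> float:
    return x + 2.0
```

A canary changes what a share of users actually receive, while a shadow only observes; the bandit differs from a fixed A/B split in that traffic shifts toward the better-rewarded model as feedback arrives.
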
  21. Model serving – Inference Graph: Input Transformer; Router 1 (multi-armed bandit); Router 2 (A/B test); Model A1, Model A2, Model A3; Model B1, Model B2; Output Transformer
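
An inference graph like the one on slide 21 can be modeled as composable callables. The wiring below is an assumption, since the flattened slide text does not show how the routers and models connect: here Router 1 chooses among Model A1 to A3 and a branch leading to Router 2, which splits between Model B1 and B2. The helpers `chain`, `router`, `fixed_policy`, and `make_model` are hypothetical, and the fixed policy is a deterministic placeholder for a bandit.

```python
from typing import Any, Callable, Dict

Node = Callable[[Any], Any]


def chain(*nodes: Node) -> Node:
    """Run nodes in sequence, feeding each output into the next node."""
    def _run(x: Any) -> Any:
        for node in nodes:
            x = node(x)
        return x
    return _run


def router(choose: Callable[[Any], str], children: Dict[str, Node]) -> Node:
    """Forward the input to the single child selected by `choose`."""
    def _route(x: Any) -> Any:
        return children[choose(x)](x)
    return _route


def fixed_policy(arm: str) -> Callable[[Any], str]:
    """Placeholder for a bandit policy: always selects the same arm."""
    return lambda _features: arm


def make_model(name: str, weight: float) -> Node:
    """Toy model that scales the input value by a constant weight."""
    def _predict(features: Dict[str, Any]) -> Dict[str, Any]:
        return {"model": name, "score": features["value"] * weight}
    return _predict


def input_transformer(request: Dict[str, Any]) -> Dict[str, Any]:
    return {"user_id": request["user_id"], "value": float(request["value"]) / 100.0}


def output_transformer(result: Dict[str, Any]) -> Dict[str, Any]:
    return {"prediction": round(result["score"], 4), "served_by": result["model"]}


# Router 2: deterministic A/B split on user id.
router_2 = router(
    lambda f: "B1" if f["user_id"] % 2 == 0 else "B2",
    {"B1": make_model("B1", 1.0), "B2": make_model("B2", 2.0)},
)

# Router 1: the fixed policy stands in for a multi-armed bandit.
router_1 = router(
    fixed_policy("router_2"),
    {
        "A1": make_model("A1", 0.5),
        "A2": make_model("A2", 0.6),
        "A3": make_model("A3", 0.7),
        "router_2": router_2,
    },
)

graph = chain(input_transformer, router_1, output_transformer)

if __name__ == "__main__":
    print(graph({"user_id": 2, "value": 50}))
```

Keeping every node a plain callable means a router, a model, and a transformer are interchangeable building blocks, which is what allows routers to nest.
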
  22. Model management and lifecycle. Stages: Model Development -> Back Test -> Model Approval -> Staging -> Production. Pipelines: PR pipeline; back test pipeline; training CI pipeline; CD staging pipeline; CD prod pipeline (together the CI/CD pipeline). Branches: feature, develop, pull request. Environments: #dev, #stage, #prod, each with infra as code.
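
The lifecycle on slide 22 can be read as a gated, one-way promotion through stages. The sketch below assumes the stage order Model Development, Back Test, Model Approval, Staging, Production, which is inferred from the slide rather than stated in it; `ModelVersion` and `promote` are hypothetical names, and the boolean gate stands in for whatever check each pipeline performs.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class Stage(Enum):
    MODEL_DEVELOPMENT = 1
    BACK_TEST = 2
    MODEL_APPROVAL = 3
    STAGING = 4
    PRODUCTION = 5


@dataclass
class ModelVersion:
    name: str
    version: str
    stage: Stage = Stage.MODEL_DEVELOPMENT
    history: List[Stage] = field(default_factory=list)  # stages already passed


def promote(model: ModelVersion, gate_passed: bool) -> ModelVersion:
    """Move the model one stage forward if its gate (tests, back test, approval) passed."""
    if model.stage is Stage.PRODUCTION:
        raise ValueError("model is already in production")
    if gate_passed:
        model.history.append(model.stage)
        model.stage = Stage(model.stage.value + 1)
    return model


if __name__ == "__main__":
    candidate = ModelVersion(name="example-model", version="1.2")
    promote(candidate, gate_passed=True)   # PR pipeline passed
    promote(candidate, gate_passed=False)  # back test failed: stays in place
    print(candidate.stage, candidate.history)
```

Recording the stage history on the model version is one simple way to support the traceability concern the deck lists among its MLOps requirements.
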
  23. Takeaways ▪ Problem, process and architecture ▪ Platform approach ▪ Leverage cloud-native services
  24. Feedback: Your feedback is important to us. Don’t forget to rate and review the sessions.
