1
General Information
APPLY MLOPS AT SCALE
Keven Wang @ H&M
2
H&M AI use cases
FEB 2021
Analytics and Data Platform
Logistics
Production
Sales
Marketing
Design / Buying
Assortment quantification
Fashion Forecast
Allocation
Markdown Online
Markdown Store
Personalized Promotions, Recommendations & Journeys
Movebox
Knowledge & Best Practice
AI exploration and Research
Rapid Dev enablement
AI platform
3
Version compatibility
Reproducibility
Model approval process
Model format
Experiment strategy
Fast feedback loop
Model traceability
Model evaluation
Automated model training
MLOps
Scalability
4
Machine Learning Process
Model training
• Data acquisition
• Data preparation
• Feature Engineering
• Model training
• Model repository
Model Deployment
• Unseen data acquisition
• Data preparation
• Transform data into feature
• Model prediction
• Results
Data storage
Training orchestration
Deployment orchestration
Model and data versioning
Automated, e2e feedback loop
e2e monitoring
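One way to picture "Model and data versioning" is a lineage record that ties a model to the code, data and parameters that produced it. The sketch below is an illustration only, not the tooling used at H&M: `ModelRecord`, `make_record` and `fingerprint` are hypothetical names, and the code version is assumed to be something like a git commit id.

```python
import hashlib
import json
from dataclasses import dataclass, asdict


def fingerprint(payload: bytes) -> str:
    """Return a short, stable content hash for a blob of bytes."""
    return hashlib.sha256(payload).hexdigest()[:12]


@dataclass(frozen=True)
class ModelRecord:
    """Hypothetical lineage record stored next to a trained model."""
    code_version: str    # assumed to be a commit id of the training code
    data_version: str    # content hash of the training data
    params_version: str  # content hash of the hyperparameters

    @property
    def model_version(self) -> str:
        # Same code + data + params always yields the same version id.
        blob = json.dumps(asdict(self), sort_keys=True).encode()
        return fingerprint(blob)


def make_record(code_version: str, data: bytes, params: dict) -> ModelRecord:
    """Build a lineage record from raw training inputs."""
    # Sorting keys makes the hash independent of dict insertion order.
    params_blob = json.dumps(params, sort_keys=True).encode()
    return ModelRecord(code_version, fingerprint(data), fingerprint(params_blob))


if __name__ == "__main__":
    rec = make_record("abc123", b"store,week,sales\n1,5,40\n", {"lr": 0.1})
    print(rec.model_version, asdict(rec))
```

Because the version id is derived from content, retraining on unchanged inputs maps to the same id, while any change in data or parameters produces a new one.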
5
MLOps tech stack
6
Interactive model training
PyCharm
CI Orchestrator
Azure Databricks
Model repository
Container Registry
Kubernetes
Triggering
1 Code commit
2 Code static check, unit test, packaging
3.1 Push to DBFS
3.2 Trigger pipeline
4.1 Job execution
4.2 Log model info
4.3 Commit model
5.1 Fetch model
5.2 Build container image
6 Push image
7 Auto deploy
Demo 1
Interactive Model Development
8
Automated model training 1
Scenario 1
• Geo location l1
• Product type p1
• Time t1
Scenario 2
• Geo location l2
• Product type p2
• Time t2
Scenario 3
• Geo location l3
• Product type p3
• Time t3
Scenario i
• Geo location li
• Product type pi
• Time ti
Scenario set
Pipeline run for each scenario: Source data, Prep data, Feature engine…, Train, Optimize
Compute targets shown: Databricks Cluster (three), VM (two), Container
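The scenario fan-out above can be sketched in plain Python. This is a minimal illustration under stated assumptions: `Scenario`, `run_pipeline` and `train_scenario_set` are hypothetical names, the step names are placeholders, and a local thread pool stands in for the Databricks clusters, VMs and containers on the slide.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass(frozen=True)
class Scenario:
    """One training scenario: a (geo location, product type, time) combination."""
    geo_location: str
    product_type: str
    time: str


# Placeholder step names mirroring the per-scenario pipeline on the slide.
PIPELINE_STEPS = ("source_data", "prep_data", "features", "train", "optimize")


def run_pipeline(scenario: Scenario) -> dict:
    """Run the placeholder steps in order for one scenario."""
    completed = []
    for step in PIPELINE_STEPS:
        # Real work (data jobs, training code) would be called here.
        completed.append(step)
    return {"scenario": scenario, "steps": completed}


def train_scenario_set(scenarios, max_workers=4):
    """Fan the scenario set out over a worker pool; results keep input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(run_pipeline, scenarios))


if __name__ == "__main__":
    scenario_set = [
        Scenario("l1", "p1", "t1"),
        Scenario("l2", "p2", "t2"),
        Scenario("l3", "p3", "t3"),
    ]
    for result in train_scenario_set(scenario_set):
        print(result["scenario"], "->", result["steps"])
```

Each scenario runs the same pipeline independently, which is what makes it straightforward to spread the set across separate compute targets.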
9
Automated model training 2
Scenario set (shown twice), split into scenario tasks; each task runs: Source data, Prep data, Feature engine…, Train, Optimize
DAG
Scenario set: Scenario 1, Scenario 2, Scenario 3, Scenario i; each runs: Source data, Prep data, Feature engine…, Train, Optimize
Databricks Cluster (three shown)
Azure Kubernetes Service
Container Registry
Airflow Logs
Airflow dags
Persistent Volume
Airflow Webserver
Airflow Scheduler
Kubernetes Pod
Azure File share
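The DAG idea on this slide can be illustrated without Airflow. The sketch below uses Python's standard `graphlib` module as a stand-in; it is not Airflow's API. `build_dag`, `execution_waves` and the step names are hypothetical, and it assumes each scenario's steps run strictly in sequence after a shared "scenario set" task.

```python
from graphlib import TopologicalSorter

# Placeholder step names for the per-scenario pipeline.
STEPS = ["source_data", "prep_data", "features", "train", "optimize"]


def build_dag(scenario_ids):
    """Map each task to the set of tasks it depends on."""
    dag = {"scenario_set": set()}
    for sid in scenario_ids:
        upstream = "scenario_set"
        for step in STEPS:
            task = f"{sid}.{step}"
            dag[task] = {upstream}  # each step waits for the previous one
            upstream = task
    return dag


def execution_waves(dag):
    """Group tasks into waves; every task in a wave has all dependencies done."""
    sorter = TopologicalSorter(dag)
    sorter.prepare()
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)
    return waves


if __name__ == "__main__":
    for i, wave in enumerate(execution_waves(build_dag(["scenario_1", "scenario_2"]))):
        print(i, wave)
```

Tasks in the same wave have no dependency on each other, so a scheduler is free to run them in parallel on separate workers.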
10
Model management and lifecycle
Staging
Production
Model Approval
Back Test
Model Development
PR pipeline
Back test pipeline
Training CI pipeline
CD – Staging pipeline
CD – prod pipeline
CI/CD pipeline
develop
feature
Pull Req
Infra as code (#dev, #stage, #prod)
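A gated promotion flow like the one on this slide can be sketched as a small state machine. The stage order (development, back test, staging, production) and the rule that approval is needed before leaving back test are assumptions made for illustration; the slide does not spell them out. `ModelLifecycle`, `Stage` and `PromotionError` are hypothetical names.

```python
from enum import Enum


class Stage(Enum):
    DEVELOPMENT = 1
    BACK_TEST = 2
    STAGING = 3
    PRODUCTION = 4


class PromotionError(Exception):
    """Raised when a model cannot move to the next stage."""


class ModelLifecycle:
    """Tracks one model through the assumed stage order."""

    def __init__(self, name: str):
        self.name = name
        self.stage = Stage.DEVELOPMENT
        self.approved = False
        self.history = [Stage.DEVELOPMENT]

    def approve(self) -> None:
        # Assumption: approval is granted on the back test results.
        if self.stage is not Stage.BACK_TEST:
            raise PromotionError("approval is only possible after back test")
        self.approved = True

    def promote(self) -> Stage:
        if self.stage is Stage.PRODUCTION:
            raise PromotionError("already in production")
        if self.stage is Stage.BACK_TEST and not self.approved:
            raise PromotionError("model approval required before staging")
        self.stage = Stage(self.stage.value + 1)
        self.history.append(self.stage)
        return self.stage


if __name__ == "__main__":
    model = ModelLifecycle("demo-model")
    model.promote()   # development -> back test
    model.approve()   # approval gate
    model.promote()   # back test -> staging
    model.promote()   # staging -> production
    print([s.name for s in model.history])
```

In a real setup each `promote` call would correspond to a pipeline run (back test, CD to staging, CD to prod) rather than an in-memory state change.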
Demo 2
Model Serving
12
Online model serving - Seldon
13
Take away
Leverage cloud native service
Problem, Process and Architecture
DS & SW engineering
THANK YOU
