DevOps for Machine Learning overview en-us

Jordan Edwards
Senior Program Manager, ML Platform

Continuous Integration (CI)
• Blend together the work of individual engineers in a repository.
• Each time you commit code, it’s automatically built and tested, so bugs are detected faster.
Continuous Deployment (CD)
• Automate the entire process from code commit to production (if your CI/CD tests are successful).
Continuous Learning & Monitoring
• Safely deliver features to your customers as soon as they’re ready.
• Monitor your features in production and know when they aren’t behaving as expected.
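The CD rule above (reach production only if every CI/CD check passes) can be sketched as a small gated runner. This is a minimal illustration, not Azure DevOps or any real CI product; the `run_pipeline` helper and the stage names are hypothetical.

```python
from typing import Callable, List, Tuple

# A stage is a (name, check) pair; the check returns True when the stage succeeds.
# Hypothetical structure for illustration only.
Stage = Tuple[str, Callable[[], bool]]


def run_pipeline(stages: List[Stage]) -> Tuple[bool, List[str]]:
    """Run stages in order and stop at the first failure.

    Returns (all_passed, completed_stage_names). Because execution stops at
    the first failing check, a deploy stage placed last only runs when every
    earlier build/test stage has passed.
    """
    completed: List[str] = []
    for name, check in stages:
        if not check():
            return False, completed
        completed.append(name)
    return True, completed
```

For example, `run_pipeline([("build", lambda: True), ("unit_tests", lambda: False), ("deploy_to_prod", lambda: True)])` returns `(False, ["build"])`: the failing unit tests stop the run before the deploy stage is reached.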

ML DevOps lifecycle
(Figure: a cycle of three phases.)
• Experiment: Business Understanding, Data Acquisition, Initial Modeling
• Develop: Modeling + Testing, Continuous Integration, Continuous Deployment
• Operate: Continuous Delivery, Data Feedback Loop, System + Model Monitoring

Goal: overcome the pattern where data science teams own only experiments, and make them responsible for the end-to-end flow from experiment to production to operational support of AI solutions.
Benefits:
• Continuous delivery of value (data insights, models) to end users.
• End-to-end ownership of the analytics lifecycle by data science (DS) teams.
• A consistent approach to building and deploying AI.
• Data science extended with software development engineering (SDE) practices to increase delivery quality and cadence.
• A framework for continuous learning, lineage, auditability and regulatory compliance.
• Better team collaboration through standardized delivery practices.

Produce Repeatable Experiments
• Use well-defined pipelines to capture the E2E model training process.
• Capture run metrics, intermediate outputs, output logs and models.
• Use leaderboards, side-by-side run comparison and model selection.
(Figure: leaderboard comparing metrics across runs.)
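Capturing run metrics and selecting a model from a leaderboard can be sketched with a tiny in-memory tracker. In practice Azure ML Tracking fills this role; the `Experiment` and `Run` classes below are hypothetical stand-ins, not its API.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Run:
    run_id: str
    params: Dict[str, object]
    metrics: Dict[str, float] = field(default_factory=dict)


class Experiment:
    """Hypothetical tracker: records runs and ranks them by a metric."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.runs: List[Run] = []

    def start_run(self, params: Dict[str, object]) -> Run:
        # Run ids are derived from the experiment name and a counter.
        run = Run(run_id=f"{self.name}-{len(self.runs) + 1}", params=dict(params))
        self.runs.append(run)
        return run

    def leaderboard(self, metric: str, higher_is_better: bool = True) -> List[Run]:
        # Only runs that logged the metric can be compared side by side.
        scored = [r for r in self.runs if metric in r.metrics]
        return sorted(scored, key=lambda r: r.metrics[metric], reverse=higher_is_better)

    def best(self, metric: str, higher_is_better: bool = True) -> Optional[Run]:
        board = self.leaderboard(metric, higher_is_better)
        return board[0] if board else None
```

Usage with illustrative values: after `exp = Experiment("churn")` and three runs with learning rates 0.1, 0.01 and 0.001 logging accuracies 0.71, 0.78 and 0.74, `exp.best("accuracy").params` is `{"learning_rate": 0.01}`.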

• Track model versions & metadata with a centralized model registry
• Leverage containers to capture runtime dependencies for inference
• Leverage an orchestrator like Kubernetes to provide scalable inference
• Capture model telemetry: health, performance, inputs / outputs
• Encapsulate each step in the lifecycle to enable CI/CD and DevOps
• Automatically optimize models to take advantage of hardware acceleration
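A centralized model registry that tracks versions and metadata can be sketched as follows. This is a hypothetical in-memory illustration of the idea, not the Azure ML model registry API; names, paths and tags are placeholders.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass(frozen=True)
class ModelVersion:
    name: str
    version: int
    path: str
    tags: Dict[str, str] = field(default_factory=dict)  # metadata, e.g. data or code ids


class ModelRegistry:
    """Hypothetical registry: each model name keeps an ordered list of versions."""

    def __init__(self) -> None:
        self._models: Dict[str, List[ModelVersion]] = {}

    def register(self, name: str, path: str, tags: Optional[Dict[str, str]] = None) -> ModelVersion:
        versions = self._models.setdefault(name, [])
        # Versions are numbered 1, 2, 3, ... in registration order.
        mv = ModelVersion(name, len(versions) + 1, path, dict(tags or {}))
        versions.append(mv)
        return mv

    def get(self, name: str, version: Optional[int] = None) -> ModelVersion:
        versions = self._models[name]  # KeyError if the name was never registered
        if version is None:
            return versions[-1]  # latest version
        if not 1 <= version <= len(versions):
            raise KeyError(f"{name} has no version {version}")
        return versions[version - 1]
```

Keeping metadata on each immutable version is what later lets a release pipeline ask which data and code produced the model it is about to deploy.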

Data science workflow
(Figure: Prepare → Experiment → Deploy. Steps: Prepare Data, Feature engineering, Model training & testing, Register & Manage Model, Package & Validate Model, Deploy Service, Monitor Model.)

(Figure: the data scientist publishes a model from their IDE; the app developer consumes the model in their IDE and updates the application; a DevOps pipeline validates and deploys the app, which serves predictions such as [ { "cat": 0.99218, "feline": 0.81242 } ].)

(Figure, next build: published models go to a Model Store, from which the app developer consumes them; the DevOps pipeline validates and deploys the updated application, which serves predictions.)

(Figure, next build: the DevOps pipeline validates the model on its own, then validates model + app together before deploying the application.)

(Figure, next build: the deployed application collects feedback that is used to retrain the model, and new models are A/B tested.)

(Figure, next build: models are customized and deployed to cloud services, apps and edge devices; the pipeline validates and flights model + app; model telemetry and collected feedback drive retraining.)

MODEL CI/CD (Machine Learning as a Service + DevOps)
Tools: Azure DevOps, Azure Machine Learning, Azure Data Factory
TRAIN MODEL
• Code change in source control: trigger CI, which unit-tests the code. A new training job is started whenever source code is pushed.
• New data in the data lake: trigger CI through a data cooking pipeline (data movement).
• Training pipeline (data prep, model training): an ML pipeline handles data prep, training and evaluation, and certifies that the model is of high quality.
• The trained model is registered in the Model Store.
DEPLOY MODEL
• A newly registered model, or new inference code, triggers a release.
• The DevOps pipeline packages the model, validates it, deploys to DevTest, gets human approval, then deploys to PROD.
Data: Data Lake, Data Warehouse, and Data Preparation Services (Labeling, Feedback, Drift) for inference data.
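The release path above (validate the model, deploy to DevTest, wait for human approval, then deploy to PROD) can be sketched as a simple gate. The accuracy floor, the no-regression rule and the function names are assumptions chosen for illustration, not Azure DevOps or Azure ML behavior.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Candidate:
    name: str
    version: int
    accuracy: float


def validate_model(candidate: Candidate, production_accuracy: Optional[float],
                   min_accuracy: float = 0.7, tolerance: float = 0.0) -> bool:
    """Hypothetical quality gate: an absolute floor plus no regression vs. production."""
    if candidate.accuracy < min_accuracy:
        return False
    if production_accuracy is None:  # first model: only the floor applies
        return True
    return candidate.accuracy >= production_accuracy - tolerance


def release(candidate: Candidate, production_accuracy: Optional[float],
            approved_by: Optional[str] = None) -> str:
    """Return the furthest environment the candidate may reach."""
    if not validate_model(candidate, production_accuracy):
        return "rejected"
    if approved_by is None:
        return "devtest"  # automatic once validation passes; PROD needs a human
    return "prod"
```

With a production model at 0.80 accuracy, a candidate at 0.82 reaches `"devtest"` on its own and `"prod"` only once `approved_by` is supplied.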

Continuous Integration and Delivery
(Figure: Build Model (app) (testing + validation) → Deploy Resources → Deploy Model (app) → Logging & Monitoring.)
• Build: Azure ML Experiments, Azure ML Pipelines
• Deploy: real-time on Azure Kubernetes Service, batch on Azure ML Pipelines, packaged with Docker + Conda env.
• Monitor: Application Performance Monitoring, Model / Data Monitoring, Data Collection
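Model / data monitoring often compares the distribution of a feature at inference time with its distribution in the training data. One common metric is the Population Stability Index (PSI); the sketch below is a minimal stdlib version with caller-chosen bin edges, and the rule-of-thumb thresholds in the note are a heuristic, not an Azure ML setting.

```python
import math
from bisect import bisect_right
from typing import List, Sequence


def _bin_fractions(values: Sequence[float], edges: Sequence[float]) -> List[float]:
    # `edges` are sorted interior cut points, giving len(edges) + 1 bins.
    counts = [0] * (len(edges) + 1)
    for v in values:
        counts[bisect_right(edges, v)] += 1
    n = len(values)
    return [c / n for c in counts]


def population_stability_index(expected: Sequence[float], actual: Sequence[float],
                               edges: Sequence[float], eps: float = 1e-6) -> float:
    """PSI = sum over bins of (actual - expected) * ln(actual / expected)."""
    e = _bin_fractions(expected, edges)
    a = _bin_fractions(actual, edges)
    psi = 0.0
    for pe, pa in zip(e, a):
        pe, pa = max(pe, eps), max(pa, eps)  # avoid log(0) for empty bins
        psi += (pa - pe) * math.log(pa / pe)
    return psi
```

A frequently quoted heuristic treats PSI below about 0.1 as stable and above about 0.25 as significant drift; the right alert threshold depends on the feature and the model.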

• Training data
• Featurization code (w/ tests)
• Training pipeline
• Training environment
• Evidence chain
• Model config
• Training job info
• Sample data
• Data profile
Use repeatable pipelines for your ML workflow, because it can get complicated.

Source Control
• Track changes to code (and configuration) over time to integrate work and support reproducibility and collaboration.
Dataset Versioning
• Training data plays an important role in the quality of the software build, so data must be versioned for reproducibility.
Model Versioning
• Version trained models in relation to code and training data for traceability.
Experiment Tracking
• Version model experiment runs to understand which code, data and settings (e.g. selected features) led to which output and performance, and to allow for reproducibility.
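Dataset versioning and traceability can be approximated by fingerprinting the training data and storing that fingerprint alongside the code version and model version. The sketch below is a hypothetical illustration; `code_commit` is assumed to be supplied by the caller (for example a Git commit SHA), and none of these names come from an Azure API.

```python
import hashlib
import json
from dataclasses import dataclass
from typing import Iterable


def dataset_fingerprint(rows: Iterable[dict]) -> str:
    """Content hash of a dataset: the same rows in the same order give the same id."""
    h = hashlib.sha256()
    for row in rows:
        # sort_keys makes the hash independent of dict key order
        h.update(json.dumps(row, sort_keys=True).encode("utf-8"))
        h.update(b"\n")  # row separator
    return h.hexdigest()


@dataclass(frozen=True)
class LineageRecord:
    """Hypothetical record linking a model version to the data and code behind it."""
    model_name: str
    model_version: int
    dataset_hash: str
    code_commit: str     # assumed: a source-control revision id provided by the caller
    experiment_run: str  # assumed: the id of the tracked run that produced the model
```

Because the hash changes whenever any row changes, two models trained on "the same" dataset name but different contents can still be told apart.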

Edge cases
• The model response on a given record is not the expected one.
• Investigate the training set and detect potential bias.
• Ensure that the preprocessing is not clipping values, etc.
• Document these corner cases and add them to the validation process.
Null values / unknown categories
• This type of bug concerns the model’s resilience to missing values and how well it handles unseen categorical values.
Input issues
• An input stream may stop producing data, causing unexpected responses from the model.
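Resilience to null values and unknown categories is usually built into the preprocessing step and then pinned down with tests for those corner cases. The `Preprocessor` below is a hypothetical sketch: it reserves index 0 for unseen categories and imputes missing numbers with the training median, which is one common choice among several.

```python
import math
import statistics
from typing import Dict, Optional, Sequence, Tuple


class Preprocessor:
    """Hypothetical preprocessing step hardened against nulls and unseen categories."""

    UNKNOWN = 0  # reserved index for categories not seen during training

    def fit(self, categories: Sequence[Optional[str]],
            numbers: Sequence[Optional[float]]) -> "Preprocessor":
        seen = sorted({c for c in categories if c is not None})
        self.index: Dict[str, int] = {c: i + 1 for i, c in enumerate(seen)}
        present = [x for x in numbers if x is not None and not math.isnan(x)]
        self.fill_value = statistics.median(present)  # imputation value learned from training data
        return self

    def transform(self, category: Optional[str],
                  number: Optional[float]) -> Tuple[int, float]:
        # None and unseen categories both fall back to the reserved UNKNOWN index.
        cat_id = self.index.get(category, self.UNKNOWN)
        if number is None or math.isnan(number):
            number = self.fill_value
        return cat_id, number
```

Each documented corner case (for example, an unseen category combined with a missing value) can then become a regression test in the validation process.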

Test Type Data Scientist App Dev / Ops
Unit Tests X
Data Integrity Tests X
Model Performance X
Model Validation X
Integration Tests X X
Load Tests X
Data Monitoring X
Skew Monitoring X
Model Monitoring X X
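A data integrity test can be as simple as checking each incoming batch against an expected schema of column types and value ranges. The schema, column names and bounds below are hypothetical placeholders; the function returns readable violations so a CI step can fail and report them.

```python
from typing import Dict, List, Tuple

# Hypothetical expected schema: column -> (python type, min, max).
SCHEMA: Dict[str, Tuple[type, float, float]] = {
    "age": (int, 0, 120),
    "income": (float, 0.0, 1e7),
}


def check_integrity(rows: List[dict], schema: Dict[str, Tuple[type, float, float]] = SCHEMA) -> List[str]:
    """Return human-readable violations; an empty list means the batch passes."""
    problems: List[str] = []
    for i, row in enumerate(rows):
        missing = schema.keys() - row.keys()
        if missing:
            problems.append(f"row {i}: missing columns {sorted(missing)}")
        for col, (typ, lo, hi) in schema.items():
            if col not in row:
                continue
            val = row[col]
            # bool is a subclass of int, so reject it explicitly.
            if isinstance(val, bool) or not isinstance(val, typ):
                problems.append(f"row {i}: {col} has type {type(val).__name__}")
            elif not lo <= val <= hi:
                problems.append(f"row {i}: {col}={val} outside [{lo}, {hi}]")
    return problems
```

In a pytest-based suite this becomes a one-line test such as `assert check_integrity(batch) == []`.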

• Data (changes to shape / profile)
• Model in isolation (offline A/B)
• Model + app (functional testing)
• Only deploy after initial validation passes
• Ramp up traffic to the new model using A/B experimentation
• Functional behavior
• Performance characteristics
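Ramping traffic to a new model is commonly done by hashing a stable key (such as a user id) into a bucket, so each user consistently sees the same model while the new model's share grows. This is a minimal sketch of that general technique; the `route` function and the ramp percentages are illustrative assumptions, not a specific Azure feature.

```python
import hashlib


def route(user_id: str, new_model_share: float) -> str:
    """Send a stable fraction of users to the new model."""
    if not 0.0 <= new_model_share <= 1.0:
        raise ValueError("new_model_share must be in [0, 1]")
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    # Keep 53 bits so the division is exact and the bucket is strictly below 1.0.
    bucket = (int.from_bytes(digest[:8], "big") >> 11) / 2**53
    return "new" if bucket < new_model_share else "current"
```

Because a user's bucket never changes, anyone routed to the new model at 10% traffic stays on it at 50% and 100%, so a ramp schedule such as 5% → 25% → 50% → 100% only ever adds users to the new model.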

• Which data?
• Which experiment / previous model(s)?
• Where’s the code / notebook?
• Was it converted / quantized?
• Private / compliant data?

• Focus on ML, not DevOps
• Get telemetry for service health and model behavior
• Code generation
• API specifications / interfaces
• Cloud Services
• Mobile / Embedded Applications
• Edge Devices
• Quantize / optimize models for target platform
• Compliant + Safe
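Quantizing a model for a target platform means storing weights in a lower-precision format. The sketch below shows symmetric per-tensor int8 quantization of a weight list, assuming plain Python floats; real toolchains add calibration, per-channel scales and hardware-specific kernels, so treat this only as an illustration of the idea.

```python
from typing import List, Tuple


def quantize_int8(weights: List[float]) -> Tuple[List[int], float]:
    """Symmetric per-tensor quantization: w ≈ scale * q with q in [-127, 127]."""
    max_abs = max((abs(w) for w in weights), default=0.0)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    # Round to the nearest integer level and clamp to the int8 range.
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q: List[int], scale: float) -> List[float]:
    """Map integer levels back to approximate float weights."""
    return [scale * v for v in q]
```

The round-trip error per weight is at most half a quantization step (`scale / 2`), which is why validating the converted model before deployment still matters.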

© Microsoft Corporation
DevOps brings together people, processes, and technology, automating software delivery to provide continuous value to your users. Using Azure DevOps, you can deliver software faster and more reliably, no matter how big your IT department is or what tools you’re using.
DevOps for ML: Supporting Technologies
Infrastructure as Code
• Azure Resource Manager Templates
• Azure ML Python SDK & CLI
• Azure SDKs
CI/CD
• Azure DevOps Pipelines
• Azure ML Training Services
• Azure Repos / GitHub
• Azure Boards
Testing / Release / Monitoring
• Azure DevOps for automated testing
• R: RUnit and testthat
• Python: PyUnit, pytest, nose, …
• Azure ML Tracking
• Azure Data Prep SDK (analyse/profile)
• Azure ML Model Management (Instrumentation, Telemetry)
• Azure Monitor for app telemetry


Azure Machine Learning service
A set of Azure cloud services, plus a Python SDK, that enables you to:
• Prepare Data
• Build Models
• Train Models
• Manage Models
• Track Experiments
• Deploy Models

Azure Machine Learning – Key Concepts
(Figure: roles IT/Ops, ML Scientist, Dev/Ops.)

Azure ML service artifact: Workspace
The workspace is the top-level resource for the Azure Machine Learning service. It provides a centralized place to work with all the artifacts you create when using Azure Machine Learning service.
The workspace keeps a list of compute targets that can be used to train your model. It also keeps a history of the training runs, including logs, metrics, output, and a snapshot of your scripts.
Models are registered with the workspace.
You can create multiple workspaces, and each workspace can be shared by multiple people.
When you create a new workspace, it automatically creates these Azure resources:
• Azure Container Registry: stores the Docker container images used during training and when deploying a model.
• Azure Storage: used as the default datastore for the workspace.
• Azure Application Insights: stores monitoring information about your models.
• Azure Key Vault: stores secrets used by compute targets and other sensitive information needed by the workspace.

Azure ML service key artifacts
(Figure: key artifacts organized around the Workspace.)

ML Pipelines
• Increase experiment velocity, reliability and repeatability
• Use the technology of your choice for each step
• Create and manage ML workflows concurrently
• Define steps to prepare data, train, deploy and evaluate
• Use diverse languages and run on diverse compute
• Easily compose and swap out steps as your workflow evolves
Features
• Sequencing and parallelization of steps, with declarative data dependencies
• Unattended execution for long-running pipelines, with mixed, heterogeneous compute for different steps
• Data management and reusable components: share pipelines, code, intermediate data and models
• A REST API with parameters enables retraining and batch scoring
• Fine-grained control over compute provisioning and deprovisioning
(Figure: an ML pipeline of eight numbered steps spread across Compute #1 to #4.)
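Sequencing and parallelization driven by declarative data dependencies can be illustrated with Python's standard `graphlib` module (Python 3.9+). The step names below are hypothetical and this is not the Azure ML Pipelines SDK; it only shows how declared dependencies determine which steps may run together.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set

# Hypothetical pipeline: each step lists the steps whose outputs it consumes.
PIPELINE: Dict[str, Set[str]] = {
    "ingest": set(),
    "prep_features": {"ingest"},
    "prep_labels": {"ingest"},
    "train": {"prep_features", "prep_labels"},
    "evaluate": {"train"},
    "register": {"evaluate"},
}


def execution_waves(graph: Dict[str, Set[str]]) -> List[List[str]]:
    """Group steps into waves; steps in the same wave could run in parallel."""
    ts = TopologicalSorter(graph)
    ts.prepare()  # raises graphlib.CycleError if dependencies are circular
    waves: List[List[str]] = []
    while ts.is_active():
        ready = sorted(ts.get_ready())
        waves.append(ready)
        ts.done(*ready)  # mark the whole wave finished, unlocking its dependents
    return waves
```

Here `execution_waves(PIPELINE)` yields `[["ingest"], ["prep_features", "prep_labels"], ["train"], ["evaluate"], ["register"]]`: the two prep steps share a wave because neither depends on the other, which is where heterogeneous compute targets can be used side by side.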

Azure ML – Models and Model Registry

Azure DevOps Pipelines
Cloud-hosted pipelines for Linux, Windows and macOS.
Any language, any platform, any cloud
• Build, test, and deploy Node.js, Python, Java, PHP, Ruby, C/C++, .NET, Android, and iOS apps. Run in parallel on Linux, macOS, and Windows. Deploy to Azure, AWS, GCP or on-premises.
Extensible
• Explore and implement a wide range of community-built build, test, and deployment tasks, along with hundreds of extensions from Slack to SonarCloud. Support for YAML, reporting and more.
Containers and Kubernetes
• Easily build and push images to container registries like Docker Hub and Azure Container Registry. Deploy containers to individual hosts or Kubernetes.
https://azure.com/pipelines

