Jordan Edwards
Senior Program Manager, ML Platform
Continuous Integration (CI)
• Blend together the work of individual
engineers in a repository.
• Each time you commit code, it’s
automatically built and tested, and
bugs are detected faster.
Continuous Deployment (CD)
• Automate the entire process from
code commit to production (if your
CI/CD tests are successful).
Continuous Learning & Monitoring
• Safely deliver features to your
customers as soon as they’re ready.
• Monitor your features in production
and know when they aren’t behaving
as expected.
ML DevOps lifecycle
Experiment
Data Acquisition
Business Understanding
Initial Modeling
Develop
Modeling
Operate
Continuous Delivery
Data Feedback Loop
System + Model Monitoring
Experiment
+ Testing
Continuous Integration
Continuous Deployment
Goal: move past the pattern where data science teams own only experiments, and make them responsible for the
end-to-end flow from experiment to production to operational support of AI.
Benefits:
• Continuous delivery of value (data insights, models) to end users.
• End-to-end ownership of the Analytics Lifecycle by DS teams
• Enforcing a consistent approach to building and deploying AI
• Extending data science with SDE practices to increase delivery quality and cadence.
• Framework for continuous learning, lineage, auditability and regulatory compliance.
• Improving team collaboration through standardization in delivery practices.
Produce Repeatable Experiments
• Use leaderboards, side by side run comparison and model selection
• Capture run metrics, intermediate outputs, output logs and models
• Use well-defined pipelines to capture the E2E model training process
[Figure: values 80%, 75%, 90%, 85%]
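The practices on this slide (capturing run metrics, comparing runs side by side, and picking a model from a leaderboard) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the Azure ML tracking API: `RunTracker`, `log_run`, and the metric name `accuracy` are hypothetical names chosen for the example, and the scores are made up.

```python
from dataclasses import dataclass, field


@dataclass
class Run:
    """One experiment run: its parameters and the metrics it produced."""
    run_id: str
    params: dict
    metrics: dict = field(default_factory=dict)


class RunTracker:
    """Hypothetical in-memory tracker; a real system would persist runs."""

    def __init__(self):
        self.runs = []

    def log_run(self, run_id, params, metrics):
        run = Run(run_id, dict(params), dict(metrics))
        self.runs.append(run)
        return run

    def leaderboard(self, metric):
        """Return runs that reported `metric`, highest value first."""
        scored = [r for r in self.runs if metric in r.metrics]
        return sorted(scored, key=lambda r: r.metrics[metric], reverse=True)

    def best(self, metric):
        """Return the top run for `metric`, or None if no run reported it."""
        board = self.leaderboard(metric)
        return board[0] if board else None


tracker = RunTracker()
tracker.log_run("run-1", {"lr": 0.1}, {"accuracy": 0.80})
tracker.log_run("run-2", {"lr": 0.01}, {"accuracy": 0.90})
tracker.log_run("run-3", {"lr": 0.001}, {"accuracy": 0.85})
```

Keeping parameters next to metrics is what makes a run repeatable: the leaderboard entry tells you which configuration to rerun.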
• Track model versions & metadata with a centralized
model registry
• Leverage containers to capture runtime
dependencies for inference
• Leverage an orchestrator like Kubernetes to provide
scalable inference
• Capture model telemetry – health, performance,
inputs / outputs
• Encapsulate each step in the lifecycle to enable
CI/CD and DevOps
• Automatically optimize models to take advantage of
hardware acceleration
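As a rough sketch of the first bullet, a centralized model registry can be thought of as a store that assigns increasing version numbers per model name and keeps metadata beside each version. The class and method names below (`ModelRegistry`, `register`, `latest`, `get`) are hypothetical and are not the Azure ML model registry API.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelVersion:
    name: str
    version: int
    sha256: str      # fingerprint of the serialized model bytes
    metadata: dict   # e.g. metrics or the id of the training run


class ModelRegistry:
    """Hypothetical in-memory registry; versions start at 1 per model name."""

    def __init__(self):
        self._versions = {}

    def register(self, name, model_bytes, metadata=None):
        versions = self._versions.setdefault(name, [])
        entry = ModelVersion(
            name=name,
            version=len(versions) + 1,
            sha256=hashlib.sha256(model_bytes).hexdigest(),
            metadata=dict(metadata or {}),
        )
        versions.append(entry)
        return entry

    def latest(self, name):
        return self._versions[name][-1]

    def get(self, name, version):
        return self._versions[name][version - 1]


registry = ModelRegistry()
registry.register("cat-classifier", b"weights-v1", {"accuracy": 0.80})
registry.register("cat-classifier", b"weights-v2", {"accuracy": 0.90})
```

The content hash lets a deployment step confirm that the bytes it is packaging are the bytes that were registered.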
Prepare
Data
Register &
Manage Model
Model training &
testing
Package &
Validate Model
…
Feature engineering Deploy Service
Monitor Model
Prepare Experiment Deploy
Data science workflow
App Developer IDE
Data Scientist
[ { "cat": 0.99218,
"feline": 0.81242 }]
IDE
Consume Model
DevOps
Pipeline
Predict
Update
Application
Publish Model
Deploy
Application
Validate
App
App Developer IDE
Data Scientist
[ { "cat": 0.99218,
"feline": 0.81242 }]
Model Store
Consume Model
DevOps
Pipeline
Predict
Validate
App
Update
Application
Deploy
Application
Publish Model
App Developer IDE
[ { "cat": 0.99218,
"feline": 0.81242 }]
Model Store
Consume Model
DevOps
Pipeline
Validate
Model
Predict
Validate
Model + App
Update
Application
Deploy
Application
Data Scientist
Publish Model
App Developer IDE
[ { "cat": 0.99218,
"feline": 0.81242 }]
Model Store
Consume Model
DevOps
Pipeline
Validate
Model
Predict
Validate
Model + App
Update
Application
Deploy
Application
Data Scientist
Publish Model
Collect
Feedback
Retrain Model
AB Test
App Developer
Cloud Services
IDE
Data Scientist
[ { "cat": 0.99218,
"feline": 0.81242 }]
IDE
Apps
Edge Devices
Model Store
Consume Model
DevOps
Pipeline
Customize Model
Deploy Model
Predict
Validate
&
Flight
Model
+
App
Update
Application
Publish Model
Collect
Feedback
Deploy
Application
Model
Telemetry
Retrain Model
Source Code DevOps Pipeline
Register
Model
Training Pipeline
Data
Movement
Data Prep Model Training
Model
Store
DevOps Pipeline
DevTest
Deploy to PROD
Package
Model
Validate
Model
Get
Human
Approval
MODEL CI/CD (Machine Learning as a Service + DevOps)
Azure DevOps
Azure Machine Learning
Azure Data Factory
New model
registered,
trigger
release
ML Pipeline handles data prep,
training, evaluation – certifies the
model is of high quality
TRAIN MODEL
DEPLOY MODEL
Unit Test
Code
Code change,
trigger CI
Inference Data
Data Preparation Services
(Labeling, Feedback, Drift)
Data Lake New data, trigger CI
Data
Cooking
Pipeline
New inference
code, trigger
release
Data Warehouse
New training job is
started whenever source
code is pushed.
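The triggers named in this diagram (code change and new data start CI; a newly registered model and new inference code start a release) amount to a mapping from events to pipelines. The sketch below shows that idea only; the event and pipeline names are hypothetical labels, not Azure DevOps identifiers.

```python
# Event names and pipeline names are hypothetical labels for this sketch.
TRIGGERS = {
    "code_change": "ci",
    "new_data": "ci",
    "model_registered": "release",
    "inference_code_change": "release",
}


def pipeline_for(event):
    """Return the pipeline an event should start, or raise on unknown events."""
    try:
        return TRIGGERS[event]
    except KeyError:
        raise ValueError(f"no pipeline configured for event {event!r}") from None


def plan(events):
    """Collapse a burst of events into the ordered, de-duplicated pipelines to run."""
    pipelines = []
    for event in events:
        name = pipeline_for(event)
        if name not in pipelines:
            pipelines.append(name)
    return pipelines
```

For example, `plan(["new_data", "code_change", "model_registered"])` schedules CI once and then a release, rather than running CI twice.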
Continuous Integration and Delivery
Build Model (app) (testing + validation)
Deploy Resources
Deploy Model (app)
Logging & Monitoring
Real-Time
Azure Kubernetes Service
Application Performance Monitoring
Azure ML Experiments
Docker +
Conda Env.
Model / Data Monitoring
Batch
Azure ML Pipelines
Data Collection
• Training data
• Featurization code (w/ tests)
• Training pipeline
• Training environment
• Evidence chain
• Model config
• Training job info
• Sample data
• Data profile
Use repeatable pipelines for your ML
workflow – they can get complicated.
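"Featurization code (w/ tests)" could look something like the following. It is a minimal sketch: the record fields `age` and `income` and the scaling choices are assumptions made for illustration, and the tests are written as plain functions that a runner such as pytest would also collect.

```python
import math


def featurize(record):
    """Turn a raw record into a fixed-order numeric feature vector.

    The field names (`age`, `income`) are hypothetical.
    """
    age = float(record["age"])
    income = float(record["income"])
    if age < 0 or income < 0:
        raise ValueError("age and income must be non-negative")
    # Scale age to roughly [0, 1]; compress income with log(1 + x).
    return [age / 100.0, math.log1p(income)]


def test_featurize_scales_values():
    assert featurize({"age": 50, "income": 0}) == [0.5, 0.0]


def test_featurize_rejects_negative_input():
    try:
        featurize({"age": -1, "income": 10})
    except ValueError:
        return
    raise AssertionError("expected ValueError for negative age")
```

Because the same function runs in training and at inference time, testing it in CI guards both paths at once.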
Source Control
• Track changes in code (and configuration) over time to integrate work
and to support reproducibility and collaboration.
Dataset Versioning
• Training data plays an important role in the quality of the software
build. Hence, versioning of data is required for reproducibility.
Model Versioning
• Version trained models in relation to code and training data for
traceability.
Experiment Tracking
• Version model experiment runs to understand which code, data and
(for example) selected features led to what output and performance, and
to allow for reproducibility.
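One way to picture dataset versioning and model versioning "in relation to code and training data" is a content hash for the data plus a small lineage record. This is a sketch under assumptions: the function names, the 12-character truncation, and the record layout are all hypothetical, and real datasets would normally be hashed file by file rather than row by row.

```python
import hashlib
import json


def dataset_version(rows):
    """Content hash of a dataset: the same rows in the same order give the same version."""
    digest = hashlib.sha256()
    for row in rows:
        # sort_keys makes the hash independent of dict key order
        digest.update(json.dumps(row, sort_keys=True).encode("utf-8"))
        digest.update(b"\n")
    return digest.hexdigest()[:12]


def lineage_record(model_name, model_version, code_commit, rows):
    """Tie a model version to the code and data that produced it."""
    return {
        "model": model_name,
        "model_version": model_version,
        "code_commit": code_commit,
        "dataset_version": dataset_version(rows),
    }
```

With such a record stored next to each model, the question "which code and data produced this model?" has a direct answer.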
Edge cases
• The model response on a given record is not the expected one.
• Investigate the training set and detect potential bias.
• Ensure that the preprocessing is not clipping any values, etc.
• Document these corner cases and add them to the validation process.
Null values / unknown categories
• This type of bug concerns the resiliency of the model when values are missing
and how well it can handle unseen categorical values.
Input issues
• An input stream may stop producing data, causing unexpected responses by
the model.
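For the "null values / unknown categories" class of bug, one common defence is to reserve an explicit slot for anything the model did not see in training. The encoder below is a minimal sketch of that idea; `CategoryEncoder` and the `<unknown>` token are hypothetical names, and this is one possible technique rather than the approach the slides prescribe.

```python
UNKNOWN = "<unknown>"


class CategoryEncoder:
    """One-hot encoder that reserves a slot for missing or unseen values."""

    def __init__(self, categories):
        self.categories = list(categories) + [UNKNOWN]
        self._index = {c: i for i, c in enumerate(self.categories)}

    def encode(self, value):
        # None (a null) and categories not seen in training both map to UNKNOWN.
        slot = self._index.get(value, self._index[UNKNOWN])
        vector = [0] * len(self.categories)
        vector[slot] = 1
        return vector


encoder = CategoryEncoder(["cat", "dog"])
```

Each corner case found in production can then be added as a test input, which matches the advice to document corner cases and add them to the validation process.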
Test Type Data Scientist App Dev / Ops
Unit Tests X
Data Integrity Tests X
Model Performance X
Model Validation X
Integration Tests X X
Load Tests X
Data Monitoring X
Skew Monitoring X
Model Monitoring X X
• Data (changes to shape / profile)
• Model in isolation (offline A/B)
• Model + app (functional testing)
• Only deploy after initial validation passes
• Ramp up traffic to new model using A/B
experimentations
• Functional behavior
• Performance characteristics
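"Only deploy after initial validation passes" and "ramp up traffic to new model using A/B experimentations" can be sketched as a validation gate followed by a deterministic traffic split. The function names, the hash-based bucketing, and the ramp schedule below are assumptions for illustration, not a description of any Azure service.

```python
import hashlib


def assign_model(user_id, new_model_share):
    """Deterministically route a user to "new" or "current".

    `new_model_share` is the fraction of traffic (0.0 to 1.0) for the new model.
    Hashing the user id keeps each user on the same side between requests.
    """
    if not 0.0 <= new_model_share <= 1.0:
        raise ValueError("new_model_share must be between 0 and 1")
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:4], "big") / 2**32  # value in [0, 1)
    return "new" if bucket < new_model_share else "current"


def ramp(validation_passed, schedule=(0.05, 0.25, 1.0)):
    """Yield traffic shares for the new model; yield nothing if validation failed."""
    if not validation_passed:
        return
    yield from schedule
```

Because the bucket for a user never changes, anyone who saw the new model at a 5% share keeps seeing it as the share grows, which keeps the experiment groups stable.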
• which data,
• which experiment / previous model(s),
• where’s the code / notebook
• Was it converted / quantized?
• Private / compliant data
• Focus on ML, not DevOps
• Get telemetry for service health and model behavior
• code-generation
• API specifications / interfaces
• Cloud Services
• Mobile / Embedded Applications
• Edge Devices
• Quantize / optimize models for target platform
• Compliant + Safe
© Microsoft Corporation
DevOps brings together people, processes, and technology, automating software delivery to provide continuous
value to your users. Using Azure DevOps, you can deliver software faster and more reliably - no matter how big
your IT department or what tools you’re using.
DevOps for ML: Supporting Technologies
Infrastructure as Code CI/CD Testing / Release / Monitoring
• Azure Resource Manager Templates
• Azure ML Python SDK & CLI
• Azure SDKs
• Azure DevOps Pipelines
• Azure ML Training Services
• Azure Repos / GitHub
• Azure Boards
• Azure DevOps for automated testing
• R - RUnit and testthat
• Python - PyUnit, pytest, nose, …
• Azure ML Tracking
• Azure Data Prep SDK (analyse/profile)
• Azure ML Model Management
(Instrumentation, Telemetry)
• Azure Monitor for app telemetry
Model trainer
Model trainer
Model trainer
Azure Machine Learning service
A set of Azure cloud services and a Python SDK that enables you to:
• Prepare Data
• Build Models
• Train Models
• Manage Models
• Track Experiments
• Deploy Models
IT/Ops
ML Scientist
Dev/Ops
Azure Machine Learning – Key Concepts
Azure ML service Artifact
Workspace
The workspace is the top-level resource for the Azure Machine Learning service.
It provides a centralized place to work with all the artifacts you create when using Azure Machine
Learning service.
The workspace keeps a list of compute targets that can be used to train your model. It also keeps a
history of the training runs, including logs, metrics, output, and a snapshot of your scripts.
Models are registered with the workspace.
You can create multiple workspaces, and each workspace can be shared by multiple people.
When you create a new workspace, it automatically creates these Azure resources:
• Azure Container Registry - Registers Docker containers that are used during training and when
deploying a model.
• Azure Storage - Used as the default datastore for the workspace.
• Azure Application Insights - Stores monitoring information about your models.
• Azure Key Vault - Stores secrets used by compute targets and other sensitive information needed
by the workspace.
Azure ML service
Key Artifacts
Workspace
Clone
Edit
Submit
ML Pipelines
Increase experiment velocity, reliability, repeatability
Use the technology of your choice for each step
Create & manage ML workflows concurrently
Define steps to prepare data, train, deploy, eval
Use diverse languages & run on diverse compute
Easy to compose and swap out steps as your workflow
evolves
Features
Sequencing and parallelization of steps, declarative
data dependencies
Unattended execution for long running pipeline, mixed
and diverse (heterogeneous) compute for steps
Data management and reusable components. Share
pipelines, code, intermediate data, and models
[Figure: ML Pipeline with steps 1 to 8 running on Compute #1 through Compute #4]
REST API w/ parameters enables retraining and batch
scoring
Fine controls for compute provision and deprovision
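"Sequencing ... of steps, declarative data dependencies" can be illustrated with a tiny dependency-ordered runner. This is a sketch only and not the Azure ML Pipelines SDK: `run_pipeline`, the step names, and the toy computations are hypothetical, and the steps run one after another here even though independent steps could run in parallel on separate compute.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def run_pipeline(steps, dependencies):
    """Run steps in dependency order.

    steps: name -> function taking a dict of upstream outputs
    dependencies: name -> set of step names whose outputs it consumes
    """
    outputs = {}
    order = list(TopologicalSorter(dependencies).static_order())
    for name in order:
        upstream = {dep: outputs[dep] for dep in dependencies.get(name, ())}
        outputs[name] = steps[name](upstream)
    return order, outputs


# Hypothetical three-step workflow: prepare -> train -> evaluate.
steps = {
    "prepare": lambda up: [1.0, 2.0, 3.0],
    "train": lambda up: sum(up["prepare"]) / len(up["prepare"]),
    "evaluate": lambda up: abs(up["train"] - 2.0) < 1e-9,
}
dependencies = {"prepare": set(), "train": {"prepare"}, "evaluate": {"train"}}
order, outputs = run_pipeline(steps, dependencies)
```

Declaring dependencies as data, rather than hard-coding call order, is what makes it easy to swap out or add a step as the workflow evolves.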
Azure ML – Models and Model Registry
Model Model Registry
Model Deployment
Cloud-hosted pipelines for Linux, Windows and macOS.
Azure DevOps Pipelines
Any language, any platform, any cloud
Build, test, and deploy Node.js, Python, Java, PHP,
Ruby, C/C++, .NET, Android, and iOS apps. Run in
parallel on Linux, macOS, and Windows. Deploy to
Azure, AWS, GCP or on-premises
Extensible
Explore and implement a wide range of community-built
build, test, and deployment tasks, along with
hundreds of extensions from Slack to SonarCloud.
Support for YAML, reporting and more
Containers and Kubernetes
Easily build and push images to container registries
like Docker Hub and Azure Container Registry.
Deploy containers to individual hosts or Kubernetes.
https://azure.com/pipelines

DevOps for Machine Learning overview en-us
DevOps for Machine Learning overview en-us

  • 1. Jordan Edwards Senior Program Manager, ML Platform
  • 2. Continuous Integration (CI) • Blend together the work of individual engineers in a repository. • Each time you commit code, it’s automatically built and tested, and bugs are detected faster. Continuous Deployment (CD) • Automate the entire process from code commit to production (if your CI/CD tests are successful.) Continuous Learning & Monitoring • Safely deliver features to your customers as soon as they’re ready. • Monitor your features in production and know when they aren’t behaving as expected.
  • 3.
  • 4. ML DevOps lifecycle Experiment Data Acquisition Business Understanding Initial Modeling Develop Modeling Operate Continuous Delivery Data Feedback Loop System + Model Monitoring Experiment + Testing Continuous Integration Continuous Deployment
  • 5.
  • 6. Overcome that data science teams only own experiments, instead of being responsible for the end-to-end flow from experiment to production to operational support on AI. Benefits: • Continuous delivery of value (data insights, models) to end users. • End-to-end ownership of the Analytics Lifecycle by DS teams • Enforcing a consistent approach to building and deploying AI • Extending data science with SDE practices to increase delivery quality and cadence. • Framework for continuous learning, lineage, auditability and regulatory compliance. • Improving team collaboration through standardization in delivery practices.
  • 7.
  • 8. Use leaderboards, side by side run comparison and model selection Capture run metrics, intermediate outputs, output logs and models Produce Repeatable Experiments 80% 75% 90% 85% Use well-defined pipelines to capture the E2E model training process
  • 9. • Track model versions & metadata with a centralized model registry • Leverage containers to capture runtime dependencies for inference • Leverage an orchestrator like Kubernetes to provide scalable inference • Capture model telemetry – health, performance, inputs / outputs • Encapsulate each step in the lifecycle to enable CI/CD and DevOps • Automatically optimize models to take advantage of hardware acceleration
  • 10. Prepare Data Register & Manage Model Model training & testing Package & Validate Model … Feature engineering Deploy Service Monitor Model Prepare Experiment Deploy Data science workflow
  • 11.
  • 12.
  • 13.
  • 14.
  • 15. App Developer IDE Data Scientist [ { "cat": 0.99218, "feline": 0.81242 }] IDE Consume Model DevOps Pipeline Predict Update Application Publish Model Deploy Application Validate App
  • 16. App Developer IDE Data Scientist [ { "cat": 0.99218, "feline": 0.81242 }] Model Store Consume Model DevOps Pipeline Predict Validate App Update Application Deploy Application Publish Model
  • 17. App Developer IDE [ { "cat": 0.99218, "feline": 0.81242 }] Model Store Consume Model DevOps Pipeline Validate Model Predict Validate Model + App Update Application Deploy Application Data Scientist Publish Model
  • 18. App Developer IDE [ { "cat": 0.99218, "feline": 0.81242 }] Model Store Consume Model DevOps Pipeline Validate Model Predict Validate Model + App Update Application Deploy Application Data Scientist Publish Model Collect Feedback Retrain Model AB Test
  • 19.
  • 20. App Developer Cloud Services IDE Data Scientist [ { "cat": 0.99218, "feline": 0.81242 }] IDE Apps Edge Devices Model Store Consume Model DevOps Pipeline Customize Model Deploy Model Predict Validate & Flight Model + App Update Application Publish Model Collect Feedback Deploy Application Model Telemetry Retrain Model
  • 21. Source Code DevOps Pipeline Register Model Training Pipeline Data Movement Data Prep Model Training Model Store DevOps Pipeline DevTest Deploy to PROD Package Model Validate Model Get Human Approval MODEL CI/CD (Machine Learning as a Service + DevOps) Azure DevOps Azure Machine Learning Azure Data Factory New model registered, trigger release ML Pipeline handles dataPrep, training, evaluation – certifies the model is of high quality TRAIN MODEL DEPLOY MODEL Unit Test Code Code change, trigger CI Inference Data Data Preparation Services (Labeling, Feedback, Drift) Data Lake New data, trigger CI Data Cooking Pipeline New inference code, trigger release Data Warehouse New training job is started whenever source code is pushed.
  • 22. Continuous Integration and Delivery Build Model (app) (testing + validation) Deploy Resources Deploy Model (app) Logging & Monitoring Real-Time Azure Kubernetes Service Application Performance Monitoring Azure ML Experiments Docker + Conda Env. Model / Data Monitoring Batch Azure ML Pipelines Data Collection
  • 23.
  • 24. • Training data • Featurization code (w/ tests) • Training pipeline • Training environment • Evidence chain • Model config • Training job info • Sample data • Data profile Use repeatable pipelines for your ML workflow – they can get complicated.
  • 25. Source Control • Track changes in code (and configuration) over time, integrate work, reproducibility and collaboration. Dataset Versioning • Training data plays an important role in the quality of the software build. Hence, versioning of data is required for reproducability. Model Versioning • Version trained models in relation to code and training data for traceability. Experiment Tracking • Version model experiment runs to understand which code, data and e.g. selected features led to what output and performance, and allow for reproducibility.
  • 26.
  • 27.
  • 28. • The model response on a given record is not the expected one. • Investigate the trainset and detect potential bias. • Ensure that the preprocessing is not clipping any values etc. • Document these corner cases & add them to validation process Edge cases • This type of bugs refers to the resiliency of the model in case of missing values and how well can it handle unseen categorical values. Null values / unknown categories • An input stream may stop producing data causing unexpected responses by the model. Input issues
  • 29.
  • 30.
  • 31. Test Type Data Scientist App Dev / Ops Unit Tests X Data Integrity Tests X Model Performance X Model Validation X Integration Tests X X Load Tests X Data Monitoring X Skew Monitoring X Model Monitoring X X
  • 32. • Data (changes to shape / profile) • Model in isolation (offline A/B) • Model + app (functional testing) • Only deploy after initial validation passes • Ramp up traffic to new model using A/B experimentations • Functional behavior • Performance characteristics
  • 33.
  • 34. • which data, • which experiment / previous model(s), • where’s the code / notebook) • Was it converted / quantized? • Private / compliant data
  • 35.
  • 36. • Focus on ML, not DevOps • Get telemetry for service health and model behavior • code-generation • API specifications / interfaces • Cloud Services • Mobile / Embedded Applications • Edge Devices • Quantize / optimize models for target platform • Compliant + Safe
  • 37.
  • 38. ML DevOps lifecycle Experiment Data Acquisition Business Understanding Initial Modeling Develop Modeling Operate Continuous Delivery Data Feedback Loop System + Model Monitoring Experiment + Testing Continuous Integration Continuous Deployment
  • 39. © Microsoft Corporation DevOps brings together people, processes, and technology, automating software delivery to provide continuous value to your users. Using Azure DevOps, you can deliver software faster and more reliably - no matter how big your IT department or what tools you’re using. DevOps for ML: Supporting Technologies Infrastructure as Code CI/CD Testing / Release / Monitoring • Azure Resource Manager Templates • Azure ML Python SDK & CLI • Azure SDK’s • Azure DevOps Pipelines • Azure ML Training Services • Azure Repos / GitHub • Azure Boards • Azure DevOps for automated testing • R - Runit and testthat • Python - PyUnit, pytest, nose, … • Azure ML Tracking • Azure Data Prep SDK (analyse/profile) • Azure ML Model Management (Instrumentation, Telemetry) • Azure Monitor for app telemetry
  • 40. Continuous Integration and Delivery Build Model (app) (testing + validation) Deploy Resources Deploy Model (app) Logging & Monitoring Real-Time Azure Kubernetes Service Application Performance Monitoring Azure ML Experiments Docker + Conda Env. Model / Data Monitoring Batch Azure ML Pipelines Data Collection
  • 41.
  • 43.
  • 44. Azure Machine Learning service Set of Azure Cloud Services Python SDK  Prepare Data  Build Models  Train Models  Manage Models  Track Experiments  Deploy Models That enables you to:
  • 45. IT/Ops ML Scientist Dev/Ops Azure Machine Learning – Key Concepts
  • 46. Azure ML service Artifact Workspace The workspace is the top-level resource for the Azure Machine Learning service. It provides a centralized place to work with all the artifacts you create when using Azure Machine Learning service. The workspace keeps a list of compute targets that can be used to train your model. It also keeps a history of the training runs, including logs, metrics, output, and a snapshot of your scripts. Models are registered with the workspace. You can create multiple workspaces, and each workspace can be shared by multiple people. When you create a new workspace, it automatically creates these Azure resources: Azure Container Registry - Registers docker containers that are used during training and when deploying a model. Azure Storage - Used as the default datastore for the workspace. Azure Application Insights - Stores monitoring information about your models. Azure Key Vault - Stores secrets used by compute targets and other sensitive information needed by the workspace.
  • 47. Azure ML service Key Artifacts Workspace
  • 49. ML Pipelines: increase experiment velocity, reliability, repeatability. Use the technology of your choice for each step. Create & manage ML workflows concurrently. Define steps to prepare data, train, deploy, eval. Use diverse languages & run on diverse compute. Easy to compose and swap out steps as your workflow evolves. Features: sequencing and parallelization of steps, declarative data dependencies; unattended execution for long-running pipelines; mixed and diverse (heterogeneous) compute for steps; data management and reusable components; share pipelines, code, intermediate data, and models; REST API with parameters enables retraining and batch scoring; fine controls for compute provisioning and deprovisioning. [Diagram: an ML pipeline of steps 1 to 8 spread across Compute #1, #2, Compute #3, and Compute #4.]
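The idea of sequencing and parallelizing steps from declarative data dependencies can be illustrated with a small, self-contained sketch. This is a toy scheduler built on Python's standard-library `graphlib`, not the Azure ML Pipelines API: the `run_pipeline` function, the step names, and the step bodies are all hypothetical, and the steps within a batch run one after another here even though nothing prevents them from running in parallel.

```python
from graphlib import TopologicalSorter


def run_pipeline(steps, dependencies):
    """Run steps in dependency order.

    steps: maps a step name to a callable that takes a dict of upstream outputs.
    dependencies: maps a step name to the names of the steps it consumes.
    Returns (outputs, batches); steps in the same batch do not depend on each
    other, so an orchestrator could run them in parallel on separate compute.
    """
    sorter = TopologicalSorter({name: dependencies.get(name, ()) for name in steps})
    sorter.prepare()  # raises graphlib.CycleError if the dependencies form a cycle
    outputs, batches = {}, []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # every step whose inputs are available
        batches.append(ready)
        for name in ready:
            inputs = {dep: outputs[dep] for dep in dependencies.get(name, ())}
            outputs[name] = steps[name](inputs)
            sorter.done(name)
    return outputs, batches


# Hypothetical four-step workflow: one data prep step feeds two independent
# feature steps, whose results are combined by a final step.
steps = {
    "prepare": lambda inputs: [1, 2, 3, 4],
    "feature_sum": lambda inputs: sum(inputs["prepare"]),
    "feature_max": lambda inputs: max(inputs["prepare"]),
    "combine": lambda inputs: inputs["feature_sum"] + inputs["feature_max"],
}
dependencies = {
    "feature_sum": ["prepare"],
    "feature_max": ["prepare"],
    "combine": ["feature_sum", "feature_max"],
}

outputs, batches = run_pipeline(steps, dependencies)
print(batches)             # [['prepare'], ['feature_max', 'feature_sum'], ['combine']]
print(outputs["combine"])  # 14
```

Because each step declares only what it consumes, a step can be swapped out without touching the scheduling logic, which is the property the slide describes as easy composition.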
  • 55. Azure ML – Models and Model Registry Model Model Registry
  • 58. Azure DevOps Pipelines: cloud-hosted pipelines for Linux, Windows and macOS. Any language, any platform, any cloud: build, test, and deploy Node.js, Python, Java, PHP, Ruby, C/C++, .NET, Android, and iOS apps. Run in parallel on Linux, macOS, and Windows. Deploy to Azure, AWS, GCP or on-premises. Extensible: explore and implement a wide range of community-built build, test, and deployment tasks, along with hundreds of extensions from Slack to SonarCloud. Support for YAML, reporting and more. Containers and Kubernetes: easily build and push images to container registries like Docker Hub and Azure Container Registry. Deploy containers to individual hosts or Kubernetes. https://azure.com/pipelines
  • 61. Continuous Integration and Delivery: Build Model (app) (testing + validation); Deploy Resources; Deploy Model (app); Logging & Monitoring. Diagram labels: Real-Time; Azure Kubernetes Service; Application Performance Monitoring; Azure ML Experiments; Docker + Conda Env.; Model / Data Monitoring; Batch; Azure ML Pipelines; Data Collection.

Editor's Notes

  1. Continuous Integration (CI) enables individual developers to collaborate more effectively and blend their work into a code repository. Each time you commit code, it’s automatically built and tested, and bugs are detected faster. Continuous Delivery (CD) is the process to build, test, configure and deploy from a build to a production environment. The key here is repeatability and consistency: the process should be well understood, repeatable by others, and should help verify correctness. Continuous integration (CI): increase code coverage; build faster by splitting test and build runs; automatically ensure you don't ship broken code; run tests continually. Continuous delivery (CD): automatically deploy code to production; ensure deployment targets have the latest code; use tested code from the CI process. More info can be found here: https://docs.microsoft.com/en-us/azure/devops/learn/what-is-devops
  3. Here is the data scientist’s inner loop of work
  4. Animate this slide. Developers work in the IDE of their choice on the application code. They commit the code to the source control system of their choice (VSTS has good support for various source control systems). Separately, data scientists work on developing their model. Once happy, they publish the model to a model repository (we can extend this with Vienna). A build is kicked off in VSTS based on the commit in GitHub. The VSTS build pipeline pulls the latest model from the Blob container (can be extended with the Vienna Model Management Service) and creates a container. VSTS pushes the image to a private image repository in Azure Container Registry. On a set schedule (nightly), the release pipeline is kicked off. The latest image from ACR is pulled and deployed across the Kubernetes cluster on ACS. A user's request for the app goes through the DNS server. The DNS server passes the request to the load balancer, and the response is sent back to the user.
  9. [10:50 AM] Tim Scarfe: Jordan Edwards, thanks for this! Assumptions: 1) the model store is keyed in some way on the build ID and/or the git commit ID? 2) the ML pipeline is calling out to Databricks using the Jobs API with Python source checked into git, i.e. not calling a mutable notebook. [11:23 AM] Jordan Edwards: Tim Scarfe, yes, the model is pinned with the git commit as well as the pipeline / build ID (so you have an audit trail of exactly how it was produced). Yes, the job should submit sources that are in git, not in a magic notebook on the file system. <https://teams.microsoft.com/l/message/19:bfb1b4d771ff441393e2c89c9e80d14c@thread.skype/1547059832334?tenantId=72f988bf-86f1-41af-91ab-2d7cd011db47&groupId=66aa6f64-da6b-491b-b2e3-8e43ae872a7c&parentMessageId=1547054108278&teamName=DevOps for A.I. V-Team&channelName=General&createdTime=1547059832334> Ideally, in my opinion, release will be automated to a staging environment once a new model hits the model store, followed by integration testing and then a manual release gate for deployment to production. So I would not have the arrow from the repo with inference code changes directly triggering a release; changes to inference code should trigger the build pipeline too. Perhaps there is room for triggering a different build pipeline based on filter conditions (Path Filters) that follows a separate path other than registering a new model?
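The pinning and release flow discussed in this exchange can be sketched in a few lines of Python. This is a toy in-memory illustration only, not the Azure ML model registry or any real service: the `ModelRecord` and `ModelRegistry` classes, the stage names, and the commit and build identifiers are hypothetical. Each registered version records the git commit and build ID that produced it, new versions land in staging automatically, and promotion to production requires a named approver to stand in for the manual release gate.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelRecord:
    """Immutable audit record describing how one model version was produced."""
    name: str
    version: int
    git_commit: str
    build_id: str


class ModelRegistry:
    """Toy in-memory model store keyed by model name and version."""

    def __init__(self):
        self._versions = {}  # model name -> list of ModelRecord, oldest first
        self._stages = {}    # (model name, version) -> "staging" or "production"

    def register(self, name, git_commit, build_id):
        """Store a new version and release it to staging automatically."""
        versions = self._versions.setdefault(name, [])
        record = ModelRecord(name, len(versions) + 1, git_commit, build_id)
        versions.append(record)
        self._stages[(name, record.version)] = "staging"
        return record

    def get(self, name, version):
        """Return the audit record for a given model version."""
        versions = self._versions.get(name, [])
        if not 1 <= version <= len(versions):
            raise KeyError(f"unknown model version: {name} v{version}")
        return versions[version - 1]

    def stage(self, name, version):
        self.get(name, version)  # validates that the version exists
        return self._stages[(name, version)]

    def promote_to_production(self, name, version, approved_by):
        """Manual release gate: refuse promotion without a named approver."""
        if not approved_by:
            raise PermissionError("production release requires a named approver")
        self.get(name, version)
        self._stages[(name, version)] = "production"


# Hypothetical identifiers, for illustration only.
registry = ModelRegistry()
first = registry.register("churn-model", git_commit="3f2a9c1", build_id="build-101")
second = registry.register("churn-model", git_commit="9b7e4d0", build_id="build-102")
registry.promote_to_production("churn-model", second.version, approved_by="release-manager")
```

Looking up `registry.get("churn-model", 2)` then answers the audit question from the chat: which commit and which build produced the model that is now in production.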
  11. The What… Azure Pipelines is our offering for the heart of your DevOps needs… CI/CD… continuous integration & deployment. Azure Pipelines is the perfect launchpad for your code, automating everything from your builds to your deployments so you spend less time with the nuts and bolts and more time being creative. At Microsoft we do just that. We deploy over 78k times a day with Azure Pipelines. Open & extensible… It’s great for any type of application, any platform or any cloud. It has cloud-hosted pools of Linux, Mac & Windows VMs that we manage for you. You’re not restricted to the functionality we provide; Pipelines has rich extensibility. Partners and the community can contribute extensions in our marketplace for everyone. One of my favourite things is when new extensions show up. We have over 500 today, ranging from community-built tasks to services from Slack to SonarCloud. If you want to build & test a Node app in a GitHub repo and deploy it via a Docker container to AWS… go for it. Containers / Modern… Containers are becoming more & more the unit of deployment & Azure Pipelines is great for containers. Azure Pipelines can build images and push them to container registries like Docker Hub and Azure Container Registry. You can deploy to any container host including Kubernetes. Transition… Donovan is going to show us Azure Pipelines in action.