CONFIDENTIAL
Why is DevOps for Machine
Learning so Different?
Ryan Dawson, Seldon
Outline
1. MLOps Landscape
2. Data Science vs Programming
3. Traditional Programming E2E Workflow
4. Intro to ML E2E Workflow
5. MLOps Topics
a. Training
b. Serving
c. Monitoring
6. Advanced MLOps Challenges
7. Review
DevOps Background
Managing Smooth Journey to Prod
DevOps Roles Centred on CI/CD and Infra
Established tools
Key enabler for projects
MLOps Background
87% of projects never go live
ML-related infrastructure is complex
Rise of MLOps
Sculley et al., NIPS 2015
Complex Tool Landscape
landscape.lfai.foundation
Running software performs actions in response to inputs.
Traditional programming codifies actions as explicit rules.
ML does not codify explicitly.
Instead, rules are set indirectly by capturing patterns from data.
Different problem domains - ML is more applicable to focused numerical problems.
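A minimal sketch of the contrast above, assuming a toy "is it hot?" problem. The function names, the 30-degree threshold and the four training samples are all hypothetical and chosen only for illustration.

```python
# Explicit rule: the programmer writes the threshold by hand.
def is_hot_rule(temp_c: float) -> bool:
    return temp_c > 30.0  # hand-picked value (assumption for this sketch)


# Learned rule: the threshold is set indirectly from labelled examples.
def fit_threshold(samples):
    """Midpoint between the warmest 'not hot' and the coolest 'hot' sample."""
    hot = [temp for temp, label in samples if label]
    cold = [temp for temp, label in samples if not label]
    return (max(cold) + min(hot)) / 2


# Hypothetical labelled data: (temperature in Celsius, is_hot)
training_data = [(18.0, False), (24.0, False), (29.0, True), (35.0, True)]
learned_threshold = fit_threshold(training_data)


def is_hot_learned(temp_c: float) -> bool:
    return temp_c > learned_threshold


print(learned_threshold)
print(is_hot_rule(27.0), is_hot_learned(27.0))
```

Changing the training data changes the learned behaviour without touching the code, which is the property the rest of the deck builds on.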
Why So Different?
Traditional programming
Think of old terminal systems
Start with hello world and add control
structures
Examples
Data Science
Classification problems (e.g. cat or not cat)
Regression problems (e.g. sales from ad
spend)
Start with MNIST or kaggle
Regression
fitting
Image: Davi Frossard
Gradient Descent
Compute error against training data
Adjust weights and recompute
Tunable
Improving Vanilla Gradient Descent - towardsdatascience
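The loop described above (compute error against training data, adjust weights, recompute) can be sketched for a one-variable linear fit. This is an illustrative sketch only: the data, the learning rate and the step count are assumptions, and the learning rate and step count are the "tunable" parts.

```python
def gradient_descent(xs, ys, learning_rate=0.05, steps=2000):
    """Fit y = w * x + b by minimising mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Compute error against the training data.
        errors = [(w * x + b) - y for x, y in zip(xs, ys)]
        # Gradients of the mean squared error with respect to w and b.
        grad_w = 2 / n * sum(e * x for e, x in zip(errors, xs))
        grad_b = 2 / n * sum(errors)
        # Adjust weights, then loop round and recompute.
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


# Hypothetical training data generated from y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
w, b = gradient_descent(xs, ys)
print(round(w, 3), round(b, 3))
```

A learning rate that is too large makes the loop diverge and one that is too small makes it slow, which is why these settings are usually tuned per problem.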
Data science is exploratory
Interactive notebooks for
exploration
Code shared through notebooks
Data Playgrounds/Exploration
Image: ArcGIS
Comparing Journeys
Dev Build Journey
Compilation
Calculator user story
As a lazy person, I want to put numerical
operations into a screen so that I don’t have
to work out the answers.
ML Build Journey
Training Prediction
Training
Tracking
Data
Serving
Batch
E2E
Data Science Question
Can we estimate/set/benchmark
employee pay from this data?
Frameworks
ML is Different - Key Points
Training data and code together drive fitting
Closest thing to executable is a trained/weighted model (can
vary with toolkit)
Retraining can be necessary (e.g. online shop and fashion
trends)
Lots of data, long-running jobs
1. User Story
2. Write code
3. Submit PR
4. Tests run automatically
5. Review and merge
6. New version builds
7. Built executable deployed to environment
8. Further tests
9. Promote to next environment
10. More tests etc.
11. PROD
12. Monitor - stacktraces or error codes
Docker as packaging. Driver is a code change (git)
Traditional Dev Workflow
Driver might be a code change. Or new data.
Data not in git.
More experimental - data driven and you’ve only a sample
of data.
Testing for quantifiable performance, not pass/fail.
Let’s focus on offline learning to simplify.
ML Workflows - Primer
ML E2E Workflow Intro
1. Data inputs and outputs. Preprocessed. Large.
2. Try stuff locally with a slice.
3. Try with more data as long-running experiments.
4. Collaboration - often in jupyter & git
5. Model may be pickled/serialized
6. Integrate into a running app e.g. add REST API
(serving)
7. Integration test with app.
8. Rollout & monitor performance metrics
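Steps 5 and 6 above (serialize the model, then put it behind an API) might look roughly like this. It is a sketch under assumptions: the "model" is a hypothetical set of linear weights, the request shape with a "rows" key is made up, and the handler is a plain function standing in for a real REST framework.

```python
import json
import pickle

# Result of "training": a hypothetical linear model stored as plain weights.
trained_model = {"weights": [0.5, -0.25], "bias": 1.0}

# Step 5: serialize the trained model (only unpickle data you trust).
model_bytes = pickle.dumps(trained_model)


def predict(model, row):
    """Weighted sum of the features plus the bias."""
    return sum(w * x for w, x in zip(model["weights"], row)) + model["bias"]


def handle_predict(request_body: str, serialized_model: bytes) -> str:
    """Step 6: what a REST handler would do with a JSON request body."""
    model = pickle.loads(serialized_model)
    rows = json.loads(request_body)["rows"]
    return json.dumps({"predictions": [predict(model, row) for row in rows]})


response = handle_predict('{"rows": [[2, 4], [4, 0]]}', model_bytes)
print(response)
```

In practice the pickled bytes would be written to shared storage and loaded once when the serving process starts, not on every request.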
Metrics Example
Online store example
A/B test
A leads to more conversions
But…
More negative reviews? Bounce-rate?
Interaction-level? Latency?
Krishen Siew - quora
What Can Happen
Rise of the Term MLOps - towardsdatascience
Role of MLOps
Empower teams and break down silos
Provide ways to collaborate/self-serve
New Territory
Special challenges for ML.
No clear standards yet. We’ll drill into:
1. Training - slice of data, train a weighted model to
make predictions on unseen data.
2. Serving - call with HTTP.
3. Rollout and Monitoring - making sure it performs.
For long-running, intensive training jobs there's
Kubeflow Pipelines, Polyaxon, MLflow…
Broken into steps incl. cleaning and transformation (pre-processing).
1 Training/Experimentation
Model Training
Each step can be long-running
Continuous Delivery for Machine Learning - martinfowler.com
Kubeflow ML Platform
Kubeflow Pipelines
Parameterised experiments
MLFlow Experiments
Training and CI
Some training platforms have CI integration.
Result of a run could be a model. So
analogous to a CI build of an executable.
But how to say that the new version is
‘good’?
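One possible answer to the question above is a metric gate in the pipeline: score the candidate and the current model on the same held-out data and promote only on a measurable gain. A minimal sketch, assuming accuracy as the metric; the labels, predictions and the 0.01 margin are hypothetical.

```python
def accuracy(predictions, labels):
    """Fraction of predictions that match the labels."""
    correct = sum(1 for p, y in zip(predictions, labels) if p == y)
    return correct / len(labels)


def should_promote(candidate_score, baseline_score, min_gain=0.01):
    """Promote only if the candidate beats the baseline by at least min_gain."""
    return candidate_score >= baseline_score + min_gain


# Hypothetical held-out labels and the two models' predictions on them.
labels = [1, 0, 1, 1, 0, 1, 0, 0, 1, 1]
baseline_preds = [1, 0, 0, 1, 0, 1, 1, 0, 1, 0]   # 7 of 10 correct
candidate_preds = [1, 0, 1, 1, 0, 1, 1, 0, 1, 1]  # 9 of 10 correct

baseline_score = accuracy(baseline_preds, labels)
candidate_score = accuracy(candidate_preds, labels)
print(baseline_score, candidate_score, should_promote(candidate_score, baseline_score))
```

A single offline metric is rarely the whole story, which is why rollout and monitoring still matter after such a gate passes.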
2 Serving
Serving = use model via HTTP. Offline/batch is different.
Some platforms include serving, or there are dedicated solutions:
Seldon, TensorFlow Serving, AzureML, SageMaker
Often you package the model and host it (e.g. in a bucket) so the serving
solution can run it.
Serving can support rollout & monitoring.
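From the caller's side, using a served model is an ordinary HTTP POST with the features in the body. The sketch below only builds the request. The URL and the "instances" payload key are assumptions for illustration, since each serving tool defines its own endpoint path and payload format.

```python
import json
import urllib.request


def build_predict_request(url, rows):
    """Build (but do not send) a JSON POST request carrying feature rows."""
    body = json.dumps({"instances": rows}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical endpoint and a single row of four features.
request = build_predict_request(
    "http://localhost:8000/predict", [[5.1, 3.5, 1.4, 0.2]]
)
print(request.get_method(), request.full_url)

# Sending needs a running model server, so it is left commented out:
# with urllib.request.urlopen(request) as resp:
#     print(json.load(resp))
```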
Seldon ML Serving
apiVersion: machinelearning.seldon.io/v1alpha2
kind: SeldonDeployment
metadata:
  name: sklearn
spec:
  name: iris
  predictors:
  - graph:
      children: []
      implementation: SKLEARN_SERVER
      modelUri: gs://seldon-models/sklearn/iris
      name: classifier
    name: default
    replicas: 1
Open Source
K8s custom resource
Pods created to serve http
Docker option too
Data scientists like pickles
3 Rollout and Monitoring
ML model trained on sample - need to keep checking with new data coming in
Rollout strategies:
Canary = % of traffic to new version as check
A/B Test = % split between versions for longer to monitor performance
Shadowing = All traffic to old and new model. Only the live model’s responses are used
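The strategies above differ mainly in who receives a request and whose answer is returned. A rough sketch, not tied to any serving tool: the function names are hypothetical, the 10% split is an arbitrary example, and the two lambdas stand in for real models.

```python
import random


def split_route(rng, new_percent):
    """Canary or A/B: send roughly new_percent of requests to the new version."""
    return "new" if rng.random() * 100 < new_percent else "old"


def shadow_call(live_model, shadow_model, features, shadow_log):
    """Shadowing: both models see the request, only the live answer is returned."""
    shadow_log.append(shadow_model(features))  # kept for offline comparison only
    return live_model(features)


# Canary: about 10% of simulated requests reach the new version.
rng = random.Random(0)
routes = [split_route(rng, 10) for _ in range(10_000)]
new_share = routes.count("new") / len(routes)
print(new_share)

# Shadowing with two stand-in models.
shadow_log = []
answer = shadow_call(lambda x: x * 2, lambda x: x * 3, 5, shadow_log)
print(answer, shadow_log)
```

A canary and an A/B test use the same split mechanism. The difference is intent and duration: a canary is a short safety check, an A/B test runs longer to compare performance.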
Canary with Seldon
kind: SeldonDeployment
apiVersion: machinelearning.seldon.io/v1alpha2
metadata:
  name: skiris
  namespace: default
  creationTimestamp:
spec:
  name: skiris
  predictors:
  - name: default
    graph:
      name: skiris-default
      implementation: SKLEARN_SERVER
      modelUri: gs://seldon-models/sklearn/iris
    replicas: 1
  - name: canary
    graph:
      name: skiris-canary
      implementation: XGBOOST_SERVER
      modelUri: gs://seldon-models/xgboost/iris
    replicas: 1
Traffic-splitting is more typically
defined in gateway config.
Very common in ML.
Here it is defined in serving, not the gateway,
so the data scientist can define the rollout.
A/B Test With Seldon
apiVersion: machinelearning.seldon.io/v1alpha2
kind: SeldonDeployment
metadata:
  name: mlflow-deployment
spec:
  name: mlflow-deployment
  predictors:
  - graph:
      children: []
      implementation: MLFLOW_SERVER
      modelUri: gs://seldon-models/mlflow/elasticnet_wine
      name: wines-classifier
    name: a-mlflow-deployment-dag
    replicas: 1
    traffic: 20
  - graph:
      children: []
      implementation: MLFLOW_SERVER
      modelUri: gs://seldon-models/mlflow/elasticnet_wine
      name: wines-classifier
    name: b-mlflow-deployment-dag
    replicas: 1
    traffic: 80
Seldon Metrics
Out of the box basic metrics (because so commonly needed)
Seldon Request Logging
Human review of predictions can be needed
Advanced Topics - Serving
● Real-time inference graphs with pre-processing
● Advanced routing - multi-armed bandits.
● Outlier detection
● Concept drift
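As an illustration of the multi-armed bandit routing mentioned above, here is an epsilon-greedy sketch: most traffic goes to the model with the best observed reward, while a small share keeps exploring the others. This is one simple bandit strategy, not necessarily what any given serving tool implements. The class name, the reward rates and the simulated feedback are all assumptions.

```python
import random


class EpsilonGreedyRouter:
    """Route to the best-performing model so far, exploring at random sometimes."""

    def __init__(self, model_names, epsilon=0.1, seed=0):
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.counts = {name: 0 for name in model_names}
        self.rewards = {name: 0.0 for name in model_names}

    def mean_reward(self, name):
        count = self.counts[name]
        return self.rewards[name] / count if count else 0.0

    def choose(self):
        # Explore: pick any model at random with probability epsilon.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(list(self.counts))
        # Exploit: pick the model with the highest mean reward so far.
        return max(self.counts, key=self.mean_reward)

    def record(self, name, reward):
        self.counts[name] += 1
        self.rewards[name] += reward


# Simulated feedback: hypothetical success rates for two models.
router = EpsilonGreedyRouter(["model_a", "model_b"])
true_rates = {"model_a": 0.2, "model_b": 0.6}
feedback = random.Random(1)
for _ in range(5000):
    chosen = router.choose()
    reward = 1.0 if feedback.random() < true_rates[chosen] else 0.0
    router.record(chosen, reward)

print(router.counts)
```

Unlike a fixed A/B split, the traffic share shifts automatically towards the better model as feedback accumulates.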
Advanced Topics - Governance
● Explainability - why did it predict that?
○ Some orgs sticking to whitebox techniques - not neural nets
○ Blackbox is possible
● Provenance & Reproducibility (associating models to training runs to data to triggers)
○ Data versioning adds complexity
○ Competing tools for metadata
○ No agreed standards yet
● Bias & ethics
● Adversarial attacks
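For provenance, one low-tech approach is to store a small metadata record next to each model that links it to its data, code and trigger. A sketch only: the field names are hypothetical and do not follow any metadata standard, and the byte strings stand in for real files.

```python
import hashlib
import json


def fingerprint(data: bytes) -> str:
    """Content hash, so any change to the data or model changes the record."""
    return hashlib.sha256(data).hexdigest()


def provenance_record(training_data, model_bytes, code_version, trigger):
    """Associate a model with the data, code version and trigger behind it."""
    return {
        "data_sha256": fingerprint(training_data),
        "model_sha256": fingerprint(model_bytes),
        "code_version": code_version,
        "trigger": trigger,
    }


# Hypothetical inputs standing in for a dataset file and a serialized model.
record = provenance_record(
    training_data=b"age,pay\n30,100\n",
    model_bytes=b"serialized-model-placeholder",
    code_version="abc1234",
    trigger="new data",
)
print(json.dumps(record, indent=2))
```

Hashing the data gives a cheap identity check, but it does not version or store the data itself, which is where the extra complexity noted above comes in.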
Summary
MLOps is new terrain.
ML workflows exploratory & data-driven.
MLOps enables ML workflows with:
● Data and compute-intensive experiments and training
● Artifact tracking
● Rollout strategies to work with monitoring
● Monitoring tools

More Related Content

What's hot

MLOps and Reproducible ML on AWS with Kubeflow and SageMaker
MLOps and Reproducible ML on AWS with Kubeflow and SageMakerMLOps and Reproducible ML on AWS with Kubeflow and SageMaker
MLOps and Reproducible ML on AWS with Kubeflow and SageMaker
Provectus
 
MLOps Using MLflow
MLOps Using MLflowMLOps Using MLflow
MLOps Using MLflow
Databricks
 
Richard Coffey (x18140785) - Research in Computing CA2
Richard Coffey (x18140785) - Research in Computing CA2Richard Coffey (x18140785) - Research in Computing CA2
Richard Coffey (x18140785) - Research in Computing CA2
Richard Coffey
 
MLOps with serverless architectures (October 2018)
MLOps with serverless architectures (October 2018)MLOps with serverless architectures (October 2018)
MLOps with serverless architectures (October 2018)
Julien SIMON
 
Apache Liminal (Incubating)—Orchestrate the Machine Learning Pipeline
Apache Liminal (Incubating)—Orchestrate the Machine Learning PipelineApache Liminal (Incubating)—Orchestrate the Machine Learning Pipeline
Apache Liminal (Incubating)—Orchestrate the Machine Learning Pipeline
Databricks
 
Managing the Machine Learning Lifecycle with MLOps
Managing the Machine Learning Lifecycle with MLOpsManaging the Machine Learning Lifecycle with MLOps
Managing the Machine Learning Lifecycle with MLOps
Fatih Baltacı
 
Ml ops past_present_future
Ml ops past_present_futureMl ops past_present_future
Ml ops past_present_future
Nisha Talagala
 
“Houston, we have a model...” Introduction to MLOps
“Houston, we have a model...” Introduction to MLOps“Houston, we have a model...” Introduction to MLOps
“Houston, we have a model...” Introduction to MLOps
Rui Quintino
 
AISF19 - Building Scalable, Kubernetes-Native ML/AI Pipelines with TFX, KubeF...
AISF19 - Building Scalable, Kubernetes-Native ML/AI Pipelines with TFX, KubeF...AISF19 - Building Scalable, Kubernetes-Native ML/AI Pipelines with TFX, KubeF...
AISF19 - Building Scalable, Kubernetes-Native ML/AI Pipelines with TFX, KubeF...
Bill Liu
 
What's Next for MLflow in 2019
What's Next for MLflow in 2019What's Next for MLflow in 2019
What's Next for MLflow in 2019
Anyscale
 
MLOps for production-level machine learning
MLOps for production-level machine learningMLOps for production-level machine learning
MLOps for production-level machine learning
cnvrg.io AI OS - Hands-on ML Workshops
 
Fundamental MLOps
Fundamental MLOpsFundamental MLOps
Fundamental MLOps
Saripudin Gon
 
Robust MLOps with Open-Source: ModelDB, Docker, Jenkins, and Prometheus
Robust MLOps with Open-Source: ModelDB, Docker, Jenkins, and PrometheusRobust MLOps with Open-Source: ModelDB, Docker, Jenkins, and Prometheus
Robust MLOps with Open-Source: ModelDB, Docker, Jenkins, and Prometheus
Manasi Vartak
 
MLOps and Data Quality: Deploying Reliable ML Models in Production
MLOps and Data Quality: Deploying Reliable ML Models in ProductionMLOps and Data Quality: Deploying Reliable ML Models in Production
MLOps and Data Quality: Deploying Reliable ML Models in Production
Provectus
 
MLOps in action
MLOps in actionMLOps in action
MLOps in action
Pieter de Bruin
 
Hamburg Data Science Meetup - MLOps with a Feature Store
Hamburg Data Science Meetup - MLOps with a Feature StoreHamburg Data Science Meetup - MLOps with a Feature Store
Hamburg Data Science Meetup - MLOps with a Feature Store
Moritz Meister
 
CI/CD for Machine Learning
CI/CD for Machine LearningCI/CD for Machine Learning
CI/CD for Machine Learning
C4Media
 
The A-Z of Data: Introduction to MLOps
The A-Z of Data: Introduction to MLOpsThe A-Z of Data: Introduction to MLOps
The A-Z of Data: Introduction to MLOps
DataPhoenix
 
Managers guide to effective building of machine learning products
Managers guide to effective building of machine learning productsManagers guide to effective building of machine learning products
Managers guide to effective building of machine learning products
Gianmario Spacagna
 
MLOps Virtual Event | Building Machine Learning Platforms for the Full Lifecycle
MLOps Virtual Event | Building Machine Learning Platforms for the Full LifecycleMLOps Virtual Event | Building Machine Learning Platforms for the Full Lifecycle
MLOps Virtual Event | Building Machine Learning Platforms for the Full Lifecycle
Databricks
 

What's hot (20)

MLOps and Reproducible ML on AWS with Kubeflow and SageMaker
MLOps and Reproducible ML on AWS with Kubeflow and SageMakerMLOps and Reproducible ML on AWS with Kubeflow and SageMaker
MLOps and Reproducible ML on AWS with Kubeflow and SageMaker
 
MLOps Using MLflow
MLOps Using MLflowMLOps Using MLflow
MLOps Using MLflow
 
Richard Coffey (x18140785) - Research in Computing CA2
Richard Coffey (x18140785) - Research in Computing CA2Richard Coffey (x18140785) - Research in Computing CA2
Richard Coffey (x18140785) - Research in Computing CA2
 
MLOps with serverless architectures (October 2018)
MLOps with serverless architectures (October 2018)MLOps with serverless architectures (October 2018)
MLOps with serverless architectures (October 2018)
 
Apache Liminal (Incubating)—Orchestrate the Machine Learning Pipeline
Apache Liminal (Incubating)—Orchestrate the Machine Learning PipelineApache Liminal (Incubating)—Orchestrate the Machine Learning Pipeline
Apache Liminal (Incubating)—Orchestrate the Machine Learning Pipeline
 
Managing the Machine Learning Lifecycle with MLOps
Managing the Machine Learning Lifecycle with MLOpsManaging the Machine Learning Lifecycle with MLOps
Managing the Machine Learning Lifecycle with MLOps
 
Ml ops past_present_future
Ml ops past_present_futureMl ops past_present_future
Ml ops past_present_future
 
“Houston, we have a model...” Introduction to MLOps
“Houston, we have a model...” Introduction to MLOps“Houston, we have a model...” Introduction to MLOps
“Houston, we have a model...” Introduction to MLOps
 
AISF19 - Building Scalable, Kubernetes-Native ML/AI Pipelines with TFX, KubeF...
AISF19 - Building Scalable, Kubernetes-Native ML/AI Pipelines with TFX, KubeF...AISF19 - Building Scalable, Kubernetes-Native ML/AI Pipelines with TFX, KubeF...
AISF19 - Building Scalable, Kubernetes-Native ML/AI Pipelines with TFX, KubeF...
 
What's Next for MLflow in 2019
What's Next for MLflow in 2019What's Next for MLflow in 2019
What's Next for MLflow in 2019
 
MLOps for production-level machine learning
MLOps for production-level machine learningMLOps for production-level machine learning
MLOps for production-level machine learning
 
Fundamental MLOps
Fundamental MLOpsFundamental MLOps
Fundamental MLOps
 
Robust MLOps with Open-Source: ModelDB, Docker, Jenkins, and Prometheus
Robust MLOps with Open-Source: ModelDB, Docker, Jenkins, and PrometheusRobust MLOps with Open-Source: ModelDB, Docker, Jenkins, and Prometheus
Robust MLOps with Open-Source: ModelDB, Docker, Jenkins, and Prometheus
 
MLOps and Data Quality: Deploying Reliable ML Models in Production
MLOps and Data Quality: Deploying Reliable ML Models in ProductionMLOps and Data Quality: Deploying Reliable ML Models in Production
MLOps and Data Quality: Deploying Reliable ML Models in Production
 
MLOps in action
MLOps in actionMLOps in action
MLOps in action
 
Hamburg Data Science Meetup - MLOps with a Feature Store
Hamburg Data Science Meetup - MLOps with a Feature StoreHamburg Data Science Meetup - MLOps with a Feature Store
Hamburg Data Science Meetup - MLOps with a Feature Store
 
CI/CD for Machine Learning
CI/CD for Machine LearningCI/CD for Machine Learning
CI/CD for Machine Learning
 
The A-Z of Data: Introduction to MLOps
The A-Z of Data: Introduction to MLOpsThe A-Z of Data: Introduction to MLOps
The A-Z of Data: Introduction to MLOps
 
Managers guide to effective building of machine learning products
Managers guide to effective building of machine learning productsManagers guide to effective building of machine learning products
Managers guide to effective building of machine learning products
 
MLOps Virtual Event | Building Machine Learning Platforms for the Full Lifecycle
MLOps Virtual Event | Building Machine Learning Platforms for the Full LifecycleMLOps Virtual Event | Building Machine Learning Platforms for the Full Lifecycle
MLOps Virtual Event | Building Machine Learning Platforms for the Full Lifecycle
 

Similar to Why is dev ops for machine learning so different - dataxdays

Why is dev ops for machine learning so different
Why is dev ops for machine learning so differentWhy is dev ops for machine learning so different
Why is dev ops for machine learning so different
Ryan Dawson
 
ODSC webinar "Kubeflow, MLFlow and Beyond — augmenting ML delivery" Stepan Pu...
ODSC webinar "Kubeflow, MLFlow and Beyond — augmenting ML delivery" Stepan Pu...ODSC webinar "Kubeflow, MLFlow and Beyond — augmenting ML delivery" Stepan Pu...
ODSC webinar "Kubeflow, MLFlow and Beyond — augmenting ML delivery" Stepan Pu...
Provectus
 
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
DataWorks Summit
 
Pitfalls of machine learning in production
Pitfalls of machine learning in productionPitfalls of machine learning in production
Pitfalls of machine learning in production
Antoine Sauray
 
Advanced Model Inferencing leveraging Kubeflow Serving, KNative and Istio
Advanced Model Inferencing leveraging Kubeflow Serving, KNative and IstioAdvanced Model Inferencing leveraging Kubeflow Serving, KNative and Istio
Advanced Model Inferencing leveraging Kubeflow Serving, KNative and Istio
Animesh Singh
 
Continuous Delivery of Deep Transformer-Based NLP Models Using MLflow and AWS...
Continuous Delivery of Deep Transformer-Based NLP Models Using MLflow and AWS...Continuous Delivery of Deep Transformer-Based NLP Models Using MLflow and AWS...
Continuous Delivery of Deep Transformer-Based NLP Models Using MLflow and AWS...
Databricks
 
Continuous Delivery of Deep Transformer-Based NLP Models Using MLflow and AWS...
Continuous Delivery of Deep Transformer-Based NLP Models Using MLflow and AWS...Continuous Delivery of Deep Transformer-Based NLP Models Using MLflow and AWS...
Continuous Delivery of Deep Transformer-Based NLP Models Using MLflow and AWS...
Databricks
 
Magdalena Stenius: MLOPS Will Change Machine Learning
Magdalena Stenius: MLOPS Will Change Machine LearningMagdalena Stenius: MLOPS Will Change Machine Learning
Magdalena Stenius: MLOPS Will Change Machine Learning
Lviv Startup Club
 
Trenowanie i wdrażanie modeli uczenia maszynowego z wykorzystaniem Google Clo...
Trenowanie i wdrażanie modeli uczenia maszynowego z wykorzystaniem Google Clo...Trenowanie i wdrażanie modeli uczenia maszynowego z wykorzystaniem Google Clo...
Trenowanie i wdrażanie modeli uczenia maszynowego z wykorzystaniem Google Clo...
Sotrender
 
MLflow with Databricks
MLflow with DatabricksMLflow with Databricks
MLflow with Databricks
Liangjun Jiang
 
Mlflow with databricks
Mlflow with databricksMlflow with databricks
Mlflow with databricks
Liangjun Jiang
 
Deployment Design Patterns - Deploying Machine Learning and Deep Learning Mod...
Deployment Design Patterns - Deploying Machine Learning and Deep Learning Mod...Deployment Design Patterns - Deploying Machine Learning and Deep Learning Mod...
Deployment Design Patterns - Deploying Machine Learning and Deep Learning Mod...
All Things Open
 
AllThingsOpen 2018 - Deployment Design Patterns (Dan Zaratsian)
AllThingsOpen 2018 - Deployment Design Patterns (Dan Zaratsian)AllThingsOpen 2018 - Deployment Design Patterns (Dan Zaratsian)
AllThingsOpen 2018 - Deployment Design Patterns (Dan Zaratsian)
dtz001
 
Practical machine learning
Practical machine learningPractical machine learning
Practical machine learning
Faizan Javed
 
Caffe2
Caffe2Caffe2
Certification Study Group - NLP & Recommendation Systems on GCP Session 5
Certification Study Group - NLP & Recommendation Systems on GCP Session 5Certification Study Group - NLP & Recommendation Systems on GCP Session 5
Certification Study Group - NLP & Recommendation Systems on GCP Session 5
gdgsurrey
 
Object Oriented Concepts and Principles
Object Oriented Concepts and PrinciplesObject Oriented Concepts and Principles
Object Oriented Concepts and Principles
deonpmeyer
 
GDG Cloud Southlake #16: Priyanka Vergadia: Scalable Data Analytics in Google...
GDG Cloud Southlake #16: Priyanka Vergadia: Scalable Data Analytics in Google...GDG Cloud Southlake #16: Priyanka Vergadia: Scalable Data Analytics in Google...
GDG Cloud Southlake #16: Priyanka Vergadia: Scalable Data Analytics in Google...
James Anderson
 
Strata CA 2019: From Jupyter to Production Manu Mukerji
Strata CA 2019: From Jupyter to Production Manu MukerjiStrata CA 2019: From Jupyter to Production Manu Mukerji
Strata CA 2019: From Jupyter to Production Manu Mukerji
Manu Mukerji
 
MLFlow: Platform for Complete Machine Learning Lifecycle
MLFlow: Platform for Complete Machine Learning Lifecycle MLFlow: Platform for Complete Machine Learning Lifecycle
MLFlow: Platform for Complete Machine Learning Lifecycle
Databricks
 

Similar to Why is dev ops for machine learning so different - dataxdays (20)

Why is dev ops for machine learning so different
Why is dev ops for machine learning so differentWhy is dev ops for machine learning so different
Why is dev ops for machine learning so different
 
ODSC webinar "Kubeflow, MLFlow and Beyond — augmenting ML delivery" Stepan Pu...
ODSC webinar "Kubeflow, MLFlow and Beyond — augmenting ML delivery" Stepan Pu...ODSC webinar "Kubeflow, MLFlow and Beyond — augmenting ML delivery" Stepan Pu...
ODSC webinar "Kubeflow, MLFlow and Beyond — augmenting ML delivery" Stepan Pu...
 
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
 
Pitfalls of machine learning in production
Pitfalls of machine learning in productionPitfalls of machine learning in production
Pitfalls of machine learning in production
 
Advanced Model Inferencing leveraging Kubeflow Serving, KNative and Istio
Advanced Model Inferencing leveraging Kubeflow Serving, KNative and IstioAdvanced Model Inferencing leveraging Kubeflow Serving, KNative and Istio
Advanced Model Inferencing leveraging Kubeflow Serving, KNative and Istio
 
Continuous Delivery of Deep Transformer-Based NLP Models Using MLflow and AWS...
Continuous Delivery of Deep Transformer-Based NLP Models Using MLflow and AWS...Continuous Delivery of Deep Transformer-Based NLP Models Using MLflow and AWS...
Continuous Delivery of Deep Transformer-Based NLP Models Using MLflow and AWS...
 
Continuous Delivery of Deep Transformer-Based NLP Models Using MLflow and AWS...
Continuous Delivery of Deep Transformer-Based NLP Models Using MLflow and AWS...Continuous Delivery of Deep Transformer-Based NLP Models Using MLflow and AWS...
Continuous Delivery of Deep Transformer-Based NLP Models Using MLflow and AWS...
 
Magdalena Stenius: MLOPS Will Change Machine Learning
Magdalena Stenius: MLOPS Will Change Machine LearningMagdalena Stenius: MLOPS Will Change Machine Learning
Magdalena Stenius: MLOPS Will Change Machine Learning
 
Trenowanie i wdrażanie modeli uczenia maszynowego z wykorzystaniem Google Clo...
Trenowanie i wdrażanie modeli uczenia maszynowego z wykorzystaniem Google Clo...Trenowanie i wdrażanie modeli uczenia maszynowego z wykorzystaniem Google Clo...
Trenowanie i wdrażanie modeli uczenia maszynowego z wykorzystaniem Google Clo...
 
MLflow with Databricks
MLflow with DatabricksMLflow with Databricks
MLflow with Databricks
 
Mlflow with databricks
Mlflow with databricksMlflow with databricks
Mlflow with databricks
 
Deployment Design Patterns - Deploying Machine Learning and Deep Learning Mod...
Deployment Design Patterns - Deploying Machine Learning and Deep Learning Mod...Deployment Design Patterns - Deploying Machine Learning and Deep Learning Mod...
Deployment Design Patterns - Deploying Machine Learning and Deep Learning Mod...
 
AllThingsOpen 2018 - Deployment Design Patterns (Dan Zaratsian)
AllThingsOpen 2018 - Deployment Design Patterns (Dan Zaratsian)AllThingsOpen 2018 - Deployment Design Patterns (Dan Zaratsian)
AllThingsOpen 2018 - Deployment Design Patterns (Dan Zaratsian)
 
Practical machine learning
Practical machine learningPractical machine learning
Practical machine learning
 
Caffe2
Caffe2Caffe2
Caffe2
 
Certification Study Group - NLP & Recommendation Systems on GCP Session 5
Certification Study Group - NLP & Recommendation Systems on GCP Session 5Certification Study Group - NLP & Recommendation Systems on GCP Session 5
Certification Study Group - NLP & Recommendation Systems on GCP Session 5
 
Object Oriented Concepts and Principles
Object Oriented Concepts and PrinciplesObject Oriented Concepts and Principles
Object Oriented Concepts and Principles
 
GDG Cloud Southlake #16: Priyanka Vergadia: Scalable Data Analytics in Google...
GDG Cloud Southlake #16: Priyanka Vergadia: Scalable Data Analytics in Google...GDG Cloud Southlake #16: Priyanka Vergadia: Scalable Data Analytics in Google...
GDG Cloud Southlake #16: Priyanka Vergadia: Scalable Data Analytics in Google...
 
Strata CA 2019: From Jupyter to Production Manu Mukerji
Strata CA 2019: From Jupyter to Production Manu MukerjiStrata CA 2019: From Jupyter to Production Manu Mukerji
Strata CA 2019: From Jupyter to Production Manu Mukerji
 
MLFlow: Platform for Complete Machine Learning Lifecycle
MLFlow: Platform for Complete Machine Learning Lifecycle MLFlow: Platform for Complete Machine Learning Lifecycle
MLFlow: Platform for Complete Machine Learning Lifecycle
 

More from Ryan Dawson

mlops.community meetup - ML Governance_ A Practical Guide.pptx
mlops.community meetup - ML Governance_ A Practical Guide.pptxmlops.community meetup - ML Governance_ A Practical Guide.pptx
mlops.community meetup - ML Governance_ A Practical Guide.pptx
Ryan Dawson
 
Conspiracy Theories in the Information Age
Conspiracy Theories in the Information AgeConspiracy Theories in the Information Age
Conspiracy Theories in the Information Age
Ryan Dawson
 
Maximising teamwork in delivering software products
Maximising teamwork in delivering software productsMaximising teamwork in delivering software products
Maximising teamwork in delivering software products
Ryan Dawson
 
Maximising teamwork in delivering software products
Maximising teamwork in delivering software products Maximising teamwork in delivering software products
Maximising teamwork in delivering software products
Ryan Dawson
 
Java vs challenger languages
Java vs challenger languagesJava vs challenger languages
Java vs challenger languages
Ryan Dawson
 
Challenges for AI in prod
Challenges for AI in prodChallenges for AI in prod
Challenges for AI in prod
Ryan Dawson
 
From training to explainability via git ops
From training to explainability via git opsFrom training to explainability via git ops
From training to explainability via git ops
Ryan Dawson
 
How open source is funded the enterprise differentiation tightrope (1)
How open source is funded  the enterprise differentiation tightrope (1)How open source is funded  the enterprise differentiation tightrope (1)
How open source is funded the enterprise differentiation tightrope (1)
Ryan Dawson
 
From java monolith to kubernetes microservices - an open source journey with ...
From java monolith to kubernetes microservices - an open source journey with ...From java monolith to kubernetes microservices - an open source journey with ...
From java monolith to kubernetes microservices - an open source journey with ...
Ryan Dawson
 
Whirlwind tour of activiti 7
Whirlwind tour of activiti 7Whirlwind tour of activiti 7
Whirlwind tour of activiti 7
Ryan Dawson
 
Jdk.io cloud native business automation
Jdk.io cloud native business automationJdk.io cloud native business automation
Jdk.io cloud native business automation
Ryan Dawson
 
Identity management and single sign on - how much flexibility
Identity management and single sign on - how much flexibilityIdentity management and single sign on - how much flexibility
Identity management and single sign on - how much flexibility
Ryan Dawson
 
Activiti Cloud Deep Dive
Activiti Cloud Deep DiveActiviti Cloud Deep Dive
Activiti Cloud Deep Dive
Ryan Dawson
 

More from Ryan Dawson (13)

mlops.community meetup - ML Governance_ A Practical Guide.pptx
mlops.community meetup - ML Governance_ A Practical Guide.pptxmlops.community meetup - ML Governance_ A Practical Guide.pptx
mlops.community meetup - ML Governance_ A Practical Guide.pptx
 
Conspiracy Theories in the Information Age
Conspiracy Theories in the Information AgeConspiracy Theories in the Information Age
Conspiracy Theories in the Information Age
 
Maximising teamwork in delivering software products
Maximising teamwork in delivering software productsMaximising teamwork in delivering software products
Maximising teamwork in delivering software products
 
Maximising teamwork in delivering software products
Maximising teamwork in delivering software products Maximising teamwork in delivering software products
Maximising teamwork in delivering software products
 
Java vs challenger languages
Java vs challenger languagesJava vs challenger languages
Java vs challenger languages
 
Challenges for AI in prod
Challenges for AI in prodChallenges for AI in prod
Challenges for AI in prod
 
From training to explainability via git ops
From training to explainability via git opsFrom training to explainability via git ops
From training to explainability via git ops
 
How open source is funded the enterprise differentiation tightrope (1)
How open source is funded  the enterprise differentiation tightrope (1)How open source is funded  the enterprise differentiation tightrope (1)
How open source is funded the enterprise differentiation tightrope (1)
 
From java monolith to kubernetes microservices - an open source journey with ...
From java monolith to kubernetes microservices - an open source journey with ...From java monolith to kubernetes microservices - an open source journey with ...
From java monolith to kubernetes microservices - an open source journey with ...
 
Whirlwind tour of activiti 7
Whirlwind tour of activiti 7Whirlwind tour of activiti 7
Whirlwind tour of activiti 7
 
Jdk.io cloud native business automation
Jdk.io cloud native business automationJdk.io cloud native business automation
Jdk.io cloud native business automation
 
Identity management and single sign on - how much flexibility
Identity management and single sign on - how much flexibilityIdentity management and single sign on - how much flexibility
Identity management and single sign on - how much flexibility
 
Activiti Cloud Deep Dive
Activiti Cloud Deep DiveActiviti Cloud Deep Dive
Activiti Cloud Deep Dive
 

Why is dev ops for machine learning so different - dataxdays

  • 1. CONFIDENTIAL CONFIDENTIAL Why is DevOps for Machine Learning so Different? Ryan Dawson, Seldon
  • 2. Outline 1. MLOps Landscape 2. Data Science vs Programming 3. Traditional Programming E2E Workflow 4. Intro to ML E2E Workflow 5. MLOps Topics a. Training b. Serving c. Monitoring 6. Advanced MLOps Challenges 7. Review
  • 3. DevOps Background Managing Smooth Journey to Prod DevOps Roles Centred on CI/CD and Infra Established tools Key enabler for projects
  • 4. MLOps Background 87% of projects never go live ML-related infrastructure is complex Rise of MLOps Sculley et al., NIPS 2015
  • 6. Running software performs actions in response to inputs. Traditional programming codifies actions as explicit rules ML does not codify explicitly. Instead rules are indirectly set by capturing patterns from data. Different problem domains - ML more applicable to focused numerical problems. Why So Different?
  • 7. Traditional programming Think of old terminal systems Start with hello world and add control structures Examples Data Science Classification problems (e.g. cat or not cat) Regression problems (e.g. sales from ad spend) Start with MNIST or kaggle
  • 9. Gradient Descent Compute error against training data Adjust weights and recompute Tunable Improving Vanilla Gradient Descent - towardsdatascience
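
A minimal sketch of the loop this slide describes (compute error against the training data, adjust the weights, recompute), fitting a straight line in plain Python. The function name `fit_line`, the toy ad-spend data, and the chosen learning rate and epoch count are illustrative assumptions, not taken from the talk; the learning rate and epoch count are the "tunable" parts.

```python
def fit_line(xs, ys, learning_rate=0.01, epochs=2000):
    """Fit y = w*x + b by batch gradient descent on mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Compute the error of the current weights against the training data.
        errors = [(w * x + b) - y for x, y in zip(xs, ys)]
        # Gradients of the mean squared error with respect to w and b.
        grad_w = 2.0 / n * sum(e * x for e, x in zip(errors, xs))
        grad_b = 2.0 / n * sum(errors)
        # Adjust the weights against the gradient; the next pass recomputes.
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


# Hypothetical toy data generated from sales = 2 * ad_spend + 1.
ad_spend = [1.0, 2.0, 3.0, 4.0, 5.0]
sales = [3.0, 5.0, 7.0, 9.0, 11.0]
w, b = fit_line(ad_spend, sales, learning_rate=0.05, epochs=5000)
print(round(w, 3), round(b, 3))
```

Too large a learning rate makes the updates diverge, and too small a one makes the fit slow, which is why these values are normally tuned per problem.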
  • 10. Data science is exploratory Interactive notebooks for exploration Code shared through notebooks Data Playgrounds/Exploration Image: ArcGIS
  • 12. Dev Build Journey Compilation Calculator user story As a lazy person, I want to put numerical operations into a screen so that I don’t have to work out the answers.
  • 13. ML Build Journey Training Prediction Training Tracking Data Serving Batch E2E Data Science Question Can we estimate/set/benchmark employee pay from this data? Frameworks
  • 14. ML is Different - Key Points Training data and code together drive fitting Closest thing to executable is a trained/weighted model (can vary with toolkit) Retraining can be necessary (e.g. online shop and fashion trends) Lots of data, long-running jobs
  • 15. 1. User Story 2. Write code 3. Submit PR 4. Tests run automatically 5. Review and merge 6. New version builds 7. Built executable deployed to environment 8. Further tests 9. Promote to next environment 10. More tests etc. 11. PROD 12. Monitor - stacktraces or error codes Docker as packaging. Driver is a code change (git) Traditional Dev Workflow
  • 16. Driver might be a code change. Or new data. Data not in git. More experimental - data driven and you’ve only a sample of data. Testing for quantifiable performance, not pass/fail. Let’s focus on offline learning to simplify. ML Workflows - Primer
  • 17. ML E2E Workflow Intro 1. Data inputs and outputs. Preprocessed. Large. 2. Try stuff locally with a slice. 3. Try with more data as long-running experiments. 4. Collaboration - often in jupyter & git 5. Model may be pickled/serialized 6. Integrate into a running app e.g. add REST API (serving) 7. Integration test with app. 8. Rollout & monitor performance metrics
  • 18. Metrics Example Online store example A/B test A leads to more conversions But… More negative reviews? Bounce-rate? Interaction-level? Latency? Krishen Siew - quora
  • 19. What Can Happen Rise of the Term MLOps - towardsdatascience
  • 20. Role of MLOps Empower teams and break down silos Provide ways to collaborate/self-serve
  • 21. New Territory Special challenges for ML. No clear standards yet. We’ll drill into: 1. Training - slice of data, train a weighted model to make predictions on unseen data. 2. Serving - call with HTTP. 3. Rollout and Monitoring - making sure it performs.
  • 22. For long-running, intensive training jobs there’s kubeflow pipelines, polyaxon, mlflow… Broken into steps incl. cleaning and transformation (pre- processing). 1 Training/Experimentation
  • 23. Model Training Each step can be long-running Continuous Delivery for Machine Learning - martinfowler.com
  • 27. Training and CI Some training platforms have CI integration. Result of a run could be a model. So analogous to a CI build of an executable. But how to say that the new version is ‘good’?
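
One possible answer to the "is the new version good?" question is a metric gate in the CI run: score the candidate and the current model on the same held-out data and only promote the candidate if it does not regress. The sketch below assumes a regression model scored by mean absolute error; the names `mean_absolute_error`, `passes_gate` and `max_regression` are hypothetical and do not come from any particular training platform.

```python
def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between true and predicted values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def passes_gate(y_true, baseline_pred, candidate_pred, max_regression=0.0):
    """Return True if the candidate's error is no worse than the baseline's
    error plus an allowed tolerance (lower error is better)."""
    baseline_mae = mean_absolute_error(y_true, baseline_pred)
    candidate_mae = mean_absolute_error(y_true, candidate_pred)
    return candidate_mae <= baseline_mae + max_regression


# Hypothetical held-out labels and predictions from the two model versions.
held_out = [10.0, 20.0, 30.0]
current_model_pred = [12.0, 18.0, 33.0]
new_model_pred = [11.0, 19.0, 31.0]
promote = passes_gate(held_out, current_model_pred, new_model_pred)
```

Unlike a unit test, the gate compares quantities rather than asserting a fixed pass/fail behaviour, so the threshold itself is a judgment call for the team.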
  • 28. 2 Serving Serving = use model via HTTP. Offline/batch is different. Some platforms have serving or there’s dedicated solutions. Seldon, Tensorflow Serving, AzureML, SageMaker Often package the model and host (bucket) so the serving solution can run it. Serving can support rollout & monitoring.
  • 29. Seldon ML Serving
        apiVersion: machinelearning.seldon.io/v1alpha2
        kind: SeldonDeployment
        metadata:
          name: sklearn
        spec:
          name: iris
          predictors:
          - graph:
              children: []
              implementation: SKLEARN_SERVER
              modelUri: gs://seldon-models/sklearn/iris
              name: classifier
            name: default
            replicas: 1
    Open Source K8s custom resource. Pods created to serve http. Docker option too. Data scientists like pickles.
  • 30. 3 Rollout and Monitoring ML model trained on sample - need to keep checking with new data coming in Rollout strategies: Canary = % of traffic to new version as check A/B Test = % split between versions for longer to monitor performance Shadowing = All traffic to old and new model. Only the live model’s responses are used
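
Shadowing can be sketched in a few lines: every request goes to both models, the shadow model's answer is only recorded for later comparison, and the caller always receives the live model's response. This is an illustrative sketch, not how any serving product implements it; `shadow_predict`, `shadow_log` and the two stand-in models are hypothetical names.

```python
def shadow_predict(live_model, shadow_model, features, shadow_log):
    """Call both models, log the pair of results, return only the live one."""
    live_result = live_model(features)
    try:
        shadow_result = shadow_model(features)
    except Exception as exc:
        # A failing shadow model must never affect the live response.
        shadow_result = exc
    shadow_log.append((features, live_result, shadow_result))
    return live_result


# Hypothetical stand-ins for an old (live) and a new (shadow) model.
def old_model(features):
    return sum(features)


def new_model(features):
    return sum(features) * 1.1


comparison_log = []
response = shadow_predict(old_model, new_model, [1.0, 2.0], comparison_log)
```

The logged pairs can then be compared offline before the new model is given any real traffic through a canary or A/B test.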
  • 31. Canary with Seldon
        kind: SeldonDeployment
        apiVersion: machinelearning.seldon.io/v1alpha2
        metadata:
          name: skiris
          namespace: default
          creationTimestamp:
        spec:
          name: skiris
          predictors:
          - name: default
            graph:
              name: skiris-default
              implementation: SKLEARN_SERVER
              modelUri: gs://seldon-models/sklearn/iris
            replicas: 1
          - name: canary
            graph:
              name: skiris-canary
              implementation: XGBOOST_SERVER
              modelUri: gs://seldon-models/xgboost/iris
            replicas: 1
    Traffic-splitting more typically defined in gateway config. Very common in ML. In serving not gateway so data scientist can define rollout.
  • 32. A/B Test With Seldon
        apiVersion: machinelearning.seldon.io/v1alpha2
        kind: SeldonDeployment
        metadata:
          name: mlflow-deployment
        spec:
          name: mlflow-deployment
          predictors:
          - graph:
              children: []
              implementation: MLFLOW_SERVER
              modelUri: gs://seldon-models/mlflow/elasticnet_wine
              name: wines-classifier
            name: a-mlflow-deployment-dag
            replicas: 1
            traffic: 20
          - graph:
              children: []
              implementation: MLFLOW_SERVER
              modelUri: gs://seldon-models/mlflow/elasticnet_wine
              name: wines-classifier
            name: b-mlflow-deployment-dag
            replicas: 1
            traffic: 80
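
To make the effect of the `traffic: 20` and `traffic: 80` values concrete, here is a rough sketch of a weighted split: each request is sent to one predictor, chosen at random in proportion to its weight. This is only an illustration of the idea, not Seldon's routing code; `pick_predictor` is a hypothetical helper, and the fixed random seed is there only to make the example repeatable.

```python
import random
from collections import Counter


def pick_predictor(traffic, rng=random):
    """Choose one predictor name with probability proportional to its weight."""
    names = list(traffic)
    weights = [traffic[name] for name in names]
    return rng.choices(names, weights=weights, k=1)[0]


# Weights copied from the two predictors in the deployment on this slide.
traffic = {"a-mlflow-deployment-dag": 20, "b-mlflow-deployment-dag": 80}

rng = random.Random(0)  # seeded so repeated runs give the same split
counts = Counter(pick_predictor(traffic, rng) for _ in range(10_000))
share_a = counts["a-mlflow-deployment-dag"] / 10_000
```

Over many requests the observed share for each predictor approaches its configured weight, which is what lets the two versions be compared on live metrics.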
  • 33. Seldon Metrics Out of the box basic metrics (because so commonly needed)
  • 34. Seldon Request Logging Human review of predictions can be needed
  • 35. Advanced Topics - Serving ● Real-time inference graphs with pre-processing ● Advanced routing - multi-armed bandits. ● Outlier detection ● Concept drift
  • 36. Advanced Topics - Governance ● Explainability - why did it predict that? ○ Some orgs sticking to whitebox techniques - not neural nets ○ Blackbox is possible ● Provenance & Reproducibility (associating models to training runs to data to triggers) ○ Data versioning adds complexity ○ Competing tools for metadata ○ No agreed standards yet ● Bias & ethics ● Adversarial attacks
  • 37. Summary MLOps is new terrain. ML workflows exploratory & data-driven. MLOps enables ML workflows with: ● Data and compute-intensive experiments and training ● Artifact tracking ● Rollout strategies to work with monitoring ● Monitoring tools

Editor's Notes

  1. Expand on metrics. Perhaps you’re recommending really controversial productions. Or maybe you’re using annoying pop-ups for suggestions.
  2. So we’re seeing that this MLOps stuff is complicated and different from traditional DevOps. One challenge of this is that Data Science and DevOps can be different silos in many organisations and sometimes with a filter in between. So you get situations where a python pickle file ends up being passed to the DevOps team without any context. So naturally the team that is meant to run the model in production is like ‘what is this?’ For that situation this cartoon depicts a pretty reasonable reaction.
  3. Other companies have a more mature setup. Here see more particular specialisms in play. In the bottom left we’ve got the data engineers