How to monitor your ML models in production with Kubernetes
WORKSHOP
Monitor ML models in production
How to monitor and manage the health of your ML models
Leah Kolben, CTO
@leah4kosh
leah@cnvrg.io
whoami
• Developer/Data scientist => CTO
• cnvrg.io: built by data scientists, for data scientists, to help teams:
• Get from data to models to production as quickly and efficiently as possible
• Bridge science and engineering
Agenda
• Introduction: recap of previous webinars
• Kubernetes overview
• Why should we monitor our models
• Monitor tools
• LIVE workshop
• Summary
Introduction
• Previous webinars:
• Train your ML models on Kubernetes
• Run ETL jobs using Spark on Kubernetes
• Deploy your ML model to production
• Today: “My model is in production, now what?”
• Use Grafana to monitor your model's resource metrics: CPU & memory
• Use Kibana to monitor your ML model's logs
• Use Elasticsearch to index your model's logs and create alerts
Kubernetes - recap
• Provides a runtime environment for Docker containers
• Provides an abstraction layer for containers to run on
• Deploy as microservices
• All services are natively load balanced
• Can scale up and down dynamically
• Monitor the health of the containers
• Schedule jobs and CronJobs
• Use the same API across EVERY cloud provider and bare metal!
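"Scale up and down dynamically" is usually handled by a HorizontalPodAutoscaler. The Kubernetes documentation describes its core rule as desiredReplicas = ceil(currentReplicas × currentMetric / targetMetric), skipped when the ratio is within a tolerance (0.1 by default). The sketch below reproduces only that rule; the function name and bounds are illustrative, and it ignores stabilization windows, pod readiness and multi-metric scaling.

```python
import math


def desired_replicas(current_replicas: int,
                     current_metric: float,
                     target_metric: float,
                     min_replicas: int = 1,
                     max_replicas: int = 10,
                     tolerance: float = 0.1) -> int:
    """Approximate the HorizontalPodAutoscaler scaling rule (simplified sketch)."""
    ratio = current_metric / target_metric
    # Inside the tolerance band the autoscaler leaves the replica count alone.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    desired = math.ceil(current_replicas * ratio)
    # Clamp to the configured minimum and maximum replica counts.
    return max(min_replicas, min(max_replicas, desired))
```

For example, 3 model-server pods averaging 90% CPU against a 50% target would be scaled to 6, which is where the cost savings from autoscaling come from when traffic drops again.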
ML in production - recap
• Goals:
• Quickly deploy your ML model as a service
• Reduce costs and manpower with autoscaling
• Load-balance the traffic
• Get native monitoring from Kubernetes
• Update your model continuously: canary deployments, blue/green deployments
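A canary deployment sends a small share of traffic to the new model version while the rest stays on the stable one. The sketch below shows one common way to do the split, hashing a request or user ID so the same caller always lands on the same version; the function name and the "stable"/"canary" labels are hypothetical, and in a real cluster the split is usually done by the ingress or service mesh rather than in application code.

```python
import hashlib


def route_request(request_id: str, canary_weight: float) -> str:
    """Deterministically route a request to "canary" or "stable" (illustrative sketch)."""
    if not 0.0 <= canary_weight <= 1.0:
        raise ValueError("canary_weight must be between 0 and 1")
    digest = hashlib.sha256(request_id.encode("utf-8")).digest()
    # The first 6 bytes (48 bits) give a fraction in [0, 1) that a float holds exactly.
    bucket = int.from_bytes(digest[:6], "big") / 2**48
    return "canary" if bucket < canary_weight else "stable"
```

Raising `canary_weight` step by step (0.05, 0.25, 1.0) while comparing the two versions' metrics is a gradual rollout; jumping straight from 0.0 to 1.0 behaves like a blue/green switch.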
Why should we monitor our models?
• Track your model's performance
• Detect model drift before it degrades results
• Monitor autoscaling and high traffic load
• Know when to retrain and update your model
• Monitor updates: validate that a new version is better than the old one
• Keep your production models up to date and relevant!
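Drift is typically caught by comparing the distribution of live inputs (or predictions) against the training data. The deck does not prescribe a metric; the sketch below uses the Population Stability Index (PSI), one common choice, with bins taken from quantiles of the reference sample. The function name and the small `eps` floor for empty bins are assumptions of this sketch.

```python
import math
from bisect import bisect_right


def psi(expected: list[float], actual: list[float], bins: int = 10, eps: float = 1e-6) -> float:
    """Population Stability Index between a reference sample and a live sample."""
    # Bin edges at the quantiles of the reference (training) sample.
    ref = sorted(expected)
    edges = [ref[int(len(ref) * i / bins)] for i in range(1, bins)]

    def proportions(sample: list[float]) -> list[float]:
        counts = [0] * bins
        for x in sample:
            counts[bisect_right(edges, x)] += 1
        # Floor empty bins at eps so the logarithm stays finite.
        return [max(c / len(sample), eps) for c in counts]

    e = proportions(expected)
    a = proportions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ai, ei in zip(a, e))
```

A widely quoted rule of thumb (a convention, not a standard) reads PSI below 0.1 as stable, 0.1 to 0.25 as worth watching, and above 0.25 as a signal to investigate or retrain.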
Monitor tools
• Use the EKS stack on Kubernetes
• Use open-source tools that can be installed quickly on Kubernetes using Helm
• Use Grafana to monitor the resources your model uses
• Use Kibana to view and track your model's input/output
• Use Elasticsearch to index the logs
• Use ElastAlert to tag events and create alerts about the health of your models
Grafana
• Open source analytics and monitoring for a variety of databases
• Natively connects to the EKS stack
• Use custom dashboards to track your model usage
Grafana
Grafana – custom dashboard
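Beyond CPU and memory, a custom dashboard usually charts application-level numbers such as request count, error count and latency. The sketch below collects those for one endpoint and renders them in the Prometheus text exposition format; it assumes a Prometheus-compatible store scrapes the endpoint and Grafana queries it, which the deck does not specify. The class, metric names and endpoint label are hypothetical.

```python
import math
import time
from contextlib import contextmanager


class EndpointMetrics:
    """Collects request count, error count and latency for one model endpoint."""

    def __init__(self, endpoint: str):
        self.endpoint = endpoint
        self.requests = 0
        self.errors = 0
        self.latencies: list[float] = []

    @contextmanager
    def track(self):
        """Time one request and count it as an error if it raises."""
        start = time.perf_counter()
        self.requests += 1
        try:
            yield
        except Exception:
            self.errors += 1
            raise
        finally:
            self.latencies.append(time.perf_counter() - start)

    def percentile(self, q: float) -> float:
        """Nearest-rank percentile of recorded latencies, in seconds."""
        if not self.latencies:
            return 0.0
        ordered = sorted(self.latencies)
        rank = max(1, math.ceil(q / 100 * len(ordered)))
        return ordered[rank - 1]

    def exposition(self) -> str:
        """Render the metrics as Prometheus text-format lines."""
        label = f'endpoint="{self.endpoint}"'
        return "\n".join([
            "# TYPE model_requests_total counter",
            f"model_requests_total{{{label}}} {self.requests}",
            "# TYPE model_errors_total counter",
            f"model_errors_total{{{label}}} {self.errors}",
            "# TYPE model_latency_p95_seconds gauge",
            f"model_latency_p95_seconds{{{label}}} {self.percentile(95):.6f}",
        ]) + "\n"
```

Wrapping each prediction in `with metrics.track():` keeps the instrumentation out of the model code itself.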
Kibana
• Part of the EKS stack
• Log all input and output
• Collect the model's internal application logs
• Search for prediction outputs
• Visualize and understand the data
Kibana
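Kibana can only search and visualize input/output that actually reaches Elasticsearch, so it helps to log each prediction as one structured JSON line. The sketch below uses the standard `logging` module with a custom formatter; it assumes a log collector in the cluster (for example Fluentd or Filebeat; the deck does not say which) ships container stdout into Elasticsearch. The field names and `log_prediction` helper are hypothetical.

```python
import json
import logging
import sys
import time
import uuid


class JsonFormatter(logging.Formatter):
    """Format each log record as a single JSON object per line."""

    def format(self, record: logging.LogRecord) -> str:
        doc = {
            "@timestamp": time.strftime("%Y-%m-%dT%H:%M:%S", time.gmtime(record.created)) + "Z",
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        # Merge structured fields passed via extra={"payload": {...}}.
        payload = getattr(record, "payload", None)
        if payload is not None:
            doc.update(payload)
        return json.dumps(doc)


def get_logger(name: str = "model") -> logging.Logger:
    """Return a logger that writes JSON lines to stdout."""
    logger = logging.getLogger(name)
    if not logger.handlers:
        handler = logging.StreamHandler(sys.stdout)
        handler.setFormatter(JsonFormatter())
        logger.addHandler(handler)
        logger.setLevel(logging.INFO)
    return logger


def log_prediction(logger: logging.Logger, features: dict, prediction, model_version: str) -> str:
    """Log one request's input and output; returns the generated request ID."""
    request_id = str(uuid.uuid4())
    logger.info("prediction", extra={"payload": {
        "request_id": request_id,
        "model_version": model_version,
        "input": features,
        "output": prediction,
    }})
    return request_id
```

Including `model_version` in every line makes it easy to filter Kibana views by version when comparing an update against the previous model.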
Elasticsearch & ElastAlert
• Store and index all input/output of your models
• Get a sense of your data
• Query your data
• Create custom rules on the data to monitor the health of your model
• Trigger custom webhooks on alerts, e.g. to kick off retraining CI/CD pipelines
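The rule-plus-webhook idea can be sketched in a few lines. The example below is loosely modeled on ElastAlert's "frequency" rule type (at least `num_events` matches within a `timeframe`), but it is a stand-alone illustration, not ElastAlert's code or configuration. The Elasticsearch query, the `confidence` field, the 0.5 threshold and the webhook URL are all hypothetical; the request is built with `urllib` but not sent.

```python
import json
import urllib.request
from datetime import datetime, timedelta

# Query DSL an alerting job could send to Elasticsearch (field names are hypothetical
# and depend on how predictions are logged).
LOW_CONFIDENCE_QUERY = {
    "query": {
        "bool": {
            "filter": [
                {"range": {"@timestamp": {"gte": "now-15m"}}},
                {"range": {"confidence": {"lt": 0.5}}},
            ]
        }
    }
}


def should_alert(docs: list[dict], now: datetime,
                 timeframe: timedelta = timedelta(minutes=15),
                 min_confidence: float = 0.5,
                 num_events: int = 50) -> bool:
    """Frequency-style rule: at least num_events low-confidence predictions in the timeframe."""
    start = now - timeframe
    hits = [d for d in docs if d["timestamp"] >= start and d["confidence"] < min_confidence]
    return len(hits) >= num_events


def build_retrain_webhook(url: str, model_version: str, reason: str) -> urllib.request.Request:
    """Build (but do not send) a POST that could trigger a retraining pipeline."""
    body = json.dumps({"model_version": model_version, "reason": reason}).encode("utf-8")
    return urllib.request.Request(url, data=body, method="POST",
                                  headers={"Content-Type": "application/json"})
```

Passing the returned request to `urllib.request.urlopen` would fire the webhook; keeping rule evaluation and notification separate makes each easy to test.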
Let’s do it!
Using cnvrg to deploy & monitor your models
• Your only responsibility is to write the predict function
• End-to-end pipelines
• EKS stack is deployed automatically
• Dedicated Grafana for each endpoint
• Dedicated Kibana for each endpoint
• Alerts and triggers out of the box
• Continual learning support
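The deck does not show the exact signature cnvrg expects for the predict function, so the sketch below assumes a hypothetical `predict(data)` that receives one request as a dict and returns a JSON-serializable dict. The hard-coded weights stand in for a trained model that would normally be loaded once at startup; everything else (HTTP, scaling, logging, dashboards) is left to the platform.

```python
import math

# Hypothetical weights standing in for a trained model loaded at startup.
WEIGHTS = {"tenure_months": -0.05, "monthly_charges": 0.03}
BIAS = -1.0


def predict(data: dict) -> dict:
    """Score one request with a logistic model; validation errors raise ValueError."""
    missing = [name for name in WEIGHTS if name not in data]
    if missing:
        raise ValueError(f"missing features: {missing}")
    z = BIAS + sum(w * float(data[name]) for name, w in WEIGHTS.items())
    probability = 1.0 / (1.0 + math.exp(-z))
    return {"probability": round(probability, 4), "label": int(probability >= 0.5)}
```

Keeping the function pure (input in, prediction out) is what lets the same code be logged, monitored and swapped for a retrained version without changes to the serving layer.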
DEMO
Summary
• Kubernetes is becoming the standard way to deploy models
• Kubernetes overview
• Overview of ML in production
• Monitor – why and how
• Live demo
• Deploy, monitor and retrain models using cnvrg
Thanks!
https://cnvrg.io
info@cnvrg.io
+972-506-660186
