Bridging DevOps and MLOps:
A Practitioner’s Guide
CNCF Istanbul, October 2024
☘️ Introduction
Cansu Kavili Örnek
github.com/ckavili
cansu@redhat.com
DevOps & Data Science
Let’s start
🥜 DevOps in a Nutshell
Plan & Code → Build & Test → Release & Deploy → Monitor & Learn
🤖 What is MLOps?
The practice of deploying machine learning models into production reliably and efficiently.
🥰 Why MLOps?
Because many of the challenges facing developers also apply to data scientists.
🙊 Common Challenges
● Siloed organizations and poor communication between teams
● “Works on my machine”
● Lacking the ability to properly test, deploy, and maintain software
● Not having access to decision makers
● Verifying that the deployed model/feature is still relevant
● Reproducibility, traceability, and explainability
● ...
🦄 MLOps Overview
The Machine Learning Lifecycle
● Gather and Prep Data (DataOps) — Data Engineering: data ingestion, data cleansing, data analysis, data transformation, data validation
● Training (Experimentation) — Data Science: data splitting, feature engineering, model development, model training, training optimization, model validation
● Deployment (MLOps) — Continuous Integration & Deployment: data preprocessing, app dev / heuristics, inferencing pipeline, deployment targets, deployment patterns
● Monitoring — Monitor / alerts: consumption & optimization metrics, satisficing (gating) metric, logging & visualization, explainability, interpolation, drift, decay, skew, shift, improvements
🦄 MLOps Overview
Operationalizing AI/ML requires collaboration.
Roles: app developer, ML platform engineer, data engineer, data scientist, ML engineer, business leadership
Stages: set goals → gather and prepare data → training → deployment → monitoring
Every member of your team plays a critical role in a complex process.
🦄 MLOps Overview
The Machine Learning Lifecycle: gather and prepare data → develop model → deploy model → monitor model → retrain model
🦄 MLOps Overview
We’ve seen this before...
ML: gather and prepare data → develop model → deploy model → monitor model → retrain model
DevOps: code → QA → deploy → operate & monitor → iterate
Data Science Essentials
🔥 Data Science Essentials
What is a model really?
In a nutshell, it is a set of parameters plus the algorithm or the neural network architecture, which can be packaged in a single (usually binary or compressed) file.
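To make that concrete, here is a minimal sketch, assuming scikit-learn and joblib are installed; the filename model.joblib is an arbitrary choice for illustration:

```python
from sklearn.linear_model import LogisticRegression
import joblib

# Train a tiny toy model: the learned parameters live inside the object.
model = LogisticRegression().fit([[0.0], [1.0], [0.0], [1.0]], [0, 1, 0, 1])

# Package parameters + algorithm into a single (binary) file.
joblib.dump(model, "model.joblib")

# Anyone with the file (and compatible library versions) can load and predict.
restored = joblib.load("model.joblib")
print(restored.predict([[0.7]]))
```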
🔥 Data Science Essentials
Model Training Overview
Raw data (squares, pentagons, triangles) → labeled data → split into training data and test data → model training → model artifact → model evaluation (e.g. 87% accuracy, 0.82 R-square, 0.032 MSE)
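As a rough illustration of the flow above, here is a hedged scikit-learn sketch; the iris dataset stands in for the labeled shapes, and the metric values will of course differ from the slide’s numbers:

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Labeled data, split into training and test sets.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Model training produces the model artifact.
model = RandomForestClassifier(random_state=42).fit(X_train, y_train)

# Model evaluation on the held-out test data.
accuracy = accuracy_score(y_test, model.predict(X_test))
print(f"{accuracy:.0%} accuracy")
```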
🔥 Data Science Essentials
Model Serving
Model → model in a container, exposed as an API; clients send input and receive a prediction
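A minimal serving sketch, assuming Flask (mentioned later in the package landscape) and a joblib artifact named model.joblib; a production setup would more likely use a model server such as KServe instead:

```python
from flask import Flask, jsonify, request
import joblib

app = Flask(__name__)
model = joblib.load("model.joblib")  # load the model artifact at startup

@app.route("/predict", methods=["POST"])
def predict():
    # Expect JSON like {"features": [[5.1, 3.5, 1.4, 0.2]]}.
    features = request.get_json()["features"]
    prediction = model.predict(features).tolist()
    return jsonify({"prediction": prediction})

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=8080)
```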
🔥 Data Science Essentials
What about LLMs?
ML Platform Engineer
Clark Kent
ML Platform Engineer
ckent@redhat.com
(123) 456-7890
Metropolis, NY
SKILLS
Kubernetes
Day Two Ops
GPU Management
GitOps
Containers
Python Package Management
CAREER OBJECTIVE
Enable data science teams to scale development and experimentation of data science projects using Kubernetes-native tooling to prove value with machine learning more rapidly.
WORK EXPERIENCE
● Created a GPU-enabled Kubernetes cluster to enable multiple data science teams to collaborate and rapidly iterate on data science experiments. Resulted in a reduction in the time from experiment idea to proof of value and an increase in the number of value-added experiments.
● Managed large multi-tenant environment and
provided best practices for managing shared
resources for large, distributed compute ML training
jobs, including GPU, CPU, and large memory pools.
This effort resulted in increased resource utilization,
and faster training times for ML jobs.
● Created multiple Jupyter notebook images with team-specific Python packages to increase collaboration and reduce the number of Python dependency issues.
● Established multi-cluster architecture to enable
training and deployment of ML models alongside
existing non-ML microservices.
🐈 ML Platform Engineer
🐈 ML Platform Engineer
Multi-Cluster Architecture
● ML Training Cluster (Experiment): multi-tenant projects, GitOps management, accelerated compute, S3-compatible storage, cluster monitoring, IDE, distributed ML training, ML pipelines
● Application Cluster (Deployment): multi-tenant projects, GitOps management, model serving, model monitoring, model explainability, S3-compatible storage, cluster monitoring
ML Engineer
🐈‍⬛ ML Engineer
Lois Lane
ML Engineer
llane@redhat.com
(123) 987-6543
Metropolis, NY
SKILLS
Kubernetes
GitOps
CI/CD Automation
Python
Testing
REST/gRPC APIs
Observability
CAREER OBJECTIVE
Help businesses evolve ML experiments into production-ready inference services by creating repeatable pipelines while building trust in ML services.
WORK EXPERIENCE
● Assisted data scientists in transforming ML experiments into production-ready models and deploying them as API endpoints consumable as microservices. Enabled the data science team to actualize the value of experiments and get models out of Jupyter.
● Created repeatable pipeline to orchestrate training
and deployment of ML model as a REST API.
Resulted in rapid iteration of ML model enabling
increased accuracy of predictions.
● Created robust ML testing to ensure code changes result in accurate ML models, and validated deployed models using blue/green deployment strategies. Resulted in a reduced number of rolled-back models and improved model accuracy against production results.
● Created observability with dashboards and alerts for model performance in relation to inference time, prediction accuracy, and impact on business objectives.
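The “robust ML testing” described above might look something like this minimal pytest sketch; the threshold, the artifact path, and the use of iris as a stand-in held-out test set are all hypothetical:

```python
import joblib
import pytest
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score

THRESHOLD = 0.85  # minimum acceptable accuracy before promotion (hypothetical)

@pytest.fixture
def model():
    # Candidate model artifact under test (hypothetical path).
    return joblib.load("model.joblib")

def test_model_meets_accuracy_threshold(model):
    # Stand-in for a real held-out test set.
    X, y = load_iris(return_X_y=True)
    assert accuracy_score(y, model.predict(X)) >= THRESHOLD
```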
🐈‍⬛ ML Engineer
ML Pipeline
Build training container → gather data → process data → train model → download existing model → compare new model with existing → deploy new model if better
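The “compare and deploy if better” gate at the end of the pipeline could be sketched in plain Python as below; evaluate() and the joblib paths are hypothetical stand-ins for real pipeline steps (for example, Kubeflow Pipelines components pulling artifacts from S3):

```python
import joblib
from sklearn.metrics import accuracy_score

def evaluate(model, X_test, y_test):
    # Score a model on the held-out test set.
    return accuracy_score(y_test, model.predict(X_test))

def promote_if_better(new_model, X_test, y_test, existing_path="model.joblib"):
    # Download (here: load) the existing production model.
    existing = joblib.load(existing_path)
    # Compare the new model with the existing one...
    if evaluate(new_model, X_test, y_test) > evaluate(existing, X_test, y_test):
        # ...and "deploy" the new model only if it is better.
        joblib.dump(new_model, existing_path)
        return True
    return False
```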
🐈‍⬛ ML Engineer
Python Package Landscape
Categories: model training, data tools (e.g. Polars), package management (e.g. Poetry), code quality and testing, other ML tools, miscellaneous
Show, Not Tell
❤️ opendatahub.io
github.com/opendatahub-io
DevOps methodology for ML models. Operationalize CI/CD pipelines for ML.
❤️ opendatahub.io
📖 What’s in the box?
● Scale distributed computing for AI; automatically adjust the underlying workers based on demand
● Multi-user Jupyter, used for data science and research
● Streamline the entire ML lifecycle; accelerate model development & deployment
● Design pipelines with drag-and-drop ease; Kubeflow integration
● Supports multiple ML frameworks; deploy and scale AI models quickly and efficiently
● Trusty AI: an AI explainability toolkit that aims to mitigate AI bias, enhancing trust and fairness in AI systems
🎶 Final things
Sounds interesting?
● Try Open Data Hub
● Try Red Hat Developer Sandbox - OpenShift AI Playground
● Check AI on OpenShift
● Join upstream communities like Kubeflow, KServe, TrustyAI
Thank you!
@ckavili
cansu@redhat.com
linkedin.com/in/ckavili

Editor's Notes

  • #3 I’m not a data scientist. I took several statistics classes and used some Python NumPy libraries for a project back in university, but that’s all :D Luckily I have a sister who helps me understand :) I do know DevOps, though, and while these may seem like two different worlds, there are overlaps, and DevOps practices can help data science processes. That’s the part I’m going to focus on. But first, a bit of definition: because data science is rapidly becoming embedded in our applications, deploying and verifying it quickly and robustly is getting more and more important. What does this overlap look like?
  • #4 The super simple cycle of DevOps.
  • #5 Key Points:
    - MLOps is an evolution of DevOps principles and tries to apply many of the same capabilities to machine learning: collaboration, automation, continuous improvement.
    - Many technologies overlap with traditional DevOps tools, such as containers, but data science capabilities often require unique tooling or processes.
    - The primary goal of MLOps is to help get past the experiment phase and deploy machine learning models so that business value can be actualized.
  • #7 Make this one better: siloed org -> data analysts, data scientists, IT infra, and business people/decision makers.
  • #8 Key Points:
    - High-level overview of the phases.
    - In 2024 most companies are still struggling to get out of the experiment phase.
    - Data scientists know how to build models but don’t know how to deploy and integrate them into traditional applications.
    - MLOps is primarily focused on automating training, deployment, and monitoring.
    - A good MLOps implementation can help decrease time to production and allow model development to iterate quicker, improving inference accuracy.
  • #10 Here’s the technical detail of the lifecycle of machine learning. It begins with codifying the problem and then defining some key metrics. For example, as a supermarket, you might have a problem with shoplifters, especially at self checkout. What metrics could you use to inform a decision about whether or not someone is stealing groceries? What data do you have available to you? This leads to a team working on collecting and cleaning the data that’s available to provide it to a data scientist. The data scientist or team of scientists tries to come up with some ways to infer the likelihood of shoplifting, trains models based on data, and does some testing. Ultimately, those models have to be deployed somewhere so that they can serve their predictions in the greater context of the intelligent application. And, those models need to be continually monitored to validate whether or not they are performing as needed.
  • #13 Interestingly, we believe that this machine learning lifecycle looks exactly like any other software development lifecycle. And, Red Hat has developed a powerful container platform in OpenShift that provides tremendous benefit to the software, or machine learning, development lifecycle.
  • #14 No matter what role you play in a team, you need some fundamental knowledge of how data science works.
  • #15 Key Points:
    - Models start as an architecture defined in code, which is trained using large amounts of data to determine the best parameters to predict the desired outcome.
    - Models can be a number of different types, from traditional statistical models to complex neural networks.
    - After models are trained, they can be saved as an artifact, and that artifact can be deployed for future inferences.
  • #16 Process Overview:
    - Start with a collection of raw data (shapes).
    - A human adds labels to all of the shapes (this is what we want the model to predict).
    - Split the data into training and test datasets.
    - Build a model and process the training data through it, iterating over the data so the model makes small improvements at each step, repeating until it is good. This is the most compute-intensive part of training and primarily where GPUs are utilized.
    - After the model is trained, use the reserved test dataset to evaluate it.
  • #17 Process Overview:
    - The model artifact created in the training process is loaded into a container and served with an API endpoint.
    - Model artifacts can be built into the container image or dynamically loaded at runtime from S3.
    - Traditional applications can send a request to the API endpoint to have it perform an inference on a single point of data.
    - The prediction is returned to the client application, which can use it to take some form of action.
  • #18 Key Points:
    - LLMs are the fastest-growing area of interest in machine learning.
    - LLMs are something that everyone is interested in talking about, but there is still a ton of problems to be solved by non-LLM ML solutions.
    - LLMs introduce a large number of new concepts and require new capabilities compared to traditional ML.
    - Attend the Delivering Generative AI Chatbots talk to learn more.
    - Trevor’s personal opinion: LLMs are going to become a commodity solution before tooling for custom LLM development and deployment matures enough to make it truly viable for most customers.
  • #20 Key Points:
    - This is the role that RH consulting is most likely to play to help enable our customers to succeed; a large portion of current OpenShift consultants should be able to help deliver in this role.
    - New concerns for ML platforms: management of GPU resources adds new capabilities and complexities. They are expensive resources, and companies want to manage them effectively.
    - Model training can require massive amounts of GPU, CPU, and memory, both in single containers and in massive distributed jobs. Resource planning for burstable workloads can be very different from resource planning for long-lived applications.
    - Data scientists need help managing container images and the Python packages in them used for development.
  • #21 Key Points:
    - It is not uncommon to separate training and deployment of ML models onto different clusters. Many customers we work with already have one or more application clusters to which we add serving capabilities.
    - Training is a resource-intensive task that can potentially disrupt traditional workloads or cause resource contention issues.
    - Training is more likely to require GPU resources. Most serving use cases don’t require GPUs, except for very large, compute-intensive models like LLMs.
    - Training environments may also have very different data-access requirements compared to an application environment. Just as customers may need multiple application clusters for different tiers (dev/test/prod), they may need multiple training environments with different levels of access depending on data-access requirements.
  • #23 Key Points:
    - ML engineer is a role that requires skills with K8s, data science, automation, and solid Python. Python skills are critical: much of the focus of this role is taking data science experiment code and converting it into something that is maintainable in the long run.
    - Key deliverables: deploying models as a REST API; automation for training and deploying models; helping get data science processes out of Jupyter notebooks; creating observability of models and providing insights into when models need to be retrained.
    - For Red Hat this is an area where we tend to help deliver and skill up our customers as part of a “Phase II” engagement after the platform is deployed.
  • #24 Key Points:
    - Automation of the model training process is one of the major deliverables for ML engineering.
    - Many of these steps are defined and created by the data scientists, but as ML engineers we need to help make those steps maintainable and repeatable.
  • #25 Key Points:
    - Python skills are critical to the ML engineer path, and these are all tools an ML engineer should have some level of familiarity with.
    - Model Training: SKLearn is the go-to tool for most traditional classical statistical models and can handle 90% of the real-world problems customers are facing. PyTorch and TensorFlow are the most talked-about frameworks since they are deep learning frameworks; PyTorch seems to be taking over in popularity. Keras used to be a third-party tool for TensorFlow but now comes out of the box with TF2.
    - Data Tools: 90% of data science is working with data, not model development. If you were to learn only a single tool on this list, it should be Pandas. Polars is a neat project that provides possible performance gains over Pandas in a Pandas-compatible SDK. Dask is another alternative to Pandas that provides distributed capabilities. NumPy helps power them all. Plotly, Matplotlib, and Seaborn are all data visualization libraries.
    - Other ML Tools: lots of specialized ML tools designed to solve specific problems. ONNX is a cross-framework model format that is heavily utilized in OpenShift AI. The Kubeflow Pipelines SDK and Airflow both provide pipeline capabilities, with lots of overlap and distinct benefits. Ray/CodeFlare for distributed compute and model training. MLflow is the industry-leading model registry for tracking model training metadata and artifacts; RH has its own model registry tool planned for development.
    - Package Management: Python’s package management is one of the worst parts of the language. Pip is the default package manager: easy to get started with, but it can create long-term problems with dependency management. Pipenv and Poetry are both popular and powerful alternatives to pip; highly recommend learning one. Anaconda is a third-party tool that became popular due to early challenges with installing ML packages, and is less important in modern times.
    - Code Quality and Testing: one of the most undervalued aspects of ML development; expect this area to grow rapidly as more traditional development practices get pulled into ML development. Black: opinionated code formatting, it just works. PyLint/Flake8: Python style-guide enforcement; familiarize yourself with PEP 8 standards. Ruff: new kid on the block that does everything the other tools do, but faster. PyTest: Python code testing framework; testing is one of the most underrepresented skills in ML. Great Expectations: data quality testing framework. Tox: Makefiles for Python.
    - Miscellaneous: Requests: curl for Python. Boto3: tool for interacting with S3 buckets. Pickle: standard library in Python used for serializing files. Loguru: no-nonsense log configuration. Flask/FastAPI/Litestar: all three are API frameworks. Flask is the most popular and mature framework but lacks asynchronous capabilities; FastAPI is gaining a lot of popularity but has the challenge that its creator does not accept many community PRs; Litestar is newer and attempting to be a better FastAPI with strong community support. Good to know at least one.
  • #27 Key Points:
    - DevOps practices should be used to improve quality and speed up delivery of your machine learning models. Leveraging a containerized environment (i.e., Kubernetes, OpenShift) to build, train, and deploy models is becoming the ideal. You need an ML software tool chain (e.g., TensorFlow, Jupyter notebooks, Python, etc.) and data services (e.g., SQL Server, NoSQL, data lakes, etc.).
    - Open Data Hub is a reference architecture based on open source community projects that helps demonstrate the value of the Red Hat portfolio and open source technologies to accelerate the AI/ML lifecycle. Deployment of the various components of Open Data Hub is fully automated with an Open Data Hub Kubernetes operator.
    - It is an open source project based on Kubeflow that provides open source AI tools for running large and distributed AI workloads on OpenShift Container Platform.
    - Goals: provide an end-to-end AI/ML platform on OpenShift; easy operator deployment for the platform on OCP; tools for each stage in the AI/ML platform and for all AI/ML user personas, optimized for OpenShift; monitoring tools for models and services used by DevOps; development tools for data scientists; ETL tools used by data engineers; AI/ML pipelines and long processing tasks.