Mykola Mykytenko: MLOps: your way from nonsense to valuable effect (approaches, cases, tools)
AI & BigData Online Day 2021
Website - https://aiconf.com.ua/
Youtube - https://www.youtube.com/startuplviv
FB - https://www.facebook.com/aiconf
1. MLOps: your way from
nonsense to valuable effect
Approaches, cases, tools
2. About the speaker
Head of the Data Science architecture at
“First Ukrainian International Bank”
Mykola Mykytenko
Experience:
• 10+ years in software development
• Construction of high-load BI solutions
• Construction of architecture for Data Science applications and
processes
3. 30
років на ринку
243
1 825
млн. грн податків
сплатив банк в
2020 році
топ-5
в щорічному рейтингу
«50 провідних банків
України»
4-й
банк в Україні за
об'ємом чистого
прибутку за 2020 рік
8,5
1,6
відділення в
Україні
тисячі
співробітників
мільйонів
роздрібних клієнтів
Найкомфортніших
банків, лідер в номінації
«Зручність» за версією
Forbes Україна
топ-3
ПУМБ - НАЙБІЛЬШИЙ БАНК З УКРАЇНСЬКИМ
ПРИВАТНИМ КАПІТАЛОМ
9. No ML model in production,
No Happy Business!!!
Models can only add value to an organization
when insights are regularly available to end users.
No Happy Client!!!
11. In which phase do ML projects stall?
Only a fifth of ML projects are not delayed.
46% of ML projects are ready for production deployment but stall for some reason.
Alegion, Dimensional Research’s “What data scientists tell us about AI model training today”
12. Developing and deploying a model is not a one-man story
See in screens in November 2021:
the Data Scientist who deploys to production by himself
13. In which phase do ML projects stall?
Only a fifth of ML projects are not delayed.
46% of ML projects are ready for production deployment but stall for some reason.
O’Reilly Media, “What Is MLOps?” by Mark Treveil and Lynn Heidmann
14. Wide range of
ML system
elements
• To develop and operate complex systems like these, you need to add additional principles such as CI, CD, and CT (continuous training).
Google, Hidden Technical Debt in Machine Learning Systems
15. ML projects have a long dev lifecycle
• A lot of stages
• A lot of different tasks
• A lot of knowledge areas
• A lot of a lot
Setting Up Machine Learning Projects. Full Stack Deep Learning
16. How long does it take to deploy an ML model into production?
Algorithmia’s “2020 State of Enterprise ML”
ML model deployment timeline
17. What can help deal with it?
• MLOps is a set of practices at the intersection of Machine Learning, DevOps, and Data Engineering
• MLOps is a set of policies, practices, and governance for managing Machine Learning solutions throughout the lifecycle
18. Benefits of
MLOps
• Accelerate time-to-value
• Improve confidence in models and the quality of production
• Optimize team productivity, reducing development time by up to 80%
• Faster response times
• Improve compliance with regulatory requirements
• Manage infrastructure
Fast development
Fast response times
Quality of production models
Compliance with
regulatory governance
19. Key points of MLOps
• Organizational
• Governance
• Teams
• Communications
• Resources
• People
• Technology
• Approaches
• Frameworks
20. How does it work?
• Focus on improving communication
• Manage and remove duplicated routines
• Focus on collaboration
• Build a platform and centralize the framework
• Build automated processes
• Implement approaches and tools
“What is MLOps”, Neal Analytics blog
22. MLOps level 0
• Every step is manual and requires manual execution
• Transitions between steps are manual
• Disconnection between Data Scientists and engineers leads to training-serving skew
• Infrequent release iterations: a new model is deployed a couple of times per year
• CI is ignored because there are few changes; testing is part of the notebooks
• CD is ignored because model version deployments aren't frequent
• Deployment refers to the prediction service, not the entire ML system
• Lack of performance monitoring: logs, predictions, and actions are not tracked
[Diagram: offline data → manual data processing → manual model development → trained model → model registry → model serving → prediction service; experimentation/development/test on one side of the MLOps boundary, staging/preproduction/production on the other]
24. What makes a Data Scientist numb with horror?
[Image: the magician of mathematics and algorithms is at ease with Modeling but dreads Deploy]
25. The first step in determining how to deploy a model is understanding how end users should interact with that model's predictions:
• Offline (Batch)
• Online (Web service)
• Real-time processing (Embedded)
26. How do we see ML applications differently
Data Scientist vs. Customer
27. Offline (Batch)
• Runs on bulk data
• Runs on a schedule
• No real-time requirement
• Resources can be managed flexibly
• The job can be rerun in case of error
• Well-trained model
[Diagram: the orchestrator triggers the batch job pipeline, which gets data from the DWH or Data Lake, calls the ML model, and saves predictions, data, and logs back to the DWH or Data Lake]
28. Offline (Batch)
• Use case: collection return delay score (a minimal DAG sketch follows the diagram)
• The whole batch is predicted in 5-10 minutes end-to-end
• Data is always ready for the business
[Diagram: Airflow triggers a Python app on a VM host; the app gets data from the DWH, loads the model from the model store, runs the ML model, and saves predictions, data, and logs to the Data Lake]
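To make the batch pattern above concrete, here is a minimal sketch of such a pipeline as an Airflow 2.x DAG. The paths, data layout, and model format are hypothetical placeholders for illustration, not the actual FUIB implementation:

```python
# Minimal batch-scoring DAG (Airflow 2.x); paths and schedule are assumptions.
from datetime import datetime

import joblib
import pandas as pd
from airflow import DAG
from airflow.operators.python import PythonOperator

MODEL_PATH = "/models/collection_delay/latest.joblib"    # model store (hypothetical)
INPUT_PATH = "/data/dwh/scoring_batch.parquet"           # data exported from the DWH
OUTPUT_PATH = "/data/lake/predictions/scoring_batch.parquet"

def score_batch() -> None:
    """Load the trained model, score the whole batch, save predictions."""
    model = joblib.load(MODEL_PATH)
    batch = pd.read_parquet(INPUT_PATH)
    batch["prediction"] = model.predict(batch)
    batch.to_parquet(OUTPUT_PATH)   # predictions land in the Data Lake for the business

with DAG(
    dag_id="batch_scoring",
    start_date=datetime(2021, 1, 1),
    schedule_interval="@daily",     # "run on schedule"; a failed run can simply be rerun
    catchup=False,
) as dag:
    PythonOperator(task_id="score_batch", python_callable=score_batch)
```

Because the job runs offline, a failure costs nothing but a rerun, which is exactly why this is the most forgiving deployment mode.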
29. Online (sync API)
• Runs on the current case
• Ready for 24/7 execution
• The application depends on the result
• The user is able to take action
• No flexible resource management
• Not possible to fix a production error after the fact
• Latency is important
[Diagram: clients call a load balancer in front of the ML model on a cluster and wait for the response; call, prediction, and performance logs go to the Data Lake]
30. Online (sync API)
• Use case: customer fraud detection (a serving sketch follows the diagram)
• Hundreds of milliseconds for an answer
• Dynamic cluster management and balancing
• High governance risk
[Diagram: a Kubernetes cluster with Nginx in front of app replicas; the actions app calls and waits for the "Fraud or not?" answer; call, performance, prediction, and container logs go to the Data Lake]
"Data Science on guard of the law: how we automated financial monitoring at PUMB"
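A minimal sketch of the sync-API pattern using FastAPI. The deck shows app replicas behind Nginx on Kubernetes; the framework choice, endpoint, and feature names here are assumptions for illustration only:

```python
# Minimal sync scoring service (FastAPI); schema and paths are assumptions.
import joblib
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()
model = joblib.load("/models/fraud/latest.joblib")  # load once at startup, never per call

class Case(BaseModel):
    amount: float
    merchant_category: int

@app.post("/predict")
def predict(case: Case):
    # The client blocks on this response, so only the model call sits on the hot
    # path; call/predict/performance logs should ship asynchronously to the Data Lake.
    score = float(model.predict_proba([[case.amount, case.merchant_category]])[0, 1])
    return {"fraud_probability": score}
```

Running several replicas behind the load balancer (e.g. `uvicorn app:app --workers 4`, or as Kubernetes pods) is what keeps latency inside the hundreds-of-milliseconds budget.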
31. Online (async API)
• Runs on the current case
• Ready for 24/7 execution
• The application does not depend on the result (see the sketch after the diagram)
• The result is not critical
• Possible to manage resources
• Not possible to fix production errors after the fact
• Latency is not so important
[Diagram: clients make async calls through an async framework and a load balancer to the ML model on a cluster]
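A minimal sketch of the async pattern: the client submits a case, immediately receives a job id, and fetches the result later. The in-memory result dict stands in for whatever queue or store a real async framework would use; all names are hypothetical:

```python
# Minimal async scoring service (FastAPI BackgroundTasks); names are assumptions.
import uuid

import joblib
from fastapi import BackgroundTasks, FastAPI
from pydantic import BaseModel

app = FastAPI()
model = joblib.load("/models/scoring/latest.joblib")
results = {}  # stand-in for a real result store / message broker

class Case(BaseModel):
    features: list

def score(job_id: str, features: list) -> None:
    results[job_id] = float(model.predict([features])[0])

@app.post("/predict-async")
def submit(case: Case, background_tasks: BackgroundTasks):
    job_id = str(uuid.uuid4())
    background_tasks.add_task(score, job_id, case.features)
    return {"job_id": job_id}   # the application does not wait for the result

@app.get("/result/{job_id}")
def result(job_id: str):
    return {"prediction": results.get(job_id)}  # None until the job finishes
```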
32. Real-time
• Runs on the current case
• Ready for 24/7 execution
• High load
• The application depends on the result
• Impossible to manage resources
• Not possible to fix production errors after the fact
• Latency is critical
[Diagram: clients → application engine → real-time processing app with an embedded ML module]
Frameworks: jep, jpy, Jython, jni
33. Real-time
• Use case: transaction fraud (a module sketch follows the diagram)
• Tens of milliseconds for an answer
• Thousands of transactions per second
[Diagram: transactions flow into the processing center's application engine, which runs the processing business workflow with an embedded ML module]
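A sketch of the Python side of the embedded pattern: a module the host processing engine could import once (e.g. through jep from the JVM) and then call per transaction. The model path and feature layout are assumptions:

```python
# tx_scoring.py: imported once by the host engine's embedded interpreter
# (e.g. via jep from the JVM); model path and feature layout are assumptions.
import joblib
import numpy as np

# Loaded once at engine start-up: per-transaction calls must fit a
# tens-of-milliseconds budget, so no disk or network access on the hot path.
_model = joblib.load("/models/tx_fraud/latest.joblib")

def score_transaction(amount: float, mcc: int, hour: int) -> float:
    """Return a fraud score for one transaction; called thousands of times per second."""
    features = np.array([[amount, mcc, hour]])
    return float(_model.predict_proba(features)[0, 1])
```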
34. MLOps level 1
• Achieved goal: model in production
• Automatically build, test, package, and deploy
• Unit testing of serving scripts
• Testing integration between components
• Testing that the model does not produce bad values or NaN (a pytest sketch follows the diagram)
• Testing that components produce artifacts
• Monitor the model
[Diagram: manual data processing and model development feed CI; CD deploys the automated training pipeline, which produces the trained model into the model registry; CD handles model serving to the prediction service with performance monitoring; experimentation/development/test vs. staging/preproduction/production]
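As an illustration of the "model does not produce bad values or NaN" check, a minimal pytest sketch that CI could run on every commit; the model path and smoke sample are hypothetical:

```python
# test_model_sanity.py: run by CI on every push/merge; paths are hypothetical.
import math

import joblib
import pandas as pd

def test_predictions_are_valid_probabilities():
    model = joblib.load("/models/fraud/candidate.joblib")
    sample = pd.read_parquet("/data/tests/smoke_sample.parquet")
    scores = model.predict_proba(sample)[:, 1]
    assert not any(math.isnan(s) for s in scores), "model produced NaN"
    assert all(0.0 <= s <= 1.0 for s in scores), "score outside [0, 1]"
```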
35. CI/CD
• Everything is done automatically; the Data Scientist and the business get notifications with the result
• The pipeline and its components are built, tested, and packaged when new code is committed, pushed, or merged
• Train the model on huge data if required
• Spend time only on model analysis
• Test it well
[Diagram: a push or merge event on the source code triggers the pipeline; the model and its metadata are fetched from the model registry and metadata store, integration-tested, and released to model serving; the prediction service is monitored and feeds the metadata store]
36. Testing
• Testing is crucial to ensure that
frequent changes aren’t sacrificing
quality
• Software testing is well studied
“The ML Test Score: A Rubric for ML Production Readiness and Technical Debt Reduction”
37. ML model testing
• The intersection of machine learning and testing is less well explored in the literature
• ML system testing is more complex
• ML systems depend strongly on data and models (and cannot be strongly specified)
• ML models need debuggability, rollbacks, and monitoring
• 28 actionable tests at Google
“The ML Test Score: A Rubric for ML Production Readiness and Technical Debt Reduction”
38. Testing pyramid
• Contains Data, ML, UI, Integration, Performance, and Governance testing
• More types of testing will make you rethink your Test Pyramid
“Continuous Delivery for Machine Learning”, Danilo Sato, Arif Wider, Christoph Windheuser
39. Monitoring
• Based on data (see the sketch below)
• Based on the ML model
• Based on the system
• Contain at least 2-3 metrics for each of the three categories
• Operational metrics should be monitored in real time, or at least daily
• Stability and performance metrics on a larger time frame depending on the domain (weekly, etc.)
Abzooba, “MLOps: Model Monitoring 101” by Pronojit Saha and Dr. Arnab Bose
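As one possible "based on data" metric, here is a sketch of the Population Stability Index (PSI), a common drift measure; PSI is not named on the slide and is used here only as an assumed example:

```python
# Population Stability Index: compares a feature's production distribution
# against the training-time one; an assumed example of a data-based metric.
import numpy as np

def psi(expected: np.ndarray, actual: np.ndarray, bins: int = 10) -> float:
    edges = np.quantile(expected, np.linspace(0, 1, bins + 1))
    edges[0], edges[-1] = -np.inf, np.inf       # catch values outside the training range
    e_pct = np.histogram(expected, edges)[0] / len(expected)
    a_pct = np.histogram(actual, edges)[0] / len(actual)
    e_pct = np.clip(e_pct, 1e-6, None)          # avoid log(0) on empty bins
    a_pct = np.clip(a_pct, 1e-6, None)
    return float(np.sum((a_pct - e_pct) * np.log(a_pct / e_pct)))
```

A common rule of thumb: PSI below 0.1 is stable, 0.1-0.25 deserves a look, and above 0.25 suggests investigating or retraining.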
40. Monitoring influence
• Model performance monitoring is implemented by tracking a performance metric
• A trigger based on a performance metric decrease calls automated model retraining and decision making (see the sketch below)
INNOQ, “MLOps Principles” by Dr. Larysa Visengeriyeva, Anja Kammer, Isabel Bär, and others
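A minimal sketch of such a trigger: when the tracked metric drops below an acceptance threshold, the automated training pipeline is kicked off. The threshold value and the pipeline hook are assumptions, not values from the talk:

```python
# Threshold trigger: performance decrease -> automated retraining.
AUC_THRESHOLD = 0.75  # assumed acceptance level for the deployed model

def check_and_retrain(current_auc: float, trigger_training_pipeline) -> bool:
    """Kick off the automated training pipeline when performance degrades."""
    if current_auc < AUC_THRESHOLD:
        trigger_training_pipeline()   # e.g. an orchestrator / CI API call (hypothetical)
        return True
    return False
```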
41. MLOps level 2
• Supports interactive execution for quick experimentation and long-running jobs
• Provides data connectors to a wide range of data sources
• Provides efficient data transformation and feature engineering
• Supports scalable batch and stream data
• Supports a feature and metadata store (a toy sketch follows the diagram)
• Supports Data Engineer and Data Scientist task management
[Diagram: data analysis and manual model development feed CI and CD pipeline deployment; the trained model goes to the model registry and is served to the prediction service with performance monitoring; a feature store provides batch fetching]
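To illustrate the feature-store idea, here is a toy in-memory store that serves the same feature definitions for both batch fetching and online lookup. Real systems (e.g. Feast) add time-travel and storage backends; this is only a sketch, and all names are hypothetical:

```python
# Toy feature store: one set of feature definitions, two access paths.
import pandas as pd

class MiniFeatureStore:
    def __init__(self) -> None:
        self._groups = {}  # feature group name -> DataFrame indexed by entity id

    def register(self, group: str, df: pd.DataFrame, entity_key: str) -> None:
        self._groups[group] = df.set_index(entity_key)

    def batch_fetch(self, group: str, entity_ids: list) -> pd.DataFrame:
        """Batch fetching for training and batch scoring."""
        return self._groups[group].loc[entity_ids]

    def online_get(self, group: str, entity_id) -> dict:
        """Single-entity lookup for online serving."""
        return self._groups[group].loc[entity_id].to_dict()
```

The point of the pattern is that training and serving read the same feature definitions, which removes one source of training-serving skew.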
42. Data processing platform
• Keeps all data changes (a versioning sketch follows)
• Makes it possible to repeat experiments
• Data connected to the model
• Tracks the journey of the data
• Visualizes the complete data flow
• Implements low-risk changes
• Tracks errors in data processes
• Rolls out and rolls back steps
• Data discovery
Pachyderm, “Pachyderm overview”
Data Versioning · Data Pipelines · Data Lineage
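A minimal illustration of the data-versioning idea behind platforms like Pachyderm: content-address each dataset snapshot by a hash so an experiment can be repeated against exactly the data it saw. This is a toy, not Pachyderm's API:

```python
# Content-addressed dataset snapshots: the version id is the hash of the data,
# so "roll back" and "repeat experiment" mean reading an exact, immutable version.
import hashlib
from pathlib import Path

def snapshot(data_path: str, registry: dict) -> str:
    """Register a dataset version; returns the id to record next to the model."""
    version_id = hashlib.sha256(Path(data_path).read_bytes()).hexdigest()[:12]
    registry[version_id] = data_path   # lineage: version id -> source data
    return version_id
```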
43. Two loops
“Completing the Machine Learning Loop”, Jimmy Whitaker
• Dealing with two moving pieces: Code and Data
• The Code Loop is crucial for ML model stability and efficiency
• The Data Loop is essential for improving model quality and relevance
44. MLOps level 3
• Automated ML pipelines let Data Scientists rapidly explore new ideas around feature engineering, model architecture, and hyperparameters
• Data Scientists can implement ideas and deploy them to target environments automatically
[Diagram: a trigger starts automated model development on analyzed data; CI and CD pipeline deployment produce the trained model into the model registry, with batch fetching and performance monitoring; experimentation/development/test vs. staging/preproduction/production]
45. Automated CI/CD ML pipeline
• The end-to-end automated pipeline consists of 6-8 steps
• The data analysis step is still a manual process for the Data Scientist before a new pipeline iteration
• The model analysis step is also a manual process
Google, “MLOps: Continuous delivery and automation pipelines in machine learning”
46. Continuous Delivery for Machine Learning (CD4ML)
A software engineering approach in which a cross-functional team produces machine learning applications based on code, data, and models in small and safe increments that can be reproduced and reliably released at any time, in short adaptation cycles.
“Continuous Delivery for Machine Learning”, Danilo Sato, Arif Wider, Christoph Windheuser
End-to-end Continuous Delivery process for ML
47. Governance responsibilities
• Regulatory compliance
• Reproducibility and traceability
• Audit and documentation
• Pre-production verification
• Transparency and explainability
• Quality and compliance
O’Reilly Media, “What Is MLOps?” by Mark Treveil and Lynn Heidmann
48. MLOps Platform
[Diagram: an on-premises platform with Compute (CPUs, GPUs) and Storage (HDFS) layers; stages Data Preparation → Build → Train → Deploy; collaboration and monitoring across Businesses, Domain Experts, Data Engineers, Data Scientists, and ML Architects]
49. Thank you for your attention!
WWW.PUMB.UA
Mykola Mykytenko
Nikolay.Mikitenko@fuib.com
info@fuib.com
51. Links
1. Перший Український Міжнародний Банк (First Ukrainian International Bank)
2. VentureBeat report, Transform 2019
3. Alegion and Dimensional Research, survey “What data scientists tell us about AI model training today”
4. O’Reilly Media, “What Is MLOps?” by Mark Treveil and Lynn Heidmann
5. Google, “Hidden Technical Debt in Machine Learning Systems”
6. Full Stack Deep Learning, “Setting Up Machine Learning Projects”
7. Algorithmia, “2020 State of Enterprise ML”
8. Neal Analytics blog, “What is MLOps”
9. “Data Science on guard of the law: how we automated financial monitoring at PUMB”
10. “The ML Test Score: A Rubric for ML Production Readiness and Technical Debt Reduction”
11. “Continuous Delivery for Machine Learning”, Danilo Sato, Arif Wider, Christoph Windheuser
12. Abzooba, “MLOps: Model Monitoring 101” by Pronojit Saha and Dr. Arnab Bose
52. Links
13. INNOQ, “MLOps Principles” by Dr. Larysa Visengeriyeva, Anja Kammer, Isabel Bär, and others
14. Pachyderm, “Pachyderm overview”
15. “Completing the Machine Learning Loop”, Jimmy Whitaker
16. Google, “MLOps: Continuous delivery and automation pipelines in machine learning”