Building a scalable & open source ML platform
Cristiano Rocha and Bart Buter
Agenda
• Introduction to MLOps
• Use case: ING ML batch platform
• Exploration to production
• Considerations
About us…
Sander van Donkelaar
Machine Learning Engineer
Xebia Data
Cristiano Rocha
Principal AI & Data engineer
Xebia Data
Bart Buter
Batch Execution Platform P.O. & Analytics Platform Adoption Lead
ING
The machine learning lifecycle
But machine learning in production is hard…
• Deploying a model requires
significant effort on the IT
infrastructure side.
• Difficult for other applications to
leverage produced insights.
• Data scientist != MLOps Engineer
• Deploying a model requires a different
skillset than building it.
• No CI/CD
• Lack of collaboration
• Lack of best practices
• No standardization
Productionalizing ML: market observations
People
Technology
Processes
• Flexible
• Scalable
• Innovative
• Cross-functional
• Self-reliant
• Collaboration
• Standardized
• Automated
• Self-service
• Repeatable
MLOps: the ideal world
People
Technology
Processes
MLOps: business value for ING
Auditability
• Increased efficiency &
reusability.
• Reduced time to value
• Rapid onboarding of
new use-cases.
• Automation
Speed Quality
• Incorporate software
engineering best
practices
• Consistency
• Reliability
• Transparency
• Model and data
lineage
• Reproducibility
Security
ING: some extra requirements for MLOps
• Self-service: Teams should be able to train and deploy their own models without relying on support from ops teams.
• Open-Source / (Private) Cloud: Solution should fully leverage open-source and cloud-native tooling (Kubernetes, MLFlow, Airflow, etc.).
• Scalable: It should be possible to dynamically distribute and scale workloads so that resources are used efficiently.
• Multi-tenancy: Solution should serve multiple customers. Permissions should be determined on the basis of role and group memberships.
ING Analytics Platforms
Exploration
Execution
Private cloud Public cloud
API
Streaming
Exploration &
Model Development
Batch
Public Cloud Exploration
Public Cloud Execution
MLOps maturity levels
(Figure: platform maturity plotted against time.)
Levels: Experiment, Consistent, Reliable, Scalable & Optimized
• Sandbox environment
• Standardize deployment and onboarding processes
• Implement quality checks, introduce data and model lineage. Introduce automatic retraining.
• Self-service ML
• Standardize and centralize monitoring of ML models.
• Automated ML solution onboarding
ML model lifecycle
Open source software
Cons
• (Usually) not enterprise-ready, as it requires
modifications or extensions to meet organizational
requirements
• Higher overhead (e.g., deployment in more complex
scenarios)
• Lack of vendor support for critical issues (e.g., SLAs)
Pros
• Modular: solves a specific purpose
• Community: embedded best practices and more
support for questions and issues
• Flexibility through customization
• Quicker updates and vulnerability fixes
• Reduced vendor lock-in risks
• Lower vendor costs
• Reduces integration costs
Open source tools & frameworks 1/2
• Kubernetes: System for orchestrating and automating deployment, scaling and management of containerized applications.
• MLFlow: Platform for managing the end-to-end machine learning lifecycle (including experiment tracking, model management, deployment and registry).
• Airflow: Platform for creating, scheduling and monitoring workflows. When such workflows are defined as code, they become more maintainable, versionable, testable, and collaborative.
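To illustrate why workflows defined as code are easier to test and version, here is a minimal sketch that uses only the Python standard library. It is not Airflow's API; the task names and the `PIPELINE` mapping are hypothetical and only show the idea of declaring task dependencies as data that can be checked in a unit test.

```python
from graphlib import TopologicalSorter

# Hypothetical batch pipeline: each task maps to the set of tasks it depends on.
PIPELINE = {
    "extract_features": set(),
    "train_model": {"extract_features"},
    "evaluate_model": {"train_model"},
    "register_model": {"evaluate_model"},
}


def execution_order(pipeline):
    """Return a valid run order; raises graphlib.CycleError on circular dependencies."""
    return list(TopologicalSorter(pipeline).static_order())


def run(pipeline, tasks):
    """Call each task in dependency order and return the order that was used."""
    order = execution_order(pipeline)
    for name in order:
        tasks[name]()
    return order


if __name__ == "__main__":
    # Placeholder callables stand in for real training or scoring steps.
    noop_tasks = {name: (lambda: None) for name in PIPELINE}
    print(run(PIPELINE, noop_tasks))
```

Because the dependency graph is plain data, a misconfigured pipeline (for example a circular dependency) can be caught by a test in CI before anything is scheduled.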
Open source tools & frameworks 2/2
• Spark: Engine for executing data engineering, data science and machine learning workloads in a distributed manner, for fast analytic queries against data of any size.
• NGINX: HTTP and reverse proxy server. As a reverse proxy, it sits in front of servers and forwards client requests to those servers. Typically, it is added to help increase security and reliability.
• Open Policy Agent: General-purpose engine to enforce policies in a decoupled manner. OPA provides a declarative language that lets you specify policy as code, and simple APIs to offload policy decision-making from application logic.
ML model lifecycle
ML Batch Exploration
Architecture
ML Batch Execution
Architecture
Role-based access control on MLFlow OSS
Open Policy Agent policies description (policies.rego)
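The policy file itself is not reproduced in this text. As a rough sketch of how a component placed in front of MLFlow could ask OPA for a decision, the example below uses OPA's REST Data API (a POST with an `input` document that returns a `result` field). The OPA address, the policy path `mlflow/authz/allow`, and the shape of the input document are assumptions made for illustration, not the setup shown in the talk.

```python
import json
import urllib.request

# Assumption: OPA runs locally and exposes a boolean rule at this hypothetical path.
OPA_URL = "http://localhost:8181/v1/data/mlflow/authz/allow"


def build_input(user, groups, method, path):
    """Build the input document that the (hypothetical) policy evaluates."""
    return {
        "input": {
            "user": user,
            "groups": sorted(groups),
            "method": method,
            "path": path,
        }
    }


def is_allowed(opa_response):
    """Deny by default: only an explicit `true` result grants access."""
    return opa_response.get("result") is True


def query_opa(payload, url=OPA_URL, timeout=2.0):
    """POST the input document to OPA and return the decoded JSON response."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

A caller would combine the pieces as `is_allowed(query_opa(build_input(...)))`. Treating a missing `result` as a denial matters because OPA omits that field when the queried rule is undefined.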
Golden path template
Template for training a model
Template for promoting models to Production
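The promotion template is only shown as an image in the original slides. One way such a template might guard promotion is a simple quality gate, sketched below under stated assumptions: `ModelVersion`, `should_promote`, and the `min_gain` threshold are hypothetical names, and the metric is assumed to be "higher is better".

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ModelVersion:
    name: str
    version: int
    metric: float  # validation metric, assumed "higher is better"


def should_promote(candidate: ModelVersion,
                   production: Optional[ModelVersion],
                   min_gain: float = 0.0) -> bool:
    """Promote when no production model exists or the candidate beats it by min_gain."""
    if production is None:
        return True
    if candidate.name != production.name:
        raise ValueError("candidate and production must be versions of the same model")
    return candidate.metric >= production.metric + min_gain


if __name__ == "__main__":
    current = ModelVersion("churn", 3, 0.85)
    challenger = ModelVersion("churn", 4, 0.90)
    print(should_promote(challenger, current, min_gain=0.01))
```

Keeping the decision in a small pure function makes it straightforward to unit test before wiring it to a model registry.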
Automatically set model metadata
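As an illustration of what automatically set metadata could look like, the sketch below collects a few lineage-related tags from the environment. The tag names and the environment variables (`GIT_COMMIT`, `CI_PIPELINE_ID`, `USER`) are assumptions; the original slide does not list which fields are set.

```python
import os
from datetime import datetime, timezone


def build_model_tags(environ=None, now=None):
    """Collect metadata tags without asking the data scientist to fill them in."""
    env = os.environ if environ is None else environ
    timestamp = now or datetime.now(timezone.utc)
    return {
        # Hypothetical variable names; a CI system would typically provide these.
        "git_commit": env.get("GIT_COMMIT", "unknown"),
        "pipeline_id": env.get("CI_PIPELINE_ID", "unknown"),
        "trained_by": env.get("USER", "unknown"),
        "registered_at": timestamp.isoformat(),
    }


if __name__ == "__main__":
    print(build_model_tags())
```

A dictionary like this could then be attached as tags to a tracked run or a registered model version, which supports the lineage and reproducibility goals listed earlier.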
Considerations
Self-Service vs. Best-Practices
• Use Golden-path repositories to create examples that incorporate best practices.
Flexibility vs. Complexity
• Solution must be flexible enough to cover a wide variety of use cases.
• But this should not come at the cost of complexity!
Modular vs. End2End
• Modular: More difficult, requires a wide variety of domain knowledge.
• End2End: Easier, but it can cause problems in the long term.

Building a Scalable and reliable open source ML Platform with MLFlow
