MLOps implemented - how we combine the cloud & open-source to boost data scientists' work
© Copyright. All rights reserved. Not to be reproduced without prior written consent.
Marek Wiewiórka
Chief Data Architect
marek.wiewiorka@getindata.com
Krzysztof Zarzycki
Chief Technology Officer
krzysztof.zarzycki@getindata.com
GetInData in a Nutshell
● Founded by ex-Spotify engineers in 2014
● Focus only on Big Data and Cloud (from day 1)
● Community builders (Big Data Tech Warsaw, blogs, OSS)
● 80+ Big Data engineers (and growing)
How We Got to MLOps
● 2015: Google publishes “Hidden Technical Debt in Machine Learning Systems”
● 2018: Started building a cloud-native ML platform at ING Bank
● 2019: Started building an ML platform for a large Polish telecom
● 2020: Built an ML platform for Kcell, the largest Kazakh telecom
● 2020: MLOps projects started with retail (cloud), mobile app
● 2021: MLOps project started for the largest Polish bank (cloud)
● and more...
Our MLOps Principles
● Software-engineering-like process, but for ML models
● The pipeline is the result, not the model
● No IT required to take Data Science to production
● Freedom of choice of tools
● Loosely coupled mix of cloud services and open-source
● Best of breed instead of an all-in-one approach
Data Science Workbench - Our Vision
Data Scientist’s IDE - Batteries Included
Kedro - Data Scientist’s Swiss Knife
● Kedro is an open-source Python framework for creating reproducible, maintainable and modular data science code
● Kedro’s main concepts:
○ Project template
○ Configuration and environments
○ Data catalog
○ Nodes and pipelines
Kedro - Project starters
● Common directory structure for all projects
● Customizable Cookiecutter templates
● Boilerplate code for an ML project using the Kedro framework
● Official and in-house baked starters
kedro new --starter=pyspark
Kedro - Data Catalog
Data source definition:
● Separation of transformation code and data connectors
● Can be reused between projects
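The separation of transformation code from data connectors can be illustrated with a small standard-library sketch. This is not Kedro's API: `ToyCatalog`, `InMemoryCsv` and `total_revenue` are hypothetical names invented for this example, and a real Kedro project would declare its datasets in the project's catalog configuration instead.

```python
import csv
import io
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class InMemoryCsv:
    """Hypothetical connector: parses CSV text held in memory."""
    text: str

    def load(self) -> List[Dict[str, str]]:
        return list(csv.DictReader(io.StringIO(self.text)))


@dataclass
class ToyCatalog:
    """Hypothetical catalog: maps dataset names to connectors."""
    datasets: Dict[str, Any] = field(default_factory=dict)

    def load(self, name: str) -> Any:
        # Callers ask for data by name and never touch the connector itself
        return self.datasets[name].load()


def total_revenue(rows: List[Dict[str, str]]) -> float:
    """Transformation code: knows nothing about where the rows come from."""
    return sum(float(row["revenue"]) for row in rows)


catalog = ToyCatalog({"sales": InMemoryCsv("customer,revenue\na,10.5\nb,4.5\n")})
result = total_revenue(catalog.load("sales"))
print(result)
```

Swapping `InMemoryCsv` for a connector that reads from a database or object store would not require any change to `total_revenue`, which is the reuse the slide refers to.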
Kedro Nodes and Pipelines
● Node - a Python function with zero or more input and output datasets
● Pipeline - a DAG: a collection of nodes with defined relationships and dependencies
kedro run
kedro viz
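To make the node and pipeline idea concrete, here is a minimal sketch that uses only the Python standard library (`graphlib`, Python 3.9+). It is not Kedro's implementation: `ToyNode` and `run_pipeline` are hypothetical names, and each toy node has exactly one output to keep the example short. It shows how execution order can be derived from dataset names alone.

```python
from graphlib import TopologicalSorter
from typing import Any, Callable, Dict, List, NamedTuple


class ToyNode(NamedTuple):
    """Hypothetical node: a function plus the names of its input and output datasets."""
    func: Callable[..., Any]
    inputs: List[str]
    output: str


def run_pipeline(nodes: List[ToyNode], data: Dict[str, Any]) -> Dict[str, Any]:
    """Run nodes in dependency order and return all datasets by name."""
    producers = {n.output: n for n in nodes}
    # A node depends on whichever nodes produce its inputs
    graph = {n.output: {i for i in n.inputs if i in producers} for n in nodes}
    data = dict(data)
    for name in TopologicalSorter(graph).static_order():
        n = producers[name]
        data[name] = n.func(*(data[i] for i in n.inputs))
    return data


def clean(raw):
    return [x for x in raw if x is not None]


def mean(values):
    return sum(values) / len(values)


# Nodes are deliberately listed out of order; the DAG decides what runs first
nodes = [
    ToyNode(mean, ["clean_data"], "average"),
    ToyNode(clean, ["raw_data"], "clean_data"),
]
results = run_pipeline(nodes, {"raw_data": [1, None, 2, 3]})
print(results["average"])
```

Because dependencies are expressed through dataset names, the same node list could in principle be translated into another scheduler's DAG format without touching the node functions.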
Local development with Kedro and MLflow
1. Log into JupyterLab
2. Create a project with a Kedro starter
3. Do EDA with notebooks and implement pipelines using VS Code
4. Run your project and automatically track experiments with a local MLflow
5. Optionally schedule it with a local Airflow
6. Repeat until you’re happy with your model!
Delivering ML Model to Production
● Pipeline containerization with kedro-docker
● DAG generation and scheduling with one of our plugins:
○ kedro-airflow-k8s
○ kedro-kubeflow
● Dataset stability with kedro-popmon (together with ING)
● Kubernetes pod profiling (R&D)
● CI/CD for maximum automation
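As a rough illustration of what a dataset stability check can look like, the sketch below computes a population stability index (PSI) between a reference sample and a current sample. This is a generic metric chosen for the example, not the implementation used by popmon or kedro-popmon; the function names, bin edges and sample values are all assumptions.

```python
import math
from typing import List, Sequence


def bin_shares(values: Sequence[float], edges: Sequence[float]) -> List[float]:
    """Fraction of values per bin [edges[i], edges[i+1]); the last bin includes its upper edge."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        for i in range(len(counts)):
            is_last = i == len(counts) - 1
            if edges[i] <= v < edges[i + 1] or (is_last and v == edges[-1]):
                counts[i] += 1
                break
    total = sum(counts)
    if total == 0:
        raise ValueError("no values fall inside the bin edges")
    return [c / total for c in counts]


def population_stability_index(reference, current, edges, eps=1e-6) -> float:
    """PSI = sum over bins of (cur - ref) * ln(cur / ref); eps guards against empty bins."""
    ref = bin_shares(reference, edges)
    cur = bin_shares(current, edges)
    return sum((c - r) * math.log((c + eps) / (r + eps)) for r, c in zip(ref, cur))


edges = [0, 4, 8]                       # hypothetical bin edges
last_month = [1, 2, 3, 4, 5, 6, 7, 8]   # hypothetical reference sample
this_month = [5, 6, 7, 8, 5, 6, 7, 8]   # hypothetical current sample

same = population_stability_index(last_month, last_month, edges)
drifted = population_stability_index(last_month, this_month, edges)
print(same, drifted)
```

Identical samples give a PSI of zero, and the value grows as the current distribution moves away from the reference, so a pipeline step could compare it against a threshold before training or scoring.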
Model Training
● Freedom of toolkit choice with containerized execution
● Scalable training
● Experiment and model tracking
● “Continuous Training”: schedule- or event-driven
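The difference between schedule-driven and event-driven continuous training can be sketched as a small decision function. Everything here is hypothetical: `RetrainPolicy`, its fields and the drift score are illustrative names rather than part of any tool named in the deck, and a real platform would delegate the trigger to its scheduler.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class RetrainPolicy:
    """Hypothetical policy combining a schedule with a drift event."""
    max_age: timedelta        # schedule: retrain when the model is older than this
    drift_threshold: float    # event: retrain when drift reaches this score

    def reason_to_retrain(
        self, last_trained: datetime, now: datetime, drift_score: float
    ) -> Optional[str]:
        # The event takes priority, so a drifting model is not left waiting for the schedule
        if drift_score >= self.drift_threshold:
            return "event"
        if now - last_trained >= self.max_age:
            return "schedule"
        return None


policy = RetrainPolicy(max_age=timedelta(days=7), drift_threshold=0.2)
trained_at = datetime(2021, 1, 1)

fresh = policy.reason_to_retrain(trained_at, datetime(2021, 1, 3), drift_score=0.05)
stale = policy.reason_to_retrain(trained_at, datetime(2021, 1, 9), drift_score=0.05)
drifting = policy.reason_to_retrain(trained_at, datetime(2021, 1, 3), drift_score=0.3)
print(fresh, stale, drifting)
```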
Model Serving
● CI/CD
● Models from registry
● Batch & online
● Scalability
● Extensive monitoring
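One way to picture "models from registry" serving both batch and online traffic is the toy registry below. It is an in-memory stand-in written for this example, not the API of MLflow or any other registry; `ToyRegistry`, `predict_online` and `predict_batch` are hypothetical names, and the "models" are plain functions.

```python
from typing import Callable, Dict, List, Tuple

Model = Callable[[float], float]


class ToyRegistry:
    """Hypothetical in-memory registry: versioned models plus a production pointer."""

    def __init__(self) -> None:
        self._versions: Dict[Tuple[str, int], Model] = {}
        self._production: Dict[str, int] = {}

    def register(self, name: str, model: Model) -> int:
        # Versions are numbered from 1 per model name
        version = 1 + max((v for (n, v) in self._versions if n == name), default=0)
        self._versions[(name, version)] = model
        return version

    def promote(self, name: str, version: int) -> None:
        if (name, version) not in self._versions:
            raise KeyError(f"unknown model version: {name} v{version}")
        self._production[name] = version

    def production_model(self, name: str) -> Model:
        return self._versions[(name, self._production[name])]


def predict_online(registry: ToyRegistry, name: str, x: float) -> float:
    """Online path: one request, one prediction."""
    return registry.production_model(name)(x)


def predict_batch(registry: ToyRegistry, name: str, xs: List[float]) -> List[float]:
    """Batch path: resolve the model once, then score the whole batch."""
    model = registry.production_model(name)
    return [model(x) for x in xs]


registry = ToyRegistry()
v1 = registry.register("churn", lambda x: x * 2)
v2 = registry.register("churn", lambda x: x * 3)
registry.promote("churn", v2)

online = predict_online(registry, "churn", 2)
batch = predict_batch(registry, "churn", [1, 2, 3])
print(online, batch)
```

Both paths resolve the model through the same production pointer, so promoting a new version changes batch and online predictions together.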
Model Deployment to Production!
[Diagram: deployment flow; edge labels: writes, produces]
[Diagram: platform components (kedro-kubeflow, kedro-airflow-k8s, Model deployer, Jupyter plugins, Prebaked images, Google AI Platform) across three stages: Experimentation, Training, Serving]
MLOps R&D
● Align Data and ML engineering
● Feature Store
○ Feast, GCP, AWS
● Kedro
○ Company-wide data discovery tools
○ Hyperparameter tuning
○ Serving, model deployment
● Advanced deployments
● Retraining, data drift
● Business monitoring, outcome attribution
How to Start with MLOps?
● Focus on unlocking data scientists
○ Start with Data Science Workbench
○ Make code reproducible by CI
○ Then build Scalable Training
Thank you! - Dziękujemy!
github.com/getindata/kedro-kubeflow
github.com/getindata/kedro-airflow-k8s

 
Monitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR EventsMonitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR Events
 
How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...
 
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered QualitySoftware Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
 
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
 
Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...
 
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
 
Connector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a buttonConnector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a button
 
The Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and SalesThe Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and Sales
 
When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...
 
Elevating Tactical DDD Patterns Through Object Calisthenics
Elevating Tactical DDD Patterns Through Object CalisthenicsElevating Tactical DDD Patterns Through Object Calisthenics
Elevating Tactical DDD Patterns Through Object Calisthenics
 
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
 
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
 
Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...
 

MLOps implemented - how we combine the cloud & open-source to boost data scientists work - Krzysztof Zarzycki, Marek Wiewiórka - GetInData

  • 1. MLOps implemented - how we combine the cloud & open-source to boost data scientists work
  • 2. Marek Wiewiórka, Chief Data Architect, marek.wiewiorka@getindata.com
      Krzysztof Zarzycki, Chief Technology Officer, krzysztof.zarzycki@getindata.com
  • 3. GetInData in a Nutshell
      ● Founded by ex-Spotify engineers in 2014
      ● Focus only on Big Data and Cloud (from day 1)
      ● Community builders (Big Data Tech Warsaw, blogs, OSS)
      ● 80+ Big Data engineers (and growing)
  • 4.
  • 5. How We Got to MLOps
      ● 2015: Google publishes “Hidden Technical Debt in Machine Learning Systems”
      ● 2018: Started building a cloud-native ML platform at ING Bank
      ● 2019: Started building an ML platform for a large Polish telecom
      ● 2020: Built an ML platform for Kcell, the largest Kazakh telecom
      ● 2020: MLOps projects started with retail (cloud), mobile app
      ● 2021: MLOps project started for the largest Polish bank (cloud)
      ● and more...
  • 6. Our MLOps Principles
      ● Software Engineering-like process but for ML models
      ● The pipeline is the result, not the model
      ● No IT required, for Data Science to production
      ● Freedom of choice of tools
      ● Loosely coupled mix of cloud services and open-source
      ● Best of breed instead of all-in-one approach
  • 7.
  • 8. Data Science Workbench - Our Vision
  • 9. Data Scientists IDE - Batteries Included
  • 10. Data Scientists IDE - Batteries Included
  • 11. Kedro - Data Scientist’s Swiss Knife
      ● Kedro is an open-source Python framework for creating reproducible, maintainable and modular data science code
      ● Kedro’s main concepts:
          ○ Project template
          ○ Configuration and environments
          ○ Data catalog
          ○ Nodes and pipelines
  • 12. Kedro - Project starters
      ● common directory structure for all projects
      ● customizable Cookiecutter templates
      ● boilerplate code for an ML project using the Kedro framework
      ● official and in-house baked starters
      kedro new --starter=pyspark
  • 13. Kedro - Data Catalog
      Data source definition:
      ● Separation of transformations code and data connectors
      ● Can be reused between projects
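The separation named on the Data Catalog slide can be illustrated with a minimal sketch. It uses only the Python standard library and is not Kedro's API: the names `STORAGE`, `CATALOG`, `load_csv` and `total_spend`, as well as the sample data, are hypothetical and exist only for illustration. The point is that the transformation never sees a path or a file format, so the same function can be reused with a different catalog.

```python
import csv
import io
from typing import Callable, Dict, List

Rows = List[Dict[str, str]]

# Hypothetical in-memory "storage" standing in for files, databases or buckets.
STORAGE: Dict[str, str] = {
    "raw/customers.csv": "id,spend\n1,10\n2,25\n",
}


def load_csv(path: str) -> Rows:
    """Connector: knows how to read CSV text from storage."""
    return list(csv.DictReader(io.StringIO(STORAGE[path])))


# The catalog maps a logical dataset name to a connector plus a location.
# Swapping the storage backend only requires changing this mapping.
CATALOG: Dict[str, Callable[[], Rows]] = {
    "customers": lambda: load_csv("raw/customers.csv"),
}


def total_spend(rows: Rows) -> int:
    """Transformation: pure logic with no knowledge of where data lives."""
    return sum(int(row["spend"]) for row in rows)


result = total_spend(CATALOG["customers"]())
print(result)
```

In a real Kedro project the equivalent mapping is declared in the project's catalog configuration rather than in Python code.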
  • 14. Kedro Nodes and Pipelines
      ● Node - a Python function that has zero to many inputs and/or output datasets
      ● Pipeline - a DAG. A collection of nodes with defined relationships and dependencies.
      kedro run
      Kedro viz
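The node and pipeline definitions above can be sketched as follows. This is a standard-library illustration of the idea, not Kedro's implementation: `Node`, `run_pipeline`, `clean` and `mean` are hypothetical names, and each node is simplified to a single output dataset. The execution order is derived from the dataset names that nodes consume and produce, which is what makes the pipeline a DAG.

```python
from graphlib import TopologicalSorter  # available from Python 3.9
from typing import Any, Callable, Dict, List, NamedTuple


class Node(NamedTuple):
    func: Callable[..., Any]  # plain Python function
    inputs: List[str]         # names of input datasets
    output: str               # name of the single output dataset (simplification)


def run_pipeline(nodes: List[Node], data: Dict[str, Any]) -> Dict[str, Any]:
    """Run nodes in dependency order, inferred from dataset names."""
    producers = {n.output: n for n in nodes}
    # A node depends on every node that produces one of its inputs.
    graph = {n.output: {i for i in n.inputs if i in producers} for n in nodes}
    results = dict(data)
    for name in TopologicalSorter(graph).static_order():
        node = producers[name]
        results[name] = node.func(*(results[i] for i in node.inputs))
    return results


def clean(raw: List[Any]) -> List[float]:
    """Drop missing values."""
    return [x for x in raw if x is not None]


def mean(values: List[float]) -> float:
    """Average of the cleaned values."""
    return sum(values) / len(values)


# Declared out of order on purpose: the DAG decides the execution order.
nodes = [
    Node(mean, ["cleaned"], "average"),
    Node(clean, ["raw"], "cleaned"),
]
out = run_pipeline(nodes, {"raw": [1, None, 2, 3]})
print(out["average"])
```

Because dependencies are inferred from names, adding a node does not require editing the run order by hand, which is the property the slide attributes to pipelines.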
  • 15. Local development with Kedro and MLflow
      1. Log into JupyterLab
      2. Create a project with a Kedro starter
      3. EDA with notebooks & pipeline implementation using VS Code
      4. Run your project and automatically track experiments with a local MLflow
      5. Optionally schedule it with a local Airflow
      6. Repeat until you’re happy with your model!
  • 16. Delivering ML Model to Production
      ● Pipeline containerization with kedro-docker
      ● DAGs generation and scheduling with one of our plugins:
          ○ kedro-airflow-k8s
          ○ kedro-kubeflow
      ● Dataset stability with kedro-popmon (together with ING)
      ● Kubernetes pods profiling (R&D)
      ● CI/CD for maximum automation
  • 17. Model Training
      ● Freedom of toolkit choice with containerized execution
      ● Scalable training
      ● Experiments and models tracking
      ● “Continuous Training”: schedule- or event-driven
  • 18. Model Serving
      ● CI/CD
      ● Models from registry
      ● Batch & online
      ● Scalability
      ● Extensive monitoring
  • 19. Model Deployment to Production! (diagram labels: writes, produces)
  • 20. (diagram labels) Kedro-kubeflow, kedro-airflow-k8s, Model deployer, Jupyter plugins, Prebaked images, Google AI Platform, Experimentation, Training, Serving
  • 21. MLOps R&D
      ● Align Data and ML engineering
      ● Feature Store
          ○ Feast, GCP, AWS
      ● Kedro
          ○ Company-wide data discovery tools
          ○ Hyperparameters tuning
          ○ Serving, model deployment
      ● Advanced deployments
      ● Retraining, data drift
      ● Business monitoring, outcome attribution
  • 22. How to Start with MLOps?
      ● Focus on unlocking data scientists
          ○ Start with Data Science Workbench
          ○ Make code reproducible by CI
          ○ Then build Scalable Training
  • 23. Thank you! - Dziękujemy!
      github.com/getindata/kedro-kubeflow
      github.com/getindata/kedro-airflow-k8s