Rakesh Suresh
Jainik Vora
Transforming Data Processing with Kubernetes: Journey Towards a Self-Serve Data Mesh
Nov 6, 2023
© 2023 Intuit Inc. All rights reserved. 2
Speaker Introduction
Jainik Vora
Sr. Staff Software Engineer
Rakesh Suresh
Sr. Staff Software Engineer
jainiksvora
rakeshsuresh
Agenda
Intuit
About Intuit and its mission
Data Mesh
What is Data Mesh and the problems it addresses
Data Lake & Data Mesh
How Intuit implements data mesh with a real example
Intuit’s Data Mesh Concepts
Foundational concepts defined for Data Mesh
Self-Serve Data Processing on Kubernetes
Architecture of Batch and Stream Processing Platform
100% services on Modern SaaS
65B machine learning predictions per day
24k financial institutions [+50 crypto]
$560B money moved
3.6B requests during peak season (no customer failures)
Data
Integration
Fintech
Infrastructure
Identity
AI
Infrastructure
Modern Dev
Experience
AI-driven expert platform
Intuit is leading the way in building an AI-native development platform using cloud native open source
technology. We’re committed to building tools that scale and giving back to the open source community.
We believe in open source and open collaboration
bit.ly/intuit-oss
Created, open-sourced, used, and maintained by Intuit
Recipient of the End User Award in 2019 & 2022
End user of Cloud Native and mobile open source tech
Data Mesh
What is Data Mesh?
A data mesh is a decentralized data architecture that organizes data by a specific business domain.
Instead of data acting as a by-product of a process, it becomes the product, where data producers act as data product owners.
Why Data Mesh?
Improve value of Data
Smart Product Experiences using Data
Power AI
Power Generative AI Applications like Intuit Assist
Reduce time to discover & access Data
Serve variety of Data Personas
Data Mesh Principles
Data Mesh
Domain Driven Ownership
Data Product
Data Access
Self Serve Infrastructure
Data Lake & Data Mesh
Small Business Owner has unpaid Invoices
The small business owner logs in to QuickBooks and realizes there are unpaid invoices from customers.
Options provided by the system:
◆ (A) The system reminds the owner about unpaid invoices
◆ (B) The system also offers an add-on feature that automatically sends invoice reminders to customers
Small Biz & QuickBooks
Invoice Business Technical Requirement
❖ Notification: Track & Remind Business Owner and their Customer
❖ Get unpaid invoices by Business
❖ Get unpaid invoices for each Customer grouped by Business
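The two "get unpaid invoices" requirements above can be sketched in a few lines of Python. This is only an illustration: the `Invoice` fields, the identifiers, and the rule that a positive `amount_due` means "unpaid" are assumptions, not Intuit's actual schema or logic.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Invoice:
    # Hypothetical fields chosen for illustration only.
    invoice_id: str
    business_id: str
    customer_id: str
    amount_due: float  # outstanding balance; 0 is treated as paid


def unpaid_by_business(invoices):
    """Map each business to the list of its unpaid invoices."""
    result = defaultdict(list)
    for inv in invoices:
        if inv.amount_due > 0:
            result[inv.business_id].append(inv)
    return dict(result)


def unpaid_by_customer_per_business(invoices):
    """Map each business to {customer: [unpaid invoices]}."""
    result = {}
    for business_id, unpaid in unpaid_by_business(invoices).items():
        per_customer = defaultdict(list)
        for inv in unpaid:
            per_customer[inv.customer_id].append(inv)
        result[business_id] = dict(per_customer)
    return result


invoices = [
    Invoice("inv-1", "biz-a", "cust-1", 120.0),
    Invoice("inv-2", "biz-a", "cust-1", 0.0),
    Invoice("inv-3", "biz-a", "cust-2", 75.5),
    Invoice("inv-4", "biz-b", "cust-3", 0.0),
]
grouped = unpaid_by_customer_per_business(invoices)
```

The first function would feed the owner reminder (option A), the second the per-customer reminders (option B).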
Traditional Data Lake Architecture
Traditional Data Lake Architecture for Invoice
Invoice is our Data Domain
Let's ask some questions about Invoice data
◆ How do I find Invoice data for my use case?
◆ Who is the domain expert for Invoice data?
◆ What is the schema of the Invoice data?
◆ Where is Invoice data located for consumption?
◆ How can I get access to Invoice data and who can approve?
◆ Is there derived data from Invoice? How do I derive data from Invoice?
Data Mesh Concepts for Intuit
Organization & Discovery of Data
How do I find Invoice data for my use case?
Data Map
Organization of data using domain, sub-domain and
bounded context
Data Product
Foundational unit of data map, organized by data map
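As a rough sketch of how a data map could answer "How do I find Invoice data?", the example below indexes data products by domain, sub-domain, and bounded context. The class names, the path values, and the product names are all hypothetical; the slides do not describe the actual implementation.

```python
from dataclasses import dataclass, field


@dataclass
class DataProduct:
    name: str
    description: str = ""


@dataclass
class DataMap:
    # Keyed by (domain, sub_domain, bounded_context); a hypothetical layout.
    _index: dict = field(default_factory=dict)

    def register(self, domain, sub_domain, bounded_context, product):
        key = (domain, sub_domain, bounded_context)
        self._index.setdefault(key, []).append(product)

    def find(self, domain, sub_domain=None, bounded_context=None):
        """Return products under the given path; None acts as a wildcard."""
        matches = []
        for (d, s, b), products in self._index.items():
            if d != domain:
                continue
            if sub_domain is not None and s != sub_domain:
                continue
            if bounded_context is not None and b != bounded_context:
                continue
            matches.extend(products)
        return matches


data_map = DataMap()
data_map.register("small-business", "invoicing", "invoice", DataProduct("invoice-events"))
data_map.register("small-business", "invoicing", "reminder", DataProduct("invoice-reminders"))
data_map.register("small-business", "payments", "payment", DataProduct("payment-events"))

names = [p.name for p in data_map.find("small-business", "invoicing")]
```

Narrowing the path from domain to bounded context shrinks the result set, which is the discovery behavior the slide describes.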
Ownership of Data
Who is the domain expert for Invoice data?
Data Product
Consolidates essential information to enable data
consumers
Data Steward
Defines the data product and is responsible for its contract
Data Contract
What is the schema of the Invoice data?
Semantic Model
Consolidates essential modeling and schema information, enabling data consumers to understand the data
SLA
Defines the guarantees the data product commits to in its contract, such as data quality and data freshness
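To make the contract idea concrete, here is a minimal sketch that pairs a schema (standing in for the semantic model) with a freshness bound (standing in for the SLA). The field names, the one-hour threshold, and the helper functions are assumptions made for illustration, not part of Intuit's platform.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class DataContract:
    schema: dict              # field name -> expected Python type (hypothetical)
    max_staleness: timedelta  # freshness SLA (hypothetical value below)


def schema_violations(contract, record):
    """List the ways a record deviates from the contract's schema."""
    problems = []
    for name, expected in contract.schema.items():
        if name not in record:
            problems.append(f"missing field: {name}")
        elif not isinstance(record[name], expected):
            problems.append(f"wrong type for {name}")
    return problems


def is_fresh(contract, last_updated, now):
    """True when the data was updated within the freshness SLA."""
    return now - last_updated <= contract.max_staleness


contract = DataContract(
    schema={"invoice_id": str, "business_id": str, "amount_due": float},
    max_staleness=timedelta(hours=1),
)
now = datetime(2023, 11, 6, 12, 0, tzinfo=timezone.utc)
good = {"invoice_id": "inv-1", "business_id": "biz-a", "amount_due": 120.0}
bad = {"invoice_id": "inv-2", "amount_due": "120"}
```

A consumer could run both checks before relying on a data product, and a producer could run them before publishing.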
Data Ports
Where is Invoice data located for consumption?
Data Assets
Provides location and medium through which data can
be consumed
Tag
Additional context for optimal discovery
Access & Governance
How can I get access to Invoice data and who can approve?
Access Control List
Track explicit read and write access control
Access approved by Data Steward
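The access model above, explicit read and write grants approved by the data steward, could look roughly like the following. The class, its method names, and the principals are hypothetical; a real system would sit behind an identity and policy service rather than an in-memory object.

```python
from dataclasses import dataclass, field


@dataclass
class AccessControlList:
    steward: str
    readers: set = field(default_factory=set)
    writers: set = field(default_factory=set)
    pending: list = field(default_factory=list)  # (principal, mode) requests

    def request(self, principal, mode):
        if mode not in ("read", "write"):
            raise ValueError(f"unknown mode: {mode}")
        self.pending.append((principal, mode))

    def approve(self, approver, principal, mode):
        # Only the steward of this data product may grant access.
        if approver != self.steward:
            raise PermissionError("only the data steward can approve access")
        if (principal, mode) not in self.pending:
            raise LookupError("no matching pending request")
        self.pending.remove((principal, mode))
        (self.readers if mode == "read" else self.writers).add(principal)

    def can(self, principal, mode):
        return principal in (self.readers if mode == "read" else self.writers)


acl = AccessControlList(steward="invoice-steward")
acl.request("reminder-service", "read")
acl.approve("invoice-steward", "reminder-service", "read")
```

Keeping read and write grants in separate sets makes it explicit that read access never implies write access.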
Data Concepts - Data Processing & Lineage
Is there derived data from Invoice? How do I derive data from Invoice?
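One way to answer "Is there derived data from Invoice?" is to walk a lineage graph downstream from the source product. The sketch below assumes lineage is available as a simple adjacency map; the product names and the map itself are invented for illustration.

```python
from collections import deque

# Hypothetical lineage edges: upstream data product -> directly derived products.
LINEAGE = {
    "invoice": ["unpaid-invoices-by-business"],
    "unpaid-invoices-by-business": ["unpaid-invoices-by-customer", "reminder-candidates"],
    "unpaid-invoices-by-customer": ["reminder-candidates"],
    "reminder-candidates": [],
}


def derived_from(source, lineage):
    """Return every product reachable downstream of source, breadth-first."""
    seen = []
    queue = deque(lineage.get(source, []))
    while queue:
        product = queue.popleft()
        if product in seen:
            continue  # already reached through another path
        seen.append(product)
        queue.extend(lineage.get(product, []))
    return seen


downstream = derived_from("invoice", LINEAGE)
```

The same traversal run on reversed edges would answer the opposite question: which sources a derived product depends on.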
Self-Serve Data Processing on Kubernetes
Scope of Data Processing At Intuit
Scale
2000+ users
100,000+ pipelines (batch and streaming)
Variety of Users
Data Engineers
Data Scientists
Machine Learning Engineers
Data Analysts
Variety of Use Cases
Type
◆ Batch
◆ Streaming
Categories
◆ Model Training & Feature Gen
◆ Derivation & Enrichments
◆ Data Movement
Self Serve Data Processing as Paved Path
Operate & Monitor
◆ Log forwarding & Metrics reporting
◆ Alert & Notification
◆ DR Failover & Failback
Provision & Deploy
◆ Infrastructure provisioning
◆ Deployment of processing artifacts
◆ Data Map Registration
◆ Lineage
Author & Define
◆ Authoring tools geared towards user persona and expertise
◆ Access to input and output
◆ Scheduling & Orchestration
Batch Processing Platform Architecture
Stream Processing Platform Architecture
Kubernetes Powers Data Processing
Intuit Kubernetes Service
◆ Core Infrastructure Layer
◆ Runs
– Control plane APIs
– Processing jobs
Argo Workflows & Events
◆ Scheduling & Orchestration
◆ Deployment Workflow
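As an illustration of scheduling a batch job with Argo, the sketch below builds a CronWorkflow manifest as a plain Python dict and prints it as JSON, which `kubectl apply -f` accepts. The pipeline name, image, and command are hypothetical, and the slides do not show how Intuit generates its manifests. Field names follow the `argoproj.io/v1alpha1` CronWorkflow spec; verify them against the Argo Workflows version you run.

```python
import json


def cron_workflow(name, schedule, image, command):
    """Build an Argo CronWorkflow manifest as a plain dict."""
    return {
        "apiVersion": "argoproj.io/v1alpha1",
        "kind": "CronWorkflow",
        "metadata": {"name": name},
        "spec": {
            "schedule": schedule,           # standard cron expression
            "concurrencyPolicy": "Forbid",  # do not start a run while one is active
            "workflowSpec": {
                "entrypoint": "main",
                "templates": [
                    {
                        "name": "main",
                        "container": {"image": image, "command": command},
                    }
                ],
            },
        },
    }


manifest = cron_workflow(
    name="unpaid-invoice-batch",                   # hypothetical pipeline name
    schedule="0 * * * *",                          # hourly
    image="example.com/pipelines/unpaid:latest",   # hypothetical image
    command=["python", "-m", "unpaid_invoices"],   # hypothetical entry point
)
print(json.dumps(manifest, indent=2))
```

Generating manifests from code rather than hand-writing them is one way a self-serve platform can keep scheduling details out of the pipeline author's hands.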
Learn more
Data Mesh
◆ Data Mesh Principles & Logical Architecture by Zhamak Dehghani
◆ Intuit’s Data Mesh Strategy
◆ Intuit’s Data Mesh Concepts
Data Processing
◆ How Intuit Built Stream Processing Platform with Flink
◆ Large Scale Batch Processing with Argo Workflow and Events
Q&A

Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
 

Transforming Data Processing with Kubernetes: Journey Towards a Self-Serve Data Mesh

  • 1. Transforming Data Processing with Kubernetes: Journey Towards a Self-Serve Data Mesh. Rakesh Suresh, Jainik Vora. Nov 6, 2023
  • 2. Speaker Introduction: Jainik Vora, Sr. Staff Software Engineer (jainiksvora); Rakesh Suresh, Sr. Staff Software Engineer (rakeshsuresh). © 2023 Intuit Inc. All rights reserved.
  • 3. Agenda. Intuit: About Intuit and its mission. Data Mesh: What is Data Mesh and the problems it addresses. Data Lake & Data Mesh: How Intuit implements data mesh with a real example. Intuit’s Data Mesh Concepts: Foundational concepts defined for Data Mesh. Self-Serve Data Processing on Kubernetes: Architecture of Batch and Stream Processing Platform.
  • 4. 100% services on Modern SaaS; 65B machine learning predictions per day; 24k financial institutions [+50 crypto]; $560B money moved; 3.6B requests during peak season (no customer failures). Data Integration; Fintech Infrastructure; Identity; AI Infrastructure; Modern Dev Experience. AI-driven expert platform. Intuit is leading the way in building an AI-native development platform using cloud native open source technology. We’re committed to building tools that scale and giving back to the open source community. ©2022 Intuit Inc. All rights reserved.
  • 5. We believe in open source and open collaboration (bit.ly/intuit-oss). Created, open-sourced, used, and maintained by Intuit. Recipient of the End User Award in 2019 & 2022. End user of Cloud Native and mobile open source tech.
  • 6. Data Mesh
  • 7. What is Data Mesh?
  • 8. A data mesh is a decentralized data architecture that organizes data by a specific business domain.
  • 9. Instead of data acting as a by-product of a process, it becomes the product, where data producers act as data product owners.
  • 10. Why Data Mesh? Improve value of Data; Smart Product Experiences using Data; Power AI; Power Generative AI Applications like Intuit Assist; Reduce time to discover & access Data; Serve variety of Data Personas.
  • 11. Data Mesh Principles. Data Mesh: Domain Driven Ownership; Data Product; Data Access; Self Serve Infrastructure.
  • 12. Data Lake & Data Mesh
  • 13. Small Business Owner has unpaid Invoices. The small business owner logs in to QuickBooks and realizes there are unpaid invoices from customers. Options provided by the system: (A) The system reminds the owner about unpaid invoices. (B) The system also offers an add-on feature that automatically sends invoice reminders to customers. Small Biz & QuickBooks
  • 14. Invoice Business Technical Requirement. Notification: Track & Remind Business Owner and their Customer. Get unpaid invoices by Business. Get unpaid invoices for each Customer grouped by Business.
  • 15. Traditional Data Lake Architecture
  • 16. Traditional Data Lake Architecture for Invoice
  • 17. Invoice is our Data Domain
  • 18. Let's ask some questions on Invoice Data: How do I find Invoice data for my use case? Who is the domain expert for Invoice data? What is the schema of the Invoice data? Where is Invoice data located for consumption? How can I get access to Invoice data and who can approve? Is there derived data from Invoice? How do I derive data from Invoice?
  • 19. Data Mesh Concepts for Intuit
  • 20. Organization & Discovery of Data. How do I find Invoice data for my use case? Data Map: Organization of data using domain, sub-domain and bounded context.
  • 21. Organization & Discovery of Data. How do I find Invoice data for my use case? Data Map: Organization of data using domain, sub-domain and bounded context. Data Product: Foundational unit of data map, organized by data map.
  • 22. Ownership of Data. Who is the domain expert for Invoice data? Data Product: Consolidates essential information to enable data consumers. Data Steward: Defines the data product and is responsible for its contract.
  • 23. Data Contract. What is the schema of the Invoice data? Semantic Model: Consolidates essential modeling and schema information, enabling data consumers to understand the data. SLA: Defines the guarantees in the data product's contract, like data quality, data freshness, etc.
  • 24. Data Ports. Where is Invoice data located for consumption? Data Assets: Provides location and medium through which data can be consumed. Tag: Additional context for optimal discovery.
  • 25. Access & Governance. How can I get access to Invoice data and who can approve? Access Control List: Track explicit read and write access control. Access approved by Data Steward.
  • 26. Data Concepts - Data Processing & Lineage. Is there derived data from Invoice? How do I derive data from Invoice?
  • 27. Self-Serve Data Processing on Kubernetes
  • 28. Scope of Data Processing. At Intuit Scale: 2000+ users; 100,000+ pipelines (batch and streaming). Variety of Users: Data Engineers; Data Scientists; Machine Learning Engineers; Data Analysts. Variety of Use Cases. Type: Batch; Streaming. Categories: Model Training & Feature Gen; Derivation & Enrichments; Data Movement.
  • 29. Self Serve Data Processing as Paved Path. Operate & Monitor: Log forwarding & Metrics reporting; Alert & Notification; DR Failover & Failback. Provision & Deploy: Infrastructure provisioning; Deployment of processing artifacts; Data Map Registration; Lineage. Author & Define: Authoring tools geared towards user persona and expertise; Access to input and output; Scheduling & Orchestration.
  • 30. Batch Processing Platform Architecture
  • 31. Stream Processing Platform Architecture
  • 32. Kubernetes Power Data Processing. Intuit Kubernetes Service: Core Infrastructure Layer; Runs Control plane APIs and Processing jobs. Argo Workflow & Events: Scheduling & Orchestration; Deployment Workflow.
  • 33. Learn more. Data Mesh: Data Mesh Principles & Logical Architecture by Zhamak Dehghani; Intuit’s Data Mesh Strategy; Intuit’s Data Mesh Concepts. Data Processing: How Intuit Built Stream Processing Platform with Flink; Large Scale Batch Processing with Argo Workflow and Events.
  • 34. Q&A
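
The data mesh concepts on slides 20 to 25 and the invoice requirement on slide 14 can be illustrated with a small sketch. This is not Intuit's implementation: the class `DataProduct`, its field names, the `grant_read` method, the function `unpaid_by_business_and_customer`, and all sample values are hypothetical, chosen only to show how a data map location, a steward, a contract, ports, and an access list might hang together on one record.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class DataProduct:
    """Hypothetical record for one data product; field names are assumptions."""
    # Data Map location (slides 20-21)
    domain: str
    sub_domain: str
    bounded_context: str
    name: str
    steward: str              # Data Steward (slide 22)
    schema: dict[str, str]    # Semantic Model: column name -> type (slide 23)
    sla: dict[str, str]       # SLA, e.g. freshness or quality targets (slide 23)
    ports: list[str]          # Data Ports: where the data can be consumed (slide 24)
    tags: list[str] = field(default_factory=list)
    readers: set[str] = field(default_factory=set)  # Access Control List (slide 25)

    def grant_read(self, user: str, approver: str) -> None:
        """Add a reader to the ACL; only the Data Steward may approve."""
        if approver != self.steward:
            raise PermissionError(f"{approver} is not the steward of {self.name}")
        self.readers.add(user)


def unpaid_by_business_and_customer(invoices):
    """Group unpaid invoice ids by business, then by customer (slide 14)."""
    grouped = defaultdict(lambda: defaultdict(list))
    for inv in invoices:
        if not inv["paid"]:
            grouped[inv["business_id"]][inv["customer_id"]].append(inv["invoice_id"])
    # Convert nested defaultdicts to plain dicts for a stable return type.
    return {biz: dict(customers) for biz, customers in grouped.items()}


if __name__ == "__main__":
    invoice = DataProduct(
        domain="Sales", sub_domain="Invoicing", bounded_context="Invoice",
        name="invoice", steward="invoice-steward",
        schema={"invoice_id": "string", "paid": "boolean"},
        sla={"freshness": "24h"},
        ports=["s3://example-bucket/invoice/"],  # placeholder location
        tags=["invoice"],
    )
    invoice.grant_read("analyst-1", approver="invoice-steward")
    rows = [
        {"invoice_id": "i1", "business_id": "b1", "customer_id": "c1", "paid": False},
        {"invoice_id": "i2", "business_id": "b1", "customer_id": "c1", "paid": True},
    ]
    print(unpaid_by_business_and_customer(rows))
```

Keeping the steward and the access list on the same record is one way to make the "who can approve?" question from slide 25 answerable from the data product itself; a real platform would likely back this with a catalog service rather than an in-memory object.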