HOW SERVICE MESH FITS INTO THE MODERN DATA STACK
Fabian Hardt
SEPTEMBER 2022
AGENDA
01 MOTIVATION
02 WHAT IS THE MDS
03 ARCHITECTURE
04 SUMMARY
01 MOTIVATION
WHAT WE ARE CURRENTLY SEEING...
Data lake and DWH are converging
In the same way, BI and AI are growing together; see, for example, Snowflake and Databricks. This is a counter-movement to increasing specialization. In general, the market is moving fast.
A lot of money in the market
With the money comes increasing fragmentation: each new startup takes care of one special function, so individual functionalities are spread across separate tools.
Cloud is the new standard
Hardly anyone still builds analytical architectures on-premises, although there are exceptions. The Hadoop ecosystem is becoming less important; data lakes increasingly sit on object storage in the public cloud.
Use of software development best practices
Partly a result of the migration to the cloud: infrastructure as code, automation, CI/CD. There is growing sympathy for code-first and open source, a return of frameworks, DIY, and SQL-only approaches (skills!).
02 WHAT IS THE MODERN DATA STACK
CORE CHARACTERISTICS OF THE MODERN DATA STACK (MDS)
Automation and operationalization
Basic paradigms of modern software development are being introduced, including GitOps, CI/CD, containers, and automated testing.
Best of breed and modular
A Modern Data Stack has a modular structure. Individual components can be exchanged. Extract/load (EL) and transform (T) are separated. The best tool is selected for each discipline.
Cloud DWH
The central data storage component of the Modern Data Stack. It combines the advantages of a data lake and a data warehouse.
SaaS / IaC
The focus is on maintainability and a short time to market. This can be achieved with SaaS services from cloud providers or with automation using IaC.
WHY USE THE MODERN DATA STACK?
• Data Mesh
  • Total hype at the moment
  • Organizational framework for data-driven companies
  • Data products with APIs, similar to microservices
  • Domain-Driven Design from software development as a basis
  • Clear responsibilities for data products
  • More flexibility for developers in tool selection
• Modern Data Stack as a technical framework to implement the (organizational) data mesh
  • Flexible architecture to support “free choice of weapons” – Modern Data Platform
  • APIs for internal and external purposes
• Focus: shorter “time to market”
DATA MESH – DATA PRODUCTS
• Directly related to microservices from classic software development (SD)
• Operational applications vs. analytical applications
USAGE OF DATA PRODUCTS
Data API
OUR SELECTION: COMPONENTS
AIRBYTE
• Data ingest
• Many standard connectors available
  • SaaS, cloud, APIs, databases, …
  • Facebook, Google, Salesforce, Redshift, Snowflake, BigQuery, …
• Own connectors with the Python Connector Development Kit
• Simple transformations possible
• SaaS (US only) and open source for self-hosted installations
• Container-based operation
  • Separation of platform and connectors (server, UI, scheduler, ...)
  • New container for each connector
• Possible alternatives: Stitch, Fivetran, Singer, Meltano, …
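To make the idea of an own connector more concrete, here is a minimal sketch of what a source does at its core: it reads rows and emits one message per row. The envelope is loosely modelled on the RECORD message of the Airbyte protocol; the function name `read_records` and the sample data are hypothetical, and the sketch deliberately uses only the standard library instead of the Connector Development Kit.

```python
import json
import time
from typing import Any, Dict, Iterable, Iterator


def read_records(stream: str, rows: Iterable[Dict[str, Any]]) -> Iterator[Dict[str, Any]]:
    """Wrap raw rows in RECORD-style envelopes, one message per row."""
    for row in rows:
        yield {
            "type": "RECORD",
            "record": {
                "stream": stream,                       # logical table / endpoint name
                "data": row,                            # the untransformed source row
                "emitted_at": int(time.time() * 1000),  # epoch milliseconds
            },
        }


if __name__ == "__main__":
    # Hypothetical sample data standing in for an API or database read.
    sample_rows = [{"id": 1, "name": "Ada"}, {"id": 2, "name": "Grace"}]
    for message in read_records("customers", sample_rows):
        print(json.dumps(message))  # one JSON document per line
```

A real connector would add configuration, connection checks, and a catalog of streams; the platform then runs it in its own container, as noted above.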
DBT
• Data transformation ("data build tool")
• Only the "T" in EL+T; extraction happens separately
• ELT approach: so-called models are compiled for the target platform (e.g. a cloud DWH such as Snowflake) and executed there
• Code-first: SQL with Jinja (templating)
• Growing community; extensions can be downloaded
• SaaS and open source (Python)
  • DEV environment: cloud.getdbt.com
  • Any editor can be used locally; a CLI is available (dbt-core)
• Deployment
  • VM or Docker container; can be integrated almost anywhere
• Possible alternatives: Azure Data Factory, Talend, Informatica, …
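The "compile, then execute on the target platform" step can be illustrated with a small sketch. dbt itself uses Jinja; to stay within the standard library, this example swaps in Python's `string.Template` as a stand-in, so the placeholder syntax differs from real dbt models. The model text, schema, and table names are made up for illustration.

```python
from string import Template

# A templated "model": plain SQL with placeholders that are resolved at compile time.
# (dbt would use Jinja expressions here; ${...} is string.Template syntax.)
MODEL_SQL = Template(
    "select customer_id, sum(amount) as revenue\n"
    "from ${schema}.${source_table}\n"
    "group by customer_id"
)


def compile_model(schema: str, source_table: str) -> str:
    """Render the template into plain SQL that the warehouse can execute."""
    return MODEL_SQL.substitute(schema=schema, source_table=source_table)


if __name__ == "__main__":
    # The compiled SQL would be sent to the cloud DWH; here it is only printed.
    print(compile_model(schema="analytics", source_table="orders"))
```

The point of the design is that the transformation logic lives in version-controlled text, while the heavy lifting happens inside the warehouse.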
APACHE AIRFLOW
• Workflow management system
• Originally developed by Airbnb
• Runs a DAG (Directed Acyclic Graph)
  • Nodes contain operators, which can execute code but also control other tools
• Popular for building and running data pipelines
• Very well suited to GitOps and integration into pipelines
• Managed variants available, as well as open source
  • Astronomer, managed Airflow at AWS and Google
• Consists of: scheduler, worker, UI, DB, Flower (Celery, Redis)
• Parallel processing on several workers possible
• Scales thanks to container technology
• Possible alternatives: Dagster, Luigi, Prefect, …
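What "running a DAG" means can be sketched without Airflow itself: tasks declare their upstream dependencies, and a scheduler runs each task only after everything it depends on has finished. The sketch below uses `graphlib.TopologicalSorter` from the Python standard library (Python 3.9+); it is not Airflow's API, and the task names and the `run_pipeline` helper are hypothetical.

```python
from graphlib import TopologicalSorter
from typing import Callable, Dict, List, Set


def run_pipeline(dependencies: Dict[str, Set[str]],
                 tasks: Dict[str, Callable[[], None]]) -> List[str]:
    """Run every task after all of its upstream tasks; return the order used."""
    executed: List[str] = []
    # static_order() yields nodes so that dependencies always come first;
    # it raises graphlib.CycleError if the graph is not acyclic.
    for name in TopologicalSorter(dependencies).static_order():
        tasks[name]()
        executed.append(name)
    return executed


if __name__ == "__main__":
    # Hypothetical pipeline: each key maps to the set of tasks it depends on.
    dependencies = {
        "ingest": set(),
        "transform": {"ingest"},
        "publish_api": {"transform"},
    }
    tasks = {name: (lambda n=name: print(f"running {n}")) for name in dependencies}
    print(run_pipeline(dependencies, tasks))
```

In Airflow the nodes would be operators (for example one that triggers an ingest sync and one that starts a dbt run), and independent branches could run in parallel on several workers.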
03 ARCHITECTURE
MDS ARCHITECTURE WITHOUT SERVICE MESH
TYPICAL SERVICE MESH WITHOUT MDS
AND BOTH TOGETHER
KUMA IN ACTION
• All internal MDS services get sidecars
• Central overview of the services of all domains
  • Status of services
  • Metrics of services
• Traffic between components can be controlled
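The sidecar idea behind these points can be shown with a toy model: every call to a service passes through a proxy that counts requests and decides whether the caller is allowed, so the service itself contains none of that logic. This is a conceptual sketch only, not Kuma's API or configuration; the `Sidecar` class and the service names are hypothetical.

```python
import time
from collections import Counter
from typing import Callable, Set


class Sidecar:
    """Toy stand-in for a mesh data-plane proxy placed in front of one service."""

    def __init__(self, service: str, handler: Callable[[str], str],
                 allowed_sources: Set[str]) -> None:
        self.service = service
        self.handler = handler                  # the actual service logic
        self.allowed_sources = allowed_sources  # simple traffic permission list
        self.metrics: Counter = Counter()       # what a central overview would scrape
        self.seconds_busy = 0.0

    def call(self, source: str, payload: str) -> str:
        """Forward a request if the source is permitted, recording metrics either way."""
        self.metrics["requests_total"] += 1
        if source not in self.allowed_sources:
            self.metrics["requests_denied"] += 1
            raise PermissionError(f"{source} may not call {self.service}")
        started = time.perf_counter()
        try:
            return self.handler(payload)
        finally:
            self.seconds_busy += time.perf_counter() - started


if __name__ == "__main__":
    # Hypothetical setup: only the orchestrator may trigger the transformation service.
    proxy = Sidecar("transformation", lambda p: f"done: {p}", {"orchestrator"})
    print(proxy.call("orchestrator", "run nightly models"))
    print(dict(proxy.metrics))
```

Because the policy and the counters live in the proxy, they can be changed or collected centrally without touching the services behind it.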
METRICS & TRACING
Metrics & Tracing of internal MDS components:
WHAT PROBLEMS DOES A SERVICE MESH SOLVE IN THE MDS?
• Centralized service mesh implementation
• Centralized overview of all services
  • Internal – MDS
  • External – Data APIs
• Centralized monitoring of all services
  • Monitoring
  • Logging
  • Tracing
• Decreased time to market
  • Developers don't have to worry about recurring problems
• Security – TLS
• Authentication & authorization
• Support for exporting data & APIs
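As an illustration of the "Security – TLS" point, the following sketch shows what a service would have to configure itself to demand client certificates (mutual TLS), using Python's standard `ssl` module. In a mesh, the sidecars typically take over this work, which is one of the recurring problems developers no longer handle by hand. The function name is hypothetical, and the certificate paths are intentionally left out.

```python
import ssl


def build_mutual_tls_server_context() -> ssl.SSLContext:
    """Create a server-side TLS context that rejects clients without a valid certificate."""
    context = ssl.create_default_context(ssl.Purpose.CLIENT_AUTH)
    context.minimum_version = ssl.TLSVersion.TLSv1_2
    # CERT_REQUIRED on a server context means: the client must present a certificate.
    context.verify_mode = ssl.CERT_REQUIRED
    # A real deployment would additionally load the server's own certificate with
    # context.load_cert_chain(...) and the trusted CA with
    # context.load_verify_locations(...); both are omitted in this sketch.
    return context


if __name__ == "__main__":
    ctx = build_mutual_tls_server_context()
    print(ctx.verify_mode, ctx.minimum_version)
```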
04 SUMMARY
• Modern Data Stack as a distributed data platform
• One possible architecture to support a Data Mesh implementation
• Service mesh helps to
  • Secure…
  • Monitor…
  • Trace…
  …this Modern Data Stack architecture
• But: the complexity of the system increases further
• The team must have a deep understanding of service mesh
Q & A
Fabian Hardt
CONTACT
Solution Architect
Fabian.hardt@opitz-consulting.com
https://twitter.com/fabian_hardt
www.linkedin.com/in/fabian-hardt
