Data Tech Talk
ft. external speaker
Chris Riccomini, Author, Engineer & Manager, …
1. Talk: Data Warehousing Trends
2. Open Dialogue: Q & A
Data Warehousing Trends
2021/12/09 · Chris Riccomini
Hi, I’m Chris
● Engineer & Manager
WePay, LinkedIn, PayPal
● Open Source
Apache Samza, Apache Airflow, Debezium
● Author
The Missing README
● Investor & Advisor
Prefect, Meroxa, StarTree, Amundsen, Anomalo, TopCoat, ...
@criccomini
The Trends
● Realtime DWHs
● Analytics Engineering
● Data Mesh
● Data Catalogs
● Reverse ETL
● Headless BI
● Data Integrity
● Data Lakehouses
● DataOps
● White-label Data Viz
https://twitter.com/criccomini/status/1451557884769169412
https://preset.io/blog/reshaping-data-engineering/
Realtime DWHs
(Diagram: Batch ETL, Pubsub, Realtime DWH)
Why Realtime DWHs?
● Debugging
○ Investigate application errors
○ Audit log shows how things changed
● Operational
○ Monitoring
○ Scripts that pull from DWH
● Security/Compliance
○ Audit log
● Customer data products
○ Ad hoc customer reports (e.g. Stripe Sigma, WePay txns)
○ Data clean rooms
Realtime DWH Technical Advantages
● Handles hard deletes
● No schema requirements (e.g., no last-modified timestamp column needed)
● Replay from Kafka
● Data integration
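To make the first three advantages concrete, here is a minimal Python sketch. It assumes simplified Debezium-style change events (an `op` code of `c`, `u`, `d`, or `r` plus `before`/`after` row images), a plain dict standing in for the warehouse table, and a Python list standing in for the Kafka topic; `apply_cdc_events` and `batch_incremental_load` are hypothetical names. The batch loader only picks up rows whose `updated_at` is newer than its last run, so it needs a timestamp column and never notices a hard delete. The CDC path handles both, and replaying the same event log into an empty table rebuilds the current state.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]
Table = Dict[Any, Row]


def apply_cdc_events(table: Table, events: List[Dict[str, Any]], key: str = "id") -> Table:
    """Apply simplified Debezium-style change events (hypothetical helper)."""
    for event in events:
        op = event["op"]
        if op in ("c", "r", "u"):
            # Create, snapshot read, or update: upsert the "after" image.
            row = event["after"]
            table[row[key]] = row
        elif op == "d":
            # Hard delete: the "before" image tells us which key to drop.
            table.pop(event["before"][key], None)
        else:
            raise ValueError(f"unsupported op: {op!r}")
    return table


def batch_incremental_load(table: Table, source_rows: List[Row], since: int, key: str = "id") -> Table:
    """Timestamp-based incremental batch ETL (hypothetical helper).

    Requires an updated_at column, and cannot see hard-deleted rows because
    they are simply absent from source_rows.
    """
    for row in source_rows:
        if row["updated_at"] > since:
            table[row[key]] = row
    return table


# A list stands in for the Kafka topic holding the change log.
events = [
    {"op": "c", "before": None, "after": {"id": 1, "status": "new"}},
    {"op": "c", "before": None, "after": {"id": 2, "status": "new"}},
    {"op": "u", "before": {"id": 1, "status": "new"}, "after": {"id": 1, "status": "paid"}},
    {"op": "d", "before": {"id": 2, "status": "new"}, "after": None},
]
# Replaying the full log into an empty table rebuilds the current state.
print(apply_cdc_events({}, events))  # {1: {'id': 1, 'status': 'paid'}}

# Batch path: rows 1 and 2 were copied at t=0; since then row 1 was updated
# at t=5 and row 2 was hard-deleted in the source database.
warehouse = {1: {"id": 1, "status": "new", "updated_at": 0},
             2: {"id": 2, "status": "new", "updated_at": 0}}
source = [{"id": 1, "status": "paid", "updated_at": 5}]
print(sorted(batch_incremental_load(warehouse, source, since=0)))  # [1, 2]
```

Row 2 lingers in the batch-loaded table indefinitely, which is the hard-delete gap the slide refers to.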
Realtime DWH Drawbacks
● Operationally complex
● Depends on source DB support (for CDC)
● Inline transformation is harder
● Fixing bad data is harder
Data Mesh
“A data mesh is a type of data platform architecture that embraces the ubiquity of
data in the enterprise by leveraging a domain-oriented, self-serve design.”
● Domain-oriented decentralized data ownership and architecture
● Data as a product
● Self-serve data infrastructure as a platform
● Federated computational governance
https://towardsdatascience.com/what-is-a-data-mesh-and-how-not-to-mesh-it-up-210710bb41e0
wat.
Data is a Product
“A product is any item or service you sell to serve a customer's need or want.”
https://www.aha.io/roadmapping/guide/product-management/what-is-a-product
Data is a Product
● Customers
○ Data scientists
○ Business analysts
○ Finance
○ Sales
○ Product managers
○ Engineers
○ External customers
● Products
○ Recommender systems
○ Billing
○ Fraud
○ Reports
○ Dashboards
https://cnr.sh/essays/what-the-heck-data-mesh
Treat Data Models like APIs
● Versioned
● Compatible
● Documented
● Monitored
● Self-serve
● Secure (AuthN, AuthZ)
https://cnr.sh/essays/what-the-heck-data-mesh
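Treating a data model like an API usually means running a compatibility check before a new version is published, the same way a service guards its public interface. A minimal sketch, assuming a data model is just a name, a version number, a column-to-type mapping, and a description. `DataModel` and `breaking_changes` are hypothetical names, and the rule used here (a new version may add columns but must not drop or retype existing ones) is one simple policy, not the rule of any particular schema registry.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class DataModel:
    """A published data model, described like an API contract (hypothetical)."""
    name: str
    version: int
    fields: Dict[str, str]  # column name -> type name
    description: str = ""


def breaking_changes(old: DataModel, new: DataModel) -> List[str]:
    """List the ways `new` would break consumers of `old`; empty means safe."""
    problems: List[str] = []
    if not new.description:
        problems.append("new version is undocumented")
    if new.version <= old.version:
        problems.append(f"version must increase (was {old.version}, got {new.version})")
    for column, col_type in old.fields.items():
        if column not in new.fields:
            problems.append(f"removed column '{column}'")
        elif new.fields[column] != col_type:
            problems.append(f"column '{column}' changed type {col_type} -> {new.fields[column]}")
    return problems


v1 = DataModel("payments", 1, {"id": "int", "amount": "decimal"}, "One row per payment")
v2 = DataModel("payments", 2, {"id": "int", "amount": "decimal", "currency": "string"}, "One row per payment")
v3 = DataModel("payments", 3, {"id": "string"}, "One row per payment")
print(breaking_changes(v1, v2))  # []
print(breaking_changes(v2, v3))
# ["column 'id' changed type int -> string", "removed column 'amount'", "removed column 'currency'"]
```

Running a check like this in CI is what makes “Versioned”, “Compatible”, and “Documented” enforceable rather than aspirational.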
We’ve Done This Before
● Microservice
● DevOps
https://cnr.sh/essays/what-the-heck-data-mesh
Headless BI
Metrics then
● BI tools to create and visualize metrics
○ Looker
○ Mode
○ Tableau
○ Data Studio
● Answer internal business questions
○ How healthy is a product?
○ What does revenue look like?
Metrics now
● Metrics matter for external business workflows
○ Predicting when a customer might churn
○ Notifying users when they reach their capacity limit
○ Data scientists want to build models that optimize certain metrics
○ Computing customer bills
● BI tools aren’t meant for this
○ Walled garden
○ Have to re-implement the same metrics in different systems
Headless BI
“For data consumption, we heard complaints from decision makers that different
teams reported different numbers for very simple business questions, and
there was no easy way to know which number was correct.”
https://medium.com/airbnb-engineering/how-airbnb-achieved-metric-consistency-at-scale-f23cc53dea70
“...the teams that own metrics would be able to define them once, in a way
that’s consistent across dashboards, automation tools, sales reporting, and so
on. Let’s call this ‘Headless BI’.”
https://basecase.vc/blog/headless-bi
Headless BI
● Programmatically manage business metrics
○ Automated
○ Centralized
○ Documented
○ Validated
○ Metadata/lineage
○ Backfills
○ Cost
○ Privacy
○ Access
○ Deprecation
○ Retention
https://medium.com/airbnb-engineering/how-airbnb-achieved-metric-consistency-at-scale-f23cc53dea70
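Headless BI boils down to defining a metric once and letting every consumer (dashboards, automation, billing jobs) request it programmatically. A minimal sketch, assuming a metric is a single SQL aggregate over one table plus a list of allowed dimensions. `Metric` and `MetricRegistry` are hypothetical and are not the API of Airbnb's Minerva or any other metrics layer; real systems also handle joins, time grains, backfills, and access control, which this omits.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Metric:
    name: str
    table: str
    expression: str  # SQL aggregate, e.g. "SUM(amount)"
    description: str
    owner: str
    dimensions: Tuple[str, ...] = ()


class MetricRegistry:
    """Single place where metrics are defined, documented, and compiled."""

    def __init__(self) -> None:
        self._metrics: Dict[str, Metric] = {}

    def register(self, metric: Metric) -> None:
        # Centralized: one definition per name. Documented: description and owner required.
        if metric.name in self._metrics:
            raise ValueError(f"metric {metric.name!r} is already defined")
        if not metric.description or not metric.owner:
            raise ValueError("every metric needs a description and an owner")
        self._metrics[metric.name] = metric

    def to_sql(self, name: str, group_by: Optional[List[str]] = None) -> str:
        """Compile a metric into SQL so every consumer gets the same definition."""
        metric = self._metrics[name]
        group_by = group_by or []
        unknown = [d for d in group_by if d not in metric.dimensions]
        if unknown:
            raise ValueError(f"{name!r} cannot be grouped by {unknown}")
        select = ", ".join(group_by + [f"{metric.expression} AS {metric.name}"])
        sql = f"SELECT {select} FROM {metric.table}"
        if group_by:
            sql += " GROUP BY " + ", ".join(group_by)
        return sql


registry = MetricRegistry()
registry.register(Metric("revenue", "payments", "SUM(amount)",
                         "Gross payment volume", "finance", ("country",)))
print(registry.to_sql("revenue", ["country"]))
# SELECT country, SUM(amount) AS revenue FROM payments GROUP BY country
```

A dashboard, a churn model, and a billing job would all call `to_sql("revenue", ...)` instead of re-implementing `SUM(amount)` in three places.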
Headless BI
https://medium.com/airbnb-engineering/how-airbnb-achieved-metric-consistency-at-scale-f23cc53dea70
Q&A
● Realtime DWHs
● Analytics Engineering
● Data Mesh
● Data Catalogs
● Reverse ETL
● Headless BI
● Data Integrity
● Data Lakehouses
● DataOps
● White-label Data Viz
Appendix
Analytics Engineering
Analytics engineers provide clean data sets to end users, modeling data in a way
that empowers end users to answer their own questions.
While a data analyst spends their time analyzing data, an analytics engineer
spends their time transforming, testing, deploying, and documenting data.
Analytics engineers apply software engineering best practices like version
control and continuous integration to the analytics code base.
https://www.getdbt.com/what-is-analytics-engineering
Analytics Engineering
● Job
○ Building
○ Testing
○ Cataloging
● Tools
○ dbt
○ Airflow
● Customers
○ Data science
○ Data analysts
○ BI
○ Reporting
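The “testing” part of the job is often expressed as declarative checks that run in CI next to the transformation code; dbt's built-in `unique` and `not_null` tests are common examples. Below is a minimal Python sketch of the same idea, assuming rows arrive as dicts. `check_unique`, `check_not_null`, and `run_checks` are hypothetical helpers, not dbt's implementation (dbt compiles its tests to SQL and runs them in the warehouse).

```python
from collections import Counter
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]


def check_not_null(rows: List[Row], column: str) -> List[Row]:
    """Return rows where the column is missing or None."""
    return [row for row in rows if row.get(column) is None]


def check_unique(rows: List[Row], column: str) -> List[Any]:
    """Return values that appear more than once (nulls are ignored)."""
    counts = Counter(row[column] for row in rows if row.get(column) is not None)
    return [value for value, n in counts.items() if n > 1]


CHECKS: Dict[str, Callable[[List[Row], str], List[Any]]] = {
    "not_null": check_not_null,
    "unique": check_unique,
}


def run_checks(rows: List[Row], spec: Dict[str, List[str]]) -> Dict[str, List[Any]]:
    """Run every configured check; an empty result means all checks passed."""
    failures: Dict[str, List[Any]] = {}
    for column, check_names in spec.items():
        for check_name in check_names:
            offending = CHECKS[check_name](rows, column)
            if offending:
                failures[f"{check_name}({column})"] = offending
    return failures


rows = [
    {"id": 1, "email": "a@example.com"},
    {"id": 2, "email": None},
    {"id": 2, "email": "b@example.com"},
]
print(run_checks(rows, {"id": ["unique", "not_null"], "email": ["not_null"]}))
# {'unique(id)': [2], 'not_null(email)': [{'id': 2, 'email': None}]}
```

As with dbt, a check “passes” when it finds nothing, so a CI job can simply fail whenever the returned dict is non-empty.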
Data Catalogs
● Flavor of the month
○ Amundsen
○ DataHub
○ Metaphor
○ Marquez
○ Atlan
○ Collibra
○ Alation
● Use cases
○ Discoverability
○ Operations
○ Governance
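For the “Operations” use case, a catalog's lineage graph answers questions like “if this table breaks, what else is affected?” A minimal sketch, assuming lineage is stored as a dict from each dataset to the datasets built directly from it. The dataset names and the `downstream_of` helper are hypothetical; catalogs such as Amundsen or DataHub expose lineage through their own APIs rather than this structure.

```python
from collections import deque
from typing import Dict, List, Set


def downstream_of(lineage: Dict[str, List[str]], dataset: str) -> Set[str]:
    """Breadth-first walk of the lineage graph starting at `dataset`.

    Returns every dataset that depends on it directly or transitively.
    The `affected` set doubles as a visited set, so cycles and
    diamond-shaped graphs do not cause repeated work or infinite loops.
    """
    affected: Set[str] = set()
    queue = deque([dataset])
    while queue:
        current = queue.popleft()
        for child in lineage.get(current, []):
            if child not in affected:
                affected.add(child)
                queue.append(child)
    return affected


lineage = {
    "raw.payments": ["stg.payments"],
    "stg.payments": ["fct.revenue", "fct.fraud_signals"],
    "fct.revenue": ["dashboard.finance_weekly"],
}
print(sorted(downstream_of(lineage, "stg.payments")))
# ['dashboard.finance_weekly', 'fct.fraud_signals', 'fct.revenue']
```

Pairing this with ownership metadata from the catalog tells an on-call engineer whom to notify, which is where discoverability and operations overlap.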
Reverse ETL
“Reverse ETL syncs data from a system of records like a warehouse to a system
of actions like CRM, MAP, and other SaaS apps to operationalize data.”
https://blog.getcensus.com/what-is-reverse-etl/
Reverse ETL
https://blog.getcensus.com/what-is-reverse-etl/
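Much of the engineering in reverse ETL is deciding what to send. SaaS APIs are usually rate-limited, so a common approach is to diff the latest warehouse query result against what was sent last time and push only the changes. A minimal sketch, assuming both snapshots are dicts keyed by a record ID; `plan_sync` is hypothetical, and the actual push to a CRM (with that vendor's API, batching, and retries) is left out.

```python
from typing import Any, Dict, List, Tuple

Record = Dict[str, Any]


def plan_sync(previous: Dict[str, Record], current: Dict[str, Record]) -> Tuple[List[Record], List[str]]:
    """Compare the last synced snapshot with the latest warehouse result.

    Returns (upserts, deletes): records that are new or changed, and IDs
    that disappeared from the warehouse result since the last sync.
    """
    upserts = [record for key, record in current.items() if previous.get(key) != record]
    deletes = [key for key in previous if key not in current]
    return upserts, deletes


previous = {"acct_1": {"id": "acct_1", "ltv": 100}, "acct_2": {"id": "acct_2", "ltv": 40}}
current = {"acct_1": {"id": "acct_1", "ltv": 120}, "acct_3": {"id": "acct_3", "ltv": 15}}
upserts, deletes = plan_sync(previous, current)
print(upserts)  # [{'id': 'acct_1', 'ltv': 120}, {'id': 'acct_3', 'ltv': 15}]
print(deletes)  # ['acct_2']
```

After a successful push, `current` becomes the next run's `previous`, so unchanged records are never re-sent.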
Photos
● https://unsplash.com/photos/Lbvi0GGJWY4
● https://unsplash.com/photos/8vTAAFYhFfQ
● https://www.flickr.com/photos/jurgenappelo/5201275209
Realtime DWHs

More Related Content

What's hot

What's Digital Transformation?
What's Digital Transformation?What's Digital Transformation?
What's Digital Transformation?
Hải Phạm
 
Enterprise Data Governance for Financial Institutions
Enterprise Data Governance for Financial InstitutionsEnterprise Data Governance for Financial Institutions
Enterprise Data Governance for Financial InstitutionsSheldon McCarthy
 
DAS Slides: Data Architect vs. Data Engineer vs. Data Modeler
DAS Slides: Data Architect vs. Data Engineer vs. Data ModelerDAS Slides: Data Architect vs. Data Engineer vs. Data Modeler
DAS Slides: Data Architect vs. Data Engineer vs. Data Modeler
DATAVERSITY
 
Building a Data Strategy – Practical Steps for Aligning with Business Goals
Building a Data Strategy – Practical Steps for Aligning with Business GoalsBuilding a Data Strategy – Practical Steps for Aligning with Business Goals
Building a Data Strategy – Practical Steps for Aligning with Business Goals
DATAVERSITY
 
BI Consultancy - Data, Analytics and Strategy
BI Consultancy - Data, Analytics and StrategyBI Consultancy - Data, Analytics and Strategy
BI Consultancy - Data, Analytics and Strategy
Shivam Dhawan
 
Data Mesh
Data MeshData Mesh
You Need a Data Catalog. Do You Know Why?
You Need a Data Catalog. Do You Know Why?You Need a Data Catalog. Do You Know Why?
You Need a Data Catalog. Do You Know Why?
Precisely
 
Enterprise Data Architecture Deliverables
Enterprise Data Architecture DeliverablesEnterprise Data Architecture Deliverables
Enterprise Data Architecture Deliverables
Lars E Martinsson
 
Data Architecture Brief Overview
Data Architecture Brief OverviewData Architecture Brief Overview
Data Architecture Brief Overview
Hal Kalechofsky
 
Collibra Data Citizen '19 - Bridging Data Privacy with Data Governance
Collibra Data Citizen '19 - Bridging Data Privacy with Data Governance Collibra Data Citizen '19 - Bridging Data Privacy with Data Governance
Collibra Data Citizen '19 - Bridging Data Privacy with Data Governance
BigID Inc
 
Practicing Data Science: A Collection of Case Studies
Practicing Data Science: A Collection of Case StudiesPracticing Data Science: A Collection of Case Studies
Practicing Data Science: A Collection of Case Studies
KNIMESlides
 
DAS Slides: Enterprise Architecture vs. Data Architecture
DAS Slides: Enterprise Architecture vs. Data ArchitectureDAS Slides: Enterprise Architecture vs. Data Architecture
DAS Slides: Enterprise Architecture vs. Data Architecture
DATAVERSITY
 
Data Management, Metadata Management, and Data Governance – Working Together
Data Management, Metadata Management, and Data Governance – Working TogetherData Management, Metadata Management, and Data Governance – Working Together
Data Management, Metadata Management, and Data Governance – Working Together
DATAVERSITY
 
8 Steps to Creating a Data Strategy
8 Steps to Creating a Data Strategy8 Steps to Creating a Data Strategy
8 Steps to Creating a Data Strategy
Silicon Valley Data Science
 
Data Quality Best Practices
Data Quality Best PracticesData Quality Best Practices
Data Quality Best Practices
DATAVERSITY
 
Data strategy demistifying data
Data strategy demistifying dataData strategy demistifying data
Data strategy demistifying data
Hans Verstraeten
 
Most Common Data Governance Challenges in the Digital Economy
Most Common Data Governance Challenges in the Digital EconomyMost Common Data Governance Challenges in the Digital Economy
Most Common Data Governance Challenges in the Digital Economy
Robyn Bollhorst
 
DAS Slides: Building a Data Strategy — Practical Steps for Aligning with Busi...
DAS Slides: Building a Data Strategy — Practical Steps for Aligning with Busi...DAS Slides: Building a Data Strategy — Practical Steps for Aligning with Busi...
DAS Slides: Building a Data Strategy — Practical Steps for Aligning with Busi...
DATAVERSITY
 
Digital Transformation
Digital TransformationDigital Transformation
Digital Transformation
Vishal Sharma
 
Slides: Taking an Active Approach to Data Governance
Slides: Taking an Active Approach to Data GovernanceSlides: Taking an Active Approach to Data Governance
Slides: Taking an Active Approach to Data Governance
DATAVERSITY
 

What's hot (20)

What's Digital Transformation?
What's Digital Transformation?What's Digital Transformation?
What's Digital Transformation?
 
Enterprise Data Governance for Financial Institutions
Enterprise Data Governance for Financial InstitutionsEnterprise Data Governance for Financial Institutions
Enterprise Data Governance for Financial Institutions
 
DAS Slides: Data Architect vs. Data Engineer vs. Data Modeler
DAS Slides: Data Architect vs. Data Engineer vs. Data ModelerDAS Slides: Data Architect vs. Data Engineer vs. Data Modeler
DAS Slides: Data Architect vs. Data Engineer vs. Data Modeler
 
Building a Data Strategy – Practical Steps for Aligning with Business Goals
Building a Data Strategy – Practical Steps for Aligning with Business GoalsBuilding a Data Strategy – Practical Steps for Aligning with Business Goals
Building a Data Strategy – Practical Steps for Aligning with Business Goals
 
BI Consultancy - Data, Analytics and Strategy
BI Consultancy - Data, Analytics and StrategyBI Consultancy - Data, Analytics and Strategy
BI Consultancy - Data, Analytics and Strategy
 
Data Mesh
Data MeshData Mesh
Data Mesh
 
You Need a Data Catalog. Do You Know Why?
You Need a Data Catalog. Do You Know Why?You Need a Data Catalog. Do You Know Why?
You Need a Data Catalog. Do You Know Why?
 
Enterprise Data Architecture Deliverables
Enterprise Data Architecture DeliverablesEnterprise Data Architecture Deliverables
Enterprise Data Architecture Deliverables
 
Data Architecture Brief Overview
Data Architecture Brief OverviewData Architecture Brief Overview
Data Architecture Brief Overview
 
Collibra Data Citizen '19 - Bridging Data Privacy with Data Governance
Collibra Data Citizen '19 - Bridging Data Privacy with Data Governance Collibra Data Citizen '19 - Bridging Data Privacy with Data Governance
Collibra Data Citizen '19 - Bridging Data Privacy with Data Governance
 
Practicing Data Science: A Collection of Case Studies
Practicing Data Science: A Collection of Case StudiesPracticing Data Science: A Collection of Case Studies
Practicing Data Science: A Collection of Case Studies
 
DAS Slides: Enterprise Architecture vs. Data Architecture
DAS Slides: Enterprise Architecture vs. Data ArchitectureDAS Slides: Enterprise Architecture vs. Data Architecture
DAS Slides: Enterprise Architecture vs. Data Architecture
 
Data Management, Metadata Management, and Data Governance – Working Together
Data Management, Metadata Management, and Data Governance – Working TogetherData Management, Metadata Management, and Data Governance – Working Together
Data Management, Metadata Management, and Data Governance – Working Together
 
8 Steps to Creating a Data Strategy
8 Steps to Creating a Data Strategy8 Steps to Creating a Data Strategy
8 Steps to Creating a Data Strategy
 
Data Quality Best Practices
Data Quality Best PracticesData Quality Best Practices
Data Quality Best Practices
 
Data strategy demistifying data
Data strategy demistifying dataData strategy demistifying data
Data strategy demistifying data
 
Most Common Data Governance Challenges in the Digital Economy
Most Common Data Governance Challenges in the Digital EconomyMost Common Data Governance Challenges in the Digital Economy
Most Common Data Governance Challenges in the Digital Economy
 
DAS Slides: Building a Data Strategy — Practical Steps for Aligning with Busi...
DAS Slides: Building a Data Strategy — Practical Steps for Aligning with Busi...DAS Slides: Building a Data Strategy — Practical Steps for Aligning with Busi...
DAS Slides: Building a Data Strategy — Practical Steps for Aligning with Busi...
 
Digital Transformation
Digital TransformationDigital Transformation
Digital Transformation
 
Slides: Taking an Active Approach to Data Governance
Slides: Taking an Active Approach to Data GovernanceSlides: Taking an Active Approach to Data Governance
Slides: Taking an Active Approach to Data Governance
 

Similar to Data Warehousing Trends

Building the Artificially Intelligent Enterprise
Building the Artificially Intelligent EnterpriseBuilding the Artificially Intelligent Enterprise
Building the Artificially Intelligent Enterprise
Databricks
 
Accelerate Self-Service Analytics with Data Virtualization and Visualization
Accelerate Self-Service Analytics with Data Virtualization and VisualizationAccelerate Self-Service Analytics with Data Virtualization and Visualization
Accelerate Self-Service Analytics with Data Virtualization and Visualization
Denodo
 
Running Data Platforms Like Products
Running Data Platforms Like ProductsRunning Data Platforms Like Products
Running Data Platforms Like Products
VMware Tanzu
 
DATA BI: put key insights at the finger tip of decision makers.
DATA BI: put key insights at the finger tip of decision makers.DATA BI: put key insights at the finger tip of decision makers.
DATA BI: put key insights at the finger tip of decision makers.
ZaraaTitima1
 
Data Architecture, Solution Architecture, Platform Architecture — What’s the ...
Data Architecture, Solution Architecture, Platform Architecture — What’s the ...Data Architecture, Solution Architecture, Platform Architecture — What’s the ...
Data Architecture, Solution Architecture, Platform Architecture — What’s the ...
DATAVERSITY
 
In-Memory Computing Webcast. Market Predictions 2017
In-Memory Computing Webcast. Market Predictions 2017In-Memory Computing Webcast. Market Predictions 2017
In-Memory Computing Webcast. Market Predictions 2017
SingleStore
 
How To Run A Successful BI Project with Hadoop
How To Run A Successful BI Project with HadoopHow To Run A Successful BI Project with Hadoop
How To Run A Successful BI Project with Hadoop
Mammoth Data
 
The Agile Analyst: Solving the Data Problem with Virtualization
The Agile Analyst: Solving the Data Problem with VirtualizationThe Agile Analyst: Solving the Data Problem with Virtualization
The Agile Analyst: Solving the Data Problem with Virtualization
Inside Analysis
 
The ABCs of Treating Data as Product
The ABCs of Treating Data as ProductThe ABCs of Treating Data as Product
The ABCs of Treating Data as Product
DATAVERSITY
 
Power to the People: A Stack to Empower Every User to Make Data-Driven Decisions
Power to the People: A Stack to Empower Every User to Make Data-Driven DecisionsPower to the People: A Stack to Empower Every User to Make Data-Driven Decisions
Power to the People: A Stack to Empower Every User to Make Data-Driven Decisions
Looker
 
Introduction to Big Data using AWS Services
Introduction to Big Data using AWS ServicesIntroduction to Big Data using AWS Services
Introduction to Big Data using AWS Services
Anjani Phuyal
 
When and How Data Lakes Fit into a Modern Data Architecture
When and How Data Lakes Fit into a Modern Data ArchitectureWhen and How Data Lakes Fit into a Modern Data Architecture
When and How Data Lakes Fit into a Modern Data Architecture
DATAVERSITY
 
Company Profile - NPC with TIBCO Spotfire solution
Company Profile - NPC with TIBCO Spotfire solution  Company Profile - NPC with TIBCO Spotfire solution
Company Profile - NPC with TIBCO Spotfire solution
Sirinporn Setworaya
 
Digital Operations Service Design
Digital Operations Service DesignDigital Operations Service Design
Digital Operations Service Design
NVISIA
 
No Compromises SQL Connectivity for MongoDB
No Compromises SQL Connectivity for MongoDBNo Compromises SQL Connectivity for MongoDB
No Compromises SQL Connectivity for MongoDB
MongoDB
 
20180701 - 1st Meeting - Data Science Orientation
20180701 - 1st Meeting - Data Science Orientation20180701 - 1st Meeting - Data Science Orientation
20180701 - 1st Meeting - Data Science Orientation
Duc Lai Trung Minh
 
Intro to Neo4j
Intro to Neo4jIntro to Neo4j
Intro to Neo4j
Neo4j
 
SPT 104 Unlock your big data with analytics and BI on Office 365
SPT 104 Unlock your big data with analytics and BI on Office 365SPT 104 Unlock your big data with analytics and BI on Office 365
SPT 104 Unlock your big data with analytics and BI on Office 365
Brian Culver
 
Paving The Way To Data Driven
Paving The Way To Data DrivenPaving The Way To Data Driven
Paving The Way To Data Driven
Mohd Izhar Firdaus Ismail
 
Big Data Analytics with Microsoft
Big Data Analytics with MicrosoftBig Data Analytics with Microsoft
Big Data Analytics with Microsoft
Caserta
 

Similar to Data Warehousing Trends (20)

Building the Artificially Intelligent Enterprise
Building the Artificially Intelligent EnterpriseBuilding the Artificially Intelligent Enterprise
Building the Artificially Intelligent Enterprise
 
Accelerate Self-Service Analytics with Data Virtualization and Visualization
Accelerate Self-Service Analytics with Data Virtualization and VisualizationAccelerate Self-Service Analytics with Data Virtualization and Visualization
Accelerate Self-Service Analytics with Data Virtualization and Visualization
 
Running Data Platforms Like Products
Running Data Platforms Like ProductsRunning Data Platforms Like Products
Running Data Platforms Like Products
 
DATA BI: put key insights at the finger tip of decision makers.
DATA BI: put key insights at the finger tip of decision makers.DATA BI: put key insights at the finger tip of decision makers.
DATA BI: put key insights at the finger tip of decision makers.
 
Data Architecture, Solution Architecture, Platform Architecture — What’s the ...
Data Architecture, Solution Architecture, Platform Architecture — What’s the ...Data Architecture, Solution Architecture, Platform Architecture — What’s the ...
Data Architecture, Solution Architecture, Platform Architecture — What’s the ...
 
In-Memory Computing Webcast. Market Predictions 2017
In-Memory Computing Webcast. Market Predictions 2017In-Memory Computing Webcast. Market Predictions 2017
In-Memory Computing Webcast. Market Predictions 2017
 
How To Run A Successful BI Project with Hadoop
How To Run A Successful BI Project with HadoopHow To Run A Successful BI Project with Hadoop
How To Run A Successful BI Project with Hadoop
 
The Agile Analyst: Solving the Data Problem with Virtualization
The Agile Analyst: Solving the Data Problem with VirtualizationThe Agile Analyst: Solving the Data Problem with Virtualization
The Agile Analyst: Solving the Data Problem with Virtualization
 
The ABCs of Treating Data as Product
The ABCs of Treating Data as ProductThe ABCs of Treating Data as Product
The ABCs of Treating Data as Product
 
Power to the People: A Stack to Empower Every User to Make Data-Driven Decisions
Power to the People: A Stack to Empower Every User to Make Data-Driven DecisionsPower to the People: A Stack to Empower Every User to Make Data-Driven Decisions
Power to the People: A Stack to Empower Every User to Make Data-Driven Decisions
 
Introduction to Big Data using AWS Services
Introduction to Big Data using AWS ServicesIntroduction to Big Data using AWS Services
Introduction to Big Data using AWS Services
 
When and How Data Lakes Fit into a Modern Data Architecture
When and How Data Lakes Fit into a Modern Data ArchitectureWhen and How Data Lakes Fit into a Modern Data Architecture
When and How Data Lakes Fit into a Modern Data Architecture
 
Company Profile - NPC with TIBCO Spotfire solution
Company Profile - NPC with TIBCO Spotfire solution  Company Profile - NPC with TIBCO Spotfire solution
Company Profile - NPC with TIBCO Spotfire solution
 
Digital Operations Service Design
Digital Operations Service DesignDigital Operations Service Design
Digital Operations Service Design
 
No Compromises SQL Connectivity for MongoDB
No Compromises SQL Connectivity for MongoDBNo Compromises SQL Connectivity for MongoDB
No Compromises SQL Connectivity for MongoDB
 
20180701 - 1st Meeting - Data Science Orientation
20180701 - 1st Meeting - Data Science Orientation20180701 - 1st Meeting - Data Science Orientation
20180701 - 1st Meeting - Data Science Orientation
 
Intro to Neo4j
Intro to Neo4jIntro to Neo4j
Intro to Neo4j
 
SPT 104 Unlock your big data with analytics and BI on Office 365
SPT 104 Unlock your big data with analytics and BI on Office 365SPT 104 Unlock your big data with analytics and BI on Office 365
SPT 104 Unlock your big data with analytics and BI on Office 365
 
Paving The Way To Data Driven
Paving The Way To Data DrivenPaving The Way To Data Driven
Paving The Way To Data Driven
 
Big Data Analytics with Microsoft
Big Data Analytics with MicrosoftBig Data Analytics with Microsoft
Big Data Analytics with Microsoft
 

More from Chris Riccomini

What Your Tech Lead Thinks You Know (But Didn't Teach You)
What Your Tech Lead Thinks You Know (But Didn't Teach You)What Your Tech Lead Thinks You Know (But Didn't Teach You)
What Your Tech Lead Thinks You Know (But Didn't Teach You)
Chris Riccomini
 
The Future of Data Engineering - 2019 InfoQ QConSF
The Future of Data Engineering - 2019 InfoQ QConSFThe Future of Data Engineering - 2019 InfoQ QConSF
The Future of Data Engineering - 2019 InfoQ QConSF
Chris Riccomini
 
Airflow at WePay
Airflow at WePayAirflow at WePay
Airflow at WePay
Chris Riccomini
 
Apache Incubator Samza: Stream Processing at LinkedIn
Apache Incubator Samza: Stream Processing at LinkedInApache Incubator Samza: Stream Processing at LinkedIn
Apache Incubator Samza: Stream Processing at LinkedIn
Chris Riccomini
 
Apache Incubator Samza: Stream Processing at LinkedIn
Apache Incubator Samza: Stream Processing at LinkedInApache Incubator Samza: Stream Processing at LinkedIn
Apache Incubator Samza: Stream Processing at LinkedIn
Chris Riccomini
 
Building Applications on YARN
Building Applications on YARNBuilding Applications on YARN
Building Applications on YARN
Chris Riccomini
 

More from Chris Riccomini (6)

What Your Tech Lead Thinks You Know (But Didn't Teach You)
What Your Tech Lead Thinks You Know (But Didn't Teach You)What Your Tech Lead Thinks You Know (But Didn't Teach You)
What Your Tech Lead Thinks You Know (But Didn't Teach You)
 
The Future of Data Engineering - 2019 InfoQ QConSF
The Future of Data Engineering - 2019 InfoQ QConSFThe Future of Data Engineering - 2019 InfoQ QConSF
The Future of Data Engineering - 2019 InfoQ QConSF
 
Airflow at WePay
Airflow at WePayAirflow at WePay
Airflow at WePay
 
Apache Incubator Samza: Stream Processing at LinkedIn
Apache Incubator Samza: Stream Processing at LinkedInApache Incubator Samza: Stream Processing at LinkedIn
Apache Incubator Samza: Stream Processing at LinkedIn
 
Apache Incubator Samza: Stream Processing at LinkedIn
Apache Incubator Samza: Stream Processing at LinkedInApache Incubator Samza: Stream Processing at LinkedIn
Apache Incubator Samza: Stream Processing at LinkedIn
 
Building Applications on YARN
Building Applications on YARNBuilding Applications on YARN
Building Applications on YARN
 

Recently uploaded

Industrial Training at Shahjalal Fertilizer Company Limited (SFCL)
Industrial Training at Shahjalal Fertilizer Company Limited (SFCL)Industrial Training at Shahjalal Fertilizer Company Limited (SFCL)
Industrial Training at Shahjalal Fertilizer Company Limited (SFCL)
MdTanvirMahtab2
 
AP LAB PPT.pdf ap lab ppt no title specific
AP LAB PPT.pdf ap lab ppt no title specificAP LAB PPT.pdf ap lab ppt no title specific
AP LAB PPT.pdf ap lab ppt no title specific
BrazilAccount1
 
H.Seo, ICLR 2024, MLILAB, KAIST AI.pdf
H.Seo,  ICLR 2024, MLILAB,  KAIST AI.pdfH.Seo,  ICLR 2024, MLILAB,  KAIST AI.pdf
H.Seo, ICLR 2024, MLILAB, KAIST AI.pdf
MLILAB
 
Fundamentals of Electric Drives and its applications.pptx
Fundamentals of Electric Drives and its applications.pptxFundamentals of Electric Drives and its applications.pptx
Fundamentals of Electric Drives and its applications.pptx
manasideore6
 
Planning Of Procurement o different goods and services
Planning Of Procurement o different goods and servicesPlanning Of Procurement o different goods and services
Planning Of Procurement o different goods and services
JoytuBarua2
 
ML for identifying fraud using open blockchain data.pptx
ML for identifying fraud using open blockchain data.pptxML for identifying fraud using open blockchain data.pptx
ML for identifying fraud using open blockchain data.pptx
Vijay Dialani, PhD
 
power quality voltage fluctuation UNIT - I.pptx
power quality voltage fluctuation UNIT - I.pptxpower quality voltage fluctuation UNIT - I.pptx
power quality voltage fluctuation UNIT - I.pptx
ViniHema
 
J.Yang, ICLR 2024, MLILAB, KAIST AI.pdf
J.Yang,  ICLR 2024, MLILAB, KAIST AI.pdfJ.Yang,  ICLR 2024, MLILAB, KAIST AI.pdf
J.Yang, ICLR 2024, MLILAB, KAIST AI.pdf
MLILAB
 
在线办理(ANU毕业证书)澳洲国立大学毕业证录取通知书一模一样
在线办理(ANU毕业证书)澳洲国立大学毕业证录取通知书一模一样在线办理(ANU毕业证书)澳洲国立大学毕业证录取通知书一模一样
在线办理(ANU毕业证书)澳洲国立大学毕业证录取通知书一模一样
obonagu
 
The Benefits and Techniques of Trenchless Pipe Repair.pdf
The Benefits and Techniques of Trenchless Pipe Repair.pdfThe Benefits and Techniques of Trenchless Pipe Repair.pdf
The Benefits and Techniques of Trenchless Pipe Repair.pdf
Pipe Restoration Solutions
 
NO1 Uk best vashikaran specialist in delhi vashikaran baba near me online vas...
NO1 Uk best vashikaran specialist in delhi vashikaran baba near me online vas...NO1 Uk best vashikaran specialist in delhi vashikaran baba near me online vas...
NO1 Uk best vashikaran specialist in delhi vashikaran baba near me online vas...
Amil Baba Dawood bangali
 
ethical hacking in wireless-hacking1.ppt
ethical hacking in wireless-hacking1.pptethical hacking in wireless-hacking1.ppt
ethical hacking in wireless-hacking1.ppt
Jayaprasanna4
 
weather web application report.pdf
weather web application report.pdfweather web application report.pdf
weather web application report.pdf
Pratik Pawar
 
AKS UNIVERSITY Satna Final Year Project By OM Hardaha.pdf
AKS UNIVERSITY Satna Final Year Project By OM Hardaha.pdfAKS UNIVERSITY Satna Final Year Project By OM Hardaha.pdf
AKS UNIVERSITY Satna Final Year Project By OM Hardaha.pdf
SamSarthak3
 
Final project report on grocery store management system..pdf
Final project report on grocery store management system..pdfFinal project report on grocery store management system..pdf
Final project report on grocery store management system..pdf
Kamal Acharya
 
Investor-Presentation-Q1FY2024 investor presentation document.pptx
Investor-Presentation-Q1FY2024 investor presentation document.pptxInvestor-Presentation-Q1FY2024 investor presentation document.pptx
Investor-Presentation-Q1FY2024 investor presentation document.pptx
AmarGB2
 
Railway Signalling Principles Edition 3.pdf
Railway Signalling Principles Edition 3.pdfRailway Signalling Principles Edition 3.pdf
Railway Signalling Principles Edition 3.pdf
TeeVichai
 
DESIGN A COTTON SEED SEPARATION MACHINE.docx
DESIGN A COTTON SEED SEPARATION MACHINE.docxDESIGN A COTTON SEED SEPARATION MACHINE.docx
DESIGN A COTTON SEED SEPARATION MACHINE.docx
FluxPrime1
 
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单专业办理
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单专业办理一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单专业办理
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单专业办理
zwunae
 
Standard Reomte Control Interface - Neometrix
Standard Reomte Control Interface - NeometrixStandard Reomte Control Interface - Neometrix
Standard Reomte Control Interface - Neometrix
Neometrix_Engineering_Pvt_Ltd
 

Recently uploaded (20)

Industrial Training at Shahjalal Fertilizer Company Limited (SFCL)
Industrial Training at Shahjalal Fertilizer Company Limited (SFCL)Industrial Training at Shahjalal Fertilizer Company Limited (SFCL)
Industrial Training at Shahjalal Fertilizer Company Limited (SFCL)
 
AP LAB PPT.pdf ap lab ppt no title specific
AP LAB PPT.pdf ap lab ppt no title specificAP LAB PPT.pdf ap lab ppt no title specific
AP LAB PPT.pdf ap lab ppt no title specific
 
H.Seo, ICLR 2024, MLILAB, KAIST AI.pdf
H.Seo,  ICLR 2024, MLILAB,  KAIST AI.pdfH.Seo,  ICLR 2024, MLILAB,  KAIST AI.pdf

Data Warehousing Trends

  • 1. Data Tech Talk ft. external speaker Chris Riccomini, Author, Engineer & Manager, … 1. Talk: Data Warehousing Trends 2. Open Dialogue: Q & A
  • 3. Hi, I’m Chris ● Engineer & Manager WePay, LinkedIn, PayPal ● Open Source Apache Samza, Apache Airflow, Debezium ● Author The Missing README ● Investor & Advisor Prefect, Meroxa, StarTree, Amundsen, Anomalo, TopCoat, ... @criccomini
  • 5. The Trends ● Realtime DWHs ● Analytics Engineering ● Data Mesh ● Data Catalogs ● Reverse ETL ● Headless BI ● Data Integrity ● Data Lakehouses ● DataOps ● White-label Data Viz https://twitter.com/criccomini/status/1451557884769169412 https://preset.io/blog/reshaping-data-engineering/
  • 13. Why Realtime DWHs? ● Debugging ○ Investigate application errors ○ Audit log shows how things changed ● Operational ○ Monitoring ○ Scripts that pull from DWH ● Security/Compliance ○ Audit log ● Customer data products ○ Ad hoc customer reports (e.g. Stripe Sigma, WePay txns) ○ Data clean rooms
  • 14. Realtime DWHs Technical Advantages ● Handles hard deletes ● No schema requirements (e.g. source tables don't need updated-at timestamp columns) ● Replay from Kafka ● Data integration
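To make the first two advantages concrete, here is a minimal sketch, assuming a toy in-memory "table" and a simplified change event loosely modeled on a Debezium-style envelope (the `ChangeEvent` shape and both helper functions are hypothetical, not any tool's API). A timestamp-based batch pull needs an `updated_at` column and can only see rows that still exist, so a hard-deleted row lingers in the warehouse; replaying a change log applies the delete explicitly.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

Row = Dict[str, object]


@dataclass
class ChangeEvent:
    # Hypothetical, simplified CDC event: op is "c" (create), "u" (update)
    # or "d" (delete); `after` holds the new row image and is None on delete.
    op: str
    key: int
    after: Optional[Row]


def batch_incremental_pull(source: Dict[int, Row], since: int) -> Dict[int, Row]:
    """Timestamp-based batch extract: requires an `updated_at` column and
    can only return rows that still exist in the source."""
    return {k: r for k, r in source.items() if r["updated_at"] > since}


def apply_cdc(replica: Dict[int, Row], events: List[ChangeEvent]) -> Dict[int, Row]:
    """Apply an ordered change log; hard deletes arrive as explicit events."""
    for ev in events:
        if ev.op == "d":
            replica.pop(ev.key, None)
        else:
            replica[ev.key] = ev.after
    return replica


if __name__ == "__main__":
    initial = {1: {"name": "a", "updated_at": 1}, 2: {"name": "b", "updated_at": 1}}
    # At t=2 the source updates row 1 and hard-deletes row 2.
    source_now = {1: {"name": "a2", "updated_at": 2}}

    batch_dwh = dict(initial)
    batch_dwh.update(batch_incremental_pull(source_now, since=1))

    cdc_dwh = apply_cdc(dict(initial), [
        ChangeEvent("u", 1, {"name": "a2", "updated_at": 2}),
        ChangeEvent("d", 2, None),
    ])

    print(sorted(batch_dwh))  # [1, 2]  (deleted row 2 lingers)
    print(sorted(cdc_dwh))    # [1]
```

Because the change log is an ordered, durable stream (Kafka in the talk), the same `apply_cdc` step can be rerun from an earlier offset to rebuild a table, which is the "Replay from Kafka" point.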
  • 15. Realtime DWH Drawbacks ● Operationally complex ● Depends on source DB support (for CDC) ● Inline transformation is harder ● Fixing bad data is harder
  • 17. “A data mesh is a type of data platform architecture that embraces the ubiquity of data in the enterprise by leveraging a domain-oriented, self-serve design.” ● Domain-oriented decentralized data ownership and architecture ● Data as a product ● Self-serve data infrastructure as a platform ● Federated computational governance Data Mesh https://towardsdatascience.com/what-is-a-data-mesh-and-how-not-to-mesh-it-up-210710bb41e0
  • 18. “A data mesh is a type of data platform architecture that embraces the ubiquity of data in the enterprise by leveraging a domain-oriented, self-serve design.” ● Domain-oriented decentralized data ownership and architecture ● Data as a product ● Self-serve data infrastructure as a platform ● Federated computational governance Data Mesh https://towardsdatascience.com/what-is-a-data-mesh-and-how-not-to-mesh-it-up-210710bb41e0 wat.
  • 19. “A product is any item or service you sell to serve a customer's need or want.” Data is a Product https://www.aha.io/roadmapping/guide/product-management/what-is-a-product
  • 20. ● Customers ○ Data scientists ○ Business analysts ○ Finance ○ Sales ○ Product managers ○ Engineers ○ External customers ● Products ○ Recommender systems ○ Billing ○ Fraud ○ Reports ○ Dashboards Data is a Product https://cnr.sh/essays/what-the-heck-data-mesh
  • 21. ● Versioned ● Compatible ● Documented ● Monitored ● Self-serve ● Secure (AuthN, AuthZ) Treat Data Models like APIs https://cnr.sh/essays/what-the-heck-data-mesh
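One way to read "Versioned" and "Compatible" is as an automated check run before a new version of a data model ships. The sketch below assumes a hypothetical schema representation (field name mapped to a type name and a required flag) and a deliberately simplified rule set of our own choosing, not Avro's or Protobuf's exact rules: existing fields may not be removed or retyped, and newly added fields must be optional.

```python
from typing import Dict, List, Tuple

# Hypothetical schema representation: field name -> (type name, required?)
Schema = Dict[str, Tuple[str, bool]]


def compatibility_errors(old: Schema, new: Schema) -> List[str]:
    """Simplified consumer-facing compatibility rules (an assumption, not a
    standard): existing fields may not be removed or retyped, and newly
    added fields must be optional."""
    errors = []
    for name, (typ, _required) in old.items():
        if name not in new:
            errors.append(f"removed field: {name}")
        elif new[name][0] != typ:
            errors.append(f"retyped field: {name} {typ} -> {new[name][0]}")
    for name, (_typ, required) in new.items():
        if name not in old and required:
            errors.append(f"new required field: {name}")
    return errors


if __name__ == "__main__":
    v1 = {"order_id": ("string", True), "amount_cents": ("int", True)}
    v2 = {"order_id": ("string", True), "amount_cents": ("int", True),
          "currency": ("string", False)}
    v3 = {"order_id": ("string", True), "amount": ("decimal", True)}
    print(compatibility_errors(v1, v2))  # []
    print(compatibility_errors(v1, v3))
    # ['removed field: amount_cents', 'new required field: amount']
```

Run in CI against the previously published version, a non-empty result would signal that the change needs a new major version, much as a breaking change to a service API would.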
  • 22. ● Microservice ● DevOps We’ve Done This Before https://cnr.sh/essays/what-the-heck-data-mesh
  • 24. Metrics then ● BI tools to create and visualize metrics ○ Looker ○ Mode ○ Tableau ○ Data Studio ● Answer internal business questions ○ How healthy is a product? ○ What does revenue look like?
  • 25. Metrics now ● Metrics matter for external business workflows ○ Predicting when a customer might churn ○ Notifying users when they reach their capacity limit ○ Data scientists want to build models that optimize certain metrics ○ Computing customer bills ● BI tools aren’t meant for this ○ Walled garden ○ Have to re-implement the same metrics in different systems
  • 26. “For data consumption, we heard complaints from decision makers that different teams reported different numbers for very simple business questions, and there was no easy way to know which number was correct.” "...the teams that own metrics would be able to define them once, in a way that’s consistent across dashboards, automation tools, sales reporting, and so on. Let’s call this ‘Headless BI’." Headless BI https://medium.com/airbnb-engineering/how-airbnb-achieved-metric-consistency-at-scale-f23cc53dea70 https://basecase.vc/blog/headless-bi
  • 27. Headless BI ● Programmatically manage business metrics ○ Automated ○ Centralized ○ Documented ○ Validated ○ Metadata/lineage ○ Backfills ○ Cost ○ Privacy ○ Access ○ Deprecation ○ Retention https://medium.com/airbnb-engineering/how-airbnb-achieved-metric-consistency-at-scale-f23cc53dea70
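The core Headless BI idea, define a metric once and let every consumer derive its query from that single definition, can be sketched as follows. The `Metric` class, its fields, and the `orders` table are hypothetical illustrations, not the API of Airbnb's system or any commercial product; the SQL is built by plain string formatting, so it assumes trusted, centrally reviewed definitions rather than user input.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Metric:
    """A single, centrally owned metric definition (hypothetical structure)."""
    name: str
    owner: str
    description: str
    expression: str                # aggregate SQL expression
    source_table: str
    filters: Tuple[str, ...] = ()  # SQL predicates, ANDed together

    def to_sql(self, group_by: Optional[List[str]] = None) -> str:
        """Render the metric for a consumer, optionally sliced by dimensions."""
        dims = list(group_by or [])
        select = ", ".join(dims + [f"{self.expression} AS {self.name}"])
        sql = f"SELECT {select} FROM {self.source_table}"
        if self.filters:
            sql += " WHERE " + " AND ".join(self.filters)
        if dims:
            sql += " GROUP BY " + ", ".join(dims)
        return sql


if __name__ == "__main__":
    revenue = Metric(
        name="revenue_usd",
        owner="finance",
        description="Gross revenue from settled orders, in USD",
        expression="SUM(amount_usd)",
        source_table="orders",
        filters=("status = 'settled'",),
    )
    # A dashboard, a churn model, and a billing job all reuse one definition.
    print(revenue.to_sql())
    print(revenue.to_sql(group_by=["region"]))
```

Keeping `owner` and `description` on the definition is where the documentation, access, and deprecation concerns listed on the slide would naturally attach.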
  • 29. Q&A ● Realtime DWHs ● Analytics Engineering ● Data Mesh ● Data Catalogs ● Reverse ETL ● Headless BI ● Data Integrity ● Data Lakehouses ● DataOps ● White-label Data Viz
  • 31. Analytics Engineering Analytics engineers provide clean data sets to end users, modeling data in a way that empowers end users to answer their own questions. While a data analyst spends their time analyzing data, an analytics engineer spends their time transforming, testing, deploying, and documenting data. Analytics engineers apply software engineering best practices like version control and continuous integration to the analytics code base. https://www.getdbt.com/what-is-analytics-engineering
  • 32. Analytics Engineering ● Job ○ Building ○ Testing ○ Cataloging ● Tools ○ dbt ○ Airflow ● Customers ○ Data science ○ Data analysts ○ BI ○ Reporting
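The "Testing" part of the job is easiest to see with a concrete check. dbt's built-in `not_null` and `unique` tests are declared in YAML and compiled to SQL that selects failing rows, passing when zero rows come back. The sketch below mimics that idea in plain Python over in-memory rows; the function names and the `orders` sample are hypothetical, and this is not how dbt itself runs tests.

```python
from collections import Counter
from typing import Dict, Iterable, List

Row = Dict[str, object]


def not_null(rows: Iterable[Row], column: str) -> List[Row]:
    """Return the failing rows: `column` is missing or None."""
    return [r for r in rows if r.get(column) is None]


def unique(rows: Iterable[Row], column: str) -> List[object]:
    """Return the non-null values of `column` that appear more than once."""
    counts = Counter(r.get(column) for r in rows if r.get(column) is not None)
    return [v for v, n in counts.items() if n > 1]


def run_tests(rows: List[Row]) -> Dict[str, int]:
    # Each check reports a failure count; 0 means the check passed.
    return {
        "not_null(customer_id)": len(not_null(rows, "customer_id")),
        "unique(order_id)": len(unique(rows, "order_id")),
    }


if __name__ == "__main__":
    orders = [
        {"order_id": 1, "customer_id": 10},
        {"order_id": 2, "customer_id": None},
        {"order_id": 2, "customer_id": 11},
    ]
    print(run_tests(orders))  # {'not_null(customer_id)': 1, 'unique(order_id)': 1}
```

In a continuous-integration setup, any non-zero count would fail the build, which is the software engineering practice the dbt quote above describes.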
  • 33. Data Catalogs ● Flavor of the month ○ Amundsen ○ DataHub ○ Metaphor ○ Marquez ○ Atlan ○ Collibra ○ Alation ● Use cases ○ Discoverability ○ Operations ○ Governance
  • 34. “Reverse ETL syncs data from a system of records like a warehouse to a system of actions like CRM, MAP, and other SaaS apps to operationalize data.” Reverse ETL https://blog.getcensus.com/what-is-reverse-etl/
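A minimal sketch of the sync loop behind Reverse ETL, assuming a hypothetical `Destination` interface with a single `upsert` method standing in for a CRM or other SaaS API (real tools such as Census have their own connectors). Each run diffs the current warehouse rows against the snapshot from the previous run and pushes only new or changed records.

```python
from typing import Dict, List, Protocol, Tuple

Record = Dict[str, object]


class Destination(Protocol):
    """Hypothetical interface for a system of action (CRM, MAP, ...)."""
    def upsert(self, key: str, fields: Record) -> None: ...


def diff(current: Dict[str, Record],
         last_synced: Dict[str, Record]) -> List[Tuple[str, Record]]:
    """Records that are new or changed since the previous sync."""
    return [(k, v) for k, v in current.items() if last_synced.get(k) != v]


def sync(current: Dict[str, Record], last_synced: Dict[str, Record],
         dest: Destination) -> Dict[str, Record]:
    for key, fields in diff(current, last_synced):
        dest.upsert(key, fields)
    # The current rows become the baseline for the next incremental run.
    return dict(current)


class PrintingCRM:
    """Stand-in destination that records and prints each upsert."""
    def __init__(self) -> None:
        self.calls: List[Tuple[str, Record]] = []

    def upsert(self, key: str, fields: Record) -> None:
        self.calls.append((key, fields))
        print(f"upsert {key}: {fields}")


if __name__ == "__main__":
    warehouse = {"acct_1": {"plan": "pro", "seats_used": 48},
                 "acct_2": {"plan": "free", "seats_used": 1}}
    previous = {"acct_1": {"plan": "pro", "seats_used": 45},
                "acct_2": {"plan": "free", "seats_used": 1}}
    snapshot = sync(warehouse, previous, PrintingCRM())
    # upsert acct_1: {'plan': 'pro', 'seats_used': 48}
```

Diffing against the last snapshot keeps API calls proportional to what changed, which matters because SaaS destinations are usually rate limited; this sketch does not propagate deletions.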
  • 36. Photos ● https://unsplash.com/photos/Lbvi0GGJWY4 ● https://unsplash.com/photos/8vTAAFYhFfQ ● https://www.flickr.com/photos/jurgenappelo/5201275209