Data Tech Talk
ft. external speaker
Chris Riccomini, Author, Engineer & Manager, …
1. Talk: Data Warehousing Trends
2. Open Dialogue: Q & A
Data Warehousing Trends
2021/12/09 · Chris Riccomini
Hi, I’m Chris
● Engineer & Manager
WePay, LinkedIn, PayPal
● Open Source
Apache Samza, Apache Airflow, Debezium
● Author
The Missing README
● Investor & Advisor
Prefect, Meroxa, StarTree, Amundsen, Anomalo, TopCoat, ...
@criccomini
The Trends
● Realtime DWHs
● Analytics Engineering
● Data Mesh
● Data Catalogs
● Reverse ETL
● Headless BI
● Data Integrity
● Data Lakehouses
● DataOps
● White-label Data Viz
https://twitter.com/criccomini/status/1451557884769169412
https://preset.io/blog/reshaping-data-engineering/
Realtime DWHs
Batch ETL
Pubsub
Realtime DWH
Why Realtime DWHs?
● Debugging
○ Investigate application errors
○ Audit log shows how things changed
● Operational
○ Monitoring
○ Scripts that pull from DWH
● Security/Compliance
○ Audit log
● Customer data products
○ Ad hoc customer reports (e.g. Stripe Sigma, WePay txns)
○ Data clean rooms
Realtime DWHs Technical Advantages
● Handles hard deletes
● No schema requirements (e.g. no modified-timestamp columns needed)
● Replay from Kafka
● Data integration
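The hard-delete and replay points above can be illustrated with a minimal sketch. It assumes a simplified, hypothetical change-event shape (an `op` code, a `key`, and an `after` row) loosely modeled on CDC-style events. The function name `apply_changes` and the sample log are illustrative only, not taken from the talk.

```python
from typing import Any, Dict, Iterable

# Hypothetical, simplified change event: "op" is "c" (create), "u" (update),
# or "d" (delete); "key" is the primary key; "after" is the new row state
# (None for deletes). Real CDC payloads carry more fields than this.
ChangeEvent = Dict[str, Any]
Table = Dict[Any, Dict[str, Any]]


def apply_changes(table: Table, events: Iterable[ChangeEvent]) -> Table:
    """Apply an ordered stream of change events to an in-memory table."""
    for event in events:
        if event["op"] == "d":
            # A hard delete arrives as an explicit event, so the row can be
            # removed downstream without polling on a timestamp column.
            table.pop(event["key"], None)
        else:
            # Creates and updates both overwrite the row with its new state.
            table[event["key"]] = event["after"]
    return table


log = [
    {"op": "c", "key": 1, "after": {"id": 1, "status": "new"}},
    {"op": "c", "key": 2, "after": {"id": 2, "status": "new"}},
    {"op": "u", "key": 1, "after": {"id": 1, "status": "paid"}},
    {"op": "d", "key": 2, "after": None},
]

# Replaying the full log from the beginning rebuilds the same state.
warehouse_table = apply_changes({}, log)
replayed_table = apply_changes({}, log)
```

Because the log is the source of truth in this sketch, rebuilding a table is just a matter of re-reading the events in order, which is the property the "Replay from Kafka" bullet relies on.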
Realtime DWH Drawbacks
● Operationally complex
● Depends on source DB support (for CDC)
● Inline transformation is harder
● Fixing bad data is harder
Data Mesh
“A data mesh is a type of data platform architecture that embraces the ubiquity of
data in the enterprise by leveraging a domain-oriented, self-serve design.”
● Domain-oriented decentralized data ownership and architecture
● Data as a product
● Self-serve data infrastructure as a platform
● Federated computational governance
Data Mesh
https://towardsdatascience.com/what-is-a-data-mesh-and-how-not-to-mesh-it-up-210710bb41e0
wat.
“A product is any item or service you sell to serve a customer's need or want.”
Data is a Product
https://www.aha.io/roadmapping/guide/product-management/what-is-a-product
● Customers
○ Data scientists
○ Business analysts
○ Finance
○ Sales
○ Product managers
○ Engineers
○ External customers
● Products
○ Recommender systems
○ Billing
○ Fraud
○ Reports
○ Dashboards
Data is a Product
https://cnr.sh/essays/what-the-heck-data-mesh
● Versioned
● Compatible
● Documented
● Monitored
● Self-serve
● Secure (AuthN, AuthZ)
Treat Data Models like APIs
https://cnr.sh/essays/what-the-heck-data-mesh
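One way to read "Versioned" and "Compatible" is as a check that a new version of a data model does not break existing readers. The sketch below assumes a hypothetical schema representation (column name mapped to a type name) and a deliberately strict rule: removing a column or changing its type is breaking, adding a column is not. Real compatibility rules vary by system.

```python
from typing import Dict, List

# Hypothetical schema representation: column name -> type name.
Schema = Dict[str, str]


def breaking_changes(old: Schema, new: Schema) -> List[str]:
    """List changes in `new` that would break readers written against `old`."""
    problems = []
    for column, old_type in old.items():
        if column not in new:
            problems.append(f"removed column: {column}")
        elif new[column] != old_type:
            problems.append(
                f"changed type of {column}: {old_type} -> {new[column]}"
            )
    # Columns that exist only in `new` are treated as compatible additions.
    return problems


v1 = {"txn_id": "string", "amount_cents": "int"}
v2 = {"txn_id": "string", "amount_cents": "int", "currency": "string"}
v3 = {"txn_id": "string", "amount": "float"}

additive = breaking_changes(v1, v2)  # only adds a column
breaking = breaking_changes(v1, v3)  # drops amount_cents
```

A check like this could run in continuous integration, the same way an API change would be reviewed before release.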
● Microservice
● DevOps
We’ve Done This Before
https://cnr.sh/essays/what-the-heck-data-mesh
Headless BI
Metrics then
● BI tools to create and visualize metrics
○ Looker
○ Mode
○ Tableau
○ Data Studio
● Answer internal business questions
○ How healthy is a product?
○ What does revenue look like?
Metrics now
● Metrics matter for external business workflows
○ Predicting when a customer might churn
○ Notifying users when they reach their capacity limit
○ Data scientists want to build models that optimize certain metrics
○ Computing customer bills
● BI tools aren’t meant for this
○ Walled garden
○ Have to re-implement the same metrics in different systems
“For data consumption, we heard complaints from decision makers that different
teams reported different numbers for very simple business questions, and
there was no easy way to know which number was correct.”
"...the teams that own metrics would be able to define them once, in a way
that’s consistent across dashboards, automation tools, sales reporting, and so
on. Let’s call this ‘Headless BI’."
Headless BI
https://medium.com/airbnb-engineering/how-airbnb-achieved-metric-consistency-at-scale-f23cc53dea70
https://basecase.vc/blog/headless-bi
Headless BI
● Programmatically manage business metrics
○ Automated
○ Centralized
○ Documented
○ Validated
○ Metadata/lineage
○ Backfills
○ Cost
○ Privacy
○ Access
○ Deprecation
○ Retention
https://medium.com/airbnb-engineering/how-airbnb-achieved-metric-consistency-at-scale-f23cc53dea70
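The "define once, consume everywhere" idea can be sketched as a small metric registry. Everything here is a hypothetical illustration: the `Metric` fields, the `register` and `to_sql` helpers, and the `payments` table are assumptions, not the design of any system named in the talk.

```python
from dataclasses import dataclass
from typing import Dict, Sequence


@dataclass(frozen=True)
class Metric:
    name: str
    table: str
    expression: str  # SQL aggregate expression
    owner: str
    description: str


REGISTRY: Dict[str, Metric] = {}


def register(metric: Metric) -> None:
    """Add a metric, refusing duplicates so each name has one definition."""
    if metric.name in REGISTRY:
        raise ValueError(f"metric already defined: {metric.name}")
    REGISTRY[metric.name] = metric


def to_sql(name: str, group_by: Sequence[str] = ()) -> str:
    """Render the single definition as SQL for any consumer.

    String formatting is fine for a sketch with trusted inputs; it is not
    safe for untrusted dimension names.
    """
    metric = REGISTRY[name]
    dims = ", ".join(group_by)
    select = f"{dims}, " if dims else ""
    sql = f"SELECT {select}{metric.expression} AS {metric.name} FROM {metric.table}"
    if dims:
        sql += f" GROUP BY {dims}"
    return sql


register(Metric(
    name="revenue_usd",
    table="payments",
    expression="SUM(amount_cents) / 100.0",
    owner="finance-data",
    description="Gross revenue in US dollars.",
))

# A dashboard and a billing job reuse the same definition.
dashboard_sql = to_sql("revenue_usd", group_by=["region"])
billing_sql = to_sql("revenue_usd", group_by=["customer_id"])
```

Both consumers get the same aggregate expression, so they cannot drift apart the way separately re-implemented metrics can.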
Q&A
● Realtime DWHs
● Analytics Engineering
● Data Mesh
● Data Catalogs
● Reverse ETL
● Headless BI
● Data Integrity
● Data Lakehouses
● DataOps
● White-label Data Viz
Appendix
Analytics Engineering
Analytics engineers provide clean data sets to end users, modeling data in a way
that empowers end users to answer their own questions.
While a data analyst spends their time analyzing data, an analytics engineer
spends their time transforming, testing, deploying, and documenting data.
Analytics engineers apply software engineering best practices like version
control and continuous integration to the analytics code base.
https://www.getdbt.com/what-is-analytics-engineering
Analytics Engineering
● Job
○ Building
○ Testing
○ Cataloging
● Tools
○ dbt
○ Airflow
● Customers
○ Data science
○ Data analysts
○ BI
○ Reporting
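The "Testing" part of the job can be sketched as simple assertions over a data set. The two checks below are named after the `not_null` and `unique` tests that dbt ships, but this is a plain-Python illustration over in-memory rows, not dbt's implementation. The sample `orders` rows are made up.

```python
from collections import Counter
from typing import Any, Dict, List

Row = Dict[str, Any]


def not_null(rows: List[Row], column: str) -> List[Row]:
    """Return the rows where `column` is missing or None."""
    return [row for row in rows if row.get(column) is None]


def unique(rows: List[Row], column: str) -> List[Any]:
    """Return the non-null values of `column` that appear more than once."""
    counts = Counter(row.get(column) for row in rows)
    return sorted(
        value for value, n in counts.items() if n > 1 and value is not None
    )


orders = [
    {"order_id": 1, "customer_id": "a"},
    {"order_id": 2, "customer_id": None},
    {"order_id": 2, "customer_id": "b"},
]

null_failures = not_null(orders, "customer_id")
duplicate_ids = unique(orders, "order_id")
```

An empty result means the check passes, so a pipeline could fail the build whenever either list is non-empty.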
Data Catalogs
● Flavor of the month
○ Amundsen
○ DataHub
○ Metaphor
○ Marquez
○ Atlan
○ Collibra
○ Alation
● Use cases
○ Discoverability
○ Operations
○ Governance
“Reverse ETL syncs data from a system of records like a warehouse to a system
of actions like CRM, MAP, and other SaaS apps to operationalize data.”
Reverse ETL
https://blog.getcensus.com/what-is-reverse-etl/
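The definition above can be sketched as a loop that reshapes warehouse rows and pushes them to an operational tool. The `send` callable is a hypothetical stand-in for a CRM client. The payload shape and column names are assumptions made for illustration.

```python
from typing import Any, Callable, Dict, Iterable, List

Row = Dict[str, Any]


def to_crm_payload(row: Row) -> Row:
    """Reshape a warehouse row into a (hypothetical) CRM update payload."""
    return {
        "external_id": row["customer_id"],
        "properties": {
            "lifetime_value_usd": row["ltv_cents"] / 100,
            "churn_risk": row["churn_risk"],
        },
    }


def sync(rows: Iterable[Row], send: Callable[[Row], None]) -> int:
    """Send one payload per warehouse row and return how many were sent."""
    sent = 0
    for row in rows:
        send(to_crm_payload(row))
        sent += 1
    return sent


warehouse_rows = [
    {"customer_id": "c-1", "ltv_cents": 125000, "churn_risk": "low"},
    {"customer_id": "c-2", "ltv_cents": 9900, "churn_risk": "high"},
]

# A list stands in for the CRM so the sketch runs without network access.
outbox: List[Row] = []
sent_count = sync(warehouse_rows, outbox.append)
```

Passing `send` in as a parameter keeps the reshaping logic testable without a real SaaS endpoint.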
● https://unsplash.com/photos/Lbvi0GGJWY4
● https://unsplash.com/photos/8vTAAFYhFfQ
● https://www.flickr.com/photos/jurgenappelo/5201275209
Photos
Realtime DWHs

More Related Content

What's hot

5 Steps for Architecting a Data Lake
5 Steps for Architecting a Data Lake5 Steps for Architecting a Data Lake
5 Steps for Architecting a Data Lake
MetroStar
 
Designing An Enterprise Data Fabric
Designing An Enterprise Data FabricDesigning An Enterprise Data Fabric
Designing An Enterprise Data Fabric
Alan McSweeney
 
Time to Talk about Data Mesh
Time to Talk about Data MeshTime to Talk about Data Mesh
Time to Talk about Data Mesh
LibbySchulze
 
Architect’s Open-Source Guide for a Data Mesh Architecture
Architect’s Open-Source Guide for a Data Mesh ArchitectureArchitect’s Open-Source Guide for a Data Mesh Architecture
Architect’s Open-Source Guide for a Data Mesh Architecture
Databricks
 
Data Lakehouse, Data Mesh, and Data Fabric (r1)
Data Lakehouse, Data Mesh, and Data Fabric (r1)Data Lakehouse, Data Mesh, and Data Fabric (r1)
Data Lakehouse, Data Mesh, and Data Fabric (r1)
James Serra
 
Your Roadmap for An Enterprise Graph Strategy
Your Roadmap for An Enterprise Graph StrategyYour Roadmap for An Enterprise Graph Strategy
Your Roadmap for An Enterprise Graph Strategy
Neo4j
 
Data Lake Overview
Data Lake OverviewData Lake Overview
Data Lake Overview
James Serra
 
Actionable Insights with AI - Snowflake for Data Science
Actionable Insights with AI - Snowflake for Data ScienceActionable Insights with AI - Snowflake for Data Science
Actionable Insights with AI - Snowflake for Data Science
Harald Erb
 
Building Lakehouses on Delta Lake with SQL Analytics Primer
Building Lakehouses on Delta Lake with SQL Analytics PrimerBuilding Lakehouses on Delta Lake with SQL Analytics Primer
Building Lakehouses on Delta Lake with SQL Analytics Primer
Databricks
 
Data Catalog as a Business Enabler
Data Catalog as a Business EnablerData Catalog as a Business Enabler
Data Catalog as a Business Enabler
Srinivasan Sankar
 
Data Mess to Data Mesh | Jay Kreps, CEO, Confluent | Kafka Summit Americas 20...
Data Mess to Data Mesh | Jay Kreps, CEO, Confluent | Kafka Summit Americas 20...Data Mess to Data Mesh | Jay Kreps, CEO, Confluent | Kafka Summit Americas 20...
Data Mess to Data Mesh | Jay Kreps, CEO, Confluent | Kafka Summit Americas 20...
HostedbyConfluent
 
Data Warehousing Trends, Best Practices, and Future Outlook
Data Warehousing Trends, Best Practices, and Future OutlookData Warehousing Trends, Best Practices, and Future Outlook
Data Warehousing Trends, Best Practices, and Future Outlook
James Serra
 
Big Data Analytics Architectural Patterns and Best Practices (ANT201-R1) - AW...
Big Data Analytics Architectural Patterns and Best Practices (ANT201-R1) - AW...Big Data Analytics Architectural Patterns and Best Practices (ANT201-R1) - AW...
Big Data Analytics Architectural Patterns and Best Practices (ANT201-R1) - AW...
Amazon Web Services
 
Data Modeling & Metadata for Graph Databases
Data Modeling & Metadata for Graph DatabasesData Modeling & Metadata for Graph Databases
Data Modeling & Metadata for Graph Databases
DATAVERSITY
 
Azure data analytics platform - A reference architecture
Azure data analytics platform - A reference architecture Azure data analytics platform - A reference architecture
Azure data analytics platform - A reference architecture
Rajesh Kumar
 
Data Catalog in Denodo Platform 7.0: Creating a Data Marketplace with Data Vi...
Data Catalog in Denodo Platform 7.0: Creating a Data Marketplace with Data Vi...Data Catalog in Denodo Platform 7.0: Creating a Data Marketplace with Data Vi...
Data Catalog in Denodo Platform 7.0: Creating a Data Marketplace with Data Vi...
Denodo
 
Webinar Data Mesh - Part 3
Webinar Data Mesh - Part 3Webinar Data Mesh - Part 3
Webinar Data Mesh - Part 3
Jeffrey T. Pollock
 
Melbourne: Certus Data 2.0 Vault Meetup with Snowflake - Data Vault In The Cl...
Melbourne: Certus Data 2.0 Vault Meetup with Snowflake - Data Vault In The Cl...Melbourne: Certus Data 2.0 Vault Meetup with Snowflake - Data Vault In The Cl...
Melbourne: Certus Data 2.0 Vault Meetup with Snowflake - Data Vault In The Cl...
Certus Solutions
 
Snowflake Company Presentation
Snowflake Company PresentationSnowflake Company Presentation
Snowflake Company Presentation
AndrewJiang18
 
Big Data Fabric 2.0 Drives Data Democratization
Big Data Fabric 2.0 Drives Data DemocratizationBig Data Fabric 2.0 Drives Data Democratization
Big Data Fabric 2.0 Drives Data Democratization
Cambridge Semantics
 

What's hot (20)

5 Steps for Architecting a Data Lake
5 Steps for Architecting a Data Lake5 Steps for Architecting a Data Lake
5 Steps for Architecting a Data Lake
 
Designing An Enterprise Data Fabric
Designing An Enterprise Data FabricDesigning An Enterprise Data Fabric
Designing An Enterprise Data Fabric
 
Time to Talk about Data Mesh
Time to Talk about Data MeshTime to Talk about Data Mesh
Time to Talk about Data Mesh
 
Architect’s Open-Source Guide for a Data Mesh Architecture
Architect’s Open-Source Guide for a Data Mesh ArchitectureArchitect’s Open-Source Guide for a Data Mesh Architecture
Architect’s Open-Source Guide for a Data Mesh Architecture
 
Data Lakehouse, Data Mesh, and Data Fabric (r1)
Data Lakehouse, Data Mesh, and Data Fabric (r1)Data Lakehouse, Data Mesh, and Data Fabric (r1)
Data Lakehouse, Data Mesh, and Data Fabric (r1)
 
Your Roadmap for An Enterprise Graph Strategy
Your Roadmap for An Enterprise Graph StrategyYour Roadmap for An Enterprise Graph Strategy
Your Roadmap for An Enterprise Graph Strategy
 
Data Lake Overview
Data Lake OverviewData Lake Overview
Data Lake Overview
 
Actionable Insights with AI - Snowflake for Data Science
Actionable Insights with AI - Snowflake for Data ScienceActionable Insights with AI - Snowflake for Data Science
Actionable Insights with AI - Snowflake for Data Science
 
Building Lakehouses on Delta Lake with SQL Analytics Primer
Building Lakehouses on Delta Lake with SQL Analytics PrimerBuilding Lakehouses on Delta Lake with SQL Analytics Primer
Building Lakehouses on Delta Lake with SQL Analytics Primer
 
Data Catalog as a Business Enabler
Data Catalog as a Business EnablerData Catalog as a Business Enabler
Data Catalog as a Business Enabler
 
Data Mess to Data Mesh | Jay Kreps, CEO, Confluent | Kafka Summit Americas 20...
Data Mess to Data Mesh | Jay Kreps, CEO, Confluent | Kafka Summit Americas 20...Data Mess to Data Mesh | Jay Kreps, CEO, Confluent | Kafka Summit Americas 20...
Data Mess to Data Mesh | Jay Kreps, CEO, Confluent | Kafka Summit Americas 20...
 
Data Warehousing Trends, Best Practices, and Future Outlook
Data Warehousing Trends, Best Practices, and Future OutlookData Warehousing Trends, Best Practices, and Future Outlook
Data Warehousing Trends, Best Practices, and Future Outlook
 
Big Data Analytics Architectural Patterns and Best Practices (ANT201-R1) - AW...
Big Data Analytics Architectural Patterns and Best Practices (ANT201-R1) - AW...Big Data Analytics Architectural Patterns and Best Practices (ANT201-R1) - AW...
Big Data Analytics Architectural Patterns and Best Practices (ANT201-R1) - AW...
 
Data Modeling & Metadata for Graph Databases
Data Modeling & Metadata for Graph DatabasesData Modeling & Metadata for Graph Databases
Data Modeling & Metadata for Graph Databases
 
Azure data analytics platform - A reference architecture
Azure data analytics platform - A reference architecture Azure data analytics platform - A reference architecture
Azure data analytics platform - A reference architecture
 
Data Catalog in Denodo Platform 7.0: Creating a Data Marketplace with Data Vi...
Data Catalog in Denodo Platform 7.0: Creating a Data Marketplace with Data Vi...Data Catalog in Denodo Platform 7.0: Creating a Data Marketplace with Data Vi...
Data Catalog in Denodo Platform 7.0: Creating a Data Marketplace with Data Vi...
 
Webinar Data Mesh - Part 3
Webinar Data Mesh - Part 3Webinar Data Mesh - Part 3
Webinar Data Mesh - Part 3
 
Melbourne: Certus Data 2.0 Vault Meetup with Snowflake - Data Vault In The Cl...
Melbourne: Certus Data 2.0 Vault Meetup with Snowflake - Data Vault In The Cl...Melbourne: Certus Data 2.0 Vault Meetup with Snowflake - Data Vault In The Cl...
Melbourne: Certus Data 2.0 Vault Meetup with Snowflake - Data Vault In The Cl...
 
Snowflake Company Presentation
Snowflake Company PresentationSnowflake Company Presentation
Snowflake Company Presentation
 
Big Data Fabric 2.0 Drives Data Democratization
Big Data Fabric 2.0 Drives Data DemocratizationBig Data Fabric 2.0 Drives Data Democratization
Big Data Fabric 2.0 Drives Data Democratization
 

Similar to Data Warehousing Trends

Building the Artificially Intelligent Enterprise
Building the Artificially Intelligent EnterpriseBuilding the Artificially Intelligent Enterprise
Building the Artificially Intelligent Enterprise
Databricks
 
Accelerate Self-Service Analytics with Data Virtualization and Visualization
Accelerate Self-Service Analytics with Data Virtualization and VisualizationAccelerate Self-Service Analytics with Data Virtualization and Visualization
Accelerate Self-Service Analytics with Data Virtualization and Visualization
Denodo
 
Running Data Platforms Like Products
Running Data Platforms Like ProductsRunning Data Platforms Like Products
Running Data Platforms Like Products
VMware Tanzu
 
DATA BI: put key insights at the finger tip of decision makers.
DATA BI: put key insights at the finger tip of decision makers.DATA BI: put key insights at the finger tip of decision makers.
DATA BI: put key insights at the finger tip of decision makers.
ZaraaTitima1
 
Data Architecture, Solution Architecture, Platform Architecture — What’s the ...
Data Architecture, Solution Architecture, Platform Architecture — What’s the ...Data Architecture, Solution Architecture, Platform Architecture — What’s the ...
Data Architecture, Solution Architecture, Platform Architecture — What’s the ...
DATAVERSITY
 
In-Memory Computing Webcast. Market Predictions 2017
In-Memory Computing Webcast. Market Predictions 2017In-Memory Computing Webcast. Market Predictions 2017
In-Memory Computing Webcast. Market Predictions 2017
SingleStore
 
How To Run A Successful BI Project with Hadoop
How To Run A Successful BI Project with HadoopHow To Run A Successful BI Project with Hadoop
How To Run A Successful BI Project with Hadoop
Mammoth Data
 
The Agile Analyst: Solving the Data Problem with Virtualization
The Agile Analyst: Solving the Data Problem with VirtualizationThe Agile Analyst: Solving the Data Problem with Virtualization
The Agile Analyst: Solving the Data Problem with Virtualization
Inside Analysis
 
The ABCs of Treating Data as Product
The ABCs of Treating Data as ProductThe ABCs of Treating Data as Product
The ABCs of Treating Data as Product
DATAVERSITY
 
Power to the People: A Stack to Empower Every User to Make Data-Driven Decisions
Power to the People: A Stack to Empower Every User to Make Data-Driven DecisionsPower to the People: A Stack to Empower Every User to Make Data-Driven Decisions
Power to the People: A Stack to Empower Every User to Make Data-Driven Decisions
Looker
 
Introduction to Big Data using AWS Services
Introduction to Big Data using AWS ServicesIntroduction to Big Data using AWS Services
Introduction to Big Data using AWS Services
Anjani Phuyal
 
When and How Data Lakes Fit into a Modern Data Architecture
When and How Data Lakes Fit into a Modern Data ArchitectureWhen and How Data Lakes Fit into a Modern Data Architecture
When and How Data Lakes Fit into a Modern Data Architecture
DATAVERSITY
 
Company Profile - NPC with TIBCO Spotfire solution
Company Profile - NPC with TIBCO Spotfire solution  Company Profile - NPC with TIBCO Spotfire solution
Company Profile - NPC with TIBCO Spotfire solution
Sirinporn Setworaya
 
Digital Operations Service Design
Digital Operations Service DesignDigital Operations Service Design
Digital Operations Service Design
NVISIA
 
No Compromises SQL Connectivity for MongoDB
No Compromises SQL Connectivity for MongoDBNo Compromises SQL Connectivity for MongoDB
No Compromises SQL Connectivity for MongoDB
MongoDB
 
20180701 - 1st Meeting - Data Science Orientation
20180701 - 1st Meeting - Data Science Orientation20180701 - 1st Meeting - Data Science Orientation
20180701 - 1st Meeting - Data Science Orientation
Duc Lai Trung Minh
 
Intro to Neo4j
Intro to Neo4jIntro to Neo4j
Intro to Neo4j
Neo4j
 
SPT 104 Unlock your big data with analytics and BI on Office 365
SPT 104 Unlock your big data with analytics and BI on Office 365SPT 104 Unlock your big data with analytics and BI on Office 365
SPT 104 Unlock your big data with analytics and BI on Office 365
Brian Culver
 
Paving The Way To Data Driven
Paving The Way To Data DrivenPaving The Way To Data Driven
Paving The Way To Data Driven
Mohd Izhar Firdaus Ismail
 
Big Data Analytics with Microsoft
Big Data Analytics with MicrosoftBig Data Analytics with Microsoft
Big Data Analytics with Microsoft
Caserta
 

Similar to Data Warehousing Trends (20)

Building the Artificially Intelligent Enterprise
Building the Artificially Intelligent EnterpriseBuilding the Artificially Intelligent Enterprise
Building the Artificially Intelligent Enterprise
 
Accelerate Self-Service Analytics with Data Virtualization and Visualization
Accelerate Self-Service Analytics with Data Virtualization and VisualizationAccelerate Self-Service Analytics with Data Virtualization and Visualization
Accelerate Self-Service Analytics with Data Virtualization and Visualization
 
Running Data Platforms Like Products
Running Data Platforms Like ProductsRunning Data Platforms Like Products
Running Data Platforms Like Products
 
DATA BI: put key insights at the finger tip of decision makers.
DATA BI: put key insights at the finger tip of decision makers.DATA BI: put key insights at the finger tip of decision makers.
DATA BI: put key insights at the finger tip of decision makers.
 
Data Architecture, Solution Architecture, Platform Architecture — What’s the ...
Data Architecture, Solution Architecture, Platform Architecture — What’s the ...Data Architecture, Solution Architecture, Platform Architecture — What’s the ...
Data Architecture, Solution Architecture, Platform Architecture — What’s the ...
 
In-Memory Computing Webcast. Market Predictions 2017
In-Memory Computing Webcast. Market Predictions 2017In-Memory Computing Webcast. Market Predictions 2017
In-Memory Computing Webcast. Market Predictions 2017
 
How To Run A Successful BI Project with Hadoop
How To Run A Successful BI Project with HadoopHow To Run A Successful BI Project with Hadoop
How To Run A Successful BI Project with Hadoop
 
The Agile Analyst: Solving the Data Problem with Virtualization
The Agile Analyst: Solving the Data Problem with VirtualizationThe Agile Analyst: Solving the Data Problem with Virtualization
The Agile Analyst: Solving the Data Problem with Virtualization
 
The ABCs of Treating Data as Product
The ABCs of Treating Data as ProductThe ABCs of Treating Data as Product
The ABCs of Treating Data as Product
 
Power to the People: A Stack to Empower Every User to Make Data-Driven Decisions
Power to the People: A Stack to Empower Every User to Make Data-Driven DecisionsPower to the People: A Stack to Empower Every User to Make Data-Driven Decisions
Power to the People: A Stack to Empower Every User to Make Data-Driven Decisions
 
Introduction to Big Data using AWS Services
Introduction to Big Data using AWS ServicesIntroduction to Big Data using AWS Services
Introduction to Big Data using AWS Services
 
When and How Data Lakes Fit into a Modern Data Architecture
When and How Data Lakes Fit into a Modern Data ArchitectureWhen and How Data Lakes Fit into a Modern Data Architecture
When and How Data Lakes Fit into a Modern Data Architecture
 
Company Profile - NPC with TIBCO Spotfire solution
Company Profile - NPC with TIBCO Spotfire solution  Company Profile - NPC with TIBCO Spotfire solution
Company Profile - NPC with TIBCO Spotfire solution
 
Digital Operations Service Design
Digital Operations Service DesignDigital Operations Service Design
Digital Operations Service Design
 
No Compromises SQL Connectivity for MongoDB
No Compromises SQL Connectivity for MongoDBNo Compromises SQL Connectivity for MongoDB
No Compromises SQL Connectivity for MongoDB
 
20180701 - 1st Meeting - Data Science Orientation
20180701 - 1st Meeting - Data Science Orientation20180701 - 1st Meeting - Data Science Orientation
20180701 - 1st Meeting - Data Science Orientation
 
Intro to Neo4j
Intro to Neo4jIntro to Neo4j
Intro to Neo4j
 
SPT 104 Unlock your big data with analytics and BI on Office 365
SPT 104 Unlock your big data with analytics and BI on Office 365SPT 104 Unlock your big data with analytics and BI on Office 365
SPT 104 Unlock your big data with analytics and BI on Office 365
 
Paving The Way To Data Driven
Paving The Way To Data DrivenPaving The Way To Data Driven
Paving The Way To Data Driven
 
Big Data Analytics with Microsoft
Big Data Analytics with MicrosoftBig Data Analytics with Microsoft
Big Data Analytics with Microsoft
 

More from Chris Riccomini

What Your Tech Lead Thinks You Know (But Didn't Teach You)
What Your Tech Lead Thinks You Know (But Didn't Teach You)What Your Tech Lead Thinks You Know (But Didn't Teach You)
What Your Tech Lead Thinks You Know (But Didn't Teach You)
Chris Riccomini
 
The Future of Data Engineering - 2019 InfoQ QConSF
The Future of Data Engineering - 2019 InfoQ QConSFThe Future of Data Engineering - 2019 InfoQ QConSF
The Future of Data Engineering - 2019 InfoQ QConSF
Chris Riccomini
 
Airflow at WePay
Airflow at WePayAirflow at WePay
Airflow at WePay
Chris Riccomini
 
Apache Incubator Samza: Stream Processing at LinkedIn
Apache Incubator Samza: Stream Processing at LinkedInApache Incubator Samza: Stream Processing at LinkedIn
Apache Incubator Samza: Stream Processing at LinkedIn
Chris Riccomini
 
Apache Incubator Samza: Stream Processing at LinkedIn
Apache Incubator Samza: Stream Processing at LinkedInApache Incubator Samza: Stream Processing at LinkedIn
Apache Incubator Samza: Stream Processing at LinkedIn
Chris Riccomini
 
Building Applications on YARN
Building Applications on YARNBuilding Applications on YARN
Building Applications on YARN
Chris Riccomini
 

More from Chris Riccomini (6)

What Your Tech Lead Thinks You Know (But Didn't Teach You)
What Your Tech Lead Thinks You Know (But Didn't Teach You)What Your Tech Lead Thinks You Know (But Didn't Teach You)
What Your Tech Lead Thinks You Know (But Didn't Teach You)
 
The Future of Data Engineering - 2019 InfoQ QConSF
The Future of Data Engineering - 2019 InfoQ QConSFThe Future of Data Engineering - 2019 InfoQ QConSF
The Future of Data Engineering - 2019 InfoQ QConSF
 
Airflow at WePay
Airflow at WePayAirflow at WePay
Airflow at WePay
 
Apache Incubator Samza: Stream Processing at LinkedIn
Apache Incubator Samza: Stream Processing at LinkedInApache Incubator Samza: Stream Processing at LinkedIn
Apache Incubator Samza: Stream Processing at LinkedIn
 
Apache Incubator Samza: Stream Processing at LinkedIn
Apache Incubator Samza: Stream Processing at LinkedInApache Incubator Samza: Stream Processing at LinkedIn
Apache Incubator Samza: Stream Processing at LinkedIn
 
Building Applications on YARN
Building Applications on YARNBuilding Applications on YARN
Building Applications on YARN
 

Recently uploaded

Generative AI Use cases applications solutions and implementation.pdf
Generative AI Use cases applications solutions and implementation.pdfGenerative AI Use cases applications solutions and implementation.pdf
Generative AI Use cases applications solutions and implementation.pdf
mahaffeycheryld
 
NATURAL DEEP EUTECTIC SOLVENTS AS ANTI-FREEZING AGENT
NATURAL DEEP EUTECTIC SOLVENTS AS ANTI-FREEZING AGENTNATURAL DEEP EUTECTIC SOLVENTS AS ANTI-FREEZING AGENT
NATURAL DEEP EUTECTIC SOLVENTS AS ANTI-FREEZING AGENT
Addu25809
 
2. protection of river banks and bed erosion protection works.ppt
2. protection of river banks and bed erosion protection works.ppt2. protection of river banks and bed erosion protection works.ppt
2. protection of river banks and bed erosion protection works.ppt
abdatawakjira
 
AI-Based Home Security System : Home security
AI-Based Home Security System : Home securityAI-Based Home Security System : Home security
AI-Based Home Security System : Home security
AIRCC Publishing Corporation
 
Transformers design and coooling methods
Transformers design and coooling methodsTransformers design and coooling methods
Transformers design and coooling methods
Roger Rozario
 
Null Bangalore | Pentesters Approach to AWS IAM
Null Bangalore | Pentesters Approach to AWS IAMNull Bangalore | Pentesters Approach to AWS IAM
Null Bangalore | Pentesters Approach to AWS IAM
Divyanshu
 
OOPS_Lab_Manual - programs using C++ programming language
OOPS_Lab_Manual - programs using C++ programming languageOOPS_Lab_Manual - programs using C++ programming language
OOPS_Lab_Manual - programs using C++ programming language
PreethaV16
 
Object Oriented Analysis and Design - OOAD
Object Oriented Analysis and Design - OOADObject Oriented Analysis and Design - OOAD
Object Oriented Analysis and Design - OOAD
PreethaV16
 
TIME TABLE MANAGEMENT SYSTEM testing.pptx
TIME TABLE MANAGEMENT SYSTEM testing.pptxTIME TABLE MANAGEMENT SYSTEM testing.pptx
TIME TABLE MANAGEMENT SYSTEM testing.pptx
CVCSOfficial
 
Supermarket Management System Project Report.pdf
Supermarket Management System Project Report.pdfSupermarket Management System Project Report.pdf
Supermarket Management System Project Report.pdf
Kamal Acharya
 
4. Mosca vol I -Fisica-Tipler-5ta-Edicion-Vol-1.pdf
4. Mosca vol I -Fisica-Tipler-5ta-Edicion-Vol-1.pdf4. Mosca vol I -Fisica-Tipler-5ta-Edicion-Vol-1.pdf
4. Mosca vol I -Fisica-Tipler-5ta-Edicion-Vol-1.pdf
Gino153088
 
一比一原版(osu毕业证书)美国俄勒冈州立大学毕业证如何办理
一比一原版(osu毕业证书)美国俄勒冈州立大学毕业证如何办理一比一原版(osu毕业证书)美国俄勒冈州立大学毕业证如何办理
一比一原版(osu毕业证书)美国俄勒冈州立大学毕业证如何办理
upoux
 
一比一原版(uofo毕业证书)美国俄勒冈大学毕业证如何办理
一比一原版(uofo毕业证书)美国俄勒冈大学毕业证如何办理一比一原版(uofo毕业证书)美国俄勒冈大学毕业证如何办理
一比一原版(uofo毕业证书)美国俄勒冈大学毕业证如何办理
upoux
 
原版制作(Humboldt毕业证书)柏林大学毕业证学位证一模一样
原版制作(Humboldt毕业证书)柏林大学毕业证学位证一模一样原版制作(Humboldt毕业证书)柏林大学毕业证学位证一模一样
原版制作(Humboldt毕业证书)柏林大学毕业证学位证一模一样
ydzowc
 
一比一原版(uoft毕业证书)加拿大多伦多大学毕业证如何办理
一比一原版(uoft毕业证书)加拿大多伦多大学毕业证如何办理一比一原版(uoft毕业证书)加拿大多伦多大学毕业证如何办理
一比一原版(uoft毕业证书)加拿大多伦多大学毕业证如何办理
sydezfe
 
5G Radio Network Througput Problem Analysis HCIA.pdf
5G Radio Network Througput Problem Analysis HCIA.pdf5G Radio Network Througput Problem Analysis HCIA.pdf
5G Radio Network Througput Problem Analysis HCIA.pdf
AlvianRamadhani5
 
Height and depth gauge linear metrology.pdf
Height and depth gauge linear metrology.pdfHeight and depth gauge linear metrology.pdf
Height and depth gauge linear metrology.pdf
q30122000
 
Blood finder application project report (1).pdf
Blood finder application project report (1).pdfBlood finder application project report (1).pdf
Blood finder application project report (1).pdf
Kamal Acharya
 
Digital Twins Computer Networking Paper Presentation.pptx
Digital Twins Computer Networking Paper Presentation.pptxDigital Twins Computer Networking Paper Presentation.pptx
Digital Twins Computer Networking Paper Presentation.pptx
aryanpankaj78
 
Call For Paper -3rd International Conference on Artificial Intelligence Advan...
Call For Paper -3rd International Conference on Artificial Intelligence Advan...Call For Paper -3rd International Conference on Artificial Intelligence Advan...
Call For Paper -3rd International Conference on Artificial Intelligence Advan...
ijseajournal
 

Recently uploaded (20)

Generative AI Use cases applications solutions and implementation.pdf
Generative AI Use cases applications solutions and implementation.pdfGenerative AI Use cases applications solutions and implementation.pdf
Generative AI Use cases applications solutions and implementation.pdf
 
NATURAL DEEP EUTECTIC SOLVENTS AS ANTI-FREEZING AGENT

Data Warehousing Trends

  • 1. Data Tech Talk ft. external speaker
      Chris Riccomini, Author, Engineer & Manager, …
      1. Talk: Data Warehousing Trends
      2. Open Dialogue: Q & A
  • 3. Hi, I’m Chris
      ● Engineer & Manager: WePay, LinkedIn, PayPal
      ● Open Source: Apache Samza, Apache Airflow, Debezium
      ● Author: The Missing README
      ● Investor & Advisor: Prefect, Meroxa, StarTree, Amundsen, Anomalo, TopCoat, ...
      @criccomini
  • 5. The Trends
      ● Realtime DWHs
      ● Analytics Engineering
      ● Data Mesh
      ● Data Catalogs
      ● Reverse ETL
      ● Headless BI
      ● Data Integrity
      ● Data Lakehouses
      ● DataOps
      ● White-label Data Viz
      https://twitter.com/criccomini/status/1451557884769169412
      https://preset.io/blog/reshaping-data-engineering/
  • 13. Why Realtime DWHs?
      ● Debugging
          ○ Investigate application errors
          ○ Audit log shows how things changed
      ● Operational
          ○ Monitoring
          ○ Scripts that pull from DWH
      ● Security/Compliance
          ○ Audit log
      ● Customer data products
          ○ Ad hoc customer reports (e.g. Stripe Sigma, WePay txns)
          ○ Data clean rooms
  • 14. Realtime DWHs Technical Advantages
      ● Handles hard deletes
      ● No schema requirements (timestamps)
      ● Replay from Kafka
      ● Data integration
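
The "handles hard deletes" and "replay" points can be illustrated with a minimal Python sketch. It assumes a Debezium-style change event with `op`, `key`, and `after` fields; the exact event shape and the `apply_changes` helper are hypothetical and do not come from the talk.

```python
from typing import Any, Dict, Iterable

# Hypothetical change event: op is "c" (create), "u" (update), or "d" (delete).
ChangeEvent = Dict[str, Any]


def apply_changes(events: Iterable[ChangeEvent]) -> Dict[Any, Dict[str, Any]]:
    """Replay an ordered change log into a key -> row snapshot."""
    table: Dict[Any, Dict[str, Any]] = {}
    for event in events:
        key = event["key"]
        if event["op"] == "d":
            # A hard delete in the source arrives as an explicit event,
            # so the row can be removed downstream as well.
            table.pop(key, None)
        else:
            # Creates and updates both carry the full new row image.
            table[key] = event["after"]
    return table


log = [
    {"op": "c", "key": 1, "after": {"id": 1, "status": "new"}},
    {"op": "c", "key": 2, "after": {"id": 2, "status": "new"}},
    {"op": "u", "key": 1, "after": {"id": 1, "status": "paid"}},
    {"op": "d", "key": 2, "after": None},
]

snapshot = apply_changes(log)
```

Because the snapshot is derived purely from the log, replaying the same log from the start rebuilds the same table. A batch job that polls on a timestamp column would never see the deleted row disappear.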
  • 15. Realtime DWH Drawbacks
      ● Operationally complex
      ● Depends on source DB support (for CDC)
      ● Inline transformation is harder
      ● Fixing bad data is harder
  • 18. Data Mesh
      “A data mesh is a type of data platform architecture that embraces the ubiquity of data in the enterprise by leveraging a domain-oriented, self-serve design.”
      ● Domain-oriented decentralized data ownership and architecture
      ● Data as a product
      ● Self-serve data infrastructure as a platform
      ● Federated computational governance
      https://towardsdatascience.com/what-is-a-data-mesh-and-how-not-to-mesh-it-up-210710bb41e0
      wat.
  • 19. Data is a Product
      “A product is any item or service you sell to serve a customer's need or want.”
      https://www.aha.io/roadmapping/guide/product-management/what-is-a-product
  • 20. Data is a Product
      ● Customers
          ○ Data scientists
          ○ Business analysts
          ○ Finance
          ○ Sales
          ○ Product managers
          ○ Engineers
          ○ External customers
      ● Products
          ○ Recommender systems
          ○ Billing
          ○ Fraud
          ○ Reports
          ○ Dashboards
      https://cnr.sh/essays/what-the-heck-data-mesh
  • 21. Treat Data Models like APIs
      ● Versioned
      ● Compatible
      ● Documented
      ● Monitored
      ● Self-serve
      ● Secure (AuthN, AuthZ)
      https://cnr.sh/essays/what-the-heck-data-mesh
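
One way to read "Versioned" and "Compatible" is that a data model change gets checked the way an API change would. The Python sketch below is illustrative only: the `Schema` representation (column name to type name) and the `breaking_changes` helper are assumptions, not something the slides specify.

```python
from typing import Dict, List

Schema = Dict[str, str]  # column name -> type name (hypothetical representation)


def breaking_changes(old: Schema, new: Schema) -> List[str]:
    """List changes in `new` that would break readers written against `old`."""
    problems = []
    for column, old_type in old.items():
        if column not in new:
            problems.append(f"removed column: {column}")
        elif new[column] != old_type:
            problems.append(f"type change on {column}: {old_type} -> {new[column]}")
    # Columns that exist only in `new` are additive and treated as compatible.
    return problems


v1 = {"id": "int", "amount": "decimal"}
v2 = {"id": "int", "amount": "decimal", "currency": "string"}  # additive change
v3 = {"id": "string", "currency": "string"}  # retypes one column, drops another

compatible = breaking_changes(v1, v2)
incompatible = breaking_changes(v1, v3)
```

A check like this could run in continuous integration, so an incompatible change forces a new version rather than silently breaking downstream consumers.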
  • 22. We’ve Done This Before
      ● Microservice
      ● DevOps
      https://cnr.sh/essays/what-the-heck-data-mesh
  • 24. Metrics then
      ● BI tools to create and visualize metrics
          ○ Looker
          ○ Mode
          ○ Tableau
          ○ Data Studio
      ● Answer internal business questions
          ○ How is a product's health?
          ○ What does revenue look like?
  • 25. Metrics now
      ● Metrics matter for external business workflows
          ○ Predicting when a customer might churn
          ○ Notifying users when they reach their capacity limit
          ○ DS wants to create models to optimize certain metrics
          ○ Computing customer bills
      ● BI tools aren’t meant for this
          ○ Walled garden
          ○ Have to re-implement the same metrics in different systems
  • 26. Headless BI
      “For data consumption, we heard complaints from decision makers that different teams reported different numbers for very simple business questions, and there was no easy way to know which number was correct.”
      "...the teams that own metrics would be able to define them once, in a way that’s consistent across dashboards, automation tools, sales reporting, and so on. Let’s call this ‘Headless BI’."
      https://medium.com/airbnb-engineering/how-airbnb-achieved-metric-consistency-at-scale-f23cc53dea70
      https://basecase.vc/blog/headless-bi
  • 27. Headless BI
      ● Programmatically manage business metrics
          ○ Automated
          ○ Centralized
          ○ Documented
          ○ Validated
          ○ Metadata/lineage
          ○ Backfills
          ○ Cost
          ○ Privacy
          ○ Access
          ○ Deprecation
          ○ Retention
      https://medium.com/airbnb-engineering/how-airbnb-achieved-metric-consistency-at-scale-f23cc53dea70
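
The "define them once" idea can be sketched as a small metric registry in Python. Everything here is hypothetical (the `Metric` fields, the `register` and `to_sql` helpers, the `payments` table); it is not the design of any particular Headless BI product, only an illustration of centralized, documented metric definitions feeding multiple consumers.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class Metric:
    name: str
    table: str
    expression: str   # aggregate SQL expression, defined exactly once
    owner: str
    description: str


REGISTRY: Dict[str, Metric] = {}


def register(metric: Metric) -> None:
    """Add a metric to the central registry, rejecting duplicate definitions."""
    if metric.name in REGISTRY:
        raise ValueError(f"metric already defined: {metric.name}")
    REGISTRY[metric.name] = metric


def to_sql(name: str, group_by: str) -> str:
    """Render a query for any consumer from the single shared definition."""
    m = REGISTRY[name]
    return (
        f"SELECT {group_by}, {m.expression} AS {m.name} "
        f"FROM {m.table} GROUP BY {group_by}"
    )


register(Metric(
    name="revenue",
    table="payments",
    expression="SUM(amount)",
    owner="finance",
    description="Total captured payment amount.",
))

dashboard_query = to_sql("revenue", group_by="day")
billing_query = to_sql("revenue", group_by="customer_id")
```

Both the dashboard and the billing job get the same `SUM(amount)` definition, so they cannot drift apart the way re-implemented metrics in separate tools can.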
  • 29. Q&A
      ● Realtime DWHs
      ● Analytics Engineering
      ● Data Mesh
      ● Data Catalogs
      ● Reverse ETL
      ● Headless BI
      ● Data Integrity
      ● Data Lakehouses
      ● DataOps
      ● White-label Data Viz
  • 31. Analytics Engineering
      Analytics engineers provide clean data sets to end users, modeling data in a way that empowers end users to answer their own questions. While a data analyst spends their time analyzing data, an analytics engineer spends their time transforming, testing, deploying, and documenting data. Analytics engineers apply software engineering best practices like version control and continuous integration to the analytics code base.
      https://www.getdbt.com/what-is-analytics-engineering
  • 32. Analytics Engineering
      ● Job
          ○ Building
          ○ Testing
          ○ Cataloging
      ● Tools
          ○ dbt
          ○ Airflow
      ● Customers
          ○ Data science
          ○ Data analysts
          ○ BI
          ○ Reporting
  • 33. Data Catalogs
      ● Flavor of the month
          ○ Amundsen
          ○ DataHub
          ○ Metaphor
          ○ Marquez
          ○ Atlan
          ○ Collibra
          ○ Alation
      ● Use cases
          ○ Discoverability
          ○ Operations
          ○ Governance
  • 34. Reverse ETL
      “Reverse ETL syncs data from a system of records like a warehouse to a system of actions like CRM, MAP, and other SaaS apps to operationalize data.”
      https://blog.getcensus.com/what-is-reverse-etl/
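
The warehouse-to-SaaS direction in that definition can be sketched in Python as follows. `FakeCrm`, `upsert_contact`, and `sync` are hypothetical stand-ins; a real sync would query the warehouse and call the vendor's actual API, with batching, retries, and rate limiting that are omitted here.

```python
from typing import Dict, List


class FakeCrm:
    """In-memory stand-in for a SaaS API client (hypothetical)."""

    def __init__(self) -> None:
        self.contacts: Dict[str, Dict[str, object]] = {}

    def upsert_contact(self, email: str, fields: Dict[str, object]) -> None:
        # Create the contact if missing, otherwise merge in the new fields.
        self.contacts.setdefault(email, {}).update(fields)


def sync(rows: List[Dict[str, object]], crm: FakeCrm) -> int:
    """Push warehouse rows into the CRM; returns the number of rows sent."""
    sent = 0
    for row in rows:
        email = str(row["email"])
        fields = {k: v for k, v in row.items() if k != "email"}
        crm.upsert_contact(email, fields)
        sent += 1
    return sent


# Rows as they might come back from a warehouse query (made-up values).
warehouse_rows = [
    {"email": "a@example.com", "lifetime_value": 1200, "churn_risk": "low"},
    {"email": "b@example.com", "lifetime_value": 90, "churn_risk": "high"},
]

crm = FakeCrm()
sent = sync(warehouse_rows, crm)
```

Using an upsert keyed on a stable identifier keeps the sync safe to re-run, which matters because these jobs typically execute on a schedule.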
  • 36. Photos
      ● https://unsplash.com/photos/Lbvi0GGJWY4
      ● https://unsplash.com/photos/8vTAAFYhFfQ
      ● https://www.flickr.com/photos/jurgenappelo/5201275209