SlideShare a Scribd company logo
DATA VAULT
2.0:
Big Data Meets Data Warehousing
DEAN HALLMAN
WIRESOFT, LLC
DATA WAREHOUSING VS BIG DATA
• Does Big Data replace Data Warehousing? Or do I need both?
• What’s the difference:
• Between the data flowing into a data warehouse vs big data tools?
• Between the ingestion processes and infrastructure?
• Data Lakes arrived with Big Data, so are they useful in Data
Warehousing?
• How should I model my data in EDW?
• 3NF, Star Schema, same as my operational data stores?
• Data Vault 2.0
• Graph Databases
• What is an architecture that allows both to co-exists effectively?
Impressions
(Big Data)
Core
Business
Services
Core
Business
Services
Core
Business
Services
Operational
Data Stores
D
A
T
A
L
A
K
E
Enterprise Data Warehouse
CDC,
snapshot
Internet
External
Data
Sources
Big Data Toolchain
Batch
(SerDe)
StagingVault
RawVault
BusinessVault
InformationMart
Streaming
(Kafka)
Streaming
Analytics
Batch Analytics
(Hadoop)
Schema-on-Read
Schema-on-Write
Data Source
Landing
Clients
ETL
ELT
𝛾
𝛾
𝛾
𝛾 BI Tools
Monitoring
Discovery
Audit
clickstream
(SerDe)
ETL ETL
Impressions
(Big Data)
Core
Business
Services
Core
Business
Services
Core
Business
Services
Operational
Data Stores
D
A
T
A
L
A
K
E
Enterprise Data Warehouse
CDC,
snapshot
Internet
External
Data
Sources
Big Data Toolchain
Batch
(SerDe)
StagingVault
RawVault
BusinessVault
InformationMart
Streaming
(Kafka)
Streaming
Analytics
Batch Analytics
(Hadoop)
Schema-on-Read
Schema-on-Write
Data Source
Landing
Clients
ETL
ELT
𝛾
𝛾
𝛾
𝛾 BI Tools
Monitoring
Discovery
Audit
clickstream
(SerDe)
ETL ETL
THE DATA MODEL
DATA VAULT 2.0
COMMON FOUNDATIONAL WAREHOUSE ARCHITECTURE
• “The Data Vault Model is a detail oriented, historical tracking and uniquely linked
set of normalized tables that support one or more functional areas of business. It is a
hybrid approach encompassing the best of breed between 3rd normal form (3NF)
and star schema. The design is flexible, scalable, consistent and adaptable to the
needs of the enterprise” -- Dan Linstedt, Creator of Data Vault
• Data loaded as-is from sources, no edits or cleanup
• Append-only to afford highest performance
• Agile & agnostic to changes in the operational store’s data model
• Essentially, a prescription for Layered Graph to Relational Mapping
DATA WAREHOUSING & DATA VAULT 2.0
• 60’s, 70’s, 80’s
• E.F. Codd => 3NF
• Bill Inmon invents Data Warehousing
concept
• Dr. Ralph Kimball popularizes Star
Schema design
• 90’s, 00’s:
• Dan Linstedt creates Data Vault Model @
DOD
• 2014:
• Dan Introduces Data Vault 2.0
Source: “What are Graph Databases and Why should I care?“, by Dave Bechberger of Expero
SOLVE BY STAR SCHEMA ?
RELATIONAL VS GRAPH DATABASES
• Enterprise Grade
• Well-worn path
• SQL has been relatively stagnant vs programming languages
GRAPH DATA MODEL
Source: https://neo4j.com/developer/graph-database/
GRAPH DATABASE VS DATA VAULT
GRAPH DATABASE VS DATA VAULT
Flight
Base Dest Forecast
Record
Source
LoadDate Depart Gate
LGA 2018-10-11 1:25P
M
B27
CAE 2018-10-24 3:30P
M
A14
SFO 2018-09-06 8:55P
M
G19
RDU 2018-08-12 4:45P
M
C22
SERVICED_BY
Record Source Airport CAE
Load Date 2018-11-17
Source Id 20181117-32-983
Aircraft
Base Service FAA NTSB
Record
Source
LoadDate Model Tailno
United 2017-02-11 767 1477
Delta 2015-11-04 A6 2381
Alaska 2013-08-28 747 8312
Frontie
r
2016-07-19 182 1438
Record Source United Airlines
Load Date 2018-01-17
Source Id 2412c
SERVICED_BY
Base Dest Manifest
Record
Source
LoadDate Begin End
United 2017-02-11 2017-04-23 2017-09-23
Delta 2015-11-04 2015-12-01 2017-04-22
Alaska 2013-08-28 2013-09-14 2016-05-04
Frontie
r
2016-07-19 2016-08-02 2018-04-11
Record Source United Airlines
Load Date 2018-09-17
Hubs
Links
SatellitesTab
• Organizations which design systems ...
are constrained to produce designs
which are copies of the communication
structures of these organizations
- Mel Conway
FLIGHT
Base Dest Forecast
Record
Source
LoadDate Depart Gate
LGA 2018-10-
11
1:25P
M
B27
CAE 2018-10-
24
3:30P
M
A14
FLIGHT
Record Source Airport CAE
Load Date 2018-11-17
Source Id 20181117-32-983
Aircraft
Bas
e
Service FAA NTSB
Record
Source
LoadDate Model Tailno
United 2017-02-
11
767 1477
Delta 2015-11-
04
A6 2381
Alaska 2013-08-
28
747 8312
Frontie
r
2016-07-
19
182 1438
Record Source United Airlines
Load Date 2018-01-17
Source Id 2412c
Airport
Base Dest Manifest
Record
Source
LoadDate Begin End
United 2017-02-11 2017-04-23 2017-09-
23
Delta 2015-11-04 2015-12-01 2017-04-
22
Alaska 2013-08-28 2013-09-14 2016-05-
04
Frontie
r
2016-07-19 2016-08-02 2018-04-
11
Record Source United Airlines
Load Date 2018-09-17
Airline
Base Service FAA
NTS
B
Record
Source
LoadDate Model Tailno
United 2017-02-11 767 1477
Delta 2015-11-04 A6 2381
Record Source United Airlines
Load Date 2018-01-17
Source Id 2412c
Hubs
Links
SatellitesTab
Source: https://www.wherescape.com/solutions/project-types/data-vault-automation/
• Modeled after self-
organizing networks
• A Business Key identifies a
key concept in business.
• They have a business
meaning
• They are unique and
have very low propensity
to change
• Business keys change
only when the business
change
• Enables (forces) cross-
source modeling
Source: http://www.di.univr.it/documenti/OccorrenzaIns/matdid/matdid232240.pdf
Source: http://www.di.univr.it/documenti/OccorrenzaIns/matdid/matdid232240.pdf
Source: http://www.di.univr.it/documenti/OccorrenzaIns/matdid/matdid232240.pdf
DATA VAULT 2.0 MODELING:
HUBS, LINKS & SATELLITES
@wiresoft/Pathfinder
Impressions
(Big Data)
Core
Business
Services
Core
Business
Services
Core
Business
Services
Operational
Data Stores
D
A
T
A
L
A
K
E
Enterprise Data Warehouse
CDC,
snapshot
Internet
External
Data
Sources
Big Data Toolchain
Batch
(SerDe)
StagingVault
RawVault
BusinessVault
InformationMart
Streaming
(Kafka)
Streaming
Analytics
Batch Analytics
(Hadoop)
Schema-on-Read
Schema-on-Write
Data Source
Landing
Clients
ETL
ELT
𝛾
𝛾
𝛾
𝛾 BI Tools
Monitoring
Discovery
Audit
clickstream
(SerDe)
ETL ETL
THE DATA
Impressions vs Business Data
ENTERPRISE DATA SILOS
Small DataLarge DataBig Data
Describes the
user base
Describes the
Enterprise
Describes the
Product
Instance
Grain
Transaction
Grain
Audit Grain
Impression Grain
Big Data
Enterprise Data
Warehouse
Operational Data Stores
Impression
Analytics
Business
Analytics
External Data Sources
DATA GRANULARITY FUNNEL
Impressions
(Big Data)
Core
Business
Services
Core
Business
Services
Core
Business
Services
Operational
Data Stores
D
A
T
A
L
A
K
E
Enterprise Data Warehouse
CDC,
snapshot
Internet
External
Data
Sources
Big Data Toolchain
Batch
(SerDe)
StagingVault
RawVault
BusinessVault
InformationMart
Streaming
(Kafka)
Streaming
Analytics
Batch Analytics
(Hadoop)
Schema-on-Read
Schema-on-Write
Data Source
Landing
Clients
ETL
ELT
𝛾
𝛾
𝛾
𝛾 BI Tools
Monitoring
Discovery
Audit
clickstream
(SerDe)
ETL ETL
DATA INGESTION
ETL vs ELT vs SerDe
ETL
VS
ELT
VS
SerDe
• Beware the Turing tar-pit, in which
everything is possible, but nothing
of interest is easy
- Alan Perlis
DATA CLASSIFICATION
MATRIX:
DECLARATIVE VS INTERPRETIVE
Declarative Interpretive
HadoopRDBMS
Web Events
Media Player
DATA WAREHOUSING
• Deep Topic
• 60’s, 70’s, 80’s
• E.F. Codd => 3NF
• Bill Inmon invents Data Warehousing
concept
• Dr. Ralph Kimball popularizes Star Schema
design
• 90’s, 00’s:
• Dan Linstedt creates Data Vault Model @
DOD
• 2014:
• Dan Introduces Data Vault 2.0
• Data Warehouse vs Operational Data
Stores
• Data Warehouse as Version Control System
• MapReduce, 2004, Google by Jeffery
Dean and Sanjay, “MAPREDUCE:
SIMPLIFIED DATA PROCESSING ON
LARGE CLUSTERS” , GFS
• Nutch 2005, Hadoop 2006, 2007 - Doug
Cutting
• What exactly is “Big Data”?
BIG DATA
Client
User
Interpreter
Analysis
UNSTRUCTURED USER EXPERIENCE
L
L
n L
ilossy
Client
User
Time Series
Event
Record
Analysis
STRUCTURED USER EXPERIENCE
losslessL
p L
p
L
e
ETL OR SERDE ?
S3
Hadoop
Time Series
Event Record
Analysis
Deserializer
L e
L
d
L
m
Client
User
Serializer
L p
L
p
Eventlog.e Eventlog.d
L
e
Single Source
(Version Locked)
Kafka/Kinesis
LeInternet
ETL
ELT
(SerDe)
vs
Source: https://www.ironsidegroup.com/2015/03/01/etl-vs-elt-whats-the-big-difference/
Schema
On
Write
Schema
On
Read
OTHER CHALLENGES
• Satellites must be loaded chronologically
• Time-based scheduling vs data-availability scheduling
QUESTIONS?
• Contact:
 Dean Hallman
 rdhallman@gmail.com
 Linkedin: https://www.linkedin.com/in/dean-hallman/

More Related Content

What's hot

Data Lineage with Apache Airflow using Marquez
Data Lineage with Apache Airflow using Marquez Data Lineage with Apache Airflow using Marquez
Data Lineage with Apache Airflow using Marquez
Willy Lulciuc
 
Data Vault 2.0: Using MD5 Hashes for Change Data Capture
Data Vault 2.0: Using MD5 Hashes for Change Data CaptureData Vault 2.0: Using MD5 Hashes for Change Data Capture
Data Vault 2.0: Using MD5 Hashes for Change Data Capture
Kent Graziano
 
Data Pipline Observability meetup
Data Pipline Observability meetup Data Pipline Observability meetup
Data Pipline Observability meetup
Omid Vahdaty
 
Data Vault Automation at the Bijenkorf
Data Vault Automation at the BijenkorfData Vault Automation at the Bijenkorf
Data Vault Automation at the Bijenkorf
Rob Winters
 
Marquez: A Metadata Service for Data Abstraction, Data Lineage, and Event-bas...
Marquez: A Metadata Service for Data Abstraction, Data Lineage, and Event-bas...Marquez: A Metadata Service for Data Abstraction, Data Lineage, and Event-bas...
Marquez: A Metadata Service for Data Abstraction, Data Lineage, and Event-bas...
Willy Lulciuc
 
Achieving Lakehouse Models with Spark 3.0
Achieving Lakehouse Models with Spark 3.0Achieving Lakehouse Models with Spark 3.0
Achieving Lakehouse Models with Spark 3.0
Databricks
 
Why data warehouses cannot support hot analytics
Why data warehouses cannot support hot analyticsWhy data warehouses cannot support hot analytics
Why data warehouses cannot support hot analytics
Imply
 
Scaling Databricks to Run Data and ML Workloads on Millions of VMs
Scaling Databricks to Run Data and ML Workloads on Millions of VMsScaling Databricks to Run Data and ML Workloads on Millions of VMs
Scaling Databricks to Run Data and ML Workloads on Millions of VMs
Matei Zaharia
 
Extreme BI: Creating Virtualized Hybrid Type 1+2 Dimensions
Extreme BI: Creating Virtualized Hybrid Type 1+2 DimensionsExtreme BI: Creating Virtualized Hybrid Type 1+2 Dimensions
Extreme BI: Creating Virtualized Hybrid Type 1+2 Dimensions
Kent Graziano
 
Using Multiple Persistence Layers in Spark to Build a Scalable Prediction Eng...
Using Multiple Persistence Layers in Spark to Build a Scalable Prediction Eng...Using Multiple Persistence Layers in Spark to Build a Scalable Prediction Eng...
Using Multiple Persistence Layers in Spark to Build a Scalable Prediction Eng...
StampedeCon
 
Best Practices For Building and Operating A Managed Data Lake - StampedeCon 2016
Best Practices For Building and Operating A Managed Data Lake - StampedeCon 2016Best Practices For Building and Operating A Managed Data Lake - StampedeCon 2016
Best Practices For Building and Operating A Managed Data Lake - StampedeCon 2016
StampedeCon
 
Building a Streaming Microservices Architecture - Data + AI Summit EU 2020
Building a Streaming Microservices Architecture - Data + AI Summit EU 2020Building a Streaming Microservices Architecture - Data + AI Summit EU 2020
Building a Streaming Microservices Architecture - Data + AI Summit EU 2020
Databricks
 
Making Data Timelier and More Reliable with Lakehouse Technology
Making Data Timelier and More Reliable with Lakehouse TechnologyMaking Data Timelier and More Reliable with Lakehouse Technology
Making Data Timelier and More Reliable with Lakehouse Technology
Matei Zaharia
 
Making Sense of Schema on Read
Making Sense of Schema on ReadMaking Sense of Schema on Read
Making Sense of Schema on Read
Kent Graziano
 
Big Data or Data Warehousing? How to Leverage Both in the Enterprise
Big Data or Data Warehousing? How to Leverage Both in the EnterpriseBig Data or Data Warehousing? How to Leverage Both in the Enterprise
Big Data or Data Warehousing? How to Leverage Both in the Enterprise
Dean Hallman
 
Microsoft Azure Big Data Analytics
Microsoft Azure Big Data AnalyticsMicrosoft Azure Big Data Analytics
Microsoft Azure Big Data Analytics
Mark Kromer
 
Building a Data Lake on AWS
Building a Data Lake on AWSBuilding a Data Lake on AWS
Building a Data Lake on AWS
Gary Stafford
 
Seeing Redshift: How Amazon Changed Data Warehousing Forever
Seeing Redshift: How Amazon Changed Data Warehousing ForeverSeeing Redshift: How Amazon Changed Data Warehousing Forever
Seeing Redshift: How Amazon Changed Data Warehousing Forever
Inside Analysis
 
Suburface 2021 IBM Cloud Data Lake
Suburface 2021 IBM Cloud Data LakeSuburface 2021 IBM Cloud Data Lake
Suburface 2021 IBM Cloud Data Lake
Torsten Steinbach
 

What's hot (20)

Data Lineage with Apache Airflow using Marquez
Data Lineage with Apache Airflow using Marquez Data Lineage with Apache Airflow using Marquez
Data Lineage with Apache Airflow using Marquez
 
Data Vault 2.0: Using MD5 Hashes for Change Data Capture
Data Vault 2.0: Using MD5 Hashes for Change Data CaptureData Vault 2.0: Using MD5 Hashes for Change Data Capture
Data Vault 2.0: Using MD5 Hashes for Change Data Capture
 
Data Pipline Observability meetup
Data Pipline Observability meetup Data Pipline Observability meetup
Data Pipline Observability meetup
 
Data Vault Automation at the Bijenkorf
Data Vault Automation at the BijenkorfData Vault Automation at the Bijenkorf
Data Vault Automation at the Bijenkorf
 
Marquez: A Metadata Service for Data Abstraction, Data Lineage, and Event-bas...
Marquez: A Metadata Service for Data Abstraction, Data Lineage, and Event-bas...Marquez: A Metadata Service for Data Abstraction, Data Lineage, and Event-bas...
Marquez: A Metadata Service for Data Abstraction, Data Lineage, and Event-bas...
 
Achieving Lakehouse Models with Spark 3.0
Achieving Lakehouse Models with Spark 3.0Achieving Lakehouse Models with Spark 3.0
Achieving Lakehouse Models with Spark 3.0
 
Why data warehouses cannot support hot analytics
Why data warehouses cannot support hot analyticsWhy data warehouses cannot support hot analytics
Why data warehouses cannot support hot analytics
 
Scaling Databricks to Run Data and ML Workloads on Millions of VMs
Scaling Databricks to Run Data and ML Workloads on Millions of VMsScaling Databricks to Run Data and ML Workloads on Millions of VMs
Scaling Databricks to Run Data and ML Workloads on Millions of VMs
 
Extreme BI: Creating Virtualized Hybrid Type 1+2 Dimensions
Extreme BI: Creating Virtualized Hybrid Type 1+2 DimensionsExtreme BI: Creating Virtualized Hybrid Type 1+2 Dimensions
Extreme BI: Creating Virtualized Hybrid Type 1+2 Dimensions
 
Using Multiple Persistence Layers in Spark to Build a Scalable Prediction Eng...
Using Multiple Persistence Layers in Spark to Build a Scalable Prediction Eng...Using Multiple Persistence Layers in Spark to Build a Scalable Prediction Eng...
Using Multiple Persistence Layers in Spark to Build a Scalable Prediction Eng...
 
Best Practices For Building and Operating A Managed Data Lake - StampedeCon 2016
Best Practices For Building and Operating A Managed Data Lake - StampedeCon 2016Best Practices For Building and Operating A Managed Data Lake - StampedeCon 2016
Best Practices For Building and Operating A Managed Data Lake - StampedeCon 2016
 
Building a Streaming Microservices Architecture - Data + AI Summit EU 2020
Building a Streaming Microservices Architecture - Data + AI Summit EU 2020Building a Streaming Microservices Architecture - Data + AI Summit EU 2020
Building a Streaming Microservices Architecture - Data + AI Summit EU 2020
 
Making Data Timelier and More Reliable with Lakehouse Technology
Making Data Timelier and More Reliable with Lakehouse TechnologyMaking Data Timelier and More Reliable with Lakehouse Technology
Making Data Timelier and More Reliable with Lakehouse Technology
 
Making Sense of Schema on Read
Making Sense of Schema on ReadMaking Sense of Schema on Read
Making Sense of Schema on Read
 
Big Data or Data Warehousing? How to Leverage Both in the Enterprise
Big Data or Data Warehousing? How to Leverage Both in the EnterpriseBig Data or Data Warehousing? How to Leverage Both in the Enterprise
Big Data or Data Warehousing? How to Leverage Both in the Enterprise
 
Microsoft Azure Big Data Analytics
Microsoft Azure Big Data AnalyticsMicrosoft Azure Big Data Analytics
Microsoft Azure Big Data Analytics
 
Building a Data Lake on AWS
Building a Data Lake on AWSBuilding a Data Lake on AWS
Building a Data Lake on AWS
 
Seeing Redshift: How Amazon Changed Data Warehousing Forever
Seeing Redshift: How Amazon Changed Data Warehousing ForeverSeeing Redshift: How Amazon Changed Data Warehousing Forever
Seeing Redshift: How Amazon Changed Data Warehousing Forever
 
Suburface 2021 IBM Cloud Data Lake
Suburface 2021 IBM Cloud Data LakeSuburface 2021 IBM Cloud Data Lake
Suburface 2021 IBM Cloud Data Lake
 
LinkedIn2
LinkedIn2LinkedIn2
LinkedIn2
 

Similar to Data Vault 2.0: Big Data Meets Data Warehousing

datavault2.pptx
datavault2.pptxdatavault2.pptx
datavault2.pptx
Mounika662749
 
Self Service Analytics and a Modern Data Architecture with Data Virtualizatio...
Self Service Analytics and a Modern Data Architecture with Data Virtualizatio...Self Service Analytics and a Modern Data Architecture with Data Virtualizatio...
Self Service Analytics and a Modern Data Architecture with Data Virtualizatio...
Denodo
 
Webinar future dataintegration-datamesh-and-goldengatekafka
Webinar future dataintegration-datamesh-and-goldengatekafkaWebinar future dataintegration-datamesh-and-goldengatekafka
Webinar future dataintegration-datamesh-and-goldengatekafka
Jeffrey T. Pollock
 
Flash session -streaming--ses1243-lon
Flash session -streaming--ses1243-lonFlash session -streaming--ses1243-lon
Flash session -streaming--ses1243-lon
Jeffrey T. Pollock
 
Big Data Analytics in the Cloud with Microsoft Azure
Big Data Analytics in the Cloud with Microsoft AzureBig Data Analytics in the Cloud with Microsoft Azure
Big Data Analytics in the Cloud with Microsoft Azure
Mark Kromer
 
Webinar Data Mesh - Part 3
Webinar Data Mesh - Part 3Webinar Data Mesh - Part 3
Webinar Data Mesh - Part 3
Jeffrey T. Pollock
 
ADV Slides: When and How Data Lakes Fit into a Modern Data Architecture
ADV Slides: When and How Data Lakes Fit into a Modern Data ArchitectureADV Slides: When and How Data Lakes Fit into a Modern Data Architecture
ADV Slides: When and How Data Lakes Fit into a Modern Data Architecture
DATAVERSITY
 
10/ EnterpriseDB @ OPEN'16
10/ EnterpriseDB @ OPEN'16 10/ EnterpriseDB @ OPEN'16
10/ EnterpriseDB @ OPEN'16
Kangaroot
 
Building Custom Big Data Integrations
Building Custom Big Data IntegrationsBuilding Custom Big Data Integrations
Building Custom Big Data Integrations
Pat Patterson
 
Building Fast Applications for Streaming Data
Building Fast Applications for Streaming DataBuilding Fast Applications for Streaming Data
Building Fast Applications for Streaming Data
freshdatabos
 
Enabling Next Gen Analytics with Azure Data Lake and StreamSets
Enabling Next Gen Analytics with Azure Data Lake and StreamSetsEnabling Next Gen Analytics with Azure Data Lake and StreamSets
Enabling Next Gen Analytics with Azure Data Lake and StreamSets
Streamsets Inc.
 
Data Mesh Part 4 Monolith to Mesh
Data Mesh Part 4 Monolith to MeshData Mesh Part 4 Monolith to Mesh
Data Mesh Part 4 Monolith to Mesh
Jeffrey T. Pollock
 
A Key to Real-time Insights in a Post-COVID World (ASEAN)
A Key to Real-time Insights in a Post-COVID World (ASEAN)A Key to Real-time Insights in a Post-COVID World (ASEAN)
A Key to Real-time Insights in a Post-COVID World (ASEAN)
Denodo
 
The Great Lakes: How to Approach a Big Data Implementation
The Great Lakes: How to Approach a Big Data ImplementationThe Great Lakes: How to Approach a Big Data Implementation
The Great Lakes: How to Approach a Big Data Implementation
Inside Analysis
 
Horses for Courses: Database Roundtable
Horses for Courses: Database RoundtableHorses for Courses: Database Roundtable
Horses for Courses: Database Roundtable
Eric Kavanagh
 
Dealing with Unstructured Data: Scaling to Infinity
Dealing with Unstructured Data: Scaling to InfinityDealing with Unstructured Data: Scaling to Infinity
Dealing with Unstructured Data: Scaling to Infinity
Great Wide Open
 
Scaling to Infinity - Open Source meets Big Data
Scaling to Infinity - Open Source meets Big DataScaling to Infinity - Open Source meets Big Data
Scaling to Infinity - Open Source meets Big Data
Treasure Data, Inc.
 
Logical Data Warehouse: How to Build a Virtualized Data Services Layer
Logical Data Warehouse: How to Build a Virtualized Data Services LayerLogical Data Warehouse: How to Build a Virtualized Data Services Layer
Logical Data Warehouse: How to Build a Virtualized Data Services Layer
DataWorks Summit
 
Trivadis Azure Data Lake
Trivadis Azure Data LakeTrivadis Azure Data Lake
Trivadis Azure Data Lake
Trivadis
 
Data APIs as a Foundation for Systems of Engagement
Data APIs as a Foundation for Systems of EngagementData APIs as a Foundation for Systems of Engagement
Data APIs as a Foundation for Systems of Engagement
Victor Olex
 

Similar to Data Vault 2.0: Big Data Meets Data Warehousing (20)

datavault2.pptx
datavault2.pptxdatavault2.pptx
datavault2.pptx
 
Self Service Analytics and a Modern Data Architecture with Data Virtualizatio...
Self Service Analytics and a Modern Data Architecture with Data Virtualizatio...Self Service Analytics and a Modern Data Architecture with Data Virtualizatio...
Self Service Analytics and a Modern Data Architecture with Data Virtualizatio...
 
Webinar future dataintegration-datamesh-and-goldengatekafka
Webinar future dataintegration-datamesh-and-goldengatekafkaWebinar future dataintegration-datamesh-and-goldengatekafka
Webinar future dataintegration-datamesh-and-goldengatekafka
 
Flash session -streaming--ses1243-lon
Flash session -streaming--ses1243-lonFlash session -streaming--ses1243-lon
Flash session -streaming--ses1243-lon
 
Big Data Analytics in the Cloud with Microsoft Azure
Big Data Analytics in the Cloud with Microsoft AzureBig Data Analytics in the Cloud with Microsoft Azure
Big Data Analytics in the Cloud with Microsoft Azure
 
Webinar Data Mesh - Part 3
Webinar Data Mesh - Part 3Webinar Data Mesh - Part 3
Webinar Data Mesh - Part 3
 
ADV Slides: When and How Data Lakes Fit into a Modern Data Architecture
ADV Slides: When and How Data Lakes Fit into a Modern Data ArchitectureADV Slides: When and How Data Lakes Fit into a Modern Data Architecture
ADV Slides: When and How Data Lakes Fit into a Modern Data Architecture
 
10/ EnterpriseDB @ OPEN'16
10/ EnterpriseDB @ OPEN'16 10/ EnterpriseDB @ OPEN'16
10/ EnterpriseDB @ OPEN'16
 
Building Custom Big Data Integrations
Building Custom Big Data IntegrationsBuilding Custom Big Data Integrations
Building Custom Big Data Integrations
 
Building Fast Applications for Streaming Data
Building Fast Applications for Streaming DataBuilding Fast Applications for Streaming Data
Building Fast Applications for Streaming Data
 
Enabling Next Gen Analytics with Azure Data Lake and StreamSets
Enabling Next Gen Analytics with Azure Data Lake and StreamSetsEnabling Next Gen Analytics with Azure Data Lake and StreamSets
Enabling Next Gen Analytics with Azure Data Lake and StreamSets
 
Data Mesh Part 4 Monolith to Mesh
Data Mesh Part 4 Monolith to MeshData Mesh Part 4 Monolith to Mesh
Data Mesh Part 4 Monolith to Mesh
 
A Key to Real-time Insights in a Post-COVID World (ASEAN)
A Key to Real-time Insights in a Post-COVID World (ASEAN)A Key to Real-time Insights in a Post-COVID World (ASEAN)
A Key to Real-time Insights in a Post-COVID World (ASEAN)
 
The Great Lakes: How to Approach a Big Data Implementation
The Great Lakes: How to Approach a Big Data ImplementationThe Great Lakes: How to Approach a Big Data Implementation
The Great Lakes: How to Approach a Big Data Implementation
 
Horses for Courses: Database Roundtable
Horses for Courses: Database RoundtableHorses for Courses: Database Roundtable
Horses for Courses: Database Roundtable
 
Dealing with Unstructured Data: Scaling to Infinity
Dealing with Unstructured Data: Scaling to InfinityDealing with Unstructured Data: Scaling to Infinity
Dealing with Unstructured Data: Scaling to Infinity
 
Scaling to Infinity - Open Source meets Big Data
Scaling to Infinity - Open Source meets Big DataScaling to Infinity - Open Source meets Big Data
Scaling to Infinity - Open Source meets Big Data
 
Logical Data Warehouse: How to Build a Virtualized Data Services Layer
Logical Data Warehouse: How to Build a Virtualized Data Services LayerLogical Data Warehouse: How to Build a Virtualized Data Services Layer
Logical Data Warehouse: How to Build a Virtualized Data Services Layer
 
Trivadis Azure Data Lake
Trivadis Azure Data LakeTrivadis Azure Data Lake
Trivadis Azure Data Lake
 
Data APIs as a Foundation for Systems of Engagement
Data APIs as a Foundation for Systems of EngagementData APIs as a Foundation for Systems of Engagement
Data APIs as a Foundation for Systems of Engagement
 

More from All Things Open

Building Reliability - The Realities of Observability
Building Reliability - The Realities of ObservabilityBuilding Reliability - The Realities of Observability
Building Reliability - The Realities of Observability
All Things Open
 
Modern Database Best Practices
Modern Database Best PracticesModern Database Best Practices
Modern Database Best Practices
All Things Open
 
Open Source and Public Policy
Open Source and Public PolicyOpen Source and Public Policy
Open Source and Public Policy
All Things Open
 
Weaving Microservices into a Unified GraphQL Schema with graph-quilt - Ashpak...
Weaving Microservices into a Unified GraphQL Schema with graph-quilt - Ashpak...Weaving Microservices into a Unified GraphQL Schema with graph-quilt - Ashpak...
Weaving Microservices into a Unified GraphQL Schema with graph-quilt - Ashpak...
All Things Open
 
The State of Passwordless Auth on the Web - Phil Nash
The State of Passwordless Auth on the Web - Phil NashThe State of Passwordless Auth on the Web - Phil Nash
The State of Passwordless Auth on the Web - Phil Nash
All Things Open
 
Total ReDoS: The dangers of regex in JavaScript
Total ReDoS: The dangers of regex in JavaScriptTotal ReDoS: The dangers of regex in JavaScript
Total ReDoS: The dangers of regex in JavaScript
All Things Open
 
What Does Real World Mass Adoption of Decentralized Tech Look Like?
What Does Real World Mass Adoption of Decentralized Tech Look Like?What Does Real World Mass Adoption of Decentralized Tech Look Like?
What Does Real World Mass Adoption of Decentralized Tech Look Like?
All Things Open
 
How to Write & Deploy a Smart Contract
How to Write & Deploy a Smart ContractHow to Write & Deploy a Smart Contract
How to Write & Deploy a Smart Contract
All Things Open
 
Spinning Your Drones with Cadence Workflows, Apache Kafka and TensorFlow
 Spinning Your Drones with Cadence Workflows, Apache Kafka and TensorFlow Spinning Your Drones with Cadence Workflows, Apache Kafka and TensorFlow
Spinning Your Drones with Cadence Workflows, Apache Kafka and TensorFlow
All Things Open
 
DEI Challenges and Success
DEI Challenges and SuccessDEI Challenges and Success
DEI Challenges and Success
All Things Open
 
Scaling Web Applications with Background
Scaling Web Applications with BackgroundScaling Web Applications with Background
Scaling Web Applications with Background
All Things Open
 
Supercharging tutorials with WebAssembly
Supercharging tutorials with WebAssemblySupercharging tutorials with WebAssembly
Supercharging tutorials with WebAssembly
All Things Open
 
Using SQL to Find Needles in Haystacks
Using SQL to Find Needles in HaystacksUsing SQL to Find Needles in Haystacks
Using SQL to Find Needles in Haystacks
All Things Open
 
Configuration Security as a Game of Pursuit Intercept
Configuration Security as a Game of Pursuit InterceptConfiguration Security as a Game of Pursuit Intercept
Configuration Security as a Game of Pursuit Intercept
All Things Open
 
Scaling an Open Source Sponsorship Program
Scaling an Open Source Sponsorship ProgramScaling an Open Source Sponsorship Program
Scaling an Open Source Sponsorship Program
All Things Open
 
Build Developer Experience Teams for Open Source
Build Developer Experience Teams for Open SourceBuild Developer Experience Teams for Open Source
Build Developer Experience Teams for Open Source
All Things Open
 
Deploying Models at Scale with Apache Beam
Deploying Models at Scale with Apache BeamDeploying Models at Scale with Apache Beam
Deploying Models at Scale with Apache Beam
All Things Open
 
Sudo – Giving access while staying in control
Sudo – Giving access while staying in controlSudo – Giving access while staying in control
Sudo – Giving access while staying in control
All Things Open
 
Fortifying the Future: Tackling Security Challenges in AI/ML Applications
Fortifying the Future: Tackling Security Challenges in AI/ML ApplicationsFortifying the Future: Tackling Security Challenges in AI/ML Applications
Fortifying the Future: Tackling Security Challenges in AI/ML Applications
All Things Open
 
Securing Cloud Resources Deployed with Control Planes on Kubernetes using Gov...
Securing Cloud Resources Deployed with Control Planes on Kubernetes using Gov...Securing Cloud Resources Deployed with Control Planes on Kubernetes using Gov...
Securing Cloud Resources Deployed with Control Planes on Kubernetes using Gov...
All Things Open
 

More from All Things Open (20)

Building Reliability - The Realities of Observability
Building Reliability - The Realities of ObservabilityBuilding Reliability - The Realities of Observability
Building Reliability - The Realities of Observability
 
Modern Database Best Practices
Modern Database Best PracticesModern Database Best Practices
Modern Database Best Practices
 
Open Source and Public Policy
Open Source and Public PolicyOpen Source and Public Policy
Open Source and Public Policy
 
Weaving Microservices into a Unified GraphQL Schema with graph-quilt - Ashpak...
Weaving Microservices into a Unified GraphQL Schema with graph-quilt - Ashpak...Weaving Microservices into a Unified GraphQL Schema with graph-quilt - Ashpak...
Weaving Microservices into a Unified GraphQL Schema with graph-quilt - Ashpak...
 
The State of Passwordless Auth on the Web - Phil Nash
The State of Passwordless Auth on the Web - Phil NashThe State of Passwordless Auth on the Web - Phil Nash
The State of Passwordless Auth on the Web - Phil Nash
 
Total ReDoS: The dangers of regex in JavaScript
Total ReDoS: The dangers of regex in JavaScriptTotal ReDoS: The dangers of regex in JavaScript
Total ReDoS: The dangers of regex in JavaScript
 
What Does Real World Mass Adoption of Decentralized Tech Look Like?
What Does Real World Mass Adoption of Decentralized Tech Look Like?What Does Real World Mass Adoption of Decentralized Tech Look Like?
What Does Real World Mass Adoption of Decentralized Tech Look Like?
 
How to Write & Deploy a Smart Contract
How to Write & Deploy a Smart ContractHow to Write & Deploy a Smart Contract
How to Write & Deploy a Smart Contract
 
Spinning Your Drones with Cadence Workflows, Apache Kafka and TensorFlow
 Spinning Your Drones with Cadence Workflows, Apache Kafka and TensorFlow Spinning Your Drones with Cadence Workflows, Apache Kafka and TensorFlow
Spinning Your Drones with Cadence Workflows, Apache Kafka and TensorFlow
 
DEI Challenges and Success
DEI Challenges and SuccessDEI Challenges and Success
DEI Challenges and Success
 
Scaling Web Applications with Background
Scaling Web Applications with BackgroundScaling Web Applications with Background
Scaling Web Applications with Background
 
Supercharging tutorials with WebAssembly
Supercharging tutorials with WebAssemblySupercharging tutorials with WebAssembly
Supercharging tutorials with WebAssembly
 
Using SQL to Find Needles in Haystacks
Using SQL to Find Needles in HaystacksUsing SQL to Find Needles in Haystacks
Using SQL to Find Needles in Haystacks
 
Configuration Security as a Game of Pursuit Intercept
Configuration Security as a Game of Pursuit InterceptConfiguration Security as a Game of Pursuit Intercept
Configuration Security as a Game of Pursuit Intercept
 
Scaling an Open Source Sponsorship Program
Scaling an Open Source Sponsorship ProgramScaling an Open Source Sponsorship Program
Scaling an Open Source Sponsorship Program
 
Build Developer Experience Teams for Open Source
Build Developer Experience Teams for Open SourceBuild Developer Experience Teams for Open Source
Build Developer Experience Teams for Open Source
 
Deploying Models at Scale with Apache Beam
Deploying Models at Scale with Apache BeamDeploying Models at Scale with Apache Beam
Deploying Models at Scale with Apache Beam
 
Sudo – Giving access while staying in control
Sudo – Giving access while staying in controlSudo – Giving access while staying in control
Sudo – Giving access while staying in control
 
Fortifying the Future: Tackling Security Challenges in AI/ML Applications
Fortifying the Future: Tackling Security Challenges in AI/ML ApplicationsFortifying the Future: Tackling Security Challenges in AI/ML Applications
Fortifying the Future: Tackling Security Challenges in AI/ML Applications
 
Securing Cloud Resources Deployed with Control Planes on Kubernetes using Gov...
Securing Cloud Resources Deployed with Control Planes on Kubernetes using Gov...Securing Cloud Resources Deployed with Control Planes on Kubernetes using Gov...
Securing Cloud Resources Deployed with Control Planes on Kubernetes using Gov...
 

Recently uploaded

"Impact of front-end architecture on development cost", Viktor Turskyi
"Impact of front-end architecture on development cost", Viktor Turskyi"Impact of front-end architecture on development cost", Viktor Turskyi
"Impact of front-end architecture on development cost", Viktor Turskyi
Fwdays
 
Knowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and backKnowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and back
Elena Simperl
 
Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...
Product School
 
Assuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyesAssuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyes
ThousandEyes
 
Leading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdfLeading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdf
OnBoard
 
Key Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdfKey Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdf
Cheryl Hung
 
Epistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI supportEpistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI support
Alan Dix
 
Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........
Alison B. Lowndes
 
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
BookNet Canada
 
Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...
Product School
 
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered QualitySoftware Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Inflectra
 
How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...
Product School
 
Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*
Frank van Harmelen
 
ODC, Data Fabric and Architecture User Group
ODC, Data Fabric and Architecture User GroupODC, Data Fabric and Architecture User Group
ODC, Data Fabric and Architecture User Group
CatarinaPereira64715
 
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
UiPathCommunity
 
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 previewState of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
Prayukth K V
 
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
Product School
 
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
DanBrown980551
 
Accelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish CachingAccelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish Caching
Thijs Feryn
 
Connector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a buttonConnector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a button
DianaGray10
 

Recently uploaded (20)

"Impact of front-end architecture on development cost", Viktor Turskyi
"Impact of front-end architecture on development cost", Viktor Turskyi"Impact of front-end architecture on development cost", Viktor Turskyi
"Impact of front-end architecture on development cost", Viktor Turskyi
 
Knowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and backKnowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and back
 
Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...
 
Assuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyesAssuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyes
 
Leading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdfLeading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdf
 
Key Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdfKey Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdf
 
Epistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI supportEpistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI support
 
Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........
 
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
 
Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...
 
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered QualitySoftware Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
 
How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...
 
Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*
 
ODC, Data Fabric and Architecture User Group
ODC, Data Fabric and Architecture User GroupODC, Data Fabric and Architecture User Group
ODC, Data Fabric and Architecture User Group
 
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
 
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 previewState of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
 
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
 
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
 
Accelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish CachingAccelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish Caching
 
Connector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a buttonConnector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a button
 

Data Vault 2.0: Big Data Meets Data Warehousing

  • 1. DATA VAULT 2.0: Big Data Meets Data Warehousing DEAN HALLMAN WIRESOFT, LLC
  • 2. DATA WAREHOUSING VS BIG DATA • Does Big Data replace Data Warehousing? Or do I need both? • What’s the difference: • Between the data flowing into a data warehouse vs big data tools? • Between the ingestion processes and infrastructure? • Data Lakes arrived with Big Data, so are they useful in Data Warehousing? • How should I model my data in EDW? • 3NF, Star Schema, same as my operational data stores? • Data Vault 2.0 • Graph Databases • What is an architecture that allows both to co-exists effectively?
  • 3. Impressions (Big Data) Core Business Services Core Business Services Core Business Services Operational Data Stores D A T A L A K E Enterprise Data Warehouse CDC, snapshot Internet External Data Sources Big Data Toolchain Batch (SerDe) StagingVault RawVault BusinessVault InformationMart Streaming (Kafka) Streaming Analytics Batch Analytics (Hadoop) Schema-on-Read Schema-on-Write Data Source Landing Clients ETL ELT 𝛾 𝛾 𝛾 𝛾 BI Tools Monitoring Discovery Audit clickstream (SerDe) ETL ETL
  • 4. Impressions (Big Data) Core Business Services Core Business Services Core Business Services Operational Data Stores D A T A L A K E Enterprise Data Warehouse CDC, snapshot Internet External Data Sources Big Data Toolchain Batch (SerDe) StagingVault RawVault BusinessVault InformationMart Streaming (Kafka) Streaming Analytics Batch Analytics (Hadoop) Schema-on-Read Schema-on-Write Data Source Landing Clients ETL ELT 𝛾 𝛾 𝛾 𝛾 BI Tools Monitoring Discovery Audit clickstream (SerDe) ETL ETL THE DATA MODEL
  • 5. DATA VAULT 2.0 COMMON FOUNDATIONAL WAREHOUSE ARCHITECTURE • “The Data Vault Model is a detail oriented, historical tracking and uniquely linked set of normalized tables that support one or more functional areas of business. It is a hybrid approach encompassing the best of breed between 3rd normal form (3NF) and star schema. The design is flexible, scalable, consistent and adaptable to the needs of the enterprise” -- Dan Linstedt, Creator of Data Vault • Data loaded as-is from sources, no edits or cleanup • Append-only to afford highest performance • Agile & agnostic to changes in the operational store’s data model • Essentially, a prescription for Layered Graph to Relational Mapping
  • 6. DATA WAREHOUSING & DATA VAULT 2.0 • 60’s, 70’s, 80’s • E.F. Codd => 3NF • Bill Inmon invents Data Warehousing concept • Dr. Ralph Kimball popularizes Star Schema design • 90’s, 00’s: • Dan Linstedt creates Data Vault Model @ DOD • 2014: • Dan Introduces Data Vault 2.0
  • 7.
  • 8. Source: “What are Graph Databases and Why should I care?“, by Dave Bechberger of Expero
  • 9. SOLVE BY STAR SCHEMA ?
  • 10. RELATIONAL VS GRAPH DATABASES • Enterprise Grade • Well-worn path • SQL has been relatively stagnant vs programming languages
  • 11. GRAPH DATA MODEL Source: https://neo4j.com/developer/graph-database/
  • 12. GRAPH DATABASE VS DATA VAULT
  • 13. GRAPH DATABASE VS DATA VAULT
  • 14. Flight Base Dest Forecast Record Source LoadDate Depart Gate LGA 2018-10-11 1:25P M B27 CAE 2018-10-24 3:30P M A14 SFO 2018-09-06 8:55P M G19 RDU 2018-08-12 4:45P M C22 SERVICED_BY Record Source Airport CAE Load Date 2018-11-17 Source Id 20181117-32-983 Aircraft Base Service FAA NTSB Record Source LoadDate Model Tailno United 2017-02-11 767 1477 Delta 2015-11-04 A6 2381 Alaska 2013-08-28 747 8312 Frontie r 2016-07-19 182 1438 Record Source United Airlines Load Date 2018-01-17 Source Id 2412c SERVICED_BY Base Dest Manifest Record Source LoadDate Begin End United 2017-02-11 2017-04-23 2017-09-23 Delta 2015-11-04 2015-12-01 2017-04-22 Alaska 2013-08-28 2013-09-14 2016-05-04 Frontie r 2016-07-19 2016-08-02 2018-04-11 Record Source United Airlines Load Date 2018-09-17 Hubs Links SatellitesTab
  • 15. • Organizations which design systems ... are constrained to produce designs which are copies of the communication structures of these organizations - Mel Conway
  • 16. FLIGHT Base Dest Forecast Record Source LoadDate Depart Gate LGA 2018-10- 11 1:25P M B27 CAE 2018-10- 24 3:30P M A14 FLIGHT Record Source Airport CAE Load Date 2018-11-17 Source Id 20181117-32-983 Aircraft Bas e Service FAA NTSB Record Source LoadDate Model Tailno United 2017-02- 11 767 1477 Delta 2015-11- 04 A6 2381 Alaska 2013-08- 28 747 8312 Frontie r 2016-07- 19 182 1438 Record Source United Airlines Load Date 2018-01-17 Source Id 2412c Airport Base Dest Manifest Record Source LoadDate Begin End United 2017-02-11 2017-04-23 2017-09- 23 Delta 2015-11-04 2015-12-01 2017-04- 22 Alaska 2013-08-28 2013-09-14 2016-05- 04 Frontie r 2016-07-19 2016-08-02 2018-04- 11 Record Source United Airlines Load Date 2018-09-17 Airline Base Service FAA NTS B Record Source LoadDate Model Tailno United 2017-02-11 767 1477 Delta 2015-11-04 A6 2381 Record Source United Airlines Load Date 2018-01-17 Source Id 2412c Hubs Links SatellitesTab
  • 18. • Modeled after self- organizing networks • A Business Key identifies a key concept in business. • They have a business meaning • They are unique and have very low propensity to change • Business keys change only when the business change • Enables (forces) cross- source modeling Source: http://www.di.univr.it/documenti/OccorrenzaIns/matdid/matdid232240.pdf
  • 19.
  • 22. DATA VAULT 2.0 MODELING: HUBS, LINKS & SATELLITES
  • 24. Impressions (Big Data) Core Business Services Core Business Services Core Business Services Operational Data Stores D A T A L A K E Enterprise Data Warehouse CDC, snapshot Internet External Data Sources Big Data Toolchain Batch (SerDe) StagingVault RawVault BusinessVault InformationMart Streaming (Kafka) Streaming Analytics Batch Analytics (Hadoop) Schema-on-Read Schema-on-Write Data Source Landing Clients ETL ELT 𝛾 𝛾 𝛾 𝛾 BI Tools Monitoring Discovery Audit clickstream (SerDe) ETL ETL THE DATA Impressions vs Business Data
  • 25. ENTERPRISE DATA SILOS Small DataLarge DataBig Data Describes the user base Describes the Enterprise Describes the Product
  • 26. Instance Grain Transaction Grain Audit Grain Impression Grain Big Data Enterprise Data Warehouse Operational Data Stores Impression Analytics Business Analytics External Data Sources DATA GRANULARITY FUNNEL
  • 27. Impressions (Big Data) Core Business Services Core Business Services Core Business Services Operational Data Stores D A T A L A K E Enterprise Data Warehouse CDC, snapshot Internet External Data Sources Big Data Toolchain Batch (SerDe) StagingVault RawVault BusinessVault InformationMart Streaming (Kafka) Streaming Analytics Batch Analytics (Hadoop) Schema-on-Read Schema-on-Write Data Source Landing Clients ETL ELT 𝛾 𝛾 𝛾 𝛾 BI Tools Monitoring Discovery Audit clickstream (SerDe) ETL ETL DATA INGESTION ETL vs ELT vs SerDe
  • 28. ETL VS ELT VS SerDe • Beware the Turing tar-pit, in which everything is possible, but nothing of interest is easy - Alan Perlis
  • 29. DATA CLASSIFICATION MATRIX: DECLARATIVE VS INTERPRETIVE Declarative Interpretive HadoopRDBMS Web Events Media Player
  • 30. DATA WAREHOUSING • Deep Topic • 60’s, 70’s, 80’s • E.F. Codd => 3NF • Bill Inmon invents Data Warehousing concept • Dr. Ralph Kimball popularizes Star Schema design • 90’s, 00’s: • Dan Linstedt creates Data Vault Model @ DOD • 2014: • Dan Introduces Data Vault 2.0 • Data Warehouse vs Operational Data Stores • Data Warehouse as Version Control System • MapReduce, 2004, Google by Jeffery Dean and Sanjay, “MAPREDUCE: SIMPLIFIED DATA PROCESSING ON LARGE CLUSTERS” , GFS • Nutch 2005, Hadoop 2006, 2007 - Doug Cutting • What exactly is “Big Data”? BIG DATA
  • 33. ETL OR SERDE ? S3 Hadoop Time Series Event Record Analysis Deserializer L e L d L m Client User Serializer L p L p Eventlog.e Eventlog.d L e Single Source (Version Locked) Kafka/Kinesis LeInternet
  • 35. OTHER CHALLENGES • Satellites must be loaded chronologically • Time-based scheduling vs data-availability scheduling
  • 36. QUESTIONS? • Contact:  Dean Hallman  rdhallman@gmail.com  Linkedin: https://www.linkedin.com/in/dean-hallman/

Editor's Notes

  1. No single answer, but convention over configuration has one the day Data Warehousing --- 60’s, 70’s, 80’s E.F. Codd => 3NF Bill Inmon invents Data Warehousing concept Dr. Ralph Kimball popularizes Star Schema design 90’s, 00’s: Dan Linstedt creates Data Vault Model @ DOD 2014: Dan Introduces Data Vault 2.0 Data Warehouse vs Operational Data Stores Data Warehouse as Version Control System Big Data ----- MapReduce, 2004, Google by Jeffery Dean and Sanjay, “MAPREDUCE: SIMPLIFIED DATA PROCESSING ON LARGE CLUSTERS” , GFS Nutch 2005, Hadoop 2006, 2007 - Doug Cutting What exactly is “Big Data”?
  2. * 30 seconds * Slow down, cover this slide thoroughly * Lambda architecture
  3. * 30 seconds * Slow down, cover this slide thoroughly * Lambda architecture
  4. Data Warehouse vs Operational Data Stores Data Warehouse as Version Control System
  5. Same SQL abstraction level
  6. Graph databases - single property namespace - could impose naming convention as a namespace
  7. Hubs work to counter Conway’s law
  8. 3-way relationship - hypergraph / hyperedge
  9. Provide context and instance data for the hub-link relationship
  10. Encode the relationship between the nouns of your system
  11. * 30 seconds * Slow down, cover this slide thoroughly * Lambda architecture
  12. * 30 seconds * Slow down, cover this slide thoroughly * Lambda architecture
  13. Too close to the forest, forget to see the trees Is the business intelligence scattered out in the field Or centralized in the back office? Actors in the system are intelligent? Learn lanuage, conjugate verbs, form new sentences
  14. Serializer/Deserialize: Reusable package to be imported into a Lambda Test suite that ensures Serializer / Deserializer agree on before/after result