Data Vault vs Data Lake
Coach Fru Louis | www.frulouis.com
Tech|Career|Inspiration
Data Basics
Agenda
@Coachfrulouis
01. What are Data Lakes?
02. What are Data Vaults?
03. Friends or Foes?
04. Possibilities…
Data Lakes: Definition & Characteristics
• Semi-structured, unstructured, and raw data
• Schema on read
• Low-cost storage
• Agile and easy to reconfigure
• Suited to data scientists and experimentation
• Democratizes data
• Supports all data formats
• Schema flexibility
• Advanced analytics
• Scalability
[Diagram: Sources, E(xtract) T(ransform) L(oad), Data Lake, and Consumers/Analysts/Reports/Data Scientists, with data visualization, data filtering, machine learning, dashboards, batch processing, and interactive processing. Tools shown: Hadoop, HDFS, S3, Spark, Databricks, etc.; R, Pig, Solr, Hive, Presto, Tableau.]
Definition: A data lake is a system or repository of data stored in its natural/raw format,
usually object blobs or files. A data lake is usually a single store of all enterprise data
including raw copies of source system data and transformed data used for tasks such as
reporting, visualization, advanced analytics and machine learning.
https://en.wikipedia.org/wiki/Data_lake
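To make "schema on read" concrete, here is a minimal sketch in Python. It assumes raw events are stored as one JSON object per line; the field names, the two schemas, and the `read_with_schema` helper are hypothetical and only serve the illustration. The raw data is stored as-is, and each consumer applies its own schema when reading.

```python
import io
import json

# Hypothetical raw events as they might land in a lake: one JSON object per line,
# with no schema enforced at write time (fields vary between records).
RAW_EVENTS = io.StringIO(
    '{"user": "ada", "amount": "12.50", "channel": "web"}\n'
    '{"user": "bob", "amount": 7}\n'
    '{"user": "cy", "note": "no amount recorded"}\n'
)

def read_with_schema(lines, schema):
    """Apply a schema while reading: pick the wanted fields and cast them.

    `schema` maps a field name to a (cast function, default) pair. Fields
    missing from a raw record fall back to the default.
    """
    for line in lines:
        record = json.loads(line)
        yield {
            field: cast(record[field]) if field in record else default
            for field, (cast, default) in schema.items()
        }

# Two consumers read the same raw data through different schemas.
billing_schema = {"user": (str, None), "amount": (float, 0.0)}
marketing_schema = {"user": (str, None), "channel": (str, "unknown")}

billing_rows = list(read_with_schema(RAW_EVENTS, billing_schema))
RAW_EVENTS.seek(0)  # rewind the in-memory "file" for the second reader
marketing_rows = list(read_with_schema(RAW_EVENTS, marketing_schema))
```

Neither consumer forces the other to change how the data is stored, which is the flexibility the characteristics list refers to. The trade-off is that every reader has to handle missing or inconsistent fields itself.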
Data Lakes: History & Evolution
2006: Amazon AWS Launches
2008: Yahoo Open Sources Hadoop **
2009: Cloudera Forms
2009: AWS Elastic MapReduce
2010: Apache Hive release
2010: James Dixon coins the term Data Lake
2011: Hortonworks forms
2015: Snowflake released on AWS
2015: Hive and Presto released on AWS
2017: AWS Athena released
Data Vault (Modeling): Definition & Characteristics
Data Vault modeling is a database modeling method that is designed to provide long-term
historical storage of data coming in from multiple operational systems. It is also a method
of looking at historical data that deals with issues such as auditing, tracing of data, loading
speed and resilience to change as well as emphasizing the need to trace where all the
data in the database came from.
[Diagram: Sources, E(xtract) T(ransform) L(oad), Data Lake (semi-structured, unstructured, raw data; schema on read; low-cost storage; agile and easy to reconfigure; data science/exploration), Data Vault Modelling/Harmonize, and Consumers/Analysts/Reports/Data Scientists, with data visualization, data filtering, machine learning, dashboards, batch processing, and interactive processing. Tools shown: Hadoop, HDFS, S3, Spark, Databricks, etc.; R, Pig, Solr, Hive, Presto, Tableau; Hive, Snowflake, BigQuery, Redshift, Oracle, Synapse, etc.]
https://en.wikipedia.org/wiki/Data_vault_modeling
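The auditing and tracing goals in the definition above can be illustrated with a small Python sketch. It stamps every incoming row with a load date and a record source, and leaves the source values untouched. The hashed business key is an assumption borrowed from common Data Vault 2.0 practice and is not described in these slides; the function names, column names, and sample rows are hypothetical.

```python
import hashlib
from datetime import datetime, timezone

def hash_key(*business_key_parts):
    """Derive a deterministic key from the business key parts (assumed convention)."""
    normalized = "|".join(str(part).strip().upper() for part in business_key_parts)
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

def stamp(row, business_key_fields, record_source, load_date=None):
    """Return a copy of `row` with audit columns added; source values stay untouched."""
    load_date = load_date or datetime.now(timezone.utc)
    stamped = dict(row)
    stamped["hash_key"] = hash_key(*(row[f] for f in business_key_fields))
    stamped["load_date"] = load_date.isoformat()
    stamped["record_source"] = record_source
    return stamped

# The same customer arrives from two hypothetical operational systems.
crm_row = {"customer_no": "c-1001", "name": "Ada Lovelace"}
billing_row = {"customer_no": " C-1001 ", "balance": 42}
fixed_time = datetime(2024, 1, 1, tzinfo=timezone.utc)

from_crm = stamp(crm_row, ["customer_no"], "crm", fixed_time)
from_billing = stamp(billing_row, ["customer_no"], "billing", fixed_time)
```

Both rows resolve to the same key even though the source systems format the customer number differently, and each row still records which system it came from and when it was loaded.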
Data Vault (Modeling): History & Evolution
1970s: E.F. Codd => 3NF
Bill Inmon Invents Data Warehouse
Dr. R. Kimball champions star schema
1990s: Conceived by Dan Linstedt
2000: DV 1.0 released into the public domain
2014: DV 2.0 Announced
Data Vault (Modelling)
Data Vault modelling is a technique used to store source data at a more granular level. Generally, the data is not changed in any way, other than to add load date keys to track changes.
Hubs: These contain the business keys and any metadata. Nothing descriptive is written to a Hub.
Links: Links connect two or more Hubs together.
Sats (Satellites): These are the complete source tables that contain descriptive information and time attributes, so we can track changes and do point-in-time analysis.
Converting from a 3NF model:
1) Instead of each master table in 3NF, we add a Hub and a Satellite.
2) Instead of the transactional table, we add a Link table and a Satellite.
3) Instead of the joins between master tables, we add Link tables.
http://bukhantsov.org/2012/04/what-is-data-vault/
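The Hub, Link, and Satellite roles can be sketched with Python's built-in `sqlite3` module. This is only an illustration under assumed names: the customer/product/order tables, their columns, and the `customer_as_of` helper are hypothetical and do not come from the slides or the linked article.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hubs: business keys plus load metadata, nothing descriptive.
CREATE TABLE hub_customer (customer_key TEXT PRIMARY KEY, load_date TEXT, record_source TEXT);
CREATE TABLE hub_product  (product_key  TEXT PRIMARY KEY, load_date TEXT, record_source TEXT);
-- Link: connects hubs (here, an order ties a customer to a product).
CREATE TABLE link_order (
    order_key TEXT PRIMARY KEY,
    customer_key TEXT REFERENCES hub_customer(customer_key),
    product_key TEXT REFERENCES hub_product(product_key),
    load_date TEXT, record_source TEXT
);
-- Satellite: descriptive attributes, one row per change, keyed by load date.
CREATE TABLE sat_customer (
    customer_key TEXT REFERENCES hub_customer(customer_key),
    load_date TEXT,
    name TEXT, city TEXT,
    PRIMARY KEY (customer_key, load_date)
);
""")
conn.execute("INSERT INTO hub_customer VALUES ('C1', '2024-01-01', 'crm')")
conn.execute("INSERT INTO hub_product VALUES ('P1', '2024-01-01', 'erp')")
conn.execute("INSERT INTO link_order VALUES ('O1', 'C1', 'P1', '2024-01-05', 'shop')")
# Changes are appended as new satellite rows; earlier rows are never updated.
conn.executemany(
    "INSERT INTO sat_customer VALUES (?, ?, ?, ?)",
    [("C1", "2024-01-01", "Ada", "London"), ("C1", "2024-03-01", "Ada", "Paris")],
)

def customer_as_of(conn, customer_key, as_of):
    """Point-in-time lookup: latest satellite row loaded on or before `as_of`."""
    return conn.execute(
        "SELECT name, city FROM sat_customer "
        "WHERE customer_key = ? AND load_date <= ? "
        "ORDER BY load_date DESC LIMIT 1",
        (customer_key, as_of),
    ).fetchone()

february_view = customer_as_of(conn, "C1", "2024-02-15")
april_view = customer_as_of(conn, "C1", "2024-04-01")
```

Because the satellite only ever gains rows, the February and April lookups return different cities for the same customer without any history being overwritten. Dates are stored as ISO-8601 text here so that plain string comparison orders them correctly.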
[Figure: Dimensional Model vs. Data Vault Model]
Verdict: Data Vault vs Data Lakes?
[Diagram: Sources, E(xtract) T(ransform) L(oad), Data Lake (Data Science / Exploration), Modelling/Harmonize, and Consumption by Consumers/Analysts/Reports/Data Scientists, with data visualization, data filtering, machine learning, dashboards, batch processing, and interactive processing. Tools shown: Hadoop, HDFS, S3, Spark, Databricks, etc.; R, Pig, Solr, Hive, Presto, Tableau; RDBMS: Hive, Snowflake, BigQuery, Redshift, Oracle, Synapse, etc.]
[Diagram: Data Warehousing Modeling Techniques: Data Vault Modelling, Dimensional Modelling, (3NF), Others]
Verdict: This comparison is a misnomer. Data Vaults don't compete with Data Lakes. Data Vault complements Data Lakes for better analytics, i.e. Data Lakes + Data Vault (Modelling).
Thanks
FINISH www.frulouis.com

Mission to Decommission: Importance of Decommissioning Products to Increase E...Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...
Product School
 
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
Sri Ambati
 
Key Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdfKey Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdf
Cheryl Hung
 
Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........
Alison B. Lowndes
 

Recently uploaded (20)

Monitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR EventsMonitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR Events
 
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
 
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
 
Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...
 
Knowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and backKnowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and back
 
Generating a custom Ruby SDK for your web service or Rails API using Smithy
Generating a custom Ruby SDK for your web service or Rails API using SmithyGenerating a custom Ruby SDK for your web service or Rails API using Smithy
Generating a custom Ruby SDK for your web service or Rails API using Smithy
 
Connector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a buttonConnector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a button
 
GraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge GraphGraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge Graph
 
FIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdfFIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdf
 
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
 
When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...
 
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdfSmart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
 
DevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA ConnectDevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA Connect
 
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
 
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
 
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdfFIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
 
Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...
 
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
 
Key Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdfKey Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdf
 
Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........
 

Data Vault vs Data Lake: What's the difference?

  • 1. Data Vault vs Data Lake. Coach Fru Louis | www.frulouis.com. Tech | Career | Inspiration. Data Basics.
  • 2. Agenda (@Coachfrulouis): 01. What are Data Lakes? 02. What are Data Vaults? 03. Friends or Foes? 04. Possibilities…
  • 3. Data Lakes: Definition & Characteristics.
      Definition: A data lake is a system or repository of data stored in its natural/raw format, usually object blobs or files. A data lake is usually a single store of all enterprise data including raw copies of source system data and transformed data used for tasks such as reporting, visualization, advanced analytics and machine learning. (https://en.wikipedia.org/wiki/Data_lake)
      Characteristics: semi-structured, unstructured, and raw data; schema on read; low-cost storage; agile and easy to reconfigure; suited to data scientists and experimentation.
      Benefits: democratize data; supports all data formats; schema flexibility; advanced analytics; scalability.
      Diagram: sources are loaded into the data lake through E(xtract) T(ransform) L(oad); consumers, analysts, reports, and data scientists use it for data visualization, data filtering, machine learning, dashboards, batch processing, and interactive processing.
      Tools named: Hadoop, HDFS, S3, Spark, Databricks, etc.; R, Pig, Solr, Hive, Presto, Tableau.
  • 4. Data Lakes: History & Evolution.
      2006: Amazon AWS launches.
      2008: Yahoo open-sources Hadoop.
      2009: Cloudera forms.
      2009: AWS Elastic MapReduce.
      2010: Apache Hive release.
      2010: James Dixon coins the term "data lake".
      2011: Hortonworks forms.
      2015: Snowflake released on AWS.
      2015: Hive and Presto released on AWS.
      2017: AWS Athena released.
  • 5. Data Vault (Modeling): Definition & Characteristics.
      Definition: Data Vault modeling is a database modeling method that is designed to provide long-term historical storage of data coming in from multiple operational systems. It is also a method of looking at historical data that deals with issues such as auditing, tracing of data, loading speed and resilience to change, as well as emphasizing the need to trace where all the data in the database came from. (https://en.wikipedia.org/wiki/Data_vault_modeling)
      Diagram: sources are loaded through E(xtract) T(ransform) L(oad) into the data lake (semi-structured, unstructured, and raw data; schema on read; low-cost storage; agile and easy to reconfigure; data scientists and experimentation), which serves data science and exploration. A Data Vault modelling/harmonize layer sits on Hive, Snowflake, BigQuery, Redshift, Oracle, Synapse, etc. Consumers, analysts, reports, and data scientists use the result for data visualization, data filtering, machine learning, dashboards, batch processing, and interactive processing.
      Tools named: Hadoop, HDFS, S3, Spark, Databricks, etc.; R, Pig, Solr, Hive, Presto, Tableau.
  • 6. Data Vault (Modeling): History & Evolution.
      1960s: E.F. Codd => 3NF.
      Bill Inmon invents the data warehouse.
      Dr. R. Kimball champions the star schema.
      1990s: Data Vault conceived by Dan Linstedt.
      2000: DV 1.0 released into the public domain.
      2014: DV 2.0 announced.
  • 7. Data Vault (Modelling).
      Data Vault modelling is a technique used to store source data at a more granular level. Generally, the data is not changed in any way, other than to add load date keys to track changes.
      Hubs: These contain the business keys and any metadata. Nothing descriptive is written to a Hub.
      Links: Links connect one or more Hubs together.
      Sats (Satellites): These are the complete source tables that contain descriptive information and time attributes, so we can track changes and do point-in-time analysis.
      Moving from a dimensional model to a Data Vault model: 1) Instead of each master table in 3NF, we add a Hub and a Satellite. 2) Instead of the transactional table, we add a Link table and a Satellite. 3) Instead of the joins between master tables, we add Link tables.
      Source: http://bukhantsov.org/2012/04/what-is-data-vault/
  • 8. Verdict: Data Vault vs Data Lakes?
      Verdict: This comparison is a misnomer. Data Vaults don't compete with Data Lakes. DV complements Data Lakes for better analytics, i.e. Data Lakes + Data Vault (Modelling).
      Data warehousing modeling techniques: Data Vault Modelling; Dimensional Modelling (3NF); others.
      Diagram: sources are loaded through E(xtract) T(ransform) L(oad) into the data lake, which serves data science and exploration; a modelling/harmonize layer runs on an RDBMS (Hive, Snowflake, BigQuery, Redshift, Oracle, Synapse, etc.); a consumption layer serves consumers, analysts, reports, and data scientists with data visualization, data filtering, machine learning, dashboards, batch processing, and interactive processing.
      Tools named: Hadoop, HDFS, S3, Spark, Databricks, etc.; R, Pig, Solr, Hive, Presto, Tableau.
  • 9. Thanks. Finish. www.frulouis.com. Tech | Career | Inspiration.
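
The Hub, Link, and Satellite structure described on slide 7 can be made concrete with a small sketch. The example below is not from the deck: the table and column names (`hub_customer`, `link_order`, `sat_customer`, `record_source`), the sample rows, and the choice of a SHA-256 hash of the business key as the Hub key are all assumptions made for illustration, and SQLite stands in for whichever warehouse engine is actually used. It shows Hubs holding only business keys and load metadata, a Link joining two Hubs, and a Satellite keeping one descriptive row per load date so a point-in-time lookup is possible.

```python
import hashlib
import sqlite3


def hash_key(*business_keys: str) -> str:
    """Derive a deterministic key from business keys (an illustrative choice)."""
    return hashlib.sha256("|".join(business_keys).encode("utf-8")).hexdigest()


def build_vault(conn: sqlite3.Connection) -> None:
    """Create two Hubs, one Link, and one Satellite (all names hypothetical)."""
    conn.executescript(
        """
        -- Hubs: business key plus load metadata, nothing descriptive.
        CREATE TABLE hub_customer (
            customer_hk   TEXT PRIMARY KEY,
            customer_bk   TEXT NOT NULL UNIQUE,
            load_date     TEXT NOT NULL,
            record_source TEXT NOT NULL
        );
        CREATE TABLE hub_product (
            product_hk    TEXT PRIMARY KEY,
            product_bk    TEXT NOT NULL UNIQUE,
            load_date     TEXT NOT NULL,
            record_source TEXT NOT NULL
        );
        -- Link: replaces the transactional table's joins between Hubs.
        CREATE TABLE link_order (
            order_hk      TEXT PRIMARY KEY,
            customer_hk   TEXT NOT NULL REFERENCES hub_customer(customer_hk),
            product_hk    TEXT NOT NULL REFERENCES hub_product(product_hk),
            load_date     TEXT NOT NULL,
            record_source TEXT NOT NULL
        );
        -- Satellite: descriptive attributes, one row per change (load_date).
        CREATE TABLE sat_customer (
            customer_hk   TEXT NOT NULL REFERENCES hub_customer(customer_hk),
            load_date     TEXT NOT NULL,
            name          TEXT,
            city          TEXT,
            record_source TEXT NOT NULL,
            PRIMARY KEY (customer_hk, load_date)
        );
        """
    )


def load_sample(conn: sqlite3.Connection) -> None:
    """Insert made-up rows: one customer whose city changes, one product, one order."""
    cust, prod = hash_key("C-1"), hash_key("P-9")
    conn.execute("INSERT INTO hub_customer VALUES (?, ?, ?, ?)",
                 (cust, "C-1", "2024-01-01", "crm"))
    conn.execute("INSERT INTO hub_product VALUES (?, ?, ?, ?)",
                 (prod, "P-9", "2024-01-01", "shop"))
    conn.execute("INSERT INTO link_order VALUES (?, ?, ?, ?, ?)",
                 (hash_key("C-1", "P-9"), cust, prod, "2024-02-15", "shop"))
    conn.executemany(
        "INSERT INTO sat_customer VALUES (?, ?, ?, ?, ?)",
        [
            (cust, "2024-01-01", "Ada", "Berlin", "crm"),
            (cust, "2024-06-01", "Ada", "Munich", "crm"),  # change is appended, not updated
        ],
    )


def customer_as_of(conn: sqlite3.Connection, customer_bk: str, as_of: str):
    """Point-in-time lookup: latest Satellite row loaded on or before as_of."""
    return conn.execute(
        """
        SELECT s.name, s.city
        FROM hub_customer AS h
        JOIN sat_customer AS s ON s.customer_hk = h.customer_hk
        WHERE h.customer_bk = ? AND s.load_date <= ?
        ORDER BY s.load_date DESC
        LIMIT 1
        """,
        (customer_bk, as_of),
    ).fetchone()


if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")
    build_vault(connection)
    load_sample(connection)
    print(customer_as_of(connection, "C-1", "2024-03-01"))
    print(customer_as_of(connection, "C-1", "2024-12-31"))
```

Because the Satellite only ever receives new rows keyed by load date, the earlier city remains queryable after the change, which is the change-tracking behaviour the slide attributes to Satellites. ISO-formatted date strings are used so that plain string comparison orders them correctly.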