Data Lake vs Vault Summary
The difference between a Data Lake and a Data Vault is the difference between a stethoscope and a radar.
• A Data Lake reinforces what you already know
• A Data Lake provides weak support for strategic decisions
• Data Lakes encourage a silo mentality
• Data Lakes can show the ‘what’
• Data Vaults help with the ‘why’
• Data Lakes enable drill down
• Data Vaults encourage drill across

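The contrast between drill down and drill across can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the `sales` and `marketing` rows, their column names and the helper functions are hypothetical, and plain lists of dicts stand in for real stores. Drill down re-aggregates one fact set at a finer grain; drill across aggregates two separate fact sets to a shared key (here `month`) and lines them up side by side.

```python
from collections import defaultdict

# Hypothetical facts from two separate sources (illustrative values only).
sales = [
    {"month": "2018-01", "region": "North", "store": "N1", "amount": 120.0},
    {"month": "2018-01", "region": "North", "store": "N2", "amount": 80.0},
    {"month": "2018-02", "region": "North", "store": "N1", "amount": 90.0},
    {"month": "2018-02", "region": "South", "store": "S1", "amount": 60.0},
]
marketing = [
    {"month": "2018-01", "campaign": "Spring", "spend": 50.0},
    {"month": "2018-02", "campaign": "Spring", "spend": 10.0},
]


def total_by(rows, keys, measure):
    """Sum `measure`, grouped by the tuple of values in `keys`."""
    totals = defaultdict(float)
    for row in rows:
        totals[tuple(row[k] for k in keys)] += row[measure]
    return dict(totals)


def drill_down(rows, measure, path):
    """One fact set, re-aggregated at each successively finer level of `path`."""
    return [total_by(rows, path[: i + 1], measure) for i in range(len(path))]


def drill_across(left, right, key, left_measure, right_measure):
    """Aggregate two fact sets separately to a shared key, then align them.

    A key present in only one source gets None for the other measure,
    so gaps stay visible instead of silently disappearing.
    """
    lhs = total_by(left, [key], left_measure)
    rhs = total_by(right, [key], right_measure)
    return {
        k[0]: {left_measure: lhs.get(k), right_measure: rhs.get(k)}
        for k in sorted(set(lhs) | set(rhs))
    }


levels = drill_down(sales, "amount", ["region", "store"])
across = drill_across(sales, marketing, "month", "amount", "spend")
```

`levels[0]` holds region totals and `levels[1]` store totals within each region, so drill down only ever looks deeper into sales. `across["2018-02"]` pairs February sales (150.0) with February marketing spend (10.0): a second signal placed next to the first.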
The hunt for business signals

What do we do? Signal Processing or Data Processing?
• Signals start conversations
• Signals move boardrooms
• Signals release IT expenditure
• Signal variety, reliability and context are key business drivers
• Data Processing ends conversations!

Signal Processing is the customer of Data Integration & Warehousing
[Diagram labels: Signal Processing, Business Intelligence, Artificial Intelligence, Reporting, Analytics, Spreadsheets, Dashboards]

Sales are down, but why?
There are many interpretations of reality:
• Website broken
• Marketing budget cut
• Campaign poor
• Product price uncompetitive
• New product release
• Company trashed by Trump
• Fashion victim
• Delivery delays and/or cost
• Recession

Signal Processing at Scale
• The Cloud is one massive signal processor, with limitless compute power and storage
• The role of Data Integration in the cloud is to organise data sets for both efficient and effective signal processing
• Data Lakes & Vaults have emerged as key cloud integration patterns

Data Lakes vs Data Vaults

Data Lake Evolution
• 2006: Amazon Web Services (AWS) launches
• 2008: Yahoo open sources Hadoop
• 2009: Cloudera forms
• 2009: AWS launches Elastic MapReduce
• 2010 (October): Apache Hive released
• 2010 (October): James Dixon, CTO of Pentaho, coins the term Data Lake
• 2011: Hortonworks forms
• 2012: AWS announces Amazon Redshift
• 2014: European on-premise Data Lake projects take off
• 2015: Snowflake released on AWS
• 2015: Hive and Presto released on AWS
• 2016: AWS Athena released

Data Lake Signals are Isolated
• Data Lakes encourage detailed analysis of a very narrow field
• Thinking across separate data sources is difficult and inconsistent
• A silo mentality can emerge
• Data Scientists spend their time hunting for the data lake ontology
• Weak support for strategic decisions
• Too easy to make bad decisions on limited data

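Why thinking across separate sources is hard can be shown with two raw extracts that identify the same customers differently. In this minimal sketch the extracts, their field names and the `business_key` rule (keep the digits, drop leading zeros) are all hypothetical; the point is only that a join on raw source keys finds nothing, while a join on an agreed business key, the kind of key a Data Vault hub is built around, lines the two sources up.

```python
# Hypothetical raw extracts, landed in the lake exactly as each system emits them.
crm_customers = [
    {"customer_id": "C-0042", "segment": "Retail"},
    {"customer_id": "C-0043", "segment": "Trade"},
]
web_orders = [
    {"cust": "42", "total": 19.99},
    {"cust": "0043", "total": 5.00},
]


def business_key(raw: str) -> str:
    """Hypothetical agreed rule: keep only the digits and drop leading zeros."""
    digits = "".join(ch for ch in raw if ch.isdigit())
    return digits.lstrip("0") or "0"


def join_on(left, left_field, right, right_field, key=lambda value: value):
    """Pair rows whose key(...) values match; `key` defaults to the raw value."""
    index = {key(row[left_field]): row for row in left}
    return [
        (index[key(row[right_field])], row)
        for row in right
        if key(row[right_field]) in index
    ]


raw_matches = join_on(crm_customers, "customer_id", web_orders, "cust")
keyed_matches = join_on(
    crm_customers, "customer_id", web_orders, "cust", key=business_key
)
```

The raw join returns no matches, while the keyed join pairs both orders with their CRM segment. If each analysis invents its own matching rule, results can diverge from one analysis to the next, which is one reading of the inconsistency the slide describes.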
Data Lake Warning
The danger with Data Lakes is that they encourage decisions based upon what can be easily measured.

Data Lakes are Good for
• Starting EDW projects
• Persistent staging areas
• Feedstock for Data Vaults
• Tactical Analysis
• DWH flexibility
• API Calls/Gateway
• Unstructured log analysis
• Operational Monitoring
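A persistent staging area keeps every extract as received, so downstream structures can be rebuilt or replayed as at any past load. The sketch below is a minimal, in-memory illustration; the class and method names (`PersistentStagingArea`, `load`, `as_at`) are hypothetical, and a real implementation would sit on files or tables in the lake rather than a Python list.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class StagedRow:
    load_ts: datetime   # when the batch landed
    record_source: str  # which system supplied it
    key: str            # the source system's key
    payload: dict       # the row exactly as received


class PersistentStagingArea:
    """Append-only history of every load; rows are never updated or deleted."""

    def __init__(self):
        self._rows = []

    def load(self, record_source, rows, key_field, load_ts):
        for row in rows:
            self._rows.append(
                StagedRow(load_ts, record_source, str(row[key_field]), dict(row))
            )

    def as_at(self, ts):
        """Latest payload per (source, key) among loads made on or before `ts`."""
        latest = {}
        for r in sorted(self._rows, key=lambda r: r.load_ts):
            if r.load_ts <= ts:
                latest[(r.record_source, r.key)] = r.payload
        return latest


psa = PersistentStagingArea()
psa.load("crm", [{"id": 1, "name": "Acme"}], "id", datetime(2018, 1, 1))
psa.load("crm", [{"id": 1, "name": "Acme Ltd"}], "id", datetime(2018, 2, 1))
january = psa.as_at(datetime(2018, 1, 15))
march = psa.as_at(datetime(2018, 3, 1))
```

Both versions of the customer row stay in the store; `as_at` simply picks the newest one visible at the requested time. Keeping full history like this is one way a lake can serve as feedstock for a Data Vault that needs to be reloaded.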
Data Vault Evolution
• 1990s: Conceived by Dan Linstedt
• 2000: DV 1.0 released into the public domain
• 2014: DV 2.0 announced

Data Vault Trends
• Strong tools are emerging for source-centric modelling and model population
• The need for business-centric modelling
• Patterns are emerging for automating documentation, validation and reconciliation
• New Data Warehouse databases complement Data Vaults
• GDPR and PII are driving the need for ontologies
• S3/Athena as a Data Vault?
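Automated reconciliation, one of the patterns listed above, can start as simply as comparing the distinct business keys in a source extract with those in the hub that should hold them. This is a minimal sketch; `reconcile`, its result fields and the sample keys are hypothetical, and production checks would usually also compare satellite attributes and run inside the warehouse database rather than on Python sets.

```python
def reconcile(source_keys, hub_keys):
    """Compare a source extract's business keys with a hub's keys.

    A hub accumulates keys across loads, so extra keys in the hub are normal;
    the check fails only when a source key never made it into the hub.
    """
    source, hub = set(source_keys), set(hub_keys)
    missing = sorted(source - hub)
    return {
        "source_keys": len(source),
        "hub_keys": len(hub),
        "missing_in_hub": missing,
        "passed": not missing,
    }


# Hypothetical keys: today's customer extract vs the customer hub after loading.
extract = ["C-0042", "C-0043", "C-0044", "C-0042"]  # duplicates count once
hub = ["C-0040", "C-0042", "C-0043"]
report = reconcile(extract, hub)
```

Here the report flags `C-0044` as present in the source but absent from the hub, so a pipeline could halt or raise an alert before downstream models consume the gap.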
Data Vaults are Good for
• EDW projects
• Strategic Analysis
• Feedstock for Cubes and Models

Data Vault Signals are related through business context
Sales are down, and here is the business context:
• Broadens the field of vision and the scope of questions
• Increases the variety, quality and strength of signal channels
• Different business perspectives are supported in a consistent analysis framework

Leaders need situational awareness
Data Vaults expose relationships between different business signals

Data Vault Vs Data Lake

Editor's Notes

  • #4 In the pub, signals open conversations. Signals, not data, move boardrooms. How the board consumes our data integration projects determines their success or failure. We should sell signals, not technology. Flying blind. Yield curves.
  • #5 Human task
  • #6 The board can’t take action if it is blind to obvious signals.
  • #9 10 years since Yahoo open sourced Hadoop. Which came first, James Dixon or Hive? Up until Hive, Hadoop was hard; it separated compute from storage without analysis. 4 years since the first data lake iteration… poor.
  • #14 Conformed satellites made from rules
  • #16 Links perspectives: sales, finance, marketing, operational