Data Vault 2.0 is a unique system of Data Warehousing created from the ground up to deal with real-world data challenges. Data Vault 2.0 delivers improved total cost of ownership, greatly enhanced operational agility and traceable data governance.
2. What is the current landscape for Information Managers?
3. External factors
Security
Open Data
» Lockdown vs. Democratisation
» Redaction
» De-personalisation
Regulatory compliance
» GDPR
» SOX
» FMA/APRA
» Cloud / ISO27001
4. GDPR
› This is likely to be replicated in New Zealand
› Security Data Breach Notification Law in effect as at 25 May 2018
› Lack of compliance will result in heavy penalties
5. Internal factors
› More data more often
› New data sets all the time
› Data Quality challenges
› Corporate memory
› Reliance on individuals
› Reconciliation and audit
8. What’s wrong with traditional Data Warehousing methods?
[Diagram labels. Inputs: 3rd party data; source systems; JSON/XML semi-structured data; unstructured data. Processing: ETL; marts. Consumers: BI; Analytics; Data Science. Also shown: ‘Shadow IT’; Big Data.]
9. Data Vault: Scalable, Extensible, Agnostic
[Architecture diagram, labels only]
Automation Framework
Inputs: 3rd party data; source systems; JSON/XML semi-structured data; unstructured data
Data Acquisition: Real-Time; CDC; Messaging; Batch; ETL/ELT; Files (PDF, Docs, Video, etc.)
Staging
Enterprise Data Vault: Raw Data Vault; Business Data Vault
Data Provisioning: Information Marts (Virtual/Physical); Operational Data Store; Big Data/NoSQL
Information Governance: Metadata Management, Lineage, Data Quality
Consumers: BI; Analytics; Data Science
"It is not the strongest of the species that survives, nor the most intelligent that survives. It is the one that is the most adaptable to change." Charles Darwin
10. Data Lake
› Data flows in ‘naturally’
› Some boundaries
› Content flows out with little constraint
Data Swamp
› Uncontrolled flow
› No borders and potentially bottomless
› Filled with flotsam and jetsam
Data Harbour
› Controlled flow
› Trust in the delivery and content of data
› Built to use and extract value from the data
11. What is Data Vault?
› Invented by Dan Linstedt in the late 1990s, Data Vault is a System of Information Delivery containing the components needed to accomplish an enterprise vision in Data Warehousing and Business Intelligence
› Data Vault includes a data modelling technique for data repositories that has significant advantages over traditional methodologies: it is auditable, extensible, and automated
› With the release of Data Vault 2.0 in 2013, it extended from just the data model to a full methodology
› It is effectively vendor-agnostic: it works with multiple data processing tools, relational databases, and file stores
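To make "auditable" and "extensible" concrete, here is a minimal in-memory sketch of the modelling idea. It assumes the standard Data Vault building blocks of hubs (business keys) and satellites (descriptive history), which the slides do not spell out. The names `CustomerVault`, `HubRow`, `SatelliteRow`, `hash_key`, and `hash_diff` are hypothetical, and SHA-256 is an arbitrary choice of hash for illustration.

```python
import hashlib
from dataclasses import dataclass
from datetime import datetime, timezone


def hash_key(business_key: str) -> str:
    """Deterministic surrogate key from a normalised business key."""
    normalised = business_key.strip().upper()
    return hashlib.sha256(normalised.encode("utf-8")).hexdigest()


def hash_diff(items: tuple) -> str:
    """Fingerprint of a set of attributes, used to detect change."""
    text = "||".join(f"{name}={value}" for name, value in items)
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


@dataclass(frozen=True)
class HubRow:
    hub_key: str
    business_key: str
    load_ts: datetime
    record_source: str  # where the key was first seen (audit trail)


@dataclass(frozen=True)
class SatelliteRow:
    hub_key: str
    load_ts: datetime
    record_source: str
    diff: str
    attributes: tuple  # sorted (name, value) pairs


class CustomerVault:
    """Hypothetical hub plus one satellite, held in memory for illustration."""

    def __init__(self):
        self.hub = {}  # hub_key -> HubRow, inserted once per business key
        self.sat = []  # append-only history, never updated in place

    def latest(self, customer_id):
        hk = hash_key(customer_id)
        rows = [row for row in self.sat if row.hub_key == hk]
        return rows[-1] if rows else None  # relies on append order

    def load(self, customer_id, attributes, record_source, load_ts=None):
        """Insert-only load; returns True if a new satellite row was written."""
        load_ts = load_ts or datetime.now(timezone.utc)
        hk = hash_key(customer_id)
        if hk not in self.hub:
            self.hub[hk] = HubRow(hk, customer_id, load_ts, record_source)
        items = tuple(sorted(attributes.items()))
        diff = hash_diff(items)
        current = self.latest(customer_id)
        if current is not None and current.diff == diff:
            return False  # nothing changed, so no new history row
        self.sat.append(SatelliteRow(hk, load_ts, record_source, diff, items))
        return True


vault = CustomerVault()
vault.load("C-001", {"city": "Auckland"}, "crm")
vault.load("C-001", {"city": "Wellington"}, "crm")
```

Because rows are only ever appended, each with a load timestamp and record source, earlier states stay available for audit. A new source or attribute set would be added as another satellite instead of altering existing structures, which is one reading of "extensible".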
Even Gartner is advising clients to beware of the data lake fallacy (http://www.gartner.com/newsroom/id/2809117).
Excerpt:
"In broad terms, data lakes are marketed as enterprise-wide data management platforms for analyzing disparate sources of data in its native format," said Nick Heudecker, research director at Gartner. "The idea is simple: instead of placing data in a purpose-built data store, you move it into a data lake in its original format. This eliminates the upfront costs of data ingestion, like transformation. Once data is placed into the lake, it's available for analysis by everyone in the organization."
However, while the marketing hype suggests audiences throughout an enterprise will leverage data lakes, this positioning assumes that all those audiences are highly skilled at data manipulation and analysis, as data lakes lack semantic consistency and governed metadata.
This is why the data harbour is taking shape. It allows skilled people to use the data under looser control, but still with a level of trust