Introduction

eacweber@gmail.com
Data Vault Definition

The Data Vault is a detail oriented, historical tracking and uniquely
linked set of normalized tables that support one or more functional
areas of business. It is a hybrid approach encompassing the best
of breed between 3rd normal form (3NF) and star schema. The
design is flexible, scalable, consistent and adaptable to the needs
of the enterprise. It is a data model that is architected specifically
to meet the needs of enterprise data warehouses.



Source: Dan Linstedt
http://www.tdan.com/view-articles/5054/
Data Vault Building Blocks
different sources/rate of change




Source: Dan Linstedt
http://www.slideshare.net/dlinstedt/introduction-to-data-vault-dama-oregon-2012
Data Vault Fundamentals: Hub




Source: data-vault-modeling-guide
GENESEE ACADEMY, LLC, Hans Hultgren
Data Vault Fundamentals: Link




Source: data-vault-modeling-guide
GENESEE ACADEMY, LLC, Hans Hultgren
Data Vault Fundamentals: Satellite




   Source: data-vault-modeling-guide
   GENESEE ACADEMY, LLC, Hans Hultgren
Data Vault Fundamentals: Model




Source: data-vault-modeling-guide
GENESEE ACADEMY, LLC, Hans Hultgren
Data Vault ETL
Many objects to load, standardized procedures
This screams for a generic solution!
I don't want to:
  throw the ETL tool away and code it all myself
  manage too many ETL objects
  connect similar columns in mappings by hand
I do want to:
  generate ETL (Kettle) objects? No
  Take it one step further: there's only 1 parameterised hub load
  object. No need to know the XML structure of PDI objects
Tools
  Operating System
  Database
  Virtualization
  Data Integration
  'Productivity'
  SQL Development
  Version Control
Place of framework in architecture


Layers: Sources, ETL Process, Data Warehouse, EUL

Sources: Files, DBMS, ERP
  ETL
Staging Area: MySQL, CSV files
  ETL: Kettle Data Vault Framework
Central DWH & Data Marts: MySQL Data Vault
  ETL
EUL
What has to be taken care of?

Data Vault designed and implemented in database
Staging tables and loading procedures in place
(can also be generic; we use the PDI Metadata Injection step for
loading files)

Mapping from source to Data Vault specified
(now in an Excel sheet)
Framework components

PDI repository (file based), jobs and transformations
Configuration files:
kettle.properties
shared.xml
repositories.xml

Excel sheet that contains the specifications
MySQL database for metadata
Virtual machine with Ubuntu 12.04 Server
Design decisions

Updatable views with generic column names
(MySQL is more lenient than PostgreSQL)
Compare satellite attributes via string comparison
(concatenate all columns, with | (pipe) as delimiter)

'Inject' the metadata using Kettle parameters
Generate and use an error table for each Data Vault table
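
The string comparison of satellite attributes can be illustrated with a minimal Python sketch. It is not the framework's code: the column names and sample rows are hypothetical, and rendering NULL as an empty string is an assumption, since the slides do not say how NULLs are handled.

```python
from typing import Mapping, Optional, Sequence

DELIMITER = "|"  # pipe delimiter, as named on the slide


def compare_string(row: Mapping[str, object], columns: Sequence[str]) -> str:
    """Concatenate the attribute values of one row into a single string."""
    # Assumption: None (NULL) is rendered as an empty string.
    return DELIMITER.join("" if row[c] is None else str(row[c]) for c in columns)


def has_changed(current: Optional[Mapping[str, object]],
                incoming: Mapping[str, object],
                columns: Sequence[str]) -> bool:
    """True when there is no current satellite row or any attribute differs."""
    if current is None:
        return True
    return compare_string(current, columns) != compare_string(incoming, columns)


# Hypothetical satellite attributes and rows.
columns = ["name", "city", "credit_limit"]
current = {"name": "Acme", "city": "Amsterdam", "credit_limit": 1000}
incoming = {"name": "Acme", "city": "Utrecht", "credit_limit": 1000}

print(compare_string(incoming, columns))
print(has_changed(current, incoming, columns))
```

One caveat of this approach: a value that itself contains the delimiter can make two different rows produce the same string, so the delimiter should be a character that does not occur in the data.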
Metadata tables




All have history tables
Metadata in Excel
  Data Vault
  connections
  source systems
  source tables
Metadata in Excel (hub + sat)




x 200 (max)
Metadata in Excel (link)




x 10
link attributes
Metadata in Excel (link satellite)



x 10
x 5
x 200 (max)
Last seen date

applicable for hubs and links
existing hubs and links: update 'last_seen_dts'!
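
A minimal sketch of the last seen idea, assuming a hypothetical hub table `hub_customer` with columns `customer_bk`, `load_dts` and `last_seen_dts`. It uses Python's built-in `sqlite3` so it runs anywhere; the framework itself targets MySQL and does this inside Kettle transformations, which are not shown here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical hub layout; only last_seen_dts is named on the slide.
conn.execute("""
    CREATE TABLE hub_customer (
        customer_id   INTEGER PRIMARY KEY AUTOINCREMENT,
        customer_bk   TEXT NOT NULL UNIQUE,
        load_dts      TEXT NOT NULL,
        last_seen_dts TEXT NOT NULL
    )""")


def load_hub(conn, business_keys, load_dts):
    """Insert unseen business keys; refresh last_seen_dts on existing ones."""
    for bk in business_keys:
        updated = conn.execute(
            "UPDATE hub_customer SET last_seen_dts = ? WHERE customer_bk = ?",
            (load_dts, bk),
        ).rowcount
        if updated == 0:
            # Key not in the hub yet: insert it, first seen equals last seen.
            conn.execute(
                "INSERT INTO hub_customer (customer_bk, load_dts, last_seen_dts) "
                "VALUES (?, ?, ?)",
                (bk, load_dts, load_dts),
            )
    conn.commit()


load_hub(conn, ["C001", "C002"], "2012-09-01")
load_hub(conn, ["C002", "C003"], "2012-09-02")

rows = conn.execute(
    "SELECT customer_bk, load_dts, last_seen_dts "
    "FROM hub_customer ORDER BY customer_bk"
).fetchall()
for row in rows:
    print(row)
```

After the second run, `C002` keeps its original `load_dts` but carries the later `last_seen_dts`, while `C001` shows that it was not present in the second load.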
Link validity satellite
Link has 'business key': not all hub IDs
Loading the metadata
'design errors'

Checks to avoid debugging:
(compares design metadata with Data Vault DB information_schema)


  hubs, links, satellites that don't exist in the DV
  key columns that do not exist in the DV
  missing connection data (source db)
  missing attribute columns
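
The first two checks above can be sketched as a comparison of two dictionaries. This is an illustration only: the `design` dictionary stands in for the Excel specifications, and `catalog` stands in for what a query on the database's `information_schema.columns` view would return; all table and column names are hypothetical.

```python
from typing import Dict, List, Set


def find_design_errors(design: Dict[str, List[str]],
                       catalog: Dict[str, Set[str]]) -> List[str]:
    """Report designed tables and columns that are missing from the Data Vault."""
    errors = []
    for table, columns in sorted(design.items()):
        if table not in catalog:
            errors.append(f"table {table} does not exist in the DV")
            continue
        for column in columns:
            if column not in catalog[table]:
                errors.append(f"column {table}.{column} does not exist in the DV")
    return errors


# Hypothetical design metadata: table name -> columns the design refers to.
design = {
    "hub_customer": ["customer_id", "customer_bk"],
    "sat_customer": ["customer_id", "name", "city"],
    "link_order_customer": ["order_customer_id"],
}
# Hypothetical catalog content: table name -> columns present in the database.
catalog = {
    "hub_customer": {"customer_id", "customer_bk", "load_dts"},
    "sat_customer": {"customer_id", "name"},
}

errors = find_design_errors(design, catalog)
for error in errors:
    print(error)
```

Running such checks before the load starts turns a failing transformation into a readable list of design mistakes.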
A complete run
Metadata needed for a hub

name
key column
business key column
source table
source table business key column
(can be expression, e.g. concatenate for composite key)
Job for hub
Transformation for hub
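
To show how the hub metadata listed earlier can drive one generic load, here is a hedged sketch that builds a single INSERT statement from a metadata record. It is not the PDI transformation from the slide: the `HubMetadata` class, the `load_dts` column, the table names and the use of SQLite instead of MySQL are all assumptions made for illustration.

```python
import sqlite3
from dataclasses import dataclass


@dataclass
class HubMetadata:
    name: str                 # hub table name
    key_column: str           # surrogate key, assumed auto-generated (unused below)
    business_key_column: str  # business key column in the hub
    source_table: str         # staging table
    source_business_key: str  # column or SQL expression in the source table


def hub_load_sql(hub: HubMetadata) -> str:
    """Build one generic 'insert new business keys' statement from metadata."""
    # Identifiers cannot be bound as parameters, so they are interpolated;
    # this assumes the metadata comes from a trusted design sheet.
    return (
        f"INSERT INTO {hub.name} ({hub.business_key_column}, load_dts) "
        f"SELECT DISTINCT {hub.source_business_key}, :load_dts "
        f"FROM {hub.source_table} src "
        f"WHERE {hub.source_business_key} NOT IN "
        f"(SELECT {hub.business_key_column} FROM {hub.name})"
    )


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE stg_customer (country TEXT, customer_no TEXT, name TEXT);
    INSERT INTO stg_customer VALUES
        ('NL', '1', 'Acme'), ('NL', '2', 'Bolt'), ('NL', '1', 'Acme BV');
    CREATE TABLE hub_customer (
        customer_id INTEGER PRIMARY KEY AUTOINCREMENT,
        customer_bk TEXT NOT NULL UNIQUE,
        load_dts    TEXT NOT NULL
    );
""")

# Composite business key expressed as a concatenation (SQLite syntax).
hub = HubMetadata("hub_customer", "customer_id", "customer_bk",
                  "stg_customer", "country || '-' || customer_no")

conn.execute(hub_load_sql(hub), {"load_dts": "2012-09-01"})
conn.execute(hub_load_sql(hub), {"load_dts": "2012-09-02"})  # rerun adds nothing

keys = [r[0] for r in conn.execute(
    "SELECT customer_bk FROM hub_customer ORDER BY customer_bk")]
print(keys)
```

The same function serves every hub; only the metadata record changes, which is the point of having a single parameterised hub load.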
Metadata needed for a link
name
key column
for each hub (maximum 10, can be a ref-table)
   hub name
   column name for the hub key in the link (roles!)
   column in the source table → business key of hub


link 'attributes' (part of key, no hub, maximum 5)
link validity satellite needed?
last seen date needed?
source table
Job for link
Transformation for link




  Last seen?
  Lookup hubs
  Remove columns not in link
  Run table needed for validity sat?
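
The "lookup hubs" and "remove columns not in link" steps can be sketched in plain Python. The hub contents, column names and the decision to reject rows with an unknown business key are assumptions for illustration; the actual transformation is a PDI object that is not reproduced here.

```python
from typing import Dict, List, Mapping, Optional, Tuple

# Hypothetical hub contents: business key -> hub key.
hub_lookups: Dict[str, Dict[str, int]] = {
    "hub_customer": {"C001": 1, "C002": 2},
    "hub_product": {"P10": 10, "P20": 20},
}

# Hypothetical link metadata, one entry per hub reference:
# (hub name, column for the hub key in the link, source column with the business key)
link_hubs: List[Tuple[str, str, str]] = [
    ("hub_customer", "buyer_customer_id", "buyer_no"),
    ("hub_product", "product_id", "product_no"),
]


def to_link_row(source_row: Mapping[str, object],
                link_hubs: List[Tuple[str, str, str]],
                hub_lookups: Dict[str, Dict[str, int]]) -> Optional[Dict[str, int]]:
    """Look up every hub key and keep only the columns that belong to the link."""
    link_row: Dict[str, int] = {}
    for hub_name, link_column, source_column in link_hubs:
        hub_key = hub_lookups[hub_name].get(source_row[source_column])
        if hub_key is None:
            # Assumption: a row with an unknown business key is rejected.
            return None
        link_row[link_column] = hub_key
    return link_row


good = to_link_row({"buyer_no": "C002", "product_no": "P10", "quantity": 3},
                   link_hubs, hub_lookups)
bad = to_link_row({"buyer_no": "C999", "product_no": "P10", "quantity": 1},
                  link_hubs, hub_lookups)
print(good)
print(bad)
```

Naming the link column separately from the hub (`buyer_customer_id` for `hub_customer`) is what allows the same hub to appear in a link under different roles.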
Metadata needed for a hub satellite

  name
  key column
  hub name
  column in the source table → business key of hub
  for each attribute (maximum 200)
    source column
    target column
  source table
Job for hub satellite
Transformation for hub satellite
Metadata needed for a link satellite

name
key column
link name
for each hub of the link:
column in the source table → business key of hub
for each key attribute: source column
for each attribute: source column → target column
source table
Job for link satellite
Transformation for link satellite
Executing in a loop ..
.. and parallel
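
As a rough illustration of running one generic load per metadata row, first in a loop and then in parallel, the sketch below uses Python's standard `concurrent.futures`. The `load_hub` function is only a placeholder for starting the parameterised load; it does not call Kettle, and the metadata rows are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical metadata rows, one per hub to load.
hub_metadata = [
    {"name": "hub_customer", "source_table": "stg_customer"},
    {"name": "hub_product", "source_table": "stg_product"},
    {"name": "hub_order", "source_table": "stg_order"},
]


def load_hub(params: dict) -> str:
    """Placeholder for one run of the single parameterised hub load."""
    return f"loaded {params['name']} from {params['source_table']}"


# Executing in a loop: one run per metadata row, one after the other.
sequential = [load_hub(p) for p in hub_metadata]

# Executing in parallel: the same runs, spread over two worker threads.
with ThreadPoolExecutor(max_workers=2) as pool:
    parallel = list(pool.map(load_hub, hub_metadata))

print(sequential)
print(parallel == sequential)
```

`pool.map` returns results in input order, so the parallel run yields the same list as the loop; in a real load, shared resources such as log tables have to tolerate concurrent writers.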
Logging


Custom logging
PDI logging
Configuring log tables for concurrent access
Version Control: PDI objects
Version Control: database objects
Some points of interest

Easy to make mistakes in the design sheet
Generic → a bit harder to maintain and debug
Application/tool to maintain metadata?
Data Vault generators (e.g. Quipu)?
Spinoff using Informatica and Oracle: Sander Robijns
Thanks to: Jos van Dongen, Kasper de Graaf
SourceForge!


PDI data vault framework #pcmams 2012
