SlideShare a Scribd company logo
Introductionn

eacweber@gmail.com
Data Vault Definition

The Data Vault is a detail oriented, historical tracking and uniquely
linked set of normalized tables that support one or more functional
areas of business. It is a hybrid approach encompassing the best
of breed between 3rd normal form (3NF) and star schema. The
design is flexible, scalable, consistent and adaptable to the needs
of the enterprise. It is a data model that is architected specifically
to meet the needs of enterprise data warehouses.



Source: Dan Linstedt
http://www.tdan.com/view-articles/5054/
Data Vault Building Blocks
                                                                  different sources/rate of change




Source: Dan Linstedt
http://www.slideshare.net/dlinstedt/introduction-to-data-vault-dama-oregon-2012
Data Vault Fundamentals: Hub




Source: data-vault-modeling-guide
GENESEE ACADEMY, LLC, Hans Hultgren
Data Vault Fundamentals: Link




Source: data-vault-modeling-guide
GENESEE ACADEMY, LLC, Hans Hultgren
Data Vault Fundamentals: Satellite




   Source: data-vault-modeling-guide
   GENESEE ACADEMY, LLC, Hans Hultgren
Data Vault Fundamentals: Model




Source: data-vault-modeling-guide
GENESEE ACADEMY, LLC, Hans Hultgren
Data Vault ETL
Many objects to load, standardized procedures
This screams for a generic solution!
I don't want to:
  throw ETL tool away and code it all myself
  manage too many ETL objects
  connect similar columns in mappings by hand
I do want to:
  generate ETL (Kettle) objects? No
  Take it one step further: there's only 1 parameterised hub load
  object. Don't need to know xml structure of PDI objects
Tools
Operating System                Database


  Virtualization

                   Data Integration       'Productivity'

Sql Development
                                      Version Control
Place of framework in architecture


  Files

            MySQL      ETL:
                      Kettle   MySQL
 DBMS ETL              Data     Data   ETL

                      Vault    Vault
             CSV      Frame
             Files     work
  ERP       Staging             Central DWH &
             Area                 Data Marts

Sources     ETL Process        Data Warehouse   EUL
What has to be taken care of?

Data Vault designed and implemented in database
Staging tables and loading procedures in place
(can also be generic, we use PDI Metadata Injection step for loading
files)

Mapping from source to Data Vault specified
(now in an Excel sheet)
Framework components

PDI repository (file based), jobs and transformations
Configuration files:
kettle.properties
shared.xml
repositories.xml

Excel sheet that contains the specifications
MySQL database for metadata
Virtual machine with Ubuntu 12.04 Server
Design decisions

Updateable views with generic column names
(MySQL more lenient than PostgreSQL)
Compare satellite attributes via string comparison
(concatenate all columns, with | (pipe) as delimiter)

'inject' the metadata using Kettle parameters
Generate and use an error table for each Data Vault
table
Metadata tables




All have history tables
Metadata in Excel
                    Data Vault

                    connections

                    source systems



                    source tables
Metadata in Excel (hub + sat)




          x 200 (max)
Metadata in Excel (link)




                         x 10

link attributes
Metadata in Excel (link satellite)



                  x 10


                  x5

  x 200 (max)
Last seen date

applicable for hubs and links
existing hubs and links: update 'last_seen_dts'!
Link validity satellite
Link has 'business key': not all hub id's
Loading the metadata
'design errors'

Checks to avoid debugging:
(compares design metadata with Data Vault DB information_schema)


  hubs, links, satellites that don't exist in the DV
  key columns that do not exist in the DV
  missing connection data (source db)
  missing attribute columns
A complete run
Metadata needed for a hub

name
key column
business key column
source table
source table business key column
(can be expression, e.g. concatenate for composite key)
Job for hub
Transformation for hub
Metadata needed for a link
name
key column
for each hub (maximum 10, can be a ref-table)
   hub name
   column name for the hub key in the link (roles!)
   column in the source table → business key of hub


link 'attributes' (part of key, no hub, maximum 5)
link validity satellite needed?
last seen date needed?
source table
Job for link
Transformation for link




                  Last seen?
Lookup hubs

                               Remove columns not in link



                               Run table needed for
                               validity sat ?
Metadata needed for a hub satellite

  name
  key column
  hub name
  column in the source table → business key of hub
  for each attribute (maximum 200)
    source column
    target column
  source table
Job for hub satellite
Transformation for hub satellite
Metadata needed for a link satellite

name
key column
link name
for each hub of the link:
column in the source table → business key of hub
for each key attribute: source column
for each attribute: source column → target column
source table
Job for link satellite
Transformation for link satellite
Executing in a loop ..
.. and parallel
Logging


Custom logging




                                   PDI logging



Configuring log tables
for concurrent access
Version Control: PDI objects
Version Control: database objects
Some points of interest

Easy to make mistake in design sheet
Generic → a bit harder to maintain and debug
Application/tool to maintain metadata?
Data Vault generators (e.g. Quipu)?
Spinoff using Informatica and Oracle: Sander Robijns
Thanks to: Jos van Dongen
            Kasper de Graaf
Sourceforge!

More Related Content

What's hot

U-SQL Killer Scenarios: Custom Processing, Big Cognition, Image and JSON Proc...
U-SQL Killer Scenarios: Custom Processing, Big Cognition, Image and JSON Proc...U-SQL Killer Scenarios: Custom Processing, Big Cognition, Image and JSON Proc...
U-SQL Killer Scenarios: Custom Processing, Big Cognition, Image and JSON Proc...
Michael Rys
 
Bring your code to explore the Azure Data Lake: Execute your .NET/Python/R co...
Bring your code to explore the Azure Data Lake: Execute your .NET/Python/R co...Bring your code to explore the Azure Data Lake: Execute your .NET/Python/R co...
Bring your code to explore the Azure Data Lake: Execute your .NET/Python/R co...
Michael Rys
 
U-SQL Query Execution and Performance Tuning
U-SQL Query Execution and Performance TuningU-SQL Query Execution and Performance Tuning
U-SQL Query Execution and Performance Tuning
Michael Rys
 
Spark meetup v2.0.5
Spark meetup v2.0.5Spark meetup v2.0.5
Spark meetup v2.0.5
Yan Zhou
 
Killer Scenarios with Data Lake in Azure with U-SQL
Killer Scenarios with Data Lake in Azure with U-SQLKiller Scenarios with Data Lake in Azure with U-SQL
Killer Scenarios with Data Lake in Azure with U-SQL
Michael Rys
 
U-SQL Killer Scenarios: Taming the Data Science Monster with U-SQL and Big Co...
U-SQL Killer Scenarios: Taming the Data Science Monster with U-SQL and Big Co...U-SQL Killer Scenarios: Taming the Data Science Monster with U-SQL and Big Co...
U-SQL Killer Scenarios: Taming the Data Science Monster with U-SQL and Big Co...
Michael Rys
 
20140908 spark sql & catalyst
20140908 spark sql & catalyst20140908 spark sql & catalyst
20140908 spark sql & catalyst
Takuya UESHIN
 
U-SQL Meta Data Catalog (SQLBits 2016)
U-SQL Meta Data Catalog (SQLBits 2016)U-SQL Meta Data Catalog (SQLBits 2016)
U-SQL Meta Data Catalog (SQLBits 2016)
Michael Rys
 
U-SQL Reading & Writing Files (SQLBits 2016)
U-SQL Reading & Writing Files (SQLBits 2016)U-SQL Reading & Writing Files (SQLBits 2016)
U-SQL Reading & Writing Files (SQLBits 2016)
Michael Rys
 
Using C# with U-SQL (SQLBits 2016)
Using C# with U-SQL (SQLBits 2016)Using C# with U-SQL (SQLBits 2016)
Using C# with U-SQL (SQLBits 2016)
Michael Rys
 
Hive and HiveQL - Module6
Hive and HiveQL - Module6Hive and HiveQL - Module6
Hive and HiveQL - Module6
Rohit Agrawal
 
Benedutch 2011 ew_ppt
Benedutch 2011 ew_pptBenedutch 2011 ew_ppt
Benedutch 2011 ew_ppt
Antonius Intelligence Team
 
The Road to U-SQL: Experiences in Language Design (SQL Konferenz 2017 Keynote)
The Road to U-SQL: Experiences in Language Design (SQL Konferenz 2017 Keynote)The Road to U-SQL: Experiences in Language Design (SQL Konferenz 2017 Keynote)
The Road to U-SQL: Experiences in Language Design (SQL Konferenz 2017 Keynote)
Michael Rys
 
ADL/U-SQL Introduction (SQLBits 2016)
ADL/U-SQL Introduction (SQLBits 2016)ADL/U-SQL Introduction (SQLBits 2016)
ADL/U-SQL Introduction (SQLBits 2016)
Michael Rys
 
Spark SQL with Scala Code Examples
Spark SQL with Scala Code ExamplesSpark SQL with Scala Code Examples
Spark SQL with Scala Code Examples
Todd McGrath
 
Planning with Polyalgebra: Bringing Together Relational, Complex and Machine ...
Planning with Polyalgebra: Bringing Together Relational, Complex and Machine ...Planning with Polyalgebra: Bringing Together Relational, Complex and Machine ...
Planning with Polyalgebra: Bringing Together Relational, Complex and Machine ...
Julian Hyde
 
Data Source API in Spark
Data Source API in SparkData Source API in Spark
Data Source API in Spark
Databricks
 
Introduction to Spark SQL & Catalyst
Introduction to Spark SQL & CatalystIntroduction to Spark SQL & Catalyst
Introduction to Spark SQL & Catalyst
Takuya UESHIN
 
Introducing U-SQL (SQLPASS 2016)
Introducing U-SQL (SQLPASS 2016)Introducing U-SQL (SQLPASS 2016)
Introducing U-SQL (SQLPASS 2016)
Michael Rys
 
Flex Tables Guide Software V. 7.0.x
Flex Tables Guide Software V. 7.0.xFlex Tables Guide Software V. 7.0.x
Flex Tables Guide Software V. 7.0.x
Andrey Karpov
 

What's hot (20)

U-SQL Killer Scenarios: Custom Processing, Big Cognition, Image and JSON Proc...
U-SQL Killer Scenarios: Custom Processing, Big Cognition, Image and JSON Proc...U-SQL Killer Scenarios: Custom Processing, Big Cognition, Image and JSON Proc...
U-SQL Killer Scenarios: Custom Processing, Big Cognition, Image and JSON Proc...
 
Bring your code to explore the Azure Data Lake: Execute your .NET/Python/R co...
Bring your code to explore the Azure Data Lake: Execute your .NET/Python/R co...Bring your code to explore the Azure Data Lake: Execute your .NET/Python/R co...
Bring your code to explore the Azure Data Lake: Execute your .NET/Python/R co...
 
U-SQL Query Execution and Performance Tuning
U-SQL Query Execution and Performance TuningU-SQL Query Execution and Performance Tuning
U-SQL Query Execution and Performance Tuning
 
Spark meetup v2.0.5
Spark meetup v2.0.5Spark meetup v2.0.5
Spark meetup v2.0.5
 
Killer Scenarios with Data Lake in Azure with U-SQL
Killer Scenarios with Data Lake in Azure with U-SQLKiller Scenarios with Data Lake in Azure with U-SQL
Killer Scenarios with Data Lake in Azure with U-SQL
 
U-SQL Killer Scenarios: Taming the Data Science Monster with U-SQL and Big Co...
U-SQL Killer Scenarios: Taming the Data Science Monster with U-SQL and Big Co...U-SQL Killer Scenarios: Taming the Data Science Monster with U-SQL and Big Co...
U-SQL Killer Scenarios: Taming the Data Science Monster with U-SQL and Big Co...
 
20140908 spark sql & catalyst
20140908 spark sql & catalyst20140908 spark sql & catalyst
20140908 spark sql & catalyst
 
U-SQL Meta Data Catalog (SQLBits 2016)
U-SQL Meta Data Catalog (SQLBits 2016)U-SQL Meta Data Catalog (SQLBits 2016)
U-SQL Meta Data Catalog (SQLBits 2016)
 
U-SQL Reading & Writing Files (SQLBits 2016)
U-SQL Reading & Writing Files (SQLBits 2016)U-SQL Reading & Writing Files (SQLBits 2016)
U-SQL Reading & Writing Files (SQLBits 2016)
 
Using C# with U-SQL (SQLBits 2016)
Using C# with U-SQL (SQLBits 2016)Using C# with U-SQL (SQLBits 2016)
Using C# with U-SQL (SQLBits 2016)
 
Hive and HiveQL - Module6
Hive and HiveQL - Module6Hive and HiveQL - Module6
Hive and HiveQL - Module6
 
Benedutch 2011 ew_ppt
Benedutch 2011 ew_pptBenedutch 2011 ew_ppt
Benedutch 2011 ew_ppt
 
The Road to U-SQL: Experiences in Language Design (SQL Konferenz 2017 Keynote)
The Road to U-SQL: Experiences in Language Design (SQL Konferenz 2017 Keynote)The Road to U-SQL: Experiences in Language Design (SQL Konferenz 2017 Keynote)
The Road to U-SQL: Experiences in Language Design (SQL Konferenz 2017 Keynote)
 
ADL/U-SQL Introduction (SQLBits 2016)
ADL/U-SQL Introduction (SQLBits 2016)ADL/U-SQL Introduction (SQLBits 2016)
ADL/U-SQL Introduction (SQLBits 2016)
 
Spark SQL with Scala Code Examples
Spark SQL with Scala Code ExamplesSpark SQL with Scala Code Examples
Spark SQL with Scala Code Examples
 
Planning with Polyalgebra: Bringing Together Relational, Complex and Machine ...
Planning with Polyalgebra: Bringing Together Relational, Complex and Machine ...Planning with Polyalgebra: Bringing Together Relational, Complex and Machine ...
Planning with Polyalgebra: Bringing Together Relational, Complex and Machine ...
 
Data Source API in Spark
Data Source API in SparkData Source API in Spark
Data Source API in Spark
 
Introduction to Spark SQL & Catalyst
Introduction to Spark SQL & CatalystIntroduction to Spark SQL & Catalyst
Introduction to Spark SQL & Catalyst
 
Introducing U-SQL (SQLPASS 2016)
Introducing U-SQL (SQLPASS 2016)Introducing U-SQL (SQLPASS 2016)
Introducing U-SQL (SQLPASS 2016)
 
Flex Tables Guide Software V. 7.0.x
Flex Tables Guide Software V. 7.0.xFlex Tables Guide Software V. 7.0.x
Flex Tables Guide Software V. 7.0.x
 

Similar to Presentation pdi data_vault_framework_meetup2012

Windows Azure and a little SQL Data Services
Windows Azure and a little SQL Data ServicesWindows Azure and a little SQL Data Services
Windows Azure and a little SQL Data Services
ukdpe
 
SQL Server 2008 for Developers
SQL Server 2008 for DevelopersSQL Server 2008 for Developers
SQL Server 2008 for Developers
ukdpe
 
What's New for Data?
What's New for Data?What's New for Data?
What's New for Data?ukdpe
 
Microsoft Database Options
Microsoft Database OptionsMicrosoft Database Options
Microsoft Database Options
David Chou
 
Intro to Talend Open Studio for Data Integration
Intro to Talend Open Studio for Data IntegrationIntro to Talend Open Studio for Data Integration
Intro to Talend Open Studio for Data Integration
Philip Yurchuk
 
Entity framework 4.0
Entity framework 4.0Entity framework 4.0
Entity framework 4.0
Abhishek Sur
 
Building Robust ETL Pipelines with Apache Spark
Building Robust ETL Pipelines with Apache SparkBuilding Robust ETL Pipelines with Apache Spark
Building Robust ETL Pipelines with Apache Spark
Databricks
 
Practical OData
Practical ODataPractical OData
Practical OData
Vagif Abilov
 
Building nTier Applications with Entity Framework Services (Part 1)
Building nTier Applications with Entity Framework Services (Part 1)Building nTier Applications with Entity Framework Services (Part 1)
Building nTier Applications with Entity Framework Services (Part 1)
David McCarter
 
Olap
OlapOlap
Olap
preksha33
 
Tech Days09 Sqldev
Tech Days09 SqldevTech Days09 Sqldev
Tech Days09 Sqldev
llangit
 
SQL Server 2008 for Developers
SQL Server 2008 for DevelopersSQL Server 2008 for Developers
SQL Server 2008 for Developers
llangit
 
SQL Server 2008 for .NET Developers
SQL Server 2008 for .NET DevelopersSQL Server 2008 for .NET Developers
SQL Server 2008 for .NET Developers
llangit
 
ASP.NET 3.5 SP1
ASP.NET 3.5 SP1ASP.NET 3.5 SP1
ASP.NET 3.5 SP1
Dave Allen
 
WebSphere Commerce v7 Data Load
WebSphere Commerce v7 Data LoadWebSphere Commerce v7 Data Load
WebSphere Commerce v7 Data Load
Francesco Schettini
 
Building nTier Applications with Entity Framework Services (Part 1)
Building nTier Applications with Entity Framework Services (Part 1)Building nTier Applications with Entity Framework Services (Part 1)
Building nTier Applications with Entity Framework Services (Part 1)
David McCarter
 
Intake 37 ef2
Intake 37 ef2Intake 37 ef2
Intake 37 ef2
Mahmoud Ouf
 
The Future of Data Science and Machine Learning at Scale: A Look at MLflow, D...
The Future of Data Science and Machine Learning at Scale: A Look at MLflow, D...The Future of Data Science and Machine Learning at Scale: A Look at MLflow, D...
The Future of Data Science and Machine Learning at Scale: A Look at MLflow, D...
Databricks
 

Similar to Presentation pdi data_vault_framework_meetup2012 (20)

Windows Azure and a little SQL Data Services
Windows Azure and a little SQL Data ServicesWindows Azure and a little SQL Data Services
Windows Azure and a little SQL Data Services
 
SQL Server 2008 for Developers
SQL Server 2008 for DevelopersSQL Server 2008 for Developers
SQL Server 2008 for Developers
 
What's New for Data?
What's New for Data?What's New for Data?
What's New for Data?
 
Microsoft Database Options
Microsoft Database OptionsMicrosoft Database Options
Microsoft Database Options
 
Intro to Talend Open Studio for Data Integration
Intro to Talend Open Studio for Data IntegrationIntro to Talend Open Studio for Data Integration
Intro to Talend Open Studio for Data Integration
 
Entity framework 4.0
Entity framework 4.0Entity framework 4.0
Entity framework 4.0
 
It ready dw_day3_rev00
It ready dw_day3_rev00It ready dw_day3_rev00
It ready dw_day3_rev00
 
Building Robust ETL Pipelines with Apache Spark
Building Robust ETL Pipelines with Apache SparkBuilding Robust ETL Pipelines with Apache Spark
Building Robust ETL Pipelines with Apache Spark
 
Practical OData
Practical ODataPractical OData
Practical OData
 
Building nTier Applications with Entity Framework Services (Part 1)
Building nTier Applications with Entity Framework Services (Part 1)Building nTier Applications with Entity Framework Services (Part 1)
Building nTier Applications with Entity Framework Services (Part 1)
 
Olap
OlapOlap
Olap
 
Tech Days09 Sqldev
Tech Days09 SqldevTech Days09 Sqldev
Tech Days09 Sqldev
 
SQL Server 2008 for Developers
SQL Server 2008 for DevelopersSQL Server 2008 for Developers
SQL Server 2008 for Developers
 
SQL Server 2008 for .NET Developers
SQL Server 2008 for .NET DevelopersSQL Server 2008 for .NET Developers
SQL Server 2008 for .NET Developers
 
ASP.NET 3.5 SP1
ASP.NET 3.5 SP1ASP.NET 3.5 SP1
ASP.NET 3.5 SP1
 
WebSphere Commerce v7 Data Load
WebSphere Commerce v7 Data LoadWebSphere Commerce v7 Data Load
WebSphere Commerce v7 Data Load
 
Running Databases on AWS
Running Databases on AWSRunning Databases on AWS
Running Databases on AWS
 
Building nTier Applications with Entity Framework Services (Part 1)
Building nTier Applications with Entity Framework Services (Part 1)Building nTier Applications with Entity Framework Services (Part 1)
Building nTier Applications with Entity Framework Services (Part 1)
 
Intake 37 ef2
Intake 37 ef2Intake 37 ef2
Intake 37 ef2
 
The Future of Data Science and Machine Learning at Scale: A Look at MLflow, D...
The Future of Data Science and Machine Learning at Scale: A Look at MLflow, D...The Future of Data Science and Machine Learning at Scale: A Look at MLflow, D...
The Future of Data Science and Machine Learning at Scale: A Look at MLflow, D...
 

Recently uploaded

Leading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdfLeading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdf
OnBoard
 
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Tobias Schneck
 
GraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge GraphGraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge Graph
Guy Korland
 
Knowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and backKnowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and back
Elena Simperl
 
PCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase TeamPCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase Team
ControlCase
 
FIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdfFIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance
 
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Product School
 
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdfFIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance
 
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Ramesh Iyer
 
Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........
Alison B. Lowndes
 
UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4
DianaGray10
 
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
James Anderson
 
The Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and SalesThe Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and Sales
Laura Byrne
 
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Jeffrey Haguewood
 
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
Product School
 
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMsTo Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
Paul Groth
 
Generating a custom Ruby SDK for your web service or Rails API using Smithy
Generating a custom Ruby SDK for your web service or Rails API using SmithyGenerating a custom Ruby SDK for your web service or Rails API using Smithy
Generating a custom Ruby SDK for your web service or Rails API using Smithy
g2nightmarescribd
 
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
Product School
 
DevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA ConnectDevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA Connect
Kari Kakkonen
 
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered QualitySoftware Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Inflectra
 

Recently uploaded (20)

Leading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdfLeading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdf
 
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
 
GraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge GraphGraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge Graph
 
Knowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and backKnowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and back
 
PCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase TeamPCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase Team
 
FIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdfFIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdf
 
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
 
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdfFIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
 
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
 
Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........
 
UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4
 
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
 
The Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and SalesThe Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and Sales
 
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
 
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
 
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMsTo Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
 
Generating a custom Ruby SDK for your web service or Rails API using Smithy
Generating a custom Ruby SDK for your web service or Rails API using SmithyGenerating a custom Ruby SDK for your web service or Rails API using Smithy
Generating a custom Ruby SDK for your web service or Rails API using Smithy
 
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
 
DevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA ConnectDevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA Connect
 
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered QualitySoftware Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
 

Presentation pdi data_vault_framework_meetup2012

  • 2. Data Vault Definition The Data Vault is a detail oriented, historical tracking and uniquely linked set of normalized tables that support one or more functional areas of business. It is a hybrid approach encompassing the best of breed between 3rd normal form (3NF) and star schema. The design is flexible, scalable, consistent and adaptable to the needs of the enterprise. It is a data model that is architected specifically to meet the needs of enterprise data warehouses. Source: Dan Linstedt http://www.tdan.com/view-articles/5054/
  • 3. Data Vault Building Blocks different sources/rate of change Source: Dan Linstedt http://www.slideshare.net/dlinstedt/introduction-to-data-vault-dama-oregon-2012
  • 4. Data Vault Fundamentals: Hub Source: data-vault-modeling-guide GENESEE ACADEMY, LLC, Hans Hultgren
  • 5. Data Vault Fundamentals: Link Source: data-vault-modeling-guide GENESEE ACADEMY, LLC, Hans Hultgren
  • 6. Data Vault Fundamentals: Satellite Source: data-vault-modeling-guide GENESEE ACADEMY, LLC, Hans Hultgren
  • 7. Data Vault Fundamentals: Model Source: data-vault-modeling-guide GENESEE ACADEMY, LLC, Hans Hultgren
  • 8. Data Vault ETL Many objects to load, standardized procedures This screams for a generic solution! I don't want to: throw ETL tool away and code it all myself manage too many ETL objects connect similar columns in mappings by hand I do want to: generate ETL (Kettle) objects? No Take it one step further: there's only 1 parameterised hub load object. Don't need to know xml structure of PDI objects
  • 9. Tools Operating System Database Virtualization Data Integration 'Productivity' Sql Development Version Control
  • 10. Place of framework in architecture Files MySQL ETL: Kettle MySQL DBMS ETL Data Data ETL Vault Vault CSV Frame Files work ERP Staging Central DWH & Area Data Marts Sources ETL Process Data Warehouse EUL
  • 11. What has to be taken care of? Data Vault designed and implemented in database Staging tables and loading procedures in place (can also be generic, we use PDI Metadata Injection step for loading files) Mapping from source to Data Vault specified (now in an Excel sheet)
  • 12. Framework components PDI repository (file based), jobs and transformations Configuration files: kettle.properties shared.xml repositories.xml Excel sheet that contains the specifications MySQL database for metadata Virtual machine with Ubuntu 12.04 Server
  • 13. Design decisions Updateable views with generic column names (MySQL more lenient than PostgreSQL) Compare satellite attributes via string comparison (concatenate all columns, with | (pipe) as delimiter) 'inject' the metadata using Kettle parameters Generate and use an error table for each Data Vault table
  • 14. Metadata tables All have history tables
  • 15. Metadata in Excel Data Vault connections source systems source tables
  • 16. Metadata in Excel (hub + sat) x 200 (max)
  • 17. Metadata in Excel (link) x 10 link attributes
  • 18. Metadata in Excel (link satellite) x 10 x5 x 200 (max)
  • 19. Last seen date applicable for hubs and links existing hubs and links: update 'last_seen_dts'!
  • 20. Link validity satellite Link has 'business key': not all hub id's
  • 22. 'design errors' Checks to avoid debugging: (compares design metadata with Data Vault DB information_schema) hubs, links, satellites that don't exist in the DV key columns that do not exist in the DV missing connection data (source db) missing attribute columns
  • 24. Metadata needed for a hub name key column business key column source table source table business key column (can be expression, e.g. concatenate for composite key)
  • 27. Metadata needed for a link name key column for each hub (maximum 10, can be a ref-table) hub name column name for the hub key in the link (roles!) column in the source table → business key of hub link 'attributes' (part of key, no hub, maximum 5) link validity satellite needed? last seen date needed? source table
  • 29. Transformation for link Last seen? Lookup hubs Remove columns not in link Run table needed for validity sat ?
  • 30. Metadata needed for a hub satellite name key column hub name column in the source table → business key of hub for each attribute (maximum 200) source column target column source table
  • 31. Job for hub satellite
  • 33. Metadata needed for a link satellite name key column link name for each hub of the link: column in the source table → business key of hub for each key attribute: source column for each attribute: source column → target column source table
  • 34. Job for link satellite
  • 36. Executing in a loop ..
  • 38. Logging Custom logging PDI logging Configuring log tables for concurrent access
  • 41. Some points of interest Easy to make mistake in design sheet Generic → a bit harder to maintain and debug Application/tool to maintain metadata? Data Vault generators (e.g. Quipu)? Spinoff using Informatica and Oracle: Sander Robijns Thanks to: Jos van Dongen Kasper de Graaf