Data Vault 2.0: Using MD5 Hashes for 
Change Data Capture 
Kent Graziano 
Data Warrior LLC 
Twitter @KentGraziano
Data Vault Definition 
The Data Vault is a detail oriented, historical tracking 
and uniquely linked set of normalized tables that 
support one or more functional areas of business. 
It is a hybrid approach encompassing the best of 
breed between 3rd normal form (3NF) and star 
schema. The design is flexible, scalable, consistent 
and adaptable to the needs of the enterprise. 
Architected specifically to meet the needs 
of today’s enterprise data warehouses 
Dan Linstedt: Defining the Data Vault 
TDAN.com Article
Data Vault Time Line 
E.F. Codd invented relational modeling 
Chris Date and Hugh Darwen maintained and refined modeling 
Mid 60’s – Dimension & Fact Modeling presented by General Mills and Dartmouth University 
Early 70’s – Bill Inmon began discussing Data Warehousing 
Mid 70’s – AC Nielsen popularized Dimension & Fact terms 
1976 – Dr Peter Chen created E-R Diagramming 
Mid 80’s – Bill Inmon popularizes Data Warehousing 
Mid – Late 80’s – Dr Kimball popularizes Star Schema 
Late 80’s – Barry Devlin and Dr Kimball release “Business Data Warehouse” 
1990 – Dan Linstedt begins R&D on Data Vault Modeling 
2000 – Dan Linstedt releases first 5 articles on Data Vault Modeling 
© LearnDataVault.com
2014 - Next Evolution
What’s New in DV2.0? 
➢ Modeling Structure Includes… 
● NoSQL and non-relational DB systems, hybrid systems 
● Minor structure changes to support NoSQL 
➢ New ETL Implementation Standards 
● For true real-time support 
● For NoSQL support 
➢ New Architecture Standards 
● To include support for NoSQL data management systems 
➢ New Methodology Components 
● Including CMMI, Six Sigma, and TQM 
● Including project planning, tracking, and oversight 
● Agile delivery mechanisms 
● Standards and templates for projects 
Use of MD5 Hash in DV2.0 
This model is fully compatible with Hadoop and 
needs no changes to work properly. 
The hash keys can be used to join to Hadoop 
data sets. 
MD5 PK – replaces surrogate keys 
MD5DIFF – used for change detection 
MD5-based Change Detection 
➢ Think of a Type 2 slowly changing dimension (SCD) 
➢ Old Way: 
● Compare column by column 
● Source value != current value in DW table 
● 20 columns means 20 compares 
➢ New Way: 
● Concatenate all columns into one string 
● Convert it to one char(32) string (the hex form of the 128-bit MD5 digest) with a hash function 
● Compare to the hashed value (MD5DIFF) stored in the target table 
● One compare, no matter how many columns 
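The old and new ways above can be sketched in Python. This is a minimal illustration rather than the deck's Oracle implementation: the function names, the sample row values, and the choice of UTF-8 encoding are all assumptions made for the example.

```python
import hashlib

SEP = "^"  # delimiter between column values (the deck's example separator)


def hash_diff(values):
    """Return the 32-character hex MD5 of the delimited column values."""
    joined = SEP.join("" if v is None else str(v) for v in values)
    return hashlib.md5(joined.encode("utf-8")).hexdigest()


def changed_old_way(source_row, current_row):
    """Old way: one comparison per column."""
    return any(s != c for s, c in zip(source_row, current_row))


def changed_new_way(source_row, stored_md5diff):
    """New way: hash the source row once, then compare a single value."""
    return hash_diff(source_row) != stored_md5diff


# Hypothetical rows, for illustration only.
current = ("ASPIRIN", 81, "MG", "TABLET")
stored = hash_diff(current)  # kept in the target table as MD5DIFF

unchanged = ("ASPIRIN", 81, "MG", "TABLET")
updated = ("ASPIRIN", 325, "MG", "TABLET")

print(changed_new_way(unchanged, stored))
print(changed_new_way(updated, stored))
```

The comparison against the target is a single string equality whatever the column count; the per-column work moves into building the input string on the source side.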
© Data Warrior LLC
What does it look like? 
➢ Encode using the standard MD5 hash function 
● rawtohex(sys.utl_raw.cast_to_raw( 
dbms_obfuscation_toolkit.md5(input_string => 
...))) 
➢ Need to minimize the chance of duplicates 
● Without a separator, 12||3||45 and 1||2||345 both yield 12345 and hash to the same value 
● Need a separator between each column 
● The separator also handles null values: an empty position stays delimited 
● Example: Col1||'^'||Col2||'^'||Col3 
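The duplicate problem is easy to reproduce. The sketch below is in Python rather than Oracle SQL; the helper names are hypothetical, and mapping None to an empty string is an assumption chosen to mimic how Oracle's || operator treats NULL.

```python
import hashlib


def md5_hex(text):
    """Hex MD5 digest of a string (UTF-8 encoding assumed)."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def concat(values, sep=""):
    """Join column values; None becomes an empty string."""
    return sep.join("" if v is None else str(v) for v in values)


row_a = (12, 3, 45)
row_b = (1, 2, 345)

# Without a separator both rows concatenate to "12345".
no_sep_collides = md5_hex(concat(row_a)) == md5_hex(concat(row_b))
# With "^" the inputs are "12^3^45" and "1^2^345".
with_sep_collides = md5_hex(concat(row_a, "^")) == md5_hex(concat(row_b, "^"))

# A null in a different position: "X^^Y" versus "X^Y^".
null_a = ("X", None, "Y")
null_b = ("X", "Y", None)
null_same_no_sep = concat(null_a) == concat(null_b)
null_same_with_sep = concat(null_a, "^") == concat(null_b, "^")

print(no_sep_collides, with_sep_collides)
print(null_same_no_sep, null_same_with_sep)
```

A separator only helps if it cannot occur inside the data itself; a value that contains "^" could still produce an ambiguous string, so the delimiter should be chosen with the source data in mind.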
Other considerations 
➢ To generate the most consistent string: standardize! 
➢ Convert data types 
➢ If 'NUMBER', 'NVARCHAR2', 'NVARCHAR', 'NCHAR' 
● THEN 'TO_CHAR(' || column_name || ')' 
➢ If 'RAW' 
● THEN 'ENC_BASE64(' || column_name || ')' 
➢ If 'DATE' 
● THEN 'TO_CHAR(' || column_name || ', ''YYYY-MM-DD'')' 
➢ If LIKE 'TIME%' 
● THEN 'TO_CHAR(' || column_name || ', ''YYYY-MM-DD HH24:MI:SS'')' 
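The conversion rules above read like a small code generator: given a column name and its data type, emit the SQL expression that renders the column as text. A hedged Python sketch of that idea follows. The function name and the sample column list are hypothetical, and ENC_BASE64 is simply the name used on the slide (it appears to be a user-defined helper, not a built-in).

```python
def to_string_expr(column_name, data_type):
    """Return a SQL expression that renders the column as a string,
    following the type rules listed on the slide."""
    data_type = data_type.upper()
    if data_type in ("NUMBER", "NVARCHAR2", "NVARCHAR", "NCHAR"):
        return f"TO_CHAR({column_name})"
    if data_type == "RAW":
        # ENC_BASE64 is the slide's name; assumed to be a user-defined function.
        return f"ENC_BASE64({column_name})"
    if data_type == "DATE":
        return f"TO_CHAR({column_name}, 'YYYY-MM-DD')"
    if data_type.startswith("TIME"):  # mirrors LIKE 'TIME%'
        return f"TO_CHAR({column_name}, 'YYYY-MM-DD HH24:MI:SS')"
    # Any other type (for example VARCHAR2) is used unchanged.
    return column_name


# Hypothetical column metadata, for illustration only.
columns = [
    ("GENERICNAME", "VARCHAR2"),
    ("MED_STRNG_AMT", "NUMBER"),
    ("LOAD_DTS", "TIMESTAMP(6)"),
]
for name, dtype in columns:
    print(to_string_expr(name, dtype))
```

Fixing the format mask explicitly matters because a bare TO_CHAR on a date depends on session settings, which would make the same row hash differently in different sessions.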
Final Input String 
(UPPER(TRIM(T1.GENERICNAME)) 
||'^'|| 
UPPER(TRIM( 
TO_CHAR(T1.MED_STRNG_AMT))) 
||'^'|| 
UPPER(TRIM(T1.UOM_CD)) 
||'^'|| 
UPPER(TRIM(T1.MED_FORM_NM)) 
||'^') 
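For readers outside Oracle, a rough Python equivalent of the input string above is sketched here. The function name and sample values are hypothetical. It strips spaces only (as TRIM does by default), and it assumes str() renders the numeric column the same way TO_CHAR would, which holds for simple integers but not necessarily for every number.

```python
import hashlib


def final_input_string(generic_name, strength_amt, uom_cd, form_nm):
    """Trim, uppercase, and join the four columns with '^',
    keeping the trailing '^' used in the slide's expression."""
    parts = (generic_name, strength_amt, uom_cd, form_nm)
    normalized = [("" if p is None else str(p)).strip(" ").upper() for p in parts]
    return "^".join(normalized) + "^"


# Two hypothetical source rows that differ only in case and padding.
a = final_input_string(" aspirin", 81, "mg ", "Tablet")
b = final_input_string("ASPIRIN", 81, "MG", "TABLET")

hash_a = hashlib.md5(a.encode("utf-8")).hexdigest()
hash_b = hashlib.md5(b.encode("utf-8")).hexdigest()

print(a)
print(hash_a == hash_b)
```

Because both rows normalize to the same string, they yield the same MD5DIFF, so differences in case or padding alone are not reported as changes.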
So what? 
➢ MD5 hash is consistent cross-platform for identical input 
➢ Changes multi-column compares to a single-column 
compare 
➢ Every compare costs the same during the load 
process, regardless of column count 
➢ Can be used with any DW architecture that requires 
change detection 
➢ Virtually no limit 
● Think Big Data/Hadoop/NoSQL 
➢ Can generate the input string automatically 
● But that is another talk! 
Learn more about Data Vault 
www.LearnDataVault.com 
www.danlinstedt.com 
On YouTube: 
www.youtube.com/LearnDataVault 
On Facebook: 
www.facebook.com/learndatavault
Super Charge Your Data Warehouse 
Available on Amazon.com 
Soft Cover or Kindle Format 
Now also available in PDF at 
LearnDataVault.com
Contact Information 
Kent Graziano 
The Oracle Data Warrior 
Data Warrior LLC 
Kent.graziano@att.net 
On Twitter @KentGraziano 
Visit my blog at 
http://kentgraziano.com
