(OTW13) Agile Data Warehousing: Introduction to Data Vault Modeling

This is the presentation I gave at OakTable World 2013 in San Francisco. #OTW13 was held at the Children's Creativity Museum next to the Moscone Convention Center and ran in parallel with Oracle OpenWorld 2013.

The session discussed our attempts to be more agile in designing enterprise data warehouses and how the Data Vault Data Modeling technique helps in that approach.

Published in: Technology, Business

  1. 1. Agile Data Warehouse Modeling: Introduction to Data Vault Modeling Kent Graziano Data Warrior LLC Twitter @KentGraziano
  2. 2. Agenda  Bio  What do we mean by Agile?  What is a Data Vault?  Where does it fit in an Oracle BI architecture?  How to design a Data Vault model  Being “agile”
  3. 3. My Bio  Oracle ACE Director  Certified Data Vault Master and DV 2.0 Architect  Blogger: Oracle Data Warrior  Data Architecture and Data Warehouse Specialist ● 30+ years in IT ● 20+ years of Oracle-related work ● 15+ years of data warehousing experience  Co-Author of ● The Business of Data Vault Modeling ● The Data Model Resource Book (1st Edition)  Editor of “The” Data Vault Book  Past-President of ODTUG and Rocky Mountain Oracle User Group
  4. 4. Manifesto for Agile Software Development  “We are uncovering better ways of developing software by doing it and helping others do it.  Through this work we have come to value:  Individuals and interactions over processes and tools  Working software over comprehensive documentation  Customer collaboration over contract negotiation  Responding to change over following a plan  That is, while there is value in the items on the right, we value the items on the left more.”  http://agilemanifesto.org/
  5. 5. Applying the Agile Manifesto to DW  User Stories instead of requirements documents  Time-boxed iterations ● Iteration has a standard length ● Choose one or more user stories to fit in that iteration  Rework is part of the game ● There are no “missed requirements”... only those that haven’t been delivered or discovered yet.
  6. 6. Data Vault Definition The Data Vault is a detail oriented, historical tracking and uniquely linked set of normalized tables that support one or more functional areas of business. It is a hybrid approach encompassing the best of breed between 3rd normal form (3NF) and star schema. The design is flexible, scalable, consistent and adaptable to the needs of the enterprise. Dan Linstedt: Defining the Data Vault TDAN.com Article Architected specifically to meet the needs of today’s enterprise data warehouses
  7. 7. What is Data Vault Trying to Solve?  What are our other Enterprise Data Warehouse options? ● Third-Normal Form (3NF): Complex primary keys (PKs) with cascading snapshot dates ● Star Schema (Dimensional): Difficult to reengineer fact tables for granularity changes  Difficult to get it right the first time  Not adaptable to rapid business change  NOT AGILE! (C) Kent Graziano
  8. 8. Data Vault Time Line (timeline figure, 1960 to 2000): E.F. Codd invented relational modeling; Chris Date and Hugh Darwen maintained and refined modeling; Mid 60’s: Dimension & Fact Modeling presented by General Mills and Dartmouth University; Early 70’s: Bill Inmon began discussing Data Warehousing; Mid 70’s: AC Nielsen popularized Dimension & Fact terms; 1976: Dr Peter Chen created E-R Diagramming; Mid 80’s: Bill Inmon popularizes Data Warehousing; Mid to late 80’s: Dr Kimball popularizes Star Schema; Late 80’s: Barry Devlin and Dr Kimball release “Business Data Warehouse”; 1990: Dan Linstedt begins R&D on Data Vault Modeling; 2000: Dan Linstedt releases first 5 articles on Data Vault Modeling © LearnDataVault.com
  9. 9. Data Vault Evolution  The work on the Data Vault approach began in the early 1990s and was completed around 1999.  Throughout 1999, 2000, and 2001, the Data Vault design was tested, refined, and deployed into specific customer sites.  In 2002, the industry thought leaders were asked to review the architecture. ● This is when I attended my first DV seminar in Denver and met Dan!  In 2003, Dan began teaching the modeling techniques to the mass public. (C) Kent Graziano
  10. 10. Where does a Data Vault Fit? © LearnDataVault.com
  11. 11. Oracle Information Management Reference Architecture  Staging Layer ● Change tables ● Reject tables for Data Quality ● External tables for file feeds  Foundation Layer ● Transactional granularity maintained ● Process neutral: no user or business requirements ● Just recording what happened  Access and Performance Layer ● Dimensional model ● “Star Schemas” ● Process specific: targeting user and business requirements
  12. 12. Where does Data Vault fit? Data Vault goes here
  13. 13. What is a Foundation Layer?  Basis for long term enterprise scale data warehouse  Must be atomic level data ● A historical source of facts  Not based on any one data source or system  Single point of integration  Flexible  Extensible  Provides data to the access/reporting layer (C) Kent Graziano
  14. 14. How to be Agile using DV and Oracle  Model iteratively ● Use Data Vault data modeling technique ● Create basic components, then add over time  Virtualize the Access Layer ● Don’t waste time building facts and dimensions up front ● ETL and testing takes too long ● “Project” objects using pattern-based DV model with OBIEE BMM or Oracle Views  Users see real reports with real data (C) Kent Graziano
  15. 15. Data Vault: 3 Simple Structures © LearnDataVault.com
  16. 16. Data Vault Core Architecture  Hubs = Unique List of Business Keys  Links = Unique List of Relationships across keys  Satellites = Descriptive Data  Satellites have one and only one parent table  Satellites cannot be “Parents” to other tables  Hubs cannot be child tables © LearnDataVault.com
  17. 17. 1. Hub = Business Keys Hubs = Unique Lists of Business Keys Business Keys are used to TRACK and IDENTIFY key information (C) Kent Graziano
  18. 18. Hub Definition  What Makes a Hub Key? ● A Hub is based on an identifiable business key. ● An identifiable business key is an attribute that is used in the source systems to locate data. ● The business key has a very low propensity to change, and usually is not editable on the source systems. ● The business key has the same semantic meaning, and the same granularity across the company, but not necessarily the same format.  Attributes and Ordering ● All attributes are mandatory. ● Sequence ID 1st, Business Key 2nd, Load Date 3rd, Record Source last (4th). ● All attributes in the Business Key form a UNIQUE Index. © LearnDataVault.com
  19. 19. 2: Links = Associations Links = Transactions and Associations They are used to hook together multiple sets of information (C) Kent Graziano
  20. 20. Link Definitions  What Makes a Link? ● A Link is based on identifiable business element relationships. ● Otherwise known as a foreign key. ● AKA a business event or transaction between business keys. ● The relationship shouldn’t change over time. ● It is established as a fact that occurred at a specific point in time and will remain that way forever. ● The link table may also represent a hierarchy.  Attributes ● All attributes are mandatory. (C) LearnDataVault.com
  21. 21. Modeling Links - 1:1 or 1:M?  Today: ● Relationship is a 1:1 so why model a Link?  Tomorrow: ● The business rule can change to a 1:M. ● You discover new data later.  With a Link in the Data Vault: ● No need to change the EDW structure. ● Existing data is fine. ● New data is added. (C) Kent Graziano
  22. 22. 3. Satellites = Descriptors Satellites provide context for the Hubs and the Links (C) Kent Graziano
  23. 23. Satellite Definitions  What Makes a Satellite? ● A Satellite is based on non-identifying business elements. ● The Satellite data changes, sometimes rapidly, sometimes slowly. ● The Satellite is dependent on the Hub or Link key as a parent. ● Satellites are never dependent on more than one parent table. ● The Satellite is never a parent table to any other table (no snowflaking).  Attributes and Ordering ● All attributes are mandatory – EXCEPT END DATE. ● Parent ID 1st, Load Date 2nd, Load End Date 3rd, Record Source last. (C) LearnDataVault.com
  24. 24. Satellite Entity - Details  A Satellite has only 1 foreign key; it is dependent on the parent table (Hub or Link).  A Satellite may or may not have an “Item Numbering” attribute.  A Satellite’s Load Date represents the date the EDW saw the data (must be a delta set). ● This is not Effective Date from the Source!  A Satellite’s Record Source represents the actual source of the row (unit of work).  To avoid Outer Joins, you must ensure that every satellite has at least 1 entry for every Hub Key. (C) LearnDataVault.com
  25. 25. Data Vault Model Flexibility (Agility)  Goes beyond standard 3NF • Hyper normalized ● Hubs and Links only hold keys and metadata ● Satellites split by rate of change and/or source • Enables Agile data modeling ● Easy to add to model without having to change existing structures and load routines • Relationships (links) can be dropped and created on-demand. ● No more reloading history because of a missed requirement  Based on natural business keys • Not system surrogate keys • Allows for integrating data across functions and source systems more easily ● All data relationships are key driven. © LearnDataVault.com
  26. 26. Data Vault Extensibility Adding new components to the EDW has NEAR ZERO impact to: • Existing Loading Processes • Existing Data Model • Existing Reporting & BI Functions • Existing Source Systems • Existing Star Schemas and Data Marts © LearnDataVault.com
  27. 27. Data Vault Productivity  Standardized modeling rules • Highly repeatable and learnable modeling technique • Can standardize load routines ● Delta Driven process ● Re-startable, consistent loading patterns. • Can standardize extract routines ● Rapid build of new or revised Data Marts • Can be automated ‣ Can use a BI-meta layer to virtualize the reporting structures ‣ Example: OBIEE Business Model and Mapping tool ‣ Can put views on the DV structures as well ‣ Simulate ODS/3NF or Star Schemas (C) Kent Graziano
  28. 28. Data Vault Adaptability • The Data Vault holds granular historical relationships. • Holds all history for all time, allowing any source system feeds to be reconstructed on-demand • Easy generation of Audit Trails for data lineage and compliance. • Data Mining can discover new relationships between elements • Patterns of change emerge from the historical pictures and linkages. • The Data Vault can be accessed by power-users © LearnDataVault.com
  29. 29. Other Benefits of a Data Vault  Modeling it as a DV forces integration of the Business Keys upfront. • Good for organizational alignment.  An integrated data set with raw data extends its value beyond BI: • Source for data quality projects • Source for master data • Source for data mining • Source for Data as a Service (DaaS) in an SOA (Service Oriented Architecture).  Upfront Hub integration simplifies the data integration routines required to load data marts. • Helps divide the work a bit.  It is much easier to implement security on these granular pieces.  Granular, re-startable processes enable pin-point failure correction.  It is designed and optimized for real-time loading in its core architecture (without any tweaks or mods). © LearnDataVault.com
  30. 30. World’s Smallest Data Vault  The Data Vault doesn’t have to be “BIG”.  A Data Vault can be built incrementally.  Reverse engineering one component of the existing models is not uncommon.  Building one part of the Data Vault, then changing the marts to feed from that vault is a best practice.  The smallest Enterprise Data Warehouse consists of two tables: ● One Hub: Hub Customer (Hub_Cust_Seq_ID, Hub_Cust_Num, Hub_Cust_Load_DTS, Hub_Cust_Rec_Src) ● One Satellite: Satellite Customer Name (Hub_Cust_Seq_ID, Sat_Cust_Load_DTS, Sat_Cust_Load_End_DTS, Sat_Cust_Name, Sat_Cust_Rec_Src) © LearnDataVault.com
  31. 31. Notably…  In 2008 Bill Inmon stated that the “Data Vault is the optimal approach for modeling the EDW in the DW2.0 framework.” (DW2.0)  The number of Data Vault users in the US surpassed 500 in 2010 and grows rapidly (http://danlinstedt.com/about/dv-customers/)
  32. 32. Organizations using Data Vault  WebMD Health Services  Anthem Blue-Cross Blue Shield  MD Anderson Cancer Center  Denver Public Schools  Independent Purchasing Cooperative (IPC, Miami) • Owner of Subway  Kaplan  US Defense Department  Colorado Springs Utilities  State Court of Wyoming  Federal Express  US Dept. of Agriculture
  33. 33. What’s New in DV2.0?  Modeling Structure Includes… ● NoSQL, and Non-Relational DB systems, Hybrid Systems ● Minor Structure Changes to support NoSQL  New ETL Implementation Standards ● For true real-time support ● For NoSQL support  New Architecture Standards ● To include support for NoSQL data management systems  New Methodology Components ● Including CMMI, Six Sigma, and TQM ● Including Project Planning, Tracking, and Oversight ● Agile Delivery Mechanisms ● Standards, and templates for Projects © LearnDataVault.com
  34. 34. Conclusion? Changing the direction of the river takes less effort than stopping the flow of water © LearnDataVault.com
  35. 35. Summary • Data Vault provides a data modeling technique that allows: ‣ Model Agility ‣ Enabling rapid changes and additions ‣ Productivity ‣ Enabling low complexity systems with high value output at a rapid pace ‣ Easy projections of dimensional models ‣ So? Agile Data Warehousing?
  36. 36. Super Charge Your Data Warehouse Available on Amazon.com Soft Cover or Kindle Format Now also available in PDF at LearnDataVault.com Hint: Kent is the Technical Editor
  37. 37. Data Vault References www.learndatavault.com www.danlinstedt.com On LinkedIn: http://www.linkedin.com/groups?gid=44926 On YouTube: www.youtube.com/LearnDataVault On Facebook: www.facebook.com/learndatavault
  38. 38. Contact Information Kent Graziano The Oracle Data Warrior Data Warrior LLC Kent.graziano@att.net Visit my blog at http://kentgraziano.com
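
The two-table "smallest Data Vault" on slide 30, together with the Hub and Satellite rules on slides 18, 23 and 24, can be sketched roughly as follows. This is only an illustration, not the presenter's implementation: it uses Python's built-in sqlite3 as a stand-in for Oracle, the column names come from slide 30, and the data types, constraints, sample values and the `load_customer` helper are assumptions made for the example.

```python
import sqlite3

# Column names follow slide 30; types and constraints are assumptions.
DDL = """
CREATE TABLE hub_customer (
    hub_cust_seq_id   INTEGER PRIMARY KEY,   -- sequence ID (1st)
    hub_cust_num      TEXT NOT NULL UNIQUE,  -- business key (2nd), unique index
    hub_cust_load_dts TEXT NOT NULL,         -- load date (3rd)
    hub_cust_rec_src  TEXT NOT NULL          -- record source (last)
);
CREATE TABLE sat_customer_name (
    hub_cust_seq_id       INTEGER NOT NULL
        REFERENCES hub_customer (hub_cust_seq_id),  -- the single parent key
    sat_cust_load_dts     TEXT NOT NULL,  -- when the EDW saw the row
    sat_cust_load_end_dts TEXT,           -- only optional attribute
    sat_cust_name         TEXT NOT NULL,
    sat_cust_rec_src      TEXT NOT NULL,
    PRIMARY KEY (hub_cust_seq_id, sat_cust_load_dts)
);
"""


def load_customer(conn, cust_num, name, load_dts, rec_src):
    """Hypothetical delta-driven load: add the hub key once, version the name."""
    # Hub: the business key is inserted only the first time it is seen.
    conn.execute(
        "INSERT OR IGNORE INTO hub_customer "
        "(hub_cust_num, hub_cust_load_dts, hub_cust_rec_src) VALUES (?, ?, ?)",
        (cust_num, load_dts, rec_src),
    )
    hub_id = conn.execute(
        "SELECT hub_cust_seq_id FROM hub_customer WHERE hub_cust_num = ?",
        (cust_num,),
    ).fetchone()[0]

    # Satellite: compare against the current (open-ended) row.
    current = conn.execute(
        "SELECT sat_cust_name FROM sat_customer_name "
        "WHERE hub_cust_seq_id = ? AND sat_cust_load_end_dts IS NULL",
        (hub_id,),
    ).fetchone()
    if current is not None and current[0] == name:
        return hub_id  # no delta, nothing to store

    # End-date the old row (if any), then insert the new version.
    conn.execute(
        "UPDATE sat_customer_name SET sat_cust_load_end_dts = ? "
        "WHERE hub_cust_seq_id = ? AND sat_cust_load_end_dts IS NULL",
        (load_dts, hub_id),
    )
    conn.execute(
        "INSERT INTO sat_customer_name VALUES (?, ?, NULL, ?, ?)",
        (hub_id, load_dts, name, rec_src),
    )
    return hub_id


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(DDL)
    load_customer(conn, "C-100", "Acme", "2013-09-22", "CRM")
    load_customer(conn, "C-100", "Acme", "2013-09-23", "CRM")       # unchanged
    load_customer(conn, "C-100", "Acme Corp", "2013-09-24", "CRM")  # changed
    for row in conn.execute(
        "SELECT * FROM sat_customer_name ORDER BY sat_cust_load_dts"
    ):
        print(row)
```

In this sketch the hub row is written once per business key, while the satellite keeps every version of the name, with only the current row left open-ended.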

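Slides 14, 21 and 27 describe modeling relationships as Links and "projecting" dimensional structures with views instead of building them up front. A minimal sketch of that idea is shown here, assuming Python's built-in sqlite3 in place of Oracle or the OBIEE BMM layer; every table, view and column name and all sample data are hypothetical and chosen only for illustration.

```python
import sqlite3

# Hypothetical schema: two hubs, one link, one satellite, two projecting views.
SCHEMA = """
CREATE TABLE hub_customer (
    cust_seq_id INTEGER PRIMARY KEY,
    cust_num    TEXT NOT NULL UNIQUE,
    load_dts    TEXT NOT NULL,
    rec_src     TEXT NOT NULL
);
CREATE TABLE hub_account (
    acct_seq_id INTEGER PRIMARY KEY,
    acct_num    TEXT NOT NULL UNIQUE,
    load_dts    TEXT NOT NULL,
    rec_src     TEXT NOT NULL
);
-- The link holds only keys and load metadata, so the same structure
-- serves a 1:1 relationship today and a 1:M relationship tomorrow.
CREATE TABLE link_customer_account (
    link_seq_id INTEGER PRIMARY KEY,
    cust_seq_id INTEGER NOT NULL REFERENCES hub_customer (cust_seq_id),
    acct_seq_id INTEGER NOT NULL REFERENCES hub_account (acct_seq_id),
    load_dts    TEXT NOT NULL,
    rec_src     TEXT NOT NULL,
    UNIQUE (cust_seq_id, acct_seq_id)
);
CREATE TABLE sat_customer_name (
    cust_seq_id  INTEGER NOT NULL REFERENCES hub_customer (cust_seq_id),
    load_dts     TEXT NOT NULL,
    load_end_dts TEXT,
    cust_name    TEXT NOT NULL,
    rec_src      TEXT NOT NULL,
    PRIMARY KEY (cust_seq_id, load_dts)
);
-- Virtual "dimension": current customer attributes projected from hub + satellite.
CREATE VIEW dim_customer AS
SELECT h.cust_seq_id AS customer_key,
       h.cust_num    AS customer_number,
       s.cust_name   AS customer_name
FROM hub_customer h
JOIN sat_customer_name s ON s.cust_seq_id = h.cust_seq_id
WHERE s.load_end_dts IS NULL;
-- Virtual report structure: accounts per customer, resolved through the link.
CREATE VIEW rpt_customer_account AS
SELECT d.customer_name, a.acct_num AS account_number
FROM dim_customer d
JOIN link_customer_account l ON l.cust_seq_id = d.customer_key
JOIN hub_account a ON a.acct_seq_id = l.acct_seq_id;
"""


def build_demo():
    """Create an in-memory vault and load a few illustrative rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO hub_customer VALUES (1, 'C-100', '2013-09-22', 'CRM')")
    conn.executemany(
        "INSERT INTO sat_customer_name VALUES (?, ?, ?, ?, ?)",
        [
            (1, "2013-09-22", "2013-09-24", "Acme", "CRM"),  # superseded name
            (1, "2013-09-24", None, "Acme Corp", "CRM"),     # current name
        ],
    )
    # Today: one account per customer (looks like 1:1).
    conn.execute("INSERT INTO hub_account VALUES (1, 'A-1', '2013-09-22', 'BILLING')")
    conn.execute(
        "INSERT INTO link_customer_account VALUES (1, 1, 1, '2013-09-22', 'BILLING')"
    )
    # Tomorrow: a second account arrives (1:M); no table is altered.
    conn.execute("INSERT INTO hub_account VALUES (2, 'A-2', '2013-09-25', 'BILLING')")
    conn.execute(
        "INSERT INTO link_customer_account VALUES (2, 1, 2, '2013-09-25', 'BILLING')"
    )
    return conn


if __name__ == "__main__":
    demo = build_demo()
    for row in demo.execute(
        "SELECT * FROM rpt_customer_account ORDER BY account_number"
    ):
        print(row)
```

Because the views are only definitions, they can be dropped and recreated as reporting needs change without reloading any data in the hubs, links or satellites.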