Why Data Vault?



Given at Oracle OpenWorld 2011: Not to be confused with Oracle Database Vault (a commercial database security product), Data Vault Modeling is a specific data modeling technique for designing highly flexible, scalable, and adaptable data structures for enterprise data warehouse repositories. It has been in use globally for over 10 years but is not widely known. The purpose of this presentation is to provide an overview of the features of a Data Vault modeled EDW that distinguish it from the more traditional third normal form (3NF) or dimensional (i.e., star schema) modeling approaches used in most shops today. Topics will include dealing with evolving data requirements in an EDW (i.e., model agility), partitioning of data elements based on rate of change (and how that affects load speed and storage requirements), and where it fits in a typical Oracle EDW architecture. See more content like this on my blog at http://kentgraziano.com or follow me on Twitter @kentgraziano.

Published in: Technology, Business

Why Data Vault?

  1. 1. Why Data Vault? Kent Graziano, Data Vault Master and Oracle ACE, TrueBridge Resources, OOW 2011 Session #28782
  2. 2. My Bio • Kent Graziano – Certified Data Vault Master – Oracle ACE (BI/DW) – Data Architecture and Data Warehouse Specialist • 30 years in IT • 20 years of Oracle-related work • 15+ years of data warehousing experience – Co-Author of • The Business of Data Vault Modeling (2008) • The Data Model Resource Book (1st Edition) • Oracle Designer: A Template for Developing an Enterprise Standards Document – Past-President of Oracle Development Tools User Group (ODTUG) and Rocky Mountain Oracle User Group – Co-Chair BIDW SIG for ODTUG
  3. 3. Data Vault Definition: The Data Vault is a detail oriented, historical tracking and uniquely linked set of normalized tables that support one or more functional areas of business. It is a hybrid approach encompassing the best of breed between 3rd normal form (3NF) and star schema. The design is flexible, scalable, consistent, and adaptable to the needs of the enterprise. It is a data model that is architected specifically to meet the needs of today’s enterprise data warehouses. Dan Linstedt: Defining the Data Vault, TDAN.com Article (C) TeachDataVault.com
  4. 4. Where does a Data Vault Fit? (C) TeachDataVault.com
  5. 5. Where does a Data Vault Fit? Oracle’s Next Generation Data Warehouse Reference Architecture: Data Vault goes here (C) Oracle Corp
  6. 6. Why Bother With Something New? Old Chinese proverb: Unless you change direction, you're apt to end up where you're headed. (C) TeachDataVault.com
  7. 7. Why do we need it? • We have seen issues in constructing (and managing) an enterprise data warehouse model using 3rd normal form or Star Schema. – 3NF – Complex PKs with cascading snapshot dates (time-driven PKs) – Star – difficult to re-engineer fact tables for granularity changes • These issues lead to breakdowns in flexibility, adaptability, and even scalability (C) Kent Graziano
  8. 8. Data Vault Time Line (timeline figure, axis 1960 to 2000): E.F. Codd invented relational modeling; Chris Date and Hugh Darwen maintained and refined modeling; Mid 60’s: Dimension & Fact Modeling presented by General Mills and Dartmouth University; Early 70’s: Bill Inmon began discussing Data Warehousing; Mid 70’s: AC Nielsen popularized Dimension & Fact terms; 1976: Dr Peter Chen created E-R Diagramming; Mid 80’s: Bill Inmon popularizes Data Warehousing; Mid – Late 80’s: Dr Kimball popularizes Star Schema; Late 80’s: Barry Devlin and Dr Kimball release “Business Data Warehouse”; 1990: Dan Linstedt begins R&D on Data Vault Modeling; 2000: Dan Linstedt releases first 5 articles on Data Vault Modeling (C) TeachDataVault.com
  9. 9. Data Vault Modeling… (C) TeachDataVault.com
  10. 10. What Are the Issues? This is NOT what you want happening to your project! (C) TeachDataVault.com THE GAP!!
  11. 11. What Are the Foundational Keys? Flexibility, Scalability, Productivity (C) TeachDataVault.com
  12. 12. Key: Flexibility (Agility): Enabling rapid change on a massive scale without downstream impacts! (C) TeachDataVault.com
  13. 13. Key: Scalability: Providing no foreseeable barrier to increased size and scope. People, Process, & Architecture! (C) TeachDataVault.com
  14. 14. Key: Productivity: Enabling low complexity systems with high value output at a rapid pace (C) TeachDataVault.com
  15. 15. Bringing the Data Vault to Your Project: HOW DOES IT WORK? (C) TeachDataVault.com
  16. 16. Key: Flexibility (Agility) • Goes beyond standard 3NF • Hyper-normalized • Hubs and Links hold only keys and metadata • Satellites split by rate of change and/or source • Enables Agile data modeling • Easy to add to the model without having to change existing structures and load routines • Relationships (links) can be dropped and created on demand • No more reloading history because of a missed requirement • Based on natural business keys • Not system surrogate keys • Allows for integrating data across functions and source systems more easily • All data relationships are key driven (C) TeachDataVault.com
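To make the hub, link, and satellite roles on this slide concrete, here is a minimal sketch using Python's built-in `sqlite3` module. Every table and column name (`hub_customer`, `link_purchase`, `sat_customer_contact`, and so on) is a hypothetical illustration and not taken from the presentation. For brevity it keys every table directly on the natural business key.

```python
import sqlite3

# All table and column names below are hypothetical illustrations.
conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hubs: one row per natural business key, plus load metadata only.
CREATE TABLE hub_customer (
    customer_bk   TEXT PRIMARY KEY,
    load_date     TEXT NOT NULL,
    record_source TEXT NOT NULL
);
CREATE TABLE hub_product (
    product_bk    TEXT PRIMARY KEY,
    load_date     TEXT NOT NULL,
    record_source TEXT NOT NULL
);
-- Link: a relationship between hubs, again keys and metadata only.
CREATE TABLE link_purchase (
    customer_bk   TEXT NOT NULL REFERENCES hub_customer (customer_bk),
    product_bk    TEXT NOT NULL REFERENCES hub_product (product_bk),
    load_date     TEXT NOT NULL,
    record_source TEXT NOT NULL,
    PRIMARY KEY (customer_bk, product_bk)
);
-- Satellites: descriptive attributes, split by rate of change.
CREATE TABLE sat_customer_profile (      -- changes rarely
    customer_bk   TEXT NOT NULL REFERENCES hub_customer (customer_bk),
    load_date     TEXT NOT NULL,
    record_source TEXT NOT NULL,
    full_name     TEXT,
    PRIMARY KEY (customer_bk, load_date)
);
CREATE TABLE sat_customer_contact (      -- changes often
    customer_bk   TEXT NOT NULL REFERENCES hub_customer (customer_bk),
    load_date     TEXT NOT NULL,
    record_source TEXT NOT NULL,
    email         TEXT,
    PRIMARY KEY (customer_bk, load_date)
);
""")

conn.execute("INSERT INTO hub_customer VALUES ('CUST-001', '2011-10-01', 'CRM')")
conn.execute("INSERT INTO hub_product VALUES ('PROD-042', '2011-10-01', 'ERP')")
conn.execute("INSERT INTO link_purchase VALUES ('CUST-001', 'PROD-042', '2011-10-01', 'ERP')")
conn.execute("INSERT INTO sat_customer_profile VALUES ('CUST-001', '2011-10-01', 'CRM', 'Pat Example')")
conn.execute("INSERT INTO sat_customer_contact VALUES ('CUST-001', '2011-10-01', 'CRM', 'pat@example.com')")
# A later email change adds a row only to the fast-changing satellite.
conn.execute("INSERT INTO sat_customer_contact VALUES ('CUST-001', '2011-10-05', 'CRM', 'pat.new@example.com')")


def row_count(table):
    """Count rows in one of the tables created above."""
    return conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]


profile_rows = row_count("sat_customer_profile")
contact_rows = row_count("sat_customer_contact")
print(profile_rows, contact_rows)
```

Splitting the satellites this way means a frequently changing attribute creates new history rows only in its own table, while the slow-changing attributes are not repeated.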
  17. 17. Key: Flexibility (Agility): Adding new components to the EDW has NEAR ZERO impact to: • Existing Loading Processes • Existing Data Model • Existing Reporting & BI Functions • Existing Source Systems • Existing Star Schemas and Data Marts (C) TeachDataVault.com
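One way to picture the "near zero impact" claim is to add a new satellite for a new requirement and check that the existing table definitions are left exactly as they were. The sketch below does this with Python's `sqlite3`. The loyalty satellite, its columns, and the source names are assumptions made up for the example.

```python
import sqlite3

# Hypothetical existing model: one hub and one satellite.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE hub_customer (
    customer_bk   TEXT PRIMARY KEY,
    load_date     TEXT NOT NULL,
    record_source TEXT NOT NULL
);
CREATE TABLE sat_customer_contact (
    customer_bk   TEXT NOT NULL,
    load_date     TEXT NOT NULL,
    record_source TEXT NOT NULL,
    email         TEXT,
    PRIMARY KEY (customer_bk, load_date)
);
INSERT INTO hub_customer VALUES ('CUST-001', '2011-10-01', 'CRM');
INSERT INTO sat_customer_contact
    VALUES ('CUST-001', '2011-10-01', 'CRM', 'pat@example.com');
""")


def table_definitions(connection):
    """Return {table_name: CREATE TABLE statement} for every table."""
    rows = connection.execute(
        "SELECT name, sql FROM sqlite_master WHERE type = 'table'"
    ).fetchall()
    return dict(rows)


before = table_definitions(conn)

# New requirement: loyalty data from a new (hypothetical) source system.
# It arrives as an additional satellite hanging off the existing hub.
conn.executescript("""
CREATE TABLE sat_customer_loyalty (
    customer_bk   TEXT NOT NULL,
    load_date     TEXT NOT NULL,
    record_source TEXT NOT NULL,
    tier          TEXT,
    PRIMARY KEY (customer_bk, load_date)
);
INSERT INTO sat_customer_loyalty
    VALUES ('CUST-001', '2011-10-10', 'LOYALTY', 'gold');
""")

after = table_definitions(conn)
existing_unchanged = all(after[name] == sql for name, sql in before.items())
new_tables = sorted(set(after) - set(before))
print(existing_unchanged, new_tables)
```

Because the change is purely additive, load routines and queries written against the original tables keep working without modification in this sketch.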
  18. 18. Split and Merge ON DEMAND! 2 weeks from now 6 months from now (C) TeachDataVault.com
  19. 19. Case In Point: The flexibility of the Data Vault Model allowed them to merge 3 companies in 90 days – that is ALL systems, ALL DATA! (C) TeachDataVault.com
  20. 20. Key: Scalability in Architecture: Scaling is easy; it's based on the following principles • Hub and spoke design • MPP Shared-Nothing Architecture • Scale Free Networks • Can be partitioned vertically and horizontally to meet performance demands (C) TeachDataVault.com
  21. 21. Perhaps You Wish To Split For Performance Reasons? FROM THIS TO THIS! (C) TeachDataVault.com
  22. 22. Case In Point: Result of scalability was to produce a Data Vault model that scaled to 3 Petabytes in size, and is still growing today! (C) TeachDataVault.com
  23. 23. Key: Scalability in Team Size: You should be able to SCALE your TEAM as well! With the Data Vault methodology, you can: Scale your team when desired, at different points in the project! (C) TeachDataVault.com
  24. 24. Case In Point: (Dutch Tax Authority) Result of scalability was to increase ETL developers for each new source system, and reassign them when the system was completely loaded to the Data Vault (C) TeachDataVault.com
  25. 25. Key: Productivity: Increasing Productivity requires a reduction in complexity. The Data Vault Model simplifies all of the following: • ETL Loading Routines • Real-Time Ingestion of Data • Data Modeling for the EDW • Enhancing and Adapting for Change to the Model • Ease of Monitoring, managing and optimizing processes (C) TeachDataVault.com
  26. 26. Key: Productivity • Standardized modeling rules • Highly repeatable and learnable modeling technique • Can standardize load routines • Delta Driven process • Re-startable, consistent loading patterns • Can standardize extract routines • Rapid build of new or revised Data Marts • Can be automated • RapidACE (www.rapidace.com) (C) Kent Graziano
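A delta-driven, re-startable load routine of the kind listed on this slide might look like the following sketch. It is one possible pattern under stated assumptions, not the presenter's implementation: the function `load_contact_batch`, the tables, and the sample data are all hypothetical. A satellite row is written only when the incoming attribute differs from the latest stored value, so rerunning the same batch inserts nothing.

```python
import sqlite3

# Hypothetical hub and satellite for the example.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE hub_customer (
    customer_bk   TEXT PRIMARY KEY,
    load_date     TEXT NOT NULL,
    record_source TEXT NOT NULL
);
CREATE TABLE sat_customer_contact (
    customer_bk   TEXT NOT NULL,
    load_date     TEXT NOT NULL,
    record_source TEXT NOT NULL,
    email         TEXT,
    PRIMARY KEY (customer_bk, load_date)
);
""")


def load_contact_batch(connection, batch, load_date, record_source):
    """Load (business_key, email) pairs; return the number of new satellite rows.

    The whole batch runs in one transaction, so a failure rolls back and
    the batch can simply be run again.
    """
    inserted = 0
    with connection:  # commits on success, rolls back on an exception
        for customer_bk, email in batch:
            # Hub: insert the business key only if it is not already known.
            connection.execute(
                "INSERT OR IGNORE INTO hub_customer VALUES (?, ?, ?)",
                (customer_bk, load_date, record_source),
            )
            # Satellite: compare against the most recent stored value.
            latest = connection.execute(
                "SELECT email FROM sat_customer_contact "
                "WHERE customer_bk = ? ORDER BY load_date DESC LIMIT 1",
                (customer_bk,),
            ).fetchone()
            if latest is None or latest[0] != email:
                connection.execute(
                    "INSERT INTO sat_customer_contact VALUES (?, ?, ?, ?)",
                    (customer_bk, load_date, record_source, email),
                )
                inserted += 1
    return inserted


day_one = [("CUST-001", "pat@example.com"), ("CUST-002", "sam@example.com")]
day_two = [("CUST-001", "pat.new@example.com"), ("CUST-002", "sam@example.com")]

first_run = load_contact_batch(conn, day_one, "2011-10-01", "CRM")
rerun = load_contact_batch(conn, day_one, "2011-10-01", "CRM")       # restart
second_day = load_contact_batch(conn, day_two, "2011-10-02", "CRM")  # one delta
print(first_run, rerun, second_day)
```

The same compare-then-insert shape applies to every satellite, which is what makes the load routines easy to standardize and generate.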
  27. 27. Key: Productivity • The Data Vault holds granular historical relationships. • Holds all history for all time, allowing any source system feeds to be reconstructed on demand • Easy generation of Audit Trails for data lineage and compliance. • Data Mining can discover new relationships between elements • Patterns of change emerge from the historical pictures and linkages. • The Data Vault can be accessed by power-users (C) Kent Graziano
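Because satellites keep every version of a record, the state of the data as of any past date can be rebuilt with a query. The sketch below assumes a hypothetical `sat_customer_contact` table with ISO-formatted `load_date` strings. The helper `state_as_of` is an illustration, not part of any Data Vault tool.

```python
import sqlite3

# Hypothetical satellite holding full history for two customers.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sat_customer_contact (
    customer_bk   TEXT NOT NULL,
    load_date     TEXT NOT NULL,   -- ISO dates sort correctly as text
    record_source TEXT NOT NULL,
    email         TEXT,
    PRIMARY KEY (customer_bk, load_date)
);
INSERT INTO sat_customer_contact VALUES
    ('CUST-001', '2011-01-01', 'CRM', 'pat.old@example.com'),
    ('CUST-001', '2011-06-01', 'CRM', 'pat.new@example.com'),
    ('CUST-002', '2011-03-01', 'CRM', 'sam@example.com');
""")


def state_as_of(connection, as_of):
    """Return [(customer_bk, email)] as the data stood on the given date."""
    return connection.execute(
        """
        SELECT s.customer_bk, s.email
        FROM sat_customer_contact AS s
        WHERE s.load_date = (
            SELECT MAX(h.load_date)
            FROM sat_customer_contact AS h
            WHERE h.customer_bk = s.customer_bk
              AND h.load_date <= ?
        )
        ORDER BY s.customer_bk
        """,
        (as_of,),
    ).fetchall()


def audit_trail(connection, customer_bk):
    """Return every stored version for one business key, oldest first."""
    return connection.execute(
        "SELECT load_date, record_source, email FROM sat_customer_contact "
        "WHERE customer_bk = ? ORDER BY load_date",
        (customer_bk,),
    ).fetchall()


early = state_as_of(conn, "2011-02-01")
late = state_as_of(conn, "2011-12-31")
trail = audit_trail(conn, "CUST-001")
print(early, late, len(trail))
```

The `record_source` column carried on each row is what lets the audit trail tie every version back to the feed it came from.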
  28. 28. Case in Point: Result of Productivity was: 2 people in 2 weeks merged 3 systems, built a full Data Vault EDW, 5 star schemas and 3 reports. These individuals generated: • 90% of the ETL code for moving the data set • 100% of the Staging Data Model • 75% of the finished EDW data Model • 75% of the star schema data model (C) TeachDataVault.com
  29. 29. The Competing Bid? The competition bid this with 15 people and 3 months to completion, at a cost of $250k! (They bid a very complex system.) Actual total cost? $30k and 2 weeks! (C) TeachDataVault.com
  30. 30. Other Benefits of a Data Vault • Modeling it as a DV forces integration of the Business Keys upfront. • Good for organizational alignment. • An integrated data set with raw data extends its value beyond BI: • Source for data quality projects • Source for master data • Source for data mining • Source for Data as a Service (DaaS) in an SOA (Service Oriented Architecture). • Upfront Hub integration simplifies the data integration routines required to load data marts. • Helps divide the work a bit. • It is much easier to implement security on these granular pieces. • Granular, re-startable processes enable pin-point failure correction. • It is designed and optimized for real-time loading in its core architecture (without any tweaks or mods). (C) Kent Graziano
  31. 31. Conclusion? Changing the direction of the river takes less effort than stopping the flow of water (C) TeachDataVault.com
  32. 32. The Experts Say… “The Data Vault is the optimal choice for modeling the EDW in the DW 2.0 framework.” Bill Inmon “The Data Vault is foundationally strong and exceptionally scalable architecture.” Stephen Brobst “The Data Vault is a technique which some industry experts have predicted may spark a revolution as the next big thing in data modeling for enterprise warehousing....” Doug Laney
  33. 33. More Notables… “This enables organizations to take control of their data warehousing destiny, supporting better and more relevant data warehouses in less time than before.” Howard Dresner “[The Data Vault] captures a practical body of knowledge for data warehouse development which both agile and traditional practitioners will benefit from..” Scott Ambler
  34. 34. Who’s Using It?
  35. 35. Growing Adoption… • The number of Data Vault users in the US surpassed 500 in 2010 and grows rapidly (http://danlinstedt.com/about/dv-customers/) (C) Kent Graziano
  36. 36. In Review… • Data Vault provides you with the tools you need to succeed in your DW/BI projects • Flexibility • Enabling rapid change on a massive scale without downstream impacts! • Scalability • Providing no foreseeable barrier to increased size and scope • Productivity • Enabling low complexity systems with high value output at a rapid pace (C) TeachDataVault.com
  38. 38. Where To Learn More: The Technical Modeling Book: http://LearnDataVault.com On YouTube: http://www.youtube.com/LearnDataVault On Facebook: www.facebook.com/learndatavault Dan’s Blog: www.danlinstedt.com The Discussion Forums: http://LinkedIn.com – Data Vault Discussions Worldwide User Group (Free): http://dvusergroup.com The Business of Data Vault Modeling by Dan Linstedt, Kent Graziano, Hans Hultgren (available at www.lulu.com)
  39. 39. 10/11/2011 (C) TeachDataVault.com
  40. 40. Contact Information: Kent Graziano, Kent.graziano@att.net. Want more Data Vault? Session # 05923: Introduction to Data Vault Modeling, Thursday, 4:00 PM, Moscone South Rm 303