Why Data Vault?
          Kent Graziano
Data Vault Master and Oracle ACE
      TrueBridge Resources
           OOW 2011
         Session #28782
My Bio
• Kent Graziano
   – Certified Data Vault Master
   – Oracle ACE (BI/DW)
   – Data Architecture and Data Warehouse Specialist
      • 30 years in IT
      • 20 years of Oracle-related work
      • 15+ years of data warehousing experience
   – Co-Author of
      • The Business of Data Vault Modeling (2008)
      • The Data Model Resource Book (1st Edition)
      • Oracle Designer: A Template for Developing an Enterprise
        Standards Document
   – Past-President of Oracle Development Tools User Group
     (ODTUG) and Rocky Mountain Oracle User Group
   – Co-Chair BIDW SIG for ODTUG
Data Vault Definition
The Data Vault is a detail oriented, historical
tracking and uniquely linked set of normalized
tables that support one or more functional areas
of business.

It is a hybrid approach encompassing the best of
breed between 3rd normal form (3NF) and star
schema. The design is flexible, scalable, consistent,
and adaptable to the needs of the enterprise. It is a
data model that is architected specifically to meet
the needs of today’s enterprise data warehouses.

                       Dan Linstedt: Defining the Data Vault
                       TDAN.com Article



                       (C) TeachDataVault.com
Where does a Data Vault Fit?




          (C) TeachDataVault.com
Where does a Data Vault Fit?
Oracle’s Next Generation Data Warehouse Reference Architecture




                              Data Vault goes here

                             (C) Oracle Corp
Why Bother With Something New?
       Old Chinese proverb:
       'Unless you change direction, you're
       apt to end up where you're headed.'




                (C) TeachDataVault.com
Why do we need it?

• We have seen issues in constructing (and
  managing) an enterprise data warehouse model
  using 3rd normal form, or Star Schema.
   – 3NF – Complex PKs with cascading snapshot
     dates (time-driven PKs)
   – Star – difficult to re-engineer fact tables for
     granularity changes
• These issues lead to break downs in
  flexibility, adaptability, and even scalability

                        (C) Kent Graziano
Data Vault Time Line
E.F. Codd invented           1976 Dr Peter Chen                          1990 – Dan Linstedt
relational modeling          Created E-R                                 Begins R&D on Data Vault
                             Diagramming                                 Modeling
  Chris Date and Hugh
  Darwen Maintained            Mid 70’s AC Nielsen
  and Refined                  Popularized
  Modeling                     Dimension & Fact Terms



1960                  1970                   1980                        1990                2000
                                                              Late 80’s – Barry Devlin and
                        Early 70’s Bill Inmon                 Dr Kimball Release “Business
                        Began Discussing Data                 Data Warehouse”
                        Warehousing

                                                          Mid 80’s Bill Inmon
                                                          Popularizes Data
        Mid 60’s Dimension & Fact Modeling                Warehousing
        presented by General Mills and                                                       2000 – Dan Linstedt
        Dartmouth University                            Mid – Late 80’s Dr Kimball           releases first 5 articles on
                                                        Popularizes Star Schema              Data Vault Modeling
                                                (C) TeachDataVault.com
Data Vault Modeling…




       (C) TeachDataVault.com
What Are the Issues?
This is NOT what you
want happening to
your project!




                (C) TeachDataVault.com
                                         THE GAP!!
What Are the Foundational Keys?

        Flexibility


                                        Scalability

            Productivity



               (C) TeachDataVault.com
Key: Flexibility (Agility)




Enabling rapid change on a massive scale
     without downstream impacts!


               (C) TeachDataVault.com
Key: Scalability




Providing no foreseeable barrier to
     increased size and scope
      People, Process, & Architecture!

           (C) TeachDataVault.com
Key: Productivity




Enabling low complexity systems with high
       value output at a rapid pace

               (C) TeachDataVault.com
Bringing the Data Vault to Your Project

HOW DOES IT WORK?


                             (C) TeachDataVault.com
Key: Flexibility (Agility)
• Goes beyond standard 3NF
 • Hyper normalized
    • Hubs and Links only holds keys and meta data
    • Satellites split by rate of change and/or source
 • Enables Agile data modeling
    • Easy to add to model without having to change existing structures
      and load routines
        • Relationships (links) can be dropped and created on-demand.
    • No more reloading history because of a missed requirement
• Based on natural business keys
 • Not system surrogate keys
 • Allows for integrating data across functions and source
   systems more easily
    • All data relationships are key driven.


                                (C) TeachDataVault.com
Key: Flexibility (Agility)




Adding new components to the EDW has NEAR ZERO impact to:
• Existing Loading Processes
• Existing Data Model
• Existing Reporting & BI Functions
• Existing Source Systems
• Existing Star Schemas and Data Marts
                 (C) TeachDataVault.com
Split and Merge ON DEMAND!
        2 weeks from now




                  6 months from now




            (C) TeachDataVault.com
Case In Point:
        Result of flexibility of the Data Vault Model
        allowed them to merge 3 companies in 90
        days – that is ALL systems, ALL DATA!




                   (C) TeachDataVault.com
Key: Scalability in Architecture




Scaling is easy, its based on the following principles
• Hub and spoke design
• MPP Shared-Nothing Architecture
• Scale Free Networks
• Can be partitioned vertically and horizontally to meet performance demands

                            (C) TeachDataVault.com
Perhaps You Wish To Split For
    Performance Reasons?
FROM THIS

                                     TO THIS!




            (C) TeachDataVault.com
Case In Point:

         Result of scalability was to produce a Data
         Vault model that scaled to 3 Petabytes in
         size, and is still growing today!




                   (C) TeachDataVault.com
Key: Scalability in Team Size




        You should be able to SCALE your TEAM as well!
          With the Data Vault methodology, you can:
Scale your team when desired, at different points in the project!


                     (C) TeachDataVault.com
Case In Point:                 (Dutch Tax Authority)

         Result of scalability was to increase ETL
         developers for each new source system,
         and reassign them when the system was
         completely loaded to the Data Vault




                   (C) TeachDataVault.com
Key: Productivity




Increasing Productivity requires a reduction in complexity.
The Data Vault Model simplifies all of the following:
• ETL Loading Routines
• Real-Time Ingestion of Data
• Data Modeling for the EDW
• Enhancing and Adapting for Change to the Model
• Ease of Monitoring, managing and optimizing processes

                   (C) TeachDataVault.com
Key: Productivity
• Standardized modeling rules
  • Highly repeatable and learnable modeling
    technique
  • Can standardize load routines
     • Delta Driven process
     • Re-startable, consistent loading patterns.
  • Can standardize extract routines
     • Rapid build of new or revised Data Marts
  • Can be automated
     • RapidACE (www.rapidace.com)

                          (C) Kent Graziano
Key: Productivity
• The Data Vault holds granular historical relationships.
     • Holds all history for all time, allowing any source
       system feeds to be reconstructed on-demand
         •   Easy generation of Audit Trails for data lineage and
             compliance.
         •   Data Mining can discover new relationships between
             elements
         •   Patterns of change emerge from the historical
             pictures and linkages.
• The Data Vault can be accessed by power-users


                           (C) Kent Graziano
Case in Point:
             Result of Productivity was: 2 people in 2
             weeks merged 3 systems, built a full Data
             Vault EDW, 5 star schemas and 3 reports.




     These individuals generated:
     • 90% of the ETL code for moving the data set
     • 100% of the Staging Data Model
     • 75% of the finished EDW data Model
     • 75% of the star schema data model

                        (C) TeachDataVault.com
The Competing Bid?
 The competition bid this with 15 people
 and 3 months to completion, at a cost of
 $250k! (they bid a Very complex system)




Actual total cost? $30k and 2 weeks!

          (C) TeachDataVault.com
Other Benefits of a Data Vault
• Modeling it as a DV forces integration of the Business Keys upfront.
     • Good for organizational alignment.
• An integrated data set with raw data extends it’s value beyond BI:
     •   Source for data quality projects
     •   Source for master data
     •   Source for data mining
     •   Source for Data as a Service (DaaS) in an SOA (Service Oriented Architecture).
• Upfront Hub integration simplifies the data integration routines
  required to load data marts.
     •   Helps divide the work a bit.
•   It is much easier to implement security on these granular pieces.
•   Granular, re-startable processes enable pin-point failure correction.
•   It is designed and optimized for real-time loading in its core
    architecture (without any tweaks or mods).

                                            (C) Kent Graziano
Conclusion?




 Changing the direction of the river takes
less effort than stopping the flow of water

               (C) TeachDataVault.com
The Experts Say…
  “The Data Vault is the optimal choice for
  modeling the EDW in the DW 2.0
  framework.” Bill Inmon

   “The Data Vault is foundationally
   strong and exceptionally scalable
   architecture.”      Stephen Brobst



        “The Data Vault is a technique which some industry
        experts have predicted may spark a revolution as the
        next big thing in data modeling for enterprise
        warehousing....”                    Doug Laney
More Notables…

   “This enables organizations to take control of their
   data warehousing destiny, supporting better and
   more relevant data warehouses in less time than
   before.”                 Howard Dresner



  “[The Data Vault] captures a practical body of
  knowledge for data warehouse development which
  both agile and traditional practitioners will benefit
  from..”               Scott Ambler
Who’s Using It?
Growing Adoption…
• The number of Data Vault users in the US
  surpassed 500 in 2010 and grows rapidly
  (http://danlinstedt.com/about/dv-
  customers/)




                    (C) Kent Graziano
In Review…
• Data Vault provides you with the tools you need to
  succeed in your DW/BI projects
• Flexibility
   • Enabling rapid change on a massive scale without
     downstream impacts!
• Scalability
   • Providing no foreseeable barrier to increased size and
     scope
• Productivity
   • Enabling low complexity systems with high value output at
     a rapid pace


                         (C) TeachDataVault.com
(C) TeachDataVault.com
Where To Learn More
     The Technical Modeling Book: http://LearnDataVault.com

      On YouTube: http://www.youtube.com/LearnDataVault

          On Facebook: www.facebook.com/learndatavault

                 Dan’s Blog: www.danlinstedt.com

The Discussion Forums: http://LinkedIn.com – Data Vault Discussions

       World wide User Group (Free): http://dvusergroup.com

              The Business of Data Vault Modeling
          by Dan Linstedt, Kent Graziano, Hans Hultgren
                  (available at www.lulu.com )
                                                                      38
10/11/2011   (C) TeachDataVault.com   39
Contact Information


                 Kent Graziano
             Kent.graziano@att.net

            Want more Data Vault?
Session # 05923: Introduction to Data Vault Modeling
     Thursday, 4:00 PM, Moscone South Rm 303

Why Data Vault?

  • 1.
    Why Data Vault? Kent Graziano Data Vault Master and Oracle ACE TrueBridge Resources OOW 2011 Session #28782
  • 2.
    My Bio • KentGraziano – Certified Data Vault Master – Oracle ACE (BI/DW) – Data Architecture and Data Warehouse Specialist • 30 years in IT • 20 years of Oracle-related work • 15+ years of data warehousing experience – Co-Author of • The Business of Data Vault Modeling (2008) • The Data Model Resource Book (1st Edition) • Oracle Designer: A Template for Developing an Enterprise Standards Document – Past-President of Oracle Development Tools User Group (ODTUG) and Rocky Mountain Oracle User Group – Co-Chair BIDW SIG for ODTUG
  • 3.
    Data Vault Definition TheData Vault is a detail oriented, historical tracking and uniquely linked set of normalized tables that support one or more functional areas of business. It is a hybrid approach encompassing the best of breed between 3rd normal form (3NF) and star schema. The design is flexible, scalable, consistent, and adaptable to the needs of the enterprise. It is a data model that is architected specifically to meet the needs of today’s enterprise data warehouses. Dan Linstedt: Defining the Data Vault TDAN.com Article (C) TeachDataVault.com
  • 4.
    Where does aData Vault Fit? (C) TeachDataVault.com
  • 5.
    Where does aData Vault Fit? Oracle’s Next Generation Data Warehouse Reference Architecture Data Vault goes here (C) Oracle Corp
  • 6.
    Why Bother WithSomething New? Old Chinese proverb: 'Unless you change direction, you're apt to end up where you're headed.' (C) TeachDataVault.com
  • 7.
    Why do weneed it? • We have seen issues in constructing (and managing) an enterprise data warehouse model using 3rd normal form, or Star Schema. – 3NF – Complex PKs with cascading snapshot dates (time-driven PKs) – Star – difficult to re-engineer fact tables for granularity changes • These issues lead to break downs in flexibility, adaptability, and even scalability (C) Kent Graziano
  • 8.
    Data Vault TimeLine E.F. Codd invented 1976 Dr Peter Chen 1990 – Dan Linstedt relational modeling Created E-R Begins R&D on Data Vault Diagramming Modeling Chris Date and Hugh Darwen Maintained Mid 70’s AC Nielsen and Refined Popularized Modeling Dimension & Fact Terms 1960 1970 1980 1990 2000 Late 80’s – Barry Devlin and Early 70’s Bill Inmon Dr Kimball Release “Business Began Discussing Data Data Warehouse” Warehousing Mid 80’s Bill Inmon Popularizes Data Mid 60’s Dimension & Fact Modeling Warehousing presented by General Mills and 2000 – Dan Linstedt Dartmouth University Mid – Late 80’s Dr Kimball releases first 5 articles on Popularizes Star Schema Data Vault Modeling (C) TeachDataVault.com
  • 9.
    Data Vault Modeling… (C) TeachDataVault.com
  • 10.
    What Are theIssues? This is NOT what you want happening to your project! (C) TeachDataVault.com THE GAP!!
  • 11.
    What Are theFoundational Keys? Flexibility Scalability Productivity (C) TeachDataVault.com
  • 12.
    Key: Flexibility (Agility) Enablingrapid change on a massive scale without downstream impacts! (C) TeachDataVault.com
  • 13.
    Key: Scalability Providing noforeseeable barrier to increased size and scope People, Process, & Architecture! (C) TeachDataVault.com
  • 14.
    Key: Productivity Enabling lowcomplexity systems with high value output at a rapid pace (C) TeachDataVault.com
  • 15.
    Bringing the DataVault to Your Project HOW DOES IT WORK? (C) TeachDataVault.com
  • 16.
    Key: Flexibility (Agility) •Goes beyond standard 3NF • Hyper normalized • Hubs and Links only holds keys and meta data • Satellites split by rate of change and/or source • Enables Agile data modeling • Easy to add to model without having to change existing structures and load routines • Relationships (links) can be dropped and created on-demand. • No more reloading history because of a missed requirement • Based on natural business keys • Not system surrogate keys • Allows for integrating data across functions and source systems more easily • All data relationships are key driven. (C) TeachDataVault.com
  • 17.
    Key: Flexibility (Agility) Addingnew components to the EDW has NEAR ZERO impact to: • Existing Loading Processes • Existing Data Model • Existing Reporting & BI Functions • Existing Source Systems • Existing Star Schemas and Data Marts (C) TeachDataVault.com
  • 18.
    Split and MergeON DEMAND! 2 weeks from now 6 months from now (C) TeachDataVault.com
  • 19.
    Case In Point: Result of flexibility of the Data Vault Model allowed them to merge 3 companies in 90 days – that is ALL systems, ALL DATA! (C) TeachDataVault.com
  • 20.
    Key: Scalability inArchitecture Scaling is easy, its based on the following principles • Hub and spoke design • MPP Shared-Nothing Architecture • Scale Free Networks • Can be partitioned vertically and horizontally to meet performance demands (C) TeachDataVault.com
  • 21.
    Perhaps You WishTo Split For Performance Reasons? FROM THIS TO THIS! (C) TeachDataVault.com
  • 22.
    Case In Point: Result of scalability was to produce a Data Vault model that scaled to 3 Petabytes in size, and is still growing today! (C) TeachDataVault.com
  • 23.
    Key: Scalability inTeam Size You should be able to SCALE your TEAM as well! With the Data Vault methodology, you can: Scale your team when desired, at different points in the project! (C) TeachDataVault.com
  • 24.
    Case In Point: (Dutch Tax Authority) Result of scalability was to increase ETL developers for each new source system, and reassign them when the system was completely loaded to the Data Vault (C) TeachDataVault.com
  • 25.
    Key: Productivity Increasing Productivityrequires a reduction in complexity. The Data Vault Model simplifies all of the following: • ETL Loading Routines • Real-Time Ingestion of Data • Data Modeling for the EDW • Enhancing and Adapting for Change to the Model • Ease of Monitoring, managing and optimizing processes (C) TeachDataVault.com
  • 26.
    Key: Productivity • Standardizedmodeling rules • Highly repeatable and learnable modeling technique • Can standardize load routines • Delta Driven process • Re-startable, consistent loading patterns. • Can standardize extract routines • Rapid build of new or revised Data Marts • Can be automated • RapidACE (www.rapidace.com) (C) Kent Graziano
  • 27.
    Key: Productivity • TheData Vault holds granular historical relationships. • Holds all history for all time, allowing any source system feeds to be reconstructed on-demand • Easy generation of Audit Trails for data lineage and compliance. • Data Mining can discover new relationships between elements • Patterns of change emerge from the historical pictures and linkages. • The Data Vault can be accessed by power-users (C) Kent Graziano
  • 28.
    Case in Point: Result of Productivity was: 2 people in 2 weeks merged 3 systems, built a full Data Vault EDW, 5 star schemas and 3 reports. These individuals generated: • 90% of the ETL code for moving the data set • 100% of the Staging Data Model • 75% of the finished EDW data Model • 75% of the star schema data model (C) TeachDataVault.com
  • 29.
    The Competing Bid? The competition bid this with 15 people and 3 months to completion, at a cost of $250k! (they bid a Very complex system) Actual total cost? $30k and 2 weeks! (C) TeachDataVault.com
  • 30.
    Other Benefits ofa Data Vault • Modeling it as a DV forces integration of the Business Keys upfront. • Good for organizational alignment. • An integrated data set with raw data extends it’s value beyond BI: • Source for data quality projects • Source for master data • Source for data mining • Source for Data as a Service (DaaS) in an SOA (Service Oriented Architecture). • Upfront Hub integration simplifies the data integration routines required to load data marts. • Helps divide the work a bit. • It is much easier to implement security on these granular pieces. • Granular, re-startable processes enable pin-point failure correction. • It is designed and optimized for real-time loading in its core architecture (without any tweaks or mods). (C) Kent Graziano
  • 31.
    Conclusion? Changing thedirection of the river takes less effort than stopping the flow of water (C) TeachDataVault.com
  • 32.
    The Experts Say… “The Data Vault is the optimal choice for modeling the EDW in the DW 2.0 framework.” Bill Inmon “The Data Vault is foundationally strong and exceptionally scalable architecture.” Stephen Brobst “The Data Vault is a technique which some industry experts have predicted may spark a revolution as the next big thing in data modeling for enterprise warehousing....” Doug Laney
  • 33.
    More Notables… “This enables organizations to take control of their data warehousing destiny, supporting better and more relevant data warehouses in less time than before.” Howard Dresner “[The Data Vault] captures a practical body of knowledge for data warehouse development which both agile and traditional practitioners will benefit from..” Scott Ambler
  • 34.
  • 35.
    Growing Adoption… • Thenumber of Data Vault users in the US surpassed 500 in 2010 and grows rapidly (http://danlinstedt.com/about/dv- customers/) (C) Kent Graziano
  • 36.
    In Review… • DataVault provides you with the tools you need to succeed in your DW/BI projects • Flexibility • Enabling rapid change on a massive scale without downstream impacts! • Scalability • Providing no foreseeable barrier to increased size and scope • Productivity • Enabling low complexity systems with high value output at a rapid pace (C) TeachDataVault.com
  • 37.
  • 38.
    Where To LearnMore The Technical Modeling Book: http://LearnDataVault.com On YouTube: http://www.youtube.com/LearnDataVault On Facebook: www.facebook.com/learndatavault Dan’s Blog: www.danlinstedt.com The Discussion Forums: http://LinkedIn.com – Data Vault Discussions World wide User Group (Free): http://dvusergroup.com The Business of Data Vault Modeling by Dan Linstedt, Kent Graziano, Hans Hultgren (available at www.lulu.com ) 38
  • 39.
    10/11/2011 (C) TeachDataVault.com 39
  • 40.
    Contact Information Kent Graziano Kent.graziano@att.net Want more Data Vault? Session # 05923: Introduction to Data Vault Modeling Thursday, 4:00 PM, Moscone South Rm 303