Data Vault Model & Methodology
© Dan Linstedt, 2011-2012, all rights reserved

Agenda
Introduction: why are you here?
What is a Data Vault? Where does it come from?
Star Schema, 3NF, and Data Vault pros and cons AS AN EDW solution
When is a Data Vault a good fit?
Benefits of Data Vault Modeling & Methodology
<BREAK>
When NOT to use a Data Vault
Fundamental Paradigm Shift
Business Keys & Business Processes
Technical Review
Query Performance (PIT & Bridge)
What wasn’t covered in this presentation…

A bit about me…
Author, Inventor, Speaker, and part-time photographer…
25+ years in the IT industry
Worked in DoD, US Gov’t, Fortune 50, and so on…
Find out more about the Data Vault:
http://www.youtube.com/LearnDataVault
http://LearnDataVault.com
Full profile on http://www.LinkedIn.com/dlinstedt

Why Are YOU Here?
Your Expectations?
Your Questions?
Your Background?
Areas of Interest?
Biggest question: What are the top 3 pains your current EDW / BI solution is experiencing?

Defining the Data Vault Space
What is it? Where did it come from?
Data Vault Time Line
Mid 60’s: Dimension & Fact Modeling presented by General Mills and Dartmouth University
E.F. Codd invented relational modeling
Early 70’s: Bill Inmon began discussing Data Warehousing
Mid 70’s: AC Nielsen popularized Dimension & Fact terms
1976: Dr Peter Chen created E-R Diagramming
Chris Date and Hugh Darwen maintained and refined modeling
Mid 80’s: Bill Inmon popularizes Data Warehousing
Mid to Late 80’s: Dr Kimball popularizes Star Schema
Late 80’s: Barry Devlin and Dr Kimball release “Business Data Warehouse”
1990: Dan Linstedt begins R&D on Data Vault Modeling
2000: Dan Linstedt releases first 5 articles on Data Vault Modeling
Data Vault Modeling…
Took 10 years of Research and Design, including TESTING, to become flexible, consistent, and scalable
What IS a Data Vault? (Business Definition)
Data Vault Model
  Detail oriented
  Historical traceability
  Uniquely linked set of normalized tables
  Supports one or more functional areas of business
Data Vault Methodology
  CMMI, Project Plan
  Risk, Governance, Versioning
  Peer Reviews, Release Cycles
  Repeatable, Consistent, Optimized
  Complete with Best Practices for BI/DW
[Diagram: Business Keys span / cross Lines of Business. Functional Areas: Sales, Contracts, Planning, Delivery, Finance, Operations, Procurement]
The Data Vault Model
The Data Vault model is a data modeling approach, so it fits into the family of modeling approaches: 3rd Normal Form, Data Vault, Star Schema.
While 3rd Normal Form is optimal for Operational Systems, and Star Schema is optimal for OLAP Delivery / Data Marts, the Data Vault is optimal for the Data Warehouse (EDW).

Supply Chain Analogy
Source Systems -> Data Vault (EDW) -> Data Marts
What Does One Look Like?
[Diagram labels: Customer, Product, and Order, each with Satellites (Sat), joined by a Link. Caption: Records a history of the interaction]
Elements:
Hub = List of Unique Business Keys
Link = List of Relationships, Associations
Satellite = Descriptive Data
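
As a rough illustration of the three elements, the sketch below builds two Hubs, one Link, and one Satellite in an in-memory SQLite database. Every table and column name here (hub_customer, hub_product, link_customer_product, sat_customer, load_date, record_source) is an illustrative assumption and does not come from the slides.

```python
import sqlite3

# Illustrative schema only: all table and column names are assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hub: a list of unique business keys.
CREATE TABLE hub_customer (
    customer_id   INTEGER PRIMARY KEY,
    customer_bk   TEXT NOT NULL UNIQUE,
    load_date     TEXT NOT NULL,
    record_source TEXT NOT NULL
);
CREATE TABLE hub_product (
    product_id    INTEGER PRIMARY KEY,
    product_bk    TEXT NOT NULL UNIQUE,
    load_date     TEXT NOT NULL,
    record_source TEXT NOT NULL
);
-- Link: a list of relationships between hub keys.
CREATE TABLE link_customer_product (
    link_id       INTEGER PRIMARY KEY,
    customer_id   INTEGER NOT NULL REFERENCES hub_customer(customer_id),
    product_id    INTEGER NOT NULL REFERENCES hub_product(product_id),
    load_date     TEXT NOT NULL,
    record_source TEXT NOT NULL,
    UNIQUE (customer_id, product_id)
);
-- Satellite: descriptive data, one row per change.
CREATE TABLE sat_customer (
    customer_id   INTEGER NOT NULL REFERENCES hub_customer(customer_id),
    load_date     TEXT NOT NULL,
    name          TEXT,
    city          TEXT,
    record_source TEXT NOT NULL,
    PRIMARY KEY (customer_id, load_date)
);
""")

conn.execute(
    "INSERT INTO hub_customer (customer_bk, load_date, record_source) "
    "VALUES ('CUST-001', '2012-01-01', 'sales')"
)
conn.execute(
    "INSERT INTO hub_product (product_bk, load_date, record_source) "
    "VALUES ('PROD-9', '2012-01-01', 'sales')"
)
conn.execute(
    "INSERT INTO link_customer_product "
    "(customer_id, product_id, load_date, record_source) "
    "VALUES (1, 1, '2012-01-01', 'sales')"
)
# Two satellite rows for the same hub key preserve the history of a change.
conn.executemany(
    "INSERT INTO sat_customer VALUES (?, ?, ?, ?, ?)",
    [
        (1, "2012-01-01", "Acme", "Denver", "sales"),
        (1, "2012-02-01", "Acme", "Boulder", "sales"),
    ],
)

history = conn.execute(
    "SELECT load_date, city FROM sat_customer "
    "WHERE customer_id = 1 ORDER BY load_date"
).fetchall()
print(history)
```

The design point the sketch tries to show is that the business key lives only in the Hub, the relationship lives only in the Link, and anything that can change over time lives in the Satellite.
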
Colorized Perspective…
[Diagram: 3rd NF & Star Schema compared with the Data Vault (separation) into Business Keys (HUB), Associations (LINK), and Details (Satellite)]
The Data Vault uniquely separates the Business Keys (Hubs) from the Associations (Links), and both of these from the Details that describe them and provide context (Satellites).
(Colors Concept Originated By: Hans Hultgren)

Star Schemas, 3NF, Data Vault: Pros & Cons
Defining the Data Vault Space
Why NOT use Star Schemas as an EDW?
Why NOT use 3NF as an EDW?
Why NOT use Data Vault as a Data Delivery Model?
Star Schema Pros/Cons as an EDW
PROS
  Good for multi-dimensional analysis
  Subject oriented answers
  Excellent for aggregation points
  Rapid development / deployment
  Great for some historical storage
CONS
  Not cross-business functional
  Use of junk / helper tables
  Trouble with VLDW
  Unable to provide integrated enterprise information
  Can’t handle ODS or exploration warehouse requirements
  Trouble with data explosion in near-real-time environments
  Trouble with updates to type 2 dimension primary keys
  Trouble with late arriving data in dimensions to support real-time arriving transactions
  Not granular enough information to support real-time data integration

3NF Pros/Cons as an EDW
PROS
  Many to many linkages
  Handle lots of information
  Tightly integrated information
  Highly structured
  Conducive to near-real time loads
  Relatively easy to extend
CONS
  Time driven PK issues
  Parent-child complexities
  Cascading change impacts
  Difficult to load
  Not conducive to BI tools
  Not conducive to drill-down
  Difficult to architect for an enterprise
  Not conducive to spiral/scope controlled implementation
  Physical design usually doesn’t follow business processes

Data Vault Pros/Cons as an EDW
PROS
  Supports near-real time and batch feeds
  Supports functional business linking
  Extensible / flexible
  Provides rapid build / delivery of star schemas
  Supports VLDB / VLDW
  Designed for EDW
  Supports data mining and AI
  Provides granular detail
  Incrementally built
CONS
  Not conducive to OLAP processing
  Requires business analysis to be firm
  Introduces many join operations

Analogy: The Porsche, the SUV and the Big Rig
Which would you use to win a race?
Which would you use to move a house?
Would you adapt the truck, enter a race with Porsches, and expect to win?

A Quick Look at Methodology Issues
Business Rule Processing, Lack of Agility, and Future-proofing your new solution
EDW Architecture: Generation 1
[Diagram labels: Enterprise BI Solution; sources Sales, Finance, Contracts; (batch); Staging; Staging + History; Complex Business Rules + Dependencies; Complex Business Rules #2; (EDW) Star Schemas; Conformed Dimensions, Junk Tables, Helper Tables, Factless Facts]
Quality routines
Cross-system dependencies
Source data filtering
In-process data manipulation
High risk of incorrect data aggregation
Larger system = increased impact
Often re-engineered at the SOURCE
History can be destroyed (completely re-computed)

#1 Cause of BI Initiative Failure
Anyone?
Re-Engineering For Every Change!
Let’s take a look at one example…
Re-Engineering
[Diagram labels: Business Rules; Data Flow (Mapping); Current Sources; Sales; Finance; Customer; Customer Transactions; Customer Purchases; Source; Join; ** NEW SYSTEM **; IMPACT!!]

Federated Star Schema Inhibiting Agility
[Chart: Effort & Cost (Low to High) against Time; markers: Start, Data Mart 1, Data Mart 2, Data Mart 3, Maintenance Cycle Begins]
Changing and adjusting conformed dimensions causes an exponential rise in the cost curve over time.
RESULT: Business builds their own Data Marts!
The main drivers for this are the maintenance costs and the re-engineering of the existing system that occurs for each new “federated/conformed” effort. This increases delivery time, difficulty, and maintenance costs.
EDW Architecture: Generation 2
[Diagram labels: Enterprise BI Solution; SOA; (real-time); (batch); sources Sales, Finance, Contracts; Staging; EDW (Data Vault); Complex Business Rules; Star Schemas; Error Marts; Report Collections; Unstructured Data]
FUNDAMENTAL GOALS
Repeatable
Consistent
Fault-tolerant
Supports phased release
Scalable
Auditable
The business rules are moved closer to the business, improving IT reaction time, reducing cost and minimizing impacts to the enterprise data warehouse (EDW).
NO Re-Engineering
[Diagram labels: Current Sources (Sales, Finance); Stage Copy (three); Data Vault with Hub Customer, Hub Acct, Hub Product, and Link Transaction; Customer; Customer Transactions; Customer Purchases; annotations: NO IMPACT!!!, NO RE-ENGINEERING!, ** NEW SYSTEM **, IMPACT!!]
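
One way to picture the "no re-engineering" claim is the sketch below, which assumes an insert-only Hub load and a separate Satellite per source system. The names (hub_customer, sat_customer_finance, load_hub, table_sql, credit_limit) are hypothetical and chosen only for illustration; the slides do not prescribe them. When the new Finance system arrives, a new table is added and new keys are inserted, while the definition of the existing Hub stays byte-for-byte the same.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE hub_customer (
           customer_bk   TEXT PRIMARY KEY,
           load_date     TEXT NOT NULL,
           record_source TEXT NOT NULL)"""
)


def load_hub(conn, business_keys, load_date, record_source):
    """Insert only the business keys the hub has not seen yet."""
    conn.executemany(
        "INSERT OR IGNORE INTO hub_customer VALUES (?, ?, ?)",
        [(bk, load_date, record_source) for bk in business_keys],
    )


def table_sql(conn, name):
    """Return the stored CREATE statement, used to check a table is untouched."""
    return conn.execute(
        "SELECT sql FROM sqlite_master WHERE name = ?", (name,)
    ).fetchone()[0]


# Existing source: Sales.
load_hub(conn, ["CUST-001", "CUST-002"], "2012-01-01", "sales")
hub_before = table_sql(conn, "hub_customer")

# New system: Finance. It gets its own satellite; the hub is not altered.
conn.execute(
    """CREATE TABLE sat_customer_finance (
           customer_bk   TEXT NOT NULL REFERENCES hub_customer(customer_bk),
           load_date     TEXT NOT NULL,
           credit_limit  REAL,
           record_source TEXT NOT NULL,
           PRIMARY KEY (customer_bk, load_date))"""
)
load_hub(conn, ["CUST-002", "CUST-003"], "2012-03-01", "finance")
conn.execute(
    "INSERT INTO sat_customer_finance "
    "VALUES ('CUST-003', '2012-03-01', 5000.0, 'finance')"
)

hub_after = table_sql(conn, "hub_customer")
rows = conn.execute(
    "SELECT customer_bk, record_source FROM hub_customer ORDER BY customer_bk"
).fetchall()
print(hub_before == hub_after)
print(rows)
```

In this sketch the key already known from Sales (CUST-002) is skipped rather than updated, so the existing load process and its data are left as they were.
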
Progressive Agility and Responsiveness of IT
[Chart: Effort & Cost (Low to High) against Time; markers: Start, Initial DV Build Out, Foundational Base Built, New Functional Areas Added, Maintenance Cycle Begins]
Re-Engineering does NOT occur with a Data Vault Model. This keeps costs down and maintenance easy. It also reduces complexity of the existing architecture.

What’s Wrong With the OLD METHODOLOGY?
Using Star Schemas as your Data Warehouse leads to…
Dimensionitis
DimensionItis: Incurable disease. The symptoms are the creation of new dimensions because the cost and time to conform existing dimensions with new attributes rise beyond the business’s ability to pay…
Business Says: Avoid the re-engineering costs, just “copy” the dimensions and create a new one for OUR department… What can it hurt?
Deformed Dimensions
Deformity: The URGE to continue “slamming data” into an existing conformed dimension until it simply cannot sustain any further changes. The result: a deformed dimension and a HUGE re-engineering cost / nightmare.
Business Wants a Change!
Business said: Just add that to the existing Dimension, it will be easy, right?
Re-Engineering the Load Processes EACH TIME!
Business Change V1: Complex Load, 90 days, $125k
Business Change V2: Complex Load, 120 days, $200k
Business Change V3: Complex Load, 180 days, $275k
Silo Building / IT Non-Agility
Business Says: Take the dimension you have, copy it, and change it… This should be cheap and easy, right?
Business Change: To Modify Existing Star = 180 days, $275k
[Diagram: the First Star (Fact_ABC, Fact_DEF, Fact_PDQ, Fact_MYFACT; customer dimension with Customer_ID, Customer_Name, Customer_Addr, Customer_Addr1, Customer_City, Customer_State, Customer_Zip, Customer_Phone) alongside SALES, FINANCE, and MARKETING copies of the customer dimension extended with Customer_Tag, Customer_Score, Customer_Region, Customer_Stats, Customer_Phone, Customer_Type]
We built our own because IT costs too much…
We built our own because IT took too long…
We built our own because we needed customized dimension data…
Why is Data Vault a Good Fit?

What are the top business obstacles in your data warehouse today?

Poor Agility
Inconsistent Answer Sets
Needs Accountability
Demands Auditability
Desires IT Transparency
Are you feeling Pinned Down?

What are the top technology obstacles in your data warehouse today?

Complex Systems
Real-Time Data Arrival
Unimaginable Data Growth
Master Data Alignment
Bad Data Quality
Late Delivery / Over Budget
Are your systems CRUMBLING?
Existing Solutions Have Led You Down a Painful Path…
[Image: Yugo, “World’s Worst Car”]

Projects Cancelled & Restarted
Re-engineering required to absorb new systems
Complexity drives maintenance cost sky high
Disparate Silo Solutions provide inaccurate answers!
Severe lack of Accountability

How can you overcome these obstacles?
There must be a better way…
There IS a better way!

It’s Called the Data Vault Model and Methodology

What is it?
It’s a simple, easy-to-use plan to build your valuable Data Warehouse!

What’s the Value?
Painless Auditability
Understandable Standards
Rapid Adaptability
Simple Build-out
Uncomplicated Design
Effortless Scalability
Pursue Your Goals!
Why Bother With Something New?
Old Chinese proverb: 'Unless you change direction, you're apt to end up where you're headed.'

What Are the Issues?
This is NOT what you want happening to your project!
Business…
  Changes Frequently
  Needs Accountability
  Demands Auditability
  Has No Visibility
  Wants More Control
IT…
  Takes Too Long
  Is Over-budget
  Too Complex
  Can’t Sustain Growth
THE GAP!!

What Are the Foundational Keys?
Flexibility
Scalability
Productivity

Key: Flexibility
Enabling rapid change on a massive scale without downstream impacts!

Key: Scalability
Providing no foreseeable barrier to increased size and scope
People, Process, & Architecture!

Key: Productivity
Enabling low complexity systems with high value output at a rapid pace

< BREAK TIME >
How does it work?
Bringing the Data Vault to Your Project

Key: Flexibility
No Re-Engineering!
Adding new components to the EDW has NEAR ZERO impact to:
Existing Loading Processes
Existing Data Model
Existing Reporting & BI Functions
Existing Source Systems
Existing Star Schemas and Data Marts

Case In Point:
The flexibility of the Data Vault Model allowed them to merge 3 companies in 90 days. That is ALL systems, ALL DATA!
Key: Scalability in Architecture
Scaling is easy; it’s based on the following principles:
Hub and spoke design
MPP Shared-Nothing Architecture
Scale Free Networks
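
The slide names MPP shared-nothing architecture as a scaling principle but does not say how keys are distributed. As one possible reading, the sketch below assumes business keys are assigned to nodes by a deterministic hash, so each node can load its own share without coordinating with the others. The function names (node_for, partition), the choice of MD5, and the node count of 4 are all assumptions made for the example.

```python
import hashlib
from collections import defaultdict


def node_for(business_key: str, node_count: int) -> int:
    """Map a business key to a node deterministically via a hash of the key."""
    digest = hashlib.md5(business_key.encode("utf-8")).hexdigest()
    return int(digest, 16) % node_count


def partition(keys, node_count):
    """Group keys by node; each node can then load its share independently."""
    shards = defaultdict(list)
    for key in keys:
        shards[node_for(key, node_count)].append(key)
    return dict(shards)


keys = [f"CUST-{i:04d}" for i in range(1000)]
shards = partition(keys, node_count=4)
# Print how many keys landed on each node.
print({node: len(members) for node, members in sorted(shards.items())})
```

Because the assignment depends only on the key, the same business key always lands on the same node, which is what lets Hub, Link, and Satellite rows for that key be co-located under this assumption.
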
Case In Point:
Scalability produced a Data Vault model that scaled to 3 Petabytes in size, and is still growing today!

Key: Scalability in Team Size
You should be able to SCALE your TEAM as well!
With the Data Vault methodology, you can scale your team when desired, at different points in the project!

Case In Point: (Dutch Tax Authority)
Scalability allowed them to add ETL developers for each new source system, and reassign them when the system was completely loaded to the Data Vault.

Key: Productivity
Increasing productivity requires a reduction in complexity.
The Data Vault Model simplifies all of the following:
ETL Loading Routines
Real-Time Ingestion of Data
Data Modeling for the EDW
Enhancing and Adapting for Change to the Model
Monitoring, managing, and optimizing processes
Case in Point:
Result of Productivity: 2 people in 2 weeks merged 3 systems, built a full Data Vault EDW, 5 star schemas and 3 reports. These individuals generated:
90% of the ETL code for moving the data set
100% of the Staging Data Model
75% of the finished EDW data model
75% of the star schema data model
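
The slides do not describe how the ETL code was generated. A minimal sketch of the general idea, assuming that every Hub load follows the same insert-only pattern and can therefore be produced from a small metadata description, might look like this. The template text, the spec fields (hub, bk, stage, source), and the staging table names are hypothetical.

```python
# Hypothetical template: one insert-only load statement per hub.
HUB_LOAD_TEMPLATE = (
    "INSERT INTO {hub} ({bk}, load_date, record_source)\n"
    "SELECT DISTINCT s.{bk}, :load_date, '{source}'\n"
    "FROM {stage} AS s\n"
    "WHERE NOT EXISTS (SELECT 1 FROM {hub} AS h WHERE h.{bk} = s.{bk});"
)


def generate_hub_loads(specs):
    """Render one load statement per metadata entry."""
    return [HUB_LOAD_TEMPLATE.format(**spec) for spec in specs]


# Metadata describing which staging table feeds which hub (illustrative).
specs = [
    {"hub": "hub_customer", "bk": "customer_bk",
     "stage": "stg_sales_customer", "source": "sales"},
    {"hub": "hub_product", "bk": "product_bk",
     "stage": "stg_sales_product", "source": "sales"},
]

statements = generate_hub_loads(specs)
print(statements[0])
```

Adding another Hub in this sketch means adding one metadata entry rather than writing a new routine, which is the kind of repetition that makes a high generated share plausible.
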
The Competing Bid?
The competition bid this with 15 people and 3 months to completion, at a cost of $250k! (They bid a very complex system.)
Our total cost? $30k and 2 weeks!


Editor's Notes

  • #32 Before we begin exploring how the Data Vault can help you, or even defining what a Data Vault is, we need to first understand some of the business problems that may be causing you heartburn on a daily basis.
  • #33 Everything from poor agility to a lack of IT transparency plagues today’s data warehouses. I can’t begin to tell you how much pain these businesses are suffering as a result of these problems. Inconsistent answer sets, lack of accountability, and inadequate auditability all play a part in data warehouses that are currently on the brink of falling apart. But it’s not just business issues; there are technical ones to cope with as well.
  • #34 There are always technology obstacles that we face in any data warehousing project. So the question is: what kinds of problems have you seen in your journey? Do they haunt you today?
  • #35 Complexity drives high cost, resulting in unnecessarily late delivery schedules and unsustainable business logic in the integration channels. Real-time data is flooding our data warehouses; has your architecture fallen down on the job? Unstructured data and legal requirements for auditability are bringing huge data volumes. Master Data Alignment is missing from our data warehouses, as they are split into disparate systems all over the world. Bad data quality is covered up through the transformation layers on the way IN to your EDW. Data warehouses grow so large and become so difficult to maintain that IT teams are often delivering late and beyond original costs. The foundations of your data warehouse are probably crumbling under sheer weight and pressure.
  • #36 Disparate data marts, unmatched answer sets, geographical problems, and worse… Projects are under fire from a number of areas. Let’s take a look at what happens when a data warehouse project reaches the brick wall head-on, at 90 miles an hour.
  • #37 I think this says it all… Projects cancelled and restarted, re-engineering required to absorb changes, high complexity making it difficult to upgrade, change, and keep up at the speed of business. Disparate silo solutions screaming for consolidation, and of course a lack of accountability on BOTH sides of the fence… All signs of an ailing BI solution on the brink of being shut down.
  • #38 We have got to keep focus on the prize. Business still wants a BI system backed by an enterprise EDW. IT still wants a manageable system that will grow and change without major re-engineering. There is a better way, and I can help you with it.
  • #39 The Data Vault model is really just another name for “Common foundational architecture and design”. It’s based on 10 years of research and design work, followed by 10 years of implementation best practices. It is architected to help you solve the problems!
  • #40 Put quite simply: it’s an easy-to-use architecture and plan, a guidebook for building a repeatable, consistent, and scalable data warehouse system. So just what is the value of the Data Vault?
  • #41 The Data Vault model and methodology provide: Painless Auditability, Understandable Standards, Rapid Adaptability, Simple Build-out, Uncomplicated Design, and Effortless Scalability. Go after your goals; build a wildly successful data warehouse just like I have.
  • #55 Beginning: 5 advanced ETL developers. By the 1st month, they had 5 advanced and 15 basic/intro. By the 6th month, they had 5 advanced but 50 basic. By the end of the 8th month they went to production with 10 MF sources, and their team size was 12 people (5 advanced, 7 basic, for support).
  • #86 You’re not the first, nor will you be the last, to use it. Some of the world’s biggest companies are implementing Data Vaults, from Daimler Motors to Lockheed Martin to the Department of Defense. JPMorgan and Chase used the Data Vault model to merge 3 companies in 90 days!