Data Vault Overview

An introduction to Data Vault modeling and methodology that I presented at an event in Montreal in September 2011. It gives an overview of the Data Vault components for Business Intelligence and Data Warehousing. I am Dan Linstedt, the author and inventor of the Data Vault model and methodology.

If you use the images anywhere in your presentations, please credit http://LearnDataVault.com as the source (me).

Thank-you kindly,
Daniel Linstedt

Published in: Business, Technology
  • Before we begin exploring how the Data Vault can help you, or even defining what a Data Vault is, we need to first understand some of the business problems that may be causing you heartburn on a daily basis.
  • Everything from poor agility to a lack of IT transparency plagues today’s data warehouses. I can’t begin to tell you how much pain businesses are suffering as a result of these problems. Inconsistent answer sets, lack of accountability, and inadequate auditability all play a part in data warehouses that are currently on the brink of falling apart. But it’s not just business issues; there are technical ones to cope with as well.
  • There are always technology obstacles that we face in any data warehousing project. So the question is: what kinds of problems have you seen in your journey? Do they haunt you today?
  • Complexity drives high cost, resulting in unnecessarily late delivery schedules and unsustainable business logic in the integration channels. Real-time data is flooding our data warehouses; has your architecture fallen down on the job? Unstructured data and legal requirements for auditability are bringing huge data volumes. Master data alignment is missing from our data warehouses, as they are split across disparate systems all over the world. Bad data quality is covered up through the transformation layers on the way IN to your EDW. Data warehouses grow so large and become so difficult to maintain that IT teams are often delivering late and over budget. The foundations of your data warehouse are probably crumbling under sheer weight and pressure.
  • Disparate data marts, unmatched answer sets, geographical problems, and worse… Projects are under fire from a number of areas. Let’s take a look at what happens when a data warehouse project reaches the brick wall head-on, at 90 miles an hour.
  • I think this says it all… Projects cancelled and restarted, re-engineering required to absorb changes, high complexity making it difficult to upgrade, change, and keep up at the speed of business. Disparate silo solutions screaming for consolidation, and of course a lack of accountability on BOTH sides of the fence… All signs of an ailing BI solution on the brink of being shut down.
  • We have got to keep our focus on the prize. Business still wants a BI system backed by an EDW. IT still wants a manageable system that will grow and change without major re-engineering. There is a better way, and I can help you with it.
  • The Data Vault model is really just another name for “common foundational architecture and design”. It’s based on 10 years of research and design work, followed by 10 years of implementation best practices. It is architected to help you solve the problems!
  • Put quite simply: it’s an easy-to-use architecture and plan, a guidebook for building a repeatable, consistent, and scalable data warehouse system. So just what is the value of the Data Vault?
  • The Data Vault model and methodology provide: painless auditability, understandable standards, rapid adaptability, simple build-out, uncomplicated design, and effortless scalability. Go after your goals; build a wildly successful data warehouse just like I have.
  • Beginning: 5 advanced ETL developers. By the 1st month, they had 5 advanced and 15 basic/intro. By the 6th month, they had 5 advanced and 50 basic. By the end of the 8th month they went to production with 10 MF sources, and their team size was 12 people (5 advanced, 7 basic, for support).
  • You’re not the first, nor will you be the last, to use it. Some of the world’s biggest companies are implementing Data Vaults, from Daimler Motors to Lockheed Martin to the Department of Defense. JPMorgan and Chase used the Data Vault model to merge 3 companies in 90 days!
  • Data Vault Overview

    1. 1. Data Vault Model & Methodology
        © Dan Linstedt, 2011-2012 all rights reserved
    2. 2. Agenda
        • Introduction – why are you here?
        • What is a Data Vault? Where does it come from?
        • Star Schema, 3NF, and Data Vault pros and cons AS AN EDW solution
        • When is a Data Vault a good fit?
        • Benefits of Data Vault Modeling & Methodology
        • BREAK
        • When to NOT use a Data Vault
        • Fundamental Paradigm Shift
        • Business Keys & Business Processes
        • Technical Review
        • Query Performance (PIT & Bridge)
        • What wasn’t covered in this presentation…
    3. 3. A bit about me…
        Author, Inventor, Speaker – and part time photographer…
        25+ years in the IT industry
        Worked in DoD, US Gov’t, Fortune 50, and so on…
        Find out more about the Data Vault:
        http://www.youtube.com/LearnDataVault
        http://LearnDataVault.com
        Full profile on http://www.LinkedIn.com/dlinstedt
    4. 4. Why Are YOU Here?
        • Your Expectations?
        • Your Questions?
        • Your Background?
        • Areas of Interest?
        Biggest question: What are the top 3 pains your current EDW / BI solution is experiencing?
    5. 5. What is it? Where did it come from?
        Defining the Data Vault Space
    6. 6. Data Vault Time Line
        [Timeline axis: 1960, 1970, 1980, 1990, 2000]
        • Mid 60’s: Dimension & Fact Modeling presented by General Mills and Dartmouth University
        • Early 70’s: Bill Inmon began discussing Data Warehousing
        • Mid 70’s: AC Nielsen popularized Dimension & Fact terms
        • 1976: Dr Peter Chen created E-R Diagramming
        • Mid 80’s: Bill Inmon popularizes Data Warehousing
        • Mid – Late 80’s: Dr Kimball popularizes Star Schema
        • Late 80’s: Barry Devlin and Dr Kimball release “Business Data Warehouse”
        • 1990: Dan Linstedt begins R&D on Data Vault Modeling
        • 2000: Dan Linstedt releases first 5 articles on Data Vault Modeling
        • (no date given) E.F. Codd invented relational modeling
        • (no date given) Chris Date and Hugh Darwen maintained and refined modeling
    7. 7. Data Vault Modeling…
        Took 10 years of Research and Design, including TESTING, to become flexible, consistent, and scalable
    8. 8. What IS a Data Vault? (Business Definition)
        Data Vault Model
        • Detail oriented
        • Historical traceability
        • Uniquely linked set of normalized tables
        • Supports one or more functional areas of business
        Data Vault Methodology
        • CMMI, Project Plan
        • Risk, Governance, Versioning
        • Peer Reviews, Release Cycles
        • Repeatable, Consistent, Optimized
        • Complete with Best Practices for BI/DW
        [Diagram labels: Business Keys Span / Cross Lines of Business; Functional Area: Sales, Contracts, Planning, Delivery, Finance, Operations, Procurement]
    14. 14. The Data Vault Model
        The Data Vault model is a data modeling approach, so it fits into the family of modeling approaches: 3rd Normal Form, Data Vault, Star Schema.
        While 3rd Normal Form is optimal for Operational Systems, and Star Schema is optimal for OLAP Delivery / Data Marts, the Data Vault is optimal for the Data Warehouse (EDW).
    15. 15. Supply Chain Analogy
        Source Systems → Data Vault (EDW) → Data Marts
    16. 16. What Does One Look Like?
        [Diagram labels: Customer, Product, Order, Link, Sat, F(x); caption: Records a history of the interaction]
        Elements:
        • Hub
        • Link
        • Satellite
        Hub = List of Unique Business Keys
        Link = List of Relationships, Associations
        Satellites = Descriptive Data
    19. 19. Colorized Perspective…
        [Diagram labels: 3rd NF & Star Schema; (separation); Data Vault; Business Keys, Associations, Details; HUB, LINK, Satellite]
        The Data Vault uniquely separates the Business Keys (Hubs) from the Associations (Links) and both of these from the Details that describe them and provide context (Satellites).
        (Colors Concept Originated By: Hans Hultgren)
    20. 20. Star Schemas, 3NF, Data Vault: Pros & Cons
        Defining the Data Vault Space
        • Why NOT use Star Schemas as an EDW?
        • Why NOT use 3NF as an EDW?
        • Why NOT use Data Vault as a Data Delivery Model?
    21. 21. Star Schema Pros/Cons as an EDW
        PROS
        • Good for multi-dimensional analysis
        • Subject oriented answers
        • Excellent for aggregation points
        • Rapid development / deployment
        • Great for some historical storage
        CONS
        • Not cross-business functional
        • Use of junk / helper tables
        • Trouble with VLDW
        • Unable to provide integrated enterprise information
        • Can’t handle ODS or exploration warehouse requirements
        • Trouble with data explosion in near-real-time environments
        • Trouble with updates to type 2 dimension primary keys
        • Trouble with late arriving data in dimensions to support real-time arriving transactions
        • Not granular enough information to support real-time data integration
    22. 22. 3NF Pros/Cons as an EDW
        PROS
        • Many to many linkages
        • Handle lots of information
        • Tightly integrated information
        • Highly structured
        • Conducive to near-real time loads
        • Relatively easy to extend
        CONS
        • Time driven PK issues
        • Parent-child complexities
        • Cascading change impacts
        • Difficult to load
        • Not conducive to BI tools
        • Not conducive to drill-down
        • Difficult to architect for an enterprise
        • Not conducive to spiral/scope controlled implementation
        • Physical design usually doesn’t follow business processes
    23. 23. Data Vault Pros/Cons as an EDW
        CONS
        • Not conducive to OLAP processing
        • Requires business analysis to be firm
        • Introduces many join operations
        PROS
        • Supports near-real time and batch feeds
        • Supports functional business linking
        • Extensible / flexible
        • Provides rapid build / delivery of star schemas
        • Supports VLDB / VLDW
        • Designed for EDW
        • Supports data mining and AI
        • Provides granular detail
        • Incrementally built
    24. 24. Analogy: The Porsche, the SUV and the Big Rig
        Which would you use to win a race?
        Which would you use to move a house?
        Would you adapt the truck and enter a race with Porsches and expect to win?
    25. 25. A Quick Look at Methodology Issues
        Business Rule Processing, Lack of Agility, and Future proofing your new solution
    26. 26. EDW Architecture: Generation 1
        [Diagram labels: Enterprise BI Solution; Sales, Finance, Contracts; (batch); Staging; Complex Business Rules + Dependencies; (EDW) Star Schemas; Staging + History; Conformed Dimensions, Junk Tables, Helper Tables, Factless Facts; Complex Business Rules #2]
        • Quality routines
        • Cross-system dependencies
        • Source data filtering
        • In-process data manipulation
        • High risk of incorrect data aggregation
        • Larger system = increased impact
        • Often re-engineered at the SOURCE
        • History can be destroyed (completely re-computed)
    #1 Cause of BI Initiative Failure
        Anyone?
        Re-Engineering For Every Change!
        Let’s take a look at one example…
    34. 34. Re-Engineering
        [Diagram labels: Current Sources: Sales, Finance; Customer, Customer Transactions, Customer Purchases; Source Join; Business Rules; Data Flow (Mapping); ** NEW SYSTEM **; IMPACT!!]
    35. 35. Federated Star Schema Inhibiting Agility
        [Chart: Effort & Cost (Low to High) over Time, from Start; points labeled Data Mart 1, Data Mart 2, Data Mart 3; Maintenance Cycle Begins]
        Changing and adjusting conformed dimensions causes an exponential rise in the cost curve over time.
        RESULT: Business builds their own Data Marts!
        The main driver for this is the maintenance costs, and re-engineering of the existing system which occurs for each new “federated/conformed” effort. This increases delivery time, difficulty, and maintenance costs.
    36. 36. EDW Architecture: Generation 2
        [Diagram labels: SOA; Enterprise BI Solution; Sales, Finance, Contracts; (batch), (real-time); Staging; EDW (Data Vault); Complex Business Rules; Star Schemas; Error Marts; Report Collections; Unstructured Data]
        FUNDAMENTAL GOALS
        • Repeatable
        • Consistent
        • Fault-tolerant
        • Supports phased release
        • Scalable
        • Auditable
        The business rules are moved closer to the business, improving IT reaction time, reducing cost and minimizing impacts to the enterprise data warehouse (EDW).
    42. 42. NO Re-Engineering
        [Diagram labels: Current Sources: Sales, Finance; Customer, Customer Transactions, Customer Purchases (** NEW SYSTEM **); Stage Copy (three times); Data Vault: Hub Customer, Hub Acct, Hub Product, Link Transaction; IMPACT!!]
        NO IMPACT!!!
        NO RE-ENGINEERING!
    43. 43. Progressive Agility and Responsiveness of IT
        [Chart: Effort & Cost (Low to High) over Time, from Start; phases labeled Foundational Base Built, Initial DV Build Out, New Functional Areas Added; Maintenance Cycle Begins]
        Re-Engineering does NOT occur with a Data Vault Model. This keeps costs down, and maintenance easy. It also reduces complexity of the existing architecture.
    44. 44. What’s Wrong With the OLD METHODOLOGY?
        Using Star Schemas as your Data Warehouse leads to…
    45. 45. Dimensionitis
        Dimensionitis: Incurable disease. The symptoms are the creation of new dimensions because the cost and time to conform existing dimensions with new attributes rises beyond the business ability to pay…
        Business Says: Avoid the re-engineering costs, just “copy” the dimensions and create a new one for OUR department… What can it hurt?
    46. 46. Deformed Dimensions
        Deformity: The URGE to continue “slamming data” into an existing conformed dimension until it simply cannot sustain any further changes. The result: a deformed dimension and a HUGE re-engineering cost / nightmare.
        Business Wants a Change!
        Business said: Just add that to the existing Dimension, it will be easy right?
        [Diagram labels: Business Change (three times); dimension versions V1, V2, V3; Complex Load (three times); costs shown: 90 days, $125k; 120 days, $200k; 180 days, $275k]
        Re-Engineering the Load Processes EACH TIME!
    47. 47. Silo Building / IT Non-Agility
        Business Says: Take the dimension you have, copy it, and change it… This should be cheap, and easy right?
        Business Change: To Modify Existing Star = 180 days, $275k
        First Star: Customer_ID, Customer_Name, Customer_Addr, Customer_Addr1, Customer_City, Customer_State, Customer_Zip, Customer_Phone; Fact_ABC, Fact_DEF, Fact_PDQ, Fact_MYFACT
        Copied dimension (shown repeatedly for SALES, FINANCE, and MARKETING): Customer_ID, Customer_Name, Customer_Addr, Customer_Addr1, Customer_City, Customer_State, Customer_Zip, Customer_Phone, Customer_Tag, Customer_Score, Customer_Region, Customer_Stats, Customer_Phone, Customer_Type
        • We built our own because IT costs too much…
        • We built our own because IT took too long…
        • We built our own because we needed customized dimension data…
    48. 48. Why is Data Vault a Good Fit?
    49. 49. What are the top business obstacles in your data warehouse today?
    50. 50. Poor Agility
        Inconsistent Answer Sets
        Needs Accountability
        Demands Auditability
        Desires IT Transparency
        Are you feeling Pinned Down?
    51. 51. What are the top technology obstacles in your data warehouse today?
    52. 52. Complex Systems
        Real-Time Data Arrival
        Unimaginable Data Growth
        Master Data Alignment
        Bad Data Quality
        Late Delivery/Over Budget
        Are your systems CRUMBLING?
    53. 53. Existing Solutions have led you down a painful path…
        [Image: Yugo, “World’s Worst Car”]
    54. 54. Projects Cancelled & Restarted
        Re-engineering required to absorb new systems
        Complexity drives maintenance cost sky high
        Disparate Silo Solutions provide inaccurate answers!
        Severe lack of Accountability
    55. 55. How can you overcome these obstacles?
        There must be a better way…
        There IS a better way!
    56. 56. It’s Called the Data Vault Model and Methodology
    57. 57. What is it?
        It’s a simple, easy-to-use plan to build your valuable Data Warehouse!
    58. 58. What’s the Value?
        • Painless Auditability
        • Understandable Standards
        • Rapid Adaptability
        • Simple Build-out
        • Uncomplicated Design
        • Effortless Scalability
        Pursue Your Goals!
    59. 59. Why Bother With Something New?
        Old Chinese proverb: 'Unless you change direction, you're apt to end up where you're headed.'
    60. 60. What Are the Issues?
        This is NOT what you want happening to your project!
        Business…
        • Changes Frequently
        • Needs Accountability
        • Demands Auditability
        • Has No Visibility
        • Wants More Control
        IT…
        • Takes Too Long
        • Is Over-budget
        • Too Complex
        • Can’t Sustain Growth
        THE GAP!!
    61. 61. What Are the Foundational Keys?
        • Flexibility
        • Scalability
        • Productivity
    62. 62. Key: Flexibility
        Enabling rapid change on a massive scale without downstream impacts!
    63. 63. Key: Scalability
        Providing no foreseeable barrier to increased size and scope
        People, Process, & Architecture!
    64. 64. Key: Productivity
        Enabling low complexity systems with high value output at a rapid pace
    65. 65. < BREAK TIME >
    66. 66. How does it work?
        Bringing the Data Vault to Your Project
    67. 67. Key: Flexibility
        No Re-Engineering!
        Adding new components to the EDW has NEAR ZERO impact to:
        • Existing Loading Processes
        • Existing Data Model
        • Existing Reporting & BI Functions
        • Existing Source Systems
        • Existing Star Schemas and Data Marts
    72. 72. Case In Point:
        The flexibility of the Data Vault Model allowed them to merge 3 companies in 90 days – that is ALL systems, ALL DATA!
    73. 73. Key: Scalability in Architecture
        Scaling is easy; it’s based on the following principles:
        • Hub and spoke design
        • MPP Shared-Nothing Architecture
        • Scale Free Networks
    76. 76. Case In Point:
        Result of scalability was to produce a Data Vault model that scaled to 3 Petabytes in size, and is still growing today!
    77. 77. Key: Scalability in Team Size
        You should be able to SCALE your TEAM as well!
        With the Data Vault methodology, you can scale your team when desired, at different points in the project!
    78. 78. Case In Point: (Dutch Tax Authority)
        Result of scalability was to increase ETL developers for each new source system, and reassign them when the system was completely loaded to the Data Vault
    79. 79. Key: Productivity
        Increasing Productivity requires a reduction in complexity.
        The Data Vault Model simplifies all of the following:
        • ETL Loading Routines
        • Real-Time Ingestion of Data
        • Data Modeling for the EDW
        • Enhancing and Adapting for Change to the Model
        • Ease of Monitoring, managing and optimizing processes
    84. 84. Case in Point:
        Result of Productivity was: 2 people in 2 weeks merged 3 systems, built a full Data Vault EDW, 5 star schemas and 3 reports.
        These individuals generated:
        • 90% of the ETL code for moving the data set
        • 100% of the Staging Data Model
        • 75% of the finished EDW data Model
        • 75% of the star schema data model
    88. 88. The Competing Bid?
        The competition bid this with 15 people and 3 months to completion, at a cost of $250k! (They bid a very complex system.)
        Our total cost? $30k and 2 weeks!
    89. 89. Results?
        Changing the direction of the river takes less effort than stopping the flow of water
    90. 90. When NOT to use the Data Vault Model & Methodology
    91. 91. When NOT to Use the Data Vault
        You have:
        • A small set of point solution requirements
        • A very short time-frame for delivery
        • To use the data one time, then throw it away
        • A single source system, single source application
        • A single business analyst in the entire company
        You do NOT have:
        • Audit requirements forcing you to keep history
        • Multiple data center consolidation efforts
        • Near-real-time to worry about
        • Massive batch data to integrate
        • External data feeds outside your control
        • Requirements to do trend analysis of all your data
        • Pain that forces you to reengineer every time you ask for a change to your current data warehousing systems
    92. 92. Fundamental Paradigm Shift
        Exploring differences in the architecture, implementation, and process design.
    93. 93. It’s Not Just a Data Model…
        Model + Methodology = SUCCESS!
    94. 94. Different From ANYTHING ELSE!
        The Business Rules go after the Data Warehouse!
        Data is interpreted on the way OUT!
        Hold on… We do distinguish between HARD and SOFT business rules…
        Ok, now tell me WHY this is important?
    95. 95. EDW: The Old Way of Loading
        Corporate Fraud Accountability: Title XI consists of seven sections. Section 1101 recommends a name for this title as “Corporate Fraud Accountability Act of 2002”. It identifies corporate fraud and records tampering as criminal offenses and joins those offenses to specific penalties. It also revises sentencing guidelines and strengthens their penalties. This enables the SEC to temporarily freeze large or unusual payments.
        [Diagram labels: Source 1, Source 2, Source 3; Staging; Business Rules Change Data!; HR Mart, Sales Mart, Finance Mart]
        Are changes to data ON THE WAY IN to the EDW equivalent to records tampering?
    96. 96. EDW: The New Compliant Way
        Implement a Raw Data Vault Data Warehouse
        Move the business rules “downstream”
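
To make "data is interpreted on the way OUT" concrete, here is a minimal Python sketch. It assumes a hypothetical soft business rule (`standardize_acct`) that treats punctuation variants of an account key as equal; neither the rule nor the function names come from the deck. The raw rows reuse three CUST_ACCT values from the Hub example in the Technical Review. The point is only that the stored rows stay exactly as the sources delivered them, and the rule runs when a downstream view is built.

```python
# Raw rows, stored exactly as the source systems delivered them.
raw_cust_keys = [
    {"cust_acct": "ABC123", "record_src": "SALES"},
    {"cust_acct": "ABC-123", "record_src": "SALES"},
    {"cust_acct": "*ABC-123", "record_src": "FINANCE"},
]


def standardize_acct(cust_acct: str) -> str:
    """Hypothetical soft rule: drop punctuation so key variants compare equal."""
    return "".join(ch for ch in cust_acct if ch.isalnum()).upper()


def build_mart_view(rows):
    """Apply the rule on the way OUT; the raw rows are copied, never modified."""
    return [{**row, "cust_acct_std": standardize_acct(row["cust_acct"])} for row in rows]


mart = build_mart_view(raw_cust_keys)
for row in mart:
    print(row["record_src"], row["cust_acct"], "->", row["cust_acct_std"])
```

If the rule later changes, only `build_mart_view` is rerun; the raw history is still available unchanged for audit.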
    97. 97. Business Keys & Business Processes
    98. 98. Business Keys & Business Processes
        [Diagram labels: business process flow Sales, Contracts, Planning, Procurement, Manufacturing, Finance, Delivery; Customer Contact; $$ Revenue; Time; business keys SLS123 and *P123MFG; Excel Spreadsheet; Manual Process; NO VISIBILITY!]
    99. 99. Technical Review
        Hub, Link, Satellite - Definitions
    100. 100. HUB Data Examples
        HUB_PART_NUMBER
        SQN  PART_NUM   LOAD_DTS    RECORD_SRC
        1    MFG-25862  10-14-2000  MANUFACT
        2    MFG*25266  10-14-2000  MANUFACT
        3    *P25862    10-14-2000  PLANNING
        4    MFG_25862  10-15-2000  DELIVERY
        5    CN*25266   10-16-2000  DELIVERY
        HUB_CUST_ACCT
        SQN  CUST_ACCT  LOAD_DTS    RECORD_SRC
        1    ABC123     10-14-2000  SALES
        2    ABC-123    10-14-2000  SALES
        3    *ABC-123   10-14-2000  FINANCE
        4    123,ABCD   10-15-2000  CONTRACTS
        5    PEF-2956   10-16-2000  CONTRACTS
        Hub Structure
        SEQUENCE
        <BUSINESS KEY>    (Unique Index)
        {LAST SEEN DATE}  (Optional)
        <LOAD DATE>
        <RECORD SOURCE>
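
The Hub structure above can be sketched with Python's built-in `sqlite3` module. This is an illustration under assumptions, not the deck's implementation: the lowercase table and column names, the ISO date strings, and the `load_hub` helper are hypothetical, and the optional last-seen date is omitted. It shows the unique index on the business key making repeated loads harmless.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE hub_cust_acct (
        sqn        INTEGER PRIMARY KEY,   -- surrogate sequence
        cust_acct  TEXT NOT NULL UNIQUE,  -- business key, unique index
        load_dts   TEXT NOT NULL,         -- when the key first arrived
        record_src TEXT NOT NULL          -- system that first supplied it
    )
""")


def load_hub(rows):
    """Insert business keys not yet in the Hub; keys already present are left untouched."""
    conn.executemany(
        "INSERT OR IGNORE INTO hub_cust_acct (cust_acct, load_dts, record_src) "
        "VALUES (?, ?, ?)",
        rows,
    )


load_hub([("ABC123", "2000-10-14", "SALES"), ("ABC-123", "2000-10-14", "SALES")])
# A later feed repeats ABC123; the unique index keeps the original row.
load_hub([("ABC123", "2000-10-15", "CONTRACTS"), ("PEF-2956", "2000-10-16", "CONTRACTS")])

hub_rows = conn.execute(
    "SELECT sqn, cust_acct, load_dts, record_src FROM hub_cust_acct ORDER BY sqn"
).fetchall()
for row in hub_rows:
    print(row)
```

Note that `ABC123` and `ABC-123` remain separate Hub rows, as in the slide's sample data: the Hub records keys as they arrive rather than cleansing them.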
    101. 101. Link Structures
        Link_Product_Supplier
        LPS_SQN
        PRODUCT_SQN, SUPPLIER_SQN  (Unique Index)
        LPS_LOAD_DTS
        LPS_REC_SOURCE
        LPS_ENCR_KEY
        Link_Customer_Account_Employee
        LCAE_SQN
        CUSTOMER_SQN, ACCOUNT_SQN, EMPLOYEE_SQN  (Unique Index)
        LCAE_LOAD_DTS
        LCAE_REC_SOURCE
        Link Structure
        SEQUENCE
        <HUB KEY SQN 1>, <HUB KEY SQN 2>, <HUB KEY SQN N>  (Unique Index)
        {LAST SEEN DATE}, {CONFIDENCE}, {STRENGTH}  (Optional)
        <LOAD DATE>
        <RECORD SOURCE>
        [Diagram label: Dynamic Link]
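
A comparable sketch for a Link, again using `sqlite3` and assuming lowercase names, ISO dates, trimmed-down Hub tables, made-up supplier keys, and a hypothetical `load_link` helper. The slide's `LPS_ENCR_KEY` column is left out because the deck does not describe it. The unique index spans the combination of Hub sequence numbers, so each relationship is stored once and many-to-many pairs are allowed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# REFERENCES documents intent here; SQLite enforces it only with PRAGMA foreign_keys = ON.
conn.executescript("""
    CREATE TABLE hub_product  (product_sqn  INTEGER PRIMARY KEY, product_num  TEXT NOT NULL UNIQUE);
    CREATE TABLE hub_supplier (supplier_sqn INTEGER PRIMARY KEY, supplier_num TEXT NOT NULL UNIQUE);
    CREATE TABLE link_product_supplier (
        lps_sqn        INTEGER PRIMARY KEY,
        product_sqn    INTEGER NOT NULL REFERENCES hub_product (product_sqn),
        supplier_sqn   INTEGER NOT NULL REFERENCES hub_supplier (supplier_sqn),
        lps_load_dts   TEXT NOT NULL,
        lps_rec_source TEXT NOT NULL,
        UNIQUE (product_sqn, supplier_sqn)
    );
    INSERT INTO hub_product  VALUES (1, 'MFG-25862'), (2, 'MFG*25266');
    INSERT INTO hub_supplier VALUES (1, 'SUP-A'), (2, 'SUP-B');
""")


def load_link(pairs, load_dts, rec_source):
    """Record each (product, supplier) relationship once; repeats are ignored."""
    conn.executemany(
        "INSERT OR IGNORE INTO link_product_supplier "
        "(product_sqn, supplier_sqn, lps_load_dts, lps_rec_source) VALUES (?, ?, ?, ?)",
        [(p, s, load_dts, rec_source) for p, s in pairs],
    )


load_link([(1, 1), (1, 2), (2, 1)], "2000-10-14", "PROCUREMENT")
load_link([(1, 1)], "2000-10-15", "PROCUREMENT")  # already known, ignored

link_count = conn.execute("SELECT COUNT(*) FROM link_product_supplier").fetchone()[0]
suppliers_of_product_1 = [
    r[0]
    for r in conn.execute(
        "SELECT supplier_sqn FROM link_product_supplier "
        "WHERE product_sqn = 1 ORDER BY supplier_sqn"
    )
]
print(link_count, suppliers_of_product_1)
```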
    102. 102. Satellites Split By Source System
        Each Satellite starts with the same four columns: PARENT SEQUENCE, LOAD DATE, <LOAD-END-DATE>, <RECORD-SOURCE>
        SAT_FINANCE_CUST
        • Contact Name
        • Contact Email
        • Contact Phone Number
        SAT_CONTRACTS_CUST
        • First Name
        • Last Name
        • Guardian Full Name
        • Co-Signer Full Name
        • Phone Number
        • Address
        • City
        • State/Province
        • Zip Code
        SAT_SALES_CUST
        • Name
        • Phone Number
        • Best time of day to reach
        • Do Not Call Flag
        Satellite Structure
        PARENT SEQUENCE, LOAD DATE  (Primary Key)
        <LOAD-END-DATE>
        <RECORD-SOURCE>
        {user defined descriptive data}
        {or temporal based timelines}
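
One way to picture how a Satellite accumulates history is the sketch below. The loading rule is an assumption on my part (append a new row only when the descriptive data changes, and set the previous row's load-end-date at that moment); the deck shows the columns but not the load logic. `SatRow` and `load_satellite` are hypothetical names, and the sample values are illustrative.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SatRow:
    parent_sqn: int              # Hub sequence this row describes
    load_dts: str                # with parent_sqn, forms the primary key
    load_end_dts: Optional[str]  # None while the row is the current version
    record_src: str
    name: str
    phone: str


def load_satellite(sat: List[SatRow], parent_sqn, load_dts, record_src, name, phone) -> bool:
    """Append a new version only if the descriptive data changed; close the old one."""
    current = next(
        (r for r in sat if r.parent_sqn == parent_sqn and r.load_end_dts is None), None
    )
    if current and (current.name, current.phone) == (name, phone):
        return False  # nothing changed, keep the current row open
    if current:
        current.load_end_dts = load_dts
    sat.append(SatRow(parent_sqn, load_dts, None, record_src, name, phone))
    return True


sat_sales_cust: List[SatRow] = []
results = [
    load_satellite(sat_sales_cust, 1, "2000-10-14", "SALES", "Dan L", "999-555-1212"),
    load_satellite(sat_sales_cust, 1, "2000-10-15", "SALES", "Dan L", "999-555-1212"),
    load_satellite(sat_sales_cust, 1, "2000-11-01", "SALES", "Dan Linedt", "999-555-1212"),
]
for row in sat_sales_cust:
    print(row)
```

Existing rows are never overwritten apart from the end date, so every earlier version stays queryable.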
    103. 103. Why do we build Links this way?
    104. 104. History Teaches Us…
        If we model for ONE relationship in the EDW, we BREAK the others!
        The EDW is designed to handle TODAY’S relationship; as soon as history is loaded, it breaks the model!
        [Diagram: Hub Portfolio (1) related directly to Hub Customer (M). Today: Portfolio 1 to Customer M. 5 years from now: Portfolio M to Customer M. 10 years ago: Portfolio M to Customer 1. Two of the cases are marked X.]
        This situation forces re-engineering of the model, load routines, and queries!
    105. 105. History Teaches Us…
        If we model with a LINK table, we can handle ALL the requirements!
        [Diagram: Hub Portfolio (1) to LNK Cust-Port (M), and LNK Cust-Port (M) to Hub Customer (1). Today: Portfolio 1 to Customer M. 5 years from now: Portfolio M to Customer M. 10 years ago: Portfolio M to Customer 1.]
        This design is flexible, handles past, present, and future relationship changes with NO RE-ENGINEERING!
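
The claim that one Link shape covers all three eras can be checked with a small sketch. It assumes Link rows are simply (portfolio sequence, customer sequence) pairs; the sample pairs and the `observed_cardinality` helper are made up for illustration. The same row structure holds each era's data, and only the contents differ.

```python
from collections import defaultdict


def observed_cardinality(link_rows):
    """Classify the portfolio:customer relationship found in a set of Link rows."""
    customers_per_portfolio = defaultdict(set)
    portfolios_per_customer = defaultdict(set)
    for portfolio_sqn, customer_sqn in link_rows:
        customers_per_portfolio[portfolio_sqn].add(customer_sqn)
        portfolios_per_customer[customer_sqn].add(portfolio_sqn)
    # Portfolio side is "M" when some customer holds several portfolios.
    portfolio_side = "M" if any(len(p) > 1 for p in portfolios_per_customer.values()) else "1"
    # Customer side is "M" when some portfolio has several customers.
    customer_side = "M" if any(len(c) > 1 for c in customers_per_portfolio.values()) else "1"
    return f"{portfolio_side}:{customer_side}"


ten_years_ago = [(1, 10), (2, 10), (3, 11)]    # several portfolios per customer
today = [(1, 10), (1, 11), (2, 12)]            # several customers per portfolio
five_years_ahead = [(1, 10), (1, 11), (2, 10)] # both at once

for label, rows in [("10 years ago", ten_years_ago), ("today", today),
                    ("5 years from now", five_years_ahead)]:
    print(label, observed_cardinality(rows))
```

No schema change is needed to move between the three cases, which is the "no re-engineering" point the slide makes.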
    106. 106. Applying the Data Vault to Global DW2.0<br />Manufacturing EDW <br />in China<br />Planning in Brazil<br />Hub<br />Hub<br />Link<br />Sat<br />Sat<br />Link<br />Sat<br />Sat<br />Link<br />Hub<br />Link<br />Hub<br />Hub<br />Sat<br />Sat<br />Sat<br />Sat<br />Sat<br />Sat<br />Sat<br />Sat<br />Base EDW Created in Corporate<br />Financials in USA<br />75<br />
    107. 107. Extreme Data Vault Partitioning<br />
    108. 108. Query Performance<br />Point-in-time and Bridge Tables, overcoming query issues<br />
    109. 109. Purpose Of PIT & Bridge<br />To reduce the number of joins, and to reduce the amount of data being queried for a given range of time.<br />Together, these two tables allow “direct table match” as well as table elimination to occur in the queries.<br />These tables are not necessary for the entire model; only when:<br />Massive amounts of data are found<br />Large numbers of Satellites surround a Hub or Link<br />A large query across multiple Hubs & Links is necessary<br />Real-time data is flowing in, uninterrupted<br />What are they?<br />Snapshot tables: specifically built for query speed<br />
    110. 110. PIT Table Architecture<br />Satellite: Point In Time<br />Primary Key: PARENT SEQUENCE, LOAD DATE<br />{Satellite 1 Load Date}<br />{Satellite 2 Load Date}<br />{Satellite 3 Load Date}<br />{…}<br />{Satellite N Load Date}<br />[Diagram: PIT Sats placed alongside Sats 1 to 4 around Hub Order, Hub Customer, Hub Product, and Link Line Item with its Satellite Line Item]<br />
    111. 111. PIT Table Example<br />SAT_CUST_CONTACT_CELL<br />SQN LOAD_DTS CELL<br />1 10-14-2000 999-555-1212<br />1 10-15-2000 999-111-1234<br />1 10-16-2000 999-252-2834<br />1 10-17-2000 999.257-2837<br />1 10-18-2000 999-273-5555<br />SAT_CUST_CONTACT_ADDR<br />SQN LOAD_DTS ADDR<br />1 08-01-2000 26 Prospect<br />1 09-29-2000 26 Prosp St.<br />1 12-17-2000 28 November<br />1 01-01-2001 26 Prospect St<br />SAT_CUST_CONTACT_NAME<br />SQN LOAD_DTS NAME<br />1 10-14-2000 Dan L<br />1 11-01-2000 Dan Linedt<br />1 12-31-2000 Dan Linstedt<br />PIT Table (LOAD_DTS = Snapshot Date)<br />SQN LOAD_DTS SAT_NAME_LDTS SAT_CELL_LDTS SAT_ADDR_LDTS<br />1 08-01-2000 NULL NULL 08-01-2000<br />1 09-01-2000 NULL NULL 08-01-2000<br />1 10-01-2000 NULL NULL 09-29-2000<br />1 11-01-2000 11-01-2000 10-18-2000 09-29-2000<br />1 12-01-2000 11-01-2000 10-18-2000 09-29-2000<br />1 01-01-2001 12-31-2000 10-18-2000 01-01-2001<br />
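The PIT rows in the example above appear to follow one rule: for each snapshot date, record the most recent load date of each Satellite that falls on or before that snapshot, or NULL if the Satellite has no row yet. That rule is inferred from the slide's data rather than stated on it, so treat the following as a sketch under that assumption. The load dates and snapshot dates are copied from the slide; the function names `parse` and `build_pit` are invented for the illustration.

```python
from bisect import bisect_right
from datetime import datetime

def parse(text):
    """Parse the slide's MM-DD-YYYY date format."""
    return datetime.strptime(text, "%m-%d-%Y").date()

# Load dates per Satellite for SQN 1, copied from the slide.
SATELLITE_LOAD_DATES = {
    "SAT_NAME_LDTS": ["10-14-2000", "11-01-2000", "12-31-2000"],
    "SAT_CELL_LDTS": ["10-14-2000", "10-15-2000", "10-16-2000", "10-17-2000", "10-18-2000"],
    "SAT_ADDR_LDTS": ["08-01-2000", "09-29-2000", "12-17-2000", "01-01-2001"],
}

# Monthly snapshot dates, copied from the slide's PIT table.
SNAPSHOT_DATES = ["08-01-2000", "09-01-2000", "10-01-2000",
                  "11-01-2000", "12-01-2000", "01-01-2001"]

def build_pit(satellites, snapshots):
    """For each snapshot, pick each Satellite's latest load date <= the snapshot date."""
    sorted_loads = {
        name: sorted(parse(d) for d in dates) for name, dates in satellites.items()
    }
    rows = []
    for snap in (parse(s) for s in snapshots):
        row = {"LOAD_DTS": snap}
        for name, loads in sorted_loads.items():
            # bisect_right counts the load dates that are <= snap.
            count = bisect_right(loads, snap)
            row[name] = loads[count - 1] if count else None  # None plays the role of NULL
        rows.append(row)
    return rows

if __name__ == "__main__":
    for pit_row in build_pit(SATELLITE_LOAD_DATES, SNAPSHOT_DATES):
        print(pit_row)
```

A query can then join each Satellite to the PIT row with an equality match on sequence and load date, instead of searching every Satellite for the row that was current on a given date.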
    112. 112. Bridge Table Architecture<br />Satellite: Bridge<br />Primary Key: UNIQUE SEQUENCE, LOAD DATE<br />{Hub 1 Sequence #}<br />{Hub 2 Sequence #}<br />{Hub 3 Sequence #}<br />{Link 1 Sequence #}<br />{Link 2 Sequence #}<br />{…}<br />{Link N Sequence #}<br />{Hub 1 Business Key}<br />{Hub 2 Business Key}<br />{…}<br />{Hub N Business Key}<br />[Diagram: a Bridge spanning Hub Seller, Hub Product, and Hub Parts through two Links, with Sats 1 to 4 and Link Satellites attached]<br />
    113. 113. Bridge Table Data Example<br />Bridge Table: Seller by Product by Part (LOAD_DTS = Snapshot Date)<br />SQN LOAD_DTS SELL_SQN SELL_ID PROD_SQN PROD_NUM PART_SQN PART_NUM<br />1 08-01-2000 15 NY*1 2756 ABC-123-9K 525 JK*2*4<br />2 09-01-2000 16 CO*242654 DEF-847-0L 324 MN*5-2<br />3 10-01-2000 16 CO*2482374 PPA-252-2A 9938 DD*2*3<br />4 11-01-2000 24 AZ*2525222 UIF-525-88 7 UF*9*0<br />5 12-01-2000 99 NM*581 DAN-347-7F 16 KI*9-2<br />6 01-01-2001 99 NM*581 DAN-347-7F 24 DL*0-5<br />
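A Bridge row like the ones above can be thought of as a pre-computed walk across Hubs and Links, stored with both the sequence numbers and the business keys so that queries avoid repeating the joins. The sketch below illustrates that idea only. The column names (SQN, LOAD_DTS, SELL_SQN, SELL_ID, PROD_SQN, PROD_NUM, PART_SQN, PART_NUM) come from the slide, but the Hub and Link contents, the snapshot date, and the function name `build_bridge` are made-up assumptions for the example.

```python
from datetime import date

# Hypothetical Hub contents: sequence number -> business key.
HUB_SELLER = {1: "S-001", 2: "S-002"}
HUB_PRODUCT = {10: "P-100", 11: "P-200"}
HUB_PART = {100: "X-1", 101: "X-2", 102: "X-3"}

# Hypothetical Link contents: pairs of Hub sequence numbers.
LNK_SELLER_PRODUCT = [(1, 10), (2, 10), (2, 11)]      # (SELL_SQN, PROD_SQN)
LNK_PRODUCT_PART = [(10, 100), (10, 101), (11, 102)]  # (PROD_SQN, PART_SQN)

def build_bridge(snapshot):
    """Join Seller -> Product -> Part once and keep sequences plus business keys per row."""
    rows = []
    for sell_sqn, prod_sqn in LNK_SELLER_PRODUCT:
        for link_prod_sqn, part_sqn in LNK_PRODUCT_PART:
            if link_prod_sqn != prod_sqn:
                continue  # the two Links must meet on the same product
            rows.append({
                "SQN": len(rows) + 1,        # unique sequence of the Bridge row
                "LOAD_DTS": snapshot,        # snapshot date of this Bridge load
                "SELL_SQN": sell_sqn, "SELL_ID": HUB_SELLER[sell_sqn],
                "PROD_SQN": prod_sqn, "PROD_NUM": HUB_PRODUCT[prod_sqn],
                "PART_SQN": part_sqn, "PART_NUM": HUB_PART[part_sqn],
            })
    return rows

if __name__ == "__main__":
    for bridge_row in build_bridge(date(2000, 8, 1)):
        print(bridge_row)
```

Because the business keys are carried in the row, a report that needs only seller, product, and part identifiers can read the Bridge alone and skip the Hubs entirely.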
    114. 114. What WASN’T Covered<br />ETL Automation<br />ETL Implementation<br />SQL Query Logic<br />Balanced MPP design<br />Data Vault Modeling on Appliances<br />Deep Dive on Structures (Hubs, Links, Satellites)<br />What happens when you break the rules?<br />Project management, Risk management & mitigation, methodology & approach<br />Automation: Automated DV modeling, Automated ETL production<br />Change Management<br />Temporal Data Modeling Concerns… And so on…<br />
    115. 115. Conclusions<br />
    116. 116. Who’s Using It?<br />
    117. 117. The Experts Say…<br />“The Data Vault is the optimal choice for modeling the EDW in the DW 2.0 framework.”<br />Bill Inmon<br />“The Data Vault is foundationally strong and exceptionally scalable architecture.”<br />Stephen Brobst<br />“The Data Vault is a technique which some industry experts have predicted may spark a revolution as the next big thing in data modeling for enterprise warehousing....”<br />Doug Laney<br />
    118. 118. More Notables…<br />“This enables organizations to take control of their data warehousing destiny, supporting better and more relevant data warehouses in less time than before.”<br />Howard Dresner<br />“[The Data Vault] captures a practical body of knowledge for data warehouse development which both agile and traditional practitioners will benefit from.”<br />Scott Ambler<br />
    119. 119. Where To Learn More<br />The Technical Modeling Book: http://LearnDataVault.com<br />The Discussion Forums & events: http://LinkedIn.com – Data Vault Discussions<br />Contact me:<br />http://DanLinstedt.com - web site<br />DanLinstedt@gmail.com - email<br />Worldwide User Group (Free): http://dvusergroup.com<br />
