Operational Data Vault

16,618 views

I gave this presentation at the Advanced Architecture Conference (Bill Inmon, 2011) in Evergreen, Colorado. It covers a new breed of data warehousing called Operational Data Warehousing: the next step in business intelligence toward self-service BI, enabling users to do more with their enterprise data warehouse solution. Specifically, it discusses how the Data Vault model fits into this picture.

If you would like to use the slides, please e-mail me first; I'd be happy to discuss it with you.

Published in: Technology, Business
  • The agenda in this presentation is not quite right (sorry, folks). I will fix the agenda slide and re-upload this presentation in a couple of days. Don't forget: you can watch the YouTube videos at http://YouTube.com/LearnDataVault, sign up for the Facebook page (LearnDataVault), or purchase the book at http://LearnDataVault.com

    Thanks, Dan L

Operational Data Vault

  1. Data Vault: What’s Next?
      © Dan Linstedt, 2011-2012, all rights reserved
  2. Agenda
      • Introduction: why are you here?
      • Short Data Vault Review
      • What’s Next? Advanced Architecture…
      • Defining Operational Data Warehousing
      • Why is Data Vault a Good Fit?
      • Break
      • Fundamental Paradigm Shift
      • Business Keys & Business Processes
      • Technical Review
      • Query Performance (PIT & Bridge)
      • What wasn’t covered in this presentation…
  3. A bit about me…
      • Author, inventor, speaker, and part-time photographer
      • 25+ years in the IT industry
      • Worked in DoD, US Gov’t, Fortune 50, and so on
      • Find out more about the Data Vault: http://YouTube.com/LearnDataVault and http://LearnDataVault.com
      • Slides available at http://SlideShare.net (search: “Advanced Architecture Data Vault”)
      • Full profile on http://www.LinkedIn.com/dlinstedt
  4. Why Are You Here?
      • Your expectations?
      • Your questions?
      • Your background?
      • Areas of interest?
      • Biggest question: What are the top 3 pains your current EDW / BI solution is experiencing?
  5. Short Data Vault Review: What is it and where did it come from?
  6. Data Warehousing Timeline (axis: 1960 to 2010)
      • Mid 60’s: Dimension & Fact Modeling presented by General Mills and Dartmouth University
      • Early 70’s: Bill Inmon began discussing Data Warehousing
      • Mid 70’s: AC Nielsen popularized Dimension & Fact terms
      • 1976: Dr Peter Chen created E-R Diagramming
      • Mid 80’s: Bill Inmon popularizes Data Warehousing
      • Mid to late 80’s: Dr Kimball popularizes Star Schema
      • Late 80’s: Barry Devlin and Dr Kimball release “Business Data Warehouse”
      • 1990: Dan Linstedt begins R&D on Data Vault Modeling
      • 2000: Dan Linstedt releases first 5 articles on Data Vault Modeling
      • 2010: DV alive and well around the world
      • Undated on the slide: E.F. Codd invented relational modeling; Chris Date and Hugh Darwen maintained and refined modeling
  7. Data Vault Modeling… took 10 years of research and design, including TESTING, to become flexible, consistent, and scalable
  8. What IS a Data Vault? (Business Definition)
      Data Vault Model:
      • Detail oriented
      • Historical traceability
      • Uniquely linked set of normalized tables
      • Supports one or more functional areas of business
      Data Vault Methodology:
      • CMMI, Project Plan
      • Risk, Governance, Versioning
      • Peer Reviews, Release Cycles
      • Repeatable, Consistent, Optimized
      • Complete with Best Practices for BI/DW
      Diagram: Business Keys span / cross lines of business (functional areas: Sales, Contracts, Planning, Delivery, Finance, Operations, Procurement).
  14. Supply Chain Analogy: Source Systems, then Data Vault (EDW), then Data Marts
  15. What Does One Look Like?
      Diagram: Customer, Product, and Order Hubs, each with Satellites, connected by a Link (with its own Satellites) that records a history of the interaction.
      Elements:
      • Hub = List of Unique Business Keys
      • Link = List of Relationships, Associations
      • Satellites = Descriptive Data
  18. Colorized Perspective…
      Diagram: Data Vault compared with 3rd NF & Star Schema (separation): Business Keys = HUB, Associations = LINK, Details = Satellite.
      The Data Vault uniquely separates the Business Keys (Hubs) from the Associations (Links), and both of these from the Details that describe them and provide context (Satellites).
      (Colors concept originated by Hans Hultgren)
  19. A Quick Look at Methodology Issues: Business Rule Processing, Lack of Agility, and Future-Proofing Your New Solution
  20. EDW Architecture: Generation 1
      Diagram components: Sales, Finance, and Contracts sources (batch); Staging (EDW), holding staging + history; Complex Business Rules + Dependencies; Star Schemas (conformed dimensions, junk tables, helper tables, factless facts); Complex Business Rules #2; Enterprise BI Solution.
      • Quality routines
      • Cross-system dependencies
      • Source data filtering
      • In-process data manipulation
      • High risk of incorrect data aggregation
      • Larger system = increased impact
      • Often re-engineered at the SOURCE
      • History can be destroyed (completely re-computed)
  #1 Cause of BI Initiative Failure
      Anyone? Re-engineering for every change! Let’s take a look at one example…
  28. Re-Engineering
      Diagram: the data flow (mapping) joins the current sources (Sales: Customer; Finance: Customer Transactions; Customer Purchases) and applies business rules. Adding a ** NEW SYSTEM ** causes IMPACT!!
  29. Federated Star Schema Inhibiting Agility
      Chart: effort & cost (low to high) against time, from the start through the point where the maintenance cycle begins, across Data Mart 1, Data Mart 2, and Data Mart 3.
      • Changing and adjusting conformed dimensions causes an exponential rise in the cost curve over time.
      • RESULT: Business builds their own Data Marts!
      The main drivers are the maintenance costs and the re-engineering of the existing system that occurs for each new “federated/conformed” effort. This increases delivery time, difficulty, and maintenance costs.
  30. EDW Architecture: Generation 2
      Diagram components: SOA; Sales, Finance, and Contracts sources; Staging; EDW (Data Vault); Complex Business Rules; Star Schemas; Error Marts; Report Collections; Unstructured Data; Enterprise BI Solution. Feeds are labeled batch and real-time.
      FUNDAMENTAL GOALS:
      • Repeatable
      • Consistent
      • Fault-tolerant
      • Supports phased release
      • Scalable
      • Auditable
      The business rules are moved closer to the business, improving IT reaction time, reducing cost, and minimizing impacts to the enterprise data warehouse (EDW).
  36. NO Re-Engineering
      Diagram: each current source (Sales: Customer; Finance: Customer Transactions; Customer Purchases) has its own Stage Copy and loads the Data Vault (Hub Customer, Hub Acct, Hub Product, Link Transaction). The ** NEW SYSTEM ** is labeled IMPACT!!; the existing flows are labeled NO IMPACT!!! NO RE-ENGINEERING!
  37. Progressive Agility and Responsiveness of IT
      Chart: effort & cost (low to high) against time: start, initial DV build-out, foundational base built, new functional areas added, maintenance cycle begins.
      Re-engineering does NOT occur with a Data Vault Model. This keeps costs down and maintenance easy. It also reduces the complexity of the existing architecture.
  38. Why is Data Vault a Good Fit?
  39. What are the top business obstacles in your data warehouse today?
  40. Are you feeling Pinned Down?
      • Poor Agility
      • Inconsistent Answer Sets
      • Needs Accountability
      • Demands Auditability
      • Desires IT Transparency
  41. What are the top technology obstacles in your data warehouse today?
  42. Are your systems CRUMBLING?
      • Complex Systems
      • Real-Time Data Arrival
      • Unimaginable Data Growth
      • Master Data Alignment
      • Bad Data Quality
      • Late Delivery / Over Budget
  43. Existing Solutions have led you down a painful path… (image: Yugo, World’s Worst Car)
  44. Where that path leads:
      • Projects cancelled & restarted
      • Re-engineering required to absorb new systems
      • Complexity drives maintenance cost sky high
      • Disparate silo solutions provide inaccurate answers!
      • Severe lack of accountability
  45. How can you overcome these obstacles? There must be a better way… There IS a better way!
  46. It’s Called the Data Vault Model and Methodology
  47. What is it? It’s a simple, easy-to-use plan to build your valuable Data Warehouse!
  48. What’s the Value?
      • Painless Auditability
      • Understandable Standards
      • Rapid Adaptability
      • Simple Build-out
      • Uncomplicated Design
      • Effortless Scalability
      Pursue Your Goals!
  49. Why Bother With Something New?
      Old Chinese proverb: 'Unless you change direction, you're apt to end up where you're headed.'
  50. What Are the Issues?
      This is NOT what you want happening to your project!
      Business…
      • Changes Frequently
      • Needs Accountability
      • Demands Auditability
      • Has No Visibility
      • Wants More Control
      IT…
      • Takes Too Long
      • Is Over-budget
      • Too Complex
      • Can’t Sustain Growth
      THE GAP!!
  51. What Are the Foundational Keys?
      • Flexibility
      • Scalability
      • Productivity
  52. Key: Flexibility. Enabling rapid change on a massive scale without downstream impacts!
  53. Key: Scalability. Providing no foreseeable barrier to increased size and scope: People, Process, & Architecture!
  54. Key: Productivity. Enabling low-complexity systems with high-value output at a rapid pace
  55. How does it work? Bringing the Data Vault to Your Project
  56. Key: Flexibility
      No Re-Engineering! Adding new components to the EDW has NEAR ZERO impact on:
      • Existing Loading Processes
      • Existing Data Model
      • Existing Reporting & BI Functions
      • Existing Source Systems
      • Existing Star Schemas and Data Marts
  61. Case In Point:
      The flexibility of the Data Vault Model allowed them to merge 3 companies in 90 days: that is ALL systems, ALL DATA!
  62. Key: Scalability in Architecture
      Scaling is easy; it’s based on the following principles:
      • Hub and spoke design
      • MPP Shared-Nothing Architecture
      • Scale Free Networks
  65. Case In Point:
      The result of scalability was a Data Vault model that scaled to 3 petabytes in size and is still growing today!
  66. Key: Scalability in Team Size
      You should be able to SCALE your TEAM as well! With the Data Vault methodology, you can scale your team when desired, at different points in the project.
  67. Case In Point: (Dutch Tax Authority)
      The result of scalability was to add ETL developers for each new source system and reassign them once that system was completely loaded into the Data Vault.
  68. Key: Productivity
      Increasing productivity requires a reduction in complexity. The Data Vault Model simplifies all of the following:
      • ETL Loading Routines
      • Real-Time Ingestion of Data
      • Data Modeling for the EDW
      • Enhancing and Adapting for Change to the Model
      • Ease of monitoring, managing, and optimizing processes
  73. Case in Point:
      Result of productivity: 2 people in 2 weeks merged 3 systems and built a full Data Vault EDW, 5 star schemas, and 3 reports.
      These individuals generated:
      • 90% of the ETL code for moving the data set
      • 100% of the Staging Data Model
      • 75% of the finished EDW data model
      • 75% of the star schema data model
  77. The Competing Bid?
      The competition bid this with 15 people and 3 months to completion, at a cost of $250k! (They bid a very complex system.)
      Our total cost? $30k and 2 weeks!
  78. Results?
      Changing the direction of the river takes less effort than stopping the flow of water
  79. < BREAK TIME >
  80. What’s Next?
      A look at what’s around the corner for Data Warehousing and Business Intelligence. Believe me, it’s going to get interesting fast.
  81. Operational Data Vault
      Data Co-Location:
      • Transactions & Transaction History
      • Master Data & Master Data History
      • Metadata & Metadata History
      • External Data & External Data History
      • Business Rules & Business Rule History
      • Security / Access Data & History
      • Unstructured Data Ties & History
      Real-time data feeds DIRECTLY into the data store
      Operational Applications ON TOP of the warehouse!
  82. Extreme Automation!
      Automated Creation of Data Models:
      • Staging Models
      • Data Vault Models
      • Star Schema Models
      • Cube Models
      • Excel Models (spreadsheets)
      • Data Mining Models (table structures)
      Automated Creation of ETL Processes:
      • Staging Loads
      • Data Vault (Data Warehouse Loads)
      • Star Schema Loads (80% solutions)
      • Cube Loads (80% solutions)
      • Excel Loads / Queries (80% solutions)
      • Data Mining Queries (80% solutions)
      Other Automated Components:
      • Initial Metadata Population
      • Initial Master Data Population
      • Generated Testing Scripts
      http://www.jmorganmarketing.com/should-social-crm-be-automated/
  95. Results of all of this?
      The EDW will:
      • become BACK OFFICE!!
      • become SELF-RELIANT / SELF-HEALING
      • adapt to new structures, new hardware, and new data
      • automatically back up and remove old data
      http://images.businessweek.com/ss/06/10/bestunder25/source/1.htm
  96. How Long Will it Take?
      My milestone predictions:
      • 1 yr: Operational Data Vault
      • 2 yrs: Beginning automation of business rules
      • 3 yrs: Beginning dynamic restructuring in the DV
      • 4 yrs: Operational apps contain BI, metadata, and master data GUIs in a single place
      • 5 yrs: The “all-in-one” appliance, containing 75% of what we need at the firmware level to do all these things
      http://thypolarlife.wordpress.com/2011/08/02/this-moment-in-time/
  97. Why Should I Care?
      • Because the Data Warehouse, combined with the operational applications on top, makes for a self-service BI environment
      • Because this technology is the heart of Data Warehousing!
      • Because the future is now
      • Because it will happen with or without you… You do want a job, right?
  What About Tooling?
      Diagram labels: Data Patterns, New Models, Automation, Source DDL, Target DDL, ETL Code, SQL Code, Documentation, Ontology, Test Data, Cross-Reference, Templates, Config, Metadata & Business Rules!
  101. Who’s Tooling Today?
      WhereScape, Quipu, AnalytixDS, RapidACE, Nexus, BI-Ready, Centennium
  102. What Does It Add Up To? Pervasive BI
  103. What’s the Key Ingredient? Ubiquitous A.I.
  104. Defining Operational Data Warehousing: What is an ODW and how did we get here?
  105. What IS An Operational DW?
      A raw, time-variant, integrated, non-volatile data warehouse, on top of which sits an operational application that is “editing and changing data”.
      However, instead of updating and deleting in place, the data is “marked” deleted and updates are turned into inserts, creating a delta audit trail along the way.
      Yes, it’s an operational application on top of the integrated data warehouse (or in this case, the Data Vault model).
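The "updates become inserts, deletes become marks" rule can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class name `InsertOnlySatellite`, its row layout, and the `deleted` flag are hypothetical and do not come from the slides.

```python
from datetime import datetime, timezone


class InsertOnlySatellite:
    """Append-only store: edits and deletes add rows, nothing is overwritten."""

    def __init__(self):
        self.rows = []  # the delta audit trail, oldest first

    def current(self, parent_sqn):
        """Return the newest row for a parent key, or None if it has no rows."""
        for row in reversed(self.rows):
            if row["parent_sqn"] == parent_sqn:
                return row
        return None

    def _append(self, parent_sqn, attrs, deleted):
        self.rows.append({
            "parent_sqn": parent_sqn,
            "load_dts": datetime.now(timezone.utc),
            "attrs": dict(attrs),
            "deleted": deleted,  # hypothetical status flag standing in for "marked deleted"
        })

    def update(self, parent_sqn, attrs):
        """Turn an update into an insert; skip it when nothing changed."""
        cur = self.current(parent_sqn)
        if cur is not None and not cur["deleted"] and cur["attrs"] == attrs:
            return False  # no delta, so no new row
        self._append(parent_sqn, attrs, deleted=False)
        return True

    def delete(self, parent_sqn):
        """Mark the record deleted by inserting a status row."""
        cur = self.current(parent_sqn)
        if cur is None or cur["deleted"]:
            return False
        self._append(parent_sqn, cur["attrs"], deleted=True)
        return True
```

Because earlier rows are never touched, the full edit history of a key can be replayed from `rows` at any time.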
  106. Oper/Active DW Timeline (axis: 1980 to 2010)
      • Data Warehouses split from operational systems
      • Mid 90’s: “Active” DW becomes important, but has to wait for technology to catch up!
      • Teradata makes real advances in Active DW
      • “Appliances” begin appearing on-scene
      • 2002: Cendant-TRG creates the world’s first Operational Data Vault
      • Real-Time & Oper BI make the scene (users want direct control & up-to-the-minute data)
  107. How Did We Get Here?
      • 6, ODW: Can you change what is happening? System of record; application direct edits to data in the EDW
      • 7, DDW: How do you dynamically adapt to business? Dynamic alterations to structure
      Parts are © Teradata – Stephen Brobst, CTO
  108. ODV Overview
      Direct inserts into the ODV, NO STAGING AREA:
      • Applications (direct edits)
      • Web-Services (direct feeds)
      • Virtual Marts (direct access)
      • Unstructured Feeds (indirect feeds)
      • Metadata Rules (direct edits)
      • Batch Loads (direct feeds)
  109. What is the architecture?
      Diagram labels: Operational Systems; Real-Time Collector; SOR Real-Time Data; Staging Area; Non-SOR Historical Batch Data; Non-SOR Batch Data; Unstructured / Semi-Structured; Data Vault EDW (Stored; Analyzed / Scored); Master Data; Operational Metadata Management; Operational Applications; Direct Edits; Web Interface (usually); Real-Time Mining Engine; Operational Alerts; Virtual Marts; Strategic Reports & OLAP.
      • Flexible
      • Accountable
      • Compliant
      • Scalable
      • Normalized
      • Dynamic
      • Granular
      • Historic
      • Integrated by business key
  What must an ODW have?
      • Operational application(s) on top of the single data store
      • All the up-time and maintenance requirements of a standard operational application (24x7x365, six 9’s reliability, etc.)
      • Inflow and outflow of information; bi-directional data flow to & from the service bus (SOA/ESB, etc.)
      • Capacity to incorporate and store existing batch loads and accept real-time data from other feeds
      • Ability to interface with unstructured data sets
      • All the inherent design necessities of an EDW
  119. Why should I care?
      TWO REASONS: CONVERGENCE and SELF-SERVICE BI
  120. Under the Covers…
      Layers: Application (presents data to the user in conformed screens), Data Access Control Layer, Operational Data Vault (ODW) Layer (Hub Parts, Hub Seller, Hub Product, two Links, Satellites).
      Steps:
      1. Read Data for Edit
      2. Lock Business Key Rows
      3. Present in GUI
      4. Accept Ins, Upd, Del
      5. Perform Insert / Status Change
      6. Release Lock on Business Key Rows
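The six steps above can be sketched as a small data access control layer. This is a hedged illustration, not the implementation behind the slide: the class `DataAccessControlLayer`, its methods, and the use of one `threading.Lock` per business key are all assumptions, and a callback stands in for the GUI.

```python
import threading


class DataAccessControlLayer:
    """Serializes edits per business key and stores every edit as an insert."""

    def __init__(self):
        self._locks = {}                # business key -> lock (hypothetical scheme)
        self._guard = threading.Lock()  # protects the _locks dict itself
        self.rows = []                  # append-only (business_key, attrs) rows

    def _lock_for(self, business_key):
        with self._guard:
            return self._locks.setdefault(business_key, threading.Lock())

    def read(self, business_key):
        """Return a copy of the newest attributes for a key, or {} if none."""
        for key, attrs in reversed(self.rows):
            if key == business_key:
                return dict(attrs)
        return {}

    def edit(self, business_key, change):
        before = self.read(business_key)                   # 1. read data for edit
        with self._lock_for(business_key):                 # 2. lock business key rows
            after = change(before)                         # 3-4. present in GUI, accept ins/upd/del
            self.rows.append((business_key, dict(after)))  # 5. perform insert / status change
        return after                                       # 6. lock released on leaving the block
```

A usage example: `dal.edit("ABC123", lambda rec: {**rec, "name": "Dan L"})` appends one row for key `ABC123` and leaves the lock free for the next editor.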
  121. Dropping by the Wayside
      No…
      • ETL
      • Batch-driven processing
      • “Synchronization” with the source system
      • Missing source data
      • Scalability problems
      • ODS needed!
      • “Master Data” system needed
      • Staging area needed
  122. Positives
      • Data in the ODW can be governed
      • Audit trail built in
      • Only deltas are stored
      • NEW applications can be created to “automatically” generate Cubes/Star Schemas; these apps can be run by the users…
      • Self-Service BI is enabled!
      • Master data can be “marked, scored, stored” in the same place as the EDW
  123. Old Components Still There?
      • Staging areas will exist as long as there is external data to load and integrate
      • ODS areas may still exist as long as other legacy applications exist as source systems
      • Master Data areas may still exist as long as the logic is not built directly into the “operational DW application”
  124. Secure ODV Technical Layers
      Layers: Visible Objects; Services; Component Groups; Vault Accessibility; Common Data Object Area; Database Interface.
      APIs: Inbound, Outbound, Authentication, Master Data, Packaging, Pedigree, Security Key Mgr, Transaction, Aggregation, Kit, Busn. Intelligence, Subject Area.
      Interfaces: File Management, Notification, Scheduling, Local DB, Global DB, Security (encryption too), Format, Persistence Cache DB, Logging.
      Storage: Web Server Locally Based; Persistent DB Cache for Joining; Global DB; Local DB1; Local DB2.
  125. What are the benefits?
      • Simplified architecture
      • Single copy of the data!
      • No “intermediate” IT work to do
      • Users become empowered, with direct access to data sets
      • Of course, using the Data Vault model, you gain ALL the benefits of the Data Vault (scalability, flexibility, etc.)
      • NOTE: Two or more “users” can actually EDIT different parts of the same record at the same time!
      • Integrating external data basically makes it all available to the application immediately!
      • NO NEED TO BUILD A SEPARATE EDW!!
  126. What are the drawbacks?
      • No current “application” is using the Data Vault for operational data
      • In other words, off-the-shelf apps in this area do not yet exist; you have to “build it” yourself
      • Self-Service BI application technology is nascent or non-existent today
      • Master Data & Metadata applications are not currently available on top of Data Vault
  127. Technical Review: Hub, Link, Satellite Definitions
  128. HUB Data Examples
      HUB_PART_NUMBER
      SQN  PART_NUM   LOAD_DTS    RECORD_SRC
      1    MFG-25862  10-14-2000  MANUFACT
      2    MFG*25266  10-14-2000  MANUFACT
      3    *P25862    10-14-2000  PLANNING
      4    MFG_25862  10-15-2000  DELIVERY
      5    CN*25266   10-16-2000  DELIVERY

      HUB_CUST_ACCT
      SQN  CUST_ACCT  LOAD_DTS    RECORD_SRC
      1    ABC123     10-14-2000  SALES
      2    ABC-123    10-14-2000  SALES
      3    *ABC-123   10-14-2000  FINANCE
      4    123,ABCD   10-15-2000  CONTRACTS
      5    PEF-2956   10-16-2000  CONTRACTS

      Hub Structure
      SEQUENCE
      <BUSINESS KEY>      (unique index)
      {LAST SEEN DATE}    (optional)
      <LOAD DATE>
      <RECORD SOURCE>
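As a rough sketch of how the Hub structure behaves on load, the Python below keeps one row per unique business key and assigns a sequence number on first sight. The `Hub` and `HubRow` names and the "keep the first load date and record source" behavior are assumptions made for illustration; the slide only specifies the columns and the unique index.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class HubRow:
    sqn: int            # SEQUENCE
    business_key: str   # <BUSINESS KEY>, unique
    load_dts: date      # <LOAD DATE>
    record_src: str     # <RECORD SOURCE>


class Hub:
    """A list of unique business keys; re-loading a known key adds nothing."""

    def __init__(self):
        self._by_key = {}  # plays the role of the unique index on the business key

    def load(self, business_key, load_dts, record_src):
        row = self._by_key.get(business_key)
        if row is None:
            row = HubRow(len(self._by_key) + 1, business_key, load_dts, record_src)
            self._by_key[business_key] = row
        return row

    def rows(self):
        return sorted(self._by_key.values(), key=lambda r: r.sqn)
```

Note that keys such as `ABC123` and `ABC-123` are treated as distinct, matching the slide's example data, where differently formatted keys each receive their own sequence number.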
  129. Link Structures
      Link_Product_Supplier
      LPS_SQN
      PRODUCT_SQN       (unique index)
      SUPPLIER_SQN      (unique index)
      LPS_LOAD_DTS
      LPS_REC_SOURCE
      LPS_ENCR_KEY

      Link_Customer_Account_Employee
      LCAE_SQN
      CUSTOMER_SQN      (unique index)
      ACCOUNT_SQN       (unique index)
      EMPLOYEE_SQN      (unique index)
      LCAE_LOAD_DTS
      LCAE_REC_SOURCE

      Link Structure
      SEQUENCE
      <HUB KEY SQN 1>   (unique index)
      <HUB KEY SQN 2>   (unique index)
      <HUB KEY SQN N>   (unique index)
      {LAST SEEN DATE}  (optional)
      {CONFIDENCE}      (optional, dynamic link)
      {STRENGTH}        (optional, dynamic link)
      <LOAD DATE>
      <RECORD SOURCE>
  130. Satellites Split By Source System
      Each Satellite below starts with the same four columns: PARENT SEQUENCE, LOAD DATE, <LOAD-END-DATE>, <RECORD-SOURCE>.
      SAT_FINANCE_CUST: Contact Name, Contact Email, Contact Phone Number
      SAT_CONTRACTS_CUST: First Name, Last Name, Guardian Full Name, Co-Signer Full Name, Phone Number, Address, City, State/Province, Zip Code
      SAT_SALES_CUST: Name, Phone Number, Best time of day to reach, Do Not Call Flag

      Satellite Structure
      PARENT SEQUENCE   (primary key)
      LOAD DATE         (primary key)
      <LOAD-END-DATE>
      <RECORD-SOURCE>
      {user defined descriptive data}
      {or temporal based timelines}
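One way to read the LOAD DATE / LOAD-END-DATE pair is that each Satellite row is closed by the arrival of the next row for the same parent. The sketch below assumes that convention (it is not stated on the slide) and uses hypothetical dictionary keys `parent_sqn`, `load_date`, and `load_end_date`.

```python
from collections import defaultdict


def with_load_end_dates(rows):
    """Return copies of satellite rows with load_end_date filled in.

    Assumed convention: a row ends when the next row for the same
    parent sequence is loaded; the newest row stays open (None).
    """
    by_parent = defaultdict(list)
    for row in rows:
        by_parent[row["parent_sqn"]].append(row)

    result = []
    for parent_rows in by_parent.values():
        parent_rows.sort(key=lambda r: r["load_date"])
        following = parent_rows[1:] + [None]
        for current, nxt in zip(parent_rows, following):
            end = nxt["load_date"] if nxt is not None else None
            result.append({**current, "load_end_date": end})
    return result
```

The input rows are not modified, which keeps the sketch in line with the insert-only style used elsewhere in the deck.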
  131. Why do we build Links this way?
  132. History Teaches Us…
      If we model for ONE relationship in the EDW, we BREAK the others!
      • Today: Portfolio (1) to Customer (M)
      • 5 years from now: Portfolio (M) to Customer (M)
      • 10 years ago: Portfolio (M) to Customer (1)
      Diagram: Hub Portfolio (1) joined directly to Hub Customer (M); the other two cases are marked X.
      The EDW is designed to handle TODAY’S relationship; as soon as history is loaded, it breaks the model!
      This situation forces re-engineering of the model, load routines, and queries!
  133. History Teaches Us…
      If we model with a LINK table, we can handle ALL the requirements!
      • Today: Portfolio (1) to Customer (M)
      • 5 years from now: Portfolio (M) to Customer (M)
      • 10 years ago: Portfolio (M) to Customer (1)
      Diagram: Hub Portfolio (1) to LNK Cust-Port (M), and Hub Customer (1) to LNK Cust-Port (M).
      This design is flexible and handles past, present, and future relationship changes with NO RE-ENGINEERING!
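To make the Link argument concrete, the sketch below stores Customer-to-Portfolio pairs in one structure and derives the cardinality from the rows themselves, so all three eras fit without a model change. The function name `observed_cardinality` and the sample sequence numbers are hypothetical.

```python
from collections import Counter


def observed_cardinality(link_rows):
    """Return (customer_side, portfolio_side), each "1" or "M".

    link_rows holds (customer_sqn, portfolio_sqn) pairs, as a
    two-hub Link table would.
    """
    pairs = set(link_rows)  # mirrors the Link's unique index over its hub keys
    portfolios_per_customer = Counter(cust for cust, _ in pairs)
    customers_per_portfolio = Counter(port for _, port in pairs)
    customer_side = "M" if any(n > 1 for n in customers_per_portfolio.values()) else "1"
    portfolio_side = "M" if any(n > 1 for n in portfolios_per_customer.values()) else "1"
    return customer_side, portfolio_side


# The same structure holds every era from the slide:
ten_years_ago = [(1, 10), (1, 11)]            # one customer, many portfolios
today = [(1, 10), (2, 10), (3, 10)]           # one portfolio, many customers
five_years_from_now = today + ten_years_ago   # many to many
```

Only the data changes between the three lists; the pair layout, and therefore any load routine written against it, stays the same.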
  134. Applying the Data Vault to Global DW2.0
      Diagram: a base EDW created in Corporate, extended by regional Hubs, Links, and Satellites: Manufacturing EDW in China, Planning in Brazil, Financials in USA.
  135. Extreme Data Vault Partitioning
  136. Query Performance: Point-in-Time and Bridge Tables, Overcoming Query Issues
  137. Purpose Of PIT & Bridge
      To reduce the number of joins, and to reduce the amount of data being queried for a given range of time.
      Together, these two allow a “direct table match” as well as table elimination to occur in the queries.
      These tables are not necessary for the entire model, only when:
      • Massive amounts of data are found
      • Large numbers of Satellites surround a Hub or Link
      • A large query across multiple Hubs & Links is necessary
      • Real-time data is flowing in, uninterrupted
      What are they? Snapshot tables, specifically built for query speed
  138. PIT Table Architecture
      Satellite: Point In Time
      PARENT SEQUENCE            (primary key)
      LOAD DATE                  (primary key)
      {Satellite 1 Load Date}
      {Satellite 2 Load Date}
      {Satellite 3 Load Date}
      {…}
      {Satellite N Load Date}
      Diagram: PIT Satellites attached to Hub Order, Hub Customer, Hub Product, and Link Line Item, each alongside that Hub’s or Link’s ordinary Satellites.
  139. PIT Table Example
      SAT_CUST_CONTACT_NAME
      SQN  LOAD_DTS    NAME
      1    10-14-2000  Dan L
      1    11-01-2000  Dan Linedt
      1    12-31-2000  Dan Linstedt

      SAT_CUST_CONTACT_CELL
      SQN  LOAD_DTS    CELL
      1    10-14-2000  999-555-1212
      1    10-15-2000  999-111-1234
      1    10-16-2000  999-252-2834
      1    10-17-2000  999.257-2837
      1    10-18-2000  999-273-5555

      SAT_CUST_CONTACT_ADDR
      SQN  LOAD_DTS    ADDR
      1    08-01-2000  26 Prospect
      1    09-29-2000  26 Prosp St.
      1    12-17-2000  28 November
      1    01-01-2001  26 Prospect St

      PIT table (LOAD_DTS is the snapshot date)
      SQN  LOAD_DTS    SAT_NAME_LDTS  SAT_CELL_LDTS  SAT_ADDR_LDTS
      1    08-01-2000  NULL           NULL           08-01-2000
      1    09-01-2000  NULL           NULL           08-01-2000
      1    10-01-2000  NULL           NULL           09-29-2000
      1    11-01-2000  11-01-2000     10-18-2000     09-29-2000
      1    12-01-2000  11-01-2000     10-18-2000     09-29-2000
      1    01-01-2001  12-31-2000     10-18-2000     01-01-2001
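The PIT rows in the example follow one rule: for each snapshot date, record the most recent load date in each Satellite that is on or before that date, or NULL if there is none. A minimal Python sketch of that rule follows, using the load dates from the slide; the function names `latest_on_or_before` and `build_pit` are hypothetical.

```python
from bisect import bisect_right
from datetime import date


def latest_on_or_before(load_dates, snapshot):
    """Return the newest load date <= snapshot, or None if there is none."""
    ordered = sorted(load_dates)
    i = bisect_right(ordered, snapshot)
    return ordered[i - 1] if i else None


def build_pit(satellite_load_dates, snapshots):
    """Build one PIT row per snapshot date for a single parent sequence."""
    return [
        {"snapshot": snap,
         **{name: latest_on_or_before(dts, snap)
            for name, dts in satellite_load_dates.items()}}
        for snap in snapshots
    ]


# Load dates copied from the three example Satellites (SQN 1).
sats = {
    "name": [date(2000, 10, 14), date(2000, 11, 1), date(2000, 12, 31)],
    "cell": [date(2000, 10, d) for d in (14, 15, 16, 17, 18)],
    "addr": [date(2000, 8, 1), date(2000, 9, 29), date(2000, 12, 17), date(2001, 1, 1)],
}
snapshots = [date(2000, 8, 1), date(2000, 9, 1), date(2000, 10, 1),
             date(2000, 11, 1), date(2000, 12, 1), date(2001, 1, 1)]
pit = build_pit(sats, snapshots)
```

A query for a given snapshot can then join each Satellite on an exact (sequence, load date) match taken from the PIT row instead of searching date ranges, which is the "direct table match" the deck refers to.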
  140. Bridge Table Architecture
      Satellite: Bridge
      UNIQUE SEQUENCE         (primary key)
      LOAD DATE               (primary key)
      {Hub 1 Sequence #}
      {Hub 2 Sequence #}
      {Hub 3 Sequence #}
      {Link 1 Sequence #}
      {Link 2 Sequence #}
      {…}
      {Link N Sequence #}
      {Hub 1 Business Key}
      {Hub 2 Business Key}
      {…}
      {Hub N Business Key}
      Diagram: a Bridge spanning Hub Parts, Hub Seller, and Hub Product and the two Links between them, alongside their Satellites.
  141. Bridge Table Data Example
      Bridge Table: Seller by Product by Part (LOAD_DTS is the snapshot date)
      SQN  LOAD_DTS    SELL_SQN  SELL_ID  PROD_SQN  PROD_NUM    PART_SQN  PART_NUM
      1    08-01-2000  15        NY*1     2756      ABC-123-9K  525       JK*2*4
      2    09-01-2000  16        CO*242654          DEF-847-0L  324       MN*5-2
      3    10-01-2000  16        CO*2482374         PPA-252-2A  9938      DD*2*3
      4    11-01-2000  24        AZ*2525222         UIF-525-88  7         UF*9*0
      5    12-01-2000  99        NM*581             DAN-347-7F  16        KI*9-2
      6    01-01-2001  99        NM*581             DAN-347-7F  24        DL*0-5
      (In rows 2 to 6 the transcript runs SELL_ID and PROD_SQN together; the boundary between them is not recoverable.)
  142. What WASN’T Covered
      • ETL Automation
      • ETL Implementation
      • SQL Query Logic
      • Balanced MPP design
      • Data Vault Modeling on Appliances
      • Deep Dive on Structures (Hubs, Links, Satellites)
      • What happens when you break the rules?
      • Project management, risk management & mitigation, methodology & approach
      • Automation: automated DV modeling, automated ETL production
      • Change Management
      • Temporal Data Modeling Concerns… and so on…
  143. Conclusions
  144. Who’s Using It?
  145. The Experts Say…
      “The Data Vault is the optimal choice for modeling the EDW in the DW 2.0 framework.” Bill Inmon
      “The Data Vault is foundationally strong and exceptionally scalable architecture.” Stephen Brobst
      “The Data Vault is a technique which some industry experts have predicted may spark a revolution as the next big thing in data modeling for enterprise warehousing....” Doug Laney
  146. More Notables…
      “This enables organizations to take control of their data warehousing destiny, supporting better and more relevant data warehouses in less time than before.” Howard Dresner
      “[The Data Vault] captures a practical body of knowledge for data warehouse development which both agile and traditional practitioners will benefit from..” Scott Ambler
  147. 147. Where To Learn More<br />The Technical Modeling Book: http://LearnDataVault.com<br />The Discussion Forums: & eventshttp://LinkedIn.com – Data Vault Discussions<br />Contact me:http://DanLinstedt.com - web siteDanLinstedt@gmail.com - email<br />World wide User Group (Free)http://dvusergroup.com<br />Certification Training:<br />Contact me, or learn more at: http://GeneseeAcademy.com<br />94<br />
  148. 148. ODV – Case Study<br />Operational Data Vault – IN THE REAL WORLD!<br />
  149. 149. E-Pedigree, Drug Track & Trace<br />Diagram labels:<br />Product Returns and Recalls<br />Product Packaging<br />CorpSite Server<br />Secure Integration Services<br />Corporate Serialization Vault<br />Serialization Analytics Engine<br />Packaging Orders<br />Product Authenticator<br />3rd Party Logistics<br />Distribution Warehouse<br />Secure Integration Services<br />E-Pedigree Management<br />Manufacturer<br />Product Packager<br />Supply Chain<br />
  150. 150. Label Serialization Vault<br />ERP<br />Product Master Data<br />EPC Global Standards<br />Corp Domain<br />Corp Applications<br />Serialization Vault<br />CustPkg Line Data<br />E-Pedigree<br />WS/SOAP<br />Master Data<br /><ul><li>Products</li><li>Locations</li><li>Trading Partners</li><li>Users</li></ul>Shipping Data<br /><ul><li>Transactions</li></ul>Shipping Reasons<br />Serialization Marts<br />Warehouse (WMS)<br />Flat Files<br />WS/SOAP<br />ASN<br />Serialization Vault<br />Global – Master Data<br />Local – Private Data<br />Serialization/Packaging Data<br /><ul><li>Serial #’s</li><li>Hierarchical Relationships</li><li>Containers</li></ul>
  155. 155. Corporate Security<br />Pros<br /><ul><li>Unique Logins Limit Access</li><li>Physical Data Separation in Logical “Database” units</li><li>No single login has 100% data access.</li><li>Customers can be CHARGED for disk space, indexing, utilization</li></ul>Cons<br /><ul><li>Maintenance, Backup and Restore</li><li>Changes to the data model ripple (larger impacts) as more customers are signed up.</li><li>Each “support call” requires separate login to see the data set.</li></ul>Data Exchange/Sharing Through Code Only<br />Web-Services and Flat File Delivery<br />Customer Login<br />Corp Login<br />Customer Login<br />Corp Login<br />Employee Validation<br />Admin Login<br />Encrypt Key<br />Encrypt Key<br />Encrypt Key<br />Mart 1<br />Mart 2<br />Mart 3<br />Mart 1<br />Mart 2<br />Mart 3<br />Tracking #<br />Machine Info<br />SQL View Layer<br />SQL View Layer<br />Global<br />Data Vault<br />Data Vault<br />Manufacturer<br />Shipper<br />9/27/2011<br />
  161. 161. Web Services File Delivery<br />Web-Services and Flat File Delivery<br />Machine<br />Local DB<br />Machine<br />Global DB<br />Machine<br />Local DB<br />Machine<br /><ul><li>Encryption at multiple levels</li><li>Multi-machine Utilization</li><li>RAM-based encryption/decryption through services</li></ul>
  163. 163. Secure Machine Transfers<br />External IP Cards<br />Web-Services and Flat File Delivery Machine<br />Encrypted Local Director Database<br />Encrypt / Decrypt<br />HTTPS layer<br />Encrypt / Decrypt<br />DBMS<br />Machine<br />VPN Tunnel<br />Encrypted / Compressed Storage<br />
  164. 164. Secure Client Data Interchange<br /><ul><li>Decrypt using Corp Key, then Re-Encrypt with Customer Unique Key before storing</li><li>Customer Owned Key (Dictated by Customer)</li><li>Corporate Owned Key (Encrypts data internally)</li></ul>Corp Managed / Owned Copy<br />Web Services<br />Customer Copy<br />Customer Login<br />Corp Login<br />+HTTPS<br />Corp Encrypt Key<br />Web Services<br />Encrypted Flat Files<br />Decryption Key<br />+ SFTP<br />Customer Local Copy<br />
  167. 167. Security: ODV Web Services<br />Corp Managed / Owned Copy<br />Web Browser<br />Web Site / Server<br />JavaScript or PHP<br />Web Services<br />Customer Login<br />Corp Login<br />Corporate Encrypt Key<br />Corporate Owned Encryption Key<br />Global DB<br />
  168. 168. Inflow/Outflow Applications<br />Customer<br />Corporation<br />Corporation<br />Customer<br />Source Machine Encrypts Data Using Customer Key<br />Corp Decrypts Data According to Customer Key<br />Corp Re-Encrypts Data According to Internal Key For Specific Customer<br />Corp Decrypts Data According to Internal Key For Specific Customer<br />Corp Encrypts Data According to Customer Key<br />Customer Decrypts Data According to Customer Key<br />DB<br />DB<br />Transmit Encrypted Data over HTTPS<br />Transmit Encrypted Data over HTTPS<br />Web Service Sender<br />Web Service Collector<br />
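The inflow and outflow steps on this slide can be sketched as a key hand-off. This is an illustration under stated assumptions, not the system's implementation: the slides do not name a cipher, so `encrypt` and `decrypt` below are a toy SHA-256 keystream XOR that is NOT secure and only stands in for a real authenticated cipher. The function names `corp_inflow` and `corp_outflow` and the sample record are hypothetical, and the HTTPS transport is omitted.

```python
import hashlib
import secrets


def _keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Toy keystream from SHA-256(key + nonce + counter). NOT secure."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])


def encrypt(key: bytes, plaintext: bytes) -> bytes:
    """Placeholder cipher: prepend a random nonce, XOR with the keystream."""
    nonce = secrets.token_bytes(16)
    stream = _keystream(key, nonce, len(plaintext))
    return nonce + bytes(p ^ s for p, s in zip(plaintext, stream))


def decrypt(key: bytes, blob: bytes) -> bytes:
    """Inverse of encrypt(): split off the nonce, XOR with the keystream."""
    nonce, body = blob[:16], blob[16:]
    stream = _keystream(key, nonce, len(body))
    return bytes(c ^ s for c, s in zip(body, stream))


def corp_inflow(blob: bytes, customer_key: bytes, internal_key: bytes) -> bytes:
    """Inflow: decrypt with the customer key, re-encrypt with the internal key."""
    return encrypt(internal_key, decrypt(customer_key, blob))


def corp_outflow(stored: bytes, customer_key: bytes, internal_key: bytes) -> bytes:
    """Outflow: decrypt with the internal key, encrypt with the customer key."""
    return encrypt(customer_key, decrypt(internal_key, stored))


customer_key = secrets.token_bytes(32)   # customer-owned key
internal_key = secrets.token_bytes(32)   # corp-owned key for this one customer

record = b"illustrative payload"                          # made-up sample data
sent = encrypt(customer_key, record)                      # source machine
stored = corp_inflow(sent, customer_key, internal_key)    # what the corp DB holds
returned = corp_outflow(stored, customer_key, internal_key)
received = decrypt(customer_key, returned)                # customer side
print(received == record)
```

The point of the two-key arrangement is that data at rest on the corporate side is never stored under the customer's key, and data in transit is never sent under the internal key.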
  169. 169. ODV: Secure File Request<br />Corporation<br />Customer<br />** Note: Each Customer DB is encrypted via an internally owned Corp key which is unique to EACH customer.<br />Customer Decrypts File According to Customer Key<br />Transmit Encrypted Data over FTPS<br />Encrypted File<br />
  170. 170. ODV: Front-End Ping Request<br />Corporation<br />Customer<br />Corp One-Way Hash of Key Number To Execute Ping<br />Web-Based PING Validation<br />DBMS<br />Unencrypted Data Transfer<br />Login / Auth<br />
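One way to read the ping slide is that the caller presents a one-way hash of a key number rather than the key number itself, so nothing sensitive crosses the unencrypted transfer. The sketch below assumes that reading. SHA-256 is an assumed hash (the slide does not name one), the function names are hypothetical, and the sample key numbers are made up.

```python
import hashlib
import hmac


def one_way_hash(key_number: str) -> str:
    """Hash a key number so the raw value never leaves the caller."""
    return hashlib.sha256(key_number.encode("utf-8")).hexdigest()


def validate_ping(presented_hash: str, known_hashes: set) -> bool:
    """Accept the ping only if the presented hash matches a stored hash.

    hmac.compare_digest is used for a timing-safe comparison.
    """
    return any(hmac.compare_digest(presented_hash, h) for h in known_hashes)


# Corporate side stores only hashes of the key numbers it has issued.
known = {one_way_hash("KEY-0001"), one_way_hash("KEY-0002")}

print(validate_ping(one_way_hash("KEY-0001"), known))  # issued key number
print(validate_ping(one_way_hash("KEY-9999"), known))  # unknown key number
```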
