Operational Data Vault

I gave this presentation at the Advanced Architecture Conference (Bill Inmon, 2011) in Evergreen, Colorado. It covers a new breed of data warehousing called Operational Data Warehousing: the next steps in business intelligence toward self-service BI, enabling users to do more with their enterprise data warehouse solution. Specifically, it talks about how the Data Vault model fits into this picture.

If you would like to use the slides, please e-mail me first; I'd be happy to discuss it with you.

Published in: Technology, Business
  • The agenda in this presentation is not quite right (sorry folks); I will fix the agenda slide and re-upload this presentation in a couple of days. Don't forget: you can watch the YouTube videos at http://YouTube.com/LearnDataVault, sign up for the Facebook page (LearnDataVault), or purchase the book at http://LearnDataVault.com

    Thanks, Dan L

  • Before we begin exploring how the Data Vault can help you, or even defining what a Data Vault is, we need to first understand some of the business problems that may be causing you heartburn on a daily basis.
  • Everything from poor agility to a lack of IT transparency plagues today’s data warehouses. I can’t begin to tell you how much pain these businesses are suffering as a result of these problems. Inconsistent answer sets, lack of accountability, and inadequate auditability all play a part in data warehouses that are currently on the brink of falling apart. But it’s not just business issues; there are technical ones to cope with as well.
  • There are always technology obstacles that we face in any data warehousing project. So the question is: what kinds of problems have you seen in your journey? Do they haunt you today?
  • Complexity drives high cost, resulting in unnecessarily late delivery schedules and unsustainable business logic in the integration channels. Real-time data is flooding our data warehouses; has your architecture fallen down on the job? Unstructured data and legal requirements for auditability are bringing huge data volumes. Master Data Alignment is missing from our data warehouses, as they are split across disparate systems all over the world. Bad data quality is covered up through the transformation layers on the way IN to your EDW. Data warehouses grow so large and become so difficult to maintain that IT teams are often delivering late and beyond original costs. The foundations of your data warehouse are probably crumbling under sheer weight and pressure.
  • Disparate data marts, unmatched answer sets, geographical problems, and worse… Projects are under fire from a number of areas. Let’s take a look at what happens when a data warehouse project reaches the brick wall head-on, at 90 miles an hour.
  • I think this says it all… Projects cancelled and restarted, re-engineering required to absorb changes, and high complexity making it difficult to upgrade, change, and keep up at the speed of business. Disparate silo solutions screaming for consolidation, and of course a lack of accountability on BOTH sides of the fence… All signs of an ailing BI solution on the brink of being shut down.
  • We have got to keep focus on the prize. Business still wants a BI system backed by an EDW. IT still wants a manageable system that will grow and change without major re-engineering. There is a better way, and I can help you with it.
  • The Data Vault model is really just another name for “Common foundational architecture and design”. It’s based on 10 years of research and design work, followed by 10 years of implementation best practices. It is architected to help you solve the problems!
  • Put quite simply: it’s an easy-to-use architecture and plan, a guide-book for building a repeatable, consistent, and scalable data warehouse system. So just what is the value of the Data Vault?
  • The Data Vault model and methodology provide: Painless Auditability, Understandable Standards, Rapid Adaptability, Simple Build-out, Uncomplicated Design, and Effortless Scalability. Go after your goals; build a wildly successful data warehouse just like I have.
  • At the beginning: 5 advanced ETL developers. By the 1st month, they had 5 advanced and 15 basic/intro. By the 6th month, they had 5 advanced but 50 basic. By the end of the 8th month they went to production with 10 MF sources, and their team size was 12 people (5 advanced, 7 basic, for support).
  • You’re not the first, nor will you be the last one to use it. Some of the world’s biggest companies are implementing Data Vaults, from Daimler Motors to Lockheed Martin to the Department of Defense. JPMorgan and Chase used the Data Vault model to merge 3 companies in 90 days!
  • Operational Data Vault

    1. Data Vault: What’s Next?
        © Dan Linstedt, 2011-2012 all rights reserved
    2. Agenda
        Introduction: why are you here?
        Short Data Vault Review
        What’s Next? Advanced Architecture…
        Defining Operational Data Warehousing
        Why is Data Vault a Good Fit?
        (BREAK)
        Fundamental Paradigm Shift
        Business Keys & Business Processes
        Technical Review
        Query Performance (PIT & Bridge)
        What wasn’t covered in this presentation…
    3. A bit about me…
        Author, Inventor, Speaker, and part-time photographer…
        25+ years in the IT industry
        Worked in DoD, US Gov’t, Fortune 50, and so on…
        Find out more about the Data Vault:
        http://YouTube.com/LearnDataVault
        http://LearnDataVault.com
        Slides available:
        http://SlideShare.net
        Search: “Advanced Architecture Data Vault”
        Full profile on http://www.LinkedIn.com/dlinstedt
    4. Why Are You Here?
        Your Expectations?
        Your Questions?
        Your Background?
        Areas of Interest?
        Biggest question:
        What are the top 3 pains your current EDW / BI solution is experiencing?
    5. Short Data Vault Review
        What is it and where did it come from?
    6. Data Warehousing Timeline
        [Timeline, 1960 to 2010]
        E.F. Codd invented relational modeling
        Mid 60’s: Dimension & Fact Modeling presented by General Mills and Dartmouth University
        Early 70’s: Bill Inmon began discussing Data Warehousing
        Mid 70’s: AC Nielsen popularized Dimension & Fact terms
        1976: Dr Peter Chen created E-R Diagramming
        Chris Date and Hugh Darwen maintained and refined modeling
        Mid 80’s: Bill Inmon popularizes Data Warehousing
        Mid to Late 80’s: Dr Kimball popularizes Star Schema
        Late 80’s: Barry Devlin and Dr Kimball release “Business Data Warehouse”
        1990: Dan Linstedt begins R&D on Data Vault Modeling
        2000: Dan Linstedt releases first 5 articles on Data Vault Modeling
        2010: DV alive and well around the world
    7. Data Vault Modeling…
        Took 10 years of Research and Design, including TESTING, to become flexible, consistent, and scalable
    8. What IS a Data Vault? (Business Definition)
        Data Vault Model
        Detail oriented
        Historical traceability
        Uniquely linked set of normalized tables
        Supports one or more functional areas of business
        Data Vault Methodology
        • CMMI, Project Plan
        • Risk, Governance, Versioning
        • Peer Reviews, Release Cycles
        • Repeatable, Consistent, Optimized
        • Complete with Best Practices for BI/DW
        Business Keys Span / Cross Lines of Business
        Functional Areas: Sales, Contracts, Planning, Delivery, Finance, Operations, Procurement
    14. Supply Chain Analogy
        Source Systems -> Data Vault (EDW) -> Data Marts
    15. What Does One Look Like?
        [Diagram: Customer, Product, and Order Hubs with their Satellites (Sat), connected by a Link with its own Satellites. Caption: Records a history of the interaction]
        Elements:
        • Hub
        • Link
        • Satellite
        Hub = List of Unique Business Keys
        Link = List of Relationships, Associations
        Satellites = Descriptive Data
    18. Colorized Perspective…
        [Diagram: 3rd NF & Star Schema compared with the Data Vault (separation). Business Keys = HUB, Associations = LINK, Details = Satellite]
        The Data Vault uniquely separates the Business Keys (Hubs) from the Associations (Links) and both of these from the Details that describe them and provide context (Satellites).
        (Colors Concept Originated By: Hans Hultgren)
    19. A Quick Look at Methodology Issues
        Business Rule Processing, Lack of Agility, and Future proofing your new solution
    20. EDW Architecture: Generation 1
        [Diagram labels: Sales, Finance, Contracts; (batch) Staging; Complex Business Rules + Dependencies; (EDW) Staging + History; Star Schemas; Conformed Dimensions; Junk Tables; Helper Tables; Factless Facts; Complex Business Rules #2; Enterprise BI Solution]
        • Quality routines
        • Cross-system dependencies
        • Source data filtering
        • In-process data manipulation
        • High risk of incorrect data aggregation
        • Larger system = increased impact
        • Often re-engineered at the SOURCE
        • History can be destroyed (completely re-computed)
    #1 Cause of BI Initiative Failure
        Anyone?
        Re-Engineering For Every Change!
        Let’s take a look at one example…
    28. Re-Engineering
        [Diagram labels: Business Rules; Data Flow (Mapping); Current Sources; Sales; Finance; Source Join; Customer; Customer Transactions; Customer Purchases; ** NEW SYSTEM **; IMPACT!!]
    29. Federated Star Schema Inhibiting Agility
        [Chart: Effort & Cost (Low to High) against Time (Start; Maintenance Cycle Begins), with Data Mart 1, Data Mart 2, and Data Mart 3]
        Changing and adjusting conformed dimensions causes an exponential rise in the cost curve over time.
        RESULT: Business builds their own Data Marts!
        The main driver for this is the maintenance cost and the re-engineering of the existing system, which occurs for each new “federated/conformed” effort. This increases delivery time, difficulty, and maintenance costs.
    30. EDW Architecture: Generation 2
        [Diagram labels: SOA; (real-time); Sales, Finance, Contracts; (batch) Staging; EDW (Data Vault); Complex Business Rules; Star Schemas; Error Marts; Report Collections; Unstructured Data; Enterprise BI Solution]
        FUNDAMENTAL GOALS
        • Repeatable
        • Consistent
        • Fault-tolerant
        • Supports phased release
        • Scalable
        • Auditable
        The business rules are moved closer to the business, improving IT reaction time, reducing cost, and minimizing impacts to the enterprise data warehouse (EDW).
    36. NO Re-Engineering
        [Diagram labels: Current Sources; Sales; Finance; Stage Copy (one per source); Data Vault; Hub Customer; Hub Acct; Hub Product; Link Transaction; Customer; Customer Transactions; Customer Purchases; ** NEW SYSTEM **; IMPACT!!]
        NO IMPACT!!!
        NO RE-ENGINEERING!
    37. Progressive Agility and Responsiveness of IT
        [Chart: Effort & Cost (Low to High) against Time (Start; Maintenance Cycle Begins), with Foundational Base Built, Initial DV Build Out, and New Functional Areas Added]
        Re-Engineering does NOT occur with a Data Vault Model. This keeps costs down and maintenance easy. It also reduces complexity of the existing architecture.
    38. Why is Data Vault a Good Fit?
    39. What are the top business obstacles in your data warehouse today?
    40. Poor Agility
        Inconsistent Answer Sets
        Needs Accountability
        Demands Auditability
        Desires IT Transparency
        Are you feeling Pinned Down?
    41. What are the top technology obstacles in your data warehouse today?
    42. Complex Systems
        Real-Time Data Arrival
        Unimaginable Data Growth
        Master Data Alignment
        Bad Data Quality
        Late Delivery/Over Budget
        Are your systems CRUMBLING?
    43. Existing Solutions
        Have led you down a painful path…
        [Image: Yugo, “World’s Worst Car”]
    44. Projects Cancelled & Restarted
        Re-engineering required to absorb new systems
        Complexity drives maintenance cost sky high
        Disparate Silo Solutions provide inaccurate answers!
        Severe lack of Accountability
    45. How can you overcome these obstacles?
        There must be a better way…
        There IS a better way!
    46. It’s Called the Data Vault Model and Methodology
    47. What is it?
        It’s a simple, easy-to-use plan to build your valuable Data Warehouse!
    48. What’s the Value?
        Painless Auditability
        Understandable Standards
        Rapid Adaptability
        Simple Build-out
        Uncomplicated Design
        Effortless Scalability
        Pursue Your Goals!
    49. Why Bother With Something New?
        Old Chinese proverb:
        'Unless you change direction, you're apt to end up where you're headed.'
    50. What Are the Issues?
        This is NOT what you want happening to your project!
        Business…: Changes Frequently; Needs Accountability; Demands Auditability; Has No Visibility; Wants More Control
        IT…: Takes Too Long; Is Over-budget; Too Complex; Can’t Sustain Growth
        THE GAP!!
    51. What Are the Foundational Keys?
        Flexibility
        Scalability
        Productivity
    52. Key: Flexibility
        Enabling rapid change on a massive scale without downstream impacts!
    53. Key: Scalability
        Providing no foreseeable barrier to increased size and scope
        People, Process, & Architecture!
    54. Key: Productivity
        Enabling low complexity systems with high value output at a rapid pace
    55. How does it work?
        Bringing the Data Vault to Your Project
    56. Key: Flexibility
        No Re-Engineering!
        Adding new components to the EDW has NEAR ZERO impact to:
        • Existing Loading Processes
        • Existing Data Model
        • Existing Reporting & BI Functions
        • Existing Source Systems
        • Existing Star Schemas and Data Marts
    61. Case In Point:
        The flexibility of the Data Vault Model allowed them to merge 3 companies in 90 days: that is ALL systems, ALL DATA!
    62. Key: Scalability in Architecture
        Scaling is easy; it’s based on the following principles:
        • Hub and spoke design
        • MPP Shared-Nothing Architecture
        • Scale Free Networks
    65. Case In Point:
        Result of scalability was to produce a Data Vault model that scaled to 3 Petabytes in size, and is still growing today!
    66. Key: Scalability in Team Size
        You should be able to SCALE your TEAM as well!
        With the Data Vault methodology, you can:
        Scale your team when desired, at different points in the project!
    67. Case In Point:
        (Dutch Tax Authority)
        Result of scalability was to increase ETL developers for each new source system, and reassign them when the system was completely loaded to the Data Vault
    68. Key: Productivity
        Increasing Productivity requires a reduction in complexity.
        The Data Vault Model simplifies all of the following:
        • ETL Loading Routines
        • Real-Time Ingestion of Data
        • Data Modeling for the EDW
        • Enhancing and Adapting for Change to the Model
        • Ease of Monitoring, managing and optimizing processes
    73. Case in Point:
        Result of Productivity was: 2 people in 2 weeks merged 3 systems, built a full Data Vault EDW, 5 star schemas and 3 reports.
        These individuals generated:
        • 90% of the ETL code for moving the data set
        • 100% of the Staging Data Model
        • 75% of the finished EDW data Model
        • 75% of the star schema data model
    77. The Competing Bid?
        The competition bid this with 15 people and 3 months to completion, at a cost of $250k! (they bid a Very complex system)
        Our total cost? $30k and 2 weeks!
    78. Results?
        Changing the direction of the river takes less effort than stopping the flow of water
    79. (BREAK TIME)
    80. What’s Next?
        A look at what’s around the corner for Data Warehousing and Business Intelligence. Believe me, it’s going to get interesting fast.
    81. Operational Data Vault
        Data Co-Location:
        Transactions & Transaction History
        Master Data & Master Data History
        Metadata & Metadata History
        External Data & External Data History
        Business Rules & Business Rule History
        Security / Access data & History
        Unstructured Data Ties & History
        Real-time Data Feeds DIRECTLY into the data store
        Operational Applications ON TOP of the warehouse!
    82. Extreme Automation!
        Automated Creation of Data Models:
        • Staging Models
        • Data Vault Models
        • Star Schema Models
        • Cube Models
        • Excel Models (spreadsheets)
        • Data Mining Models (table structures)
        Automated Creation of ETL Processes:
        • Staging Loads
        • Data Vault (Data Warehouse Loads)
        • Star Schema Loads (80% solutions)
        • Cube Loads (80% solutions)
        • Excel Loads / Queries (80% solutions)
        • Data Mining Queries (80% solutions)
        Other Automated Components:
        • Initial Metadata Population
        • Initial Master Data Population
        • Generated Testing Scripts
        http://www.jmorganmarketing.com/should-social-crm-be-automated/
    95. Results of all of this?
        EDW Will:
        become BACK OFFICE!!
        become SELF-RELIANT / SELF-HEALING
        adapt to new structures, new hardware, and new data
        automatically backup and remove old data
        Self-Reliance
        http://images.businessweek.com/ss/06/10/bestunder25/source/1.htm
    96. How Long Will it Take?
        My milestone predictions:
        1 yr: Operational Data Vault
        2 yrs: Beginning automation of business rules
        3 yrs: Beginning dynamic restructuring in the DV
        4 yrs: Oper Apps contain BI & metadata & Master data GUIs in a single place
        5 yrs: the “all-in-one” appliance, containing 75% of what we need at the firmware levels to do all these things
        http://thypolarlife.wordpress.com/2011/08/02/this-moment-in-time/
    97. Why Should I Care?
        • Because the Data Warehouse, combined with the operational applications on top, makes for a self-service BI environment
        • Because this technology is the heart of Data Warehousing!
        • Because the future is now
        • Because it will happen with or without you… You do want a job, right?
    What About Tooling?
        [Diagram labels: Data Patterns; New Models; Automation; Target DDL; ETL Code; Source DDL; Documentation; Ontology; Test Data; Cross-Reference; SQL Code; Templates; Config; Metadata & Business Rules!]
    101. Who’s Tooling Today?
        WhereScape
        Quipu
        AnalytixDS
        RapidACE
        Nexus
        BI-Ready
        Centennium
    102. What Does It Add Up To?
        Pervasive BI
    103. What’s the Key Ingredient?
        Ubiquitous A.I.
    104. Defining Operational Data Warehousing
        What is an ODW and How did we get here?
    105. What IS An Operational DW?
        A raw, time-variant, integrated, non-volatile data warehouse, on top of which sits an operational application “editing and changing data”.
        However, instead of updates and deletes in place, the data is “marked” deleted, and updates are turned into Inserts, creating a delta audit trail along the way.
        Yes, it’s an operational application on top of the integrated data warehouse (or in this case, Data Vault model).
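
The insert-only behaviour described on this slide can be sketched roughly as follows. This is a minimal illustration, not the presenter's implementation: the table `sat_cust_contact`, its columns, the `is_deleted` flag, and the helper functions are all hypothetical names chosen for the example, and SQLite stands in for whatever database the warehouse actually runs on.

```python
import sqlite3

# Hypothetical satellite table; names and columns are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE sat_cust_contact (
        parent_sqn INTEGER NOT NULL,   -- sequence of the parent Hub row
        load_dts   TEXT    NOT NULL,   -- ISO-8601 timestamp of this version
        phone      TEXT,
        is_deleted INTEGER NOT NULL DEFAULT 0,
        PRIMARY KEY (parent_sqn, load_dts)
    )
""")


def latest_version(parent_sqn):
    """Return the newest (phone, is_deleted) row for a key, or None."""
    return conn.execute(
        "SELECT phone, is_deleted FROM sat_cust_contact "
        "WHERE parent_sqn = ? ORDER BY load_dts DESC LIMIT 1",
        (parent_sqn,),
    ).fetchone()


def app_save(parent_sqn, load_dts, phone):
    """An application 'insert' or 'update': always written as a new row."""
    conn.execute(
        "INSERT INTO sat_cust_contact (parent_sqn, load_dts, phone) "
        "VALUES (?, ?, ?)",
        (parent_sqn, load_dts, phone),
    )


def app_delete(parent_sqn, load_dts):
    """An application 'delete': a new row that marks the record deleted."""
    previous = latest_version(parent_sqn)
    phone = previous[0] if previous else None
    conn.execute(
        "INSERT INTO sat_cust_contact VALUES (?, ?, ?, 1)",
        (parent_sqn, load_dts, phone),
    )


def current_phone(parent_sqn):
    """What the application shows: the newest version unless marked deleted."""
    row = latest_version(parent_sqn)
    return None if row is None or row[1] else row[0]


app_save(1, "2000-10-14T09:00:00", "999-555-1212")
app_save(1, "2000-10-15T09:00:00", "999-111-1234")  # update becomes an insert
phone_after_update = current_phone(1)
app_delete(1, "2000-10-16T09:00:00")                 # delete becomes a marker row
phone_after_delete = current_phone(1)
audit_trail = conn.execute(
    "SELECT load_dts, phone, is_deleted FROM sat_cust_contact ORDER BY load_dts"
).fetchall()
```

Nothing is ever updated or removed in place, so `audit_trail` holds every version the application wrote, which is the delta audit trail the slide refers to.
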
    106. Oper/Active DW Timeline
        [Timeline, 1980 to 2010; labels as extracted from the slide]
        Data Warehouses Split From Operational Systems
        Mid 90’s: “Active” DW Becomes Important, But has to wait for Technology To Catch Up!
        Teradata makes Real advances in Active DW
        “Appliances” begin appearing On-scene
        2002: Cendant-TRG Creates World’s First Operational Data Vault
        Real-Time & Oper BI Make the Scene (Users Want Direct Control & Up to the Minute Data)
    107. How Did We Get Here?
        ODW (6): Can you change what is happening? System Of Record; Application Direct Edits to Data in the EDW
        DDW (7): How do you dynamically adapt to business? Dynamic Alterations To Structure
        Parts are © Teradata, Stephen Brobst, CTO
    108. ODV Overview
        [Diagram: feeds into and out of the ODV]
        Applications (Direct edits)
        Web-Services (Direct Feeds)
        Virtual Marts (Direct Access)
        Unstructured Feeds (Indirect Feeds)
        Metadata Rules (Direct Edits)
        Batch Loads (Direct Feeds)
        Direct Inserts
        NO STAGING AREA
    109. What is the architecture?
        [Diagram labels: Operational Metadata Management; Operational Applications; Master Data; Strategic Reports & OLAP; Direct Edits; Web Interface (usually); Real-Time Collector; SOR Real-Time Data; Data Vault EDW; Operational Systems; Unstructured / Semi-Structured; Staging Area; Non-S.O.R. Historical Batch Data; Non-SOR Batch Data; Operational Alerts; Virtual Marts; Real-Time Mining Engine]
        • Stored
        • Analyzed / Scored
        • Flexible
        • Accountable
        • Compliant
        • Scalable
        • Normalized
        • Dynamic
        • Granular
        • Historic
        • Integrated by business key
    What must an ODW have?
        Operational Application(s) on top of the single data store
        All the up-time and maintenance requirements of a standard operational application (24x7x365, six 9’s reliability, etc…)
        Inflow and outflow of information; bi-directional data flow to & from the service bus (SOA/ESB, etc.)
        Capacity to incorporate and store existing batch loads and accept real-time data from other feeds
        Ability to interface with unstructured data sets
        All the inherent design necessities of an EDW
    119. Why should I care?
        TWO REASONS:
        CONVERGENCE
        SELF-SERVICE BI
    120. Under the Covers…
        Application: presents data to the user in conformed screens
        Data Access Control Layer
        Operational Data Vault (ODW) Layer: Hub Parts, Hub Seller, Hub Product, Links, and Satellites (Sat 1 to Sat 4)
        1. Read Data for Edit
        2. Lock Business Key Rows
        3. Present in GUI
        4. Accept Ins, Upd, Del
        5. Perform Insert / Status change
        6. Release Lock On Business Key Rows
    121. Dropping by the Way-Side
        No…
        ETL
        BATCH DRIVEN PROCESSING
        “Synchronization” with the Source System
        missing source data
        No scalability problems
        No ODS needed!
        No “Master Data” system needed
        No Staging area needed
    122. Positives
        Data in the ODW can be governed
        Audit trail built in
        Only deltas are stored
        NEW applications can be created to “automatically” generate Cubes/Star Schemas; these apps can be run by the users…
        Self-Service BI is enabled!
        Master data can be “marked, scored, stored” in the same place as the EDW
    123. Old Components Still There?
        Staging areas will exist as long as there is external data to load and integrate
        ODS areas may still exist as long as there are other legacy applications existing as source systems
        Master Data areas may still exist as long as the logic is not built directly into the “operational DW application”
    124. Secure ODV Technical Layers
        [Diagram labels: Visible Objects; Inbound API; Outbound API; Services; Authentication API; Master Data API; Component Groups; Packaging API; Pedigree API; Security Key Mgr API; Transaction API; Aggregation API; File Management Interface; Kit API; Busn. Intelligence API; Notification Interface; Vault Accessibility; Subject Area API; Scheduling Interface; Local DB Interface; Global DB Interface; Common Data Object Area; Security Interface (Encryption Too); Format Interface; Persistence Cache DB Interface; Logging Interface; Database Interface; Web Server Locally Based; Persistent DB Cache for Joining; Global DB; Local DB1; Local DB2]
    125. What are the benefits?
        Simplified Architecture
        Single Copy of the data!
        No “intermediate” IT work to do
        Users become empowered, with direct access to data sets
        Of course, using the Data Vault model, you gain ALL the benefits of the Data Vault (Scalability, flexibility, etc…)
        NOTE: Two or more “users” can actually EDIT different parts of the same record at the same time!
        Integrating external data basically makes it all available to the application immediately!
        NO NEED TO BUILD A SEPARATE EDW!!
    126. What are the drawbacks?
        No current “application” is using the Data Vault for operational data
        In other words, off-the-shelf apps in this area do not yet exist; you have to “build it” yourself
        Self-Service BI application technology is nascent or non-existent today
        Master Data & Metadata Applications are not currently available on top of Data Vault
    127. Technical Review
        Hub, Link, Satellite: Definitions
    128. HUB Data Examples
        HUB_PART_NUMBER
        SQN  PART_NUM   LOAD_DTS    RECORD_SRC
        1    MFG-25862  10-14-2000  MANUFACT
        2    MFG*25266  10-14-2000  MANUFACT
        3    *P25862    10-14-2000  PLANNING
        4    MFG_25862  10-15-2000  DELIVERY
        5    CN*25266   10-16-2000  DELIVERY
        HUB_CUST_ACCT
        SQN  CUST_ACCT  LOAD_DTS    RECORD_SRC
        1    ABC123     10-14-2000  SALES
        2    ABC-123    10-14-2000  SALES
        3    *ABC-123   10-14-2000  FINANCE
        4    123,ABCD   10-15-2000  CONTRACTS
        5    PEF-2956   10-16-2000  CONTRACTS
        Hub Structure
        SEQUENCE
        <BUSINESS KEY>     (Unique Index)
        {LAST SEEN DATE}   (Optional)
        <LOAD DATE>
        <RECORD SOURCE>
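
The Hub structure above can be sketched as a table with a surrogate sequence and a unique index on the business key. The sketch below is an illustration under stated assumptions: the column names follow the HUB_CUST_ACCT example, but the `load_hub` helper, the use of SQLite, and the ISO date format are choices made for the example, not part of the presentation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE hub_cust_acct (
        sqn        INTEGER PRIMARY KEY AUTOINCREMENT,  -- SEQUENCE
        cust_acct  TEXT NOT NULL UNIQUE,               -- business key, unique index
        load_dts   TEXT NOT NULL,                      -- load date
        record_src TEXT NOT NULL                       -- record source
    )
""")


def load_hub(cust_acct, load_dts, record_src):
    """Hypothetical loader: insert the business key only if it is new,
    then return its sequence number either way."""
    conn.execute(
        "INSERT OR IGNORE INTO hub_cust_acct (cust_acct, load_dts, record_src) "
        "VALUES (?, ?, ?)",
        (cust_acct, load_dts, record_src),
    )
    row = conn.execute(
        "SELECT sqn FROM hub_cust_acct WHERE cust_acct = ?", (cust_acct,)
    ).fetchone()
    return row[0]


first = load_hub("ABC123", "2000-10-14", "SALES")
second = load_hub("ABC-123", "2000-10-14", "SALES")
# The same key arriving later from another source does not create a new row.
again = load_hub("ABC123", "2000-10-15", "FINANCE")

hub_rows = conn.execute(
    "SELECT sqn, cust_acct, load_dts, record_src FROM hub_cust_acct ORDER BY sqn"
).fetchall()
```

As in the slide's data, `ABC123` and `ABC-123` are stored as two distinct business keys: the Hub records keys as they arrive rather than cleansing them on the way in.
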
    129. Link Structures
        Link_Product_Supplier
        LPS_SQN
        PRODUCT_SQN, SUPPLIER_SQN   (Unique Index)
        LPS_LOAD_DTS
        LPS_REC_SOURCE
        LPS_ENCR_KEY
        Link_Customer_Account_Employee
        LCAE_SQN
        CUSTOMER_SQN, ACCOUNT_SQN, EMPLOYEE_SQN   (Unique Index)
        LCAE_LOAD_DTS
        LCAE_REC_SOURCE
        Link Structure
        SEQUENCE
        <HUB KEY SQN 1>, <HUB KEY SQN 2>, <HUB KEY SQN N>   (Unique Index)
        {LAST SEEN DATE}   (Optional)
        {CONFIDENCE}       (Optional; Dynamic Link)
        {STRENGTH}         (Optional; Dynamic Link)
        <LOAD DATE>
        <RECORD SOURCE>
    130. Satellites Split By Source System
        SAT_FINANCE_CUST
        PARENT SEQUENCE, LOAD DATE, <LOAD-END-DATE>, <RECORD-SOURCE>
        Contact Name, Contact Email, Contact Phone Number
        SAT_CONTRACTS_CUST
        PARENT SEQUENCE, LOAD DATE, <LOAD-END-DATE>, <RECORD-SOURCE>
        First Name, Last Name, Guardian Full Name, Co-Signer Full Name, Phone Number, Address, City, State/Province, Zip Code
        SAT_SALES_CUST
        PARENT SEQUENCE, LOAD DATE, <LOAD-END-DATE>, <RECORD-SOURCE>
        Name, Phone Number, Best time of day to reach, Do Not Call Flag
        Satellite Structure
        PARENT SEQUENCE   (Primary Key)
        LOAD DATE         (Primary Key)
        <LOAD-END-DATE>
        <RECORD-SOURCE>
        {user defined descriptive data}
        {or temporal based timelines}
    131. Why do we build Links this way?
    132. History Teaches Us…
        If we model for ONE relationship in the EDW, we BREAK the others!
        Today: Portfolio (1) to Customer (M)
        5 years from now: Portfolio (M) to Customer (M)
        10 years ago: Portfolio (M) to Customer (1)
        [Diagram: Hub Portfolio related directly to Hub Customer, with the relationship crossed out (X)]
        The EDW is designed to handle TODAY’S relationship; as soon as history is loaded, it breaks the model!
        This situation forces re-engineering of the model, load routines, and queries!
    133. History Teaches Us…
        If we model with a LINK table, we can handle ALL the requirements!
        Today: Portfolio (1) to Customer (M)
        5 years from now: Portfolio (M) to Customer (M)
        10 years ago: Portfolio (M) to Customer (1)
        [Diagram: Hub Portfolio (1) to LNK Cust-Port (M), and Hub Customer (1) to LNK Cust-Port (M)]
        This design is flexible and handles past, present, and future relationship changes with NO RE-ENGINEERING!
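
To make the Link argument concrete, here is a small sketch in which one Link table holds one-to-many, many-to-one, and many-to-many Customer/Portfolio relationships without any change to its structure. The table and column names (`hub_customer`, `hub_portfolio`, `lnk_cust_port`), the sample keys, and the `relate` helper are assumptions made for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE hub_customer  (sqn INTEGER PRIMARY KEY, cust_key TEXT NOT NULL UNIQUE);
    CREATE TABLE hub_portfolio (sqn INTEGER PRIMARY KEY, port_key TEXT NOT NULL UNIQUE);
    CREATE TABLE lnk_cust_port (
        sqn           INTEGER PRIMARY KEY,
        customer_sqn  INTEGER NOT NULL REFERENCES hub_customer(sqn),
        portfolio_sqn INTEGER NOT NULL REFERENCES hub_portfolio(sqn),
        load_dts      TEXT NOT NULL,
        record_src    TEXT NOT NULL,
        UNIQUE (customer_sqn, portfolio_sqn)   -- one row per distinct relationship
    );
    INSERT INTO hub_customer  VALUES (1, 'CUST-A'), (2, 'CUST-B');
    INSERT INTO hub_portfolio VALUES (1, 'PORT-X'), (2, 'PORT-Y');
""")


def relate(customer_sqn, portfolio_sqn, load_dts, record_src):
    """Record a Customer/Portfolio relationship once; repeats are ignored."""
    conn.execute(
        "INSERT OR IGNORE INTO lnk_cust_port "
        "(customer_sqn, portfolio_sqn, load_dts, record_src) VALUES (?, ?, ?, ?)",
        (customer_sqn, portfolio_sqn, load_dts, record_src),
    )


relate(1, 1, "1990-01-01", "LEGACY")  # history: one customer, many portfolios
relate(1, 2, "1990-01-01", "LEGACY")
relate(2, 1, "2000-10-14", "SALES")   # today: one portfolio, many customers
relate(2, 1, "2000-10-15", "SALES")   # same relationship seen again: ignored

portfolios_per_customer = dict(conn.execute(
    "SELECT customer_sqn, COUNT(*) FROM lnk_cust_port GROUP BY customer_sqn"
))
customers_per_portfolio = dict(conn.execute(
    "SELECT portfolio_sqn, COUNT(*) FROM lnk_cust_port GROUP BY portfolio_sqn"
))
link_count = conn.execute("SELECT COUNT(*) FROM lnk_cust_port").fetchone()[0]
```

Because the cardinality lives in the rows rather than in a foreign key on either Hub, loading older or newer relationship patterns only adds rows to the Link.
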
    134. 134. Applying the Data Vault to Global DW2.0<br />Manufacturing EDW <br />in China<br />Planning in Brazil<br />Hub<br />Hub<br />Link<br />Sat<br />Sat<br />Link<br />Sat<br />Sat<br />Link<br />Hub<br />Link<br />Hub<br />Hub<br />Sat<br />Sat<br />Sat<br />Sat<br />Sat<br />Sat<br />Sat<br />Sat<br />Base EDW Created in Corporate<br />Financials in USA<br />81<br />
    135. 135. 82<br />Extreme Data Vault Partitioning<br />
    136. 136. Query Performance<br />Point-in-time and Bridge Tables, overcoming query issues<br />83<br />
    137. 137. Purpose Of PIT & Bridge<br />To reduce the number of joins, and to reduce the amount of data being queried for a given range of time.<br />These two together, allow “direct table match”, as well as table elimination in the queries to occur.<br />These tables are not necessary for the entire model; only when:<br />Massive amounts of data are found<br />Large numbers of Satellites surround a Hub or Link<br />Large query across multiple Hubs & Links is necessary<br />Real-time-data is flowing in, uninterrupted<br />What are they?<br />Snapshot tables – Specifically built for query speed<br />84<br />
    138. 138. PIT Table Architecture<br />Satellite: Point In Time<br />Primary<br />Key<br />PARENT SEQUENCE<br />LOAD DATE<br />{Satellite 1 Load Date}<br />{Satellite 2 Load Date}<br />{Satellite 3 Load Date}<br />{…}<br />{Satellite N Load Date}<br />PIT Sat <br />Sat 1<br />Sat 2<br />Hub<br />Order<br />PIT Sat <br />Sat 3<br />Sat 1<br />Sat 4<br />Sat 2<br />Sat 1<br />Hub Customer<br />Hub Product<br />Sat 2<br />Sat 3<br />Link Line Item<br />Sat 4<br />Satellite<br />Line Item<br />85<br />
    139. 139. PIT Table Example<br />SAT_CUST_CONTACT_CELL<br />SQN LOAD_DTS CELL<br />1 10-14-2000 999-555-1212<br />1 10-15-2000 999-111-1234<br />1 10-16-2000 999-252-2834<br />1 10-17-2000 999.257-2837<br />1 10-18-2000 999-273-5555<br />SAT_CUST_CONTACT_ADDR<br />SQN LOAD_DTS ADDR<br />1 08-01-2000 26 Prospect<br />1 09-29-2000 26 Prosp St.<br />1 12-17-2000 28 November<br />1 01-01-2001 26 Prospect St<br />SAT_CUST_CONTACT_NAME<br />SQN LOAD_DTS NAME<br />1 10-14-2000 Dan L<br />1 11-01-2000 Dan Linedt<br />1 12-31-2000 Dan Linstedt<br />PIT table (LOAD_DTS is the Snapshot Date)<br />SQN LOAD_DTS SAT_NAME_LDTS SAT_CELL_LDTS SAT_ADDR_LDTS<br />1 08-01-2000 NULL NULL 08-01-2000<br />1 09-01-2000 NULL NULL 08-01-2000<br />1 10-01-2000 NULL NULL 09-29-2000<br />1 11-01-2000 11-01-2000 10-18-2000 09-29-2000<br />1 12-01-2000 11-01-2000 10-18-2000 09-29-2000<br />1 01-01-2001 12-31-2000 10-18-2000 01-01-2001<br />86<br />
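The PIT rows in the example above follow one rule: for each snapshot date, record the most recent load date of every Satellite that is on or before the snapshot, or NULL if the Satellite has no row yet. The sketch below reproduces the slide's PIT table from the three Satellites' load dates. The function names and the dictionary layout are assumptions made for illustration; the slides do not show the actual load code.

```python
from bisect import bisect_right
from datetime import date

# Load dates per Satellite for customer SQN 1, transcribed from the slide.
# Keys reuse the PIT column names; the Python structure itself is illustrative.
SATELLITE_LOAD_DATES = {
    "SAT_NAME_LDTS": [date(2000, 10, 14), date(2000, 11, 1), date(2000, 12, 31)],
    "SAT_CELL_LDTS": [date(2000, 10, day) for day in range(14, 19)],
    "SAT_ADDR_LDTS": [date(2000, 8, 1), date(2000, 9, 29),
                      date(2000, 12, 17), date(2001, 1, 1)],
}

SNAPSHOT_DATES = [date(2000, 8, 1), date(2000, 9, 1), date(2000, 10, 1),
                  date(2000, 11, 1), date(2000, 12, 1), date(2001, 1, 1)]

def latest_on_or_before(sorted_dates, snapshot):
    """Return the newest date <= snapshot, or None (NULL) if there is none."""
    position = bisect_right(sorted_dates, snapshot)
    return sorted_dates[position - 1] if position else None

def build_pit(snapshots, satellites):
    """Build one PIT row per snapshot date."""
    rows = []
    for snapshot in snapshots:
        row = {"LOAD_DTS": snapshot}
        for column, load_dates in satellites.items():
            row[column] = latest_on_or_before(sorted(load_dates), snapshot)
        rows.append(row)
    return rows

pit = build_pit(SNAPSHOT_DATES, SATELLITE_LOAD_DATES)
for row in pit:
    print(row)
```

A query for a given snapshot can then join each Satellite on an exact (SQN, load date) match instead of searching a date range, which is the "direct table match" the PIT table is meant to enable.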
    140. 140. Bridge Table Architecture<br />Satellite: Bridge<br />Primary<br />Key<br />UNIQUE SEQUENCE<br />LOAD DATE<br />{Hub 1 Sequence #}<br />{Hub 2 Sequence #}<br />{Hub 3 Sequence #}<br />{Link 1 Sequence #}<br />{Link 2 Sequence #}<br />{…}<br />{Link N Sequence #}<br />{Hub 1 Business Key}<br />{Hub 2 Business Key}<br />{…}<br />{Hub N Business Key}<br />Bridge<br />Sat 1<br />Sat 2<br />Hub Parts<br />Hub Seller<br />Hub Product<br />Link <br />Link <br />Sat 3<br />Sat 4<br />Satellite<br />Satellite<br />87<br />
    141. 141. Bridge Table Data Example<br />Bridge Table: Seller by Product by Part<br />SQN LOAD_DTS SELL_SQN SELL_ID PROD_SQN PROD_NUM PART_SQN PART_NUM<br />1 08-01-2000 15 NY*1 2756 ABC-123-9K 525 JK*2*4<br />2 09-01-2000 16 CO*242654 DEF-847-0L 324 MN*5-2<br />3 10-01-2000 16 CO*2482374 PPA-252-2A 9938 DD*2*3<br />4 11-01-2000 24 AZ*2525222 UIF-525-88 7 UF*9*0<br />5 12-01-2000 99 NM*581 DAN-347-7F 16 KI*9-2<br />6 01-01-2001 99 NM*581 DAN-347-7F 24 DL*0-5<br />Snapshot Date<br />88<br />
    142. 142. What WASN’T Covered<br />ETL Automation<br />ETL Implementation<br />SQL Query Logic<br />Balanced MPP design<br />Data Vault Modeling on Appliances<br />Deep Dive on Structures (Hubs, Links, Satellites)<br />What happens when you break the rules?<br />Project management, Risk management & mitigation, methodology & approach<br />Automation: Automated DV modeling, Automated ETL production<br />Change Management<br />Temporal Data Modeling Concerns… And so on…<br />89<br />
    143. 143. Conclusions<br />90<br />
    144. 144. Who’s Using It?<br />
    145. 145. The Experts Say…<br />“The Data Vault is the optimal choice for modeling the EDW in the DW 2.0 framework.” <br />Bill Inmon<br />“The Data Vault is foundationally strong and exceptionally scalable architecture.”<br />Stephen Brobst<br />“The Data Vault is a technique which some industry experts have predicted may spark a revolution as the next big thing in data modeling for enterprise warehousing....” <br />Doug Laney<br />
    146. 146. More Notables…<br />“This enables organizations to take control of their data warehousing destiny, supporting better and more relevant data warehouses in less time than before.” <br />Howard Dresner<br />“[The Data Vault] captures a practical body of knowledge for data warehouse development which both agile and traditional practitioners will benefit from..”<br />Scott Ambler<br />
    147. 147. Where To Learn More<br />The Technical Modeling Book: http://LearnDataVault.com<br />The Discussion Forums & events: http://LinkedIn.com – Data Vault Discussions<br />Contact me: http://DanLinstedt.com - web site, DanLinstedt@gmail.com - email<br />World wide User Group (Free): http://dvusergroup.com<br />Certification Training:<br />Contact me, or learn more at: http://GeneseeAcademy.com<br />94<br />
    148. 148. ODV – Case Study<br />Operational Data Vault – IN THE REAL WORLD!<br />95<br />
    149. 149. E-Pedigree, Drug Track & Trace<br />96<br />Product Returns<br />And Recalls<br />Product<br />Packaging<br />CorpSite<br />Server<br />Secure Integration Services<br />Corporate<br />Serialization<br />Vault<br />Serialization<br />Analytics<br />Engine<br />Packaging<br />Orders<br />Product Authenticator<br />3rd Party Logistics<br />Distribution Warehouse<br />Secure Integration Services<br />E-Pedigree<br />Management<br />Manufacturer<br />Product Packager<br />Supply Chain<br />
    150. 150. Label Serialization Vault<br />97<br />ERP<br />Product <br />Master Data<br />EPC GlobalStandards<br />Corp Domain<br />Corp<br />Applications<br />Serialization<br />Vault<br />CustPkg<br />Line<br />Data<br />E-Pedigree<br />WS/SOAP<br />Master Data<br /><ul><li>Products
    151. 151. Locations
    152. 152. Trading Partners
    153. 153. Users</li></ul>Shipping Data<br /><ul><li>Transactions</li></ul>Shipping<br />Reasons<br />Serialization<br />Marts<br />Warehouse<br />(WMS)<br />Flat Files<br />WS/SOAP<br />ASN<br />Serialization Vault<br />Global – Master Data<br />Local – Private Data<br />Serialization/Packaging Data<br /><ul><li>Serial #’s
    154. 154. Hierarchical Relationships
    155. 155. Containers</li></li></ul><li>Corporate Security<br />98<br />Pros<br /><ul><li>Unique Logins Limit Access
    156. 156. Physical Data Separation in Logical “Database” units
    157. 157. No single login has 100% data access.
    158. 158. Customers can be CHARGED for disk space, indexing, utilization</li></ul>Cons<br /><ul><li>Maintenance, Backup and Restore
    159. 159. Changes to the data model ripple (larger impacts) as more customers are signed up.
    160. 160. Each “support call” requires separate login to see the data set.</li></ul>Data Exchange/Sharing Through Code Only<br />Web-Services and Flat File Delivery<br />Customer<br />Login<br />Corp<br />Login<br />Customer<br />Login<br />Corp<br />Login<br />Employee<br />Validation<br />Admin<br />Login<br />Encrypt Key<br />Encrypt Key<br />Encrypt Key<br />Mart<br />1<br />Mart<br />2<br />Mart<br />3<br />Mart<br />1<br />Mart<br />2<br />Mart<br />3<br />Tracking #<br />Machine Info<br />SQL View Layer<br />SQL View Layer<br />Global<br />Data Vault<br />Data Vault<br />Manufacturer<br />Shipper<br />9/27/2011<br />
    161. 161. Web Services File Delivery<br />99<br />Web-Services and Flat File Delivery<br />Machine<br />Local DB<br />Machine<br />Global DB<br />Machine<br />Local DB<br />Machine<br /><ul><li>Encryption at multiple levels
    162. 162. Multi-machine Utilization
    163. 163. RAM Based encryption decryption through services</li></li></ul><li>Secure Machine Transfers<br />100<br />External IP Cards<br />Web-Services and Flat File Delivery Machine<br />Encrypted Local Director Database<br />Encrypt / Decrypt<br />https layer<br />Encrypt / Decrypt<br />DBMS<br />Machine<br />VPN Tunnel<br />Encrypted / Compressed Storage<br />
    164. 164. Secure Client Data Interchange<br />101<br /><ul><li>Decrypt using Corp Key, then Re-Encrypt with Customer Unique Key before storing
    165. 165. Customer Owned Key (Dictated by Customer)
    166. 166. Corporate Owned Key (Encrypts data internally)</li></ul>Corp Managed / Owned Copy<br />Web Services<br />Customer Copy<br />Customer<br />Login<br />Corp<br />Login<br />+HTTPS<br />Corp Encrypt Key<br />Web Services<br />Encrypted<br />Flat Files<br />Decryption<br />Key<br />+ SFTP<br />Customer <br />Local Copy<br />
    167. 167. Security: ODV Web Services<br />102<br />Corp Managed / Owned Copy<br />Web Browser<br />Web Site / Server<br />Java Script<br />Or PHP<br />Web Services<br />Customer<br />Login<br />Corp<br />Login<br />Corporate Encrypt Key<br />Corporate Owned Encryption Key<br />Global DB<br />
    168. 168. Inflow/Outflow Applications<br />103<br />Customer<br />Corporation<br />Corporation<br />Customer<br />Source<br />Machine<br />Encrypts Data<br />Using Customer<br />Key<br />Corp Decrypts<br />Data<br />According to <br />Customer Key<br />Corp Re-Encrypts<br />Data According to<br />Internal Key<br />For Specific <br />Customer<br />Corp Decrypts<br />Data According to<br />Internal Key<br />For Specific <br />Customer<br />Corp Encrypts<br />Data<br />According to <br />Customer Key<br />Customer <br />Decrypts<br />Data<br />According to <br />Customer Key<br />DB<br />DB<br />Transmit Encrypted <br />Data over HTTPS<br />Transmit Encrypted <br />Data over HTTPS<br />Web Service Sender<br />Web Service Collector<br />
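The inflow/outflow key handoff on the slide above can be traced step by step in code. The sketch below is illustrative only: `toy_crypt` is a deliberately simple placeholder (XOR with a SHA-256-derived keystream) standing in for whatever real cipher the system uses, and it is NOT secure. The key values, the payload, and all variable names are assumptions invented for the example; the slides do not name the algorithm, and the HTTPS transport is not modeled.

```python
import hashlib

def _keystream(key: bytes, length: int) -> bytes:
    """Derive `length` pseudo-random bytes from `key` (toy construction)."""
    stream = b""
    counter = 0
    while len(stream) < length:
        stream += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return stream[:length]

def toy_crypt(key: bytes, data: bytes) -> bytes:
    """Placeholder symmetric cipher: the same call encrypts and decrypts.

    NOT secure. It stands in for a real, authenticated cipher.
    """
    return bytes(a ^ b for a, b in zip(data, _keystream(key, len(data))))

# Hypothetical keys: one dictated by the customer, one owned by Corp
# and unique to this customer.
customer_key = b"customer-owned-key"
corp_internal_key = b"corp-internal-key-for-this-customer"

payload = b"serial=0001;container=PALLET-7"

# Inflow: customer -> corporation
wire_in = toy_crypt(customer_key, payload)             # source machine encrypts with customer key
plain_at_corp = toy_crypt(customer_key, wire_in)       # Corp decrypts according to customer key
stored = toy_crypt(corp_internal_key, plain_at_corp)   # Corp re-encrypts with internal key, stores in DB

# Outflow: corporation -> customer
plain_out = toy_crypt(corp_internal_key, stored)       # Corp decrypts according to internal key
wire_out = toy_crypt(customer_key, plain_out)          # Corp encrypts according to customer key
received = toy_crypt(customer_key, wire_out)           # customer decrypts with customer key

print(received == payload)
```

The data at rest is never protected by the customer's own key, and the data in flight is never protected by Corp's internal key, so compromising one key does not expose both copies.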
    169. 169. ODV: Secure File Request<br />104<br />Corporation<br />Customer<br />** Note: Each Customer DB is encrypted via an internally owned Corp key which is unique to EACH customer.<br />Customer <br />Decrypts<br />File<br />According to <br />Customer Key<br />Transmit Encrypted <br />Data over FTPS<br />Encrypted File<br />
    170. 170. ODV: Front-End Ping Request<br />105<br />Corporation<br />Customer<br />Corp One-Way<br />Hash of key<br />Number<br />To Execute Ping<br />Web-Based<br />PING<br />Validation<br />DBMS<br />Unencrypted Data <br />Transfer<br />Login / Auth<br />
