Data Vault and DW2.0

This is a presentation I gave in 2006 for Bill Inmon. The presentation covers Data Vault and how it integrates with Bill Inmon's DW2.0 vision. This is focused on the business intelligence side of the house.

If you want to use these slides, please credit them: (C) Dan Linstedt, all rights reserved, http://LearnDataVault.com

Transcript

  • 1. The Application of Data Vault to DW2.0
    © Dan Linstedt, 2011-2012, all rights reserved
  • 2. A bit about me…
    Author, inventor, speaker, and part-time photographer
    25+ years in the IT industry
    Worked in the DoD, the US Government, Fortune 50 companies, and so on
    Find out more about the Data Vault:
      http://www.youtube.com/LearnDataVault
      http://LearnDataVault.com
    Full profile: http://www.LinkedIn.com/dlinstedt
  • 3. Agenda
    Defining the Needs for the Data Vault
      DW2.0 Architecture
      DW2.0 Drivers for Data Modeling
      Divergence of Data Models over Time
    Data Vault in DW2.0
      Defining the Data Vault
      What does one look like?
      Modeling in DW2.0
      Applying Data Vault to Global DW2.0
      Applying Data Vault to Time-Value DW2.0
      Compliance in DW2.0
      Applying Data Vault to System of Record
    The Paradox of DW2.0
      Volume, Latency, Complexity, Normalization and Transformation Ability
    Footer: 10/5/2011, Do Not Duplicate Without Written Permission
  • 4. DW2.0 Architecture
    Diagram of the Enterprise Data Warehouse:
      Enterprise Service Bus (ESB) connectivity: EAI, EII, ETL / ELT, Web Services
      Capabilities shown: cube processing, temporal indexing, semantic management, active data mining, transformation, active cleansing
      Unstructured data: email, plain text, Word docs, images
      ESB management: text, email, spreadsheets, transactions, structured information
      Sectors: Interactive / Tactical, Integrated / Strategic, Near-Line / Extended, Archival / Historical
      A vertical Metadata band runs alongside the sectors.
    Data models must be consistently applied throughout all layers.
  • 5. DW2.0 Drivers for Data Modeling
    Diagram: technical drivers and business drivers meet at the data model. Drivers shown: flexibility, compliance, volume, frequency, understandability, granularity.
    Data models are one of the main integration points between technical and business drivers.
    Business keys drive understandability and granularity.
    Normalization drives flexibility and frequency of load.
    Raw data sets in the EDW / ADW drive compliance and volume.
  • 6. Divergence of Data Models over Time
    Data models (both logical and physical) have diverged from business drivers and direction over time.
    Data models have been driven toward physical improvements rather than business improvements.
    The Data Vault architecture drives data modeling back to the business side of the house.
  • 7. Agenda (same as slide 3)
    Image from "What the Bleep Do We Know?"
  • 8. Defining the Data Vault
    The Data Vault is a detail-oriented, historically tracked, uniquely linked set of normalized tables that supports one or more functional areas of business.
    It is a hybrid approach that combines the best of breed of third normal form (3NF) and star schema. The design is flexible, scalable, consistent, and adaptable to the needs of the enterprise. It is a data model architected specifically to meet the needs of today's enterprise data warehouses.
    Source: "Defining the Data Vault," TDAN.com article
  • 9. What Does One Look Like?
    Diagram: Customer, Account, and Invoice hubs joined by a Link that records a history of the interaction; satellites hold customer information, account information, and invoice / billing information.
    Elements: Hub, Link, Satellite
    The impact of linking disparate systems together is inside the shaded area.
  • 10. Modeling in DW2.0
    Bill Inmon says:
      DW2.0 must be brought down to a very fine level of detail.
      The starting point for DW2.0 is the modeling process.
      The data model applies to the integrated sector, the near-line sector, and the archival sector.
      Data warehouses are built incrementally.
    The Data Vault specializes in:
      Providing fine grain at the lowest level possible
      Mapping business process models to data models
      Existing in all sectors simultaneously without changes
      Flexibility and managing change, so that impacts are not a mile wide and 10 miles deep
  • 11. Elements in a Data Vault
    Hub: a unique list of business keys, each tracked by the time the warehouse first saw it appear.
    Link: relationships between business keys; a link can also represent a grain shift or a hierarchical roll-up.
    Satellite: granular, descriptive data about the business key, held over time. Satellites are also set up according to type of information and rate of change.
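The hub rule above ("a unique list of business keys, tracked by the first time the warehouse saw them") can be sketched in a few lines of Python. This is a minimal in-memory illustration under stated assumptions: the names `HubRow` and `Hub` and the sequence-number surrogate are hypothetical and not taken from the slides. The point it shows is that a hub only ever inserts; a key seen again later keeps its original first-seen timestamp.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict


@dataclass(frozen=True)
class HubRow:
    hub_seq: int        # hypothetical surrogate sequence number
    business_key: str   # e.g. a customer number the business uses
    load_dts: datetime  # when the warehouse first saw this key


class Hub:
    """Insert-only list of unique business keys (illustrative sketch)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.rows: Dict[str, HubRow] = {}

    def load(self, business_key: str, load_dts: datetime) -> HubRow:
        # A key already in the hub is never updated; its first-seen stamp stands.
        existing = self.rows.get(business_key)
        if existing is not None:
            return existing
        row = HubRow(len(self.rows) + 1, business_key, load_dts)
        self.rows[business_key] = row
        return row


if __name__ == "__main__":
    customers = Hub("hub_customer")
    customers.load("C-100", datetime(2011, 10, 1, 9, 0))
    customers.load("C-100", datetime(2011, 10, 5, 9, 0))  # seen again, ignored
    print(len(customers.rows), customers.rows["C-100"].load_dts)
```

Because descriptive attributes live in satellites, nothing about the hub row ever needs to change after the first insert.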
  • 12. Applying the Data Vault to Global DW2.0
    Diagram: a base EDW of hubs, links, and satellites created at corporate, with Financials in the USA, a Manufacturing EDW in China, and Planning in Brazil each contributing further hubs, links, and satellites.
  • 13. Applying the Data Vault to Time-Value DW2.0
    Satellite data over time (example table, rows 1 to 4; the values are not recoverable from the transcript)
    Satellite entities in the Data Vault house data over time. They are split by type of information and by rate of change. The example shows a set of data for a customer name satellite.
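How a satellite "houses data over time" can be illustrated with a small insert-only sketch. It assumes rows arrive in load-date order and that a new row is stored only when the descriptive attributes differ from the latest row for that key; the class name and the customer name values are hypothetical. Splitting by rate of change would simply mean one such object per attribute group.

```python
from datetime import date
from typing import Dict, List, Optional, Tuple

Attributes = Dict[str, str]


class Satellite:
    """Insert-only history of descriptive attributes for one hub (illustrative sketch)."""

    def __init__(self, name: str) -> None:
        self.name = name
        # (business_key, load_date, attributes), appended in load-date order
        self.rows: List[Tuple[str, date, Attributes]] = []

    def latest(self, business_key: str) -> Optional[Attributes]:
        for key, _, attrs in reversed(self.rows):
            if key == business_key:
                return attrs
        return None

    def load(self, business_key: str, load_date: date, attrs: Attributes) -> bool:
        # Store a new row only when something changed; old rows are never updated.
        if self.latest(business_key) == attrs:
            return False
        self.rows.append((business_key, load_date, dict(attrs)))
        return True

    def as_of(self, business_key: str, when: date) -> Optional[Attributes]:
        # Keep the last row loaded on or before `when` (relies on load-date order).
        current = None
        for key, load_date, attrs in self.rows:
            if key == business_key and load_date <= when:
                current = attrs
        return current


if __name__ == "__main__":
    names = Satellite("sat_customer_name")
    names.load("C-100", date(2011, 1, 1), {"name": "Jon Smith"})
    names.load("C-100", date(2011, 2, 1), {"name": "Jon Smith"})  # no change, skipped
    names.load("C-100", date(2011, 3, 1), {"name": "John Smith"})
    print(len(names.rows), names.as_of("C-100", date(2011, 2, 15)))
```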
  • 14. Batch and Real-Time Data Arrival
    Real-time transactions (Transaction ID, Date Stamp, Customer, Account #, Amount, Type) load into Hub Customer, Hub Acct, Link Transaction, and Sat Transaction: all inserts, all the time.
    A batch load (3, 6, or 12 hr load window) of customer info and account data loads into Sat Customer and Sat Acct.
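The "all inserts, all the time" pattern for real-time arrival can be sketched as follows. The assumptions are loud ones: the tables are plain Python dictionaries and lists, the message field names mirror the slide's labels but are hypothetical, and duplicate deliveries are detected by transaction ID. The key behavior is that a transaction whose customer or account has not yet arrived through the batch window is loaded anyway, by inserting the missing hub keys, so nothing waits on parent data.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Any, Dict, List, Tuple


@dataclass
class TransactionVault:
    """In-memory stand-ins for the slide's tables (names are hypothetical)."""

    hub_customer: Dict[str, datetime] = field(default_factory=dict)  # key -> first seen
    hub_account: Dict[str, datetime] = field(default_factory=dict)   # key -> first seen
    # transaction id -> (customer key, account key, load time)
    link_transaction: Dict[str, Tuple[str, str, datetime]] = field(default_factory=dict)
    # (transaction id, load time, source date stamp, amount, type)
    sat_transaction: List[Tuple[str, datetime, datetime, float, str]] = field(default_factory=list)

    def load_message(self, msg: Dict[str, Any], load_dts: datetime) -> bool:
        """Load one real-time transaction message using inserts only."""
        txn_id = msg["transaction_id"]
        if txn_id in self.link_transaction:
            return False  # already loaded, e.g. a resent message
        # Customer or account keys the batch feed has not delivered yet are simply
        # inserted into their hubs; the load never waits for the batch window.
        self.hub_customer.setdefault(msg["customer"], load_dts)
        self.hub_account.setdefault(msg["account"], load_dts)
        self.link_transaction[txn_id] = (msg["customer"], msg["account"], load_dts)
        self.sat_transaction.append(
            (txn_id, load_dts, msg["date_stamp"], msg["amount"], msg["type"])
        )
        return True


if __name__ == "__main__":
    vault = TransactionVault()
    msg = {"transaction_id": "T-1", "date_stamp": datetime(2011, 10, 5, 14, 2),
           "customer": "C-100", "account": "A-9", "amount": 25.0, "type": "debit"}
    vault.load_message(msg, datetime(2011, 10, 5, 14, 2, 1))
    vault.load_message(msg, datetime(2011, 10, 5, 14, 3))  # duplicate, ignored
    print(len(vault.link_transaction), len(vault.sat_transaction))
```

When the batch window later delivers customer info and account data, it would be appended to Sat Customer and Sat Acct without touching any of the rows above.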
  • 15. Star Schema Real-Time Data Issues
    Real-time transactions (Transaction ID, Date Stamp, Customer, Account #, Amount, Type) load into Fact Transaction, which references Dimension Customer and Dimension Account.
    A batch load (3, 6, or 12 hr load window) of customer info and account data loads into the dimensions.
    Updates are REQUIRED!
    Cleansing and quality checks must occur before the data can reach the target tables, and they introduce unwanted latency!
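To see why "updates are required" in the star schema case, here is a sketch of one common dimensional-modeling workaround, often called an inferred member (a placeholder dimension row created when a fact arrives before its dimension). This is a standard technique chosen for illustration, not something the slides prescribe, and all names are hypothetical. The fact can load immediately only because a placeholder row exists, and that row must later be updated in place when the batch load delivers the real customer details.

```python
from typing import Dict, List


class CustomerDimension:
    """Type 1 customer dimension using an inferred-member placeholder (sketch)."""

    def __init__(self) -> None:
        self.rows: Dict[int, dict] = {}           # surrogate key -> row
        self.by_customer_no: Dict[str, int] = {}  # natural key -> surrogate key
        self.updates = 0                          # in-place updates performed

    def key_for(self, customer_no: str) -> int:
        # A fact that arrives first gets a placeholder row so it can load now.
        if customer_no not in self.by_customer_no:
            sk = len(self.rows) + 1
            self.rows[sk] = {"customer_no": customer_no, "name": None, "inferred": True}
            self.by_customer_no[customer_no] = sk
        return self.by_customer_no[customer_no]

    def apply_batch(self, customer_no: str, name: str) -> None:
        sk = self.by_customer_no.get(customer_no)
        if sk is None:
            # Brand-new customer: a plain insert.
            sk = len(self.rows) + 1
            self.rows[sk] = {"customer_no": customer_no, "name": name, "inferred": False}
            self.by_customer_no[customer_no] = sk
            return
        row = self.rows[sk]
        if row["inferred"] or row["name"] != name:
            # The placeholder (or a changed value) must be UPDATED in place.
            row.update(name=name, inferred=False)
            self.updates += 1


if __name__ == "__main__":
    dim = CustomerDimension()
    facts: List[dict] = []
    # A real-time transaction arrives hours before the customer batch load.
    facts.append({"customer_sk": dim.key_for("C-100"), "amount": 25.0})
    dim.apply_batch("C-100", "John Smith")
    print(dim.updates, dim.rows[1]["inferred"])
```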
  • 16. Compliance in DW2.0
    Diagram, in the direction of information flow: Source Systems → EDW / ADW (Data Vault) → Data Marts → Data Delivery → User or Auditor
    EDW / ADW (Data Vault), raw integration:
      Raw detail = auditable
      Loads in real time or in batch
      Integrated by business key
      Flexible; allows business changes with little to no impact
      No delay in loading data
      Data type conformity
    Between the Data Vault and the marts: business rules, semantic integration, and quality, feeding the true marts and an error mart
    Also shown: continuous data improvement, changes to source information, and master data (operational)
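The split between raw integration and downstream business rules can be sketched as a function that reads raw Data Vault rows without modifying them and routes each row to a mart or an error mart. Routing rows that fail a rule into the error mart is an assumption based on the labels in the diagram, and the rule names and sample rows are hypothetical. What the sketch shows is that the raw, auditable detail survives untouched no matter what the business rules decide.

```python
from typing import Any, Callable, Dict, List, Tuple

Row = Dict[str, Any]
Rule = Callable[[Row], bool]


def build_marts(raw_rows: List[Row], rules: Dict[str, Rule]) -> Tuple[List[Row], List[Row]]:
    """Apply business rules downstream of the raw Data Vault (illustrative sketch).

    The raw rows are read, never modified, so they stay as loaded for auditors.
    Rows that pass every rule go to the mart; the rest go to an error mart
    together with the names of the rules they failed.
    """
    mart: List[Row] = []
    error_mart: List[Row] = []
    for row in raw_rows:
        failed = [name for name, check in rules.items() if not check(row)]
        if failed:
            error_mart.append({**row, "failed_rules": failed})
        else:
            mart.append(dict(row))
    return mart, error_mart


if __name__ == "__main__":
    raw = [
        {"txn": "T-1", "customer": "C-100", "amount": 25.0},
        {"txn": "T-2", "customer": "", "amount": 40.0},
        {"txn": "T-3", "customer": "C-200", "amount": -5.0},
    ]
    rules: Dict[str, Rule] = {
        "has_customer": lambda r: bool(r["customer"]),
        "non_negative_amount": lambda r: r["amount"] >= 0,
    }
    mart, errors = build_marts(raw, rules)
    print(len(raw), len(mart), [e["failed_rules"] for e in errors])
```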
  • 17. Applying the Data Vault to System of Record
    SOR Definition 1 (Source Systems): data capture; data produced by system algorithms.
    SOR Definition 2 (Normalized EDW): raw, detailed, integrated data over time, integrated by horizontal (functional) business key. Auditable.
    SOR Definition 3 (Master Data or Conformed Dimensions): current view of the business; merged, quality-cleansed, single copy, single source; feeds operational systems.
  • 18. DW2.0 Paradoxes
    DW2.0 incorporates:
      Unstructured, semi-structured, real-time, and batch data
      Global views
    All of these drive up data volumes.
    Volume causes latency in transformation.
    Volume is directly proportional to transformation complexity.
    Real-time data arrival is inversely proportional to complexity and volume.
    The time available for "quality, cleansing, and transformation" on the way into the EDW diminishes as near-real time is approached, or as massive volumes of batch data must fit into a shrinking batch window.
    Transformation can destroy the auditability and compliance of the EDW / ADW.
  • 19. DW2.0 Paradoxes: Imagery
    Diagram (arrows labeled Drives, Pushes, Increases, Fights, Requires, Inhibits, Provides) connecting: DW2.0; real-time transactions, unstructured data, and low-level grain; low latency; volume; merging, quality, and cleansing; data model denormalization; data model normalization and raw details; auditability and compliance.
  • 20. DW2.0 Paradox Hypothesis
    As we approach near-real time, the ability to transform data and "wait" for parent dependencies decreases and data decay rates increase, which can cause data death if the data is not processed in time.
    Normalizing the data model increases flexibility and scalability.
    The closer we get to near-real time, the more normalized the data model in the EDW / ADW must become.
    To process high volumes of batch data extremely fast, "business transformations" must be removed from the EDW load stream.
  • 21. Data Vault Volumetrics
    Volumetrics (10% null data); the table is not recoverable from the transcript.
    Upon initial investigation, the 12-month growth rate for new customers is 197.4 MB per year…
    Now let's factor in the deltas.
  • 22. Data Vault Growth
    Volumetrics (10% null data), delta growth only:
      Original dimension: 497.16 MB per year
      New Data Vault: 317.03 MB per year
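The two delta-growth figures on this slide work out to 497.16 - 317.03 = 180.13 MB per year less growth, a reduction of roughly 36%. The sketch below checks that arithmetic and then adds a hypothetical model of where such a saving can come from: a type 2 dimension copies the whole row whenever any attribute changes, while satellites split by rate of change copy only the group that changed. The row widths, change rates, and key overhead are invented for illustration, hub rows are ignored, and the model assumes changes to different groups never coincide in one load; none of these inputs come from the slide's actual data.

```python
from typing import Dict


def percent_reduction(before: float, after: float) -> float:
    """Percentage reduction going from `before` to `after`."""
    return (before - after) / before * 100


def type2_dimension_growth_mb(keys: int, row_bytes: int, change_rates: Dict[str, float],
                              key_overhead: int = 0) -> float:
    # Any change in any attribute group copies the whole dimension row.
    changes_per_year = keys * sum(change_rates.values())
    return changes_per_year * (row_bytes + key_overhead) / 1_000_000


def split_satellite_growth_mb(keys: int, group_bytes: Dict[str, int],
                              change_rates: Dict[str, float], key_overhead: int = 0) -> float:
    # Each change copies only the satellite (attribute group) that changed.
    total = sum(keys * change_rates[g] * (group_bytes[g] + key_overhead) for g in group_bytes)
    return total / 1_000_000


if __name__ == "__main__":
    # Figures from the slide (delta growth only, 10% null data).
    print(f"{percent_reduction(497.16, 317.03):.1f}% less growth per year")
    # Hypothetical illustration only.
    groups = {"name": 60, "address": 120, "credit": 20}   # bytes per attribute group
    rates = {"name": 0.1, "address": 0.2, "credit": 4.0}  # changes per key per year
    dim = type2_dimension_growth_mb(100_000, sum(groups.values()), rates, key_overhead=16)
    sat = split_satellite_growth_mb(100_000, groups, rates, key_overhead=16)
    print(f"dimension {dim:.2f} MB/yr vs satellites {sat:.2f} MB/yr")
```

The fast-changing group dominates both totals, which is why the slides stress splitting satellites by rate of change.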
  • 23. Data Vault vs. Dimension Growth
    Chart comparing the growth of the two models (not recoverable from the transcript).
    How does the extensive growth rate affect queries?
  • 24. Summarization
    Business:
      Lack of a single view of a customer, product, service, etc.
      Lack of visibility into ALL information across the enterprise
      The competition does it better, faster, cheaper
      Unable to identify and forecast business trends and their impacts
      WHERE'S THE KNOWLEDGE? OR IS IT JUST ALL DATA?
    Technical:
      Near-real-time (active)
      Huge data volumes
      Massive data dis-integration
      Spread-marts
      Convergence of operational and strategic questions
      Duplication of data in the ODS, warehouse, and data marts!
      Dimension-itis!!
      ODS ulcer!
      Fact table granularity
      JUNK tables, helper tables
  • 25. Where To Learn More
    The technical modeling book: http://LearnDataVault.com
    Discussion forums and events: http://LinkedIn.com (Data Vault Discussions)
    Contact me: http://DanLinstedt.com (web site), DanLinstedt@gmail.com (email)
    Worldwide user group (free): http://dvusergroup.com
