Data vault what's Next: Part 2

This is Part 2 of a two-part presentation I gave in 2009; it covers unstructured data and operational Data Vault components in more depth. YES, even then I was commenting on how this market would evolve. IF you want to use these slides, please let me know, and add "(C) Dan Linstedt, all rights reserved," in a VISIBLE fashion on your slides.

  1. Data Vault Modeling: What’s Next? Part 2
      - © Dan Linstedt 2009-2012
      - This was PART 2 of a presentation I gave at an Array Conference in the Netherlands in 2009.
  2. A bit about me…
      - Author, Inventor, Speaker, and part-time photographer…
      - 25+ years in the IT industry
      - Worked in DoD, US Gov’t, Fortune 50, and so on…
      - Find out more about the Data Vault:
      - Full profile on
  3. Where Are We Today?
      - IF you are using Data Vault…
        - Auto generation of staging loads
        - Auto generation of Data Vault loads
        - Auto generation of Data Vault reconciliation routines
        - Auto generation of RAW star schemas
        - Rapid build-out of star schemas
      - If you are lucky…
        - Auto generation of the Data Vault model
        - Auto consolidation of source system data models
        - Auto generation of the staging data model
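The slide lists auto-generation of Data Vault loads without showing how it works. A minimal sketch, assuming each hub is described by a small metadata record, is to render the load SQL from a template. The `HubSpec` fields, the `load_dts` and `record_source` column names, and the assumption that the database fills in the hub's surrogate key are all hypothetical choices for illustration, not the author's generator.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class HubSpec:
    """Hypothetical metadata for one hub and the staging table that feeds it."""
    hub_table: str
    business_key: str
    staging_table: str
    record_source: str


def generate_hub_load(spec: HubSpec) -> str:
    """Render an insert-only hub load: add business keys the hub has not seen yet.

    Assumes the hub has load_dts and record_source columns and that its
    surrogate key is generated by the database (identity/sequence default).
    """
    bk = spec.business_key
    return (
        f"INSERT INTO {spec.hub_table} ({bk}, load_dts, record_source)\n"
        f"SELECT DISTINCT s.{bk}, CURRENT_TIMESTAMP, '{spec.record_source}'\n"
        f"FROM {spec.staging_table} s\n"
        "WHERE NOT EXISTS (\n"
        f"    SELECT 1 FROM {spec.hub_table} h WHERE h.{bk} = s.{bk}\n"
        ");"
    )


def generate_all(specs: Iterable[HubSpec]) -> str:
    """One load script covering every hub described in the metadata."""
    return "\n\n".join(generate_hub_load(s) for s in specs)


if __name__ == "__main__":
    print(generate_all([
        HubSpec("hub_customer", "customer_number", "stg_customer", "CRM"),
        HubSpec("hub_product", "product_code", "stg_product", "ERP"),
    ]))
```

Because a hub load only inserts business keys that are not already present, one template can be looped over every hub in the metadata; links and satellites would each need their own template. The sketch pastes metadata straight into SQL text, so it assumes that metadata is trusted.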
  4. Where do all these pieces fit?
      - DW2.0 Framework!
  5. DW2.0 Framework
      - [Diagram. Recovered labels: Cube Processing, Temporal Indexing, Semantic Management, Active Data Mining, Transformation, Active Cleansing; Interactive, Tactical, Integrated, Strategic, Near-Line, Extended, Archival, Historical; Enterprise Service Bus / SOA / Web Services; Unstructured Data (Email, Plain Text, Word Docs, Images); Temp, HOT, SSD! (Cloud RAM), Medium, Cloud Storage, Warm, COLD; METADATA; Enterprise Data Warehouse]
  9. How do we get there?
  10. Virtual Marts: What are they?
      - They are:
        - RAM-based data marts, or SSD-based data marts
        - OLAP cubes (most of the time) built on the fly by new queries
        - “Hot” data that is continually accessed by the BI tool
        - The result sets of the most frequently used queries
        - Built dynamically, accessed regularly, and destroyed after sitting idle for a specified time
        - FAST
        - Only a subset of the data from the EDW
      - NOTE: They have WRITE-BACK capabilities!!
  11. Virtual Marts
      - REQUIREMENTS
        - Cloud-based RDBMS with expandable RAM
        - Unlimited computing power
        - Maximum parallelism
        - Extreme scalability
        - OR: big hardware with similar attributes
      - BENEFITS
        - Highly alterable answer sets
        - Write back to the BDV
        - Dynamic create/destroy capability
        - No “copy” of the data except in RAM
  14. Virtual Marts: How do I build one?
      - You can if you have Solid-State Disk (RAM disk) in your database server.
      - You can if you are using cloud technology.
      - Building one is the job of the 2010 RDBMS engine (today’s database engines do not provide these capabilities).
      - However, to emulate one, you can build it as follows:
        - Monitor the most frequently executed queries.
        - Build the cubes/stars on a regular schedule (automated queries).
        - Tear the cubes down when queries no longer access the data.
      - Remember: it will be YOUR job to maintain, monitor, and manage these components until the database engines get there with HOT data.
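The three emulation steps on the slide above (monitor frequent queries, build on a schedule, tear down when idle) can be sketched in Python as follows. This is a hypothetical in-process stand-in: `run_query` is whatever function executes SQL against the EDW, the "mart" is simply the cached result set held in RAM, and `hot_threshold` and `idle_seconds` are invented tuning knobs rather than anything from the deck.

```python
import time
from collections import Counter
from typing import Callable, Dict, List


class VirtualMartManager:
    """Emulates virtual marts in-process: cache results of hot queries, drop idle ones."""

    def __init__(self, run_query: Callable[[str], List],
                 hot_threshold: int = 3, idle_seconds: float = 600.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.run_query = run_query          # executes SQL against the EDW (assumed callable)
        self.hot_threshold = hot_threshold  # executions before a query counts as "hot"
        self.idle_seconds = idle_seconds    # idle time before a mart is torn down
        self.clock = clock
        self.counts: Counter = Counter()    # step 1: query frequency monitor
        self.marts: Dict[str, List] = {}    # query text -> result set held in RAM
        self.last_access: Dict[str, float] = {}

    def query(self, sql: str) -> List:
        """Count every execution; answer from a mart when one exists."""
        self.counts[sql] += 1
        if sql in self.marts:
            self.last_access[sql] = self.clock()
            return self.marts[sql]
        return self.run_query(sql)

    def refresh(self) -> None:
        """Run on a schedule: tear down idle marts, then (re)build marts for hot queries."""
        now = self.clock()
        # Step 3: tear down marts nobody has read within idle_seconds.
        idle = [sql for sql, t in self.last_access.items() if now - t > self.idle_seconds]
        for sql in idle:
            del self.marts[sql]
            del self.last_access[sql]
            self.counts[sql] = 0  # the query must become hot again before a rebuild
        # Step 2: rebuild hot marts so they pick up newly loaded EDW data.
        for sql, n in self.counts.items():
            if n >= self.hot_threshold:
                self.marts[sql] = self.run_query(sql)
                self.last_access.setdefault(sql, now)
```

In practice `refresh()` would be driven by a scheduler, and the cached list would be replaced by a cube or star built in RAM or on SSD; resetting the counter on teardown keeps a cold query from being rebuilt on the very next cycle.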
  15. Virtual Marts Affect the BDV
      - Write-back capability from Virtual Marts affects business decisions:
        - New or changed business transactions will be fed back to operational systems.
        - Changes will be sent on the bus to notify other systems of business decisions.
        - User security and control will have to be in place to authorize WHO can change WHAT in which parts of the marts.
        - Tracking of each change will become a required standard.
      - Eventually, Virtual Marts will become a MIXED BI application with an operational front end!
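A minimal sketch of those write-back rules, assuming a mart is an in-memory dict of rows, permissions are a plain user-to-columns map, and `publish` is a hypothetical callback standing in for the bus. It shows the three obligations from the slide in one place: authorize WHO can change WHAT, track every change, and announce the change to other systems.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Any, Callable, Dict, List, Set


@dataclass(frozen=True)
class ChangeRecord:
    """One tracked write-back: who changed what, from which value to which."""
    user: str
    mart: str
    row_key: str
    column: str
    old_value: Any
    new_value: Any
    changed_at: datetime


@dataclass
class WriteBackMart:
    name: str
    rows: Dict[str, Dict[str, Any]]
    permissions: Dict[str, Set[str]]          # user -> columns they may change (hypothetical model)
    publish: Callable[[ChangeRecord], None]   # stand-in for sending the change onto the bus
    audit_log: List[ChangeRecord] = field(default_factory=list)

    def write_back(self, user: str, row_key: str, column: str, value: Any) -> ChangeRecord:
        # Authorize WHO can change WHAT before touching anything.
        if column not in self.permissions.get(user, set()):
            raise PermissionError(f"{user} may not change {self.name}.{column}")
        row = self.rows[row_key]
        record = ChangeRecord(user, self.name, row_key, column,
                              row.get(column), value, datetime.now(timezone.utc))
        row[column] = value
        self.audit_log.append(record)  # every change is tracked
        self.publish(record)           # notify other systems / feed operational systems
        return record
```

A refused write raises before the row, the audit log, or the bus is touched, so a rejected change leaves no partial trace.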
  16. Unstructured Data: What is it?
      - It is: information that resides on your desktop, on your servers, and on the web; it is multilingual and conceptually based.
      - Technically: documents, e-mails, transcripts, videos, images, sound files.
      - It is 80% of the data, yet it remains unused by EDW/BI operations around the world.
      - It is 10x harder to deal with than structured data due to privacy concerns, ownership issues, and ethical concerns.
      - Data Governance and Data Stewardship play a HUGE role in the success or failure of working with unstructured data sets.
  17. Unstructured Data
      - REQUIREMENTS
        - Pre-processed data sets
        - Pointers to data sets
        - Use and loading of ontologies
        - Multi-language processing
      - BENEFITS
        - Highly alterable answer sets
        - Write back to the BDV
        - Dynamic create/destroy capability
        - No “copy” of the data except in RAM
  23. Unstructured Data Engines vs. Search Engines
      - Unstructured Data Engine
        - Indexes documents
        - Locates ALL potential matches
        - Uses data mining / neural nets
        - Correlates across multiple languages and multiple meanings of phrases
        - Induction-based reasoning
        - Similarity ratings based on confidence and strength
        - Deep analysis (focused on one question)
        - Utilizes ontologies
      - Search Engine
        - Indexes key terms
        - Locates the “most likely match”
        - Uses statistical analysis
        - Correlates based on “term matching”
        - Wide search, but not “deep analysis”
  24. U-Data & Data Vault
      - Unstructured data, loaded to the database
      - Ontology, loaded to the database
      - Dynamic links, built from analyzing queries and ontologies, used to load cubes!
      - Structured RAW Data Vault
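The dynamic-link idea on this slide can be sketched as follows, under heavy simplification: documents are plain text already loaded to the database (here a dict), the ontology maps each hub business key to lowercase single-word terms, and a query's terms decide which keys are worth linking. The function name, the row shape, and the whole-word matching rule are assumptions; a real unstructured data engine would analyze far more deeply than word overlap.

```python
import re
from typing import Dict, Iterable, List, Set, Tuple


def build_dynamic_links(documents: Dict[str, str],
                        ontology: Dict[str, Set[str]],
                        query_terms: Iterable[str]) -> List[Tuple[str, str, str]]:
    """Return (document_id, business_key, matched_term) rows for a link-style table.

    documents:   unstructured text already loaded to the database (doc id -> text).
    ontology:    hub business key -> lowercase terms/synonyms that refer to it.
    query_terms: terms from an analyzed query; only keys they touch get linked.
    """
    terms = {t.lower() for t in query_terms}
    # Use the ontology to turn query terms into business keys (synonyms included).
    wanted = sorted(bk for bk, labels in ontology.items() if labels & terms)
    links = []
    for doc_id, text in sorted(documents.items()):
        words = set(re.findall(r"\w+", text.lower()))
        for bk in wanted:
            hits = sorted(ontology[bk] & words)
            if hits:
                # One link row per document/business-key pair, keeping the first matched term.
                links.append((doc_id, bk, hits[0]))
    return links
```

The point of routing through the ontology is that a query for "gadget" can still link a document that only says "widget", because both terms hang off the same business key.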
  25. U-Data & Ontologies
      - Ontologies describe term relationships.
      - Ontologies house term hierarchies.
      - Ontologies can correlate terms across languages.
      - Ontologies can provide synonyms, homonyms, and antonyms.
      - Ontologies are the key piece of metadata needed to cross-map unstructured mining results to structured data sets in source systems.
      - Ontologies define the manner in which natural language ties concepts together.
      - Ontologies (or pieces of them) are required to successfully understand combinations of unstructured and structured data.
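To make the ontology bullets concrete, here is a toy ontology holding multilingual labels, homonyms (one term pointing at several concepts), and a broader/narrower hierarchy. The class and method names are hypothetical, and it leaves out antonyms and typed relationships; it only shows how an ontology can expand one search term into every related term across languages.

```python
from collections import defaultdict
from typing import DefaultDict, Set


class Ontology:
    """Toy ontology: concepts with multilingual labels and a broader/narrower hierarchy."""

    def __init__(self) -> None:
        # concept -> language -> labels (synonyms live side by side in one set)
        self.labels: DefaultDict[str, DefaultDict[str, Set[str]]] = defaultdict(lambda: defaultdict(set))
        # label -> concepts; a homonym simply maps to more than one concept
        self.term_index: DefaultDict[str, Set[str]] = defaultdict(set)
        # concept -> narrower (child) concepts
        self.narrower: DefaultDict[str, Set[str]] = defaultdict(set)

    def add_label(self, concept: str, term: str, lang: str) -> None:
        self.labels[concept][lang].add(term.lower())
        self.term_index[term.lower()].add(concept)

    def add_broader(self, child: str, parent: str) -> None:
        self.narrower[parent].add(child)

    def concepts_for(self, term: str) -> Set[str]:
        return set(self.term_index.get(term.lower(), set()))

    def expand(self, term: str) -> Set[str]:
        """Every label, in every language, for the term's concepts and all concepts beneath them."""
        result: Set[str] = set()
        seen: Set[str] = set()
        stack = list(self.concepts_for(term))
        while stack:
            concept = stack.pop()
            if concept in seen:
                continue
            seen.add(concept)
            for words in self.labels[concept].values():
                result |= words
            stack.extend(self.narrower[concept])
        return result
```

For example, after labeling "car" as `car`/`automobile` (en) and `auto` (nl) and placing it under "vehicle", expanding `vehicle` returns the Dutch and English labels for both concepts, while expanding `auto` stays at the narrower "car" level.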
  26. Ontologies and BI Applications
      - Business users will shift their BI applications to include managing data sets THROUGH ontology specifications.
      - Business users will assign governance to ontologies and manage changes to ontologies as their metadata definitions.
      - Tomorrow’s BI tool set will provide visualizations of ontologies cross-mapped to analytical data sets.
      - Ontologies ARE the metadata of tomorrow.
  27. Plateau: Operational Data Warehouse
      - REQUIREMENTS
        - Web-services feeds with real-time data
        - Applications for metadata management on top of the EDV
        - Applications for ontology management on top of the EDV
        - Applications to edit/maintain operational data
        - Virtual data marts
        - In-DB data mining engine capabilities
      - BENEFITS
        - Direct ties between the operational world and the data warehouse
        - Rapid turnaround/impact analysis by business users
  33. Operational DV: How to Build One
      - The Easy Way:
        - Start with standard Data Vault modeling.
        - Attach web services for in-flow/out-flow of data (putting the DV on the ESB as a 24x7x365 operational component).
        - Use business workflow engines to monitor, create, edit, change, and build applications on top of the web-services and web-message components.
        - Never allow direct access to the data in the Data Vault EXCEPT through web services.
      - The Hard Way:
        - Start with standard Data Vault modeling.
        - Attach web services for in-flow/out-flow of data.
        - Build a common data access layer (CDAL) that houses transactions in RAM (and manages locking of data sets).
        - Build applications on top of the CDAL.
        - Put the whole thing in the CLOUD to allow dynamic data marts.
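The hard way's common data access layer can be sketched as an in-memory class that is the only path to the stored rows: it locks per business key, hands the caller a working copy in RAM, and commits by appending a new timestamped row rather than updating in place, in the spirit of Data Vault's insert-only history. The class name, its methods, and the row layout are assumptions for illustration; the web-service front end and the cloud deployment are out of scope.

```python
import threading
from contextlib import contextmanager
from datetime import datetime, timezone
from typing import Dict, Iterator, List, Tuple


class CommonDataAccessLayer:
    """Hypothetical CDAL: the only way in or out of the stored rows."""

    def __init__(self) -> None:
        # Insert-only, satellite-style history: (business_key, attributes, load timestamp).
        self._rows: List[Tuple[str, dict, datetime]] = []
        self._store_lock = threading.Lock()        # guards self._rows
        self._key_locks: Dict[str, threading.Lock] = {}
        self._registry_lock = threading.Lock()     # guards self._key_locks

    def _lock_for(self, key: str) -> threading.Lock:
        with self._registry_lock:
            return self._key_locks.setdefault(key, threading.Lock())

    def history(self, key: str) -> List[Tuple[dict, datetime]]:
        """All committed versions for a business key, oldest first (copies, not live rows)."""
        with self._store_lock:
            return [(dict(attrs), ts) for bk, attrs, ts in self._rows if bk == key]

    def current(self, key: str) -> dict:
        rows = self.history(key)
        return rows[-1][0] if rows else {}

    @contextmanager
    def transaction(self, key: str) -> Iterator[dict]:
        """Lock the key, yield an in-RAM working copy, and append it only if no error was raised."""
        with self._lock_for(key):
            working = self.current(key)
            yield working
            with self._store_lock:
                self._rows.append((key, dict(working), datetime.now(timezone.utc)))
```

Web-service handlers would be the only callers of `transaction()`; if a handler raises inside the `with` block, the lock is released and nothing is committed, which is the "manages locking of data sets" part of the slide in miniature.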
  34. The Experts Say…
      - “The Data Vault is the optimal choice for modeling the EDW in the DW 2.0 framework.” (Bill Inmon)
      - “The Data Vault is foundationally strong and exceptionally scalable architecture.” (Stephen Brobst)
      - “The Data Vault is a technique which some industry experts have predicted may spark a revolution as the next big thing in data modeling for enterprise warehousing....” (Doug Laney)
  35. More Notables…
      - “This enables organizations to take control of their data warehousing destiny, supporting better and more relevant data warehouses in less time than before.” (Howard Dresner)
      - “[The Data Vault] captures a practical body of knowledge for data warehouse development which both agile and traditional practitioners will benefit from.” (Scott Ambler)
  36. Where To Learn More
      - The Technical Modeling Book:
      - The Discussion Forums: & events – Data Vault Discussions
      - Contact me: web, email
      - Worldwide User Group (Free)