
Data vault what's Next: Part 2


This is Part 2 of a two-part presentation I gave in 2009. This part covers unstructured data and operational Data Vault components. YES, even then I was commenting on how this market would evolve. If you want to use these slides, please let me know, and add "(C) Dan Linstedt, all rights reserved," in a VISIBLE fashion on your slides.

Published in: Business, Technology

  Data Vault Modeling: What’s Next? Part 2
    © Dan Linstedt 2009-2012
    This was Part 2 of a presentation I gave at an Array Conference in the Netherlands in 2009.
  A Bit About Me…
    Author, inventor, speaker, and part-time photographer…
    25+ years in the IT industry
    Worked in DoD, US Gov’t, Fortune 50, and so on…
    Find out more about the Data Vault:
    Full profile on
  Where Are We Today?
    IF you are using Data Vault…
    • Auto-generation of staging loads
    • Auto-generation of Data Vault loads
    • Auto-generation of Data Vault reconciliation routines
    • Auto-generation of RAW star schemas
    • Rapid build-out of star schemas
    If you are lucky…
    • Auto-generation of the Data Vault model
    • Auto-consolidation of source system data models
    • Auto-generation of the staging data model
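Auto-generating Data Vault loads generally means deriving the load statements from model metadata instead of hand-coding each one. The sketch below shows the idea for a single hub. It assumes a hub identified only by its business key plus `load_dts` and `record_source` columns (surrogate or hash key generation is omitted), and trusted metadata (values are not escaped). `HubSpec`, `hub_load_sql`, and all table and column names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HubSpec:
    """Hypothetical metadata describing one hub and the staging table that feeds it."""
    hub_table: str
    business_key: str
    staging_table: str
    record_source: str


def hub_load_sql(spec: HubSpec) -> str:
    """Generate an insert-only hub load that adds business keys not yet present in the hub."""
    return (
        f"INSERT INTO {spec.hub_table} ({spec.business_key}, load_dts, record_source)\n"
        f"SELECT DISTINCT stg.{spec.business_key}, CURRENT_TIMESTAMP, '{spec.record_source}'\n"
        f"FROM {spec.staging_table} stg\n"
        "WHERE NOT EXISTS (\n"
        f"    SELECT 1 FROM {spec.hub_table} h\n"
        f"    WHERE h.{spec.business_key} = stg.{spec.business_key}\n"
        ");"
    )


if __name__ == "__main__":
    # One metadata entry per hub; a real generator would loop over every hub in the model.
    spec = HubSpec("hub_customer", "customer_number", "stg_crm_customer", "CRM")
    print(hub_load_sql(spec))
```

Because hub loads only ever insert new keys, the generated statement is the same shape for every hub; only the metadata changes, which is what makes this kind of load a good candidate for generation.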
  Where Do All These Pieces Fit?
    The DW2.0 Framework!
  DW2.0 Framework
    [Diagram] Component labels: Cube Processing, Temporal Indexing, Semantic Management, Active Data Mining, Transformation, Active Cleansing
    Sector labels: Interactive / Tactical, Integrated / Strategic, Near-Line / Extended, Archival / Historical
    Enterprise Service Bus / SOA / Web Services
    Unstructured data: email, plain text, Word docs, images
    Storage temperature: HOT (SSD! Cloud RAM), Medium / Warm (cloud storage), COLD
    Also shown: METADATA, Enterprise Data Warehouse
  How Do We Get There?
  Virtual Marts: What Are They?
    They are:
    • RAM-based or SSD-based data marts
    • OLAP cubes (most of the time) built on the fly by new queries
    • “Hot data” that is continually accessed by the BI tool
    • The result sets of the most frequently used queries
    • Built dynamically, accessed regularly, and destroyed after being idle for a specific time
    • FAST
    • Only a subset of the data from the EDW
    NOTE: They have WRITE-BACK capabilities!!
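The lifecycle on this slide (built on demand, reused while hot, destroyed after an idle period) can be sketched as a small registry. This is a minimal illustration that assumes a mart is simply a cached result set held in process memory; `VirtualMartRegistry`, `build`, and `idle_seconds` are hypothetical names, and write-back and concurrency are left out.

```python
import time
from typing import Callable, Dict, List, Tuple

Rows = List[Tuple]


class VirtualMartRegistry:
    """Keeps query result sets ("virtual marts") in RAM and drops them after an idle period."""

    def __init__(self, idle_seconds: float, clock: Callable[[], float] = time.monotonic) -> None:
        self.idle_seconds = idle_seconds
        self.clock = clock
        self._marts: Dict[str, Tuple[Rows, float]] = {}  # query -> (rows, last access time)

    def get(self, query: str, build: Callable[[str], Rows]) -> Rows:
        """Return the mart for `query`, building it on first use; each access resets its idle timer."""
        entry = self._marts.get(query)
        rows = entry[0] if entry is not None else build(query)
        self._marts[query] = (rows, self.clock())
        return rows

    def evict_idle(self) -> List[str]:
        """Destroy every mart idle for longer than `idle_seconds`; return the evicted queries."""
        now = self.clock()
        expired = [q for q, (_, last) in self._marts.items() if now - last > self.idle_seconds]
        for q in expired:
            del self._marts[q]
        return expired

    def __contains__(self, query: str) -> bool:
        return query in self._marts


if __name__ == "__main__":
    now = [0.0]  # fake clock so the example is deterministic
    marts = VirtualMartRegistry(idle_seconds=300, clock=lambda: now[0])
    marts.get("sales_by_region", lambda q: [("EU", 100), ("US", 250)])
    now[0] = 600.0
    print(marts.evict_idle())  # ['sales_by_region']
```

The injectable `clock` keeps the idle logic testable; in practice `evict_idle` would run on a timer.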
  Virtual Marts
    REQUIREMENTS
    • Cloud-based RDBMS with expandable RAM
    • Unlimited computing power
    • Maximum parallelism
    • Extreme scalability
    • OR: big hardware with similar attributes
    BENEFITS
    • Highly alterable answer sets
    • Write back to the BDV
    • Dynamic create/destroy capability
    • No “copy” of the data except in RAM
  Virtual Marts: How Do I Build One?
    • You can, if you have solid-state disk (RAM disk) in your database server
    • You can, if you are using cloud technology
    • Building one is the job of the 2010 RDBMS engine (today’s database engines do not provide these capabilities)
    However, to emulate one, you can build it as follows:
    1. Monitor the most frequently executed queries
    2. Build the cubes/stars on a regular schedule (automated queries)
    3. Tear the cubes down when queries no longer access the data
    Remember: it will be YOUR job to maintain, monitor, and manage these components until the database engines get there with HOT data.
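The three emulation steps can be sketched as a planning function run once per monitoring window. Assumptions: the query log is available as a list of normalized query strings, and "hot" means at least `min_hits` executions within the window, capped at the `top_n` most frequent; `plan_virtual_marts` and both thresholds are hypothetical. Actually building and dropping the cubes/stars is left to your scheduler.

```python
from collections import Counter
from typing import Iterable, List, Set, Tuple


def plan_virtual_marts(
    query_log: Iterable[str], current: Set[str], min_hits: int, top_n: int
) -> Tuple[List[str], List[str]]:
    """From one monitoring window, decide which marts to build and which to tear down.

    A query is "hot" if it is among the `top_n` most frequent and ran at least `min_hits` times.
    """
    counts = Counter(query_log)                                 # step 1: monitor query frequency
    hot = [q for q, n in counts.most_common(top_n) if n >= min_hits]
    to_build = [q for q in hot if q not in current]             # step 2: build on the next run
    to_tear_down = sorted(q for q in current if q not in hot)   # step 3: no longer hot
    return to_build, to_tear_down


if __name__ == "__main__":
    window = ["q_sales", "q_sales", "q_sales", "q_stock", "q_stock", "q_hr"]
    print(plan_virtual_marts(window, current={"q_hr", "q_stock"}, min_hits=2, top_n=5))
    # (['q_sales'], ['q_hr'])
```

Tearing down a mart as soon as it drops out of the hot set is the simplest policy; a gentler variant would require it to stay cold for several windows first.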
  Virtual Marts Affect the BDV
    Write-back capability: write-backs from virtual marts affect business decisions
    • New and changed business transactions will be fed back to operational systems
    • Changes will be sent on the bus to notify other systems of business decisions
    • User security and control will have to be in place to authorize WHO can change WHAT in which parts of the marts
    • Tracking of each change will become a required standard
    Eventually, the virtual marts will become a MIXED BI application with an operational front end!
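The authorization and change-tracking requirements can be sketched as a gate in front of every write-back. Assumptions: grants are a simple per-user set of (mart, column) pairs, the mart is an in-memory dict of rows, and the returned `ChangeRecord` is what you would publish on the bus; `WriteBackGate` and `ChangeRecord` are hypothetical names, and no bus integration is shown.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Dict, List, Set, Tuple


@dataclass(frozen=True)
class ChangeRecord:
    """One audited write-back: who changed what, where, and when."""
    user: str
    mart: str
    key: str
    column: str
    old_value: object
    new_value: object
    changed_at: datetime


@dataclass
class WriteBackGate:
    # Hypothetical grants: user -> set of (mart, column) pairs that user may change.
    grants: Dict[str, Set[Tuple[str, str]]]
    audit_log: List[ChangeRecord] = field(default_factory=list)

    def apply(self, user: str, mart: str, rows: Dict[str, Dict[str, object]],
              key: str, column: str, new_value: object) -> ChangeRecord:
        """Authorize, apply, and record a single cell change in a virtual mart."""
        if (mart, column) not in self.grants.get(user, set()):
            raise PermissionError(f"{user} may not change {mart}.{column}")
        old_value = rows[key][column]
        rows[key][column] = new_value
        record = ChangeRecord(user, mart, key, column, old_value, new_value,
                              datetime.now(timezone.utc))
        self.audit_log.append(record)  # every accepted change is tracked
        return record


if __name__ == "__main__":
    gate = WriteBackGate(grants={"alice": {("sales_mart", "forecast")}})
    rows = {"EU": {"forecast": 100}}
    print(gate.apply("alice", "sales_mart", rows, "EU", "forecast", 120))
```

Rejected changes raise before anything is modified, so the audit log only ever contains changes that actually happened.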
  Unstructured Data: What Is It?
    It is information that resides on your desktop, on your servers, and on the web; it is multilingual and conceptually based.
    Technically: documents, e-mails, transcripts, videos, images, sound files.
    It makes up 80% of the data, yet it goes unused by EDW/BI operations around the world.
    It is 10x harder to deal with than structured data due to privacy concerns, ownership issues, and ethical concerns.
    Data governance and data stewardship play a HUGE role in the success or failure of working with unstructured data sets.
  Unstructured Data
    REQUIREMENTS
    • Pre-processed data sets
    • Pointers to data sets
    • Use and loading of ontologies
    • Multi-language processing
    BENEFITS
    • Highly alterable answer sets
    • Write back to the BDV
    • Dynamic create/destroy capability
    • No “copy” of the data except in RAM
  Unstructured Data Engines vs. Search Engines
    Unstructured data engine:
    • Indexes documents
    • Locates ALL potential matches
    • Uses data mining / neural nets
    • Correlates across multiple languages and multiple meanings of phrases
    • Induction-based reasoning
    • Similarity ratings based on confidence and strength
    • Deep analysis (focused on one question)
    • Utilizes ontologies
    Search engine:
    • Indexes key terms
    • Locates the “most likely match”
    • Uses statistical analysis
    • Correlates based on “term matching”
    • Wide search, but not “deep analysis”
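To make the contrast concrete, here is a toy comparison; it is a sketch, not how any particular product works. The search-style function returns only the single best literal term match, while the concept-style function maps terms through a tiny hypothetical multilingual concept map (`CONCEPTS`) and returns every document that shares a concept, with a simple overlap ratio standing in for a confidence/strength rating. A real unstructured data engine would use full ontologies and mining models rather than a hand-written dictionary.

```python
import re
from typing import Dict, List, Optional, Tuple

# Hypothetical multilingual concept map (English, French, Dutch) standing in for an ontology.
CONCEPTS: Dict[str, str] = {
    "car": "vehicle", "automobile": "vehicle", "voiture": "vehicle",
    "invoice": "billing", "bill": "billing", "factuur": "billing",
}


def tokens(text: str) -> List[str]:
    return re.findall(r"\w+", text.lower())


def search_engine(query: str, docs: Dict[str, str]) -> Optional[str]:
    """Term matching: return only the 'most likely match' (most shared literal terms), or None."""
    wanted = set(tokens(query))
    best_id, best_score = None, 0
    for doc_id, text in docs.items():
        score = len(wanted & set(tokens(text)))
        if score > best_score:
            best_id, best_score = doc_id, score
    return best_id


def concept_engine(query: str, docs: Dict[str, str]) -> List[Tuple[str, float]]:
    """Concept matching: return ALL documents sharing a concept, rated by the share of query concepts found."""
    wanted = {CONCEPTS.get(t, t) for t in tokens(query)}
    if not wanted:
        return []
    rated = []
    for doc_id, text in docs.items():
        found = {CONCEPTS.get(t, t) for t in tokens(text)}
        rating = len(wanted & found) / len(wanted)
        if rating > 0:
            rated.append((doc_id, rating))
    return sorted(rated, key=lambda r: r[1], reverse=True)


if __name__ == "__main__":
    docs = {"d1": "Car invoice overdue", "d2": "Factuur voor de voiture", "d3": "Weather report"}
    print(search_engine("automobile bill", docs))   # None: no literal term overlap
    print(concept_engine("automobile bill", docs))  # [('d1', 1.0), ('d2', 1.0)]
```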
  U-Data & Data Vault
    • Unstructured data, loaded to the database
    • Ontology, loaded to the database
    • Dynamic links: built from analyzing queries and ontologies, and used to load cubes!
    • Structured RAW Data Vault
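One way to picture the "dynamic links" on this slide: scan loaded unstructured text for terms that the ontology ties to structured business keys, and emit link rows between each document and the matching hubs. This sketch assumes single-word terms and a pre-built term-to-key map (`key_terms`). `DocLink`, `build_dynamic_links`, and the hub/key values are hypothetical, and the mention count stands in for whatever confidence measure a real engine would compute.

```python
import re
from dataclasses import dataclass
from typing import Dict, List, Tuple

HubKey = Tuple[str, str]  # (hub table, business key)


@dataclass(frozen=True)
class DocLink:
    """Hypothetical link row between an unstructured document and a hub business key."""
    doc_id: str
    hub: str
    business_key: str
    mentions: int


def build_dynamic_links(docs: Dict[str, str], key_terms: Dict[str, HubKey]) -> List[DocLink]:
    """key_terms maps a lower-case single-word term (a key or an ontology synonym) to a hub key."""
    links: List[DocLink] = []
    for doc_id, text in docs.items():
        counts: Dict[HubKey, int] = {}
        for token in re.findall(r"\w+", text.lower()):
            target = key_terms.get(token)
            if target is not None:
                counts[target] = counts.get(target, 0) + 1
        for (hub, key), n in counts.items():
            links.append(DocLink(doc_id, hub, key, n))
    return links


if __name__ == "__main__":
    terms = {"acme": ("hub_customer", "C-1001"), "widgetpro": ("hub_product", "P-77")}
    emails = {"mail-1": "ACME says the WidgetPro broke; ACME wants a refund", "mail-2": "Lunch menu"}
    for link in build_dynamic_links(emails, terms):
        print(link)
```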
  U-Data & Ontologies
    • Ontologies describe term relationships
    • Ontologies house term hierarchies
    • Ontologies can correlate terms across languages
    • Ontologies can provide synonyms, homonyms, and antonyms
    • Ontologies are the key piece of metadata needed to cross unstructured mining results to structured data sets in source systems
    • Ontologies define the manner in which natural language ties concepts together
    • Ontologies (or pieces of them) are required for success in understanding combinations of unstructured and structured data
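The ontology roles listed on this slide (hierarchies, cross-language terms, synonyms, homonyms, antonyms) can be illustrated with a toy in-memory structure. This is an assumption-laden sketch, not a standard ontology format such as OWL or RDF; `Ontology`, its methods, and the example concepts are hypothetical, and the hierarchy is assumed to be acyclic.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class Ontology:
    """Toy ontology: concept hierarchy, multilingual labels (synonyms/translations), antonyms."""
    parent: Dict[str, Optional[str]] = field(default_factory=dict)
    labels: Dict[str, Set[str]] = field(default_factory=dict)  # term -> concepts (>1 means homonym)
    antonyms: Dict[str, Set[str]] = field(default_factory=dict)

    def add_concept(self, concept: str, terms: List[str], parent: Optional[str] = None) -> None:
        self.parent[concept] = parent
        for term in [concept, *terms]:
            self.labels.setdefault(term.lower(), set()).add(concept)

    def add_antonyms(self, a: str, b: str) -> None:
        self.antonyms.setdefault(a, set()).add(b)
        self.antonyms.setdefault(b, set()).add(a)

    def resolve(self, term: str) -> Set[str]:
        """Map any synonym or translation to its concept(s)."""
        return set(self.labels.get(term.lower(), set()))

    def ancestors(self, concept: str) -> List[str]:
        """Walk the term hierarchy from a concept up to its root."""
        chain: List[str] = []
        current = self.parent.get(concept)
        while current is not None:
            chain.append(current)
            current = self.parent.get(current)
        return chain


if __name__ == "__main__":
    onto = Ontology()
    onto.add_concept("party", [])
    onto.add_concept("customer", ["client", "klant", "account"], parent="party")
    onto.add_concept("supplier", ["vendor", "leverancier"], parent="party")
    onto.add_concept("ledger_account", ["account", "grootboekrekening"])
    onto.add_antonyms("customer", "supplier")
    print(onto.resolve("Klant"))            # {'customer'}
    print(sorted(onto.resolve("account")))  # ['customer', 'ledger_account'] (a homonym)
    print(onto.ancestors("customer"))       # ['party']
```

Resolving a term to more than one concept is exactly the homonym case, which is where governance of the ontology (deciding which meaning applies) comes in.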
  Ontologies and BI Applications
    • Business users will shift their BI applications to include managing data sets THROUGH ontology specifications
    • Business users will assign governance to ontologies and manage changes to ontologies as their metadata definitions
    • Tomorrow’s BI tool set will provide visualizations of ontologies cross-mapped to analytical data sets
    • Ontologies ARE the metadata of tomorrow
  Plateau: Operational Data Warehouse
    REQUIREMENTS
    • Web-services feeds with real-time data
    • Applications for metadata management on top of the EDV
    • Applications for ontology management on top of the EDV
    • Applications to edit/maintain operational data
    • Virtual data marts
    • In-DB data mining engine capabilities
    BENEFITS
    • Direct ties between the operational world and the data warehouse
    • Rapid turnaround/impact analysis by business users
  Operational DV: How to Build One
    The Easy Way:
    1. Start with standard Data Vault modeling
    2. Attach web services for in-flow/out-flow of data (putting the DV on the ESB as a 24x7x365 operational component)
    3. Use business workflow engines to monitor, create, edit, change, and build applications on top of the web-services and web-message components
    4. Never allow direct access to the data in the Data Vault EXCEPT through web services
    The Hard Way:
    1. Start with standard Data Vault modeling
    2. Attach web services for in-flow/out-flow of data
    3. Build a common data access layer (CDAL) that houses transactions in RAM (and manages locking of data sets)
    4. Build applications on top of the CDAL
    5. Put the whole thing on the CLOUD to allow dynamic data marts
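The "hard way" hinges on the common data access layer. A minimal sketch, assuming a single process, an in-memory dictionary standing in for the Data Vault tables, and one lock per business key; `CommonDataAccessLayer`, `transaction`, and `read` are hypothetical names. Web-service endpoints would call these methods instead of touching the tables directly, in keeping with the "never allow direct access" rule.

```python
import threading
from contextlib import contextmanager
from typing import Dict, Iterator


class CommonDataAccessLayer:
    """Hypothetical CDAL: the only path to the data, with transactions and locks held in RAM."""

    def __init__(self) -> None:
        self._rows: Dict[str, dict] = {}         # stand-in for Data Vault tables, keyed by business key
        self._locks: dict = {}                   # business key -> threading.Lock
        self._registry_lock = threading.Lock()   # guards creation of per-key locks

    def _lock_for(self, key: str):
        with self._registry_lock:
            return self._locks.setdefault(key, threading.Lock())

    @contextmanager
    def transaction(self, key: str) -> Iterator[dict]:
        """Lock one business key, hand out a working copy, and commit it only if the block succeeds.

        Locks are not reentrant: do not call read() on the same key inside its transaction.
        """
        with self._lock_for(key):
            working = dict(self._rows.get(key, {}))
            yield working
            self._rows[key] = working  # skipped if the caller's block raised

    def read(self, key: str) -> dict:
        """Return a snapshot of the committed row for `key`."""
        with self._lock_for(key):
            return dict(self._rows.get(key, {}))


if __name__ == "__main__":
    cdal = CommonDataAccessLayer()
    with cdal.transaction("C-1001") as row:
        row["name"] = "ACME"
    print(cdal.read("C-1001"))  # {'name': 'ACME'}
```

Per-key locks keep unrelated keys from blocking each other; a cloud deployment spread over several nodes would need a shared lock manager instead of in-process locks.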
  The Experts Say…
    “The Data Vault is the optimal choice for modeling the EDW in the DW 2.0 framework.” (Bill Inmon)
    “The Data Vault is foundationally strong and exceptionally scalable architecture.” (Stephen Brobst)
    “The Data Vault is a technique which some industry experts have predicted may spark a revolution as the next big thing in data modeling for enterprise warehousing....” (Doug Laney)
  More Notables…
    “This enables organizations to take control of their data warehousing destiny, supporting better and more relevant data warehouses in less time than before.” (Howard Dresner)
    “[The Data Vault] captures a practical body of knowledge for data warehouse development which both agile and traditional practitioners will benefit from.” (Scott Ambler)
  Where to Learn More
    • The technical modeling book:
    • The discussion forums & events: Data Vault Discussions
    • Contact me: web, email
    • Worldwide user group (free)