Data vault what's Next: Part 2

Part 2 of a two-part presentation that I gave in 2009. It covers more about unstructured data and operational Data Vault components. Yes, even then I was commenting on how this market will evolve. If you want to use these slides, please let me know, and add "(C) Dan Linstedt, all rights reserved" in a VISIBLE fashion on your slides.




Data vault what's Next: Part 2 Presentation Transcript

  • 1. Data Vault Modeling: What’s Next? Part 2
    © Dan Linstedt 2009-2012
    This was Part 2 of a presentation I gave at an Array Conference in the Netherlands in 2009.
  • 2. A bit about me…
    Author, Inventor, Speaker – and part time photographer…
    25+ years in the IT industry
    Worked in DoD, US Gov’t, Fortune 50, and so on…
    Find out more about the Data Vault:
    Full profile on
  • 3. Where are We Today?
    IF you are using Data Vault…
    Auto Generation of Staging Loads
    Auto Generation of Data Vault Loads
    Auto Generation of Data Vault Reconciliation Routines
    Auto Generation of RAW Star Schemas
    Rapid Build out of Star Schemas
    If you are lucky…
    Auto Generation of the Data Vault Model
    Auto Consolidation of Source System Data Models
    Auto Generation of the Staging Data Model
  • 4. Where do all these pieces fit?
    DW2.0 Framework!
  • 5. DW2.0 Framework
    Data Mining
    Enterprise Service Bus / SOA / Web Services
    Unstructured Data:
    • Email
    • Plain Text
    • Word Docs
    • Images
    (Cloud RAM)
    Enterprise Data Warehouse
  • 9. How do we get there?
  • 10. Virtual Marts: What are they?
    They Are:
    RAM based data marts, or SSD drive based Data Marts
    OLAP cubes (most of the time) built on the fly by new queries
    “hot-data” that are continually accessed by the BI tool
    the result sets of the most frequently used queries
    built dynamically, are accessed regularly, and are destroyed after “idle” for a specific time
    only a subset of data from the EDW
    NOTE: They have WRITE-BACK capabilities!!
  • 11. Virtual Marts
    Cloud based RDBMS
    with expandable RAM
    Unlimited computing power
    Maximum parallelism
    Extreme scalability
    OR: Big Hardware with similar attributes
    • Highly Alterable Answer Sets
    • Write Back to BDV
    • Dynamic create/destroy capability
    • No “copy” of the data except in RAM
  • Virtual Marts: How do I build one?
    You can, if you have Solid-State-Disk (RAM-DISK) in your database server
    You can, if you are using Cloud Technology
    Building one is the job of the 2010 RDBMS engine (today’s database engines do not provide these capabilities)
    However: To emulate, you can build one as follows:
    Monitor the queries most frequently executed
    Build the Cubes / stars on a regular schedule (automated queries)
    Tear the cubes down when queries no longer access the data
    Remember: It will be YOUR job to maintain, monitor and manage these components until the database engines get there with HOT data.
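The three emulation steps above (monitor, build on a schedule, tear down when idle) can be sketched roughly as follows. This is only an illustration under stated assumptions: `VirtualMartManager`, `hot_threshold`, and `idle_limit` are hypothetical names that do not appear in the slides, and "building" a mart is reduced to recording its name instead of materializing a cube or star schema.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class VirtualMartManager:
    """Toy emulation of virtual marts: build for hot queries, drop idle ones."""
    hot_threshold: int = 3      # assumed: executions needed before a mart is built
    idle_limit: float = 600.0   # assumed: seconds without access before teardown
    query_counts: Counter = field(default_factory=Counter)
    marts: dict = field(default_factory=dict)  # query name -> last access time

    def record_query(self, query: str, now: float) -> None:
        """Step 1: monitor which queries are executed most frequently."""
        self.query_counts[query] += 1
        if query in self.marts:
            self.marts[query] = now  # an existing mart was just used

    def build_hot_marts(self, now: float) -> list:
        """Step 2: run on a schedule; build marts for frequently used queries."""
        built = []
        for query, count in self.query_counts.items():
            if count >= self.hot_threshold and query not in self.marts:
                self.marts[query] = now  # placeholder for building the cube/star
                built.append(query)
        return built

    def tear_down_idle(self, now: float) -> list:
        """Step 3: drop marts that no query has touched within idle_limit."""
        idle = [q for q, last in self.marts.items() if now - last > self.idle_limit]
        for query in idle:
            del self.marts[query]
            self.query_counts.pop(query, None)  # start counting afresh
        return idle
```

In a real deployment, `build_hot_marts` and `tear_down_idle` would be called by a scheduler, and the placeholder line would issue the automated queries that populate the mart.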
  • 15. Virtual Marts Affect The BDV
    Write Back Capability:
    Write-backs from Virtual Marts affect business decisions
    New Business transactions/changed transactions will be fed back to operational systems
    Changes will be sent on the bus to notify other systems of business decisions
    User security and control will have to be in place to authorize WHO can change WHAT in which parts of the marts.
    Tracking of each change will become a required standard
    Eventually the Virtual Marts will become a MIXED BI Application with an operational front end!
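As a rough sketch of what the write-back requirements above could look like in code (authorizing who can change what, tracking each change, and notifying other systems), consider the following. All names here (`WriteBackMart`, `grant`, `write_back`, `outbox`) are hypothetical, and the "bus" is only a list standing in for real messaging.

```python
from dataclasses import dataclass, field


@dataclass
class WriteBackMart:
    """Toy virtual mart that authorizes and audits every write-back."""
    data: dict = field(default_factory=dict)         # (area, key) -> value
    permissions: dict = field(default_factory=dict)  # user -> areas they may change
    audit_log: list = field(default_factory=list)    # one entry per accepted change
    outbox: list = field(default_factory=list)       # stand-in for messages on the bus

    def grant(self, user: str, area: str) -> None:
        """Allow a user to change one area of the mart."""
        self.permissions.setdefault(user, set()).add(area)

    def write_back(self, user: str, area: str, key: str, value) -> None:
        # Authorize WHO can change WHAT in which part of the mart.
        if area not in self.permissions.get(user, set()):
            raise PermissionError(f"{user} may not change {area}")
        old = self.data.get((area, key))
        self.data[(area, key)] = value
        # Track each change, keeping both the old and the new value.
        change = {"user": user, "area": area, "key": key, "old": old, "new": value}
        self.audit_log.append(change)
        # Notify other systems (here the change is only queued).
        self.outbox.append(change)
```

Rejected writes never reach the data, the audit log, or the outbox, so the log reflects only changes that were actually applied.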
  • 16. Unstructured Data: What is it?
    It is: Information that resides on your desktop, on your servers, on the web, is multi-lingual, and conceptually based.
    Technically: Documents, E-Mails, Transcripts, Videos, Images, Sound Files.
    It is 80% of the data, yet it goes unused by EDW/BI operations around the world
    It is 10x harder to deal with than structured data due to privacy concerns, ownership issues, and ethical concerns.
    Data Governance and Data Stewardship play a HUGE role in the success or failure of working with Unstructured Data Sets
  • 17. Unstructured Data
    • Pre-Processed data sets
    • Pointers to data sets
    • Use of & Loading of Ontologies
    • Multi-Language processing
    • Highly Alterable Answer Sets
    • Write Back to BDV
    • Dynamic create/destroy capability
    • No “copy” of the data except in RAM
  • Unstructured Data Engines Vs Search Engines
    Unstructured Data Engine:
    • Indexes Documents
    • Locates ALL potential matches
    • Uses Data Mining / Neural Nets
    • Correlates across multiple languages, multiple meanings of phrases
    • Induction based reasoning
    • Similarity Ratings based on Confidence and Strength
    • Deep Analysis (focused on 1 question)
    • Utilizes Ontologies
    Search Engine:
    • Indexes key terms
    • Locates “most likely match”
    • Uses Statistical Analysis
    • Correlates based on “Term matching”
    • Wide search, but not “deep analysis”
  • 24. U-Data & Data Vault
    Unstructured Data – Loaded To Database
    Ontology, Loaded to Database
    Dynamic Links: Built from Analyzing Queries and Ontologies, Used to Load Cubes!
    Structured RAW Data Vault
  • 25. U-Data & Ontologies
    Ontologies describe term relationships
    Ontologies house term hierarchies
    Ontologies can correlate terms across languages
    Ontologies can provide synonyms, homonyms, and antonyms
    Ontologies are the key piece of Metadata needed to cross unstructured mining results to structured data sets in source systems
    Ontologies define the manner in which natural language ties together concepts
    Ontologies (or pieces of them) are required for success within the understanding of Unstructured Data & Structured Data Combinations
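To make the list above a little more concrete, here is a minimal sketch of an ontology held as plain dictionaries. It covers only term hierarchies, synonyms, and cross-language correlation (not homonyms or antonyms), and every name in it (`Ontology`, `canonical`, `ancestors`, `related`) is an assumption for illustration, not something defined in the slides.

```python
from dataclasses import dataclass, field


@dataclass
class Ontology:
    """Toy ontology: term hierarchy, synonyms, and cross-language links."""
    parents: dict = field(default_factory=dict)       # term -> broader term
    synonyms: dict = field(default_factory=dict)      # term -> canonical term
    translations: dict = field(default_factory=dict)  # (language, term) -> canonical term

    def canonical(self, term: str, language: str = "en") -> str:
        """Resolve a possibly foreign or synonymous term to its canonical form."""
        term = term.lower()
        term = self.translations.get((language, term), term)
        return self.synonyms.get(term, term)

    def ancestors(self, term: str, language: str = "en") -> list:
        """Walk up the hierarchy from the canonical term towards the root."""
        chain = []
        current = self.canonical(term, language)
        # The second condition guards against cycles in the hierarchy.
        while current in self.parents and self.parents[current] not in chain:
            current = self.parents[current]
            chain.append(current)
        return chain

    def related(self, a: str, b: str) -> bool:
        """Two terms relate if they match or one is an ancestor of the other."""
        ca, cb = self.canonical(a), self.canonical(b)
        return ca == cb or ca in self.ancestors(cb) or cb in self.ancestors(ca)
```

With `translations={("nl", "klant"): "customer"}` and `synonyms={"client": "customer"}`, a Dutch document mentioning "klant" and an English one mentioning "client" both resolve to the same term, which is the kind of key that could then be matched against a structured data set.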
  • 26. Ontologies and BI Applications
    Business Users will shift their BI applications to include managing data sets THROUGH ontology specifications
    Business Users will assign governance to ontologies and manage changes to ontologies as their metadata definitions
    Tomorrow’s BI tool set will provide visualizations of Ontologies cross-mapped to analytical data sets
    Ontologies ARE the metadata of tomorrow
  • 27. Plateau: Operational Data Warehouse
    • Web-Services feeds with real-time data
    • Applications for metadata management on top of the EDV
    • Applications for Ontology Management on top of the EDV
    • Applications to edit/maintain Operational Data
    • Virtual Data Marts
    • In-DB Data Mining Engine Capabilities
    • Direct ties between the operational world and the Data Warehouse
    • Rapid turnaround/impact analysis by business users
  • Operational DV: How to Build One
    The Easy Way:
    Start with standard Data Vault Modeling
    Attach Web-Services for In-flow/Out-Flow of Data (putting the DV on the ESB as a 24x7x365 operational component)
    Use Business Workflow Engines to monitor, create, edit, change and build applications on top of the web-services and web messages components
    Never allow direct access to the data in the Data Vault EXCEPT through web-services
    The Hard Way:
    Start with Standard Data Vault Modeling
    Attach Web Services for In-Flow/Out-Flow of Data
    Build a common data access layer (CDAL) that houses transactions in RAM (manages locking of data sets)
    Build applications on top of the CDAL
    Put the whole thing on the CLOUD to allow dynamic data marts
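The CDAL step in "The Hard Way" (transactions housed in RAM, with locking of data sets) might be sketched along the following lines. This is a single-process illustration only; `CommonDataAccessLayer`, `transaction`, and `read` are hypothetical names, rows are only ever appended, and a real CDAL would also need persistence and distributed locking.

```python
import threading
from contextlib import contextmanager


class CommonDataAccessLayer:
    """Toy CDAL: buffers changes in RAM and locks each data set while it changes."""

    def __init__(self):
        self._store = {}                 # data set name -> list of committed rows
        self._locks = {}                 # data set name -> re-entrant lock
        self._guard = threading.Lock()   # protects the lock table itself

    def _lock_for(self, data_set: str):
        with self._guard:
            return self._locks.setdefault(data_set, threading.RLock())

    @contextmanager
    def transaction(self, data_set: str):
        """Lock one data set, buffer rows in RAM, and commit only on success."""
        with self._lock_for(data_set):
            pending = []                 # the in-RAM transaction
            yield pending
            # Reached only if the caller's block raised no exception.
            self._store.setdefault(data_set, []).extend(pending)

    def read(self, data_set: str) -> list:
        """Return a copy of the committed rows for one data set."""
        with self._lock_for(data_set):
            return list(self._store.get(data_set, []))
```

Applications built on top would use `with cdal.transaction("hub_customer") as tx: tx.append(row)`; if the block raises, the buffered rows are discarded and the lock is released.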
  • 34. The Experts Say…
    “The Data Vault is the optimal choice for modeling the EDW in the DW 2.0 framework.”
    Bill Inmon
    “The Data Vault is foundationally strong and exceptionally scalable architecture.”
    Stephen Brobst
    “The Data Vault is a technique which some industry experts have predicted may spark a revolution as the next big thing in data modeling for enterprise warehousing....”
    Doug Laney
  • 35. More Notables…
    “This enables organizations to take control of their data warehousing destiny, supporting better and more relevant data warehouses in less time than before.”
    Howard Dresner
    “[The Data Vault] captures a practical body of knowledge for data warehouse development which both agile and traditional practitioners will benefit from.”
    Scott Ambler
  • 36. Where To Learn More
    The Technical Modeling Book:
    The Discussion Forums: & events – Data Vault Discussions
    Contact me: - web - email
    World wide User Group (Free)