
Real-life Customer Cases using Data Vault and Data Warehouse Automation


Presented by Dirk Vermeiren, Partner Tripwire Solutions

Published in: Data & Analytics

  1. Real-life customer cases using Data Vault and Data Warehouse automation. Dirk Vermeiren – Partner, Tripwire Solutions
  2. Historical – Milestone projects
      • Health Sector 1 (2009): Data Vault
  3. Rule 1 – Do not implement a Data Vault DWH without DWH automation
      • Why?
          • You have 3 to 4 times more objects than in 3NF, which means much more manual development work.
          • Data Vault objects have generic logic per type (hub, satellite and link), and there are lots of them.
          • Code generation can therefore be used to deliver higher development speed.
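The point about generic logic per object type can be illustrated with a minimal sketch of metadata-driven DDL generation. Everything here is an assumption made for illustration: the class names, the `_hk` hash-key suffix, the column types and the audit columns are not taken from the slides or from any specific automation tool.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Hub:
    name: str
    business_key: str


@dataclass
class Satellite:
    name: str
    hub: str
    attributes: List[str]


@dataclass
class Link:
    name: str
    hubs: List[str]


# Hypothetical audit columns shared by every generated table.
AUDIT_COLUMNS = "  load_dts TIMESTAMP NOT NULL,\n  record_source VARCHAR(50) NOT NULL"


def hub_ddl(hub: Hub) -> str:
    # One template serves every hub in the model.
    return (
        f"CREATE TABLE hub_{hub.name} (\n"
        f"  {hub.name}_hk CHAR(32) PRIMARY KEY,\n"
        f"  {hub.business_key} VARCHAR(100) NOT NULL,\n"
        f"{AUDIT_COLUMNS}\n);"
    )


def satellite_ddl(sat: Satellite) -> str:
    # A satellite is keyed by its parent hub key plus the load timestamp.
    attrs = "".join(f"  {a} VARCHAR(255),\n" for a in sat.attributes)
    return (
        f"CREATE TABLE sat_{sat.name} (\n"
        f"  {sat.hub}_hk CHAR(32) NOT NULL,\n"
        f"{attrs}"
        f"{AUDIT_COLUMNS},\n"
        f"  PRIMARY KEY ({sat.hub}_hk, load_dts)\n);"
    )


def link_ddl(link: Link) -> str:
    # A link carries its own key plus one key column per connected hub.
    keys = "".join(f"  {h}_hk CHAR(32) NOT NULL,\n" for h in link.hubs)
    return (
        f"CREATE TABLE link_{link.name} (\n"
        f"  {link.name}_hk CHAR(32) PRIMARY KEY,\n"
        f"{keys}"
        f"{AUDIT_COLUMNS}\n);"
    )


# Hypothetical model: adding an object means adding metadata, not code.
model = [
    Hub("customer", "customer_no"),
    Hub("account", "account_no"),
    Satellite("customer_details", "customer", ["name", "city"]),
    Link("customer_account", ["customer", "account"]),
]
generators = {Hub: hub_ddl, Satellite: satellite_ddl, Link: link_ddl}
statements = [generators[type(obj)](obj) for obj in model]
```

Three templates cover the whole model, which is why the larger object count of a Data Vault does not have to translate into proportionally more hand-written code.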
  4. Rule 2 – Do not create a DV model that holds the single version of the truth in the first layer of your DWH
      • Why?
          • The single version of the truth:
              • Is defined by the business and changes faster over time than the source systems
          • The single version of the truth is a myth.
          • As business definitions change over time, you will also get multiple versions of the truth over time.
  5. Rule 3 – Do not limit what you record in the DV based on user requirements
      • Why?
          • End users cannot predict everything they need, and they want to be able to change their mind about what is needed.
  6. Historical – Milestone projects
      • Health Sector 1 (2009): Data Vault
      • Health Sector 2 (2010): Data Vault, DWH Automation 1.0
  7. Healthcare Sector Project 2
      • Create a foundation layer that holds:
          • The single version of the facts = stores data in source system format
          • Atomic-level data
          • All data from the source except for interface or other technical tables
          • All history of change
          • Data integrated across sources
      • Use Data Vault modeling, which is flexible and resilient to change.
      • Use ETL generation = OWB OMB code generation
          • Important: reuse of the investment in the existing ETL tool matters, so the automation tool should generate mappings, not replace the existing tool.
      • Create a presentation layer
          • Generate the incremental logic from foundation to presentation layer.
          • Manual development.
          • That structures the data in a way end users understand.
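Keeping "all history of change" in the foundation layer is commonly done with insert-only satellites: a new row is appended only when the attributes differ from the latest stored version. The sketch below shows that idea under stated assumptions. The hash-based change detection, the function names and the row layout are illustrative choices, not something the slides prescribe.

```python
import hashlib


def hash_diff(attributes: dict) -> str:
    # Hash the attributes in a stable key order so equal content gives an equal hash.
    payload = "|".join(f"{k}={attributes[k]}" for k in sorted(attributes))
    return hashlib.md5(payload.encode("utf-8")).hexdigest()


def load_satellite(satellite: list, key: str, attributes: dict, load_dts: str) -> bool:
    """Append a new version for `key` only if its attributes changed.

    `satellite` is a plain list of dicts standing in for a satellite table
    (a hypothetical layout). Returns True when a row was added.
    """
    new_hash = hash_diff(attributes)
    latest = None
    for row in satellite:
        if row["key"] == key:
            latest = row  # rows are appended in load order, so the last match is the latest
    if latest is not None and latest["hash_diff"] == new_hash:
        return False  # nothing changed, no new version
    satellite.append(
        {"key": key, "load_dts": load_dts, "hash_diff": new_hash, "attributes": dict(attributes)}
    )
    return True


history = []
load_satellite(history, "C-001", {"city": "Ghent"}, "2010-01-01")
load_satellite(history, "C-001", {"city": "Ghent"}, "2010-01-02")    # unchanged, skipped
load_satellite(history, "C-001", {"city": "Antwerp"}, "2010-01-03")  # changed, new version
```

Existing rows are never updated or deleted, so every earlier state of the source record stays queryable.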
  8. Data Flow – HC Sector Project 2
  9. Rule 4 – Do not automate incremental logic towards the PL
      • Why?
          • Generic incremental logic cannot take into account that there are driving tables. All tables therefore become driving tables in the load logic, which has a huge impact on performance.
      • Exception:
          • Use engineered systems to run this logic.
          • Use engineered systems and in-memory technology to virtualize the presentation layer.
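The cost of treating every table as a driving table can be shown with a small sketch. It assumes a simplified model in which each source row carries the presentation-layer key it feeds and a change timestamp. The table names, the data and the `keys_to_refresh` helper are hypothetical.

```python
def keys_to_refresh(tables, last_load, driving=None):
    """Return (keys that need a refresh, number of rows scanned).

    tables:  dict mapping table name -> list of (key, changed_at) rows
    driving: table names to check for changes; None means generic logic,
             which has to check every table.
    """
    names = driving if driving is not None else list(tables)
    keys = set()
    scanned = 0
    for name in names:
        for key, changed_at in tables[name]:
            scanned += 1
            if changed_at > last_load:
                keys.add(key)
    return keys, scanned


# Hypothetical source tables feeding one presentation-layer object.
tables = {
    "orders": [(1, 5), (2, 12)],
    "customers": [(1, 3), (2, 4)],
    "products": [(1, 2), (2, 6)],
}

generic_keys, generic_scanned = keys_to_refresh(tables, last_load=10)
manual_keys, manual_scanned = keys_to_refresh(tables, last_load=10, driving=["orders"])
```

In this example both calls find the same key, but the generic variant scans all three tables while the hand-written variant, which encodes the developer's knowledge that only `orders` drives the delta, scans one.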
  10. Rule 5 – Do not implement all business logic from foundation layer to presentation layer
      • Atomic-level objects that do not exist in the source should not be placed in the presentation layer.
      • Why?
          • They are typically used in business logic to build multiple presentation layer objects.
          • It is best to persist them before the PL; otherwise you have to implement the logic to load them multiple times.
  11. Historical – Milestone projects
      • Health Sector 1 (2009): Data Vault
      • Health Sector 2 (2010): Data Vault, DWH Automation 1.0
  12. Bank project
      • Introduced new DV features in the DWH automation tool:
          • Transactional links
          • Same-as logic
          • Splitting satellites over multiple satellites
      • More customization, so more customer standards could be supported
  13. Historical – Milestone projects
      • Health Sector 1 (2009): Data Vault
      • Health Sector 2 (2010): Data Vault, DWH Automation 1.0
  14. The Agile Information Factory: Architecture & Approach
      • Innovation
          • Supports all new concepts in information management
      • Delivers value
          • Agility
          • Cost reduction
      • Best practices
          • Reuse of approach & solutions
      • Oracle platform
          • Uses the integrated software/hardware stack of Oracle
  15. DWH-Automation Solution 3.0
      • Tripwire DWH Foundation Accelerator Analysis: source analysis automation
      • Tripwire DWH Foundation Accelerator Development: ETL code generation
      • Tripwire DWH Foundation Accelerator Testing: automated data validation
      • Redbridge Lifecycle Management: automated release management
      • Oracle Enterprise Metadata Management: impact analysis, enterprise metadata management
  16. Raw and Business DV
  17. Foundation Area: The Internal Layers
      • In the foundation layer there are actually 2 persistent layers (typically stored in 1 schema):
          • RDV: Raw Data Vault
              • Raw integration: none to simple business key integration
              • The data does not represent common business rules
          • BDV: Business Data Vault
              • Business rules are applied
              • Business key integration takes place
              • New business concepts are introduced
              • Data virtualization of existing business concepts in the Raw Vault: do not persist objects that already exist in the Raw Data Vault
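The advice to virtualize existing business concepts rather than persist them again can be sketched with a view over a raw table. The example uses SQLite through Python's standard `sqlite3` module purely for illustration. The `rdv_` and `bdv_` prefixes, the table layout and the sample rows are assumptions, not the naming used in the projects described.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Raw Data Vault: the persisted hub, loaded from the sources.
    CREATE TABLE rdv_hub_customer (
        customer_hk   TEXT PRIMARY KEY,
        customer_no   TEXT NOT NULL,
        record_source TEXT NOT NULL
    );
    INSERT INTO rdv_hub_customer VALUES
        ('a1', 'C-001', 'CRM'),
        ('b2', 'C-002', 'BILLING');

    -- Business Data Vault: the same concept exposed as a view, nothing is copied.
    CREATE VIEW bdv_hub_customer AS
        SELECT customer_hk, customer_no FROM rdv_hub_customer;
    """
)

query = "SELECT customer_no FROM bdv_hub_customer ORDER BY customer_no"
before = conn.execute(query).fetchall()

# A new raw row shows up in the business view without any extra load step.
conn.execute("INSERT INTO rdv_hub_customer VALUES ('c3', 'C-003', 'CRM')")
after = conn.execute(query).fetchall()
```

Only concepts that are genuinely new in the Business Data Vault need their own persisted tables and load logic. The rest stay in sync with the raw layer by construction.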
  18. Multiple speed implementation
      • The Raw and the Business Data Vault areas can be built at different speeds because:
          • The Raw or source-based Data Vault is:
              • A technical implementation based on source systems that only requires a source analysis = single version of the facts
              • Data Warehouse automation can be used, as the target structure is a direct representation of the source.
          • The Business Data Vault is:
              • A business-based implementation that requires functional and technical analysis to understand business requirements = single or multiple versions of the truth
              • New business concepts can be created (new hubs), but implementation experience shows that link tables between existing source business concepts (source-based hubs) typically cover 90% of the requirements.
      • The multiple speed approach supports better functional and technical analysis when the Raw Data Vault data is already available.
  19. Rule 6 – Put the right business logic in the right layer.
      • If you do not standardize, then you will have to document everything.
      • Supports the multiple speed approach
      • Increases the ability to change without high impact.
      • Parameters to define where to place which business logic:
          • Stability: is this logic likely to change a lot over time?
          • Scope: enterprise-wide, departmental, user-specific
          • Type: conditional, calculation, aggregation, data quality check, …
          • Result: factual, master data
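One way to standardize placement is to record the four parameters for every rule and derive the target layer from them. The sketch below does that under a loud assumption: the slide lists the parameters but not the placement policy, so the `suggest_layer` function is a hypothetical example policy, and the enum values and rule names are invented for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class Stability(Enum):
    STABLE = "stable"
    VOLATILE = "volatile"


class Scope(Enum):
    ENTERPRISE = "enterprise-wide"
    DEPARTMENTAL = "departmental"
    USER = "user-specific"


class RuleType(Enum):
    CONDITIONAL = "conditional"
    CALCULATION = "calculation"
    AGGREGATION = "aggregation"
    DATA_QUALITY_CHECK = "data quality check"


class Result(Enum):
    FACTUAL = "factual"
    MASTER_DATA = "master data"


@dataclass(frozen=True)
class BusinessRule:
    name: str
    stability: Stability
    scope: Scope
    rule_type: RuleType
    result: Result


def suggest_layer(rule: BusinessRule) -> str:
    # Hypothetical policy, NOT from the slides: stable, enterprise-wide logic
    # goes to the Business Data Vault, everything else to the presentation layer.
    if rule.stability is Stability.STABLE and rule.scope is Scope.ENTERPRISE:
        return "Business Data Vault"
    return "Presentation Layer"


# Invented example rules.
rules = [
    BusinessRule("net_amount", Stability.STABLE, Scope.ENTERPRISE,
                 RuleType.CALCULATION, Result.FACTUAL),
    BusinessRule("campaign_segment", Stability.VOLATILE, Scope.DEPARTMENTAL,
                 RuleType.CONDITIONAL, Result.MASTER_DATA),
]
placement = {rule.name: suggest_layer(rule) for rule in rules}
```

Whatever policy a team adopts, encoding it once means placement decisions follow from the recorded parameters and do not need to be documented rule by rule.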
  20. For questions: Piet De Windt, +32 473 99 99 89. Everything you need to build something exceptional