Roland bouman modern_data_warehouse_architectures_data_vault_and_anchor_modelling

My Percona Live London 2011 presentation

Published in: Technology, Business
Transcript of "Roland bouman modern_data_warehouse_architectures_data_vault_and_anchor_modelling"

  1. Trends in Data Warehouse Data Modeling: Data Vault and Anchor Modeling
  2. Thanks for Attending!
      ● Roland Bouman, Leiden, the Netherlands
      ● MySQL AB, Sun, Strukton, Pentaho (1 nov)
      ● Web- and Business Intelligence Developer
      ● Author:
          – Pentaho Solutions
          – Pentaho Kettle Solutions
      ● http://rpbouman.blogspot.com/
      ● Twitter: @rolandbouman
  3. Data Warehouse (DWH)
      ● Support Business Intelligence (BI)
          – Reporting
          – Analysis
          – Data mining
      ● General Requirements
          – Integrate disparate data sources
          – Maintain history
          – Calculate derived data
          – Data delivery to BI applications
  4. DWH Architectures
      ● Categories
          – Traditional
          – Hybrid
          – Modern
      ● Aspects
          – Modelling
          – Data logistics
  5. DWH Architectures
      ● Traditional
          – Information Factory (Bill Inmon)
          – Enterprise Bus (Ralph Kimball)
      ● Hybrid
      ● Modern
  6. DWH Architectures
      ● Traditional
      ● Hybrid
          – Hub-and-Spoke
      ● Modern
  7. DWH Architectures
      ● Traditional
      ● Hybrid
      ● Modern
          – Data Vault (Dan Linstedt)
          – Anchor Modeling (Lars Rönnbäck)
  8. Inmon DWH (Traditional): Corporate Information Factory
      “A source of data that is subject oriented, integrated, nonvolatile and time variant for the purpose of management's decision processes.”
      Bill Inmon (the Data Warehouse Toolkit)
      ● http://www.inmoncif.com/home/
  9. Inmon DWH (Traditional): Corporate Information Factory
      ● Enterprise or Corporate DWH, DWH 2.0
      ● Focus on backroom data integration
          – Central information model
          – Single version of the truth
      ● Data delivery
          – Disposable data marts
      ● Bottom-up
  10. Data logistics of the Corporate Information Factory
      [Diagram; labels: Source, OLTP DB, Files, Extract/Transform/Load (twice), Staging, Enterprise Data Warehouse, Data Marts, OLAP DB, Cube, BI Apps]
  11. Data Modeling for the CIF Enterprise DWH
      ● Normalized, typically 3NF
      ● Organized in “subject areas”
          – Series of related tables
          – Example: Customer, Product, Transaction
          – Common key
  12. Data Modeling for the CIF Enterprise DWH
      ● History
          – PK includes a date/time part
      ● Contains both detail and aggregate data
          – Multiple levels of aggregation
  13. Kimball DWH (Traditional): Dimensional Model and DWH Bus Architecture
      “The data warehouse is the conglomeration of an organization's staging and presentation areas, where operational data is specifically structured for query and analysis performance and ease of use.”
      Ralph Kimball (the Data Warehouse Toolkit)
      ● http://www.kimballgroup.com/
  14. Kimball DWH (Traditional): DWH Bus Architecture
      ● Focus on data delivery
      ● Integration at the data mart level
      ● Top-down
  15. Data logistics of the DWH Bus Architecture
      [Diagram; labels: Source, OLTP DB, Files, Extract/Transform/Load, Staging, (Enterprise) Data Warehouse, “EDW is a collection”, Data Marts, OLAP DB, Cube, BI Apps]
  16. Data Modeling for the DWH Bus Architecture
      ● Dimensional Modeling
          – Star schemas
      ● Organized in:
          – Fact tables
          – Dimension tables
  17. Data Modeling for the DWH Bus Architecture
      ● Fact tables
          – Highly normalized
          – Additive metrics
      ● Dimension tables
          – Highly denormalized
          – Descriptive labels
          – Shared across fact tables
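The fact/dimension split described on the two slides above can be sketched with Python's built-in sqlite3 module. This is a minimal illustration only: the table names follow the Sakila naming used elsewhere in the deck, but every column name and data row is a hypothetical assumption, not taken from the presentation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables: denormalized, descriptive labels (hypothetical columns).
CREATE TABLE dim_date (
    date_key   INTEGER PRIMARY KEY,
    year       INTEGER NOT NULL,
    month_name TEXT    NOT NULL
);
CREATE TABLE dim_film (
    film_key INTEGER PRIMARY KEY,
    title    TEXT NOT NULL,
    category TEXT NOT NULL   -- category label folded into the dimension
);
-- Fact table: dimension keys plus additive metrics only.
CREATE TABLE fact_rental (
    date_key     INTEGER NOT NULL REFERENCES dim_date(date_key),
    film_key     INTEGER NOT NULL REFERENCES dim_film(film_key),
    rental_count INTEGER NOT NULL,
    amount_cents INTEGER NOT NULL
);
""")

# Made-up sample rows, for illustration only.
conn.executemany("INSERT INTO dim_date VALUES (?, ?, ?)",
                 [(20110101, 2011, "January"), (20110201, 2011, "February")])
conn.executemany("INSERT INTO dim_film VALUES (?, ?, ?)",
                 [(1, "Film A", "Action"), (2, "Film B", "Comedy")])
conn.executemany("INSERT INTO fact_rental VALUES (?, ?, ?, ?)",
                 [(20110101, 1, 1, 300), (20110101, 2, 1, 200), (20110201, 1, 1, 300)])


def rentals_by_category(conn):
    """Sum the additive metrics, grouped by a descriptive dimension label."""
    return conn.execute("""
        SELECT f.category, SUM(r.rental_count), SUM(r.amount_cents)
        FROM fact_rental AS r
        JOIN dim_film AS f ON f.film_key = r.film_key
        GROUP BY f.category
        ORDER BY f.category
    """).fetchall()
```

Because the metrics are additive, the same fact rows can be rolled up along any dimension (here the film category) with a plain `SUM` and `GROUP BY`.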
  18. Data Modeling for the DWH Bus Architecture
      ● History
          – Slowly changing dimensions (versioning)
          – Fact links to Date and/or Time dimensions
      ● Detailed, not aggregated
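Versioning a slowly changing dimension is commonly done by closing the current row and inserting a new one (often called "type 2"). The sketch below assumes that approach; the `dim_customer` columns, the `9999-12-31` open-end marker, and the helper names are all hypothetical choices for illustration, not something the slides prescribe.

```python
import sqlite3

OPEN_END = "9999-12-31"  # assumed marker for "current version"

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_customer (
    customer_key INTEGER PRIMARY KEY AUTOINCREMENT,  -- surrogate key, one per version
    customer_id  INTEGER NOT NULL,                   -- business key from the source
    city         TEXT    NOT NULL,
    valid_from   TEXT    NOT NULL,                   -- ISO dates compare correctly as text
    valid_to     TEXT    NOT NULL
);
""")


def apply_change(conn, customer_id, city, change_date):
    """Close the current version (if the city changed) and insert a new one."""
    current = conn.execute(
        "SELECT customer_key, city FROM dim_customer"
        " WHERE customer_id = ? AND valid_to = ?",
        (customer_id, OPEN_END),
    ).fetchone()
    if current is not None:
        if current[1] == city:
            return  # nothing changed, keep the current version
        conn.execute(
            "UPDATE dim_customer SET valid_to = ? WHERE customer_key = ?",
            (change_date, current[0]),
        )
    conn.execute(
        "INSERT INTO dim_customer (customer_id, city, valid_from, valid_to)"
        " VALUES (?, ?, ?, ?)",
        (customer_id, city, change_date, OPEN_END),
    )


def city_on(conn, customer_id, date):
    """Return the city that was valid for the customer on the given date."""
    row = conn.execute(
        "SELECT city FROM dim_customer"
        " WHERE customer_id = ? AND valid_from <= ? AND ? < valid_to",
        (customer_id, date, date),
    ).fetchone()
    return row[0] if row else None


apply_change(conn, 42, "Leiden", "2010-01-01")
apply_change(conn, 42, "London", "2011-10-24")
```

A fact row loaded for a given date would reference the `customer_key` of the version valid on that date, so old facts keep pointing at the old city.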
  19. Sakila Rental Star Schema
  20. Sakila DWH Bus Architecture
      [Diagram; tables: dim_film, dim_date, fact_inventory, fact_rental, fact_payment, dim_store, dim_staff, dim_customer]
  21. Problems with traditional DWH architectures
      ● General Problems
          – Lack of flexibility and resilience to change
          – Loading (ETL) complexity
      ● Problems with Inmon
          – Centralization requires upfront investment
          – Single version of whose truth, when?
      ● Problems with Kimball
          – Dimensional Model anomalies
  22. Dimensional Modeling Anomalies
      ● Snowflaking (dimension normalization)
          – Monster dimensions
          – Outriggers
          – Ex: Customer Demographics
      ● Hierarchical data
          – Bridge table (closure table)
          – Ex: Employee/Boss
      ● Multi-valued dimensions
          – Bridge table
          – Ex: Account/Customer bridge table
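The Employee/Boss case above shows why hierarchies need an extra bridge (closure) table: it stores one row per ancestor/descendant pair so that "everyone under this boss" becomes a plain join. The sketch below is one possible construction using a recursive query in SQLite; the table names, column names, and sample employees are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employee (
    employee_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL,
    boss_id     INTEGER REFERENCES employee(employee_id)  -- NULL for the top boss
);
-- Bridge (closure) table: one row per boss/subordinate pair at any depth.
CREATE TABLE employee_bridge (
    boss_id        INTEGER NOT NULL REFERENCES employee(employee_id),
    subordinate_id INTEGER NOT NULL REFERENCES employee(employee_id),
    depth          INTEGER NOT NULL,  -- 0 = the employee itself
    PRIMARY KEY (boss_id, subordinate_id)
);
""")
conn.executemany("INSERT INTO employee VALUES (?, ?, ?)",
                 [(1, "Ann", None), (2, "Bob", 1), (3, "Cas", 2), (4, "Dee", 1)])

# Derive the closure from the parent pointers with a recursive CTE.
conn.execute("""
    WITH RECURSIVE closure(boss_id, subordinate_id, depth) AS (
        SELECT employee_id, employee_id, 0 FROM employee
        UNION ALL
        SELECT c.boss_id, e.employee_id, c.depth + 1
        FROM closure AS c
        JOIN employee AS e ON e.boss_id = c.subordinate_id
    )
    INSERT INTO employee_bridge (boss_id, subordinate_id, depth)
    SELECT boss_id, subordinate_id, depth FROM closure
""")


def subordinates(conn, boss_id):
    """Names of all direct and indirect subordinates of the given boss."""
    rows = conn.execute("""
        SELECT e.name
        FROM employee_bridge AS b
        JOIN employee AS e ON e.employee_id = b.subordinate_id
        WHERE b.boss_id = ? AND b.depth > 0
        ORDER BY e.name
    """, (boss_id,)).fetchall()
    return [name for (name,) in rows]
```

The cost of this convenience is that the bridge has to be rebuilt or maintained whenever the hierarchy changes, which is part of the loading complexity the previous slide mentions.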
  23. Hybrid DWH: Hub-and-Spoke
      ● Inmon back-end (hub)
      ● Kimball front-end (satellites)
  24. Modern: Data Vault
      “The Data Vault is a detail oriented, historical tracking and uniquely linked set of normalized tables that supports one or more functional areas of business. It is a hybrid approach encompassing the best of breed between 3rd normal form (3NF) and star schema.”
      Dan Linstedt (Data Vault Overview)
      ● http://danlinstedt.com/
  25. Data Vault
      ● Focus on
          – Data Integration
          – Traceability and Auditability
          – Resilience to change
      ● Single version of the facts
          – Rather than single version of the truth
      ● All of the data, all of the time
          – No upfront cleansing and conforming
      ● Bottom-up
  26. Data Vault Modelling
      ● Hubs
      ● Links
      ● Satellites
  27. Data Vault Modelling: Hubs
      ● Hubs model entities
      ● Contains business keys
          – PK in absence of surrogate key
      ● Metadata:
          – Record source
          – Load date/time
      ● Optional surrogate key
          – Used as PK if present
      ● No foreign keys!
  28. Data Vault Modelling: Links
      ● Links model relationships
          – Intersection table (m:n relationship)
      ● Foreign keys to related hubs or links
          – Form natural key (business key) of the link
      ● Metadata:
          – Record source
          – Load date/time
      ● Optional surrogate key
  29. Data Vault Modelling: Satellites
      ● Satellites model a group of attributes
      ● Foreign key to a Hub or Link
      ● Metadata:
          – Record source
          – Load date/time
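The three constructs above (hubs, links, satellites) can be sketched as tables with Python's built-in sqlite3 module. All table names, column names, and loader functions here are hypothetical. Putting the load date/time into the satellite's primary key, so that every change becomes a new row, is a common Data Vault convention that is assumed here rather than stated on the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hubs: business key, metadata, optional surrogate key as PK, no foreign keys.
CREATE TABLE hub_customer (
    customer_hkey INTEGER PRIMARY KEY,
    customer_bk   TEXT NOT NULL UNIQUE,
    record_source TEXT NOT NULL,
    load_dts      TEXT NOT NULL
);
CREATE TABLE hub_store (
    store_hkey    INTEGER PRIMARY KEY,
    store_bk      TEXT NOT NULL UNIQUE,
    record_source TEXT NOT NULL,
    load_dts      TEXT NOT NULL
);
-- Link: the foreign keys to the hubs together form its natural key.
CREATE TABLE link_customer_store (
    link_key      INTEGER PRIMARY KEY,
    customer_hkey INTEGER NOT NULL REFERENCES hub_customer(customer_hkey),
    store_hkey    INTEGER NOT NULL REFERENCES hub_store(store_hkey),
    record_source TEXT NOT NULL,
    load_dts      TEXT NOT NULL,
    UNIQUE (customer_hkey, store_hkey)
);
-- Satellite: a group of attributes hanging off one hub (assumed PK, see above).
CREATE TABLE sat_customer (
    customer_hkey INTEGER NOT NULL REFERENCES hub_customer(customer_hkey),
    load_dts      TEXT NOT NULL,
    record_source TEXT NOT NULL,
    city          TEXT,
    PRIMARY KEY (customer_hkey, load_dts)
);
""")


def load_hub(conn, hub, business_key, record_source, load_dts):
    """Insert the business key if it is new; return the hub's surrogate key."""
    assert hub in ("customer", "store")  # guards the f-string table names
    conn.execute(
        f"INSERT OR IGNORE INTO hub_{hub} ({hub}_bk, record_source, load_dts)"
        " VALUES (?, ?, ?)",
        (business_key, record_source, load_dts),
    )
    return conn.execute(
        f"SELECT {hub}_hkey FROM hub_{hub} WHERE {hub}_bk = ?", (business_key,)
    ).fetchone()[0]


def load_link(conn, customer_hkey, store_hkey, record_source, load_dts):
    """Record the relationship once; repeated loads are ignored."""
    conn.execute(
        "INSERT OR IGNORE INTO link_customer_store"
        " (customer_hkey, store_hkey, record_source, load_dts) VALUES (?, ?, ?, ?)",
        (customer_hkey, store_hkey, record_source, load_dts),
    )


def load_sat_customer(conn, customer_hkey, city, record_source, load_dts):
    """Append a satellite row only when the attribute value has changed."""
    latest = conn.execute(
        "SELECT city FROM sat_customer WHERE customer_hkey = ?"
        " ORDER BY load_dts DESC LIMIT 1",
        (customer_hkey,),
    ).fetchone()
    if latest is None or latest[0] != city:
        conn.execute(
            "INSERT INTO sat_customer VALUES (?, ?, ?, ?)",
            (customer_hkey, load_dts, record_source, city),
        )


def current_city(conn, business_key):
    """Latest satellite value for a customer, looked up by business key."""
    row = conn.execute(
        "SELECT s.city FROM hub_customer AS h"
        " JOIN sat_customer AS s ON s.customer_hkey = h.customer_hkey"
        " WHERE h.customer_bk = ? ORDER BY s.load_dts DESC LIMIT 1",
        (business_key,),
    ).fetchone()
    return row[0] if row else None


# Three loads of the same customer: only the city change adds a satellite row.
for city, dts in [("Leiden", "2011-10-01"), ("Leiden", "2011-10-02"), ("London", "2011-10-24")]:
    c = load_hub(conn, "customer", "C-1", "sakila", dts)
    s = load_hub(conn, "store", "S-1", "sakila", dts)
    load_link(conn, c, s, "sakila", dts)
    load_sat_customer(conn, c, city, "sakila", dts)
```

All loaders are insert-only, which is what keeps every loaded fact traceable to a record source and load time.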
  30. Sakila Data Vault Example
  31. Data Vault tools and Example
      ● Kettle Data Vault Example
          – Sakila Data Vault
          – Chapter 19
          – Kasper de Graaf
          – http://www.dikw-academy.nl
      ● Quipu
          – Data Vault Generator
          – Kettle templates
          – Johannes van den Bosch
          – http://www.datawarehousemanagement.org/
  32. Modern: Anchor model
      “Anchor Modeling is an agile information modeling technique that offers non-destructive extensibility mechanisms enabling robust and flexible management of changes. A key benefit of Anchor Modeling is that changes in a data warehouse environment only require extensions, not modifications.”
      Lars Rönnbäck (Agile Information Modeling in Evolving Data Environments)
      ● http://www.anchormodeling.com/
  33. Anchor Modelling
      ● Focus on
          – Resilience to change
          – Agility
          – Extensibility
          – History tracking
      ● Bottom-up
  34. Anchor Modelling
      ● 6NF (Date, Darwen, Lorentzos)
      ● Table features no non-trivial join dependencies at all
      ● Translation: A 6NF table cannot be losslessly decomposed any further
      ● Translation
      ● Temporal Data
  35. Anchor Modelling Constructs
      ● Anchors
      ● Attributes
      ● Ties
      ● Knots
  36. Anchor Modelling: Anchors
      ● Entities are modeled as Anchors
      ● Relationships may be modeled as Anchors
          – m:n relationships having properties
      ● Only a surrogate key
  37. Anchor Modelling: Ties
      ● Ties model relationships
          – 1:n relationships
          – m:n relationships without properties
      ● Static vs Historized
          – History tracked using date/time
      ● May be Knotted
          – Knot holds set of association types
      ● Two or more “anchor roles”
          – Relationships may be broken into several ties having only mandatory anchors
  38. Anchor Modelling: Attributes
      ● Models properties of an Anchor
      ● Static vs Historized
          – History tracked using date/time
      ● May or may not be Knotted
          – Knot holds set of valid attribute values
  39. Anchor Modelling: Knots
      ● Reference table
          – Fairly small set of distinct values
      ● Dictionary lookup to qualify
          – Attributes
          – Ties
      ● “Knotted” Attributes and Ties
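The four constructs above (anchors, attributes, ties, knots) map to one narrow table each. The following sketch uses Python's built-in sqlite3 module. The actor/film example, all table and column names, and the helper functions are hypothetical, and the plain names used here are not the naming convention of the Anchor Modeling tool.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Anchors: nothing but a surrogate key.
CREATE TABLE actor (actor_id INTEGER PRIMARY KEY);
CREATE TABLE film  (film_id  INTEGER PRIMARY KEY);

-- Knot: small reference table of distinct values.
CREATE TABLE gender_knot (
    gender_id INTEGER PRIMARY KEY,
    gender    TEXT NOT NULL UNIQUE
);

-- Static, knotted attribute: one value per anchor row, taken from the knot.
CREATE TABLE actor_gender (
    actor_id  INTEGER PRIMARY KEY REFERENCES actor(actor_id),
    gender_id INTEGER NOT NULL REFERENCES gender_knot(gender_id)
);

-- Historized attribute: history tracked with a date/time column in the key.
CREATE TABLE actor_name (
    actor_id   INTEGER NOT NULL REFERENCES actor(actor_id),
    name       TEXT    NOT NULL,
    changed_at TEXT    NOT NULL,
    PRIMARY KEY (actor_id, changed_at)
);

-- Static tie: m:n relationship without properties of its own.
CREATE TABLE actor_film_tie (
    actor_id INTEGER NOT NULL REFERENCES actor(actor_id),
    film_id  INTEGER NOT NULL REFERENCES film(film_id),
    PRIMARY KEY (actor_id, film_id)
);
""")

# Made-up sample data, for illustration only.
conn.execute("INSERT INTO actor VALUES (1)")
conn.execute("INSERT INTO film VALUES (10)")
conn.executemany("INSERT INTO gender_knot VALUES (?, ?)", [(1, "female"), (2, "male")])
conn.execute("INSERT INTO actor_gender VALUES (1, 1)")
conn.executemany("INSERT INTO actor_name VALUES (?, ?, ?)",
                 [(1, "Jane Doe", "2005-01-01"), (1, "Jane Roe", "2011-10-24")])
conn.execute("INSERT INTO actor_film_tie VALUES (1, 10)")


def name_as_of(conn, actor_id, date):
    """Return the name that was in effect for the actor on the given date."""
    row = conn.execute(
        "SELECT name FROM actor_name WHERE actor_id = ? AND changed_at <= ?"
        " ORDER BY changed_at DESC LIMIT 1",
        (actor_id, date),
    ).fetchone()
    return row[0] if row else None


def gender_of(conn, actor_id):
    """Resolve the knotted attribute through the knot's dictionary lookup."""
    row = conn.execute(
        "SELECT k.gender FROM actor_gender AS a"
        " JOIN gender_knot AS k ON k.gender_id = a.gender_id"
        " WHERE a.actor_id = ?",
        (actor_id,),
    ).fetchone()
    return row[0] if row else None
```

Adding a new property later means adding a new attribute table next to the existing ones, with no change to the tables already there. This is the "extensions, not modifications" point from the Anchor Modeling quote earlier in the deck.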
  40. Anchor Model Diagram
      [Diagram legend: anchor, knot, static attribute, historized attribute, static tie, historized tie]
      http://www.anchormodeling.com/modeler/latest/
  41. Acknowledgements
      ● Kasper de Graaf
          – Twitter: @kdgraaf
          – http://www.dikw-academy.nl
      ● Jos van Dongen
          – Twitter: @josvandongen
          – http://www.tholis.com/