Lean Data Modelstorming
For The Agile Enterprise
Daniel Upton
www.DecisionLab.Net
dupton@DecisionLab.Net
linkedin.com/in/danielupton
The Business Intelligence Promise: Smarter, more fact-based
decision-making as an everyday routine.
Traditional Architectural Trade-Off:
Do you want it quickly, fully featured, or
with high quality? Pick two.
Daniel Upton
DecisionLab dupton@decisionlab.Net
Q: Why does a Data Warehouse Take So Long?
A: Many functional interdependencies result in
mostly sequential tasking
Especially these tasks: Nothing delivered
for multiple sprints. Non-agile.
Sequential Development
Gantt View: End to End ModelStormed
Lean Data Hubs Enable Fast Delivery
Lean Core Principles
• Focus on customer
• Eliminate waste
• Deliver as fast as possible
• Decide as late as possible
• Optimize the whole
– Sub-optimize some parts
• Modularize, automate, and re-use
End to End ModelStorming
• Dimensional Modelstorming
• Goal: Quickly express a complete logical star schema data model proven to fully satisfy a user
data story’s narrative and its detailed acceptance criteria.
– Reference: Agile Data Warehouse Design: from Whiteboard to Star Schema (book by Lawrence Corr &
James Stagnitto)
• Lean Data Modelstorming
• Goal: Quickly express a complete lean data model, mapped upstream directly from actual source
data and proven to then supply this source data downstream directly into a data presentation layer
such as a Star Schema.
– An Original Concept by Daniel Upton (Presenter: 10/15/2016, SoTec Conference)
Dimensional ModelStorming
In stakeholder meeting…
• Question: Who does what?
– Who, what, where, when, how, why, how much?
• Answers: User Information Stories
• On a whiteboard, draw and refine…
– Event ModelStorm
– Dimension ModelStorm
– Hours Later… Event Matrix
Source: Agile Data Warehouse Design: from Whiteboard to Star Schema (book by Corr & Stagnitto)
User Information Stories
• Alumni Development specialists must know which
donors donate how much and when, in order to ensure
donors receive recognition and benefits.
• Institutional Research must know how many students
enrolled weekly, with what program and standing, to
ensure University meets qualifications for student loans.
• Auditors must know what family relationships exist
between donors and students, and student standing, to
monitor for conflicts of interest and uphold reputation.
Event ModelStorms
Dimension ModelStorm
Events Matrix
End to End ModelStorm: Lean Data Overview
Traditional Enterprise DW:
Entities = Relations with Dependencies
Star Schema: Facts w/ Business Rules Fully
Dependent on Dimensions w/ Business Rules
Perfect World Data Flow: Direct to Star Schema
Perfect World Data Flow: EDW and Data Marts
Real World Data Flow: EDW and Data Marts
Real World: Even Big Ships Are At Risk
EDWs are at Risk from Business Volatility.
EDW Development Takes a Long Time
Long ago, many decided not to wait for an EDW
Traditional Data Mart (Star Schema)
Perfect World Data Flow: Direct to Star Schema
Standardizing Directly on Star Schema Forces Tight Restrictions on Incoming Data
Real World Data Flow: Direct to Star Schema
What’s wrong here?
Direct to Star: Begins well, but harder to
sustain over time.
Real World Data Flow: Direct to Star Schema
Single Version of the Truth (SVOT)
Source data is reinterpreted and massaged
into a new data model that fixes core truths
about the data, its relationships, and the
business, so that one record, one field, one
table contains THE authoritative data.
• Is SVOT easy? …achievable? …aspirational?
• Tasks for SVOT: Lengthy requirements analysis and successful negotiation
with many stakeholders across an enterprise, then intensive data modeling,
then custom ETL coding, while hoping SVOT remains fixed.
Single Version of the Truth (SVOT)
Assumptions vs. Experience
• Perfect World Assumption:
• SVOT is universally accurate and stable. Given the amount
of work to achieve SVOT, it needs to be.
• Real World Experience:
• SVOT accuracy can vary from ‘fair’ to ‘excellent’, its scope is
often far from universal across an entire business, and
business changes occur faster than the DW can keep up.
Single Version of the Truth (SVOT)
Disclosure
• Seemingly trivial changes in the business may require non-trivial
changes to the ETL code and reports.
• Non-trivial business changes often require changes to the
underlying SVOT data model, the ETL code, and reports.
• Major business changes, or just many small ones over time, often
leave an SVOT DW in a perpetual state of disarray, with a
never-ending list of critical issues and an excessive focus
on ‘break – fix – redeploy’.
Single Version of the Truth (SVOT)
SVOT is worth pursuing, but with a different playbook.
Resolution: The DW structure must not be dependent on a static
interpretation of truth. It must not break when new rules and analyses
need to be applied to data, when inaccuracies are discovered late, or
when data sources or business processes change.
Lean Data Warehouse overcomes this big challenge directly by loosely
coupling diverse source data to insulate the DW from changes, by
easily storing all data as it changes over time, and by delaying
decisions on business rules, SVOT, data quality, reporting or analytics
until after the horses are in the barn.
Questions?
Lean Data Principles
• Eliminate waste: For in-scope source tables, instantiate and load all
records with all of their attribute and key fields.
• Deliver Fast: Generic design pattern for quickly historizing source data.
• Decide Late:
– Write code for business rules just downstream of Lean Data Hubs, in
order to avoid hard-coding business rules into the core data load.
• Focus on Customer (Pragmatic Design): Scope, design, and load tables
purely based on business needs, regardless of functional constraints in
data sources.
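As a rough illustration of the "Deliver Fast" and "Decide Late" principles above, here is a minimal Python sketch of one generic historizing pattern. The function name `historize`, the control fields `load_ts` and `record_source`, and the sample donor data are assumptions made for illustration; they are not taken from the presentation. Every source field is stored as-is, a new version is appended only when a row changes, and no business rule is applied during the load.

```python
CONTROL_FIELDS = ("load_ts", "record_source")  # assumed control fields


def historize(history, source_rows, key_field, load_ts, record_source):
    """Append a new version of each source row whose content changed.

    `history` is a list of dicts that is only ever appended to.
    All source fields are carried over unchanged (no business rules here).
    Returns the number of versions added.
    """
    # Latest stored version per business key (later rows overwrite earlier).
    latest = {row[key_field]: row for row in history}

    added = 0
    for src in source_rows:
        prev = latest.get(src[key_field])
        prev_attrs = None
        if prev is not None:
            prev_attrs = {k: v for k, v in prev.items() if k not in CONTROL_FIELDS}
        if prev_attrs != src:
            history.append({**src, "load_ts": load_ts, "record_source": record_source})
            added += 1
    return added


# Hypothetical sample loads of a donor table on three different days.
donor_history = []
first = historize(donor_history, [{"donor_id": "D1", "city": "Ashland"}],
                  "donor_id", "2016-10-01", "crm")
second = historize(donor_history, [{"donor_id": "D1", "city": "Ashland"}],
                   "donor_id", "2016-10-02", "crm")
third = historize(donor_history, [{"donor_id": "D1", "city": "Medford"}],
                  "donor_id", "2016-10-03", "crm")
```

Because the loader knows nothing about the meaning of the fields, the same function could in principle be reused for any in-scope source table; interpretation is left to code that reads `donor_history` downstream.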
Lean Data Principles
Here’s How: Optimize the Whole: The Lean Data Model must have…
• High Cohesion: Hubs have no functional dependencies on other Hubs, and thus
can be scoped, designed, and loaded simultaneously, or months or years later.
• Loose Coupling: Hubs link to other Hubs by association, never by
functional dependency (foreign key in a dependent table).
• Accept some Suboptimized Components to Achieve it: Models are
larger, and associative links require one or two additional table joins when querying across
Hubs.
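The trade-off described above can be sketched with an in-memory SQLite database. The table names (`hub_donor`, `hub_student`, `link_donor_student`), the columns, and the sample rows are hypothetical, loosely based on the donor and student user stories; a real Data Vault-style model would typically carry more control fields. Neither Hub holds the other's key, so either can be created and loaded first, and the price is the extra join through the Link table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE hub_donor   (donor_id   TEXT PRIMARY KEY, load_ts TEXT, record_source TEXT);
CREATE TABLE hub_student (student_id TEXT PRIMARY KEY, load_ts TEXT, record_source TEXT);

-- The association lives in its own table; neither Hub references the other.
CREATE TABLE link_donor_student (
    donor_id TEXT, student_id TEXT, load_ts TEXT, record_source TEXT,
    PRIMARY KEY (donor_id, student_id)
);

-- Load order between the two Hubs does not matter.
INSERT INTO hub_student VALUES ('S1', '2016-10-01', 'sis');
INSERT INTO hub_donor   VALUES ('D1', '2016-10-01', 'crm'), ('D2', '2016-10-01', 'crm');
INSERT INTO link_donor_student VALUES ('D1', 'S1', '2016-10-02', 'crm');
""")


def related_pairs(conn):
    """Query across Hubs: two joins via the Link instead of one direct FK join."""
    return conn.execute("""
        SELECT d.donor_id, s.student_id
        FROM hub_donor AS d
        JOIN link_donor_student AS l ON l.donor_id = d.donor_id
        JOIN hub_student AS s ON s.student_id = l.student_id
        ORDER BY d.donor_id, s.student_id
    """).fetchall()


pairs = related_pairs(conn)
```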
Lean Data Hubs
The Lean Data Hub is a critical architectural component in End-to-End
ModelStorming. It is fundamentally based on Data Vault architecture, with the
following specific Data Vault references:
– Super Charge Your Data Warehouse, by Dan Linstedt, co-edited by
Kent Graziano (2008-2011) http://LearnDataVault.Com
– Modeling the Agile Data Warehouse with Data Vault, by Hans Hultgren
(2012) New Hamilton Press
– Agile Data Warehousing for the Enterprise, by Ralph Hughes (2016)
Elsevier / Morgan Kaufmann
Lean Data Hubs
Definition:
– “Pattern-based, history-tracking, modular data assets sourced
from highly disparate data, loosely coupled by common business
keys to join core business concepts (ensembles), and leaving
source data otherwise unchanged. It remains flexible and
cohesive, easily configured to support urgent changes in (a)
data sources, (b) business rules, and (c) reporting or analytics
requirements, and its loosely coupled design pattern inherently
supports fast, highly parallelized loading by eliminating
dependencies among core business concepts.”
- Daniel Upton, 10/15/2016, SoTec Conference
Lean Data Hubs
Definition (continued):
– Core Business Concept (CBC): Equivalent to an Entity in a data model
in third normal form (3NF).
– Ensemble: Storage of a CBC in one Hub and all associated
Satellites.
– Modularity and Cohesion: Attained on two levels:
• Between Ensembles (Hubs): an associative (loosely-coupled) ensemble modeling
pattern eliminates all functional dependencies between ensembles (Hubs).
• Isolation of Business Rules from Core Data Layer: The virtualization of analytic or
business-rule transformations, downstream of the ensembles, and preferably as views,
prevents changes in those analytics or business rules from compromising the core
ensembles and their loading process.
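One way to picture the isolation of business rules is a view defined downstream of a Satellite, sketched here with SQLite. The table `sat_student`, the view `v_student_standing`, and the "full-time" credit threshold are hypothetical; the presentation does not define how student standing is computed. Changing the rule only replaces the view, while the stored history stays untouched.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sat_student (
    student_id TEXT, load_ts TEXT, record_source TEXT,
    program TEXT, credits INTEGER,
    PRIMARY KEY (student_id, load_ts)
);
INSERT INTO sat_student VALUES
    ('S1', '2016-09-01', 'sis', 'Biology', 6),
    ('S1', '2016-10-01', 'sis', 'Biology', 12),
    ('S2', '2016-09-01', 'sis', 'History', 9);
""")


def define_full_time_rule(conn, min_credits):
    """(Re)create the downstream view; the Satellite itself is never modified."""
    conn.executescript(f"""
    DROP VIEW IF EXISTS v_student_standing;
    CREATE VIEW v_student_standing AS
    SELECT s.student_id, s.program,
           CASE WHEN s.credits >= {int(min_credits)}
                THEN 'full-time' ELSE 'part-time' END AS standing
    FROM sat_student AS s
    -- Use only the most recent version of each student.
    WHERE s.load_ts = (SELECT MAX(x.load_ts) FROM sat_student AS x
                       WHERE x.student_id = s.student_id);
    """)


def standings(conn):
    return conn.execute(
        "SELECT student_id, program, standing FROM v_student_standing "
        "ORDER BY student_id"
    ).fetchall()


define_full_time_rule(conn, 12)  # hypothetical initial rule
```

Calling `define_full_time_rule(conn, 9)` later changes what `standings(conn)` reports without reloading or altering `sat_student`.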
A Lean Data Model Protects The Repository from
Volatile Business Rules That Cause ETL to Break
Lean Data Hubs: High Level Architecture
High Level View:
Lean Hubs Design is Modular
End to End ModelStorm: Lean Data Warehouse: Detail
High Level Summary of Lean Data Modeling Steps:
Watch for the following details in the upcoming visual diagrams
* Addition of control fields
* Modification of primary keys
* Temporary duplication of tables
* Removal of excess fields
* Establishment of Hub vs. Satellite and the Hub-Satellite link
* Creation of Links
* Creation of Hub-Link relationships
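A minimal sketch of how the first several steps listed above might be mechanized for one source table. The specific control fields (`<table>_sk`, `load_ts`, `record_source`, `load_end_ts`) and the function name `to_ensemble` are assumptions based on common Data Vault conventions; the presentation's diagrams may use different fields. The sketch widens the table with control fields, duplicates it, then trims one copy into a Hub and the other into a Satellite.

```python
def to_ensemble(table, columns, business_key):
    """Derive Hub and Satellite column lists from one source table definition."""
    surrogate = f"{table}_sk"                           # assumed new primary key
    control_top = [surrogate, "load_ts"]                # assumed control fields
    control_bottom = ["record_source", "load_end_ts"]   # assumed control fields

    # Add control fields, then temporarily duplicate the widened table.
    widened = control_top + list(columns) + control_bottom
    hub, sat = list(widened), list(widened)

    # Remove excess fields: the Hub keeps keys and control fields only,
    # the Satellite keeps the descriptive attributes and their history.
    attributes = [c for c in columns if c != business_key]
    hub = [c for c in hub if c not in attributes and c != "load_end_ts"]
    sat = [c for c in sat if c != business_key]

    return {
        f"hub_{table}": {"columns": hub, "primary_key": [surrogate]},
        # The shared surrogate key is the Hub-Satellite link.
        f"sat_{table}": {"columns": sat, "primary_key": [surrogate, "load_ts"]},
    }


# Hypothetical source table from the donor user story.
ensemble = to_ensemble("donor", ["donor_id", "name", "city"], "donor_id")
```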
Lean Data ModelStorming:
Step 1: Source Data – Fast Profile
Lean Data ModelStorming
Step 2: Prepare New Tables
a. Add two fields to top, set new PK, add two fields to bottom, then duplicate tables
Lean Data ModelStorming
Step 3: Modify for New Ensembles
Lean Data ModelStorming
Step 4: Working Data Model
Credits:
The Lean Data Hubs model is based on the Data Vault (a.k.a. Hyper Normalized)
design pattern, with credits to these authors…
Super Charge Your Data Warehouse, Dan Linstedt, 2008,
LearnDataVault.com (and other books by Mr. Linstedt)
Modeling the Agile Data Warehouse with Data Vault, Hans Hultgren, 2012,
New Hamilton
Agile Data Warehousing for the Enterprise, Ralph Hughes, 2016,
Elsevier Inc. (Mr. Hughes originated the term “Hyper Normalized”.)
In the 11th Hour,
Two User Stories Change
• Institutional Research must know how many students
enrolled weekly, with what program and standing and
relationship to a Counselor, to ensure University meets
qualifications for student loans.
• Auditors must know what family relationships exist
between donors and students and Counselors, and
student standing, to monitor for conflicts of interest and
uphold reputation.
Lean Data ModelStorming
Revisit Step 1: Source Data – Fast Profile
Lean DW ModelStorming
Revisit Step 2: Prepare New Tables
11th Hour Lean ModelStorm Complete
High Cohesion and Loose Coupling
No functional dependencies between free-standing ensembles.
Delivered in ½ the normal time. Load in parallel with existing loads.
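To illustrate why free-standing ensembles can load in parallel, here is a small Python sketch using a thread pool. The loader `load_hub` is a stand-in that only de-duplicates business keys, and the Hub names and keys (including the late-added Counselor Hub) are hypothetical. Since no Hub depends on another, the loads can be submitted together without any ordering logic.

```python
from concurrent.futures import ThreadPoolExecutor


def load_hub(name, source_keys):
    """Stand-in loader: de-duplicate the business keys for one Hub."""
    return name, sorted(set(source_keys))


def load_all(sources):
    """Submit every Hub load at once; no Hub waits on another."""
    with ThreadPoolExecutor() as pool:
        futures = [pool.submit(load_hub, name, keys) for name, keys in sources.items()]
        return dict(f.result() for f in futures)


# Hypothetical business keys; the Counselor Hub is simply added to the mix.
sources = {
    "hub_donor": ["D1", "D2", "D1"],
    "hub_student": ["S1"],
    "hub_counselor": ["C7", "C3"],
}
loaded = load_all(sources)
```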
Extensions and adaptations are all pattern-based.
There are no dependencies between Hubs (core business concepts), so existing
tables never need refactoring. Multiple
teams can therefore co-develop simultaneously within one Lean Data Warehouse
without interfering with each other.
“Warehouse Your Data Now.
Add Rules and Relationships As Needed”
High Level View:
Lean Hubs Design is Modular Design
End to End ModelStorm: Lean Data Hubs
Lean Data Hubs: Detailed Architecture
With Lean Data Hubs as Infrastructure, we can easily keep up with ongoing change, delivering quickly…
…and building data stability just beneath the changes.
Waiting for the EDW
Gantt: Nothing delivered
for multiple sprints. Non-agile.
Critical Path Delays
Traditional Method vs. ModelStormed Lean Data Hubs:
Assume same resources, same levels of skill and effort.
Abbreviations denote chunks of work in Lean Data Hubs.
How should we sequence these little chunks of work?
This way?
Let’s take some hints from Lean Data Principles…
* Loose coupling. Many dependencies are gone now.
* Deliver as fast as possible.
…Ideas?
This way is faster and reflects actual
dependencies. Smaller chunks of work due
to fewer data dependencies.
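Sequencing chunks by their actual dependencies can be sketched with Python's standard `graphlib` module (Python 3.9 or later). The chunk names and the dependency map below are hypothetical, since the abbreviations on the original Gantt chart are not available here. Each "wave" contains work that could proceed at the same time, for example within one sprint.

```python
from graphlib import TopologicalSorter


def plan_waves(depends_on):
    """Group chunks of work into waves; each wave depends only on earlier waves."""
    sorter = TopologicalSorter(depends_on)
    sorter.prepare()
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # everything unblocked right now
        waves.append(ready)
        sorter.done(*ready)
    return waves


# Hypothetical chunks: Hubs depend on nothing, Satellites and Links only on Hubs.
depends_on = {
    "hub_donor": set(),
    "hub_student": set(),
    "sat_donor": {"hub_donor"},
    "sat_student": {"hub_student"},
    "link_donor_student": {"hub_donor", "hub_student"},
    "view_donations": {"sat_donor", "link_donor_student"},
}
waves = plan_waves(depends_on)
```

With few dependencies the plan is short and wide: most chunks land in the first two waves, which is the property the slide attributes to loose coupling.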
Deliver Data Every Sprint:
With same resources, skill levels and effort,
Modelstormed Lean Data Hubs support more rapid data delivery
End to End ModelStorming and Lean Data Hubs
Lean Data Modelstorming
For The Agile Enterprise
Thank you!
Daniel Upton
www.DecisionLab.Net
dupton@DecisionLab.Net
linkedin.com/in/danielupton