Agile Data Architecture Overview
• Tim Guay, PMP, CSM, CSD, PMI-ACP,
CLSSS
Who is cPrime?
Engaged for Your Project Management Success
After the webinar…
• We will send directions to collect the PDU you will earn
from attending this webinar
• We will also send links to the recorded webinar and
presentation slides once they are posted online
For more information, visit www.cprime.com
Your Instructor
• Tim Guay has over 25 years of IT experience and has
applied Agile methodologies since 2002.
• Enterprise Data Warehouse Specialist for 6 1/2 years
• Managed major DW projects
• PMP Certified since 2001, CSM since 2008, PMI-ACP
since 2012, and Lean Sensei since 2013.
• Clients have included government agencies, start-ups,
and Fortune 500 corporations.
• Agile trainer and coach.
Agenda
• Agile Data is Possible
• Why do it?
• Guiding Principles
• Evolutionary design
• Database Refactoring
• Hyper-normalization and Generalization
• Agile Modeling
• Q & A
Agile Data Is Possible
• Many say that creating an enterprise-level database or
data warehouse requires big design up front (BDUF)
• Agile data is possible, and it is actually the better way
to go, as both Kimball and Inmon attest
• Kimball’s architecture is the best suited, though, and is
the one that underlies this presentation
• Best because:
• Bottom-up approach
• Conformed Dimensions
• Bus Matrix
Agile Data is Possible
Goals of Agile Data Architecture
•To architect to support the delivery of working DW/BI
functionality early and continuously to our customers
•To architect for change
•Scott Ambler is a key thought leader in this space
Why Do It?
• Agile Myths - Too risky, no planning, no design, no
documentation, cowboy coding, only good for small
projects
• Waterfall realities - Overall failure rate 29% (Standish),
DW failure rate 50%+ (Gartner)
Why Do It?
• DW Failure Modes:
• Insufficient business involvement
• Underestimating the complexity and scope
• Not anticipating or allowing change
• Misunderstood expectations
• Overcomplicated architecture
• Poor understanding of the data
Guiding Principles
Agile Principles
1.Our highest priority is to satisfy the customer through early and continuous
delivery of valuable software
2.Welcome changing requirements, even late in development. Agile processes
harness change for the customer's competitive advantage
3.Deliver working software frequently, from a couple of weeks to a couple of
months, with a preference to the shorter timescale
4.Business people and developers must work together daily throughout the
project
5.Build projects around motivated individuals. Give them the environment and
support they need, and trust them to get the job done
6.The most efficient and effective method of conveying information to and within
a development team is face-to-face conversation
Guiding Principles
Agile Principles
7.Working software is the primary measure of progress
8.Agile processes promote sustainable development. The sponsors, developers,
and users should be able to maintain a constant pace indefinitely
9.Continuous attention to technical excellence and good design enhances agility
10.Simplicity — the art of maximizing the amount of work not done — is essential
11.The best architectures, requirements, and designs emerge from self-
organizing teams
12.At regular intervals, the team reflects on how to become more effective, then
tunes and adjusts its behavior accordingly.
Evolutionary Design
Key Practices
•Close collaboration between DBAs and developers
•Each developer gets their own DB instance and test data
•Continuous integration into the shared master
•Automate the refactoring
•Automatically update the developer instances whenever
the master is changed
•Have a clear DB access layer within the code
•Beware of delivering one-off solutions
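One way to automate schema changes and keep every developer instance in step with the master is an ordered list of migration scripts plus a version marker stored in each database. The sketch below is only an illustration under assumptions: it uses Python's built-in sqlite3 module and SQLite's user_version pragma as the marker, and the MIGRATIONS list and upgrade function are hypothetical names, not part of any tool named in this deck.

```python
import sqlite3

# Hypothetical ordered list of schema changes checked in alongside the code.
MIGRATIONS = [
    "CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT)",
    "ALTER TABLE customer ADD COLUMN email TEXT",
]


def upgrade(conn):
    """Apply every migration the database has not seen yet."""
    current = conn.execute("PRAGMA user_version").fetchone()[0]
    for version, sql in enumerate(MIGRATIONS[current:], start=current + 1):
        conn.execute(sql)
        # Record how far this instance has been upgraded.
        conn.execute(f"PRAGMA user_version = {version}")
    conn.commit()
    return conn.execute("PRAGMA user_version").fetchone()[0]


# A developer instance (in-memory here) catches up with the master.
dev_db = sqlite3.connect(":memory:")
first = upgrade(dev_db)

# Someone adds a change to the master list; the instance picks up only the new step.
MIGRATIONS.append("CREATE INDEX idx_customer_email ON customer (email)")
second = upgrade(dev_db)
third = upgrade(dev_db)  # nothing left to apply
```

Running upgrade after every pull or on each continuous integration build is one way to keep instances current without manual DBA work.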
Evolutionary Design
Laying the Foundation - Conformed Dimensions
•Conformed dimensions are descriptive master reference data that
are referenced in multiple dimensional models
•Fundamental to the Kimball approach
•Enables Agile DW/BI by leveraging existing conformed dimensions (CDs)
•Start by identifying a subset of attributes that have significance
across the enterprise and iteratively grow from there
•Failure to create conformed dimensions from the start will result
in significant technical debt and is one of the key reasons for Agile
DW project failure
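To make the idea concrete, here is a minimal sketch of one conformed dimension shared by two fact tables. Everything in it is an assumption for illustration: the table names (dim_date, fact_sales, fact_shipments), the sample rows, and the by_quarter helper are hypothetical, and SQLite stands in for a real warehouse platform.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- One conformed date dimension, defined once.
CREATE TABLE dim_date (
    date_key INTEGER PRIMARY KEY,
    calendar_date TEXT,
    fiscal_quarter TEXT
);
-- Two business processes reference the same dimension.
CREATE TABLE fact_sales (
    date_key INTEGER REFERENCES dim_date (date_key),
    amount REAL
);
CREATE TABLE fact_shipments (
    date_key INTEGER REFERENCES dim_date (date_key),
    units INTEGER
);
INSERT INTO dim_date VALUES
    (20240105, '2024-01-05', 'Q1'),
    (20240410, '2024-04-10', 'Q2');
INSERT INTO fact_sales VALUES (20240105, 100.0), (20240105, 50.0), (20240410, 75.0);
INSERT INTO fact_shipments VALUES (20240105, 3), (20240410, 4), (20240410, 1);
""")


def by_quarter(fact_table, measure):
    """Sum a measure by fiscal quarter.

    fact_table and measure are trusted identifiers in this sketch, not user input.
    """
    sql = f"""
        SELECT d.fiscal_quarter, SUM(f.{measure})
        FROM {fact_table} f
        JOIN dim_date d USING (date_key)
        GROUP BY d.fiscal_quarter
        ORDER BY d.fiscal_quarter
    """
    return dict(conn.execute(sql).fetchall())


sales = by_quarter("fact_sales", "amount")
shipments = by_quarter("fact_shipments", "units")

# Because both facts share the same quarter definition, results line up directly.
drill_across = {q: (sales[q], shipments[q]) for q in sales}
```

A second iteration can add another fact table against dim_date without redefining what a quarter means, which is where the reuse pays off.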
Evolutionary Design
Laying the Foundation - Bus Matrix
•Each column is a conformed dimension
•Separate columns describe other information associated with
each business process, e.g., owner
•Each row is a business process
•Each dimension is associated to a process by an X in the
intersecting cell
•Meets the Agile principle of just enough documentation
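A bus matrix is small enough to keep as plain data next to the code. The sketch below shows the row and column layout described above; the business processes, owners, and dimensions are made-up examples, and render and shared_dimensions are hypothetical helper names.

```python
from collections import Counter

# Rows are business processes; each lists its owner and the dimensions it uses.
bus_matrix = {
    "Order Entry": {"owner": "Sales Ops", "dimensions": {"Date", "Customer", "Product"}},
    "Shipping": {"owner": "Logistics", "dimensions": {"Date", "Customer", "Product", "Warehouse"}},
    "Billing": {"owner": "Finance", "dimensions": {"Date", "Customer"}},
}


def render(matrix):
    """Return the matrix as text lines, with an X where a process uses a dimension."""
    dims = sorted(set().union(*(row["dimensions"] for row in matrix.values())))
    lines = [" | ".join(["Process", "Owner"] + dims)]
    for process, row in matrix.items():
        marks = ["X" if d in row["dimensions"] else "-" for d in dims]
        lines.append(" | ".join([process, row["owner"]] + marks))
    return lines


def shared_dimensions(matrix):
    """Dimensions used by more than one process: the candidates to conform first."""
    counts = Counter(d for row in matrix.values() for d in row["dimensions"])
    return sorted(d for d, n in counts.items() if n > 1)


lines = render(bus_matrix)
shared = shared_dimensions(bus_matrix)
print("\n".join(lines))
```

Picking one row per iteration, and conforming the shared dimensions that row needs, matches the one-row-at-a-time approach.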
Evolutionary Design
Laying the Foundation - Bus Matrix
•Can be done in a matter of days with the right people at the
table and a skilled facilitator
•Solid understanding of data and processes is required
•Collaboration is key
•Provides the Agile master plan and list of reusable common
dimensions
•Focusing on one row at a time reduces risks from overly
ambitious plans and supports the Agile principle of rapid
development of valuable software
Evolutionary Design
Database Encapsulation Layer
•Software architecture should include a database
encapsulation layer, also known as a persistence layer or data layer
•Hides the physical details of the DB from the business code
•If the DB changes, only this layer needs to change
•Consolidates all DB access code in ‘one’ place
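As a rough illustration of such a layer, the sketch below uses a small data access object (DAO). The CustomerDao class, the customer table, and the greeting function are hypothetical; the point is only that the business code never sees table or column names.

```python
import sqlite3


class CustomerDao:
    """Hypothetical DAO: the only code that knows the physical schema."""

    def __init__(self, conn):
        self._conn = conn

    def add(self, name):
        cur = self._conn.execute(
            "INSERT INTO customer (full_name) VALUES (?)", (name,)
        )
        return cur.lastrowid

    def find_name(self, customer_id):
        row = self._conn.execute(
            "SELECT full_name FROM customer WHERE customer_id = ?", (customer_id,)
        ).fetchone()
        return row[0] if row else None


def greeting(dao, customer_id):
    """Business code: depends on the DAO's methods, not on SQL."""
    name = dao.find_name(customer_id)
    return f"Hello, {name}!" if name else "Hello, guest!"


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, full_name TEXT)")
dao = CustomerDao(conn)
new_id = dao.add("Ada Lovelace")
message = greeting(dao, new_id)
```

If full_name were later renamed or split, only the two SQL strings inside CustomerDao would need to change; greeting would stay as it is.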
Evolutionary Design
Database Encapsulation Layer - Variations
•Single application, single DB - pretty straightforward
•Multiple applications, single DB - common when there is a
legacy DB
•Multiple applications, multiple DBs
•Implement via direct SQL access, DAOs, persistence
frameworks, or services
Database Refactoring
• Essentially normalization after the fact
• Refactorings are design improvements to the schema that still
preserve its behavioral and informational semantics
• Includes both structural and functional aspects
• Can involve making three changes together:
1. Changing the schema
2. Migrating the data to the new schema
3. Changing the DB access code
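The three changes can be sketched with a simple Rename Column refactoring. This is an illustration under assumptions, not a prescribed procedure: the customer table, the fname and first_name columns, and both query functions are hypothetical, and leaving the old column in place for a while is a choice made for this sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, fname TEXT);
INSERT INTO customer (fname) VALUES ('Ada'), ('Grace');
""")


def first_names_old(conn):
    """Access code before the refactoring."""
    return [r[0] for r in conn.execute(
        "SELECT fname FROM customer ORDER BY customer_id")]


before = first_names_old(conn)

# 1. Change the schema: introduce the better-named column.
conn.execute("ALTER TABLE customer ADD COLUMN first_name TEXT")

# 2. Migrate the data to the new schema.
conn.execute("UPDATE customer SET first_name = fname")
conn.commit()


# 3. Change the DB access code to use the new column.
def first_names_new(conn):
    """Access code after the refactoring."""
    return [r[0] for r in conn.execute(
        "SELECT first_name FROM customer ORDER BY customer_id")]


after = first_names_new(conn)
```

The old fname column still exists here, so code that has not been updated yet keeps working; it can be dropped once nothing reads it.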
Hyper Normalization & Generalization
• Hyper-normalization – Beyond 3NF
• Data Vault with attributes in satellite tables and foreign
keys moved to link tables
• Allows relationships between data to change without changing
the data (hub) tables
• Hyper-generalization: all hub data is moved to a single table,
with a table of tables to identify which rows belong to which
data category. Only one link table is then needed
• Reduces complexity and collateral damage from changes
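A very small Data Vault style layout might look like the sketch below. All names (hub_customer, hub_product, sat_customer, link_purchase, link_wishlist) and rows are assumptions for illustration, and real Data Vault models carry more metadata than shown. The point is that a new relationship arrives as a new link table while the hubs stay untouched.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hubs hold only business keys.
CREATE TABLE hub_customer (customer_hk INTEGER PRIMARY KEY, customer_no TEXT UNIQUE);
CREATE TABLE hub_product (product_hk INTEGER PRIMARY KEY, product_no TEXT UNIQUE);

-- The satellite holds descriptive attributes, versioned by load date.
CREATE TABLE sat_customer (
    customer_hk INTEGER REFERENCES hub_customer (customer_hk),
    load_date TEXT,
    name TEXT,
    city TEXT,
    PRIMARY KEY (customer_hk, load_date)
);

-- The link holds the relationship, i.e. the foreign keys.
CREATE TABLE link_purchase (
    customer_hk INTEGER REFERENCES hub_customer (customer_hk),
    product_hk INTEGER REFERENCES hub_product (product_hk)
);

INSERT INTO hub_customer VALUES (1, 'C-100');
INSERT INTO hub_product VALUES (1, 'P-1'), (2, 'P-2');
INSERT INTO sat_customer VALUES
    (1, '2024-01-01', 'Ada', 'London'),
    (1, '2024-06-01', 'Ada', 'Paris');
INSERT INTO link_purchase VALUES (1, 1);
""")


def columns(table):
    """Column names of a table, read from SQLite's table_info pragma."""
    return [r[1] for r in conn.execute(f"PRAGMA table_info({table})")]


hub_before = columns("hub_customer")

# A new kind of relationship is added as a new link table only.
conn.executescript("""
CREATE TABLE link_wishlist (
    customer_hk INTEGER REFERENCES hub_customer (customer_hk),
    product_hk INTEGER REFERENCES hub_product (product_hk)
);
INSERT INTO link_wishlist VALUES (1, 2);
""")

hub_after = columns("hub_customer")

# Current attributes come from the latest satellite row.
current = conn.execute(
    "SELECT name, city FROM sat_customer WHERE customer_hk = 1 "
    "ORDER BY load_date DESC LIMIT 1"
).fetchone()
```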
Database Refactoring
Examples include:
•Apply Standard Types to Similar Data
•Consolidate Key Strategy for Entity
•Encapsulate Common Structure With View
•Introduce Column Constraint
•Introduce Common Format
•Introduce Lookup Table
•Migrate Database Method to Application
•Rename Column
•Replace One-To-Many With Associative Table
•Replace View With Method(s)
•Split Column
Database Refactoring
Refactoring Enablers:
•Regression testing
•Strong configuration management
•Close collaboration
•Just do it!
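Regression testing is what makes a refactoring safe to "just do". As a sketch under assumptions, the test below captures a query result, applies an Introduce Lookup Table refactoring, and checks that the result is unchanged. The orders and order_status tables and all function names are hypothetical.

```python
import sqlite3


def build_db():
    """Create a small test database in a known state."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, status TEXT);
    INSERT INTO orders (status) VALUES ('OPEN'), ('SHIPPED'), ('OPEN');
    """)
    return conn


def status_counts(conn):
    """Behavior that must survive the refactoring."""
    return conn.execute(
        "SELECT status, COUNT(*) FROM orders GROUP BY status ORDER BY status"
    ).fetchall()


def introduce_lookup_table(conn):
    """The refactoring under test: add a lookup table of valid statuses."""
    conn.executescript("""
    CREATE TABLE order_status (status TEXT PRIMARY KEY);
    INSERT INTO order_status SELECT DISTINCT status FROM orders;
    """)


def test_refactoring_preserves_results():
    conn = build_db()
    expected = status_counts(conn)
    introduce_lookup_table(conn)
    # Same answers after the change as before it.
    assert status_counts(conn) == expected
    # Every status used by an order exists in the lookup table.
    orphans = conn.execute(
        "SELECT COUNT(*) FROM orders o "
        "LEFT JOIN order_status s ON s.status = o.status "
        "WHERE s.status IS NULL"
    ).fetchone()[0]
    assert orphans == 0
    return expected


result = test_refactoring_preserves_results()
```

A suite of checks like this, run on every integration, is one way to catch a refactoring that changes behavior.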
Agile Modeling
• Scott Ambler developed the concept of Agile Modeling
• Agile models are just barely good enough
• Agile models are developed iteratively
• Modeling starts with a lightweight envisioning session to create a
domain model. To that I would add developing a Bus
Matrix and defining a core set of conformed dimensions
• With each iteration, develop just barely enough of the
data model to support development of the sprint backlog
Questions
