Agile Data Warehouse The Final Frontier Terry Bunio
1 Data Warehouse Architecture
2 Spectre of the Agility
3 The Prime Directive
4 Captain, we need more visualization!
5 The United Federation of Planets
Data Warehouse
• Definition
  – “a database used for reporting and data analysis. It is a central repository of data which is created by integrating data from multiple disparate sources. Data warehouses store current as well as historical data and are commonly used for creating trending reports for senior management reporting such as annual and quarterly comparisons.” – Wikipedia.org
Data Warehouse
• Can refer to:
  – Reporting Databases
  – Operational Data Stores
  – Data Marts
  – Enterprise Data Warehouse
  – Cubes
  – Excel?
  – Others
Relational
• Relational Analysis
  – Third Normal Form
  – OLTP
  – Normalized tables optimized for modification
Dimensional
• Dimensional Analysis
  – Star Schema
  – OLAP
  – Facts and Dimensions optimized for retrieval
• Facts
  – Business events
  – Transactions
• Dimensions
  – Context for Transactions
  – Accounts
  – Products
  – Date
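A minimal sketch of the star schema idea above, using Python's built-in sqlite3 module. The table names (fact_payment, dim_account, dim_date), columns, and sample rows are hypothetical illustrations, not taken from the project described in these slides.

```python
import sqlite3

# Hypothetical star schema: one fact table (payments) surrounded by two
# dimension tables (account, date). All names and rows are illustrative.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_account (account_key INTEGER PRIMARY KEY, account_name TEXT);
CREATE TABLE dim_date (date_key INTEGER PRIMARY KEY, year INTEGER, quarter INTEGER);
CREATE TABLE fact_payment (
    account_key INTEGER REFERENCES dim_account(account_key),
    date_key INTEGER REFERENCES dim_date(date_key),
    amount REAL
);
INSERT INTO dim_account VALUES (1, 'Acme'), (2, 'Globex');
INSERT INTO dim_date VALUES (20130101, 2013, 1), (20130401, 2013, 2);
INSERT INTO fact_payment VALUES
    (1, 20130101, 100.0), (1, 20130401, 50.0), (2, 20130101, 75.0);
""")


def payments_by_quarter(connection):
    """Aggregate the fact table using the date dimension as context."""
    return connection.execute("""
        SELECT d.year, d.quarter, SUM(f.amount)
        FROM fact_payment AS f
        JOIN dim_date AS d ON f.date_key = d.date_key
        GROUP BY d.year, d.quarter
        ORDER BY d.year, d.quarter
    """).fetchall()


quarterly_totals = payments_by_quarter(conn)
```

The fact table holds only keys and measures; everything a report groups or filters by lives in a dimension, which is what makes the structure retrieval-friendly.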
Kimball-lytes
• Bottom-up - incremental
  – Operational systems feed the Data Warehouse
  – Data Warehouse is a corporate dimensional model that Data Marts are sourced from
  – Data Warehouse is the consolidation of Data Marts
  – Sometimes the Data Warehouse is generated from Subject area Data Marts
Inmon-ians
• Top-down
  – Corporate Information Factory
  – Operational systems feed the Data Warehouse
  – Enterprise Data Warehouse is a corporate relational model that Data Marts are sourced from
  – Enterprise Data Warehouse is the source of Data Marts
The gist…
• Kimball's approach is easier to implement as you are dealing with separate subject areas, but can be a nightmare to integrate
• Inmon's approach has more upfront effort to avoid these consistency problems, but takes longer to implement
Popular Agile Data Warehouse Pattern
• Analyze data requirements department by department
• Create Reports and Facts and Dimensions for each
• Integrate when you do subsequent departments
The problem
• Conforming Dimensions
  – A Dimension conforms when it has equivalent structure and content everywhere it is used
  – Is an account defined by Marketing the same as one defined by Finance?
    • Probably not
  – If the Dimensions do not conform, this severely hampers the Data Warehouse
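As a rough illustration of the conformance question, the sketch below compares two departmental definitions of an Account dimension. The attribute names and types are hypothetical, and the check covers structure only; conforming content (the same keys meaning the same accounts) would need a separate comparison of the data itself.

```python
def conformance_issues(dim_a, dim_b):
    """List structural differences between two definitions of a dimension.

    Each definition maps attribute name -> data type. This checks
    structure only, not whether the content agrees.
    """
    issues = []
    for name in sorted(set(dim_a) | set(dim_b)):
        if name not in dim_a:
            issues.append(f"{name}: missing from first definition")
        elif name not in dim_b:
            issues.append(f"{name}: missing from second definition")
        elif dim_a[name] != dim_b[name]:
            issues.append(f"{name}: {dim_a[name]} vs {dim_b[name]}")
    return issues


# Hypothetical departmental views of "Account".
marketing_account = {"account_id": "int", "name": "text", "segment": "text"}
finance_account = {"account_id": "text", "name": "text", "gl_code": "text"}

issues = conformance_issues(marketing_account, finance_account)
```

An empty list would suggest the two definitions can share one conformed dimension; anything else is a conversation to have before integrating the departments.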
Where is she?
Where is the true Agility?
• Iterations not Increments
• Brutal Visibility/Visualization
• Short Feedback loops
• Just enough requirements
• Working on enterprise priorities, not just for an individual department
Our Mission
• “Data... the Final Frontier. These are the continuing voyages of the starship Agile. Her on-going mission: to explore strange new projects, to seek out new value and new clients, to iteratively go where no projects have gone before.”
The Prime Directive
The Prime Directive
• Is a vision or philosophy that binds the actions of Starfleet
• Can a Data Warehouse project truly be Agile without a Vision of either the Business Domain or Data Domain?
  – Essentially it is then just an Ad Hoc Data Warehouse: separate components that may fit together
  – How do we ensure we are working on the right priorities for the entire enterprise?
A new model
Agile Enterprise Data Model
• Confirms the major entities and the relationships between them
  – 30-50 entities
• Confirms the Business and Data Domains
• Starts the definition of a Data Model that will be refined over time
  – Completed in 1 to 4 weeks
An Agile Enterprise Data Model
• Is just enough to understand the domain so that the iterations can proceed
• Is not mapping all the attributes
• Is not BDUF
• Is a User Story Map for the Data Domain
• Contains placeholders for refinement
Agile Enterprise Data Model is a Data Map
Agile Enterprise Data Model
• Is
  – Our vision
  – Our User Story Map for the Data Domain
  – Guides our solution
  – Our Prime Directive
  – A Data Model
Kimball or Inmon?
Spock
• Hybrid approach
  – It is only logical
  – Needs of the many outweigh the needs of the few, or the one
Spock Approach Agile Enterprise Data Model
Spock Approach
• Agile Enterprise Data Model
• Operational Data Store
• Dimensional Data Warehouse
• Reporting can then be done from:
  – Operational Data Store
  – Dimensional Data Warehouse
  – New Data Marts
Benefits of Spock Approach
• Agile Enterprise Data Model
  – Validates knowledge of Data Domain
  – Ensures later increments don’t uncover data that was previously unknown and hard to integrate
    • Minimizes rework
  – True iterations
    • Confirm at high level and then refine
Benefits of Spock Approach
• Operational Data Store
  – Model data relationally to provide enterprise level operational reports
  – Consolidate and cleanse data before it is visible to end-users
  – Is used to refine the Agile Enterprise Data Model
Benefits of Spock Approach
• Dimensional Data Warehouse
  – Model data dimensionally to validate domain
  – Able to provide data to analytical reports
  – Able to provide full historical data and context for reports
  – Able to provide clients with the ability to generate their own reports easily
How do we work iteratively on a Data Warehouse?
Increments versus iterations
• Increments
  – Series by series
  – Department by department
• Iterations
  – Story by story
  – Episode by episode
• Enterprise prioritization
  – Work on the highest priority for the enterprise
  – Not just within each series/department
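One way to picture the difference: an increment plan would finish one department before starting the next, while an iteration plan pulls the highest enterprise priorities regardless of department. The sketch below assumes a hypothetical backlog in which each story carries an enterprise-wide priority rank; the Story fields, sample stories, and capacity value are illustrative only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Story:
    title: str
    department: str
    enterprise_priority: int  # 1 = highest; hypothetical ranking


def plan_iterations(backlog, capacity):
    """Order stories by enterprise priority, then cut into iterations."""
    ordered = sorted(backlog, key=lambda story: story.enterprise_priority)
    return [ordered[i:i + capacity] for i in range(0, len(ordered), capacity)]


backlog = [
    Story("Payments by account", "Finance", 1),
    Story("Campaign response", "Marketing", 4),
    Story("Invoices by date", "Finance", 3),
    Story("Sales by product", "Sales", 2),
]

iterations = plan_iterations(backlog, capacity=2)
first_iteration_departments = [story.department for story in iterations[0]]
```

With this backlog the first iteration mixes Finance and Sales work, which a strict department-by-department plan would never do.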
Captain, we need more Visualization!
His pattern indicates 2 dimensional thinking
Data Visualization
• Is required to:
  – Provide a visual data backlog
  – Provide a visualization of the data requirements across dimensions
  – Plan iterations
  – Lead into the creation of FACT tables and Dimensions in a Data Warehouse
• For an Agile Data Warehouse we must think and visualize in more dimensions
• We must create a User Story Map for Data Requirements that is both intuitive and informing
  – Primarily for clients!
  – They should be able to look at it and understand what is currently being worked on
We need a bigger metaphor
Data Hexes
[Figure: hex diagram with cells labeled Payments, Sales, Master Data, Transactions, Invoices, Bills, Operations]
Bill Payment Hex
• Can have up to six dimensions of how payments are sliced, diced, and aggregated
• Concentric hexes allow for the planning of iterations
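The slide does not spell out how a hex is recorded, so the following is only one possible reading: each side of the hex is a dimension (six at most), and each concentric ring is an iteration in which some of those dimensions are delivered. The DataHex class, its methods, and the sample dimensions are all hypothetical.

```python
MAX_SIDES = 6  # a hex has six sides, so at most six dimensions


class DataHex:
    """Hypothetical model of a Data Hex: a subject sliced by up to six dimensions."""

    def __init__(self, subject, dimensions):
        if len(dimensions) > MAX_SIDES:
            raise ValueError("a hex supports at most six dimensions")
        self.subject = subject
        self.dimensions = list(dimensions)
        self.rings = {}  # iteration number -> dimensions planned for it

    def plan(self, iteration, dimension):
        """Assign one dimension of this hex to a concentric ring (iteration)."""
        if dimension not in self.dimensions:
            raise ValueError(f"{dimension} is not a side of this hex")
        self.rings.setdefault(iteration, []).append(dimension)

    def unplanned(self):
        """Dimensions not yet assigned to any iteration."""
        planned = {d for dims in self.rings.values() for d in dims}
        return [d for d in self.dimensions if d not in planned]


bill_payment = DataHex(
    "Bill Payment", ["Account", "Date", "Product", "Payment Method"]
)
bill_payment.plan(1, "Account")
bill_payment.plan(1, "Date")
bill_payment.plan(2, "Product")
remaining = bill_payment.unplanned()
```

Anything left in the unplanned list is visible backlog for that subject area, which is the kind of visibility the hexes are meant to give clients.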
Be careful how you spell that…
Data Modeling Union
• For too long the Data Modelers have not been integrated with Software Developers
• Data Modelers have been like the Cardassian Union, not integrated with the Federation
Issues
• This has led to:
  – Holy wars
  – Each side expecting the other to follow their schedule
  – Lack of communication and collaboration
• Data Modelers need to join the 'United Federation of Projects'
Tools of the trade
Version Control
• If you don't control versions, they will control you
• Data Models must become integrated with the source control of the project
  – In the same repository as the project trunk and branches
Our Version Experience
• We are using Subversion
• We are using Oracle Data Modeler as our Modeling tool
  – It has very good integration with Subversion
  – Our DBMS is SQL Server
• Unlike with other modeling tools, the data model could be integrated in Subversion with the rest of the project
Shameless plug
• Free
• Subversion Integration
• Supports Logical, Relational, and Dimensional data models
• Since it is free, the data models can be shared and refined by all 60+ members of the development team
• Currently on version 889
Change Tolerant Data Model
• Only add tables and columns when they are absolutely required
• Leverage Data Domains so that attributes are created consistently and can be changed in unison
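To show why Data Domains help attributes change in unison, here is a small sketch in which columns reference a named domain instead of a raw type. The domain names, table definitions, and helper functions are hypothetical, and this is plain Python generating DDL text rather than the domain feature of any particular modeling tool.

```python
# Hypothetical data domains: each maps a business meaning to one SQL type.
DOMAINS = {
    "Identifier": "INT",
    "Name": "VARCHAR(100)",
    "Money": "DECIMAL(19,4)",
}

# Hypothetical tables: every column names a domain, never a raw type.
TABLES = {
    "payment": [("payment_id", "Identifier"), ("amount", "Money")],
    "invoice": [("invoice_id", "Identifier"), ("total", "Money")],
}


def create_table_sql(table, columns, domains):
    """Render a CREATE TABLE statement, resolving each column's domain."""
    body = ", ".join(f"{col} {domains[dom]}" for col, dom in columns)
    return f"CREATE TABLE {table} ({body});"


def render_all(tables, domains):
    """Render DDL for every table using one shared set of domains."""
    return [create_table_sql(name, cols, domains) for name, cols in tables.items()]


before = render_all(TABLES, DOMAINS)
# Redefining the Money domain once changes every column that uses it.
after = render_all(TABLES, {**DOMAINS, "Money": "DECIMAL(19,2)"})
```

Because no column hard-codes its type, a single domain change flows through both tables without anyone hunting for every money column by hand.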
Change Tolerant Data Model
• Don't model the data according to the application's Object Model
• Don't model the data according to source systems
• These items will change more frequently than the actual data structure
• Your Data Model and Object Model should be different!
Re-Factoring – Read It
Create the plan for how you will re-factor
Plan for:
• Versioning
  – Major and minor
• Adaptability
  – How the data model will adapt to major changes
• Refinement
  – How will iterations be planned and executed
• Re-Factoring
  – How will the data design be re-factored
Assimilate
• Assimilate Version Control, Adaptability, Refinement, and Re-Factoring into core project activities
  – Stand ups
  – Continuous Integration
  – Check outs and Check Ins
• Make them part of the standard activities, not something on the side
Current Stardate
• We are reaching the end of Operational Data Store ETL Development
  – ODS has been refined as we progressed, with no major changes
  – Data Warehouse Dimensional Model is also complete
  – Initial Reports analysis is complete and work on the report backlog will start soon
Summary
• Use an Agile Enterprise Data Model to provide the vision
• Strive for Iterations over Increments
  – Spock Approach assists in this
• Use Data Hexes to provide brutal visibility and Iteration planning
• Plan and Integrate processes for Versioning, Adaptability, Refinement, and Re-Factoring
What doesn't change?
Leadership
• “If you want to build a ship, don't drum up people together to collect wood and don't assign them tasks and work, but rather teach them to long for the endless immensity of the sea.” ~ Antoine de Saint-Exupery
Leadership
• “[A goalie's] job is to stop pucks, ... Well, yeah, that's part of it. But you know what else it is? ... You're trying to deliver a message to your team that things are OK back here. This end of the ice is pretty well cared for. You take it now and go. Go! Feel the freedom you need in order to be that dynamic, creative, offensive player and go out and score. ... That was my job. And it was to try to deliver a feeling.” ~ Ken Dryden
Agile Data Warehouse The Final Frontier Terry Bunio @tbunio bornagainagilist.wordpress.com www.protegra.com