The final frontier


Published on

  • Be the first to comment

  • Be the first to like this

No Downloads
Total views
On SlideShare
From Embeds
Number of Embeds
Embeds 0
No embeds

No notes for slide

The final frontier

  1. 1. Agile Data Warehouse The Final Frontier Terry Bunio
  2. 2. Agile Data Warehouse The Final Frontier
  3. 3. Terry
  4. 4. 1 Data Warehouse Architecture2 Spectre of the Agility3 The Prime Directive4 Captain, we need more visualization!5 The United Federation of Planets
  5. 5. Data Warehouse• Definition – “a database used for reporting and data analysis. It is a central repository of data which is created by integrating data from multiple disparate sources. Data warehouses store current as well as historical data and are commonly used for creating trending reports for senior management reporting such as annual and quarterly comparisons.” –
  6. 6. Data Warehouse• Can refer to: – Reporting Databases – Operational Data Stores – Data Marts – Enterprise Data Warehouse – Cubes – Excel? – Others
  7. 7. Relational• Relational Analysis – Third Normal Form – OLTP – Normalized tables optimized for modification
  8. 8. Dimensional• Dimensional Analysis – Star Schema – OLAP – Facts and Dimensions optimized for retrieval • Facts – Business events – Transactions • Dimensions – context for Transactions – Accounts – Products – Date
  9. 9. Relational
  10. 10. Dimensional
  11. 11. Kimball-lytes• Bottom-up - incremental – Operational systems feed the Data Warehouse – Data Warehouse is a corporate dimensional model that Data Marts are sourced from – Data Warehouse is the consolidation of Data Marts – Sometimes the Data Warehouse is generated from Subject area Data Marts
  12. 12. Inmon-ians• Top-down – Corporate Information Factory – Operational systems feed the Data Warehouse – Enterprise Data Warehouse is a corporate relational model that Data Marts are sourced from – Enterprise Data Warehouse is the source of Data Marts
  13. 13. The gist…• Kimball‟s approach is easier to implement as you are dealing with separate subject areas, but can be a nightmare to integrate• Inmon‟s approach has more upfront effort to avoid these consistency problems, but takes longer to implement.
  14. 14. Spectre of the Agility
  15. 15. Data Warehouse Project Incremental - KimballWaterfall - Inmon •In Segments•Detailed Analysis •Detailed Analysis•Large Development •Development•Large Deploy •Deploy•Long Feedback loop •Long Feedback loop•Extensive changes •Considerable changes•Many Defects •Rework •Defects
  16. 16. Popular Agile Data Warehouse Pattern • Analyze data requirements department by department • Create Reports and Facts and Dimensions for each • Integrate when you do subsequent departments
  17. 17. The problem• Conforming Dimensions – A Dimension conforms when it is in equivalent structure and content – Is an account defined by Marketing the same as Finance? • Probably not – If the Dimensions do not conform, this severly hampers the Data Warehouse
  18. 18. Where is she?
  19. 19. Where is the true Agility?• Iterations not Increments• Brutal Visibility/Visualization• Short Feedback loops• Just enough requirements• Working on enterprise priorities – not just for an individual department
  20. 20. Our Mission• “Data... the Final Frontier. These are the continuing voyages of the starship Agile. Her on-going mission: to explore strange new projects, to seek out new value and new clients, to iteratively go where no projects have gone before.”
  21. 21. The Prime Directive
  22. 22. The Prime Directive• Is a vision or philosophy that binds the actions of Starfleet• Can an Data Warehouse project truly be Agile without a Vision of either the Business Domain or Data Domain? – Essentially it is then just an Ad Hoc Data Warehouse. Separate components that may fit together. – How do we ensure we are working on the right priorities for the entire enterprise?
  23. 23. A new model
  24. 24. Agile Enterprise Data Model• Confirms the major entities and the relationships between them – 30-50 entities• Confirms the Business and Data Domains• Starts the definition of a Data Model that will be refined over time – Completed in 1 – 4 weeks
  25. 25. An Agile Enterprise Data Model• Is just enough to understand the domain so that the iterations can proceed• Is not mapping all the attributes• Is not BDUF• Is a User Story Map for the Data Domain• Contains placeholders for refinement
  26. 26. Agile Enterprise Data Model is a Data Map
  27. 27. Agile Enterprise Data Model• Is – Our vision – Our User Story Map for the Data Domain – Guides our solution – Our Prime Directive – A Data Model
  28. 28. Kimball or Inmon?
  29. 29. Spock• Hybrid approach – It is only logical – Needs of the many outweigh the needs of the few – or the one
  30. 30. Spock Approach Agile Enterprise Data Model
  31. 31. Spock Approach• Agile Enterprise Data Model• Operational Data Store• Dimensional Data Warehouse• Reporting can then be done from: – Operational Data Store – Dimensional Data Warehouse – New Data Marts
  32. 32. Benefits of Spock Approach• Agile Enterprise Data Model – Validates knowledge of Data Domain – Ensure later increments don’t uncover data that was previously unknown and hard to integrate • Minimizes rework – True iterations • Confirm at high level and then refine
  33. 33. Benefits of Spock Approach• Operational Data Store – Model data relationally to provide enterprise level operational reports – Consolidate and cleanse data before it is visible to end-users – Is used to refine the Agile Enterprise Data Model
  34. 34. Benefits of Spock Approach• Dimensional Data Warehouse – Model data dimensionally to validate domain – Able to provide data to analytical reports – Able to provide full historical data and context for reports – Able to provide clients with the ability to generate their own reports easily
  35. 35. How do we work iteratively ona Data Warehouse?
  36. 36. Increments versus iterations• Increments – Series by series – department by department• Iterations – Story by story – episode by episode• Enterprise prioritization – Work on the highest priority for the enterprise – Not just within each series/department
  37. 37. Captain, we need more Visualization!
  38. 38. His pattern indicates 2 dimensional thinking
  39. 39. Data visualization
  40. 40. Data Visualization• Is required to: – Provide a visual data backlog – Provide a visualization of the data requirements across dimensions – Plan iterations – Lead into the creation of FACT tables and Dimensions in a Data Warehouse
  41. 41. • For an Agile Data Warehouse we must think and visualize in more dimensions• We must create a User Story Map for Data Requirements that is both intuitive and informing – Primarily for clients! – They should be able to look at it and understand what is currently being worked on
  42. 42. We need a bigger metaphor
  43. 43. Payments Sales Data Hexes MasterTransactions Invoices Data Bills Operations
  44. 44. Bill Payment Hex• Can have up to six dimensions of how payments are sliced, diced, and aggregated• Concentric hexes allow for the planning of iterations
  45. 45. Cardassian Union
  46. 46. Be careful how you spell that…
  47. 47. Data Modeling Union• For too long the Data Modelers have not been integrated with Software Developers• Data Modelers have been like the Cardassian Union, not integrated with the Federation
  48. 48. Issues• This has led to: – Holy wars – Each side expecting the other to follow their schedule – Lack of communication and collaboration• Data Modelers need to join the „United Federation of Projects‟
  49. 49. Tools of the trade
  50. 50. Version Control
  51. 51. Version Control• If you don‟t control versions, they will control you• Data Models must become integrated with the source control of the project – In the same repository of project trunk and branches
  52. 52. Our Version Experience• We are using Subversion• We are using Oracle Data Modeler as our Modeling tool. – It has very good integration with Subversion – Our DBMS is SQL Server• Unlike other modeling tools, the data model was able to be integrated in Subversion with the rest of the project
  53. 53. Shameless plug• Free• Subversion Integration• Supports Logical, Relational, and Dimensional data models• Since it is free, the data models can be shared and refined by all 60+ members of the development team• Currently on version 889
  54. 54. Adaptability
  55. 55. Change Tolerant Data Model• Only add tables and columns when they are absolutely required• Leverage Data Domains so that attributes are created consistently and can be changed in unison
  56. 56. Change Tolerant Data Model• Don‟t model the data according to the application‟s Object Model• Don‟t model the data according to source systems• These items will change more frequently than the actual data structure• Your Data Model and Object Model should be different!
  57. 57. Re-Factoring – Read It
  58. 58. Create the plan for how youwill re-factor
  59. 59. Plan for:• Versioning – Major and minor• Adaptability – How the data model will adapt to major changes• Refinement – How will iterations be planned and executed• Re-Factoring – How will the data design be re-factored
  60. 60. Assimilate
  61. 61. Assimilate• Assimilate Version Control, Adaptability, Refinement, and Re-Factoring into core project activities – Stand ups – Continuous Integration – Check outs and Check Ins• Make them part of the standard activities – not something on the side
  62. 62. Our experience
  63. 63. Current Stardate• We are reaching the end of Operational Data Store ETL Development – ODS has been refined as we progress – no major changes – Data Warehouse Dimensional Model is also complete – Initial Reports analysis is complete and report backlog will soon be started on
  64. 64. Summary• Use an Agile Enterprise Data Model to provide the vision• Strive for Iterations over Increments – Spock Approach assists in this• Use Data Hexes to provide brutal visibility and Iteration planning• Plan and Integrate processes for Versioning, Adaptability, Refinement, and Re-Factoring
  65. 65. What doesn‟t change?
  66. 66. Leadership
  67. 67. Leadership• “If you want to build a ship, dont drum up people together to collect wood and dont assign them tasks and work, but rather teach them to long for the endless immensity of the sea.” ~ Antoine de Saint-Exupery
  68. 68. Leadership• “[A goalies] job is to stop pucks, ... Well, yeah, thats part of it. But you know what else it is? ... Youre trying to deliver a message to your team that things are OK back here. This end of the ice is pretty well cared for. You take it now and go. Go! Feel the freedom you need in order to be that dynamic, creative, offensive player and go out and score. ... That was my job. And it was to try to deliver a feeling.” ~ Ken Dryden
  69. 69. Agile Data Warehouse The Final Frontier Terry Bunio @tbunio