The final frontier v3



  1. 1. Agile Data Warehouse The Final Frontier
  2. 2. @tbunio Terry Bunio
  3. 3. Who Am I? • Database Administrator – Oracle, SQL Server, ADABAS • Data Architect – Investors Group, LPL Financial, Manitoba Blue Cross, Assante Financial • Agilist – Innovation Gamer, Team Member, Project Manager, PMO on SAP Implementation
  4. 4. Learning Objectives • Learn about how a Data Warehouse Project can be Agile • Introduce Agile practices that can help to be DWAgile • Introduce DW practices that can help to be DWAgile
  5. 5. What is Agile? • Deliver as frequently as possible • Minimize Inventory – All work that doesn’t directly contribute to delivering value to the client – Typically value is realized by code
  6. 6. Enterprise Models • Spock Method • Visualization • Spectre of the Agility • Database/Data Warehouse Architecture • DWAgile Practices
  7. 7. Data Warehouse • Definition – “a database used for reporting and data analysis. It is a central repository of data which is created by integrating data from multiple disparate sources. Data warehouses store current as well as historical data and are commonly used for creating trending reports for senior management reporting such as annual and quarterly comparisons.”
  8. 8. Data Warehouse • Can refer to: – Reporting Databases – Operational Data Stores – Data Marts – Enterprise Data Warehouse – Cubes – Excel? – Others
  9. 9. Two sides of Database Design
  10. 10. Two design methods • Relational – “Database normalization is the process of organizing the fields and tables of a relational database to minimize redundancy and dependency. Normalization usually involves dividing large tables into smaller (and less redundant) tables and defining relationships between them. The objective is to isolate data so that additions, deletions, and modifications of a field can be made in just one table and then propagated through the rest of the database via the defined relationships.”
  11. 11. Two design methods • Dimensional – “Dimensional modeling always uses the concepts of facts (measures), and dimensions (context). Facts are typically (but not always) numeric values that can be aggregated, and dimensions are groups of hierarchies and descriptors that define the facts.”
  12. 12. Relational • Relational Analysis – Database design is usually in Third Normal Form – Database is optimized for transaction processing. (OLTP) – Normalized tables are optimized for modification rather than retrieval
  13. 13. Normal forms • 1st - Under first normal form, all occurrences of a record type must contain the same number of fields. • 2nd - Second normal form is violated when a non- key field is a fact about a subset of a key. It is only relevant when the key is composite • 3rd - Third normal form is violated when a non-key field is a fact about another non-key field Source: William Kent - 1982
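The third-normal-form rule above can be made concrete with a small sketch. The tables and data here are my own illustration (not from the talk), and sqlite3 stands in for the DBMSs mentioned:

```python
import sqlite3

# Illustrative 3NF violation and its fix. In `orders_flat`, customer_city
# is a fact about customer_name (a non-key field), so a city change would
# have to be repeated on every order row for that customer.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE orders_flat (
    order_id      INTEGER PRIMARY KEY,
    customer_name TEXT,
    customer_city TEXT,   -- depends on customer_name, not on order_id
    amount        REAL
);
INSERT INTO orders_flat VALUES
    (1, 'Acme', 'Winnipeg', 100.0),
    (2, 'Acme', 'Winnipeg', 250.0);

-- Third normal form: move the customer facts to their own table, so a
-- city change is made in exactly one row and propagated via the key.
CREATE TABLE customer (
    customer_name TEXT PRIMARY KEY,
    customer_city TEXT
);
CREATE TABLE orders (
    order_id      INTEGER PRIMARY KEY,
    customer_name TEXT REFERENCES customer(customer_name),
    amount        REAL
);
INSERT INTO customer SELECT DISTINCT customer_name, customer_city FROM orders_flat;
INSERT INTO orders   SELECT order_id, customer_name, amount FROM orders_flat;
""")

# The redundant copy is gone: the city is now stored once.
rows = con.execute(
    "SELECT COUNT(*) FROM customer WHERE customer_city = 'Winnipeg'").fetchone()
print(rows[0])  # 1
```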
  14. 14. Dimensional • Dimensional Analysis – Star Schema/Snowflake – Database is optimized for analytical processing. (OLAP) – Facts and Dimensions optimized for retrieval • Facts – Business events – Transactions • Dimensions – context for Transactions – Accounts – Products – Date
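The facts-and-dimensions idea above can be sketched as a minimal star schema. Table and column names are my own illustration, and sqlite3 again stands in for SQL Server:

```python
import sqlite3

# A tiny star schema: one fact table of transactions surrounded by
# dimension tables that give those transactions context.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, product_name TEXT);
CREATE TABLE dim_date    (date_key INTEGER PRIMARY KEY, year INTEGER, quarter INTEGER);
CREATE TABLE fact_sales (
    product_key INTEGER REFERENCES dim_product(product_key),
    date_key    INTEGER REFERENCES dim_date(date_key),
    amount      REAL    -- the additive measure
);
INSERT INTO dim_product VALUES (1, 'Widget'), (2, 'Gadget');
INSERT INTO dim_date    VALUES (20240101, 2024, 1), (20240401, 2024, 2);
INSERT INTO fact_sales  VALUES (1, 20240101, 50.0), (1, 20240401, 75.0), (2, 20240101, 20.0);
""")

# A typical OLAP-style query: aggregate the facts, sliced by a dimension
# attribute -- exactly the retrieval pattern the schema is optimized for.
result = con.execute("""
    SELECT d.quarter, SUM(f.amount)
    FROM fact_sales f JOIN dim_date d ON f.date_key = d.date_key
    GROUP BY d.quarter ORDER BY d.quarter
""").fetchall()
print(result)  # [(1, 70.0), (2, 75.0)]
```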
  15. 15. Relational
  16. 16. Dimensional
  17. 17. Kimball-lytes • Bottom-up - incremental – Operational systems feed the Data Warehouse – Data Warehouse is a corporate dimensional model that Data Marts are sourced from – Data Warehouse is the consolidation of Data Marts – Sometimes the Data Warehouse is generated from Subject area Data Marts
  18. 18. Inmon-ians • Top-down – Corporate Information Factory – Operational systems feed the Data Warehouse – Enterprise Data Warehouse is a corporate relational model that Data Marts are sourced from – Enterprise Data Warehouse is the source of Data Marts
  19. 19. The gist… • Kimball’s approach is easier to implement as you are dealing with separate subject areas, but can be a nightmare to integrate • Inmon’s approach has more upfront effort to avoid these consistency problems, but takes longer to implement.
  20. 20. Spectre of the Agility
  21. 21. Data Warehouse Project • Waterfall - Inmon: Detailed Analysis, Large Development, Large Deploy, Long Feedback loop, Extensive changes, Many Defects • Incremental - Kimball: In Segments, Detailed Analysis, Development, Deploy, Long Feedback loop, Considerable changes, Rework, Defects
  22. 22. Popular Agile Data Warehouse Pattern • Son’a method – Analyze data requirements department by department – Create Reports and Facts and Dimensions for each – Integrate when you do subsequent departments
  23. 23. The two problems • Conforming Dimensions – A Dimension conforms when it is in equivalent structure and content – Is a client defined by Marketing the same as Finance? • Probably not – If the Dimensions do not conform, this severely hampers the Data Warehouse
  24. 24. The two problems • Modeling the use of the data versus the data – By using reporting needs as the primary foundation for the data model, you are modeling the use of the data rather than the data – This will cause more rework in the future as the use of the data is more likely to change than the data itself.
  25. 25. Where is she?
  26. 26. Where is the true Agility? • Iterations not Increments • Brutal Visibility/Visualization • Short Feedback loops • Just enough requirements • Working on enterprise priorities – not just for an individual department
  27. 27. Fact • True iterative development on a Data Warehouse project is hard – perhaps harder than a traditional Software Development project – ETL, Data Models, and Business Intelligence stories can have a high impact on other stories – Can be difficult to create independent stories – Stories can have many prerequisites
  28. 28. Fiction • True iterative development on a Data Warehouse project is impossible – ETL, Data Models, and Business Intelligence stories can be developed iteratively – Independent stories can be developed – Stories can have many prerequisites – but this can be limited
  29. 29. Agile Mindset • We need to implement an Agile Mindset to Data Modelling – What is just enough Data Modelling? – And do no more…
  30. 30. Our Mission • “Data... the Final Frontier. These are the continuing voyages of the starship Agile. Her on-going mission: to explore strange new projects, to seek out new value and new clients, to iteratively go where no projects have gone before.”
  31. 31. The Prime Directive
  32. 32. The Prime Directive • Is a vision or philosophy that binds the actions of Starfleet • Can a Data Warehouse project truly be Agile without a Vision of either the Business Domain or Data Domain? – Essentially it is then just an Ad Hoc Data Warehouse. Separate components that may fit together. – How do we ensure we are working on the right priorities for the entire enterprise?
  33. 33. Enterprise Data Model?
  34. 34. Torture • Why does the creation of Enterprise Data Models feel like torture? – Interrogation – Coercion – Agreement on Excessive detail without direct alignment to business value
  35. 35. Enterprise Models
  36. 36. Enterprise Models
  37. 37. Two new models
  38. 38. Agile Enterprise Normalized Data Model • Confirms the major entities and the relationships between them – 30-50 entities • Confirms the Data Domain • Starts the definition of a Normalized Data Model that will be refined over time – Completed in 1 – 4 weeks
  39. 39. Agile Enterprise Normalized Data Model • Is just enough to understand the data domain so that the iterations can proceed • Is not mapping all the attributes – Is not BDUF • Is an Information Map for the Data Domain • Contains placeholders for refinement – Like a User Story Map
  40. 40. Agile Enterprise Dimensional Data Model • Confirms the Business Objects and the relationships between them – 10-15 entities • Confirms the Business Domains • Starts the definition of a Dimensional Data Model that will be refined over time – Completed in 1 – 2 weeks
  41. 41. Agile Enterprise Dimensional Data Model • Is just enough to understand the business domain so that the iterations can proceed – And to validate the understanding of the data domain • Is not mapping all the attributes – Is not BDUF • Is an Information Map for the Business Domain • Contains placeholders for refinement – Like a User Story Map
  42. 42. Agile Information Maps
  43. 43. Agile Information Maps • Agile Information Maps allow for: – Efficient Navigation of the Data and Business Domains – Ability to set up ‘Neutral Zones’ for areas that need more negotiation – Visual communication of the topology of the Data and Business Domains • Easier and more accurate to validate than text • ‘feels right’
  44. 44. Agile Information Maps • Are – Our vision – Our Maps for the Data and Business Domains – A guide for our solution – Minimizes rework and refactoring – Our Prime Directive – Data Models
  45. 45. Kimball or Inmon?
  46. 46. Spock • Hybrid approach – It is only logical – Needs of the many outweigh the needs of the few – or the one
  47. 47. Spock Approach (diagram: Business Domain Spike feeding the Agile Normalized Data Model and Agile Dimensional Data Model, which drive the ODS, DW, and Data Marts)
  48. 48. Spock Approach • Business Domain Spike • Agile Information Maps – Agile Enterprise Normalized Data Model – Agile Enterprise Dimensional Data Model • Implement – Operational Data Store – Dimensional Data Warehouse • Reporting can then be done from either
  49. 49. Business Domain Spike • Needs to precede work on Agile Information Maps • Need to understand the business and industry before you can create Data or Business Information Maps • Can take 1-2 weeks for an initial understanding – Constant refinement
  50. 50. Benefits of Spock Approach • Agile Enterprise Normalized Data Model – Validates knowledge of Data Domain – Ensure later increments don’t uncover data that was previously unknown and hard to integrate • Minimizes rework and refactoring – True iterations • Confirm at high level and then refine
  51. 51. Benefits of Spock Approach • Agile Enterprise Dimensional Data Model – Validates knowledge of Business Domain – The process of ‘cooking down’ to a Dimensional Model validates design and identifies areas of inconsistencies or errors • Especially true when you need to design how changes and history will be handled – True iterations • Confirm at high level and then refine
  52. 52. Benefits of Spock Approach • Operational Data Store – Model data relationally to provide enterprise level operational reports – Consolidate and cleanse data before it is visible to end-users – Is used to refine the Agile Enterprise Normalized Data Model – Start creating reports to validate data model immediately!
  53. 53. Benefits of Spock Approach • Dimensional Data Warehouse – Model data dimensionally to provide enterprise level analytical reports – Provide full historical data and context for reports – Is used to refine the Agile Enterprise Dimensional Data Model – Clients can start creating reports to validate data model immediately!
  54. 54. Do we need an ODS and DW? • Relational Analysis provides – Validation of the Data domain • Dimensional Analysis provides – Validation of the Business domain – Additional level of confirmation of the Data domain as the relational model is translated into a dimensional one • Much easier for inconsistencies and errors to hide in 300+ tables as opposed to 30+
  55. 55. Most Importantly.. • Operational Data Store – Minimal Data Latency – Current state – Allow for efficient Operational Reporting • Data Warehouse – Moderate Data Latency – Full history – Allows for efficient Analytical Reporting
  56. 56. Agile Approach • With an Agile approach you can deliver just enough of an Operational Data Store or Data Warehouse based on needs – No longer do they need to be a huge deliverable • Neither presumes a complete implementation is required • The Information Models allow for iterative delivery of value
  57. 57. How do we work iteratively on a Data Warehouse?
  58. 58. Increments versus iterations • Increments – Series by series – department by department • Iterations – Story by story – episode by episode • Enterprise prioritization – Work on the highest priority for the enterprise – Not just within each series/department
  59. 59. Iterative Focus • Instead of focusing on trying to have a complete model, we focused on creating processes that allow us to deliver changes within 30 minutes from model to deployment
  60. 60. Captain, we need more Visualization!
  61. 61. The View Screen
  62. 62. The View Screen • Enabled bridge to bridge communications • Provided visual images in and around the ship – From different angles – How did that work? • Allowed for more understanding of the situation
  63. 63. Visualization
  64. 64. Visualization • Is required to: – Report Project status – Provide a visual report map
  65. 65. Kanban Board • We used a standard Kanban board to track stories as we worked on them – These stories resulted in ETL, Data Model, and Reporting tasks – We had a Data Model/ETL board and a Report board – ETL and Data Model required a foundation created by the Information Maps before we could start on stories
  66. 66. Report Visualization • We also used thermometer imagery to report how we were progressing according to the schedule – Milestones were on the thermometer along with the number of reports that we had completed every day
  67. 67. Cardassian Union
  68. 68. Be careful how you spell that…
  69. 69. Data Modeling Union • For too long the Data Modellers have not been integrated with Software Developers • Data Modellers have been like the Cardassian Union, not integrated with the Federation
  70. 70. Issues • This has led to: – Holy wars – Each side expecting the other to follow their schedule – Lack of communication and collaboration • Data Modellers need to join the ‘United Federation of Projects’
  71. 71. How were we Agile?
  72. 72. Tools of the trade
  73. 73. Tools of the Trade • Version Control and Refactoring • Test Automation • Communication and Governance • Adaptability and Change Tolerance • Assimilation
  74. 74. Version Control
  75. 75. Version Control • If you don’t control versions, they will control you • Data Models must become integrated with the source control of the project – In the same repository of project trunk and branches • You can’t just version a series of SQL files separate from your data model
  76. 76. Our Version Experience • We are using Subversion • We are using Oracle Data Modeler as our Modeling tool. – It has very good integration with Subversion – Our DBMS is SQL Server 2012 • Unlike other modeling tools, the data model was able to be integrated in Subversion with the rest of the project
  77. 77. ODM Shameless plug • Free • Subversion Integration • Supports Logical and Relational data models • Since it is free, the data models can be shared and refined by all members of the development team • Currently on version 2685
  78. 78. How do we roll out versions? • Create Data Model changes • Use Red Gate SQL Compare to generate alter script – Generate a new DB and compare to the last version to generate alter script • 95% of changes deployed in less than 10 minutes
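The slide's workflow relies on Red Gate SQL Compare to generate the alter script. This toy sketch (my own illustration, not the tool's output) shows the underlying idea: compare the previous schema with the new one and emit only the delta:

```python
# Toy schema-compare: old/new map table name -> {column name: SQL type}.
# Real tools diff far more (indexes, constraints, drops); this just shows
# the compare-and-generate pattern described on the slide.
def alter_script(old: dict, new: dict) -> list:
    stmts = []
    for table, cols in new.items():
        if table not in old:
            body = ", ".join(f"{c} {t}" for c, t in cols.items())
            stmts.append(f"CREATE TABLE {table} ({body});")
            continue
        for col, typ in cols.items():
            if col not in old[table]:
                stmts.append(f"ALTER TABLE {table} ADD {col} {typ};")
    return stmts

old = {"client": {"client_id": "INT", "name": "VARCHAR(100)"}}
new = {"client": {"client_id": "INT", "name": "VARCHAR(100)", "region": "VARCHAR(50)"},
       "policy": {"policy_id": "INT"}}
print(alter_script(old, new))
# ['ALTER TABLE client ADD region VARCHAR(50);', 'CREATE TABLE policy (policy_id INT);']
```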
  79. 79. How do we roll out versions? • We build on the Farley and Humble Blue-Green Deployment model – Blue – Current Version and Revision – Database Name will be ‘ODS’ – Green – 1 Revision Old – Database Name will be ‘ODS-GREEN’ – Brown – 1 Major Version Old – Database Name will be ‘ODS-BROWN’
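The Blue-Green-Brown rotation above can be sketched as a pair of rename plans. The database names follow the slide; the `ODS-NEW` staging name and the rotation logic are my own assumptions:

```python
# Sketch of the Blue-Green-Brown naming scheme: on each revision the
# current (Blue) database is kept as the one-revision-old Green fallback;
# on each major version the old version is kept as Brown.
def promote_revision(suffix="ODS"):
    """New revision: return (from, to) rename steps, in order."""
    return [(suffix, f"{suffix}-GREEN"),   # old current -> 1 revision old
            (f"{suffix}-NEW", suffix)]     # freshly built DB -> current

def promote_major_version(suffix="ODS"):
    """New major version: the old major version becomes Brown."""
    return [(suffix, f"{suffix}-BROWN"),
            (f"{suffix}-NEW", suffix)]

print(promote_revision())
# [('ODS', 'ODS-GREEN'), ('ODS-NEW', 'ODS')]
```

Keeping Green one revision back gives an instant rollback target; Brown preserves a whole prior major version for slower-moving consumers.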
  80. 80. Versioning • SQL change scripts are generated for all changes • A full script is generated for every major version – A new folder is created for every major version – Major version folders are named after the Greek alphabet (alpha, beta, gamma)
  81. 81. SQL Script version naming standards • [revision number]-[ODS/DW]-[I/A][version number]-[subversion revision number of corresponding Data model].sql – Revision number – auto-incrementing – Version Number – A999 • Alphabetic character represents major version – corresponds with folder named after the Greek alphabet • 999 indicates minor versions – Subversion revision number of corresponding Data model – allows for an exact synchronization between Data Model and SQL Scripts • All objects are stored within one Subversion repository – They all share the same revision numbering
  82. 82. SQL Script version naming standards • For example: – 0-ODS-I-A001-745.sql – initial db and table creation for current ODS version (includes reference data) – 1-ODS-A-A001-1574.sql – revision 1 ODS alter script that corresponds to data model subversion revision 1574 – 2-ODS-A-A001-1590.sql - revision 2 ODS alter script that corresponds to data model subversion revision 1590
  83. 83. SQL Script error handling • Validation is done to prevent – Scripts being run out of sequence – Revision being applied without addressing required refactoring – Scripts being run on any environment but the Blue environment
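The naming standard and the validation rules above can be sketched together. The filename grammar follows the examples on slide 82; the validation logic is my own illustration of the checks the slide lists:

```python
import re

# Parse the naming standard, e.g. "1-ODS-A-A001-1574.sql":
# revision, ODS/DW, Initial/Alter, major+minor version, SVN revision.
NAME_RE = re.compile(
    r"^(?P<rev>\d+)-(?P<db>ODS|DW)-(?P<kind>[IA])-"
    r"(?P<major>[A-Z])(?P<minor>\d{3})-(?P<svn>\d+)\.sql$"
)

def parse(name: str) -> dict:
    m = NAME_RE.match(name)
    if not m:
        raise ValueError(f"bad script name: {name}")
    return {k: (v if k in ("db", "kind", "major") else int(v))
            for k, v in m.groupdict().items()}

def validate_run(name: str, last_applied_rev: int, environment: str) -> dict:
    """Refuse out-of-sequence scripts and any environment but Blue."""
    info = parse(name)
    if environment != "Blue":
        raise RuntimeError("scripts may only run against the Blue environment")
    if info["rev"] != last_applied_rev + 1:
        raise RuntimeError(f"out of sequence: expected revision {last_applied_rev + 1}")
    return info

info = validate_run("1-ODS-A-A001-1574.sql", last_applied_rev=0, environment="Blue")
print(info["svn"])  # 1574
```

Because the SVN revision is embedded in the name, any deployed script can be traced back to the exact data-model revision that produced it.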
  84. 84. But what about Refactoring? • Having Agile Information Maps has significantly reduced refactoring – This was an entirely new data domain for the team • Using the Blue-Green-Brown deployment model has simplified required refactoring • We have used the methods described by Scott Ambler on the odd occasion
  85. 85. Good Start
  86. 86. Create the plan for how you will re-factor
  87. 87. Refactoring Experience • We haven’t needed to refactor much • Since we are iteratively refining we haven’t had to re-define much – Just adding more detail – Main Information Maps have held together
  88. 88. Test Automation
  89. 89. Test Automation • Enterprise was saved due to constantly running tests on the warp engine • Allowed for quick decision making
  90. 90. Automated Test Suite • Leveraged the tSQLt Open Source Framework • Purchased SQL Test from Red Gate for an enhanced interface • Enhanced the framework to execute tests from four custom tables we defined
  91. 91. Automated Test Suite • Leveraged Data Mapping spreadsheet that the automated tests used – Two database tables were loaded from the spreadsheet – Two additional tables contained ETL test cases – 13 Stored Procedures executed the tests – 3300+ columns mapped!
  92. 92. Table Tests • TstTableCount: Compares record counts between source data and target data. • TstTableColumnDistinct: Compares counts on distinct values of columns. • TstTableColumnNull: Generates a report of all columns where the contents of a field are all null.
  93. 93. Column Tests • TstColumnDataMapping: Compares columns directly assigned from a source column on a field by field basis for 5-10 rows in the target table. • TstColumnConstantMapping: Compares columns assigned a constant on a field by field basis for 5-10 rows in the target table. • TstColumnNullMapping: Compares columns assigned a Null value on a field by field basis for 5-10 rows in the target table. • TstColumnTransformedMapping: Compares transformed columns on a field by field basis for 5-10 rows in the target table.
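Two of the test styles above can be sketched in miniature. The real suite is tSQLt stored procedures against SQL Server; this stand-in uses sqlite3, and the staging/target tables are my own illustration:

```python
import sqlite3

# A staging (source) table and an ODS (target) table after a trivial ETL.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE stg_client (client_id INTEGER, name TEXT);
CREATE TABLE ods_client (client_id INTEGER, full_name TEXT);
INSERT INTO stg_client VALUES (1, 'Kirk'), (2, 'Spock');
INSERT INTO ods_client VALUES (1, 'Kirk'), (2, 'Spock');
""")

def tst_table_count(src: str, tgt: str) -> bool:
    """TstTableCount-style check: source and target row counts match."""
    count = lambda t: con.execute(f"SELECT COUNT(*) FROM {t}").fetchone()[0]
    return count(src) == count(tgt)

def tst_column_data_mapping(src, src_col, tgt, tgt_col, key, sample=10) -> bool:
    """TstColumnDataMapping-style check: spot-check a directly mapped
    column, field by field, for a handful of rows."""
    rows = con.execute(
        f"SELECT s.{src_col}, t.{tgt_col} FROM {src} s "
        f"JOIN {tgt} t ON s.{key} = t.{key} LIMIT {sample}").fetchall()
    return all(s == t for s, t in rows)

ok_count = tst_table_count("stg_client", "ods_client")
ok_map = tst_column_data_mapping("stg_client", "name", "ods_client", "full_name", "client_id")
print(ok_count, ok_map)  # True True
```

Driving such checks from mapping tables, as the next slides describe, lets one generic procedure cover thousands of mapped columns.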
  94. 94. Data Quality Tests • TstInvalidParentFKColumn: Tests that an Invalid Parent FK value results in the records being logged and bypassed. This record will be added to the staging table to test the process. • TstInvalidFKColumn: Tests that an Invalid FK value results in the value being assigned a default value or Null. This record will be added to the staging table to test the process. • TstInvalidColumn: Tests that an Invalid value results in the value being assigned a default value or Null. This record will be added to the staging table to test the process.
  95. 95. Process Integrity Tests • TstRestartTask: Tests that a Task can be started from the start and subsequent steps will run in sequence. • TstRecoverTask: Tests that a Task can be re-started in the middle and that records are processed correctly and subsequent steps will run in sequence.
  96. 96. Interested? • Leave me a business card and I’ll send you the design document and stored procedures
  97. 97. Communication
  98. 98. Team Communication • Frequent Data Model walkthroughs with application teams • Full access to the Data model through the Data Modeling development tool • Data Models posted in every room for developers to mark up with suggestions • Database deployment to play with for every release
  99. 99. Client Communication • Frequent Conceptual Data Model walkthroughs with clients – Includes presentation of scenarios with data to confirm and validate understanding • Collaboration on the iterative plan to ensure they agree on the process and support it
  100. 100. Monthly Governance Meeting – Visual Kanban boards reviewed – Reports developed in the prior iterations were demonstrated – Business Areas were asked to submit a ranked list of their top 10-20 data requirements/reports for the next iteration.
  101. 101. Adaptability
  102. 102. Be Nimble • Already discussed how we can roll out new versions quickly
  103. 103. Change Tolerant Data Model • Only add tables and columns when they are absolutely required • Leverage Data Domains so that attributes are created consistently and can be changed in unison – Use limited number of standard domains
  104. 104. Change Tolerant Data Model • Data Model needs to be loosely coupled and have high cohesion – Need to model the data and business and not the applications or reports!
  105. 105. Change Tolerant Data Model • Don’t model the data according to the application’s Object Model • Don’t model the data according to source systems • These items will change more frequently than the actual data structure • Your Data Model and Object Model should be different!
  106. 106. Assimilate
  107. 107. Assimilate • Assimilate Version Control, Communication, Adaptability, Refinement, and Re-Factoring into core project activities – Stand ups – Continuous Integration – Check outs and Check Ins • Make them part of the standard activities – not something on the side
  108. 108. Our experience
  109. 109. Our Mission • These practices and methods are being used to redevelop an entire Business Intelligence platform for a major ‘Blue’ Health Benefits company – Operational and Analytical Reports • 100+ integration projects • SAP Claims solution
  110. 110. Our Mission • Integration projects are being run Agile • 100+ team members across all projects • SAP project is being run in a more traditional manner – ‘big-bang’ SAP implementation • I’m now also fulfilling the role of an Agile PMO
  111. 111. Our Challenge • How can we deploy to production early and often when the system is a ‘big-bang’ implementation? – We were ready to deploy ahead of clients and other projects – We were dependent on other conversion projects
  112. 112. Our Challenge • We are now exploring alternate ways to deploy to production before the ‘big-bang’ implementation – To allow the clients to use the reports and iteratively refine them and the solution – Also allows our team to validate data integrity and quality iteratively – We are now executing iterations to make this possible
  113. 113. Our BI Solution • SQL Server 2012 – Integration Services – Reporting Services • SharePoint 2010 Foundation – SharePoint Integrated Reporting Solution
  114. 114. Our team • Integrated team of – 2 enterprise DBAs from the ‘Blue’ – 5 Data Analysts/DBAs/SSIS/SSRS developers • Governance team comprised of – Business Areas – Systems Areas – Stakeholders
  115. 115. Current Stardate • We have completed the initial ODS and DW development – including ETL • We have completed a significant revision of ODS, DW, and ETL – without major issues • We are now finishing Report development – reports have required database changes and ETL changes – but no major changes! – 300+ reports developed
  116. 116. Summary • Use Agile Enterprise Data Models to provide the initial vision and allow for refinements • Strive for Iterations over Increments • Align governance and prioritization with iterations • Plan and Integrate processes for Versioning, Test Automation, Adaptability, Refinement
  117. 117. What doesn’t change?
  118. 118. Leadership
  119. 119. Leadership • “If you want to build a ship, don't drum up people together to collect wood and don't assign them tasks and work, but rather teach them to long for the endless immensity of the sea.” ~ Antoine de Saint-Exupéry
  120. 120. Leadership • “[A goalie's] job is to stop pucks, ... Well, yeah, that's part of it. But you know what else it is? ... You're trying to deliver a message to your team that things are OK back here. This end of the ice is pretty well cared for. You take it now and go. Go! Feel the freedom you need in order to be that dynamic, creative, offensive player and go out and score. ... That was my job. And it was to try to deliver a feeling.” ~ Ken Dryden
  121. 121. Three awesome books