Microsoft Data Warehouse Business Intelligence Lifecycle - The Kimball Approach

Data Warehouse - Business Intelligence Lifecycle Overview by Warren Thornthwaite

This slide deck describes the Kimball approach from the best-selling Data Warehouse Toolkit, 2nd Edition. It was presented to the Bay Area Microsoft Business Intelligence User Group in October 2012.

Starting with business requirements and project definition, the lifecycle branches out into three tracks: Technical, Data and Applications. You will learn:

* The major steps in the Lifecycle and what needs to happen in each one.
* Why business requirements are so important and how they influence all major decisions across the entire DW/BI system.
* Key tools for prioritizing business requirements and creating an enterprise information framework.
* How to break up a DW/BI system into doable increments that add real business value and can be completed in a reasonable time frame.

Published in: Technology

  1. 1. THE DW/BI SYSTEM LIFECYCLE OVERVIEW  The Kimball Approach  Warren Thornthwaite
  2. 2. Acknowledgments  Course materials adapted from...  The Microsoft Data Warehouse Toolkit  J. Mundy, W. Thornthwaite (Wiley 2006)  The Data Warehouse Lifecycle Toolkit, 2nd Ed.  R. Kimball, M. Ross, W. Thornthwaite, J. Mundy, B. Becker (Wiley 2008)  The Data Warehouse Toolkit, 2nd Ed.  R. Kimball, M. Ross (Wiley 2002)  Kimball University  Course materials  Design Tips and Intelligent Enterprise articles at www.KimballUniversity.com
  3. 3. Acknowledgments  Course materials adapted from...  The Data Warehouse Lifecycle Toolkit, 2nd Ed.  R. Kimball, M. Ross, W. Thornthwaite, J. Mundy, B. Becker (Wiley 2008)  The Data Warehouse Toolkit, 2nd Ed.  R. Kimball, M. Ross (Wiley 2002)  The Microsoft Data Warehouse Toolkit, 2nd Ed.  J. Mundy, W. Thornthwaite (Wiley 2011)  Kimball Group / Kimball University  Course materials  Design Tips and Intelligent Enterprise articles at www.KimballGroup.com
  4. 4. Session Agenda  DW/BI Lifecycle business context  DW/BI System Lifecycle overview  Planning and managing the project/program  Defining business requirements  Creating the dimensional model (the data track)  Designing the DW/BI system architecture (the technology track)  Building the ETL system  Building BI applications (the applications track)  Rollout and repeat
  5. 5. The DW/BI Lifecycle Context
  6. 6. The Business Context to the Lifecycle  Business people need information to make plans and assess results, and this need continues to grow  Data is captured by complex systems structured to support specific transaction requirements  Business people find it difficult to get business information from data in transaction systems  Therefore, our job is to create a system that will: reliably take data out of the source systems, restructure its form and content as appropriate for business analysis, and provide it to the business people via tools they can actually use.
  7. 7. Strengthen Your Awareness of the Broader Context  Ask yourself “What am I doing?”  Writing a program  Building a database  Creating a DW/BI system  Solving a set of high value, difficult problems  The broader you think, the more effective you will be in addressing business problems and delivering real business value.
  8. 8. DW/BI System Lifecycle Overview
  9. 9. The Kimball Approach  Understand business requirements and deliver business value  Follow a proven methodology: the DW Lifecycle  Build and deliver incrementally (by business process) within an enterprise data framework (Bus Matrix and conformed dimensions)  Design the data sets for flexibility, usability and performance (Business Process Dimensional Model)  Provide the complete solution, including reports, query tools, portals, documentation, training, and support
  10. 10. Business Requirements Are the Foundation of Success  The more you focus your efforts on information-based business opportunities that are high value and relatively easy to implement, the more likely you will be to succeed.  Regardless of which ETL or BI tools you use  Regardless of which database you use  Regardless of your technical skills
  11. 11. The DW/BI Lifecycle  [Diagram: Project Planning, then Business Requirements Definition, then three parallel tracks (technology: Technical Architecture Design, Product Selection & Installation; data: Dimensional Modeling, Physical Design, ETL Design & Development; applications: BI Application Specification, BI Application Development), then Deployment, Maintenance, and Growth, with Project Management running underneath]
  12. 12. Planning and Managing the Project/Program  [Lifecycle diagram]
  13. 13. Project Planning & Management Highlights  Assess readiness and determine starting point  Define the program / project – (2 phased startup)  Phase 1 program level: Enterprise business requirements  Prioritization / Business justification  Phase 2 project scope: Initial business process lifecycle iteration  Plan the project  Team roles and responsibilities  Detailed project plan  Manage the project  Control scope creep  Communication to manage expectations
  14. 14. Data Warehouse Readiness Factors  1. Strong business management sponsor(s) (60%)  Vision of value, Politically capable, Realistic  2. Compelling business motivation (15%)  Generates urgency and supplies justification  3. Feasibility (15%)  Data feasibility  4. Other organizational issues (10%)  IT/Business partnership  Current analytic culture  Requirements definition and prioritization are best tools to address shortfalls  Proof of concept demo is generally a bad idea
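The readiness weights on slide 14 can be read as a weighted score. The sketch below is only an illustration of that reading: the four weights come from the slide, while the factor names, the 0-to-1 rating scale, and the function `readiness_score` are assumptions made for this example.

```python
# Weights are taken from slide 14; everything else here is hypothetical.
READINESS_WEIGHTS = {
    "sponsor": 0.60,              # strong business management sponsor(s)
    "business_motivation": 0.15,  # compelling business motivation
    "feasibility": 0.15,          # data feasibility
    "organizational": 0.10,       # other organizational issues
}


def readiness_score(ratings):
    """Return the weighted readiness score (0 to 1) for ratings given on a 0-to-1 scale."""
    missing = set(READINESS_WEIGHTS) - set(ratings)
    if missing:
        raise ValueError(f"missing ratings: {sorted(missing)}")
    return sum(READINESS_WEIGHTS[name] * ratings[name] for name in READINESS_WEIGHTS)


# Hypothetical assessment: everything is in place except a half-committed sponsor.
example = {
    "sponsor": 0.5,
    "business_motivation": 1.0,
    "feasibility": 1.0,
    "organizational": 1.0,
}
print(round(readiness_score(example), 2))
```

Because the sponsor carries 60% of the weight, a weak sponsor pulls the score down more than all other factors combined, which matches the emphasis the slide places on sponsorship.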
  15. 15. Defining the Project: the Two Phased Startup  Phase 1: Enterprise Requirements definition  Phase 2: Project requirements focused on top priority business process  [Diagram: (1) Enterprise (Horizontal): Initial Project Scope, Business Requirement Definition, Requirements Prioritization; (2) Project (Vertical): Project Planning, Business Process Requirement Definition; Project Management running underneath]
  16. 16. Planning the Project: the Phase 2 Detailed Project Plan  Assign roles and responsibilities  Leverage existing project planning tools  List end-to-end tasks for entire Lifecycle  Integrated and detailed  Key team members should develop estimates for their tasks  User acceptance after major tasks & deliverables  Keep unique characteristics in mind  Cross-functional, high visibility, iterative  Data problems will happen – identify them early!
  17. 17. Estimating Guidelines for Project Planning / Management  Phase 1 requirements and prioritization  Key determinants: Readiness / sponsor scenario  Rule of thumb: Three weeks to months +  Developing the project plan  Rule of thumb: Less than two weeks  On-going project management  Key determinants: Organizational complexities, # players, # issues, political realities, ...  Rule of thumb: Often dedicated to DW/BI team
  18. 18. Defining Business Requirements  [Lifecycle diagram]
  19. 19. Defining Business Requirements: Overall Process  Interviews are preferable  Three phases  Preparation (do your homework)  Interviews (including data source experts)  Documentation  Two passes  Enterprise  Project
  20. 20. Defining Business Requirements: the Interviews  Assign roles and be ready  Must ask the right question  NOT “What do you want?”  Ask “What do you do?” (“What are your roles and responsibilities? What could you do better with improved access to information?”)  Cover key areas and listen  Take notes  Debrief with team immediately after  Common themes / opportunities  Required data (business processes)  Do-ability  Areas requiring clarification  User analytical / technical sophistication
  21. 21. Defining Business Requirements: Interview Results  You must do the formal documentation  Validation  Reference material  Individual interview write-ups  Summary, not transcript  Business Objectives  Analytic opportunities and info requirements  Project Success Criteria  Consolidated findings document  Main content is list and descriptions of analytic opportunities  Includes the initial data warehouse bus matrix
  22. 22. The Data Warehouse Bus Matrix is the Enterprise Data Architecture Framework  Matrix of business processes and conformed dimensions  Business Processes (rows) by Dimensions (columns): Date, Product, Dist Ctr, Vendor, Shipper, Store, Customer, Promo  Purchase Orders: X X X X  Dist Ctr Delivery: X X X X X  Dist Ctr Inventory: X X X  Store Deliveries: X X X X X  Store Inventory: X X X  Store Sales: X X X X X  Returns: X X X X X
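A bus matrix like the one on slide 22 can be held as a simple mapping from business process to the set of dimensions it uses. The sketch below is an assumption-laden illustration: the process and dimension names come from the slide, but which dimensions each process uses is NOT recoverable from the transcript, so the marks in `BUS_MATRIX` are hypothetical, as are the helper functions.

```python
# HYPOTHETICAL marks: the transcript does not preserve which dimension
# each X on the slide belongs to, so these sets are for illustration only.
BUS_MATRIX = {
    "Purchase Orders": {"Date", "Product", "Vendor"},
    "Store Inventory": {"Date", "Product", "Store"},
    "Store Sales": {"Date", "Product", "Store", "Promo"},
}


def shared_dimensions(matrix):
    """Dimensions used by more than one process: the candidates for conforming."""
    counts = {}
    for dims in matrix.values():
        for dim in dims:
            counts[dim] = counts.get(dim, 0) + 1
    return sorted(dim for dim, n in counts.items() if n > 1)


def new_dimensions(matrix, built_processes, next_process):
    """Dimensions the next iteration must build, given the processes already delivered."""
    already_built = set().union(*(matrix[p] for p in built_processes))
    return sorted(matrix[next_process] - already_built)


print(shared_dimensions(BUS_MATRIX))
print(new_dimensions(BUS_MATRIX, ["Store Sales"], "Purchase Orders"))
```

Under these assumed marks, an iteration that follows Store Sales with Purchase Orders only needs one new dimension, which is the kind of reuse the conformed-dimension approach is meant to provide.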
  23. 23. Requirements Prioritization Session  Facilitated session with Business and IT management  Agenda:  Confirm requirements  Prioritize analytic info groups  Evaluate business impact / benefit  Evaluate feasibility  Outcomes:  Mgmt education on feasibility  “Right” opportunities  Consensus  Ownership / Sponsorship  Roadmap for growth  [Chart: opportunities A through H plotted by Potential Business Impact (Low to High) against Feasibility (Low to High)]
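The impact-versus-feasibility grid from slide 23 can be sketched as a small ranking routine. This is only one possible reading: the 0-to-10 scales, the midpoint, the quadrant labels, and the sample scores are all assumptions for illustration; the slide itself shows only lettered opportunities on a two-axis chart.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Opportunity:
    name: str
    impact: float       # potential business impact, 0 (low) to 10 (high); assumed scale
    feasibility: float  # feasibility, 0 (low) to 10 (high); assumed scale


def quadrant(opp, midpoint=5.0):
    """Place an opportunity in one of four quadrants of the grid (labels are hypothetical)."""
    high_impact = opp.impact >= midpoint
    high_feasibility = opp.feasibility >= midpoint
    if high_impact and high_feasibility:
        return "start here"
    if high_impact:
        return "high value, hard"
    if high_feasibility:
        return "easy, low value"
    return "defer"


def prioritize(opportunities):
    """Upper-right quadrant first, then by combined score, highest first."""
    return sorted(
        opportunities,
        key=lambda o: (quadrant(o) != "start here", -(o.impact + o.feasibility)),
    )


# Hypothetical scores for four of the lettered opportunities.
candidates = [
    Opportunity("D", 3, 3),
    Opportunity("C", 2, 9),
    Opportunity("B", 8, 4),
    Opportunity("A", 9, 8),
]
for opp in prioritize(candidates):
    print(opp.name, quadrant(opp))
```

In a real session the placement comes from facilitated discussion with business and IT management rather than from numeric scores; the code only mirrors the shape of the outcome.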
  24. 24. Defining Business Requirements Summary  Understanding business requirements is CRITICAL to a successful DW/BI system  Don’t overlook the up-front preparation  Focus on listening  Document what you’ve heard  Analytic requirements  Enterprise bus matrix  Prioritize with senior management
  25. 25. Designing the Dimensional Model  The Data Track  [Lifecycle diagram]
  26. 26. Designing the Business Process Dimensional Model  Basic dimensional modeling concepts  Slowly changing dimensions  The dimensional modeling process  Data profiling and data stewardship
  27. 27. Terminology: Business Process Dimensional Model (or Star Schema)  Normalized fact table (business event) for a single business process at atomic detail level (the grain)  Denormalized dimensions (entities/objects) with all attributes and one active row per occurrence of the object  Benefits:  Easier to understand  Better performance  Pre-joined dimensions  Star join optimization  Dimensional engine  Extensible to handle change  [Diagram: a central fact table holding Product KEY, Store KEY, Date KEY, Promo KEY, and Facts, joined to the Product, Store, Date, and Promo dimension tables, each with its KEY and Attributes]
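The star schema on slide 27 can be sketched with Python's built-in `sqlite3` module. The table names, column names, and sample rows below are assumptions made for this example (the slide names only the keys, "Attributes", and "Facts"), and the Promo dimension is left out to keep the sketch short.

```python
import sqlite3


def build_star_schema():
    """Create a tiny in-memory star schema: one fact table and three dimensions."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE dim_date    (date_key INTEGER PRIMARY KEY, full_date TEXT, month TEXT);
        CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, product_code TEXT, brand TEXT);
        CREATE TABLE dim_store   (store_key INTEGER PRIMARY KEY, store_name TEXT, region TEXT);

        -- Assumed grain: one row per product, per store, per day.
        CREATE TABLE fact_sales (
            date_key     INTEGER REFERENCES dim_date(date_key),
            product_key  INTEGER REFERENCES dim_product(product_key),
            store_key    INTEGER REFERENCES dim_store(store_key),
            unit_sales   INTEGER,
            dollar_sales REAL
        );

        INSERT INTO dim_date    VALUES (20121001, '2012-10-01', '2012-10');
        INSERT INTO dim_product VALUES (1, 'P-100', 'Acme'), (2, 'P-200', 'Zenith');
        INSERT INTO dim_store   VALUES (1, 'Downtown', 'West'), (2, 'Airport', 'East');
        INSERT INTO fact_sales  VALUES
            (20121001, 1, 1, 3, 30.0),
            (20121001, 1, 2, 1, 10.0),
            (20121001, 2, 1, 2, 50.0);
        """
    )
    return conn


def sales_by_brand_and_region(conn):
    """Join the fact table to its dimensions and aggregate by descriptive attributes."""
    return conn.execute(
        """
        SELECT p.brand, s.region, SUM(f.dollar_sales)
        FROM fact_sales AS f
        JOIN dim_product AS p ON p.product_key = f.product_key
        JOIN dim_store   AS s ON s.store_key   = f.store_key
        GROUP BY p.brand, s.region
        ORDER BY p.brand, s.region
        """
    ).fetchall()


conn = build_star_schema()
for row in sales_by_brand_and_region(conn):
    print(row)
```

The query shape is the point: every analysis joins the narrow fact table to a few wide, denormalized dimensions and groups by their attributes.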
  28. 28. Terminology: Slowly Changing Dimension  Techniques for handling changes to dimension attributes  Type 1: overwrite attribute values  Common default, appropriate for corrections  Type 2: create a new dimension row when attribute value changes  Flexible technique, critical for accurately tracking behavior over time  Hybrid combinations of 1 and 2 are most common  Integration Services has basic Slowly Changing Dimension management built in
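The difference between Type 1 and Type 2 handling on slide 28 can be shown on an in-memory dimension table. This is a minimal sketch, not how Integration Services implements it: the column names (`surrogate_key`, `business_key`, `row_start`, `row_end`, `is_current`) and both functions are assumptions made for this example.

```python
from datetime import date


def apply_type1(rows, business_key, attr, value):
    """Type 1: overwrite the attribute in place on every row for this business key."""
    for row in rows:
        if row["business_key"] == business_key:
            row[attr] = value


def apply_type2(rows, business_key, attr, value, effective):
    """Type 2: expire the current row and append a new row with a new surrogate key."""
    current = next(
        r for r in rows if r["business_key"] == business_key and r["is_current"]
    )
    new_row = dict(current)  # copy all attributes before expiring the old row
    current["is_current"] = False
    current["row_end"] = effective
    new_row.update(
        {
            attr: value,
            "surrogate_key": max(r["surrogate_key"] for r in rows) + 1,
            "row_start": effective,
            "row_end": None,
            "is_current": True,
        }
    )
    rows.append(new_row)
    return new_row


# Hypothetical customer dimension with one row.
dim_customer = [
    {
        "surrogate_key": 1,
        "business_key": "C-17",
        "name": "Pat Smth",
        "city": "Oakland",
        "row_start": date(2010, 1, 1),
        "row_end": None,
        "is_current": True,
    }
]

apply_type1(dim_customer, "C-17", "name", "Pat Smith")                  # a correction
apply_type2(dim_customer, "C-17", "city", "Seattle", date(2012, 10, 1))  # a tracked change
for row in dim_customer:
    print(row["surrogate_key"], row["name"], row["city"], row["is_current"])
```

Facts loaded before the move keep pointing at surrogate key 1 (Oakland), while later facts pick up key 2 (Seattle), which is what makes Type 2 suitable for tracking behavior over time.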
  29. 29. Dimensional Modeling Process  Develop the Data Warehouse Bus matrix  Start with the 4-step method to identify facts and dimensions  Step 1: Identify the business process (what row on the matrix should we start with?)  Step 2: Declare the grain  Step 3: Choose the dimensions  Step 4: Choose the facts  Diagram the dimensional model  Fill in the dimension and fact attributes (Step 5)  Use business requirements + source docs + data profiling  Follow naming standards (understandable to business)  Try the dimensional modeling spreadsheet from the book’s web site: http://www.kimballgroup.com/html/booksMDWTtools.html
  30. 30. Creating Conformed Dimensions (Step 5)  All fact tables that share dimensions must use the same dimension with the same key  Agree on column names and definitions  Identify best source  Assign surrogate key to every dimension row  Combine all attributes into Master dimension table  Use the Master dimension to map the business key in the fact rows to the surrogate key for each business process that uses the dimension  [Diagram: Product master dimension with Surrogate Key (Product KEY), Business Key (Product Code), Description, Marketing attributes (Brand, Category), Logistics attributes (Height, Width, Weight), and Cost Acctg. attributes (Standard Cost)]
  31. 31. Dealing with Data Quality  Data Profiling  Data exploration to determine data feasibility  Understand data structures, relationships and business rules  Identify (and document) data problems  Tools  Simple: SQL, BI tool, RS project (see kimballgroup.com)  Advanced: Data Profiling tool  Data Stewardship  Identify people on the business side who care  Enroll them in data exploration  Include source systems managers  Agree on names, definitions, business rules, etc.
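The "simple" end of the profiling tools on slide 31 amounts to counting rows, nulls, and distinct values per column. A minimal sketch of that idea follows; the function `profile_column`, the treatment of empty strings as missing, and the sample rows are assumptions for illustration, not part of any tool named on the slide.

```python
def profile_column(rows, column):
    """Return basic profile statistics for one column of a list of dict rows."""
    values = [row.get(column) for row in rows]
    # Assumption: empty strings count as missing, as they often do in source extracts.
    present = [v for v in values if v is not None and v != ""]
    return {
        "rows": len(values),
        "nulls": len(values) - len(present),
        "distinct": len(set(present)),
        "min": min(present) if present else None,
        "max": max(present) if present else None,
    }


# Hypothetical extract from a product source table.
source_rows = [
    {"product_code": "P-100"},
    {"product_code": "P-200"},
    {"product_code": ""},
    {"product_code": "P-100"},
    {"product_code": None},
]
print(profile_column(source_rows, "product_code"))
```

Results like a high null count or far fewer distinct values than rows in a supposed key column are the data problems worth documenting and raising with the data stewards early.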
  32. 32. Dimensional Modeling Summary  Enterprise perspective / roadmap  Enterprise Data Warehouse Bus Matrix  Presentation area must be dimensional  Ease of use  Query performance  Start with atomic detail, not just summary  Conform dimensions for consistency  Apply SCD techniques for handling attribute changes  Engage business to define names, content, business rules, and deal with data quality  Process  4-step approach  1) Business process, 2) grain, 3) dimensions, 4) facts  Fill in the attributes and measures (Step 5)
  33. 33. Designing the DW/BI System Architecture  The Technology Track  [Lifecycle diagram]
  34. 34. Architecture Principles  The DW/BI System architecture is the set of components and functionality needed to meet the business requirements  Business requirements determine architecture  Most of the tools include only core functionality. You will have to write code for your specific issues.  This means your DW/BI system architecture will not be the same as your neighbor’s  Draw it out and write it down!
  35. 35. The Goal: A Conformed DW/BI System  [Diagram: Source Systems (Logistics, Inventory, Orders, Billing, Returns, Marketing) feed the ETL System (Dimension Processing, Fact Processing), which populates the Business Process Dimensional Models (Inventory, Sales, Orders, Billing / Returns) held in the Relational DBMS and Analysis Services, with Aggregates in Analysis Svcs. or the Relational DB]  Models contain atomic-level detail  with aggregates for performance  and transparent aggregate navigation  Includes both relational dimensional model and OLAP dimensional model
  36. 36. What Goes Into a Typical Warehouse Architecture?  [Diagram: High Level Warehouse Technical Architecture Model. Back Room: Source Systems (Operational Systems, ODS, XML / Flat files, MDM system, External), ETL Services (Extract, Cleaning & Conforming, Preparing for Presentation, Delivery), ETL Management Services, Data Quality Workbench, Dimension Maintenance, ETL data stores. Presentation Server: atomic level business process dimensional models (fact tables surrounded by their dimensions), Aggregates for Performance, Enterprise Bus Matrix, Aggregate Navigator, Real Time Layer. Front Room: BI Management Services, Data Access Services, BI Portal, and BI Applications (direct access query and reporting tools, Standard Reports, Analytic Applications, Dashboards & Scorecards, Operational BI, Performance Mgmt, Desktop tools, Web Applications). Metadata, Security Mgmt., and Infrastructure span the whole architecture]
  37. 37. Sample Architecture Plan Document Outline  Executive Overview  Architecture Implications of Business Requirements  Architecture Overview  Back Room and Front Room Services  Data Stores (Source, Staging, Presentation Servers)  Metadata Strategy  ETL System Strategy and Details  BI Applications System Strategy and Details  Infrastructure  Architecture Implementation Phases and Timing  Technology Evaluation Process  Architecture Models
  38. 38. The ETL System  Populating the data warehouse  [Lifecycle diagram]
  39. 39. ETL Startup  Create an ETL Plan  Based on dimensional model docs, data quality, and additional research  Map source tables to each target and identify required transformations  Each target flow corresponds to an ETL package  Set up development environment
  40. 40. The ETL Functions  Understand the core functions common to most ETL systems (there are 34 of them)  They fall into four categories:  Extract: get the data out of the source and into the DW system  Transformation: clean the data and conform it to standard definitions and contents  Prepare the data for presentation: “dimensionalization”  Manage all the above functions in a coherent system
  41. 41. Populating Dimension Tables  Recreating Type 2 change history can be a challenge  Cleaning and conforming can be complex  Integrating multiple sources and de-duplicating is a process unique to your business  Integration Services’ tools including Fuzzy Lookup can help for simple problems  Complex problems require a third party tool or service  Universal dimension function is handling changes in dimension attributes (SCDs)
  42. 42. Slowly Changing Dimensions  Dimension attributes will change over time  Business users determine what must be tracked  [Diagram: rows from the source file are compared with the current rows of the master dimension. (1) New source rows: assign surrogate keys, then simple insert into the master dimension. (2) Type 2 changed rows: assign surrogate keys, insert new current rows, and update Current_Flags and dates. (3) Type 1 changed rows: update the value in the current row. No change: ignore. Optionally, replace the Most-Recent-Key Map with the current rows]
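The compare step in the flow on slide 42 routes each incoming source row to one of four paths. A minimal sketch of that routing follows; which attributes are tracked as Type 1 or Type 2 is a business decision, so the sets `TYPE1_ATTRS` and `TYPE2_ATTRS`, the function name, and the sample rows are all assumptions for illustration.

```python
# Hypothetical tracking rules; on a real project business users decide these.
TYPE1_ATTRS = {"name"}  # overwrite in place
TYPE2_ATTRS = {"city"}  # keep history with a new row


def classify(source_row, current_by_key):
    """Route a source row: 'insert', 'type2', 'type1', or 'unchanged'."""
    current = current_by_key.get(source_row["business_key"])
    if current is None:
        return "insert"  # path 1: new source row
    if any(source_row[a] != current[a] for a in TYPE2_ATTRS):
        # Path 2. The new row also carries any changed Type 1 values,
        # so a combined change is treated as Type 2 in this sketch.
        return "type2"
    if any(source_row[a] != current[a] for a in TYPE1_ATTRS):
        return "type1"  # path 3: update the current row
    return "unchanged"  # no change: ignore


# Current rows of the master dimension, indexed by business key.
current_rows = {
    "C-17": {"business_key": "C-17", "name": "Pat Smith", "city": "Oakland"},
}

incoming = [
    {"business_key": "C-99", "name": "Lee Wong", "city": "Fresno"},
    {"business_key": "C-17", "name": "Pat Smith", "city": "Seattle"},
    {"business_key": "C-17", "name": "Patricia Smith", "city": "Oakland"},
    {"business_key": "C-17", "name": "Pat Smith", "city": "Oakland"},
]
for row in incoming:
    print(row["business_key"], classify(row, current_rows))
```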
  43. 43. Populating Fact Tables  Populate the initial historical load  Different source systems, data structures, formats over time  History missing  Must build historical Slowly Changing Dimensions first  Can take a long time  Develop incremental load logic  Usually different packages from the historical load  Push vs. Pull (extract ownership)  Identify new / changed rows  Key substitution is big task  Catch up history and start incremental loads  Validate data at each step
  44. 44. Fact Table ETL: Surrogate Key Lookup (Pipeline)  Replace production keys in the fact table extract with surrogate keys from the dimensions  Maintain and ensure referential integrity!  Watch for fact table key collisions  [Diagram: fact table records with production IDs (time_ID, product_ID, store_ID, promotion_ID, dollar_sales, unit_sales, dollar_cost) pass through four lookups against the Most Recent Key Maps for Time, Product, Store, and Promotion, each replacing a production ID with its surrogate key; Referential Integrity Failures and Key Collisions are shown as separate outputs; the resulting fact table records with surrogate keys (time_key, product_key, store_key, promotion_key, dollar_sales, unit_sales, dollar_cost) are loaded into the DBMS]
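The surrogate key pipeline on slide 44 can be sketched as a lookup against one key map per dimension, with rows that fail a lookup set aside as referential integrity failures. The column names follow the slide's diagram; the function `substitute_keys`, the shape of `key_maps`, and the sample data are assumptions for this example, and key-collision checks are left out.

```python
def substitute_keys(fact_rows, key_maps):
    """Replace production IDs with surrogate keys.

    key_maps maps each production ID column to a pair:
    (surrogate key column, {production ID: surrogate key}).
    Returns (rows ready to load, [(original row, columns that failed lookup)]).
    """
    loaded, rejected = [], []
    for row in fact_rows:
        out = dict(row)
        failed = []
        for id_column, (key_column, mapping) in key_maps.items():
            production_id = out.pop(id_column)
            if production_id in mapping:
                out[key_column] = mapping[production_id]
            else:
                failed.append(id_column)  # referential integrity failure
        if failed:
            rejected.append((row, failed))
        else:
            loaded.append(out)
    return loaded, rejected


# Hypothetical "most recent key" maps for two of the four dimensions.
key_maps = {
    "product_ID": ("product_key", {"P-100": 1}),
    "store_ID": ("store_key", {"S-9": 7}),
}
fact_extract = [
    {"product_ID": "P-100", "store_ID": "S-9", "dollar_sales": 30.0},
    {"product_ID": "P-999", "store_ID": "S-9", "dollar_sales": 5.0},
]
loaded, rejected = substitute_keys(fact_extract, key_maps)
print(loaded)
print(rejected)
```

Setting failed rows aside rather than dropping them silently keeps the load auditable; what to do with them (suspend, assign an "unknown" dimension row, or fix at the source) is a design decision the slide does not settle.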
  45. 45. ETL System Summary  Develop a plan and set up development environment  Build out historical dimensions, including Type 2 attribute changes  Build out historical facts based on historical dimension key substitution  Design and build Analysis Services cube(s)  Create incremental load packages
  46. 46. BI Applications  Delivering Value, not just data  [Lifecycle diagram]
  47. 47. BI Application Concepts  Role and definition  Application design  Templates  Navigation  Applications development  Additional Value  Data validation  Performance tuning  Character development
  48. 48. Role and Definition of BI Applications  [Table: Nature of Use, Consumer Type, Information Interface. Strategic: ad hoc power users, desktop tools for do-it-yourself queries. In between: push-button knowledge workers, BI Applications. Operational: operational report consumers, Standard Operational Reporting Environment. A Migration Path links each level to the next]  Value of BI Appl’n:  Reporting / Analysis examples  Assured reference points  Low effort  Current business view  Flexible
  49. 49. The BI Application Continuum  Standard Reports: simple, fixed format, pre-run reports  Standard, flexible format, parameter driven reports  Analytic Applications: complex analytic applications with domain expertise, embedded algorithms and operational system feedback loops
  50. 50. Design and Spec BI Applications  Create and prioritize a candidate report list  Develop a standard report template  Mandatory content (descriptions, titles, etc.)  Output look and feel  Develop mock-ups for top N candidate BI applications  Report/dashboard layouts with parameters  Document data sets, business rules, calculations, etc.  End user navigation  Structured path through templates and reports  This becomes the core of the BI Portal
  51. 51. Sample BI Application Mock-up  [Mock-up: Global BI System, “We’re here to help”. Topline Performance Report for <Geography Name> (from the Geography dimension), <Period> Compared to <Previous Period> (variable time periods). Columns: <<Product Line>>, Sales Units, YA Sales Units, Sales Index, Market Share, % Var Prev Share. Rows: product lines that meet the constraint criteria (may have drill down capability), with placeholder values (xxx,xxx, xxx,xxx, xx.x, xx.xx, x.x). Report Information: Report Category: {Sales Analysis}; Report Name: {Topline Performance Report – current vs. prior period by Geography}; Source: {DW - Sales Performance}; Run on: {Run_Date}; Page {1}]
  52. 52. Sample Navigation Design  [Diagram: BI Portal with top-level areas Marketing, Sales, Purchasing, Manufacturing; under Sales: Sales Activity, Pipeline and Forecasting, Sales Force Management, Web Channel Sales; report templates: Product Topline, Product Trend, Market Topline, Market Trend]
  53. 53. Adventure Works Cycles DW/BI Portal  Starting Point
  54. 54. Developing BI Applications  Begin development when...  Data is ready  Front end tool installed and environment set-up  Pull out report specifications  Build reports  Validate tool calculations and drill-paths  Validation data  Performance Tuning  Ongoing maintenance and enhancement resources  Character Development
  55. 55. BI Applications Summary  BI Applications play a critical role  High value reporting  Broad audience  Data Warehouse team learning opportunity  Design Application standards, specs and navigation up front  Develop Applications when data is ready  Validate tool capabilities  Check data quality  Identify query performance problems  Make sure you have dedicated resources to maintain and enhance your BI offerings
  56. 56. Rollout and Repeat  Security, Deployment, Operations, and Growth  [Lifecycle diagram]
  57. 57. The Thankless Tasks that Must Be Done  Security  Deployment process  User support  Training, desktop, support, documentation  System deployment  Dev  Test  Production  Maintenance  System  User support
  58. 58. Growth  The Lifecycle is an iterative process  Revisit opportunities with business and select the next top priority  Build additional dimensions  Load facts for this business process (fill out the bus matrix row by row)  Build and deliver the BI applications  Rollout and repeat!
  59. 59. Start the Business Dimensional Lifecycle All Over Again!  [Lifecycle diagram]
  60. 60. Conclusion  The Lifecycle is a proven DW/BI methodology  Keys to success:  Business value focused  Short, iterative delivery cycles  In an Enterprise framework  Full, end-to-end solution
  61. 61. Contact Info  warren@kimballgroup.com  Visit www.kimballgroup.com for  Articles  Design tips (149 and counting)  Whitepapers  Forum  All of the concepts discussed are expanded on in the Kimball Toolkit series of books
  62. 62. THANK YOU for attending this session and the 2009 PASS Summit in Seattle
