THE DW/BI SYSTEM LIFECYCLE
OVERVIEW
 The Kimball Approach




     Warren Thornthwaite
Acknowledgments

           Course materials adapted from...
             The Microsoft Data Warehouse Toolkit
               J. Mundy, W. Thornthwaite (Wiley 2006)
             The Data Warehouse Lifecycle Toolkit, 2nd Ed.
               R. Kimball, M. Ross, W. Thornthwaite,
                J. Mundy, B. Becker (Wiley 2008)
             The Data Warehouse Toolkit, 2nd Ed.
               R. Kimball, M. Ross (Wiley 2002)
             Kimball University
               Course materials
               Design Tips and Intelligent Enterprise articles at
                www.KimballUniversity.com


Acknowledgments

       Course materials adapted from...
         The Data Warehouse Lifecycle Toolkit, 2nd Ed.
            R. Kimball, M. Ross, W. Thornthwaite,
             J. Mundy, B. Becker (Wiley 2008)
         The Data Warehouse Toolkit, 2nd Ed.
            R. Kimball, M. Ross (Wiley 2002)
         The Microsoft Data Warehouse Toolkit, 2nd Ed.
            J. Mundy, W. Thornthwaite (Wiley 2011)
         Kimball Group / Kimball University
            Course materials
            Design Tips and Intelligent Enterprise articles at
             www.KimballGroup.com
Session Agenda

       DW/BI Lifecycle business context
       DW/BI System Lifecycle overview
       Planning and managing the project/program
       Defining business requirements
       Creating the dimensional model (the data track)
       Designing the DW/BI system architecture (the technology track)
       Building the ETL system
       Building BI applications (the applications track)
       Rollout and repeat



The DW/BI Lifecycle Context
The Business Context to the Lifecycle

     Business people need information to make plans and assess
      results, and this need continues to grow
     Data is captured by complex systems structured to support
      specific transaction requirements
     Business people find it difficult to get business information from
      data in transaction systems

        Therefore, our job is to create a system that will:
          reliably take data out of the source systems,
          restructure its form and content as appropriate for business analysis,
          and provide it to the business people via tools they can actually use.




Strengthen Your Awareness of the
    Broader Context
     Ask yourself “What am I doing?”
         Writing a program
         Building a database
         Creating a DW/BI system
         Solving a set of high value, difficult problems
     The broader you think, the more effective you will be
     in addressing business problems and delivering real
     business value.


DW/BI System Lifecycle Overview
The Kimball Approach

       Understand business requirements and deliver business value
       Follow a proven methodology: the DW Lifecycle
       Build and deliver incrementally (by business process) within an
        enterprise data framework (Bus Matrix and conformed dimensions)
       Design the data sets for flexibility, usability and performance
        (Business Process Dimensional Model)
       Provide the complete solution, including reports, query tools, portals,
        documentation, training, and support




Business Requirements Are the
Foundation of Success
 The more you focus your efforts on information-based
  business opportunities that are high value and
  relatively easy to implement, the more likely you will
  be to succeed.


 Regardless of which ETL or BI tools you use
 Regardless of which database you use
 Regardless of your technical skills
The DW/BI Lifecycle


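     [Figure: the Kimball Lifecycle diagram]
       Project Planning leads to Business Requirements Definition, which feeds three parallel tracks:
         Technology track: Technical Architecture Design, then Product Selection & Installation
         Data track: Dimensional Modeling, then Physical Design, then ETL Design & Development
         Applications track: BI Application Specification, then BI Application Development
       The three tracks converge at Deployment, followed by Growth and Maintenance
       Project Management spans the entire Lifecycle
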
Planning and Managing the
Project/Program
Project Planning & Management
     Highlights
        Assess readiness and determine starting point
        Define the program / project – (2 phased startup)
          Phase 1 program level: Enterprise business requirements
          Prioritization / Business justification
          Phase 2 project scope: Initial business process lifecycle iteration
        Plan the project
          Team roles and responsibilities
          Detailed project plan
        Manage the project
          Control scope creep
          Communication to manage expectations




Data Warehouse
     Readiness Factors
     1. Strong business management sponsor(s)                    (60%)
          Vision of value, Politically capable, Realistic
     2. Compelling business motivation                           (15%)
          Generates urgency and supplies justification
     3. Feasibility                                              (15%)
          Data feasibility
     4. Other organizational Issues                              (10%)
          IT/Business partnership
          Current analytic culture

      Requirements definition and prioritization are best tools to
       address shortfalls
      Proof of concept demo is generally a bad idea
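
One way to read the percentages above is as relative weights in a readiness score. The sketch below takes that reading; the factor names, the 0-to-1 rating scale, and the function itself are illustrative assumptions, not part of the Lifecycle method.

```python
# Weights come from the readiness factors slide; everything else is a
# hypothetical illustration of how they could be combined.
READINESS_WEIGHTS = {
    "sponsor": 0.60,       # strong business management sponsor(s)
    "motivation": 0.15,    # compelling business motivation
    "feasibility": 0.15,   # data feasibility
    "organization": 0.10,  # other organizational issues
}


def readiness_score(ratings):
    """Return a weighted score in [0, 1] from per-factor ratings in [0, 1]."""
    missing = set(READINESS_WEIGHTS) - set(ratings)
    if missing:
        raise ValueError(f"missing ratings for: {sorted(missing)}")
    for name in READINESS_WEIGHTS:
        if not 0.0 <= ratings[name] <= 1.0:
            raise ValueError(f"rating for {name!r} must be between 0 and 1")
    return sum(READINESS_WEIGHTS[name] * ratings[name] for name in READINESS_WEIGHTS)
```

With these weights, a project that rates perfectly on everything except sponsorship still scores only 0.4, which matches the emphasis the slide places on the sponsor.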



Defining the Project:
     the Two Phased Startup
    Phase 1: Enterprise Requirements definition
    Phase 2: Project requirements focused on top priority
                               business process

     [Figure: the two-phased startup]
       1. Enterprise (horizontal): Initial Project Scope, then Business Requirement Definition,
          then the Requirements Prioritization Process
       2. Project (vertical): Project Planning, then Business Requirement Definition
       Project Management underlies the flow

Planning the Project:
     the Phase 2 Detailed Project Plan
      Assign roles and responsibilities
      Leverage existing project planning tools
      List end-to-end tasks for entire Lifecycle
          Integrated and detailed
          Key team members should develop estimates for their
           tasks
          User acceptance after major tasks & deliverables
        Keep unique characteristics in mind
          Cross-functional, high visibility, iterative
          Data problems will happen – identify them early!

Estimating Guidelines for
     Project Planning / Management
      Phase 1 requirements and prioritization
        Key determinants: Readiness / sponsor scenario
        Rule of thumb: Three weeks to months or more
      Developing the project plan
        Rule of thumb: Less than two weeks
      Ongoing project management
        Key determinants: Organizational complexities, number of
         players, number of issues, political realities, ...
        Rule of thumb: Often a project manager dedicated to the DW/BI team

Defining Business Requirements

Defining Business Requirements:
     Overall Process

      Interviews are preferable
      Three phases
        Preparation (do your homework)
        Interviews (including data source experts)
        Documentation
      Two passes
        Enterprise
        Project


Defining Business Requirements: the
     Interviews
        Assign roles and be ready
        Must ask the right question
          NOT “What do you want?”
          Ask “What do you do?”
           (“What are your roles and responsibilities? What could you do better
           with improved access to information?”)
        Cover key areas and listen
        Take notes
        Debrief with team immediately after
            Common themes / opportunities
            Required data (business processes)
            Do-ability
            Areas requiring clarification
            User analytical / technical sophistication

Defining Business Requirements:
     Interview Results
        You must do the formal documentation
          Validation
          Reference material

        Individual interview write-ups
          Summary, not transcript
             Business Objectives
             Analytic opportunities and info requirements
             Project Success Criteria

        Consolidated findings document
          Main content is list and descriptions of analytic opportunities
          Includes the initial data warehouse bus matrix



The Data Warehouse Bus Matrix is the
      Enterprise Data Architecture Framework
       Matrix of business processes and conformed dimensions

     Business Process      Date  Product  Dist Ctr  Vendor  Shipper  Store  Customer  Promo
     Purchase Orders        X       X        X        X
     Dist Ctr Delivery      X       X        X        X        X
     Dist Ctr Inventory     X       X        X
     Store Deliveries       X       X        X                 X       X
     Store Inventory        X       X                                  X
     Store Sales            X       X                                  X       X        X
     Returns                X       X                                  X       X        X
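
The matrix can also be held as a simple data structure, which makes it easy to ask which business processes share a dimension. This is a minimal sketch: the cell placement is read from the column positions on the slide, and the helper function names are hypothetical.

```python
# Bus matrix as a mapping from business process to the dimensions it uses.
# Rows and columns are transcribed from the slide.
BUS_MATRIX = {
    "Purchase Orders": {"Date", "Product", "Dist Ctr", "Vendor"},
    "Dist Ctr Delivery": {"Date", "Product", "Dist Ctr", "Vendor", "Shipper"},
    "Dist Ctr Inventory": {"Date", "Product", "Dist Ctr"},
    "Store Deliveries": {"Date", "Product", "Dist Ctr", "Shipper", "Store"},
    "Store Inventory": {"Date", "Product", "Store"},
    "Store Sales": {"Date", "Product", "Store", "Customer", "Promo"},
    "Returns": {"Date", "Product", "Store", "Customer", "Promo"},
}


def processes_using(dimension):
    """List the business processes (matrix rows) that use a dimension."""
    return [process for process, dims in BUS_MATRIX.items() if dimension in dims]


def shared_dimensions(process_a, process_b):
    """Return the dimensions two processes must conform on."""
    return BUS_MATRIX[process_a] & BUS_MATRIX[process_b]
```

For example, `shared_dimensions("Store Sales", "Dist Ctr Inventory")` shows which dimensions have to be conformed before the two processes can be compared in one report.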




Requirements Prioritization Session
      Facilitated session with Business and IT management
      Agenda:
          Confirm requirements
          Prioritize analytic info groups
              Evaluate business impact / benefit
              Evaluate feasibility
      Outcomes:
          Mgmt education on feasibility
          “Right” opportunities
          Consensus
          Ownership / Sponsorship
          Roadmap for growth
      [Figure: prioritization grid placing opportunities A through H by Potential Business Impact
       (low to high, vertical axis) against Feasibility (low to high, horizontal axis)]

Defining Business Requirements
     Summary
      Understanding business requirements is CRITICAL to
       a successful DW/BI system
      Don’t overlook the up-front preparation
      Focus on listening
      Document what you’ve heard
        Analytic requirements
        Enterprise bus matrix
      Prioritize with senior management

Designing the Dimensional Model
The Data Track
Designing the Business
     Process Dimensional Model
      Basic dimensional modeling concepts
      Slowly changing dimensions
      The dimensional modeling process
      Data profiling and data stewardship




Terminology: Business Process
     Dimensional Model (or Star Schema)
        Normalized fact table (business event) for a single business
         process at atomic detail level (the grain)
        Denormalized dimensions (entities/objects) with all attributes
         and one active row per occurrence of the object
        Benefits:
          Easier to understand
          Better performance
             Pre-joined dimensions
             Star join optimization
             Dimensional engine
          Extensible to handle change
        [Figure: star schema with a central fact table (Product KEY, Date KEY, Store KEY,
         Promo KEY, Facts) joined to Product, Date, Store, and Promo dimensions, each
         with its own key and attributes]
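
To make the star shape concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table names, column names, and sample rows are all hypothetical, and the Promo dimension is left out to keep the example short.

```python
import sqlite3


def build_star(conn):
    """Create a tiny star schema: one fact table keyed to three dimensions."""
    conn.executescript("""
        CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, product_name TEXT, brand TEXT);
        CREATE TABLE dim_store   (store_key   INTEGER PRIMARY KEY, store_name TEXT, region TEXT);
        CREATE TABLE dim_date    (date_key    INTEGER PRIMARY KEY, calendar_date TEXT);
        -- Grain: one row per product, per store, per day.
        CREATE TABLE fact_sales (
            date_key     INTEGER REFERENCES dim_date(date_key),
            product_key  INTEGER REFERENCES dim_product(product_key),
            store_key    INTEGER REFERENCES dim_store(store_key),
            unit_sales   INTEGER,
            dollar_sales REAL
        );
        INSERT INTO dim_product VALUES (1, 'Widget', 'Acme'), (2, 'Gadget', 'Acme'), (3, 'Gizmo', 'Zenith');
        INSERT INTO dim_store   VALUES (1, 'Downtown', 'East'), (2, 'Airport', 'West');
        INSERT INTO dim_date    VALUES (20240101, '2024-01-01'), (20240102, '2024-01-02');
        INSERT INTO fact_sales  VALUES
            (20240101, 1, 1, 2, 20.0),
            (20240101, 3, 2, 1, 15.0),
            (20240102, 2, 1, 3, 45.0),
            (20240102, 1, 2, 1, 10.0);
    """)


def sales_by_brand(conn):
    """Constrain and group by a dimension attribute, summing a fact."""
    return conn.execute("""
        SELECT p.brand, SUM(f.dollar_sales)
        FROM fact_sales AS f
        JOIN dim_product AS p ON p.product_key = f.product_key
        GROUP BY p.brand
        ORDER BY p.brand
    """).fetchall()


conn = sqlite3.connect(":memory:")
build_star(conn)
```

Every business question follows the same pattern: join the fact table to one or more dimensions, filter or group on dimension attributes, and aggregate the facts.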


Terminology:
     Slowly Changing Dimension
        Techniques for handling changes to dimension attributes
          Type 1: overwrite attribute values
            Common default, appropriate for corrections
          Type 2: create a new dimension row when attribute value
           changes
            Flexible technique, critical for accurately tracking behavior over time
          Hybrid combinations of 1 and 2 are most common
          Integration Services has basic Slowly Changing Dimension
           management built in
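
The difference between the two techniques can be sketched in a few lines. This is an illustration only, not the Integration Services implementation; the row layout (`surrogate_key`, `business_key`, `row_start`, `row_end`, `is_current`) and the function names are assumptions.

```python
from datetime import date


def apply_type1(rows, business_key, attribute, new_value):
    """Type 1: overwrite the value in place. No history is kept."""
    for row in rows:
        if row["business_key"] == business_key:
            row[attribute] = new_value


def apply_type2(rows, business_key, attribute, new_value, effective):
    """Type 2: expire the current row and add a new current row with a new surrogate key."""
    current = next(
        r for r in rows if r["business_key"] == business_key and r["is_current"]
    )
    current["is_current"] = False
    current["row_end"] = effective
    new_row = dict(
        current,
        surrogate_key=max(r["surrogate_key"] for r in rows) + 1,
        row_start=effective,
        row_end=None,
        is_current=True,
    )
    new_row[attribute] = new_value
    rows.append(new_row)
    return new_row
```

After a Type 2 change, older fact rows keep pointing at the expired dimension row, which is what preserves an accurate picture of behavior over time.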




Dimensional Modeling Process

        Develop the Data Warehouse Bus matrix
        Start with the 4-step method to identify facts and dimensions
          Step 1: Identify the business process (what row on the matrix should we
           start with?)
          Step 2: Declare the grain
          Step 3: Choose the dimensions
          Step 4: Choose the facts
      Diagram the dimensional model
      Fill in the dimension and fact attributes (Step 5)
          Use business requirements + source docs + data profiling
          Follow naming standards (understandable to business)
          Try the dimensional modeling spreadsheet from the book’s web site:
           http://www.kimballgroup.com/html/booksMDWTtools.html



Creating Conformed Dimensions
     (Step 5)
      All fact tables that share dimensions must use the same
       dimension with the same key
      Agree on column names and definitions
      Identify best source
      Assign surrogate key to every dimension row
      Combine all attributes into Master dimension table
      Use the Master dimension to map the business key in the fact rows
       to the surrogate key for each business process that uses the dimension
      [Figure: Product master dimension. Surrogate key: Product KEY. Business key:
       Product Code. Attributes: Description; Marketing: Brand, Category;
       Logistics: Height, Width, Weight; Cost Acctg.: Standard Cost]
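
A minimal sketch of the master dimension idea follows: attributes from several sources merge into one row per business key, and each row gets a surrogate key that fact loads can look up. The class and method names are hypothetical, and attribute change tracking (SCD handling) is deliberately ignored here.

```python
from itertools import count


class MasterDimension:
    """One row per business key, with attributes merged from every source."""

    def __init__(self):
        self._next_key = count(1)  # surrogate keys are meaningless integers
        self.rows = {}             # surrogate key -> attribute dict
        self.key_map = {}          # business key -> surrogate key

    def upsert(self, business_key, **attributes):
        """Merge attributes into the master row and return its surrogate key."""
        if business_key not in self.key_map:
            surrogate = next(self._next_key)
            self.key_map[business_key] = surrogate
            self.rows[surrogate] = {"product_code": business_key}
        surrogate = self.key_map[business_key]
        self.rows[surrogate].update(attributes)
        return surrogate
```

Because every business process looks up the same `key_map`, fact tables for different processes end up carrying the same surrogate key for the same product, which is what makes the dimension conformed.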

Dealing with Data Quality

Data Profiling
 Data exploration to determine data feasibility
    Understand data structures, relationships and business rules
    Identify (and document) data problems
 Tools
    Simple: SQL, BI tool, RS project (see kimballgroup.com)
    Advanced: Data Profiling tool

Data Stewardship
 Identify people on the business side who care
 Enroll them in data exploration
 Include source systems managers
 Agree on names, definitions, business rules, etc.
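
The "simple SQL" end of data profiling can be as small as one query per column. The sketch below uses Python's built-in sqlite3 module as a stand-in for the source database; the function name and the returned fields are assumptions chosen for illustration.

```python
import sqlite3


def profile_column(conn, table, column):
    """Return row count, null count, distinct count, min, and max for one column."""
    # Identifiers cannot be bound as query parameters, so they are quoted here.
    # Only pass table and column names you trust.
    sql = (
        f'SELECT COUNT(*), SUM("{column}" IS NULL), COUNT(DISTINCT "{column}"), '
        f'MIN("{column}"), MAX("{column}") FROM "{table}"'
    )
    total, nulls, distinct, low, high = conn.execute(sql).fetchone()
    return {
        "rows": total,
        "nulls": nulls or 0,  # SUM returns NULL on an empty table
        "distinct": distinct,
        "min": low,
        "max": high,
    }
```

A distinct count that is higher than expected (for example, both 'CA' and 'ca' in a state column) is the kind of data problem worth documenting and taking to the data stewards.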
Dimensional Modeling Summary

        Enterprise perspective / roadmap
          Enterprise Data Warehouse Bus Matrix
        Presentation area must be dimensional
          Ease of use
          Query performance
        Start with atomic detail, not just summary
        Conform dimensions for consistency
          Apply SCD techniques for handling attribute changes
          Engage business to define names, content, business rules, and deal with data
           quality
        Process
          4-step approach
          1) Business process, 2) grain, 3) dimensions, 4) facts
          Fill in the attributes and measures (Step 5)

Designing the DW/BI System
     Architecture
The Technology Track


Architecture Principles

      The DW/BI System architecture is the set of
       components and functionality needed to meet the
       business requirements
      Business requirements determine architecture
      Most of the tools include only core functionality. You
       will have to write code for your specific issues.
      This means your DW/BI system architecture will not
       be the same as your neighbor’s
      Draw it out and write it down!
The Goal: A Conformed DW/BI System
      [Figure: source systems (Inventory, Orders, Billing / Returns) feed the ETL system
       (dimension processing, fact processing, aggregates in Analysis Services and the
       relational DB), which populates the business process dimensional models
       (Inventory, Orders, Billing, Returns) in a relational DBMS and Analysis Services,
       serving Logistics, Sales, and Marketing]
      Models contain atomic-level detail
       with aggregates for performance
       and transparent aggregate navigation
      Includes both relational dimensional model and OLAP dimensional model
What Goes Into a Typical
  Warehouse Architecture?
        High Level Warehouse Technical Architecture Model
        [Figure: architecture model, with Metadata spanning the top and Infrastructure the bottom]
          Source Systems: Operational, ODS, Desktop tools, XML / Flat files, MDM system, External, ...
          Back Room
            ETL Services: Extract, Cleaning & Conforming, Delivery (Preparing for Presentation),
             ETL Management
            Other components: Dimension Maintenance Front End, Data Quality Workbench,
             ETL Storage, Metadata Repository
          Presentation Server Layer
            Atomic level business process dimensional models (fact tables with shared
             dimensions), organized by the Enterprise Bus Matrix
            Aggregates for Performance, reached through the Aggregate Navigator
            Real Time Layer
          Front Room
            Data Access Services: Operational BI and Performance Mgmt Services, Enterprise
             Reporting Services, Admin. Services, Web Services, Security Services,
             Metadata Mgmt. and Browsing
            BI Applications (BI Portal): Direct access query and reporting tools, Standard
             Reports, Analytic Applications, Dashboards & Scorecards, Operational BI
            Operational Systems and Reports

Sample Architecture Plan Document
     Outline
        Executive Overview
        Architecture Implications of Business Requirements
        Architecture Overview
          Back Room and Front Room Services
          Data Stores (Source, Staging, Presentation Servers)
        Metadata Strategy
        ETL System Strategy and Details
        BI Applications System Strategy and Details
        Infrastructure
        Architecture Implementation Phases and Timing
        Technology Evaluation Process

        Architecture Models
The ETL System
Populating the data warehouse

ETL Startup

      Create an ETL Plan
        Based on dimensional model docs, data quality, and
         additional research
        Map source tables to each target and identify required
         transformations
        Each target flow corresponds to an ETL package
      Set up development environment




The ETL Functions

      Understand the core functions common to most ETL
      systems (there are 34 of them)
      They fall into four categories:
        Extract: get the data out of the source and into the DW
         system
        Transformation: clean the data and conform it to
         standard definitions and contents
        Prepare the data for presentation: “dimensionalization”
        Manage all the above functions in a coherent system


Populating Dimension Tables

      Recreating Type 2 change history can be a challenge
      Cleaning and conforming can be complex
      Integrating multiple sources and de-duplicating is a
       process unique to your business
        Integration Services’ tools including Fuzzy Lookup can
         help for simple problems
        Complex problems require a third party tool or service
      A universal dimension function is handling changes in
       dimension attributes (SCDs)



Slowly Changing Dimensions

      Dimension attributes will change over time
      Business users determine what must be tracked
      [Figure: Slowly Changing Dimension processing flow]
        Compare the source file with the master dimension, then route each row:
          1. New source rows: assign surrogate keys, then simple insert into the master dimension
          2. Type 2 changed rows: assign surrogate keys, update Current_Flags and dates,
             then insert new current rows
          3. Type 1 changed rows: update value in current row
          No change: ignore
        (Optional) Replace the most-recent-key map with current rows
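
The compare step in this flow amounts to routing each incoming row to one of four outcomes. The sketch below assumes rows are dictionaries keyed by a `business_key` field and that the business has already declared which attributes are Type 1 and which are Type 2; the function names are hypothetical. It also assumes that a row with both kinds of change is routed as Type 2, since the new row carries the new Type 1 values as well.

```python
def classify_row(source_row, current_row, type1_attrs, type2_attrs):
    """Route one source row: 'new', 'type2', 'type1', or 'unchanged'."""
    if current_row is None:
        return "new"
    if any(source_row[a] != current_row[a] for a in type2_attrs):
        return "type2"
    if any(source_row[a] != current_row[a] for a in type1_attrs):
        return "type1"
    return "unchanged"


def route_rows(source_rows, current_by_key, type1_attrs, type2_attrs):
    """Group source rows by outcome, comparing against the current dimension rows."""
    routed = {"new": [], "type2": [], "type1": [], "unchanged": []}
    for row in source_rows:
        current = current_by_key.get(row["business_key"])
        routed[classify_row(row, current, type1_attrs, type2_attrs)].append(row)
    return routed
```

Rows in the `unchanged` group are simply ignored, and the other three groups feed the insert and update branches of the flow.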
Populating Fact Tables

        Populate the initial historical load
            Different source systems, data structures, formats over time
            History missing
            Must build historical Slowly Changing Dimensions first
            Can take a long time
        Develop incremental load logic
          Usually different packages from the historical load
          Push vs. Pull (extract ownership)
          Identify new / changed rows
        Key substitution is a big task
        Catch up history and start incremental loads
        Validate data at each step

Fact Table ETL:
     Surrogate Key Lookup (Pipeline)
        Replace production keys in the fact table extract with surrogate
         keys from the dimensions
        Maintain and ensure referential integrity!
        Watch for fact table key collisions
        [Figure: surrogate key pipeline]
        Fact Table Records With Production IDs
            (time_ID, product_ID, store_ID, promotion_ID, dollar_sales, unit_sales, dollar_cost)
          -> replace time_ID with surrogate time_key        (Most Recent Time Key Map: time_ID, time_key)
          -> replace product_ID w/ surrogate product_key    (Most Recent Product Key Map: product_ID, product_key)
          -> replace store_ID with surrogate store_key      (Most Recent Store Key Map: store_ID, store_key)
          -> replace promo_ID w/ surrogate promo_key        (Most Recent Promotion Key Map: promo_ID, promo_key)
          -> Fact Table Records With Surrogate Keys
            (time_key, product_key, store_key, promotion_key, dollar_sales, unit_sales, dollar_cost)
          -> load fact table records into DBMS
        Exception outputs: Referential Integrity Failures, Key Collisions
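
The lookup pipeline can be sketched as a chain of dictionary lookups, one per dimension. This is a minimal illustration, not an Integration Services package: `substitute_keys` and the shape of `key_maps` are assumptions, and the key-collision check from the diagram is not shown.

```python
def substitute_keys(fact_rows, key_maps):
    """Swap production IDs for surrogate keys; divert rows that fail a lookup.

    key_maps maps (id_column, key_column) -> {production_id: surrogate_key},
    standing in for the most-recent-key maps of each dimension.
    """
    loaded, ri_failures = [], []
    for row in fact_rows:
        out = dict(row)  # copy so the original extract row is left untouched
        missing = []
        for (id_col, key_col), lookup in key_maps.items():
            production_id = out.pop(id_col)
            if production_id in lookup:
                out[key_col] = lookup[production_id]
            else:
                missing.append(id_col)  # referential integrity failure
        if missing:
            ri_failures.append((row, missing))
        else:
            loaded.append(out)
    return loaded, ri_failures
```

What to do with the failures is a design decision. Common options include holding them in a suspense table or pointing them at a placeholder dimension row until the source is corrected.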

ETL System Summary

      Develop a plan and set up the development environment
      Build out historical dimensions, including Type 2
       attribute changes
      Build out historical facts based on historical dimension
       key substitution
      Design and build Analysis Services cube(s)
      Create incremental load packages



BI Applications
Delivering Value, Not Just Data

[Figure: DW/BI Lifecycle diagram showing Project Planning; Business Requirements Definition; Technical Architecture Design; Product Selection & Installation; Dimensional Modeling; Physical Design; ETL Design & Development; Deployment; Maintenance; Growth; Project Management]
BI Application Concepts
      Role and definition
      Application design
        Templates
        Navigation
      Applications development
      Additional value
        Data validation
        Performance tuning
        Character development
Role and Definition of BI Applications

   Nature of Use   Consumer Type          Information Interface       Value of BI Appl’n
   -------------   --------------------   -------------------------   -------------------------------
   Strategic       Ad hoc power users     Desktop tools for           - Reporting / Analysis examples
                                          do-it-yourself queries      - Assured reference points

                   Push-button            BI Applications             - Low effort
                   knowledge workers                                  - Current business view
                                                                      - Flexible

   Operational     Standard report        Operational Reporting
                   consumers              Environment

   (A "Migration Path" label sits between each pair of adjacent Information Interface tiers.)
The BI Application Continuum

      Standard Reports  <---------------------------->  Analytic Applications

      Simple, fixed format, pre-run reports
      Standard, flexible format, parameter driven reports
      Complex analytic applications with domain expertise,
       embedded algorithms and operational system feedback loops
Design and Spec BI Applications

      Create and prioritize a candidate report list
      Develop a standard report template
        Mandatory content (descriptions, titles, etc.)
        Output look and feel
      Develop mock-ups for top N candidate BI
       applications
        Report/dashboard layouts with parameters
        Document data sets, business rules, calculations, etc.
      End user navigation
        Structured path through templates and reports
        This becomes the core of the BI Portal
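
A standard report template can be captured as a small data structure so that mandatory content is checked before a report is published. The sketch below is one hypothetical way to do that; the ReportSpec class, its field names, and the list of mandatory fields are assumptions for illustration, not part of any BI tool.

```python
from dataclasses import dataclass, field

# Assumed list of mandatory content for every standard report
MANDATORY_FIELDS = ("category", "name", "source", "description")


@dataclass
class ReportSpec:
    category: str
    name: str
    source: str
    description: str
    parameters: dict = field(default_factory=dict)  # user-selectable values

    def missing_content(self):
        """Return the mandatory fields that are still empty."""
        return [f for f in MANDATORY_FIELDS if not getattr(self, f).strip()]

    def title(self):
        """Fill <Placeholder> markers in the name from the parameters."""
        text = self.name
        for key, value in self.parameters.items():
            text = text.replace(f"<{key}>", str(value))
        return text
```

The placeholder style mirrors the `<Period>` markers used in the sample mock-up.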
Sample BI Application Mock-up
                            From the Geography
                            Dimension                                                                    Variable Time
                                                                                                         Period
Variable Time Period
                                                                                                        Global BI System
                                          <Geography Name>                                              We’re here to help


                                      Topline Performance Report
                                  <Period> Compared to <Previous Period>

                                         Sales           YA Sales           Sales        Market           % Var
    <<Product Line>>                     Units             Units            Index        Share          Prev Share
      xxxxxxxxxx                        xxx,xxx           xxx,xxx            xx.x        xx.xx              x.x
      xxxxxxxxxx                        xxx,xxx           xxx,xxx            xx.x        xx.xx              x.x
      xxxxxxxxxx                        xxx,xxx           xxx,xxx            xx.x        xx.xx              x.x
      xxxxxxxxxx                        xxx,xxx           xxx,xxx            xx.x        xx.xx              x.x
      xxxxxxxxxx                        xxx,xxx           xxx,xxx            xx.x        xx.xx              x.x

                                                         Report Information
     Report Category: {Sales Analysis}
     Report Name: {Topline Performance Report – current vs. prior period by Geography}
     Source: {DW - Sales Performance}                                                     Run on: {Run_Date} Page { 1}

                                            Product Lines that meet the constraint criteria
                                            (may have drill down capability).
Sample Navigation Design

                                                         BI Portal

                               Marketing                  Sales             Purchasing             Manufacturing

              Sales Activity              Pipeline and            Sales Force             Web Channel
                                          Forecasting             Management              Sales

   Product Topline         Product Trend          Market Topline         Market Trend
   Template                Template               Template               Template
Adventure Works Cycles DW/BI Portal
Starting Point
Developing BI Applications
      Begin development when...
        Data is ready
        Front-end tool is installed and environment is set up
      Pull out report specifications
      Build reports
      Validate tool calculations and drill-paths
      Validate data
      Performance tuning
      Ongoing maintenance and enhancement resources
      Character development
BI Applications Summary

      BI Applications play a critical role
        High value reporting
        Broad audience
        Data Warehouse team learning opportunity
      Design Application standards, specs and navigation up
       front
      Develop Applications when data is ready
        Validate tool capabilities
        Check data quality
        Identify query performance problems
      Make sure you have dedicated resources to maintain and
       enhance your BI offerings
Rollout and Repeat
Security, Deployment, Operations, and Growth

[Figure: DW/BI Lifecycle diagram showing Project Planning; Business Requirements Definition; Technical Architecture Design; Product Selection & Installation; Dimensional Modeling; Physical Design; ETL Design & Development; BI Application Specification; BI Application Development; Project Management]
The Thankless Tasks that Must Be Done
      Security

      Deployment process
        User support
          Training, desktop, support, documentation
        System deployment
          Dev -> Test -> Production
      Maintenance
        System
        User support

Growth

      The Lifecycle is an iterative process
      Revisit opportunities with the business and select the next
       top priority
      Build additional dimensions
      Load facts for this business process
       (fill out the bus matrix row by row)
      Build and deliver the BI applications
      Rollout and repeat!
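
Filling out the bus matrix row by row means each iteration reuses the conformed dimensions already built and adds only the missing ones. A rough sketch of that bookkeeping follows; the process and dimension names echo the deck's bus matrix example, but the row contents and the `dimensions_to_build` helper are illustrative assumptions.

```python
# Hypothetical slice of a bus matrix: business process -> conformed dimensions
BUS_MATRIX = {
    "Store Sales": ["Date", "Product", "Store", "Customer", "Promo"],
    "Store Inventory": ["Date", "Product", "Store"],
    "Purchase Orders": ["Date", "Product", "Vendor", "Dist Ctr"],
}


def dimensions_to_build(matrix, delivered, next_process):
    """List dimensions the next iteration must add; the rest are reused."""
    built = set()
    for process in delivered:
        built.update(matrix[process])
    return [d for d in matrix[next_process] if d not in built]
```

With Store Sales already delivered, Store Inventory in this example needs no new dimensions, which is the payoff of conforming dimensions up front.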

Start the Business Dimensional Lifecycle All Over Again!

[Figure: DW/BI Lifecycle diagram showing Project Planning; Business Requirements Definition; Technical Architecture Design; Product Selection & Installation; Dimensional Modeling; Physical Design; ETL Design & Development; BI Application Specification; BI Application Development; Deployment; Maintenance; Growth; Project Management]
Conclusion

      The Lifecycle is a proven DW/BI methodology
      Keys to success:
          Business value focused
          Short, iterative delivery cycles
          In an Enterprise framework
          Full, end-to-end solution




Contact Info

 warren@kimballgroup.com

 Visit www.kimballgroup.com for
     Articles
     Design tips (149 and counting)
     Whitepapers
     Forum
 All of the concepts discussed are expanded on in the
  Kimball Toolkit series of books
THANK YOU
for attending this session and the
2009 PASS Summit in Seattle

Microsoft Data Warehouse Business Intelligence Lifecycle - The Kimball Approach

  • 1.
    THE DW/BI SYSTEMLIFECYCLE OVERVIEW The Kimball Approach Warren Thornthwaite
  • 2.
    Acknowledgments  Course materials adapted from...  The Microsoft Data Warehouse Toolkit  J. Mundy, W. Thornthwaite (Wiley 2006)  The Data Warehouse Lifecycle Toolkit, 2nd Ed.  R. Kimball, M. Ross, W. Thornthwaite, J. Mundy, B. Becker (Wiley 2008)  The Data Warehouse Toolkit, 2nd Ed.  R. Kimball, M. Ross (Wiley 2002)  Kimball University  Course materials  Design Tips and Intelligent Enterprise articles at www.KimballUniversity.com 2
  • 3.
    Acknowledgments  Course materials adapted from...  The Data Warehouse Lifecycle Toolkit, 2nd Ed.  R. Kimball, M. Ross, W. Thornthwaite, J. Mundy, B. Becker (Wiley 2008)  The Data Warehouse Toolkit, 2nd Ed.  R. Kimball, M. Ross (Wiley 2002)  The Microsoft Data Warehouse Toolkit, 2nd Ed.  J. Mundy, W. Thornthwaite (Wiley 2011)  Kimball Group / Kimball University  Course materials  Design Tips and Intelligent Enterprise articles at www.KimballGroup.com
  • 4.
    Session Agenda  DW/BI Lifecycle business context  DW/BI System Lifecycle overview  Planning and managing the project/program  Defining business requirements  Creating the dimensional modeling (the data track)  Designing the DW/BI system architecture (the technology track)  Building the ETL system  Building BI applications (the applications track)  Rollout and repeat 4
  • 5.
  • 6.
    The Business Contextto the Lifecycle  Business people need information to make plans and assess results, and this need continues to grow  Data is captured by complex systems structured to support specific transaction requirements  Business people find it difficult to get business information from data in transaction systems Therefore, our job is to create a system that will: reliably take data out of the source systems, restructure its form and content as appropriate for business analysis, and provide it to the business people via tools they can actually use. 6
  • 7.
    Strengthen Your Awarenessof the Broader Context  Ask yourself “What am I doing?”  Writing a program  Building a database  Creating a DW/BI system  Solving a set of high value, difficult problems  Thebroader you think, the more effective you will be in addressing business problems and delivering real business value. 7
  • 8.
  • 9.
    The Kimball Approach  Understand business requirements and deliver business value  Follow a proven methodology: the DW Lifecycle  Build and deliver incrementally (by business process) within an enterprise data framework (Bus Matrix and conformed dimensions)  Design the data sets for flexibility, usability and performance (Business Process Dimensional Model)  Provide the complete solution, including reports, query tools, portals, documentation, training, and support 9
  • 10.
    Business Requirements Arethe Foundation of Success  The more you focus your efforts on information-based business opportunities that are high value and relatively easy to implement, the more likely you will be to succeed.  Regardless of which ETL or BI tools you use  Regardless of which database you use  Regardless of your technical skills
  • 11.
    The DW/BI Lifecycle Technical Product Architecture Selection & Design Installation Growth Business Project Dimensional Physical ETL Design & Requirements Deployment Planning Modeling Design Development Definition BI BI Maintenance Application Application Specification Development Project Management 11
  • 12.
    Planning and Managingthe Project/Program Technical Product Architecture Selection & Design Installation Growth Business Dimensional Physical ETL Design & Requirements Deployment Modeling Design Development Definition BI BI Maintenance Application Application Specification Development
  • 13.
    Project Planning &Management Highlights  Assess readiness and determine starting point  Define the program / project – (2 phased startup)  Phase 1 program level: Enterprise business requirements  Prioritization / Business justification  Phase 2 project scope: Initial business process lifecycle iteration  Plan the project  Team roles and responsibilities  Detailed project plan  Manage the project  Control scope creep  Communication to manage expectations 13
  • 14.
    Data Warehouse Readiness Factors 1. Strong business management sponsor(s) (60%)  Vision of value, Politically capable, Realistic 2. Compelling business motivation (15%)  Generates urgency and supplies justification 3. Feasibility (15%)  Data feasibility 4. Other organizational Issues (10%)  IT/Business partnership  Current analytic culture  Requirements definition and prioritization are best tools to address shortfalls  Proof of concept demo is generally a bad idea 14
  • 15.
    Defining the Project: the Two Phased Startup  Phase 1: Enterprise Requirements definition  Phase 2: Project requirements focused on top priority business process 1 2 Enterprise Project (Horizontal) (Vertical) Initial Requirements Project Project Prioritization Business Planning Business Scope Process Requirement Requirement Definition Definition Project Management 15
  • 16.
    Planning the Project: the Phase 2 Detailed Project Plan  Assign roles and responsibilities  Leverage existing project planning tools  List end-to-end tasks for entire Lifecycle  Integrated and detailed  Key team members should develop estimates for their tasks  User acceptance after major tasks & deliverables  Keep unique characteristics in mind  Cross-functional, high visibility, iterative  Data problems will happen – identify them early! 16
  • 17.
    Estimating Guidelines for Project Planning / Management  Phase 1 requirements and prioritization  Key determinates: Readiness / sponsor scenario  Rule of thumb: Three weeks to months +  Developing the project plan  Rule of thumb: Less than two weeks  On-going project management  Key determinates: Organizational complexities, # players, # issues, political realities, ...  Rule of thumb: Often dedicated to DW/BI team 17
  • 18.
    Defining Business Requirements Technical Product Architecture Selection & Design Installation Growth Project Dimensional Physical ETL Design & Deployment Planning Modeling Design Development BI BI Maintenance Application Application Specification Development Project Management
  • 19.
    Defining Business Requirements: Overall Process  Interviews are preferable  Three phases  Preparation (do your homework)  Interviews (including data source experts)  Documentation  Two passes  Enterprise  Project 19
  • 20.
    Defining Business Requirements:the Interviews  Assign roles and be ready  Must ask the right question  NOT “What do you want?”  Ask “What do you do?” (“What are your roles and responsibilities? What could you do better with improved access to information?”)  Cover key areas and listen  Take notes  Debrief with team immediately after  Common themes / opportunities  Required data (business processes)  Do-ability  Areas requiring clarification  User analytical / technical sophistication 20
  • 21.
    Defining Business Requirements: Interview Results  You must do the formal documentation  Validation  Reference material  Individual interview write-ups  Summary, not transcript  Business Objectives  Analytic opportunities and info requirements  Project Success Criteria  Consolidated findings document  Main content is list and descriptions of analytic opportunities  Includes the initial data warehouse bus matrix 21
  • 22.
    The Data WarehouseBus Matrix is the Enterprise Data Architecture Framework  Matrix of business processes and conformed dimensions Business <--- Dimensions ---> Processes Date Product Dist Ctr Vendor Shipper Store Customer Promo Purchase Orders X X X X Dist Ctr Delivery X X X X X Dist Ctr Inventory X X X Store Deliveries X X X X X Store Inventory X X X Store Sales X X X X X Returns X X X X X 22
  • 23.
    Requirements Prioritization Session  Facilitated session with Business and IT management  Agenda:  Confirm requirements  Prioritize analytic info groups High A  Evaluate business B impact / benefit  Evaluate feasibility E Potential G Business F  Outcomes: Impact H  Mgmt education on feasibility D  “Right” opportunities C  Consensus Low  Ownership / Sponsorship Low Feasibility High  Roadmap for growth 23
  • 24.
    Defining Business Requirements Summary  Understanding business requirements is CRITICAL to successful DW/BI system  Don’t overlook the up-front preparation  Focus on listening  Document what you’ve heard  Analytic requirements  Enterprise bus matrix  Prioritize with senior management 24
  • 25.
    Designing the DimensionalModel The Data Track Technical Product Architecture Selection & Design Installation Growth Business Project Requirements Deployment Planning Definition BI BI Maintenance Application Application Specification Development Project Management
  • 26.
    Designing the Business Process Dimensional Model  Basic dimensional modeling concepts  Slowly changing dimensions  The dimensional modeling process  Data profiling and data stewardship 26
  • 27.
    Terminology: Business Process Dimensional Model (or Star Schema)  Normalized fact table (business event) for a single business process at atomic detail level (the grain)  Denormalized dimensions (entities/objects) with all attributes and one active row per occurrence of the object Product KEY Product KEY Store KEY Date KEY Product Store KEY Store  Benefits: Attributes Promo KEY Attributes  Easier to understand  Better performance  Pre-joined dimensions Date KEY Promo KEY Facts  Star join optimization Promo Date  Dimensional engine Attributes Attributes  Extensible to handle change 27
  • 28.
    Terminology: Slowly Changing Dimension  Techniques for handling changes to dimension attributes  Type 1: overwrite attribute values  Common default, appropriate for corrections  Type 2 : create a new dimension row when attribute value changes  Flexible technique, critical for accurately tracking behavior over time  Hybrid combinations of 1 and 2 are most common  Integration Services has basic Slowly Changing Dimension management built in 28
  • 29.
    Dimensional Modeling Process  Develop the Data Warehouse Bus matrix  Start with the 4-step method to identify facts and dimensions  Step 1: Identify the business process (what row on the matrix should we start with?)  Step 2: Declare the grain  Step 3: Choose the dimensions  Step 4: Choose the facts  Diagram the dimensional model  Fill in the dimension and fact attributes (Step 5)  Use business requirements + source docs + data profiling  Follow naming standards (understandable to business)  Try the dimensional modeling spreadsheet from the book’s web site: http://www.kimballgroup.com/html/booksMDWTtools.html 29
  • 30.
    Creating Conformed Dimensions (Step 5)  All fact tables that share dimensions must use the same dimension with the same key  Agree on column names and definitions  Identify best source  Assign surrogate key to every dimension row Product  Combine all attributes into Surrogate Key Product KEY Master dimension table Business Key Product Code  Use the Master dimension Description to map the business Marketing Brand Category key in the fact rows Height to the surrogate key for Logistics Width each business process Weight that uses the dimension Cost Acctg. Standard Cost 30
  • 31.
    Dealing with DataQuality Data Profiling Data Stewardship  Data exploration to determine  Identify people on the data feasibility business side who care  Understand data structures,  Enroll them in data relationships and business rules exploration  Identify (and document) data problems  Include source systems  Tools managers  Simple: SQL, BI tool, RS project  Agree on names, definitions, (see kimballgroup.com) business rules, etc.  Advanced: Data Profiling tool
  • 32.
    Dimensional Modeling Summary  Enterprise perspective / roadmap  Enterprise Data Warehouse Bus Matrix  Presentation area must be dimensional  Ease of use  Query performance  Start with atomic detail, not just summary  Conform dimensions for consistency  Apply SCD techniques for handling attribute changes  Engage business to define names, content, business rules, and deal with data quality  Process  4-step approach  1) Business process, 2) grain, 3) dimensions, 4) facts  Fill in the attributes and measures (Step 5) 32
  • 33.
    Designing the DW/BISystem Architecture The Technology Track Growth Business Project Dimensional Physical ETL Design & Requirements Deployment Planning Modeling Design Development Definition BI BI Maintenance Application Application Specification Development Project Management
  • 34.
    Architecture Principles  The DW/BI System architecture is the set of components and functionality needed to meet the business requirements  Business requirements determine architecture  Most of the tools include only core functionality. You will have to write code for your specific issues.  Thismeans your DW/BI system architecture will not be the same as your neighbor’s  Draw it out and write it down! 34
  • 35.
    The Goal: AConformed DW/BI System Source Systems ETL Business Process System Dimensional Models Relational Analysis Logistics Inventory Dimension DMBS Services Processing Inventory Fact Sales Orders Processing Orders Billing Aggregates - Analysis Svcs. Returns Marketing - Relational DB Billing / • Models contain atomic-level detail Returns • with aggregates for performance • and transparent aggregate navigation • Includes both relational dimensional model and OLAP dimensional model 35
  • 36.
    What Goes Intoa Typical Warehouse Architecture? High Level Warehouse Technical Architecture Model Back Room Front Room Metadata BI Applications Presentation Server Layer Data Access Services Direct access Dimension query and Source Data Quality Real Time Layer Maintenance Operational reporting tools BI Portal Systems Workbench - Operational Front End BI and Enterprise - ODS Standard Performanc Reporting - Desktop tools Reports - XML / Flat files ETL Metadata Dim 4 Dim 1 e Mgmt Services Aggregate Navigator Fact - MDM system Services Dim 3 Dim 2 Enterprise Bus Matrix - External Storage Repository Dim 4 Dim 1 Analytic Applications Fact -… Dim 3 Dim 2 Dim 4 Dim 1 Dashboards & Fact Dim 3 Dim 2 Delivery Admin. Web Scorecards Cleaning & Applications Dim 4 Dim 1 Extract Services Services Fact Preparing for Conforming Dim 3 Dim 2 Presentation Dim 4 Fact Dim 1 Operational Dim 3 Dim 2 Aggregates for Metadata BI Management Atomic level Performance Security Mgmt. Operational ETL Services business process Services and Systems and dimensional models Browsing Reports Infrastructure
  • 37.
    Sample Architecture PlanDocument Outline  Executive Overview  Architecture Implications of Business Requirements  Architecture Overview  Back Room and Front Room Services  Data Stores (Source, Staging, Presentation Servers)  Metadata Strategy  ETL System Strategy and Details  BI Applications System Strategy and Details  Infrastructure  Architecture Implementation Phases and Timing  Technology Evaluation Process 37  Architecture Models
  • 38.
    The ETL System Populatingthe data warehouse Technical Product Architecture Selection & Design Installation Growth Business Project Dimensional Physical Requirements Deployment Planning Modeling Design Definition BI BI Maintenance Application Application Specification Development Project Management
  • 39.
    ETL Startup  Create an ETL Plan  Based on dimensional model docs, data quality, and additional research  Map source tables to each target and identify required transformations  Each target flow corresponds to an ETL package  Setup development environment 39
  • 40.
    The ETL Functions  Understand the core functions common to most ETL systems (there are 34 of them)  They fall into four categories:  Extract: get the data out of the source and into the DW system  Transformation: clean the data and conform it to standard definitions and contents  Prepare the data for presentation: “dimensionalization”  Manage all the above functions in a coherent system 40
  • 41.
    Populating Dimension Tables  Recreating Type 2 change history can be a challenge  Cleaning and conforming can be complex  Integrating multiple sources and de-duplicating is a process unique to your business  Integration Services’ tools including Fuzzy Lookup can help for simple problems  Complex problems require a third party tool or service  Universaldimension function is handling changes in dimension attributes (SCDs) 41
  • 42.
    Slowly Changing Dimensions  Dimension attributes will change over time  Business users determine what must be tracked 1 Source New Source Rows Assign Simple Insert Surrogate Master File Keys Dimension 2 Type 2 Assign Update Changed Insert New Compare Surrogate Current_Flags Rows Current Rows Keys and Dates 3 Type 1 Update Changed Master Value in Rows Replace Dimension Current Row Most-Recent- Key Map w/Current No Change Rows (Optional) Ignore 42
  • 43.
    Populating Fact Tables  Populate the initial historical load  Different source systems, data structures, formats over time  History missing  Must build historical Slowly Changing Dimensions first  Can take a long time  Develop incremental load logic  Usually different packages from the historical load  Push vs. Pull (extract ownership)  Identify new / changed rows  Key substitution is big task  Catch up history and start incremental loads  Validate data at each step 43
  • 44.
    Fact Table ETL: Surrogate Key Lookup (Pipeline)  Replace production keys in the fact table extract with surrogate keys from the dimensions  Maintain and ensure referential integrity!  Watch for fact table key collisions Most Recent Most Recent Most Recent Most Recent Time Key Product Key Store Key Promotion Map Map Map Key Map time_ID product_ID store_ID promo_ID time_key product_key store_key promo_key Fact Table Fact Table replace replace replace replace Records With Records With time_ID with product_ID store_ID with promo_ID w/ Production IDS Surrogate Keys surrogate w/ surrogate surrogate surrogate time_ID time_key product_key store_key promo_key time_key product_ID product_key store_ID store_key promotion_ID promotion_key dollar_sales dollar_sales unit_sales dollar_cost Referential Integrity Failures unit_sales dollar_cost Key Collisions load fact table records into DBMS 44
  • 45.
    ETL System Summary  Develop a plan and setup development environment  Build out historical dimensions, including Type 2 attribute changes  Build out historical facts based on historical dimension key substitution  Design and build Analysis Services cube(s)  Create incremental load packages 45
  • 46.
    BI Applications Delivering Value--not just data Technical Product Architecture Selection & Design Installation Growth Business Project Dimensional Physical ETL Design & Requirements Deployment Planning Modeling Design Development Definition Maintenance Project Management
  • 47.
    BI Application Concepts  Role and definition  Application design  Templates  Navigation  Applications development  Additional Value  Data validation  Performance tuning  Character development 47
  • 48.
    Role and Definition of BI Applications
      [Figure: three tiers of information access linked by migration paths. Columns: Nature of Use, Consumer Type, Information Interface, Value of BI Appl'n]
      - Strategic use, power users: ad hoc desktop tools for do-it-yourself queries
      - Push-button knowledge workers: BI Applications
      - Operational use, operational report consumers: standard reporting environment
      - Value of BI applications:
        - Reporting / analysis examples
        - Assured reference points
        - Low effort
        - Current business view
        - Flexible
  • 49.
    The BI Application Continuum
      [Figure: a continuum running from Standard Reports to Analytic Applications]
      - Simple, fixed format, pre-run reports
      - Standard, flexible format, parameter driven reports
      - Complex analytic applications with domain expertise, embedded algorithms and operational system feedback loops
  • 50.
    Design and Spec BI Applications
      - Create and prioritize a candidate report list
      - Develop a standard report template
        - Mandatory content (descriptions, titles, etc.)
        - Output look and feel
      - Develop mock-ups for top N candidate BI applications
        - Report/dashboard layouts with parameters
        - Document data sets, business rules, calculations, etc.
      - End user navigation
        - Structured path through templates and reports
        - This becomes the core of the BI Portal
  • 51.
    Sample BI Application Mock-up

      Global BI System
      We’re here to help

      <Geography Name> Topline Performance Report
      <Period> Compared to <Previous Period>

      <<Product Line>>   Sales Units   YA Sales Units   Sales Index   Market Share   % Var Prev Share
      xxxxxxxxxx         xxx,xxx       xxx,xxx          xx.x          xx.xx          x.x
      xxxxxxxxxx         xxx,xxx       xxx,xxx          xx.x          xx.xx          x.x
      xxxxxxxxxx         xxx,xxx       xxx,xxx          xx.x          xx.xx          x.x
      xxxxxxxxxx         xxx,xxx       xxx,xxx          xx.x          xx.xx          x.x
      xxxxxxxxxx         xxx,xxx       xxx,xxx          xx.x          xx.xx          x.x

      Report Information
        Report Category: {Sales Analysis}
        Report Name: {Topline Performance Report – current vs. prior period by Geography}
        Source: {DW - Sales Performance}
        Run on: {Run_Date}
        Page {1}

      Callouts:
        - <Geography Name>: From the Geography Dimension
        - <Period>, <Previous Period>: Variable Time Period
        - <<Product Line>>: Product Lines that meet the constraint criteria (may have drill down capability).
  • 52.
    Sample Navigation Design
      [Figure: BI Portal navigation hierarchy]
      - Top level: BI Portal
      - Second level: Marketing, Sales, Purchasing, Manufacturing
      - Sales areas: Sales Activity, Pipeline and Forecasting, Sales Force Management, Web Channel
      - Report templates: Product Topline, Product Trend, Market Topline, Market Trend
  • 53.
    Adventure Works Cycles DW/BI Portal Starting Point
  • 54.
    Developing BI Applications
      - Begin development when...
        - Data is ready
        - Front end tool installed and environment set up
      - Pull out report specifications
      - Build reports
      - Validate tool calculations and drill-paths
      - Validation data
      - Performance Tuning
      - Ongoing maintenance and enhancement resources
      - Character Development
  • 55.
    BI Applications Summary
      - BI Applications play a critical role
        - High value reporting
        - Broad audience
        - Data Warehouse team learning opportunity
      - Design Application standards, specs and navigation up front
      - Develop Applications when data is ready
        - Validate tool capabilities
        - Check data quality
        - Identify query performance problems
      - Make sure you have dedicated resources to maintain and enhance your BI offerings
  • 56.
    Rollout and Repeat
    Security, Deployment, Operations, and Growth
      [Figure: Kimball Lifecycle diagram. Boxes: Project Planning; Business Requirements Definition; Technical Architecture Design; Product Selection & Installation; Dimensional Modeling; Physical Design; ETL Design & Development; BI Application Specification; BI Application Development; Project Management]
  • 57.
    The Thankless Tasks that Must Be Done
      - Security
      - Deployment process
        - User support
          - Training, desktop, support, documentation
        - System deployment
          - Dev -> Test -> Production
      - Maintenance
        - System
        - User support
  • 58.
    Growth
      - The Lifecycle is an iterative process
      - Revisit opportunities with business and select the next top priority
      - Build additional dimensions
      - Load facts for this business process (fill out the bus matrix row by row)
      - Build and deliver the BI applications
      - Rollout and repeat!
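Filling out the bus matrix row by row means each iteration only has to build the dimensions that earlier iterations have not already delivered. The sketch below illustrates that bookkeeping; the business processes and dimensions in `BUS_MATRIX` are made-up examples, and `dimensions_to_build` is a hypothetical helper name.

```python
# Hypothetical bus matrix: one row per business process,
# listing the dimensions that process shares with the others.
BUS_MATRIX = {
    "Retail Sales": {"Date", "Product", "Store", "Promotion"},
    "Inventory": {"Date", "Product", "Warehouse"},
    "Orders": {"Date", "Product", "Customer"},
}


def dimensions_to_build(bus_matrix, built_dimensions, next_process):
    """List the dimensions the next business process still needs."""
    return sorted(bus_matrix[next_process] - set(built_dimensions))


# After delivering Retail Sales, the Inventory iteration reuses Date and
# Product and only needs one additional dimension.
already_built = BUS_MATRIX["Retail Sales"]
print(dimensions_to_build(BUS_MATRIX, already_built, "Inventory"))
```

The reuse of already-built dimensions is what keeps later iterations shorter than the first one.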
  • 59.
    Start the Business Dimensional Lifecycle All Over Again!
      [Figure: Kimball Lifecycle diagram. Boxes: Project Planning; Business Requirements Definition; Technical Architecture Design; Product Selection & Installation; Dimensional Modeling; Physical Design; ETL Design & Development; BI Application Specification; BI Application Development; Deployment; Maintenance; Growth; Project Management]
  • 60.
    Conclusion
      - The Lifecycle is a proven DW/BI methodology
      - Keys to success:
        - Business value focused
        - Short, iterative delivery cycles
        - In an Enterprise framework
        - Full, end-to-end solution
  • 61.
    Contact Info
      - warren@kimballgroup.com
      - Visit www.kimballgroup.com for
        - Articles
        - Design tips (149 and counting)
        - Whitepapers
        - Forum
      - All of the concepts discussed are expanded on in the Kimball Toolkit series of books
  • 62.
    THANK YOU for attending this session and the 2009 PASS Summit in Seattle