Orderly Approach for DWH construction




2/23/2012   3.Planning & Project management/D.S.Jagli             1
Topics to be covered
 1. How is it different?
 2. Life-cycle approach
 3. The Development Phases
 4. Dimensional Analysis
 5. Dimensional Modeling
   i.    Star Schema
   ii.   Snowflake Schema

3.Planning & Project management
  Reasons DWH projects fail
       1.   Improper planning
       2.   Inadequate project management
  Planning for a data warehouse is therefore necessary.
 I.     Key issues that need to be planned
       1.   Value and expectations
       2.   Risk assessment
       3.   Top-down or bottom-up
       4.   Build or buy
       5.   Single vendor or best of breed
 II.  Business requirements, not technology
 III. Top management support
 IV. Justification
3.Planning & Project management
  Example for a DWH project: outline for the overall plan
       1.  Introduction
       2.  Mission statement
       3.  Scope
       4.  Goals & objectives
       5.  Key issues & options
       6.  Value & expectations
       7.  Justification
       8.  Executive sponsorship
       9.  Implementation strategy
       10. Tentative schedule
       11. Project authorization
3.1 How is it different?
  A DWH project differs from an OLTP system project.
  Distinguishing features of a DWH, and the challenges they pose for
     project management:
     1. Data acquisition
     2. Data storage
     3. Information delivery

3.2 The life-cycle Approach
            Fig: DW functional components and SDLC

DWH Project Plan: Sample outline

3.3 DWH Development Phases

3.3 DWH Development Phases
 1) Project plan
 2) Requirements definition
 3) Design
 4) Construction
 5) Deployment
 6) Growth and maintenance

  The three tracks, together with the definition of the architecture and
      the establishment of the infrastructure, are interleaved within the
      design and construction phases.

3.4 Dimensional Analysis

  A data warehouse is an information delivery system.
  It is not about technology but about solving users’ problems.
  It provides strategic information to users.

  In the requirements definition phase, we need to concentrate on what
     information the users need, not on how we are going to provide it.
Dimensional Nature of DWH
 1. Usage of Information Is Unpredictable
           When providing requirements for an operational system, users are
            able to give precise details of the required functions, information
            content, and usage patterns. For a data warehouse they usually
            cannot, because they do not know in advance exactly how they will
            use the information.

 2. Dimensional Nature of Business Data
           Even though users cannot fully describe what they want in a data
            warehouse, they can provide very important insights into how they
            think about the business.

Managers think in business dimensions: example

Dimensional Nature of Business Data

Dimensional Nature of Business Data

Examples of Business Dimensions

Examples of Business Dimensions

INFORMATION PACKAGES: A NEW CONCEPT
  A novel idea is introduced for determining and recording information
     requirements for a data warehouse.

  This concept helps us give concrete form to the various insights,
     nebulous thoughts, and opinions expressed while collecting requirements.

  The information packages put together while collecting requirements are
     very useful for taking the development of the data warehouse into the
     next phases.

Requirements Not Fully Determinate
 Information packages enable us to:
1.    Define the common subject areas
2.    Design key business metrics
3.    Decide how data must be presented
4.    Determine how users will aggregate or roll up
5.    Decide the data quantity for user analysis or query
6.    Decide how data will be accessed
7.    Establish data granularity
8.    Estimate data warehouse size
9.    Determine the frequency for data refreshing
10.   Determine how information must be packaged
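An information package is, at heart, a structured record of the business dimensions, their hierarchy levels, and the facts users want to analyze. The sketch below models one as a plain Python dataclass (Python 3.9+ assumed). The class and method names are hypothetical, and ordering each hierarchy from most detailed to most summarized is an assumption made for illustration. It shows how data granularity (item 7) and roll-up paths (item 4) can be read straight off the recorded hierarchies.

```python
from dataclasses import dataclass, field


@dataclass
class InformationPackage:
    """Hypothetical record of requirements gathered for one subject area."""
    subject: str
    # dimension name -> hierarchy levels, assumed ordered detail -> summary
    dimensions: dict[str, list[str]] = field(default_factory=dict)
    # key business metrics (facts) users want to analyze
    facts: list[str] = field(default_factory=list)

    def grain(self) -> dict[str, str]:
        """Lowest (most detailed) level of each dimension: the data granularity."""
        return {name: levels[0] for name, levels in self.dimensions.items()}

    def rollup_path(self, dimension: str) -> list[str]:
        """Levels a user can roll up through, from detail to summary."""
        return list(self.dimensions[dimension])


sales = InformationPackage(
    subject="Automaker sales",
    dimensions={
        "Time": ["Date", "Month", "Quarter", "Year"],
        "Dealer": ["Dealer name", "City", "State"],  # assumed hierarchy
    },
    facts=["Actual sale price", "MSRP sale price", "Amount financed"],
)
print(sales.grain())
print(sales.rollup_path("Time"))
```

Keeping the package as data rather than as a diagram makes it easy to feed into the later design steps.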

An information package.

Business Dimensions
  Business dimensions form the underlying basis of the new
     methodology for requirements definition.
  Data must be stored in a way that supports analysis along these
     business dimensions.
  The business dimensions and their hierarchical levels form the
     basis for all further phases.

Dimension Hierarchies/Categories
 Examples:
1)    Product: Model name, model year, package styling, product line, product category,
      exterior color, interior color, first model year

2)    Dealer: Dealer name, city, state, single brand flag, date first operation

3)    Customer demographics: Age, gender, income range, marital status, household
      size, vehicles owned, home value, own or rent

4)    Payment method: Finance type, term in months, interest rate, agent

5)    Time: Date, month, quarter, year, day of week, day of month, season, holiday flag
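Rolling up along a hierarchy means regrouping detailed facts at a coarser level of that hierarchy. A minimal sketch, assuming a hypothetical list of daily sales and the Time levels listed above (date, month, quarter, year); the data and the `roll_up` helper are illustrative only.

```python
from collections import defaultdict
from datetime import date

# Hypothetical daily sales facts: (sale date, actual sale price)
daily_sales = [
    (date(2011, 1, 15), 21000.0),
    (date(2011, 2, 3), 18500.0),
    (date(2011, 4, 20), 25000.0),
    (date(2012, 1, 9), 30000.0),
]

# Each level of the Time hierarchy maps a date to its grouping key.
TIME_LEVELS = {
    "month": lambda d: (d.year, d.month),
    "quarter": lambda d: (d.year, (d.month - 1) // 3 + 1),
    "year": lambda d: d.year,
}


def roll_up(facts, level):
    """Sum the sale price at the requested level of the Time hierarchy."""
    totals = defaultdict(float)
    for sale_date, price in facts:
        totals[TIME_LEVELS[level](sale_date)] += price
    return dict(totals)


print(roll_up(daily_sales, "quarter"))  # {(2011, 1): 39500.0, (2011, 2): 25000.0, (2012, 1): 30000.0}
print(roll_up(daily_sales, "year"))     # {2011: 64500.0, 2012: 30000.0}
```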

Key Business Metrics or Facts
  The numbers users analyze are the measurements, or metrics, that
     measure the success of their departments.
  These are the facts that tell users how well their departments are
     fulfilling their departmental objectives.

Example: automobile sales
  The set of meaningful and useful metrics for analyzing
     automobile sales is as follows:
        Actual sale price
        MSRP sale price
        Options price
        Full price
        Dealer add-ons
        Dealer credits
        Dealer invoice
        Amount of down payment
        Manufacturer proceeds
        Amount financed
Star Schema and Snowflake Schema

FROM REQUIREMENTS TO DATA DESIGN
 1.    The requirements definition completely drives the data design for the data
       warehouse.

 2.    A group of data elements forms a data structure.

 3.    Logical data design includes determining the various data elements and
       data structures and establishing the relationships among the data
       structures.

 4.    The information package diagrams form the basis for the logical data
       design for the data warehouse.

 5.    The data design process results in a dimensional data model.
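As a rough sketch of how an information package can drive the logical design, the function below turns a set of dimensions and facts into CREATE TABLE statements for a star schema and checks them against an in-memory SQLite database. The table and column names and the `_dim`/`_fact`/`_key` naming convention are hypothetical. Identifiers are interpolated without escaping, so this assumes trusted input.

```python
import sqlite3


def star_schema_ddl(fact_name, dimensions, facts):
    """Build CREATE TABLE statements for a star schema (hypothetical layout).

    dimensions: dimension name -> list of descriptive attribute columns
    facts: list of numeric measure columns for the fact table
    """
    statements = []
    for dim, attrs in dimensions.items():
        cols = ", ".join(f"{a} TEXT" for a in attrs)
        statements.append(f"CREATE TABLE {dim}_dim ({dim}_key INTEGER PRIMARY KEY, {cols})")
    # Each dimension key becomes a foreign key in the fact table...
    fks = ", ".join(f"{d}_key INTEGER NOT NULL REFERENCES {d}_dim({d}_key)" for d in dimensions)
    measures = ", ".join(f"{m} REAL" for m in facts)
    # ...and together they form the fact table's composite primary key.
    pk = ", ".join(f"{d}_key" for d in dimensions)
    statements.append(f"CREATE TABLE {fact_name}_fact ({fks}, {measures}, PRIMARY KEY ({pk}))")
    return statements


ddl = star_schema_ddl(
    "sales",
    {"product": ["model_name", "product_line"],
     "dealer": ["dealer_name", "state"],
     "time": ["sale_date", "quarter"]},
    ["actual_sale_price", "amount_financed"],
)
conn = sqlite3.connect(":memory:")
for stmt in ddl:
    conn.execute(stmt)
tables = sorted(r[0] for r in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'"))
print(tables)
```

The composite primary key implies a grain of one fact row per product, dealer, and time value; a real design would choose the grain from the information package.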

FROM REQUIREMENTS TO DATA DESIGN

Dimensional Modeling Basics: Formation of the automaker sales fact table.

Formation of the automaker dimension tables.

Concept of Keys for Dimension table
Surrogate Keys
1. A surrogate key is the primary key for a dimension table and
     is independent of any keys provided by source data systems.
2.   Surrogate keys are created and maintained in the data
     warehouse and should not encode any information about the
     contents of records.
3.   Automatically increasing integers make good surrogate keys.
4.   The original key for each record is carried in the dimension
     table but is not used as the primary key.
5.   Surrogate keys provide the means to maintain data warehouse
     information when dimensions change.

Concept of Keys for Dimension table
 Business Keys
  Business keys, also called natural keys, carry business meaning. They can be
   derived from the data in a source system or used as-is from a source-system
   field.
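The surrogate-key and business-key rules above can be sketched with a small in-memory key map. This is an illustration under stated assumptions, not a production ETL component: the class name, the `(source, natural_key)` lookup, and the `add_version` method are hypothetical. `add_version` follows a type 2 slowly changing dimension approach, one common way surrogate keys let the warehouse keep history when a dimension member changes.

```python
class DimensionKeyMap:
    """Hypothetical helper that assigns warehouse surrogate keys to dimension rows.

    The source system's natural (business) key is stored as an ordinary
    attribute; the warehouse-generated integer is the dimension's primary key.
    """

    def __init__(self):
        self._next_key = 1   # automatically increasing integer
        self._current = {}   # (source, natural key) -> current surrogate key
        self.rows = {}       # surrogate key -> stored dimension row

    def _assign(self, source, natural_key, attributes):
        key = self._next_key
        self._next_key += 1
        self._current[(source, natural_key)] = key
        # The original key is carried in the row but is not the primary key.
        self.rows[key] = {"source": source, "natural_key": natural_key, **attributes}
        return key

    def lookup_or_assign(self, source, natural_key, attributes):
        """Return the existing surrogate key, or create one for a new member."""
        existing = self._current.get((source, natural_key))
        if existing is not None:
            return existing
        return self._assign(source, natural_key, attributes)

    def add_version(self, source, natural_key, attributes):
        """Store a changed member under a new surrogate key (type 2 style).

        Earlier fact rows keep the old key, so their history is unaffected.
        """
        return self._assign(source, natural_key, attributes)


dealers = DimensionKeyMap()
a = dealers.lookup_or_assign("north_erp", "D-100", {"dealer_name": "Metro Motors"})
b = dealers.lookup_or_assign("south_erp", "D-100", {"dealer_name": "Bay Autos"})
c = dealers.lookup_or_assign("north_erp", "D-100", {"dealer_name": "Metro Motors"})
d = dealers.add_version("north_erp", "D-100", {"dealer_name": "Metro Motors Group"})
print(a, b, c, d)  # 1 2 1 3
```

Note that the two sources reuse the natural key "D-100" for different dealers, yet they receive distinct surrogate keys: the warehouse key is independent of source keys.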

The criteria for combining the tables into a dimensional model:
1.    The model should provide the best data access.
2.     The whole model must be query-centric.
3.     It must be optimized for queries and analyses.
4.     The model must show that the dimension tables interact with
      the fact table.
5.     It should also be structured in such a way that every
      dimension can interact equally with the fact table.
6.     The model should allow drilling down or rolling up along
      dimension hierarchies.

The Dimensional Model: a STAR Schema
  Given these requirements, a dimensional model with the fact table in the
     middle and the dimension tables arranged around it satisfies all of
     these conditions.

Case study: STAR schema for automaker sales.

E-R Modeling Versus Dimensional Modeling
 OLTP systems (E-R modeling)
 1. OLTP systems capture details of events or transactions.
 2. OLTP systems focus on individual events.
 3. An OLTP system is a window into micro-level transactions.
 4. A picture at the detail level is necessary to run the business.
 5. Suitable only for questions at the transaction level.
 6. Data consistency, non-redundancy, and efficient data storage are critical.

 Data warehouse (dimensional modeling)
 1. The DW is meant to answer questions on the overall process.
 2. The DW focus is on how managers view the business.
 3. The DW focuses on business trends.
 4. Information is centered around a business process.
 5. Answers show how the business measures the process.
 6. The measures are to be studied in many ways along several business
    dimensions.

E-R Modeling Versus Dimensional Modeling
  Fig: E-R modeling for OLTP systems; dimensional modeling for the data warehouse.

Star Schemas
  A data modeling technique that maps multidimensional decision-support
     data into a relational database.
  Conventional relational modeling techniques do not serve the needs
   of advanced data requirements.
  Four components:
       Facts
       Dimensions
       Attributes
       Attribute hierarchies

Facts
 1. Numeric measurements (values) that represent a specific
       business aspect or activity.
 2. Stored in a fact table at the center of the star schema.
 3. The fact table contains facts that are linked through their dimensions.
 4. Updated periodically with data from operational databases.

Dimensions
 1. Qualifying characteristics that provide additional
        perspectives to a given fact.
           DSS data is almost always viewed in relation to other data.
 2. Dimensions are normally stored in dimension tables.

Attributes
 1.    Dimension tables contain attributes.
 2.    Attributes are used to search, filter, or classify facts.
 3.    Dimensions provide descriptive characteristics about the facts through
       their attributes.
 4.    Common business attributes must be defined that will be used to narrow a
       search, group information, or describe dimensions (e.g., time, location,
       product).
 5.    There is no mathematical limit to the number of dimensions (3-D makes it
       easy to model).
Attribute Hierarchies
 1. Provide a top-down data organization used for:
     Aggregation
     Drill-down / roll-up data analysis
 2. Attributes from different dimensions can be grouped to
        form a hierarchy.
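A minimal sketch of roll-up and drill-down, assuming a hypothetical dealer dimension whose state, city, and dealer levels form the hierarchy. Rolling up groups by fewer levels; drilling down adds a level. It uses Python's built-in `sqlite3` module with made-up data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dealer_dim (dealer_key INTEGER PRIMARY KEY,
                         dealer_name TEXT, city TEXT, state TEXT);
CREATE TABLE sales_fact (dealer_key INTEGER REFERENCES dealer_dim(dealer_key),
                         actual_sale_price REAL);
INSERT INTO dealer_dim VALUES
  (1, 'Metro Motors', 'Newark',  'NJ'),
  (2, 'Shore Autos',  'Trenton', 'NJ'),
  (3, 'Bay Autos',    'Buffalo', 'NY');
INSERT INTO sales_fact VALUES (1, 20000), (1, 22000), (2, 18000), (3, 30000);
""")


def sales_by(*levels):
    """Aggregate sales at the given hierarchy levels.

    Fewer levels = rolled up; more levels = drilled down.
    Levels are trusted column names, not user input.
    """
    cols = ", ".join(levels)
    sql = (f"SELECT {cols}, SUM(f.actual_sale_price) FROM sales_fact f "
           f"JOIN dealer_dim d USING (dealer_key) "
           f"GROUP BY {cols} ORDER BY {cols}")
    return conn.execute(sql).fetchall()


print(sales_by("state"))          # rolled up to state
print(sales_by("state", "city"))  # drilled down to city within state
```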

Concept of Keys for Star schema
Surrogate Keys
 Surrogate keys are simply system-generated sequence numbers and are
  independent of any keys provided by source data systems.
 They have no built-in meaning.
 Surrogate keys are created and maintained in the data warehouse and should not
  encode any information about the contents of records.
 Automatically increasing integers make good surrogate keys.
 The original key for each record is carried in the dimension table but is not used
  as the primary key.
Business Keys
Primary Keys
 Each row in a dimension table is identified by a unique value of an attribute
  designated as the primary key of the dimension.
Foreign Keys
 Each dimension table is in a one-to-many relationship with the central fact table,
  so the primary key of each dimension table must appear as a foreign key in the
  fact table.
Star Schema for Sales
  Fig: dimension tables arranged around the central fact table
Star Schema Representation
  Facts and dimensions are represented by physical tables in the
     data warehouse database.
  The fact table is related to each dimension table in a many-to-one
     relationship (primary key/foreign key relationships).
  The fact table is related to many dimension tables.
        The primary key of the fact table is a composite key made up of
            the primary keys of the dimension tables.
  Each fact table is designed to answer a specific DSS question.
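The key relationships described above can be checked in SQLite. In this sketch (hypothetical tables and data), each fact row references dimension rows through foreign keys, several fact rows share one dimension row (many-to-one), and the composite primary key rejects a duplicate key combination. SQLite enforces foreign keys only after `PRAGMA foreign_keys = ON` has been issued on the connection.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled
conn.executescript("""
CREATE TABLE product_dim (product_key INTEGER PRIMARY KEY, model_name TEXT);
CREATE TABLE dealer_dim  (dealer_key  INTEGER PRIMARY KEY, dealer_name TEXT);
CREATE TABLE sales_fact (
    product_key INTEGER NOT NULL REFERENCES product_dim(product_key),
    dealer_key  INTEGER NOT NULL REFERENCES dealer_dim(dealer_key),
    actual_sale_price REAL,
    PRIMARY KEY (product_key, dealer_key)  -- composite key built from dimension keys
);
INSERT INTO product_dim VALUES (1, 'Sedan'), (2, 'Coupe');
INSERT INTO dealer_dim  VALUES (1, 'Metro Motors');
-- Two fact rows share dealer 1: many fact rows to one dimension row.
INSERT INTO sales_fact  VALUES (1, 1, 21000.0), (2, 1, 25000.0);
""")


def try_insert(row):
    """Attempt to add one fact row; report whether the constraints allowed it."""
    try:
        conn.execute("INSERT INTO sales_fact VALUES (?, ?, ?)", row)
        return "ok"
    except sqlite3.IntegrityError as exc:
        return f"rejected: {exc}"


print(try_insert((3, 1, 9999.0)))   # product_key 3 does not exist in product_dim
print(try_insert((1, 1, 22000.0)))  # (1, 1) already exists in the fact table
```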

Star Schema
  The fact table is always the largest table in the star schema.
  Each dimension record is related to thousands of fact records.
  The star schema facilitates data retrieval.
  The DBMS first searches the smaller dimension tables before accessing
     the larger fact table.
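The idea of searching the dimension tables before the fact table can be illustrated conceptually: evaluate each filter on its small dimension table to obtain a set of qualifying keys, then make a single pass over the large fact table. This is a simplified sketch with hypothetical in-memory data, not how any particular DBMS implements it; real star-join strategies (indexes, bitmap filters, hash joins) vary by product.

```python
# Hypothetical, tiny in-memory tables.
product_dim = {
    1: {"model_name": "Sedan", "product_line": "Family"},
    2: {"model_name": "Coupe", "product_line": "Sport"},
    3: {"model_name": "Roadster", "product_line": "Sport"},
}
dealer_dim = {1: {"state": "NJ"}, 2: {"state": "NY"}}
# Fact rows: (product_key, dealer_key, actual_sale_price)
sales_fact = [(1, 1, 21000.0), (2, 1, 25000.0), (3, 2, 40000.0), (2, 2, 26000.0)]


def star_join_total(product_filter, dealer_filter):
    """Total sale price for facts whose dimension rows satisfy both filters.

    Step 1: apply each predicate to its (small) dimension table to get key sets.
    Step 2: scan the (large) fact table once, keeping rows whose keys are in both sets.
    """
    product_keys = {k for k, row in product_dim.items() if product_filter(row)}
    dealer_keys = {k for k, row in dealer_dim.items() if dealer_filter(row)}
    return sum(price for p, d, price in sales_fact
               if p in product_keys and d in dealer_keys)


total = star_join_total(lambda r: r["product_line"] == "Sport",
                        lambda r: r["state"] == "NY")
print(total)  # 66000.0
```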

Star Schema : advantages
 1. Easy to understand
 2. Optimizes Navigation
 3. Most Suitable for Query Processing

THE SNOWFLAKE SCHEMA
  “Snowflaking” is a method of normalizing the dimension
     tables in a STAR schema.

Sales: a simple STAR schema.

Product dimension: partially normalized

When to Snowflake
  The principle behind snowflaking is normalization of the
     dimension tables by removing low-cardinality attributes and
     placing them in separate tables.
  In a similar manner, some situations provide opportunities to
     separate out a set of attributes and form a subdimension.
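A sketch of snowflaking the product dimension, assuming the product line and product category attributes (from the product hierarchy listed earlier) are the low-cardinality attributes moved into a subdimension. Table names and data are hypothetical. Both versions answer the same query, but the snowflaked one needs an extra join, which is the query-performance cost listed among the disadvantages.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Star version: line and category values repeat on every product row.
CREATE TABLE product_dim_star (
    product_key INTEGER PRIMARY KEY,
    model_name TEXT, product_line TEXT, product_category TEXT);
INSERT INTO product_dim_star VALUES
    (1, 'Model A', 'Family', 'Passenger'),
    (2, 'Model B', 'Sport',  'Passenger'),
    (3, 'Model C', 'Sport',  'Passenger');

-- Snowflaked version: low-cardinality attributes moved to a subdimension.
CREATE TABLE product_line_dim (
    line_key INTEGER PRIMARY KEY,
    product_line TEXT, product_category TEXT);
CREATE TABLE product_dim_snow (
    product_key INTEGER PRIMARY KEY,
    model_name TEXT,
    line_key INTEGER REFERENCES product_line_dim(line_key));

-- Fill the subdimension with the distinct attribute combinations...
INSERT INTO product_line_dim (product_line, product_category)
    SELECT DISTINCT product_line, product_category FROM product_dim_star;
-- ...and point each product row at its subdimension row.
INSERT INTO product_dim_snow
    SELECT p.product_key, p.model_name, l.line_key
    FROM product_dim_star p
    JOIN product_line_dim l
      ON l.product_line = p.product_line
     AND l.product_category = p.product_category;
""")

star = conn.execute(
    "SELECT model_name FROM product_dim_star "
    "WHERE product_line = 'Sport' ORDER BY model_name").fetchall()
snow = conn.execute(
    "SELECT p.model_name FROM product_dim_snow p "
    "JOIN product_line_dim l USING (line_key) "
    "WHERE l.product_line = 'Sport' ORDER BY p.model_name").fetchall()
print(star == snow, star)
```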

Advantages and Disadvantages
  Advantages
     Small savings in storage space
     Normalized structures are easier to update and maintain
  Disadvantages
     The schema is less intuitive, and end users are put off by its
      complexity
     Browsing through the contents is more difficult
     Degraded query performance because of additional joins

???
            Thank you
2/23/2012    3.Planning & Project management/D.S.Jagli   54

More Related Content

What's hot (20)

Dimensional Modelling
Dimensional ModellingDimensional Modelling
Dimensional Modelling
 
Ppt
PptPpt
Ppt
 
Data warehousing
Data warehousingData warehousing
Data warehousing
 
Data warehouse physical design
Data warehouse physical designData warehouse physical design
Data warehouse physical design
 
Star schema PPT
Star schema PPTStar schema PPT
Star schema PPT
 
Dimensional model | | Fact Tables | | Types
Dimensional model | | Fact Tables | | TypesDimensional model | | Fact Tables | | Types
Dimensional model | | Fact Tables | | Types
 
DATA WAREHOUSING AND DATA MINING
DATA WAREHOUSING AND DATA MININGDATA WAREHOUSING AND DATA MINING
DATA WAREHOUSING AND DATA MINING
 
OLAP & DATA WAREHOUSE
OLAP & DATA WAREHOUSEOLAP & DATA WAREHOUSE
OLAP & DATA WAREHOUSE
 
Data warehouse
Data warehouseData warehouse
Data warehouse
 
Data Warehouse Basic Guide
Data Warehouse Basic GuideData Warehouse Basic Guide
Data Warehouse Basic Guide
 
Data modeling star schema
Data modeling star schemaData modeling star schema
Data modeling star schema
 
Datawarehouse and OLAP
Datawarehouse and OLAPDatawarehouse and OLAP
Datawarehouse and OLAP
 
Data Warehouse Modeling
Data Warehouse ModelingData Warehouse Modeling
Data Warehouse Modeling
 
Date warehousing concepts
Date warehousing conceptsDate warehousing concepts
Date warehousing concepts
 
Data mart
Data martData mart
Data mart
 
Introduction to Data Warehousing
Introduction to Data WarehousingIntroduction to Data Warehousing
Introduction to Data Warehousing
 
ETL Process
ETL ProcessETL Process
ETL Process
 
Data Modeling PPT
Data Modeling PPTData Modeling PPT
Data Modeling PPT
 
Introduction to Data Warehouse
Introduction to Data WarehouseIntroduction to Data Warehouse
Introduction to Data Warehouse
 
Data mining and data warehousing
Data mining and data warehousingData mining and data warehousing
Data mining and data warehousing
 

Viewers also liked

White Paper - Data Warehouse Project Management
White Paper - Data Warehouse Project ManagementWhite Paper - Data Warehouse Project Management
White Paper - Data Warehouse Project ManagementDavid Walker
 
DATA WAREHOUSING
DATA WAREHOUSINGDATA WAREHOUSING
DATA WAREHOUSINGKing Julian
 
The Data Warehouse Lifecycle
The Data Warehouse LifecycleThe Data Warehouse Lifecycle
The Data Warehouse Lifecyclebartlowe
 
Gathering Business Requirements for Data Warehouses
Gathering Business Requirements for Data WarehousesGathering Business Requirements for Data Warehouses
Gathering Business Requirements for Data WarehousesDavid Walker
 
Project Planning and Management (Presentation)
Project Planning and Management (Presentation)Project Planning and Management (Presentation)
Project Planning and Management (Presentation)Trokon Bryant
 
Data Warehouse Evolution Roadshow
Data Warehouse Evolution RoadshowData Warehouse Evolution Roadshow
Data Warehouse Evolution RoadshowMapR Technologies
 
Information Architech and DWH with PowerDesigner
Information Architech and DWH with PowerDesignerInformation Architech and DWH with PowerDesigner
Information Architech and DWH with PowerDesignerSybase Türkiye
 
Lecture 13
Lecture 13Lecture 13
Lecture 13Shani729
 
Data warehouse master test plan
Data warehouse master test planData warehouse master test plan
Data warehouse master test planWayne Yaddow
 
5 project management project planning
5 project management  project planning5 project management  project planning
5 project management project planningYasirHamour
 
Business intelligence implementation case study
Business intelligence implementation case studyBusiness intelligence implementation case study
Business intelligence implementation case studyJennie Chen, CTP
 
Real World Business Intelligence and Data Warehousing
Real World Business Intelligence and Data WarehousingReal World Business Intelligence and Data Warehousing
Real World Business Intelligence and Data Warehousingukc4
 
Requirements Gathering and Discovery
Requirements Gathering and DiscoveryRequirements Gathering and Discovery
Requirements Gathering and DiscoverySean Larkin
 
Big Data Warehousing Meetup: Dimensional Modeling Still Matters!!!
Big Data Warehousing Meetup: Dimensional Modeling Still Matters!!!Big Data Warehousing Meetup: Dimensional Modeling Still Matters!!!
Big Data Warehousing Meetup: Dimensional Modeling Still Matters!!!Caserta
 

Viewers also liked (20)

Planning Data Warehouse
Planning Data WarehousePlanning Data Warehouse
Planning Data Warehouse
 
White Paper - Data Warehouse Project Management
White Paper - Data Warehouse Project ManagementWhite Paper - Data Warehouse Project Management
White Paper - Data Warehouse Project Management
 
DATA WAREHOUSING
DATA WAREHOUSINGDATA WAREHOUSING
DATA WAREHOUSING
 
The Data Warehouse Lifecycle
The Data Warehouse LifecycleThe Data Warehouse Lifecycle
The Data Warehouse Lifecycle
 
Gathering Business Requirements for Data Warehouses
Gathering Business Requirements for Data WarehousesGathering Business Requirements for Data Warehouses
Gathering Business Requirements for Data Warehouses
 
Project Planning and Management (Presentation)
Project Planning and Management (Presentation)Project Planning and Management (Presentation)
Project Planning and Management (Presentation)
 
Project Planning & Management
Project Planning & ManagementProject Planning & Management
Project Planning & Management
 
Data Warehouse Evolution Roadshow
Data Warehouse Evolution RoadshowData Warehouse Evolution Roadshow
Data Warehouse Evolution Roadshow
 
Information Architech and DWH with PowerDesigner
Information Architech and DWH with PowerDesignerInformation Architech and DWH with PowerDesigner
Information Architech and DWH with PowerDesigner
 
Lecture 13
Lecture 13Lecture 13
Lecture 13
 
Project Management
Project ManagementProject Management
Project Management
 
Lecture 1
Lecture 1Lecture 1
Lecture 1
 
Data warehouse master test plan
Data warehouse master test planData warehouse master test plan
Data warehouse master test plan
 
5 project management project planning
5 project management  project planning5 project management  project planning
5 project management project planning
 
Business intelligence implementation case study
Business intelligence implementation case studyBusiness intelligence implementation case study
Business intelligence implementation case study
 
Real World Business Intelligence and Data Warehousing
Real World Business Intelligence and Data WarehousingReal World Business Intelligence and Data Warehousing
Real World Business Intelligence and Data Warehousing
 
Retail Data Warehouse
Retail Data WarehouseRetail Data Warehouse
Retail Data Warehouse
 
Planning phase of project
Planning phase of projectPlanning phase of project
Planning phase of project
 
Requirements Gathering and Discovery
Requirements Gathering and DiscoveryRequirements Gathering and Discovery
Requirements Gathering and Discovery
 
Big Data Warehousing Meetup: Dimensional Modeling Still Matters!!!
Big Data Warehousing Meetup: Dimensional Modeling Still Matters!!!Big Data Warehousing Meetup: Dimensional Modeling Still Matters!!!
Big Data Warehousing Meetup: Dimensional Modeling Still Matters!!!
 

Similar to Orderly Approach for DWH Construction

Knowledge area 5 from IMBOK
Knowledge area 5 from IMBOKKnowledge area 5 from IMBOK
Knowledge area 5 from IMBOKNeha Chopra
 
System Analysis & Design (CHAPTER TWO) (1).ppt
System Analysis & Design (CHAPTER TWO) (1).pptSystem Analysis & Design (CHAPTER TWO) (1).ppt
System Analysis & Design (CHAPTER TWO) (1).pptAynetuTerefe2
 
ADV: Solving the data visualization dilemma
ADV: Solving the data visualization dilemmaADV: Solving the data visualization dilemma
ADV: Solving the data visualization dilemmaGrant Thornton LLP
 
DATA WAREHOUSE AND BIG DATA INTEGRATION
DATA WAREHOUSE AND BIG DATA INTEGRATIONDATA WAREHOUSE AND BIG DATA INTEGRATION
DATA WAREHOUSE AND BIG DATA INTEGRATIONijcsit
 
Agile Mumbai 2022 - Ashwinee Singh | Agile in AI or AI in Agile?
Agile Mumbai 2022 - Ashwinee Singh | Agile in AI or AI in Agile?Agile Mumbai 2022 - Ashwinee Singh | Agile in AI or AI in Agile?
Agile Mumbai 2022 - Ashwinee Singh | Agile in AI or AI in Agile?AgileNetwork
 
No matter how hard we try, planning is not perfect, and sometimes .docx
No matter how hard we try, planning is not perfect, and sometimes .docxNo matter how hard we try, planning is not perfect, and sometimes .docx
No matter how hard we try, planning is not perfect, and sometimes .docxhenrymartin15260
 
2 data warehouse life cycle golfarelli
2 data warehouse life cycle golfarelli2 data warehouse life cycle golfarelli
2 data warehouse life cycle golfarellitruongthuthuy47
 
How To Develop A Project Management Plan
How To Develop A Project Management PlanHow To Develop A Project Management Plan
How To Develop A Project Management PlanOrangescrum
 
Enabling a Bimodal IT Framework for Advanced Analytics with Data Virtualization
Enabling a Bimodal IT Framework for Advanced Analytics with Data VirtualizationEnabling a Bimodal IT Framework for Advanced Analytics with Data Virtualization
Enabling a Bimodal IT Framework for Advanced Analytics with Data VirtualizationDenodo
 
Adapting to Uncertain and Evolving Requirements: The Case of Business-Driven ...
Adapting to Uncertain and Evolving Requirements: The Case of Business-Driven ...Adapting to Uncertain and Evolving Requirements: The Case of Business-Driven ...
Adapting to Uncertain and Evolving Requirements: The Case of Business-Driven ...Eric Yu
 
Sabiron PLM Project Methodology.pdf
Sabiron PLM Project Methodology.pdfSabiron PLM Project Methodology.pdf
Sabiron PLM Project Methodology.pdfBrion Carroll (II)
 
Managing projects
Managing projectsManaging projects
Managing projectsNovoraj Roy
 
requirement gathering
requirement gatheringrequirement gathering
requirement gatheringSaeedMat
 

Similar to Orderly Approach for DWH Construction (20)

Knowledge area 5 from IMBOK
Knowledge area 5 from IMBOKKnowledge area 5 from IMBOK
Knowledge area 5 from IMBOK
 
System Analysis & Design (CHAPTER TWO) (1).ppt
System Analysis & Design (CHAPTER TWO) (1).pptSystem Analysis & Design (CHAPTER TWO) (1).ppt
System Analysis & Design (CHAPTER TWO) (1).ppt
 
ADV: Solving the data visualization dilemma
ADV: Solving the data visualization dilemmaADV: Solving the data visualization dilemma
ADV: Solving the data visualization dilemma
 
DATA WAREHOUSE AND BIG DATA INTEGRATION
DATA WAREHOUSE AND BIG DATA INTEGRATIONDATA WAREHOUSE AND BIG DATA INTEGRATION
DATA WAREHOUSE AND BIG DATA INTEGRATION
 
Data Warehouse and Big Data Integration
Data Warehouse and Big Data IntegrationData Warehouse and Big Data Integration
Data Warehouse and Big Data Integration
 
DATA WAREHOUSE AND BIG DATA INTEGRATION
DATA WAREHOUSE AND BIG DATA INTEGRATIONDATA WAREHOUSE AND BIG DATA INTEGRATION
DATA WAREHOUSE AND BIG DATA INTEGRATION
 
Agile Mumbai 2022 - Ashwinee Singh | Agile in AI or AI in Agile?
Agile Mumbai 2022 - Ashwinee Singh | Agile in AI or AI in Agile?Agile Mumbai 2022 - Ashwinee Singh | Agile in AI or AI in Agile?
Agile Mumbai 2022 - Ashwinee Singh | Agile in AI or AI in Agile?
 
No matter how hard we try, planning is not perfect, and sometimes .docx
No matter how hard we try, planning is not perfect, and sometimes .docxNo matter how hard we try, planning is not perfect, and sometimes .docx
No matter how hard we try, planning is not perfect, and sometimes .docx
 
2 data warehouse life cycle golfarelli
2 data warehouse life cycle golfarelli2 data warehouse life cycle golfarelli
2 data warehouse life cycle golfarelli
 
03_AgilePM.pptx
03_AgilePM.pptx03_AgilePM.pptx
03_AgilePM.pptx
 
Bringing project learning to forefront
Bringing project learning to forefrontBringing project learning to forefront
Bringing project learning to forefront
 
Bringing project learning to forefront
Bringing project learning to forefrontBringing project learning to forefront
Bringing project learning to forefront
 
How To Develop A Project Management Plan
How To Develop A Project Management PlanHow To Develop A Project Management Plan
How To Develop A Project Management Plan
 
MTech- Viva_Voce
MTech- Viva_VoceMTech- Viva_Voce
MTech- Viva_Voce
 
Enabling a Bimodal IT Framework for Advanced Analytics with Data Virtualization
Enabling a Bimodal IT Framework for Advanced Analytics with Data VirtualizationEnabling a Bimodal IT Framework for Advanced Analytics with Data Virtualization

Orderly Approach for DWH Construction

  • 1. Orderly Approach for DWH construction 2/23/2012 3.Planning & Project management/D.S.Jagli 1
  • 2. Topics to be covered
      1. How is it different?
      2. Life-cycle approach
      3. The development phases
      4. Dimensional analysis
      5. Dimensional modeling
         i. Star schema
         ii. Snowflake schema
  • 3. 3. Planning & Project Management
      Reasons DWH projects fail:
        1. Improper planning
        2. Inadequate project management
      Planning for a data warehouse is therefore necessary.
      I. Key issues that need to be planned:
        1. Value and expectations
        2. Risk assessment
        3. Top-down or bottom-up
        4. Build or buy
        5. Single vendor or best of breed
      II. Business requirements, not technology
      III. Top management support
      IV. Justification
  • 4. 3. Planning & Project Management
      Example for a DWH project: outline for the overall plan
        1. Introduction
        2. Mission statement
        3. Scope
        4. Goals & objectives
        5. Key issues & options
        6. Value & expectations
        7. Justification
        8. Executive sponsorship
        9. Implementation strategy
        10. Tentative schedule
        11. Project authorization
  • 5. 3.1 How is it different?
      A DWH project differs from an OLTP system project.
      Distinguishing features of a DWH that pose challenges for project management:
        1. Data acquisition
        2. Data storage
        3. Information delivery
  • 7. 3.2 The Life-Cycle Approach
      Figure: DW functional components and the SDLC
  • 8. DWH Project Plan: Sample Outline
  • 9. 3.3 DWH Development Phases
  • 10. 3.3 DWH Development Phases
      1) Project plan
      2) Requirements definition
      3) Design
      4) Construction
      5) Deployment
      6) Growth and maintenance
      Interleaved within the design and construction phases are the three tracks, along with the definition of the architecture and the establishment of the infrastructure.
  • 11. 3.4 Dimensional Analysis
      A data warehouse is an information delivery system.
      It is not about technology but about solving users' problems.
      It provides strategic information to its users.
      In the requirements-definition phase, concentrate on what information the users need, not on how the required information will be provided.
  • 12. Dimensional Nature of the DWH
      1. Usage of information is unpredictable
         When stating requirements for an operational system, users can give precise details of the required functions, information content, and usage patterns. For a data warehouse they usually cannot.
      2. Dimensional nature of business data
         Even though users cannot fully describe what they want in a data warehouse, they can give very important insights into how they think about the business.
  • 13. Managers think in business dimensions: example
  • 14. Dimensional Nature of Business Data
  • 15. Dimensional Nature of Business Data
  • 16. Examples of Business Dimensions
  • 17. Examples of Business Dimensions
  • 18. Information Packages: A New Concept
      A novel idea is introduced for determining and recording the information requirements of a data warehouse.
      This concept gives concrete form to the various insights, nebulous thoughts, and opinions expressed while requirements are being collected.
      The information packages put together while collecting requirements are very useful for taking the development of the data warehouse into the next phases.
  • 19. Requirements Not Fully Determinate
      Information packages enable us to:
        1. Define the common subject areas
        2. Design key business metrics
        3. Decide how data must be presented
        4. Determine how users will aggregate or roll up
        5. Decide the data quantity for user analysis or query
        6. Decide how data will be accessed
        7. Establish data granularity
        8. Estimate data warehouse size
        9. Determine the frequency for data refreshing
        10. Determine how information must be packaged
  • 20. An information package
  • 21. Business Dimensions
      Business dimensions form the underlying basis of the new methodology for requirements definition.
      Data must be stored so that it can be analyzed along the business dimensions.
      The business dimensions and their hierarchical levels form the basis for all further phases.
  • 22. Dimension Hierarchies/Categories
      Examples:
        1) Product: model name, model year, package styling, product line, product category, exterior color, interior color, first model year
        2) Dealer: dealer name, city, state, single brand flag, date of first operation
        3) Customer demographics: age, gender, income range, marital status, household size, vehicles owned, home value, own or rent
        4) Payment method: finance type, term in months, interest rate, agent
        5) Time: date, month, quarter, year, day of week, day of month, season, holiday flag
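To make the Time hierarchy above concrete, the following Python sketch generates time-dimension rows carrying those attributes. It is an illustration, not the deck's method. The season mapping (Northern Hemisphere meteorological seasons) and the `HOLIDAYS` set are hypothetical placeholders. A real warehouse would take them from business rules or a calendar source.

```python
from datetime import date, timedelta

# Hypothetical holiday list; a real warehouse would load this from a calendar source.
HOLIDAYS = {date(2012, 1, 1), date(2012, 12, 25)}

# Assumption: Northern Hemisphere meteorological seasons, keyed by month number.
SEASONS = {12: "Winter", 1: "Winter", 2: "Winter",
           3: "Spring", 4: "Spring", 5: "Spring",
           6: "Summer", 7: "Summer", 8: "Summer",
           9: "Autumn", 10: "Autumn", 11: "Autumn"}


def build_time_dimension(start, end):
    """Return one row per calendar day from start to end, inclusive."""
    rows = []
    day = start
    key = 1  # surrogate key: a simple increasing integer
    while day <= end:
        rows.append({
            "time_key": key,
            "date": day,
            "month": day.month,
            "quarter": (day.month - 1) // 3 + 1,
            "year": day.year,
            "day_of_week": day.strftime("%A"),
            "day_of_month": day.day,
            "season": SEASONS[day.month],
            "holiday_flag": day in HOLIDAYS,
        })
        day += timedelta(days=1)
        key += 1
    return rows


rows = build_time_dimension(date(2012, 2, 20), date(2012, 2, 26))
```

Precomputing attributes such as quarter, season and holiday flag once in the dimension means queries can filter or group on them directly. They do not have to derive them from a raw date each time.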
  • 23. Key Business Metrics or Facts
      The numbers users analyze are the measurements, or metrics, that gauge the success of their departments.
      These facts tell users how well their departments are meeting their departmental objectives.
  • 24. Example: Automobile Sales
      A meaningful and useful set of metrics for analyzing automobile sales:
        - Actual sale price
        - MSRP sale price
        - Options price
        - Full price
        - Dealer add-ons
        - Dealer credits
        - Dealer invoice
        - Amount of down payment
        - Manufacturer proceeds
        - Amount financed
  • 25. Star Schema and Snowflake Schema
  • 26. From Requirements to Data Design
      1. The requirements definition completely drives the data design for the data warehouse.
      2. A group of data elements forms a data structure.
      3. Logical data design includes determining the various data elements and data structures and establishing the relationships among the data structures.
      4. The information package diagrams form the basis for the logical data design of the data warehouse.
      5. The data design process results in a dimensional data model.
  • 27. From Requirements to Data Design
  • 28. Dimensional Modeling Basics: formation of the automaker sales fact table
  • 29. Formation of the automaker dimension tables
  • 30. Concept of Keys for Dimension Tables
      Surrogate keys
        1. A surrogate key is the primary key of a dimension table and is independent of any keys provided by the source data systems.
        2. Surrogate keys are created and maintained in the data warehouse and should not encode any information about the contents of records.
        3. Automatically increasing integers make good surrogate keys.
        4. The original key for each record is carried in the dimension table but is not used as the primary key.
        5. Surrogate keys provide the means to maintain data warehouse information when dimensions change.
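Points 3 to 5 can be illustrated with a minimal in-memory sketch. The warehouse hands out increasing integers as surrogate keys and stores the source (natural) key as an ordinary column. When a dealer's attributes change, it adds a new row under a fresh surrogate key rather than overwriting the old one. This keep-history approach is commonly called a type 2 slowly changing dimension. The slides do not name a specific technique, so treat it as one possible approach. The class `DealerDimension`, its `upsert` method, and the dealer code `"D-042"` are all hypothetical.

```python
from itertools import count


class DealerDimension:
    """Minimal in-memory dimension table keyed by surrogate keys.

    The natural (source-system) key is stored as an ordinary column; the
    warehouse assigns its own increasing integer as the primary key.
    """

    def __init__(self):
        self._next_key = count(1)  # automatically increasing integers
        self.rows = {}             # surrogate key -> row dict
        self._current = {}         # natural key -> surrogate key of current row

    def upsert(self, natural_key, attributes):
        """Insert a new dealer, or add a new version if its attributes changed."""
        current_key = self._current.get(natural_key)
        if current_key is not None:
            current = self.rows[current_key]
            if current["attributes"] == attributes:
                return current_key         # nothing changed: reuse the key
            current["is_current"] = False  # keep the old version for history
        key = next(self._next_key)
        self.rows[key] = {"natural_key": natural_key,
                          "attributes": dict(attributes),
                          "is_current": True}
        self._current[natural_key] = key
        return key


dealers = DealerDimension()
k1 = dealers.upsert("D-042", {"city": "Pune", "state": "MH"})
k2 = dealers.upsert("D-042", {"city": "Pune", "state": "MH"})    # unchanged
k3 = dealers.upsert("D-042", {"city": "Mumbai", "state": "MH"})  # dealer moved
```

Fact rows loaded before the move keep pointing at key 1 and later ones at key 2. History is therefore preserved even though the source system reuses the same dealer code.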
  • 31. Concept of Keys for Dimension Tables
      Business keys (natural keys)
        Business keys carry meaning. They can be generated from the data in the source system or taken as-is from a source-system field.
  • 32. Criteria for combining the tables into a dimensional model
      1. The model should provide the best data access.
      2. The whole model must be query-centric.
      3. It must be optimized for queries and analyses.
      4. The model must show that the dimension tables interact with the fact table.
      5. It should also be structured so that every dimension can interact equally with the fact table.
      6. The model should allow drilling down or rolling up along dimension hierarchies.
  • 33. The Dimensional Model: A STAR Schema
      Given these criteria, a dimensional model with the fact table in the middle and the dimension tables arranged around it satisfies the conditions.
  • 34. Case study: STAR schema for automaker sales
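The automaker sales star schema can be sketched as SQL DDL, run here through Python's built-in `sqlite3` module. The columns are a trimmed selection of the dimension attributes and metrics listed earlier. All table and column names are assumptions and are not copied from the case-study figure. The fact table sits in the middle with one foreign key per dimension plus the numeric metrics.

```python
import sqlite3

DDL = """
CREATE TABLE product_dim (
    product_key      INTEGER PRIMARY KEY,  -- surrogate key
    model_name       TEXT,
    model_year       INTEGER,
    product_line     TEXT,
    product_category TEXT
);
CREATE TABLE dealer_dim (
    dealer_key   INTEGER PRIMARY KEY,
    dealer_name  TEXT,
    city         TEXT,
    state        TEXT
);
CREATE TABLE customer_dim (
    customer_key INTEGER PRIMARY KEY,
    age          INTEGER,
    gender       TEXT,
    income_range TEXT
);
CREATE TABLE payment_method_dim (
    payment_key   INTEGER PRIMARY KEY,
    finance_type  TEXT,
    term_months   INTEGER,
    interest_rate REAL
);
CREATE TABLE time_dim (
    time_key  INTEGER PRIMARY KEY,
    full_date TEXT,
    month     INTEGER,
    quarter   INTEGER,
    year      INTEGER
);
-- Fact table in the middle: one foreign key per dimension plus numeric metrics.
CREATE TABLE auto_sales_fact (
    product_key  INTEGER REFERENCES product_dim(product_key),
    dealer_key   INTEGER REFERENCES dealer_dim(dealer_key),
    customer_key INTEGER REFERENCES customer_dim(customer_key),
    payment_key  INTEGER REFERENCES payment_method_dim(payment_key),
    time_key     INTEGER REFERENCES time_dim(time_key),
    actual_sale_price REAL,
    msrp_sale_price   REAL,
    options_price     REAL,
    dealer_add_ons    REAL,
    dealer_credits    REAL,
    down_payment      REAL,
    amount_financed   REAL,
    PRIMARY KEY (product_key, dealer_key, customer_key, payment_key, time_key)
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(DDL)
tables = [name for (name,) in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
```

Every dimension joins to the fact table directly and in the same way. This is the "every dimension can interact equally with the fact table" criterion from the previous slides.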
  • 35. E-R Modeling Versus Dimensional Modeling
      | # | OLTP systems (E-R modeling) | Data warehouse (dimensional modeling) |
      |---|---|---|
      | 1 | Capture details of events and transactions | Meant to answer questions on the overall process |
      | 2 | Focus on individual events | Focus on how managers view the business |
      | 3 | A window into micro-level transactions | Focus on business trends |
      | 4 | A picture at the detail level is necessary to run the business | Information is centered around a business process |
      | 5 | Suitable only for questions at the transaction level | Answers show how the business measures the process |
      | 6 | Data consistency, non-redundancy, and efficient data storage are critical | The measures are studied in many ways along several business dimensions |
  • 36. E-R Modeling Versus Dimensional Modeling
      Figures: E-R modeling for OLTP systems; dimensional modeling for the data warehouse
  • 38. Star Schemas
      A data modeling technique for mapping multidimensional decision-support data into a relational database.
      Current relational modeling techniques do not serve the needs of advanced data requirements.
      Four components:
        - Facts
        - Dimensions
        - Attributes
        - Attribute hierarchies
  • 39. Facts
      1. Numeric measurements (values) that represent a specific business aspect or activity.
      2. Stored in a fact table at the center of the star schema.
      3. The fact table contains facts that are linked through their dimensions.
      4. Updated periodically with data from operational databases.
  • 40. Dimensions
      1. Qualifying characteristics that provide additional perspectives on a given fact.
         DSS data is almost always viewed in relation to other data.
      2. Dimensions are normally stored in dimension tables.
  • 41. Attributes
      1. Dimension tables contain attributes.
      2. Attributes are used to search, filter, or classify facts.
      3. Dimensions provide descriptive characteristics about the facts through their attributes.
      4. Common business attributes must be defined that will be used to narrow a search, group information, or describe dimensions (e.g., time, location, product).
      5. There is no mathematical limit to the number of dimensions (3-D makes it easy to model).
  • 42. Attribute Hierarchies
      1. Provide a top-down data organization used for:
         - Aggregation
         - Drill-down / roll-up data analysis
      2. Attributes from different dimensions can be grouped to form a hierarchy.
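Roll-up and drill-down along a year > quarter > month time hierarchy can be sketched with plain Python dictionaries instead of a database. Rolling up sums the measure at a shallower level of the hierarchy, and drilling down moves to a deeper one. The sales figures and the `aggregate` helper are hypothetical.

```python
from collections import defaultdict

# Hypothetical fact rows: (year, quarter, month, sale_amount).
sales = [
    (2011, 4, 10, 120.0),
    (2011, 4, 11, 80.0),
    (2011, 4, 12, 100.0),
    (2012, 1, 1, 90.0),
    (2012, 1, 2, 60.0),
]

# Depth of each level in the time hierarchy (year is the top).
LEVELS = {"year": 1, "quarter": 2, "month": 3}


def aggregate(rows, level):
    """Sum the measure at the given hierarchy level.

    Rolling up moves to a shallower level (month -> quarter -> year);
    drilling down moves to a deeper one.
    """
    depth = LEVELS[level]
    totals = defaultdict(float)
    for *path, amount in rows:
        totals[tuple(path[:depth])] += amount
    return dict(totals)


by_month = aggregate(sales, "month")
by_quarter = aggregate(sales, "quarter")
by_year = aggregate(sales, "year")
```

Each level's totals add up to the level above it, so an analyst can move between levels without the numbers disagreeing.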
  • 43. Concept of Keys for the Star Schema
      Surrogate keys
        - Surrogate keys are simply system-generated sequence numbers, independent of any keys provided by the source data systems.
        - They have no built-in meaning.
        - They are created and maintained in the data warehouse and should not encode any information about the contents of records.
        - Automatically increasing integers make good surrogate keys.
        - The original key for each record is carried in the dimension table but is not used as the primary key.
      Business keys
      Primary keys
        - Each row in a dimension table is identified by a unique value of an attribute designated as the primary key of the dimension.
      Foreign keys
        - Each dimension table is in a one-to-many relationship with the central fact table, so the primary key of each dimension table must be a foreign key in the fact table.
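This sketch shows the three kinds of key in Python's built-in `sqlite3`. The dimension gets a system-generated surrogate primary key and keeps the source system's business key as an ordinary unique column. The fact table refers to the dimension through a foreign key. Table names, column names and the model code `"SRC-1001"` are hypothetical. Note that SQLite enforces foreign keys only after `PRAGMA foreign_keys = ON`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when this is on
conn.executescript("""
CREATE TABLE product_dim (
    product_key INTEGER PRIMARY KEY,  -- surrogate key, system-generated
    model_id    TEXT UNIQUE,          -- business (natural) key from the source system
    model_name  TEXT
);
CREATE TABLE sales_fact (
    product_key INTEGER NOT NULL REFERENCES product_dim(product_key),
    sale_price  REAL
);
""")

# Let SQLite generate the surrogate key by omitting product_key.
cur = conn.execute(
    "INSERT INTO product_dim (model_id, model_name) VALUES (?, ?)",
    ("SRC-1001", "Hypothetical Sedan"))
new_key = cur.lastrowid
conn.execute("INSERT INTO sales_fact VALUES (?, ?)", (new_key, 21000.0))

try:
    # 999 is not a product_key in the dimension, so this insert is rejected.
    conn.execute("INSERT INTO sales_fact VALUES (?, ?)", (999, 18000.0))
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
```

The fact row stores only the meaningless integer `new_key`. If the source system later renames or recodes the model, the fact table is unaffected.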
  • 44. Star Schema for Sales
      Figure labels: dimension tables, fact table
  • 45. Star Schema Representation
      - Facts and dimensions are represented by physical tables in the data warehouse database.
      - Fact tables are related to each dimension table in a many-to-one relationship (primary/foreign key relationships).
      - A fact table is related to many dimension tables.
      - The primary key of the fact table is a composite key made up of the primary keys of the dimension tables.
      - Each fact table is designed to answer a specific DSS question.
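Here is a composite fact-table primary key built from the dimension keys, again in SQLite. The assumed grain is one row per product, dealer and day, so a second row for the same combination is rejected. Table and column names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sales_fact (
    product_key INTEGER NOT NULL,
    dealer_key  INTEGER NOT NULL,
    time_key    INTEGER NOT NULL,
    units_sold  INTEGER,
    -- The fact table's key is the combination of its dimension keys.
    PRIMARY KEY (product_key, dealer_key, time_key)
);
""")

conn.execute("INSERT INTO sales_fact VALUES (1, 10, 100, 3)")
conn.execute("INSERT INTO sales_fact VALUES (1, 10, 101, 2)")  # different day: allowed

try:
    # Same product, dealer and day as the first row: violates the composite key.
    conn.execute("INSERT INTO sales_fact VALUES (1, 10, 100, 5)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
```

Declaring the key this way makes the database enforce the table's grain. Loading a duplicate at the chosen level of detail fails instead of silently double-counting.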
  • 46. Star Schema  The fact table is always the larges table in the star schema.  Each dimension record is related to thousand of fact records.  Star Schema facilitated data retrieval functions.  DBMS first searches the Dimension Tables before the larger fact table 2/23/2012 3.Planning & Project management/D.S.Jagli 46
  • 47. Star Schema: Advantages
      1. Easy to understand
      2. Optimizes navigation
      3. Most suitable for query processing
  • 49. The Snowflake Schema
      "Snowflaking" is a method of normalizing the dimension tables in a STAR schema.
  • 50. Sales: a simple STAR schema
  • 51. Product dimension: partially normalized
  • 52. When to Snowflake
      - The principle behind snowflaking is normalizing the dimension tables by moving low-cardinality attributes out into separate tables.
      - Similarly, some situations provide opportunities to separate out a set of attributes and form a subdimension.
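The following SQLite sketch snowflakes a product dimension by moving the low-cardinality category attribute into its own table. Table names, model names and figures are hypothetical. Each category name is now stored once rather than once per product. In exchange, a revenue-by-category query needs an extra join (fact to product to category).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Low-cardinality attribute (category) moved out of the product dimension.
CREATE TABLE category_dim (
    category_key  INTEGER PRIMARY KEY,
    category_name TEXT
);
CREATE TABLE product_dim (
    product_key  INTEGER PRIMARY KEY,
    model_name   TEXT,
    category_key INTEGER REFERENCES category_dim(category_key)
);
CREATE TABLE sales_fact (
    product_key INTEGER REFERENCES product_dim(product_key),
    sale_price  REAL
);

INSERT INTO category_dim VALUES (1, 'Passenger'), (2, 'Commercial');
INSERT INTO product_dim  VALUES (1, 'Model A', 1), (2, 'Model B', 1), (3, 'Model C', 2);
INSERT INTO sales_fact   VALUES (1, 100), (2, 150), (3, 200), (1, 50);
""")

# Revenue by category now needs two joins instead of one: fact -> product -> category.
query = """
SELECT c.category_name, SUM(f.sale_price)
FROM sales_fact AS f
JOIN product_dim  AS p ON p.product_key  = f.product_key
JOIN category_dim AS c ON c.category_key = p.category_key
GROUP BY c.category_name
ORDER BY c.category_name
"""
by_category = conn.execute(query).fetchall()
```

This is the trade-off summarized on the next slide. Renaming a category touches one row, but every category-level query pays for the additional join.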
  • 53. Advantages and Disadvantages
      Advantages
        - Small savings in storage space
        - Normalized structures are easier to update and maintain
      Disadvantages
        - The schema is less intuitive, and end users are put off by the complexity
        - Browsing through the contents is more difficult
        - Query performance degrades because of the additional joins
  • 54. Thank you