Introduction to Data Warehousing

              December 20, 2012



Tameem Ahmad
M.Tech. (F)
ZHCET, AMU, Aligarh
References:
   • W. H. Inmon, “Building the Data Warehouse” (Third Edition), New
     York: John Wiley & Sons, 2002.
   • J. Han and M. Kamber, “Data Mining: Concepts and Techniques”, 2000.
   • http://www.data-warehouse-online.com/ [Accessed: November
     4, 2012]
   • Mary Breslin, “Data Warehousing Battle of the Giants: Comparing the
     Basics of the Kimball and Inmon Models”,
     http://www.bibestpractices.com/view-articles/4768




12/27/2012                        Tameem Ahmad                              2
Plan for the Presentation
   •   Necessity of Data Warehousing (why is it needed?)
   •   What is Data Warehousing?
   •   Architecture
   •   Schema
   •   How to build a Data Warehouse (components)
   •   Data Warehousing Tools




Why Data Warehouse?

   • Necessity is the mother of invention…


Scenario

   • ABC Pvt. Ltd. is a company with branches
     at Mumbai, Delhi, Chennai and Bangalore.
     The Sales Manager wants a quarterly sales
     report. Each branch has a separate
     operational system.




Scenario: ABC Pvt. Ltd.

   • Branches, each with its own operational system:
     Mumbai, Delhi, Chennai, Bangalore
   • Request from the Sales Manager:
     sales per item type per branch for the first quarter.

Solution: ABC Pvt. Ltd.

   • Extract sales information from each
     database.
   • Store the information in a common
     repository at a single site.
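
The two steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the branch extracts, the row layout (item type, quarter, amount) and the function names are hypothetical and do not come from the slides.

```python
from collections import defaultdict

# Hypothetical per-branch extracts: each operational system exports
# (item_type, quarter, amount) rows. All values are made up for illustration.
BRANCH_EXTRACTS = {
    "Mumbai": [("Electronics", "Q1", 1200.0), ("Grocery", "Q1", 300.0)],
    "Delhi": [("Electronics", "Q1", 800.0)],
    "Chennai": [("Grocery", "Q1", 450.0), ("Grocery", "Q2", 500.0)],
    "Bangalore": [("Electronics", "Q1", 950.0)],
}


def build_repository(extracts):
    """Copy every branch's rows into one common list, tagging the branch."""
    repository = []
    for branch, rows in extracts.items():
        for item_type, quarter, amount in rows:
            repository.append(
                {"branch": branch, "item_type": item_type,
                 "quarter": quarter, "amount": amount}
            )
    return repository


def sales_per_item_per_branch(repository, quarter):
    """Total sales per (branch, item type) for one quarter."""
    totals = defaultdict(float)
    for row in repository:
        if row["quarter"] == quarter:
            totals[(row["branch"], row["item_type"])] += row["amount"]
    return dict(totals)


repository = build_repository(BRANCH_EXTRACTS)
q1_report = sales_per_item_per_branch(repository, "Q1")
for (branch, item_type), total in sorted(q1_report.items()):
    print(f"{branch:10s} {item_type:12s} {total:8.1f}")
```

Once the rows sit in one place, the manager's report becomes a single aggregation instead of four separate queries against four systems.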




Solution: ABC Pvt. Ltd.

   • Branch databases (Mumbai, Delhi, Chennai, Bangalore)
     → Data Warehouse → Query & Analysis tools → Report → Sales Manager

Data Warehousing…
   • Definition
     A data warehouse is a
        – subject-oriented,
        – integrated,
        – time-variant,
        – nonvolatile
     collection of data in support of management’s decision-making
     process.




Subject-oriented
   • A data warehouse is organized around subjects such as
     sales, product, customer.
   • It focuses on modeling and analysis of data for decision
     makers.
   • It excludes data that is not useful in the decision support process.




Integration
  • Data Warehouse is constructed by integrating multiple
    heterogeneous sources.
  • Data preprocessing is applied to ensure consistency.
  • Flow: sources (RDBMS, Legacy System, Flat File)
    → Data Processing and Data Transformation → Data Warehouse
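
As a rough sketch of what "ensuring consistency" can mean, the Python below merges records from three differently encoded sources into one format. The record layouts, the gender codes and the paise-to-rupee conversion are assumptions chosen for illustration; the slides do not specify them.

```python
# Hypothetical source records: each system encodes gender and amounts differently.
rdbms_rows = [{"cust": "C1", "gender": "M", "amount_inr": 1500.0}]
legacy_rows = [{"cust": "C2", "gender": "1", "amount_paise": 250000}]
flat_file_lines = ["C3,female,99.50"]

# Assumed mapping from each source's encoding to one warehouse encoding.
GENDER_MAP = {"m": "M", "male": "M", "1": "M", "f": "F", "female": "F", "0": "F"}


def normalize_gender(value):
    """Map any known source encoding to 'M' or 'F'."""
    return GENDER_MAP[str(value).strip().lower()]


def integrate():
    """Return all source records in one consistent format."""
    unified = []
    for r in rdbms_rows:
        unified.append({"cust": r["cust"],
                        "gender": normalize_gender(r["gender"]),
                        "amount_inr": float(r["amount_inr"])})
    for r in legacy_rows:
        # The legacy system is assumed to store amounts in paise.
        unified.append({"cust": r["cust"],
                        "gender": normalize_gender(r["gender"]),
                        "amount_inr": r["amount_paise"] / 100})
    for line in flat_file_lines:
        cust, gender, amount = line.split(",")
        unified.append({"cust": cust,
                        "gender": normalize_gender(gender),
                        "amount_inr": float(amount)})
    return unified


rows = integrate()
for row in rows:
    print(row)
```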




Time-variant
   • Provides information from a historical perspective, e.g. the
     past 5-10 years.




Nonvolatile
   • Once recorded, data is not updated.
   • A data warehouse requires only two operations
     in data accessing
        – Initial loading of data
        – Access of data
   • Flow: load → Data Warehouse → access
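
The load/access restriction can be illustrated with a small append-only container. This is a toy sketch, not how a warehouse server enforces nonvolatility; the class name `NonvolatileStore` and its methods are hypothetical.

```python
class NonvolatileStore:
    """Toy store that exposes only the two operations named above."""

    def __init__(self):
        self._rows = []

    def load(self, rows):
        """Append new rows; existing rows are never touched."""
        self._rows.extend(dict(row) for row in rows)
        return len(self._rows)

    def access(self, predicate=lambda row: True):
        """Return copies of matching rows, so callers cannot edit the store."""
        return [dict(row) for row in self._rows if predicate(row)]


store = NonvolatileStore()
store.load([{"branch": "Delhi", "quarter": "Q1", "amount": 800.0}])
store.load([{"branch": "Delhi", "quarter": "Q2", "amount": 900.0}])

result = store.access(lambda row: row["quarter"] == "Q1")
result[0]["amount"] = 0.0  # changes only the caller's copy
print(store.access(lambda row: row["quarter"] == "Q1"))
```

There is deliberately no update or delete method: new periods are added by further loads, which also preserves the historical view described under "Time-variant".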




Data Warehousing Architecture


   • Data Warehouse server
        – almost always a relational DBMS, rarely flat files
   • OLAP servers
        – support and operate on multi-dimensional data structures
   • Clients
        – Query and reporting tools
        – Analysis tools
        – Data mining tools




Data Warehousing Schema
   • Star Schema
   • Snowflake Schema




Measures & Dimensions

   • Measures – Units sold, Amount

   • Dimensions – Product, Time, Region
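
To make the distinction concrete: measures are the numbers being summed, dimensions are what they are grouped by. The sketch below uses made-up sample rows and a hypothetical helper `summarize`.

```python
from collections import defaultdict

# Made-up fact rows: three dimension values plus two measures each.
facts = [
    {"product": "Pen", "time": "2012-Q1", "region": "North", "units_sold": 10, "amount": 50.0},
    {"product": "Pen", "time": "2012-Q1", "region": "South", "units_sold": 4, "amount": 20.0},
    {"product": "Book", "time": "2012-Q1", "region": "North", "units_sold": 2, "amount": 60.0},
    {"product": "Pen", "time": "2012-Q2", "region": "North", "units_sold": 7, "amount": 35.0},
]


def summarize(rows, dimensions):
    """Sum both measures for every combination of the chosen dimensions."""
    totals = defaultdict(lambda: {"units_sold": 0, "amount": 0.0})
    for row in rows:
        key = tuple(row[d] for d in dimensions)
        totals[key]["units_sold"] += row["units_sold"]
        totals[key]["amount"] += row["amount"]
    return dict(totals)


by_product = summarize(facts, ["product"])
by_product_region = summarize(facts, ["product", "region"])
print(by_product)
print(by_product_region)
```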




Star Schema
   • A single, large and central fact table and one table for each
     dimension.
   • Every fact points to one tuple in each of the dimensions
     and has additional attributes.
   • Does not capture hierarchies directly.




Star Schema (Contd.)

   • Fact Table
        – Store Key, Product Key, Period Key, Units, Price
   • Store Dimension
        – Store Key, Store Name, City, State, Region
   • Time Dimension
        – Period Key, Year, Quarter, Month
   • Product Dimension
        – Product Key, Product Desc

Benefits: Easy to understand, easy to define hierarchies, reduces the
          number of physical joins.
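
The star layout above can be tried out with Python's built-in `sqlite3` module. This is a minimal sketch: the table names (`sales_fact`, `store_dim`, `time_dim`, `product_dim`), the sample rows, and the reading of "Price" as a unit price are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE store_dim   (store_key INTEGER PRIMARY KEY, store_name TEXT,
                          city TEXT, state TEXT, region TEXT);
CREATE TABLE time_dim    (period_key INTEGER PRIMARY KEY, year INTEGER,
                          quarter TEXT, month TEXT);
CREATE TABLE product_dim (product_key INTEGER PRIMARY KEY, product_desc TEXT);
CREATE TABLE sales_fact  (store_key   INTEGER REFERENCES store_dim(store_key),
                          product_key INTEGER REFERENCES product_dim(product_key),
                          period_key  INTEGER REFERENCES time_dim(period_key),
                          units INTEGER, price REAL);
""")

# Made-up sample data.
conn.executemany("INSERT INTO store_dim VALUES (?, ?, ?, ?, ?)", [
    (1, "Store A", "Mumbai", "Maharashtra", "West"),
    (2, "Store B", "Delhi", "Delhi", "North"),
])
conn.executemany("INSERT INTO time_dim VALUES (?, ?, ?, ?)", [
    (1, 2012, "Q1", "Jan"),
    (2, 2012, "Q2", "Apr"),
])
conn.executemany("INSERT INTO product_dim VALUES (?, ?)", [(1, "Pen"), (2, "Book")])
conn.executemany("INSERT INTO sales_fact VALUES (?, ?, ?, ?, ?)", [
    (1, 1, 1, 10, 5.0),
    (1, 2, 1, 2, 30.0),
    (2, 1, 1, 4, 5.0),
    (2, 1, 2, 7, 5.0),
])

# Each dimension is exactly one join away from the fact table.
star_rows = conn.execute("""
SELECT s.region, t.quarter, SUM(f.units), SUM(f.units * f.price)
FROM sales_fact f
JOIN store_dim s ON s.store_key = f.store_key
JOIN time_dim  t ON t.period_key = f.period_key
GROUP BY s.region, t.quarter
ORDER BY s.region, t.quarter
""").fetchall()

for row in star_rows:
    print(row)
```

Region lives directly in the store dimension here, so reporting by region needs only the single join to `store_dim`.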
Snowflake Schema
   • Variant of star schema model.
   • A single, large and central fact table and one or more tables
     for each dimension.
   • Dimension tables are normalized i.e. split dimension table
     data into additional tables




Snowflake Schema (Contd.)

   • Fact Table
        – Store Key, Product Key, Period Key, Units, Price
   • Store Dimension
        – Store Key, Store Name, City Key
   • City Dimension
        – City Key, City, State, Region
   • Time Dimension
        – Period Key, Year, Quarter, Month
   • Product Dimension
        – Product Key, Product Desc

Drawbacks: Time-consuming joins, slow report generation.
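
The extra join that normalization introduces can be seen in a small `sqlite3` sketch. Table names and sample rows are assumptions for illustration, and the time and product dimensions are left out to keep the example short.

```python
import sqlite3

snow = sqlite3.connect(":memory:")
snow.executescript("""
CREATE TABLE city_dim   (city_key INTEGER PRIMARY KEY, city TEXT,
                         state TEXT, region TEXT);
CREATE TABLE store_dim  (store_key INTEGER PRIMARY KEY, store_name TEXT,
                         city_key INTEGER REFERENCES city_dim(city_key));
CREATE TABLE sales_fact (store_key INTEGER REFERENCES store_dim(store_key),
                         product_key INTEGER, period_key INTEGER,
                         units INTEGER, price REAL);
""")

# Made-up sample data.
snow.executemany("INSERT INTO city_dim VALUES (?, ?, ?, ?)", [
    (1, "Mumbai", "Maharashtra", "West"),
    (2, "Delhi", "Delhi", "North"),
])
snow.executemany("INSERT INTO store_dim VALUES (?, ?, ?)", [
    (1, "Store A", 1),
    (2, "Store B", 2),
])
snow.executemany("INSERT INTO sales_fact VALUES (?, ?, ?, ?, ?)", [
    (1, 1, 1, 10, 5.0),
    (1, 2, 1, 2, 30.0),
    (2, 1, 1, 4, 5.0),
    (2, 1, 2, 7, 5.0),
])

# Region now sits two joins away from the fact table: fact -> store -> city.
snow_rows = snow.execute("""
SELECT c.region, SUM(f.units)
FROM sales_fact f
JOIN store_dim s ON s.store_key = f.store_key
JOIN city_dim  c ON c.city_key  = s.city_key
GROUP BY c.region
ORDER BY c.region
""").fetchall()

for row in snow_rows:
    print(row)
```

City, state and region are stored once per city rather than once per store, at the cost of the additional join in every report by region.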
Building the Data Warehouse

   • Data Selection

   • Data Pre-processing
        – Fill missing values
        – Remove inconsistency

   • Data Transformation & Integration

   • Data Loading
        – Data in the warehouse is stored in the form of fact tables
          and dimension tables.
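
The four stages can be strung together as a tiny pipeline. The sketch below is illustrative only: the raw rows, the choice of 0 as the fill value for missing units, and the function names `select`, `preprocess` and `load` are all assumptions, and a real build would rely on an ETL tool or a database.

```python
# Made-up raw rows from a source system, including a missing value,
# inconsistent product spellings, and a column the warehouse does not need.
raw_rows = [
    {"store": "Store A", "product": "pen", "units": 10, "price": 5.0, "note": "x"},
    {"store": "Store A", "product": "Pen ", "units": None, "price": 5.0, "note": "y"},
    {"store": "Store B", "product": "BOOK", "units": 2, "price": 30.0, "note": "z"},
]


def select(rows):
    """Data selection: keep only the columns relevant to the warehouse."""
    keep = ("store", "product", "units", "price")
    return [{k: r[k] for k in keep} for r in rows]


def preprocess(rows):
    """Pre-processing: fill missing values and remove inconsistency."""
    cleaned = []
    for r in rows:
        r = dict(r)
        if r["units"] is None:
            r["units"] = 0  # assumed default; other fill strategies exist
        r["product"] = r["product"].strip().title()  # one spelling per product
        cleaned.append(r)
    return cleaned


def load(rows):
    """Transformation and loading: assign surrogate keys, split into
    dimension tables and a fact table."""
    store_dim, product_dim, fact = {}, {}, []
    for r in rows:
        store_key = store_dim.setdefault(r["store"], len(store_dim) + 1)
        product_key = product_dim.setdefault(r["product"], len(product_dim) + 1)
        fact.append({"store_key": store_key, "product_key": product_key,
                     "units": r["units"], "price": r["price"]})
    return store_dim, product_dim, fact


store_dim, product_dim, fact = load(preprocess(select(raw_rows)))
print(store_dim)
print(product_dim)
print(fact)
```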




Data Warehousing Tools
   • Data Warehouse
        – SQL Server 2000 DTS
        – Oracle 8i Warehouse Builder
   • ETL tools
        – Ab Initio
        – Informatica
   • OLAP tools
        – SQL Server Analysis Services
        – Oracle Express Server
   • Reporting tools
        – MS Excel Pivot Chart
        – VB Applications
        – Cognos
        – MicroStrategy
        – Hyperion




Thank You
