Introduction to Data Warehousing


Published on

1 Like
  • Be the first to comment

No Downloads
Total views
On SlideShare
From Embeds
Number of Embeds
Embeds 0
No embeds

No notes for slide

Introduction to Data Warehousing

  1. 1. Introduction to Data Warehousing December 20, 2012Tameem AhmadM.Tech. (F)ZHCET, AMU, Aligarh
  2. 2. References: • “Building Data Warehouse” by Inmon (Third Edition), New York: John Wiley & Sons, (2002) • “Data Mining: Concepts and Techniques” by Han,Kamber. 2000 • [Accessed: November 4, 2012] • Data Warehousing Battle of the Giants: Comparing the Basics of the Kimball and Inmon Models: by Mary Breslin Tameem Ahmad 2
  3. 3. Plan for the Presentation • Necessity of Data Warehousing. (Why it is needed?) • What is Data Warehousing? • Architecture • Schema • How to build Data Warehouse (components) • Data Warehousing Tools12/27/2012 Tameem Ahmad 3
  4. 4. ? ? ? ? Necessity is the mother of invention… Why Data Warehouse?12/27/2012 Tameem Ahmad 4
  5. 5. Scenario • ABC Pvt Ltd is a company with branches at Mumbai, Delhi, Chennai and Banglore. The Sales Manager wants quarterly sales report. Each branch has a separate operational system.12/27/2012 Tameem Ahmad 5
  6. 6. Scenarion: ABC Pvt. Ltd. Mumbai Delhi Sales per item type per branch Sales for first quarter. Manager Chennai Banglore12/27/2012 Tameem Ahmad 6 6
  7. 7. Solution: ABC Pvt. Ltd.  Extract sales information from each database.  Store the information in a common repository at a single site.12/27/2012 Tameem Ahmad 7
  8. 8. Solution: ABC Pvt. Ltd. Mumbai Report Delhi Query & Sales Data Analysis tools Manager Warehouse Chennai Banglore12/27/2012 Tameem Ahmad 8
  9. 9. Data Warehousing… • Definition A data warehouse is » -subject-oriented, » -integrated, » -time-variant, » -nonvolatile collection of data in support of management’s decision making process.12/27/2012 Tameem Ahmad 9
  10. 10. Subject-oriented • Data warehouse is organized around subjects such as sales, product, customer. • It focuses on modeling and analysis of data for decision makers. • Excludes data not useful in decision support process.12/27/2012 Tameem Ahmad 10
  11. 11. Integration • Data Warehouse is constructed by integrating multiple heterogeneous sources. • Data Preprocessing are applied to ensure consistency. RDBMS Legacy Data System Warehouse Data Processing Flat File Data Transformation12/27/2012 Tameem Ahmad 11
  12. 12. Time-variant • Provides information from historical perspective e.g. past 5- 10 years12/27/2012 Tameem Ahmad 12
  13. 13. Nonvolatile • Data once recorded cannot be updated. • Data warehouse requires two operations in data accessing – Initial loading of data – Access of data load access12/27/2012 Tameem Ahmad 13
  14. 14. Data Warehousing Architecture12/27/2012 Tameem Ahmad 14
  15. 15. Data Warehousing Architecture (Contt…) • Data Warehouse server • almost always a relational DBMS, rarely flat files • OLAP servers • to support and operate on multi-dimensional data structures • Clients • Query and reporting tools • Analysis tools • Data mining tools12/27/2012 Tameem Ahmad 15
  16. 16. Data Warehousing Schema • Star Schema • Snowflake Schema12/27/2012 Tameem Ahmad 16
  17. 17. Measures & Dimensions • Measure – Units sold, Amount. • Dimensions – Product, Time, Region12/27/2012 Tameem Ahmad 17
  18. 18. Star Schema • A single, large and central fact table and one table for each dimension. • Every fact points to one tuple in each of the dimensions and has additional attributes. • Does not capture hierarchies directly.12/27/2012 Tameem Ahmad 18
  19. 19. Star Schema (Contt…) Fact Table Store Dimension Time Dimension Store Key Store Key Product Key Period Key Store Name Period Key Year City Units Quarter State Price Month Region Product Key Product Desc Product DimensionBenefits: Easy to understand, easy to define hierarchies, reduces no. of physical joins.12/27/2012 Tameem Ahmad 19
  20. 20. Snowflake Schema • Variant of star schema model. • A single, large and central fact table and one or more tables for each dimension. • Dimension tables are normalized i.e. split dimension table data into additional tables12/27/2012 Tameem Ahmad 20
  21. 21. Snowflake Schema (Contt…) Store Dimension Fact Table Time Dimension Store Key Period Key Store Key Product Key Year Store Name Period Key Quarter City Key Units Month Price City Dimension City Key Product Key City Product Desc State Region Product Dimension Drawbacks: Time consuming joins,report generation slow12/27/2012 Tameem Ahmad 21
  22. 22. Building the Data Warehouse • Data Selection • Data Pre-processing – Fill missing values – Remove inconsistency • Data Transformation & Integration • Data Loading Data in warehouse is stored in form of fact tables and dimension tables.12/27/2012 Tameem Ahmad 22
  23. 23. Data Warehousing Tools • Data Warehouse – SQL Server 2000 DTS – Oracle 8i Warehouse Builder • ETL tools – Ab Initio – Informatica • Reporting tools • OLAP tools −MS Excel Pivot Chart – SQL Server Analysis −VB Applications Services −cognos, – Oracle Express Server −Microstrategy, −Hyperion12/27/2012 Tameem Ahmad 23
  24. 24. Thank You