Introduction to Data Warehousing BY G.KIRAN KUMAR HT.NO:001-09-06-002
Why Data Warehouse? Necessity is the mother of invention
Scenario 1: GK Pvt Ltd is a company with branches in Mumbai, Delhi, Chennai and Bangalore. The Sales Manager wants a quarterly sales report. Each branch has a separate operational system.
Scenario 1: GK Pvt Ltd. [Diagram: the Mumbai, Delhi, Chennai and Bangalore branches; the Sales Manager asks for sales per item type per branch for the first quarter.]
Solution 1: GK Pvt Ltd. Extract sales information from each branch database. Store the information in a common repository at a single site.
Solution 1: GK Pvt Ltd. [Diagram: the Mumbai, Delhi, Chennai and Bangalore branches feed the Data Warehouse; the Sales Manager uses query and analysis tools on the warehouse to produce the report.]
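The idea in Solution 1 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the in-memory SQLite databases stand in for the four branch systems, and the table layout, item types and amounts are invented for the example, not taken from GK Pvt Ltd.

```python
import sqlite3

def make_branch_db(rows):
    """Create a stand-in operational database for one branch (hypothetical layout)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (item_type TEXT, quarter INTEGER, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    return conn

# Invented sample data: one separate database per branch.
branches = {
    "Mumbai": make_branch_db([("electronics", 1, 120.0), ("furniture", 1, 80.0)]),
    "Delhi": make_branch_db([("electronics", 1, 90.0), ("electronics", 2, 40.0)]),
    "Chennai": make_branch_db([("furniture", 1, 60.0)]),
    "Bangalore": make_branch_db([("electronics", 1, 50.0), ("electronics", 1, 25.0)]),
}

# The common repository at a single site.
warehouse = sqlite3.connect(":memory:")
warehouse.execute(
    "CREATE TABLE sales (branch TEXT, item_type TEXT, quarter INTEGER, amount REAL)"
)

# Extract from each branch and store in the warehouse, tagging rows with the branch.
for branch, conn in branches.items():
    rows = conn.execute("SELECT item_type, quarter, amount FROM sales").fetchall()
    warehouse.executemany(
        "INSERT INTO sales VALUES (?, ?, ?, ?)",
        [(branch, *row) for row in rows],
    )

# Sales per item type per branch for the first quarter, answered from one place.
q1_report = warehouse.execute(
    "SELECT branch, item_type, SUM(amount) FROM sales "
    "WHERE quarter = 1 GROUP BY branch, item_type ORDER BY branch, item_type"
).fetchall()
```

Once the rows sit in one repository, the Sales Manager's question becomes a single query instead of four separate extracts.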
Scenario 2: A software company has a huge operational database. Whenever executives want a report, the OLTP system slows down and the data entry operators have to wait.
Scenario 2: A software company. [Diagram: two data entry operators and management share the operational database; the operators wait while management's report runs.]
Solution 2: Extract the data needed for analysis from the operational database and store it in a warehouse. Refresh the warehouse at regular intervals so that it contains up-to-date information for analysis. The warehouse will contain data with a historical perspective.
Solution 2 [Diagram: data entry operators send transactions to the operational database; data is extracted into the Data Warehouse; the manager's reports come from the warehouse.]
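A periodic refresh of this kind might look like the following sketch. It assumes, purely for illustration, that operational rows carry an increasing id so the warehouse can tell which rows are new; the table and function names are hypothetical.

```python
import sqlite3

# Stand-in OLTP database used by the data entry operators.
operational = sqlite3.connect(":memory:")
operational.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, amount REAL)")

# Stand-in warehouse that accumulates history for reporting.
warehouse = sqlite3.connect(":memory:")
warehouse.execute("CREATE TABLE orders_history (id INTEGER PRIMARY KEY, amount REAL)")

def refresh(op, wh):
    """Copy rows added to the operational table since the last refresh."""
    last_id = wh.execute(
        "SELECT COALESCE(MAX(id), 0) FROM orders_history"
    ).fetchone()[0]
    new_rows = op.execute(
        "SELECT id, amount FROM orders WHERE id > ?", (last_id,)
    ).fetchall()
    wh.executemany("INSERT INTO orders_history VALUES (?, ?)", new_rows)
    return len(new_rows)

# Operators enter transactions; a scheduled job calls refresh() at intervals.
operational.executemany("INSERT INTO orders (amount) VALUES (?)", [(10.0,), (20.0,)])
first_batch = refresh(operational, warehouse)

operational.execute("INSERT INTO orders (amount) VALUES (?)", (5.0,))
second_batch = refresh(operational, warehouse)

# Reports read the warehouse only, so they do not slow down data entry.
total = warehouse.execute("SELECT SUM(amount) FROM orders_history").fetchone()[0]
```

Each refresh touches only the new rows, and the heavy reporting queries never run against the operational database.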
Scenario 3: Cakes & Cookies is a small, new company. The President wants the company to grow and needs information to make correct decisions.
Solution 3: Improve the quality of the data before loading it into the warehouse by performing data cleaning and transformation first. Use query and analysis tools to support ad hoc queries.
Solution 3 [Diagram: the President uses a query and analysis tool on the Data Warehouse; labels: expansion, improvement; chart of sales over time.]
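As a rough sketch of cleaning before loading, the Python below normalizes inconsistent city spellings and fills a missing value. The sample rows and the fill-with-mean policy are assumptions made for the example; a real warehouse would choose its own rules.

```python
from statistics import mean

# Hypothetical raw rows with inconsistent city spellings and a missing value.
raw_rows = [
    {"city": " mumbai ", "units": 10},
    {"city": "Mumbai", "units": None},
    {"city": "DELHI", "units": 4},
]

def clean(rows):
    """Return cleaned copies of the rows; the input is left untouched."""
    known = [r["units"] for r in rows if r["units"] is not None]
    fill_value = mean(known)  # assumed policy: fill gaps with the mean of known values
    cleaned = []
    for r in rows:
        units = r["units"] if r["units"] is not None else fill_value
        # Remove inconsistency: trim whitespace and use one capitalization.
        cleaned.append({"city": r["city"].strip().title(), "units": units})
    return cleaned

cleaned_rows = clean(raw_rows)
```

After cleaning, all three spellings of a city collapse to one value, so ad hoc queries that group by city give one row per city.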
What is a Data Warehouse?
Inmon's definition: A data warehouse is a subject-oriented, integrated, time-variant, nonvolatile collection of data in support of management's decision-making process.
Subject-oriented: A data warehouse is organized around subjects such as sales, product and customer. It focuses on the modeling and analysis of data for decision makers. It excludes data that is not useful in the decision support process.
Integrated: A data warehouse is constructed by integrating multiple heterogeneous sources. Data preprocessing is applied to ensure consistency. [Diagram: RDBMS, legacy system and flat file sources pass through data processing and data transformation into the Data Warehouse.]
Time-variant: Provides information from a historical perspective, e.g. the past 5-10 years. Every key structure contains an element of time, either implicitly or explicitly.
Nonvolatile: Data, once recorded, is not updated. A data warehouse requires only two data operations: the initial loading of data and the access of data. [Diagram: load, access.]
Data Warehousing Architecture
Data Warehouse Architecture: The data warehouse server is almost always a relational DBMS, rarely flat files. OLAP servers support and operate on multi-dimensional data structures. Clients: query and reporting tools, analysis tools, data mining tools.
Data Warehouse Schema: Star Schema, Fact Constellation Schema, Snowflake Schema.
Star Schema: A single, large, central fact table and one table for each dimension. Every fact points to one tuple in each of the dimensions and has additional attributes. Does not capture hierarchies directly.
Star Schema (contd.)
  Fact Table: Store Key, Product Key, Period Key, Units, Price
  Store Dimension: Store Key, Store Name, City, State, Region
  Time Dimension: Period Key, Year, Quarter, Month
  Product Dimension: Product Key, Product Desc
  Benefits: Easy to understand, easy to define hierarchies, reduces the number of physical joins.
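The star schema on this slide can be tried out with Python's built-in sqlite3 module. The table and column names follow the slide; the SQL types and the sample rows are assumptions added for illustration.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE store_dim   (store_key INTEGER PRIMARY KEY, store_name TEXT,
                          city TEXT, state TEXT, region TEXT);
CREATE TABLE time_dim    (period_key INTEGER PRIMARY KEY, year INTEGER,
                          quarter INTEGER, month INTEGER);
CREATE TABLE product_dim (product_key INTEGER PRIMARY KEY, product_desc TEXT);

-- Central fact table: one foreign key per dimension plus the measures.
CREATE TABLE sales_fact (
    store_key   INTEGER REFERENCES store_dim(store_key),
    product_key INTEGER REFERENCES product_dim(product_key),
    period_key  INTEGER REFERENCES time_dim(period_key),
    units INTEGER,
    price REAL
);

-- Invented sample rows.
INSERT INTO store_dim VALUES (1, 'Store A', 'Mumbai', 'Maharashtra', 'West'),
                             (2, 'Store B', 'Chennai', 'Tamil Nadu', 'South');
INSERT INTO time_dim VALUES (1, 2010, 1, 1), (2, 2010, 2, 4);
INSERT INTO product_dim VALUES (1, 'Cake'), (2, 'Cookie');
INSERT INTO sales_fact VALUES (1, 1, 1, 10, 5.0), (1, 2, 1, 20, 1.0),
                              (2, 1, 1, 4, 5.0), (2, 1, 2, 6, 5.0);
""")

# Revenue per region per quarter: one join from the fact table to each dimension used.
revenue_by_region = db.execute("""
    SELECT s.region, t.quarter, SUM(f.units * f.price)
    FROM sales_fact f
    JOIN store_dim s ON s.store_key = f.store_key
    JOIN time_dim t ON t.period_key = f.period_key
    GROUP BY s.region, t.quarter
    ORDER BY s.region, t.quarter
""").fetchall()
```

Because Region is stored directly in the store dimension, the report needs only one join per dimension.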
Snowflake Schema: A variant of the star schema model. A single, large, central fact table and one or more tables for each dimension. Dimension tables are normalized, i.e. dimension table data is split into additional tables.
Snowflake Schema (contd.)
  Fact Table: Store Key, Product Key, Period Key, Units, Price
  Store Dimension: Store Key, Store Name, City Key
  City Dimension: City Key, City, State, Region
  Time Dimension: Period Key, Year, Quarter, Month
  Product Dimension: Product Key, Product Desc
  Drawbacks: Time-consuming joins, slow report generation.
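To see where the extra joins come from, here is a reduced sketch of the snowflaked store dimension. Only the store and city tables and two fact columns are modeled, and the rows are made up; the names follow the slide.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
-- City attributes are split out of the store dimension (normalization).
CREATE TABLE city_dim  (city_key INTEGER PRIMARY KEY, city TEXT, state TEXT, region TEXT);
CREATE TABLE store_dim (store_key INTEGER PRIMARY KEY, store_name TEXT,
                        city_key INTEGER REFERENCES city_dim(city_key));
CREATE TABLE sales_fact (store_key INTEGER REFERENCES store_dim(store_key),
                         units INTEGER, price REAL);

-- Invented sample rows.
INSERT INTO city_dim VALUES (1, 'Mumbai', 'Maharashtra', 'West'),
                            (2, 'Chennai', 'Tamil Nadu', 'South');
INSERT INTO store_dim VALUES (1, 'Store A', 1), (2, 'Store B', 1), (3, 'Store C', 2);
INSERT INTO sales_fact VALUES (1, 10, 5.0), (2, 3, 5.0), (3, 7, 5.0);
""")

# Region now lives two hops away from the fact table: fact -> store -> city.
units_by_region = db.execute("""
    SELECT c.region, SUM(f.units)
    FROM sales_fact f
    JOIN store_dim s ON s.store_key = f.store_key
    JOIN city_dim c ON c.city_key = s.city_key
    GROUP BY c.region
    ORDER BY c.region
""").fetchall()
```

The city details are stored once per city rather than once per store, at the price of an additional join for every report that needs the region.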
Fact Constellation: Multiple fact tables share dimension tables. This schema is viewed as a collection of stars, hence it is called a galaxy schema or fact constellation. Sophisticated applications require such a schema.
Fact Constellation (contd.)
  Sales Fact Table: Store Key, Product Key, Period Key, Units, Price
  Shipping Fact Table: Shipper Key, Store Key, Product Key, Period Key, Units, Price
  Store Dimension: Store Key, Store Name, City, State, Region
  Product Dimension: Product Key, Product Desc
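A small sketch of two fact tables sharing one dimension follows. The column names come from the slide; the data and the choice to compare units sold with units shipped per store are assumptions for the example.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE store_dim (store_key INTEGER PRIMARY KEY, store_name TEXT);

-- Two fact tables that both reference the same store dimension.
CREATE TABLE sales_fact (
    store_key INTEGER REFERENCES store_dim(store_key),
    product_key INTEGER, period_key INTEGER, units INTEGER, price REAL
);
CREATE TABLE shipping_fact (
    shipper_key INTEGER,
    store_key INTEGER REFERENCES store_dim(store_key),
    product_key INTEGER, period_key INTEGER, units INTEGER, price REAL
);

-- Invented sample rows.
INSERT INTO store_dim VALUES (1, 'Store A'), (2, 'Store B');
INSERT INTO sales_fact VALUES (1, 1, 1, 10, 5.0), (2, 1, 1, 4, 5.0);
INSERT INTO shipping_fact VALUES (7, 1, 1, 1, 12, 0.5), (7, 1, 1, 1, 3, 0.5);
""")

# Aggregate each fact table separately per store, then line the totals up
# through the shared dimension. Joining the two fact tables directly would
# multiply rows and inflate the sums.
sold_vs_shipped = db.execute("""
    SELECT s.store_name,
           (SELECT COALESCE(SUM(units), 0) FROM sales_fact
             WHERE store_key = s.store_key),
           (SELECT COALESCE(SUM(units), 0) FROM shipping_fact
             WHERE store_key = s.store_key)
    FROM store_dim s
    ORDER BY s.store_name
""").fetchall()
```

The shared dimension is what lets the two processes, sales and shipping, be compared store by store.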
Building a Data Warehouse: (1) data selection; (2) data preprocessing: fill missing values, remove inconsistencies; (3) data transformation and integration; (4) data loading. Data in the warehouse is stored in the form of fact tables and dimension tables.
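The loading step, where flat records end up as dimension and fact rows, might be sketched as below. The record fields, the sample values and the use of sequential integers as keys are assumptions for illustration only.

```python
# Hypothetical flat records, already selected, cleaned and integrated.
flat_records = [
    {"store": "Mumbai Central", "product": "Cake", "units": 3, "price": 12.0},
    {"store": "Delhi North", "product": "Cake", "units": 1, "price": 12.0},
    {"store": "Mumbai Central", "product": "Cookie", "units": 5, "price": 2.0},
]

def load(records):
    """Split flat records into two dimension tables and one fact table."""
    store_dim = {}    # store name -> store key
    product_dim = {}  # product description -> product key
    facts = []        # (store key, product key, units, price)
    for rec in records:
        # Reuse the key if the dimension value was seen before, else assign the next one.
        store_key = store_dim.setdefault(rec["store"], len(store_dim) + 1)
        product_key = product_dim.setdefault(rec["product"], len(product_dim) + 1)
        facts.append((store_key, product_key, rec["units"], rec["price"]))
    return store_dim, product_dim, facts

store_dim, product_dim, facts = load(flat_records)
```

Each descriptive value is stored once in its dimension, and the fact rows carry only keys and measures.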
Data Warehousing includes: building the data warehouse, online analytical processing (OLAP), and presentation. [Diagram: RDBMS and flat file sources pass through cleaning, selection and integration into the warehouse and OLAP server, which serves the client for presentation.]
Need for Data Warehousing: Industry has huge amounts of operational data. Knowledge workers want to turn this data into useful information and use it to support strategic decision making. A data warehouse is a platform for consolidated historical data for analysis. It stores good-quality data so that knowledge workers can make correct decisions.
Advantages of a Data Warehouse: There are many advantages to using a data warehouse. Some of them are: enhanced end-user access to a wide variety of data; business decision makers can obtain various kinds of trend reports, e.g. the item with the most sales in a particular area or country over the last two years; increased data consistency; a place to combine related data from separate sources.
Disadvantages of a Data Warehouse: Data owners lose control over their data, raising ownership (responsibility and accountability), security and privacy issues; adding new data sources takes time and carries a high cost; flexibility of use and types of users is limited, so multiple separate data marts are required for multiple uses and types of users.
CONCLUSION: A parallel was drawn between operational systems and data warehouse systems to show their differences, mainly in their objectives and in the type of data each one deals with. The data warehouse is a technology for the future. A data warehouse enables knowledge workers to make faster and better decisions.
References: Building the Data Warehouse, by Inmon; Data Mining: Concepts and Techniques, by Han and Kamber; www.dwinfocenter.org; www.datawarehousingonline.com; www.billinmon.com
Thank You

Editor's Notes

  • #3 We invent something only when there is a need for it. Today we are going to see what data warehousing is. The data warehouse evolved to satisfy certain needs; we will look at some of those needs now.
  • #23 We need a subject-oriented, multidimensional data model for the data warehouse, one that facilitates online analysis.