Successfully reported this slideshow.
We use your LinkedIn profile and activity data to personalize ads and to show you more relevant ads. You can change your ad preferences anytime.
Introduction to Data Warehousing            By                 MAHESH.AMPOLU
Necessity is the mother of invention            Why Data Warehouse?
ScenarioUnilever is a company with branches at UK,India, America and Japan. The Sales Managerwants quarterly sales report....
Scenario 1 : Unilever company.   India    UK            Sales per item type per branch    Sales                   for firs...
Solution :      Unilever company. Extract sales information from each database. Store the information in a common reposi...
Solution : Unilever company. India                                         Report  UK                          Query &    ...
Scenario :Hindustan Unilever is a small,new company.President of the company wants his company shouldgrow. He needs inform...
Solution : Improve  the quality of data before loading it  into the warehouse. Perform data cleaning and transformation ...
Solution                                                  Expansio                                                     n  ...
What is Data Warehouse??
Inmons’s definition : A data warehouse is       -subject-oriented,       -integrated,       -time-variant,       -nonvolat...
Subject-oriented Data  warehouse is organized around subjects such  as sales,product,customer. It focuses on modeling an...
Integration Data Warehouse is constructed by integrating  multiple heterogeneous sources. Data Preprocessing are applied...
Integration In   terms of data.  – encoding structures.  – Measurement of       attributes.  – physical attribute.       ...
Time-variant Provides  information from historical perspective  e.g. past 5-10 years Every key structure contains either...
Nonvolatile Data  once recorded cannot be updated. Data warehouse requires two operations in data  accessing   – Initial...
Data Warehousing Architecture
Data Warehouse Architecture Data  Warehouse server  – almost always a relational DBMS,rarely flat     files OLAP servers...
OLTP vs OLAP
Data Warehouse Schema Star Schema Fact Constellation Schema Snowflake Schema
Star SchemaA star schema consists of at least one fact table and a number of dimension tables. Star     Schema is highly...
Star SchemaStore Dimension           Fact Table                   Time Dimension Store Key                 Store Key      ...
SnowFlake Schema Variant of star schema model. A single,large and central fact table and one  or more tables for each di...
SnowFlake SchemaStore Dimension             Fact Table                   Time Dimension                             Store ...
Fact Constellation Multiple fact tables share dimension tables. This schema is viewed as collection of stars  hence call...
Fact Constellation    Sales                              Shipping    Fact Table     Product                   Dimension   ...
Fact Constellation    Sales                              Shipping    Fact Table     Product                   Dimension   ...
Building Data Warehouse Data Selection Data Preprocessing  – Fill missing values  – Remove inconsistency Data Transform...
Case Study  Unilever is a new company which produces  soaps,paste and baverages products with  production unit located at...
Sales InformationReport: The number of units sold.113Report: The number of units sold over time and date January       Feb...
Sales InformationReport : The number of items sold for each product withtime             Jan Feb Mar AprSoaps             ...
Sales InformationReport: The number of items sold in each country for eachproduct with time                     Jan   Feb ...
Sales InformationReport: The number of items sold and income in each region foreach product with time.                    ...
Sales Measures & Dimensions Measure – Units sold, Amount. Dimensions – Product,Time,Region.
Sales Data Warehouse ModelFact TableCountry      Product   Month      Units   RupeesIndia        Soaps     January    3   ...
Sales Data Warehouse Model  City_ID Prod_ID   Month      Units   Rupees  1       589       1/1/1998   3       7.95  1     ...
Sales Data Warehouse ModelProduct Dimension Tables  Prod_ID     Product_Name        Product_Category_ID  589         Soaps...
Sales Data Warehouse ModelRegion Dimension TableCity_ID       City       Region      Country1             India      West ...
Sales Data Warehouse Model   Time                                 Product          Sales Fact   Product                   ...
Data Warehousing includes Build Data Warehouse Online analysis processing(OLAP). Presentation.            Cleaning ,Sel...
Thank You
Data warehousing
Upcoming SlideShare
Loading in …5
×

Data warehousing

1,443 views

Published on

This is the initial one abouit datawarehousing

Published in: Technology
  • Be the first to comment

  • Be the first to like this

Data warehousing

  1. 1. Introduction to Data Warehousing By MAHESH.AMPOLU
  2. 2. Necessity is the mother of invention Why Data Warehouse?
  3. 3. ScenarioUnilever is a company with branches at UK,India, America and Japan. The Sales Managerwants quarterly sales report. Each branch has aseparate operational system.
  4. 4. Scenario 1 : Unilever company. India UK Sales per item type per branch Sales for first quarter. Manager America Japan
  5. 5. Solution : Unilever company. Extract sales information from each database. Store the information in a common repository at a single site.
  6. 6. Solution : Unilever company. India Report UK Query & Sales Data Analysis tools Manager WarehouseAmerica Japan
  7. 7. Scenario :Hindustan Unilever is a small,new company.President of the company wants his company shouldgrow. He needs information so that he can makecorrect decisions.
  8. 8. Solution : Improve the quality of data before loading it into the warehouse. Perform data cleaning and transformation before loading the data. Use query analysis tools to support adhoc queries.
  9. 9. Solution Expansio n sales Data Query and Analysis President Warehouse tool time Improvemen t
  10. 10. What is Data Warehouse??
  11. 11. Inmons’s definition : A data warehouse is -subject-oriented, -integrated, -time-variant, -nonvolatilecollection of data in support of management’sdecision making process.
  12. 12. Subject-oriented Data warehouse is organized around subjects such as sales,product,customer. It focuses on modeling and analysis of data for decision makers. Excludes data not useful in decision support process.
  13. 13. Integration Data Warehouse is constructed by integrating multiple heterogeneous sources. Data Preprocessing are applied to ensure consistency. RDBMS Data Legacy Warehouse System Flat File Data Processing Data Transformation
  14. 14. Integration In terms of data. – encoding structures. – Measurement of attributes. – physical attribute. of data remarks – naming conventions. – Data type format
  15. 15. Time-variant Provides information from historical perspective e.g. past 5-10 years Every key structure contains either implicitly or explicitly an element of time
  16. 16. Nonvolatile Data once recorded cannot be updated. Data warehouse requires two operations in data accessing – Initial loading of data – Access of data load access
  17. 17. Data Warehousing Architecture
  18. 18. Data Warehouse Architecture Data Warehouse server – almost always a relational DBMS,rarely flat files OLAP servers – to support and operate on multi-dimensional data structures Clients – Query and reporting tools – Analysis tools – Data mining tools
  19. 19. OLTP vs OLAP
  20. 20. Data Warehouse Schema Star Schema Fact Constellation Schema Snowflake Schema
  21. 21. Star SchemaA star schema consists of at least one fact table and a number of dimension tables. Star Schema is highly recommended schema for SSAS cubes.
  22. 22. Star SchemaStore Dimension Fact Table Time Dimension Store Key Store Key Period Key Store Name Product Key Year City Period Key Quarter State Units Month Region Price Product Key Product Desc Product DimensionBenefits: Easy to understand, easy to define hierarchies, reducesno. of physicaljoins.
  23. 23. SnowFlake Schema Variant of star schema model. A single,large and central fact table and one or more tables for each dimension. Dimension tables are normalized i.e. split dimension table data into additional tables
  24. 24. SnowFlake SchemaStore Dimension Fact Table Time Dimension Store Key Period KeyStore Key Product Key YearStore Name Period Key QuarterCity Key Units Month Price City DimensionCity Key Product KeyCity Product DescStateRegion Product DimensionDrawbacks: Time consuming joins,report generation slow
  25. 25. Fact Constellation Multiple fact tables share dimension tables. This schema is viewed as collection of stars hence called galaxy schema or fact constellation. Sophisticated application requires such schema.
  26. 26. Fact Constellation Sales Shipping Fact Table Product Dimension Fact TableStore Key Shipper KeyProduct Key Product Key Store KeyPeriod Key Product Desc Product KeyUnits Period KeyPrice Units Price Store Dimension Store Key Store Name City State Region
  27. 27. Fact Constellation Sales Shipping Fact Table Product Dimension Fact TableStore Key Shipper KeyProduct Key Product Key Store KeyPeriod Key Product Desc Product KeyUnits Period KeyPrice Units Price Store Dimension Store Key Store Name City State Region
  28. 28. Building Data Warehouse Data Selection Data Preprocessing – Fill missing values – Remove inconsistency Data Transformation & Integration Data Loading Data in warehouse is stored in form of fact tables and dimension tables.
  29. 29. Case Study Unilever is a new company which produces soaps,paste and baverages products with production unit located at NA. There products are sold in North,North West and Western region of India. They have sales units at India, America , UK and Japan. The President of the company wants sales information.
  30. 30. Sales InformationReport: The number of units sold.113Report: The number of units sold over time and date January February March April 14 41 33 25
  31. 31. Sales InformationReport : The number of items sold for each product withtime Jan Feb Mar AprSoaps 6 17Paste 6 16 6 8 Timebread 8 25 21 Product
  32. 32. Sales InformationReport: The number of items sold in each country for eachproduct with time Jan Feb Mar AprIndia Soaps 3 10 City Paste 3 16 6 bread 4 16 6 TimeUK soaps 3 7 paste 3 8 Product bread 4 9 15
  33. 33. Sales InformationReport: The number of items sold and income in each region foreach product with time. Jan Feb Mar Apr Rs U Rs U Rs U Rs U India Soaps 7.44 3 24.80 10 Paste 7.95 3 42.40 16 15.90 6 bread 7.32 4 29.98 16 10.98 6 UK Soaps 7.44 3 17.36 7 paste 7.95 3 21.20 8 bread 7.32 4 16.47 9 27.45 15
  34. 34. Sales Measures & Dimensions Measure – Units sold, Amount. Dimensions – Product,Time,Region.
  35. 35. Sales Data Warehouse ModelFact TableCountry Product Month Units RupeesIndia Soaps January 3 7.95India Paste January 4 7.32UK Soaps January 3 7.95UK Paste January 4 7.32India Bread February 16 42.40
  36. 36. Sales Data Warehouse Model City_ID Prod_ID Month Units Rupees 1 589 1/1/1998 3 7.95 1 1218 1/1/1998 4 7.32 2 589 1/1/1998 3 7.95 2 1218 1/1/1998 4 7.32 1 589 2/1/1998 16 42.40
  37. 37. Sales Data Warehouse ModelProduct Dimension Tables Prod_ID Product_Name Product_Category_ID 589 Soaps 1 590 Paste 1 288 Bread 2 Product_Category_Id Product_Category 1 domestic 2 food
  38. 38. Sales Data Warehouse ModelRegion Dimension TableCity_ID City Region Country1 India West India2 UK NorthWest India
  39. 39. Sales Data Warehouse Model Time Product Sales Fact Product Category Region
  40. 40. Data Warehousing includes Build Data Warehouse Online analysis processing(OLAP). Presentation. Cleaning ,Selection & IntegrationRDBMS PresentationFlat File Client Warehouse & OLAP server
  41. 41. Thank You

×