Data warehousing


This is the first presentation about data warehousing.

Published in: Technology
  • We invent something only when there is a need for it. Today we are going to see what data warehousing is. The data warehouse evolved to satisfy certain needs, and we will look at some of those needs now.
  • We need to extract data from various sources: some may be maintained manually on paper, some on different legacy systems, and integrating the data is laborious work. Many systems provide DTS tools to convert data into an appropriate format and apply the necessary transformations.
  • We need a subject-oriented, multidimensional data model for the data warehouse, one that facilitates online analysis.
  • Data warehousing

    1. 1. Introduction to Data Warehousing By MAHESH.AMPOLU
    2. 2. Necessity is the mother of invention. Why a Data Warehouse?
    3. 3. Scenario: Unilever is a company with branches in the UK, India, America and Japan. The Sales Manager wants a quarterly sales report. Each branch has a separate operational system.
    4. 4. Scenario 1: Unilever company. [Diagram: the branches in India, UK, America and Japan; the Sales Manager wants sales per item type per branch for the first quarter.]
    5. 5. Solution: Unilever company. Extract sales information from each database. Store the information in a common repository at a single site.
    6. 6. Solution: Unilever company. [Diagram: data from India, UK, America and Japan is loaded into the Data Warehouse; Query & Analysis tools produce the Report for the Sales Manager.]
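The consolidation step in this solution can be sketched in Python with the standard-library `sqlite3` module. This is a minimal sketch and not the presenter's implementation: the branch rows, the `sales` table and the function names are hypothetical, and an in-memory database stands in for the common repository.

```python
import sqlite3

# Hypothetical extracts from each branch's operational system:
# (item_type, sale_date, units). The values are illustrative only.
BRANCH_EXTRACTS = {
    "India":   [("Soaps", "1998-01-15", 3), ("Paste", "1998-02-10", 16)],
    "UK":      [("Soaps", "1998-03-02", 7), ("Paste", "1998-04-20", 8)],
    "America": [("Soaps", "1998-02-11", 5)],
    "Japan":   [("Paste", "1998-03-30", 2)],
}


def build_repository(extracts):
    """Load every branch extract into one table at a single site."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE sales (branch TEXT, item_type TEXT, sale_date TEXT, units INTEGER)"
    )
    for branch, rows in extracts.items():
        conn.executemany(
            "INSERT INTO sales VALUES (?, ?, ?, ?)",
            [(branch, item, day, units) for item, day, units in rows],
        )
    conn.commit()
    return conn


def first_quarter_report(conn, year="1998"):
    """Sales per item type per branch for January to March of one year."""
    return conn.execute(
        """
        SELECT branch, item_type, SUM(units)
        FROM sales
        WHERE sale_date >= ? AND sale_date < ?
        GROUP BY branch, item_type
        ORDER BY branch, item_type
        """,
        (f"{year}-01-01", f"{year}-04-01"),
    ).fetchall()


repository = build_repository(BRANCH_EXTRACTS)
report = first_quarter_report(repository)
```

Because all branches land in one table, the quarterly report becomes a single query instead of four separate extracts; the April sale in the UK rows falls outside the date range and is left out.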
    7. 7. Scenario: Hindustan Unilever is a small, new company. The President of the company wants the company to grow. He needs information so that he can make correct decisions.
    8. 8. Solution: Improve the quality of data before loading it into the warehouse. Perform data cleaning and transformation before loading the data. Use query and analysis tools to support ad hoc queries.
    9. 9. Solution. [Diagram: the Data Warehouse feeds a Query and Analysis tool used by the President; a chart of sales over time is labelled Expansion and Improvement.]
    10. 10. What is a Data Warehouse?
    11. 11. Inmon’s definition: A data warehouse is a subject-oriented, integrated, time-variant, nonvolatile collection of data in support of management’s decision-making process.
    12. 12. Subject-oriented: A data warehouse is organized around subjects such as sales, product and customer. It focuses on modeling and analysis of data for decision makers. It excludes data that is not useful in the decision support process.
    13. 13. Integration: A data warehouse is constructed by integrating multiple heterogeneous sources. Data preprocessing is applied to ensure consistency. [Diagram: RDBMS, legacy system and flat file sources pass through data processing and data transformation into the Data Warehouse.]
    14. 14. Integration, in terms of data: – encoding structures – measurement of attributes – physical attributes of data – remarks – naming conventions – data type formats
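A small Python sketch may make the integration points above concrete. It assumes two hypothetical source records whose naming conventions, data types and date encodings differ; the field names, mappings and formats are invented for illustration and do not come from the slides.

```python
from datetime import datetime

# Two hypothetical source records describing the same kind of sale.
# Field names, date encodings and data types differ between the sources.
LEGACY_ROW = {"PROD_NM": "SOAPS", "QTY": "3", "SALE_DT": "15/01/1998"}
RDBMS_ROW = {"product": "Soaps", "units": 3, "sold_on": "1998-01-15"}

# Assumed mapping from each source's naming convention to warehouse names.
FIELD_MAPS = {
    "legacy": {"PROD_NM": "product", "QTY": "units", "SALE_DT": "sale_date"},
    "rdbms": {"product": "product", "units": "units", "sold_on": "sale_date"},
}
# Assumed date encoding used by each source.
DATE_FORMATS = {"legacy": "%d/%m/%Y", "rdbms": "%Y-%m-%d"}


def integrate(row, source):
    """Return the row in the warehouse's single, consistent representation."""
    renamed = {FIELD_MAPS[source][key]: value for key, value in row.items()}
    parsed = datetime.strptime(renamed["sale_date"], DATE_FORMATS[source])
    return {
        "product": renamed["product"].strip().title(),  # one spelling per product
        "units": int(renamed["units"]),                 # one data type
        "sale_date": parsed.date().isoformat(),         # one date encoding
    }


legacy_clean = integrate(LEGACY_ROW, "legacy")
rdbms_clean = integrate(RDBMS_ROW, "rdbms")
```

After integration both records have the same keys, types and date format, so they can be stored side by side in the warehouse.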
    15. 15. Time-variant: Provides information from a historical perspective, e.g. the past 5-10 years. Every key structure contains, either implicitly or explicitly, an element of time.
    16. 16. Nonvolatile: Data once recorded cannot be updated. A data warehouse requires two operations in data accessing: – initial loading of data – access of data. [Diagram: load, access]
    17. 17. Data Warehousing Architecture
    18. 18. Data Warehouse Architecture. Data Warehouse server: almost always a relational DBMS, rarely flat files. OLAP servers: to support and operate on multi-dimensional data structures. Clients: – Query and reporting tools – Analysis tools – Data mining tools
    19. 19. OLTP vs OLAP
    20. 20. Data Warehouse Schema: Star Schema, Fact Constellation Schema, Snowflake Schema
    21. 21. Star Schema: A star schema consists of at least one fact table and a number of dimension tables. The star schema is a highly recommended schema for SSAS cubes.
    22. 22. Star Schema
        Fact Table: Store Key, Product Key, Period Key, Units, Price
        Store Dimension: Store Key, Store Name, City, State, Region
        Time Dimension: Period Key, Year, Quarter, Month
        Product Dimension: Product Key, Product Desc
        Benefits: easy to understand, easy to define hierarchies, reduces the number of physical joins.
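The star schema on this slide can be sketched with Python's standard-library `sqlite3` module. The table and column names follow the slide, but the snake_case spelling, the SQL types and every inserted row are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE store_dim   (store_key INTEGER PRIMARY KEY, store_name TEXT,
                              city TEXT, state TEXT, region TEXT);
    CREATE TABLE product_dim (product_key INTEGER PRIMARY KEY, product_desc TEXT);
    CREATE TABLE time_dim    (period_key INTEGER PRIMARY KEY, year INTEGER,
                              quarter INTEGER, month INTEGER);
    -- The fact table holds the measures plus one key per dimension.
    CREATE TABLE sales_fact  (store_key   INTEGER REFERENCES store_dim,
                              product_key INTEGER REFERENCES product_dim,
                              period_key  INTEGER REFERENCES time_dim,
                              units INTEGER, price REAL);
    """
)

# Illustrative rows only; none of these values come from the slides.
conn.executemany("INSERT INTO store_dim VALUES (?, ?, ?, ?, ?)", [
    (1, "Store A", "Mumbai", "Maharashtra", "West"),
    (2, "Store B", "Delhi", "Delhi", "North"),
])
conn.executemany("INSERT INTO product_dim VALUES (?, ?)",
                 [(1, "Soaps"), (2, "Paste")])
conn.executemany("INSERT INTO time_dim VALUES (?, ?, ?, ?)",
                 [(1, 1998, 1, 1), (2, 1998, 1, 2), (3, 1998, 2, 4)])
conn.executemany("INSERT INTO sales_fact VALUES (?, ?, ?, ?, ?)", [
    (1, 1, 1, 3, 2.48), (1, 2, 2, 16, 2.65),
    (2, 1, 1, 7, 2.48), (2, 2, 3, 8, 2.65),
])

# Each dimension is exactly one join away from the fact table.
units_by_region_quarter = conn.execute(
    """
    SELECT s.region, t.quarter, SUM(f.units)
    FROM sales_fact AS f
    JOIN store_dim AS s ON s.store_key = f.store_key
    JOIN time_dim  AS t ON t.period_key = f.period_key
    GROUP BY s.region, t.quarter
    ORDER BY s.region, t.quarter
    """
).fetchall()
```

The query touches only the fact table and the dimensions it needs, which is the "fewer physical joins" benefit listed on the slide.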
    23. 23. Snowflake Schema: A variant of the star schema model. It has a single, large, central fact table and one or more tables for each dimension. Dimension tables are normalized, i.e. dimension table data is split into additional tables.
    24. 24. Snowflake Schema
        Fact Table: Store Key, Product Key, Period Key, Units, Price
        Store Dimension: Store Key, Store Name, City Key
        City Dimension: City Key, City, State, Region
        Time Dimension: Period Key, Year, Quarter, Month
        Product Dimension: Product Key, Product Desc
        Drawbacks: time-consuming joins, slow report generation.
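A hedged sketch of the normalized store dimension, again using `sqlite3`: only the store, city and fact tables from the slide are modelled, and the column types and all rows are illustrative assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- City attributes are split out of the store dimension (normalization).
    CREATE TABLE city_dim   (city_key INTEGER PRIMARY KEY, city TEXT,
                             state TEXT, region TEXT);
    CREATE TABLE store_dim  (store_key INTEGER PRIMARY KEY, store_name TEXT,
                             city_key INTEGER REFERENCES city_dim);
    CREATE TABLE sales_fact (store_key INTEGER REFERENCES store_dim,
                             product_key INTEGER, period_key INTEGER,
                             units INTEGER, price REAL);
    """
)

# Illustrative rows only; none of these values come from the slides.
conn.executemany("INSERT INTO city_dim VALUES (?, ?, ?, ?)", [
    (10, "Mumbai", "Maharashtra", "West"),
    (20, "Delhi", "Delhi", "North"),
])
conn.executemany("INSERT INTO store_dim VALUES (?, ?, ?)", [
    (1, "Store A", 10), (2, "Store B", 20), (3, "Store C", 10),
])
conn.executemany("INSERT INTO sales_fact VALUES (?, ?, ?, ?, ?)", [
    (1, 1, 1, 3, 2.48), (2, 1, 1, 7, 2.48), (3, 2, 1, 16, 2.65),
])

# Reaching the region now takes two joins: fact -> store -> city.
units_by_region = conn.execute(
    """
    SELECT c.region, SUM(f.units)
    FROM sales_fact AS f
    JOIN store_dim AS s ON s.store_key = f.store_key
    JOIN city_dim  AS c ON c.city_key = s.city_key
    GROUP BY c.region
    ORDER BY c.region
    """
).fetchall()
```

City, state and region are stored once per city rather than once per store, at the price of the extra join that the slide lists as a drawback.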
    25. 25. Fact Constellation: Multiple fact tables share dimension tables. This schema is viewed as a collection of stars, hence it is called a galaxy schema or fact constellation. Sophisticated applications require such a schema.
    26. 26. Fact Constellation
        Sales Fact Table: Store Key, Product Key, Period Key, Units, Price
        Shipping Fact Table: Shipper Key, Store Key, Product Key, Period Key, Units, Price
        Product Dimension: Product Key, Product Desc
        Store Dimension: Store Key, Store Name, City, State, Region
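The idea of two fact tables sharing a dimension can be sketched as follows. The sales and shipping fact tables and the product dimension follow the slide; the column types and all rows are assumptions, and the store dimension is omitted to keep the sketch short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE product_dim   (product_key INTEGER PRIMARY KEY, product_desc TEXT);
    CREATE TABLE sales_fact    (store_key INTEGER,
                                product_key INTEGER REFERENCES product_dim,
                                period_key INTEGER, units INTEGER, price REAL);
    CREATE TABLE shipping_fact (shipper_key INTEGER, store_key INTEGER,
                                product_key INTEGER REFERENCES product_dim,
                                period_key INTEGER, units INTEGER, price REAL);
    """
)

# Illustrative rows only; none of these values come from the slides.
conn.executemany("INSERT INTO product_dim VALUES (?, ?)",
                 [(1, "Soaps"), (2, "Paste")])
conn.executemany("INSERT INTO sales_fact VALUES (?, ?, ?, ?, ?)", [
    (1, 1, 1, 3, 2.48), (1, 1, 2, 10, 2.48), (1, 2, 1, 6, 2.65),
])
conn.executemany("INSERT INTO shipping_fact VALUES (?, ?, ?, ?, ?, ?)", [
    (9, 1, 1, 1, 20, 2.48), (9, 1, 2, 1, 10, 2.65),
])

# Aggregate each fact table on its own first, then meet at the shared
# dimension; joining the raw fact tables directly would multiply rows.
sold_vs_shipped = conn.execute(
    """
    SELECT p.product_desc, s.sold, h.shipped
    FROM product_dim AS p
    JOIN (SELECT product_key, SUM(units) AS sold
          FROM sales_fact GROUP BY product_key) AS s
      ON s.product_key = p.product_key
    JOIN (SELECT product_key, SUM(units) AS shipped
          FROM shipping_fact GROUP BY product_key) AS h
      ON h.product_key = p.product_key
    ORDER BY p.product_desc
    """
).fetchall()
```

The shared product dimension is what lets one report compare measures from two different business processes.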
    28. 28. Building a Data Warehouse
        – Data Selection
        – Data Preprocessing: fill missing values, remove inconsistency
        – Data Transformation & Integration
        – Data Loading
        Data in the warehouse is stored in the form of fact tables and dimension tables.
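The preprocessing and loading steps can be sketched in plain Python. The raw rows, the canonical country map and the choice to fill missing units with 0 are all assumptions for illustration; a real pipeline would choose its own fill strategy.

```python
# Hypothetical raw rows selected from a source system: (country, product, units).
RAW_ROWS = [
    ("India", "Soaps", 3),
    ("india ", "soaps", None),  # inconsistent spelling, missing units
    ("UK", "Paste", 8),
]

# Assumed canonical spellings used to remove inconsistency.
CANONICAL_COUNTRIES = {"india": "India", "uk": "UK"}


def preprocess(rows, default_units=0):
    """Fill missing values and normalize inconsistent spellings."""
    cleaned = []
    for country, product, units in rows:
        cleaned.append((
            CANONICAL_COUNTRIES[country.strip().lower()],
            product.strip().title(),
            default_units if units is None else units,
        ))
    return cleaned


def load(rows):
    """Split cleaned rows into dimension tables and a fact table."""
    country_dim, product_dim, fact = {}, {}, []
    for country, product, units in rows:
        # setdefault assigns the next surrogate key the first time a value is seen.
        country_key = country_dim.setdefault(country, len(country_dim) + 1)
        product_key = product_dim.setdefault(product, len(product_dim) + 1)
        fact.append((country_key, product_key, units))
    return country_dim, product_dim, fact


cleaned_rows = preprocess(RAW_ROWS)
country_dim, product_dim, fact_table = load(cleaned_rows)
```

Cleaning before loading means "India" and "india " end up under one dimension key instead of two.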
    29. 29. Case Study: Unilever is a new company which produces soaps, paste and beverage products, with its production unit located at NA. Their products are sold in the North, North West and Western regions of India. They have sales units in India, America, UK and Japan. The President of the company wants sales information.
    30. 30. Sales Information
        Report: The number of units sold: 113
        Report: The number of units sold over time and date:
            January  February  March  April
            14       41        33     25
    31. 31. Sales Information
        Report: The number of items sold for each product with time (dimensions: Product, Time)
                    Jan  Feb  Mar  Apr
            Soaps     -    -    6   17
            Paste     6   16    6    8
            Bread     8   25   21    -
    32. 32. Sales Information
        Report: The number of items sold in each country for each product with time (dimensions: City, Product, Time)
                          Jan  Feb  Mar  Apr
            India  Soaps    -    -    3   10
                   Paste    3   16    6    -
                   Bread    4   16    6    -
            UK     Soaps    -    -    3    7
                   Paste    3    -    -    8
                   Bread    4    9   15    -
    33. 33. Sales Information
        Report: The number of items sold (U) and income (Rs) in each region for each product with time.
                          Jan         Feb         Mar         Apr
                          Rs     U    Rs     U    Rs     U    Rs     U
            India  Soaps  -      -    -      -    7.44   3    24.80  10
                   Paste  7.95   3    42.40  16   15.90  6    -      -
                   Bread  7.32   4    29.98  16   10.98  6    -      -
            UK     Soaps  -      -    -      -    7.44   3    17.36  7
                   Paste  7.95   3    -      -    -      -    21.20  8
                   Bread  7.32   4    16.47  9    27.45  15   -      -
    34. 34. Sales Measures & Dimensions. Measures: Units sold, Amount. Dimensions: Product, Time, Region.
    35. 35. Sales Data Warehouse Model: Fact Table
            Country  Product  Month     Units  Rupees
            India    Soaps    January   3      7.95
            India    Paste    January   4      7.32
            UK       Soaps    January   3      7.95
            UK       Paste    January   4      7.32
            India    Bread    February  16     42.40
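A roll-up over the fact rows on slide 35 can be sketched in plain Python. The rows are copied from the slide; the function name `roll_up` and the `DIMENSIONS` index map are hypothetical helpers introduced for this sketch.

```python
from collections import defaultdict

# Fact rows as listed on slide 35: (country, product, month, units, rupees).
FACT_ROWS = [
    ("India", "Soaps", "January", 3, 7.95),
    ("India", "Paste", "January", 4, 7.32),
    ("UK", "Soaps", "January", 3, 7.95),
    ("UK", "Paste", "January", 4, 7.32),
    ("India", "Bread", "February", 16, 42.40),
]
# Position of each dimension inside a fact row.
DIMENSIONS = {"country": 0, "product": 1, "month": 2}


def roll_up(rows, *dims):
    """Sum the measures (units, rupees) over the chosen dimensions."""
    totals = defaultdict(lambda: [0, 0.0])
    for row in rows:
        key = tuple(row[DIMENSIONS[d]] for d in dims)
        totals[key][0] += row[3]
        totals[key][1] += row[4]
    return {key: (units, round(rupees, 2)) for key, (units, rupees) in totals.items()}


by_country = roll_up(FACT_ROWS, "country")
by_month = roll_up(FACT_ROWS, "month")
grand_total = roll_up(FACT_ROWS)  # no dimensions: one overall total
```

Choosing fewer dimensions gives a coarser report and choosing more gives a finer one, which mirrors how the sales reports on the earlier slides add one dimension at a time.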
    36. 36. Sales Data Warehouse Model
            City_ID  Prod_ID  Month     Units  Rupees
            1        589      1/1/1998  3      7.95
            1        1218     1/1/1998  4      7.32
            2        589      1/1/1998  3      7.95
            2        1218     1/1/1998  4      7.32
            1        589      2/1/1998  16     42.40
    37. 37. Sales Data Warehouse Model: Product Dimension Tables
            Prod_ID  Product_Name  Product_Category_ID
            589      Soaps         1
            590      Paste         1
            288      Bread         2

            Product_Category_Id  Product_Category
            1                    domestic
            2                    food
    38. 38. Sales Data Warehouse Model: Region Dimension Table
            City_ID  City   Region     Country
            1        India  West       India
            2        UK     NorthWest  India
    39. 39. Sales Data Warehouse Model. [Diagram: the Sales Fact table linked to the Time, Product and Region dimensions, with Product linked to Product Category.]
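The model on slides 36 to 39 can be loaded into `sqlite3` and queried as a rough check of how the tables fit together. The rows are copied from the slides; the lowercase table and column names and the SQL types are assumptions. Prod_ID 1218 appears in the fact rows but not in the product dimension as transcribed, so the sketch uses a LEFT JOIN to keep those rows visible with no category.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE product_category (product_category_id INTEGER PRIMARY KEY,
                                   product_category TEXT);
    CREATE TABLE product_dim (prod_id INTEGER PRIMARY KEY, product_name TEXT,
                              product_category_id INTEGER);
    CREATE TABLE region_dim  (city_id INTEGER PRIMARY KEY, city TEXT,
                              region TEXT, country TEXT);
    CREATE TABLE sales_fact  (city_id INTEGER, prod_id INTEGER, month TEXT,
                              units INTEGER, rupees REAL);
    """
)

# Rows as transcribed from slides 36 to 38.
conn.executemany("INSERT INTO product_category VALUES (?, ?)",
                 [(1, "domestic"), (2, "food")])
conn.executemany("INSERT INTO product_dim VALUES (?, ?, ?)",
                 [(589, "Soaps", 1), (590, "Paste", 1), (288, "Bread", 2)])
conn.executemany("INSERT INTO region_dim VALUES (?, ?, ?, ?)",
                 [(1, "India", "West", "India"), (2, "UK", "NorthWest", "India")])
conn.executemany("INSERT INTO sales_fact VALUES (?, ?, ?, ?, ?)", [
    (1, 589, "1/1/1998", 3, 7.95), (1, 1218, "1/1/1998", 4, 7.32),
    (2, 589, "1/1/1998", 3, 7.95), (2, 1218, "1/1/1998", 4, 7.32),
    (1, 589, "2/1/1998", 16, 42.40),
])

# LEFT JOIN keeps fact rows whose Prod_ID has no match in the product
# dimension; their category comes back as NULL (None in Python).
units_by_city_and_category = conn.execute(
    """
    SELECT r.city, c.product_category, SUM(f.units)
    FROM sales_fact AS f
    JOIN region_dim AS r ON r.city_id = f.city_id
    LEFT JOIN product_dim AS p ON p.prod_id = f.prod_id
    LEFT JOIN product_category AS c
           ON c.product_category_id = p.product_category_id
    GROUP BY r.city, c.product_category
    """
).fetchall()
```

Rows that come back with no category are a useful signal that a fact key is missing from its dimension table, the kind of inconsistency the cleaning step is meant to catch before loading.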
    40. 40. Data Warehousing includes: – Building the data warehouse – Online analytical processing (OLAP) – Presentation. [Diagram: RDBMS and flat file sources pass through cleaning, selection and integration into the warehouse and OLAP server, which serves the presentation client.]
    41. 41. Thank You