Data warehousing

This is the initial presentation about data warehousing.

Published in Technology
  • We invent something only when there is a need for it. Today we are going to see what data warehousing is. The data warehouse evolved to satisfy certain needs, and we will look at some of those needs now.
  • Data has to be extracted from various sources: some may be maintained manually on paper, some held in different legacy systems. Integrating this data is laborious work. Many systems provide DTS tools to convert data into an appropriate format and apply the necessary transformations.
  • A data warehouse needs a subject-oriented, multidimensional data model that facilitates online analysis.

Transcript

  • 1. Introduction to Data Warehousing By MAHESH.AMPOLU
  • 2. Necessity is the mother of invention Why Data Warehouse?
  • 3. Scenario: Unilever is a company with branches in the UK, India, America and Japan. The Sales Manager wants a quarterly sales report. Each branch has a separate operational system.
  • 4. Scenario 1: Unilever company. [Diagram: the Sales Manager asks the India, UK, America and Japan branches for sales per item type per branch for the first quarter.]
  • 5. Solution : Unilever company. Extract sales information from each database. Store the information in a common repository at a single site.
  • 6. Solution: Unilever company. [Diagram: the India, UK, America and Japan branches feed a Data Warehouse; query and analysis tools produce the report for the Sales Manager.]
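The solution on slides 5 and 6 (extract sales information from each branch database and store it in a common repository) can be sketched roughly as follows. This is a minimal illustration, not the deck's implementation: the branch names come from the slides, but the sample rows, the `sales` table layout and the helper names `build_warehouse` and `quarterly_report` are assumptions made for the example, and an in-memory SQLite database stands in for the warehouse.

```python
import sqlite3

# Hypothetical per-branch operational data: (item_type, quarter, units_sold).
# Branch names come from the slides; the rows are invented for illustration.
BRANCH_SALES = {
    "India":   [("Soaps", "Q1", 120), ("Paste", "Q1", 80)],
    "UK":      [("Soaps", "Q1", 95)],
    "America": [("Paste", "Q1", 60)],
    "Japan":   [("Soaps", "Q1", 40), ("Paste", "Q1", 30)],
}


def build_warehouse(branch_sales):
    """Copy every branch's rows into one shared table, tagging each row with its branch."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE sales (branch TEXT, item_type TEXT, quarter TEXT, units INTEGER)"
    )
    for branch, rows in branch_sales.items():
        conn.executemany(
            "INSERT INTO sales VALUES (?, ?, ?, ?)",
            [(branch, item, quarter, units) for item, quarter, units in rows],
        )
    return conn


def quarterly_report(conn, quarter):
    """Return sales per item type per branch for one quarter."""
    return conn.execute(
        "SELECT branch, item_type, SUM(units) FROM sales "
        "WHERE quarter = ? GROUP BY branch, item_type "
        "ORDER BY branch, item_type",
        (quarter,),
    ).fetchall()


warehouse = build_warehouse(BRANCH_SALES)
report = quarterly_report(warehouse, "Q1")
for branch, item_type, units in report:
    print(branch, item_type, units)
```

Once the rows sit in one place, the Sales Manager's report is a single query instead of four separate extracts.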
  • 7. Scenario: Hindustan Unilever is a small, new company. The President of the company wants the company to grow. He needs information so that he can make correct decisions.
  • 8. Solution: Improve the quality of data before loading it into the warehouse. Perform data cleaning and transformation before loading the data. Use query analysis tools to support ad hoc queries.
  • 9. Solution. [Diagram: the President uses a query and analysis tool on the Data Warehouse; chart labels: sales, time, Expansion, Improvement.]
  • 10. What is a Data Warehouse?
  • 11. Inmon’s definition: A data warehouse is a subject-oriented, integrated, time-variant, nonvolatile collection of data in support of management’s decision-making process.
  • 12. Subject-oriented: A data warehouse is organized around subjects such as sales, product and customer. It focuses on modeling and analysis of data for decision makers. It excludes data not useful in the decision support process.
  • 13. Integration Data Warehouse is constructed by integrating multiple heterogeneous sources. Data Preprocessing are applied to ensure consistency. RDBMS Data Legacy Warehouse System Flat File Data Processing Data Transformation
  • 14. Integration In terms of data. – encoding structures. – Measurement of attributes. – physical attribute. of data remarks – naming conventions. – Data type format
  • 15. Time-variant Provides information from historical perspective e.g. past 5-10 years Every key structure contains either implicitly or explicitly an element of time
  • 16. Nonvolatile Data once recorded cannot be updated. Data warehouse requires two operations in data accessing – Initial loading of data – Access of data load access
  • 17. Data Warehousing Architecture
  • 18. Data Warehouse Architecture: Data warehouse server – almost always a relational DBMS, rarely flat files. OLAP servers – to support and operate on multi-dimensional data structures. Clients – query and reporting tools – analysis tools – data mining tools
  • 19. OLTP vs OLAP
  • 20. Data Warehouse Schema: Star Schema, Fact Constellation Schema, Snowflake Schema
  • 21. Star Schema: A star schema consists of at least one fact table and a number of dimension tables. The star schema is a highly recommended schema for SSAS cubes.
  • 22. Star Schema
    Fact Table: Store Key, Product Key, Period Key, Units, Price
    Store Dimension: Store Key, Store Name, City, State, Region
    Time Dimension: Period Key, Year, Quarter, Month
    Product Dimension: Product Key, Product Desc
    Benefits: easy to understand, easy to define hierarchies, reduces the number of physical joins.
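The star schema on slide 22 can be written down as tables and queried with one join per dimension. The sketch below uses Python's built-in `sqlite3` module; the table and column names follow the slide, while the snake_case spelling, the sample rows and the result name `units_by_region_quarter` are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE store_dim   (store_key INTEGER PRIMARY KEY, store_name TEXT,
                          city TEXT, state TEXT, region TEXT);
CREATE TABLE product_dim (product_key INTEGER PRIMARY KEY, product_desc TEXT);
CREATE TABLE time_dim    (period_key INTEGER PRIMARY KEY, year INTEGER,
                          quarter TEXT, month TEXT);
CREATE TABLE sales_fact (
    store_key   INTEGER REFERENCES store_dim(store_key),
    product_key INTEGER REFERENCES product_dim(product_key),
    period_key  INTEGER REFERENCES time_dim(period_key),
    units INTEGER,
    price REAL
);
-- Sample rows are invented for illustration.
INSERT INTO store_dim VALUES (1, 'Store A', 'Mumbai', 'Maharashtra', 'West'),
                             (2, 'Store B', 'Pune', 'Maharashtra', 'West');
INSERT INTO product_dim VALUES (10, 'Soap'), (11, 'Paste');
INSERT INTO time_dim VALUES (100, 1998, 'Q1', 'January'),
                            (101, 1998, 'Q1', 'February');
INSERT INTO sales_fact VALUES (1, 10, 100, 3, 2.5), (1, 11, 100, 4, 2.0),
                              (2, 10, 101, 5, 2.5), (2, 11, 101, 1, 2.0);
""")

# Each dimension is exactly one join away from the fact table.
units_by_region_quarter = conn.execute("""
    SELECT s.region, t.quarter, p.product_desc, SUM(f.units)
    FROM sales_fact f
    JOIN store_dim s   ON s.store_key = f.store_key
    JOIN product_dim p ON p.product_key = f.product_key
    JOIN time_dim t    ON t.period_key = f.period_key
    GROUP BY s.region, t.quarter, p.product_desc
    ORDER BY p.product_desc
""").fetchall()

print(units_by_region_quarter)
# [('West', 'Q1', 'Paste', 5), ('West', 'Q1', 'Soap', 8)]
```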
  • 23. SnowFlake Schema Variant of star schema model. A single,large and central fact table and one or more tables for each dimension. Dimension tables are normalized i.e. split dimension table data into additional tables
  • 24. Snowflake Schema
    Fact Table: Store Key, Product Key, Period Key, Units, Price
    Store Dimension: Store Key, Store Name, City Key
    City Dimension: City Key, City, State, Region
    Time Dimension: Period Key, Year, Quarter, Month
    Product Dimension: Product Key, Product Desc
    Drawbacks: time-consuming joins, slow report generation.
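The extra join that slide 24 lists as a drawback shows up as soon as a query needs an attribute that was normalized out of the Store Dimension. A minimal `sqlite3` sketch, assuming the column layout from the slide (in snake_case) and invented sample rows:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- City attributes are split out of the store dimension (normalization).
CREATE TABLE city_dim  (city_key INTEGER PRIMARY KEY, city TEXT,
                        state TEXT, region TEXT);
CREATE TABLE store_dim (store_key INTEGER PRIMARY KEY, store_name TEXT,
                        city_key INTEGER REFERENCES city_dim(city_key));
CREATE TABLE sales_fact (store_key INTEGER REFERENCES store_dim(store_key),
                         product_key INTEGER, period_key INTEGER,
                         units INTEGER, price REAL);
-- Sample rows are invented for illustration.
INSERT INTO city_dim VALUES (1, 'Mumbai', 'Maharashtra', 'West'),
                            (2, 'Delhi', 'Delhi', 'North');
INSERT INTO store_dim VALUES (1, 'Store A', 1), (2, 'Store B', 1), (3, 'Store C', 2);
INSERT INTO sales_fact VALUES (1, 10, 100, 3, 2.5), (2, 10, 100, 4, 2.5),
                              (3, 10, 100, 6, 2.5);
""")

# Region now lives two joins away from the fact table: fact -> store -> city.
units_by_region = conn.execute("""
    SELECT c.region, SUM(f.units)
    FROM sales_fact f
    JOIN store_dim s ON s.store_key = f.store_key
    JOIN city_dim c  ON c.city_key = s.city_key
    GROUP BY c.region
    ORDER BY c.region
""").fetchall()

print(units_by_region)
# [('North', 6), ('West', 7)]
```

In the star layout the same report would need only the join to the store dimension; the snowflake layout trades that for less repeated city data.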
  • 25. Fact Constellation Multiple fact tables share dimension tables. This schema is viewed as collection of stars hence called galaxy schema or fact constellation. Sophisticated application requires such schema.
  • 26–27. Fact Constellation
    Sales Fact Table: Store Key, Product Key, Period Key, Units, Price
    Shipping Fact Table: Shipper Key, Store Key, Product Key, Period Key, Units, Price
    Product Dimension: Product Key, Product Desc
    Store Dimension: Store Key, Store Name, City, State, Region
  • 28. Building a Data Warehouse: Data selection. Data preprocessing – fill missing values – remove inconsistency. Data transformation & integration. Data loading. Data in the warehouse is stored in the form of fact tables and dimension tables.
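The build steps on slide 28 (selection, preprocessing, transformation and integration, loading into fact and dimension tables) can be sketched as a small pipeline. This is one possible reading under stated assumptions: the raw rows, the `COUNTRY_NAMES` lookup, filling missing units with 0, and treating repeated (country, product, month) rows as the inconsistency to remove are all choices made for the example, not rules from the deck.

```python
# Hypothetical raw extract with a missing value, inconsistent spelling and a duplicate.
raw_rows = [
    {"country": "India", "product": "soaps", "month": "Jan", "units": 3},
    {"country": "india", "product": "Paste", "month": "Jan", "units": None},
    {"country": "UK",    "product": "Bread", "month": "Jan", "units": 4},
    {"country": "UK",    "product": "Bread", "month": "Jan", "units": 4},
    {"country": "UK",    "product": "Bread", "month": "Feb", "units": 9},
]

COUNTRY_NAMES = {"india": "India", "uk": "UK"}  # assumed canonical spellings


def select(rows, month):
    """Data selection: keep only the rows relevant to the load."""
    return [r for r in rows if r["month"] == month]


def preprocess(rows, default_units=0):
    """Fill missing values, normalize spellings and drop repeated rows."""
    cleaned, seen = [], set()
    for r in rows:
        row = {
            "country": COUNTRY_NAMES[r["country"].strip().lower()],
            "product": r["product"].strip().title(),
            "month": r["month"],
            "units": default_units if r["units"] is None else r["units"],
        }
        key = (row["country"], row["product"], row["month"])
        if key not in seen:
            seen.add(key)
            cleaned.append(row)
    return cleaned


def load(rows):
    """Split cleaned rows into dimension lookups and a fact table of keys plus measures."""
    countries = {n: i for i, n in enumerate(sorted({r["country"] for r in rows}), start=1)}
    products = {n: i for i, n in enumerate(sorted({r["product"] for r in rows}), start=1)}
    fact = [(countries[r["country"]], products[r["product"]], r["month"], r["units"])
            for r in rows]
    return countries, products, fact


countries, products, fact = load(preprocess(select(raw_rows, "Jan")))
print(countries)  # {'India': 1, 'UK': 2}
print(products)   # {'Bread': 1, 'Paste': 2, 'Soaps': 3}
print(fact)       # [(1, 3, 'Jan', 3), (1, 2, 'Jan', 0), (2, 1, 'Jan', 4)]
```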
  • 29. Case Study Unilever is a new company which produces soaps,paste and baverages products with production unit located at NA. There products are sold in North,North West and Western region of India. They have sales units at India, America , UK and Japan. The President of the company wants sales information.
  • 30. Sales Information
    Report: The number of units sold: 113
    Report: The number of units sold over time
    January  February  March  April
    14       41        33     25
  • 31. Sales Information
    Report: The number of items sold for each product with time (dimensions: Product, Time)
    Product  Jan  Feb  Mar  Apr
    Soaps    -    -    6    17
    Paste    6    16   6    8
    Bread    8    25   21   -
  • 32. Sales Information
    Report: The number of items sold in each country for each product with time (dimensions: City, Product, Time)
    Country  Product  Jan  Feb  Mar  Apr
    India    Soaps    -    -    3    10
    India    Paste    3    16   6    -
    India    Bread    4    16   6    -
    UK       Soaps    -    -    3    7
    UK       Paste    3    -    -    8
    UK       Bread    4    9    15   -
  • 33. Sales Information
    Report: The number of items sold and income in each region for each product with time (Rs = rupees, U = units)
    Country  Product  Jan Rs  Jan U  Feb Rs  Feb U  Mar Rs  Mar U  Apr Rs  Apr U
    India    Soaps    -       -      -       -      7.44    3      24.80   10
    India    Paste    7.95    3      42.40   16     15.90   6      -       -
    India    Bread    7.32    4      29.98   16     10.98   6      -       -
    UK       Soaps    -       -      -       -      7.44    3      17.36   7
    UK       Paste    7.95    3      -       -      -       -      21.20   8
    UK       Bread    7.32    4      16.47   9      27.45   15     -       -
  • 34. Sales Measures & Dimensions: Measures – units sold, amount. Dimensions – product, time, region.
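The reports on slides 30 to 32 are the same measure (units sold) summed over different sets of dimensions. The sketch below makes that explicit with a hypothetical `roll_up` helper. The unit values are the ones in the slide 32 report; the month assigned to each value is an assumption, inferred from the monthly totals on slide 30 (14, 41, 33, 25), because the transcript does not preserve the blank cells.

```python
from collections import defaultdict

# (country, product, month, units): values from the slide 32 report.
# Month placement is inferred from the slide 30 monthly totals (assumption).
SALES = [
    ("India", "Soaps", "Mar", 3), ("India", "Soaps", "Apr", 10),
    ("India", "Paste", "Jan", 3), ("India", "Paste", "Feb", 16), ("India", "Paste", "Mar", 6),
    ("India", "Bread", "Jan", 4), ("India", "Bread", "Feb", 16), ("India", "Bread", "Mar", 6),
    ("UK", "Soaps", "Mar", 3), ("UK", "Soaps", "Apr", 7),
    ("UK", "Paste", "Jan", 3), ("UK", "Paste", "Apr", 8),
    ("UK", "Bread", "Jan", 4), ("UK", "Bread", "Feb", 9), ("UK", "Bread", "Mar", 15),
]

DIMENSIONS = ("country", "product", "month")


def roll_up(rows, keep):
    """Sum the units measure over every dimension not listed in `keep`."""
    idx = [DIMENSIONS.index(d) for d in keep]
    totals = defaultdict(int)
    for row in rows:
        totals[tuple(row[i] for i in idx)] += row[3]
    return dict(totals)


grand_total = roll_up(SALES, [])                         # slide 30, first report
by_month = roll_up(SALES, ["month"])                     # slide 30, second report
by_product_month = roll_up(SALES, ["product", "month"])  # slide 31

print(grand_total)  # {(): 113}
print(by_month)     # {('Mar',): 33, ('Apr',): 25, ('Jan',): 14, ('Feb',): 41}
print(by_product_month[("Soaps", "Apr")])  # 17
```

Dropping a dimension from `keep` moves to a coarser report; adding one back drills down again.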
  • 35. Sales Data Warehouse Model: Fact Table
    Country  Product  Month     Units  Rupees
    India    Soaps    January   3      7.95
    India    Paste    January   4      7.32
    UK       Soaps    January   3      7.95
    UK       Paste    January   4      7.32
    India    Bread    February  16     42.40
  • 36. Sales Data Warehouse Model
    City_ID  Prod_ID  Month     Units  Rupees
    1        589      1/1/1998  3      7.95
    1        1218     1/1/1998  4      7.32
    2        589      1/1/1998  3      7.95
    2        1218     1/1/1998  4      7.32
    1        589      2/1/1998  16     42.40
  • 37. Sales Data Warehouse Model: Product Dimension Tables
    Prod_ID  Product_Name  Product_Category_ID
    589      Soaps         1
    590      Paste         1
    288      Bread         2

    Product_Category_Id  Product_Category
    1                    domestic
    2                    food
  • 38. Sales Data Warehouse Model: Region Dimension Table
    City_ID  City   Region     Country
    1        India  West       India
    2        UK     NorthWest  India
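  • 39. Sales Data Warehouse Model. [Diagram: the Sales Fact table linked to the Time, Product and Region dimensions, with Product Category attached to Product.]
  • 40. Data Warehousing includes: building the data warehouse, online analytical processing (OLAP), presentation. [Diagram: RDBMS and flat file sources pass through cleaning, selection & integration into the warehouse & OLAP server, which serves the presentation client.]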
  • 41. Thank You