introduction to datawarehouse

7,928 views

Published on

Published in: Sports, Technology, Business
0 Comments
3 Likes
Statistics
Notes
  • Be the first to comment

No Downloads
Views
Total views
7,928
On SlideShare
0
From Embeds
0
Number of Embeds
3
Actions
Shares
0
Downloads
370
Comments
0
Likes
3
Embeds 0
No embeds

No notes for slide
  • we invent something only if there is a need for that thing….today we are going to see what data warehousing is…data warehouse is evolved to satisfy some needs….we will see some of these need now
  • We need subject oriented and multidimensional data amodel fro data warehouse which facilitates online analysis
  • introduction to datawarehouse

    1. 1. Introduction to Data Warehousing BY G.KIRAN KUMAR HT.NO:001-09-06-002
    2. 2. Why Data Warehouse? <ul><li>Necessity is the mother of invention </li></ul>
    3. 3. Scenario 1 GK Pvt Ltd is a company with branches at Mumbai, Delhi, Chennai and Banglore. The Sales Manager wants quarterly sales report. Each branch has a separate operational system.
    4. 4. Scenario 1 : GK Pvt Ltd. Mumbai Delhi Chennai Banglore Sales Manager Sales per item type per branch for first quarter.
    5. 5. Solution 1:GK Pvt Ltd. <ul><li>Extract sales information from each database. </li></ul><ul><li>Store the information in a common repository at a single site. </li></ul>
    6. 6. Solution 1:GK Pvt Ltd. Mumbai Delhi Chennai Banglore Data Warehouse Sales Manager Query & Analysis tools Report
    7. 7. Scenario 2 One Software company has huge operational database. Whenever Executives wants some report the OLTP system becomes slow and data entry operators have to wait for some time.
    8. 8. Scenario 2 : One Software Company Operational Database Data Entry Operator Data Entry Operator Management Wait Report
    9. 9. Solution 2 <ul><li>Extract data needed for analysis from operational database. </li></ul><ul><li>Store it in warehouse. </li></ul><ul><li>Refresh warehouse at regular interval so that it contains up to date information for analysis. </li></ul><ul><li>Warehouse will contain data with historical perspective. </li></ul>
    10. 10. Solution 2 Operational database Data Warehouse Extract data Data Entry Operator Data Entry Operator Manager Report Transaction
    11. 11. Scenario 3 Cakes & Cookies is a small, new company. President of the company wants his company should grow. He needs information so that he can make correct decisions.
    12. 12. Solution 3 <ul><li>Improve the quality of data before loading it into the warehouse. </li></ul><ul><li>Perform data cleaning and transformation before loading the data. </li></ul><ul><li>Use query analysis tools to support adhoc queries. </li></ul>
    13. 13. Solution 3 Query and Analysis tool President Expansion Improvement sales time Data Warehouse
    14. 14. What is Data Warehouse??
    15. 15. Inmons’s definition A data warehouse is -subject-oriented, -integrated, -time-variant, -nonvolatile collection of data in support of management’s decision making process.
    16. 16. Subject-oriented <ul><li>Data warehouse is organized around subjects such as sales,product,customer. </li></ul><ul><li>It focuses on modeling and analysis of data for decision makers. </li></ul><ul><li>Excludes data not useful in decision support process. </li></ul>
    17. 17. Integration <ul><li>Data Warehouse is constructed by integrating multiple heterogeneous sources. </li></ul><ul><li>Data Preprocessing are applied to ensure consistency. </li></ul>RDBMS Legacy System Data Warehouse Flat File Data Processing Data Transformation
    18. 18. Time-variant <ul><li>Provides information from historical perspective e.g. past 5-10 years </li></ul><ul><li>Every key structure contains either implicitly or explicitly an element of time </li></ul>
    19. 19. Nonvolatile <ul><li>Data once recorded cannot be updated. </li></ul><ul><li>Data warehouse requires two operations in data accessing </li></ul><ul><ul><li>Initial loading of data </li></ul></ul><ul><ul><li>Access of data </li></ul></ul>load access
    20. 20. Data Warehousing Architecture
    21. 21. Data Warehouse Architecture <ul><li>Data Warehouse server </li></ul><ul><ul><li>almost always a relational DBMS,rarely flat files </li></ul></ul><ul><li>OLAP servers </li></ul><ul><ul><li>to support and operate on multi-dimensional data structures </li></ul></ul><ul><li>Clients </li></ul><ul><ul><li>Query and reporting tools </li></ul></ul><ul><ul><li>Analysis tools </li></ul></ul><ul><ul><li>Data mining tools </li></ul></ul>
    22. 22. Data Warehouse Schema <ul><li>Star Schema </li></ul><ul><li>Fact Constellation Schema </li></ul><ul><li>Snowflake Schema </li></ul>
    23. 23. Star Schema <ul><li>A single,large and central fact table and one table for each dimension. </li></ul><ul><li>Every fact points to one tuple in each of the dimensions and has additional attributes. </li></ul><ul><li>Does not capture hierarchies directly. </li></ul>
    24. 24. Star Schema (contd..) Store Dimension Time Dimension Product Dimension Fact Table Benefits: Easy to understand, easy to define hierarchies, reduces no. of physical joins. Store Key Product Key Period Key Units Price Store Key Store Name City State Region Period Key Year Quarter Month Product Key Product Desc
    25. 25. SnowFlake Schema <ul><li>Variant of star schema model. </li></ul><ul><li>A single, large and central fact table and one or more tables for each dimension. </li></ul><ul><li>Dimension tables are normalized i.e. split dimension table data into additional tables </li></ul>
    26. 26. SnowFlake Schema (contd..) Time Dimension Product Dimension Fact Table City Dimension Store Dimension Drawbacks: Time consuming joins,report generation slow Store Key Product Key Period Key Units Price Store Key Store Name City Key Period Key Year Quarter Month Product Key Product Desc City Key City State Region
    27. 27. Fact Constellation <ul><li>Multiple fact tables share dimension tables. </li></ul><ul><li>This schema is viewed as collection of stars hence called galaxy schema or fact constellation. </li></ul><ul><li>Sophisticated application requires such schema. </li></ul>
    28. 28. Fact Constellation (contd..) Store Dimension Product Dimension Sales Fact Table Shipping Fact Table Store Key Product Key Period Key Units Price Store Key Store Name City State Region Product Key Product Desc Shipper Key Store Key Product Key Period Key Units Price
    29. 29. Building Data Warehouse <ul><li>Data Selection </li></ul><ul><li>Data Preprocessing </li></ul><ul><ul><li>Fill missing values </li></ul></ul><ul><ul><li>Remove inconsistency </li></ul></ul><ul><li>Data Transformation & Integration </li></ul><ul><li>Data Loading </li></ul><ul><li>Data in warehouse is stored in form of fact tables and dimension tables. </li></ul>
    30. 30. Data Warehousing includes <ul><li>Build Data Warehouse </li></ul><ul><li>Online analysis processing(OLAP). </li></ul><ul><li>Presentation. </li></ul>RDBMS Flat File Presentation Cleaning ,Selection & Integration Warehouse & OLAP server Client
    31. 31. Need for Data Warehousing <ul><li>Industry has huge amount of operational data </li></ul><ul><li>Knowledge worker wants to turn this data into useful information. </li></ul><ul><li>This information is used by them to support strategic decision making . </li></ul><ul><li>It is a platform for consolidated historical data for analysis. </li></ul><ul><li>It stores data of good quality so that knowledge worker can make correct decisions. </li></ul>
    32. 32. Advantages Of Data Warehouse <ul><li>There are many advantages to using a data warehouse, some of them are: </li></ul><ul><li>Enhances end-user access to a wide variety of data </li></ul><ul><li>Business decision makers can obtain various kinds of trend reports e.g. the item with the most sales in a particular area / country for the last two years </li></ul><ul><li>Increased data consistency </li></ul><ul><li>Providing a place to combine related data from separate sources </li></ul>
    33. 33. Disadvantages Of Data warehouse <ul><li>Data owners lose control over their data, raising ownership (responsibility and accountability), security and privacy issues </li></ul><ul><li>Adding new data sources takes time and associated high cost </li></ul><ul><li>Limited flexibility of use and types of users - requires multiple separate data marts for multiple uses and types of users </li></ul>
    34. 34. CONCLUSION <ul><li>A parallel was made between Operational Systems and Data Warehouse Systems to show their differences mainly in the objectives and type of data that each one deals </li></ul><ul><li>Data warehouse is the technology for the future. </li></ul><ul><li>data warehouse enables knowledge worker to make faster and better decisions </li></ul>
    35. 35. References <ul><li>Building Data Warehouse by Inmon </li></ul><ul><li>Data Mining:Concepts and Techniques by Han,Kamber. </li></ul><ul><li>www.dwinfocenter.org </li></ul><ul><li>www.datawarehousingonline.com </li></ul><ul><li>www.billinmon.com </li></ul>
    36. 36. Thank You

    ×