90300 633579030311875000


  1. 1. Data Warehousing/Mining Introduction
  2. 2. Outline of Lecture <ul><li>Brief History of Data Warehousing </li></ul><ul><li>What is a Data Warehouse? </li></ul><ul><li>Need for Strategic Information </li></ul><ul><li>Information Crisis </li></ul><ul><li>Operational and Decision Support Systems </li></ul><ul><li>Difference Between a Standard DB and a Data Warehouse </li></ul>
  3. 3. Data Warehouse Evolution <ul><li>Timeline figure; axis marks: 1960, 1975, 1980, 1985, 1990, 1995, 2000 </li></ul><ul><li>Era labels: “Prehistoric Times”, “Middle Ages”, Data Revolution, Information-Based Management </li></ul><ul><li>Milestone labels: Relational Databases, PCs and Spreadsheets, End-user Interfaces, Data Replication Tools, 1st DW Article, “Building the DW” by Inmon (1992), DW Confs., Vendor DW Frameworks, Company DWs </li></ul>
  4. 4. Escalating Need For Strategic Information <ul><li>Organizations need information to formulate business strategies, establish goals, and set objectives </li></ul><ul><li>For example: </li></ul><ul><li>Increase the customer base by 10% over the next 5 years </li></ul><ul><li>Gain market share by 15% in the next 2 years </li></ul><ul><li>Increase product quality levels in the top five product groups </li></ul>
  5. 5. The Information Crisis <ul><li>The volume of information is said to double every 18 months </li></ul><ul><li>Organizations have tons of data available </li></ul><ul><li>Then why is there an information crisis? </li></ul><ul><li>Why can't organizations convert the data into useful information for strategic decision making? </li></ul>
  6. 6. Problem: Heterogeneous Information Sources “Heterogeneities are everywhere” <ul><li>Different interfaces </li></ul><ul><li>Different data representations </li></ul><ul><li>Diverse structure of databases </li></ul><ul><li>Duplicate and inconsistent information </li></ul><ul><li>Example sources shown in the figure: personal databases, digital libraries, scientific databases, the World Wide Web </li></ul>
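The heterogeneity problems listed on this slide (different representations, duplicate and inconsistent records) can be made concrete with a small sketch. Everything here is an assumption for illustration: the two source systems, their field names, the date formats, and the rule that the first source wins when both describe the same customer. Real integration pipelines need far more careful matching.

```python
from datetime import date, datetime

# Hypothetical records from two sources that describe customers with
# different field names, key types, name casing, and date formats.
crm_rows = [
    {"cust_id": "17", "name": "Ada Lovelace ", "signup": "2001-03-15"},
    {"cust_id": "42", "name": "Alan Turing", "signup": "2002-11-02"},
]
billing_rows = [
    {"CustomerNo": 17, "FullName": "ADA LOVELACE", "Since": "15/03/2001"},
    {"CustomerNo": 99, "FullName": "Grace Hopper", "Since": "09/12/2003"},
]


def from_crm(row):
    """Map a CRM-style record onto the common representation."""
    return {
        "customer_id": int(row["cust_id"]),
        "name": row["name"].strip().title(),
        "signup_date": date.fromisoformat(row["signup"]),
    }


def from_billing(row):
    """Map a billing-style record onto the common representation."""
    return {
        "customer_id": int(row["CustomerNo"]),
        "name": row["FullName"].strip().title(),
        "signup_date": datetime.strptime(row["Since"], "%d/%m/%Y").date(),
    }


def integrate(crm, billing):
    """Merge both sources, keeping one record per customer_id.

    Assumed rule: when both sources hold the same customer, the CRM
    version is kept because it is processed first.
    """
    merged = {}
    for rec in [from_crm(r) for r in crm] + [from_billing(r) for r in billing]:
        merged.setdefault(rec["customer_id"], rec)
    return [merged[key] for key in sorted(merged)]


customers = integrate(crm_rows, billing_rows)
for customer in customers:
    print(customer)
```

The design choice worth noting is that each source gets its own small adapter function, so adding a third source means adding one more adapter rather than changing the merge step.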
  7. 7. Some Definitions <ul><li>What is data? </li></ul><ul><li>What is information? </li></ul><ul><li>What is a warehouse? </li></ul>
  8. 8. What is a Data Warehouse? A Practitioner's Viewpoint <ul><li>“A data warehouse is simply a single, complete, and consistent store of data obtained from a variety of sources and made available to end users in a way they can understand and use it in a business context.” </li></ul><ul><li>-- Barry Devlin, IBM Consultant </li></ul>
  9. 9. A Data Warehouse is... <ul><li>Stored collection of diverse data </li></ul><ul><ul><li>A solution to the data integration problem </li></ul></ul><ul><ul><li>Single repository of information </li></ul></ul><ul><li>Subject-oriented </li></ul><ul><ul><li>Organized by subject, not by application </li></ul></ul><ul><ul><li>Used for analysis, data mining, etc. </li></ul></ul><ul><li>Large volume of data (GB, TB) </li></ul><ul><li>Non-volatile </li></ul><ul><ul><li>Historical </li></ul></ul><ul><ul><li>Time attributes are important </li></ul></ul>
  10. 10. A Data Warehouse is... (continued) <ul><li>Updates are infrequent </li></ul><ul><li>Examples </li></ul><ul><ul><li>All transactions EVER at WalMart </li></ul></ul><ul><ul><li>Complete client histories at an insurance firm </li></ul></ul><ul><ul><li>Stockbroker financial information and portfolios </li></ul></ul>
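The "non-volatile", "historical", and "time attributes are important" properties from the two slides above can be illustrated with a minimal sketch, assuming a hypothetical `customer_history` table in an in-memory SQLite database. The table, its columns, and the sample rows are invented for illustration; the point is only that a change is recorded as a new time-stamped row instead of overwriting the old value, so the state at any past date can still be queried.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE customer_history (
        customer_id INTEGER NOT NULL,
        city        TEXT    NOT NULL,
        valid_from  TEXT    NOT NULL,  -- ISO date on which the value took effect
        PRIMARY KEY (customer_id, valid_from)
    )
    """
)


def record_change(customer_id, city, valid_from):
    """Append a new version; existing rows are never updated or deleted."""
    conn.execute(
        "INSERT INTO customer_history VALUES (?, ?, ?)",
        (customer_id, city, valid_from),
    )


def city_as_of(customer_id, as_of):
    """Return the city that was current on the given ISO date, or None."""
    row = conn.execute(
        """
        SELECT city FROM customer_history
        WHERE customer_id = ? AND valid_from <= ?
        ORDER BY valid_from DESC
        LIMIT 1
        """,
        (customer_id, as_of),
    ).fetchone()
    return row[0] if row else None


# Hypothetical history for one customer.
record_change(17, "Lahore", "1998-01-10")
record_change(17, "Karachi", "2003-06-01")

print(city_as_of(17, "2000-01-01"))
print(city_as_of(17, "2004-01-01"))
```

An operational system would typically keep only the latest city. Keeping every version is what makes the "as of" question answerable at all.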
  11. 11. Summary <ul><li>Figure with four labeled components: Operational Systems, Data Warehouse Population, Data Warehouse, Business Information Interface </li></ul>
  12. 12. What Are Operational and Decision Support Systems? <ul><li>Operational Systems </li></ul><ul><li>Making the wheels of business turn </li></ul><ul><ul><li>Take an order </li></ul></ul><ul><ul><li>Process a claim </li></ul></ul><ul><ul><li>Make a shipment </li></ul></ul><ul><ul><li>Generate an invoice </li></ul></ul><ul><ul><li>Receive cash </li></ul></ul><ul><ul><li>Reserve an airline seat </li></ul></ul>
  13. 13. What Are Operational and Decision Support Systems? (Contd.) <ul><li>Decision Support System </li></ul><ul><li>Watching the wheels of business turn </li></ul><ul><ul><li>Show the top selling products </li></ul></ul><ul><ul><li>Show the problem regions </li></ul></ul><ul><ul><li>Tell me why (drill down) </li></ul></ul><ul><ul><li>Let me see other data (drill across) </li></ul></ul><ul><ul><li>Alert me when a district sells below target </li></ul></ul>
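Several of the decision-support requests on this slide map naturally onto aggregate queries. The sketch below assumes a hypothetical `sales` table and a `targets` table in an in-memory SQLite database, with invented regions, districts, products, and amounts. It is one possible way to phrase "top selling products", "problem regions", "drill down", and "alert me when a district sells below target", not a description of any particular product.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE sales (region TEXT, district TEXT, product TEXT, amount REAL);
    CREATE TABLE targets (district TEXT PRIMARY KEY, target REAL);
    INSERT INTO sales VALUES
        ('North', 'N1', 'Widget', 500), ('North', 'N1', 'Gadget', 200),
        ('North', 'N2', 'Widget', 100), ('South', 'S1', 'Gadget', 900),
        ('South', 'S1', 'Widget', 300);
    INSERT INTO targets VALUES ('N1', 600), ('N2', 400), ('S1', 1000);
    """
)

# "Show the top selling products": total sales per product, best first.
top_products = conn.execute(
    "SELECT product, SUM(amount) AS total FROM sales "
    "GROUP BY product ORDER BY total DESC"
).fetchall()

# "Show the problem regions": total sales per region, weakest first.
regions = conn.execute(
    "SELECT region, SUM(amount) AS total FROM sales "
    "GROUP BY region ORDER BY total ASC"
).fetchall()

# "Tell me why (drill down)": break the weakest region down by district.
weakest_region = regions[0][0]
drill_down = conn.execute(
    "SELECT district, SUM(amount) AS total FROM sales "
    "WHERE region = ? GROUP BY district ORDER BY district",
    (weakest_region,),
).fetchall()

# "Alert me when a district sells below target".
below_target = [
    row[0]
    for row in conn.execute(
        "SELECT s.district FROM sales s "
        "JOIN targets t ON t.district = s.district "
        "GROUP BY s.district, t.target "
        "HAVING SUM(s.amount) < t.target "
        "ORDER BY s.district"
    )
]

print(top_products, regions, drill_down, below_target)
```

Every query here reads and summarizes many rows and changes none, which is the access pattern the slide contrasts with order taking and invoicing.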
  14. 14. Difference <table>
      <tr><th></th><th>Operational</th><th>Informational</th></tr>
      <tr><td>Data Content</td><td>Current values</td><td>Archived, derived, optimized</td></tr>
      <tr><td>Data Structure</td><td>Optimized for transactions</td><td>Optimized for complex queries</td></tr>
      <tr><td>Access Frequency</td><td>High</td><td>Medium to low</td></tr>
      <tr><td>Access Type</td><td>Read, update, delete</td><td>Read</td></tr>
      <tr><td>Usage</td><td>Predictable, repetitive</td><td>Ad hoc, random, heuristic</td></tr>
      <tr><td>Response Time</td><td>Sub-second</td><td>Several seconds to minutes</td></tr>
      <tr><td>Users</td><td>Large number</td><td>Relatively small number</td></tr>
      </table>
  15. 15. Warehouse is a Specialized DB <table>
      <tr><th>Standard DB</th><th>Warehouse</th></tr>
      <tr><td>Mostly updates</td><td>Mostly reads</td></tr>
      <tr><td>Many small transactions</td><td>Queries are long and complex</td></tr>
      <tr><td>MB to GB of data</td><td>GB to TB of data</td></tr>
      <tr><td>Current snapshot</td><td>History</td></tr>
      <tr><td>Index/hash on primary key</td><td>Lots of scans</td></tr>
      <tr><td>Raw data</td><td>Summarized, reconciled data</td></tr>
      <tr><td>Thousands of users (e.g., clerical users)</td><td>Hundreds of users (e.g., decision-makers, analysts)</td></tr>
      </table>
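The "index/hash on primary key" versus "lots of scans" contrast can be observed directly in a small experiment. The sketch below assumes a hypothetical `orders` table in an in-memory SQLite database and uses SQLite's `EXPLAIN QUERY PLAN` to compare a point lookup with a summarizing report. The exact wording of the plan text differs between SQLite versions, so the code only looks for the keywords `SEARCH` and `SCAN`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
)
# Hypothetical data: 1000 orders spread over 10 customers.
conn.executemany(
    "INSERT INTO orders (customer, amount) VALUES (?, ?)",
    [("c%d" % (i % 10), float(i)) for i in range(1000)],
)


def plan(sql, params=()):
    """Return the human-readable detail column of SQLite's query plan."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[3] for row in rows)


# Standard-DB style: fetch one order by its primary key.
lookup_plan = plan("SELECT amount FROM orders WHERE order_id = ?", (42,))

# Warehouse style: summarize every order per customer.
report_plan = plan("SELECT customer, SUM(amount) FROM orders GROUP BY customer")

print("lookup:", lookup_plan)
print("report:", report_plan)
print("lookup uses key search:", "SEARCH" in lookup_plan)
print("report scans the table:", "SCAN" in report_plan)
```

The lookup touches one row through the key, while the report has to read the whole table, which is why the two workloads favor different physical designs.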
  16. 16. Warehousing and Industry <ul><li>Warehousing is big business </li></ul><ul><ul><li>$2 billion in 1995 </li></ul></ul><ul><ul><li>$3.5 billion in early 1997 </li></ul></ul><ul><ul><li>About $8 billion in 1998 [Metagroup] </li></ul></ul><ul><li>WalMart has the largest warehouse </li></ul><ul><ul><li>900-CPU, 2,700-disk, 23 TB Teradata system </li></ul></ul><ul><ul><li>~7 TB in warehouse </li></ul></ul><ul><ul><li>40-50 GB per day </li></ul></ul>
  17. 17. Data Warehousing: Two Distinct Issues <ul><li>(1) How to get information into the warehouse </li></ul><ul><li>“Data warehousing” </li></ul><ul><li>(2) What to do with data once it’s in the warehouse </li></ul><ul><li>“Warehouse DBMS” </li></ul><ul><li>Both are rich research areas </li></ul><ul><li>Industry has focused on (2) </li></ul>
  18. 18. Thank You Very Much
