Data Mining and Data Warehousing


Paper presentation on Data Mining and Data Warehousing.

Published in: Technology, Business

  1. Yogesh Benawat, Sameer Deshmukh
  2. Outline
     - Data Mining
     - Data Warehousing
     - Q 'n' A
     - Conclusion
  3. Historical Perspective
     - 1960s:
       - Data collection, database creation, IMS and network DBMS
     - 1970s:
       - Relational data model, relational DBMS implementation
     - 1980s:
       - RDBMS, advanced data models (extended-relational, OO, deductive, etc.) and application-oriented DBMS (spatial, scientific, engineering, etc.)
     - 1990s to 2000s:
       - Data mining and data warehousing, multimedia databases, and Web databases
  4. Data Mining
  5. Definition
     - Data mining automates the process of locating and extracting hidden patterns and knowledge from data
     - In simple terms:
       - Searching for new knowledge
  6. Why we need data mining
     - Data explosion problem
       - Automated data collection tools and mature database technology have led to tremendous amounts of data being stored in databases, data warehouses and other information repositories
     - We are drowning in data, but starving for knowledge!
     - Solution: Data mining
       - Data warehousing and on-line analytical processing
       - Extraction of interesting knowledge (rules, regularities, patterns, constraints) from data in large databases
  7. Data Mining Models
     - Predictive Model
     - Descriptive Model
  8. Predictive Model
     - Prediction
       - determining how certain attributes will behave in the future
     - Regression
       - mapping a data item to a real-valued prediction variable
     - Classification
       - categorizing data based on combinations of attributes
     - Time Series Analysis
       - examining the values of attributes with respect to time
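The regression item on this slide can be illustrated with a minimal sketch: an ordinary least-squares line fit that maps one input attribute to a real-valued prediction. The function names `fit_line` and `predict` and the spend/units figures are hypothetical and chosen only for illustration; they do not come from the presentation.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    # Sum of squared deviations of x, and of cross deviations of x and y.
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


def predict(model, x):
    """Map a data item x to a real-valued prediction."""
    slope, intercept = model
    return slope * x + intercept


# Hypothetical training data: advertising spend vs. units sold.
spend = [1.0, 2.0, 3.0, 4.0]
units = [3.0, 5.0, 7.0, 9.0]

model = fit_line(spend, units)
forecast = predict(model, 5.0)  # prediction for an unseen spend value
```

The same fitted model doubles as a simple form of prediction in the slide's sense: once the line is fitted to past data, it estimates how the target attribute behaves for values not yet observed.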
  9. Descriptive Model
     - Clustering
       - grouping the most closely related data items together into clusters
     - Data Summarization
       - extracting representative information about the database
     - Association Rules
       - relationships defined between data items that tend to occur together
     - Sequence Discovery
       - determining sequential patterns in data based on the time sequence of actions
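To make the association-rules item concrete, the sketch below computes the two standard measures used to judge a rule such as "bread implies milk": support (how often the items occur together) and confidence (how often the consequent appears when the antecedent does). The basket data and the helper names `support` and `confidence` are assumptions made for this example, not material from the slides.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= set(t))
    return hits / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Estimate of P(consequent | antecedent) from the transactions."""
    both = support(transactions, set(antecedent) | set(consequent))
    ante = support(transactions, antecedent)
    return both / ante if ante else 0.0


# Hypothetical market-basket data.
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

rule_support = support(baskets, {"bread", "milk"})          # 2 of 4 baskets
rule_confidence = confidence(baskets, {"bread"}, {"milk"})  # 2 of 3 bread baskets
```

A rule is usually kept only if both measures exceed thresholds chosen by the analyst; those thresholds are a tuning decision and are not specified in the presentation.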
  10. Data mining process
      Fig. General Phases of Data Mining Process
      1. Problem definition
      2. Creating the database
      3. Exploring the database
      4. Preparation for creating a data mining model
      5. Building the data mining model
      6. Evaluation phase
      7. Deploying the data mining model
  11. Who needs data mining?
      - "Whoever has information fastest and uses it wins"
        - Don Keough, former president of Coca-Cola
      - Businesses are looking for new ways to let end users find the data they need to:
        - Make decisions
        - Serve customers
        - Gain the competitive edge
  12. Applications
      - Business analysis and management
      - Computer security
      - Customer relationship analysis and management
      - Telecommunication analysis and management
      - News and entertainment
      - Bioinformatics and healthcare analysis
  13. Summary
      - Need for data mining
      - Data mining models
      - Process of data mining
      - Some applications
  14. Data Warehousing
  15. Data Warehousing
      - Data Warehouse
        - What is a Data Warehouse?
          - Database and Data Warehouse
          - How to distinguish?
            - Purpose
              - Database: transactional
              - Data Warehouse: intended for decision-support applications
            - Functionality
              - Optimized for data retrieval, not routine transaction processing
            - Structure
            - Performance
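The difference in purpose can be sketched with two kinds of query against the same table. This is only an illustration using Python's built-in `sqlite3` module with an in-memory database; the `sales` table, its columns and its rows are hypothetical, and a real warehouse would normally live in a separate system from the operational database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (id INTEGER PRIMARY KEY, region TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO sales (region, amount) VALUES (?, ?)",
    [("east", 100.0), ("west", 250.0), ("east", 50.0)],
)

# Transactional workload: a short update that touches a single row.
with conn:  # commits on success, rolls back on error
    conn.execute("UPDATE sales SET amount = amount + 10 WHERE id = ?", (1,))

# Decision-support workload: a read-only aggregate over many rows.
totals = dict(
    conn.execute("SELECT region, SUM(amount) FROM sales GROUP BY region")
)
conn.close()
```

The first statement is typical of routine transaction processing; the second is the retrieval-heavy kind of query that a data warehouse is optimized for.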
  16. Data Warehousing
      - What do modern organizations need?
        - Companies are spread worldwide
          - They have
            - Many data sources
            - Different operational systems
            - Different schemas
          - They need data for
            - Complex analysis
            - Knowledge discovery
            - Decision making
          - Solution?
  17. Data Warehousing
      - Solution: a Data Warehouse
      - Definition of a Data Warehouse?
        - There is no single definition
      - Data Warehouse
        - A collection of information gathered from multiple sources, stored under a unified schema at a single site, and mainly intended for decision-support applications
        - "A subject-oriented, integrated, nonvolatile, time-variant collection of data in support of management's decisions" (W. H. Inmon)
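The phrase "gathered from multiple sources, stored under a unified schema" can be sketched as a pair of small mapping functions, one per source, that emit records in a single shared layout. The two source layouts (`crm_rows`, `pos_rows`), their field names and the target fields are all hypothetical; they stand in for whatever operational systems an organization actually has.

```python
from datetime import date


def from_crm(row):
    """Map a record from a hypothetical CRM export to the unified schema."""
    return {
        "customer_id": row["CustID"],
        "amount": float(row["Total"]),
        "sale_date": date.fromisoformat(row["Date"]),
        "source": "crm",
    }


def from_pos(row):
    """Map a record from a hypothetical point-of-sale feed to the unified schema."""
    return {
        "customer_id": row["customer"],
        "amount": row["cents"] / 100,
        "sale_date": date(row["y"], row["m"], row["d"]),
        "source": "pos",
    }


# Two sources with different schemas (illustrative data only).
crm_rows = [{"CustID": 7, "Total": "19.50", "Date": "2009-03-01"}]
pos_rows = [{"customer": 7, "cents": 250, "y": 2009, "m": 3, "d": 2}]

# The warehouse holds every record in the same layout, at a single site.
warehouse = [from_crm(r) for r in crm_rows] + [from_pos(r) for r in pos_rows]
total_for_customer_7 = sum(
    r["amount"] for r in warehouse if r["customer_id"] == 7
)
```

Once both sources share one layout, a decision-support question such as "total spend per customer" becomes a single pass over the warehouse rather than a separate query per operational system.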
  18. Warehouses are Very Large Databases
      [Figure: bar chart of the percentage of respondents (0% to 35%) by warehouse size (5GB, 5-9GB, 10-19GB, 20-49GB, 50-99GB, 100-249GB, 250-499GB, 500GB-1TB). Legend: Initial, Projected 2Q96. Source: META Group, Inc.]
  19. Data Warehousing
      - Data Warehouse: Architecture
  20. Data Warehousing
      - Building a Data Warehouse
        - When and how to gather data
          - Source-driven architecture
          - Destination-driven architecture
        - What schema to use
        - Data cleansing
          - The task of correcting and preprocessing data
        - How to propagate updates
        - What data to summarize
        - And many more
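As a rough illustration of the data cleansing step, the sketch below normalizes whitespace and letter case, drops records with no name, and removes duplicates that only differed in formatting. The record layout (`name`, `city`) and the specific rules are assumptions for this example; real cleansing rules depend on the sources being loaded.

```python
def cleanse(records):
    """Normalize name/city fields and drop empty or duplicate records."""
    seen = set()
    cleaned = []
    for rec in records:
        # Collapse repeated whitespace and normalize capitalization.
        name = " ".join(rec.get("name", "").split()).title()
        # An empty city is stored as None rather than as an empty string.
        city = rec.get("city", "").strip().title() or None
        if not name:
            continue  # a record without a name cannot be identified
        key = (name, city)
        if key in seen:
            continue  # duplicate once formatting differences are removed
        seen.add(key)
        cleaned.append({"name": name, "city": city})
    return cleaned


# Hypothetical raw records arriving from different sources.
raw = [
    {"name": "  alice  SMITH ", "city": "pune"},
    {"name": "Alice Smith", "city": " Pune "},
    {"name": "", "city": "Mumbai"},
    {"name": "bob jones", "city": ""},
]

clean = cleanse(raw)
```

In a source-driven architecture a step like this would run when a source pushes new data, while in a destination-driven one it would run when the warehouse periodically requests data; the cleansing logic itself is the same in both cases.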
  21. Summary
      - What is Data Warehousing?
      - Data Warehouse
      - Data Warehouse: Architecture
      - Data Warehouse vs. Data Mining
  22. Conclusion
      - Your data is full of undiscovered gems; start digging!
  23. References
      - Data Mining: Introductory and Advanced Topics, Margaret H. Dunham
      - Modern Data Warehousing, Mining, and Visualization, George M. Marakas
      - Data Mining, BPB Publications
      - Database System Concepts, Silberschatz, Korth, Sudarshan
  24. Q 'n' A
  25. Thank You!