Yogesh Benawat, Sameer Deshmukh
Outline
  • Data Mining
  • Data Warehousing
  • Q ‘n’ A
  • Conclusion
Historical Perspective
  • 1960s: Data collection, database creation, IMS and network DBMS
  • 1970s: Relational data model, relational DBMS implementation
  • 1980s: RDBMS, advanced data models (extended-relational, OO, deductive, etc.) and application-oriented DBMS (spatial, scientific, engineering, etc.)
  • 1990s to 2000s: Data mining and data warehousing, multimedia databases, and Web databases
Data Mining
Definition
  • Data mining automates the process of locating and extracting hidden patterns and knowledge from data.
  • In simple words: searching for new knowledge.
Why we need data mining
  • Data explosion problem: automated data collection tools and mature database technology lead to tremendous amounts of data stored in databases, data warehouses and other information repositories.
  • We are drowning in data, but starving for knowledge!
  • Solution: data mining
      • Data warehousing and on-line analytical processing
      • Extraction of interesting knowledge (rules, regularities, patterns, constraints) from data in large databases
Data Mining Models
  • Predictive Model
  • Descriptive Model
Predictive Model
  • Prediction: determining how certain attributes will behave in the future
  • Regression: mapping a data item to a real-valued prediction variable
  • Classification: categorizing data based on combinations of attributes
  • Time series analysis: examining the values of attributes with respect to time
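A minimal sketch of the regression idea above, in Python: fit a straight line by ordinary least squares and use it to map a new data item to a real-valued prediction. The names `fit_line` and `predict` and the sample data are illustrative assumptions, not part of the original slides.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of x, and co-deviations of x and y.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model, x):
    """Map a data item x to a real-valued prediction."""
    slope, intercept = model
    return slope * x + intercept


# Hypothetical data: months as a customer vs. monthly spend.
months = [1, 2, 3, 4, 5]
spend = [12.0, 14.0, 16.0, 18.0, 20.0]

model = fit_line(months, spend)
print(predict(model, 6))  # 22.0 for this perfectly linear sample
```

Real data is rarely this clean; the same fit then gives the line that minimizes the squared prediction error.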
Descriptive Model
  • Clustering: grouping the most closely related data items together into clusters
  • Data summarization: extracting representative information about the database
  • Association rules: relationships defined between data items that occur together
  • Sequence discovery: determining sequential patterns in data based on the time sequence of actions
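As an illustration of association rules, the sketch below computes the two usual measures for a rule such as bread -> milk: support (how often the items occur together) and confidence (how often the consequent appears when the antecedent does). The basket data and the function names are made up for this example.

```python
def support(transactions, items):
    """Fraction of transactions that contain every item in `items`."""
    items = set(items)
    hits = sum(1 for t in transactions if items <= set(t))
    return hits / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Estimate of P(consequent | antecedent) for the rule antecedent -> consequent.

    Assumes the antecedent occurs in at least one transaction.
    """
    both = support(transactions, set(antecedent) | set(consequent))
    return both / support(transactions, antecedent)


# Hypothetical market-basket data.
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

print(support(baskets, {"bread", "milk"}))       # 0.5 (2 of 4 baskets)
print(confidence(baskets, {"bread"}, {"milk"}))  # roughly 0.67 (2 of the 3 bread baskets)
```

A rule is usually reported only when both measures exceed thresholds chosen by the analyst.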
Data mining process
Fig. General Phases of Data Mining Process
  1. Problem definition
  2. Creating the database
  3. Exploring the database
  4. Preparation for creating a data mining model
  5. Building the data mining model
  6. Evaluation phase
  7. Deploying the data mining model
Who needs data mining?
  • "Whoever has information fastest and uses it wins." Don McKeough, former president of Coca-Cola
  • Businesses are looking for new ways to let end users find the data they need to:
      • Make decisions
      • Serve customers
      • Gain the competitive edge
Applications
  • Business analysis and management
  • Computer security
  • Customer relationship analysis and management
  • Telecommunication analysis and management
  • News and entertainment
  • Bioinformatics and healthcare analysis
Summary
  • Need for data mining
  • Data mining models
  • Process of data mining
  • Some applications
Data Warehousing
Data Warehousing: Data Warehouse
  • What is a data warehouse?
  • Database and data warehouse: how to distinguish them?
      • Purpose: a database is transactional; a data warehouse is intended for decision-support applications.
      • Functionality: a data warehouse is optimized for data retrieval, not routine transaction processing.
      • Structure
      • Performance
Data Warehousing: Modern Organization's Needs
  • Companies are spread worldwide and have:
      • Many data sources
      • Different operational systems
      • Different schemas
  • They need data for:
      • Complex analysis
      • Knowledge discovery
      • Decision making
  • Solution?
Data Warehousing: Solution
  • The solution is a data warehouse.
  • Definition: there is no single definition.
  • A data warehouse is a collection of information gathered from multiple sources, stored under a unified schema at a single site, and mainly intended for decision-support applications.
  • A subject-oriented, integrated, nonvolatile, time-variant collection of data in support of management's decisions. (W. H. Inmon)
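To make "multiple sources, stored under a unified schema" concrete, here is a small sketch in which two hypothetical source systems use different field names and currencies, and each record is mapped onto one warehouse schema. All field names, the sample rows, and the exchange rate are assumptions chosen for illustration.

```python
# Hypothetical source systems with different schemas.
sales_eu = [{"cust": "C1", "amount_eur": 100.0, "day": "2004-01-05"}]
sales_us = [{"customer_id": "C2", "total_usd": 250.0, "date": "2004-01-06"}]

# Assumed fixed conversion rate, for illustration only.
EUR_TO_USD = 1.25


def from_eu(row):
    """Map a European source row onto the unified warehouse schema."""
    return {
        "customer_id": row["cust"],
        "amount_usd": row["amount_eur"] * EUR_TO_USD,
        "date": row["day"],
        "source": "eu",
    }


def from_us(row):
    """Map a US source row onto the unified warehouse schema."""
    return {
        "customer_id": row["customer_id"],
        "amount_usd": row["total_usd"],
        "date": row["date"],
        "source": "us",
    }


# The "single site": one collection in which every row has the same fields.
warehouse = [from_eu(r) for r in sales_eu] + [from_us(r) for r in sales_us]

for row in warehouse:
    print(row)
```

Keeping a `source` column is one common way to preserve where each row came from after integration.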
Warehouses are Very Large Databases
[Figure: bar chart of the percentage of respondents (0% to 35%) by warehouse size, in ranges from 5GB up to 500GB-1TB, with series "Initial" and "Projected 2Q96". Source: META Group, Inc.]
Data Warehousing: Data Warehouse Architecture
Data Warehousing: Building a Data Warehouse
  • When and how to gather data
      • Source-driven architecture
      • Destination-driven architecture
  • What schema to use
  • Data cleansing: the task of correcting and preprocessing data
  • How to propagate updates
  • What data to summarize
  • And many more
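Two of the building steps above, data cleansing and summarization, can be sketched in a few lines of Python. The cleansing rules here (trim and normalize region names, drop rows with missing values) and the sample rows are assumptions; a real warehouse would choose rules to suit its own sources.

```python
from collections import defaultdict


def cleanse(rows):
    """Correct what can be repaired and drop rows that cannot."""
    cleaned = []
    for row in rows:
        # Normalize spelling variants such as " north " and "NORTH".
        region = (row.get("region") or "").strip().title()
        amount = row.get("amount")
        if not region or amount is None:
            continue  # missing key data: skip the row
        cleaned.append({"region": region, "amount": float(amount)})
    return cleaned


def summarize(rows):
    """Total amount per region, the kind of aggregate a warehouse might store."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["region"]] += row["amount"]
    return dict(totals)


# Hypothetical raw rows gathered from operational systems.
raw = [
    {"region": " north ", "amount": "10.5"},
    {"region": "NORTH", "amount": 4.5},
    {"region": "south", "amount": None},
    {"region": "", "amount": 3},
    {"region": "South", "amount": 7},
]

summary = summarize(cleanse(raw))
print(summary)  # {'North': 15.0, 'South': 7.0}
```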
Summary
  • What is data warehousing?
  • Data warehouse
  • Data warehouse architecture
  • Data warehouse vs. data mining
Conclusion
  • Your data is full of undiscovered gems; start digging!
Q ‘n’ A
Thank You!

Data Mining and Data Warehousing
