4. Definition
Data mining automates the process of locating and
extracting hidden patterns and knowledge from data.
In simple words:
Searching for new knowledge
5. Why we need data mining
Data explosion problem
Automated data collection tools and mature database technology
lead to tremendous amounts of data stored in databases, data
warehouses and other information repositories
We are drowning in data, but starving for knowledge!
Solution: Data mining
Data warehousing and on-line analytical processing
Extraction of interesting knowledge (rules, regularities, patterns,
constraints) from data in large databases
7. Predictive Model
Prediction
determining how certain attributes will behave in the future
Regression
mapping of a data item to a real-valued prediction variable
Classification
categorization of data based on combinations of attributes
Time Series analysis
examining values of attributes with respect to time
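As a rough illustration of regression (mapping a data item to a real-valued prediction), the sketch below fits a straight line by ordinary least squares. The function name `fit_line` and the spend/sales figures are made up for illustration and are not taken from the slides.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Spread of x around its mean, and co-variation of x with y
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data: advertising spend (x) against sales (y)
spend = [1.0, 2.0, 3.0, 4.0]
sales = [3.0, 5.0, 7.0, 9.0]

slope, intercept = fit_line(spend, sales)

# Predict the real-valued target for a spend value not in the data
predicted = slope * 5.0 + intercept
```

A classification model would follow the same pattern but return a category label instead of a number.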
8. Descriptive Model
Clustering
grouping the most closely related data items together into clusters
Data Summarization
extracting representative information about the database
Association Rules
associations identified between data items to form relationships
Sequence Discovery
determining sequential patterns in data based on
the time sequence of actions
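One common way to quantify an association rule is through its support and confidence. The sketch below computes both for a single rule over a handful of made-up shopping baskets; the helper names `support` and `confidence` and the data are assumptions for illustration only.

```python
def support(transactions, items):
    """Fraction of transactions that contain every item in `items`."""
    items = set(items)
    hits = sum(1 for t in transactions if items <= set(t))
    return hits / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Estimated P(consequent | antecedent) from the transactions."""
    base = support(transactions, antecedent)
    if base == 0:
        return 0.0  # the antecedent never occurs, so the rule has no evidence
    both = support(transactions, set(antecedent) | set(consequent))
    return both / base


# Hypothetical market baskets
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

# Rule under test: bread -> milk
rule_support = support(baskets, {"bread", "milk"})
rule_confidence = confidence(baskets, {"bread"}, {"milk"})
```

Here the rule "bread implies milk" holds in two of the four baskets, and in two of the three baskets that contain bread.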
9. Data mining process
Problem Definition
Creating the Database
Exploring the Database
Preparation for creating a data mining model
Building Data Mining Model
Evaluation Phase
Deploying the Data Mining model
Fig. General Phases of Data Mining Process
10. Who needs data mining?
"Whoever has information fastest and uses it wins"
~ Don Keough, former president of Coca-Cola
Businesses are looking for new ways to let end users
find the data they need to:
Make decisions
Serve customers
Gain the competitive edge
11. Applications
Business analysis and management
Computer security
Customer relationships analysis and management
Telecommunication analysis and management
News and entertainment
Bioinformatics and Healthcare analysis
12. Summary
Need of data mining
Data mining models
Process of data mining
Some applications
14. Data Warehousing
Data Warehouse
What is a Data Warehouse?
Database & Data Warehouse
How to distinguish?
Purpose
Database: Transactional
Data Warehouse: Intended for decision support
applications.
Functionality
Optimized for data retrieval, not routine transaction
processing.
Structure
Performance
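The difference in purpose can be sketched with Python's built-in `sqlite3` module: a transactional operation touches a single row, while a decision-support query reads and aggregates many rows. The `sales` table and its contents are hypothetical, and a real warehouse would normally be a separate system rather than the same in-memory database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (id INTEGER PRIMARY KEY, region TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO sales (region, amount) VALUES (?, ?)",
    [("north", 100.0), ("south", 250.0), ("north", 50.0)],
)

# Transactional style: a routine update that touches one row
conn.execute("UPDATE sales SET amount = 120.0 WHERE id = 1")

# Decision-support style: a retrieval query that aggregates over all rows
totals = dict(
    conn.execute("SELECT region, SUM(amount) FROM sales GROUP BY region")
)

conn.close()
```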
15. Data Warehousing
Modern organizations’ needs?
Companies spread worldwide.
Have
So many Data Sources
Different Operational Systems
Different Schemas
Need Data for
Complex Analysis
Knowledge Discovery
Decision Making.
Solution?
16. Data Warehousing
Solution: Data Warehouse.
Data Warehouse definition?
No single definition.
Data Warehouse
Collection of information gathered from multiple sources,
stored under a unified schema, at a single site & mainly
intended for decision support applications.
A subject-oriented, integrated, nonvolatile, time-variant
collection of data in support of management’s decisions.
~ W.H. Inmon
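The idea of gathering data from multiple sources under a unified schema can be sketched as a simple field mapping. The source names (`crm`, `billing`), field names, and records below are all hypothetical; real integration also involves type conversion and conflict resolution, which this sketch leaves out.

```python
# Hypothetical records from two operational systems with different schemas
crm_rows = [{"cust_id": 1, "full_name": "Ann Lee", "country": "US"}]
billing_rows = [{"customer": 2, "name": "Raj Patel", "nation": "IN"}]

# Per-source mapping: source field name -> unified warehouse field name
FIELD_MAPS = {
    "crm": {"cust_id": "customer_id", "full_name": "name", "country": "country"},
    "billing": {"customer": "customer_id", "name": "name", "nation": "country"},
}


def to_unified(source, row):
    """Rename a source row's fields to the unified schema and tag its origin."""
    mapping = FIELD_MAPS[source]
    unified = {target: row[field] for field, target in mapping.items()}
    unified["source"] = source
    return unified


# The "warehouse" here is just one list holding rows in the unified schema
warehouse = [to_unified("crm", r) for r in crm_rows]
warehouse += [to_unified("billing", r) for r in billing_rows]
```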
17. Warehouses are Very Large Databases
[Figure: bar chart of respondents (0% to 35%) by warehouse size (5GB, 5-9GB, 10-19GB, 20-49GB, 50-99GB, 100-249GB, 250-499GB, 500GB-1TB), comparing initial size with size projected for 2Q96]
Source: META Group, Inc.
19. Data Warehousing
Data Warehouse building
When & how to gather data
Source-driven architecture
Destination-driven architecture
What schema to use
Data Cleansing
Task of correcting and processing data
How to propagate updates
What data to summarize
And many more
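A minimal sketch of data cleansing, assuming the only problems are stray whitespace, inconsistent capitalization, missing values, and duplicate records. The function name `cleanse` and the sample records are hypothetical; production cleansing tools handle far more cases.

```python
def cleanse(records):
    """Normalize name/city fields, drop unusable rows, and remove duplicates."""
    seen = set()
    cleaned = []
    for rec in records:
        # Collapse repeated whitespace and normalize capitalization
        name = " ".join(rec.get("name", "").split()).title()
        city = rec.get("city", "").strip().title() or None
        if not name:
            continue  # a record with no name cannot be identified
        key = (name, city)
        if key in seen:
            continue  # duplicate after normalization
        seen.add(key)
        cleaned.append({"name": name, "city": city})
    return cleaned


# Hypothetical raw records as they might arrive from a source system
raw = [
    {"name": "  alice   smith ", "city": "london"},
    {"name": "Alice Smith", "city": "London "},
    {"name": "", "city": "Paris"},
    {"name": "bob jones", "city": ""},
]

clean = cleanse(raw)
```

The first two records collapse into one after normalization, the nameless record is dropped, and the empty city is stored as `None` so that a missing value is explicit.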
20. Summary
What is Data Warehousing?
Data Warehouse.
Data Warehouse – Architecture
Data Warehouse vs. Data Mining