This document discusses data mining, including its motivations, process, techniques, business applications, and current issues. Data mining involves extracting knowledge from large amounts of data through techniques like classification, regression, and clustering. It has various business applications in marketing, finance, and insurance. Selecting the right application requires considering both technical factors like data quality and non-technical factors like potential benefits and management support.
Discusses data mining concepts including motivation, synonyms, process, operations, techniques, business applications, application selection, and current issues.
Highlights the necessity of data mining due to underutilization of collected data and challenges in querying complex data.
Defines data mining as a technique for extracting knowledge from large datasets and its relation to Knowledge Discovery in Databases (KDD).
Describes the five-step process for knowledge discovery, which runs from selecting an application domain to evaluating the extracted information.
Lists the main operations involved in data mining like classification, regression, link analysis, segmentation, and deviation detection.
Explores major DM techniques including machine learning, artificial neural networks (ANN), and statistical methods.
Discusses various business applications such as marketing segmentation, trend analysis, finance predictions, and insurance fraud detection.
Outlines both non-technical and technical criteria essential for selecting suitable data mining applications.
Examines existing challenges in data mining such as integration with OLAP, tool limitations, data quality, multimedia data issues, and scaling problems.
Data Mining
• Motivation
• Synonym
• Process of DM
• Operation of DM
• DM techniques
• Business Application
• Application Selection
• Current Issues
2.
Motivations for Data Mining
• Raw data rarely generates direct benefits
• Its real value is realized when we extract useful information and knowledge from it
• Some questions are difficult to express as SQL queries
– Which records indicate fraud?
– Which customers are likely to buy product A?
3.
Motivations for DM
• Only 5%-10% of the collected data has ever been analyzed to support the decision-making process
• The amount of data collected in an organization continues to increase, while our ability to analyze that data has not kept up proportionately
4.
Data Mining (DM)
• A technique which extracts knowledge
from massive data
• It is also known as Knowledge
Discovery in Databases (KDD)
– KDD is defined as the overall process
necessary to discover knowledge, while
DM is one particular activity which applies
a specific algorithm to extract knowledge
– However, these two terms are often used
interchangeably
5.
Process of Data Mining
• Extracting knowledge from databases is
a five-step process
• The five-step process of knowledge discovery is interactive and iterative: discoveries evolve as the steps are repeated
6.
Process of Data Mining
• Selecting Application Domain
• Selecting Target Data
• Preprocessing Data
• Extracting Information/Knowledge
• Interpretation and Evaluation
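The five steps above can be sketched on a toy problem. This is only an illustration under stated assumptions: the records, the field names, and the midpoint-threshold rule are all hypothetical and do not come from the slides. Step 1 (selecting the application domain) is represented by the question being asked, "which customers are likely to buy product A?".

```python
from statistics import mean

# Hypothetical toy records: (customer_id, age, monthly_spend, bought_product_a)
RECORDS = [
    (1, 25, 120.0, True),
    (2, 41, None, False),   # missing spend; dropped during preprocessing
    (3, 33, 310.0, True),
    (4, 58, 80.0, False),
    (5, 29, 95.0, False),
    (6, 47, 290.0, True),
]

def select_target_data(records):
    """Step 2: keep only the fields relevant to the question."""
    return [(spend, bought) for _, _, spend, bought in records]

def preprocess(rows):
    """Step 3: remove rows with missing values."""
    return [(spend, bought) for spend, bought in rows if spend is not None]

def extract_knowledge(rows):
    """Step 4: derive a one-threshold rule, the midpoint between the
    mean spend of buyers and the mean spend of non-buyers (an assumed,
    deliberately simple rule)."""
    buyers = [spend for spend, bought in rows if bought]
    others = [spend for spend, bought in rows if not bought]
    return (mean(buyers) + mean(others)) / 2

def evaluate(rows, threshold):
    """Step 5: fraction of rows the rule classifies correctly."""
    hits = sum((spend > threshold) == bought for spend, bought in rows)
    return hits / len(rows)

rows = preprocess(select_target_data(RECORDS))
threshold = extract_knowledge(rows)
accuracy = evaluate(rows, threshold)
print(f"threshold={threshold:.2f}, accuracy={accuracy:.2f}")
```

On this toy data the rule misclassifies one customer, which is the kind of result that would send an analyst back to an earlier step (for example, selecting additional fields), reflecting the iterative nature of the process.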
7.
Operations of DM
• Classification
• Regression
• Link Analysis
• Segmentation
• Detecting Deviations
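Two of the operations listed above, regression and detecting deviations, can be illustrated with a minimal sketch. The data, the function names, and the z-score cutoff are assumptions chosen for illustration; the slides do not prescribe any particular algorithm.

```python
from statistics import mean, pstdev

def fit_line(xs, ys):
    """Regression: ordinary least-squares fit of y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar

def detect_deviations(values, z_cutoff=2.0):
    """Detecting deviations: return values whose z-score exceeds z_cutoff.
    The cutoff of 2.0 is an assumed default, not a recommended setting."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return []  # all values identical, nothing deviates
    return [v for v in values if abs(v - mu) / sigma > z_cutoff]

# Hypothetical data: these points lie exactly on y = 2x + 1.
slope, intercept = fit_line([1, 2, 3, 4], [3, 5, 7, 9])

# Hypothetical claim amounts with one unusually large value.
claims = [100, 105, 98, 102, 97, 101, 99, 500]
unusual = detect_deviations(claims)
print(slope, intercept, unusual)
```

A z-score test is one of the simplest ways to flag unusual records; a single extreme value also inflates the mean and standard deviation it is measured against, so practical fraud detection would likely need something more robust.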
Business Application
• Finance
– Bond rate prediction
– Mutual fund selection
• Insurance
– Fraud Detection
11.
Application Selection
• Non-Technical Criteria
– Potential benefits and payoffs
– Management support
– Domain expert
– End user interest and involvement
– Potential for privacy/legal issues
12.
Application Selection
• Technical Criteria
– Sufficient amount of data
– High quality data
– Prior Knowledge
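The first two technical criteria can be checked mechanically before committing to an application. The sketch below is a rough illustration only: the function name and both thresholds (`min_rows`, `max_missing_ratio`) are hypothetical, and suitable values depend on the application and the technique used.

```python
def assess_dataset(rows, min_rows=1000, max_missing_ratio=0.05):
    """Report whether a dataset meets assumed size and quality thresholds.

    rows: list of tuples, with None marking a missing value.
    """
    total_cells = sum(len(row) for row in rows)
    missing = sum(value is None for row in rows for value in row)
    # Treat an empty dataset as entirely missing.
    missing_ratio = missing / total_cells if total_cells else 1.0
    return {
        "row_count": len(rows),
        "missing_ratio": missing_ratio,
        "enough_data": len(rows) >= min_rows,
        "clean_enough": missing_ratio <= max_missing_ratio,
    }

# Hypothetical sample with one missing value out of eight cells.
sample = [(1, 20.0), (2, None), (3, 35.5), (4, 41.0)]
report = assess_dataset(sample, min_rows=3, max_missing_ratio=0.2)
print(report)
```

The third criterion, prior knowledge, is not something a script can measure; it depends on the domain experts available to the project.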
13.
Current Issues
• Integration
– DM with OLAP
• Limited power of commercial DM Tools
• Data quality problem
• Multimedia data: video, audio, images, etc.
• Scaling-up problem