Data Mining 
• Motivation 
• Synonym 
• Process of DM 
• Operation of DM 
• DM techniques 
• Business Application 
• Application Selection 
• Current Issues
Motivations for Data Mining 
• Raw data rarely generates direct 
benefits 
• Its real value is realized when we 
extract useful information and 
knowledge from it 
• Some questions are difficult to express 
in SQL 
– Which records indicate fraud? 
– Which customers are likely to buy product 
A?
Motivations for DM 
• Only 5%-10% of the collected data has 
ever been analyzed to support the 
decision-making process 
• The amount of the data collected in an 
organization continues to increase, 
while our ability to analyze that data has 
not kept up proportionately
Data Mining (DM) 
• A technique that extracts knowledge 
from large volumes of data 
• It is also known as Knowledge 
Discovery in Databases (KDD) 
– KDD is defined as the overall process 
necessary to discover knowledge, while 
DM is one particular activity which applies 
a specific algorithm to extract knowledge 
– However, these two terms are often used 
interchangeably
Process of Data Mining 
• Extracting knowledge from databases is 
a five-step process 
• Knowledge discovery is an interactive, 
iterative process: discoveries evolve 
as the steps are repeated
Process of Data Mining 
• Selecting Application Domain 
• Selecting Target Data 
• Preprocessing Data 
• Extracting Information/Knowledge 
• Interpretation and Evaluation
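
The five steps above can be sketched end to end in a few lines of Python. This is only a minimal illustration under stated assumptions: the records, the field names, and the one-rule "model" (an income threshold) are all hypothetical and are not taken from the slides.

```python
# Hypothetical records; every field name and value is an illustrative assumption.
RECORDS = [
    {"age": 25, "income": 30_000, "bought": False},
    {"age": 41, "income": 72_000, "bought": True},
    {"age": 37, "income": None,   "bought": True},   # missing value
    {"age": 52, "income": 88_000, "bought": True},
    {"age": 29, "income": 35_000, "bought": False},
    {"age": 33, "income": 41_000, "bought": False},
]

def select_target_data(records, fields):
    """Step 2: keep only the fields relevant to the chosen application."""
    return [{f: r[f] for f in fields} for r in records]

def preprocess(records):
    """Step 3: drop records that contain missing values."""
    return [r for r in records if all(v is not None for v in r.values())]

def extract_knowledge(records):
    """Step 4: derive a one-rule model, an income threshold midway
    between the mean income of buyers and of non-buyers."""
    buyers = [r["income"] for r in records if r["bought"]]
    others = [r["income"] for r in records if not r["bought"]]
    return (sum(buyers) / len(buyers) + sum(others) / len(others)) / 2

def evaluate(records, threshold):
    """Step 5: fraction of records the rule classifies correctly.
    Measured on the same data the rule was derived from, so it is optimistic."""
    hits = sum((r["income"] > threshold) == r["bought"] for r in records)
    return hits / len(records)

# Step 1 (selecting the application domain) is the decision to predict purchases.
target = select_target_data(RECORDS, ["income", "bought"])
clean = preprocess(target)
threshold = extract_knowledge(clean)
accuracy = evaluate(clean, threshold)
print(f"rule: income > {threshold:.0f}, accuracy on cleaned data: {accuracy:.2f}")
```

In practice the evaluation step would use held-out data, and a poor result would send the analyst back to an earlier step, which is what makes the process iterative.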
Operations of DM 
• Classification 
• Regression 
• Link Analysis 
• Segmentation 
• Detecting Deviations
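
As one illustration of the operations listed above, detecting deviations can be sketched with a simple z-score test. The transaction amounts and the threshold of 2.0 are assumptions chosen for the example; real deviation detection usually needs a more robust method.

```python
from statistics import mean, pstdev

# Hypothetical transaction amounts; the last one is deliberately unusual.
AMOUNTS = [100, 102, 98, 101, 99, 103, 97, 100, 500]

def find_deviations(values, threshold=2.0):
    """Return the values lying more than `threshold` population standard
    deviations away from the mean."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return []  # all values identical, nothing deviates
    return [v for v in values if abs(v - mu) / sigma > threshold]

print(find_deviations(AMOUNTS))
```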
DM Techniques 
• Machine Learning 
– Induction 
– Conceptual Clustering 
• Artificial Neural Networks (ANN) 
• Statistical Techniques 
• Example-based Methods
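
Assuming "example-based methods" refers to instance-based approaches such as nearest-neighbour classification, the idea can be sketched as follows. The labelled examples (income and debt in thousands, mapped to a credit decision) are hypothetical.

```python
import math

# Hypothetical labelled examples: (income, debt) in thousands -> decision.
EXAMPLES = [
    ((60.0, 5.0), "approve"),
    ((75.0, 10.0), "approve"),
    ((30.0, 20.0), "reject"),
    ((25.0, 15.0), "reject"),
]

def nearest_neighbour(query, examples):
    """Return the label of the stored example closest to `query`,
    using Euclidean distance. No model is built in advance."""
    _, label = min(examples, key=lambda ex: math.dist(query, ex[0]))
    return label

print(nearest_neighbour((65.0, 8.0), EXAMPLES))
print(nearest_neighbour((28.0, 18.0), EXAMPLES))
```

The stored examples themselves act as the knowledge base: a new case is classified by analogy with the most similar past case.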
Business Application 
• Marketing 
– Market Segmentation 
– Market Basket Analysis 
– Trend Analysis 
– Sales Prediction 
• Finance 
– Bankruptcy prediction 
– Credit approval
Business Application 
• Finance 
– Bond rate prediction 
– Mutual fund selection 
• Insurance 
– Fraud Detection
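
Market basket analysis, listed above under Marketing, is commonly described in terms of the support and confidence of association rules. The sketch below computes both for a rule "bread => milk" over a handful of hypothetical transactions; the items and baskets are assumptions made up for the example.

```python
# Hypothetical shopping baskets, one set of items per transaction.
TRANSACTIONS = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
    {"bread", "milk", "eggs"},
]

def count(itemset, transactions):
    """Number of transactions that contain every item in `itemset`."""
    return sum(1 for t in transactions if itemset <= t)

def support(itemset, transactions):
    """Fraction of all transactions that contain `itemset`."""
    return count(itemset, transactions) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Among transactions containing `antecedent`, the fraction that
    also contain `consequent`."""
    base = count(antecedent, transactions)
    both = count(antecedent | consequent, transactions)
    return both / base if base else 0.0

print(support({"bread", "milk"}, TRANSACTIONS))
print(confidence({"bread"}, {"milk"}, TRANSACTIONS))
```

Rules with high support and high confidence are the usual candidates for actions such as product placement or cross-selling.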
Application Selection 
• Non-Technical Criteria 
– Potential benefits and payoffs 
– Management support 
– Domain expert 
– End user interest and involvement 
– Potential for privacy/legal issues
Application Selection 
• Technical Criteria 
– Sufficient amount of data 
– High quality data 
– Prior Knowledge
Current Issues 
• Integration 
– DM with OLAP 
• Limited power of commercial DM Tools 
• Data quality problem 
• Multimedia data: video, audio, images, 
etc. 
• Scaling-up problem

Artificial Intelligence: Data Mining
