Data Mining
Steps and Functionalities
Data Mining: A KDD Process
• Data mining is the core of the knowledge discovery process.
[Figure: KDD process flow. Databases → Data Cleaning / Data Integration → Data Warehouse → Selection & Transformation → Task-relevant Data → Data Mining → Pattern Evaluation]
Steps of a KDD Process
• Data Cleaning
  • Handles noisy, inconsistent, and incomplete data
  • Missing values
  • Noisy data
    • Binning, clustering, etc.
  • Inconsistencies
    • Tools, functional dependencies
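Binning, named above as a way to handle noisy data, can be illustrated with smoothing by bin means. This is a minimal sketch, assuming equal-depth bins over sorted values; the function name and the sample prices are hypothetical and do not come from the slides.

```python
def smooth_by_bin_means(values, bin_size):
    """Sort the values, split them into bins of `bin_size` items,
    and replace every value by the mean of its bin."""
    ordered = sorted(values)
    smoothed = []
    for start in range(0, len(ordered), bin_size):
        bin_values = ordered[start:start + bin_size]
        mean = sum(bin_values) / len(bin_values)
        smoothed.extend([mean] * len(bin_values))
    return smoothed


# Hypothetical noisy price readings.
prices = [4, 8, 15, 21, 21, 24, 25, 28, 34]
smoothed_prices = smooth_by_bin_means(prices, bin_size=3)
print(smoothed_prices)
```

Larger bins smooth more aggressively but also hide more genuine variation, so the bin size is a tuning choice rather than a fixed rule.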
• Data Integration
  • Schema integration
    • Entity identification problem
  • Redundancy
    • Correlation analysis
• Data Selection
  • Select only the task-relevant data
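Correlation analysis for spotting redundant attributes during integration can be sketched with the Pearson correlation coefficient. The attribute names and values below are hypothetical, and any cut-off used to call an attribute redundant would be an assumption of the analyst, not something the slides specify.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length numeric lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical attributes coming from two integrated sources.
price_usd = [10.0, 20.0, 30.0, 40.0]
price_cents = [1000.0, 2000.0, 3000.0, 4000.0]  # same information, different unit
units_sold = [7.0, 3.0, 9.0, 4.0]

r_redundant = pearson(price_usd, price_cents)
r_unrelated = pearson(price_usd, units_sold)
print(r_redundant, r_unrelated)
```

A coefficient close to +1 or -1 suggests one attribute can be derived from the other, so one of them is a candidate for removal.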
• Data Transformation
  • Transform or consolidate data
  • Smoothing, normalization, feature construction
  • Data reduction (compression)
• Data Mining
  • Intelligent methods are applied to extract patterns
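Normalization, one of the transformations listed above, is commonly done by min-max scaling or z-score standardization. The sketch below assumes the values are not all identical (otherwise both functions would divide by zero); the income figures are hypothetical.

```python
import statistics


def min_max(values, new_min=0.0, new_max=1.0):
    """Linearly rescale values into [new_min, new_max]."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) * (new_max - new_min) + new_min for v in values]


def z_score(values):
    """Center values on the mean and scale by the population standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return [(v - mean) / std for v in values]


# Hypothetical income attribute.
incomes = [12000, 30000, 54000, 98000]
scaled = min_max(incomes)
standardized = z_score(incomes)
print(scaled)
print(standardized)
```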
• Pattern Evaluation
  • Interestingness measures
• Knowledge Presentation
  • Visualization
Data Mining Functionalities
• Descriptive
  • Characterizes general properties of the data
• Predictive
  • Performs inference on the current data to make predictions
• Mining
  • In parallel
  • At various granularities
Data Mining Functionalities
• Concept/class description
• Association analysis
• Classification and prediction
• Cluster analysis
• Outlier analysis
• Evolution analysis
Concept/Class Description
• Data can be associated with classes or concepts
  • Computers, printers
  • BigSpenders vs. BudgetSpenders
• Class/concept description
  • Classes and concepts can be summarized in concise and precise terms
  • Data characterization
  • Data discrimination
Data Characterization
• Summarization of the general characteristics of a class
• Data collected and aggregated
  • OLAP roll-up operation
  • Attribute-oriented induction
• Results: charts, cubes, rules
• Example
  • Characteristics of customers
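The OLAP roll-up operation mentioned above aggregates data from a lower level of a concept hierarchy to a higher one. The following is a minimal sketch, assuming a hypothetical city-to-country hierarchy and made-up sales figures.

```python
from collections import defaultdict

# Hypothetical concept hierarchy: city -> country.
CITY_TO_COUNTRY = {
    "Vancouver": "Canada",
    "Toronto": "Canada",
    "Chicago": "USA",
    "New York": "USA",
}

# Hypothetical (city, sales amount) rows.
sales = [
    ("Vancouver", 400),
    ("Toronto", 350),
    ("Chicago", 500),
    ("New York", 900),
    ("Toronto", 150),
]


def roll_up(rows, hierarchy):
    """Sum the amounts after replacing each key by its parent in the hierarchy."""
    totals = defaultdict(int)
    for key, amount in rows:
        totals[hierarchy[key]] += amount
    return dict(totals)


by_country = roll_up(sales, CITY_TO_COUNTRY)
print(by_country)
```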
Data Discrimination
• Compares a target class with one or more contrasting classes
  • May be user-specified
• Examples:
  • Products whose sales increased vs. decreased
  • Regular shoppers vs. occasional shoppers
• Output includes comparative measures
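A comparative measure for discrimination can be as simple as the share of each attribute value in the target class next to its share in the contrasting class. The records, the `age` attribute, and the function names below are hypothetical.

```python
from collections import Counter


def distribution(records, attribute):
    """Relative frequency of each value of `attribute` within one class."""
    counts = Counter(record[attribute] for record in records)
    total = sum(counts.values())
    return {value: count / total for value, count in counts.items()}


def discriminate(target, contrast, attribute):
    """Map each value to (share in target class, share in contrasting class)."""
    t = distribution(target, attribute)
    c = distribution(contrast, attribute)
    return {v: (t.get(v, 0.0), c.get(v, 0.0)) for v in sorted(set(t) | set(c))}


# Hypothetical shoppers: target class vs. contrasting class.
regular = [{"age": "20-29"}, {"age": "30-39"}, {"age": "30-39"}, {"age": "30-39"}]
occasional = [{"age": "20-29"}, {"age": "20-29"}, {"age": "20-29"}, {"age": "30-39"}]

comparison = discriminate(regular, occasional, "age")
print(comparison)
```

Values whose shares differ sharply between the two classes are the ones that discriminate between them.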
Association Analysis
• Discovery of association rules
  • Form: X ⇒ Y
• Multi-dimensional
  • age(X, “20…29”) ∧ income(X, “20K…25K”) ⇒ buys(X, “Laptop”)
• Single-dimensional
  • buys(X, “Laptop”) ⇒ buys(X, “Software”)
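Rules of the form X ⇒ Y are usually scored by support and confidence. The slide does not define these measures, so the sketch below assumes their standard definitions and evaluates the single-dimensional rule buys(X, "Laptop") ⇒ buys(X, "Software") on hypothetical transactions.

```python
def support(transactions, items):
    """Fraction of transactions that contain every item in `items`."""
    items = set(items)
    return sum(1 for t in transactions if items <= t) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """support(antecedent and consequent together) / support(antecedent)."""
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / support(transactions, antecedent)


# Hypothetical market-basket transactions.
transactions = [
    {"Laptop", "Software"},
    {"Laptop", "Software", "Printer"},
    {"Laptop"},
    {"Printer"},
    {"Software"},
]

rule_support = support(transactions, {"Laptop", "Software"})
rule_confidence = confidence(transactions, {"Laptop"}, {"Software"})
print(rule_support, rule_confidence)
```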
Classification and Prediction
• Classification
  • Finds models that describe and differentiate classes or concepts
  • Predicts the class of objects whose label is unknown
  • Models are derived from training data
  • Models: rules, decision trees, neural networks (NN), formulae
  • Preceded by relevance analysis (to eliminate irrelevant attributes)
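As one illustration of a rule-based model learned from training data, the sketch below builds a single-attribute rule set: for each attribute value it predicts the majority class seen in training. This is a deliberately simple stand-in, not a method named in the slides; the attribute, class labels, and records are hypothetical.

```python
from collections import Counter, defaultdict


def train_one_rule(rows, attribute, label):
    """For each value of `attribute`, remember the most frequent class label."""
    by_value = defaultdict(Counter)
    for row in rows:
        by_value[row[attribute]][row[label]] += 1
    return {value: counts.most_common(1)[0][0] for value, counts in by_value.items()}


def classify(rules, row, attribute, default=None):
    """Predict the class of an unlabeled row; fall back to `default` for unseen values."""
    return rules.get(row[attribute], default)


# Hypothetical labeled training data.
training = [
    {"income": "high", "class": "BigSpender"},
    {"income": "high", "class": "BigSpender"},
    {"income": "high", "class": "BudgetSpender"},
    {"income": "low", "class": "BudgetSpender"},
    {"income": "low", "class": "BudgetSpender"},
]

rules = train_one_rule(training, "income", "class")
prediction = classify(rules, {"income": "high"}, "income")
print(rules, prediction)
```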
Classification and Prediction
• Prediction
  • Derived model is used for prediction
  • Data value prediction
  • Class label prediction (classification)
  • Trend identification
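Data value prediction can be sketched with a least-squares line fitted to past observations and then evaluated at a new point. Linear regression is only one possible model and is an assumption here; the monthly sales figures are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical sales for months 1 to 4.
months = [1, 2, 3, 4]
sales = [110.0, 120.0, 130.0, 140.0]

slope, intercept = fit_line(months, sales)
forecast = slope * 5 + intercept  # predicted value for month 5
print(slope, intercept, forecast)
```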
Cluster Analysis
• Unsupervised
  • Class labels are missing in the training set
• Maximize intra-class similarity
• Minimize inter-class similarity
• Hierarchy of classes
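The goal of grouping similar objects without class labels can be illustrated with a small k-means loop on one-dimensional data. k-means is one common clustering method and is an assumption here, as are the starting centroids, the fixed iteration count, and the sample points.

```python
def k_means_1d(points, centroids, iterations=10):
    """Assign each point to its nearest centroid, then move each centroid
    to the mean of its cluster; repeat a fixed number of times."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Keep the old centroid if a cluster ends up empty.
        centroids = [
            sum(c) / len(c) if c else centroids[i] for i, c in enumerate(clusters)
        ]
    return centroids, clusters


# Hypothetical unlabeled data with two visible groups.
points = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
centroids, clusters = k_means_1d(points, centroids=[1.0, 2.0])
print(centroids, clusters)
```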
Outlier Analysis
• Objects that do not comply with the general behavior of the data
• Noise vs. rare events
  • Fraud detection
• Statistical tests
• Deviation-based methods
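A simple statistical test for outliers flags values that lie many standard deviations from the mean. The threshold of 2.0 and the transaction amounts are assumptions for illustration. In small samples the largest achievable z-score is limited, so the threshold has to be chosen with the sample size in mind.

```python
import statistics


def z_score_outliers(values, threshold=2.0):
    """Return the values whose absolute z-score exceeds `threshold`."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return [v for v in values if abs(v - mean) / std > threshold]


# Hypothetical card transaction amounts; one looks suspicious.
amounts = [20, 22, 19, 21, 23, 20, 500]
outliers = z_score_outliers(amounts)
print(outliers)
```

Whether a flagged value is noise to discard or a rare event worth investigating (as in fraud detection) is a domain decision the test itself cannot make.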
Evolution Analysis
• Trend detection
• Time-series data
• Involves other functionalities
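Trend detection on time-series data can be sketched by smoothing the series with a moving average and comparing the first and last smoothed values. The window size, the three-way labeling, and the sales series are hypothetical choices.

```python
def moving_average(series, window):
    """Mean of each run of `window` consecutive values."""
    return [
        sum(series[i:i + window]) / window
        for i in range(len(series) - window + 1)
    ]


def trend(series, window=3):
    """Label the overall direction of the smoothed series."""
    smoothed = moving_average(series, window)
    if smoothed[-1] > smoothed[0]:
        return "up"
    if smoothed[-1] < smoothed[0]:
        return "down"
    return "flat"


# Hypothetical monthly sales with short-term fluctuations.
monthly_sales = [100, 104, 98, 110, 115, 109, 125]
direction = trend(monthly_sales)
print(moving_average(monthly_sales, 3), direction)
```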
