Data Mining
Steps and Functionalities
Data Mining: A KDD Process
• Data mining: the core of the knowledge discovery process.
[Figure: KDD pipeline. Databases → Data Cleaning and Data Integration → Data Warehouse → Selection & Transformation → Task-relevant Data → Data Mining → Pattern Evaluation]
Steps of a KDD Process
• Data Cleaning
  • Handles noisy, inconsistent, and incomplete data
  • Missing values
  • Noisy data
    • Binning, clustering, etc.
  • Inconsistencies
    • Tools, functional dependencies
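
The binning technique named under noisy data can be illustrated with a minimal sketch. It assumes equal-frequency bins and smoothing by bin means, which is one common variant; the function name `smooth_by_bin_means` and the sample prices are illustrative, not taken from the slides.

```python
def smooth_by_bin_means(values, bin_size):
    """Sort the values, split them into equal-frequency bins,
    and replace every value with the mean of its bin."""
    ordered = sorted(values)
    smoothed = []
    for start in range(0, len(ordered), bin_size):
        bin_values = ordered[start:start + bin_size]
        mean = sum(bin_values) / len(bin_values)
        smoothed.extend([mean] * len(bin_values))
    return smoothed


# Hypothetical noisy price readings.
prices = [4, 8, 15, 21, 21, 24, 25, 28, 34]
print(smooth_by_bin_means(prices, 3))
# → [9.0, 9.0, 9.0, 22.0, 22.0, 22.0, 29.0, 29.0, 29.0]
```

Smoothing by bin medians or bin boundaries would follow the same structure, with only the replacement value changed.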
• Data Integration
  • Schema integration
  • Entity identification problem
  • Redundancy
    • Correlation analysis
• Data Selection
  • Select only the task-relevant data
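
Correlation analysis, listed above as a way to detect redundancy during integration, can be sketched with the Pearson correlation coefficient. This is an assumption about which measure is meant; the slides do not name one. The attribute names below are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two numeric attributes."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Two hypothetical attributes from different sources that encode the same fact.
price_usd = [10.0, 20.0, 30.0, 40.0]
price_cents = [1000.0, 2000.0, 3000.0, 4000.0]
print(pearson(price_usd, price_cents))  # close to 1.0, so one attribute is likely redundant
```

A coefficient near +1 or -1 suggests that one attribute can be derived from the other and may be dropped after integration.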
• Data Transformation
  • Transform or consolidate data
  • Smoothing, normalization, feature construction
  • Data reduction - compression
• Data Mining
  • Intelligent methods are applied to extract patterns
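
As an illustration of the normalization step listed under data transformation, here is a minimal sketch of min-max normalization. The slides do not say which normalization is intended, so this choice, the function name `min_max`, and the sample incomes are assumptions.

```python
def min_max(values, new_min=0.0, new_max=1.0):
    """Linearly rescale values into the range [new_min, new_max]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values are identical; map them to the lower bound.
        return [new_min for _ in values]
    scale = (new_max - new_min) / (hi - lo)
    return [new_min + (v - lo) * scale for v in values]


# Hypothetical income attribute.
incomes = [20000, 25000, 40000, 60000]
print(min_max(incomes))
```

Rescaling keeps attributes with large ranges, such as income, from dominating attributes with small ranges, such as age, in distance-based mining methods.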
• Pattern Evaluation
  • Interestingness measures
• Knowledge Presentation
  • Visualization
Data Mining Functionalities
• Descriptive
  • Characterize general properties of the data
• Predictive
  • Perform inference on the current data to make predictions
• Mining
  • In parallel
  • At various granularities
Data Mining Functionalities
• Concept/class description
• Association analysis
• Classification and prediction
• Cluster analysis
• Outlier analysis
• Evolution analysis
Concept/Class Description
• Data can be associated with classes or concepts
  • Computers, printers
  • BigSpenders vs. BudgetSpenders
• Class/concept description
  • Classes and concepts can be summarized in concise and precise terms
  • Data characterization
  • Data discrimination
Data Characterization
• Summarization of the general characteristics of a target class
• Data collected and aggregated
  • OLAP roll-up operation
  • Attribute-oriented induction
• Results: charts, cubes, rules
• Example
  • Characteristics of customers
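
The OLAP roll-up operation mentioned above can be sketched as an aggregation that climbs a concept hierarchy, here from city to country. The sales records, the hierarchy, and the function name `roll_up` are hypothetical and serve only to show the idea.

```python
from collections import defaultdict

# Hypothetical fact rows: (city, country, sales amount).
sales = [
    ("Chicago", "USA", 120),
    ("New York", "USA", 200),
    ("Toronto", "Canada", 90),
    ("Vancouver", "Canada", 60),
]


def roll_up(rows):
    """Aggregate city-level sales up to the country level."""
    totals = defaultdict(int)
    for _city, country, amount in rows:
        totals[country] += amount
    return dict(totals)


print(roll_up(sales))  # → {'USA': 320, 'Canada': 150}
```

The country-level totals are a more general summary of the same data, which is the kind of result a characterization task reports.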
Data Discrimination
• Compare target class and contrasting classes
  • May be user-specified
• Examples:
  • Products whose sales increased vs. decreased
  • Regular shoppers vs. occasional shoppers
• Output includes comparative measures
Association Analysis
• Discovery of association rules
  • Form: X ⇒ Y
• Multi-dimensional
  • age(X, “20…29”) ∧ income(X, “20K…25K”) ⇒ buys(X, “Laptop”)
• Single-dimensional
  • buys(X, “Laptop”) ⇒ buys(X, “Software”)
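
A rule of the form X ⇒ Y is usually judged by its support and confidence. The slides do not define these measures, so the sketch below uses the standard definitions as an assumption and evaluates the single-dimensional rule buys(Laptop) ⇒ buys(Software) on a small, made-up transaction set.

```python
def support(transactions, items):
    """Fraction of transactions that contain every item in `items`."""
    hits = sum(1 for t in transactions if items <= t)
    return hits / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Estimate of P(consequent | antecedent) from the transactions."""
    return support(transactions, antecedent | consequent) / support(transactions, antecedent)


# Hypothetical market-basket data.
transactions = [
    {"Laptop", "Software"},
    {"Laptop"},
    {"Laptop", "Software", "Printer"},
    {"Printer"},
    {"Software"},
]

print(support(transactions, {"Laptop", "Software"}))       # 2 of 5 transactions
print(confidence(transactions, {"Laptop"}, {"Software"}))  # 2 of the 3 Laptop transactions
```

A mining algorithm would keep only the rules whose support and confidence exceed user-chosen thresholds.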
Classification and Prediction
• Classification
  • Finds models that describe and differentiate classes or concepts
  • Predicts class
  • Training data
  • Models: rules, decision trees, neural networks (NN), formulae
  • Preceded by relevance analysis (to eliminate irrelevant attributes)
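
To make the idea of learning a model from training data concrete, the sketch below fits a one-level decision tree (a decision stump) on a single numeric attribute. It is a deliberately tiny stand-in for the model families listed above, not a method from the slides; the income values and the class labels are hypothetical.

```python
def fit_stump(samples):
    """Learn a rule 'value <= threshold -> left label, else right label'
    that minimizes training errors. `samples` is a list of (value, label)."""
    values = sorted({v for v, _ in samples})
    labels = sorted({label for _, label in samples})
    if len(values) < 2:
        raise ValueError("need at least two distinct attribute values")
    best = None
    for lo, hi in zip(values, values[1:]):
        threshold = (lo + hi) / 2  # candidate split between neighbouring values
        for left in labels:
            for right in labels:
                errors = sum(
                    1 for v, label in samples
                    if (left if v <= threshold else right) != label
                )
                if best is None or errors < best[0]:
                    best = (errors, threshold, left, right)
    return best[1:]


def predict(stump, value):
    """Apply the learned rule to a new attribute value."""
    threshold, left, right = stump
    return left if value <= threshold else right


# Hypothetical training data: (income in thousands, class label).
training = [(20, "budget"), (25, "budget"), (30, "budget"),
            (60, "big"), (75, "big"), (90, "big")]
stump = fit_stump(training)
print(stump)               # → (45.0, 'budget', 'big')
print(predict(stump, 50))  # → big
```

A full decision tree repeats this split search recursively on each side and over all attributes that survive relevance analysis.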
Classification and Prediction
• Prediction
  • Derived model is used for prediction
  • Data value prediction
  • Class label prediction (classification)
  • Trend identification
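
Data value prediction is often done with regression. The slides do not name a technique, so the following least-squares line fit is an assumed example; the function names and data points are illustrative.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (
        sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
        / sum((x - mean_x) ** 2 for x in xs)
    )
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict_value(model, x):
    """Use the derived model to predict a numeric value for a new x."""
    slope, intercept = model
    return slope * x + intercept


# Hypothetical data lying exactly on y = 2x + 1.
model = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
print(model)                    # → (2.0, 1.0)
print(predict_value(model, 5))  # → 11.0
```

The same two-step pattern, derive a model and then apply it to unseen data, also covers class label prediction.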
Cluster Analysis
• Unsupervised
  • Class labels are missing in the training set
• Maximize intra-class similarity
• Minimize inter-class similarity
• Hierarchy of classes
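
Grouping unlabeled data so that members of a cluster are similar to each other can be sketched with k-means on one-dimensional points. k-means is chosen here only as a familiar example and is not named in the slides; the initial centroids are passed in explicitly so that the run is deterministic.

```python
def kmeans_1d(points, centroids, iterations=10):
    """Plain k-means on 1-D points with caller-supplied initial centroids."""
    centroids = list(centroids)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        centroids = [
            sum(c) / len(c) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids, clusters


# Hypothetical unlabeled observations.
points = [1, 2, 3, 10, 11, 12]
centroids, clusters = kmeans_1d(points, [1, 12])
print(centroids)  # → [2.0, 11.0]
print(clusters)   # → [[1, 2, 3], [10, 11, 12]]
```

No class labels are used at any point; the two groups emerge from the distances alone.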
Outlier Analysis
• Objects that do not comply with the general behavior
• Noise vs. rare events
  • Fraud detection
• Statistical tests
• Deviation-based methods
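
One simple statistical test for outliers flags values whose z-score exceeds a cutoff. The cutoff of 2.0, the function name, and the transaction amounts below are assumptions made for illustration; the slides do not specify a test.

```python
import statistics


def zscore_outliers(values, threshold=2.0):
    """Return the values lying more than `threshold` population
    standard deviations away from the mean."""
    mean = statistics.mean(values)
    spread = statistics.pstdev(values)
    if spread == 0:
        return []  # all values identical, nothing deviates
    return [v for v in values if abs(v - mean) / spread > threshold]


# Hypothetical transaction amounts with one suspicious charge.
amounts = [10, 12, 11, 13, 12, 11, 95]
print(zscore_outliers(amounts))  # → [95]
```

Whether a flagged value is noise to discard or a rare event worth investigating, such as fraud, depends on the application.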
Evolution Analysis
• Trend detection
• Time series data
• Involves other functionalities
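
Trend detection in time series data can be sketched with a simple moving average, which smooths short-term fluctuations so that the direction of change stands out. The window size and the sales figures are hypothetical, and the slides do not prescribe this method.

```python
def moving_average(series, window):
    """Average of each run of `window` consecutive observations."""
    if window <= 0 or window > len(series):
        raise ValueError("window must be between 1 and len(series)")
    return [
        sum(series[i:i + window]) / window
        for i in range(len(series) - window + 1)
    ]


print(moving_average([1, 2, 3, 4, 5], 3))  # → [2.0, 3.0, 4.0]

# Hypothetical monthly sales; the smoothed values make the upward trend easier to see.
monthly_sales = [10, 12, 9, 14, 16, 15, 19]
print(moving_average(monthly_sales, 3))
```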






1.2 steps and functionalities
