Machine Learning Methods for Data Mining
Based on: Data Mining: Concepts and Techniques, by Han, Kamber & Pei
A.B.M. Ashikur Rahman
Asst. Professor, Dept. of CSE, IUT
Data Mining
Knowledge Discovery from Data (KDD) process steps:
• Data Cleaning
• Data Integration
• Data Selection
• Data Transformation
• Pattern Mining
• Pattern Evaluation
• Knowledge Representation
Examples of mined patterns: frequent itemsets, association rules (strong/weak)
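As a rough illustration of what separates a strong association rule from a weak one, the sketch below computes support and confidence for a rule A => B and compares them against minimum thresholds. The transactions, the threshold values, and the function names are all assumptions made for this example; they do not come from the slides.

```python
# Hypothetical market-basket transactions (illustrative data only).
transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]

MIN_SUPPORT = 0.4     # assumed threshold
MIN_CONFIDENCE = 0.7  # assumed threshold


def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Estimate of P(consequent | antecedent): support(A and C) / support(A)."""
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)


def is_strong(antecedent, consequent, transactions):
    """A rule is treated as strong if it meets both minimum thresholds."""
    both = set(antecedent) | set(consequent)
    return (support(both, transactions) >= MIN_SUPPORT
            and confidence(antecedent, consequent, transactions) >= MIN_CONFIDENCE)


print(is_strong({"diapers"}, {"beer"}, transactions))
print(is_strong({"milk"}, {"beer"}, transactions))
```

Under these assumed thresholds, a rule that is frequent but has low confidence counts as weak; different thresholds would draw the line elsewhere.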
Supervised vs. Unsupervised Learning
• Supervised learning (classification)
• Supervision: The training data (observations, measurements, etc.) are
accompanied by labels indicating the class of the observations
• New data is classified based on the training set
• Unsupervised learning (clustering)
• The class labels of the training data are unknown
• Given a set of measurements, observations, etc., the aim is to establish the
existence of classes or clusters in the data
Prediction Problems: Classification vs. Numeric Prediction
• Classification
• predicts categorical class labels (discrete or nominal)
• constructs a model from the training set and the values (class labels) of a
classifying attribute, then uses the model to classify new data
• Numeric Prediction
• models continuous-valued functions, i.e., predicts unknown or missing values
• Typical applications
• Credit/loan approval
• Medical diagnosis: whether a tumor is cancerous or benign
• Fraud detection: whether a transaction is fraudulent
• Web page categorization: which category a page belongs to
Classification—A Two-Step Process
• Model construction: describing a set of predetermined classes
• Each tuple/sample is assumed to belong to a predefined class, as determined by the class label
attribute
• The set of tuples used for model construction is the training set
• The model is represented as classification rules, decision trees, or mathematical formulae
• Model usage: for classifying future or unknown objects
• Estimate the accuracy of the model
• The known label of each test sample is compared with the model's classification of that sample
• Accuracy rate is the percentage of test set samples that are correctly classified by the model
• The test set must be independent of the training set; otherwise the accuracy estimate is over-optimistic because the model may have overfit the training data
• If the accuracy is acceptable, use the model to classify new data
• Note: If the test set is used to select models, it is called a validation (test) set
Process (1): Model Construction
Training Data
NAME     RANK            YEARS  TENURED
Mike     Assistant Prof  3      no
Mary     Assistant Prof  7      yes
Bill     Professor       2      yes
Jim      Associate Prof  7      yes
Dave     Assistant Prof  6      no
Anne     Associate Prof  3      no
Classification Algorithms
Classifier (Model):
IF rank = ‘professor’ OR years > 6 THEN tenured = ‘yes’
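The rule learned in this step can be written as a small function and checked against the training tuples. This is a minimal sketch: the function name `classify` is hypothetical, and the ELSE branch (tenured = 'no') is an assumption, since the slide states only the 'yes' case.

```python
# Training tuples from the slide: (name, rank, years, tenured)
training_data = [
    ("Mike", "Assistant Prof", 3, "no"),
    ("Mary", "Assistant Prof", 7, "yes"),
    ("Bill", "Professor", 2, "yes"),
    ("Jim", "Associate Prof", 7, "yes"),
    ("Dave", "Assistant Prof", 6, "no"),
    ("Anne", "Associate Prof", 3, "no"),
]


def classify(rank, years):
    """IF rank = 'professor' OR years > 6 THEN 'yes'; otherwise 'no' (assumed)."""
    return "yes" if rank == "Professor" or years > 6 else "no"


# Names of training tuples whose label the rule fails to reproduce.
misclassified = [
    name
    for name, rank, years, tenured in training_data
    if classify(rank, years) != tenured
]
print(misclassified)
```

An empty list here means only that the rule is consistent with the training data; it says nothing yet about accuracy on unseen data.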
Process (2): Using the Model in Prediction
Classifier
Testing Data
NAME     RANK            YEARS  TENURED
Tom      Assistant Prof  2      no
Merlisa  Associate Prof  7      no
George   Professor       5      yes
Joseph   Assistant Prof  7      yes
Unseen Data: (Jeff, Professor, 4)
Tenured?
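The model-usage step can be sketched by applying the rule from the model-construction slide to the testing data, measuring accuracy, and then classifying the unseen tuple. The helper names `classify` and `accuracy` are hypothetical, and the 'no' default is an assumption, as the slide's rule states only the 'yes' case.

```python
# Testing tuples from the slide: (name, rank, years, tenured)
test_data = [
    ("Tom", "Assistant Prof", 2, "no"),
    ("Merlisa", "Associate Prof", 7, "no"),
    ("George", "Professor", 5, "yes"),
    ("Joseph", "Assistant Prof", 7, "yes"),
]


def classify(rank, years):
    """IF rank = 'professor' OR years > 6 THEN 'yes'; otherwise 'no' (assumed)."""
    return "yes" if rank == "Professor" or years > 6 else "no"


def accuracy(samples):
    """Fraction of samples whose known label matches the rule's prediction."""
    correct = sum(classify(rank, years) == label for _, rank, years, label in samples)
    return correct / len(samples)


test_accuracy = accuracy(test_data)         # Merlisa (7 years, not tenured) is misclassified
jeff_prediction = classify("Professor", 4)  # the unseen tuple (Jeff, Professor, 4)
print(test_accuracy, jeff_prediction)
```

Whether this accuracy is "acceptable" is a judgment the slides leave to the application.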
Classification Methods
• Decision Tree Induction
• Naïve Bayesian Classification
• Rule-based Classification
• Bayesian Belief Networks
• Support Vector Machines (SVM), etc.
What is Cluster Analysis?
• Cluster: A collection of data objects
• similar (or related) to one another within the same group
• dissimilar (or unrelated) to the objects in other groups
• Cluster analysis (or clustering, data segmentation, …)
• Finding similarities between data according to the characteristics found in the data
and grouping similar data objects into clusters
• Unsupervised learning: no predefined classes (i.e., learning by observation, as
opposed to supervised learning by examples)
• Typical applications
• As a stand-alone tool to get insight into data distribution
• As a preprocessing step for other algorithms
Clustering for Data Understanding and Applications
• Biology: taxonomy of living things: kingdom, phylum, class, order, family, genus and species
• Information retrieval: document clustering
• Land use: Identification of areas of similar land use in an earth observation database
• Marketing: Help marketers discover distinct groups in their customer bases, and then use this
knowledge to develop targeted marketing programs
• City-planning: Identifying groups of houses according to their house type, value, and geographical
location
• Earthquake studies: observed earthquake epicenters tend to be clustered along continental faults
• Climate: understanding Earth's climate by finding atmospheric and oceanic patterns
• Economic science: market research
Clustering as a Preprocessing Tool (Utility)
• Summarization:
• Preprocessing for regression, PCA, classification, and association analysis
• Compression:
• Image processing: vector quantization
• Finding K-nearest Neighbors
• Localizing search to one or a small number of clusters
• Outlier detection
• Outliers are often viewed as those “far away” from any cluster
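One simple way to make "far away from any cluster" concrete is to flag points whose distance to the nearest cluster centroid exceeds a threshold. The sketch below assumes Euclidean distance, centroids already obtained from some earlier clustering run, and an arbitrary threshold; the data and function names are hypothetical.

```python
import math


def nearest_centroid_distance(point, centroids):
    """Euclidean distance from point to its closest centroid."""
    return min(math.dist(point, c) for c in centroids)


def find_outliers(points, centroids, threshold):
    """Points farther than threshold from every centroid."""
    return [p for p in points if nearest_centroid_distance(p, centroids) > threshold]


# Assumed centroids (e.g., produced by an earlier clustering step) and sample points.
centroids = [(0.0, 0.0), (10.0, 10.0)]
points = [(0.5, -0.2), (9.6, 10.3), (5.0, 5.0), (0.1, 0.4)]

outliers = find_outliers(points, centroids, threshold=2.0)
print(outliers)
```

The threshold is application-dependent; in practice it might be derived from the spread of each cluster rather than fixed by hand.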
Quality: What Is Good Clustering?
• A good clustering method will produce high quality clusters
• high intra-class similarity: cohesive within clusters
• low inter-class similarity: distinctive between clusters
• The quality of a clustering method depends on
• the similarity measure used by the method
• its implementation, and
• its ability to discover some or all of the hidden patterns
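Cohesion and separation can be approximated numerically. The sketch below uses one possible pair of measures (an assumption, not the slides' definition): the mean pairwise distance within clusters, and the mean distance between points in different clusters. It assumes Euclidean distance, at least two clusters, and at least one cluster with two or more points.

```python
import math
from itertools import combinations


def mean_intra_cluster_distance(clusters):
    """Average distance between pairs of points in the same cluster (lower = more cohesive)."""
    dists = [math.dist(p, q) for cluster in clusters for p, q in combinations(cluster, 2)]
    return sum(dists) / len(dists)


def mean_inter_cluster_distance(clusters):
    """Average distance between points in different clusters (higher = more distinct)."""
    dists = [
        math.dist(p, q)
        for a, b in combinations(clusters, 2)
        for p in a
        for q in b
    ]
    return sum(dists) / len(dists)


# Hypothetical clustering of six 2-D points into two groups.
clusters = [
    [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0)],
    [(8.0, 8.0), (9.0, 8.0), (8.0, 9.0)],
]

cohesion = mean_intra_cluster_distance(clusters)
separation = mean_inter_cluster_distance(clusters)
print(cohesion, separation)
```

A clustering with small intra-cluster distance relative to inter-cluster distance matches the "cohesive within, distinctive between" description above.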
Measure the Quality of Clustering
• Dissimilarity/Similarity metric
• Similarity is expressed in terms of a distance function, typically a metric: d(i, j)
• The definitions of distance functions usually differ for interval-scaled, Boolean,
categorical, ordinal, ratio-scaled, and vector variables
• Weights should be associated with different variables based on applications and
data semantics
• Quality of clustering:
• There is usually a separate “quality” function that measures the “goodness” of a
cluster.
• It is hard to define “similar enough” or “good enough”
• The answer is typically highly subjective
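For interval-scaled (numeric) variables, a common family of distance functions d(i, j) is the Minkowski distance, with optional per-variable weights. The sketch below is one possible formulation; the function name and the default of equal weights are assumptions, and other variable types (Boolean, categorical, ordinal) would need different definitions.

```python
def weighted_minkowski(x, y, weights=None, p=2):
    """Weighted Minkowski distance between two numeric vectors.

    p=1 gives Manhattan distance and p=2 gives Euclidean distance.
    weights defaults to 1.0 for every variable (an assumption of this sketch).
    """
    if len(x) != len(y):
        raise ValueError("x and y must have the same number of variables")
    if weights is None:
        weights = [1.0] * len(x)
    total = sum(w * abs(a - b) ** p for w, a, b in zip(weights, x, y))
    return total ** (1.0 / p)


i = (0.0, 0.0)
j = (3.0, 4.0)
print(weighted_minkowski(i, j))                      # Euclidean
print(weighted_minkowski(i, j, p=1))                 # Manhattan
print(weighted_minkowski(i, j, weights=[1.0, 0.0]))  # second variable ignored
```

Setting a weight to zero removes a variable from the comparison, which is one way application semantics can shape the similarity measure.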
Major Clustering Approaches (I)
• Partitioning approach:
• Construct various partitions and then evaluate them by some criterion, e.g., minimizing the sum
of squared errors
• Typical methods: k-means, k-medoids, CLARANS
• Hierarchical approach:
• Create a hierarchical decomposition of the set of data (or objects) using some criterion
• Typical methods: DIANA, AGNES, BIRCH, CHAMELEON
• Density-based approach:
• Based on connectivity and density functions
• Typical methods: DBSCAN, OPTICS, DENCLUE
• Grid-based approach:
• based on a multiple-level granularity structure
• Typical methods: STING, WaveCluster, CLIQUE
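To make the partitioning approach concrete, here is a minimal k-means sketch together with the sum-of-squared-errors criterion it tries to reduce. It is a simplified illustration, not a reference implementation: the random initialization, the fixed seed, the iteration cap, and the sample points are assumptions, and the result is a local optimum that may depend on the initial centroids.

```python
import math
import random


def sse(points, centroids, labels):
    """Sum of squared Euclidean distances from each point to its assigned centroid."""
    return sum(math.dist(p, centroids[k]) ** 2 for p, k in zip(points, labels))


def kmeans(points, k, iterations=100, seed=0):
    """Plain k-means: alternate assignment and centroid update until labels stop changing."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # pick k distinct points as initial centroids
    labels = None
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points
        ]
        if new_labels == labels:
            break  # converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return centroids, labels


# Hypothetical 2-D points forming two visually separate groups.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
centroids, labels = kmeans(points, k=2)
print(labels, sse(points, centroids, labels))
```

Running the function with several seeds and keeping the partition with the lowest SSE is a common way to reduce sensitivity to initialization.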