CLUSTERING IN DATA MINING
NAME: SHAIKH MUSKAN A.
SEAT NO: 740
GUIDE NAME: MR. VIJESH SHUKLA
Clustering In Data Mining
Overview
➢ Introduction
➢ What is Clustering?
➢ Requirements Of Clustering
➢ Application Of Clustering
➢ Clustering Types
➢ Clustering Methods
➢ K-means Algorithm
➢ Summary
➢ References
Introduction
▪ Clustering is the process of organising data into meaningful
groups; these groups are called clusters.
▪ Clustering can be contrasted with classification. In
classification we already know both the objects and the classes
they can belong to, so classification is closer to simply
deciding "where to put the new object".
▪ Clustering, on the other hand, analyses the data and discovers
the groups and their characteristics itself, generally without
any labelled responses (unsupervised).
What Is Clustering?
➢ A cluster is a collection of data objects which are
▪ Similar (or related) to one another within the same group (i.e., cluster)
▪ Dissimilar (or unrelated) to the objects in other groups (i.e., clusters)
➢ Clustering:
Clustering is the process of partitioning a set of data (or objects) into a set of
meaningful sub-classes, called clusters.
➢ Unsupervised learning: no predefined classes
➢ In cluster analysis, we first partition the set of data into groups
based on data similarity and then assign labels to the groups.
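The idea of "similar within a group, dissimilar across groups" can be illustrated with a small sketch. It assumes Euclidean distance as the (dis)similarity measure and uses two made-up groups of 2-D points; the names `euclidean`, `mean_distance`, `group_a` and `group_b` are illustrative only.

```python
import math

def euclidean(p, q):
    """Straight-line distance between two points of equal dimension."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def mean_distance(xs, ys):
    """Average distance over all pairs (x, y) with x in xs, y in ys and x != y."""
    pairs = [(x, y) for x in xs for y in ys if x != y]
    return sum(euclidean(x, y) for x, y in pairs) / len(pairs)

# Hypothetical data: two visibly separated groups of points.
group_a = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0)]
group_b = [(8.0, 8.0), (9.0, 8.0), (8.0, 9.0)]

within_a = mean_distance(group_a, group_a)   # small: objects are similar
between = mean_distance(group_a, group_b)    # large: objects are dissimilar
print(within_a < between)
```

A good clustering keeps the within-group distance small relative to the between-group distance; other distance measures could be substituted for other attribute types.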
Example Of Clustering
Requirements Of Clustering
▪ Scalability
▪ Ability to deal with different kinds of attributes
▪ Discovery of clusters with arbitrary shape
▪ Ability to handle high dimensionality
▪ Ability to deal with noisy data
▪ Interpretability
Applications Of Clustering
▪ Marketing: Helps marketers discover distinct groups in their customer
bases, and then use this knowledge to develop targeted marketing
programs.
▪ Land use: Identification of areas of similar land use in an earth
observation database.
▪ Document classification: Helps in classifying documents on the web for
information discovery.
▪ City planning: Identifying groups of houses according to their house
type, value, and geographical location.
▪ Outlier detection: Data clustering is also used in outlier detection
applications such as detection of credit card fraud.
Clustering Types
▪ Partitioning Clustering
✓ A division of data objects into non-overlapping subsets
(clusters) such that each data object is in exactly one
subset.
▪ Hierarchical Clustering
✓ A set of nested clusters organized as a hierarchical tree.
Clustering Types (figure panels: Partitioning Clustering, Hierarchical Clustering)
Clustering Methods
▪ Partitioning Method
▪ Hierarchical Method
▪ Density-based Method
▪ Grid-based Method
Clustering Methods: Partitioning Method
▪ Partitioning Method
A partitioning method subdivides the data objects into a set of k clusters,
where k is the pre-specified number of groups.
▪ The partition must satisfy the following requirements:
➢ Each group contains at least one object.
➢ Each object must belong to exactly one group.
➢ Example:
k-means algorithm
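The two requirements above can be checked mechanically. The following is a minimal sketch, assuming the objects are distinct and sortable values; the function name `is_valid_partition` is hypothetical and not part of any library.

```python
def is_valid_partition(objects, groups, k):
    """Check the partitioning requirements for `groups` over distinct `objects`."""
    if len(groups) != k:
        return False
    # Requirement 1: each group contains at least one object.
    if any(len(g) == 0 for g in groups):
        return False
    # Requirement 2: each object belongs to exactly one group
    # (every object appears once, and nothing else appears).
    assigned = [obj for g in groups for obj in g]
    return sorted(assigned) == sorted(objects)

print(is_valid_partition([1, 2, 3], [[1, 2], [3]], k=2))
```

An overlapping grouping such as `[[1, 2], [2, 3]]` or one with an empty group fails the check.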
Clustering Methods: Hierarchical Method
▪ Hierarchical Methods
▪ This method creates a hierarchical decomposition of the given set of data
objects. Hierarchical methods can be classified on the basis of how the
hierarchical decomposition is formed, as follows:
❑ Agglomerative Approach
• bottom-up approach.
• starts with each object forming a separate group, then successively merges the closest groups.
❑ Divisive Approach
• top-down approach.
• starts with all objects in the same cluster, then successively splits it into smaller clusters.
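The agglomerative (bottom-up) approach can be sketched in a few lines. This is an illustrative version only: it assumes Euclidean distance and single-link merging (distance between the two closest members), neither of which is specified in the slides, and it requires `num_clusters` to be at least 1.

```python
import math

def single_link_distance(c1, c2):
    """Smallest distance between any point of c1 and any point of c2."""
    return min(math.dist(p, q) for p in c1 for q in c2)

def agglomerative(points, num_clusters):
    """Bottom-up clustering: start with singletons, repeatedly merge the closest pair."""
    clusters = [[p] for p in points]      # each object forms a separate group
    while len(clusters) > num_clusters:
        # Find the pair of clusters with the smallest single-link distance.
        i, j = min(
            ((a, b) for a in range(len(clusters)) for b in range(a + 1, len(clusters))),
            key=lambda ab: single_link_distance(clusters[ab[0]], clusters[ab[1]]),
        )
        clusters[i] = clusters[i] + clusters[j]   # merge j into i
        del clusters[j]
    return clusters

result = agglomerative([(0, 0), (0, 1), (5, 5), (5, 6)], num_clusters=2)
print(result)
```

Stopping at different values of `num_clusters` corresponds to cutting the hierarchical tree at different levels.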
Clustering Methods: Hierarchical Method
▪ Disadvantage
▪ This method is rigid, i.e., once a merge or split is done, it can never be undone.
Density-based Method
▪ Density-based Method
• This method is based on the notion of density. The basic idea is
to continue growing a given cluster as long as the density in
the neighbourhood exceeds some threshold.
❑ Major Features:
• Discovers clusters of arbitrary shape.
▪ Example:
• DBSCAN Algorithm
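A compact sketch of the DBSCAN idea follows. It is a simplified teaching version, not a library implementation: it assumes Euclidean distance, scans all points to find neighbours, and uses the conventional parameter names `eps` (neighbourhood radius) and `min_pts` (density threshold), which do not appear in the slides.

```python
import math

def dbscan(points, eps, min_pts):
    """Return one label per point: 0, 1, ... for clusters, -1 for noise."""
    UNVISITED, NOISE = None, -1
    labels = [UNVISITED] * len(points)

    def neighbours(i):
        # Indices of all points within distance eps of point i (including i).
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:        # density below threshold: not a core point
            labels[i] = NOISE
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:                    # keep growing while the density is high enough
            j = queue.pop()
            if labels[j] == NOISE:      # border point earlier marked as noise
                labels[j] = cluster
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster
            nbrs = neighbours(j)
            if len(nbrs) >= min_pts:    # j is a core point: expand through it
                queue.extend(nbrs)
    return labels

pts = [(0, 0), (0, 1), (1, 0), (1, 1),
       (10, 10), (10, 11), (11, 10), (11, 11),
       (50, 50)]
print(dbscan(pts, eps=1.5, min_pts=3))
```

In this made-up data set the two dense squares become two clusters and the isolated point is labelled as noise, so not every object is forced into a cluster.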
Grid-based Method
▪ Grid-based Method
• In this method the object space, rather than the objects themselves,
is quantized into a finite number of cells that form a grid
structure, and clustering operations are performed on the cells.
▪ Advantage
• The major advantage of this method is fast processing time.
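The quantization step can be sketched as follows. This is only an illustration for 2-D points with square cells; the names `grid_cell_counts` and `dense_cells` and the parameters `cell_size` and `min_count` are assumptions made for the example.

```python
from collections import Counter
import math

def grid_cell_counts(points, cell_size):
    """Map each 2-D point to the integer grid cell containing it and count per cell."""
    counts = Counter()
    for x, y in points:
        cell = (math.floor(x / cell_size), math.floor(y / cell_size))
        counts[cell] += 1
    return counts

def dense_cells(points, cell_size, min_count):
    """Cells holding at least min_count points; later steps work on cells only."""
    counts = grid_cell_counts(points, cell_size)
    return {cell for cell, n in counts.items() if n >= min_count}

sample = [(0.1, 0.2), (0.5, 0.9), (0.8, 0.1), (5.5, 5.5), (9.9, 0.1)]
print(dense_cells(sample, cell_size=1.0, min_count=2))
```

After one pass over the data, further processing touches only the cells, which is why the running time typically depends on the number of cells rather than on the number of objects.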
K-means clustering algorithm
▪ k-means is one of the simplest unsupervised learning algorithms that
solve the well-known clustering problem.
▪ The procedure follows a simple way to partition a given data
set into a certain number of clusters (assume k clusters), fixed in advance.
K-means clustering algorithm
▪ How it works (figure not reproduced)
K-means clustering algorithm
▪ Algorithmic steps for k-means clustering
❖ Given k, the k-means algorithm is implemented in four
steps:
1. Partition the objects into k non-empty subsets.
2. Compute seed points as the centroids of the clusters of the current
partition (the centroid is the center, i.e., the mean point, of the cluster).
3. Assign each object to the cluster with the nearest seed point.
4. Go back to Step 2; stop when no assignment changes.
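The four steps can be sketched in plain Python as below. This is a minimal illustration, assuming Euclidean distance, at least k input points, and a random initial partition; keeping the previous seed point when a cluster becomes empty is a simplifying choice of this sketch, not something stated in the slides.

```python
import math
import random

def centroid(cluster):
    """Mean point of a non-empty list of equal-length tuples."""
    return tuple(sum(coords) / len(cluster) for coords in zip(*cluster))

def k_means(points, k, seed=0, max_iter=100):
    rng = random.Random(seed)
    # Step 1: partition the objects into k non-empty subsets (needs len(points) >= k).
    shuffled = list(points)
    rng.shuffle(shuffled)
    clusters = [shuffled[i::k] for i in range(k)]
    # Step 2: compute seed points as the centroids of the current partition.
    seeds = [centroid(c) for c in clusters]
    assignment = None
    for _ in range(max_iter):
        # Step 3: assign each object to the cluster with the nearest seed point.
        new_assignment = [
            min(range(k), key=lambda i: math.dist(p, seeds[i])) for p in points
        ]
        # Step 4: stop when no assignment changes, otherwise go back to Step 2.
        if new_assignment == assignment:
            break
        assignment = new_assignment
        clusters = [[p for p, a in zip(points, assignment) if a == i] for i in range(k)]
        # An empty cluster keeps its previous seed point (assumption of this sketch).
        seeds = [centroid(c) if c else s for c, s in zip(clusters, seeds)]
    return clusters, seeds

data = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
clusters, seeds = k_means(data, k=2)
print(clusters)
```

On well-separated data like this the result does not depend on the initial partition; in general, different initial partitions can lead to different local optima.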
K-means clustering algorithm
▪ Example (figures not reproduced)
K-means clustering
Advantages
• Simple, understandable
• Items automatically assigned to clusters
Disadvantages
• Must pick the number of clusters beforehand
• Often terminates at a local optimum
• All items forced into a cluster
• Too sensitive to outliers
What Is the Problem of K-Means?
▪ The k-means algorithm is sensitive to outliers,
since an object with an extremely large value may substantially distort the
distribution of the data.
▪ K-Medoids: Instead of taking the mean value of the objects in a cluster as a
reference point, a medoid can be used, which is the most centrally located object in
a cluster.
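The effect of an outlier on the mean, compared with the medoid, can be seen in a small sketch. The data are made up, Euclidean distance is assumed, and the medoid is taken here as the object with the smallest total distance to all other objects in the cluster.

```python
import math

def mean_point(cluster):
    """Coordinate-wise mean of the objects in the cluster."""
    return tuple(sum(c) / len(cluster) for c in zip(*cluster))

def medoid(cluster):
    """The object of the cluster with the smallest total distance to all the others."""
    return min(cluster, key=lambda p: sum(math.dist(p, q) for q in cluster))

# Four nearby objects plus one outlier with extremely large values.
cluster = [(1.0, 1.0), (2.0, 1.0), (1.0, 2.0), (2.0, 2.0), (100.0, 100.0)]

print(mean_point(cluster))   # pulled far away from the four nearby objects
print(medoid(cluster))       # always one of the actual objects
```

The mean is dragged towards the outlier and matches none of the objects, while the medoid stays among the four nearby objects.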
Summary
▪ Cluster analysis groups objects based on their similarity and has
wide applications
▪ Measures of similarity can be computed for various types of data
▪ Clustering algorithms can be categorized into partitioning methods,
hierarchical methods, density-based methods, grid-based methods,
and model-based methods
▪ Outlier detection and analysis are very useful for fraud detection,
etc. and can be performed by statistical, distance-based or
deviation-based approaches
▪ There are still many open research issues in cluster analysis, such as
constraint-based clustering
References
▪ Data Mining: Next Generation Challenges & Future
Directions, Hillol Kargupta, Anupam Joshi, Yelena Yesha,
Krishnamoorthy Sivakumar, PHI 9.
▪ Data Mining Concepts & Techniques, Jiawei Han; Mining
Techniques and Trends, N.P Gopalan, B. Sivasalvan, PHI
▪ https://sites.google.com/
▪ https://towardsdatascience.com/
▪ https://www.tutorialspoint.com
