CLUSTERING:
• What is Clustering
• Clustering Techniques
• Partitioning methods
• Hierarchical methods
• Density-based methods
• Graph based methods
• Model based methods
• Application of Clustering
Clustering
• Clustering is a technique that groups similar objects so that
the objects in the same group are more similar to each other
than to the objects in other groups. A group of similar
objects is called a cluster.
• Clustering splits data into several subsets. Each of
these clusters consists of data objects with high intra-cluster
similarity (within a cluster) and low inter-cluster similarity
(between clusters).
Clustering Techniques:
• Partitioning methods
  • k-Means algorithm [1957, 1967]
  • k-Medoids algorithm: PAM [1990], CLARA [1990], CLARANS [1994]
  • k-Modes [1998]
  • Fuzzy c-means algorithm [1999]
• Hierarchical methods
  • Divisive: DIANA [1990]
  • Agglomerative: AGNES [1990], BIRCH [1996], CURE [1998], ROCK [1999], Chameleon [1999]
• Density-based methods
  • STING [1997]
  • DBSCAN [1996]
  • CLIQUE [1998]
  • DENCLUE [1998]
  • OPTICS [1999]
  • WaveCluster [1998]
• Graph based methods
  • MST Clustering [1999]
  • OPOSSUM [2000]
  • SNN Similarity Clustering [2001, 2003]
• Model based clustering
  • EM Algorithm [1977]
  • AutoClass [1996]
  • COBWEB [1987]
  • ANN Clustering [1982, 1989]
1. Centroid-based Clustering (Partitioning Clustering)
CS 40003: Data Analytics
• Centroid-based clustering is considered one of the
simplest families of clustering algorithms, yet an
effective way of creating clusters and assigning data
points to them.
• These methods iteratively measure the distance between
the data points and the cluster centroids using a
distance metric, typically Euclidean distance,
Manhattan distance, or Minkowski distance.
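The three metrics named above are closely related: Manhattan and Euclidean distance are the Minkowski distance of order 1 and 2. A minimal sketch in Python follows; the function names are illustrative and do not come from the slides.

```python
def minkowski(a, b, p):
    """Minkowski distance of order p between two equal-length points."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)


def manhattan(a, b):
    """Manhattan distance is the Minkowski distance with p = 1."""
    return minkowski(a, b, 1)


def euclidean(a, b):
    """Euclidean distance is the Minkowski distance with p = 2."""
    return minkowski(a, b, 2)


a, b = (0.0, 0.0), (3.0, 4.0)
print(manhattan(a, b))  # 7.0
print(euclidean(a, b))  # 5.0
```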
k-Means Algorithm
• k-Means is one of the most widely used and perhaps the
simplest unsupervised algorithms for solving clustering
problems.
• Using this algorithm, we partition a given data set
into a predetermined number of clusters, “k”.
• Each cluster is assigned a cluster center (centroid). The
initial centers should be placed as far away from each
other as possible.
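k-Means aims to minimize the squared-error objective function

$$J(V) = \sum_{j=1}^{C} \sum_{i=1}^{C_j} \lVert x_i - v_j \rVert^2$$

where,
$\lVert x_i - v_j \rVert$ is the Euclidean distance between data point $x_i$ and cluster center $v_j$,
$C_j$ is the count of data points in cluster $j$, and
$C$ is the number of cluster centroids.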
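Advantages:
• Can be applied to any data set whose attributes are numerical (continuous).
• Fast compared with many other clustering algorithms: the cost of each iteration grows
linearly with the number of data points.
• Easy to understand and interpret.
Drawbacks:
• Performs poorly when the clusters are not linearly separable (non-convex shapes).
• Does not work directly on categorical data, because a mean is not defined for it.
• Sensitive to outliers, which pull the centroids toward them.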
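To make the assignment and update steps concrete, here is a minimal sketch of the standard k-Means iteration (Lloyd's algorithm) in plain Python. It assumes the initial centers are k randomly sampled data points, which is simpler than the "far apart" placement described above; all function names are illustrative.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def assign(points, centers):
    """Assignment step: put each point in the cluster of its nearest center."""
    clusters = [[] for _ in centers]
    for p in points:
        nearest = min(range(len(centers)), key=lambda j: sq_dist(p, centers[j]))
        clusters[nearest].append(p)
    return clusters


def k_means(points, k, max_iter=100, seed=0):
    rng = random.Random(seed)
    # Assumption: start from k randomly chosen data points.
    centers = rng.sample(points, k)
    for _ in range(max_iter):
        clusters = assign(points, centers)
        # Update step: move each center to the mean of its cluster
        # (an empty cluster keeps its previous center).
        new_centers = [mean(c) if c else centers[j] for j, c in enumerate(clusters)]
        if new_centers == centers:
            break
        centers = new_centers
    return centers, assign(points, centers)


def objective(centers, clusters):
    """Sum of squared distances from each point to its cluster center."""
    return sum(sq_dist(p, v) for v, c in zip(centers, clusters) for p in c)


points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
centers, clusters = k_means(points, k=2)
print(centers)
print(objective(centers, clusters))
```

Because the result depends on the initial centers, practical implementations usually run the algorithm several times and keep the run with the lowest objective value.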
K-Medoids Algorithm
• K-Medoids is a clustering algorithm resembling the K-Means
clustering technique. It is also an unsupervised technique.
It differs from the K-Means algorithm mainly in the way it
selects the clusters’ centres. K-Means takes the average of a
cluster’s points as its centre (which may or may not be one of
the data points), while K-Medoids always picks actual data
points from the clusters as their centres (also known as
‘exemplars’ or ‘medoids’). K-Medoids also differs in this
respect from the K-Medians algorithm, which is the same as
K-Means except that it uses the median of each cluster,
rather than the mean, as its centre.
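The difference between a mean centre and a medoid can be seen on a single cluster that contains an outlier. The sketch below is only an illustration of the two kinds of centre, not a full K-Medoids implementation; the function names are assumptions.

```python
import math


def mean_center(points):
    """Centre used by K-Means: the coordinate-wise average."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def medoid(points):
    """Centre used by K-Medoids: the data point with the smallest
    total distance to all points in the cluster."""
    return min(points, key=lambda c: sum(math.dist(c, p) for p in points))


cluster = [(1.0, 1.0), (2.0, 1.0), (1.0, 2.0), (2.0, 2.0), (20.0, 20.0)]
print(mean_center(cluster))  # pulled toward the outlier, not a data point
print(medoid(cluster))       # always one of the data points
```

The mean is dragged toward the outlier at (20, 20), while the medoid stays inside the dense group, which is why K-Medoids is usually described as more robust to outliers.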
2. Hierarchical Clustering
• Hierarchical clustering, also called hierarchical cluster
analysis (HCA), is an unsupervised clustering technique that
creates clusters with a predefined ordering from top
to bottom.
• It decomposes the data objects according to this
hierarchy to obtain the clusters.
• This clustering technique is divided into two types:
  • Agglomerative Hierarchical Clustering
  • Divisive Hierarchical Clustering
Agglomerative Approach
Agglomerative Hierarchical Clustering is the most common type of hierarchical clustering
used to group objects in clusters based on their similarity. It’s also known as AGNES
(Agglomerative Nesting). It's a “bottom-up” approach: each observation starts in its own
cluster, and pairs of clusters are merged as one moves up the hierarchy.
How does it work:
1. Make each data point a single-point cluster → forms N
clusters.
2. Take the two closest data points and make them one
cluster → forms N-1 clusters.
3. Take the two closest clusters and make them one cluster
→ forms N-2 clusters.
4. Repeat step 3 until you are left with only one cluster.
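The four steps above can be sketched directly in Python. This is a naive version that re-scans all cluster pairs at every step and assumes the distance between two clusters is the shortest point-to-point distance; the function names are illustrative.

```python
import math


def closest_points_distance(c1, c2):
    """Assumed cluster distance: shortest distance between any two points."""
    return min(math.dist(p, q) for p in c1 for q in c2)


def agglomerate(points, cluster_dist=closest_points_distance):
    """Return the partition after every merge, from N clusters down to 1."""
    clusters = [[p] for p in points]            # step 1: N single-point clusters
    levels = [[list(c) for c in clusters]]
    while len(clusters) > 1:                    # step 4: repeat until one cluster
        pairs = [(i, j) for i in range(len(clusters))
                 for j in range(i + 1, len(clusters))]
        # steps 2 and 3: find the two closest clusters and merge them
        i, j = min(pairs, key=lambda ij: cluster_dist(clusters[ij[0]], clusters[ij[1]]))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
        levels.append([list(c) for c in clusters])
    return levels


points = [(0.0,), (1.0,), (5.0,), (6.0,), (20.0,)]
for level in agglomerate(points):
    print(len(level), level)
```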
There are several ways to measure the distance between clusters in order to decide which
clusters to merge; they are called linkage methods. Some of the common linkage methods
are:
Complete-linkage: the distance between two clusters is defined as the longest distance
between a point in one cluster and a point in the other.
Single-linkage: the distance between two clusters is defined as the shortest distance
between a point in one cluster and a point in the other. This linkage can help detect
outliers in your dataset, because outlying points tend to be merged last.
Average-linkage: the distance between two clusters is defined as the average distance
between each point in one cluster and every point in the other cluster.
Centroid-linkage: the distance between two clusters is defined as the distance between
the centroid of cluster 1 and the centroid of cluster 2.
The choice of linkage method is up to you; no single method always gives good results,
and different linkage methods can lead to different clusters.
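As a small illustration, the four linkage definitions can be written as plain Python functions over two lists of points. The names are illustrative and Euclidean distance is assumed throughout.

```python
import math


def complete_linkage(c1, c2):
    """Longest distance between a point of c1 and a point of c2."""
    return max(math.dist(p, q) for p in c1 for q in c2)


def single_linkage(c1, c2):
    """Shortest distance between a point of c1 and a point of c2."""
    return min(math.dist(p, q) for p in c1 for q in c2)


def average_linkage(c1, c2):
    """Average distance over all pairs (one point from each cluster)."""
    return sum(math.dist(p, q) for p in c1 for q in c2) / (len(c1) * len(c2))


def centroid(cluster):
    n = len(cluster)
    return tuple(sum(coords) / n for coords in zip(*cluster))


def centroid_linkage(c1, c2):
    """Distance between the two cluster centroids."""
    return math.dist(centroid(c1), centroid(c2))


a = [(0.0, 0.0), (1.0, 0.0)]
b = [(4.0, 0.0), (6.0, 0.0)]
print(single_linkage(a, b))    # 3.0
print(complete_linkage(a, b))  # 6.0
print(average_linkage(a, b))   # 4.5
print(centroid_linkage(a, b))  # 4.5
```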
The point of all this is to show how hierarchical clustering works: it keeps a memory of
how the clusters were merged, and that memory is stored in a dendrogram.
What is a Dendrogram?
A dendrogram is a type of tree diagram showing hierarchical relationships between
different sets of data.
Because a dendrogram holds the memory of the hierarchical clustering algorithm, you can
tell how each cluster was formed just by looking at it.
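One way to picture this "memory" is as a list of merges, each with the distance (height) at which it happened. The sketch below records such a history with single linkage and then replays it up to a chosen height, which corresponds to cutting the dendrogram horizontally. The function names and the list-of-tuples representation are assumptions made for illustration, and the points are assumed to be distinct.

```python
import math


def single_linkage(c1, c2):
    return min(math.dist(p, q) for p in c1 for q in c2)


def merge_history(points):
    """Record every merge as (left_cluster, right_cluster, height)."""
    clusters = [[p] for p in points]
    history = []
    while len(clusters) > 1:
        pairs = [(i, j) for i in range(len(clusters))
                 for j in range(i + 1, len(clusters))]
        i, j = min(pairs, key=lambda ij: single_linkage(clusters[ij[0]], clusters[ij[1]]))
        left, right = clusters[i], clusters[j]
        history.append((left, right, single_linkage(left, right)))
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [left + right]
    return history


def cut(points, history, max_height):
    """Replay merges whose height does not exceed max_height.
    With single linkage the heights never decrease, so we can stop early."""
    clusters = [[p] for p in points]
    for left, right, height in history:
        if height > max_height:
            break
        clusters.remove(left)
        clusters.remove(right)
        clusters.append(left + right)
    return clusters


points = [(0.0,), (1.0,), (5.0,), (6.5,), (20.0,)]
history = merge_history(points)
print([height for _, _, height in history])
print(cut(points, history, max_height=2.0))
```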
Divisive Approach
Divisive clustering, or DIANA (DIvisive ANAlysis Clustering), is a top-down clustering
method: we assign all of the observations to a single cluster and then partition that
cluster into the two least similar clusters. We then proceed recursively on each cluster
until there is one cluster for each observation. This approach is therefore the exact
opposite of agglomerative clustering.
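A simplified sketch of the top-down idea follows. It splits a cluster by growing a "splinter" group, in the spirit of DIANA, and then recurses on both halves. Unlike the full DIANA procedure, it does not choose which cluster to split next by diameter; the function names and the nested-list output are assumptions for illustration.

```python
import math


def avg_dist(p, group):
    """Average distance from p to the points in group (0.0 if group is empty)."""
    return sum(math.dist(p, q) for q in group) / len(group) if group else 0.0


def split(cluster):
    """Split one cluster (two or more points) into a splinter group and the rest."""
    rest = list(cluster)
    # Seed the splinter group with the point that is, on average,
    # farthest from all the other points.
    seed = max(rest, key=lambda p: avg_dist(p, [q for q in rest if q is not p]))
    rest.remove(seed)
    splinter = [seed]
    moved = True
    while moved and len(rest) > 1:
        moved = False
        for p in list(rest):
            others = [q for q in rest if q is not p]
            # Move p if it is closer, on average, to the splinter group.
            if avg_dist(p, others) > avg_dist(p, splinter):
                rest.remove(p)
                splinter.append(p)
                moved = True
    return splinter, rest


def divisive(cluster):
    """Recursively split until every leaf is a single observation.
    Leaves are points (tuples); internal nodes are two-element lists."""
    if len(cluster) == 1:
        return cluster[0]
    left, right = split(cluster)
    return [divisive(left), divisive(right)]


points = [(0.0,), (1.0,), (2.0,), (10.0,), (11.0,)]
print(split(points))
print(divisive(points))
```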
3. Density-based Clustering
• Looking at the previous two families of methods, one
observes that both hierarchical and centroid-based
algorithms depend on a distance metric.
• The very definition of a cluster is based on this metric.
Density-based clustering methods instead define clusters in
terms of the density of points.
• A cluster is a dense region in the data space, separated
from other clusters by regions of lower object density, and
is defined as a maximal set of density-connected points.
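DBSCAN, listed among the density-based methods earlier, is a common way to turn this definition into an algorithm. The following is a minimal, unoptimized sketch of the usual formulation; the parameter names `eps` (neighbourhood radius) and `min_pts` (minimum neighbours for a dense point) follow common usage and are not taken from the slides.

```python
import math

NOISE = -1


def region_query(points, i, eps):
    """Indices of all points within eps of points[i] (including i itself)."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]


def dbscan(points, eps, min_pts):
    """Return one label per point: 0, 1, ... for clusters, -1 for noise."""
    labels = [None] * len(points)
    cluster_id = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE          # may later become a border point
            continue
        cluster_id += 1                # points[i] is a core point: new cluster
        labels[i] = cluster_id
        queue = list(neighbors)
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster_id  # border point: join, but do not expand
            if labels[j] is not None:
                continue
            labels[j] = cluster_id
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)  # j is also a core point: keep growing
    return labels


points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0), (11.0, 11.0),
          (50.0, 50.0)]
print(dbscan(points, eps=1.5, min_pts=3))
```

Unlike k-Means, the number of clusters is not specified in advance, and isolated points are reported as noise instead of being forced into a cluster.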
4. Graph-based Clustering
• Transform the data into a graph representation.
• Vertices are the data points to be clustered.
• Edges are weighted based on the similarity between data points.
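As one possible illustration, the sketch below follows the MST clustering idea listed in the taxonomy: treat the points as vertices of a complete graph, weight each edge by Euclidean distance (a small distance meaning high similarity), and run Kruskal's algorithm until only k connected components remain. That is equivalent to building the minimum spanning tree and removing its k-1 heaviest edges. The function name and the choice of edge weight are assumptions.

```python
import math


def mst_clusters(points, k):
    """Group points into k clusters by stopping Kruskal's algorithm early.
    Assumes 1 <= k <= len(points)."""
    n = len(points)
    parent = list(range(n))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Edges of the complete graph, lightest (most similar) first.
    edges = sorted((math.dist(points[i], points[j]), i, j)
                   for i in range(n) for j in range(i + 1, n))
    components = n
    for _, i, j in edges:
        if components == k:
            break
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            parent[root_i] = root_j
            components -= 1

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(points[i])
    return list(groups.values())


points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (30.0, 30.0)]
print(mst_clusters(points, k=3))
```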
5. Model-based Clustering
• Model-based clustering is a broad family of algorithms
designed for modelling an unknown distribution as a mixture
of simpler distributions, sometimes called basis distributions.
Mixture-model clustering methods are classified according to
the following four criteria:
  • Parametric and non-parametric models
  • Gaussian mixture models (GMMs)
  • Non-Bayesian and Bayesian methods
  • Mixtures of factor analysers (MFA)
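To show what fitting a mixture looks like in practice, here is a minimal sketch of the EM algorithm for a two-component Gaussian mixture on one-dimensional data. The initialization (means at the minimum and maximum, a shared variance, equal weights), the fixed iteration count, and the small variance floor are simplifying assumptions, not part of the slides.

```python
import math


def normal_pdf(x, mu, var):
    """Density of a normal distribution with mean mu and variance var."""
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)


def em_gmm_1d(data, n_iter=50):
    """Fit a two-component 1-D Gaussian mixture; returns (weights, means, variances).
    Assumes the data are not all identical."""
    overall_mean = sum(data) / len(data)
    overall_var = sum((x - overall_mean) ** 2 for x in data) / len(data)
    mu = [min(data), max(data)]      # assumed initialization
    var = [overall_var, overall_var]
    weight = [0.5, 0.5]
    for _ in range(n_iter):
        # E-step: responsibility of each component for each data point.
        resp = []
        for x in data:
            dens = [weight[k] * normal_pdf(x, mu[k], var[k]) for k in range(2)]
            total = sum(dens)
            resp.append([d / total for d in dens])
        # M-step: re-estimate weight, mean and variance of each component.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            weight[k] = nk / len(data)
            mu[k] = sum(r[k] * x for r, x in zip(resp, data)) / nk
            spread = sum(r[k] * (x - mu[k]) ** 2 for r, x in zip(resp, data)) / nk
            var[k] = max(spread, 1e-6)  # floor keeps the variance positive
    return weight, mu, var


data = [1.0, 1.2, 0.8, 1.1, 0.9, 5.0, 5.2, 4.8, 5.1, 4.9]
weights, means, variances = em_gmm_1d(data)
print(weights, means, variances)
```

Each point can then be assigned to the component with the higher responsibility, which gives a clustering; unlike k-Means, the responsibilities also express how uncertain each assignment is.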
Applications
• Pattern Recognition
• Spatial Data Analysis
• Image Processing
• Economic Science
• Crime Analysis
• Bioinformatics
• Medical Imaging
• Robotics
• Climatology