SlideShare a Scribd company logo
CLUSTERING:
 What is Clustering
 Clustering Techniques
 Partitioning methods
 Hierarchical methods
 Density-based methods
 Graph based methods
 Model based methods
 Application of Clustering
1
Clustering
 Clustering is a technique that groups similar objects such that
the objects in the same group are more similar to each other
than the objects in the other groups. The group of similar
objects is called a Cluster.
 Clustering helps to split data into several subsets. Each of
these clusters consists of data objects with high inter-similarity
and low intra-similarity.
2
Example
3
4
Clustering
Techniques
Partitioning
methods
Hierarchical
methods
Density-based
methods
Graph based
methods
Model based
clustering
• k-Means algorithm [1957, 1967]
• k-Medoids algorithm
• k-Modes [1998]
• Fuzzy c-means algorithm [1999]
Divisive
Agglomerative
methods
• STING [1997]
• DBSCAN [1996]
• CLIQUE [1998]
• DENCLUE [1998]
• OPTICS [1999]
• Wave Cluster [1998]
• MST Clustering [1999]
• OPOSSUM [2000]
• SNN Similarity Clustering [2001, 2003]
• EM Algorithm [1977]
• Auto class [1996]
• COBWEB [1987]
• ANN Clustering [1982, 1989]
• AGNES [1990]
• BIRCH [1996]
• CURE [1998]
• ROCK [1999]
• Chamelon [1999]
• DIANA [1990]
• PAM [1990]
• CLARA [1990]
• CLARANS [1994]
hniques:
Centroids-based
Clustering(partitioning Clustering)
CS 40003: Data Analytics 5
 Centroid based clustering is considered as one of the
most simplest clustering algorithms, yet the most
effective way of creating clusters and assigning data
points.
 These groups of clustering methods iteratively measure
the distance between the clusters and the characteristic
centroids using various distance metrics. These are
either of Euclidian distance, Manhattan Distance or
Minkowski Distance.
k-Means Algorithm
 k-Means is one of the most widely used and perhaps the
simplest unsupervised algorithms to solve the clustering
problems.
 Using this algorithm, we classify a given data set
through a certain number of predetermined clusters or
“k” clusters.
 Each cluster is assigned a designated cluster center and
they are placed as much as possible far away from each
other.
6
7
where,
||xi – vj|| is the distance between Xi and Vj.
Ci is the count of data in cluster.C is the number of cluster centroids.
Advantages:
. Can be applied to any form of data – as long as the data has numerical (continuous)
entities.
. Much faster than other algorithms.
. Easy to understand and interpret.
Drawbacks:
. Fails for non-linear data.
. This cannot work for Categorical data.
. Cannot handle outliers.
K-Medoids Algorithm
 Medoids is a clustering algorithm resembling the K-Means
clustering technique. It falls under the category of un
supervised technique.It majorly differs from the K-Means
algorithm in terms of the way it selects the clusters’ centres.
The former selects the average of a cluster’s points as its centre
(which may or may not be one of the data points) while the
latter always picks the actual data points from the clusters as
their centres (also known as ‘exemplars’ or ‘medoids’). K-
Medoids also differs in this respect from the K-Medians
algorithm whic,h is the same as K-means.
CS 40003: Data Analytics 8
2.Hierarchical Clustering
 It also called Hierarchical cluster analysis or HCA is
an unsupervised clustering algorithm which involves
creating clusters that have predefined ordering from top
to bottom.
 It then proceeds to perform a decomposition of the data
objects based on this hierarchy, hence obtaining the
clusters.
 This clustering technique is divided into two types:
 Agglomerative Hierarchical Clustering
 Divisive Hierarchical Clustering
9
Agglomerative Approach
10
Agglomerative Hierarchical Clustering is the most common type of hierarchical clustering
used to group objects in clusters based on their similarity. It’s also known as AGNES
(Agglomerative Nesting). It's a “bottom-up” approach: each observation starts in its own
cluster, and pairs of clusters are merged as one moves up the hierarchy.
Diagram:
How does it works:
1.Make each data point a single-point cluster → forms N
clusters
2.Take the two closest data points and make them one
cluster → forms N-1 clusters
3.Take the two closest clusters and make them one cluster
→ Forms N-2 clusters.
4.Repeat step-3 until you are left with only one cluster.
Have a look at the visual representation of Agglomerative
Hierarchical Clustering for better understanding:
11
12
There are several ways to measure the distance between clusters in order to decide the
rules for clustering, and they are often called Linkage Methods. Some of the common
linkage methods are:
Complete-linkage: the distance between two clusters is defined as the longest distance
between two points in each cluster.
Single-linkage: the distance between two clusters is defined as the shortest distance
between two points in each cluster. This linkage may be used to detect high values in
your dataset which may be outliers as they will be merged at the end.
Average-linkage: the distance between two clusters is defined as the average distance
between each point in one cluster to every point in the other cluster.
Centroid-linkage: finds the centroid of cluster 1 and centroid of cluster 2, and then
calculates the distance between the two before merging.
The choice of linkage method entirely depends on you and there is no hard and fast
method that will always give you good results. Different linkage methods lead to
different clusters.
The point of doing all this is to demonstrate the way hierarchical clustering works, it
maintains a memory of how we went through this process and that memory is stored
in Dendrogram.
13
What is a Dendrogram?
A Dendrogram is a type of tree diagram showing hierarchical relationships between
different sets of data.
As already said a Dendrogram contains the memory of hierarchical clustering algorithm, so
just by looking at the Dendrogram you can tell how the cluster is formed.
Devise approach:
14
In Divisive or DIANA(DIvisive ANAlysis Clustering) is a top-down clustering method
where we assign all of the observations to a single cluster and then partition the cluster to
two least similar clusters. Finally, we proceed recursively on each cluster until there is one
cluster for each observation. So this clustering approach is exactly opposite to
Agglomerative clustering.
3. Density-based Clustering
 If one looks into the previous two methods that we discussed, one
would observe that both hierarchical and centroid based
algorithms are dependent on a distance metric.
 The very definition of a cluster is based on this metric. Density-
based clustering methods take density into consideration instead
of distances.
 Clusters are considered as the densest region in a data space,
which is separated by regions of lower object density and it is
defined as a maximal-set of connected points.
15
4.Graph based Clustering
 Transform the data into a graph representation.
 Vertices are the data points to be clustered.
 Edges are weighted based on similarity between data.
16
5.Model based clustering
 Model-based clustering is a broad family of algorithms
designed for modelling an unknown distribution as a mixture
of simpler distributions, sometimes called basis distributions.
The classification of mixture model clustering is based on the
following four criteria.
 Parametric and non parametric model
 Gaussian mixture models (GMMs)
 non-Bayesian methods and Bayesian methods
 mixture of factor analysers (MFA).
17
Applications
 Pattern Recognition
 Spatial Data Analysis
 Image Processing
 Economic Science
 Crime Analysis
 Bio informatics
 Medical Imaging
 Robotics
 Climatology
18
CS 40003: Data Analytics 19

More Related Content

Similar to clustering ppt.pptx

Cancer data partitioning with data structure and difficulty independent clust...
Cancer data partitioning with data structure and difficulty independent clust...Cancer data partitioning with data structure and difficulty independent clust...
Cancer data partitioning with data structure and difficulty independent clust...
IRJET Journal
 
cluster.pptx
cluster.pptxcluster.pptx
cluster.pptx
KamranAli649587
 
A survey on Efficient Enhanced K-Means Clustering Algorithm
 A survey on Efficient Enhanced K-Means Clustering Algorithm A survey on Efficient Enhanced K-Means Clustering Algorithm
A survey on Efficient Enhanced K-Means Clustering Algorithm
ijsrd.com
 
Survey on Unsupervised Learning in Datamining
Survey on Unsupervised Learning in DataminingSurvey on Unsupervised Learning in Datamining
Survey on Unsupervised Learning in Datamining
IOSR Journals
 
Az36311316
Az36311316Az36311316
Az36311316
IJERA Editor
 
An Efficient Clustering Method for Aggregation on Data Fragments
An Efficient Clustering Method for Aggregation on Data FragmentsAn Efficient Clustering Method for Aggregation on Data Fragments
An Efficient Clustering Method for Aggregation on Data Fragments
IJMER
 
iiit delhi unsupervised pdf.pdf
iiit delhi unsupervised pdf.pdfiiit delhi unsupervised pdf.pdf
iiit delhi unsupervised pdf.pdf
VIKASGUPTA127897
 
UNIT - 4: Data Warehousing and Data Mining
UNIT - 4: Data Warehousing and Data MiningUNIT - 4: Data Warehousing and Data Mining
UNIT - 4: Data Warehousing and Data Mining
Nandakumar P
 
Capter10 cluster basic : Han & Kamber
Capter10 cluster basic : Han & KamberCapter10 cluster basic : Han & Kamber
Capter10 cluster basic : Han & Kamber
Houw Liong The
 
Capter10 cluster basic
Capter10 cluster basicCapter10 cluster basic
Capter10 cluster basic
Houw Liong The
 
CLUSTERING
CLUSTERINGCLUSTERING
CLUSTERING
Aman Jatain
 
data mining cocepts and techniques chapter
data mining cocepts and techniques chapterdata mining cocepts and techniques chapter
data mining cocepts and techniques chapter
NaveenKumar5162
 
data mining cocepts and techniques chapter
data mining cocepts and techniques chapterdata mining cocepts and techniques chapter
data mining cocepts and techniques chapter
NaveenKumar5162
 
10 clusbasic
10 clusbasic10 clusbasic
10 clusbasic
engrasi
 
10 clusbasic
10 clusbasic10 clusbasic
10 clusbasic
JoonyoungJayGwak
 
Chapter 10. Cluster Analysis Basic Concepts and Methods.ppt
Chapter 10. Cluster Analysis Basic Concepts and Methods.pptChapter 10. Cluster Analysis Basic Concepts and Methods.ppt
Chapter 10. Cluster Analysis Basic Concepts and Methods.ppt
Subrata Kumer Paul
 
[ML]-Unsupervised-learning_Unit2.ppt.pdf
[ML]-Unsupervised-learning_Unit2.ppt.pdf[ML]-Unsupervised-learning_Unit2.ppt.pdf
[ML]-Unsupervised-learning_Unit2.ppt.pdf
4NM20IS025BHUSHANNAY
 
Data mining concepts and techniques Chapter 10
Data mining concepts and techniques Chapter 10Data mining concepts and techniques Chapter 10
Data mining concepts and techniques Chapter 10
mqasimsheikh5
 
Literature Survey On Clustering Techniques
Literature Survey On Clustering TechniquesLiterature Survey On Clustering Techniques
Literature Survey On Clustering Techniques
IOSR Journals
 
Dp33701704
Dp33701704Dp33701704
Dp33701704
IJERA Editor
 

Similar to clustering ppt.pptx (20)

Cancer data partitioning with data structure and difficulty independent clust...
Cancer data partitioning with data structure and difficulty independent clust...Cancer data partitioning with data structure and difficulty independent clust...
Cancer data partitioning with data structure and difficulty independent clust...
 
cluster.pptx
cluster.pptxcluster.pptx
cluster.pptx
 
A survey on Efficient Enhanced K-Means Clustering Algorithm
 A survey on Efficient Enhanced K-Means Clustering Algorithm A survey on Efficient Enhanced K-Means Clustering Algorithm
A survey on Efficient Enhanced K-Means Clustering Algorithm
 
Survey on Unsupervised Learning in Datamining
Survey on Unsupervised Learning in DataminingSurvey on Unsupervised Learning in Datamining
Survey on Unsupervised Learning in Datamining
 
Az36311316
Az36311316Az36311316
Az36311316
 
An Efficient Clustering Method for Aggregation on Data Fragments
An Efficient Clustering Method for Aggregation on Data FragmentsAn Efficient Clustering Method for Aggregation on Data Fragments
An Efficient Clustering Method for Aggregation on Data Fragments
 
iiit delhi unsupervised pdf.pdf
iiit delhi unsupervised pdf.pdfiiit delhi unsupervised pdf.pdf
iiit delhi unsupervised pdf.pdf
 
UNIT - 4: Data Warehousing and Data Mining
UNIT - 4: Data Warehousing and Data MiningUNIT - 4: Data Warehousing and Data Mining
UNIT - 4: Data Warehousing and Data Mining
 
Capter10 cluster basic : Han & Kamber
Capter10 cluster basic : Han & KamberCapter10 cluster basic : Han & Kamber
Capter10 cluster basic : Han & Kamber
 
Capter10 cluster basic
Capter10 cluster basicCapter10 cluster basic
Capter10 cluster basic
 
CLUSTERING
CLUSTERINGCLUSTERING
CLUSTERING
 
data mining cocepts and techniques chapter
data mining cocepts and techniques chapterdata mining cocepts and techniques chapter
data mining cocepts and techniques chapter
 
data mining cocepts and techniques chapter
data mining cocepts and techniques chapterdata mining cocepts and techniques chapter
data mining cocepts and techniques chapter
 
10 clusbasic
10 clusbasic10 clusbasic
10 clusbasic
 
10 clusbasic
10 clusbasic10 clusbasic
10 clusbasic
 
Chapter 10. Cluster Analysis Basic Concepts and Methods.ppt
Chapter 10. Cluster Analysis Basic Concepts and Methods.pptChapter 10. Cluster Analysis Basic Concepts and Methods.ppt
Chapter 10. Cluster Analysis Basic Concepts and Methods.ppt
 
[ML]-Unsupervised-learning_Unit2.ppt.pdf
[ML]-Unsupervised-learning_Unit2.ppt.pdf[ML]-Unsupervised-learning_Unit2.ppt.pdf
[ML]-Unsupervised-learning_Unit2.ppt.pdf
 
Data mining concepts and techniques Chapter 10
Data mining concepts and techniques Chapter 10Data mining concepts and techniques Chapter 10
Data mining concepts and techniques Chapter 10
 
Literature Survey On Clustering Techniques
Literature Survey On Clustering TechniquesLiterature Survey On Clustering Techniques
Literature Survey On Clustering Techniques
 
Dp33701704
Dp33701704Dp33701704
Dp33701704
 

More from chmeghana1

NDMA - Administration.pptx NDMA Presentation
NDMA - Administration.pptx NDMA PresentationNDMA - Administration.pptx NDMA Presentation
NDMA - Administration.pptx NDMA Presentation
chmeghana1
 
ABM910 4. E-tailing and Understanding food preference of Indian Consumer.pptx
ABM910 4. E-tailing and Understanding food preference of Indian Consumer.pptxABM910 4. E-tailing and Understanding food preference of Indian Consumer.pptx
ABM910 4. E-tailing and Understanding food preference of Indian Consumer.pptx
chmeghana1
 
MTECH control charts in operation management.pptx
MTECH control charts in operation management.pptxMTECH control charts in operation management.pptx
MTECH control charts in operation management.pptx
chmeghana1
 
ABM-910 6. Demographic and Psychographic factors affecting Food Pattern of In...
ABM-910 6. Demographic and Psychographic factors affecting Food Pattern of In...ABM-910 6. Demographic and Psychographic factors affecting Food Pattern of In...
ABM-910 6. Demographic and Psychographic factors affecting Food Pattern of In...
chmeghana1
 
ABM908 8. Hazard Analysis Critical Control Points (HACCP) (1).pptx
ABM908 8. Hazard Analysis Critical Control Points (HACCP) (1).pptxABM908 8. Hazard Analysis Critical Control Points (HACCP) (1).pptx
ABM908 8. Hazard Analysis Critical Control Points (HACCP) (1).pptx
chmeghana1
 
ABM910 8. Principal trends in food wholesaling and retailing (2).pptx
ABM910 8. Principal trends in food wholesaling and retailing (2).pptxABM910 8. Principal trends in food wholesaling and retailing (2).pptx
ABM910 8. Principal trends in food wholesaling and retailing (2).pptx
chmeghana1
 
Faculty Presentation Sample ppt.pptx presentation
Faculty Presentation Sample ppt.pptx presentationFaculty Presentation Sample ppt.pptx presentation
Faculty Presentation Sample ppt.pptx presentation
chmeghana1
 
Flipped Classroom explanation with imp.pptx
Flipped Classroom explanation with imp.pptxFlipped Classroom explanation with imp.pptx
Flipped Classroom explanation with imp.pptx
chmeghana1
 
ABM908 8. Hazard Analysis Critical Control Points (HACCP) (1).pptx
ABM908 8. Hazard Analysis Critical Control Points (HACCP) (1).pptxABM908 8. Hazard Analysis Critical Control Points (HACCP) (1).pptx
ABM908 8. Hazard Analysis Critical Control Points (HACCP) (1).pptx
chmeghana1
 
_ ABM910 10. Value chain and value additions across the chain in food retail ...
_ ABM910 10. Value chain and value additions across the chain in food retail ..._ ABM910 10. Value chain and value additions across the chain in food retail ...
_ ABM910 10. Value chain and value additions across the chain in food retail ...
chmeghana1
 
ABM910 11. Food service marketing (1).pptx
ABM910 11. Food service marketing (1).pptxABM910 11. Food service marketing (1).pptx
ABM910 11. Food service marketing (1).pptx
chmeghana1
 
Sheri Priyanka ( 19mgt8ab278).pptx mushroom
Sheri Priyanka ( 19mgt8ab278).pptx mushroomSheri Priyanka ( 19mgt8ab278).pptx mushroom
Sheri Priyanka ( 19mgt8ab278).pptx mushroom
chmeghana1
 
sandeepnallaraweppt-200809162111.pptxmush
sandeepnallaraweppt-200809162111.pptxmushsandeepnallaraweppt-200809162111.pptxmush
sandeepnallaraweppt-200809162111.pptxmush
chmeghana1
 
Signals and Devices Presentation related to IOT
Signals and Devices Presentation related to IOTSignals and Devices Presentation related to IOT
Signals and Devices Presentation related to IOT
chmeghana1
 
ABM908 5. Quality grades and standards.pptx
ABM908 5. Quality grades and standards.pptxABM908 5. Quality grades and standards.pptx
ABM908 5. Quality grades and standards.pptx
chmeghana1
 
DSAConclave Presentation based on introduction
DSAConclave Presentation based on introductionDSAConclave Presentation based on introduction
DSAConclave Presentation based on introduction
chmeghana1
 
Asynchronous Data Transfers and Convolution.pptx
Asynchronous  Data Transfers and Convolution.pptxAsynchronous  Data Transfers and Convolution.pptx
Asynchronous Data Transfers and Convolution.pptx
chmeghana1
 
ABM910 2.2 Retail management and Food Retailing.pptx
ABM910 2.2 Retail management and Food Retailing.pptxABM910 2.2 Retail management and Food Retailing.pptx
ABM910 2.2 Retail management and Food Retailing.pptx
chmeghana1
 
ABM908 7. Food processing, food quality standards and world food trade.pptx
ABM908 7. Food processing, food quality standards and  world food trade.pptxABM908 7. Food processing, food quality standards and  world food trade.pptx
ABM908 7. Food processing, food quality standards and world food trade.pptx
chmeghana1
 
0329.emccormi.ppt
0329.emccormi.ppt0329.emccormi.ppt
0329.emccormi.ppt
chmeghana1
 

More from chmeghana1 (20)

NDMA - Administration.pptx NDMA Presentation
NDMA - Administration.pptx NDMA PresentationNDMA - Administration.pptx NDMA Presentation
NDMA - Administration.pptx NDMA Presentation
 
ABM910 4. E-tailing and Understanding food preference of Indian Consumer.pptx
ABM910 4. E-tailing and Understanding food preference of Indian Consumer.pptxABM910 4. E-tailing and Understanding food preference of Indian Consumer.pptx
ABM910 4. E-tailing and Understanding food preference of Indian Consumer.pptx
 
MTECH control charts in operation management.pptx
MTECH control charts in operation management.pptxMTECH control charts in operation management.pptx
MTECH control charts in operation management.pptx
 
ABM-910 6. Demographic and Psychographic factors affecting Food Pattern of In...
ABM-910 6. Demographic and Psychographic factors affecting Food Pattern of In...ABM-910 6. Demographic and Psychographic factors affecting Food Pattern of In...
ABM-910 6. Demographic and Psychographic factors affecting Food Pattern of In...
 
ABM908 8. Hazard Analysis Critical Control Points (HACCP) (1).pptx
ABM908 8. Hazard Analysis Critical Control Points (HACCP) (1).pptxABM908 8. Hazard Analysis Critical Control Points (HACCP) (1).pptx
ABM908 8. Hazard Analysis Critical Control Points (HACCP) (1).pptx
 
ABM910 8. Principal trends in food wholesaling and retailing (2).pptx
ABM910 8. Principal trends in food wholesaling and retailing (2).pptxABM910 8. Principal trends in food wholesaling and retailing (2).pptx
ABM910 8. Principal trends in food wholesaling and retailing (2).pptx
 
Faculty Presentation Sample ppt.pptx presentation
Faculty Presentation Sample ppt.pptx presentationFaculty Presentation Sample ppt.pptx presentation
Faculty Presentation Sample ppt.pptx presentation
 
Flipped Classroom explanation with imp.pptx
Flipped Classroom explanation with imp.pptxFlipped Classroom explanation with imp.pptx
Flipped Classroom explanation with imp.pptx
 
ABM908 8. Hazard Analysis Critical Control Points (HACCP) (1).pptx
ABM908 8. Hazard Analysis Critical Control Points (HACCP) (1).pptxABM908 8. Hazard Analysis Critical Control Points (HACCP) (1).pptx
ABM908 8. Hazard Analysis Critical Control Points (HACCP) (1).pptx
 
_ ABM910 10. Value chain and value additions across the chain in food retail ...
_ ABM910 10. Value chain and value additions across the chain in food retail ..._ ABM910 10. Value chain and value additions across the chain in food retail ...
_ ABM910 10. Value chain and value additions across the chain in food retail ...
 
ABM910 11. Food service marketing (1).pptx
ABM910 11. Food service marketing (1).pptxABM910 11. Food service marketing (1).pptx
ABM910 11. Food service marketing (1).pptx
 
Sheri Priyanka ( 19mgt8ab278).pptx mushroom
Sheri Priyanka ( 19mgt8ab278).pptx mushroomSheri Priyanka ( 19mgt8ab278).pptx mushroom
Sheri Priyanka ( 19mgt8ab278).pptx mushroom
 
sandeepnallaraweppt-200809162111.pptxmush
sandeepnallaraweppt-200809162111.pptxmushsandeepnallaraweppt-200809162111.pptxmush
sandeepnallaraweppt-200809162111.pptxmush
 
Signals and Devices Presentation related to IOT
Signals and Devices Presentation related to IOTSignals and Devices Presentation related to IOT
Signals and Devices Presentation related to IOT
 
ABM908 5. Quality grades and standards.pptx
ABM908 5. Quality grades and standards.pptxABM908 5. Quality grades and standards.pptx
ABM908 5. Quality grades and standards.pptx
 
DSAConclave Presentation based on introduction
DSAConclave Presentation based on introductionDSAConclave Presentation based on introduction
DSAConclave Presentation based on introduction
 
Asynchronous Data Transfers and Convolution.pptx
Asynchronous  Data Transfers and Convolution.pptxAsynchronous  Data Transfers and Convolution.pptx
Asynchronous Data Transfers and Convolution.pptx
 
ABM910 2.2 Retail management and Food Retailing.pptx
ABM910 2.2 Retail management and Food Retailing.pptxABM910 2.2 Retail management and Food Retailing.pptx
ABM910 2.2 Retail management and Food Retailing.pptx
 
ABM908 7. Food processing, food quality standards and world food trade.pptx
ABM908 7. Food processing, food quality standards and  world food trade.pptxABM908 7. Food processing, food quality standards and  world food trade.pptx
ABM908 7. Food processing, food quality standards and world food trade.pptx
 
0329.emccormi.ppt
0329.emccormi.ppt0329.emccormi.ppt
0329.emccormi.ppt
 

Recently uploaded

CSM Cloud Service Management Presentarion
CSM Cloud Service Management PresentarionCSM Cloud Service Management Presentarion
CSM Cloud Service Management Presentarion
rpskprasana
 
basic-wireline-operations-course-mahmoud-f-radwan.pdf
basic-wireline-operations-course-mahmoud-f-radwan.pdfbasic-wireline-operations-course-mahmoud-f-radwan.pdf
basic-wireline-operations-course-mahmoud-f-radwan.pdf
NidhalKahouli2
 
Comparative analysis between traditional aquaponics and reconstructed aquapon...
Comparative analysis between traditional aquaponics and reconstructed aquapon...Comparative analysis between traditional aquaponics and reconstructed aquapon...
Comparative analysis between traditional aquaponics and reconstructed aquapon...
bijceesjournal
 
Unit-III-ELECTROCHEMICAL STORAGE DEVICES.ppt
Unit-III-ELECTROCHEMICAL STORAGE DEVICES.pptUnit-III-ELECTROCHEMICAL STORAGE DEVICES.ppt
Unit-III-ELECTROCHEMICAL STORAGE DEVICES.ppt
KrishnaveniKrishnara1
 
Electric vehicle and photovoltaic advanced roles in enhancing the financial p...
Electric vehicle and photovoltaic advanced roles in enhancing the financial p...Electric vehicle and photovoltaic advanced roles in enhancing the financial p...
Electric vehicle and photovoltaic advanced roles in enhancing the financial p...
IJECEIAES
 
Question paper of renewable energy sources
Question paper of renewable energy sourcesQuestion paper of renewable energy sources
Question paper of renewable energy sources
mahammadsalmanmech
 
Recycled Concrete Aggregate in Construction Part II
Recycled Concrete Aggregate in Construction Part IIRecycled Concrete Aggregate in Construction Part II
Recycled Concrete Aggregate in Construction Part II
Aditya Rajan Patra
 
Redefining brain tumor segmentation: a cutting-edge convolutional neural netw...
Redefining brain tumor segmentation: a cutting-edge convolutional neural netw...Redefining brain tumor segmentation: a cutting-edge convolutional neural netw...
Redefining brain tumor segmentation: a cutting-edge convolutional neural netw...
IJECEIAES
 
ML Based Model for NIDS MSc Updated Presentation.v2.pptx
ML Based Model for NIDS MSc Updated Presentation.v2.pptxML Based Model for NIDS MSc Updated Presentation.v2.pptx
ML Based Model for NIDS MSc Updated Presentation.v2.pptx
JamalHussainArman
 
CHINA’S GEO-ECONOMIC OUTREACH IN CENTRAL ASIAN COUNTRIES AND FUTURE PROSPECT
CHINA’S GEO-ECONOMIC OUTREACH IN CENTRAL ASIAN COUNTRIES AND FUTURE PROSPECTCHINA’S GEO-ECONOMIC OUTREACH IN CENTRAL ASIAN COUNTRIES AND FUTURE PROSPECT
CHINA’S GEO-ECONOMIC OUTREACH IN CENTRAL ASIAN COUNTRIES AND FUTURE PROSPECT
jpsjournal1
 
22CYT12-Unit-V-E Waste and its Management.ppt
22CYT12-Unit-V-E Waste and its Management.ppt22CYT12-Unit-V-E Waste and its Management.ppt
22CYT12-Unit-V-E Waste and its Management.ppt
KrishnaveniKrishnara1
 
Properties Railway Sleepers and Test.pptx
Properties Railway Sleepers and Test.pptxProperties Railway Sleepers and Test.pptx
Properties Railway Sleepers and Test.pptx
MDSABBIROJJAMANPAYEL
 
Literature Review Basics and Understanding Reference Management.pptx
Literature Review Basics and Understanding Reference Management.pptxLiterature Review Basics and Understanding Reference Management.pptx
Literature Review Basics and Understanding Reference Management.pptx
Dr Ramhari Poudyal
 
Recycled Concrete Aggregate in Construction Part III
Recycled Concrete Aggregate in Construction Part IIIRecycled Concrete Aggregate in Construction Part III
Recycled Concrete Aggregate in Construction Part III
Aditya Rajan Patra
 
Modelagem de um CSTR com reação endotermica.pdf
Modelagem de um CSTR com reação endotermica.pdfModelagem de um CSTR com reação endotermica.pdf
Modelagem de um CSTR com reação endotermica.pdf
camseq
 
5214-1693458878915-Unit 6 2023 to 2024 academic year assignment (AutoRecovere...
5214-1693458878915-Unit 6 2023 to 2024 academic year assignment (AutoRecovere...5214-1693458878915-Unit 6 2023 to 2024 academic year assignment (AutoRecovere...
5214-1693458878915-Unit 6 2023 to 2024 academic year assignment (AutoRecovere...
ihlasbinance2003
 
Optimizing Gradle Builds - Gradle DPE Tour Berlin 2024
Optimizing Gradle Builds - Gradle DPE Tour Berlin 2024Optimizing Gradle Builds - Gradle DPE Tour Berlin 2024
Optimizing Gradle Builds - Gradle DPE Tour Berlin 2024
Sinan KOZAK
 
Eric Nizeyimana's document 2006 from gicumbi to ttc nyamata handball play
Eric Nizeyimana's document 2006 from gicumbi to ttc nyamata handball playEric Nizeyimana's document 2006 from gicumbi to ttc nyamata handball play
Eric Nizeyimana's document 2006 from gicumbi to ttc nyamata handball play
enizeyimana36
 
Iron and Steel Technology Roadmap - Towards more sustainable steelmaking.pdf
Iron and Steel Technology Roadmap - Towards more sustainable steelmaking.pdfIron and Steel Technology Roadmap - Towards more sustainable steelmaking.pdf
Iron and Steel Technology Roadmap - Towards more sustainable steelmaking.pdf
RadiNasr
 
学校原版美国波士顿大学毕业证学历学位证书原版一模一样
学校原版美国波士顿大学毕业证学历学位证书原版一模一样学校原版美国波士顿大学毕业证学历学位证书原版一模一样
学校原版美国波士顿大学毕业证学历学位证书原版一模一样
171ticu
 

Recently uploaded (20)

CSM Cloud Service Management Presentarion
CSM Cloud Service Management PresentarionCSM Cloud Service Management Presentarion
CSM Cloud Service Management Presentarion
 
basic-wireline-operations-course-mahmoud-f-radwan.pdf
basic-wireline-operations-course-mahmoud-f-radwan.pdfbasic-wireline-operations-course-mahmoud-f-radwan.pdf
basic-wireline-operations-course-mahmoud-f-radwan.pdf
 
Comparative analysis between traditional aquaponics and reconstructed aquapon...
Comparative analysis between traditional aquaponics and reconstructed aquapon...Comparative analysis between traditional aquaponics and reconstructed aquapon...
Comparative analysis between traditional aquaponics and reconstructed aquapon...
 
Unit-III-ELECTROCHEMICAL STORAGE DEVICES.ppt
Unit-III-ELECTROCHEMICAL STORAGE DEVICES.pptUnit-III-ELECTROCHEMICAL STORAGE DEVICES.ppt
Unit-III-ELECTROCHEMICAL STORAGE DEVICES.ppt
 
Electric vehicle and photovoltaic advanced roles in enhancing the financial p...
Electric vehicle and photovoltaic advanced roles in enhancing the financial p...Electric vehicle and photovoltaic advanced roles in enhancing the financial p...
Electric vehicle and photovoltaic advanced roles in enhancing the financial p...
 
Question paper of renewable energy sources
Question paper of renewable energy sourcesQuestion paper of renewable energy sources
Question paper of renewable energy sources
 
Recycled Concrete Aggregate in Construction Part II
Recycled Concrete Aggregate in Construction Part IIRecycled Concrete Aggregate in Construction Part II
Recycled Concrete Aggregate in Construction Part II
 
Redefining brain tumor segmentation: a cutting-edge convolutional neural netw...
Redefining brain tumor segmentation: a cutting-edge convolutional neural netw...Redefining brain tumor segmentation: a cutting-edge convolutional neural netw...
Redefining brain tumor segmentation: a cutting-edge convolutional neural netw...
 
ML Based Model for NIDS MSc Updated Presentation.v2.pptx
ML Based Model for NIDS MSc Updated Presentation.v2.pptxML Based Model for NIDS MSc Updated Presentation.v2.pptx
ML Based Model for NIDS MSc Updated Presentation.v2.pptx
 
CHINA’S GEO-ECONOMIC OUTREACH IN CENTRAL ASIAN COUNTRIES AND FUTURE PROSPECT
CHINA’S GEO-ECONOMIC OUTREACH IN CENTRAL ASIAN COUNTRIES AND FUTURE PROSPECTCHINA’S GEO-ECONOMIC OUTREACH IN CENTRAL ASIAN COUNTRIES AND FUTURE PROSPECT
CHINA’S GEO-ECONOMIC OUTREACH IN CENTRAL ASIAN COUNTRIES AND FUTURE PROSPECT
 
22CYT12-Unit-V-E Waste and its Management.ppt
22CYT12-Unit-V-E Waste and its Management.ppt22CYT12-Unit-V-E Waste and its Management.ppt
22CYT12-Unit-V-E Waste and its Management.ppt
 
Properties Railway Sleepers and Test.pptx
Properties Railway Sleepers and Test.pptxProperties Railway Sleepers and Test.pptx
Properties Railway Sleepers and Test.pptx
 
Literature Review Basics and Understanding Reference Management.pptx
Literature Review Basics and Understanding Reference Management.pptxLiterature Review Basics and Understanding Reference Management.pptx
Literature Review Basics and Understanding Reference Management.pptx
 
Recycled Concrete Aggregate in Construction Part III
Recycled Concrete Aggregate in Construction Part IIIRecycled Concrete Aggregate in Construction Part III
Recycled Concrete Aggregate in Construction Part III
 
Modelagem de um CSTR com reação endotermica.pdf
Modelagem de um CSTR com reação endotermica.pdfModelagem de um CSTR com reação endotermica.pdf
Modelagem de um CSTR com reação endotermica.pdf
 
5214-1693458878915-Unit 6 2023 to 2024 academic year assignment (AutoRecovere...
5214-1693458878915-Unit 6 2023 to 2024 academic year assignment (AutoRecovere...5214-1693458878915-Unit 6 2023 to 2024 academic year assignment (AutoRecovere...
5214-1693458878915-Unit 6 2023 to 2024 academic year assignment (AutoRecovere...
 
Optimizing Gradle Builds - Gradle DPE Tour Berlin 2024
Optimizing Gradle Builds - Gradle DPE Tour Berlin 2024Optimizing Gradle Builds - Gradle DPE Tour Berlin 2024
Optimizing Gradle Builds - Gradle DPE Tour Berlin 2024
 
Eric Nizeyimana's document 2006 from gicumbi to ttc nyamata handball play
Eric Nizeyimana's document 2006 from gicumbi to ttc nyamata handball playEric Nizeyimana's document 2006 from gicumbi to ttc nyamata handball play
Eric Nizeyimana's document 2006 from gicumbi to ttc nyamata handball play
 
Iron and Steel Technology Roadmap - Towards more sustainable steelmaking.pdf
Iron and Steel Technology Roadmap - Towards more sustainable steelmaking.pdfIron and Steel Technology Roadmap - Towards more sustainable steelmaking.pdf
Iron and Steel Technology Roadmap - Towards more sustainable steelmaking.pdf
 
学校原版美国波士顿大学毕业证学历学位证书原版一模一样
学校原版美国波士顿大学毕业证学历学位证书原版一模一样学校原版美国波士顿大学毕业证学历学位证书原版一模一样
学校原版美国波士顿大学毕业证学历学位证书原版一模一样
 

clustering ppt.pptx

  • 1. CLUSTERING:  What is Clustering  Clustering Techniques  Partitioning methods  Hierarchical methods  Density-based methods  Graph based methods  Model based methods  Application of Clustering 1
  • 2. Clustering  Clustering is a technique that groups similar objects such that the objects in the same group are more similar to each other than the objects in the other groups. The group of similar objects is called a Cluster.  Clustering helps to split data into several subsets. Each of these clusters consists of data objects with high inter-similarity and low intra-similarity. 2
  • 4. 4 Clustering Techniques Partitioning methods Hierarchical methods Density-based methods Graph based methods Model based clustering • k-Means algorithm [1957, 1967] • k-Medoids algorithm • k-Modes [1998] • Fuzzy c-means algorithm [1999] Divisive Agglomerative methods • STING [1997] • DBSCAN [1996] • CLIQUE [1998] • DENCLUE [1998] • OPTICS [1999] • Wave Cluster [1998] • MST Clustering [1999] • OPOSSUM [2000] • SNN Similarity Clustering [2001, 2003] • EM Algorithm [1977] • Auto class [1996] • COBWEB [1987] • ANN Clustering [1982, 1989] • AGNES [1990] • BIRCH [1996] • CURE [1998] • ROCK [1999] • Chamelon [1999] • DIANA [1990] • PAM [1990] • CLARA [1990] • CLARANS [1994] hniques:
  • 5. Centroids-based Clustering(partitioning Clustering) CS 40003: Data Analytics 5  Centroid based clustering is considered as one of the most simplest clustering algorithms, yet the most effective way of creating clusters and assigning data points.  These groups of clustering methods iteratively measure the distance between the clusters and the characteristic centroids using various distance metrics. These are either of Euclidian distance, Manhattan Distance or Minkowski Distance.
  • 6. k-Means Algorithm  k-Means is one of the most widely used and perhaps the simplest unsupervised algorithms to solve the clustering problems.  Using this algorithm, we classify a given data set through a certain number of predetermined clusters or “k” clusters.  Each cluster is assigned a designated cluster center and they are placed as much as possible far away from each other. 6
  • 7. 7 where, ||xi – vj|| is the distance between Xi and Vj. Ci is the count of data in cluster.C is the number of cluster centroids. Advantages: . Can be applied to any form of data – as long as the data has numerical (continuous) entities. . Much faster than other algorithms. . Easy to understand and interpret. Drawbacks: . Fails for non-linear data. . This cannot work for Categorical data. . Cannot handle outliers.
  • 8. K-Medoids Algorithm  Medoids is a clustering algorithm resembling the K-Means clustering technique. It falls under the category of un supervised technique.It majorly differs from the K-Means algorithm in terms of the way it selects the clusters’ centres. The former selects the average of a cluster’s points as its centre (which may or may not be one of the data points) while the latter always picks the actual data points from the clusters as their centres (also known as ‘exemplars’ or ‘medoids’). K- Medoids also differs in this respect from the K-Medians algorithm whic,h is the same as K-means. CS 40003: Data Analytics 8
  • 9. 2.Hierarchical Clustering  It also called Hierarchical cluster analysis or HCA is an unsupervised clustering algorithm which involves creating clusters that have predefined ordering from top to bottom.  It then proceeds to perform a decomposition of the data objects based on this hierarchy, hence obtaining the clusters.  This clustering technique is divided into two types:  Agglomerative Hierarchical Clustering  Divisive Hierarchical Clustering 9
  • 10. Agglomerative Approach 10 Agglomerative Hierarchical Clustering is the most common type of hierarchical clustering used to group objects in clusters based on their similarity. It’s also known as AGNES (Agglomerative Nesting). It's a “bottom-up” approach: each observation starts in its own cluster, and pairs of clusters are merged as one moves up the hierarchy. Diagram:
  • 11. How does it works: 1.Make each data point a single-point cluster → forms N clusters 2.Take the two closest data points and make them one cluster → forms N-1 clusters 3.Take the two closest clusters and make them one cluster → Forms N-2 clusters. 4.Repeat step-3 until you are left with only one cluster. Have a look at the visual representation of Agglomerative Hierarchical Clustering for better understanding: 11
  • 12. 12 There are several ways to measure the distance between clusters in order to decide the rules for clustering, and they are often called Linkage Methods. Some of the common linkage methods are: Complete-linkage: the distance between two clusters is defined as the longest distance between two points in each cluster. Single-linkage: the distance between two clusters is defined as the shortest distance between two points in each cluster. This linkage may be used to detect high values in your dataset which may be outliers as they will be merged at the end. Average-linkage: the distance between two clusters is defined as the average distance between each point in one cluster to every point in the other cluster. Centroid-linkage: finds the centroid of cluster 1 and centroid of cluster 2, and then calculates the distance between the two before merging. The choice of linkage method entirely depends on you and there is no hard and fast method that will always give you good results. Different linkage methods lead to different clusters. The point of doing all this is to demonstrate the way hierarchical clustering works, it maintains a memory of how we went through this process and that memory is stored in Dendrogram.
  • 13. 13 What is a Dendrogram? A Dendrogram is a type of tree diagram showing hierarchical relationships between different sets of data. As already said a Dendrogram contains the memory of hierarchical clustering algorithm, so just by looking at the Dendrogram you can tell how the cluster is formed.
  • 14. Devise approach: 14 In Divisive or DIANA(DIvisive ANAlysis Clustering) is a top-down clustering method where we assign all of the observations to a single cluster and then partition the cluster to two least similar clusters. Finally, we proceed recursively on each cluster until there is one cluster for each observation. So this clustering approach is exactly opposite to Agglomerative clustering.
  • 15. 3. Density-based Clustering  If one looks into the previous two methods that we discussed, one would observe that both hierarchical and centroid based algorithms are dependent on a distance metric.  The very definition of a cluster is based on this metric. Density- based clustering methods take density into consideration instead of distances.  Clusters are considered as the densest region in a data space, which is separated by regions of lower object density and it is defined as a maximal-set of connected points. 15
  • 16. 4.Graph based Clustering  Transform the data into a graph representation.  Vertices are the data points to be clustered.  Edges are weighted based on similarity between data. 16
  • 17. 5.Model based clustering  Model-based clustering is a broad family of algorithms designed for modelling an unknown distribution as a mixture of simpler distributions, sometimes called basis distributions. The classification of mixture model clustering is based on the following four criteria.  Parametric and non parametric model  Gaussian mixture models (GMMs)  non-Bayesian methods and Bayesian methods  mixture of factor analysers (MFA). 17
  • 18. Applications  Pattern Recognition  Spatial Data Analysis  Image Processing  Economic Science  Crime Analysis  Bio informatics  Medical Imaging  Robotics  Climatology 18
  • 19. CS 40003: Data Analytics 19