Clustering
Binoy B Nair
What is Clustering?
• Cluster: a collection of data objects
• Similar to one another within the same cluster
• Dissimilar to the objects in other clusters
• Cluster analysis
• Grouping a set of data objects into clusters
• Clustering is unsupervised classification: no predefined classes
• Typical applications
• As a stand-alone tool to get insight into data distribution
• As a preprocessing step for other algorithms
Examples of Clustering Applications
• Marketing: Help marketers discover distinct groups in their customer bases, and then use this
knowledge to develop targeted marketing programs
• Land use: Identification of areas of similar land use in an earth observation database
• Insurance: Identifying groups of motor insurance policy holders with a high average claim cost
• Urban planning: Identifying groups of houses according to their house type, value, and
geographical location
• Seismology: Observed earthquake epicenters should be clustered along continental faults
What Is a Good Clustering?
• A good clustering method will produce clusters with
• High intra-class similarity
• Low inter-class similarity
• Precise definition of clustering quality is difficult
• Application-dependent
• Ultimately subjective
Requirements for Clustering in Data Mining
• Scalability
• Ability to deal with different types of attributes
• Discovery of clusters with arbitrary shape
• Minimal domain knowledge required to determine input parameters
• Ability to deal with noise and outliers
• Insensitivity to order of input records
• Robustness with respect to high dimensionality
• Incorporation of user-specified constraints
• Interpretability and usability
Major Clustering Approaches
• Partitioning: Construct various partitions and then evaluate them by some criterion
• Hierarchical: Create a hierarchical decomposition of the set of objects using some
criterion
• Model-based: Hypothesize a model for each cluster and find best fit of models to data
• Density-based: Guided by connectivity and density functions
Partitioning Algorithms
• Partitioning method: Construct a partition of a database D of n objects into a set of k
clusters
• Given a k, find a partition of k clusters that optimizes the chosen partitioning
criterion
• Globally optimal: exhaustively enumerate all partitions
• Heuristic methods: k-means and k-medoids algorithms
• k-means (MacQueen, 1967): Each cluster is represented by the center of the
cluster
• k-medoids or PAM (Partition around medoids) (Kaufman & Rousseeuw, 1987):
Each cluster is represented by one of the objects in the cluster
Partitional Clustering
• Each instance is placed in exactly one of K nonoverlapping
clusters.
• Since only one set of clusters is output, the user normally has
to input the desired number of clusters K.
Similarity and Dissimilarity Between Objects
• Euclidean distance (p = 2):
$$d(i,j) = \sqrt{|x_{i1} - x_{j1}|^2 + |x_{i2} - x_{j2}|^2 + \cdots + |x_{ip} - x_{jp}|^2}$$
• Properties of a metric d(i,j):
• d(i,j) ≥ 0
• d(i,i) = 0
• d(i,j) = d(j,i)
• d(i,j) ≤ d(i,k) + d(k,j)
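The distance and the four metric properties can be checked numerically. The following is a minimal sketch; the helper name `euclidean` and the three sample points are illustrative assumptions (the points are taken from the worked example later in the deck).

```python
import math

def euclidean(x, y):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

# Three sample points (assumed for illustration).
i, j, k = (1.0, 1.0), (5.0, 7.0), (3.0, 4.0)

non_negative = euclidean(i, j) >= 0
identity = euclidean(i, i) == 0
symmetric = euclidean(i, j) == euclidean(j, i)
# k lies on the segment between i and j, so both sides are (nearly) equal;
# the small tolerance absorbs floating-point rounding.
triangle = euclidean(i, j) <= euclidean(i, k) + euclidean(k, j) + 1e-12
```

All four flags evaluate to `True` for these points, which is consistent with the listed properties but of course does not prove them in general.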
Squared Error
[Figure: plot on a 1 to 10 grid, labelled "Objective Function"]
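The formula on this slide did not survive extraction. A commonly used form of the k-means objective is the sum, over all clusters, of the squared Euclidean distances between each object and its cluster mean. The sketch below assumes that form; the function name `squared_error` and the toy data are hypothetical.

```python
def squared_error(clusters):
    """Sum over clusters of squared Euclidean distances to the cluster mean.

    `clusters` is a list of clusters, each a non-empty list of feature tuples.
    """
    total = 0.0
    for members in clusters:
        dim = len(members[0])
        mean = [sum(p[d] for p in members) / len(members) for d in range(dim)]
        total += sum((p[d] - mean[d]) ** 2 for p in members for d in range(dim))
    return total

# Hypothetical toy data: a compact grouping versus a mixed-up grouping
# of the same four points.
good = [[(1, 1), (1, 2)], [(8, 8), (9, 8)]]
bad = [[(1, 1), (8, 8)], [(1, 2), (9, 8)]]
```

The compact grouping yields a much smaller value than the mixed-up one, which is the sense in which a lower squared error indicates a better partition.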
Algorithm k-means
1. Decide on a value for k.
2. Initialize the k cluster centers (randomly, if necessary).
3. Decide the class memberships of the N objects by assigning them to the
nearest cluster center.
4. Re-estimate the k cluster centers, by assuming the memberships found
above are correct.
5. If none of the N objects changed membership in the last iteration, exit.
Otherwise go to step 3.
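The five steps above can be sketched as follows. This is one possible reading, not a reference implementation: the function name `kmeans`, the `seed` parameter, the choice of initial centers by random sampling from the data, and the rule of keeping an old center when a cluster becomes empty are all assumptions.

```python
import math
import random

def kmeans(points, k, seed=0):
    """Plain k-means following the five steps; returns (centers, labels)."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)      # step 2: random initial centers (assumed)
    labels = None
    while True:
        # Step 3: assign each object to its nearest center.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centers[c]))
            for p in points
        ]
        # Step 5: stop when no object changed membership.
        if new_labels == labels:
            return centers, labels
        labels = new_labels
        # Step 4: re-estimate each center as the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:                  # assumption: keep old center if cluster is empty
                centers[c] = tuple(sum(col) / len(members) for col in zip(*members))

# Hypothetical data: two well-separated groups of three points each.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (9.0, 9.0), (9.0, 10.0), (10.0, 9.0)]
centers, labels = kmeans(points, k=2)
```

On this toy data the first three points end up in one cluster and the last three in the other, whichever two points are drawn as initial centers.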
Algorithm: k-means, Distance Metric: Euclidean Distance
K-means Clustering: Step 1
[Figure: scatter plot on 0 to 5 axes with cluster centers k1, k2, k3]
K-means Clustering: Step 2
[Figure: scatter plot on 0 to 5 axes with cluster centers k1, k2, k3]
K-means Clustering: Step 3
[Figure: scatter plot on 0 to 5 axes with cluster centers k1, k2, k3]
K-means Clustering: Step 4
[Figure: scatter plot on 0 to 5 axes with cluster centers k1, k2, k3]
K-means Clustering: Step 5
[Figure: scatter plot on 0 to 5 axes with cluster centers k1, k2, k3]
Worked Example
As a simple illustration of the k-means algorithm, consider the following data set
consisting of the scores of two variables on each of seven individuals:
Subject   Features
1         (1,1)
2         (1.5,2)
3         (3,4)
4         (5,7)
5         (3.5,5)
6         (4.5,5)
7         (3.5,4.5)
[Figure: scatter plot of the seven individuals]
Working
This data set is to be grouped into two clusters, i.e., k = 2.
As a first step in finding a sensible initial partition, let the feature values of the two
individuals furthest apart (using the Euclidean distance measure) define the initial
cluster means, giving:
          Individual   Mean Vector (centroid)
Group 1   1            (1, 1)
Group 2   4            (5, 7)
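The choice of the two individuals furthest apart can be reproduced with a brute-force search over all pairs. A minimal sketch, using the feature values from the table above; the variable names are illustrative assumptions.

```python
import math
from itertools import combinations

# Feature values keyed by individual number.
data = {1: (1, 1), 2: (1.5, 2), 3: (3, 4), 4: (5, 7),
        5: (3.5, 5), 6: (4.5, 5), 7: (3.5, 4.5)}

# Compare every pair of individuals and keep the pair with the largest distance.
a, b = max(combinations(data, 2),
           key=lambda pair: math.dist(data[pair[0]], data[pair[1]]))
initial_means = [data[a], data[b]]
```

For this data the search selects individuals 1 and 4, in agreement with the initial cluster means (1, 1) and (5, 7). The search is quadratic in the number of individuals, which is fine for seven points but may be too slow for large data sets.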
Working
• The remaining individuals are now examined in sequence and
allocated to the cluster to which they are closest, in terms of
Euclidean distance to the cluster mean.
• The mean vectors are recalculated at the end of each iteration, once every individual has been assigned.
• This leads to the following series of steps:
Iteration 1: Assign objects to closest clusters
Object (Oi)   Features    Centroid 1 (C1)   D(Oi, C1)   Centroid 2 (C2)   D(Oi, C2)   Closest Centroid
1             (1,1)       (1,1)             0           (5,7)             7.21        C1
2             (1.5,2)     (1,1)             1.12        (5,7)             6.10        C1
3             (3,4)       (1,1)             3.61        (5,7)             3.61        C1
4             (5,7)       (1,1)             7.21        (5,7)             0           C2
5             (3.5,5)     (1,1)             4.72        (5,7)             2.50        C2
6             (4.5,5)     (1,1)             5.32        (5,7)             2.06        C2
7             (3.5,4.5)   (1,1)             4.30        (5,7)             2.92        C2
• A cluster is defined by its centroid.
• Distances are Euclidean; for example, D(O2, C1) = √((1 − 1.5)² + (1 − 2)²) = √1.25 ≈ 1.12, and so on.
• Object 3 is equidistant from both centroids (√13 ≈ 3.61); the tie is resolved here in favour of C1.
• Object 1 is assigned to cluster 1, and so on.
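The distance columns of the Iteration 1 table can be recomputed directly. A minimal sketch with illustrative variable names; computed this way, object 3 is √13 ≈ 3.61 from both centroids, so the sketch breaks the tie in favour of C1, which is an assumption made so that the assignment agrees with the table.

```python
import math

data = {1: (1, 1), 2: (1.5, 2), 3: (3, 4), 4: (5, 7),
        5: (3.5, 5), 6: (4.5, 5), 7: (3.5, 4.5)}
c1, c2 = (1, 1), (5, 7)   # initial centroids from the worked example

rows = []
for obj, x in data.items():
    d1, d2 = math.dist(x, c1), math.dist(x, c2)
    # Assumption: ties (object 3) go to C1; the tolerance guards against rounding.
    closest = "C1" if d1 <= d2 + 1e-9 else "C2"
    rows.append((obj, round(d1, 2), round(d2, 2), closest))

for row in rows:
    print(row)
```

The resulting assignment is objects 1, 2, 3 to C1 and objects 4, 5, 6, 7 to C2.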
Recomputing Centroids at the end of Iteration 1
The initial partition has now changed, and the two clusters at this stage have the
following characteristics:
            Individuals   New Centroids
Cluster 1   1, 2, 3       C1 = ((1,1)+(1.5,2)+(3,4))/3 ≈ (1.8, 2.3)
Cluster 2   4, 5, 6, 7    C2 = ((5,7)+(3.5,5)+(4.5,5)+(3.5,4.5))/4 ≈ (4.1, 5.4)
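The centroid update is a component-wise mean of the members of each cluster. A minimal sketch; the helper name `centroid` is an illustrative assumption.

```python
def centroid(points):
    """Component-wise mean of a non-empty list of feature tuples."""
    n = len(points)
    return tuple(sum(col) / n for col in zip(*points))

cluster1 = [(1, 1), (1.5, 2), (3, 4)]                 # individuals 1, 2, 3
cluster2 = [(5, 7), (3.5, 5), (4.5, 5), (3.5, 4.5)]   # individuals 4, 5, 6, 7

c1 = centroid(cluster1)   # about (1.83, 2.33), i.e. (1.8, 2.3) to one decimal
c2 = centroid(cluster2)   # (4.125, 5.375), i.e. (4.1, 5.4) to one decimal
```

The slides carry the centroids forward rounded to one decimal place, so distances in the next iteration may differ slightly from those obtained with the unrounded values.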
Iteration 2: Check if any object has changed clusters
Object (Oi)   Features    Centroid 1 (C1)   D(Oi, C1)   Centroid 2 (C2)   D(Oi, C2)   Closest Centroid
1             (1,1)       (1.8,2.3)         1.53        (4.1,5.4)         5.38        C1
2             (1.5,2)     (1.8,2.3)         0.42        (4.1,5.4)         4.28        C1
3             (3,4)       (1.8,2.3)         2.08        (4.1,5.4)         1.78        C2
4             (5,7)       (1.8,2.3)         5.69        (4.1,5.4)         1.84        C2
5             (3.5,5)     (1.8,2.3)         3.19        (4.1,5.4)         0.72        C2
6             (4.5,5)     (1.8,2.3)         3.82        (4.1,5.4)         0.57        C2
7             (3.5,4.5)   (1.8,2.3)         2.78        (4.1,5.4)         1.08        C2
• Object 3 has changed cluster from 1 to 2.
Recomputing Centroids at the end of Iteration 2
The partition has changed again, with Object 3 relocated to cluster 2, and
the two clusters at this stage have the following characteristics:
            Individuals     New Centroids
Cluster 1   1, 2            C1 = ((1,1)+(1.5,2))/2 = (1.25, 1.5) ≈ (1.3, 1.5)
Cluster 2   3, 4, 5, 6, 7   C2 = ((3,4)+(5,7)+(3.5,5)+(4.5,5)+(3.5,4.5))/5 = (3.9, 5.1)
Iteration 3: Check if any object has changed clusters
Object (Oi)   Features    Centroid 1 (C1)   D(Oi, C1)   Centroid 2 (C2)   D(Oi, C2)   Closest Centroid
1             (1,1)       (1.3,1.5)         0.58        (3.9,5.1)         5.02        C1
2             (1.5,2)     (1.3,1.5)         0.54        (3.9,5.1)         3.92        C1
3             (3,4)       (1.3,1.5)         3.02        (3.9,5.1)         1.42        C2
4             (5,7)       (1.3,1.5)         6.63        (3.9,5.1)         2.20        C2
5             (3.5,5)     (1.3,1.5)         4.13        (3.9,5.1)         0.41        C2
6             (4.5,5)     (1.3,1.5)         4.74        (3.9,5.1)         0.61        C2
7             (3.5,4.5)   (1.3,1.5)         3.72        (3.9,5.1)         0.72        C2
• No change in clusters compared to the previous iteration.
Conclusion
• In this example each individual is now nearer its own cluster mean
than that of the other cluster and the iteration stops, choosing the
latest partitioning as the final cluster solution.
• Hence Objects {1,2} belong to first cluster and Objects {3,4,5,6,7}
belong to second cluster.
Notes
• The iterative relocation would continue until no more relocations occur.
• In the example, the no-relocation condition was satisfied after 3 iterations, but this is
not usually the case: depending on the data set, hundreds of iterations may be required.
• Convergence can also be impractically slow, so the algorithm may not reach a final
solution within a reasonable time.
• It is therefore a good idea to stop the algorithm after a pre-chosen maximum
number of iterations.
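An iteration cap can be added to the relocation loop as sketched below, here run on the worked example with the initial means (1, 1) and (5, 7). The function name `kmeans_from`, the `max_iter` parameter and the returned `converged` flag are illustrative assumptions; unlike the slides, the sketch carries unrounded centroids between iterations.

```python
import math

def kmeans_from(points, centers, max_iter=100):
    """k-means from given initial centers, capped at `max_iter` iterations.

    Returns (labels, iterations_used, converged).
    """
    centers = list(centers)
    labels = None
    for iteration in range(1, max_iter + 1):
        # min() returns the first minimum, so ties go to the lower-numbered cluster.
        new_labels = [
            min(range(len(centers)), key=lambda c: math.dist(p, centers[c]))
            for p in points
        ]
        if new_labels == labels:
            return labels, iteration, True       # no relocation: stop
        labels = new_labels
        for c in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels, max_iter, False               # cap reached before convergence

points = [(1, 1), (1.5, 2), (3, 4), (5, 7), (3.5, 5), (4.5, 5), (3.5, 4.5)]
labels, iterations, converged = kmeans_from(points, [(1, 1), (5, 7)])
capped = kmeans_from(points, [(1, 1), (5, 7)], max_iter=1)
```

With the default cap the run stops on its own with objects 1 and 2 in the first cluster and objects 3 to 7 in the second; with `max_iter=1` it returns after a single pass and reports that it did not converge.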
Comments on the K-Means Method
• Strength
• Relatively efficient: O(tkn), where n is # objects, k is # clusters, and t is # iterations.
Normally, k, t << n.
• Note: often terminates at a local optimum. A global optimum may be sought using
techniques such as deterministic annealing and genetic algorithms
• Weakness
• Applicable only when a mean is defined, so it cannot be applied directly to categorical data
• Need to specify k, the number of clusters, in advance
• Sensitive to noisy data and outliers
• Not suitable to discover clusters with non-convex shapes
Summary
• The k-means algorithm is a simple yet popular method for cluster analysis
• Its performance depends on the initialisation and on the choice of an appropriate
distance measure
• There are several variants of K-means to overcome its weaknesses
• K-Medoids: resistance to noise and/or outliers
• K-Modes: extension to categorical data clustering analysis
• CLARA: extension to deal with large data sets
• Mixture models (EM algorithm): handling uncertainty of clusters
Online tutorial: the K-means function in Matlab
https://www.youtube.com/watch?v=aYzjenNNOcc
References
• http://mnemstudio.org/clustering-k-means-example-1.htm
• Ke Chen, K-means Clustering, COMP24111 Machine Learning,
University of Manchester, 2016.
• {Insert Reference}/ 10.ppt
• {Insert Reference}/MachinLearning3.ppt

More Related Content

What's hot

K MEANS CLUSTERING
K MEANS CLUSTERINGK MEANS CLUSTERING
K MEANS CLUSTERINGsingh7599
 
Apache Zookeeper Explained: Tutorial, Use Cases and Zookeeper Java API Examples
Apache Zookeeper Explained: Tutorial, Use Cases and Zookeeper Java API ExamplesApache Zookeeper Explained: Tutorial, Use Cases and Zookeeper Java API Examples
Apache Zookeeper Explained: Tutorial, Use Cases and Zookeeper Java API ExamplesBinu George
 
Clustering &amp; classification
Clustering &amp; classificationClustering &amp; classification
Clustering &amp; classificationJamshed Khan
 
11. Hashing - Data Structures using C++ by Varsha Patil
11. Hashing - Data Structures using C++ by Varsha Patil11. Hashing - Data Structures using C++ by Varsha Patil
11. Hashing - Data Structures using C++ by Varsha Patilwidespreadpromotion
 
Clustering in data Mining (Data Mining)
Clustering in data Mining (Data Mining)Clustering in data Mining (Data Mining)
Clustering in data Mining (Data Mining)Mustafa Sherazi
 
K-Means Clustering Algorithm - Cluster Analysis | Machine Learning Algorithm ...
K-Means Clustering Algorithm - Cluster Analysis | Machine Learning Algorithm ...K-Means Clustering Algorithm - Cluster Analysis | Machine Learning Algorithm ...
K-Means Clustering Algorithm - Cluster Analysis | Machine Learning Algorithm ...Edureka!
 
Analysis Of Algorithms - Hashing
Analysis Of Algorithms - HashingAnalysis Of Algorithms - Hashing
Analysis Of Algorithms - HashingSam Light
 
Zookeeper big sonata
Zookeeper  big sonataZookeeper  big sonata
Zookeeper big sonataAnh Le
 
K-means Clustering
K-means ClusteringK-means Clustering
K-means ClusteringAnna Fensel
 
Deep Dive into Building Streaming Applications with Apache Pulsar
Deep Dive into Building Streaming Applications with Apache Pulsar Deep Dive into Building Streaming Applications with Apache Pulsar
Deep Dive into Building Streaming Applications with Apache Pulsar Timothy Spann
 
Chap8 basic cluster_analysis
Chap8 basic cluster_analysisChap8 basic cluster_analysis
Chap8 basic cluster_analysisguru_prasadg
 
K means clustering
K means clusteringK means clustering
K means clusteringkeshav goyal
 
CLUSTER SILHOUETTES.pptx
CLUSTER SILHOUETTES.pptxCLUSTER SILHOUETTES.pptx
CLUSTER SILHOUETTES.pptxagniva pradhan
 
Tutorial - Modern Real Time Streaming Architectures
Tutorial - Modern Real Time Streaming ArchitecturesTutorial - Modern Real Time Streaming Architectures
Tutorial - Modern Real Time Streaming ArchitecturesKarthik Ramasamy
 
A visual introduction to Apache Kafka
A visual introduction to Apache KafkaA visual introduction to Apache Kafka
A visual introduction to Apache KafkaPaul Brebner
 

What's hot (20)

K MEANS CLUSTERING
K MEANS CLUSTERINGK MEANS CLUSTERING
K MEANS CLUSTERING
 
Cluster analysis
Cluster analysisCluster analysis
Cluster analysis
 
Apache Zookeeper Explained: Tutorial, Use Cases and Zookeeper Java API Examples
Apache Zookeeper Explained: Tutorial, Use Cases and Zookeeper Java API ExamplesApache Zookeeper Explained: Tutorial, Use Cases and Zookeeper Java API Examples
Apache Zookeeper Explained: Tutorial, Use Cases and Zookeeper Java API Examples
 
KNN
KNNKNN
KNN
 
Kafka 101
Kafka 101Kafka 101
Kafka 101
 
Clustering &amp; classification
Clustering &amp; classificationClustering &amp; classification
Clustering &amp; classification
 
11. Hashing - Data Structures using C++ by Varsha Patil
11. Hashing - Data Structures using C++ by Varsha Patil11. Hashing - Data Structures using C++ by Varsha Patil
11. Hashing - Data Structures using C++ by Varsha Patil
 
Clustering in data Mining (Data Mining)
Clustering in data Mining (Data Mining)Clustering in data Mining (Data Mining)
Clustering in data Mining (Data Mining)
 
K-Means Clustering Algorithm - Cluster Analysis | Machine Learning Algorithm ...
K-Means Clustering Algorithm - Cluster Analysis | Machine Learning Algorithm ...K-Means Clustering Algorithm - Cluster Analysis | Machine Learning Algorithm ...
K-Means Clustering Algorithm - Cluster Analysis | Machine Learning Algorithm ...
 
Cluster computing
Cluster computingCluster computing
Cluster computing
 
Analysis Of Algorithms - Hashing
Analysis Of Algorithms - HashingAnalysis Of Algorithms - Hashing
Analysis Of Algorithms - Hashing
 
Zookeeper big sonata
Zookeeper  big sonataZookeeper  big sonata
Zookeeper big sonata
 
K-means Clustering
K-means ClusteringK-means Clustering
K-means Clustering
 
Deep Dive into Building Streaming Applications with Apache Pulsar
Deep Dive into Building Streaming Applications with Apache Pulsar Deep Dive into Building Streaming Applications with Apache Pulsar
Deep Dive into Building Streaming Applications with Apache Pulsar
 
Chap8 basic cluster_analysis
Chap8 basic cluster_analysisChap8 basic cluster_analysis
Chap8 basic cluster_analysis
 
K means clustering
K means clusteringK means clustering
K means clustering
 
Hadoop Map Reduce
Hadoop Map ReduceHadoop Map Reduce
Hadoop Map Reduce
 
CLUSTER SILHOUETTES.pptx
CLUSTER SILHOUETTES.pptxCLUSTER SILHOUETTES.pptx
CLUSTER SILHOUETTES.pptx
 
Tutorial - Modern Real Time Streaming Architectures
Tutorial - Modern Real Time Streaming ArchitecturesTutorial - Modern Real Time Streaming Architectures
Tutorial - Modern Real Time Streaming Architectures
 
A visual introduction to Apache Kafka
A visual introduction to Apache KafkaA visual introduction to Apache Kafka
A visual introduction to Apache Kafka
 

Similar to Pattern recognition binoy k means clustering

Advanced database and data mining & clustering concepts
Advanced database and data mining & clustering conceptsAdvanced database and data mining & clustering concepts
Advanced database and data mining & clustering conceptsNithyananthSengottai
 
26-Clustering MTech-2017.ppt
26-Clustering MTech-2017.ppt26-Clustering MTech-2017.ppt
26-Clustering MTech-2017.pptvikassingh569137
 
machine learning - Clustering in R
machine learning - Clustering in Rmachine learning - Clustering in R
machine learning - Clustering in RSudhakar Chavan
 
Selection K in K-means Clustering
Selection K in K-means ClusteringSelection K in K-means Clustering
Selection K in K-means ClusteringJunghoon Kim
 
Data mining techniques unit v
Data mining techniques unit vData mining techniques unit v
Data mining techniques unit vmalathieswaran29
 
Training machine learning k means 2017
Training machine learning k means 2017Training machine learning k means 2017
Training machine learning k means 2017Iwan Sofana
 
Business analytics course in delhi
Business analytics course in delhiBusiness analytics course in delhi
Business analytics course in delhibhuvan8999
 
data science course in delhi
data science course in delhidata science course in delhi
data science course in delhidevipatnala1
 
business analytics course in delhi
business analytics course in delhibusiness analytics course in delhi
business analytics course in delhidevipatnala1
 
Best data science training, best data science training institute in hyderabad.
 Best data science training, best data science training institute in hyderabad. Best data science training, best data science training institute in hyderabad.
Best data science training, best data science training institute in hyderabad.Data Analytics Courses in Pune
 
Data scientist course in hyderabad
Data scientist course in hyderabadData scientist course in hyderabad
Data scientist course in hyderabadprathyusha1234
 
Data scientist training in bangalore
Data scientist training in bangaloreData scientist training in bangalore
Data scientist training in bangaloreprathyusha1234
 
Data science course in chennai (3)
Data science course in chennai (3)Data science course in chennai (3)
Data science course in chennai (3)prathyusha1234
 
data science course in chennai
data science course in chennaidata science course in chennai
data science course in chennaidevipatnala1
 
Best institute for data science in hyderabad
Best institute for data science in hyderabadBest institute for data science in hyderabad
Best institute for data science in hyderabadprathyusha1234
 

Similar to Pattern recognition binoy k means clustering (20)

Advanced database and data mining & clustering concepts
Advanced database and data mining & clustering conceptsAdvanced database and data mining & clustering concepts
Advanced database and data mining & clustering concepts
 
26-Clustering MTech-2017.ppt
26-Clustering MTech-2017.ppt26-Clustering MTech-2017.ppt
26-Clustering MTech-2017.ppt
 
Clustering.pptx
Clustering.pptxClustering.pptx
Clustering.pptx
 
machine learning - Clustering in R
machine learning - Clustering in Rmachine learning - Clustering in R
machine learning - Clustering in R
 
UNIT_V_Cluster Analysis.pptx
UNIT_V_Cluster Analysis.pptxUNIT_V_Cluster Analysis.pptx
UNIT_V_Cluster Analysis.pptx
 
Selection K in K-means Clustering
Selection K in K-means ClusteringSelection K in K-means Clustering
Selection K in K-means Clustering
 
08 clustering
08 clustering08 clustering
08 clustering
 
Data mining techniques unit v
Data mining techniques unit vData mining techniques unit v
Data mining techniques unit v
 
Clustering
ClusteringClustering
Clustering
 
Training machine learning k means 2017
Training machine learning k means 2017Training machine learning k means 2017
Training machine learning k means 2017
 
Clustering
ClusteringClustering
Clustering
 
Business analytics course in delhi
Business analytics course in delhiBusiness analytics course in delhi
Business analytics course in delhi
 
data science course in delhi
data science course in delhidata science course in delhi
data science course in delhi
 
business analytics course in delhi
business analytics course in delhibusiness analytics course in delhi
business analytics course in delhi
 
Best data science training, best data science training institute in hyderabad.
 Best data science training, best data science training institute in hyderabad. Best data science training, best data science training institute in hyderabad.
Best data science training, best data science training institute in hyderabad.
 
Data scientist course in hyderabad
Data scientist course in hyderabadData scientist course in hyderabad
Data scientist course in hyderabad
 
Data scientist training in bangalore
Data scientist training in bangaloreData scientist training in bangalore
Data scientist training in bangalore
 
Data science course in chennai (3)
Data science course in chennai (3)Data science course in chennai (3)
Data science course in chennai (3)
 
data science course in chennai
data science course in chennaidata science course in chennai
data science course in chennai
 
Best institute for data science in hyderabad
Best institute for data science in hyderabadBest institute for data science in hyderabad
Best institute for data science in hyderabad
 

Recently uploaded

ZXCTN 5804 / ZTE PTN / ZTE POTN / ZTE 5804 PTN / ZTE POTN 5804 ( 100/200 GE Z...
ZXCTN 5804 / ZTE PTN / ZTE POTN / ZTE 5804 PTN / ZTE POTN 5804 ( 100/200 GE Z...ZXCTN 5804 / ZTE PTN / ZTE POTN / ZTE 5804 PTN / ZTE POTN 5804 ( 100/200 GE Z...
ZXCTN 5804 / ZTE PTN / ZTE POTN / ZTE 5804 PTN / ZTE POTN 5804 ( 100/200 GE Z...ZTE
 
Oxy acetylene welding presentation note.
Oxy acetylene welding presentation note.Oxy acetylene welding presentation note.
Oxy acetylene welding presentation note.eptoze12
 
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escortsranjana rawat
 
Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptx
Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptxDecoding Kotlin - Your guide to solving the mysterious in Kotlin.pptx
Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptxJoão Esperancinha
 
IVE Industry Focused Event - Defence Sector 2024
IVE Industry Focused Event - Defence Sector 2024IVE Industry Focused Event - Defence Sector 2024
IVE Industry Focused Event - Defence Sector 2024Mark Billinghurst
 
chaitra-1.pptx fake news detection using machine learning
chaitra-1.pptx  fake news detection using machine learningchaitra-1.pptx  fake news detection using machine learning
chaitra-1.pptx fake news detection using machine learningmisbanausheenparvam
 
Current Transformer Drawing and GTP for MSETCL
Current Transformer Drawing and GTP for MSETCLCurrent Transformer Drawing and GTP for MSETCL
Current Transformer Drawing and GTP for MSETCLDeelipZope
 
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130Suhani Kapoor
 
power system scada applications and uses
power system scada applications and usespower system scada applications and uses
power system scada applications and usesDevarapalliHaritha
 
Call Girls Narol 7397865700 Independent Call Girls
Call Girls Narol 7397865700 Independent Call GirlsCall Girls Narol 7397865700 Independent Call Girls
Call Girls Narol 7397865700 Independent Call Girlsssuser7cb4ff
 
Model Call Girl in Narela Delhi reach out to us at 🔝8264348440🔝
Model Call Girl in Narela Delhi reach out to us at 🔝8264348440🔝Model Call Girl in Narela Delhi reach out to us at 🔝8264348440🔝
Model Call Girl in Narela Delhi reach out to us at 🔝8264348440🔝soniya singh
 
Sachpazis Costas: Geotechnical Engineering: A student's Perspective Introduction
Sachpazis Costas: Geotechnical Engineering: A student's Perspective IntroductionSachpazis Costas: Geotechnical Engineering: A student's Perspective Introduction
Sachpazis Costas: Geotechnical Engineering: A student's Perspective IntroductionDr.Costas Sachpazis
 
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...ranjana rawat
 
VICTOR MAESTRE RAMIREZ - Planetary Defender on NASA's Double Asteroid Redirec...
VICTOR MAESTRE RAMIREZ - Planetary Defender on NASA's Double Asteroid Redirec...VICTOR MAESTRE RAMIREZ - Planetary Defender on NASA's Double Asteroid Redirec...
VICTOR MAESTRE RAMIREZ - Planetary Defender on NASA's Double Asteroid Redirec...VICTOR MAESTRE RAMIREZ
 
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...Soham Mondal
 
What are the advantages and disadvantages of membrane structures.pptx
What are the advantages and disadvantages of membrane structures.pptxWhat are the advantages and disadvantages of membrane structures.pptx
What are the advantages and disadvantages of membrane structures.pptxwendy cai
 
Architect Hassan Khalil Portfolio for 2024
Architect Hassan Khalil Portfolio for 2024Architect Hassan Khalil Portfolio for 2024
Architect Hassan Khalil Portfolio for 2024hassan khalil
 
SPICE PARK APR2024 ( 6,793 SPICE Models )
SPICE PARK APR2024 ( 6,793 SPICE Models )SPICE PARK APR2024 ( 6,793 SPICE Models )
SPICE PARK APR2024 ( 6,793 SPICE Models )Tsuyoshi Horigome
 
HARMONY IN THE NATURE AND EXISTENCE - Unit-IV
HARMONY IN THE NATURE AND EXISTENCE - Unit-IVHARMONY IN THE NATURE AND EXISTENCE - Unit-IV
HARMONY IN THE NATURE AND EXISTENCE - Unit-IVRajaP95
 
Software and Systems Engineering Standards: Verification and Validation of Sy...
Software and Systems Engineering Standards: Verification and Validation of Sy...Software and Systems Engineering Standards: Verification and Validation of Sy...
Software and Systems Engineering Standards: Verification and Validation of Sy...VICTOR MAESTRE RAMIREZ
 

Recently uploaded (20)

ZXCTN 5804 / ZTE PTN / ZTE POTN / ZTE 5804 PTN / ZTE POTN 5804 ( 100/200 GE Z...
ZXCTN 5804 / ZTE PTN / ZTE POTN / ZTE 5804 PTN / ZTE POTN 5804 ( 100/200 GE Z...ZXCTN 5804 / ZTE PTN / ZTE POTN / ZTE 5804 PTN / ZTE POTN 5804 ( 100/200 GE Z...
ZXCTN 5804 / ZTE PTN / ZTE POTN / ZTE 5804 PTN / ZTE POTN 5804 ( 100/200 GE Z...
 
Oxy acetylene welding presentation note.
Oxy acetylene welding presentation note.Oxy acetylene welding presentation note.
Oxy acetylene welding presentation note.
 
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts
 
Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptx
Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptxDecoding Kotlin - Your guide to solving the mysterious in Kotlin.pptx
Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptx
 
IVE Industry Focused Event - Defence Sector 2024
IVE Industry Focused Event - Defence Sector 2024IVE Industry Focused Event - Defence Sector 2024
IVE Industry Focused Event - Defence Sector 2024
 
chaitra-1.pptx fake news detection using machine learning
chaitra-1.pptx  fake news detection using machine learningchaitra-1.pptx  fake news detection using machine learning
chaitra-1.pptx fake news detection using machine learning
 
Current Transformer Drawing and GTP for MSETCL
Current Transformer Drawing and GTP for MSETCLCurrent Transformer Drawing and GTP for MSETCL
Current Transformer Drawing and GTP for MSETCL
 
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130
 
power system scada applications and uses
power system scada applications and usespower system scada applications and uses
power system scada applications and uses
 
Call Girls Narol 7397865700 Independent Call Girls
Call Girls Narol 7397865700 Independent Call GirlsCall Girls Narol 7397865700 Independent Call Girls
Call Girls Narol 7397865700 Independent Call Girls
 
Model Call Girl in Narela Delhi reach out to us at 🔝8264348440🔝
Model Call Girl in Narela Delhi reach out to us at 🔝8264348440🔝Model Call Girl in Narela Delhi reach out to us at 🔝8264348440🔝
Model Call Girl in Narela Delhi reach out to us at 🔝8264348440🔝
 
Sachpazis Costas: Geotechnical Engineering: A student's Perspective Introduction
Sachpazis Costas: Geotechnical Engineering: A student's Perspective IntroductionSachpazis Costas: Geotechnical Engineering: A student's Perspective Introduction
Sachpazis Costas: Geotechnical Engineering: A student's Perspective Introduction
 
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
 
VICTOR MAESTRE RAMIREZ - Planetary Defender on NASA's Double Asteroid Redirec...
VICTOR MAESTRE RAMIREZ - Planetary Defender on NASA's Double Asteroid Redirec...VICTOR MAESTRE RAMIREZ - Planetary Defender on NASA's Double Asteroid Redirec...
VICTOR MAESTRE RAMIREZ - Planetary Defender on NASA's Double Asteroid Redirec...
 
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
 
What are the advantages and disadvantages of membrane structures.pptx
What are the advantages and disadvantages of membrane structures.pptxWhat are the advantages and disadvantages of membrane structures.pptx
What are the advantages and disadvantages of membrane structures.pptx
 
Architect Hassan Khalil Portfolio for 2024
Architect Hassan Khalil Portfolio for 2024Architect Hassan Khalil Portfolio for 2024
Architect Hassan Khalil Portfolio for 2024
 
SPICE PARK APR2024 ( 6,793 SPICE Models )
SPICE PARK APR2024 ( 6,793 SPICE Models )SPICE PARK APR2024 ( 6,793 SPICE Models )
SPICE PARK APR2024 ( 6,793 SPICE Models )
 
HARMONY IN THE NATURE AND EXISTENCE - Unit-IV
HARMONY IN THE NATURE AND EXISTENCE - Unit-IVHARMONY IN THE NATURE AND EXISTENCE - Unit-IV
HARMONY IN THE NATURE AND EXISTENCE - Unit-IV
 
Software and Systems Engineering Standards: Verification and Validation of Sy...
Software and Systems Engineering Standards: Verification and Validation of Sy...Software and Systems Engineering Standards: Verification and Validation of Sy...
Software and Systems Engineering Standards: Verification and Validation of Sy...
 

Pattern recognition binoy k means clustering

  • 2. What is Clustering? • Cluster: a collection of data objects • Similar to one another within the same cluster • Dissimilar to the objects in other clusters • Cluster analysis • Grouping a set of data objects into clusters • Clustering is unsupervised classification: no predefined classes • Typical applications • As a stand-alone tool to get insight into data distribution • As a preprocessing step for other algorithms
  • 3. Examples of Clustering Applications • Marketing: Help marketers discover distinct groups in their customer bases, and then use this knowledge to develop targeted marketing programs • Land use: Identification of areas of similar land use in an earth observation database • Insurance: Identifying groups of motor insurance policy holders with a high average claim cost • Urban planning: Identifying groups of houses according to their house type, value, and geographical location • Seismology: Observed earthquake epicenters should be clustered along continental faults
  • 4. What Is a Good Clustering? • A good clustering method will produce clusters with • High intra-class similarity • Low inter-class similarity • Precise definition of clustering quality is difficult • Application-dependent • Ultimately subjective
  • 5. Requirements for Clustering in Data Mining • Scalability • Ability to deal with different types of attributes • Discovery of clusters with arbitrary shape • Minimal domain knowledge required to determine input parameters • Ability to deal with noise and outliers • Insensitivity to order of input records • Robustness with respect to high dimensionality • Incorporation of user-specified constraints • Interpretability and usability
  • 6. Major Clustering Approaches • Partitioning: Construct various partitions and then evaluate them by some criterion • Hierarchical: Create a hierarchical decomposition of the set of objects using some criterion • Model-based: Hypothesize a model for each cluster and find best fit of models to data • Density-based: Guided by connectivity and density functions
  • 7. Partitioning Algorithms • Partitioning method: Construct a partition of a database D of n objects into a set of k clusters • Given a k, find a partition of k clusters that optimizes the chosen partitioning criterion • Global optimal: exhaustively enumerate all partitions • Heuristic methods: k-means and k-medoids algorithms • k-means (MacQueen, 1967): Each cluster is represented by the center of the cluster • k-medoids or PAM (Partitioning Around Medoids) (Kaufman & Rousseeuw, 1987): Each cluster is represented by one of the objects in the cluster
  • 8. Partitional Clustering • Each instance is placed in exactly one of K nonoverlapping clusters. • Since only one set of clusters is output, the user normally has to input the desired number of clusters K.
  • 9. Similarity and Dissimilarity Between Objects • Euclidean distance (p = 2): $d(i,j) = \sqrt{|x_{i1} - x_{j1}|^2 + |x_{i2} - x_{j2}|^2 + \cdots + |x_{ip} - x_{jp}|^2}$ • Properties of a metric d(i,j): • d(i,j) ≥ 0 • d(i,i) = 0 • d(i,j) = d(j,i) • d(i,j) ≤ d(i,k) + d(k,j)
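The distance and the four metric properties listed above can be checked numerically. The following is a minimal sketch; the function name `euclidean` and the three sample points are illustrative assumptions, not part of the slides.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length feature tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Three sample points (chosen for illustration only)
i, j, k = (1.0, 1.0), (5.0, 7.0), (3.5, 5.0)

print(round(euclidean(i, j), 2))                           # 7.21
print(euclidean(i, j) >= 0)                                # non-negativity
print(euclidean(i, i) == 0)                                # d(i,i) = 0
print(euclidean(i, j) == euclidean(j, i))                  # symmetry
print(euclidean(i, j) <= euclidean(i, k) + euclidean(k, j))  # triangle inequality
```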
  • 10. Squared Error • Objective Function [figure: plot with both axes running from 1 to 10, not reproduced]
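The objective function on this slide appears only as a figure. A common way to write the squared-error criterion that k-means tries to minimize is sketched below; the symbols $E$, $C_m$ and $\mu_m$ are notation assumed here, not taken from the slides.

```latex
E = \sum_{m=1}^{k} \sum_{x \in C_m} \lVert x - \mu_m \rVert^2,
\qquad
\mu_m = \frac{1}{|C_m|} \sum_{x \in C_m} x
```

Here $C_m$ is the set of objects assigned to cluster $m$ and $\mu_m$ is its center, so $E$ is the total squared Euclidean distance of every object to the center of its own cluster.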
  • 11. Algorithm k-means 1. Decide on a value for k. 2. Initialize the k cluster centers (randomly, if necessary). 3. Decide the class memberships of the N objects by assigning them to the nearest cluster center. 4. Re-estimate the k cluster centers, by assuming the memberships found above are correct. 5. If none of the N objects changed membership in the last iteration, exit. Otherwise goto 3.
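The five steps above can be sketched in a few lines of Python. This is an illustrative sketch, not the author's implementation: the function name `kmeans`, the `seed` parameter, the choice of initial centers by random sampling from the data, and the handling of empty clusters are all assumptions.

```python
import math
import random

def kmeans(points, k, seed=0):
    """Plain k-means following the five steps above (no iteration cap)."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)        # step 2: random initial centers
    labels = None
    while True:
        # step 3: assign every object to its nearest center
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centers[c]))
            for p in points
        ]
        # step 5: exit when no object changed membership
        if new_labels == labels:
            return centers, labels
        labels = new_labels
        # step 4: re-estimate each center as the mean of its members
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:  # assumption: keep the old center if a cluster is empty
                centers[c] = tuple(sum(v) / len(members) for v in zip(*members))

# Toy data: two well-separated groups of three points each
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centers, labels = kmeans(points, k=2)
print(labels)
```

The cluster numbering in `labels` depends on which points are drawn as initial centers, so only the grouping is meaningful, not the label values.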
  • 12. K-means Clustering: Step 1 • Algorithm: k-means, Distance Metric: Euclidean Distance [figure: plot with both axes from 0 to 5 showing cluster centers k1, k2, k3]
  • 13. K-means Clustering: Step 2 • Algorithm: k-means, Distance Metric: Euclidean Distance [figure: plot with both axes from 0 to 5 showing cluster centers k1, k2, k3]
  • 14. K-means Clustering: Step 3 • Algorithm: k-means, Distance Metric: Euclidean Distance [figure: plot with both axes from 0 to 5 showing cluster centers k1, k2, k3]
  • 15. K-means Clustering: Step 4 • Algorithm: k-means, Distance Metric: Euclidean Distance [figure: plot with both axes from 0 to 5 showing cluster centers k1, k2, k3]
  • 16. K-means Clustering: Step 5 • Algorithm: k-means, Distance Metric: Euclidean Distance [figure: plot with axes "expression in condition 1" and "expression in condition 2", both from 0 to 5, showing cluster centers k1, k2, k3]
  • 18. Example • As a simple illustration of a k-means algorithm, consider the following data set consisting of the scores of two variables on each of seven individuals:
      Subject   Features
      1         (1, 1)
      2         (1.5, 2)
      3         (3, 4)
      4         (5, 7)
      5         (3.5, 5)
      6         (4.5, 5)
      7         (3.5, 4.5)
    [figure: scatter plot of the seven subjects]
  • 19. Working • This data set is to be grouped into two clusters, i.e. k = 2. As a first step in finding a sensible initial partition, let the feature values of the two individuals furthest apart (using the Euclidean distance measure) define the initial cluster means, giving:
      Group     Individual   Mean vector (centroid)
      Group 1   1            (1, 1)
      Group 2   4            (5, 7)
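The choice of the two individuals furthest apart can be verified with a short search over all pairs. This is a sketch under the assumption that the seven subjects are stored in a dictionary named `subjects` (a name introduced here for illustration).

```python
import math
from itertools import combinations

subjects = {1: (1, 1), 2: (1.5, 2), 3: (3, 4), 4: (5, 7),
            5: (3.5, 5), 6: (4.5, 5), 7: (3.5, 4.5)}

# Pair of subjects with the largest Euclidean distance between them
a, b = max(combinations(subjects, 2),
           key=lambda pair: math.dist(subjects[pair[0]], subjects[pair[1]]))

print(a, b)                                         # 1 4
print(round(math.dist(subjects[a], subjects[b]), 2))  # 7.21
```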
  • 20. Working • The individuals are now examined in sequence and allocated to the cluster to which they are closest, in terms of Euclidean distance to the cluster mean. • In the working that follows, the mean vectors are recalculated once every individual has been allocated, at the end of each iteration. • This leads to the following series of steps:
  • 21. Iteration 1 - Assign objects to closest clusters • A cluster is defined by its centroid • Starting centroids: C1 = (1, 1) and C2 = (5, 7); the distances D(Oi, C1) and D(Oi, C2) are to be computed for each object Oi
  • 22. Iteration 1 - Assign objects to closest clusters • A cluster is defined by its centroid • For example, D(O2, C1) = sqrt((1 − 1.5)^2 + (1 − 2)^2) = 1.11, and so on • Centroids: C1 = (1, 1), C2 = (5, 7)
      Object (Oi)   Features     D(Oi, C1)   D(Oi, C2)
      1             (1, 1)       0           7.21
      2             (1.5, 2)     1.11        6.1
      3             (3, 4)       3.61        3.61
      4             (5, 7)       7.21        0
      5             (3.5, 5)     4.71        2.5
      6             (4.5, 5)     5.31        2.06
      7             (3.5, 4.5)   4.3         2.91
  • 23. Iteration 1 - Assign objects to closest clusters • A cluster is defined by its centroid • Object 1 is assigned to cluster 1, and so on • Object 3 is equidistant from both centroids (sqrt(13) ≈ 3.61); the tie is resolved in favour of C1 • Centroids: C1 = (1, 1), C2 = (5, 7)
      Object (Oi)   Features     D(Oi, C1)   D(Oi, C2)   Closest centroid
      1             (1, 1)       0           7.21        C1
      2             (1.5, 2)     1.11        6.1         C1
      3             (3, 4)       3.61        3.61        C1 (tie)
      4             (5, 7)       7.21        0           C2
      5             (3.5, 5)     4.71        2.5         C2
      6             (4.5, 5)     5.31        2.06        C2
      7             (3.5, 4.5)   4.3         2.91        C2
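The iteration-1 distance table can be recomputed directly from the feature values. The sketch below assumes the names `features`, `c1`, `c2` and `rows` (introduced for illustration) and breaks ties in favour of C1, which is an assumption. Computed directly, object 3 lies sqrt(13) ≈ 3.61 from both centroids, so its assignment to C1 depends on that tie-break.

```python
import math

features = {1: (1, 1), 2: (1.5, 2), 3: (3, 4), 4: (5, 7),
            5: (3.5, 5), 6: (4.5, 5), 7: (3.5, 4.5)}
c1, c2 = (1, 1), (5, 7)   # starting centroids

rows = []
for obj, x in features.items():
    d1, d2 = math.dist(x, c1), math.dist(x, c2)
    closest = "C1" if d1 <= d2 else "C2"   # assumption: ties go to C1
    rows.append((obj, round(d1, 2), round(d2, 2), closest))
    print(obj, x, f"{d1:.2f}", f"{d2:.2f}", closest)
```

Values printed here are rounded to two decimals, so a few entries differ in the last digit from the truncated values in the table.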
  • 24. Recomputing centroids at the end of Iteration 1 • After the first assignment pass, the two clusters have the following characteristics:
      Cluster     Individuals   New centroid
      Cluster 1   1, 2, 3       C1 = ((1,1) + (1.5,2) + (3,4)) / 3 = (1.8, 2.3)
      Cluster 2   4, 5, 6, 7    C2 = ((5,7) + (3.5,5) + (4.5,5) + (3.5,4.5)) / 4 = (4.1, 5.4)
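The centroid update is a component-wise mean over the members of each cluster. A minimal sketch, with the helper name `centroid` assumed for illustration:

```python
def centroid(points):
    """Component-wise mean of a list of feature tuples."""
    n = len(points)
    return tuple(sum(axis) / n for axis in zip(*points))

cluster1 = [(1, 1), (1.5, 2), (3, 4)]               # objects 1, 2, 3
cluster2 = [(5, 7), (3.5, 5), (4.5, 5), (3.5, 4.5)]  # objects 4, 5, 6, 7

print(centroid(cluster1))  # roughly (1.83, 2.33), shown as (1.8, 2.3) on the slide
print(centroid(cluster2))  # (4.125, 5.375), shown as (4.1, 5.4) on the slide
```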
  • 25. Iteration 2 - Check if any object has changed clusters • Distances are recomputed against the updated centroids C1 = (1.8, 2.3) and C2 = (4.1, 5.4)
  • 26. Iteration 2 - Check if any object has changed clusters • Object 3 has changed cluster from 1 to 2 • Centroids: C1 = (1.8, 2.3), C2 = (4.1, 5.4)
      Object (Oi)   Features     D(Oi, C1)   D(Oi, C2)   Closest centroid
      1             (1, 1)       1.53        5.38        C1
      2             (1.5, 2)     0.42        4.28        C1
      3             (3, 4)       2.08        1.78        C2
      4             (5, 7)       5.69        1.84        C2
      5             (3.5, 5)     3.19        0.72        C2
      6             (4.5, 5)     3.82        0.57        C2
      7             (3.5, 4.5)   2.78        1.08        C2
  • 27. Recomputing centroids at the end of Iteration 2 • The partition has now changed, with Object 3 relocated to cluster 2, and the two clusters at this stage have the following characteristics:
      Cluster     Individuals      New centroid
      Cluster 1   1, 2             C1 = ((1,1) + (1.5,2)) / 2 = (1.3, 1.5)
      Cluster 2   3, 4, 5, 6, 7    C2 = ((3,4) + (5,7) + (3.5,5) + (4.5,5) + (3.5,4.5)) / 5 = (3.9, 5.1)
  • 28. Iteration 3 - Check if any object has changed clusters • No change in clusters compared to previous iteration • Centroids: C1 = (1.3, 1.5), C2 = (3.9, 5.1)
      Object (Oi)   Features     D(Oi, C1)   D(Oi, C2)   Closest centroid
      1             (1, 1)       0.58        5.02        C1
      2             (1.5, 2)     0.54        3.92        C1
      3             (3, 4)       3.02        1.42        C2
      4             (5, 7)       6.63        2.19        C2
      5             (3.5, 5)     4.13        0.41        C2
      6             (4.5, 5)     4.74        0.61        C2
      7             (3.5, 4.5)   3.72        0.72        C2
  • 29. Conclusion • In this example each individual is now nearer its own cluster mean than that of the other cluster and the iteration stops, choosing the latest partitioning as the final cluster solution. • Hence Objects {1,2} belong to first cluster and Objects {3,4,5,6,7} belong to second cluster.
  • 30. Notes • The iterative relocation continues until no more relocations occur. • In the example the no-relocation condition was reached in 3 iterations, but this is not usually the case; depending on the dataset, hundreds of iterations may be required. • Also, k-means may take impractically long to reach the no-relocation condition, and the partition it settles on may be only a local optimum. • It is therefore a good idea to stop the algorithm after a pre-chosen maximum number of iterations.
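The iteration cap suggested in these notes can be combined with the worked example. The sketch below is an illustration under stated assumptions: the function name `kmeans_capped`, the `max_iter` default of 100, the returned `converged` flag, and breaking distance ties in favour of the lower-numbered cluster are all choices made here, not part of the slides.

```python
import math

def kmeans_capped(points, centers, max_iter=100):
    """Batch k-means from given initial centers, with an iteration cap.

    Returns (centers, labels, iterations_used, converged).
    """
    centers = list(centers)
    labels = None
    for iteration in range(1, max_iter + 1):
        # Assign each object to its nearest center (ties go to the lower index)
        new_labels = [
            min(range(len(centers)), key=lambda c: math.dist(p, centers[c]))
            for p in points
        ]
        if new_labels == labels:   # no relocation: stop
            return centers, labels, iteration, True
        labels = new_labels
        # Recompute each center as the mean of its current members
        for c in range(len(centers)):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:
                centers[c] = tuple(sum(v) / len(members) for v in zip(*members))
    return centers, labels, max_iter, False   # cap reached before convergence

# The seven objects of the worked example, starting from objects 1 and 4
points = [(1, 1), (1.5, 2), (3, 4), (5, 7), (3.5, 5), (4.5, 5), (3.5, 4.5)]
centers, labels, n_iter, converged = kmeans_capped(points, [(1, 1), (5, 7)])
print(labels, n_iter, converged)   # [0, 0, 1, 1, 1, 1, 1] 3 True
```

Label 0 corresponds to cluster 1 and label 1 to cluster 2, so the run ends with objects {1, 2} in the first cluster and {3, 4, 5, 6, 7} in the second after three passes. Calling the function with a small `max_iter` returns the partition reached so far together with `converged = False`.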
  • 31. Comments on the K-Means Method • Strength • Relatively efficient: O(tkn), where n is # objects, k is # clusters, and t is # iterations. Normally, k, t << n. • Note: often terminates at a local optimum. The global optimum may be found using techniques such as deterministic annealing and genetic algorithms • Weakness • Applicable only when a mean is defined, so categorical data cannot be handled directly • Need to specify k, the number of clusters, in advance • Sensitive to noisy data and outliers • Not suitable for discovering clusters with non-convex shapes
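The weakness regarding outliers can be seen on a one-dimensional toy cluster: a single extreme value drags the mean far from the bulk of the data, whereas a medoid (an actual member of the cluster, as used by k-medoids) stays put. The data values below are made up for illustration.

```python
values = [1.0, 2.0, 3.0, 4.0, 100.0]   # 100.0 is a deliberate outlier

# Cluster representative used by k-means
mean = sum(values) / len(values)

# Medoid: the member with the smallest total distance to all other members
medoid = min(values, key=lambda v: sum(abs(v - w) for w in values))

print(mean)    # 22.0
print(medoid)  # 3.0
```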
  • 32. Summary • K-means algorithm is a simple yet popular method for clustering analysis • Its performance is determined by initialisation and appropriate distance measure • There are several variants of K-means to overcome its weaknesses • K-Medoids: resistance to noise and/or outliers • K-Modes: extension to categorical data clustering analysis • CLARA: extension to deal with large data sets • Mixture models (EM algorithm): handling uncertainty of clusters • Online tutorial: the K-means function in Matlab https://www.youtube.com/watch?v=aYzjenNNOcc
  • 33. References • http://mnemstudio.org/clustering-k-means-example-1.htm • Ke Chen, K-means Clustering ,COMP24111-Machine Learning, University of Manchester, 2016. • {Insert Reference}/ 10.ppt • {Insert Reference}/MachinLearning3.ppt