Clustering - Definition
─ Process of grouping similar items together
─ Objects within a cluster should be very similar to each other
but…
─ Should be very different from the objects of other
clusters
─ We can say that intra-cluster similarity between
objects is high and inter-cluster similarity is low
─ Important human activity: used from early
childhood to distinguish between different
items such as cars and cats, animals and plants,
etc.
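The "high intra-cluster, low inter-cluster similarity" idea can be checked numerically. The sketch below is illustrative only: the data and the helper names `avg_intra_distance` and `avg_inter_distance` are assumptions, not from the slides, and it uses distance as the inverse of similarity, so a good clustering should show a small intra-cluster distance and a large inter-cluster distance.

```python
from itertools import combinations
from statistics import mean


def avg_intra_distance(clusters):
    """Average distance between pairs of values that share a cluster."""
    pairs = [abs(a - b) for c in clusters for a, b in combinations(c, 2)]
    return mean(pairs)


def avg_inter_distance(clusters):
    """Average distance between pairs of values taken from different clusters."""
    pairs = [abs(a - b)
             for c1, c2 in combinations(clusters, 2)
             for a in c1 for b in c2]
    return mean(pairs)


# Hypothetical one-dimensional data, already grouped into two clusters.
clusters = [[1.0, 2.0, 3.0], [10.0, 11.0, 12.0]]
intra = avg_intra_distance(clusters)
inter = avg_inter_distance(clusters)
print(intra, inter)  # intra is much smaller than inter for this grouping
```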
Supervised and Unsupervised Classification
─ What is Classification?
─ What is Supervised Classification/Learning?
─ What is Unsupervised Classification/Learning?
Types of Clustering Algorithms
─ Clustering has been a popular area of research
─ Several methods and techniques have been
developed to determine natural grouping among
the objects
Jain, A. K., Murty, M. N., and Flynn, P. J., Data Clustering: A Survey.
ACM Computing Surveys, 1999. 31: pp. 264-323.
Jain, A. K. and Dubes, R. C., Algorithms for Clustering Data. 1988,
Englewood Cliffs, NJ: Prentice Hall. ISBN 013022278X
Types of Clustering Algorithms
• Hierarchical Methods
– Agglomerative Algorithms
– Divisive Algorithms
• Partitioning Methods
– Relocation Algorithms
– Probabilistic Clustering
– K-medoids Methods
– K-means Methods
– Density-Based Algorithms: Density-Based Connectivity Clustering, Density Functions Clustering
• Grid-Based Methods
• Clustering Algorithms Used in Machine Learning
– Gradient Descent and Artificial Neural Networks
– Evolutionary Methods
• Algorithms For High Dimensional Data
– Subspace Clustering
– Co-Clustering Techniques
– Projection Techniques
Classification vs. Clustering
Classification:
Supervised learning:
Learns a method for predicting the
instance class from pre-labeled
(classified) instances
Clustering:
Unsupervised learning:
Finds “natural” grouping of
instances given un-labeled data
The Distance Function
• Simplest case: one numeric attribute A
– Distance(X,Y) = |A(X) – A(Y)|
• Several numeric attributes:
– Distance(X,Y) = Euclidean distance between
X and Y
• Are all attributes equally important?
– Weighting the attributes might be necessary
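The three cases above can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed implementation: the function names and the weighting scheme (one non-negative weight per attribute, applied to the squared difference) are assumptions. The one-attribute case takes the absolute difference so that the distance is never negative.

```python
import math


def distance_1d(x, y):
    """Distance on a single numeric attribute: the absolute difference."""
    return abs(x - y)


def euclidean(x, y):
    """Euclidean distance between two equal-length numeric vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def weighted_euclidean(x, y, weights):
    """Euclidean distance where each attribute has its own weight (assumed scheme)."""
    return math.sqrt(sum(w * (a - b) ** 2 for w, a, b in zip(weights, x, y)))


print(distance_1d(3.0, 7.0))                                   # 4.0
print(euclidean((0.0, 0.0), (3.0, 4.0)))                       # 5.0
print(weighted_euclidean((0.0, 0.0), (3.0, 4.0), (1.0, 0.0)))  # 3.0
```

Setting a weight to 0 removes that attribute from the comparison, which is why the last call only sees the difference of 3 on the first attribute.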
Simple Clustering: K-means
Works with numeric data only
1) Pick a number (K) of cluster centers (at
random)
2) Assign every item to its nearest cluster
center (e.g. using Euclidean distance)
3) Move each cluster center to the mean of
its assigned items
4) Repeat steps 2,3 until convergence
(change in cluster assignments less than
a threshold)
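The four steps above can be sketched in plain Python as follows. This is a minimal sketch under stated assumptions: the function name `kmeans`, the `max_iter` and `seed` parameters, and the sample points are illustrative; initial centers are drawn from the data points themselves; and convergence is taken to mean that no assignment changed (a threshold of zero).

```python
import math
import random


def kmeans(points, k, max_iter=100, seed=0):
    """Return (centers, assignment) for a list of equal-length numeric tuples."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # step 1: pick K initial centers at random
    assignment = None
    for _ in range(max_iter):
        # Step 2: assign every item to its nearest center (Euclidean distance).
        new_assignment = [
            min(range(k), key=lambda j: math.dist(p, centers[j]))
            for p in points
        ]
        # Step 4: stop once the assignments no longer change.
        if new_assignment == assignment:
            break
        assignment = new_assignment
        # Step 3: move each center to the mean of its assigned items.
        for j in range(k):
            members = [p for p, a in zip(points, assignment) if a == j]
            if members:  # an empty cluster keeps its previous center
                centers[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return centers, assignment


# Hypothetical data: two well-separated groups of three points each.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
centers, labels = kmeans(points, k=2)
print(labels)
```

Which group receives label 0 depends on the random initial centers, so only the grouping itself is meaningful, not the label numbers.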
K-means example, step 1
[Figure: points plotted on X and Y axes with cluster centers k1, k2, k3]
Pick 3 initial cluster centers (randomly)
K-means example, step 2
Assign each point to the closest cluster center
K-means example, step 3
Move each cluster center to the mean of each cluster
K-means example, step 4
Reassign points closest to a different new cluster center
Q: Which points are reassigned?
K-means example, step 4 …
A: three points (shown with animation)
K-means example, step 4b
Re-compute cluster means
K-means example, step 5
Move cluster centers to cluster means
Pros and cons of K-Means
K-means variations
• K-medoids – instead of the mean, use the medoid (the most
centrally located object) of each cluster; for a single numeric
attribute this plays the role of the median
– Mean of 1, 3, 5, 7, 9 is 5
– Mean of 1, 3, 5, 7, 1009 is 205
– Median of 1, 3, 5, 7, 1009 is 5
– Median advantage: not affected by extreme
values
• For large databases, use sampling
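The arithmetic on this slide can be verified with Python's standard `statistics` module. The `medoid` helper is an illustrative assumption (not from the slides): it returns the member of the list with the smallest total distance to all other members, which is the notion K-medoids relies on.

```python
from statistics import mean, median


def medoid(values):
    """Member of `values` with the smallest total distance to all members."""
    return min(values, key=lambda c: sum(abs(c - v) for v in values))


clean = [1, 3, 5, 7, 9]
with_outlier = [1, 3, 5, 7, 1009]

print(mean(clean))           # equals 5
print(mean(with_outlier))    # equals 205: one extreme value drags the mean
print(median(with_outlier))  # equals 5: the median is unaffected
print(medoid(with_outlier))  # equals 5: so is the medoid
```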
k-Medoids
The k-Medoids Algorithm
Evaluating Cost of Swapping Medoids
Evaluating Cost of Swapping Medoids
Four Cases
Total Cost of Swap
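The figures for these slides are not reproduced in the text, so the following is only a sketch of the usual PAM-style formulation, not necessarily the slides' own derivation. It assumes the total cost of a set of medoids is the sum of each point's distance to its nearest medoid, and that the cost of a swap is the change in that total; the names `total_cost` and `swap_cost` and the sample data are hypothetical.

```python
def total_cost(points, medoids):
    """Sum of each point's distance to its nearest medoid (1-D data)."""
    return sum(min(abs(p - m) for m in medoids) for p in points)


def swap_cost(points, medoids, old, new):
    """Change in total cost if medoid `old` is replaced by non-medoid `new`."""
    swapped = [new if m == old else m for m in medoids]
    return total_cost(points, swapped) - total_cost(points, medoids)


# Hypothetical data with a poor initial choice of medoids.
points = [1, 2, 3, 10, 11, 12]
medoids = [1, 2]
cost = swap_cost(points, medoids, old=1, new=11)
print(cost)  # -24: a negative value means the swap lowers the total cost
```

Under this assumption a swap is accepted only when its cost is negative, and the search stops when no swap improves the total.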
K-means clustering summary
Advantages
• Simple, understandable
• Items automatically
assigned to clusters
Disadvantages
• Must pick number of
clusters beforehand
• All items forced into a
cluster
• Too sensitive to outliers,
since an object with an
extremely large value
may substantially
distort the distribution
of data
Hierarchical clustering
• Agglomerative Clustering
– Start with single-instance clusters
– At each step, join the two closest clusters
– Design decision: how to measure distance between clusters
• Divisive Clustering
– Start with one universal cluster
– Split it into two clusters
– Proceed recursively on each subset
– Can be very fast
• Both methods produce a dendrogram
[Figure: dendrogram with leaves g, a, c, i, e, d, k, b, j, f, h]
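The agglomerative procedure can be sketched as below. This is a minimal illustration on one-dimensional data: the slides leave the distance between clusters as a design decision, so single linkage (distance between the two closest members) is an assumption here, as are the function names and the sample values. The returned merge list records the order in which clusters were joined, which is the information a dendrogram draws.

```python
def single_linkage(c1, c2):
    """Distance between clusters = distance between their closest members."""
    return min(abs(a - b) for a in c1 for b in c2)


def agglomerative(values, linkage=single_linkage):
    """Merge clusters bottom-up; return the list of merges in order."""
    clusters = [[v] for v in values]  # start with single-instance clusters
    merges = []
    while len(clusters) > 1:
        # Find the pair of clusters with the smallest linkage distance.
        i, j = min(
            ((i, j) for i in range(len(clusters))
             for j in range(i + 1, len(clusters))),
            key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]),
        )
        merges.append((clusters[i], clusters[j]))
        merged = clusters[i] + clusters[j]
        # Replace the two joined clusters with their union.
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return merges


merges = agglomerative([1, 2, 6, 7, 20])
for left, right in merges:
    print(left, "+", right)
```

Swapping in a different `linkage` function (for example, the distance between the farthest members) changes which clusters are joined first without changing the rest of the loop.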
Partial Supervision of Clustering
[Figure: a two dimensional image of supervised clusters]
Partial Supervision of Clustering
[Figure: a two dimensional image of supervised clusters (real case)]
Partial Supervision of Clustering
[Figure: a two dimensional image of the different zones of overlapping clusters
that both claim a data point, labeled "Disputed Data Point". More than two
clusters claiming a point is also common.]
Research Problems
─ Effective and Efficient methods of Clustering
─ Scalability
─ Handling different types of data
─ Handling complex multidimensional data
─ Complex shapes of clusters
─ Subspace Clustering
─ Cluster overlapping etc.
Examples of Clustering Applications
• Marketing: discover customer groups and use
them for targeted marketing and re-organization
• Astronomy: find groups of similar stars and
galaxies
• Earthquake studies: observed earthquake
epicenters should be clustered along continental
faults
• Genomics: finding groups of genes with similar
expression
• …
Clustering Summary
• unsupervised
• many approaches
– K-means – simple, sometimes useful
• K-medoids is less sensitive to outliers
– Hierarchical clustering – works for symbolic
attributes
– Can be used to fill in missing values
