SlideShare a Scribd company logo
1 of 18
NADAR SARASWATHI COLLEGE OF ARTS
AND SCIENCE
Department of CS & IT
Clustering Techniques
PRESENTED BY:
S.SABTHAMI
I.M.Sc(IT)
What is clustering?
Clustering
The process of grouping a set of objects into
classes of similar objects
Documents within a cluster should be similar.
Documents from different clusters should be dissimilar.
The commonest form of unsupervised learning
Unsupervised learning = learning from raw data, as opposed
to supervised data where a classification of examples is
given
A common and important task that finds many
applications in IR and other places
A data set with clear cluster
structure
Applications of clustering in IR
• Whole corpus analysis/navigation
– Better user interface: search without typing
• For improving recall in search applications
– Better search results (like pseudo RF)
• For better navigation of search results
– Effective “user recall” will be higher
• For speeding up vector space retrieval
– Cluster-based retrieval gives faster search
For improving search recall
• Cluster hypothesis - Documents in the same cluster behave
similarly with respect to relevance to information needs
• Therefore, to improve search recall:
– Cluster docs in corpus a priori
– When a query matches a doc D, also return other docs in the
cluster containing D
• Hope if we do this: The query “car” will also return docs
containing automobile
– Because clustering grouped together docs containing car with
those containing automobile.
Issues for clustering
• Representation for clustering
– Document representation
• Vector space? Normalization?
– Centroids aren’t length normalized
– Need a notion of similarity/distance
• How many clusters?
– Fixed a priori?
– Completely data driven?
• Avoid “trivial” clusters - too large or small
– If a cluster's too large, then for navigation purposes you've
wasted an extra user click without whittling down the set of
documents much.
• Ideal: semantic similarity.
• Practical: term-statistical similarity
– We will use cosine similarity.
– Docs as vectors.
– For many algorithms, easier to think in terms
of a distance (rather than similarity) between
docs.
– We will mostly speak of Euclidean distance
• But real implementations use cosine similarity
Clustering Algorithms
• Flat algorithms
– Usually start with a random (partial) partitioning
– Refine it iteratively
• K means clustering
• (Model based clustering)
• Hierarchical algorithms
– Bottom-up, agglomerative
– (Top-down, divisive)
Hard vs. soft clustering
• Hard clustering: Each document belongs to exactly one
cluster
– More common and easier to do
• Soft clustering: A document can belong to more than one
cluster.
– Makes more sense for applications like creating browsable
hierarchies
– You may want to put a pair of sneakers in two clusters: (i)
sports apparel and (ii) shoes
– You can only do that with a soft clustering approach.
Partitioning Algorithms
• Partitioning method: Construct a partition of n
documents into a set of K clusters
• Given: a set of documents and the number K
• Find: a partition of K clusters that optimizes the
chosen partitioning criterion
– Globally optimal
• Intractable for many objective functions
• Ergo, exhaustively enumerate all partitions
– Effective heuristic methods: K-means and K-
medoids algorithms
K-Means
• Assumes documents are real-valued vectors.
• Clusters based on centroids (aka the center of
gravity or mean) of points in a cluster, c:
• Reassignment of instances to clusters is based
on distance to the current cluster centroids.
– (Or one can equivalently phrase it in terms of
similarities)
K-Means Algorithm
• Select K random docs {s1, s2,… sK} as seeds.
• Until clustering converges (or other stopping
criterion):
• For each doc di:
• Assign di to the cluster cj such that dist(xi, sj) is
minimal.
• (Next, update the seeds to the centroid of
each cluster)
• For each cluster cj
• sj = (cj)
Termination conditions
• Several possibilities, e.g.,
–A fixed number of iterations.
–Doc partition unchanged.
–Centroid positions don’t change.
Convergence
• Why should the K-means algorithm ever reach a
fixed point?
– A state in which clusters don’t change.
• K-means is a special case of a general procedure
known as the Expectation Maximization (EM)
algorithm.
– EM is known to converge.
– Number of iterations could be large.
– But in practice usually isn’t
Convergence of K-Means
• Define goodness measure of cluster k as sum of
squared distances from cluster centroid:
– Gk = Σi (di – ck)2 (sum over all di in cluster k)
• G = Σk Gk
• Reassignment monotonically decreases G since
each vector is assigned to the closest centroid.
Convergence of K-Means
• Recomputation monotonically decreases each Gk
since (mk is number of members in cluster k):
– Σ (di – a)2 reaches minimum for:
– Σ –2(di – a) = 0
– Σ di = Σ a
– mK a = Σ di
– a = (1/ mk) Σ di = ck
• K-means typically converges quickly
K-means issues
• Recomputing the centroid after every
assignment (rather than after all points are re-
assigned) can improve speed of convergence
of K-means
• Assumes clusters are spherical in vector space
– Sensitive to coordinate changes, weighting etc.
• Disjoint and exhaustive
– Doesn’t have a notion of “outliers” by default
– But can add outlier filtering
Data minig.pptx

More Related Content

Similar to Data minig.pptx

lecture12-clustering.ppt
lecture12-clustering.pptlecture12-clustering.ppt
lecture12-clustering.pptRajeshT305412
 
CLUSTERING IN DATA MINING.pdf
CLUSTERING IN DATA MINING.pdfCLUSTERING IN DATA MINING.pdf
CLUSTERING IN DATA MINING.pdfSowmyaJyothi3
 
26-Clustering MTech-2017.ppt
26-Clustering MTech-2017.ppt26-Clustering MTech-2017.ppt
26-Clustering MTech-2017.pptvikassingh569137
 
Mining the social web 6
Mining the social web 6Mining the social web 6
Mining the social web 6HyeonSeok Choi
 
Unsupervised learning Modi.pptx
Unsupervised learning Modi.pptxUnsupervised learning Modi.pptx
Unsupervised learning Modi.pptxssusere1fd42
 
machine learning - Clustering in R
machine learning - Clustering in Rmachine learning - Clustering in R
machine learning - Clustering in RSudhakar Chavan
 
Cassandra from the trenches: migrating Netflix
Cassandra from the trenches: migrating NetflixCassandra from the trenches: migrating Netflix
Cassandra from the trenches: migrating NetflixJason Brown
 
Unsupervised learning (clustering)
Unsupervised learning (clustering)Unsupervised learning (clustering)
Unsupervised learning (clustering)Pravinkumar Landge
 
Could a Data Science Program use Data Science Insights?
Could a Data Science Program use Data Science Insights?Could a Data Science Program use Data Science Insights?
Could a Data Science Program use Data Science Insights?Zachary Thomas
 

Similar to Data minig.pptx (20)

lecture12-clustering.ppt
lecture12-clustering.pptlecture12-clustering.ppt
lecture12-clustering.ppt
 
lecture12-clustering.ppt
lecture12-clustering.pptlecture12-clustering.ppt
lecture12-clustering.ppt
 
lecture12-clustering.ppt
lecture12-clustering.pptlecture12-clustering.ppt
lecture12-clustering.ppt
 
CLUSTERING IN DATA MINING.pdf
CLUSTERING IN DATA MINING.pdfCLUSTERING IN DATA MINING.pdf
CLUSTERING IN DATA MINING.pdf
 
Clustering
ClusteringClustering
Clustering
 
26-Clustering MTech-2017.ppt
26-Clustering MTech-2017.ppt26-Clustering MTech-2017.ppt
26-Clustering MTech-2017.ppt
 
Birch1
Birch1Birch1
Birch1
 
Mining the social web 6
Mining the social web 6Mining the social web 6
Mining the social web 6
 
Unsupervised learning Modi.pptx
Unsupervised learning Modi.pptxUnsupervised learning Modi.pptx
Unsupervised learning Modi.pptx
 
Clustering on DSS
Clustering on DSSClustering on DSS
Clustering on DSS
 
Chapter 5.pdf
Chapter 5.pdfChapter 5.pdf
Chapter 5.pdf
 
UNIT_V_Cluster Analysis.pptx
UNIT_V_Cluster Analysis.pptxUNIT_V_Cluster Analysis.pptx
UNIT_V_Cluster Analysis.pptx
 
machine learning - Clustering in R
machine learning - Clustering in Rmachine learning - Clustering in R
machine learning - Clustering in R
 
Clustering.pdf
Clustering.pdfClustering.pdf
Clustering.pdf
 
Clustering.pdf
Clustering.pdfClustering.pdf
Clustering.pdf
 
Cassandra from the trenches: migrating Netflix
Cassandra from the trenches: migrating NetflixCassandra from the trenches: migrating Netflix
Cassandra from the trenches: migrating Netflix
 
Unsupervised learning (clustering)
Unsupervised learning (clustering)Unsupervised learning (clustering)
Unsupervised learning (clustering)
 
Could a Data Science Program use Data Science Insights?
Could a Data Science Program use Data Science Insights?Could a Data Science Program use Data Science Insights?
Could a Data Science Program use Data Science Insights?
 
Machine Learning - Clustering
Machine Learning - ClusteringMachine Learning - Clustering
Machine Learning - Clustering
 
Clustering
ClusteringClustering
Clustering
 

More from SabthamiS1

women%20empowerment11.pptx
women%20empowerment11.pptxwomen%20empowerment11.pptx
women%20empowerment11.pptxSabthamiS1
 
big data analytics.pptx
big data analytics.pptxbig data analytics.pptx
big data analytics.pptxSabthamiS1
 
artificial intelligence.pptx
artificial intelligence.pptxartificial intelligence.pptx
artificial intelligence.pptxSabthamiS1
 
distributed computing.pptx
distributed computing.pptxdistributed computing.pptx
distributed computing.pptxSabthamiS1
 
Network and internet security
Network and internet security Network and internet security
Network and internet security SabthamiS1
 
Advance computer architecture
Advance computer architecture Advance computer architecture
Advance computer architecture SabthamiS1
 
Data structure and algorithm
Data structure and algorithmData structure and algorithm
Data structure and algorithmSabthamiS1
 

More from SabthamiS1 (12)

women%20empowerment11.pptx
women%20empowerment11.pptxwomen%20empowerment11.pptx
women%20empowerment11.pptx
 
big data analytics.pptx
big data analytics.pptxbig data analytics.pptx
big data analytics.pptx
 
iot.pptx
iot.pptxiot.pptx
iot.pptx
 
dip.pptx
dip.pptxdip.pptx
dip.pptx
 
csc.pptx
csc.pptxcsc.pptx
csc.pptx
 
python.pptx
python.pptxpython.pptx
python.pptx
 
artificial intelligence.pptx
artificial intelligence.pptxartificial intelligence.pptx
artificial intelligence.pptx
 
distributed computing.pptx
distributed computing.pptxdistributed computing.pptx
distributed computing.pptx
 
Network and internet security
Network and internet security Network and internet security
Network and internet security
 
Java
Java Java
Java
 
Advance computer architecture
Advance computer architecture Advance computer architecture
Advance computer architecture
 
Data structure and algorithm
Data structure and algorithmData structure and algorithm
Data structure and algorithm
 

Recently uploaded

Pharmacognosy Flower 3. Compositae 2023.pdf
Pharmacognosy Flower 3. Compositae 2023.pdfPharmacognosy Flower 3. Compositae 2023.pdf
Pharmacognosy Flower 3. Compositae 2023.pdfMahmoud M. Sallam
 
Blooming Together_ Growing a Community Garden Worksheet.docx
Blooming Together_ Growing a Community Garden Worksheet.docxBlooming Together_ Growing a Community Garden Worksheet.docx
Blooming Together_ Growing a Community Garden Worksheet.docxUnboundStockton
 
Framing an Appropriate Research Question 6b9b26d93da94caf993c038d9efcdedb.pdf
Framing an Appropriate Research Question 6b9b26d93da94caf993c038d9efcdedb.pdfFraming an Appropriate Research Question 6b9b26d93da94caf993c038d9efcdedb.pdf
Framing an Appropriate Research Question 6b9b26d93da94caf993c038d9efcdedb.pdfUjwalaBharambe
 
Crayon Activity Handout For the Crayon A
Crayon Activity Handout For the Crayon ACrayon Activity Handout For the Crayon A
Crayon Activity Handout For the Crayon AUnboundStockton
 
Alper Gobel In Media Res Media Component
Alper Gobel In Media Res Media ComponentAlper Gobel In Media Res Media Component
Alper Gobel In Media Res Media ComponentInMediaRes1
 
Capitol Tech U Doctoral Presentation - April 2024.pptx
Capitol Tech U Doctoral Presentation - April 2024.pptxCapitol Tech U Doctoral Presentation - April 2024.pptx
Capitol Tech U Doctoral Presentation - April 2024.pptxCapitolTechU
 
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...Marc Dusseiller Dusjagr
 
How to Make a Pirate ship Primary Education.pptx
How to Make a Pirate ship Primary Education.pptxHow to Make a Pirate ship Primary Education.pptx
How to Make a Pirate ship Primary Education.pptxmanuelaromero2013
 
Solving Puzzles Benefits Everyone (English).pptx
Solving Puzzles Benefits Everyone (English).pptxSolving Puzzles Benefits Everyone (English).pptx
Solving Puzzles Benefits Everyone (English).pptxOH TEIK BIN
 
Hierarchy of management that covers different levels of management
Hierarchy of management that covers different levels of managementHierarchy of management that covers different levels of management
Hierarchy of management that covers different levels of managementmkooblal
 
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPTECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPTiammrhaywood
 
EPANDING THE CONTENT OF AN OUTLINE using notes.pptx
EPANDING THE CONTENT OF AN OUTLINE using notes.pptxEPANDING THE CONTENT OF AN OUTLINE using notes.pptx
EPANDING THE CONTENT OF AN OUTLINE using notes.pptxRaymartEstabillo3
 
Earth Day Presentation wow hello nice great
Earth Day Presentation wow hello nice greatEarth Day Presentation wow hello nice great
Earth Day Presentation wow hello nice greatYousafMalik24
 
Like-prefer-love -hate+verb+ing & silent letters & citizenship text.pdf
Like-prefer-love -hate+verb+ing & silent letters & citizenship text.pdfLike-prefer-love -hate+verb+ing & silent letters & citizenship text.pdf
Like-prefer-love -hate+verb+ing & silent letters & citizenship text.pdfMr Bounab Samir
 
Final demo Grade 9 for demo Plan dessert.pptx
Final demo Grade 9 for demo Plan dessert.pptxFinal demo Grade 9 for demo Plan dessert.pptx
Final demo Grade 9 for demo Plan dessert.pptxAvyJaneVismanos
 
Employee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptxEmployee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptxNirmalaLoungPoorunde1
 
What is Model Inheritance in Odoo 17 ERP
What is Model Inheritance in Odoo 17 ERPWhat is Model Inheritance in Odoo 17 ERP
What is Model Inheritance in Odoo 17 ERPCeline George
 

Recently uploaded (20)

Pharmacognosy Flower 3. Compositae 2023.pdf
Pharmacognosy Flower 3. Compositae 2023.pdfPharmacognosy Flower 3. Compositae 2023.pdf
Pharmacognosy Flower 3. Compositae 2023.pdf
 
Blooming Together_ Growing a Community Garden Worksheet.docx
Blooming Together_ Growing a Community Garden Worksheet.docxBlooming Together_ Growing a Community Garden Worksheet.docx
Blooming Together_ Growing a Community Garden Worksheet.docx
 
9953330565 Low Rate Call Girls In Rohini Delhi NCR
9953330565 Low Rate Call Girls In Rohini  Delhi NCR9953330565 Low Rate Call Girls In Rohini  Delhi NCR
9953330565 Low Rate Call Girls In Rohini Delhi NCR
 
Framing an Appropriate Research Question 6b9b26d93da94caf993c038d9efcdedb.pdf
Framing an Appropriate Research Question 6b9b26d93da94caf993c038d9efcdedb.pdfFraming an Appropriate Research Question 6b9b26d93da94caf993c038d9efcdedb.pdf
Framing an Appropriate Research Question 6b9b26d93da94caf993c038d9efcdedb.pdf
 
Crayon Activity Handout For the Crayon A
Crayon Activity Handout For the Crayon ACrayon Activity Handout For the Crayon A
Crayon Activity Handout For the Crayon A
 
Alper Gobel In Media Res Media Component
Alper Gobel In Media Res Media ComponentAlper Gobel In Media Res Media Component
Alper Gobel In Media Res Media Component
 
Capitol Tech U Doctoral Presentation - April 2024.pptx
Capitol Tech U Doctoral Presentation - April 2024.pptxCapitol Tech U Doctoral Presentation - April 2024.pptx
Capitol Tech U Doctoral Presentation - April 2024.pptx
 
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
 
Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝
 
How to Make a Pirate ship Primary Education.pptx
How to Make a Pirate ship Primary Education.pptxHow to Make a Pirate ship Primary Education.pptx
How to Make a Pirate ship Primary Education.pptx
 
Solving Puzzles Benefits Everyone (English).pptx
Solving Puzzles Benefits Everyone (English).pptxSolving Puzzles Benefits Everyone (English).pptx
Solving Puzzles Benefits Everyone (English).pptx
 
Hierarchy of management that covers different levels of management
Hierarchy of management that covers different levels of managementHierarchy of management that covers different levels of management
Hierarchy of management that covers different levels of management
 
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPTECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT
 
OS-operating systems- ch04 (Threads) ...
OS-operating systems- ch04 (Threads) ...OS-operating systems- ch04 (Threads) ...
OS-operating systems- ch04 (Threads) ...
 
EPANDING THE CONTENT OF AN OUTLINE using notes.pptx
EPANDING THE CONTENT OF AN OUTLINE using notes.pptxEPANDING THE CONTENT OF AN OUTLINE using notes.pptx
EPANDING THE CONTENT OF AN OUTLINE using notes.pptx
 
Earth Day Presentation wow hello nice great
Earth Day Presentation wow hello nice greatEarth Day Presentation wow hello nice great
Earth Day Presentation wow hello nice great
 
Like-prefer-love -hate+verb+ing & silent letters & citizenship text.pdf
Like-prefer-love -hate+verb+ing & silent letters & citizenship text.pdfLike-prefer-love -hate+verb+ing & silent letters & citizenship text.pdf
Like-prefer-love -hate+verb+ing & silent letters & citizenship text.pdf
 
Final demo Grade 9 for demo Plan dessert.pptx
Final demo Grade 9 for demo Plan dessert.pptxFinal demo Grade 9 for demo Plan dessert.pptx
Final demo Grade 9 for demo Plan dessert.pptx
 
Employee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptxEmployee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptx
 
What is Model Inheritance in Odoo 17 ERP
What is Model Inheritance in Odoo 17 ERPWhat is Model Inheritance in Odoo 17 ERP
What is Model Inheritance in Odoo 17 ERP
 

Data minig.pptx

  • 1. NADAR SARASWATHI COLLEGE OF ARTS AND SCIENCE Department of CS & IT Clustering Techniques PRESENTED BY: S.SABTHAMI I.M.Sc(IT)
  • 2. What is clustering? Clustering The process of grouping a set of objects into classes of similar objects Documents within a cluster should be similar. Documents from different clusters should be dissimilar. The commonest form of unsupervised learning Unsupervised learning = learning from raw data, as opposed to supervised data where a classification of examples is given A common and important task that finds many applications in IR and other places
  • 3. A data set with clear cluster structure
  • 4. Applications of clustering in IR • Whole corpus analysis/navigation – Better user interface: search without typing • For improving recall in search applications – Better search results (like pseudo RF) • For better navigation of search results – Effective “user recall” will be higher • For speeding up vector space retrieval – Cluster-based retrieval gives faster search
  • 5. For improving search recall • Cluster hypothesis - Documents in the same cluster behave similarly with respect to relevance to information needs • Therefore, to improve search recall: – Cluster docs in corpus a priori – When a query matches a doc D, also return other docs in the cluster containing D • Hope if we do this: The query “car” will also return docs containing automobile – Because clustering grouped together docs containing car with those containing automobile.
  • 6. Issues for clustering • Representation for clustering – Document representation • Vector space? Normalization? – Centroids aren’t length normalized – Need a notion of similarity/distance • How many clusters? – Fixed a priori? – Completely data driven? • Avoid “trivial” clusters - too large or small – If a cluster's too large, then for navigation purposes you've wasted an extra user click without whittling down the set of documents much.
  • 7. • Ideal: semantic similarity. • Practical: term-statistical similarity – We will use cosine similarity. – Docs as vectors. – For many algorithms, easier to think in terms of a distance (rather than similarity) between docs. – We will mostly speak of Euclidean distance • But real implementations use cosine similarity
  • 8. Clustering Algorithms • Flat algorithms – Usually start with a random (partial) partitioning – Refine it iteratively • K means clustering • (Model based clustering) • Hierarchical algorithms – Bottom-up, agglomerative – (Top-down, divisive)
  • 9. Hard vs. soft clustering • Hard clustering: Each document belongs to exactly one cluster – More common and easier to do • Soft clustering: A document can belong to more than one cluster. – Makes more sense for applications like creating browsable hierarchies – You may want to put a pair of sneakers in two clusters: (i) sports apparel and (ii) shoes – You can only do that with a soft clustering approach.
  • 10. Partitioning Algorithms • Partitioning method: Construct a partition of n documents into a set of K clusters • Given: a set of documents and the number K • Find: a partition of K clusters that optimizes the chosen partitioning criterion – Globally optimal • Intractable for many objective functions • Ergo, exhaustively enumerate all partitions – Effective heuristic methods: K-means and K- medoids algorithms
  • 11. K-Means • Assumes documents are real-valued vectors. • Clusters based on centroids (aka the center of gravity or mean) of points in a cluster, c: • Reassignment of instances to clusters is based on distance to the current cluster centroids. – (Or one can equivalently phrase it in terms of similarities)
  • 12. K-Means Algorithm • Select K random docs {s1, s2,… sK} as seeds. • Until clustering converges (or other stopping criterion): • For each doc di: • Assign di to the cluster cj such that dist(xi, sj) is minimal. • (Next, update the seeds to the centroid of each cluster) • For each cluster cj • sj = (cj)
  • 13. Termination conditions • Several possibilities, e.g., –A fixed number of iterations. –Doc partition unchanged. –Centroid positions don’t change.
  • 14. Convergence • Why should the K-means algorithm ever reach a fixed point? – A state in which clusters don’t change. • K-means is a special case of a general procedure known as the Expectation Maximization (EM) algorithm. – EM is known to converge. – Number of iterations could be large. – But in practice usually isn’t
  • 15. Convergence of K-Means • Define goodness measure of cluster k as sum of squared distances from cluster centroid: – Gk = Σi (di – ck)2 (sum over all di in cluster k) • G = Σk Gk • Reassignment monotonically decreases G since each vector is assigned to the closest centroid.
  • 16. Convergence of K-Means • Recomputation monotonically decreases each Gk since (mk is number of members in cluster k): – Σ (di – a)2 reaches minimum for: – Σ –2(di – a) = 0 – Σ di = Σ a – mK a = Σ di – a = (1/ mk) Σ di = ck • K-means typically converges quickly
  • 17. K-means issues • Recomputing the centroid after every assignment (rather than after all points are re- assigned) can improve speed of convergence of K-means • Assumes clusters are spherical in vector space – Sensitive to coordinate changes, weighting etc. • Disjoint and exhaustive – Doesn’t have a notion of “outliers” by default – But can add outlier filtering