Fast and Provably Good Seedings for k-Means


Material for a NIPS paper-reading meetup

Published in: Technology

  1. Fast and Provably Good Seedings for k-Means. O. Bachem, M. Lucic, S. Hassani, A. Krause. Presented by Kimikazu Kato, Silver Egg Technology Co., Ltd.
  2. Outline. k-means clustering has two steps: (1) determine the initial centroids, then (2) iteratively update the centroids and the cluster memberships. This work improves step (1). Existing results: k-means++ samples the initial centroids according to a distance-based distribution; Bachem et al. 2016 speed this up using MCMC, but make some assumptions about the distribution of the data. Proposed: another MCMC-based algorithm that makes no assumptions about the distribution of the data.
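  3. Related research. k-means++: draw each new centroid $x$ from the input data with probability $p(x) = d(x, C)^2 / \sum_{x'} d(x', C)^2$, where $C$ is the set of centroids already chosen and $d(x, C)$ is the distance from $x$ to its nearest centroid in $C$. Intuition: choose the initial centroids from the input data so that they are scattered as widely as possible. Bachem et al. 2016: intended to overcome the shortcoming of k-means++, namely the cost of normalization (the sum over all data points has to be recomputed for every new centroid). It uses the Metropolis-Hastings algorithm, whose accept/reject step emulates the target distribution, but it makes some assumptions about the input data.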
  4. Proposed algorithm. It updates the preceding result (Bachem et al. 2016) by changing the rejection criterion. Its convergence is mathematically proved.
  5. Experimental Results 1/3
  6. Experimental Results 2/3
  7. Experimental Results 3/3
  8. Conclusion: a novel algorithm for initializing the centroids in k-means; a theoretical guarantee on its convergence and on the trade-off between accuracy and speed; good experimental results.
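
To make the contrast between the two seeding styles in the slides concrete, here is a minimal sketch in plain Python. `kmeanspp_seeding` does exact distance-based sampling with one full pass over the data per new centroid. `mcmc_seeding` replaces that pass with a short Metropolis-Hastings chain. The function names, the `chain_length` parameter, and the uniform proposal are assumptions of this sketch. It is not the exact rule of either paper, and in particular it does not reproduce the modified rejection criterion of the proposed algorithm.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two points given as sequences."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def d2(x, centers):
    """Squared distance from x to its nearest already-chosen center."""
    return min(sq_dist(x, c) for c in centers)


def kmeanspp_seeding(points, k, rng):
    """Exact sampling: every new center needs a full pass over the data.

    Assumes `points` contains at least k distinct points; otherwise all
    weights become zero and random.choices raises ValueError.
    """
    centers = [rng.choice(points)]
    while len(centers) < k:
        # Weight of each point is its squared distance to the nearest center.
        weights = [d2(x, centers) for x in points]
        centers.append(rng.choices(points, weights=weights, k=1)[0])
    return centers


def mcmc_seeding(points, k, chain_length, rng):
    """Approximate sampling with a Metropolis-Hastings chain per center.

    Hypothetical sketch: candidates are proposed uniformly at random, so
    each new center costs about chain_length distance evaluations instead
    of one per data point.
    """
    centers = [rng.choice(points)]
    while len(centers) < k:
        x = rng.choice(points)
        dx = d2(x, centers)
        for _ in range(chain_length - 1):
            y = rng.choice(points)
            dy = d2(y, centers)
            # Accept the candidate with probability min(1, dy / dx).
            if dx == 0 or dy / dx > rng.random():
                x, dx = y, dy
        centers.append(x)
    return centers
```

A longer chain tracks the exact distribution more closely at a higher cost, which is the accuracy and speed trade-off mentioned in the conclusion. With a very short chain the sketch may return a duplicate centroid, since the chain can end on a point that is already a center.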