Fast and Provably Good Seedings for k-Means
O. Bachem, M. Lucic, S. Hassani, A. Krause
Presented by Kimikazu Kato,
Silver Egg Technology Co., Ltd.

Algorithm of k-Means clustering
• Determine initial centroids
• Update centroids and membership of clusters gradually
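This work improves the first part: determining the initial centroids
Existing results:
• k-means++: samples each initial centroid with probability proportional to its squared distance from the centroids already chosen
• Bachem et al. 2016: speeds up the seeding using MCMC, but makes assumptions about the distribution of the data
Proposed:
• Another MCMC-based algorithm that makes no assumptions about the distribution of the data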
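As a rough illustration of the two-part loop on this slide, here is a minimal pure-Python sketch. The function names (`squared_distance`, `lloyd`), the toy data, and the fixed iteration count are assumptions made for illustration and do not come from the slides. The initial centroids are picked uniformly at random here, which is the naive baseline that the seeding methods discussed in this talk aim to improve.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def lloyd(points, centroids, n_iter=10):
    """Alternate membership and centroid updates for a fixed number of rounds."""
    centroids = [tuple(c) for c in centroids]
    for _ in range(n_iter):
        # Membership update: assign each point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda j: squared_distance(p, centroids[j]))
            clusters[nearest].append(p)
        # Centroid update: move each centroid to the mean of its cluster.
        for j, members in enumerate(clusters):
            if members:  # an empty cluster keeps its previous centroid
                dim = len(members[0])
                centroids[j] = tuple(
                    sum(m[d] for m in members) / len(members)
                    for d in range(dim)
                )
    return centroids


# Toy data (hypothetical): two well-separated groups of three points.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]

# Naive seeding: pick k = 2 input points uniformly at random.
rng = random.Random(0)
initial = rng.sample(points, 2)
final = lloyd(points, initial)
print(final)
```

On data this cleanly separated the loop recovers from a poor start, but in general the quality of the final clustering depends on the initial centroids, which is why the seeding step gets its own algorithms.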
Outline

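Related research
k-means++
• Draw each new centroid $x$ from the input data with probability $p(x) = \frac{d(x, C)^2}{\sum_{x'} d(x', C)^2}$, where $C$ is the set of centroids which are already chosen and $d(x, C)$ is the distance from $x$ to the nearest centroid in $C$
• Intuition: choose initial centroids from the input data so that they scatter as widely as possible
Bachem et al. 2016
• Intended to overcome the shortcoming of k-means++: the cost of a full pass over the data for every new centroid
• Uses the Metropolis-Hastings algorithm, whose accept/reject step emulates the k-means++ distribution
• But it makes some assumptions about the input data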
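The k-means++ sampling rule can be sketched in a few lines of standard-library Python. This is a minimal illustration, not a reference implementation: the names `squared_distance` and `kmeanspp_seeding` and the toy data are hypothetical, and the fallback to uniform sampling when all weights are zero is an added safeguard.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeanspp_seeding(points, k, rng):
    """Pick k initial centroids from `points` by squared-distance sampling."""
    # The first centroid is drawn uniformly at random.
    centroids = [rng.choice(points)]
    # d2[i] is the squared distance from points[i] to its nearest centroid.
    d2 = [squared_distance(p, centroids[0]) for p in points]
    while len(centroids) < k:
        if sum(d2) == 0:
            # Every point coincides with a chosen centroid (assumed fallback).
            new = rng.choice(points)
        else:
            # Far-away points are proportionally more likely to be drawn.
            new = rng.choices(points, weights=d2, k=1)[0]
        centroids.append(new)
        # This update touches every point: one full pass per new centroid.
        d2 = [min(old, squared_distance(p, new)) for old, p in zip(d2, points)]
    return centroids


# Toy data (hypothetical): three small, well-separated groups.
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0), (10.0, 0.0), (10.0, 1.0)]
seeds = kmeanspp_seeding(points, 3, random.Random(0))
print(seeds)
```

The distance update inside the loop is where the full pass over the data shows up, and that pass is the cost the MCMC-based approach tries to avoid.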

Proposed Algorithm
Update from the preceding result (Bachem et al. 2016): the rejection criterion of the Metropolis-Hastings step
Convergence is mathematically proven.
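The slide gives no formulas, so the following is only a sketch of what a Metropolis-Hastings seeding with a data-dependent proposal could look like. The proposal `q` (half squared distance to the first centroid, half uniform), the chain length parameter, and all names (`mcmc_seeding`, `chain_length`) are assumptions for illustration. The acceptance test is written in cross-multiplied form to avoid dividing by zero.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def mcmc_seeding(points, k, chain_length, rng):
    """Approximate squared-distance sampling with short Markov chains."""
    n = len(points)
    first = rng.choice(points)
    centroids = [first]

    # Assumed proposal: one pass over the data, computed once up front.
    d2_first = [squared_distance(p, first) for p in points]
    total = sum(d2_first)
    if total > 0:
        q = [0.5 * d / total + 0.5 / n for d in d2_first]
    else:
        q = [1.0 / n] * n  # all points identical: fall back to uniform
    indices = range(n)

    def d2_to_centroids(i):
        # Squared distance from points[i] to its nearest chosen centroid.
        return min(squared_distance(points[i], c) for c in centroids)

    while len(centroids) < k:
        # Start a chain at a point drawn from the proposal.
        ix = rng.choices(indices, weights=q, k=1)[0]
        dx = d2_to_centroids(ix)
        for _ in range(chain_length - 1):
            iy = rng.choices(indices, weights=q, k=1)[0]
            dy = d2_to_centroids(iy)
            # Accept with probability min(1, dy * q[ix] / (dx * q[iy])).
            if dy * q[ix] > rng.random() * dx * q[iy]:
                ix, dx = iy, dy
        centroids.append(points[ix])
    return centroids


# Toy data (hypothetical): three small, well-separated groups.
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0), (10.0, 0.0), (10.0, 1.0)]
seeds = mcmc_seeding(points, 3, chain_length=50, rng=random.Random(0))
print(seeds)
```

After the single pass that builds `q`, each new centroid only evaluates distances for the `chain_length` points visited by the chain, instead of for every input point. A longer chain follows the target distribution more closely at higher cost, which is the accuracy and speed trade-off the talk refers to.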

Conclusion
• Novel algorithm for the initialization of centroids in k-means
• Theoretical guarantee on the convergence and on the trade-off between accuracy and speed
• Good experimental results
