Ted Dunning describes an implementation of recent results that provide high-quality k-means clustering at very high speed. ...
"For well clusterable data, this algorithm provides good bounds on quality, but practically speaking, it makes clustering practical in many applications by providing roughly 3 orders of magnitude speedup relative to the standard algorithm based on Lloyd's initial efforts. In addition, the algorithm is highly amenable to implementation using map-reduce and shows essentially linear speedup.
Just as significant, this new algorithm allows clustering with a very large number of clusters which makes it practical to use as a feature extraction algorithm or set up for a nearest neighbor search." - Ted Dunning
