
- 1. K-Means Clustering with BSP. Thomas Jungblut, Testberichte.de, 2012. Study assignment, 4th semester, HWR Berlin.
- 2. Content: What is K-Means Clustering? What is BSP? K-Means with BSP.
- 3. What is K-Means Clustering?
- 4. What is K-Means Clustering?
- 6. What is K-Means Clustering? Unsupervised learning. Huge number of input vectors. k initial centers. Two-step iterative algorithm: assignment, then update.
- 7. How do we parallelize K-Means?
- 8. What is BSP? BSP = Bulk Synchronous Parallel. A paradigm for designing parallel algorithms. Two basic operations: send message and barrier synchronization.
- 9. What is BSP? [Diagram: processes P1, P2, P3; labels: Computation, Superstep, Sync, Communication, Sync.]
- 10. What is BSP? The computation phase queues messages. Between two barrier synchronizations, messages are exchanged in bulk. Messages from the previous superstep are available in the next superstep.
- 11. K-Means with BSP: Partition the dataset into equal-sized blocks.
- 12. K-Means with BSP: Put the centers into RAM on each process. Sum the assigned vectors into a new temporary center object. Iterate sequentially over the vectors on disk.
- 13. K-Means with BSP. [Diagram: six boxes labeled "Centers".]
- 14. K-Means with BSP: Sums per center. Center 1: sum = 25, summed 5 times. Center 2: sum = 50, summed 10 times. Center 3: sum = 10, summed 5 times.
- 15. K-Means with BSP: Send the sum. [Diagram: boxes labeled "Sum" and "Centers".]
- 17. K-Means with BSP: The same calculation runs on every process. Floating-point error can be corrected by synchronizing when it exceeds a given threshold. [Diagram: Centers; Sum (four times); Total sum; Divide by total increments; Means; New centers.]
- 18. K-Means with BSP. [Diagram labels: Update, Assignment, Sync.]
- 19. K-Means with BSP: Partition the vectors into equal-sized blocks (# blocks = # tasks). Put the centers in RAM. Assignment phase: iterate over the vectors on disk sequentially; sum up temporary centers with the assigned vectors; message all tasks with the sum and how often something was summed. Update phase: calculate the total sum over all received messages and average it; replace the old centers with the new centers and calculate convergence.
- 20. Benchmark: 16 servers, 256 cores, 10G network. 80 seconds! Possible starvation: add more servers.
- 21. Benchmark: Logarithmic scaling. Much better than the linear scaling of MapReduce.
- 22. Misc: Implementation on GitHub: https://github.com/thomasjungblut/thomasjungblut-common/blob/master/src/de/jungblut/clustering/KMeansBSP.java. Will be committed to Hama's ML package soon: https://issues.apache.org/jira/browse/HAMA-547.
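
The assignment and update phases summarized in slide 19 can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the author's Java implementation linked in slide 22, and it does not use Apache Hama: the supersteps are simulated in a single process, the barrier is modeled by collecting all messages into one shared inbox, and the names `assignment_phase`, `update_phase`, and `kmeans_bsp` are hypothetical. Convergence is checked here by how far the centers moved, which is one possible reading of "calculate convergence".

```python
from typing import List, Tuple

Vector = List[float]
# One message per (task, center): center index, partial sum, number of vectors summed.
Message = Tuple[int, Vector, int]


def squared_distance(a: Vector, b: Vector) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assignment_phase(block: List[Vector], centers: List[Vector]) -> List[Message]:
    """Local work of one task: sum the vectors assigned to each center."""
    dim = len(centers[0])
    sums = [[0.0] * dim for _ in centers]
    counts = [0] * len(centers)
    for vec in block:  # sequential pass over this task's block
        nearest = min(range(len(centers)),
                      key=lambda i: squared_distance(vec, centers[i]))
        sums[nearest] = [s + x for s, x in zip(sums[nearest], vec)]
        counts[nearest] += 1
    # Only centers that received at least one vector produce a message.
    return [(i, sums[i], counts[i]) for i in range(len(centers)) if counts[i] > 0]


def update_phase(inbox: List[Message], centers: List[Vector]) -> List[Vector]:
    """Same calculation on every task: total sum divided by total count."""
    dim = len(centers[0])
    totals = [[0.0] * dim for _ in centers]
    counts = [0] * len(centers)
    for i, partial_sum, n in inbox:
        totals[i] = [t + x for t, x in zip(totals[i], partial_sum)]
        counts[i] += n
    # Assumption: a center that received no vectors keeps its old position.
    return [
        [t / counts[i] for t in totals[i]] if counts[i] else list(centers[i])
        for i in range(len(centers))
    ]


def kmeans_bsp(blocks: List[List[Vector]], centers: List[Vector],
               max_supersteps: int = 50, tolerance: float = 1e-9) -> List[Vector]:
    centers = [list(c) for c in centers]
    for _ in range(max_supersteps):
        # Computation: every task processes its own block (# blocks = # tasks).
        outgoing = [assignment_phase(block, centers) for block in blocks]
        # Barrier: each task messages all tasks, so every inbox holds the same
        # messages; a single shared inbox models that here.
        inbox = [msg for task_msgs in outgoing for msg in task_msgs]
        new_centers = update_phase(inbox, centers)
        moved = max(squared_distance(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if moved <= tolerance:
            break
    return centers
```

Treating the slide 14 numbers as one-dimensional sums, `update_phase([(0, [25.0], 5), (1, [50.0], 10), (2, [10.0], 5)], [[0.0], [0.0], [0.0]])` yields the means 5, 5, and 2. Because every task receives the same messages, each one can compute the new centers locally without a central coordinator.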
