6.
What is K-Means Clustering?
• Unsupervised learning
• Huge number of input vectors
• k initial centers
• Two-step iterative algorithm: assignment, update
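The two steps can be sketched on a single machine as follows. This is a minimal illustration, not the code from the talk: the function names `assign`, `update` and `kmeans`, the Euclidean distance, and the rule of leaving an empty center where it is are all assumptions.

```python
import math

def assign(vectors, centers):
    """Assignment step: index of the nearest center for every vector."""
    return [min(range(len(centers)), key=lambda c: math.dist(v, centers[c]))
            for v in vectors]

def update(vectors, assignment, centers):
    """Update step: every center becomes the mean of its assigned vectors."""
    new_centers = []
    for c, old in enumerate(centers):
        members = [v for v, a in zip(vectors, assignment) if a == c]
        if not members:
            new_centers.append(tuple(old))  # assumption: an empty center stays put
            continue
        new_centers.append(tuple(sum(m[d] for m in members) / len(members)
                                 for d in range(len(old))))
    return new_centers

def kmeans(vectors, centers, max_iterations=100):
    """Alternate both steps until the centers stop changing."""
    centers = [tuple(c) for c in centers]
    for _ in range(max_iterations):
        new_centers = update(vectors, assign(vectors, centers), centers)
        if new_centers == centers:
            break
        centers = new_centers
    return centers
```

Calling `kmeans(vectors, initial_centers)` with a list of equal-length tuples returns the final centers as a list of tuples.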
8.
What is BSP?
• BSP = Bulk Synchronous Parallel
• Paradigm to design parallel algorithms
• Two basic operations: send message, barrier synchronization
9.
What is BSP?
[Diagram: processes P1, P2 and P3 in one superstep: computation, sync, communication, sync]
10.
What is BSP?
• The computation phase queues messages
• Between two barrier synchronizations, messages are exchanged in bulk
• Messages from the previous superstep are available in the next superstep
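The message semantics can be modelled with a toy, single-process simulator. This is only a sketch: the class `BspSimulator` and its methods are hypothetical names for illustration and are not the Apache Hama API.

```python
class BspSimulator:
    """Toy model of BSP messaging between peers (hypothetical, not Hama)."""

    def __init__(self, num_peers):
        self.inbox = [[] for _ in range(num_peers)]   # readable in this superstep
        self.outbox = [[] for _ in range(num_peers)]  # queued, not yet delivered

    def send(self, to_peer, message):
        # Sending only queues the message; nothing is delivered yet.
        self.outbox[to_peer].append(message)

    def sync(self):
        # Barrier: all queued messages are delivered in bulk,
        # and messages from the previous superstep are dropped.
        self.inbox = self.outbox
        self.outbox = [[] for _ in self.inbox]

    def received(self, peer):
        return list(self.inbox[peer])


sim = BspSimulator(3)
for peer in range(3):
    sim.send((peer + 1) % 3, f"hello from P{peer + 1}")
before_sync = sim.received(0)  # still empty: messages are only queued
sim.sync()
after_sync = sim.received(0)   # now holds the message sent by P3
```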
11.
K-Means with BSP
• Partition the dataset into equal-sized blocks
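A minimal sketch of such a partitioning, assuming the dataset fits in a Python list. The helper name `partition` is hypothetical, and so is the rule of giving the leftover vectors to the first blocks when the size is not evenly divisible.

```python
def partition(vectors, num_tasks):
    """Split vectors into num_tasks contiguous blocks whose sizes differ by at most one."""
    base, extra = divmod(len(vectors), num_tasks)
    blocks, start = [], 0
    for task in range(num_tasks):
        size = base + (1 if task < extra else 0)  # first `extra` blocks get one more
        blocks.append(vectors[start:start + size])
        start += size
    return blocks
```

For ten vectors and three tasks this yields blocks of sizes 4, 3 and 3.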
12.
K-Means with BSP
• Put centers into RAM on each process
• Iterate sequentially over vectors on disk
• Sum assigned vectors to a new temporary center object
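The local pass of one process might look like the sketch below. It assumes the block is any iterable of equal-length numeric tuples (standing in for a sequential read from disk) and uses Euclidean distance. The name `local_assignment` is hypothetical.

```python
import math

def local_assignment(block, centers):
    """One sequential pass over a block: per-center partial sums and counts."""
    dim = len(centers[0])
    sums = [[0.0] * dim for _ in centers]  # temporary center objects
    counts = [0] * len(centers)            # how often each center was summed
    for vector in block:
        nearest = min(range(len(centers)),
                      key=lambda c: math.dist(vector, centers[c]))
        for d in range(dim):
            sums[nearest][d] += vector[d]
        counts[nearest] += 1
    return sums, counts
```

Only the centers, the sums and the counts are held in memory. The vectors themselves are consumed one at a time.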
13.
K-Means with BSP
[Diagram: centers replicated on each process]
14.
K-Means with BSP
Sums per center:
• Center 1: sum = 25, summed 5 times
• Center 2: sum = 50, summed 10 times
• Center 3: sum = 10, summed 5 times
15.
K-Means with BSP
• Send the sum
[Diagram: each process holds its centers and a sum, and sends the sum to the other processes]
17.
K-Means with BSP
• The same calculation on every process
• Floating point error can be corrected by synchronizing when it exceeds a given threshold
[Diagram: the received sums are added up to a total sum, which is divided by the total increments to give the means, i.e. the new centers]
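The update calculation can be sketched as below. The message layout `(center_index, partial_sum, count)` and the function name `update_centers` are assumptions made for illustration. A center that received no vectors is left unchanged, which is also an assumption.

```python
def update_centers(old_centers, messages):
    """Add up all received partial sums and divide by the total increments."""
    dim = len(old_centers[0])
    total_sums = [[0.0] * dim for _ in old_centers]
    total_counts = [0] * len(old_centers)
    for center_index, partial_sum, count in messages:
        for d in range(dim):
            total_sums[center_index][d] += partial_sum[d]
        total_counts[center_index] += count
    new_centers = []
    for old, total, n in zip(old_centers, total_sums, total_counts):
        # Assumption: a center without assigned vectors keeps its old position.
        new_centers.append(tuple(x / n for x in total) if n else tuple(old))
    return new_centers
```

As a usage example, take one process that sends the one-dimensional sums 25 (5 times), 50 (10 times) and 10 (5 times), and a second, made-up process that sends 15 (3 times), 30 (10 times) and 20 (5 times). The new centers are then 40/8 = 5.0, 80/20 = 4.0 and 30/10 = 3.0.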
19.
K-Means with BSP
• Partition vectors into equal-sized blocks (# blocks = # tasks)
• Put centers in RAM
Assignment phase
• Iterate over vectors on disk sequentially
• Sum up temporary centers with assigned vectors
• Message all tasks with the sum and how often something was summed
Update phase
• Calculate the total sum over all received messages and average
• Replace old centers with new centers and calculate convergence
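Put together, the whole loop can be simulated in one process, as in the sketch below. It is not the Hama implementation: the names `run_task` and `kmeans_bsp` are hypothetical, tasks run one after another instead of in parallel, strided slices stand in for the on-disk blocks, and convergence is assumed to mean that no center moved farther than `tolerance`.

```python
import math

def run_task(block, centers):
    """Assignment phase of one task: partial sums and counts per center."""
    sums = [[0.0] * len(centers[0]) for _ in centers]
    counts = [0] * len(centers)
    for v in block:
        c = min(range(len(centers)), key=lambda i: math.dist(v, centers[i]))
        sums[c] = [s + x for s, x in zip(sums[c], v)]
        counts[c] += 1
    return sums, counts

def kmeans_bsp(vectors, centers, num_tasks, tolerance=1e-9, max_supersteps=100):
    centers = [tuple(c) for c in centers]
    # Assumption: strided slices as near-equal blocks, one block per task.
    blocks = [vectors[t::num_tasks] for t in range(num_tasks)]
    for _ in range(max_supersteps):
        # Assignment phase: each task's result is the message it sends to all tasks.
        messages = [run_task(block, centers) for block in blocks]
        # (A real BSP job would hit the barrier synchronization here.)
        # Update phase: identical on every task, so it is computed once.
        new_centers = []
        for c, old in enumerate(centers):
            total_count = sum(counts[c] for _, counts in messages)
            if total_count == 0:
                new_centers.append(old)  # assumption: empty center stays put
                continue
            total = [sum(col) for col in zip(*(sums[c] for sums, _ in messages))]
            new_centers.append(tuple(x / total_count for x in total))
        moved = max(math.dist(o, n) for o, n in zip(centers, new_centers))
        centers = new_centers
        if moved <= tolerance:
            break
    return centers
```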
20.
Benchmark
• 16 servers, 256 cores, 10G network
• 80 seconds!
• Possible starvation: add more servers
21.
Benchmark
• Logarithmic scaling
• Much better than the linear scaling of MapReduce
22.
Misc
• Implementation on GitHub: https://github.com/thomasjungblut/thomasjungblut-common/blob/master/src/de/jungblut/clustering/KMeansBSP.java
• Will be committed to Hama's ML package soon: https://issues.apache.org/jira/browse/HAMA-547