CloudClustering: Toward an Iterative Data Processing Pattern on the Cloud
Ankur Dave*, Wei Lu†, Jared Jackson†, Roger Barga†
*UC Berkeley  †Microsoft Research
Background
MapReduce and its successors reimplement communication and storage
But cloud services like EC2 and Azure natively provide these utilities
Can we build an efficient distributed application using only the primitives of the cloud?
Design Goals
Efficient
Fault tolerant
Cloud-native
Outline: Windows Azure · K-Means clustering algorithm · CloudClustering architecture · Data locality · Buddy system · Evaluation · Related work
Windows Azure
VM instances
Blob storage
Queue storage
Outline: Windows Azure · K-Means clustering algorithm · CloudClustering architecture · Data locality · Buddy system · Evaluation · Related work
K-Means algorithm
Iteratively groups 𝑛 points into 𝑘 clusters
Initialize → Assign points to centroids → Recalculate centroids → Points moved? (repeat until convergence)
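A minimal sketch of this loop in plain Python (illustrative only, not the CloudClustering implementation; helper names are made up):

```python
import random

def dist2(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean(pts):
    """Component-wise average of a non-empty list of points."""
    return tuple(sum(xs) / len(pts) for xs in zip(*pts))

def kmeans(points, k, tol=1e-9):
    """Initialize -> assign -> recalculate, repeating until centroids settle."""
    centroids = random.sample(points, k)                  # Initialize
    while True:
        clusters = [[] for _ in range(k)]
        for p in points:                                  # Assign points to centroids
            nearest = min(range(k), key=lambda i: dist2(p, centroids[i]))
            clusters[nearest].append(p)
        new = [mean(c) if c else centroids[i]             # Recalculate centroids
               for i, c in enumerate(clusters)]
        if all(dist2(a, b) <= tol for a, b in zip(centroids, new)):
            return new                                    # Points moved? No: done
        centroids = new
```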
K-Means algorithm (parallel)
Initialize → Partition work → Assign points to centroids → Compute partial sums → Recalculate centroids → Points moved? (repeat)
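The parallel variant splits the assignment step across workers, each of which returns per-centroid partial sums for the master to merge. A sketch under the same assumptions as the block above (reusing dist2 from it):

```python
def worker_step(partition, centroids):
    """Assign this worker's points; return (partial sums, counts) per centroid."""
    k, dim = len(centroids), len(centroids[0])
    sums = [[0.0] * dim for _ in range(k)]
    counts = [0] * k
    for p in partition:
        i = min(range(k), key=lambda i: dist2(p, centroids[i]))
        counts[i] += 1
        sums[i] = [s + x for s, x in zip(sums[i], p)]
    return sums, counts

def recalc_centroids(partials, old_centroids):
    """Master merges all workers' partial sums, then divides by the counts."""
    k, dim = len(old_centroids), len(old_centroids[0])
    total = [[0.0] * dim for _ in range(k)]
    count = [0] * k
    for sums, counts in partials:
        for i in range(k):
            count[i] += counts[i]
            total[i] = [a + b for a, b in zip(total[i], sums[i])]
    return [tuple(s / count[i] for s in total[i]) if count[i] else old_centroids[i]
            for i in range(k)]
```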
Outline: Windows Azure · K-Means clustering algorithm · CloudClustering architecture · Data locality · Buddy system · Evaluation · Related work
Architecture
Master: Initialize → Partition work; Workers: Assign points to centroids → Compute partial sums; Master: Recalculate centroids → Points moved?
Each worker receives a pointer to its partition and the centroid list {𝑐1, …, 𝑐𝑘}
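A sketch of the command/response exchange between master and workers, using in-process queues as stand-ins for Azure queue storage and reusing worker_step and recalc_centroids from the sketch above (the message fields and names are illustrative assumptions, not the actual CloudClustering API):

```python
import json
import queue

# Stand-ins for Azure queues: one task queue per worker, one shared response queue.
task_queues = {w: queue.Queue() for w in ("worker-0", "worker-1")}
response_queue = queue.Queue()

def master_iteration(centroids, partition_blobs):
    """Send each worker a pointer to its partition plus the current centroids."""
    for worker, blob in zip(task_queues, partition_blobs):
        task_queues[worker].put(json.dumps(
            {"partition": blob, "centroids": centroids}))
    # Collect one partial-sum message per worker, then recalculate the centroids.
    partials = [json.loads(response_queue.get()) for _ in task_queues]
    return recalc_centroids([(m["sums"], m["counts"]) for m in partials], centroids)

def worker_iteration(worker_id, load_partition):
    """Dequeue a command, compute partial sums over the partition, report back."""
    msg = json.loads(task_queues[worker_id].get())
    points = load_partition(msg["partition"])     # fetch points from blob storage
    sums, counts = worker_step(points, msg["centroids"])
    response_queue.put(json.dumps({"sums": sums, "counts": counts}))
```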
Outline: Windows Azure · K-Means clustering algorithm · CloudClustering architecture · Data locality · Buddy system · Evaluation · Related work
Data locality
Single-queue pattern is naturally fault-tolerant
Problem: Not suitable for iterative computation
Multiple-queue pattern unlocks data locality
Tradeoff: Complicates fault tolerance
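The payoff of the multiple-queue pattern is that the same partition returns to the same worker every iteration, so its points can be cached locally instead of re-downloaded. A sketch (download_blob is a hypothetical stand-in for an Azure blob fetch):

```python
def download_blob(blob_name):
    """Placeholder for a blob-storage download returning the partition's points."""
    raise NotImplementedError("fetch %s from blob storage here" % blob_name)

_partition_cache = {}

def load_partition(blob_name):
    """Download a partition once; serve every later iteration from the cache.

    This only helps because per-worker queues route the same partition back
    to the same worker each iteration; with a single shared queue, any worker
    might receive any partition and the cache would rarely hit.
    """
    if blob_name not in _partition_cache:
        _partition_cache[blob_name] = download_blob(blob_name)
    return _partition_cache[blob_name]
```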
Outline: Windows Azure · K-Means clustering algorithm · CloudClustering architecture · Data locality · Buddy system · Evaluation · Related work
Handling failure
Initialize → Partition work → Assign points to centroids → Compute partial sums → Recalculate centroids → Points moved?
If a worker fails, the pipeline stalls → Buddy system
Buddy system
The buddy system provides distributed fault detection and recovery (mechanism sketched below)
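As the speaker notes for this slide describe, buddies detect failure from the queue itself: a task that sits in a buddy's queue longer than a timeout means that worker has stalled, and the buddies dequeue and complete its work. A sketch, assuming a queue interface whose peek exposes each message's enqueue time (an assumption; Azure queues expose similar insertion-time metadata):

```python
import time

STALL_TIMEOUT = 60.0  # seconds a task may sit unclaimed before buddies intervene

def check_buddy(buddy_queue, now=None):
    """Poll one buddy's task queue; claim its oldest task if it has stalled.

    buddy_queue.peek() is assumed to return (task, enqueue_time) or None, and
    buddy_queue.take(task) to atomically remove the task for this worker.
    """
    now = time.time() if now is None else now
    head = buddy_queue.peek()
    if head is None:
        return None                       # queue empty: nothing is stalled
    task, enqueued_at = head
    if now - enqueued_at > STALL_TIMEOUT:
        return buddy_queue.take(task)     # buddy presumed failed: take over its task
    return None                           # task is fresh: buddy presumed alive
```

Each worker runs this check against every other member of its buddy group, which is why group size trades resilience against polling traffic.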
Buddy system
Spreading buddies across Azure fault domains (1, 2, 3, …) provides increased resilience to simultaneous failure
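One way to realize this placement is to interleave workers across fault domains before slicing them into buddy groups. This round-robin scheme is an illustrative guess at the allocation algorithm, not its published form:

```python
from itertools import zip_longest

def make_buddy_groups(workers_by_domain, group_size):
    """Form buddy groups so each group spans as many fault domains as possible.

    workers_by_domain maps a fault-domain id to the worker ids placed in it.
    """
    # Interleave: take one worker from each fault domain in turn.
    interleaved = [w for tier in zip_longest(*workers_by_domain.values())
                   for w in tier if w is not None]
    # Consecutive slices of the interleaved order now mix fault domains.
    return [interleaved[i:i + group_size]
            for i in range(0, len(interleaved), group_size)]

# e.g. make_buddy_groups({1: ["a", "b"], 2: ["c", "d"], 3: ["e", "f"]}, 3)
# -> [["a", "c", "e"], ["b", "d", "f"]]: each group touches all three domains.
```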
Buddy system
Cascaded failure detection (a ring of workers, each polling the next) reduces communication and improves resilience vs. disconnected buddy groups
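A sketch of the cascaded alternative: impose a total ordering on the workers and have each poll only its successor's queue, wrapping around at the end (names illustrative):

```python
def successor(workers, me):
    """Cascaded failure detection: each worker watches exactly one other.

    A shared total ordering turns the workers into a single cycle, so the
    detection graph stays connected with only one polling edge per worker.
    """
    ring = sorted(workers)               # same total ordering on every worker
    i = ring.index(me)
    return ring[(i + 1) % len(ring)]     # the only queue this worker polls

# e.g. successor(["w2", "w0", "w1"], "w1") -> "w2"; successor(..., "w2") -> "w0"
```

A buddy group of size g needs on the order of g·(g−1) polling edges per group, while the ring needs only one edge per worker, which is the traffic reduction the slide refers to.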
Outline: Windows Azure · K-Means clustering algorithm · CloudClustering architecture · Data locality · Buddy system · Evaluation · Related work
Evaluation
Linear speedup with instance count
Evaluation
Sublinear speedup with instance size
Reason: I/O bandwidth doesn’t scale
Outline: Windows Azure · K-Means clustering algorithm · CloudClustering architecture · Data locality · Buddy system · Evaluation · Related work
Related work
Existing frameworks (MapReduce, Dryad, Spark, MPI, …) all support K-Means, but reimplement reliable communication
AzureBlast is built directly on cloud services, but its algorithm is not iterative
Conclusions
CloudClustering shows that it is possible to build efficient, resilient applications using only common cloud services
The multiple-queue pattern unlocks data locality
The buddy system provides fault tolerance

CloudClustering: Toward an Iterative Data Processing Pattern on the Cloud

Editor's Notes

  • #2 Thanks for the introduction. Undergrad at UC Berkeley. Presenting CloudClustering: a data-intensive app on the cloud. Questions inline.
  • #3 Joint work with Wei Lu, Jared Jackson, and Roger Barga of MSR. Glad to have him in the audience.
  • #4 MapReduce, Dryad, Twister, HaLoop, Spark: useful abstractions for programming the cloud. MapReduce, Dryad: acyclic data flows. Successors: iterative. Communication and storage: sockets, replication. EC2, Azure provide this. Can we use only the “primitives of the cloud”? -> Simpler.
  • #5 Fast: data locality and caching. Resilient to failure. Use only existing cloud services -- reliable building blocks.
  • #6 What we will do: used Azure to build CloudClustering. Implements K-Means. CloudClustering architecture: building blocks -> efficient, fault-tolerant. Benchmarks of CloudClustering running on Azure. Related work.
  • #7 Windows/.NET-based cloud offering. Run user code on VM instances. Instance fails -> automatically recovered. -> Central blob storage, 3x replicated. Limited by network. -> Queue storage: small messages.
  • #8 K-Means: widely used, well-understood
  • #9 Iterative clustering algorithm. Groups n points into k clusters, n >> k. -> Initial set of cluster centers -> centroids. -> Points assigned to closest centroids. -> Centroids move to the average of their points. -> Repeats until convergence.
  • #10 Easy to parallelize. Master: initial centroids. -> Partition points, split across workers. -> On workers, same as before for each partition. -> All returned: averages, move centroids. -> Iterate until convergence.
  • #11 How it’s implemented
  • #12 Master supervises a pool of workers. -> Coordinate using queues. -> First, master loads input into blob storage. -> Generates initial centroids. -> Sends command to workers: ptr to partition, list of centroids. -> Workers do computation. -> Generate partial sums. -> All returned: master recalculates centroids. -> Next iteration.
  • #13 Two modifications for efficiency and fault tolerance. First: data locality.
  • #14 Conventional: single queue. Fault tolerance easy: all state in messages; failure -> throughput reduced. No good for iterative: no data locality. -> Multiple queues: unlocks data locality. But complicates fault tolerance.
  • #15 Efficiency with fault tolerance: buddy system
  • #16 Why is failure a problem? -> When a worker fails -> Can’t report to master -> Stall -> Buddy system.
  • #17 Conventional: heartbeat over sockets. Design goals: use cloud services. Insight: a worker is working on a task when it fails; the queue is reliable -> we have a record. Peer-to-peer: master assigns workers into buddy groups. -> Each polls the queues of the others. -> On failure -> tasks sit in the queue for longer than a timeout. -> Buddies dequeue the tasks and complete them. Group size: tradeoff between fault tolerance and network traffic. Large group -> resilient to simultaneous failures, but more polling traffic (charge per transaction).
  • #18 Fault domains: reduce the probability of simultaneous failure. Allocation algorithm spreads buddies across fault domains.
  • #19 Workers -> nodes in a graph. Edges -> polling the queue of another instance. Buddy system: disconnected cliques. -> Alternative: cascaded failure detection. Total ordering of workers; each polls the next, circular. Less traffic, better resilience. Future work.
  • #20 Performance
  • #21 Linear speedup as the number of instances increases. 16 instances and 1 GB of data.
  • #22 Sublinear speedup with faster machines, because I/O bandwidth doesn’t always improve. Use multiple threads, but I/O is the bottleneck.
  • #23 Related work
  • #24 All frameworks support K-Means. MapReduce, Dryad: no iteration, launch from a driver. Implement communication using sockets. AzureBlast: NCBI BLAST genetic tool on cloud services. Not iterative -> different challenges.
  • #25 Can build efficient, resilient applications with only cloud services. Helpful patterns: multiple queues, buddy system.