  • 1. Apache Hama: Bulk Synchronous Parallel Computing
      Edward J. Yoon <edwardyoon@apache.org>
  • 2. Who Am I
      • Edward J. Yoon – @eddieyoon
      • Founder of Apache Hama
      • PMC member of Apache BigTop
      • Oracle Employee
  • 3. What’s Hama?
      • Open Source – Under Apache 2.0 License
      • Written in Java
      • Apache Top Level Project
  • 4. Characteristics
      • A general BSP computing engine
          – M/R-like input/output formatters: SequenceFile, Text, Accumulo, HBase, etc.
          – Job manager
          – Checkpoint recovery
      • Streaming and Pipes
          – Python, C++, etc.
      • Graph and machine learning packages
          – K-means, Gradient Descent, Collaborative Filtering
  • 5. Bulk Synchronous Parallel?
      • Originally introduced by Valiant
      • A sequence of supersteps
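
To make the "sequence of supersteps" idea concrete, here is a minimal plain-Java sketch that does not use Hama at all. Each thread stands in for a BSP peer: in every superstep it consumes the messages delivered in the previous superstep, then sends its value to a neighbour, then waits at a barrier. The class name `SuperstepSketch`, the ring topology, and the "add incoming values" rule are all assumptions made for illustration. The extra barrier between reading and sending is needed only because this simulation shares memory between peers.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.CyclicBarrier;

public class SuperstepSketch {

    /** Runs `peers` threads for `supersteps` rounds and returns each peer's final value. */
    public static int[] run(int peers, int supersteps) {
        // inbox.get(i) holds the messages peer i will read in its next superstep
        List<ConcurrentLinkedQueue<Integer>> inbox = new ArrayList<>();
        for (int i = 0; i < peers; i++) {
            inbox.add(new ConcurrentLinkedQueue<>());
        }
        CyclicBarrier barrier = new CyclicBarrier(peers);
        int[] value = new int[peers];
        Thread[] threads = new Thread[peers];

        for (int p = 0; p < peers; p++) {
            final int id = p;
            value[id] = id; // hypothetical initial state: each peer starts with its own id
            threads[p] = new Thread(() -> {
                try {
                    for (int s = 0; s < supersteps; s++) {
                        // 1. Local computation on messages from the previous superstep
                        Integer m;
                        while ((m = inbox.get(id).poll()) != null) {
                            value[id] += m;
                        }
                        barrier.await(); // simulation-only: everyone finishes reading first
                        // 2. Communication: send the current value to the right-hand neighbour
                        inbox.get((id + 1) % peers).add(value[id]);
                        // 3. Barrier synchronisation: the superstep ends for all peers together
                        barrier.await();
                    }
                } catch (Exception e) {
                    throw new RuntimeException(e);
                }
            });
        }
        for (Thread t : threads) {
            t.start();
        }
        try {
            for (Thread t : threads) {
                t.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new RuntimeException(e);
        }
        return value;
    }
}
```

With three peers, `SuperstepSketch.run(3, 2)` returns `[2, 1, 3]`: nothing is received in the first superstep, and in the second each peer adds the initial value of its left-hand neighbour.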
  • 6. Compared to M/R and MPI
      • Supports a message-passing style of application development
      • Provides a small, flexible, simple, and easy-to-use API
      • Can perform better than MPI for communication-intensive applications
      • Guarantees that deadlocks and collisions cannot occur in the communication mechanism
  • 7. So, fit for what?
      • Processing Big Data with complicated relationships
          – e.g., graph or network
      • Iterative or recursive scientific applications
      • Continuous event processing
  • 8. Which is the Big Data?
  • 9. Could be applied to
      • Analyze user actions and patterns
      • Social target marketing
      • Observe the evolution of social networks
      • Detect anomalies rapidly in real time
      • Business intelligence
  • 10. Internals
      • Pluggable RPC architecture for message transfer
          – e.g., Hadoop RPC, Avro RPC, etc.
      • Message Collector, Bundler, and Compressor to reduce network overhead and contention
          – e.g., Snappy, Bzip2, etc.
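
The bundling and compression idea can be sketched in plain Java as follows. This is not Hama's implementation: the class `MessageBundlerSketch` and its methods are hypothetical, messages are simplified to strings, and GZIP from `java.util.zip` stands in for the Snappy or Bzip2 codecs named on the slide because it ships with the JDK. The point is that many small messages to the same peer become one compressed payload per destination.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class MessageBundlerSketch {
    // Collector: outgoing messages grouped by destination peer
    private final Map<String, List<String>> outgoing = new LinkedHashMap<>();

    /** Queues a message locally instead of sending it over the network right away. */
    public void send(String peer, String message) {
        outgoing.computeIfAbsent(peer, k -> new ArrayList<>()).add(message);
    }

    /** Bundler + compressor: produces one compressed payload per destination peer. */
    public Map<String, byte[]> flush() {
        Map<String, byte[]> bundles = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> e : outgoing.entrySet()) {
            // Newline-joined text is an assumption of this sketch, not a wire format
            byte[] raw = String.join("\n", e.getValue()).getBytes(StandardCharsets.UTF_8);
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            try (GZIPOutputStream gz = new GZIPOutputStream(buf)) {
                gz.write(raw);
            } catch (IOException ex) {
                throw new UncheckedIOException(ex);
            }
            bundles.put(e.getKey(), buf.toByteArray());
        }
        outgoing.clear();
        return bundles;
    }

    /** Receiver side: decompresses a bundle back into its newline-joined messages. */
    public static String unpack(byte[] bundle) {
        try (GZIPInputStream gz = new GZIPInputStream(new ByteArrayInputStream(bundle))) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] chunk = new byte[4096];
            int n;
            while ((n = gz.read(chunk)) != -1) {
                out.write(chunk, 0, n);
            }
            return new String(out.toByteArray(), StandardCharsets.UTF_8);
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
    }
}
```

Flushing once per superstep would match the BSP model, where messages only need to arrive before the next superstep begins.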
  • 11. BSP API
      public abstract void bsp(BSPPeer<K1, V1, K2, V2, M> peer) throws IOException, SyncException;
  • 12. BSP Examples
      • Pi Calculation
      • Sparse Matrix-Vector Multiplication
      • K-means Clustering
      • Gradient Descent
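
As an illustration of the first example, a Pi calculation fits the BSP model in two supersteps: every peer samples random points and sends its hit count to a master, and after the barrier the master combines the counts. The sketch below simulates this sequentially in plain Java and does not call the Hama API. The class `PiBspSketch`, its parameters, and the per-peer seeding are assumptions chosen so the example is reproducible.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

public class PiBspSketch {

    /** Monte Carlo estimate of Pi, arranged as two simulated supersteps. */
    public static double estimate(int peers, int samplesPerPeer, long seed) {
        List<Long> masterInbox = new ArrayList<>();

        // Superstep 0: each peer counts random points that fall inside the unit quarter circle
        for (int p = 0; p < peers; p++) {
            Random rnd = new Random(seed + p); // hypothetical per-peer seed
            long hits = 0;
            for (int i = 0; i < samplesPerPeer; i++) {
                double x = rnd.nextDouble();
                double y = rnd.nextDouble();
                if (x * x + y * y <= 1.0) {
                    hits++;
                }
            }
            masterInbox.add(hits); // stands in for sending one message to the master peer
        }

        // Barrier: in this sequential simulation, all sends are complete once the loop ends

        // Superstep 1: the master peer aggregates the received counts
        long total = 0;
        for (long h : masterInbox) {
            total += h;
        }
        return 4.0 * total / ((double) peers * samplesPerPeer);
    }
}
```

For instance, `PiBspSketch.estimate(4, 100_000, 42L)` should land close to `Math.PI`; the accuracy improves with the total number of samples.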
  • 13. Graph API
      public void compute(Iterator<M> messages) throws IOException;
  • 14. Graph Examples
      • In-link Count
      • Single Source Shortest Path
      • PageRank
      • Bipartite Matching
      • Semi-Clustering
  • 15. Find Maximum Value
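
Only the title of this slide survives in the transcript. Assuming it refers to the common vertex-centric example in which every vertex ends up holding the largest value in the graph, the logic can be sketched in plain Java as below. The class `MaxValueSketch` and the adjacency-array input are hypothetical, and the loop body plays the role of a per-vertex `compute(messages)` call without using Hama's graph classes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class MaxValueSketch {

    /** adjacency[v] lists the out-neighbours of v; returns each vertex's value at convergence. */
    public static int[] propagateMax(int[][] adjacency, int[] initial) {
        int n = initial.length;
        int[] value = Arrays.copyOf(initial, n);
        List<List<Integer>> inbox = emptyInboxes(n);
        int superstep = 0;
        boolean anyMessages = true;

        while (anyMessages) {
            List<List<Integer>> next = emptyInboxes(n);
            anyMessages = false;
            for (int v = 0; v < n; v++) {
                // Per-vertex compute: adopt the largest value received so far
                boolean updated = (superstep == 0); // everyone announces itself once
                for (int m : inbox.get(v)) {
                    if (m > value[v]) {
                        value[v] = m;
                        updated = true;
                    }
                }
                if (updated) {
                    for (int u : adjacency[v]) {
                        next.get(u).add(value[v]);
                        anyMessages = true;
                    }
                }
                // A vertex that did not change sends nothing, i.e. it stays idle
            }
            inbox = next; // barrier: messages become visible in the next superstep
            superstep++;
        }
        return value;
    }

    private static List<List<Integer>> emptyInboxes(int n) {
        List<List<Integer>> boxes = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            boxes.add(new ArrayList<>());
        }
        return boxes;
    }
}
```

On the undirected chain `{{1}, {0, 2}, {1, 3}, {2}}` with initial values `{3, 6, 2, 1}`, every vertex converges to 6 and the loop stops once a superstep produces no messages.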
  • 16. SSSP Performance
      • SSSP on a random graph with 1 billion edges is computed in 400 seconds on one Oracle BDA (Big Data Appliance)