Transcript

  • 1. Apache Hama: Bulk Synchronous Parallel Computing
      • Edward J. Yoon <edwardyoon@apache.org>
  • 2. Who Am I
      • Edward J. Yoon
          – @eddieyoon
      • Founder of Apache Hama
      • PMC member of Apache BigTop
      • Oracle Employee
  • 3. What’s Hama?
      • Open Source
          – Under Apache 2.0 License
      • Written in Java
      • Apache Top Level Project
  • 4. Characteristics
      • A general BSP computing engine
          – M/R-like Input/Output Formatter
              • SequenceFile, Text, Accumulo, HBase, …, etc.
          – Job Manager
          – Checkpoint Recovery
      • Streaming and Pipes
          – Python, C++, …, etc.
      • Graph and Machine Learning Packages
          – K-means, Gradient Descent, Collaborative Filtering
  • 5. Bulk Synchronous Parallel?
      • Originally introduced by Valiant
      • A sequence of supersteps
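To make "a sequence of supersteps" concrete, here is a minimal sketch in plain Java. It does not use Hama. Each thread plays one peer, and every superstep consists of local computation, communication, and a barrier. The class `RingSupersteps`, the method `rotate`, and the two-buffer trick are assumptions made for this illustration only.

```java
import java.util.concurrent.BrokenBarrierException;
import java.util.concurrent.CyclicBarrier;

/** Toy BSP run: each thread is a peer and the barrier ends every superstep. */
final class RingSupersteps {

    /** Passes each peer's token to its right neighbour once per superstep. */
    static int[] rotate(int[] tokens, int supersteps) {
        final int n = tokens.length;
        if (n == 0) {
            return new int[0];
        }
        // Two buffers: messages written during superstep s are read in s + 1.
        final int[][] buf = {tokens.clone(), new int[n]};
        final CyclicBarrier barrier = new CyclicBarrier(n);
        Thread[] peers = new Thread[n];
        for (int p = 0; p < n; p++) {
            final int id = p;
            peers[p] = new Thread(() -> {
                try {
                    for (int s = 0; s < supersteps; s++) {
                        int token = buf[s % 2][id];              // 1. local computation
                        buf[(s + 1) % 2][(id + 1) % n] = token;  // 2. communication
                        barrier.await();                         // 3. barrier synchronisation
                    }
                } catch (InterruptedException | BrokenBarrierException e) {
                    throw new IllegalStateException(e);
                }
            });
            peers[p].start();
        }
        for (Thread t : peers) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return buf[supersteps % 2];
    }

    public static void main(String[] args) {
        // prints [30, 10, 20]
        System.out.println(java.util.Arrays.toString(rotate(new int[] {10, 20, 30}, 1)));
    }
}
```

No peer reads a message before the barrier that closes the superstep in which it was sent, which is the property that lets BSP programs reason about communication one superstep at a time.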
  • 6. Compared to M/R and MPI
      • Supports a message-passing style of application development
      • Provides a small, flexible, simple, and easy-to-use API
      • Can perform better than MPI for communication-intensive applications
      • Rules out deadlocks and collisions in the communication mechanisms
  • 7. So, what is it a fit for?
      • Processing Big Data with complicated relationships
          – e.g., graphs or networks
      • Iterative or recursive scientific applications
      • Continuous event processing
  • 8. Which is the Big Data?
  • 9. Could be applied to
      • Analyzing user actions and patterns
      • Social target marketing
      • Observing the evolution of social networks
      • Detecting anomalies rapidly in real time
      • Business intelligence
  • 10. Internals
      • Pluggable RPC architecture for message transfer
          – e.g., Hadoop RPC, Avro RPC, …, etc.
      • Message Collector, Bundler, and Compressor to reduce network overhead and contention
          – e.g., Snappy, Bzip2, …, etc.
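The collect, bundle, and compress idea can be sketched as follows. This is not Hama's implementation. `MessageBundler` and its methods are hypothetical names, messages are plain strings, and GZIP from `java.util.zip` stands in for Snappy or Bzip2 because it ships with the JDK.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

/** Hypothetical sketch: group outgoing messages per peer, then compress each group. */
final class MessageBundler {
    private final Map<String, List<String>> outgoing = new HashMap<>();

    /** Collector: queue a message instead of sending it right away. */
    void collect(String peer, String message) {
        outgoing.computeIfAbsent(peer, k -> new ArrayList<>()).add(message);
    }

    /** Bundler + compressor: one GZIP payload per destination peer. */
    Map<String, byte[]> flush() {
        Map<String, byte[]> bundles = new HashMap<>();
        for (Map.Entry<String, List<String>> entry : outgoing.entrySet()) {
            ByteArrayOutputStream raw = new ByteArrayOutputStream();
            try (DataOutputStream out = new DataOutputStream(new GZIPOutputStream(raw))) {
                out.writeInt(entry.getValue().size());
                for (String message : entry.getValue()) {
                    out.writeUTF(message);
                }
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
            bundles.put(entry.getKey(), raw.toByteArray());
        }
        outgoing.clear();
        return bundles;
    }

    /** Receiver side: decompress a bundle back into its messages, in order. */
    static List<String> unbundle(byte[] payload) {
        try (DataInputStream in = new DataInputStream(
                new GZIPInputStream(new ByteArrayInputStream(payload)))) {
            int count = in.readInt();
            List<String> messages = new ArrayList<>(count);
            for (int i = 0; i < count; i++) {
                messages.add(in.readUTF());
            }
            return messages;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Sending one payload per destination per superstep, rather than one network call per message, is what reduces overhead; compression pays off mainly when bundles are large or repetitive.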
  • 11. BSP API
        public abstract void bsp(BSPPeer<K1, V1, K2, V2, M> peer) throws IOException, SyncException;
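The signature suggests a single method that runs once on every peer and talks to the others through the `peer` handle. The sketch below imitates that shape without depending on Hama. `MiniPeer` is a hypothetical in-memory stand-in for `BSPPeer`, and its `send`, `sync`, and `nextMessage` methods are assumptions for illustration, not the library's API. The program is a Monte Carlo Pi estimate in which every peer sends its local estimate to peer 0.

```java
import java.util.ArrayDeque;
import java.util.Queue;
import java.util.Random;
import java.util.concurrent.BrokenBarrierException;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.CyclicBarrier;

/** Hypothetical in-memory stand-in for a peer; this is NOT Hama's BSPPeer. */
final class MiniPeer {
    final int id;
    private final MiniPeer[] all;
    private final CyclicBarrier barrier;
    private final Queue<Double> incoming = new ConcurrentLinkedQueue<>();
    private final Queue<Double> delivered = new ArrayDeque<>();

    MiniPeer(int id, MiniPeer[] all, CyclicBarrier barrier) {
        this.id = id;
        this.all = all;
        this.barrier = barrier;
    }

    void send(int target, double message) {
        all[target].incoming.add(message);
    }

    /** Ends the superstep: messages sent before sync() are readable after it. */
    void sync() throws InterruptedException, BrokenBarrierException {
        barrier.await();                 // every peer has finished sending
        delivered.clear();
        Double m;
        while ((m = incoming.poll()) != null) {
            delivered.add(m);
        }
        barrier.await();                 // every peer has drained its inbox
    }

    /** Returns the next delivered message, or null when none are left. */
    Double nextMessage() {
        return delivered.poll();
    }
}

final class PiBsp {
    /** Runs once per peer, mirroring the shape of bsp(peer) on the slide. */
    static void bsp(MiniPeer peer, int samples, double[] result)
            throws InterruptedException, BrokenBarrierException {
        Random random = new Random(42L + peer.id);
        int inside = 0;
        for (int i = 0; i < samples; i++) {
            double x = random.nextDouble();
            double y = random.nextDouble();
            if (x * x + y * y <= 1.0) {
                inside++;
            }
        }
        peer.send(0, 4.0 * inside / samples);   // local estimate goes to peer 0
        peer.sync();
        if (peer.id == 0) {                      // peer 0 averages all estimates
            double sum = 0.0;
            int count = 0;
            for (Double m = peer.nextMessage(); m != null; m = peer.nextMessage()) {
                sum += m;
                count++;
            }
            result[0] = sum / count;
        }
    }

    static double estimate(int peers, int samplesPerPeer) {
        MiniPeer[] all = new MiniPeer[peers];
        CyclicBarrier barrier = new CyclicBarrier(peers);
        for (int i = 0; i < peers; i++) {
            all[i] = new MiniPeer(i, all, barrier);
        }
        double[] result = new double[1];
        Thread[] threads = new Thread[peers];
        for (int i = 0; i < peers; i++) {
            MiniPeer peer = all[i];
            threads[i] = new Thread(() -> {
                try {
                    bsp(peer, samplesPerPeer, result);
                } catch (InterruptedException | BrokenBarrierException e) {
                    throw new IllegalStateException(e);
                }
            });
            threads[i].start();
        }
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return result[0];
    }
}
```

Calling `PiBsp.estimate(4, 200000)` runs four peers and returns a value close to π. A real Hama job would extend the library's BSP class and be submitted to a cluster rather than started on local threads.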
  • 12. BSP Examples
      • Pi Calculation
      • Sparse Matrix-Vector Multiplication
      • K-means Clustering
      • Gradient Descent
  • 13. Graph API
        public void compute(Iterator<M> messages) throws IOException;
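In a vertex-centric model, `compute` is called for a vertex once per superstep with the messages sent to it in the previous superstep. The following sketch shows that flow for an in-link count using a tiny single-threaded driver. `InLinkVertex`, `InLinkCount`, and the extra `superstep` and `outbox` parameters are hypothetical and exist only to keep the example self-contained; they are not Hama's Vertex API.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

/** Hypothetical toy vertex; this is NOT Hama's Vertex class. */
final class InLinkVertex {
    final int id;
    final int[] outEdges;
    int inLinks;   // the vertex value

    InLinkVertex(int id, int[] outEdges) {
        this.id = id;
        this.outEdges = outEdges;
    }

    /** Same idea as compute(Iterator<M> messages), with explicit context arguments. */
    void compute(Iterator<Integer> messages, int superstep, List<List<Integer>> outbox) {
        if (superstep == 0) {
            for (int target : outEdges) {
                outbox.get(target).add(1);      // tell each out-neighbour "I link to you"
            }
        } else {
            while (messages.hasNext()) {
                inLinks += messages.next();     // one message per incoming edge
            }
        }
    }
}

final class InLinkCount {
    private static List<List<Integer>> emptyInboxes(int n) {
        List<List<Integer>> inboxes = new ArrayList<>(n);
        for (int i = 0; i < n; i++) {
            inboxes.add(new ArrayList<>());
        }
        return inboxes;
    }

    /** outEdges[v] lists the vertices that v points to. */
    static int[] run(int[][] outEdges) {
        int n = outEdges.length;
        InLinkVertex[] vertices = new InLinkVertex[n];
        for (int v = 0; v < n; v++) {
            vertices[v] = new InLinkVertex(v, outEdges[v]);
        }
        List<List<Integer>> inbox = emptyInboxes(n);
        for (int superstep = 0; superstep < 2; superstep++) {
            List<List<Integer>> outbox = emptyInboxes(n);
            for (InLinkVertex vertex : vertices) {
                vertex.compute(inbox.get(vertex.id).iterator(), superstep, outbox);
            }
            inbox = outbox;   // the barrier: sent messages become next step's input
        }
        int[] counts = new int[n];
        for (int v = 0; v < n; v++) {
            counts[v] = vertices[v].inLinks;
        }
        return counts;
    }
}
```

For the graph 0→1, 0→2, 1→2, 2→0, `InLinkCount.run(new int[][] {{1, 2}, {2}, {0}})` yields the counts 1, 1, 2.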
  • 14. Graph Examples
      • In-link Count
      • Single Source Shortest Path
      • PageRank
      • Bipartite Matching
      • Semi-Clustering
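As one worked instance from this list, here is a superstep-style sketch of Single Source Shortest Path. It is a simplified, single-threaded illustration and not Hama's example code. The class `Sssp`, the edge layout `{from, to, weight}`, and the use of a per-vertex minimum as a message combiner are assumptions. Weights are assumed non-negative.

```java
import java.util.Arrays;

/** Hypothetical superstep-style SSSP over an edge list; not Hama's example. */
final class Sssp {

    /** Each edge is {from, to, weight}; unreachable vertices keep Integer.MAX_VALUE. */
    static int[] run(int vertexCount, int[][] edges, int source) {
        int[] dist = new int[vertexCount];
        Arrays.fill(dist, Integer.MAX_VALUE);

        // inbox[v] holds the smallest distance proposed to v in the last superstep.
        int[] inbox = new int[vertexCount];
        Arrays.fill(inbox, Integer.MAX_VALUE);
        inbox[source] = 0;

        boolean messagesSent = true;
        while (messagesSent) {                       // one loop iteration = one superstep
            messagesSent = false;
            int[] next = new int[vertexCount];
            Arrays.fill(next, Integer.MAX_VALUE);
            for (int v = 0; v < vertexCount; v++) {
                if (inbox[v] < dist[v]) {            // a shorter path to v was found
                    dist[v] = inbox[v];
                    for (int[] edge : edges) {
                        if (edge[0] == v) {          // propose dist + weight to neighbours
                            next[edge[1]] = Math.min(next[edge[1]], dist[v] + edge[2]);
                            messagesSent = true;
                        }
                    }
                }
            }
            inbox = next;                            // barrier between supersteps
        }
        return dist;
    }
}
```

A vertex only sends messages when its distance improves, so the computation stops on its own once no shorter path is left to discover.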
  • 15. Find Maximum Value
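The slide's figure is not part of the transcript. Assuming it shows the usual vertex-centric maximum-value propagation, a minimal sketch looks like this. `MaxValue` and its array-based graph layout are hypothetical, and the driver is single-threaded for brevity.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of max-value propagation; not taken from Hama's sources. */
final class MaxValue {

    private static List<List<Integer>> emptyInboxes(int n) {
        List<List<Integer>> inboxes = new ArrayList<>(n);
        for (int i = 0; i < n; i++) {
            inboxes.add(new ArrayList<>());
        }
        return inboxes;
    }

    /** neighbours[v] lists the vertices that v sends to. */
    static int[] run(int[] initialValues, int[][] neighbours) {
        int n = initialValues.length;
        int[] value = initialValues.clone();
        List<List<Integer>> inbox = emptyInboxes(n);
        for (int superstep = 0; ; superstep++) {
            List<List<Integer>> outbox = emptyInboxes(n);
            boolean messagesSent = false;
            for (int v = 0; v < n; v++) {
                boolean changed = superstep == 0;    // everyone announces itself once
                for (int message : inbox.get(v)) {
                    if (message > value[v]) {
                        value[v] = message;          // adopt the larger value
                        changed = true;
                    }
                }
                if (changed) {
                    for (int target : neighbours[v]) {
                        outbox.get(target).add(value[v]);
                        messagesSent = true;
                    }
                }
                // An unchanged vertex stays silent, i.e. it votes to halt.
            }
            if (!messagesSent) {
                return value;                        // all vertices halted
            }
            inbox = outbox;                          // barrier between supersteps
        }
    }
}
```

On a connected undirected graph every vertex ends up holding the global maximum; for the chain 0-1-2-3 with values 3, 6, 2, 1, all four vertices finish with 6.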
  • 16. SSSP Performance
      • SSSP on a random graph of 1 billion edges is computed in 400 seconds on one Oracle BDA (Big Data Appliance)