Apache Hama @ Samsung SW Academy


  1. Apache Hama: Bulk Synchronous Parallel Computing
     Edward J. Yoon <edwardyoon@apache.org>
  2. Who Am I
     • Edward J. Yoon (@eddieyoon)
     • Founder of Apache Hama
     • PMC member of Apache Bigtop
     • Oracle employee
  3. What’s Hama?
     • Open source, under the Apache 2.0 License
     • Written in Java
     • Apache Top-Level Project
  4. Characteristics
     • A general BSP computing engine
       – M/R-like input/output formatters: SequenceFile, Text, Accumulo, HBase, etc.
       – Job manager
       – Checkpoint recovery
     • Streaming and Pipes
       – Python, C++, etc.
     • Graph and machine learning packages
       – K-means, gradient descent, collaborative filtering
  5. Bulk Synchronous Parallel?
     • Originally introduced by Valiant
     • A computation is a sequence of supersteps, each made of local computation, message exchange, and a barrier synchronization
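The superstep idea on slide 5 can be illustrated without any BSP library. The following is a minimal sketch in plain Java, not Hama code: the class name `SuperstepSketch`, the per-peer values, and the use of `CyclicBarrier` as the synchronization point are all assumptions made for illustration. Each thread plays one peer, computes a local value, sends it to every peer, waits at the barrier, and only then reads its inbox.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.BrokenBarrierException;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.CyclicBarrier;

// Hypothetical illustration of BSP supersteps; not part of Apache Hama.
class SuperstepSketch {
    /** Runs `peers` threads through one barrier; returns the sum each peer saw. */
    static long[] run(int peers) {
        List<ConcurrentLinkedQueue<Integer>> inbox = new ArrayList<>();
        for (int p = 0; p < peers; p++) inbox.add(new ConcurrentLinkedQueue<>());
        CyclicBarrier barrier = new CyclicBarrier(peers);
        long[] sums = new long[peers];
        Thread[] threads = new Thread[peers];
        for (int p = 0; p < peers; p++) {
            final int id = p;
            threads[p] = new Thread(() -> {
                try {
                    // Superstep 0: local computation, then communication.
                    int local = (id + 1) * 10;
                    for (ConcurrentLinkedQueue<Integer> q : inbox) q.add(local);
                    // Barrier synchronization: nobody proceeds until all have sent.
                    barrier.await();
                    // Superstep 1: every message from superstep 0 is now visible.
                    long sum = 0;
                    for (int m : inbox.get(id)) sum += m;
                    sums[id] = sum;
                } catch (InterruptedException | BrokenBarrierException e) {
                    throw new IllegalStateException(e);
                }
            });
            threads[p].start();
        }
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                throw new IllegalStateException(e);
            }
        }
        return sums;
    }

    public static void main(String[] args) {
        // With 4 peers the local values are 10, 20, 30, 40.
        System.out.println(Arrays.toString(run(4))); // prints [100, 100, 100, 100]
    }
}
```

Because every peer reads only after the barrier, no peer can observe a partially filled inbox, which is the property the superstep structure is meant to provide.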
  6. Compared to M/R and MPI
     • Supports a message-passing style of application development
     • Provides a small, flexible, easy-to-use API
     • Can perform better than MPI for communication-intensive applications
     • Guarantees the absence of deadlocks or collisions in the communication mechanisms
  7. So, fit for what?
     • Processing Big Data with complicated relationships, e.g., graphs or networks
     • Iterative or recursive scientific applications
     • Continuous event processing
  8. Which is the Big Data?
  9. Could be applied to
     • Analyzing user actions and patterns
     • Social target marketing
     • Observing the evolution of social networks
     • Detecting anomalies rapidly in real time
     • Business intelligence
  10. Internals
      • Pluggable RPC architecture for message transfer, e.g., Hadoop RPC, Avro RPC, etc.
      • Message collector, bundler, and compressor to reduce network overhead and contention, e.g., Snappy, Bzip2, etc.
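The bundling and compression idea on slide 10 can be sketched in plain Java. This is not Hama's implementation: the class `MessageBundleSketch` and its methods are hypothetical, and GZIP from the JDK stands in for Snappy or Bzip2, which are not part of the standard library. All messages bound for one peer are written into a single buffer and compressed once, so the network carries one payload instead of many small ones.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical illustration of message bundling; GZIP is a stand-in codec.
class MessageBundleSketch {
    /** Packs all messages for one destination into a single compressed payload. */
    static byte[] bundle(List<String> messages) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (DataOutputStream out =
                     new DataOutputStream(new GZIPOutputStream(bytes))) {
                out.writeInt(messages.size());          // message count header
                for (String m : messages) out.writeUTF(m);
            }                                           // close() finishes the GZIP stream
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Reverses bundle(): decompresses and splits the payload into messages. */
    static List<String> unbundle(byte[] payload) {
        try (DataInputStream in = new DataInputStream(
                 new GZIPInputStream(new ByteArrayInputStream(payload)))) {
            int n = in.readInt();
            List<String> messages = new ArrayList<>(n);
            for (int i = 0; i < n; i++) messages.add(in.readUTF());
            return messages;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        List<String> outgoing = new ArrayList<>();
        for (int i = 0; i < 1000; i++) outgoing.add("vertex-" + (i % 10) + ":1.0");
        byte[] payload = bundle(outgoing);
        System.out.println("messages=" + outgoing.size()
            + " payloadBytes=" + payload.length);
        System.out.println("roundTripOk=" + unbundle(payload).equals(outgoing));
    }
}
```

The trade-off is the usual one: bundling adds a little latency and CPU time per superstep in exchange for fewer, smaller network transfers.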
  11. BSP API
      public abstract void bsp(BSPPeer<K1, V1, K2, V2, M> peer) throws IOException, SyncException;
  12. BSP Examples
      • Pi calculation
      • Sparse matrix-vector multiplication
      • K-means clustering
      • Gradient descent
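To make the `bsp(peer)` signature and the Pi calculation example more concrete, here is a hedged sketch of how such a program might be shaped. It does not use Hama: the `Peer` interface below is a hypothetical stand-in that only loosely mirrors the send/sync/read pattern, and the thread-based `run` method is an assumed in-process replacement for a real cluster. Each peer estimates pi by Monte Carlo sampling, sends its estimate to a master peer, synchronizes, and the master averages what it received.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;
import java.util.concurrent.BrokenBarrierException;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.CyclicBarrier;

// Hypothetical illustration of a bsp()-style program; not Apache Hama code.
class PiBspSketch {
    /** Stand-in for a peer handle. NOT Hama's BSPPeer; names are assumptions. */
    interface Peer {
        int index();
        void send(int toPeer, double message);
        void sync() throws InterruptedException, BrokenBarrierException;
        Double nextMessage(); // null when the inbox is empty
    }

    static final int MASTER = 0;
    static final int SAMPLES_PER_PEER = 200_000;

    /** Per-peer program. Returns the estimate on the master, NaN elsewhere. */
    static double bsp(Peer peer) throws InterruptedException, BrokenBarrierException {
        // Local computation: count random points inside the unit quarter circle.
        Random rnd = new Random(42L + peer.index());
        int inside = 0;
        for (int i = 0; i < SAMPLES_PER_PEER; i++) {
            double x = rnd.nextDouble();
            double y = rnd.nextDouble();
            if (x * x + y * y <= 1.0) inside++;
        }
        // Communication, then barrier synchronization.
        peer.send(MASTER, 4.0 * inside / SAMPLES_PER_PEER);
        peer.sync();
        if (peer.index() != MASTER) return Double.NaN;
        // Next superstep on the master: average all received estimates.
        double sum = 0;
        int n = 0;
        for (Double m = peer.nextMessage(); m != null; m = peer.nextMessage()) {
            sum += m;
            n++;
        }
        return sum / n;
    }

    /** Assumed in-process runner: one thread per peer, shared barrier. */
    static double run(int peers) {
        List<ConcurrentLinkedQueue<Double>> inboxes = new ArrayList<>();
        for (int p = 0; p < peers; p++) inboxes.add(new ConcurrentLinkedQueue<>());
        CyclicBarrier barrier = new CyclicBarrier(peers);
        double[] results = new double[peers];
        Thread[] threads = new Thread[peers];
        for (int p = 0; p < peers; p++) {
            final int id = p;
            // Simplification: send() delivers immediately; this is safe here
            // because inboxes are only read after sync().
            Peer peer = new Peer() {
                public int index() { return id; }
                public void send(int toPeer, double message) { inboxes.get(toPeer).add(message); }
                public void sync() throws InterruptedException, BrokenBarrierException { barrier.await(); }
                public Double nextMessage() { return inboxes.get(id).poll(); }
            };
            threads[p] = new Thread(() -> {
                try {
                    results[id] = bsp(peer);
                } catch (InterruptedException | BrokenBarrierException e) {
                    throw new IllegalStateException(e);
                }
            });
            threads[p].start();
        }
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                throw new IllegalStateException(e);
            }
        }
        return results[MASTER];
    }

    public static void main(String[] args) {
        System.out.println("pi is roughly " + run(4));
    }
}
```

The point of the sketch is the control flow inside `bsp`: compute locally, send, sync, then read. A real Hama job would receive its peer object from the framework instead of constructing one.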
  13. Graph API
      public void compute(Iterator<M> messages) throws IOException;
  14. Graph Examples
      • In-link count
      • Single-source shortest path (SSSP)
      • PageRank
      • Bipartite matching
      • Semi-clustering
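Of the graph examples listed, single-source shortest path shows the vertex-centric style most directly. The sketch below is a sequential plain-Java simulation, not Hama's graph package: the class `SsspSketch`, the edge encoding, and the sample graph are assumptions. In each superstep a vertex takes the minimum of its incoming messages, and if that improves its distance it sends `distance + weight` along each outgoing edge. It assumes non-negative edge weights.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration of vertex-centric SSSP; not Apache Hama code.
class SsspSketch {
    private static List<List<Integer>> newInboxes(int n) {
        List<List<Integer>> boxes = new ArrayList<>();
        for (int i = 0; i < n; i++) boxes.add(new ArrayList<>());
        return boxes;
    }

    /**
     * edges[v] holds {target, weight} pairs for vertex v.
     * Returns the distance from source to every vertex
     * (Integer.MAX_VALUE for unreachable vertices).
     */
    static int[] shortestPaths(int[][][] edges, int source) {
        int n = edges.length;
        int[] dist = new int[n];
        Arrays.fill(dist, Integer.MAX_VALUE);
        List<List<Integer>> inbox = newInboxes(n);
        inbox.get(source).add(0); // seed: the source is at distance 0
        boolean anyMessage = true;
        while (anyMessage) {
            List<List<Integer>> next = newInboxes(n);
            anyMessage = false;
            for (int v = 0; v < n; v++) {
                // Per-vertex step: keep the smallest candidate distance.
                int best = dist[v];
                for (int m : inbox.get(v)) best = Math.min(best, m);
                if (best < dist[v]) {
                    dist[v] = best;
                    // Improved: propose new distances to the neighbors.
                    for (int[] e : edges[v]) {
                        next.get(e[0]).add(best + e[1]);
                        anyMessage = true;
                    }
                }
            }
            inbox = next; // barrier: messages are delivered in the next superstep
        }
        return dist;
    }

    public static void main(String[] args) {
        // 0 -> 1 (1), 0 -> 2 (4), 1 -> 2 (2), 2 -> 3 (1)
        int[][][] edges = {
            {{1, 1}, {2, 4}},
            {{2, 2}},
            {{3, 1}},
            {}
        };
        System.out.println(Arrays.toString(shortestPaths(edges, 0))); // prints [0, 1, 3, 4]
    }
}
```

The loop ends when a superstep produces no messages, which corresponds to every vertex being idle.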
  15. Find Maximum Value
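The slide gives only the title, so the following is a guess at the kind of example meant: the common vertex-centric exercise in which every vertex ends up holding the largest value in the graph. It is a sequential plain-Java simulation under that assumption, not Hama code; the class `MaxValueSketch` and the sample graph are hypothetical. The body of the inner loop plays the role of a `compute(messages)` call for one vertex.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration of max-value propagation; not Apache Hama code.
class MaxValueSketch {
    private static List<List<Integer>> newInboxes(int n) {
        List<List<Integer>> boxes = new ArrayList<>();
        for (int i = 0; i < n; i++) boxes.add(new ArrayList<>());
        return boxes;
    }

    /** neighbors[v] lists the vertices that v sends to. Returns final values. */
    static int[] propagateMax(int[] initial, int[][] neighbors) {
        int n = initial.length;
        int[] value = initial.clone();
        List<List<Integer>> inbox = newInboxes(n);
        boolean active = true;
        for (int superstep = 0; active; superstep++) {
            List<List<Integer>> next = newInboxes(n);
            active = false;
            for (int v = 0; v < n; v++) {
                // In superstep 0 every vertex announces its own value.
                boolean changed = superstep == 0;
                for (int m : inbox.get(v)) {
                    if (m > value[v]) {
                        value[v] = m;
                        changed = true;
                    }
                }
                if (changed) {
                    for (int u : neighbors[v]) next.get(u).add(value[v]);
                    active = true;
                }
                // Otherwise the vertex stays silent (it "votes to halt").
            }
            inbox = next; // barrier between supersteps
        }
        return value;
    }

    public static void main(String[] args) {
        // Undirected chain 0 - 1 - 2 - 3 with values 3, 6, 2, 1.
        int[] values = {3, 6, 2, 1};
        int[][] neighbors = {{1}, {0, 2}, {1, 3}, {2}};
        System.out.println(Arrays.toString(propagateMax(values, neighbors))); // prints [6, 6, 6, 6]
    }
}
```

Every vertex reaches the global maximum only if the graph is strongly connected; otherwise each vertex ends with the largest value among the vertices that can reach it.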
  16. SSSP Performance
      • SSSP on a random graph of 1 billion edges is computed in 400 seconds on one Oracle Big Data Appliance (BDA)
