Apache Hama 0.4

3,725 views
3,650 views

Published on

My presentation @ Korea Telecom

Published in: Technology
0 Comments
3 Likes
Statistics
Notes
  • Be the first to comment

No Downloads
Views
Total views
3,725
On SlideShare
0
From Embeds
0
Number of Embeds
1,184
Actions
Shares
0
Downloads
56
Comments
0
Likes
3
Embeds 0
No embeds

No notes for slide

Apache Hama 0.4

  1. 1. Apache Hama v0.4 @ Korea Telecom Edward J. Yoon, Jan 31, 2012 <edwardyoon@apache.org>
  2. 2. Index What is Hama?  Bulk Synchronous Parallel  Schematic diagram of a superstep Hama Characteristics  Internals Why Hama and BSP? Hama In Korea Telecom Performance Evaluation  SSSP  K-means clustering  PageRank
  3. 3. About Me Founder of Apache Hama. Employee for Korea Telecom.
  4. 4. What Is Hama? Apache Incubator Project. BSP (Bulk Synchronous Parallel) for massive scientific computations. Written In Java. Currently 3 releases, 3 main committers and 2 more active contributors.
  5. 5. Bulk Synchronous Parallel? Parallel programming model introduced by Valiant. Consist of a sequence of supersteps. Conceptually simple and intuitive from a programming standpoint. Used for a variety of applications e.g., scientific computing, genetic programming, …
  6. 6. Schematic diagram of a superstepLocal Computation ………. Idle Communication ………. Idle Barrier Synchronization
  7. 7. Hama Characteristics Provides a Pure BSP.  Job submission and management interface.  Multiple tasks per node.  Input/Output Formatter.  Checkpoint recovery. Supports to run in the Clouds using Apache Whirr. Supports to run with Hadoop YARN.
  8. 8. Internals Hadoop RPC is used for BSP tasks to communicate each other. Collection and bundling of messages as a technique to reduce network overheads and contentions. Zookeeper is used for Barrier Synchronization.
  9. 9. Why Hama and BSP? Supports message passing paradigm style of application development. Provides a flexible, simple, and easy-to-use small APIs. Enables to perform better than MPI for communication- intensive applications. Guarantees impossibility of deadlocks or collisions in the communication mechanisms.
  10. 10. Hama In Korea Telecom Structural Analysis of Network Traffic Flows.  traffic mining, engineering, anomaly detection, traffic forecasting and capacity planning  Currently BSP jobs are experimentally running on 512 multi- cores machines.
  11. 11. Performance: SSSP algorithm A SSSP for a random graph (100 million vertices, 1 billion edges) is computed in 30 minutes on 512 cores machines.
  12. 12. Performance: PageRank A PageRank for 30 million random web pages (10 anchors per page) is computed in 20 minutes on 256 cores Hama cluster!
  13. 13. What’s Next? Fault Tolerant System. Message Compression for High Performance. Add some frameworks on top of Hama. Add other RPC libs. Elegant Web UI.

×