• Share
  • Email
  • Embed
  • Like
  • Save
  • Private Content
Introduction of Apache Hama - 2011

Introduction of Apache Hama - 2011



for the lecture.

for the lecture.



Total Views
Views on SlideShare
Embed Views



13 Embeds 18,007

http://www.hadoop.co.kr 13065
http://blog.udanax.org 4677
http://blog.hadoop.co.kr 161
http://4351815591165136289_8554d270e86ed802136ed01cf0e39f97482d7a40.blogspot.com 36
http://feeds.feedburner.com 28
http://www.hanrss.com 18
http://9588112_bc47524b648d47af38fe158b82fc443ed04b304a.blogspot.com 8
http://webcache.googleusercontent.com 7
http://a0.twimg.com 3
http://search.daum.net 1
http://www.rxx.co.il 1 1
http://localhost 1



Upload Details

Uploaded via as Adobe PDF

Usage Rights

© All Rights Reserved

Report content

Flagged as inappropriate Flag as inappropriate
Flag as inappropriate

Select your reason for flagging this presentation as inappropriate.

  • Full Name Full Name Comment goes here.
    Are you sure you want to
    Your message goes here
Post Comment
Edit your comment

    Introduction of Apache Hama - 2011 Introduction of Apache Hama - 2011 Presentation Transcript

    • Introduction of Apache Hama Edward J. Yoon, October 11, 2011 <edwardyoon@apache.org>
    • About Me Founder of Apache Hama. Committer of Apache Bigtop. Employee for KT. http://twitter.com/eddieyoon
    • What Is Hama? Apache Incubator Project. BSP (Bulk Synchronous Parallel) for massive scientific computations. Written In Java. Currently 2 releases, 3 main committers.
    • Hama Characteristics Provides a Pure BSP model .  Job submission and management interface.  Multiple tasks per node.  Checkpoint recovery. Supports to run in the Clouds using Apache Whirr. Supports to run with Hadoop nextGen.
    • Bulk Synchronous Parallel? Parallel programming model introduced by Valiant. Consist of a sequence of supersteps. Conceptually simple and intuitive from a programming standpoint. Used for a variety of applications e.g., scientific computing, genetic programming, …
    • Schematic diagram of a superstepLocal Computation ………. Idle Communication ………. Idle Barrier Synchronization
    • Internals Proactive Task Scheduling. Hadoop RPC is used for BSP tasks to communicate each other. Collection and bundling of messages as a technique to reduce network overheads and contentions. Zookeeper is used for Barrier Synchronization.
    • Pi Calculation Each task executes locally its portion of the loop a number of times. One task acts as master and collects the results through the BSP communication interface.
    • Structural Analysis of Network Traffic Flows Traffic flows in KT clouds.  traffic engineering, anomaly detection, traffic forecasting and capacity planning Currently BSP jobs are experimentally running on 512 multi-cores machines.
    • Random Communication Benchmarks Benchmarked on 16 1U servers using 10 tasks per server. X axis is the time (sec.)of BSP job execution (32 supersteps). Y axis is the number of messages to be sent to random BSP tasks in each superstep.
    • What’s Next? Support Input/Output Formatter like MapReduce. Message Compression for High Performance. Add some frameworks on top of Hama.
    • More Information http://incubator.apache.org/hama http://wiki.apache.org/hama