Apache Hama @ Samsung SW Academy

Presentation Transcript

  • Apache Hama: Bulk Synchronous Parallel Computing
      • Edward J. Yoon <edwardyoon@apache.org>
  • Who Am I
      • Edward J. Yoon (@eddieyoon)
      • Founder of Apache Hama
      • PMC member of Apache Bigtop
      • Oracle employee
  • What’s Hama?
      • Open source, under the Apache 2.0 License
      • Written in Java
      • Apache Top-Level Project
  • Characteristics
      • A general BSP computing engine
          – M/R-like input/output formatters
              • SequenceFile, Text, Accumulo, HBase, etc.
          – Job manager
          – Checkpoint recovery
      • Streaming and Pipes
          – Python, C++, etc.
      • Graph and machine learning packages
          – K-means, Gradient Descent, Collaborative Filtering
  • Bulk Synchronous Parallel?
      • Originally introduced by Valiant
      • A sequence of supersteps
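
A superstep is conventionally described as three phases: local computation, message exchange, and a barrier synchronization, with messages sent in one superstep becoming visible in the next. The following is a minimal plain-Java sketch of that structure, not Hama code. The class `SuperstepSketch`, the method `run`, and the toy workload (each peer sends the square of its id to peer 0) are hypothetical, and threads with a `CyclicBarrier` stand in for distributed peers.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.CyclicBarrier;

// Hypothetical illustration of the BSP superstep pattern; not part of Hama.
class SuperstepSketch {

    /** Runs two supersteps on numPeers threads and returns the sum computed by peer 0. */
    static long run(int numPeers) {
        // One inbox per peer; messages sent in superstep s are read in superstep s + 1.
        List<ConcurrentLinkedQueue<Long>> inboxes = new ArrayList<>();
        for (int i = 0; i < numPeers; i++) {
            inboxes.add(new ConcurrentLinkedQueue<>());
        }
        CyclicBarrier barrier = new CyclicBarrier(numPeers);
        long[] result = new long[1];

        Thread[] peers = new Thread[numPeers];
        for (int id = 0; id < numPeers; id++) {
            final int peerId = id;
            peers[id] = new Thread(() -> {
                try {
                    // Superstep 0: local computation, then communication.
                    long local = (long) peerId * peerId;
                    inboxes.get(0).add(local);   // "send" to peer 0
                    barrier.await();             // barrier synchronization

                    // Superstep 1: peer 0 consumes the messages delivered by the barrier.
                    if (peerId == 0) {
                        long sum = 0;
                        for (Long message : inboxes.get(0)) {
                            sum += message;
                        }
                        result[0] = sum;
                    }
                    barrier.await();
                } catch (Exception e) {
                    throw new RuntimeException(e);
                }
            });
            peers[id].start();
        }
        try {
            for (Thread peer : peers) {
                peer.join();
            }
        } catch (InterruptedException e) {
            throw new RuntimeException(e);
        }
        return result[0];
    }

    public static void main(String[] args) {
        System.out.println(run(4)); // 0 + 1 + 4 + 9
    }
}
```

Because no peer reads its inbox until every peer has passed the barrier, the order in which peers run within a superstep does not affect the result.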
  • Compared to M/R and MPI
      • Supports a message-passing style of application development
      • Provides a small, flexible, and easy-to-use API
      • Can perform better than MPI for communication-intensive applications
      • Rules out deadlocks and collisions in the communication mechanism
  • So, fit for what?
      • Processing Big Data with complicated relationships
          – e.g., graphs or networks
      • Iterative or recursive scientific applications
      • Continuous event processing
  • Which is the Big Data?
  • Could be applied to
      • Analyzing user actions and patterns
      • Social target marketing
      • Observing the evolution of social networks
      • Detecting anomalies rapidly in real time
      • Business intelligence
  • Internals
      • Pluggable RPC architecture for message transfer
          – e.g., Hadoop RPC, Avro RPC, etc.
      • Message collector, bundler, and compressor to reduce network overhead and contention
          – e.g., Snappy, Bzip2, etc.
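
The idea behind bundling and compressing messages can be sketched as follows: many small messages bound for the same peer are serialized into one buffer and compressed once, so the network carries a single payload instead of many. This is an illustrative sketch only and does not show Hama's internal classes. `MessageBundleSketch`, `bundle`, and `unbundle` are hypothetical names, and gzip from `java.util.zip` stands in for Snappy or Bzip2, which are not part of the JDK.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical illustration of message bundling plus compression; not Hama code.
class MessageBundleSketch {

    /** Serializes all messages into one gzip-compressed byte array. */
    static byte[] bundle(List<String> messages) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (DataOutputStream out = new DataOutputStream(new GZIPOutputStream(bytes))) {
                out.writeInt(messages.size());   // message count header
                for (String message : messages) {
                    out.writeUTF(message);
                }
            } // closing the stream finishes the gzip trailer
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Decompresses a bundle and returns the messages in their original order. */
    static List<String> unbundle(byte[] bundle) {
        try (DataInputStream in = new DataInputStream(
                new GZIPInputStream(new ByteArrayInputStream(bundle)))) {
            int count = in.readInt();
            List<String> messages = new ArrayList<>(count);
            for (int i = 0; i < count; i++) {
                messages.add(in.readUTF());
            }
            return messages;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        List<String> messages = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            messages.add("vertex-" + i + ":rank=0.15");
        }
        byte[] payload = bundle(messages);
        System.out.println(unbundle(payload).equals(messages)); // round trip preserves messages
    }
}
```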
  • BSP API
      public abstract void bsp(BSPPeer<K1, V1, K2, V2, M> peer) throws IOException, SyncException;
  • BSP Examples
      • Pi Calculation
      • Sparse Matrix-Vector Multiplication
      • K-means Clustering
      • Gradient Descent
  • Graph API
      public void compute(Iterator<M> messages) throws IOException;
  • Graph Examples
      • In-link Count
      • Single Source Shortest Path
      • PageRank
      • Bipartite Matching
      • Semi-Clustering
  • Find Maximum Value
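
The transcript keeps only the title of this slide. A common way to find the maximum in the vertex-centric model is for each vertex to adopt the largest value it receives, forward its value to its neighbors whenever it changes, and stay silent otherwise, so the computation ends when no messages remain. The sketch below simulates those supersteps sequentially in plain Java. It does not use Hama's Graph API, and `MaxValueSketch`, `propagateMax`, and the sample graph are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical, sequential simulation of vertex-centric max propagation; not Hama code.
class MaxValueSketch {

    /**
     * values[v] is the initial value of vertex v; neighbors[v] lists its adjacent vertices.
     * Returns the value each vertex holds once no messages remain.
     */
    static int[] propagateMax(int[] values, int[][] neighbors) {
        int n = values.length;
        int[] current = values.clone();
        List<List<Integer>> inbox = emptyInboxes(n);
        boolean firstSuperstep = true;

        while (true) {
            // Messages sent now are only delivered in the next superstep.
            List<List<Integer>> nextInbox = emptyInboxes(n);
            boolean anyMessage = false;

            for (int v = 0; v < n; v++) {
                // Per-vertex compute step: take the maximum over incoming messages.
                boolean changed = firstSuperstep; // every vertex announces itself once
                for (int message : inbox.get(v)) {
                    if (message > current[v]) {
                        current[v] = message;
                        changed = true;
                    }
                }
                if (changed) {
                    for (int u : neighbors[v]) {
                        nextInbox.get(u).add(current[v]);
                        anyMessage = true;
                    }
                }
                // An unchanged vertex sends nothing, which plays the role of voting to halt.
            }

            if (!anyMessage) {
                break; // no messages in flight: the computation has converged
            }
            inbox = nextInbox; // barrier: deliver messages for the next superstep
            firstSuperstep = false;
        }
        return current;
    }

    private static List<List<Integer>> emptyInboxes(int n) {
        List<List<Integer>> inboxes = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            inboxes.add(new ArrayList<>());
        }
        return inboxes;
    }

    public static void main(String[] args) {
        // Chain 0 - 1 - 2 - 3 plus an isolated vertex 4.
        int[] values = {3, 6, 2, 1, 9};
        int[][] neighbors = {{1}, {0, 2}, {1, 3}, {2}, {}};
        System.out.println(Arrays.toString(propagateMax(values, neighbors)));
        // prints [6, 6, 6, 6, 9]
    }
}
```

Each connected component converges to its own maximum, which is why the isolated vertex keeps its value of 9 while the chain settles on 6.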
  • SSSP Performance
      • SSSP on a random graph of 1 billion edges is computed in 400 seconds on one Oracle BDA