1. Processing Over a Billion Edges on Apache Giraph
Hadoop Summit 2012
Avery Ching, Software Engineer
6/14/2012
2. Agenda
1 Motivation and Background
2 Giraph Concepts/API
3 Example Applications
4 Architecture Overview
5 Recent/Future Improvements
3. What is Apache Giraph?
•  Loose implementation of Google’s Pregel that runs as a map-only job on Hadoop
•  “Think like a vertex” that can send messages to any other vertex in the graph using the bulk synchronous parallel programming model
•  An in-memory scalable system*
  ▪  Will be enhanced with out-of-core messages/vertices to handle larger problem sets
4. What (social) graphs are we targeting?
•  3/2012 LinkedIn has 161 million users
•  6/2012 Twitter discloses 140 million MAU
•  4/2012 Facebook declares 901 million MAU
5. Example applications
•  Ranking
  ▪  Popularity, importance, etc.
•  Label propagation
  ▪  Location, school, gender, etc.
•  Community
  ▪  Groups, interests
6. Bulk synchronous parallel
•  Supersteps
  ▪  A global epoch in which components do concurrent computation and send messages, followed by a global barrier
•  Point-to-point messages (i.e. vertex to vertex)
  ▪  Sent during a superstep from one component to another, then delivered in the following superstep
•  Computation is complete when all components have completed
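The deferred-delivery rule above can be sketched in a few lines of plain Java: two mailboxes stand in for the network, and swapping them is the barrier. A message sent in superstep s is only visible in superstep s + 1. This is an illustrative sketch, not Giraph code; `BspMailboxDemo` and its method names are made up.

```java
import java.util.ArrayList;
import java.util.List;

public class BspMailboxDemo {
  // Record which messages each superstep could actually read.
  public static List<List<String>> run(int supersteps) {
    List<List<String>> seen = new ArrayList<>();
    List<String> inbox = new ArrayList<>();   // deliverable this superstep
    List<String> outbox = new ArrayList<>();  // buffered for the next one
    for (int s = 0; s < supersteps; s++) {
      seen.add(new ArrayList<>(inbox));       // compute phase reads the inbox
      outbox.add("msg-from-superstep-" + s);  // sends are buffered, not delivered
      inbox = outbox;                         // barrier: outbox becomes next inbox
      outbox = new ArrayList<>();
    }
    return seen;
  }

  public static void main(String[] args) {
    // prints [[], [msg-from-superstep-0], [msg-from-superstep-1]]
    System.out.println(run(3));
  }
}
```

Note that superstep 0 sees nothing: even a message sent at the very start of a superstep waits behind the barrier.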
7. [Figure: BSP timeline — processors alternate computation + communication within each superstep, separated by global barriers]
8. MapReduce -> Giraph
“Think like a vertex”, not a key-value pair!

MapReduce:
public class Mapper<
    KEYIN,
    VALUEIN,
    KEYOUT,
    VALUEOUT> {
  void map(KEYIN key,
      VALUEIN value,
      Context context)
      throws IOException,
          InterruptedException;
}

Giraph:
public class Vertex<
    I extends WritableComparable,
    V extends Writable,
    E extends Writable,
    M extends Writable> {
  void compute(
      Iterator<M> msgIterator);
}
9. Basic Giraph API
Methods available to compute():

Immediate effect/access:
  I getVertexId()
  V getVertexValue()
  void setVertexValue(V vertexValue)
  Iterator<I> iterator()
  E getEdgeValue(I targetVertexId)
  boolean hasEdge(I targetVertexId)
  boolean addEdge(I targetVertexId, E edgeValue)
  E removeEdge(I targetVertexId)
  void voteToHalt()
  boolean isHalted()

Take effect next superstep:
  void sendMsg(I id, M msg)
  void sendMsgToAllEdges(M msg)
  void addVertexRequest(BasicVertex<I, V, E, M> vertex)
  void removeVertexRequest(I vertexId)
  void addEdgeRequest(I sourceVertexId, Edge<I, E> edge)
  void removeEdgeRequest(I sourceVertexId, I destVertexId)
10. Why not implement Giraph with multiple MapReduce jobs?
•  Too much disk, no in-memory caching, a superstep becomes a job!
[Figure: MapReduce pipeline — input format splits -> map tasks -> intermediate files -> reduce tasks -> output format]
11. Giraph is a single map-only job in Hadoop
•  Hadoop is purely a resource manager for Giraph; all communication is done through Netty-based IPC
[Figure: vertex input format splits -> map tasks -> vertex output format]
12. Maximum vertex value implementation
public class MaxValueVertex extends EdgeListVertex<
    IntWritable, IntWritable, IntWritable, IntWritable> {
  @Override
  public void compute(Iterator<IntWritable> msgIterator) {
    boolean changed = false;
    while (msgIterator.hasNext()) {
      IntWritable msgValue = msgIterator.next();
      if (msgValue.get() > getVertexValue().get()) {
        setVertexValue(msgValue);
        changed = true;
      }
    }
    if (getSuperstep() == 0 || changed) {
      sendMsgToAllEdges(getVertexValue());
    } else {
      voteToHalt();
    }
  }
}
13. Maximum vertex value
[Figure: trace of three vertices with initial values 5, 1, and 2 spread over two processors; after each superstep and barrier the maximum value (5) propagates until every vertex holds 5]
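The trace above can be replayed in plain Java under the same rules as the MaxValueVertex code: every vertex broadcasts in superstep 0, re-broadcasts only when a message raises its value, and otherwise votes to halt. This is a standalone simulation with made-up class and method names, using the slide’s three values (5, 1, 2) on a fully connected graph.

```java
import java.util.ArrayList;
import java.util.List;

public class MaxValueTrace {
  // Run the max-propagation algorithm to convergence; returns final values.
  public static int[] run(int[] values) {
    int n = values.length;
    List<List<Integer>> inbox = emptyBoxes(n);
    boolean halted = false;
    for (long superstep = 0; !halted; superstep++) {
      List<List<Integer>> outbox = emptyBoxes(n);
      halted = true;
      for (int v = 0; v < n; v++) {
        boolean changed = false;
        for (int msg : inbox.get(v)) {
          if (msg > values[v]) {         // a message raised our value
            values[v] = msg;
            changed = true;
          }
        }
        if (superstep == 0 || changed) { // stay active and re-broadcast
          halted = false;
          for (int u = 0; u < n; u++) {
            if (u != v) {
              outbox.get(u).add(values[v]); // sendMsgToAllEdges analogue
            }
          }
        }                                 // else: voteToHalt analogue
      }
      inbox = outbox;                     // barrier: deliver next superstep
    }
    return values;
  }

  private static List<List<Integer>> emptyBoxes(int n) {
    List<List<Integer>> boxes = new ArrayList<>();
    for (int i = 0; i < n; i++) boxes.add(new ArrayList<>());
    return boxes;
  }

  public static void main(String[] args) {
    // prints [5, 5, 5]
    System.out.println(java.util.Arrays.toString(run(new int[] {5, 1, 2})));
  }
}
```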
14. Page rank implementation
public class SimplePageRankVertex extends EdgeListVertex<
    LongWritable, DoubleWritable, FloatWritable, DoubleWritable> {
  @Override
  public void compute(Iterator<DoubleWritable> msgIterator) {
    if (getSuperstep() >= 1) {
      double sum = 0;
      while (msgIterator.hasNext()) {
        sum += msgIterator.next().get();
      }
      setVertexValue(new DoubleWritable(
          (0.15f / getNumVertices()) + 0.85f * sum));
    }
    if (getSuperstep() < 30) {
      long edges = getNumOutEdges();
      sendMsgToAllEdges(new DoubleWritable(getVertexValue().get() / edges));
    } else {
      voteToHalt();
    }
  }
}
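The update rule in the slide, rank = 0.15 / numVertices + 0.85 * (sum of incoming shares), can be sanity-checked outside Giraph. The sketch below replays that arithmetic for 30 iterations on a hypothetical 3-vertex cycle (0 -> 1 -> 2 -> 0); the graph and the class name are made up for illustration. On a cycle every vertex should converge to 1/3, and with no dangling vertices the ranks always sum to 1.

```java
public class PageRankCheck {
  // outEdges[v] lists the targets of vertex v's out-edges.
  public static double[] run(int[][] outEdges, int iterations) {
    int n = outEdges.length;
    double[] rank = new double[n];
    java.util.Arrays.fill(rank, 1.0 / n);           // uniform initial ranks
    for (int iter = 0; iter < iterations; iter++) {
      double[] incoming = new double[n];
      for (int v = 0; v < n; v++) {
        double share = rank[v] / outEdges[v].length; // sendMsgToAllEdges analogue
        for (int target : outEdges[v]) {
          incoming[target] += share;
        }
      }
      for (int v = 0; v < n; v++) {
        rank[v] = 0.15 / n + 0.85 * incoming[v];     // the slide's update rule
      }
    }
    return rank;
  }

  public static void main(String[] args) {
    int[][] cycle = { {1}, {2}, {0} };
    System.out.println(java.util.Arrays.toString(run(cycle, 30)));
  }
}
```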
15. Giraph in MapReduce
16. Giraph components
•  Master – application coordinator
  ▪  One active master at a time
  ▪  Assigns partition owners to workers prior to each superstep
  ▪  Synchronizes supersteps
•  Worker – computation & messaging
  ▪  Loads the graph from input splits
  ▪  Does the computation/messaging of its assigned partitions
•  ZooKeeper
  ▪  Maintains global application state
17. Graph distribution
•  Master graph partitioner
  ▪  Creates initial partitions, generates partition owner changes between supersteps
•  Worker graph partitioner
  ▪  Determines which partition a vertex belongs to
  ▪  Creates/modifies the partition stats (can split/merge partitions)
•  Default is hash partitioning (hashCode())
  ▪  Range-based partitioning is also possible on a per-type basis
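The default hash partitioning mentioned above amounts to deriving a partition index from the vertex id’s hashCode(), so any worker can locate a vertex’s partition without coordination. A minimal sketch, with an illustrative class name rather than Giraph’s actual partitioner:

```java
import java.util.Arrays;
import java.util.List;

public class HashPartitionerDemo {
  // Map a vertex id to one of numPartitions partitions via hashCode().
  public static int partitionFor(Object vertexId, int numPartitions) {
    // Math.abs is unsafe for Integer.MIN_VALUE; mask the sign bit instead.
    return (vertexId.hashCode() & Integer.MAX_VALUE) % numPartitions;
  }

  public static void main(String[] args) {
    List<Long> vertexIds = Arrays.asList(1L, 2L, 42L, 1_000_000L);
    for (long id : vertexIds) {
      System.out.println("vertex " + id + " -> partition "
          + partitionFor(id, 4));
    }
  }
}
```

Because the mapping is a pure function of the id, it stays stable across supersteps; rebalancing requires the master to issue explicit partition owner changes, as the slide describes.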
18. Graph distribution example
[Figure: a master coordinates 4 workers; each worker loads/stores, computes, and exchanges messages for 2 of 8 partitions, reporting per-partition stats back to the master]
19. Customizable fault tolerance
•  No single point of failure from Giraph threads
  ▪  With multiple master threads, if the current master dies, a new one automatically takes over
  ▪  If a worker thread dies, the application is rolled back to a previously checkpointed superstep; the next superstep begins with the new number of workers
  ▪  If a ZooKeeper server dies, the application can proceed as long as a quorum remains
•  Hadoop single points of failure still exist
  ▪  NameNode, JobTracker
  ▪  Restarting manually from a checkpoint is always possible
20. Master thread fault tolerance
[Figure: before failure, master 0 is “active” and masters 1 and 2 are “spare”; after master 0 fails, master 1 becomes “active”]
•  One active master, with spare masters taking over in the event of an active master failure
•  All active master state is stored in ZooKeeper so that a spare master can immediately step in when an active master fails
•  “Active” master implemented as a queue in ZooKeeper
21. Worker thread fault tolerance
[Figure: checkpoints are taken every few supersteps; a worker failure in an uncheckpointed superstep rolls the run back to the last checkpointed superstep, from which it proceeds to completion]
•  A single worker death fails the superstep
•  Application reverts to the last committed superstep automatically
  ▪  Master detects worker failure during any superstep with a ZooKeeper “health” znode
  ▪  Master chooses the last committed superstep and sends a command through ZooKeeper for all workers to restart from that superstep
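The restart policy above can be modeled as a toy simulation: checkpoint every few supersteps, and on a failure resume from the last checkpointed superstep, re-running everything after it. This is an illustrative sketch of the scheduling logic only, not Giraph’s coordination code; all names are made up.

```java
import java.util.ArrayList;
import java.util.List;

public class CheckpointRestartDemo {
  // Returns the sequence of supersteps actually executed, including re-runs
  // after a single worker failure at failAtSuperstep.
  public static List<Integer> run(int totalSupersteps, int checkpointFrequency,
      int failAtSuperstep) {
    List<Integer> executed = new ArrayList<>();
    int lastCheckpoint = 0;
    boolean failed = false;
    int superstep = 0;
    while (superstep < totalSupersteps) {
      executed.add(superstep);
      if (superstep % checkpointFrequency == 0) {
        lastCheckpoint = superstep;    // checkpoint committed this superstep
      }
      if (!failed && superstep == failAtSuperstep) {
        failed = true;                 // one-time worker failure
        superstep = lastCheckpoint;    // roll back to last committed superstep
      } else {
        superstep++;
      }
    }
    return executed;
  }

  public static void main(String[] args) {
    // Failure at superstep 3 with checkpoints at 0, 2, 4:
    // prints [0, 1, 2, 3, 2, 3, 4, 5]
    System.out.println(run(6, 2, 3));
  }
}
```

The trade-off the slides imply is visible here: more frequent checkpoints cost time in the failure-free case but shrink the re-run after a failure.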
22. Optional features
•  Combiners
  ▪  Similar to MapReduce combiners
  ▪  Users implement a combine() method that can reduce the amount of messages sent and received
  ▪  Run on both the client side (memory, network) and server side (memory)
•  Aggregators
  ▪  Similar to MPI aggregation routines (i.e. max, min, sum, etc.)
  ▪  Commutative and associative operations that are performed globally
  ▪  Examples include global communication, monitoring, and statistics
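For the max-value algorithm earlier in the deck, a combiner is a natural fit: all messages bound for the same destination vertex can be collapsed to a single maximum before they ever hit the network. A plain-Java sketch of that idea (illustrative names, not Giraph’s combiner API):

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class MaxCombinerDemo {
  // Collapse each destination's message list down to one message: the max.
  // Valid because max is commutative and associative.
  public static Map<Long, Integer> combine(Map<Long, List<Integer>> outbox) {
    Map<Long, Integer> combined = new HashMap<>();
    for (Map.Entry<Long, List<Integer>> e : outbox.entrySet()) {
      int max = Integer.MIN_VALUE;
      for (int msg : e.getValue()) {
        max = Math.max(max, msg);
      }
      combined.put(e.getKey(), max);
    }
    return combined;
  }

  public static void main(String[] args) {
    Map<Long, List<Integer>> outbox = new HashMap<>();
    outbox.put(7L, Arrays.asList(5, 1, 2));  // three messages to vertex 7
    System.out.println(combine(outbox));     // prints {7=5}
  }
}
```

The receiving vertex computes the same final value either way; the combiner just trims the message volume, which is exactly the memory/network saving the slide claims.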
23. Recent Netty IPC implementation
•  Big improvement over the Hadoop RPC implementation
•  10-39% overall performance improvement
•  Still need more Netty tuning
[Figure: Netty vs. Hadoop RPC run times (seconds) and % improvement at 10, 30, and 50 workers]
24. Recent benchmarks
•  Test cluster of 80 machines
  ▪  Facebook Hadoop (https://github.com/facebook/hadoop-20)
  ▪  72 cores, 64+ GB of memory
•  org.apache.giraph.benchmark.PageRankBenchmark
  ▪  5 supersteps
  ▪  No checkpointing
  ▪  10 edges per vertex
25. Worker scalability
[Figure: run time (seconds) vs. number of workers, 10–50]
26. Edge scalability
[Figure: run time (seconds) vs. number of edges, 1–5 billion]
27. Worker / edge scalability
[Figure: run time (seconds) and edges (billions) vs. workers at 10, 30, and 50, scaling workers and edges together]
28. Apache Giraph has graduated as of 5/2012
•  Incubated for less than a year (entered the incubator 9/2011)
•  Committers from HortonWorks, Twitter, LinkedIn, Facebook, TrendMicro and various schools (VU Amsterdam, TU Berlin, Korea University)
•  Released 0.1 as of 2/6/2012; will release 0.2 within a few months
29. Future improvements
•  Out-of-core messages/graph
  ▪  Under memory pressure, dump messages/portions of the graph to local disk
  ▪  Ability to run applications without having all needed memory
•  Performance improvements
  ▪  Netty is a good step in the right direction, but messaging performance still needs tuning, as it takes up a majority of the run time
  ▪  Scale back use of ZooKeeper to health registration only, rather than implementing aggregators and coordination
30. More future improvements
•  Adding a master#compute() method
  ▪  Arbitrary master computation that sends results to workers prior to a superstep to simplify certain algorithms
  ▪  GIRAPH-127
•  Handling skew
  ▪  Some vertices have a large number of edges; we need to break them up and handle them differently to provide better scalability
31. (c) 2009 Facebook, Inc. or its licensors. "Facebook" is a registered trademark of Facebook, Inc. All rights reserved. 1.0
32. Sessions will resume at 4:30pm
