
Pregel and Giraph

Published in: Data & Analytics

  1. Pregel
  2. Pregel
     • A System for Large-Scale Graph Processing
     • Sufficiently flexible to express arbitrary graph algorithms
     • So easy
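  3. Pregel: Model of Computation
     • Vertex state: active or inactive (a vertex becomes inactive by voting to halt and is reactivated when it receives a message)
     • Termination condition: all vertices are inactive and no messages are in transit
  4. Pregel: Model of Computation
     • Sequence of supersteps
     • Invoke compute() for each active vertex
     • Each vertex can
       – Modify its state and its outgoing edges
       – Receive messages sent to it in the previous superstep
       – Send messages to other vertices (delivered in the next superstep)
  5. Pregel: Model of Computation
  6. Pregel: Model of Computation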
  7. Pregel API
  8. Pregel API
     • Combiners
     • Aggregators
     • Topology Mutations
     • Input and Output
  9. Giraph
  10. Why not implement Giraph with multiple MapReduce jobs?
      • Too much disk I/O and no in-memory caching: each superstep would become a separate job!
  11. Giraph is a single Map-only job in Hadoop
      • Hadoop is purely a resource manager for Giraph; all communication is done through Netty-based IPC
  12. Maximum vertex value implementation
  13. Giraph components
      • Master
        – One active master at a time
        – Assigns partition owners to workers prior to each superstep
        – Synchronizes supersteps
      • Worker
        – Loads the graph from input
        – Performs the computation and messaging for its assigned partitions
  14. Graph distribution
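The superstep rules from slides 3 and 4 and the maximum-value example from slide 12 can be sketched as a small single-process simulation. This is a minimal sketch under stated assumptions, not Pregel's or Giraph's code. The function `max_value`, the dict-based graph, and the example values are hypothetical. Each vertex votes to halt at the end of every compute step and is reactivated only by an incoming message. Messages sent in superstep S are delivered in superstep S + 1. The run ends once every vertex has halted and no messages are in transit.

```python
from collections import defaultdict


def max_value(values, out_edges):
    """Propagate the maximum vertex value using Pregel-style supersteps.

    values:    dict vertex_id -> initial value (hypothetical representation)
    out_edges: dict vertex_id -> list of out-neighbour ids
    Returns (final values, number of supersteps executed).
    """
    values = dict(values)
    halted = set()   # vertices that have voted to halt
    inbox = {}       # vertex_id -> messages delivered in the current superstep
    superstep = 0
    # Terminate when every vertex has halted and no messages are in transit.
    while len(halted) < len(values) or inbox:
        outbox = defaultdict(list)  # messages delivered in superstep + 1
        for v in values:
            msgs = inbox.get(v, [])
            if v in halted and not msgs:
                continue            # inactive and not woken by a message
            halted.discard(v)       # an incoming message reactivates the vertex
            # compute(): keep the largest value seen so far
            new_max = max([values[v], *msgs])
            if superstep == 0 or new_max > values[v]:
                values[v] = new_max
                for w in out_edges.get(v, []):
                    outbox[w].append(new_max)
            halted.add(v)           # vote to halt; a future message wakes it up
        inbox = dict(outbox)
        superstep += 1
    return values, superstep


# Hypothetical undirected chain a - b - c - d, stored as edges in both directions.
graph = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
result, steps = max_value({"a": 3, "b": 6, "c": 2, "d": 1}, graph)
print(result, steps)  # prints {'a': 6, 'b': 6, 'c': 6, 'd': 6} 4
```

In the example, the value 6 reaches every vertex of the chain. The run takes four supersteps. In the last one, vertex c only receives a value it already holds, so nothing more is sent and the termination condition holds.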

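Slide 8 lists combiners and aggregators among the Pregel API features.

* **Combiner:** merges messages addressed to the same vertex, typically on the sending worker, to cut network traffic. This is only safe when the merge operation is commutative and associative (max or sum, for example).
* **Aggregator:** reduces values contributed by vertices during one superstep into a single global value, which vertices can read in the next superstep.

The sketch below models both in plain Python. `combine_messages` and `Aggregator` are hypothetical names, not the Pregel or Giraph API.

```python
import operator


def combine_messages(messages, combiner):
    """Merge outgoing (destination, value) messages per destination vertex.

    combiner must be commutative and associative, because the order in
    which messages are merged is not guaranteed.
    """
    combined = {}
    for dest, value in messages:
        combined[dest] = combiner(combined[dest], value) if dest in combined else value
    return combined


class Aggregator:
    """Global reduction over one superstep (hypothetical, not a real API).

    Vertices call aggregate() during superstep S; after the barrier,
    end_superstep() publishes the result so vertices can read it
    through `value` in superstep S + 1.
    """

    def __init__(self, op=operator.add, identity=0):
        self.op = op
        self.identity = identity
        self._partial = identity  # accumulated during the current superstep
        self.value = identity     # result published at the last barrier

    def aggregate(self, contribution):
        self._partial = self.op(self._partial, contribution)

    def end_superstep(self):
        self.value, self._partial = self._partial, self.identity


# Messages produced on one worker during a superstep.
outgoing = [("b", 3), ("b", 7), ("c", 2), ("b", 5)]
print(combine_messages(outgoing, max))           # {'b': 7, 'c': 2}
print(combine_messages(outgoing, operator.add))  # {'b': 15, 'c': 2}

# Count how many vertices changed their value in this superstep.
changed = Aggregator()
for vertex_changed in [True, False, True]:
    changed.aggregate(int(vertex_changed))
print(changed.value)  # 0: not visible until the barrier
changed.end_superstep()
print(changed.value)  # 2
```

A max combiner would fit the maximum-value example, since a vertex only needs the largest incoming value. Counting changed vertices is just one illustrative use of an aggregator.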
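Slides 13 and 14 describe how the master assigns graph partitions to workers. The Pregel paper's default partitioning function is hash(ID) mod N, where N is the number of partitions. The sketch below follows that idea and adds a round-robin assignment of partitions to workers. Round-robin is an assumption chosen for simplicity, not a description of Giraph's actual policy. `partition_of`, `assign_owners` and `distribute` are hypothetical helpers. Because the mapping is deterministic, any worker can route a message for vertex v to the owner of v's partition.

```python
import zlib
from collections import defaultdict


def partition_of(vertex_id, num_partitions):
    """hash(ID) mod N with a stable hash.

    Python's built-in hash() of a str is salted per process, so separate
    worker processes could disagree; CRC-32 gives every process the same answer.
    """
    return zlib.crc32(str(vertex_id).encode("utf-8")) % num_partitions


def assign_owners(num_partitions, workers):
    """Master side: give each partition an owning worker (round-robin assumption)."""
    return {p: workers[p % len(workers)] for p in range(num_partitions)}


def distribute(vertex_ids, num_partitions, workers):
    """Return worker -> list of vertex ids it loads and computes."""
    owners = assign_owners(num_partitions, workers)
    placement = defaultdict(list)
    for v in vertex_ids:
        placement[owners[partition_of(v, num_partitions)]].append(v)
    return dict(placement)


owners = assign_owners(4, ["worker-0", "worker-1"])
print(owners)  # {0: 'worker-0', 1: 'worker-1', 2: 'worker-0', 3: 'worker-1'}
placement = distribute([f"v{i}" for i in range(10)], 4, ["worker-0", "worker-1"])
for worker, vertices in sorted(placement.items()):
    print(worker, vertices)
```

In this sketch there are more partitions than workers. The master can therefore move a whole partition to another worker by changing one entry in `owners`, without rehashing any vertex.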