Pregel
Pregel 
• A System for Large-Scale Graph Processing 
• Sufficiently flexible to express arbitrary graph algorithms 
• Easy to program
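Pregel: Model of Computation 
• Vertex state: each vertex is either active or inactive; a vertex becomes inactive by voting to halt and is reactivated when it receives a message 
• Termination condition: all vertices are inactive and no messages are in transit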
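The vertex state rules and the termination condition can be sketched in a few lines of Python. The names `State`, `VertexState`, `vote_to_halt`, `receive_message` and `should_terminate` are hypothetical, chosen for illustration; they are not Pregel's or Giraph's API.

```python
from enum import Enum


class State(Enum):
    ACTIVE = "active"
    INACTIVE = "inactive"


class VertexState:
    """Hypothetical per-vertex record of the active/inactive state."""

    def __init__(self):
        self.state = State.ACTIVE   # all vertices start active in superstep 0

    def vote_to_halt(self):
        # The vertex has no more work unless another vertex messages it.
        self.state = State.INACTIVE

    def receive_message(self):
        # An incoming message reactivates a halted vertex.
        self.state = State.ACTIVE


def should_terminate(vertices, messages_in_transit):
    """True when every vertex is inactive and no messages are pending."""
    return messages_in_transit == 0 and all(
        v.state is State.INACTIVE for v in vertices
    )


a, b = VertexState(), VertexState()
a.vote_to_halt()
b.vote_to_halt()
print(should_terminate([a, b], messages_in_transit=0))  # True
b.receive_message()
print(should_terminate([a, b], messages_in_transit=0))  # False
```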
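Pregel: Model of Computation 
• A computation is a sequence of supersteps 
• In each superstep, compute() is invoked for each active vertex 
• Each vertex can 
– Modify its own state and the state of its outgoing edges 
– Receive messages sent to it in the previous superstep 
– Send messages to other vertices (delivered in the next superstep)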
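A minimal, single-process sketch of the superstep loop, assuming a hypothetical `compute(superstep, vid, value, messages, edges)` callback that returns the new value, the messages to send, and whether the vertex votes to halt. Messages produced in superstep S are only delivered in superstep S + 1, and a halted vertex is woken up by an incoming message. Real Pregel distributes this loop across workers; here everything runs in one process.

```python
from collections import defaultdict


def run_pregel(values, out_edges, compute, max_supersteps=100):
    """Run supersteps until every vertex has halted and no messages remain.

    values:    dict vertex id -> vertex value (updated in place)
    out_edges: dict vertex id -> list of target vertex ids
    compute:   hypothetical callback
               compute(superstep, vid, value, messages, edges)
               -> (new_value, [(target, message), ...], vote_to_halt)
    """
    active = set(values)       # every vertex starts active in superstep 0
    inbox = {}                 # vertex id -> messages sent in the previous superstep
    superstep = 0
    while superstep < max_supersteps:
        # Any vertex that received a message is (re)activated.
        active |= set(inbox)
        if not active:
            break              # all vertices halted and no messages in transit
        outbox = defaultdict(list)
        for vid in sorted(active):   # sorted only so runs are deterministic
            new_value, sent, halt = compute(
                superstep, vid, values[vid], inbox.get(vid, []), out_edges[vid]
            )
            values[vid] = new_value
            for target, msg in sent:
                outbox[target].append(msg)   # delivered in the next superstep
            if halt:
                active.discard(vid)
        inbox = dict(outbox)
        superstep += 1
    return values, superstep


def indegree_compute(superstep, vid, value, messages, edges):
    """Example callback: count incoming edges by exchanging messages."""
    if superstep == 0:
        return 0, [(target, 1) for target in edges], True
    return value + sum(messages), [], True


graph = {"a": ["b", "c"], "b": ["c"], "c": []}
result, steps = run_pregel({v: None for v in graph}, graph, indegree_compute)
print(result, steps)  # {'a': 0, 'b': 1, 'c': 2} 2
```

In the example, every vertex sends a 1 along each out-edge in superstep 0 and halts; in superstep 1 only the vertices that received messages wake up and sum them, and the run ends in superstep 2 because nothing is active and no messages are pending.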
Pregel API
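Pregel API 
• Combiners: merge the messages addressed to the same vertex into a single message (e.g. sum or max) to reduce network traffic 
• Aggregators: global values; each vertex can contribute a value in a superstep, the contributions are reduced, and the result is visible to all vertices in the next superstep 
• Topology Mutations: compute() can request to add or remove vertices and edges 
• Input and Output: readers and writers load the graph and write the results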
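A sketch of combiners and aggregators, with hypothetical names (`combine_outbox`, `Aggregator`). The combiner should be commutative and associative because the system gives no guarantee about which messages are grouped or in what order they are combined; the aggregator publishes the value reduced during superstep S at the barrier so that vertices can read it in superstep S + 1.

```python
import operator
from functools import reduce


def combine_outbox(outbox, combiner):
    """Collapse all messages bound for the same vertex into one message.

    outbox:   dict target id -> list of messages
    combiner: commutative and associative binary function (e.g. max, operator.add)
    """
    return {t: [reduce(combiner, msgs)] for t, msgs in outbox.items() if msgs}


class Aggregator:
    """Hypothetical global aggregator: contributions made during superstep S
    are reduced with `op` and become readable in superstep S + 1."""

    def __init__(self, op, initial):
        self.op = op
        self.initial = initial
        self._partial = initial      # being accumulated in this superstep
        self.value = initial         # result published at the last barrier

    def aggregate(self, contribution):
        self._partial = self.op(self._partial, contribution)

    def end_superstep(self):
        # At the barrier: publish the reduced value and reset the accumulator.
        self.value = self._partial
        self._partial = self.initial


print(combine_outbox({"v1": [3, 7, 5], "v2": [1]}, max))  # {'v1': [7], 'v2': [1]}

total = Aggregator(operator.add, 0)
for vertex_value in [1, 2, 3]:
    total.aggregate(vertex_value)   # each vertex contributes during superstep S
total.end_superstep()
print(total.value)  # 6, visible to all vertices in superstep S + 1
```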
Giraph
Why not implement Giraph with multiple MapReduce jobs? 
• Too much disk I/O, no in-memory caching, and each superstep becomes a separate job!
Giraph is a single Map-only job in Hadoop 
• Hadoop is purely a resource manager for Giraph; all communication is done through Netty-based IPC
Maximum vertex value implementation
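A self-contained sketch of maximum-value propagation in the vertex-centric style: in superstep 0 every vertex sends its value to its neighbors; afterwards a vertex that learns a larger value adopts it and forwards it, and every vertex votes to halt at the end of compute(). The function name `max_value` and the dict-based graph representation are assumptions for illustration, not Giraph's API.

```python
def max_value(values, out_edges):
    """Propagate the largest vertex value through the graph, Pregel style.

    values:    dict vertex id -> initial value (not modified)
    out_edges: dict vertex id -> list of neighbor ids
    Returns (final values, number of supersteps executed).
    """
    values = dict(values)
    inbox = {}                      # vertex id -> messages from the last superstep
    superstep = 0
    # Loop until all vertices have halted and no messages are in transit.
    while superstep == 0 or inbox:
        outbox = {}
        # Superstep 0 runs every vertex; later, only vertices with messages wake up.
        to_run = list(values) if superstep == 0 else list(inbox)
        for vid in to_run:
            # compute(): adopt the largest value seen so far.
            changed = False
            for msg in inbox.get(vid, []):
                if msg > values[vid]:
                    values[vid] = msg
                    changed = True
            # Send only in superstep 0 or when the value grew.
            if superstep == 0 or changed:
                for target in out_edges[vid]:
                    outbox.setdefault(target, []).append(values[vid])
            # Every vertex then votes to halt (implicit here): it runs again
            # only if a message arrives for it.
        inbox = outbox
        superstep += 1
    return values, superstep


# Undirected path a - b - c - d, stored as edges in both directions.
edges = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
final, steps = max_value({"a": 3, "b": 6, "c": 2, "d": 1}, edges)
print(final, steps)  # {'a': 6, 'b': 6, 'c': 6, 'd': 6} 4
```

Sending only when the value changes keeps the message count small, and because every vertex halts after each compute(), the run stops as soon as no vertex learns anything new.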
Giraph components 
• Master 
– One active master at a time 
– Assigns partition owners to workers before each superstep 
– Synchronizes supersteps 
• Worker 
– Loads the graph from the input 
– Performs the computation and messaging for its assigned partitions
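A toy sketch of the master's two duties, assuming round-robin partition assignment (an illustrative choice, not necessarily Giraph's policy) and using threads to stand in for workers. `assign_partitions` and `run_superstep` are hypothetical helpers; waiting on every worker's result before returning plays the role of the superstep barrier, while fault tolerance and networking are ignored.

```python
from concurrent.futures import ThreadPoolExecutor


def assign_partitions(partition_ids, worker_ids):
    """Master duty 1 (sketch): give each partition an owner, round-robin."""
    owners = {w: [] for w in worker_ids}
    for i, pid in enumerate(sorted(partition_ids)):
        owners[worker_ids[i % len(worker_ids)]].append(pid)
    return owners


def run_superstep(owners, process_partition, superstep):
    """Master duty 2 (sketch): let every worker process its own partitions in
    parallel, and return only after all of them finish (the barrier)."""
    with ThreadPoolExecutor(max_workers=len(owners)) as pool:
        futures = {
            worker: pool.submit(
                lambda pids=pids: [process_partition(superstep, p) for p in pids]
            )
            for worker, pids in owners.items()
        }
        # Collecting every result blocks until the slowest worker is done.
        return {worker: f.result() for worker, f in futures.items()}


owners = assign_partitions(range(5), ["w0", "w1"])
print(owners)  # {'w0': [0, 2, 4], 'w1': [1, 3]}
for step in range(2):
    # A hypothetical worker job: tag each partition with the current superstep.
    print(run_superstep(owners, lambda s, p: (s, p), step))
```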
Graph distribution
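A sketch of hash partitioning, the default placement described for Pregel (partition = hash(vertex ID) mod N). `partition_of` and `distribute` are hypothetical helpers; a stable hash (CRC-32) is used so that every worker computes the same placement for a given vertex.

```python
import zlib


def partition_of(vertex_id, num_partitions):
    """Hash partitioning: partition = hash(vertex id) mod N.

    zlib.crc32 is used instead of the built-in hash() because hashes of str
    are randomized per Python process, whereas all workers must agree on
    where each vertex lives.
    """
    return zlib.crc32(str(vertex_id).encode("utf-8")) % num_partitions


def distribute(vertex_ids, num_partitions):
    """Group vertex ids by the partition that owns them."""
    partitions = {p: [] for p in range(num_partitions)}
    for vid in vertex_ids:
        partitions[partition_of(vid, num_partitions)].append(vid)
    return partitions


parts = distribute(range(10), 3)
for pid, members in parts.items():
    print(pid, members)
```

Because placement depends only on the vertex ID, any worker can compute where to send a message without consulting a lookup table.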


Editor's Notes

  • #9 Combiner: merges messages together. Aggregator: a global (network-wide) variable that receives messages from the user and combines them.