Processing Large Graphs in Hadoop
Dani Solà
Index
● The Problem
● Google's Pregel
● Example
● Apache Giraph
The Problem
● Processing graphs in MapReduce (MR) is not practical:
– Most algorithms are iterative
– Each iteration is mapped to an MR job
– Takes too long if many iterations are required
– Writing MR jobs for graph processing is not easy
Google's Pregel
● Framework for iterative large graph processing
● Inspired by the Bulk Synchronous Parallel (BSP) model
● Computation is distributed among N+1 nodes
– N workers that do the actual work
– 1 master that synchronizes them
● Takes a vertex-centric approach
– Makes it much easier to focus on the algorithm
http://kowshik.github.io/JPregel/pregel_paper.pdf
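The N workers + 1 master layout can be illustrated with plain Java threads. This is only a sketch of the Bulk Synchronous Parallel idea, not Pregel code: the class name BspSketch and its run method are hypothetical, and a CyclicBarrier stands in for the master that synchronizes the workers at the end of each superstep.

```java
import java.util.concurrent.BrokenBarrierException;
import java.util.concurrent.CyclicBarrier;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch (not Pregel code): N worker threads plus a barrier acting as the master.
final class BspSketch {
    // Runs the given number of supersteps on `workers` threads.
    // Returns {supersteps completed, units of work done}.
    static int[] run(int workers, int supersteps) {
        AtomicInteger completed = new AtomicInteger();
        AtomicInteger workDone = new AtomicInteger();
        // The barrier action runs once per superstep, after every worker has arrived.
        CyclicBarrier master = new CyclicBarrier(workers, completed::incrementAndGet);
        Thread[] threads = new Thread[workers];
        for (int w = 0; w < workers; w++) {
            threads[w] = new Thread(() -> {
                try {
                    for (int s = 0; s < supersteps; s++) {
                        workDone.incrementAndGet(); // stand-in for processing this worker's vertices
                        master.await();             // block until all workers finish the superstep
                    }
                } catch (InterruptedException | BrokenBarrierException e) {
                    throw new IllegalStateException(e);
                }
            });
            threads[w].start();
        }
        try {
            for (Thread t : threads) {
                t.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return new int[] {completed.get(), workDone.get()};
    }
}
```

Calling BspSketch.run(4, 3) runs three supersteps on four workers; no worker starts superstep S + 1 before all of them have finished superstep S.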
Pregel Main Concepts
● Computations are a sequence of supersteps
● Vertices are partitioned among the worker nodes (the Pregel paper's default is a hash of the vertex ID)
● Vertices have values and directed edges to
other vertices
● Vertices can send messages to other vertices
● Messages sent at superstep S are received at
superstep S + 1
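The "sent at S, received at S + 1" rule can be sketched with two message buffers that are swapped at the superstep boundary. The class SuperstepMailbox and its methods are hypothetical names used for illustration only. The workerFor helper assumes vertices are assigned to workers by hashing their ID, which is the default described in the Pregel paper, not something these slides spell out.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: messages sent in superstep S become readable only in superstep S + 1.
final class SuperstepMailbox {
    private Map<String, List<Integer>> current = new HashMap<>(); // readable in this superstep
    private Map<String, List<Integer>> next = new HashMap<>();    // filled for the next superstep

    // Queues a message for delivery in the next superstep.
    void send(String targetVertex, int message) {
        next.computeIfAbsent(targetVertex, k -> new ArrayList<>()).add(message);
    }

    // Returns the messages that were sent to this vertex in the previous superstep.
    List<Integer> receive(String vertex) {
        return current.getOrDefault(vertex, Collections.emptyList());
    }

    // Superstep boundary: queued messages become readable, older ones are dropped.
    void endSuperstep() {
        current = next;
        next = new HashMap<>();
    }

    // Assumed partitioning scheme: worker index = hash(vertex ID) mod number of workers.
    static int workerFor(String vertexId, int workers) {
        return Math.floorMod(vertexId.hashCode(), workers);
    }
}
```

A message passed to send is invisible to receive until endSuperstep has been called once, and it disappears after the following call.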
Computation Life Cycle
● Initially, all vertices are active
● Inactive vertices activate again on receiving
messages
● In each superstep, active vertices:
– Receive messages from the previous superstep
– Can change their value depending on their state
– Can read and modify the values of their outgoing edges
– Can send messages to other vertices
– Can vote to halt, becoming inactive
● When all vertices are inactive, computation ends
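The active/inactive rules above amount to a small state machine per vertex. The sketch below uses hypothetical names (VertexLifeCycle, deliver, computationEnded) and models only the activation flag, not values or edges.

```java
import java.util.List;

// Hypothetical sketch of the per-vertex life cycle: active, halted, reactivated by messages.
final class VertexLifeCycle {
    private boolean active = true; // initially, all vertices are active

    boolean isActive() {
        return active;
    }

    // The vertex votes to halt and becomes inactive.
    void voteToHalt() {
        active = false;
    }

    // Called at the start of a superstep with the number of messages delivered to this vertex.
    void deliver(int messageCount) {
        if (messageCount > 0) {
            active = true; // an inactive vertex wakes up when it receives messages
        }
    }

    // The computation ends once no vertex is active.
    static boolean computationEnded(List<VertexLifeCycle> vertices) {
        for (VertexLifeCycle v : vertices) {
            if (v.isActive()) {
                return false;
            }
        }
        return true;
    }
}
```

A halted vertex stays inactive while deliver(0) is called, and becomes active again on the first deliver call with a positive count.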
Ex: Shortest Path A → D
● Single source shortest paths example
● Want to find the shortest path from A to D
● For simplicity, edges have value 1
Ex: Shortest Path A → D
Superstep 0:
All vertices active, A sends messages and halts
Vertex values: A: 0, B: ∞, C: ∞, D: ∞, E: ∞
Messages sent: 0+1, 0+1, 0+1
Ex: Shortest Path A → D
Superstep 1:
B, C, E get the messages and update their values
Vertex values: A: 0, B: 1, C: 1, D: ∞, E: 1
Messages sent: 1+1, 1+1
Ex: Shortest Path A → D
Superstep 2:
E gets a message from B, but doesn't change its value
Vertex values: A: 0, B: 1, C: 1, D: 2, E: 1
Ex: Shortest Path A → D
Superstep 3:
All vertices have halted and the computation ends
Vertex values: A: 0, B: 1, C: 1, D: 2, E: 1
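The walkthrough above can be reproduced with a small single-machine simulation in plain Java. This is a sketch, not Giraph or Pregel code. The slides do not list the edges of the example graph, so the edge set in exampleGraph (A→B, A→C, A→E, B→E, C→D) is an assumption chosen to match the values shown in each superstep; the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical single-machine simulation of vertex-centric shortest paths (all edges weigh 1).
final class ShortestPathSketch {
    static final int INF = Integer.MAX_VALUE; // stands for the ∞ in the slides

    // Assumed edge set; every vertex must appear as a key, even without out-edges.
    static Map<String, List<String>> exampleGraph() {
        Map<String, List<String>> edges = new HashMap<>();
        edges.put("A", Arrays.asList("B", "C", "E"));
        edges.put("B", Arrays.asList("E"));
        edges.put("C", Arrays.asList("D"));
        edges.put("D", Collections.<String>emptyList());
        edges.put("E", Collections.<String>emptyList());
        return edges;
    }

    // Returns the distance from `source` to every vertex.
    static Map<String, Integer> run(Map<String, List<String>> edges, String source) {
        Map<String, Integer> dist = new TreeMap<>();
        for (String v : edges.keySet()) {
            dist.put(v, INF);
        }
        Set<String> active = new HashSet<>(edges.keySet()); // superstep 0: all vertices active
        Map<String, List<Integer>> inbox = new HashMap<>();
        while (!active.isEmpty()) {
            Map<String, List<Integer>> outbox = new HashMap<>();
            for (String v : active) {
                int min = v.equals(source) ? 0 : INF;
                for (int m : inbox.getOrDefault(v, Collections.<Integer>emptyList())) {
                    min = Math.min(min, m);
                }
                if (min < dist.get(v)) {
                    dist.put(v, min); // better distance found: update and tell the neighbors
                    for (String n : edges.get(v)) {
                        outbox.computeIfAbsent(n, k -> new ArrayList<>()).add(min + 1);
                    }
                }
                // Every vertex votes to halt here; only incoming messages reactivate it.
            }
            inbox = outbox;                           // sent at S, received at S + 1
            active = new HashSet<>(outbox.keySet());  // vertices with messages wake up
        }
        return dist;
    }
}
```

With the assumed graph, ShortestPathSketch.run(ShortestPathSketch.exampleGraph(), "A") gives D a distance of 2 and B, C, E a distance of 1, matching the values on the slides.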
Apache Giraph
● Open-source implementation of Pregel
● Started by Yahoo, used by Facebook, LinkedIn, Twitter
● Built on top of Hadoop & ZooKeeper:
– Mappers are used as nodes: N workers + 1 master
– Master-worker coordination via ZooKeeper
– Natively reads and writes to HDFS
– Natively reads and writes Writables
– Can use counters, distributed cache, etc.
https://giraph.apache.org/
Apache Giraph
● Pros:
– Integrates well with Hadoop
– Has many examples included
– Much better tool for processing graphs than raw MR
● Cons:
– Documentation could be better
– Still evolving: API changes in Giraph 1.1.0
– Not as widely used as other Hadoop projects
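Giraph Vertex API
import java.io.IOException;
import org.apache.giraph.edge.Edge;      // package names as in Giraph 1.0.0
import org.apache.giraph.graph.Vertex;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.NullWritable;

// Type parameters, in order: Vertex ID Type, Vertex Value Type, Edge Value Type, Message Value Type
public class MyVertex
    extends Vertex<IntWritable, IntWritable, NullWritable, IntWritable> {
  @Override
  public void compute(Iterable<IntWritable> msgs) throws IOException {
    long superstep = getSuperstep();                      // Current superstep (a long, not an int)
    IntWritable value = new IntWritable((int) superstep); // Placeholder value, for illustration only
    setValue(value);                                      // Modifies vertex value
    for (Edge<IntWritable, NullWritable> edge : getEdges()) {
      sendMessage(edge.getTargetVertexId(), value);       // Sends message to a neighbor
    }
    sendMessageToAllEdges(value);                         // Sends message to all neighbors
    voteToHalt();                                         // Becomes inactive until a message arrives
  }
}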
Giraph Vertex API
● Look at the shortest path source code:
– SimpleShortestPathsVertex.java (v1.0.0)
Giraph Input/Output
● You can read vertex-oriented (adjacency list) or
edge-oriented (pairs of vertices) files
● Many formats already available:
– VertexInputFormat / VertexOutputFormat
– HiveVertexInputFormat / HiveVertexOutputFormat
– …
● You can read any other format by extending
VertexInputFormat / EdgeInputFormat
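The difference between the two input styles can be sketched with plain Java parsers. These are not Giraph input formats: the class GraphTextParsers and the whitespace-separated text layouts ("A B C E" for an adjacency list line, "A B" for an edge line) are assumptions made for illustration. In Giraph itself the equivalent logic would live in a class extending VertexInputFormat or EdgeInputFormat.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical parsers showing vertex-oriented versus edge-oriented text input.
final class GraphTextParsers {
    // Vertex-oriented: each line is "source target1 target2 ...".
    static Map<String, List<String>> fromAdjacencyLines(List<String> lines) {
        Map<String, List<String>> graph = new TreeMap<>();
        for (String line : lines) {
            String[] tokens = line.trim().split("\\s+");
            if (tokens[0].isEmpty()) {
                continue; // skip blank lines
            }
            graph.computeIfAbsent(tokens[0], k -> new ArrayList<>());
            for (int i = 1; i < tokens.length; i++) {
                graph.get(tokens[0]).add(tokens[i]);
                graph.computeIfAbsent(tokens[i], k -> new ArrayList<>()); // register the target vertex
            }
        }
        return graph;
    }

    // Edge-oriented: each line is one "source target" pair.
    static Map<String, List<String>> fromEdgeLines(List<String> lines) {
        Map<String, List<String>> graph = new TreeMap<>();
        for (String line : lines) {
            String[] tokens = line.trim().split("\\s+");
            if (tokens[0].isEmpty()) {
                continue; // skip blank lines
            }
            if (tokens.length != 2) {
                throw new IllegalArgumentException("Expected 'source target', got: " + line);
            }
            graph.computeIfAbsent(tokens[0], k -> new ArrayList<>()).add(tokens[1]);
            graph.computeIfAbsent(tokens[1], k -> new ArrayList<>());
        }
        return graph;
    }
}
```

Both parsers build the same in-memory structure, so the lines "A B C E", "B E", "C D" and the pairs "A B", "A C", "A E", "B E", "C D" describe the same graph.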
Thanks!
