Apache Giraph
Large-scale graph processing done better
Data Mining Class
Sapienza, University of Rome
A. Y. 2016 - 2017
Basic concepts Let’s start Get our hands dirty
Hi!
Simone Santacroce
santacroce.1542338@studenti.uniroma1.it
https://it.linkedin.com/in/simone-santacroce-272739134
Manuel Coppotelli
coppotelli.1540732@studenti.uniroma1.it
https://it.linkedin.com/in/manuelcoppotelli
George Adrian Munteanu
munteanu.1540833@studenti.uniroma1.it
https://it.linkedin.com/in/george-adrian-munteanu-707744134
Lorenzo Marconi
marconi.1494505@studenti.uniroma1.it
https://www.linkedin.com/in/lorenzo-marconi-1a2580105
Antonio La Torre
alatorre182@hotmail.it
https://www.linkedin.com/in/antonio-la-torre-768738134
Lucio Burlini
burlini.1705432@studenti.uniroma1.it
https://www.linkedin.com/in/lucio-burlini-827739134
Agenda
1 Basic concepts
• Graphs in the real world
• Challenges on graphs
• MapReduce
• Giraph
2 Let’s start
• Out-Degree & In-Degree
3 Get our hands dirty
• Simple PageRank
Graphs 101
• Graph: representation of a set of objects, G = <V, E>
• Captures pairwise relationships between objects
• Can have directions, weights, ...
A computer network
A road map
The web
Social networks
• Both physical and Internet mediated
• Users are vertices
• Any kind of interaction generates edges
Graphs are huge!
∼ 50B pages
∼ 1.1B users
∼ 570M users
∼ 530M users
Graphs are nasty
• Graphs need processing
• Each vertex depends on its neighbors, recursively
• Recursive problems are nicely solved iteratively
So what?
Why not MapReduce? [1]
MapReduce is the de facto standard for data-intensive processing of large datasets.
Repeat N times ...
[1] https://static.googleusercontent.com/media/research.google.com/en//archive/mapreduce-osdi04.pdf
MapReduce Drawbacks
• Each job is executed N times
• Job bootstrap
• Mappers send values and structure
• Extensive IO at input, shuffle & sort, output
Disk I/O and Job scheduling quickly dominate the algorithm
Google’s Pregel [2]
• Developed specifically for large-scale graph processing
• Intuitive API that lets you “think like a vertex”
• Bulk Synchronous Parallel (BSP) as execution model
• Fault tolerance by checkpointing
[2] https://www.cs.cmu.edu/~pavlo/courses/fall2013/static/papers/p135-malewicz.pdf
Giraph
The Story
Think like a vertex
• Each vertex has an id, a value, a list of adjacent neighbors and
corresponding edge values
• Vertices implement algorithms by sending messages
• Messages are delivered at the start of each superstep
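The vertex-centric model above can be sketched as a small single-process simulation. This is only an illustration under stated assumptions, not Giraph's API: the `Vertex` class, the `run_supersteps` driver and the three-vertex ring are hypothetical, and the algorithm shown (propagating the largest vertex value) is chosen only because it is short.

```python
from collections import defaultdict


class Vertex:
    """Hypothetical vertex: an id, a value and out-edges with edge values."""

    def __init__(self, vertex_id, value, edges):
        self.id = vertex_id
        self.value = value
        self.edges = edges          # {neighbor id: edge value}
        self.halted = False

    def compute(self, superstep, messages, send):
        # Keep the largest value seen so far and forward it when it changes.
        new_value = max([self.value, *messages])
        if superstep == 0 or new_value > self.value:
            self.value = new_value
            for neighbor in self.edges:
                send(neighbor, self.value)
        self.halted = True          # vote to halt; a new message reactivates us


def run_supersteps(vertices):
    """Run until no vertex is active and no message is in flight."""
    inbox = defaultdict(list)
    superstep = 0
    while True:
        outbox = defaultdict(list)
        active = False
        for v in vertices.values():
            messages = inbox.get(v.id, [])
            if messages or not v.halted:
                v.halted = False
                v.compute(superstep, messages,
                          lambda dst, msg: outbox[dst].append(msg))
                active = True
        if not active:
            return superstep
        inbox = outbox              # barrier: delivered at the next superstep
        superstep += 1


# Hypothetical ring 1 -> 2 -> 3 -> 1 with initial values 3, 6 and 2.
graph = {
    1: Vertex(1, 3, {2: None}),
    2: Vertex(2, 6, {3: None}),
    3: Vertex(3, 2, {1: None}),
}
run_supersteps(graph)
print({v.id: v.value for v in graph.values()})
```

Messages written during one superstep only become visible after the barrier, which is why `inbox` is swapped once per loop iteration rather than updated in place.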
Bulk Synchronous Parallel (BSP)
• Master-Slave architecture
• Batch oriented processing
• Computation happens in-memory
Advantages
• No locks: message-based communication
• No semaphores: global synchronization
• Iteration isolation: massively parallelizable
Architecture
Single Map-only Job
Jobs Schema
Other things
Aggregators
• Mechanism for global communication and global computation
• Global value calculated in superstep t available in t + 1
• Pre-defined (e.g. sum, max, min) or user-definable functions [3]
Combiners
• User-defined function [3] for messages before being sent or delivered
• Similar to Hadoop ones
• Saves on network or memory
Checkpointing
• Store work to disk at user-defined intervals (isn’t always evil)
• Restart on failure
[3] The function has to be both commutative and associative
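To make the commutative-and-associative requirement concrete, here is a minimal sketch of a sum combiner and a max aggregator. The helper names `combine_messages` and `aggregate` and the sample data are hypothetical and do not correspond to Giraph classes; they only show that messages and partial values can be folded in any order or grouping.

```python
from collections import defaultdict
from functools import reduce


def combine_messages(outgoing, combiner):
    """Collapse all messages addressed to the same vertex into one message."""
    by_target = defaultdict(list)
    for target, message in outgoing:
        by_target[target].append(message)
    return {target: reduce(combiner, msgs) for target, msgs in by_target.items()}


def aggregate(partials_per_worker, aggregator):
    """Fold per-worker partial values into one global value."""
    return reduce(aggregator, partials_per_worker)


# Hypothetical (target vertex id, message) pairs produced in one superstep.
outgoing = [(7, 0.5), (7, 0.25), (9, 1.0), (7, 0.25)]
combined = combine_messages(outgoing, lambda a, b: a + b)   # sum combiner
global_max = aggregate([3, 11, 8], max)                     # max aggregator
print(combined, global_max)
```

Because the function is commutative and associative, the result does not depend on which worker combines which messages first, so the framework is free to apply it before sending, after receiving, or both.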
LongLongNullTextInputFormat
org.apache.giraph.io.formats.LongLongNullTextInputFormat
If there is an edge from Node 1 to Node 2, then
Node 2 appears in the neighbor list of Node 1
<NODE1 ID> <SPACE> <NEIGHBOR1 ID> <SPACE> <NEIGHBOR2 ID> ...
<NODE2 ID> <SPACE> <NEIGHBOR1 ID> <SPACE> <NEIGHBOR2 ID> ...
...
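As a rough illustration of this line format, the sketch below parses a few lines into an adjacency map. The function name `parse_adjacency_line` and the sample input are hypothetical, and it assumes a line holding only a node id denotes a node without outgoing edges.

```python
def parse_adjacency_line(line):
    """Parse '<node id> <neighbor id> <neighbor id> ...' into (id, neighbors)."""
    ids = [int(token) for token in line.split()]
    return ids[0], ids[1:]


# Hypothetical input: node 4 has no outgoing edges (assumption, see above).
sample_input = """\
1 2 3
2 3
3 1
4
"""

adjacency = dict(parse_adjacency_line(line)
                 for line in sample_input.splitlines() if line.strip())
print(adjacency)
```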
IdWithValueTextOutputFormat
org.apache.giraph.io.formats.IdWithValueTextOutputFormat
For each node print the Node ID and the Node Value
<NODE1 ID> <TAB> <NODE1 VALUE>
<NODE2 ID> <TAB> <NODE2 VALUE>
...
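A minimal sketch of the out-degree and in-degree computation, rendered in the id/value layout above. The graph and the helper `format_output` are hypothetical, and the counting is done in plain Python rather than through Giraph: in a vertex-centric job each vertex would send one message per out-edge and count the messages it receives in the next superstep.

```python
from collections import Counter

# Hypothetical graph: node id -> list of neighbor ids (out-edges).
adjacency = {1: [2, 3], 2: [3], 3: [1], 4: []}

# Out-degree is local to each vertex: the length of its neighbor list.
out_degree = {node: len(neighbors) for node, neighbors in adjacency.items()}

# In-degree needs the other vertices: Counter stands in for counting the
# one-per-edge messages each vertex would receive.
received = Counter(dst for neighbors in adjacency.values() for dst in neighbors)
in_degree = {node: received[node] for node in adjacency}


def format_output(values):
    """Render '<node id><TAB><node value>' lines, one per node."""
    return "\n".join(f"{node}\t{value}" for node, value in sorted(values.items()))


print(format_output(out_degree))
print(format_output(in_degree))
```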
Demo
Demo code
https://github.com/manuelcoppotelli/giraph-demo
Google’s PageRank [4]
• The success factor of Google’s search engine
• A graph algorithm computing the “importance” of webpages
◦ Important pages have a lot of links from other important pages
◦ Look at the structure of the underlying network
• Ability to conduct web-scale graph processing
[4] http://ilpubs.stanford.edu:8090/422/1/1999-66.pdf
Simple PageRank
• Recursive definition
\[
\mathrm{PageRank}_{i+1}(v) = \frac{1 - d}{N} + d \cdot \sum_{u \to v} \frac{\mathrm{PageRank}_i(u)}{O(u)}
\]
• Where:
◦ d: damping factor, the fraction of a page's PageRank that is transferred to its neighbors; usually 0.85
◦ N: total number of pages
◦ O(u): out-degree, the total number of links within page u
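The recursive definition can be iterated directly. The sketch below is one possible reading of it, not the demo code: the function name `simple_pagerank`, the starting value 1/N, the fixed count of 30 iterations and the three-page graph are all assumptions made for illustration, and every page is assumed to have at least one outgoing link.

```python
def simple_pagerank(out_links, d=0.85, iterations=30):
    """Iterate PageRank_{i+1}(v) = (1 - d) / N + d * sum(PageRank_i(u) / O(u))."""
    n = len(out_links)
    rank = {v: 1.0 / n for v in out_links}       # assumed starting value
    for _ in range(iterations):
        incoming = {v: 0.0 for v in out_links}
        for u, targets in out_links.items():
            for v in targets:
                # Page u splits its rank evenly over its O(u) out-links.
                incoming[v] += rank[u] / len(targets)
        rank = {v: (1 - d) / n + d * incoming[v] for v in out_links}
    return rank


# Hypothetical three-page web: A links to B and C, B links to C, C links to A.
ranks = simple_pagerank({"A": ["B", "C"], "B": ["C"], "C": ["A"]})
print(ranks)
```

In a vertex-centric job the inner loops map naturally onto messages: each vertex sends `rank / out_degree` along its edges and, in the next superstep, sums what it received before applying the damping formula.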
Simple PageRank Example
[Figure: a three-vertex graph, shown in four steps]
• Step 1: vertex values 1.0, 1.0, 1.0
• Step 2: edge labels 0.5, 0.5, 1, 1
• Step 3: vertex values 1 · 0.85 + 0.15/3, 0.5 · 0.85 + 0.15/3, 1.5 · 0.85 + 0.15/3
• Step 4: vertex values 0.43, 0.21, 0.64
JsonLongDoubleFloatDoubleVertexInputFormat
org.apache.giraph.io.formats.JsonLongDoubleFloatDoubleVertexInputFormat
Expresses both node and edge information using JSON arrays
[<vertex id>, <vertex value>,
[
[<dest vertex id>, <edge value>],
...
]
]
Notice
For more in/out formats visit https://github.com/apache/giraph/tree/trunk/giraph-core/src/main/java/org/apache/giraph/io/formats
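To make the JSON layout concrete, the following sketch reads one line of that shape with Python's standard `json` module. The function `parse_json_vertex` and the sample line are hypothetical; the type conversions mirror the long id, double vertex value and float edge value suggested by the format's name.

```python
import json


def parse_json_vertex(line):
    """Parse '[id, value, [[dest id, edge value], ...]]' into typed fields."""
    vertex_id, vertex_value, raw_edges = json.loads(line)
    edges = {int(dest): float(weight) for dest, weight in raw_edges}
    return int(vertex_id), float(vertex_value), edges


# Hypothetical input line: vertex 0 with value 1.0 and two weighted out-edges.
vertex = parse_json_vertex("[0, 1.0, [[1, 0.5], [2, 0.5]]]")
print(vertex)
```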
Demo
Demo code
https://github.com/manuelcoppotelli/giraph-demo
Q? & A!
Thank you for your attention
Contact us with any questions or problems
Demo code
https://github.com/manuelcoppotelli/giraph-demo
Homework
https://github.com/manuelcoppotelli/giraph-homework
Apache Giraph

More Related Content

What's hot

Scrum Einleitung Präsentation
Scrum Einleitung PräsentationScrum Einleitung Präsentation
Scrum Einleitung Präsentation
Andreas Nerlich
 
Compression Options in Hadoop - A Tale of Tradeoffs
Compression Options in Hadoop - A Tale of TradeoffsCompression Options in Hadoop - A Tale of Tradeoffs
Compression Options in Hadoop - A Tale of Tradeoffs
DataWorks Summit
 
How to Actually Tune Your Spark Jobs So They Work
How to Actually Tune Your Spark Jobs So They WorkHow to Actually Tune Your Spark Jobs So They Work
How to Actually Tune Your Spark Jobs So They Work
Ilya Ganelin
 
ORC File - Optimizing Your Big Data
ORC File - Optimizing Your Big DataORC File - Optimizing Your Big Data
ORC File - Optimizing Your Big Data
DataWorks Summit
 
File Format Benchmarks - Avro, JSON, ORC, & Parquet
File Format Benchmarks - Avro, JSON, ORC, & ParquetFile Format Benchmarks - Avro, JSON, ORC, & Parquet
File Format Benchmarks - Avro, JSON, ORC, & Parquet
Owen O'Malley
 
Agile Estimation And Planning
Agile Estimation And PlanningAgile Estimation And Planning
Agile Estimation And Planning
Phil Calçado
 
Hadoop Summit 2015: Performance Optimization at Scale, Lessons Learned at Twi...
Hadoop Summit 2015: Performance Optimization at Scale, Lessons Learned at Twi...Hadoop Summit 2015: Performance Optimization at Scale, Lessons Learned at Twi...
Hadoop Summit 2015: Performance Optimization at Scale, Lessons Learned at Twi...
Alex Levenson
 
Observabilidad: Todo lo que hay que ver
Observabilidad: Todo lo que hay que verObservabilidad: Todo lo que hay que ver
Observabilidad: Todo lo que hay que ver
Software Guru
 
From Query Plan to Query Performance: Supercharging your Apache Spark Queries...
From Query Plan to Query Performance: Supercharging your Apache Spark Queries...From Query Plan to Query Performance: Supercharging your Apache Spark Queries...
From Query Plan to Query Performance: Supercharging your Apache Spark Queries...
Databricks
 
Parquet and AVRO
Parquet and AVROParquet and AVRO
Parquet and AVRO
airisData
 
A Deep Dive into Query Execution Engine of Spark SQL
A Deep Dive into Query Execution Engine of Spark SQLA Deep Dive into Query Execution Engine of Spark SQL
A Deep Dive into Query Execution Engine of Spark SQL
Databricks
 
Rethinking State Management in Cloud-Native Streaming Systems
Rethinking State Management in Cloud-Native Streaming SystemsRethinking State Management in Cloud-Native Streaming Systems
Rethinking State Management in Cloud-Native Streaming Systems
Yingjun Wu
 
Scrum Überblick Teil 1
Scrum Überblick Teil 1Scrum Überblick Teil 1
Scrum Überblick Teil 1
Christof Zahn
 
Improving SparkSQL Performance by 30%: How We Optimize Parquet Pushdown and P...
Improving SparkSQL Performance by 30%: How We Optimize Parquet Pushdown and P...Improving SparkSQL Performance by 30%: How We Optimize Parquet Pushdown and P...
Improving SparkSQL Performance by 30%: How We Optimize Parquet Pushdown and P...
Databricks
 
Advanced Apache Spark Meetup Project Tungsten Nov 12 2015
Advanced Apache Spark Meetup Project Tungsten Nov 12 2015Advanced Apache Spark Meetup Project Tungsten Nov 12 2015
Advanced Apache Spark Meetup Project Tungsten Nov 12 2015
Chris Fregly
 
Apache Hadoop YARN: best practices
Apache Hadoop YARN: best practicesApache Hadoop YARN: best practices
Apache Hadoop YARN: best practicesDataWorks Summit
 
Spark shuffle introduction
Spark shuffle introductionSpark shuffle introduction
Spark shuffle introduction
colorant
 
HBaseCon 2015 General Session: Zen - A Graph Data Model on HBase
HBaseCon 2015 General Session: Zen - A Graph Data Model on HBaseHBaseCon 2015 General Session: Zen - A Graph Data Model on HBase
HBaseCon 2015 General Session: Zen - A Graph Data Model on HBase
HBaseCon
 
Top 5 mistakes when writing Spark applications
Top 5 mistakes when writing Spark applicationsTop 5 mistakes when writing Spark applications
Top 5 mistakes when writing Spark applications
hadooparchbook
 

What's hot (20)

Scrum Einleitung Präsentation
Scrum Einleitung PräsentationScrum Einleitung Präsentation
Scrum Einleitung Präsentation
 
Compression Options in Hadoop - A Tale of Tradeoffs
Compression Options in Hadoop - A Tale of TradeoffsCompression Options in Hadoop - A Tale of Tradeoffs
Compression Options in Hadoop - A Tale of Tradeoffs
 
How to Actually Tune Your Spark Jobs So They Work
How to Actually Tune Your Spark Jobs So They WorkHow to Actually Tune Your Spark Jobs So They Work
How to Actually Tune Your Spark Jobs So They Work
 
ORC File - Optimizing Your Big Data
ORC File - Optimizing Your Big DataORC File - Optimizing Your Big Data
ORC File - Optimizing Your Big Data
 
File Format Benchmarks - Avro, JSON, ORC, & Parquet
File Format Benchmarks - Avro, JSON, ORC, & ParquetFile Format Benchmarks - Avro, JSON, ORC, & Parquet
File Format Benchmarks - Avro, JSON, ORC, & Parquet
 
Agile Estimation And Planning
Agile Estimation And PlanningAgile Estimation And Planning
Agile Estimation And Planning
 
Hadoop Summit 2015: Performance Optimization at Scale, Lessons Learned at Twi...
Hadoop Summit 2015: Performance Optimization at Scale, Lessons Learned at Twi...Hadoop Summit 2015: Performance Optimization at Scale, Lessons Learned at Twi...
Hadoop Summit 2015: Performance Optimization at Scale, Lessons Learned at Twi...
 
Observabilidad: Todo lo que hay que ver
Observabilidad: Todo lo que hay que verObservabilidad: Todo lo que hay que ver
Observabilidad: Todo lo que hay que ver
 
From Query Plan to Query Performance: Supercharging your Apache Spark Queries...
From Query Plan to Query Performance: Supercharging your Apache Spark Queries...From Query Plan to Query Performance: Supercharging your Apache Spark Queries...
From Query Plan to Query Performance: Supercharging your Apache Spark Queries...
 
Parquet and AVRO
Parquet and AVROParquet and AVRO
Parquet and AVRO
 
A Deep Dive into Query Execution Engine of Spark SQL
A Deep Dive into Query Execution Engine of Spark SQLA Deep Dive into Query Execution Engine of Spark SQL
A Deep Dive into Query Execution Engine of Spark SQL
 
Rethinking State Management in Cloud-Native Streaming Systems
Rethinking State Management in Cloud-Native Streaming SystemsRethinking State Management in Cloud-Native Streaming Systems
Rethinking State Management in Cloud-Native Streaming Systems
 
Scrum Überblick Teil 1
Scrum Überblick Teil 1Scrum Überblick Teil 1
Scrum Überblick Teil 1
 
Improving SparkSQL Performance by 30%: How We Optimize Parquet Pushdown and P...
Improving SparkSQL Performance by 30%: How We Optimize Parquet Pushdown and P...Improving SparkSQL Performance by 30%: How We Optimize Parquet Pushdown and P...
Improving SparkSQL Performance by 30%: How We Optimize Parquet Pushdown and P...
 
Advanced Apache Spark Meetup Project Tungsten Nov 12 2015
Advanced Apache Spark Meetup Project Tungsten Nov 12 2015Advanced Apache Spark Meetup Project Tungsten Nov 12 2015
Advanced Apache Spark Meetup Project Tungsten Nov 12 2015
 
Apache Hadoop YARN: best practices
Apache Hadoop YARN: best practicesApache Hadoop YARN: best practices
Apache Hadoop YARN: best practices
 
Neo4j vs giraph
Neo4j vs giraphNeo4j vs giraph
Neo4j vs giraph
 
Spark shuffle introduction
Spark shuffle introductionSpark shuffle introduction
Spark shuffle introduction
 
HBaseCon 2015 General Session: Zen - A Graph Data Model on HBase
HBaseCon 2015 General Session: Zen - A Graph Data Model on HBaseHBaseCon 2015 General Session: Zen - A Graph Data Model on HBase
HBaseCon 2015 General Session: Zen - A Graph Data Model on HBase
 
Top 5 mistakes when writing Spark applications
Top 5 mistakes when writing Spark applicationsTop 5 mistakes when writing Spark applications
Top 5 mistakes when writing Spark applications
 

Viewers also liked

Thorny Path to the Large Scale Graph Processing, Алексей Зиновьев (Тамтэк)
Thorny Path to the Large Scale Graph Processing, Алексей Зиновьев (Тамтэк)Thorny Path to the Large Scale Graph Processing, Алексей Зиновьев (Тамтэк)
Thorny Path to the Large Scale Graph Processing, Алексей Зиновьев (Тамтэк)
Ontico
 
Outreach campaign status module
Outreach campaign status moduleOutreach campaign status module
Outreach campaign status moduleeCairn Inc.
 
Deep Dive - Consumer Sentiment Rating & Analysis White Paper
Deep Dive - Consumer Sentiment Rating & Analysis White PaperDeep Dive - Consumer Sentiment Rating & Analysis White Paper
Deep Dive - Consumer Sentiment Rating & Analysis White PaperJon LeMire
 
Sentiment analysis module
Sentiment analysis moduleSentiment analysis module
Sentiment analysis moduleeCairn Inc.
 
CUbRIK research on social aspects
CUbRIK research on social aspectsCUbRIK research on social aspects
CUbRIK research on social aspects
CUbRIK Project
 
Proposal final
Proposal finalProposal final
Proposal final
Mido Razaz
 
Fast, Scalable Graph Processing: Apache Giraph on YARN
Fast, Scalable Graph Processing: Apache Giraph on YARNFast, Scalable Graph Processing: Apache Giraph on YARN
Fast, Scalable Graph Processing: Apache Giraph on YARN
DataWorks Summit
 
Sentiment analytics
Sentiment analytics Sentiment analytics
Sentiment analytics
Kamalika Some
 
Snapshot of winning submissions- Jigsaw Academy ValueLabs Sentiment Analysis ...
Snapshot of winning submissions- Jigsaw Academy ValueLabs Sentiment Analysis ...Snapshot of winning submissions- Jigsaw Academy ValueLabs Sentiment Analysis ...
Snapshot of winning submissions- Jigsaw Academy ValueLabs Sentiment Analysis ...
Jigsaw Academy
 
Aspect-level sentiment analysis of customer reviews using Double Propagation
Aspect-level sentiment analysis of customer reviews using Double PropagationAspect-level sentiment analysis of customer reviews using Double Propagation
Aspect-level sentiment analysis of customer reviews using Double Propagation
Hardik Dalal
 
Psychographic Marketing | What You Show Know
Psychographic Marketing | What You Show KnowPsychographic Marketing | What You Show Know
Psychographic Marketing | What You Show Know
Get A Clue Marketing Show
 
Yelp Data Challenge - Discovering Latent Factors using Ratings and Reviews
Yelp Data Challenge - Discovering Latent Factors using Ratings and ReviewsYelp Data Challenge - Discovering Latent Factors using Ratings and Reviews
Yelp Data Challenge - Discovering Latent Factors using Ratings and Reviews
Tharindu Mathew
 
"Managing User-Generated Reviews" - Jed Nachman (Yelp) - 2009 AIM Conference
"Managing User-Generated Reviews" - Jed Nachman (Yelp) - 2009 AIM Conference"Managing User-Generated Reviews" - Jed Nachman (Yelp) - 2009 AIM Conference
"Managing User-Generated Reviews" - Jed Nachman (Yelp) - 2009 AIM Conference
Joshua Tree Internet Media, LLC
 
Snapchat Group Snaps Proposal
Snapchat Group Snaps ProposalSnapchat Group Snaps Proposal
Snapchat Group Snaps Proposal
Ryan Cunningham
 
Jigsaw Mortgage Dex Data Analysis Competition Winner Presentation - Parinds...
 Jigsaw Mortgage Dex Data Analysis Competition Winner Presentation  - Parinds... Jigsaw Mortgage Dex Data Analysis Competition Winner Presentation  - Parinds...
Jigsaw Mortgage Dex Data Analysis Competition Winner Presentation - Parinds...Jigsaw Academy
 
SentiCheNews - Sentiment Analysis on Newspapers and Tweets
SentiCheNews - Sentiment Analysis on Newspapers and TweetsSentiCheNews - Sentiment Analysis on Newspapers and Tweets
SentiCheNews - Sentiment Analysis on Newspapers and Tweets
🧑‍💻 Manuel Coppotelli
 
2011.10.14 Apache Giraph - Hortonworks
2011.10.14 Apache Giraph - Hortonworks2011.10.14 Apache Giraph - Hortonworks
2011.10.14 Apache Giraph - Hortonworks
Avery Ching
 

Viewers also liked (20)

Giraph
GiraphGiraph
Giraph
 
Thorny Path to the Large Scale Graph Processing, Алексей Зиновьев (Тамтэк)
Thorny Path to the Large Scale Graph Processing, Алексей Зиновьев (Тамтэк)Thorny Path to the Large Scale Graph Processing, Алексей Зиновьев (Тамтэк)
Thorny Path to the Large Scale Graph Processing, Алексей Зиновьев (Тамтэк)
 
Outreach campaign status module
Outreach campaign status moduleOutreach campaign status module
Outreach campaign status module
 
Deep Dive - Consumer Sentiment Rating & Analysis White Paper
Deep Dive - Consumer Sentiment Rating & Analysis White PaperDeep Dive - Consumer Sentiment Rating & Analysis White Paper
Deep Dive - Consumer Sentiment Rating & Analysis White Paper
 
Sentiment analysis module
Sentiment analysis moduleSentiment analysis module
Sentiment analysis module
 
CUbRIK research on social aspects
CUbRIK research on social aspectsCUbRIK research on social aspects
CUbRIK research on social aspects
 
Proposal final
Proposal finalProposal final
Proposal final
 
Fast, Scalable Graph Processing: Apache Giraph on YARN
Fast, Scalable Graph Processing: Apache Giraph on YARNFast, Scalable Graph Processing: Apache Giraph on YARN
Fast, Scalable Graph Processing: Apache Giraph on YARN
 
Sentiment analytics
Sentiment analytics Sentiment analytics
Sentiment analytics
 
Snapshot of winning submissions- Jigsaw Academy ValueLabs Sentiment Analysis ...
Snapshot of winning submissions- Jigsaw Academy ValueLabs Sentiment Analysis ...Snapshot of winning submissions- Jigsaw Academy ValueLabs Sentiment Analysis ...
Snapshot of winning submissions- Jigsaw Academy ValueLabs Sentiment Analysis ...
 
Aspect-level sentiment analysis of customer reviews using Double Propagation
Aspect-level sentiment analysis of customer reviews using Double PropagationAspect-level sentiment analysis of customer reviews using Double Propagation
Aspect-level sentiment analysis of customer reviews using Double Propagation
 
Psychographic Marketing | What You Show Know
Psychographic Marketing | What You Show KnowPsychographic Marketing | What You Show Know
Psychographic Marketing | What You Show Know
 
Yelp Data Challenge - Discovering Latent Factors using Ratings and Reviews
Yelp Data Challenge - Discovering Latent Factors using Ratings and ReviewsYelp Data Challenge - Discovering Latent Factors using Ratings and Reviews
Yelp Data Challenge - Discovering Latent Factors using Ratings and Reviews
 
"Managing User-Generated Reviews" - Jed Nachman (Yelp) - 2009 AIM Conference
"Managing User-Generated Reviews" - Jed Nachman (Yelp) - 2009 AIM Conference"Managing User-Generated Reviews" - Jed Nachman (Yelp) - 2009 AIM Conference
"Managing User-Generated Reviews" - Jed Nachman (Yelp) - 2009 AIM Conference
 
Snapchat Group Snaps Proposal
Snapchat Group Snaps ProposalSnapchat Group Snaps Proposal
Snapchat Group Snaps Proposal
 
Jigsaw Mortgage Dex Data Analysis Competition Winner Presentation - Parinds...
 Jigsaw Mortgage Dex Data Analysis Competition Winner Presentation  - Parinds... Jigsaw Mortgage Dex Data Analysis Competition Winner Presentation  - Parinds...
Jigsaw Mortgage Dex Data Analysis Competition Winner Presentation - Parinds...
 
SentiCheNews - Sentiment Analysis on Newspapers and Tweets
SentiCheNews - Sentiment Analysis on Newspapers and TweetsSentiCheNews - Sentiment Analysis on Newspapers and Tweets
SentiCheNews - Sentiment Analysis on Newspapers and Tweets
 
Yelp Project
Yelp ProjectYelp Project
Yelp Project
 
2011.10.14 Apache Giraph - Hortonworks
2011.10.14 Apache Giraph - Hortonworks2011.10.14 Apache Giraph - Hortonworks
2011.10.14 Apache Giraph - Hortonworks
 
Yelp final
Yelp finalYelp final
Yelp final
 

Similar to Apache Giraph: Large-scale graph processing done better

Giraph+Gora in ApacheCon14
Giraph+Gora in ApacheCon14Giraph+Gora in ApacheCon14
Giraph+Gora in ApacheCon14
Renato Javier Marroquín Mogrovejo
 
Ncku csie talk about Spark
Ncku csie talk about SparkNcku csie talk about Spark
Ncku csie talk about Spark
Giivee The
 
Cassandra Day SV 2014: Spark, Shark, and Apache Cassandra
Cassandra Day SV 2014: Spark, Shark, and Apache CassandraCassandra Day SV 2014: Spark, Shark, and Apache Cassandra
Cassandra Day SV 2014: Spark, Shark, and Apache Cassandra
DataStax Academy
 
Debugging Apache Spark - Scala & Python super happy fun times 2017
Debugging Apache Spark -   Scala & Python super happy fun times 2017Debugging Apache Spark -   Scala & Python super happy fun times 2017
Debugging Apache Spark - Scala & Python super happy fun times 2017
Holden Karau
 
Alexander Janssens & Gert-Jan van Rooij- Getting started with API
Alexander Janssens & Gert-Jan van Rooij- Getting started with APIAlexander Janssens & Gert-Jan van Rooij- Getting started with API
Alexander Janssens & Gert-Jan van Rooij- Getting started with API
TOPdesk
 
Scaling Analytics with Apache Spark
Scaling Analytics with Apache SparkScaling Analytics with Apache Spark
Scaling Analytics with Apache Spark
QuantUniversity
 
Data Science at Scale: Using Apache Spark for Data Science at Bitly
Data Science at Scale: Using Apache Spark for Data Science at BitlyData Science at Scale: Using Apache Spark for Data Science at Bitly
Data Science at Scale: Using Apache Spark for Data Science at Bitly
Sarah Guido
 
2013 06-03 berlin buzzwords
2013 06-03 berlin buzzwords2013 06-03 berlin buzzwords
2013 06-03 berlin buzzwordsNitay Joffe
 
2013.09.10 Giraph at London Hadoop Users Group
2013.09.10 Giraph at London Hadoop Users Group2013.09.10 Giraph at London Hadoop Users Group
2013.09.10 Giraph at London Hadoop Users GroupNitay Joffe
 
Kafka Summit SF 2017 - Streaming Processing in Python – 10 ways to avoid summ...
Kafka Summit SF 2017 - Streaming Processing in Python – 10 ways to avoid summ...Kafka Summit SF 2017 - Streaming Processing in Python – 10 ways to avoid summ...
Kafka Summit SF 2017 - Streaming Processing in Python – 10 ways to avoid summ...
confluent
 
Pig Tutorial | Apache Pig Tutorial | What Is Pig In Hadoop? | Apache Pig Arch...
Pig Tutorial | Apache Pig Tutorial | What Is Pig In Hadoop? | Apache Pig Arch...Pig Tutorial | Apache Pig Tutorial | What Is Pig In Hadoop? | Apache Pig Arch...
Pig Tutorial | Apache Pig Tutorial | What Is Pig In Hadoop? | Apache Pig Arch...
Simplilearn
 
Python ml
Python mlPython ml
Python ml
Shubham Sharma
 
Big Data Analysis : Deciphering the haystack
Big Data Analysis : Deciphering the haystack Big Data Analysis : Deciphering the haystack
Big Data Analysis : Deciphering the haystack
Srinath Perera
 
Boston Spark User Group - Spark's Role at MediaCrossing - July 15, 2014
Boston Spark User Group - Spark's Role at MediaCrossing - July 15, 2014Boston Spark User Group - Spark's Role at MediaCrossing - July 15, 2014
Boston Spark User Group - Spark's Role at MediaCrossing - July 15, 2014
gmalouf678
 
State of Play. Data Science on Hadoop in 2015 by SEAN OWEN at Big Data Spain ...
State of Play. Data Science on Hadoop in 2015 by SEAN OWEN at Big Data Spain ...State of Play. Data Science on Hadoop in 2015 by SEAN OWEN at Big Data Spain ...
State of Play. Data Science on Hadoop in 2015 by SEAN OWEN at Big Data Spain ...
Big Data Spain
 
Scrapy
ScrapyScrapy
Enterprise Deep Learning with DL4J
Enterprise Deep Learning with DL4JEnterprise Deep Learning with DL4J
Enterprise Deep Learning with DL4J
Josh Patterson
 
Agile Data: Building Hadoop Analytics Applications
Agile Data: Building Hadoop Analytics ApplicationsAgile Data: Building Hadoop Analytics Applications
Agile Data: Building Hadoop Analytics Applications
DataWorks Summit
 
To Infinity and Beyond - OSDConf2014
To Infinity and Beyond - OSDConf2014To Infinity and Beyond - OSDConf2014
To Infinity and Beyond - OSDConf2014
Pranav Prakash
 
Introduction to GraphQL (or How I Learned to Stop Worrying about REST APIs)
Introduction to GraphQL (or How I Learned to Stop Worrying about REST APIs)Introduction to GraphQL (or How I Learned to Stop Worrying about REST APIs)
Introduction to GraphQL (or How I Learned to Stop Worrying about REST APIs)
Hafiz Ismail
 

Apache Giraph: Large-scale graph processing done better

  • 1. Apache Giraph Large-scale graph processing done better Data Mining Class Sapienza, University of Rome A. Y. 2016 - 2017
  • 2. Hi! Simone Santacroce, santacroce.1542338@studenti.uniroma1.it, https://it.linkedin.com/in/simone-santacroce-272739134; Manuel Coppotelli, coppotelli.1540732@studenti.uniroma1.it, https://it.linkedin.com/in/manuelcoppotelli; George Adrian Munteanu, munteanu.1540833@studenti.uniroma1.it, https://it.linkedin.com/in/george-adrian-munteanu-707744134; Lorenzo Marconi, marconi.1494505@studenti.uniroma1.it, https://www.linkedin.com/in/lorenzo-marconi-1a2580105; Antonio La Torre, alatorre182@hotmail.it, https://www.linkedin.com/in/antonio-la-torre-768738134; Lucio Burlini, burlini.1705432@studenti.uniroma1.it, https://www.linkedin.com/in/lucio-burlini-827739134
  • 3. Agenda: 1 Basic concepts • Graphs in the real world • Challenges on graphs • MapReduce • Giraph 2 Let’s start • Out-Degree & In-Degree 3 Get our hands dirty • Simple PageRank
  • 4. Agenda: 1 Basic concepts • Graphs in the real world • Challenges on graphs • MapReduce • Giraph 2 Let’s start • Out-Degree & In-Degree 3 Get our hands dirty • Simple PageRank
  • 5. Graphs 101 • Graph: representation of a set of objects, G = <V, E> • Captures pairwise relationships between objects • Can have directions, weights, ...
  • 6. A computer network
  • 7. A road map
  • 8. The web
  • 9. Social networks • Both physical and Internet mediated • Users are vertices • Any kind of interaction generates edges
  • 10. Graphs are huge! ∼ 50B pages ∼ 1.1B users ∼ 570M users ∼ 530M users
  • 11. Graphs are nasty • Graphs need processing
  • 12. Graphs are nasty • Graphs need processing • Each vertex depends on its neighbors, recursively
  • 13. Graphs are nasty • Graphs need processing • Each vertex depends on its neighbors, recursively • Recursive problems are nicely solved iteratively
  • 14. Graphs are nasty • Graphs need processing • Each vertex depends on its neighbors, recursively • Recursive problems are nicely solved iteratively. So what?
  • 15. Why not MapReduce? [1] MapReduce is the current standard for managing big data sets for intensive computing. Repeat N times ... [1] https://static.googleusercontent.com/media/research.google.com/en//archive/mapreduce-osdi04.pdf
  • 16. MapReduce Drawbacks • Each job is executed N times • Job bootstrap • Mappers send values and structure • Extensive I/O at input, shuffle & sort, output. Disk I/O and job scheduling quickly dominate the algorithm
  • 17. Google’s Pregel [2] • Developed specifically for large-scale graph processing. [2] https://www.cs.cmu.edu/~pavlo/courses/fall2013/static/papers/p135-malewicz.pdf
  • 18. Google’s Pregel [2] • Developed specifically for large-scale graph processing • Intuitive API that lets you “think like a vertex”. [2] https://www.cs.cmu.edu/~pavlo/courses/fall2013/static/papers/p135-malewicz.pdf
  • 19. Google’s Pregel [2] • Developed specifically for large-scale graph processing • Intuitive API that lets you “think like a vertex” • Bulk Synchronous Parallel (BSP) as execution model. [2] https://www.cs.cmu.edu/~pavlo/courses/fall2013/static/papers/p135-malewicz.pdf
  • 20. Google’s Pregel [2] • Developed specifically for large-scale graph processing • Intuitive API that lets you “think like a vertex” • Bulk Synchronous Parallel (BSP) as execution model • Fault tolerance by checkpointing. [2] https://www.cs.cmu.edu/~pavlo/courses/fall2013/static/papers/p135-malewicz.pdf
  • 21. Giraph
  • 22. The Story
  • 23. Think like a vertex • Each vertex has an id, a value, a list of adjacent neighbors and corresponding edge values • Vertices implement algorithms by sending messages • Messages are delivered at the start of each superstep
  • 24. Bulk Synchronous Parallel (BSP) • Master-Slave architecture • Batch oriented processing • Computation happens in-memory
  • 25. Advantages • No locks: message-based communication • No semaphores: global synchronization • Iteration isolation: massively parallelizable
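The vertex-centric, superstep-based model described on the slides above can be sketched in a few lines. This is a single-process Python simulation, not Giraph code (Giraph itself is a Java framework); the function name `run_bsp`, the choice of maximum-value propagation as the algorithm, and the stopping rule are illustrative assumptions.

```python
from collections import defaultdict


def run_bsp(values, edges, max_supersteps=50):
    """Propagate the largest vertex value along directed edges, BSP style.

    values: {vertex_id: initial value}
    edges:  {vertex_id: [neighbor ids]} (out-edges)
    Returns the final values and the superstep at which nothing changed.
    """
    values = dict(values)
    inbox = defaultdict(list)  # messages delivered at the start of a superstep
    for superstep in range(max_supersteps):
        outbox = defaultdict(list)  # messages sent during this superstep
        active = False
        for vertex in values:
            messages = inbox.get(vertex, [])
            new_value = max([values[vertex]] + messages)
            changed = new_value != values[vertex]
            values[vertex] = new_value
            # Every vertex sends in superstep 0; later only changed ones do.
            if superstep == 0 or changed:
                for neighbor in edges.get(vertex, []):
                    outbox[neighbor].append(new_value)
                active = True
        # Global barrier: this superstep's messages become the next inbox.
        inbox = outbox
        if not active:
            return values, superstep
    return values, max_supersteps


# Hypothetical three-vertex ring: 1 -> 2 -> 3 -> 1.
final_values, supersteps = run_bsp({1: 3, 2: 6, 3: 2}, {1: [2], 2: [3], 3: [1]})
```

No locks are needed because a vertex only reads its own inbox and writes to the outbox, and the swap between supersteps plays the role of the global synchronization barrier.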
  • 26. Architecture: single map-only job
  • 27. Jobs Schema
  • 28. Other things. Aggregators • Mechanism for global communication and global computation • Global value calculated in superstep t is available in superstep t + 1 • Pre-defined (e.g. sum, max, min) or user-definable functions [3]. [3] The function has to be both commutative and associative
  • 29. Other things. Aggregators • Mechanism for global communication and global computation • Global value calculated in superstep t is available in superstep t + 1 • Pre-defined (e.g. sum, max, min) or user-definable functions [3]. Combiners • User-defined function [3] applied to messages before they are sent or delivered • Similar to Hadoop combiners • Saves network traffic or memory. [3] The function has to be both commutative and associative
  • 30. Other things. Aggregators • Mechanism for global communication and global computation • Global value calculated in superstep t is available in superstep t + 1 • Pre-defined (e.g. sum, max, min) or user-definable functions [3]. Combiners • User-defined function [3] applied to messages before they are sent or delivered • Similar to Hadoop combiners • Saves network traffic or memory. Checkpointing • Store work to disk at user-defined intervals (disk I/O isn’t always evil) • Restart on failure. [3] The function has to be both commutative and associative
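To make the combiner and aggregator ideas concrete, here is a minimal plain-Python sketch. It does not use Giraph's classes; `combine_sum`, `SumAggregator`, and `end_superstep` are hypothetical names chosen for illustration, and summation stands in for any commutative and associative function.

```python
from collections import defaultdict


def combine_sum(outgoing):
    """Combiner: collapse all messages bound for one vertex into a single sum.

    outgoing: list of (destination_id, value) pairs produced by one worker.
    """
    combined = defaultdict(float)
    for destination, value in outgoing:
        combined[destination] += value
    return dict(combined)


class SumAggregator:
    """Aggregator: contributions made in superstep t are readable in t + 1."""

    def __init__(self):
        self.visible = 0.0    # result of the previous superstep
        self._pending = 0.0   # contributions of the current superstep

    def aggregate(self, value):
        self._pending += value

    def end_superstep(self):
        # Called at the barrier: publish the total and start a fresh one.
        self.visible = self._pending
        self._pending = 0.0


# Four messages shrink to three once the two bound for vertex 3 are merged.
combined = combine_sum([(2, 0.5), (3, 0.5), (3, 1.0), (1, 1.0)])

total = SumAggregator()
for vertex_value in (1.0, 1.0, 1.0):
    total.aggregate(vertex_value)
seen_during_superstep = total.visible  # still the previous result
total.end_superstep()
seen_next_superstep = total.visible
```

Because addition is commutative and associative, the order in which messages or contributions arrive does not affect the result, which is why the slides require that property.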
  • 31. Agenda: 1 Basic concepts • Graphs in the real world • Challenges on graphs • MapReduce • Giraph 2 Let’s start • Out-Degree & In-Degree 3 Get our hands dirty • Simple PageRank
  • 32. LongLongNullTextInputFormat (org.apache.giraph.io.formats.LongLongNullTextInputFormat). If there is an edge from Node 1 to Node 2, then Node 2 appears in the neighbor list of Node 1. One line per node:
        <NODE1 ID> <SPACE> <NEIGHBOR1 ID> <SPACE> <NEIGHBOR2 ID> ...
        <NODE2 ID> <SPACE> <NEIGHBOR1 ID> <SPACE> <NEIGHBOR2 ID> ...
        ...
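As a rough illustration of the adjacency-list layout above, the following Python sketch parses such a file into a dictionary. It is not the Giraph reader itself; `parse_adjacency` and the sample graph are assumptions made for this example, and the parser accepts any run of whitespace as the separator.

```python
def parse_adjacency(text):
    """Parse lines of the form '<node id> <neighbor id> <neighbor id> ...'."""
    adjacency = {}
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        node, *neighbors = (int(field) for field in fields)
        adjacency[node] = neighbors
    return adjacency


# Hypothetical graph: 1 -> 2, 1 -> 3, 2 -> 3, 3 -> 1.
sample = "1 2 3\n2 3\n3 1\n"
graph = parse_adjacency(sample)
```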
  • 33. IdWithValueTextOutputFormat (org.apache.giraph.io.formats.IdWithValueTextOutputFormat). For each node, print the Node ID and the Node Value:
        <NODE1 ID> <TAB> <NODE1 VALUE>
        <NODE2 ID> <TAB> <NODE2 VALUE>
        ...
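The out-degree and in-degree exercise named in the agenda can be sketched on top of these two formats. This is a plain-Python approximation, not the demo code from the repository; the function names are hypothetical, every vertex is assumed to have its own input line, and the output is sorted by id only for readability.

```python
from collections import Counter


def out_degrees(adjacency):
    """Out-degree needs no messages: it is the length of the neighbor list."""
    return {vertex: len(neighbors) for vertex, neighbors in adjacency.items()}


def in_degrees(adjacency):
    """In-degree takes two supersteps in the vertex-centric model."""
    # Superstep 0: every vertex sends one message along each out-edge.
    received = Counter(
        neighbor for neighbors in adjacency.values() for neighbor in neighbors
    )
    # Superstep 1: every vertex counts the messages it received.
    return {vertex: received[vertex] for vertex in adjacency}


def format_output(values):
    """Render '<node id><TAB><node value>' lines."""
    return "\n".join(f"{vertex}\t{values[vertex]}" for vertex in sorted(values))


# Hypothetical graph: 1 -> 2, 1 -> 3, 2 -> 3, 3 -> 1.
adjacency = {1: [2, 3], 2: [3], 3: [1]}
out_report = format_output(out_degrees(adjacency))
in_report = format_output(in_degrees(adjacency))
```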
  • 34. Demo. Demo code: https://github.com/manuelcoppotelli/giraph-demo
  • 35. Agenda: 1 Basic concepts • Graphs in the real world • Challenges on graphs • MapReduce • Giraph 2 Let’s start • Out-Degree & In-Degree 3 Get our hands dirty • Simple PageRank
  • 36. Google’s PageRank [4] • The success factor of Google’s search engine. [4] http://ilpubs.stanford.edu:8090/422/1/1999-66.pdf
  • 37. Google’s PageRank [4] • The success factor of Google’s search engine • A graph algorithm computing the “importance” of webpages. [4] http://ilpubs.stanford.edu:8090/422/1/1999-66.pdf
  • 38. Google’s PageRank [4] • The success factor of Google’s search engine • A graph algorithm computing the “importance” of webpages ◦ Important pages have a lot of links from other important pages. [4] http://ilpubs.stanford.edu:8090/422/1/1999-66.pdf
  • 39. Google’s PageRank [4] • The success factor of Google’s search engine • A graph algorithm computing the “importance” of webpages ◦ Important pages have a lot of links from other important pages ◦ Look at the structure of the underlying network. [4] http://ilpubs.stanford.edu:8090/422/1/1999-66.pdf
  • 40. Google’s PageRank [4] • The success factor of Google’s search engine • A graph algorithm computing the “importance” of webpages ◦ Important pages have a lot of links from other important pages ◦ Look at the structure of the underlying network • Ability to conduct web scale graph processing. [4] http://ilpubs.stanford.edu:8090/422/1/1999-66.pdf
  • 41. Simple PageRank • Recursive definition: $\mathrm{PageRank}_{i+1}(v) = \frac{1 - d}{N} + d \cdot \sum_{u \to v} \frac{\mathrm{PageRank}_i(u)}{O(u)}$
  • 42. Simple PageRank • Recursive definition: $\mathrm{PageRank}_{i+1}(v) = \frac{1 - d}{N} + d \cdot \sum_{u \to v} \frac{\mathrm{PageRank}_i(u)}{O(u)}$ • Where: ◦ d: damping factor; the fraction of a page's PageRank that is transferred to its neighbors. Usually 0.85 ◦ N: total number of pages ◦ O(u): out-degree of u; the total number of links on page u
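The recursive definition above translates almost directly into an iterative loop. The sketch below is a plain-Python illustration, not the Giraph implementation; it assumes every vertex has at least one out-link and appears as a key in the adjacency dictionary, and the uniform starting value 1/N and the fixed iteration count are choices made here.

```python
def pagerank(adjacency, d=0.85, iterations=50):
    """Iterate PageRank_{i+1}(v) = (1 - d) / N + d * sum_{u -> v} PageRank_i(u) / O(u)."""
    n = len(adjacency)
    ranks = {vertex: 1.0 / n for vertex in adjacency}  # assumed starting point
    for _ in range(iterations):
        incoming = {vertex: 0.0 for vertex in adjacency}
        for u, neighbors in adjacency.items():
            share = ranks[u] / len(neighbors)  # PageRank_i(u) / O(u)
            for v in neighbors:
                incoming[v] += share
        ranks = {
            vertex: (1 - d) / n + d * incoming[vertex] for vertex in adjacency
        }
    return ranks


# Hypothetical graph: 1 -> 2, 1 -> 3, 2 -> 3, 3 -> 1.
ranks = pagerank({1: [2, 3], 2: [3], 3: [1]})
```

With no dangling vertices the ranks keep summing to 1 at every iteration, and vertex 3, which is linked from both other vertices, ends up with the highest rank.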
  • 43. Simple PageRank Example. Vertex values: 1.0, 1.0, 1.0
  • 44. Simple PageRank Example. Vertex values: 1.0, 1.0, 1.0. Values on edges: 0.5, 0.5, 1, 1
  • 45. Simple PageRank Example. Vertex values: 1 · 0.85 + 0.15/3, 0.5 · 0.85 + 0.15/3, 1.5 · 0.85 + 0.15/3. Values on edges: 0.5, 0.5, 1, 1
  • 46. Simple PageRank Example. Vertex values: 0.43, 0.21, 0.64
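The update expressions in the example (1 · 0.85 + 0.15/3, 0.5 · 0.85 + 0.15/3 and 1.5 · 0.85 + 0.15/3) can be reproduced with one message-passing superstep. The slide's picture is not part of this transcript, so the edge set below (A links to B and C, B links to C, C links to A) is an assumption: it is one graph consistent with the messages 0.5, 0.5, 1, 1. The vertex names and the function name are also hypothetical.

```python
def pagerank_superstep(values, adjacency, d=0.85):
    """One superstep: send value / out-degree along each edge, then update."""
    n = len(values)
    inbox = {vertex: [] for vertex in values}
    for u, neighbors in adjacency.items():
        share = values[u] / len(neighbors)  # the message sent on each out-edge
        for v in neighbors:
            inbox[v].append(share)
    updated = {vertex: d * sum(inbox[vertex]) + (1 - d) / n for vertex in values}
    return updated, inbox


# Assumed edge set; see the note above.
adjacency = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
start = {"A": 1.0, "B": 1.0, "C": 1.0}
updated, inbox = pagerank_superstep(start, adjacency)
# A receives 1, B receives 0.5, C receives 0.5 + 1 = 1.5, so the updated
# values are approximately 0.9, 0.475 and 1.325.
```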
  • 47. JsonLongDoubleFloatDoubleVertexInputFormat (org.apache.giraph.io.formats.JsonLongDoubleFloatDoubleVertexInputFormat). Express both node and edge information using JSON arrays: [<vertex id>, <vertex value>, [ [<dest vertex id>, <edge value>], ... ] ]. Note: for more input/output formats, visit https://github.com/apache/giraph/tree/trunk/giraph-core/src/main/java/org/apache/giraph/io/formats
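A small Python sketch can show what input in this JSON layout looks like and how it maps to vertices and weighted edges. It is only an illustration of the layout, not the Giraph reader; `parse_json_vertices` and the sample data are hypothetical, and one JSON array per line is assumed.

```python
import json


def parse_json_vertices(text):
    """Parse lines of '[<vertex id>, <vertex value>, [[<dest id>, <edge value>], ...]]'."""
    vertices = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        vertex_id, value, edges = json.loads(line)
        vertices[int(vertex_id)] = (
            float(value),
            [(int(target), float(weight)) for target, weight in edges],
        )
    return vertices


# Hypothetical three-vertex input, one vertex per line.
sample = (
    "[0, 1.0, [[1, 1.0], [2, 1.0]]]\n"
    "[1, 1.0, [[2, 1.0]]]\n"
    "[2, 1.0, [[0, 1.0]]]\n"
)
vertices = parse_json_vertices(sample)
```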
  • 48. Demo. Demo code: https://github.com/manuelcoppotelli/giraph-demo
  • 49. Q? & A!
  • 50. Thank you for your attention. Contact us with any questions or problems. Demo code: https://github.com/manuelcoppotelli/giraph-demo Homework: https://github.com/manuelcoppotelli/giraph-homework