Benchmarking Tool for Graph Algorithms
IIIT-H Cloud Computing - Major Project
By:
Abhinaba Sarkar 201405616
Malavika Reddy 201201193
Yash Khandelwal 201302164
Nikita Kad 201330030
Description
● In computer science and mathematics, graphs are abstract data structures that model
structural relationships among objects. They are now widely used for data modeling in
application domains for which identifying relationship patterns, rules, and anomalies is useful.
● These domains include the web graph, social networks, etc. The ever-increasing size of
graph-structured data in these applications creates a critical need for scalable systems that
can process large amounts of it efficiently.
● The project aims to build a benchmarking tool that tests the performance of graph
algorithms such as BFS and PageRank on MapReduce, Giraph, GraphLab and Neo4j, and determines
which approach works better on which kind of graph.
Motivation
● Analyze the runtime of different types of graph algorithms on different
types of distributed systems.
● Performing computation on a graph data structure requires processing at
each node.
● Each node contains node-specific data as well as links (edges) to other
nodes, so a computation must traverse the graph, which takes a huge
amount of time on large graphs.
Approach
The BFS/SSSP algorithm is broken into two tasks:
● Map Task: In each Map task, we discover all the neighbors of the nodes currently in the queue
(nodes in the queue are color-coded GRAY) and add them to our graph.
● Reduce Task: In each Reduce task, we set the correct level of the nodes and update the graph.
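The two tasks above can be sketched in plain Python, with no Hadoop dependency. This is a minimal illustration, not the project's actual code: the record layout `(neighbors, level, color)`, the WHITE and BLACK colors, and the function names `bfs_map`, `bfs_reduce` and `bfs` are assumptions introduced here. Each pass of the `while` loop stands in for one MapReduce job.

```python
from collections import defaultdict

# Assumed color scheme: WHITE = unvisited, GRAY = in queue, BLACK = done.
WHITE, GRAY, BLACK = "WHITE", "GRAY", "BLACK"
INF = float("inf")

def bfs_map(node_id, node):
    """Expand a GRAY node: emit each neighbor as GRAY at level + 1."""
    neighbors, level, color = node
    if color == GRAY:
        for n in neighbors:
            # The adjacency list of n is unknown here; the reducer restores it.
            yield n, ([], level + 1, GRAY)
        color = BLACK  # this node has now been fully processed
    yield node_id, (neighbors, level, color)

def bfs_reduce(node_id, records):
    """Merge all records for one node: keep edges, min level, darkest color."""
    order = {WHITE: 0, GRAY: 1, BLACK: 2}
    neighbors, level, color = [], INF, WHITE
    for nbrs, lvl, col in records:
        if nbrs:
            neighbors = nbrs
        level = min(level, lvl)
        if order[col] > order[color]:
            color = col
    return node_id, (neighbors, level, color)

def bfs(graph, source):
    """Run map/reduce rounds until no GRAY node remains."""
    state = {v: (list(nbrs), 0 if v == source else INF,
                 GRAY if v == source else WHITE)
             for v, nbrs in graph.items()}
    rounds = 0
    while any(color == GRAY for _, _, color in state.values()):
        grouped = defaultdict(list)  # stands in for the shuffle phase
        for v, node in state.items():
            for key, rec in bfs_map(v, node):
                grouped[key].append(rec)
        state = dict(bfs_reduce(k, recs) for k, recs in grouped.items())
        rounds += 1
    return {v: lvl for v, (_, lvl, _) in state.items()}, rounds

levels, rounds = bfs({1: [2, 3], 2: [4], 3: [4], 4: []}, source=1)
```

In this sketch the number of rounds grows with the depth of the graph, because every round rewrites the whole graph state even though only the GRAY frontier does useful work.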
The PageRank algorithm is also broken into two steps:
● Map Task: Each page emits its neighbors and its current PageRank.
● Reduce Task: For each key (page), the new PageRank is calculated from the PageRank values
emitted in the map task.
○ PR(A) = (1 - d) + d * (PR(T1)/C(T1) + ... + PR(Tn)/C(Tn)), where T1 ... Tn are the pages
that link to A, C(P) is the cardinality (out-degree) of page P, and d is the damping
(“random URL”) factor.
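The PageRank map and reduce steps can be sketched the same way. This is an illustrative, single-process version under stated assumptions: the function names, the tagged `("links", ...)` / `("share", ...)` records, the initial rank of 1.0, the fixed iteration count and the damping value d = 0.85 are all choices made here, not values taken from the slides.

```python
from collections import defaultdict

def pagerank_map(page, rank, out_links):
    """Emit the page's link list, plus an equal rank share per out-link."""
    yield page, ("links", out_links)
    for target in out_links:
        yield target, ("share", rank / len(out_links))  # PR(T) / C(T)

def pagerank_reduce(page, values, d=0.85):
    """Apply PR(A) = (1 - d) + d * sum(PR(T)/C(T)) to the incoming shares."""
    out_links, total = [], 0.0
    for kind, payload in values:
        if kind == "links":
            out_links = payload
        else:
            total += payload
    return page, (1 - d) + d * total, out_links

def pagerank(graph, iterations=20, d=0.85):
    """Each loop iteration stands in for one MapReduce job."""
    ranks = {page: 1.0 for page in graph}  # assumed initial rank
    for _ in range(iterations):
        grouped = defaultdict(list)  # stands in for the shuffle phase
        for page, links in graph.items():
            for key, value in pagerank_map(page, ranks[page], links):
                grouped[key].append(value)
        ranks = {}
        for page, values in grouped.items():
            _, rank, _ = pagerank_reduce(page, values, d)
            ranks[page] = rank
    return ranks

ranks = pagerank({"A": ["B", "C"], "B": ["C"], "C": ["A"]})
```

With this form of the formula the ranks sum to the number of pages when every page has at least one out-link; pages without out-links would need extra handling that this sketch omits.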
Dijkstra:
● Map task: In each map task, neighbors are discovered and put into
the queue, color-coded GRAY.
● Reduce task: In each reduce task, we select the nodes according to
their shortest distances from the current node.
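A weighted variant of the same pattern can be sketched as follows. Strictly speaking, this is iterative edge relaxation (closer to Bellman-Ford) rather than classical priority-queue Dijkstra, which is hard to express as independent map tasks; the slides do not say which form the project used. The names `sssp_map`, `sssp_reduce` and `sssp`, the `active` flag (playing the role of GRAY) and the record layout are assumptions, and edge weights are assumed to be non-negative.

```python
from collections import defaultdict

INF = float("inf")

def sssp_map(node_id, edges, dist, active):
    """Pass the node through; if active (GRAY), offer dist + weight to neighbors."""
    yield node_id, ("node", edges, dist)
    if active:
        for neighbor, weight in edges.items():
            yield neighbor, ("offer", None, dist + weight)

def sssp_reduce(node_id, records):
    """Keep the shortest distance seen; re-activate the node if it improved."""
    edges, old_dist, best = {}, INF, INF
    for kind, payload, d in records:
        if kind == "node":
            edges, old_dist = payload, d
        best = min(best, d)
    return node_id, edges, best, best < old_dist

def sssp(graph, source):
    """Run map/reduce rounds until no node's distance changes."""
    state = {v: (edges, 0 if v == source else INF, v == source)
             for v, edges in graph.items()}
    rounds = 0
    while any(active for _, _, active in state.values()):
        grouped = defaultdict(list)  # stands in for the shuffle phase
        for v, (edges, dist, active) in state.items():
            for key, rec in sssp_map(v, edges, dist, active):
                grouped[key].append(rec)
        state = {}
        for key, recs in grouped.items():
            _, edges, dist, active = sssp_reduce(key, recs)
            state[key] = (edges, dist, active)
        rounds += 1
    return {v: dist for v, (_, dist, _) in state.items()}, rounds

dist, rounds = sssp({"s": {"a": 1, "b": 4}, "a": {"b": 2}, "b": {}}, "s")
```

Because a node is re-activated whenever a shorter path to it is found, a weighted graph can need more rounds than an unweighted BFS over the same edges.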
Approach contd.
Giraph and Hadoop
All the computations are done on a cluster of 2 nodes.
GraphLab
All the computations are performed on a single machine.
Applications
In today’s world, dynamic social graphs (such as
LinkedIn, Twitter and Facebook) are not feasible to
process on a single node. Therefore we need to
benchmark the runtime of different graph
algorithms on distributed systems.
Example graph: LinkedIn’s social graph
Complexity
● BFS: The complexity of the standard BFS algorithm is O(V+E), but because of
the read/write overhead in distributed computing, the order reaches
O(E*Depth).
● The same holds for Dijkstra’s algorithm, but the number of iterations will be
higher than for BFS.
● PageRank: The complexity of PageRank in a distributed system is
(No. of Nodes + No. of Relations) * Iterations
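To make these expressions concrete, here is a small cost-model sketch. It only counts records touched according to the formulas above and does not predict wall-clock time. The function names and the example graph size (1,000 nodes, 5,000 edges, depth 6, 10 iterations) are hypothetical values chosen for illustration, not measurements from the project.

```python
def sequential_bfs_cost(num_nodes, num_edges):
    """Standard BFS: each node and edge is visited once, O(V + E)."""
    return num_nodes + num_edges

def bfs_mapreduce_cost(num_edges, depth):
    """Distributed BFS: the edge set is re-read in every round, O(E * Depth)."""
    return num_edges * depth

def pagerank_cost(num_nodes, num_edges, iterations):
    """Distributed PageRank: (nodes + relations) * iterations."""
    return (num_nodes + num_edges) * iterations

# Hypothetical graph used only to compare the three expressions.
V, E, DEPTH, ITERATIONS = 1_000, 5_000, 6, 10

seq = sequential_bfs_cost(V, E)        # 6000
dist_bfs = bfs_mapreduce_cost(E, DEPTH)  # 30000
pr = pagerank_cost(V, E, ITERATIONS)   # 60000
```

For this hypothetical graph the distributed BFS touches five times as many records as the sequential one, and the gap widens as the depth grows.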
Benchmarking - Giraph
Nodes        BFS                  Dijkstra             Pagerank
1000         4 min 7.836 sec      3 min 5.655 sec      5 min 12.111 sec
1 million    10 min 11.443 sec    11 min 0.05 sec      16 min 8.652 sec
Benchmarking - GraphLab
Nodes        Page-Rank            Dijkstra
1000         6.029 sec            4.852 sec
10,000       20.154 sec           13.029 sec
1 million    1 min 11.124 sec     1 min 10.576 sec
Benchmarking - Hadoop
Nodes        BFS                  Dijkstra             Pagerank
1000         4 min 7.836 sec      3 min 5.655 sec      5 min 12.111 sec
1 million    10 min 11.443 sec    11 min 0.05 sec      16 min 8.652 sec
BFS and Dijkstra’s runtime depend on the depth of the input graph.
Problems we faced
● Poor locality of memory access.
● Very little work per vertex.
● Changing degree of parallelism.
● Running over many machines makes the problem worse
Conclusion and Future Work
● Although GraphLab is fast, it is constrained by memory: it needs enough memory to hold the
edges, and their associated values, of any single vertex in the graph.
● The experimental results show that the time taken by the PageRank algorithm is directly
proportional to the number of relations in the graph when the number of nodes and iterations
are constant. This explains the huge difference in time.
● The runtime of BFS is directly proportional to the depth of the graph: the greater the depth,
the more iterations are needed and hence the more time is taken.
Future Work:
Reading the input graph from a file adds a huge overhead, because files are read and written in
each iteration. If the graph and its properties are stored in a database instead, this read/write
overhead is avoided and the query time is reduced. We therefore plan to add database support.

FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
 
PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)
 
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdfFIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
 
JMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and GrafanaJMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and Grafana
 
How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...
 
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
 

Benchmarking Tool for Graph Algorithms

  • 1. Benchmarking Tool for Graph Algorithms. IIIT-H Cloud Computing, Major Project. By: Abhinaba Sarkar (201405616), Malavika Reddy (201201193), Yash Khandelwal (201302164), Nikita Kad (201330030)
  • 2. Description ● In computer science and mathematics, graphs are abstract data structures that model structural relationships among objects. They are now widely used for data modeling in application domains where identifying relationship patterns, rules, and anomalies is useful. ● These domains include the web graph, social networks, etc. The ever-increasing size of graph-structured data in these applications creates a critical need for scalable systems that can process large amounts of it efficiently. ● The project aims to build a benchmarking tool that tests the performance of graph algorithms such as BFS and PageRank on MapReduce, Giraph, GraphLab and Neo4j, and determines which approach works better on which kinds of graphs.
  • 3. Motivation ● Analyze the runtime of different types of graph algorithms on different types of distributed systems. ● Performing computation on a graph data structure requires processing at each node. ● Each node contains node-specific data as well as links (edges) to other nodes, so computation must traverse the graph, which can take a huge amount of time.
  • 4. Approach The BFS/SSSP algorithm is broken into 2 tasks: ● Map task: In each map task, we discover all the neighbors of the node currently in the queue (nodes in the queue are color-coded GRAY) and add them to our graph. ● Reduce task: In each reduce task, we set the correct level of the nodes and update the graph. The PageRank algorithm is also broken into 2 steps: ● Map task: Each page emits its neighbours and its current PageRank. ● Reduce task: For each key (page), the new PageRank is calculated from the PageRank values emitted in the map task. ○ PR(A) = (1-d) + d(PR(T1)/C(T1) + ... + PR(Tn)/C(Tn)), where T1, ..., Tn are the pages linking to A, C(P) is the cardinality (out-degree) of page P, and d is the damping (“random URL”) factor. Dijkstra: ● Map task: In each map task, neighbors are discovered and put into the queue with color code GRAY. ● Reduce task: In each reduce task, we select the nodes according to their shortest distances from the current node.
  • 5. Approach (contd.) ● Giraph and Hadoop: all computations are run on a cluster of 2 nodes. ● GraphLab: all computations are run on a single machine.
  • 6. Applications Today’s dynamic social graphs (such as LinkedIn, Twitter and Facebook) cannot feasibly be processed on a single node. Therefore we need to benchmark the runtime of different graph algorithms on distributed systems. Example graph: LinkedIn’s social graph.
  • 7. Complexity ● BFS: The complexity of the standard BFS algorithm is O(V+E), but because of the read/write overhead in distributed computing, the order reaches O(E*Depth). ● The same holds for Dijkstra’s algorithm, but the number of iterations will be higher than for BFS. ● PageRank: The complexity of PageRank in a distributed system is O((No. of Nodes + No. of Relations) * Iterations).
  • 8. Benchmarking - Giraph ● BFS: 1000 nodes: 4 min 7.836 sec; 1 million nodes: 10 min 11.443 sec. ● Dijkstra: 1000 nodes: 3 min 5.655 sec; 1 million nodes: 11 min 0.05 sec. ● PageRank: 1000 nodes: 5 min 12.111 sec; 1 million nodes: 16 min 8.652 sec.
  • 9. Benchmarking - GraphLab ● PageRank: 1000 nodes: 6.029 sec; 10,000 nodes: 20.154 sec; 1 million nodes: 1 min 11.124 sec. ● Dijkstra: 1000 nodes: 4.852 sec; 10,000 nodes: 13.029 sec; 1 million nodes: 1 min 10.576 sec.
  • 10. Benchmarking - Hadoop ● BFS: 1000 nodes: 4 min 7.836 sec; 1 million nodes: 10 min 11.443 sec. ● Dijkstra: 1000 nodes: 3 min 5.655 sec; 1 million nodes: 11 min 0.05 sec. ● PageRank: 1000 nodes: 5 min 12.111 sec; 1 million nodes: 16 min 8.652 sec. The runtimes of BFS and Dijkstra depend on the depth of the input graph.
  • 11. Problems we faced ● Poor locality of memory access. ● Very little work per vertex. ● Changing degree of parallelism. ● Running over many machines makes the problem worse
  • 12. Conclusion and Future Work ● Although GraphLab is fast, it is constrained by memory: it requires enough memory to hold the edges, and their associated values, of any single vertex in the graph. ● The experimental results show that the time taken by the PageRank algorithm is directly proportional to the number of relations in the graph when the number of nodes and iterations are constant. This explains the huge difference in time. ● The runtime of BFS is directly proportional to the depth of the graph: the greater the depth, the more iterations are needed and hence the more time is taken. Future work: Reading the input graph from a file adds a huge overhead of reading and writing files in each iteration. If the graph and its properties are stored in a database instead, this read/write overhead will be removed and the query time will be reduced, so we plan to add a database to the tool.
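
The BFS map and reduce tasks described in slide 4 can be sketched in plain Python. This is only an illustrative, in-memory simulation, not the project's Hadoop code: the record layout `(neighbours, level, color)`, the WHITE and BLACK colors, and the function names `bfs_map`, `bfs_reduce` and `bfs` are all assumptions made for the example.

```python
from collections import defaultdict

INF = float("inf")
# Assumed color order: WHITE (unvisited) < GRAY (in queue) < BLACK (done).
RANK = {"WHITE": 0, "GRAY": 1, "BLACK": 2}


def bfs_map(node_id, node):
    """Expand a GRAY node: emit each neighbour as GRAY, one level deeper."""
    neighbours, level, color = node
    if color == "GRAY":
        for n in neighbours:
            # The neighbour's own edges are unknown here; the reducer restores them.
            yield n, ([], level + 1, "GRAY")
        color = "BLACK"  # this node has now been fully processed
    yield node_id, (neighbours, level, color)


def bfs_reduce(records):
    """Merge all records of one node: keep its edges, smallest level, darkest color."""
    neighbours, level, color = [], INF, "WHITE"
    for nbrs, lvl, col in records:
        if nbrs:
            neighbours = nbrs
        level = min(level, lvl)
        if RANK[col] > RANK[color]:
            color = col
    return neighbours, level, color


def bfs(graph, source):
    """Run map/reduce rounds until no GRAY node is left; return levels and round count."""
    state = {
        n: (list(nbrs), 0 if n == source else INF, "GRAY" if n == source else "WHITE")
        for n, nbrs in graph.items()
    }
    rounds = 0
    while any(color == "GRAY" for _, _, color in state.values()):
        grouped = defaultdict(list)  # stands in for the shuffle phase
        for node_id, node in state.items():
            for key, value in bfs_map(node_id, node):
                grouped[key].append(value)
        state = {key: bfs_reduce(values) for key, values in grouped.items()}
        rounds += 1
    return {n: lvl for n, (_, lvl, _) in state.items()}, rounds


levels, rounds = bfs({1: [2, 3], 2: [4], 3: [4], 4: []}, source=1)
print(levels, rounds)
```

Every round re-emits the whole graph, and the number of rounds grows with the depth reached from the source, which is in line with the O(E*Depth) behaviour noted in slide 7.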
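
The PageRank map and reduce steps from slide 4 can be sketched in the same way. This is a minimal single-process illustration under stated assumptions: every page appears as a key of the input dictionary, all ranks start at 1.0, d defaults to 0.85, and the names `pagerank_map`, `pagerank_reduce` and `pagerank` are hypothetical rather than taken from the project.

```python
from collections import defaultdict


def pagerank_map(page, rank, out_links):
    """Emit this page's share PR(T)/C(T) to every page it links to."""
    yield page, 0.0  # keeps pages without incoming links in the output
    for target in out_links:
        yield target, rank / len(out_links)


def pagerank_reduce(contributions, d=0.85):
    """Apply PR(A) = (1 - d) + d * (PR(T1)/C(T1) + ... + PR(Tn)/C(Tn))."""
    return (1 - d) + d * sum(contributions)


def pagerank(graph, iterations=20, d=0.85):
    """Run a fixed number of map/reduce rounds over an adjacency-list graph."""
    ranks = {page: 1.0 for page in graph}
    for _ in range(iterations):
        grouped = defaultdict(list)  # stands in for the shuffle phase
        for page, out_links in graph.items():
            for key, value in pagerank_map(page, ranks[page], out_links):
                grouped[key].append(value)
        ranks = {page: pagerank_reduce(values, d) for page, values in grouped.items()}
    return ranks


ranks = pagerank({"A": ["B"], "B": ["A"], "C": ["A"]})
print(ranks)
```

Each round touches every node once and every relation once, which matches the (No. of Nodes + No. of Relations) * Iterations cost given in slide 7.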