SlideShare a Scribd company logo
1 of 8
Download to read offline
BenchMarking Tool for
Graph Algorithms
IIIT-H Cloud Computing - Major Project
By:
Abhinaba Sarkar 201405616
Malavika Reddy 201201193
Yash Khandelwal 201302164
Nikita Kad 201330030
Description
● In computer science and mathematics, graphs are abstract data structures that model
structural relationships among objects. They are now widely used for data modeling in
application domains for which identifying relationship patterns, rules, and anomalies is useful.
● These domains include the web graph, social networks,etc. The ever-increasing size of graph-
structured data for these applications creates a critical need for scalable systems that can
process large amounts of it efficiently.
● The project aims at making a benchmarking tool for testing the performance of graph
algorithms like BFS, Pagerank,etc. with MapReduce, Giraph, GraphLab and Neo4j and testing
which approach works better on what kind of graphs.
Motivation
● Analyze the runtime of different types of graph algorithms on different
types of distributed systems.
● Performing computation on a graph data structure requires processing at
each node.
● Each node contains node-specific data as well as links (edges) to other
nodes. So computation must traverse the graph which will take a huge
amount of time.
Approach
The BFS/SSSP algorithm is broken in 2 tasks:
● Map Task:In each Map task, we discover all the neighbors of the node currently in queue (we
used color encoding GRAY for nodes in queue) and add them to our graph.
● Reduce Task:In each Reduce task, we set the correct level of the nodes and update the graph.
The pagerank algorithm is also broken in 2 steps:
● Map Task: Each page emit its neighbours and current pagerank.
● Reduce Task: For each key(page) new page rank is calculated using pagerank emitted in the
map task.
○ PR(A)=(1-d) + d(PR(T1)/C(T1) + ... +PR(Tn)/C(Tn)) Where - C(P) is the cardinality (out-
degree) of page P, d is the damping (“random URL”) factor.
Dijkstra:
● Map task : In each of the map tasks, neighbors are discovered and put into
the queue with color coding gray.
● Reduce task : In each of the reduce tasks, we select the nodes according to
the shortest distances from the current node.
Applications
In today’s world, dynamic social graphs (like:
linkedin, twitter and facebook) are not feasible to
process in single node. Therefore we need to
benchmark the runtime of different graph
algorithms in distributed system.
Example graph: LinkedIn’s social graph
Comlexity
● BFS: The complexity of standard BFS algorithm is O(V+E) but because of
the overhead of read/write in distributed computing, the order reaches O
(E*Depth).
● Similar is the case for Dijkstra’s algorithm. But number of iterations will be
higher than BFS.
● Page Rank: The Complexity of pagerank in distributed system is –
(No. of Node + No. of Relations)*Iterations
Graph-Lab:
Nodes Time
1000 6.029 sec
10,000 20.154 sec
1 million 1 min 11.124
sec
Nodes Time
1000 4.852 sec
10,000 13.029 sec
1 million 1 min 10.576sec
Page-Rank
Dijkstra
Benchmarking
Conclusion and Future Work
From the experimental results, it is seen that the time taken for pagerank algorithm is directly
proportional to the number of relations in the graph when the number of nodes and iterations are
constant. This explains the huge difference in time
The runtime of BFS is directly proportional to the depth of the graph. So, greater the depth, more will
be the number of iterations and hence more time.
Future Work:
Taking the input graph from file adds a huge overhead of reading and writing to files in each
iteration, so if somehow we can store the graph and its properties in a Database, the read/write
overhead will be gone and the query time will be reduced. So,we plan to include Database in it.

More Related Content

What's hot

Dynamic Mapping of Raster Data (IV 2009)
Dynamic Mapping of Raster Data (IV 2009)Dynamic Mapping of Raster Data (IV 2009)
Dynamic Mapping of Raster Data (IV 2009)Matthias Trapp
 
Analysis of Impact of Graph Theory in Computer Application
Analysis of Impact of Graph Theory in Computer ApplicationAnalysis of Impact of Graph Theory in Computer Application
Analysis of Impact of Graph Theory in Computer ApplicationIRJET Journal
 
ON TRAFFIC-AWARE PARTITION AND AGGREGATION IN MAPREDUCE FOR BIG DATA APPLICAT...
ON TRAFFIC-AWARE PARTITION AND AGGREGATION IN MAPREDUCE FOR BIG DATA APPLICAT...ON TRAFFIC-AWARE PARTITION AND AGGREGATION IN MAPREDUCE FOR BIG DATA APPLICAT...
ON TRAFFIC-AWARE PARTITION AND AGGREGATION IN MAPREDUCE FOR BIG DATA APPLICAT...I3E Technologies
 
System architecture
System architectureSystem architecture
System architectureSanjay Raj
 
How Rough Is Your Runway?
How Rough Is Your Runway? How Rough Is Your Runway?
How Rough Is Your Runway? Safe Software
 
Meta-MapReduce- A Technique for Reducing Communication in MapReduce Computations
Meta-MapReduce- A Technique for Reducing Communication in MapReduce ComputationsMeta-MapReduce- A Technique for Reducing Communication in MapReduce Computations
Meta-MapReduce- A Technique for Reducing Communication in MapReduce ComputationsShantanu Sharma
 
Improvement of Spatial Data Quality Using the Data Conflation
Improvement of Spatial Data Quality Using the Data ConflationImprovement of Spatial Data Quality Using the Data Conflation
Improvement of Spatial Data Quality Using the Data ConflationBeniamino Murgante
 
5 spatial data editing
5 spatial data editing5 spatial data editing
5 spatial data editinganita bodke
 
Par add shared ifc parameters
Par add shared ifc parametersPar add shared ifc parameters
Par add shared ifc parametersMenno Mekes
 
Rosaic: A Round-wise Fair Scheduling Approach for Mobile Clouds Based on Task...
Rosaic: A Round-wise Fair Scheduling Approach for Mobile Clouds Based on Task...Rosaic: A Round-wise Fair Scheduling Approach for Mobile Clouds Based on Task...
Rosaic: A Round-wise Fair Scheduling Approach for Mobile Clouds Based on Task...Mahmud Hossain
 
Using R to Visualize Spatial Data: R as GIS - Guy Lansley
Using R to Visualize Spatial Data: R as GIS - Guy LansleyUsing R to Visualize Spatial Data: R as GIS - Guy Lansley
Using R to Visualize Spatial Data: R as GIS - Guy LansleyGuy Lansley
 
Dr Richard Fry - Using R as a GIS
Dr Richard Fry - Using R as a GISDr Richard Fry - Using R as a GIS
Dr Richard Fry - Using R as a GISShaun Lewis
 
Synthetic Data Generation using exponential random Graph modeling
Synthetic Data Generation using exponential random Graph modelingSynthetic Data Generation using exponential random Graph modeling
Synthetic Data Generation using exponential random Graph modelingGraph-TA
 
Principles of Computing Resources Planning in Cloud-Based Problem Solving Env...
Principles of Computing Resources Planning in Cloud-Based Problem Solving Env...Principles of Computing Resources Planning in Cloud-Based Problem Solving Env...
Principles of Computing Resources Planning in Cloud-Based Problem Solving Env...Ural-PDC
 
Graphalytics: A big data benchmark for graph-processing platforms
Graphalytics: A big data benchmark for graph-processing platformsGraphalytics: A big data benchmark for graph-processing platforms
Graphalytics: A big data benchmark for graph-processing platformsGraph-TA
 

What's hot (20)

Dynamic Mapping of Raster Data (IV 2009)
Dynamic Mapping of Raster Data (IV 2009)Dynamic Mapping of Raster Data (IV 2009)
Dynamic Mapping of Raster Data (IV 2009)
 
3D Analyst
3D Analyst3D Analyst
3D Analyst
 
Analysis of Impact of Graph Theory in Computer Application
Analysis of Impact of Graph Theory in Computer ApplicationAnalysis of Impact of Graph Theory in Computer Application
Analysis of Impact of Graph Theory in Computer Application
 
ON TRAFFIC-AWARE PARTITION AND AGGREGATION IN MAPREDUCE FOR BIG DATA APPLICAT...
ON TRAFFIC-AWARE PARTITION AND AGGREGATION IN MAPREDUCE FOR BIG DATA APPLICAT...ON TRAFFIC-AWARE PARTITION AND AGGREGATION IN MAPREDUCE FOR BIG DATA APPLICAT...
ON TRAFFIC-AWARE PARTITION AND AGGREGATION IN MAPREDUCE FOR BIG DATA APPLICAT...
 
System architecture
System architectureSystem architecture
System architecture
 
How Rough Is Your Runway?
How Rough Is Your Runway? How Rough Is Your Runway?
How Rough Is Your Runway?
 
Meta-MapReduce- A Technique for Reducing Communication in MapReduce Computations
Meta-MapReduce- A Technique for Reducing Communication in MapReduce ComputationsMeta-MapReduce- A Technique for Reducing Communication in MapReduce Computations
Meta-MapReduce- A Technique for Reducing Communication in MapReduce Computations
 
Iccsa stankuteha180611
Iccsa stankuteha180611Iccsa stankuteha180611
Iccsa stankuteha180611
 
Improvement of Spatial Data Quality Using the Data Conflation
Improvement of Spatial Data Quality Using the Data ConflationImprovement of Spatial Data Quality Using the Data Conflation
Improvement of Spatial Data Quality Using the Data Conflation
 
5 spatial data editing
5 spatial data editing5 spatial data editing
5 spatial data editing
 
Par add shared ifc parameters
Par add shared ifc parametersPar add shared ifc parameters
Par add shared ifc parameters
 
Rosaic: A Round-wise Fair Scheduling Approach for Mobile Clouds Based on Task...
Rosaic: A Round-wise Fair Scheduling Approach for Mobile Clouds Based on Task...Rosaic: A Round-wise Fair Scheduling Approach for Mobile Clouds Based on Task...
Rosaic: A Round-wise Fair Scheduling Approach for Mobile Clouds Based on Task...
 
Resume
ResumeResume
Resume
 
Using R to Visualize Spatial Data: R as GIS - Guy Lansley
Using R to Visualize Spatial Data: R as GIS - Guy LansleyUsing R to Visualize Spatial Data: R as GIS - Guy Lansley
Using R to Visualize Spatial Data: R as GIS - Guy Lansley
 
Dr Richard Fry - Using R as a GIS
Dr Richard Fry - Using R as a GISDr Richard Fry - Using R as a GIS
Dr Richard Fry - Using R as a GIS
 
Synthetic Data Generation using exponential random Graph modeling
Synthetic Data Generation using exponential random Graph modelingSynthetic Data Generation using exponential random Graph modeling
Synthetic Data Generation using exponential random Graph modeling
 
Maps with leafletR
Maps with leafletRMaps with leafletR
Maps with leafletR
 
BarnieMAT
BarnieMATBarnieMAT
BarnieMAT
 
Principles of Computing Resources Planning in Cloud-Based Problem Solving Env...
Principles of Computing Resources Planning in Cloud-Based Problem Solving Env...Principles of Computing Resources Planning in Cloud-Based Problem Solving Env...
Principles of Computing Resources Planning in Cloud-Based Problem Solving Env...
 
Graphalytics: A big data benchmark for graph-processing platforms
Graphalytics: A big data benchmark for graph-processing platformsGraphalytics: A big data benchmark for graph-processing platforms
Graphalytics: A big data benchmark for graph-processing platforms
 

Viewers also liked

10 motivi per scegliere emc vspex con xtremio
10 motivi per scegliere emc vspex con xtremio10 motivi per scegliere emc vspex con xtremio
10 motivi per scegliere emc vspex con xtremioMaticmind
 
BioJET előállítás növényi olajokból
BioJET előállítás növényi olajokbólBioJET előállítás növényi olajokból
BioJET előállítás növényi olajokbólSnowcrowmisi
 
Brian Kerr & Associates - Marketing
Brian Kerr & Associates - MarketingBrian Kerr & Associates - Marketing
Brian Kerr & Associates - MarketingBrian Kerr
 
2012 taiwan mission
2012 taiwan mission2012 taiwan mission
2012 taiwan missionStephenyesu
 
Cyber Monday Hour by Hour
Cyber Monday Hour by HourCyber Monday Hour by Hour
Cyber Monday Hour by HourEbates.com
 
Dalayil khatm e nabuwat maa radd e qadyaniyat
Dalayil khatm e nabuwat maa radd e qadyaniyatDalayil khatm e nabuwat maa radd e qadyaniyat
Dalayil khatm e nabuwat maa radd e qadyaniyatsunninews92
 
التقنيات المساندة لذوي الإحتياجات الخاصة
التقنيات المساندة لذوي الإحتياجات الخاصة التقنيات المساندة لذوي الإحتياجات الخاصة
التقنيات المساندة لذوي الإحتياجات الخاصة raz20
 

Viewers also liked (18)

Innovative lesson-plan
Innovative lesson-planInnovative lesson-plan
Innovative lesson-plan
 
Taiwo Oluwadare
Taiwo OluwadareTaiwo Oluwadare
Taiwo Oluwadare
 
10 motivi per scegliere emc vspex con xtremio
10 motivi per scegliere emc vspex con xtremio10 motivi per scegliere emc vspex con xtremio
10 motivi per scegliere emc vspex con xtremio
 
BioJET előállítás növényi olajokból
BioJET előállítás növényi olajokbólBioJET előállítás növényi olajokból
BioJET előállítás növényi olajokból
 
דוח סופי
דוח סופידוח סופי
דוח סופי
 
Final PPP
Final PPPFinal PPP
Final PPP
 
Brian Kerr & Associates - Marketing
Brian Kerr & Associates - MarketingBrian Kerr & Associates - Marketing
Brian Kerr & Associates - Marketing
 
Ashrafi anthem
Ashrafi anthemAshrafi anthem
Ashrafi anthem
 
2012 taiwan mission
2012 taiwan mission2012 taiwan mission
2012 taiwan mission
 
Tik kelompok2 bab 3
Tik kelompok2 bab 3Tik kelompok2 bab 3
Tik kelompok2 bab 3
 
Untitled Presentation
Untitled PresentationUntitled Presentation
Untitled Presentation
 
Cyber Monday Hour by Hour
Cyber Monday Hour by HourCyber Monday Hour by Hour
Cyber Monday Hour by Hour
 
Sk1 kd1-1-kebutuhan-manusia
Sk1 kd1-1-kebutuhan-manusiaSk1 kd1-1-kebutuhan-manusia
Sk1 kd1-1-kebutuhan-manusia
 
Class 9
Class 9 Class 9
Class 9
 
Android platform
Android platform Android platform
Android platform
 
The history of the aqualand
The history of the aqualandThe history of the aqualand
The history of the aqualand
 
Dalayil khatm e nabuwat maa radd e qadyaniyat
Dalayil khatm e nabuwat maa radd e qadyaniyatDalayil khatm e nabuwat maa radd e qadyaniyat
Dalayil khatm e nabuwat maa radd e qadyaniyat
 
التقنيات المساندة لذوي الإحتياجات الخاصة
التقنيات المساندة لذوي الإحتياجات الخاصة التقنيات المساندة لذوي الإحتياجات الخاصة
التقنيات المساندة لذوي الإحتياجات الخاصة
 

Similar to Benchmarking tool for graph algorithms

STIC-D: algorithmic techniques for efficient parallel pagerank computation on...
STIC-D: algorithmic techniques for efficient parallel pagerank computation on...STIC-D: algorithmic techniques for efficient parallel pagerank computation on...
STIC-D: algorithmic techniques for efficient parallel pagerank computation on...Subhajit Sahu
 
Streaming Python on Hadoop
Streaming Python on HadoopStreaming Python on Hadoop
Streaming Python on HadoopVivian S. Zhang
 
Benchmarking tool for graph algorithms
Benchmarking tool for graph algorithmsBenchmarking tool for graph algorithms
Benchmarking tool for graph algorithmsYash Khandelwal
 
Scheduling Algorithm Based Simulator for Resource Allocation Task in Cloud Co...
Scheduling Algorithm Based Simulator for Resource Allocation Task in Cloud Co...Scheduling Algorithm Based Simulator for Resource Allocation Task in Cloud Co...
Scheduling Algorithm Based Simulator for Resource Allocation Task in Cloud Co...IRJET Journal
 
Parallel Data Processing with MapReduce: A Survey
Parallel Data Processing with MapReduce: A SurveyParallel Data Processing with MapReduce: A Survey
Parallel Data Processing with MapReduce: A SurveyKyong-Ha Lee
 
Chapter 3 principles of parallel algorithm design
Chapter 3   principles of parallel algorithm designChapter 3   principles of parallel algorithm design
Chapter 3 principles of parallel algorithm designDenisAkbar1
 
Map reduce programming model to solve graph problems
Map reduce programming model to solve graph problemsMap reduce programming model to solve graph problems
Map reduce programming model to solve graph problemsNishant Gandhi
 
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...DataWorks Summit
 
Mapreduce script
Mapreduce scriptMapreduce script
Mapreduce scriptHaripritha
 
MapReduceAlgorithms.ppt
MapReduceAlgorithms.pptMapReduceAlgorithms.ppt
MapReduceAlgorithms.pptCheeWeiTan10
 
user_defined_functions_forinterpolation
user_defined_functions_forinterpolationuser_defined_functions_forinterpolation
user_defined_functions_forinterpolationsushanth tiruvaipati
 
Classified 3d Model Retrieval Based on Cascaded Fusion of Local Descriptors
Classified 3d Model Retrieval Based on Cascaded Fusion of Local Descriptors  Classified 3d Model Retrieval Based on Cascaded Fusion of Local Descriptors
Classified 3d Model Retrieval Based on Cascaded Fusion of Local Descriptors ijcga
 
Optimal Chain Matrix Multiplication Big Data Perspective
Optimal Chain Matrix Multiplication Big Data PerspectiveOptimal Chain Matrix Multiplication Big Data Perspective
Optimal Chain Matrix Multiplication Big Data Perspectiveপল্লব রায়
 
How to Automate CAD & GIS Integration
How to Automate CAD & GIS IntegrationHow to Automate CAD & GIS Integration
How to Automate CAD & GIS IntegrationSafe Software
 
A Subgraph Pattern Search over Graph Databases
A Subgraph Pattern Search over Graph DatabasesA Subgraph Pattern Search over Graph Databases
A Subgraph Pattern Search over Graph DatabasesIJMER
 
Introduction to Data Structures Sorting and searching
Introduction to Data Structures Sorting and searchingIntroduction to Data Structures Sorting and searching
Introduction to Data Structures Sorting and searchingMvenkatarao
 

Similar to Benchmarking tool for graph algorithms (20)

STIC-D: algorithmic techniques for efficient parallel pagerank computation on...
STIC-D: algorithmic techniques for efficient parallel pagerank computation on...STIC-D: algorithmic techniques for efficient parallel pagerank computation on...
STIC-D: algorithmic techniques for efficient parallel pagerank computation on...
 
Streaming Python on Hadoop
Streaming Python on HadoopStreaming Python on Hadoop
Streaming Python on Hadoop
 
Benchmarking tool for graph algorithms
Benchmarking tool for graph algorithmsBenchmarking tool for graph algorithms
Benchmarking tool for graph algorithms
 
Scheduling Algorithm Based Simulator for Resource Allocation Task in Cloud Co...
Scheduling Algorithm Based Simulator for Resource Allocation Task in Cloud Co...Scheduling Algorithm Based Simulator for Resource Allocation Task in Cloud Co...
Scheduling Algorithm Based Simulator for Resource Allocation Task in Cloud Co...
 
Main map reduce
Main map reduceMain map reduce
Main map reduce
 
Parallel Data Processing with MapReduce: A Survey
Parallel Data Processing with MapReduce: A SurveyParallel Data Processing with MapReduce: A Survey
Parallel Data Processing with MapReduce: A Survey
 
Chap3 slides
Chap3 slidesChap3 slides
Chap3 slides
 
Chapter 3 principles of parallel algorithm design
Chapter 3   principles of parallel algorithm designChapter 3   principles of parallel algorithm design
Chapter 3 principles of parallel algorithm design
 
Map reduce programming model to solve graph problems
Map reduce programming model to solve graph problemsMap reduce programming model to solve graph problems
Map reduce programming model to solve graph problems
 
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
 
Mapreduce script
Mapreduce scriptMapreduce script
Mapreduce script
 
MapReduceAlgorithms.ppt
MapReduceAlgorithms.pptMapReduceAlgorithms.ppt
MapReduceAlgorithms.ppt
 
Introduction to MapReduce
Introduction to MapReduceIntroduction to MapReduce
Introduction to MapReduce
 
user_defined_functions_forinterpolation
user_defined_functions_forinterpolationuser_defined_functions_forinterpolation
user_defined_functions_forinterpolation
 
Classified 3d Model Retrieval Based on Cascaded Fusion of Local Descriptors
Classified 3d Model Retrieval Based on Cascaded Fusion of Local Descriptors  Classified 3d Model Retrieval Based on Cascaded Fusion of Local Descriptors
Classified 3d Model Retrieval Based on Cascaded Fusion of Local Descriptors
 
Optimal Chain Matrix Multiplication Big Data Perspective
Optimal Chain Matrix Multiplication Big Data PerspectiveOptimal Chain Matrix Multiplication Big Data Perspective
Optimal Chain Matrix Multiplication Big Data Perspective
 
How to Automate CAD & GIS Integration
How to Automate CAD & GIS IntegrationHow to Automate CAD & GIS Integration
How to Automate CAD & GIS Integration
 
A Subgraph Pattern Search over Graph Databases
A Subgraph Pattern Search over Graph DatabasesA Subgraph Pattern Search over Graph Databases
A Subgraph Pattern Search over Graph Databases
 
Introduction to Data Structures Sorting and searching
Introduction to Data Structures Sorting and searchingIntroduction to Data Structures Sorting and searching
Introduction to Data Structures Sorting and searching
 
Pregel
PregelPregel
Pregel
 

Recently uploaded

GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxKatpro Technologies
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)wesley chun
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfsudhanshuwaghmare1
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Enterprise Knowledge
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Igalia
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsJoaquim Jorge
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024The Digital Insurer
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc
 

Recently uploaded (20)

GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 

Benchmarking tool for graph algorithms

  • 1. BenchMarking Tool for Graph Algorithms IIIT-H Cloud Computing - Major Project By: Abhinaba Sarkar 201405616 Malavika Reddy 201201193 Yash Khandelwal 201302164 Nikita Kad 201330030
  • 2. Description ● In computer science and mathematics, graphs are abstract data structures that model structural relationships among objects. They are now widely used for data modeling in application domains for which identifying relationship patterns, rules, and anomalies is useful. ● These domains include the web graph, social networks,etc. The ever-increasing size of graph- structured data for these applications creates a critical need for scalable systems that can process large amounts of it efficiently. ● The project aims at making a benchmarking tool for testing the performance of graph algorithms like BFS, Pagerank,etc. with MapReduce, Giraph, GraphLab and Neo4j and testing which approach works better on what kind of graphs.
  • 3. Motivation ● Analyze the runtime of different types of graph algorithms on different types of distributed systems. ● Performing computation on a graph data structure requires processing at each node. ● Each node contains node-specific data as well as links (edges) to other nodes. So computation must traverse the graph which will take a huge amount of time.
  • 4. Approach The BFS/SSSP algorithm is broken in 2 tasks: ● Map Task:In each Map task, we discover all the neighbors of the node currently in queue (we used color encoding GRAY for nodes in queue) and add them to our graph. ● Reduce Task:In each Reduce task, we set the correct level of the nodes and update the graph. The pagerank algorithm is also broken in 2 steps: ● Map Task: Each page emit its neighbours and current pagerank. ● Reduce Task: For each key(page) new page rank is calculated using pagerank emitted in the map task. ○ PR(A)=(1-d) + d(PR(T1)/C(T1) + ... +PR(Tn)/C(Tn)) Where - C(P) is the cardinality (out- degree) of page P, d is the damping (“random URL”) factor. Dijkstra: ● Map task : In each of the map tasks, neighbors are discovered and put into the queue with color coding gray. ● Reduce task : In each of the reduce tasks, we select the nodes according to the shortest distances from the current node.
  • 5. Applications In today’s world, dynamic social graphs (like: linkedin, twitter and facebook) are not feasible to process in single node. Therefore we need to benchmark the runtime of different graph algorithms in distributed system. Example graph: LinkedIn’s social graph
  • 6. Comlexity ● BFS: The complexity of standard BFS algorithm is O(V+E) but because of the overhead of read/write in distributed computing, the order reaches O (E*Depth). ● Similar is the case for Dijkstra’s algorithm. But number of iterations will be higher than BFS. ● Page Rank: The Complexity of pagerank in distributed system is – (No. of Node + No. of Relations)*Iterations
  • 7. Graph-Lab: Nodes Time 1000 6.029 sec 10,000 20.154 sec 1 million 1 min 11.124 sec Nodes Time 1000 4.852 sec 10,000 13.029 sec 1 million 1 min 10.576sec Page-Rank Dijkstra Benchmarking
  • 8. Conclusion and Future Work From the experimental results, it is seen that the time taken for pagerank algorithm is directly proportional to the number of relations in the graph when the number of nodes and iterations are constant. This explains the huge difference in time The runtime of BFS is directly proportional to the depth of the graph. So, greater the depth, more will be the number of iterations and hence more time. Future Work: Taking the input graph from file adds a huge overhead of reading and writing to files in each iteration, so if somehow we can store the graph and its properties in a Database, the read/write overhead will be gone and the query time will be reduced. So,we plan to include Database in it.