SlideShare a Scribd company logo
1 of 22
Mihai Capotã, Arnau Prat, Peter Boncz, Hassan Chafi
Yong Guo,
Ana Lucia Varbanescu,
Graphalytics: Benchmarking Graph-Processing Platforms
LDBC TUC Meeting
UPC Barcelona, March 2016
GRAPHALYTICS
A Big Data Benchmark for Graph-Processing Platforms
1
http://bl.ocks.org/mbostock/4062045
Tim Hegeman,
Wing Lung Ngai,
https://github.com/tudelft-atlarge/graphalytics/
GRAPHALYTICS was made
possible by a generous
contribution from Oracle.
Alexandru Iosup,
Stijn Heldens,
Graphs at the Core of Our Society:
The LinkedIn ExampleData Deluge
2
Sources: Vincenzo Cosenza, The State of LinkedIn, http://vincos.it/the-state-of-linkedin/
via Christopher Penn, http://www.shiftcomm.com/2014/02/state-linkedin-social-media-dark-horse/
Apr 2014
400
Nov 2015
400
LinkedIn Is Not Unique: Data Deluge
270M MAU
200+ avg followers
>54B edges
1.2B MAU 0.8B DAU
200+ avg followers
>240B edges
company/day:
100+ posts, 1,000+ comments
IBM 280k employee-
-users, 2.6M followers
Graph Processing @large
4
A Graph Processing
Platform
Streaming not considered in this presentation.
Interactive processing not considered in this presentation.
AlgorithmETL
Active Storage
(filtering, compression,
replication, caching)
Distribution
to processing
platform
Graph Processing @large
5
A Graph Processing
Platform
Streaming not considered in this presentation.
Interactive processing not considered in this presentation.
AlgorithmETL
Active Storage
(filtering, compression,
replication, caching)
Distribution
to processing
platform
Ideally,
N cores/disks
 Nx faster
Ideally,
N cores/disks
 Nx faster
Graph Processing @large
6
A Graph Processing
Platform
Streaming not considered in this presentation.
Interactive processing not considered in this presentation.
AlgorithmETL
Active Storage
(filtering, compression,
replication, caching)
Distribution
to processing
platform
Ideally,
N cores/disks
 Nx faster
Ideally,
N cores/disks
 Nx faster
Compute-intesive workload
different/more complex analysis  ?x slower
Dataset-dependent workload
unfriendly graphs  ??x slower
Data-intesive workload
10x graph size  100x—1,000x slower
Graph-Processing Platforms
• Platform: the combined hardware, software, and
programming system that is being used to complete
a graph processing task
7
Trinity
2
Which to choose?
What to tune?
Graphalytics, in a nutshell
• An LDBC benchmark*
• Advanced benchmarking harness
• Diverse real and synthetic datasets
• Many classes of algorithms
• Granula for manual choke-point analysis
• Modern software engineering practices
• Supports many platforms
• Enables comparison of
community-driven and industrial systems
8
http://graphalytics.ewi.tudelft.nl
https://github.com/tudelft-atlarge/graphalytics/
Benchmarking Harness
9
Iosup et al. LDBC Graphalytics: A Benchmark for Large
Scale Graph Analysis on Parallel and Distributed Platform (submitted).
Graphalytics = Representative
Classes of Algorithms
and Datasets
• 2-stage selection process of algorithms datasets
10
Class Examples %
Graph Statistics Diameter, Local Clust. Coeff., PageRank 20
Graph Traversal BFS, SSSP, DFS 50
Connected Comp. Reachability, BiCC, Weakly CC 10
Community
Detection
Clustering, Nearest Neighbor,
Community Detection w Label Propagation
5
Other Sampling, Partitioning <15
Guo et al. How Well do Graph-Processing Platforms Perform? An Empirical
Performance Evaluation and Analysis, IPDPS’14.
+ weighted graphs: Single-Source Shortest Paths (~35%)
Graphalytics = Distributed Graph
Generation w DATAGEN
Person
Generation
Edge
Generation
Activity
Generation
“Knows”
graph
serializa
tion
Activity
serializa
tion
Graphalytics
11
• Rich set of configurations
• More diverse degree distribution than Graph500
• Realistic clustering coefficient and assortativity
Level of Detail
Graphalytics = Portable
Perf. Analysis w Granula
Graph Processing System
Logging Patch
Performance
Analyzer
Granula
Performance
Archive
Granula
Performance
Model
Modeling
Archiving
logs
rules
Granula
Archiver
Sharing
Monitoring
Minimal code invasion + automated data collection at runtime
+ portable archive (+ web UI)  portable bottleneck analysis
Graphalytics = Diverse Set of
Automated Experiments
Category Experiment Algo. Data Nodes/
Threads
Metrics
Baseline Dataset variety BFS,PR All 1 Run, norm.
Algorithm variety All R4(S),
D300(L)
1 Runtime
Scalability Vertical vs. horiz. BFS, PR D300(L),
D1000(XL)
1—16/1—32 Runtime, S
Weak vs. strong BFS, PR G22(S)—
G26(XL)
1—16 Runtime, S
Robustness Stress test BFS All 1 SLA met
Variability BFS D300(L),
D1000(L)
1/16 CV
Self-Test Time to run/part -- Datagen 1—16 Runtime
13
Implementation status
Map
Red
uce
2
Gir
ap
h
Gra
ph
X
Pow
erGr
aph
Graph
Lab
Neo4j PG
X.D
Gra
ph
Mat
Ope
nG
TOTE
M
Map
Graph
M
ed
us
a
LCC G G G G G G -- G G -- -- --
BFS G G G G G G G G G V V V
WC
C
G G G G G G G G G V V V
CDL
P
G G G G G G G G G -- -- --
P’R
ank
-- G G G V -- G G G V V V
SSS
P
-- G G G -- -- G G G -- -- --
https://github.com/tudelft-atlarge/graphalytics/
G=validated, on GitHub
V=validation stage
Implementation status
Map
Red
uce
2
Gir
ap
h
Gra
ph
X
Pow
erGr
aph
Graph
Lab
Neo4j PG
X.D
Gra
ph
Mat
Ope
nG
TOTE
M
Map
Graph
M
ed
us
a
LCC G G G G G G -- G G -- -- --
BFS G G G G G G G G G V V V
WC
C
G G G G G G G G G V V V
CDL
P
G G G G G G G G G -- -- --
P’R
ank
-- G G G V -- G G G V V V
SSS
P
-- G G G -- -- G G G -- -- --
Benchmarking and tuning performed by vendors
G=validated, on GitHub
V=validation stage
Graphalytics Capabilities: An Example
16
Graphalytics enables deep comparison of many systems
at once, through diverse experiments and metrics
Your system here!
Diverse algorithms Diverse metrics
Diverse
datasets
Processing time (s) + Edges[+Vertices]/s
17
Which system is the best?
It depends…
Algorithm + Dataset + Metric
OK, but … why is this system better
for this workload for this metric?
Granula Visualizer
Portable choke-point analysis for everyone!
Graphalytics = Modern Software
Engineering Process
• Graphalytics code reviews
• Internal release to LDBC partners (first, Feb 2015; last, Feb 2016)
• Public release, announced first through LDBC (Apr 2015)
• First full benchmark specification, LDBC criteria (Q1 2016)
• Jenkins continuous integration server
• SonarQube software quality analyzer
19https://github.com/tudelft-atlarge/graphalytics/
Graphalytics, in the future
• An LDBC benchmark*
• Advanced benchmarking harness
• Diverse real and synthetic datasets
• Many classes of algorithms
• Granula for manual choke-point analysis
• Modern software engineering practices
• Supports many platforms
• Enables comparison of
community-driven and industrial systems
20
github.com/tudelft-atlarge/graphalytics/
+ more data generation
+ deeper performance metrics
+ choke-point analysis
PELGA – Performance Engineering for
Large-scale Graph Analytics,
workshop with EuroPar 2016
21
22

More Related Content

What's hot

Benchmarking Tool for Graph Algorithms
Benchmarking Tool for Graph AlgorithmsBenchmarking Tool for Graph Algorithms
Benchmarking Tool for Graph AlgorithmsYash Khandelwal
 
SexTant: Visualizing Time-Evolving Linked Geospatial Data
SexTant: Visualizing Time-Evolving Linked Geospatial DataSexTant: Visualizing Time-Evolving Linked Geospatial Data
SexTant: Visualizing Time-Evolving Linked Geospatial DataCharalampos (Babis) Nikolaou
 
GDB in SV_1st_meetup_09082016
GDB in SV_1st_meetup_09082016GDB in SV_1st_meetup_09082016
GDB in SV_1st_meetup_09082016Joshua Bae
 
Creating and Utilizing Linked Open Statistical Data for the Development of Ad...
Creating and Utilizing Linked Open Statistical Data for the Development of Ad...Creating and Utilizing Linked Open Statistical Data for the Development of Ad...
Creating and Utilizing Linked Open Statistical Data for the Development of Ad...Evangelos Kalampokis
 
OpenCube Workshop at eGov2015 & ePart2015 dual conference
OpenCube Workshop at eGov2015 & ePart2015 dual conferenceOpenCube Workshop at eGov2015 & ePart2015 dual conference
OpenCube Workshop at eGov2015 & ePart2015 dual conferenceEvangelos Kalampokis
 
An Introduction to Mapping, GIS and Spatial Modelling in R (presentation)
An Introduction to Mapping, GIS and Spatial Modelling in R (presentation)An Introduction to Mapping, GIS and Spatial Modelling in R (presentation)
An Introduction to Mapping, GIS and Spatial Modelling in R (presentation)Rich Harris
 
GI2016 ppt shi (big data analytics on the internet)
GI2016 ppt shi (big data analytics on the internet)GI2016 ppt shi (big data analytics on the internet)
GI2016 ppt shi (big data analytics on the internet)IGN Vorstand
 
Distributed R: The Next Generation Platform for Predictive Analytics
Distributed R: The Next Generation Platform for Predictive AnalyticsDistributed R: The Next Generation Platform for Predictive Analytics
Distributed R: The Next Generation Platform for Predictive AnalyticsJorge Martinez de Salinas
 
Streaming Weather Data from Web APIs to Jupyter through Kafka
Streaming Weather Data from Web APIs to Jupyter through KafkaStreaming Weather Data from Web APIs to Jupyter through Kafka
Streaming Weather Data from Web APIs to Jupyter through KafkaLeo Salemann
 
Spatial Data Integrator - Software Presentation and Use Cases
Spatial Data Integrator - Software Presentation and Use CasesSpatial Data Integrator - Software Presentation and Use Cases
Spatial Data Integrator - Software Presentation and Use Casesmathieuraj
 
Field Data Collecting, Processing and Sharing: Using web Service Technologies
Field Data Collecting, Processing and Sharing: Using web Service TechnologiesField Data Collecting, Processing and Sharing: Using web Service Technologies
Field Data Collecting, Processing and Sharing: Using web Service TechnologiesNiroshan Sanjaya
 
CKANへの空間情報機能拡張実装の試み
CKANへの空間情報機能拡張実装の試みCKANへの空間情報機能拡張実装の試み
CKANへの空間情報機能拡張実装の試みYoichi Kayama
 
Magellen: Geospatial Analytics on Spark by Ram Sriharsha
Magellen: Geospatial Analytics on Spark by Ram SriharshaMagellen: Geospatial Analytics on Spark by Ram Sriharsha
Magellen: Geospatial Analytics on Spark by Ram SriharshaSpark Summit
 

What's hot (18)

Benchmarking Tool for Graph Algorithms
Benchmarking Tool for Graph AlgorithmsBenchmarking Tool for Graph Algorithms
Benchmarking Tool for Graph Algorithms
 
SexTant: Visualizing Time-Evolving Linked Geospatial Data
SexTant: Visualizing Time-Evolving Linked Geospatial DataSexTant: Visualizing Time-Evolving Linked Geospatial Data
SexTant: Visualizing Time-Evolving Linked Geospatial Data
 
2014 july use_r
2014 july use_r2014 july use_r
2014 july use_r
 
Prague Hacks 2015
Prague Hacks 2015Prague Hacks 2015
Prague Hacks 2015
 
GDB in SV_1st_meetup_09082016
GDB in SV_1st_meetup_09082016GDB in SV_1st_meetup_09082016
GDB in SV_1st_meetup_09082016
 
Creating and Utilizing Linked Open Statistical Data for the Development of Ad...
Creating and Utilizing Linked Open Statistical Data for the Development of Ad...Creating and Utilizing Linked Open Statistical Data for the Development of Ad...
Creating and Utilizing Linked Open Statistical Data for the Development of Ad...
 
OpenCube Workshop at eGov2015 & ePart2015 dual conference
OpenCube Workshop at eGov2015 & ePart2015 dual conferenceOpenCube Workshop at eGov2015 & ePart2015 dual conference
OpenCube Workshop at eGov2015 & ePart2015 dual conference
 
Data_Size_statistics
Data_Size_statisticsData_Size_statistics
Data_Size_statistics
 
Graph Databases
Graph DatabasesGraph Databases
Graph Databases
 
An Introduction to Mapping, GIS and Spatial Modelling in R (presentation)
An Introduction to Mapping, GIS and Spatial Modelling in R (presentation)An Introduction to Mapping, GIS and Spatial Modelling in R (presentation)
An Introduction to Mapping, GIS and Spatial Modelling in R (presentation)
 
GI2016 ppt shi (big data analytics on the internet)
GI2016 ppt shi (big data analytics on the internet)GI2016 ppt shi (big data analytics on the internet)
GI2016 ppt shi (big data analytics on the internet)
 
Distributed R: The Next Generation Platform for Predictive Analytics
Distributed R: The Next Generation Platform for Predictive AnalyticsDistributed R: The Next Generation Platform for Predictive Analytics
Distributed R: The Next Generation Platform for Predictive Analytics
 
Streaming Weather Data from Web APIs to Jupyter through Kafka
Streaming Weather Data from Web APIs to Jupyter through KafkaStreaming Weather Data from Web APIs to Jupyter through Kafka
Streaming Weather Data from Web APIs to Jupyter through Kafka
 
Spatial Data Integrator - Software Presentation and Use Cases
Spatial Data Integrator - Software Presentation and Use CasesSpatial Data Integrator - Software Presentation and Use Cases
Spatial Data Integrator - Software Presentation and Use Cases
 
Field Data Collecting, Processing and Sharing: Using web Service Technologies
Field Data Collecting, Processing and Sharing: Using web Service TechnologiesField Data Collecting, Processing and Sharing: Using web Service Technologies
Field Data Collecting, Processing and Sharing: Using web Service Technologies
 
Introduction To R
Introduction To RIntroduction To R
Introduction To R
 
CKANへの空間情報機能拡張実装の試み
CKANへの空間情報機能拡張実装の試みCKANへの空間情報機能拡張実装の試み
CKANへの空間情報機能拡張実装の試み
 
Magellen: Geospatial Analytics on Spark by Ram Sriharsha
Magellen: Geospatial Analytics on Spark by Ram SriharshaMagellen: Geospatial Analytics on Spark by Ram Sriharsha
Magellen: Geospatial Analytics on Spark by Ram Sriharsha
 

Viewers also liked

Modelling the Clustering Coefficient of a Random graph
Modelling the Clustering Coefficient of a Random graphModelling the Clustering Coefficient of a Random graph
Modelling the Clustering Coefficient of a Random graphGraph-TA
 
Holistic Benchmarking of Big Linked Data: HOBBIT
Holistic Benchmarking of Big Linked Data: HOBBITHolistic Benchmarking of Big Linked Data: HOBBIT
Holistic Benchmarking of Big Linked Data: HOBBITGraph-TA
 
Benchmarking Versioning for Big Linked Data
Benchmarking Versioning for Big Linked DataBenchmarking Versioning for Big Linked Data
Benchmarking Versioning for Big Linked DataGraph-TA
 
Use of Graphs for Cloud Service Selection in Multi-Cloud Environments
Use of Graphs for Cloud Service Selection in Multi-Cloud EnvironmentsUse of Graphs for Cloud Service Selection in Multi-Cloud Environments
Use of Graphs for Cloud Service Selection in Multi-Cloud EnvironmentsGraph-TA
 
Polyglot Graph Databases using OCL as pivot
Polyglot Graph Databases using OCL as pivotPolyglot Graph Databases using OCL as pivot
Polyglot Graph Databases using OCL as pivotGraph-TA
 
The scarcity of crossing dependencies: a direct outcome of a specific constra...
The scarcity of crossing dependencies: a direct outcome of a specific constra...The scarcity of crossing dependencies: a direct outcome of a specific constra...
The scarcity of crossing dependencies: a direct outcome of a specific constra...Graph-TA
 
Using Evolutionary Computing for Feature-driven Graph generation
Using Evolutionary Computing for Feature-driven Graph generationUsing Evolutionary Computing for Feature-driven Graph generation
Using Evolutionary Computing for Feature-driven Graph generationGraph-TA
 
Identifiability in Dynamic Casual Networks
Identifiability in Dynamic Casual NetworksIdentifiability in Dynamic Casual Networks
Identifiability in Dynamic Casual NetworksGraph-TA
 
Synthetic Data Generation using exponential random Graph modeling
Synthetic Data Generation using exponential random Graph modelingSynthetic Data Generation using exponential random Graph modeling
Synthetic Data Generation using exponential random Graph modelingGraph-TA
 
Computing on Event-sourced Graphs
Computing on Event-sourced GraphsComputing on Event-sourced Graphs
Computing on Event-sourced GraphsGraph-TA
 
Big data Career Opportunuties
Big data  Career OpportunutiesBig data  Career Opportunuties
Big data Career OpportunutiesDevashish Mishra
 
Demystifying Distributed Graph Processing
Demystifying Distributed Graph ProcessingDemystifying Distributed Graph Processing
Demystifying Distributed Graph ProcessingVasia Kalavri
 
Big Data - The 5 Vs Everyone Must Know
Big Data - The 5 Vs Everyone Must KnowBig Data - The 5 Vs Everyone Must Know
Big Data - The 5 Vs Everyone Must KnowBernard Marr
 
Company Profile PT SKY LAB
Company Profile PT SKY LABCompany Profile PT SKY LAB
Company Profile PT SKY LABGatot Wahyu
 
05 questões comentadas (bônus)
05 questões comentadas (bônus)05 questões comentadas (bônus)
05 questões comentadas (bônus)Português em Foco
 
Foro de Innovación en Red. II Beyond Internet Barcelona. Ponencia 'Experienci...
Foro de Innovación en Red. II Beyond Internet Barcelona. Ponencia 'Experienci...Foro de Innovación en Red. II Beyond Internet Barcelona. Ponencia 'Experienci...
Foro de Innovación en Red. II Beyond Internet Barcelona. Ponencia 'Experienci...iSOCO
 

Viewers also liked (20)

Modelling the Clustering Coefficient of a Random graph
Modelling the Clustering Coefficient of a Random graphModelling the Clustering Coefficient of a Random graph
Modelling the Clustering Coefficient of a Random graph
 
Holistic Benchmarking of Big Linked Data: HOBBIT
Holistic Benchmarking of Big Linked Data: HOBBITHolistic Benchmarking of Big Linked Data: HOBBIT
Holistic Benchmarking of Big Linked Data: HOBBIT
 
Benchmarking Versioning for Big Linked Data
Benchmarking Versioning for Big Linked DataBenchmarking Versioning for Big Linked Data
Benchmarking Versioning for Big Linked Data
 
Use of Graphs for Cloud Service Selection in Multi-Cloud Environments
Use of Graphs for Cloud Service Selection in Multi-Cloud EnvironmentsUse of Graphs for Cloud Service Selection in Multi-Cloud Environments
Use of Graphs for Cloud Service Selection in Multi-Cloud Environments
 
Polyglot Graph Databases using OCL as pivot
Polyglot Graph Databases using OCL as pivotPolyglot Graph Databases using OCL as pivot
Polyglot Graph Databases using OCL as pivot
 
The scarcity of crossing dependencies: a direct outcome of a specific constra...
The scarcity of crossing dependencies: a direct outcome of a specific constra...The scarcity of crossing dependencies: a direct outcome of a specific constra...
The scarcity of crossing dependencies: a direct outcome of a specific constra...
 
Using Evolutionary Computing for Feature-driven Graph generation
Using Evolutionary Computing for Feature-driven Graph generationUsing Evolutionary Computing for Feature-driven Graph generation
Using Evolutionary Computing for Feature-driven Graph generation
 
Identifiability in Dynamic Casual Networks
Identifiability in Dynamic Casual NetworksIdentifiability in Dynamic Casual Networks
Identifiability in Dynamic Casual Networks
 
Synthetic Data Generation using exponential random Graph modeling
Synthetic Data Generation using exponential random Graph modelingSynthetic Data Generation using exponential random Graph modeling
Synthetic Data Generation using exponential random Graph modeling
 
Computing on Event-sourced Graphs
Computing on Event-sourced GraphsComputing on Event-sourced Graphs
Computing on Event-sourced Graphs
 
Big data Career Opportunuties
Big data  Career OpportunutiesBig data  Career Opportunuties
Big data Career Opportunuties
 
Big Data
Big DataBig Data
Big Data
 
Demystifying Distributed Graph Processing
Demystifying Distributed Graph ProcessingDemystifying Distributed Graph Processing
Demystifying Distributed Graph Processing
 
Oracle Big Data Cloud Serviceのご紹介
Oracle Big Data Cloud Serviceのご紹介Oracle Big Data Cloud Serviceのご紹介
Oracle Big Data Cloud Serviceのご紹介
 
Big Data - The 5 Vs Everyone Must Know
Big Data - The 5 Vs Everyone Must KnowBig Data - The 5 Vs Everyone Must Know
Big Data - The 5 Vs Everyone Must Know
 
Mejores Imagenes(10) Maiwald
Mejores  Imagenes(10) MaiwaldMejores  Imagenes(10) Maiwald
Mejores Imagenes(10) Maiwald
 
Company Profile PT SKY LAB
Company Profile PT SKY LABCompany Profile PT SKY LAB
Company Profile PT SKY LAB
 
05 questões comentadas (bônus)
05 questões comentadas (bônus)05 questões comentadas (bônus)
05 questões comentadas (bônus)
 
Foro de Innovación en Red. II Beyond Internet Barcelona. Ponencia 'Experienci...
Foro de Innovación en Red. II Beyond Internet Barcelona. Ponencia 'Experienci...Foro de Innovación en Red. II Beyond Internet Barcelona. Ponencia 'Experienci...
Foro de Innovación en Red. II Beyond Internet Barcelona. Ponencia 'Experienci...
 
Monitor
MonitorMonitor
Monitor
 

Similar to Graphalytics: A big data benchmark for graph-processing platforms

Scaling graph investigations with Math, GPUs, & Experts
Scaling graph investigations with Math, GPUs, & ExpertsScaling graph investigations with Math, GPUs, & Experts
Scaling graph investigations with Math, GPUs, & Expertsgraphistry
 
RAPIDS cuGraph – Accelerating all your Graph needs
RAPIDS cuGraph – Accelerating all your Graph needsRAPIDS cuGraph – Accelerating all your Graph needs
RAPIDS cuGraph – Accelerating all your Graph needsConnected Data World
 
SportsDataViz using Plotly, Shiny and Flexdashboard - PlotCon 2016
SportsDataViz using Plotly, Shiny and Flexdashboard - PlotCon 2016SportsDataViz using Plotly, Shiny and Flexdashboard - PlotCon 2016
SportsDataViz using Plotly, Shiny and Flexdashboard - PlotCon 2016Tanya Cashorali
 
H2O at Poznan R Meetup
H2O at Poznan R MeetupH2O at Poznan R Meetup
H2O at Poznan R MeetupJo-fai Chow
 
Efficient And Invincible Big Data Platform In LINE
Efficient And Invincible Big Data Platform In LINEEfficient And Invincible Big Data Platform In LINE
Efficient And Invincible Big Data Platform In LINELINE Corporation
 
Scaling Spatial Analytics with Google Cloud & CARTO
Scaling Spatial Analytics with Google Cloud & CARTOScaling Spatial Analytics with Google Cloud & CARTO
Scaling Spatial Analytics with Google Cloud & CARTOCARTO
 
Distributed computing poli
Distributed computing poliDistributed computing poli
Distributed computing poliivascucristian
 
Hardware Accelerated Machine Learning Solution for Detecting Fraud and Money ...
Hardware Accelerated Machine Learning Solution for Detecting Fraud and Money ...Hardware Accelerated Machine Learning Solution for Detecting Fraud and Money ...
Hardware Accelerated Machine Learning Solution for Detecting Fraud and Money ...TigerGraph
 
GPU-Accelerating UDFs in PySpark with Numba and PyGDF
GPU-Accelerating UDFs in PySpark with Numba and PyGDFGPU-Accelerating UDFs in PySpark with Numba and PyGDF
GPU-Accelerating UDFs in PySpark with Numba and PyGDFKeith Kraus
 
Graph protocol for accessing information about blockchains and d apps
Graph protocol for accessing information about blockchains and d appsGraph protocol for accessing information about blockchains and d apps
Graph protocol for accessing information about blockchains and d appsGene Leybzon
 
The Perfect Fit: Scalable Graph for Big Data
The Perfect Fit: Scalable Graph for Big DataThe Perfect Fit: Scalable Graph for Big Data
The Perfect Fit: Scalable Graph for Big DataInside Analysis
 
GraphGen: Conducting Graph Analytics over Relational Databases
GraphGen: Conducting Graph Analytics over Relational DatabasesGraphGen: Conducting Graph Analytics over Relational Databases
GraphGen: Conducting Graph Analytics over Relational DatabasesKonstantinos Xirogiannopoulos
 
GraphGen: Conducting Graph Analytics over Relational Databases
GraphGen: Conducting Graph Analytics over Relational DatabasesGraphGen: Conducting Graph Analytics over Relational Databases
GraphGen: Conducting Graph Analytics over Relational DatabasesPyData
 
STIC-D: algorithmic techniques for efficient parallel pagerank computation on...
STIC-D: algorithmic techniques for efficient parallel pagerank computation on...STIC-D: algorithmic techniques for efficient parallel pagerank computation on...
STIC-D: algorithmic techniques for efficient parallel pagerank computation on...Subhajit Sahu
 
Overview of Modern Graph Analysis Tools
Overview of Modern Graph Analysis ToolsOverview of Modern Graph Analysis Tools
Overview of Modern Graph Analysis ToolsKeiichiro Ono
 
NVIDIA Rapids presentation
NVIDIA Rapids presentationNVIDIA Rapids presentation
NVIDIA Rapids presentationtestSri1
 
H2O at Berlin R Meetup
H2O at Berlin R MeetupH2O at Berlin R Meetup
H2O at Berlin R MeetupJo-fai Chow
 
Berlin R Meetup
Berlin R MeetupBerlin R Meetup
Berlin R MeetupSri Ambati
 

Similar to Graphalytics: A big data benchmark for graph-processing platforms (20)

Scaling graph investigations with Math, GPUs, & Experts
Scaling graph investigations with Math, GPUs, & ExpertsScaling graph investigations with Math, GPUs, & Experts
Scaling graph investigations with Math, GPUs, & Experts
 
RAPIDS cuGraph – Accelerating all your Graph needs
RAPIDS cuGraph – Accelerating all your Graph needsRAPIDS cuGraph – Accelerating all your Graph needs
RAPIDS cuGraph – Accelerating all your Graph needs
 
SportsDataViz using Plotly, Shiny and Flexdashboard - PlotCon 2016
SportsDataViz using Plotly, Shiny and Flexdashboard - PlotCon 2016SportsDataViz using Plotly, Shiny and Flexdashboard - PlotCon 2016
SportsDataViz using Plotly, Shiny and Flexdashboard - PlotCon 2016
 
H2O at Poznan R Meetup
H2O at Poznan R MeetupH2O at Poznan R Meetup
H2O at Poznan R Meetup
 
Efficient And Invincible Big Data Platform In LINE
Efficient And Invincible Big Data Platform In LINEEfficient And Invincible Big Data Platform In LINE
Efficient And Invincible Big Data Platform In LINE
 
Scaling Spatial Analytics with Google Cloud & CARTO
Scaling Spatial Analytics with Google Cloud & CARTOScaling Spatial Analytics with Google Cloud & CARTO
Scaling Spatial Analytics with Google Cloud & CARTO
 
Distributed computing poli
Distributed computing poliDistributed computing poli
Distributed computing poli
 
Hardware Accelerated Machine Learning Solution for Detecting Fraud and Money ...
Hardware Accelerated Machine Learning Solution for Detecting Fraud and Money ...Hardware Accelerated Machine Learning Solution for Detecting Fraud and Money ...
Hardware Accelerated Machine Learning Solution for Detecting Fraud and Money ...
 
GPU-Accelerating UDFs in PySpark with Numba and PyGDF
GPU-Accelerating UDFs in PySpark with Numba and PyGDFGPU-Accelerating UDFs in PySpark with Numba and PyGDF
GPU-Accelerating UDFs in PySpark with Numba and PyGDF
 
Graph protocol for accessing information about blockchains and d apps
Graph protocol for accessing information about blockchains and d appsGraph protocol for accessing information about blockchains and d apps
Graph protocol for accessing information about blockchains and d apps
 
Visual Network Analysis
Visual Network AnalysisVisual Network Analysis
Visual Network Analysis
 
The Perfect Fit: Scalable Graph for Big Data
The Perfect Fit: Scalable Graph for Big DataThe Perfect Fit: Scalable Graph for Big Data
The Perfect Fit: Scalable Graph for Big Data
 
GraphGen: Conducting Graph Analytics over Relational Databases
GraphGen: Conducting Graph Analytics over Relational DatabasesGraphGen: Conducting Graph Analytics over Relational Databases
GraphGen: Conducting Graph Analytics over Relational Databases
 
GraphGen: Conducting Graph Analytics over Relational Databases
GraphGen: Conducting Graph Analytics over Relational DatabasesGraphGen: Conducting Graph Analytics over Relational Databases
GraphGen: Conducting Graph Analytics over Relational Databases
 
STIC-D: algorithmic techniques for efficient parallel pagerank computation on...
STIC-D: algorithmic techniques for efficient parallel pagerank computation on...STIC-D: algorithmic techniques for efficient parallel pagerank computation on...
STIC-D: algorithmic techniques for efficient parallel pagerank computation on...
 
Overview of Modern Graph Analysis Tools
Overview of Modern Graph Analysis ToolsOverview of Modern Graph Analysis Tools
Overview of Modern Graph Analysis Tools
 
Rapids: Data Science on GPUs
Rapids: Data Science on GPUsRapids: Data Science on GPUs
Rapids: Data Science on GPUs
 
NVIDIA Rapids presentation
NVIDIA Rapids presentationNVIDIA Rapids presentation
NVIDIA Rapids presentation
 
H2O at Berlin R Meetup
H2O at Berlin R MeetupH2O at Berlin R Meetup
H2O at Berlin R Meetup
 
Berlin R Meetup
Berlin R MeetupBerlin R Meetup
Berlin R Meetup
 

More from Graph-TA

RDF Graph Data Management in Oracle Database and NoSQL Platforms
RDF Graph Data Management in Oracle Database and NoSQL PlatformsRDF Graph Data Management in Oracle Database and NoSQL Platforms
RDF Graph Data Management in Oracle Database and NoSQL PlatformsGraph-TA
 
GRAPHITE — An Extensible Graph Traversal Framework for RDBMS
GRAPHITE — An Extensible Graph Traversal Framework for RDBMSGRAPHITE — An Extensible Graph Traversal Framework for RDBMS
GRAPHITE — An Extensible Graph Traversal Framework for RDBMSGraph-TA
 
On the Discovery of Novel Drug-Target Interactions from Dense SubGraphs
On the Discovery of Novel Drug-Target Interactions from Dense SubGraphsOn the Discovery of Novel Drug-Target Interactions from Dense SubGraphs
On the Discovery of Novel Drug-Target Interactions from Dense SubGraphsGraph-TA
 
Autograph: an evolving lightweight graph tool
Autograph: an evolving lightweight graph toolAutograph: an evolving lightweight graph tool
Autograph: an evolving lightweight graph toolGraph-TA
 
Understanding Graph Structure in Knowledge Bases
Understanding Graph Structure in Knowledge BasesUnderstanding Graph Structure in Knowledge Bases
Understanding Graph Structure in Knowledge BasesGraph-TA
 
Finding patterns of chronic disease and medication prescriptions from a large...
Finding patterns of chronic disease and medication prescriptions from a large...Finding patterns of chronic disease and medication prescriptions from a large...
Finding patterns of chronic disease and medication prescriptions from a large...Graph-TA
 
Recent Updates on IBM System G — GraphBIG and Temporal Data
Recent Updates on IBM System G — GraphBIG and Temporal DataRecent Updates on IBM System G — GraphBIG and Temporal Data
Recent Updates on IBM System G — GraphBIG and Temporal DataGraph-TA
 
Analysing the degree distribution of real graphs by means of several probabil...
Analysing the degree distribution of real graphs by means of several probabil...Analysing the degree distribution of real graphs by means of several probabil...
Analysing the degree distribution of real graphs by means of several probabil...Graph-TA
 
SPIMBENCH: A Scalable, Schema-Aware Instance Matching Benchmark for the Seman...
SPIMBENCH: A Scalable, Schema-Aware Instance Matching Benchmark for the Seman...SPIMBENCH: A Scalable, Schema-Aware Instance Matching Benchmark for the Seman...
SPIMBENCH: A Scalable, Schema-Aware Instance Matching Benchmark for the Seman...Graph-TA
 
Generating synthetic online social network graph data and topologies
Generating synthetic online social network graph data and topologiesGenerating synthetic online social network graph data and topologies
Generating synthetic online social network graph data and topologiesGraph-TA
 
Deriving an Emergent Relational Schema from RDF Data
Deriving an Emergent Relational Schema from RDF DataDeriving an Emergent Relational Schema from RDF Data
Deriving an Emergent Relational Schema from RDF DataGraph-TA
 
Managing RDF data with graph databases
Managing RDF data with graph databasesManaging RDF data with graph databases
Managing RDF data with graph databasesGraph-TA
 
Graph Based Word Spotting Approach for Large Document Collections
Graph Based Word Spotting Approach for Large Document CollectionsGraph Based Word Spotting Approach for Large Document Collections
Graph Based Word Spotting Approach for Large Document CollectionsGraph-TA
 
Use of graphs for political analysis
Use of graphs for political analysisUse of graphs for political analysis
Use of graphs for political analysisGraph-TA
 
Graphium Chrysalis: Exploiting Graph Database
Graphium Chrysalis: Exploiting Graph DatabaseGraphium Chrysalis: Exploiting Graph Database
Graphium Chrysalis: Exploiting Graph DatabaseGraph-TA
 
Langford sequences through a product of labeled digraphs
Langford sequences through a product of labeled digraphsLangford sequences through a product of labeled digraphs
Langford sequences through a product of labeled digraphsGraph-TA
 

More from Graph-TA (16)

RDF Graph Data Management in Oracle Database and NoSQL Platforms
RDF Graph Data Management in Oracle Database and NoSQL PlatformsRDF Graph Data Management in Oracle Database and NoSQL Platforms
RDF Graph Data Management in Oracle Database and NoSQL Platforms
 
GRAPHITE — An Extensible Graph Traversal Framework for RDBMS
GRAPHITE — An Extensible Graph Traversal Framework for RDBMSGRAPHITE — An Extensible Graph Traversal Framework for RDBMS
GRAPHITE — An Extensible Graph Traversal Framework for RDBMS
 
On the Discovery of Novel Drug-Target Interactions from Dense SubGraphs
On the Discovery of Novel Drug-Target Interactions from Dense SubGraphsOn the Discovery of Novel Drug-Target Interactions from Dense SubGraphs
On the Discovery of Novel Drug-Target Interactions from Dense SubGraphs
 
Autograph: an evolving lightweight graph tool
Autograph: an evolving lightweight graph toolAutograph: an evolving lightweight graph tool
Autograph: an evolving lightweight graph tool
 
Understanding Graph Structure in Knowledge Bases
Understanding Graph Structure in Knowledge BasesUnderstanding Graph Structure in Knowledge Bases
Understanding Graph Structure in Knowledge Bases
 
Finding patterns of chronic disease and medication prescriptions from a large...
Finding patterns of chronic disease and medication prescriptions from a large...Finding patterns of chronic disease and medication prescriptions from a large...
Finding patterns of chronic disease and medication prescriptions from a large...
 
Recent Updates on IBM System G — GraphBIG and Temporal Data
Recent Updates on IBM System G — GraphBIG and Temporal DataRecent Updates on IBM System G — GraphBIG and Temporal Data
Recent Updates on IBM System G — GraphBIG and Temporal Data
 
Analysing the degree distribution of real graphs by means of several probabil...
Analysing the degree distribution of real graphs by means of several probabil...Analysing the degree distribution of real graphs by means of several probabil...
Analysing the degree distribution of real graphs by means of several probabil...
 
SPIMBENCH: A Scalable, Schema-Aware Instance Matching Benchmark for the Seman...
SPIMBENCH: A Scalable, Schema-Aware Instance Matching Benchmark for the Seman...SPIMBENCH: A Scalable, Schema-Aware Instance Matching Benchmark for the Seman...
SPIMBENCH: A Scalable, Schema-Aware Instance Matching Benchmark for the Seman...
 
Generating synthetic online social network graph data and topologies
Generating synthetic online social network graph data and topologiesGenerating synthetic online social network graph data and topologies
Generating synthetic online social network graph data and topologies
 
Deriving an Emergent Relational Schema from RDF Data
Deriving an Emergent Relational Schema from RDF DataDeriving an Emergent Relational Schema from RDF Data
Deriving an Emergent Relational Schema from RDF Data
 
Managing RDF data with graph databases
Managing RDF data with graph databasesManaging RDF data with graph databases
Managing RDF data with graph databases
 
Graph Based Word Spotting Approach for Large Document Collections
Graph Based Word Spotting Approach for Large Document CollectionsGraph Based Word Spotting Approach for Large Document Collections
Graph Based Word Spotting Approach for Large Document Collections
 
Use of graphs for political analysis
Use of graphs for political analysisUse of graphs for political analysis
Use of graphs for political analysis
 
Graphium Chrysalis: Exploiting Graph Database
Graphium Chrysalis: Exploiting Graph DatabaseGraphium Chrysalis: Exploiting Graph Database
Graphium Chrysalis: Exploiting Graph Database
 
Langford sequences through a product of labeled digraphs
Langford sequences through a product of labeled digraphsLangford sequences through a product of labeled digraphs
Langford sequences through a product of labeled digraphs
 

Recently uploaded

CCS355 Neural Networks & Deep Learning Unit 1 PDF notes with Question bank .pdf
CCS355 Neural Networks & Deep Learning Unit 1 PDF notes with Question bank .pdfCCS355 Neural Networks & Deep Learning Unit 1 PDF notes with Question bank .pdf
CCS355 Neural Networks & Deep Learning Unit 1 PDF notes with Question bank .pdfAsst.prof M.Gokilavani
 
Application of Residue Theorem to evaluate real integrations.pptx
Application of Residue Theorem to evaluate real integrations.pptxApplication of Residue Theorem to evaluate real integrations.pptx
Application of Residue Theorem to evaluate real integrations.pptx959SahilShah
 
Call Girls Delhi {Jodhpur} 9711199012 high profile service
Call Girls Delhi {Jodhpur} 9711199012 high profile serviceCall Girls Delhi {Jodhpur} 9711199012 high profile service
Call Girls Delhi {Jodhpur} 9711199012 high profile servicerehmti665
 
Study on Air-Water & Water-Water Heat Exchange in a Finned Tube Exchanger
Study on Air-Water & Water-Water Heat Exchange in a Finned Tube ExchangerStudy on Air-Water & Water-Water Heat Exchange in a Finned Tube Exchanger
Study on Air-Water & Water-Water Heat Exchange in a Finned Tube ExchangerAnamika Sarkar
 
Software and Systems Engineering Standards: Verification and Validation of Sy...
Software and Systems Engineering Standards: Verification and Validation of Sy...Software and Systems Engineering Standards: Verification and Validation of Sy...
Software and Systems Engineering Standards: Verification and Validation of Sy...VICTOR MAESTRE RAMIREZ
 
Electronically Controlled suspensions system .pdf
Electronically Controlled suspensions system .pdfElectronically Controlled suspensions system .pdf
Electronically Controlled suspensions system .pdfme23b1001
 
IVE Industry Focused Event - Defence Sector 2024
IVE Industry Focused Event - Defence Sector 2024IVE Industry Focused Event - Defence Sector 2024
IVE Industry Focused Event - Defence Sector 2024Mark Billinghurst
 
Risk Assessment For Installation of Drainage Pipes.pdf
Risk Assessment For Installation of Drainage Pipes.pdfRisk Assessment For Installation of Drainage Pipes.pdf
Risk Assessment For Installation of Drainage Pipes.pdfROCENODodongVILLACER
 
pipeline in computer architecture design
pipeline in computer architecture  designpipeline in computer architecture  design
pipeline in computer architecture designssuser87fa0c1
 
Sachpazis Costas: Geotechnical Engineering: A student's Perspective Introduction
Sachpazis Costas: Geotechnical Engineering: A student's Perspective IntroductionSachpazis Costas: Geotechnical Engineering: A student's Perspective Introduction
Sachpazis Costas: Geotechnical Engineering: A student's Perspective IntroductionDr.Costas Sachpazis
 
Churning of Butter, Factors affecting .
Churning of Butter, Factors affecting  .Churning of Butter, Factors affecting  .
Churning of Butter, Factors affecting .Satyam Kumar
 
Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptx
Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptxDecoding Kotlin - Your guide to solving the mysterious in Kotlin.pptx
Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptxJoão Esperancinha
 
VICTOR MAESTRE RAMIREZ - Planetary Defender on NASA's Double Asteroid Redirec...
VICTOR MAESTRE RAMIREZ - Planetary Defender on NASA's Double Asteroid Redirec...VICTOR MAESTRE RAMIREZ - Planetary Defender on NASA's Double Asteroid Redirec...
VICTOR MAESTRE RAMIREZ - Planetary Defender on NASA's Double Asteroid Redirec...VICTOR MAESTRE RAMIREZ
 
Arduino_CSE ece ppt for working and principal of arduino.ppt
Arduino_CSE ece ppt for working and principal of arduino.pptArduino_CSE ece ppt for working and principal of arduino.ppt
Arduino_CSE ece ppt for working and principal of arduino.pptSAURABHKUMAR892774
 
DATA ANALYTICS PPT definition usage example
DATA ANALYTICS PPT definition usage exampleDATA ANALYTICS PPT definition usage example
DATA ANALYTICS PPT definition usage examplePragyanshuParadkar1
 
Artificial-Intelligence-in-Electronics (K).pptx
Artificial-Intelligence-in-Electronics (K).pptxArtificial-Intelligence-in-Electronics (K).pptx
Artificial-Intelligence-in-Electronics (K).pptxbritheesh05
 
HARMONY IN THE NATURE AND EXISTENCE - Unit-IV
HARMONY IN THE NATURE AND EXISTENCE - Unit-IVHARMONY IN THE NATURE AND EXISTENCE - Unit-IV
HARMONY IN THE NATURE AND EXISTENCE - Unit-IVRajaP95
 
Effects of rheological properties on mixing
Effects of rheological properties on mixingEffects of rheological properties on mixing
Effects of rheological properties on mixingviprabot1
 

Recently uploaded (20)

CCS355 Neural Networks & Deep Learning Unit 1 PDF notes with Question bank .pdf
CCS355 Neural Networks & Deep Learning Unit 1 PDF notes with Question bank .pdfCCS355 Neural Networks & Deep Learning Unit 1 PDF notes with Question bank .pdf
CCS355 Neural Networks & Deep Learning Unit 1 PDF notes with Question bank .pdf
 
Application of Residue Theorem to evaluate real integrations.pptx
Application of Residue Theorem to evaluate real integrations.pptxApplication of Residue Theorem to evaluate real integrations.pptx
Application of Residue Theorem to evaluate real integrations.pptx
 
9953056974 Call Girls In South Ex, Escorts (Delhi) NCR.pdf
9953056974 Call Girls In South Ex, Escorts (Delhi) NCR.pdf9953056974 Call Girls In South Ex, Escorts (Delhi) NCR.pdf
9953056974 Call Girls In South Ex, Escorts (Delhi) NCR.pdf
 
Call Girls Delhi {Jodhpur} 9711199012 high profile service
Call Girls Delhi {Jodhpur} 9711199012 high profile serviceCall Girls Delhi {Jodhpur} 9711199012 high profile service
Call Girls Delhi {Jodhpur} 9711199012 high profile service
 
Study on Air-Water & Water-Water Heat Exchange in a Finned Tube Exchanger
Study on Air-Water & Water-Water Heat Exchange in a Finned Tube ExchangerStudy on Air-Water & Water-Water Heat Exchange in a Finned Tube Exchanger
Study on Air-Water & Water-Water Heat Exchange in a Finned Tube Exchanger
 
Software and Systems Engineering Standards: Verification and Validation of Sy...
Software and Systems Engineering Standards: Verification and Validation of Sy...Software and Systems Engineering Standards: Verification and Validation of Sy...
Software and Systems Engineering Standards: Verification and Validation of Sy...
 
Electronically Controlled suspensions system .pdf
Electronically Controlled suspensions system .pdfElectronically Controlled suspensions system .pdf
Electronically Controlled suspensions system .pdf
 
IVE Industry Focused Event - Defence Sector 2024
IVE Industry Focused Event - Defence Sector 2024IVE Industry Focused Event - Defence Sector 2024
IVE Industry Focused Event - Defence Sector 2024
 
Risk Assessment For Installation of Drainage Pipes.pdf
Risk Assessment For Installation of Drainage Pipes.pdfRisk Assessment For Installation of Drainage Pipes.pdf
Risk Assessment For Installation of Drainage Pipes.pdf
 
pipeline in computer architecture design
pipeline in computer architecture  designpipeline in computer architecture  design
pipeline in computer architecture design
 
young call girls in Rajiv Chowk🔝 9953056974 🔝 Delhi escort Service
young call girls in Rajiv Chowk🔝 9953056974 🔝 Delhi escort Serviceyoung call girls in Rajiv Chowk🔝 9953056974 🔝 Delhi escort Service
young call girls in Rajiv Chowk🔝 9953056974 🔝 Delhi escort Service
 
Sachpazis Costas: Geotechnical Engineering: A student's Perspective Introduction
Sachpazis Costas: Geotechnical Engineering: A student's Perspective IntroductionSachpazis Costas: Geotechnical Engineering: A student's Perspective Introduction
Sachpazis Costas: Geotechnical Engineering: A student's Perspective Introduction
 
Churning of Butter, Factors affecting .
Churning of Butter, Factors affecting  .Churning of Butter, Factors affecting  .
Churning of Butter, Factors affecting .
 
Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptx
Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptxDecoding Kotlin - Your guide to solving the mysterious in Kotlin.pptx
Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptx
 
VICTOR MAESTRE RAMIREZ - Planetary Defender on NASA's Double Asteroid Redirec...
VICTOR MAESTRE RAMIREZ - Planetary Defender on NASA's Double Asteroid Redirec...VICTOR MAESTRE RAMIREZ - Planetary Defender on NASA's Double Asteroid Redirec...
VICTOR MAESTRE RAMIREZ - Planetary Defender on NASA's Double Asteroid Redirec...
 
Arduino_CSE ece ppt for working and principal of arduino.ppt
Arduino_CSE ece ppt for working and principal of arduino.pptArduino_CSE ece ppt for working and principal of arduino.ppt
Arduino_CSE ece ppt for working and principal of arduino.ppt
 
DATA ANALYTICS PPT definition usage example
DATA ANALYTICS PPT definition usage exampleDATA ANALYTICS PPT definition usage example
DATA ANALYTICS PPT definition usage example
 
Artificial-Intelligence-in-Electronics (K).pptx
Artificial-Intelligence-in-Electronics (K).pptxArtificial-Intelligence-in-Electronics (K).pptx
Artificial-Intelligence-in-Electronics (K).pptx
 
HARMONY IN THE NATURE AND EXISTENCE - Unit-IV
HARMONY IN THE NATURE AND EXISTENCE - Unit-IVHARMONY IN THE NATURE AND EXISTENCE - Unit-IV
HARMONY IN THE NATURE AND EXISTENCE - Unit-IV
 
Effects of rheological properties on mixing
Effects of rheological properties on mixingEffects of rheological properties on mixing
Effects of rheological properties on mixing
 

Graphalytics: A big data benchmark for graph-processing platforms

  • 1. Mihai Capotã, Arnau Prat, Peter Boncz, Hassan Chafi Yong Guo, Ana Lucia Varbanescu, Graphalytics: Benchmarking Graph-Processing Platforms LDBC TUC Meeting UPC Barcelona, March 2016 GRAPHALYTICS A Big Data Benchmark for Graph-Processing Platforms 1 http://bl.ocks.org/mbostock/4062045 Tim Hegeman, Wing Lung Ngai, https://github.com/tudelft-atlarge/graphalytics/ GRAPHALYTICS was made possible by a generous contribution from Oracle. Alexandru Iosup, Stijn Heldens,
  • 2. Graphs at the Core of Our Society: The LinkedIn ExampleData Deluge 2 Sources: Vincenzo Cosenza, The State of LinkedIn, http://vincos.it/the-state-of-linkedin/ via Christopher Penn, http://www.shiftcomm.com/2014/02/state-linkedin-social-media-dark-horse/ Apr 2014 400 Nov 2015 400
  • 3. LinkedIn Is Not Unique: Data Deluge 270M MAU 200+ avg followers >54B edges 1.2B MAU 0.8B DAU 200+ avg followers >240B edges company/day: 100+ posts, 1,000+ comments IBM 280k employee- -users, 2.6M followers
  • 4. Graph Processing @large 4 A Graph Processing Platform Streaming not considered in this presentation. Interactive processing not considered in this presentation. AlgorithmETL Active Storage (filtering, compression, replication, caching) Distribution to processing platform
  • 5. Graph Processing @large 5 A Graph Processing Platform Streaming not considered in this presentation. Interactive processing not considered in this presentation. AlgorithmETL Active Storage (filtering, compression, replication, caching) Distribution to processing platform Ideally, N cores/disks  Nx faster Ideally, N cores/disks  Nx faster
  • 6. Graph Processing @large 6 A Graph Processing Platform Streaming not considered in this presentation. Interactive processing not considered in this presentation. AlgorithmETL Active Storage (filtering, compression, replication, caching) Distribution to processing platform Ideally, N cores/disks  Nx faster Ideally, N cores/disks  Nx faster Compute-intesive workload different/more complex analysis  ?x slower Dataset-dependent workload unfriendly graphs  ??x slower Data-intesive workload 10x graph size  100x—1,000x slower
  • 7. Graph-Processing Platforms • Platform: the combined hardware, software, and programming system that is being used to complete a graph processing task 7 Trinity 2 Which to choose? What to tune?
  • 8. Graphalytics, in a nutshell • An LDBC benchmark* • Advanced benchmarking harness • Diverse real and synthetic datasets • Many classes of algorithms • Granula for manual choke-point analysis • Modern software engineering practices • Supports many platforms • Enables comparison of community-driven and industrial systems 8 http://graphalytics.ewi.tudelft.nl https://github.com/tudelft-atlarge/graphalytics/
  • 9. Benchmarking Harness 9 Iosup et al. LDBC Graphalytics: A Benchmark for Large Scale Graph Analysis on Parallel and Distributed Platform (submitted).
  • 10. Graphalytics = Representative Classes of Algorithms and Datasets • 2-stage selection process of algorithms datasets 10 Class Examples % Graph Statistics Diameter, Local Clust. Coeff., PageRank 20 Graph Traversal BFS, SSSP, DFS 50 Connected Comp. Reachability, BiCC, Weakly CC 10 Community Detection Clustering, Nearest Neighbor, Community Detection w Label Propagation 5 Other Sampling, Partitioning <15 Guo et al. How Well do Graph-Processing Platforms Perform? An Empirical Performance Evaluation and Analysis, IPDPS’14. + weighted graphs: Single-Source Shortest Paths (~35%)
  • 11. Graphalytics = Distributed Graph Generation w DATAGEN Person Generation Edge Generation Activity Generation “Knows” graph serializa tion Activity serializa tion Graphalytics 11 • Rich set of configurations • More diverse degree distribution than Graph500 • Realistic clustering coefficient and assortativity Level of Detail
  • 12. Graphalytics = Portable Perf. Analysis w Granula Graph Processing System Logging Patch Performance Analyzer Granula Performance Archive Granula Performance Model Modeling Archiving logs rules Granula Archiver Sharing Monitoring Minimal code invasion + automated data collection at runtime + portable archive (+ web UI)  portable bottleneck analysis
  • 13. Graphalytics = Diverse Set of Automated Experiments Category Experiment Algo. Data Nodes/ Threads Metrics Baseline Dataset variety BFS,PR All 1 Run, norm. Algorithm variety All R4(S), D300(L) 1 Runtime Scalability Vertical vs. horiz. BFS, PR D300(L), D1000(XL) 1—16/1—32 Runtime, S Weak vs. strong BFS, PR G22(S)— G26(XL) 1—16 Runtime, S Robustness Stress test BFS All 1 SLA met Variability BFS D300(L), D1000(L) 1/16 CV Self-Test Time to run/part -- Datagen 1—16 Runtime 13
  • 14. Implementation status Map Red uce 2 Gir ap h Gra ph X Pow erGr aph Graph Lab Neo4j PG X.D Gra ph Mat Ope nG TOTE M Map Graph M ed us a LCC G G G G G G -- G G -- -- -- BFS G G G G G G G G G V V V WC C G G G G G G G G G V V V CDL P G G G G G G G G G -- -- -- P’R ank -- G G G V -- G G G V V V SSS P -- G G G -- -- G G G -- -- -- https://github.com/tudelft-atlarge/graphalytics/ G=validated, on GitHub V=validation stage
  • 15. Implementation status Map Red uce 2 Gir ap h Gra ph X Pow erGr aph Graph Lab Neo4j PG X.D Gra ph Mat Ope nG TOTE M Map Graph M ed us a LCC G G G G G G -- G G -- -- -- BFS G G G G G G G G G V V V WC C G G G G G G G G G V V V CDL P G G G G G G G G G -- -- -- P’R ank -- G G G V -- G G G V V V SSS P -- G G G -- -- G G G -- -- -- Benchmarking and tuning performed by vendors G=validated, on GitHub V=validation stage
  • 16. Graphalytics Capabilities: An Example 16 Graphalytics enables deep comparison of many systems at once, through diverse experiments and metrics Your system here! Diverse algorithms Diverse metrics Diverse datasets
  • 17. Processing time (s) + Edges[+Vertices]/s 17 Which system is the best? It depends… Algorithm + Dataset + Metric OK, but … why is this system better for this workload for this metric?
  • 18. Granula Visualizer Portable choke-point analysis for everyone!
  • 19. Graphalytics = Modern Software Engineering Process • Graphalytics code reviews • Internal release to LDBC partners (first, Feb 2015; last, Feb 2016) • Public release, announced first through LDBC (Apr 2015) • First full benchmark specification, LDBC criteria (Q1 2016) • Jenkins continuous integration server • SonarQube software quality analyzer 19https://github.com/tudelft-atlarge/graphalytics/
  • 20. Graphalytics, in the future • An LDBC benchmark* • Advanced benchmarking harness • Diverse real and synthetic datasets • Many classes of algorithms • Granula for manual choke-point analysis • Modern software engineering practices • Supports many platforms • Enables comparison of community-driven and industrial systems 20 github.com/tudelft-atlarge/graphalytics/ + more data generation + deeper performance metrics + choke-point analysis
  • 21. PELGA – Performance Engineering for Large-scale Graph Analytics, workshop with EuroPar 2016 21
  • 22. 22