http://bigdata.com 
http://mapgraph.io 
SYSTAP, LLC 
Graphs 
Graph Databases 
Graph Analytics on GPUs 
SYSTAP™, LLC 
© 2006-2014 All Rights Reserved 
1 
9/19/2014
http://bigdata.com 
http://mapgraph.io 
Graphs 
• This talk is about recent advances in large scale graph processing on 
GPUs. 
– The motivation is extreme performance. 
– Everything we do (as a company) is focused on graphs. 
• Graph Database 
• Graph processing 
• Common characteristics: 
– irregular data shape, irregular access patterns, and irregular parallelism. 
• A lot of data can be mapped onto graphs 
– Sparse matrices and graphs are very close data structures 
– Graphs, as we deal with them, have attributes on vertices and edges. 
• A lot of algorithms can be mapped onto graphs 
– Including many machine learning algorithms. 
SYSTAP™, LLC 
© 2006-2014 All Rights Reserved 
2 
http://www.bigdata.com/blog
Small Business, Founded 2006 100% Employee Owned 
http://bigdata.com 
http://mapgraph.io 
SYSTAP, LLC 
Graph Database 
• High performance, Scalable 
– 50B edges/node 
– High level query language 
– Efficient Graph Traversal 
– High 9s solution 
• Open Source 
– Subscriptions 
GPU Analytics 
• Extreme Performance 
– 5-100x faster than graphlab 
– 10,000x faster than graphdbs 
• DARPA funding 
• Disruptive technology 
– Early adopters 
– Huge ROIs 
• Open Source 
SYSTAP™, LLC 
© 2006-2014 All Rights Reserved 
3 
9/19/2014
Related “Graph” Technologies 
Redpoint repositions existing technology, 
adding interoperability for blueprints and 
gremlin. 
http://bigdata.com 
http://mapgraph.io 
STTR 
Pair up bigdata 
and MapGraph 
MapGraph compares favorably with high end 
hardware solutions from YARC, Oracle, and 
SAP, but is open source and uses commodity 
hardware. 
SYSTAP™, LLC 
© 2006-2014 All Rights Reserved 
4 
9/19/2014
Embedded, Single Server, HA, Scale-out 
• RDF/SPARQL 
• Property graphs 
– Blueprints, gremlin, rexter 
• REST API (NSS) 
• Extension points 
– Stored queries for custom 
application logic on the 
server. 
– Custom services & indices 
– Custom functions 
– Vertex-centric programs 
http://bigdata.com 
http://mapgraph.io 
• Embedded Server 
Journal 
JVM 
• Standalone Server 
Journal 
WAR 
SYSTAP™, LLC 
© 2006-2014 All Rights Reserved 
5 
9/19/2014
http://bigdata.com 
http://mapgraph.io 
High Availability 
• Shared nothing architecture 
– Same data on each node 
– Coordinate only at commit 
– Transparent load balancing 
• Scaling 
– 50 billion triples or quads 
– Query throughput scales linearly 
• Self healing 
– Automatic failover 
– Automatic resync after disconnect 
– Online single node disaster recovery 
• Online Backup 
– Online snapshots (full backups) 
– HA Logs (incremental backups) 
• Point in time recovery (offline) 
HAService 
Quorum 
k=3 
size=3 
leader 
follower 
HAService 
HAService 
SYSTAP™, LLC 
© 2006-2014 All Rights Reserved 
6 
9/19/2014
Embedded, Single Server, HA, Scale-out 
http://bigdata.com 
http://mapgraph.io 
RDF Data and SPARQL Query 
Client Service 
Distributed Index Management and Query 
Management Functions 
Client Service 
Registrar 
Data Service 
Client Service 
Data Service Data Service Data Service 
Data Service Data Service Data Service 
Zookeeper 
Shard Locator 
Transaction Mgr 
Load Balancer 
Unified API 
Application 
Client 
Application 
Client 
Application 
Client 
Application 
Client 
Application 
Client 
Client Service 
SPARQL XML 
SPARQL JSON 
RDF/XML 
N-Triples 
N-Quads 
Turtle 
TriG 
RDF/JSON 
SYSTAP™, LLC 
© 2006-2014 All Rights Reserved 
7 
9/19/2014
http://bigdata.com 
http://mapgraph.io 
And now on GPUs 
SYSTAP™, LLC 
© 2006-2014 All Rights Reserved 
8 
9/19/2014
Similar models, different problems 
• Graph query and graph analytics (traversal/mining) 
– Related data models 
– Very different computational requirements 
• Many technologies are a bad match or limited solution 
– Key-value stores (bigtable, Accumulo, Cassandra, HBase) 
– Map-reduce 
• Anti-pattern 
– Dump all data into “big bucket” 
http://bigdata.com 
http://mapgraph.io 
SYSTAP™, LLC 
© 2006-2014 All Rights Reserved 
9 
9/19/2014
Similar models, different problems 
• Graph query and graph analytics (traversal/mining) 
– Related data models 
– Very different computational requirements 
• Many technologies are a bad match or limited solution 
– Key-value stores (bigtable, Accumulo, Cassandra, HBase) 
– Map-reduce 
• Anti-pattern 
– Dump all data into “big bucket” 
Storage and computation patterns must be 
correctly matched for high performance. 
http://bigdata.com 
http://mapgraph.io 
SYSTAP™, LLC 
© 2006-2014 All Rights Reserved 
10 
9/19/2014
Optimize for the right problem 
• Graph analytics 
– Parallelism – work must be distributed and balanced. 
– Memory bandwidth – memory, not disk, is the bottleneck 
– 2D partitioning – O(log(N)) communications pattern (versus O(N*N)) 
• 1D design looses locality when updating link weights for reverse indices. 
BFS PR 
• Storage and computation patterns must be correctly matched 
for high performance. 
http://bigdata.com 
http://mapgraph.io 
SYSTAP™, LLC 
© 2006-2014 All Rights Reserved 
11 
9/19/2014
GPUs – A Game Changer for Graph Analytics 
• Graphs are a hard problem 
• Non-locality 
• Data dependent parallelism 
• Memory, PCIe bus and network are 
bottlenecks 
• Recent performance gains driven by 
innovations in bottom-up search, 
data layout, and partitioning. 
• GPUs deliver effective parallelism 
• 10x CPU FLOPS 
• 10x CPU/RAM bandwidth 
• Significant speeds up over CPU 
• 3 GTEPS on one GPU 
• 32 GTEPS on 64 GPU cluster 
http://bigdata.com 
http://mapgraph.io 
3500 
3000 
2500 
2000 
1500 
1000 
500 
0 
Breadth-First Search on Graphs 
NVIDIA Tesla C2050 
Multicore per socket 
Sequential 
0 
2 
1 10 100 1000 10000 100000 
Million Traversed Edges per 
Second 
Average Traversal Depth 
1 
1 
2 
1 
1 
2 
2 
2 
1 
3 
2 
3 
2 
1 2 
2 
10x Speedup on GPUs
http://bigdata.com 
http://mapgraph.io 
GPU Hardware Trends 
• K40 GPU (today) 
• 12G RAM/GPU 
• 288 GB/s bandwidth 
• PCIe Gen 3 
• Pascal GPU (Q1 2016) 
• 24G RAM/GPU 
• 1 TB/s bandwidth 
• Unified memory 
across CPU, GPUs 
SYSTAP™, LLC 
© 2006-2014 All Rights Reserved 
13 
9/19/2014
Full Bandwidth Access to CPU RAM 
http://bigdata.com 
http://mapgraph.io 
SYSTAP™, LLC 
© 2006-2014 All Rights Reserved 
14 
9/19/2014
Architecture shapes performance 
• The data was a scale-free graph with 2.7M vertices and 5.6M 
– MapGraph used a larger version of the graph (24M vertices, 25M edges) 
• The query was a 5-degree subgraph (depth-limited BFS) 
• Two main takeaways 
– Horizontal scaling for titan is very expensive – wrong abstraction. 
– GPUs are ridiculously fast. 
platform load (s) 
http://bigdata.com 
http://mapgraph.io 
query 
(ms) comments 
titan 497.00 935 4 node cluster using Cassandra 
neo4j 608.00 668 single node community edition 
bigdata 396.00 281 single node (open source) 
MapGraph 0.08 27 NVIDIA K20 GPU 
SYSTAP™, LLC 
© 2006-2014 All Rights Reserved 
15 
9/19/2014
http://bigdata.com 
http://mapgraph.io 
MapGraph 
Graph Processing on GPUs 
http://MapGraph.io 
SYSTAP™, LLC 
© 2006-2014 All Rights Reserved 
16 
9/19/2014
http://bigdata.com 
http://mapgraph.io 
Think Like a Vertex 
• Simple APIs 
pageRank(Message m) { 
total = m.value(); 
vertex.val = .15 * .85 + total; 
for(nbr : out_neighbors) { 
SendMsg(nbr, vertex.val/num_out_nbrs); 
} 
} 
• Lots of algorithms 
– BFS, SSSP, Page Rank, Connected Components, Louvain Modularity, 
Jaccard Distance, k-means clustering, Betweenness-Centrality, 
Personalized Page Rank, Loopy Belief Propagation, Graph search (crisp 
and approximate), etc. 
SYSTAP™, LLC 
© 2006-2014 All Rights Reserved 
17 
9/19/2014
GAS – a Graph-Parallel Abstraction 
• Graph-Parallel Vertex-Centric API ala GraphLab 
• “Think like a vertex” 
• Gather: collect information 
about my neighborhood 
• Apply: update my value 
• Scatter: signal adjacent vertices 
• Can write all sorts of graph algorithms this way 
– BFS, PageRank, Connected Component, Triangle Counting, Max Flow, 
etc. 
http://bigdata.com 
http://mapgraph.io 
SYSTAP™, LLC 
© 2006-2014 All Rights Reserved 
18 
9/19/2014
http://bigdata.com 
http://mapgraph.io 
MapGraph 
• High-level graph processing framework 
• High programmability 
GPU architecture 
Optimization techniques 
CUDA 
• High performance 
Comparable to low-level approach 
SYSTAP™, LLC 
© 2006-2014 All Rights Reserved 
19 
9/19/2014
http://bigdata.com 
http://mapgraph.io 
MapGraph 
• High-level graph processing framework 
• High programmability 
GPU architecture 
Optimization techniques 
CUDA 
• High performance 
Comparable to low-level approach 
SYSTAP™, LLC 
© 2006-2014 All Rights Reserved 
20 
9/19/2014
http://bigdata.com 
http://mapgraph.io 
Single GPU MapGraph (BFS) 
Dataset #vertices #edges Max Degree Milliseconds 
Webbase 1,000,005 3,105,536 23 1.2 
Delaunay 2,097,152 6,291,408 4,700 24.5 
Bitcoin 6,297,539 28,143,065 4,075,472 345.3 
Wiki 3,566,907 45,030,389 7,061 51.0 
Kron 1,048,576 89,239,674 131,505 47.7 
154.0 
513.6 
74.8 
821.3 
1870.9 
2,000 
1,800 
1,600 
1,400 
1,200 
1,000 
800 
600 
400 
200 
0 
Webbase Delaunay Bitcoin Wiki Kron 
MTEPS 
SYSTAP™, LLC 
© 2006-2014 All Rights Reserved 
21 
9/19/2014
BFS Results : MapGraph vs GraphLab 
1,000.00 
100.00 
10.00 
http://bigdata.com 
http://mapgraph.io 
1.00 
0.10 
Webbase Delaunay Bitcoin Wiki Kron 
Speedup 
MapGraph Speedup vs GraphLab (BFS) 
GL-2 
GL-4 
GL-8 
GL-12 
MPG 
SYSTAP™, LLC 
© 2006-2014 All Rights Reserved 
22 
9/19/2014
PageRank : MapGraph vs GraphLab 
100.00 
10.00 
http://bigdata.com 
http://mapgraph.io 
1.00 
0.10 
Webbase Delaunay Bitcoin Wiki Kron 
Speedup 
MapGraph Speedup vs GraphLab (Page Rank) 
GL-2 
GL-4 
GL-8 
GL-12 
MPG 
SYSTAP™, LLC 
© 2006-2014 All Rights Reserved 
23 
9/19/2014
Graph Mining on GPU Clusters 
• 2D partitioning (aka vertex cuts) 
• Minimizes the communication volume. 
• Batch parallel Gather in row, Scatter in 
column. 
http://bigdata.com 
http://mapgraph.io 
SYSTAP™, LLC 
© 2006-2014 All Rights Reserved 
24 
9/19/2014
Accelerated Graph Analytics 
http://bigdata.com 
http://mapgraph.io 
SYSTAP™, LLC 
© 2006-2014 All Rights Reserved 
25 
9/19/2014
• Work spans multiple orders of magnitude. 
10,000,000 
1,000,000 
100,000 
10,000 
1,000 
100 
http://bigdata.com 
http://mapgraph.io 
10 
1 
Scale 25 Traversal 
0 1 2 3 4 5 6 
Frontier Size 
Iteration 
SYSTAP™, LLC 
© 2006-2014 All Rights Reserved 
26 
9/19/2014
http://bigdata.com 
http://mapgraph.io 
Strong Scaling 
• Speedup on a constant problem size with more GPUs 
• Problem scale 25 
– 2^25 vertices (33,554,432) 
– 2^26 directed edges (1,073,741,824) 
23 
22 
21 
20 
19 
18 
17 
16 
15 
14 
13 
10 20 30 40 50 60 70 
GTEPS 
#GPUs 
Strong scaling 
GPUs GTEPS Time (s) 
16 14.3 0.075 
25 16.4 0.066 
36 18.1 0.059 
64 22.7 0.047 
SYSTAP™, LLC 
© 2006-2014 All Rights Reserved 
27 
9/19/2014
http://bigdata.com 
http://mapgraph.io 
Weak Scaling 
• Scaling the problem size with more GPUs 
35 
30 
25 
20 
15 
10 
5 
0 
1 4 16 64 
GTEPS 
#GPUs 
Weak scaling 
GPUs Scale Vertices Edges Time (s) GTEPS 
1 21 2,097,152 67,108,864 0.0254 3 
4 23 8,388,608 268,435,456 0.0429 6 
16 25 33,554,432 1,073,741,824 0.0715 15 
64 27 134,217,728 4,294,967,296 0.1478 29 
SYSTAP™, LLC 
© 2006-2014 All Rights Reserved 
28 
9/19/2014
http://bigdata.com 
http://mapgraph.io 
Highlights 
• For algorithms on large graphs 
– Memory is the bottleneck 
• CPUs quickly saturate the memory bus. 
• CPU cache thrashing limits scaling for graph traversal. 
• Continued performance gains for CPUs focus on reducing the #of visited edges to reduce 
bandwidth. 
• Hybrid CPU/GPU architectures offload either small degree vertices (reduce cache 
thrashing) or high degree vertices (if the algorithm is FLOPS bound on the CPU, e.g., BC) 
– Many core is the future. 
– GPUs are primarily known for their FLOPS, but they have high memory bandwidth 
and can deliver effective parallelism on parallel graph problems (with sophisticated 
kernels). 
• Scaling to very large graphs on large compute clusters 
– Communications bound. 
• Communication must be constant for perfect scaling 
– Hybrid partitioning seeks to reduce #of messages, size of messages, and optimize 
for asynchronous communications and degree-aware layouts for bottom-up search 
to reduce memory bandwidth. 
SYSTAP™, LLC 
© 2006-2014 All Rights Reserved 
29 
http://www.bigdata.com/blog
http://bigdata.com 
http://mapgraph.io 
Bryan Thompson 
SYSTAP, LLC 
bryan@systap.com 
http://bigdata.com http://mapgraph.io 
SYSTAP™, LLC 
© 2006-2014 All Rights Reserved 
30 
9/19/2014

Bryan Thompson, Chief Scientist and Founder, SYSTAP, LLC at MLconf ATL

  • 1.
    http://bigdata.com http://mapgraph.io SYSTAP,LLC Graphs Graph Databases Graph Analytics on GPUs SYSTAP™, LLC © 2006-2014 All Rights Reserved 1 9/19/2014
  • 2.
    http://bigdata.com http://mapgraph.io Graphs • This talk is about recent advances in large scale graph processing on GPUs. – The motivation is extreme performance. – Everything we do (as a company) is focused on graphs. • Graph Database • Graph processing • Common characteristics: – irregular data shape, irregular access patterns, and irregular parallelism. • A lot of data can be mapped onto graphs – Sparse matrices and graphs are very close data structures – Graphs, as we deal with them, have attributes on vertices and edges. • A lot of algorithms can be mapped onto graphs – Including many machine learning algorithms. SYSTAP™, LLC © 2006-2014 All Rights Reserved 2 http://www.bigdata.com/blog
  • 3.
    Small Business, Founded2006 100% Employee Owned http://bigdata.com http://mapgraph.io SYSTAP, LLC Graph Database • High performance, Scalable – 50B edges/node – High level query language – Efficient Graph Traversal – High 9s solution • Open Source – Subscriptions GPU Analytics • Extreme Performance – 5-100x faster than graphlab – 10,000x faster than graphdbs • DARPA funding • Disruptive technology – Early adopters – Huge ROIs • Open Source SYSTAP™, LLC © 2006-2014 All Rights Reserved 3 9/19/2014
  • 4.
    Related “Graph” Technologies Redpoint repositions existing technology, adding interoperability for blueprints and gremlin. http://bigdata.com http://mapgraph.io STTR Pair up bigdata and MapGraph MapGraph compares favorably with high end hardware solutions from YARC, Oracle, and SAP, but is open source and uses commodity hardware. SYSTAP™, LLC © 2006-2014 All Rights Reserved 4 9/19/2014
  • 5.
    Embedded, Single Server,HA, Scale-out • RDF/SPARQL • Property graphs – Blueprints, gremlin, rexter • REST API (NSS) • Extension points – Stored queries for custom application logic on the server. – Custom services & indices – Custom functions – Vertex-centric programs http://bigdata.com http://mapgraph.io • Embedded Server Journal JVM • Standalone Server Journal WAR SYSTAP™, LLC © 2006-2014 All Rights Reserved 5 9/19/2014
  • 6.
    http://bigdata.com http://mapgraph.io HighAvailability • Shared nothing architecture – Same data on each node – Coordinate only at commit – Transparent load balancing • Scaling – 50 billion triples or quads – Query throughput scales linearly • Self healing – Automatic failover – Automatic resync after disconnect – Online single node disaster recovery • Online Backup – Online snapshots (full backups) – HA Logs (incremental backups) • Point in time recovery (offline) HAService Quorum k=3 size=3 leader follower HAService HAService SYSTAP™, LLC © 2006-2014 All Rights Reserved 6 9/19/2014
  • 7.
    Embedded, Single Server,HA, Scale-out http://bigdata.com http://mapgraph.io RDF Data and SPARQL Query Client Service Distributed Index Management and Query Management Functions Client Service Registrar Data Service Client Service Data Service Data Service Data Service Data Service Data Service Data Service Zookeeper Shard Locator Transaction Mgr Load Balancer Unified API Application Client Application Client Application Client Application Client Application Client Client Service SPARQL XML SPARQL JSON RDF/XML N-Triples N-Quads Turtle TriG RDF/JSON SYSTAP™, LLC © 2006-2014 All Rights Reserved 7 9/19/2014
  • 8.
    http://bigdata.com http://mapgraph.io Andnow on GPUs SYSTAP™, LLC © 2006-2014 All Rights Reserved 8 9/19/2014
  • 9.
    Similar models, differentproblems • Graph query and graph analytics (traversal/mining) – Related data models – Very different computational requirements • Many technologies are a bad match or limited solution – Key-value stores (bigtable, Accumulo, Cassandra, HBase) – Map-reduce • Anti-pattern – Dump all data into “big bucket” http://bigdata.com http://mapgraph.io SYSTAP™, LLC © 2006-2014 All Rights Reserved 9 9/19/2014
  • 10.
    Similar models, differentproblems • Graph query and graph analytics (traversal/mining) – Related data models – Very different computational requirements • Many technologies are a bad match or limited solution – Key-value stores (bigtable, Accumulo, Cassandra, HBase) – Map-reduce • Anti-pattern – Dump all data into “big bucket” Storage and computation patterns must be correctly matched for high performance. http://bigdata.com http://mapgraph.io SYSTAP™, LLC © 2006-2014 All Rights Reserved 10 9/19/2014
  • 11.
    Optimize for theright problem • Graph analytics – Parallelism – work must be distributed and balanced. – Memory bandwidth – memory, not disk, is the bottleneck – 2D partitioning – O(log(N)) communications pattern (versus O(N*N)) • 1D design looses locality when updating link weights for reverse indices. BFS PR • Storage and computation patterns must be correctly matched for high performance. http://bigdata.com http://mapgraph.io SYSTAP™, LLC © 2006-2014 All Rights Reserved 11 9/19/2014
  • 12.
    GPUs – AGame Changer for Graph Analytics • Graphs are a hard problem • Non-locality • Data dependent parallelism • Memory, PCIe bus and network are bottlenecks • Recent performance gains driven by innovations in bottom-up search, data layout, and partitioning. • GPUs deliver effective parallelism • 10x CPU FLOPS • 10x CPU/RAM bandwidth • Significant speeds up over CPU • 3 GTEPS on one GPU • 32 GTEPS on 64 GPU cluster http://bigdata.com http://mapgraph.io 3500 3000 2500 2000 1500 1000 500 0 Breadth-First Search on Graphs NVIDIA Tesla C2050 Multicore per socket Sequential 0 2 1 10 100 1000 10000 100000 Million Traversed Edges per Second Average Traversal Depth 1 1 2 1 1 2 2 2 1 3 2 3 2 1 2 2 10x Speedup on GPUs
  • 13.
    http://bigdata.com http://mapgraph.io GPUHardware Trends • K40 GPU (today) • 12G RAM/GPU • 288 GB/s bandwidth • PCIe Gen 3 • Pascal GPU (Q1 2016) • 24G RAM/GPU • 1 TB/s bandwidth • Unified memory across CPU, GPUs SYSTAP™, LLC © 2006-2014 All Rights Reserved 13 9/19/2014
  • 14.
    Full Bandwidth Accessto CPU RAM http://bigdata.com http://mapgraph.io SYSTAP™, LLC © 2006-2014 All Rights Reserved 14 9/19/2014
  • 15.
    Architecture shapes performance • The data was a scale-free graph with 2.7M vertices and 5.6M – MapGraph used a larger version of the graph (24M vertices, 25M edges) • The query was a 5-degree subgraph (depth-limited BFS) • Two main takeaways – Horizontal scaling for titan is very expensive – wrong abstraction. – GPUs are ridiculously fast. platform load (s) http://bigdata.com http://mapgraph.io query (ms) comments titan 497.00 935 4 node cluster using Cassandra neo4j 608.00 668 single node community edition bigdata 396.00 281 single node (open source) MapGraph 0.08 27 NVIDIA K20 GPU SYSTAP™, LLC © 2006-2014 All Rights Reserved 15 9/19/2014
  • 16.
    http://bigdata.com http://mapgraph.io MapGraph Graph Processing on GPUs http://MapGraph.io SYSTAP™, LLC © 2006-2014 All Rights Reserved 16 9/19/2014
  • 17.
    http://bigdata.com http://mapgraph.io ThinkLike a Vertex • Simple APIs pageRank(Message m) { total = m.value(); vertex.val = .15 * .85 + total; for(nbr : out_neighbors) { SendMsg(nbr, vertex.val/num_out_nbrs); } } • Lots of algorithms – BFS, SSSP, Page Rank, Connected Components, Louvain Modularity, Jaccard Distance, k-means clustering, Betweenness-Centrality, Personalized Page Rank, Loopy Belief Propagation, Graph search (crisp and approximate), etc. SYSTAP™, LLC © 2006-2014 All Rights Reserved 17 9/19/2014
  • 18.
    GAS – aGraph-Parallel Abstraction • Graph-Parallel Vertex-Centric API ala GraphLab • “Think like a vertex” • Gather: collect information about my neighborhood • Apply: update my value • Scatter: signal adjacent vertices • Can write all sorts of graph algorithms this way – BFS, PageRank, Connected Component, Triangle Counting, Max Flow, etc. http://bigdata.com http://mapgraph.io SYSTAP™, LLC © 2006-2014 All Rights Reserved 18 9/19/2014
  • 19.
    http://bigdata.com http://mapgraph.io MapGraph • High-level graph processing framework • High programmability GPU architecture Optimization techniques CUDA • High performance Comparable to low-level approach SYSTAP™, LLC © 2006-2014 All Rights Reserved 19 9/19/2014
  • 20.
    http://bigdata.com http://mapgraph.io MapGraph • High-level graph processing framework • High programmability GPU architecture Optimization techniques CUDA • High performance Comparable to low-level approach SYSTAP™, LLC © 2006-2014 All Rights Reserved 20 9/19/2014
  • 21.
    http://bigdata.com http://mapgraph.io SingleGPU MapGraph (BFS) Dataset #vertices #edges Max Degree Milliseconds Webbase 1,000,005 3,105,536 23 1.2 Delaunay 2,097,152 6,291,408 4,700 24.5 Bitcoin 6,297,539 28,143,065 4,075,472 345.3 Wiki 3,566,907 45,030,389 7,061 51.0 Kron 1,048,576 89,239,674 131,505 47.7 154.0 513.6 74.8 821.3 1870.9 2,000 1,800 1,600 1,400 1,200 1,000 800 600 400 200 0 Webbase Delaunay Bitcoin Wiki Kron MTEPS SYSTAP™, LLC © 2006-2014 All Rights Reserved 21 9/19/2014
  • 22.
    BFS Results :MapGraph vs GraphLab 1,000.00 100.00 10.00 http://bigdata.com http://mapgraph.io 1.00 0.10 Webbase Delaunay Bitcoin Wiki Kron Speedup MapGraph Speedup vs GraphLab (BFS) GL-2 GL-4 GL-8 GL-12 MPG SYSTAP™, LLC © 2006-2014 All Rights Reserved 22 9/19/2014
  • 23.
    PageRank : MapGraphvs GraphLab 100.00 10.00 http://bigdata.com http://mapgraph.io 1.00 0.10 Webbase Delaunay Bitcoin Wiki Kron Speedup MapGraph Speedup vs GraphLab (Page Rank) GL-2 GL-4 GL-8 GL-12 MPG SYSTAP™, LLC © 2006-2014 All Rights Reserved 23 9/19/2014
  • 24.
    Graph Mining onGPU Clusters • 2D partitioning (aka vertex cuts) • Minimizes the communication volume. • Batch parallel Gather in row, Scatter in column. http://bigdata.com http://mapgraph.io SYSTAP™, LLC © 2006-2014 All Rights Reserved 24 9/19/2014
  • 25.
    Accelerated Graph Analytics http://bigdata.com http://mapgraph.io SYSTAP™, LLC © 2006-2014 All Rights Reserved 25 9/19/2014
  • 26.
    • Work spansmultiple orders of magnitude. 10,000,000 1,000,000 100,000 10,000 1,000 100 http://bigdata.com http://mapgraph.io 10 1 Scale 25 Traversal 0 1 2 3 4 5 6 Frontier Size Iteration SYSTAP™, LLC © 2006-2014 All Rights Reserved 26 9/19/2014
  • 27.
    http://bigdata.com http://mapgraph.io StrongScaling • Speedup on a constant problem size with more GPUs • Problem scale 25 – 2^25 vertices (33,554,432) – 2^26 directed edges (1,073,741,824) 23 22 21 20 19 18 17 16 15 14 13 10 20 30 40 50 60 70 GTEPS #GPUs Strong scaling GPUs GTEPS Time (s) 16 14.3 0.075 25 16.4 0.066 36 18.1 0.059 64 22.7 0.047 SYSTAP™, LLC © 2006-2014 All Rights Reserved 27 9/19/2014
  • 28.
    http://bigdata.com http://mapgraph.io WeakScaling • Scaling the problem size with more GPUs 35 30 25 20 15 10 5 0 1 4 16 64 GTEPS #GPUs Weak scaling GPUs Scale Vertices Edges Time (s) GTEPS 1 21 2,097,152 67,108,864 0.0254 3 4 23 8,388,608 268,435,456 0.0429 6 16 25 33,554,432 1,073,741,824 0.0715 15 64 27 134,217,728 4,294,967,296 0.1478 29 SYSTAP™, LLC © 2006-2014 All Rights Reserved 28 9/19/2014
  • 29.
    http://bigdata.com http://mapgraph.io Highlights • For algorithms on large graphs – Memory is the bottleneck • CPUs quickly saturate the memory bus. • CPU cache thrashing limits scaling for graph traversal. • Continued performance gains for CPUs focus on reducing the #of visited edges to reduce bandwidth. • Hybrid CPU/GPU architectures offload either small degree vertices (reduce cache thrashing) or high degree vertices (if the algorithm is FLOPS bound on the CPU, e.g., BC) – Many core is the future. – GPUs are primarily known for their FLOPS, but they have high memory bandwidth and can deliver effective parallelism on parallel graph problems (with sophisticated kernels). • Scaling to very large graphs on large compute clusters – Communications bound. • Communication must be constant for perfect scaling – Hybrid partitioning seeks to reduce #of messages, size of messages, and optimize for asynchronous communications and degree-aware layouts for bottom-up search to reduce memory bandwidth. SYSTAP™, LLC © 2006-2014 All Rights Reserved 29 http://www.bigdata.com/blog
  • 30.
    http://bigdata.com http://mapgraph.io BryanThompson SYSTAP, LLC bryan@systap.com http://bigdata.com http://mapgraph.io SYSTAP™, LLC © 2006-2014 All Rights Reserved 30 9/19/2014