
Bryan Thompson, Chief Scientist and Founder, SYSTAP, LLC at MLconf ATL

I will discuss current research on the MapGraph platform. MapGraph is a new and disruptive technology for ultra-fast processing of large graphs on commodity many-core hardware. On a single GPU you can analyze the bitcoin transaction graph in 0.35 seconds. With MapGraph on 64 NVIDIA K20 GPUs, you can traverse a scale-free graph of 4.3 billion directed edges in 0.13 seconds, for a throughput of 32 billion traversed edges per second (32 GTEPS). I will explain why GPUs are an interesting option for data-intensive applications, how we map graphs onto many-core processors, and what the future looks like for the MapGraph platform.

MapGraph provides a familiar vertex-centric abstraction, but its GPU acceleration is hundreds of times faster than main-memory CPU-only technologies and up to 100,000 times faster than graph technologies based on MapReduce or key-value stores such as HBase, Titan, and Accumulo. Learn more at http://MapGraph.io.


  1. SYSTAP, LLC
     Graphs, Graph Databases, Graph Analytics on GPUs
     http://bigdata.com http://mapgraph.io
     SYSTAP™, LLC © 2006-2014 All Rights Reserved, 9/19/2014
  2. Graphs
     • This talk is about recent advances in large scale graph processing on GPUs.
       – The motivation is extreme performance.
       – Everything we do (as a company) is focused on graphs.
         • Graph Database
         • Graph processing
     • Common characteristics:
       – Irregular data shape, irregular access patterns, and irregular parallelism.
     • A lot of data can be mapped onto graphs
       – Sparse matrices and graphs are very close data structures
       – Graphs, as we deal with them, have attributes on vertices and edges.
     • A lot of algorithms can be mapped onto graphs
       – Including many machine learning algorithms.
     http://www.bigdata.com/blog
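To illustrate the remark that sparse matrices and graphs are very close data structures, here is a minimal sketch of a compressed sparse row (CSR) adjacency layout, the same layout commonly used for sparse matrices. The function names and the toy edge list are assumptions made for illustration; the slides do not say which layout MapGraph uses.

```python
def build_csr(num_vertices, edges):
    """Build CSR arrays (row_offsets, col_indices) from a directed edge list."""
    # Count the out-degree of every vertex.
    degree = [0] * num_vertices
    for src, _ in edges:
        degree[src] += 1

    # Prefix sum: row_offsets[v] is where the neighbors of v start.
    row_offsets = [0] * (num_vertices + 1)
    for v in range(num_vertices):
        row_offsets[v + 1] = row_offsets[v] + degree[v]

    # Fill the column indices, tracking the next free slot per row.
    col_indices = [0] * len(edges)
    cursor = row_offsets[:-1].copy()
    for src, dst in edges:
        col_indices[cursor[src]] = dst
        cursor[src] += 1
    return row_offsets, col_indices


def out_neighbors(row_offsets, col_indices, v):
    """Neighbors of v are one contiguous slice, as in a sparse matrix row."""
    return col_indices[row_offsets[v]:row_offsets[v + 1]]


# Hypothetical toy graph with 4 vertices.
edges = [(0, 1), (0, 2), (1, 2), (2, 0), (3, 2)]
row_offsets, col_indices = build_csr(4, edges)
print(row_offsets, col_indices, out_neighbors(row_offsets, col_indices, 0))
```

Each row of the corresponding sparse adjacency matrix is exactly one vertex's neighbor list, which is why algorithms can often be phrased in either vocabulary.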
  3. SYSTAP, LLC
     Small Business, Founded 2006, 100% Employee Owned
     Graph Database
     • High performance, Scalable
       – 50B edges/node
       – High level query language
       – Efficient Graph Traversal
       – High 9s solution
     • Open Source
       – Subscriptions
     GPU Analytics
     • Extreme Performance
       – 5-100x faster than GraphLab
       – 10,000x faster than graph databases
     • DARPA funding
     • Disruptive technology
       – Early adopters
       – Huge ROIs
     • Open Source
  4. Related “Graph” Technologies
     • Redpoint repositions existing technology, adding interoperability for Blueprints and Gremlin.
     • STTR Pair up bigdata and MapGraph
     • MapGraph compares favorably with high end hardware solutions from YARC, Oracle, and SAP, but is open source and uses commodity hardware.
  5. Embedded, Single Server, HA, Scale-out
     • RDF/SPARQL
     • Property graphs
       – Blueprints, Gremlin, Rexster
     • REST API (NSS)
     • Extension points
       – Stored queries for custom application logic on the server.
       – Custom services & indices
       – Custom functions
       – Vertex-centric programs
     • Embedded Server (Journal, JVM)
     • Standalone Server (Journal, WAR)
  6. High Availability
     • Shared nothing architecture
       – Same data on each node
       – Coordinate only at commit
       – Transparent load balancing
     • Scaling
       – 50 billion triples or quads
       – Query throughput scales linearly
     • Self healing
       – Automatic failover
       – Automatic resync after disconnect
       – Online single node disaster recovery
     • Online Backup
       – Online snapshots (full backups)
       – HA Logs (incremental backups)
     • Point in time recovery (offline)
     [Diagram: three HAService nodes; Quorum k=3, size=3; labels: leader, follower]
  7. Embedded, Single Server, HA, Scale-out
     RDF Data and SPARQL Query
     [Architecture diagram labels: Application Client, Unified API, Client Service, Data Service, Distributed Index Management and Query Management Functions, Registrar, Zookeeper, Shard Locator, Transaction Mgr, Load Balancer]
     Formats: SPARQL XML, SPARQL JSON, RDF/XML, N-Triples, N-Quads, Turtle, TriG, RDF/JSON
  8. And now on GPUs
  9. Similar models, different problems
     • Graph query and graph analytics (traversal/mining)
       – Related data models
       – Very different computational requirements
     • Many technologies are a bad match or limited solution
       – Key-value stores (Bigtable, Accumulo, Cassandra, HBase)
       – Map-reduce
     • Anti-pattern
       – Dump all data into “big bucket”
  10. Similar models, different problems (continued)
      Storage and computation patterns must be correctly matched for high performance.
  11. Optimize for the right problem
      • Graph analytics
        – Parallelism – work must be distributed and balanced.
        – Memory bandwidth – memory, not disk, is the bottleneck
        – 2D partitioning – O(log(N)) communications pattern (versus O(N*N))
          • 1D design loses locality when updating link weights for reverse indices.
      • Storage and computation patterns must be correctly matched for high performance.
      [Figure labels: BFS, PR]
  12. GPUs – A Game Changer for Graph Analytics
      • Graphs are a hard problem
        – Non-locality
        – Data dependent parallelism
        – Memory, PCIe bus and network are bottlenecks
      • Recent performance gains driven by innovations in bottom-up search, data layout, and partitioning.
      • GPUs deliver effective parallelism
        – 10x CPU FLOPS
        – 10x CPU/RAM bandwidth
      • Significant speedups over CPU
        – 3 GTEPS on one GPU
        – 32 GTEPS on 64 GPU cluster
      [Chart: Breadth-First Search on Graphs. Axes: Average Traversal Depth; Million Traversed Edges per Second. Series: NVIDIA Tesla C2050, Multicore per socket, Sequential. Annotation: 10x Speedup on GPUs]
  13. GPU Hardware Trends
      • K40 GPU (today)
        – 12G RAM/GPU
        – 288 GB/s bandwidth
        – PCIe Gen 3
      • Pascal GPU (Q1 2016)
        – 24G RAM/GPU
        – 1 TB/s bandwidth
        – Unified memory across CPU, GPUs
  14. Full Bandwidth Access to CPU RAM
  15. Architecture shapes performance
      • The data was a scale-free graph with 2.7M vertices and 5.6M edges
        – MapGraph used a larger version of the graph (24M vertices, 25M edges)
      • The query was a 5-degree subgraph (depth-limited BFS)
      • Two main takeaways
        – Horizontal scaling for Titan is very expensive – wrong abstraction.
        – GPUs are ridiculously fast.

      platform   load (s)   query (ms)   comments
      titan        497.00          935   4 node cluster using Cassandra
      neo4j        608.00          668   single node community edition
      bigdata      396.00          281   single node (open source)
      MapGraph       0.08           27   NVIDIA K20 GPU
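The benchmark query on this slide is described as a depth-limited BFS. A minimal CPU-side sketch of that kind of query follows; the function name, the dictionary-based adjacency list, and the toy chain graph are assumptions for illustration and are not the benchmark code.

```python
from collections import deque


def depth_limited_bfs(adj, source, max_depth):
    """Return {vertex: depth} for every vertex within max_depth hops of source."""
    depth = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        if depth[v] == max_depth:
            continue  # do not expand past the depth limit
        for nbr in adj.get(v, ()):
            if nbr not in depth:
                depth[nbr] = depth[v] + 1
                queue.append(nbr)
    return depth


# Hypothetical toy graph: a chain 0 -> 1 -> 2 -> 3 -> 4.
adj = {0: [1], 1: [2], 2: [3], 3: [4], 4: []}
reached = depth_limited_bfs(adj, 0, 2)
print(reached)
```

With `max_depth=5` the same function would extract a 5-hop subgraph around the source vertex.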
  16. MapGraph
      Graph Processing on GPUs
      http://MapGraph.io
  17. Think Like a Vertex
      • Simple APIs
          pageRank(Message m) {
            total = m.value();
            vertex.val = .15 + .85 * total;
            for (nbr : out_neighbors) {
              SendMsg(nbr, vertex.val / num_out_nbrs);
            }
          }
      • Lots of algorithms
        – BFS, SSSP, PageRank, Connected Components, Louvain Modularity, Jaccard Distance, k-means clustering, Betweenness Centrality, Personalized PageRank, Loopy Belief Propagation, Graph search (crisp and approximate), etc.
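The vertex program on this slide can be made runnable as a small synchronous simulation. This is only a sketch: it assumes the common damped update (1 − d) + d · total with d = 0.85, a fixed number of supersteps, and a graph with no dangling vertices. The function name and toy graph are hypothetical, and this is not the MapGraph API.

```python
def pagerank(out_neighbors, num_iterations=20, damping=0.85):
    """Synchronous vertex-centric PageRank over a dict {vertex: [out-neighbors]}."""
    vertices = list(out_neighbors)
    value = {v: 1.0 for v in vertices}
    for _ in range(num_iterations):
        # "SendMsg": every vertex sends value / out_degree to each out-neighbor.
        inbox = {v: 0.0 for v in vertices}
        for v in vertices:
            nbrs = out_neighbors[v]
            for nbr in nbrs:
                inbox[nbr] += value[v] / len(nbrs)
        # Vertex update: combine the summed messages with the damping term.
        value = {v: (1.0 - damping) + damping * inbox[v] for v in vertices}
    return value


# Hypothetical toy graph: 0 -> 1, 0 -> 2, 1 -> 2, 2 -> 0.
graph = {0: [1, 2], 1: [2], 2: [0]}
ranks = pagerank(graph)
print(ranks)
```

Because messages are accumulated into a separate `inbox` before any value changes, every vertex sees the previous superstep's values, which mirrors the bulk-synchronous style the slide's pseudocode implies.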
  18. GAS – a Graph-Parallel Abstraction
      • Graph-Parallel Vertex-Centric API à la GraphLab
      • “Think like a vertex”
        – Gather: collect information about my neighborhood
        – Apply: update my value
        – Scatter: signal adjacent vertices
      • Can write all sorts of graph algorithms this way
        – BFS, PageRank, Connected Components, Triangle Counting, Max Flow, etc.
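As a rough illustration of the Gather, Apply, and Scatter phases, the sketch below expresses BFS in that style on the CPU. All names and the toy graph are assumptions for illustration; the actual GraphLab and MapGraph APIs differ.

```python
INF = float("inf")


def reverse_graph(out_neighbors):
    """Derive in-neighbor lists from out-neighbor lists."""
    in_neighbors = {v: [] for v in out_neighbors}
    for v, nbrs in out_neighbors.items():
        for nbr in nbrs:
            in_neighbors[nbr].append(v)
    return in_neighbors


def gas_bfs(out_neighbors, source):
    """BFS depths computed with explicit gather / apply / scatter phases."""
    in_neighbors = reverse_graph(out_neighbors)
    depth = {v: INF for v in out_neighbors}
    depth[source] = 0
    active = set(out_neighbors[source])
    while active:
        # Gather: each active vertex collects the best depth offered by in-neighbors.
        gathered = {
            v: min((depth[u] + 1 for u in in_neighbors[v]), default=INF)
            for v in active
        }
        # Apply: update the vertex value only if the gathered depth is an improvement.
        changed = [v for v in active if gathered[v] < depth[v]]
        for v in changed:
            depth[v] = gathered[v]
        # Scatter: vertices that changed signal (activate) their out-neighbors.
        active = {nbr for v in changed for nbr in out_neighbors[v]}
    return depth


# Hypothetical toy graph: a diamond 0 -> {1, 2} -> 3.
graph = {0: [1, 2], 1: [3], 2: [3], 3: []}
depths = gas_bfs(graph, 0)
print(depths)
```

The loop ends when no vertex changes, so only the active frontier is processed in each round.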
  19. MapGraph
      • High-level graph processing framework
      • High programmability
      • High performance
        – Comparable to low-level approach
      [Diagram labels: GPU architecture, Optimization techniques, CUDA]
  21. Single GPU MapGraph (BFS)

      Dataset    #vertices       #edges   Max Degree   Milliseconds
      Webbase    1,000,005    3,105,536           23            1.2
      Delaunay   2,097,152    6,291,408        4,700           24.5
      Bitcoin    6,297,539   28,143,065    4,075,472          345.3
      Wiki       3,566,907   45,030,389        7,061           51.0
      Kron       1,048,576   89,239,674      131,505           47.7

      [Chart: MTEPS by dataset (Webbase, Delaunay, Bitcoin, Wiki, Kron); data labels 154.0, 513.6, 74.8, 821.3, 1870.9]
  22. BFS Results: MapGraph vs GraphLab
      [Chart: MapGraph Speedup vs GraphLab (BFS). Speedup (log scale, 0.10 to 1,000.00) by dataset (Webbase, Delaunay, Bitcoin, Wiki, Kron). Series: GL-2, GL-4, GL-8, GL-12, MPG]
  23. PageRank: MapGraph vs GraphLab
      [Chart: MapGraph Speedup vs GraphLab (Page Rank). Speedup (log scale, 0.10 to 100.00) by dataset (Webbase, Delaunay, Bitcoin, Wiki, Kron). Series: GL-2, GL-4, GL-8, GL-12, MPG]
  24. Graph Mining on GPU Clusters
      • 2D partitioning (aka vertex cuts)
      • Minimizes the communication volume.
      • Batch parallel Gather in row, Scatter in column.
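A 2D partitioning can be sketched as assigning each edge to a cell of a square processor grid, chosen by the block of its source vertex (grid row) and the block of its destination vertex (grid column). The block-contiguous vertex ranges, the function name, and the row/column orientation are assumptions for illustration; the slides do not give MapGraph's actual layout.

```python
def partition_2d(edges, num_vertices, grid_dim):
    """Assign directed edges to a grid_dim x grid_dim grid of partitions."""
    block = -(-num_vertices // grid_dim)  # ceiling division: vertices per block
    parts = {(r, c): [] for r in range(grid_dim) for c in range(grid_dim)}
    for src, dst in edges:
        # Row index from the source block, column index from the destination block.
        parts[(src // block, dst // block)].append((src, dst))
    return parts


# Hypothetical toy graph: 4 vertices on a 2 x 2 grid.
edges = [(0, 1), (0, 3), (2, 1), (3, 2)]
parts = partition_2d(edges, num_vertices=4, grid_dim=2)
for cell, cell_edges in sorted(parts.items()):
    print(cell, cell_edges)
```

In this layout, all partitions in one grid row share a source-vertex range and all partitions in one grid column share a destination-vertex range, so a partition only needs to exchange vertex data with peers in its own row and column.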
  25. Accelerated Graph Analytics
  26. • Work spans multiple orders of magnitude.
      [Chart: Scale 25 Traversal. Frontier Size (log scale, 1 to 10,000,000) by Iteration (0 to 6)]
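The quantity plotted on this slide, frontier size per BFS iteration, can be measured with a level-synchronous BFS. The sketch below uses a hypothetical toy graph and function name; it is not the code that produced the chart.

```python
def frontier_sizes(adj, source):
    """Return the number of vertices in the BFS frontier at each iteration."""
    visited = {source}
    frontier = [source]
    sizes = []
    while frontier:
        sizes.append(len(frontier))
        next_frontier = []
        for v in frontier:
            for nbr in adj[v]:
                if nbr not in visited:
                    visited.add(nbr)
                    next_frontier.append(nbr)
        frontier = next_frontier
    return sizes


# Hypothetical toy graph with a small fan-out from vertex 0.
adj = {0: [1, 2, 3], 1: [4, 5], 2: [5, 6], 3: [], 4: [], 5: [], 6: []}
sizes = frontier_sizes(adj, 0)
print(sizes)
```

On a scale-free graph the middle iterations dominate the work, which is why the per-iteration workload varies so widely.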
  27. Strong Scaling
      • Speedup on a constant problem size with more GPUs
      • Problem scale 25
        – 2^25 vertices (33,554,432)
        – 2^30 directed edges (1,073,741,824)

      GPUs   GTEPS   Time (s)
        16    14.3      0.075
        25    16.4      0.066
        36    18.1      0.059
        64    22.7      0.047

      [Chart: Strong scaling, GTEPS by #GPUs]
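The strong-scaling table can be summarized as speedup and parallel efficiency relative to the smallest run. The sketch below does this arithmetic using the times from the slide; choosing the 16-GPU run as the baseline is an assumption, since the slide reports no single-GPU time for this problem size.

```python
# (GPUs, GTEPS, time in seconds) copied from the strong-scaling table.
rows = [(16, 14.3, 0.075), (25, 16.4, 0.066), (36, 18.1, 0.059), (64, 22.7, 0.047)]


def strong_scaling(rows):
    """Return (gpus, speedup, efficiency) relative to the first row."""
    base_gpus, _, base_time = rows[0]
    report = []
    for gpus, _, seconds in rows:
        speedup = base_time / seconds   # how much faster than the baseline run
        ideal = gpus / base_gpus        # speedup under perfect scaling
        report.append((gpus, speedup, speedup / ideal))
    return report


report = strong_scaling(rows)
for gpus, speedup, efficiency in report:
    print(f"{gpus:3d} GPUs: speedup {speedup:.2f}x, efficiency {efficiency:.0%}")
```

Going from 16 to 64 GPUs gives roughly a 1.6x speedup on 4x the hardware, about 40% efficiency relative to the 16-GPU run, which is consistent with the talk's point that large clusters become communications bound.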
  28. Weak Scaling
      • Scaling the problem size with more GPUs

      GPUs   Scale      Vertices           Edges   Time (s)   GTEPS
         1      21     2,097,152      67,108,864     0.0254       3
         4      23     8,388,608     268,435,456     0.0429       6
        16      25    33,554,432   1,073,741,824     0.0715      15
        64      27   134,217,728   4,294,967,296     0.1478      29

      [Chart: Weak scaling, GTEPS by #GPUs]
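The GTEPS column follows from the edge counts and times in the weak-scaling table: traversed edges divided by seconds, divided by 10^9. A short sketch of that arithmetic, using only the numbers from the slide (the variable and function names are illustrative):

```python
# (GPUs, scale, directed edges, time in seconds) copied from the weak-scaling table.
rows = [
    (1, 21, 67_108_864, 0.0254),
    (4, 23, 268_435_456, 0.0429),
    (16, 25, 1_073_741_824, 0.0715),
    (64, 27, 4_294_967_296, 0.1478),
]


def gteps(edges, seconds):
    """Billions of traversed edges per second."""
    return edges / seconds / 1e9


computed = [gteps(edges, seconds) for _, _, edges, seconds in rows]
for (gpus, scale, _, _), rate in zip(rows, computed):
    print(f"{gpus:2d} GPUs, scale {scale}: {rate:.2f} GTEPS")
```

Rounded to whole numbers, the computed rates match the GTEPS column on the slide.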
  29. Highlights
      • For algorithms on large graphs
        – Memory is the bottleneck
          • CPUs quickly saturate the memory bus.
          • CPU cache thrashing limits scaling for graph traversal.
          • Continued performance gains for CPUs focus on reducing the number of visited edges to reduce bandwidth.
          • Hybrid CPU/GPU architectures offload either small degree vertices (reduce cache thrashing) or high degree vertices (if the algorithm is FLOPS bound on the CPU, e.g., betweenness centrality)
        – Many core is the future.
        – GPUs are primarily known for their FLOPS, but they have high memory bandwidth and can deliver effective parallelism on parallel graph problems (with sophisticated kernels).
      • Scaling to very large graphs on large compute clusters
        – Communications bound.
          • Communication must be constant for perfect scaling
        – Hybrid partitioning seeks to reduce the number and size of messages, and to optimize for asynchronous communications and degree-aware layouts for bottom-up search to reduce memory bandwidth.
  30. Bryan Thompson
      SYSTAP, LLC
      bryan@systap.com
      http://bigdata.com http://mapgraph.io
