Successfully reported this slideshow.
We use your LinkedIn profile and activity data to personalize ads and to show you more relevant ads. You can change your ad preferences anytime.
SYSTAP LLC | © 2006-2015 All Rights Reserved
MLConf @ NYC: March 27, 2015
SYSTAP LLC | © 2006-2015 All Rights Reserved
Graph Databases Grew at Over 500% in the Last Two Years
Popularity changes pe...
SYSTAP LLC | © 2006-2015 All Rights Reserved
The Amount of Graph Data is Exploding
Billion+ Edges
SYSTAP LLC | © 2006-2015 All Rights Reserved
Graphs are different. You need the
right paradigm and hardware to scale.
http...
SYSTAP LLC | © 2006-2015 All Rights Reserved
SYSTAP Solves the Graph Scaling Problem Using GPUs
●  100s of Times Faster th...
SYSTAP LLC | © 2006-2015 All Rights Reserved
Uncovering influence links in molecular knowledge networks to streamline perso...
SYSTAP LLC | © 2006-2015 All Rights Reserved
Graph is BIG and changing

(Trillion+ Edges)
SYSTAP LLC | © 2006-2015 All Rights Reserved
Graphs Enable People to Find Knowledge
A Bunch of Pages An Answer
http://BlazeGraph.com
http://MapGraph.io
Parallel&Breadth&First&Search&on&
GPU&Clusters&
SYSTAP™, LLC
© 2006-2015 All Righ...
http://BlazeGraph.com
http://MapGraph.io
Many?Core&is&Your&Future&
SYSTAP™, LLC
© 2006-2015 All Rights Reserved
2
3/10/15
http://BlazeGraph.com
http://MapGraph.io
MapGraph:&Extreme&performance&
•  GTEPS&is&Billions&(10^9)&of&Traversed&Edges&per...
http://BlazeGraph.com
http://MapGraph.io
SYSTAP’s&MapGraph&APIs&Roadmap&
•  Vertex&Centric&API&
–  Same&performance&as&CUD...
http://BlazeGraph.com
http://MapGraph.io
Breadth&First&Search&
•  Fundamental&building&block&for&graph&algorithms&
–  Incl...
http://BlazeGraph.com
http://MapGraph.io
GPUs&–&A&Game&Changer&for&Graph&AnalyDcs&
0
500
1000
1500
2000
2500
3000
3500
1 1...
http://BlazeGraph.com
http://MapGraph.io
2D&Par__oning&(aka&Vertex&Cuts)&
•  p&x&p&compute&grid&
–  Edges&in&rows/cols&
– ...
http://BlazeGraph.com
http://MapGraph.io
1
10
100
1,000
10,000
100,000
1,000,000
10,000,000
0 1 2 3 4 5 6
FrontierSize
Ite...
http://BlazeGraph.com
http://MapGraph.io
Distributed&BFS&Algorithm&
SYSTAP™, LLC
© 2006-2015 All Rights Reserved
7
3/10/15...
http://BlazeGraph.com
http://MapGraph.io
Distributed&BFS&Algorithm&
SYSTAP™, LLC
© 2006-2015 All Rights Reserved
8
3/10/15...
http://BlazeGraph.com
http://MapGraph.io
Global&Fron_er&Contrac_on&and&Communica_on&
SYSTAP™, LLC
© 2006-2015 All Rights R...
http://BlazeGraph.com
http://MapGraph.io
Global&Fron_er&Contrac_on&and&Communica_on&
SYSTAP™, LLC
© 2006-2015 All Rights R...
http://BlazeGraph.com
http://MapGraph.io
Global&Fron_er&Contrac_on&and&Communica_on&
SYSTAP™, LLC
© 2006-2014 All Rights R...
http://BlazeGraph.com
http://MapGraph.io
Global&Fron_er&Contrac_on&and&Communica_on&
SYSTAP™, LLC
© 2006-2014 All Rights R...
http://BlazeGraph.com
http://MapGraph.io
Global&Fron_er&Contrac_on&and&Communica_on&
SYSTAP™, LLC
© 2006-2014 All Rights R...
http://BlazeGraph.com
http://MapGraph.io
Global&Fron_er&Contrac_on&and&Communica_on&
SYSTAP™, LLC
© 2006-2014 All Rights R...
http://BlazeGraph.com
http://MapGraph.io
Global&Fron_er&Contrac_on&and&Communica_on&
SYSTAP™, LLC
© 2006-2014 All Rights R...
http://BlazeGraph.com
http://MapGraph.io
Weak&Scaling&
•  Scaling&the&problem&size&with&
more&GPUs&
–  Fixed&problem&size&...
http://BlazeGraph.com
http://MapGraph.io
Central&Itera_on&Costs&(Weak&Scaling)&
SYSTAP™, LLC
© 2006-2015 All Rights Reserv...
http://BlazeGraph.com
http://MapGraph.io
Costs&During&BFS&Traversal&
SYSTAP™, LLC
© 2006-2015 All Rights Reserved
18
3/10/...
http://BlazeGraph.com
http://MapGraph.io
Strong&Scaling&
•  Speedup&on&a&constant&problem&size&with&more&GPUs&
•  Problem&...
http://BlazeGraph.com
http://MapGraph.io
MapGraph&Beta&Customer?&
&
Contact&
Bradley&Bebee&
beebs@systap.com&
20
3/10/15
S...
Upcoming SlideShare
Loading in …5
×

Bryan Thompson, Chief Scientist and Founder at SYSTAP, LLC at MLconf NYC

1,541 views

Published on

Graph Traversal at 30 billion edges per second with NVIDIA GPUs: I will discuss current research on the MapGraph platform. MapGraph is a new and disruptive technology for ultra-fast processing of large graphs on commodity many-core hardware. On a single GPU you can analyze the bitcoin transaction graph in .35 seconds. With MapGraph on 64 NVIDIA K20 GPUs, you can traverse a scale-free graph of 4.3 billion directed edges in .13 seconds for a throughput of 32 Billion Traversed Edges Per Second (32 GTEPS). I will explain why GPUs are an interesting option for data intensive applications, how we map graphs onto many-core processors, and what the future looks like for the MapGraph platform.

MapGraph provides a familiar vertex-centric abstraction, but its GPU acceleration is 100s of times faster than main memory CPU-only technologies and up to 100,000 times faster than graph technologies based on MapReduce or key-value stores such as HBase, Titan, and Accumulo. Learn more athttp://MapGraph.io.

Published in: Technology
  • Be the first to comment

  • Be the first to like this

Bryan Thompson, Chief Scientist and Founder at SYSTAP, LLC at MLconf NYC

  1. 1. SYSTAP LLC | © 2006-2015 All Rights Reserved MLConf @ NYC: March 27, 2015
  2. 2. SYSTAP LLC | © 2006-2015 All Rights Reserved Graph Databases Grew at Over 500% in the Last Two Years Popularity changes per category – March 2015 PopularityChanges Graph Databases
  3. 3. SYSTAP LLC | © 2006-2015 All Rights Reserved The Amount of Graph Data is Exploding Billion+ Edges
  4. 4. SYSTAP LLC | © 2006-2015 All Rights Reserved Graphs are different. You need the right paradigm and hardware to scale. https:// datatake.files.wordpress.com/ Graph Cache Thrash The CPU just waits for graph data from main memory... TypeofCacheorMemory Access Latency Per Clock Cycle
  5. 5. SYSTAP LLC | © 2006-2015 All Rights Reserved SYSTAP Solves the Graph Scaling Problem Using GPUs ●  100s of Times Faster than CPU main memory-based systems ●  Up to 40X Cheaper ●  10,000X Faster than disk-based technologies ●  High Level API
  6. 6. SYSTAP LLC | © 2006-2015 All Rights Reserved Uncovering influence links in molecular knowledge networks to streamline personalized medicine | Shin, Dmitriy et al.Journal of Biomedical Informatics , Volume 52 , 394 - 405 Finding the Next Cure for Cancer is a Billion+ Edge Graph Challenge
  7. 7. SYSTAP LLC | © 2006-2015 All Rights Reserved Graph is BIG and changing
 (Trillion+ Edges)
  8. 8. SYSTAP LLC | © 2006-2015 All Rights Reserved Graphs Enable People to Find Knowledge A Bunch of Pages An Answer
  9. 9. http://BlazeGraph.com http://MapGraph.io Parallel&Breadth&First&Search&on& GPU&Clusters& SYSTAP™, LLC © 2006-2015 All Rights Reserved 1 3/15/15 This work was (partially) funded by the DARPA XDATA program under AFRL Contract #FA8750-13-C-0002. This material is based upon work supported by the Defense Advanced Research Projects Agency (DARPA) under Contract No. D14PC00029. The authors would like to thank Dr. White, NVIDIA, and the MVAPICH group at Ohio State University for their support of this work. Z. Fu, H.K. Dasari, B. Bebee, M. Berzins, B. Thompson. Parallel Breadth First Search on GPU Clusters. IEEE Big Data. Bethesda, MD. 2014. h6p://mapgraph.io&
  10. 10. http://BlazeGraph.com http://MapGraph.io Many?Core&is&Your&Future& SYSTAP™, LLC © 2006-2015 All Rights Reserved 2 3/10/15
  11. 11. http://BlazeGraph.com http://MapGraph.io MapGraph:&Extreme&performance& •  GTEPS&is&Billions&(10^9)&of&Traversed&Edges&per&Second.& –  This&is&the&basic&measure&of&performance&for&graph&traversal.& Configura)on* Cost* GTEPS* $/GTEPS* 4?Core&CPU& $4,000& 0.2& $5,333& 45Core*CPU*+**K20*GPU* $7,000* 3.0* $2,333* XMT?2&(rumored&price)& $1,800,000& 10.0& $188,000& 64*GPUs*(32*nodes*with*2x*K20*GPUs*per* node*and*InfiniBand*DDRx4*–*today)* $500,000* 30.0* $16,666* 16*GPUs*(2*nodes*with*8x*Pascal*GPUs*per* node*and*InfiniBand*DDRx4*–*Q1,*2016)* $125,000* >30.0* <$4,166* SYSTAP™, LLC © 2006-2015 All Rights Reserved 3 3/10/15
  12. 12. http://BlazeGraph.com http://MapGraph.io SYSTAP’s&MapGraph&APIs&Roadmap& •  Vertex&Centric&API& –  Same&performance&as&CUDA.& •  Schema?flexible&data?model& –  Property&Graph&/&RDF& –  Reduce&import&hassles& –  Opera_ons&over&merged&graphs& •  Graph&pa6ern&matching&language& •  DSL/Scala&=>&CUDA&code&genera_on& SYSTAP™, LLC © 2006-2015 All Rights Reserved Ease&of&Use& 4 3/10/15
  13. 13. http://BlazeGraph.com http://MapGraph.io Breadth&First&Search& •  Fundamental&building&block&for&graph&algorithms& –  Including&SPARQL&(JOINs)& •  In&level&synchronous&steps& –  Label&visited&verDces& •  For&this&graph& –  IteraDon&0& –  IteraDon&1& –  IteraDon&2& •  Hard&problem!& –  Basis&for&Graph&500&benchmark& SYSTAP™, LLC © 2006-2015 All Rights Reserved 10 3/10/15 0& 1& 1& 2& 1& 1& 2& 2& 2& 2& 1& 3& 2& 3& 2& 2& 3& 2&
  14. 14. http://BlazeGraph.com http://MapGraph.io GPUs&–&A&Game&Changer&for&Graph&AnalyDcs& 0 500 1000 1500 2000 2500 3000 3500 1 10 100 1000 10000 100000 MillionTraversedEdgesperSecond Average Traversal Depth NVIDIA Tesla C2050 Multicore per socket Sequential Breadth-First Search on Graphs 10x Speedup on GPUs 3 GTEPS •  Graphs&are&a&hard&problem& •  Non?locality& •  Data&dependent&parallelism& •  Memory&bus&and& communicaDon&bo6lenecks& •  GPUs&deliver&effecDve& parallelism& •  10x+&memory&bandwidth& •  Dynamic&parallelism& SYSTAP™, LLC © 2006-2015 All Rights Reserved 11 3/10/15 Graphic: Merrill, Garland, and Grimshaw, “GPU Sparse Graph Traversal”, GPU Technology Conference, 2012
  15. 15. http://BlazeGraph.com http://MapGraph.io 2D&Par__oning&(aka&Vertex&Cuts)& •  p&x&p&compute&grid& –  Edges&in&rows/cols& –  Minimize&messages& •  log(p)&(versus&p2)& –  One&par__on&per&GPU& •  Batch&parallel&opera_on& –  Grid&row:&out?edges& –  Grid&column:&in?edges& •  Representa_ve&fron_ers& SYSTAP™, LLC © 2006-2015 All Rights Reserved 5 3/10/15 G11 G22 G33 G44 C1 C2 C3 C4 R1 In2 t Out1 t Out2 t Out3 t Out4 t In1 t In3 t In4 t R2 R3 R4 Source&Ver_ces& Target&Ver_ces& out-edges in-edges BFS& PR&Query& •  Parallelism&–&work&must&be&distributed& and&balanced.& •  Memory&bandwidth&–&memory,&not&disk,& is&the&bo6leneck&
  16. 16. http://BlazeGraph.com http://MapGraph.io 1 10 100 1,000 10,000 100,000 1,000,000 10,000,000 0 1 2 3 4 5 6 FrontierSize Iteration Scale&25&Traversal& •  Work&spans&mul_ple&orders&of&magnitude.& SYSTAP™, LLC © 2006-2015 All Rights Reserved 6 3/10/15
  17. 17. http://BlazeGraph.com http://MapGraph.io Distributed&BFS&Algorithm& SYSTAP™, LLC © 2006-2015 All Rights Reserved 7 3/10/15 1: procedure BFS(Root, Predecessor) 2: In0 ij LocalVertex(Root) 3: for t 0 do 4: Expand(Int i, Outt ij) 5: LocalFrontiert Count(Outt ij) 6: GlobalFrontiert Reduce(LocalFrontiert) 7: if GlobalFrontiert > 0 then 8: Contract(Outt ij, Outt j, Int+1 i , Assignij) 9: UpdateLevels(Outt j, t, level) 10: else 11: UpdatePreds(Assignij,Predsij, level) 12: break 13: end if 14: t++ 15: end for 16: end procedure Figure 3. Distributed Breadth First Search algorithm 1: procedure UPDATEPRED 2: for all v 2 Assignedij 3: Pred[v] 1 4: for i ColOff[v] 5: if level[v] == 6: Pred[v] 7: end if 8: end for 9: end for 10: end procedure Figure 7. Pre for finding the Prefixt ij, whic performing Bitwise-OR Exclu across each row j as shown are several tree-based algori Since the bitmap union opera ! Data parallel 1-hop expand on all GPUs ! Termination check ! Local frontier size ! Global frontier size ! Compute predecessors from local In / Out levels (no communication)! Done ! Starting vertex ! Global Frontier contraction (“wave”) ! Record level of vertices in the In / Out frontier ! Next iteration
  18. 18. http://BlazeGraph.com http://MapGraph.io Distributed&BFS&Algorithm& SYSTAP™, LLC © 2006-2015 All Rights Reserved 8 3/10/15 1: procedure BFS(Root, Predecessor) 2: In0 ij LocalVertex(Root) 3: for t 0 do 4: Expand(Int i, Outt ij) 5: LocalFrontiert Count(Outt ij) 6: GlobalFrontiert Reduce(LocalFrontiert) 7: if GlobalFrontiert > 0 then 8: Contract(Outt ij, Outt j, Int+1 i , Assignij) 9: UpdateLevels(Outt j, t, level) 10: else 11: UpdatePreds(Assignij,Predsij, level) 12: break 13: end if 14: t++ 15: end for 16: end procedure Figure 3. Distributed Breadth First Search algorithm 1: procedure UPDATEPRED 2: for all v 2 Assignedij 3: Pred[v] 1 4: for i ColOff[v] 5: if level[v] == 6: Pred[v] 7: end if 8: end for 9: end for 10: end procedure Figure 7. Pre for finding the Prefixt ij, whic performing Bitwise-OR Exclu across each row j as shown are several tree-based algori Since the bitmap union opera •  Key&differences& •  log(p)&parallel&scan&(vs&sequen_al&wave)& •  GPU?local&computa_on&of&predecessors& •  1&par__on&per&GPU& •  GPUDirect&(vs&RDMA)& ! Data parallel 1-hop expand on all GPUs ! Termination check ! Local frontier size ! Global frontier size ! Compute predecessors from local In / Out levels (no communication)! Done ! Starting vertex ! Global Frontier contraction (“wave”) ! Record level of vertices in the In / Out frontier ! Next iteration
  19. 19. http://BlazeGraph.com http://MapGraph.io Global&Fron_er&Contrac_on&and&Communica_on& SYSTAP™, LLC © 2006-2015 All Rights Reserved 9 3/10/15 10: Outij convert(Lout) 11: end procedure Figure 4. Expand algorithm 1: procedure CONTRACT(Outt ij, Outt j, Int+1 i , Assignij) 2: Prefixt ij ExclusiveScanj(Outt ij) 3: Assignedij Assignedij [ (Outt ij Prefixt ij) 4: if i = p then 5: Outt j Outt ij [ Prefixt ij 6: end if 7: Broadcast(Outt j, p , ROW) 8: if i = j then 9: Int+1 i Outt j 10: end if 11: Broadcast(Int+1 i , i, COL) 12: end procedure Figure 5. Global Frontier contraction and communication algorithm the row. This is done to update t vertices so that they do not prod those vertices in the future iteratio updating the levels of the vertice Int+1 j . Since both Outt j and Int+1 i in the diagonal (i = j), in line 9, the diagonal GPUs and then broa Ci using MPI_Broadcast in line setup along every row and colum facilitate these parallel scan and b During each BFS iteration, we of the algorithm by checking if zero as in line 6 of Figure 3. We frontier using a MPI_Allreduce op edge expansion and before the g phase. However, an unexplored o the termination check with the g and communication phase in orde ! Distributed prefix sum. Prefix is vertices discovered by your left neighbors (log p) ! Vertices first discovered by this GPU. ! Right most column has global frontier for the row ! Broadcast frontier over row. ! Broadcast frontier over column.
  20. 20. http://BlazeGraph.com http://MapGraph.io Global&Fron_er&Contrac_on&and&Communica_on& SYSTAP™, LLC © 2006-2015 All Rights Reserved 10 3/10/15 1: procedure CONTRACT(Outt ij, Outt j, Int+1 i , Assignij) 2: Prefixt ij ExclusiveScanj(Outt ij) 3: Assignedij Assignedij [ (Outt ij Prefixt ij) 4: if i = p then 5: Outt j Outt ij [ Prefixt ij 6: end if 7: Broadcast(Outt j, p , ROW) 8: if i = j then 9: Int+1 i Outt j 10: end if 11: Broadcast(Int+1 i , i, COL) 12: end procedure !&Distributed&prefix&sum.&Prefix&is&ver_ces& discovered&by&your&lel&neighbors&(log&p)& ! Vertices first discovered by this GPU. ! Right most column has global frontier for the row ! Broadcast frontier over row. ! Broadcast frontier over column. G11 G22 G33 G44 C1 C2 C3 C4 R1 In2 t Out1 t Out2 t Out3 t Out4 t In1 t In3 t In4 t R2 R3 R4 Source&Ver_ces& Target&Ver_ces&
  21. 21. http://BlazeGraph.com http://MapGraph.io Global&Fron_er&Contrac_on&and&Communica_on& SYSTAP™, LLC © 2006-2014 All Rights Reserved 11 3/10/15 1: procedure CONTRACT(Outt ij, Outt j, Int+1 i , Assignij) 2: Prefixt ij ExclusiveScanj(Outt ij) 3: Assignedij Assignedij [ (Outt ij Prefixt ij) 4: if i = p then 5: Outt j Outt ij [ Prefixt ij 6: end if 7: Broadcast(Outt j, p , ROW) 8: if i = j then 9: Int+1 i Outt j 10: end if 11: Broadcast(Int+1 i , i, COL) 12: end procedure •  We&use&a&parallel&scan&that&minimizes&communica_on&steps&(vs&work).& •  This&improves&the&overall&scaling&efficiency&by&30%.& !*Distributed*prefix*sum.*Prefix*is*ver)ces* discovered*by*your*leZ*neighbors*(log*p)* ! Vertices first discovered by this GPU. ! Right most column has global frontier for the row ! Broadcast frontier over row. ! Broadcast frontier over column. G11 G22 G33 G44 C1 C2 C3 C4 R1 In2 t Out1 t Out2 t Out3 t Out4 t In1 t In3 t In4 t R2 R3 R4 Source&Ver_ces& Target&Ver_ces&
  22. 22. http://BlazeGraph.com http://MapGraph.io Global&Fron_er&Contrac_on&and&Communica_on& SYSTAP™, LLC © 2006-2014 All Rights Reserved 12 3/15/15 1: procedure CONTRACT(Outt ij, Outt j, Int+1 i , Assignij) 2: Prefixt ij ExclusiveScanj(Outt ij) 3: Assignedij Assignedij [ (Outt ij Prefixt ij) 4: if i = p then 5: Outt j Outt ij [ Prefixt ij 6: end if 7: Broadcast(Outt j, p , ROW) 8: if i = j then 9: Int+1 i Outt j 10: end if 11: Broadcast(Int+1 i , i, COL) 12: end procedure •  We&use&a&parallel&scan&that&minimizes&communica_on&steps&(vs&work).& •  This&improves&the&overall&scaling&efficiency&by&30%.& ! Vertices first discovered by this GPU. ! Right most column has global frontier for the row ! Broadcast frontier over row. ! Broadcast frontier over column. G11 G22 G33 G44 C1 C2 C3 C4 R1 In2 t Out1 t Out2 t Out3 t Out4 t In1 t In3 t In4 t R2 R3 R4 Source&Ver_ces& Target&Ver_ces& !*Distributed*prefix*sum.*Prefix*is*ver)ces* discovered*by*your*leZ*neighbors*(log*p)*
  23. 23. http://BlazeGraph.com http://MapGraph.io Global&Fron_er&Contrac_on&and&Communica_on& SYSTAP™, LLC © 2006-2014 All Rights Reserved 13 3/15/15 1: procedure CONTRACT(Outt ij, Outt j, Int+1 i , Assignij) 2: Prefixt ij ExclusiveScanj(Outt ij) 3: Assignedij Assignedij [ (Outt ij Prefixt ij) 4: if i = p then 5: Outt j Outt ij [ Prefixt ij 6: end if 7: Broadcast(Outt j, p , ROW) 8: if i = j then 9: Int+1 i Outt j 10: end if 11: Broadcast(Int+1 i , i, COL) 12: end procedure •  We&use&a&parallel&scan&that&minimizes&communica_on&steps&(vs&work).& •  This&improves&the&overall&scaling&efficiency&by&30%.& ! Vertices first discovered by this GPU. ! Right most column has global frontier for the row ! Broadcast frontier over row. ! Broadcast frontier over column. G11 G22 G33 G44 C1 C2 C3 C4 R1 In2 t Out1 t Out2 t Out3 t Out4 t In1 t In3 t In4 t R2 R3 R4 Source&Ver_ces& Target&Ver_ces& !&Distributed&prefix&sum.&Prefix&is&ver_ces& discovered&by&your&lel&neighbors&(log&p)&
  24. 24. http://BlazeGraph.com http://MapGraph.io Global&Fron_er&Contrac_on&and&Communica_on& SYSTAP™, LLC © 2006-2014 All Rights Reserved 14 3/15/15 1: procedure CONTRACT(Outt ij, Outt j, Int+1 i , Assignij) 2: Prefixt ij ExclusiveScanj(Outt ij) 3: Assignedij Assignedij [ (Outt ij Prefixt ij) 4: if i = p then 5: Outt j Outt ij [ Prefixt ij 6: end if 7: Broadcast(Outt j, p , ROW) 8: if i = j then 9: Int+1 i Outt j 10: end if 11: Broadcast(Int+1 i , i, COL) 12: end procedure ! Distributed prefix sum. Prefix is vertices discovered by your left neighbors (log p) ! Vertices first discovered by this GPU. ! Right most column has global frontier for the row ! Broadcast frontier over row. ! Broadcast frontier over column. G11 G22 G33 G44 C1 C2 C3 C4 R1 In2 t Out1 t Out2 t Out3 t Out4 t In1 t In3 t In4 t R2 R3 R4 Source&Ver_ces& Target&Ver_ces&
  25. 25. http://BlazeGraph.com http://MapGraph.io Global&Fron_er&Contrac_on&and&Communica_on& SYSTAP™, LLC © 2006-2014 All Rights Reserved 15 3/15/15 1: procedure CONTRACT(Outt ij, Outt j, Int+1 i , Assignij) 2: Prefixt ij ExclusiveScanj(Outt ij) 3: Assignedij Assignedij [ (Outt ij Prefixt ij) 4: if i = p then 5: Outt j Outt ij [ Prefixt ij 6: end if 7: Broadcast(Outt j, p , ROW) 8: if i = j then 9: Int+1 i Outt j 10: end if 11: Broadcast(Int+1 i , i, COL) 12: end procedure ! Distributed prefix sum. Prefix is vertices discovered by your left neighbors (log p) ! Vertices first discovered by this GPU. ! Right most column has global frontier for the row ! Broadcast frontier over row. ! Broadcast frontier over column. G11 G22 G33 G44 C1 C2 C3 C4 R1 In2 t Out1 t Out2 t Out3 t Out4 t In1 t In3 t In4 t R2 R3 R4 Source&Ver_ces& Target&Ver_ces& •  All&GPUs&now&have&the&new&fron_er.&
  26. 26. http://BlazeGraph.com http://MapGraph.io Weak&Scaling& •  Scaling&the&problem&size&with& more&GPUs& –  Fixed&problem&size&per&GPU.& •  Maximum&scale&27&(4.3B&edges)& –  64&K20&GPUs&=>&.147s&=>&29&GTEPS& –  64&K40&GPUs&=>&.135s&=>&32&GTEPS& •  K40&has&faster&memory&bus.& 0 5 10 15 20 25 30 35 1 4 16 64 GTEPS #GPUs GPUs& Scale& Ver)ces& Edges& Time*(s)& GTEPS& 1& 21& 2,097,152& 67,108,864& 0.0254* 2.5& 4& 23& 8,388,608& 268,435,456& 0.0429* 6.3& 16& 25& 33,554,432& 1,073,741,824& 0.0715* 15.0& 64& 27& 134,217,728& 4,294,967,296& 0.1478* 29.1& SYSTAP™, LLC © 2006-2015 All Rights Reserved 16 3/10/15 Scaling the problem size with more GPUs.
  27. 27. http://BlazeGraph.com http://MapGraph.io Central&Itera_on&Costs&(Weak&Scaling)& SYSTAP™, LLC © 2006-2015 All Rights Reserved 17 3/10/15 0.0000 0.0020 0.0040 0.0060 0.0080 0.0100 0.0120 4 16 64 Time(Seconds) #GPUs Computation Communication Termination •  Communica_on& costs&are&not& constant.& •  2D&design&implies& cost&grows&as& 2*log(2p)/log(p)* •  How&to&scale?& –  Overlapping& –  Compression& •  Graph& •  Message& –  Hybrid&par__oning& –  Heterogeneous& compu_ng&
  28. 28. http://BlazeGraph.com http://MapGraph.io Costs&During&BFS&Traversal& SYSTAP™, LLC © 2006-2015 All Rights Reserved 18 3/10/15 •  Chart&shows&the&different&costs&for&each&GPU&in&each&itera_on&(64&GPUs).& •  Wave&_me&is&essen_ally&constant,&as&expected.& •  Compute&_me&peaks&during&the&central&itera_ons.& •  Costs&are&reasonably&well&balanced&across&all&GPUs&aler&the&2nd&itera_on.&
  29. 29. http://BlazeGraph.com http://MapGraph.io Strong&Scaling& •  Speedup&on&a&constant&problem&size&with&more&GPUs& •  Problem&scale&25& –  2^25&ver_ces&(33,554,432)& –  2^26&directed&edges&(1,073,741,824)& •  Strong&scaling&efficiency&of&48%& –  Versus&44%&for&BG/Q& GPUs* GTEPS* Time*(s)* 16* 15.2* 0.071* 25* 18.2* 0.059* 36* 20.5* 0.053* 49& 21.8& 0.049* 64* 22.7* 0.047* SYSTAP™, LLC © 2006-2015 All Rights Reserved 19 3/10/15 Speedup on a constant problem size with more GPUs.
  30. 30. http://BlazeGraph.com http://MapGraph.io MapGraph&Beta&Customer?& & Contact& Bradley&Bebee& beebs@systap.com& 20 3/10/15 SYSTAP™, LLC © 2006-2015 All Rights Reserved

×