SlideShare a Scribd company logo
SYSTAP LLC | © 2006-2015 All Rights Reserved
MLConf @ NYC: March 27, 2015
SYSTAP LLC | © 2006-2015 All Rights Reserved
Graph Databases Grew at Over 500% in the Last Two Years
Popularity changes per category – March 2015
PopularityChanges
Graph Databases
SYSTAP LLC | © 2006-2015 All Rights Reserved
The Amount of Graph Data is Exploding
Billion+ Edges
SYSTAP LLC | © 2006-2015 All Rights Reserved
Graphs are different. You need the
right paradigm and hardware to scale.
https://
datatake.files.wordpress.com/
Graph Cache Thrash
The CPU just waits for
graph data from main
memory...
TypeofCacheorMemory
Access Latency Per Clock Cycle
SYSTAP LLC | © 2006-2015 All Rights Reserved
SYSTAP Solves the Graph Scaling Problem Using GPUs
●  100s of Times Faster than CPU
main memory-based systems
●  Up to 40X Cheaper
●  10,000X Faster than disk-based
technologies
●  High Level API
SYSTAP LLC | © 2006-2015 All Rights Reserved
Uncovering influence links in molecular knowledge networks to streamline personalized medicine | Shin, Dmitriy et al.Journal of
Biomedical Informatics , Volume 52 , 394 - 405
Finding the Next Cure for Cancer is a
Billion+ Edge Graph Challenge
SYSTAP LLC | © 2006-2015 All Rights Reserved
Graph is BIG and changing

(Trillion+ Edges)
SYSTAP LLC | © 2006-2015 All Rights Reserved
Graphs Enable People to Find Knowledge
A Bunch of Pages An Answer
http://BlazeGraph.com
http://MapGraph.io
Parallel&Breadth&First&Search&on&
GPU&Clusters&
SYSTAP™, LLC
© 2006-2015 All Rights Reserved
1
3/15/15
This work was (partially) funded by the DARPA XDATA program under AFRL Contract #FA8750-13-C-0002. This material is
based upon work supported by the Defense Advanced Research Projects Agency (DARPA) under Contract No. D14PC00029.
The authors would like to thank Dr. White, NVIDIA, and the MVAPICH group at Ohio State University for their support of this
work.
Z. Fu, H.K. Dasari, B. Bebee, M. Berzins, B. Thompson. Parallel Breadth First Search on GPU Clusters. IEEE Big Data.
Bethesda, MD. 2014.
h6p://mapgraph.io&
http://BlazeGraph.com
http://MapGraph.io
Many?Core&is&Your&Future&
SYSTAP™, LLC
© 2006-2015 All Rights Reserved
2
3/10/15
http://BlazeGraph.com
http://MapGraph.io
MapGraph:&Extreme&performance&
•  GTEPS&is&Billions&(10^9)&of&Traversed&Edges&per&Second.&
–  This&is&the&basic&measure&of&performance&for&graph&traversal.&
Configura)on* Cost* GTEPS* $/GTEPS*
4?Core&CPU& $4,000& 0.2& $5,333&
45Core*CPU*+**K20*GPU* $7,000* 3.0* $2,333*
XMT?2&(rumored&price)& $1,800,000& 10.0& $188,000&
64*GPUs*(32*nodes*with*2x*K20*GPUs*per*
node*and*InfiniBand*DDRx4*–*today)*
$500,000* 30.0* $16,666*
16*GPUs*(2*nodes*with*8x*Pascal*GPUs*per*
node*and*InfiniBand*DDRx4*–*Q1,*2016)*
$125,000* >30.0* <$4,166*
SYSTAP™, LLC
© 2006-2015 All Rights Reserved
3
3/10/15
http://BlazeGraph.com
http://MapGraph.io
SYSTAP’s&MapGraph&APIs&Roadmap&
•  Vertex&Centric&API&
–  Same&performance&as&CUDA.&
•  Schema?flexible&data?model&
–  Property&Graph&/&RDF&
–  Reduce&import&hassles&
–  Opera_ons&over&merged&graphs&
•  Graph&pa6ern&matching&language&
•  DSL/Scala&=>&CUDA&code&genera_on&
SYSTAP™, LLC
© 2006-2015 All Rights Reserved
Ease&of&Use&
4
3/10/15
http://BlazeGraph.com
http://MapGraph.io
Breadth&First&Search&
•  Fundamental&building&block&for&graph&algorithms&
–  Including&SPARQL&(JOINs)&
•  In&level&synchronous&steps&
–  Label&visited&verDces&
•  For&this&graph&
–  IteraDon&0&
–  IteraDon&1&
–  IteraDon&2&
•  Hard&problem!&
–  Basis&for&Graph&500&benchmark&
SYSTAP™, LLC
© 2006-2015 All Rights Reserved
10
3/10/15
0&
1&
1&
2&
1&
1&
2&
2&
2&
2&
1&
3&
2&
3&
2&
2& 3&
2&
http://BlazeGraph.com
http://MapGraph.io
GPUs&–&A&Game&Changer&for&Graph&AnalyDcs&
0
500
1000
1500
2000
2500
3000
3500
1 10 100 1000 10000 100000
MillionTraversedEdgesperSecond
Average Traversal Depth
NVIDIA Tesla C2050
Multicore per socket
Sequential
Breadth-First Search on Graphs
10x Speedup on GPUs
3 GTEPS
•  Graphs&are&a&hard&problem&
•  Non?locality&
•  Data&dependent&parallelism&
•  Memory&bus&and&
communicaDon&bo6lenecks&
•  GPUs&deliver&effecDve&
parallelism&
•  10x+&memory&bandwidth&
•  Dynamic&parallelism&
SYSTAP™, LLC
© 2006-2015 All Rights Reserved
11
3/10/15
Graphic: Merrill, Garland, and Grimshaw, “GPU Sparse Graph
Traversal”, GPU Technology Conference, 2012
http://BlazeGraph.com
http://MapGraph.io
2D&Par__oning&(aka&Vertex&Cuts)&
•  p&x&p&compute&grid&
–  Edges&in&rows/cols&
–  Minimize&messages&
•  log(p)&(versus&p2)&
–  One&par__on&per&GPU&
•  Batch&parallel&opera_on&
–  Grid&row:&out?edges&
–  Grid&column:&in?edges&
•  Representa_ve&fron_ers&
SYSTAP™, LLC
© 2006-2015 All Rights Reserved
5
3/10/15
G11
G22
G33
G44
C1
C2
C3
C4
R1
In2
t
Out1
t
Out2
t
Out3
t
Out4
t
In1
t
In3
t
In4
t
R2
R3
R4
Source&Ver_ces&
Target&Ver_ces&
out-edges
in-edges
BFS& PR&Query&
•  Parallelism&–&work&must&be&distributed&
and&balanced.&
•  Memory&bandwidth&–&memory,&not&disk,&
is&the&bo6leneck&
http://BlazeGraph.com
http://MapGraph.io
1
10
100
1,000
10,000
100,000
1,000,000
10,000,000
0 1 2 3 4 5 6
FrontierSize
Iteration
Scale&25&Traversal&
•  Work&spans&mul_ple&orders&of&magnitude.&
SYSTAP™, LLC
© 2006-2015 All Rights Reserved
6
3/10/15
http://BlazeGraph.com
http://MapGraph.io
Distributed&BFS&Algorithm&
SYSTAP™, LLC
© 2006-2015 All Rights Reserved
7
3/10/15
1: procedure BFS(Root, Predecessor)
2: In0
ij LocalVertex(Root)
3: for t 0 do
4: Expand(Int
i, Outt
ij)
5: LocalFrontiert Count(Outt
ij)
6: GlobalFrontiert Reduce(LocalFrontiert)
7: if GlobalFrontiert > 0 then
8: Contract(Outt
ij, Outt
j, Int+1
i , Assignij)
9: UpdateLevels(Outt
j, t, level)
10: else
11: UpdatePreds(Assignij,Predsij, level)
12: break
13: end if
14: t++
15: end for
16: end procedure
Figure 3. Distributed Breadth First Search algorithm
1: procedure UPDATEPRED
2: for all v 2 Assignedij
3: Pred[v] 1
4: for i ColOff[v]
5: if level[v] ==
6: Pred[v]
7: end if
8: end for
9: end for
10: end procedure
Figure 7. Pre
for finding the Prefixt
ij, whic
performing Bitwise-OR Exclu
across each row j as shown
are several tree-based algori
Since the bitmap union opera
! Data parallel 1-hop expand on all GPUs
! Termination check
! Local frontier size
! Global frontier size
! Compute predecessors from local
In / Out levels (no communication)! Done
! Starting vertex
! Global Frontier contraction (“wave”)
! Record level of vertices in the In /
Out frontier
! Next iteration
http://BlazeGraph.com
http://MapGraph.io
Distributed&BFS&Algorithm&
SYSTAP™, LLC
© 2006-2015 All Rights Reserved
8
3/10/15
1: procedure BFS(Root, Predecessor)
2: In0
ij LocalVertex(Root)
3: for t 0 do
4: Expand(Int
i, Outt
ij)
5: LocalFrontiert Count(Outt
ij)
6: GlobalFrontiert Reduce(LocalFrontiert)
7: if GlobalFrontiert > 0 then
8: Contract(Outt
ij, Outt
j, Int+1
i , Assignij)
9: UpdateLevels(Outt
j, t, level)
10: else
11: UpdatePreds(Assignij,Predsij, level)
12: break
13: end if
14: t++
15: end for
16: end procedure
Figure 3. Distributed Breadth First Search algorithm
1: procedure UPDATEPRED
2: for all v 2 Assignedij
3: Pred[v] 1
4: for i ColOff[v]
5: if level[v] ==
6: Pred[v]
7: end if
8: end for
9: end for
10: end procedure
Figure 7. Pre
for finding the Prefixt
ij, whic
performing Bitwise-OR Exclu
across each row j as shown
are several tree-based algori
Since the bitmap union opera
•  Key&differences&
•  log(p)&parallel&scan&(vs&sequen_al&wave)&
•  GPU?local&computa_on&of&predecessors&
•  1&par__on&per&GPU&
•  GPUDirect&(vs&RDMA)&
! Data parallel 1-hop expand on all GPUs
! Termination check
! Local frontier size
! Global frontier size
! Compute predecessors from local
In / Out levels (no communication)! Done
! Starting vertex
! Global Frontier contraction (“wave”)
! Record level of vertices in the In /
Out frontier
! Next iteration
http://BlazeGraph.com
http://MapGraph.io
Global&Fron_er&Contrac_on&and&Communica_on&
SYSTAP™, LLC
© 2006-2015 All Rights Reserved
9
3/10/15
10: Outij convert(Lout)
11: end procedure
Figure 4. Expand algorithm
1: procedure CONTRACT(Outt
ij, Outt
j, Int+1
i , Assignij)
2: Prefixt
ij ExclusiveScanj(Outt
ij)
3: Assignedij Assignedij [ (Outt
ij Prefixt
ij)
4: if i = p then
5: Outt
j Outt
ij [ Prefixt
ij
6: end if
7: Broadcast(Outt
j, p , ROW)
8: if i = j then
9: Int+1
i Outt
j
10: end if
11: Broadcast(Int+1
i , i, COL)
12: end procedure
Figure 5. Global Frontier contraction and communication algorithm
the row. This is done to update t
vertices so that they do not prod
those vertices in the future iteratio
updating the levels of the vertice
Int+1
j . Since both Outt
j and Int+1
i
in the diagonal (i = j), in line 9,
the diagonal GPUs and then broa
Ci using MPI_Broadcast in line
setup along every row and colum
facilitate these parallel scan and b
During each BFS iteration, we
of the algorithm by checking if
zero as in line 6 of Figure 3. We
frontier using a MPI_Allreduce op
edge expansion and before the g
phase. However, an unexplored o
the termination check with the g
and communication phase in orde
! Distributed prefix sum. Prefix is vertices
discovered by your left neighbors (log p)
! Vertices first discovered by this GPU.
! Right most column has global frontier for
the row
! Broadcast frontier over row.
! Broadcast frontier over column.
http://BlazeGraph.com
http://MapGraph.io
Global&Fron_er&Contrac_on&and&Communica_on&
SYSTAP™, LLC
© 2006-2015 All Rights Reserved
10
3/10/15
1: procedure CONTRACT(Outt
ij, Outt
j, Int+1
i , Assignij)
2: Prefixt
ij ExclusiveScanj(Outt
ij)
3: Assignedij Assignedij [ (Outt
ij Prefixt
ij)
4: if i = p then
5: Outt
j Outt
ij [ Prefixt
ij
6: end if
7: Broadcast(Outt
j, p , ROW)
8: if i = j then
9: Int+1
i Outt
j
10: end if
11: Broadcast(Int+1
i , i, COL)
12: end procedure
!&Distributed&prefix&sum.&Prefix&is&ver_ces&
discovered&by&your&lel&neighbors&(log&p)&
! Vertices first discovered by this GPU.
! Right most column has global frontier for
the row
! Broadcast frontier over row.
! Broadcast frontier over column.
G11
G22
G33
G44
C1
C2
C3
C4
R1
In2
t
Out1
t
Out2
t
Out3
t
Out4
t
In1
t
In3
t
In4
t
R2
R3
R4
Source&Ver_ces&
Target&Ver_ces&
http://BlazeGraph.com
http://MapGraph.io
Global&Fron_er&Contrac_on&and&Communica_on&
SYSTAP™, LLC
© 2006-2014 All Rights Reserved
11
3/10/15
1: procedure CONTRACT(Outt
ij, Outt
j, Int+1
i , Assignij)
2: Prefixt
ij ExclusiveScanj(Outt
ij)
3: Assignedij Assignedij [ (Outt
ij Prefixt
ij)
4: if i = p then
5: Outt
j Outt
ij [ Prefixt
ij
6: end if
7: Broadcast(Outt
j, p , ROW)
8: if i = j then
9: Int+1
i Outt
j
10: end if
11: Broadcast(Int+1
i , i, COL)
12: end procedure
•  We&use&a&parallel&scan&that&minimizes&communica_on&steps&(vs&work).&
•  This&improves&the&overall&scaling&efficiency&by&30%.&
!*Distributed*prefix*sum.*Prefix*is*ver)ces*
discovered*by*your*leZ*neighbors*(log*p)*
! Vertices first discovered by this GPU.
! Right most column has global frontier for
the row
! Broadcast frontier over row.
! Broadcast frontier over column.
G11
G22
G33
G44
C1
C2
C3
C4
R1
In2
t
Out1
t
Out2
t
Out3
t
Out4
t
In1
t
In3
t
In4
t
R2
R3
R4
Source&Ver_ces&
Target&Ver_ces&
http://BlazeGraph.com
http://MapGraph.io
Global&Fron_er&Contrac_on&and&Communica_on&
SYSTAP™, LLC
© 2006-2014 All Rights Reserved
12
3/15/15
1: procedure CONTRACT(Outt
ij, Outt
j, Int+1
i , Assignij)
2: Prefixt
ij ExclusiveScanj(Outt
ij)
3: Assignedij Assignedij [ (Outt
ij Prefixt
ij)
4: if i = p then
5: Outt
j Outt
ij [ Prefixt
ij
6: end if
7: Broadcast(Outt
j, p , ROW)
8: if i = j then
9: Int+1
i Outt
j
10: end if
11: Broadcast(Int+1
i , i, COL)
12: end procedure
•  We&use&a&parallel&scan&that&minimizes&communica_on&steps&(vs&work).&
•  This&improves&the&overall&scaling&efficiency&by&30%.&
! Vertices first discovered by this GPU.
! Right most column has global frontier for
the row
! Broadcast frontier over row.
! Broadcast frontier over column.
G11
G22
G33
G44
C1
C2
C3
C4
R1
In2
t
Out1
t
Out2
t
Out3
t
Out4
t
In1
t
In3
t
In4
t
R2
R3
R4
Source&Ver_ces&
Target&Ver_ces&
!*Distributed*prefix*sum.*Prefix*is*ver)ces*
discovered*by*your*leZ*neighbors*(log*p)*
http://BlazeGraph.com
http://MapGraph.io
Global&Fron_er&Contrac_on&and&Communica_on&
SYSTAP™, LLC
© 2006-2014 All Rights Reserved
13
3/15/15
1: procedure CONTRACT(Outt
ij, Outt
j, Int+1
i , Assignij)
2: Prefixt
ij ExclusiveScanj(Outt
ij)
3: Assignedij Assignedij [ (Outt
ij Prefixt
ij)
4: if i = p then
5: Outt
j Outt
ij [ Prefixt
ij
6: end if
7: Broadcast(Outt
j, p , ROW)
8: if i = j then
9: Int+1
i Outt
j
10: end if
11: Broadcast(Int+1
i , i, COL)
12: end procedure
•  We&use&a&parallel&scan&that&minimizes&communica_on&steps&(vs&work).&
•  This&improves&the&overall&scaling&efficiency&by&30%.&
! Vertices first discovered by this GPU.
! Right most column has global frontier
for the row
! Broadcast frontier over row.
! Broadcast frontier over column.
G11
G22
G33
G44
C1
C2
C3
C4
R1
In2
t
Out1
t
Out2
t
Out3
t
Out4
t
In1
t
In3
t
In4
t
R2
R3
R4
Source&Ver_ces&
Target&Ver_ces&
!&Distributed&prefix&sum.&Prefix&is&ver_ces&
discovered&by&your&lel&neighbors&(log&p)&
http://BlazeGraph.com
http://MapGraph.io
Global&Fron_er&Contrac_on&and&Communica_on&
SYSTAP™, LLC
© 2006-2014 All Rights Reserved
14
3/15/15
1: procedure CONTRACT(Outt
ij, Outt
j, Int+1
i , Assignij)
2: Prefixt
ij ExclusiveScanj(Outt
ij)
3: Assignedij Assignedij [ (Outt
ij Prefixt
ij)
4: if i = p then
5: Outt
j Outt
ij [ Prefixt
ij
6: end if
7: Broadcast(Outt
j, p , ROW)
8: if i = j then
9: Int+1
i Outt
j
10: end if
11: Broadcast(Int+1
i , i, COL)
12: end procedure
! Distributed prefix sum. Prefix is vertices
discovered by your left neighbors (log p)
! Vertices first discovered by this GPU.
! Right most column has global frontier for
the row
! Broadcast frontier over row.
! Broadcast frontier over column.
G11
G22
G33
G44
C1
C2
C3
C4
R1
In2
t
Out1
t
Out2
t
Out3
t
Out4
t
In1
t
In3
t
In4
t
R2
R3
R4
Source&Ver_ces&
Target&Ver_ces&
http://BlazeGraph.com
http://MapGraph.io
Global&Fron_er&Contrac_on&and&Communica_on&
SYSTAP™, LLC
© 2006-2014 All Rights Reserved
15
3/15/15
1: procedure CONTRACT(Outt
ij, Outt
j, Int+1
i , Assignij)
2: Prefixt
ij ExclusiveScanj(Outt
ij)
3: Assignedij Assignedij [ (Outt
ij Prefixt
ij)
4: if i = p then
5: Outt
j Outt
ij [ Prefixt
ij
6: end if
7: Broadcast(Outt
j, p , ROW)
8: if i = j then
9: Int+1
i Outt
j
10: end if
11: Broadcast(Int+1
i , i, COL)
12: end procedure
! Distributed prefix sum. Prefix is vertices
discovered by your left neighbors (log p)
! Vertices first discovered by this GPU.
! Right most column has global frontier for
the row
! Broadcast frontier over row.
! Broadcast frontier over column.
G11
G22
G33
G44
C1
C2
C3
C4
R1
In2
t
Out1
t
Out2
t
Out3
t
Out4
t
In1
t
In3
t
In4
t
R2
R3
R4
Source&Ver_ces&
Target&Ver_ces&
•  All&GPUs&now&have&the&new&fron_er.&
http://BlazeGraph.com
http://MapGraph.io
Weak&Scaling&
•  Scaling&the&problem&size&with&
more&GPUs&
–  Fixed&problem&size&per&GPU.&
•  Maximum&scale&27&(4.3B&edges)&
–  64&K20&GPUs&=>&.147s&=>&29&GTEPS&
–  64&K40&GPUs&=>&.135s&=>&32&GTEPS&
•  K40&has&faster&memory&bus.&
0
5
10
15
20
25
30
35
1 4 16 64
GTEPS
#GPUs
GPUs& Scale& Ver)ces& Edges& Time*(s)& GTEPS&
1& 21& 2,097,152& 67,108,864& 0.0254* 2.5&
4& 23& 8,388,608& 268,435,456& 0.0429* 6.3&
16& 25& 33,554,432& 1,073,741,824& 0.0715* 15.0&
64& 27& 134,217,728& 4,294,967,296& 0.1478* 29.1&
SYSTAP™, LLC
© 2006-2015 All Rights Reserved
16
3/10/15
Scaling the problem size with
more GPUs.
http://BlazeGraph.com
http://MapGraph.io
Central&Itera_on&Costs&(Weak&Scaling)&
SYSTAP™, LLC
© 2006-2015 All Rights Reserved
17
3/10/15
0.0000
0.0020
0.0040
0.0060
0.0080
0.0100
0.0120
4 16 64
Time(Seconds)
#GPUs
Computation
Communication
Termination
•  Communica_on&
costs&are&not&
constant.&
•  2D&design&implies&
cost&grows&as&
2*log(2p)/log(p)*
•  How&to&scale?&
–  Overlapping&
–  Compression&
•  Graph&
•  Message&
–  Hybrid&par__oning&
–  Heterogeneous&
compu_ng&
http://BlazeGraph.com
http://MapGraph.io
Costs&During&BFS&Traversal&
SYSTAP™, LLC
© 2006-2015 All Rights Reserved
18
3/10/15
•  Chart&shows&the&different&costs&for&each&GPU&in&each&itera_on&(64&GPUs).&
•  Wave&_me&is&essen_ally&constant,&as&expected.&
•  Compute&_me&peaks&during&the&central&itera_ons.&
•  Costs&are&reasonably&well&balanced&across&all&GPUs&aler&the&2nd&itera_on.&
http://BlazeGraph.com
http://MapGraph.io
Strong&Scaling&
•  Speedup&on&a&constant&problem&size&with&more&GPUs&
•  Problem&scale&25&
–  2^25&ver_ces&(33,554,432)&
–  2^26&directed&edges&(1,073,741,824)&
•  Strong&scaling&efficiency&of&48%&
–  Versus&44%&for&BG/Q&
GPUs* GTEPS* Time*(s)*
16* 15.2* 0.071*
25* 18.2* 0.059*
36* 20.5* 0.053*
49& 21.8& 0.049*
64* 22.7* 0.047*
SYSTAP™, LLC
© 2006-2015 All Rights Reserved
19
3/10/15
Speedup on a constant
problem size with more GPUs.
http://BlazeGraph.com
http://MapGraph.io
MapGraph&Beta&Customer?&
&
Contact&
Bradley&Bebee&
beebs@systap.com&
20
3/10/15
SYSTAP™, LLC
© 2006-2015 All Rights Reserved

More Related Content

What's hot

Parallel computation
Parallel computationParallel computation
Parallel computation
Jayanti Prasad Ph.D.
 
Computer Graphics & Visualization - 06
Computer Graphics & Visualization - 06Computer Graphics & Visualization - 06
Computer Graphics & Visualization - 06
Pankaj Debbarma
 
IIBMP2019 講演資料「オープンソースで始める深層学習」
IIBMP2019 講演資料「オープンソースで始める深層学習」IIBMP2019 講演資料「オープンソースで始める深層学習」
IIBMP2019 講演資料「オープンソースで始める深層学習」
Preferred Networks
 
Deep Learning in Python with Tensorflow for Finance
Deep Learning in Python with Tensorflow for FinanceDeep Learning in Python with Tensorflow for Finance
Deep Learning in Python with Tensorflow for Finance
Ben Ball
 
Naist2015 dec ver1
Naist2015 dec ver1Naist2015 dec ver1
Naist2015 dec ver1
Hiroki Nakahara
 
PFN Summer Internship 2019 / Kenshin Abe: Extension of Chainer-Chemistry for ...
PFN Summer Internship 2019 / Kenshin Abe: Extension of Chainer-Chemistry for ...PFN Summer Internship 2019 / Kenshin Abe: Extension of Chainer-Chemistry for ...
PFN Summer Internship 2019 / Kenshin Abe: Extension of Chainer-Chemistry for ...
Preferred Networks
 
Fine grained asynchronism for pseudo-spectral codes - with application to tur...
Fine grained asynchronism for pseudo-spectral codes - with application to tur...Fine grained asynchronism for pseudo-spectral codes - with application to tur...
Fine grained asynchronism for pseudo-spectral codes - with application to tur...
Ganesan Narayanasamy
 
Compilation of COSMO for GPU using LLVM
Compilation of COSMO for GPU using LLVMCompilation of COSMO for GPU using LLVM
Compilation of COSMO for GPU using LLVM
Linaro
 
Introduction For seq2seq(sequence to sequence) and RNN
Introduction For seq2seq(sequence to sequence) and RNNIntroduction For seq2seq(sequence to sequence) and RNN
Introduction For seq2seq(sequence to sequence) and RNN
Hye-min Ahn
 
Chainer v3
Chainer v3Chainer v3
Chainer v3
Seiya Tokui
 
Veriloggen.Thread & Stream: 最高性能FPGAコンピューティングを 目指したミックスドパラダイム型高位合成 (FPGAX 201...
Veriloggen.Thread & Stream: 最高性能FPGAコンピューティングを 目指したミックスドパラダイム型高位合成 (FPGAX 201...Veriloggen.Thread & Stream: 最高性能FPGAコンピューティングを 目指したミックスドパラダイム型高位合成 (FPGAX 201...
Veriloggen.Thread & Stream: 最高性能FPGAコンピューティングを 目指したミックスドパラダイム型高位合成 (FPGAX 201...
Shinya Takamaeda-Y
 
Magical float repr
Magical float reprMagical float repr
Magical float reprdickinsm
 
Learn about Tensorflow for Deep Learning now! Part 1
Learn about Tensorflow for Deep Learning now! Part 1Learn about Tensorflow for Deep Learning now! Part 1
Learn about Tensorflow for Deep Learning now! Part 1
Tyrone Systems
 
Code GPU with CUDA - Optimizing memory and control flow
Code GPU with CUDA - Optimizing memory and control flowCode GPU with CUDA - Optimizing memory and control flow
Code GPU with CUDA - Optimizing memory and control flow
Marina Kolpakova
 
Graph500
Graph500Graph500
Graph500
Jason Riedy
 
End of Sprint 5
End of Sprint 5End of Sprint 5
End of Sprint 5dm_work
 
Explore Deep Learning Architecture using Tensorflow 2.0 now! Part 2
Explore Deep Learning Architecture using Tensorflow 2.0 now! Part 2Explore Deep Learning Architecture using Tensorflow 2.0 now! Part 2
Explore Deep Learning Architecture using Tensorflow 2.0 now! Part 2
Tyrone Systems
 

What's hot (19)

Parallel computation
Parallel computationParallel computation
Parallel computation
 
Computer Graphics & Visualization - 06
Computer Graphics & Visualization - 06Computer Graphics & Visualization - 06
Computer Graphics & Visualization - 06
 
IIBMP2019 講演資料「オープンソースで始める深層学習」
IIBMP2019 講演資料「オープンソースで始める深層学習」IIBMP2019 講演資料「オープンソースで始める深層学習」
IIBMP2019 講演資料「オープンソースで始める深層学習」
 
Deep Learning in Python with Tensorflow for Finance
Deep Learning in Python with Tensorflow for FinanceDeep Learning in Python with Tensorflow for Finance
Deep Learning in Python with Tensorflow for Finance
 
Naist2015 dec ver1
Naist2015 dec ver1Naist2015 dec ver1
Naist2015 dec ver1
 
PFN Summer Internship 2019 / Kenshin Abe: Extension of Chainer-Chemistry for ...
PFN Summer Internship 2019 / Kenshin Abe: Extension of Chainer-Chemistry for ...PFN Summer Internship 2019 / Kenshin Abe: Extension of Chainer-Chemistry for ...
PFN Summer Internship 2019 / Kenshin Abe: Extension of Chainer-Chemistry for ...
 
Fine grained asynchronism for pseudo-spectral codes - with application to tur...
Fine grained asynchronism for pseudo-spectral codes - with application to tur...Fine grained asynchronism for pseudo-spectral codes - with application to tur...
Fine grained asynchronism for pseudo-spectral codes - with application to tur...
 
Compilation of COSMO for GPU using LLVM
Compilation of COSMO for GPU using LLVMCompilation of COSMO for GPU using LLVM
Compilation of COSMO for GPU using LLVM
 
cnsm2011_slide
cnsm2011_slidecnsm2011_slide
cnsm2011_slide
 
Introduction For seq2seq(sequence to sequence) and RNN
Introduction For seq2seq(sequence to sequence) and RNNIntroduction For seq2seq(sequence to sequence) and RNN
Introduction For seq2seq(sequence to sequence) and RNN
 
MaPU-HPCA2016
MaPU-HPCA2016MaPU-HPCA2016
MaPU-HPCA2016
 
Chainer v3
Chainer v3Chainer v3
Chainer v3
 
Veriloggen.Thread & Stream: 最高性能FPGAコンピューティングを 目指したミックスドパラダイム型高位合成 (FPGAX 201...
Veriloggen.Thread & Stream: 最高性能FPGAコンピューティングを 目指したミックスドパラダイム型高位合成 (FPGAX 201...Veriloggen.Thread & Stream: 最高性能FPGAコンピューティングを 目指したミックスドパラダイム型高位合成 (FPGAX 201...
Veriloggen.Thread & Stream: 最高性能FPGAコンピューティングを 目指したミックスドパラダイム型高位合成 (FPGAX 201...
 
Magical float repr
Magical float reprMagical float repr
Magical float repr
 
Learn about Tensorflow for Deep Learning now! Part 1
Learn about Tensorflow for Deep Learning now! Part 1Learn about Tensorflow for Deep Learning now! Part 1
Learn about Tensorflow for Deep Learning now! Part 1
 
Code GPU with CUDA - Optimizing memory and control flow
Code GPU with CUDA - Optimizing memory and control flowCode GPU with CUDA - Optimizing memory and control flow
Code GPU with CUDA - Optimizing memory and control flow
 
Graph500
Graph500Graph500
Graph500
 
End of Sprint 5
End of Sprint 5End of Sprint 5
End of Sprint 5
 
Explore Deep Learning Architecture using Tensorflow 2.0 now! Part 2
Explore Deep Learning Architecture using Tensorflow 2.0 now! Part 2Explore Deep Learning Architecture using Tensorflow 2.0 now! Part 2
Explore Deep Learning Architecture using Tensorflow 2.0 now! Part 2
 

Similar to Bryan Thompson, Chief Scientist and Founder at SYSTAP, LLC at MLconf NYC

6. Implementation
6. Implementation6. Implementation
Crude-Oil Scheduling Technology: moving from simulation to optimization
Crude-Oil Scheduling Technology: moving from simulation to optimizationCrude-Oil Scheduling Technology: moving from simulation to optimization
Crude-Oil Scheduling Technology: moving from simulation to optimization
Brenno Menezes
 
Graph Analysis: New Algorithm Models, New Architectures
Graph Analysis: New Algorithm Models, New ArchitecturesGraph Analysis: New Algorithm Models, New Architectures
Graph Analysis: New Algorithm Models, New Architectures
Jason Riedy
 
All AI Roads lead to Distribution - Dot AI
All AI Roads lead to Distribution - Dot AIAll AI Roads lead to Distribution - Dot AI
All AI Roads lead to Distribution - Dot AI
Jim Dowling
 
ScaleGraph - A High-Performance Library for Billion-Scale Graph Analytics
ScaleGraph - A High-Performance Library for Billion-Scale Graph AnalyticsScaleGraph - A High-Performance Library for Billion-Scale Graph Analytics
ScaleGraph - A High-Performance Library for Billion-Scale Graph Analytics
Toyotaro Suzumura
 
Database Camp 2016 @ United Nations, NYC - Brad Bebee, CEO, Blazegraph
Database Camp 2016 @ United Nations, NYC - Brad Bebee, CEO, BlazegraphDatabase Camp 2016 @ United Nations, NYC - Brad Bebee, CEO, Blazegraph
Database Camp 2016 @ United Nations, NYC - Brad Bebee, CEO, Blazegraph
✔ Eric David Benari, PMP
 
Presto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything EnginePresto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything Engine
DataWorks Summit
 
CC-4007, Large-Scale Machine Learning on Graphs, by Yucheng Low, Joseph Gonza...
CC-4007, Large-Scale Machine Learning on Graphs, by Yucheng Low, Joseph Gonza...CC-4007, Large-Scale Machine Learning on Graphs, by Yucheng Low, Joseph Gonza...
CC-4007, Large-Scale Machine Learning on Graphs, by Yucheng Low, Joseph Gonza...
AMD Developer Central
 
GPUs for GEC Competition @ GECCO-2013
GPUs for GEC Competition @ GECCO-2013GPUs for GEC Competition @ GECCO-2013
GPUs for GEC Competition @ GECCO-2013
Daniele Loiacono
 
GPU Accelerated Machine Learning
GPU Accelerated Machine LearningGPU Accelerated Machine Learning
GPU Accelerated Machine Learning
Sri Ambati
 
Dynamic Batch Parallel Algorithms for Updating Pagerank : SLIDES
Dynamic Batch Parallel Algorithms for Updating Pagerank : SLIDESDynamic Batch Parallel Algorithms for Updating Pagerank : SLIDES
Dynamic Batch Parallel Algorithms for Updating Pagerank : SLIDES
Subhajit Sahu
 
Open power topics20191023
Open power topics20191023Open power topics20191023
Open power topics20191023
Yutaka Kawai
 
On Implementation of Neuron Network(Back-propagation)
On Implementation of Neuron Network(Back-propagation)On Implementation of Neuron Network(Back-propagation)
On Implementation of Neuron Network(Back-propagation)
Yu Liu
 
GraphChi big graph processing
GraphChi big graph processingGraphChi big graph processing
GraphChi big graph processing
huguk
 
IRJET- To Design 16 bit Synchronous Microprocessor using VHDL on FPGA
IRJET-  	  To Design 16 bit Synchronous Microprocessor using VHDL on FPGAIRJET-  	  To Design 16 bit Synchronous Microprocessor using VHDL on FPGA
IRJET- To Design 16 bit Synchronous Microprocessor using VHDL on FPGA
IRJET Journal
 
Comprehensive Container Based Service Monitoring with Kubernetes and Istio
Comprehensive Container Based Service Monitoring with Kubernetes and IstioComprehensive Container Based Service Monitoring with Kubernetes and Istio
Comprehensive Container Based Service Monitoring with Kubernetes and Istio
Fred Moyer
 
I017425763
I017425763I017425763
I017425763
IOSR Journals
 
Big Stream Processing Systems, Big Graphs
Big Stream Processing Systems, Big GraphsBig Stream Processing Systems, Big Graphs
Big Stream Processing Systems, Big Graphs
Petr Novotný
 
UC2010_BRS1280_Eastman_Chemical_Johnston
UC2010_BRS1280_Eastman_Chemical_JohnstonUC2010_BRS1280_Eastman_Chemical_Johnston
UC2010_BRS1280_Eastman_Chemical_JohnstonH Eddie Newton
 
String Comparison Surprises: Did Postgres lose my data?
String Comparison Surprises: Did Postgres lose my data?String Comparison Surprises: Did Postgres lose my data?
String Comparison Surprises: Did Postgres lose my data?
Jeremy Schneider
 

Similar to Bryan Thompson, Chief Scientist and Founder at SYSTAP, LLC at MLconf NYC (20)

6. Implementation
6. Implementation6. Implementation
6. Implementation
 
Crude-Oil Scheduling Technology: moving from simulation to optimization
Crude-Oil Scheduling Technology: moving from simulation to optimizationCrude-Oil Scheduling Technology: moving from simulation to optimization
Crude-Oil Scheduling Technology: moving from simulation to optimization
 
Graph Analysis: New Algorithm Models, New Architectures
Graph Analysis: New Algorithm Models, New ArchitecturesGraph Analysis: New Algorithm Models, New Architectures
Graph Analysis: New Algorithm Models, New Architectures
 
All AI Roads lead to Distribution - Dot AI
All AI Roads lead to Distribution - Dot AIAll AI Roads lead to Distribution - Dot AI
All AI Roads lead to Distribution - Dot AI
 
ScaleGraph - A High-Performance Library for Billion-Scale Graph Analytics
ScaleGraph - A High-Performance Library for Billion-Scale Graph AnalyticsScaleGraph - A High-Performance Library for Billion-Scale Graph Analytics
ScaleGraph - A High-Performance Library for Billion-Scale Graph Analytics
 
Database Camp 2016 @ United Nations, NYC - Brad Bebee, CEO, Blazegraph
Database Camp 2016 @ United Nations, NYC - Brad Bebee, CEO, BlazegraphDatabase Camp 2016 @ United Nations, NYC - Brad Bebee, CEO, Blazegraph
Database Camp 2016 @ United Nations, NYC - Brad Bebee, CEO, Blazegraph
 
Presto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything EnginePresto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything Engine
 
CC-4007, Large-Scale Machine Learning on Graphs, by Yucheng Low, Joseph Gonza...
CC-4007, Large-Scale Machine Learning on Graphs, by Yucheng Low, Joseph Gonza...CC-4007, Large-Scale Machine Learning on Graphs, by Yucheng Low, Joseph Gonza...
CC-4007, Large-Scale Machine Learning on Graphs, by Yucheng Low, Joseph Gonza...
 
GPUs for GEC Competition @ GECCO-2013
GPUs for GEC Competition @ GECCO-2013GPUs for GEC Competition @ GECCO-2013
GPUs for GEC Competition @ GECCO-2013
 
GPU Accelerated Machine Learning
GPU Accelerated Machine LearningGPU Accelerated Machine Learning
GPU Accelerated Machine Learning
 
Dynamic Batch Parallel Algorithms for Updating Pagerank : SLIDES
Dynamic Batch Parallel Algorithms for Updating Pagerank : SLIDESDynamic Batch Parallel Algorithms for Updating Pagerank : SLIDES
Dynamic Batch Parallel Algorithms for Updating Pagerank : SLIDES
 
Open power topics20191023
Open power topics20191023Open power topics20191023
Open power topics20191023
 
On Implementation of Neuron Network(Back-propagation)
On Implementation of Neuron Network(Back-propagation)On Implementation of Neuron Network(Back-propagation)
On Implementation of Neuron Network(Back-propagation)
 
GraphChi big graph processing
GraphChi big graph processingGraphChi big graph processing
GraphChi big graph processing
 
IRJET- To Design 16 bit Synchronous Microprocessor using VHDL on FPGA
IRJET-  	  To Design 16 bit Synchronous Microprocessor using VHDL on FPGAIRJET-  	  To Design 16 bit Synchronous Microprocessor using VHDL on FPGA
IRJET- To Design 16 bit Synchronous Microprocessor using VHDL on FPGA
 
Comprehensive Container Based Service Monitoring with Kubernetes and Istio
Comprehensive Container Based Service Monitoring with Kubernetes and IstioComprehensive Container Based Service Monitoring with Kubernetes and Istio
Comprehensive Container Based Service Monitoring with Kubernetes and Istio
 
I017425763
I017425763I017425763
I017425763
 
Big Stream Processing Systems, Big Graphs
Big Stream Processing Systems, Big GraphsBig Stream Processing Systems, Big Graphs
Big Stream Processing Systems, Big Graphs
 
UC2010_BRS1280_Eastman_Chemical_Johnston
UC2010_BRS1280_Eastman_Chemical_JohnstonUC2010_BRS1280_Eastman_Chemical_Johnston
UC2010_BRS1280_Eastman_Chemical_Johnston
 
String Comparison Surprises: Did Postgres lose my data?
String Comparison Surprises: Did Postgres lose my data?String Comparison Surprises: Did Postgres lose my data?
String Comparison Surprises: Did Postgres lose my data?
 

More from MLconf

Jamila Smith-Loud - Understanding Human Impact: Social and Equity Assessments...
Jamila Smith-Loud - Understanding Human Impact: Social and Equity Assessments...Jamila Smith-Loud - Understanding Human Impact: Social and Equity Assessments...
Jamila Smith-Loud - Understanding Human Impact: Social and Equity Assessments...
MLconf
 
Ted Willke - The Brain’s Guide to Dealing with Context in Language Understanding
Ted Willke - The Brain’s Guide to Dealing with Context in Language UnderstandingTed Willke - The Brain’s Guide to Dealing with Context in Language Understanding
Ted Willke - The Brain’s Guide to Dealing with Context in Language Understanding
MLconf
 
Justin Armstrong - Applying Computer Vision to Reduce Contamination in the Re...
Justin Armstrong - Applying Computer Vision to Reduce Contamination in the Re...Justin Armstrong - Applying Computer Vision to Reduce Contamination in the Re...
Justin Armstrong - Applying Computer Vision to Reduce Contamination in the Re...
MLconf
 
Igor Markov - Quantum Computing: a Treasure Hunt, not a Gold Rush
Igor Markov - Quantum Computing: a Treasure Hunt, not a Gold RushIgor Markov - Quantum Computing: a Treasure Hunt, not a Gold Rush
Igor Markov - Quantum Computing: a Treasure Hunt, not a Gold Rush
MLconf
 
Josh Wills - Data Labeling as Religious Experience
Josh Wills - Data Labeling as Religious ExperienceJosh Wills - Data Labeling as Religious Experience
Josh Wills - Data Labeling as Religious Experience
MLconf
 
Vinay Prabhu - Project GaitNet: Ushering in the ImageNet moment for human Gai...
Vinay Prabhu - Project GaitNet: Ushering in the ImageNet moment for human Gai...Vinay Prabhu - Project GaitNet: Ushering in the ImageNet moment for human Gai...
Vinay Prabhu - Project GaitNet: Ushering in the ImageNet moment for human Gai...
MLconf
 
Jekaterina Novikova - Machine Learning Methods in Detecting Alzheimer’s Disea...
Jekaterina Novikova - Machine Learning Methods in Detecting Alzheimer’s Disea...Jekaterina Novikova - Machine Learning Methods in Detecting Alzheimer’s Disea...
Jekaterina Novikova - Machine Learning Methods in Detecting Alzheimer’s Disea...
MLconf
 
Meghana Ravikumar - Optimized Image Classification on the Cheap
Meghana Ravikumar - Optimized Image Classification on the CheapMeghana Ravikumar - Optimized Image Classification on the Cheap
Meghana Ravikumar - Optimized Image Classification on the Cheap
MLconf
 
Noam Finkelstein - The Importance of Modeling Data Collection
Noam Finkelstein - The Importance of Modeling Data CollectionNoam Finkelstein - The Importance of Modeling Data Collection
Noam Finkelstein - The Importance of Modeling Data Collection
MLconf
 
June Andrews - The Uncanny Valley of ML
June Andrews - The Uncanny Valley of MLJune Andrews - The Uncanny Valley of ML
June Andrews - The Uncanny Valley of ML
MLconf
 
Sneha Rajana - Deep Learning Architectures for Semantic Relation Detection Tasks
Sneha Rajana - Deep Learning Architectures for Semantic Relation Detection TasksSneha Rajana - Deep Learning Architectures for Semantic Relation Detection Tasks
Sneha Rajana - Deep Learning Architectures for Semantic Relation Detection Tasks
MLconf
 
Anoop Deoras - Building an Incrementally Trained, Local Taste Aware, Global D...
Anoop Deoras - Building an Incrementally Trained, Local Taste Aware, Global D...Anoop Deoras - Building an Incrementally Trained, Local Taste Aware, Global D...
Anoop Deoras - Building an Incrementally Trained, Local Taste Aware, Global D...
MLconf
 
Vito Ostuni - The Voice: New Challenges in a Zero UI World
Vito Ostuni - The Voice: New Challenges in a Zero UI WorldVito Ostuni - The Voice: New Challenges in a Zero UI World
Vito Ostuni - The Voice: New Challenges in a Zero UI World
MLconf
 
Anna choromanska - Data-driven Challenges in AI: Scale, Information Selection...
Anna choromanska - Data-driven Challenges in AI: Scale, Information Selection...Anna choromanska - Data-driven Challenges in AI: Scale, Information Selection...
Anna choromanska - Data-driven Challenges in AI: Scale, Information Selection...
MLconf
 
Janani Kalyanam - Machine Learning to Detect Illegal Online Sales of Prescrip...
Janani Kalyanam - Machine Learning to Detect Illegal Online Sales of Prescrip...Janani Kalyanam - Machine Learning to Detect Illegal Online Sales of Prescrip...
Janani Kalyanam - Machine Learning to Detect Illegal Online Sales of Prescrip...
MLconf
 
Esperanza Lopez Aguilera - Using a Bayesian Neural Network in the Detection o...
Esperanza Lopez Aguilera - Using a Bayesian Neural Network in the Detection o...Esperanza Lopez Aguilera - Using a Bayesian Neural Network in the Detection o...
Esperanza Lopez Aguilera - Using a Bayesian Neural Network in the Detection o...
MLconf
 
Neel Sundaresan - Teaching a machine to code
Neel Sundaresan - Teaching a machine to codeNeel Sundaresan - Teaching a machine to code
Neel Sundaresan - Teaching a machine to code
MLconf
 
Rishabh Mehrotra - Recommendations in a Marketplace: Personalizing Explainabl...
Rishabh Mehrotra - Recommendations in a Marketplace: Personalizing Explainabl...Rishabh Mehrotra - Recommendations in a Marketplace: Personalizing Explainabl...
Rishabh Mehrotra - Recommendations in a Marketplace: Personalizing Explainabl...
MLconf
 
Soumith Chintala - Increasing the Impact of AI Through Better Software
Soumith Chintala - Increasing the Impact of AI Through Better SoftwareSoumith Chintala - Increasing the Impact of AI Through Better Software
Soumith Chintala - Increasing the Impact of AI Through Better Software
MLconf
 
Roy Lowrance - Predicting Bond Prices: Regime Changes
Roy Lowrance - Predicting Bond Prices: Regime ChangesRoy Lowrance - Predicting Bond Prices: Regime Changes
Roy Lowrance - Predicting Bond Prices: Regime Changes
MLconf
 

More from MLconf (20)

Jamila Smith-Loud - Understanding Human Impact: Social and Equity Assessments...
Jamila Smith-Loud - Understanding Human Impact: Social and Equity Assessments...Jamila Smith-Loud - Understanding Human Impact: Social and Equity Assessments...
Jamila Smith-Loud - Understanding Human Impact: Social and Equity Assessments...
 
Ted Willke - The Brain’s Guide to Dealing with Context in Language Understanding
Ted Willke - The Brain’s Guide to Dealing with Context in Language UnderstandingTed Willke - The Brain’s Guide to Dealing with Context in Language Understanding
Ted Willke - The Brain’s Guide to Dealing with Context in Language Understanding
 
Justin Armstrong - Applying Computer Vision to Reduce Contamination in the Re...
Justin Armstrong - Applying Computer Vision to Reduce Contamination in the Re...Justin Armstrong - Applying Computer Vision to Reduce Contamination in the Re...
Justin Armstrong - Applying Computer Vision to Reduce Contamination in the Re...
 
Igor Markov - Quantum Computing: a Treasure Hunt, not a Gold Rush
Igor Markov - Quantum Computing: a Treasure Hunt, not a Gold RushIgor Markov - Quantum Computing: a Treasure Hunt, not a Gold Rush
Igor Markov - Quantum Computing: a Treasure Hunt, not a Gold Rush
 
Josh Wills - Data Labeling as Religious Experience
Josh Wills - Data Labeling as Religious ExperienceJosh Wills - Data Labeling as Religious Experience
Josh Wills - Data Labeling as Religious Experience
 
Vinay Prabhu - Project GaitNet: Ushering in the ImageNet moment for human Gai...
Vinay Prabhu - Project GaitNet: Ushering in the ImageNet moment for human Gai...Vinay Prabhu - Project GaitNet: Ushering in the ImageNet moment for human Gai...
Vinay Prabhu - Project GaitNet: Ushering in the ImageNet moment for human Gai...
 
Jekaterina Novikova - Machine Learning Methods in Detecting Alzheimer’s Disea...
Jekaterina Novikova - Machine Learning Methods in Detecting Alzheimer’s Disea...Jekaterina Novikova - Machine Learning Methods in Detecting Alzheimer’s Disea...
Jekaterina Novikova - Machine Learning Methods in Detecting Alzheimer’s Disea...
 
Meghana Ravikumar - Optimized Image Classification on the Cheap
Meghana Ravikumar - Optimized Image Classification on the CheapMeghana Ravikumar - Optimized Image Classification on the Cheap
Meghana Ravikumar - Optimized Image Classification on the Cheap
 
Noam Finkelstein - The Importance of Modeling Data Collection
Noam Finkelstein - The Importance of Modeling Data CollectionNoam Finkelstein - The Importance of Modeling Data Collection
Noam Finkelstein - The Importance of Modeling Data Collection
 
June Andrews - The Uncanny Valley of ML
June Andrews - The Uncanny Valley of MLJune Andrews - The Uncanny Valley of ML
June Andrews - The Uncanny Valley of ML
 
Sneha Rajana - Deep Learning Architectures for Semantic Relation Detection Tasks
Sneha Rajana - Deep Learning Architectures for Semantic Relation Detection TasksSneha Rajana - Deep Learning Architectures for Semantic Relation Detection Tasks
Sneha Rajana - Deep Learning Architectures for Semantic Relation Detection Tasks
 
Anoop Deoras - Building an Incrementally Trained, Local Taste Aware, Global D...
Anoop Deoras - Building an Incrementally Trained, Local Taste Aware, Global D...Anoop Deoras - Building an Incrementally Trained, Local Taste Aware, Global D...
Anoop Deoras - Building an Incrementally Trained, Local Taste Aware, Global D...
 
Vito Ostuni - The Voice: New Challenges in a Zero UI World
Vito Ostuni - The Voice: New Challenges in a Zero UI WorldVito Ostuni - The Voice: New Challenges in a Zero UI World
Vito Ostuni - The Voice: New Challenges in a Zero UI World
 
Anna choromanska - Data-driven Challenges in AI: Scale, Information Selection...
Anna choromanska - Data-driven Challenges in AI: Scale, Information Selection...Anna choromanska - Data-driven Challenges in AI: Scale, Information Selection...
Anna choromanska - Data-driven Challenges in AI: Scale, Information Selection...
 
Janani Kalyanam - Machine Learning to Detect Illegal Online Sales of Prescrip...
Janani Kalyanam - Machine Learning to Detect Illegal Online Sales of Prescrip...Janani Kalyanam - Machine Learning to Detect Illegal Online Sales of Prescrip...
Janani Kalyanam - Machine Learning to Detect Illegal Online Sales of Prescrip...
 
Esperanza Lopez Aguilera - Using a Bayesian Neural Network in the Detection o...
Esperanza Lopez Aguilera - Using a Bayesian Neural Network in the Detection o...Esperanza Lopez Aguilera - Using a Bayesian Neural Network in the Detection o...
Esperanza Lopez Aguilera - Using a Bayesian Neural Network in the Detection o...
 
Neel Sundaresan - Teaching a machine to code
Neel Sundaresan - Teaching a machine to codeNeel Sundaresan - Teaching a machine to code
Neel Sundaresan - Teaching a machine to code
 
Rishabh Mehrotra - Recommendations in a Marketplace: Personalizing Explainabl...
Rishabh Mehrotra - Recommendations in a Marketplace: Personalizing Explainabl...Rishabh Mehrotra - Recommendations in a Marketplace: Personalizing Explainabl...
Rishabh Mehrotra - Recommendations in a Marketplace: Personalizing Explainabl...
 
Soumith Chintala - Increasing the Impact of AI Through Better Software
Soumith Chintala - Increasing the Impact of AI Through Better SoftwareSoumith Chintala - Increasing the Impact of AI Through Better Software
Soumith Chintala - Increasing the Impact of AI Through Better Software
 
Roy Lowrance - Predicting Bond Prices: Regime Changes
Roy Lowrance - Predicting Bond Prices: Regime ChangesRoy Lowrance - Predicting Bond Prices: Regime Changes
Roy Lowrance - Predicting Bond Prices: Regime Changes
 

Recently uploaded

Microsoft - Power Platform_G.Aspiotis.pdf
Microsoft - Power Platform_G.Aspiotis.pdfMicrosoft - Power Platform_G.Aspiotis.pdf
Microsoft - Power Platform_G.Aspiotis.pdf
Uni Systems S.M.S.A.
 
National Security Agency - NSA mobile device best practices
National Security Agency - NSA mobile device best practicesNational Security Agency - NSA mobile device best practices
National Security Agency - NSA mobile device best practices
Quotidiano Piemontese
 
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdfFIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance
 
Climate Impact of Software Testing at Nordic Testing Days
Climate Impact of Software Testing at Nordic Testing DaysClimate Impact of Software Testing at Nordic Testing Days
Climate Impact of Software Testing at Nordic Testing Days
Kari Kakkonen
 
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
DanBrown980551
 
Epistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI supportEpistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI support
Alan Dix
 
Video Streaming: Then, Now, and in the Future
Video Streaming: Then, Now, and in the FutureVideo Streaming: Then, Now, and in the Future
Video Streaming: Then, Now, and in the Future
Alpen-Adria-Universität
 
UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4
DianaGray10
 
GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...
GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...
GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...
Neo4j
 
A tale of scale & speed: How the US Navy is enabling software delivery from l...
A tale of scale & speed: How the US Navy is enabling software delivery from l...A tale of scale & speed: How the US Navy is enabling software delivery from l...
A tale of scale & speed: How the US Navy is enabling software delivery from l...
sonjaschweigert1
 
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdfFIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance
 
PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)
Ralf Eggert
 
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdfFIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance
 
Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !
KatiaHIMEUR1
 
GraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge GraphGraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge Graph
Guy Korland
 
FIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdfFIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance
 
Introduction to CHERI technology - Cybersecurity
Introduction to CHERI technology - CybersecurityIntroduction to CHERI technology - Cybersecurity
Introduction to CHERI technology - Cybersecurity
mikeeftimakis1
 
The Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and SalesThe Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and Sales
Laura Byrne
 
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Albert Hoitingh
 
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
Neo4j
 

Recently uploaded (20)

Microsoft - Power Platform_G.Aspiotis.pdf
Microsoft - Power Platform_G.Aspiotis.pdfMicrosoft - Power Platform_G.Aspiotis.pdf
Microsoft - Power Platform_G.Aspiotis.pdf
 
National Security Agency - NSA mobile device best practices
National Security Agency - NSA mobile device best practicesNational Security Agency - NSA mobile device best practices
National Security Agency - NSA mobile device best practices
 
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdfFIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
 
Climate Impact of Software Testing at Nordic Testing Days
Climate Impact of Software Testing at Nordic Testing DaysClimate Impact of Software Testing at Nordic Testing Days
Climate Impact of Software Testing at Nordic Testing Days
 
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
 
Epistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI supportEpistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI support
 
Video Streaming: Then, Now, and in the Future
Video Streaming: Then, Now, and in the FutureVideo Streaming: Then, Now, and in the Future
Video Streaming: Then, Now, and in the Future
 
UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4
 
GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...
GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...
GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...
 
A tale of scale & speed: How the US Navy is enabling software delivery from l...
A tale of scale & speed: How the US Navy is enabling software delivery from l...A tale of scale & speed: How the US Navy is enabling software delivery from l...
A tale of scale & speed: How the US Navy is enabling software delivery from l...
 
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdfFIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
 
PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)
 
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdfFIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
 
Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !
 
GraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge GraphGraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge Graph
 
FIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdfFIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdf
 
Introduction to CHERI technology - Cybersecurity
Introduction to CHERI technology - CybersecurityIntroduction to CHERI technology - Cybersecurity
Introduction to CHERI technology - Cybersecurity
 
The Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and SalesThe Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and Sales
 
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
 
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
 

Bryan Thompson, Chief Scientist and Founder at SYSTAP, LLC at MLconf NYC

  • 1. SYSTAP LLC | © 2006-2015 All Rights Reserved MLConf @ NYC: March 27, 2015
  • 2. SYSTAP LLC | © 2006-2015 All Rights Reserved Graph Databases Grew at Over 500% in the Last Two Years Popularity changes per category – March 2015 PopularityChanges Graph Databases
  • 3. SYSTAP LLC | © 2006-2015 All Rights Reserved The Amount of Graph Data is Exploding Billion+ Edges
  • 4. SYSTAP LLC | © 2006-2015 All Rights Reserved Graphs are different. You need the right paradigm and hardware to scale. https:// datatake.files.wordpress.com/ Graph Cache Thrash The CPU just waits for graph data from main memory... TypeofCacheorMemory Access Latency Per Clock Cycle
  • 5. SYSTAP LLC | © 2006-2015 All Rights Reserved SYSTAP Solves the Graph Scaling Problem Using GPUs ●  100s of Times Faster than CPU main memory-based systems ●  Up to 40X Cheaper ●  10,000X Faster than disk-based technologies ●  High Level API
  • 6. SYSTAP LLC | © 2006-2015 All Rights Reserved Uncovering influence links in molecular knowledge networks to streamline personalized medicine | Shin, Dmitriy et al.Journal of Biomedical Informatics , Volume 52 , 394 - 405 Finding the Next Cure for Cancer is a Billion+ Edge Graph Challenge
  • 7. SYSTAP LLC | © 2006-2015 All Rights Reserved Graph is BIG and changing
 (Trillion+ Edges)
  • 8. SYSTAP LLC | © 2006-2015 All Rights Reserved Graphs Enable People to Find Knowledge A Bunch of Pages An Answer
  • 9. http://BlazeGraph.com http://MapGraph.io Parallel&Breadth&First&Search&on& GPU&Clusters& SYSTAP™, LLC © 2006-2015 All Rights Reserved 1 3/15/15 This work was (partially) funded by the DARPA XDATA program under AFRL Contract #FA8750-13-C-0002. This material is based upon work supported by the Defense Advanced Research Projects Agency (DARPA) under Contract No. D14PC00029. The authors would like to thank Dr. White, NVIDIA, and the MVAPICH group at Ohio State University for their support of this work. Z. Fu, H.K. Dasari, B. Bebee, M. Berzins, B. Thompson. Parallel Breadth First Search on GPU Clusters. IEEE Big Data. Bethesda, MD. 2014. h6p://mapgraph.io&
  • 11. http://BlazeGraph.com http://MapGraph.io MapGraph:&Extreme&performance& •  GTEPS&is&Billions&(10^9)&of&Traversed&Edges&per&Second.& –  This&is&the&basic&measure&of&performance&for&graph&traversal.& Configura)on* Cost* GTEPS* $/GTEPS* 4?Core&CPU& $4,000& 0.2& $5,333& 45Core*CPU*+**K20*GPU* $7,000* 3.0* $2,333* XMT?2&(rumored&price)& $1,800,000& 10.0& $188,000& 64*GPUs*(32*nodes*with*2x*K20*GPUs*per* node*and*InfiniBand*DDRx4*–*today)* $500,000* 30.0* $16,666* 16*GPUs*(2*nodes*with*8x*Pascal*GPUs*per* node*and*InfiniBand*DDRx4*–*Q1,*2016)* $125,000* >30.0* <$4,166* SYSTAP™, LLC © 2006-2015 All Rights Reserved 3 3/10/15
  • 12. http://BlazeGraph.com http://MapGraph.io SYSTAP’s&MapGraph&APIs&Roadmap& •  Vertex&Centric&API& –  Same&performance&as&CUDA.& •  Schema?flexible&data?model& –  Property&Graph&/&RDF& –  Reduce&import&hassles& –  Opera_ons&over&merged&graphs& •  Graph&pa6ern&matching&language& •  DSL/Scala&=>&CUDA&code&genera_on& SYSTAP™, LLC © 2006-2015 All Rights Reserved Ease&of&Use& 4 3/10/15
  • 13. http://BlazeGraph.com http://MapGraph.io Breadth&First&Search& •  Fundamental&building&block&for&graph&algorithms& –  Including&SPARQL&(JOINs)& •  In&level&synchronous&steps& –  Label&visited&verDces& •  For&this&graph& –  IteraDon&0& –  IteraDon&1& –  IteraDon&2& •  Hard&problem!& –  Basis&for&Graph&500&benchmark& SYSTAP™, LLC © 2006-2015 All Rights Reserved 10 3/10/15 0& 1& 1& 2& 1& 1& 2& 2& 2& 2& 1& 3& 2& 3& 2& 2& 3& 2&
  • 14. http://BlazeGraph.com http://MapGraph.io GPUs&–&A&Game&Changer&for&Graph&AnalyDcs& 0 500 1000 1500 2000 2500 3000 3500 1 10 100 1000 10000 100000 MillionTraversedEdgesperSecond Average Traversal Depth NVIDIA Tesla C2050 Multicore per socket Sequential Breadth-First Search on Graphs 10x Speedup on GPUs 3 GTEPS •  Graphs&are&a&hard&problem& •  Non?locality& •  Data&dependent&parallelism& •  Memory&bus&and& communicaDon&bo6lenecks& •  GPUs&deliver&effecDve& parallelism& •  10x+&memory&bandwidth& •  Dynamic&parallelism& SYSTAP™, LLC © 2006-2015 All Rights Reserved 11 3/10/15 Graphic: Merrill, Garland, and Grimshaw, “GPU Sparse Graph Traversal”, GPU Technology Conference, 2012
  • 15. http://BlazeGraph.com http://MapGraph.io 2D&Par__oning&(aka&Vertex&Cuts)& •  p&x&p&compute&grid& –  Edges&in&rows/cols& –  Minimize&messages& •  log(p)&(versus&p2)& –  One&par__on&per&GPU& •  Batch&parallel&opera_on& –  Grid&row:&out?edges& –  Grid&column:&in?edges& •  Representa_ve&fron_ers& SYSTAP™, LLC © 2006-2015 All Rights Reserved 5 3/10/15 G11 G22 G33 G44 C1 C2 C3 C4 R1 In2 t Out1 t Out2 t Out3 t Out4 t In1 t In3 t In4 t R2 R3 R4 Source&Ver_ces& Target&Ver_ces& out-edges in-edges BFS& PR&Query& •  Parallelism&–&work&must&be&distributed& and&balanced.& •  Memory&bandwidth&–&memory,&not&disk,& is&the&bo6leneck&
  • 16. http://BlazeGraph.com http://MapGraph.io 1 10 100 1,000 10,000 100,000 1,000,000 10,000,000 0 1 2 3 4 5 6 FrontierSize Iteration Scale&25&Traversal& •  Work&spans&mul_ple&orders&of&magnitude.& SYSTAP™, LLC © 2006-2015 All Rights Reserved 6 3/10/15
  • 17. http://BlazeGraph.com http://MapGraph.io Distributed&BFS&Algorithm& SYSTAP™, LLC © 2006-2015 All Rights Reserved 7 3/10/15 1: procedure BFS(Root, Predecessor) 2: In0 ij LocalVertex(Root) 3: for t 0 do 4: Expand(Int i, Outt ij) 5: LocalFrontiert Count(Outt ij) 6: GlobalFrontiert Reduce(LocalFrontiert) 7: if GlobalFrontiert > 0 then 8: Contract(Outt ij, Outt j, Int+1 i , Assignij) 9: UpdateLevels(Outt j, t, level) 10: else 11: UpdatePreds(Assignij,Predsij, level) 12: break 13: end if 14: t++ 15: end for 16: end procedure Figure 3. Distributed Breadth First Search algorithm 1: procedure UPDATEPRED 2: for all v 2 Assignedij 3: Pred[v] 1 4: for i ColOff[v] 5: if level[v] == 6: Pred[v] 7: end if 8: end for 9: end for 10: end procedure Figure 7. Pre for finding the Prefixt ij, whic performing Bitwise-OR Exclu across each row j as shown are several tree-based algori Since the bitmap union opera ! Data parallel 1-hop expand on all GPUs ! Termination check ! Local frontier size ! Global frontier size ! Compute predecessors from local In / Out levels (no communication)! Done ! Starting vertex ! Global Frontier contraction (“wave”) ! Record level of vertices in the In / Out frontier ! Next iteration
  • 18. http://BlazeGraph.com http://MapGraph.io Distributed&BFS&Algorithm& SYSTAP™, LLC © 2006-2015 All Rights Reserved 8 3/10/15 1: procedure BFS(Root, Predecessor) 2: In0 ij LocalVertex(Root) 3: for t 0 do 4: Expand(Int i, Outt ij) 5: LocalFrontiert Count(Outt ij) 6: GlobalFrontiert Reduce(LocalFrontiert) 7: if GlobalFrontiert > 0 then 8: Contract(Outt ij, Outt j, Int+1 i , Assignij) 9: UpdateLevels(Outt j, t, level) 10: else 11: UpdatePreds(Assignij,Predsij, level) 12: break 13: end if 14: t++ 15: end for 16: end procedure Figure 3. Distributed Breadth First Search algorithm 1: procedure UPDATEPRED 2: for all v 2 Assignedij 3: Pred[v] 1 4: for i ColOff[v] 5: if level[v] == 6: Pred[v] 7: end if 8: end for 9: end for 10: end procedure Figure 7. Pre for finding the Prefixt ij, whic performing Bitwise-OR Exclu across each row j as shown are several tree-based algori Since the bitmap union opera •  Key&differences& •  log(p)&parallel&scan&(vs&sequen_al&wave)& •  GPU?local&computa_on&of&predecessors& •  1&par__on&per&GPU& •  GPUDirect&(vs&RDMA)& ! Data parallel 1-hop expand on all GPUs ! Termination check ! Local frontier size ! Global frontier size ! Compute predecessors from local In / Out levels (no communication)! Done ! Starting vertex ! Global Frontier contraction (“wave”) ! Record level of vertices in the In / Out frontier ! Next iteration
  • 19. http://BlazeGraph.com http://MapGraph.io Global&Fron_er&Contrac_on&and&Communica_on& SYSTAP™, LLC © 2006-2015 All Rights Reserved 9 3/10/15 10: Outij convert(Lout) 11: end procedure Figure 4. Expand algorithm 1: procedure CONTRACT(Outt ij, Outt j, Int+1 i , Assignij) 2: Prefixt ij ExclusiveScanj(Outt ij) 3: Assignedij Assignedij [ (Outt ij Prefixt ij) 4: if i = p then 5: Outt j Outt ij [ Prefixt ij 6: end if 7: Broadcast(Outt j, p , ROW) 8: if i = j then 9: Int+1 i Outt j 10: end if 11: Broadcast(Int+1 i , i, COL) 12: end procedure Figure 5. Global Frontier contraction and communication algorithm the row. This is done to update t vertices so that they do not prod those vertices in the future iteratio updating the levels of the vertice Int+1 j . Since both Outt j and Int+1 i in the diagonal (i = j), in line 9, the diagonal GPUs and then broa Ci using MPI_Broadcast in line setup along every row and colum facilitate these parallel scan and b During each BFS iteration, we of the algorithm by checking if zero as in line 6 of Figure 3. We frontier using a MPI_Allreduce op edge expansion and before the g phase. However, an unexplored o the termination check with the g and communication phase in orde ! Distributed prefix sum. Prefix is vertices discovered by your left neighbors (log p) ! Vertices first discovered by this GPU. ! Right most column has global frontier for the row ! Broadcast frontier over row. ! Broadcast frontier over column.
  • 20. http://BlazeGraph.com http://MapGraph.io Global&Fron_er&Contrac_on&and&Communica_on& SYSTAP™, LLC © 2006-2015 All Rights Reserved 10 3/10/15 1: procedure CONTRACT(Outt ij, Outt j, Int+1 i , Assignij) 2: Prefixt ij ExclusiveScanj(Outt ij) 3: Assignedij Assignedij [ (Outt ij Prefixt ij) 4: if i = p then 5: Outt j Outt ij [ Prefixt ij 6: end if 7: Broadcast(Outt j, p , ROW) 8: if i = j then 9: Int+1 i Outt j 10: end if 11: Broadcast(Int+1 i , i, COL) 12: end procedure !&Distributed&prefix&sum.&Prefix&is&ver_ces& discovered&by&your&lel&neighbors&(log&p)& ! Vertices first discovered by this GPU. ! Right most column has global frontier for the row ! Broadcast frontier over row. ! Broadcast frontier over column. G11 G22 G33 G44 C1 C2 C3 C4 R1 In2 t Out1 t Out2 t Out3 t Out4 t In1 t In3 t In4 t R2 R3 R4 Source&Ver_ces& Target&Ver_ces&
  • 21. http://BlazeGraph.com http://MapGraph.io Global&Fron_er&Contrac_on&and&Communica_on& SYSTAP™, LLC © 2006-2014 All Rights Reserved 11 3/10/15 1: procedure CONTRACT(Outt ij, Outt j, Int+1 i , Assignij) 2: Prefixt ij ExclusiveScanj(Outt ij) 3: Assignedij Assignedij [ (Outt ij Prefixt ij) 4: if i = p then 5: Outt j Outt ij [ Prefixt ij 6: end if 7: Broadcast(Outt j, p , ROW) 8: if i = j then 9: Int+1 i Outt j 10: end if 11: Broadcast(Int+1 i , i, COL) 12: end procedure •  We&use&a&parallel&scan&that&minimizes&communica_on&steps&(vs&work).& •  This&improves&the&overall&scaling&efficiency&by&30%.& !*Distributed*prefix*sum.*Prefix*is*ver)ces* discovered*by*your*leZ*neighbors*(log*p)* ! Vertices first discovered by this GPU. ! Right most column has global frontier for the row ! Broadcast frontier over row. ! Broadcast frontier over column. G11 G22 G33 G44 C1 C2 C3 C4 R1 In2 t Out1 t Out2 t Out3 t Out4 t In1 t In3 t In4 t R2 R3 R4 Source&Ver_ces& Target&Ver_ces&
  • 22. http://BlazeGraph.com http://MapGraph.io Global&Fron_er&Contrac_on&and&Communica_on& SYSTAP™, LLC © 2006-2014 All Rights Reserved 12 3/15/15 1: procedure CONTRACT(Outt ij, Outt j, Int+1 i , Assignij) 2: Prefixt ij ExclusiveScanj(Outt ij) 3: Assignedij Assignedij [ (Outt ij Prefixt ij) 4: if i = p then 5: Outt j Outt ij [ Prefixt ij 6: end if 7: Broadcast(Outt j, p , ROW) 8: if i = j then 9: Int+1 i Outt j 10: end if 11: Broadcast(Int+1 i , i, COL) 12: end procedure •  We&use&a&parallel&scan&that&minimizes&communica_on&steps&(vs&work).& •  This&improves&the&overall&scaling&efficiency&by&30%.& ! Vertices first discovered by this GPU. ! Right most column has global frontier for the row ! Broadcast frontier over row. ! Broadcast frontier over column. G11 G22 G33 G44 C1 C2 C3 C4 R1 In2 t Out1 t Out2 t Out3 t Out4 t In1 t In3 t In4 t R2 R3 R4 Source&Ver_ces& Target&Ver_ces& !*Distributed*prefix*sum.*Prefix*is*ver)ces* discovered*by*your*leZ*neighbors*(log*p)*
  • 23. http://BlazeGraph.com http://MapGraph.io Global&Fron_er&Contrac_on&and&Communica_on& SYSTAP™, LLC © 2006-2014 All Rights Reserved 13 3/15/15 1: procedure CONTRACT(Outt ij, Outt j, Int+1 i , Assignij) 2: Prefixt ij ExclusiveScanj(Outt ij) 3: Assignedij Assignedij [ (Outt ij Prefixt ij) 4: if i = p then 5: Outt j Outt ij [ Prefixt ij 6: end if 7: Broadcast(Outt j, p , ROW) 8: if i = j then 9: Int+1 i Outt j 10: end if 11: Broadcast(Int+1 i , i, COL) 12: end procedure •  We&use&a&parallel&scan&that&minimizes&communica_on&steps&(vs&work).& •  This&improves&the&overall&scaling&efficiency&by&30%.& ! Vertices first discovered by this GPU. ! Right most column has global frontier for the row ! Broadcast frontier over row. ! Broadcast frontier over column. G11 G22 G33 G44 C1 C2 C3 C4 R1 In2 t Out1 t Out2 t Out3 t Out4 t In1 t In3 t In4 t R2 R3 R4 Source&Ver_ces& Target&Ver_ces& !&Distributed&prefix&sum.&Prefix&is&ver_ces& discovered&by&your&lel&neighbors&(log&p)&
  • 24. http://BlazeGraph.com http://MapGraph.io Global&Fron_er&Contrac_on&and&Communica_on& SYSTAP™, LLC © 2006-2014 All Rights Reserved 14 3/15/15 1: procedure CONTRACT(Outt ij, Outt j, Int+1 i , Assignij) 2: Prefixt ij ExclusiveScanj(Outt ij) 3: Assignedij Assignedij [ (Outt ij Prefixt ij) 4: if i = p then 5: Outt j Outt ij [ Prefixt ij 6: end if 7: Broadcast(Outt j, p , ROW) 8: if i = j then 9: Int+1 i Outt j 10: end if 11: Broadcast(Int+1 i , i, COL) 12: end procedure ! Distributed prefix sum. Prefix is vertices discovered by your left neighbors (log p) ! Vertices first discovered by this GPU. ! Right most column has global frontier for the row ! Broadcast frontier over row. ! Broadcast frontier over column. G11 G22 G33 G44 C1 C2 C3 C4 R1 In2 t Out1 t Out2 t Out3 t Out4 t In1 t In3 t In4 t R2 R3 R4 Source&Ver_ces& Target&Ver_ces&
  • 25. http://BlazeGraph.com http://MapGraph.io Global&Fron_er&Contrac_on&and&Communica_on& SYSTAP™, LLC © 2006-2014 All Rights Reserved 15 3/15/15 1: procedure CONTRACT(Outt ij, Outt j, Int+1 i , Assignij) 2: Prefixt ij ExclusiveScanj(Outt ij) 3: Assignedij Assignedij [ (Outt ij Prefixt ij) 4: if i = p then 5: Outt j Outt ij [ Prefixt ij 6: end if 7: Broadcast(Outt j, p , ROW) 8: if i = j then 9: Int+1 i Outt j 10: end if 11: Broadcast(Int+1 i , i, COL) 12: end procedure ! Distributed prefix sum. Prefix is vertices discovered by your left neighbors (log p) ! Vertices first discovered by this GPU. ! Right most column has global frontier for the row ! Broadcast frontier over row. ! Broadcast frontier over column. G11 G22 G33 G44 C1 C2 C3 C4 R1 In2 t Out1 t Out2 t Out3 t Out4 t In1 t In3 t In4 t R2 R3 R4 Source&Ver_ces& Target&Ver_ces& •  All&GPUs&now&have&the&new&fron_er.&
  • 26. http://BlazeGraph.com http://MapGraph.io Weak&Scaling& •  Scaling&the&problem&size&with& more&GPUs& –  Fixed&problem&size&per&GPU.& •  Maximum&scale&27&(4.3B&edges)& –  64&K20&GPUs&=>&.147s&=>&29&GTEPS& –  64&K40&GPUs&=>&.135s&=>&32&GTEPS& •  K40&has&faster&memory&bus.& 0 5 10 15 20 25 30 35 1 4 16 64 GTEPS #GPUs GPUs& Scale& Ver)ces& Edges& Time*(s)& GTEPS& 1& 21& 2,097,152& 67,108,864& 0.0254* 2.5& 4& 23& 8,388,608& 268,435,456& 0.0429* 6.3& 16& 25& 33,554,432& 1,073,741,824& 0.0715* 15.0& 64& 27& 134,217,728& 4,294,967,296& 0.1478* 29.1& SYSTAP™, LLC © 2006-2015 All Rights Reserved 16 3/10/15 Scaling the problem size with more GPUs.
  • 27. http://BlazeGraph.com http://MapGraph.io Central&Itera_on&Costs&(Weak&Scaling)& SYSTAP™, LLC © 2006-2015 All Rights Reserved 17 3/10/15 0.0000 0.0020 0.0040 0.0060 0.0080 0.0100 0.0120 4 16 64 Time(Seconds) #GPUs Computation Communication Termination •  Communica_on& costs&are&not& constant.& •  2D&design&implies& cost&grows&as& 2*log(2p)/log(p)* •  How&to&scale?& –  Overlapping& –  Compression& •  Graph& •  Message& –  Hybrid&par__oning& –  Heterogeneous& compu_ng&
  • 28. http://BlazeGraph.com http://MapGraph.io Costs&During&BFS&Traversal& SYSTAP™, LLC © 2006-2015 All Rights Reserved 18 3/10/15 •  Chart&shows&the&different&costs&for&each&GPU&in&each&itera_on&(64&GPUs).& •  Wave&_me&is&essen_ally&constant,&as&expected.& •  Compute&_me&peaks&during&the&central&itera_ons.& •  Costs&are&reasonably&well&balanced&across&all&GPUs&aler&the&2nd&itera_on.&
  • 29. http://BlazeGraph.com http://MapGraph.io Strong&Scaling& •  Speedup&on&a&constant&problem&size&with&more&GPUs& •  Problem&scale&25& –  2^25&ver_ces&(33,554,432)& –  2^26&directed&edges&(1,073,741,824)& •  Strong&scaling&efficiency&of&48%& –  Versus&44%&for&BG/Q& GPUs* GTEPS* Time*(s)* 16* 15.2* 0.071* 25* 18.2* 0.059* 36* 20.5* 0.053* 49& 21.8& 0.049* 64* 22.7* 0.047* SYSTAP™, LLC © 2006-2015 All Rights Reserved 19 3/10/15 Speedup on a constant problem size with more GPUs.