High-Performance Analysis of Streaming
Graphs
E. Jason Riedy
School of Computational Science and Engineering
Georgia Institute of Technology
HPC Analytic Workshop, 28 June 2017
Outline
1. Motivation and Applications
2. Updating Algorithms for Streaming Graphs
3. Current and Future STINGER Models
4. Extracting Interesting Subgraphs
5. GPUs for Streaming Graphs?
6. Closing
Streaming Graphs — ACS, 28 June 2017 1/28
(insert prefix here)-scale data analysis
Cyber-security: Identify anomalies, malicious actors
Health care: Finding outbreaks, population epidemiology
Social networks: Advertising, searching, grouping
Intelligence: Decisions at scale, regulating markets, smart & sustainable cities
Systems biology: Understanding interactions, drug design
Power grid: Disruptions, conservation
Simulation: Discrete events, cracking meshes
Changes are important. Cannot stop the world...
Why Graphs?
Another tool, like dense and sparse linear algebra.
• Combine things with pairwise
relationships
• Smaller, more generic than raw data.
• Taught (roughly) to all CS students...
• Semantic attributions can capture
essential relationships.
• Traversals can be faster than filtering
DB joins.
• Provide clear phrasing for queries
about relationships.
Potential Applications
• Social Networks
• Identify communities, influences, bridges, trends,
anomalies (trends before they happen)...
• Potential to help social sciences, city planning, and
others with large-scale data.
• Cybersecurity
• Determine whether new connections can access a device or
represent a new threat in < 5ms...
• Is the transfer by a virus / persistent threat?
• Bioinformatics, health
• Construct gene sequences, analyze protein
interactions, map brain interactions
• Credit fraud forensics ⇒ detection ⇒ monitoring
• Real-time integration of all the customer’s data
Streaming graph data
Network data rates:
• Gigabit ethernet: 81k – 1.5M packets per second
• Over 130 000 flows per second on 10 GigE (< 7.7 µs)
Person-level data rates:
• 500M posts per day on Twitter (6k / sec) [1]
• 3M posts per minute on Facebook (50k / sec) [2]
But often analyze only changes and not entire graph.
Throughput & latency trade off and expose different
levels of concurrency.
[1] www.internetlivestats.com/twitter-statistics/
[2] www.jeffbullas.com/2015/04/17/21-awesome-facebook-facts-and-statistics-you-need-to-check-out/
Streaming graph analysis
Terminology, will go into more details:
• Streaming changes into a massive, evolving graph
• Will compare models later...
• Need to handle deletions as well as insertions
Previous STINGER performance results (x86-64):
• Data ingest: >2M upd/sec [Ediger, McColl, Poovey, Campbell, & Bader 2014]
• Clustering coefficients: >100K upd/sec [R, Meyerhenke, B, E, & Mattson 2012]
• Connected comp.: >1M upd/sec [McColl, Green, & B 2013]
• Community clustering: >100K upd/sec∗ [R & B 2013]
• PageRank: Up to 40× latency improvement [R 2016]
Updating Algorithms for Streaming Graphs
2. Updating Algorithms for Streaming Graphs
2.1 Overview of STINGER
2.2 High-level Summary of Selected Algorithms
STINGER: Framework for streaming graphs
Slide credit: Rob McColl and David Ediger
• OpenMP + sufficiently POSIX-ish
• Multiple processes for resilience
Current STINGER model
Input: Batch of insertions / deletions
1. Pre-process batch: Sort by source vertex, reconcile ins/del.
2. Pre-change hook
3. Alter the STINGER graph (may “age off” old edges)
4. Post-change hook
Output: Affected vertices, change in metric
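The batch model above can be sketched in a few lines of Python. This is only an illustration, not STINGER's API: the names `preprocess_batch` and `apply_batch`, the hook signature, and the "last operation on an edge wins" reconciliation rule are all assumptions made for the example.

```python
from collections import defaultdict

def preprocess_batch(batch):
    """Reconcile and sort a batch of (op, src, dst) updates, op in {"ins", "del"}.

    Assumption: when an edge appears several times, the last operation wins.
    """
    last_op = {}
    for op, src, dst in batch:
        last_op[(src, dst)] = op
    # Sorting the (src, dst, op) tuples orders the updates by source vertex.
    return sorted((src, dst, op) for (src, dst), op in last_op.items())

def apply_batch(graph, batch, pre_hooks=(), post_hooks=()):
    """Apply one batch to graph (dict: vertex -> set of out-neighbors)."""
    updates = preprocess_batch(batch)
    affected = sorted({v for src, dst, _ in updates for v in (src, dst)})
    for hook in pre_hooks:            # analysis sees the graph before the change
        hook(graph, updates, affected)
    for src, dst, op in updates:      # alter the graph
        if op == "ins":
            graph[src].add(dst)
        else:
            graph[src].discard(dst)
    for hook in post_hooks:           # analysis sees the graph after the change
        hook(graph, updates, affected)
    return affected

# Hypothetical post-change hook: record the out-degree of affected vertices.
degree_log = []
def record_degrees(graph, updates, affected):
    degree_log.append({v: len(graph[v]) for v in affected})

graph = defaultdict(set)
batch = [("ins", 2, 3), ("ins", 1, 2), ("del", 2, 3), ("ins", 1, 3)]
affected = apply_batch(graph, batch, post_hooks=[record_degrees])
```

The hooks receive only the reconciled updates and the affected vertices, so an analysis kernel can restrict its work to the changed region.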
Simpler Algorithms
• Triangle counting (Ediger, Jiang,
Riedy, & Bader, MTAAP 2010)
• Re-count around any changes.
• Connected components (McColl,
Green, & Bader, HIPC 2013)
• Insertions only: Trivial, just check
labels.
• Insertions and deletions: Fix
spanning forests.
• PageRank (Riedy, GABB 2016)
• Solve $(I - \alpha A_\Delta^T D_\Delta^{-1})\,\Delta x = \alpha (A_\Delta^T D_\Delta^{-1} - A^T D^{-1})\,x + r$
[Figures: triangle-counting illustration with vertices v, i, j, m, n; PageRank example graph with per-vertex scores. Image from Wikipedia.]
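A minimal sketch of the PageRank update equation above, in pure Python with dense matrices. The assumptions are mine: r is taken to be the residual of the old solution x on the old graph, r = (1 − α)v − (I − αAᵀD⁻¹)x, with v the uniform teleport vector, every vertex has at least one out-edge, and the function names are hypothetical. A real implementation would exploit the sparsity of ∆x; this one only checks that the correction reproduces a full recomputation.

```python
def column_stochastic(adj, n):
    """Dense M = A^T D^{-1}; adj maps each vertex to its list of out-neighbors."""
    M = [[0.0] * n for _ in range(n)]
    for u, nbrs in adj.items():
        for v in nbrs:
            M[v][u] += 1.0 / len(nbrs)
    return M

def matvec(M, x):
    return [sum(m * xj for m, xj in zip(row, x)) for row in M]

def jacobi_solve(M, alpha, b, iters=200):
    """Iterate y <- alpha*M*y + b, converging to the solution of (I - alpha*M) y = b."""
    y = [0.0] * len(b)
    for _ in range(iters):
        y = [alpha * my + bi for my, bi in zip(matvec(M, y), b)]
    return y

def pagerank(adj, n, alpha=0.85):
    """From-scratch PageRank with a uniform teleport vector."""
    return jacobi_solve(column_stochastic(adj, n), alpha, [(1 - alpha) / n] * n)

def pagerank_update(adj_old, adj_new, n, x, alpha=0.85):
    """Solve for the correction dx on the changed graph and return x + dx."""
    Mx_old = matvec(column_stochastic(adj_old, n), x)
    M_new = column_stochastic(adj_new, n)
    Mx_new = matvec(M_new, x)
    # Residual of the old solution on the old graph.
    r = [(1 - alpha) / n - xi + alpha * mo for xi, mo in zip(x, Mx_old)]
    # Right-hand side: alpha * (M_new - M_old) x + r.
    rhs = [alpha * (mn - mo) + ri for mn, mo, ri in zip(Mx_new, Mx_old, r)]
    dx = jacobi_solve(M_new, alpha, rhs)
    return [xi + di for xi, di in zip(x, dx)]

old = {0: [1], 1: [2], 2: [0]}        # directed 3-cycle
new = {0: [1, 2], 1: [2], 2: [0]}     # same graph after inserting edge 0 -> 2
x_old = pagerank(old, 3)
x_inc = pagerank_update(old, new, 3, x_old)
x_full = pagerank(new, 3)
```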
More Complex Algorithms
• Community monitoring (Godbolé, MS Thesis 2016)
• Track the dendrogram, backtrack around changes.
• Simple re-agglomeration (Riedy & Bader, MTAAP
2013) loses quality.
• Seed set expansion
• Record sequence of merges, backtrack to changes.
(Zakrzewska & Bader, SNAM 2016)
• Betweenness centrality
• Carefully maintain the BFS trees (Green, McColl, &
Bader, SocialCom 2012)
• See also: Recent work at Karlsruhe in NetworKit
Current and Future STINGER Models
3. Current and Future STINGER Models
3.1 How Fast Can We Go?
3.2 A Model for Streaming Validity
Is STINGER’s current model good enough?
Data ingest rates, R-MAT into R-MAT, scales 24 & 30
[Figure: update rate (upd/s) and average update time (s) versus batch size, on Power8, Haswell, and Haswell-30.]
Want to add analysis clients without slowing data ingest!
Note that scale 30 starts with 1.1B vertices, 17B edges...
(Different STINGER internal parameters.)
What if we don’t hold up changes?
Additional STINGER model
Analyze concurrently with the graph changes, and
produce a result correct for the starting graph and
some subset of concurrent changes. [3]
Sample of other models
• Put in a query, wait for sufficient data [Phillips, et al.]
• Evolving: Sample, accurate w/high-prob.
• Classical: dynamic algorithms, versioned data
[3] Chunxing Yin, Riedy, Bader. “Validity of Graph Algorithms on Streaming Data.” January 2017. (in submission)
Algorithm validity in our model: Example.
Can you compute degrees in an undirected graph (no self
loops) concurrently with changes?
Algorithm: Iterate over vertices, count the number of
neighbors.
[Figure: deg(v1) is computed as 1, the edge is deleted, then deg(v2) is computed as 0.]
Cannot correspond to an undirected graph plus any
subset of concurrent changes.
Valid for our model? No!
Not incorrect, just not valid for our model.
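The degree example can be replayed deterministically. The sketch below assumes a two-vertex graph with one edge and a hypothetical `degree_sweep` helper that deletes an edge just before a chosen vertex is visited. It compares the observed output against the only two outputs allowed by the model: degrees of the starting graph, or of the starting graph plus the deletion.

```python
def neighbors(edges, v):
    """Neighbors of v in an undirected edge set of frozensets (no self loops)."""
    return {u for e in edges if v in e for u in e if u != v}

def degree_sweep(edges, vertices, delete_before=None):
    """Count neighbors vertex by vertex.

    delete_before maps a vertex to an edge deleted just before that vertex is
    visited; this stands in for a concurrent change (hypothetical interleaving).
    """
    edges = set(edges)
    delete_before = delete_before or {}
    result = {}
    for v in vertices:
        if v in delete_before:
            edges.discard(delete_before[v])
        result[v] = len(neighbors(edges, v))
    return result

edge = frozenset({1, 2})
start = {edge}

# The deletion lands between computing deg(v1) and deg(v2).
observed = degree_sweep(start, [1, 2], delete_before={2: edge})

# Outputs the model allows: starting graph, or starting graph plus the deletion.
allowed = [degree_sweep(start, [1, 2]), degree_sweep(set(), [1, 2])]
is_valid = observed in allowed
```

The observed degrees also sum to an odd number, which no undirected graph can produce.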
Algorithm validity in our model
• What is valid?
• Typical BFS and follow-ons (betweenness centrality)
• Shiloach-Vishkin connected components
• PageRank, Katz (using Jacobi the right way)
• Triangle counting by region growing
• Using data once & saved decisions...
• What is invalid?
• Making a decision twice in implementations.
• ∆-stepping SSSP: Decrease a weight below ∆
• Degree optimization: Cross threshold, miss vertex
• Applying old information.
• Labeling in S. Kahan’s components alg.
Fun properties
Due to Chunxing Yin, under sensible assumptions:
• You can produce a single-change stream to demonstrate
invalidity.
• Algorithms that produce a subgraph of their input cannot be
guaranteed to run concurrently with changes and always
produce snapshot outputs.
In progress:
• Validity for streaming! Apply an algorithm valid for our model.
Also collect the changes during execution. Now update the
result for those changes while more changes accumulate.
Repeat.
• Verification for debugging, etc.
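The "analyze, collect changes, update, repeat" loop described under "In progress" might look like the following sequential sketch. It does not model real concurrency: each batch stands for the changes that accumulated during the previous step, the batches are assumed to be already reconciled (no duplicate inserts, no deletes of missing edges), and degree counting is used only because it is easy to update. All function names are hypothetical.

```python
from collections import Counter

def compute_degrees(edges):
    """From-scratch analysis: degree of every vertex in an undirected edge set."""
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return deg

def update_degrees(deg, batch):
    """Incremental update for one batch of (op, (u, v)) changes."""
    for op, (u, v) in batch:
        step = 1 if op == "ins" else -1
        deg[u] += step
        deg[v] += step
    return deg

def run_stream(edges, batches):
    """Analyze once, then repeatedly fold in the changes collected meanwhile."""
    edges = set(edges)
    deg = compute_degrees(edges)          # initial analysis
    for batch in batches:                 # changes accumulated during the last step
        for op, e in batch:
            if op == "ins":
                edges.add(e)
            else:
                edges.discard(e)
        deg = update_degrees(deg, batch)  # update the result instead of recomputing
    return edges, deg

start = {(1, 2), (2, 3)}
batches = [[("ins", (3, 4))], [("del", (1, 2)), ("ins", (1, 4))]]
final_edges, final_deg = run_stream(start, batches)
```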
Extracting Interesting Subgraphs
4. Extracting Interesting Subgraphs
4.1 Seed Set Expansion
4.2 Expansion through PageRank and Katz
Graphs: Big, nasty hairballs
Yifan Hu’s (AT&T) visualization of the in-2004 data set
http://www2.research.att.com/~yifanhu/gallery.html
But no shortage of structure...
in-2004, matrix format from Davis, Florida
Sparse Matrix Collection
Jason’s network via LinkedIn Labs
• Locally, there are clusters or communities.
• There are methods for global community detection.
• Also need local communities around seeds for
queries and targeted analysis.
Seed set expansion
• Seed set expansion finds the “best” subgraph or
communities for a set of vertices of interest
• Many quality criteria: Modularity, conductance-ish,
etc.
• Want to produce smaller expansions for viz. as well
as larger communities for deeper analysis.
• Dynamic agglomerative / modularity algorithms
update larger communities faster than
recomputation [Zakrzewska & Bader]
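As one concrete instance of seed set expansion, the sketch below greedily grows a set from its seeds while conductance keeps improving. The greedy rule, the stopping condition, and the function names are assumptions chosen for illustration; they are not the dynamic agglomerative method cited above.

```python
def conductance(adj, S):
    """Cut edges leaving S divided by the smaller of the two side volumes."""
    S = set(S)
    vol_S = sum(len(adj[v]) for v in S)
    vol_rest = sum(len(nbrs) for nbrs in adj.values()) - vol_S
    cut = sum(1 for v in S for u in adj[v] if u not in S)
    denom = min(vol_S, vol_rest)
    return cut / denom if denom > 0 else 1.0

def expand_seed_set(adj, seeds, max_size):
    """Add the frontier vertex that most lowers conductance; stop when none does."""
    S = set(seeds)
    best = conductance(adj, S)
    while len(S) < max_size:
        frontier = {u for v in S for u in adj[v]} - S
        if not frontier:
            break
        # sorted() makes tie-breaking deterministic.
        cand = min(sorted(frontier), key=lambda u: conductance(adj, S | {u}))
        score = conductance(adj, S | {cand})
        if score >= best:
            break
        S.add(cand)
        best = score
    return S

# Two triangles {0, 1, 2} and {3, 4, 5} joined by the bridge edge 2-3.
adj = {
    0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3},
    3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4},
}
community = expand_seed_set(adj, seeds=[0], max_size=6)
```

On this graph the expansion stops at the first triangle, because crossing the bridge would raise conductance again.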
PageRank and Katz centrality
Both PageRank and Katz centrality recover blocks in
artificial stochastic block model graphs.
[Figure: four panels, each with both axes running from 0 to 1000.]
R-MAT + SBM w/ 1000-vertex blocks
Tracking with Katz recalls ≥ 93% of the growing and
shrinking middle community using five random seeds.
Nathan, Zakrzewska, Riedy, & Bader, “Local Community Detection in Dynamic Graphs Using Personalized Centrality.”
Updating the expanded sets using incremental iterations:
Katz: $\Delta x^{(k+1)} = \alpha A_\Delta \Delta x^{(k)} + (r - \alpha\,\Delta A\,x)|_{\Delta x^{(k+1)}}$
Streaming seed set expansion
Work in progress!
• Which seed set expansion methods provide
subgraphs useful for further analysis? How do the
results compare to global analysis?
• We do not want to maintain the entire |V| PR or Katz
vector, only around |S| where S is the output.
• Need to adapt the updating algorithms to be valid in
our non-stop model.
GPUs for Streaming Graphs?
5. GPUs for Streaming Graphs?
5.1 cuSTINGER
So... Now what?
• Maintain these communities / subgraphs on or near
accelerators!
• Sending changes may help with the connection
bandwidth problem.
• cuSTINGER [Green & Bader]
• A variant of STINGER for NVIDIA GPUs
• Ingest at rates over 10^7 updates / sec
• Ingest & triangle count updates at up to 2 × 10^6
upd/s (higher in prep!)
• Amenable to existing high-performance static
analysis kernels like betweenness centrality.
• https://github.com/cuStinger
• Micron Automata (in progress with Aluru, Roy, and
Srivatsava)
• Hardware implementation of non-deterministic finite
automata
• Can be adapted to tackle graph problems!
• Theoretically fast subgraph matching...
• Others?
• Examining FPGA + HMC combinations to move closer
to memory (Jeff Young, Rich Vuduc).
• Hopefully Emu soon... Interested in others!
Center for Research into Novel Computing Hierarchies
(CRNCH), dir. Tom Conte, GT
Future directions
• Of course, continue developing streaming / dynamic
/ incremental algorithms.
• For massive graphs, computing small changes is
always a win.
• Improving approximations or replacing expensive
metrics like betweenness centrality would be great.
• Include more external and semantic data.
• If vertices are documents or data records, many
more measures of similarity.
• Only now being exploited in concert with static graph
algorithms.
STINGER represents only some approaches! There are others.
HPC Lab People
Faculty:
• David A. Bader
• Jason Riedy
• Oded Green∗
Included here:
• Chunxing Yin
• Eisha Nathan
• Anita Zakrzewska
STINGER:
• Robert McColl,
• James Fairbanks∗ (GTRI),
• Adam McLaughlin∗,
• David Ediger∗ (GTRI),
• Jason Poovey (GTRI),
• Daniel Henderson†,
• Karl Jiang†, and
• feedback from users in industry, government, academia
Support: DoD, DoE, NSF, Intel, IBM, Oracle, NVIDIA
∗ Ph.D. related to STINGER. † Other previous students.
STINGER: Where do you get it?
Home: www.cc.gatech.edu/stinger/
Code: git.cc.gatech.edu/git/project/stinger.git/
Gateway to
• code,
• development,
• documentation,
• presentations...
Remember: Academic code, but maturing
with contributions.
Users / contributors / questioners:
Georgia Tech, PNNL, CMU, Berkeley, Intel,
Cray, NVIDIA, IBM, Federal Government,
Ionic Security, Citi, Accenture, ...

More Related Content

Similar to High-Performance Analysis of Streaming Graphs

High-Performance Analysis of Streaming Graphs
High-Performance Analysis of Streaming GraphsHigh-Performance Analysis of Streaming Graphs
High-Performance Analysis of Streaming GraphsJason Riedy
 
ICASSP 2012: Analysis of Streaming Social Networks and Graphs on Multicore Ar...
ICASSP 2012: Analysis of Streaming Social Networks and Graphs on Multicore Ar...ICASSP 2012: Analysis of Streaming Social Networks and Graphs on Multicore Ar...
ICASSP 2012: Analysis of Streaming Social Networks and Graphs on Multicore Ar...Jason Riedy
 
STING: Spatio-Temporal Interaction Networks and Graphs for Intel Platforms
STING: Spatio-Temporal Interaction Networks and Graphs for Intel PlatformsSTING: Spatio-Temporal Interaction Networks and Graphs for Intel Platforms
STING: Spatio-Temporal Interaction Networks and Graphs for Intel PlatformsJason Riedy
 
An Efficient Approach for Clustering High Dimensional Data
An Efficient Approach for Clustering High Dimensional DataAn Efficient Approach for Clustering High Dimensional Data
An Efficient Approach for Clustering High Dimensional DataIJSTA
 
IEEE Big data 2016 Title and Abstract
IEEE Big data  2016 Title and AbstractIEEE Big data  2016 Title and Abstract
IEEE Big data 2016 Title and Abstracttsysglobalsolutions
 
Scalable Machine Learning: The Role of Stratified Data Sharding
Scalable Machine Learning: The Role of Stratified Data ShardingScalable Machine Learning: The Role of Stratified Data Sharding
Scalable Machine Learning: The Role of Stratified Data Shardinginside-BigData.com
 
ICIAM 2019: A New Algorithm Model for Massive-Scale Streaming Graph Analysis
ICIAM 2019: A New Algorithm Model for Massive-Scale Streaming Graph AnalysisICIAM 2019: A New Algorithm Model for Massive-Scale Streaming Graph Analysis
ICIAM 2019: A New Algorithm Model for Massive-Scale Streaming Graph AnalysisJason Riedy
 
Challenges in Analytics for BIG Data
Challenges in Analytics for BIG DataChallenges in Analytics for BIG Data
Challenges in Analytics for BIG DataPrasant Misra
 
Introduction to STINGER
Introduction to STINGERIntroduction to STINGER
Introduction to STINGERrobertmccoll
 
PR-159 : Synergistic Image and Feature Adaptation: Towards Cross-Modality Dom...
PR-159 : Synergistic Image and Feature Adaptation: Towards Cross-Modality Dom...PR-159 : Synergistic Image and Feature Adaptation: Towards Cross-Modality Dom...
PR-159 : Synergistic Image and Feature Adaptation: Towards Cross-Modality Dom...Sunghoon Joo
 
K Means Clustering Algorithm for Partitioning Data Sets Evaluated From Horizo...
K Means Clustering Algorithm for Partitioning Data Sets Evaluated From Horizo...K Means Clustering Algorithm for Partitioning Data Sets Evaluated From Horizo...
K Means Clustering Algorithm for Partitioning Data Sets Evaluated From Horizo...IOSR Journals
 
Symbolic Background Knowledge for Machine Learning
Symbolic Background Knowledge for Machine LearningSymbolic Background Knowledge for Machine Learning
Symbolic Background Knowledge for Machine LearningSteffen Staab
 
Efficiently Maintaining Distributed Model-Based Views on Real-Time Data Streams
Efficiently Maintaining Distributed Model-Based Views on Real-Time Data StreamsEfficiently Maintaining Distributed Model-Based Views on Real-Time Data Streams
Efficiently Maintaining Distributed Model-Based Views on Real-Time Data StreamsPlanetData Network of Excellence
 
Certain Investigation on Dynamic Clustering in Dynamic Datamining
Certain Investigation on Dynamic Clustering in Dynamic DataminingCertain Investigation on Dynamic Clustering in Dynamic Datamining
Certain Investigation on Dynamic Clustering in Dynamic Dataminingijdmtaiir
 
BDVe Webinar Series - Designing Big Data pipelines with Toreador (Ernesto Dam...
BDVe Webinar Series - Designing Big Data pipelines with Toreador (Ernesto Dam...BDVe Webinar Series - Designing Big Data pipelines with Toreador (Ernesto Dam...
BDVe Webinar Series - Designing Big Data pipelines with Toreador (Ernesto Dam...Big Data Value Association
 
STINGER: Multi-threaded Graph Streaming
STINGER: Multi-threaded Graph StreamingSTINGER: Multi-threaded Graph Streaming
STINGER: Multi-threaded Graph StreamingJason Riedy
 
2017 09-ohkawa-MCSoC2017-presen
2017 09-ohkawa-MCSoC2017-presen2017 09-ohkawa-MCSoC2017-presen
2017 09-ohkawa-MCSoC2017-presenTakeshi Ohkawa
 
A frame work for clustering time evolving data
A frame work for clustering time evolving dataA frame work for clustering time evolving data
A frame work for clustering time evolving dataiaemedu
 
Scalable Graph Convolutional Network Based Link Prediction on a Distributed G...
Scalable Graph Convolutional Network Based Link Prediction on a Distributed G...Scalable Graph Convolutional Network Based Link Prediction on a Distributed G...
Scalable Graph Convolutional Network Based Link Prediction on a Distributed G...miyurud
 
Massive-Scale Analytics Applied to Real-World Problems
Massive-Scale Analytics Applied to Real-World ProblemsMassive-Scale Analytics Applied to Real-World Problems
Massive-Scale Analytics Applied to Real-World Problemsinside-BigData.com
 

Similar to High-Performance Analysis of Streaming Graphs (20)

High-Performance Analysis of Streaming Graphs
High-Performance Analysis of Streaming GraphsHigh-Performance Analysis of Streaming Graphs
High-Performance Analysis of Streaming Graphs
 
ICASSP 2012: Analysis of Streaming Social Networks and Graphs on Multicore Ar...
ICASSP 2012: Analysis of Streaming Social Networks and Graphs on Multicore Ar...ICASSP 2012: Analysis of Streaming Social Networks and Graphs on Multicore Ar...
ICASSP 2012: Analysis of Streaming Social Networks and Graphs on Multicore Ar...
 
STING: Spatio-Temporal Interaction Networks and Graphs for Intel Platforms
STING: Spatio-Temporal Interaction Networks and Graphs for Intel PlatformsSTING: Spatio-Temporal Interaction Networks and Graphs for Intel Platforms
STING: Spatio-Temporal Interaction Networks and Graphs for Intel Platforms
 
An Efficient Approach for Clustering High Dimensional Data
An Efficient Approach for Clustering High Dimensional DataAn Efficient Approach for Clustering High Dimensional Data
An Efficient Approach for Clustering High Dimensional Data
 
IEEE Big data 2016 Title and Abstract
IEEE Big data  2016 Title and AbstractIEEE Big data  2016 Title and Abstract
IEEE Big data 2016 Title and Abstract
 
Scalable Machine Learning: The Role of Stratified Data Sharding
Scalable Machine Learning: The Role of Stratified Data ShardingScalable Machine Learning: The Role of Stratified Data Sharding
Scalable Machine Learning: The Role of Stratified Data Sharding
 
ICIAM 2019: A New Algorithm Model for Massive-Scale Streaming Graph Analysis
ICIAM 2019: A New Algorithm Model for Massive-Scale Streaming Graph AnalysisICIAM 2019: A New Algorithm Model for Massive-Scale Streaming Graph Analysis
ICIAM 2019: A New Algorithm Model for Massive-Scale Streaming Graph Analysis
 
Challenges in Analytics for BIG Data
Challenges in Analytics for BIG DataChallenges in Analytics for BIG Data
Challenges in Analytics for BIG Data
 
Introduction to STINGER
Introduction to STINGERIntroduction to STINGER
Introduction to STINGER
 
PR-159 : Synergistic Image and Feature Adaptation: Towards Cross-Modality Dom...
PR-159 : Synergistic Image and Feature Adaptation: Towards Cross-Modality Dom...PR-159 : Synergistic Image and Feature Adaptation: Towards Cross-Modality Dom...
PR-159 : Synergistic Image and Feature Adaptation: Towards Cross-Modality Dom...
 
K Means Clustering Algorithm for Partitioning Data Sets Evaluated From Horizo...
K Means Clustering Algorithm for Partitioning Data Sets Evaluated From Horizo...K Means Clustering Algorithm for Partitioning Data Sets Evaluated From Horizo...
K Means Clustering Algorithm for Partitioning Data Sets Evaluated From Horizo...
 
Symbolic Background Knowledge for Machine Learning
Symbolic Background Knowledge for Machine LearningSymbolic Background Knowledge for Machine Learning
Symbolic Background Knowledge for Machine Learning
 
Efficiently Maintaining Distributed Model-Based Views on Real-Time Data Streams
Efficiently Maintaining Distributed Model-Based Views on Real-Time Data StreamsEfficiently Maintaining Distributed Model-Based Views on Real-Time Data Streams
Efficiently Maintaining Distributed Model-Based Views on Real-Time Data Streams
 
Certain Investigation on Dynamic Clustering in Dynamic Datamining
Certain Investigation on Dynamic Clustering in Dynamic DataminingCertain Investigation on Dynamic Clustering in Dynamic Datamining
Certain Investigation on Dynamic Clustering in Dynamic Datamining
 
BDVe Webinar Series - Designing Big Data pipelines with Toreador (Ernesto Dam...
BDVe Webinar Series - Designing Big Data pipelines with Toreador (Ernesto Dam...BDVe Webinar Series - Designing Big Data pipelines with Toreador (Ernesto Dam...
BDVe Webinar Series - Designing Big Data pipelines with Toreador (Ernesto Dam...
 
STINGER: Multi-threaded Graph Streaming
STINGER: Multi-threaded Graph StreamingSTINGER: Multi-threaded Graph Streaming
STINGER: Multi-threaded Graph Streaming
 
2017 09-ohkawa-MCSoC2017-presen
2017 09-ohkawa-MCSoC2017-presen2017 09-ohkawa-MCSoC2017-presen
2017 09-ohkawa-MCSoC2017-presen
 
A frame work for clustering time evolving data
A frame work for clustering time evolving dataA frame work for clustering time evolving data
A frame work for clustering time evolving data
 
Scalable Graph Convolutional Network Based Link Prediction on a Distributed G...
Scalable Graph Convolutional Network Based Link Prediction on a Distributed G...Scalable Graph Convolutional Network Based Link Prediction on a Distributed G...
Scalable Graph Convolutional Network Based Link Prediction on a Distributed G...
 
Massive-Scale Analytics Applied to Real-World Problems
Massive-Scale Analytics Applied to Real-World ProblemsMassive-Scale Analytics Applied to Real-World Problems
Massive-Scale Analytics Applied to Real-World Problems
 

More from Jason Riedy

Lucata at the HPEC GraphBLAS BoF
Lucata at the HPEC GraphBLAS BoFLucata at the HPEC GraphBLAS BoF
Lucata at the HPEC GraphBLAS BoFJason Riedy
 
LAGraph 2021-10-13
LAGraph 2021-10-13LAGraph 2021-10-13
LAGraph 2021-10-13Jason Riedy
 
Lucata at the HPEC GraphBLAS BoF
Lucata at the HPEC GraphBLAS BoFLucata at the HPEC GraphBLAS BoF
Lucata at the HPEC GraphBLAS BoFJason Riedy
 
Graph analysis and novel architectures
Graph analysis and novel architecturesGraph analysis and novel architectures
Graph analysis and novel architecturesJason Riedy
 
GraphBLAS and Emus
GraphBLAS and EmusGraphBLAS and Emus
GraphBLAS and EmusJason Riedy
 
Reproducible Linear Algebra from Application to Architecture
Reproducible Linear Algebra from Application to ArchitectureReproducible Linear Algebra from Application to Architecture
Reproducible Linear Algebra from Application to ArchitectureJason Riedy
 
PEARC19: Wrangling Rogues: A Case Study on Managing Experimental Post-Moore A...
PEARC19: Wrangling Rogues: A Case Study on Managing Experimental Post-Moore A...PEARC19: Wrangling Rogues: A Case Study on Managing Experimental Post-Moore A...
PEARC19: Wrangling Rogues: A Case Study on Managing Experimental Post-Moore A...Jason Riedy
 
ICIAM 2019: Reproducible Linear Algebra from Application to Architecture
ICIAM 2019: Reproducible Linear Algebra from Application to ArchitectureICIAM 2019: Reproducible Linear Algebra from Application to Architecture
ICIAM 2019: Reproducible Linear Algebra from Application to ArchitectureJason Riedy
 
Novel Architectures for Applications in Data Science and Beyond
Novel Architectures for Applications in Data Science and BeyondNovel Architectures for Applications in Data Science and Beyond
Novel Architectures for Applications in Data Science and BeyondJason Riedy
 
Characterization of Emu Chick with Microbenchmarks
Characterization of Emu Chick with MicrobenchmarksCharacterization of Emu Chick with Microbenchmarks
Characterization of Emu Chick with MicrobenchmarksJason Riedy
 
CRNCH 2018 Summit: Rogues Gallery Update
CRNCH 2018 Summit: Rogues Gallery UpdateCRNCH 2018 Summit: Rogues Gallery Update
CRNCH 2018 Summit: Rogues Gallery UpdateJason Riedy
 
Augmented Arithmetic Operations Proposed for IEEE-754 2018
Augmented Arithmetic Operations Proposed for IEEE-754 2018Augmented Arithmetic Operations Proposed for IEEE-754 2018
Augmented Arithmetic Operations Proposed for IEEE-754 2018Jason Riedy
 
Graph Analysis: New Algorithm Models, New Architectures
Graph Analysis: New Algorithm Models, New ArchitecturesGraph Analysis: New Algorithm Models, New Architectures
Graph Analysis: New Algorithm Models, New ArchitecturesJason Riedy
 
CRNCH Rogues Gallery: A Community Core for Novel Computing Platforms
CRNCH Rogues Gallery: A Community Core for Novel Computing PlatformsCRNCH Rogues Gallery: A Community Core for Novel Computing Platforms
CRNCH Rogues Gallery: A Community Core for Novel Computing PlatformsJason Riedy
 
CRNCH Rogues Gallery: A Community Core for Novel Computing Platforms
CRNCH Rogues Gallery: A Community Core for Novel Computing PlatformsCRNCH Rogues Gallery: A Community Core for Novel Computing Platforms
CRNCH Rogues Gallery: A Community Core for Novel Computing PlatformsJason Riedy
 
Updating PageRank for Streaming Graphs
Updating PageRank for Streaming GraphsUpdating PageRank for Streaming Graphs
Updating PageRank for Streaming GraphsJason Riedy
 
Network Challenge: Error and Sensitivity Analysis
Network Challenge: Error and Sensitivity AnalysisNetwork Challenge: Error and Sensitivity Analysis
Network Challenge: Error and Sensitivity AnalysisJason Riedy
 
STING: Spatio-Temporal Interaction Networks and Graphs for Intel Platforms
STING: Spatio-Temporal Interaction Networks and Graphs for Intel PlatformsSTING: Spatio-Temporal Interaction Networks and Graphs for Intel Platforms
STING: Spatio-Temporal Interaction Networks and Graphs for Intel PlatformsJason Riedy
 
STING: Spatio-Temporal Interaction Networks and Graphs for Intel Platforms
STING: Spatio-Temporal Interaction Networks and Graphs for Intel PlatformsSTING: Spatio-Temporal Interaction Networks and Graphs for Intel Platforms
STING: Spatio-Temporal Interaction Networks and Graphs for Intel PlatformsJason Riedy
 
SIAM Annual Meeting 2012: Streaming Graph Analytics for Massive Graphs
SIAM Annual Meeting 2012: Streaming Graph Analytics for Massive GraphsSIAM Annual Meeting 2012: Streaming Graph Analytics for Massive Graphs
SIAM Annual Meeting 2012: Streaming Graph Analytics for Massive GraphsJason Riedy
 

More from Jason Riedy (20)

Lucata at the HPEC GraphBLAS BoF
Lucata at the HPEC GraphBLAS BoFLucata at the HPEC GraphBLAS BoF
Lucata at the HPEC GraphBLAS BoF
 
LAGraph 2021-10-13
LAGraph 2021-10-13LAGraph 2021-10-13
LAGraph 2021-10-13
 
Lucata at the HPEC GraphBLAS BoF
Lucata at the HPEC GraphBLAS BoFLucata at the HPEC GraphBLAS BoF
Lucata at the HPEC GraphBLAS BoF
 
Graph analysis and novel architectures
Graph analysis and novel architecturesGraph analysis and novel architectures
Graph analysis and novel architectures
 
GraphBLAS and Emus
GraphBLAS and EmusGraphBLAS and Emus
GraphBLAS and Emus
 
Reproducible Linear Algebra from Application to Architecture
Reproducible Linear Algebra from Application to ArchitectureReproducible Linear Algebra from Application to Architecture
Reproducible Linear Algebra from Application to Architecture
 
PEARC19: Wrangling Rogues: A Case Study on Managing Experimental Post-Moore A...
PEARC19: Wrangling Rogues: A Case Study on Managing Experimental Post-Moore A...PEARC19: Wrangling Rogues: A Case Study on Managing Experimental Post-Moore A...
PEARC19: Wrangling Rogues: A Case Study on Managing Experimental Post-Moore A...
 
ICIAM 2019: Reproducible Linear Algebra from Application to Architecture
ICIAM 2019: Reproducible Linear Algebra from Application to ArchitectureICIAM 2019: Reproducible Linear Algebra from Application to Architecture
ICIAM 2019: Reproducible Linear Algebra from Application to Architecture
 
Novel Architectures for Applications in Data Science and Beyond
Novel Architectures for Applications in Data Science and BeyondNovel Architectures for Applications in Data Science and Beyond
Novel Architectures for Applications in Data Science and Beyond
 
Characterization of Emu Chick with Microbenchmarks
Characterization of Emu Chick with MicrobenchmarksCharacterization of Emu Chick with Microbenchmarks
Characterization of Emu Chick with Microbenchmarks
 
CRNCH 2018 Summit: Rogues Gallery Update
CRNCH 2018 Summit: Rogues Gallery UpdateCRNCH 2018 Summit: Rogues Gallery Update
CRNCH 2018 Summit: Rogues Gallery Update
 
Augmented Arithmetic Operations Proposed for IEEE-754 2018
Augmented Arithmetic Operations Proposed for IEEE-754 2018Augmented Arithmetic Operations Proposed for IEEE-754 2018
Augmented Arithmetic Operations Proposed for IEEE-754 2018
 
Graph Analysis: New Algorithm Models, New Architectures
Graph Analysis: New Algorithm Models, New ArchitecturesGraph Analysis: New Algorithm Models, New Architectures
Graph Analysis: New Algorithm Models, New Architectures
 
CRNCH Rogues Gallery: A Community Core for Novel Computing Platforms
CRNCH Rogues Gallery: A Community Core for Novel Computing PlatformsCRNCH Rogues Gallery: A Community Core for Novel Computing Platforms
CRNCH Rogues Gallery: A Community Core for Novel Computing Platforms
 
CRNCH Rogues Gallery: A Community Core for Novel Computing Platforms
CRNCH Rogues Gallery: A Community Core for Novel Computing PlatformsCRNCH Rogues Gallery: A Community Core for Novel Computing Platforms
CRNCH Rogues Gallery: A Community Core for Novel Computing Platforms
 
Updating PageRank for Streaming Graphs
Updating PageRank for Streaming GraphsUpdating PageRank for Streaming Graphs
Updating PageRank for Streaming Graphs
 
Network Challenge: Error and Sensitivity Analysis
Network Challenge: Error and Sensitivity AnalysisNetwork Challenge: Error and Sensitivity Analysis
Network Challenge: Error and Sensitivity Analysis
 
STING: Spatio-Temporal Interaction Networks and Graphs for Intel Platforms
STING: Spatio-Temporal Interaction Networks and Graphs for Intel PlatformsSTING: Spatio-Temporal Interaction Networks and Graphs for Intel Platforms
STING: Spatio-Temporal Interaction Networks and Graphs for Intel Platforms
 
STING: Spatio-Temporal Interaction Networks and Graphs for Intel Platforms
STING: Spatio-Temporal Interaction Networks and Graphs for Intel PlatformsSTING: Spatio-Temporal Interaction Networks and Graphs for Intel Platforms
STING: Spatio-Temporal Interaction Networks and Graphs for Intel Platforms
 
SIAM Annual Meeting 2012: Streaming Graph Analytics for Massive Graphs
SIAM Annual Meeting 2012: Streaming Graph Analytics for Massive GraphsSIAM Annual Meeting 2012: Streaming Graph Analytics for Massive Graphs
SIAM Annual Meeting 2012: Streaming Graph Analytics for Massive Graphs
 

High-Performance Analysis of Streaming Graphs

  • 1. High-Performance Analysis of Streaming Graphs E. Jason Riedy School of Computational Science and Engineering Georgia Institute of Technology HPC Analytic Workshop, 28 June 2017
  • 2. Outline 1. Motivation and Applications 2. Updating Algorithms for Streaming Graphs 3. Current and Future STINGER Models 4. Extracting Interesting Subgraphs 5. GPUs for Streaming Graphs? 6. Closing
  • 3. (insert prefix here)-scale data analysis. Cyber-security: identify anomalies, malicious actors. Health care: finding outbreaks, population epidemiology. Social networks: advertising, searching, grouping. Intelligence: decisions at scale, regulating markets, smart & sustainable cities. Systems biology: understanding interactions, drug design. Power grid: disruptions, conservation. Simulation: discrete events, cracking meshes. Changes are important. Cannot stop the world...
  • 4. Why Graphs? Another tool, like dense and sparse linear algebra. • Combine things with pairwise relationships. • Smaller, more generic than raw data. • Taught (roughly) to all CS students... • Semantic attributions can capture essential relationships. • Traversals can be faster than filtering DB joins. • Provide clear phrasing for queries about relationships.
  • 5. Potential Applications • Social Networks • Identify communities, influences, bridges, trends, anomalies (trends before they happen)... • Potential to help social sciences, city planning, and others with large-scale data. • Cybersecurity • Determine if new connections can access a device or represent a new threat in < 5ms... • Is the transfer by a virus / persistent threat? • Bioinformatics, health • Construct gene sequences, analyze protein interactions, map brain interactions • Credit fraud forensics ⇒ detection ⇒ monitoring • Real-time integration of all the customer’s data
  • 6. Streaming graph data. Network data rates: • Gigabit ethernet: 81k – 1.5M packets per second • Over 130 000 flows per second on 10 GigE (< 7.7 µs per flow) Person-level data rates: • 500M posts per day on Twitter (6k / sec) [1] • 3M posts per minute on Facebook (50k / sec) [2] But often analyze only changes and not the entire graph. Throughput & latency trade off and expose different levels of concurrency. [1] www.internetlivestats.com/twitter-statistics/ [2] www.jeffbullas.com/2015/04/17/21-awesome-facebook-facts-and-statistics-you-need-to-check-out/
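The per-second figures on this slide follow from simple unit conversion. The snippet below is only illustrative arithmetic; the variable names are made up for this sketch.

```python
# Recompute the per-second rates quoted on the slide.
SECONDS_PER_DAY = 24 * 60 * 60

twitter_per_sec = 500e6 / SECONDS_PER_DAY   # 500M posts per day
facebook_per_sec = 3e6 / 60                 # 3M posts per minute
flow_budget_us = 1e6 / 130_000              # microseconds available per flow

print(f"Twitter:  {twitter_per_sec:,.0f} posts/s")
print(f"Facebook: {facebook_per_sec:,.0f} posts/s")
print(f"10 GigE:  {flow_budget_us:.2f} us per flow")
```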
  • 7. Streaming graph analysis. Terminology (more detail later): • Streaming changes into a massive, evolving graph • Will compare models later... • Need to handle deletions as well as insertions Previous STINGER performance results (x86-64): • Data ingest: >2M upd/sec [Ediger, McColl, Poovey, Campbell, & Bader 2014] • Clustering coefficients: >100K upd/sec [Riedy, Meyerhenke, Bader, Ediger, & Mattson 2012] • Connected components: >1M upd/sec [McColl, Green, & Bader 2013] • Community clustering: >100K upd/sec∗ [Riedy & Bader 2013] • PageRank: up to 40× latency improvement [Riedy 2016]
  • 8. Updating Algorithms for Streaming Graphs: 2.1 Overview of STINGER, 2.2 High-level Summary of Selected Algorithms
  • 9. STINGER: Framework for streaming graphs. Slide credit: Rob McColl and David Ediger • OpenMP + sufficiently POSIX-ish • Multiple processes for resilience
  • 10. Current STINGER model. Input: a batch of insertions / deletions applied to the STINGER graph. Steps: (1) Pre-process batch: sort by source vertex, reconcile ins/del. (2) Pre-change hook. (3) Alter graph (may “age off” old edges). (4) Post-change hook. Outputs: affected vertices, change in metric.
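The batch pre-processing step can be sketched as follows. This is a minimal illustration, not STINGER's code: the slide does not say how conflicting insertions and deletions are reconciled, so the sketch assumes the last action on an edge within a batch wins. The function name `preprocess_batch` and the `(op, src, dst)` tuple layout are hypothetical.

```python
def preprocess_batch(batch):
    """Reconcile and sort a batch of edge actions.

    batch: list of (op, src, dst) in arrival order, op is '+' or '-'.
    Assumption (not from the slides): the last action on an edge wins.
    Returns (src, dst, op) tuples sorted by source vertex, then destination.
    """
    last_action = {}
    for op, src, dst in batch:
        last_action[(src, dst)] = op      # later actions overwrite earlier ones
    return sorted((src, dst, op) for (src, dst), op in last_action.items())


batch = [('+', 3, 1), ('+', 0, 2), ('-', 3, 1), ('+', 0, 1), ('+', 0, 2)]
reconciled = preprocess_batch(batch)
print(reconciled)
```

Sorting by source vertex groups all changes to one vertex's edge list together, so each list needs to be visited only once per batch.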
  • 11. Simpler Algorithms • Triangle counting (Ediger, Jiang, Riedy, & Bader, MTAAP 2010) • Re-count around any changes. • Connected components (McColl, Green, & Bader, HIPC 2013) • Insertions only: Trivial, just check labels. • Insertions and deletions: Fix spanning forests. • PageRank (Riedy, GABB 2016) • Solve $(I - \alpha A_\Delta^T D_\Delta^{-1})\,\Delta x = \alpha (A_\Delta^T D_\Delta^{-1} - A^T D^{-1})\,x + r$ [Figures: sketch with vertices v, i, j, m, n; PageRank example with per-vertex percentages (image from Wikipedia)]
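The PageRank update equation on this slide can be checked on a toy graph. The sketch below is a dense, pure-Python illustration under stated assumptions: every vertex keeps at least one out-edge, `r` is taken to be the residual of the previous solve, and the helper names `push` and `solve` are made up here. It is not the implementation from the cited paper.

```python
def push(edges, n, x):
    """Return A^T D^{-1} x for directed (src, dst) edges; D holds out-degrees."""
    outdeg = [0] * n
    for s, _ in edges:
        outdeg[s] += 1
    y = [0.0] * n
    for s, d in edges:
        y[d] += x[s] / outdeg[s]
    return y


def solve(edges, n, alpha, b, iters=300):
    """Solve (I - alpha A^T D^{-1}) z = b by the fixed-point iteration
    z <- alpha A^T D^{-1} z + b, which converges for 0 < alpha < 1."""
    z = list(b)
    for _ in range(iters):
        z = [alpha * p + bi for p, bi in zip(push(edges, n, z), b)]
    return z


n, alpha = 4, 0.85
v = [(1 - alpha) / n] * n                     # teleportation term

old_edges = [(0, 1), (1, 2), (2, 0), (2, 3), (3, 0)]
new_edges = [(0, 1), (1, 2), (1, 3), (2, 0), (3, 0)]  # delete 2->3, insert 1->3

x = solve(old_edges, n, alpha, v)             # PageRank before the change

# Residual of the old solution: r = v - (I - alpha A^T D^{-1}) x (near zero).
old_push = push(old_edges, n, x)
r = [vi - xi + alpha * pi for vi, xi, pi in zip(v, x, old_push)]

# Right-hand side alpha (A_new^T D_new^{-1} - A^T D^{-1}) x + r, then solve.
new_push = push(new_edges, n, x)
rhs = [alpha * (pn - po) + ri for pn, po, ri in zip(new_push, old_push, r)]
dx = solve(new_edges, n, alpha, rhs)

updated = [xi + di for xi, di in zip(x, dx)]
from_scratch = solve(new_edges, n, alpha, v)
print(max(abs(a - b) for a, b in zip(updated, from_scratch)))
```

The right-hand side is nonzero only near the changed edges, which is where a sparse implementation would save work; this dense sketch does not exploit that.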
  • 12. More Complex Algorithms • Community monitoring (Godbolé, MS Thesis 2016) • Track the dendrogram, backtrack around changes. • Simple re-agglomeration (Riedy & Bader, MTAAP 2013) loses quality. • Seed set expansion (Zakrzewska & Bader, SNAM 2016) • Record sequence of merges, backtrack to changes. • Betweenness centrality (Green, McColl, & Bader, SocialCom 2012) • Carefully maintain the BFS trees. • See also: Recent work at Karlsruhe in NetworKit
  • 13. Current and Future STINGER Models: 3.1 How Fast Can We Go? 3.2 A Model for Streaming Validity
  • 14. Is STINGER’s current model good enough? Data ingest rates, R-MAT into R-MAT, scales 24 & 30. [Figure: two log-scale plots versus batch size (1 to 1e+05): update rate (upd/s) and average update time (s), for platforms Power8, Haswell, and Haswell-30] Want to add analysis clients without slowing data ingest! Note that scale 30 starts with 1.1B vertices, 17B edges... (Different STINGER internal parameters.)
  • 15. What if we don’t hold up changes? Additional STINGER model: Analyze concurrently with the graph changes, and produce a result correct for the starting graph and some subset of concurrent changes. [3] Sample of other models: • Put in a query, wait for sufficient data [Phillips, et al.] • Evolving: Sample, accurate w/high-prob. • Classical: dynamic algorithms, versioned data [3] Chunxing Yin, Riedy, Bader. “Validity of Graph Algorithms on Streaming Data.” January 2017. (in submission)
  • 16. Algorithm validity in our model: Example. Can you compute degrees in an undirected graph (no self loops) concurrently with changes? Algorithm: Iterate over vertices, count the number of neighbors. Timeline for a single edge between v1 and v2: compute deg(v1) = 1, the edge is deleted, then compute deg(v2) = 0. The output (1, 0) cannot correspond to an undirected graph plus any subset of concurrent changes. Valid for our model? No! Not incorrect, just not valid for our model.
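The degree example can be replayed sequentially to see why the output is not valid in this model. The sketch below is an illustration with hypothetical names (`apply_changes`, `degrees`): it interleaves the deletion between the two degree reads and then enumerates every output that would be valid, meaning the degrees of the starting graph plus some subset of the concurrent changes.

```python
from itertools import combinations


def apply_changes(adj, changes):
    """Return a copy of the undirected graph with the changes applied in order."""
    g = {v: set(ns) for v, ns in adj.items()}
    for op, u, w in changes:
        if op == 'del':
            g[u].discard(w)
            g[w].discard(u)
        else:
            g[u].add(w)
            g[w].add(u)
    return g


def degrees(adj):
    return [len(adj[v]) for v in sorted(adj)]


start = {1: {2}, 2: {1}}          # a single edge v1 - v2
concurrent = [('del', 1, 2)]      # the change arriving during the analysis

# Interleaved run: read deg(v1), then the deletion lands, then read deg(v2).
live = apply_changes(start, [])
observed = [len(live[1])]
live = apply_changes(live, concurrent)
observed.append(len(live[2]))

# Outputs valid in the model: starting graph plus any subset of the changes.
valid_outputs = [
    degrees(apply_changes(start, subset))
    for k in range(len(concurrent) + 1)
    for subset in combinations(concurrent, k)
]

print(observed, valid_outputs, observed in valid_outputs)
```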
  • 17. Algorithm validity in our model • What is valid? • Typical BFS and follow-ons (betweenness centrality) • Shiloach-Vishkin connected components • PageRank, Katz (using Jacobi the right way) • Triangle counting by region growing • Using data once & saved decisions... • What is invalid? • Making a decision twice in implementations. • ∆-stepping SSSP: Decrease a weight below ∆ • Degree optimization: Cross threshold, miss vertex • Applying old information. • Labeling in S. Kahan’s components alg.
  • 18. Fun properties. Due to Chunxing Yin, under sensible assumptions: • You can produce a single-change stream to demonstrate invalidity. • Algorithms that produce a subgraph of their input cannot be guaranteed to run concurrently with changes and always produce snapshot outputs. In progress: • Validity for streaming! Apply an algorithm valid for our model. Also collect the changes during execution. Now update the result for those changes while more changes accumulate. Repeat. • Verification for debugging, etc.
  • 19. Extracting Interesting Subgraphs: 4.1 Seed Set Expansion, 4.2 Expansion through PageRank and Katz
  • 20. Graphs: Big, nasty hairballs. Yifan Hu’s (AT&T) visualization of the in-2004 data set http://www2.research.att.com/~yifanhu/gallery.html
  • 21. But no shortage of structure... [Figures: in-2004, matrix format, from Davis, Florida Sparse Matrix Collection; Jason’s network via LinkedIn Labs] • Locally, there are clusters or communities. • There are methods for global community detection. • Also need local communities around seeds for queries and targeted analysis.
  • 22. Seed set expansion • Seed set expansion finds the “best” subgraph or communities for a set of vertices of interest. • Many quality criteria: Modularity, conductance-ish, etc. • Want to produce smaller expansions for viz. as well as larger communities for deeper analysis. • Dynamic agglomerative / modularity algorithms update larger communities faster than recomputation [Zakrzewska & Bader]
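One simple way to make seed set expansion concrete is a greedy expansion that minimizes conductance. The sketch below is a static, illustrative variant with hypothetical function names; it is not the dynamic algorithm of Zakrzewska & Bader, and conductance is just one of the quality criteria the slide mentions.

```python
def conductance(adj, s):
    """Edges leaving s divided by the smaller of vol(s) and vol(rest of graph)."""
    cut = sum(1 for u in s for w in adj[u] if w not in s)
    vol_s = sum(len(adj[u]) for u in s)
    vol_rest = sum(len(adj[u]) for u in adj) - vol_s
    denom = min(vol_s, vol_rest)
    return cut / denom if denom > 0 else float("inf")


def expand(adj, seeds):
    """Greedily add the neighbor that lowers conductance most; stop when none does."""
    s = set(seeds)
    best = conductance(adj, s)
    while True:
        frontier = sorted({w for u in s for w in adj[u]} - s)
        if not frontier:
            return s
        score, pick = min((conductance(adj, s | {w}), w) for w in frontier)
        if score >= best:
            return s
        s.add(pick)
        best = score


# Two triangles (0,1,2) and (3,4,5) joined by the single edge 2-3.
edges = [(0, 1), (1, 2), (0, 2), (2, 3), (3, 4), (4, 5), (3, 5)]
adj = {v: set() for v in range(6)}
for u, w in edges:
    adj[u].add(w)
    adj[w].add(u)

community = expand(adj, {0})
print(sorted(community))
```

Starting from vertex 0, the expansion stops at the first triangle because crossing the bridge edge would raise the conductance.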
  • 23. PageRank and Katz centrality. Both PageRank and Katz centrality recover blocks in artificial stochastic block model graphs. [Figure: matrix plots with both axes from 0 to 1000; R-MAT + SBM w/ 1000-vertex blocks] Tracking with Katz recalls ≥ 93% of the growing and shrinking middle community using five random seeds. Nathan, Zakrzewska, Riedy, & Bader, “Local Community Detection in Dynamic Graphs Using Personalized Centrality.” Updating the expanded sets using incremental iterations: Katz: $\Delta x^{(k+1)} = \alpha A_\Delta\,\Delta x^{(k)} + (r - \alpha\,\Delta A\,x)\big|_{\Delta x^{(k+1)}}$
  • 24. Streaming seed set expansion. Work in progress! • Which seed set expansion methods provide subgraphs useful for further analysis? How do the results compare to global analysis? • We do not want to maintain the entire |V| PR or Katz vector, only around |S| where S is the output. • Need to adapt the updating algorithms to be valid in our non-stop model.
  • 25. GPUs for Streaming Graphs? 5.1 cuSTINGER
  • 26. So... Now what? • Maintain these communities / subgraphs on or near accelerators! • Sending changes may help with the connection bandwidth problem. • cuSTINGER [Green & Bader] • A variant of STINGER for NVIDIA GPUs • Ingest at rates over 10^7 updates / sec • Ingest & triangle count updates at up to 2 × 10^6 upd/s (higher in prep!) • Amenable to existing high-performance static analysis kernels like betweenness centrality. • https://github.com/cuStinger
  • 27. So... Now what? • Maintain these communities / subgraphs on or near accelerators! • Sending changes may help with the connection bandwidth problem. • Micron Automata (in progress with Aluru, Roy, and Srivatsava) • Hardware implementation of non-deterministic finite automata • Can be adapted to tackle graph problems! • Theoretically fast subgraph matching...
  • 28. So... Now what? • Maintain these communities / subgraphs on or near accelerators! • Sending changes may help with the connection bandwidth problem. • Others? • Examining FPGA + HMC combinations to move closer to memory (Jeff Young, Rich Vuduc). • Hopefully Emu soon... Interested in others! Center for Research into Novel Computing Hierarchies (CRNCH), dir. Tom Conte, GT
  • 29. Future directions • Of course, continue developing streaming / dynamic / incremental algorithms. • For massive graphs, computing small changes is always a win. • Improving approximations or replacing expensive metrics like betweenness centrality would be great. • Include more external and semantic data. • If vertices are documents or data records, many more measures of similarity. • Only now being exploited in concert with static graph algorithms. STINGER represents only some approaches! There are others.
  • 30. HPC Lab People. Faculty: • David A. Bader • Jason Riedy • Oded Green∗ Included here: • Chunxing Yin • Eisha Nathan • Anita Zakrzewska STINGER: • Robert McColl, • James Fairbanks∗ (GTRI), • Adam McLaughlin∗, • David Ediger∗ (GTRI), • Jason Poovey (GTRI), • Daniel Henderson†, • Karl Jiang†, and • feedback from users in industry, government, academia. Support: DoD, DoE, NSF, Intel, IBM, Oracle, NVIDIA. ∗ Ph.D. related to STINGER. † Other previous students.
  • 31. STINGER: Where do you get it? Home: www.cc.gatech.edu/stinger/ Code: git.cc.gatech.edu/git/project/stinger.git/ Gateway to • code, • development, • documentation, • presentations... Remember: Academic code, but maturing with contributions. Users / contributors / questioners: Georgia Tech, PNNL, CMU, Berkeley, Intel, Cray, NVIDIA, IBM, Federal Government, Ionic Security, Citi, Accenture, ...