SlideShare a Scribd company logo
1 of 23
Download to read offline
A New Algorithm Model for Massive-Scale
Streaming Graph Analysis
E. Jason Riedy, Chunxing Yin, and David A. Bader
Georgia Institute of Technology
SIAM Workshop on Network Science, 14 July 2017
Outline
Motivation and Applications
Current and Future STINGER Models
Closing
Streaming Graphs — SIAM NS, 14 July 2017 1/19
Motivation and Applications
(insert prefix here)-scale data analysis
Cyber-security Identify anomalies, malicious actors
Health care Finding outbreaks, population epidemiology
Social networks Advertising, searching, grouping
Intelligence Decisions at scale, regulating markets, smart &
sustainable cities
Systems biology Understanding interactions, drug design
Power grid Disruptions, conservation
Simulation Discrete events, cracking meshes
Changes are important. Cannot stop the world...
Streaming Graphs — SIAM NS, 14 July 2017 2/19
Potential Applications
• Social Networks
• Identify communities, influences, bridges, trends,
anomalies (trends before they happen)...
• Potential to help social sciences, city planning, and
others with large-scale data.
• Cybersecurity
• Determine if new connections can access a device or
represent new threat in < 5ms...
• Is the transfer by a virus / persistent threat?
• Bioinformatics, health
• Construct gene sequences, analyze protein
interactions, map brain interactions
• Credit fraud forensics ⇒ detection ⇒ monitoring
• Real-time integration of all the customer’s data
Streaming Graphs — SIAM NS, 14 July 2017 3/19
Streaming graph data
Network data rates:
• Gigabit ethernet: 81k – 1.5M packets per second
• Over 130 000 flows per second on 10 GigE (< 7.7 µs)
Person-level data rates:
• 500M posts per day on Twitter (6k / sec)1
• 3M posts per minute on Facebook (50k / sec)2
Should analyze only changes and not entire graph.
Throughput & latency trade off and expose different
levels of concurrency.
1
www.internetlivestats.com/twitter-statistics/
2
www.jeffbullas.com/2015/04/17/21-awesome-facebook-facts-and-statistics-you-need-to-check-out/
Streaming Graphs — SIAM NS, 14 July 2017 4/19
Streaming graph analysis
Terminology, will go into more details:
• Streaming changes into a massive, evolving graph
• Will compare models later...
• Need to handle deletions as well as insertions
Previous STINGER performance results (x86-64):
Data ingest >2M upd/sec [Ediger, McColl, Poovey, Campbell, &
Bader 2014]
Clustering coefficients >100K upd/sec [R, Meyerhenke, B, E,
& Mattson 2012]
Connected comp. >1M upd/sec [McColl, Green, & B 2013]
Community clustering >100K upd/sec∗
[R & B 2013]
PageRank Up to 40× latency improvement [R 2016]
Streaming Graphs — SIAM NS, 14 July 2017 5/19
Current and Future STINGER
Models
STINGER: Framework for streaming graphs
Slide credit: Rob McColl and David Ediger
• OpenMP + sufficiently POSIX-ish
• Multiple processes for resilience
Streaming Graphs — SIAM NS, 14 July 2017 6/19
Current STINGER model
Pre-process batch:
Sort by source vertex,
reconcile ins/del.
Pre-change hook
Alter graph (may “age off”old edges)
Post-change hook
STINGER
graph
Batch of insertions / deletions
Affected vertices
Change in metric
Streaming Graphs — SIAM NS, 14 July 2017 7/19
Is STINGER’s current model good enough?
Data ingest rates, R-MAT into R-MAT, scales 24 & 30
q
q
q
q
q
q
1e+02
1e+03
1e+04
1e+05
1e+06
1 10 100 1000 10000 1e+05
Batch size
Updaterate(upd/s)
platform q Power8 Haswell Haswell−30
q
q q
q
q q0.00316
0.00562
0.01000
0.01778
0.03162
1 10 100 1000 10000 1e+05
Batch size
Avg.updatetime(s)
platform q Power8 Haswell Haswell−30
Want to add analysis clients without slowing data ingest!
Note that scale 30 starts with 1.1B vertices, 17B edges...
(Different STINGER internal parameters.)
Streaming Graphs — SIAM NS, 14 July 2017 8/19
What if we don’t hold up changes?
When is an algorithm valid?
Analyze concurrently with the graph changes, and
produce a result correct for the starting graph and
some subset of concurrent changes.3
• No locking beyond atomic operations.
• No versioned data structure.
• No stopping.
3
Chunxing Yin, Riedy, Bader. “Validity of Graph Algorithms on
Streaming Data.” 2017. (in submission)
Streaming Graphs — SIAM NS, 14 July 2017 9/19
Sample of other execution models
• Put in a query, wait for sufficient data [Phillips, et al.
at Sandia]
• Different but very interesting model.
• Evolving: Sample, accurate w/high-prob.
• Difficult to generalize into graph results (e.g.
shortest path tree).
• Classical: dynamic algorithms, versioned data
• Can require drastically more storage, possibly a copy
of the graph per property, or more overhead for
techniques like read-copy-update.
We are assuming we cannot “re-run” the world and must
keep up.
Streaming Graphs — SIAM NS, 14 July 2017 10/19
Algorithm validity in our model: Example.
Can you compute degrees in an undirected graph (no self
loops) concurrently with changes?
Algorithm: Iterate over vertices, count the number of
neighbors.
1
Compute deg(v1)
1 0
Compute deg(v2)
delete edge
Cannot correspond to an undirected graph at all!
Valid for our model? No!
Not incorrect, just not valid for our model.
Streaming Graphs — SIAM NS, 14 July 2017 11/19
Algorithm validity in our model: Example.
Can you compute degrees in an undirected graph (no self
loops) concurrently with changes?
Algorithm: Iterate over edges, increment the degrees of
the endpoints.
1 1
Inc deg(v1), deg(v2)
1 1
(later...)
delete edge
Corresponds to the beginning graph plus a subset of
concurrent changes.
Valid for our model? Yes!
Undirected stored as directed: skip edges with v1 ≥ v2.
Streaming Graphs — SIAM NS, 14 July 2017 12/19
Algorithm validity in our model
s
w(e1) = 10
w(e2) = 5 → 1
∆ = 4
• What is valid?
• Typical BFS
• Shiloach-Vishkin connected components
• PageRank (will describe...)
• Saved decisions...
• What is invalid?
• Making a decision twice in implementations
• ∆-stepping SSSP: Decrease a weight below ∆
• Degree optimization: Cross threshold, miss vertex
• Applying old or different information
• Multiply counting triangles: Counts match no graph
• Multiple searches: Betweenness centrality
• Labeling in S. Kahan’s components alg
Streaming Graphs — SIAM NS, 14 July 2017 13/19
PageRank without stopping
Apply Jacobi iteration to the linear system form of
PageRank:
x(k+1)
= αAT
D−1
x(k)
+ (1 − α)v.
Amusingly, the residual
r(k)
= (1 − α)v − (I − αAT
D−1
)x(k)
= x(k+1)
− x(k)
.
So if r(k)
is small, converged to a solution of a system near
the graph in the most recent iteration, hence to a graph
containing the original plus some subset of changes.
Streaming Graphs — SIAM NS, 14 July 2017 14/19
Fun properties for one-shot queries
Due to Chunxing Yin, under sensible assumptions:
1. You can produce a single-change stream to
demonstrate invalidity.
• Idea: Start with a graph that incorporates all the
visible changes, introduce the one change at the
right time.
2. Algorithms that produce a subgraph of their input
cannot be guaranteed to run concurrently with
changes and always produce moment-in-time
outputs.
• Idea: Any time a snapshot result could happen,
delete then re-insert an edge from the output.
Streaming Graphs — SIAM NS, 14 July 2017 15/19
On to streaming...
Can we update graph metrics as new data arrives?
• Track what changed during the one-shot query.
• Update locally around those changes, while other
changes are occuring.
• If the update is valid, can repeat to follow a
streaming graph.
Initial
∆0
Upd. w/∆0
∆1
Upd. w/∆1
∆2
Example: PageRank. Treat only the changed portions as
unconverged.
Streaming Graphs — SIAM NS, 14 July 2017 16/19
Then what?
• Many analyses do not scale in
performance to graphs with
billions of vertices.
• But we can extract
subgraphs...
• without stopping data ingest,
and...
• update the results!
Work in progress, based on PageRank and Katz.
Streaming Graphs — SIAM NS, 14 July 2017 17/19
Closing
Closing
• Summary
• Analysis concurrent with graph change can work.
• But not all methods are valid. Avoid evaluating
conditions or exploring the graph more than once.
• Valid updating methods can continue
• Future work
• Track subgraphs / communities for “slow” analyses
• Develop more valid updating methods,
approximation results
• Consider the debugging problem...
• And metadata...
Non-stop validity is only one approach! There are others.
Streaming Graphs — SIAM NS, 14 July 2017 18/19
STINGER: Where do you get it?
Home: www.cc.gatech.edu/stinger/
Code: git.cc.gatech.edu/git/project/stinger.git/
Gateway to
• code,
• development,
• documentation,
• presentations...
Remember: Academic code, but maturing
with contributions.
Users / contributors / questioners:
Georgia Tech, PNNL, CMU, Berkeley, Intel,
Cray, NVIDIA, IBM, Federal Government,
Ionic Security, Citi, Accenture, ...
Streaming Graphs — SIAM NS, 14 July 2017 19/19

More Related Content

Similar to A New Algorithm Model for Massive-Scale Streaming Graph Analysis

STINGER: Multi-threaded Graph Streaming
STINGER: Multi-threaded Graph StreamingSTINGER: Multi-threaded Graph Streaming
STINGER: Multi-threaded Graph Streaming
Jason Riedy
 
Scalable Machine Learning: The Role of Stratified Data Sharding
Scalable Machine Learning: The Role of Stratified Data ShardingScalable Machine Learning: The Role of Stratified Data Sharding
Scalable Machine Learning: The Role of Stratified Data Sharding
inside-BigData.com
 
DyGraph: A Dynamic Graph Generator and Benchmark Suite : NOTES
DyGraph: A Dynamic Graph Generator and Benchmark Suite : NOTESDyGraph: A Dynamic Graph Generator and Benchmark Suite : NOTES
DyGraph: A Dynamic Graph Generator and Benchmark Suite : NOTES
Subhajit Sahu
 
Scalable Graph Convolutional Network Based Link Prediction on a Distributed G...
Scalable Graph Convolutional Network Based Link Prediction on a Distributed G...Scalable Graph Convolutional Network Based Link Prediction on a Distributed G...
Scalable Graph Convolutional Network Based Link Prediction on a Distributed G...
miyurud
 
Dynamic Batch Parallel Algorithms for Updating Pagerank : SLIDES
Dynamic Batch Parallel Algorithms for Updating Pagerank : SLIDESDynamic Batch Parallel Algorithms for Updating Pagerank : SLIDES
Dynamic Batch Parallel Algorithms for Updating Pagerank : SLIDES
Subhajit Sahu
 

Similar to A New Algorithm Model for Massive-Scale Streaming Graph Analysis (20)

High-Performance Analysis of Streaming Graphs
High-Performance Analysis of Streaming Graphs High-Performance Analysis of Streaming Graphs
High-Performance Analysis of Streaming Graphs
 
High-Performance Analysis of Streaming Graphs
High-Performance Analysis of Streaming GraphsHigh-Performance Analysis of Streaming Graphs
High-Performance Analysis of Streaming Graphs
 
Benchmarking graph databases on the problem of community detection
Benchmarking graph databases on the problem of community detectionBenchmarking graph databases on the problem of community detection
Benchmarking graph databases on the problem of community detection
 
Benchmarking graph databases on the problem of community detection
Benchmarking graph databases on the problem of community detectionBenchmarking graph databases on the problem of community detection
Benchmarking graph databases on the problem of community detection
 
Distributed deep learning
Distributed deep learningDistributed deep learning
Distributed deep learning
 
STINGER: Multi-threaded Graph Streaming
STINGER: Multi-threaded Graph StreamingSTINGER: Multi-threaded Graph Streaming
STINGER: Multi-threaded Graph Streaming
 
Scalable Machine Learning: The Role of Stratified Data Sharding
Scalable Machine Learning: The Role of Stratified Data ShardingScalable Machine Learning: The Role of Stratified Data Sharding
Scalable Machine Learning: The Role of Stratified Data Sharding
 
Representation Learning on Complex Graphs
Representation Learning on Complex GraphsRepresentation Learning on Complex Graphs
Representation Learning on Complex Graphs
 
DyGraph: A Dynamic Graph Generator and Benchmark Suite : NOTES
DyGraph: A Dynamic Graph Generator and Benchmark Suite : NOTESDyGraph: A Dynamic Graph Generator and Benchmark Suite : NOTES
DyGraph: A Dynamic Graph Generator and Benchmark Suite : NOTES
 
Graph Analysis Trends and Opportunities -- CMG Performance and Capacity 2014
Graph Analysis Trends and Opportunities -- CMG Performance and Capacity 2014Graph Analysis Trends and Opportunities -- CMG Performance and Capacity 2014
Graph Analysis Trends and Opportunities -- CMG Performance and Capacity 2014
 
A Hacking Toolset for Big Tabular Files (3)
A Hacking Toolset for Big Tabular Files (3)A Hacking Toolset for Big Tabular Files (3)
A Hacking Toolset for Big Tabular Files (3)
 
11.concept for a web map implementation with faster query response
11.concept for a web map implementation with faster query response11.concept for a web map implementation with faster query response
11.concept for a web map implementation with faster query response
 
Concept for a web map implementation with faster query response
Concept for a web map implementation with faster query responseConcept for a web map implementation with faster query response
Concept for a web map implementation with faster query response
 
Scalable Graph Convolutional Network Based Link Prediction on a Distributed G...
Scalable Graph Convolutional Network Based Link Prediction on a Distributed G...Scalable Graph Convolutional Network Based Link Prediction on a Distributed G...
Scalable Graph Convolutional Network Based Link Prediction on a Distributed G...
 
IEEE Big data 2016 Title and Abstract
IEEE Big data  2016 Title and AbstractIEEE Big data  2016 Title and Abstract
IEEE Big data 2016 Title and Abstract
 
Dynamic Batch Parallel Algorithms for Updating Pagerank : SLIDES
Dynamic Batch Parallel Algorithms for Updating Pagerank : SLIDESDynamic Batch Parallel Algorithms for Updating Pagerank : SLIDES
Dynamic Batch Parallel Algorithms for Updating Pagerank : SLIDES
 
BIG GRAPH: TOOLS, TECHNIQUES, ISSUES, CHALLENGES AND FUTURE DIRECTIONS
BIG GRAPH: TOOLS, TECHNIQUES, ISSUES, CHALLENGES AND FUTURE DIRECTIONSBIG GRAPH: TOOLS, TECHNIQUES, ISSUES, CHALLENGES AND FUTURE DIRECTIONS
BIG GRAPH: TOOLS, TECHNIQUES, ISSUES, CHALLENGES AND FUTURE DIRECTIONS
 
Big Graph : Tools, Techniques, Issues, Challenges and Future Directions
Big Graph : Tools, Techniques, Issues, Challenges and Future Directions Big Graph : Tools, Techniques, Issues, Challenges and Future Directions
Big Graph : Tools, Techniques, Issues, Challenges and Future Directions
 
Ling liu part 01:big graph processing
Ling liu part 01:big graph processingLing liu part 01:big graph processing
Ling liu part 01:big graph processing
 
Data analytics in computer networking
Data analytics in computer networkingData analytics in computer networking
Data analytics in computer networking
 

More from Jason Riedy

More from Jason Riedy (20)

Lucata at the HPEC GraphBLAS BoF
Lucata at the HPEC GraphBLAS BoFLucata at the HPEC GraphBLAS BoF
Lucata at the HPEC GraphBLAS BoF
 
LAGraph 2021-10-13
LAGraph 2021-10-13LAGraph 2021-10-13
LAGraph 2021-10-13
 
Lucata at the HPEC GraphBLAS BoF
Lucata at the HPEC GraphBLAS BoFLucata at the HPEC GraphBLAS BoF
Lucata at the HPEC GraphBLAS BoF
 
Graph analysis and novel architectures
Graph analysis and novel architecturesGraph analysis and novel architectures
Graph analysis and novel architectures
 
GraphBLAS and Emus
GraphBLAS and EmusGraphBLAS and Emus
GraphBLAS and Emus
 
Reproducible Linear Algebra from Application to Architecture
Reproducible Linear Algebra from Application to ArchitectureReproducible Linear Algebra from Application to Architecture
Reproducible Linear Algebra from Application to Architecture
 
PEARC19: Wrangling Rogues: A Case Study on Managing Experimental Post-Moore A...
PEARC19: Wrangling Rogues: A Case Study on Managing Experimental Post-Moore A...PEARC19: Wrangling Rogues: A Case Study on Managing Experimental Post-Moore A...
PEARC19: Wrangling Rogues: A Case Study on Managing Experimental Post-Moore A...
 
ICIAM 2019: Reproducible Linear Algebra from Application to Architecture
ICIAM 2019: Reproducible Linear Algebra from Application to ArchitectureICIAM 2019: Reproducible Linear Algebra from Application to Architecture
ICIAM 2019: Reproducible Linear Algebra from Application to Architecture
 
Novel Architectures for Applications in Data Science and Beyond
Novel Architectures for Applications in Data Science and BeyondNovel Architectures for Applications in Data Science and Beyond
Novel Architectures for Applications in Data Science and Beyond
 
Characterization of Emu Chick with Microbenchmarks
Characterization of Emu Chick with MicrobenchmarksCharacterization of Emu Chick with Microbenchmarks
Characterization of Emu Chick with Microbenchmarks
 
CRNCH 2018 Summit: Rogues Gallery Update
CRNCH 2018 Summit: Rogues Gallery UpdateCRNCH 2018 Summit: Rogues Gallery Update
CRNCH 2018 Summit: Rogues Gallery Update
 
Augmented Arithmetic Operations Proposed for IEEE-754 2018
Augmented Arithmetic Operations Proposed for IEEE-754 2018Augmented Arithmetic Operations Proposed for IEEE-754 2018
Augmented Arithmetic Operations Proposed for IEEE-754 2018
 
Graph Analysis: New Algorithm Models, New Architectures
Graph Analysis: New Algorithm Models, New ArchitecturesGraph Analysis: New Algorithm Models, New Architectures
Graph Analysis: New Algorithm Models, New Architectures
 
CRNCH Rogues Gallery: A Community Core for Novel Computing Platforms
CRNCH Rogues Gallery: A Community Core for Novel Computing PlatformsCRNCH Rogues Gallery: A Community Core for Novel Computing Platforms
CRNCH Rogues Gallery: A Community Core for Novel Computing Platforms
 
CRNCH Rogues Gallery: A Community Core for Novel Computing Platforms
CRNCH Rogues Gallery: A Community Core for Novel Computing PlatformsCRNCH Rogues Gallery: A Community Core for Novel Computing Platforms
CRNCH Rogues Gallery: A Community Core for Novel Computing Platforms
 
Updating PageRank for Streaming Graphs
Updating PageRank for Streaming GraphsUpdating PageRank for Streaming Graphs
Updating PageRank for Streaming Graphs
 
Graph Analysis Beyond Linear Algebra
Graph Analysis Beyond Linear AlgebraGraph Analysis Beyond Linear Algebra
Graph Analysis Beyond Linear Algebra
 
Network Challenge: Error and Sensitivity Analysis
Network Challenge: Error and Sensitivity AnalysisNetwork Challenge: Error and Sensitivity Analysis
Network Challenge: Error and Sensitivity Analysis
 
STING: Spatio-Temporal Interaction Networks and Graphs for Intel Platforms
STING: Spatio-Temporal Interaction Networks and Graphs for Intel PlatformsSTING: Spatio-Temporal Interaction Networks and Graphs for Intel Platforms
STING: Spatio-Temporal Interaction Networks and Graphs for Intel Platforms
 
STING: Spatio-Temporal Interaction Networks and Graphs for Intel Platforms
STING: Spatio-Temporal Interaction Networks and Graphs for Intel PlatformsSTING: Spatio-Temporal Interaction Networks and Graphs for Intel Platforms
STING: Spatio-Temporal Interaction Networks and Graphs for Intel Platforms
 

Recently uploaded

Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...
Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...
Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...
gajnagarg
 
Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...
nirzagarg
 
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
Klinik kandungan
 
Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...
Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...
Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...
gajnagarg
 
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
nirzagarg
 
Gartner's Data Analytics Maturity Model.pptx
Gartner's Data Analytics Maturity Model.pptxGartner's Data Analytics Maturity Model.pptx
Gartner's Data Analytics Maturity Model.pptx
chadhar227
 
怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制
怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制
怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制
vexqp
 
Top profile Call Girls In Indore [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Indore [ 7014168258 ] Call Me For Genuine Models We...Top profile Call Girls In Indore [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Indore [ 7014168258 ] Call Me For Genuine Models We...
gajnagarg
 
+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...
+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...
+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...
Health
 
Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...
Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...
Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...
nirzagarg
 
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样
wsppdmt
 

Recently uploaded (20)

Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...
Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...
Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...
 
Vadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book now
Vadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book nowVadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book now
Vadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book now
 
Statistics notes ,it includes mean to index numbers
Statistics notes ,it includes mean to index numbersStatistics notes ,it includes mean to index numbers
Statistics notes ,it includes mean to index numbers
 
Predicting HDB Resale Prices - Conducting Linear Regression Analysis With Orange
Predicting HDB Resale Prices - Conducting Linear Regression Analysis With OrangePredicting HDB Resale Prices - Conducting Linear Regression Analysis With Orange
Predicting HDB Resale Prices - Conducting Linear Regression Analysis With Orange
 
Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...
 
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
 
Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...
Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...
Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...
 
Dubai Call Girls Peeing O525547819 Call Girls Dubai
Dubai Call Girls Peeing O525547819 Call Girls DubaiDubai Call Girls Peeing O525547819 Call Girls Dubai
Dubai Call Girls Peeing O525547819 Call Girls Dubai
 
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
 
Fun all Day Call Girls in Jaipur 9332606886 High Profile Call Girls You Ca...
Fun all Day Call Girls in Jaipur   9332606886  High Profile Call Girls You Ca...Fun all Day Call Girls in Jaipur   9332606886  High Profile Call Girls You Ca...
Fun all Day Call Girls in Jaipur 9332606886 High Profile Call Girls You Ca...
 
Discover Why Less is More in B2B Research
Discover Why Less is More in B2B ResearchDiscover Why Less is More in B2B Research
Discover Why Less is More in B2B Research
 
Gartner's Data Analytics Maturity Model.pptx
Gartner's Data Analytics Maturity Model.pptxGartner's Data Analytics Maturity Model.pptx
Gartner's Data Analytics Maturity Model.pptx
 
怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制
怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制
怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制
 
Kings of Saudi Arabia, information about them
Kings of Saudi Arabia, information about themKings of Saudi Arabia, information about them
Kings of Saudi Arabia, information about them
 
Top profile Call Girls In Indore [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Indore [ 7014168258 ] Call Me For Genuine Models We...Top profile Call Girls In Indore [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Indore [ 7014168258 ] Call Me For Genuine Models We...
 
+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...
+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...
+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...
 
Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...
Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...
Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...
 
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样
 
High Profile Call Girls Service in Jalore { 9332606886 } VVIP NISHA Call Girl...
High Profile Call Girls Service in Jalore { 9332606886 } VVIP NISHA Call Girl...High Profile Call Girls Service in Jalore { 9332606886 } VVIP NISHA Call Girl...
High Profile Call Girls Service in Jalore { 9332606886 } VVIP NISHA Call Girl...
 
7. Epi of Chronic respiratory diseases.ppt
7. Epi of Chronic respiratory diseases.ppt7. Epi of Chronic respiratory diseases.ppt
7. Epi of Chronic respiratory diseases.ppt
 

A New Algorithm Model for Massive-Scale Streaming Graph Analysis

  • 1. A New Algorithm Model for Massive-Scale Streaming Graph Analysis E. Jason Riedy, Chunxing Yin, and David A. Bader Georgia Institute of Technology SIAM Workshop on Network Science, 14 July 2017
  • 2. Outline Motivation and Applications Current and Future STINGER Models Closing Streaming Graphs — SIAM NS, 14 July 2017 1/19
  • 4. (insert prefix here)-scale data analysis Cyber-security Identify anomalies, malicious actors Health care Finding outbreaks, population epidemiology Social networks Advertising, searching, grouping Intelligence Decisions at scale, regulating markets, smart & sustainable cities Systems biology Understanding interactions, drug design Power grid Disruptions, conservation Simulation Discrete events, cracking meshes Changes are important. Cannot stop the world... Streaming Graphs — SIAM NS, 14 July 2017 2/19
  • 5. Potential Applications • Social Networks • Identify communities, influences, bridges, trends, anomalies (trends before they happen)... • Potential to help social sciences, city planning, and others with large-scale data. • Cybersecurity • Determine if new connections can access a device or represent new threat in < 5ms... • Is the transfer by a virus / persistent threat? • Bioinformatics, health • Construct gene sequences, analyze protein interactions, map brain interactions • Credit fraud forensics ⇒ detection ⇒ monitoring • Real-time integration of all the customer’s data Streaming Graphs — SIAM NS, 14 July 2017 3/19
  • 6. Streaming graph data Network data rates: • Gigabit ethernet: 81k – 1.5M packets per second • Over 130 000 flows per second on 10 GigE (< 7.7 µs) Person-level data rates: • 500M posts per day on Twitter (6k / sec)1 • 3M posts per minute on Facebook (50k / sec)2 Should analyze only changes and not entire graph. Throughput & latency trade off and expose different levels of concurrency. 1 www.internetlivestats.com/twitter-statistics/ 2 www.jeffbullas.com/2015/04/17/21-awesome-facebook-facts-and-statistics-you-need-to-check-out/ Streaming Graphs — SIAM NS, 14 July 2017 4/19
  • 7. Streaming graph analysis Terminology, will go into more details: • Streaming changes into a massive, evolving graph • Will compare models later... • Need to handle deletions as well as insertions Previous STINGER performance results (x86-64): Data ingest >2M upd/sec [Ediger, McColl, Poovey, Campbell, & Bader 2014] Clustering coefficients >100K upd/sec [R, Meyerhenke, B, E, & Mattson 2012] Connected comp. >1M upd/sec [McColl, Green, & B 2013] Community clustering >100K upd/sec∗ [R & B 2013] PageRank Up to 40× latency improvement [R 2016] Streaming Graphs — SIAM NS, 14 July 2017 5/19
  • 8. Current and Future STINGER Models
  • 9. STINGER: Framework for streaming graphs Slide credit: Rob McColl and David Ediger • OpenMP + sufficiently POSIX-ish • Multiple processes for resilience Streaming Graphs — SIAM NS, 14 July 2017 6/19
  • 10. Current STINGER model Pre-process batch: Sort by source vertex, reconcile ins/del. Pre-change hook Alter graph (may “age off”old edges) Post-change hook STINGER graph Batch of insertions / deletions Affected vertices Change in metric Streaming Graphs — SIAM NS, 14 July 2017 7/19
  • 11. Is STINGER’s current model good enough? Data ingest rates, R-MAT into R-MAT, scales 24 & 30 q q q q q q 1e+02 1e+03 1e+04 1e+05 1e+06 1 10 100 1000 10000 1e+05 Batch size Updaterate(upd/s) platform q Power8 Haswell Haswell−30 q q q q q q0.00316 0.00562 0.01000 0.01778 0.03162 1 10 100 1000 10000 1e+05 Batch size Avg.updatetime(s) platform q Power8 Haswell Haswell−30 Want to add analysis clients without slowing data ingest! Note that scale 30 starts with 1.1B vertices, 17B edges... (Different STINGER internal parameters.) Streaming Graphs — SIAM NS, 14 July 2017 8/19
  • 12. What if we don’t hold up changes? When is an algorithm valid? Analyze concurrently with the graph changes, and produce a result correct for the starting graph and some subset of concurrent changes.3 • No locking beyond atomic operations. • No versioned data structure. • No stopping. 3 Chunxing Yin, Riedy, Bader. “Validity of Graph Algorithms on Streaming Data.” 2017. (in submission) Streaming Graphs — SIAM NS, 14 July 2017 9/19
  • 13. Sample of other execution models • Put in a query, wait for sufficient data [Phillips, et al. at Sandia] • Different but very interesting model. • Evolving: Sample, accurate w/high-prob. • Difficult to generalize into graph results (e.g. shortest path tree). • Classical: dynamic algorithms, versioned data • Can require drastically more storage, possibly a copy of the graph per property, or more overhead for techniques like read-copy-update. We are assuming we cannot “re-run” the world and must keep up. Streaming Graphs — SIAM NS, 14 July 2017 10/19
  • 14. Algorithm validity in our model: Example. Can you compute degrees in an undirected graph (no self loops) concurrently with changes? Algorithm: Iterate over vertices, count the number of neighbors. 1 Compute deg(v1) 1 0 Compute deg(v2) delete edge Cannot correspond to an undirected graph at all! Valid for our model? No! Not incorrect, just not valid for our model. Streaming Graphs — SIAM NS, 14 July 2017 11/19
  • 15. Algorithm validity in our model: Example. Can you compute degrees in an undirected graph (no self loops) concurrently with changes? Algorithm: Iterate over edges, increment the degrees of the endpoints. 1 1 Inc deg(v1), deg(v2) 1 1 (later...) delete edge Corresponds to the beginning graph plus a subset of concurrent changes. Valid for our model? Yes! Undirected stored as directed: skip edges with v1 ≥ v2. Streaming Graphs — SIAM NS, 14 July 2017 12/19
  • 16. Algorithm validity in our model s w(e1) = 10 w(e2) = 5 → 1 ∆ = 4 • What is valid? • Typical BFS • Shiloach-Vishkin connected components • PageRank (will describe...) • Saved decisions... • What is invalid? • Making a decision twice in implementations • ∆-stepping SSSP: Decrease a weight below ∆ • Degree optimization: Cross threshold, miss vertex • Applying old or different information • Multiply counting triangles: Counts match no graph • Multiple searches: Betweenness centrality • Labeling in S. Kahan’s components alg Streaming Graphs — SIAM NS, 14 July 2017 13/19
  • 17. PageRank without stopping Apply Jacobi iteration to the linear system form of PageRank: x(k+1) = αAT D−1 x(k) + (1 − α)v. Amusingly, the residual r(k) = (1 − α)v − (I − αAT D−1 )x(k) = x(k+1) − x(k) . So if r(k) is small, converged to a solution of a system near the graph in the most recent iteration, hence to a graph containing the original plus some subset of changes. Streaming Graphs — SIAM NS, 14 July 2017 14/19
  • 18. Fun properties for one-shot queries Due to Chunxing Yin, under sensible assumptions: 1. You can produce a single-change stream to demonstrate invalidity. • Idea: Start with a graph that incorporates all the visible changes, introduce the one change at the right time. 2. Algorithms that produce a subgraph of their input cannot be guaranteed to run concurrently with changes and always produce moment-in-time outputs. • Idea: Any time a snapshot result could happen, delete then re-insert an edge from the output. Streaming Graphs — SIAM NS, 14 July 2017 15/19
  • 19. On to streaming... Can we update graph metrics as new data arrives? • Track what changed during the one-shot query. • Update locally around those changes, while other changes are occuring. • If the update is valid, can repeat to follow a streaming graph. Initial ∆0 Upd. w/∆0 ∆1 Upd. w/∆1 ∆2 Example: PageRank. Treat only the changed portions as unconverged. Streaming Graphs — SIAM NS, 14 July 2017 16/19
  • 20. Then what? • Many analyses do not scale in performance to graphs with billions of vertices. • But we can extract subgraphs... • without stopping data ingest, and... • update the results! Work in progress, based on PageRank and Katz. Streaming Graphs — SIAM NS, 14 July 2017 17/19
  • 22. Closing • Summary • Analysis concurrent with graph change can work. • But not all methods are valid. Avoid evaluating conditions or exploring the graph more than once. • Valid updating methods can continue • Future work • Track subgraphs / communities for “slow” analyses • Develop more valid updating methods, approximation results • Consider the debugging problem... • And metadata... Non-stop validity is only one approach! There are others. Streaming Graphs — SIAM NS, 14 July 2017 18/19
  • 23. STINGER: Where do you get it? Home: www.cc.gatech.edu/stinger/ Code: git.cc.gatech.edu/git/project/stinger.git/ Gateway to • code, • development, • documentation, • presentations... Remember: Academic code, but maturing with contributions. Users / contributors / questioners: Georgia Tech, PNNL, CMU, Berkeley, Intel, Cray, NVIDIA, IBM, Federal Government, Ionic Security, Citi, Accenture, ... Streaming Graphs — SIAM NS, 14 July 2017 19/19