Slideshare uses cookies to improve functionality and performance, and to provide you with relevant advertising. If you continue browsing the site, you agree to the use of cookies on this website. See our User Agreement and Privacy Policy.

Slideshare uses cookies to improve functionality and performance, and to provide you with relevant advertising. If you continue browsing the site, you agree to the use of cookies on this website. See our Privacy Policy and User Agreement for details.

Successfully reported this slideshow.

Like this presentation? Why not share!

- The AI Rush by Jean-Baptiste Dumont 1134930 views
- AI and Machine Learning Demystified... by Carol Smith 3646791 views
- 10 facts about jobs in the future by Pew Research Cent... 675466 views
- 2017 holiday survey: An annual anal... by Deloitte United S... 1090108 views
- Harry Surden - Artificial Intellige... by Harry Surden 637231 views
- Inside Google's Numbers in 2017 by Rand Fishkin 1219183 views

174 views

Published on

Published in:
Data & Analytics

No Downloads

Total views

174

On SlideShare

0

From Embeds

0

Number of Embeds

1

Shares

0

Downloads

5

Comments

0

Likes

1

No embeds

No notes for slide

- 1. A New Algorithm Model for Massive-Scale Streaming Graph Analysis E. Jason Riedy, Chunxing Yin, and David A. Bader Georgia Institute of Technology SIAM Workshop on Network Science, 14 July 2017
- 2. Outline Motivation and Applications Current and Future STINGER Models Closing Streaming Graphs — SIAM NS, 14 July 2017 1/19
- 3. Motivation and Applications
- 4. (insert preﬁx here)-scale data analysis Cyber-security Identify anomalies, malicious actors Health care Finding outbreaks, population epidemiology Social networks Advertising, searching, grouping Intelligence Decisions at scale, regulating markets, smart & sustainable cities Systems biology Understanding interactions, drug design Power grid Disruptions, conservation Simulation Discrete events, cracking meshes Changes are important. Cannot stop the world... Streaming Graphs — SIAM NS, 14 July 2017 2/19
- 5. Potential Applications • Social Networks • Identify communities, inﬂuences, bridges, trends, anomalies (trends before they happen)... • Potential to help social sciences, city planning, and others with large-scale data. • Cybersecurity • Determine if new connections can access a device or represent new threat in < 5ms... • Is the transfer by a virus / persistent threat? • Bioinformatics, health • Construct gene sequences, analyze protein interactions, map brain interactions • Credit fraud forensics ⇒ detection ⇒ monitoring • Real-time integration of all the customer’s data Streaming Graphs — SIAM NS, 14 July 2017 3/19
- 6. Streaming graph data Network data rates: • Gigabit ethernet: 81k – 1.5M packets per second • Over 130 000 ﬂows per second on 10 GigE (< 7.7 µs) Person-level data rates: • 500M posts per day on Twitter (6k / sec)1 • 3M posts per minute on Facebook (50k / sec)2 Should analyze only changes and not entire graph. Throughput & latency trade off and expose different levels of concurrency. 1 www.internetlivestats.com/twitter-statistics/ 2 www.jeffbullas.com/2015/04/17/21-awesome-facebook-facts-and-statistics-you-need-to-check-out/ Streaming Graphs — SIAM NS, 14 July 2017 4/19
- 7. Streaming graph analysis Terminology, will go into more details: • Streaming changes into a massive, evolving graph • Will compare models later... • Need to handle deletions as well as insertions Previous STINGER performance results (x86-64): Data ingest >2M upd/sec [Ediger, McColl, Poovey, Campbell, & Bader 2014] Clustering coefﬁcients >100K upd/sec [R, Meyerhenke, B, E, & Mattson 2012] Connected comp. >1M upd/sec [McColl, Green, & B 2013] Community clustering >100K upd/sec∗ [R & B 2013] PageRank Up to 40× latency improvement [R 2016] Streaming Graphs — SIAM NS, 14 July 2017 5/19
- 8. Current and Future STINGER Models
- 9. STINGER: Framework for streaming graphs Slide credit: Rob McColl and David Ediger • OpenMP + sufﬁciently POSIX-ish • Multiple processes for resilience Streaming Graphs — SIAM NS, 14 July 2017 6/19
- 10. Current STINGER model Pre-process batch: Sort by source vertex, reconcile ins/del. Pre-change hook Alter graph (may “age off”old edges) Post-change hook STINGER graph Batch of insertions / deletions Affected vertices Change in metric Streaming Graphs — SIAM NS, 14 July 2017 7/19
- 11. Is STINGER’s current model good enough? Data ingest rates, R-MAT into R-MAT, scales 24 & 30 q q q q q q 1e+02 1e+03 1e+04 1e+05 1e+06 1 10 100 1000 10000 1e+05 Batch size Updaterate(upd/s) platform q Power8 Haswell Haswell−30 q q q q q q0.00316 0.00562 0.01000 0.01778 0.03162 1 10 100 1000 10000 1e+05 Batch size Avg.updatetime(s) platform q Power8 Haswell Haswell−30 Want to add analysis clients without slowing data ingest! Note that scale 30 starts with 1.1B vertices, 17B edges... (Different STINGER internal parameters.) Streaming Graphs — SIAM NS, 14 July 2017 8/19
- 12. What if we don’t hold up changes? When is an algorithm valid? Analyze concurrently with the graph changes, and produce a result correct for the starting graph and some subset of concurrent changes.3 • No locking beyond atomic operations. • No versioned data structure. • No stopping. 3 Chunxing Yin, Riedy, Bader. “Validity of Graph Algorithms on Streaming Data.” 2017. (in submission) Streaming Graphs — SIAM NS, 14 July 2017 9/19
- 13. Sample of other execution models • Put in a query, wait for sufﬁcient data [Phillips, et al. at Sandia] • Different but very interesting model. • Evolving: Sample, accurate w/high-prob. • Difﬁcult to generalize into graph results (e.g. shortest path tree). • Classical: dynamic algorithms, versioned data • Can require drastically more storage, possibly a copy of the graph per property, or more overhead for techniques like read-copy-update. We are assuming we cannot “re-run” the world and must keep up. Streaming Graphs — SIAM NS, 14 July 2017 10/19
- 14. Algorithm validity in our model: Example. Can you compute degrees in an undirected graph (no self loops) concurrently with changes? Algorithm: Iterate over vertices, count the number of neighbors. 1 Compute deg(v1) 1 0 Compute deg(v2) delete edge Cannot correspond to an undirected graph at all! Valid for our model? No! Not incorrect, just not valid for our model. Streaming Graphs — SIAM NS, 14 July 2017 11/19
- 15. Algorithm validity in our model: Example. Can you compute degrees in an undirected graph (no self loops) concurrently with changes? Algorithm: Iterate over edges, increment the degrees of the endpoints. 1 1 Inc deg(v1), deg(v2) 1 1 (later...) delete edge Corresponds to the beginning graph plus a subset of concurrent changes. Valid for our model? Yes! Undirected stored as directed: skip edges with v1 ≥ v2. Streaming Graphs — SIAM NS, 14 July 2017 12/19
- 16. Algorithm validity in our model s w(e1) = 10 w(e2) = 5 → 1 ∆ = 4 • What is valid? • Typical BFS • Shiloach-Vishkin connected components • PageRank (will describe...) • Saved decisions... • What is invalid? • Making a decision twice in implementations • ∆-stepping SSSP: Decrease a weight below ∆ • Degree optimization: Cross threshold, miss vertex • Applying old or different information • Multiply counting triangles: Counts match no graph • Multiple searches: Betweenness centrality • Labeling in S. Kahan’s components alg Streaming Graphs — SIAM NS, 14 July 2017 13/19
- 17. PageRank without stopping Apply Jacobi iteration to the linear system form of PageRank: x(k+1) = αAT D−1 x(k) + (1 − α)v. Amusingly, the residual r(k) = (1 − α)v − (I − αAT D−1 )x(k) = x(k+1) − x(k) . So if r(k) is small, converged to a solution of a system near the graph in the most recent iteration, hence to a graph containing the original plus some subset of changes. Streaming Graphs — SIAM NS, 14 July 2017 14/19
- 18. Fun properties for one-shot queries Due to Chunxing Yin, under sensible assumptions: 1. You can produce a single-change stream to demonstrate invalidity. • Idea: Start with a graph that incorporates all the visible changes, introduce the one change at the right time. 2. Algorithms that produce a subgraph of their input cannot be guaranteed to run concurrently with changes and always produce moment-in-time outputs. • Idea: Any time a snapshot result could happen, delete then re-insert an edge from the output. Streaming Graphs — SIAM NS, 14 July 2017 15/19
- 19. On to streaming... Can we update graph metrics as new data arrives? • Track what changed during the one-shot query. • Update locally around those changes, while other changes are occuring. • If the update is valid, can repeat to follow a streaming graph. Initial ∆0 Upd. w/∆0 ∆1 Upd. w/∆1 ∆2 Example: PageRank. Treat only the changed portions as unconverged. Streaming Graphs — SIAM NS, 14 July 2017 16/19
- 20. Then what? • Many analyses do not scale in performance to graphs with billions of vertices. • But we can extract subgraphs... • without stopping data ingest, and... • update the results! Work in progress, based on PageRank and Katz. Streaming Graphs — SIAM NS, 14 July 2017 17/19
- 21. Closing
- 22. Closing • Summary • Analysis concurrent with graph change can work. • But not all methods are valid. Avoid evaluating conditions or exploring the graph more than once. • Valid updating methods can continue • Future work • Track subgraphs / communities for “slow” analyses • Develop more valid updating methods, approximation results • Consider the debugging problem... • And metadata... Non-stop validity is only one approach! There are others. Streaming Graphs — SIAM NS, 14 July 2017 18/19
- 23. STINGER: Where do you get it? Home: www.cc.gatech.edu/stinger/ Code: git.cc.gatech.edu/git/project/stinger.git/ Gateway to • code, • development, • documentation, • presentations... Remember: Academic code, but maturing with contributions. Users / contributors / questioners: Georgia Tech, PNNL, CMU, Berkeley, Intel, Cray, NVIDIA, IBM, Federal Government, Ionic Security, Citi, Accenture, ... Streaming Graphs — SIAM NS, 14 July 2017 19/19

No public clipboards found for this slide

Be the first to comment