ICASSP 2012: Analysis of Streaming Social Networks and Graphs on Multicore Architectures
Upcoming SlideShare
Loading in...5
×

Like this? Share it with your network

Share

ICASSP 2012: Analysis of Streaming Social Networks and Graphs on Multicore Architectures

  • 872 views
Uploaded on

Analyzing static snapshots of massive, graph-structured data cannot keep pace with the growth of social networks, financial transactions, and other valuable data sources. We introduce a framework,......

Analyzing static snapshots of massive, graph-structured data cannot keep pace with the growth of social networks, financial transactions, and other valuable data sources. We introduce a framework, STING (Spatio-Temporal Interaction Networks and Graphs), and evaluate its performance on multicore, multisocket Intel(TM)-based platforms. STING achieves rates of around 100,000 edge updates per second on large, dynamic graphs with a single, general data structure. We achieve speed-ups of up to 1000$\times$ over parallel static computation, improve monitoring a dynamic graph's connected components, and show an exact algorithm for maintaining local clustering coefficients performs better on Intel-based platforms than our earlier approximate algorithm.

More in: Technology , Education
  • Full Name Full Name Comment goes here.
    Are you sure you want to
    Your message goes here
    Be the first to comment
    Be the first to like this
No Downloads

Views

Total Views
872
On Slideshare
872
From Embeds
0
Number of Embeds
0

Actions

Shares
Downloads
11
Comments
0
Likes
0

Embeds 0

No embeds

Report content

Flagged as inappropriate Flag as inappropriate
Flag as inappropriate

Select your reason for flagging this presentation as inappropriate.

Cancel
    No notes for slide

Transcript

  • 1. Analysis of Streaming Social Networks andGraphs on Multicore ArchitecturesJason Riedy, Henning Meyerhenke (KIT), David A. Bader,David Ediger, Timothy G. Mattson (Intel MPRL) 29 March, 2012
  • 2. Outline Motivation Graphs are pervasive! Why analyze data streams? Overall streaming approach Technical STING and STINGER Clustering coefficients Connected components ConclusionsICASSP 2012—Analysis of Streaming Social Networks and Graphs on Multicore Architectures—Jason Riedy 2/26
  • 3. Exascale Data Analysis Health care Finding outbreaks, population epidemiology Social networks Advertising, searching, grouping Intelligence Decisions at scale, regulating algorithms Systems biology Understanding interactions, drug design Power grid Disruptions, conservation Simulation Discrete events, cracking meshes The data is full of semantically rich relationships. Graphs! Graphs! Graphs!ICASSP 2012—Analysis of Streaming Social Networks and Graphs on Multicore Architectures—Jason Riedy 3/26
  • 4. These are not easy graphs. Yifan Hu’s (AT&T) visualization of the Livejournal data setICASSP 2012—Analysis of Streaming Social Networks and Graphs on Multicore Architectures—Jason Riedy 4/26
  • 5. But no shortage of structure... Protein interactions, Giot et al., “A Protein Interaction Map of Drosophila melanogaster”, Jason’s network via LinkedIn Labs Science 302, 1722-1736, 2003. • Globally, there rarely are good, balanced separators in the scientific computing sense. • Locally, there are clusters or communities and many levels of detail.ICASSP 2012—Analysis of Streaming Social Networks and Graphs on Multicore Architectures—Jason Riedy 5/26
  • 6. Also no shortage of data... Existing (some out-of-date) data volumes NYSE 1.5 TB generated daily into a maintained 8 PB archive Google “Several dozen” 1PB data sets (CACM, Jan 2010) LHC 15 PB per year (avg. 21 TB daily) http://public.web.cern.ch/public/en/lhc/ Computing-en.html Wal-Mart 536 TB, 1B entries daily (2006) EBay 2 PB, traditional DB, and 6.5PB streaming, 17 trillion records, 1.5B records/day, each web click is 50-150 details. http://www.dbms2.com/2009/04/30/ ebays-two-enormous-data-warehouses/ Faceboot 845 M users... and growing. • All data is rich and semantic (graphs!) and changing. • Base data rates include items and not relationships.ICASSP 2012—Analysis of Streaming Social Networks and Graphs on Multicore Architectures—Jason Riedy 6/26
  • 7. General approaches • High-performance static graph analysis • Develop techniques that apply to unchanging massive graphs. • Provides useful after-the-fact information, starting points. • Serves many existing applications well: market research, much bioinformatics, ... • High-performance streaming graph analysis • Focus on the dynamic changes within massive graphs. • Find trends or new information as they appear. • Serves upcoming applications: fault or threat detection, trend analysis, ... Both very important to different areas. Remaining focus is on streaming. Note: Not CS theory streaming, but analysis of streaming data.ICASSP 2012—Analysis of Streaming Social Networks and Graphs on Multicore Architectures—Jason Riedy 7/26
  • 8. Why analyze data streams? Data transfer • 1 Gb Ethernet: 8.7TB daily at Data volumes 100%, 5-6TB daily realistic NYSE 1.5TB daily • Multi-TB storage on 10GE: 300TB LHC 41TB daily daily read, 90TB daily write Facebook Who knows? • CPU ↔ Memory: QPI,HT: 2PB/day@100% Data growth Speed growth • Facebook: > 2×/yr • Ethernet/IB/etc.: 4× in next 2 • Twitter: > 10×/yr years. Maybe. • Growing sources: • Flash storage, direct: 10× write, Bioinformatics, 4× read. Relatively huge cost. µsensors, securityICASSP 2012—Analysis of Streaming Social Networks and Graphs on Multicore Architectures—Jason Riedy 8/26
  • 9. Overall streaming approach Protein interactions, Giot et al., “A Protein Interaction Map of Drosophila melanogaster”, Jason’s network via LinkedIn Labs Science 302, 1722-1736, 2003. Assumptions • A graph represents some real-world phenomenon. • But not necessarily exactly! • Noise comes from lost updates, partial information, ...ICASSP 2012—Analysis of Streaming Social Networks and Graphs on Multicore Architectures—Jason Riedy 9/26
  • 10. Overall streaming approach Protein interactions, Giot et al., “A Protein Interaction Map of Drosophila melanogaster”, Jason’s network via LinkedIn Labs Science 302, 1722-1736, 2003. Assumptions • We target massive, “social network” graphs. • Small diameter, power-law degrees • Small changes in massive graphs often are unrelated.ICASSP 2012—Analysis of Streaming Social Networks and Graphs on Multicore Architectures—Jason Riedy 10/26
  • 11. Overall streaming approach Protein interactions, Giot et al., “A Protein Interaction Map of Drosophila melanogaster”, Jason’s network via LinkedIn Labs Science 302, 1722-1736, 2003. Assumptions • The graph changes, but we don’t need a continuous view. • We can accumulate changes into batches... • But not so many that it impedes responsiveness.ICASSP 2012—Analysis of Streaming Social Networks and Graphs on Multicore Architectures—Jason Riedy 11/26
  • 12. Difficulties for performance • What partitioning methods apply? • Geometric? Nope. • Balanced? Nope. • Is there a single, useful decomposition? Not likely. • Some partitions exist, but they don’t often help with balanced bisection or memory locality. • Performance needs new approaches, not just standard scientific Jason’s network via LinkedIn Labs computing methods.ICASSP 2012—Analysis of Streaming Social Networks and Graphs on Multicore Architectures—Jason Riedy 12/26
  • 13. STING’s focus Control action prediction summary Source data Simulation / query Viz • STING manages queries against changing graph data. • Visualization and control often are application specific. • Ideal: Maintain many persistent graph analysis kernels. • Keep one current snapshot of the graph resident. • Let kernels maintain smaller histories. • Also (a harder goal), coordinate the kernels’ cooperation. • Gather data into a typed graph structure, STINGER.ICASSP 2012—Analysis of Streaming Social Networks and Graphs on Multicore Architectures—Jason Riedy 13/26
  • 14. STINGER STING Extensible Representation: • Rule #1: No explicit locking. • Rely on atomic operations. • Massive graph: Scattered updates, scattered reads rarely conflict. • Time stamps: provide some thresholds for time.ICASSP 2012—Analysis of Streaming Social Networks and Graphs on Multicore Architectures—Jason Riedy 14/26
  • 15. Initial results Prototype STING and STINGER Monitoring the following properties: 1 clustering coefficients, 2 connected components, and 3 community structure (in progress). High-level • Support high rates of change, over 10k updates per second. • Performance scales somewhat with available processing. http://www.cc.gatech.edu/stinger/ICASSP 2012—Analysis of Streaming Social Networks and Graphs on Multicore Architectures—Jason Riedy 15/26
  • 16. Experimental setup Unless otherwise noted Line Model Speed (GHz) Sockets Cores Nehalem X5570 2.93 2 4 Westmere E7-8870 2.40 4 10 • Westmere loaned by Intel (thank you!) • All memory: 1067MHz DDR3, installed appropriately • Implementations: OpenMP, gcc 4.6.1, Linux ≈ 3.0 kernel • Artificial graph and edge stream generated by R-MAT [Chakrabarti, Zhan, & Faloutsos]. • Scale x, edge factor f ⇒ 2x vertices, ≈ f · 2x edges. • Edge actions: 7/8th insertions, 1/8th deletions • Results over five batches of edge actions. • Caveat: No vector instructions, low-level optimizations yet.ICASSP 2012—Analysis of Streaming Social Networks and Graphs on Multicore Architectures—Jason Riedy 16/26
  • 17. Clustering coefficients • Used to measure “small-world-ness” [Watts & Strogatz] and potential i m community structure • Larger clustering coefficient ⇒ more v inter-connected j n • Roughly the ratio of the number of actual to potential triangles • Defined in terms of triplets. • i – v – j is a closed triplet (triangle). • m – v – n is an open triplet. • Clustering coefficient: # of closed triplets / total # of triplets • Locally around v or globally for entire graph.ICASSP 2012—Analysis of Streaming Social Networks and Graphs on Multicore Architectures—Jason Riedy 17/26
  • 18. Updating triangle counts Given Edge {u, v } to be inserted (+) or deleted (-) Approach Search for vertices adjacent to both u and v , update counts on those and u and v Three methods Brute force Intersect neighbors of u and v by iterating over each, O(du dv ) time. Sorted list Sort u’s neighbors. For each neighbor of v , check if in the sorted list. Compressed bits Summarize u’s neighbors in a bit array. Reduces check for v ’s neighbors to O(1) time each. Approximate with Bloom filters. [MTAAP10] All rely on atomic addition.ICASSP 2012—Analysis of Streaming Social Networks and Graphs on Multicore Architectures—Jason Riedy 18/26
  • 19. Batches of 10k actions Brute force Bloom filter Sorted list 1.7e+04 3.4e+04 8.8e+04 1.2e+05 8.8e+04 1.4e+05 105.5 1.5e+04 1.3e+05 1.2e+05 5.1e+03 2.4e+04 2.2e+04 Updates per seconds, both metric and STINGER 3.9e+03 2.0e+04 1.7e+04 q q q q q q q q q 5 q q q q 10 q q qq q q q q q q q q Machine 104.5 q a 4 x E7−8870 q q a 2 x X5570 q q q q q 104 103.5 0 20 40 60 80 0 20 40 60 80 0 20 40 60 80 Threads Graph size: scale 22, edge factor 16ICASSP 2012—Analysis of Streaming Social Networks and Graphs on Multicore Architectures—Jason Riedy 19/26
  • 20. Different batch sizes Brute force Bloom filter Sorted list 105.5 105 q q q q q q q q q q Updates per seconds, both metric and STINGER q q q 100 4.5 q 10 q q q q q q q q q 104 q qq q 3.5 q q 10 105.5 105 q q q q q q q q Machine 1000 104.5 q q 4 x E7−8870 q q q 2 x X5570 104 q q q q q q 103.5 105.5 q q q q q q q 105 qq q q q qqq q q q q q q 10000 q 4.5 q 10 q q q q q q 104 103.5 0 20 40 60 80 0 20 40 60 80 0 20 40 60 80 Threads Graph size: scale 22, edge factor 16ICASSP 2012—Analysis of Streaming Social Networks and Graphs on Multicore Architectures—Jason Riedy 20/26
  • 21. Different batch sizes: Reactivity Brute force Bloom filter Sorted list 100 10−0.5 Seconds between updates, both metric and STINGER 10−1 100 q qq 10−1.5 q q q q q q 10−2 q q q q q q q q q q 10−2.5 q q q q q q q q q 10−3 q q q 100 10−0.5 q q q Machine 10−1 qq 1000 q q q q 4 x E7−8870 10−1.5 q q q q 10−2 q qq q q 2 x X5570 10−2.5 10−3 100 q q q q q 10−0.5 q q q q q q q 10−1 qq qqq 10000 q q q q q q q q q q q 10−1.5 10−2 10−2.5 10−3 0 20 40 60 80 0 20 40 60 80 0 20 40 60 80 Threads Graph size: scale 22, edge factor 16ICASSP 2012—Analysis of Streaming Social Networks and Graphs on Multicore Architectures—Jason Riedy 21/26
  • 22. Connected components • Maintain a mapping from vertex to component. • Global property, unlike triangle counts • In “scale free” social networks: • Often one big component, and • many tiny ones. • Edge changes often sit within components. • Remaining insertions merge components. • Deletions are more difficult...ICASSP 2012—Analysis of Streaming Social Networks and Graphs on Multicore Architectures—Jason Riedy 22/26
  • 23. Connected components: Deleted edges The difficult case • Very few deletions matter. • Determining which matter may require a large graph search. • Re-running static component detection. • (Long history, see related work in [MTAAP11].) • Coping mechanisms: • Heuristics. • Second level of batching.ICASSP 2012—Analysis of Streaming Social Networks and Graphs on Multicore Architectures—Jason Riedy 23/26
  • 24. Deletion heuristics Rule out effect-less deletions • Use the spanning tree by-product of static connected component algorithms. • Ignore deletions when one of the following occur: 1 The deleted edge is not in the spanning tree. 2 If the endpoints share a common neighbor∗ . 3 If the loose endpoint can reach the root∗ . • In the last two (∗), also fix the spanning tree. Rules out 99.7% of deletions.ICASSP 2012—Analysis of Streaming Social Networks and Graphs on Multicore Architectures—Jason Riedy 24/26
  • 25. Connected components: Performance 106 2.4e+03 1.6e+04 6.4e+03 3.2e+03 2.0e+03 105 100 Updates per seconds, both metric and STINGER 104 103 106 1.7e+04 7.7e+04 1.4e+04 1.9e+04 2.0e+04 105 Machine 1000 104 a 4 x E7−8870 a 2 x X5570 3 10 106 5.8e+04 1.3e+05 1.5e+05 5.5e+04 1.1e+05 5 10 10000 104 103 12 4 6 8 12 16 24 32 40 48 56 64 72 80 Threads Graph size: scale 22, edge factor 16ICASSP 2012—Analysis of Streaming Social Networks and Graphs on Multicore Architectures—Jason Riedy 25/26
  • 26. Conclusions • On commercial Intel-based platforms, our STING system supports update rates of • 100 000 updates per second for clustering coefficients, and • 70 000 updates per second for component labels. • The rates are on batches small enough to respond 100 or 70 times per second, respectively. • Batching updates exposes parallelism without much delay. • Reducing graph searches (memory access) with better heuristics improves performance more with fewer paths to memory. Software freely available http://www.cc.gatech.edu/stinger/ICASSP 2012—Analysis of Streaming Social Networks and Graphs on Multicore Architectures—Jason Riedy 26/26
  • 27. Acknowledgment of supportICASSP 2012—Analysis of Streaming Social Networks and Graphs on Multicore Architectures—Jason Riedy 27/26