STING: A Framework for Analyzing Spatio-Temporal Interaction Networks and Graphs

Presentation for NSF IUCRC Center for Hybrid Multicore Productivity Research program meeting.

  1. Project Reports and Proposals, Bader Lab
  2. Project report: STING
     Presenter: Jason Riedy
     STING: A Framework for Analyzing Spatio-Temporal Interaction Networks and Graphs
     I/UCRC Review Meeting, Dec 2010
  3. Streaming Data Analysis
     Current needs, future knowledge:
     • Massive, irregularly structured input data
     • New simulation, analysis methods
     • Widely varied, unexplored response / control methods
     ...
     Analysts need us here. Yesterday. The STING framework is growing to help.
  4. Streaming Data Analysis
     Current needs, future knowledge:
     • Massive, irregularly structured input data
     • New simulation, analysis methods
     • Widely varied, unexplored response / control methods
     ...
     Facebook friendship graph: 30k edges per pixel on a 1600x1200 screen!
  5. Our Recent Contributions
     • Demonstrating value
       – Analysis of public Twitter data
     • Analyzing streaming data in parallel
       – Streaming connected component tracking
     • Exploring new algorithms
       – Seeded community detection
       – First known comparison of methods
     D. Ediger, K. Jiang, J. Riedy, D. Bader, C. Corley, R. Farber, and W. Reynolds. “Massive Social Network Analysis: Mining Twitter for Social Good,” International Conference on Parallel Processing, San Diego, CA, September 13-16, 2010.
     P. Pande, K. Jiang, R. Sharma, J. Riedy, D. Bader. “Seeded Community Detection in Social Networks.” Technical Report. Being revised for submission.
     D. Ediger, J. Riedy, D. Bader. “Tracking Structure of Streaming Social Networks,” in submission.
  6. “Mining Twitter for Social Good”
     • Immense volume of data:
     • Twitter: 165M users, 55M tweets / day
     • Goal: Use the interaction network to understand and characterize information flow.
     [Figure: Twitter social network using Large Graph Layout]
     Results from collaboration with PNNL: D. Ediger, K. Jiang, J. Riedy, D. Bader, C. Corley, R. Farber, and W. Reynolds. “Massive Social Network Analysis: Mining Twitter for Social Good,” International Conference on Parallel Processing, San Diego, CA, September 13-16, 2010.
  7. Twitter Data Sets
     • Influenza H1N1 Tweets in September 2009
       – Keywords: flu, h1n1, influenza, swine flu
     • Atlanta Flood Tweets in September 2009
       – Hash tag: #atlflood
     • All public tweets on September 1st, 2009
       – For performance evaluation
     Image sources: CDC; http://hippwaters.wordpress.com/2009/09/22/atlanta-flood-images/
  8. Tweeters ranked by Betweenness Centrality

     Rank  H1N1             atlflood
     1     @CDCFlu          @ajc
     2     @addthis         @driveafastercar
     3     @Official_PAX    @ATLCheap
     4     @FluGov          @TWCi
     5     @nytimes         @HelloNorthGA
     6     @tweetmeme       @11AliveNews
     7     @mercola         @WSB_TV
     8     @CNN             @shaunking
     9     @backstreetboys  @Carl
     10    @EllieSmith_x    @SpaceyG
     11    @TIME            @ATLINtownPaper
     12    @CDCemergency    @TJsDJs
     13    @CDC_eHealth     @ATLien
     14    @perezhilton     @MarshallRamsey
     15    @billmaher       @Kanye
  9. Exact vs. Approximate Betweenness Centrality Performance

     Data set (2009-09)          H1N1         Atlanta Flood  All, Day 1     All, Days 1-9  All, Days 1-30
     Vertices                    46,457       2,283          1,242,715      4,093,202      7,213,879
     Edges                       36,886       2,774          1,020,671      7,146,911      18,153,410
     Cray XMT GraphCT            39.18s       6.14s          48m50s         20h21m         Not Run
       (processors)              (64)         (64)           (128)          (128)
     Nehalem, Exact, Snap-GT     17s          1s             Stopped after  Not Run        Not Run
       (multi-threaded)                                      72 hours
     Cray XMT GraphCT, Approx.   3.94s        4.08s          5.12s          14.49s         33.09s
       (128 samples; processors) (16)         (16)           (16)           (16)           (16)
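Slide 9 contrasts exact betweenness centrality with an approximation based on 128 samples, but does not spell out the sampling scheme. One common approach, assumed here, is to run Brandes-style shortest-path accumulation from a random subset of source vertices and scale the result. The sketch below is illustrative only and is not GraphCT code; the names `betweenness` and `approx_betweenness` are hypothetical.

```python
import random
from collections import deque


def betweenness(adj, sources=None):
    """Brandes-style betweenness for an unweighted, undirected graph.

    adj maps each vertex to an iterable of its neighbors.
    sources lists the BFS roots; None means every vertex (exact scores).
    Scores count ordered (source, target) pairs, so each unordered pair
    is counted twice when all vertices are used as sources.
    """
    vertices = list(adj)
    if sources is None:
        sources = vertices
    score = {v: 0.0 for v in vertices}
    if not sources:
        return score
    scale = len(vertices) / len(sources)  # extrapolate from the sample
    for s in sources:
        sigma = {v: 0 for v in vertices}   # number of shortest paths from s
        dist = {v: -1 for v in vertices}
        preds = {v: [] for v in vertices}  # predecessors on shortest paths
        sigma[s], dist[s] = 1, 0
        order, queue = [], deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate dependencies from the farthest vertices back to s.
        delta = {v: 0.0 for v in vertices}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                score[w] += delta[w] * scale
    return score


def approx_betweenness(adj, k, seed=0):
    """Estimate betweenness from k randomly chosen source vertices."""
    rng = random.Random(seed)
    sources = rng.sample(list(adj), min(k, len(adj)))
    return betweenness(adj, sources)


# A five-vertex path: the middle vertex lies on the most shortest paths.
path = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c", "e"], "e": ["d"]}
exact = betweenness(path)
estimate = approx_betweenness(path, k=3)
```

The cost of the estimate grows with the number of sampled sources rather than with the number of vertices, which is consistent with the approximate timings on the slide growing far more slowly than the exact ones.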
  10. The Cray XMT
      • Tolerates latency by massive multithreading
        – Hardware support for 128 threads on each processor
        – Globally hashed address space
        – No data cache
        – Single cycle context switch
        – Multiple outstanding memory requests
        – Fine-grained, word-level synchronization
        – Flexibly supports dynamic load balancing
      • Example, extreme architecture
        – Useful lesson: Tolerating memory latency assists graph analysis.
        – Graph500: XMTs at #3, #4, #6 with one afternoon's work.
      • PNNL's 128-processor XMT: 16384 threads, 1 TiB of shared memory
      Image source: cray.com
  11. Handling massive data rates
      • Current data rates: Handle 240k edge insertions & deletions per second on a Cray XMT.

        Workload                    Updates / sec
        Edge adds only              930k
        Edge adds + STINGER         300k
        Adds + Deletes + STINGER    240k

        Sizes: 32P, 1M batch. RMAT: 16M vertices, 135M edges (edge factor 8).
      • Note: GigE packet rates are ~550k/sec
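Slide 11 reports update rates for batches of edge insertions and deletions. As a rough illustration of what applying such a batch involves, the sketch below keeps an undirected graph as a dictionary of neighbor sets. This is not the STINGER data structure, whose layout the slides do not describe; the class `DynamicGraph` and the `(op, u, v)` action format are assumptions made for the example.

```python
from collections import defaultdict


class DynamicGraph:
    """Undirected graph stored as neighbor sets (illustrative only)."""

    def __init__(self):
        self.adj = defaultdict(set)

    def apply_batch(self, batch):
        """Apply (op, u, v) actions in order; op is '+' or '-'.

        Returns how many actions actually changed the graph, so that
        duplicate inserts and deletes of missing edges are not counted.
        """
        changed = 0
        for op, u, v in batch:
            if op == "+":
                if u != v and v not in self.adj[u]:
                    self.adj[u].add(v)
                    self.adj[v].add(u)
                    changed += 1
            elif op == "-":
                if v in self.adj.get(u, ()):
                    self.adj[u].discard(v)
                    self.adj[v].discard(u)
                    changed += 1
            else:
                raise ValueError(f"unknown action {op!r}")
        return changed

    def num_edges(self):
        # Each undirected edge appears in two neighbor sets.
        return sum(len(nbrs) for nbrs in self.adj.values()) // 2


graph = DynamicGraph()
batch = [("+", 1, 2), ("+", 2, 3), ("+", 1, 2), ("-", 2, 3), ("-", 5, 6)]
applied = graph.apply_batch(batch)
```

For scale, at the quoted 240k updates per second a batch of 1M actions would take roughly 1,000,000 / 240,000 ≈ 4.2 seconds.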
  12. Seed Set Expansion
      • Useful to find communities to which several vertices belong.
      • Blue vertices are seeds, red vertices belong to a discovered community.
      • Uses: Selection for visualization, expensive analysis...
  13. Comparing expansion methods
      • Results of (first known) comparison of methods:
        – McCloskey-Bader variation of widely used, agglomerative Clauset-Newman-Moore heuristic produces smallest sets with good properties.
        – Followed closely by a personalized PageRank approach [Andersen, Chung, & Lang, 2006].
      • Currently working on parallel, streaming version...
      Results from P. Pande, K. Jiang, R. Sharma, J. Riedy, D. Bader. “Seeded Community Detection in Social Networks.” Technical Report. Being revised for submission.
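Slide 13 names a personalized PageRank approach as one of the compared expansion methods. The sketch below shows the general idea under simplifying assumptions: it computes personalized PageRank by plain power iteration (not the local push approximation of Andersen, Chung, and Lang) and then sweeps over vertices ordered by rank divided by degree, keeping the prefix with the lowest conductance. Function names, the restart probability `alpha`, and the iteration count are hypothetical choices for illustration.

```python
def personalized_pagerank(adj, seeds, alpha=0.15, iters=50):
    """Power iteration for a random walk that restarts at the seeds.

    alpha is the restart probability (an assumed default).
    """
    seeds = list(seeds)
    restart = {v: 0.0 for v in adj}
    for s in seeds:
        restart[s] = 1.0 / len(seeds)
    rank = dict(restart)
    for _ in range(iters):
        nxt = {v: alpha * restart[v] for v in adj}
        for v in adj:
            mass = (1.0 - alpha) * rank[v]
            if adj[v]:
                share = mass / len(adj[v])
                for w in adj[v]:
                    nxt[w] += share
            else:
                # Isolated vertex: send its mass back to the seeds.
                for s in seeds:
                    nxt[s] += mass / len(seeds)
        rank = nxt
    return rank


def sweep_cut(adj, rank):
    """Return the rank-ordered prefix with the lowest conductance."""
    total_vol = sum(len(adj[v]) for v in adj)
    order = sorted((v for v in adj if adj[v] and rank[v] > 0),
                   key=lambda v: rank[v] / len(adj[v]), reverse=True)
    best, best_phi = set(), float("inf")
    current, vol, cut = set(), 0, 0
    for v in order:
        inside = sum(1 for w in adj[v] if w in current)
        current.add(v)
        vol += len(adj[v])
        cut += len(adj[v]) - 2 * inside  # new boundary edges minus absorbed ones
        denom = min(vol, total_vol - vol)
        if denom > 0 and cut / denom < best_phi:
            best_phi, best = cut / denom, set(current)
    return best, best_phi


def expand_seeds(adj, seeds):
    """Grow a community around the seeds; returns (vertex set, conductance)."""
    return sweep_cut(adj, personalized_pagerank(adj, seeds))


# Two 4-cliques joined by the single edge 3-4, with vertex 0 as the seed.
two_cliques = {
    0: [1, 2, 3], 1: [0, 2, 3], 2: [0, 1, 3], 3: [0, 1, 2, 4],
    4: [3, 5, 6, 7], 5: [4, 6, 7], 6: [4, 5, 7], 7: [4, 5, 6],
}
community, phi = expand_seeds(two_cliques, {0})
```

On this toy graph the sweep recovers the clique containing the seed, since cutting the single bridging edge gives the lowest conductance of any prefix.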
  14. References
      1. A. Clauset, M.E.J. Newman, and C. Moore. “Finding community structure in very large networks.” Physical Review E, 70(6):066111, 2004.
      2. R. Andersen and K. Lang. “Communities from seed sets.” In Proceedings of the 15th International Conference on World Wide Web, page 232. ACM, 2006.
      3. R. Andersen, F. Chung, and K. Lang. “Local Graph Partitioning using PageRank Vectors.” In Proceedings of the 47th Annual IEEE Symposium on Foundations of Computer Science (FOCS), October 21-24, 2006. IEEE Computer Society, Washington, DC, 475-486.
      4. J.P. McCloskey and D.A. Bader. “Modularity and Graph Algorithms.” Minisymposium on Analyzing Massive Real-World Graphs, 2010 SIAM Annual Meeting (AN10), Pittsburgh, PA, July 12-16, 2010.
  15. Current Bader Lab Personnel
      • Faculty: David A. Bader
      • Research Scientists:
        – Henning Meyerhenke (University of Paderborn, Germany)
        – Jason Riedy (UC Berkeley)
      • Graduate Students:
        – David Ediger (George Washington University)
        – Seunghwa Kang (Seoul National University, Korea)
        – Xing Liu (Huazhong University of Science and Technology, China & IBM)
        – Robert McColl (Vanderbilt University)
        – Pushkar Pande (IIT Roorkee, India)
        – Emily Rogers (UC Berkeley)
        – Vipin Sachdeva (IIT Guwahati, UNM, IBM)
        – Vyomkesh Tripathi (IIT-Kharagpur, India)
        – Ivan Walker (Jackson State University)
        – Zhaoming Yin (Peking Univ, China)
  16. Acknowledgment of Support
