
Large Graph Mining – Patterns, tools and cascade analysis by Christos Faloutsos

What do graphs look like? How do they evolve over time? How do influence, news, and viruses propagate over time? We present a long list of static and temporal laws, and some recent observations on real graphs. We show that fractals and self-similarity can explain several of the observed patterns, and we conclude with cascade analysis and a surprising result on virus propagation and immunization.

  1. 1. CMU SCS Large Graph Mining - Patterns, Explanations and Cascade Analysis Christos Faloutsos CMU
  2. 2. CMU SCS Thank you! • Wei FAN • Albert BIFET • Qiang YANG • Philip YU KDD'13 BIGMINE (c) 2013, C. Faloutsos 2
  3. 3. Roadmap • Introduction – Motivation – Why study (big) graphs? • Part#1: Patterns in graphs • Part#2: Cascade analysis • Conclusions • [Extra: ebay fraud; tensors; spikes]
  4. 4. Graphs - why should we care? Internet Map [lumeta.com] Food Web [Martinez ’91] >$10B revenue >0.5B users
  5. 5. Graphs - why should we care? • web-log (‘blog’) news propagation • computer network security: email/IP traffic and anomaly detection • Recommendation systems • .... • Many-to-many db relationship -> graph
  6. 6. Roadmap • Introduction – Motivation • Part#1: Patterns in graphs – Static graphs – Time-evolving graphs – Why so many power-laws? • Part#2: Cascade analysis • Conclusions
  7. 7. Part 1: Patterns & Laws
  8. 8. Laws and patterns • Q1: Are real graphs random?
  9. 9. Laws and patterns • Q1: Are real graphs random? • A1: NO!! – Diameter – in- and out- degree distributions – other (surprising) patterns • Q2: why ‘no good cuts’? • A2: <self-similarity – stay tuned> • So, let’s look at the data
  10. 10. Solution# S.1 • Power law in the degree distribution [SIGCOMM99] [Plot axes: log(rank), log(degree); internet domains; labeled points: att.com, ibm.com]
  11. 11. Solution# S.1 • Power law in the degree distribution [SIGCOMM99] [Plot axes: log(rank), log(degree); slope -0.82; internet domains; labeled points: att.com, ibm.com]
  12. 12. Solution# S.1 • Q: So what? [Plot axes: log(rank), log(degree); slope -0.82; internet domains; labeled points: att.com, ibm.com]
  13. 13. Solution# S.1 • Q: So what? • A1: # of two-step-away pairs: [Plot axes: log(rank), log(degree); slope -0.82; internet domains; labeled points: att.com, ibm.com]
  14. 14. Solution# S.1 • Q: So what? • A1: # of two-step-away pairs: O(d_max^2) ~ 10M^2 • ~0.8PB -> a data center(!) (DCO @ CMU) • Gaussian trap [Plot axes: log(rank), log(degree); slope -0.82; internet domains; labeled points: att.com, ibm.com]
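
The "~0.8PB" figure on this slide can be checked with a few lines of arithmetic. This is only a sketch: the slide gives d_max ~ 10M and the 0.8 PB total, while the 8 bytes per stored pair is an assumption made here so that the numbers line up.

```python
# Back-of-the-envelope check of the "two-step-away pairs" estimate.
d_max = 10_000_000      # ~10M neighbors for the highest-degree node (from the slide)
bytes_per_pair = 8      # ASSUMPTION: 8 bytes to store one pair

# Any two neighbors of the same hub are two steps apart, so a single hub
# with d_max neighbors contributes on the order of d_max^2 pairs.
pairs = d_max ** 2
petabytes = pairs * bytes_per_pair / 1e15

print(f"{pairs:.1e} pairs, about {petabytes:.1f} PB")
```

The point of the slide survives any reasonable choice of bytes per pair: with a skewed degree distribution, the cost is driven by the largest degree, not the average one.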
  16. 16. Solution# S.2: Eigen Exponent E • A2: power law in the eigenvalues of the adjacency matrix (A x = λ x) • E = -0.48 [Plot: eigenvalue vs. rank of decreasing eigenvalue; exponent = slope; May 2001]
  17. 17. Roadmap • Introduction – Motivation • Problem#1: Patterns in graphs – Static graphs • degree, diameter, eigen, • Triangles – Time evolving graphs • Problem#2: Tools
  18. 18. Solution# S.3: Triangle ‘Laws’ • Real social networks have a lot of triangles
  19. 19. Solution# S.3: Triangle ‘Laws’ • Real social networks have a lot of triangles – Friends of friends are friends • Any patterns? – 2x the friends, 2x the triangles ?
  20. 20. Triangle Law: #S.3 [Tsourakakis ICDM 2008] • n friends -> ~n^1.6 triangles [Plots: SN, Reuters, Epinions; x-axis: degree; y-axis: mean # triangles]
  21. 21. Triangle Law: Computations [Tsourakakis ICDM 2008] • But: triangles are expensive to compute (3-way join; several approx. algos) – O(d_max^2) • Q: Can we do that quickly? • A: details
  22. 22. Triangle Law: Computations [Tsourakakis ICDM 2008] • But: triangles are expensive to compute (3-way join; several approx. algos) – O(d_max^2) • Q: Can we do that quickly? • A: Yes! #triangles = 1/6 Σ_i (λ_i^3), where A x = λ x (and, because of skewness (S2), we only need the top few eigenvalues!) - O(E) • details
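
The eigenvalue formula for triangles can be sanity-checked on a toy graph. The sum of λ_i^3 equals trace(A^3), which counts closed 3-step walks, and each triangle produces six of them (3 starting nodes, 2 directions). The sketch below only verifies that exact identity on a small hypothetical graph; it does not implement the top-few-eigenvalues approximation the slide refers to.

```python
from itertools import combinations

def adjacency(n, edges):
    """Symmetric 0/1 adjacency matrix of an undirected graph."""
    adj = [[0] * n for _ in range(n)]
    for u, v in edges:
        adj[u][v] = adj[v][u] = 1
    return adj

def triangles_brute_force(adj):
    """Check every node triple (the expensive 3-way join)."""
    n = len(adj)
    return sum(1 for i, j, k in combinations(range(n), 3)
               if adj[i][j] and adj[j][k] and adj[i][k])

def triangles_from_trace(adj):
    """trace(A^3) / 6, where trace(A^3) equals the sum of cubed eigenvalues."""
    n = len(adj)
    # (A^2)[i][j] = number of 2-step walks from i to j
    a2 = [[sum(adj[i][k] * adj[k][j] for k in range(n)) for j in range(n)]
          for i in range(n)]
    trace_a3 = sum(a2[i][j] * adj[j][i] for i in range(n) for j in range(n))
    return trace_a3 // 6

# Hypothetical example: a 4-clique (nodes 0..3) plus one pendant node 4.
toy = adjacency(5, [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3), (3, 4)])
print(triangles_brute_force(toy), triangles_from_trace(toy))
```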
  23. 23. Triangle counting for large graphs? Anomalous nodes in Twitter (~3 billion edges) [U Kang, Brendan Meeder, +, PAKDD’11]
  27. 27. Roadmap • Introduction – Motivation • Part#1: Patterns in graphs – Static graphs • Power law degrees; eigenvalues; triangles • Anti-pattern: NO good cuts! – Time-evolving graphs • …. • Conclusions
  28. 28. Background: Graph cut problem • Given a graph, and k • Break it into k (disjoint) communities
  29. 29. Graph cut problem • Given a graph, and k • Break it into k (disjoint) communities • (assume: block diagonal = ‘cavemen’ graph) k = 2
  30. 30. Many algo’s for graph partitioning • METIS [Karypis, Kumar +] • 2nd eigenvector of Laplacian • Modularity-based [Girvan+Newman] • Max flow [Flake+] • …
  31. 31. Strange behavior of min cuts • Subtle details: next – Preliminaries: min-cut plots of ‘usual’ graphs • NetMine: New Mining Tools for Large Graphs, by D. Chakrabarti, Y. Zhan, D. Blandford, C. Faloutsos and G. Blelloch, in the SDM 2004 Workshop on Link Analysis, Counter-terrorism and Privacy • Statistical Properties of Community Structure in Large Social and Information Networks, J. Leskovec, K. Lang, A. Dasgupta, M. Mahoney. WWW 2008.
  32. 32. “Min-cut” plot • Do min-cuts recursively. • N nodes; mincut size = sqrt(N) [Plot axes: log (# edges), log (mincut-size / #edges)]
  33. 33. “Min-cut” plot • Do min-cuts recursively. • N nodes • New min-cut [Plot axes: log (# edges), log (mincut-size / #edges)]
  34. 34. “Min-cut” plot • Do min-cuts recursively. • N nodes • New min-cut • Slope = -0.5 • Better cut [Plot axes: log (# edges), log (mincut-size / #edges)]
  35. 35. “Min-cut” plot • For a d-dimensional grid, the slope is -1/d • For a random graph (and clique), the slope is 0 [Plot axes: log (# edges), log (mincut-size / #edges)]
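
The -1/d slope quoted for grids follows from a short scaling argument. The derivation below is a sketch that assumes a d-dimensional grid with n nodes per side (the symbol n is introduced here for illustration) and ignores boundary effects.

```latex
\begin{align*}
N &= n^{d} && \text{(nodes)} \\
E &\approx d\,N && \text{(edges, ignoring the boundary)} \\
\text{mincut} &\approx n^{d-1} = N^{(d-1)/d} && \text{(one slice across the grid)} \\
\frac{\text{mincut}}{E} &\approx \frac{N^{(d-1)/d}}{d\,N}
  = \frac{1}{d}\,N^{-1/d} \;\propto\; E^{-1/d} \\
\log\frac{\text{mincut}}{E} &= -\frac{1}{d}\,\log E + \text{const}
\end{align*}
```

For d = 2 this gives a cut of sqrt(N) and a slope of -0.5, matching the earlier min-cut plot. Read in reverse, a measured slope of -0.4 corresponds to d = 1/0.4 = 2.5.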
  36. 36. Experiments • Datasets: – Google Web Graph: 916,428 nodes and 5,105,039 edges – Lucent Router Graph: Undirected graph of network routers from www.isi.edu/scan/mercator/maps.html; 112,969 nodes and 181,639 edges – User -> Website Clickstream Graph: 222,704 nodes and 952,580 edges • NetMine: New Mining Tools for Large Graphs, by D. Chakrabarti, Y. Zhan, D. Blandford, C. Faloutsos and G. Blelloch, in the SDM 2004 Workshop on Link Analysis, Counter-terrorism and Privacy
  37. 37. “Min-cut” plot • What does it look like for a real-world graph? [Plot axes: log (# edges), log (mincut-size / #edges)]
  38. 38. Experiments • Used the METIS algorithm [Karypis, Kumar, 1995] • Google Web graph • Values along the y-axis are averaged • “lip” for large # edges • Slope of -0.4, corresponds to a 2.5-dimensional grid! [Plot axes: log (# edges), log (mincut-size / #edges); slope ~ -0.4]
  39. 39. Experiments • Used the METIS algorithm [Karypis, Kumar, 1995] • Google Web graph • Values along the y-axis are averaged • “lip” for large # edges • Slope of -0.4, corresponds to a 2.5-dimensional grid! • Better cut [Plot axes: log (# edges), log (mincut-size / #edges); slope ~ -0.4]
  40. 40. Experiments • Same results for other graphs too… • Lucent Router graph, Clickstream graph: slope ~ -0.57, slope ~ -0.45 [Plot axes: log (# edges), log (mincut-size / #edges)]
  41. 41. Why no good cuts? • Answer: self-similarity (few foils later)
  42. 42. Roadmap • Introduction – Motivation • Part#1: Patterns in graphs – Static graphs – Time-evolving graphs – Why so many power-laws? • Part#2: Cascade analysis • Conclusions
  43. 43. Problem: Time evolution • with Jure Leskovec (CMU -> Stanford) • and Jon Kleinberg (Cornell – sabb. @ CMU) • Jure Leskovec, Jon Kleinberg and Christos Faloutsos: Graphs over Time: Densification Laws, Shrinking Diameters and Possible Explanations, KDD 2005
  44. 44. T.1 Evolution of the Diameter • Prior work on Power Law graphs hints at slowly growing diameter: – [diameter ~ O(N^(1/3))] – diameter ~ O(log N) – diameter ~ O(log log N) • What is happening in real data?
  45. 45. T.1 Evolution of the Diameter • Prior work on Power Law graphs hints at slowly growing diameter: – [diameter ~ O(N^(1/3))] – diameter ~ O(log N) – diameter ~ O(log log N) • What is happening in real data? • Diameter shrinks over time
  46. 46. T.1 Diameter – “Patents” • Patent citation network • 25 years of data • @1999 – 2.9 M nodes – 16.5 M edges [Plot: diameter vs. time [years]]
  47. 47. T.2 Temporal Evolution of the Graphs • N(t) … nodes at time t • E(t) … edges at time t • Suppose that N(t+1) = 2 * N(t) • Q: what is your guess for E(t+1) =? 2 * E(t) (say, k friends on average)
  48. 48. T.2 Temporal Evolution of the Graphs • N(t) … nodes at time t • E(t) … edges at time t • Suppose that N(t+1) = 2 * N(t) • Q: what is your guess for E(t+1) =? 2 * E(t) (say, k friends on average: Gaussian trap) • A: over-doubled! ~ 3x – But obeying the “Densification Power Law”
  49. 49. T.2 Temporal Evolution of the Graphs • N(t) … nodes at time t • E(t) … edges at time t • Suppose that N(t+1) = 2 * N(t) • Q: what is your guess for E(t+1) =? 2 * E(t) (say, k friends on average: Gaussian trap) • A: over-doubled! ~ 3x – But obeying the “Densification Power Law” (lin-lin: ✗; log-log: ✔)
  50. 50. T.2 Densification – Patent Citations • Citations among patents granted • @1999 – 2.9 M nodes – 16.5 M edges • Each year is a datapoint [Plot: E(t) vs. N(t); slope 1.66]
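
The link between the 1.66 slope and the "~3x edges for 2x nodes" claim can be made concrete. The sketch below fits a log-log slope by ordinary least squares; the snapshots are synthetic and follow E = N^1.66 exactly, so they are an illustration, not the patent data.

```python
import math

def fit_power_law_exponent(xs, ys):
    """Least-squares slope of log(y) versus log(x)."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    mx = sum(lx) / len(lx)
    my = sum(ly) / len(ly)
    num = sum((a - mx) * (b - my) for a, b in zip(lx, ly))
    den = sum((a - mx) ** 2 for a in lx)
    return num / den

# ASSUMPTION: synthetic yearly snapshots in which nodes double every step.
nodes = [1000 * 2 ** t for t in range(6)]
edges = [n ** 1.66 for n in nodes]

exponent = fit_power_law_exponent(nodes, edges)
growth_when_doubling = 2 ** exponent   # edge growth factor when nodes double

print(f"exponent = {exponent:.2f}, edges grow {growth_when_doubling:.2f}x")
```

A linear-scale guess ("k friends on average") would predict a factor of 2; an exponent of 1.66 gives a factor slightly above 3.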
  51. 51. MORE Graph Patterns • RTG: A Recursive Realistic Graph Generator using Random Typing, Leman Akoglu and Christos Faloutsos. PKDD’09.
  53. 53. MORE Graph Patterns • Mary McGlohon, Leman Akoglu, Christos Faloutsos. Statistical Properties of Social Networks. In “Social Network Data Analytics” (Ed.: Charu Aggarwal) • Deepayan Chakrabarti and Christos Faloutsos, Graph Mining: Laws, Tools, and Case Studies, Oct. 2012, Morgan Claypool.
  54. 54. Roadmap • Introduction – Motivation • Part#1: Patterns in graphs – … – Why so many power-laws? – Why no ‘good cuts’? • Part#2: Cascade analysis • Conclusions
  55. 55. 2 Questions, one answer • Q1: why so many power laws • Q2: why no ‘good cuts’?
  56. 56. 2 Questions, one answer • Q1: why so many power laws • Q2: why no ‘good cuts’? • A (possible): Self-similarity = fractals = ‘RMAT’ ~ ‘Kronecker graphs’
  57. 57. 20'' intro to fractals • Remove the middle triangle; repeat • -> Sierpinski triangle • (Bonus question - dimensionality? – >1 (inf. perimeter – (4/3)^∞) – <2 (zero area – (3/4)^∞))
  58. 58. 20'' intro to fractals • Self-similarity -> no char. scale -> power laws, e.g.: 2x the radius, 3x the #neighbors nn(r) • nn(r) = C r^(log3/log2)
  60. 60. 20'' intro to fractals • Self-similarity -> no char. scale -> power laws, e.g.: 2x the radius, 3x the #neighbors: nn = C r^(log3/log2) • Reminder: Densification P.L. (2x nodes, ~3x edges) [Plot: E(t) vs. N(t); slope 1.66]
  61. 61. 20'' intro to fractals • Self-similarity -> no char. scale -> power laws, e.g.: 2x the radius, 3x the #neighbors: nn = C r^(log3/log2) • 2x the radius, 4x neighbors: nn = C r^(log4/log2) = C r^2
  62. 62. 20'' intro to fractals • Self-similarity -> no char. scale -> power laws, e.g.: 2x the radius, 3x the #neighbors: nn = C r^(log3/log2) • 2x the radius, 4x neighbors: nn = C r^(log4/log2) = C r^2 • Fractal dim. = 1.58
  63. 63. 20'' intro to fractals • Self-similarity -> no char. scale -> power laws, e.g.: 2x the radius, 3x the #neighbors: nn = C r^(log3/log2) • 2x the radius, 4x neighbors: nn = C r^(log4/log2) = C r^2 • Fractal dim.
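
The "2x the radius, 3x the points" rule and the resulting exponent log3/log2 can be reproduced numerically. The sketch below swaps in a different but well-known construction of the same pattern: the odd entries of Pascal's triangle form a discrete Sierpinski triangle. It is not the "remove the middle triangle" procedure from the slide, and the function name is hypothetical.

```python
import math

def sierpinski_points(levels):
    """Odd cells (row, col) of Pascal's triangle in the first 2**levels rows.

    C(row, col) is odd exactly when the binary digits of col are a subset of
    those of row (Lucas' theorem), which yields a discrete Sierpinski triangle.
    """
    size = 2 ** levels
    return [(r, c) for r in range(size) for c in range(r + 1) if (r & c) == c]

# Doubling the side length (one more level) triples the number of points.
counts = [len(sierpinski_points(k)) for k in range(1, 8)]
dimension = math.log(counts[-1]) / math.log(2 ** 7)

print(counts)
print(f"estimated fractal dimension = {dimension:.3f}")
```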
  64. 64. How does self-similarity help in graphs? • A: RMAT/Kronecker generators – With self-similarity, we get all power-laws, automatically, – And small/shrinking diameter – And ‘no good cuts’ • R-MAT: A Recursive Model for Graph Mining, by D. Chakrabarti, Y. Zhan and C. Faloutsos, SDM 2004, Orlando, Florida, USA • Realistic, Mathematically Tractable Graph Generation and Evolution, Using Kronecker Multiplication, by J. Leskovec, D. Chakrabarti, J. Kleinberg, and C. Faloutsos, in PKDD 2005, Porto, Portugal
  65. 65. Graph gen.: Problem dfn • Given a growing graph with count of nodes N1, N2, … • Generate a realistic sequence of graphs that will obey all the patterns – Static Patterns: S1 Power Law Degree Distribution; S2 Power Law eigenvalue and eigenvector distribution; Small Diameter – Dynamic Patterns: T2 Growth Power Law (2x nodes; 3x edges); T1 Shrinking/Stabilizing Diameters
  66. 66. Kronecker Graphs [Figures: adjacency matrix; intermediate stage; adjacency matrix]
  69. 69. Kronecker Graphs • Continuing multiplying with G1 we obtain G4 and so on … [Figure: G4 adjacency matrix]
  72. 72. Kronecker Graphs • Continuing multiplying with G1 we obtain G4 and so on … • Holes within holes; Communities within communities [Figure: G4 adjacency matrix]
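
The repeated multiplication with G1 can be sketched in a few lines of plain Python. The 3-node initiator below (a path with self-loops) is a hypothetical choice for illustration; the G1 shown in the slide figures is not recoverable from the transcript.

```python
def kronecker(a, b):
    """Kronecker product of two square 0/1 matrices given as lists of lists."""
    n, m = len(a), len(b)
    return [[a[i // m][j // m] * b[i % m][j % m] for j in range(n * m)]
            for i in range(n * m)]

# ASSUMPTION: hypothetical initiator graph G1 (3 nodes, self-loops included).
g1 = [[1, 1, 0],
      [1, 1, 1],
      [0, 1, 1]]

graphs = [g1]                      # graphs[k-1] holds G_k
for _ in range(3):
    graphs.append(kronecker(graphs[-1], g1))

nodes = [len(g) for g in graphs]                 # node counts of G1..G4
ones = [sum(map(sum, g)) for g in graphs]        # non-zero adjacency entries

print(nodes)
print(ones)
```

With this initiator, every step multiplies the node count by 3 and the number of non-zero entries by 7, so the entry count grows as a power of the node count. That is the same kind of relationship as the densification power law.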
  73. 73. Properties: • We can PROVE that – Degree distribution is multinomial ~ power law – Diameter: constant – Eigenvalue distribution: multinomial – First eigenvector: multinomial (new) • Self-similarity -> power laws
  74. 74. Problem Definition • Given a growing graph with nodes N1, N2, … • Generate a realistic sequence of graphs that will obey all the patterns – Static Patterns: Power Law Degree Distribution; Power Law eigenvalue and eigenvector distribution; Small Diameter – Dynamic Patterns: Growth Power Law; Shrinking/Stabilizing Diameters • First generator for which we can prove all these properties
  75. 75. Impact: Graph500 • Based on RMAT (= 2x2 Kronecker) • Standard for graph benchmarks • http://www.graph500.org/ • Competitions 2x year, with all major entities: LLNL, Argonne, ITC-U. Tokyo, Riken, ORNL, Sandia, PSC, … • R-MAT: A Recursive Model for Graph Mining, by D. Chakrabarti, Y. Zhan and C. Faloutsos, SDM 2004, Orlando, Florida, USA • To iterate is human, to recurse is divine
  76. 76. Roadmap • Introduction – Motivation • Part#1: Patterns in graphs – … – Q1: Why so many power-laws? (A: real graphs -> self similar -> power laws) – Q2: Why no ‘good cuts’? • Part#2: Cascade analysis • Conclusions
  77. 77. Q2: Why ‘no good cuts’? • A: self-similarity – Communities within communities within communities …
  78. 78. Kronecker Product – a Graph (REMINDER) • Continuing multiplying with G1 we obtain G4 and so on … [Figure: G4 adjacency matrix]
  79. 79. Kronecker Product – a Graph (REMINDER) • Continuing multiplying with G1 we obtain G4 and so on … • Communities within communities within communities … (‘Linux users’, ‘Mac users’, ‘win users’) [Figure: G4 adjacency matrix]
  80. 80. Kronecker Product – a Graph (REMINDER) • Continuing multiplying with G1 we obtain G4 and so on … • Communities within communities within communities … • How many communities? 3? 9? 27? [Figure: G4 adjacency matrix]
  81. 81. Kronecker Product – a Graph (REMINDER) • Continuing multiplying with G1 we obtain G4 and so on … • Communities within communities within communities … • How many communities? 3? 9? 27? • A: one – but not a typical, block-like community… [Figure: G4 adjacency matrix]
  82. 82. Communities? (Gaussian) Clusters? Piece-wise flat parts? [Plot axes: age, # songs]
  83. 83. Wrong questions to ask! [Plot axes: age, # songs]
  84. 84. Summary of Part#1 • *many* patterns in real graphs – Small & shrinking diameters – Power-laws everywhere – Gaussian trap – ‘no good cuts’ • Self-similarity (RMAT/Kronecker): good model
  85. 85. Part 2: Cascades & Immunization
  86. 86. Why do we care? • Information Diffusion • Viral Marketing • Epidemiology and Public Health • Cyber Security • Human mobility • Games and Virtual Worlds • Ecology • ........
  87. 87. Roadmap • Introduction – Motivation • Part#1: Patterns in graphs • Part#2: Cascade analysis – (Fractional) Immunization – Epidemic thresholds • Conclusions
  88. 88. Fractional Immunization of Networks • B. Aditya Prakash, Lada Adamic, Theodore Iwashyna (M.D.), Hanghang Tong, Christos Faloutsos • SDM 2013, Austin, TX
  89. 89. Whom to immunize? • Dynamical Processes over networks • Each circle is a hospital • ~3,000 hospitals • More than 30,000 patients transferred [US-MEDICARE NETWORK 2005] • Problem: Given k units of disinfectant, whom to immunize?
  90. 90. Whom to immunize? • CURRENT PRACTICE vs. OUR METHOD [US-MEDICARE NETWORK 2005]: ~6x fewer! • Hospital-acquired inf.: 99K+ lives, $5B+ per year
  91. 91. Fractional Asymmetric Immunization • Drug-resistant Bacteria (like XDR-TB) [Figure: Hospital, Another Hospital]
  92. 92. Fractional Asymmetric Immunization [Figure: Hospital, Another Hospital]
  94. 94. Fractional Asymmetric Immunization • Problem: Given k units of disinfectant, distribute them to maximize hospitals saved [Figure: Hospital, Another Hospital]
  95. 95. Fractional Asymmetric Immunization • Problem: Given k units of disinfectant, distribute them to maximize hospitals saved @ 365 days [Figure: Hospital, Another Hospital]
  96. 96. Straightforward solution: Simulation: 1. Distribute resources 2. ‘infect’ a few nodes 3. Simulate evolution of spreading – (10x, take avg) 4. Tweak, and repeat step 1
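
The simulation loop in the steps above can be sketched as follows. This is a toy model under loud assumptions, not the authors' setup: it uses an SIS-style cascade, the names `beta` (infection probability), `delta` (recovery probability), and `protection` (a per-node value in [0, 1] that scales down the chance of being infected) are all hypothetical, and the example graph is a small star rather than the hospital network.

```python
import random

def simulate_sis(neighbors, protection, beta, delta, seeds, steps, rng):
    """One run of a toy SIS cascade; returns the number infected at the end."""
    infected = set(seeds)                      # step 2: 'infect' a few nodes
    for _ in range(steps):                     # step 3: simulate the spreading
        next_infected = set()
        for u in infected:
            # An infected node stays infected with probability 1 - delta.
            if rng.random() >= delta:
                next_infected.add(u)
            # ASSUMPTION: protection scales down the infection probability.
            for v in neighbors[u]:
                if v not in infected and rng.random() < beta * (1.0 - protection[v]):
                    next_infected.add(v)
        infected = next_infected
    return len(infected)

def average_footprint(neighbors, protection, beta, delta, seeds, steps,
                      runs=10, seed=0):
    """Average final infection count over several runs (10x, take avg)."""
    rng = random.Random(seed)
    total = sum(simulate_sis(neighbors, protection, beta, delta, seeds, steps, rng)
                for _ in range(runs))
    return total / runs

# Hypothetical example: a star with hub 0 and 20 leaves.
star = {0: list(range(1, 21)), **{leaf: [0] for leaf in range(1, 21)}}
unprotected = {v: 0.0 for v in star}            # step 1: one way to distribute
hub_protected = {**unprotected, 0: 1.0}         # step 1: another way

print(average_footprint(star, unprotected, 0.5, 0.1, [1], 20))
print(average_footprint(star, hub_protected, 0.5, 0.1, [1], 20))
```

Step 4 (tweak and repeat) would wrap a search over allocations around `average_footprint`. Every candidate allocation costs a full batch of simulations, which is why this approach is slow on large networks.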
  100. 100. Running Time • Simulations: > 1 week • SMART-ALLOC: ≈ 14 secs • > 30,000x speed-up! [Plot: wall-clock time]
  101. 101. Experiments • K = 120 [Plot: # infected vs. # epochs; curves: uniform, SMART-ALLOC]
  102. 102. What is the ‘silver bullet’? • A: Try to decrease connectivity of graph • Q: how to measure connectivity? – Avg degree? Max degree? – Std degree / avg degree ? – Diameter? – Modularity? – ‘Conductance’ (~min cut size)? – Some combination of above? (≈ 14 secs, > 30,000x speed-up!)
  103. 103. What is the ‘silver bullet’? • A: Try to decrease connectivity of graph • Q: how to measure connectivity? • A: first eigenvalue of adjacency matrix • Q1: why?? • (Q2: dfn & intuition of eigenvalue?) [Listed alternatives: avg degree, max degree, diameter, modularity, ‘conductance’]
  104. 104. Why eigenvalue? • A1: ‘G2’ theorem and ‘eigen-drop’: • For (almost) any type of virus • For any network • -> no epidemic, if small-enough first eigenvalue (λ1) of adjacency matrix • Heuristic: for immunization, try to min λ1 • The smaller λ1, the closer to extinction. • Threshold Conditions for Arbitrary Cascade Models on Arbitrary Networks, B. Aditya Prakash, Deepayan Chakrabarti, Michalis Faloutsos, Nicholas Valler, Christos Faloutsos, ICDM 2011, Vancouver, Canada
  106. 106. Threshold Conditions for Arbitrary Cascade Models on Arbitrary Networks • B. Aditya Prakash, Deepayan Chakrabarti, Michalis Faloutsos, Nicholas Valler, Christos Faloutsos • IEEE ICDM 2011, Vancouver; extended version in arXiv: http://arxiv.org/abs/1004.0060 • G2 theorem • ~10 pages proof
  107. 107. Our thresholds for some models • s = effective strength • s < 1: below threshold • Table (models: effective strength s): – SIS, SIR, SIRS, SEIR: s = λ · … – SIV, SEIV: s = λ · … – S I1 I2 V1 V2 (H.I.V.): s = λ · … • Threshold (tipping point): s = 1
  108. 108. Our thresholds for some models • s = effective strength • s < 1: below threshold • Table (models: effective strength s): – SIS, SIR, SIRS, SEIR: s = λ · … – SIV, SEIV: s = λ · … – S I1 I2 V1 V2 (H.I.V.): s = λ · … • Threshold (tipping point): s = 1 • Annotations: No immunity; Temp. immunity; w/ incubation
  109. 109. Roadmap • Introduction – Motivation • Part#1: Patterns in graphs • Part#2: Cascade analysis – (Fractional) Immunization – intuition behind λ1 • Conclusions
  110. 110. Intuition for λ • “Official” definitions: • Let A be the adjacency matrix. Then λ is the root with the largest magnitude of the characteristic polynomial of A [det(A – xI)]. • Also: A x = λ x • Neither gives much intuition! • “Un-official” intuition: • For ‘homogeneous’ graphs, λ == degree • λ ~ avg degree – done right, for skewed degree distributions
  111. 111. Largest Eigenvalue (λ) • better connectivity -> higher λ • [Example graphs] λ ≈ 2; λ = N; λ = N-1 • N = 1000 nodes: λ ≈ 2; λ = 31.67; λ = 999
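
The "better connectivity, higher λ" intuition can be checked with a small power-iteration sketch. The three graphs below (chain, star, clique) are assumptions chosen for illustration, since the slide's figures are not recoverable, and the size is 30 nodes rather than 1000 to keep pure Python fast. The shift by the identity matrix is a standard trick that keeps the iteration from oscillating on bipartite graphs.

```python
import math

def largest_eigenvalue(neighbors, iterations=1000):
    """Estimate the largest adjacency eigenvalue by power iteration on A + I."""
    n = len(neighbors)
    x = [1.0] * n
    for _ in range(iterations):
        # y = (A + I) x, then rescale to unit length
        y = [x[u] + sum(x[v] for v in neighbors[u]) for u in range(n)]
        norm = math.sqrt(sum(v * v for v in y))
        x = [v / norm for v in y]
    # Rayleigh quotient x^T A x (x has unit length)
    return sum(x[u] * sum(x[v] for v in neighbors[u]) for u in range(n))

n = 30
chain = [[v for v in (u - 1, u + 1) if 0 <= v < n] for u in range(n)]
star = [list(range(1, n))] + [[0] for _ in range(1, n)]
clique = [[v for v in range(n) if v != u] for u in range(n)]

for name, graph in (("chain", chain), ("star", star), ("clique", clique)):
    print(name, round(largest_eigenvalue(graph), 3))
```

The chain stays just below 2 no matter how long it gets, the star grows like the square root of its size, and the clique reaches n - 1.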
  113. 113. Examples: Simulations – SIR (mumps) • (a) Infection profile [Plot: fraction of infections vs. time ticks] • (b) “Take-off” plot [Plot: footprint vs. effective strength] • PORTLAND graph: synthetic population, 31 million links, 6 million nodes
  114. 114. Examples: Simulations – SIRS (pertussis) • (a) Infection profile [Plot: fraction of infections vs. time ticks] • (b) “Take-off” plot [Plot: footprint vs. effective strength] • PORTLAND graph: synthetic population, 31 million links, 6 million nodes
  115. 115. Immunization - conclusion • In (almost any) immunization setting, • Allocate resources so as to • Minimize λ1 • (regardless of virus specifics) • Conversely, in a market penetration setting – Allocate resources to – Maximize λ1
  116. 116. Roadmap • Introduction – Motivation • Part#1: Patterns in graphs • Part#2: Cascade analysis – (Fractional) Immunization – Epidemic thresholds • What next? • Acks & Conclusions • [Tools: ebay fraud; tensors; spikes]
  117. 117. Challenge #1: ‘Connectome’ – brain wiring • Which neurons get activated by ‘bee’ • How wiring evolves • Modeling epilepsy • N. Sidiropoulos, George Karypis, V. Papalexakis, Tom Mitchell
  118. 118. Challenge #2: Time evolving networks / tensors • Periodicities? Burstiness? • What is ‘typical’ behavior of a node, over time • Heterogeneous graphs (= nodes w/ attributes)
  119. 119. Roadmap • Introduction – Motivation • Part#1: Patterns in graphs • Part#2: Cascade analysis – (Fractional) Immunization – Epidemic thresholds • Acks & Conclusions • [Tools: ebay fraud; tensors; spikes] (off line)
  120. 120. Thanks • Thanks to: NSF IIS-0705359, IIS-0534205, CTA-INARC; Yahoo (M45), LLNL, IBM, SPRINT, Google, INTEL, HP, iLab
  121. 121. Thanks • Thanks to: NSF IIS-0705359, IIS-0534205, CTA-INARC; Yahoo (M45), LLNL, IBM, SPRINT, Google, INTEL, HP, iLab • Disclaimer: All opinions are mine; not necessarily reflecting the opinions of the funding agencies
  122. 122. Project info: PEGASUS • www.cs.cmu.edu/~pegasus • Results on large graphs: with Pegasus + hadoop + M45 • Apache license • Code, papers, manual, video • Prof. U Kang, Prof. Polo Chau
  123. 123. Cast • Akoglu, Leman • Chau, Polo • Kang, U • McGlohon, Mary • Tong, Hanghang • Prakash, Aditya • Koutra, Danai • Beutel, Alex • Papalexakis, Vagelis
  124. 124. CONCLUSION#1 – Big data • Large datasets reveal patterns/outliers that are invisible otherwise
  125. 125. CONCLUSION#2 – self-similarity • powerful tool / viewpoint – Power laws; shrinking diameters – Gaussian trap (e.g., F.O.F.) – ‘no good cuts’ – RMAT – graph500 generator
  126. 126. CONCLUSION#3 – eigen-drop • Cascades & immunization: G2 theorem & eigenvalue • CURRENT PRACTICE vs. OUR METHOD [US-MEDICARE NETWORK 2005]: ~6x fewer! • ≈ 14 secs, > 30,000x speed-up!
  127. 127. References • D. Chakrabarti, C. Faloutsos: Graph Mining – Laws, Tools and Case Studies, Morgan Claypool 2012 • http://www.morganclaypool.com/doi/abs/10.2200/S00449ED1V01Y201209DMK006
  128. 128. TAKE HOME MESSAGE: Cross-disciplinarity
  129. 129. TAKE HOME MESSAGE: Cross-disciplinarity • QUESTIONS?
