
Localized methods in graph mining


Localized methods in graph mining exploit the local structures in a graph instead of attempting to find global structures. These methods are widely successful across many problems, including community detection, label propagation, and several others.


  1. Localized methods in graph mining. David F. Gleich, Purdue University. Joint work with Kyle Kloster @ Purdue and Michael Mahoney @ Berkeley. Supported by NSF CAREER CCF-1149756. David Gleich · Purdue
  2. (image-only slide)
  3. STORY TIME. Simple theme. Many pictures!
  4. Localized methods in graph mining use the local structure of a network (and not the global structure). USE THIS, NOT THIS.
  5. Point 1: Localized methods are the right thing to use for large graph mining. Point 2: Localized methods are still the right thing to use even if you don't believe my answer to Point 1.
  6. Some graphs have global structure. Image by R. Rossi from our paper on clique detection for temporal strong components.
  7. Some graphs do not.
  8. Some graphs are random.
  9. Can you tell which one is random?
  10. At large scales, real networks look random (or slightly better).
  11. Localized methods only operate on meaningful local structures in the data.
  12. CAVEATS. There are large-scale global structures, BUT they don't look like what your small-scale intuition would predict. Continents exist in Facebook, but they don't look like small-scale structures. Leskovec, Lang, Dasgupta, Mahoney. Internet Math, 2009. Ugander, Backstrom. WSDM 2013. Jeub, Balachandran, Porter, Mucha, Mahoney. Phys. Rev. E, 2015.
  13. Point 1: Localized methods are the right thing to use for large graph mining. Point 2: Localized methods are still the right thing to use even if you don't believe my answer to Point 1.
  14. Local algorithms give fast answers to global queries (for small-source diffusions).
  15. Local algorithms give useful answers to global queries (for small-source diffusions).
  16. Pictures from the Sparse Matrix Repository (Davis & Hu), www.cise.ufl.edu/research/sparse/matrices/
  17. Graph diffusions. Diffusions show how {importance, rank, information, status, …} flows from a source to target nodes via edges. (Figure: a diffusion on a network/mesh from a typical problem in scientific computing, colored from high to low.)
  18. Graph diffusions. $f = \sum_{k=0}^{\infty} \alpha_k P^k s$. Here A is the adjacency matrix, D the degree matrix, $P = AD^{-1}$ the column-stochastic operator with $(Px)_i = \sum_{j \to i} x_j / d_j$, s the "seed" (a sparse vector), f the diffusion result, and $\alpha_k$ the path weights. Graph diffusions help: 1. attribute prediction, 2. community detection, 3. "ranking", 4. finding small-conductance sets, 5. graph label propagation.
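As a concrete illustration, the truncated sum $f \approx \sum_{k<K} \alpha_k P^k s$ can be computed with repeated matrix-vector products. This is a minimal sketch on a toy four-node graph; the graph, seed, and PageRank-style weights are illustrative choices, not data from the talk.

```python
import numpy as np

# Toy undirected graph (adjacency matrix) -- illustrative only.
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
d = A.sum(axis=0)                     # degrees
P = A / d                             # column-stochastic P = A D^{-1}

def diffusion(P, s, weights):
    """Truncated diffusion f = sum_k alpha_k P^k s."""
    f = np.zeros_like(s)
    Pks = s.copy()                    # current power P^k s, starting at k = 0
    for a in weights:
        f += a * Pks
        Pks = P @ Pks                 # advance to the next power
    return f

s = np.array([1.0, 0.0, 0.0, 0.0])    # seed on node 0
alpha = 0.85
# PageRank-style path weights alpha_k = (1 - alpha) * alpha^k, truncated
weights = [(1 - alpha) * alpha ** k for k in range(50)]
f = diffusion(P, s, weights)
```

Since P is column-stochastic and the seed sums to one, the mass of f equals the sum of the truncated weights.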
  19. Graph diffusions. PageRank: $x = (1-\alpha)\sum_{k=0}^{\infty} \alpha^k P^k s$, equivalently $(I - \alpha P)x = (1-\alpha)s$. Heat kernel: $h = e^{-t}\sum_{k=0}^{\infty} \frac{t^k}{k!} P^k s = e^{-t}\exp\{tP\}s$. As before, $P = AD^{-1}$ with $(Px)_i = \sum_{j \to i} x_j / d_j$.
  20. Graph diffusions. PageRank: $(I - \alpha P)x = (1-\alpha)s$ with $x = (1-\alpha)\sum_{k=0}^{\infty} \alpha^k P^k s$. Heat kernel: $h = e^{-t}\exp\{tP\}s$. (Plot: path weight vs. path length for the heat kernel with t = 1, 5, 15 and PageRank with α = 0.85, 0.99.)
  21. TwitterRank, GeneRank, IsoRank, MonitorRank, BookRank, TimedPageRank, CiteRank, AuthorRank, PopRank, FactRank, ObjectRank, FolkRank, ItemRank, BuddyRank, HostRank, DirRank, TrustRank, BadRank, VisualRank, … PAGERANK BEYOND THE WEB. "PageRank beyond the Web," http://arxiv.org/abs/1407.5107: $(I - \alpha P)x = (1-\alpha)s$. Image by Jessica Leber, Fast Magazine.
  22. Diffusion-based community detection. 1. Given a seed, approximate the diffusion. 2. Extract the community. Both are local operations.
  23. Conductance communities. Conductance is one of the most important community scores [Schaeffer07]. The conductance of a set of vertices is the ratio of edges leaving the set to total edges in the set: $\phi(S) = \frac{\mathrm{cut}(S)}{\min(\mathrm{vol}(S), \mathrm{vol}(\bar{S}))}$. Equivalently, it's the probability that a random edge leaves the set. Small conductance ⇔ good community. Example on the slide: $\mathrm{cut}(S) = 7$, $\mathrm{vol}(S) = 33$, $\mathrm{vol}(\bar{S}) = 11$, so $\phi(S) = 7/11$.
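The definition translates directly into code. A minimal sketch, using a toy two-triangle graph of my own rather than the example pictured on the slide:

```python
def conductance(adj, S):
    """phi(S) = cut(S) / min(vol(S), vol(complement of S)).

    adj: dict mapping node -> set of neighbors (undirected graph).
    S: set of vertices.
    """
    cut = sum(1 for u in S for v in adj[u] if v not in S)
    vol_S = sum(len(adj[u]) for u in S)
    vol_total = sum(len(adj[u]) for u in adj)
    vol_comp = vol_total - vol_S
    return cut / min(vol_S, vol_comp)

# Toy graph: two triangles joined by one edge (illustrative).
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3},
       3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
phi = conductance(adj, {0, 1, 2})  # cut = 1, vol(S) = 7, vol(comp) = 7
```

One triangle is a good community here: only one of its seven edge endpoints leaves the set.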
  24. Andersen-Chung-Lang personalized PageRank community theorem [Andersen et al. 2006] [Ghosh et al. 2014, KDD]. Informally: suppose the seeds are in a set of good conductance; then a sweep-cut on a diffusion will find a set with conductance that's nearly as good. … also, it's really fast.
  25. Sweep-cuts find small-conductance sets. Check the conductance for all "prefixes" of the diffusion vector sorted by value; there is a fast incremental update, so the whole sweep is O(sum of degrees) work. GOOD SET 1, GOOD SET 2, GOOD SET 3. (Figure: a study of the paradoxical effects of value-based rounding on diffusions; panels compare Zhou and Andersen-Lang with 3 and 15 labels on a sample with three unbalanced classes.)
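The sweep with its fast conductance update can be sketched as follows; the toy graph and diffusion values are illustrative inputs, not data from the talk.

```python
def sweep_cut(adj, x):
    """Sweep over the diffusion vector x (dict node -> value), sorted by
    value, and return the prefix set with the smallest conductance.

    adj: dict node -> set of neighbors (undirected, no dangling nodes).
    Conductance is updated incrementally as each node joins the set, so
    the whole sweep costs O(sum of degrees of the ranked nodes).
    """
    order = sorted(x, key=x.get, reverse=True)
    vol_total = sum(len(adj[u]) for u in adj)
    in_set = set()
    cut = vol = 0
    best_phi, best_k = float('inf'), 0
    for k, u in enumerate(order, start=1):
        inside = sum(1 for v in adj[u] if v in in_set)
        # edges from u into the set stop being cut; the rest become cut
        cut += len(adj[u]) - 2 * inside
        vol += len(adj[u])
        in_set.add(u)
        denom = min(vol, vol_total - vol)
        if denom > 0 and cut / denom < best_phi:
            best_phi, best_k = cut / denom, k
    return set(order[:best_k]), best_phi

# Toy graph: two triangles joined by one edge, with a diffusion
# concentrated on the left triangle (values are illustrative).
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3},
       3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
x = {0: 0.5, 1: 0.4, 2: 0.3, 3: 0.1, 4: 0.05, 5: 0.01}
best_set, best_phi = sweep_cut(adj, x)
```

The sweep recovers the left triangle, the prefix with the smallest conductance.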
  26. Diffusions are localized, and we have algorithms to find their local regions.
  27. Uniformly localized solutions in flickr: plot(x). Crawl of flickr from 2006, ~800k nodes, 6M edges, β = 1/2: $(I - \beta P)x = (1-\beta)s$. Although $\mathrm{nnz}(x) \approx 800\mathrm{k}$, only a small number of entries are needed to reach $\|D^{-1}(x - x^*)\|_\infty \le \varepsilon$. (Plots: solution values and their decay on log-log axes.)
  28. Our mission: find the solution with work roughly proportional to the localization, not the matrix.
  29. Our point: the push procedure gives localized algorithms for diffusions in a pleasingly wide variety of settings. Our results: new empirical and theoretical insights into why and how "push" is so effective.
  30. The Push Algorithm for PageRank. Proposed (in closest form) in Andersen, Chung, Lang (also by McSherry, Jeh & Widom, Berkhin) for fast approximate PageRank; derived to show improved runtime for balanced solvers. 1. Used for empirical studies of "communities". 2. Local Cheeger inequality. 3. Used for "fast PageRank approximation". 4. Works on massive graphs: O(1 second) for a 4-billion-edge graph on a laptop. 5. It yields weakly localized PageRank approximations! Produces an ε-accurate entrywise localized PageRank vector in work $\frac{1}{\varepsilon(1-\alpha)}$. (Example graph: Newman's netscience, 379 vertices, 1828 nonzeros.)
  31. Gauss-Seidel and Gauss-Southwell. Methods to solve $Ax = b$. Update $x^{(k+1)} = x^{(k)} + \rho_j e_j$ such that $[Ax^{(k+1)}]_j = [b]_j$. In words: "relax" or "free" the jth coordinate of your solution vector in order to satisfy the jth equation of your linear system. Gauss-Seidel: repeatedly cycle through j = 1 to n. Gauss-Southwell: use the value of j that has the highest-magnitude residual $r^{(k)} = b - Ax^{(k)}$.
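The Gauss-Southwell rule above can be sketched in a few lines; the small system at the end is an illustrative example of my own.

```python
import numpy as np

def gauss_southwell(A, b, tol=1e-10, max_steps=100000):
    """Solve A x = b one coordinate at a time, Gauss-Southwell style:
    pick the coordinate j with the largest-magnitude residual and set
    x_j so the j-th equation [A x]_j = b_j holds exactly.
    """
    x = np.zeros(len(b))
    r = b - A @ x                 # residual r^{(k)} = b - A x^{(k)}
    while np.abs(r).max() > tol and max_steps > 0:
        j = int(np.argmax(np.abs(r)))
        rho = r[j] / A[j, j]      # relax coordinate j
        x[j] += rho
        r -= rho * A[:, j]        # only one column of A is touched
        max_steps -= 1
    return x

# Small symmetric diagonally dominant system (illustrative).
A = np.array([[4.0, 1.0], [1.0, 3.0]])
b = np.array([1.0, 2.0])
x = gauss_southwell(A, b)
```

The residual update touches only one column of A per step, which is why sparse variants of this idea cost work proportional to the entries they actually change.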
  32. The Push Method (almost "the push" method), with parameters ε, ρ:
      1. $x^{(1)} = 0$, $r^{(1)} = (1-\alpha)e_i$, $k = 1$
      2. while any $r_j > \varepsilon d_j$ ($d_j$ is the degree of node j)
      3. $x^{(k+1)} = x^{(k)} + (r_j - \varepsilon d_j \rho)\,e_j$
      4. $r_i^{(k+1)} = \begin{cases} \varepsilon d_j \rho & i = j \\ r_i^{(k)} + \alpha(r_j - \varepsilon d_j \rho)/d_j & i \sim j \\ r_i^{(k)} & \text{otherwise} \end{cases}$
      5. $k \leftarrow k + 1$
      Only push "some" of the residual: if we want tolerance "eps", then push to tolerance "eps" and no further.
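The push step above can be sketched as a queue-driven procedure. This is a minimal sketch of the ρ = 0 variant (push the whole residual of a node at once); the toy graph is illustrative, and the function name `pagerank_push` is mine.

```python
from collections import deque

def pagerank_push(adj, seed, alpha=0.85, eps=1e-6):
    """Seeded-PageRank push: approximately solve
        (I - alpha * P) x = (1 - alpha) * e_seed,   P = A D^{-1},
    keeping x and the residual r as sparse dicts. While some node v has
    r[v] > eps * deg(v), move r[v] into x[v] and spread alpha * r[v]
    evenly to v's neighbors. Assumes an undirected graph with no
    dangling nodes (every node has at least one neighbor).
    """
    x, r = {}, {seed: 1.0 - alpha}
    Q = deque([seed])
    while Q:
        v = Q.popleft()
        rv = r.pop(v, 0.0)
        x[v] = x.get(v, 0.0) + rv
        spread = alpha * rv / len(adj[v])
        for u in adj[v]:
            old = r.get(u, 0.0)
            r[u] = old + spread
            # enqueue u the moment its residual crosses its threshold
            if old <= eps * len(adj[u]) < r[u]:
                Q.append(u)
    return x

# Toy graph: two triangles joined by one edge (illustrative).
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3},
       3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
x = pagerank_push(adj, seed=0)
```

Each push removes more residual mass than it creates, so the loop terminates, and the final residual bound $r_v \le \varepsilon d_v$ gives the degree-normalized accuracy guarantee.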
  33. Push is fast! For the PageRank diffusion $(I - \alpha P)x = (1-\alpha)e_i$, Push gives constant work (entry-wise). Andersen, Chung, Lang, FOCS 2006.
      1. For the Katz diffusion $(I - \alpha A)x = (1-\alpha)e_i$, Push works empirically fast. Bonchi, Gleich, et al., 2012, Internet Math.
      2. For the exponential $x = \exp(P)e_i$, Push gives uniform localization on power-law graphs and fast runtimes. Gleich and Kloster, 2014, Internet Math.
      3. For the heat-kernel diffusion $x = \exp(tP)e_i$, Push gives constant work (entry-wise). Kloster and Gleich, 2014, KDD.
      4. For the PageRank diffusion, Push yields sparsity regularization. Gleich and Mahoney, ICML 2014.
      5. For a general class of diffusions, there is a Cheeger inequality like before. Ghosh, Teng, et al., KDD 2014.
      6. For the PageRank diffusion, Push gives the solution path in constant work (entry-wise). Kloster and Gleich, arXiv:1503.00322.
  34. Push is useful! 1. Push implicitly regularizes semi-supervised learning. Gleich and Mahoney, submitted. 2. Push gives state-of-the-art results for overlapping community detection. Whang, Gleich, Dhillon, CIKM 2013; Whang, Gleich, Dhillon, in prep. 3. Push for overlapping clusters decreases communication in parallel solutions. Andersen, Gleich, Mirrokni, WSDM 2012. (Figure 3: F1 and F2 measures comparing algorithmic communities on DBLP, higher indicates better communities; methods: demon, bigclam, graclus centers, spread hubs, random, egonet; phases: seeding, seed-set expansion, propagation. Joyce Jiyoung Whang, The University of Texas at Austin, CIKM.)
  35. Heat Kernel Based Community Detection, KDD 2014. Kyle Kloster and David F. Gleich, Purdue University. $f = \exp\{tP\}s = \sum_{k=0}^{\infty} \frac{t^k}{k!} P^k s$. Convert to a linear system, and solve in constant time. Heat kernel (us) vs. personalized PageRank (prev. best):

                            HK (us)   PPR (prev. best)
      This set:   F1        0.87      0.34
                  Set size  14        67
      Amazon      F1        0.33      0.14
      (average):  Set size  192       15293
  36. Heat kernel localization. General recipe: 1. Take problem X, convert into a linear system. 2. Apply "push" to that linear system. 3. Analyze and bound the total work. Heat kernel recipe: 1. Convert $x = \exp(tP)e_i$ into
      $$\begin{bmatrix} I & & & \\ -tP/1 & I & & \\ & \ddots & \ddots & \\ & & -tP/N & I \end{bmatrix} \begin{bmatrix} v_0 \\ v_1 \\ \vdots \\ v_N \end{bmatrix} = \begin{bmatrix} e_i \\ 0 \\ \vdots \\ 0 \end{bmatrix}$$
      2. Apply "push". 3. Analyze the work bound.
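Forward substitution on that block bidiagonal system gives $v_k = (t^k/k!)P^k e_i$, so summing the blocks reproduces the degree-N Taylor polynomial of $\exp(tP)e_i$. A small dense numpy sketch on a toy triangle graph (illustrative parameters, not from the talk):

```python
import numpy as np

# Toy column-stochastic operator P = A D^{-1} on a triangle (illustrative).
A = np.array([[0.0, 1.0, 1.0],
              [1.0, 0.0, 1.0],
              [1.0, 1.0, 0.0]])
P = A / A.sum(axis=0)

t, N = 2.0, 10
e0 = np.zeros(3); e0[0] = 1.0

# Forward substitution on the block system:
#   v_0 = e_i,  v_k = (t/k) P v_{k-1},  k = 1..N
# gives v_k = (t^k / k!) P^k e_i, and x = sum_k v_k is the
# degree-N Taylor polynomial of exp(tP) e_i.
v = e0.copy()
x = v.copy()
for k in range(1, N + 1):
    v = (t / k) * (P @ v)
    x += v

# Reference: a much longer Taylor expansion of exp(tP) e_i.
ref = e0.copy()
term = e0.copy()
for k in range(1, 61):
    term = (t / k) * (P @ term)
    ref += term
```

The push algorithm of the talk does the same substitution, but sparsely and with per-entry tolerances instead of dense matrix-vector products.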
  37. There is a fast deterministic adaptation of the push method. Kloster & Gleich, KDD 2014.

      # G is graph as dictionary-of-sets,
      # seed is an array of seeds,
      # t, eps, N, psis are precomputed
      x = {}  # store x, r as dictionaries
      r = {}  # initialize residual
      Q = collections.deque()  # initialize queue
      for s in seed:
          r[(s, 0)] = 1. / len(seed)
          Q.append((s, 0))
      while len(Q) > 0:
          (v, j) = Q.popleft()  # v has r[(v,j)] ...
          rvj = r[(v, j)]
          # perform the hk-relax step
          if v not in x:
              x[v] = 0.
          x[v] += rvj
          r[(v, j)] = 0.
          mass = (t * rvj / (float(j) + 1.)) / len(G[v])
          for u in G[v]:  # for neighbors of v
              next = (u, j + 1)  # in the next block
              if j + 1 == N:  # last step, add to soln
                  x[u] += rvj / len(G[v])
                  continue
              if next not in r:
                  r[next] = 0.
              thresh = math.exp(t) * eps * len(G[u])
              thresh = thresh / (N * psis[j + 1]) / 2.
              if r[next] < thresh and r[next] + mass >= thresh:
                  Q.append(next)  # add u to queue
              r[next] = r[next] + mass

      (Figure 2: Pseudo-code for our algorithm as working python code; the graph is stored as a dictionary-of-sets.)

      THEOREM. Let $h = e^{-t}\exp\{tP\}s$ and let x be the hk-push(ε) output. Then $\|D^{-1}(x - h)\|_\infty \le \varepsilon$ after looking at $\frac{2Ne^t}{\varepsilon}$ edges. We believe that the bound $N \le 2t\log(1/\varepsilon)$ suffices. MMDS 2014.
  38. Analysis, three pages in one slide. Kloster & Gleich, KDD 2014. 1. State the approximation error that results from approximating using the linear system: a "standard" matrix-approximation result. 2. Bound the work involved in doing push: the iterate satisfies y ≥ 0 and the residual r ≥ 0; each step moves "mass" from r to y, preserving non-negativity and monotonicity. Each step moves at least deg(i)·ε mass in deg(i) work, so in T steps we push $\sum_{i \in \text{steps}} \varepsilon \deg(i)$. But we can only push "so much": $\sum_{i \in \text{steps}} \varepsilon \deg(i) \le e^t$, so we can bound the pushed mass from above and invert to get a total work bound.
  39. Runtime. (Plot: runtime in seconds vs. log10(|V|+|E|) for hkgrow and pprgrow, median with 25% and 75% quantiles.)
  40. PageRank solution paths. Compute one diffusion, and all sweep-cuts, for all values of epsilon. Kloster & Gleich, arXiv:1503.00322. (Plot: degree-normalized PageRank vs. 1/ε, with curves at conductances φ = 0.005, 0.010, 0.060, 0.111, 0.268.)
  41. PageRank solution paths. These take about a second to compute with our "new" push-based algorithm on graphs with millions of nodes and edges. Related to the LARS method for 1-norm regularized problems.
  42. Use "centers" of graph partitions to seed overlapping communities. Flickr social network: 2M vertices, 22M edges. We can cover 95% of the network with communities of conductance ~0.15. (Plots: maximum conductance vs. coverage percentage for (a) AstroPh and (d) Flickr; methods: egonet, graclus centers, spread hubs, random, bigclam.)
  43. References and ongoing work.
      Gleich and Kloster – Relaxation methods for the matrix exponential, J. Internet Math.
      Kloster and Gleich – Heat kernel based community detection, KDD 2014.
      Gleich and Mahoney – Algorithmic anti-differentiation, ICML 2014.
      Gleich and Mahoney – Regularized diffusions, submitted.
      Whang, Gleich, Dhillon – Seeds for overlapping communities, CIKM 2013.
      www.cs.purdue.edu/homes/dgleich/codes/nexpokit
      www.cs.purdue.edu/homes/dgleich/codes/l1pagerank
      • Improved localization bounds for functions of matrices
      • Asynchronous and parallel "push"-style methods
      • Localized methods beyond conductance
      Supported by NSF CAREER CCF-1149756. www.cs.purdue.edu/homes/dgleich
