Non-exhaustive, Overlapping K-means

A copy of my slides from the SILO Seminar at UW Madison on our recent developments for the NEO-K-Means methods including new optimization routines and results.

Published in: Technology

  1. 1. Non-exhaustive, overlapping K-means clustering. David F. Gleich, Purdue University
  2. 2. Real-world graph and point data have overlapping clusters. [Figure: gene-expression heat map with one gene identifier per row (labels omitted).] Social networks have overlapping clusters because of social circles. Genes have overlapping clusters due to their role in multiple functions. SILO Seminar David Gleich · Purdue
  3. 3. Overlapping research projects are what got me here too!
     • PhD thesis on Google’s PageRank
     • MSR intern and overlapping clusters for distributed computation
     • Accelerated NCP plots and locally minimal communities
     • Neighborhood inflated seed expansion for overlapping communities
     • Non-exhaustive, overlapping K-means
     1. NISE Clustering - Whang, Gleich, Dhillon, CIKM 2013
     2. NEO-K-means - Whang, Gleich, Dhillon, SDM 2015
     3. NEO-K-means SDP - Hou, Whang, Gleich, Dhillon, KDD 2015
     4. Multiplier Methods for Overlapping K-Means - Hou, Whang, Gleich, Dhillon, Submitted
  4. 4. SILO Seminar David Gleich · Purdue
  5. 5. Overlapping communities via seed set expansion works nicely. Phases: Filtering Phase, Seeding Phase, Seed Set Expansion Phase, Propagation Phase. [Figure 2: maximum conductance vs. coverage (percentage) for the seeding strategies egonet, graclus centers, spread hubs, random, and bigclam; panels (a) AstroPh and (d) Flickr.] We can cover 95% of the network with communities of conductance ~0.15. Flickr social network: 2M vertices, 22M edges. cond(S) = cut(S)/“size”(S). (Embedded slide: Joyce Jiyoung Whang, The University of Texas at Austin, Conference on Information and Knowledge Management.)
  6. 6. We wanted a more principled approach to achieve these results.
  7. 7. The state of the art for clustering. K-Means: Problem 1 😀, Problem 2 😊, Problem 3 😟, Problem 4 😢.
  8. 8. The state of the art for clustering. K-Means: Problem 1 😀, Problem 2 😊. NEO-K-Means: Problem 3 😊, Problem 4 😊.
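  9. 9. K-means as optimization. [Figure: a point $x_i$ with its distances $\|x_i - m_1\|$ and $\|x_i - m_2\|$ to the centroids $m_1$ and $m_2$.]
     Input: points $x_1, \ldots, x_n$. Find an assignment matrix $U$ that gives cluster assignments to minimize the K-means objective:
     $$\begin{aligned}
     \text{minimize} \quad & \sum_{ij} U_{ij} \|x_i - m_j\|^2 \\
     \text{subject to} \quad & U \text{ is an assignment to clusters}, \\
     & m_j = \frac{1}{\sum_i U_{ij}} \sum_i U_{ij} x_i
     \end{aligned}$$
     Example with points $x_1, \ldots, x_4$ (rows) and clusters $c_1, c_2$ (columns):
     $$U = \begin{bmatrix} 1 & 0 \\ 1 & 0 \\ 0 & 1 \\ 0 & 1 \end{bmatrix}$$
     K-means’ objective with overlap? The same problem, except that $U$ is a multi-assignment to clusters:
     $$\begin{aligned}
     \text{minimize} \quad & \sum_{ij} U_{ij} \|x_i - m_j\|^2 \\
     \text{subject to} \quad & U \text{ is a multi-assignment to clusters}, \\
     & m_j = \frac{1}{\sum_i U_{ij}} \sum_i U_{ij} x_i
     \end{aligned}$$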
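The objective on this slide can be evaluated directly from an assignment matrix. The following is a minimal sketch in plain Python; the function name `kmeans_objective` and the four sample points are assumptions made for illustration, and only the 4-by-2 matrix `U` is taken from the slide.

```python
def kmeans_objective(X, U):
    """Return (objective, centroids) for points X and a 0/1 assignment matrix U."""
    n, k, dim = len(X), len(U[0]), len(X[0])
    centroids = []
    for j in range(k):
        members = [X[i] for i in range(n) if U[i][j]]
        # m_j is the mean of the points assigned to cluster j
        centroids.append([sum(p[t] for p in members) / len(members) for t in range(dim)])
    # sum over all (i, j) of U_ij * ||x_i - m_j||^2
    objective = sum(
        U[i][j] * sum((X[i][t] - centroids[j][t]) ** 2 for t in range(dim))
        for i in range(n)
        for j in range(k)
    )
    return objective, centroids


# Hypothetical points; U is the example matrix from the slide.
X = [[0.0, 0.0], [0.0, 2.0], [4.0, 0.0], [4.0, 2.0]]
U = [[1, 0], [1, 0], [0, 1], [0, 1]]
objective, centroids = kmeans_objective(X, U)
print(objective, centroids)
```

The same function accepts a multi-assignment matrix (a row with more than one 1), which is the "objective with overlap" variant.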
  10. 10. Overlap is not a natural addition to optimization based clustering.
  11. 11. The NEO-K-means objective balances overlap and outliers.
      $$\begin{aligned}
      \text{minimize} \quad & \sum_{ij} U_{ij} \|x_i - m_j\|^2 \\
      \text{subject to} \quad & U_{ij} \text{ is binary}, \\
      & \operatorname{trace}(U^T U) = (1 + \alpha) n && (\alpha n \text{ overlap}), \\
      & e^T \operatorname{Ind}[Ue] \ge (1 - \beta) n && (\text{up to } \beta n \text{ outliers}), \\
      & m_j = \frac{1}{\sum_i U_{ij}} \sum_i U_{ij} x_i
      \end{aligned}$$
      • If α, β = 0, then we get back to K-means.
      • Automatically choose α, β based on K-means. 😊
      1. Make (1 + α)n total assignments.
      2. Allow up to βn outliers.
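The two constraints can be checked mechanically for any binary matrix. This is a small sketch under the assumption that `U` is given as a list of 0/1 rows; the helper name `neo_feasible` and the tolerance are illustrative choices, not part of the slides.

```python
def neo_feasible(U, alpha, beta, tol=1e-9):
    """Check trace(U^T U) = (1 + alpha) n and e^T Ind[Ue] >= (1 - beta) n."""
    n = len(U)
    total = sum(sum(row) for row in U)              # trace(U^T U) for a binary U
    assigned = sum(1 for row in U if sum(row) > 0)  # e^T Ind[Ue]
    overlap_ok = abs(total - (1 + alpha) * n) <= tol
    outlier_ok = assigned >= (1 - beta) * n - tol
    return overlap_ok and outlier_ok


U_kmeans = [[1, 0], [1, 0], [0, 1], [0, 1]]   # every point in exactly one cluster
U_neo = [[1, 0], [1, 1], [0, 1], [0, 0]]      # one overlapping point, one outlier

print(neo_feasible(U_kmeans, 0.0, 0.0))
print(neo_feasible(U_neo, 0.0, 0.25))
print(neo_feasible(U_neo, 0.0, 0.0))
```

With α = β = 0 only ordinary K-means assignments pass, which matches the first bullet on the slide.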
  12. 12. Lloyd’s algorithm for NEO-K-means is just a wee bit more complex.
      Until done:
      1. Update centroids.
      2. Assign (1 − β)n nodes to the closest centroid.
      3. Make (α + β)n assignments based on minimizing distance.
      [Figure: scatter plot of the example data; legend: Cluster 1, Cluster 2, Cluster 1 & 2, Not assigned.]
      This algorithm correctly assigns our example case and even determines overlap and outlier parameters!
      THEOREM: Lloyd’s algorithm decreases the objective monotonically.
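The three steps above can be sketched as follows. This is an illustrative reading of the slide, not the authors' implementation: the function names, the tie-breaking, the handling of empty clusters and the stopping rule (stop when the assignment repeats) are all assumptions.

```python
def sqdist(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def neo_assign(X, M, alpha, beta):
    """One assignment pass: (1 - beta) n nearest-centroid picks, then (alpha + beta) n extras."""
    n, k = len(X), len(M)
    D = [[sqdist(x, m) for m in M] for x in X]
    U = [[0] * k for _ in range(n)]
    # Step 2: the (1 - beta) n points closest to any centroid get that centroid.
    n_first = n - round(beta * n)
    by_closeness = sorted(range(n), key=lambda i: min(D[i]))
    for i in by_closeness[:n_first]:
        U[i][D[i].index(min(D[i]))] = 1
    # Step 3: (alpha + beta) n further assignments, smallest distance first.
    n_extra = round((alpha + beta) * n)
    pairs = sorted((D[i][j], i, j) for i in range(n) for j in range(k) if not U[i][j])
    for _, i, j in pairs[:n_extra]:
        U[i][j] = 1
    return U


def neo_lloyd(X, M0, alpha, beta, max_iter=100):
    M = [list(m) for m in M0]
    U = None
    for _ in range(max_iter):
        new_U = neo_assign(X, M, alpha, beta)
        if new_U == U:          # assignments stopped changing
            break
        U = new_U
        for j in range(len(M)):  # Step 1: update centroids
            members = [X[i] for i in range(len(X)) if U[i][j]]
            if members:          # keep the old centroid if a cluster empties
                M[j] = [sum(c) / len(members) for c in zip(*members)]
    return U, M


# Hypothetical 1-D data: two groups, one point in between, one far outlier.
X = [[-2.1], [-1.9], [0.0], [1.9], [2.1], [50.0]]
U, M = neo_lloyd(X, [[-2.0], [2.0]], alpha=0.0, beta=1 / 6)
print(U)
```

On this toy input the middle point ends up in both clusters and the far point stays unassigned, mirroring the "Cluster 1 & 2" and "Not assigned" legend entries.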
  13. 13. The non-exhaustiveness is necessary for assignments. [Figure: synthetic data (n=1,000, α=0.1, β=0.005); panel (b) first extension of k-means, i.e. the output without the assignment constraint (beta = 1); panel (c) NEO-K-Means output (correct). Legend: Cluster 1, Cluster 2, Cluster 1 & 2, Not assigned. Green points indicate overlap.]
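  14. 14. The Weighted, Kernel NEO-K-Means objective.
      • Introduce weights for each data point.
      • Introduce feature maps for each data point too.
      $$\begin{aligned}
      \text{minimize} \quad & \sum_{ij} U_{ij} w_i \|\phi(x_i) - m_j\|^2 \\
      \text{subject to} \quad & U_{ij} \text{ is binary}, \\
      & \operatorname{trace}(U^T U) = (1 + \alpha) n && (\alpha n \text{ overlap}), \\
      & e^T \operatorname{Ind}[Ue] \ge (1 - \beta) n && (\text{up to } \beta n \text{ outliers}), \\
      & m_j = \frac{1}{\sum_i U_{ij} w_i} \sum_i w_i U_{ij} \phi(x_i)
      \end{aligned}$$
      NOTE:
      $$\sum_{ij} U_{ij} w_i \|\phi(x_i) - m_j\|^2 = \sum_{ij} U_{ij} w_i K_{ii} - \sum_j \frac{u_j^T W K W u_j}{u_j^T W u_j}$$
      Theorem: If $K = D^{-1} + D^{-1} A D^{-1}$, then the NEO-K-Means objective is equivalent to overlapping conductance.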
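The theorem can be sanity-checked numerically on a tiny graph. The sketch below evaluates the kernel form of the objective with W = D and K = D^-1 + D^-1 A D^-1 and compares it with the sum of cut(S)/vol(S) over the clusters. The constant offset (number of assignments minus 2k) was worked out for this sketch and is not stated on the slides; the path graph, the clusters and the function names are likewise assumptions.

```python
def kernel_objective(edges, n, clusters):
    """Kernel NEO-K-Means objective with W = D and K = D^-1 + D^-1 A D^-1."""
    A = [[0.0] * n for _ in range(n)]
    for a, b in edges:
        A[a][b] = A[b][a] = 1.0
    deg = [sum(row) for row in A]
    K = [[(1.0 / deg[i] if i == j else 0.0) + A[i][j] / (deg[i] * deg[j])
          for j in range(n)] for i in range(n)]
    total = 0.0
    for S in clusters:
        total += sum(deg[i] * K[i][i] for i in S)                  # sum_i U_ij w_i K_ii
        uWKWu = sum(deg[i] * K[i][p] * deg[p] for i in S for p in S)
        total -= uWKWu / sum(deg[i] for i in S)                    # u^T W K W u / u^T W u
    return total


def sum_cut_over_vol(edges, n, clusters):
    deg = [0] * n
    for a, b in edges:
        deg[a] += 1
        deg[b] += 1
    out = 0.0
    for S in clusters:
        cut = sum(1 for a, b in edges if (a in S) != (b in S))
        out += cut / sum(deg[i] for i in S)
    return out


edges = [(0, 1), (1, 2), (2, 3)]    # hypothetical path graph on 4 vertices
clusters = [{0, 1, 2}, {2, 3}]      # vertex 2 belongs to both clusters
assignments = sum(len(S) for S in clusters)
lhs = kernel_objective(edges, 4, clusters)
rhs = sum_cut_over_vol(edges, 4, clusters) + assignments - 2 * len(clusters)
print(lhs, rhs)
```

Because the number of assignments and k are fixed by the constraints, the two quantities differ only by a constant, so minimizing one minimizes the other.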
  15. 15. This means that NEO-K-Means was the principled objective we were after!
  16. 16. Conductance communities. Conductance is one of the most important community scores [Schaeffer07]. The conductance of a set of vertices is the ratio of edges leaving to total edges:
      $$\phi(S) = \frac{\operatorname{cut}(S)}{\min\left(\operatorname{vol}(S), \operatorname{vol}(\bar{S})\right)} \qquad \frac{\text{(edges leaving the set)}}{\text{(total edges in the set)}}$$
      Equivalently, it’s the probability that a random edge leaves the set. Small conductance ⇔ good community.
      Example: cut(S) = 7, vol(S) = 33, vol(S̄) = 11, so φ(S) = 7/11.
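The definition translates into a few lines of code. This is a minimal sketch for an undirected, unweighted graph given as an edge list; the function name and the sample graph (two triangles joined by one edge) are assumptions for illustration.

```python
def conductance(edges, S):
    """cut(S) / min(vol(S), vol(complement of S)) for an undirected edge list."""
    S = set(S)
    cut = sum(1 for a, b in edges if (a in S) != (b in S))
    vol_S = sum((a in S) + (b in S) for a, b in edges)   # sum of degrees inside S
    vol_rest = 2 * len(edges) - vol_S
    return cut / min(vol_S, vol_rest)


# Hypothetical graph: triangles {0, 1, 2} and {3, 4, 5} joined by the edge (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (2, 3), (3, 4), (4, 5), (3, 5)]
phi = conductance(edges, {0, 1, 2})
print(phi)

# The slide's numbers follow the same arithmetic: 7 / min(33, 11) = 7/11.
```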
  17. 17. Our theorem means that NEO-K-Means can optimize the sum-conductance objective.
      Conductance is bounded by the normalized cut of the bi-partition:
      $$\phi(S) \le \frac{\operatorname{cut}(S)}{\operatorname{vol}(S)} + \frac{\operatorname{cut}(\bar{S})}{\operatorname{vol}(\bar{S})}$$
      NEO-K-Means objective:
      $$\sum_{S \in C} \frac{\operatorname{cut}(S)}{\operatorname{vol}(S)} = \sum_{S \in C} \phi(S) \quad \text{if } \operatorname{vol}(S) \le \operatorname{vol}(\bar{S})$$
      When we use this method to partition the Karate club network, we get reasonable solutions.
      • Inspired by Dhillon et al.’s work on Graclus
      • We have a multilevel method to optimize the graph case.
  18. 18. We get state of the art clustering performance on vector and graph datasets. F1 scores on vector datasets from the Mulan repository:

      | dataset | moc | fuzzy | esp | isp | okm | rokm | NEO |
      |---|---|---|---|---|---|---|---|
      | synth1 | 0.833 | 0.959 | 0.977 | 0.985 | 0.989 | 0.969 | 0.996 |
      | synth2 | 0.836 | 0.957 | 0.952 | 0.973 | 0.967 | 0.975 | 0.996 |
      | synth3 | 0.547 | 0.919 | 0.968 | 0.952 | 0.970 | 0.928 | 0.996 |
      | yeast | - | 0.308 | 0.289 | 0.203 | 0.311 | 0.203 | 0.366 |
      | music | 0.534 | 0.533 | 0.527 | 0.508 | 0.527 | 0.454 | 0.550 |
      | scene | 0.467 | 0.431 | 0.572 | 0.586 | 0.571 | 0.593 | 0.626 |

      The Mulan test set has a number of appropriate datasets:

      | dataset | n | dim. | $\overline{|C|}$ | outliers | k |
      |---|---|---|---|---|---|
      | synth1 | 5,000 | 2 | 2,750 | 0 | 2 |
      | synth2 | 1,000 | 2 | 550 | 5 | 2 |
      | synth3 | 6,000 | 2 | 3,600 | 6 | 2 |
      | yeast | 2,417 | 103 | 731.5 | 0 | 14 |
      | music | 593 | 72 | 184.7 | 0 | 6 |
      | scene | 2,407 | 294 | 430.8 | 0 | 6 |
  19. 19. NEO-K-Means with Lloyd’s is fast and usually accurate but inconsistent. [Figure: a more complicated overlapping test case, and the output from NEO-K-Means with Lloyd’s method. Legend: Cluster 1, Cluster 2, Cluster 1 & 2, Cluster 3, Not assigned.]
  20. 20. Can we get a more robust method? Yes!
  21. 21. Towards better optimization of the objective 1.  An SDP relaxation of the objective. 2.  A practical low-rank SDP heuristic. 3.  Faster optimization methods for the heuristic.
  22. 22. From assignments to co-occurrence matrices. There are three key variables in our formulation:
      1. The co-occurrence matrix $Z = \sum_j \frac{W u_j u_j^T W}{u_j^T W u_j}$
      2. The overlap vector $f$
      3. The assignment indicator $g$
      $$U = \begin{bmatrix} 1 & 0 \\ 1 & 1 \\ 0 & 1 \\ 0 & 0 \end{bmatrix}, \qquad f = \begin{bmatrix} 1 \\ 2 \\ 1 \\ 0 \end{bmatrix}, \qquad g = \begin{bmatrix} 1 \\ 1 \\ 1 \\ 0 \end{bmatrix}$$
  23. 23. We can convert our objective into a trace minimization problem. With $K_{ij} = \phi(x_i)^T \phi(x_j)$ and $d_i = w_i K_{ii}$, the objective function is
      $$\sum_{ij} U_{ij} w_i \|\phi(x_i) - m_j\|^2 = \sum_{ij} U_{ij} w_i K_{ii} - \sum_j \frac{u_j^T W K W u_j}{u_j^T W u_j} = f^T d - \operatorname{trace}(KZ)$$
      where $Z$ = normalized co-occurrence, $f$ = overlap count, $g$ = assignment indicator.
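  24. 24. There is an SDP-like framework to solve NEO-K-means.
      $$\begin{aligned}
      \underset{Z, f, g}{\text{maximize}} \quad & \operatorname{trace}(KZ) - f^T d \\
      \text{subject to} \quad & \operatorname{trace}(W^{-1} Z) = k, && \text{(a)} \\
      & Z_{ij} \ge 0, && \text{(b)} \\
      & Z \succeq 0, \; Z = Z^T, && \text{(c)} \\
      & Ze = Wf, && \text{(d)} \\
      & e^T f = (1 + \alpha) n, && \text{(e)} \\
      & e^T g \ge (1 - \beta) n, && \text{(f)} \\
      & f \ge g, && \text{(g)} \\
      & \operatorname{rank}(Z) = k, && \text{(h)} \\
      & f \in \mathbb{Z}^n_{\ge 0}, \; g \in \{0, 1\}^n. && \text{(i)}
      \end{aligned}$$
      Constraints (a)-(d): Z must come from an assignment matrix. Constraints (e)-(g): overlap and assignment constraints. Constraints (h)-(i): combinatorial constraints.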
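To see how Z, f and g arise from an assignment matrix, the sketch below builds them for the 4-by-2 example U used in the deck and checks constraints (a), (d) and (g), together with the identity that the weighted objective equals f^T d − trace(KZ). The weights, the points and the choice of a linear kernel are hypothetical values chosen for illustration.

```python
U = [[1, 0], [1, 1], [0, 1], [0, 0]]                     # example assignment matrix
w = [1.0, 2.0, 1.0, 3.0]                                 # hypothetical weights
X = [[0.0, 1.0], [1.0, 1.0], [2.0, 0.0], [5.0, 5.0]]     # hypothetical points
n, k, dim = len(U), len(U[0]), len(X[0])

# Linear kernel K_ij = x_i^T x_j (phi is the identity map in this sketch).
K = [[sum(a * b for a, b in zip(X[i], X[p])) for p in range(n)] for i in range(n)]
f = [sum(row) for row in U]                # overlap count: clusters per point
g = [1 if fi > 0 else 0 for fi in f]       # assignment indicator
d = [w[i] * K[i][i] for i in range(n)]

# Z = sum_j W u_j u_j^T W / (u_j^T W u_j)
Z = [[0.0] * n for _ in range(n)]
for j in range(k):
    Wu = [w[i] * U[i][j] for i in range(n)]
    s = sum(Wu[i] * U[i][j] for i in range(n))
    for i in range(n):
        for p in range(n):
            Z[i][p] += Wu[i] * Wu[p] / s

trace_KZ = sum(K[i][p] * Z[p][i] for i in range(n) for p in range(n))
trace_form = sum(fi * di for fi, di in zip(f, d)) - trace_KZ

# Direct evaluation of sum_ij U_ij w_i ||x_i - m_j||^2 with weighted centroids.
direct = 0.0
for j in range(k):
    members = [i for i in range(n) if U[i][j]]
    s = sum(w[i] for i in members)
    m = [sum(w[i] * X[i][t] for i in members) / s for t in range(dim)]
    direct += sum(w[i] * sum((X[i][t] - m[t]) ** 2 for t in range(dim)) for i in members)

print(direct, trace_form)
```

The SDP maximizes trace(KZ) − f^T d, which is the negative of the quantity compared here.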
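  25. 25. There is an SDP-relaxation to approximate NEO-K-means.
      $$\begin{aligned}
      \underset{Z, f, g}{\text{maximize}} \quad & \operatorname{trace}(KZ) - f^T d \\
      \text{subject to} \quad & \operatorname{trace}(W^{-1} Z) = k, && \text{(a)} \\
      & Z_{ij} \ge 0, && \text{(b)} \\
      & Z \succeq 0, \; Z = Z^T, && \text{(c)} \\
      & Ze = Wf, && \text{(d)} \\
      & e^T f = (1 + \alpha) n, && \text{(e)} \\
      & e^T g \ge (1 - \beta) n, && \text{(f)} \\
      & f \ge g, && \text{(g)} \\
      & 0 \le g \le 1
      \end{aligned}$$
      Constraints (a)-(d): Z must come from an assignment matrix. Constraints (e)-(g): overlap and assignment constraints. The last line holds the relaxed constraints.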
  26. 26. This SDP can easily solve simple problems. [Figure: NEO-K-Means SDP.] Solution Z from CVX is even rank 2!
  27. 27. But SDP methods have a number of issues for large-scale problems.
      1. The number of variables is quadratic in the number of data points.
      2. The best solvers can only solve problems with a few hundred or thousand points.
      So like many before us (e.g. Burer & Monteiro; Kulis, Surendran, and Platt 2007; and more) we optimize a low-rank factorization of the solution.
  28. 28. Using the NEO-K-Means Low-Rank SDP, we can find assignments directly. [Figure: NEO-K-Means low-rank SDP; panels $Y$ and $Y Y^T$.] $\|Z - Y Y^T\| = 2.3 \times 10^{-4}$
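  29. 29. The Low-Rank NEO-K-Means SDP. We lose convexity but gain practicality. We introduce slacks at this point.
      $$\begin{aligned}
      \underset{Y, f, g, s, r}{\text{maximize}} \quad & \operatorname{trace}(Y^T K Y) - f^T d \\
      \text{subject to} \quad & k = \operatorname{trace}(Y^T W^{-1} Y) \\
      & 0 = Y Y^T e - W f && \text{(icky non-convex term)} \\
      & 0 = e^T f - (1 + \alpha) n \\
      & 0 = f - g - s \\
      & 0 = e^T g - (1 - \beta) n - r \\
      & Y_{ij} \ge 0, \; s \ge 0, \; r \ge 0 \\
      & 0 \le f \le k e, \; 0 \le g \le 1 && \text{(simple bound constraints)}
      \end{aligned}$$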
  30. 30. We use an augmented Lagrangian method to optimize this problem. [Slide shows an excerpt from the paper; the recoverable text follows.]
      Let $\lambda = [\lambda_1; \lambda_2; \lambda_3]$ be the Lagrange multipliers associated with the three scalar constraints (s), (u), (w), and $\mu$ and $\gamma$ be the Lagrange multipliers associated with the vector constraints (t) and (v), respectively. Let $\sigma \ge 0$ be a penalty parameter. The augmented Lagrangian for (4) is:
      $$\begin{aligned}
      L_A(Y, f, g, s, r; \lambda, \mu, \gamma, \sigma) = {} & \underbrace{f^T d - \operatorname{trace}(Y^T K Y)}_{\text{the objective}} \\
      & - \lambda_1 \left(\operatorname{trace}(Y^T W^{-1} Y) - k\right) + \frac{\sigma}{2} \left(\operatorname{trace}(Y^T W^{-1} Y) - k\right)^2 \\
      & - \mu^T (Y Y^T e - W f) + \frac{\sigma}{2} (Y Y^T e - W f)^T (Y Y^T e - W f) \\
      & - \lambda_2 \left(e^T f - (1 + \alpha) n\right) + \frac{\sigma}{2} \left(e^T f - (1 + \alpha) n\right)^2 \\
      & - \gamma^T (f - g - s) + \frac{\sigma}{2} (f - g - s)^T (f - g - s) \\
      & - \lambda_3 \left(e^T g - (1 - \beta) n - r\right) + \frac{\sigma}{2} \left(e^T g - (1 - \beta) n - r\right)^2 \qquad (5)
      \end{aligned}$$
      At each step in the augmented Lagrangian solution framework, we solve the following subproblem: minimize $L_A(Y, f, g, s, r; \lambda, \mu, \gamma, \sigma)$.
      B. GRADIENTS FOR NEO-LR. We now describe the analytic form of the gradients for the augmented Lagrangian of the NEO-LR objective and a brief validation that these are correct. Consider the augmented Lagrangian (5). The gradient has five components for the five sets of variables $Y$, $f$, $g$, $s$ and $r$:
      $$\begin{aligned}
      \nabla_Y L_A = {} & -2 K Y - e \mu^T Y - \mu e^T Y - 2 \left(\lambda_1 - \sigma \left(\operatorname{tr}(Y^T W^{-1} Y) - k\right)\right) W^{-1} Y \\
      & + \sigma \left(Y Y^T e e^T Y + e e^T Y Y^T Y\right) - \sigma \left(W f e^T Y + e f^T W Y\right) \\
      \nabla_f L_A = {} & d + W \mu - \sigma \left(W Y Y^T e - W^2 f\right) - \lambda_2 e + \sigma \left(e^T f - (1 + \alpha) n\right) e - \gamma + \sigma (f - g - s) \\
      \nabla_g L_A = {} & \gamma - \sigma (f - g - s) - \lambda_3 e + \sigma \left(e^T g - (1 - \beta) n - r\right) e \\
      \nabla_s L_A = {} & \gamma - \sigma (f - g - s) \\
      \nabla_r L_A = {} & \lambda_3 - \sigma \left(e^T g - (1 - \beta) n - r\right)
      \end{aligned}$$
      Using analytic gradients in a black-box solver such as L-BFGS-B is problematic if the gradients are even slightly incorrectly computed. To guarantee the analytic gradients we derive are correct, we use a forward finite difference method to get a numerical approximation of the gradients based on the objective function. We compare these with our analytic gradient and expect to see small relative differences on the order of $10^{-5}$ or $10^{-6}$. This is exactly what Figure 4 shows.
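The forward finite-difference validation described in the excerpt can be sketched for the f-block of the augmented Lagrangian. Only the terms that depend on f are kept, the vector `a` stands in for Y Y^T e, W is taken to be diagonal with entries `w`, and every numerical value below is a hypothetical placeholder rather than data from the paper.

```python
import math

# Hypothetical problem data (n = 4).
d = [1.0, 2.0, 0.5, 1.5]
w = [1.0, 2.0, 1.0, 0.5]          # diagonal of W
a = [0.9, 3.8, 1.2, 0.1]          # stands in for Y Y^T e
mu = [0.1, -0.2, 0.3, 0.0]
gam = [0.05, 0.0, -0.1, 0.2]
lam2, sigma = 0.4, 2.0
g = [1.0, 1.0, 1.0, 0.0]
s = [0.0, 0.5, 0.0, 0.0]
c = (1 + 0.25) * 4                # (1 + alpha) n with alpha = 0.25, n = 4


def lagrangian_f(f):
    """Terms of the augmented Lagrangian that depend on f."""
    h = [a[i] - w[i] * f[i] for i in range(4)]      # Y Y^T e - W f
    t = [f[i] - g[i] - s[i] for i in range(4)]      # f - g - s
    ef = sum(f) - c                                 # e^T f - (1 + alpha) n
    return (sum(f[i] * d[i] for i in range(4))
            - sum(mu[i] * h[i] for i in range(4)) + 0.5 * sigma * sum(x * x for x in h)
            - lam2 * ef + 0.5 * sigma * ef * ef
            - sum(gam[i] * t[i] for i in range(4)) + 0.5 * sigma * sum(x * x for x in t))


def grad_f(f):
    """Analytic gradient: d + W mu - sigma W h - lam2 e + sigma ef e - gamma + sigma t."""
    ef = sum(f) - c
    return [d[i] + w[i] * mu[i] - sigma * w[i] * (a[i] - w[i] * f[i])
            - lam2 + sigma * ef - gam[i] + sigma * (f[i] - g[i] - s[i])
            for i in range(4)]


def forward_difference(fun, x, h=1e-6):
    base = fun(x)
    return [(fun(x[:i] + [x[i] + h] + x[i + 1:]) - base) / h for i in range(len(x))]


f0 = [1.0, 1.6, 1.1, 0.2]
analytic = grad_f(f0)
numeric = forward_difference(lagrangian_f, f0)
rel_diff = (math.sqrt(sum((p - q) ** 2 for p, q in zip(analytic, numeric)))
            / math.sqrt(sum(p * p for p in analytic)))
print(rel_diff)
```

A relative difference far above the step size would point to a sign or indexing error in the analytic gradient.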
  31. 31. We use an augmented Lagrangian method to optimize this problem.
      • Use L-BFGS-B to optimize each step.
      • Update the multiplier estimates in the standard way.
      • Pick parameters in a modestly standard way.
      • Some variability between problems to show best results, only a little variation in time/performance.
      • Faster than the NEOS solvers.
      Comparison with solvers on the NEOS Server (NEOS Server: State-of-the-Art Solvers for Numerical Optimization). Our solver with the ALM approach is much faster than theirs (e.g., SNOPT, which is suitable for large nonlinearly constrained problems with a modest number of degrees of freedom).

      | dataset | Our solver ALM (obj / time) | SNOPT solver (obj / time) |
      |---|---|---|
      | MUSIC | 79514.130 / 92s | 79515.156 / 306s |
      | SCENE | 18534.030 / 3798s | 18534.021 / 8910s |
      | YEAST | 8902.253 / 4331s | Not solved |
  32. 32. We win with our LRSDP solver vs. the CVX default solver.
      • Dolphins (n=62) and Les Mis (n=77) are graph problems.
      • LRSDP is much faster and just as accurate.
      LRSDP is roughly an order of magnitude faster than CVX. LRSDP generates solutions as good as the global optimum from CVX. The objective values are different in light of the solution tolerances. Dolphins: 62 nodes, 159 edges (D. Lusseau et al., Behavioral Ecology and Sociobiology, 2003). Les Miserables: 77 nodes, 254 edges (D. E. Knuth, The Stanford GraphBase: A Platform for Combinatorial Computing, Addison-Wesley, 1993).

      | graph | parameters | objective (SDP) | objective (LRSDP) | run time (SDP) | run time (LRSDP) |
      |---|---|---|---|---|---|
      | dolphins | k=2, α=0.2, β=0 | -1.968893 | -1.968329 | 107.03 secs | 2.55 secs |
      | dolphins | k=2, α=0.2, β=0.05 | -1.969080 | -1.968128 | 56.99 secs | 2.96 secs |
      | dolphins | k=3, α=0.3, β=0 | -2.913601 | -2.915384 | 160.57 secs | 5.39 secs |
      | dolphins | k=3, α=0.3, β=0.05 | -2.921634 | -2.922252 | 71.83 secs | 8.39 secs |
      | les miserables | k=2, α=0.2, β=0 | -1.937268 | -1.935365 | 453.96 secs | 7.10 secs |
      | les miserables | k=2, α=0.3, β=0 | -1.949212 | -1.945632 | 447.20 secs | 10.24 secs |
      | les miserables | k=3, α=0.2, β=0.05 | -2.845720 | -2.845070 | 261.64 secs | 13.53 secs |
      | les miserables | k=3, α=0.3, β=0.05 | -2.859959 | -2.859565 | 267.07 secs | 19.31 secs |

      (Embedded slide: Yangyang Hou (Purdue CS), Low Rank Methods for Optimizing Clustering, Nov 2, 2015.)
  33. 33. Rounding and Improvement are both important. Input → Relaxed solution → Rounded solution → Improved solution.
      • Rounding: f gives the number of clusters; g gives the set of assignments.
        Option 1: Use g and f to determine the number of assignments and go greedy.
        Option 2: Just greedily assign based on $W^{-1} Y$.
      • Improvement: Run NEO-K-Means on the output.
      • Initialization: Run NEO-K-Means on the input.
  34. 34. The new method is more robust, even in simple tests. Consider clustering a cycle graph.
  35. 35. We use disconnected nodes to measure the cluster quality. [Figure: number of disconnected nodes vs. noise for random+onelevel neo, multilevel neo, and lrsdp.] As we increase the noise, only the LRSDP method can reliably find the true clustering.
  36. 36. We get improved vector and graph clustering results too.
      Comparison of NEO-K-Means objective function values on real-world datasets from Mulan (mulan.sourceforge.net/datasets.html). By using the LRSDP solution as the initialization of the iterative algorithm, we can achieve better (smaller) objective function values.

      | dataset | method | worst | best | avg. |
      |---|---|---|---|---|
      | yeast | kmeans+neo | 9611 | 9495 | 9549 |
      | yeast | lrsdp+neo | 9440 | 9280 | 9364 |
      | yeast | slrsdp+neo | 9471 | 9231 | 9367 |
      | music | kmeans+neo | 87779 | 70158 | 77015 |
      | music | lrsdp+neo | 82323 | 70157 | 75923 |
      | music | slrsdp+neo | 82336 | 70159 | 75926 |
      | scene | kmeans+neo | 18905 | 18745 | 18806 |
      | scene | lrsdp+neo | 18904 | 18759 | 18811 |
      | scene | slrsdp+neo | 18895 | 18760 | 18810 |

      F1 scores on real-world vector datasets (the larger, the better). NEO-K-Means-based methods outperform other methods. The low-rank SDP method improves the clustering results.

      | dataset | | moc | esp | isp | okm | kmeans+neo | lrsdp+neo | slrsdp+neo |
      |---|---|---|---|---|---|---|---|---|
      | yeast | worst | - | 0.274 | 0.232 | 0.311 | 0.356 | 0.390 | 0.369 |
      | yeast | best | - | 0.289 | 0.256 | 0.323 | 0.366 | 0.391 | 0.391 |
      | yeast | avg. | - | 0.284 | 0.248 | 0.317 | 0.360 | 0.391 | 0.382 |
      | music | worst | 0.530 | 0.514 | 0.506 | 0.524 | 0.526 | 0.537 | 0.541 |
      | music | best | 0.544 | 0.539 | 0.539 | 0.531 | 0.551 | 0.552 | 0.552 |
      | music | avg. | 0.538 | 0.526 | 0.517 | 0.527 | 0.543 | 0.545 | 0.547 |
      | scene | worst | 0.466 | 0.569 | 0.586 | 0.571 | 0.597 | 0.610 | 0.605 |
      | scene | best | 0.470 | 0.582 | 0.609 | 0.576 | 0.627 | 0.614 | 0.625 |
      | scene | avg. | 0.467 | 0.575 | 0.598 | 0.573 | 0.610 | 0.613 | 0.613 |

      We have improved results, impressively so on the yeast dataset, and only slightly worse on the scene data.
  37. 37. We get improved vector and graph clustering results too.

      | method | Facebook1 | Facebook2 | HepPh | AstroPh |
      |---|---|---|---|---|
      | bigclam | 0.830 | 0.640 | 0.625 | 0.645 |
      | demon | 0.495 | 0.318 | 0.503 | 0.570 |
      | oslom | 0.319 | 0.445 | 0.465 | 0.580 |
      | nise | 0.297 | 0.293 | 0.102 | 0.153 |
      | m-neo | 0.285 | 0.269 | 0.206 | 0.190 |
      | LRSDP | 0.222 | 0.148 | 0.091 | 0.137 |

      | graph | No. of vertices | No. of edges |
      |---|---|---|
      | Facebook1 | 348 | 2,866 |
      | Facebook2 | 756 | 30,780 |
      | HepPh | 11,204 | 117,619 |
      | AstroPh | 17,903 | 196,972 |

      For these graphs, we dramatically improve the conductance-vs-coverage plots.
  38. 38. Lloyd’s iterative method takes O(1 second). The LRSDP method takes O(1 hour). Now we want to improve the LRSDP time.
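  39. 39. We can improve the optimization beyond ALM.
      1. Proximal augmented Lagrangian (PALM): add a regularization term to the augmented Lagrangian and solve with L-BFGS-B.
         $$x^{(k+1)} = \operatorname*{argmin}_x \; L_A(x; \lambda^{(k)}, \ldots) + \frac{1}{2\tau} \|x - x^{(k)}\|^2$$
      2. ADMM method (5 blocks):
         $$\begin{aligned}
         Y^{k+1} &= \operatorname*{argmin}_Y \; L_A(Y, f^k, g^k, s^k, r^k; \lambda^k, \mu^k, \gamma^k, \sigma) \\
         f^{k+1} &= \operatorname*{argmin}_f \; L_A(Y^{k+1}, f, g^k, s^k, r^k; \lambda^k, \mu^k, \gamma^k, \sigma) \\
         g^{k+1} &= \operatorname*{argmin}_g \; L_A(Y^{k+1}, f^{k+1}, g, s^k, r^k; \lambda^k, \mu^k, \gamma^k, \sigma) \\
         s^{k+1} &= \operatorname*{argmin}_s \; L_A(Y^{k+1}, f^{k+1}, g^{k+1}, s, r^k; \lambda^k, \mu^k, \gamma^k, \sigma) \\
         r^{k+1} &= \operatorname*{argmin}_r \; L_A(Y^{k+1}, f^{k+1}, g^{k+1}, s^{k+1}, r; \lambda^k, \mu^k, \gamma^k, \sigma)
         \end{aligned}$$
         The slide labels the subproblems as convex (☺) or non-convex (☹).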
  40. 40. We had to get a new convergence result for the proximal method. Results for bound-constrained sub-problems? Ours is a small adaptation of a general result due to Pennanen (2002).
      Convergence analysis of PALM, Theorem 1: Let $(\bar{x}, \bar{\lambda})$ be a KKT pair satisfying the strong second-order sufficient condition and assume the gradients $\nabla c(\bar{x})$ are linearly independent. If the $\{\tau_k\}$ are large enough with $\tau_k \to \bar{\tau} \le \infty$ and if $\|(x_0, \lambda_0) - (\bar{x}, \bar{\lambda})\|$ is small enough, then there exists a sequence $\{(x_k, \lambda_k)\}$ conforming to Algorithm 1 along with open neighborhoods $C_k$ such that for each $k$, $x_{k+1}$ is the unique solution in $C_k$ to $(P_k)$. Then also, the sequence $\{(x_k, \lambda_k)\}$ converges linearly and Fejér monotonically to $(\bar{x}, \bar{\lambda})$ with rate $r(\bar{\tau}) < 1$ that is decreasing in $\bar{\tau}$, and $r(\bar{\tau}) \to 0$ as $\bar{\tau} \to \infty$.
  41. 41. On the yeast dataset, we see no difference in objective, but faster solves. [Figure: runtimes on YEAST for iterative, ALM, PALM, ADMM; f(x) values on YEAST for ALM, PALM, ADMM.]
  42. 42. On yeast, we see much better discrete objectives and F1 scores. [Figure: NEO-K-Means objectives on YEAST and F1 scores on YEAST for iterative, ALM, PALM, ADMM.]
  43. 43. Recap. For overlapping clustering of data and overlapping community detection of graphs, we have a new objective:
      • Fast Lloyd-like iterative algorithm
      • SDP relaxation
      • Low-rank SDP relaxation
      • Proximal and ADMM acceleration techniques
      1. NEO-K-means - Whang, Gleich, Dhillon, SDM 2015
      2. NEO-K-means SDP + Aug. Lagrangian - Hou, Whang, Gleich, Dhillon, KDD 2015
      3. Multiplier Methods for Overlapping K-Means - Hou, Whang, Gleich, Dhillon, Submitted
  44. 44. Localized solutions of diffusion equations in large graphs. Joint with Kyle Kloster. WAW2013, KDD2014, WAW2015; J. Internet Math. [Figure: plot(x) and log-log error plots against the number of nonzeros. Crawl of flickr from 2006, ~800k nodes, 6M edges, beta = 1/2; $(I - \beta P) x = (1 - \beta) s$; $\operatorname{nnz}(x) \approx 800\text{k}$; error measured as $\|D^{-1}(x - x^*)\|_1$.]
      [Slide also shows an article excerpt; the recoverable text follows.]
      [...] the answer [5]. Thus, just as in scientific computing, marrying the method to the model is key for the best scientific computing on social networks. Ultimately, none of these steps differ from the practice of physical scientific computing. The challenges in creating models, devising algorithms, validating results, and comparing models just take on different challenges when the problems come from social data instead of physical models. Thus, let us return to our starting question: What does the matrix have to do with the social network? Just as in scientific computing, many interesting problems, models, and methods for social networks boil down to matrix computations. Yet, as in the expander example above, the types of matrix questions change dramatically in order to fit social network models. Let’s see what’s been done that’s enticingly and refreshingly different from the types of matrix computations encountered in physical scientific computing.
      EXPANDER GRAPHS AND PARALLEL COMPUTING. Recently, a coalition of folks from academia, national labs, and industry set out to tackle the problems in parallel computing and expander graphs. They established the Graph 500 benchmark (http://www.graph500.org) to measure the performance of a parallel computer on a standard graph computation with an expander graph. Over the past three years, they’ve seen performance grow by more than 1,000-times [...]
      [Figure panels: "Diffusion in a plate" and "Movie interest in diffusion". Caption fragments: The network, or mesh, from a typical problem in scientific computing [...] a low dimensional space (think of two or three dimensions). These physical [...] limits on the size of the boundary or “surface area” of the space given its [...] No such limits exist in social networks and these two sets are usually about [...] size. A network with this property is called an expander network.] Size of set » Size of boundary: “Networks” from PDEs are usually physical. Social networks are expanders.
  45. 45. Higher order organization of complex networks. Joint with Austin Benson and Jure Leskovec. [Figure: network diagram with numbered nodes and neuron labels (CEPDR, CEPVR, IL2R, OLLR, RIAL, RIAR, RIVL, RIVR, RMDDR, RMDL, RMDR, RMDVL, RMFL, SMDDL, SMDDR, SMDVR, URBR).] By using a new generalization of spectral clustering methods, we are able to find completely novel and relevant structures in complex systems such as the connectome and transport networks.
  46. 46. SIAM Annual Meeting (AN16), July 11-15, 2016, The Westin Waterfront, Boston, Massachusetts. David Gleich, Purdue; Mary Silber, Northwestern.
      • Big Data, Data Science, and Privacy
      • Education, Communication, and Policy
      • Reproducibility and Ethics
      • Efficiency and Optimization
      • Integrating Models and Data (incl. computational social science, PDEs)
      • Dynamic Networks (learning, evolution, adaptation, and cooperation)
      • Applied Math, Statistics, and Machine Learning
      • Earth systems; environmental/ecological applications
      • Epidemiology
  47. 47. Future work
      • Even faster solvers.
      • Understand why the solution seems to be rank-2. (Solution Z from CVX is even rank 2!)
      • Better init for Lloyd’s.
      1. NEO-K-means - Whang, Gleich, Dhillon, SDM 2015
      2. NEO-K-means SDP + Aug. Lagrangian - Hou, Whang, Gleich, Dhillon, KDD 2015
      3. Multiplier Methods for Overlapping K-Means - Hou, Whang, Gleich, Dhillon, Submitted
