Non-exhaustive, overlapping K-means clustering
David F. Gleich
Purdue University
Real-world graph and point data have overlapping clusters.
[Figure: gene-expression heatmap over a few hundred gene probes; gene IDs omitted.]
Social networks have overlapping clusters because of social circles.
Genes have overlapping clusters due to their role in multiple functions.
Overlapping research projects are what got me here too!
•  PhD Thesis on Google's PageRank
•  MSR Intern and Overlapping Clusters for Distributed Computation
•  Accelerated NCP plots and locally minimal communities
•  Neighborhood inflated seed expansion for overlapping communities
•  Non-exhaustive overlapping K-means

1.  NISE Clustering - Whang, Gleich, Dhillon, CIKM 2013
2.  NEO-K-means - Whang, Gleich, Dhillon, SDM 2015
3.  NEO-K-means SDP - Hou, Whang, Gleich, Dhillon, KDD 2015
4.  Multiplier Methods for Overlapping K-Means - Hou, Whang, Gleich, Dhillon, Submitted
Overlapping communities via seed set expansion works nicely.
The pipeline: Filtering Phase → Seeding Phase → Seed Set Expansion Phase → Propagation Phase.
[Figure 2 from the CIKM 2013 paper: maximum conductance vs. graph coverage (percentage) on (a) AstroPh and (d) Flickr, for the seeding strategies egonet, graclus centers, spread hubs, random, and bigclam; "graclus centers" outperforms the other seeding strategies.]
We can cover 95% of the network with communities of conductance ~0.15.
Flickr social network: 2M vertices, 22M edges.
cond(S) = cut(S) / "size"(S)
We wanted a more principled
approach to achieve these results.
The state of the art for clustering: K-Means
Problem 1 😀  Problem 2 😊  Problem 3 😟  Problem 4 😢
K-Means handles the first two problems well, but not the last two.

The state of the art for clustering: K-Means and NEO-K-Means
Problem 1 😀  Problem 2 😊  Problem 3 😊  Problem 4 😊
NEO-K-Means handles the cases where K-Means struggles.
K-means as optimization.
[Diagram: a point $x_i$ with distances $\|x_i - m_1\|$ and $\|x_i - m_2\|$ to centroids $m_1$ and $m_2$.]
Input: points $x_1, \dots, x_n$. Find an assignment matrix $U$ that gives cluster assignments to minimize the objective.

The K-means objective:
$$\text{minimize} \sum_{ij} U_{ij} \|x_i - m_j\|^2 \quad \text{subject to } U \text{ is an assignment to clusters},\; m_j = \tfrac{1}{\sum_i U_{ij}} \sum_i U_{ij} x_i$$

For example, with $x_1, x_2$ in cluster $c_1$ and $x_3, x_4$ in cluster $c_2$:
$$U = \begin{bmatrix} 1 & 0 \\ 1 & 0 \\ 0 & 1 \\ 0 & 1 \end{bmatrix}$$

K-means' objective with overlap? Simply allow multi-assignment:
$$\text{minimize} \sum_{ij} U_{ij} \|x_i - m_j\|^2 \quad \text{subject to } U \text{ is a multi-assignment to clusters},\; m_j = \tfrac{1}{\sum_i U_{ij}} \sum_i U_{ij} x_i$$
Overlap is not a natural addition to optimization-based clustering.
The NEO-K-means objective balances overlap and outliers.

$$\begin{aligned} \text{minimize} \quad & \sum_{ij} U_{ij} \|x_i - m_j\|^2 \\ \text{subject to} \quad & U_{ij} \text{ is binary}, \\ & \mathrm{trace}(U^T U) = (1+\alpha)n \quad (\alpha n \text{ overlap}), \\ & e^T \mathrm{Ind}[Ue] \ge (1-\beta)n \quad (\text{up to } \beta n \text{ outliers}), \\ & m_j = \tfrac{1}{\sum_i U_{ij}} \sum_i U_{ij} x_i \end{aligned}$$

•  If $\alpha, \beta = 0$, then we get back to K-means.
•  Automatically choose $\alpha, \beta$ based on K-means. 😊

1. Make $(1+\alpha)n$ total assignments.
2. Allow up to $\beta n$ outliers.
[Figure: synthetic points labeled Cluster 1, Cluster 2, Cluster 1 & 2, or not assigned.]
Lloyd's algorithm for NEO-K-means is just a wee bit more complex.

Until done:
1. Update centroids.
2. Assign the $(1-\beta)n$ points nearest a centroid to their closest centroid.
3. Make the remaining $(\alpha+\beta)n$ assignments by minimizing distance.

[Figure: the resulting assignments on the example, with Cluster 1, Cluster 2, Cluster 1 & 2, and unassigned points.]
This algorithm correctly assigns our example case and even determines the overlap and outlier parameters!
THEOREM. Lloyd's algorithm decreases the objective monotonically.
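To make the loop concrete, here is a minimal NumPy sketch of one way to implement this Lloyd-style iteration; the initialization, tie-breaking, and fixed iteration count are illustrative assumptions, not the authors' reference implementation.

```python
import numpy as np

def neo_kmeans_lloyd(X, k, alpha, beta, iters=100, seed=0):
    # Minimal sketch of a Lloyd-style NEO-K-means iteration. Makes
    # (1 + alpha) * n assignments in total and leaves up to beta * n points
    # unassigned. Initialization, ties, and the fixed iteration count are
    # illustrative choices.
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    M = X[rng.choice(n, k, replace=False)]            # initial centroids
    n_total = int((1 + alpha) * n)                    # total assignments
    n_sure = n - int(beta * n)                        # must-assign points
    U = np.zeros((n, k), dtype=bool)
    for _ in range(iters):
        D = ((X[:, None, :] - M[None, :, :]) ** 2).sum(-1)  # sq. distances
        U[:] = False
        # Step 2: assign the (1 - beta)n points nearest any centroid
        closest = D.min(axis=1)
        sure = np.argsort(closest)[:n_sure]
        U[sure, D[sure].argmin(axis=1)] = True
        # Step 3: make the remaining (alpha + beta)n assignments greedily
        # over the unused (point, cluster) pairs with smallest distance
        D_left = D.copy()
        D_left[U] = np.inf
        flat = np.argsort(D_left, axis=None)[: n_total - n_sure]
        U[np.unravel_index(flat, D.shape)] = True
        # Step 1 (next round): update centroids from the multi-assignments
        M = np.array([X[U[:, j]].mean(axis=0) if U[:, j].any() else M[j]
                      for j in range(k)])
    return U, M
```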
The non-exhaustiveness is
necessary for assignments.
[Figure: synthetic data (n=1,000, α=0.1, β=0.005); green points indicate overlap.
(b) First extension of k-means: output without the assignment constraint (β = 1).
(c) NEO-K-Means: the correct output.]
The Weighted, Kernel NEO-K-Means objective.
•  Introduce a weight $w_i$ for each data point.
•  Introduce a feature map $\phi(x_i)$ for each data point too.

$$\begin{aligned} \text{minimize} \quad & \sum_{ij} U_{ij} w_i \|\phi(x_i) - m_j\|^2 \\ \text{subject to} \quad & U_{ij} \text{ is binary}, \\ & \mathrm{trace}(U^T U) = (1+\alpha)n \quad (\alpha n \text{ overlap}), \\ & e^T \mathrm{Ind}[Ue] \ge (1-\beta)n \quad (\text{up to } \beta n \text{ outliers}), \\ & m_j = \tfrac{1}{\sum_i U_{ij} w_i} \sum_i U_{ij} w_i \phi(x_i) \end{aligned}$$

The kernel trick eliminates $\phi$:
$$\sum_{ij} U_{ij} w_i \|\phi(x_i) - m_j\|^2 = \sum_{ij} U_{ij} w_i K_{ii} - \sum_j \frac{u_j^T W K W u_j}{u_j^T W u_j}$$

THEOREM. If $K = \sigma D^{-1} + D^{-1} A D^{-1}$, then the NEO-K-Means objective is equivalent to overlapping conductance.

NOTE: This means that NEO-K-Means was the principled objective we were after!
Conductance communities
Conductance is one of the most important community scores [Schaeffer07].
The conductance of a set of vertices is the ratio of edges leaving the set (the cut) to the total edges in the set (the volume):
$$\phi(S) = \frac{\mathrm{cut}(S)}{\min\big(\mathrm{vol}(S), \mathrm{vol}(\bar S)\big)}$$
Equivalently, it's the probability that a random edge leaves the set.
Small conductance ⇔ good community.
Example: $\mathrm{cut}(S) = 7$, $\mathrm{vol}(S) = 33$, $\mathrm{vol}(\bar S) = 11$, so $\phi(S) = 7/11$.
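As a quick worked illustration of the definition, a small Python helper that computes this score (networkx is used here only for convenience and is not part of the original slides):

```python
import networkx as nx  # used only for convenience in this sketch

def conductance(G, S):
    # phi(S) = cut(S) / min(vol(S), vol(S-complement)) on an undirected graph
    S = set(S)
    cut = sum(1 for u, v in G.edges() if (u in S) != (v in S))
    vol_S = sum(d for _, d in G.degree(S))
    vol_rest = 2 * G.number_of_edges() - vol_S
    return cut / min(vol_S, vol_rest)

# e.g., on Zachary's karate club network:
# phi = conductance(nx.karate_club_graph(), range(17))
```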
Our theorem means that NEO-K-Means can optimize the sum-of-conductances objective.

Conductance is bounded by the normalized-cut bi-partition score:
$$\phi(S) \le \frac{\mathrm{cut}(S)}{\mathrm{vol}(S)} + \frac{\mathrm{cut}(\bar S)}{\mathrm{vol}(\bar S)}$$
and the NEO-K-Means objective satisfies
$$\sum_{S \in \mathcal{C}} \frac{\mathrm{cut}(S)}{\mathrm{vol}(S)} = \sum_{S \in \mathcal{C}} \phi(S) \quad \text{if } \mathrm{vol}(S) \le \mathrm{vol}(\bar S).$$

When we use this method to partition the Karate club network, we get reasonable solutions.
•  Inspired by Dhillon et al.'s work on Graclus.
•  We have a multilevel method to optimize the graph case.
We get state-of-the-art clustering performance on vector and graph datasets.

F1 scores on vector datasets from the Mulan repository:

        moc    fuzzy  esp    isp    okm    rokm   NEO
synth1  0.833  0.959  0.977  0.985  0.989  0.969  0.996
synth2  0.836  0.957  0.952  0.973  0.967  0.975  0.996
synth3  0.547  0.919  0.968  0.952  0.970  0.928  0.996
yeast   -      0.308  0.289  0.203  0.311  0.203  0.366
music   0.534  0.533  0.527  0.508  0.527  0.454  0.550
scene   0.467  0.431  0.572  0.586  0.571  0.593  0.626

Dataset statistics:

        n      dim.  avg |C|  outliers  k
synth1  5,000  2     2,750    0         2
synth2  1,000  2     550      5         2
synth3  6,000  2     3,600    6         2
yeast   2,417  103   731.5    0         14
music   593    72    184.7    0         6
scene   2,407  294   430.8    0         6

The Mulan test set has a number of appropriate datasets.
NEO-K-Means with Lloyd's method is fast and usually accurate, but inconsistent.
[Figure: a more complicated overlapping test case with three clusters, and two differing outputs from NEO-K-Means with Lloyd's method; labels: Cluster 1, Cluster 2, Cluster 1 & 2, Cluster 3, not assigned.]
Can we get a more robust method? Yes!
Towards better optimization of the objective
1.  An SDP relaxation of the objective.
2.  A practical low-rank SDP heuristic.
3.  Faster optimization methods for the heuristic.
From assignments to co-occurrence matrices

There are three key variables in our formulation:
1. The co-occurrence matrix
$$Z = \sum_j \frac{W u_j u_j^T W}{u_j^T W u_j}$$
2. The overlap vector $f$
3. The assignment indicator $g$

For example:
$$U = \begin{bmatrix} 1 & 0 \\ 1 & 1 \\ 0 & 1 \\ 0 & 0 \end{bmatrix}, \quad f = \begin{bmatrix} 1 \\ 2 \\ 1 \\ 0 \end{bmatrix}, \quad g = \begin{bmatrix} 1 \\ 1 \\ 1 \\ 0 \end{bmatrix}$$
We can convert our objective into a trace minimization problem.

With $K_{ij} = \phi(x_i)^T \phi(x_j)$ and $d_i = w_i K_{ii}$:
$$\sum_{ij} U_{ij} w_i \|\phi(x_i) - m_j\|^2 = \sum_{ij} U_{ij} w_i K_{ii} - \sum_j \frac{u_j^T W K W u_j}{u_j^T W u_j} = f^T d - \mathrm{trace}(KZ)$$

where $Z$ is the normalized co-occurrence matrix, $f$ the overlap count, and $g$ the assignment indicator. This is the objective function.
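The algebra above can be sanity-checked numerically. The following toy sketch (linear kernel, random weights, a hand-picked overlapping $U$; all illustrative assumptions) verifies that the weighted kernel objective equals $f^T d - \mathrm{trace}(KZ)$:

```python
import numpy as np

# Toy check (all data illustrative) that
#   sum_ij U_ij w_i ||phi(x_i) - m_j||^2  ==  f^T d - trace(K Z)
# with Z = sum_j W u_j u_j^T W / (u_j^T W u_j) and d_i = w_i K_ii.
rng = np.random.default_rng(1)
n, k = 6, 2
X = rng.normal(size=(n, 3))
K = X @ X.T                                   # linear kernel, phi(x) = x
w = rng.uniform(1.0, 2.0, n)
W = np.diag(w)
U = np.array([[1, 0], [1, 0], [1, 1], [0, 1], [0, 1], [0, 0]], float)
f = U.sum(axis=1)                             # overlap counts
d = w * np.diag(K)                            # d_i = w_i K_ii

lhs = 0.0
for j in range(k):
    uj = U[:, j]
    mj = (w * uj) @ X / (w @ uj)              # weighted cluster mean
    lhs += sum(uj[i] * w[i] * np.sum((X[i] - mj) ** 2) for i in range(n))

Z = sum(np.outer(W @ U[:, j], W @ U[:, j]) / (U[:, j] @ W @ U[:, j])
        for j in range(k))
assert np.isclose(lhs, f @ d - np.trace(K @ Z))
```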
There is an SDP-like framework to solve NEO-K-means.

$$\begin{aligned} \underset{Z, f, g}{\text{maximize}} \quad & \mathrm{trace}(KZ) - f^T d \\ \text{subject to} \quad & \mathrm{trace}(W^{-1} Z) = k, && (a) \\ & Z_{ij} \ge 0, && (b) \\ & Z \succeq 0,\; Z = Z^T, && (c) \\ & Ze = Wf, && (d) \\ & e^T f = (1+\alpha)n, && (e) \\ & e^T g \ge (1-\beta)n, && (f) \\ & f \ge g, && (g) \\ & \mathrm{rank}(Z) = k, && (h) \\ & f \in \mathbb{Z}^n_{\ge 0},\; g \in \{0,1\}^n. && (i) \end{aligned}$$

Constraints (a)-(d): $Z$ must come from an assignment matrix. Constraints (e)-(g): overlap and assignment constraints. Constraints (h)-(i): combinatorial constraints.
There is an SDP relaxation to approximate NEO-K-means: keep the assignment-matrix and overlap constraints, and relax the combinatorial ones.

$$\begin{aligned} \underset{Z, f, g}{\text{maximize}} \quad & \mathrm{trace}(KZ) - f^T d \\ \text{subject to} \quad & \mathrm{trace}(W^{-1} Z) = k, && (a) \\ & Z_{ij} \ge 0, && (b) \\ & Z \succeq 0,\; Z = Z^T, && (c) \\ & Ze = Wf, && (d) \\ & e^T f = (1+\alpha)n, && (e) \\ & e^T g \ge (1-\beta)n, && (f) \\ & f \ge g, && (g) \\ & 0 \le g \le 1. && \text{(relaxed)} \end{aligned}$$
This SDP can easily solve simple problems.
[Figure: NEO-K-Means assignments recovered from the SDP solution.]
The solution Z from CVX is even rank 2!
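For concreteness, here is a hedged CVXPY sketch of the relaxed problem above; the function name, the diagonal-$W$ assumption, and the choice of the SCS solver are illustrative, not the authors' CVX setup.

```python
import cvxpy as cp
import numpy as np

def neo_sdp_relaxation(K, W, k, alpha, beta):
    # Sketch of the relaxed NEO-K-Means SDP, constraints (a)-(g) plus the
    # relaxed 0 <= g <= 1. Assumes W is a diagonal weight matrix.
    n = K.shape[0]
    d = np.diag(W) * np.diag(K)                    # d_i = w_i * K_ii
    Z = cp.Variable((n, n), PSD=True)              # (c): symmetric PSD
    f = cp.Variable(n)
    g = cp.Variable(n)
    cons = [
        cp.trace(np.linalg.inv(W) @ Z) == k,       # (a)
        Z >= 0,                                    # (b) entrywise
        Z @ np.ones(n) == W @ f,                   # (d)
        cp.sum(f) == (1 + alpha) * n,              # (e)
        cp.sum(g) >= (1 - beta) * n,               # (f)
        f >= g,                                    # (g)
        f >= 0, g >= 0, g <= 1,                    # relaxed integrality
    ]
    prob = cp.Problem(cp.Maximize(cp.trace(K @ Z) - d @ f), cons)
    prob.solve(solver=cp.SCS)                      # any SDP-capable solver
    return Z.value, f.value, g.value
```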
But SDP methods have a number of issues for large-scale problems.
1.  The number of variables is quadratic in the number of data points.
2.  The best solvers can only handle problems with a few hundred or thousand points.

So, like many before us (e.g., Burer & Monteiro; Kulis, Surendran & Platt 2007; and more), we optimize a low-rank factorization of the solution.
Using the NEO-K-Means Low-Rank SDP, we can find assignments directly.

We factor $Z \approx YY^T$; empirically, $\|Z - YY^T\| = 2.3 \times 10^{-4}$.

The Low-Rank NEO-K-Means SDP:
$$\begin{aligned} \underset{Y, f, g, s, r}{\text{maximize}} \quad & \mathrm{trace}(Y^T K Y) - f^T d \\ \text{subject to} \quad & k = \mathrm{trace}(Y^T W^{-1} Y) \\ & 0 = YY^T e - Wf \\ & 0 = e^T f - (1+\alpha)n \\ & 0 = f - g - s \\ & 0 = e^T g - (1-\beta)n - r \\ & Y_{ij} \ge 0,\; s \ge 0,\; r \ge 0 \\ & 0 \le f \le ke,\; 0 \le g \le 1 \end{aligned}$$

We lose convexity but gain practicality. We introduce slacks ($s$, $r$) at this point. The $YY^T$ term is the icky non-convex piece; everything else is equality constraints plus simple bound constraints.
simple bound constraints
We use an augmented Lagrangian
method to optimize this problem
SILO Seminar
David Gleich · Purdue 
APPENDIX
A. AUGMENTED LAGRANGIANS
The augmented Lagrangian framework is a general strategy for solving nonlinear optimization problems with equality constraints. The low-rank factorization follows previous studies of low-rank SDP approximations [6]. Let $\lambda = [\lambda_1; \lambda_2; \lambda_3]$ be the Lagrange multipliers associated with the three scalar constraints (s), (u), (w), and let $\mu$ and $\gamma$ be the Lagrange multipliers associated with the vector constraints (t) and (v), respectively. Let $\rho \ge 0$ be a penalty parameter. The augmented Lagrangian for (4) is:

$$\begin{aligned} L_A(Y, f, g, s, r; \lambda, \mu, \gamma, \rho) = \; & \underbrace{f^T d - \mathrm{trace}(Y^T K Y)}_{\text{the objective}} \\ & - \lambda_1\big(\mathrm{trace}(Y^T W^{-1} Y) - k\big) + \tfrac{\rho}{2}\big(\mathrm{trace}(Y^T W^{-1} Y) - k\big)^2 \\ & - \mu^T (YY^T e - Wf) + \tfrac{\rho}{2}(YY^T e - Wf)^T (YY^T e - Wf) \\ & - \lambda_2\big(e^T f - (1+\alpha)n\big) + \tfrac{\rho}{2}\big(e^T f - (1+\alpha)n\big)^2 \\ & - \gamma^T (f - g - s) + \tfrac{\rho}{2}(f - g - s)^T (f - g - s) \\ & - \lambda_3\big(e^T g - (1-\beta)n - r\big) + \tfrac{\rho}{2}\big(e^T g - (1-\beta)n - r\big)^2 \quad (5) \end{aligned}$$

At each step in the augmented Lagrangian solution framework, we solve the subproblem: minimize $L_A(Y, f, g, s, r; \lambda, \mu, \gamma, \rho)$ subject to the bound constraints.

B. GRADIENTS FOR NEO-LR
We now describe the analytic form of the gradients for the augmented Lagrangian of the NEO-LR objective and briefly validate that they are correct. Consider the augmented Lagrangian (5). The gradient has five components, one for each set of variables $Y, f, g, s, r$:

$$\begin{aligned} \nabla_Y L_A &= -2KY - e\mu^T Y - \mu e^T Y - 2\big(\lambda_1 - \rho(\mathrm{tr}(Y^T W^{-1} Y) - k)\big) W^{-1} Y \\ &\quad + \rho(YY^T ee^T Y + ee^T YY^T Y) - \rho(Wfe^T Y + ef^T W Y) \\ \nabla_f L_A &= d + W\mu - \rho(WYY^T e - W^2 f) - \lambda_2 e + \rho\big(e^T f - (1+\alpha)n\big)e - \gamma + \rho(f - g - s) \\ \nabla_g L_A &= \gamma - \rho(f - g - s) - \lambda_3 e + \rho\big(e^T g - (1-\beta)n - r\big)e \\ \nabla_s L_A &= \gamma - \rho(f - g - s) \\ \nabla_r L_A &= \lambda_3 - \rho\big(e^T g - (1-\beta)n - r\big) \end{aligned}$$

Using analytic gradients in a black-box solver such as L-BFGS-B is problematic if the gradients are even slightly incorrect. To guarantee that the analytic gradients we derive are correct, we use a forward finite-difference method to get a numerical approximation of the gradients from the objective function. We compare these with our analytic gradients and expect to see small relative differences on the order of $10^{-5}$ or $10^{-6}$; this is exactly what Figure 4 shows.
We use an augmented Lagrangian method to optimize this problem.
•  Use L-BFGS-B to optimize each subproblem.
•  Update the multiplier estimates in the standard way.
•  Pick parameters in a modestly standard way: some variability between problems to show the best results, but only a little variation in time and performance.
•  Faster than the NEOS solvers.

Comparison with solvers on the NEOS Server (state-of-the-art solvers for numerical optimization): our ALM solver is much faster (e.g., than SNOPT, which is suited to large nonlinearly constrained problems with a modest number of degrees of freedom).

         Our ALM solver (obj/time)   SNOPT (obj/time)
MUSIC    79514.130 / 92s             79515.156 / 306s
SCENE    18534.030 / 3798s           18534.021 / 8910s
YEAST    8902.253 / 4331s            Not solved
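The overall loop looks roughly like the following toy sketch on a stand-in equality-constrained problem; the objective, constraint, and parameter choices are placeholders, not the NEO-K-Means augmented Lagrangian (5).

```python
import numpy as np
from scipy.optimize import minimize

# Toy sketch of the ALM loop: minimize f(x) subject to c(x) = 0 and bounds,
# by repeatedly solving a bound-constrained subproblem with L-BFGS-B and
# updating the multiplier estimate in the standard way.
f = lambda x: (x[0] - 2.0) ** 2 + (x[1] + 1.0) ** 2   # smooth objective
c = lambda x: x[0] - x[1] - 1.0                       # equality constraint

def alm(x0, rho=10.0, outer=20):
    x, lam = np.asarray(x0, float), 0.0
    for _ in range(outer):
        # Augmented Lagrangian for the current multiplier estimate
        L = lambda x: f(x) - lam * c(x) + 0.5 * rho * c(x) ** 2
        x = minimize(L, x, method="L-BFGS-B",
                     bounds=[(-5, 5), (-5, 5)]).x     # bound-constrained step
        lam -= rho * c(x)                             # multiplier update
    return x, lam

x_star, lam_star = alm([0.0, 0.0])    # approaches x = (1, 0) on c(x) = 0
```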
We win with our LRSDP solver vs. the CVX default solver.
•  Dolphins (n=62) and Les Mis (n=77) are graph problems.
•  LRSDP is much faster and just as accurate.

LRSDP is roughly an order of magnitude faster than CVX, and it generates solutions as good as CVX's global optima; the objective values differ only within the solution tolerances.

dolphins: 62 nodes, 159 edges; les miserables: 77 nodes, 254 edges.

                        Objective value           Run time
                        SDP        LRSDP          SDP          LRSDP
dolphins
k=2, α=0.2, β=0         -1.968893  -1.968329      107.03 secs  2.55 secs
k=2, α=0.2, β=0.05      -1.969080  -1.968128      56.99 secs   2.96 secs
k=3, α=0.3, β=0         -2.913601  -2.915384      160.57 secs  5.39 secs
k=3, α=0.3, β=0.05      -2.921634  -2.922252      71.83 secs   8.39 secs
les miserables
k=2, α=0.2, β=0         -1.937268  -1.935365      453.96 secs  7.10 secs
k=2, α=0.3, β=0         -1.949212  -1.945632      447.20 secs  10.24 secs
k=3, α=0.2, β=0.05      -2.845720  -2.845070      261.64 secs  13.53 secs
k=3, α=0.3, β=0.05      -2.859959  -2.859565      267.07 secs  19.31 secs

Dolphins from Lusseau et al., Behavioral Ecology and Sociobiology, 2003; Les Mis from Knuth, The Stanford GraphBase: A Platform for Combinatorial Computing, Addison-Wesley, 1993.
Rounding and improvement are both important.

Input → Relaxed solution → Rounded solution → Improved solution

Rounding: $f$ gives the number of clusters for each point and $g$ gives the set of assignments.
Option 1: use $g$ and $f$ to determine the number of assignments, then assign greedily.
Option 2: just greedily assign based on $W^{-1}Y$.

Improvement: run NEO-K-Means on the output.
Initialization: run NEO-K-Means on the input.
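A minimal sketch of an Option-2-style greedy rounding, assuming a diagonal weight matrix and treating $W^{-1}Y$ as per-point cluster scores; the ordering and tie-breaking details are illustrative assumptions, not the paper's exact procedure.

```python
import numpy as np

def round_lrsdp(Y, W, alpha, beta):
    # Greedy rounding from the relaxed low-rank solution: assign the
    # (1 - beta)n strongest points to their best cluster, then greedily
    # add the remaining ~(alpha + beta)n assignments by score.
    n, k = Y.shape
    S = np.linalg.solve(W, Y)                   # scores W^{-1} Y
    U = np.zeros((n, k), dtype=bool)
    n_sure = n - int(beta * n)
    sure = np.argsort(-S.max(axis=1))[:n_sure]  # strongest memberships first
    U[sure, S[sure].argmax(axis=1)] = True
    S_left = S.copy()
    S_left[U] = -np.inf                         # don't reuse existing pairs
    extra = int((1 + alpha) * n) - n_sure       # remaining assignments
    flat = np.argsort(-S_left, axis=None)[:extra]
    U[np.unravel_index(flat, S.shape)] = True
    return U
```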
The new method is more robust, even in simple tests. Consider clustering a cycle graph.
We use disconnected nodes to measure the cluster quality.
[Plot: number of disconnected nodes vs. noise level for random+onelevel neo, multilevel neo, and lrsdp.]
As we increase the noise, only the LRSDP method can reliably find the true clustering.
We get improved vector and graph clustering results too.

Comparison of NEO-K-Means objective function values on real-world datasets from Mulan (mulan.sourceforge.net/datasets.html). By using the LRSDP solution as the initialization of the iterative algorithm, we can achieve better (smaller) objective function values.

              worst   best    avg.
yeast
kmeans+neo    9611    9495    9549
lrsdp+neo     9440    9280    9364
slrsdp+neo    9471    9231    9367
music
kmeans+neo    87779   70158   77015
lrsdp+neo     82323   70157   75923
slrsdp+neo    82336   70159   75926
scene
kmeans+neo    18905   18745   18806
lrsdp+neo     18904   18759   18811
slrsdp+neo    18895   18760   18810

F1 scores on real-world vector datasets (the larger, the better). NEO-K-Means-based methods outperform the other methods, and the low-rank SDP method improves the clustering results.

        moc    esp    isp    okm    kmeans+neo  lrsdp+neo  slrsdp+neo
yeast
worst   -      0.274  0.232  0.311  0.356       0.390      0.369
best    -      0.289  0.256  0.323  0.366       0.391      0.391
avg.    -      0.284  0.248  0.317  0.360       0.391      0.382
music
worst   0.530  0.514  0.506  0.524  0.526       0.537      0.541
best    0.544  0.539  0.539  0.531  0.551       0.552      0.552
avg.    0.538  0.526  0.517  0.527  0.543       0.545      0.547
scene
worst   0.466  0.569  0.586  0.571  0.597       0.610      0.605
best    0.470  0.582  0.609  0.576  0.627       0.614      0.625
avg.    0.467  0.575  0.598  0.573  0.610       0.613      0.613

We have improved results – impressively so on the yeast dataset – and only slightly worse on the scene data.
We get improved vector and graph clustering results too.

            Facebook1  Facebook2  HepPh   AstroPh
bigclam     0.830      0.640      0.625   0.645
demon       0.495      0.318      0.503   0.570
oslom       0.319      0.445      0.465   0.580
nise        0.297      0.293      0.102   0.153
m-neo       0.285      0.269      0.206   0.190
LRSDP       0.222      0.148      0.091   0.137

            No. of vertices  No. of edges
Facebook1   348              2,866
Facebook2   756              30,780
HepPh       11,204           117,619
AstroPh     17,903           196,972

For these graphs, we dramatically improve the conductance-vs-coverage plots.

Lloyd's iterative method takes O(1 second); the LRSDP method takes O(1 hour).
Now we want to improve the LRSDP time.
We can improve the optimization beyond ALM.

1.  Proximal augmented Lagrangian (PALM): add a regularization term to the augmented Lagrangian,
$$x^{(k+1)} = \arg\min_x \; L_A(x; \lambda^{(k)}, \dots) + \frac{1}{2\tau}\|x - x^{(k)}\|^2,$$
and solve with L-BFGS-B.

2.  ADMM method (5 blocks):
$$\begin{aligned} Y^{k+1} &= \arg\min_Y L_A(Y, f^k, g^k, s^k, r^k; \lambda^k, \mu^k, \gamma^k, \rho) \\ f^{k+1} &= \arg\min_f L_A(Y^{k+1}, f, g^k, s^k, r^k; \lambda^k, \mu^k, \gamma^k, \rho) \\ g^{k+1} &= \arg\min_g L_A(Y^{k+1}, f^{k+1}, g, s^k, r^k; \lambda^k, \mu^k, \gamma^k, \rho) \\ s^{k+1} &= \arg\min_s L_A(Y^{k+1}, f^{k+1}, g^{k+1}, s, r^k; \lambda^k, \mu^k, \gamma^k, \rho) \\ r^{k+1} &= \arg\min_r L_A(Y^{k+1}, f^{k+1}, g^{k+1}, s^{k+1}, r; \lambda^k, \mu^k, \gamma^k, \rho) \end{aligned}$$

The $f$, $g$, $s$, $r$ subproblems are convex 😊; the $Y$ subproblem is non-convex 😟.
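One PALM step can be sketched as a thin wrapper around the ALM subproblem solve; the interface below (a generic callable L_A plus box bounds) is an illustrative assumption, not the authors' code.

```python
import numpy as np
from scipy.optimize import minimize

def palm_step(L_A, x_k, tau, bounds):
    # One proximal augmented Lagrangian (PALM) step:
    #   x_{k+1} = argmin_x  L_A(x) + (1 / (2*tau)) * ||x - x_k||^2
    # solved, as in the ALM case, with L-BFGS-B under bound constraints.
    prox_obj = lambda x: L_A(x) + np.sum((x - x_k) ** 2) / (2.0 * tau)
    return minimize(prox_obj, x_k, method="L-BFGS-B", bounds=bounds).x
```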
We had to get a new convergence result for the proximal method. What results exist for bound-constrained subproblems? Ours is a small adaptation of a general result due to Pennanen (2002).

Convergence analysis of PALM:

THEOREM 1. Let $(\bar x, \bar\lambda)$ be a KKT pair satisfying the strong second-order sufficiency condition, and assume the gradients $\nabla c(\bar x)$ are linearly independent. If the $\{\rho_k\}$ are large enough with $\rho_k \to \bar\rho \le \infty$, and if $\|(x_0, \lambda_0) - (\bar x, \bar\lambda)\|$ is small enough, then there exists a sequence $\{(x_k, \lambda_k)\}$ conforming to Algorithm 1, along with open neighborhoods $C_k$, such that for each $k$, $x_{k+1}$ is the unique solution in $C_k$ to $(P_k)$. Moreover, the sequence $\{(x_k, \lambda_k)\}$ converges linearly and Fejér monotonically to $(\bar x, \bar\lambda)$ with rate $r(\bar\rho) < 1$ that is decreasing in $\bar\rho$, and $r(\bar\rho) \to 0$ as $\bar\rho \to \infty$.
On the yeast dataset, we see no difference in objective value, but faster solves.
[Bar charts: runtimes on YEAST for iterative, ALM, PALM, and ADMM (roughly 0-4500 s); f(x) values on YEAST for ALM, PALM, and ADMM (roughly 8700-9200).]
On yeast, we see much better discrete objectives and F1 scores.
[Bar charts: NEO-K-Means objectives on YEAST (roughly 9000-9700) and F1 scores on YEAST (roughly 0.34-0.39) for iterative, ALM, PALM, and ADMM.]
Recap
For overlapping clustering of data and overlapping community detection in graphs, we have a new objective, plus:
•  a fast Lloyd-like iterative algorithm,
•  an SDP relaxation,
•  a low-rank SDP relaxation,
•  proximal and ADMM acceleration techniques.

1.  NEO-K-means - Whang, Gleich, Dhillon, SDM 2015
2.  NEO-K-means SDP + Aug. Lagrangian - Hou, Whang, Gleich, Dhillon, KDD 2015
3.  Multiplier Methods for Overlapping K-Means - Hou, Whang, Gleich, Dhillon, Submitted
Localized solutions of diffusion equations in large graphs. Joint with Kyle Kloster. WAW2013, KDD2014, WAW2015; J. Internet Math.
[Plots: plot(x) of the solution vector, and accuracy $\|D^{-1}(x - x^*)\|_\infty$ vs. number of nonzeros.]
Crawl of Flickr from 2006: ~800k nodes, 6M edges, $\beta = 1/2$.
$$(I - \beta P)x = (1 - \beta)s, \qquad \mathrm{nnz}(x) \approx 800\mathrm{k}$$
…the answer [5]. Thus, just as in scientific computing, marrying the method to the model is key for the best scientific computing on social networks.

Ultimately, none of these steps differ from the practice of physical scientific computing. The challenges in creating models, devising algorithms, validating results, and comparing models just take on different forms when the problems come from social data instead of physical models. Thus, let us return to our starting question: what does the matrix have to do with the social network? Just as in scientific computing, many interesting problems, models, and methods for social networks boil down to matrix computations. Yet, as in the expander example above, the types of matrix questions change dramatically in order to fit social network models. Let's see what's been done that's enticingly and refreshingly different from the types of matrix computations encountered in physical scientific computing.

EXPANDER GRAPHS AND PARALLEL COMPUTING
Recently, a coalition of folks from academia, national labs, and industry set out to tackle the problems in parallel computing and expander graphs. They established the Graph 500 benchmark (http://www.graph500.org) to measure the performance of a parallel computer on a standard graph computation with an expander graph. Over the past three years, they've seen performance grow by more than 1,000-times.

[Figure captions: diffusion in a plate; movie interest in diffusion. The network, or mesh, from a typical problem in scientific computing sits in a low-dimensional space (think of two or three dimensions). These physical spaces put limits on the size of the boundary or "surface area" of the space given its volume. No such limits exist in social networks, and these two sets are usually about the same size. A network with this property is called an expander network. Size of set ≈ size of boundary.]

"Networks" from PDEs are usually physical; social networks are expanders.
Higher-order organization of complex networks. Joint with Austin Benson and Jure Leskovec.
[Figure: clusters in the C. elegans connectome; neuron labels include CEPDR, CEPVR, IL2R, OLLR, RIAL, RIAR, RIVL, RIVR, RMDDR, RMDL, RMDR, RMDVL, RMFL, SMDDL, SMDDR, SMDVR, URBR.]
By using a new generalization of spectral clustering methods, we are able to find completely novel and relevant structures in complex systems such as the connectome and transport networks.
SIAM Annual Meeting (AN16)
July 11-15, 2016, The Westin Waterfront, Boston, Massachusetts
David Gleich, Purdue; Mary Silber, Northwestern

•  Big Data, Data Science, and Privacy
•  Education, Communication, and Policy
•  Reproducibility and Ethics
•  Efficiency and Optimization
•  Integrating Models and Data (incl. computational social science, PDEs)
•  Dynamic Networks (learning, evolution, adaptation, and cooperation)
•  Applied Math, Statistics, and Machine Learning
•  Earth systems; environmental/ecological applications
•  Epidemiology
Future work
•  Even faster solvers.
•  Understand why the solution seems to be rank 2 (the solution Z from CVX is even rank 2!).
•  Better initialization for Lloyd's algorithm.

1.  NEO-K-means - Whang, Gleich, Dhillon, SDM 2015
2.  NEO-K-means SDP + Aug. Lagrangian - Hou, Whang, Gleich, Dhillon, KDD 2015
3.  Multiplier Methods for Overlapping K-Means - Hou, Whang, Gleich, Dhillon, Submitted
