Cross-validation estimate of the number of clusters in a network

Tatsuro Kawamoto (AIST), Yoshiyuki Kabashima (Tokyo Tech)

Scientific Reports 7, 3327 (2017). arXiv:1605.07915 (2016).

Related work: "Comparative analysis on the selection of number of clusters in community detection", T.K., Y. Kabashima, arXiv:1606.07668 (2016).
Graph clustering (community detection)

[Figure: network of books about US politics.]

Goal: determine the number of clusters q that most efficiently describes the network.
Summary

Framework: statistical inference
Model: stochastic block model (SBM)
Algorithm: EM algorithm + belief propagation (BP)
Selection of q (model selection): LOOCV estimates of prediction errors
(LOOCV = leave-one-out cross-validation)

• Four types of LOOCV estimates of prediction/training errors are considered.
• The LOOCV can be performed efficiently using BP.
• The performance is reasonable in practice.
• The overfit/underfit tendency among the LOOCVs is analyzed theoretically.

Principled, scalable, and widely applicable.
Stochastic block model (SBM)

q : number of clusters
γ : sizes of clusters (prior probability of each cluster)
ω : connection probabilities (affinity matrix)
A_ij : adjacency matrix
σ_i : the cluster to which vertex i belongs

p(A, \sigma \mid \gamma, \omega, q) = \prod_{i=1}^{N} \gamma_{\sigma_i} \prod_{i<j} \omega_{\sigma_i \sigma_j}^{A_{ij}} \left(1 - \omega_{\sigma_i \sigma_j}\right)^{1 - A_{ij}}
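As an illustrative aside (not part of the slides, and not the authors' code), here is a minimal Python sketch that samples a network from the SBM defined above; the function name and signature are my own.

import numpy as np

def sample_sbm(N, gamma, omega, seed=None):
    """Sample a network from the SBM above.

    gamma : length-q numpy array of cluster probabilities (sums to 1)
    omega : q x q symmetric matrix of connection probabilities
    Returns the adjacency matrix A and the cluster labels sigma.
    """
    rng = np.random.default_rng(seed)
    q = len(gamma)
    sigma = rng.choice(q, size=N, p=gamma)        # sigma_i drawn from gamma
    A = np.zeros((N, N), dtype=int)
    for i in range(N):
        for j in range(i + 1, N):                 # each pair i < j independently
            if rng.random() < omega[sigma[i], sigma[j]]:
                A[i, j] = A[j, i] = 1
    return A, sigma

# Example: two equal-size clusters, denser inside than between.
omega = np.array([[0.10, 0.02],
                  [0.02, 0.10]])
A, sigma = sample_sbm(200, gamma=np.array([0.5, 0.5]), omega=omega)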
Belief Propagation (BP)

Minimize the (Bethe) free energy using the EM algorithm:

f = -\frac{1}{N} \log \sum_{\sigma} p(A, \sigma \mid \gamma, \omega),
\qquad
p(A, \sigma \mid \gamma, \omega, q) = \prod_{i=1}^{N} \gamma_{\sigma_i} \prod_{i<j} \omega_{\sigma_i \sigma_j}^{A_{ij}} \left(1 - \omega_{\sigma_i \sigma_j}\right)^{1 - A_{ij}}

Marginal distribution of σ w.r.t. vertex i:

\psi^{i}_{\sigma_i} = \frac{1}{Z^{i}}\, \gamma_{\sigma_i} e^{-h_{\sigma_i}} \prod_{k \in \partial i} \sum_{\sigma_k} \psi^{k \to i}_{\sigma_k}\, \omega_{\sigma_k \sigma_i}

Cavity (message) update:

\psi^{i \to j}_{\sigma_i} = \frac{1}{Z^{i \to j}}\, \gamma_{\sigma_i} e^{-h_{\sigma_i}} \prod_{k \in \partial i \setminus j} \sum_{\sigma_k} \psi^{k \to i}_{\sigma_k}\, \omega_{\sigma_k \sigma_i}

Decelle et al., PRE (2011) [BP, sparse, partial Bayes]
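To make the update equations concrete, here is a minimal sketch of one BP sweep, assuming a symmetric ω and the sparse regime; the data structures (neighbors, msg, marg) are hypothetical and this is not the graphBIX implementation.

import numpy as np

def normalize(v):
    return v / v.sum()

def bp_sweep(neighbors, msg, marg, gamma, omega):
    """One sweep of the cavity updates above.

    neighbors[i] : list of neighbors of vertex i
    msg[(k, i)]  : cavity message psi^{k->i}, a length-q numpy array
    marg[i]      : current marginal psi^i, a length-q numpy array
    gamma, omega : current EM estimates of the SBM parameters (gamma is a length-q array)
    """
    N = len(neighbors)
    # External-field term e^{-h_sigma}, with h built from the current marginals.
    h = sum(marg[k] @ omega for k in range(N))
    prefactor = gamma * np.exp(-h)
    # Cavity messages psi^{i->j}: product over neighbors of i except j.
    new_msg = {}
    for i in range(N):
        for j in neighbors[i]:
            prod = prefactor.copy()
            for k in neighbors[i]:
                if k != j:
                    prod = prod * (msg[(k, i)] @ omega)   # sum over sigma_k
            new_msg[(i, j)] = normalize(prod)             # 1 / Z^{i->j}
    # Marginals psi^i: product over all neighbors of i.
    new_marg = {}
    for i in range(N):
        prod = prefactor.copy()
        for k in neighbors[i]:
            prod = prod * (new_msg[(k, i)] @ omega)
        new_marg[i] = normalize(prod)                     # 1 / Z^i
    return new_msg, new_marg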
Previous works

Decelle et al., PRE (2011) [BP, sparse, partial Bayes] — the approach we focus on:
• a scalable algorithm in the sparse case;
• high accuracy in the ideal situation (known theoretically);
• but the Bethe free energy typically fails to determine q in practice (it cannot select q).

→ We keep the algorithm and use prediction errors to determine q.
Prediction error (example)

[Figure: the total dataset is split into a training set and a test set; the test set is predicted from the training set. Plot of error vs. model complexity, showing the prediction-error and training-error curves and the most parsimonious model.]
Prediction error

[Figure: 3-fold cross-validation — the total dataset is divided into three parts, and each part serves once as the test set while the remaining parts form the training set.]
Leave-one-out cross-validation

[Figure: the dataset {Aij} consists of vertex pairs (i, j), each an edge or a non-edge; a single pair is held out as the test set and the rest is used as the training set.]

Very heavy computation (if it is done by brute force).
Bayes prediction error

Cross-entropy error function:

E_{\mathrm{Bayes}}(q) = -\frac{1}{L} \sum_{i<j} \sum_{A_{ij}} p_{\mathrm{actual}}(A_{ij}) \log \hat{p}\bigl(A_{ij} \mid A^{\backslash (i,j)}\bigr)

σ are marginalized. Analytic expression in terms of BP (i.e., no need for brute force!):

\hat{p}\bigl(A_{ij}=1 \mid A^{\backslash (i,j)}\bigr) = \sum_{\sigma_i, \sigma_j} \hat{p}(A_{ij}=1 \mid \sigma_i, \sigma_j)\, p\bigl(\sigma_i, \sigma_j \mid A^{\backslash (i,j)}\bigr) = \sum_{\sigma_i, \sigma_j} \omega_{\sigma_i \sigma_j}\, \psi^{i \to j}_{\sigma_i}\, \psi^{j \to i}_{\sigma_j}
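As an illustration (not the paper's code), a sketch of how this error could be evaluated from converged cavity messages, with p_actual(A_ij) replaced by the observed value of A_ij; the pair list and message dictionary are hypothetical data structures.

import numpy as np

def bayes_prediction_error(pairs, A, msg, omega):
    """LOOCV Bayes prediction error from converged cavity messages.

    pairs : list of vertex pairs (i, j) over which the error is averaged
    A     : adjacency matrix (0/1 numpy array)
    msg   : msg[(i, j)] = cavity message psi^{i->j} (length-q numpy array)
    omega : q x q affinity matrix
    """
    err = 0.0
    for (i, j) in pairs:
        # p_hat(A_ij = 1 | A \ (i,j)) = sum_{s_i, s_j} omega[s_i, s_j] * psi^{i->j}_{s_i} * psi^{j->i}_{s_j}
        p_edge = msg[(i, j)] @ omega @ msg[(j, i)]
        p = p_edge if A[i, j] == 1 else 1.0 - p_edge
        err -= np.log(p)                   # cross-entropy against the observed A_ij
    return err / len(pairs)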
Gibbs prediction error

E_{\mathrm{Gibbs}}(q) = -\frac{1}{L} \sum_{(i,j)\in E} \sum_{\sigma_i, \sigma_j} \psi^{i \to j}_{\sigma_i}\, \psi^{j \to i}_{\sigma_j} \log \omega_{\sigma_i \sigma_j}

(Measure the error before marginalizing w.r.t. σ.)

MAP estimate: replace \psi^{i \to j}_{\sigma_i} by \delta_{\sigma_i,\, \operatorname{argmax}\{\psi^{i \to j}\}} (choose the most likely σ).

Gibbs training error

E_{\mathrm{training}}(q) = -\frac{1}{L} \sum_{(i,j)\in E} \sum_{\sigma_i, \sigma_j} \frac{\psi^{i \to j}_{\sigma_i}\, \omega_{\sigma_i \sigma_j}\, \psi^{j \to i}_{\sigma_j}}{Z^{ij}} \log \omega_{\sigma_i \sigma_j}

(Use all the data to measure the error.)
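Correspondingly, a sketch of the Gibbs prediction, MAP, and Gibbs training errors evaluated from the same cavity messages (illustrative only, under the same assumptions as the sketch above).

import numpy as np

def gibbs_errors(edges, msg, omega):
    """Gibbs prediction, MAP, and Gibbs training errors from the cavity messages.

    edges : list of observed edges (i, j)
    msg   : msg[(i, j)] = cavity message psi^{i->j} (length-q numpy array)
    omega : q x q affinity matrix
    """
    log_w = np.log(omega)
    e_gibbs = e_map = e_train = 0.0
    for (i, j) in edges:
        a, b = msg[(i, j)], msg[(j, i)]             # psi^{i->j}, psi^{j->i}
        P = np.outer(a, b)                          # psi^{i->j}_{s_i} * psi^{j->i}_{s_j}
        e_gibbs -= np.sum(P * log_w)
        e_map -= log_w[np.argmax(a), np.argmax(b)]  # replace each message by its MAP estimate
        W = P * omega                               # proportional to p(s_i, s_j | A)
        e_train -= np.sum((W / W.sum()) * log_w)    # normalized by Z^{ij}
    L = len(edges)
    return e_gibbs / L, e_map / L, e_train / L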
Results

[Figure: political books network — Bayes prediction, Gibbs prediction, MAP (Gibbs), and Gibbs training errors, together with the Bethe free energy, as functions of q.]

We use the "one-standard-error rule."
Hastie, Tibshirani, & Friedman, "The Elements of Statistical Learning" (2013).
Actual partitions (metadata)

[Figure: the metadata partition and the partitions at q = 3 and q = 5.]

Other networks

[Figure: results for other networks.]
Degree-corrected SBM

[Figure: panels a (political books) and b (political blogs) — Bayes prediction, Gibbs prediction, MAP (Gibbs), and Gibbs training errors for the uncorrected and degree-corrected SBM.]
Relations among errors

Directly from the Bayes rule (⟨·⟩ denotes the sample average),

E_{\mathrm{Bayes}} = E_{\mathrm{Gibbs}} - \Bigl\langle D_{\mathrm{KL}}\bigl( p(\sigma_i, \sigma_j \mid A^{\backslash (i,j)}) \,\|\, p(\sigma_i, \sigma_j \mid A) \bigr) \Bigr\rangle

E_{\mathrm{Bayes}} = E_{\mathrm{training}} + \Bigl\langle D_{\mathrm{KL}}\bigl( p(\sigma_i, \sigma_j \mid A) \,\|\, p(\sigma_i, \sigma_j \mid A^{\backslash (i,j)}) \bigr) \Bigr\rangle

so that E_{\mathrm{training}} \le E_{\mathrm{Bayes}} \le E_{\mathrm{Gibbs}}.

If the partitions for different q constitute a hierarchical structure (a sufficient condition), then q_{\mathrm{training}} \ge q_{\mathrm{Bayes}} \ge q_{\mathrm{Gibbs}}, as deduced from the monotonicity of the KL divergence.
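A sketch of the Bayes-rule step behind these identities (reconstructed here from the definitions above, not copied from the slides): for a pair (i, j),

p(\sigma_i, \sigma_j \mid A) = \frac{p(A_{ij} \mid \sigma_i, \sigma_j)\, p(\sigma_i, \sigma_j \mid A^{\backslash (i,j)})}{\hat{p}(A_{ij} \mid A^{\backslash (i,j)})},

so

\log \hat{p}(A_{ij} \mid A^{\backslash (i,j)}) = \log p(A_{ij} \mid \sigma_i, \sigma_j) + \log p(\sigma_i, \sigma_j \mid A^{\backslash (i,j)}) - \log p(\sigma_i, \sigma_j \mid A)

holds for every (\sigma_i, \sigma_j). Averaging over p(\sigma_i, \sigma_j \mid A^{\backslash (i,j)}) and summing over pairs gives the Gibbs identity; averaging over p(\sigma_i, \sigma_j \mid A) gives the training identity.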
Bethe free energy in terms of the prediction errors

If we consider the leave-one-vertex-out version of the Bayes prediction error,

E^{v}_{\mathrm{Bayes}}(q) = -\frac{1}{L} \sum_{i} \log Z^{i},

then

f_{\mathrm{Bethe}}(q) \propto E^{v}_{\mathrm{Bayes}}(q) - E_{\mathrm{Bayes}}(q) + \mathrm{const.}

Note that the error for each edge is counted twice in E^{v}_{\mathrm{Bayes}}(q).
When the network is actually generated by the SBM

[Figure: panels a–e.]

• The Bayes prediction error achieves the information-theoretic detectability threshold for q = 2 equal-size clusters (analytically derived).
• The Gibbs prediction error strictly underfits near the detectability threshold (analytically derived).
Hold-out method & K-fold CV

[Figure: panels a (political books) and b (network science) — hold-out method and 10-fold cross-validation.]

It is possible to perform the hold-out method and K-fold CV using BP, and their performance indeed looks good. But they have both computational and conceptual issues, and they are orders of magnitude heavier than the LOOCV!
Code is on GitHub

https://github.com/tatsuro-kawamoto/graphBIX
sbm.jl : SBM
mod.jl : a simpler one
(with & without degree correction)
Conclusion

Selection of q : prediction error(s) [BP, sparse]
