Cross-validation estimate of the number of clusters in a network
Tatsuro Kawamoto (AIST), Yoshiyuki Kabashima (Tokyo Tech)
Scientific Reports 7, 3327 (2017). arXiv:1605.07915 (2016).
Related work: "Comparative analysis on the selection of number of clusters in community detection", T.K., Y. Kabashima, arXiv:1606.07668 (2016).
Graph clustering (community detection)
Example network: books about US politics.
Goal: determine the number of clusters q that most efficiently describes the network.
Summary
Framework: statistical inference
Model: stochastic block model (SBM)
Algorithm: EM algorithm + belief propagation (BP)
Selection of q (model selection): LOOCV estimates of prediction errors (LOOCV = leave-one-out cross-validation)
• Four types of LOOCV estimates of prediction/training errors are considered.
• The LOOCV can be performed efficiently using BP.
• Performance is reasonable in practice.
• The overfitting/underfitting tendencies of the LOOCV estimates are analyzed theoretically.
The approach is principled, scalable, and widely applicable.
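Stochastic block model (SBM)
\[
p(A, \sigma \mid \gamma, \omega, q) = \prod_{i=1}^{N} \gamma_{\sigma_i} \prod_{i<j} \omega_{\sigma_i \sigma_j}^{A_{ij}} \left(1 - \omega_{\sigma_i \sigma_j}\right)^{1 - A_{ij}}
\]
$A_{ij}$ : adjacency matrix
$\sigma_i$ : the cluster to which vertex i belongs
q : number of clusters
γ : sizes of clusters
ω : connection probabilities (affinity matrix)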
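To make the generative model concrete, here is a minimal sketch that samples a network from the SBM and evaluates the log of the joint probability p(A, σ | γ, ω, q). The function names `sample_sbm` and `log_joint` and the dense-matrix representation are illustrative assumptions, not part of the authors' code.

```python
import numpy as np

def sample_sbm(n, gamma, omega, rng):
    """Draw cluster labels sigma and a symmetric 0/1 adjacency matrix A."""
    q = len(gamma)
    sigma = rng.choice(q, size=n, p=gamma)        # sigma_i ~ gamma
    probs = omega[sigma][:, sigma]                # probs[i, j] = omega[sigma_i, sigma_j]
    upper = np.triu(rng.random((n, n)) < probs, k=1)  # sample pairs i < j only
    A = (upper | upper.T).astype(int)             # symmetrize, no self-loops
    return sigma, A

def log_joint(A, sigma, gamma, omega):
    """log p(A, sigma | gamma, omega, q) for the SBM (requires 0 < omega < 1)."""
    probs = omega[sigma][:, sigma]
    iu = np.triu_indices(len(sigma), k=1)         # each pair i < j once
    a, p = A[iu], probs[iu]
    edge_term = a * np.log(p) + (1 - a) * np.log1p(-p)
    return np.log(gamma[sigma]).sum() + edge_term.sum()
```

The dense n-by-n matrix keeps the sketch short; a sparse network of realistic size would need an edge-list representation instead.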
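Belief Propagation (BP)
Minimize the (Bethe) free energy using the EM algorithm:
\[
f = -\frac{1}{N} \log \sum_{\sigma} p(A, \sigma \mid \gamma, \omega),
\qquad
p(A, \sigma \mid \gamma, \omega, q) = \prod_{i=1}^{N} \gamma_{\sigma_i} \prod_{i<j} \omega_{\sigma_i \sigma_j}^{A_{ij}} \left(1 - \omega_{\sigma_i \sigma_j}\right)^{1 - A_{ij}}
\]
Marginal distribution of σ w.r.t. vertex i, and the corresponding cavity message from i to j:
\[
\psi^{i}_{\sigma_i} = \frac{1}{Z^{i}} \gamma_{\sigma_i} e^{-h_{\sigma_i}} \prod_{k \in \partial i} \left( \sum_{\sigma_k} \psi^{k \to i}_{\sigma_k} \omega_{\sigma_k \sigma_i} \right)
\]
\[
\psi^{i \to j}_{\sigma_i} = \frac{1}{Z^{i \to j}} \gamma_{\sigma_i} e^{-h_{\sigma_i}} \prod_{k \in \partial i \setminus j} \left( \sum_{\sigma_k} \psi^{k \to i}_{\sigma_k} \omega_{\sigma_k \sigma_i} \right)
\]
Decelle et al., PRE (2011)
Setting: BP, sparse networks, partial Bayes.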
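The message updates above can be sketched in a few lines. This is a simplified illustration, not the authors' implementation: it assumes the external-field factor e^{-h} is dropped (in the sparse formulation that field accounts for non-edges), keeps γ and ω fixed rather than learning them by EM, and uses the hypothetical name `run_bp`.

```python
import numpy as np

def run_bp(edges, n, gamma, omega, n_iter=50, seed=0):
    """Iterate cavity messages psi^{i->j} on an undirected edge list.

    Simplifications (assumptions of this sketch): no external field h,
    fixed gamma and omega, plain sequential updates, no damping.
    """
    gamma = np.asarray(gamma, dtype=float)
    omega = np.asarray(omega, dtype=float)
    q = len(gamma)
    rng = np.random.default_rng(seed)
    nbrs = {i: [] for i in range(n)}
    for i, j in edges:
        nbrs[i].append(j)
        nbrs[j].append(i)
    # msg[(i, j)] holds psi^{i->j}: a probability vector over the q clusters
    msg = {}
    for i, j in edges:
        for a, b in ((i, j), (j, i)):
            v = rng.random(q)
            msg[(a, b)] = v / v.sum()
    for _ in range(n_iter):
        for (i, j) in list(msg):
            new = gamma.copy()
            for k in nbrs[i]:
                if k != j:
                    # sum over sigma_k of psi^{k->i}_{sigma_k} * omega_{sigma_k sigma_i}
                    new *= omega.T @ msg[(k, i)]
            msg[(i, j)] = new / new.sum()      # division by Z^{i->j}
    # full marginals psi^i use all neighbours of i
    marg = np.tile(gamma, (n, 1))
    for i in range(n):
        for k in nbrs[i]:
            marg[i] *= omega.T @ msg[(k, i)]
        marg[i] /= marg[i].sum()               # division by Z^i
    return msg, marg
```

For high-degree vertices the product of factors can underflow; working with log-messages would be the usual remedy.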
Previous works
Decelle et al., PRE (2011):
• Scalable algorithm in the sparse case
• High accuracy in the ideal situation (known theoretically)
• But the Bethe free energy typically fails to determine q in practice, so it cannot be used to select q.
We focus on the same setting (BP, sparse networks, partial Bayes): we keep the algorithm and use prediction errors to determine q.
Prediction error
[Figure: the total dataset is split into a training set and a test set, and the test set is predicted from the training set. Example plot of error against model complexity, showing the training error and the prediction error; the most parsimonious model is selected.]
[Figure: 3-fold cross-validation. The total dataset is split into three parts, and each part serves as the test set in turn.]
Leave-one-out cross-validation
[Figure: the dataset is the set of edges and non-edges {A_ij}. Each vertex pair (i, j) in turn is held out as the test set, and the remaining pairs form the training set.]
Done by brute force, this requires very heavy computation.
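To see why the brute-force version is heavy, note that every vertex pair is held out once and each hold-out would need its own model fit. A small sketch, with hypothetical helper names, that enumerates the hold-outs and counts the refits:

```python
from itertools import combinations

def leave_one_pair_out(n):
    """Yield each vertex pair (i, j) with i < j as the single held-out item."""
    yield from combinations(range(n), 2)

def brute_force_refits(n):
    """Number of separate fits a naive LOOCV would need for n vertices."""
    return n * (n - 1) // 2
```

The count grows quadratically in the number of vertices, and each fit is itself an iterative inference run.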
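Bayes prediction error
Cross-entropy error function for the held-out pair (i, j), where $A^{\setminus (i,j)}$ is the adjacency matrix with the pair (i, j) held out:
\[
E_{\mathrm{Bayes}}(q) = -\frac{1}{L} \sum_{i<j} p_{\mathrm{actual}}(A_{ij}) \left[ \log \hat{p}(A_{ij} \mid A^{\setminus (i,j)}) \right]
\]
There is an analytic expression in terms of BP (i.e., no need for brute force), in which σ is marginalized:
\[
\hat{p}(A_{ij} = 1 \mid A^{\setminus (i,j)})
= \sum_{\sigma_i, \sigma_j} \hat{p}(A_{ij} = 1 \mid \sigma_i, \sigma_j) \, p(\sigma_i, \sigma_j \mid A^{\setminus (i,j)})
= \sum_{\sigma_i, \sigma_j} \omega_{\sigma_i \sigma_j} \psi^{i \to j}_{\sigma_i} \psi^{j \to i}_{\sigma_j}
\]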
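The BP expression above can be evaluated directly from the cavity messages. A minimal sketch, assuming that L is the number of edges, that only the observed edges contribute to the sum (non-edge terms are neglected here), and that messages are stored in a dictionary `msg[(i, j)]`; the function name is hypothetical.

```python
import numpy as np

def bayes_prediction_error(edges, msg, omega):
    """-(1/L) * sum over edges of log p_hat(A_ij = 1 | A without (i, j)).

    Assumption of this sketch: only observed edges are summed over.
    """
    log_probs = []
    for i, j in edges:
        # sum over sigma_i, sigma_j of omega * psi^{i->j} * psi^{j->i}
        p_edge = msg[(i, j)] @ omega @ msg[(j, i)]
        log_probs.append(np.log(p_edge))
    return -float(np.mean(log_probs))
```

No model is refitted per held-out pair: the cavity messages already exclude the information carried by the edge in question.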
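Gibbs prediction error
Measure the error before marginalizing w.r.t. σ:
\[
E_{\mathrm{Gibbs}}(q) = -\frac{1}{L} \sum_{(i,j) \in E} \sum_{\sigma_i, \sigma_j} \psi^{i \to j}_{\sigma_i} \psi^{j \to i}_{\sigma_j} \log \omega_{\sigma_i \sigma_j}
\]
MAP estimate
Choose the most likely σ:
\[
\psi^{i \to j}_{\sigma_i} \to \delta_{\sigma_i, \operatorname{argmax}\{\psi^{i \to j}\}}
\]
Gibbs training error
Use all the data to measure the error:
\[
E_{\mathrm{training}}(q) = -\frac{1}{L} \sum_{(i,j) \in E} \sum_{\sigma_i, \sigma_j} \frac{\psi^{i \to j}_{\sigma_i} \, \omega_{\sigma_i \sigma_j} \, \psi^{j \to i}_{\sigma_j}}{Z^{ij}} \log \omega_{\sigma_i \sigma_j}
\]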
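The three remaining error types differ only in the weight placed on each pair of cluster labels. A per-edge sketch under the same assumptions as the formulas above (messages given as probability vectors, Z^{ij} taken as the normalizer of ψ ω ψ); the function name `edge_errors` is hypothetical, and averaging its outputs over the edges gives the three errors.

```python
import numpy as np

def edge_errors(psi_ij, psi_ji, omega):
    """Per-edge Gibbs prediction, MAP (Gibbs), and Gibbs training errors."""
    log_omega = np.log(omega)
    # weight before seeing the edge: psi^{i->j} * psi^{j->i}
    prior = np.outer(psi_ij, psi_ji)
    e_gibbs = -(prior * log_omega).sum()
    # MAP: replace each message by a delta on its most likely cluster
    e_map = -log_omega[psi_ij.argmax(), psi_ji.argmax()]
    # weight after seeing the edge: psi * omega * psi / Z^{ij}
    post = prior * omega
    post = post / post.sum()
    e_train = -(post * log_omega).sum()
    return e_gibbs, e_map, e_train
```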
Results
[Figure: political books network. Curves: Gibbs prediction, MAP (Gibbs), Bayes prediction, Gibbs training, and the Bethe free energy.]
We use the "one-standard error rule" (Hastie, Tibshirani, & Friedman, "Elements of Statistical Learning" (2013)).
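The one-standard-error rule picks the most parsimonious model whose error lies within one standard error of the minimum. A minimal sketch, assuming the mean error and its standard error are already available for each candidate q; the function name and the numbers in the usage note are made up for illustration.

```python
import numpy as np

def one_standard_error_rule(qs, errors, std_errors):
    """Smallest q whose error is within one standard error of the best error."""
    qs = np.asarray(qs)
    errors = np.asarray(errors, dtype=float)
    std_errors = np.asarray(std_errors, dtype=float)
    best = errors.argmin()
    threshold = errors[best] + std_errors[best]
    return int(qs[errors <= threshold].min())
```

For example, with candidate values q = 1..5, errors (3.0, 2.0, 1.55, 1.5, 1.52), and a standard error of 0.1 throughout, the minimum is at q = 4 but the rule selects q = 3.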
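[Figure: actual partitions for q = 3 and q = 5, shown alongside the metadata.]
Other networks
Degree-corrected SBM
[Figure: panels a and b, covering the political books and political blogs networks with the un-corrected and degree-corrected models. Curves: Gibbs prediction, MAP (Gibbs), Bayes prediction, Gibbs training.]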
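Relations among errors
Directly from the Bayes rule,
\[
E_{\mathrm{Bayes}} = E_{\mathrm{Gibbs}} - \overline{D_{\mathrm{KL}}\left( p(\sigma_i, \sigma_j \mid A^{\setminus (i,j)}) \,\|\, p(\sigma_i, \sigma_j \mid A) \right)}
\]
\[
E_{\mathrm{Bayes}} = E_{\mathrm{training}} + \overline{D_{\mathrm{KL}}\left( p(\sigma_i, \sigma_j \mid A) \,\|\, p(\sigma_i, \sigma_j \mid A^{\setminus (i,j)}) \right)}
\]
where the overline denotes the sample average. Since the KL divergence is non-negative,
\[
E_{\mathrm{training}} \le E_{\mathrm{Bayes}} \le E_{\mathrm{Gibbs}}.
\]
If the partitions for different q constitute a hierarchical structure (sufficient condition), then
\[
q_{\mathrm{training}} \ge q_{\mathrm{Bayes}} \ge q_{\mathrm{Gibbs}},
\]
which is deduced from the monotonicity of the KL divergence.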
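The two identities can be checked for a single edge. The following is a sketch, assuming the BP forms $p(\sigma_i, \sigma_j \mid A^{\setminus (i,j)}) = \psi^{i \to j}_{\sigma_i} \psi^{j \to i}_{\sigma_j}$ and $p(\sigma_i, \sigma_j \mid A) = \psi^{i \to j}_{\sigma_i} \omega_{\sigma_i \sigma_j} \psi^{j \to i}_{\sigma_j} / Z^{ij}$ with $Z^{ij} = \sum_{\sigma_i, \sigma_j} \psi^{i \to j}_{\sigma_i} \omega_{\sigma_i \sigma_j} \psi^{j \to i}_{\sigma_j}$, so that the per-edge contributions are $e_{\mathrm{Bayes}} = -\log Z^{ij}$, $e_{\mathrm{Gibbs}} = -\sum \psi \psi \log \omega$, and $e_{\mathrm{training}} = -\sum (\psi \omega \psi / Z^{ij}) \log \omega$. The lowercase $e$ symbols are notation introduced here for illustration.

```latex
\begin{align*}
D_{\mathrm{KL}}\left( p(\cdot \mid A^{\setminus (i,j)}) \,\|\, p(\cdot \mid A) \right)
  &= \sum_{\sigma_i, \sigma_j} \psi^{i \to j}_{\sigma_i} \psi^{j \to i}_{\sigma_j}
     \log \frac{Z^{ij}}{\omega_{\sigma_i \sigma_j}}
   = \log Z^{ij} - \sum_{\sigma_i, \sigma_j} \psi^{i \to j}_{\sigma_i} \psi^{j \to i}_{\sigma_j} \log \omega_{\sigma_i \sigma_j}
   = e_{\mathrm{Gibbs}} - e_{\mathrm{Bayes}}, \\
D_{\mathrm{KL}}\left( p(\cdot \mid A) \,\|\, p(\cdot \mid A^{\setminus (i,j)}) \right)
  &= \sum_{\sigma_i, \sigma_j} \frac{\psi^{i \to j}_{\sigma_i} \omega_{\sigma_i \sigma_j} \psi^{j \to i}_{\sigma_j}}{Z^{ij}}
     \log \frac{\omega_{\sigma_i \sigma_j}}{Z^{ij}}
   = -e_{\mathrm{training}} - \log Z^{ij}
   = e_{\mathrm{Bayes}} - e_{\mathrm{training}}.
\end{align*}
```

Averaging both lines over the edges recovers the two stated relations, and non-negativity of each divergence gives the ordering of the three errors.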
Bethe free energy in terms of the prediction errors
If we consider the leave-one-vertex-out version of the Bayes prediction error,
\[
E^{v}_{\mathrm{Bayes}}(q) = -\frac{1}{L} \sum_{i} \log Z^{i},
\]
then
\[
f_{\mathrm{Bethe}}(q) \propto E^{v}_{\mathrm{Bayes}}(q) - E_{\mathrm{Bayes}}(q) + \mathrm{const.}
\]
Note that the error for each edge is counted twice in $E^{v}_{\mathrm{Bayes}}(q)$.
When the network is actually generated by the SBM
[Figure: panels a to e.]
• The Bayes prediction error achieves the information-theoretic detectability threshold for q = 2 equal-size clusters (analytically derived).
• The Gibbs prediction error strictly underfits near the detectability threshold (analytically derived).
Hold-out method & K-fold CV
[Figure: hold-out method and 10-fold cross-validation results for (a) political books and (b) network science.]
It is possible to perform the hold-out method and the K-fold CV using BP, and their performance looks good. However, they have both computational and conceptual issues; in particular, they are orders of magnitude heavier than the LOOCV.
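For comparison with the LOOCV, a K-fold split needs one full inference run per fold. The sketch below only builds the folds, assuming the held-out unit is a vertex pair (edge or non-edge); the authors' actual splitting procedure may differ, and the function name is hypothetical.

```python
import numpy as np
from itertools import combinations

def k_fold_pairs(n, k, seed=0):
    """Randomly split all vertex pairs (i, j), i < j, into k disjoint folds."""
    pairs = np.array(list(combinations(range(n), 2)))
    rng = np.random.default_rng(seed)
    perm = rng.permutation(len(pairs))
    return [pairs[idx] for idx in np.array_split(perm, k)]
```

Each fold would serve once as the test set while inference is rerun on the remaining pairs.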
Codes are on GitHub
https://github.com/tatsuro-kawamoto/graphBIX
sbm.jl: SBM
mod.jl: simpler one
(with and without degree-correction)
Conclusion
Setting: BP, sparse networks.
Selection of q: prediction error(s).
