Deep k-Means: Jointly Clustering with k-Means and Learning
Representations
Thibaut THONET
thibaut.thonet@univ-grenoble-alpes.fr
Univ. Grenoble Alpes, CNRS, Grenoble INP, LIG
Joint work with Maziar MORADI FARD and Eric GAUSSIER
5 September 2018 @ ENBIS, Nancy
Clustering
Clustering is the process of organizing unlabeled objects into groups (clusters)
whose members are similar in some way
Clustering approaches may be classified as:
Hard clustering: each object belongs to at most one cluster
Soft clustering: each object can belong to more than one cluster
k-Means clustering
k-Means is a centroid-based approach for hard clustering [MacQueen, 1967].
Given a set of objects X, k-Means clustering aims to group the objects into k clusters
of similar samples by minimizing the following loss function:
$$\min_{R} \sum_{x \in X} \|x - c(x; R)\|_2^2$$
where $R$ are the cluster centers and $c(x; R) = \arg\min_{r \in R} \|x - r\|_2$ is the nearest cluster center to $x$
[Figure: k-Means alternates between two steps, assigning objects to the cluster of the nearest center $r_1, r_2, \ldots$ and updating the cluster centers]
...But the input space is often high-dimensional, sparse, and/or contains redundant dimensions
=⇒ It may not be suitable for clustering
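To make the alternating scheme concrete, here is a minimal NumPy sketch of the k-Means loop described above; the function name and the initialization strategy are our own choices, not the talk's:

```python
import numpy as np

def k_means(X, k, n_iters=100, seed=0):
    """Minimal Lloyd-style k-Means: alternate assignments and center updates."""
    rng = np.random.default_rng(seed)
    # Initialize the k cluster centers R with distinct data points
    R = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(n_iters):
        # Assignment step: c(x; R) = nearest center in Euclidean distance
        dists = ((X[:, None, :] - R[None, :, :]) ** 2).sum(axis=2)  # (n, k)
        labels = dists.argmin(axis=1)
        # Update step: move each center to the mean of its assigned points
        for j in range(k):
            if np.any(labels == j):
                R[j] = X[labels == j].mean(axis=0)
    return labels, R
```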
k-Means in an embedded space: Auto-Encoder + k-Means
1. Train an
auto-encoder on the
dataset to learn object
embeddings (e.g., for
text, low-dimensional
dense representations)
2. Perform k-Means in
the embedding space
[Figure: auto-encoder taking input $x$ to embedding $h_\theta(x)$ and reconstruction $\mathrm{Auto}(x)$; k-Means with centers $r_1, r_2, \ldots$ operates in the embedding space]
$$\min_{\theta} \sum_{x} \mathcal{L}_{\mathrm{rec}}(x) = \sum_{x} \|x - \mathrm{Auto}(x)\|_2^2$$
$$\min_{R} \sum_{x} \mathcal{L}_{\mathrm{clust}}(x) = \sum_{x} \|h_\theta(x) - c(h_\theta(x); R)\|_2^2$$
with $c(h_\theta(x); R) = \arg\min_{r \in R} \|h_\theta(x) - r\|_2$
...But embeddings are not specifically learned for clustering purposes
=⇒ They may still not be suitable for clustering
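For illustration, a compact sketch of this two-stage pipeline (PyTorch auto-encoder, then scikit-learn k-Means on the embeddings). The d-500-500-2000-K layer sizes follow the experimental setup later in the talk; everything else (names, optimizer, epoch count) is an assumption of ours:

```python
import torch
import torch.nn as nn
from sklearn.cluster import KMeans

d, K = 784, 10  # e.g., flattened MNIST inputs and 10 clusters
encoder = nn.Sequential(nn.Linear(d, 500), nn.ReLU(),
                        nn.Linear(500, 500), nn.ReLU(),
                        nn.Linear(500, 2000), nn.ReLU(),
                        nn.Linear(2000, K))
decoder = nn.Sequential(nn.Linear(K, 2000), nn.ReLU(),
                        nn.Linear(2000, 500), nn.ReLU(),
                        nn.Linear(500, 500), nn.ReLU(),
                        nn.Linear(500, d))

def train_autoencoder(X, epochs=50, lr=1e-3):
    """Stage 1: minimize the reconstruction loss sum_x ||x - Auto(x)||^2."""
    params = list(encoder.parameters()) + list(decoder.parameters())
    opt = torch.optim.Adam(params, lr=lr)
    for _ in range(epochs):
        opt.zero_grad()
        loss = ((X - decoder(encoder(X))) ** 2).sum(dim=1).mean()
        loss.backward()
        opt.step()

# Stage 2: plain k-Means on the learned embeddings h_theta(x)
# X = torch.rand(1000, d); train_autoencoder(X)
# labels = KMeans(n_clusters=K).fit_predict(encoder(X).detach().numpy())
```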
k-Means in an embedded space: Deep Clustering Network
The Deep Clustering Network (DCN) [Yang+, 2017] alternately (i) learns cluster
representatives R and auto-encoder parameters θ using SGD and (ii) assigns data
points to the cluster with the nearest representative in the embedding space
$$\min_{R, \theta} L = \sum_{x} \mathcal{L}_{\mathrm{rec}}(x) + \lambda\, \mathcal{L}_{\mathrm{clust}}(x)$$
$$\mathcal{L}_{\mathrm{rec}}(x) = \|x - \mathrm{Auto}(x)\|_2^2, \qquad \mathcal{L}_{\mathrm{clust}}(x) = \|h_\theta(x) - c(h_\theta(x); R)\|_2^2$$
with $c(h_\theta(x); R) = \arg\min_{r \in R} \|h_\theta(x) - r\|_2$
[Figure: DCN architecture: auto-encoder with embedding $h_\theta(x)$ and cluster representatives $r_1, r_2, \ldots$]
...But SGD alone cannot be relied on due to the discrete assignments (argmin)
=⇒ Non-joint and less scalable training
Deep k-means: overview
$$\min_{R, \theta} L = \sum_{x} \mathcal{L}_{\mathrm{rec}}(x) + \lambda\, \mathcal{L}_{\mathrm{clust}}(x)$$
$$\mathcal{L}_{\mathrm{rec}}(x) = \|x - \mathrm{Auto}(x)\|_2^2, \qquad \mathcal{L}_{\mathrm{clust}}(x) = \sum_{k} \mathrm{closeness}(h_\theta(x), r_k) \times \|h_\theta(x) - r_k\|_2^2$$
[Figure: Deep k-Means architecture: auto-encoder with embedding $h_\theta(x)$; every cluster representative $r_k$ contributes to the clustering loss, weighted by its closeness]
Deep k-means: a differentiable surrogate to DCN
We propose to solve a fully differentiable surrogate of DCN’s problem [Moradi Fard+,
2018]:
$$P^{(\alpha)}_{\mathrm{DKM}}: \quad \min_{R, \theta} L^{(\alpha)} = \sum_{x \in X} \mathcal{L}_{\mathrm{rec}}(x) + \lambda\, \mathcal{L}^{(\alpha)}_{\mathrm{clust}}(x)$$
with $\mathcal{L}^{(\alpha)}_{\mathrm{clust}}(x) = \sum_{r \in R} \mathrm{closeness}(h_\theta(x), r; \alpha) \times \|h_\theta(x) - r\|_2^2$
such that:
$\mathrm{closeness}(h_\theta(x), r; \alpha)$ is differentiable wrt both $\theta$ and $r$
$$\lim_{\alpha \to \infty} \mathrm{closeness}(h_\theta(x), r; \alpha) = \begin{cases} 1 & \text{if } r = \arg\min_{r' \in R} \|h_\theta(x) - r'\|_2 \\ 0 & \text{otherwise} \end{cases}$$
Intuitively, $\mathrm{closeness}(h_\theta(x), r; \alpha)$ can be seen as a relaxation of DCN’s hard
clustering assignments such that $\lim_{\alpha \to \infty} P^{(\alpha)}_{\mathrm{DKM}} = P_{\mathrm{DCN}}$ holds
Deep k-means: choice of closeness and α
We chose closeness to be defined based on a parameterized softmax:
$$\mathrm{closeness}(h_\theta(x), r; \alpha) = \frac{\exp(-\alpha\, \|h_\theta(x) - r\|_2^2)}{\sum_{r' \in R} \exp(-\alpha\, \|h_\theta(x) - r'\|_2^2)}$$
where α can be either set as a constant or progressively increased (deterministic
annealing)
α plays two roles: (a) approximation of hard clustering and (b) inverse temperature
in a deterministic annealing scheme. This yields two variants (sketched in code below):
DKMa: random initialization of θ and R + annealing, using the sequence $(\alpha_n)_n$ defined by $\alpha_1 = 0.1$ and $\alpha_{n+1} = 2^{1/\log(n)^2} \times \alpha_n$
DKMp: pretraining of θ and k-means-based initialization of R + no annealing: constant α = 1000
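A small NumPy sketch of the parameterized softmax and the annealing schedule; the max-shift for numerical stability and the starting index of the recursion (log 1 = 0, so it applies from n ≥ 2) are implementation choices of ours:

```python
import numpy as np

def closeness(h, R, alpha):
    """Parameterized softmax weight of each representative r_k for embedding h.
    h: (dim,) embedding; R: (K, dim) cluster representatives."""
    d2 = ((R - h[None, :]) ** 2).sum(axis=1)  # squared distances ||h - r_k||^2
    logits = -alpha * (d2 - d2.min())         # shift by the min distance for stability
    w = np.exp(logits)
    return w / w.sum()                        # -> one-hot on the argmin as alpha -> inf

def annealing_schedule(alpha1=0.1, n_terms=40):
    """Sequence alpha_{n+1} = 2**(1 / log(n)**2) * alpha_n used by DKMa."""
    alphas = [alpha1]
    for n in range(2, n_terms + 1):           # log(1) = 0, so apply from n >= 2
        alphas.append(2 ** (1.0 / np.log(n) ** 2) * alphas[-1])
    return alphas
```

With α = 1000 (the DKMp setting) the weights are already essentially one-hot, while the annealed sequence lets DKMa move smoothly from soft to hard assignments.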
Deep k-means: SGD-based training algorithm
Algorithm 1 Deep k-means
Input: data X, number of clusters K, trade-off hyperparameter
λ, scheme for α, number of epochs T, number of minibatches N,
learning rate η
Output: autoencoder parameters θ, cluster representatives R
Initialize θ and rk, 1 ≤ k ≤ K (randomly or through pretraining)
for each α do # α levels (if α not constant)
for t = 1 to T do # epochs per α
for n = 1 to N do # minibatches
Draw a minibatch X̃ ⊂ X
Update (θ, R) ← (θ, R) − η (1/|X̃|) ∇_(θ,R) L̃^(α)
end for
end for
end for
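The algorithm above maps almost line-for-line onto automatic differentiation. A sketch of one minibatch update in PyTorch, assuming `encoder`/`decoder` are `nn.Module` instances and `R` is a trainable tensor of representatives (all names are ours):

```python
import torch
import torch.nn as nn

def dkm_step(x, encoder, decoder, R, opt, alpha=1000.0, lam=1.0):
    """One SGD step on the joint DKM loss for a minibatch x of shape (batch, d)."""
    h = encoder(x)                                           # embeddings h_theta(x)
    rec = ((x - decoder(h)) ** 2).sum(dim=1)                 # L_rec per example
    d2 = ((h[:, None, :] - R[None, :, :]) ** 2).sum(dim=2)   # (batch, K) squared dists
    w = torch.softmax(-alpha * d2, dim=1)                    # closeness weights
    clust = (w * d2).sum(dim=1)                              # L_clust^(alpha) per example
    loss = (rec + lam * clust).mean()                        # (1/|X~|) L~^(alpha)
    opt.zero_grad()
    loss.backward()                                          # gradients w.r.t. theta AND R
    opt.step()
    return loss.item()

# R must be registered with the optimizer so it is updated jointly, e.g.:
# R = torch.randn(K, dim, requires_grad=True)
# opt = torch.optim.SGD(list(encoder.parameters())
#                       + list(decoder.parameters()) + [R], lr=1e-3)
```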
Experimental setup
AE architecture: encoder with d-500-500-2000-K neurons and mirrored decoder
Baselines
k-Means
AE + k-Means
Deep Clustering Network [Yang+, 2017]
Improved Deep Embedded Clustering [Guo+, 2017]
Datasets
Text
20 Newsgroups: 20 classes, 18,846 samples
RCV1: 4 classes, 10,000 samples
Image
MNIST: 10 classes, 70,000 samples
USPS: 10 classes, 9,298 samples
Clustering metrics
Clustering accuracy (ACC)
Normalized Mutual Information (NMI)
Clustering performance
Mean ± std for ACC and NMI computed over 10 (seeded) runs. Bold (resp. underlined)
values correspond to results with no significant difference (p > 0.05) to the best
approach with (resp. without) pretraining for each dataset/metric pair
| Model | MNIST ACC | MNIST NMI | USPS ACC | USPS NMI | 20NEWS ACC | 20NEWS NMI | RCV1 ACC | RCV1 NMI |
|---|---|---|---|---|---|---|---|---|
| KM | 53.5±0.3 | 49.8±0.5 | 67.3±0.1 | 61.4±0.1 | 23.2±1.5 | 21.6±1.8 | 50.8±2.9 | 31.3±5.4 |
| AE-KM | 80.8±1.8 | 75.2±1.1 | 72.9±0.8 | 71.7±1.2 | 49.0±2.9 | 44.5±1.5 | 56.7±3.6 | 31.5±4.3 |
| *Deep clustering approaches without pretraining* | | | | | | | | |
| DCNnp | 34.8±3.0 | 18.1±1.0 | 36.4±3.5 | 16.9±1.3 | 17.9±1.0 | 9.8±0.5 | 41.3±4.0 | 6.9±1.8 |
| IDECnp | 61.8±3.0 | 62.4±1.6 | 53.9±5.1 | 50.0±3.8 | 22.3±1.5 | 22.3±1.5 | 56.7±5.3 | 31.4±2.8 |
| DKMa | 82.3±3.2 | 78.0±1.9 | 75.5±6.8 | 73.0±2.3 | 44.8±2.4 | 42.8±1.1 | 53.8±5.5 | 28.0±5.8 |
| *Deep clustering approaches with pretraining* | | | | | | | | |
| DCNp | 81.1±1.9 | 75.7±1.1 | 73.0±0.8 | 71.9±1.2 | 49.2±2.9 | 44.7±1.5 | 56.7±3.6 | 31.6±4.3 |
| IDECp | 85.7±2.4 | 86.4±1.0 | 75.2±0.5 | 74.9±0.6 | 40.5±1.3 | 38.2±1.0 | 59.5±5.7 | 34.7±5.0 |
| DKMp | 84.0±2.2 | 79.6±0.9 | 75.7±1.3 | 77.6±1.1 | 51.2±2.8 | 46.7±1.2 | 58.3±3.8 | 33.1±4.9 |
‘k-Means-friendliness’ of learned representations
Mean ± std for ACC and NMI computed over 10 (seeded) runs. Bold values
correspond to results with no significant difference (p > 0.05) to the best
| Model | MNIST ACC | MNIST NMI | USPS ACC | USPS NMI | 20NEWS ACC | 20NEWS NMI | RCV1 ACC | RCV1 NMI |
|---|---|---|---|---|---|---|---|---|
| AE-KM | 80.8±1.8 | 75.2±1.1 | 72.9±0.8 | 71.7±1.2 | 49.0±2.9 | 44.5±1.5 | 56.7±3.6 | 31.5±4.3 |
| DCNp + KM | 84.9±3.1 | 79.4±1.5 | 73.9±0.7 | 74.1±1.1 | 50.5±3.1 | 46.5±1.6 | 57.3±3.6 | 32.3±4.4 |
| DKMa + KM | 84.8±1.3 | 78.7±0.8 | 76.9±4.9 | 74.3±1.5 | 49.0±2.5 | 44.0±1.0 | 53.4±5.9 | 27.4±5.3 |
| DKMp + KM | 85.1±3.0 | 79.9±1.5 | 75.7±1.3 | 77.6±1.1 | 52.1±2.7 | 47.1±1.3 | 58.3±3.8 | 33.0±4.9 |
[Figure: 2-D visualizations of the embedding spaces learned by AE, DCN, DKMa, and DKMp]
Conclusion
We proposed Deep k-Means, a new approach to jointly perform k-Means
clustering and representation learning
Take-home messages:
Pretraining is clearly beneficial to deep clustering
The differentiable formulation of DKM enables fully joint SGD training and thus
efficient use of GPUs
k-Means-based approaches can perform on par with state-of-the-art deep
clustering approaches
Ongoing work: Constrained Deep k-Means
We wish to guide the clustering results so that they capture information that is relevant
to the user (e.g., expert knowledge on the classes). Here we consider that this
information takes the form of lexical constraints, i.e., sets of keywords, for document
clustering
[Figure: word cloud of example keyword constraints grouped into themes: {engine, car}, {diet, food}, {novel, book}]
Two approaches considered:
Constrain the document embeddings to put more emphasis on the keywords
Constrain the cluster representatives to be related to subsets of the keywords
Thank you!
Paper pre-print available at: https://arxiv.org/pdf/1806.10069.pdf
References
Guo, X., Gao, L., Liu, X., & Yin, J. (2017). Improved Deep Embedded Clustering
with Local Structure Preservation. In Proceedings of the 26th International Joint
Conference on Artificial Intelligence (pp. 1753–1759).
MacQueen, J. (1967). Some Methods for Classification and Analysis of
Multivariate Observations. In Proceedings of the 5th Berkeley Symposium on
Mathematical Statistics and Probability (pp. 281–297).
Moradi Fard, M., Thonet, T., & Gaussier, E. (2018). Deep k-Means: Jointly
Clustering with k-Means and Learning Representations. arXiv:1806.10069.
Yang, B., Fu, X., Sidiropoulos, N. D., & Hong, M. (2017). Towards
K-means-friendly Spaces: Simultaneous Deep Learning and Clustering. In
ICML ’17 (pp. 3861–3870).
Appendix: clustering metrics
Given the groundtruth classes S = {S1, . . . , SK }, the obtained clusters
C = {C1, . . . , CK }, and the dataset X:
$$\mathrm{ACC}(C, S) = \max_{\phi} \frac{1}{|X|} \sum_{i=1}^{|X|} \mathbb{1}\{s_i = \phi(c_i)\}$$
where $\phi$ ranges over one-to-one mappings from cluster indices to class labels
$$\mathrm{NMI}(C, S) = \frac{2\, I(C, S)}{H(C) + H(S)}$$
with
$$I(C, S) = \sum_{j,k} \frac{|C_j \cap S_k|}{|X|} \log \frac{|X|\, |C_j \cap S_k|}{|C_j|\, |S_k|} \quad \text{and} \quad H(C) = -\sum_{j} \frac{|C_j|}{|X|} \log \frac{|C_j|}{|X|}$$
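For reference, both metrics take only a few lines with standard libraries; the maximization over φ in ACC is the Hungarian algorithm applied to the cluster/class contingency matrix (the helper below is our own, not from the talk):

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from sklearn.metrics import normalized_mutual_info_score

def clustering_accuracy(y_true, y_pred):
    """ACC: best one-to-one cluster-to-class mapping (Hungarian algorithm)."""
    K = int(max(y_true.max(), y_pred.max())) + 1
    count = np.zeros((K, K), dtype=np.int64)
    for t, p in zip(y_true, y_pred):
        count[p, t] += 1                        # contingency: cluster p vs. class t
    rows, cols = linear_sum_assignment(-count)  # negate to maximize matched counts
    return count[rows, cols].sum() / len(y_true)

# NMI comes directly from scikit-learn:
# nmi = normalized_mutual_info_score(y_true, y_pred)
```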