Machine Learning: unsupervised classifiers
for a divorce dataset
Paula Robles López
Universidad Politécnica de Madrid (UPM)
7-12-2022
Paula Robles López Universidad Politécnica de Madrid (UPM)
Machine Learning: unsupervised classifiers 1 / 36
1 Introduction
2 Hierarchical clustering
3 Nonhierarchical clustering
4 Feature importances
5 External validation
6 Conclusions
1 Introduction
Problem overview:
1 Goal: to group the observations into k clusters
according to their similarities.
2 Data for 84 divorced and 86 married people from Turkey.
Balanced classes, but the labels are treated as UNKNOWN.
3 54 ordinal features and 170 total records.
PCA
We perform a PCA for better cluster visualization:
→ 170 objects in a 54-dimensional space
→ Dimensionality reduction to a 2-dimensional space
→ The first two PCs explain more than 80% of the initial data variance
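The reduction described above can be sketched in plain Python. This is a minimal illustration, not the code behind the slides: the function name `pca_2d`, the power-iteration approach and the toy data are all assumptions made for the example.

```python
import random

def pca_2d(rows, iters=500):
    """Return (2-D projection, variance ratio explained by the first two PCs)."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    X = [[r[j] - means[j] for j in range(d)] for r in rows]  # centered data
    # Sample covariance matrix (d x d).
    cov = [[sum(x[a] * x[b] for x in X) / (n - 1) for b in range(d)]
           for a in range(d)]
    total_variance = sum(cov[j][j] for j in range(d))

    def matvec(M, v):
        return [sum(M[a][b] * v[b] for b in range(d)) for a in range(d)]

    def top_eigenpair(M):
        # Power iteration: repeated multiplication converges to the
        # eigenvector with the largest eigenvalue.
        rng = random.Random(0)
        v = [rng.random() + 0.1 for _ in range(d)]
        norm = sum(t * t for t in v) ** 0.5
        v = [t / norm for t in v]
        for _ in range(iters):
            w = matvec(M, v)
            norm = sum(t * t for t in w) ** 0.5
            if norm == 0.0:
                break
            v = [t / norm for t in w]
        eigenvalue = sum(a * b for a, b in zip(v, matvec(M, v)))
        return eigenvalue, v

    components, eigenvalues = [], []
    M = cov
    for _ in range(2):
        lam, v = top_eigenpair(M)
        eigenvalues.append(lam)
        components.append(v)
        # Deflation: remove the component just found before the next pass.
        M = [[M[a][b] - lam * v[a] * v[b] for b in range(d)] for a in range(d)]

    projected = [[sum(x[j] * c[j] for j in range(d)) for c in components]
                 for x in X]
    return projected, sum(eigenvalues) / total_variance

# Made-up toy data lying on a plane in 3-D, so two PCs capture all the variance.
toy = [(x, y, x + y) for x in range(5) for y in range(5)]
points_2d, ratio = pca_2d(toy)
print(len(points_2d), round(ratio, 3))
```

The second return value is the share of the total variance captured by the two components, the same kind of figure as the "more than 80%" reported for the divorce data.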
PCA
Our first step: the 170 data points projected onto the first two PCs.
Note: we do not know the real labels; we have no prior knowledge
of the groupings.
2 Hierarchical clustering
We will perform agglomerative clustering, but first we need
to choose the value of k.
→ A dendrogram of the tree-like groupings produced by the
clustering helps us choose k.
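A minimal sketch of the bottom-up merging behind agglomerative clustering. The slides do not say which linkage was used, so average linkage is an assumption here, and the function name `agglomerative` and the toy points are made up for illustration.

```python
import math
from itertools import combinations

def agglomerative(points, k):
    """Merge the two closest clusters (average linkage) until k remain."""
    clusters = [[i] for i in range(len(points))]  # start: one cluster per point

    def avg_dist(a, b):
        # Average pairwise Euclidean distance between two clusters.
        total = sum(math.dist(points[i], points[j]) for i in a for j in b)
        return total / (len(a) * len(b))

    while len(clusters) > k:
        # Pick the pair of clusters with the smallest average distance.
        a, b = min(combinations(range(len(clusters)), 2),
                   key=lambda ab: avg_dist(clusters[ab[0]], clusters[ab[1]]))
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # safe because a < b

    labels = [0] * len(points)
    for label, members in enumerate(clusters):
        for i in members:
            labels[i] = label
    return labels

# Made-up toy data: two well-separated groups.
toy_points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
toy_labels = agglomerative(toy_points, 2)
print(toy_labels)
```

Recording the distance at which each merge happens would give the heights drawn in a dendrogram.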
→ We also have the silhouette scores.
k = 2, silhouette score = 0.809
k = 3, silhouette score = 0.726
k = 4, silhouette score = 0.479
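For reference, the silhouette score averages (b − a) / max(a, b) over all points, where a is the mean distance from a point to its own cluster and b is the mean distance to the nearest other cluster. The sketch below is a plain-Python illustration under the assumption of Euclidean distance. The function name and toy data are made up, and it does not reproduce the scores listed above.

```python
import math

def silhouette_score(points, labels):
    """Mean silhouette coefficient (Euclidean distance, at least 2 clusters)."""
    clusters = {}
    for i, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(i)
    scores = []
    for i, lab in enumerate(labels):
        own = [j for j in clusters[lab] if j != i]
        if not own:  # singleton cluster: its silhouette is defined as 0
            scores.append(0.0)
            continue
        # a: mean distance to the other members of the same cluster.
        a = sum(math.dist(points[i], points[j]) for j in own) / len(own)
        # b: smallest mean distance to the members of any other cluster.
        b = min(
            sum(math.dist(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items() if other != lab
        )
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)

# Made-up toy data: a sensible labeling versus a deliberately bad one.
toy_points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
good = silhouette_score(toy_points, [0, 0, 0, 1, 1, 1])
bad = silhouette_score(toy_points, [0, 1, 0, 1, 0, 1])
print(round(good, 3), round(bad, 3))
```

Scores close to 1 indicate compact, well-separated clusters, which is why the highest score is preferred when comparing values of k.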
k = 2 is the optimal number of clusters according to the silhouette
scores!
→ The Calinski-Harabasz index agrees.
Higher values → the clusters are dense and well separated
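The Calinski-Harabasz index is the ratio of between-cluster dispersion to within-cluster dispersion, each divided by its degrees of freedom: [B / (k − 1)] / [W / (n − k)]. A minimal plain-Python sketch follows. The function name and the one-dimensional toy data are made up for illustration.

```python
def calinski_harabasz(points, labels):
    """Between-cluster over within-cluster dispersion, scaled by (n-k)/(k-1)."""
    n, d = len(points), len(points[0])
    overall = [sum(p[j] for p in points) / n for j in range(d)]
    groups = {}
    for p, lab in zip(points, labels):
        groups.setdefault(lab, []).append(p)
    k = len(groups)
    between = within = 0.0
    for members in groups.values():
        centroid = [sum(p[j] for p in members) / len(members) for j in range(d)]
        # B: squared distance of each centroid to the overall mean, weighted by size.
        between += len(members) * sum((c - o) ** 2
                                      for c, o in zip(centroid, overall))
        # W: squared distance of each point to its own centroid.
        within += sum(sum((x - c) ** 2 for x, c in zip(p, centroid))
                      for p in members)
    return (between / (k - 1)) / (within / (n - k))

# Made-up toy data: two tight groups on a line.
ch = calinski_harabasz([(0,), (2,), (10,), (12,)], [0, 0, 1, 1])
print(ch)
```

Here B = 100 and W = 4 with k = 2 and n = 4, so the index is (100 / 1) / (4 / 2) = 50.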
Clustering results!
3 Nonhierarchical clustering
Partitional clustering
We will perform K-Means clustering, but first we need to
choose the value of k.
→ We use the elbow method: we plot the SSE against the number
of clusters and look for the point where it stops dropping sharply.
k = 2 is the elbow point.
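A minimal sketch of how the SSE curve for the elbow method can be computed, using a plain-Python version of Lloyd's algorithm. The function name `kmeans_sse`, the random initialization and the toy data are assumptions for illustration. The slides do not show the actual implementation.

```python
import math
import random

def kmeans_sse(points, k, iters=100, seed=0):
    """Run Lloyd's algorithm and return the final sum of squared errors."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initial centroids: k random data points
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid.
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: math.dist(p, centroids[c]))
            groups[nearest].append(p)
        # Update step: each centroid moves to the mean of its group.
        new = [
            tuple(sum(col) / len(g) for col in zip(*g)) if g else centroids[c]
            for c, g in enumerate(groups)
        ]
        if new == centroids:
            break
        centroids = new
    return sum(min(math.dist(p, c) ** 2 for c in centroids) for p in points)

# Made-up toy data with two obvious groups.
toy_points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
sse = {k: kmeans_sse(toy_points, k) for k in (1, 2, 3, 4)}
for k, value in sse.items():
    print(k, round(value, 3))
```

On this toy data the SSE collapses between k = 1 and k = 2 and barely improves afterwards, which is the elbow shape the method looks for.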
→ We also have the silhouette scores.
k = 2, silhouette score = 0.809
k = 3, silhouette score = 0.724
k = 4, silhouette score = 0.493
k = 2 is the optimal number of clusters according to the silhouette
scores!
→ The Calinski-Harabasz index agrees.
Higher values → the clusters are dense and well separated
Clustering results!
Probabilistic clustering
We will perform Gaussian Mixture clustering, but first we
need to choose the value of k.
→ We use the BIC score to choose the best-fitting model among
the candidates (lower is better).
k = 3 gives the lowest BIC.
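For reference, BIC = p ln(n) − 2 ln(L), where p is the number of free parameters, n the number of samples and L the maximized likelihood. The sketch below assumes full covariance matrices, which the slides do not confirm. The function names and the log-likelihood value in the usage lines are hypothetical.

```python
import math

def gmm_free_parameters(k, d):
    """Free parameters of a k-component GMM with full covariances in d dims."""
    means = k * d
    covariances = k * d * (d + 1) // 2  # each symmetric d x d matrix
    weights = k - 1                     # mixing weights sum to one
    return means + covariances + weights

def bic(log_likelihood, n_samples, n_params):
    """BIC = p * ln(n) - 2 * ln(L); the model with the lowest value is preferred."""
    return n_params * math.log(n_samples) - 2.0 * log_likelihood

# Hypothetical numbers: 170 samples in 2 dimensions, made-up log-likelihood.
p2 = gmm_free_parameters(2, 2)
p3 = gmm_free_parameters(3, 2)
print(p2, p3, round(bic(-100.0, 170, p2), 3))
```

The p ln(n) term penalizes extra components, so a larger k lowers the BIC only if it raises the likelihood enough to pay for its additional parameters.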
→ We also have the silhouette scores.
k = 2, silhouette score = 0.809
k = 3, silhouette score = 0.715
k = 4, silhouette score = 0.466
k = 2 is the optimal number of clusters according to the silhouette
scores!
→ The Calinski-Harabasz index agrees.
Higher values → the clusters are dense and well separated
Clustering results!
4 Feature importances
We can use a Random Forest classifier to learn more about
the clustering.
→ New categorical variable: the K-Means cluster assignment.
→ We use this variable as the class label.
→ We train a Random Forest classifier and extract the most
important features.
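To illustrate the idea of ranking features by how well they separate the cluster labels, the sketch below swaps in a much simpler technique than a Random Forest: the best Gini impurity decrease achievable by a single threshold split on each feature. A Random Forest averages such impurity decreases over many trees and many splits. The function names and toy data are made up.

```python
def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def stump_importance(rows, labels):
    """Best Gini decrease from one threshold split, computed per feature."""
    n, parent = len(rows), gini(labels)
    importances = []
    for j in range(len(rows[0])):
        best = 0.0
        # Candidate thresholds: every observed value except the largest,
        # so that both sides of the split are non-empty.
        for threshold in sorted({r[j] for r in rows})[:-1]:
            left = [lab for r, lab in zip(rows, labels) if r[j] <= threshold]
            right = [lab for r, lab in zip(rows, labels) if r[j] > threshold]
            child = (len(left) * gini(left) + len(right) * gini(right)) / n
            best = max(best, parent - child)
        importances.append(best)
    return importances

# Made-up toy data: feature 0 separates the two clusters, feature 1 does not.
toy_rows = [(0, 3), (1, 0), (0, 2), (4, 3), (3, 0), (4, 1)]
toy_clusters = [0, 0, 0, 1, 1, 1]  # stands in for the K-Means assignment
importances = stump_importance(toy_rows, toy_clusters)
print([round(v, 3) for v in importances])
```

Features with a large decrease are the ones that best explain the cluster assignment, which is the role the Random Forest importances play in the slides.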
→ The most important features: shared values, overall happiness
and knowledge about the partner’s inner and outer world.
5 External validation
External information can be used to validate the clusterings. Here,
we use the class label to measure how closely our cluster results
match reality.
→ For this: Adjusted Rand Index (ARI).
Close to one → almost perfect match!
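A minimal plain-Python sketch of the Adjusted Rand Index, built from the contingency table of true labels against cluster labels. The function name and the toy labelings are made up for illustration.

```python
from collections import Counter
from math import comb

def adjusted_rand_index(true_labels, cluster_labels):
    """Pair-counting agreement between two labelings, corrected for chance."""
    n = len(true_labels)
    cells = Counter(zip(true_labels, cluster_labels))  # contingency table
    sum_cells = sum(comb(c, 2) for c in cells.values())
    sum_rows = sum(comb(c, 2) for c in Counter(true_labels).values())
    sum_cols = sum(comb(c, 2) for c in Counter(cluster_labels).values())
    expected = sum_rows * sum_cols / comb(n, 2)  # agreement expected by chance
    maximum = (sum_rows + sum_cols) / 2
    if maximum == expected:  # degenerate case, e.g. one cluster in both labelings
        return 1.0
    return (sum_cells - expected) / (maximum - expected)

# Made-up labelings: the cluster names are swapped but the grouping is identical.
perfect = adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0])
poor = adjusted_rand_index([0, 0, 1, 1], [0, 1, 0, 1])
print(perfect, poor)
```

The index ignores how the clusters are named: it only checks whether pairs of points are grouped together in both labelings. A value near zero or below means the agreement is no better than chance.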
Conclusions:
1 The silhouette plots and the Calinski-Harabasz index suggest k = 2 for
all the clusterings.
2 All the clustering methods give the same output → our results
are reasonably reliable.
3 We can use PCA to improve efficiency and get equally good
clustering results!
4 The extracted feature importances suggest that there are more
core issues and fundamental incompatibilities than
expected, which could be related to the data source (Turkey).
Thank you.
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
 
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
 
DATA SUMMIT 24 Building Real-Time Pipelines With FLaNK
DATA SUMMIT 24  Building Real-Time Pipelines With FLaNKDATA SUMMIT 24  Building Real-Time Pipelines With FLaNK
DATA SUMMIT 24 Building Real-Time Pipelines With FLaNK
 
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% SecureCall me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
 
Thane Call Girls 7091864438 Call Girls in Thane Escort service book now -
Thane Call Girls 7091864438 Call Girls in Thane Escort service book now -Thane Call Girls 7091864438 Call Girls in Thane Escort service book now -
Thane Call Girls 7091864438 Call Girls in Thane Escort service book now -
 
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
 
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
 
➥🔝 7737669865 🔝▻ Thrissur Call-girls in Women Seeking Men 🔝Thrissur🔝 Escor...
➥🔝 7737669865 🔝▻ Thrissur Call-girls in Women Seeking Men  🔝Thrissur🔝   Escor...➥🔝 7737669865 🔝▻ Thrissur Call-girls in Women Seeking Men  🔝Thrissur🔝   Escor...
➥🔝 7737669865 🔝▻ Thrissur Call-girls in Women Seeking Men 🔝Thrissur🔝 Escor...
 

clustering

  • 1. Machine Learning: unsupervised classifiers for a divorce dataset. Paula Robles López, Universidad Politécnica de Madrid (UPM), 7-12-2022.
  • 2. Outline: 1 Introduction, 2 Hierarchical clustering, 3 Nonhierarchical clustering, 4 Feature importances, 5 External validation, 6 Conclusions.
  • 3. Section 1: Introduction.
  • 4. Problem overview: (1) Goal: to group the data observations into k clusters according to their similarities. (2) Data for 84 divorced and 86 married people from Turkey. Balanced classes, with labels treated as UNKNOWN. (3) 54 ordinal features and 170 total records.
  • 5. PCA. We perform a PCA for better cluster visualization: → 170 objects in a 54-dimensional space → Dimensionality reduction to a 2-dimensional space → The first two PCs explain more than 80% of the initial data variance.
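The reduction to two principal components, and the share of variance they retain, can be sketched as follows. This is a minimal, dependency-free illustration and not the author's code: the toy data, the function names `covariance_matrix`, `top_eigenpair` and `explained_variance_2d`, and the use of power iteration with deflation are all assumptions made for the example. In practice a library routine, for instance scikit-learn's `PCA(n_components=2)` with its `explained_variance_ratio_` attribute, would normally be used.

```python
import math

def covariance_matrix(rows):
    """Sample covariance matrix of a list of equal-length rows."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    return [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
            for i in range(d)]

def top_eigenpair(matrix, iterations=500):
    """Dominant eigenvalue and eigenvector of a symmetric PSD matrix.

    Uses power iteration; assumes the fixed start vector (1, 2, ..., d)
    is not orthogonal to the dominant eigenvector.
    """
    d = len(matrix)
    start_norm = math.sqrt(sum((i + 1) ** 2 for i in range(d)))
    vec = [(i + 1) / start_norm for i in range(d)]
    for _ in range(iterations):
        nxt = [sum(matrix[i][j] * vec[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in nxt))
        if norm == 0.0:
            break
        vec = [x / norm for x in nxt]
    # Rayleigh quotient of the (unit-length) vector gives the eigenvalue.
    value = sum(vec[i] * sum(matrix[i][j] * vec[j] for j in range(d))
                for i in range(d))
    return value, vec

def explained_variance_2d(rows):
    """Fraction of the total variance captured by the first two PCs."""
    cov = covariance_matrix(rows)
    d = len(cov)
    total = sum(cov[i][i] for i in range(d))  # trace = total variance
    val1, vec1 = top_eigenpair(cov)
    # Deflation: subtract the first component, then find the next one.
    deflated = [[cov[i][j] - val1 * vec1[i] * vec1[j] for j in range(d)]
                for i in range(d)]
    val2, _ = top_eigenpair(deflated)
    return (val1 + val2) / total

# Hypothetical 3-feature toy data standing in for the 54 questionnaire items.
toy = [[2.0, 1.9, 0.3], [0.5, 0.7, 0.1], [3.1, 2.9, 0.2],
       [1.2, 1.0, 0.4], [2.4, 2.6, 0.0]]
print(f"variance explained by 2 PCs: {explained_variance_2d(toy):.3f}")
```

A ratio above 0.8, as reported on the slide, means a 2-dimensional scatter plot loses less than a fifth of the spread in the original 54 features.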
  • 6. PCA. Our first step: the 170 data points projected onto the first two PCs. Note: we do not know the real labels and have no prior knowledge of the groupings.
  • 7. Section 2: Hierarchical clustering.
  • 8. We perform agglomerative clustering, but first we need to choose the value of k. → We need internal validation tools, such as a dendrogram showing the tree-like groupings produced by the clustering.
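Agglomerative clustering, and the merge history that a dendrogram visualizes, can be sketched in a few lines. The linkage criterion is not stated on the slides, so average linkage with Euclidean distance is an assumption here, as are the function name `agglomerative` and the toy points.

```python
import math

def agglomerative(points, k):
    """Merge the two closest clusters (average linkage) until k remain.

    Returns the final label of every point and the list of merges
    (members_a, members_b, distance), which is what a dendrogram draws.
    """
    clusters = [[i] for i in range(len(points))]  # start: one cluster per point

    def linkage(a, b):
        # Average Euclidean distance over all cross-cluster pairs.
        total = sum(math.dist(points[i], points[j]) for i in a for j in b)
        return total / (len(a) * len(b))

    merges = []
    while len(clusters) > k:
        dist, x, y = min(
            (linkage(clusters[x], clusters[y]), x, y)
            for x in range(len(clusters))
            for y in range(x + 1, len(clusters))
        )
        merges.append((clusters[x], clusters[y], dist))
        merged = clusters[x] + clusters[y]
        clusters = [c for idx, c in enumerate(clusters) if idx not in (x, y)]
        clusters.append(merged)

    labels = [0] * len(points)
    for label, members in enumerate(clusters):
        for i in members:
            labels[i] = label
    return labels, merges

# Hypothetical 2-D points forming two obvious groups.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels, merges = agglomerative(points, k=2)
print(labels)
```

A large jump in the merge distance between consecutive entries of `merges` is the visual cue a dendrogram gives for where to cut the tree.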
  • 9. → We also have the silhouette scores. k = 2, silhouette score = 0.809
  • 10. k = 3, silhouette score = 0.726
  • 11. k = 4, silhouette score = 0.479
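For readers unfamiliar with the silhouette score quoted on these slides, the sketch below computes it from scratch. It assumes Euclidean distance; the toy points are hypothetical, and the function mirrors the definition used by library routines such as `sklearn.metrics.silhouette_score` rather than reproducing the author's code.

```python
import math

def silhouette_score(points, labels):
    """Mean silhouette coefficient over all points (Euclidean distance)."""
    clusters = {}
    for idx, label in enumerate(labels):
        clusters.setdefault(label, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("the silhouette needs at least two clusters")

    def mean_dist(i, members):
        others = [j for j in members if j != i]
        return sum(math.dist(points[i], points[j]) for j in others) / len(others)

    scores = []
    for i, label in enumerate(labels):
        own = clusters[label]
        if len(own) == 1:
            scores.append(0.0)  # convention: singleton clusters score 0
            continue
        a = mean_dist(i, own)  # cohesion: mean distance inside own cluster
        b = min(mean_dist(i, members)  # separation: nearest other cluster
                for other, members in clusters.items() if other != label)
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)

# Hypothetical example: the same four points under a good and a bad labelling.
points = [(0, 0), (0, 1), (10, 0), (10, 1)]
print(silhouette_score(points, [0, 0, 1, 1]))
print(silhouette_score(points, [0, 1, 0, 1]))
```

Scores lie between -1 and 1; values near 1 indicate points that sit much closer to their own cluster than to any other, which is why the score for k = 2 stands out against k = 3 and k = 4.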
  • 12. k = 2 is the optimal number of clusters according to the silhouette scores! → The Calinski-Harabasz index also says so. Higher values → the clusters are dense and well separated.
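The Calinski-Harabasz index is the ratio of between-cluster dispersion to within-cluster dispersion, each divided by its degrees of freedom. A minimal sketch, with a hypothetical function name and toy data (scikit-learn exposes the same quantity as `sklearn.metrics.calinski_harabasz_score`):

```python
def calinski_harabasz(points, labels):
    """Between-cluster over within-cluster dispersion, scaled by (n-k)/(k-1)."""
    n, dim = len(points), len(points[0])
    clusters = {}
    for p, label in zip(points, labels):
        clusters.setdefault(label, []).append(p)
    k = len(clusters)
    if not 1 < k < n:
        raise ValueError("need 2 <= number of clusters <= n - 1")

    def centroid(rows):
        return [sum(r[j] for r in rows) / len(rows) for j in range(dim)]

    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    overall = centroid(points)
    # Dispersion of the cluster centroids around the global centroid.
    between = sum(len(rows) * sq_dist(centroid(rows), overall)
                  for rows in clusters.values())
    # Dispersion of the points around their own centroid (assumed non-zero).
    within = sum(sq_dist(p, centroid(rows))
                 for rows in clusters.values() for p in rows)
    return (between / (k - 1)) / (within / (n - k))

# Hypothetical example: tight, well-separated clusters give a high value.
points = [(0, 0), (0, 2), (10, 0), (10, 2)]
print(calinski_harabasz(points, [0, 0, 1, 1]))  # 50.0
```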
  • 13. Clustering results!
  • 14. Section 3: Nonhierarchical clustering (Partitional clustering, Probabilistic clustering).
  • 15. Section 3: Nonhierarchical clustering. Partitional clustering.
  • 16. We perform K-Means clustering, but first we need to choose the value of k. → We need internal validation measures like the elbow method, which plots the SSE against the number of clusters. k = 2 is the elbow point.
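The elbow method can be sketched as follows: run K-Means for several values of k and record the final SSE for each. This is an illustrative implementation of Lloyd's algorithm, not the author's code; the deterministic farthest-first initialisation, the function name `kmeans_sse` and the toy points are assumptions made so the example is reproducible.

```python
import math

def kmeans_sse(points, k, iterations=100):
    """Run Lloyd's algorithm and return the final sum of squared errors."""
    # Farthest-first initialisation: start at the first point, then keep
    # adding the point that is farthest from all chosen centres.
    centers = [tuple(points[0])]
    while len(centers) < k:
        far = max(points, key=lambda p: min(math.dist(p, c) for c in centers))
        centers.append(tuple(far))
    for _ in range(iterations):
        groups = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)),
                          key=lambda c: math.dist(p, centers[c]))
            groups[nearest].append(p)
        # Move each centre to the mean of its group (keep it if the group is empty).
        new_centers = [
            tuple(sum(coord) / len(g) for coord in zip(*g)) if g else centers[i]
            for i, g in enumerate(groups)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    return sum(min(math.dist(p, c) for c in centers) ** 2 for p in points)

# Hypothetical data with two tight groups: the SSE collapses at k = 2.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11)]
for k in range(1, 5):
    print(k, round(kmeans_sse(points, k), 3))
```

The elbow is the value of k after which adding more clusters only shaves a little off the SSE; in this toy example, as on the slide, that happens at k = 2.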
  • 17. → We also have the silhouette scores. k = 2, silhouette score = 0.809
  • 18. k = 3, silhouette score = 0.724
  • 19. k = 4, silhouette score = 0.493
  • 20. k = 2 is the optimal number of clusters according to the silhouette scores! → The Calinski-Harabasz index also says so. Higher values → the clusters are dense and well separated.
  • 21. Clustering results!
  • 22. Section 3: Nonhierarchical clustering. Probabilistic clustering.
  • 23. We perform Gaussian Mixture clustering, but first we need to choose the value of k. → We need internal validation measures like the BIC score to choose the best-fitting model among the candidates. k = 3 is the lowest point.
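For reference, the BIC used for this kind of model selection is usually defined as below. The symbols are notation introduced here, not taken from the slides: n is the number of records, k the number of mixture components, d the number of dimensions, p the number of free parameters and \hat{L} the maximized likelihood. The parameter count assumes full covariance matrices, which the slides do not state.

```latex
\mathrm{BIC} = p \ln n - 2 \ln \hat{L},
\qquad
p = (k - 1) + k\,d + k\,\frac{d\,(d + 1)}{2}
```

With this sign convention a lower BIC is better: the first term penalizes every extra component, the second rewards a better fit, so the lowest point of the curve marks the preferred k.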
  • 24. → We also have the silhouette scores. k = 2, silhouette score = 0.809
  • 25. k = 3, silhouette score = 0.715
  • 26. k = 4, silhouette score = 0.466
  • 27. k = 2 is the optimal number of clusters according to the silhouette scores! → The Calinski-Harabasz index also says so. Higher values → the clusters are dense and well separated.
  • 28. Clustering results!
  • 29. Section 4: Feature importances.
  • 30. We can use a Random Forest classifier to learn more about the clustering. → New categorical variable: the K-Means cluster assignment. → We use this variable as the class label. → We train a Random Forest classifier and extract the most important features.
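The three steps on this slide can be sketched with scikit-learn, assuming that library (and NumPy) is available. The data below is synthetic and purely hypothetical: 170 rows with five features, of which only feature 0 separates the two groups. The hyperparameters are illustrative choices, not the author's settings.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)

# Synthetic stand-in for the questionnaire (hypothetical data):
# only feature 0 differs between the two groups, the rest is noise.
group = np.repeat([0, 1], 85)
X = rng.normal(size=(170, 5))
X[:, 0] += 8.0 * group

# Step 1: the K-Means cluster assignment becomes a new categorical variable.
cluster_labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

# Steps 2 and 3: treat it as the class label and train a Random Forest on it.
forest = RandomForestClassifier(n_estimators=200, random_state=0)
forest.fit(X, cluster_labels)

# Rank the features by impurity-based importance, most important first.
ranking = np.argsort(forest.feature_importances_)[::-1]
print(ranking)
```

The importances describe which features best reproduce the cluster assignment, so they explain the clustering rather than the true marital status.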
  • 31. → Most important features: shared values, overall happiness and knowledge about the partner’s inner and outer world.
  • 32. Section 5: External validation.
  • 33. External information can be used to validate the clusterings. Here, we use the class label to measure how similar our cluster results are to reality. → For this: Adjusted Rand Index (ARI). Close to one → almost perfect match!
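The ARI compares two labellings by counting pairs of records that are grouped together in both, corrected for the agreement expected by chance. A minimal sketch of the standard formula follows; the function name and the toy labels are hypothetical, and scikit-learn offers the same measure as `sklearn.metrics.adjusted_rand_score`.

```python
from collections import Counter
from math import comb

def adjusted_rand_index(true_labels, cluster_labels):
    """Adjusted Rand Index between two labellings of the same records."""
    n = len(true_labels)
    # Contingency table cells and its row/column totals.
    cells = Counter(zip(true_labels, cluster_labels))
    rows = Counter(true_labels)
    cols = Counter(cluster_labels)
    sum_cells = sum(comb(c, 2) for c in cells.values())
    sum_rows = sum(comb(c, 2) for c in rows.values())
    sum_cols = sum(comb(c, 2) for c in cols.values())
    expected = sum_rows * sum_cols / comb(n, 2)  # agreement expected by chance
    max_index = (sum_rows + sum_cols) / 2
    if max_index == expected:
        return 1.0  # degenerate case, e.g. both labellings use a single group
    return (sum_cells - expected) / (max_index - expected)

# Cluster ids are arbitrary: swapping 0 and 1 still gives a perfect match.
print(adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0]))
```

Because the index only looks at which records share a group, it needs no mapping between cluster ids and the married/divorced labels.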
  • 34. Section 6: Conclusions.
  • 35. Conclusions: (1) The silhouette plots and the Calinski-Harabasz index suggest k = 2 for all the clusterings. (2) All the clustering methods give the same output → our results are reasonably reliable. (3) We can use PCA to improve efficiency and get equally good clustering results! (4) The feature importance extraction suggests that there are more core-like issues and fundamental incompatibilities than expected, which could be related to the data source (Turkey).
  • 36. Thank you.