COMMUNITY DETECTION FOR MULTI-LABEL CLASSIFICATION
Elaine Cecília Gatto | Alan Valejo | Mauri Ferrandin | Ricardo Cerri
20/09/2023 12th Brazilian Conference on Intelligent Systems | Community Detection for Multi-Label Classification | BioMaL 2
Researchers
Elaine Cecília Gatto (Cissa): PhD Candidate, UFSCar
Ricardo Cerri: Main Supervisor, UFSCar
Mauri Ferrandin: Co-Supervisor, UFSC
Alan Demétrius Baria Valejo: Collaborator, UFSCar
CONTENTS
• Introduction
• Proposal
• Experiments
• Results and Discussion
• Conclusion and Future Work
INTRODUCTION
• Multi-Label Classification
• Label Correlations
• Multi-Label Approaches:
  • Global:
    • New models or adaptations of existing models;
    • Learn all labels at once;
    • Do not learn label correlations correctly;
    • Induce a single model (one classifier);
  • Local:
    • Divide the original problem into binary problems;
    • Learn each label individually;
    • Do not learn label correlations;
    • Induce one model per label (many classifiers);
  • A different approach:
    • Lies between the global and local approaches;
    • Uses the advantages of both;
    • Mitigates their disadvantages;
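The sketch below shows how the approaches above group the labels, as partitions of the label set. It is a minimal sketch that assumes labels are identified by integer indices 0..q-1. The function names are hypothetical and are not part of HPML.

A global partition is a single cluster and a local partition has one cluster per label. Any partition in between is a hybrid partition. HPML builds its hybrid partitions with community detection; the random variant here only mimics the random partitions compared in the results.

```python
import random
from typing import List

Partition = List[List[int]]  # a partition is a list of disjoint label clusters


def global_partition(n_labels: int) -> Partition:
    """Global: one cluster holding every label, so one model learns all labels."""
    return [list(range(n_labels))]


def local_partition(n_labels: int) -> Partition:
    """Local: one cluster per label, so one binary model is trained per label."""
    return [[j] for j in range(n_labels)]


def random_partition(n_labels: int, n_clusters: int, seed: int = 0) -> Partition:
    """Random: shuffle the labels and cut them into n_clusters non-empty clusters."""
    if not 1 <= n_clusters <= n_labels:
        raise ValueError("need 1 <= n_clusters <= n_labels")
    rng = random.Random(seed)
    labels = list(range(n_labels))
    rng.shuffle(labels)
    # n_clusters - 1 distinct cut points guarantee that no cluster is empty.
    cuts = sorted(rng.sample(range(1, n_labels), n_clusters - 1))
    bounds = [0] + cuts + [n_labels]
    return [sorted(labels[a:b]) for a, b in zip(bounds, bounds[1:])]


def is_partition(p: Partition, n_labels: int) -> bool:
    """True if the clusters are non-empty, disjoint and cover every label once."""
    flat = [j for cluster in p for j in cluster]
    return sorted(flat) == list(range(n_labels)) and all(p)


print(global_partition(4))     # [[0, 1, 2, 3]]
print(local_partition(4))      # [[0], [1], [2], [3]]
print(random_partition(6, 2))  # two disjoint clusters covering labels 0..5
```

Partitions with between 2 and q-1 clusters are the space in which hybrid partitions live.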
PROPOSAL
HYBRID PARTITIONS FOR MULTI-LABEL CLASSIFICATION - HPML
Figure 1 – Types of partitions considered in this paper.
Figure 2 – FlowChart HPML
Stratification for Multi-Label Classification
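The slide does not detail how the stratified folds are built. As a hedged illustration, the sketch below is a simplified version of iterative stratification (Sechidis, Tsoumakas and Vlahavas, 2011), a common way to create multi-label cross-validation folds. Whether HPML uses exactly this variant is an assumption.

The procedure works as follows:

* The label with the fewest remaining examples is distributed first.
* Each of its examples goes to the fold that still wants the most examples of that label.

```python
import random
from typing import List, Sequence


def iterative_stratification(Y: Sequence[Sequence[int]], k: int, seed: int = 0) -> List[int]:
    """Return a fold index in 0..k-1 for every row of the binary label matrix Y."""
    rng = random.Random(seed)
    n, q = len(Y), len(Y[0])
    # Desired number of examples per fold, overall and for each label.
    want = [n / k] * k
    want_label = [[sum(row[lab] for row in Y) / k] * k for lab in range(q)]
    fold = [-1] * n
    remaining = set(range(n))

    def assign(i: int, candidates: List[int]) -> None:
        # Among the candidate folds, prefer the one that still wants the most
        # examples overall; break any remaining tie at random.
        best = max(want[j] for j in candidates)
        j = rng.choice([j for j in candidates if want[j] == best])
        fold[i] = j
        want[j] -= 1
        for lab in range(q):
            if Y[i][lab]:
                want_label[lab][j] -= 1
        remaining.discard(i)

    while remaining:
        # Count the positive examples of each label among unassigned rows.
        counts = {lab: sum(Y[i][lab] for i in remaining) for lab in range(q)}
        counts = {lab: c for lab, c in counts.items() if c > 0}
        if not counts:
            # Only rows without any positive label are left.
            for i in sorted(remaining):
                assign(i, list(range(k)))
            break
        lab = min(counts, key=counts.get)  # the rarest label is distributed first
        for i in sorted(r for r in remaining if Y[r][lab]):
            top = max(want_label[lab])
            assign(i, [j for j in range(k) if want_label[lab][j] == top])
    return fold
```

For 10-fold cross-validation, `iterative_stratification(Y, 10)` returns a fold number for every instance. Fold `f` is then the test set of round `f`.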
- Similarity measures:
  - Jaccard Index
  - Rogers-Tanimoto
- Similarity matrices
  - Vector-based data (label co-occurrence graphs)
- Sparsification: cut edges with small weights
  - kNN: k = 1, k = 2, k = 3
  - Threshold: removal of self-loops, and a 10% threshold
- 5 label co-occurrence graphs for each similarity measure (10 in total)
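The sketch below goes from a binary instance-by-label matrix to sparsified label co-occurrence graphs. It is a hedged illustration, not the HPML code, and it rests on these assumptions:

* The helper names are hypothetical.
* The kNN step keeps the union of each label's k strongest edges.
* "10%" is read as a weight threshold of 0.1 on the similarity.

```python
from itertools import combinations
from typing import Dict, List, Sequence, Tuple

Edges = Dict[Tuple[int, int], float]  # undirected weighted edges, keys (i, j) with i < j


def _counts(a: Sequence[int], b: Sequence[int]) -> Tuple[int, int, int, int]:
    """Contingency counts of two binary label columns: (n11, n10, n01, n00)."""
    n11 = sum(1 for x, y in zip(a, b) if x and y)
    n10 = sum(1 for x, y in zip(a, b) if x and not y)
    n01 = sum(1 for x, y in zip(a, b) if not x and y)
    return n11, n10, n01, len(a) - n11 - n10 - n01


def jaccard(a, b) -> float:
    n11, n10, n01, _ = _counts(a, b)
    union = n11 + n10 + n01
    return n11 / union if union else 0.0


def rogers_tanimoto(a, b) -> float:
    # Matches count once, mismatches count twice.
    n11, n10, n01, n00 = _counts(a, b)
    return (n11 + n00) / (n11 + n00 + 2 * (n10 + n01))


def similarity_matrix(Y: Sequence[Sequence[int]], measure) -> List[List[float]]:
    """Label-by-label similarity matrix computed from the columns of Y."""
    cols = list(zip(*Y))  # one binary vector per label
    return [[measure(a, b) for b in cols] for a in cols]


def knn_graph(S: List[List[float]], k: int) -> Edges:
    """Keep, for every label, the edges to its k most similar other labels."""
    edges: Edges = {}
    for i, row in enumerate(S):
        neighbours = sorted((j for j in range(len(S)) if j != i), key=lambda j: -row[j])
        for j in neighbours[:k]:
            if row[j] > 0:
                edges[(min(i, j), max(i, j))] = row[j]
    return edges


def threshold_graph(S: List[List[float]], t: float) -> Edges:
    """Drop self-loops, zero-weight pairs and every edge whose weight is below t."""
    return {(i, j): S[i][j] for i, j in combinations(range(len(S)), 2)
            if S[i][j] > 0 and S[i][j] >= t}


Y = [[1, 1, 0], [1, 1, 0], [0, 1, 1], [0, 0, 1]]  # toy data: 4 instances, 3 labels
graphs = {}
for name, measure in (("jaccard", jaccard), ("rogers_tanimoto", rogers_tanimoto)):
    S = similarity_matrix(Y, measure)
    for k in (1, 2, 3):
        graphs[(name, f"knn{k}")] = knn_graph(S, k)
    for t in (0.0, 0.1):
        graphs[(name, f"threshold{t}")] = threshold_graph(S, t)
print(len(graphs))  # 10
```

Three kNN graphs plus two threshold graphs per measure reproduce the count of 5 graphs per measure and 10 in total.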
Complex Networks
Community Detection Methods
- Systematically encode interactions between data and find relationships among them;
- Capture label correlations and partition the labels;
- Community: a set of vertices with many edges inside it and few edges to the outside
- Hierarchical methods (dendrograms): several hybrid partitions for each graph
- Non-hierarchical methods: only one partition for each graph
- Overall, several candidate partitions are generated
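The sketch below illustrates why a hierarchical method yields several hybrid partitions: every level of its dendrogram is a partition. For brevity it uses plain average-linkage merging on the edge weights. This is a stand-in, not Edge Betweenness, WalkTrap or any other method used in the paper, and the function name is hypothetical.

```python
from typing import Dict, List, Tuple

Edges = Dict[Tuple[int, int], float]  # undirected weighted edges, keys (i, j) with i < j


def dendrogram_levels(n: int, edges: Edges) -> List[List[List[int]]]:
    """Agglomerate n vertices and return the partition found at every level.

    At each step the two clusters with the highest mean edge weight between
    them are merged (absent edges count as 0), until one cluster remains.
    """
    def w(i: int, j: int) -> float:
        return edges.get((min(i, j), max(i, j)), 0.0)

    clusters = [[v] for v in range(n)]
    levels = [[c[:] for c in clusters]]  # level 0: the local partition
    while len(clusters) > 1:
        best, pair = -1.0, (0, 1)
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                link = sum(w(i, j) for i in clusters[a] for j in clusters[b])
                link /= len(clusters[a]) * len(clusters[b])
                if link > best:
                    best, pair = link, (a, b)
        a, b = pair
        clusters[a] = sorted(clusters[a] + clusters[b])
        del clusters[b]  # b > a, so index a stays valid
        levels.append([c[:] for c in clusters])
    return levels  # last level: the global partition


levels = dendrogram_levels(4, {(0, 1): 0.9, (2, 3): 0.8, (1, 2): 0.1})
hybrid = [lvl for lvl in levels if 1 < len(lvl) < 4]
print(hybrid)  # [[[0, 1], [2], [3]], [[0, 1], [2, 3]]]
```

The first and last levels are the local and global partitions. The levels in between are the candidate hybrid partitions that later have to be validated.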
- Modularity as the criterion for choosing a community detection method
  - Measures how well the vertices are separated into communities
  - Quantifies the density of edges within communities compared with edges between communities
- Build the datasets corresponding to each chosen partition
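Modularity can be written in a per-community form for a weighted, undirected graph:

Q = sum over communities c of [ L_c / m - (d_c / (2m))^2 ]

Here m is the total edge weight, L_c the weight inside community c and d_c the summed vertex strength of c.

The sketch below computes Q and then picks, among the partitions of two methods, the one with the highest Q. The method names and the toy graph are hypothetical.

```python
from typing import Dict, List, Tuple

Edges = Dict[Tuple[int, int], float]  # undirected weighted edges without self-loops


def modularity(edges: Edges, partition: List[List[int]]) -> float:
    """Newman's modularity of a partition of a weighted, undirected graph."""
    m = sum(edges.values())
    if m == 0:
        return 0.0
    community = {v: ci for ci, c in enumerate(partition) for v in c}
    inside = [0.0] * len(partition)    # L_c: weight of edges inside community c
    strength = [0.0] * len(partition)  # d_c: summed strength of vertices in c
    for (i, j), wt in edges.items():
        strength[community[i]] += wt
        strength[community[j]] += wt
        if community[i] == community[j]:
            inside[community[i]] += wt
    return sum(L / m - (d / (2 * m)) ** 2 for L, d in zip(inside, strength))


# Two triangles joined by a single edge, all weights 1.
edges = {(0, 1): 1.0, (0, 2): 1.0, (1, 2): 1.0,
         (3, 4): 1.0, (3, 5): 1.0, (4, 5): 1.0, (2, 3): 1.0}
candidates = {"method_a": [[0, 1, 2], [3, 4, 5]], "method_b": [[0, 1], [2, 3], [4, 5]]}
best = max(candidates, key=lambda name: modularity(edges, candidates[name]))
print(best)  # method_a
```

The two-triangle split reaches Q = 5/14 (about 0.357), against about 0.082 for the three-pair split. It would therefore be the one kept.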
- Validate all hybrid partitions produced by the hierarchical methods
- The highest silhouette coefficient is the criterion for choosing a hybrid partition
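The slide does not say which distance the silhouette is computed on. This sketch assumes that the distance between two labels is 1 minus their similarity (for example, the Jaccard matrix). Singleton clusters score 0, following the usual convention. The function names and the toy similarity matrix are hypothetical.

```python
from typing import List, Sequence


def silhouette(D: Sequence[Sequence[float]], partition: List[List[int]]) -> float:
    """Mean silhouette coefficient of a label partition, given a distance matrix D."""
    if len(partition) < 2:
        raise ValueError("silhouette needs at least two clusters")
    scores = []
    for ci, cluster in enumerate(partition):
        for i in cluster:
            if len(cluster) == 1:
                scores.append(0.0)  # convention for singleton clusters
                continue
            # a: mean distance to the other members of the same cluster.
            a = sum(D[i][j] for j in cluster if j != i) / (len(cluster) - 1)
            # b: mean distance to the closest other cluster.
            b = min(sum(D[i][j] for j in other) / len(other)
                    for cj, other in enumerate(partition) if cj != ci)
            denom = max(a, b)
            scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)


def best_hybrid_partition(S, candidates):
    """Pick the candidate with the highest silhouette, using distance 1 - S."""
    D = [[1.0 - s for s in row] for row in S]
    return max(candidates, key=lambda p: silhouette(D, p))


S = [[1.0, 0.9, 0.1, 0.0],
     [0.9, 1.0, 0.2, 0.1],
     [0.1, 0.2, 1.0, 0.8],
     [0.0, 0.1, 0.8, 1.0]]
candidates = [[[0, 1], [2, 3]], [[0], [1, 2, 3]]]
print(best_hybrid_partition(S, candidates))  # [[0, 1], [2, 3]]
```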
- CLUS framework
  - Predictive Clustering Trees (PCTs)
  - Supports hierarchical multi-label classification
  - Binary and multi-label versions
- Same classifier for all types of partitions
- Compare partitions, not methods
- Investigate whether hybrid partitions improve on local and global partitions
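Because the same classifier is used for every type of partition, a partition only decides which labels are learned together. The sketch below builds one dataset per cluster (all features plus that cluster's labels) and trains one model per cluster. It then reassembles the full label vectors.

`fit` is a hypothetical stand-in for the learner and is not the CLUS API. With this scheme, a global partition yields one model and a local partition yields one model per label.

```python
from typing import Callable, List, Sequence, Tuple


def split_by_partition(X: Sequence[Sequence[float]], Y: Sequence[Sequence[int]],
                       partition: List[List[int]]) -> List[Tuple[list, list]]:
    """Build one dataset per cluster: every input feature plus only that cluster's labels."""
    return [([list(x) for x in X], [[y[lab] for lab in cluster] for y in Y])
            for cluster in partition]


def predict_with_partition(X_train, Y_train, X_test, partition: List[List[int]],
                           fit: Callable) -> List[List[int]]:
    """Train one model per cluster and reassemble full label vectors for X_test.

    `fit(X, Y_sub)` is hypothetical (CLUS PCTs in the paper); it must return a
    function that maps one input row to a list of 0/1 predictions for Y_sub's labels.
    """
    n_labels = len(Y_train[0])
    preds = [[0] * n_labels for _ in X_test]
    datasets = split_by_partition(X_train, Y_train, partition)
    for cluster, (Xc, Yc) in zip(partition, datasets):
        model = fit(Xc, Yc)
        for r, x in enumerate(X_test):
            for lab, v in zip(cluster, model(x)):
                preds[r][lab] = v
    return preds


def majority_fit(X, Y_sub):
    """Toy learner: always predicts the labels that are positive in at least half the rows."""
    votes = [int(2 * sum(col) >= len(col)) for col in zip(*Y_sub)]
    return lambda x: votes


X_train = [[0.0], [1.0], [2.0]]
Y_train = [[1, 0, 1], [1, 0, 0], [0, 1, 1]]
print(predict_with_partition(X_train, Y_train, [[5.0]], [[0, 2], [1]], majority_fit))
# [[1, 0, 1]]
```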
EXPERIMENTS
Datasets
• 10-fold cross-validation
• 20 datasets
• 5 domains: audio, music, biology, image and text
• Instances: from 194 to 10k
• Labels: from 4 to 178
Methods
Measures:
- MLP (missing label problem): the proportion of labels that are never predicted
- MACRO-F1: the average of the F1 scores computed individually for each label, so every label counts equally
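The sketch below computes the two measures from binary matrices of true and predicted labels. It assumes that a label with no true positives, false positives or false negatives gets F1 = 0; this zero-division convention is an assumption. The toy matrices are made up for illustration.

```python
from typing import Sequence


def macro_f1(Y_true: Sequence[Sequence[int]], Y_pred: Sequence[Sequence[int]]) -> float:
    """Unweighted mean over labels of F1 = 2TP / (2TP + FP + FN)."""
    q = len(Y_true[0])
    total = 0.0
    for lab in range(q):
        tp = sum(1 for t, p in zip(Y_true, Y_pred) if t[lab] and p[lab])
        fp = sum(1 for t, p in zip(Y_true, Y_pred) if not t[lab] and p[lab])
        fn = sum(1 for t, p in zip(Y_true, Y_pred) if t[lab] and not p[lab])
        denom = 2 * tp + fp + fn
        total += 2 * tp / denom if denom else 0.0
    return total / q


def missing_label_proportion(Y_pred: Sequence[Sequence[int]]) -> float:
    """MLP: fraction of labels never predicted as positive for any instance."""
    q = len(Y_pred[0])
    never = sum(1 for lab in range(q) if not any(p[lab] for p in Y_pred))
    return never / q


Y_true = [[1, 0, 1], [0, 1, 1], [1, 0, 0]]
Y_pred = [[1, 0, 1], [0, 0, 1], [1, 0, 1]]
print(round(macro_f1(Y_true, Y_pred), 3))          # 0.6
print(round(missing_label_proportion(Y_pred), 3))  # 0.333
```

Here label 1 is never predicted. Its F1 of 0 pulls Macro-F1 down, and it alone makes MLP equal to 1/3.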
RESULTS AND DISCUSSION
Community Detection Methods
[Table: community detection methods (C.D.M.) chosen for each setting (Jaccard Index, Rogers-Tanimoto, Random), each method type (hierarchical / non-hierarchical C.D.M.) and each sparsification (KNN / TR). Methods appearing in the table: Edge Betweenness, WalkTrap, Info Map.]
Best Chosen Hybrid Partition
Most frequently chosen hybrid partition overall: one with 2 clusters.
A hybrid partition with 2 clusters is close to the global partition, which consists of a single cluster.
This may be one reason why our results are competitive with the other partitions and better than the global partition, yet not superior to the local partition on some datasets.
Performance
RANDOM PARTITIONS
- Competitive with or better than local on some datasets;
- Better than global on most datasets;
HYBRID PARTITIONS
- Competitive with or better than local on some datasets;
- Better than global on most datasets;
HYBRID, RANDOM AND LOCAL
- Competitive with one another!
LOCAL PARTITIONS
- Best results
GLOBAL PARTITIONS
- Worst results
Performance values in general, across all datasets, partitions and measures
Performance
MACRO-F1
- Range: 0.0 to 1.0 (higher is better)
- Observed values were low
MLP
- Range: 0.0 to 1.0 (lower is better)
- Observed values were high, i.e., many labels were never predicted (high prediction error)
In general, for all datasets and partitions:
• HPML obtained hybrid partitions that can improve the classifier;
• When the labels are weakly correlated, random partitions performed better;
• Global and local approaches may not be learning the label correlations correctly;
• Our approach worked!
Statistical Tests
Friedman + Nemenyi
MACRO-F1
- No significant differences:
  - Lo – NHRa
  - Lo – H-HPML
  - G – H-Ra – NH – H-HPML
- Significant difference:
  - Lo – G
MLP
- No significant differences:
  - Lo – Random – H-HPML – NH
  - H-HPML – NH
- Significant difference:
  - Lo – G
Left side: best methods; right side: worst methods
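The sketch below shows the Friedman test followed by the Nemenyi post-hoc test, in the form described by Demšar (2006). Methods are ranked on each dataset and the ranks are averaged, with 1 as the best rank. Two methods differ significantly when their average ranks differ by more than the critical difference CD.

`q_alpha` must be taken from the Nemenyi table for the number of methods. The value 2.343 used here is the α = 0.05 entry for three methods. The scores are made-up toy numbers, not the paper's results.

```python
import math
from typing import List, Sequence


def rank_row(scores: Sequence[float], higher_is_better: bool) -> List[float]:
    """Ranks 1..k for one dataset (1 = best); tied scores share the average rank."""
    order = sorted(range(len(scores)), key=lambda j: scores[j], reverse=higher_is_better)
    ranks = [0.0] * len(scores)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and scores[order[end + 1]] == scores[order[pos]]:
            end += 1
        for t in range(pos, end + 1):
            ranks[order[t]] = (pos + end) / 2 + 1  # mean of 1-based positions pos..end
        pos = end + 1
    return ranks


def friedman_nemenyi(table: Sequence[Sequence[float]], higher_is_better: bool,
                     q_alpha: float):
    """table[d][j] is the score of method j on dataset d.

    Returns (average ranks, Friedman chi-square statistic, Nemenyi critical difference).
    """
    n, k = len(table), len(table[0])
    ranks = [rank_row(row, higher_is_better) for row in table]
    avg = [sum(r[j] for r in ranks) / n for j in range(k)]
    chi2 = 12 * n / (k * (k + 1)) * (sum(R * R for R in avg) - k * (k + 1) ** 2 / 4)
    cd = q_alpha * math.sqrt(k * (k + 1) / (6 * n))
    return avg, chi2, cd


# Toy Macro-F1 scores: 4 datasets (rows) x 3 methods (columns), higher is better.
table = [[0.30, 0.25, 0.20], [0.40, 0.35, 0.10], [0.28, 0.28, 0.15], [0.50, 0.45, 0.30]]
avg, chi2, cd = friedman_nemenyi(table, higher_is_better=True, q_alpha=2.343)
print(avg, round(chi2, 3), round(cd, 3))  # [1.125, 1.875, 3.0] 7.125 1.657
```

The Friedman statistic (7.125) is compared with the chi-square critical value for k - 1 = 2 degrees of freedom, which is 5.991 at α = 0.05. The Nemenyi step then separates only the first and third methods (3.0 - 1.125 > 1.657). For MLP, `higher_is_better=False` would be used, since lower values are better.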
CONCLUSION AND FUTURE WORK
• Hybrid partitions obtained better or competitive results on several datasets;
• The average performance remained competitive for most methods and datasets;
• Regardless of the partitioning used:
  o Apart from our competitive results, there is no large improvement;
  o Most labels were not learned by the classifier, even with the traditional approaches;
  o The classifier still has difficulty learning several labels and predicting them correctly;
• The local and global approaches still need improvement:
  o They may not learn label correlations correctly;
• Multi-label classification methods need to improve because:
  o Regardless of the partitioning used, and whether or not the correlations were explored, we cannot state with absolute certainty that they are learning the labels correctly;
• Still, it is better to use a partition composed of disjoint clusters of correlated labels, even a random partition, than a global partition;
Future work:
• Explore other multi-label evaluation measures;
• Use other classifiers and datasets.
https://sites.google.com/view/cissagatto
THANKS!
“Whoever goes up the stairs should start at the bottom. To be good at something you have to take it one step at a time” (Haruichi Furudate – Haikyuu!!)
