Semi-Supervised Fuzzy C-Means for Regression
We propose a method to perform regression on partially labeled data, which is based on SSFCM (Semi-Supervised Fuzzy C-Means), an algorithm for semi-supervised classification based on fuzzy clustering. The proposed method, called SSFCM-R, precedes the application of SSFCM with a relabeling module based on target discretization. After the application of SSFCM, regression is carried out according to one out of two possible schemes: (i) the output corresponds to the label of the closest cluster; (ii) the output is a linear combination of the cluster labels weighted by the membership degree of the input. Some experiments on synthetic data are reported to compare both approaches.
IJCCI 2023 — 15th International Joint Conference on Computational Intelligence, 13-15 November 2023, Rome, Italy
full paper: https://www.researchgate.net/publication/375671573_Semi-Supervised_Fuzzy_C-Means_for_Regression
5. Semi-Supervised Fuzzy C-Means
• Semi-supervised version of Fuzzy C-Means (FCM)
• Exploits partially labeled data to drive the clustering process
• Minimizes the following objective function:

  J = \sum_{k=1}^{K} \sum_{j=1}^{N} u_{jk}^m d_{jk}^2 + \alpha \sum_{k=1}^{K} \sum_{j=1}^{N} (u_{jk} - b_j f_{jk})^m d_{jk}^2

  (first term: unsupervised component; second term: supervised component)
• Outcomes: the membership matrix U = [u_{jk}] and a set of K centroids

  c_k = ( \sum_{j=1}^{N} u_{jk}^2 x_j ) / ( \sum_{j=1}^{N} u_{jk}^2 )
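The objective and the centroid update above can be evaluated directly; the following is a minimal NumPy sketch (variable and function names are ours, and the full membership-update rules of the algorithm are in Pedrycz and Waletzky, 1997):

```python
import numpy as np

def ssfcm_objective(X, U, centroids, F, b, alpha=1.0, m=2):
    """Evaluate the SSFCM objective J (sketch).

    X: (N, d) data; U: (N, K) memberships; centroids: (K, d);
    F: (N, K) reference memberships built from the known labels;
    b: (N,) indicator, 1 if sample j is labeled and 0 otherwise.
    """
    # Squared distances d_jk^2 between every sample and every centroid
    d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    unsupervised = (U ** m * d2).sum()
    supervised = ((U - b[:, None] * F) ** m * d2).sum()
    return unsupervised + alpha * supervised

def update_centroids(X, U):
    # c_k = sum_j u_jk^2 x_j / sum_j u_jk^2  (fuzzifier m = 2, as on the slide)
    W = U ** 2
    return (W.T @ X) / W.sum(axis=0)[:, None]
```

With no labeled samples (b = 0) and α = 1, the supervised term reduces to a copy of the unsupervised one, so J is simply twice the FCM objective.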
6. Semi-Supervised Fuzzy C-Means for Regression
• SSFCM-R: Semi-Supervised Fuzzy C-Means for Regression
• Regression algorithm based on the classification algorithm SSFCM (Pedrycz and Waletzky, 1997)
• Labeled prototypes
• Three main stages:
• Pre-processing: discretization and relabeling are applied to the target values
• Clustering: SSFCM
• Post-processing: a matching method using the derived labeled prototypes
7. SSFCM-R Pre-processing
• Let Y = {y ∈ 𝒴 | (x, y) ∈ L} be the set of numerical labels
• The set Y is discretized into C intervals [a_i, b_i], i = 1, 2, …, C
• For each interval, the subset Y_i = Y ∩ [a_i, b_i] is computed
• The average value ŷ_i of each Y_i is computed
• New labels: L̂ = {(x, ŷ) ∈ D̂ | y ≠ □}
• The number of intervals C is a hyperparameter
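As an illustration of this pre-processing step, a minimal sketch that assumes equal-width intervals (the function name is ours):

```python
import numpy as np

def relabel_equal_width(y_labeled, C):
    """Discretize labeled targets into C equal-width intervals and replace
    each target with the average value of its interval (sketch)."""
    edges = np.linspace(y_labeled.min(), y_labeled.max(), C + 1)
    # np.digitize returns bin indices 1..C; the right edge falls into C+1,
    # so clip it back into the last bin, then shift to 0-based indices
    bins = np.clip(np.digitize(y_labeled, edges), 1, C) - 1
    means = np.array([y_labeled[bins == i].mean() if np.any(bins == i)
                      else 0.5 * (edges[i] + edges[i + 1]) for i in range(C)])
    return means[bins], means  # relabeled targets ŷ and the C interval averages
```

For example, targets {0, 1, 2, 3} with C = 2 produce the intervals [0, 1.5] and [1.5, 3], whose averages 0.5 and 2.5 become the new discrete labels.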
8. SSFCM-R Pre-processing
• Three discretization strategies:
• D1: equal-width discretization, separating all possible values into C bins, each having the same width;
• D2: equal-frequency discretization, separating all possible values into C bins, each having the same number of observations;
• D3: the intervals are defined on the basis of the centroids produced by K-Means clustering
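The three strategies can be sketched as interval-edge builders (helper names are ours; for D3 we assume a simple one-dimensional Lloyd iteration with quantile initialization and midpoints between sorted centers as borders, which is one plausible reading of the slide):

```python
import numpy as np

def edges_equal_width(y, C):
    # D1: C bins of identical width over [min(y), max(y)]
    return np.linspace(y.min(), y.max(), C + 1)

def edges_equal_freq(y, C):
    # D2: C bins holding (roughly) the same number of observations
    return np.quantile(y, np.linspace(0, 1, C + 1))

def edges_kmeans(y, C, iters=50):
    # D3: borders derived from 1-D K-Means centroids; quantile
    # initialization keeps the sketch deterministic
    centers = np.quantile(y, np.linspace(0, 1, 2 * C + 1)[1::2])
    for _ in range(iters):
        assign = np.abs(y[:, None] - centers[None, :]).argmin(axis=1)
        for k in range(C):
            if np.any(assign == k):
                centers[k] = y[assign == k].mean()
    centers.sort()
    mids = (centers[:-1] + centers[1:]) / 2
    return np.concatenate(([y.min()], mids, [y.max()]))
```

On uniformly spread targets the three strategies coincide; they diverge when the target distribution is skewed or clustered, which is exactly what the experiments vary.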
9. SSFCM-R Post-processing
Given a new input x ∈ 𝒳, the estimated value y is computed according to one out of two possible strategies:
• max: the closest prototype c_k to x is determined, and the output y_max corresponds to its class label ŷ_{i_k}
• sum: the membership degrees of x to each cluster are determined by using SSFCM, and the estimated value corresponds to the weighted average

  y_sum = \sum_{k=1}^{K} u_k(x) ŷ_{i_k}
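A sketch of the two post-processing strategies, assuming standard FCM-style memberships with fuzzifier m = 2 (function names are ours):

```python
import numpy as np

def fcm_memberships(x, centroids, m=2):
    """FCM membership of x to each cluster: u_k = 1 / sum_s (d_k/d_s)^(2/(m-1))."""
    d2 = ((centroids - x) ** 2).sum(axis=1)
    if np.any(d2 == 0):            # x coincides with a centroid: crisp membership
        u = (d2 == 0).astype(float)
        return u / u.sum()
    inv = 1.0 / d2 ** (1.0 / (m - 1))
    return inv / inv.sum()

def predict_max(x, centroids, y_hat):
    # max: output the label of the closest prototype
    k = ((centroids - x) ** 2).sum(axis=1).argmin()
    return y_hat[k]

def predict_sum(x, centroids, y_hat, m=2):
    # sum: membership-weighted average of the prototype labels
    u = fcm_memberships(x, centroids, m)
    return float(u @ y_hat)
```

The max strategy yields a piecewise-constant output (one value per cluster), while sum interpolates between the cluster labels, which is why it can reach lower errors on continuous targets.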
11. Experimental settings
• Three synthetic datasets
• Eight labeling percentages
• Three bin sizes
• Three discretization strategies
• Two post-processing methods
• Evaluation metrics: MSE and TIME
15. Conclusions and future work
• SSFCM-R leverages a discretization mechanism to move from a continuous domain to a discrete one
• The influence of data complexity, discretization strategy, labeling percentage, and number of bins on the results has been studied
• The equal-width strategy has proven to be the most effective
• A small number of bins is preferable
• The post-processing method sum achieved lower errors than the max method
• Future work:
• Study different discretization strategies
• The effectiveness of the proposed approach will be evaluated on real-world applications
• It will be compared with other semi-supervised regression algorithms