SlideShare a Scribd company logo
Multi-task Distributed Learning using
Vision Transformer with
Random Patch Permutation
Sangjoon Parka, Jong Chul Yea,b
BISPL - BioImaging, Signal Processing, and Learning Lab.
aDept. of Bio and Brain Engineering
bKim Jaechul Graduate School of AI
KAIST, Korea
Background
• Artificial intelligence has been gaining unprecedent popularity,
including medical imaging.
• To enable the AI models to offer precise decision support with
robustness, an enormous amount of data are indispensable.
• However, data collected from volunteer participation of a few
institutions cannot fully meet the amount to guarantee robustness.
Background
• Especially for the newly emerging disease like COVID-19, the
limitation can be exacerbated as it is hard to build a large, well-
curated dataset promptly.
• The ability to collaborate between institutions is critical for the
successful application of AI in medical imaging, but the rigorous
regulations and the ethical restrictions is an another obstacle.
• United States Health Insurance Portability and Accountability Act (HIPAA)
• European General Data Protection Regulation (GDPR)
Background
• Accordingly, distributed learning methods, which perform
learning tasks at edge devices in a distributed fashion, can be
effectively utilized in healthcare research.
• Federated learning (FL), Split learning (SL), Swarm learning, etc..
• However, there are still many limitations with the existing
distributed learning methods, which hinder widespread adoption.
• FL: dependency on client-side computation, gradient inversion attack
• SL: high communication overhead, feature hijacking, slower convergence
Distributed learning methods
Park et al. NeurIPS 2021
Existing distributed learning methods
Vision Transformer (ViT)
We focused on the model configuration and intrinsic properties of Vision
Transformer (ViT), suitable for multi-task distributed learning.
Dosovitsky et al. ICLR 2021
Federated Split Task-Agnostic (FeSTA) learning
Park et al. NeurIPS 2021
Federated Split Task-Agnostic (FeSTA) learning
Park et al. NeurIPS 2021
Limitations
Problem with FeSTA learning
1. Huge communication overheads.
• Even larger than FL or SL.
2. Marginal improvement in performance with MTL.
3. No method for privacy-preservation.
• Privacy threatening the same as the FL or SL.
Purpose
Here, we aim to develop a new algorithm dubbed “Federated Split
Task-Agnostic learning with permutating pure ViT (p-FeSTA)”, which
alleviate these drawbacks.
To this end, we adopted the pure ViT architecture and patch
permutation module, which enforce model to benefit more from MTL
and to offer better privacy.
Permutation invariance of self-attention mechanism
=
Self-attention mechanism is essentially “permutation invariant”.
Permutation invariance of ViT
Naser et al. NeurIPS 2021
Different from CNN, ViT is essentially patch-permutation invariant.
We hypothesized that the property of permutation invariance can be utilized for privacy-
preservation.
Model configuration of the proposed p-FeSTA
Learning process of the proposed p-FeSTA
Datasets for multi-task learning
We simulated a multi-task learning with three tasks: classification, severity prediction
(regression), and segmentation with 6 clients.
Class
CNUH
(Test)
KNUH
(Client #1)
BIMCV
(Client #2)
Normal 417 400 93
Other
infection
58 400 -
COVID-
19
81 293 782
Total 556 1,093 875
Severity
CNUH
(Test)
YNU
(Client #3)
Brixia
(Client #4)
1 26 63 261
2 11 59 443
3 8 25 414
4 7 35 866
5 12 18 745
6 17 86 1,536
Total 81 286 4,265
Data
Subset 1
(Test)
Subset 2
(Client #5)
Subset 3
(Client #6)
Total 1,000 4,840 4,839
Classification data
Severity data
Segmentation data
Performance comparison
Methods
Classification
(AUC)
Severity
(MSE)
Segmentation
(Dice)
Average Normal Others COVID-19
Data centralized 0.671 (0.051) 0.735 (0.071) 0.777 (0.045) 0.500 (0.051) 1.592 (0.081) 0.793 (0.005)
Federated learning 0.601 (0.036) 0.597 (0.146) 0.483 (0.068) 0.722 (0.023) 2.159 (0.188) 0.789 (0.001)
Split learning 0.546 (0.024) 0.522 (0.067) 0.534 (0.050) 0.583 (0.013) 2.546 (0.414) 0.790 (0.000)
FeSTA (STL) 0.718 (0.047) 0.680 (0.088) 0.677 (0.032) 0.795 (0.036) 1.318 (0.125) 0.801 (0.011)
p-FeSTA (STL) 0.696 (0.022) 0.739 (0.093) 0.557 (0.118) 0.790 (0.045) 1.848 (0.080) 0.803 (0.004)
FeSTA (MTL) 0.780 (0.019) 0.785 (0.009) 0.793 (0.100) 0.761 (0.034) 1.416 (0.048) 0.796 (0.013)
p-FeSTA (MTL) 0.884 (0.008) 0.906 (0.004) 0.890 (0.011) 0.857 (0.014) 1.361 (0.057) 0.808 (0.003)
Values are presented in mean (standard deviation) of three repeats with different seed.
Significantly reduced communication costs
Communication overheads
Where R = total rounds, P = number of parameters, n = avg. rounds, B = batch size,
F = features, G = gradients, D = total number of data
Significantly reduced communication costs
Total Features/gradients Parameters
Classification
FL 10456.152M - 10456.152M
SL 9474.048M 9474.048M -
FeSTA 11390.423M 9474.048M 1916.375M
p-FeSTA 4880.648M (42.8%) 4844.890M 35.758M
Severity
FL 11090.794M - 11090.794M
SL 9474.048M 9474.048M -
FeSTA 12025.065M 9474.048M 2551.017M
p-FeSTA 5435.649M (45.2%) 4765.249M 670.401M
Segmentation
FL 11160.985M - 11160.985M
SL 9474.048M 9474.048M -
FeSTA 12095.256M 9474.048M 2621.208M
p-FeSTA 5899.113M (48.8%) 5158.520M 740.592M
Privacy preservation with permutation module
Privacy preservation with permutation module
Permutation module makes it “underdetermined problem”.
Ablation studies
Methods
Classification
(AUC)
Severity
(MSE)
Segmentation
(Dice)
Average Normal Others COVID-19
Proposed
0.884
(0.008)
0.906
(0.004)
0.890
(0.011)
0.857
(0.014)
1.361
(0.057)
0.808 (0.003)
w learnable head
0.890
(0.001)
0.909
(0.014)
0.895
(0.005)
0.866
(0.013)
1.545
(0.386)
0.789 (0.000)
w/o permutation
0.890
(0.010)
0.909
(0.002)
0.904
(0.023)
0.858
(0.008)
1.461
(0.064)
0.809 (0.002)
w/o positional encoding
0.827
(0.028)
0.831
(0.035)
0.786
(0.049)
0.862
(0.007)
1.942
(0.112)
0.798 (0.004)
Values are presented in mean (standard deviation) of three repeats with different seed.
Comparison of the distributed learning methods
FL SL FeSTA p-FeSTA
Model averaging O X O O
Client-side learning Parallel Sequential Parallel Parallel
Model split X O O O
Communication cost High High High Low
Benefit from MTL X X Small Large
Privacy preservation X X X O
Conclusion & Summary
• We proposed the novel p-FESTA framework with pure ViT, which
elicits the synergy of MTL as well as reduces the communication
overhead significantly compared to the existing methods.
• In addition, we also enhanced the privacy using the Permutation
module in a way specific to ViT.
• We believe that our work is a step toward facilitating distributed
learning among the institutions wanting to participate in different
tasks.
Thank you for attention!
Q & A

More Related Content

Similar to 20220517_KoSAIM_sjp_v1.2.pdf

FACE RECOGNITION USING ELM-LRF
FACE RECOGNITION USING ELM-LRFFACE RECOGNITION USING ELM-LRF
FACE RECOGNITION USING ELM-LRF
Aras Masood
 
Madhavi
MadhaviMadhavi
Discover How Scientific Data is Used for the Public Good with Natural Languag...
Discover How Scientific Data is Used for the Public Good with Natural Languag...Discover How Scientific Data is Used for the Public Good with Natural Languag...
Discover How Scientific Data is Used for the Public Good with Natural Languag...
BaoTramDuong2
 
Adaptive Classification of Imbalanced Data using ANN with Particle of Swarm O...
Adaptive Classification of Imbalanced Data using ANN with Particle of Swarm O...Adaptive Classification of Imbalanced Data using ANN with Particle of Swarm O...
Adaptive Classification of Imbalanced Data using ANN with Particle of Swarm O...
ijtsrd
 
An Efficient PSO Based Ensemble Classification Model on High Dimensional Data...
An Efficient PSO Based Ensemble Classification Model on High Dimensional Data...An Efficient PSO Based Ensemble Classification Model on High Dimensional Data...
An Efficient PSO Based Ensemble Classification Model on High Dimensional Data...
ijsc
 
AN EFFICIENT PSO BASED ENSEMBLE CLASSIFICATION MODEL ON HIGH DIMENSIONAL DATA...
AN EFFICIENT PSO BASED ENSEMBLE CLASSIFICATION MODEL ON HIGH DIMENSIONAL DATA...AN EFFICIENT PSO BASED ENSEMBLE CLASSIFICATION MODEL ON HIGH DIMENSIONAL DATA...
AN EFFICIENT PSO BASED ENSEMBLE CLASSIFICATION MODEL ON HIGH DIMENSIONAL DATA...
ijsc
 
Disease Identification and Detection in Apple Tree
Disease Identification and Detection in Apple TreeDisease Identification and Detection in Apple Tree
Disease Identification and Detection in Apple Tree
ijtsrd
 
Neural networks for the prediction and forecasting of water resources variables
Neural networks for the prediction and forecasting of water resources variablesNeural networks for the prediction and forecasting of water resources variables
Neural networks for the prediction and forecasting of water resources variables
Jonathan D'Cruz
 
SURVEY PAPER ON OUT LIER DETECTION USING FUZZY LOGIC BASED METHOD
SURVEY PAPER ON OUT LIER DETECTION USING FUZZY LOGIC BASED METHODSURVEY PAPER ON OUT LIER DETECTION USING FUZZY LOGIC BASED METHOD
SURVEY PAPER ON OUT LIER DETECTION USING FUZZY LOGIC BASED METHOD
IJCI JOURNAL
 
IRJET- A Privacy Leakage Upper Bound Constraint-Based Approach for Cost-E...
IRJET-  	  A Privacy Leakage Upper Bound Constraint-Based Approach for Cost-E...IRJET-  	  A Privacy Leakage Upper Bound Constraint-Based Approach for Cost-E...
IRJET- A Privacy Leakage Upper Bound Constraint-Based Approach for Cost-E...
IRJET Journal
 
SEGMENTATION OF THE GASTROINTESTINAL TRACT MRI USING DEEP LEARNING
SEGMENTATION OF THE GASTROINTESTINAL TRACT MRI USING DEEP LEARNINGSEGMENTATION OF THE GASTROINTESTINAL TRACT MRI USING DEEP LEARNING
SEGMENTATION OF THE GASTROINTESTINAL TRACT MRI USING DEEP LEARNING
gerogepatton
 
SEGMENTATION OF THE GASTROINTESTINAL TRACT MRI USING DEEP LEARNING
SEGMENTATION OF THE GASTROINTESTINAL TRACT MRI USING DEEP LEARNINGSEGMENTATION OF THE GASTROINTESTINAL TRACT MRI USING DEEP LEARNING
SEGMENTATION OF THE GASTROINTESTINAL TRACT MRI USING DEEP LEARNING
gerogepatton
 
SEGMENTATION OF THE GASTROINTESTINAL TRACT MRI USING DEEP LEARNING
SEGMENTATION OF THE GASTROINTESTINAL TRACT MRI USING DEEP LEARNINGSEGMENTATION OF THE GASTROINTESTINAL TRACT MRI USING DEEP LEARNING
SEGMENTATION OF THE GASTROINTESTINAL TRACT MRI USING DEEP LEARNING
ijaia
 
Ccids 2019 cutting edges of ai technology in medicine
Ccids 2019 cutting edges of ai technology in medicineCcids 2019 cutting edges of ai technology in medicine
Ccids 2019 cutting edges of ai technology in medicine
Namkug Kim
 
IRJET - Survey on Analysis of Breast Cancer Prediction
IRJET - Survey on Analysis of Breast Cancer PredictionIRJET - Survey on Analysis of Breast Cancer Prediction
IRJET - Survey on Analysis of Breast Cancer Prediction
IRJET Journal
 
A discriminative-feature-space-for-detecting-and-recognizing-pathologies-of-t...
A discriminative-feature-space-for-detecting-and-recognizing-pathologies-of-t...A discriminative-feature-space-for-detecting-and-recognizing-pathologies-of-t...
A discriminative-feature-space-for-detecting-and-recognizing-pathologies-of-t...
Damian R. Mingle, MBA
 
PREDICTION OF MALIGNANCY IN SUSPECTED THYROID TUMOUR PATIENTS BY THREE DIFFER...
PREDICTION OF MALIGNANCY IN SUSPECTED THYROID TUMOUR PATIENTS BY THREE DIFFER...PREDICTION OF MALIGNANCY IN SUSPECTED THYROID TUMOUR PATIENTS BY THREE DIFFER...
PREDICTION OF MALIGNANCY IN SUSPECTED THYROID TUMOUR PATIENTS BY THREE DIFFER...
cscpconf
 
[IJET-V2I3P21] Authors: Amit Kumar Dewangan, Akhilesh Kumar Shrivas, Prem Kumar
[IJET-V2I3P21] Authors: Amit Kumar Dewangan, Akhilesh Kumar Shrivas, Prem Kumar[IJET-V2I3P21] Authors: Amit Kumar Dewangan, Akhilesh Kumar Shrivas, Prem Kumar
[IJET-V2I3P21] Authors: Amit Kumar Dewangan, Akhilesh Kumar Shrivas, Prem Kumar
IJET - International Journal of Engineering and Techniques
 
A MACHINE LEARNING METHODOLOGY FOR DIAGNOSING CHRONIC KIDNEY DISEASE
A MACHINE LEARNING METHODOLOGY FOR DIAGNOSING CHRONIC KIDNEY DISEASEA MACHINE LEARNING METHODOLOGY FOR DIAGNOSING CHRONIC KIDNEY DISEASE
A MACHINE LEARNING METHODOLOGY FOR DIAGNOSING CHRONIC KIDNEY DISEASE
IRJET Journal
 
Robust Breast Cancer Diagnosis on Four Different Datasets Using Multi-Classif...
Robust Breast Cancer Diagnosis on Four Different Datasets Using Multi-Classif...Robust Breast Cancer Diagnosis on Four Different Datasets Using Multi-Classif...
Robust Breast Cancer Diagnosis on Four Different Datasets Using Multi-Classif...
ahmad abdelhafeez
 

Similar to 20220517_KoSAIM_sjp_v1.2.pdf (20)

FACE RECOGNITION USING ELM-LRF
FACE RECOGNITION USING ELM-LRFFACE RECOGNITION USING ELM-LRF
FACE RECOGNITION USING ELM-LRF
 
Madhavi
MadhaviMadhavi
Madhavi
 
Discover How Scientific Data is Used for the Public Good with Natural Languag...
Discover How Scientific Data is Used for the Public Good with Natural Languag...Discover How Scientific Data is Used for the Public Good with Natural Languag...
Discover How Scientific Data is Used for the Public Good with Natural Languag...
 
Adaptive Classification of Imbalanced Data using ANN with Particle of Swarm O...
Adaptive Classification of Imbalanced Data using ANN with Particle of Swarm O...Adaptive Classification of Imbalanced Data using ANN with Particle of Swarm O...
Adaptive Classification of Imbalanced Data using ANN with Particle of Swarm O...
 
An Efficient PSO Based Ensemble Classification Model on High Dimensional Data...
An Efficient PSO Based Ensemble Classification Model on High Dimensional Data...An Efficient PSO Based Ensemble Classification Model on High Dimensional Data...
An Efficient PSO Based Ensemble Classification Model on High Dimensional Data...
 
AN EFFICIENT PSO BASED ENSEMBLE CLASSIFICATION MODEL ON HIGH DIMENSIONAL DATA...
AN EFFICIENT PSO BASED ENSEMBLE CLASSIFICATION MODEL ON HIGH DIMENSIONAL DATA...AN EFFICIENT PSO BASED ENSEMBLE CLASSIFICATION MODEL ON HIGH DIMENSIONAL DATA...
AN EFFICIENT PSO BASED ENSEMBLE CLASSIFICATION MODEL ON HIGH DIMENSIONAL DATA...
 
Disease Identification and Detection in Apple Tree
Disease Identification and Detection in Apple TreeDisease Identification and Detection in Apple Tree
Disease Identification and Detection in Apple Tree
 
Neural networks for the prediction and forecasting of water resources variables
Neural networks for the prediction and forecasting of water resources variablesNeural networks for the prediction and forecasting of water resources variables
Neural networks for the prediction and forecasting of water resources variables
 
SURVEY PAPER ON OUT LIER DETECTION USING FUZZY LOGIC BASED METHOD
SURVEY PAPER ON OUT LIER DETECTION USING FUZZY LOGIC BASED METHODSURVEY PAPER ON OUT LIER DETECTION USING FUZZY LOGIC BASED METHOD
SURVEY PAPER ON OUT LIER DETECTION USING FUZZY LOGIC BASED METHOD
 
IRJET- A Privacy Leakage Upper Bound Constraint-Based Approach for Cost-E...
IRJET-  	  A Privacy Leakage Upper Bound Constraint-Based Approach for Cost-E...IRJET-  	  A Privacy Leakage Upper Bound Constraint-Based Approach for Cost-E...
IRJET- A Privacy Leakage Upper Bound Constraint-Based Approach for Cost-E...
 
SEGMENTATION OF THE GASTROINTESTINAL TRACT MRI USING DEEP LEARNING
SEGMENTATION OF THE GASTROINTESTINAL TRACT MRI USING DEEP LEARNINGSEGMENTATION OF THE GASTROINTESTINAL TRACT MRI USING DEEP LEARNING
SEGMENTATION OF THE GASTROINTESTINAL TRACT MRI USING DEEP LEARNING
 
SEGMENTATION OF THE GASTROINTESTINAL TRACT MRI USING DEEP LEARNING
SEGMENTATION OF THE GASTROINTESTINAL TRACT MRI USING DEEP LEARNINGSEGMENTATION OF THE GASTROINTESTINAL TRACT MRI USING DEEP LEARNING
SEGMENTATION OF THE GASTROINTESTINAL TRACT MRI USING DEEP LEARNING
 
SEGMENTATION OF THE GASTROINTESTINAL TRACT MRI USING DEEP LEARNING
SEGMENTATION OF THE GASTROINTESTINAL TRACT MRI USING DEEP LEARNINGSEGMENTATION OF THE GASTROINTESTINAL TRACT MRI USING DEEP LEARNING
SEGMENTATION OF THE GASTROINTESTINAL TRACT MRI USING DEEP LEARNING
 
Ccids 2019 cutting edges of ai technology in medicine
Ccids 2019 cutting edges of ai technology in medicineCcids 2019 cutting edges of ai technology in medicine
Ccids 2019 cutting edges of ai technology in medicine
 
IRJET - Survey on Analysis of Breast Cancer Prediction
IRJET - Survey on Analysis of Breast Cancer PredictionIRJET - Survey on Analysis of Breast Cancer Prediction
IRJET - Survey on Analysis of Breast Cancer Prediction
 
A discriminative-feature-space-for-detecting-and-recognizing-pathologies-of-t...
A discriminative-feature-space-for-detecting-and-recognizing-pathologies-of-t...A discriminative-feature-space-for-detecting-and-recognizing-pathologies-of-t...
A discriminative-feature-space-for-detecting-and-recognizing-pathologies-of-t...
 
PREDICTION OF MALIGNANCY IN SUSPECTED THYROID TUMOUR PATIENTS BY THREE DIFFER...
PREDICTION OF MALIGNANCY IN SUSPECTED THYROID TUMOUR PATIENTS BY THREE DIFFER...PREDICTION OF MALIGNANCY IN SUSPECTED THYROID TUMOUR PATIENTS BY THREE DIFFER...
PREDICTION OF MALIGNANCY IN SUSPECTED THYROID TUMOUR PATIENTS BY THREE DIFFER...
 
[IJET-V2I3P21] Authors: Amit Kumar Dewangan, Akhilesh Kumar Shrivas, Prem Kumar
[IJET-V2I3P21] Authors: Amit Kumar Dewangan, Akhilesh Kumar Shrivas, Prem Kumar[IJET-V2I3P21] Authors: Amit Kumar Dewangan, Akhilesh Kumar Shrivas, Prem Kumar
[IJET-V2I3P21] Authors: Amit Kumar Dewangan, Akhilesh Kumar Shrivas, Prem Kumar
 
A MACHINE LEARNING METHODOLOGY FOR DIAGNOSING CHRONIC KIDNEY DISEASE
A MACHINE LEARNING METHODOLOGY FOR DIAGNOSING CHRONIC KIDNEY DISEASEA MACHINE LEARNING METHODOLOGY FOR DIAGNOSING CHRONIC KIDNEY DISEASE
A MACHINE LEARNING METHODOLOGY FOR DIAGNOSING CHRONIC KIDNEY DISEASE
 
Robust Breast Cancer Diagnosis on Four Different Datasets Using Multi-Classif...
Robust Breast Cancer Diagnosis on Four Different Datasets Using Multi-Classif...Robust Breast Cancer Diagnosis on Four Different Datasets Using Multi-Classif...
Robust Breast Cancer Diagnosis on Four Different Datasets Using Multi-Classif...
 

Recently uploaded

Mutation Testing for Task-Oriented Chatbots
Mutation Testing for Task-Oriented ChatbotsMutation Testing for Task-Oriented Chatbots
Mutation Testing for Task-Oriented Chatbots
Pablo Gómez Abajo
 
Digital Banking in the Cloud: How Citizens Bank Unlocked Their Mainframe
Digital Banking in the Cloud: How Citizens Bank Unlocked Their MainframeDigital Banking in the Cloud: How Citizens Bank Unlocked Their Mainframe
Digital Banking in the Cloud: How Citizens Bank Unlocked Their Mainframe
Precisely
 
What is an RPA CoE? Session 1 – CoE Vision
What is an RPA CoE?  Session 1 – CoE VisionWhat is an RPA CoE?  Session 1 – CoE Vision
What is an RPA CoE? Session 1 – CoE Vision
DianaGray10
 
Skybuffer SAM4U tool for SAP license adoption
Skybuffer SAM4U tool for SAP license adoptionSkybuffer SAM4U tool for SAP license adoption
Skybuffer SAM4U tool for SAP license adoption
Tatiana Kojar
 
GNSS spoofing via SDR (Criptored Talks 2024)
GNSS spoofing via SDR (Criptored Talks 2024)GNSS spoofing via SDR (Criptored Talks 2024)
GNSS spoofing via SDR (Criptored Talks 2024)
Javier Junquera
 
Programming Foundation Models with DSPy - Meetup Slides
Programming Foundation Models with DSPy - Meetup SlidesProgramming Foundation Models with DSPy - Meetup Slides
Programming Foundation Models with DSPy - Meetup Slides
Zilliz
 
Introduction of Cybersecurity with OSS at Code Europe 2024
Introduction of Cybersecurity with OSS  at Code Europe 2024Introduction of Cybersecurity with OSS  at Code Europe 2024
Introduction of Cybersecurity with OSS at Code Europe 2024
Hiroshi SHIBATA
 
Leveraging the Graph for Clinical Trials and Standards
Leveraging the Graph for Clinical Trials and StandardsLeveraging the Graph for Clinical Trials and Standards
Leveraging the Graph for Clinical Trials and Standards
Neo4j
 
Dandelion Hashtable: beyond billion requests per second on a commodity server
Dandelion Hashtable: beyond billion requests per second on a commodity serverDandelion Hashtable: beyond billion requests per second on a commodity server
Dandelion Hashtable: beyond billion requests per second on a commodity server
Antonios Katsarakis
 
Monitoring and Managing Anomaly Detection on OpenShift.pdf
Monitoring and Managing Anomaly Detection on OpenShift.pdfMonitoring and Managing Anomaly Detection on OpenShift.pdf
Monitoring and Managing Anomaly Detection on OpenShift.pdf
Tosin Akinosho
 
JavaLand 2024: Application Development Green Masterplan
JavaLand 2024: Application Development Green MasterplanJavaLand 2024: Application Development Green Masterplan
JavaLand 2024: Application Development Green Masterplan
Miro Wengner
 
Biomedical Knowledge Graphs for Data Scientists and Bioinformaticians
Biomedical Knowledge Graphs for Data Scientists and BioinformaticiansBiomedical Knowledge Graphs for Data Scientists and Bioinformaticians
Biomedical Knowledge Graphs for Data Scientists and Bioinformaticians
Neo4j
 
Main news related to the CCS TSI 2023 (2023/1695)
Main news related to the CCS TSI 2023 (2023/1695)Main news related to the CCS TSI 2023 (2023/1695)
Main news related to the CCS TSI 2023 (2023/1695)
Jakub Marek
 
zkStudyClub - LatticeFold: A Lattice-based Folding Scheme and its Application...
zkStudyClub - LatticeFold: A Lattice-based Folding Scheme and its Application...zkStudyClub - LatticeFold: A Lattice-based Folding Scheme and its Application...
zkStudyClub - LatticeFold: A Lattice-based Folding Scheme and its Application...
Alex Pruden
 
Freshworks Rethinks NoSQL for Rapid Scaling & Cost-Efficiency
Freshworks Rethinks NoSQL for Rapid Scaling & Cost-EfficiencyFreshworks Rethinks NoSQL for Rapid Scaling & Cost-Efficiency
Freshworks Rethinks NoSQL for Rapid Scaling & Cost-Efficiency
ScyllaDB
 
"Frontline Battles with DDoS: Best practices and Lessons Learned", Igor Ivaniuk
"Frontline Battles with DDoS: Best practices and Lessons Learned",  Igor Ivaniuk"Frontline Battles with DDoS: Best practices and Lessons Learned",  Igor Ivaniuk
"Frontline Battles with DDoS: Best practices and Lessons Learned", Igor Ivaniuk
Fwdays
 
Y-Combinator seed pitch deck template PP
Y-Combinator seed pitch deck template PPY-Combinator seed pitch deck template PP
Y-Combinator seed pitch deck template PP
c5vrf27qcz
 
Northern Engraving | Nameplate Manufacturing Process - 2024
Northern Engraving | Nameplate Manufacturing Process - 2024Northern Engraving | Nameplate Manufacturing Process - 2024
Northern Engraving | Nameplate Manufacturing Process - 2024
Northern Engraving
 
Connector Corner: Seamlessly power UiPath Apps, GenAI with prebuilt connectors
Connector Corner: Seamlessly power UiPath Apps, GenAI with prebuilt connectorsConnector Corner: Seamlessly power UiPath Apps, GenAI with prebuilt connectors
Connector Corner: Seamlessly power UiPath Apps, GenAI with prebuilt connectors
DianaGray10
 
Your One-Stop Shop for Python Success: Top 10 US Python Development Providers
Your One-Stop Shop for Python Success: Top 10 US Python Development ProvidersYour One-Stop Shop for Python Success: Top 10 US Python Development Providers
Your One-Stop Shop for Python Success: Top 10 US Python Development Providers
akankshawande
 

Recently uploaded (20)

Mutation Testing for Task-Oriented Chatbots
Mutation Testing for Task-Oriented ChatbotsMutation Testing for Task-Oriented Chatbots
Mutation Testing for Task-Oriented Chatbots
 
Digital Banking in the Cloud: How Citizens Bank Unlocked Their Mainframe
Digital Banking in the Cloud: How Citizens Bank Unlocked Their MainframeDigital Banking in the Cloud: How Citizens Bank Unlocked Their Mainframe
Digital Banking in the Cloud: How Citizens Bank Unlocked Their Mainframe
 
What is an RPA CoE? Session 1 – CoE Vision
What is an RPA CoE?  Session 1 – CoE VisionWhat is an RPA CoE?  Session 1 – CoE Vision
What is an RPA CoE? Session 1 – CoE Vision
 
Skybuffer SAM4U tool for SAP license adoption
Skybuffer SAM4U tool for SAP license adoptionSkybuffer SAM4U tool for SAP license adoption
Skybuffer SAM4U tool for SAP license adoption
 
GNSS spoofing via SDR (Criptored Talks 2024)
GNSS spoofing via SDR (Criptored Talks 2024)GNSS spoofing via SDR (Criptored Talks 2024)
GNSS spoofing via SDR (Criptored Talks 2024)
 
Programming Foundation Models with DSPy - Meetup Slides
Programming Foundation Models with DSPy - Meetup SlidesProgramming Foundation Models with DSPy - Meetup Slides
Programming Foundation Models with DSPy - Meetup Slides
 
Introduction of Cybersecurity with OSS at Code Europe 2024
Introduction of Cybersecurity with OSS  at Code Europe 2024Introduction of Cybersecurity with OSS  at Code Europe 2024
Introduction of Cybersecurity with OSS at Code Europe 2024
 
Leveraging the Graph for Clinical Trials and Standards
Leveraging the Graph for Clinical Trials and StandardsLeveraging the Graph for Clinical Trials and Standards
Leveraging the Graph for Clinical Trials and Standards
 
Dandelion Hashtable: beyond billion requests per second on a commodity server
Dandelion Hashtable: beyond billion requests per second on a commodity serverDandelion Hashtable: beyond billion requests per second on a commodity server
Dandelion Hashtable: beyond billion requests per second on a commodity server
 
Monitoring and Managing Anomaly Detection on OpenShift.pdf
Monitoring and Managing Anomaly Detection on OpenShift.pdfMonitoring and Managing Anomaly Detection on OpenShift.pdf
Monitoring and Managing Anomaly Detection on OpenShift.pdf
 
JavaLand 2024: Application Development Green Masterplan
JavaLand 2024: Application Development Green MasterplanJavaLand 2024: Application Development Green Masterplan
JavaLand 2024: Application Development Green Masterplan
 
Biomedical Knowledge Graphs for Data Scientists and Bioinformaticians
Biomedical Knowledge Graphs for Data Scientists and BioinformaticiansBiomedical Knowledge Graphs for Data Scientists and Bioinformaticians
Biomedical Knowledge Graphs for Data Scientists and Bioinformaticians
 
Main news related to the CCS TSI 2023 (2023/1695)
Main news related to the CCS TSI 2023 (2023/1695)Main news related to the CCS TSI 2023 (2023/1695)
Main news related to the CCS TSI 2023 (2023/1695)
 
zkStudyClub - LatticeFold: A Lattice-based Folding Scheme and its Application...
zkStudyClub - LatticeFold: A Lattice-based Folding Scheme and its Application...zkStudyClub - LatticeFold: A Lattice-based Folding Scheme and its Application...
zkStudyClub - LatticeFold: A Lattice-based Folding Scheme and its Application...
 
Freshworks Rethinks NoSQL for Rapid Scaling & Cost-Efficiency
Freshworks Rethinks NoSQL for Rapid Scaling & Cost-EfficiencyFreshworks Rethinks NoSQL for Rapid Scaling & Cost-Efficiency
Freshworks Rethinks NoSQL for Rapid Scaling & Cost-Efficiency
 
"Frontline Battles with DDoS: Best practices and Lessons Learned", Igor Ivaniuk
"Frontline Battles with DDoS: Best practices and Lessons Learned",  Igor Ivaniuk"Frontline Battles with DDoS: Best practices and Lessons Learned",  Igor Ivaniuk
"Frontline Battles with DDoS: Best practices and Lessons Learned", Igor Ivaniuk
 
Y-Combinator seed pitch deck template PP
Y-Combinator seed pitch deck template PPY-Combinator seed pitch deck template PP
Y-Combinator seed pitch deck template PP
 
Northern Engraving | Nameplate Manufacturing Process - 2024
Northern Engraving | Nameplate Manufacturing Process - 2024Northern Engraving | Nameplate Manufacturing Process - 2024
Northern Engraving | Nameplate Manufacturing Process - 2024
 
Connector Corner: Seamlessly power UiPath Apps, GenAI with prebuilt connectors
Connector Corner: Seamlessly power UiPath Apps, GenAI with prebuilt connectorsConnector Corner: Seamlessly power UiPath Apps, GenAI with prebuilt connectors
Connector Corner: Seamlessly power UiPath Apps, GenAI with prebuilt connectors
 
Your One-Stop Shop for Python Success: Top 10 US Python Development Providers
Your One-Stop Shop for Python Success: Top 10 US Python Development ProvidersYour One-Stop Shop for Python Success: Top 10 US Python Development Providers
Your One-Stop Shop for Python Success: Top 10 US Python Development Providers
 

20220517_KoSAIM_sjp_v1.2.pdf

  • 1. Multi-task Distributed Learning using Vision Transformer with Random Patch Permutation Sangjoon Parka, Jong Chul Yea,b BISPL - BioImaging, Signal Processing, and Learning Lab. aDept. of Bio and Brain Engineering bKim Jaechul Graduate School of AI KAIST, Korea
  • 2. Background • Artificial intelligence has been gaining unprecedent popularity, including medical imaging. • To enable the AI models to offer precise decision support with robustness, an enormous amount of data are indispensable. • However, data collected from volunteer participation of a few institutions cannot fully meet the amount to guarantee robustness.
  • 3. Background • Especially for the newly emerging disease like COVID-19, the limitation can be exacerbated as it is hard to build a large, well- curated dataset promptly. • The ability to collaborate between institutions is critical for the successful application of AI in medical imaging, but the rigorous regulations and the ethical restrictions is an another obstacle. • United States Health Insurance Portability and Accountability Act (HIPAA) • European General Data Protection Regulation (GDPR)
  • 4. Background • Accordingly, distributed learning methods, which perform learning tasks at edge devices in a distributed fashion, can be effectively utilized in healthcare research. • Federated learning (FL), Split learning (SL), Swarm learning, etc.. • However, there are still many limitations with the existing distributed learning methods, which hinder widespread adoption. • FL: dependency on client-side computation, gradient inversion attack • SL: high communication overhead, feature hijacking, slower convergence
  • 5. Distributed learning methods Park et al. NeurIPS 2021 Existing distributed learning methods
  • 6. Vision Transformer (ViT) We focused on the model configuration and intrinsic properties of Vision Transformer (ViT), suitable for multi-task distributed learning. Dosovitsky et al. ICLR 2021
  • 7. Federated Split Task-Agnostic (FeSTA) learning Park et al. NeurIPS 2021
  • 8. Federated Split Task-Agnostic (FeSTA) learning Park et al. NeurIPS 2021
  • 9. Limitations Problem with FeSTA learning 1. Huge communication overheads. • Even larger than FL or SL. 2. Marginal improvement in performance with MTL. 3. No method for privacy-preservation. • Privacy threatening the same as the FL or SL.
  • 10. Purpose Here, we aim to develop a new algorithm dubbed “Federated Split Task-Agnostic learning with permutating pure ViT (p-FeSTA)”, which alleviate these drawbacks. To this end, we adopted the pure ViT architecture and patch permutation module, which enforce model to benefit more from MTL and to offer better privacy.
  • 11. Permutation invariance of self-attention mechanism = Self-attention mechanism is essentially “permutation invariant”.
  • 12. Permutation invariance of ViT Naser et al. NeurIPS 2021 Different from CNN, ViT is essentially patch-permutation invariant. We hypothesized that the property of permutation invariance can be utilized for privacy- preservation.
  • 13. Model configuration of the proposed p-FeSTA
  • 14. Learning process of the proposed p-FeSTA
  • 15. Datasets for multi-task learning We simulated a multi-task learning with three tasks: classification, severity prediction (regression), and segmentation with 6 clients. Class CNUH (Test) KNUH (Client #1) BIMCV (Client #2) Normal 417 400 93 Other infection 58 400 - COVID- 19 81 293 782 Total 556 1,093 875 Severity CNUH (Test) YNU (Client #3) Brixia (Client #4) 1 26 63 261 2 11 59 443 3 8 25 414 4 7 35 866 5 12 18 745 6 17 86 1,536 Total 81 286 4,265 Data Subset 1 (Test) Subset 2 (Client #5) Subset 3 (Client #6) Total 1,000 4,840 4,839 Classification data Severity data Segmentation data
  • 16. Performance comparison Methods Classification (AUC) Severity (MSE) Segmentation (Dice) Average Normal Others COVID-19 Data centralized 0.671 (0.051) 0.735 (0.071) 0.777 (0.045) 0.500 (0.051) 1.592 (0.081) 0.793 (0.005) Federated learning 0.601 (0.036) 0.597 (0.146) 0.483 (0.068) 0.722 (0.023) 2.159 (0.188) 0.789 (0.001) Split learning 0.546 (0.024) 0.522 (0.067) 0.534 (0.050) 0.583 (0.013) 2.546 (0.414) 0.790 (0.000) FeSTA (STL) 0.718 (0.047) 0.680 (0.088) 0.677 (0.032) 0.795 (0.036) 1.318 (0.125) 0.801 (0.011) p-FeSTA (STL) 0.696 (0.022) 0.739 (0.093) 0.557 (0.118) 0.790 (0.045) 1.848 (0.080) 0.803 (0.004) FeSTA (MTL) 0.780 (0.019) 0.785 (0.009) 0.793 (0.100) 0.761 (0.034) 1.416 (0.048) 0.796 (0.013) p-FeSTA (MTL) 0.884 (0.008) 0.906 (0.004) 0.890 (0.011) 0.857 (0.014) 1.361 (0.057) 0.808 (0.003) Values are presented in mean (standard deviation) of three repeats with different seed.
  • 17. Significantly reduced communication costs Communication overheads Where R = total rounds, P = number of parameters, n = avg. rounds, B = batch size, F = features, G = gradients, D = total number of data
  • 18. Significantly reduced communication costs Total Features/gradients Parameters Classification FL 10456.152M - 10456.152M SL 9474.048M 9474.048M - FeSTA 11390.423M 9474.048M 1916.375M p-FeSTA 4880.648M (42.8%) 4844.890M 35.758M Severity FL 11090.794M - 11090.794M SL 9474.048M 9474.048M - FeSTA 12025.065M 9474.048M 2551.017M p-FeSTA 5435.649M (45.2%) 4765.249M 670.401M Segmentation FL 11160.985M - 11160.985M SL 9474.048M 9474.048M - FeSTA 12095.256M 9474.048M 2621.208M p-FeSTA 5899.113M (48.8%) 5158.520M 740.592M
  • 19. Privacy preservation with permutation module
  • 20. Privacy preservation with permutation module Permutation module makes it “underdetermined problem”.
  • 21. Ablation studies Methods Classification (AUC) Severity (MSE) Segmentation (Dice) Average Normal Others COVID-19 Proposed 0.884 (0.008) 0.906 (0.004) 0.890 (0.011) 0.857 (0.014) 1.361 (0.057) 0.808 (0.003) w learnable head 0.890 (0.001) 0.909 (0.014) 0.895 (0.005) 0.866 (0.013) 1.545 (0.386) 0.789 (0.000) w/o permutation 0.890 (0.010) 0.909 (0.002) 0.904 (0.023) 0.858 (0.008) 1.461 (0.064) 0.809 (0.002) w/o positional encoding 0.827 (0.028) 0.831 (0.035) 0.786 (0.049) 0.862 (0.007) 1.942 (0.112) 0.798 (0.004) Values are presented in mean (standard deviation) of three repeats with different seed.
  • 22. Comparison of the distributed learning methods FL SL FeSTA p-FeSTA Model averaging O X O O Client-side learning Parallel Sequential Parallel Parallel Model split X O O O Communication cost High High High Low Benefit from MTL X X Small Large Privacy preservation X X X O
  • 23. Conclusion & Summary • We proposed the novel p-FESTA framework with pure ViT, which elicits the synergy of MTL as well as reduces the communication overhead significantly compared to the existing methods. • In addition, we also enhanced the privacy using the Permutation module in a way specific to ViT. • We believe that our work is a step toward facilitating distributed learning among the institutions wanting to participate in different tasks.
  • 24. Thank you for attention!
  • 25. Q & A