Transformers in Medical Imaging
DK
ViT
The key advantage of ViT: Long-range modeling by Multi-head
Self-Attention.
Why is long-range modeling useful for
medical imaging?
Paper 1: Bilateral-ViT for Robust Fovea
Localization (ISBI22 Best Paper Finalist)
The fovea is a key anatomical location in the retina. Visual
acuity is highest in the fovea region.
Challenges of robust fovea localization
• The fovea normally appears as a darker spot, but its local appearance can
be complicated by retinal diseases.
• The shape of the blood vessels provides very useful global (long-range)
image structure information for fovea localization.
Methods
• A transformer-based network that takes
in both the retina image and a vessel
segmentation mask for robust fovea
localization.
• The overall architecture is a U2-Net
structure with customizations.
• We formulate fovea localization
as a segmentation problem; the
loss function is Dice loss + cross
entropy.
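A minimal sketch of a combined Dice + cross-entropy loss, assuming a binary fovea mask and per-pixel foreground probabilities. The slides do not give the weighting between the two terms or the smoothing constant, so the equal-weight sum and the `smooth` and `eps` values are assumptions.

```python
import math

def dice_loss(probs, targets, smooth=1.0):
    """Soft Dice loss over flattened per-pixel foreground probabilities."""
    intersection = sum(p * t for p, t in zip(probs, targets))
    denom = sum(probs) + sum(targets)
    return 1.0 - (2.0 * intersection + smooth) / (denom + smooth)

def bce_loss(probs, targets, eps=1e-7):
    """Mean binary cross-entropy; eps keeps log() away from zero."""
    total = 0.0
    for p, t in zip(probs, targets):
        p = min(max(p, eps), 1.0 - eps)
        total += -(t * math.log(p) + (1 - t) * math.log(1.0 - p))
    return total / len(probs)

def segmentation_loss(probs, targets):
    """Dice + cross-entropy, summed with equal weight (an assumption)."""
    return dice_loss(probs, targets) + bce_loss(probs, targets)

# Toy 4-pixel mask: a prediction close to the target gets the lower loss.
targets = [0, 0, 1, 1]
good = [0.1, 0.2, 0.9, 0.8]
bad = [0.9, 0.8, 0.1, 0.2]
print(segmentation_loss(good, targets) < segmentation_loss(bad, targets))
```

The Dice term measures region overlap and is less sensitive to the heavy class imbalance of a small fovea region, while the cross-entropy term supplies a per-pixel gradient.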
Methods – Main Branch and Vessel Branch
• The main branch takes in
the retina image and
encodes features using
both CNN feature blocks
and a transformer block.
• The vessel branch takes in
the vessel segmentation
mask at different scales
(sizes) and performs
feature encoding.
Methods – Fusion Branch
• The fusion branch merges the
image features and vessel
features in a multi-scale
manner.
• The fusion branch and
vessel branch feature
blocks are also U-Net-like.
• Hence the overall
network is a nested U-Net
(U2-Net).
Experiments
• Performs much better than U2-Net (pure CNN) or
TransUNet, especially on diseased images.
• Our network = TransUNet + U2-Net + customized vessel fusion =
good performance
Experiments
• Good cross-dataset performance, again implying robust fovea
localization
Paper 2 - RTNet: Relation Transformer Network for
Diabetic Retinopathy Multi-lesion Segmentation
(TMI 2022)
Lesion segmentation in retina images
by considering the interaction among
different lesions, and interaction
between lesions and blood vessels.
Methods
• The input is a retina image. The outputs are multi-class lesion segmentation masks and a vessel
mask.
• The vessel mask comes from an auxiliary branch that is used only in training to provide vessel supervisory signals.
During training, the pseudo ground-truth vessel mask is produced by a separately trained vessel
segmentation model.
• The loss function is the standard pixel-wise cross-entropy segmentation loss.
Methods – Global Block
• The global branch models global spatial attention for each channel, so that small
lesions and small structures can be highlighted.
• Two separate global branches are used, one for vessel features and one for lesion features.
Methods – Relation Block
• Relation blocks are just standard transformer blocks modeling long-range spatial interactions.
• Self-attention block models the interaction among different lesions
• Cross attention block models the interaction between lesion features and vessel features
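The difference between the two relation blocks can be sketched with plain scaled dot-product attention. This is a simplified illustration, not the RTNet implementation: learned query/key/value projections, multiple heads, and residual connections are omitted, and the toy `lesion_tokens` and `vessel_tokens` are hypothetical values.

```python
import math

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention(queries, keys, values):
    """Scaled dot-product attention; each argument is a list of feature vectors."""
    scale = math.sqrt(len(queries[0]))
    outputs = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / scale for k in keys]
        weights = softmax(scores)
        outputs.append([
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ])
    return outputs

# Hypothetical toy features: 3 lesion tokens and 2 vessel tokens, 2 channels each.
lesion_tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
vessel_tokens = [[0.5, 0.5], [2.0, 0.0]]

# Self-attention: lesion tokens attend to every lesion token.
lesion_to_lesion = attention(lesion_tokens, lesion_tokens, lesion_tokens)
# Cross-attention: lesion tokens are the queries, vessel tokens the keys and values.
lesion_to_vessel = attention(lesion_tokens, vessel_tokens, vessel_tokens)

print(len(lesion_to_lesion), len(lesion_to_vessel))
```

In both cases the output has one vector per lesion token; only the source of the keys and values changes.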
Experiments – Ablation Study
Most important result: the ablation study shows that the relation blocks are
effective!
The cross-attention head is especially important for MA (microaneurysms) and SE (soft exudates).
Both MA and SE are difficult-to-segment lesions whose segmentation may be helped by the
presence of blood vessels or other lesions. Motivation verified.
Experiment – Relation Blocks Attention
Visualization
As we can see, self-attention highlights other potential lesion
regions, and cross-attention highlights blood vessels.
Experiments – SOTA comparisons
• Same-dataset experiments: performance is so-so; OK but not impressive.
Experiments – Cross Dataset
• Significantly beats the competitors in cross-dataset settings.
Conclusion – Part 1
• Long-range interactions, and hence transformers,
are indeed important for many medical tasks, such as fovea localization
and lesion segmentation.
• Medical papers: a strong clinical background and certain experiments
(such as cross-dataset settings and attention visualization) are
impressive.
• Next: long-range interactions are expensive (quadratic complexity),
especially in 3D settings -> efficient transformers
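A small arithmetic sketch of why the quadratic cost bites harder in 3D. The image size, volume size, and patch size below are illustrative assumptions, not values from the slides.

```python
def num_tokens(shape, patch):
    """Tokens produced by splitting an image or volume into non-overlapping patches."""
    count = 1
    for size in shape:
        count *= size // patch
    return count

def attention_pairs(tokens):
    """Query-key pairs evaluated by full (global) self-attention."""
    return tokens * tokens

tokens_2d = num_tokens((224, 224), patch=16)        # 14 * 14 tokens
tokens_3d = num_tokens((224, 224, 224), patch=16)   # 14 * 14 * 14 tokens

print(tokens_2d, attention_pairs(tokens_2d))   # 196 38416
print(tokens_3d, attention_pairs(tokens_3d))   # 2744 7529536
```

Adding the third axis multiplies the token count by 14 but the number of attention pairs by 196, which is the motivation for restricting attention to windows or to a few sampled positions.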
Efficient Transformers
Paper 1: Self-Supervised Pre-Training of Swin
Transformers for 3D Medical Image Analysis
(NVIDIA, CVPR22)
• SOTA performance on the MSD and
BTCV benchmarks.
• MSD: a comprehensive
benchmark of 10 segmentation
tasks for both CT and MRI.
• BTCV: an abdominal segmentation
challenge covering 13 organs.
• Code and pre-trained model
available!
Why is the performance so good?
• Applies customized SSL tasks: masked volume inpainting (cutout
augmentation + reconstruction loss), image rotation (classifying
rotation angles), and contrastive coding.
• Pre-training on 5,050 publicly available CT images from various
applications.
• SOTA model architecture (Swin UNETR).
Swin UNETR
• Encoder: a series of Swin Transformer blocks + downsampling
• Decoder: ResNet blocks + upsampling
• Overall: a U-Net-like architecture
Swin Transformer Blocks
• Divide the 3D tokens into subwindows and calculate self-attention only
within each subwindow. That is, attention is local rather than global.
• Global context can be modeled in deeper network layers.
• To let information cross window boundaries, use a shifted-window mechanism.
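A minimal sketch of window partitioning and shifting on a 3D token grid. The grid size, window size, and half-window shift are hypothetical values chosen for illustration, and the attention mask that Swin applies to tokens wrapped around by the cyclic shift is omitted.

```python
def window_index(coord, grid, window, shift=0):
    """Window containing the token at `coord` after a cyclic shift of the token grid."""
    return tuple(((c + shift) % g) // window for c, g in zip(coord, grid))

grid = (8, 8, 8)   # hypothetical 3D token grid
window = 4         # hypothetical window size along each axis

# Cost: each token attends only to the window**3 tokens in its own window.
n_tokens = grid[0] * grid[1] * grid[2]
global_pairs = n_tokens ** 2
local_pairs = n_tokens * window ** 3

# Two neighbouring tokens on opposite sides of a window boundary.
a, b = (3, 0, 0), (4, 0, 0)
same_before = window_index(a, grid, window) == window_index(b, grid, window)
half = window // 2
same_after = (window_index(a, grid, window, shift=half)
              == window_index(b, grid, window, shift=half))

print(global_pairs, local_pairs, same_before, same_after)
# 262144 32768 False True
```

Alternating unshifted and shifted layers lets neighbours that were separated by a window boundary attend to each other in the next layer, while the per-layer cost stays linear in the number of tokens.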
Paper 2: CoTr: Efficiently Bridging CNN and
Transformer for 3D Medical Image Segmentation
(MICCAI 21)
• A hybrid CNN – transformer approach
• We mainly want to know how it uses the deformable transformer (DeTrans)
for efficiently modeling long-range interactions.
Deformable DETR: Deformable
Transformers for End-to-End Object
Detection
• Do not perform long-range interaction from query pixel to all image pixels.
• Instead, sample a smaller number of image positions (learned sampling
offsets), and only calculate attention on the sampled image positions.
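A simplified single-query sketch of attention over sampled positions, on a 2D scalar feature map. In Deformable DETR the sampling offsets and attention weights are predicted from the query by learned linear layers; here they are fixed, hypothetical numbers, and multi-head and multi-scale handling is omitted.

```python
import math

def bilinear_sample(feature_map, y, x):
    """Bilinearly interpolate a 2D scalar map at fractional (y, x), clamped to the border."""
    h, w = len(feature_map), len(feature_map[0])
    y = min(max(y, 0.0), h - 1.0)
    x = min(max(x, 0.0), w - 1.0)
    y0, x0 = int(math.floor(y)), int(math.floor(x))
    y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
    dy, dx = y - y0, x - x0
    top = feature_map[y0][x0] * (1 - dx) + feature_map[y0][x1] * dx
    bottom = feature_map[y1][x0] * (1 - dx) + feature_map[y1][x1] * dx
    return top * (1 - dy) + bottom * dy

def deformable_attention(feature_map, reference, offsets, logits):
    """Attend to len(offsets) sampled positions instead of all H * W positions."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    weights = [e / total for e in exps]
    ry, rx = reference
    samples = [bilinear_sample(feature_map, ry + dy, rx + dx) for dy, dx in offsets]
    return sum(w * s for w, s in zip(weights, samples))

# Toy 4x4 map with value 4*row + col; 3 sampled positions instead of 16.
feature_map = [[float(r * 4 + c) for c in range(4)] for r in range(4)]
offsets = [(-0.5, 0.0), (0.0, 0.5), (1.0, 1.0)]   # hypothetical, normally learned
logits = [0.0, 0.0, 0.0]                          # hypothetical, normally learned
out = deformable_attention(feature_map, (1.0, 1.0), offsets, logits)
print(out)
```

The cost per query depends on the number of sampled positions rather than on the size of the feature map, which is what makes the approach attractive for 3D volumes.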
Conclusion – Part 2
• Some established efficient transformer techniques (Swin and
deformable sampling)
• Swin UNETR: a strong baseline for starting your medical
segmentation projects.
Discussion
Why do you think transformers or long-range interactions help in your
machine learning projects?
