Comparative study of three state-of-the-art papers on semantic segmentation, the task of assigning every pixel of an image to a class. Our approach uses Transformers, and the papers compared employ different techniques: the Swin Transformer, Segmenter, and SegFormer. The trained models still leave considerable room for improvement.
1. PROJECT
Applied AI and ML Cohort 2
SEMANTIC SEGMENTATION
AARUNI PARIMAL
aarupari@gmail.com
Presented by:
HITESH KUMAR
proudhitesh@gmail.com
GROUP 9
Project Guide:
Prof. ADITYA NIGAM
aditya@iitmandi.ac.in
Prof. ARNAV BHAVASAR
arnav@iitmandi.ac.in
TA Mentors:
ANOUSHKA BANARJEE
s19016@students.iitmandi.ac.in
RANJEET RANJAN
ranjanjharanjeet@gmail.com
2. OBJECTIVE
The objective of this presentation is to fulfil the
programme requirements for completing this course and to
learn something practical by solving a real problem while
following research methodology.
Towards research, the objective of this study is to solve
a computer vision problem called semantic segmentation
and to perform it using Transformers, a topic covered in
the syllabus of this programme.
PURPOSE
3. SEMANTIC SEGMENTATION
Segmentation is a computer vision problem: identify and label
every pixel of an image with a class.
INTRODUCTION
Identification and classification of each object in an image
Classification of each pixel according to the corresponding object
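The per-pixel labelling above can be illustrated with a minimal NumPy sketch. The label map, class IDs, and class names here are hypothetical toy values, not taken from any dataset used in the project:

```python
import numpy as np

# Hypothetical 4x4 segmentation output: one class ID per pixel.
# 0 = background, 1 = person, 2 = car (example class IDs only).
label_map = np.array([
    [0, 0, 1, 1],
    [0, 1, 1, 1],
    [2, 2, 0, 1],
    [2, 2, 0, 0],
])

# Every pixel carries exactly one class label, so the mask has the
# same spatial shape as the image.
assert label_map.shape == (4, 4)

# Pixels belonging to the "person" class form a binary mask.
person_mask = (label_map == 1)
print(int(person_mask.sum()))  # 6 pixels labelled "person"
```

Downstream tasks then consume such masks directly, e.g. by overlaying each class mask on the original image in a distinct colour.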
4. COMPUTER VISION PROBLEM
IMAGE SEGMENTATION
[Figure: an original image of several persons (P1, P2, P3) shown under object identification (position and identification mask), instance segmentation (a separate mask per object instance), and semantic segmentation (a single multi-class mask).]
Fig: difference between object identification, instance segmentation and semantic segmentation.
5. BUILDING MODEL
Our initial approach to train our model was by using
“Transformers”. We replicated model proposed in a paper
and then thought to improve.
Because we were unable to achieve any improvement
over what was already given in the paper, so we did a
comparative analysis of 3 state of the art papers. Also we
generated few image, video sample and also tried
same on live stream.
APPROACH
7. BUILDING MODEL
TRANSFORMER
Inspired by its results in NLP
Treats the problem as sequence-to-sequence conversion
Divides the image into patches, like word tokens in text
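The patching step can be sketched with NumPy. This is a toy example with sizes chosen for illustration (ViT-style models typically cut 224x224 images into 16x16 patches); the image here is just a zero placeholder:

```python
import numpy as np

# Toy image: 64x64 pixels, 3 channels (zeros as placeholder values).
image = np.zeros((64, 64, 3))
patch = 16  # ViT-style patch size

# Cut the image into non-overlapping 16x16 patches and flatten each one,
# giving a sequence of "visual tokens" analogous to word tokens in text.
h, w, c = image.shape
patches = image.reshape(h // patch, patch, w // patch, patch, c)
patches = patches.transpose(0, 2, 1, 3, 4).reshape(-1, patch * patch * c)

print(patches.shape)  # (16, 768): 4x4 = 16 tokens, each of dimension 16*16*3
```

Each flattened patch is then linearly projected to an embedding and fed to the Transformer encoder as one token of the input sequence.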
13. TRAINING
ENVIRONMENT / TOOLS
Google Colab Pro
PyTorch v1.11
Implemented directly from the official source code
MMSegmentation library by OpenMMLab used for SegFormer and Swin
Default configurations with the smallest image and patch sizes
FFmpeg tool to create side-by-side sample videos
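The side-by-side videos can be produced with FFmpeg's `hstack` filter. A minimal sketch; the file names here are hypothetical placeholders, and the clips are assumed to share the same height and duration:

```shell
# Place the original clip and the segmented clip next to each other.
ffmpeg -i original.mp4 -i segmented.mp4 \
       -filter_complex hstack=inputs=2 \
       side_by_side.mp4
```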
19. BUILDING MODEL
RESULT – SEGMENTER
Training 1:
epochs = 1, mIoU = 12.08
Training 2:
epochs = 64, mIoU = 8.54
Training 3:
epochs = 64, mIoU = 18.98
Training 4:
epochs = 64, mIoU = 38.37
20. BUILDING MODEL
RESULT – SegFormer
Training 1:
epochs = 16, mIoU = 11.53
[Plots: mIoU score and training loss]
21. BUILDING MODEL
RESULT – Swin Transformer
Training 1:
epochs = 16, mIoU = 11.53
[Plots: mIoU score and training loss]
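The mIoU metric reported above averages the per-class Intersection over Union between the predicted and ground-truth label maps. A minimal NumPy sketch with toy 2x3 masks (our actual evaluation relied on the libraries' built-in metrics, not this code):

```python
import numpy as np

def mean_iou(pred, gt, num_classes):
    """Mean Intersection-over-Union over classes present in pred or gt."""
    ious = []
    for c in range(num_classes):
        inter = np.logical_and(pred == c, gt == c).sum()
        union = np.logical_or(pred == c, gt == c).sum()
        if union > 0:  # skip classes absent from both masks
            ious.append(inter / union)
    return float(np.mean(ious))

# Toy 2x3 label maps with two classes.
gt   = np.array([[0, 0, 1], [1, 1, 1]])
pred = np.array([[0, 1, 1], [1, 1, 1]])

print(mean_iou(pred, gt, num_classes=2))  # 0.65 = (0.5 + 0.8) / 2
```

One mislabelled pixel hurts both the class it was taken from and the class it was given, which is why mIoU is a stricter measure than plain pixel accuracy.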
22. FUTURE POSSIBILITIES
Train for more epochs
Use better pretrained weights for feature extraction
Evaluate on different datasets
Implement the model on videos
ON IMPROVEMENTS
23. LITERATURE REVIEW
REFERENCES
An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale [2020]
Segmenter: Transformer for Semantic Segmentation [2021]
SegFormer: Simple and Efficient Design for Semantic Segmentation with Transformers [2021]
Swin Transformer: Hierarchical Vision Transformer using Shifted Windows [2021]
24. GRATITUDE
WE LEARNT A LOT
All IIT Mandi Faculty members
WilyNXT team
Mentor TAs
Master Class Mentors
Group 9 Team members
And everyone else behind the scenes who made this
possible.