SlideShare a Scribd company logo
1 of 16
2021 CVPR
Jaemin Jeong Seminar 2
Bisenet
Jaemin Jeong Seminar 3
Bisenet-V2
Jaemin Jeong Seminar 4
 Bisenet has been proved to be a popular two-stream network for real-time segmentation.
 However, its principle of adding an extra path to encode spatial information is time-consuming, and the
backbones borrowed from pretrained tasks, e.g., image classification, may be inefficient for image
segmentation due to the deficiency of task specific design.
 They propose a novel and efficient structure named Short-Term Dense Concatenate network (STDC network)
by removing structure redundancy.
 They gradually reduce the dimension of feature maps and use the aggregation of them for image
representation.
 In the decoder, they propose a Detail Aggregation module by integration the learning of spatial information
into low-level layers in single-stream manner
 Finally, the low-level features and deep features are fused to predict the final segmentation results.
 (Cityscape) we achieve 71.9% mIoU on the test set with a speed of 250.4 FPS on NVIDIA GTX 1080Ti, which
is 45.2% faster than the latest methods, and achieve 76.8% mIoU with 97.0 FPS while inferring on higher
resolution images.
Abstract
Jaemin Jeong Seminar 5
Performance
Jaemin Jeong Seminar 6
 They design a Short-Term Dense Concatenate module (STDC module) to extract deep features
with scalable receptive field and multi-scale information. This module promotes the performance of
our STDC network with affordable computational cost.
 They propose the Detail Aggregation module to learn the decoder, leading to more precise
preservation of spatial details in low-level layers without extra computation cost in the inference
time.
 They conduct extensive experiments to present the effectiveness of our methods. The experiment
results present that STDC networks achieve new state-of-the-art results on ImageNet, Cityscapes
and CamVid.
 Specifically, our STDC1-Seg50 achieves 71.9% mIoU on the Cityscapes test set at a speed of
250.4 FPS on one NVIDIA GTX 1080Ti card. Under the same experiment setting, our STDC2-
Seg75 achieves 76.8% mIoU at a speed of 97.0 FPS.
Introduction
Jaemin Jeong Seminar 7
 Short-Term Dense Concatenate Module
π‘₯𝑖 = πΆπ‘œπ‘›π‘£π‘‹π‘– π‘₯π‘–βˆ’1, π‘˜π‘–
π‘₯π‘œπ‘’π‘‘π‘π‘’π‘‘ = 𝐹(π‘₯1, π‘₯2, … , π‘₯𝑛)
ConvX includes one convolutional layer, one batch
normalization layer and ReLU activation layer, and π‘˜π‘– is
the kernel size of convolutional layer.
Design of Encoding Network
Jaemin Jeong Seminar 8
Design of Encoding Network
 𝑆 : stride
 𝑅 : repeat
 𝐢 : output channels
 𝑀 : input channel
 𝑁 : output channel
Jaemin Jeong Seminar 9
Network Architecture
Jaemin Jeong Seminar 10
 Seg Head includes a 3Γ—3 Conv-BN-ReLU operator followed with a 1 Γ— 1 convolution to get the
output dimension N, which is set as the number of classes.
 we upsample the detail feature maps to the original size and fuse it with a trainable 1 Γ— 1
convolution for dynamic re-wegihting.
 Finally, we adopt a threshold 0.1 to convert the predicted details to the final binary detail ground-
truth with boundary and corner informations.
 we use a Detail Head to produce the detail map, which guide the shallow layer to encode spatial
information.
 Detail Head includes a 3 Γ— 3 Conv-BN-ReLU operator followed with a 1 Γ— 1 convolution to get the
output detail map.
Network Architecture
Jaemin Jeong Seminar 11
Detail Loss
 Since the number of detail pixels is much less than the non-detail pixels, detail prediction is a class
imbalance problem.
 we adopt binary cross-entropy and dice loss to jointly optimize the detail learning.
 Dice loss measures the overlap between predict maps and ground-truth. Also, it is insensitive to
the number of foreground/background pixels, which means it can alleviating the class-imbalance
problem.
πΏπ‘‘π‘’π‘‘π‘Žπ‘–π‘™ 𝑝𝑑, 𝑔𝑑 = 𝐿𝑑𝑖𝑐𝑒 𝑝𝑑, 𝑔𝑑 + 𝐿𝑏𝑐𝑒(𝑝𝑑, 𝑔𝑑)
𝐿𝑑𝑖𝑐𝑒 𝑝𝑑, 𝑔𝑑 = 1 βˆ’
2 𝑖
π»Γ—π‘Š
𝑝𝑑
𝑖
𝑔𝑑
𝑖
+ πœ–
𝑖
π»Γ—π‘Š
𝑝𝑑
𝑖 2
+ 𝑖
π»Γ—π‘Š
𝑔𝑑
𝑖 2
+ πœ–
 πœ– = 1
Detail Ground-truth Generation
Jaemin Jeong Seminar 12
Experiments
Jaemin Jeong Seminar 13
Experiments
Jaemin Jeong Seminar 14
Experiments
Jaemin Jeong Seminar 15
Experiments
Jaemin Jeong Seminar 16
Experiments
We use 50 and 75 after the method name to represent the
input size 512Γ—1024 and 768Γ—1536 respectively.

More Related Content

What's hot

facility layout paper
 facility layout paper facility layout paper
facility layout paperSaurabh Tiwary
Β 
Parallel Machine Learning- DSGD and SystemML
Parallel Machine Learning- DSGD and SystemMLParallel Machine Learning- DSGD and SystemML
Parallel Machine Learning- DSGD and SystemMLJanani C
Β 
What Is Dynamic Programming? | Dynamic Programming Explained | Programming Fo...
What Is Dynamic Programming? | Dynamic Programming Explained | Programming Fo...What Is Dynamic Programming? | Dynamic Programming Explained | Programming Fo...
What Is Dynamic Programming? | Dynamic Programming Explained | Programming Fo...Simplilearn
Β 
DAOR - Bridging the Gap between Community and Node Representations: Graph Emb...
DAOR - Bridging the Gap between Community and Node Representations: Graph Emb...DAOR - Bridging the Gap between Community and Node Representations: Graph Emb...
DAOR - Bridging the Gap between Community and Node Representations: Graph Emb...Artem Lutov
Β 
CAD: introduction to floorplanning
CAD:  introduction to floorplanningCAD:  introduction to floorplanning
CAD: introduction to floorplanningTeam-VLSI-ITMU
Β 
Gray Image Watermarking using slant transform - digital image processing
Gray Image Watermarking using slant transform - digital image processingGray Image Watermarking using slant transform - digital image processing
Gray Image Watermarking using slant transform - digital image processingNITHIN KALLE PALLY
Β 
Fuzzy control design_tutorial
Fuzzy control design_tutorialFuzzy control design_tutorial
Fuzzy control design_tutorialResul Çâteli
Β 
Numerical analysis m3 l2slides
Numerical analysis  m3 l2slidesNumerical analysis  m3 l2slides
Numerical analysis m3 l2slidesSHAMJITH KM
Β 
Faster computation with matlab
Faster computation with matlabFaster computation with matlab
Faster computation with matlabMuhammad Alli
Β 
Fpga implementation of optimal step size nlms algorithm and its performance a...
Fpga implementation of optimal step size nlms algorithm and its performance a...Fpga implementation of optimal step size nlms algorithm and its performance a...
Fpga implementation of optimal step size nlms algorithm and its performance a...eSAT Publishing House
Β 
Fpga implementation of optimal step size nlms algorithm and its performance a...
Fpga implementation of optimal step size nlms algorithm and its performance a...Fpga implementation of optimal step size nlms algorithm and its performance a...
Fpga implementation of optimal step size nlms algorithm and its performance a...eSAT Journals
Β 
Craft software for dummies
Craft software for dummiesCraft software for dummies
Craft software for dummiesRama Renspandy
Β 
Lecture 10 (Digital Image Processing)
Lecture 10 (Digital Image Processing)Lecture 10 (Digital Image Processing)
Lecture 10 (Digital Image Processing)VARUN KUMAR
Β 
Computer graphic software and data base
Computer graphic software and data baseComputer graphic software and data base
Computer graphic software and data baseSiddeshKumar N M
Β 
Once-for-All: Train One Network and Specialize it for Efficient Deployment
 Once-for-All: Train One Network and Specialize it for Efficient Deployment Once-for-All: Train One Network and Specialize it for Efficient Deployment
Once-for-All: Train One Network and Specialize it for Efficient Deploymenttaeseon ryu
Β 
Block diagram representation of DT systems
Block diagram representation of DT systemsBlock diagram representation of DT systems
Block diagram representation of DT systemsDr.SHANTHI K.G
Β 
Popular image restoration technique
Popular image restoration techniquePopular image restoration technique
Popular image restoration techniqueVARUN KUMAR
Β 

What's hot (20)

facility layout paper
 facility layout paper facility layout paper
facility layout paper
Β 
Parallel Machine Learning- DSGD and SystemML
Parallel Machine Learning- DSGD and SystemMLParallel Machine Learning- DSGD and SystemML
Parallel Machine Learning- DSGD and SystemML
Β 
Plant Layout Algorithm
Plant Layout AlgorithmPlant Layout Algorithm
Plant Layout Algorithm
Β 
What Is Dynamic Programming? | Dynamic Programming Explained | Programming Fo...
What Is Dynamic Programming? | Dynamic Programming Explained | Programming Fo...What Is Dynamic Programming? | Dynamic Programming Explained | Programming Fo...
What Is Dynamic Programming? | Dynamic Programming Explained | Programming Fo...
Β 
DAOR - Bridging the Gap between Community and Node Representations: Graph Emb...
DAOR - Bridging the Gap between Community and Node Representations: Graph Emb...DAOR - Bridging the Gap between Community and Node Representations: Graph Emb...
DAOR - Bridging the Gap between Community and Node Representations: Graph Emb...
Β 
CAD: introduction to floorplanning
CAD:  introduction to floorplanningCAD:  introduction to floorplanning
CAD: introduction to floorplanning
Β 
Gray Image Watermarking using slant transform - digital image processing
Gray Image Watermarking using slant transform - digital image processingGray Image Watermarking using slant transform - digital image processing
Gray Image Watermarking using slant transform - digital image processing
Β 
Project 1
Project 1Project 1
Project 1
Β 
Fuzzy control design_tutorial
Fuzzy control design_tutorialFuzzy control design_tutorial
Fuzzy control design_tutorial
Β 
Numerical analysis m3 l2slides
Numerical analysis  m3 l2slidesNumerical analysis  m3 l2slides
Numerical analysis m3 l2slides
Β 
Faster computation with matlab
Faster computation with matlabFaster computation with matlab
Faster computation with matlab
Β 
Fpga implementation of optimal step size nlms algorithm and its performance a...
Fpga implementation of optimal step size nlms algorithm and its performance a...Fpga implementation of optimal step size nlms algorithm and its performance a...
Fpga implementation of optimal step size nlms algorithm and its performance a...
Β 
Fpga implementation of optimal step size nlms algorithm and its performance a...
Fpga implementation of optimal step size nlms algorithm and its performance a...Fpga implementation of optimal step size nlms algorithm and its performance a...
Fpga implementation of optimal step size nlms algorithm and its performance a...
Β 
Craft software for dummies
Craft software for dummiesCraft software for dummies
Craft software for dummies
Β 
International Journal of Engineering Inventions (IJEI)
International Journal of Engineering Inventions (IJEI)International Journal of Engineering Inventions (IJEI)
International Journal of Engineering Inventions (IJEI)
Β 
Lecture 10 (Digital Image Processing)
Lecture 10 (Digital Image Processing)Lecture 10 (Digital Image Processing)
Lecture 10 (Digital Image Processing)
Β 
Computer graphic software and data base
Computer graphic software and data baseComputer graphic software and data base
Computer graphic software and data base
Β 
Once-for-All: Train One Network and Specialize it for Efficient Deployment
 Once-for-All: Train One Network and Specialize it for Efficient Deployment Once-for-All: Train One Network and Specialize it for Efficient Deployment
Once-for-All: Train One Network and Specialize it for Efficient Deployment
Β 
Block diagram representation of DT systems
Block diagram representation of DT systemsBlock diagram representation of DT systems
Block diagram representation of DT systems
Β 
Popular image restoration technique
Popular image restoration techniquePopular image restoration technique
Popular image restoration technique
Β 

Similar to 2022-01-17-Rethinking_Bisenet.pptx

IJCER (www.ijceronline.com) International Journal of computational Engineerin...
IJCER (www.ijceronline.com) International Journal of computational Engineerin...IJCER (www.ijceronline.com) International Journal of computational Engineerin...
IJCER (www.ijceronline.com) International Journal of computational Engineerin...ijceronline
Β 
Machine Vision on Embedded Hardware
Machine Vision on Embedded HardwareMachine Vision on Embedded Hardware
Machine Vision on Embedded HardwareJash Shah
Β 
33 8951 suseela g suseela paper8 (edit)new2
33 8951 suseela g   suseela paper8 (edit)new233 8951 suseela g   suseela paper8 (edit)new2
33 8951 suseela g suseela paper8 (edit)new2IAESIJEECS
Β 
33 8951 suseela g suseela paper8 (edit)new2
33 8951 suseela g   suseela paper8 (edit)new233 8951 suseela g   suseela paper8 (edit)new2
33 8951 suseela g suseela paper8 (edit)new2IAESIJEECS
Β 
34 8951 suseela g suseela paper8 (edit)new
34 8951 suseela g   suseela paper8 (edit)new34 8951 suseela g   suseela paper8 (edit)new
34 8951 suseela g suseela paper8 (edit)newIAESIJEECS
Β 
Kassem2009
Kassem2009Kassem2009
Kassem2009lazchi
Β 
Deep learning for image super resolution
Deep learning for image super resolutionDeep learning for image super resolution
Deep learning for image super resolutionPrudhvi Raj
Β 
Deep learning for image super resolution
Deep learning for image super resolutionDeep learning for image super resolution
Deep learning for image super resolutionPrudhvi Raj
Β 
Pipelined Architecture of 2D-DCT, Quantization and ZigZag Process for JPEG Im...
Pipelined Architecture of 2D-DCT, Quantization and ZigZag Process for JPEG Im...Pipelined Architecture of 2D-DCT, Quantization and ZigZag Process for JPEG Im...
Pipelined Architecture of 2D-DCT, Quantization and ZigZag Process for JPEG Im...VLSICS Design
Β 
PIPELINED ARCHITECTURE OF 2D-DCT, QUANTIZATION AND ZIGZAG PROCESS FOR JPEG IM...
PIPELINED ARCHITECTURE OF 2D-DCT, QUANTIZATION AND ZIGZAG PROCESS FOR JPEG IM...PIPELINED ARCHITECTURE OF 2D-DCT, QUANTIZATION AND ZIGZAG PROCESS FOR JPEG IM...
PIPELINED ARCHITECTURE OF 2D-DCT, QUANTIZATION AND ZIGZAG PROCESS FOR JPEG IM...VLSICS Design
Β 
Stochastic Computing Correlation Utilization in Convolutional Neural Network ...
Stochastic Computing Correlation Utilization in Convolutional Neural Network ...Stochastic Computing Correlation Utilization in Convolutional Neural Network ...
Stochastic Computing Correlation Utilization in Convolutional Neural Network ...TELKOMNIKA JOURNAL
Β 
A Review on Color Recognition using Deep Learning and Different Image Segment...
A Review on Color Recognition using Deep Learning and Different Image Segment...A Review on Color Recognition using Deep Learning and Different Image Segment...
A Review on Color Recognition using Deep Learning and Different Image Segment...IRJET Journal
Β 
CONTRAST OF RESNET AND DENSENET BASED ON THE RECOGNITION OF SIMPLE FRUIT DATA...
CONTRAST OF RESNET AND DENSENET BASED ON THE RECOGNITION OF SIMPLE FRUIT DATA...CONTRAST OF RESNET AND DENSENET BASED ON THE RECOGNITION OF SIMPLE FRUIT DATA...
CONTRAST OF RESNET AND DENSENET BASED ON THE RECOGNITION OF SIMPLE FRUIT DATA...rinzindorjej
Β 
CONTRAST OF RESNET AND DENSENET BASED ON THE RECOGNITION OF SIMPLE FRUIT DATA...
CONTRAST OF RESNET AND DENSENET BASED ON THE RECOGNITION OF SIMPLE FRUIT DATA...CONTRAST OF RESNET AND DENSENET BASED ON THE RECOGNITION OF SIMPLE FRUIT DATA...
CONTRAST OF RESNET AND DENSENET BASED ON THE RECOGNITION OF SIMPLE FRUIT DATA...rinzindorjej
Β 
Jv2517361741
Jv2517361741Jv2517361741
Jv2517361741IJERA Editor
Β 
Jv2517361741
Jv2517361741Jv2517361741
Jv2517361741IJERA Editor
Β 
Paper id 25201467
Paper id 25201467Paper id 25201467
Paper id 25201467IJRAT
Β 
FPGA Implementation of 2-D DCT & DWT Engines for Vision Based Tracking of Dyn...
FPGA Implementation of 2-D DCT & DWT Engines for Vision Based Tracking of Dyn...FPGA Implementation of 2-D DCT & DWT Engines for Vision Based Tracking of Dyn...
FPGA Implementation of 2-D DCT & DWT Engines for Vision Based Tracking of Dyn...IJERA Editor
Β 

Similar to 2022-01-17-Rethinking_Bisenet.pptx (20)

IJCER (www.ijceronline.com) International Journal of computational Engineerin...
IJCER (www.ijceronline.com) International Journal of computational Engineerin...IJCER (www.ijceronline.com) International Journal of computational Engineerin...
IJCER (www.ijceronline.com) International Journal of computational Engineerin...
Β 
Machine Vision on Embedded Hardware
Machine Vision on Embedded HardwareMachine Vision on Embedded Hardware
Machine Vision on Embedded Hardware
Β 
33 8951 suseela g suseela paper8 (edit)new2
33 8951 suseela g   suseela paper8 (edit)new233 8951 suseela g   suseela paper8 (edit)new2
33 8951 suseela g suseela paper8 (edit)new2
Β 
33 8951 suseela g suseela paper8 (edit)new2
33 8951 suseela g   suseela paper8 (edit)new233 8951 suseela g   suseela paper8 (edit)new2
33 8951 suseela g suseela paper8 (edit)new2
Β 
34 8951 suseela g suseela paper8 (edit)new
34 8951 suseela g   suseela paper8 (edit)new34 8951 suseela g   suseela paper8 (edit)new
34 8951 suseela g suseela paper8 (edit)new
Β 
Kassem2009
Kassem2009Kassem2009
Kassem2009
Β 
Deep learning for image super resolution
Deep learning for image super resolutionDeep learning for image super resolution
Deep learning for image super resolution
Β 
Deep learning for image super resolution
Deep learning for image super resolutionDeep learning for image super resolution
Deep learning for image super resolution
Β 
Pipelined Architecture of 2D-DCT, Quantization and ZigZag Process for JPEG Im...
Pipelined Architecture of 2D-DCT, Quantization and ZigZag Process for JPEG Im...Pipelined Architecture of 2D-DCT, Quantization and ZigZag Process for JPEG Im...
Pipelined Architecture of 2D-DCT, Quantization and ZigZag Process for JPEG Im...
Β 
PIPELINED ARCHITECTURE OF 2D-DCT, QUANTIZATION AND ZIGZAG PROCESS FOR JPEG IM...
PIPELINED ARCHITECTURE OF 2D-DCT, QUANTIZATION AND ZIGZAG PROCESS FOR JPEG IM...PIPELINED ARCHITECTURE OF 2D-DCT, QUANTIZATION AND ZIGZAG PROCESS FOR JPEG IM...
PIPELINED ARCHITECTURE OF 2D-DCT, QUANTIZATION AND ZIGZAG PROCESS FOR JPEG IM...
Β 
An35225228
An35225228An35225228
An35225228
Β 
Stochastic Computing Correlation Utilization in Convolutional Neural Network ...
Stochastic Computing Correlation Utilization in Convolutional Neural Network ...Stochastic Computing Correlation Utilization in Convolutional Neural Network ...
Stochastic Computing Correlation Utilization in Convolutional Neural Network ...
Β 
A Review on Color Recognition using Deep Learning and Different Image Segment...
A Review on Color Recognition using Deep Learning and Different Image Segment...A Review on Color Recognition using Deep Learning and Different Image Segment...
A Review on Color Recognition using Deep Learning and Different Image Segment...
Β 
CONTRAST OF RESNET AND DENSENET BASED ON THE RECOGNITION OF SIMPLE FRUIT DATA...
CONTRAST OF RESNET AND DENSENET BASED ON THE RECOGNITION OF SIMPLE FRUIT DATA...CONTRAST OF RESNET AND DENSENET BASED ON THE RECOGNITION OF SIMPLE FRUIT DATA...
CONTRAST OF RESNET AND DENSENET BASED ON THE RECOGNITION OF SIMPLE FRUIT DATA...
Β 
6119ijcsitce01
6119ijcsitce016119ijcsitce01
6119ijcsitce01
Β 
CONTRAST OF RESNET AND DENSENET BASED ON THE RECOGNITION OF SIMPLE FRUIT DATA...
CONTRAST OF RESNET AND DENSENET BASED ON THE RECOGNITION OF SIMPLE FRUIT DATA...CONTRAST OF RESNET AND DENSENET BASED ON THE RECOGNITION OF SIMPLE FRUIT DATA...
CONTRAST OF RESNET AND DENSENET BASED ON THE RECOGNITION OF SIMPLE FRUIT DATA...
Β 
Jv2517361741
Jv2517361741Jv2517361741
Jv2517361741
Β 
Jv2517361741
Jv2517361741Jv2517361741
Jv2517361741
Β 
Paper id 25201467
Paper id 25201467Paper id 25201467
Paper id 25201467
Β 
FPGA Implementation of 2-D DCT & DWT Engines for Vision Based Tracking of Dyn...
FPGA Implementation of 2-D DCT & DWT Engines for Vision Based Tracking of Dyn...FPGA Implementation of 2-D DCT & DWT Engines for Vision Based Tracking of Dyn...
FPGA Implementation of 2-D DCT & DWT Engines for Vision Based Tracking of Dyn...
Β 

More from JAEMINJEONG5

Jaemin_230701_Simple_Copy_paste.pptx
Jaemin_230701_Simple_Copy_paste.pptxJaemin_230701_Simple_Copy_paste.pptx
Jaemin_230701_Simple_Copy_paste.pptxJAEMINJEONG5
Β 
2021 05-04-u2-net
2021 05-04-u2-net2021 05-04-u2-net
2021 05-04-u2-netJAEMINJEONG5
Β 
2021 04-04-google nmt
2021 04-04-google nmt2021 04-04-google nmt
2021 04-04-google nmtJAEMINJEONG5
Β 
2021 04-03-sean
2021 04-03-sean2021 04-03-sean
2021 04-03-seanJAEMINJEONG5
Β 
2021 03-02-spade
2021 03-02-spade2021 03-02-spade
2021 03-02-spadeJAEMINJEONG5
Β 
2021 03-02-distributed representations-of_words_and_phrases
2021 03-02-distributed representations-of_words_and_phrases2021 03-02-distributed representations-of_words_and_phrases
2021 03-02-distributed representations-of_words_and_phrasesJAEMINJEONG5
Β 
2021 03-02-transformer interpretability
2021 03-02-transformer interpretability2021 03-02-transformer interpretability
2021 03-02-transformer interpretabilityJAEMINJEONG5
Β 
2021 03-01-on the relationship between self-attention and convolutional layers
2021 03-01-on the relationship between self-attention and convolutional layers2021 03-01-on the relationship between self-attention and convolutional layers
2021 03-01-on the relationship between self-attention and convolutional layersJAEMINJEONG5
Β 
2021 01-04-learning filter-basis
2021 01-04-learning filter-basis2021 01-04-learning filter-basis
2021 01-04-learning filter-basisJAEMINJEONG5
Β 
2021 01-02-linformer
2021 01-02-linformer2021 01-02-linformer
2021 01-02-linformerJAEMINJEONG5
Β 
2020 12-04-shake shake
2020 12-04-shake shake2020 12-04-shake shake
2020 12-04-shake shakeJAEMINJEONG5
Β 
2020 12-03-vit
2020 12-03-vit2020 12-03-vit
2020 12-03-vitJAEMINJEONG5
Β 
2020 12-2-detr
2020 12-2-detr2020 12-2-detr
2020 12-2-detrJAEMINJEONG5
Β 
2020 11 4_bag_of_tricks
2020 11 4_bag_of_tricks2020 11 4_bag_of_tricks
2020 11 4_bag_of_tricksJAEMINJEONG5
Β 
2020 11 2_automated sleep stage scoring of the sleep heart
2020 11 2_automated sleep stage scoring of the sleep heart2020 11 2_automated sleep stage scoring of the sleep heart
2020 11 2_automated sleep stage scoring of the sleep heartJAEMINJEONG5
Β 
2020 11 1_sleep_net
2020 11 1_sleep_net2020 11 1_sleep_net
2020 11 1_sleep_netJAEMINJEONG5
Β 
2020 12-1-adam w
2020 12-1-adam w2020 12-1-adam w
2020 12-1-adam wJAEMINJEONG5
Β 
2020 11 3_face_detection
2020 11 3_face_detection2020 11 3_face_detection
2020 11 3_face_detectionJAEMINJEONG5
Β 
white blood cell classification
white blood cell classificationwhite blood cell classification
white blood cell classificationJAEMINJEONG5
Β 

More from JAEMINJEONG5 (19)

Jaemin_230701_Simple_Copy_paste.pptx
Jaemin_230701_Simple_Copy_paste.pptxJaemin_230701_Simple_Copy_paste.pptx
Jaemin_230701_Simple_Copy_paste.pptx
Β 
2021 05-04-u2-net
2021 05-04-u2-net2021 05-04-u2-net
2021 05-04-u2-net
Β 
2021 04-04-google nmt
2021 04-04-google nmt2021 04-04-google nmt
2021 04-04-google nmt
Β 
2021 04-03-sean
2021 04-03-sean2021 04-03-sean
2021 04-03-sean
Β 
2021 03-02-spade
2021 03-02-spade2021 03-02-spade
2021 03-02-spade
Β 
2021 03-02-distributed representations-of_words_and_phrases
2021 03-02-distributed representations-of_words_and_phrases2021 03-02-distributed representations-of_words_and_phrases
2021 03-02-distributed representations-of_words_and_phrases
Β 
2021 03-02-transformer interpretability
2021 03-02-transformer interpretability2021 03-02-transformer interpretability
2021 03-02-transformer interpretability
Β 
2021 03-01-on the relationship between self-attention and convolutional layers
2021 03-01-on the relationship between self-attention and convolutional layers2021 03-01-on the relationship between self-attention and convolutional layers
2021 03-01-on the relationship between self-attention and convolutional layers
Β 
2021 01-04-learning filter-basis
2021 01-04-learning filter-basis2021 01-04-learning filter-basis
2021 01-04-learning filter-basis
Β 
2021 01-02-linformer
2021 01-02-linformer2021 01-02-linformer
2021 01-02-linformer
Β 
2020 12-04-shake shake
2020 12-04-shake shake2020 12-04-shake shake
2020 12-04-shake shake
Β 
2020 12-03-vit
2020 12-03-vit2020 12-03-vit
2020 12-03-vit
Β 
2020 12-2-detr
2020 12-2-detr2020 12-2-detr
2020 12-2-detr
Β 
2020 11 4_bag_of_tricks
2020 11 4_bag_of_tricks2020 11 4_bag_of_tricks
2020 11 4_bag_of_tricks
Β 
2020 11 2_automated sleep stage scoring of the sleep heart
2020 11 2_automated sleep stage scoring of the sleep heart2020 11 2_automated sleep stage scoring of the sleep heart
2020 11 2_automated sleep stage scoring of the sleep heart
Β 
2020 11 1_sleep_net
2020 11 1_sleep_net2020 11 1_sleep_net
2020 11 1_sleep_net
Β 
2020 12-1-adam w
2020 12-1-adam w2020 12-1-adam w
2020 12-1-adam w
Β 
2020 11 3_face_detection
2020 11 3_face_detection2020 11 3_face_detection
2020 11 3_face_detection
Β 
white blood cell classification
white blood cell classificationwhite blood cell classification
white blood cell classification
Β 

Recently uploaded

An experimental study in using natural admixture as an alternative for chemic...
An experimental study in using natural admixture as an alternative for chemic...An experimental study in using natural admixture as an alternative for chemic...
An experimental study in using natural admixture as an alternative for chemic...Chandu841456
Β 
lifi-technology with integration of IOT.pptx
lifi-technology with integration of IOT.pptxlifi-technology with integration of IOT.pptx
lifi-technology with integration of IOT.pptxsomshekarkn64
Β 
Past, Present and Future of Generative AI
Past, Present and Future of Generative AIPast, Present and Future of Generative AI
Past, Present and Future of Generative AIabhishek36461
Β 
welding defects observed during the welding
welding defects observed during the weldingwelding defects observed during the welding
welding defects observed during the weldingMuhammadUzairLiaqat
Β 
UNIT III ANALOG ELECTRONICS (BASIC ELECTRONICS)
UNIT III ANALOG ELECTRONICS (BASIC ELECTRONICS)UNIT III ANALOG ELECTRONICS (BASIC ELECTRONICS)
UNIT III ANALOG ELECTRONICS (BASIC ELECTRONICS)Dr SOUNDIRARAJ N
Β 
Oxy acetylene welding presentation note.
Oxy acetylene welding presentation note.Oxy acetylene welding presentation note.
Oxy acetylene welding presentation note.eptoze12
Β 
πŸ”9953056974πŸ”!!-YOUNG call girls in Rajendra Nagar Escort rvice Shot 2000 nigh...
πŸ”9953056974πŸ”!!-YOUNG call girls in Rajendra Nagar Escort rvice Shot 2000 nigh...πŸ”9953056974πŸ”!!-YOUNG call girls in Rajendra Nagar Escort rvice Shot 2000 nigh...
πŸ”9953056974πŸ”!!-YOUNG call girls in Rajendra Nagar Escort rvice Shot 2000 nigh...9953056974 Low Rate Call Girls In Saket, Delhi NCR
Β 
INFLUENCE OF NANOSILICA ON THE PROPERTIES OF CONCRETE
INFLUENCE OF NANOSILICA ON THE PROPERTIES OF CONCRETEINFLUENCE OF NANOSILICA ON THE PROPERTIES OF CONCRETE
INFLUENCE OF NANOSILICA ON THE PROPERTIES OF CONCRETEroselinkalist12
Β 
complete construction, environmental and economics information of biomass com...
complete construction, environmental and economics information of biomass com...complete construction, environmental and economics information of biomass com...
complete construction, environmental and economics information of biomass com...asadnawaz62
Β 
Introduction to Machine Learning Unit-3 for II MECH
Introduction to Machine Learning Unit-3 for II MECHIntroduction to Machine Learning Unit-3 for II MECH
Introduction to Machine Learning Unit-3 for II MECHC Sai Kiran
Β 
Gurgaon ✑️9711147426✨Call In girls Gurgaon Sector 51 escort service
Gurgaon ✑️9711147426✨Call In girls Gurgaon Sector 51 escort serviceGurgaon ✑️9711147426✨Call In girls Gurgaon Sector 51 escort service
Gurgaon ✑️9711147426✨Call In girls Gurgaon Sector 51 escort servicejennyeacort
Β 
Piping Basic stress analysis by engineering
Piping Basic stress analysis by engineeringPiping Basic stress analysis by engineering
Piping Basic stress analysis by engineeringJuanCarlosMorales19600
Β 
Risk Assessment For Installation of Drainage Pipes.pdf
Risk Assessment For Installation of Drainage Pipes.pdfRisk Assessment For Installation of Drainage Pipes.pdf
Risk Assessment For Installation of Drainage Pipes.pdfROCENODodongVILLACER
Β 
Earthing details of Electrical Substation
Earthing details of Electrical SubstationEarthing details of Electrical Substation
Earthing details of Electrical Substationstephanwindworld
Β 
Sachpazis Costas: Geotechnical Engineering: A student's Perspective Introduction
Sachpazis Costas: Geotechnical Engineering: A student's Perspective IntroductionSachpazis Costas: Geotechnical Engineering: A student's Perspective Introduction
Sachpazis Costas: Geotechnical Engineering: A student's Perspective IntroductionDr.Costas Sachpazis
Β 
IVE Industry Focused Event - Defence Sector 2024
IVE Industry Focused Event - Defence Sector 2024IVE Industry Focused Event - Defence Sector 2024
IVE Industry Focused Event - Defence Sector 2024Mark Billinghurst
Β 
Call Us ≽ 8377877756 β‰Ό Call Girls In Shastri Nagar (Delhi)
Call Us ≽ 8377877756 β‰Ό Call Girls In Shastri Nagar (Delhi)Call Us ≽ 8377877756 β‰Ό Call Girls In Shastri Nagar (Delhi)
Call Us ≽ 8377877756 β‰Ό Call Girls In Shastri Nagar (Delhi)dollysharma2066
Β 
Work Experience-Dalton Park.pptxfvvvvvvv
Work Experience-Dalton Park.pptxfvvvvvvvWork Experience-Dalton Park.pptxfvvvvvvv
Work Experience-Dalton Park.pptxfvvvvvvvLewisJB
Β 

Recently uploaded (20)

An experimental study in using natural admixture as an alternative for chemic...
An experimental study in using natural admixture as an alternative for chemic...An experimental study in using natural admixture as an alternative for chemic...
An experimental study in using natural admixture as an alternative for chemic...
Β 
lifi-technology with integration of IOT.pptx
lifi-technology with integration of IOT.pptxlifi-technology with integration of IOT.pptx
lifi-technology with integration of IOT.pptx
Β 
Past, Present and Future of Generative AI
Past, Present and Future of Generative AIPast, Present and Future of Generative AI
Past, Present and Future of Generative AI
Β 
welding defects observed during the welding
welding defects observed during the weldingwelding defects observed during the welding
welding defects observed during the welding
Β 
UNIT III ANALOG ELECTRONICS (BASIC ELECTRONICS)
UNIT III ANALOG ELECTRONICS (BASIC ELECTRONICS)UNIT III ANALOG ELECTRONICS (BASIC ELECTRONICS)
UNIT III ANALOG ELECTRONICS (BASIC ELECTRONICS)
Β 
Oxy acetylene welding presentation note.
Oxy acetylene welding presentation note.Oxy acetylene welding presentation note.
Oxy acetylene welding presentation note.
Β 
young call girls in Rajiv ChowkπŸ” 9953056974 πŸ” Delhi escort Service
young call girls in Rajiv ChowkπŸ” 9953056974 πŸ” Delhi escort Serviceyoung call girls in Rajiv ChowkπŸ” 9953056974 πŸ” Delhi escort Service
young call girls in Rajiv ChowkπŸ” 9953056974 πŸ” Delhi escort Service
Β 
πŸ”9953056974πŸ”!!-YOUNG call girls in Rajendra Nagar Escort rvice Shot 2000 nigh...
πŸ”9953056974πŸ”!!-YOUNG call girls in Rajendra Nagar Escort rvice Shot 2000 nigh...πŸ”9953056974πŸ”!!-YOUNG call girls in Rajendra Nagar Escort rvice Shot 2000 nigh...
πŸ”9953056974πŸ”!!-YOUNG call girls in Rajendra Nagar Escort rvice Shot 2000 nigh...
Β 
INFLUENCE OF NANOSILICA ON THE PROPERTIES OF CONCRETE
INFLUENCE OF NANOSILICA ON THE PROPERTIES OF CONCRETEINFLUENCE OF NANOSILICA ON THE PROPERTIES OF CONCRETE
INFLUENCE OF NANOSILICA ON THE PROPERTIES OF CONCRETE
Β 
complete construction, environmental and economics information of biomass com...
complete construction, environmental and economics information of biomass com...complete construction, environmental and economics information of biomass com...
complete construction, environmental and economics information of biomass com...
Β 
Exploring_Network_Security_with_JA3_by_Rakesh Seal.pptx
Exploring_Network_Security_with_JA3_by_Rakesh Seal.pptxExploring_Network_Security_with_JA3_by_Rakesh Seal.pptx
Exploring_Network_Security_with_JA3_by_Rakesh Seal.pptx
Β 
Introduction to Machine Learning Unit-3 for II MECH
Introduction to Machine Learning Unit-3 for II MECHIntroduction to Machine Learning Unit-3 for II MECH
Introduction to Machine Learning Unit-3 for II MECH
Β 
Gurgaon ✑️9711147426✨Call In girls Gurgaon Sector 51 escort service
Gurgaon ✑️9711147426✨Call In girls Gurgaon Sector 51 escort serviceGurgaon ✑️9711147426✨Call In girls Gurgaon Sector 51 escort service
Gurgaon ✑️9711147426✨Call In girls Gurgaon Sector 51 escort service
Β 
Piping Basic stress analysis by engineering
Piping Basic stress analysis by engineeringPiping Basic stress analysis by engineering
Piping Basic stress analysis by engineering
Β 
Risk Assessment For Installation of Drainage Pipes.pdf
Risk Assessment For Installation of Drainage Pipes.pdfRisk Assessment For Installation of Drainage Pipes.pdf
Risk Assessment For Installation of Drainage Pipes.pdf
Β 
Earthing details of Electrical Substation
Earthing details of Electrical SubstationEarthing details of Electrical Substation
Earthing details of Electrical Substation
Β 
Sachpazis Costas: Geotechnical Engineering: A student's Perspective Introduction
Sachpazis Costas: Geotechnical Engineering: A student's Perspective IntroductionSachpazis Costas: Geotechnical Engineering: A student's Perspective Introduction
Sachpazis Costas: Geotechnical Engineering: A student's Perspective Introduction
Β 
IVE Industry Focused Event - Defence Sector 2024
IVE Industry Focused Event - Defence Sector 2024IVE Industry Focused Event - Defence Sector 2024
IVE Industry Focused Event - Defence Sector 2024
Β 
Call Us ≽ 8377877756 β‰Ό Call Girls In Shastri Nagar (Delhi)
Call Us ≽ 8377877756 β‰Ό Call Girls In Shastri Nagar (Delhi)Call Us ≽ 8377877756 β‰Ό Call Girls In Shastri Nagar (Delhi)
Call Us ≽ 8377877756 β‰Ό Call Girls In Shastri Nagar (Delhi)
Β 
Work Experience-Dalton Park.pptxfvvvvvvv
Work Experience-Dalton Park.pptxfvvvvvvvWork Experience-Dalton Park.pptxfvvvvvvv
Work Experience-Dalton Park.pptxfvvvvvvv
Β 

2022-01-17-Rethinking_Bisenet.pptx

  • 3. Jaemin Jeong Seminar 3 Bisenet-V2
  • 4. Jaemin Jeong Seminar 4  Bisenet has been proved to be a popular two-stream network for real-time segmentation.  However, its principle of adding an extra path to encode spatial information is time-consuming, and the backbones borrowed from pretrained tasks, e.g., image classification, may be inefficient for image segmentation due to the deficiency of task specific design.  They propose a novel and efficient structure named Short-Term Dense Concatenate network (STDC network) by removing structure redundancy.  They gradually reduce the dimension of feature maps and use the aggregation of them for image representation.  In the decoder, they propose a Detail Aggregation module by integration the learning of spatial information into low-level layers in single-stream manner  Finally, the low-level features and deep features are fused to predict the final segmentation results.  (Cityscape) we achieve 71.9% mIoU on the test set with a speed of 250.4 FPS on NVIDIA GTX 1080Ti, which is 45.2% faster than the latest methods, and achieve 76.8% mIoU with 97.0 FPS while inferring on higher resolution images. Abstract
  • 5. Jaemin Jeong Seminar 5 Performance
  • 6. Jaemin Jeong Seminar 6  They design a Short-Term Dense Concatenate module (STDC module) to extract deep features with scalable receptive field and multi-scale information. This module promotes the performance of our STDC network with affordable computational cost.  They propose the Detail Aggregation module to learn the decoder, leading to more precise preservation of spatial details in low-level layers without extra computation cost in the inference time.  They conduct extensive experiments to present the effectiveness of our methods. The experiment results present that STDC networks achieve new state-of-the-art results on ImageNet, Cityscapes and CamVid.  Specifically, our STDC1-Seg50 achieves 71.9% mIoU on the Cityscapes test set at a speed of 250.4 FPS on one NVIDIA GTX 1080Ti card. Under the same experiment setting, our STDC2- Seg75 achieves 76.8% mIoU at a speed of 97.0 FPS. Introduction
  • 7. Jaemin Jeong Seminar 7  Short-Term Dense Concatenate Module π‘₯𝑖 = πΆπ‘œπ‘›π‘£π‘‹π‘– π‘₯π‘–βˆ’1, π‘˜π‘– π‘₯π‘œπ‘’π‘‘π‘π‘’π‘‘ = 𝐹(π‘₯1, π‘₯2, … , π‘₯𝑛) ConvX includes one convolutional layer, one batch normalization layer and ReLU activation layer, and π‘˜π‘– is the kernel size of convolutional layer. Design of Encoding Network
  • 8. Jaemin Jeong Seminar 8 Design of Encoding Network  𝑆 : stride  𝑅 : repeat  𝐢 : output channels  𝑀 : input channel  𝑁 : output channel
  • 9. Jaemin Jeong Seminar 9 Network Architecture
  • 10. Jaemin Jeong Seminar 10  Seg Head includes a 3Γ—3 Conv-BN-ReLU operator followed with a 1 Γ— 1 convolution to get the output dimension N, which is set as the number of classes.  we upsample the detail feature maps to the original size and fuse it with a trainable 1 Γ— 1 convolution for dynamic re-wegihting.  Finally, we adopt a threshold 0.1 to convert the predicted details to the final binary detail ground- truth with boundary and corner informations.  we use a Detail Head to produce the detail map, which guide the shallow layer to encode spatial information.  Detail Head includes a 3 Γ— 3 Conv-BN-ReLU operator followed with a 1 Γ— 1 convolution to get the output detail map. Network Architecture
  • 11. Jaemin Jeong Seminar 11 Detail Loss  Since the number of detail pixels is much less than the non-detail pixels, detail prediction is a class imbalance problem.  we adopt binary cross-entropy and dice loss to jointly optimize the detail learning.  Dice loss measures the overlap between predict maps and ground-truth. Also, it is insensitive to the number of foreground/background pixels, which means it can alleviating the class-imbalance problem. πΏπ‘‘π‘’π‘‘π‘Žπ‘–π‘™ 𝑝𝑑, 𝑔𝑑 = 𝐿𝑑𝑖𝑐𝑒 𝑝𝑑, 𝑔𝑑 + 𝐿𝑏𝑐𝑒(𝑝𝑑, 𝑔𝑑) 𝐿𝑑𝑖𝑐𝑒 𝑝𝑑, 𝑔𝑑 = 1 βˆ’ 2 𝑖 π»Γ—π‘Š 𝑝𝑑 𝑖 𝑔𝑑 𝑖 + πœ– 𝑖 π»Γ—π‘Š 𝑝𝑑 𝑖 2 + 𝑖 π»Γ—π‘Š 𝑔𝑑 𝑖 2 + πœ–  πœ– = 1 Detail Ground-truth Generation
  • 12. Jaemin Jeong Seminar 12 Experiments
  • 13. Jaemin Jeong Seminar 13 Experiments
  • 14. Jaemin Jeong Seminar 14 Experiments
  • 15. Jaemin Jeong Seminar 15 Experiments
  • 16. Jaemin Jeong Seminar 16 Experiments We use 50 and 75 after the method name to represent the input size 512Γ—1024 and 768Γ—1536 respectively.

Editor's Notes

  1. Hello The title of the paper to be presented is Rethinking BiSeNet For Real-time Semantic Segmentation. It is adopted in CVPR 2021.
  2. First, we need to know the model structure of BiseNet. Bisenet is a semantic segmentation model and consists of Spatial Path and Context Path. There are three layer in Spatial Path, and each layer consists of convolution layer with stride 2, batch normalization, and ReLU. Therefore, the size of feature map becomes 1/8 of the original image, and rich spatial information can be stored. Context Path uses a lightweight model to perform downsampling quickly, so a large Receptive Field can be obtained, and Receptive Field that maximizes global context information can be provided using global average pooling. Attention Refinement Module is a module for combining features obtained from Context Path. It obtain global context using global average pooling and compute attention refinement. It also has the advantage of being able to easily Feature Fusion Module is a module for combining two features. We cannot simply combine two features. Spatial path has low-level features and context path has high-level features. To combine this information, we connect the two features, scale them using batch normalization, and obtain global pooling and weight vectors. Spatial Path : Spatial pathμ—λŠ” μ„Έκ°œμ˜ λ ˆμ΄μ–΄κ°€ μ‘΄μž¬ν•˜λ©° 각 μΈ΅μ—μ„œ strideκ°€ 2인 μ»¨λ³Όλ£¨μ…˜μ„ μ§„ν–‰ν•˜λ©° batch nomalizationκ³Ό ReLUλ₯Ό μ§„ν–‰ν•©λ‹ˆλ‹€. λ”°λΌμ„œ feature의 ν¬κΈ°λŠ” 원본 μ΄λ―Έμ§€μ˜ 1/8이 되며 ν’λΆ€ν•œ Spatial information을 μ €μž₯ν•  수 μžˆμŠ΅λ‹ˆλ‹€. Context Path : Context Pathμ—μ„œλŠ” κ²½λŸ‰ν™”λœ λͺ¨λΈ(xception)을 μ‚¬μš©ν•΄ λΉ λ₯΄κ²Œ λ‹€μš΄ μƒ˜ν”Œλ§μ„ 진행할 수 μžˆμ–΄μ„œ 큰 Receptive Fieldλ₯Ό 얻을 수 있고 Global average pooling 을 μ΄μš©ν•΄ global context 정보λ₯Ό μ΅œλŒ€ Receptive Field μ œκ³΅ν•  수 μžˆμŠ΅λ‹ˆλ‹€. Attention Refinement Module : Context Pathμ—μ„œ 얻은 νŠΉμ§•λ“€μ„ κ²°ν•©ν•˜κΈ° μœ„ν•œ λͺ¨λ“ˆμž…λ‹ˆλ‹€. Global average pooling을 μ‚¬μš©ν•΄ Global contextλ₯Ό μ–»κ³  attention refinement λ₯Ό κ³„μ‚°ν•©λ‹ˆλ‹€. λ˜ν•œ μ—…μƒ˜ν”Œλ§ 없이 Global context 정보λ₯Ό μ‰½κ²Œ κ²°ν•©ν•  수 μžˆλŠ” μž₯점이 μžˆμŠ΅λ‹ˆλ‹€. Feature Fusion Modul (FFM) : 두가지 featureλ₯Ό κ²°ν•©ν•˜κΈ° μœ„ν•œ λͺ¨λ“ˆμž…λ‹ˆλ‹€. 두가지 featureλ₯Ό κ°„λ‹¨νžˆ κ²°ν•©ν•  수 μ—†μŠ΅λ‹ˆλ‹€. Spatial pathμ—λŠ” low-level의 featureλ₯Ό 가지고 있고 Context Pathμ—μ„œλŠ” High-level featureλ₯Ό 가지고 μžˆμŠ΅λ‹ˆλ‹€. 이 정보듀을 κ²°ν•©ν•˜κΈ° μœ„ν•΄ λ‘κ°œμ˜ featureλ₯Ό μ—°κ²°ν•˜κ³  batch nomalization을 μ‚¬μš©ν•΄ μŠ€μΌ€μΌμ„ μ‘°μ •ν•˜κ³  global pooling 및 κ°€μ€‘μΉ˜ 벑터λ₯Ό κ΅¬ν•©λ‹ˆλ‹€.
  3. bisenetv2 is an improved version of bisenet's model. First, bisenetv2 simplifies the structure by eliminating time-consuming cross-layer connections. The overall structure has been changed to a more compact structure and well-designed components. We made the context path deeper so we can encode more detail. We designed light-weight components using depth-wise convolution in spatial path. A comprehensive ablative experiment was performed. Performance has been greatly improved. Context Branch Since low-level information must be contained, it must have an abundant amount of channels. Therefore, the number of layers is reduced while increasing the amount of channels. Since it has a wide spatial size, residual connection is not performed. Spatial Branch Contrary to the detail branch, it has a small amount of channels and stacks the layers deeply. In this case, a fast-down sampling method is used to widen a wide receptive field and increase the level of feature representation. Aggregation Layer It is a layer for merging the above two branches. Because fast-down sampling is used in the semantic branch, the output is small compared to the detail branch. Therefore, we upsampling the output feature map of the semantic branch. After that, each element-wise product is processed and then added. ------------------------------------------------------------ ------------------------------------------------------------ ------------------- Context Branch It consists of a total of three layers, and each includes convolution, batch normalization, and ReLu activation functions. Finally, a feature map with the size of 1/8 of the input can be extracted. Spatial Branch Spatial Branch uses Stem Block, Context Embedding Block, Gather, and Expansion Layer structures. Stem Block Concat the two branches after downsampling in different ways. By using the following strategy, we were able to reduce computation cost and select features effectively.Context Embedding BlockAverage pooling and resisual connection are used to efficiently obtain global contextual information. Gather and Expansion Layer depth-wise convolution is used, and finally, 1X1 convolution is used to project the output of depth-wise conv. Unlike the structure of the existing mobilenet v2, the GE layer was able to obtain better feature quality by using one more 3X3 convolution. Booster Training Strategy A booster strategy is introduced for segmentation accuracy. During training, the feature representation is improved, and the computation cost is not very large during inference. Inserted between semantic branches. ꡬ쑰 λ‹¨μˆœν™” μ‹œκ°„μ΄ 많이 μ†Œμš”λ˜λŠ” cross-layer connection을 μ—†μ•° 전체적인 ꡬ쑰λ₯Ό 더 compactν•œ ꡬ쑰와 well-designed components둜 λ³€κ²½ Detail Pathλ₯Ό 더 깊게 λ§Œλ“€μ–΄μ„œ 더 λ§Žμ€ detail을 encodeν•  수 μžˆλ„λ‘ 함 (context) Semantic Pathμ—μ„œ depth-wise convolution을 μ‚¬μš©ν•œ light-weight components을 섀계 (spatial) comprehensive ablative experimentλ₯Ό μˆ˜ν–‰ μ„±λŠ₯ ν–₯상 많이 함 ---------------------------------------------------------------------------------------------------------------------- Detail Branch low-level의 정보듀이 담겨야 ν•˜λ―€λ‘œ ν’λΆ€ν•œ channel 양을 κ°€μ Έμ•Ό ν•œλ‹€. λ”°λΌμ„œ 채널 양은 많게 ν•˜λ©΄μ„œ Layer의 수λ₯Ό 쀄인닀. 넓은 spatial sizeλ₯Ό 가지고 μžˆμœΌλ―€λ‘œ residual connection을 μ§„ν–‰ν•˜μ§€ μ•ŠλŠ”λ‹€. Semantic Branch Detail branchμ™€λŠ” λ°˜λŒ€λ‘œ 적은 채널 양을 가지며 layerλ₯Ό 깊게 μŒ“λŠ”λ‹€. μ΄λ•Œ 넓은 receptive field λ„“νžˆκ³  feature representation의 level을 높이기 μœ„ν•΄ fast-down sampling 방법을 μ΄μš©ν•œλ‹€. Aggregation Layer μœ„ 두 가지 branchλ₯Ό merge ν•˜κΈ° μœ„ν•œ layer이닀. semantic branchμ—μ„œ fast-down sampling을 μ‚¬μš©ν–ˆκΈ° λ•Œλ¬Έμ— detail branch에 λΉ„ν•΄ output이 μž‘λ‹€. λ”°λΌμ„œ semantic branch의 output feature map을 upsampling ν•œλ‹€. 이후 각각 element-wise productλ₯Ό μ§„ν–‰ν•œ ν›„ λ”ν•œλ‹€. ----------------------------------------------------------------------------------------------------------------------- Detail Branch 총 3개의 layer둜 이루어져 있으며 각각 convolutionκ³Ό batch normalization, 그리고 ReLu ν™œμ„±ν™” ν•¨μˆ˜κ°€ ν¬ν•¨λ˜μ–΄μžˆλ‹€. μ΅œμ’…μ μœΌλ‘œ input의 1/8 크기의 feature map을 뽑을 수 μžˆλ‹€. Semantic Branch - Semantic BranchλŠ” Stem Blockκ³Ό Context Embedding Block, Gather and Expansion Layer ꡬ쑰λ₯Ό μ΄μš©ν•œλ‹€.Β  Stem Block μ„œλ‘œ λ‹€λ₯Έ λ°©μ‹μ˜ downsampling을 μ§„ν–‰ν•œ 이후에 두 branchλ₯Ό concat ν•œλ‹€. λ‹€μŒκ³Ό 같은 μ „λž΅μ„ μ‚¬μš©ν•¨μœΌλ‘œμ¨ computation costλ₯Ό 쀄이고 효과적으둜 featureλ₯Ό 뽑을 수 μžˆμ—ˆλ‹€. Context Embedding Block global contextual 정보λ₯Ό 효율적으둜 μ–»κΈ° μœ„ν•΄μ„œ average poolingκ³Ό resisual connection을 μ‚¬μš©ν–ˆλ‹€. Gather and Expansion Layerdepth-wise convolution을 μ΄μš©ν•˜λ©° λ§ˆμ§€λ§‰μ— 1X1 convolution을 μ΄μš©ν•΄ depth-wise conv의 output을 projection μ‹œν‚¨λ‹€. κΈ°μ‘΄ mobilenet v2의 κ΅¬μ‘°μ™€λŠ” λ‹€λ₯΄κ²Œ GE layerλŠ” 3X3 convolution을 ν•˜λ‚˜ 더 μ‚¬μš©ν•¨μœΌλ‘œμ¨ 더 쒋은 feature qualityλ₯Ό 얻을 수 μžˆμ—ˆλ‹€. Booster Training Strategysegmentation accuracyλ₯Ό μœ„ν•΄μ„œ booster strategyλ₯Ό λ„μž…ν–ˆλ‹€. νŠΈλ ˆμ΄λ‹ μ‹œ, feature representation을 ν–₯μƒμ‹œν‚€λ©° inference μ‹œμ—λŠ” computation costλŠ” 그리 크지 μ•Šλ‹€. semantic branch 사이사이에 μ‚½μž…μ‹œν‚¨λ‹€. -------------------------------------------------------------------------------------------------------------------------
  4. STDC is the proposed method in this paper. STDC has better mean IOU and inference speed compared to other methods. Bisenet은 보닀 λ†’μŠ΅λ‹ˆλ‹€. λ‹€λ₯Έ 방법듀에 λΉ„ν•΄μ„œλ„ μ›”λ“±νžˆ λ›°μ–΄λ‚©λ‹ˆλ‹€.
  5. λ§Œμ•½ κ·Έλ¦Ό c처럼 block2μ—μ„œ λ‹€μš΄μƒ˜ν”Œλ§ν•˜λ©΄ κ·Έμ „ λ ˆμ΄μ–΄λ₯Ό fusion ν•  λ•ŒλŠ” stride 2λ₯Ό κ°€μ§€λŠ” average pooling ν•˜μ—¬ fusionν•œλ‹€. If downsampling in block 2 as shown in figure c, fusion is performed by average pooling with stride 2 when fusion of the previous layer. Short-Term Dense Concatenate Module Kernel size First block 1 Rest of them 3 we focus on scalable receptive field and multi-scale informations. Low-level layers need enough channels to encode more fine-grained informations with small receptive field, while high-level layers with large receptive field focus more on high-level information induction, setting the same channel with low-level layers may cause information redundancy. Down-sample is only happened in Block2.
  6. S R C M N is stride repeat output channels input channel output channel respectively
  7. Stage 3 : 1/8 Stage 4 : 1/16 Stage 5 : 1/32 In the figure, the blue area is used for training and inference. The green area is used for training only. Laplacian Conv is just 2d convolution
  8. STDC is faster than other methods and has a high mean IOU.
  9. Detail Guidance λ₯Ό μ‚¬μš©ν–ˆμ„λ•Œμ™€ μ•ˆν–ˆμ„ λ•Œλ₯Ό λΉ„κ΅ν•©λ‹ˆλ‹€. μ‚¬μš©ν•  λ•Œ 쑰금 더 λ””ν…ŒμΌν•œ 뢀뢄을 μ‚΄λ¦¬κ²Œ λ©λ‹ˆλ‹€. In this figure, they compare the case with and without the Detail Guidance. When Detail Guidance is used, more detail is detected.
  10. This table compares using Spatial Path and using Detail Guidance. STDC using Detail Guidance has some performance and speed improvements than STDC using Spatial Path. STDCλŠ” Spatial Pathλ₯Ό μ œκ±°ν•˜κ³  μ•½κ°„μ˜ μ„±λŠ₯ ν–₯상과 속도 ν–₯상을 κ°€μ§‘λ‹ˆλ‹€.
  11. From this table, we can see that STDC1 has a fast speed and maintains similar accuracy to the previous method, and STDC2 has a high accuracy while having a similar speed to the previous method.
  12. Ej 높은 정확도와 더 높은 FPS 50 70은 각각 이미지 크기λ₯Ό μ˜λ―Έν•©λ‹ˆλ‹€. This table represents the mean IOU and FPS according to the backbone network. 50 means that input size is 512 x 1024. 75 means that input size is 768 x 1536