Mask R-CNN

Mask R-CNN is a conceptually simple, flexible, and general framework for object instance segmentation. This approach efficiently detects objects in an image while simultaneously generating a high-quality segmentation mask for each instance. The method, called Mask R-CNN, extends Faster R-CNN by adding a branch for predicting an object mask in parallel with the existing branch for bounding box recognition. Mask R-CNN is simple to train and adds only a small overhead to Faster R-CNN, running at 5 fps. Moreover, Mask R-CNN is easy to generalize to other tasks, e.g., allowing us to estimate human poses in the same framework. We show top results in all three tracks of the COCO suite of challenges, including instance segmentation, bounding-box object detection, and person keypoint detection. Without tricks, Mask R-CNN outperforms all existing, single-model entries on every task, including the COCO 2016 challenge winners. We hope our simple and effective approach will serve as a solid baseline and help ease future research in instance-level recognition.
Presentation (in Korean): https://www.youtube.com/watch?v=FZePQKPEwoo
Reference: He, Kaiming, et al. "Mask R-CNN." arXiv preprint arXiv:1703.06870 (2017).

Mask R-CNN
CM Seminar 2017.09.01
Jaehyun Jun
Biointelligence Laboratory
Interdisciplinary Program of Neuroscience, Seoul National University
http://bi.snu.ac.kr

Overview
- Task
  - object detection: classify objects and localize each one with a bounding box
  - instance segmentation: classify each pixel into a fixed set of categories while also differentiating object instances
© 2017, SNU Biointelligence Lab., http://bi.snu.ac.kr

Main Idea
- Goal: develop a framework for instance segmentation that is as enabling as Fast/Faster R-CNN is for object detection
- Extension of Faster R-CNN
  - RoIPool -> RoIAlign
  - decouple mask and class prediction

Related Work
- R-CNN (Region-based CNN)
  - Bounding box: Selective Search -> AlexNet -> linear regression
  - Classification: Selective Search -> AlexNet -> SVM
- Faster R-CNN
  - Bounding box: Region Proposal Network (RPN)
  - Bounding box + classification: extract features using RoIPool
- RoIPool: large negative effect on predicting pixel-accurate masks
  - misalignment problem

History
- R-CNN -> SPP-net
- SPP-net -> Fast R-CNN

History
- Fast R-CNN -> Faster R-CNN
- Faster R-CNN -> Mask R-CNN

Related Work
- ResNeXt
  - increasing cardinality (the number of parallel transformation paths in a block) is more effective than increasing the depth or width of the network
  - ResNeXt outperforms a ResNet with the same number of parameters
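The "same number of parameters" point can be checked with a little arithmetic. The sketch below is illustrative only: the block widths (a 256-64-256 ResNet bottleneck versus 32 parallel paths of width 4) are taken from the ResNeXt paper's standard example, not from these slides, and biases and batch-norm parameters are ignored.

```python
def bottleneck_params(in_ch, mid_ch, out_ch):
    """Weights in a 1x1 -> 3x3 -> 1x1 bottleneck (no biases, no batch norm)."""
    return in_ch * mid_ch + 3 * 3 * mid_ch * mid_ch + mid_ch * out_ch

# ResNet bottleneck: a single path of width 64 (assumed example widths).
resnet_block = bottleneck_params(256, 64, 256)

# ResNeXt block: cardinality 32, each path of width 4 (assumed example widths).
cardinality = 32
resnext_block = cardinality * bottleneck_params(256, 4, 256)

print(resnet_block, resnext_block)
```

Both blocks land at roughly 70k weights, so under these assumptions the comparison isolates the effect of cardinality rather than model size.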

Mask R-CNN
- Two stage
  1. Bounding box: Region Proposal Network (RPN)
  2. Class and box offset, with a binary mask predicted in parallel for each RoI
- Output
  - binary mask
  - one for each class
- Loss
  - L = Lcls + Lbox + Lmask
  - Lmask: average binary cross-entropy loss
  - defined only on positive RoIs
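The loss on this slide can be sketched for a single RoI as follows. This is a minimal illustration, not the authors' implementation: the function names, the nested-list mask layout, and the `is_positive` flag are assumptions made for the example. The mask branch is assumed to output one m x m mask of raw scores per class, and only the mask of the ground-truth class contributes to Lmask.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def mask_loss(mask_logits, gt_class, gt_mask):
    """Average per-pixel binary cross-entropy on the ground-truth class's mask.

    mask_logits: list of K masks, each an m x m list of raw scores (hypothetical layout)
    gt_class:    index k of the RoI's ground-truth class
    gt_mask:     m x m list of 0/1 target values
    """
    logits_k = mask_logits[gt_class]  # masks of the other classes are ignored
    total, count = 0.0, 0
    for row_logits, row_target in zip(logits_k, gt_mask):
        for z, y in zip(row_logits, row_target):
            p = sigmoid(z)  # per-pixel sigmoid, so classes do not compete
            total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
            count += 1
    return total / count

def total_loss(l_cls, l_box, mask_logits, gt_class, gt_mask, is_positive):
    """L = Lcls + Lbox + Lmask, with Lmask counted only for positive RoIs."""
    l_mask = mask_loss(mask_logits, gt_class, gt_mask) if is_positive else 0.0
    return l_cls + l_box + l_mask
```

Because each class has its own sigmoid mask, the class label chosen by the classification branch selects which mask to use; this is what decoupling mask and class prediction means in practice.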

Region Proposal Networks
1. Propose k reference boxes (anchors) at each sliding-window position
2. Map each sliding window to a lower-dimensional feature
3.1. Feed the feature to the box-regression layer
3.2. Feed the feature to the box-classification layer
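Step 1 can be sketched as generating k reference boxes around one sliding-window center. The default scales and aspect ratios below (3 x 3, giving k = 9) follow the configuration reported in the Faster R-CNN paper and are an assumption here; the function name is hypothetical. In that design the regression layer would then emit 4k box offsets and the classification layer 2k object/background scores per position.

```python
import math

def make_anchors(cx, cy, scales=(128, 256, 512), ratios=(0.5, 1.0, 2.0)):
    """Return k = len(scales) * len(ratios) boxes (x1, y1, x2, y2) centered at (cx, cy)."""
    anchors = []
    for s in scales:
        for r in ratios:            # r is height / width
            w = s / math.sqrt(r)    # width and height chosen so the area stays s * s
            h = s * math.sqrt(r)
            anchors.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
    return anchors

anchors = make_anchors(0, 0)
print(len(anchors))  # number of reference boxes k at this position
```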

RoIPooling

RoIPooling - Differentiable

RoIPooling - SGD step
- Region-wise sampling to make mini-batches

RoIAlign
- RoIPool: misalignment problem
- RoIAlign: keep fractional RoI boundaries instead of rounding them, and apply bilinear interpolation (e.g. [x/16] -> x/16)
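The difference between [x/16] and x/16 can be sketched as follows, assuming a feature stride of 16 as on the slide. The helper names are hypothetical, and this samples a single point only; the full RoIAlign operator in the paper samples several points per output bin and aggregates them.

```python
import math

def roipool_coord(x, stride=16):
    """RoIPool-style mapping: image coordinate -> rounded feature-map index, i.e. [x/16]."""
    return int(math.floor(x / stride + 0.5))

def roialign_coord(x, stride=16):
    """RoIAlign-style mapping: keep the fractional feature-map coordinate x/16."""
    return x / stride

def bilinear_sample(feature, x, y):
    """Sample a 2-D feature map (list of rows) at a fractional location (x, y)."""
    h, w = len(feature), len(feature[0])
    x = min(max(x, 0.0), w - 1.0)   # clamp to the valid range
    y = min(max(y, 0.0), h - 1.0)
    x0, y0 = int(math.floor(x)), int(math.floor(y))
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    dx, dy = x - x0, y - y0
    top = feature[y0][x0] * (1 - dx) + feature[y0][x1] * dx
    bottom = feature[y1][x0] * (1 - dx) + feature[y1][x1] * dx
    return top * (1 - dy) + bottom * dy

# An RoI edge at image x = 100 maps to 6.25 on the feature map.
# Rounding to 6 corresponds to image x = 96, a 4-pixel shift.
print(roipool_coord(100), roialign_coord(100))
```

A shift of a few pixels matters little for classification but is large relative to a pixel-accurate mask, which is why interpolating at the exact fractional location helps the mask branch.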

Architecture
- Backbone: feature extraction
  - ResNet-50 or ResNet-101
  - ResNeXt-50 or ResNeXt-101
  - Feature Pyramid Network (FPN)
  - C4 or C5 (features taken from the final convolutional layer of the 4th or 5th stage)
- RoIAlign: aligning the extracted features with the input
  - no quantization -> bilinear interpolation
  - [x/16] -> x/16
- Head: bounding-box recognition & mask prediction
  - ResNet-C4 -> head includes the 9-layer 'res5' stage
  - FPN -> backbone already includes res5, so the head uses fewer filters

Dataset
- MS COCO
  - Object Instance Annotations
  - Object Keypoint Annotations

Experiment
- Mask R-CNN vs. FCIS+++
  - Mask R-CNN shows no artifacts on overlapping instances, unlike FCIS+++

Experiment
- RoIPool vs. RoIWarp vs. RoIAlign

Experiment
- Mask R-CNN vs. Faster R-CNN
  - Improves by 3~6%

Experiment
- Human Pose Estimation (Keypoint Detection)

Experiment
- Cityscapes
  - 1st on the Instance-Level Semantic Labeling Task

Q & A

Editor's Notes

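Ranked 1st on the 2016 Cityscapes dataset Instance-Level Semantic Labeling Task.
R-CNN extracts a feature map with a separate network pass for every object, so a lot of computation is duplicated -> RoIPool: extract a feature map for the whole image with a single network, then take the part of that feature map corresponding to each object.
Decoupling mask and class prediction is the key point.
Why does RoIPool use [x/16]? Where and how is bilinear interpolation used, and why?
What do AP50 and AP75 mean? They are described as IoU thresholds; does that mean roughly that a result is not processed once the overlapping area exceeds 50% or 75%?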
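On the AP50/AP75 question: in COCO-style evaluation, a prediction is counted as a correct match when its IoU (intersection over union) with a ground-truth instance is at least the threshold, so AP50 uses IoU >= 0.5 and AP75 uses the stricter IoU >= 0.75. The sketch below uses boxes for brevity; for mask AP the same ratio is computed over mask pixels. The function names are hypothetical.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def is_match(pred, gt, threshold):
    """A prediction matches a ground truth when IoU reaches the threshold."""
    return iou(pred, gt) >= threshold

pred, gt = (0, 0, 10, 6), (0, 0, 10, 10)
print(iou(pred, gt), is_match(pred, gt, 0.5), is_match(pred, gt, 0.75))
```

In this example the prediction counts toward AP50 but not toward AP75, so a higher threshold rewards tighter localization rather than discarding overlapping results.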
