We present a conceptually simple, flexible, and general framework for object instance segmentation. This approach efficiently detects objects in an image while simultaneously generating a high-quality segmentation mask for each instance. The method, called Mask R-CNN, extends Faster R-CNN by adding a branch for predicting an object mask in parallel with the existing branch for bounding box recognition. Mask R-CNN is simple to train and adds only a small overhead to Faster R-CNN, running at 5 fps. Moreover, Mask R-CNN is easy to generalize to other tasks, e.g., allowing us to estimate human poses in the same framework. We show top results in all three tracks of the COCO suite of challenges, including instance segmentation, bounding-box object detection, and person keypoint detection. Without tricks, Mask R-CNN outperforms all existing, single-model entries on every task, including the COCO 2016 challenge winners. We hope our simple and effective approach will serve as a solid baseline and help ease future research in instance-level recognition.
presentation: https://www.youtube.com/watch?v=FZePQKPEwoo (in Korean)
reference: He, Kaiming, et al. "Mask R-CNN." arXiv preprint arXiv:1703.06870 (2017).
Mask R-CNN
1. Mask R-CNN
CM Seminar 2017.09.01
Jaehyun Jun
Biointelligence Laboratory
Interdisciplinary Program in Neuroscience, Seoul National University
http://bi.snu.ac.kr
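R-CNN runs a separate network pass to extract a feature map for every object, so a lot of computation is duplicated.
-> RoIPool: compute a single feature map for the whole image with one network, then extract and use the part of that feature map corresponding to each object.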
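To make the RoIPool note concrete, here is a minimal pure-Python sketch, not the implementation from the paper or any library. It assumes a single-channel feature map stored as a list of rows, a stride of 16 between image pixels and feature-map cells, and an RoI that lies inside the image. The function name `roi_pool` and its parameters are illustrative choices.

```python
import math


def roi_pool(feature_map, roi, stride=16, out_size=2):
    """Max-pool the feature-map region under `roi` into an out_size x out_size grid.

    feature_map: 2D list (H x W), computed once for the whole image.
    roi: (x1, y1, x2, y2) in image pixel coordinates (assumed inside the image).
    """
    # Quantize image coordinates to feature-map cells by rounding x / stride.
    x1, y1, x2, y2 = (int(math.floor(v / stride + 0.5)) for v in roi)
    h = max(y2 - y1, 1)
    w = max(x2 - x1, 1)

    pooled = []
    for by in range(out_size):
        # Bin boundaries are quantized again (floor for start, ceil for end).
        ys = y1 + (by * h) // out_size
        ye = max(y1 + ((by + 1) * h + out_size - 1) // out_size, ys + 1)
        row = []
        for bx in range(out_size):
            xs = x1 + (bx * w) // out_size
            xe = max(x1 + ((bx + 1) * w + out_size - 1) // out_size, xs + 1)
            row.append(max(feature_map[y][x]
                           for y in range(ys, ye)
                           for x in range(xs, xe)))
        pooled.append(row)
    return pooled


# One shared 4 x 4 feature map, reused for every RoI.
fm = [[float(y * 4 + x) for x in range(4)] for y in range(4)]
whole = roi_pool(fm, (0, 0, 64, 64))
part = roi_pool(fm, (10, 10, 40, 40))
```

The feature map is computed once and each RoI only indexes into it, which is where the saving over per-object network passes comes from.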
The key idea is to decouple mask prediction from class prediction.
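One way to picture this decoupling is the sketch below. It assumes the mask branch outputs one mask of logits per class, that the loss is a per-pixel binary cross-entropy applied only to the mask of the ground-truth class, and that at inference the class branch chooses which mask to keep. The names `mask_loss` and `select_mask` are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def mask_loss(mask_logits, gt_class, gt_mask):
    """Average per-pixel binary cross-entropy on the ground-truth class mask only.

    mask_logits: list of K masks, each an m x m list of logits (one per class).
    gt_class: index k of the ground-truth class.
    gt_mask: m x m list of 0/1 targets.
    """
    logits = mask_logits[gt_class]  # masks of the other classes do not contribute
    total, count = 0.0, 0
    for logit_row, target_row in zip(logits, gt_mask):
        for z, t in zip(logit_row, target_row):
            # Numerically stable form of -(t*log(p) + (1-t)*log(1-p)), p = sigmoid(z).
            total += max(z, 0.0) - z * t + math.log1p(math.exp(-abs(z)))
            count += 1
    return total / count


def select_mask(mask_logits, predicted_class, threshold=0.5):
    """At inference, the class branch decides which of the K masks is binarized."""
    return [[1 if sigmoid(z) >= threshold else 0 for z in row]
            for row in mask_logits[predicted_class]]


gt = [[1, 0], [0, 1]]
logits = [
    [[0.0, 0.0], [0.0, 0.0]],        # class 0: uninformative mask
    [[10.0, -10.0], [-10.0, 10.0]],  # class 1: confident, correct mask
]
loss_class1 = mask_loss(logits, 1, gt)
loss_class0 = mask_loss(logits, 0, gt)
binary = select_mask(logits, 1)
```

Because each class has its own sigmoid mask, the masks do not compete across classes; the classification branch alone decides the label.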
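Why does RoIPool use [x/16]?
Where and how is bilinear interpolation used, and why?
What do AP50 and AP75 mean? They are said to be IoU thresholds, but does that mean something like detections are not processed once the overlapping region exceeds 50% or 75%…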
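Two of the computations behind these questions can be sketched in a few lines. In the paper, RoIAlign replaces the rounding of x/16 with bilinear interpolation, so features are read at exact fractional locations rather than snapped to the nearest cell. In the COCO metrics, AP50 and AP75 are average precision where a prediction counts as a correct match if its IoU with the ground truth is at least 0.5 or 0.75. The code below is an illustrative sketch under simplifying assumptions: a single-channel feature map, a sample point inside the map, and box IoU rather than mask IoU. The function names are hypothetical.

```python
import math


def bilinear_sample(feature_map, x, y):
    """Interpolate a 2D feature map (list of rows) at fractional location (x, y).

    Assumes 0 <= x <= W - 1 and 0 <= y <= H - 1.
    """
    h, w = len(feature_map), len(feature_map[0])
    x0, y0 = int(math.floor(x)), int(math.floor(y))
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    dx, dy = x - x0, y - y0
    top = (1 - dx) * feature_map[y0][x0] + dx * feature_map[y0][x1]
    bottom = (1 - dx) * feature_map[y1][x0] + dx * feature_map[y1][x1]
    return (1 - dy) * top + dy * bottom


def box_iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0


def is_match(pred, gt, iou_threshold):
    """A prediction counts as correct when its IoU reaches the threshold."""
    return box_iou(pred, gt) >= iou_threshold


fm = [[0.0, 10.0], [20.0, 30.0]]
center = bilinear_sample(fm, 0.5, 0.5)   # average of the four cells
quarter = bilinear_sample(fm, 0.25, 0.0)

pred_box = (0.0, 0.0, 10.0, 6.0)
gt_box = (0.0, 0.0, 10.0, 10.0)
iou = box_iou(pred_box, gt_box)
```

In this example the prediction would count as correct under the AP50 criterion but not under the stricter AP75 criterion, so a higher threshold rewards tighter localization.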