RWTH AACHEN
Media Informatics
Hojun Lim
1
Fast R-CNN
Paper review session 1
■ Comparison: R-CNN vs Fast R-CNN
■ Image Pyramid
■ Scale Invariance (Multi-scale)
■ Truncated SVD for replacing weights of FC layers
■ Performance Metric: Pascal VOC 2012 vs COCO
Outline
2
Comparison: R-CNN vs Fast R-CNN
3
■ R-CNN
□ Architecture
□ Classification
□ Regression (localization)
-> BBox encoding: reduces the answer space.
It can be reduced further by the variance trick (see the sketch below)
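A minimal NumPy sketch of this encoding, assuming the usual center/size parameterization with log-scaled width and height; the per-coordinate std values used for the 'variance trick' are illustrative assumptions, not values from the paper.

    import numpy as np

    def encode_bbox(proposal, gt, stds=(0.1, 0.1, 0.2, 0.2)):
        # proposal, gt: (center_x, center_y, width, height)
        px, py, pw, ph = proposal
        gx, gy, gw, gh = gt
        t = np.array([(gx - px) / pw,      # centre shift, in units of proposal width
                      (gy - py) / ph,      # centre shift, in units of proposal height
                      np.log(gw / pw),     # log scale change in width
                      np.log(gh / ph)])    # log scale change in height
        return t / np.array(stds)          # 'variance trick': whiten the regression targets

    # example: a proposal that is slightly off-centre and too small
    print(encode_bbox(proposal=(50, 50, 20, 20), gt=(55, 52, 30, 24)))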
Comparison: R-CNN vs Fast R-CNN
4
■ R-CNN
□ Drawbacks
□ Multi-stage training pipeline
(1) Fine-tune ConvNet on object proposals with log loss
(2) Fit SVMs to ConvNet features (replacing the softmax classifier)
(3) Learn bounding-box regressors
□ Training is expensive
□ A ConvNet forward pass for each warped region proposal (no shared computation)
□ Object detection is slow
Comparison: R-CNN vs Fast R-CNN
5
■ Fast R-CNN
□ Architecture
□ Single-stage training pipeline, combining:
(1) Log loss (classification)
(2) Smooth L1 (= Huber loss with delta = 1) for box regression
□ Multi-task loss for each RoI (see the sketch below):
L(p, u, t^u, v) = L_cls(p, u) + λ · [u ≥ 1] · L_loc(t^u, v)
The indicator [u ≥ 1] switches off the box loss for background RoIs (u = 0)
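A minimal NumPy sketch of the loss above for a single RoI; the function names and toy numbers are illustrative, not from the released implementation.

    import numpy as np

    def smooth_l1(x):
        # Smooth L1 (Huber with delta = 1): quadratic near zero, linear in the tails
        x = np.abs(x)
        return np.where(x < 1.0, 0.5 * x ** 2, x - 0.5)

    def multi_task_loss(p, u, t_u, v, lam=1.0):
        # p: softmax class probabilities (K+1,), u: true class (0 = background)
        # t_u: predicted box offsets for class u (4,), v: regression targets (4,)
        l_cls = -np.log(p[u])                  # log loss on the true class
        indicator = 1.0 if u >= 1 else 0.0     # [u >= 1]: background RoIs get no box loss
        l_loc = np.sum(smooth_l1(t_u - v))
        return l_cls + lam * indicator * l_loc

    # toy example: a foreground RoI (u = 3) with a slightly wrong box
    p = np.array([0.05, 0.05, 0.1, 0.7, 0.1])
    print(multi_task_loss(p, u=3,
                          t_u=np.array([0.1, -0.2, 0.05, 0.0]),
                          v=np.zeros(4)))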
Comparison: R-CNN vs Fast R-CNN
6
■ Fast R-CNN
□ Improvements
□ Feed the whole image through the ConvNet once (computation shared across RoIs)
□ RoI Pooling (no warping)
Backprop of RoI pooling (figure)
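A rough single-channel, forward-only sketch of RoI max pooling, assuming the 7x7 output grid used with the VGG-16 head; bin-edge rounding is simplified relative to the real implementation. In backprop, each output cell routes its gradient to the argmax location inside its bin, so one feature-map cell can receive gradients from several overlapping RoIs.

    import numpy as np

    def roi_max_pool(feature_map, roi, out_size=(7, 7)):
        # feature_map: 2-D conv features; roi: (x1, y1, x2, y2) in feature-map coords
        x1, y1, x2, y2 = roi
        oh, ow = out_size
        ys = np.linspace(y1, y2 + 1, oh + 1).astype(int)   # bin edges along y
        xs = np.linspace(x1, x2 + 1, ow + 1).astype(int)   # bin edges along x
        out = np.empty(out_size)
        for i in range(oh):
            for j in range(ow):
                ybin = slice(ys[i], max(ys[i + 1], ys[i] + 1))   # never let a bin be empty
                xbin = slice(xs[j], max(xs[j + 1], xs[j] + 1))
                out[i, j] = feature_map[ybin, xbin].max()
        return out

    fmap = np.random.rand(32, 32)
    print(roi_max_pool(fmap, roi=(4, 6, 20, 25)).shape)   # -> (7, 7)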
Comparison: R-CNN vs Fast R-CNN
7
■ Fast R-CNN
□ Limitations
□ The complete architecture depends on an external
region-proposal algorithm
□ A fixed number of RoIs (N = 64) has to be sampled
from each image during training
(see the sampling sketch after this slide's bullets)
□ Hard negative mining via IoU thresholds:
25% positives: IoU in [0.5, 1]
75% negatives: IoU in [0.1, 0.5)
□ Weakly addressed multi-scale invariance
□ Brute force (single fixed image resolution)
□ Image Pyramid: expensive
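A sketch of the fixed per-image RoI sampling mentioned above (N = 64, 25% positive / 75% negative by IoU); this is an illustrative simplification, not the paper's exact sampling code.

    import numpy as np

    def sample_rois(ious, batch_size=64, fg_fraction=0.25, rng=np.random):
        # ious: max IoU of each proposal with any ground-truth box
        fg_inds = np.where(ious >= 0.5)[0]                   # positives: IoU in [0.5, 1]
        bg_inds = np.where((ious >= 0.1) & (ious < 0.5))[0]  # negatives: IoU in [0.1, 0.5)
        n_fg = min(int(batch_size * fg_fraction), len(fg_inds))
        n_bg = min(batch_size - n_fg, len(bg_inds))
        keep = np.concatenate([rng.choice(fg_inds, n_fg, replace=False),
                               rng.choice(bg_inds, n_bg, replace=False)])
        labels = np.concatenate([np.ones(n_fg), np.zeros(n_bg)])   # 1 = fg, 0 = bg
        return keep, labels

    idx, lab = sample_rois(np.random.rand(2000))   # fake proposal IoUs for one image
    print(len(idx), lab.mean())                    # ~64 RoIs, ~25% positives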
Image Pyramid
8
[1] Image Pyramid (Gaussian Pyramid)
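A minimal grayscale sketch of a Gaussian pyramid, assuming a separable 5-tap binomial kernel as the Gaussian approximation; a real implementation would more likely call cv2.pyrDown.

    import numpy as np

    def gaussian_pyramid(img, levels=4):
        k = np.array([1.0, 4.0, 6.0, 4.0, 1.0])
        k /= k.sum()                                 # 5-tap binomial ~ Gaussian
        pyramid = [img.astype(float)]
        for _ in range(levels - 1):
            cur = pyramid[-1]
            # separable blur: rows first, then columns (zero-padded at the borders)
            blurred = np.apply_along_axis(lambda r: np.convolve(r, k, mode='same'), 1, cur)
            blurred = np.apply_along_axis(lambda c: np.convolve(c, k, mode='same'), 0, blurred)
            pyramid.append(blurred[::2, ::2])        # drop every second row and column
        return pyramid

    levels = gaussian_pyramid(np.random.rand(64, 64))
    print([p.shape for p in levels])   # (64, 64), (32, 32), (16, 16), (8, 8)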
Scale invariance
9
Scale invariance
10
Truncated SVD
11
Truncated SVD
12
Q. Why is it helpful to reduce the number of parameters?
A. Suppose (n, d, r) = (100, 100, 2): the full FC weight matrix has n · d = 10,000 parameters,
while the rank-r factorization, implemented as two smaller FC layers, has only r · (n + d) = 400
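A minimal NumPy sketch of the idea: factor the FC weight matrix W with a truncated SVD and implement it as two thinner FC layers, as Fast R-CNN does at test time; the random data here is only for illustration.

    import numpy as np

    n, d, r = 100, 100, 2
    W = np.random.randn(n, d)                 # original FC weights (n*d = 10,000 parameters)
    b = np.random.randn(n)

    U, S, Vt = np.linalg.svd(W, full_matrices=False)
    layer1 = np.diag(S[:r]) @ Vt[:r]          # (r x d): first thin FC layer, no bias
    layer2 = U[:, :r]                         # (n x r): second thin FC layer, keeps the bias

    x = np.random.randn(d)                    # an input feature vector
    y_full = W @ x + b                        # original layer
    y_svd = layer2 @ (layer1 @ x) + b         # compressed: r*(n + d) = 400 parameters

    print(W.size, layer1.size + layer2.size)  # 10000 vs 400
    print(np.abs(y_full - y_svd).max())       # approximation error of the rank-2 factorization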
Truncated SVD
13
2D dataset example
3D dataset example
Q&A
Thank you !
14

Editor's Notes

  • #2 MA-INF 2307 - Lab Vision
  • #3 Main idea of the paper (same as in the first talk) - review your goals - present & discuss your results - comment on your own implementation (what was available, what had to be done, what were the difficulties) -> 1. data preprocessing (parsing JSON), managing two independent projects, making the code work in general (since we have many variables here: FDA_mode, round, thresholding, and so on) - conclusion (e.g. strengths/weaknesses of the paper, potential future work) -> lack of time (for training) -> in fact, the self-supervised setting should include a multi-band average, but there was not enough time to do it; since this is where the main performance gain came from in FDA, an improvement is also expected for Intra.
  • #10 Explain the meaning of 'Domain adaptation': adapting a model trained with annotated samples from one distribution (source) to operate on a different (target) distribution for which no annotations are given. Our method does not require any training to perform the domain alignment, just a simple Fourier transform and its inverse. Despite its simplicity, it achieves state-of-the-art performance on the current benchmarks when integrated into a relatively standard semantic segmentation model. Much research has been proposed for 'domain adaptation'; however, state-of-the-art methods are complex.
  • #12 ⟨a|b⟩ = ⟨a| |b⟩ = a^{T}b (bra-ket notation for the inner product)