3. Introduction
• The specialized sibling head for both classification and localization
• Used in single-stage, two-stage, and anchor-free detectors alike
• Concern about a conflict between the two objective functions inside the sibling head
• IoU-Net (2018)
• Features that yield a good classification score tend to predict only a coarse bbox
• Adds an extra head that predicts the IoU as a localization confidence
• Aggregates the localization confidence and the classification confidence into the final classification score
• Raises the confidence score of tight bboxes and lowers that of poor ones
• However, the misalignment of the spatial points still remains
4. Introduction
• Double-Head R-CNN
• Splits the sibling head into separate classification and localization branches
• Reduces the parameters shared between the two tasks
• Performance improves, but since the features fed to both branches come from the same RoI pooling,
the conflict between the two tasks still remains
• Examining the sibling head of an anchor-based object detector
• Spatial sensitivity of classification and localization on each layer's feature map
• Classification focuses on salient areas / bbox regression focuses on object boundaries
• The misalignment in the spatial dimension limits further performance gains
6. Methods
• Task-aware Spatial Disentanglement (TSD)
• A rectangular bbox proposal 𝑃, the ground-truth bbox ℬ with class 𝑦
• Faster R-CNN : ℒ = ℒ_cls(ℋ₁(F_l, P), y) + ℒ_loc(ℋ₂(F_l, P), ℬ)
• ℋ₁(⋅) = {f(⋅), 𝒞(⋅)}, ℋ₂(⋅) = {f(⋅), ℛ(⋅)}
• f(⋅) : the feature extractor
• 𝒞(⋅) and ℛ(⋅) : the functions for transforming features to predict the specific category and localize the object
• A novel TSD head : ℒ = ℒ_cls^D(ℋ₁^D(F_l, P_c), y) + ℒ_loc^D(ℋ₂^D(F_l, P_r), ℬ)
• P_c = τ_c(P, ΔC), P_r = τ_r(P, ΔR), both derived from the shared P
• ΔC : a pointwise deformation of P
• ΔR : a proposal-wise translation
• ℋ₁^D(⋅) = {f_c(⋅), 𝒞(⋅)}, ℋ₂^D(⋅) = {f_r(⋅), ℛ(⋅)}
7. Methods
• Task-aware Spatial Disentanglement (TSD)
• TSD takes the RoI feature of P as input and generates the disentangled proposals P_c and P_r respectively
• The two tasks can be separated in the spatial dimension through these disentangled proposals
• F_c (classification-specific feature map) → a three-layer fully connected network for classification
• F_r (localization-specific feature map) → a three-layer fully connected network for localization
• Through this disentanglement, task-aware feature representations can be learned!
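The disentangled head above can be sketched as two parallel branches that share no parameters after RoI pooling. A minimal NumPy sketch follows; all dimensions, weight names, and the single-vector input shape are illustrative assumptions, not the paper's code:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative dimensions: a flattened 7x7x256 RoI feature, 81 classes (COCO + background).
D_IN, D_HID, N_CLS = 7 * 7 * 256, 256, 81

# Separate parameters per branch: f_c feeds the category predictor C,
# f_r feeds the box regressor R (i.e. H1_D = {f_c, C}, H2_D = {f_r, R}).
W_fc = rng.standard_normal((D_IN, D_HID)) * 0.01
W_fr = rng.standard_normal((D_IN, D_HID)) * 0.01
W_C = rng.standard_normal((D_HID, N_CLS)) * 0.01
W_R = rng.standard_normal((D_HID, 4)) * 0.01

def tsd_head(feat_c, feat_r):
    """feat_c: feature pooled from P_c; feat_r: feature pooled from P_r.
    Returns class scores from H1_D and box deltas from H2_D."""
    cls_scores = np.maximum(feat_c @ W_fc, 0.0) @ W_C   # f_c, then C
    box_deltas = np.maximum(feat_r @ W_fr, 0.0) @ W_R   # f_r, then R
    return cls_scores, box_deltas
```

The point of the sketch is structural: the two outputs depend on different pooled features and different weights, so neither task's gradient touches the other branch.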
8. Methods
• Task-aware spatial disentanglement learning
• Given F, the RoI feature of P, a deformation-learning manner is added to TSD
• Localization
• ℱ_r : generates a proposal-wise translation from P to produce the new proposal P_r
ΔR = γ · ℱ_r(F; θ_r) · (w, h)
• ΔR ∈ ℝ^{1×1×2}, and the per-layer outputs of ℱ_r are 256, 256, 2
• γ : a pre-defined scalar to modulate the magnitude of ΔR
• (w, h) : the width and height of P
• The proposal-wise translation
P_r = τ_r(P, ΔR) = P + ΔR
• The coordinates of every pixel in P are shifted to new coordinates by the same ΔR
• Applied only to the localization task
• Bilinear interpolation is applied so that ΔR stays differentiable
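As a concrete illustration, the proposal-wise shift is only a few lines. In the sketch below, `f_r_out` stands in for the 2-vector output of the ℱ_r network, and the γ value and box format are assumptions:

```python
def proposal_wise_translation(P, f_r_out, gamma=0.1):
    """Shift a whole proposal P = (x1, y1, x2, y2) by one learned offset.

    Delta R = gamma * F_r(F; theta_r) * (w, h); every coordinate of P
    moves by the same (dx, dy), so P_r = P + Delta R."""
    x1, y1, x2, y2 = P
    w, h = x2 - x1, y2 - y1
    dx = gamma * f_r_out[0] * w   # offset scaled by proposal width
    dy = gamma * f_r_out[1] * h   # offset scaled by proposal height
    return (x1 + dx, y1 + dy, x2 + dx, y2 + dy)

# For example, a (0, 0, 10, 20) box with f_r_out = (1.0, 0.5) and gamma = 0.1
# shifts by (1, 1) to (1, 1, 11, 21): size is preserved, only position changes.
```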
9. Methods
• Task-aware spatial disentanglement learning
• Given F, the RoI feature of P, a deformation-learning manner is added to TSD
• Classification
• ℱ_c : a pointwise deformation on a regular k × k grid to generate an irregularly shaped P_c
• For the (x, y)-th grid cell, the translation ΔC(x, y, ∗) is applied to obtain new sample points in P_c
ΔC = γ · ℱ_c(F; θ_c) · (w, h)
• ΔC ∈ ℝ^{k×k×2}
• ℱ_c : a three-layer fully connected network with outputs 256, 256, k × k × 2
• θ_c : a learned parameter
10. Methods
• Task-aware spatial disentanglement learning
• Given F, the RoI feature of P, a deformation-learning manner is added to TSD
• Classification
• The first layer of ℱ_r and ℱ_c is shared to reduce parameters
• To generate the feature map F_c from the irregular P_c, deformable RoI pooling is applied
F_c(x, y) = (1 / |G(x, y)|) · Σ_{p ∈ G(x, y)} ℱ_B(p_x + ΔC(x, y, 1), p_y + ΔC(x, y, 2))
• G(x, y) : the (x, y)-th grid cell, |G(x, y)| : the number of sample points in the cell
• (p_x, p_y) : the coordinates of a sample point in grid cell G(x, y)
• ℱ_B(⋅) : bilinear interpolation, which makes ΔC differentiable
https://arxiv.org/abs/1703.06211
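The pooling step can be sketched with plain bilinear sampling on a single-channel feature map. Grid size, sample counts, and function names below are illustrative assumptions:

```python
import numpy as np

def bilinear(feat, x, y):
    """F_B: bilinear interpolation on a 2-D feature map (keeps Delta C differentiable)."""
    H, W = feat.shape
    x, y = np.clip(x, 0, W - 1), np.clip(y, 0, H - 1)
    x0, y0 = int(x), int(y)
    x1, y1 = min(x0 + 1, W - 1), min(y0 + 1, H - 1)
    fx, fy = x - x0, y - y0
    top = feat[y0, x0] * (1 - fx) + feat[y0, x1] * fx
    bot = feat[y1, x0] * (1 - fx) + feat[y1, x1] * fx
    return top * (1 - fy) + bot * fy

def deformable_roi_pool(feat, roi, delta_c, k=2, n_samples=2):
    """F_c(x, y): average of F_B over the sample points of grid cell G(x, y),
    each shifted by the cell's learned offset delta_c[y, x] = DeltaC(x, y, *)."""
    x1, y1, x2, y2 = roi
    bin_w, bin_h = (x2 - x1) / k, (y2 - y1) / k
    out = np.zeros((k, k))
    for gy in range(k):
        for gx in range(k):
            vals = []
            for sy in range(n_samples):
                for sx in range(n_samples):
                    # Regular sample point of the cell, plus the pointwise deformation.
                    px = x1 + (gx + (sx + 0.5) / n_samples) * bin_w + delta_c[gy, gx, 0]
                    py = y1 + (gy + (sy + 0.5) / n_samples) * bin_h + delta_c[gy, gx, 1]
                    vals.append(bilinear(feat, px, py))
            out[gy, gx] = np.mean(vals)
    return out
```

With all offsets zero this reduces to ordinary average RoI pooling; nonzero offsets let each grid cell sample wherever the classifier finds the most discriminative evidence.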
11. Methods
• Progressive constraint
• Classification branch
ℳ_cls = |ℋ₁(y|F_l, P) − ℋ₁^D(y|F_l, τ_c(P, ΔC)) + m_c|₊
• ℋ(y|⋅) : the confidence score of the y-th class
• m_c : the predefined margin, |⋅|₊ : the ReLU function
• Localization branch
ℳ_loc = |IoU(ℬ̂, ℬ) − IoU(ℬ̂_D, ℬ) + m_r|₊
• ℬ̂ : the box predicted by the sibling head
• ℬ̂_D : the box regressed by ℋ₂^D(F_l, τ_r(P, ΔR)); if P is a negative proposal, ℳ_loc is ignored
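Both margins can be written directly from the formulas above. The 0.2 margins match the ablation setting used later in these notes; the box format and function names are assumptions:

```python
def iou(a, b):
    """Intersection-over-union of two boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter)

def pc_cls(score_sibling, score_tsd, m_c=0.2):
    """M_cls = |H1(y|F_l, P) - H1_D(y|F_l, P_c) + m_c|_+ :
    nonzero unless the TSD confidence beats the sibling head by at least m_c."""
    return max(0.0, score_sibling - score_tsd + m_c)

def pc_loc(box_sibling, box_tsd, gt, m_r=0.2, is_negative=False):
    """M_loc = |IoU(B_hat, B) - IoU(B_hat_D, B) + m_r|_+ ;
    ignored when P is a negative proposal."""
    if is_negative:
        return 0.0
    return max(0.0, iou(box_sibling, gt) - iou(box_tsd, gt) + m_r)
```

The hinge form means the gradient only pushes the TSD branch while its margin over the sibling head is smaller than m_c (or m_r); once the gap is large enough, the constraint goes silent.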
13. Experiments
• Dataset
• 80-category MS-COCO dataset
• 80k train images & 35k subset of val images & 5k val images for test & 20k test-dev
• 500-category OpenImageV5 challenge dataset
• 1,674,979 training images & 34,917 val images
• AP.5 on public leaderboard
• Implementation details
• ImageNet pre-trained models / hyper-parameters of Faster R-CNN
• Resize such that the shorter edge is 800 pixels / anchor scale = 8 / aspect ratio = {0.5, 1, 2}
• RoIAlign / the pooling size is 7 in both ℋ₁* and ℋ₂* / …
14. Experiments
• Ablation studies
• 𝑚 𝑐 = 𝑚 𝑟 = 0.2
• Task-aware disentanglement
• Experiments with various decoupling options in the backbone and the head (Fig. 3)
• Decoupling in the backbone sharply degrades performance (D_s8, D_s16, D_s32)
• The semantic information in the backbone should be shared
• Compared with D_head, TSD w/o PC gives a slight improvement
15. Experiments
• Ablation studies
• 𝑚 𝑐 = 𝑚 𝑟 = 0.2
• Joint training with sibling head ℋ∗
• What happens if TSD and the sibling head are trained together?
• P_c and P_r do not conflict with the original proposal P! (Tab. 2)
• Effectiveness of PC
• PC is proposed to further improve TSD
• AP.75 improves by as much as 1.5, while AP.5 is barely affected (Tab. 3)
• PC induces more accurate classification and regression
• AP over IoU 0.5:0.95 improves by 1.3
16. Experiments
• Ablation studies
• 𝑚 𝑐 = 𝑚 𝑟 = 0.2
• Derived proposal learning manner for ℋ*_D
• Various combinations of the ways P_r and P_c are computed are tested (Tab. 4)
• Point.w has a clear benefit for classification, and works even better together with PC
• Prop.w brings a slight improvement for localization
• Classification needs optimal local features free of shape constraints
• Regression needs the global geometric shape information to be preserved
• Delving to the effective PC
• Ablation study on PC margin values (Fig. 4)
• Both ℳ_loc and ℳ_cls improve performance
21. Experiments
• Analysis and discussion
• Performance in different IoU criteria
• As the IoU threshold increases, the performance gap grows (Fig. 6)
• Performance in different scale criteria
• Also verified while varying the AP criteria (Tab. 9)
22. Experiments
• Analysis and discussion
• What did TSD learn?
• Reduces false positives and predicts bboxes more accurately
• 𝑃𝑟 : translate to the boundary / 𝑃𝑐 : concentrate on the local appearance and object context information
23. Conclusion
• Conclusion
• Task-aware spatial disentanglement (TSD)
• To alleviate the inherent conflict in the sibling head
• To learn task-aware spatial disentanglement to break through the performance limitation
• Progressive Constraint (PC)
• To enlarge the performance margin between the disentangled and the shared proposals
• To provide additional performance gain