3. Introduction
• The specialized sibling head for both classification and localization
• Used in single-stage, two-stage, and anchor-free detectors alike
• Concern about a conflict between the two objective functions inside the sibling head
• IoU-Net (2018)
• Features that yield a good classification score tend to predict only a coarse bbox
• Adds an extra head that predicts the IoU as a localization confidence
• Aggregates the localization confidence and the classification confidence into the final classification score
• Raises the confidence score of tight bboxes and lowers that of poor ones
• However, the misalignment of the spatial points still remains
4. Introduction
• Double-Head R-CNN
• Splits the sibling head into separate classification and localization branches
• Reduces the parameters shared between the two tasks
• Performance improves, but since the features fed to both branches come from the same RoI pooling,
the conflict between the two tasks still remains
• Examining the sibling head of an anchor-based object detector
• Spatial sensitivity of classification and localization on each layer's feature map
• Classification focuses on salient areas / bbox regression focuses on object boundaries
• The misalignment in the spatial dimension limits further performance gains
6. Methods
• Task-aware Spatial Disentanglement (TSD)
• A rectangular bbox proposal 𝑃, the ground-truth bbox ℬ with class 𝑦
• Faster R-CNN : ℒ = ℒ_cls(ℋ₁(F_l, P), y) + ℒ_loc(ℋ₂(F_l, P), ℬ)
• ℋ₁(⋅) = {f(⋅), 𝒞(⋅)}, ℋ₂(⋅) = {f(⋅), ℛ(⋅)}
• f(⋅) : the feature extractor
• 𝒞(⋅) and ℛ(⋅) : the functions for transforming features to predict the specific category and localize the object
• A novel TSD head : ℒ = ℒ_cls^D(ℋ₁^D(F_l, P_c), y) + ℒ_loc^D(ℋ₂^D(F_l, P_r), ℬ)
• P_c = τ_c(P, ΔC), P_r = τ_r(P, ΔR), both derived from the shared P
• ΔC : a pointwise deformation of P
• ΔR : a proposal-wise translation
• ℋ₁^D(⋅) = {f_c(⋅), 𝒞(⋅)}, ℋ₂^D(⋅) = {f_r(⋅), ℛ(⋅)}
7. Methods
• Task-aware Spatial Disentanglement (TSD)
• TSD takes the RoI feature of P as input and generates the disentangled proposals P_c and P_r respectively
• The two tasks can be separated in the spatial dimension through these disentangled proposals
• F_c (classification-specific feature map) → a three-layer fully connected network for classification
• F_r (localization-specific feature map) → a three-layer fully connected network for localization
• Through this disentanglement, task-aware feature representations can be learned!
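The disentangled head above can be sketched as two parallel branches that share no parameters after RoI pooling. A minimal NumPy sketch follows; all dimensions, weight names, and the single-vector input shape are illustrative assumptions, not the paper's code:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative dimensions: a flattened 7x7x256 RoI feature, 81 classes (COCO + background).
D_IN, D_HID, N_CLS = 7 * 7 * 256, 256, 81

# Separate parameters per branch: f_c feeds the category predictor C,
# f_r feeds the box regressor R (i.e. H1_D = {f_c, C}, H2_D = {f_r, R}).
W_fc = rng.standard_normal((D_IN, D_HID)) * 0.01
W_fr = rng.standard_normal((D_IN, D_HID)) * 0.01
W_C = rng.standard_normal((D_HID, N_CLS)) * 0.01
W_R = rng.standard_normal((D_HID, 4)) * 0.01

def tsd_head(feat_c, feat_r):
    """feat_c: feature pooled from P_c; feat_r: feature pooled from P_r.
    Returns class scores from H1_D and box deltas from H2_D."""
    cls_scores = np.maximum(feat_c @ W_fc, 0.0) @ W_C   # f_c, then C
    box_deltas = np.maximum(feat_r @ W_fr, 0.0) @ W_R   # f_r, then R
    return cls_scores, box_deltas
```

The point of the sketch is structural: the two outputs depend on different pooled features and different weights, so neither task's gradient touches the other branch.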
8. Methods
• Task-aware spatial disentanglement learning
• Given F, the RoI feature of P, a deformation-learning manner is added to TSD
• Localization
• ℱ_r : generates a proposal-wise translation from P to produce the new proposal P_r
ΔR = γ · ℱ_r(F; θ_r) · (w, h)
• ΔR ∈ ℝ^{1×1×2}, and the per-layer outputs of ℱ_r are 256, 256, 2
• γ : a pre-defined scalar to modulate the magnitude of ΔR
• (w, h) : the width and height of P
• The proposal-wise translation
P_r = τ_r(P, ΔR) = P + ΔR
• The coordinates of every pixel in P are shifted to new coordinates by the same ΔR
• Applied only to the localization task
• Bilinear interpolation is applied so that ΔR stays differentiable
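As a concrete illustration, the proposal-wise shift is only a few lines. In the sketch below, `f_r_out` stands in for the 2-vector output of the ℱ_r network, and the γ value and box format are assumptions:

```python
def proposal_wise_translation(P, f_r_out, gamma=0.1):
    """Shift a whole proposal P = (x1, y1, x2, y2) by one learned offset.

    Delta R = gamma * F_r(F; theta_r) * (w, h); every coordinate of P
    moves by the same (dx, dy), so P_r = P + Delta R."""
    x1, y1, x2, y2 = P
    w, h = x2 - x1, y2 - y1
    dx = gamma * f_r_out[0] * w   # offset scaled by proposal width
    dy = gamma * f_r_out[1] * h   # offset scaled by proposal height
    return (x1 + dx, y1 + dy, x2 + dx, y2 + dy)

# For example, a (0, 0, 10, 20) box with f_r_out = (1.0, 0.5) and gamma = 0.1
# shifts by (1, 1) to (1, 1, 11, 21): size is preserved, only position changes.
```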
9. Methods
• Task-aware spatial disentanglement learning
• Given F, the RoI feature of P, a deformation-learning manner is added to TSD
• Classification
• ℱ_c : a pointwise deformation on a regular k × k grid to generate an irregularly shaped P_c
• For the (x, y)-th grid cell, the translation ΔC(x, y, ∗) is applied to obtain new sample points in P_c
ΔC = γ · ℱ_c(F; θ_c) · (w, h)
• ΔC ∈ ℝ^{k×k×2}
• ℱ_c : a three-layer fully connected network with outputs 256, 256, k × k × 2
• θ_c : a learned parameter
10. Methods
• Task-aware spatial disentanglement learning
• Given F, the RoI feature of P, a deformation-learning manner is added to TSD
• Classification
• The first layer of ℱ_r and ℱ_c is shared to reduce parameters
• To generate the feature map F_c from the irregular P_c, deformable RoI pooling is applied
F_c(x, y) = (1 / |G(x, y)|) · Σ_{p ∈ G(x, y)} ℱ_B(p_x + ΔC(x, y, 1), p_y + ΔC(x, y, 2))
• G(x, y) : the (x, y)-th grid cell, |G(x, y)| : the number of sample points in the cell
• (p_x, p_y) : the coordinates of a sample point in grid cell G(x, y)
• ℱ_B(⋅) : bilinear interpolation, which makes ΔC differentiable
https://arxiv.org/abs/1703.06211
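The pooling step can be sketched with plain bilinear sampling on a single-channel feature map. Grid size, sample counts, and function names below are illustrative assumptions:

```python
import numpy as np

def bilinear(feat, x, y):
    """F_B: bilinear interpolation on a 2-D feature map (keeps Delta C differentiable)."""
    H, W = feat.shape
    x, y = np.clip(x, 0, W - 1), np.clip(y, 0, H - 1)
    x0, y0 = int(x), int(y)
    x1, y1 = min(x0 + 1, W - 1), min(y0 + 1, H - 1)
    fx, fy = x - x0, y - y0
    top = feat[y0, x0] * (1 - fx) + feat[y0, x1] * fx
    bot = feat[y1, x0] * (1 - fx) + feat[y1, x1] * fx
    return top * (1 - fy) + bot * fy

def deformable_roi_pool(feat, roi, delta_c, k=2, n_samples=2):
    """F_c(x, y): average of F_B over the sample points of grid cell G(x, y),
    each shifted by the cell's learned offset delta_c[y, x] = DeltaC(x, y, *)."""
    x1, y1, x2, y2 = roi
    bin_w, bin_h = (x2 - x1) / k, (y2 - y1) / k
    out = np.zeros((k, k))
    for gy in range(k):
        for gx in range(k):
            vals = []
            for sy in range(n_samples):
                for sx in range(n_samples):
                    # Regular sample point of the cell, plus the pointwise deformation.
                    px = x1 + (gx + (sx + 0.5) / n_samples) * bin_w + delta_c[gy, gx, 0]
                    py = y1 + (gy + (sy + 0.5) / n_samples) * bin_h + delta_c[gy, gx, 1]
                    vals.append(bilinear(feat, px, py))
            out[gy, gx] = np.mean(vals)
    return out
```

With all offsets zero this reduces to ordinary average RoI pooling; nonzero offsets let each grid cell sample wherever the classifier finds the most discriminative evidence.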
11. Methods
• Progressive constraint
• Classification branch
ℳ_cls = |ℋ₁(y|F_l, P) − ℋ₁^D(y|F_l, τ_c(P, ΔC)) + m_c|₊
• ℋ(y|⋅) : the confidence score of the y-th class
• m_c : the predefined margin, |⋅|₊ : the ReLU function
• Localization branch
ℳ_loc = |IoU(ℬ̂, ℬ) − IoU(ℬ̂_D, ℬ) + m_r|₊
• ℬ̂ : the box predicted by the sibling head
• ℬ̂_D : the box regressed by ℋ₂^D(F_l, τ_r(P, ΔR)); if P is a negative proposal, ℳ_loc is ignored
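Both margins can be written directly from the formulas above. The 0.2 margins match the ablation setting used later in these notes; the box format and function names are assumptions:

```python
def iou(a, b):
    """Intersection-over-union of two boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter)

def pc_cls(score_sibling, score_tsd, m_c=0.2):
    """M_cls = |H1(y|F_l, P) - H1_D(y|F_l, P_c) + m_c|_+ :
    nonzero unless the TSD confidence beats the sibling head by at least m_c."""
    return max(0.0, score_sibling - score_tsd + m_c)

def pc_loc(box_sibling, box_tsd, gt, m_r=0.2, is_negative=False):
    """M_loc = |IoU(B_hat, B) - IoU(B_hat_D, B) + m_r|_+ ;
    ignored when P is a negative proposal."""
    if is_negative:
        return 0.0
    return max(0.0, iou(box_sibling, gt) - iou(box_tsd, gt) + m_r)
```

The hinge form means the gradient only pushes the TSD branch while its margin over the sibling head is smaller than m_c (or m_r); once the gap is large enough, the constraint goes silent.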
13. Experiments
• Dataset
• 80-category MS-COCO dataset
• 80k train images & 35k subset of val images & 5k val images for test & 20k test-dev
• 500-category OpenImageV5 challenge dataset
• 1,674,979 training images & 34,917 val images
• AP.5 on public leaderboard
• Implementation details
• ImageNet pre-trained models / hyper-parameters of Faster R-CNN
• Resize such that the shorter edge is 800 pixels / anchor scale = 8 / aspect ratio = {0.5, 1, 2}
• RoIAlign / the pooling size is 7 in both ℋ₁* and ℋ₂* / …
14. Experiments
• Ablation studies
• 𝑚 𝑐 = 𝑚 𝑟 = 0.2
• Task-aware disentanglement
• Experiments with various decoupling options in the backbone and the head (Fig. 3)
• Decoupling in the backbone sharply degrades performance (D_s8, D_s16, D_s32)
• The semantic information in the backbone should be shared
• Compared with D_head, TSD w/o PC gives a slight improvement
15. Experiments
• Ablation studies
• 𝑚 𝑐 = 𝑚 𝑟 = 0.2
• Joint training with sibling head ℋ∗
• What happens if TSD and the sibling head are trained together?
• P_c and P_r do not conflict with the original proposal P! (Tab. 2)
• Effectiveness of PC
• PC is proposed to further improve TSD
• AP.75 improves by as much as 1.5, while AP.5 is barely affected (Tab. 3)
• PC induces more accurate classification and regression
• AP over IoU 0.5:0.95 improves by 1.3
16. Experiments
• Ablation studies
• 𝑚 𝑐 = 𝑚 𝑟 = 0.2
• Derived proposal learning manner for ℋ*_D
• Various combinations of the ways P_r and P_c are computed are tested (Tab. 4)
• Point.w has a clear benefit for classification, and works even better together with PC
• Prop.w brings a slight improvement for localization
• Classification needs optimal local features free of shape constraints
• Regression needs the global geometric shape information to be preserved
• Delving to the effective PC
• Ablation study on PC margin values (Fig. 4)
• Both ℳ_loc and ℳ_cls improve performance
21. Experiments
• Analysis and discussion
• Performance in different IoU criteria
• As the IoU threshold increases, the performance gap grows (Fig. 6)
• Performance in different scale criteria
• Also verified while varying the AP criteria (Tab. 9)
22. Experiments
• Analysis and discussion
• What did TSD learn?
• Reduces false positives and predicts bboxes more accurately
• 𝑃𝑟 : translate to the boundary / 𝑃𝑐 : concentrate on the local appearance and object context information
23. Conclusion
• Conclusion
• Task-aware spatial disentanglement (TSD)
• To alleviate the inherent conflict in the sibling head
• To learn task-aware spatial disentanglement to break through the performance limitation
• Progressive Constraint (PC)
• To enlarge the performance margin between the disentangled and the shared proposals
• To provide additional performance gain