Revisiting the Sibling Head in Object Detector
Sungchul Kim
Contents
1. Introduction
2. Methods
3. Experiments
4. Conclusion
Introduction
• The specialized sibling head is used for both classification and localization
• Adopted by single-stage, two-stage, and anchor-free detectors alike
• Concern: the two objective functions inside the sibling head conflict with each other
• IoU-Net (2018)
• Features that produce a good classification score tend to predict only coarse bboxes
• Adds an extra head that predicts the IoU as a localization confidence
• Aggregates the localization confidence and the classification confidence into the final classification score
• Raises the confidence score of tight bboxes and suppresses that of poor ones
• However, the misalignment between spatial points still remains
Introduction
• Double-Head R-CNN
• Splits the sibling head into separate classification and localization branches
• Reduces the parameters shared between the two tasks
• Performance improves, but since the features fed to both branches come from the same RoI pooling, the conflict between the two tasks still remains
• Let's revisit the sibling head of anchor-based object detectors
• Spatial sensitivity of classification and localization on the feature map of each layer
• Classification : some salient areas / bbox regression : boundaries
• This misalignment in the spatial dimension limits further performance gains
Introduction
• Task-aware Spatial Disentanglement (TSD)
• Spatially disentangle the gradient flows of classification and localization
• Lets the two tasks gradually find their optimal locations without compromising each other
• Progressive constraint (PC) : enlarges the performance margin between TSD and the classical sibling head
• Uses a hyper-parameter margin
Methods
• Task-aware Spatial Disentanglement (TSD)
• A rectangular bbox proposal 𝑃, the ground-truth bbox ℬ with class 𝑦
• Faster R-CNN : $\mathcal{L} = \mathcal{L}_{cls}(\mathcal{H}_1(F_l, P), y) + \mathcal{L}_{loc}(\mathcal{H}_2(F_l, P), \mathcal{B})$
• $\mathcal{H}_1(\cdot) = \{f(\cdot), \mathcal{C}(\cdot)\}$, $\mathcal{H}_2(\cdot) = \{f(\cdot), \mathcal{R}(\cdot)\}$
• $f(\cdot)$ : the feature extractor
• $\mathcal{C}(\cdot)$ and $\mathcal{R}(\cdot)$ : the functions that transform features to predict the specific category and localize the object
• A novel TSD head : $\mathcal{L} = \mathcal{L}_{cls}^D(\mathcal{H}_1^D(F_l, \hat{P}_c), y) + \mathcal{L}_{loc}^D(\mathcal{H}_2^D(F_l, \hat{P}_r), \mathcal{B})$
• $\hat{P}_c = \tau_c(P, \Delta C)$, $\hat{P}_r = \tau_r(P, \Delta R)$, both derived from the shared $P$
• $\Delta C$ : a point-wise deformation of $P$
• $\Delta R$ : a proposal-wise translation
• $\mathcal{H}_1^D(\cdot) = \{f_c(\cdot), \mathcal{C}(\cdot)\}$, $\mathcal{H}_2^D(\cdot) = \{f_r(\cdot), \mathcal{R}(\cdot)\}$
Methods
• Task-aware Spatial Disentanglement (TSD)
• A rectangular bbox proposal 𝑃, the ground-truth bbox ℬ with class 𝑦
• A novel TSD head : $\mathcal{L} = \mathcal{L}_{cls}^D(\mathcal{H}_1^D(F_l, \hat{P}_c), y) + \mathcal{L}_{loc}^D(\mathcal{H}_2^D(F_l, \hat{P}_r), \mathcal{B})$
• $\hat{P}_c = \tau_c(P, \Delta C)$, $\hat{P}_r = \tau_r(P, \Delta R)$, both derived from the shared $P$
• $\Delta C$ : a point-wise deformation of $P$
• $\Delta R$ : a proposal-wise translation
• $\mathcal{H}_1^D(\cdot) = \{f_c(\cdot), \mathcal{C}(\cdot)\}$, $\mathcal{H}_2^D(\cdot) = \{f_r(\cdot), \mathcal{R}(\cdot)\}$
• TSD takes the RoI feature of $P$ as input and generates the disentangled proposals $\hat{P}_c$ and $\hat{P}_r$
• The two tasks can thus be separated in the spatial dimension through their own proposals
• $\hat{F}_c$ (classification-specific feature map) → a three-layer fully connected network for classification
• $\hat{F}_r$ (localization-specific feature map) → the same for localization
• By disentangling them, the model can learn task-aware feature representations! (see the head sketch below)
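A minimal PyTorch-style sketch of this head structure, under the assumptions stated in the comments; module and attribute names (TSDHead, delta_c_fc, ...) are illustrative, not the authors' implementation, and pooling the task-specific features from $\hat{P}_c$ / $\hat{P}_r$ is assumed to happen outside the module.

```python
import torch
import torch.nn as nn

class TSDHead(nn.Module):
    """Hedged sketch of a TSD-style head: one branch predicts the deformations
    (Delta_C, Delta_R) from the shared proposal P, two disentangled branches
    consume the features pooled from the derived proposals."""
    def __init__(self, roi_dim=256 * 7 * 7, hidden=1024, num_classes=81, k=7):
        super().__init__()
        # num_classes=81 assumes 80 COCO classes + background.
        # Shared first layer of F_c / F_r (parameter sharing, see slide 10),
        # then separate branches predicting Delta_C (k*k*2) and Delta_R (2).
        self.shared = nn.Sequential(nn.Linear(roi_dim, 256), nn.ReLU())
        self.delta_c_fc = nn.Sequential(nn.Linear(256, 256), nn.ReLU(), nn.Linear(256, k * k * 2))
        self.delta_r_fc = nn.Sequential(nn.Linear(256, 256), nn.ReLU(), nn.Linear(256, 2))
        # Disentangled heads: f_c(.) -> C(.) for classification, f_r(.) -> R(.) for localization.
        self.f_c = nn.Sequential(nn.Linear(roi_dim, hidden), nn.ReLU(), nn.Linear(hidden, hidden), nn.ReLU())
        self.f_r = nn.Sequential(nn.Linear(roi_dim, hidden), nn.ReLU(), nn.Linear(hidden, hidden), nn.ReLU())
        self.cls_score = nn.Linear(hidden, num_classes)      # C(.)
        self.bbox_pred = nn.Linear(hidden, num_classes * 4)  # R(.)

    def predict_deformations(self, roi_feat_p: torch.Tensor):
        """From the RoI feature of the shared proposal P, predict the raw
        point-wise (Delta_C) and proposal-wise (Delta_R) offsets."""
        s = self.shared(roi_feat_p.flatten(1))
        return self.delta_c_fc(s), self.delta_r_fc(s)

    def forward(self, feat_pc: torch.Tensor, feat_pr: torch.Tensor):
        """feat_pc / feat_pr: features pooled from the derived proposals
        P_hat_c and P_hat_r. Returns class logits and box regression deltas."""
        cls_logits = self.cls_score(self.f_c(feat_pc.flatten(1)))
        bbox_deltas = self.bbox_pred(self.f_r(feat_pr.flatten(1)))
        return cls_logits, bbox_deltas
```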
Methods
• Task-aware spatial disentanglement learning
• Given $F$, the RoI feature of $P$, a deformation-learning manner is added to TSD
• Localization
• $\mathcal{F}_r$ : generates a proposal-wise translation from $P$ to produce the new proposal $\hat{P}_r$
$\Delta R = \gamma \, \mathcal{F}_r(F; \theta_r) \cdot (w, h)$
• $\Delta R \in \mathbb{R}^{1 \times 1 \times 2}$, and the output of $\mathcal{F}_r$ for each layer : 256, 256, 2
• $\gamma$ : a pre-defined scalar that modulates the magnitude of $\Delta R$
• $w, h$ : the width and height of $P$
• The proposal-wise translation
$\hat{P}_r = \tau_r(P, \Delta R) = P + \Delta R$
• Every pixel coordinate of $P$ is shifted to a new coordinate by the same $\Delta R$
• Applied only to the localization task
• Bilinear interpolation is applied so that $\Delta R$ stays differentiable (a sketch of this translation follows below)
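A hedged sketch of how $\Delta R$ and the translated proposal $\hat{P}_r$ could be computed, assuming proposals are (x1, y1, x2, y2) boxes; `delta_r` is the raw 2-D output of $\mathcal{F}_r$ and the function name is illustrative.

```python
import torch

def translate_proposals(proposals: torch.Tensor, delta_r: torch.Tensor, gamma: float) -> torch.Tensor:
    """proposals: (N, 4) boxes (x1, y1, x2, y2); delta_r: (N, 2) raw output of F_r;
    gamma: the pre-defined scalar that modulates the magnitude of Delta_R."""
    w = proposals[:, 2] - proposals[:, 0]
    h = proposals[:, 3] - proposals[:, 1]
    # Delta_R = gamma * F_r(F; theta_r) * (w, h): scale the normalized offset by the proposal size.
    offset = gamma * delta_r * torch.stack([w, h], dim=1)        # (N, 2)
    # P_hat_r = tau_r(P, Delta_R) = P + Delta_R: every pixel of P shifts by the
    # same offset, so both box corners move rigidly by (dx, dy).
    return proposals + torch.cat([offset, offset], dim=1)        # (N, 4)
```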
Methods
• Task-aware spatial disentanglement learning
• Given $F$, the RoI feature of $P$, a deformation-learning manner is added to TSD
• Classification
• $\mathcal{F}_c$ : a point-wise deformation on a regular $k \times k$ grid to generate an irregularly shaped $\hat{P}_c$
• For the (x, y)-th grid cell, the translation $\Delta C(x, y, *)$ is applied to obtain the new sample point of $\hat{P}_c$
$\Delta C = \gamma \, \mathcal{F}_c(F; \theta_c) \cdot (w, h)$
• $\Delta C \in \mathbb{R}^{k \times k \times 2}$
• $\mathcal{F}_c$ : a three-layer fully connected network with outputs 256, 256, $k \times k \times 2$
• $\theta_c$ : a learned parameter
Methods
• Task-aware spatial disentanglement learning
• Given $F$, the RoI feature of $P$, a deformation-learning manner is added to TSD
• Classification
• The first layers of $\mathcal{F}_r$ and $\mathcal{F}_c$ are shared to reduce parameters
• To compute the feature map $\hat{F}_c$ from the irregular $\hat{P}_c$, deformable RoI pooling is applied (a pooling sketch follows after the reference below)
$\hat{F}_c(x, y) = \sum_{p \in G(x, y)} \frac{\mathcal{F}_B\big(p_x + \Delta C(x, y, 1),\; p_y + \Delta C(x, y, 2)\big)}{|G(x, y)|}$
• $G(x, y)$ : the (x, y)-th grid cell, $|G(x, y)|$ : the number of sample points in the grid
• $p_x, p_y$ : the coordinates of a sample point in grid $G(x, y)$
• $\mathcal{F}_B(\cdot)$ : bilinear interpolation that makes $\Delta C$ differentiable
https://arxiv.org/abs/1703.06211
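A rough, hedged sketch of this point-wise deformation and pooling for one proposal, using `grid_sample` as the bilinear interpolation $\mathcal{F}_B$. It samples one point per grid cell (a simplification of the averaged sum above), and all names are illustrative.

```python
import torch
import torch.nn.functional as F

def deformable_roi_pool(feat: torch.Tensor, box: torch.Tensor, delta_c: torch.Tensor,
                        gamma: float, k: int = 7) -> torch.Tensor:
    """feat: (C, H, W) feature map; box: (4,) = (x1, y1, x2, y2) in feature-map coords;
    delta_c: (k, k, 2) raw point-wise output of F_c; gamma: pre-defined scalar."""
    x1, y1, x2, y2 = box.unbind()
    w, h = x2 - x1, y2 - y1
    # Regular k x k grid-cell centres inside the proposal P.
    xs = x1 + (torch.arange(k, dtype=feat.dtype) + 0.5) * w / k
    ys = y1 + (torch.arange(k, dtype=feat.dtype) + 0.5) * h / k
    gy, gx = torch.meshgrid(ys, xs, indexing="ij")                 # (k, k) each
    # Delta_C = gamma * F_c(F; theta_c) * (w, h): shift each grid point irregularly.
    px = gx + gamma * delta_c[..., 0] * w
    py = gy + gamma * delta_c[..., 1] * h
    # Bilinear interpolation F_B(.) via grid_sample keeps Delta_C differentiable.
    H, W = feat.shape[-2:]
    grid = torch.stack([px / (W - 1) * 2 - 1, py / (H - 1) * 2 - 1], dim=-1)  # normalize to [-1, 1]
    pooled = F.grid_sample(feat[None], grid[None], align_corners=True)        # (1, C, k, k)
    return pooled[0]                                                          # F_hat_c: (C, k, k)
```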
Methods
• Progressive constraint
• Classification branch
$\mathcal{M}_{cls} = \big| \mathcal{H}_1(y \mid F_l, P) - \mathcal{H}_1^D(y \mid F_l, \tau_c(P, \Delta C)) + m_c \big|_+$
• $\mathcal{H}(y \mid \cdot)$ : the confidence score of the $y$-th class
• $m_c$ : the predefined margin, $|\cdot|_+$ : the ReLU function
• Localization branch
$\mathcal{M}_{loc} = \big| IoU(\hat{\mathcal{B}}, \mathcal{B}) - IoU(\hat{\mathcal{B}}_D, \mathcal{B}) + m_r \big|_+$
• $\hat{\mathcal{B}}$ : the box predicted by the sibling head
• $\hat{\mathcal{B}}_D$ : regressed from $\mathcal{H}_2^D(F_l, \tau_r(P, \Delta R))$; if $P$ is a negative proposal, $\mathcal{M}_{loc}$ is ignored (a sketch of both terms follows below)
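A minimal sketch of the two PC terms, assuming the class-$y$ confidences from both heads and the IoUs of the predicted boxes against the ground truth are computed elsewhere; the default $m_c = m_r = 0.2$ follows the ablation setting later in the deck.

```python
import torch

def pc_cls(score_sibling_y: torch.Tensor, score_tsd_y: torch.Tensor, m_c: float = 0.2) -> torch.Tensor:
    # M_cls = | H1(y|F_l, P) - H1^D(y|F_l, tau_c(P, Delta_C)) + m_c |_+
    return torch.relu(score_sibling_y - score_tsd_y + m_c)

def pc_loc(iou_sibling: torch.Tensor, iou_tsd: torch.Tensor, is_positive: torch.Tensor,
           m_r: float = 0.2) -> torch.Tensor:
    # M_loc = | IoU(B_hat, B) - IoU(B_hat_D, B) + m_r |_+ ; ignored for negative proposals.
    return torch.relu(iou_sibling - iou_tsd + m_r) * is_positive.float()
```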
Methods
• Progressive constraint
• The overall loss function of TSD built on Faster R-CNN (a loss-assembly sketch follows below)
$\mathcal{L} = \underbrace{\mathcal{L}_{rpn} + \mathcal{L}_{cls} + \mathcal{L}_{loc}}_{\text{classical loss}} + \underbrace{\mathcal{L}_{cls}^D + \mathcal{L}_{loc}^D + \mathcal{M}_{cls} + \mathcal{M}_{loc}}_{\text{TSD loss}}$
• TSD learns task-specific feature representations for classification and localization
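Assembling the overall objective could then look like the following sketch, where each argument is assumed to be a scalar loss tensor computed by the respective branch.

```python
def tsd_total_loss(l_rpn, l_cls, l_loc, l_cls_d, l_loc_d, m_cls, m_loc):
    classical = l_rpn + l_cls + l_loc            # classical Faster R-CNN loss
    tsd = l_cls_d + l_loc_d + m_cls + m_loc      # TSD loss (disentangled heads + PC terms)
    return classical + tsd
```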
Experiments
• Dataset
• 80-category MS-COCO dataset
• 80k train images & 35k subset of val images & 5k val images for test & 20k test-dev
• 500-category OpenImageV5 challenge dataset
• 1,674,979 training images & 34,917 val images
• AP.5 on public leaderboard
• Implementation details
• ImageNet pre-trained models / hyper-parameters of Faster R-CNN
• Resize such that the shorter edge is 800 pixels / anchor scale = 8 / aspect ratio = {0.5, 1, 2}
• RoIAlign / the pooling size is 7 in both $\mathcal{H}_1^*$ and $\mathcal{H}_2^*$ / … (summarized in the sketch below)
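For reference, the reported settings can be summarized as a plain config dict; a hedged sketch only, with illustrative field names rather than any specific framework's schema.

```python
train_cfg = dict(
    pretrained="ImageNet",                 # ImageNet pre-trained backbone
    shorter_edge=800,                      # resize so the shorter image edge is 800 px
    anchor_scale=8,
    anchor_aspect_ratios=(0.5, 1.0, 2.0),
    roi_extractor="RoIAlign",
    roi_pool_size=7,                       # pooling size for both H1* and H2*
)
```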
Experiments
• Ablation studies
• $m_c = m_r = 0.2$
• Task-aware disentanglement
• Experiments with various decoupling options in the backbone and head (Fig. 3)
• Decoupling in the backbone significantly degrades performance ($D_{s8}$, $D_{s16}$, $D_{s32}$)
• The semantic information in the backbone should be shared
• Compared with $D_{head}$, TSD w/o PC gives a slight improvement
Experiments
• Ablation studies
• $m_c = m_r = 0.2$
• Joint training with the sibling head $\mathcal{H}^*$
• What happens if TSD and the sibling head are trained together?
• $\hat{P}_c$ and $\hat{P}_r$ do not conflict with the original proposal $P$! (Tab. 2)
• Effectiveness of PC
• PC is proposed to further boost TSD
• AP.75 improves by as much as 1.5, while AP.5 is barely affected (Tab. 3)
• PC induces more accurate classification and regression
• AP over IoU 0.5:0.95 improves by 1.3
Experiments
• Ablation studies
• $m_c = m_r = 0.2$
• Derived proposal learning manner for $\mathcal{H}_*^D$
• Various combinations of how $\hat{P}_r$ and $\hat{P}_c$ are computed are tested (Tab. 4)
• $Point.w$ clearly benefits classification, and works even better together with PC
• $Prop.w$ gives a slight improvement for localization
• Classification needs optimal local features without shape constraints
• Regression needs global geometric shape information to be preserved
• Delving to the effective PC
• Ablation study over the PC margin values (Fig. 4)
• Both $\mathcal{M}_{loc}$ and $\mathcal{M}_{cls}$ improve performance
Experiments
• Applicable to variant backbones
• Apply TSD to other backbones as well
Experiments
• Applicable to Mask R-CNN
• Apply TSD to instance segmentation with Mask R-CNN as well
Experiments
• Generalization on large-scale OpenImage
• Apply TSD to the large-scale OpenImage dataset as well, not only COCO
Experiments
• Comparison with the state of the art
• Compared with COCO SOTA models ($m_c = 0.5$, $m_r = 0.2$)
Experiments
• Analysis and discussion
• Performance in different IoU criteria
• As the IoU threshold increases, the performance gap keeps growing (Fig. 6)
• Performance in different scale criteria
• AP is also examined across different scale criteria (Tab. 9)
Experiments
• Analysis and discussion
• What did TSD learn?
• Fewer false positives and more accurate bbox predictions
• $\hat{P}_r$ : translates to the boundary / $\hat{P}_c$ : concentrates on local appearance and object context information
Conclusion
• Conclusion
• Task-aware spatial disentanglement (TSD)
• To alleviate the inherent conflict in sibling head
• To learn task-aware spatial disentanglement and break through the performance limitation
• Progressive Constraint (PC)
• To enlarge the performance margin between the disentangled and the shared proposals
• To provide additional performance gain
Thank You