Arirang Satellite Imagery AI Object Detection Competition: 2nd Place Winner's Solution

https://dacon.io
Arirang Satellite Imagery AI Object Detection Competition
{green669, sbson0621, rnans33, karma1002, ohhs}@koreatech.ac.kr
DICE Lab
KOREATECH

DICE Lab
Deep Intelligence for Cognitive Environment (DICE) Lab
- School of Computer Science and Engineering, KOREATECH
- Focus on deep intelligence systems by understanding various
cognitive environments based on language and vision
technologies.
- https://www.dicelab.kr
- Looking for self-motivated graduate students!

Contents
1 Problem Definition
2 Challenges of Satellite Imagery
3 Approach
4 Model
5 Experimental Results
6 Conclusion

1. Problem Definition
Develop an AI algorithm that quickly and accurately detects the many objects present in Arirang satellite imagery
• Object Detection
  • Classification
  • Localization
    • Horizontal bounding boxes
    • Oriented bounding boxes

1. Problem Definition
Object Detection = Classification + Localization (horizontal bounding box)
Example: Cat (classification), ( x, y, w, h ) (localization as a horizontal bounding box)

1. Problem Definition
Aerial Object Detection Task
Aerial Object Detection = Classification + Localization (oriented bounding box)
Example: Ship[1] (classification), ( x1, y1, x2, y2, x3, y3, x4, y4 ) (localization as an oriented bounding box)
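The two localization formats can be related with a small sketch. The function name `obb_to_hbb` and the sample coordinates are illustrative assumptions, not competition code. It converts an oriented box given as four corner points into the enclosing horizontal box ( x, y, w, h ), assuming ( x, y ) is the top-left corner (the slides do not fix that convention).

```python
from typing import Sequence, Tuple


def obb_to_hbb(corners: Sequence[float]) -> Tuple[float, float, float, float]:
    """Return the enclosing horizontal box (x, y, w, h) of an oriented box.

    `corners` is (x1, y1, x2, y2, x3, y3, x4, y4).
    Assumption: (x, y) is the top-left corner of the horizontal box.
    """
    if len(corners) != 8:
        raise ValueError("expected 8 coordinates")
    xs = corners[0::2]  # x1, x2, x3, x4
    ys = corners[1::2]  # y1, y2, y3, y4
    x_min, y_min = min(xs), min(ys)
    return (x_min, y_min, max(xs) - x_min, max(ys) - y_min)


# A hypothetical object rotated by 45 degrees (diamond shape).
ship = (10.0, 0.0, 20.0, 10.0, 10.0, 20.0, 0.0, 10.0)
print(obb_to_hbb(ship))
```

The horizontal box of a rotated object covers far more background than the oriented box, which is one reason aerial detection uses the eight-coordinate form.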

2. Challenges of Satellite Imagery
1. Scarcity of data
2. Characteristics of the aerial view
3. Data imbalance
4. Label noise

2. Challenges of Satellite Imagery - 1. Scarcity of data
<Aerial data vs. general object detection datasets>
Dataset           Classes   Images
ILSVRC 2014       200       516,840
COCO2017          80        163,957
PASCAL VOC 2012   20        22,531
Aerial Data       15        1,200

Characteristics of the aerial view
• Varied rotation angles: it is difficult to account for every angle
• Dense objects: densely packed objects are hard to detect

Imbalance in object counts
<Number of objects per class in the aerial data>
Class Objects Ratio
Car 154,348 75.658 %
Truck 20,931 10.259 %
Small ship 13,533 6.633 %
Train 5,648 2.768 %
Bus 5,429 2.661 %
Oil tank 1,093 0.535 %
Military aircraft 1,031 0.505 %
Civilian aircraft 550 0.269 %
Large ship 348 0.170 %
Crane 315 0.154 %
Roundabout 219 0.107 %
Dam 184 0.090 %
Helipad 155 0.075 %
Bridge 136 0.066 %
Athletic field 87 0.042 %

Class length height
Dam 259.96 241.30
Athletic field 241.14 257.71
Bridge 173.66 158.53
Large ship 102.95 117.03
Roundabout 98.00 95.06
Crane 73.14 60.51
Civilian aircraft 58.27 58.61
Oil tank 39.36 35.54
Military aircraft 33.77 32.39
Train 26.80 16.80
Helipad 21.70 17.77
Small ship 15.91 15.88
Bus 13.93 16.05
Truck 12.47 13.16
Car 6.96 7.27
<Object size per class in the aerial data>
Imbalance in object sizes

Label noise
Is this a car?
Training data example (red: the actual annotations; green: regions that are ambiguous even to a human)

3. Approach
• Scarcity of data
  - Multi-scale data augmentation (training step)
• Characteristics of the aerial view
  - RoI Transformer
  - S2A-Nets
  - Ensemble
• Data imbalance
  - Class-dependent IoU thresholding
  - Multi-scale augmentation (inference step)
  - Few-shot learning
• Label noise
  - Noise canceling with heteroscedastic uncertainty


3. Approach - Multi-scale data augmentation
• Train on the training images resized to multiple scales[2]
• Advantage
  • Produces a model that is robust to objects of varied sizes
• Ours
  • Generate images at 1.0x, 1.5x, 2x, and 4x scale
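The augmentation above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function names are hypothetical, the annotations are assumed to be eight-coordinate polygons in pixel units, and nearest-neighbour resizing is used only to keep the sketch free of image libraries.

```python
import numpy as np

SCALES = (1.0, 1.5, 2.0, 4.0)  # scale factors listed on the slide


def rescale(image: np.ndarray, polygons: np.ndarray, scale: float):
    """Resize an HxWxC image by nearest-neighbour sampling and scale its boxes.

    `polygons` has shape (N, 8): four (x, y) corners per oriented box.
    """
    h, w = image.shape[:2]
    new_h, new_w = int(round(h * scale)), int(round(w * scale))
    # For every output row/column, pick the source row/column it maps back to.
    rows = np.minimum((np.arange(new_h) / scale).astype(int), h - 1)
    cols = np.minimum((np.arange(new_w) / scale).astype(int), w - 1)
    resized = image[rows][:, cols]
    return resized, polygons * scale


def multi_scale_copies(image: np.ndarray, polygons: np.ndarray, scales=SCALES):
    """Return one (image, polygons) pair per scale factor."""
    return [rescale(image, polygons, s) for s in scales]


image = np.zeros((8, 8, 3), dtype=np.uint8)  # tiny stand-in for a 1024x1024 tile
polygons = np.ones((2, 8))
for img, polys in multi_scale_copies(image, polygons):
    print(img.shape, polys[0, 0])
```

Scaling the annotations together with the pixels keeps every copy consistent, so a small object such as a car appears at several apparent sizes during training.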


3. Approach - RoI Transformer
• RoI Transformer[3]
• Regresses the angle itself, which drastically reduces the amount of computation
  • Number of anchors = num_scale * num_aspect_ratio * num_angles
    - Number of anchors = num_scale * num_aspect_ratio * 1
• Makes the set of representable angles infinite
  • angle ∈ {π/2, π/3, π/4, …, π/n}, finite set
    - angle ∈ R, infinite set
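The anchor arithmetic above can be checked with a few lines. The specific counts (3 scales, 3 aspect ratios, 6 angles) are hypothetical values chosen only for illustration; the slides give the formula but not the settings.

```python
def anchors_per_location(num_scale: int, num_aspect_ratio: int, num_angles: int = 1) -> int:
    """Number of anchors placed at each feature-map location."""
    return num_scale * num_aspect_ratio * num_angles


# Rotated anchors: one anchor per (scale, aspect ratio, angle) combination.
rotated = anchors_per_location(num_scale=3, num_aspect_ratio=3, num_angles=6)
# Horizontal anchors with a regressed angle: the angle factor becomes 1.
horizontal = anchors_per_location(num_scale=3, num_aspect_ratio=3)
print(rotated, horizontal)  # 54 9
```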

3. Approach - S2A-Nets
S2A-Nets (Single-shot Alignment Network)[4]
• Feature Alignment Module (FAM)
  - Generates high-quality anchors and applies alignment convolution so that the convolution is carried out at the anchor locations.
• Oriented Detection Module (ODM)
  - Uses Active Rotating Filters (ARF) to encode orientation information and provide orientation-sensitive features.

3. Approach - Ensemble
• Ensemble
  • Strength of RoI Transformer
    • As a 2-stage model, it can regress boxes accurately.
  • Strength of S2A-Nets
    • Alignment convolution computes features at the actual object location, which gives strong classification performance.
  • The RoI Transformer results and the S2A-Nets results are pooled, and NMS is applied to the combined set.
    - The ensemble shows the strengths of both RoI Transformer and S2A-Nets.
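The pooling-then-NMS step can be sketched as below. This is a simplified illustration under stated assumptions: boxes are axis-aligned (x_min, y_min, x_max, y_max) rather than oriented, suppression is applied per class, the IoU threshold of 0.5 is a placeholder, and all names are hypothetical. The authors' implementation is not shown in the slides.

```python
from typing import List, Tuple

# (class_name, score, x_min, y_min, x_max, y_max)
Detection = Tuple[str, float, float, float, float, float]


def iou(a: Detection, b: Detection) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix = max(0.0, min(a[4], b[4]) - max(a[2], b[2]))
    iy = max(0.0, min(a[5], b[5]) - max(a[3], b[3]))
    inter = ix * iy
    area_a = (a[4] - a[2]) * (a[5] - a[3])
    area_b = (b[4] - b[2]) * (b[5] - b[3])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def ensemble_nms(dets_a: List[Detection], dets_b: List[Detection],
                 iou_thr: float = 0.5) -> List[Detection]:
    """Pool the detections of two models and apply greedy per-class NMS."""
    pooled = sorted(dets_a + dets_b, key=lambda d: d[1], reverse=True)
    kept: List[Detection] = []
    for det in pooled:
        # Keep a box unless a higher-scoring box of the same class overlaps it.
        if all(det[0] != k[0] or iou(det, k) <= iou_thr for k in kept):
            kept.append(det)
    return kept


roi_transformer = [("ship", 0.9, 0.0, 0.0, 10.0, 10.0)]
s2a_nets = [("ship", 0.8, 1.0, 1.0, 11.0, 11.0), ("helipad", 0.7, 50.0, 50.0, 60.0, 60.0)]
print(ensemble_nms(roi_transformer, s2a_nets))
```

In this toy input the duplicate ship is suppressed while the helipad found by only one model survives, which is the behaviour the ensemble relies on.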


3. Approach - Class-dependent IoU thresholding
A higher threshold is applied to classes with more objects
Class Objects Ratio Threshold
Car 154,348 75.658 % 0.75
Truck 20,931 10.259 % 0.26
Small ship 13,533 6.633 % 0.15
Train 5,648 2.768 % 0.01
Bus 5,429 2.661 % 0.15
Oil tank 1,093 0.535 % 0.01
Military aircraft 1,031 0.505 % 0.01
Civilian aircraft 550 0.269 % 0.01
Large ship 348 0.170 % 0.01
Crane 315 0.154 % 0.01
Roundabout 219 0.107 % 0.01
Dam 184 0.090 % 0.01
Helipad 155 0.075 % 0.01
Bridge 136 0.066 % 0.01
Athletic field 87 0.042 % 0.01
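A minimal sketch of how the table could be applied, assuming the threshold is the IoU threshold used during NMS (the slide title suggests this, but the slides do not spell it out). The function name is hypothetical; the threshold values are copied from the table.

```python
# Per-class thresholds copied from the table above.
CLASS_IOU_THRESHOLD = {
    "Car": 0.75, "Truck": 0.26, "Small ship": 0.15, "Train": 0.01, "Bus": 0.15,
    "Oil tank": 0.01, "Military aircraft": 0.01, "Civilian aircraft": 0.01,
    "Large ship": 0.01, "Crane": 0.01, "Roundabout": 0.01, "Dam": 0.01,
    "Helipad": 0.01, "Bridge": 0.01, "Athletic field": 0.01,
}


def is_suppressed(class_name: str, iou_with_kept_box: float) -> bool:
    """True if a lower-scoring box overlapping an already kept box is dropped.

    Assumption: a box is suppressed when its IoU with a kept box of the same
    class exceeds that class's threshold.
    """
    return iou_with_kept_box > CLASS_IOU_THRESHOLD[class_name]


# Two boxes overlapping with IoU 0.5:
print(is_suppressed("Car", 0.5))  # False: densely packed cars may overlap
print(is_suppressed("Dam", 0.5))  # True: rare classes keep a single box
```

Under this reading, frequent and densely packed classes tolerate more overlap, while rare classes are deduplicated aggressively.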

3. Approach - Multi-scale augmentation (inference step)
Class length height
Dam 259.96 241.30
Athletic field 241.14 257.71
Bridge 173.66 158.53
Large ship 102.95 117.03
Roundabout 98.00 95.06
Crane 73.14 60.51
Civilian aircraft 58.27 58.61
Oil tank 39.36 35.54
Military aircraft 33.77 32.39
Train 26.80 16.80
Helipad 21.70 17.77
Small ship 15.91 15.88
Bus 13.93 16.05
Truck 12.47 13.16
Car 6.96 7.27
Because object sizes differ across classes, the inference step runs the model on images at several sizes
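Multi-scale inference can be sketched as follows. The callable `detect_at_scale` is a hypothetical stand-in for "resize the image by a factor, run the model, return boxes in the resized image's coordinates"; the box layout (eight polygon coordinates plus a score) is also an assumption.

```python
from typing import Callable, List, Sequence, Tuple

Box = Tuple[float, ...]  # 8 polygon coordinates followed by a score


def multi_scale_inference(detect_at_scale: Callable[[float], List[Box]],
                          scales: Sequence[float]) -> List[Box]:
    """Run a detector at each scale and map its boxes back to original pixels."""
    pooled: List[Box] = []
    for s in scales:
        for box in detect_at_scale(s):
            coords, score = box[:8], box[8]
            # Undo the resize so boxes from all scales share one coordinate frame.
            pooled.append(tuple(c / s for c in coords) + (score,))
    return pooled  # duplicates across scales still have to be removed by NMS


def fake_detector(scale: float) -> List[Box]:
    """Stand-in model: always finds one object at the same true location."""
    base = (10, 10, 20, 10, 20, 20, 10, 20)
    return [tuple(c * scale for c in base) + (0.9,)]


for box in multi_scale_inference(fake_detector, (0.5, 1.0, 2.0, 4.0)):
    print(box)
```

Large classes such as Dam tend to be easier to detect in down-scaled images and small classes such as Car in up-scaled ones, so pooling over scales covers the whole size range in the table.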

3. Approach - Few-shot learning
A separate training run is performed for the 10 classes with the fewest objects
Class Objects Ratio
Car 154,348 75.658 %
Truck 20,931 10.259 %
Small ship 13,533 6.633 %
Train 5,648 2.768 %
Bus 5,429 2.661 %
Oil tank 1,093 0.535 %
Military aircraft 1,031 0.505 %
Civilian aircraft 550 0.269 %
Large ship 348 0.170 %
Crane 315 0.154 %
Roundabout 219 0.107 %
Dam 184 0.090 %
Helipad 155 0.075 %
Bridge 136 0.066 %
Athletic field 87 0.042 %
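Selecting the training subset for the separate model can be sketched as below. The counts are copied from the table; the helper names and the annotation format (a list of dicts with a "class" key) are hypothetical.

```python
from typing import Dict, List

# Object counts copied from the table above.
OBJECT_COUNTS: Dict[str, int] = {
    "Car": 154348, "Truck": 20931, "Small ship": 13533, "Train": 5648, "Bus": 5429,
    "Oil tank": 1093, "Military aircraft": 1031, "Civilian aircraft": 550,
    "Large ship": 348, "Crane": 315, "Roundabout": 219, "Dam": 184,
    "Helipad": 155, "Bridge": 136, "Athletic field": 87,
}


def rarest_classes(counts: Dict[str, int], k: int = 10) -> List[str]:
    """Return the k classes with the fewest objects, rarest first."""
    return sorted(counts, key=counts.get)[:k]


def filter_annotations(annotations: List[dict], keep: List[str]) -> List[dict]:
    """Keep only annotations whose class is in `keep`."""
    keep_set = set(keep)
    return [a for a in annotations if a["class"] in keep_set]


few_shot_classes = rarest_classes(OBJECT_COUNTS)
print(few_shot_classes)
sample = [{"class": "Car"}, {"class": "Dam"}, {"class": "Helipad"}]
print(filter_annotations(sample, few_shot_classes))
```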


3. Approach - Noise canceling
$p(y \mid f^{\omega}(x)) = \mathcal{N}(f^{\omega}(x), \tau^{-1} I_D)$
Given the prediction $f^{\omega}(x)$, the final value $y$ can be produced by adding Gaussian noise to $f^{\omega}(x)$

$f^{\omega}(x)$ is the value we believe to exist ideally: the observed value with the noise removed

Conventionally, $\tau^{-1}$, the belief about how noisy the labeled data itself is, was set by hand as a hyperparameter

$p(y \mid f^{\omega}(x)) = \mathcal{N}(f^{\omega}(x), \sigma^{2}(x))$
The noise can instead be learned from the data, so the noise itself can also be inferred

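$p(y \mid f^{\omega}(x)) = \mathcal{N}(f^{\omega}(x), \tau^{-1} I_D)$ : homoscedastic uncertainty
$p(y \mid f^{\omega}(x)) = \mathcal{N}(f^{\omega}(x), \sigma^{2}(x))$ : heteroscedastic uncertainty
The noise term that used to be fixed homoscedastically (in effect a hyperparameter) is made heteroscedastic (a learned parameter), so that it can be inferred for each data point[2]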

Figure: noise canceling and the noise-canceled output
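The two noise models can be compared through their negative log-likelihoods. The sketch below is the standard Gaussian negative log-likelihood per output dimension with the constant term dropped; it is an assumption that the authors train with this form, since the slides show only the distributions. The network is assumed to predict the log-variance $\log \sigma^{2}(x)$ for numerical stability, and the function names are hypothetical.

```python
import numpy as np


def homoscedastic_nll(y, f, tau):
    """NLL under N(f, 1/tau) with a hand-set precision tau (constant dropped)."""
    y, f = np.asarray(y, dtype=float), np.asarray(f, dtype=float)
    return 0.5 * tau * (y - f) ** 2 - 0.5 * np.log(tau)


def heteroscedastic_nll(y, f, log_var):
    """NLL under N(f, sigma^2(x)) where log_var = log sigma^2(x) is predicted."""
    y, f, log_var = (np.asarray(v, dtype=float) for v in (y, f, log_var))
    return 0.5 * np.exp(-log_var) * (y - f) ** 2 + 0.5 * log_var


# A noisy label: the residual y - f is 2.0.
print(heteroscedastic_nll(2.0, 0.0, 0.0))          # variance 1
print(heteroscedastic_nll(2.0, 0.0, np.log(4.0)))  # variance 4: lower loss
```

When the residual is large, predicting a larger variance lowers the loss, so a badly labeled sample pulls less on $f^{\omega}(x)$; the $\frac{1}{2}\log \sigma^{2}(x)$ term keeps the model from declaring every sample noisy.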

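4. Overall Process
Figure: overall pipeline. The input image I_n (3x1024x1024) is expanded by multi-scale augmentation f_aug(I_n) and split into patches (patch 1, …, patch k, …, patch N). RoI Transformer and S2A-Nets perform box regression and classification on the patches, and their results are ensembled. The per-patch results are merged by f_merge([patch 1, …, patch N]) into the output image (3x1024x1024), for example with a detected Ship.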
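The split and merge steps of the pipeline can be sketched as below. The patch size (512), the stride (256), and all function names are hypothetical; the slides do not state how patches are cut or how f_merge is implemented. The sketch only shows the coordinate bookkeeping: detections found inside a patch are shifted by the patch origin to return to image coordinates.

```python
from typing import List, Sequence, Tuple


def patch_origins(size: int, patch: int, stride: int) -> List[int]:
    """Offsets of sliding windows that cover [0, size) along one axis."""
    if size <= patch:
        return [0]
    origins = list(range(0, size - patch, stride))
    origins.append(size - patch)  # align the last window with the border
    return origins


def split(width: int, height: int, patch: int = 512, stride: int = 256) -> List[Tuple[int, int]]:
    """Top-left (x0, y0) of every patch, row by row."""
    return [(x, y)
            for y in patch_origins(height, patch, stride)
            for x in patch_origins(width, patch, stride)]


def merge(per_patch: Sequence[Tuple[Tuple[int, int], Sequence[Sequence[float]]]]):
    """Shift patch-local polygons (x1, y1, ..., x4, y4) into image coordinates."""
    merged = []
    for (x0, y0), polygons in per_patch:
        for poly in polygons:
            merged.append(tuple(c + (x0 if i % 2 == 0 else y0)
                                for i, c in enumerate(poly)))
    return merged


print(len(split(1024, 1024)))
print(merge([((256, 512), [(1, 2, 3, 4, 5, 6, 7, 8)])]))
```

Overlapping patches mean the same object can be reported more than once after merging, which is why a suppression step such as NMS follows.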

5. Experimental Setup
Train (RoI Transformer)
• Pretrained backbone model : ResNeXt[5]
• GPU : Tesla V100 x 8
• Epochs : 7
• Optimizer : SGD
• Time : 2.5 days
<Train parameter setting>
Hyper-parameter       Value
Learning rate         0.01
Learning rate decay   0.1
Weight decay          0.0001
Warm up iteration     500
Warm up ratio         1.0 / 3
Milestones            [6]
Momentum              0.9

Train (S2A-Nets)
• Pretrained backbone model : ResNeXt[5]
• GPU : Tesla V100 x 8
• Epochs : 3, 12
• Optimizer : SGD
• Time : 1 day
<Train parameter setting>
Hyper-parameter       Value
Learning rate         0.01
Learning rate decay   0.1
Weight decay          0.0001
Warm up iteration     500
Warm up ratio         1.0 / 3
Milestones            [8,11]
Momentum              0.9
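The schedule implied by these hyper-parameters can be sketched as a function of epoch and iteration. The defaults are the S2A-Nets values from the table; the linear warm-up formula, the 0-based epoch counting, and the reading of the milestones as epochs are assumptions, since the slides list only the values.

```python
def learning_rate(epoch: int, iteration: int, base_lr: float = 0.01,
                  decay: float = 0.1, milestones=(8, 11),
                  warmup_iters: int = 500, warmup_ratio: float = 1.0 / 3) -> float:
    """Step-decay learning rate with linear warm-up.

    `epoch` is 0-based; `iteration` counts optimizer steps since training began.
    """
    # Step decay: multiply by `decay` once for every milestone already reached.
    lr = base_lr * decay ** sum(epoch >= m for m in milestones)
    # Linear warm-up: ramp from warmup_ratio * lr up to lr over warmup_iters steps.
    if iteration < warmup_iters:
        k = (1 - iteration / warmup_iters) * (1 - warmup_ratio)
        lr *= 1 - k
    return lr


for epoch, iteration in [(0, 0), (0, 250), (0, 500), (8, 10_000), (11, 20_000)]:
    print(epoch, iteration, learning_rate(epoch, iteration))
```

For the RoI Transformer run, the same function would be called with `milestones=(6,)`.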

5. Experimental Results - RoI Transformer
Model | Multi-scale training | Multi-scale test | Public (mAP)
ResNet101 | (700x700), (1024x1024) | (700x700), (1024x1024) | 0.4730
ResNeXt101 (baseline) | (700x700), (1024x1024) | (700x700), (1024x1024) | 0.5020
ResNeXt101 | (700x700), (1024x1024) | (600x600), (800x800), (1024x1024) | 0.5350 (+0.0330)
ResNeXt101 | (x1.0, x1.5, x2.0) | (600x600), (800x800), (1024x1024), (2048x2048) | 0.5543 (+0.0523)
ResNeXt101* | (x1.0, x1.5, x2.0) | (600x600), (800x800), (1024x1024), (2048x2048) | 0.5701 (+0.0681)
Uncertainty + ResNeXt101* | (x1.0, x1.5, x2.0) | (600x600), (800x800), (1024x1024), (2048x2048) | 0.5890 (+0.0870)
* : class-specific thresholds applied
Gains in parentheses are relative to the baseline (0.5020)

5. Experimental Results - S2A-Nets
Model | Multi-scale training | Multi-scale test | Public (mAP)
ResNet50 | (x1.0, x2.0, x4.0) | (x0.5, x1.0, x2.0, x4.0) | 0.527 (+0.0250)
ResNet50* | (x4.0, x4.0, x4.0) | (x0.5, x1.0, x2.0, x4.0) | 0.559 (+0.0570)

5. Experimental Results - RoI Transformer + S2A-Nets (Ensemble)

Model: RoI Transformer + S2A-Net
Multi-scale training: RoI Transformer : (x1.0, x1.5, x2.0); S2A-Net : (x1.0, x1.5, x2.0, x4.0)
Multi-scale test: RoI Transformer : ((600x600), (800x800), (1024x1024), (2048x2048)); S2A-Net : (x0.5, x1.0, x2.0, x4.0)
Public (mAP): 0.6034 (+0.1014)

Model: RoI Transformer* + S2A-Net*
Multi-scale training: … S2A-Net : (x1.0, x1.5, x2.0, x4.0)
Multi-scale test: … (1024x1024), (2048x2048)); S2A-Net : (x0.5, x1.0, x2.0, x4.0)
Public (mAP): 0.6179 (+0.1159)

Model: Uncertainty based RoI Transformer* + S2A-Net*
Multi-scale training: … S2A-Net : (x1.0, x1.5, x2.0, x4.0)
Multi-scale test: … (1024x1024), (2048x2048)); S2A-Net : (x0.5, x1.0, x2.0, x4.0)
Public (mAP): 0.6261 (+0.1241)

https://dacon.io 42
A-Nets (Ensemble)
Pulblic
(mAP)
Uncertainty based
RoI Transformer* +
S2A-Net* +
Few_shot_S2A-Net*
S2A-Net :
(x1.0, x1.5, x2.0, x4.0)
Few_shot_S2A-Net :
(x1.0, x1.5, x2.0, x4.0)
(1024x1024), (2048x2048))
S2A-Net :
(x0.5, x1.0, x2.0, x4.0)
Few_shot_S2A-Net :
(x0.5, x1.0, x2.0, x4.0)
0.6285
(+0.1265)
Uncertainty based
RoI Transformer* +
S2A-Net* +
Few_shot_S2A-
Net*
S2A-Net :
(x1.0, x1.5, x2.0, x4.0)
Few_shot_S2A-Net :
(x1.0, x1.5, x2.0, x4.0)
(800x800), (1024x1024), (1200x1200),
(1536x1536), (2048x2048))
S2A-Net :
(x0.5, x1.0, x2.0, x4.0)
Few_shot_S2A-Net :
(x0.5, x1.0, x1.5, x2.0, x3.0, x4.0)
0.6438
(+0.1418)

5. Experimental Results - RoI Transformer + S2A-Nets (Ensemble)

Model: Uncertainty based RoI Transformer* + S2A-Net* + Few_shot_S2A-Net*
Multi-scale training: RoI Transformer : (x1.0, x1.5, x2.0); S2A-Net : (x1.0, x1.5, x2.0, x4.0); Few_shot_S2A-Net : (x1.0, x1.5, x2.0, x4.0)
Multi-scale test: RoI Transformer : ((500x500), (600x600), (800x800), (1024x1024), (1200x1200), (1536x1536), (2048x2048)); S2A-Net : (x0.5, x1.0, x2.0, x4.0); Few_shot_S2A-Net : (x0.5, x1.0, x1.5, x2.0, x3.0, x4.0)
Public (mAP): 0.6438
Private (mAP): 0.6206

5. Analysis of Results - Ensemble effect
Figure: RoI Transformer result; S2A-Nets result
S2A-Nets detects the helipad well, whereas RoI Transformer does not.

5. Analysis of Results - Ensemble effect
Figure: Ensemble (RoI Transformer + S2A-Nets) result
The ensemble detects the helipad that RoI Transformer alone failed to detect.

6. Conclusion
- Addressed the scarcity of data with multi-scale data augmentation (training step).
- Addressed the characteristics of the aerial view by ensembling RoI Transformer and S2A-Nets.
- Mitigated the imbalance in object counts with class-dependent thresholds.
- Mitigated the imbalance in object sizes with multi-scale augmentation (inference step).
- Mitigated the imbalance in object counts with few-shot learning.
- Mitigated the label noise problem with noise canceling based on heteroscedastic uncertainty.

Reference
[1] https://commons.wikimedia.org/wiki/File:Aerial_photograph_of_a_cargo_ship.jpg
[2] J. C. Park, S. H. Lee, J. U. Jung, S. B. Son, H. S. Oh, Y. C. Jung, "Uncertainty-based Deep Object Detection from Aerial Images," Journal of Institute of Control, Robotics and Systems, (2020).
[3] J. Ding, N. Xue, Y. Long, G.-S. Xia, and Q. Lu, "Learning RoI transformer for oriented object detection in aerial images," Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2844-2853, (2019).
[4] J. Han et al., "Align Deep Features for Oriented Object Detection," arXiv preprint arXiv:2008.09397, (2020).
[5] 'open-mmlab://resnext101_64x4d'

THANK YOU