YOLO

You Only Look Once: Unified, Real-Time Object Detection
Joseph Redmon, Santosh Divvala, Ross Girshick, Ali Farhadi
전희선

1. Introduction
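• Earlier models perform object localization and classification as separate steps → falls short of mimicking the human visual system
• YOLO instead treats object localization and classification as a single regression problem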

Strengths
- Extremely fast
- Reasons globally about the image
- Learns generalizable representations of objects
Weaknesses
- Lags behind state-of-the-art detection systems in accuracy

2. Unified Detection
1. Divide the image into an S × S grid
(S × S grid cells in total)
Hyperparameters:
S (number of grid divisions per side)
B (number of bounding boxes per grid cell)
C (number of classes)
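The grid split can be sketched in a few lines. This is a minimal illustration, assuming pixel coordinates and the rule from the YOLO paper that the cell containing an object's center is responsible for that object; the function name `responsible_cell` and the 448 × 448 image size are assumptions made for the example.

```python
def responsible_cell(cx, cy, img_w, img_h, S=7):
    """Return (row, col) of the grid cell that contains the point (cx, cy).

    cx, cy are pixel coordinates of an object's center.
    S is the number of grid divisions per side.
    """
    # Scale the coordinate to [0, S) and truncate to an integer index.
    # min(...) keeps a point on the right/bottom border inside the last cell.
    col = min(int(cx / img_w * S), S - 1)
    row = min(int(cy / img_h * S), S - 1)
    return row, col


# An object centered at (224, 100) in a 448 x 448 image (illustrative values).
print(responsible_cell(224, 100, 448, 448))
```

With S = 7 each cell covers 64 × 64 pixels of a 448 × 448 image, so the center above falls in row 1, column 3.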

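2. For each grid cell, predict B bounding boxes
+ compute a confidence score for each bounding box
Components of each bounding box:
(x, y): center of the bounding box (relative to the grid cell)
(w, h): width and height of the bounding box (relative to the whole image)
confidence: confidence score
Confidence score:
Reflects both how confident the model is that the box contains an object
and how accurately the box is predicted
$$\text{confidence} = \Pr(\text{Object}) \cdot \text{IOU}^{\text{truth}}_{\text{pred}}$$
Pr(Object): 1 if the grid cell contains an object, 0 otherwise
IOU (Intersection over Union):
Measures how much the predicted region overlaps the ground-truth region
$$\text{IOU}^{\text{truth}}_{\text{pred}} = \frac{\text{area}(\text{truth} \cap \text{pred})}{\text{area}(\text{truth} \cup \text{pred})}$$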
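The IOU definition above can be checked with a short sketch. It assumes boxes are given as corner coordinates (x_min, y_min, x_max, y_max), which is a choice made for this example; the slide describes boxes by center, width and height instead.

```python
def iou(box_a, box_b):
    """Intersection over Union of two boxes given as (x_min, y_min, x_max, y_max)."""
    # Corners of the overlapping rectangle (if any).
    ix_min = max(box_a[0], box_b[0])
    iy_min = max(box_a[1], box_b[1])
    ix_max = min(box_a[2], box_b[2])
    iy_max = min(box_a[3], box_b[3])

    # Clamp at zero so disjoint boxes give an intersection of 0.
    inter = max(0.0, ix_max - ix_min) * max(0.0, iy_max - iy_min)

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter

    return inter / union if union > 0 else 0.0


truth = (0, 0, 2, 2)
pred = (1, 1, 3, 3)
print(iou(truth, pred))  # intersection 1, union 7
```

Two 2 × 2 boxes offset by one unit in each direction share an area of 1 out of a union of 7, so the IOU is 1/7.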

3. For each grid cell, compute C conditional class probabilities
→ assign the class with the highest probability
Conditional class probability:
$$\Pr(\text{Class}_i \mid \text{Object})$$

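4. Final detection!
At test time, compute a class-specific confidence score for each box:
$$\Pr(\text{Class}_i \mid \text{Object}) \cdot \Pr(\text{Object}) \cdot \text{IOU}^{\text{truth}}_{\text{pred}} = \Pr(\text{Class}_i) \cdot \text{IOU}^{\text{truth}}_{\text{pred}}$$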
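As a rough sketch of steps 3 and 4, the class-specific confidence score is the product of a cell's conditional class probabilities and a box's confidence. The function names and the numbers below are hypothetical and only illustrate the arithmetic.

```python
def class_specific_scores(class_probs, box_confidence):
    """Multiply Pr(Class_i | Object) by the box confidence Pr(Object) * IOU."""
    return [p * box_confidence for p in class_probs]


def best_class(class_probs, box_confidence):
    """Return (class index, score) for the highest class-specific score."""
    scores = class_specific_scores(class_probs, box_confidence)
    best = max(range(len(scores)), key=lambda i: scores[i])
    return best, scores[best]


# Illustrative values: 3 classes in one cell, one box with confidence 0.5.
cell_class_probs = [0.1, 0.7, 0.2]
box_confidence = 0.5
print(best_class(cell_class_probs, box_confidence))
```

Because the class probabilities belong to the grid cell, every box predicted by that cell shares them; only the box confidence differs between boxes.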

2.1 Network Design
Built on the GoogLeNet model
Instead of Inception modules, uses
1 × 1 reduction layers
followed by 3 × 3 conv layers
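One way to see what a 1 × 1 reduction layer buys is to count weights. The sketch below compares a direct 3 × 3 convolution with a 1 × 1 reduction followed by a 3 × 3 convolution; the channel counts (512 → 256 → 512) are illustrative assumptions, and biases are ignored.

```python
def conv_weights(in_channels, out_channels, kernel_size):
    """Number of weights in a kernel_size x kernel_size convolution (no bias)."""
    return kernel_size * kernel_size * in_channels * out_channels


# Illustrative channel counts, chosen for this example.
in_ch, mid_ch, out_ch = 512, 256, 512

# A single 3x3 convolution applied directly to all input channels.
direct = conv_weights(in_ch, out_ch, 3)

# A 1x1 reduction to fewer channels, then the 3x3 convolution.
reduced = conv_weights(in_ch, mid_ch, 1) + conv_weights(mid_ch, out_ch, 3)

print(direct, reduced)
```

Under these assumptions the reduced pair uses 1,310,720 weights against 2,359,296 for the direct convolution, a little over half.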

First 20 conv layers (modified from GoogLeNet): feature extractor
Last 4 conv layers + FC layers: object classifier

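Final output tensor size
= S × S × (B * 5 + C)
= 7 × 7 × (2 * 5 + 20)
= 7 × 7 × 30
S (number of grid divisions per side) = 7
B (number of bounding boxes per grid cell) = 2
C (number of classes) = 20
Contents of each grid cell's vector:
- x, y, w, h, confidence for each bounding box (see slide 5; here the number of bounding boxes = 2)
- per-class probability Pr(Class_i | Object)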
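The size arithmetic and the per-cell layout can be sketched as follows. The ordering inside a cell's vector (all boxes first, then the class probabilities) is an assumption made for this example; real implementations may order the values differently, and `split_cell` is a hypothetical helper.

```python
def output_shape(S=7, B=2, C=20):
    """Shape of the final output tensor: S x S x (B * 5 + C)."""
    return (S, S, B * 5 + C)


def split_cell(cell_vector, B=2, C=20):
    """Split one grid cell's vector into B boxes and C class probabilities.

    Assumed layout: [x, y, w, h, confidence] repeated B times, then C class values.
    """
    boxes = [tuple(cell_vector[i * 5:(i + 1) * 5]) for i in range(B)]
    class_probs = list(cell_vector[B * 5:B * 5 + C])
    return boxes, class_probs


shape = output_shape()
print(shape, shape[0] * shape[1] * shape[2])  # total number of predicted values

# A dummy cell vector 0..29 makes the slicing easy to read.
boxes, class_probs = split_cell(list(range(30)))
print(boxes)
print(len(class_probs))
```

With S = 7, B = 2, C = 20 the network emits 7 × 7 × 30 = 1470 values per image.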

2.2 Training – Loss Function

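For bounding box j of a grid cell i that contains an object:
loss on x, y

loss on w, h
(square roots are used so that a small deviation counts for less in a large box than in a small box)

loss on the confidence score
($C_i = 1$)

For bounding box j of a grid cell i that contains no object:
loss on the confidence score
($C_i = 0$)

For a grid cell i that contains an object:
loss on the conditional class probabilities
($p_i(c) = 1$ for the correct class, $p_i(c) = 0$ otherwise)

Typical weighting: about 10 times more weight where an object is present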
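The equation these notes annotate did not survive in the text. For reference, the sum-squared loss from the YOLO paper can be written as below, with the five terms in the same order as the notes. Here $\mathbb{1}_{ij}^{\text{obj}}$ marks box j of cell i as responsible for an object, $\mathbb{1}_{ij}^{\text{noobj}}$ marks the boxes that are not, $\mathbb{1}_{i}^{\text{obj}}$ marks a cell that contains an object, and hats denote predictions.

```latex
\[
\begin{aligned}
\mathcal{L} ={} & \lambda_{\text{coord}} \sum_{i=0}^{S^2} \sum_{j=0}^{B}
    \mathbb{1}_{ij}^{\text{obj}}
    \left[ (x_i - \hat{x}_i)^2 + (y_i - \hat{y}_i)^2 \right] \\
& + \lambda_{\text{coord}} \sum_{i=0}^{S^2} \sum_{j=0}^{B}
    \mathbb{1}_{ij}^{\text{obj}}
    \left[ \left(\sqrt{w_i} - \sqrt{\hat{w}_i}\right)^2
         + \left(\sqrt{h_i} - \sqrt{\hat{h}_i}\right)^2 \right] \\
& + \sum_{i=0}^{S^2} \sum_{j=0}^{B}
    \mathbb{1}_{ij}^{\text{obj}} \left( C_i - \hat{C}_i \right)^2 \\
& + \lambda_{\text{noobj}} \sum_{i=0}^{S^2} \sum_{j=0}^{B}
    \mathbb{1}_{ij}^{\text{noobj}} \left( C_i - \hat{C}_i \right)^2 \\
& + \sum_{i=0}^{S^2} \mathbb{1}_{i}^{\text{obj}}
    \sum_{c \in \text{classes}} \left( p_i(c) - \hat{p}_i(c) \right)^2
\end{aligned}
\]
```

The first two terms are the localization loss, the third and fourth are the confidence loss with and without an object, and the last is the classification loss.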

2.2 Training – Hyperparameters
1. Pretrain the first 20 conv layers on the ImageNet 1000-class dataset
+ add 4 conv layers and 2 FC layers, then train on the PASCAL VOC dataset
2. $\lambda_{\text{coord}} = 5$, $\lambda_{\text{noobj}} = 0.5$ (typically 10 times more weight where an object is present)
3. Batch size = 64
4. Dropout rate = 0.5
5. Activation function = leaky ReLU
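For readers unfamiliar with the activation in item 5, a leaky ReLU passes positive inputs unchanged and scales negative inputs by a small slope. The sketch below uses a slope of 0.1, the value given in the YOLO paper; the slide itself does not state it.

```python
def leaky_relu(x, slope=0.1):
    """Leaky ReLU: x for positive inputs, slope * x otherwise."""
    return x if x > 0 else slope * x


for value in (-2.0, 0.0, 3.0):
    print(value, leaky_relu(value))
```

Unlike a plain ReLU, negative inputs keep a small non-zero gradient, so units are less likely to stop updating during training.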

2.4 Limitations of YOLO
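Each grid cell predicts only B boxes and a single class → objects that appear close together in groups are hard to detect
Struggles to accurately predict bounding boxes with new or unusual shapes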

