SlideShare a Scribd company logo
1 of 73
YOLOYou Only Look Once: Unified, Real-Time Object Detection
Joseph Redmon, Santosh Divvala, Ross Cirshick, Ali Farhadi
20180125 이진경
You Only Look Once:
Unified, Real-Time Object Detection
무엇을 한번 보나?
어떤 과정들이 unified?
어떻게 unify시켰지?
어떤 구조일까?
Real-time 이면 어느정도 속도?
정확도는 과연 높을까?
Object detection?
Faster R-CNN vs YOLO
Faster R-CNN
YOLO
Introduction
• Deformable Part Models (DPM) : sliding window approach
• R-CNN : region proposal methods
Current detection system
Sliding window
Region proposal methods.
Complex pipelines
slow
hard to optimize
Introduction
Current detection system
Why? Each individual component must be trained separately
Introduction
YOLO : object detection as a single regression problem.
Neural network
• Bounding box coordinates
• Class probabilities
Single regression
You Only Look Once at an image to predict what objects are present and where they are.
Directly optimize
Classification vs Regression??
Unified Detection
: Unify the separate components of object detection into a single neural network.
Neural network
Predict all bounding boxes for an image simultaneously
Uses features from the entire image
• End-to-end training
• Real-time speeds
기존의 방법과 무엇이 다른가?
Unified Detection
결과가 어떤 형태로 나오길래 다음과 같이
Bounding box와 각 bounding box의 class를
동시에 알 수 있을까?
Neural network
Predict all bounding boxes for an image simultaneously
Uses features from the entire image
1) Divide the input image into a S*S grid.
Image > grid cell
• YOLO 에서는 detection을 위해 image 를 N*N으로 쪼갠 grid라는 단위를 이용한다.
• 예시에서의 S=7
Unified Detection
2) Each grid cell predicts
B bounding boxes and confidence scores for detecting that object.
• cell 내부에 중심을 두는 bounding box를 B개 propose.
• B개의 box 에서 confidence scores.
Image > grid cell > bounding boxes
* If no object exists in that cell, the confidence scores should be zero.
* 𝐵는 파라미터로 줄수있고, 이 예시에서 B=2.
Confidence scores reflect
1. how confident the model is that the box contains an object.
2. How accurate it thinks the box is that it predicts.
Unified Detection
Image > grid cell > bounding boxes
* If no object exists in that cell, the confidence scores should be zero.
* 𝐵는 파라미터로 줄수있고, 이 예시에서 B=2.
Confidence scores reflect
1. how confident the model is that the box contains an object.
2. How accurate it thinks the box is that it predicts.
2) Each grid cell predicts
B bounding boxes and confidence scores for detecting that object.
• cell 내부에 중심을 두는 bounding box를 B개 propose.
• B개의 box 에서 confidence scores
Box를 어떤 형태로 predict? 어떻게 계산?
Unified Detection
Image > grid cell > bounding boxes
Each bounding box consists of 5 predictions: 𝒙, 𝒚, 𝒘, 𝒉 and 𝒄𝒐𝒏𝒇𝒊𝒅𝒆𝒏𝒄𝒆.
• (𝒙, 𝒚) 𝒄𝒐𝒐𝒓𝒅𝒊𝒏𝒂𝒕𝒆𝒔 represent the center of the box relative to the bounds of the grid cell.
• The 𝒘𝒊𝒅𝒕𝒉(𝒘) and 𝒉𝒆𝒊𝒈𝒉𝒕(𝒉) are predicted relative to the whole image
• The 𝒄𝒐𝒏𝒇𝒊𝒅𝒆𝒏𝒄𝒆 𝒑𝒓𝒆𝒅𝒊𝒄𝒕𝒊𝒐𝒏 represents the IOU between the predicted box and any ground truth box.
*predicted box and any ground truth box
Unified Detection
2) Each grid cell predicts
B bounding boxes and confidence scores for detecting that object.
𝒙, 𝒚, 𝒘, 𝒉 and 𝒄𝒐𝒏𝒇𝒊𝒅𝒆𝒏𝒄𝒆 같은 경우는 아직 blackbox인 neural network를 통해서 나올것이다.
IOU(intersection over union)
Confidence scores reflect
1. how confident the model is that the box contains an object.
2. How accurate it thinks the box is that it predicts.
Unified Detection
Image > grid cell > bounding boxes
Each bounding box consists of 5 predictions: 𝒙, 𝒚, 𝒘, 𝒉 and 𝒄𝒐𝒏𝒇𝒊𝒅𝒆𝒏𝒄𝒆.
• (𝒙, 𝒚) 𝒄𝒐𝒐𝒓𝒅𝒊𝒏𝒂𝒕𝒆𝒔 represent the center of the box relative to the bounds of the grid cell.
• The 𝒘𝒊𝒅𝒕𝒉(𝒘) and 𝒉𝒆𝒊𝒈𝒉𝒕(𝒉) are predicted relative to the whole image
• The 𝒄𝒐𝒏𝒇𝒊𝒅𝒆𝒏𝒄𝒆 𝒑𝒓𝒆𝒅𝒊𝒄𝒕𝒊𝒐𝒏 represents the IOU between the predicted box and any ground truth box.
*predicted box and any ground truth box
Unified Detection
(0.3, 0.2, 0.17, 0.76, 0.91)
2) Each grid cell predicts
B bounding boxes and confidence scores for detecting that object.
𝒙, 𝒚, 𝒘, 𝒉 and 𝒄𝒐𝒏𝒇𝒊𝒅𝒆𝒏𝒄𝒆 같은 경우는 아직 blackbox인 neural network를 통해서 나올것이다.
Image > grid cell > bounding boxes
Each grid cell also predicts C conditional class probabilities.
박스 안에 object가 존재한다면, i 클래스 일 확률은?
Conditional class probability 같은 경우는 아직 blackbox인 neural network를 통해서 나올것이다.
We only predict one set of class probabilities per grid cell, regardless of the number of boxes 𝐵.
그 cell안에 중심이 있는 object가 여러 개일 경우에도, 하나의 class로만 정의.
2) Each grid cell predicts
B bounding boxes and confidence scores for detecting that object.
Cell의 크기를 줄인다면 ? Object 여러 개 찾을 수 있다. 계산량 많아짐. 속도 느려짐
Cell의 크기를 늘린다면 ? Bounding box 개수 줄어듬. 찾을 수 있는 object 개수 적어짐.
Unified Detection
Confidence scores reflect
1. how confident the model is that the box contains an object.
2. How accurate it thinks the box is that it predicts.
Each bounding box consists of 5 predictions:
𝒙, 𝒚, 𝒘, 𝒉 and 𝒄𝒐𝒏𝒇𝒊𝒅𝒆𝒏𝒄𝒆.
Each grid cell also predicts C conditional class probabilities.
현재까지 계산한 것 이제 계산해야 되는것
how confident the model is that the box contains an object.
How accurate it thinks the box is that it predicts.
Unified Detection
현재까지 계산한 것의 형태
Unified Detection
Unified Detection
Unified Detection
These predictions are encoded as an S x S x (B x 5 + C) tensor.
Ex) 7*7*30
Neural network
Predict all bounding boxes for an image simultaneously
Uses features from the entire image
2) Each grid cell predicts
B bounding boxes and confidence scores for detecting that object.
Image > grid cell > bounding boxes
class-specific confidence scores for each box =
(conditional class probabilities) x (the individual box confidence predictions)
class-specific confidence scores for each box.
Confidence scores reflect
1. how confident the model is that the box contains an object.
2. How accurate it thinks the box is that it predicts.
Unified Detection
Unified Detection
Unified Detection
Unified Detection
Unified Detection
Unified Detection
Unified Detection
Unified Detection
Unified Detection
Unified Detection
Unified Detection
Unified Detection
Unified Detection
2. Unified Detection
Neural network
Predict all bounding boxes for an image simultaneously
Uses features from the entire image
각 박스마다
X,y,w,h,confidence
각 cell마다 클래스
class-specific confidence scores for each box.
NMS 등으로 처리한 후.
2. Unified Detection
2.1. Design
Neural network
Predict all bounding boxes for an image simultaneously
Uses features from the entire image
2.1. Design
Fully
connected
layers
Extract features from the image Predict the output probabilities and coordinates
Convolutional layers of the network
GoogLeNet Inception module
2.1. Design
GoogLeNet model
Fully
connected
layers
Extract features from the image Predict the output probabilities and coordinates
Convolutional layers of the network
24 2
2.1. Design
YOLO model
2.1. Design
YOLO model
뒤에 training 에서 설명
x
y
왜 이런 activation function?
Relu는 node의 output이 0미만이면
아예 0으로 수렴 해당 노드 다음부터는
고려아예 안함.
0미만이어도 조금의 영향을 살려준다.
Fully
connected
layers
Extract features from the image Predict the output probabilities and coordinates
Convolutional layers of the network
9 2
2.1. Design
Fast YOLO model
2.2. Training
classification
Accuracy of 88 %
1) Pretrain classification
detection
2.2. Training
2) detection
Pretrained model for classification +4 conv lay, 2 FC
Error를 어떤식으로 계산해서 Training하는가?
2.2. Training
Use sum-squared error  easy to optimize
x ,y ,w ,h ,confidence scores, conditional class probabilities
targetPrediction
S x S x (B x 5 + C) S x S x (B x 5 + C)
Normalize 시켰기 때문에 각 value의 error weight 는 현재 같다.
2.2. Training
𝑖=0
𝑆2
𝑗=0
𝐵
ℝ𝑖𝑗
𝑜𝑏𝑗 𝑜𝑟 𝑛𝑜𝑜𝑏𝑗
(𝑥𝑖 − 𝑥𝑖)2
+ (𝑦𝑖 − 𝑦𝑖)2
+
𝑖=0
𝑆2
𝑗=0
𝐵
ℝ𝑖𝑗
𝑜𝑏𝑗 𝑜𝑟 𝑛𝑜𝑜𝑏𝑗
(𝑤𝑖 − 𝑤𝑖)2
+ (ℎ𝑖 − ℎ)2
+
𝑖=0
𝑆2
𝑗=0
𝐵
ℝ𝑖𝑗
𝑜𝑏𝑗 𝑜𝑟 𝑛𝑜𝑜𝑏𝑗
(𝐶𝑖 − 𝐶𝑖)2
+
𝑖=0
𝑆2
ℝ𝑖𝑗
𝑜𝑏𝑗 𝑜𝑟 𝑛𝑜𝑜𝑏𝑗
𝑐∈𝑐𝑙𝑎𝑠𝑠𝑒𝑠
(𝑝𝑖(𝑐) − 𝑝𝑖(𝑐))2
다음과 같이 sum-squared error 계산?
(대략적인 모습)
1) Localization error와 classification error의 Weight가 같다
Localization error, classification error weight를 다르게 준다.
• Weight가 같은게 왜 문제일까?
1) 이미 classificatio은 pre-trained 되어있기 때문에?
2) localization의 변수는 더 미세한 조절이 필요하다?
• 비율을 어떻게 줄까?
 ????
sum-squared error 이 training에서 갖는 단점
2.2. Training
2) Object가 없는 grid cell들이 많다. 이 셀들의 confidence score  0.
Object가 없는 cell들의 gradient 에 기울어진 방향으로 training 된다.
obj와 noobj cell 구분하여 confidence score 반영해야한다.
sum-squared error 이 training에서 갖는 단점
2.2. Training
target prediction
1 0.9
0.9
0.1
0.1
0.2
0.2
target
3) Box의 크기와 상관없이 error 계산.
weight : 큰 box에서의 작은 차이 < 작은 box에서의 작은 차이
prediction
sum-squared error 이 training에서 갖는 단점
2.2. Training
prediction
target
1
영향력
<<
1) Coordinate prediction(localization error)의 loss weight을 증가
2) noobj 의 confidence score의 loss weight을 감소
Localization error weight : classification error weight = 5 : 1
Obj confidence score error weight : noObj confidence score error weight = 1 : 0.5
3) Squre root of the bounding box width and height.
sum-squared error 이 training에서 갖는 단점 보완 방법
2.2. Training
𝑖=0
𝑆2
𝑗=0
𝐵
ℝ𝑖𝑗
𝑜𝑏𝑗 𝑜𝑟 𝑛𝑜𝑜𝑏𝑗
(𝑥𝑖 − 𝑥𝑖)2
+ (𝑦𝑖 − 𝑦𝑖)2
+
𝑖=0
𝑆2
𝑗=0
𝐵
ℝ𝑖𝑗
𝑜𝑏𝑗 𝑜𝑟 𝑛𝑜𝑜𝑏𝑗
(𝑤𝑖 − 𝑤𝑖)2
+ (ℎ𝑖 − ℎ)2
+
𝑖=0
𝑆2
𝑗=0
𝐵
ℝ𝑖𝑗
𝑜𝑏𝑗 𝑜𝑟 𝑛𝑜𝑜𝑏𝑗
(𝐶𝑖 − 𝐶𝑖)2
+
𝑖=0
𝑆2
ℝ𝑖𝑗
𝑜𝑏𝑗 𝑜𝑟 𝑛𝑜𝑜𝑏𝑗
𝑐∈𝑐𝑙𝑎𝑠𝑠𝑒𝑠
(𝑝𝑖(𝑐) − 𝑝𝑖(𝑐))2
2.2. Training
Training을 위한 error 계산시 고려하는 것.
1. Object가 있는 cell에 대해서만 classification error 줄이는 방향.
2. Responsible한 bounding box coordinate 에 대해서만 error 줄이는 방향
Training이 향하는 방향
2.2. Training
Image > grid cell > bounding boxes
Training이 향하는 방향
2.2. Training
Image > grid cell > bounding boxes
Training이 향하는 방향
2.2. Training
2.2. Training
What have done….
75 105 135
0.0001
0.001
0.00001
1) Learning rate
To avoid overfitting…
2) dropout( rate =0.5)
3) Data augmentation (scaling, translation ,exposure , saturation)
2.2. Training
What have done….
2.4. Limitations of YOLO
(1) Strong spatial constraints on bounding box predictions
Each grid cell only predicts two boxes and can only have one class.
For door
This spatial constraint limits the number of nearby objects that our model can predict.
Our model struggles with small objects that appear in groups
For motocycle
2.4. Limitations of YOLO
(2) Struggles to generalize to objects in new or unusual aspect ratios or configurations.
Our model learns to predict bounding boxes from data.
???training Unusual aspect ratio
2.4. Limitations of YOLO
(3) Loss function treats errors the same in small bounding boxes versus large bounding boxes.
???
Width,heigh이 아니고,
IOU에 대해서 Large , small box 고려안함
DPM R-CNN Other Fast Detectors
(FAST R-CNN,FASTER R-
CNN)
Deep multi-box OverFeat MultiGrasp
• Sliding window
• Extract features :disjoint pipeline
1. extract static features
2. classify regions
3. predict bounding boxes
…
• Find object : Region
proposals.
1. Selective search
2. CNN
3. SVM
4. NMS
disjoint
Selective search (x)
Sharing computation
and using neural
network region
proposal
Single object detection Efficiently performs
sliding window
detection.
Disjoint system.
Grid approach to
bounding box
predictions is based on
the MultiGrasp system
for regression to grasps.
Single convolutional neural network • Each grid cell proposes
a potential bounding
boxes and scores using
CNN.
• Bounding box 98 : 2000
• Single, jointly optimized
model
• real-time
• gene
• CNN to predict
bounding boxes in
and image
3. Comparisons to Other Detection Systems
Mean Average Precision(mAP)
4. Experiments
• Fast YOLO is the fastest object detection method On PASCAL.
• YOLO is 10mAP more accurate than the fast version
While still well above real-time in speed
• R-CNN –R : Selective Search(x)  static bounding box proposal
• Fast R-CNN : Relies on Selective Search slow
• Faster RNN :
Selective Search(x) neural network to propose bounding boxes
4.2. VOC 2007 Error Analysis
YOLO vs Fast R-CNN
Each prediction is either correct Or error.
we look at the top N predictions for that
category.(top N detections)
• YOLO struggles to localize objects correctly.
• Fast RCNN is almost 3x more likely to predict background error. false positive. (objec가 없는데 있다고 했다)
4.3. Combining Fast R-CNN and YOLO
YOLO & Fast R-CNN
YOLO 랑 combine 한것이 단순히 ensemble의 효과는 아니다. 왜냐하면 다른 Fast RCNN과 ensemble 했을 때에 비해 조금 더 많은 benefit을 갖고있다.
 YOLO makes different kinds of mistakes at test time  effective at boosting Fast R-CNN
 benefit from the speed of YOLO (x), why not? We run each model separately.
Give that prediction a boost
(YOLO 의 probability,
overlap between the two box 고려)
R-CNN Every proposal box
predicts a similar box?YOLO
Better localization
Better background
yes No 라면 background일 확률 높다?
YOLO에서 거른다?
4.4. VOC 2012 Results
4.5. Generalizability
We want our detector to learn robust representations of visual concepts
so that they can generalize to new domains or unexpected input at test time.
Picasso Dataset People-Art Dataset
• In real-word applications, it is hard to predict all possible use cases.
• Test data can diverge from wat the system has seen before.
.training
VOC data
Test For generalizing detection to artwork.
Detect people
Natural image Art image
4.5. Generalizability
• YOLO : AP degrades less than other methods
• R-CNN : highly overfit to PASCAL VOC.
• Selective Search : highly tuned for natural image
• Only sees small regions
• DPM : AP degrades less than other methods.
But it starts from a lower AP.
Artwork and natural images are very
different on a pixel level But they are similar
in terms of the size and shape of objects.
5. Real-Time Detection In The Wild
• When attachaed to a webcam it functions like a traking
system.
• Demo is available.
6. Conclusion
• A Unified model for object detection
• Simple to construct
• Can be trained directly on full images.
• Generalizes well to new domains

More Related Content

What's hot

Deep learning for object detection
Deep learning for object detectionDeep learning for object detection
Deep learning for object detectionWenjing Chen
 
YOLOv4: optimal speed and accuracy of object detection review
YOLOv4: optimal speed and accuracy of object detection reviewYOLOv4: optimal speed and accuracy of object detection review
YOLOv4: optimal speed and accuracy of object detection reviewLEE HOSEONG
 
Real-time object detection coz YOLO!
Real-time object detection coz YOLO!Real-time object detection coz YOLO!
Real-time object detection coz YOLO!J On The Beach
 
Faster R-CNN - PR012
Faster R-CNN - PR012Faster R-CNN - PR012
Faster R-CNN - PR012Jinwon Lee
 
Yolo v2 ai_tech_20190421
Yolo v2 ai_tech_20190421Yolo v2 ai_tech_20190421
Yolo v2 ai_tech_20190421穗碧 陳
 
PR-207: YOLOv3: An Incremental Improvement
PR-207: YOLOv3: An Incremental ImprovementPR-207: YOLOv3: An Incremental Improvement
PR-207: YOLOv3: An Incremental ImprovementJinwon Lee
 
Tutorial on Object Detection (Faster R-CNN)
Tutorial on Object Detection (Faster R-CNN)Tutorial on Object Detection (Faster R-CNN)
Tutorial on Object Detection (Faster R-CNN)Hwa Pyung Kim
 
Recent Progress on Object Detection_20170331
Recent Progress on Object Detection_20170331Recent Progress on Object Detection_20170331
Recent Progress on Object Detection_20170331Jihong Kang
 
Microsoft COCO: Common Objects in Context
Microsoft COCO: Common Objects in Context Microsoft COCO: Common Objects in Context
Microsoft COCO: Common Objects in Context KhalidKhan412
 
Object Detection using Deep Neural Networks
Object Detection using Deep Neural NetworksObject Detection using Deep Neural Networks
Object Detection using Deep Neural NetworksUsman Qayyum
 
Deep learning based object detection
Deep learning based object detectionDeep learning based object detection
Deep learning based object detectionchettykulkarni
 
Single Shot Multibox Detector
Single Shot Multibox DetectorSingle Shot Multibox Detector
Single Shot Multibox DetectorNamHyuk Ahn
 
Object Detection Using R-CNN Deep Learning Framework
Object Detection Using R-CNN Deep Learning FrameworkObject Detection Using R-CNN Deep Learning Framework
Object Detection Using R-CNN Deep Learning FrameworkNader Karimi
 

What's hot (20)

Yolo releases gianmaria
Yolo releases gianmariaYolo releases gianmaria
Yolo releases gianmaria
 
Deep learning for object detection
Deep learning for object detectionDeep learning for object detection
Deep learning for object detection
 
YOLO v1
YOLO v1YOLO v1
YOLO v1
 
YOLOv4: optimal speed and accuracy of object detection review
YOLOv4: optimal speed and accuracy of object detection reviewYOLOv4: optimal speed and accuracy of object detection review
YOLOv4: optimal speed and accuracy of object detection review
 
Real-time object detection coz YOLO!
Real-time object detection coz YOLO!Real-time object detection coz YOLO!
Real-time object detection coz YOLO!
 
Faster R-CNN - PR012
Faster R-CNN - PR012Faster R-CNN - PR012
Faster R-CNN - PR012
 
YOLO
YOLOYOLO
YOLO
 
Yolo v2 ai_tech_20190421
Yolo v2 ai_tech_20190421Yolo v2 ai_tech_20190421
Yolo v2 ai_tech_20190421
 
Yol ov2
Yol ov2Yol ov2
Yol ov2
 
PR-207: YOLOv3: An Incremental Improvement
PR-207: YOLOv3: An Incremental ImprovementPR-207: YOLOv3: An Incremental Improvement
PR-207: YOLOv3: An Incremental Improvement
 
Yolo
YoloYolo
Yolo
 
Tutorial on Object Detection (Faster R-CNN)
Tutorial on Object Detection (Faster R-CNN)Tutorial on Object Detection (Faster R-CNN)
Tutorial on Object Detection (Faster R-CNN)
 
Yolov3
Yolov3Yolov3
Yolov3
 
Recent Progress on Object Detection_20170331
Recent Progress on Object Detection_20170331Recent Progress on Object Detection_20170331
Recent Progress on Object Detection_20170331
 
Microsoft COCO: Common Objects in Context
Microsoft COCO: Common Objects in Context Microsoft COCO: Common Objects in Context
Microsoft COCO: Common Objects in Context
 
Object Detection using Deep Neural Networks
Object Detection using Deep Neural NetworksObject Detection using Deep Neural Networks
Object Detection using Deep Neural Networks
 
Deep learning based object detection
Deep learning based object detectionDeep learning based object detection
Deep learning based object detection
 
Single Shot Multibox Detector
Single Shot Multibox DetectorSingle Shot Multibox Detector
Single Shot Multibox Detector
 
Object detection
Object detectionObject detection
Object detection
 
Object Detection Using R-CNN Deep Learning Framework
Object Detection Using R-CNN Deep Learning FrameworkObject Detection Using R-CNN Deep Learning Framework
Object Detection Using R-CNN Deep Learning Framework
 

Similar to You only look once

Codetecon #KRK 3 - Object detection with Deep Learning
Codetecon #KRK 3 - Object detection with Deep LearningCodetecon #KRK 3 - Object detection with Deep Learning
Codetecon #KRK 3 - Object detection with Deep LearningMatthew Opala
 
MLIP - Chapter 5 - Detection, Segmentation, Captioning
MLIP - Chapter 5 - Detection, Segmentation, CaptioningMLIP - Chapter 5 - Detection, Segmentation, Captioning
MLIP - Chapter 5 - Detection, Segmentation, CaptioningCharles Deledalle
 
IISc Internship Report
IISc Internship ReportIISc Internship Report
IISc Internship ReportHarshilJain26
 
IRJET - Real Time Object Detection using YOLOv3
IRJET - Real Time Object Detection using YOLOv3IRJET - Real Time Object Detection using YOLOv3
IRJET - Real Time Object Detection using YOLOv3IRJET Journal
 
Object Detection with Deep Learning - Xavier Giro-i-Nieto - UPC School Barcel...
Object Detection with Deep Learning - Xavier Giro-i-Nieto - UPC School Barcel...Object Detection with Deep Learning - Xavier Giro-i-Nieto - UPC School Barcel...
Object Detection with Deep Learning - Xavier Giro-i-Nieto - UPC School Barcel...Universitat Politècnica de Catalunya
 
Top object detection algorithms in deep neural networks
Top object detection algorithms in deep neural networksTop object detection algorithms in deep neural networks
Top object detection algorithms in deep neural networksApuChandraw
 
Computer Vision: Visual Extent of an Object
Computer Vision: Visual Extent of an ObjectComputer Vision: Visual Extent of an Object
Computer Vision: Visual Extent of an ObjectIOSR Journals
 
Lecture 2.B: Computer Vision Applications - Full Stack Deep Learning - Spring...
Lecture 2.B: Computer Vision Applications - Full Stack Deep Learning - Spring...Lecture 2.B: Computer Vision Applications - Full Stack Deep Learning - Spring...
Lecture 2.B: Computer Vision Applications - Full Stack Deep Learning - Spring...Sergey Karayev
 
Puzzle-Based Automatic Testing: Bringing Humans Into the Loop by Solving Puzz...
Puzzle-Based Automatic Testing: Bringing Humans Into the Loop by Solving Puzz...Puzzle-Based Automatic Testing: Bringing Humans Into the Loop by Solving Puzz...
Puzzle-Based Automatic Testing: Bringing Humans Into the Loop by Solving Puzz...Sung Kim
 
NBDT : Neural-backed Decision Tree 2021 ICLR
 NBDT : Neural-backed Decision Tree 2021 ICLR NBDT : Neural-backed Decision Tree 2021 ICLR
NBDT : Neural-backed Decision Tree 2021 ICLRtaeseon ryu
 
Convolutional Neural Network - CNN | How CNN Works | Deep Learning Course | S...
Convolutional Neural Network - CNN | How CNN Works | Deep Learning Course | S...Convolutional Neural Network - CNN | How CNN Works | Deep Learning Course | S...
Convolutional Neural Network - CNN | How CNN Works | Deep Learning Course | S...Simplilearn
 
#10 pydata warsaw object detection with dn ns
#10   pydata warsaw object detection with dn ns#10   pydata warsaw object detection with dn ns
#10 pydata warsaw object detection with dn nsAndrew Brozek
 
“Understanding DNN-Based Object Detectors,” a Presentation from Au-Zone Techn...
“Understanding DNN-Based Object Detectors,” a Presentation from Au-Zone Techn...“Understanding DNN-Based Object Detectors,” a Presentation from Au-Zone Techn...
“Understanding DNN-Based Object Detectors,” a Presentation from Au-Zone Techn...Edge AI and Vision Alliance
 
20190927 generative models_aia
20190927 generative models_aia20190927 generative models_aia
20190927 generative models_aiaYi-Fan Liou
 
Anomaly detection using deep one class classifier
Anomaly detection using deep one class classifierAnomaly detection using deep one class classifier
Anomaly detection using deep one class classifier홍배 김
 
AI BASED WILD BOAR DETECTION1.pptx
AI BASED WILD BOAR DETECTION1.pptxAI BASED WILD BOAR DETECTION1.pptx
AI BASED WILD BOAR DETECTION1.pptxSHABANASALIM4
 
object detection paper review
object detection paper reviewobject detection paper review
object detection paper reviewYoonho Na
 
Andrii Belas "Overview of object detection approaches: cases, algorithms and...
Andrii Belas  "Overview of object detection approaches: cases, algorithms and...Andrii Belas  "Overview of object detection approaches: cases, algorithms and...
Andrii Belas "Overview of object detection approaches: cases, algorithms and...Lviv Startup Club
 
PR-132: SSD: Single Shot MultiBox Detector
PR-132: SSD: Single Shot MultiBox DetectorPR-132: SSD: Single Shot MultiBox Detector
PR-132: SSD: Single Shot MultiBox DetectorJinwon Lee
 
Presentation Journée Thématique - Fouille de Grands Graphes
Presentation Journée Thématique - Fouille de Grands GraphesPresentation Journée Thématique - Fouille de Grands Graphes
Presentation Journée Thématique - Fouille de Grands GraphesJuan David Cruz-Gómez
 

Similar to You only look once (20)

Codetecon #KRK 3 - Object detection with Deep Learning
Codetecon #KRK 3 - Object detection with Deep LearningCodetecon #KRK 3 - Object detection with Deep Learning
Codetecon #KRK 3 - Object detection with Deep Learning
 
MLIP - Chapter 5 - Detection, Segmentation, Captioning
MLIP - Chapter 5 - Detection, Segmentation, CaptioningMLIP - Chapter 5 - Detection, Segmentation, Captioning
MLIP - Chapter 5 - Detection, Segmentation, Captioning
 
IISc Internship Report
IISc Internship ReportIISc Internship Report
IISc Internship Report
 
IRJET - Real Time Object Detection using YOLOv3
IRJET - Real Time Object Detection using YOLOv3IRJET - Real Time Object Detection using YOLOv3
IRJET - Real Time Object Detection using YOLOv3
 
Object Detection with Deep Learning - Xavier Giro-i-Nieto - UPC School Barcel...
Object Detection with Deep Learning - Xavier Giro-i-Nieto - UPC School Barcel...Object Detection with Deep Learning - Xavier Giro-i-Nieto - UPC School Barcel...
Object Detection with Deep Learning - Xavier Giro-i-Nieto - UPC School Barcel...
 
Top object detection algorithms in deep neural networks
Top object detection algorithms in deep neural networksTop object detection algorithms in deep neural networks
Top object detection algorithms in deep neural networks
 
Computer Vision: Visual Extent of an Object
Computer Vision: Visual Extent of an ObjectComputer Vision: Visual Extent of an Object
Computer Vision: Visual Extent of an Object
 
Lecture 2.B: Computer Vision Applications - Full Stack Deep Learning - Spring...
Lecture 2.B: Computer Vision Applications - Full Stack Deep Learning - Spring...Lecture 2.B: Computer Vision Applications - Full Stack Deep Learning - Spring...
Lecture 2.B: Computer Vision Applications - Full Stack Deep Learning - Spring...
 
Puzzle-Based Automatic Testing: Bringing Humans Into the Loop by Solving Puzz...
Puzzle-Based Automatic Testing: Bringing Humans Into the Loop by Solving Puzz...Puzzle-Based Automatic Testing: Bringing Humans Into the Loop by Solving Puzz...
Puzzle-Based Automatic Testing: Bringing Humans Into the Loop by Solving Puzz...
 
NBDT : Neural-backed Decision Tree 2021 ICLR
 NBDT : Neural-backed Decision Tree 2021 ICLR NBDT : Neural-backed Decision Tree 2021 ICLR
NBDT : Neural-backed Decision Tree 2021 ICLR
 
Convolutional Neural Network - CNN | How CNN Works | Deep Learning Course | S...
Convolutional Neural Network - CNN | How CNN Works | Deep Learning Course | S...Convolutional Neural Network - CNN | How CNN Works | Deep Learning Course | S...
Convolutional Neural Network - CNN | How CNN Works | Deep Learning Course | S...
 
#10 pydata warsaw object detection with dn ns
#10   pydata warsaw object detection with dn ns#10   pydata warsaw object detection with dn ns
#10 pydata warsaw object detection with dn ns
 
“Understanding DNN-Based Object Detectors,” a Presentation from Au-Zone Techn...
“Understanding DNN-Based Object Detectors,” a Presentation from Au-Zone Techn...“Understanding DNN-Based Object Detectors,” a Presentation from Au-Zone Techn...
“Understanding DNN-Based Object Detectors,” a Presentation from Au-Zone Techn...
 
20190927 generative models_aia
20190927 generative models_aia20190927 generative models_aia
20190927 generative models_aia
 
Anomaly detection using deep one class classifier
Anomaly detection using deep one class classifierAnomaly detection using deep one class classifier
Anomaly detection using deep one class classifier
 
AI BASED WILD BOAR DETECTION1.pptx
AI BASED WILD BOAR DETECTION1.pptxAI BASED WILD BOAR DETECTION1.pptx
AI BASED WILD BOAR DETECTION1.pptx
 
object detection paper review
object detection paper reviewobject detection paper review
object detection paper review
 
Andrii Belas "Overview of object detection approaches: cases, algorithms and...
Andrii Belas  "Overview of object detection approaches: cases, algorithms and...Andrii Belas  "Overview of object detection approaches: cases, algorithms and...
Andrii Belas "Overview of object detection approaches: cases, algorithms and...
 
PR-132: SSD: Single Shot MultiBox Detector
PR-132: SSD: Single Shot MultiBox DetectorPR-132: SSD: Single Shot MultiBox Detector
PR-132: SSD: Single Shot MultiBox Detector
 
Presentation Journée Thématique - Fouille de Grands Graphes
Presentation Journée Thématique - Fouille de Grands GraphesPresentation Journée Thématique - Fouille de Grands Graphes
Presentation Journée Thématique - Fouille de Grands Graphes
 

Recently uploaded

Class 11 Legal Studies Ch-1 Concept of State .pdf
Class 11 Legal Studies Ch-1 Concept of State .pdfClass 11 Legal Studies Ch-1 Concept of State .pdf
Class 11 Legal Studies Ch-1 Concept of State .pdfakmcokerachita
 
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...Marc Dusseiller Dusjagr
 
Sanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdfSanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdfsanyamsingh5019
 
Concept of Vouching. B.Com(Hons) /B.Compdf
Concept of Vouching. B.Com(Hons) /B.CompdfConcept of Vouching. B.Com(Hons) /B.Compdf
Concept of Vouching. B.Com(Hons) /B.CompdfUmakantAnnand
 
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptxPOINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptxSayali Powar
 
URLs and Routing in the Odoo 17 Website App
URLs and Routing in the Odoo 17 Website AppURLs and Routing in the Odoo 17 Website App
URLs and Routing in the Odoo 17 Website AppCeline George
 
Introduction to ArtificiaI Intelligence in Higher Education
Introduction to ArtificiaI Intelligence in Higher EducationIntroduction to ArtificiaI Intelligence in Higher Education
Introduction to ArtificiaI Intelligence in Higher Educationpboyjonauth
 
Solving Puzzles Benefits Everyone (English).pptx
Solving Puzzles Benefits Everyone (English).pptxSolving Puzzles Benefits Everyone (English).pptx
Solving Puzzles Benefits Everyone (English).pptxOH TEIK BIN
 
Interactive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communicationInteractive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communicationnomboosow
 
mini mental status format.docx
mini    mental       status     format.docxmini    mental       status     format.docx
mini mental status format.docxPoojaSen20
 
Enzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdf
Enzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdfEnzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdf
Enzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdfSumit Tiwari
 
Alper Gobel In Media Res Media Component
Alper Gobel In Media Res Media ComponentAlper Gobel In Media Res Media Component
Alper Gobel In Media Res Media ComponentInMediaRes1
 
Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111Sapana Sha
 
Mastering the Unannounced Regulatory Inspection
Mastering the Unannounced Regulatory InspectionMastering the Unannounced Regulatory Inspection
Mastering the Unannounced Regulatory InspectionSafetyChain Software
 
A Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy ReformA Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy ReformChameera Dedduwage
 
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...EduSkills OECD
 

Recently uploaded (20)

9953330565 Low Rate Call Girls In Rohini Delhi NCR
9953330565 Low Rate Call Girls In Rohini  Delhi NCR9953330565 Low Rate Call Girls In Rohini  Delhi NCR
9953330565 Low Rate Call Girls In Rohini Delhi NCR
 
Class 11 Legal Studies Ch-1 Concept of State .pdf
Class 11 Legal Studies Ch-1 Concept of State .pdfClass 11 Legal Studies Ch-1 Concept of State .pdf
Class 11 Legal Studies Ch-1 Concept of State .pdf
 
Staff of Color (SOC) Retention Efforts DDSD
Staff of Color (SOC) Retention Efforts DDSDStaff of Color (SOC) Retention Efforts DDSD
Staff of Color (SOC) Retention Efforts DDSD
 
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
 
Sanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdfSanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdf
 
Concept of Vouching. B.Com(Hons) /B.Compdf
Concept of Vouching. B.Com(Hons) /B.CompdfConcept of Vouching. B.Com(Hons) /B.Compdf
Concept of Vouching. B.Com(Hons) /B.Compdf
 
Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝
 
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptxPOINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
 
URLs and Routing in the Odoo 17 Website App
URLs and Routing in the Odoo 17 Website AppURLs and Routing in the Odoo 17 Website App
URLs and Routing in the Odoo 17 Website App
 
Introduction to ArtificiaI Intelligence in Higher Education
Introduction to ArtificiaI Intelligence in Higher EducationIntroduction to ArtificiaI Intelligence in Higher Education
Introduction to ArtificiaI Intelligence in Higher Education
 
Solving Puzzles Benefits Everyone (English).pptx
Solving Puzzles Benefits Everyone (English).pptxSolving Puzzles Benefits Everyone (English).pptx
Solving Puzzles Benefits Everyone (English).pptx
 
Interactive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communicationInteractive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communication
 
TataKelola dan KamSiber Kecerdasan Buatan v022.pdf
TataKelola dan KamSiber Kecerdasan Buatan v022.pdfTataKelola dan KamSiber Kecerdasan Buatan v022.pdf
TataKelola dan KamSiber Kecerdasan Buatan v022.pdf
 
mini mental status format.docx
mini    mental       status     format.docxmini    mental       status     format.docx
mini mental status format.docx
 
Enzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdf
Enzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdfEnzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdf
Enzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdf
 
Alper Gobel In Media Res Media Component
Alper Gobel In Media Res Media ComponentAlper Gobel In Media Res Media Component
Alper Gobel In Media Res Media Component
 
Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111
 
Mastering the Unannounced Regulatory Inspection
Mastering the Unannounced Regulatory InspectionMastering the Unannounced Regulatory Inspection
Mastering the Unannounced Regulatory Inspection
 
A Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy ReformA Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy Reform
 
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
 

You only look once

  • 1. YOLOYou Only Look Once: Unified, Real-Time Object Detection Joseph Redmon, Santosh Divvala, Ross Cirshick, Ali Farhadi 20180125 이진경
  • 2. You Only Look Once: Unified, Real-Time Object Detection 무엇을 한번 보나? 어떤 과정들이 unified? 어떻게 unify시켰지? 어떤 구조일까? Real-time 이면 어느정도 속도? 정확도는 과연 높을까?
  • 4. Faster R-CNN vs YOLO Faster R-CNN YOLO
  • 5. Introduction • Deformable Part Models (DPM) : sliding window approach • R-CNN : region proposal methods Current detection system
  • 6. Sliding window Region proposal methods. Complex pipelines slow hard to optimize Introduction Current detection system Why? Each individual component must be trained separately
  • 7. Introduction YOLO : object detection as a single regression problem. Neural network • Bounding box coordinates • Class probabilities Single regression You Only Look Once at an image to predict what objects are present and where they are. Directly optimize
  • 9. Unified Detection : Unify the separate components of object detection into a single neural network. Neural network Predict all bounding boxes for an image simultaneously Uses features from the entire image • End-to-end training • Real-time speeds 기존의 방법과 무엇이 다른가?
  • 10. Unified Detection 결과가 어떤 형태로 나오길래 다음과 같이 Bounding box와 각 bounding box의 class를 동시에 알 수 있을까? Neural network Predict all bounding boxes for an image simultaneously Uses features from the entire image
  • 11. 1) Divide the input image into a S*S grid. Image > grid cell • YOLO 에서는 detection을 위해 image 를 N*N으로 쪼갠 grid라는 단위를 이용한다. • 예시에서의 S=7 Unified Detection
  • 12. 2) Each grid cell predicts B bounding boxes and confidence scores for detecting that object. • cell 내부에 중심을 두는 bounding box를 B개 propose. • B개의 box 에서 confidence scores. Image > grid cell > bounding boxes * If no object exists in that cell, the confidence scores should be zero. * 𝐵는 파라미터로 줄수있고, 이 예시에서 B=2. Confidence scores reflect 1. how confident the model is that the box contains an object. 2. How accurate it thinks the box is that it predicts. Unified Detection
  • 13. Image > grid cell > bounding boxes * If no object exists in that cell, the confidence scores should be zero. * 𝐵는 파라미터로 줄수있고, 이 예시에서 B=2. Confidence scores reflect 1. how confident the model is that the box contains an object. 2. How accurate it thinks the box is that it predicts. 2) Each grid cell predicts B bounding boxes and confidence scores for detecting that object. • cell 내부에 중심을 두는 bounding box를 B개 propose. • B개의 box 에서 confidence scores Box를 어떤 형태로 predict? 어떻게 계산? Unified Detection
  • 14. Image > grid cell > bounding boxes Each bounding box consists of 5 predictions: 𝒙, 𝒚, 𝒘, 𝒉 and 𝒄𝒐𝒏𝒇𝒊𝒅𝒆𝒏𝒄𝒆. • (𝒙, 𝒚) 𝒄𝒐𝒐𝒓𝒅𝒊𝒏𝒂𝒕𝒆𝒔 represent the center of the box relative to the bounds of the grid cell. • The 𝒘𝒊𝒅𝒕𝒉(𝒘) and 𝒉𝒆𝒊𝒈𝒉𝒕(𝒉) are predicted relative to the whole image • The 𝒄𝒐𝒏𝒇𝒊𝒅𝒆𝒏𝒄𝒆 𝒑𝒓𝒆𝒅𝒊𝒄𝒕𝒊𝒐𝒏 represents the IOU between the predicted box and any ground truth box. *predicted box and any ground truth box Unified Detection 2) Each grid cell predicts B bounding boxes and confidence scores for detecting that object. 𝒙, 𝒚, 𝒘, 𝒉 and 𝒄𝒐𝒏𝒇𝒊𝒅𝒆𝒏𝒄𝒆 같은 경우는 아직 blackbox인 neural network를 통해서 나올것이다.
  • 15. IOU(intersection over union) Confidence scores reflect 1. how confident the model is that the box contains an object. 2. How accurate it thinks the box is that it predicts. Unified Detection
  • 16. Image > grid cell > bounding boxes Each bounding box consists of 5 predictions: 𝒙, 𝒚, 𝒘, 𝒉 and 𝒄𝒐𝒏𝒇𝒊𝒅𝒆𝒏𝒄𝒆. • (𝒙, 𝒚) 𝒄𝒐𝒐𝒓𝒅𝒊𝒏𝒂𝒕𝒆𝒔 represent the center of the box relative to the bounds of the grid cell. • The 𝒘𝒊𝒅𝒕𝒉(𝒘) and 𝒉𝒆𝒊𝒈𝒉𝒕(𝒉) are predicted relative to the whole image • The 𝒄𝒐𝒏𝒇𝒊𝒅𝒆𝒏𝒄𝒆 𝒑𝒓𝒆𝒅𝒊𝒄𝒕𝒊𝒐𝒏 represents the IOU between the predicted box and any ground truth box. *predicted box and any ground truth box Unified Detection (0.3, 0.2, 0.17, 0.76, 0.91) 2) Each grid cell predicts B bounding boxes and confidence scores for detecting that object. 𝒙, 𝒚, 𝒘, 𝒉 and 𝒄𝒐𝒏𝒇𝒊𝒅𝒆𝒏𝒄𝒆 같은 경우는 아직 blackbox인 neural network를 통해서 나올것이다.
  • 17. Image > grid cell > bounding boxes Each grid cell also predicts C conditional class probabilities. 박스 안에 object가 존재한다면, i 클래스 일 확률은? Conditional class probability 같은 경우는 아직 blackbox인 neural network를 통해서 나올것이다. We only predict one set of class probabilities per grid cell, regardless of the number of boxes 𝐵. 그 cell안에 중심이 있는 object가 여러 개일 경우에도, 하나의 class로만 정의. 2) Each grid cell predicts B bounding boxes and confidence scores for detecting that object. Cell의 크기를 줄인다면 ? Object 여러 개 찾을 수 있다. 계산량 많아짐. 속도 느려짐 Cell의 크기를 늘린다면 ? Bounding box 개수 줄어듬. 찾을 수 있는 object 개수 적어짐. Unified Detection Confidence scores reflect 1. how confident the model is that the box contains an object. 2. How accurate it thinks the box is that it predicts.
  • 18. Each bounding box consists of 5 predictions: 𝒙, 𝒚, 𝒘, 𝒉 and 𝒄𝒐𝒏𝒇𝒊𝒅𝒆𝒏𝒄𝒆. Each grid cell also predicts C conditional class probabilities. 현재까지 계산한 것 이제 계산해야 되는것 how confident the model is that the box contains an object. How accurate it thinks the box is that it predicts.
  • 22. Unified Detection These predictions are encoded as an S x S x (B x 5 + C) tensor. Ex) 7*7*30
  • 23. Neural network Predict all bounding boxes for an image simultaneously Uses features from the entire image
  • 24. 2) Each grid cell predicts B bounding boxes and confidence scores for detecting that object. Image > grid cell > bounding boxes class-specific confidence scores for each box = (conditional class probabilities) x (the individual box confidence predictions) class-specific confidence scores for each box. Confidence scores reflect 1. how confident the model is that the box contains an object. 2. How accurate it thinks the box is that it predicts. Unified Detection
  • 38. Neural network Predict all bounding boxes for an image simultaneously Uses features from the entire image
  • 39. 각 박스마다 X,y,w,h,confidence 각 cell마다 클래스 class-specific confidence scores for each box. NMS 등으로 처리한 후. 2. Unified Detection
  • 40. 2.1. Design Neural network Predict all bounding boxes for an image simultaneously Uses features from the entire image
  • 41. 2.1. Design Fully connected layers Extract features from the image Predict the output probabilities and coordinates Convolutional layers of the network
  • 42. GoogLeNet Inception module 2.1. Design GoogLeNet model
  • 43. Fully connected layers Extract features from the image Predict the output probabilities and coordinates Convolutional layers of the network 24 2 2.1. Design YOLO model
  • 44. 2.1. Design YOLO model 뒤에 training 에서 설명
  • 45. x y 왜 이런 activation function? Relu는 node의 output이 0미만이면 아예 0으로 수렴 해당 노드 다음부터는 고려아예 안함. 0미만이어도 조금의 영향을 살려준다.
  • 46. Fully connected layers Extract features from the image Predict the output probabilities and coordinates Convolutional layers of the network 9 2 2.1. Design Fast YOLO model
  • 47. 2.2. Training classification Accuracy of 88 % 1) Pretrain classification
  • 48. detection 2.2. Training 2) detection Pretrained model for classification +4 conv lay, 2 FC
  • 49. Error를 어떤식으로 계산해서 Training하는가? 2.2. Training
  • 50. Use sum-squared error  easy to optimize x ,y ,w ,h ,confidence scores, conditional class probabilities targetPrediction S x S x (B x 5 + C) S x S x (B x 5 + C) Normalize 시켰기 때문에 각 value의 error weight 는 현재 같다. 2.2. Training 𝑖=0 𝑆2 𝑗=0 𝐵 ℝ𝑖𝑗 𝑜𝑏𝑗 𝑜𝑟 𝑛𝑜𝑜𝑏𝑗 (𝑥𝑖 − 𝑥𝑖)2 + (𝑦𝑖 − 𝑦𝑖)2 + 𝑖=0 𝑆2 𝑗=0 𝐵 ℝ𝑖𝑗 𝑜𝑏𝑗 𝑜𝑟 𝑛𝑜𝑜𝑏𝑗 (𝑤𝑖 − 𝑤𝑖)2 + (ℎ𝑖 − ℎ)2 + 𝑖=0 𝑆2 𝑗=0 𝐵 ℝ𝑖𝑗 𝑜𝑏𝑗 𝑜𝑟 𝑛𝑜𝑜𝑏𝑗 (𝐶𝑖 − 𝐶𝑖)2 + 𝑖=0 𝑆2 ℝ𝑖𝑗 𝑜𝑏𝑗 𝑜𝑟 𝑛𝑜𝑜𝑏𝑗 𝑐∈𝑐𝑙𝑎𝑠𝑠𝑒𝑠 (𝑝𝑖(𝑐) − 𝑝𝑖(𝑐))2 다음과 같이 sum-squared error 계산? (대략적인 모습)
  • 51. 1) Localization error와 classification error의 Weight가 같다 Localization error, classification error weight를 다르게 준다. • Weight가 같은게 왜 문제일까? 1) 이미 classificatio은 pre-trained 되어있기 때문에? 2) localization의 변수는 더 미세한 조절이 필요하다? • 비율을 어떻게 줄까?  ???? sum-squared error 이 training에서 갖는 단점 2.2. Training
  • 52. 2) Object가 없는 grid cell들이 많다. 이 셀들의 confidence score  0. Object가 없는 cell들의 gradient 에 기울어진 방향으로 training 된다. obj와 noobj cell 구분하여 confidence score 반영해야한다. sum-squared error 이 training에서 갖는 단점 2.2. Training
  • 53. target prediction 1 0.9 0.9 0.1 0.1 0.2 0.2 target 3) Box의 크기와 상관없이 error 계산. weight : 큰 box에서의 작은 차이 < 작은 box에서의 작은 차이 prediction sum-squared error 이 training에서 갖는 단점 2.2. Training prediction target 1 영향력 <<
  • 54. 1) Coordinate prediction(localization error)의 loss weight을 증가 2) noobj 의 confidence score의 loss weight을 감소 Localization error weight : classification error weight = 5 : 1 Obj confidence score error weight : noObj confidence score error weight = 1 : 0.5 3) Squre root of the bounding box width and height. sum-squared error 이 training에서 갖는 단점 보완 방법 2.2. Training
  • 55. 𝑖=0 𝑆2 𝑗=0 𝐵 ℝ𝑖𝑗 𝑜𝑏𝑗 𝑜𝑟 𝑛𝑜𝑜𝑏𝑗 (𝑥𝑖 − 𝑥𝑖)2 + (𝑦𝑖 − 𝑦𝑖)2 + 𝑖=0 𝑆2 𝑗=0 𝐵 ℝ𝑖𝑗 𝑜𝑏𝑗 𝑜𝑟 𝑛𝑜𝑜𝑏𝑗 (𝑤𝑖 − 𝑤𝑖)2 + (ℎ𝑖 − ℎ)2 + 𝑖=0 𝑆2 𝑗=0 𝐵 ℝ𝑖𝑗 𝑜𝑏𝑗 𝑜𝑟 𝑛𝑜𝑜𝑏𝑗 (𝐶𝑖 − 𝐶𝑖)2 + 𝑖=0 𝑆2 ℝ𝑖𝑗 𝑜𝑏𝑗 𝑜𝑟 𝑛𝑜𝑜𝑏𝑗 𝑐∈𝑐𝑙𝑎𝑠𝑠𝑒𝑠 (𝑝𝑖(𝑐) − 𝑝𝑖(𝑐))2 2.2. Training
  • 56. Training을 위한 error 계산시 고려하는 것. 1. Object가 있는 cell에 대해서만 classification error 줄이는 방향. 2. Responsible한 bounding box coordinate 에 대해서만 error 줄이는 방향 Training이 향하는 방향 2.2. Training
  • 57. Image > grid cell > bounding boxes Training이 향하는 방향 2.2. Training
  • 58. Image > grid cell > bounding boxes Training이 향하는 방향 2.2. Training
  • 60. 75 105 135 0.0001 0.001 0.00001 1) Learning rate To avoid overfitting… 2) dropout( rate =0.5) 3) Data augmentation (scaling, translation ,exposure , saturation) 2.2. Training What have done….
  • 61. 2.4. Limitations of YOLO (1) Strong spatial constraints on bounding box predictions Each grid cell only predicts two boxes and can only have one class. For door This spatial constraint limits the number of nearby objects that our model can predict. Our model struggles with small objects that appear in groups For motocycle
  • 62. 2.4. Limitations of YOLO (2) Struggles to generalize to objects in new or unusual aspect ratios or configurations. Our model learns to predict bounding boxes from data. ???training Unusual aspect ratio
  • 63. 2.4. Limitations of YOLO (3) Loss function treats errors the same in small bounding boxes versus large bounding boxes. ??? Width,heigh이 아니고, IOU에 대해서 Large , small box 고려안함
  • 64. DPM R-CNN Other Fast Detectors (FAST R-CNN,FASTER R- CNN) Deep multi-box OverFeat MultiGrasp • Sliding window • Extract features :disjoint pipeline 1. extract static features 2. classify regions 3. predict bounding boxes … • Find object : Region proposals. 1. Selective search 2. CNN 3. SVM 4. NMS disjoint Selective search (x) Sharing computation and using neural network region proposal Single object detection Efficiently performs sliding window detection. Disjoint system. Grid approach to bounding box predictions is based on the MultiGrasp system for regression to grasps. Single convolutional neural network • Each grid cell proposes a potential bounding boxes and scores using CNN. • Bounding box 98 : 2000 • Single, jointly optimized model • real-time • gene • CNN to predict bounding boxes in and image 3. Comparisons to Other Detection Systems
  • 66. 4. Experiments • Fast YOLO is the fastest object detection method On PASCAL. • YOLO is 10mAP more accurate than the fast version While still well above real-time in speed • R-CNN –R : Selective Search(x)  static bounding box proposal • Fast R-CNN : Relies on Selective Search slow • Faster RNN : Selective Search(x) neural network to propose bounding boxes
  • 67. 4.2. VOC 2007 Error Analysis YOLO vs Fast R-CNN Each prediction is either correct Or error. we look at the top N predictions for that category.(top N detections) • YOLO struggles to localize objects correctly. • Fast RCNN is almost 3x more likely to predict background error. false positive. (objec가 없는데 있다고 했다)
  • 68. 4.3. Combining Fast R-CNN and YOLO YOLO & Fast R-CNN YOLO 랑 combine 한것이 단순히 ensemble의 효과는 아니다. 왜냐하면 다른 Fast RCNN과 ensemble 했을 때에 비해 조금 더 많은 benefit을 갖고있다.  YOLO makes different kinds of mistakes at test time  effective at boosting Fast R-CNN  benefit from the speed of YOLO (x), why not? We run each model separately. Give that prediction a boost (YOLO 의 probability, overlap between the two box 고려) R-CNN Every proposal box predicts a similar box?YOLO Better localization Better background yes No 라면 background일 확률 높다? YOLO에서 거른다?
  • 69. 4.4. VOC 2012 Results
  • 70. 4.5. Generalizability We want our detector to learn robust representations of visual concepts so that they can generalize to new domains or unexpected input at test time. Picasso Dataset People-Art Dataset • In real-word applications, it is hard to predict all possible use cases. • Test data can diverge from wat the system has seen before. .training VOC data Test For generalizing detection to artwork. Detect people Natural image Art image
  • 71. 4.5. Generalizability • YOLO : AP degrades less than other methods • R-CNN : highly overfit to PASCAL VOC. • Selective Search : highly tuned for natural image • Only sees small regions • DPM : AP degrades less than other methods. But it starts from a lower AP. Artwork and natural images are very different on a pixel level But they are similar in terms of the size and shape of objects.
  • 72. 5. Real-Time Detection In The Wild • When attachaed to a webcam it functions like a traking system. • Demo is available.
  • 73. 6. Conclusion • A Unified model for object detection • Simple to construct • Can be trained directly on full images. • Generalizes well to new domains

Editor's Notes

  1. *Predict할때 데이터는 어떤식으로? *Pr(object) : confidence as Pr(Object) * IOU(truth,pred) 왜 2개만 곱하나? *왜 단점에 box크기 고려안한다고되어있지?
  2. http://www.cnblogs.com/lillylin/p/6207119.html
  3. http://www.cnblogs.com/lillylin/p/6207119.html
  4. DPM : extract static features, classify regions, predict bounding boxes for high scoring regions etc.
  5. https://medium.com/@xslittlegrass/almost-real-time-vehicle-detection-using-yolo-da0f016b43de
  6. https://www.slideshare.net/aurot/googlenet-insights 1*1 convolutional kernel Channel 별로 linear combination 해주는 효과 Inception module 계산량 대폭 줄여줌 Feature pooling