SlideShare a Scribd company logo
Object detection
Agenda
 Selective search
 RCNN family (Two stage)
 Retinanet (One stage)
 Anchors
 Losses
 Stats
 MAP
Classification vs Detection
Classification
Object Detection
Goal:
Problem: Where do we look in the image for the
object?
Kitte
n
Segmentation
Idea: If we correctly segment the image before running object
recognition, we can use our segmentations as candidate objects.
Advantages: Can be efficient, makes no assumptions about object
sizes or shapes.
Selective search
• Start by oversegmenting the input image
“Efficient graph-based image
segmentation” Felzenszwalband
Huttenlocher, IJCV2004
Image gradients
Image gradients
Similarity measures
 Color: 25 bin color histogram for each channel =75 (rgb)
 Texture: HOG like gaussian derivatives of the image in 8
directions and for each channel. Construct a 10-bin histogram
for each region = 240 dim vector.
 Size: Size similarity encourages smaller regions to merge
early. It ensures that region proposals at all scales are formed
at all parts of the image.
 Shape: Measures how well two regions (ri and rj) fit into each
other. If ri fits into rj merge them to fill gaps
Selective search
1. Merge two most similar regions basedonS.
2. Update similarities between the newregion and its
neighbors.
3. Gobackto step 1.until the
whole imageis
asingle region.
Selective search
• Use hierarchical segmentation: start with small superpixels and
merge based on diverse cues
• Take bounding boxesof all generatedregions andtreat them aspossible
object locations
Selective search
Stats
• Recallis aproportion of objects thatare
covered by some box with >0.5overlap
Selecte
d
setting
s
Region proposals!
R-CNN: Region proposals + CNN features
R-CNN details
• Cons
• Training is slow (84h), takes a lot of disk space
• 2000 CNN passes per image
• Inference (detection) is slow (47s / image with VGG16)
• The selective search algorithm is a fixed algorithm, no learning is
happening!. This could lead to the generation of bad candidate
region proposals.
Fast R-CNN
ConvNet
Forward whole image through ConvNet
“conv5” feature map of image
“RoI Pooling” layer
Linear +
softmax
FCs Fully-connected layers
Softmax classifier
Region
proposals
Linear Bounding-box regressors
Fast R-CNN
• Pros
• Less compute overhead
• 2.3 seconds per image inference time
• Cons
• Inference of 2.3 secs is still slow for real life!
• The selective search algorithm is a fixed algorithm, no learning is
happening!. This could lead to the generation of bad candidate
region proposals.
Fast R-CNN training
ConvNet
Linear +
softmax
FCs
Linear
Log loss + smooth L1 loss
Trainable
Multi-task loss
Speed comparison
Faster R-CNN
Region proposal network (RPN)
• Slide a small window over the feature map
• Predict object/no object
• Regress bounding box coordinates
• Box regression is with reference to anchors (3 scales x 3 aspect ratios)
Loss
i : index of an anchor in a mini-batch
pi: is the predicted probability of anchor i being an object
p∗i is 1 if the anchor is positive, and is 0 if the anchor is
negative.
ti: 4 predicted bounding box coordinates
t∗i: ground-truth box associated coordinates with a
positive anchor
Lreg (ti , t∗i ) = R(ti − t∗i ) where R is the robust loss
function (smooth L1)
Classification+Regression
Online hard example mining
• Class imbalance hurts training.
• We are training the model to learn background
space rather than detecting objects.
 Sort anchors by their calculated loss, apply NMS
 Pick the top ones such that ratio between the
picked negatives and positives is at most 3:1.
• Faster rcnn selects 256 anchors - 128 positive,
128 negative
Speed comparison
Faster R-CNN
• Pros
• 0.2 seconds per image inference time superfast for real life
• Uses RPN instead so better proposals as it can be trained
Strided convolutions (refresher)
Stride 1 convolution with 3x3 Kernel Stride 2 convolution with 3x3 Kernel
IOU: Intersection over union (refresher)
NMS: non max suppression (refresher)
Initial predicted boxes Filtered (Suppressed boxes) by IOU
Why one stage detector trails accuracy?
Two-stage:
The proposal stage rapidly
narrows down #candidate object
locations to a small number (e.g.,
1-2k), filtering out most
background samples
In the classification stage, fix
foreground-to-background ratio to
1:3, or online hard example
mining (OHEM).
One-stage:
Have to process a much larger
set of candidate object locations
regularly sampled across an
image, which amounts to
enumerating ~100k locations that
densely cover spatial positions,
scales, and aspect ratios.
 Extreme foreground-background class imbalance encountered
Activation maps
How about predicting from multiple maps?
As image goes through deeper in the
network, resolution decreases and
semantic value increases
Feature pyramid networks (FPN)
• Improve predictive power of
lower-level feature maps by
adding contextual
information from higher-
level feature maps
Top-Down+Lateral connections
Retinanet
Backbone
Activation
maps at
different
pyramid
levels
Can be:
Densenet
VGG
MobileNet
Retinanet - Architecture
Anchors
• Aspect ratios: 0.5, 1, 2
• Scales: 1, 1.25, 1.58
• Strides: 8,16,32,64,128
• Sizes: 32, 64, 128, 256, 512
• Total (A): ratio*scales=3*3=9 anchors/pixel location
• (K) object classes
Anchors - Example
• Anchor dims=(size*scale)/sqrt(ratio)
• Eg for 32 anchor size:
• [-22 -11 22 11] 44X22 [-28 -14 28 14] 56X28 [-35 -17 35 17] 70X34
• [-16 -16 16 16] 32X32 [-20 -20 20 20] 40X40 [-25 -25 25 25] 50X50
• [-11 -22 11 22] 22X44 [-14 -28 14 28] 28X56 [-17 -35 17 35] 34X70
For 800,600 input image:
• P3 activation map shape: 100,75
• Stride: 8
• Total (A) = 9 anchors per pixel location
• Total anchors at P3 level = 100*75*9
= 67500
• Similarly sum for all pyramid levels
P3,P4,P5,P6,P7 = total 90360! anchors per
image
Shift anchors
Shift anchors according to input image from activation map
(26,15)
(-22,-11)
(22,11)
(-18,-7)
(0,0)
(4,4)
Shift anchor centered at (0,0) on P3 (stride 8)
Activation map by [ 4. 4. 4. 4.]
Next shift [ 12. 4. 12. 4.], [ 20. 4. 20. 4.] , ….
(4,4) (12,4)
8
Input Image
Anchors applied wrt to input image!
Cross Entropy loss
Examples that are easily classified (pt >
0.5) incur a loss with non-trivial magnitude
but summed over a large number of easy
examples, these small loss values can
overwhelm the rare class.
Balanced Entropy loss
Alpha=1 for foreground,1-alpha for background
• Alpha hyperparam
• While α balances the importance
of positive/negative examples, it
does not differentiate between
easy/hard examples!
Example
• The loss from easy
examples = 100000×0.1 =
10000
• The loss from hard
examples = 100×2.3 =
230
• 10000 / 230 = 43. It is
about 40× bigger loss
from easy examples.
Focal loss!
• Misclassified, pt is small, modulating factor is near 1, loss is
unaffected.
• As pt → 1, the factor goes to 0 and the loss for well-classified
examples is down-weighted..
• with γ = 2, example classified with pt = 0.9 would have 100×
lower loss compared to CE and with pt ≈ 0.968 it would have
1000× lower loss. This in turn increases the importance of
correcting misclassified examples!
• Every sample is weighted
according to its error!
• Modulating factor added
• Focusing parameter γ smoothly
adjusts the rate at which easy
examples are downweighted
Focal loss!
unlike FL, OHEM completely
discards easy examples
Focal loss!
Smooth L1 loss: Bounding boxes
Prediction pipeline
Predicts regression(deltas) to anchor boxes!
 Filter by 0.05 anchor score threshold
 Get 1000 boxes per level, merge all
 Apply NMS at 0.5
 300 final boxes! display to user 
Stats
Stats
MAP: mean average precision
Precision = TP/(TP+FP) = 2/3 = 0.67
Recall is the proportion of TP out of the ground truth labels = 2/5 = 0.4
MAP: Interpolation approach (old 2007)
MAP: Interpolation approach (old 2007)
MAP: AUC approach (new 2011)
Thank You

More Related Content

What's hot

Object Detection using Deep Neural Networks
Object Detection using Deep Neural NetworksObject Detection using Deep Neural Networks
Object Detection using Deep Neural Networks
Usman Qayyum
 
Introduction to object detection
Introduction to object detectionIntroduction to object detection
Introduction to object detection
Brodmann17
 
You Only Look Once: Unified, Real-Time Object Detection
You Only Look Once: Unified, Real-Time Object DetectionYou Only Look Once: Unified, Real-Time Object Detection
You Only Look Once: Unified, Real-Time Object Detection
DADAJONJURAKUZIEV
 
Faster R-CNN: Towards real-time object detection with region proposal network...
Faster R-CNN: Towards real-time object detection with region proposal network...Faster R-CNN: Towards real-time object detection with region proposal network...
Faster R-CNN: Towards real-time object detection with region proposal network...
Universitat Politècnica de Catalunya
 
Advanced deep learning based object detection methods
Advanced deep learning based object detection methodsAdvanced deep learning based object detection methods
Advanced deep learning based object detection methods
Brodmann17
 
Deep learning for object detection
Deep learning for object detectionDeep learning for object detection
Deep learning for object detection
Wenjing Chen
 
Convolutional neural networks
Convolutional neural networks Convolutional neural networks
Convolutional neural networks
Roozbeh Sanaei
 
Faster R-CNN - PR012
Faster R-CNN - PR012Faster R-CNN - PR012
Faster R-CNN - PR012
Jinwon Lee
 
PR-132: SSD: Single Shot MultiBox Detector
PR-132: SSD: Single Shot MultiBox DetectorPR-132: SSD: Single Shot MultiBox Detector
PR-132: SSD: Single Shot MultiBox Detector
Jinwon Lee
 
You only look once (YOLO) : unified real time object detection
You only look once (YOLO) : unified real time object detectionYou only look once (YOLO) : unified real time object detection
You only look once (YOLO) : unified real time object detection
Entrepreneur / Startup
 
Convolutional Neural Network and Its Applications
Convolutional Neural Network and Its ApplicationsConvolutional Neural Network and Its Applications
Convolutional Neural Network and Its Applications
Kasun Chinthaka Piyarathna
 
Reinforcement Learning In AI Powerpoint Presentation Slide Templates Complete...
Reinforcement Learning In AI Powerpoint Presentation Slide Templates Complete...Reinforcement Learning In AI Powerpoint Presentation Slide Templates Complete...
Reinforcement Learning In AI Powerpoint Presentation Slide Templates Complete...
SlideTeam
 
Deep learning based object detection basics
Deep learning based object detection basicsDeep learning based object detection basics
Deep learning based object detection basics
Brodmann17
 
Image classification using cnn
Image classification using cnnImage classification using cnn
Image classification using cnn
SumeraHangi
 
Camera-Based Road Lane Detection by Deep Learning II
Camera-Based Road Lane Detection by Deep Learning IICamera-Based Road Lane Detection by Deep Learning II
Camera-Based Road Lane Detection by Deep Learning II
Yu Huang
 
A Brief History of Object Detection / Tommi Kerola
A Brief History of Object Detection / Tommi KerolaA Brief History of Object Detection / Tommi Kerola
A Brief History of Object Detection / Tommi Kerola
Preferred Networks
 
R-CNN
R-CNNR-CNN
Faster rcnn
Faster rcnnFaster rcnn
Faster rcnn
捷恩 蔡
 
Yurii Pashchenko: Zero-shot learning capabilities of CLIP model from OpenAI
Yurii Pashchenko: Zero-shot learning capabilities of CLIP model from OpenAIYurii Pashchenko: Zero-shot learning capabilities of CLIP model from OpenAI
Yurii Pashchenko: Zero-shot learning capabilities of CLIP model from OpenAI
Lviv Startup Club
 

What's hot (20)

Object Detection using Deep Neural Networks
Object Detection using Deep Neural NetworksObject Detection using Deep Neural Networks
Object Detection using Deep Neural Networks
 
Introduction to object detection
Introduction to object detectionIntroduction to object detection
Introduction to object detection
 
You Only Look Once: Unified, Real-Time Object Detection
You Only Look Once: Unified, Real-Time Object DetectionYou Only Look Once: Unified, Real-Time Object Detection
You Only Look Once: Unified, Real-Time Object Detection
 
Faster R-CNN: Towards real-time object detection with region proposal network...
Faster R-CNN: Towards real-time object detection with region proposal network...Faster R-CNN: Towards real-time object detection with region proposal network...
Faster R-CNN: Towards real-time object detection with region proposal network...
 
Advanced deep learning based object detection methods
Advanced deep learning based object detection methodsAdvanced deep learning based object detection methods
Advanced deep learning based object detection methods
 
Deep learning for object detection
Deep learning for object detectionDeep learning for object detection
Deep learning for object detection
 
Convolutional neural networks
Convolutional neural networks Convolutional neural networks
Convolutional neural networks
 
Faster R-CNN - PR012
Faster R-CNN - PR012Faster R-CNN - PR012
Faster R-CNN - PR012
 
PR-132: SSD: Single Shot MultiBox Detector
PR-132: SSD: Single Shot MultiBox DetectorPR-132: SSD: Single Shot MultiBox Detector
PR-132: SSD: Single Shot MultiBox Detector
 
You only look once (YOLO) : unified real time object detection
You only look once (YOLO) : unified real time object detectionYou only look once (YOLO) : unified real time object detection
You only look once (YOLO) : unified real time object detection
 
Convolutional Neural Network and Its Applications
Convolutional Neural Network and Its ApplicationsConvolutional Neural Network and Its Applications
Convolutional Neural Network and Its Applications
 
Reinforcement Learning In AI Powerpoint Presentation Slide Templates Complete...
Reinforcement Learning In AI Powerpoint Presentation Slide Templates Complete...Reinforcement Learning In AI Powerpoint Presentation Slide Templates Complete...
Reinforcement Learning In AI Powerpoint Presentation Slide Templates Complete...
 
Deep learning based object detection basics
Deep learning based object detection basicsDeep learning based object detection basics
Deep learning based object detection basics
 
Image classification using cnn
Image classification using cnnImage classification using cnn
Image classification using cnn
 
Camera-Based Road Lane Detection by Deep Learning II
Camera-Based Road Lane Detection by Deep Learning IICamera-Based Road Lane Detection by Deep Learning II
Camera-Based Road Lane Detection by Deep Learning II
 
A Brief History of Object Detection / Tommi Kerola
A Brief History of Object Detection / Tommi KerolaA Brief History of Object Detection / Tommi Kerola
A Brief History of Object Detection / Tommi Kerola
 
R-CNN
R-CNNR-CNN
R-CNN
 
Faster rcnn
Faster rcnnFaster rcnn
Faster rcnn
 
Yurii Pashchenko: Zero-shot learning capabilities of CLIP model from OpenAI
Yurii Pashchenko: Zero-shot learning capabilities of CLIP model from OpenAIYurii Pashchenko: Zero-shot learning capabilities of CLIP model from OpenAI
Yurii Pashchenko: Zero-shot learning capabilities of CLIP model from OpenAI
 
Recurrent neural network
Recurrent neural networkRecurrent neural network
Recurrent neural network
 

Similar to Object detection - RCNNs vs Retinanet

Overview of Convolutional Neural Networks
Overview of Convolutional Neural NetworksOverview of Convolutional Neural Networks
Overview of Convolutional Neural Networks
ananth
 
object detection paper review
object detection paper reviewobject detection paper review
object detection paper review
Yoonho Na
 
Convolutional Neural Networks (CNN)
Convolutional Neural Networks (CNN)Convolutional Neural Networks (CNN)
Convolutional Neural Networks (CNN)
Gaurav Mittal
 
Computer Vision Landscape : Present and Future
Computer Vision Landscape : Present and FutureComputer Vision Landscape : Present and Future
Computer Vision Landscape : Present and Future
Sanghamitra Deb
 
Introduction to convolutional networks .pptx
Introduction to convolutional networks .pptxIntroduction to convolutional networks .pptx
Introduction to convolutional networks .pptx
ArunNegi37
 
Detection focal loss 딥러닝 논문읽기 모임 발표자료
Detection focal loss 딥러닝 논문읽기 모임 발표자료Detection focal loss 딥러닝 논문읽기 모임 발표자료
Detection focal loss 딥러닝 논문읽기 모임 발표자료
taeseon ryu
 
Generating super resolution images using transformers
Generating super resolution images using transformersGenerating super resolution images using transformers
Generating super resolution images using transformers
NEERAJ BAGHEL
 
Week5-Faster R-CNN.pptx
Week5-Faster R-CNN.pptxWeek5-Faster R-CNN.pptx
Week5-Faster R-CNN.pptx
fahmi324663
 
Recent Progress on Object Detection_20170331
Recent Progress on Object Detection_20170331Recent Progress on Object Detection_20170331
Recent Progress on Object Detection_20170331
Jihong Kang
 
D3L4-objects.pdf
D3L4-objects.pdfD3L4-objects.pdf
D3L4-objects.pdf
ssusere945ae
 
DIP Lecture 7-9.pdf
DIP Lecture 7-9.pdfDIP Lecture 7-9.pdf
DIP Lecture 7-9.pdf
SAhsanShahBukhari
 
Anomaly Detection and Localization Using GAN and One-Class Classifier
Anomaly Detection and Localization  Using GAN and One-Class ClassifierAnomaly Detection and Localization  Using GAN and One-Class Classifier
Anomaly Detection and Localization Using GAN and One-Class Classifier
홍배 김
 
Deep Learning
Deep LearningDeep Learning
Deep Learning
Pierre de Lacaze
 
lec10svm.ppt
lec10svm.pptlec10svm.ppt
lec10svm.ppt
pushkarjoshi42
 
lec10svm.ppt
lec10svm.pptlec10svm.ppt
lec10svm.ppt
kibrualemu812
 
Svm ms
Svm msSvm ms
Svm ms
student
 
Deep Learning Approach in Characterizing Salt Body on Seismic Images - by Zhe...
Deep Learning Approach in Characterizing Salt Body on Seismic Images - by Zhe...Deep Learning Approach in Characterizing Salt Body on Seismic Images - by Zhe...
Deep Learning Approach in Characterizing Salt Body on Seismic Images - by Zhe...
Yan Xu
 
lec10svm.ppt
lec10svm.pptlec10svm.ppt
lec10svm.ppt
TheULTIMATEALLROUNDE
 
Auro tripathy - Localizing with CNNs
Auro tripathy -  Localizing with CNNsAuro tripathy -  Localizing with CNNs
Auro tripathy - Localizing with CNNs
Auro Tripathy
 
Deep learning for image video processing
Deep learning for image video processingDeep learning for image video processing
Deep learning for image video processing
Yu Huang
 

Similar to Object detection - RCNNs vs Retinanet (20)

Overview of Convolutional Neural Networks
Overview of Convolutional Neural NetworksOverview of Convolutional Neural Networks
Overview of Convolutional Neural Networks
 
object detection paper review
object detection paper reviewobject detection paper review
object detection paper review
 
Convolutional Neural Networks (CNN)
Convolutional Neural Networks (CNN)Convolutional Neural Networks (CNN)
Convolutional Neural Networks (CNN)
 
Computer Vision Landscape : Present and Future
Computer Vision Landscape : Present and FutureComputer Vision Landscape : Present and Future
Computer Vision Landscape : Present and Future
 
Introduction to convolutional networks .pptx
Introduction to convolutional networks .pptxIntroduction to convolutional networks .pptx
Introduction to convolutional networks .pptx
 
Detection focal loss 딥러닝 논문읽기 모임 발표자료
Detection focal loss 딥러닝 논문읽기 모임 발표자료Detection focal loss 딥러닝 논문읽기 모임 발표자료
Detection focal loss 딥러닝 논문읽기 모임 발표자료
 
Generating super resolution images using transformers
Generating super resolution images using transformersGenerating super resolution images using transformers
Generating super resolution images using transformers
 
Week5-Faster R-CNN.pptx
Week5-Faster R-CNN.pptxWeek5-Faster R-CNN.pptx
Week5-Faster R-CNN.pptx
 
Recent Progress on Object Detection_20170331
Recent Progress on Object Detection_20170331Recent Progress on Object Detection_20170331
Recent Progress on Object Detection_20170331
 
D3L4-objects.pdf
D3L4-objects.pdfD3L4-objects.pdf
D3L4-objects.pdf
 
DIP Lecture 7-9.pdf
DIP Lecture 7-9.pdfDIP Lecture 7-9.pdf
DIP Lecture 7-9.pdf
 
Anomaly Detection and Localization Using GAN and One-Class Classifier
Anomaly Detection and Localization  Using GAN and One-Class ClassifierAnomaly Detection and Localization  Using GAN and One-Class Classifier
Anomaly Detection and Localization Using GAN and One-Class Classifier
 
Deep Learning
Deep LearningDeep Learning
Deep Learning
 
lec10svm.ppt
lec10svm.pptlec10svm.ppt
lec10svm.ppt
 
lec10svm.ppt
lec10svm.pptlec10svm.ppt
lec10svm.ppt
 
Svm ms
Svm msSvm ms
Svm ms
 
Deep Learning Approach in Characterizing Salt Body on Seismic Images - by Zhe...
Deep Learning Approach in Characterizing Salt Body on Seismic Images - by Zhe...Deep Learning Approach in Characterizing Salt Body on Seismic Images - by Zhe...
Deep Learning Approach in Characterizing Salt Body on Seismic Images - by Zhe...
 
lec10svm.ppt
lec10svm.pptlec10svm.ppt
lec10svm.ppt
 
Auro tripathy - Localizing with CNNs
Auro tripathy -  Localizing with CNNsAuro tripathy -  Localizing with CNNs
Auro tripathy - Localizing with CNNs
 
Deep learning for image video processing
Deep learning for image video processingDeep learning for image video processing
Deep learning for image video processing
 

Recently uploaded

Assuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyesAssuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyes
ThousandEyes
 
JMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and GrafanaJMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and Grafana
RTTS
 
Connector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a buttonConnector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a button
DianaGray10
 
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Jeffrey Haguewood
 
The Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and SalesThe Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and Sales
Laura Byrne
 
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdfFIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance
 
DevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA ConnectDevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA Connect
Kari Kakkonen
 
FIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdfFIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance
 
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Thierry Lestable
 
Epistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI supportEpistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI support
Alan Dix
 
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
Product School
 
Knowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and backKnowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and back
Elena Simperl
 
UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4
DianaGray10
 
Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !
KatiaHIMEUR1
 
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Tobias Schneck
 
Elevating Tactical DDD Patterns Through Object Calisthenics
Elevating Tactical DDD Patterns Through Object CalisthenicsElevating Tactical DDD Patterns Through Object Calisthenics
Elevating Tactical DDD Patterns Through Object Calisthenics
Dorra BARTAGUIZ
 
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Product School
 
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMsTo Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
Paul Groth
 
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
BookNet Canada
 
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
UiPathCommunity
 

Recently uploaded (20)

Assuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyesAssuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyes
 
JMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and GrafanaJMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and Grafana
 
Connector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a buttonConnector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a button
 
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
 
The Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and SalesThe Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and Sales
 
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdfFIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
 
DevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA ConnectDevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA Connect
 
FIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdfFIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdf
 
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
 
Epistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI supportEpistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI support
 
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
 
Knowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and backKnowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and back
 
UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4
 
Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !
 
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
 
Elevating Tactical DDD Patterns Through Object Calisthenics
Elevating Tactical DDD Patterns Through Object CalisthenicsElevating Tactical DDD Patterns Through Object Calisthenics
Elevating Tactical DDD Patterns Through Object Calisthenics
 
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
 
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMsTo Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
 
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
 
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
 

Object detection - RCNNs vs Retinanet

  • 2. Agenda  Selective search  RCNN family (Two stage)  Retinanet (One stage)  Anchors  Losses  Stats  MAP
  • 4. Object Detection Goal: Problem: Where do we look in the image for the object? Kitte n
  • 5. Segmentation Idea: If we correctly segment the image before running object recognition, we can use our segmentations as candidate objects. Advantages: Can be efficient, makes no assumptions about object sizes or shapes.
  • 6. Selective search • Start by oversegmenting the input image “Efficient graph-based image segmentation” Felzenszwalband Huttenlocher, IJCV2004
  • 9. Similarity measures  Color: 25 bin color histogram for each channel =75 (rgb)  Texture: HOG like gaussian derivatives of the image in 8 directions and for each channel. Construct a 10-bin histogram for each region = 240 dim vector.  Size: Size similarity encourages smaller regions to merge early. It ensures that region proposals at all scales are formed at all parts of the image.  Shape: Measures how well two regions (ri and rj) fit into each other. If ri fits into rj merge them to fill gaps
  • 10. Selective search 1. Merge two most similar regions basedonS. 2. Update similarities between the newregion and its neighbors. 3. Gobackto step 1.until the whole imageis asingle region.
  • 11. Selective search • Use hierarchical segmentation: start with small superpixels and merge based on diverse cues • Take bounding boxesof all generatedregions andtreat them aspossible object locations
  • 13. Stats • Recallis aproportion of objects thatare covered by some box with >0.5overlap Selecte d setting s
  • 15. R-CNN: Region proposals + CNN features
  • 16. R-CNN details • Cons • Training is slow (84h), takes a lot of disk space • 2000 CNN passes per image • Inference (detection) is slow (47s / image with VGG16) • The selective search algorithm is a fixed algorithm, no learning is happening!. This could lead to the generation of bad candidate region proposals.
  • 17. Fast R-CNN ConvNet Forward whole image through ConvNet “conv5” feature map of image “RoI Pooling” layer Linear + softmax FCs Fully-connected layers Softmax classifier Region proposals Linear Bounding-box regressors
  • 18. Fast R-CNN • Pros • Less compute overhead • 2.3 seconds per image inference time • Cons • Inference of 2.3 secs is still slow for real life! • The selective search algorithm is a fixed algorithm, no learning is happening!. This could lead to the generation of bad candidate region proposals.
  • 19. Fast R-CNN training ConvNet Linear + softmax FCs Linear Log loss + smooth L1 loss Trainable Multi-task loss
  • 22. Region proposal network (RPN) • Slide a small window over the feature map • Predict object/no object • Regress bounding box coordinates • Box regression is with reference to anchors (3 scales x 3 aspect ratios)
  • 23. Loss i : index of an anchor in a mini-batch pi: is the predicted probability of anchor i being an object p∗i is 1 if the anchor is positive, and is 0 if the anchor is negative. ti: 4 predicted bounding box coordinates t∗i: ground-truth box associated coordinates with a positive anchor Lreg (ti , t∗i ) = R(ti − t∗i ) where R is the robust loss function (smooth L1) Classification+Regression
  • 24. Online hard example mining • Class imbalance hurts training. • We are training the model to learn background space rather than detecting objects.  Sort anchors by their calculated loss, apply NMS  Pick the top ones such that ratio between the picked negatives and positives is at most 3:1. • Faster rcnn selects 256 anchors - 128 positive, 128 negative
  • 26. Faster R-CNN • Pros • 0.2 seconds per image inference time superfast for real life • Uses RPN instead so better proposals as it can be trained
  • 27. Strided convolutions (refresher) Stride 1 convolution with 3x3 Kernel Stride 2 convolution with 3x3 Kernel
  • 28. IOU: Intersection over union (refresher)
  • 29. NMS: non max suppression (refresher) Initial predicted boxes Filtered (Suppressed boxes) by IOU
  • 30. Why one stage detector trails accuracy? Two-stage: The proposal stage rapidly narrows down #candidate object locations to a small number (e.g., 1-2k), filtering out most background samples In the classification stage, fix foreground-to-background ratio to 1:3, or online hard example mining (OHEM). One-stage: Have to process a much larger set of candidate object locations regularly sampled across an image, which amounts to enumerating ~100k locations that densely cover spatial positions, scales, and aspect ratios.  Extreme foreground-background class imbalance encountered
  • 31. Activation maps How about predicting from multiple maps? As image goes through deeper in the network, resolution decreases and semantic value increases
  • 32. Feature pyramid networks (FPN) • Improve predictive power of lower-level feature maps by adding contextual information from higher- level feature maps Top-Down+Lateral connections
  • 35. Anchors • Aspect ratios: 0.5, 1, 2 • Scales: 1, 1.25, 1.58 • Strides: 8,16,32,64,128 • Sizes: 32, 64, 128, 256, 512 • Total (A): ratio*scales=3*3=9 anchors/pixel location • (K) object classes
  • 36. Anchors - Example • Anchor dims=(size*scale)/sqrt(ratio) • Eg for 32 anchor size: • [-22 -11 22 11] 44X22 [-28 -14 28 14] 56X28 [-35 -17 35 17] 70X34 • [-16 -16 16 16] 32X32 [-20 -20 20 20] 40X40 [-25 -25 25 25] 50X50 • [-11 -22 11 22] 22X44 [-14 -28 14 28] 28X56 [-17 -35 17 35] 34X70 For 800,600 input image: • P3 activation map shape: 100,75 • Stride: 8 • Total (A) = 9 anchors per pixel location • Total anchors at P3 level = 100*75*9 = 67500 • Similarly sum for all pyramid levels P3,P4,P5,P6,P7 = total 90360! anchors per image
  • 37. Shift anchors Shift anchors according to input image from activation map (26,15) (-22,-11) (22,11) (-18,-7) (0,0) (4,4) Shift anchor centered at (0,0) on P3 (stride 8) Activation map by [ 4. 4. 4. 4.] Next shift [ 12. 4. 12. 4.], [ 20. 4. 20. 4.] , …. (4,4) (12,4) 8 Input Image Anchors applied wrt to input image!
  • 38. Cross Entropy loss Examples that are easily classified (pt > 0.5) incur a loss with non-trivial magnitude but summed over a large number of easy examples, these small loss values can overwhelm the rare class.
  • 39. Balanced Entropy loss Alpha=1 for foreground,1-alpha for background • Alpha hyperparam • While α balances the importance of positive/negative examples, it does not differentiate between easy/hard examples!
  • 40. Example • The loss from easy examples = 100000×0.1 = 10000 • The loss from hard examples = 100×2.3 = 230 • 10000 / 230 = 43. It is about 40× bigger loss from easy examples.
  • 41. Focal loss! • Misclassified, pt is small, modulating factor is near 1, loss is unaffected. • As pt → 1, the factor goes to 0 and the loss for well-classified examples is down-weighted.. • with γ = 2, example classified with pt = 0.9 would have 100× lower loss compared to CE and with pt ≈ 0.968 it would have 1000× lower loss. This in turn increases the importance of correcting misclassified examples! • Every sample is weighted according to its error! • Modulating factor added • Focusing parameter γ smoothly adjusts the rate at which easy examples are downweighted
  • 42. Focal loss! unlike FL, OHEM completely discards easy examples
  • 44. Smooth L1 loss: Bounding boxes
  • 45. Prediction pipeline Predicts regression(deltas) to anchor boxes!  Filter by 0.05 anchor score threshold  Get 1000 boxes per level, merge all  Apply NMS at 0.5  300 final boxes! display to user 
  • 46. Stats
  • 47. Stats
  • 48. MAP: mean average precision Precision = TP/(TP+FP) = 2/3 = 0.67 Recall is the proportion of TP out of the ground truth labels = 2/5 = 0.4
  • 51. MAP: AUC approach (new 2011)

Editor's Notes

  1. At each sliding-window location, we simultaneously predict multiple region proposals, where the number of maximum possible proposals for each location is denoted as k. So the reg layer has 4k outputs encoding the coordinates of k boxes, and the cls layer outputs 2k scores that estimate probability of object or not object for each proposal