Object Detection using Convolutional Neural Networks
Agenda
Why CNNs?
What is a CNN?
Object Detection: Definition
Sliding Windows Detection
Region Proposals
R-CNN
Fast R-CNN
Faster R-CNN
YOLO
IoU
NMS
Open-Source Resources
Variables of object detection
Next Steps
Object Detection: Why CNNs?
Graph credit: CS231n, Stanford University
What is a CNN?
[Figure: a 3x3 filter slides over the input image to produce an activation map.]
Applying many filters: that's it! A full convolutional layer.
The result is a representation of the image.
Figure source: https://analyticsindiamag.com/convolutional-neural-network-image-classification-overview/
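To make the filter-to-activation-map idea concrete, here is a minimal NumPy sketch. It assumes a single-channel image, stride 1, and no padding; the function name `conv2d_valid` and the example filter are illustrative and not taken from the slides. As in most deep learning frameworks, the "convolution" here is implemented as cross-correlation (the filter is not flipped).

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` (stride 1, no padding) and return the activation map."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w), dtype=float)
    for i in range(out_h):
        for j in range(out_w):
            # Each output value is the sum of an image patch weighted by the filter.
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# Hypothetical 5x5 "image" whose intensity increases from left to right.
image = np.arange(25, dtype=float).reshape(5, 5)
# A 3x3 filter that responds to horizontal intensity changes.
vertical_edge = np.array([[1, 0, -1]] * 3, dtype=float)

activation = conv2d_valid(image, vertical_edge)
print(activation.shape)  # (3, 3)
```

A convolutional layer applies many such filters and stacks the resulting activation maps along a channel dimension.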
Object Detection: Definition
Input: RGB image → CNN → Output: list of objects
For each object:
1. Category label (person, car, cat, …)
2. Bounding box: (x, y), width, height
What is in the image and where is it?
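One way to picture the detector's output is as a list of small records, one per object. The sketch below is only an illustration: the class name `Detection`, its field names, and the convention that (x, y) is the top-left corner are assumptions, since the slide does not say which corner or center convention is used.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class Detection:
    label: str      # category label, e.g. "person", "car", "cat"
    x: float        # assumed: left edge of the box, in pixels
    y: float        # assumed: top edge of the box, in pixels
    width: float
    height: float

    def corners(self) -> Tuple[float, float, float, float]:
        """Return the box as (x0, y0, x1, y1) corner coordinates."""
        return (self.x, self.y, self.x + self.width, self.y + self.height)

# Hypothetical output for one image: a variable-length list of objects.
detections: List[Detection] = [
    Detection("person", 12, 30, 40, 110),
    Detection("car", 100, 80, 200, 90),
]
```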
Sliding Windows Detection
Is there a car in the image?
Crop containing a car → CNN → 1
Crop without a car → CNN → 0
Issue? Huge computational cost
OverFeat: Integrated Recognition, Localization and Detection using Convolutional Networks https://arxiv.org/pdf/1312.6229v4.pdf
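To get a feel for the computational cost, the sketch below counts the crops produced by one window size at one stride. The image size (800x600), window size (224x224), and stride (16) are illustrative assumptions, and `sliding_windows` is a hypothetical helper name.

```python
def sliding_windows(img_w, img_h, win_w, win_h, stride):
    """Yield (left, top, width, height) for every window that fits inside the image."""
    for top in range(0, img_h - win_h + 1, stride):
        for left in range(0, img_w - win_w + 1, stride):
            yield (left, top, win_w, win_h)

# One window size, one stride: each crop would need its own CNN forward pass.
n = sum(1 for _ in sliding_windows(800, 600, 224, 224, 16))
print(n)  # 888
```

Trying several window sizes and aspect ratios multiplies this count, which is why the naive approach is so expensive.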
Region Proposals
https://web.eecs.umich.edu/~justincj/teaching/eecs498/WI2022/
Find a small set of boxes that are likely to cover all objects
Selective Search
R-CNN: Region-Based CNN
[Figure: proposed regions (~2K) are warped into 224x224 image regions.]
Rich feature hierarchies for accurate object detection and semantic segmentation.
Method:
1. Run selective search to get ~2K regions.
2. Resize (warp) regions to 224x224
3. Run regions independently through a CNN.
4. Classify each region's CNN features (from the FC layers) with linear SVMs.
What if regions do not exactly match the object?
Solution: the CNN also learns to output a transformation that adjusts the proposal's bounding box to fit the object.
Caveat: the same CNN (shared weights) is applied to every region.
Issues?
1. Very slow! It runs ~2K forward passes per image.
2. Region selection relies on selective search, so there is no learning at that stage.
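The bounding-box transformation mentioned above can be sketched as follows. This assumes the commonly used parameterization in which the center shift is scaled by the proposal size and the width and height are scaled in log space; the function name `apply_box_deltas` and the (center x, center y, width, height) layout are illustrative choices.

```python
import math

def apply_box_deltas(proposal, deltas):
    """Turn a region proposal into a refined box using predicted deltas.

    proposal: (px, py, pw, ph) = center x, center y, width, height
    deltas:   (dx, dy, dw, dh) predicted by the network for this region
    """
    px, py, pw, ph = proposal
    dx, dy, dw, dh = deltas
    bx = px + pw * dx          # shift the center, relative to the proposal size
    by = py + ph * dy
    bw = pw * math.exp(dw)     # log-space scaling keeps width and height positive
    bh = ph * math.exp(dh)
    return (bx, by, bw, bh)

# All-zero deltas leave the proposal unchanged.
unchanged = apply_box_deltas((50.0, 50.0, 20.0, 40.0), (0.0, 0.0, 0.0, 0.0))
# Shift right, shift up, and double the width.
refined = apply_box_deltas((50.0, 50.0, 20.0, 40.0), (0.1, -0.5, math.log(2), 0.0))
```

Because the deltas are relative to the proposal, the prediction does not depend on where the region sits in the image.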
Fast R-CNN
Idea: swap the order of the CNN and the warping, so the CNN runs once on the whole image and the regions are cropped from its features.
Method:
1. Feed the input image into a CNN and compute feature maps.
2. Run selective search on the input image and project the proposals onto the feature maps (“cropping”).
3. Warp (resize) the cropped features.
4. Feed warped features into a small “Per-region” network (e.g., FC layers).
5. Output bounding boxes with classification scores.
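The crop-and-warp step on feature maps can be sketched with a simplified RoI max-pool. The sketch assumes a single-channel feature map, a region already snapped to integer feature-map cells, and a region at least `out_size` cells wide and tall; `roi_max_pool` is a hypothetical helper, not a library function.

```python
import numpy as np

def roi_max_pool(feature_map, roi, out_size=2):
    """Max-pool one region of a (H, W) feature map into an out_size x out_size grid.

    roi: (x0, y0, x1, y1) in feature-map cells, with x1 and y1 exclusive.
    """
    x0, y0, x1, y1 = roi
    region = feature_map[y0:y1, x0:x1]
    # Split rows and columns into out_size roughly equal groups of cells.
    row_groups = np.array_split(np.arange(region.shape[0]), out_size)
    col_groups = np.array_split(np.arange(region.shape[1]), out_size)
    out = np.empty((out_size, out_size), dtype=feature_map.dtype)
    for i, rows in enumerate(row_groups):
        for j, cols in enumerate(col_groups):
            # Keep the strongest activation inside each subregion.
            out[i, j] = region[np.ix_(rows, cols)].max()
    return out

feature_map = np.arange(36).reshape(6, 6)
pooled_small = roi_max_pool(feature_map, (1, 1, 5, 4))   # 3 rows x 4 columns region
pooled_large = roi_max_pool(feature_map, (0, 0, 6, 6))   # whole feature map
```

Both regions produce a 2x2 output even though their input sizes differ, which is what lets the per-region network use fixed-size inputs.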
Faster R-CNN
Idea: use a neural network (Region Proposal Network) instead of the selective search algorithm for region proposals.
Method:
1. Feed the input image into the backbone network to get image features.
2. Pass image features to RPN to get region proposals.
3. Warp (resize) the cropped features.
4. Feed warped features into a small “Per-region” network (e.g., FC layers).
5. Output bounding boxes with classification scores.
YOLO: You Only Look Once
You only look once: Unified, real-time object detection
SSD: Single-Shot MultiBox Detector
Idea: use one giant CNN to go from the input image to a tensor of scores.
Eliminates the need for region proposals.
YOLO: You Only Look Once
You only look once: Unified, real-time object detection
SSD: Single-Shot MultiBox Detector
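YOLO architecture: input image (448x448) → CNN → output tensor
Output tensor: S × S × (B * 5 + C)
S × S is the number of grid cells
B is the number of template bounding boxes per cell (5 values each: x, y, width, height, confidence)
C is the number of class scores
Template boxes (example with B = 4):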
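The output tensor shape can be checked with a few lines of NumPy. The values S = 7, B = 2, C = 20 are the ones used in the original YOLO paper; the memory layout (box values first, then class scores) is an assumption made only for this illustration.

```python
import numpy as np

def yolo_output_depth(B, C):
    """Number of values predicted per grid cell: 5 per box plus C class scores."""
    return B * 5 + C

S, B, C = 7, 2, 20
output = np.zeros((S, S, yolo_output_depth(B, C)))  # stand-in for the CNN output

# Unpack the prediction vector of one grid cell (row 3, column 4).
cell = output[3, 4]
boxes = cell[:B * 5].reshape(B, 5)   # per box: x, y, width, height, confidence
class_scores = cell[B * 5:]          # C class scores for this cell

print(output.shape)  # (7, 7, 30)
```

With the slide's example of B = 4 template boxes and the same C = 20, each cell would hold 4 * 5 + 20 = 40 values.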
Evaluating object localization: IoU
IoU (Intersection over Union) is used to measure the overlap between two bounding boxes.
https://pyimagesearch.com/2016/11/07/intersection-over-union-iou-for-object-detection/
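A minimal IoU sketch follows, assuming boxes are given as (x0, y0, x1, y1) corner coordinates with x0 < x1 and y0 < y1. The function name `iou` and the corner convention are illustrative choices, not taken from the slides.

```python
def iou(box_a, box_b):
    """Intersection over Union of two boxes given as (x0, y0, x1, y1)."""
    ax0, ay0, ax1, ay1 = box_a
    bx0, by0, bx1, by1 = box_b
    # Width and height of the overlap rectangle (zero if the boxes do not overlap).
    inter_w = max(0.0, min(ax1, bx1) - max(ax0, bx0))
    inter_h = max(0.0, min(ay1, by1) - max(ay0, by0))
    inter = inter_w * inter_h
    area_a = (ax1 - ax0) * (ay1 - ay0)
    area_b = (bx1 - bx0) * (by1 - by0)
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

# Two 2x2 boxes overlapping in a 1x1 square: IoU = 1 / (4 + 4 - 1) = 1/7.
overlap = iou((0, 0, 2, 2), (1, 1, 3, 3))
```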
Non-max Suppression (NMS)
Ensures that each object is detected only once.
A solution for overlapping boxes.
Method:
Given a set of predictions (scores and boxes), each output prediction is (p_c, b_x, b_y, b_h, b_w): a confidence score, the box center, and the box height and width.
(Greedy Implementation)
1. Discard all boxes with p_c ≤ 0.6
2. While there are any remaining boxes:
• Pick the box with the largest p_c as the prediction.
• Discard all boxes with IoU ≥ 0.7 with the chosen box.
https://towardsdatascience.com/non-maximum-suppression-nms-93ce178e177c
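The greedy procedure above can be sketched in plain Python. The thresholds 0.6 and 0.7 are the ones on the slide; representing each prediction as a (score, (x0, y0, x1, y1)) pair and the function names `iou` and `nms` are assumptions made for this illustration.

```python
def iou(box_a, box_b):
    """Intersection over Union of two boxes given as (x0, y0, x1, y1)."""
    inter_w = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    inter_h = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    inter = inter_w * inter_h
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def nms(predictions, score_thresh=0.6, iou_thresh=0.7):
    """Greedy non-max suppression over (score, box) pairs."""
    # Step 1: discard low-confidence boxes, then sort by score, highest first.
    remaining = sorted((p for p in predictions if p[0] > score_thresh),
                       key=lambda p: p[0], reverse=True)
    kept = []
    # Step 2: repeatedly keep the best box and drop boxes that overlap it too much.
    while remaining:
        best = remaining.pop(0)
        kept.append(best)
        remaining = [p for p in remaining if iou(best[1], p[1]) < iou_thresh]
    return kept

predictions = [
    (0.9, (0, 0, 10, 10)),    # best detection of object 1
    (0.8, (1, 0, 11, 10)),    # near-duplicate of the first box (IoU about 0.82)
    (0.7, (50, 50, 60, 60)),  # a different object
    (0.5, (0, 0, 10, 10)),    # below the score threshold
]
kept = nms(predictions)
```

In this hypothetical example two boxes survive: the 0.9 box and the 0.7 box. Detectors with several classes usually run this procedure once per class.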
Open-Source Resources
https://github.com/facebookresearch/detectron2
https://github.com/tensorflow/models/tree/master/research/object_detection
Implement object detectors from scratch only for learning purposes!
Object Detection: variables
credit: CS231n, Stanford University
Next Steps
Thank You


Editor's Notes

  1. Progression of object detection on the PASCAL VOC dataset: performance had almost plateaued by 2012, but it rose quickly from 2013, when deep ConvNets began to be used for object detection.
  2. Challenges: (a) Multiple outputs: need to output a variable number of objects per image. (b) Multiple types of output: need to predict “what” (category label) as well as “where” (bounding box). (c) Large images: classification works at 224x224; detection needs higher resolution, often ~800x600. (d) Objects can appear in different sizes and aspect ratios in the same image.
  3. The hope is that some window will contain a car, so that the ConvNet can detect it. Disadvantage (slow): you take many crops of the image and feed each of them independently through a CNN, and you also try different window sizes. Using a larger stride reduces the number of crops, but the coarser granularity can hurt performance. Convolutional implementation: pass the whole image through the CNN once and make predictions for all crops at the same time.
  4. Rather than covering all possible boxes in the image, we can reduce the number of boxes by smartly selecting a small set of those crops. Proposals are often based on heuristics, e.g. looking for “blob-like” image regions. They are relatively fast to compute; e.g. Selective Search gives 2000 region proposals in a few seconds on a CPU. The idea of region proposals was the stepping stone for many advanced object detectors such as R-CNN.
  5. One of the most influential papers in deep learning. All ConvNets share weights: it is infeasible to train 2000 independent CNNs, and it would not make sense, because all of them share the same optimization goal, which is to perform well on image regions. The CNN is trained on batches of image regions taken from the same image and across different images. A classification score and a bounding box are the output for each region. Steps: (1) Run selective search, which gives ~2K proposed regions (of different sizes and aspect ratios). (2) Warp (affine image warping) all regions to a fixed size (224x224; this size is a hyperparameter). (3) Run the regions independently through a CNN, which outputs a classification over C+1 categories (C defined classes and 1 unknown: a background region with no object). Why warp? Region proposals have different sizes and aspect ratios. The transform is a log-space scale transform. At test time: if the classification score is above a chosen threshold, we output the box; otherwise we don’t. Example: if you don’t care about the categories, output the 10 boxes that have the lowest background score. Problem: what if region proposals do not exactly match the objects in the image? Output a bounding box: the CNN outputs a transformation that turns the region proposal box into the final output box that we want (the one that contains the object). We are modifying the region proposal bounding box to fit the object. We don’t input the location of the box for the CNN to learn, because we want the prediction to be invariant to the location of the object in the image.
  6. 10x faster than R-CNN. Per-region networks are usually part of the backbone network. E.g., with AlexNet, the backbone would be the conv layers and the per-region network would be the FC layers at the end. Cropping features: RoI Pool. Idea: project the region proposals extracted from the input image onto the corresponding feature maps. Because we have CNN feature maps, we know that each point in the feature maps corresponds to points in the input image. Then snap to grid cells, divide the region into a 2x2 grid of roughly equal subregions, and max-pool within each subregion. Region features always have the same size, even if the input regions have different sizes!
  7. Run the image through the backbone network to get image features. Pass the image features to the region proposal network to get region proposals. The rest is the same as before: crop and resize, per-region network, bounding box and class scores. Faster R-CNN is 10x faster than Fast R-CNN.
  8. Figure 2: The Model. Our system models detection as a regression problem. It divides the image into an S × S grid and for each grid cell predicts B bounding boxes, confidence for those boxes, and C class probabilities. These predictions are encoded as an S × S × (B ∗ 5 + C) tensor. An input image is divided into multiple grid cells. For each grid cell, the CNN generates a vector that contains the classification scores and the bounding box information. YOLO uses the idea of anchor boxes, where multiple template boxes covering the different sizes and shapes of objects likely to be encountered in the dataset are used in training. The CNN outputs a prediction for each of these template boxes, and the prediction matches one of the available boxes.
  9. S × S is the number of grid cells and B is the number of bounding boxes per cell. We multiply B by 5 because for each box we output x, y (center), width, height, and a confidence score (the IoU between the predicted box and the ground-truth box). C is the number of class scores. Within each grid cell: regress from each of the B template boxes to a final box with 5 numbers (dx, dy, dh, dw, confidence), and predict scores for each class (including background). The confidence measures how sure we are that the object matches this particular template box.
  10. Also known as the Jaccard index (Jaccard similarity); in object detection it is called IoU. It is a mechanism for comparing two bounding boxes in order to evaluate our predictions. Rules of thumb: IoU > 0.5 is decent, > 0.7 is good, and > 0.9 is excellent, which is about as good as it gets in practice.
  11. Algorithms usually output multiple detections of the same object, which means we have multiple overlapping boxes. If an object is detected multiple times, NMS is used to choose the best box. NMS is a method to ensure that each object is detected only once. 1. Get rid of low-probability predictions