Image Object Detection Pipeline

Object Detection Pipeline
Abhinav Dadhich, ABEJA, Inc.
Tokyo Machine Learning Kitchen, 2017

Outline
1. What is Object Detection? What are its components?
2. Models for detection
3. Train, Test & Evaluate
4. FAQs

Object Detection
http://cs231n.stanford.edu/slides/winter1516_lecture8.pdf
Q: What object is in the image?
A: Cat

Object Detection
Q: Where is the object in the image?
A: bounding box or coordinates
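
A detection answer therefore pairs a label with a box. The sketch below shows two common ways to write such a box, as corners or as center plus size. The `Box` type, the helper names, and the example numbers are illustrative assumptions, not part of the talk.

```python
from typing import NamedTuple


class Box(NamedTuple):
    """Axis-aligned box in pixel coordinates, corner format (hypothetical type)."""
    x_min: float
    y_min: float
    x_max: float
    y_max: float


def to_center_format(box):
    """Convert corners to (center_x, center_y, width, height)."""
    w = box.x_max - box.x_min
    h = box.y_max - box.y_min
    return (box.x_min + w / 2, box.y_min + h / 2, w, h)


def from_center_format(cx, cy, w, h):
    """Convert (center_x, center_y, width, height) back to corners."""
    return Box(cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)


# Made-up detection: a cat in part of an image.
detection = {"label": "cat", "box": Box(120, 80, 420, 400)}
print(to_center_format(detection["box"]))  # (270.0, 240.0, 300, 320)
```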

Components of a Detection Pipeline
● Dataset of images and target labels
● Pre-processing of images and labels
● Model selection and modifications
● Training
● Testing and evaluation
● Deploying the final model

Model Architecture
Feature Extractor → Classifier → Outputs: Object Classification, Bounding Box
Why Deep Neural Network?
“Fast R-CNN” Ross Girshick, ICCV’15

Model Architecture
Feature Extractor → Classifier → Outputs: Object Classification, Bounding Box
Deep CNN Architecture:
● VGG
● Inception
● ResNet

Faster Region-based ConvNets (Faster-RCNN)
Ren et al., NIPS 2015.
Fig: Huang et al. 2016, arXiv:1611.10012v1
● Two-step process
● Higher accuracy
● Slower per-sample prediction than similar models
● Large input image size

Region-based Fully Convolutional Network (R-FCN)
Dai et al., NIPS 2016.
● Two-step process
● Faster per-sample prediction than Faster-RCNN

Single Shot Detector (SSD)
Liu et al., ECCV 2016.
● One-step process
● Faster per-sample prediction
● Suited to small images and large objects

Training
● Pre-trained models are available for major deep learning frameworks.
● Fine-tune an existing model, re-initializing the output layers.
[Diagram: CNNs and FCs, labeled as Feature Layers and Re-initialized Classification Layers]
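
The fine-tuning idea above can be sketched without any framework: keep the pre-trained feature weights and replace only the output layer with fresh random weights sized for the new classes. Everything here is a toy assumption (the layer names, the plain nested-list weights, and the 21-class output, meaning the 20 Pascal VOC classes plus an assumed background class). A real framework would do this through its own model-loading API.

```python
import random


def init_layer(n_in, n_out, rng):
    """Small random weights for a fully connected layer: n_out rows of n_in values."""
    return [[rng.gauss(0.0, 0.01) for _ in range(n_in)] for _ in range(n_out)]


def prepare_for_fine_tuning(pretrained, output_layer, n_classes, seed=0):
    """Copy all pre-trained weights, then re-initialize only the output layer."""
    rng = random.Random(seed)
    # Deep-copy every layer so the pre-trained weights stay untouched.
    model = {name: [row[:] for row in w] for name, w in pretrained.items()}
    # The new output layer keeps its input width but gets n_classes outputs.
    n_in = len(pretrained[output_layer][0])
    model[output_layer] = init_layer(n_in, n_classes, rng)
    return model


# Hypothetical "pre-trained" weights standing in for a 1000-class classifier.
rng = random.Random(42)
pretrained = {
    "features": init_layer(8, 16, rng),      # stand-in for the CNN feature layers
    "fc": init_layer(16, 32, rng),           # stand-in for a fully connected layer
    "cls_score": init_layer(32, 1000, rng),  # original output layer
}

model = prepare_for_fine_tuning(pretrained, "cls_score", n_classes=21)
print(len(model["cls_score"]), len(model["cls_score"][0]))  # 21 32
```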

Evaluation Metrics & Tests
● Mean Average Precision (mAP):
○ A prediction counts as correct when its Intersection over Union (IoU) with a ground-truth box passes a threshold.
○ Averaged over all classes.
○ Higher is better.
● Prediction time: pre-processing + prediction time per image
● Memory usage: the model's GPU/CPU memory usage during prediction
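
To make the mAP bullets concrete, here is a minimal sketch of IoU, per-class average precision, and their mean. It assumes boxes in (x_min, y_min, x_max, y_max) form, an IoU threshold of 0.5, and all-point interpolation of the precision-recall curve. Benchmarks differ on these details, so treat it as one possible convention rather than the evaluation used in the talk. All function names are illustrative.

```python
def iou(a, b):
    """Intersection over Union of two boxes (x_min, y_min, x_max, y_max)."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def average_precision(detections, ground_truth, iou_threshold=0.5):
    """AP for one class.

    detections: list of (image_id, score, box)
    ground_truth: dict mapping image_id to a list of boxes
    """
    detections = sorted(detections, key=lambda d: d[1], reverse=True)
    matched = {img: [False] * len(boxes) for img, boxes in ground_truth.items()}
    n_gt = sum(len(boxes) for boxes in ground_truth.values())
    tp, fp = 0, 0
    precisions, recalls = [], []
    for img, _score, box in detections:
        # Find the ground-truth box in the same image with the highest overlap.
        best_iou, best_idx = 0.0, -1
        for idx, gt in enumerate(ground_truth.get(img, [])):
            overlap = iou(box, gt)
            if overlap > best_iou:
                best_iou, best_idx = overlap, idx
        # Each ground-truth box can be matched at most once.
        if best_idx >= 0 and best_iou >= iou_threshold and not matched[img][best_idx]:
            matched[img][best_idx] = True
            tp += 1
        else:
            fp += 1
        precisions.append(tp / (tp + fp))
        recalls.append(tp / n_gt if n_gt else 0.0)
    # Area under the precision-recall curve with interpolated precision.
    ap, prev_recall = 0.0, 0.0
    for i, r in enumerate(recalls):
        ap += (r - prev_recall) * max(precisions[i:])
        prev_recall = r
    return ap


def mean_average_precision(per_class):
    """per_class: dict mapping class name to (detections, ground_truth)."""
    aps = [average_precision(d, g) for d, g in per_class.values()]
    return sum(aps) / len(aps) if aps else 0.0


# Made-up example: two images, one ground-truth cat in each.
gt = {"img1": [(0, 0, 10, 10)], "img2": [(0, 0, 10, 10)]}
dets = [
    ("img1", 0.9, (0, 0, 10, 10)),    # exact match
    ("img2", 0.8, (20, 20, 30, 30)),  # no overlap, false positive
    ("img2", 0.7, (1, 1, 10, 10)),    # IoU 0.81, true positive
]
print(mean_average_precision({"cat": (dets, gt)}))
```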

Datasets
● [Try] Collect a new dataset suited to the task.
● [Try] Get labels that are as accurate as possible.
● If that is not possible, use public datasets:
○ MSCOCO: 80 object categories, 300K images, 5 captions per image
○ Pascal VOC: 20 object categories, ~20K images
○ ILSVRC: 200 object categories, ~470K images

Pre-processing
● Image resizing.
● Image pixel normalization.
● Data Augmentation:
○ Flip
○ Random rotations
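
One detail worth showing for detection data: when an image is flipped, its bounding boxes must be flipped with it. The sketch below covers pixel normalization and a horizontal flip on a tiny image stored as nested lists. The function names, the mean/std values, and the pixel-edge box convention are assumptions made for illustration.

```python
def normalize(image, mean, std):
    """Per-channel normalization; image is rows of pixels, each a tuple of channels."""
    return [
        [tuple((c - m) / s for c, m, s in zip(px, mean, std)) for px in row]
        for row in image
    ]


def horizontal_flip(image, boxes):
    """Mirror the image left to right and mirror its boxes to match.

    Boxes are (x_min, y_min, x_max, y_max) in pixel-edge coordinates,
    with 0 <= x <= image width (an assumed convention).
    """
    width = len(image[0])
    flipped = [row[::-1] for row in image]
    flipped_boxes = [
        (width - x_max, y_min, width - x_min, y_max)
        for x_min, y_min, x_max, y_max in boxes
    ]
    return flipped, flipped_boxes


# Made-up 2x4 single-channel image with one box covering the left column.
image = [[(0,), (1,), (2,), (3,)],
         [(4,), (5,), (6,), (7,)]]
boxes = [(0, 0, 1, 2)]

flipped, flipped_boxes = horizontal_flip(image, boxes)
print(flipped_boxes)  # [(3, 0, 4, 2)]: the box now covers the right column
normalized = normalize(image, mean=(1,), std=(2,))
```

Random rotations need the same care: the box corners have to be rotated and a new enclosing box computed.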

Code and Pre-trained Models
● Faster-RCNN: https://github.com/rbgirshick/py-faster-rcnn (Caffe)
● R-FCN: https://github.com/Orpine/py-R-FCN (Caffe)
● SSD: https://github.com/weiliu89/caffe/tree/ssd (Caffe)
