1
OBJECT DETECTION USING DEEP
NEURAL NETWORK
2
What is a deep neural network?

A technology built to simulate the activity of the
human brain.

Has several layers, including input, hidden, and output layers.

Each layer performs a specific type of function.

Used in image recognition, verification,
classification, object detection (including real-time
detection), face recognition, etc.
3
TYPES OF NEURAL NETWORKS

Artificial Neural Network (ANN)

Deep Residual Network (Deep ResNet)

Recurrent Neural Network (RNN)

Convolutional Neural Network (CNN)
4
CONVOLUTIONAL NEURAL NETWORK (CNN)

One of the most popular neural networks.

Most commonly used in image recognition,
image classification, object detection,
face recognition, and other computer vision tasks.

Well-known CNN architectures:

LeNet-5

AlexNet

VGG-16

Inception Network (GoogLeNet is its first version), etc.
5
CNN IN OBJECT DETECTION

Very good for recognizing patterns such as edges
(vertical/horizontal), shapes, colours, and textures.

Most popular neural network for object detection.

Why CNN?
 Parameter sharing
 Sparsity of connections
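
The two properties above can be illustrated by counting parameters. This is a minimal sketch, not taken from the slides: the layer sizes (a 32 x 32 RGB input, 5 x 5 kernels, 6 filters) and the function names are assumptions chosen only for illustration.

```python
def conv_params(kernel, in_channels, out_channels):
    """Parameter sharing: one small kernel per filter is reused at every position."""
    return kernel * kernel * in_channels * out_channels + out_channels  # weights + biases


def dense_params(in_h, in_w, in_channels, out_units):
    """A fully connected layer needs one weight per (input value, output unit) pair."""
    return in_h * in_w * in_channels * out_units + out_units  # weights + biases


def inputs_per_output(kernel, in_channels):
    """Sparsity of connections: each conv output depends only on a small patch."""
    return kernel * kernel * in_channels


# Assumed sizes: 32x32x3 input, 5x5 kernels, 6 filters -> 28x28x6 output.
conv = conv_params(kernel=5, in_channels=3, out_channels=6)
dense = dense_params(32, 32, 3, out_units=28 * 28 * 6)

print("conv layer parameters: ", conv)
print("dense layer parameters:", dense)
print("inputs seen by one conv output: ", inputs_per_output(5, 3))
print("inputs seen by one dense output:", 32 * 32 * 3)
```

With these assumed sizes the convolutional layer needs a few hundred parameters, while a dense layer producing the same number of outputs needs millions.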
6
OBJECT DETECTION ALGORITHMS
Some popular object detection algorithms are:

Deformable parts models (DPM)

R-CNN (Region-CNN)

YOLO (You Only Look Once)
7
Deformable parts models (DPM)

Uses a sliding-window approach to object detection

DPM uses a disjoint pipeline to:
 extract static features
 classify regions
 predict bounding boxes for high scoring regions, etc.
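
The sliding-window idea can be sketched as follows. This shows only how candidate windows are enumerated, not DPM itself; the function name, window size, and stride are assumptions made for illustration. In a real detector each window would then be passed to a feature extractor and a classifier.

```python
def sliding_windows(img_w, img_h, win_w, win_h, stride):
    """Yield (x, y, w, h) windows that tile the image on a regular grid."""
    for y in range(0, img_h - win_h + 1, stride):
        for x in range(0, img_w - win_w + 1, stride):
            yield (x, y, win_w, win_h)


# Assumed toy sizes: an 8x6 image scanned with a 4x4 window and a stride of 2.
windows = list(sliding_windows(img_w=8, img_h=6, win_w=4, win_h=4, stride=2))
for window in windows:
    print(window)
```

The number of windows grows quickly as the stride shrinks or more window sizes are added, which is one reason this approach is expensive.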
8
Fig: Deformable parts models
9
R-CNN (Region-CNN)

One of the most popular object detection algorithms

Uses region proposals instead of sliding windows

First generates potential bounding boxes in an
image

Then runs a classifier on these proposed boxes.

Post-processing is used to refine the bounding
boxes, eliminate duplicate detections, and rescore
the boxes based on other objects in the scene.
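
Eliminating duplicate detections is commonly done with greedy non-maximum suppression (NMS). The sketch below assumes that technique, boxes in (x1, y1, x2, y2) corner format, and an overlap threshold of 0.5; none of these details are stated on the slide.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_threshold=0.5):
    """Keep the highest-scoring box, drop boxes that overlap it too much, repeat."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep


# Assumed toy detections: the first two boxes cover the same object.
boxes = [(10, 10, 50, 50), (12, 12, 52, 52), (100, 100, 140, 140)]
scores = [0.9, 0.8, 0.7]
kept = nms(boxes, scores)
print(kept)  # indices of the boxes that survive
```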
10
11
LIMITATIONS OF R-CNN

The complex, multi-stage pipeline is slow

Hard to optimize

Each individual component must be trained separately
12
YOLO (YOU ONLY LOOK ONCE)
According to the paper “You Only Look Once: Unified,
Real-Time Object Detection” by Joseph Redmon,
Santosh Divvala, Ross Girshick, and Ali Farhadi:

Reframes object detection as a single regression
problem

Goes straight from image pixels to bounding box
coordinates and class probabilities
13
METHODOLOGY

A single convolutional network simultaneously predicts
multiple bounding boxes and the class probabilities
for those boxes

Divides the input image into an S × S grid.

If the center of an object falls into a grid cell,
that grid cell is responsible for detecting
that object.

Each grid cell predicts B bounding boxes and
confidence scores for those boxes.

Confidence scores reflect how confident the model
is that the box contains an object
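
The grid assignment can be sketched as below. The function names are hypothetical, and the defaults S = 7, B = 2, and C = 20 are the values the YOLO paper uses for PASCAL VOC, not values given on this slide.

```python
def responsible_cell(cx, cy, img_w, img_h, S=7):
    """Return (row, col) of the grid cell that contains the object's center."""
    col = min(int(cx / img_w * S), S - 1)  # clamp centers lying on the right edge
    row = min(int(cy / img_h * S), S - 1)  # clamp centers lying on the bottom edge
    return row, col


def output_shape(S=7, B=2, C=20):
    """Each cell predicts B boxes (5 numbers each) plus C class probabilities."""
    return (S, S, B * 5 + C)


# Assumed example: an object centered at (224, 100) in a 448x448 image.
cell = responsible_cell(cx=224, cy=100, img_w=448, img_h=448)
print("responsible cell (row, col):", cell)
print("prediction tensor shape:", output_shape())
```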
14
METHODOLOGY
Fig: YOLO object detection
15
METHODOLOGY

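Confidence scores also reflect how accurate the model
thinks its predicted box is

Confidence is defined as Pr(Object) ∗ IOU^{truth}_{pred},
where IOU^{truth}_{pred} is the intersection over union
between the predicted box and the ground truth

Each bounding box consists of 5 predictions: x, y,
w, h, and confidence.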
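
The confidence definition above can be worked through in a few lines. This is a sketch under assumptions: boxes are given as (x, y, w, h) with (x, y) the box center in pixels, and the helper names are hypothetical. At training time the target confidence is this product; at test time the network predicts it directly.

```python
def to_corners(x, y, w, h):
    """Convert a center-format box (x, y, w, h) to corners (x1, y1, x2, y2)."""
    return (x - w / 2, y - h / 2, x + w / 2, y + h / 2)


def iou(pred, truth):
    """Intersection over union of two center-format boxes."""
    ax1, ay1, ax2, ay2 = to_corners(*pred)
    bx1, by1, bx2, by2 = to_corners(*truth)
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = pred[2] * pred[3] + truth[2] * truth[3] - inter
    return inter / union if union > 0 else 0.0


def confidence(pr_object, pred, truth):
    """Confidence = Pr(Object) * IOU between predicted and ground-truth box."""
    return pr_object * iou(pred, truth)


# Assumed toy boxes: the prediction is shifted 10 px to the left of the truth.
pred_box = (50.0, 50.0, 40.0, 40.0)
truth_box = (60.0, 50.0, 40.0, 40.0)
print(confidence(1.0, pred_box, truth_box))  # an object is present in the cell
print(confidence(0.0, pred_box, truth_box))  # no object: confidence is zero
```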
16
Fig: YOLO convolution layers
17
WHY YOLO?

YOLO is extremely fast: the paper reports 45 frames per
second for the base model and 155 for Fast YOLO

It doesn’t need a complex pipeline

YOLO achieves more than twice the mean average
precision of other real-time systems.

YOLO sees the entire image during training and
test time

YOLO makes less than half the number of
background errors compared to Fast R-CNN.

When generalizing from natural images to other domains
such as artwork, YOLO outperforms top detection
methods like DPM and R-CNN by a wide margin
18
LIMITATIONS

YOLO imposes strong spatial constraints on
bounding box predictions, since each grid cell predicts
only two boxes and can have only one class.

Model struggles with small objects that appear in
groups, such as flocks of birds.

It struggles to generalize to objects in new or
unusual aspect ratios or configurations.

The main source of error is incorrect localization.
