Introduction to object detection

Introduction to Face Detection
By: Amar Jindal
NIT Patna

Face Detaction or Object Detection
 Combination of computer vision and image processing to find out object in
a image or video.
 The Object detection includes classifying an object in the image and
localize its poition in the image.
 This can be achived using image segmentation with AI.

 Classify the face.
 Localize the face
 List of facial
features.
How it Looks like?

Semantic
Segmentation
Instance
Segmentation
Types of Segmentation

Computer Vision Tasks
Source: CS231n Object detection http://cs231n.stanford.edu/slides/2016/winter1516_lecture8.pdf

Classification using LeNet on MNIST dataset

Classification + Localization
● Classification:
○ Input: Image
○ Output: Class label
○ Loss: Cross entropy (Softmaxlog)
○ Evaluation metric: Accuracy
● Localization:
○ Input: Image
○ Output: Box in the image (x, y, w, h)
○ Loss: L2 Loss (Euclidean distance)
○ Evaluation metric: Intersection over Union
● Classification + Localization:
○ Input: Image
○ Output: Class label + box in the image
○ Loss: Sum of both losses

Cross Entropy
 Measures the performance of a
 classification model whose output
 is a probability value between 0 &
1.
 Increases as the predicted P(e)
 diverges from the actual label.

Euclidean Distance
 This loss function is usually used for regression problems,It measures the
 distance between two distinct points.

Shallow
Features
Deep
Features
Feature
1
Feature
2

Classification + Localization: Model
Classification
Head:
● C Scores for C
classes
Localization Head:
● Class agnostic:
(x,y,w,h)
● Class specific:
(x,y,w,h) X C

Intersection Over Union (IoU)
● Important measurement for object localization.
● Used in both training and evaluation.

Face Recognition using Siamese Networks
 This network allows to calculate the degree of similarity between two
inputs

 Since we can not have multiple images of the same person in
our database,

Yolo Based Face Detection
Each grid cell has 5 associated values. The first one is the probability p of that cell
containing the center of a face.The other 4 values are the (x_center, y_center, width,
height) of the detected face (relative to the cell).

 2x [8 filter convolutional layer on 288x288 image]
 Max pooling (288x288 to 144x144 feature map)
 2x [16 filter convolutional layer on 144x144 feature map]
 5 filter convolutional layer on 9x9 feature map for the final
grid

Auxiliary network
Outputs of the main network were not as accurate as expected. Hence, a small
CNN network was implemented to take as input a small image containing a face

Object Detection
l
● Input: Image
● Output: For each object class c and each
image i,an algorithm returns predicted
detections: ocations
with confidence scores .

Object Detection: Evaluation
● Today new metrics are emerging
○ Averaging precision over all IoU thresholds: 0.5:0.05:0.95
○ Averaging precision for different object sizes (small, medium, big)
○ Averaging recall as a metric to measure object proposal quality.

Average Precision (AP)
● [In the vision community] AP is the estimated area under the PR curve

Mean Average Precision (mAP)
● The winner of each object class is the team with the highest average precision
● The winner of the challenge is the team with the highest mean Average
Precision (mAP) across all classes.

Object Detection: Evaluation
● Mean Average Precision (mAP) across all classes, based on Average Precision
(AP) per class, based on Precision and Recall.

Applications
 Face Unlock
 Person Identification
 Video Survillance

Introduction to object detection

More Related Content

What's hot

Similar to Introduction to object detection

Recently uploaded

Introduction to object detection