Introduction to Face Detection
By: Amar Jindal
NIT Patna
Face Detection or Object Detection
● Combines computer vision and image processing to find objects in an image or video.
● Object detection involves classifying an object in the image and localizing its position in the image.
● This can be achieved using image segmentation with AI.
● Classify the face.
● Localize the face.
● List the facial features.
What does it look like?
Types of Segmentation
● Semantic Segmentation
● Instance Segmentation
Computer Vision Tasks
Source: CS231n Object detection http://cs231n.stanford.edu/slides/2016/winter1516_lecture8.pdf
Classification using LeNet on MNIST dataset
Classification + Localization
● Classification:
○ Input: Image
○ Output: Class label
○ Loss: Cross entropy (softmax log loss)
○ Evaluation metric: Accuracy
● Localization:
○ Input: Image
○ Output: Box in the image (x, y, w, h)
○ Loss: L2 Loss (Euclidean distance)
○ Evaluation metric: Intersection over Union
● Classification + Localization:
○ Input: Image
○ Output: Class label + box in the image
○ Loss: Sum of both losses
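The "sum of both losses" idea can be sketched in a few lines of Python. This is a minimal illustration, not the presenter's implementation: the function names are hypothetical, the box is assumed to be an (x, y, w, h) tuple, and the two losses are added with equal weight (real systems often weight them differently).

```python
import math

def softmax(scores):
    """Turn raw class scores into probabilities that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(scores, true_class):
    """Classification loss: negative log-probability of the true class."""
    return -math.log(softmax(scores)[true_class])

def l2_loss(pred_box, true_box):
    """Localization loss: Euclidean distance between two (x, y, w, h) boxes."""
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(pred_box, true_box)))

def combined_loss(scores, true_class, pred_box, true_box):
    """Classification + localization loss as a plain, unweighted sum (assumption)."""
    return cross_entropy(scores, true_class) + l2_loss(pred_box, true_box)

# Made-up example: 3 classes, true class is index 0.
print(combined_loss([2.0, 0.5, 0.1], 0, (10, 12, 50, 60), (12, 10, 48, 64)))
```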
Convolution Operation
Pooling
Cross Entropy
● Measures the performance of a classification model whose output is a probability value between 0 and 1.
● Increases as the predicted probability diverges from the actual label.
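A small sketch of how the loss grows as the prediction moves away from the label. It uses the binary form of cross entropy for a single probability output; the function name and the `eps` clamp (added to avoid log(0)) are assumptions made for this illustration.

```python
import math

def binary_cross_entropy(p, label, eps=1e-12):
    """Cross entropy for one predicted probability p and a 0/1 label."""
    p = min(max(p, eps), 1.0 - eps)  # clamp so log() never sees 0
    return -(label * math.log(p) + (1 - label) * math.log(1.0 - p))

# The true label is 1; the loss rises as the predicted probability drops.
for p in (0.9, 0.5, 0.1):
    print(f"p={p:.1f}  loss={binary_cross_entropy(p, 1):.4f}")
```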
Euclidean Distance
● This loss function is usually used for regression problems. It measures the distance between two points.
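For a bounding box this can be written out explicitly. The notation is an assumption for illustration: b = (x, y, w, h) is the ground-truth box and b-hat is the predicted box.

```latex
d(\hat{b}, b) = \sqrt{(\hat{x} - x)^2 + (\hat{y} - y)^2 + (\hat{w} - w)^2 + (\hat{h} - h)^2}
```

Many implementations minimize the squared distance instead, which has the same minimum and avoids the square root.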
Shallow Features
Deep Features
Feature 1
Feature 2
Classification + Localization: Model
Classification Head:
● C scores for C classes
Localization Head:
● Class agnostic: (x, y, w, h)
● Class specific: (x, y, w, h) × C
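The difference between the two localization heads shows up when reading the output. The sketch below is hypothetical: it assumes the class-specific head emits a flat list of 4 × C numbers, stored as four consecutive values per class, which is one possible layout and not necessarily the one used in the slides.

```python
def select_box(class_scores, box_outputs, class_specific=True):
    """Return (predicted class index, box) from the two heads' outputs."""
    num_classes = len(class_scores)
    best = max(range(num_classes), key=lambda c: class_scores[c])
    if not class_specific:
        # Class-agnostic head: a single (x, y, w, h) shared by all classes.
        if len(box_outputs) != 4:
            raise ValueError("class-agnostic head must output 4 values")
        return best, tuple(box_outputs)
    # Class-specific head: one (x, y, w, h) per class, 4 * C values in total.
    if len(box_outputs) != 4 * num_classes:
        raise ValueError("class-specific head must output 4 * C values")
    start = 4 * best
    return best, tuple(box_outputs[start:start + 4])

# Made-up outputs for C = 3 classes.
print(select_box([0.1, 0.7, 0.2], list(range(12))))
```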
Intersection Over Union (IoU)
● Important measurement for object localization.
● Used in both training and evaluation.
IoU Algorithm
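A minimal sketch of the IoU computation, assuming boxes are given as (x, y, w, h) with (x, y) the top-left corner. That corner convention is an assumption; the slides only give the tuple.

```python
def iou(box_a, box_b):
    """Intersection over Union of two axis-aligned (x, y, w, h) boxes."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    # Width and height of the overlap rectangle (zero if the boxes do not overlap).
    inter_w = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    intersection = inter_w * inter_h
    union = aw * ah + bw * bh - intersection
    return intersection / union if union > 0 else 0.0

# Two 2x2 boxes offset by one unit in each direction.
print(iou((0, 0, 2, 2), (1, 1, 2, 2)))
```

A prediction is commonly counted as correct when its IoU with the ground-truth box exceeds a chosen threshold such as 0.5.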
Face Recognition using Siamese Networks
● This network computes the degree of similarity between two inputs.
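The comparison step can be sketched as follows. The embeddings stand in for the outputs of the shared network (not implemented here), and the Euclidean distance and the 0.5 threshold are illustrative assumptions, not values from the slides.

```python
import math

def embedding_distance(emb_a, emb_b):
    """Euclidean distance between two embedding vectors of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(emb_a, emb_b)))

def same_person(emb_a, emb_b, threshold=0.5):
    """Treat two faces as the same person when their embeddings are close."""
    return embedding_distance(emb_a, emb_b) < threshold

# Made-up embeddings: a stored reference face and two query faces.
reference = [0.10, 0.80, 0.30]
query_close = [0.12, 0.79, 0.33]
query_far = [0.90, 0.10, 0.70]
print(same_person(reference, query_close), same_person(reference, query_far))
```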
● Since we cannot have multiple images of the same person in our database,
YOLO-Based Face Detection
Each grid cell has 5 associated values. The first is the probability p that the cell contains the center of a face. The other 4 values are the (x_center, y_center, width, height) of the detected face, relative to the cell.
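One way to turn a cell's 5 values into a box in image pixels is sketched below. The decoding convention is an assumption: x_center and y_center are taken as offsets within the cell in units of the cell size, and width and height are also measured in cell sizes. The default cell size of 32 pixels assumes a 288x288 input divided into a 9x9 grid.

```python
def decode_cell(row, col, values, cell_size=32, threshold=0.5):
    """Convert one grid cell's (p, x_center, y_center, width, height)
    into a (x, y, w, h) box in pixels, or None if p is below the threshold."""
    p, x_center, y_center, width, height = values
    if p < threshold:
        return None  # the cell is unlikely to contain a face center
    center_x = (col + x_center) * cell_size
    center_y = (row + y_center) * cell_size
    box_w = width * cell_size
    box_h = height * cell_size
    # Return the box with (x, y) as its top-left corner.
    return (center_x - box_w / 2, center_y - box_h / 2, box_w, box_h)

# Made-up cell output: a confident detection centered in cell (row 1, col 2).
print(decode_cell(1, 2, (0.9, 0.5, 0.5, 2.0, 2.0)))
```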
● 2x [8 filter convolutional layer on 288x288 image]
● Max pooling (288x288 to 144x144 feature map)
● 2x [16 filter convolutional layer on 144x144 feature map]
● Max pooling (144x144 to 72x72 feature map)
● 2x [32 filter convolutional layer on 72x72 feature map]
● Max pooling (72x72 to 36x36 feature map)
● 2x [64 filter convolutional layer on 36x36 feature map]
● Max pooling (36x36 to 18x18 feature map)
● 2x [128 filter convolutional layer on 18x18 feature map]
● Max pooling (18x18 to 9x9 feature map)
● 4x [192 filter convolutional layer on 9x9 feature map]
● 5 filter convolutional layer on 9x9 feature map for the final grid
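The feature-map sizes listed above can be checked with a short shape trace. The sketch assumes the convolutions preserve spatial size and each max pooling halves it, which is what the listed sizes imply; kernel sizes and padding are not given in the slides, so they are left out.

```python
def trace_shapes(input_size=288):
    """Return a list of (layer_type, spatial_size, channels) for the network."""
    blocks = [(2, 8), (2, 16), (2, 32), (2, 64), (2, 128)]  # (conv repeats, filters)
    size = input_size
    shapes = []
    for repeats, filters in blocks:
        for _ in range(repeats):
            shapes.append(("conv", size, filters))  # size unchanged by convolution
        size //= 2  # max pooling halves height and width
        shapes.append(("pool", size, filters))
    for _ in range(4):
        shapes.append(("conv", size, 192))
    shapes.append(("conv", size, 5))  # final grid: 5 values per cell
    return shapes

for layer_type, size, channels in trace_shapes():
    print(f"{layer_type:4s} {size}x{size}x{channels}")
```

The last entry is a 9x9 map with 5 channels, matching the 5 values per grid cell described earlier.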
Auxiliary Network
The outputs of the main network were not as accurate as expected. Hence, a small CNN was implemented that takes as input a small image containing a face
Object Detection
● Input: Image
● Output: For each object class c and each image i, an algorithm returns predicted detections: locations with confidence scores.
Object Detection: Evaluation
● Today new metrics are emerging
○ Averaging precision over all IoU thresholds: 0.5:0.05:0.95
○ Averaging precision for different object sizes (small, medium, big)
○ Averaging recall as a metric to measure object proposal quality.
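The threshold range 0.5:0.05:0.95 expands to ten IoU thresholds. The sketch below generates them and averages a per-threshold AP. The `ap_at_threshold` callable is a hypothetical stand-in for whatever routine evaluates AP at one IoU threshold.

```python
def iou_thresholds(start=0.5, step=0.05, stop=0.95):
    """List the IoU thresholds start, start + step, ..., stop."""
    count = int(round((stop - start) / step)) + 1
    return [round(start + i * step, 2) for i in range(count)]

def average_over_thresholds(ap_at_threshold):
    """Average AP over all IoU thresholds; ap_at_threshold maps a threshold to an AP."""
    thresholds = iou_thresholds()
    return sum(ap_at_threshold(t) for t in thresholds) / len(thresholds)

print(iou_thresholds())
# Made-up AP curve that drops as the IoU requirement gets stricter.
print(average_over_thresholds(lambda t: 1.0 - t))
```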
Average Precision (AP)
● [In the vision community] AP is the estimated area under the precision-recall (PR) curve
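A minimal sketch of estimating that area for one class. It assumes each detection has already been marked as a true or false positive (for example by an IoU test against the ground truth) and sums precision times the change in recall, without the interpolation that some benchmarks apply. The function name and input format are hypothetical.

```python
def average_precision(detections, num_ground_truth):
    """Estimate the area under the PR curve.

    detections: list of (confidence, is_true_positive) pairs for one class.
    num_ground_truth: number of ground-truth objects of that class.
    """
    if num_ground_truth == 0:
        return 0.0
    ranked = sorted(detections, key=lambda d: d[0], reverse=True)
    true_positives = 0
    area = 0.0
    previous_recall = 0.0
    for rank, (_, is_true_positive) in enumerate(ranked, start=1):
        if is_true_positive:
            true_positives += 1
        precision = true_positives / rank
        recall = true_positives / num_ground_truth
        area += precision * (recall - previous_recall)  # rectangle under the curve
        previous_recall = recall
    return area

# Made-up detections: two hits and one miss for a class with 2 ground-truth objects.
print(average_precision([(0.9, True), (0.8, False), (0.7, True)], 2))
```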
Mean Average Precision (mAP)
● The winner of each object class is the team with the highest average precision
● The winner of the challenge is the team with the highest mean Average
Precision (mAP) across all classes.
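Given one AP value per class, mAP is their plain average. The class names and AP values below are made up for illustration.

```python
def mean_average_precision(ap_per_class):
    """Average the per-class AP values; ap_per_class maps class name to AP."""
    if not ap_per_class:
        raise ValueError("need at least one class")
    return sum(ap_per_class.values()) / len(ap_per_class)

# Hypothetical per-class results for one team.
print(mean_average_precision({"face": 0.8, "person": 0.6}))
```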
Object Detection: Evaluation
● Mean Average Precision (mAP) across all classes, based on Average Precision
(AP) per class, based on Precision and Recall.
Applications
● Face Unlock
● Person Identification
● Video Surveillance
Doubts..?
