Introduction to Face Processing with Computer Vision

Gabriel Bianconi
Founder, Scalar Research

AI & Data Science Consulting Firm
Previously at the Stanford AI Lab

Agenda
• Theory
• Detection
• Recognition
• Other Tasks
• Practice
• Rapid Prototyping
• Scaling

Haar-Like Features
• Summarize image based on simple color patterns
• Manually determined feature extractors (kernels)
• Leveraged for first real-time face detector (2001)
Ref: Viola & Jones (2001). Image: Wikimedia
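
A minimal sketch of how one Haar-like feature can be evaluated, assuming a grayscale image given as a list of rows. The integral image (summed-area table) is what lets Viola & Jones compute each rectangle sum with four lookups; the function names and the toy image are illustrative assumptions, not code from the talk.

```python
def integral_image(img):
    """Summed-area table with an extra zero row and column."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def rect_sum(ii, top, left, height, width):
    """Sum of the pixels in a rectangle using four table lookups."""
    bottom, right = top + height, left + width
    return ii[bottom][right] - ii[top][right] - ii[bottom][left] + ii[top][left]


def two_rect_feature(ii, top, left, height, width):
    """Horizontal two-rectangle feature: right half minus left half (even width)."""
    half = width // 2
    left_sum = rect_sum(ii, top, left, height, half)
    right_sum = rect_sum(ii, top, left + half, height, half)
    return right_sum - left_sum


# Toy 4x4 image: dark left half, bright right half (a vertical edge).
image = [
    [10, 10, 200, 200],
    [10, 10, 200, 200],
    [10, 10, 200, 200],
    [10, 10, 200, 200],
]
ii = integral_image(image)
edge_response = two_rect_feature(ii, 0, 0, 4, 4)
```

A large absolute response indicates a strong light/dark contrast at that position; a full detector combines many such features at different positions and scales.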

Histogram of Oriented Gradients (HOG)
• Summarize image by distribution of color gradients
• Gradient intensities and orientations represent edges, etc.
• Captures more information than simple Haar-like features
Ref: Shu et al. (2011).
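
A simplified sketch of the core HOG step, assuming a single grayscale cell given as a list of rows. It builds a magnitude-weighted histogram of unsigned gradient orientations; full HOG implementations typically add vote interpolation and block normalization, which are omitted here. The function name and bin count are illustrative assumptions.

```python
import math


def cell_histogram(img, bins=9):
    """Magnitude-weighted histogram of unsigned gradient orientations (0-180 degrees)."""
    h, w = len(img), len(img[0])
    hist = [0.0] * bins
    bin_width = 180.0 / bins
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = img[y][x + 1] - img[y][x - 1]  # horizontal central difference
            gy = img[y + 1][x] - img[y - 1][x]  # vertical central difference
            magnitude = math.hypot(gx, gy)
            angle = math.degrees(math.atan2(gy, gx)) % 180.0
            hist[int(angle // bin_width) % bins] += magnitude
    return hist


# Toy 4x4 cell with a vertical edge: intensity changes only along x.
cell = [
    [0, 0, 100, 100],
    [0, 0, 100, 100],
    [0, 0, 100, 100],
    [0, 0, 100, 100],
]
hist = cell_histogram(cell)
dominant_bin = hist.index(max(hist))
```

Here all gradient energy falls into the first bin (orientations near 0 degrees), which is how the descriptor encodes the vertical edge.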

R-CNN
• Introduces CNNs for object detection
• CNNs learn how to extract features from data
• Breakthrough in performance
• Beats previous SOTA methods by huge margin
• However, detection is extremely slow
Ref: Girshick et al. (2014).

CNN Features
Ref: Lee et al. (2009).

R-CNN
Ref: Girshick et al. (2014).

Fast R-CNN
• Improves on R-CNN by using a single CNN for both classification and bounding-box regression
• Apart from region proposals, the system is now end-to-end, rather than three components trained greedily.
• Predictions are 200x+ faster, with better performance
• Region proposals are still a bottleneck; total inference time is ~2s.
Ref: Girshick (2015).

Fast R-CNN
Ref: Girshick (2015).

Faster R-CNN
• Leverages CNN for region proposals as well
• “Region Proposal Network”
• Finally an end-to-end system with deep learning
• About 10x faster than Fast R-CNN, with better performance
• Total inference time is ~0.2s
Ref: Ren et al. (2016).

Faster R-CNN
Ref: Ren et al. (2016).

MTCNN
• Many face detection models draw heavily on general-purpose object detection methods.
• MTCNN, for example, trains a multi-task system for joint detection and alignment.
Ref: Zhang et al. (2015).

MTCNN
Ref: Zhang et al. (2015).

DSFD
• The current SOTA method draws heavily from
modern single-shot detection architectures.
• DSFD extends to a dual-shot detector with
enhanced features and loss functions.
Ref: Li et al. (2018).

Are we there yet?
• WIDER Face (Easy): ~96% AP
• WIDER Face (Medium): ~95% AP
• WIDER Face (Hard): ~90% AP
Ref: Yang (2016).

Facial Recognition
• Facial recognition actually covers a group of different tasks.
• Verification vs. Identification vs. Grouping vs. …
• Closed-Set vs. Open-Set

Closed-Set Recognition
• Every identity appears in training set
• Example: recognizing celebrities
• Effectively a classification problem
• Model aims to learn separable features

Closed-Set Identification
Test Sample → Model → Label Confidences (Label 0, Label 1, …)
Images: Wikimedia
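
A minimal sketch of closed-set identification as classification, assuming the model ends in a softmax layer (a common choice that the slides do not specify). The raw scores and label names are hypothetical.

```python
import math


def softmax(scores):
    """Convert raw class scores into confidences that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def identify_closed_set(scores, labels):
    """Return the most likely known identity and its confidence."""
    confidences = softmax(scores)
    best = max(range(len(labels)), key=lambda i: confidences[i])
    return labels[best], confidences[best]


# Hypothetical raw scores from a classifier trained on three known identities.
labels = ["label_0", "label_1", "label_2"]
scores = [2.0, 0.5, -1.0]
name, confidence = identify_closed_set(scores, labels)
```

Because the output is always one of the training labels, this setup cannot express "someone I have never seen", which is the gap open-set recognition addresses.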

Closed-Set Verification
Test Sample A → Model → Label Confidences
Test Sample B → Model → Label Confidences
Images: Wikimedia

Open-Set Recognition
• Not every identity appears in training set
• Example: Facebook Photos
• Effectively a metric learning problem
• Model aims to learn large-margin features (embeddings)

Embeddings
• Map each sample to a vector (coordinate system)
• Used for words, graphs, faces, etc.
• Embeddings preserve similarity
• Similar samples close to each other
• Dissimilar samples far from each other
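
A toy illustration of the similarity property, using hypothetical 3-dimensional vectors (real face embeddings are much larger, for example 128 dimensions in face_recognition). The names and values are made up for the example.

```python
import math

# Hypothetical embeddings: two photos of one person and one photo of another.
alice_photo_1 = [0.10, 0.80, 0.30]
alice_photo_2 = [0.12, 0.75, 0.33]
bob_photo = [0.90, 0.10, 0.50]

# L2 (Euclidean) distance between embeddings.
same_person = math.dist(alice_photo_1, alice_photo_2)
different_person = math.dist(alice_photo_1, bob_photo)
```

In a well-trained embedding space, `same_person` should be much smaller than `different_person`.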

Embeddings
• “Similar” depends on the training data
• Same person, physical characteristic, etc.
• Embeddings represent latent information
• High-dimensional embeddings trained on large datasets
learn to represent latent information about the person (e.g.
physical characteristics)

Open-Set Identification
Test Sample → Model → Embedding + Distance (vs. Emb. 0, Emb. 1, Emb. 2, …)
Images: Wikimedia
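
A minimal sketch of the lookup step, assuming embeddings have already been computed and stored in a dictionary. The gallery contents, the 2-dimensional vectors, and the threshold value are hypothetical; a real system would tune the threshold on validation data.

```python
import math


def identify_open_set(query, gallery, threshold):
    """Return the closest enrolled identity, or None if nothing is within the threshold."""
    best_name, best_distance = None, float("inf")
    for name, embedding in gallery.items():
        distance = math.dist(query, embedding)
        if distance < best_distance:
            best_name, best_distance = name, distance
    if best_distance > threshold:
        return None, best_distance  # treat as an unknown identity
    return best_name, best_distance


# Hypothetical gallery of enrolled embeddings (toy 2-d vectors).
gallery = {"emb_0": [0.0, 0.0], "emb_1": [1.0, 0.0], "emb_2": [0.0, 1.0]}
known, known_dist = identify_open_set([0.9, 0.1], gallery, threshold=0.6)
unknown, unknown_dist = identify_open_set([3.0, 3.0], gallery, threshold=0.6)
```

The linear scan is fine for small galleries; larger ones usually call for an approximate nearest-neighbor index.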

Open-Set Verification
Test Sample A → Model → Embedding A
Test Sample B → Model → Embedding B
Distance (Embedding A, Embedding B) vs. Threshold
Images: Wikimedia
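
A minimal sketch of verification by thresholding, plus a helper showing how the threshold trades false rejects against false accepts. All distances, vectors, and thresholds are hypothetical values chosen for illustration.

```python
import math


def verify(embedding_a, embedding_b, threshold):
    """Declare a match when the two embeddings are within the threshold."""
    distance = math.dist(embedding_a, embedding_b)
    return distance <= threshold, distance


def error_rates(genuine_distances, impostor_distances, threshold):
    """False reject rate on same-person pairs, false accept rate on different-person pairs."""
    false_rejects = sum(d > threshold for d in genuine_distances)
    false_accepts = sum(d <= threshold for d in impostor_distances)
    return (
        false_rejects / len(genuine_distances),
        false_accepts / len(impostor_distances),
    )


# Hypothetical distances for labeled validation pairs.
genuine = [0.31, 0.42, 0.50, 0.65]
impostor = [0.55, 0.63, 0.69, 0.80]

is_match, distance = verify([0.1, 0.2], [0.1, 0.5], threshold=0.6)
strict = error_rates(genuine, impostor, threshold=0.5)
loose = error_rates(genuine, impostor, threshold=0.7)
```

A stricter threshold rejects more genuine pairs, while a looser one accepts more impostors; the right operating point depends on the application's risk tolerance.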

Metric Learning
Ref: Liu et al. (2018)

Are we there yet?
• LFW (Labeled Faces in the Wild): 99.8%+ accuracy
Ref: Deng et al. (2018); Learned-Miller et al. (2016)

Cross-Factor Facial Recognition

Cross-Age
Ref: Zheng et al. (2017)

Cross-Pose
Ref: Li et al. (2011)

Cross-Makeup
Ref: Chen et al. (2013)

Security
• How do we deal with adversarial users?
• Real face goes undetected or misclassified
• Fake face gets recognized
• Private data is extracted from model
• …

Security
Ref: Grigory Bakunov (2017)

Biometrics & Multi-Modal Data
• How do we deal with…
• Identical twins?
• Plastic surgery?
• ...
Ref: Singh et al. (2010)

Biometrics & Multi-Modal Data
• Combine with other biometric data
• Biometric traits (e.g. hand)
• Multiple sensors (e.g. 2D + 3D)
• Multiple pictures (e.g. viewpoints, sequences)
• …
Ref: Singh et al. (2010); Ross & Jain (2004); Ross & Govindarajan (2005)

Privacy
• How do we deal with…
• Models that can predict gender, race, …?
• Models that leak the data?
• Predictions without sharing the raw data?
• …
Ref: Singh et al. (2010)

Alignment & Pose Estimation
Ref: Ruiz et al. (2018)

Classification
Neutral
Happy
Happy

3D Reconstruction
Ref: Sela et al. (2017)

Dozens of Tools
[Figure: tools placed along a simplicity vs. accuracy axis: face_recognition, OpenCV, FaceNet, InsightFace, MTCNN, …, DSFD, …]

APIs
• There are dozens of APIs providing low-cost face
processing at scale
• Most services charge less than $1 per 1000 images
• Depending on the use case, might be cheaper than provisioning GPUs
and deploying your own models (esp. if considering developer time)
• Often these APIs can achieve performance that’s
close to state-of-the-art

APIs – Example: Azure
• Detection
• Classification
• Gender, age, emotion, hair, smile, eyes, glasses, makeup, …
• Landmarks
• Pose Estimation
• Recognition
• Verification, identification, grouping, similarity search, …

Embeddings
• Face embeddings are typically used for open-set
recognition systems
• They can be leveraged to quickly train models for
downstream tasks (e.g. classification)
• Tools
• face_recognition (GitHub): extremely fast, reliable for frontal faces
• FaceNet: based on deep learning, strong across the board
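
One way to reuse embeddings for a downstream classification task is a nearest-centroid classifier, sketched below. This is an illustrative choice, not a method named in the talk; the 2-dimensional embeddings and the "glasses" labels are hypothetical stand-ins for precomputed face embeddings.

```python
import math
from collections import defaultdict


def fit_centroids(embeddings, labels):
    """Average the embeddings of each class into one centroid per label."""
    grouped = defaultdict(list)
    for embedding, label in zip(embeddings, labels):
        grouped[label].append(embedding)
    return {
        label: [sum(dim) / len(vectors) for dim in zip(*vectors)]
        for label, vectors in grouped.items()
    }


def predict(centroids, embedding):
    """Assign the label whose centroid is closest in L2 distance."""
    return min(centroids, key=lambda label: math.dist(centroids[label], embedding))


# Hypothetical precomputed embeddings (toy 2-d) with labels for a downstream task.
train_embeddings = [[0.1, 0.9], [0.2, 0.8], [0.9, 0.1], [0.8, 0.2]]
train_labels = ["glasses", "glasses", "no_glasses", "no_glasses"]

centroids = fit_centroids(train_embeddings, train_labels)
prediction = predict(centroids, [0.15, 0.7])
```

Since the embedding model is already trained, fitting a small classifier like this needs only a handful of labeled examples and no GPU.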

Example – Facebook Photos
• Task: open-set face identification
• Strategy:
1. Detect faces and compute embeddings for known photos
of users; store for future use.
2. Whenever a photo is uploaded, do the same and compare
against known set.

Example – Detection
import face_recognition as fr
image = fr.load_image_file("file.jpg")
face_locations = fr.face_locations(image)
Ref: github.com/ageitgey/face_recognition

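Example – Embedding
image = fr.load_image_file("file.jpg")
# face_encodings returns one 128-dimensional vector per detected face;
# [0] takes the first and raises IndexError if no face is found.
face_embedding = fr.face_encodings(image)[0]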

Example – L2 Distance
Pairwise L2 distances between the embeddings of four face images:
-    0.31 0.59 0.69
0.31 -    0.52 0.63
0.59 0.52 -    0.50
0.69 0.63 0.50 -
Images: Wikimedia
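
A short sketch of how a distance matrix like this one is turned into match decisions. The values are copied from the slide with the diagonal set to 0.0; the two thresholds are assumptions for illustration (0.6 is the default `tolerance` of `compare_faces` in face_recognition, 0.4 is an arbitrary stricter value).

```python
# Pairwise L2 distances copied from the slide (diagonal set to 0.0).
distances = [
    [0.00, 0.31, 0.59, 0.69],
    [0.31, 0.00, 0.52, 0.63],
    [0.59, 0.52, 0.00, 0.50],
    [0.69, 0.63, 0.50, 0.00],
]


def matching_pairs(matrix, threshold):
    """Index pairs (i, j) with i < j whose distance is at or below the threshold."""
    n = len(matrix)
    return [
        (i, j)
        for i in range(n)
        for j in range(i + 1, n)
        if matrix[i][j] <= threshold
    ]


loose_matches = matching_pairs(distances, threshold=0.6)
strict_matches = matching_pairs(distances, threshold=0.4)
```

With these numbers the looser threshold links four pairs while the stricter one links only the closest pair, which shows how sensitive the result is to the chosen cutoff.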

Face Landmarks
• Face landmarks can also be quickly extracted with
pretrained models and used for a number of
downstream tasks.

Example – Face Landmarks
face_landmarks = fr.face_landmarks(image)[0]
print(face_landmarks.keys())
# left_eyebrow, right_eyebrow, bottom_lip, top_lip, …

Example – Snapchat Filters
• Task: face manipulation
• Strategy:
1. Detect face and localize landmarks in image
2. Add objects, reshape image, etc. based on landmarks

Example – Snapchat Filters
from PIL import Image, ImageDraw
…
pil_image = Image.fromarray(image)
d = ImageDraw.Draw(pil_image, 'RGBA')
lip_fill = (150, 0, 0, 128)  # shade of red, 50% alpha
d.polygon(face_landmarks['top_lip'], fill=lip_fill)
d.polygon(face_landmarks['bottom_lip'], fill=lip_fill)
…
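
Beyond filling polygons, landmarks can drive where an object is placed. The sketch below computes a center, width, and rotation for an overlay spanning both eyes, assuming a landmark dictionary with `left_eye` and `right_eye` keys as returned by `face_landmarks`. The coordinates, the helper names, and the 2.0 scale factor are hypothetical.

```python
import math


def centroid(points):
    """Mean (x, y) of a list of landmark points."""
    xs, ys = zip(*points)
    return sum(xs) / len(xs), sum(ys) / len(ys)


def overlay_placement(landmarks):
    """Center, width, and rotation (degrees) for an object spanning both eyes."""
    left = centroid(landmarks["left_eye"])
    right = centroid(landmarks["right_eye"])
    center = ((left[0] + right[0]) / 2, (left[1] + right[1]) / 2)
    eye_distance = math.dist(left, right)
    angle = math.degrees(math.atan2(right[1] - left[1], right[0] - left[0]))
    return center, 2.0 * eye_distance, angle  # 2.0 is an arbitrary scale factor


# Hypothetical landmark coordinates (fewer points than a real model returns).
landmarks = {
    "left_eye": [(100, 100), (110, 96), (120, 100), (110, 104)],
    "right_eye": [(160, 100), (170, 96), (180, 100), (170, 104)],
}
center, width, angle = overlay_placement(landmarks)
```

The resulting values could then be used to resize, rotate, and paste an image (for example, glasses) onto the photo with PIL.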

Bias
• People & Demographics
• Is your training set… Coworkers? Single location?
• Environment
• Does it cover… Day and night? Seasons? Lighting
conditions? Backgrounds?
• Sensors
• Did you consider… Diverse hardware? Calibration?
Viewpoint (angle)? Resolution? Occlusion?

Optimizations
• It is often easier to simplify the real-world task than to drastically improve ML models.

Optimizations
[Plot: performance vs. time (weeks)]
Multiple model optimizations ($$$ in developer time, etc.)

Optimizations
[Plot: performance vs. time (weeks)]
Install a new light ($)

Risks
• What happens when your model makes a mistake?
• How can you deal with adversarial users?
• What is your threat model?

Other Considerations
• How do you handle…
• Model getting stale over time?
• Growing search space?
• Large amounts of real-time data?
• Detecting or tracking people vs. faces?
• Speed vs. cost vs. performance trade-offs?

Thank you.
gabriel@scalarresearch.com
