This document outlines the PhD thesis of Taleb ALASHKAR on 3D dynamic facial sequence analysis for face recognition and emotion detection. The thesis proposes frameworks for 4D face recognition and 4D spontaneous emotion detection using subspace representations of 3D facial sequences and modeling trajectories on the Grassmann manifold. Experimental results on public databases show the frameworks achieve better recognition and detection performance than state-of-the-art methods, especially in expression-independent face recognition and early detection of emotions like happiness and pain.
The document discusses challenges and approaches for facial emotion recognition. It aims to develop a model-based approach for real-time driver emotion recognition on an embedded platform using parallel processing. Model-based approaches can overcome issues like illumination and pose variations. The document reviews several state-of-the-art methods and discusses challenges like occlusion, lighting distortions, and complex backgrounds. It describes exploring both 2D and 3D techniques for facial feature extraction and expression recognition.
Facial expression recognition based on image feature (Tasnim Tara)
This document presents a method for facial expression recognition based on image features. It discusses existing works that use techniques like PCA and Gabor wavelets for feature extraction and Euclidean distance for classification. The proposed method uses Gaussian filtering, radial symmetry transform, and edge projection for feature extraction, and calculates a feature vector based on geometric facial parameters to classify expressions using Euclidean distance. It aims to recognize six basic expressions accurately from the JAFFE database and could be developed for real-time video recognition in the future.
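The geometric-feature-plus-Euclidean-distance classification described above can be sketched as a nearest-template rule. This is a minimal illustration, not the paper's implementation; the feature values and expression templates are invented for the example.

```python
import numpy as np

# Hypothetical geometric feature vectors (e.g. eye/mouth distances);
# the templates below are illustrative, not taken from the paper.
templates = {
    "happy":    np.array([4.1, 2.0, 1.2]),
    "sad":      np.array([3.2, 1.1, 0.9]),
    "surprise": np.array([5.0, 2.8, 1.9]),
}

def classify(features):
    """Return the expression whose template is nearest in Euclidean distance."""
    return min(templates, key=lambda e: np.linalg.norm(features - templates[e]))

probe = np.array([4.0, 2.1, 1.3])
label = classify(probe)
```

In practice each template would be the mean feature vector of a training class rather than a single hand-picked point.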
An Enhanced Independent Component-Based Human Facial Expression Recognition ... (أحلام انصارى)
This document presents a facial expression recognition system that uses enhanced independent component analysis and fisher linear discriminant analysis (EICA-FLDA) for feature extraction from video frames, and hidden Markov models (HMM) for expression recognition. The system is tested on the Cohn-Kanade facial expression database and achieves a mean recognition rate of 93.23% for six universal expressions (anger, joy, sad, disgust, fear, surprise). Facial expression recognition has applications in human-computer interaction domains like online gaming.
This document presents information on face detection techniques. It discusses image segmentation as a preprocessing step for face detection. Some common segmentation methods are thresholding, edge-based segmentation, and region-based segmentation. Face detection can be classified as implicit/pattern-based or explicit/knowledge-based. Implicit methods use techniques like templates, PCA, LDA, and neural networks, while explicit methods exploit cues like color, motion, and facial features. One method discussed is human skin color-based face detection, which filters for skin-colored regions and finds facial parts within those regions. Advantages include speed and independence from training data, while disadvantages include sensitivity to lighting and accessories.
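The skin-color filtering step described above can be sketched as a per-pixel threshold in YCbCr space. The threshold ranges below (77 ≤ Cb ≤ 127, 133 ≤ Cr ≤ 173) are a common heuristic from the literature, not values taken from this document, and the test image is a toy stand-in.

```python
import numpy as np

def skin_mask(rgb):
    """Boolean mask of skin-colored pixels using a YCbCr threshold rule.
    Thresholds are a common heuristic, assumed for illustration."""
    r = rgb[..., 0].astype(float)
    g = rgb[..., 1].astype(float)
    b = rgb[..., 2].astype(float)
    # Standard RGB -> CbCr conversion (luma Y is not needed for the rule)
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return (cb >= 77) & (cb <= 127) & (cr >= 133) & (cr <= 173)

# One skin-toned pixel and one green pixel
img = np.array([[[200, 140, 120], [20, 200, 20]]], dtype=np.uint8)
mask = skin_mask(img)
```

As the document notes, such a rule is fast and needs no training data, but breaks down under colored lighting.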
1. The document proposes a hybrid approach to facial expression recognition that combines appearance features extracted using Local Directional Number descriptors with geometric features based on distances between facial landmark points.
2. The features are classified independently using SVMs and the scores are fused at the decision level using product rule fusion to identify facial expressions in images.
3. Experiments on the CK+ and JAFFE databases show the hybrid approach achieves better recognition rates than using appearance or geometric features individually.
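The decision-level product-rule fusion in step 2 can be sketched as follows: multiply the two classifiers' per-class score vectors element-wise, renormalize, and take the argmax. The score values and class list are illustrative, not from the paper.

```python
import numpy as np

EXPRESSIONS = ["anger", "joy", "surprise"]

def product_fusion(appearance_scores, geometric_scores):
    """Fuse two per-class probability vectors with the product rule
    and renormalize so the result sums to one."""
    fused = np.asarray(appearance_scores, dtype=float) * np.asarray(geometric_scores, dtype=float)
    return fused / fused.sum()

app = np.array([0.2, 0.5, 0.3])   # appearance (LDN) SVM scores, illustrative
geo = np.array([0.1, 0.6, 0.3])   # geometric-feature SVM scores, illustrative
fused = product_fusion(app, geo)
decision = EXPRESSIONS[int(np.argmax(fused))]
```

The product rule rewards classes on which both classifiers agree; a class scored near zero by either classifier is effectively vetoed.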
This document discusses using Local Binary Patterns (LBP) for facial expression recognition from images. It summarizes that LBP is used to extract features from facial regions that are divided into cells. Histograms of these LBP features are concatenated into a feature vector for each image. Support Vector Machines (SVM) are used to classify the images into six basic expressions based on these vectors, achieving better results than template matching or Linear Discriminant Analysis. The document also proposes a method for expression-invariant face recognition that classifies an input image, neutralizes it, then searches a neutral image dataset to find potential matches.
Facial Expression Recognition Using Local Binary Pattern and Support Vector M... (AM Publications)
Facial expression analysis is a demanding problem with significant applications in fields such as human-computer interaction and data-driven animation. Deriving an effective facial representation from raw face images is a crucial step toward successful expression recognition. This work assesses a representation based on statistical local features, Local Binary Patterns (LBP), which are widely used by researchers because they are effective and efficient for facial expression recognition, and evaluates several machine learning techniques on various databases. The present work uses the Cohn-Kanade database and is implemented in MATLAB. First, the face area is divided into small regions, LBP histograms are extracted from each region, and the histograms are concatenated into a single feature vector. This vector forms a compact representation of the face and is used to measure the resemblance between images.
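The LBP pipeline described above (per-region codes, per-cell histograms, concatenation) can be sketched in NumPy. This is a minimal basic-LBP version; the grid size and the stand-in "face" array are illustrative, and real systems typically use the uniform-pattern variant.

```python
import numpy as np

def lbp_image(gray):
    """Basic 8-neighbour LBP code for each interior pixel of a 2-D array."""
    c = gray[1:-1, 1:-1]
    # Neighbours clockwise from top-left, each contributing one bit
    shifts = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]
    codes = np.zeros_like(c, dtype=np.int32)
    for bit, (dy, dx) in enumerate(shifts):
        n = gray[1 + dy:gray.shape[0] - 1 + dy, 1 + dx:gray.shape[1] - 1 + dx]
        codes += (n >= c).astype(np.int32) << bit
    return codes

def lbp_feature(gray, grid=(2, 2)):
    """Concatenate per-cell LBP histograms into one feature vector."""
    codes = lbp_image(gray)
    gy, gx = grid
    cells = [np.hsplit(row, gx) for row in np.vsplit(codes, gy)]
    hists = [np.bincount(cell.ravel(), minlength=256)
             for row in cells for cell in row]
    return np.concatenate(hists)

face = np.arange(100, dtype=np.int32).reshape(10, 10)  # stand-in for a gray face crop
vec = lbp_feature(face)
```

Each cell contributes a 256-bin histogram, so the cell grid controls the trade-off between spatial detail and feature length.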
This document describes various algorithms used to build a facial emotion recognition system, including Haar cascade, HOG, Eigenfaces, and Fisherfaces. It explains how each algorithm works, such as how Haar cascade detects facial features and HOG extracts histograms of gradients. The system is trained on the CK+ dataset and uses Eigenface and Fisherface classifiers to classify emotions, achieving higher accuracy (86.54%) with Fisherfaces. It provides code snippets of key steps like cropping, resizing images, splitting data, and predicting emotions.
This document presents a literature review and proposed work plan for face recognition using a back propagation neural network. It summarizes the Viola-Jones face detection algorithm which uses Haar features and an integral image for real-time detection. The algorithm has high detection rates with low false positives. Future work will apply back propagation neural networks to extract features and recognize faces from a database of facial images in order to build a facial recognition system.
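The integral image that makes Viola-Jones Haar features fast can be sketched as a summed-area table: after one pass, any rectangle sum costs four lookups. This is a minimal NumPy illustration, not tied to any particular implementation from the document.

```python
import numpy as np

def integral_image(img):
    """Summed-area table with a zero top row and left column, so every
    rectangle sum reduces to four table lookups (as in Viola-Jones)."""
    ii = np.zeros((img.shape[0] + 1, img.shape[1] + 1), dtype=np.int64)
    ii[1:, 1:] = img.cumsum(axis=0).cumsum(axis=1)
    return ii

def box_sum(ii, top, left, height, width):
    """Sum of pixels in the rectangle [top:top+height, left:left+width]."""
    return (ii[top + height, left + width] - ii[top, left + width]
            - ii[top + height, left] + ii[top, left])

img = np.arange(16).reshape(4, 4)  # toy 4x4 image
ii = integral_image(img)
```

A Haar feature is then just a signed combination of a few `box_sum` calls, which is why the detector runs in real time.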
Recognition of Partially Occluded Face Using Gradientface and Local Binary Pa... (Win Yu)
This document proposes a method for recognizing partially occluded faces using Gradientface and Local Binary Patterns (LBP). It first detects occluded regions using a MultiLayer Perceptron classifier, then applies Gradientface preprocessing to normalize for illumination before extracting LBP features from non-occluded regions for recognition. Experiments on the AR and ORL face databases show the method can effectively recognize faces with sunglasses and scarf occlusions.
BEB801 Project. Facial expression recognition Android application. Student: Alexander Fernicola. Supervisor: Professor Vinod Chandran. Describes the first steps of building an Android application that can detect the user's facial expression in an image or in video.
This document provides an introduction and overview of face recognition and detection. It discusses how face recognition involves identifying faces in images and can operate in verification or identification modes. Key steps in face recognition processing are discussed, including detection, alignment, feature extraction, and matching. Analysis of faces in subspaces is also covered, as are technical challenges such as variability in facial appearance and complexity of face manifolds. Neural networks, AdaBoost methods, and dealing with head rotations in detection are also outlined.
Predicting Emotions through Facial Expressions (twinkle singh)
This document describes a facial expression recognition system with two parts: face recognition and facial expression recognition. It discusses using principal component analysis (PCA) and linear discriminative analysis (LDA) for face recognition, and PCA to extract eigenfaces for facial expression recognition. The system first performs face detection, then extracts facial expression data and classifies the expression. MATLAB is used as the tool for its faster programming capabilities.
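The PCA/eigenface step mentioned above can be sketched with an SVD of the mean-centered data: the leading right singular vectors are the eigenfaces, and faces are represented by their projection weights. The data size and number of components below are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
faces = rng.random((20, 64))           # 20 flattened 8x8 "faces", illustrative
mean_face = faces.mean(axis=0)
centered = faces - mean_face

# SVD of the centered data: rows of vt are the principal axes (eigenfaces)
u, s, vt = np.linalg.svd(centered, full_matrices=False)
eigenfaces = vt[:5]                    # keep the 5 leading components

# Project a face into the eigenface subspace and reconstruct it
weights = eigenfaces @ (faces[0] - mean_face)
approx = mean_face + eigenfaces.T @ weights
```

Recognition then compares the low-dimensional `weights` vectors instead of raw pixels, which is the dimensionality reduction the document attributes to PCA.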
A comparative review of various approaches for feature extraction in Face rec... (Vishnupriya T H)
This document provides an overview of various approaches for feature extraction in face recognition. It discusses common feature extraction algorithms such as PCA, DCT, LDA, and ICA. PCA compresses the data while retaining as much of its variance as possible. DCT transforms images from the spatial to the frequency domain. LDA maximizes between-class variation and minimizes within-class variation. ICA determines statistically independent variables and minimizes higher-order dependencies. The document reviews several papers comparing the performance of these algorithms individually and in combination for face recognition applications.
This document provides a literature review of various techniques for automatic facial expression recognition. It discusses approaches such as principal component analysis (PCA), linear discriminant analysis (LDA), independent component analysis (ICA), 2D PCA, global eigen approaches using color images, subpattern extended 2D PCA, multilinear image analysis, color subspace LDA, 2D Gabor filter banks, and local Gabor binary patterns. It provides a table comparing the performance and disadvantages of these different methods. Recently, tensor perceptual color frameworks have been introduced that apply tensor concepts and perceptual color spaces to improve recognition performance under varying illumination conditions.
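The 2D Gabor filter banks mentioned above can be sketched by generating the real part of a Gabor kernel at several orientations. All parameter values (size, wavelength, sigma, gamma) are illustrative choices, not values from the reviewed papers.

```python
import numpy as np

def gabor_kernel(size=15, wavelength=4.0, theta=0.0, sigma=3.0, gamma=0.5):
    """Real part of a 2-D Gabor filter: a Gaussian envelope modulating
    a cosine carrier oriented at angle theta. Parameters are illustrative."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    xr = x * np.cos(theta) + y * np.sin(theta)    # rotate coordinates
    yr = -x * np.sin(theta) + y * np.cos(theta)
    envelope = np.exp(-(xr**2 + gamma**2 * yr**2) / (2 * sigma**2))
    return envelope * np.cos(2 * np.pi * xr / wavelength)

# A small bank: the same kernel at four orientations
bank = [gabor_kernel(theta=t) for t in np.linspace(0, np.pi, 4, endpoint=False)]
```

Convolving a face image with each kernel in the bank yields orientation- and frequency-selective responses; local Gabor binary patterns then apply LBP to those response maps.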
Face Recognition Human Computer Interaction (ines beltaief)
Face recognition is a popular area of computer vision research that involves identifying or verifying a person from a digital image. There are two main tasks: face identification which matches a face to known individuals, and face verification which confirms whether an image matches a claimed identity. Common approaches to face recognition include detecting faces, extracting facial features, and classifying expressions. Popular techniques are face detection algorithms, principal component analysis, linear discriminant analysis, kernel methods, and template matching. Applications of face recognition include automatic face tagging, gaming, price comparisons, image search, and security systems.
This document discusses facial expression recognition and the challenges that remain. It provides an overview of the current state-of-the-art techniques for facial expression recognition, which still struggle with accuracy when tested on naturalistic data rather than posed images. The document outlines a proposed pipeline for facial expression recognition that combines deep learning techniques for feature fusion and representation learning to help address these challenges and improve recognition accuracy on real-world data. Samples of datasets used for training and evaluating facial expression recognition systems are also presented.
https://telecombcn-dl.github.io/2017-dlcv/
Deep learning technologies are at the core of the current revolution in artificial intelligence for multimedia data analysis. The convergence of large-scale annotated datasets and affordable GPU hardware has allowed the training of neural networks for data analysis tasks which were previously addressed with hand-crafted features. Architectures such as convolutional neural networks, recurrent neural networks and Q-nets for reinforcement learning have shaped a brand new scenario in signal processing. This course will cover the basic principles and applications of deep learning to computer vision problems, such as image classification, object detection or image captioning.
Face recognition technology uses machine learning algorithms to identify or verify a person's identity from digital images or video frames. The process involves detecting faces, applying preprocessing techniques like filtering and scaling, training classifiers using labeled face images, and then classifying new faces. Common machine learning algorithms used include K-nearest neighbors, naive Bayes, decision trees, and locally weighted learning. The proposed system detects faces, builds a tabular dataset from pixel values, trains classifiers, and evaluates performance on a test set. Software applies techniques like detection, alignment, normalization, and matching to encode faces for comparison. Face recognition has advantages like convenience and low cost, and applications in security, banking, and more.
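Of the classifiers listed above, K-nearest neighbors is the simplest to sketch: label a probe by a majority vote among its k nearest training samples. The two-feature toy dataset and identity labels below are invented for illustration; a real system would use the flattened pixel vectors the document describes.

```python
import numpy as np

def knn_predict(train_x, train_y, probe, k=3):
    """Majority vote among the k training samples nearest in Euclidean distance."""
    d = np.linalg.norm(train_x - probe, axis=1)
    nearest = np.argsort(d)[:k]
    votes = [train_y[i] for i in nearest]
    return max(set(votes), key=votes.count)

# Toy dataset: two identities with slightly perturbed samples
train_x = np.array([[0, 0], [0, 1], [1, 0],
                    [10, 10], [10, 11], [11, 10]], dtype=float)
train_y = ["alice", "alice", "alice", "bob", "bob", "bob"]
label = knn_predict(train_x, train_y, np.array([0.5, 0.5]))
```

Note that k-NN on raw pixels is sensitive to alignment and lighting, which is why the pipeline above normalizes faces before classification.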
Face recognition across non-uniform motion blur, illumination, and pose (Pvrtechnologies Nellore)
The document proposes a method for face recognition in the presence of non-uniform motion blur from hand-held cameras. It models a blurred face as a convex combination of geometrically transformed gallery images. It develops an algorithm using the assumption of sparse camera motion and an l1-norm constraint. The framework is extended to handle illumination variations by exploiting the bi-convex set of images from blurring and illumination changes. The method is also extended to account for pose variations and uses a multi-scale implementation for efficient computation and memory usage.
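The convex-combination model above can be written as a sparse optimization. The notation below is reconstructed from the summary and is only a sketch: $B$ is the blurred probe, $g$ a gallery image, $T_k$ the candidate geometric transformations induced by camera motion, and $w_k$ the non-negative combination weights whose $\ell_1$ penalty encodes the sparse-camera-motion assumption.

```latex
\min_{\mathbf{w} \ge 0}\;
  \Bigl\lVert B \;-\; \sum_{k} w_k\, T_k\, g \Bigr\rVert_2^2
  \;+\; \lambda \,\lVert \mathbf{w} \rVert_1
```

Solving this for each gallery subject and picking the smallest residual yields the blur-robust identification the document describes; the illumination and pose extensions enlarge the set of transformations $T_k$.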
The document summarizes a design seminar project on human face identification. The objectives of the project were to develop a computational model for face recognition that can work under varying poses and apply it to problems like criminal identification, security, and image processing. The research methodology used eigenface methods based on information theory. The project involved developing a face identification system with features like adding images to a database, clipping images, updating details, and searching for matches. It provides screenshots of the system interface and discusses the software and hardware requirements and limitations of the approach. The conclusion states that the system can efficiently find faces without exhaustive searching and that face recognition will have many applications in smart environments.
The document summarizes research on automated face detection and recognition. It discusses common applications of face detection such as webcam tracking and photo tagging. Face recognition can be used for biometrics, mugshot databases, and detecting fake IDs. The document then compares human and computer abilities in face detection/recognition and describes challenges computers face representing multidimensional face data. It provides a brief history of the field and covers common approaches to face detection and recognition including eigenfaces, Fisherfaces, neural networks, Gabor wavelets, and active shape models. The document also discusses challenges of 3D, video, and comparing face recognition systems.
This document summarizes research on deep learning approaches for face recognition. It describes the DeepFace model from Facebook, which used a deep convolutional network trained on 4.4 million faces to achieve state-of-the-art accuracy on the Labeled Faces in the Wild (LFW) dataset. It also summarizes the DeepID2 and DeepID3 models from Chinese University of Hong Kong, which employed joint identification-verification training of convolutional networks and achieved performance comparable or superior to DeepFace on LFW. Evaluation metrics for face verification and identification tasks are also outlined.
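The verification metrics mentioned above can be sketched as true-accept and false-accept rates at a similarity threshold. The score lists below are toy values for illustration, not results from DeepFace or DeepID.

```python
import numpy as np

def tar_far(genuine, impostor, threshold):
    """True-accept rate (genuine pairs accepted) and false-accept rate
    (impostor pairs accepted) at a given similarity threshold."""
    genuine = np.asarray(genuine, dtype=float)
    impostor = np.asarray(impostor, dtype=float)
    tar = float((genuine >= threshold).mean())
    far = float((impostor >= threshold).mean())
    return tar, far

genuine = [0.9, 0.8, 0.85, 0.6]   # same-identity pair scores, illustrative
impostor = [0.3, 0.5, 0.2, 0.7]   # different-identity pair scores, illustrative
tar, far = tar_far(genuine, impostor, threshold=0.55)
```

Sweeping the threshold traces out the ROC curve on which benchmarks such as LFW report verification accuracy.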
Face recognition using artificial neural network (Dharmesh Tank)
This document presents an overview of face recognition using artificial neural networks. It discusses the basic concepts of face recognition, issues with existing systems, and proposes a new system using discrete cosine transform (DCT) for feature extraction and an artificial neural network with backpropagation for classification. DCT is used to extract illumination invariant features and reduce dimensionality. The neural network is trained on these features to recognize faces. Thresholding rules are also introduced to improve recognition performance. Real-time applications of face recognition like Microsoft's Project Natal are mentioned.
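The DCT feature-extraction step described above can be sketched by building the orthonormal DCT-II basis and keeping only the low-frequency coefficients. The block size, the number of retained coefficients, and the stand-in face image are illustrative choices.

```python
import numpy as np

def dct2(block):
    """Orthonormal 2-D DCT-II, built from the 1-D DCT basis matrix."""
    n = block.shape[0]
    k = np.arange(n)
    # Row k = frequency k; column m = sample index m
    basis = np.cos(np.pi * (2 * k[None, :] + 1) * k[:, None] / (2 * n))
    basis[0] *= 1 / np.sqrt(n)
    basis[1:] *= np.sqrt(2 / n)
    return basis @ block @ basis.T

def dct_features(gray, keep=8):
    """Low-frequency DCT coefficients as a compact descriptor
    (zig-zag ordering omitted for brevity)."""
    coeffs = dct2(gray.astype(float))
    return coeffs[:keep, :keep].ravel()

face = np.outer(np.hanning(16), np.hanning(16))  # stand-in for a face crop
feat = dct_features(face)
```

Discarding the very first (DC) coefficient, which tracks overall brightness, is one common way to obtain the illumination-invariant behavior the document attributes to DCT features.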
CDS performs criminal face identification using a capsule neural network.
To address common problems in image recognition, such as illumination changes and scale variability, and above all the pose problem, a Face Reconstruction System is introduced.
Real time multi face detection using deep learning (Reallykul Kuul)
This document proposes a framework for real-time multiple face recognition using deep learning on an embedded GPU system. The framework includes face detection using a CNN, face tracking to reduce processing time, and a state-of-the-art deep CNN for face recognition. Experimental results showed the system can recognize up to 8 faces simultaneously in real-time, with processing times up to 0.23 seconds and a minimum recognition rate of 83.67%.
4837410 automatic-facial-emotion-recognition (Ngaire Taylor)
This document summarizes an automatic facial emotion recognition system. It begins with an introduction to facial expression recognition and importance of understanding emotions. It then discusses related work on universal emotions and facial feature analysis. The system uses a facial tracker to extract features from tracked facial landmarks. Two classifiers, Naive Bayes and TAN, are used to classify emotions and results are visualized. The system includes a face detector for initialization and uses evaluation on recognition accuracy for different classifiers and dependency.
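Of the two classifiers named above, Naive Bayes is easy to sketch: assume the tracked facial features are conditionally independent per class and pick the class with the highest posterior. This is a minimal Gaussian variant on an invented two-feature toy dataset, not the document's tracker output.

```python
import numpy as np

class GaussianNB:
    """Minimal Gaussian naive Bayes: per-class feature means/variances
    plus class priors; features assumed conditionally independent."""

    def fit(self, x, y):
        self.classes = sorted(set(y))
        self.stats = {}
        for c in self.classes:
            xc = x[np.array(y) == c]
            self.stats[c] = (xc.mean(axis=0),
                             xc.var(axis=0) + 1e-9,   # avoid zero variance
                             len(xc) / len(x))
        return self

    def predict(self, probe):
        def log_posterior(c):
            mu, var, prior = self.stats[c]
            log_lik = -0.5 * np.sum(np.log(2 * np.pi * var)
                                    + (probe - mu) ** 2 / var)
            return log_lik + np.log(prior)
        return max(self.classes, key=log_posterior)

x = np.array([[1.0, 1.1], [0.9, 1.0], [5.0, 5.2], [5.1, 4.9]])
y = ["neutral", "neutral", "happy", "happy"]
model = GaussianNB().fit(x, y)
```

The TAN classifier the document also evaluates relaxes the independence assumption by allowing each feature one extra parent in a tree structure, which typically improves accuracy at modest extra cost.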
Facial emotion recognition uses active shape modeling to identify five classes of emotions from facial images. It reconstructs facial models by labeling landmark features, performs shape modeling through principal component analysis and model fitting, and then classifies emotions. Tested on a set of labeled images, the methodology achieved over 80% accuracy in emotion recognition.
This document describes various algorithms used to build a facial emotion recognition system, including Haar cascade, HOG, Eigenfaces, and Fisherfaces. It explains how each algorithm works, such as how Haar cascade detects facial features and HOG extracts histograms of gradients. The system is trained on the CK+ dataset and uses Eigenface and Fisherface classifiers to classify emotions, achieving higher accuracy (86.54%) with Fisherfaces. It provides code snippets of key steps like cropping, resizing images, splitting data, and predicting emotions.
This document presents a literature review and proposed work plan for face recognition using a back propagation neural network. It summarizes the Viola-Jones face detection algorithm which uses Haar features and an integral image for real-time detection. The algorithm has high detection rates with low false positives. Future work will apply back propagation neural networks to extract features and recognize faces from a database of facial images in order to build a facial recognition system.
Recognition of Partially Occluded Face Using Gradientface and Local Binary Pa...Win Yu
This document proposes a method for recognizing partially occluded faces using Gradientface and Local Binary Patterns (LBP). It first detects occluded regions using a MultiLayer Perceptron classifier, then applies Gradientface preprocessing to normalize for illumination before extracting LBP features from non-occluded regions for recognition. Experiments on the AR and ORL face databases show the method can effectively recognize faces with sunglasses and scarf occlusions.
BEB801 Project. Facial expression recognition android application. Student: Alexander Fernicola. Supervisor: Professor Vinod Chandran. Describes the first steps of building an android application that can detect the users facial expression in an image or in video.
This document provides an introduction and overview of face recognition and detection. It discusses how face recognition involves identifying faces in images and can operate in verification or identification modes. Key steps in face recognition processing are discussed, including detection, alignment, feature extraction, and matching. Analysis of faces in subspaces is also covered, as are technical challenges such as variability in facial appearance and complexity of face manifolds. Neural networks, AdaBoost methods, and dealing with head rotations in detection are also outlined.
Predicting Emotions through Facial Expressions twinkle singh
This document describes a facial expression recognition system with two parts: face recognition and facial expression recognition. It discusses using principal component analysis (PCA) and linear discriminative analysis (LDA) for face recognition, and PCA to extract eigenfaces for facial expression recognition. The system first performs face detection, then extracts facial expression data and classifies the expression. MATLAB is used as the tool for its faster programming capabilities.
A comparative review of various approaches for feature extraction in Face rec...Vishnupriya T H
This document provides an overview of various approaches for feature extraction in face recognition. It discusses common feature extraction algorithms such as PCA, DCT, LDA, and ICA. PCA is aimed at data compression while ensuring no information loss. DCT transforms images from spatial to frequency domains. LDA maximizes between-class variations and minimizes within-class variations. ICA determines statistically independent variables and minimizes higher-order dependencies. The document reviews several papers comparing the performance of these algorithms individually and in combination for face recognition applications.
This document provides a literature review of various techniques for automatic facial expression recognition. It discusses approaches such as principal component analysis (PCA), linear discriminant analysis (LDA), independent component analysis (ICA), 2D PCA, global eigen approaches using color images, subpattern extended 2D PCA, multilinear image analysis, color subspace LDA, 2D Gabor filter banks, and local Gabor binary patterns. It provides a table comparing the performance and disadvantages of these different methods. Recently, tensor perceptual color frameworks have been introduced that apply tensor concepts and perceptual color spaces to improve recognition performance under varying illumination conditions.
Face Recognition Human Computer Interactionines beltaief
Face recognition is a popular area of computer vision research that involves identifying or verifying a person from a digital image. There are two main tasks: face identification which matches a face to known individuals, and face verification which confirms whether an image matches a claimed identity. Common approaches to face recognition include detecting faces, extracting facial features, and classifying expressions. Popular techniques are face detection algorithms, principal component analysis, linear discriminant analysis, kernel methods, and template matching. Applications of face recognition include automatic face tagging, gaming, price comparisons, image search, and security systems.
This document discusses facial expression recognition and the challenges that remain. It provides an overview of the current state-of-the-art techniques for facial expression recognition, which still struggle with accuracy when tested on naturalistic data rather than posed images. The document outlines a proposed pipeline for facial expression recognition that combines deep learning techniques for feature fusion and representation learning to help address these challenges and improve recognition accuracy on real-world data. Samples of datasets used for training and evaluating facial expression recognition systems are also presented.
https://telecombcn-dl.github.io/2017-dlcv/
Deep learning technologies are at the core of the current revolution in artificial intelligence for multimedia data analysis. The convergence of large-scale annotated datasets and affordable GPU hardware has allowed the training of neural networks for data analysis tasks which were previously addressed with hand-crafted features. Architectures such as convolutional neural networks, recurrent neural networks and Q-nets for reinforcement learning have shaped a brand new scenario in signal processing. This course will cover the basic principles and applications of deep learning to computer vision problems, such as image classification, object detection or image captioning.
Face recognition technology uses machine learning algorithms to identify or verify a person's identity from digital images or video frames. The process involves detecting faces, applying preprocessing techniques like filtering and scaling, training classifiers using labeled face images, and then classifying new faces. Common machine learning algorithms used include K-nearest neighbors, naive Bayes, decision trees, and locally weighted learning. The proposed system detects faces, builds a tabular dataset from pixel values, trains classifiers, and evaluates performance on a test set. Software applies techniques like detection, alignment, normalization, and matching to encode faces for comparison. Face recognition has advantages like convenience and low cost, and applications in security, banking, and more.
Face recognition across non uniform motion blur, illumination, and posePvrtechnologies Nellore
The document proposes a method for face recognition in the presence of non-uniform motion blur from hand-held cameras. It models a blurred face as a convex combination of geometrically transformed gallery images. It develops an algorithm using the assumption of sparse camera motion and an l1-norm constraint. The framework is extended to handle illumination variations by exploiting the bi-convex set of images from blurring and illumination changes. The method is also extended to account for pose variations and uses a multi-scale implementation for efficient computation and memory usage.
The document summarizes a design seminar project on human face identification. The objectives of the project were to develop a computational model for face recognition that can work under varying poses and apply it to problems like criminal identification, security, and image processing. The research methodology used eigenface methods based on information theory. The project involved developing a face identification system with features like adding images to a database, clipping images, updating details, and searching for matches. It provides screenshots of the system interface and discusses the software and hardware requirements and limitations of the approach. The conclusion states that the system can efficiently find faces without exhaustive searching and face recognition will have many applications in smart environments.
The document summarizes research on automated face detection and recognition. It discusses common applications of face detection such as webcam tracking and photo tagging. Face recognition can be used for biometrics, mugshot databases, and detecting fake IDs. The document then compares human and computer abilities in face detection/recognition and describes challenges computers face representing multidimensional face data. It provides a brief history of the field and covers common approaches to face detection and recognition including eigenfaces, Fisherfaces, neural networks, Gabor wavelets, and active shape models. The document also discusses challenges of 3D, video, and comparing face recognition systems.
This document summarizes research on deep learning approaches for face recognition. It describes the DeepFace model from Facebook, which used a deep convolutional network trained on 4.4 million faces to achieve state-of-the-art accuracy on the Labeled Faces in the Wild (LFW) dataset. It also summarizes the DeepID2 and DeepID3 models from Chinese University of Hong Kong, which employed joint identification-verification training of convolutional networks and achieved performance comparable or superior to DeepFace on LFW. Evaluation metrics for face verification and identification tasks are also outlined.
Face recognition using artificial neural networkDharmesh Tank
This document presents an overview of face recognition using artificial neural networks. It discusses the basic concepts of face recognition, issues with existing systems, and proposes a new system using discrete cosine transform (DCT) for feature extraction and an artificial neural network with backpropagation for classification. DCT is used to extract illumination invariant features and reduce dimensionality. The neural network is trained on these features to recognize faces. Thresholding rules are also introduced to improve recognition performance. Real-time applications of face recognition like Microsoft's Project Natal are mentioned.
CDS is a criminal face identification system based on a capsule neural network.
To address common problems in image recognition such as illumination changes and scale variability, and to handle the most common problem of pose variation, we introduce a Face Reconstruction System.
Real time multi face detection using deep learningReallykul Kuul
This document proposes a framework for real-time multiple face recognition using deep learning on an embedded GPU system. The framework includes face detection using a CNN, face tracking to reduce processing time, and a state-of-the-art deep CNN for face recognition. Experimental results showed the system can recognize up to 8 faces simultaneously in real-time, with processing times up to 0.23 seconds and a minimum recognition rate of 83.67%.
4837410 automatic-facial-emotion-recognitionNgaire Taylor
This document summarizes an automatic facial emotion recognition system. It begins with an introduction to facial expression recognition and the importance of understanding emotions. It then discusses related work on universal emotions and facial feature analysis. The system uses a facial tracker to extract features from tracked facial landmarks. Two classifiers, Naive Bayes and TAN, are used to classify emotions, and the results are visualized. The system includes a face detector for initialization and is evaluated on recognition accuracy for the different classifiers and feature dependencies.
Facial emotion recognition uses active shape modeling to identify 5 classes of emotions from facial images. It reconstructs facial models by labeling landmark features, performing shape modeling through principal component analysis and model fitting, and classifying emotions. The methodology was tested on a set of labeled images and achieved over 80% accuracy in emotion recognition according to the results.
The document discusses subspace indexing on Grassmannian manifolds for large scale visual identification. It proposes using local subspace models built on neighborhoods defined by queries, but notes issues with computational complexity and lack of optimality. It then introduces Grassmannian and Stiefel manifolds to characterize subspace similarity and define distances. A model hierarchical tree is proposed to index subspaces through iterative merging based on distances on the Grassmannian manifold.
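The subspace similarity underlying this indexing is typically computed from principal angles. Below is a minimal sketch (not the paper's hierarchical tree) of the geodesic distance between two subspaces on the Grassmann manifold, using only NumPy:

```python
import numpy as np

def grassmann_distance(A, B):
    """Geodesic distance between the subspaces spanned by the columns
    of A and B: the 2-norm of the vector of principal angles, read off
    from the singular values of Qa.T @ Qb."""
    Qa, _ = np.linalg.qr(A)
    Qb, _ = np.linalg.qr(B)
    sigma = np.linalg.svd(Qa.T @ Qb, compute_uv=False)
    theta = np.arccos(np.clip(sigma, -1.0, 1.0))   # principal angles
    return np.linalg.norm(theta)

X = np.eye(5)[:, :2]     # span{e1, e2}
Y = np.eye(5)[:, 1:3]    # span{e2, e3}: shares one direction with X
assert grassmann_distance(X, X) < 1e-6
print(grassmann_distance(X, Y))   # pi/2: angles are 0 and 90 degrees
```

The merge step of the proposed tree would repeatedly fuse the pair of subspaces with the smallest such distance.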
The document discusses research into facial expressions in both humans and dogs. It describes Charles Darwin's theory that facial expressions are innate and evolved to communicate emotions. Recent work supports this theory. The Facial Action Coding System was developed to categorize facial movements based on the contracting muscles. Six universal human emotions - happiness, sadness, surprise, fear, disgust, and anger - each correspond to distinct facial expressions involving specific muscles. Additional research found that dogs also use facial expressions like ear positioning to communicate emotions when seeing their owner, a stranger, or disliked objects.
The document summarizes a student project that aims to identify 5 classes of emotions - neutral, joy, sadness, surprise, and anger - from facial images using active shape modeling. It does this by labeling landmark features, generating reference and mean models, selecting features for each emotion, and classifying emotions. The results are presented in a confusion matrix and future work is outlined to improve the model using texture, better classifiers, semi-supervised learning, and real-time applications.
Face Recognition for Personal Photos using Online Social Network Context and ...Wesley De Neve
Thanks to easy-to-use multimedia devices and cheap storage and bandwidth, present-day social media applications host staggering numbers of personal photos. As the number of personal photos shared on social media applications continues to accelerate, the problem of organizing and retrieving relevant photos becomes more apparent for consumers. Automatic face recognition assists in bringing order to collections of personal photos. However, personal photos pose a plethora of challenges for automatic face recognition. Face images may widely differ in terms of lighting, expressions, and pose. As a result, the accuracy of appearance-based techniques for automatic face recognition in collections of personal photos cannot be considered satisfactory.
This talk aims at providing insight into timely developments in the area of socially-aware face recognition. We first discuss how online social network context can be used to substantially improve the effectiveness of appearance-based techniques for automatic face recognition, as recently demonstrated by researchers of Harvard University. Next, we pay attention to collaborative face recognition in decentralized online social networks, as studied at KAIST. For both of the aforementioned topics, we present experimental results obtained for real-world collections of personal photos, contributed by volunteers who are members of online social networks such as Facebook and Cyworld. Finally, we conclude our talk with an outline of future applications of socially-aware face recognition, including augmented identity and socially-aware robots.
Irene Andersen is a female bodybuilder from Sweden who began bodybuilding in the late 1970s. She was born in Denmark in 1966 and moved to Sweden at age 2. She has trained in various sports like jazz ballet, judo, and bodybuilding since age 15. Between 1990-1996 she took breaks from training to have children but returned to the gym when they were older. She began competing in bodybuilding in 2003 at age 37 and obtained her pro card in 2005.
Academic Entrepreneurship at UCY,
by Mr. Christis Christoforou, MBA principal for accelyservices.
The results and the methodology of an extensive survey that was conducted at the University of Cyprus will be presented.
This document discusses concepts and solutions for a face recognition tool. It covers collecting face images, managing the image files, using the tool for staff attendance recording, and potential solutions for the tool including a web interface with server, macOS app, or command line only. It also provides examples of naming conventions for image files and discusses checks to validate captured images.
Deformable Facial Models and 3D Face Reconstruction Methods: A surveyLakshmi Sarvani Videla
Deformable Facial Model Construction for non-rigid motion tracking, 3D Face Reconstruction Methods, Geometry-Based Methods, Stereo methods, Shape from Motion models, Face Models, Cylindrical Model, Ellipsoidal Model, Planar Model, Facial deformable models, Holistic models, Part-based models, Eigenfaces, Active Shape Models, Combined Appearance Models, comparison of 3D facial features, list of 3D face databases containing 3D static expressions
Real-time Face Recognition & Detection Systems 1Suvadip Shome
The document discusses face detection and recognition. It begins with definitions of face detection as identifying faces in images and locating them, as well as distinguishing faces from non-faces. Face recognition is then defined as identifying known faces versus unknown faces. The document outlines the difference between detection and recognition. It then discusses various methods for face detection, including knowledge-based, feature-based, template matching and appearance-based methods. Finally, it provides an overview of the principal component analysis (PCA) algorithm for face recognition, outlining the main steps of converting images to vectors, normalizing, calculating eigenvectors, selecting eigenfaces, and representing faces as combinations of eigenvectors.
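The PCA steps listed at the end (vectorize images, normalize, compute eigenvectors, select eigenfaces, represent faces as combinations) can be sketched as follows; the random "faces" stand in for real images:

```python
import numpy as np

def eigenfaces(images, n_components):
    """Flatten images to vectors, subtract the mean face, take the top
    eigenvectors of the covariance (via SVD), and represent each face
    by its projection coefficients on those eigenfaces."""
    X = images.reshape(len(images), -1).astype(float)
    mean_face = X.mean(axis=0)
    centered = X - mean_face
    # Right singular vectors of the centered data are the covariance
    # eigenvectors, avoiding the explicit covariance matrix.
    _, _, Vt = np.linalg.svd(centered, full_matrices=False)
    components = Vt[:n_components]        # the eigenfaces
    weights = centered @ components.T     # faces as eigenface coefficients
    return mean_face, components, weights

rng = np.random.default_rng(1)
faces = rng.random((20, 8, 8))            # 20 toy 8x8 "face" images
mean, comps, w = eigenfaces(faces, 5)
print(comps.shape, w.shape)               # (5, 64) (20, 5)
```

Recognition then reduces to comparing weight vectors, e.g. by nearest neighbor in the 5-dimensional eigenface space.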
KaoNet: Face Recognition and Generation App using Deep LearningVan Huy
KaoNet is a face recognition and generation app using deep learning. It uses convolutional neural networks (CNNs) for face recognition and generative adversarial networks (GANs) for face generation. The app was trained on a dataset of celebrity faces collected from online sources. Initial results for face recognition were poor due to overfitting and limited data. Expanding the dataset improved validation accuracy to 98%. The GAN was also able to generate realistic looking faces after training.
Face recognition and deep learning, by Dr. Sanparith Marukatat, NECTEC (BAINIDA)
Face recognition and deep learning, by Dr. Sanparith Marukatat, NECTEC
The Graduate School of Applied Statistics, National Institute of Development Administration (NIDA), together with Data Science Thailand, organized The First NIDA Business Analytics and Data Sciences Contest/Conference.
Junyu Tech is a Chinese company that develops face recognition software and related SDKs. The document describes Junyu Tech's face recognition principles, performance testing results, applications, and additional modules. It also provides details on Junyu Tech's patents, professional team, and participation in tech shows.
This document discusses face recognition technology. It begins by outlining the topics that will be covered, which include why FRT is used, how it is implemented, applications, opposition to it, and its future. It then describes the accuracy and passive identification capabilities of FRT. It explains the verification and identification processes and the components involved, including the enrollment module, database, and identification module. It outlines the five step process of acquiring an image, locating the face, analyzing the facial image, comparing it to stored images, and determining a match or no match. Finally, it discusses some applications and opposition to FRT and envisions its future advancement and integration.
This document discusses manifolds and kernels on manifolds. It defines manifolds and different types of manifolds such as topological manifolds, differentiable manifolds, and Riemannian manifolds. It then discusses Hilbert spaces, kernels, and reproducing kernel Hilbert spaces. It explains that defining kernels on manifolds allows applying kernel methods to nonlinear manifolds. It discusses challenges in defining positive definite kernels on manifolds using geodesic distance and provides conditions for when the Gaussian RBF kernel is positive definite on a manifold. It also covers applications to pedestrian detection and visual object categorization using kernels on the manifold of symmetric positive definite matrices.
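As a concrete instance of a kernel on a manifold, here is a hedged sketch of a Gaussian RBF kernel on symmetric positive definite (SPD) matrices using the log-Euclidean distance, one of the metrics for which the RBF kernel is known to stay positive definite. This is an illustration, not the talk's exact construction:

```python
import numpy as np

def spd_log(M):
    """Matrix logarithm of a symmetric positive definite matrix
    via its eigendecomposition."""
    w, V = np.linalg.eigh(M)
    return (V * np.log(w)) @ V.T

def log_euclidean_rbf(A, B, gamma=1.0):
    """Gaussian RBF kernel on the SPD manifold: exp(-gamma * d^2),
    where d is the Frobenius norm of the difference of matrix logs."""
    d = np.linalg.norm(spd_log(A) - spd_log(B), 'fro')
    return np.exp(-gamma * d * d)

A = np.array([[2.0, 0.3], [0.3, 1.0]])
B = np.array([[1.5, -0.2], [-0.2, 2.5]])
print(log_euclidean_rbf(A, A))   # 1.0: zero distance to itself
print(log_euclidean_rbf(A, B))   # strictly between 0 and 1
```

With such a kernel, standard kernel methods (SVMs, kernel PCA) apply directly to covariance descriptors of the kind used in pedestrian detection.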
Humans often use faces to recognize individuals, and advancements in computing capability over the past few decades now enable similar recognitions automatically. Early facial recognition algorithms used simple geometric models, but the recognition process has now matured into a science of sophisticated mathematical representations and matching processes. Major advancements and initiatives in the past 10 to 15 years have propelled facial recognition technology into the spotlight. Facial recognition can be used for both verification and identification.
Diffusion Deformable Model for 4D Temporal Medical Image GenerationBoahKim2
This document describes a diffusion deformable model for generating 4D temporal medical images. The model uses a diffusion probabilistic model combined with a registration model to generate intermediate deformed images along a continuous trajectory between a source and target image. The model was tested on cardiac MRI data and shown to outperform existing deformation models in generating dynamic deformations and intermediate frames, as measured by quantitative metrics and qualitative evaluation. The approach provides a promising new tool for analyzing changes in anatomical structures over time.
1) Computer vision techniques such as stereo vision and object detection can be used for applications in robotics.
2) Stereo vision involves calculating depth from two images using techniques like block matching and disparity maps. It can be implemented efficiently on GPUs.
3) Object detection methods include feature detection using SIFT or SURF descriptors followed by matching and geometry validation using techniques like RANSAC. These allow objects to be detected and their pose estimated.
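Point 2's block matching can be sketched as a brute-force sum-of-absolute-differences (SAD) search along image rows; the toy pair below has a known 2-pixel shift. Real systems use rectified images, larger blocks, and subpixel refinement:

```python
import numpy as np

def disparity_block_match(left, right, block=3, max_disp=8):
    """For each block in the left image, slide along the same row of the
    right image and pick the horizontal shift with the smallest SAD;
    that shift is the per-pixel disparity."""
    h, w = left.shape
    disp = np.zeros((h, w), dtype=int)
    r = block // 2
    for y in range(r, h - r):
        for x in range(r, w - r):
            patch = left[y-r:y+r+1, x-r:x+r+1]
            best, best_d = np.inf, 0
            for d in range(min(max_disp, x - r) + 1):
                cand = right[y-r:y+r+1, x-d-r:x-d+r+1]
                sad = np.abs(patch - cand).sum()
                if sad < best:
                    best, best_d = sad, d
            disp[y, x] = best_d
    return disp

# Toy pair: the right view is the left view shifted 2 px leftward,
# so the true disparity is 2 wherever it is observable.
rng = np.random.default_rng(2)
left = rng.random((12, 16))
right = np.roll(left, -2, axis=1)
d = disparity_block_match(left, right)
print(d[6, 8])   # 2
```

Depth then follows from disparity via the usual relation depth = focal_length * baseline / disparity; the independent per-pixel searches are what make the method GPU-friendly.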
This document describes a proposed multimodal biometric authentication system using face, fingerprint, palm print, and palm vein modalities. It provides background on each biometric, describes related work fusing different modalities, and outlines the proposed system. The system will extract features from each biometric, fuse the features at the level of extraction, and calculate distance metrics to perform authentication. Preliminary results show recognition rates over 98% for 50-500 people when fusing modalities.
Montage4D: Interactive Seamless Fusion of Multiview Video TexturesRuofei Du
Project Site: http://montage4d.com
The commoditization of virtual and augmented reality devices and the availability of inexpensive consumer depth cameras have catalyzed a resurgence of interest in spatiotemporal performance capture. Recent systems like Fusion4D and Holoportation address several crucial problems in the real-time fusion of multiview depth maps into volumetric and deformable representations. Nonetheless, stitching multiview video textures onto dynamic meshes remains challenging due to imprecise geometries, occlusion seams, and critical time constraints. In this paper, we present a practical solution towards real-time seamless texture montage for dynamic multiview reconstruction. We build on the ideas of dilated depth discontinuities and majority voting from Holoportation to reduce ghosting effects when blending textures. In contrast to their approach, we determine the appropriate blend of textures per vertex using view-dependent rendering techniques, so as to avert fuzziness caused by the ubiquitous normal-weighted blending. By leveraging geodesics-guided diffusion and temporal texture fields, our algorithm mitigates spatial occlusion seams while preserving temporal consistency. Experiments demonstrate significant enhancement in rendering quality, especially in detailed regions such as faces. We envision a wide range of applications for Montage4D, including immersive telepresence for business, training, and live entertainment.
This document provides a synopsis for a project on emotion detection from facial expressions. It outlines the objectives to develop an automatic emotion detection system using machine learning algorithms to analyze facial expressions in video frames and compare them to a database to classify emotions. The technical details discuss using a facial tracker and extracting features to represent expressions. Classification algorithms like KNN, SVM, and voting will be used for recognition and mapping expressions to emotions. Future work may include 3D processing, speech recognition, and detecting micro-expressions.
This paper presents the process of 3D modelling from video. Previous research related to this process is analysed, with particular attention to algorithms for detecting and matching key points. We describe their advantages and disadvantages and give a critical analysis of the algorithms. Three detectors (SUSAN, Plessey and Förstner) are tested and compared on video of a cube taken with a hand-held camera, taking into account their accuracy and repeatability. In conclusion, we build a 3D model of the cube from the video, using these detectors in the first step of the process and three algorithms (RANSAC, MSAC and MLESAC) for matching the data.
This document summarizes an international journal article that proposes a two-phase algorithm for face recognition in the frequency domain using discrete cosine transform (DCT) and discrete Fourier transform (DFT). The algorithm works in two phases: the first phase uses Euclidean distance to determine the K nearest neighbor training samples of a test sample. The second phase represents the test sample as a linear combination of the K nearest neighbors and classifies the sample based on which class representation has the smallest deviation from the test sample. Experimental results on FERET and ORL face databases show the two-phase algorithm based on DCT and DFT outperforms other methods like two-phase sparse representation and PCA/LDA in terms of classification accuracy.
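The two phases can be sketched as follows; for brevity this toy version works on raw feature vectors, omitting the DCT/DFT frequency transform the paper applies first, and all data is synthetic:

```python
import numpy as np

def two_phase_classify(train_X, train_y, x, k=5):
    """Phase 1: keep the k training samples nearest to x (Euclidean).
    Phase 2: solve for the linear combination of those neighbors that
    best reconstructs x, then assign the class whose neighbors'
    weighted contribution deviates least from x."""
    idx = np.argsort(np.linalg.norm(train_X - x, axis=1))[:k]
    N, labels = train_X[idx], train_y[idx]
    a, *_ = np.linalg.lstsq(N.T, x, rcond=None)   # x ~= N.T @ a
    best_cls, best_err = None, np.inf
    for c in np.unique(labels):
        mask = labels == c
        recon = N[mask].T @ a[mask]               # contribution of class c
        err = np.linalg.norm(x - recon)
        if err < best_err:
            best_cls, best_err = c, err
    return best_cls

rng = np.random.default_rng(3)
means = np.array([[0.0] * 16, [4.0] * 16])
train_X = np.vstack([m + rng.normal(0, 0.5, (15, 16)) for m in means])
train_y = np.repeat([0, 1], 15)
x = means[1] + rng.normal(0, 0.5, 16)
print(two_phase_classify(train_X, train_y, x))   # 1
```

The frequency-domain step would simply replace each row of `train_X` (and `x`) with its DCT or DFT coefficients before this procedure runs.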
This document summarizes 10 research papers on various techniques for facial expression recognition. The papers cover topics like using local gray code patterns and kernel canonical correlation analysis to extract facial features and recognize expressions. Other techniques discussed include using facial animation parameters and hidden Markov models, active appearance models to track facial features over video sequences, and using geometric deformation features and support vector machines to recognize expressions in image sequences. The document provides an overview of the different approaches researchers have taken and their relative performances on standard datasets.
Movement Tracking in Real-time Hand Gesture RecognitionPranav Kulkarni
To translate a gesture performed by the user in a video sequence into meaningful symbols/commands, feature extraction is the first and most crucial step in such systems: it measures the detected hand positions and their movement track. We propose an efficient approach based on inter-frame difference (IDF) to handle hand movement tracking, which is shown to be more robust in accuracy than skin-color based approaches. Computational efficiency is another attractive property: our approach greatly improves the processing frame rate to meet the demands of a real-time hand gesture recognition system.
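A minimal sketch of inter-frame differencing for movement tracking follows; the threshold and the centroid read-out are illustrative choices, not the paper's exact pipeline:

```python
import numpy as np

def motion_centroid(prev_frame, frame, thresh=0.2):
    """Inter-frame difference: threshold |frame - prev_frame| to get a
    motion mask, then return the centroid of the moving pixels as the
    tracked position (None if nothing moved)."""
    mask = np.abs(frame.astype(float) - prev_frame.astype(float)) > thresh
    ys, xs = np.nonzero(mask)
    if len(xs) == 0:
        return None
    return (ys.mean(), xs.mean())

# Toy sequence: a bright 3x3 "hand" moves from columns 1-3 to columns 5-7.
f0 = np.zeros((10, 10)); f0[1:4, 1:4] = 1.0
f1 = np.zeros((10, 10)); f1[1:4, 5:8] = 1.0
print(motion_centroid(f0, f1))   # (2.0, 4.0): midpoint of vacated and new regions
```

Because only a subtraction, a threshold, and a mean are needed per frame, this is far cheaper than per-pixel skin-color classification, which matches the frame-rate claim above.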
This document discusses the history and fundamentals of visual odometry (VO) and simultaneous localization and mapping (SLAM). It provides an overview of key developments in VO from the 1980s to present day, including the first real-time VO implementation on a robot in 1980 and use of VO on the Mars rovers in 2004. The document also summarizes the differences between VO, SFM, and V-SLAM, and describes common approaches to feature extraction, motion estimation, and optimization in VO pipelines.
Lip Reading by Using 3-D Discrete Wavelet Transform with Dmey WaveletCSCJournals
Lip movement is a useful way to communicate with machines and is extremely helpful in noisy environments. However, recognizing lip motion is a difficult task since the region of interest (ROI) is nonlinear and noisy. The proposed lip reading method uses a two-stage feature extraction mechanism that is precise, discriminative, and computationally efficient. The first stage converts video frame data into a 3-dimensional space, and the second stage trims down the raw information space using the 3-Dimensional Discrete Wavelet Transform (DWT). These features are smaller in size, giving rise to a novel lip reading system. In addition to the novel feature extraction technique, we also compare the performance of Back Propagation Neural Network (BPNN) and Support Vector Machine (SVM) classifiers. The CUAVE and Tulips databases are used for experimentation. Experimental results show that 3-D DWT feature mining is better than 2-D DWT, and 3-D DWT with the Dmey wavelet gives better results than 3-D DWT with Db4. The experiments also show that 3-D DWT-Dmey along with the BPNN classifier outperforms SVM.
This document summarizes two papers presented at the SIGGRAPH 2014 conference. The first paper proposes a parametric wave field coding technique to precompute and compress sound propagation simulations in complex 3D environments. It represents impulse responses using four perceptual parameters that can be interpolated over space. The second paper describes an interactive algorithm for simulating higher-order diffraction and diffuse reflections in large dynamic scenes using ray tracing. It reuses ray paths over time and employs edge culling and visibility graphs to improve performance.
This doctoral dissertation examines facial skin motion properties from video for modeling and applications. It presents two methods for computing strain patterns from video: a finite difference method and a finite element method. The finite element method incorporates material properties of facial tissues by modeling their Young's modulus values. Experiments show strain patterns are discriminative and stable features for facial expression recognition, age estimation, and person identification. The dissertation also develops a method for expression invariant face matching by modeling Young's modulus from multiple expressions.
The document describes a project to develop a gender voice recognition system using machine learning, aiming for higher accuracy than existing MLP models. The proposed system uses fast Fourier transform for noise cancellation and logistic regression for classification, achieving 96.74% accuracy on test data, higher than existing systems. The document outlines the aim, abstract, introduction, a literature review of existing approaches, the proposed system description, requirements, UML diagrams, advantages of automatic gender recognition, limitations, output, references, and conclusions.
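FFT-based noise cancellation of the kind described can be sketched as a crude low-pass filter: transform, zero the high-frequency bins, invert. The cutoff and test signal below are illustrative assumptions, not the project's actual parameters:

```python
import numpy as np

def fft_lowpass(signal, sample_rate, cutoff_hz):
    """Zero out FFT bins above cutoff_hz and invert the transform,
    suppressing high-frequency noise in the frequency domain."""
    spectrum = np.fft.rfft(signal)
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sample_rate)
    spectrum[freqs > cutoff_hz] = 0
    return np.fft.irfft(spectrum, n=len(signal))

# 50 Hz tone plus 400 Hz "noise"; the filter keeps only the tone.
sr = 1000
t = np.arange(sr) / sr
clean = np.sin(2 * np.pi * 50 * t)
noisy = clean + 0.5 * np.sin(2 * np.pi * 400 * t)
filtered = fft_lowpass(noisy, sr, cutoff_hz=100)
print(np.max(np.abs(filtered - clean)) < 1e-6)   # True
```

The cleaned signal (or spectral features of it) would then feed the logistic-regression classifier mentioned above.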
The document discusses continuous human action recognition in ambient assisted living scenarios. It proposes an approach that uses action zones, which are the most discriminative segments of an action, rather than whole action sequences. Action zones are learned from training data and used to recognize actions in continuous video streams using a sliding window approach. The recognition thresholds and parameters are optimized using an evolutionary algorithm to maximize recognition performance. The approach is validated on public datasets and aims to enable long-term continuous human behavior analysis in ambient assisted living environments.
Ioannis Pitas, Professor, Aristotle University of Thessaloniki, Department of Informatics (IEEE Fellow), Semantic 3DTV Content Analysis and Description
https://telecombcn-dl.github.io/2017-dlsl/
Winter School on Deep Learning for Speech and Language. UPC BarcelonaTech ETSETB TelecomBCN.
The aim of this course is to train students in methods of deep learning for speech and language. Recurrent Neural Networks (RNN) will be presented and analyzed in detail to understand the potential of these state of the art tools for time series processing. Engineering tips and scalability issues will be addressed to solve tasks such as machine translation, speech recognition, speech synthesis or question answering. Hands-on sessions will provide development skills so that attendees can become competent in contemporary data analytics tools.
Similar to 3D Dynamic Facial Sequences Analysis for face recognition and emotion detection (20)
Rainfall intensity duration frequency curve statistical analysis and modeling...bijceesjournal
Using data from 41 years (1981−2020) in Patna, India, the study's goal is to analyze the trends of how often it rains on a weekly, seasonal, and annual basis. First, utilizing the intensity-duration-frequency (IDF) curve and the relationship obtained by statistically analyzing rainfall, the quality of the historical rainfall data set for Patna, India, over the 41-year period (1981−2020) was evaluated. Changes in the hydrologic cycle as a result of increased greenhouse gas emissions are expected to induce variations in the intensity, length, and frequency of precipitation events. One strategy to lessen vulnerability is to quantify probable changes and adapt to them. Log-normal, normal, and Gumbel (EV-I) techniques are used. Distributions were created with durations of 1, 2, 3, 6, and 24 h and return periods of 2, 5, 10, 25, and 100 years. Mathematical correlations between rainfall and recurrence interval were also discovered.
Findings: Based on findings, the Gumbel approach produced the highest intensity values, whereas the other approaches produced values that were close to each other. The data indicates that 461.9 mm of rain fell during the monsoon season’s 301st week. However, it was found that the 29th week had the greatest average rainfall, 92.6 mm. With 952.6 mm on average, the monsoon season saw the highest rainfall. Calculations revealed that the yearly rainfall averaged 1171.1 mm. Using Weibull’s method, the study was subsequently expanded to examine rainfall distribution at different recurrence intervals of 2, 5, 10, and 25 years. Rainfall and recurrence interval mathematical correlations were also developed. Further regression analysis revealed that short wave irrigation, wind direction, wind speed, pressure, relative humidity, and temperature all had a substantial influence on rainfall.
Originality and value: The results of the rainfall IDF curves can provide useful information to policymakers in making appropriate decisions in managing and minimizing floods in the study area.
Embedded machine learning-based road conditions and driving behavior monitoringIJECEIAES
Car accident rates have increased in recent years, resulting in losses in human lives, properties, and other financial costs. An embedded machine learning-based system is developed to address this critical issue. The system can monitor road conditions, detect driving patterns, and identify aggressive driving behaviors. The system is based on neural networks trained on a comprehensive dataset of driving events, driving styles, and road conditions. The system effectively detects potential risks and helps mitigate the frequency and impact of accidents. The primary goal is to ensure the safety of drivers and vehicles. Collecting data involved gathering information on three key road events: normal street and normal drive, speed bumps, circular yellow speed bumps, and three aggressive driving actions: sudden start, sudden stop, and sudden entry. The gathered data is processed and analyzed using a machine learning system designed for limited power and memory devices. The developed system resulted in 91.9% accuracy, 93.6% precision, and 92% recall. The achieved inference time on an Arduino Nano 33 BLE Sense with a 32-bit CPU running at 64 MHz is 34 ms and requires 2.6 kB peak RAM and 139.9 kB program flash memory, making it suitable for resource-constrained embedded systems.
Advanced control scheme of doubly fed induction generator for wind turbine us...IJECEIAES
This paper describes a speed control device for generating electrical energy on an electricity network based on the doubly fed induction generator (DFIG) used for wind power conversion systems. At first, a double-fed induction generator model was constructed. A control law is formulated to govern the flow of energy between the stator of a DFIG and the energy network using three types of controllers: proportional integral (PI), sliding mode controller (SMC) and second order sliding mode controller (SOSMC). Their different results in terms of power reference tracking, reaction to unexpected speed fluctuations, sensitivity to perturbations, and resilience against machine parameter alterations are compared. MATLAB/Simulink was used to conduct the simulations for the preceding study. Multiple simulations have shown very satisfying results, and the investigations demonstrate the efficacy and power-enhancing capabilities of the suggested control system.
Applications of artificial Intelligence in Mechanical Engineering.pdfAtif Razi
Historically, mechanical engineering has relied heavily on human expertise and empirical methods to solve complex problems. With the introduction of computer-aided design (CAD) and finite element analysis (FEA), the field took its first steps towards digitization. These tools allowed engineers to simulate and analyze mechanical systems with greater accuracy and efficiency. However, the sheer volume of data generated by modern engineering systems and the increasing complexity of these systems have necessitated more advanced analytical tools, paving the way for AI.
AI offers the capability to process vast amounts of data, identify patterns, and make predictions with a level of speed and accuracy unattainable by traditional methods. This has profound implications for mechanical engineering, enabling more efficient design processes, predictive maintenance strategies, and optimized manufacturing operations. AI-driven tools can learn from historical data, adapt to new information, and continuously improve their performance, making them invaluable in tackling the multifaceted challenges of modern mechanical engineering.
Optimizing Gradle Builds - Gradle DPE Tour Berlin 2024Sinan KOZAK
Sinan from the Delivery Hero mobile infrastructure engineering team shares a deep dive into performance acceleration with Gradle build cache optimizations. Sinan shares their journey into solving complex build-cache problems that affect Gradle builds. By understanding the challenges and solutions found in our journey, we aim to demonstrate the possibilities for faster builds. The case study reveals how overlapping outputs and cache misconfigurations led to significant increases in build times, especially as the project scaled up with numerous modules using Paparazzi tests. The journey from diagnosing to defeating cache issues offers invaluable lessons on maintaining cache integrity without sacrificing functionality.
artificial intelligence and data science contents.pptxGauravCar
What is artificial intelligence? Artificial intelligence is the ability of a computer or computer-controlled robot to perform tasks that are commonly associated with the intellectual processes characteristic of humans, such as the ability to reason.
An improved modulation technique suitable for a three level flying capacitor ...IJECEIAES
This research paper introduces an innovative modulation technique for controlling a 3-level flying capacitor multilevel inverter (FCMLI), aiming to streamline the modulation process in contrast to conventional methods. The proposed simplified modulation technique paves the way for more straightforward and efficient control of multilevel inverters, enabling their widespread adoption and integration into modern power electronic systems. Through the amalgamation of sinusoidal pulse width modulation (SPWM) with a high-frequency square wave pulse, this controlling technique attains energy equilibrium across the coupling capacitor. The modulation scheme incorporates a simplified switching pattern and a decreased count of voltage references, thereby simplifying the control algorithm.
Introduction- e - waste – definition - sources of e-waste– hazardous substances in e-waste - effects of e-waste on environment and human health- need for e-waste management– e-waste handling rules - waste minimization techniques for managing e-waste – recycling of e-waste - disposal treatment methods of e- waste – mechanism of extraction of precious metal from leaching solution-global Scenario of E-waste – E-waste in India- case studies.
IEEE Aerospace and Electronic Systems Society as a Graduate Student Member
3D Dynamic Facial Sequences Analysis for Face Recognition and Emotion Detection
1. 3D DYNAMIC FACIAL SEQUENCES ANALYSIS
FOR FACE RECOGNITION AND EMOTION
DETECTION
PhD Candidate: Taleb ALASHKAR
Supervisor: Prof. Mohamed DAOUDI
Co-Supervisor: Dr. Boulbaba BEN AMOR
1
Taleb ALASHKAR PhD Defense 2-Nov-2015
2. WHY FACE ANALYSIS?
Identity Recognition (matching a probe face against a database of known IDs)
Facial Expressions (Happy, Surprised, Angry)
Age Estimation
Physical State Monitoring (Pain, Fatigue)
WHY 3D FACE?
3D is robust to illumination and pose variations, where 2D is not.
WHY 3D DYNAMIC?
3D static vs. 3D dynamic
2
3. MOTIVATION AND CHALLENGES
Motivation to 4D (3D+t) Face Analysis
Robustness to illumination changes and pose variations
Availability of cost-effective (Kinect-like) and high
resolution (Di4D) 3D dynamic sensors
Richness in shape and deformation
Challenges of 4D Face Analysis
Noisy data (from acquisition and sensor accuracy)
Missing data (single-view scanners)
Volume of data (sequence of 3D meshes)
Low-resolution frames (Kinect-like sensors)
Compact spatio-temporal representation robust to
noise and missing data which allows 4D face analysis
3
5. OUTLINE
4D Face Recognition
State of the art
4D face recognition framework
Experiments and results
4D Spontaneous Emotion Detection
State of the art
Trajectories on Grassmann manifold
Spontaneous emotional state detection from depth video
Spontaneous pain detection from 4D high resolution video
Conclusion and Future Work
5
7. FACE RECOGNITION FROM 4D DATA
State of the Art
Frame Set / Super Resolution / Spatio-Temporal
Low resolution (Kinect)
Illumination/FE
Temporal information
Complex enrollment
(Lie et al., 2013)
Low resolution (Kinect)
Constant expression
Temporal information
3D frames alignment
(Berretti et al., 2014) (Sun et al., 2010)
One Kinect frame vs. 3D HR scanner
7
Outperforms 2D video/3D static
Space-time representation
Time consuming
Tracking / model adaptation / conformal mapping / ST-HMM
8. 4D FACE RECOGNITION
4D Face Recognition Approach (pipeline)
Data processing: mean curvature computation over time, curvature-map extraction
Modeling: identity subspace modeling with k-SVD, giving Subspace = Span{u1, u2, ..., uk}
Training: Grassmann dictionary learning
Test: classification by Grassmann SRC sparse coding
8
9. 4D FACE RECOGNITION
Mean curvature: H = (k1 + k2) / 2, where k1 and k2 are the two principal curvatures at each vertex.
Spatial Feature Extraction
Captures the local facial shape
Invariant to scale, rotation, and mesh resolution
Able to capture non-rigid facial deformations
9
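The mean-curvature feature above can be sketched in a few lines of NumPy (a toy illustration; it assumes the two per-vertex principal curvatures have already been estimated from the mesh, e.g. by a mesh-processing library):

```python
import numpy as np

def mean_curvature(k1, k2):
    """Mean curvature H = (k1 + k2) / 2 from the two principal curvatures."""
    return 0.5 * (np.asarray(k1, dtype=float) + np.asarray(k2, dtype=float))

# Toy example: principal curvatures at three vertices
k1 = np.array([0.2, -0.1, 0.5])
k2 = np.array([0.4, 0.3, 0.1])
H = mean_curvature(k1, k2)  # -> array([0.3, 0.1, 0.3])
```

In the pipeline, one such H map is computed per 3D frame over all vertices and used as the curvature map of that frame.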
10. 4D FACE RECOGNITION
Spatio-Temporal Subspace Representation
3D dynamic original data -> curvature maps (frames 1, 2, ..., m) -> reshape into an n×m matrix -> k-SVD -> k-dimensional subspace (k < m), a point on a matrix manifold
Why subspace representation?
Compact low-dimensional representation
Robust against noise and missing data
Availability of geometric statistical tools
10
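The subspace step can be sketched as follows. The thesis uses k-SVD dictionary learning; here a plain truncated SVD serves as an illustrative stand-in for extracting a k-dimensional subspace from a subsequence of curvature maps (the function name is hypothetical):

```python
import numpy as np

def subsequence_to_subspace(frames, k):
    """Stack m vectorized curvature maps as the columns of an n x m matrix
    and keep an orthonormal basis of its dominant k-dimensional column
    space (k < m), i.e. one point on the Grassmann manifold G(k, n)."""
    X = np.column_stack([np.ravel(f) for f in frames])   # n x m data matrix
    U, _, _ = np.linalg.svd(X, full_matrices=False)
    return U[:, :k]                                      # n x k, orthonormal columns

frames = [np.random.rand(8, 8) for _ in range(6)]        # 6 toy 8x8 curvature maps
U = subsequence_to_subspace(frames, k=3)                 # basis of a 3-D subspace
```

Each subsequence of the 3D video is mapped this way to a single subspace, which is what makes the representation compact and tolerant to per-frame noise.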
11. 4D FACE RECOGNITION
Matrix Manifolds [1]:
Stiefel manifold: the set of all orthonormal k-frames in n-dimensional space, with a distance defined on it.
Grassmann manifold: the set of all possible k-dimensional subspaces of an n-dimensional space; a quotient space of the Stiefel manifold under the equivalence
X = Y if Span(X) = Span(Y), i.e., there exists a k×k rotation matrix R in SO(k) such that X = Y·R.
[1] P.-A. Absil et al., "Optimization Algorithms on Matrix Manifolds", 2008.
11
11
12. 4D FACE RECOGNITION
Grassmann Manifold Geometry
Non-linear manifold
A tangent space can be defined at any point on the manifold.
Algorithmic tools exist to compute the Log and Exp map functions.
Distances on Grassmann [1]:
Canonical correlation / principal angles
Geodesic distance
[1] Hamm et al., ICML, 2008.
12
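The principal-angle and geodesic-distance computations above can be sketched directly from their definitions (a minimal NumPy illustration; function names are ours):

```python
import numpy as np

def principal_angles(X, Y):
    """Principal angles between Span(X) and Span(Y), where X and Y have
    orthonormal columns: arccos of the singular values of X^T Y."""
    s = np.linalg.svd(X.T @ Y, compute_uv=False)
    return np.arccos(np.clip(s, -1.0, 1.0))

def geodesic_distance(X, Y):
    """Arc-length (geodesic) distance on the Grassmannian:
    the 2-norm of the principal-angle vector."""
    return float(np.linalg.norm(principal_angles(X, Y)))

# A subspace is at distance ~0 from itself
X = np.linalg.qr(np.random.rand(10, 3))[0]
d = geodesic_distance(X, X)   # ~0
```

Two fully orthogonal k-dimensional subspaces have all k principal angles equal to π/2, so their geodesic distance is (π/2)·√k.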
13. 4D FACE RECOGNITION
Statistical Analysis on Grassmann manifold:
(Diagram: points p1, p2, p3 on the manifold are mapped by the Log map to tangent vectors v1, v2, v3 in the tangent space Tµ at the mean µ, and back by the Exp map.)
Intrinsic methods
Grassmann Nearest Neighbor (GNN) Classification
Training:
1. Compute the Karcher mean [1] for every subject in the training data (gallery).
Testing:
2. Compare the probe with the mean of each class using a distance defined on the Grassmannian.
3. The closest mean to the probe gives the target subject.
[1] H. Karcher, PAAM, 1977.
13
14. 4D FACE RECOGNITION
Statistical Analysis on Grassmann manifold:
Extrinsic methods [1]
Embedding the Grassmann manifold into a linear space
Less computational time than intrinsic methods
Projection mapping: Π(X) = X Xᵀ
[1] Harandi et al., CVPR, 2013.
14
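The extrinsic embedding can be sketched as follows: each subspace is mapped to its projection matrix, after which ordinary Euclidean (Frobenius) distances apply. This is a minimal illustration with our own function names:

```python
import numpy as np

def projection_embedding(X):
    """Embed the subspace Span(X) (X with orthonormal columns) into the
    space of symmetric matrices via the projection matrix X X^T."""
    return X @ X.T

def projection_distance(X, Y):
    """Extrinsic (chordal) distance: Frobenius norm of the difference of
    the two projection matrices; independent of the chosen bases."""
    return float(np.linalg.norm(projection_embedding(X) - projection_embedding(Y), 'fro'))

# Basis independence: rotating the basis does not move the embedded point
X = np.linalg.qr(np.random.rand(10, 3))[0]
R = np.linalg.qr(np.random.rand(3, 3))[0]   # 3x3 orthogonal matrix
d = projection_distance(X, X @ R)           # ~0, since Span(X R) = Span(X)
```

Because X Xᵀ depends only on the span of X, the embedding is well defined on the quotient, which is exactly why it is cheaper than intrinsic Log/Exp computations.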
15. 4D FACE RECOGNITION
Sparse Coding and Dictionary Learning
Suitable for data with sparse structure
Learning over-complete rich dictionary
Robustness against noise and missing data
Efficient Sparse Representation Classifier (SRC) [1]
[1] Wright et al., PAMI, 2009.
15
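The SRC principle can be sketched in plain Euclidean form (the thesis applies it to subspace data on the Grassmannian; the greedy `omp` solver below is a common stand-in for the l1 minimization of Wright et al., and all names are illustrative):

```python
import numpy as np

def omp(D, y, sparsity):
    """Greedy orthogonal matching pursuit: sparse code x with D @ x ~ y."""
    residual, support = y.astype(float).copy(), []
    x = np.zeros(D.shape[1])
    for _ in range(sparsity):
        idx = int(np.argmax(np.abs(D.T @ residual)))
        if idx in support:                      # nothing new to add
            break
        support.append(idx)
        coeffs, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        residual = y - D[:, support] @ coeffs
    x[support] = coeffs
    return x

def src_classify(D, labels, y, sparsity=3):
    """SRC rule: code y over the whole dictionary, then pick the class
    whose atoms alone give the smallest reconstruction residual."""
    labels = np.asarray(labels)
    x = omp(D, y, sparsity)
    residuals = {c: np.linalg.norm(y - D[:, labels == c] @ x[labels == c])
                 for c in np.unique(labels)}
    return min(residuals, key=residuals.get)

rng = np.random.default_rng(0)
D = rng.normal(size=(20, 6))
D /= np.linalg.norm(D, axis=0)           # 6 unit-norm atoms, 3 per class
labels = [0, 0, 0, 1, 1, 1]
pred = src_classify(D, labels, D[:, 1])  # probe equals a class-0 atom
```

A probe identical to a dictionary atom is reconstructed almost perfectly by its own class, so the residual for that class is near zero and the classifier returns it.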
16. 4D FACE RECOGNITION
Experimental Results
Database
BU-4DFE database [1]
101 subjects / 6 expression sequences per subject
About 100 frames per sequence
Experimental Protocol (Sun et al., 2010)
60 subjects / sub-sequences of size w=6 / shifting step 3
Expression Dependent (ED): ½ of each expression for training, ½ for testing
Expression Independent (EI): 1 expression for training, 5 for testing
[1] Yin et al., FG, 2008.
16
17. I. Grassmann Nearest Neighbor (GNN) classifier (w=6)
ED performance is better than EI performance.
GNN is based on the mean of each class (a statistical summary).
4D FACE RECOGNITION
Experimental Results
II. Grassmann Sparse Representation (GSR) classifier (w=6, EI)
Considering the face dynamics improves the recognition performance
by 3.1% with the expression-independent dictionary representation.
17
18. 4D FACE RECOGNITION
Experimental Results
III. Grassmann Sparse Representation (GSR) classifier
GSR > GGDA (a variant of Grassmann Discriminant Analysis, proposed in [1])
GSR < ST-HMM (Sun et al., about 10% lower), but
- computationally much less expensive
- landmark-free
Robust to the temporal evolution (neutral-to-apex or apex-to-neutral)
Expression Dependent / Expression Independent
[1] Harandi et al., CVPR, 2011.
[2] ST-HMM: the 4D FR approach proposed in Sun et al., IEEE T-Cybernetics, 2010.
18
19. 4D FACE RECOGNITION
Expression Independent
Training with 1 vs. training with 5 expressions
- Richness of the learned dictionary
- The sparse representation (code) of a new observation can be
recovered efficiently from the available atoms (except for surprise)
Experimental Results
IV. Grassmann Sparse Representation (GSR) classifier
Gain: 9.2%
19
20. OUTLINE
4D Face Recognition
State of the art
4D face recognition framework
Experiments and results
4D Spontaneous Emotion Detection
State of the art
Trajectories on Grassmann manifold
Spontaneous emotional state detection from depth video
Spontaneous pain detection from 4D high resolution video
Conclusion and Future Work
20
21. 4D SPONTANEOUS EMOTION DETECTION
Objectives
Proposing an early detection framework for
spontaneous emotions from 3D dynamic
sequences in a continuous emotion space.
Challenges:
Detecting a spontaneous emotion of interest
Firing the detection as early as possible
3D (depth/high resolution) video
Arousal-valence chart
21
22. 4D SPONTANEOUS EMOTION DETECTION
State of the Art: 3D Facial Deformation vs. 3D Feature Tracking
Non-rigid deformation (Ben Amor et al., 2014): global deformation, subtle changes, nose tip, acted FE
Parameterization of facial deformation (Sandbach et al., 2011): global deformation, automatic, time consuming, acted FE
Local spatial feature tracking (Berretti et al., 2012): robust to noise, real time, landmark tracking, acted FE
Landmark tracking (Xue et al., 2015): robust to noise, fast performance, landmark tracking, acted FE
22
23. 4D SPONTANEOUS EMOTION DETECTION
Trajectory analysis on a matrix manifold
Divide the 3D video into subsequences
Represent each subsequence by a subspace
Obtain time-parameterized trajectories on the matrix manifold
Compute the temporal evolution along the trajectory
Apply the SO-SVM early event classifier
23
24. 4D SPONTANEOUS EMOTION DETECTION
Spontaneous emotion detection from depth video
- Face only vs. face + upper part of the body
- Depth vs. 2D video data
24
25. 4D SPONTANEOUS EMOTION DETECTION
Spontaneous emotion detection from depth video
The depth video is represented as a trajectory on the Grassmannian.
Geodesic distances between successive subspaces of the trajectory are computed.
The Geometric Motion History (GMH) gives the temporal evolution of the depth video.
SO-SVM [1] early event detection is applied on the GMH signals.
[1] Hoai and De la Torre, IJCV, 2014.
25
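The GMH construction above can be sketched directly: given the trajectory as a list of orthonormal subspace bases, the signal is simply the sequence of geodesic distances between consecutive points (a minimal illustration; names are ours):

```python
import numpy as np

def geodesic_distance(X, Y):
    """Grassmann geodesic distance: 2-norm of the principal angles."""
    s = np.linalg.svd(X.T @ Y, compute_uv=False)
    return float(np.linalg.norm(np.arccos(np.clip(s, -1.0, 1.0))))

def geometric_motion_history(trajectory):
    """GMH signal: geodesic distances between successive subspaces of the
    trajectory, summarizing the temporal evolution of the depth video."""
    return np.array([geodesic_distance(A, B)
                     for A, B in zip(trajectory, trajectory[1:])])

# A trajectory of 5 subspaces yields a GMH signal of length 4
traj = [np.linalg.qr(np.random.rand(20, 4))[0] for _ in range(5)]
gmh = geometric_motion_history(traj)    # shape (4,), non-negative values
```

A static face gives a flat near-zero GMH, while an onset of expression shows up as a rise in the signal, which is what the SO-SVM detector is trained on.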
26. 4D SPONTANEOUS EMOTION DETECTION
Experimental Analysis
Database:
The experiments are conducted on the Cam3D Kinect database [1], which contains depth videos of spontaneous emotions.
Protocol:
Two emotions are detected (Happiness vs. others and Thinking/Unsure vs. others).
The targeted videos are divided into two halves, for training and testing.
Each emotion of interest is concatenated with two randomly chosen other emotions to obtain 100 samples for training and testing.
[1] Mahmoud et al., ACII, 2011.
26
27. 4D SPONTANEOUS EMOTION DETECTION
Evaluation Criteria
True Positive (TP) Rate: the fraction of time series for which the detector fires during the event of interest.
False Positive (FP) Rate: the fraction of time series for which the detector fires before the event of interest starts.
I. ROC curve: TPR as a function of FPR, obtained by varying the detection threshold; summarized by the Area Under the ROC Curve (AUC).
II. AMOC curve: evaluates the timeliness of detection.
27
28. 4D SPONTANEOUS EMOTION DETECTION
Experimental Analysis
I. Grassmann vs. Stiefel manifold
Happiness detection
experiment
Thinking/unsure detection
experiment
28
29. 4D SPONTANEOUS EMOTION DETECTION
Experimental Analysis
II. Upper part of the body vs. face alone
29
30. 4D SPONTANEOUS EMOTION DETECTION
Physical pain detection from high resolution 4D-faces
Spontaneous pain detection from facial expressions.
3D dynamic high-resolution data is available.
Early detection of pain using the SO-SVM framework.
30
31. 4D SPONTANEOUS EMOTION DETECTION
Physical pain detection from high resolution 4D-faces
Depth-based Grassmann trajectories
Trajectory representation of the 3D video.
Velocity vectors are computed between successive subspaces.
A Local Deformation Histogram (LDH) is computed.
The LDHs are concatenated.
The beginning and the end of the pain are defined.
SO-SVM early detection is applied.
(Plot: 1st component of the velocity over time.)
31
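The velocity vectors between successive subspaces correspond to tangent vectors given by the Log map on the Grassmannian. A minimal sketch using the standard construction from Absil et al. (2008) follows (the function name is ours, and this is an illustration rather than the thesis implementation):

```python
import numpy as np

def grassmann_log(X, Y):
    """Velocity (tangent vector) at Span(X) pointing towards Span(Y):
    the Log map on the Grassmann manifold (cf. Absil et al., 2008)."""
    M = (Y - X @ (X.T @ Y)) @ np.linalg.inv(X.T @ Y)
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return U @ np.diag(np.arctan(s)) @ Vt

rng = np.random.default_rng(0)
X = np.linalg.qr(rng.random((10, 3)))[0]
Y = np.linalg.qr(rng.random((10, 3)))[0]
V = grassmann_log(X, Y)   # velocity between two successive subspaces
# Its Frobenius norm equals the geodesic distance from Span(X) to Span(Y).
```

Histogramming the entries of these velocity vectors over the face, rather than keeping only their norm, is what turns the global GMH signal into the local LDH descriptor.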
32. 4D SPONTANEOUS EMOTION DETECTION
Physical pain detection from high resolution 4D-faces
3D landmark-based Grassmann trajectories (baseline)
The 3D physical pain video is divided into subsequences.
The coordinates (x, y, z) of 83 facial landmarks are used as the facial descriptor.
Every subsequence is represented as a subspace.
The geodesic distance between successive subspaces is computed to build the GMH.
The beginning and the end of the pain are defined.
SO-SVM early detection is applied.
32
33. 4D SPONTANEOUS EMOTION DETECTION
Experimental Analysis
Database:
BP4D-Spontaneous database [1]
41 subjects / 8 tasks
AU annotations
Protocol:
28 physical pain videos are used: 14 for training and 14 for testing.
2-fold cross validation is applied.
The beginning and the end of pain are determined according to an action-unit activation formula [1, 2].
[1] Zhang et al., IVCJ, 2014.
[2] Prkachin et al., Pain, 2008.
AU4: Brow Lowering
AU6: Cheek raising
AU7: Tightening of eyelids
AU9: Wrinkling of nose
AU10: Raising of upper lip
33
34. 4D SPONTANEOUS EMOTION DETECTION
Experimental Analysis
I. Effect of the smoothing and pose normalization
(ROC curves, geodesic distance / norm of the velocity: AUC = 0.75, 0.70, 0.63 and AUC = 0.78, 0.74, 0.70.)
- Increasing the derivation step improves the results.
- Normalizing the head pose improves the results.
34
35. 4D SPONTANEOUS EMOTION DETECTION
Experimental Analysis
II. Landmarks vs. Depth method
AUC=0.80
AUC=0.78
35
37. OUTLINE
4D Face Recognition
State of the art
4D face recognition framework
Experiments and results
4D Spontaneous Emotion Detection
State of the art
Trajectories on Grassmann manifold
Spontaneous emotional state detection from depth video
Spontaneous pain detection from 4D high resolution video
Conclusion and Future Work
37
38. CONCLUSION AND FUTURE WORK
Conclusions
Common geometric framework with two different
representations
4D Face Recognition
Efficient subspace representation for 4D data
Exploiting the shape and its dynamics improves the results
Enriching the dictionary improves the results
4D Emotion Detection
Modeling 3D videos as time-parameterized curves (trajectories) on the Grassmann manifold
The upper part of the body outperforms the face alone
The local approach (LDH) outperforms the global approach (distances)
Coupling geometric features (velocities) with advanced ML techniques (early event detection)
38
39. CONCLUSION AND FUTURE WORK
Limitations
Lack of frame-to-frame vertex-level dense correspondence
Not considering the texture channel (available)
Limited number of subjects in the DB/lack of spontaneous DB
Perspectives and Future Work
Dense non-rigid registration/tracking
Investigating high order derivatives along the trajectories
39
40. PUBLICATION LIST
Submitted Journal
1. T. Alashkar, B. Ben Amor, M. Daoudi and S. Berretti, "Analyzing Trajectories on Grassmann Manifolds for Spontaneous Emotion Detection", submitted to IEEE Transactions on Affective Computing, Sep. 2015.
2. T. Alashkar, B. Ben Amor, M. Daoudi and S. Berretti, "Modeling Shape Dynamics on Grassmann Manifolds for 4D Face Recognition", in preparation.
International Conferences and Workshops
1. T. Alashkar, B. Ben Amor, S. Berretti and M. Daoudi, "Analyzing Trajectories on Grassmann Manifold for Early Emotion Detection from Depth Videos", in FG 2015.
2. T. Alashkar, B. Ben Amor, M. Daoudi and S. Berretti, "A 3D Dynamic Database for Unconstrained Face Recognition", in 3D Body Scanning Technologies International Conference, 2014.
3. T. Alashkar, B. Ben Amor, M. Daoudi and S. Berretti, "A Grassmannian Framework for Face Recognition of 3D Dynamic Sequences with Challenging Conditions", in NORDIA Workshop at ECCV 2014 (Springer).
National Conference
1. T. Alashkar, B. Ben Amor, S. Berretti and M. Daoudi, "Analyse des trajectoires sur une Grassmannienne pour la détection d'émotions dans des vidéos de profondeur" [Trajectory analysis on a Grassmannian for emotion detection in depth videos], in ORASIS 2015.
40
41. Thank You and You are Welcome
41
Shawarma, Tabbouleh, Kebba, Yabrak
Hommos, Mtabal, Namoura, Syrian sweets
Salle: F024 (Room F024)