We present a novel two-phase Support Vector Machine (SVM) based specific recognition model that is learned using an optimized generic recognition model.
Human Factors of XR: Using Human Factors to Design XR Systems
Generic to Specific Recognition Models for Membership Analysis in Group Videos
1. Generic to Specific Recognition Models for
Membership Analysis in Group
Videos
Wenxuan Mou1, Christos Tzelepis 1,3, Vasileios Mezaris 3, Hatice Gunes2,
Ioannis Patras1
Queen Mary University of London1, University of Cambridge2, Information Technologies
Institute/Centre for Research and Technology Hellas (CERTH) 3
3. 3
Introduction
Definitions of group membership recognition
Recognize which group each individual is part of
Group 1 Group 2
Group 3
Subject 1 Subject 2 Subject 3 Subject 4 Subject 5 Subject 6 Subject 7 Subject 8
Subject 9 Subject 10 Subject 11 Subject 12
4. 4
Introduction
Motivation
People in a group share similar behaviors and
emotions [1]
Body features work better than the face features
in predicting group membership [2]
[1] S. G. Barsade, “The ripple effect: Emotional contagion and its influence on group behavior,” Administrative Science Quarterly, 2002.
[2] W. Mou, H. Gunes, and I. Patras, “Automatic recognition of emotions and membership in group videos,” in CVPRW, 2016.
5. 5
The Proposed Framework
X Y
f
X:Body behaviors
Y:Group membership
(Group 1 ? 2 ? 3 ?)
f:Linear-SVM
12 subjects from 3 groups watching 4 videos
Leave one subject out (blue) cross-validation
Group membership recognition is modelled as
a classification problem
6. 6
The Proposed Framework
Independent recognition modelGeneric recognition model Independent recognition modelGeneric recognition model
(W1)
(W2)
(W3)
(W4)
Generic model and independent model
8. 8
The Proposed Framework
λ : the regularization parameter
w0, b0 : the optimization parameters
𝐿 : the loss function and is given by the hinge-loss
𝑃𝑔𝑒𝑛𝑒𝑟𝑖𝑐: min
𝑤0, 𝑏0
λ
2
||𝑤0||2 +
1
𝑙
𝑖=1
𝑙
𝐿 𝑤0 , 𝑏0; 𝑥𝑖, 𝑧𝑖
Generic model
9. 9
The Proposed Framework
𝑃𝑠𝑝𝑒𝑐𝑖𝑓𝑖𝑐: min
w,𝑏
μ
2
||w||2
+
ν
2
||𝑤 − 𝑤0||2
+
1
|𝑋𝑡|
𝐿(𝑤, 𝑏; (𝑥𝑖, 𝑧𝑖))
𝑋𝑡 :a subset of the original training set
μ and ν :regularization parameters
ν
2
||𝑤 − 𝑤0||2
: is used to bias 𝑤 to be close to 𝑤0
Specific model
11. 11
The Proposed Framework
[1]
[1] H. Wang, A. Kl¨ aser, C. Schmid, and C.-L. Liu, “Dense trajectories and motion boundary descriptors for action recognition,” IJCV, 2013.
Feature extraction
12. 12
Experiments and Results
• Data: 3 groups of data, 12 participants in total
• Feature: Body HOF feature
• Classifier: linear-SVM
Different Models Average Accuracy (p-value)
Specific recognition model 52% (p<0.01)
Generic recognition model 35% (p=0.41)
Independent recognition model 33% (p=0.42)
Chance level 33%
Experiment setup
Experimental results
13. 13
Conclusions and Future Work
The proposed two-phase learning model outperforms
the other models
Group membership recognition results also indicates
that individuals influence each others behaviours
within a group and their nonverbal behaviours share
commonalities
Conclusions
14. 14
Conclusions and Future Work
Future work
Testing with a larger database
Applying the two-phase learning model to other
applications, such as affect recognition
15. 15
Many thanks for your
attention !
Acknowledgement: The work of Wenxuan Mou is supported by CSC/Queen Mary joint PhD scholarship. The
work of Hatice Gunes and Wenxuan Mou is partially funded by the EPSRC under its IDEAS Factory Sandpits call
on Digital Personhood (grant ref: EP/L00416X/1). The work of Christos Tzelepis and Vasileios Mezaris is
supported by EU’s Horizon 2020 programme under grant agreement H2020-693092 MOVING.
Editor's Notes
The presentation is outlined as four parts, I will start with the introduction of group membership analysis, then I will go through the proposed framework. After that, I will talk about the experiments setup and analysis based on the experimental results, finally we will have a brief conclusion and discuss about the future work.
The first step of the proposed framework is to learn the generic recognition model using the standard linear SVM.
In this model, we use all of the available training samples, which are from all subjects across all videos. We denote
this training set as X=(xi, zi) here, xi denotes the ith training sample and zi refers to the corresponding ground truth.
We use the Pegasos [24] SGD algorithm for solving the above optimization problem and we arrive at the optimal
solution (w0; b0), which describes the optimal hyperplaneH0 : w0>x + b0= 0. Then, we use the optimal w0 to construct the specific recognition model