Joseph P. Robinson, Ryan Birke, Timothy Gillis, Justin Xia, Yue Wu, Yun Fu
We present a large-scale dataset for visual kin-based problems, i.e., kinship verification and family recognition: the Families in the Wild (FIW) dataset. Motivated by the lack of a single, unified image dataset for kinship tasks, our goal is to provide the research community with a dataset large enough in scope to inherently support multiple vision tasks and to attract broad involvement. With a small team and an efficient labelling tool designed to streamline the marking of complex hierarchical relationships, attributes, and local label information in family photos, we collected and labelled the largest set of family images to date. We experimentally compare our dataset to existing kinship image datasets and demonstrate the practical value of FIW. We also show that a pre-trained CNN model used as an off-the-shelf feature extractor outperforms traditionally used features, and that performance improves further when the CNN is fine-tuned on FIW data. Finally, we measure human performance and show that it does not match that of machine vision algorithms.
Abstract
We run 5-fold cross-validation for kinship verification: 90% of each family's images are used to fine-tune the model and the rest for validation. We remove the last fully-connected layer, which was originally trained to identify 2,622 people, and use the triplet loss [11] as the loss function. The network was frozen except for the last fully-connected layer, whose output served as the features for kinship verification. Fine-tuning was done through the well-known Caffe [10] framework. The learning rate was initially set to 10^-5 and decreased by a factor of 10 every 700 iterations; the model was fine-tuned for about 1,400 iterations with a batch size of 128 images. All other settings were the same as the original VGG-Face. Training was conducted on a single GTX Titan X using about 10 GB of GPU memory.
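The triplet objective above can be sketched in a few lines. This is an illustrative NumPy version of the loss of Schroff et al. [11] (the actual fine-tuning was done in Caffe), using hypothetical 2-D embeddings in place of real face features:

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Triplet loss [11]: the anchor should be closer (in squared Euclidean
    distance) to a face of the same identity (positive) than to a face of a
    different identity (negative), by at least `margin`."""
    d_pos = np.sum((anchor - positive) ** 2, axis=-1)  # anchor-positive distance
    d_neg = np.sum((anchor - negative) ** 2, axis=-1)  # anchor-negative distance
    return np.maximum(0.0, d_pos - d_neg + margin)

# Hypothetical L2-normalized embeddings: positive near the anchor, negative far.
a = np.array([1.0, 0.0])
p = np.array([0.9, np.sqrt(1 - 0.81)])
n = np.array([0.0, 1.0])
print(triplet_loss(a, p, n))  # satisfied triplet -> loss 0.0
```

Swapping the positive and negative roles makes the triplet violated, so the loss becomes positive; gradients from such violated triplets are what drive the fine-tuning.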
Deep	CNN:	Fine	Tuning
• Built the largest kinship database to date, along with the labels, baseline results, and evaluation protocols needed to foster and track future progress.
• Found that pre-trained CNNs yield the best features for our unconstrained dataset.
• Showed that algorithms outperform humans on the verification task.
• Obtained top scores for both kinship verification and family recognition by fine-tuning a CNN on FIW data.
• Develop a project page to go live upon publication in a peer-reviewed paper.
• Generate additional baseline results for tasks new to visual kinship (e.g., fine-grained classification and search & retrieval).
• Use the data to explore natural inheritance from a visual perspective.
Conclusions/Future	Work
1. Ahonen, T., Hadid, A., Pietikainen, M.: Face description with local binary patterns: Application to face recognition. PAMI (2006).
2. Lowe, D.G.: Distinctive image features from scale-invariant keypoints. IJCV (2004).
3. Parkhi, O.M., Vedaldi, A., Zisserman, A.: Deep face recognition. In: BMVC (2015).
4. Girshick, R.B., Donahue, J., Darrell, T., Malik, J.: Rich feature hierarchies for accurate object detection and semantic segmentation. CoRR (2013).
5. Xia, S., Shao, M., Luo, J., Fu, Y.: Understanding kin relationships in a photo. IEEE Transactions on Multimedia 14(4) (2012).
6. Fang, R., Tang, K.D., Snavely, N., Chen, T.: Towards computational models of kinship verification. ICIP (2010).
7. Qin, X., Chen, S.: Tri-subject kinship verification: Understanding the core of a family.
8. Xia, S., Shao, M., Fu, Y.: Kinship verification through transfer learning. IJCAI (2011).
9. Guo, Y., Dibeklioglu, H., van der Maaten, L.: Graph-based kin recognition. ICPR (2014).
10. Jia, Y., Donahue, J., Karayev, S., Long, J., Girshick, R., Guadarrama, S., Darrell, T.: Caffe: Convolutional architecture for fast feature embedding. arXiv:1408.5093 (2014).
11. Schroff, F., Kalenichenko, D., Philbin, J.: FaceNet: A unified embedding for face recognition and clustering. CVPR (2015).
References
Table 3 Features and parameters used throughout this work.
Kinship	Verification	on	FIW.
• Extracted the features listed in Table 3 and used a cosine distance metric.
• Benchmarked results for 2 tasks, kinship verification and family recognition.
  o Included various feature types, both raw and transformed with different metric learning and dimensionality reduction schemes.
  o Repeated experiments on several pre-existing kinship datasets to further examine differences (i.e., FIW vs. its predecessors).
• Kin relation types and sample sizes are listed in Table 1 and displayed in Fig 3.
  o Positive and negative samples are split 50-50 (e.g., F-S was made up of 35k pairs: 17.5k positive and 17.5k negative samples).
• Cross-validated over 5 folds [see Table 4].
  o F-S, as an example, was split into 3.5k positive & 3.5k negative pairs per fold.
  o Kinship was determined from cosine similarity scores on the 4 test folds for each pair: a threshold decided the kin relation of each pair, and the average performance over the 5 folds is reported.
• The fine-tuned VGG-Face deep CNN model achieved top scores.
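The verification step above can be sketched as follows, assuming pre-extracted feature vectors; the function names and toy 2-D vectors are illustrative stand-ins for the real features:

```python
import numpy as np

def cosine_similarity(x, y):
    """Cosine similarity between two feature vectors (e.g., fc7 features)."""
    return float(np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y)))

def best_threshold(scores, labels, candidates):
    """Pick the threshold maximizing accuracy on the training folds; pairs
    scoring at or above it are then declared kin on the held-out fold."""
    accuracy = lambda t: np.mean([(s >= t) == l for s, l in zip(scores, labels)])
    return max(candidates, key=accuracy)

# Toy example: one matching pair and one non-matching pair.
scores = [cosine_similarity(np.array([1.0, 0.1]), np.array([1.0, 0.2])),
          cosine_similarity(np.array([1.0, 0.0]), np.array([0.0, 1.0]))]
t = best_threshold(scores, [True, False], candidates=[0.0, 0.5, 0.9])
print([s >= t for s in scores])  # first pair accepted as kin, second rejected
```

The poster reports the average of this thresholded accuracy over the 5 folds.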
Family	Recognition.
• VGG-Face + one-vs-rest SVMs were used as the baseline.
• 2 different experimental settings were used:
  1. Families with more than 20 images were selected, yielding 399 families with 11,158 images in total. 80% of each family's images were randomly assigned to training and the rest to testing.
  2. For the second setting, we selected the families with more than 5 members, then chose the 5 members with the most images in each family. This yielded 316 families with 7,772 images. We then split these images into 5 folds; each fold trains on one family member and tests on all other members.
• The fine-tuned VGG-Face deep CNN again achieved the top score [see Table 5].
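The second setting's fold construction can be sketched like this. The `images` mapping of (family, member) to image lists is a hypothetical structure (the poster does not specify the storage format), and the poster uses 5 folds:

```python
from collections import defaultdict

def member_folds(images, n_folds=5):
    """Sketch of setting 2: per family, keep the n_folds members with the most
    images; fold k trains on the k-th such member of every family and tests on
    that family's remaining members."""
    by_family = defaultdict(list)
    for (fam, member), imgs in images.items():
        by_family[fam].append((member, imgs))
    folds = []
    for k in range(n_folds):
        train, test = [], []
        for fam, members in by_family.items():
            if len(members) < n_folds:  # family must have enough members
                continue
            # members with the most images first
            top = sorted(members, key=lambda m: len(m[1]), reverse=True)[:n_folds]
            for i, (member, imgs) in enumerate(top):
                (train if i == k else test).extend((fam, img) for img in imgs)
        folds.append((train, test))
    return folds
```

Because each fold's training set contains exactly one member per family, the classifier must recognize a family from relatives it has never seen, which is what makes this setting harder than the 80/20 split.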
Human	Kinship	Verification.
• Measured human performance on kinship verification using 200 face pairs from the 11 pairwise types supported in FIW.
• Human performers scored an overall average of 56.6%, which the fine-tuned CNN outscored by approximately 15% [see Fig 4].
Experiment
Fig	3	 Sample	pairs	for	the	11	kinship	relations	provided	by	the	FIW	database.
Feature Description
SIFT [2] • Resized images to 64x64, with the block size set to 16x16 and a stride of 8, making for 49 blocks per image. Feature dimension: 16x8x49 = 6,272D.
LBP [1] • Resized images to 64x64; divided each image into 16x16 non-overlapping blocks (i.e., 16 blocks per image); extracted LBP features with radius = 2 mapped to 8 neighbors. Each block was binned into a 256D histogram, yielding a feature vector of dimension 256x16 = 4,096D.
VGG-Face CNN Descriptors [3] • Very deep architecture with small (3x3) convolutional kernels & a convolutional stride of 1 pixel; trained on ~2.6M images of 2,622 celebrity faces. The pre-trained CNN was used as a feature extractor: face images of size 224x224 were fed through the network, with the output taken from the 2nd-to-last fully-connected layer (i.e., the fc7 layer), yielding 4,096D vectors.
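The block-histogram construction behind the LBP row can be sketched in NumPy. This is a simplified version (radius 1, 8 neighbors; the poster uses radius 2) but it reproduces the 16 blocks x 256 bins = 4,096D layout of Table 3:

```python
import numpy as np

def lbp_block_histograms(img, block=16, bins=256):
    """Block-wise LBP histograms: each pixel's 8 neighbors each contribute one
    bit of an 8-bit code; per-block 256-bin histograms are concatenated.
    A 64x64 image with 16x16 blocks gives 16 x 256 = 4,096D."""
    h, w = img.shape
    codes = np.zeros((h - 2, w - 2), dtype=np.uint8)
    center = img[1:-1, 1:-1]
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    for bit, (dy, dx) in enumerate(offsets):
        neighbor = img[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx]
        codes |= (neighbor >= center).astype(np.uint8) << bit
    codes = np.pad(codes, 1, mode="edge")  # restore 64x64 so blocks tile evenly
    feats = []
    for y in range(0, h, block):
        for x in range(0, w, block):
            hist, _ = np.histogram(codes[y:y + block, x:x + block],
                                   bins=bins, range=(0, bins))
            feats.append(hist)
    return np.concatenate(feats)

feat = lbp_block_histograms(np.random.randint(0, 256, (64, 64), dtype=np.uint8))
print(feat.shape)  # (4096,)
```

The SIFT dimensionality in Table 3 follows the same pattern: 49 blocks, each described by a 16x8 = 128D descriptor, for 6,272D in total.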
               F-D  F-S  M-D  M-S SIBS  B-B  S-S GF-GD GF-GS GM-GD GM-GS  Avg.
HOG            56.2 56.9 56.8 55.7 59.3 50.3 57.8  62.4  58.9  59.5  57.7  57.4
HOG PCA        56.1 56.5 56.4 55.3 58.7 50.3 57.4  59.3  66.9  60.4  56.9  57.7
LBP            55.0 55.2 55.4 56.0 57.1 57.0 55.9  59.0  56.0  55.8  60.3  56.6
LBP PCA        55.0 55.3 55.4 55.9 57.1 56.8 55.8  58.5  59.1  55.6  60.1  56.8
VGG-Face       64.3 63.3 66.4 64.2 73.2 71.4 70.6  66.1  61.1  64.9  60.4  66.0
VGG-Face PCA   64.4 63.4 66.2 64.0 73.2 71.5 70.8  64.4  68.6  66.2  63.5  66.9
Fine-Tuned     67.8 66.6 66.7 68.2 72.3 70.8 70.3  69.5  68.3  69.5  63.5  68.5
Fine-Tuned PCA 69.4 68.2 68.4 69.4 74.4 73.0 72.5  72.9  72.3  72.4  68.3  71.0
Families in the Wild (FIW): A Large-Scale Kinship Recognition Database
[Fig 3 legend: Father-Daughter, Father-Son, Mother-Daughter, Mother-Son, Siblings, Brother-Brother, Sister-Sister, Grandfather-Granddaughter, Grandfather-Grandson, Grandmother-Granddaughter, Grandmother-Grandson]
Table 4 Verification scores (%) for the 5-fold experiment on FIW. Note that there was no family overlap between folds. Top accuracies resulted from fine-tuning the VGG-Face model on FIW data, replacing the topmost layer with a triplet loss.
Table 1 No. pairs in FIW and other kinship image collections.

Pair-Type        KFW-II  Sibling Face  Group Face  Family 101  FIW (Ours)
Brother-Brother      --          232          40          --         86k
Sister-Sister        --          211          32          --         86k
Siblings             --          277          53          --         75k
Father-Daughter     250           --          69         147         45k
Father-Son          250           --          69         213         43k
Mother-Daughter     250           --          62         148         44k
Mother-Son          250           --          70         184         37k
GF-GD                --           --          --          --         410
GF-GS                --           --          --          --         350
GM-GD                --           --          --          --         550
GM-GS                --           --          --          --         750
Total             1,000          720         395         607       >418k
Dataset            No. Fam.  No. People  No. Faces  Age Varies  Full Fam.  Highlights
CornellKin [5]          150         300        300          No         No  Parent-child pairs.
UB KinFace-I [8]         90         180        270         Yes         No  Parent-child pairs; parents' 139 images at various ages.
UB KinFace-II [8]       200         400        600         Yes         No  Parent-child pairs; parents' 139 images at various ages.
KFW-I [6]                --       1,066         1k          No         No  Parent-child pairs.
KFW-II [6]               --       2,000         2k          No         No  Parent-child pairs.
TSKinFace [9]           787       2,589         --         Yes        Yes  Two parent-child pairs for tri-subject verification.
Family101 [7]           101         607        14k         Yes        Yes  Family structured; variations in age and ethnicity.
FIW (Ours)               1k       10.6k        27k         Yes        Yes  A corpus of 1k family trees providing both depth and breadth, with multi-task evaluation offerings.

Table 2 Comparison of FIW with related datasets.
Fold VGG-Face VGG-Face	(fine-tuned)
1 9.6 10.9
2 14.5 14.8
3 11.6 12.5
4 12.7 14.8
5 13.1 13.5
Avg. 12.3 13.3
Table 5 Family recognition results for the 5-fold experiment. The top score results from the fine-tuned CNN.
Fig 1 Work-flow of the labelling tool used to build FIW (a). Each selected face is surrounded by a resizable bounding box (b). If the family member has already been added to the dataset, their name is specified; otherwise, 'new' adds a member (c): their name and gender are given (d), then their relationships to others (e).
Advancing visual kinship recognition technology is key for many real-world applications (e.g., kinship verification, automatic photo library management, genealogical analysis, and social media). Although the machine vision and multimedia research communities have made some efforts since 2010, the lack of sufficiently large datasets has restricted substantial progress. To address this, we built the largest visual kinship dataset to date, Families in the Wild (FIW) [see Tables 1 & 2]. Annotations (i.e., ground-truth labels) were generated with our efficient software tool KAT-SMILE [see Fig 1], which was developed to annotate the complex hierarchical nature of 1,000 different family trees and to generate a rich set of labels capable of supporting various evaluation types (i.e., multi-task purposes). Several comprehensive experiments were designed, conducted, and are presented here.
Introduction
Graduate Category: Engineering and Technology | Degree Level: PhD | Abstract ID# 1271
Fig 4 Box-and-whisker chart depicting the scores of 20 human observers performing kinship verification on the FIW dataset. The magenta cross (+) marks the average score of the fine-tuned CNN model, which outperforms the top human scorer in every category.
Fig 2 Visualizing the structure of FIW. Family trees span up to 5 generations in depth, with up to 10 sets of parents in a single tree. As shown, the Spielberg family (i.e., family ID 703 of 1,000) is made up of 3 parents with 10 children in total. Each column (right) contains samples of an individual family member; note there are multiple samples of each person at various ages.
[Figure panels (a)-(e); Spielberg Family Tree (incl. Steven, Kate); Spielberg Family: Face Samples, ordered Younger <- AGE -> Older]
[Fig 4 chart: Kinship Verification: Human Observers; score axis 0.2-0.8; categories F-D, F-S, M-D, M-S, SIB, GF-GD, GF-GS, GM-GD, GM-GS, Avg.]