+
FAME:
Face Association
through Model
Evolution
Pinar Duygulu (CMU, Hacettepe University)
Eren Golge (Bilkent University)
+
Yale dataset:
10 subjects, 9 poses, 64 illuminations
Labeled Faces in the Wild
13,233 faces of 5,749 celebrities
PubFig dataset
60,000 images of 200 people
Social Face Classification
4.4 million labeled faces from
4,030 people
+
Labeling for how many?
+
Search web for faces of a query name
+
Use this set to learn models
+
Variations and sub-categories
+
Irrelevant people
+
Find category related images in
the set of weakly labeled images
+
Our previous work: Densest component
Most similar set of faces as a subgraph
Assumption:
The most similar subset of faces among the faces associated with a name will be the correct faces
Drawback:
Finds a single subset
Ozkan, D., Duygulu, P., “Interesting Faces: A Graph Based Approach for Finding People in News”, Pattern Recognition, 2010
Ozkan, D., Duygulu, P., “A Graph Based Approach for Naming Faces in News Photos”, CVPR, 2006
Ozkan, D., Duygulu, P., “Finding People Frequently Appearing in News”, CIVR, 2006
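The densest-component idea can be sketched with a standard greedy peeling approximation: repeatedly drop the face with the smallest total similarity to the rest and keep the densest set seen. Whether this matches the exact procedure of the papers above is an assumption; the `similarity` matrix and the function name are hypothetical and only illustrate the idea.

```python
def densest_subgraph(weights):
    """Greedy peeling on a symmetric similarity matrix.

    Density of a node set = total edge weight inside the set / number of nodes.
    """
    nodes = set(range(len(weights)))
    degree = {i: sum(weights[i][j] for j in nodes if j != i) for i in nodes}
    total = sum(degree.values()) / 2.0  # every edge is counted twice in the degrees
    best_nodes, best_density = set(nodes), total / len(nodes)
    while len(nodes) > 1:
        # Remove the node that is least similar to the remaining ones.
        worst = min(nodes, key=lambda i: (degree[i], i))
        nodes.remove(worst)
        total -= degree[worst]
        for j in nodes:
            degree[j] -= weights[worst][j]
        del degree[worst]
        density = total / len(nodes)
        if density > best_density:
            best_nodes, best_density = set(nodes), density
    return best_nodes, best_density


# Toy data (made up): faces 0-2 are mutually similar, faces 3 and 4 are not.
similarity = [
    [0.0, 0.9, 0.8, 0.0, 0.0],
    [0.9, 0.0, 0.9, 0.0, 0.0],
    [0.8, 0.9, 0.0, 0.2, 0.0],
    [0.0, 0.0, 0.2, 0.0, 0.1],
    [0.0, 0.0, 0.0, 0.1, 0.0],
]
best_nodes, best_density = densest_subgraph(similarity)
print(sorted(best_nodes))  # [0, 1, 2]
```

The sketch returns exactly one subset, which is the drawback noted on this slide: a person whose faces form several distinct groups is represented by only one of them.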
+
Our previous work: Concept Maps
Grouping and outlier removal
Golge, E., Duygulu, P., “Concept Maps: Mining Noisy Web Data for Concept Learning”, accepted
+
Our previous work: Concept Maps
Grouping and outlier removal
Assumption:
Faces of a single person can have sub-categories
Outliers are different from the queried person
Drawback:
Also eliminates unusual looks of the queried person that do not fall into any group
+
FAME
Face Association through Model Evolution
Capture discriminative and representative category images through iterative data cleansing.
Separate category instances from random images.
Refine the data without assumptions about the kind of irrelevancy.
Avoid explicit sub-grouping by using very high-dimensional representations.
+
Overview of FAME
First, discern category candidates (CC) from the random set (RS).
Define category references (CR) inside CC.
Second, discern CR from the rest of CC.
Define spurious instances (SI) against CR and eliminate them.
Iterate.
+
Step 1
Discerning category from random set
Learn a linear model M1 between category
candidates CC and random set RS.
Take the most confidently classified instances
as the category references CR.
+
Step 2
Discerning category references from others
Learn another model M2 between category references
CR and the other category candidates.
+
Step 3
Define spurious instances SI against category references CR.
Eliminate SI.
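Steps 1 to 3 can be sketched as a small loop. This is a minimal illustration under stated assumptions, not the authors' implementation: it uses plain unregularized logistic regression trained by gradient descent instead of the L1 solvers, 2-D toy points instead of face features, a fixed number of iterations, and it takes SI to be the non-reference candidates with the lowest M2 score. All names (`train_logreg`, `fame_prune`, `n_refs`, `n_drop`) are hypothetical.

```python
import math


def train_logreg(pos, neg, epochs=200, lr=0.1):
    """Plain logistic regression by batch gradient descent (stand-in for M1/M2)."""
    data = [(x, 1.0) for x in pos] + [(x, 0.0) for x in neg]
    dim = len(data[0][0])
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for x, y in data:
            z = sum(wi * xi for wi, xi in zip(w, x)) + b
            p = 1.0 / (1.0 + math.exp(-z))
            for d in range(dim):
                grad_w[d] += (p - y) * x[d]
            grad_b += p - y
        w = [wi - lr * g / len(data) for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / len(data)
    return w, b


def score(model, x):
    """Signed distance-like confidence of x under a linear model."""
    w, b = model
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def fame_prune(candidates, random_set, n_refs=3, n_drop=1, n_iters=2):
    cc = list(candidates)
    for _ in range(n_iters):
        # Step 1: M1 separates category candidates (CC) from the random set (RS);
        # the most confidently classified candidates become the references (CR).
        m1 = train_logreg(cc, random_set)
        ranked = sorted(cc, key=lambda x: score(m1, x), reverse=True)
        cr, rest = ranked[:n_refs], ranked[n_refs:]
        # Step 2: M2 separates CR from the other candidates.
        m2 = train_logreg(cr, rest)
        # Step 3: the candidates least like CR are treated as spurious (SI) and removed.
        rest.sort(key=lambda x: score(m2, x))
        spurious = rest[:n_drop]
        cc = [x for x in cc if x not in spurious]
    return cc


# Toy data (made up): true faces cluster together, two irrelevant images look like RS.
true_faces = [(2.0, 2.1), (2.2, 1.9), (1.8, 2.0), (2.1, 2.3),
              (1.9, 1.7), (2.4, 2.2), (1.7, 1.9), (2.3, 2.0)]
outliers = [(-1.8, -2.1), (-2.2, -1.7)]
random_set = [(-2.0, -2.0), (-1.9, -2.3), (-2.3, -1.8),
              (-2.1, -2.2), (-1.7, -1.9), (-2.4, -2.1)]
kept = fame_prune(true_faces + outliers, random_set)
print(len(kept))
```

In this toy run each iteration removes one candidate. Running more iterations than there are irrelevant images would start removing correct faces, so a real pipeline needs a stopping rule, which these slides do not state.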
+ FAME
+
Eliminations by iteration
+
Eliminations by iteration
+
High Dimensional Representation
High-dimensional representations help make a category linearly
separable from others despite category modularity.
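A toy example of why extra dimensions help: a category with two modes cannot be split from the rest by a single threshold in 1-D, but becomes linearly separable once a second coordinate is added. The squared feature used here is only an illustrative lifting chosen for this sketch; FAME relies on learned high-dimensional features instead, and the data values are made up.

```python
def separable_by_threshold(pos, neg):
    """True if one threshold on a single coordinate splits the two sets."""
    return max(pos) < min(neg) or max(neg) < min(pos)


# One category with two modes (around -2 and +2), other images around 0.
category = [-2.0, -1.5, 1.5, 2.0]
others = [-0.3, 0.0, 0.4]

# In the original 1-D space no threshold works.
separable_1d = separable_by_threshold(category, others)

# Lift x to (x, x*x): a threshold on the second coordinate is a linear
# decision rule with weights (0, 1) in the lifted space.
lifted_category = [x * x for x in category]
lifted_others = [x * x for x in others]
separable_lifted = separable_by_threshold(lifted_category, lifted_others)

print(separable_1d, separable_lifted)  # False True
```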
+
Feature learning
Coates, Adam, Andrew Y. Ng, and Honglak Lee. "An analysis of single-layer networks in unsupervised feature
learning." International Conference on Artificial Intelligence and Statistics. 2011.
+
Raw pixel LBP encoded outliers
+
Implementation details
FAME
Data refining: L1 logistic regression with the Gauss-Seidel algorithm [1].
Final classifier: L1 linear SVM with Grafting [2].
At each iteration, 5 images are eliminated.
Feature Learning
Augment the training data with horizontally flipped images.
Resize each gray-level image to a height of 60 px.
Apply contrast normalization to random patches.
ZCA whitening with ε = 0.5.
Receptive field (patch) size: 6x6 pixels.
1-pixel stride with k = 2400 words.
Final feature vector has 5x2400 dimensions.
[1] Shirish Krishnaj Shevade and S Sathiya Keerthi. A simple and efficient algorithm for gene selection using sparse logistic
regression. Bioinformatics,19(17):2246–2253, 2003.
[2] Simon Perkins, Kevin Lacker, and James Theiler. Grafting: Fast, incremental feature selection by gradient descent in
function space. The Journal of Machine Learning Research, 3:1333–1356, 2003.
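The sizes implied by the feature-learning settings above can be checked with a few lines. This is only a sketch: the 60 px image width and the `eps` value in the contrast normalization are assumptions made for illustration (the slide states only the 60 px height, and gives ε = 0.5 for ZCA whitening, not for this step), and the helper names are hypothetical.

```python
import math


def count_patches(height, width, patch=6, stride=1):
    """Number of patch positions when sliding a patch x patch window."""
    rows = (height - patch) // stride + 1
    cols = (width - patch) // stride + 1
    return rows * cols


def contrast_normalize(patch, eps=10.0):
    """Subtract the patch mean and divide by its standard deviation.

    eps is a hypothetical smoothing constant; it avoids division by zero
    on flat patches.
    """
    n = len(patch)
    mean = sum(patch) / n
    var = sum((v - mean) ** 2 for v in patch) / n
    return [(v - mean) / math.sqrt(var + eps) for v in patch]


# 6x6 receptive fields with a 1-pixel stride on an assumed 60x60 image.
patches_per_image = count_patches(60, 60)

# k = 2400 words, and the stated 5x2400 final feature vector.
feature_dim = 5 * 2400

# A flattened 6x6 patch with made-up intensities 0..35.
normalized = contrast_normalize([float(v) for v in range(36)])

print(patches_per_image, feature_dim)  # 3025 12000
```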
+
Datasets
PubFig83
Subset of PubFig with 83 celebrities,
at least 100 images each.
N. Pinto, Z. Stone, T. Zickler, and D. Cox, “Scaling up biologically-inspired computer vision: A case
study in unconstrained face recognition on facebook,” in Computer Vision and Pattern
Recognition Workshops (CVPRW), 2011.
+
Datasets
FAN-Large
EASY subset: faces larger than 60x70 px, 138 categories.
ALL: no constraint, 365 categories.
M. Ozcan, J. Luo, V. Ferrari, and B. Caputo, “A large-scale database of images and captions for
automatic face naming,” in BMVC, pp. 1–11, 2011.
+ Results on PubFig83
N. Pinto, Z. Stone, T. Zickler, and D. Cox, “Scaling up biologically-inspired computer vision: A case study in
unconstrained face recognition on facebook,” in Computer Vision and Pattern Recognition Workshops
(CVPRW), 2011 IEEE Computer Society Conference on, pp. 35–42, IEEE, 2011
B. C. Becker and E. G. Ortiz, “Evaluating open-universe face identification on the web,” in Computer Vision
and Pattern Recognition Workshops (CVPRW), 2013 IEEE Conference on, pp. 904–911, IEEE, 2013.
No data refining; only our classification pipeline is used.
Models are trained on the training set of the given dataset.
~5% improvement over the state of the art.
+
Evaluations
+ False versus true outlier elimination
+
Cross validation accuracies
+
Number of eliminations versus accuracy
+
Models learned from weakly labeled set
Baseline: all images collected for the query are used
AME-M1: only the M1 classifier, which prunes against global negatives
AME-SVM: with SVM as the final classifier
AME-LR: the proposed method
S. Singh, A. Gupta, and A. A. Efros, “Unsupervised discovery of mid-level discriminative patches,”
in European Conference on Computer Vision (ECCV), 2012.
+
Summary
A method to build training sets from weakly labeled images
Iterative pruning removes the outliers, which are the least confident instances
High-dimensional feature representation handles the variations
+
TUBITAK 112E174
CHIST-ERA MUCKE
US Department of Defense, U. S. Army Research Office (W911NF-13-1-0277)
National Science Foundation Grant No. IIS-1251187
Thanks
+
Use an annotated control set as a starting point.
Fergus et al. [1]; OPTIMOL, Li and Fei-Fei [2]
We use a fully autonomous framework.
Use textual captions.
Berg and Forsyth [3]
We use only visual content.
Discriminative image cues.
Efros et al. [4] “Discriminative Patches”; Q. Li et al. [5]
We use a single computer, with faster and better results.
[1] Fergus, R., Fei-Fei, L., Perona, P., Zisserman, A.: Learning object categories from Google’s image
search. In: Computer Vision, 2005. ICCV 2005
[2] Li, L.J., Fei-Fei, L.: OPTIMOL: automatic online picture collection via incremental model learning.
International Journal of Computer Vision 88(2) (2010) 147–168
[3] Berg, T.L., Berg, A.C., Edwards, J., Maire, M., White, R., Teh, Y.W., Learned-Miller, E.G., Forsyth,
D.A.: Names and faces in the news. In: IEEE Conference on Computer Vision and
Pattern Recognition (CVPR). Volume 2. (2004) 848–854
[4] Singh, S., Gupta, A., Efros, A.A.: Unsupervised discovery of mid-level discriminative patches.
In: European Conference on Computer Vision (ECCV) (2012)
[5] Li, Q., Wu, J., Tu, Z.: Harvesting Mid-level Visual Concepts from Large-scale Internet Images.
