Figure 1. A news photo and its caption. Extracted faces are shown on the top. These faces might be returned for the query of person-Bush.

or non-person-X (the un-queried person). The faces are ranked according to a relevancy score that is inferred from the classifier's probability output. Since annotation data is not available, the rank list from the previous step is used to assign labels for a subset of faces. This subset is then used to train a classifier using supervised methods such as support vector machines (SVM). The trained classifier is used to re-rank faces in the original input set. This step is repeated a number of times to get the final ranked list. Since automatically assigning labels from the ranked list is not reliable, the trained classifiers are weak. To obtain the final strong classifier, we use the idea of ensemble learning, in which weak classifiers trained on different subsets are combined to improve the stability and classification accuracy of single classifiers. The learned classifier can be further used for recognizing new facial images of the queried person.

The second stage improves the ranked list and recognition performance for the following reasons:

• Supervised learning methods, such as SVM, provide a strong theoretical background for finding the optimal decision boundary even with noisy data. Furthermore, recent studies [20, 17] suggest that SVM classifiers provide probability outputs that are suitable for ranking.

• The bagging framework helps to mitigate the noise in the unsupervised labeling process.

Figure 2. Large variations in facial expressions, poses, illumination conditions and occlusions make face recognition difficult. Best viewed in color.

Our contribution is two-fold:

• We propose a general framework to boost the face retrieval performance of text-based search engines by visual consistency learning. The framework seamlessly integrates data mining techniques such as supervised learning and unsupervised learning based on bagging. Our framework requires only a few parameters and

• We demonstrate its feasibility with a practical web mining application. A comprehensive evaluation on a large face dataset of many people was carried out and confirmed that our approach is promising.

2 Related Work

There are several approaches for re-ranking and learning models from web images. Their underlying assumption is that text-based search engines return a large fraction of relevant images. The challenge is how to model what is common in the relevant images. One approach is to model this problem in a probabilistic framework in which the returned images are used to learn the parameters of the model. For example, as described by Fergus et al. [12], objects retrieved using an image search engine are re-ranked by extending the constellation model. Another proposal, described in [15], uses a non-parametric graphical model and an interactive framework to simultaneously learn object class models and collect object class datasets. The main contribution of these approaches is probabilistic models that can be learned with a small number of training images. However, these models are complicated since they
require several hundred parameters for learning and are susceptible to over-fitting. Furthermore, to obtain robust models, a small amount of supervision is required to select seed images.

Another study [4, 3] proposed a clustering-based method for associating names and faces in news photos. To solve the problem of ambiguity between several names and one face, a modified k-means clustering process was used in which faces are assigned to the closest cluster (each cluster corresponding to one name) after a number of iterations. Although the result was impressive, it is not easy to apply it to our problem since it is based on a strong assumption that requires a perfect alignment when a news photo only has one face and its caption only has one name. Furthermore, a large number of irrelevant faces (more than 12%) have to be manually eliminated before clustering.

The graph-based approach proposed by Ozkan and Duygulu [16], in which a graph is formed with faces as nodes and the similarities between faces as edge weights, is closely related to our problem. Assuming that the number of faces of the queried person is larger than that of others and that these faces tend to form the most similar subset among the set of retrieved faces, this problem is considered equivalent to the problem of finding the densest subgraph of a full graph, and can therefore be solved by taking an available solution [9]. Although experimental results showed the effectiveness of this method, it is still questionable whether the densest subgraph intuitively describes most of the relevant faces of the queried person and whether it is easy to extend to the ranking problem. Furthermore, choosing an optimal threshold to convert the initial graph into a binary one is difficult and rather ad hoc due to the curse of dimensionality.

An advantage of the methods [4, 3, 16] is that they are fully unsupervised. However, a disadvantage is that no model is learned for predicting new images of the same category. Furthermore, they perform hard categorization of the input images, which is inapplicable to re-ranking. The balance of recall and precision was not addressed. Typically, these approaches tend to ignore the recall to obtain high precision. This leads to the reduction in the number of

Our approach combines a number of advances over the existing approaches. Specifically, we learn a model for each query from the returned images for purposes such as re-ranking and predicting new images. However, we use an unsupervised method to select training samples automatically, which is different from the methods proposed by Fergus et al. and Li et al. [12, 15]. This unsupervised method is different from the one by Ozkan and Duygulu [16] in the modeling of the distribution of relevant images. We use density-based estimation rather than the densest graph.

3 Proposed Framework

Given a set of images returned by any text-based search engine for a queried person (e.g. 'George Bush'), we perform a ranking process and learning of person X's model as follows:

• Step 1: Detect faces and eye positions, and then perform face normalizations.

• Step 2: Compute an eigenface space and project the input faces into this subspace.

• Step 3: Estimate the ranked list of these faces using Rank-By-Local-Density-Score.

• Step 4: Improve this ranked list using Rank-By-Bagging-ProbSVM.

Steps 1 and 2 are typical for any face processing system, and they are described in section 4.2. The algorithms used in Steps 3 and 4 are described in section 3.1 and section 3.2, respectively. Figure 3 illustrates the proposed framework.

Figure 3. The proposed framework for re-ranking faces returned by text-based search engines.

3.1 Ranking by Local Density Score

Figure 4. An example of faces retrieved for person-Donald Rumsfeld. Irrelevant faces are marked with a star. Irrelevant faces might form several clusters, but the relevant faces form the largest cluster.

Among the faces retrieved by text-based search engines for a query of person-X, as shown in Figure 4, relevant faces usually look similar and form the largest cluster. One approach to re-ranking these faces is to cluster them based on visual similarity. However, obtaining ideal clustering results is impossible since these faces are high dimensional data and the clusters have different shapes, sizes, and densities. Instead, a graph-based approach was proposed by Ozkan and Duygulu [16] in which the nodes are faces and edge weights
are the similarities between two faces. With the observation that the nodes (faces) of the queried person are similar to each other and different from other nodes in the graph, the densest component of the full graph (the set of highly connected nodes in the graph) will correspond to the faces of the queried person. The main drawback of this approach is that it needs a threshold to convert the initial weighted graph to a binary graph. Choosing this threshold in high dimensional spaces is difficult since different persons might have different optimal thresholds.

We use the idea of density-based clustering described by Ester et al. and Breunig et al. [11, 7] to solve this problem. Specifically, we define the local density score (LDS) of a point p (i.e. a face) as the average distance to its k-nearest neighbors:

$$\mathrm{LDS}(p, k) = \frac{\sum_{q \in R(p, k)} \mathrm{distance}(p, q)}{k}$$

where R(p, k) is the set of k-neighbors of p, and distance(p, q) is the similarity between p and q.

Since faces are represented in high dimensional feature space, and face clusters might have different sizes, shapes, and densities, we do not directly use the Euclidean distance between two points in this feature space for distance(p, q). Instead, we use another similarity measure defined by the number of shared neighbors between two points. The efficiency of this similarity measure for density-based clustering methods was described in [10].

$$\mathrm{distance}(p, q) = \frac{|R(q, k) \cap R(p, k)|}{k}$$

$$\mathrm{LDS}(p, k) = \frac{\sum_{q \in R(p, k)} |R(q, k) \cap R(p, k)|}{k^2}$$

A high value of LDS(p, k) indicates a strong association between p and its neighbors. Therefore, we can use this local density score to rank faces. Faces with higher scores are considered to be potential candidates that are relevant to person-X, while faces with lower scores are considered as outliers and thus are potential candidates for non-person-X. Algorithm 1 describes these steps.

3.2 Ranking by Bagging of SVM Classifiers

One limitation of the local density score based ranking is that it cannot handle faces of another person strongly associated in the k-neighbor set (for example, many duplicates). Therefore, another step is proposed for handling this case. As a result, we have a model that can be used for both re-ranking current faces and predicting new incoming faces.

The main idea is to use a probabilistic model to measure the relevancy of a face to person-X, P(person-X | face). Since the labels are not available for training, we use the input rank list found from the previous step to extract a subset of faces lying at the top and bottom of the ranked list to form the training set. After that, we use SVM with probabilistic output implemented in LibSVM [8] to learn the person-X model. This model is applied to faces of the original set, and the output probabilistic scores are used to re-rank these faces. Since it is not guaranteed that faces lying at the two ends of the input rank list correctly correspond to the faces of person-X and faces of non-person-X, we adopt the idea of a bagging framework [6], in which randomly selecting subsets to train weak classifiers and then combining these classifiers helps reduce the risk of using noisy training sets.

The details of the Rank-By-Bagging-ProbSVM-InnerLoop method, which improves an input rank list by combining weak classifiers trained from subsets annotated by that rank list, are described in Algorithm 2.

Given an input ranked list, Rank-By-Bagging-ProbSVM-InnerLoop is used to improve this list. We repeat the process a number of times whereby the ranked list output from the previous step is used as the input ranked list of the next step.

Algorithm 1: Rank-By-Local-Density-Score
Step 1: For each face p, compute LDS(p, k), where k is the number of neighbors of p and is the input of the ranking process.
Step 2: Rank these faces using LDS(p, k) (the higher the score, the more relevant).
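The local density score of Section 3.1 and the two steps of Algorithm 1 can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the implementation used in the experiments: the function names are hypothetical, R(p, k) is taken to exclude p itself, neighbors are found by brute-force Euclidean search, and ties are broken by index order.

```python
import math

def k_nearest(points, k):
    """For each point index, return the set of indices of its k nearest
    neighbors (Euclidean distance, the point itself excluded: an assumption)."""
    if not 0 < k < len(points):
        raise ValueError("k must satisfy 0 < k < number of points")
    neighbors = []
    for i, p in enumerate(points):
        others = sorted(
            (j for j in range(len(points)) if j != i),
            key=lambda j: math.dist(p, points[j]),
        )
        neighbors.append(set(others[:k]))
    return neighbors

def local_density_scores(points, k):
    """LDS(p, k) = sum over q in R(p, k) of |R(q, k) & R(p, k)|, divided by k^2."""
    R = k_nearest(points, k)
    return [
        sum(len(R[q] & R[p]) for q in R[p]) / (k * k)
        for p in range(len(points))
    ]

def rank_by_local_density_score(points, k):
    """Algorithm 1: indices sorted by decreasing LDS (higher = more relevant)."""
    scores = local_density_scores(points, k)
    return sorted(range(len(points)), key=lambda i: scores[i], reverse=True)

# Toy 1-D "feature vectors"; real inputs would be eigenface projections.
points = [(0.0,), (1.0,), (2.1,), (10.0,), (30.0,), (70.0,), (150.0,)]
scores = local_density_scores(points, k=2)
ranking = rank_by_local_density_score(points, k=2)
```

Because q is excluded from its own neighbor set in this sketch, each term of the sum is at most k - 1, so the score lies between 0 and (k - 1)/k.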
Algorithm 2: Rank-By-Bagging-ProbSVM-InnerLoop
Step 1: Train a weak classifier, $h_i$.
Step 1.1: Select a set $S_{pos}$ including p% of top ranked faces and then randomly select a subset $S^{*}_{pos}$ from $S_{pos}$. Label faces in $S^{*}_{pos}$ as positive samples.
Step 1.2: Select a set $S_{neg}$ including p% of bottom ranked faces and then randomly select a subset $S^{*}_{neg}$ from $S_{neg}$. Label faces in $S^{*}_{neg}$ as negative samples.
Step 1.3: Use $S^{*}_{pos}$ and $S^{*}_{neg}$ to train a weak classifier, $h_i$, using LibSVM [8] with probability outputs.
Step 2: Compute ensemble classifier $H_i = \sum_{j=1}^{i} h_j$.
Step 3: Apply $H_i$ to the original face set and form the rank list, $Rank_i$, using the output probabilistic scores.
Step 4: Repeat steps 1 to 3 until Dist2RankList($Rank_{i-1}$, $Rank_i$) <= $\epsilon$.
Step 5: Return $H_i = \sum_{j=1}^{i} h_j$.

In this way, with the list output by one run of the inner loop used as the input of the next, the iterations significantly improve the final ranked list. The details are described in Algorithm 3.

Algorithm 3: Rank-By-Bagging-ProbSVM-OuterLoop
Step 1: $Rank_{cur}$ = Rank-By-Bagging-ProbSVM-InnerLoop($Rank_{prev}$).
Step 2: dist = Dist2RankList($Rank_{prev}$, $Rank_{cur}$).
Step 3: $Rank_{final}$ = $Rank_{cur}$.
Step 4: $Rank_{prev}$ = $Rank_{cur}$.
Step 5: Repeat steps 1 to 4 until dist <= $\epsilon$.
Step 6: Return $Rank_{final}$.

To determine the number of iterations of Rank-By-Bagging-ProbSVM-InnerLoop and Rank-By-Bagging-ProbSVM-OuterLoop, we use the Kendall tau distance [13], which is a metric that counts the number of pairwise disagreements between two lists. The larger the distance, the more dissimilar the two lists are. The Kendall tau distance between two lists, $\tau_1$ and $\tau_2$, is defined as follows:

$$K(\tau_1, \tau_2) = \sum_{(i,j) \in P} \bar{K}_{i,j}(\tau_1, \tau_2)$$

where P is the set of unordered pairs of distinct elements in $\tau_1$ and $\tau_2$. $\bar{K}_{i,j}(\tau_1, \tau_2) = 0$ if i and j are in the same order in $\tau_1$ and $\tau_2$, and $\bar{K}_{i,j}(\tau_1, \tau_2) = 1$ if i and j are in the opposite order in $\tau_1$ and $\tau_2$.

Since the maximum value of $K(\tau_1, \tau_2)$ is N(N − 1)/2, where N is the number of members of the list, the normalized Kendall tau distance can be written as follows:

$$K_{norm}(\tau_1, \tau_2) = \frac{K(\tau_1, \tau_2)}{N(N - 1)/2}.$$

Using this measure to decide when the loops stop means that if the ranked list does not change significantly after a number of iterations, it is reasonable to stop.

4 Experiments

4.1 Dataset

We used the dataset described by Berg et al. for our experiments. This dataset consists of approximately half a million news photos and captions from Yahoo News collected over a period of roughly two years. This dataset is better than datasets collected from image search engines such as Google that usually limit the total number of returned images to 1,000. Furthermore, it has annotations that are valuable for evaluation of methods. Note that these annotations are used for evaluation purposes only. Our method is fully unsupervised, so it assumes the annotations are not available at running time.

Only frontal faces were considered since current frontal face detection systems work in real time and have accuracies exceeding 95%. 44,773 faces were detected and normalized to the size of 86×86 pixels.

We selected fifteen government leaders, including George W. Bush (US), Vladimir Putin (Russia), Jiang Zemin (China), Tony Blair (UK), Junichiro Koizumi (Japan), Roh Moo-hyun (Korea), Abdullah Gul (Turkey), and other key individuals, such as John Paul II (the Former Pope) and Hans Blix (UN), because their images frequently appear in the dataset. Variations in each person's name were collected. For example, George W. Bush, President Bush, U.S. President, etc., all refer to the current U.S. president.

We performed a simple string search in captions to check whether a caption contained one of these names. The faces extracted from the corresponding image associated with this caption were returned. The faces retrieved from the different name queries were merged into one set and used as input for ranking.

Figure 5 shows the distribution of retrieved faces from this method and the corresponding number of relevant faces for these fifteen individuals. In total, 5,603 faces were retrieved, of which 3,374 faces were relevant. On average, the accuracy was 60.22%.

4.2 Face Processing

We used an eye detector to detect the positions of the eyes of the detected faces. The eye detector, built with the same approach as that of Viola and Jones [19], had an accuracy of more than 95%. If the eye positions were not detected, predefined eye locations were assigned. The eye positions were used to align faces to a predefined canonical

To compensate for illumination effects, the subtraction of the best-fit brightness plane followed by histogram equalization was applied. This normalization process is shown in Figure 6.
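The illumination compensation of Section 4.2 (subtracting a best-fit brightness plane, then equalizing the histogram) can be illustrated as follows. This is a minimal sketch under assumptions that the text does not spell out: the plane is taken to be the least-squares fit a·x + b·y + c to the pixel intensities, the output is quantized to 256 gray levels, images are plain nested lists, and all function names are hypothetical.

```python
def subtract_best_fit_plane(img):
    """Subtract the least-squares plane a*x + b*y + c from a 2-D list of intensities."""
    h, w = len(img), len(img[0])
    mx, my = (w - 1) / 2.0, (h - 1) / 2.0   # centre of the pixel grid
    mean = sum(sum(row) for row in img) / (h * w)
    # With coordinates centred on a full rectangular grid, the normal
    # equations decouple, so each slope is a simple ratio of sums.
    sxx = h * sum((x - mx) ** 2 for x in range(w))
    syy = w * sum((y - my) ** 2 for y in range(h))
    sxi = sum((x - mx) * img[y][x] for y in range(h) for x in range(w))
    syi = sum((y - my) * img[y][x] for y in range(h) for x in range(w))
    a = sxi / sxx if sxx else 0.0
    b = syi / syy if syy else 0.0
    return [[img[y][x] - (a * (x - mx) + b * (y - my) + mean)
             for x in range(w)] for y in range(h)]

def equalize_histogram(img, levels=256):
    """Map each value to its scaled cumulative rank in [0, levels - 1]."""
    flat = sorted(v for row in img for v in row)
    n = len(flat)
    cdf = {}
    for rank, v in enumerate(flat, start=1):
        cdf[v] = rank                        # ends as the count of pixels <= v
    cdf_min = cdf[flat[0]]
    if n == cdf_min:                         # flat image: nothing to spread
        return [[0 for _ in row] for row in img]
    scale = (levels - 1) / (n - cdf_min)
    return [[round((cdf[v] - cdf_min) * scale) for v in row] for row in img]

def normalize_illumination(img):
    """Plane subtraction followed by histogram equalization."""
    return equalize_histogram(subtract_best_fit_plane(img))

# A pure brightness ramp is removed entirely by the plane subtraction.
ramp = [[2 * x + 3 * y + 5 for x in range(4)] for y in range(3)]
residual = subtract_best_fit_plane(ramp)
equalized = equalize_histogram([[10, 20], [20, 40]])
```

Centering the coordinates avoids solving a 3×3 linear system; a general least-squares solver would give the same plane.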
Figure 6. Face normalization. (top) faces with detected eyes, (bottom) faces after normalization process.

We then used principal component analysis to reduce the number of dimensions of the feature vector for face representation. Eigenfaces were computed from the original face set returned using the text-based query method. The number of eigenfaces used to form the eigen space was selected so that 97% of the total energy was retained. The number of dimensions of these feature spaces ranged from 80 to 500.

Figure 5. Distribution of retrieved faces and relevant faces of 16 individuals used in experiments. Due to space limitation, bars corresponding to George Bush (2,282 vs. 1,284) and Tony Blair (682 vs. 323) were cut off at the upper limit of the graph.

4.3 Evaluation Criteria

We evaluated the retrieval performance with measures that are commonly used in information retrieval, such as precision, recall, and average precision. Given a queried person and letting $N_{ret}$ be the total number of faces returned, $N_{rel}$ the number of relevant faces, and $N_{hit}$ the number of relevant faces among those returned, recall and precision can be calculated as follows:

$$Recall = \frac{N_{hit}}{N_{rel}}, \qquad Precision = \frac{N_{hit}}{N_{ret}}$$

Precision and recall are only used to evaluate the quality of an unordered set of retrieved faces. To evaluate ranked lists in which both recall and precision are taken into account, average precision is usually used. The average precision is computed by taking the average of the interpolated precision measured at the 11 recall levels of 0.0, 0.1, 0.2, ..., 1.0. The interpolated precision $p_{interp}$ at a certain recall level r is defined as the highest precision found for any recall level $r' \geq r$:

$$p_{interp}(r) = \max_{r' \geq r} p(r')$$

In addition, to evaluate the performance of multiple queries, we used mean average precision, which is the mean of average precisions computed from queries (footnote 3).

Footnote 3: http://trec.nist.gov/pubs/trec10/appendices/measures.pdf

4.4 Parameters

The parameters of our method include:

• p: the fraction of faces at the top and bottom of the ranked list that are used to form a positive set $S_{pos}$ and negative set $S_{neg}$ for training weak classifiers in Rank-By-Bagging-ProbSVM-InnerLoop. We empirically selected p = 20% (i.e. 40% of the samples of the rank list were used) since a larger p will increase the number of incorrect labels, and a smaller p will cause over-fitting. In addition, the training subset $S^{*}_{pos}$ consists of 0.7 × |$S_{pos}$| samples that are selected randomly with replacement from $S_{pos}$. This sampling strategy is adopted from the bagging framework [6]. The same setting was used for $S^{*}_{neg}$.

• $\epsilon$: the maximum Kendall tau distance $K_{norm}(\tau_1, \tau_2)$ between two rank lists $\tau_1$ and $\tau_2$. This value is used to determine when the inner loop and the outer loop stop. We set $\epsilon$ = 0.05 for balancing between accuracy and processing time. Note that a smaller $\epsilon$ requires more iterations, making the system slower.

• kernel: the kernel type used for the SVM. The default is a linear kernel that is defined as: k(x, y) = x · y. We have tested other kernel types such as RBF or polynomial, but the performance did not change much. Therefore, we used the linear kernel for simplicity.
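The stopping rule controlled by the Kendall tau threshold can be sketched as follows. This is an illustrative sketch, not the authors' code: the function names are hypothetical, both lists are assumed to contain the same distinct items, `inner_loop` stands for any callable that maps a ranked list to an improved one (for instance an implementation of Algorithm 2), and `max_iters` is a safeguard added here that does not appear in Algorithm 3.

```python
from itertools import combinations

def kendall_tau_distance(rank_a, rank_b):
    """Number of item pairs that appear in opposite order in the two lists."""
    pos_a = {item: i for i, item in enumerate(rank_a)}
    pos_b = {item: i for i, item in enumerate(rank_b)}
    return sum(
        1
        for i, j in combinations(rank_a, 2)
        if (pos_a[i] - pos_a[j]) * (pos_b[i] - pos_b[j]) < 0
    )

def normalized_kendall_tau(rank_a, rank_b):
    """K divided by its maximum N(N - 1)/2, giving a value in [0, 1]."""
    n = len(rank_a)
    if n < 2:
        return 0.0
    return kendall_tau_distance(rank_a, rank_b) / (n * (n - 1) / 2)

def has_converged(rank_prev, rank_cur, epsilon=0.05):
    """True when two successive ranked lists differ by at most epsilon."""
    return normalized_kendall_tau(rank_prev, rank_cur) <= epsilon

def outer_loop(rank_prev, inner_loop, epsilon=0.05, max_iters=100):
    """Repeat inner_loop until successive lists agree (cf. Algorithm 3)."""
    rank_cur = rank_prev
    for _ in range(max_iters):
        rank_cur = inner_loop(rank_prev)
        if has_converged(rank_prev, rank_cur, epsilon):
            break
        rank_prev = rank_cur
    return rank_cur
```

For example, the lists `["a", "b", "c", "d"]` and `["b", "a", "c", "d"]` disagree on one of six pairs, so their normalized distance is 1/6, which is above the threshold of 0.05 used in the experiments.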
4.5 Results

4.5.1 Performance Comparison with Existing Approaches

We compared our proposed method with other existing approaches.

• Text Based Baseline (TBL): Once faces corresponding with images whose captions contain the query name are returned, they are ranked in time order. This is a rather naive method in which no prior knowledge between names and faces is used.

• Distance-Based Outlier (DBO): We adopted the idea of distance-based outlier detection for ranking [14]. Given a threshold $d_{min}$, for each point p, we counted the number of points q such that dist(p, q) ≤ $d_{min}$, where dist(p, q) is the Euclidean distance between p and q in the feature space mentioned in section 4.2. This number was then used as the score to rank faces. We selected a range of $d_{min}$ values for experiments: $d_{min}$ = 10, 15, 20, ..., 90.

• Densest Sub-Graph based Method (DSG): We re-implemented the densest sub-graph based method [16] for ranking. Once the densest subgraph was found after an edge elimination process, we counted the number of surviving edges of each node (i.e. face) and used this number as the ranking score. To form the graph, the Euclidean distance dist(p, q) was used to assign the weight for the edge linked between node p and node q. DSG requires a threshold θ to convert the weighted graph to the binary graph before searching for the densest subgraph. We selected a range of θ values that are the same as the values used in DBO: θ = 10, 15, 20, ..., 90.

• Local Density Score (LDS): This is the first stage of our proposed method. It requires the input value k to compute the local density score. Since we do not know the number of returned faces from text-based search engines, we used another input value, fraction, defined as the fraction of neighbors, and estimated k by the formula k = fraction × N, where N is the number of returned faces. We used a range of fraction values for experiments: fraction = 5%, 10%, 15%, ..., 50%. For a large number of returned faces, we set k to the maximum value of 200: k = 200.

• Unsupervised Ensemble Learning Using Local Density Score (UEL-LDS): This combines ranking by local density scores with a second step in which the ranked list is used for training a classifier to boost the rank list.

• Supervised Learning (SVM-SUP): We randomly selected a portion p of the data with annotations to train the classifier, and then used this classifier to re-rank the remaining faces. This process was repeated five times and the average performance was reported. We used a range of portion p values for experiments: p = 1%, 2%, 3%, ..., 5%.

Figure 7. Performance comparison of methods. Due to different settings, performances are superimposed for better evaluation.

Figure 7 shows a performance comparison of these methods. Our proposed methods (LDS and UEL-LDS) outperform other unsupervised methods such as TBL, DBO and DSG. Furthermore, the performance of the DBO and DSG methods is sensitive to the distance threshold, while the performance of our proposed method is less sensitive. This confirms that the similarity measure using shared nearest neighbors is reliable for estimation of the local density score. The performance of UEL-LDS is only slightly better than that of LDS since the training sets labeled automatically from the ranked list are noisy. However, UEL-LDS improves significantly even when the performance of LDS is poor. These performances are worse than that of SVM-SUP using a small number of labeled samples.

Figure 8 shows an example of the top 50 faces ranked using the TBL, DBO, DSG and LDS methods. The performance of DBO is poor since a low threshold is used. This ranks irrelevant faces that are near duplicates (rows 2 and 3 in Figure 8(b)) higher than relevant faces. The same explanation applies to DSG.

4.5.2 Performance of Ensemble Classifiers

In Figure 9, we show the performance of five single classifiers and that of five ensemble classifiers. The ensemble
classifier k is formed by combining single classifiers from 1 to k. It clearly indicates that the ensemble classifier is more stable than single weak classifiers.

4.5.3 New Face Annotation

We conducted another experiment to show the effectiveness of our approach in which learned models are used to annotate new faces of other databases. We used each name in the list as a query to obtain the top 500 images from the Google Image Search Engine (GoogleSE). Next, these images were processed using the steps described in section 4.2: extracting faces, detecting eyes and doing normalization. We projected these faces to the PCA subspace trained for that name and used the learned model to re-rank faces.

There were 4,103 faces (including false positives, i.e. non-faces detected as faces) detected from 7,500 returned images. We manually labeled these faces and there were 2,342 relevant faces. On average, the accuracy of the GoogleSE is 57.08%.

In Table 1, we compare the performance of the methods. The performance of UEL-LDS was obtained by running the best system, which is shown as the peak of the UEL-LDS curve in Figure 7. The performances of SVM-SUP-05 and SVM-SUP-10 correspond to the supervised systems (cf. section 4.5.1) that used p = 5% and p = 10% of the data set respectively. We evaluated the performance by calculating the precision at the top 20 returned faces, which is common for image search engines, and the recall and precision on all detected faces of the test set. UEL-LDS achieved comparable performance to the supervised methods and outperformed the baseline GoogleSE. The precision at the top 20 of SVM-SUP-05 is poorer than that of UEL-LDS due to the small number of training samples. Figure 10 shows top 20 faces ranked using these two methods.

Method        Precision at top 20 (%)   Recall (%)   Precision (%)
GoogleSE      79.33                     100.00       57.08
UEL-LDS       89.00                     72.50        76.41
SVM-SUP-05    85.00                     73.14        76.46
SVM-SUP-10    90.67                     74.94        78.30

Table 1. Comparison of different methods on the new test set returned by Google Image Search Engine.

Our approach works fairly well for well known people, where the main assumption that text-based search engines return a large fraction of relevant images is satisfied. Figure 12 shows an example where this assumption is broken. Consequently, as shown in Figure 13, the model learned by this set performed poorly in recognizing new faces returned by GoogleSE. Our approach solely relies on the above assumption; therefore, it is not affected by the ranking of text-based search engines.

The iteration of bagging SVM classifiers does not guarantee a significant improvement in performance. The aim of our future work is to study how to improve the quality of the training sets used in this iteration.

6 Conclusion

We presented a method for ranking faces retrieved using text-based correlation methods in searches for a specific person. This method learns the visual consistency among faces in a two-stage process. In the first stage, a local density score is used to form a ranked list in which faces ranked at the top or bottom of the list are likely to be relevant or irrelevant faces, respectively. In the second stage, a bagging framework is used to combine weak classifiers trained on subsets labeled from the ranked list into a strong classifier. This strong classifier is then applied to the original set to re-rank faces on the basis of the output probabilistic scores. Experiments on various face sets showed the effectiveness of this method. Our approach is beneficial when there are several faces in a returned image, as shown in Figure 11.

References

[1] O. Arandjelovic and A. Zisserman. Automatic face recognition for film character retrieval in feature-length films. In Proc. Intl. Conf. on Computer Vision and Pattern Recognition, volume 1, pages 860–867, 2005.
[2] M. S. Bartlett, J. R. Movellan, and T. J. Sejnowski. Face recognition by independent component analysis. IEEE Transactions on Neural Networks, 13(6):1450–1464, Nov 2002.
[3] T. L. Berg, A. C. Berg, J. Edwards, and D. A. Forsyth. Who's in the picture? In Advances in Neural Information Processing Systems, 2004.
[4] T. L. Berg, A. C. Berg, J. Edwards, M. Maire, R. White, Y. W. Teh, E. G. Learned-Miller, and D. A. Forsyth. Names and faces in the news. In Proc. Intl. Conf. on Computer Vision and Pattern Recognition, volume 2, pages 848–854, 2004.
[5] D. Bolme, R. Beveridge, M. Teixeira, and B. Draper. The CSU face identification evaluation system: Its purpose, features and structure. In International Conference on Vision Systems, pages 304–311, 2003.
[6] L. Breiman. Bagging predictors. Machine Learning, 24(2):123–140, 1996.
[7] M. M. Breunig, H.-P. Kriegel, R. T. Ng, and J. Sander. LOF: Identifying density-based local outliers. In Proc. ACM SIGMOD Int. Conf. on Management of Data (SIGMOD), pages
[8] C.-C. Chang and C.-J. Lin. LIBSVM: a library for support vector machines, 2001. Software available at
[9] M. Charikar. Greedy approximation algorithms for finding dense components in a graph. In APPROX '00: Proceedings of the Third International Workshop on Approximation Algorithms for Combinatorial Optimization, pages 84–95.
[10] L. Ertoz, M. Steinbach, and V. Kumar. Finding clusters of different sizes, shapes, and densities in noisy high dimensional data. In SIAM International Conference on Data Mining, pages 47–58, 2003.
[11] M. Ester, H.-P. Kriegel, J. Sander, and X. Xu. A density-based algorithm for discovering clusters in large spatial databases with noise. In Proc. ACM SIGKDD Int. Conf. on Knowledge Discovery and Data Mining (SIGKDD), pages
[12] R. Fergus, P. Perona, and A. Zisserman. A visual category filter for Google images. In Proc. Intl. European Conference on Computer Vision, volume 1, pages 242–256, 2004.
[13] M. Kendall. Rank Correlation Methods. Charles Griffin Company Limited, 1948.
[14] E. M. Knorr, R. T. Ng, and V. Tucakov. Distance-based outliers: Algorithms and applications. VLDB Journal: Very Large Data Bases, 8(3-4):237–253, 2000.
[15] L.-J. Li, G. Wang, and L. Fei-Fei. Optimol: automatic online picture collection via incremental model learning. In Proc. Intl. Conf. on Computer Vision and Pattern Recognition, volume 2, pages 1–8, 2007.
[16] D. Ozkan and P. Duygulu. A graph based approach for naming faces in news photos. In Proc. Intl. Conf. on Computer Vision and Pattern Recognition, volume 2, pages 1477–1482,
[17] J. Platt. Probabilistic outputs for support vector machines and comparison to regularized likelihood methods. In Advances in Large Margin Classifiers, pages 61–74, 1999.
[18] M. Turk and A. Pentland. Face recognition using eigenfaces. In Proc. Intl. Conf. on Computer Vision and Pattern Recognition
[19] P. Viola and M. Jones. Rapid object detection using a boosted cascade of simple features. In Proc. Intl. Conf. on Computer Vision and Pattern Recognition, volume 1, pages
[20] T.-F. Wu, C.-J. Lin, and R. C. Weng. Probability estimates for multi-class classification by pairwise coupling. Journal of Machine Learning Research, 5:975–1005, 2004.
[21] W. Zhao, R. Chellappa, P. J. Phillips, and A. Rosenfeld. Face recognition: A literature survey. ACM Computing Surveys,

(a) - TBL - 11 irrelevant faces
(b) - DBO - 17 irrelevant faces
(c) - DSG - 18 irrelevant faces
(d) - LDS - 4 irrelevant faces
Figure 8. Top 50 faces ranked by the methods TBL, DBO, DSG and LDS. Irrelevant faces are marked with a star.

Figure 9. Performance of the ensemble classifiers and single classifiers.

(a) - 5 irrelevant faces
(b) - no irrelevant faces
Figure 10. Top 20 faces ranked by Google Image Search Engine (a) and ranked using our learned model (b). Irrelevant faces are marked with a star.

Figure 11. Image returned by GoogleSE for query 'Gerhard Schroeder'. GoogleSE was unable to accurately identify who the queried person was, while the learned model of our approach accurately identified him.

Figure 12. Example in which the portion of relevant faces is dominant, but it is difficult to group all these faces into one cluster due to large facial variations. In feature space, the largest cluster formed from relevant faces is not the largest cluster among those formed from all returned faces. Irrelevant faces are marked with a star.

Figure 13. Many irrelevant faces annotated using the model learned from the data set shown in Figure 12. Irrelevant faces are marked with a star.