Active reranking for web image search
IEEE TRANSACTIONS ON IMAGE PROCESSING, VOL. 19, NO. 3, MARCH 2010

Active Reranking for Web Image Search

Xinmei Tian, Dacheng Tao, Member, IEEE, Xian-Sheng Hua, Member, IEEE, and Xiuqing Wu

Abstract—Image search reranking methods usually fail to capture the user's intention when the query term is ambiguous. Therefore, reranking with user interactions, or active reranking, is highly demanded to effectively improve the search performance. The essential problem in active reranking is how to target the user's intention. To complete this goal, this paper presents a structural information based sample selection strategy to reduce the user's labeling efforts. Furthermore, to localize the user's intention in the visual feature space, a novel local-global discriminative dimension reduction algorithm is proposed. In this algorithm, a submanifold is learned by transferring the local geometry and the discriminative information from the labelled images to the whole (global) image database. Experiments on both synthetic datasets and a real Web image search dataset demonstrate the effectiveness of the proposed active reranking scheme, including both the structural information based active sample selection strategy and the local-global discriminative dimension reduction algorithm.

Index Terms—Active reranking, local-global discriminative (LGD) dimension reduction, structural information (SInfo) based active sample selection, web image search reranking.

Manuscript received March 04, 2009; revised October 05, 2009. First published November 03, 2009; current version published February 18, 2010. This work was supported by the Nanyang Technological University Nanyang SUG Grant (M58020010), the Microsoft Operations PTE LTD-NTU Joint R&D (M48020065), and the K. C. Wong Education Foundation Award. The associate editor coordinating the review of this manuscript and approving it for publication was Prof. Sharathchandra Pankanti. X. Tian and X. Wu are with the Department of Electronic Engineering and Information Science, University of Science and Technology of China, Hefei 230027, China (e-mail: xinmei@mail.ustc.edu.cn; wuxq@ustc.edu.cn). D. Tao is with the School of Computer Engineering, The Nanyang Technological University, 50 Nanyang Avenue, Blk N4, Singapore 639798 (e-mail: dacheng.tao@gmail.com). X.-S. Hua is with Microsoft Research Asia, Beijing 100190, China (e-mail: xshua@microsoft.com). Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org. Digital Object Identifier 10.1109/TIP.2009.2035866

I. INTRODUCTION

Currently, most of the popular commercial Web image search engines, e.g., Microsoft's Live Image Search and Google Image Search, are built for the "query by keywords" scenario. That is, a user provides a keyword, e.g., "panda", and the search engine returns corresponding images by processing the associated textual information, e.g., file name, surrounding text, URL, etc.

Although text-based search techniques have shown their effectiveness in document search, they are problematic when applied to image search. There are two main problems. One is the mismatching between images and their associated textual information, resulting in irrelevant images appearing in the search results. For example, an image which is irrelevant to "panda" will be mistaken as a relevant image if the word "panda" exists in its surrounding text. The other problem is that the textual information is insufficient to represent the semantic content of the images. The same query words may refer to images that are semantically different, e.g., we cannot differentiate an image of an animal panda from an image of a person whose name is Panda just with the text word "panda".

Because the textual information is insufficient for semantic image retrieval, a natural recourse is the visual information. Recently, a dozen image/video reranking methods [6], [14], [15], [17], [34] have been proposed to exploit the visual information for refining the text-based search result. Most of these reranking methods utilize the visual information in an unsupervised and passive manner. The only exception is IntentSearch [6], which reorders the text-based search result by query by example (QBE), with the query image specified by the user from the initial text-based search result.

Unsupervised reranking methods, e.g., the clustering based algorithm [14], the random walk [15], VisualRank [17] and Bayesian reranking [34], can only achieve limited performance improvements. This is because the visual information is insufficient to infer the user's intention, especially when the query term is ambiguous. For example, "panda" can be either an animal or a person whose name is Panda. Without user interactions, we have no idea which kind of panda images is preferred by the user. However, if user interactions are available, we can learn his/her intention and then rerank the initial search results to achieve a significant performance improvement. For instance, in the query "panda", if the user labels the animal pandas as relevant and other images as irrelevant, different kinds of animal pandas will be returned to the user. In this paper, reranking with user interactions is named active reranking. IntentSearch [6] can be regarded as a simplified active reranking method with only one relevant image labelled by the user.

In active reranking, the essential problem is how to capture the user's intention, i.e., to distinguish query relevant images from irrelevant ones. Different from conventional learning problems, in which each sample has only one fixed label, an image may be relevant for one user but irrelevant for another. In other words, the semantic space is user-driven, according to their different intentions but with identical query keywords. Therefore, we propose to target the user-driven intention from two aspects: collecting labeling information from users to obtain the specified semantic space, and localizing the visual characteristics of the user's intention in this specific semantic space, as detailed in Sections I-A and I-B, respectively.
Although IntentSearch [6] can be deemed as a simplified version of active reranking, i.e., the user's intention is defined by only one query image, it cannot work well when the user's intention is too complex to be represented by one image. As shown in Fig. 3, the query relevant images for "Animal" vary largely in both visual appearance and features, thus we cannot represent "Animal" with only one image. Instead, our proposed active reranking method can learn the user's intention more extensively and completely.

A. Active User's Labeling Information Collection

To collect the labeling information from users efficiently, a new structural information (SInfo) based strategy is proposed to actively select the most informative query images.

It is boring and unacceptable to keep asking a user to label a lot of images in the interaction stage. Thus, it is essential to get the necessary information by labeling as few images as possible. Active learning is well known for reducing labeling efforts by labeling the most informative samples [4], [20]. Conventional active learning strategies can be divided into two categories: the error reduction strategy [12], [25], [43] and the most uncertain (close-to-boundary) strategy [4], [36]. Both of them suffer from the small sample size problem, i.e., the unreliable estimation of the expected error risk and the uncertainty caused by the insufficient labelled samples.

In active reranking, however, only a few images will be labelled by a user. To avoid or alleviate the influence of the small sample size problem, our proposed SInfo sample selection strategy considers two aspects simultaneously: the ambiguity and the representativeness.

The ambiguity denotes the uncertainty about whether an image is relevant or not to the user's intention. Chang et al. [4] and Wang et al. [36] have demonstrated the effectiveness of the ambiguity in active learning for image retrieval. However, their methods are not specific to the reranking problem. In this paper, the ambiguity is considered in a more natural way for reranking; it is derived from the ranking scores, which denote the images' relevance degrees. Besides the ambiguity, the representativeness, another important aspect, is also considered. An image is more representative if it is located in a dense area with many images around it. Labeling a representative sample will bring more information than labeling an isolated one. In active reranking, the representativeness is derived in a totally unsupervised fashion, independent of the learning algorithms, to alleviate the influence of the aforementioned small sample size problem. Experiments on both synthetic data and a real Web image search dataset show that SInfo is much more effective than other strategies, e.g., the most uncertain strategy and the error reduction strategy, in active reranking for Web image search.

B. Visual Characteristic Localization

To localize the visual characteristics of the user's intention, we propose a novel local-global discriminative (LGD) dimension reduction algorithm. Basically, we assume that the query relevant images, which represent the user's intention, are lying on a low-dimensional submanifold of the original ambient (visual feature) space. LGD learns this submanifold by transferring both the local geometry and the discriminative information from labelled images to unlabelled ones. The learned submanifold preserves both the local geometry of labelled relevant images and the discriminative information to separate relevant images from irrelevant ones. As a consequence, we can reduce the well-known semantic gap between low-level visual features and high-level semantics to further enhance the reranking performance on this submanifold.

In the past decades, a dozen dimension reduction algorithms have been proposed, e.g., principal component analysis (PCA) [13], transductive component analysis [23], locally linear embedding (LLE) [27], discriminant LLE [21], ISOMAP [31], nonparametric discriminant analysis [29], semi-supervised discriminant analysis (SDA) [3], biased marginal Fisher's analysis (BMFA) [37], locality preserving projections (LPP) [11], supervised LPP (SLPP) [2], geometric mean for subspace selection [28], local discriminant embedding (LDE) [5], semantic manifold learning (SML) [22], orthogonal Laplacianface [1], maximum margin projection (MMP) [10] and the recently developed correlation metric based methods [8], [9]. However, they are problematic for active reranking in Web image search for the following reasons. Unsupervised methods, e.g., PCA and LLE, exploit a subspace or submanifold on the whole image space but ignore the user's labeling information. As a consequence, these algorithms fail to capture the user-driven intentions. Supervised linear algorithms, e.g., LDA [7] and biased discriminant analysis (BDA) [41], learn a subspace on the labelled set only, so they ignore the submanifold of all relevant images. Supervised manifold learning algorithms, e.g., SLPP and BMFA, cannot transfer the learned submanifold from labelled images to unlabelled images. Although some semi-supervised algorithms, e.g., SML and SDA, have been developed to model both labelled and unlabelled images, they are not designed specifically for active reranking in Web image search. They assume both relevant and irrelevant unlabelled images are drawn from a nonlinear manifold. In Web image search, however, irrelevant images scatter in the whole space, i.e., they may be distributed uniformly, and thus popular manifold regularizations [3], [22] will over-fit to unlabelled images. As a consequence, the performance obtained by popular semi-supervised learning algorithms is poor. This paper presents a new algorithm to target the user's intention. Preliminary experimental results on both synthetic data and a real Web image search dataset demonstrate the effectiveness of the proposed LGD.

The rest of the paper is organised as follows. First, we introduce the overall framework for active reranking in Section II. The SInfo active sample selection strategy is detailed in Section III and the LGD dimension reduction algorithm is presented in Section IV. In Section V, the basic Bayesian reranking algorithm is briefly introduced and the overall procedure of active reranking based on it is given. Experimental results on synthetic datasets and a real Web image search dataset are reported in Sections VI and VII, respectively. In Section VIII, we analyse the important parameters in SInfo and LGD, followed by the conclusion in Section IX.
II. ACTIVE RERANKING FOR WEB IMAGE SEARCH

Fig. 1 shows the proposed general framework for active reranking in Web image search. Take the query term "panda" as an example. When "panda" is submitted to the Web image search engine, an initial text-based search result is returned to the user, as shown in Fig. 1(a) (only the top nine images are given for illustration). This result is unsatisfactory because both person and animal images are retrieved as top results. This is caused by the ambiguity of the query term. Without user interactions, it is impossible to eliminate this ambiguity. In particular, which kind of images, the animal panda or a person whose name is Panda, is the user's intention? Therefore, traditional reranking methods, which improve the initial search results by only utilizing the visual properties of images, cannot achieve good performance.

Fig. 1. Framework for active reranking illustrated with the query "panda". When the query is submitted, the text-based image search engine returns a coarse result (a). Then the active reranking process is adopted to obtain a more satisfactory result (b), by learning the user's intention.

To solve this problem, active reranking, i.e., reranking with user interactions, is proposed. As shown in Fig. 1, four images are first selected according to an active sample selection strategy, and then the user is required to label them. Suppose the user labels the animal pandas as query relevant (marked as relevant in Fig. 1) and the other two images (person, car) as query irrelevant. Then we can learn that the animal panda is the user's intention. To represent this intention, i.e., the animal panda, a discriminative submanifold should be exploited to separate query relevant images from irrelevant ones. A dimension reduction step is thus introduced to localize the visual characteristics of the user's intention.

With the knowledge of the user's intention, including both the labeling information and the learned discriminative submanifold, the reranking process is conducted and different kinds of animal pandas are returned, as shown in Fig. 1(b). Sometimes, several interaction rounds are preferred to achieve a more satisfactory performance.

In summary, there are two key steps in learning the user's intention, i.e., the active sample selection strategy and the dimension reduction algorithm. This paper implements these two steps via a new SInfo sample selection strategy and a novel LGD dimension reduction algorithm, as will be discussed in Sections III and IV, respectively.
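As an illustration, the interaction loop of Fig. 1 can be sketched as follows. This is a minimal sketch only: the helper names select_sinfo, learn_lgd, bayesian_rerank and ask_user are placeholders standing in for the components developed in Sections III-V, not code from the paper.

import numpy as np

def active_rerank(X, r_text, select_sinfo, learn_lgd, bayesian_rerank, ask_user,
                  rounds=4, batch=5):
    """X: (N, D) visual features; r_text: (N,) initial text-based scores in [0, 1]."""
    labels = {}                              # image index -> 1 (relevant) / 0 (irrelevant)
    W = np.eye(X.shape[1])                   # start in the original feature space
    r = bayesian_rerank(X @ W, r_text, labels)   # initial reranking without labels
    for _ in range(rounds):
        batch_idx = select_sinfo(X @ W, r, r_text, labels, batch)  # SInfo selection
        for i in batch_idx:
            labels[i] = ask_user(i)          # user marks relevant / irrelevant
        W = learn_lgd(X, labels)             # LGD projection learned from the labels
        r = bayesian_rerank(X @ W, r_text, labels)                 # rerank in the subspace
    return np.argsort(-r)                    # final ranking, best first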
III. SINFO ACTIVE SAMPLE SELECTION

An SInfo active sample selection strategy is presented to learn the user's intention efficiently; it selects images by considering not only the ambiguity but also the representativeness in the whole image database. Ambiguity and representativeness are two important aspects in active sample selection. Labeling a sample which is more ambiguous will bring more information. On the other side, the information provided by an individual sample can be shared by its neighbors, so more representative samples are preferred for labeling. In SInfo, the ambiguity of an image is measured by the entropy of the relevance probability distribution while the representativeness is measured by the density.

A. Ambiguity

The ambiguity denotes the uncertainty about whether an image is relevant or not. It can be estimated via various sophisticated learning methods, e.g., support vector machine (SVM) [35], transductive SVM (TSVM) [18] and the harmonic Gaussian field method [42], by conducting a binary classification task. However, in active reranking it is direct and reasonable to measure the ambiguity with the ranking scores obtained in the reranking process, for two reasons. One reason is that the reranking problem is essentially different from classification [34], thus the ambiguity estimated via a classification task may not be as accurate as that directly derived in the reranking process. The other reason is that additional cost will be introduced if the ambiguity is estimated via other learning methods. In contrast, measuring ambiguity through the ranking scores avoids this additional cost.

For an image x_i, let r_i ∈ [0, 1] be its ranking score, where r_i = 1 means x_i is definitely query relevant, while r_i = 0 means x_i is totally irrelevant. r_i and 1 - r_i can be regarded as the probabilities of x_i being relevant and irrelevant, respectively. Then the ambiguity can be measured via the information entropy, which is a widely used measurement in information theory. The ambiguity of x_i is

    A_r(x_i) = -r_i·log(r_i) - (1 - r_i)·log(1 - r_i).    (1)

Because the reranking is conducted based on the initial text-based search result [34], the ambiguity in the initial text-based search result should also be taken into account, i.e.,

    A_t(x_i) = -r̄_i·log(r̄_i) - (1 - r̄_i)·log(1 - r̄_i),    (2)

where r̄_i is the initial text-based search ranking score for x_i. By combining (1) and (2), the total ambiguity for x_i is

    A(x_i) = α·A_r(x_i) + (1 - α)·A_t(x_i),    (3)

where α ∈ [0, 1] is a trade-off parameter to control the influence of the two ambiguity terms.
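A small numerical sketch of the ambiguity measure in (1)-(3) follows. The function name, the clipping constant and the example scores are illustrative, not taken from the paper.

import numpy as np

def ambiguity(r, r_text, alpha=0.5, eps=1e-12):
    """Total ambiguity A(x_i) of (3): a convex combination of the entropy of the
    reranked scores r (eq. (1)) and of the initial text-based scores r_text
    (eq. (2)). Scores are clipped away from 0 and 1 so the logarithms stay finite."""
    def entropy(s):
        s = np.clip(np.asarray(s, float), eps, 1.0 - eps)
        return -s * np.log(s) - (1.0 - s) * np.log(1.0 - s)
    return alpha * entropy(r) + (1.0 - alpha) * entropy(r_text)

# Example: an image with reranked score 0.5 is maximally ambiguous,
# one with score 0.99 is nearly unambiguous.
print(ambiguity([0.5, 0.99], [0.8, 0.9], alpha=0.8))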
Fig. 2. Because "A" and "B" have the same distance to the hyper-plane (dashed line), they have an identical ambiguity. However, the more representative sample "A" is preferable to "B".

B. Representativeness

Besides the ambiguity, the representativeness, an important property not well studied before, is also taken into account. Apart from the unreliable estimation caused by insufficient labelled images, the ambiguity measures the importance of the image itself only. Once the Web image search system gets the labeling information of an image, it is very important to consider how many other images can share the labeling information with the labelled one. For example, given two unlabelled samples with identical ambiguity, labeling the more representative one, i.e., the one with many samples distributed around it, will bring more information and achieve a better reranking performance.

To explain this, a simple synthetic dataset is shown in Fig. 2. There are two labelled samples (a big "*" for the query relevant sample and a big "o" for the query irrelevant one) and several unlabelled ones (marked with a black dot "."). These six samples are distributed along a line and the coordinates on the horizontal axis denote their positions. By using SVM [35], the classification hyper-plane, which separates the two labelled samples with the largest margin, crosses position 0, as shown in Fig. 2 with the dashed line. According to the most uncertainty criterion, i.e., the samples closest to the hyper-plane have the maximum ambiguity, "A" and "B" have the maximum and identical ambiguity because they have the same distance, i.e., 0.4 for both, to the hyper-plane. However, if we can choose only one sample for labeling, it is better to label "A" than "B" because more unlabelled samples will share the labeling information of "A".

To avoid the small sample size problem in active sample selection, the representativeness can be estimated in an unsupervised manner. Intuitively, labeling an image in a dense area will be more helpful than labeling an isolated one because the labeling information of the image can be shared with other surrounding images. As a consequence, we measure the representativeness of image x_i via the probability density p(x_i), which can be estimated by kernel density estimation (KDE) [26]:

    p(x_i) = (1 / |N(x_i)|) · Σ_{x_j ∈ N(x_i)} K(x_i - x_j),    (4)

where N(x_i) is the set of neighbors of x_i, x_j is the visual feature of image j, and K(·) is a kernel function that satisfies both K(x) ≥ 0 and ∫ K(x) dx = 1. The Gaussian kernel is adopted in this paper. For the synthetic dataset in Fig. 2, the estimated representativeness is given by the curve p(x).
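A sketch of the density estimate in (4) with a Gaussian kernel over the k nearest neighbours is given below; the neighbourhood size, the bandwidth and the brute-force distance computation are illustrative choices, not fixed by the paper.

import numpy as np

def representativeness(X, k=10, bandwidth=1.0):
    """KDE-style density p(x_i) of (4): average an (unnormalized) Gaussian kernel
    over the k nearest neighbours of each image. X: (N, D) visual features."""
    X = np.asarray(X, float)
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)   # pairwise squared distances
    np.fill_diagonal(d2, np.inf)                          # exclude the image itself
    knn = np.sort(d2, axis=1)[:, :k]                      # k smallest distances per image
    return np.exp(-knn / (2.0 * bandwidth ** 2)).mean(axis=1)

# Images sitting in dense regions receive larger p(x_i) and are therefore
# favoured, other things being equal, by the SInfo criterion defined next.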
C. Active Sample Selection

Since the most informative images should satisfy both ambiguity and representativeness simultaneously, the structural information of image x_i, SInfo(x_i), can be measured by the product of the two terms, i.e., SInfo(x_i) = A(x_i)·p(x_i). Then the most informative image is selected from the unlabelled image set X_U according to

    x* = argmax_{x_i ∈ X_U} SInfo(x_i).    (5)

In practical applications, to provide a good user experience, it is better to ask users to label a small number of images rather than only one image in each round, because users will lose their patience after a few rounds. Thus, a batch mode is utilized to select several images in each round. A simple method is to select the top-k most informative images. The disadvantage of this method is that the selected images may be redundant and cluster in a small area of the high-dimensional feature space. Thus, we seek to select a batch of most informative images while maintaining their diversity at the same time.

The angle-diversity criterion [4] is a good choice to achieve this purpose. This criterion iteratively selects images which are most informative and also diverse with respect to the already selected image set S. For an unlabelled image x_i, the diversity between x_i and S is measured by the minimal angle between x_i and each image x_j ∈ S. Then, the images are selected iteratively according to

    x* = argmax_{x_i ∈ X_U} [ η·SInfo(x_i) + (1 - η)·min_{x_j ∈ S} angle(x_i, x_j) ],    (6)

where η is a trade-off parameter introduced to balance the effects of the two components: the structural information and the angle-diversity.

IV. LGD DIMENSION REDUCTION

In reranking, the N images returned for a certain query term are represented by low-level visual features, i.e., x_i ∈ R^D is the D-dimensional visual feature for image i. The performance of reranking is usually poor because of the gap between the low-level visual features and high-level semantics.

With user interactions, this semantic gap can be reduced significantly. By mining the user's labeling information, we can learn a submanifold to encode the user's intention. This submanifold is embedded in the ambient space, i.e., the high-dimensional visual feature space R^D. In this paper, a linear subspace is used to approximate this submanifold, and then the images can be represented as y_i = W^T x_i with W ∈ R^{D×d}. By using y_i, an improved reranking performance can be further obtained.

This paper presents an LGD dimension reduction algorithm to learn such a W. LGD considers both the local information contained in the labelled images and the global information of the whole image database simultaneously. In detail, LGD transfers the local information, including both the local geometry of the labelled relevant images and the discriminative information in the labelled images, to the global domain (the whole image database). This cross domain transfer process is completed by
    • TIAN et al.: ACTIVE RERANKING FOR WEB IMAGE SEARCH 809building different local and global patches for each image, andthen aligning those patches together to learn a consistent coor-dinate. One patch is a local area formed by a set of neighboringimages. We have three types of images: labelled relevant, la-belled irrelevant, and unlabelled. Therefore, we build 3 types ofpatches, which are: 1) local patches for labelled relevant imagesto represent the local geometry of them and the discriminativeinformation to separate relevant images from irrelevant ones,2) local patches for labelled irrelevant images to represent thediscriminative information to separate irrelevant images fromrelevant ones, and 3) global patches for both labelled and unla- Fig. 3. For query “animal”, the query relevant images vary largely in both ap- pearance (a) and visual features (b). In (b), the utilized 428-D visual featuresbelled images for transferring both the local geometry and the include 225-D color moment, 128-D wavelet texture and 75-D edge distribu-discriminative information from all labelled images to the unla- tion histogram.belled ones. For convenience, we use superscript “ ” to denote the la-belled relevant images and “ ” to denote the labelled irrele- Solving problem (8), we can getvant ones. If there is no superscript, it refers to an arbitraryimage which may be labelled relevant, labelled irrelevant or un- with the local gram matrixlabelled. . To rewrite (7) in a more compact form, we consider its twoA. Local Patches for Labelled Relevant Images parts separately. For the first part, which models the local ge- BDA, a popular dimension reduction algorithm for image re- ometry of relevant imagestrieval, assumes that all query relevant samples are alike whileeach irrelevant sample is irrelevant in its own way [41]. Thus,the relevant samples are required to be close to each other in theprojected subspace. However, this assumption is usually unreli-able in Web image search. The query relevant samples may vary in appearance andcorresponding visual features. For example, in query “animal”,query relevant images are different from each other, as shownin Fig. 3. For this reason, instead of requiring relevant imagesto be close to each other in the projected subspace, it is moreproper to remain the local geometry of the relevant imageswhile separating relevant images from all irrelevant ones. (9)Therefore, the local patch for a labelled relevant imageshould preserve both the local geometry of relevant images and where andthe discriminative information between the relevant imagesand all irrelevant images. This paper models the local patch for with .the low-dimensional representation of the labelled relevantimage as The second part models the discriminative information for separating relevant image from all irrelevant ones, i.e., (7) are ’s nearest neighbors in the labelled rel-evant image set “ ”, and The are itsnearest neighbors in the labelled irrelevant image set “ ”. Thecombination coefficient is a trade-off factor between the twoparts. The first part in (7) is used to preserve the local geometryof labelled relevant images before and after projection, thus thelinear combination coefficient vector is required to recon-struct from its neighboring relevant images with minimalerror (10) where and (8) . Authorized licensed use limited to: ANNA UNIVERSITY. Downloaded on July 20,2010 at 10:56:33 UTC from IEEE Xplore. Restrictions apply.
    • 810 IEEE TRANSACTIONS ON IMAGE PROCESSING, VOL. 19, NO. 3, MARCH 2010 By combining (9) and (10) together into (7), we have us to use PCA here is the rule of Occam’s razor [24], i.e., the utilization of PCA is helpful to avoid the over-fitting caused by using conventional manifold regularizations. To illustrate the advantage of global patches for dimension re- duction, a synthetic example is shown in Fig. 4. Fig. 4(a) shows the synthetic 3-D dataset and its projection on the 2-D planes for a nice view. In this dataset, there are 8 labelled samples, 4 relevant and 4 irrelevant, accompanied with abundant unla- belled samples. The relevant samples are all marked by “*”, with big red “*” for the 4 labelled relevant samples and small black “*” for the unlabelled relevant ones. The irrelevant samples areB. Local Patches for Labelled Irrelevant Images marked by “o”, where big blue “o” and small green “o” denote Discriminative information is also partially encoded in all ir- the labelled and unlabelled irrelevant samples respectively. Therelevant images, so we construct local patches for labelled irrel- irrelevant samples scatter in the space and the relevant imagesevant images by separating each irrelevant image from all rel- are distributed on a manifold approximately.evant images. Because each irrelevant image is irrelevant in its We have tried many different dimension reduction algo-own way, it could be unreasonable to keep the local geometry rithms and the results are illustrated in Fig. 4(b)–(k). Forof the irrelevant images. In this paper, we model the local patch each dimension reduction algorithm, we have computed thefor the low-dimensional representation of labelled irrelevant projection plane (the upper part of subfigure) and the projectedimage as 2-D data (the lower part of subfigure). With these conventional algorithms, the relevant and irrelevant samples are overlapped in the projected subspace and the submanifold of the relevant (11) samples is not well preserved, as illustrated in the figure. This is caused by the problems existing in these algorithms asThe is ’s nearest neighbors in the labelled aforementioned.relevant image set “ ”. The matrix can be calculated in the To avoid these problems, the proposed LGD learns theway similar to that of computing in (10) by setting submanifold by transferring both the local geometry and theand . discriminative information from labelled samples to all un- labelled samples. Global patches are built for each sampleC. Global Patches for All Images (including both labelled and unlabelled) to complete the cross In active reranking, users would like to label only a small domain knowledge transferring process. According to the align-number of images, so it is lavish and unreasonable to abandon a ment scheme in [40], the global patch for the low-dimensionallarge number of unlabelled images. With only the labelled im- representation of the image is modeled in a similar way toages, the learned subspace will bias to that spanned by these local patcheslabelled images and cannot generalize well to the large amount (12)of unlabelled data. Therefore, some semi-supervised methodshave been proposed which also take the unlabelled images into where is the centroid of the projected low-dimensional fea-utilization. However, because only relevant images are lying on ture. 
Here we use a variant version of the original definition ofan unknown manifold and the distribution of irrelevant images PCA to achieve a formula-level consistency for both local andis nearly flat, conventional manifold regularizations which as- global patches.sume both relevant and irrelevant samples are drawn from un- We rewrite (12) asknown manifolds prone to over-fit to unlabelled samples. As aconsequence, another method will be considered in this paperto model unlabelled images in active reranking. To make use of both the labelled and unlabelled images, themost important thing is to exploit the information contained inthem. Inspired by the main idea in the cross domain learning[16] and the transfer learning [32], in this paper, we introducethe global patches to both labelled and unlabelled images. Theglobal patches transfer the local geometry and the discrimina-tive information, which is exploited in the domain of labelledimages, to the domain of unlabelled images. With the global where withpatches, we aim to preserve the principal subspace to keep the are the rest images beyond , vectorsubmanifold of relevant images. The noise information con- andtained in the ambient space should be eliminated. The principalcomponent analysis (PCA) is a suitable choice, which maxi- .mizes the mutual information between the ambient space and By combining both local and global patches, LGD approxi-the corresponding projected subspace [13]. Another reason for mates the intrinsic submanifold of relevant samples, as shown Authorized licensed use limited to: ANNA UNIVERSITY. Downloaded on July 20,2010 at 10:56:33 UTC from IEEE Xplore. Restrictions apply.
    • TIAN et al.: ACTIVE RERANKING FOR WEB IMAGE SEARCH 811Fig. 4. Three-dimensional synthetic dataset for dimension reduction illustration. In this dataset, big red “*” and big blue “o” denote labelled relevant and irrelevantsamples, respectively. Small black “*” and small green “o” are unlabelled relevant and irrelevant samples, respectively. As given in (b), LGD reveals the submani-fold of the relevant samples and separates the relevant samples from the irrelevant ones in the projected 2-D subspace. When other dimension reduction algorithmsare adopted, the relevant and irrelevant samples are overlapped in the projected subspace, as shown in (c)–(k). (a) The 3-D synthetic data and its 2-D projections onthe three planes, i.e., XY, XZ, and YZ, (b) LGD, (c) Local patches, (d) Global patches, (e) LGD-LPP, (f) BDA, (g) BMFA, (h) LDE, (i) SLPP, (j) SDA, (k) SML.in Fig. 4(b). Relevant samples can be separated from irrelevant Then, we can combine all the patches defined in (7), (11), andones in the projected 2-D subspace. Besides, we show results (12) togetherof only local patches and only global patches for dimension re-duction in Fig. 4(c) and (d), respectively. Neither of them canperform well. To investigate the effectiveness of the PCA based globalpatches, we replace them with LPP based patches, which arebuilt in a similar way for each sample. We name this LPP basedLGD as LGD-LPP and show its performance in Fig. 4(e). Thisresult is unsatisfactory because LPP assumes there is a manifoldfor both labelled and unlabelled samples which violates thetrue distribution of irrelevant samples. On the other hand, byusing PCA based global patches, the subspace with maximumvariance is preserved, so manifold structure of relevant samplescan also be preserved. By integrating global patches and localpatches, we can discover the intrinsic submanifold of relevantsamples, and separate relevant samples from irrelevant samples.D. Patch Coordinate Alignment Each patch has its own coordinate system. With the calculatedlocal and global patches, we can align them together into a con-sistent coordinate. For each imagecan be rewritten as , where and is the selection matrix. The is defined ac-cording to [38]–[40] as (13) wherewhere is the index vector for samples in . and is a control parameter. Authorized licensed use limited to: ANNA UNIVERSITY. Downloaded on July 20,2010 at 10:56:33 UTC from IEEE Xplore. Restrictions apply.
    • 812 IEEE TRANSACTIONS ON IMAGE PROCESSING, VOL. 19, NO. 3, MARCH 2010 By imposing , the projection matrix When applying the Bayesian reranking for active reranking, can be obtained by solving the standard eigende- modifications will be made to incorporate the new obtained in-composition problem formation, i.e., the images’ labels obtained from SInfo and the effective feature learned via LGD. For a labelled image, its (14) is set as its ground truth label (“1” for relevant and “0” for irrelevant) and large (set as 100 in this paper) is adopted towhere is consisting of the eigenvectors corresponding to the ensure equal or very close to its ground truth label. The graph largest eigenvalues. is built with the learned to model the visual consistency precisely. V. BAYESIAN RERANKING In active reranking, at the very beginning, the Bayesian reranking is performed in the original feature space without To verify the effectiveness of the proposed active reranking labelled images. Then, with the derived , SInfo is conductedmethod, we apply the SInfo active sample selection strategy to select informative images for labeling. By interacting withand the LGD dimension reduction algorithm to reranking. Both the user, the labels of these images are obtained with which theSInfo and LGD are general and can be directly applied to var- effective feature is learned via LGD. With the latest labelledious reranking algorithms, e.g., VisualRank [17]. In this paper, image set as well as , Bayesian reranking is performed towe take the Bayesian reranking [34] as the basic reranking al- derive a new . The final reranking result is obtained by sortinggorithm for illustration. the images according to in a descending order. We first give a brief introduction for Bayesian reranking. In Usually, several interaction rounds are performed to achievethis method, reranking is explicitly formulated into a global op- a satisfactory performance. Therefore, in next interaction round,timization problem. The optimal reranked score list is ob- SInfo and LGD are performed with the new obtained in the lasttained by minimizing the following energy function: round. The overall procedure of our active reranking is summa- rized as follows: (15) 1: Initialization: the image set , the number of interactionwhere is the initial text search score list, is rounds T, labelled image set and .a trade-off parameter and is a graph which is constructed with 2: /* Perform Bayesian reranking to get */nodes being the images and the weights being their visual simi- Bayesian reranking .larities, and is the regularizer, which will be detailed 3: For to T dobelow. 1) /* Perform SInfo to select a set of image */ The two terms on the right hand side of (15) correspond to SInfotwo assumptions, i.e., the visual consistency and the ranking /* Update */consistency, respectively. The first term, i.e., the regularizationterm, penalizes the ranking score inconsistency within visually 2) /* Perform LGD to learn a new */similar samples. The second term is the ranking distance termwhich penalizes the derivation of the reranked results from the 3) /* Perform Bayesian reranking to derive a new */initial text-based search results. Bayesian reranking For the regularization term, the local kernel is adopted 4: End for 5: Return (16) VI. EXPERIMENTS ON SYNTHETIC DATASETSwhere is the local kernel matrix [33]. 
A point-wise distance In this section, we used three synthetic datasets to illustrateis adopted for the ranking distance the effectiveness of the SInfo sample selection strategy, as shown in Fig. 5 (top). In each dataset, the relevant samples (17) are marked with red stars (“*”) while the irrelevant ones are marked with blue circles (“o”). The initial ranking score list was set randomly since weWith (16) and (17), we obtain had no textual information to simulate the text-based search process. At the beginning stage, one relevant and one irrele- vant sample were randomly selected as the labelled set and the rest were taken as the unlabelled. The initial reranked results [“RerankInitial” curve in Fig. 5 (bottom)] were obtained by reranking without user interactions. Parameters in each method were determined empirically in this paper to achieve its bestwhere with . Then, a closed- performance.form solution for is given by In each interaction round, only one sample was selected for labeling. For each dataset, we have given the reranked results (18) after 4 interaction rounds with different active sample selection Authorized licensed use limited to: ANNA UNIVERSITY. Downloaded on July 20,2010 at 10:56:33 UTC from IEEE Xplore. Restrictions apply.
    • TIAN et al.: ACTIVE RERANKING FOR WEB IMAGE SEARCH 813 Fig. 5. Active reranking on synthetic datasets.strategies. We performed 100 random trials and showed the av- Yahoo), we only know ranks of images in the text-based searcheraged performance, measured by the widely used noninterpo- and their scores are not available. According to [14], the nor-lated Average Precision (AP) [30]. The AP averages the preci- malized rank is adopted as the pseudo score, forsion values obtained when each relevant image occurs. the th ranked image, where and is the number We compared SInfo with other three sample selection of images returned by the Web search engine for a query term.strategies, i.e., “Error Reduction” [43], “Most Uncertain” [4] For active sample selection, five images were selected to in-and “Random”. In “Most Uncertain”, the most ambiguity teract with the user in each interaction round and four roundssamples are selected for interaction according to (3). While were considered. Therefore, for each query, there were 20 im-in “Random”, the query samples are selected randomly. The ages labelled by the user totally. The performance is also mea-comparison results, as shown in Fig. 5 (bottom), demonstrate sured by average precision (AP) [30]. We calculated the APs atthat the proposed strategy outperforms the rival methods different positions from top-1 to top-100 to obtain the AP curve.consistently on all three datasets. This is because “Error Reduc- We averaged the APs over all the 105 queries to get the meantion” and “Most Uncertain” suffer from the small sample size average precision (MAP) for overall performance evaluation.problem. SInfo is more robust because it takes both ambiguityand representativeness into consideration, and thus alleviates A. Active Reranking With SInfothe influence of the small sample size problem. In this section, we will investigate the effectiveness of SInfo sample selection strategy and compare it with other three VII. EXPERIMENTS ON WEB IMAGE SEARCH DATASET methods: “Error Reduction” [43], “Most Uncertain” [4], and We also conducted experiments on a real Web image search “Random.” To be noted, here both the reranking and the activedataset. In this dataset, there are 105 queries selected seriously sample selection were conducted in the original feature space.from a commercial image search engine query log as well as The effectiveness of the LGD dimension reduction algorithmpopular tags of Flickr. These queries cover a large range of will be discussed in Section VII-B, in comparing with othertopics, including named person, named object, general object representative ones.and scene. For each query, a maximum of 1 000 images returned Fig. 6 summarizes the comparison results. The “Baseline”by commercial image search engines, i.e., Google, Live and curve gives the performance of the text-based search results andYahoo, were collected as the initial text-based search results. the “RerankInitial” curve is the performance of the unsuper-This dataset contains 94 341 images in total. For each query, vised reranking without user interactions. The “SInfo”, “Errorthree participants were asked to judge whether the returned im- Reduction”, “Most Uncertain”, and “Random” curves denoteages are query relevant or irrelevant. An image is labelled as the performances of the reranked results with query images se-query relevant if at least two of the three participants judged it lected according to these four strategies respectively.as relevant, and vice versa. Fig. 
6 shows the effectiveness of the proposed active Images are represented by 428-D low-level visual features, reranking framework as well as the superiority of the proposedincluding 225-D color moment in LAB color space, 128-D SInfo sample selection strategy. Curves in this figure showwavelet texture as well as 75-D edge distribution histogram. that user’s labeling information helps enhance the rerankingFor the initial text search score list , because images are all performance. User interactions can improve the average perfor-downloaded from Web search engines (e.g., Google, Live and mance, no matter which sample selection strategy is adopted. Authorized licensed use limited to: ANNA UNIVERSITY. Downloaded on July 20,2010 at 10:56:33 UTC from IEEE Xplore. Restrictions apply.
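For reference, the non-interpolated average precision used throughout these experiments averages the precision values at the ranks where relevant images occur. A small sketch is given below; the truncation argument and the AP@k normalization convention are illustrative choices, not specified by the paper.

import numpy as np

def average_precision(relevance, k=None):
    """Non-interpolated AP: average the precision at each position where a relevant
    image occurs, optionally truncated at rank k (one common AP@k convention).
    relevance: sequence of 0/1 flags in ranked order."""
    rel = np.asarray(relevance, int)[:k]
    if rel.sum() == 0:
        return 0.0
    precision_at_hits = np.cumsum(rel) / (np.arange(len(rel)) + 1.0)
    return float((precision_at_hits * rel).sum() / rel.sum())

# MAP is the mean of AP over all queries, e.g.
# map_at_100 = np.mean([average_precision(r, k=100) for r in per_query_relevance])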
    • 814 IEEE TRANSACTIONS ON IMAGE PROCESSING, VOL. 19, NO. 3, MARCH 2010 Fig. 6. MAP over all queries with different sample selection strategies. Fig. 7. MAP over all queries with different dimension reduction algorithms.Moreover, among these four strategies, SInfo performs bestand achieves a significant performance improvement. This isbecause SInfo considers both the ambiguity and the represen-tativeness while the “Most Uncertain” and “Random” onlytake one side of them into account. For “Error Reduction” and“Most Uncertain”, they both suffer from the small sample sizeproblem while our method alleviates this influence by takingrepresentativeness into account in an unsupervised manner.B. Active Reranking With LGD To test the effectiveness of LGD discussed in Section IV,we conducted the active reranking in the projected subspaceby using different dimension reduction algorithms. The SInfosample selection strategy was adopted in this experiment. We compared LGD with several representative algorithms,including unsupervised algorithm, i.e., PCA [13], supervised Fig. 8. Performance of SML-PCA.ones, i.e., BDA [41], LDE [5] and SLPP [2], as well as semi-su-pervised ones, i.e., SML [22], SDA [3] and LGD-LPP. The sub-space dimension was set to 100 for all algorithms empirically. class are sampled from a Gaussian. However, in Web imageFig. 7 shows the results. The “SInfo” curve denotes the reranked search, each irrelevant image is irrelevant in its own way andresults of active reranking which is conducted in the original thus images in the irrelevant class are not similar to each other,feature space without dimension reduction with the samples se- i.e., it is inconvenient to assume that irrelevant images are fromlected via SInfo. This curve is identical to the “SInfo” curve in an identical Gaussian. Therefore, SDA performed poorly. SMLFig. 6. The performance of reranking via different dimension assumes that all images are sampled from a nonlinear mani-reduction algorithms is denoted as SInfo+DR algorithm name, fold. In image search, irrelevant images usually scatter in thee.g., “SInfo LGD” for performance of LGD. whole space, i.e., they may be distributed uniformly. SML is Fig. 7 shows that LGD performs best among these algorithms prone to over-fit to unlabelled images because of the improperand achieves a more satisfactory performance than “SInfo”. It manifold regularization assumption. To justify this point, wereflects the effectiveness of LGD in localizing the visual charac- replaced the Laplacian regularization in SML with the globalteristics of the user intention. For the other dimension reduction patches in LGD. This method is denoted as SML-PCA. Thealgorithms, reranked performances are either slightly improved experimental results of SML-PCA with varying trade-off pa-or dramatically decreased. PCA fails to capture the user-driven rameter (controls the influence of global patches) are given inintention since it ignores the labeling information. BDA, LDE, Fig. 8. The figure shows that SML-PCA performs much betterand SLPP, which are all supervised dimension reduction algo- than SML, but not as well as LGD. The result of LGD-LPP fur-rithms, only utilize a few labelled images. 
Thus, the subspace ther confirms that improper manifold regularization is harmful.learned by them is biased to that spanned by several labelled In contrast with them, the proposed LGD duly learned the sub-images and cannot generalize well to the large amount of unla- manifold of the relevant images and overcome the difficultiesbelled ones. discussed above by preserving the local geometry of the labelled For semi-supervised algorithms, SDA is unsuitable for the relevant images through local patches and the global structurereranking task because it assumes that images in an identical of the whole image set via global patches. In Figs. 19 and 20, we Authorized licensed use limited to: ANNA UNIVERSITY. Downloaded on July 20,2010 at 10:56:33 UTC from IEEE Xplore. Restrictions apply.
    • TIAN et al.: ACTIVE RERANKING FOR WEB IMAGE SEARCH 815Fig. 9. Performance of LGD with samples selected via random and SInfo Fig. 10. Performance of SInfo with different . The solid line indicates therespectively. performance of “RerankInitial”, i.e., reranking without user interactions.further illustrate the active reranked results on queries “George tors: in (3) for SInfo, in (7) and in (13) for LGD. And thenW. Bush” and “zebra”. For each query, the top-20 ranked im- we investigate the influence of the interaction rounds of activeages are shown for both the text-based search result and the ac- sample selection and the dimension of the projected feature intive reranked result. For a nice view, we mark the query irrele- LGD. The mean AP averaged over AP@1 to AP@100 is uti-vant images appeared in the result with cross “ ”. These figures lized for overall performance evaluation.show that the proposed active reranking method is effective totarget user’s intention. A. Evaluation on Ambiguity Trade-Off ParameterC. LGD With Random Sample Selection The in (3) plays an important role in balancing the am- biguity estimation, which is one of the two critical aspects In Section VII-B, we have shown that, when samples are se- in SInfo. With close to 1, the ambiguity is derived entirelylected via SInfo, the performance of reranking conducted in from the reranked result and the ambiguity contained in thethe original feature space, i.e., the “SInfo” curve in Fig. 7, is text search prior is ignored. Fig. 10 shows the performance ofconsistently improved when LGD is utilized. As illustrated in SInfo subject to different . In this experiment, the rerankingFig. 7, “SInfo+LGD” performed better than “SInfo”. To verify is conducted in the original feature. The “RerankInitial”, i.e.,the sensitivity of LGD to sample selection strategy, we fur- reranking without user interactions, is also given for compar-ther conducted experiments for LGD when samples were ran- ison, denoted by the solid line in Fig. 10.domly selected. The experimental results are given in Fig. 9, in Fig. 10 shows that the performance of SInfo increases whenwhich the result of LGD with SInfo is also given for compar- growing and arrives at the peak with . This value isison. From this figure, we can see that “Random LGD” out- close to the best setup for the text search prior that have been re-performs “Random” and “SInfo+LGD” outperforms “SInfo”. ported in other applications which is around 0.85 [15], [17]. ByIt demonstrates the robustness of LGD to varying sample se- further comparing with “RerankInitial”, we can see that SInfolection strategies. Further comparing the performance of LGD outperforms it consistently no matter which is adopted. It il-with “Random” and “SInfo”, we can see that “SInfo LGD” lustrates the effectiveness of SInfo for reranking.achieves better performance than “Random+LGD”. This is be-cause more informative samples are selected in “SInfo” and thus B. Evaluation on Local Patch Trade-Off Parameterwith which LGD can learn the user intentions more effectively. We also investigated the influence of the trade-off parameterIn other words, a better active sample selection algorithm can in (7) for LGD when building the local patch for labelledbring more benefits to LGD. This phenomenon shows that both relevant images. 
A large reflects the importance of separatingsample selection and dimension reduction are important for ac- irrelevant samples from relevant ones, i.e., the discriminativetive reranking and thus should be elaborately developed. information, with less attention given to the local geometry of relevant images. Fig. 11 shows the performance of LGD with VIII. PARAMETER SENSITIVITY different , from which we can have the following observations. In this section, we analyse the sensitivity of important pa- • When is small, e.g., less than 0.3, the performance isrameters in SInfo and LGD for active reranking. The analyses unsatisfactory and even worse than “SInfo” (solid line inare performed based on the experiments conducted on the Web Fig. 11). This is because that in this situation the local ge-image search dataset. The experiments are conducted with SInfo ometry within labelled relevant images is mainly preservedactive sample selection and LGD dimension reduction, if not ex- while important discriminative information is less consid-plicitly stated otherwise. We first analyse some important fac- ered. This phenomenon reveals the importance of the dis- Authorized licensed use limited to: ANNA UNIVERSITY. Downloaded on July 20,2010 at 10:56:33 UTC from IEEE Xplore. Restrictions apply.
    • 816 IEEE TRANSACTIONS ON IMAGE PROCESSING, VOL. 19, NO. 3, MARCH 2010 Fig. 13. Average AP over the first three pages of results.Fig. 11. Performance of LGD with different
    • . The solid line indicates the per-formance of “SInfo”, i.e., active reranking in the original feature space withoutdimension reduction. Fig. 14. Comparison of the average number of irrelevant images per query.Fig. 12. Performance of LGD with different . The solid line indicates the per-formance of “SInfo”, i.e., active reranking in the original feature space withoutdimension reduction. criminative information contained in the labelled relevant and irrelevant images. • The performance of LGD increases when growing and reaches the optimal value at . However, the AP decreases when larger than this best setup and gives a steady performance when in which case the dis- criminative information dominates the local patch and the local geometry is ignored. Therefore, both the local geometry and the discriminate infor-mation reflect the information contained in local patches fromdifferent aspects for complimentary. A suitable combination ofthem is essential to achieve a good performance. Fig. 15. Performance of LGD with different interaction rounds.C. Evaluation on Local-Global Patch Trade-Off Parameter Both the local and global patches reflect data information , only global patches are involved and LGD degrades tofrom different aspects. To investigate the contributions of these PCA in this case. A proper is demanded to balance them. Ac-two parts, we have tested the performance of LGD with different cording to our empirical comparisons, the best setup for istrade-offs . When , only local patches are utilized. When 0.03, as shown in Fig. 12. The solid line in this figure indicates Authorized licensed use limited to: ANNA UNIVERSITY. Downloaded on July 20,2010 at 10:56:33 UTC from IEEE Xplore. Restrictions apply.
    • TIAN et al.: ACTIVE RERANKING FOR WEB IMAGE SEARCH 817Fig. 16. Performance curves of
    • with different number of labelled images. (a) # Labeled = 5, (b) # Labeled = 10, (c) # Labeled = 15, (d) # Labeled = 20.the performance of “SInfo”, i.e., reranking in the original fea- D. Evaluation on Number of Interaction Rounds for Activeture space without dimension reduction. Fig. 12 shows that LGD Sample Selectionoutperforms it consistently with various and LGD is robust. More labelled images will bring more information and thus As shown in Fig. 12, the improvement of LGD over PCA oc- a better performance can be achieved. However, users usuallycurs in a range of [0.01, 0.05] for . This range seems to be a lose their patience after a few interaction rounds. Therefore, itlittle narrow. However, it is worth emphasizing that only a small is important to find out a good trade-off between the rerankingpart of images (around % in experiments, half for performance and the number of the interaction rounds. In thisrelevant and half for irrelevant images) are labelled. As a con- experiment, we investigated the performance of reranking withsequence, the number of global patches is much more than that interaction rounds varying from 1 to 20. In each round, 5 imagesof the labelled relevant and irrelevant patches. After eliminating are selected via SInfo for labeling. LGD is adopted to learn theissue of the patch number imbalance, the range for is mod- effective subspace for reranking.erate, i.e., it is around [1.0, 5.0]. The experimental results are illustrated in Fig. 15. Zero For the comparison between LGD and PCA, in Fig. 12, we interaction round means that the reranking conducted withoutonly give the overall performance of mean AP averaging over user interactions, i.e., the “Reranking Initial”. When interactiontop-1 to top-100 ranked images. We refer the reader back to round increases from 0 to 4, the performance receives dramaticFig. 7 for sufficient details. Fig. 7 shows that LGD outperforms improvements steadily. However, when more interactionsPCA consistently on top-1 to top-100 ranked images. It is worth are performed, the performance increases slowly and evenemphasizing that it is very difficult to improve the baselines for shows slightly decreasing at certain rounds. As a consequence,Web data based applications and 1% improvement is usually ac- reranking with 4 interaction rounds is a good choice by consid-knowledged, e.g., TRECVID [19]. The top-20 images are im- ering both the reranking performance and user tolerance.portant in Web search because they are displayed on the firstpage and dominant the user’s evaluation of the search results. E. Influence of Labelled Image Size on Model ParametersComparing with PCA, much more improvements are obtainedby LGD, i.e., LGD finds at least one more relevant image for In Sections VIII-B and C, we have discussed the influencetop-20 ranked images every five runs. This is practically signif- of parameters and in LGD to the reranking performanceicant. Fig. 13 shows the average performance of LGD versus when 20 images (4 interaction rounds with 5 images labelledPCA over top-1 to top-20, top-21 to top-40, top-41 to top-60, per round) are labelled. In this section, we turn to investigateand top-1 to top-60 ranked images, which corresponds to the the influence of the number of labelled images on these modelfirst 3 pages of results (assuming 20 images are displayed on parameters. Fig. 16 shows the performance curves of witheach page). LGD improves PCA consistently. different number of labelled images while Fig. 
D. Evaluation on Number of Interaction Rounds for Active Sample Selection

More labelled images bring more information, and thus better performance can be achieved. However, users usually lose patience after a few interaction rounds, so it is important to find a good trade-off between the reranking performance and the number of interaction rounds. In this experiment, we investigated the reranking performance with the number of interaction rounds varying from 1 to 20. In each round, 5 images are selected via SInfo for labeling, and LGD is adopted to learn the effective subspace for reranking.

The experimental results are illustrated in Fig. 15. Zero interaction rounds means that the reranking is conducted without user interactions, i.e., the "Reranking Initial" baseline. As the number of interaction rounds increases from 0 to 4, the performance improves steadily and substantially. When more interactions are performed, however, the performance increases only slowly and even decreases slightly at certain rounds. Consequently, reranking with 4 interaction rounds is a good choice considering both the reranking performance and the user's tolerance.
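The procedure evaluated in this experiment can be viewed as a simple loop: in each round SInfo selects a handful of images, the user labels them, LGD re-learns the subspace from all labels collected so far, and the list is reranked. The Python schematic below is our own summary of that loop; select_fn, label_fn, learn_fn, and rerank_fn are placeholders standing in for the SInfo selection, the user's feedback, the LGD solver, and the reranking step, and are not implementations of them.

    def active_reranking(initial_ranking, features,
                         select_fn, label_fn, learn_fn, rerank_fn,
                         rounds=4, per_round=5):
        # 4 rounds x 5 labels per round was the trade-off suggested by Fig. 15.
        labeled = {}                      # image id -> 1 (relevant) / 0 (irrelevant)
        ranking = list(initial_ranking)
        for _ in range(rounds):
            # 1. pick the most informative unlabelled images (ambiguity + representativeness)
            queries = select_fn(ranking, features, labeled, per_round)
            # 2. ask the user to judge them
            labeled.update({img: label_fn(img) for img in queries})
            # 3. learn the intention-specific subspace from all labels so far
            projection = learn_fn(features, labeled)
            # 4. rerank the text-based result in the learned subspace
            ranking = rerank_fn(ranking, features, projection, labeled)
        return ranking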
E. Influence of Labelled Image Size on Model Parameters

In Sections VIII-B and VIII-C, we discussed how the two trade-off parameters of LGD, introduced in (7) and (13), influence the reranking performance when 20 images are labelled (4 interaction rounds with 5 images labelled per round). In this section, we turn to investigate how the number of labelled images affects these model parameters. Fig. 16 shows the performance curves of the parameter in (7) under different numbers of labelled images, while Fig. 17 shows those of the parameter in (13).

Fig. 16. Performance curves of the parameter in (7) with different numbers of labelled images. (a) 5 labelled, (b) 10 labelled, (c) 15 labelled, (d) 20 labelled.
Fig. 17. Performance curves of the parameter in (13) with different numbers of labelled images. (a) 5 labelled, (b) 10 labelled, (c) 15 labelled, (d) 20 labelled.

The parameter in (7) balances the influence of the local geometry and the discriminative information in the labelled relevant patches. A larger value places more emphasis on separating the labelled relevant images from the irrelevant ones, while a smaller value pays more attention to the local geometry of the relevant images. Fig. 16(a) shows that when only 5 images are labelled, a smaller value (less than 0.3) gives better performance, indicating that the local geometry is more important. Because the irrelevant images are much more diverse than the relevant ones, over-fitting may occur if too much emphasis is placed on the discriminative information with only a few labelled images. When more images are labelled, the discriminative information becomes more reliable and a larger value is preferred. Fig. 16(c) and (d) show that the best performance is then achieved when the parameter is around 1.0, which means the local geometry and the discriminative information are equally important.

The parameter in (13) controls the influence of the global patches. Fig. 17(a) shows that a larger value is preferred when fewer images are labelled: with few labels, the labelled images carry little information and the global patches play the main role. Fig. 17(d) shows that when the number of labelled images grows, the discriminative information and the local geometry become robust, and a smaller value gives better performance.

F. Evaluation on Dimension of the Projected Subspace

LGD aims to learn a submanifold of the ambient visual feature space that expresses the user's intention. To determine a proper dimension for the projected features, the following experiment investigates the influence of the subspace dimension. Fig. 18 shows the performance of LGD with features projected onto subspaces of different dimensions. When the dimension is too low, e.g., below 50, the learned subspace is insufficient to encode the intention and the reranking performance is poor. When the dimension equals or is close to that of the ambient feature space, i.e., 428 in this paper, little or no benefit is obtained from LGD. In our experiments, active reranking achieved its best performance with a dimension of 100, which gives a good trade-off. Moreover, a lower dimension reduces the computational cost of active reranking.

Fig. 18. Performance of LGD with features projected onto subspaces of different dimensions.
Fig. 19. Query "George W. Bush".
Fig. 20. Query "zebra".
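Whatever solver produces the LGD projection, the dimension studied here is simply the number of basis directions kept before mapping the 428-dimensional features into the subspace. The sketch below is a generic illustration of that step, written by us: the eigen-decomposition of a symmetric objective matrix stands in for the actual LGD optimisation and is not the paper's algorithm.

    import numpy as np

    def learn_basis(objective_matrix, dim=100):
        # Keep the `dim` leading eigenvectors of a symmetric objective matrix
        # as the projection basis (stand-in for the LGD solver).
        eigvals, eigvecs = np.linalg.eigh(objective_matrix)   # ascending eigenvalues
        order = np.argsort(eigvals)[::-1]                     # largest first
        return eigvecs[:, order[:dim]]                        # shape: (428, dim)

    def project(features, basis):
        # Map ambient features, e.g. an (n_images, 428) matrix, into the subspace.
        return features @ basis                               # shape: (n_images, dim)

Sweeping dim from, say, 25 up to 428 and reranking in each resulting subspace reproduces the kind of curve shown in Fig. 18; around 100 dimensions was the reported sweet spot.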
IX. CONCLUSION

This paper has presented a novel active reranking framework for Web image search that exploits user interactions. To target the user's intention effectively and efficiently, we have proposed an active sample selection strategy and a dimension reduction algorithm, which reduce the labeling effort and learn the visual characteristics of the intention, respectively. To select the most informative query images, the structural information based active sample selection strategy takes both ambiguity and representativeness into consideration. To learn the visual characteristics, the new local-global discriminative dimension reduction algorithm transfers the local information from the domain of the labelled images to the whole image database. The experiments on both synthetic datasets and a real Web image search dataset have demonstrated the effectiveness of the proposed active reranking scheme, including both the sample selection strategy and the dimension reduction algorithm.

REFERENCES

[1] D. Cai, X. He, J. Han, and H.-J. Zhang, "Orthogonal Laplacianfaces for face recognition," IEEE Trans. Image Process., pp. 3608–3614, 2006.
[2] D. Cai, X. He, and J. Han, "Using graph model for face analysis," Tech. Rep., Comput. Sci. Dept., Univ. Illinois at Urbana-Champaign, 2005.
[3] D. Cai, X. He, and J. Han, "Semi-supervised discriminant analysis," in Proc. IEEE Int. Conf. Computer Vision, 2007, pp. 1–8.
[4] E. Y. Chang, S. Tong, K. Goh, and C.-W. Chang, "Support vector machine concept-dependent active learning for image retrieval," IEEE Trans. Multimedia, 2005.
[5] H.-T. Chen, H.-W. Chang, and T.-L. Liu, "Local discriminant embedding and its variants," in Proc. IEEE Int. Conf. Computer Vision and Pattern Recognition, 2005, pp. 846–853.
[6] J. Cui, F. Wen, and X. Tang, "Real time Google and Live image search re-ranking," presented at the ACM Int. Conf. Multimedia, 2008.
[7] R. A. Fisher, "The use of multiple measurements in taxonomic problems," Ann. Eugen., pp. 179–188, 1936.
[8] Y. Fu and T. Huang, "Image classification using correlation tensor analysis," IEEE Trans. Image Process., pp. 226–234, 2008.
[9] Y. Fu, S. Yan, and T. Huang, "Correlation metric for generalized feature extraction," IEEE Trans. Pattern Anal. Mach. Intell., pp. 2229–2235, 2008.
[10] X. He, D. Cai, and J. Han, "Learning a maximum margin subspace for image retrieval," IEEE Trans. Knowl. Data Eng., pp. 189–201, 2008.
[11] X. He and P. Niyogi, "Locality preserving projections," in Adv. Neural Inf. Process. Syst., 2003.
[12] S. C. H. Hoi and M. R. Lyu, "A semi-supervised active learning framework for image retrieval," in Proc. IEEE Int. Conf. Computer Vision and Pattern Recognition, 2005, pp. 302–309.
[13] H. Hotelling, "Analysis of a complex of statistical variables into principal components," J. Educ. Psych., pp. 417–441, 1933.
[14] W. H. Hsu, L. S. Kennedy, and S.-F. Chang, "Video search reranking via information bottleneck principle," in Proc. ACM Int. Conf. Multimedia, 2006, pp. 35–44.
[15] W. H. Hsu, L. S. Kennedy, and S.-F. Chang, "Video search reranking through random walk over document-level context graph," in Proc. ACM Int. Conf. Multimedia, 2007, pp. 971–980.
[16] H. Daumé III and D. Marcu, "Domain adaptation for statistical classifiers," J. Artif. Intell. Res., pp. 101–126, 2006.
[17] Y. Jing and S. Baluja, "PageRank for product image search," in Proc. Int. Conf. World Wide Web, 2008, pp. 307–316.
[18] T. Joachims, "Transductive inference for text classification using support vector machines," in Proc. Int. Conf. Machine Learning, 1999, pp. 200–209.
[19] L. S. Kennedy and S.-F. Chang, "A reranking approach for context-based concept fusion in video indexing and retrieval," in Proc. ACM Int. Conf. Image and Video Retrieval, 2007, pp. 333–340.
[20] D. D. Lewis and W. A. Gale, "A sequential algorithm for training text classifiers," in Proc. ACM Int. Conf. Research and Development in Information Retrieval, 1994, pp. 3–12.
[21] X. Li, S. Lin, S. Yan, and D. Xu, "Discriminant locally linear embedding with high-order tensor data," IEEE Trans. Syst., Man, Cybern. B, Cybern., pp. 342–352, 2008.
[22] Y.-Y. Lin, T.-L. Liu, and H.-T. Chen, "Semantic manifold learning for image retrieval," in Proc. ACM Int. Conf. Multimedia, 2005, pp. 6–11.
[23] W. Liu, D. Tao, and J. Liu, "Transductive component analysis," in Proc. IEEE Int. Conf. Data Mining, 2008, pp. 433–442.
[24] I. J. Myung and M. A. Pitt, "Applying Occam's razor in modeling cognition: A Bayesian approach," Psychon. Bull. Rev., 1997.
[25] H. T. Nguyen and A. Smeulders, "Active learning using pre-clustering," in Proc. Int. Conf. Machine Learning, 2004, pp. 623–630.
[26] E. Parzen, "On estimation of a probability density function and mode," Ann. Math. Statist., pp. 1065–1076, 1962.
[27] S. T. Roweis and L. K. Saul, "Nonlinear dimensionality reduction by locally linear embedding," Science, pp. 2323–2326, 2000.
[28] D. Tao, X. Li, X. Wu, and S. J. Maybank, "Geometric mean for subspace selection," IEEE Trans. Pattern Anal. Mach. Intell., pp. 260–274, 2009.
[29] D. Tao and X. Tang, "Nonparametric discriminant analysis in relevance feedback for content-based image retrieval," in Proc. IEEE Int. Conf. Pattern Recognition, 2004, pp. 1013–1016.
[30] TREC-10 Proceedings Appendix on Common Evaluation Measures [Online]. Available: http://trec.nist.gov/pubs/trec10/appendices/measures.pdf
[31] J. B. Tenenbaum, V. de Silva, and J. C. Langford, "A global geometric framework for nonlinear dimensionality reduction," Science, pp. 2319–2323, Dec. 2000.
[32] S. Thrun and T. M. Mitchell, "Learning one more thing," in Proc. Int. Joint Conf. Artificial Intelligence, 1995, pp. 1217–1225.
[33] X. Tian, L. Yang, J. Wang, X. Wu, and X.-S. Hua, "Transductive video annotation via local learnable kernel classifier," in Proc. IEEE Int. Conf. Multimedia & Expo, 2008, pp. 1509–1512.
[34] X. Tian, L. Yang, J. Wang, Y. Yang, X. Wu, and X.-S. Hua, "Bayesian video search reranking," in Proc. ACM Int. Conf. Multimedia, 2008, pp. 131–140.
[35] V. N. Vapnik, Statistical Learning Theory. New York: Wiley, 1998.
[36] L. Wang, K. L. Chan, and Z. Zhang, "Bootstrapping SVM active learning by incorporating unlabelled images for image retrieval," in Proc. IEEE Int. Conf. Computer Vision and Pattern Recognition, 2003, pp. 629–634.
[37] D. Xu, S. Yan, D. Tao, and H.-J. Zhang, "Marginal Fisher analysis and its variants for human gait recognition and content-based image retrieval," IEEE Trans. Image Process., pp. 2811–2821, 2007.
[38] T. Zhang, D. Tao, X. Li, and J. Yang, "Patch alignment for dimensionality reduction," IEEE Trans. Knowl. Data Eng., pp. 1299–1313, 2009.
[39] T. Zhang, D. Tao, and J. Yang, "Discriminative locality alignment," in Proc. European Conf. Computer Vision, 2008, pp. 725–738.
[40] Z. Zhang and H. Zha, "Principal manifolds and nonlinear dimensionality reduction via tangent space alignment," SIAM J. Sci. Comput., pp. 313–338, 2004.
[41] X. S. Zhou and T. S. Huang, "Small sample learning during multimedia retrieval using BiasMap," in Proc. IEEE Int. Conf. Computer Vision and Pattern Recognition, 2001, pp. 11–17.
[42] X. Zhu, Z. Ghahramani, and J. Lafferty, "Semi-supervised learning using Gaussian fields and harmonic functions," in Proc. Int. Conf. Machine Learning, 2003, pp. 912–919.
[43] X. Zhu, J. Lafferty, and Z. Ghahramani, "Combining active learning and semi-supervised learning using Gaussian fields and harmonic functions," in Proc. Int. Conf. Machine Learning, 2003, pp. 58–65.

Xinmei Tian received the B.S. degree in 2005 from the University of Science and Technology of China, Hefei, where she is currently pursuing the Ph.D. degree in the Department of Electronic Engineering and Information Science. From December 2007 to July 2008, she was a Research Intern with the Internet Media Group at Microsoft Research Asia, Beijing. From August 2008 to December 2008, she was a Research Assistant with the School of Computing, the Hong Kong Polytechnic University. Her current research interests include computer vision, content-based video analysis, and image/video search reranking.

Dacheng Tao (M'07) received the B.Eng. degree from the University of Science and Technology of China, the M.Phil. degree from the Chinese University of Hong Kong, and the Ph.D. degree from the University of London, London, U.K. Currently, he is a Nanyang Assistant Professor with the School of Computer Engineering, Nanyang Technological University. His research mainly applies statistics and mathematics to data analysis problems in computer vision, data mining, machine learning, multimedia, and video surveillance. He has published more than 100 scientific articles, including in the IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, the IEEE TRANSACTIONS ON IMAGE PROCESSING, the IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, and at CVPR, ECCV, ICDM, ACM TKDD, ACM Multimedia, and KDD. Dr. Tao is the recipient of best paper awards and finalist nominations. He holds the K. C. Wong Education Foundation Award of the Chinese Academy of Sciences.

Xian-Sheng Hua (M'04) received the B.S. and Ph.D. degrees from Peking University, Beijing, China, in 1996 and 2001, respectively, both in applied mathematics. When he was at Peking University, his major research interests were in the areas of image processing and multimedia watermarking. Since 2001, he has been with Microsoft Research Asia, Beijing, where he is currently a Lead Researcher with the Internet Media Group. He is also an Adjunct Professor at the University of Science and Technology of China. His current interests are in the areas of video content analysis, multimedia search, management, authoring, sharing, and advertising. He has authored more than 100 publications at prestigious international conferences and journals, including ACM Multimedia, CVPR, and the IEEE TRANSACTIONS ON MULTIMEDIA, and has 30 filed or issued patents. Dr. Hua received the Best Paper Award and the Best Demonstration Award at the ACM International Conference on Multimedia in 2007. He is a member of the Association for Computing Machinery and serves as an Associate Editor of the IEEE TRANSACTIONS ON MULTIMEDIA and as an Editorial Board Member of Multimedia Tools and Applications.

Xiuqing Wu received the B.S. degree from the University of Science and Technology of China, Hefei, in 1965. She is a Professor in the Department of Electronic Engineering and Information Science, University of Science and Technology of China. From 1985 to 1986, she was a Visiting Scientist in the Department of Computer Science, Carnegie Mellon University, Pittsburgh, PA. Her research interests include intelligent information processing, multiresource data fusion, and digital image analysis.