
Semisupervised biased maximum margin analysis for interactive image retrieval



2294 IEEE TRANSACTIONS ON IMAGE PROCESSING, VOL. 21, NO. 4, APRIL 2012

Semisupervised Biased Maximum Margin Analysis for Interactive Image Retrieval

Lining Zhang, Student Member, IEEE, Lipo Wang, Senior Member, IEEE, and Weisi Lin, Senior Member, IEEE

Abstract: With many potential practical applications, content-based image retrieval (CBIR) has attracted substantial attention during the past few years. A variety of relevance feedback (RF) schemes have been developed as a powerful tool to bridge the semantic gap between low-level visual features and high-level semantic concepts, and thus to improve the performance of CBIR systems. Among various RF approaches, support-vector-machine (SVM)-based RF is one of the most popular techniques in CBIR. Despite the success, directly using SVM as an RF scheme has two main drawbacks. First, it treats the positive and negative feedbacks equally, which is not appropriate since the two groups of training feedbacks have distinct properties. Second, most of the SVM-based RF techniques do not take into account the unlabeled samples, although they are very helpful in constructing a good classifier. To explore solutions to overcome these two drawbacks, in this paper, we propose a biased maximum margin analysis (BMMA) and a semisupervised BMMA (SemiBMMA) for integrating the distinct properties of feedbacks and utilizing the information of unlabeled samples for SVM-based RF schemes. The BMMA differentiates positive feedbacks from negative ones based on local analysis, whereas the SemiBMMA can effectively integrate information of unlabeled samples by introducing a Laplacian regularizer to the BMMA. We formally formulate this problem into a general subspace learning task and then propose an automatic approach of determining the dimensionality of the embedded subspace for RF.
Extensive experiments on a large real-world image database demonstrate that the proposed scheme combined with the SVM RF can significantly improve the performance of CBIR systems.

Index Terms: Content-based image retrieval (CBIR), graph embedding, relevance feedback (RF), support vector machine (SVM).

Manuscript received November 01, 2010; revised March 08, 2011, May 13, 2011, and September 16, 2011; accepted November 17, 2011. Date of publication December 02, 2011; date of current version March 21, 2012. This work was supported in part by the Institute for Media Innovation (IMI), Nanyang Technological University, Singapore, under an IMI scholarship, and the Singapore Ministry of Education Academic Research Fund (AcRF) Tier 2 under Grant T208B1218. The associate editor coordinating the review of this manuscript and approving it for publication was Prof. Gang Hua.
L. Zhang is with the School of Electrical and Electronic Engineering, Nanyang Technological University, Singapore 639798, and also with the Institute for Media Innovation, Nanyang Technological University, Singapore 637553 (e-mail: zhan0327@e.ntu.edu.sg).
L. Wang is with the School of Electrical and Electronic Engineering, Nanyang Technological University, Singapore 639798 (e-mail: elpwang@ntu.edu.sg).
W. Lin is with the School of Computer Engineering, Nanyang Technological University, Singapore 639798 (e-mail: wslin@ntu.edu.sg).
Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.
Digital Object Identifier 10.1109/TIP.2011.2177846

I. INTRODUCTION

DURING the past few years, content-based image retrieval (CBIR) has gained much attention for its potential applications in multimedia management [1], [2]. It is motivated by the explosive growth of image records and the online accessibility of remotely stored images. An effective search scheme is urgently required to manage the huge image database.
Different from the traditional search engine, in CBIR, an image query is described by using one or more example images, and low-level visual features (e.g., color [3]–[5], texture [5]–[7], and shape [8]–[10]) are automatically extracted to represent the images in the database. However, the low-level features captured from the images may not accurately characterize the high-level semantic concepts [1], [2].

To narrow down the so-called semantic gap, relevance feedback (RF) was introduced as a powerful tool to enhance the performance of CBIR [11], [12]. Huang et al. introduced both the query movement and the reweighting techniques [13], [14]. A self-organizing map was used to construct the RF algorithms [15]. In [16], one-class support vector machine (SVM) estimated the density of positive feedback samples. Derived from one-class SVM, a biased SVM inherited the merits of one-class SVM but incorporated the negative feedback samples [17]. Considering the geometry structure of image low-level visual features, [18] and [19] proposed manifold-learning-based approaches to find the intrinsic structure of images and improve the retrieval performance. With the observation that “all positive examples are alike; each negative example is negative in its own way,” RF was formulated as a biased subspace learning problem, in which there is an unknown number of classes, but the user is only concerned about the positive class [20]–[22]. However, all of these methods have some limitations. For example, the method in [13] and [14] is heuristically based, the density estimation method in [16] ignores any information contained in the negative feedback samples, and the discriminant subspace learning techniques in [20] and [22] often suffer from the so-called “small sample size” problem. Regarding the positive and negative feedbacks as two different groups, classification-based RFs [23]–[25] have become a popular technique in the CBIR community.
However, RF is very different from the traditional classification problem because the feedbacks provided by the user are often limited in real-world image retrieval systems. Therefore, small sample learning methods are most promising for RF.

1057-7149/$26.00 © 2011 IEEE

Two-class SVM is one of the popular small sample learning methods widely used in recent years and obtains the state-of-the-art performance in classification for its good generalization ability [24]–[28]. The SVM can achieve a minimal structural risk by minimizing the Vapnik–Chervonenkis dimensions [27]. Guo et al. developed a constrained similarity measure for image retrieval [26], which learns a boundary that divides the images into two groups, and samples inside the boundary are ranked by their Euclidean distance to the query image. The SVM active learning method selects samples close to the boundary as the
most informative samples for the user to label [28]. Random sampling techniques were applied to alleviate unstable, biased, and overfitting problems in SVM RF [25]. Li et al. proposed a multitraining SVM method by adapting a cotraining technique and a random sampling method [29]. Nevertheless, most of the SVM RF approaches ignore the basic difference between the two distinct groups of feedbacks, i.e., all positive feedbacks share a similar concept while each negative feedback usually varies with different concepts. For instance, a typical set of feedback samples in an RF iteration is shown in Fig. 1. All the samples labeled as positive feedbacks share a common concept (i.e., elephant), while each sample labeled as negative feedback varies with diverse concepts (e.g., flower, horse, banquet, and hill).

Fig. 1. Typical set of positive- and negative-feedback samples in an RF iteration.

Traditional SVM RF techniques treat positive and negative feedbacks equally [24]–[26], [28], [29]. Directly using SVM as an RF scheme is potentially damaging to the performance of CBIR systems. One problem stems from the fact that different semantic concepts live in different subspaces and each image can live in many different subspaces, and it is the goal of RF schemes to figure out “which one” [20]. However, it will be a burden for traditional SVM-based RF schemes to tune the internal parameters to adapt to the changes of the subspace. Such difficulties have severely degraded the effectiveness of traditional SVM RF approaches for CBIR.
Additionally, it is problematic to incorporate the information of unlabeled samples into traditional SVM-based RF schemes for CBIR, although unlabeled samples are very helpful in constructing the optimal classifier, alleviating noise, and enhancing the performance of the system.

To explore solutions to these two aforementioned problems in the current technology, we propose a biased maximum margin analysis (BMMA) and a semisupervised BMMA (SemiBMMA) for the traditional SVM RF schemes, based on the graph-embedding framework [30]. The proposed scheme is mainly based on the following: 1) the effectiveness of treating positive examples and negative examples unequally [20]–[22]; 2) the significance of the optimal subspace or feature subset in interactive CBIR; 3) the success of graph embedding in characterizing intrinsic geometric properties of the data set in high-dimensional space [30]–[32]; and 4) the convenience of the graph-embedding framework in constructing semisupervised learning techniques. With the incorporation of BMMA, labeled positive feedbacks are mapped as close as possible, whereas labeled negative feedbacks are separated from labeled positive feedbacks by a maximum margin in the reduced subspace. The traditional SVM combined with BMMA can better model the RF process and reduce the performance degradation caused by distinct properties of the two groups of feedbacks. The SemiBMMA can incorporate the information of unlabeled samples into the RF and effectively alleviate the overfitting problem caused by the small size of labeled training samples.
To show the effectiveness of the proposed scheme combined with the SVM RF, we will compare it with the traditional SVM RF and some other relevant existing techniques for RF on a real-world image collection. Experimental results demonstrate that the proposed scheme can significantly improve the performance of the SVM RF for image retrieval.

The rest of this paper is organized as follows: In Section II, the related previous work, i.e., the principle of SVM RF for CBIR and the graph-embedding framework, is briefly reviewed. In Section III, we introduce the BMMA and the SemiBMMA for SVM RF. An image retrieval system is given in Section IV. A large number of experiments that validate the effectiveness of the proposed scheme are given in Section V. Conclusion and future work are presented in Section VI.

II. RELATED PREVIOUS WORK

A. Principle of SVM RF for CBIR

Here, we briefly introduce the principle of the traditional SVM-based RF for CBIR. The SVM implements structural risk minimization by minimizing Vapnik–Chervonenkis dimensions [27]. Consider a linearly separable binary classification problem as follows:

$$\{(x_i, y_i)\}_{i=1}^{N} \quad \text{and} \quad y_i \in \{+1, -1\} \tag{1}$$

where $x_i$ denotes a $d$-dimensional vector, $N$ is the number of training samples, and $y_i$ is the label of the class that the vector belongs to. The objective function of SVM aims to find an optimal hyperplane to separate the two classes, i.e.,

$$w^{T} x + b = 0 \tag{2}$$

where $x$ is an input vector, $w$ is a weight vector, and $b$ is a bias. SVM attempts to find the two parameters $w$ and $b$ for the optimal hyperplane by maximizing the geometric margin $2/\|w\|$, subject to

$$y_i (w^{T} x_i + b) \geq 1, \quad i = 1, \ldots, N \tag{3}$$

The solution of the objective function can be found through a Wolfe dual problem with the Lagrange multipliers $\alpha_i$, i.e.,

$$\max_{\alpha} \; \sum_{i=1}^{N} \alpha_i - \frac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N} \alpha_i \alpha_j y_i y_j (x_i \cdot x_j) \tag{4}$$

subject to $\alpha_i \geq 0$ and $\sum_{i=1}^{N} \alpha_i y_i = 0$.

In the dual problem, data points appear only in the inner product, which can often be replaced by a positive definite kernel function for better performance, i.e.,

$$(x_i \cdot x_j) \rightarrow K(x_i, x_j) \tag{5}$$
where $K(\cdot, \cdot)$ is a kernel function. The kernel version of the Wolfe dual problem is

$$\max_{\alpha} \; \sum_{i=1}^{N} \alpha_i - \frac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N} \alpha_i \alpha_j y_i y_j K(x_i, x_j) \tag{6}$$

Thus, for a given kernel function, the SVM classifier is given by

$$\operatorname{sgn}\big(f(x)\big) \tag{7}$$

where $f(x) = \sum_{i=1}^{N_s} \alpha_i y_i K(x_i, x) + b$ is the output hyperplane decision function of SVM and $N_s$ is the number of support vectors.

The output of SVM, i.e., $f(x)$, is usually used to measure the similarity between a given pattern and the query image in the traditional SVM RF for CBIR. The performance of an SVM classifier mainly depends on the number of support vectors. Orthogonal complement component analysis (OCCA) decreases the number of support vectors by finding a subspace in which all the positive feedbacks are merged [33]. However, it still totally ignores the information contained in negative feedbacks, which is very helpful in finding a homogeneous subspace. Intuitively, good separation is achieved by the hyperplane that has the largest distance to the nearest training samples, since in general, the larger the margin, the lower the generalization error of the classifier.

B. Graph-Embedding Framework

In order to describe our proposed approach clearly, we first review the graph-embedding framework introduced in [30]. Generally, for a classification problem, the sample set can be represented as matrix $X = [x_1, x_2, \ldots, x_N] \in \mathbb{R}^{d \times N}$, where $N$ indicates the total number of the samples and $d$ is the feature dimensionality. Let $G = \{X, W\}$ be an undirected similarity graph, which is called an intrinsic graph, with vertex set $X$ and similarity matrix $W \in \mathbb{R}^{N \times N}$. The similarity matrix $W$ is real and symmetric, and measures the similarity between a pair of vertices; $W$ can be formed using various similarity criteria. The corresponding diagonal matrix $D$ and the Laplacian matrix $L$ of graph $G$ can be defined as follows:

$$L = D - W, \quad D_{ii} = \sum_{j \neq i} W_{ij} \quad \forall i \tag{8}$$

Graph embedding of graph $G$ is defined as an algorithm to determine the low-dimensional vector representations $Y = [y_1, y_2, \ldots, y_N] \in \mathbb{R}^{l \times N}$ of the vertex set $X$, where $l$ is lower than $d$ for dimensionality.
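As an illustration of (8), the following minimal NumPy sketch builds the degree matrix and the Laplacian from a small similarity matrix and numerically checks the standard identity that the weighted sum of pairwise squared distances equals twice the trace form. The function names `graph_laplacian` and `pairwise_cost` and the toy matrix are assumptions made for this example and do not come from the paper.

```python
import numpy as np

def graph_laplacian(W):
    """Return (D, L) with D_ii = sum_{j != i} W_ij and L = D - W."""
    W = np.asarray(W, dtype=float)
    W = W - np.diag(np.diag(W))          # the sum runs over j != i, so drop self-loops
    D = np.diag(W.sum(axis=1))
    return D, D - W

def pairwise_cost(Y, W):
    """Sum over all ordered pairs (i, j) of ||y_i - y_j||^2 * W_ij (columns of Y are embeddings)."""
    diff = Y[:, :, None] - Y[:, None, :]  # shape (l, N, N)
    sq_dist = (diff ** 2).sum(axis=0)     # squared distances, zero on the diagonal
    return float((sq_dist * W).sum())

# Toy symmetric similarity matrix for three vertices (illustrative values only).
W = np.array([[0.0, 1.0, 0.5],
              [1.0, 0.0, 0.0],
              [0.5, 0.0, 0.0]])
D, L = graph_laplacian(W)

# Random 2-D embeddings, one column per vertex.
Y = np.random.default_rng(0).normal(size=(2, 3))
lhs = pairwise_cost(Y, W)
rhs = 2.0 * np.trace(Y @ L @ Y.T)
```

For a symmetric similarity matrix, `lhs` and `rhs` agree up to rounding, which is why graph-preserving criteria can be written compactly with the Laplacian.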
The column vector $y_i$ is the embedding vector for vertex $x_i$, which preserves the similarities between pairs of vertices in the original high-dimensional space. Then, in order to characterize the difference between pairs of vertices in the original high-dimensional space, a penalty graph $G^{P} = \{X, W^{P}\}$ is also defined, where vertices $X$ are the same as those of $G$, but the edge weight matrix $W^{P}$ corresponds to the similarity characteristics that are to be suppressed in the low-dimensional feature space. For a dimensionality reduction problem, direct graph embedding requires an intrinsic graph $G$, whereas a penalty graph $G^{P}$ is not a necessary input. Then, the similarities among vertex pairs can be maintained according to the graph-preserving criterion as follows:

$$Y^{*} = \arg\min_{\operatorname{tr}(Y B Y^{T}) = c} \sum_{i \neq j} \|y_i - y_j\|^{2} W_{ij} = \arg\min_{\operatorname{tr}(Y B Y^{T}) = c} \operatorname{tr}(Y L Y^{T}) \tag{9}$$

where $\operatorname{tr}(\cdot)$ is the trace of an arbitrary square matrix, $c$ is a constant, and $B$ is the constraint matrix. $B$ may typically be a diagonal matrix for scale normalization or express more general constraints among vertices in a penalty graph $G^{P}$, in which case it describes the similarities between vertices that should be avoided; in the latter case, $B = L^{P}$ is the Laplacian matrix of $G^{P}$, similar to (8), which can also be defined as follows:

$$L^{P} = D^{P} - W^{P}, \quad D^{P}_{ii} = \sum_{j \neq i} W^{P}_{ij} \quad \forall i \tag{10}$$

where $W^{P}$ is the similarity matrix of the penalty graph $G^{P}$ to measure the difference between a pair of vertices in $G^{P}$.

The graph-embedding framework preserves the intrinsic property of the samples in two ways. For larger similarity between samples $x_i$ and $x_j$, the distance between $y_i$ and $y_j$ should be smaller to minimize the objective function. Conversely, smaller similarity between $x_i$ and $x_j$ should lead to larger distance between $y_i$ and $y_j$. Hence, through the intrinsic graph $G$ and the penalty graph $G^{P}$, the similarities and the differences among vertex pairs in graph $G$ can be preserved in the embedding.

In [30], based on the graph-embedding framework, (9) can be resolved by converting it into the following trace ratio formulation:

$$Y^{*} = \arg\min_{Y} \frac{\operatorname{tr}(Y L Y^{T})}{\operatorname{tr}(Y B Y^{T})} \tag{11}$$

Generally, if the constraint matrix represents only scale normalization, then this ratio formulation can be directly solved by eigenvalue decomposition.
However, for a more general constraint matrix, it can be approximately solved with generalized eigenvalue decomposition by transforming the objective function into a more tractable approximate form.

With the assumption that the low-dimensional vector representations of the vertices can be obtained from a linear projection, i.e., $Y = A^{T} X$, where $A \in \mathbb{R}^{d \times l}$ is the projection matrix, the objective function (11) can be changed to

$$A^{*} = \arg\min_{A} \frac{\operatorname{tr}(A^{T} X L X^{T} A)}{\operatorname{tr}(A^{T} X B X^{T} A)} \tag{12}$$

During the past few years, a number of manifold-learning-based feature extraction methods have been proposed to capture the intrinsic geometry property [31], [32], [34]–[37]. In [30], Yan et al. claimed that all of the mentioned manifold learning algorithms can be mathematically unified within the graph-embedding framework described here. They also proposed a marginal Fisher analysis (MFA), which takes both the manifold geometry and the class information into consideration. However, MFA still suffers from the “small sample size” problem when the training samples are insufficient, which is always the case in image retrieval.
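The generalized eigenvalue route for the linear form (12) can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure of [30]: it solves the tractable relaxation in which each projection direction satisfies $X L X^{T} a = \lambda X B X^{T} a$, it adds a small ridge term (an assumption) so that the constraint matrix is invertible, and the name `linear_graph_embedding` and the toy data are hypothetical.

```python
import numpy as np

def linear_graph_embedding(X, L, B, dim, ridge=1e-6):
    """Sketch of a linear graph embedding.

    X: (d, N) data matrix; L, B: (N, N) symmetric matrices.
    Returns A (d, dim) and the dim smallest generalized eigenvalues of
    X L X^T a = lambda (X B X^T + ridge * I) a.
    """
    d = X.shape[0]
    S_L = X @ L @ X.T
    S_B = X @ B @ X.T + ridge * np.eye(d)   # ridge term is an assumption for stability
    C = np.linalg.cholesky(S_B)             # S_B = C C^T
    C_inv = np.linalg.inv(C)
    M = C_inv @ S_L @ C_inv.T               # whitened problem is an ordinary symmetric one
    M = (M + M.T) / 2.0                     # remove rounding asymmetry
    evals, evecs = np.linalg.eigh(M)        # eigenvalues in ascending order
    A = C_inv.T @ evecs[:, :dim]            # map back to the original coordinates
    return A, evals[:dim]

# Toy data: 10 samples in 4 dimensions with Gaussian similarities (illustrative only).
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 10))
sq = ((X[:, :, None] - X[:, None, :]) ** 2).sum(axis=0)
W = np.exp(-sq / sq.mean())
np.fill_diagonal(W, 0.0)
D = np.diag(W.sum(axis=1))
L = D - W

# Scale normalization: use the degree matrix D as the constraint matrix B.
A, vals = linear_graph_embedding(X, L, D, dim=2)
Y = A.T @ X                                 # low-dimensional representations
```

The ridge term also hints at why the “small sample size” problem matters: when there are fewer samples than feature dimensions, the constraint matrix is singular and cannot be inverted without such a correction.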
III. BMMA AND SEMIBMMA FOR SVM RF IN CBIR

With the observation that “all positive examples are alike; each negative example is negative in its own way,” the two groups of feedbacks have distinct properties for CBIR [20]. However, the traditional SVM RF treats the positive and negative feedbacks equally.

To alleviate the performance degradation when using SVM as an RF scheme for CBIR, we explore solutions based on the argument that different semantic concepts lie in different subspaces and each image can lie in many different concept subspaces [20]. We formally formulate this problem into a general subspace learning problem and propose a BMMA for the SVM RF scheme. In the reduced subspace, the negative feedbacks, which differ in diverse concepts from the query sample, are separated by a maximum margin from the positive feedbacks, which share a similar concept with the query sample. Therefore, we can easily map the positive and negative feedbacks onto a semantic subspace in accordance with human perception of the image contents.

To utilize the information of unlabeled samples in the database, we introduce a Laplacian regularizer to the BMMA, which leads to SemiBMMA for the SVM RF. The resultant Laplacian regularizer is largely based on the notion of local consistency, which was inspired by the recently emerging manifold learning community and can effectively depict the weak similarity relationship between unlabeled sample pairs. Then, the remaining images in the database are projected onto this resultant semantic subspace, and a similarity measure is applied to sort the images based on the new representations. For the SVM-based RFs, the distance to the hyperplane of the classifier is the criterion to discriminate the query-relevant samples from the query-irrelevant samples.
After the projection step, all positive feedbacks are clustered together, while negative feedbacks are well separated from positive feedbacks by a maximum margin. Therefore, the resultant SVM classifier hyperplane in this subspace will be much simpler and better than that in the original high-dimensional feature space.

Different from the classical subspace learning methods, e.g., principal component analysis and linear discriminant analysis (LDA), which can only see the linear global Euclidean structure of samples, BMMA aims to learn a projection matrix $A$ such that, in the projected space, the positive samples have high local within-class similarity but the samples with different labels have high local between-class separability. To describe the algorithm clearly, we first introduce some notations of this approach.

In each round of feedback iteration, there are $N$ samples $X = \{x_1, x_2, \ldots, x_N\} \subset \mathbb{R}^{d}$. For simplicity, we assume that the first $N^{+}$ samples are positive feedbacks $x_i$ $(1 \leq i \leq N^{+})$, the next $N^{-}$ samples are negative feedbacks $x_i$ $(N^{+} + 1 \leq i \leq N^{+} + N^{-})$, and all the others are unlabeled samples $x_i$ $(N^{+} + N^{-} + 1 \leq i \leq N)$. Let $l_i$ be the class label of sample $x_i$; we denote $l_i = +1$ for positive feedbacks, $l_i = -1$ for negative feedbacks, and $l_i = 0$ for unlabeled samples. To better show the relationship between the proposed approaches and the graph-embedding framework, we use similar notations and equations to those of the original graph-embedding framework, which provides us with a general platform to develop various new approaches for dimensionality reduction. First, two different graphs are formed, i.e., the intrinsic graph $G$, which characterizes the local similarity of the feedback samples, and the penalty graph $G^{P}$, which characterizes the local discriminant structure of the feedback samples.

For all the positive feedbacks, we first compute the pairwise distance between each pair of positive feedbacks. Then, for each positive feedback $x_i$, we find its $k_1$ nearest-neighborhood positive feedbacks, which can be represented as a sample set $N^{+}_{k_1}(i)$, and put an edge between $x_i$ and its neighborhood positive feedbacks.
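Then, the intrinsic graph is characterized as follows:

$$S = \sum_{i} \sum_{j:\, j \in N^{+}_{k_1}(i) \text{ or } i \in N^{+}_{k_1}(j)} \|y_i - y_j\|^{2} W_{ij} = 2 \operatorname{tr}\big(A^{T} X (D - W) X^{T} A\big) \tag{13}$$

$$W_{ij} = \begin{cases} 1 / |N^{+}_{k_1}|, & \text{if } l_i = +1 \text{ and } j \in N^{+}_{k_1}(i) \text{ or } i \in N^{+}_{k_1}(j) \\ 0, & \text{else} \end{cases} \tag{14}$$

where $D$ is a diagonal matrix whose diagonal elements are calculated by $D_{ii} = \sum_{j} W_{ij}$; $|N^{+}_{k_1}|$ denotes the total number of $k_1$ nearest-neighborhood positive sample pairs for each positive feedback. Basically, the intrinsic graph measures the total average distance of the $k_1$ nearest-neighborhood sample pairs and is used to characterize the local within-class compactness for all the positive feedbacks.

For the penalty graph $G^{P}$, its similarity matrix $W^{P}$ represents geometric or statistical properties to be avoided and is used as a constraint matrix in the graph-embedding framework. In the BMMA, the penalty graph $G^{P}$ is constructed to represent the local separability between the positive and negative classes. More strictly speaking, we expect that the total average margin between the sample pairs with different labels should be as large as possible.

For each feedback sample, we find its $k_2$ neighbor feedbacks with different labels, represented as a sample set $N_{k_2}(i)$, and put edges between corresponding pairs of feedback samples with weights $W^{P}_{ij} = 1 / |N_{k_2}|$. Then, the penalty graph can be formed as follows:

$$S^{P} = \sum_{i} \sum_{j:\, j \in N_{k_2}(i) \text{ or } i \in N_{k_2}(j)} \|y_i - y_j\|^{2} W^{P}_{ij} = 2 \operatorname{tr}\big(A^{T} X (D^{P} - W^{P}) X^{T} A\big) \tag{15}$$

$$W^{P}_{ij} = \begin{cases} 1 / |N_{k_2}|, & \text{if } l_i \neq l_j \text{ and } j \in N_{k_2}(i) \text{ or } i \in N_{k_2}(j) \\ 0, & \text{else} \end{cases} \tag{16}$$

where $D^{P}$ is a diagonal matrix whose diagonal elements are calculated by $D^{P}_{ii} = \sum_{j} W^{P}_{ij}$; $|N_{k_2}|$ denotes the total number of $k_2$ neighborhood sample pairs with different labels. Similarly, the penalty graph measures the total average distance of the $k_2$ nearest-neighbor sample pairs in different classes and is used to characterize the local between-class separability.

In the following, we describe how to utilize the graph-embedding framework to develop algorithms based on the designed intrinsic and penalty graphs. Different from the original formulation of the graph-embedding framework in [30], the BMMA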
algorithm optimizes the objective function in a trace difference form instead, i.e.,

$$A^{*} = \arg\max_{A} \operatorname{tr}\big(A^{T} X (L^{P} - L) X^{T} A\big), \quad L^{P} = D^{P} - W^{P}, \quad L = D - W \tag{17}$$

As given in (17), we can notice that the objective function works in two ways, i.e., trying to maximize $\operatorname{tr}(A^{T} X L^{P} X^{T} A)$ and, at the same time, minimize $\operatorname{tr}(A^{T} X L X^{T} A)$. Intuitively, we can analyze the meaning of the objective function in (17) geometrically. By formulating the objective function as a trace difference form, we can regard it as the total average local margin between positive samples and negative samples. Therefore, (17) can be used as a criterion to discriminate the different classes.

In [38], the maximum margin criterion (MMC) was presented as an objective function with a similar difference form. The differences between BMMA and MMC are the definitions of the interclass separability and the intraclass compactness. In MMC, both the interclass separability and the intraclass compactness are defined in the same way as in LDA, which treats the two different classes equally, and MMC can only see the linear global Euclidean structure. In BMMA, the intraclass compactness is constructed by only considering one class (e.g., positive feedbacks) and characterized by a sum of the distances between each positive sample and its $k_1$ nearest neighbors in the same class. The interclass separability is defined by resorting to interclass marginal samples instead of the mean vectors of different classes as in MMC. Without prior information on data distributions, BMMA can find more reliable low-dimensional representations of the data compared with MMC [38] and also follows the original assumption in biased discriminant analysis (BDA) [20] (i.e., all positive examples are alike; each negative example is negative in its own way).
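It should be noted that previous methods [39], [40] that followed MMC cannot be directly used for the SVM RF in image retrieval because these methods treat samples in different classes equally.

In order to remove an arbitrary scaling factor in the projection, we additionally require that $A = [a_1, a_2, \ldots, a_l]$ is constituted by unit vectors, i.e., $a_k^{T} a_k = 1$, $k = 1, 2, \ldots, l$. This means that we need to solve the following constrained optimization:

$$\max_{A} \sum_{k=1}^{l} a_k^{T} X (L^{P} - L) X^{T} a_k \quad \text{s.t.} \quad a_k^{T} a_k - 1 = 0, \quad k = 1, 2, \ldots, l \tag{18}$$

Note that we may also use other constraints instead. For example, we may require $\operatorname{tr}(A^{T} X L^{P} X^{T} A) = 1$ and then minimize $\operatorname{tr}(A^{T} X L X^{T} A)$. It is easy to check that the aforementioned maximum margin approach with such a constraint in fact results in the traditional MFA [30]. The only difference is that (18) involves a constrained optimization problem, whereas the traditional MFA solves an unconstrained optimization problem. The motivation for using constraint $a_k^{T} a_k = 1$, $k = 1, 2, \ldots, l$, is to avoid calculating the inverse of $X L^{P} X^{T}$, which leads to the potential “small sample size” problem. In order to solve the aforementioned constrained optimization problem, we introduce a Lagrangian, i.e.,

$$\mathcal{L}(a_k, \lambda_k) = \sum_{k=1}^{l} a_k^{T} X (L^{P} - L) X^{T} a_k - \sum_{k=1}^{l} \lambda_k (a_k^{T} a_k - 1) \tag{19}$$

with multipliers $\lambda_k$. The Lagrangian $\mathcal{L}$ should be maximized with respect to both $\lambda_k$ and $a_k$. The condition is that, at the stationary point, the derivatives of $\mathcal{L}$ with respect to $a_k$ must vanish, i.e.,

$$\frac{\partial \mathcal{L}(a_k, \lambda_k)}{\partial a_k} = 2 \big( X (L^{P} - L) X^{T} - \lambda_k I \big) a_k = 0 \tag{20}$$

and therefore

$$X (L^{P} - L) X^{T} a_k = \lambda_k a_k \tag{21}$$

which means that the $\lambda_k$ are the eigenvalues of $X (L^{P} - L) X^{T}$ and the $a_k$ are the corresponding eigenvectors. Thus, we have

$$J(A) = \sum_{k=1}^{l} a_k^{T} X (L^{P} - L) X^{T} a_k = \sum_{k=1}^{l} \lambda_k a_k^{T} a_k = \sum_{k=1}^{l} \lambda_k \tag{22}$$

Therefore, the objective function is maximized when $A$ is composed of the eigenvectors of $X (L^{P} - L) X^{T}$ associated with the $l$ largest eigenvalues. Here, by imposing constraint $a_k^{T} a_k = 1$, $k = 1, 2, \ldots, l$, we need not calculate the inverse of $X L^{P} X^{T}$, and this allows us to avoid the “small sample size” problem easily.

BMMA is illustrated in Fig. 2.

Fig. 2. (a) Red dots are positive samples, and blue dots are negative samples in the original space. (b) BMMA approach. (c) Positive samples and negative samples in the maximum margin subspace.

In the previous subsection, we have formulated the BMMA algorithm and have shown that the optimal projection matrix $A$ can be obtained by eigenvalue decomposition of the matrix $X (L^{P} - L) X^{T}$. Then, the problem is how to determine an optimal dimensionality for RF, i.e., the dimensionality of the projected subspace. To achieve such a goal, we give the details of determining the optimal dimensionality.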
Then, the problem is how to determine an op-timal dimensionality for RF, i.e., the projected subspace. Toachieve such a goal, we give the detail of determining the op-timal dimensionality.In general(23)
where the $\lambda_k$ values are the associated eigenvalues, and we have

$$\lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_d \tag{24}$$

To maximize the margin between the positive samples and negative samples, we should preserve all the eigenvectors associated with the positive eigenvalues. However, as indicated in [33], for image retrieval, the orthogonal complement components are essential to capture the same concept shared by all positive samples. Based on this observation, we should also preserve the components associated with zero eigenvalues, although they do not contribute to maximizing the margin. This technique can effectively preserve more geometry properties of the feedbacks in the original high-dimensional feature space. Therefore, the optimal dimensionality of the projected subspace corresponds exactly to the number of nonnegative eigenvalues of the matrix. As compared with the original formulation of the graph-embedding framework in [30], the new formulation (18) can thus easily avoid the intrinsic “small sample size” problem and can also provide us with a simple way to determine the optimal dimensionality for this subspace learning problem.

BDA and its kernel version BiasMap [20] were first proposed to address the asymmetry between the positive and negative samples in interactive image retrieval. However, to use BDA, the “small sample size” problem and the Gaussian assumption for positive feedbacks are two major challenges. Moreover, the kernel method BiasMap cannot exert its normal capability since the feature dimensionality is much higher than the number of training samples. Additionally, it is still problematic to determine the optimal dimensionality of BDA and BiasMap for CBIR. Different from the original BDA, our BMMA algorithm is a local discriminant analysis approach, which does not make any assumption on the distribution of the samples.
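The BMMA steps described above (neighborhood graphs, trace-difference objective, eigenvalue decomposition, and keeping the nonnegative eigenvalues) can be sketched as follows. This is a minimal NumPy illustration under stated assumptions rather than the authors' implementation: edge weights are normalized by the number of edges in each graph (an assumed reading of the averaging weights), a small numerical tolerance decides which eigenvalues count as nonnegative, and the names `bmma_projection`, `k1`, `k2`, and the toy data are hypothetical.

```python
import numpy as np

def laplacian(W):
    """Graph Laplacian L = D - W for a symmetric weight matrix with zero diagonal."""
    return np.diag(W.sum(axis=1)) - W

def bmma_projection(X, labels, k1=2, k2=2, tol=1e-10):
    """Sketch of a BMMA-style projection.

    X: (d, N) feature matrix; labels: +1 (positive) or -1 (negative) per column.
    Returns (A, eigenvalues) where A holds the eigenvectors of X (L_P - L) X^T
    whose eigenvalues are nonnegative up to a tolerance.
    """
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    N = X.shape[1]
    sq = ((X[:, :, None] - X[:, None, :]) ** 2).sum(axis=0)  # squared distances
    pos = np.flatnonzero(labels == 1)
    neg = np.flatnonzero(labels == -1)

    W = np.zeros((N, N))      # intrinsic graph: positive-positive neighbors
    W_P = np.zeros((N, N))    # penalty graph: neighbors with different labels
    for i in pos:
        same = pos[pos != i]
        nn = same[np.argsort(sq[i, same])[:k1]]
        W[i, nn] = W[nn, i] = 1.0
    for i in np.concatenate([pos, neg]):
        other = neg if labels[i] == 1 else pos
        nn = other[np.argsort(sq[i, other])[:k2]]
        W_P[i, nn] = W_P[nn, i] = 1.0

    # Assumed normalization: divide by the number of edges so each term is an average.
    W /= max(W.sum() / 2.0, 1.0)
    W_P /= max(W_P.sum() / 2.0, 1.0)

    M = X @ (laplacian(W_P) - laplacian(W)) @ X.T
    M = (M + M.T) / 2.0                       # remove rounding asymmetry
    vals, vecs = np.linalg.eigh(M)
    order = np.argsort(vals)[::-1]            # largest eigenvalues first
    vals, vecs = vals[order], vecs[:, order]
    keep = vals >= -tol * max(1.0, np.abs(vals).max())
    return vecs[:, keep], vals[keep]

# Toy feedback set: a tight positive cluster and scattered negatives (illustrative only).
rng = np.random.default_rng(0)
positives = 0.1 * rng.normal(size=(5, 6)) + 1.0
negatives = 3.0 * rng.normal(size=(5, 6))
X = np.hstack([positives, negatives])
labels = np.array([1] * 6 + [-1] * 6)

A, vals = bmma_projection(X, labels)
Y = A.T @ X    # projected feedbacks, which an SVM could then be trained on
```

Because only a standard symmetric eigenproblem is solved, no matrix inverse is needed, which mirrors the argument for how the unit-norm constraint sidesteps the “small sample size” problem.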
Biased toward the positive samples, maximizing the objective function in the projected space can push the nearby negative samples away from the positive samples while pulling the nearby positive samples toward the positive samples. Therefore, the definition in (18) can maximize the overall average margin between the positive samples and the negative samples. In such a way, each sample in the original space is mapped onto a low-dimensional local maximum margin subspace in accordance with human perception of the image contents.

Since the graph-embedding technique is an effective way to capture the intrinsic geometry structure in the original feature space, we propose a way to incorporate the unlabeled samples based on the intrinsic graph, which is helpful in capturing the manifold structure of samples and alleviating the overfitting problem. In the following, we design a regularization term based on the intrinsic graph for the unlabeled samples in the image database.

For each unlabeled sample $x_i$ $(N^{+} + N^{-} + 1 \leq i \leq N)$, we expect that the nearby unlabeled samples are likely to have similar low-dimensional representations. Specifically, for each unlabeled sample, we find its $k$ nearest-neighborhood unlabeled samples, which can be represented as a sample set $N^{U}_{k}(i)$, and put an edge between the unlabeled sample $x_i$ and its neighborhood unlabeled samples. Then, the intrinsic graph for the unlabeled samples is characterized as follows:

$$S^{U} = \sum_{i} \sum_{j:\, j \in N^{U}_{k}(i) \text{ or } i \in N^{U}_{k}(j)} \|y_i - y_j\|^{2} W^{U}_{ij} = 2 \operatorname{tr}\big(A^{T} X (D^{U} - W^{U}) X^{T} A\big) \tag{25}$$

$$W^{U}_{ij} = \begin{cases} 1 / |N^{U}_{k}|, & \text{if } l_i = 0 \text{ and } j \in N^{U}_{k}(i) \text{ or } i \in N^{U}_{k}(j) \\ 0, & \text{else} \end{cases} \tag{26}$$

which reflects the affinity of the sample pairs; $D^{U}$ is a diagonal matrix whose diagonal elements are calculated by $D^{U}_{ii} = \sum_{j} W^{U}_{ij}$; $|N^{U}_{k}|$ denotes the total number of $k$ nearest-neighborhood unlabeled sample pairs for each unlabeled sample. $L^{U} = D^{U} - W^{U}$ is known as a Laplacian matrix. Hence, we call this term a Laplacian regularizer.

The motivation for introducing this term comes from the regularization principle, which is the key to enhancing the generalization and the robust performance of the approach in practical applications. There are many possible ways to choose a regularizer for the proposed BMMA.
In this paper, we chose the Laplacian regularizer, which is largely inspired by the recently emerging manifold learning community. This scheme can preserve weak (probably correct) similarities between all unlabeled sample pairs and thus effectively integrate the similarity information of unlabeled samples into BMMA. By integrating the Laplacian regularizer into the supervised BMMA, we can easily obtain SemiBMMA for the SVM RF, i.e.,

$$A^{*} = \arg\max_{A} \operatorname{tr}\big(A^{T} X (L^{P} - L - \beta L^{U}) X^{T} A\big) \tag{27}$$

where $\beta$ is used to trade off the contributions of the labeled samples and the unlabeled samples. Similarly, the solution of (27) is obtained by conducting the eigenvalue decomposition, and $A$ is calculated as a set of eigenvectors.

The difference between BMMA SVM and SemiBMMA SVM for two classes of feedbacks is shown in Fig. 3. The SemiBMMA algorithm is illustrated in Table I.

Then, all the unlabeled samples in the database are projected onto this subspace. After the projection, the traditional SVM RF is executed on the new representations. Finally, similar to the traditional SVM RF, we can measure the degree of relevance through the output of SVM, i.e., $f(x)$.

IV. CBIR SYSTEM

In experiments, we use a subset of the Corel Photo Gallery as the test data to evaluate the performance of the proposed scheme. The original Corel Photo Gallery includes plenty of semantic categories, each of which contains 100 or more images. However, some of the categories are not suitable for image retrieval since some images with different concepts are in the same category and many images with the same concept are in different categories. Therefore, the existing categories in the original Corel Photo Gallery are ignored and reorganized into 80 conceptual classes based on the ground truth, such as lion,
castle, aviation, train, dog, autumn, cloud, and tiger. Finally, the test database comprises a total of 10 763 real-world images.

TABLE I. SemiBMMA

Fig. 3. Illustration of the SVM hyperplane comparison between BMMA SVM and SemiBMMA SVM for two classes of feedbacks.

Given a query image by the user, the CBIR system is expected to return more semantically relevant images after each feedback iteration [2]. However, during RF, the number of relevant images is usually very small because of the semantic gap. At the same time, the user would not like to label a large number of samples. The user also expects to obtain more relevant images with only a few rounds of RF iterations. Keeping the number of labeled relevant images small and keeping the RF iterations few are two key issues in designing the image retrieval system. Therefore, we devise the following CBIR framework to evaluate the RF algorithms.

From the flowchart in Fig. 4, we can see that, when a query image is provided by the user, the image retrieval system first extracts the low-level features. Then, all the images in the database are sorted based on a similarity metric, i.e., Euclidean distance. If the user is satisfied with the results, the retrieval process ends, and the results are presented to the user. However, because of the semantic gap, most of the time, the user is not satisfied with the first retrieval results. Then, she/he will label the most semantically relevant images in the top retrieval results as positive feedbacks. All of the remaining images in the top results are automatically labeled by the system as negative feedbacks. Based on the small set of positive and negative feedbacks, the RF model can be trained using various existing techniques. Then, all the images in the database are re-sorted based on a new similarity metric. After each round of retrieval, the user will check whether the results are satisfactory.
If the user is satisfied with the results, then the process ends; otherwise, the feedback process repeats until the user is satisfied with the retrieval results.

Generally, the image representation is a crucial problem in CBIR. The images are usually represented by low-level features, such as color [3]–[5], texture [5]–[7], and shape [8]–[10], each of which can capture the content of an image to some extent.

For color, we extracted three moments, i.e., color mean, color variance, and color skewness, in each color channel (L, U, and V, respectively). Thus, a 9-D color moment is employed as the color features in our experiments to represent the color information. Then, a 256-D HSV color histogram is calculated. Both hue and saturation are quantized into eight bins, and the values are quantized into four bins (8 × 8 × 4 = 256). These two kinds of visual features form the color features.

Compared with the classical global texture descriptors (e.g., Gabor features and wavelet features), the local dense features show good performance in describing the content of an image. The Weber local descriptors (WLDs) [41] are adopted as feature descriptors, which are mainly based on the mechanism of the human perception of a pattern. The WLD results in a feature vector of 240 values.

We employ the edge directional histogram [8] from the Y component in YCrCb space to capture the spatial distribution of edges. The edge direction histogram is quantized into five categories, including horizontal, 45° diagonal, vertical, 135° diagonal, and isotropic directions, to represent the edge features.
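The two color features described above can be illustrated with a short sketch. It is an illustration only, under several assumptions that are not taken from the paper: the function names `color_moments` and `hsv_histogram` are hypothetical, the input array is assumed to be already converted to the desired color space (LUV for the moments, HSV scaled to [0, 1] for the histogram), and skewness is taken as the cube root of the third central moment, which is one common convention for color moments.

```python
import numpy as np

def color_moments(img):
    """Return 9 values: mean, variance, and skewness for each of 3 channels.

    img: (H, W, 3) array, assumed to be already in the target color space.
    """
    pixels = img.reshape(-1, 3).astype(float)
    mean = pixels.mean(axis=0)
    centered = pixels - mean
    var = (centered ** 2).mean(axis=0)
    # Assumed convention: cube root of the third central moment.
    skew = np.cbrt((centered ** 3).mean(axis=0))
    return np.concatenate([mean, var, skew])

def hsv_histogram(hsv, bins=(8, 8, 4)):
    """Return a normalized joint histogram with 8 * 8 * 4 = 256 bins.

    hsv: (H, W, 3) array with hue, saturation, and value each in [0, 1].
    """
    pixels = hsv.reshape(-1, 3)
    # Quantize each channel; a value of exactly 1.0 falls into the last bin.
    idx = [np.minimum((pixels[:, c] * b).astype(int), b - 1)
           for c, b in enumerate(bins)]
    flat = (idx[0] * bins[1] + idx[1]) * bins[2] + idx[2]
    hist = np.bincount(flat, minlength=bins[0] * bins[1] * bins[2]).astype(float)
    return hist / hist.sum()
```

Concatenating the two outputs gives 9 + 256 = 265 color dimensions; the texture and edge descriptors are not sketched here.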
Fig. 4. Flowchart of the CBIR system.
Fig. 5. SVM hyperplane is diverse for different combinations of features.
Fig. 6. SVM hyperplane is unstable and complex when dealing with small size of training set.

Generally, these features are combined into a feature vector, which results in a vector with 510 values (i.e., 9 + 256 + 240 + 5 = 510). Then, all feature components are normalized to normal distributions with zero mean and unit standard deviation to represent the images.

V. EXPERIMENTAL RESULTS ON A REAL-WORLD IMAGE DATABASE

A. Intrinsic Problems in Traditional Two-Class SVM RF

An image is usually represented as a high-dimensional feature vector in CBIR. However, one key issue in RF is which subset of features can reflect the basic properties of different groups of feedback samples and benefit the construction of the optimal classifier. This problem can be illustrated with some real-world data in RF. There are five positive feedback samples and five negative feedback samples. We randomly select two features to construct the optimal SVM hyperplane, and we repeat this three times. As shown in Fig. 5, the resultant SVM classifiers vary with different combinations of features.

It is essential to obtain a satisfactory classifier when the number of available feedback samples is small, which is always the case in RF, particularly in the first few rounds of feedbacks. Therefore, we first show a simple example to simulate the instability of SVM when dealing with a small number of training samples. The open circles in Fig. 6 indicate the positive feedback samples, and the plus points indicate the negative samples in the RF. Fig. 6(a) shows an optimal hyperplane, which is trained by the original training samples. Fig. 6(b) and (c) show different optimal hyperplanes, which are trained by the original training set with only one and two incremental positive samples, respectively. From Fig. 6, we can see that the hyperplane of the SVM classifier changes sharply when a new incremental sample is integrated into the original training set. Additionally, we can also note that the optimal hyperplanes of SVM are much more complex when the feedbacks have a complicated distribution.

Note that similar results have been indicated in previous research [25]. However, here, we have shown slightly different problems in the traditional SVM RF, that is, the distinct properties of feedback samples in RF and the unstable and complex hyperplanes of the traditional SVM in the first few rounds of feedbacks.

B. Feature Extraction Based on Different Methods

Six experiments are conducted to compare the BMMA with the traditional LDA, the BDA method, and a graph-embedding approach, i.e., MFA, in finding the most discriminative directions. We plot the directions that correspond to the
largest eigenvalue of the decomposed matrices for LDA, BDA, MFA, and BMMA, respectively. From these examples, we can clearly see that LDA can find the best discriminative direction when the data from each class are distributed as Gaussian with similar covariance matrices, as shown in Fig. 7(a) and (d), but it may fail when the data distribution is more complicated, as given in Fig. 7(b), (c), (e), and (f). Biased toward the positive samples, BDA can find the direction in which the positive samples are well separated from the negative samples when the positive samples have a Gaussian distribution, but it may also fail when the distribution of the positive samples is more complicated. For instance, in Fig. 7(b), BDA can find the direction for distinguishing positive samples from negative ones. However, in Fig. 7(c), (e), and (f), when the positive samples have a more complicated distribution, the BDA algorithm obviously fails. MFA can also find the discriminative direction when the distribution of negative feedbacks is simple, as shown in Fig. 7(a)–(c). When the negative samples have a more complicated distribution, MFA fails, as in Fig. 7(d)–(f). Biased toward positive samples, the BMMA method can find the most discriminative direction in all six experiments based on local analysis, since it does not make any assumptions on the distributions of the positive and negative samples. It should be noted that BMMA is a linear method, and therefore, we only give the comparison results of the aforementioned linear methods.

Fig. 7. Four feature extraction methods (i.e., LDA, BDA, MFA, and BMMA) for two classes of samples (i.e., positive samples and negative samples) with different distributions.

C. Statistical Experimental Results

Here, we evaluate the performance of the proposed scheme on a real-world image database.
We use the precision–scope curve, the precision rate, and the standard deviation to evaluate the effectiveness of the image retrieval algorithms. The scope is specified by the number of top-ranked images presented to the user. The precision is the major evaluation criterion, which evaluates the effectiveness of the algorithms. The precision–scope curve describes the precision at various scopes and gives an overall performance evaluation of the approaches. The precision rate is the ratio of the number of relevant images retrieved to the number of top retrieved images, which emphasizes the precision at a particular value of scope. The standard deviation describes the stability of different algorithms. Therefore, the precision evaluates the effectiveness of a given algorithm, and the corresponding standard deviation evaluates the robustness of the algorithm.

We empirically select the parameters according to manifold learning approaches. Considering the computational efficiency, we randomly select 300 unlabeled samples in each round of feedback iteration. For the tradeoff parameter between labeled samples and unlabeled samples, we simply use a fixed value. For all the SVM-based algorithms, we choose the Gaussian kernel, i.e.,

[Eq. (28)]

Note that the kernel parameters and the kernel type can significantly affect the retrieval performance. For different image databases, we should tune the kernel parameters and the kernel type carefully. In our experiments, we determine the kernel parameters from a series of values according to the performance. Moreover, much better performance can be achieved by tuning the kernel parameters further for different queries.

1) Experiments on a Small-Scale Image Database: In order to show how efficient the proposed BMMA combined with SVM is in dealing with the asymmetric properties of feedback samples, the first evaluation experiment is executed on a small-scale database, which includes 3899 images in 30 different categories. We use all 3899 images in the 30 categories as queries.
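The evaluation measures defined earlier in this subsection (precision at a given scope, and the precision–scope curve) can be made concrete with a small sketch. The function names are hypothetical, and the input is assumed to be a list of ground-truth relevance flags ordered from the best-ranked result downward.

```python
def precision_at_scope(ranked_labels, scope):
    """Fraction of relevant images among the top `scope` ranked results.

    ranked_labels: sequence of booleans (True = relevant), best-ranked first.
    """
    if scope <= 0:
        raise ValueError("scope must be positive")
    top = ranked_labels[:scope]
    return sum(1 for relevant in top if relevant) / scope

def precision_scope_curve(ranked_labels, scopes):
    """Precision evaluated at each scope in `scopes`."""
    return [precision_at_scope(ranked_labels, s) for s in scopes]

# Example: 10 ranked results, 4 of them relevant.
ranked = [True, False, True, True, False, False, False, True, False, False]
curve = precision_scope_curve(ranked, [2, 5, 10])  # [0.5, 0.6, 0.4]
```

Averaging such values over all queries gives the average precision at each scope, and the spread of the per-query (or per-fold) values gives the standard deviation used as a stability measure.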
Some example categories used in experiments are shown in Fig. 8. To avoid the potential problem caused by the asymmetric amount of positive and negative feedbacks [25], we selected an equal number of positive and negative feedbacks here. In practice, the first five query-relevant images and the first five irrelevant images in the top 20 retrieved images in the previous iteration were automatically selected as positive and negative feedbacks, respectively.

Fig. 8. Example categories used in the small-size image database.
Fig. 9. Average precisions in the top 20 results of SVM, OCCA SVM, and BMMA SVM after two rounds of feedback.

In [33], OCCA was proposed to analyze only the positive feedbacks for the SVM RF in a retrieval task. Hence, we compared the RF performance of BMMA combined with SVM (BMMA SVM), OCCA combined with SVM (OCCA SVM), and the traditional SVM here. In the real world, it is not practical to require the user to label many samples. Therefore, the small number of training samples will cause severe instability in SVM RF (as shown in Section V-A). Fig. 9 shows the precisions in the top 20 results after the second round of feedback iteration for all 30 categories. The baseline curve describes the initial retrieval results without any feedback information. Specifically, at the beginning of retrieval, the Euclidean distances in the original high-dimensional space are used to rank the images in the database. After the user provides RFs, the traditional SVM, BMMA SVM, and OCCA SVM algorithms are then applied to sort the images in the database. As shown, the retrieval performance of these algorithms varies across categories. For some easy categories, all the algorithms perform well (for Categories 2 and 4, even the baseline achieves over 90% precision). For some hard categories, all the algorithms perform poorly (e.g., Categories 18, 20, and 24).
After two rounds of feedbacks, all the algorithms are significantly better than the baseline, and this indicates that the RFs provided by the user are very helpful in improving the retrieval performance.

Fig. 10 shows the average precision–scope curves of the algorithms for the first and second iterations. We can see that both BMMA SVM and OCCA SVM perform much better than the traditional SVM on the entire scope, particularly in the first round of feedback. The main reason is that, in the first round of feedback iteration, the number of training samples is particularly small (usually eight to ten training samples in total), and this makes SVM perform extremely poorly. The BMMA SVM and OCCA SVM algorithms can significantly improve the performance of SVM by treating the positive and negative feedbacks unequally. Therefore, we can conclude that a technique that treats the feedback samples asymmetrically (i.e., biased toward the positive feedbacks) can significantly improve the performance of SVM RF, which treats the feedback samples equally. As shown in Fig. 10(b), as the number of feedbacks increases, the performance difference between the enhanced algorithms and the traditional SVM becomes small. Generally, by iteratively adding the user's feedbacks, more samples will be fed back as training samples, which makes the performance of SVM much more stable. Meanwhile, the dimensionality of BMMA and OCCA decreases as the number of positive feedbacks increases. Consequently, the performance of BMMA SVM and OCCA SVM will be degraded by overfitting. Therefore, the performance difference between the enhanced algorithms and the traditional SVM becomes small. However, the performance in the first few rounds of feedbacks is usually the most important since the user would not like to provide more rounds of feedback iteration.
In the first few rounds of feedbacks, the classifier trained on few labeled training samples is not reliable, but its performance can be improved when more images are labeled by the user in the subsequent feedback iterations. Both BMMA and OCCA can significantly improve the performance of the traditional SVM in the first two iterations. Therefore, we can conclude that BMMA can effectively integrate the distinct properties of the two groups of feedback samples into the retrieval process and thus enhance the performance.
Fig. 10. Precision–scope curves after the first and second feedbacks for SVM, OCCA SVM, and BMMA SVM.

2) Experiments on a Large-Scale Image Database: We designed a slightly different feedback scheme to model the real-world retrieval process. In a real image retrieval system, a query image is usually not in the image database. To simulate such an environment, we use fivefold cross validation to evaluate the algorithms. More precisely, we divide the whole image database into five subsets of equal size. Thus, each subset contains 20% of the images of each category. At each run of cross validation, one subset is selected as the query set, and the other four subsets are used as the database for retrieval. Then, 400 query samples are randomly selected from the query subset, and the RF is automatically implemented by the system. For each query image, the system retrieves and ranks the images in the database, and nine RF iterations are automatically executed.

At each iteration of the RF process, the top 20 images are picked from the database and labeled as relevant and irrelevant feedbacks. Generally, in real-world retrieval systems, the negative samples largely outnumber the positive ones. To simulate such a case in the retrieval system, the first three relevant images are labeled as positive feedbacks, and all the other irrelevant images in the top 20 results are automatically marked as negative feedbacks. Note that the images that have been selected at the previous iterations are excluded from later selections. The experimental results are shown in Figs. 11 and 12. The average precision and the standard deviation are computed from the fivefold cross validation. To demonstrate the effectiveness of the proposed scheme, we compare it with the traditional SVM, OCCA SVM, BDA SVM, and one-class SVM (OneSVM).
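The automatic feedback labeling used in this simulated setting can be sketched as follows. This is one possible reading of the protocol rather than the authors' code: the function name `select_feedback` and its parameters are hypothetical, previously labeled images are assumed to be removed before the top 20 are taken, and relevant images beyond the first three are assumed to be left unlabeled.

```python
def select_feedback(ranked_ids, relevant_ids, already_labeled,
                    top_n=20, max_positive=3):
    """Simulate one round of automatic relevance feedback.

    ranked_ids: image ids ordered by the current ranking, best first.
    relevant_ids: set of ids sharing the query's ground-truth concept.
    already_labeled: set of ids used as feedback in earlier rounds.
    Returns (positives, negatives) as lists of image ids.
    """
    # Assumption: earlier feedback images are skipped before taking the top_n.
    candidates = [i for i in ranked_ids if i not in already_labeled][:top_n]
    positives, negatives = [], []
    for image_id in candidates:
        if image_id in relevant_ids:
            if len(positives) < max_positive:
                positives.append(image_id)
            # Assumption: further relevant images stay unlabeled.
        else:
            negatives.append(image_id)
    return positives, negatives

# Example round: ids 0 and 1 were labeled earlier and are skipped.
pos, neg = select_feedback(list(range(30)), {1, 4, 5, 9, 25}, {0, 1})
```

With these defaults, each round yields at most three positives and up to 20 negatives, which reproduces the imbalance between negative and positive feedbacks that this experiment is designed to model.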
The traditional SVM regards the RF as a strict two-class classification problem, with equal treatment of both positive and negative samples. OCCA tries to find a subspace in which all positive samples are merged, and then, the traditional SVM is implemented to retrieve the images relevant to the query image. For BDA, we select all the eigenvectors with eigenvalues larger than 1% of the maximum eigenvalue, which is the common way to select the dimensionality of the subspace, and then, the traditional SVM is used to classify the relevant and irrelevant images. OneSVM treats the RF as a one-class classification problem and estimates the distribution of the target images in the feature space.

Figs. 11 and 12 show the average precision and the standard deviation curves of different algorithms, respectively. SemiBMMA SVM outperforms all the other algorithms on the entire scope. Both BMMA and OCCA can improve the performance of the traditional SVM RF, as shown in Fig. 11(a)–(d). Compared with OCCA SVM, BMMA SVM performs much better for all the top results, since BMMA takes both the positive and negative feedbacks into consideration. However, both BMMA and OCCA will encounter the overfitting problem, i.e., both of them combined with SVM will degrade the performance of SVM after a few rounds of feedbacks, although they can improve the performance of SVM in the first few rounds of feedback. As shown in Fig. 11(d)–(f), with the increase in rounds of feedbacks, OCCA SVM performs poorly in comparison with the traditional SVM. At the same time, the performance difference between BMMA SVM and the traditional SVM gets smaller.
SemiBMMA combined with SVM can significantly improve the performance of the traditional SVM, since it can effectively utilize the basic property of the different groups of the feedback samples and integrate the information of the unlabeled samples into the construction of the classifier. For the top ten results, BDA SVM can achieve better results than the traditional SVM RF. However, the BDA algorithm still discards much information contained in the orthogonal complement components of the positive samples. Therefore, BDA combined with the traditional SVM performs much worse than the traditional SVM RF, as shown in Fig. 11(b)–(f). Although OneSVM tries to estimate the distribution of the target images in the feature space, it cannot work well without the help of negative samples.

Considering the stability of the algorithms, we can also see that SemiBMMA SVM and BMMA SVM perform best among all the algorithms for the top 10, 20, and 30 results. Although OneSVM shows good stability for the top 40, 50, and 60 results, its average precision for retrieval is too low.

We should point out that the performance difference of the algorithms between the experiments in Section V-C1 and the experiments here is mainly caused by the different experimental settings, because the numbers of positive and negative feedbacks are equal in Section V-C1, while negative feedbacks largely outnumber positive feedbacks in Section V-C2. Additionally, SemiBMMA SVM does not perform better than BMMA SVM in the first two rounds of feedback iterations for most of the results. This is mainly because the maximum margin between different classes is essentially important when the number of training samples is extremely small.

The detailed results of all the algorithms after the ninth feedback are shown in Table II. As shown, SemiBMMA combined
with SVM integrates all the available information into the RF iteration and achieves much better performance compared with other approaches for all the top results. BMMA SVM still obtains satisfactory performance, as compared with the traditional SVM and OCCA SVM. Therefore, we can conclude that the proposed BMMA and SemiBMMA combined with the SVM RF have shown much better performance than the traditional SVM RF (i.e., directly using the SVM as an RF scheme) for CBIR.

Fig. 11. Average precisions in the top 10–60 results of the six approaches from the fivefold cross validation.
Fig. 12. Standard deviations in the top 10–60 results of the six approaches from the fivefold cross validation.

TABLE II. AVERAGE PRECISIONS IN TOP RESULTS OF THE SIX ALGORITHMS AFTER THE NINTH FEEDBACK ITERATION (MEAN ± STANDARD DEVIATION)

3) Visualization of the Retrieval Results: In the previous subsections, we have presented some statistical quantitative results of the proposed scheme. Here, we show the visualization of retrieval results.

In experiments, we randomly select some images (e.g., bobsled, cloud, cat, and car) as the queries and perform the RF process based on the ground truth. For each query image, we do four RF iterations. For each RF iteration, we randomly select some relevant and irrelevant images as positive and negative feedbacks from the first screen, which contains 20 images in total. About four positive and four negative feedbacks are selected. We choose them according to the ground truth of the images, i.e., whether they share the same concept with the query image or not. Fig. 13 shows the experimental results. The query images are given as the first image of each row. We show the top ten images of the initial results without feedback and of SemiBMMA SVM after four feedback iterations, respectively, and incorrect results are highlighted by green boxes. From the results, we can see that our proposed scheme can significantly improve the performance of the system. For the first, second, and fourth query images, our system produces ten relevant images out of the top ten retrieved images. For the third query image, our system produces nine relevant images out of the top ten retrieved images. Therefore, SemiBMMA SVM can effectively detect the homogeneous concept shared by the positive samples and hence improve the performance of the retrieval system.

Fig. 13. Top ten results for four different query images based on initial results and SemiBMMA after four rounds of feedback iteration.

VI. CONCLUSION AND FUTURE WORK

SVM-based RF has been widely used to bridge the semantic gap and enhance the performance of CBIR systems. However, directly using SVM as an RF scheme has two main drawbacks. First, it treats the positive and negative feedbacks equally, although this assumption is not appropriate since all positive feedbacks share a common concept, while the negative feedbacks belong to diverse concepts. Second, it does not take into account the unlabeled samples, although they are very helpful in constructing a good classifier. In this paper, we have explored solutions based on the argument that different semantic concepts live in different subspaces and each image can live in many different subspaces.
We have designed BMMA and SemiBMMA to alleviate the two drawbacks in the traditional SVM RF. The novel approaches can distinguish the positive feedbacks from the negative feedbacks by maximizing the local margin and integrate the information of the unlabeled samples by introducing a Laplacian regularizer. Extensive experiments on a large real-world Corel image database have shown that the proposed scheme combined with the traditional SVM RF can significantly improve the performance of CBIR systems.

Despite the promising results, several questions remain to be investigated in our future work. First, this approach involves
dense-matrix eigendecomposition, which can be computationally expensive in both time and memory. Therefore, an effective computational technique is required to alleviate this drawback. Second, theoretical questions need to be investigated regarding how the proposed scheme affects the generalization error of classification models. More specifically, we expect to get a better tradeoff between the integration of the distinct properties of feedbacks and the generalization error of the classifier.

ACKNOWLEDGMENT

The authors would like to thank Prof. J. Z. Wang (with the College of Information Sciences and Technology, Pennsylvania State University) for kindly providing the Corel Image Gallery. The authors would also like to thank the handling Associate Editor and the three anonymous reviewers for their constructive comments on this manuscript in the three rounds of review.

REFERENCES

[1] A. W. M. Smeulders, M. Worring, S. Santini, A. Gupta, and R. Jain, “Content-based image retrieval at the end of the early years,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 22, no. 12, pp. 1349–1380, Dec. 2000.
[2] R. Datta, D. Joshi, J. Li, and J. Z. Wang, “Image retrieval: Ideas, influences, and trends of the new age,” ACM Comput. Surv., vol. 40, no. 2, pp. 1–60, Apr. 2008.
[3] M. J. Swain and D. H. Ballard, “Color indexing,” Int. J. Comput. Vis., vol. 7, no. 1, pp. 11–32, Nov. 1991.
[4] G. Pass, R. Zabih, and J. Miller, “Comparing images using color coherence vectors,” in Proc. ACM Multimedia, 1996, pp. 65–73.
[5] Y. Rubner, J. Puzicha, C. Tomasi, and J. M. Buhmann, “Empirical evaluation of dissimilarity measures for color and texture,” Comput. Vis. Image Understand., vol. 84, no. 1, pp. 25–43, Oct. 2001.
[6] H. Tamura, S. Mori, and T. Yamawaki, “Texture features corresponding to visual perception,” IEEE Trans. Syst., Man, Cybern., vol. SMC-8, no. 6, pp. 460–473, Jun. 1978.
[7] J. Mao and A. Jain, “Texture classification and segmentation using multiresolution simultaneous autoregressive models,” Pattern Recognit., vol. 25, no. 2, pp. 173–188, Feb. 1992.
[8] A. Jain and A. Vailaya, “Image retrieval using color and shape,” Pattern Recognit., vol. 29, no. 8, pp. 1233–1244, Aug. 1996.
[9] W. Niblack, R. Barber, W. Equitz, M. Flickner, E. Glasman, D. Petkovic, P. Yanker, C. Faloutsos, and G. Taubino, “The QBIC project: Querying images by content using color, texture, and shape,” in Proc. SPIE Storage and Retrieval for Images and Video Databases, Feb. 1993, pp. 173–181.
[10] A. Jain and A. Vailaya, “Shape-based retrieval: A case study with trademark image databases,” Pattern Recognit., vol. 31, no. 9, pp. 1369–1390, Sep. 1998.
[11] Y. Rui, T. Huang, M. Ortega, and S. Mehrotra, “Relevance feedback: A power tool in interactive content-based image retrieval,” IEEE Trans. Circuits Syst. Video Technol., vol. 8, no. 5, pp. 644–655, Sep. 1998.
[12] X. Zhou and T. Huang, “Relevance feedback for image retrieval: A comprehensive review,” Multimedia Syst., vol. 8, no. 6, pp. 536–544, Apr. 2003.
[13] Y. Rui, T. S. Huang, and S. Mehrotra, “Content-based image retrieval with relevance feedback in MARS,” in Proc. IEEE Int. Conf. Image Process., 1997, vol. 2, pp. 815–818.
[14] T. S. Huang and X. S. Zhou, “Image retrieval by relevance feedback: From heuristic weight adjustment to optimal learning methods,” in Proc. IEEE ICIP, Thessaloniki, Greece, Oct. 2001, pp. 2–5.
[15] J. Laaksonen, M. Koskela, and E. Oja, “PicSOM: Self-organizing maps for content-based image retrieval,” in Proc. IJCNN, Washington, DC, 1999, pp. 2470–2473.
[16] Y. Chen, X.-S. Zhou, and T.-S. Huang, “One-class SVM for learning in image retrieval,” in Proc. IEEE ICIP, 2001, pp. 815–818.
[17] C. Hoi, C. Chan, K. Huang, M. R. Lyu, and I. King, “Biased support vector machine for relevance feedback in image retrieval,” in Proc. IJCNN, Budapest, Hungary, Jul. 25–29, 2004, pp. 3189–3194.
[18] X. He, D. Cai, and J. Han, “Learning a maximum margin subspace for image retrieval,” IEEE Trans. Knowl. Data Eng., vol. 20, no. 2, pp. 189–201, Feb. 2008.
[19] W. Bian and D. Tao, “Biased discriminant Euclidean embedding for content-based image retrieval,” IEEE Trans. Image Process., vol. 19, no. 2, pp. 545–554, Feb. 2010.
[20] X. Zhou and T. Huang, “Small sample learning during multimedia retrieval using BiasMap,” in Proc. IEEE Int. Conf. Comput. Vis. Pattern Recog., 2001, vol. 1, pp. 11–17.
[21] D. Tao, X. Tang, X. Li, and Y. Rui, “Kernel direct biased discriminant analysis: A new content-based image retrieval relevance feedback algorithm,” IEEE Trans. Multimedia, vol. 8, no. 4, pp. 716–724, Aug. 2006.
[22] D. Xu, S. Yan, D. Tao, S. Lin, and H. Zhang, “Marginal Fisher Analysis and its variants for human gait recognition and content-based image retrieval,” IEEE Trans. Image Process., vol. 16, no. 11, pp. 2811–2821, Nov. 2007.
[23] D. Cai, X. He, and J. Han, “Regularized regression on image manifold for retrieval,” in Proc. 9th ACM SIGMM Int. Workshop MIR, Augsburg, Germany, Sep. 2007, pp. 11–20.
[24] P. Hong, Q. Tian, and T. S. Huang, “Incorporate support vector machines to content-based image retrieval with relevant feedback,” in Proc. IEEE ICIP, Vancouver, BC, Canada, 2000, pp. 750–753.
[25] D. Tao, X. Tang, X. Li, and X. Wu, “Asymmetric bagging and random subspace for support vector machines-based relevance feedback in image retrieval,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 28, no. 7, pp. 1088–1099, Jul. 2006.
[26] G. Guo, A. K. Jain, W. Ma, and H. Zhang, “Learning similarity measure for natural image retrieval with relevance feedback,” IEEE Trans. Neural Netw., vol. 13, no. 4, pp. 811–820, Jul. 2002.
[27] V. Vapnik, The Nature of Statistical Learning Theory. New York: Springer-Verlag, 1995.
[28] S. Tong and E. Chang, “Support vector machine active learning for image retrieval,” in Proc. ACM Multimedia, 2001, pp. 107–118.
[29] J. Li, N. Allinson, D. Tao, and X. Li, “Multitraining support vector machine for image retrieval,” IEEE Trans. Image Process., vol. 15, no. 11, pp. 3597–3601, Nov. 2006.
[30] S. Yan, D. Xu, B. Zhang, H. Zhang, Q. Yang, and S. Lin, “Graph embedding and extensions: A general framework for dimensionality reduction,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 29, no. 1, pp. 40–51, Jan. 2007.
[31] S. T. Roweis and L. K. Saul, “Nonlinear dimensionality reduction by locally linear embedding,” Science, vol. 290, no. 22, pp. 2323–2326, Dec. 2000.
[32] J. Tenenbaum, V. Silva, and J. Langford, “A global geometric framework for nonlinear dimensionality reduction,” Science, vol. 290, no. 22, pp. 2319–2323, Dec. 2000.
[33] D. Tao, X. Tang, and X. Li, “Which components are important for interactive image searching?,” IEEE Trans. Circuits Syst. Video Technol., vol. 18, no. 1, pp. 3–11, Jan. 2008.
[34] M. Belkin and P. Niyogi, “Laplacian Eigenmaps and spectral techniques for embedding and clustering,” in Proc. Adv. NIPS, 2001, vol. 14, pp. 585–591.
[35] X. He, D. Cai, S. Yan, and H. Zhang, “Neighborhood preserving embedding,” in Proc. IEEE ICCV, 2005, pp. 1208–1213.
[36] X. He and P. Niyogi, “Locality preserving projections,” in Proc. Adv. NIPS, 2004, vol. 16, pp. 153–160.
[37] D. Cai, X. He, and J. Han, “Isometric projection,” in Proc. 22nd Conf. Artif. Intell. (AAAI), Vancouver, BC, Canada, Jul. 2007, pp. 528–533.
[38] H. Li, T. Jiang, and K. Zhang, “Efficient and robust feature extraction by maximum margin criterion,” in Proc. Adv. NIPS, 2004, pp. 157–165.
[39] W. Yang, C. Sun, H. S. Du, and J. Yang, “Feature extraction using Laplacian maximum margin criterion,” Neural Process. Lett., vol. 33, no. 1, pp. 99–110, Feb. 2011.
[40] F. Wang and C. Zhang, “Feature extraction by maximizing the average neighborhood margin,” in Proc. IEEE Int. Conf. CVPR, 2007, pp. 1–8.
[41] J. Chen, S. Shan, G. Zhao, X. Chen, W. Gao, and M. Pietikainen, “WLD: A robust descriptor based on Weber’s law,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 32, no. 9, pp. 1705–1720, Sep. 2010.
Lining Zhang (S’11) received the B.Eng. degree and the M.Eng. degree in electronic engineering from Xidian University, Xi’an, China, in 2006 and 2009, respectively. He is currently working toward the Ph.D. degree at Nanyang Technological University, Singapore.
His research interests include computer vision, machine learning, multimedia information retrieval, data mining, and computational intelligence.
Lipo Wang (M’97–SM’98) received the B.S. degree from the National University of Defense Technology, Changsha, China, in 1983, and the Ph.D. degree from Louisiana State University, Baton Rouge, in 1988.
He is currently with the School of Electrical and Electronic Engineering, Nanyang Technological University, Singapore. He has authored or coauthored over 200 papers (of which more than 80 are in journals). He is the holder of a U.S. patent in neural networks. He has coauthored two monographs and edited or coedited 15 books. He was/will be keynote/panel speaker for several international conferences. His research interest is computational intelligence with applications to bioinformatics, data mining, optimization, and image processing.
Dr. Wang is/was an Associate Editor/Editorial Board Member of 20 international journals, including the IEEE TRANSACTIONS ON NEURAL NETWORKS, the IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, and the IEEE TRANSACTIONS ON EVOLUTIONARY COMPUTATION. He is an elected member of the AdCom (Board of Governors, 2010–2012) of the IEEE Computational Intelligence Society (CIS) and served as the IEEE CIS Vice President for Technical Activities (2006–2007) and the Chair of Emergent Technologies Technical Committee (2004–2005). He is an elected member of the Board of Governors of the International Neural Network Society (2011–2013) and a CIS Representative to the AdCom of the IEEE Biometrics Council.
He was the President of the Asia-Pacific Neural Network Assembly (APNNA) in 2002/2003 and received the 2007 APNNA Excellent Service Award. He was the Founding Chair of both the IEEE Engineering in Medicine and Biology Singapore Chapter and the IEEE Computational Intelligence Singapore Chapter. He serves/served as the IEEE EMBC 2011 and 2010 Theme Cochair, the International Joint Conference on Neural Networks (IJCNN) 2010 Technical Co-Chair, the CEC 2007 Program Cochair, the IJCNN 2006 Program Chair, as well as on the steering/advisory/organizing/program committees of over 180 international conferences.
Weisi Lin (M’92–SM’98) received the B.Sc. degree in electronics and the M.Sc. degree in digital signal processing from Zhongshan University, Guangzhou, China, in 1982 and 1985, respectively, and the Ph.D. degree in computer vision from King’s College, London University, London, U.K., in 1992.
He taught and conducted research with Zhongshan University; Shantou University, Shantou, China; Bath University, Bath, U.K.; the National University of Singapore, Singapore; the Institute of Microelectronics, Singapore; and the Institute for Infocomm Research, Singapore. He has been the Project Leader of 13 major successfully delivered projects in digital multimedia technology development. He also served as the Laboratory Head of Visual Processing and the Acting Department Manager of Media Processing with the Institute for Infocomm Research. He is currently an Associate Professor with the School of Computer Engineering, Nanyang Technological University, Singapore. He is the holder of ten patents. He edited one book, authored one book and five book chapters, and has published over 180 refereed papers in international journals and conferences. His areas of expertise include image processing, perceptual modeling, video compression, multimedia communication, and computer vision.
Dr. Lin is a Fellow of the Institution of Engineering Technology. He is also a Chartered Engineer (U.K.).
He organized special sessions in the IEEE International Conference on Multimedia and Expo (ICME 2006), the IEEE International Workshop on Multimedia Analysis and Processing (2007), the IEEE International Symposium on Circuits and Systems (ISCAS 2010), the Pacific-Rim Conference on Multimedia (PCM 2009), the SPIE Visual Communications and Image Processing (VCIP 2010), the Asia Pacific Signal and Information Processing Association (APSIPA 2011), the MobiMedia 2011, and the ICME 2012. He gave invited/keynote/panelist talks in the International Workshop on Video Processing and Quality Metrics (2006), the IEEE International Conference on Computer Communications and Networks (2007), the SPIE VCIP 2010, and the IEEE Multimedia Communication Technical Committee (MMTC) Interest Group of Quality of Experience for Multimedia Communications (2011), and tutorials in the PCM 2007, the PCM 2009, the IEEE ISCAS 2008, the IEEE ICME 2009, the APSIPA 2010, and the IEEE International Conference on Image Processing (2010). He is currently on the editorial boards of the IEEE TRANSACTIONS ON MULTIMEDIA, the IEEE SIGNAL PROCESSING LETTERS, and the Journal of Visual Communication and Image Representation, and four IEEE Technical Committees. He cochairs the IEEE MMTC Special Interest Group on Quality of Experience. He has been on Technical Program Committees and/or Organizing Committees of a number of international conferences.