21. a memory learning framework for effective image retrieval



Published in: Technology, Education

  1. 1. IEEE TRANSACTIONS ON IMAGE PROCESSING, VOL. 14, NO. 4, APRIL 2005

A Memory Learning Framework for Effective Image Retrieval

Junwei Han, King N. Ngan, Fellow, IEEE, Mingjing Li, and Hong-Jiang Zhang, Fellow, IEEE

Abstract: Most current content-based image retrieval systems are still incapable of providing users with their desired results. The major difficulty lies in the gap between low-level image features and high-level image semantics. To address the problem, this study reports a framework for effective image retrieval that employs a novel idea of memory learning. It forms a knowledge memory model to store semantic information by simply accumulating user-provided interactions. A learning strategy is then applied to predict the semantic relationships among images according to the memorized knowledge. Image queries are finally performed based on a seamless combination of low-level features and learned semantics. One important advantage of our framework is its ability to efficiently annotate images and also propagate the keyword annotation from labeled images to unlabeled images. The presented algorithm has been integrated into a practical image retrieval system. Experiments on a collection of 10 000 general-purpose images demonstrate the effectiveness of the proposed framework.

Index Terms: Annotation propagation, image retrieval, memory learning, relevance feedback, semantics.

Manuscript received November 3, 2003; revised March 24, 2004. This work was supported in part by Nanyang Technological University, Singapore. The associate editor coordinating the review of this manuscript and approving it for publication was Dr. Gopal Pingali.
J. Han and K. N. Ngan are with the Department of Electronic Engineering, The Chinese University of Hong Kong, Shatin, N.T., Hong Kong (e-mail: junweihan@hotmail.com; knngan@ee.cuhk.edu.hk).
M. Li and H.-J. Zhang are with Microsoft Research Asia, Beijing 100080, China (e-mail: mjli@microsoft.com; hjzhang@microsoft.com).
Digital Object Identifier 10.1109/TIP.2004.841205
1057-7149/$20.00 © 2005 IEEE

I. INTRODUCTION

Due to the rapidly growing amount of digital image data on the Internet and in digital libraries, there is a great need for large image database management and effective image retrieval tools. Content-based image retrieval (CBIR) is the set of techniques for searching for similar images in an image database using automatically extracted image features.

Tremendous research has been devoted to CBIR, and a variety of solutions have been proposed within the past ten years. By and large, research activities in CBIR have progressed in three major directions [5]: global features based, object/region-level features based, and relevance feedback. The systems developed initially [1], [2] are usually based on carefully selected global image features, such as color, texture, or shape, and a prefixed similarity measure. They are easy to implement and perform well for images that are either simple or contain few semantic contents (for example, medical images and face images). However, with these systems it is impossible to search for objects or regions of the image. Therefore, the second group of systems [3]–[5] is built on image segmentation. In contrast to global feature-based systems, these schemes are designed to search for "things" by extracting local features from segmented regions and describing images on the region or object level [5]. The performance of these systems relies mainly on the results of segmentation. Therefore, they cannot deliver very good performance, since image segmentation is still an open problem in computer vision.

The limited retrieval accuracy of image-centric retrieval systems is essentially due to the inherent gap between semantic concepts and low-level features. In order to reduce the gap, interactive relevance feedback (RF) was introduced into CBIR. RF, originally developed for textual document retrieval [8], is a supervised learning algorithm used to improve the performance of information systems. Its basic idea is to incorporate the subjectivity of human perception into the query process and to give users the opportunity to evaluate the retrieval results. The similarity measures are automatically refined on the basis of these evaluations. Since RF for CBIR was first proposed by Rui et al. [9], this area of research has attracted much attention and become active in the CBIR community. Many groups have reported their RF techniques [5], [7], [9]–[14].

Recently, many researchers began to consider RF as a learning or classification problem. That is, a user provides positive and/or negative examples, and the system learns from such examples to refine the retrieval results, or trains a classifier on the labeled examples to separate all data into relevant and irrelevant groups. Hence, many classical machine learning schemes may be applied to RF, including decision tree learning [17], Bayesian learning [10], [7], support vector machines (SVM) [14], boosting [18], and so on. For the latest developments in RF, please refer to [15] and [16].

Although RF can significantly improve the retrieval performance, its applicability still suffers from three inherent drawbacks.

1) Incapability of capturing semantics. Most RF techniques in CBIR directly copy ideas from textual information retrieval. They simply replace keywords with low-level features and then adopt the vector model for document retrieval to perform interactions. This strategy works well under the premise that low-level features are as powerful in representing the semantic content of images as keywords are in representing textual information. Unfortunately, this requirement is often not satisfied. Therefore, it is difficult to capture high-level semantics of images when only low-level features are used in RF.

2) Scarcity and imbalance of feedback examples. Very few users are willing to go through endless iterations of
  2. 2. feedback in the hope of getting the best results. Hence, the number of feedback examples labeled by users during an RF session is far smaller than the dimension of the low-level features that characterize an image. Because of such small training data sizes, many classical learning algorithms cannot give satisfactory results. Furthermore, in the RF scenario, the number of labeled negative examples is usually greater than the number of labeled positive examples. As pointed out in [19], the imbalance of training data always makes classification learning less reliable. Thus, the scarcity of feedback examples, especially positive examples, limits the accuracy of RF.

3) Lack of a memory mechanism. A disadvantage of traditional RF is that the semantic knowledge potentially obtained in the feedback processes of one query session is not memorized to continuously improve the retrieval performance [6], [15]. Even with the same query, a user has to go through the same, often tedious, feedback process to get the same result, despite having given the same query and feedback before. Hence, there is an urgent need to build a memory mechanism that accumulates and learns the semantic information provided by past user interactions.

To overcome the aforementioned difficulties, another school of thought [6], [20]–[23], generally called long-term learning, has emerged in recent years. These methods memorize and accumulate users' preferences in the RF process. The historical retrieval experience is then used to guide new users' queries. These long-term learning algorithms are mainly based on previous users' behaviors, which embody more semantic information than low-level features. They try to narrow the well-known gap by using other persons' subjectivity, because image content understanding is very difficult with the present state of computer vision and image processing technology.

Actually, the idea of long-term learning in CBIR is borrowed from work on collaborative filtering [24]–[26] and link structure analysis [27], [28] in web information retrieval. Collaborative filtering is a technique for predicting the preferences of unknown users by using the known attitudes of other users. It is built on the assumption that a good way to find interesting things is to discover other people who have similar interests, and then recommend objects that those similar people like. Unlike collaborative filtering, many web search engines search for web pages by link structure analysis. Two basic assumptions of link structure analysis are: pages that are co-cited by a certain page are likely to relate to the same topic, and pages that are often visited in succession by a certain user are possibly similar. The common idea behind these two techniques is to estimate similarity between objects from users' behaviors instead of object contents.

Without doubt, long-term learning methods can achieve better retrieval precision than traditional RF techniques. However, they inevitably encounter two problems in practice. One is the sparsity of memorized feedback information. The quality of long-term learning relies strongly on the amount of user log that the system has stored so far. Because of the large database and limited interactions, it is not easy to collect sufficient log information. Hence, long-term learning algorithms are not very useful when the user log is scarce with respect to the scale of the image database. The other problem is that most long-term learning approaches only recommend the memorized semantic knowledge to users but lack a learning ability to predict hidden semantics from the acquired semantics. Strictly speaking, there is no learning, or only limited learning, in such existing long-term learning systems.

In practical image retrieval systems, many users prefer using keywords to conduct queries [33]. Hence, images must first be annotated to support keyword searches. In general, two ways are employed to annotate images: full annotation and partial annotation [31]. The former manually labels all images in the database. Although manual annotation is considered the best choice in terms of accuracy, it is not a feasible solution because human labeling is tedious and expensive. This was what motivated CBIR research a few years ago. The latter first manually marks only a small subset of images. Then, the annotations are propagated from the small number of marked images to a large number of unmarked images according to a similarity measure or classical learning algorithms.

Lately, research on image annotation and annotation propagation has been attracting growing interest [15], [29]–[33], [37], [38]. In [29] and [30], annotations are propagated by visual similarity measures. Zhang et al. [32] and Chang et al. [33] focus on active learning for annotation propagation. Liu et al. [31] use RF to improve annotation performance. Zhang et al. [15] further perform annotation propagation by integrating RF with a Bayesian model. Li et al. [37] and Barnard et al. [38] apply machine learning to predict words for images.

Despite many efforts, the accuracy of propagated annotation is still limited. The problem stems from the fact that images close to each other in low-level feature space do not necessarily share the same semantic meaning, whereas most of the above-mentioned systems assume that images located near each other in the feature space are likely to relate to similar keywords.

As analyzed above, when designing an effective image retrieval system, at least the following two issues should be considered: how to reduce the gap between low-level features and high-level semantic concepts, and how to annotate images and propagate the annotations efficiently. In this paper, we propose a novel memory learning framework to address those two issues. For the first issue, we introduce a feedback knowledge memory model to accumulate previous users' preferences. Furthermore, a learning strategy is presented to predict hidden semantics using the memorized information, which reduces the limitation of user log sparsity to a certain extent. The feedback knowledge memory model and the learning strategy are jointly known as memory learning. In the process of image retrieval, memory learning can capture semantics from previous users' behaviors instead of image contents. Memory learning and low-level feature-based RF are then combined to improve the retrieval performance. According to the memorized knowledge, memory learning provides additional positive examples to low-level feature-based RF, which alleviates the problem of scarcity and imbalance of feedback examples. In the meantime, the improved low-level feature-based RF suggests more fresh knowledge for memory learning to memorize. In other words, the mutual reinforcement of memory learning
  3. 3. and low-level feature-based RF enhances the system's ability to grasp semantics.

To address the second issue, we propose an annotation propagation scheme that works on the semantic level through memory learning. It annotates images and propagates the annotations using both memorized and learned semantic information.

Here, we summarize our contributions as follows.

1) A feedback knowledge memory model is presented to gather the users' feedback information during the process of image search and feedback. It is efficient and simple to implement.

2) A learning strategy based on the memorized information is proposed. It can estimate the hidden semantic relationships among images. Consequently, this technique can address the problem of user log sparsity to a certain extent.

3) During the interactive process, a seamless combination of normal RF (low-level feature based) and memory learning (semantics based) is proposed to improve the retrieval performance. Notice that this combination is not a pure linear summation. Memory learning provides the normal RF with a pool of positive examples according to its captured knowledge, which helps the normal RF to alleviate the problem of scarcity and imbalance of feedback examples.

4) A semantics-based image annotation propagation scheme is proposed using both memorized and learned semantics. Its precision is much better than that of existing algorithms that propagate annotation by visual similarity.

The rest of this paper is organized as follows. In Section II, we briefly review the related work. In Section III, we present the feedback knowledge memory model. In Section IV, the learning strategy to estimate the hidden semantics is described. In Section V, the image retrieval framework by memory learning is explained. The experimental results are shown in Section VI. Finally, concluding remarks are given in Section VII.

II. REVIEW OF RELATED WORK

This section discusses some previous work in long-term learning and image annotation propagation.

A. Related Work in Long-Term Learning

As previously stated, most traditional RF methods take into account only the current query session, while the semantics captured from past users are lost. Thus, in recent years, a number of long-term learning models [6], [20]–[23] have been presented to gradually improve the retrieval performance by accumulating the user query log.

The information-embedding framework [20] was probably the first attempt to explicitly memorize users' behaviors to improve retrieval accuracy. Its basic idea is to embed semantic information into CBIR processes through RF using a semantic correlation matrix and low-level feature distances. The semantic relationships among images are gained and embedded into the system by splitting/merging image clusters and updating the correlation matrix. Experiments have shown that this new framework is effective. However, it is complex in terms of computation and implementation, and the system may take a long time to converge.

In [21], Bartolini et al. reported a system called FeedbackBypass. It assumes the existence of a static mapping from each retrieval sample to "optimal" parameters, including query point and distance function. The "optimal" parameters are learned by feedback loops over time. Afterwards, it is possible either to "bypass" the feedback loop completely for already-seen queries or to "predict" near-optimal parameters for new queries.

Li et al. [6] described a bigram correlation model to capture the semantic relationships among images from statistics of users' RF information. The algorithm is simple but effective. Experimental results on a database of 100 000 images demonstrate its ability to improve the retrieval performance.

In [22], a long-term similarity learning algorithm was applied to CBIR. In this model, user feedback refines the current search results. The interaction information is stored to build the semantic similarity among images. This similarity is updated with queries and merged into the content-based similarity.

Recently, He et al. [23] introduced the idea of learning a semantic space from users' RF. It assumes that images relevant to a query belong to a semantic class. By aggregating many feedback iterations, a semantic space is incrementally constructed. In addition, that paper discusses using the singular value decomposition (SVD) to reduce the dimensionality of the semantic space.

All of these systems have obtained good empirical results. However, their performance strongly depends on the amount of gathered user log. A common problem of these methods is that they ignore the reality that it is hard to collect sufficient user interaction data for a non-web-based image retrieval system. In fact, the same situation arises in the area of web information retrieval. Researchers in that area have recognized this problem and suggested some solutions [26], whereas, to the best of our knowledge, very little work has been done in CBIR. Hence, this paper attempts to address this problem to a certain extent by using a learning strategy.

B. Related Work in Image Annotation Propagation

Users are often accustomed to querying by keywords. It is very tedious and expensive to manually label all images of a database. Therefore, a challenge for CBIR systems is how to propagate annotations effectively from labeled images to the rest of the unlabeled images.

Picard and Minka [29] introduced an algorithm to propagate annotation by image texture. It consists of two steps. First, humans label a patch of an image. Then, the label is propagated to other images with similar patches of texture.

In [30], Saber and Tekalp provided an image retrieval and annotation propagation framework based on regions. It first segments the image into regions and merges neighboring regions to form objects. Afterward, each object is compared to a set of given templates; if the match is successful, the annotations of the template are shared by the matched object.

The above two methods propagate annotation in terms of visual similarity, yet some researchers consider this issue from the active learning and classification perspective. Zhang et al.
  4. 4. [32] proposed an active learning system to propagate annotation probabilistically. It estimates the attribute probabilities of unannotated images by means of their annotated neighbors, using kernel regression. The next sample chosen for labeling is the one with a high density of unannotated neighbors.

In [33], Chang et al. presented a soft annotation model (CBSA) that also uses active learning. It starts by labeling a small set of training images, each with one semantic keyword. An ensemble of binary classifiers is then trained to predict, for each unlabeled image, multiple soft keywords. For the binary classifier, that paper experiments with two learning algorithms, SVMs and Bayes point machines (BPMs).

Another class of ideas takes advantage of RF to implement image annotation and annotation propagation. A semi-automatic image annotation framework was suggested in [31]. This work embeds the annotation process in the process of image retrieval and RF. When the user submits a keyword query and then offers RF, the search keyword is automatically added to, or strengthened for, the positive examples, and removed from, or weakened for, the negative examples. The performance of image annotation improves progressively as the number of search and feedback iterations increases.

In [15], Zhang et al. discussed a probabilistic progressive keyword propagation scheme integrating RF with a Bayesian model. While the user is providing feedback in a query session, it supposes that all positive examples belong to the same semantic class and that the features from the same semantic class follow a Gaussian distribution. Therefore, all positive examples in a query session are used to calculate and update the parameters of the corresponding semantic Gaussian classes. Then, the probability of each image in the database belonging to such a semantic class is estimated by the Bayesian model. The common keywords in the positive examples are propagated to the images with a very high probability of belonging to this class.

Of late, there is a novel trend of using machine learning algorithms to learn concepts from images and automatically translate the content of images to text descriptions. A good example of automatic linguistic indexing of images was presented by Li et al. [37]. In this system, categorized images are used to train a dictionary of many two-dimensional multiresolution hidden Markov models (2-D MHMMs), each representing a concept. Because each image category in the training set is annotated by humans, a mapping between 2-D MHMMs and groups of words can be built. To annotate a new image, the likelihood of the image being generated by each 2-D MHMM is first estimated, and then words are picked from the categories yielding the highest likelihoods. This work is successful on a database of 600 image categories. Another representative work linking images and words was done by Barnard et al. [38]. It explores a variety of latent variable models to predict words for both entire images and particular image regions.

It can easily be seen that the underlying assumption of most of the earlier methods is that images with similar visual features should be associated with the same keywords. This assumption often does not hold. Accordingly, this paper tries to propagate annotation using memorized and learned semantic information.

III. FEEDBACK KNOWLEDGE MEMORY MODEL

In this section, we propose a simple statistical model to transform users' preferences during query sessions into semantic correlations among images. Then, a semantic image link network is formed to store those semantic correlations.

The key assumption of the proposed model is that two images share similar semantics if they are jointly labeled as positive examples in a query session. Intuitively, we can estimate the semantic correlation between two images by means of the number of query sessions in which both images are positive examples. A query session contains a query phase and possibly several rounds of feedback. For the sake of simplicity, the number of times that two images are jointly relevant to the same query is referred to as the co-positive-feedback frequency, while the number of times that both are labeled as feedback images and at least one of them is positive is referred to as the co-feedback frequency. Consequently, the correlation strength between two images is defined as the ratio between their co-positive-feedback frequency and their co-feedback frequency. According to this definition, the correlation value lies in the interval between 0 and 1. The larger the correlation is, the more likely it is that the two images are semantically similar to each other. If two images are never marked as positive examples together in a single query session, their correlation value is zero. It is important to note that the proposed model does not assume any correlation between two negative examples, because they may be irrelevant to the user's query in many different ways.

The semantic correlation $R(I, J)$ between image $I$ and image $J$ is formally defined as follows:

[Equation (1) is not recoverable from this transcript.]

$$R(I, J) = \begin{cases} 1, & \text{if } I = J \\ \dfrac{pf(I, J)}{cf(I, J)}, & \text{if } I \neq J \text{ and } cf(I, J) \neq 0 \\ 0, & \text{if } cf(I, J) = 0 \end{cases} \qquad (2)$$

where $pf(I, J)$ is the co-positive-feedback frequency and $cf(I, J)$ denotes the co-feedback frequency.

The semantic correlation is created and updated by accumulating user feedback information, as described below.

1) Initialize all $pf(I, J) = 0$ and $cf(I, J) = 0$.
2) After the $n$th query session offered by a user, collect all feedback images (the query image is treated as a positive example).
3) For each feedback image pair $(I, J)$ in this query session, update $pf(I, J)$ and $cf(I, J)$ as follows: if both $I$ and $J$ are positive examples, $pf(I, J) \leftarrow pf(I, J) + 1$ and $cf(I, J) \leftarrow cf(I, J) + 1$; if only one of them is a positive example, $cf(I, J) \leftarrow cf(I, J) + 1$; otherwise, both frequencies are left unchanged.
4) Recalculate the semantic correlations for all feedback image pairs according to (2).
5) Repeat steps 2)–4) once a new query session is completed.
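The update procedure above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the class name `FeedbackMemory`, its method names, and the choice of returning 1.0 for an image paired with itself are all hypothetical. Counts are kept in dictionaries keyed by an ordered pair, so only pairs that actually occurred together are stored, mirroring the sparse triangular storage the paper describes.

```python
from collections import defaultdict
from itertools import combinations


class FeedbackMemory:
    """Hypothetical sparse store of co-positive-feedback (pf) and
    co-feedback (cf) frequencies between image identifiers."""

    def __init__(self):
        self.pf = defaultdict(int)  # sessions where both images were positive
        self.cf = defaultdict(int)  # sessions where both were labeled, >= 1 positive

    @staticmethod
    def _key(i, j):
        # Order the pair so (i, j) and (j, i) share one entry (symmetry).
        return (i, j) if i <= j else (j, i)

    def record_session(self, positives, negatives):
        """Update the counts after one query session.

        `positives` should include the query image itself, which the
        procedure treats as a positive example.
        """
        positives = set(positives)
        negatives = set(negatives) - positives
        feedback = sorted(positives | negatives)
        for i, j in combinations(feedback, 2):
            n_pos = (i in positives) + (j in positives)
            if n_pos == 0:
                continue  # two negatives: no correlation is assumed
            key = self._key(i, j)
            self.cf[key] += 1
            if n_pos == 2:
                self.pf[key] += 1

    def correlation(self, i, j):
        """Ratio pf/cf, or 0.0 when the pair has no co-feedback history."""
        if i == j:
            return 1.0  # assumption: an image is fully correlated with itself
        key = self._key(i, j)
        cf = self.cf.get(key, 0)
        return self.pf.get(key, 0) / cf if cf else 0.0


memory = FeedbackMemory()
memory.record_session(positives=["query", "a"], negatives=["b"])
memory.record_session(positives=["query"], negatives=["a", "b"])
print(memory.correlation("query", "a"))
```

Computing the ratio on demand, rather than recalculating a stored correlation after every session, is a design simplification; it yields the same values as step 4) for any pair that has been updated.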
  5. 5. Fig. 1. Simple graphical representation of the semantic image link network.

According to the semantic correlations, a semantic image link network can easily be constructed, in which images have links to other images in the database. Its simple graphical representation is shown in Fig. 1. The link intensity on each individual link stands for the degree of semantic relevance between two images. Hence, the link intensity is set to the corresponding semantic correlation. In the network, we say there is a "direct link" between two images $I$ and $J$ if their semantic correlation satisfies $R(I, J) > 0$; otherwise, we say there is no "direct link" between them.

We summarize the characteristics of the feedback knowledge memory model in the following three points.

1) It is able to automatically collect and analyze the users' historical judgments offline without additional cost of user interaction. Also, it hardly influences the speed of the real-time retrieval system.

2) Since the user log accumulates feedback knowledge from various users, the semantic correlations can reflect the preference of the majority of the users. In addition, by using a large amount of user log, the model calculates the semantic correlations from a statistical point of view. Therefore, a small number of erroneous feedbacks does not have a great adverse effect on the final results.

3) Due to the symmetry of the semantic correlation, a triangular matrix is sufficient to keep all information. In order to further reduce the memory size, all items with zero value are excluded. Thus, the representation of the model is simple but highly efficient.

IV. SEMANTIC CORRELATION ANALYSIS BY A LEARNING STRATEGY

Most existing long-term learning algorithms also utilize memory models similar to ours to gather knowledge. They then directly apply the memorized knowledge to improve image retrieval performance. There is no doubt that they get good results, since the recorded information contains the user-perceived semantics. Nevertheless, a problem arises in practice. As can easily be seen from our memory model, only the "direct links" embody the users' preferences, whereas the many image pairs having no "direct link" do not convey any information. Should we thus conclude that two images without a "direct link" are not similar at all? Actually, many cases of two similar images without "direct links" are due to the sparsity of the user log. That is, the model does not have enough feedback data to find the semantic relevance between them. Hence, given the reality of a limited user log, a good model should not only memorize retrieved relevant images, but also learn to discover more relevant images that have not been memorized.

In this section, we discuss a learning strategy to estimate the hidden semantic correlation between two images without a "direct link." Objectively speaking, the so-called learning of most long-term learning methods essentially corresponds to the memory process of our framework. This is also the reason that our framework is named memory learning. The main objective of our work is to make the limited user log play the fullest role.

The learning strategy proceeds in four steps. First, images in the database are grouped into semantically relevant clusters by the gathered semantic correlations. Next, we assume that each cluster is associated with a semantic topic, and within each semantic topic the authoritative rank of each image is calculated. Third, the hidden semantic correlation between two images is estimated from the authoritative ranks. Finally, the hidden semantic correlation between an image and the feedback examples is approximated by a probabilistic scheme.

A. Image Semantic Clustering

This subsection introduces a clustering approach for grouping images into a few semantically correlated clusters using the memorized semantic correlations among images. Because of its simplicity, the $k$-means algorithm is adopted. To use the $k$-means algorithm, two key issues have to be addressed: how to determine the initial cluster centers and how to measure the similarity between an unclassified image and a cluster. The following is our solution.

Assume there are $N$ images in the database. For each image $I_i$, a measure of cluster center is defined as

$$c(I_i) = \sum_{j=1}^{N} R(I_i, I_j) \qquad (3)$$

which is the sum of link intensities with all images in the link network. Intuitively, images with strong links to many others might be representative of a specific topic. Thus, images are ranked in descending order of the value of $c(I_i)$, and the top $k$ images that have no "direct link" with each other are selected as the initial cluster centers.

Assume $C_j$, $j = 1, \ldots, k$, is an image cluster. The similarity between an unclassified image $I$ and cluster $C_j$ is defined as

$$S(I, C_j) = \sum_{J \in C_j} R(I, J) \qquad (4)$$

which is the sum of link intensities between image $I$ and all images in cluster $C_j$.

After the above two issues have been addressed, the $k$-means algorithm is performed to group images. Each unclassified image is assigned to the cluster with which it has the maximal similarity. This process is repeated until convergence. Notice that an image is assigned to an "unknown" cluster if it has no "direct links" to any other images of the database. However, once an image of the "unknown" cluster gets "direct links"
  6. 6. from users' interactions, it takes part in the above semantic clustering and is assigned to a semantic class.

B. Image Authoritative Rank

In general, we might assume that an image cluster corresponds to a specific semantic topic. In a cluster, a member image may share this topic to a greater or lesser degree. To reflect how likely an image within the cluster is to contain the corresponding concept, an authoritative rank is estimated by

[Equation (5) is not recoverable from this transcript.]

where $C_j$, $j = 1, \ldots, k$, represents the $j$th image cluster, $I$ represents an image of the $j$th image cluster, and $n_j$ denotes the number of images in $C_j$.

As can be seen from the definition of the authoritative rank, an image with strong links to images of its own cluster but weak links to images of other clusters has a high authoritative rank. Intuitively, the larger the authoritative rank of an image is, the more likely it is that this image can represent the corresponding concept of its cluster.

C. Hidden Semantic Correlation Between Two Images

In contrast to the clear relevance between images with a "direct link," the hidden semantic correlation is the potential semantic relevance between images without a "direct link."

We first introduce the definition of semantic similarity between two clusters, which will be used to estimate the hidden semantic correlation between two images. Intuitively, for two different clusters, stronger semantic links between them indicate that they are more semantically similar. Hence, the sum of semantic correlations between members of the two clusters could be used as the similarity measure. However, a regular similarity should lie within the range $[0, 1]$. Accordingly, we adopt the linear scaling algorithm to normalize it. After the $k$-means algorithm has converged, for each image $I$ in the cluster $C_i$, the following inequality holds:

$$S(I, C_i) \geq S(I, C_j), \quad j \neq i \qquad (6)$$

where $S(I, C)$ is the image-to-cluster similarity of (4). This illustrates that the image is more similar to its own class than to other classes. Thus, the semantic similarity between two clusters $C_i$ and $C_j$ is formally defined by (7).

[Equation (7) is not recoverable from this transcript.]

As mentioned before, each image cluster could be considered to correspond to a specific concept. Within each image cluster, the authoritative rank of an image describes how likely it is to contain the specific concept. Thus, if the authoritative rank of an image is higher, it is reasonable to assume that this image is more semantically similar to other images in the same cluster. Consequently, the hidden semantic correlation between two images could be simply estimated by

[Equation (8) is not recoverable from this transcript.]

D. Hidden Semantic Correlation Between an Image and the Feedback Examples

During a query session, after the user provides a set of feedback examples, a question emerges. If an image of the database has no "direct link" to any of the feedback examples, how can the degree of correlation between this image and the feedback example set be determined? To address this problem, this subsection suggests a probabilistic approach to approximate the hidden semantic correlation between an image and the feedback examples. Only positive examples are used in the probabilistic approach.

Assume that $E$ refers to the positive feedback example set containing the query image and the positive examples, $I$ stands for an image without "direct links" to any members of $E$, $\Omega$ denotes the set of semantic classes that the query image and positive examples belong to, and $C$ represents one semantic class in $\Omega$. We define the hidden correlation between the image and the feedback examples as the conditional probability $P(I \mid E)$. The probability can be determined as follows:

$$P(I \mid E) = \sum_{C \in \Omega} P(I, C \mid E) = \sum_{C \in \Omega} P(C \mid E)\, P(I \mid C, E) \qquad (9)$$

When we consider the feedback example set as a condition to qualify the conditional probability of semantic classes and the image, it is independent of the other two items. Hence, it is reasonable to assume that $P(I \mid C, E) = P(I \mid C)$; then

$$P(I \mid E) = \sum_{C \in \Omega} P(C \mid E)\, P(I \mid C) \qquad (10)$$

$P(C \mid E)$ is the conditional probability that the feedback examples belong to the class $C$ when $E$ is provided by the user. $P(I \mid C)$ is the conditional probability of occurrence of $I$ if the class $C$ is selected. Supposing and
  7. 7. HAN et al.: MEMORY LEARNING FRAMEWORK 517 may be estimated integrating the feedback exampleset , the authoritative rank , and semantic similarity be-tween and . Thus suppose (11) (12)where • is the number of examples that are in as well as belong to the class ; • is the number of examples in . By combining (9)—(12), we get the hidden semantic corre-lation between image and feedback examples by Fig. 2. Basic architecture of the proposed memory learning framework. where is the low-level feature-based similarity, and stands for the semantic similarity. is calcu- lated by the distance between feature vectors of and . (13) is defined as if there is direct link between andThe hidden semantic correlation between an image and feed- otherwiseback examples can be dynamically updated with each round of (15)user’s feedback preferences. The objective of this probabilistic If the query is a new image outside database or in the “un-scheme is to make images with higher similarity to most positive known” class, , and only the low-level features areexamples more likely similar to the query image. The similarity used to produce the initial retrieval results.is estimated by a combination of image semantic clusters and In (14), the weights could be either predefined or dy-image authoritative ranks. namically adjusted using Rui’s [9] weight refining method that treats and as two different features. V. IMAGE RETRIEVAL FRAMEWORK BY THE MEMORY LEARNING B. Relevance Feedback Integrating SVM Learning With Memory Learning We have incorporated the memory learning framework intothe iFind image retrieval system [34] developed at Microsoft Re- When the user submits a query to the system, the similaritysearch Asia. It supports query by examples, query by keywords, between each image and is calculated using (14), and im-and RF. Fig. 2 displays the basic architecture of the memory ages with the highest similarities are returned as the result. If thelearning framework. 
In the following, we discuss the key tech- user offers any feedback information, the similarity measure isniques of this framework one by one. refined integrating low-level feature-based normal RF with se- mantics-based memory learning.A. Image Similarity Measure Due to its excellent capability in dealing with classification issue with small sample size, this paper adopts SVM learning as A problem of typical long-term learning models is the the low-level feature-based RF tool. SVMs are a family of ma-sparsity of user log. When not enough memory knowledge chine learning technologies originally invented by Vapnik [35].is available, their performance is poor. On the contrary, con- Let us consider SVMs in a binary classification setting. Given atent-based schemes are absolutely insensible to sparsity of set of linear separable training data with theiruser log. Hence, considering memory learning is limited by an labels , SVMs are trained andinsufficient amount of user log, a combination of content-based the hyperplane is formed, which separates the training data bymethod and memory learning can lead to better retrieval perfor- a maximal margin. Data points lying on one side of the hyper-mance. Moreover, the combined system may recommend fresh plane are labeled , and points lying on the other side of theimages that have not yet received any previous users’ query or hyperplane are marked 1. When a new data point is inputted forfeedback so far. classification, a label (1 or ) is assigned according to its re- Therefore, the similarity between the query image and lationship to the decision boundary, that isimage in the database is defined as the weighted sum oflow-level feature-based similarity and learned semantic simi-larity (16) When the data is not linearly separable, SVMs first project the (14) original data to a higher dimensional space by a Mercer kernel
  8. 8. 518 IEEE TRANSACTIONS ON IMAGE PROCESSING, VOL. 14, NO. 4, APRIL 2005function , and then linearly separates the data in this space. “help” from memory learning knowledge, the system can collectThe corresponding nonlinear decision boundary is much more positive examples without accessional consume of user work. Moreover, this “help” can improve the retrieval per- (17) formance of traditional RF techniques. In the proposed framework, the normal RF and the memory learning are not only purely linear integration. In fact, theyCommonly used kernel functions include Gaussian radius basis are able to reinforce each other. On the one hand, the memoryfunction (RBF) kernels, polynomial kernels, and Laplacian RBF learning can provide additional positive examples to the SVMkernels. learning according to its memorized knowledge. This helps The SVM learning can be easily applied to image retrieval. traditional RF techniques reduce the bottleneck of scarcity andDuring the process of RF, positive examples and negative ex- imbalance of feedback examples. On the other hand, the im-amples are used as the training data to construct a binary SVM proved SVM learning gives the memory learning more chancesclassifier. Images are ranked by their distances to the separating to discover new images and then memorize them.hyperplane, and the top images are returned as the retrieval re-sults. C. Image Annotation and Annotation Propagation In our framework, once any RF examples are provided, Many modern image retrieval systems support image anno-both low-level feature-based similarity and semantics-basedsimilarity are refined. For the low-level feature part, the SVM tation and annotation propagation, yet most systems propagatelearning is used as the RF scheme. 
After each round of feed- keywords relying on visual similarity between images, while two completely semantically different images may stay closeback, a binary SVM classifier is trained using positive examples to each other in the visual space. In this subsection, we applyand negative examples. Thereafter, the low-level feature-basedsimilarity between an image in the database and the query the memory learning framework to accomplish image annota-image is estimated by the distance from the image to the tion and probabilistic propagation of annotation on a semanticsSVM classifier. level. For the semantic part, the memory learning is employed. For Basically, there are two major issues in annotating image andthe case of the image having no “direct links” with any member propagating keywords: which subset of images should be ini-of the positive feedback example set , tially labeled and which probability should be used to propagatethe hidden semantic correlation is used to refine keywords from one annotated image to one unannotated image.the . On the contrary, if the image has “direct links” We solve them as follows.with positive examples, a multi-query is carried out, which In the work of [32] and [33], the active learning is adoptedmeans is updated as follows: to select the initial labeling samples. The samples are chosen based on how much information the annotation of each sample can provide to decrease the uncertainty of the system. The anno- (18) tated sample, once annotated, giving the maximum information or knowledge gain to the system is selected [32], [33]. Consid-By combining both cases above, the semantic similarity is re- ering the memory learning model, authoritative rank and directfined by (19), shown at the bottom of the page.As discussed in link number of one image are two factors to determine the ini-Section I, classical RF approaches suffer from the bottleneck of tial labeling samples. 
The image authoritative ranks reflect howinsufficient positive examples. The memory learning may make likely images contain their associated semantic topic. The di-use of memory knowledge to lighten the burden. In the general rect link number of one image measures how many images thisRF process, only very limited positive examples are possibly image has the direct link with in its semantic class. Intuitively,offered by the user, which results in the poor performance of if an image with higher authoritative rank and larger direct linkmany classical learning-based retrieval schemes. However, in number is annotated, its annotations can be propagated to moreour framework, images of database with the large semantic cor- unannotated images on a high confidence level. For simplicity,relation to any one of positive examples are also regarded as the initial annotation measure of one image is formulated bypositive examples. That is, images who satisfy (21) (20) where is the direct link number of image . For each se- mantic class, we pick one image with the highest initial annota-are automatically added as the positive examples. In (20), tion measure for labeling. That is, images with the highest initial is the positive feedback example set, in- annotation measure in their corresponding semantic class con-dicates an image of database, and is a threshold. Hence, by this struct the initial annotation set. if there are no direct links between and (19) otherwise
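As a rough illustration of the selection rule above, the following Python sketch picks, for each semantic class, the image with the highest initial annotation measure. It assumes the measures have already been computed by (21); the function name `select_initial_annotation_set`, the tuple layout, and the sample values are hypothetical and are not taken from the paper.

```python
from typing import Dict, Iterable, Tuple


def select_initial_annotation_set(
    images: Iterable[Tuple[str, str, float]],
) -> Dict[str, str]:
    """Return {semantic_class: image_id} with the top-scoring image per class.

    Each input item is (image_id, semantic_class, measure), where `measure`
    is an initial annotation measure assumed to be precomputed elsewhere.
    """
    best: Dict[str, Tuple[str, float]] = {}
    for image_id, semantic_class, measure in images:
        current = best.get(semantic_class)
        # Strictly-greater comparison keeps the first image seen on ties.
        if current is None or measure > current[1]:
            best[semantic_class] = (image_id, measure)
    return {cls: image_id for cls, (image_id, _) in best.items()}


# Hypothetical sample data, for illustration only.
sample = [
    ("img_01", "flower", 0.42),
    ("img_02", "flower", 0.77),
    ("img_03", "tiger", 0.31),
    ("img_04", "tiger", 0.12),
]
initial_set = select_initial_annotation_set(sample)
```

With one entry per semantic class, the returned mapping plays the role of the initial annotation set; how ties should be broken is not stated in the text, so the first-seen rule here is only a convenience.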
Let us next discuss the issue of annotation propagation probability. In the memory learning model, the semantic relevance of any two images in the database can be estimated by the semantic correlation SC or the hidden semantic correlation between them. Accordingly, once an image is annotated, the probability of this annotation propagating to another, unannotated image is set to the semantic relevance between them. In this way, the annotated keywords are propagated to the whole database. Every annotated image is assigned a label vector. Each element in the vector is a keyword, and the value for that keyword indicates the probability of this image having it. The value of a keyword in the label vector can reasonably be set to the probability of propagating that keyword. A typical label vector may be described by {(flower, 0.8), (bird, 0.5), (mountain, 0.4), ...}.

Image annotation can be improved and updated by RF. The work of [31] may be used to accomplish this task. Its principal idea is briefly described as follows. After a user submits a query consisting of one or more keywords, the system automatically searches the database for images relevant to the keywords. Among the retrieved images, the user may tell the system which images are relevant or irrelevant using RF. Then, positive instances have the query keywords appended, or the weights of the query keywords strengthened. On the contrary, negative instances have the keywords removed or weakened. Please refer to [31] for details.

Adding new images to the database is a very common operation for a retrieval system. An unconfirmed annotation algorithm is proposed by [31] to annotate new images automatically. It adopts each new image as a query and performs a low-level feature-based image retrieval process. For the top similar images to a query, a list of keywords sorted by their frequency in these images is stored in an unconfirmed keyword list. The new image is thus labeled by the unconfirmed keywords. The unconfirmed annotation may be refined through future query sessions. In this paper, we also use this algorithm to annotate new images.

The memory learning-based image annotation and annotation propagation thus proceed as follows.
1) Select the images with the highest initial annotation measure of each semantic topic as the initial annotation set.
2) For each sample of the initial annotation set, manually label keywords and propagate these keywords to other unannotated images according to the semantic relevance.
3) During the query session, after the user marks feedback instances, reweight the image annotation for the positive and negative instances using [31].

VI. EXPERIMENTAL RESULTS

We tested the memory learning framework with a general-purpose image database that consists of 10 000 images of 79 categories from the Corel Image Gallery. Corel images have been widely used by the image processing and CBIR research communities. They cover a variety of topics, such as “flower,” “tiger,” “eagle,” “gun,” “horse,” etc. In all experiments, we use the color correlogram [36] with 144 dimensions as the low-level feature. Like [6], [7], and [23], the retrieval accuracy is defined as

$$\text{Accuracy} = \frac{\text{relevant images retrieved in top } N \text{ returns}}{N} \qquad (22)$$

A retrieved image is considered to be relevant if it belongs to the same category as the query. In all experiments, we determine the weights of (14) by Rui’s [9] reweighting algorithm. The number of image classes is predefined as the number of categories of the image database used in the experiment.

Six groups of experiments were conducted to evaluate the proposed framework. In Section VI-A, we test its retrieval performance. In Section VI-B, we evaluate the performance of the hidden semantics learning strategy. Section VI-C examines the system’s robustness to user errors. Section VI-D shows how the traditional RF algorithms improve performance with the help of the memory learning model. Afterwards, experiments on image semantic clustering are presented in Section VI-E. Finally, the evaluation of memory learning-based annotation propagation is reported in Section VI-F.

A. Retrieval Performance Test

To build the feedback knowledge memory model, we asked eight real-world users to retrieve images using our system. Each of seven users was required to perform 200 query sessions, and the last user provided 100 query sessions. Each query session consisted of four iterations of feedback. At each iteration, the users marked positive and negative examples according to their preferences. In total, we collected 1 500 query sessions in the user log.

One hundred images were randomly chosen from the image database as the query set. Based on the query set, the average retrieval accuracy of the top 100 images [N = 100 in (22)] is used as the performance evaluation measure. In the experiment, the RF process is conducted automatically. In the first round of feedback, the top 30 images are checked and labeled as either positive or negative examples. A retrieved image is considered to be relevant if it belongs to the same category as the query. In the following round of feedback, the labeled positive images are placed at the beginning, while negative ones are placed at the end, and the top 30 images in the rest of the list are checked again.

To test the effectiveness of the proposed framework, we compared its retrieval performance with two classic nonmemory approaches: the SVM learning-based RF described in Section V-B and MARS presented by [9]. To show the effect of the amount of memorized knowledge, the 1 500 queries were used in three stages: ML500 using 500 queries, ML1000 using 1 000 queries, and ML1500 using 1 500 queries. Fig. 3 presents the experimental results. Clearly, the proposed framework improves the retrieval accuracy substantially. Moreover, the more feedback information memorized from the users’ interactions, the better the performance of the retrieval system.

Fig. 3. Performance comparison between memory learning framework and nonmemory RF.

B. Performance Evaluation of the Hidden Semantics Learning Strategy

To illustrate the function of the hidden semantics learning strategy, we compared the results with and without the hidden semantics learning in all three stages. For queries without the hidden semantics learning, the semantic similarity in (14) is determined only by the directly memorized semantic correlation. The results are displayed in Fig. 4. As can be seen from the experimental results, the hidden semantics learning improves the retrieval performance. It is not surprising that the hidden semantics learning plays a less important role as the stored feedback knowledge increases. We can imagine that the hidden semantics learning would become unnecessary once the system has collected enough user log data for there to be a “direct link” between any two images. From another point of view, this experiment verifies that the hidden semantics learning strategy can indeed alleviate the problem of feedback knowledge sparsity.

Fig. 4. Performance comparison with and without the hidden semantics learning.

C. Robustness to User Errors

In real-world RF processes, user errors may take place. For instance, a user carelessly labels images of “flower” as relevant examples while he/she is actually interested in images of “bird.” Hence, a good CBIR system should be able to tolerate moderate levels of noise from user errors. The proposed system can handle mild noise for two reasons. First, our semantic correlations are calculated on a large amount of user feedback knowledge. From the statistical perspective, the overwhelming majority of correct feedback information can filter out a small number of user errors. Second, in our empirical study, we employ feedback information offered by real-world users, which inevitably contains noisy data. Therefore, the promising experimental results presented in Section VI-A partially confirm that the proposed system is robust in a noisy environment.

In this subsection, an experiment was conducted to further examine the robustness of the proposed work to user errors. We performed 150 wrong query sessions to simulate user errors (here, user errors mean that images labeled as positive examples are not in the same category as the query image), and then merged them into the 1 500 real-world user log sessions. In this way, the simulated user error rate is around 10%. Under the simulated noisy environment, we tested the retrieval performance of the proposed system again. As can be seen from Fig. 5, the proposed system shows little performance degradation under this simulated noisy environment. This experiment demonstrates that the memory learning framework is robust to mild user errors.

Fig. 5. Performance comparison with and without the simulated user errors.

D. Image Retrieval by Traditional RF Techniques With the Help of the Memory Learning

In the proposed framework, the traditional RF and memory learning are not a purely linear combination. Memory learning can automatically provide additional positive examples to the traditional RF. More specifically, in a given query session, after the user offers feedback examples, images in the database with a large semantic correlation to any one of the positive examples, as determined by the threshold in (20), are also automatically regarded as positive examples. We argue that this “help” can alleviate the limitation of scarcity and imbalance of feedback examples. An experiment was designed to examine this “help.” This experiment used the two traditional RF algorithms: SVM learning and MARS [9]. We compared the performance with and without the memory learning’s “help.” In this test, the feedback knowledge memory model was built using 1 000 query sessions. Figs. 6–8 report the experimental results. Clearly, the two RF methods achieved better retrieval performance with the help of memory learning.

E. Experiments About Image Semantic Clustering

To demonstrate the effectiveness of our image semantic clustering algorithm, we compared it with the SVM-OPC (one per class) classification scheme used in the CBSA system [33]. SVM-OPC implements image classification in terms of low-level features. For a problem with several classes, it trains one SVM classifier per class, each distinguishing one class from all the other classes. Each point therefore receives one output from every classifier, and the point is assigned to the class whose classifier gives the largest output.
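The one-per-class decision rule can be sketched in a few lines of Python. This is only an illustration under the common one-vs-rest convention that the class with the largest classifier output wins; the function name `assign_class` and the score values are hypothetical, and the scores merely stand in for the outputs of trained SVM classifiers, which are not trained here.

```python
from typing import Mapping


def assign_class(scores: Mapping[str, float]) -> str:
    """Pick the class whose one-per-class classifier gives the largest output."""
    if not scores:
        raise ValueError("at least one classifier output is required")
    return max(scores, key=lambda cls: scores[cls])


# Hypothetical decision values for one image, one value per class classifier.
outputs = {"flower": -0.8, "tiger": 0.35, "eagle": 0.1}
predicted = assign_class(outputs)
```

Because every class needs its own classifier, the number of models grows with the number of categories, which is relevant to the scalability comparison reported in this subsection.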
Fig. 6. Retrieval accuracy (P20) comparison with and without the help of memory learning.

Fig. 7. Retrieval accuracy (P50) comparison with and without the help of memory learning.

Fig. 8. Retrieval accuracy (P100) comparison with and without the help of memory learning.

To fulfill our experimental goals, we used two datasets: one contains 2 500 images from 25 Corel categories; the other contains 10 000 images from 79 Corel categories. For the former database, we invited three subjects to collect 150 query sessions. We further divided the user log into two phases: 100 queries and 150 queries. For the latter database, we adopted the same user log used in the foregoing experiments. The SVM-OPC classifiers were trained with 10%, 20%, and 30% of the images, respectively, randomly picked from the database. The number of SVM-OPC classifiers is determined by the number of image categories used in the experiment. The ground truth for an image is its category in the database. A classified image is regarded as correct if it is assigned to its ground truth category. Fig. 9 gives the classification precision comparison using the 2.5-K dataset, and Fig. 10 shows the comparison results using the 10-K dataset. Clearly, our algorithm achieves better classification performance on both databases. In contrast, SVM-OPC handles the small database well, but does not scale to the large database.

Fig. 9. Classification precision comparison on the 2 500 images database.

Fig. 10. Classification precision comparison on the 10 000 images database.

F. Evaluation for Memory Learning-Based Annotation Propagation

In this subsection, we evaluate the annotation propagation performance of our framework by comparing it with two recently published systems: CBSA [33] and the Bayesian model [15]. The CBSA system first defines a set of labels, each corresponding to the semantics of an image category. Then, each unannotated image is classified against those defined categories using SVM-OPC. This produces a ranked list of those categories, with each category assigned a confidence probability. The labels, together with their probabilities, become the annotation of this image. The Bayesian model performs the annotation propagation during the process of RF. After the user has provided feedback in a query session, it assumes that all positive examples belong to one semantic class and that the features from the same semantic class follow a Gaussian distribution. The parameters of a semantic Gaussian class can be estimated using the feature vectors of all the positive examples. Then the posterior probability of each image in the database belonging to this semantic class is estimated by the Bayesian formulation. The common keywords of the positive examples are propagated to the image with the estimated posterior probability.

In the experiment, 10 000 Corel images from 79 semantic categories were adopted. For the proposed strategy, the feedback knowledge memory model was constructed from 1 000 and 1 500 user log sessions, respectively. The 79 images with the highest initial annotation measure in their corresponding semantic classes made up the initial annotation image set. We assume that each image of the initial annotation image set is labeled by only one keyword, which is exactly its category name. Thereafter, the keywords were propagated from annotated images to unannotated images based on the semantic correlation or the hidden semantic correlation between them. For CBSA, 20% of the images in each image category were randomly selected to make up the training data. They were annotated by their category names. Then, 79 SVM-OPC classifiers were trained using the training data, and each classifier was associated with a semantic category. The trained classifiers produced a set of probabilities for each unannotated image. The probabilities depict the likelihood of a category label describing an image. For the Bayesian model, 100 randomly selected images made up the query set, and they were annotated by their category names. Only one keyword was associated with each query image, and the other images in the database had no keyword annotation. The system used each query image for image retrieval. The RF was performed automatically. In each round of feedback, the top 30 images are checked and labeled as positive examples or not. A retrieved image is considered to be relevant if it belongs to the same category as the query. After five iterations of feedback, a Bayesian model was formed from the positive examples, and the keyword of the query image was propagated to other unannotated images according to the posterior probability.

Owing to space limitations, we present only one keyword propagation example. Fig. 11 shows the top 15 images to which the keyword “flower” is propagated by the memory learning framework, CBSA, and the Bayesian model, respectively. In Fig. 11, (a) displays the results using the memory learning framework, (b) shows the results using CBSA, and (c) shows the results using the Bayesian model. A match happens when an image in the top 15 images belongs to the category represented by the propagated keyword. The visual results demonstrate that our framework outperforms the other two systems.

Fig. 11. Annotation propagation examples propagated to the keyword of “flower.” (a) Thirteen matches; top 15 images using memory learning framework after 1 000 queries. (b) Twelve matches; top 15 images using CBSA with 20% training data. (c) Six matches; top 15 images using Bayesian model after five feedback iterations.

To evaluate the annotation propagation function of the proposed framework objectively and comprehensively, we define a quantitative performance metric: valid propagation precision. This measure shows how often the valid propagations are correct. A valid propagation is one whose propagation probability is over a confidence threshold. A propagation probability below the threshold implies that the propagation happens at a low confidence level. Moreover, in the practical query process, images with a low value for a keyword are hardly ever retrieved by the system when the user submits that keyword as a query. Therefore, only valid propagations are used to calculate the final accuracy. In the experiment, the threshold is fixed, and the ground truth keyword of an image is its corresponding category name. Fig. 12 shows the comparison results. The accuracy of the Bayesian model is the average accuracy of 100 queries. As can be seen from Fig. 12, the proposed framework is able to provide more trustworthy annotation propagations.

Fig. 12. Annotation valid propagation precision comparison of memory learning framework, CBSA, and Bayesian model.

VII. CONCLUSION

To supply effective image retrieval to users, this paper has presented a new memory learning framework in which low-level feature-based RF and semantics-based memory learning are combined to help each other achieve better retrieval performance. Two novel characteristics distinguish the memory learning framework from existing RF techniques. First, it creates a feedback knowledge memory model to accumulate users’ preferences. More importantly, a learning strategy is introduced to infer the hidden semantics according to the gathered semantic information. In addition, a semantics-based image annotation propagation scheme is described.

The proposed framework is easy to implement and can be efficiently incorporated into an image retrieval system. Experimental evaluations on a large-scale image database have shown very promising results. However, a limitation of the proposed work is that it lacks sufficient theoretical justification. Our future work will investigate the possibility of developing more sophisticated and theoretically grounded learning schemes.
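The valid propagation precision defined in Section VI-F can be sketched as follows. This is a minimal Python illustration, not the authors' code: the function name `valid_propagation_precision`, the record layout, and the threshold value 0.5 are assumptions chosen for the example, and the threshold actually used in the experiments is not reproduced here.

```python
from typing import Iterable, Optional, Tuple


def valid_propagation_precision(
    propagations: Iterable[Tuple[str, str, float]],
    threshold: float,
) -> Optional[float]:
    """Fraction of valid propagations whose keyword matches the ground truth.

    Each item is (propagated_keyword, ground_truth_keyword, probability).
    A propagation counts as valid when its probability exceeds `threshold`.
    Returns None when no propagation is valid.
    """
    valid = 0
    correct = 0
    for keyword, truth, probability in propagations:
        if probability > threshold:
            valid += 1
            if keyword == truth:
                correct += 1
    return correct / valid if valid else None


# Hypothetical propagations; 0.5 is an illustrative threshold only.
records = [
    ("flower", "flower", 0.9),
    ("flower", "bird", 0.7),
    ("flower", "flower", 0.6),
    ("flower", "tiger", 0.2),  # below the threshold, so ignored
]
precision = valid_propagation_precision(records, threshold=0.5)
```

In this toy input three propagations are valid and two of them are correct; low-confidence propagations do not affect the result, which mirrors the motivation given for the metric.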
ACKNOWLEDGMENT

The authors would like to thank Dr. L. Zhang of MSRA for his help in system work. They would also like to thank F. Jing and G. Xue for some valuable discussions.

REFERENCES

[1] M. Flickner, H. Sawhney, and W. Niblack, “Query by image and video content: The QBIC system,” IEEE Computer, vol. 28, no. 9, pp. 23–32, Sep. 1995.
[2] A. Pentland, R. W. Picard, and S. Sclaroff, “Photobook: Content-based manipulation of image databases,” Int. J. Comput. Vis., vol. 18, no. 3, pp. 233–254, 1996.
[3] W. Y. Ma and B. Manjunath, “NETRA: A toolbox for navigating large image databases,” Multimedia Syst., vol. 7, no. 3, pp. 184–198, 1999.
[4] J. Z. Wang, J. Li, and G. Wiederhold, “SIMPLIcity: Semantics-sensitive integrated matching for picture libraries,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 23, no. 9, pp. 947–963, Sep. 2001.
[5] A. Gaurav, T. V. Ashwin, and G. Sugata, “An image retrieval system with automatic query modification,” IEEE Trans. Multimedia, vol. 4, no. 2, pp. 201–213, Jun. 2002.
[6] M. Li, Z. Chen, and H. Zhang, “Statistical correlation analysis in image retrieval,” Pattern Recognit., vol. 35, pp. 2687–2693, 2002.
[7] Z. Su, H. Zhang, S. Li, and S. Ma, “Relevance feedback in content-based image retrieval: Bayesian framework, feature subspaces, and progressive learning,” IEEE Trans. Image Process., vol. 12, no. 8, pp. 924–936, Aug. 2003.
[8] J. J. Rocchio, “Relevance feedback in information retrieval,” in The SMART Retrieval System: Experiments in Automatic Document Processing, G. Salton, Ed. Upper Saddle River, NJ: Prentice-Hall, 1971, pp. 313–323.
[9] Y. Rui, T. S. Huang, and S. Mehrotra, “Relevance feedback: A powerful tool in interactive content-based image retrieval,” IEEE Trans. Circuits Syst. Video Technol., vol. 8, no. 5, pp. 644–655, May 1998.
[10] I. J. Cox, M. L. Miller, T. P. Minka, T. V. Papathomas, and P. N. Yianilos, “The Bayesian image retrieval system, PicHunter: Theory, implementation, and psychophysical experiments,” IEEE Trans. Image Process., vol. 9, no. 1, pp. 20–37, Jan. 2000.
[11] T. P. Minka and R. W. Picard, “Interactive learning with a ‘society of models’,” in Proc. IEEE Conf. Computer Vision Pattern Recognition, San Francisco, CA, Jun. 1996, pp. 447–452.
[12] Y. Rui and T. S. Huang, “Optimizing learning in image retrieval,” in Proc. IEEE Conf. Computer Vision Pattern Recognition, Jun. 2000, pp. 236–243.
[13] X. S. Zhou and T. S. Huang, “Small sample learning during multimedia retrieval using BiasMap,” in Proc. IEEE Conf. Computer Vision Pattern Recognition, Dec. 2001, pp. 8–14.
[14] S. Tong and E. Chang, “Support vector machine active learning for image retrieval,” in Proc. ACM Int. Conf. Multimedia, Ottawa, ON, Canada, Oct. 2001, pp. 107–118.
[15] H. Zhang and Z. Su, “Relevance feedback in CBIR,” presented at the Int. Workshop on Visual Databases, 2002.
[16] X. S. Zhou and T. S. Huang, “Relevance feedback in image retrieval: A comprehensive review,” Multimedia Syst., vol. 8, pp. 536–544, 2003.
[17] S. D. MacArthur, C. E. Brodley, and C. R. Shyu, “Relevance feedback decision trees in content-based image retrieval,” in Proc. IEEE Workshop Content-Based Access of Image and Video Libraries, Jun. 2000, pp. 68–72.
[18] K. Tieu and P. Viola, “Boosting image retrieval,” in Proc. IEEE Conf. Computer Vision Pattern Recognition, Jun. 2000, pp. 228–235.
[19] E. Chang, B. Li, G. Wu, and K. S. Goh, “Statistical learning for effective visual information retrieval,” in Proc. IEEE Conf. Image Processing, Barcelona, Sep. 2003, pp. III-609–III-612.
[20] C. Lee, W. Y. Ma, and H. Zhang, “Information embedding based on user’s relevance feedback for image retrieval,” presented at the SPIE Conf. Multimedia Storage and Archiving Systems IV, Boston, MA, 1999.
[21] I. Bartolini, P. Ciaccia, and F. Waas, “FeedbackBypass: A new approach to interactive similarity query processing,” in Proc. Int. Conf. Very Large Data Bases, Rome, Italy, Jun. 2001, pp. 201–210.
[22] F. Fournier and M. Card, “Long-term similarity learning in content-based image retrieval,” in Proc. IEEE Conf. Image Processing, New York, Sep. 2002, pp. 22–25.
[23] X. He, O. King, W. Y. Ma, M. Li, and H. Zhang, “Learning a semantic
[25] A. Kohrs and B. Merialdo, “Improving collaborative filtering with multimedia indexing techniques to create user-adapting web sites,” in Proc. ACM Int. Conf. Multimedia, Seattle, WA, Nov. 1999, pp. 27–36.
[26] A. Kohrs and B. Merialdo, “Clustering for collaborative filtering applications,” presented at the Int. Conf. Computational Intelligence for Modeling Control and Automation, Feb. 1999.
[27] J. M. Kleinberg, “Authoritative sources in a hyperlinked environment,” J. ACM, vol. 46, no. 5, pp. 604–632, 1999.
[28] R. Lempel and A. Soffer, “PicASHOW: Pictorial authority search by hyperlinks on the web,” in Proc. 10th Int. WWW Conf., 2001, pp. 438–448.
[29] R. W. Picard and T. P. Minka, “Vision texture for annotation,” Multimedia Syst., vol. 3, pp. 3–14, 1995.
[30] E. Saber and A. M. Tekalp, “Region-based affine shape matching for automatic image annotation and query-by-example,” J. Vis. Commun. Image Rep., vol. 8, no. 1, pp. 3–20, Mar. 1997.
[31] W. Liu, S. Dumais, Y. Sun, and H. Zhang, “Semi-automatic image annotation,” in Proc. Conf. Human-Computer Interaction, Jul. 2001, pp. 326–333.
[32] C. Zhang and T. Chen, “An active learning framework for content-based information retrieval,” IEEE Trans. Multimedia, vol. 4, no. 2, pp. 260–268, Jun. 2002.
[33] E. Chang, K. Goh, G. Sychay, and G. Wu, “CBSA: Content-based soft annotation for multimodal image retrieval using Bayes point machines,” IEEE Trans. Circuits Syst. Video Technol., vol. 13, no. 1, pp. 26–38, Jan. 2003.
[34] H. Zhang, W. Liu, and C. Hu, “iFind-a system for semantics and feature based image retrieval over internet,” in Proc. ACM Int. Conf. Multimedia, Los Angeles, CA, 2000, pp. 477–478.
[35] V. Vapnik, The Nature of Statistical Learning Theory. New York: Springer, 1995.
[36] J. Huang, S. R. Kumar, M. Mitra, W. Zhu, and R. Zabih, “Image indexing using color correlograms,” in Proc. IEEE Conf. Computer Vision Pattern Recognition, Jun. 1997, pp. 762–768.
[37] J. Li and J. Z. Wang, “Automatic linguistic indexing of pictures by a statistical modeling approach,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 25, no. 9, pp. 1075–1088, Sep. 2003.
[38] K. Barnard, P. Duygulu, D. Forsyth, N. D. Freitas, D. M. Blei, and M. I. Jordan, “Matching words and pictures,” J. Mach. Learn. Res., vol. 3, pp. 1107–1135, 2003.

Junwei Han received the Ph.D. degree from Northwestern Polytechnical University, Xi’an, China, in 2003. He is currently a Postdoctoral Fellow with the Department of Electronic Engineering, The Chinese University of Hong Kong, Shatin, Hong Kong. His research interests include content-based image/video retrieval and image/video segmentation.

King N. Ngan (M’79–SM’91–F’00) received the Ph.D. degree in electrical engineering from Loughborough University of Technology, Loughborough, U.K. He is a Chair Professor with the Department of Electronic Engineering, The Chinese University of Hong Kong, Shatin, Hong Kong. Previously, he was a Full Professor with Nanyang Technological University, Singapore, and the University of Western Australia, Crawley, Australia. He is an Associate Editor for the Journal on Visual Communications and Image Representation and an Area Editor of the EURASIP Journal of Image Communication and Journal of Applied Signal Processing. He has chaired a number of prestigious international conferences on video signal processing and communications and has served on the advisory and technical committees of numerous professional organizations.
He has published extensively, including three au- space from user’s relevance feedback for image retrieval,” IEEE Trans. thored books, five edited volumes, and over 200 refereed technical papers in the Circuits Syst. Video Technol., vol. 13, no. 1, pp. 39–48, Jan. 2003. areas of image/video coding and communications. [24] D. Goldberg, D. Nichols, B. Oki, and D. Terry, “Using collaborative Prof. Ngan is a Fellow of the IEE (U.K.) and a Fellow of IEAust (Australia). filtering to weave an information tapestry,” Commun. ACM, vol. 35, no. He was an Associate Editor of IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS 12, pp. 218–277, 1992. FOR VIDEO TECHNOLOGY.
Mingjing Li received the B.S. degree in electrical engineering from the University of Science and Technology of China, Hefei, and the Ph.D. degree in pattern recognition from the Institute of Automation, Chinese Academy of Sciences, Beijing, in 1989 and 1995, respectively. He joined Microsoft Research Asia, Beijing, China, in July 1999. His research interests include handwriting recognition, statistical language modeling, search engines, and image retrieval.

Hong-Jiang Zhang (F’03) received the Ph.D. degree in electrical engineering from the Technical University of Denmark, Lyngby, and the B.S. degree in electrical engineering from Zhengzhou University, Zhengzhou, China, in 1991 and 1982, respectively. From 1992 to 1995, he was with the Institute of Systems Science, National University of Singapore, Singapore, where he led several projects in video and image content analysis and retrieval and computer vision. He was also with the Massachusetts Institute of Technology Media Laboratory, Cambridge, as a Visiting Researcher in 1994. From 1995 to 1999, he was a Research Manager at Hewlett-Packard Laboratories, Palo Alto, CA, where he was responsible for research and technology transfers in the areas of multimedia management, intelligent image processing, and Internet media. In 1999, he joined Microsoft Research Asia, Beijing, China, where he is currently a Senior Researcher and Assistant Managing Director in charge of media computing and information processing research. He has authored three books, over 260 refereed papers, seven special issues of international journals on image and video processing, content-based media retrieval, and computer vision, as well as over 50 patents or pending applications. Dr. Zhang currently serves on the editorial boards of five IEEE/ACM journals and a dozen committees of international conferences.