Visual Attention for Implicit Relevance Feedback in a Content Based Image Retrieval
A. Faro, D. Giordano, C. Pino, C. Spampinato*
Department of Informatics and Telecommunication Engineering, University of Catania, Catania, 95125, Italy
*e-mail: {afaro,dgiordan,cpino,cspampin}@diit.unict.it

Copyright © 2010 by the Association for Computing Machinery, Inc. ETRA 2010, Austin, TX, March 22–24, 2010. © 2010 ACM 978-1-60558-994-7/10/0003.

Abstract

In this paper we propose an implicit relevance feedback method with the aim to improve the performance of known Content Based Image Retrieval (CBIR) systems by re-ranking the retrieved images according to users' eye gaze data. This represents a new mechanism for implicit relevance feedback: usually the sources taken into account for image retrieval are based on the natural behavior of the user in his/her environment, estimated by analyzing mouse and keyboard interactions. In detail, after the retrieval of the images by querying CBIRs with a keyword, our system computes the most salient regions (where users look with greater interest) of the retrieved images by gathering data from an unobtrusive eye tracker, such as the Tobii T60. According to the features, in terms of color and texture, of these relevant regions, our system is able to re-rank the images initially retrieved by the CBIR. A performance evaluation, carried out on a set of 30 users using Google Images and the keyword "pyramid", shows that about 87% of the users are more satisfied with the output images when the re-ranking is applied.

Keywords: Relevance Feedback, Content Based Image Retrieval, Visual Attention, Eye Tracker

1 Introduction

During the last ten years, with the growth of the Internet and advances in digital camera research, huge collections of images have been created and shared on the web. In the past, the only way to search digital images was keyword indexing, or simply image browsing. The need to quickly find images in large digital image databases has led researchers in image processing to develop Content Based Image Retrieval (CBIR), i.e., systems for image retrieval based on the concept of similar visual content.

Moreover, recent research in information retrieval takes into consideration the user's personal environment in order to better understand the user's needs. Indeed, in CBIR systems the user does not always get results fully related to the image query, especially in web-based image retrieval such as Google Images (http://images.google.it) or Yahoo!'s Picture Gallery (http://gallery.yahoo.com/). This is mainly due to the fact that metadata often cannot explain well the content of an image, and even when the description is exhaustive the attention of the user may be drawn only to some portions of the image, which often correspond to the areas of greatest salience. In order to take these user needs into account, a relevance feedback mechanism must be integrated in CBIRs.

Relevance feedback is a key feature in image retrieval systems, whose main idea is to take into account the outputs initially retrieved and to use the user's feedback on their relevance to the initial query in order to perform a new query. In the literature two types of feedback can be defined: explicit feedback and implicit feedback. Since the former requires a higher effort on the user's side, because it may be difficult to get explicit relevance assessments from searchers [Xu et al. 2008], implicit feedback methods, where feedback data are obtained by observing the user's actions in his/her natural environment, have gained more attention. To date, the most explored and implemented sources for implicit relevance feedback have been the interactions of users with the mouse and the keyboard [Kelly and Teevan 2003]. A new evidence source for implicit feedback, explored in the last few years, e.g., in [Moe et al. 2007], [Miller and Agne 2005], [Granka et al. 2004], is the one related to the user's visual attention (provided by the eye movements), which introduces a potentially very valuable new dimension of contextual information [Buscher 2007].

Indeed, in a CBIR the knowledge of human visual attention would allow us to select the most salient parts of an image, which can be used both for image retrieval, as in [Marques et al. 2006], and for implementing relevance feedback mechanisms. Moreover, the detection of these salient regions observed by a user is crucial information for finding image similarity.

In this paper we propose an implicit relevance feedback mechanism based on visual attention, implemented with a Tobii T60 eye tracker and designed to be integrated in a web-based CBIR; it aims at re-ranking the output images provided by the CBIR using the most salient regions extracted by the eye tracker. The proposed system represents a novel approach of eye tracking for image retrieval, since other approaches in the literature, e.g., [Oyekoya and Stentiford 2006], [Oyekoya and Stentiford 2007], are based on rough retrieval engines built on high-level features; moreover, it allows both users with disabilities to provide feedback on the obtained results and generic users to tune the CBIR to their cognitive perception of the images (e.g., "I unconsciously prefer reddish images", or "When I look at Egyptian images I prefer to see pyramids and the Sphinx"). The remainder of the paper is organized as follows: in Section 2 the architecture of the proposed system is discussed; in Section 3 an experimental evaluation on the Google Images CBIR is performed and the experimental results on a set of 30 users are shown; finally, in the last section conclusions and future work are presented.

2 The Proposed System

The flow diagram of the proposed system is shown in fig. 1: 1) the user inserts a keyword for image searching; 2) the web-based CBIR retrieves the most relevant images whose metadata contain the inserted word; 3) the user looks at the output images and the system retrieves the most relevant regions, and their features (e.g., color, texture), by using the Tobii facilities; 4) the system re-ranks the output images according to the above extracted information.

Figure 1: Implicit relevance feedback for the new ranking method in a web-based CBIR.

Our system uses the Tobii eye tracker to capture an implicit relevance feedback and to classify the images in a different order of relevance with respect to the initial classification, in order to improve the ranking provided by the search in a CBIR environment.
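The four-step flow above can be read as a small orchestration loop. The following is a minimal sketch, not the authors' implementation (which couples Tobii Studio with Matlab 7.5, as described in Section 3); all injected callables are hypothetical placeholders for the components described in this section.

```python
# Hypothetical sketch of the fig. 1 flow: query, gaze capture, feature
# extraction on the most-fixated result, then re-ranking. The callables are
# placeholders for the components described in the text, not a real API.
from typing import Callable, Dict, List

def implicit_feedback_search(
    keyword: str,
    query_cbir: Callable[[str], List[str]],                      # 1) keyword -> ranked image ids
    capture_fixations: Callable[[List[str]], Dict[str, float]],  # 2)-3) image id -> total fixation time
    extract_features: Callable[[str], List[float]],              # salient-region features (color, texture)
    similarity: Callable[[List[float], List[float]], float],     # the score of Eq. (1), defined below
) -> List[str]:
    retrieved = query_cbir(keyword)
    fixations = capture_fixations(retrieved)
    # the image the user looked at with the greatest interest
    most_relevant = max(retrieved, key=lambda i: fixations.get(i, 0.0))
    f_rel = extract_features(most_relevant)
    # 4) re-rank by decreasing similarity to the most-fixated image
    return sorted(retrieved, key=lambda i: similarity(f_rel, extract_features(i)), reverse=True)
```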
The aim of this mechanism is to capture the user's gaze fixations in order to identify the characteristics of the images s/he shows to be of her/his interest. This will allow the tool to retrieve further relevant images automatically. The tool may also be able to discover, in an unsupervised way, the characteristics of the images of potential user interest. Indeed, it is able to derive the characteristics of the images of user interest by considering the images which mainly captured the user's attention, e.g., by taking into account the user's visual activity over the analyzed images. In the former case the tool learns how to select further relevant images, whereas in the latter case it could also be able to reclassify the images already examined by the user, suggesting that s/he reconsider more deeply some potentially relevant images.

Although the proposed system has been tested only on Google Images to improve the precision of the retrieval, it may be applied to improve the precision of the retrieval of any document on the basis of the images featuring the documents.

Figure 2: System architecture.

Figure 2 shows the general architecture of the proposed implicit relevance feedback, where we point out the system's ability to rearrange the images initially retrieved from a web-based CBIR (e.g., Google Images) without any user supervision, i.e., only on the basis of the user's gaze fixations. A fine tuning of the characteristics to be possessed by the images may be carried out by the system, on the basis of the user's agreement, for a better rearrangement of the images or for extracting relevant images from other datasets. In detail, the re-ranking mechanism is composed of the following steps:

• First Image Retrieval. The user enters some keywords on the used CBIR and observes the results. During this phase, the eye tracker stores gaze fixations on the thumbnails of the retrieved images which most captured the user's attention, together with her/his eye movements;

• Features Extraction. One of the crucial points in CBIR is the choice of the low-level features to be used to compare the image under test with the queried image. The combination of features determines the effectiveness of the search. The extracted features can be related to the entire image, in which case we speak of global features, or to a portion of it, in which case we speak of local features. Local feature extraction is more complex, because it requires a first step for the detection of the important regions of the image (e.g., by clustering algorithms or object recognition), but it permits a considerable reduction of the computational complexity of the search algorithms. In our case the detection is simplified by the eye tracker, which allows us to identify the regions of major interest. The local features considered for describing image content are the Contrast $C$, Correlation $C_r$, Energy $E$, Homogeneity $H$, Gabor filter maps $G\text{-}Maps$ (24 maps: 6 scales and 4 orientations), and two novel features (a code sketch of this step is given after this list):

  – Brightness, computed as $r_{bright} = \frac{\mu_3 \cdot 255^2}{\mu + 10}$;

  – Smoothness, computed as $r_{smooth} = \frac{1}{\mu_2 + 1} + E + \mu_4 + H$.

The above features are based on the moments of the histogram $H$ of the gray levels. The $n$th moment of the histogram of gray levels is given by

$$\mu_n = \sum_{i=0}^{L-1} (x_i - \mu)^n \cdot p(x_i)$$

where $p(x_i)$ is the probability of finding a pixel of the image with gray level $x_i$ (given by the histogram $H$), $L$ is the number of gray levels, and $\mu$ is the average value. Therefore, in the proposed system, the images returned by the CBIR and the file containing the data taken by the eye tracker are processed in order to identify the most relevant images and their features. Each image is then represented by a feature vector $F = [C, C_r, E, H, G\text{-}Maps, r_{bright}, r_{smooth}]$;
• Re-Ranking. The values of the extracted features, which should be possessed by the images to best fit the user's interest, are then processed to produce a ranking of the images initially retrieved. In detail, we compute a similarity score (which represents a sort of implicit relevance feedback) between the most relevant images, detected at the previous step, and the images retrieved at the first step (see fig. 3). The metric used to evaluate the similarity is based on the concept of distance, measured between the feature vector $F_{rel}$ (normalized between 0 and 1) of the most salient images (extracted at the previous step) and the feature vector $F_{ret}$ (normalized between 0 and 1) of the images initially retrieved (at step 1). The images are re-ranked by using this similarity score, computed as:

$$f(I_{Rel}, I_{Ret}) = \sum_{i=1}^{N} w_i \cdot \Omega_i(f_{rel}^i, f_{ret}^i), \qquad w_1 + w_2 + \cdots + w_N = 1 \qquad (1)$$

where $I_{Rel}$, $I_{Ret}$, $f_{rel}^i$, $f_{ret}^i$ are, respectively, the relevant image detected at the previous step, the image initially retrieved, and the $i$th of the $N$ features of the vector $F$ of the images $I_{Rel}$ and $I_{Ret}$. $\Omega_i$ is the fitness function related to the features $f_{rel}^i$, $f_{ret}^i$, and is computed as:

$$\Omega_i = e^{-\frac{1}{2}\,(f_{ret}^i - f_{rel}^i)} \qquad (2)$$

Finally, the retrieved images are ordered, hence re-ranked, according to the decreasing values of the similarity score $f$.
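As anticipated in the Features Extraction step, the following compact Python sketch covers both steps, standing in for the authors' Matlab processing. It is illustrative only: using scikit-image's gray-level co-occurrence utilities for $C$, $C_r$, $E$, $H$ is an assumption, the 24 Gabor maps are omitted for brevity, and uniform weights $w_i$ are assumed since the paper does not state them.

```python
# Illustrative sketch (not the authors' Matlab 7.5 code) of the Features
# Extraction and Re-Ranking steps. GLCM texture features via scikit-image
# are an assumption; Gabor maps are omitted for brevity.
import numpy as np
from skimage.feature import graycomatrix, graycoprops

def histogram_moments(gray, levels=256):
    """Central moments mu_n = sum_i (x_i - mu)^n p(x_i) of the gray-level histogram."""
    hist, _ = np.histogram(gray, bins=levels, range=(0, levels))
    p = hist / hist.sum()
    x = np.arange(levels)
    mu = float(np.sum(x * p))
    return mu, {n: float(np.sum((x - mu) ** n * p)) for n in (2, 3, 4)}

def feature_vector(gray):
    """F = [C, Cr, E, H, r_bright, r_smooth] for a uint8 grayscale region."""
    glcm = graycomatrix(gray, distances=[1], angles=[0], levels=256, normed=True)
    C, Cr = graycoprops(glcm, "contrast")[0, 0], graycoprops(glcm, "correlation")[0, 0]
    E, H = graycoprops(glcm, "energy")[0, 0], graycoprops(glcm, "homogeneity")[0, 0]
    mu, m = histogram_moments(gray)
    r_bright = m[3] * 255**2 / (mu + 10)         # Brightness, as reconstructed above
    r_smooth = 1.0 / (m[2] + 1) + E + m[4] + H   # Smoothness, as reconstructed above
    return np.array([C, Cr, E, H, r_bright, r_smooth])

def similarity(F_rel, F_ret, w=None):
    """Eq. (1): f = sum_i w_i * Omega_i, with Eq. (2): Omega_i = exp(-(f_ret^i - f_rel^i)/2).
    Both vectors are assumed already normalized to [0, 1]; the printed formula
    does not square the difference, and it is implemented here as printed."""
    F_rel, F_ret = np.asarray(F_rel, float), np.asarray(F_ret, float)
    w = np.full(F_rel.size, 1.0 / F_rel.size) if w is None else np.asarray(w, float)
    return float(np.sum(w * np.exp(-0.5 * (F_ret - F_rel))))

def rerank(F_rel, retrieved):
    """Order (image_id, F_ret) pairs by decreasing similarity score f."""
    return sorted(retrieved, key=lambda item: similarity(F_rel, item[1]), reverse=True)
```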
Figure 3: The eye tracker's implicit relevance feedback produces a new image input for the CBIR system.

The relevance feedback detected by the eye tracker could be improved by taking into account the rankings carried out by other methods, e.g., those which model the user's behavior during the image analysis phase from how the user operates the mouse and keyboard.

3 User Interface and Experimental Results

The system has been implemented by integrating the functionality of Tobii Studio with Matlab 7.5, which is responsible for processing the output provided by the eye tracker. Tobii Studio makes it possible to record a web browsing session, setting appropriate parameters such as the URL and the initial size of the window of the web browser. By default the web browsing is set to http://images.google.com/ as the homepage, whereas the window size and resolution are set to the entire screen and the maximum resolution allowed by the monitor. After a proper training phase of the instrument, the user is authorized to start regular recording sessions, which are terminated by pressing the F10 key on the keyboard. At the end of the session the user should confirm the export, in textual form, of the two files, related to fixations and events, needed for the computation of the relevance feedback. Thus, the information representing the gaze fixations and the information related to the images, which are merged in the same picture, are actually kept separated in two files.
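As an illustration of how the exported fixation file might be consumed, the sketch below totals fixation durations per thumbnail to find the most-fixated images. The tab-separated layout and the column names are assumptions made for this example, not Tobii Studio's documented export format.

```python
# Hedged sketch: sum fixation durations falling inside each thumbnail's
# screen region. Column names (FixationPointX/Y, FixationDuration) and the
# tab-separated layout are assumed, not taken from the Tobii Studio manual.
import csv
from collections import defaultdict
from typing import Dict, Tuple

def total_fixation_time(
    fixation_file: str,
    thumbnails: Dict[str, Tuple[float, float, float, float]],  # id -> (x0, y0, x1, y1)
) -> Dict[str, float]:
    totals: Dict[str, float] = defaultdict(float)
    with open(fixation_file, newline="") as fh:
        for row in csv.DictReader(fh, delimiter="\t"):
            x, y = float(row["FixationPointX"]), float(row["FixationPointY"])
            duration = float(row["FixationDuration"])
            for image_id, (x0, y0, x1, y1) in thumbnails.items():
                if x0 <= x <= x1 and y0 <= y <= y1:
                    totals[image_id] += duration  # attribute the fixation to this thumbnail
                    break
    return dict(totals)
```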
To evaluate the effectiveness of the proposed system in increasing the precision of the information retrieval carried out by Google Images, we show below how the system significantly rearranges the collection of images proposed by Google in response to the word "pyramid", and we evaluate the performance increase as perceived by a set of 30 users. Indeed, such a collection is proposed without any knowledge of the user's interest, by merging images of "pyramid" where the subject is either a monument or a geometric solid (see fig. 4).

Figure 4: Google ranking for the "pyramid" keyword.

With the eye tracker we may gain insight into the user's interests, discovering, for example, that s/he is more interested in pyramids as monuments, since the most fixated images are related to the Egyptian pyramids (as shown by the heatmap in fig. 5). With this information at hand, it is relatively easy for the system to discover, after the recording session, the images relevant for the user, following the processing procedure pointed out in the previous section.

Figure 5: Gaze fixations on the images retrieved by Google using the "pyramid" keyword.

Fig. 6 shows the collection of images as re-proposed by our system. The new ranking correctly suggests a sequence that favors the pyramids most similar to those observed, and thus actually requested, by the user. The user's intent was captured through the implicit relevance feedback by taking into account that s/he was particularly attracted by a picture with the Sphinx in the foreground and a pyramid in the background.

Figure 6: New pyramid image ranking according to the eye tracker feedback given by the user.

The proposed system was thus able to discover meaningful information from how the perception process was carried out by the user. Indeed, in the newly proposed ranking, the top two places are occupied by images with the pyramid and the Sphinx.

Finally, we tested the performance of the proposed system on a set of 30 users. In detail, after the re-ranking each user was asked to say whether the first five retrieved images were more or less relevant to the inserted word with respect to the ones obtained by Google Images. The results are reported in Table 1, where we can see that 86.6% of the users were more satisfied after the re-ranking, 6.7% of the users were indifferent, and 6.7% were less satisfied.

         Less Satisfied   Indifferent   More Satisfied
Users          2               2              26
%             6.7             6.7            86.6

Table 1: Qualitative performance evaluation on a set of 30 users.

4 Conclusions and Future Work

The proposed model shows that the use of an eye tracker to detect implicit feedback may greatly improve the performance of a search in a CBIR system. Future developments will concern the possibility of considering not only the first image but also the next ones in order of importance, to obtain a more refined ranking. Moreover, we are currently working on the possibility of using visual attention for image indexing, thus taking into account the real contents of images. A comparison with other web-based CBIRs, and tests on a wider set of users in order to provide quantitative results, will be carried out in future work.

References

Buscher, G. 2007. Attention-based information retrieval. In SIGIR '07: Proceedings of the 30th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, ACM, New York, NY, USA, 918.

Granka, L. A., Joachims, T., and Gay, G. 2004. Eye-tracking analysis of user behavior in WWW search. In SIGIR '04: Proceedings of the 27th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, ACM, New York, NY, USA, 478–479.

Kelly, D., and Teevan, J. 2003. Implicit feedback for inferring user preference: a bibliography. SIGIR Forum 37, 2, 18–28.

Marques, O., Mayron, L. M., Borba, G. B., and Gamba, H. R. 2006. Using visual attention to extract regions of interest in the context of image retrieval. In ACM-SE 44: Proceedings of the 44th Annual Southeast Regional Conference, ACM, New York, NY, USA, 638–643.

Miller, T., and Agne, S. 2005. Attention-based information retrieval using eye tracker data. In K-CAP '05: Proceedings of the 3rd International Conference on Knowledge Capture, ACM, New York, NY, USA, 209–210.

Moe, K. K., Jensen, J. M., and Larsen, B. 2007. A qualitative look at eye-tracking for implicit relevance feedback. In CIR.

Oyekoya, O. K., and Stentiford, F. W. 2006. Eye tracking – a new interface for visual exploration. BT Technology Journal 24, 3, 57–66.

Oyekoya, O., and Stentiford, F. 2007. Perceptual image retrieval using eye movements. Int. J. Comput. Math. 84, 9, 1379–1391.

Xu, S., Zhu, Y., Jiang, H., and Lau, F. C. M. 2008. A user-oriented webpage ranking algorithm based on user attention time. In AAAI'08: Proceedings of the 23rd National Conference on Artificial Intelligence, AAAI Press, 1255–1260.