In this paper we propose an implicit relevance feedback method that aims to improve the performance of existing Content-Based Image Retrieval (CBIR) systems by re-ranking the retrieved images according to users' eye-gaze data. This is a new mechanism for implicit relevance feedback: the sources usually taken into account for image retrieval are based on the user's natural behavior in his/her environment, estimated by analyzing mouse and keyboard interactions. In detail, after images are retrieved by querying a CBIR system with a keyword, our system computes the most salient regions of the retrieved images (where users look with greater interest) by gathering data from an unobtrusive eye tracker such as the Tobii T60. According to the color and texture features of these relevant regions, the system re-ranks the images initially retrieved by the CBIR system. A performance evaluation, carried out on a set of 30 users using Google Images and the keyword "pyramid", shows that about 87% of the users are more satisfied with the output images when the re-ranking is applied.
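The re-ranking step described above can be sketched as a nearest-feature sort: images whose color/texture features are closest to those of the gaze-derived salient regions move to the top. The data layout and function name below are illustrative assumptions, not the paper's actual implementation.

```python
import math

def rerank_by_salient_features(results, salient_features):
    """Sort retrieved images by the Euclidean distance between each
    image's feature vector and the feature vector extracted from the
    gaze-derived salient regions (hypothetical layout)."""
    return sorted(results, key=lambda img: math.dist(img["features"], salient_features))

# Toy example: the image whose features best match the salient regions moves first.
results = [
    {"name": "a.jpg", "features": [0.9, 0.1]},
    {"name": "b.jpg", "features": [0.2, 0.8]},
]
reranked = rerank_by_salient_features(results, [0.25, 0.75])
```

In a real system the feature vectors would be color histograms or texture descriptors computed over the fixated regions rather than two-dimensional toy values.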
Applications of spatial features in cbir a survey (csandit)
With advances in computer technology and the World Wide Web, there has been an explosion in the amount and complexity of multimedia data that are generated, stored, transmitted, analyzed, and accessed. In order to extract useful information from this huge amount of data, many content-based image retrieval (CBIR) systems have been developed in the last decade. A typical CBIR system captures image features that represent image properties such as the color, texture, or shape of objects in the query image and tries to retrieve images from the database with similar features. Retrieval efficiency and accuracy are the key issues in designing a content-based image retrieval system. Shape and spatial features are quite easy to derive and effective. Researchers are moving towards finding spatial features and exploring the scope of integrating these features into the image retrieval framework to reduce the semantic gap. This survey paper focuses on a detailed review of the different methods and evaluation techniques used in recent work based on spatial features in CBIR systems. Finally, several recommendations for future research directions are suggested based on recent technologies.
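The extract-features-then-match pipeline that a typical CBIR system follows can be illustrated with a toy intensity histogram and histogram intersection; real systems use much richer color, texture, and shape descriptors, but the ranking logic is the same.

```python
def histogram(values, bins=4, vmax=256):
    """Normalized histogram of pixel intensities in [0, vmax)."""
    h = [0] * bins
    for v in values:
        h[v * bins // vmax] += 1
    return [c / len(values) for c in h]

def intersection(h1, h2):
    """Histogram intersection similarity: higher means more alike."""
    return sum(min(a, b) for a, b in zip(h1, h2))

# Rank database images by similarity of their histograms to the query's.
query = histogram([10, 20, 200, 210])
database = {
    "dark":  histogram([5, 15, 30, 40]),
    "mixed": histogram([12, 25, 190, 220]),
}
ranked = sorted(database, key=lambda k: intersection(query, database[k]), reverse=True)
```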
META-HEURISTICS BASED ARF OPTIMIZATION FOR IMAGE RETRIEVAL (IJCSEIT Journal)
The proposed approach narrows the semantic gap in image retrieval by combining automatic relevance feedback (ARF) and a modified stochastic algorithm. A visual feature database is constructed from the image database using a combined feature vector; only a few fast-computable features are included in this step. The user selects the query image and, based on it, the system ranks the whole dataset. The nearest images are retrieved and the first automatic relevance feedback is generated. The combined similarity of the textual and visual feature spaces is evaluated using Latent Semantic Indexing, and the images are labelled as relevant or irrelevant. The feedback drives a feature re-weighting process and is routed to a particle swarm optimizer. Instead of the classical swarm-update approach, the swarm is split so that each sub-swarm performs the search in parallel, thereby increasing the performance of the system. This provides a powerful optimization tool and an effective space-exploration mechanism. The proposed approach aims to achieve the following goals without any human interaction: to cluster relevant images using meta-heuristics and to dynamically modify the feature space by feeding back automatic relevance feedback.
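The feature re-weighting step driven by relevance feedback can be sketched with a common heuristic: weight each feature dimension by the inverse standard deviation over the images marked relevant, so dimensions on which the relevant images agree count more in the distance. This is a generic stand-in, not the paper's exact update rule.

```python
def reweight(relevant_features, eps=1e-6):
    """Weight each feature dimension by 1 / (std + eps) computed over
    the relevant images, then normalize the weights to sum to 1."""
    dims = len(relevant_features[0])
    weights = []
    for i in range(dims):
        vals = [f[i] for f in relevant_features]
        mean = sum(vals) / len(vals)
        std = (sum((v - mean) ** 2 for v in vals) / len(vals)) ** 0.5
        weights.append(1.0 / (std + eps))
    total = sum(weights)
    return [w / total for w in weights]

# Dimension 0 is consistent across the relevant images, so it gets the larger weight.
weights = reweight([[0.1, 0.5], [0.1, 0.9]])
```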
An Enhance Image Retrieval of User Interest Using Query Specific Approach and... (IJSRD)
In recent years, the use of image retrieval has grown dramatically. An image retrieval system is a process for searching and retrieving images from a large image dataset. Color, texture, and edges have been the primitive low-level image descriptors in content-based image retrieval systems. In this paper we present a system which splits the search process into two stages. In the query-specific stage, the feature descriptors of a query image are extracted and then used to check the similarity between the query image and the images in the database. In the evolution stage, the most relevant images are retrieved using an Interactive Genetic Algorithm (IGA). The IGA helps users retrieve the images that are most relevant to their needs, and an SVM ranks the images by their titles and search time, so that users can obtain the searched images as per their requirements.
A NOVEL WEB IMAGE RE-RANKING APPROACH BASED ON QUERY SPECIFIC SEMANTIC SIGNAT... (Journal For Research)
Image re-ranking is an effective way to improve the results of web-based image search. Given a query keyword, a pool of images is initially retrieved based primarily on textual data; the images are then re-ranked based on their visual similarity to the query image chosen by the user. A major challenge is that the similarity of visual features does not correlate well with the semantic meaning of images, which is what captures a user's search intention. Recent work proposed matching images in a semantic space whose basis is formed by attributes or reference categories closely associated with the semantic meanings of images. However, learning a universal visual semantic space to characterize the extremely diverse images on the Internet is difficult and inefficient. In this thesis, we propose a distinctive image re-ranking framework that automatically learns different semantic spaces for different query keywords. The visual features of images are projected into their corresponding semantic spaces to obtain semantic signatures. At the online stage, images are re-ranked by comparing their semantic signatures obtained from the semantic space specified by the query keyword. The proposed query-specific semantic signatures considerably improve both the accuracy and the efficiency of image re-ranking.
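The core idea of comparing semantic signatures rather than raw visual features can be sketched as follows. The "prototypes" below stand in for the reference-category models learned per query keyword and are purely illustrative; the actual framework learns classifiers, not centroids.

```python
import math

def semantic_signature(feat, prototypes):
    """Map a visual feature vector into a query-specific semantic space:
    one score per reference category (here, negated distance to a
    hypothetical category prototype)."""
    return [-math.dist(feat, p) for p in prototypes]

def rerank(query_feat, candidates, prototypes):
    """Re-rank candidates by the distance between their semantic
    signatures and the query image's signature."""
    q_sig = semantic_signature(query_feat, prototypes)
    return sorted(candidates,
                  key=lambda c: math.dist(semantic_signature(c["feat"], prototypes), q_sig))

prototypes = [[0.0, 0.0], [1.0, 1.0]]            # two reference categories
candidates = [{"name": "a", "feat": [0.2, 0.1]},
              {"name": "b", "feat": [0.9, 0.9]}]
reranked = rerank([0.1, 0.1], candidates, prototypes)
```

The signatures are short (one entry per category), which is where the efficiency gain over comparing full visual feature vectors comes from.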
Survey on Multiple Query Content Based Image Retrieval Systems (CSCJournals)
This paper reviews multiple-query approaches for Content-Based Image Retrieval systems (MQIR). These are recently proposed Content-Based Image Retrieval systems that enhance retrieval performance by conveying a richer understanding of the user's high-level interest to the retrieval system. In fact, by allowing the user to express their interest using a set of query images, MQIR systems bridge the semantic gap with the low-level image features. Nevertheless, the main challenge of MQIR systems is how to compute the distances between the set of query images and each image in the database in a way that enhances the retrieval results and reflects the high-level semantics the user is interested in. Several approaches to this problem have been reported in the literature. In this paper, we investigate existing multiple-query retrieval systems. We describe each approach, detail the way it computes the distances between the set of query images and each image in the database, and analyze its advantages and disadvantages in reflecting the high-level semantics meant by the user.
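The design choice such surveys compare, i.e. how to aggregate the distances from one database image to a set of query images, can be illustrated with two common aggregations; the abstract does not prescribe either, so both are shown as examples only.

```python
import math

def set_distance(image_feat, query_feats, mode="min"):
    """Distance from one database image to a *set* of query images:
    'min' rewards closeness to any single query, while 'mean' favors
    images near the centroid of the whole query set."""
    dists = [math.dist(image_feat, q) for q in query_feats]
    return min(dists) if mode == "min" else sum(dists) / len(dists)

queries = [[0.0, 0.0], [1.0, 1.0]]
d_min = set_distance([0.1, 0.0], queries, mode="min")
d_mean = set_distance([0.1, 0.0], queries, mode="mean")
```

An image close to just one query scores well under "min" but is penalized under "mean", which is exactly the kind of semantic trade-off the surveyed approaches differ on.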
Design and Development of an Algorithm for Image Clustering In Textile Image ... (IJCSEA Journal)
All textile industries aim to produce competitive materials, and competitiveness depends mainly on the designs and quality of the garments produced by each industry. Every day, a vast amount of textile images is generated, such as images of shirts, jeans, t-shirts, and sarees. A principal driver of innovation is the World Wide Web, which has unleashed publication at the scale of tens of millions of content creators. Images play an important role, as a picture is worth a thousand words in the field of textile design and marketing. Retrieving images requires special concepts such as image annotation, context, image content, and image values. This research work studies the image mining process in detail, analyzes various methods for retrieving and clustering textile images, and develops an algorithm for the latter. The retrieval method considered is based on relevance feedback, a scalable method, edge histograms, and color layout. The image clustering algorithm is designed based on color descriptors and the k-means clustering algorithm. A software prototype demonstrating the proposed algorithm has been developed using the NetBeans integrated development environment and found successful.
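The clustering step based on color descriptors and k-means can be sketched as below; the four toy "descriptors" and the fixed iteration count are illustrative, and a production version would also check for convergence.

```python
import math
import random

def kmeans(points, k, iters=20, seed=0):
    """Plain k-means on feature vectors; returns a cluster label per point."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point goes to its nearest center.
        for j, p in enumerate(points):
            labels[j] = min(range(k), key=lambda i: math.dist(p, centers[i]))
        # Update step: recompute each center as the mean of its members.
        for i in range(k):
            members = [p for p, l in zip(points, labels) if l == i]
            if members:
                centers[i] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels

# Two well-separated groups of color descriptors land in two clusters.
labels = kmeans([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]], k=2)
```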
Content Based Video Retrieval Using Integrated Feature Extraction and Persona... (IJERD Editor)
Traditional video retrieval methods fail to meet the technical challenges posed by the large and rapid growth of multimedia data, which demands effective retrieval systems. In the last decade, Content Based Video Retrieval (CBVR) has become more and more popular. The amount of lecture video data on the World Wide Web (WWW) is growing rapidly; therefore, a more efficient method for video retrieval on the WWW or within large lecture video archives is urgently needed. This paper presents an implementation of automated video indexing and video search in a large video database. First, we apply automatic video segmentation and key-frame detection to extract frames from the video. Next, we extract textual keywords by applying Optical Character Recognition (OCR) technology to the key-frames and Automatic Speech Recognition (ASR) to the audio tracks of the video. We also extract color, texture, and edge features using different methods. Finally, we integrate all the keywords and features extracted by the above techniques for searching, and a similarity measure is applied to retrieve the best-matching videos, which are presented as output from the database. Additionally, we provide re-ranking of the results according to the user's interest in the original results.
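The final integration of OCR/ASR keywords with visual features can be sketched as a weighted fusion of a text score and a visual score; the field names and the equal weights below are assumptions for illustration, not the paper's formula.

```python
def integrated_score(query_terms, video, w_text=0.5, w_visual=0.5):
    """Fuse keyword overlap (from OCR/ASR transcripts) with a precomputed
    visual similarity; both component scores are assumed to lie in [0, 1]."""
    terms = set(query_terms)
    text = len(terms & set(video["keywords"])) / max(len(terms), 1)
    return w_text * text + w_visual * video["visual_sim"]

videos = [
    {"id": "intro",  "keywords": ["sorting", "arrays"], "visual_sim": 0.2},
    {"id": "graphs", "keywords": ["graphs", "bfs"],     "visual_sim": 0.9},
]
ranked = sorted(videos, key=lambda v: integrated_score(["graphs"], v), reverse=True)
```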
Due to recent developments in technology, there has been an increase in the usage of digital cameras, smartphones, and the Internet. The amount of shared and stored multimedia data is growing, and searching for or retrieving a relevant image from an archive is a challenging research problem. The fundamental requirement of any image retrieval model is to search for and rank the images that are in a visual-semantic relationship with the query given by the user.
http://takeoffprojects.com/content-based-image-retrieval-project
Content-based Image Retrieval System for an Image Gallery Search Application (IJECE/IAES)
Content-based image retrieval is a process framework that applies computer vision techniques for searching and managing large image collections more efficiently. With the growth of large digital image collections triggered by rapid advances in electronic storage capacity and computing power, there is a growing need for devices and computer systems to support efficient browsing, searching, and retrieval for image collections. Hence, the aim of this project is to develop a content-based image retrieval system that can be implemented in an image gallery desktop application to allow efficient browsing through three different search modes: retrieval by image query, retrieval by facial recognition, and retrieval by text or tags. In this project, the MPEG-7-like Powered Localized Color and Edge Directivity Descriptor is used to extract the feature vectors of the image database and the facial recognition system is built around the Eigenfaces concept. A graphical user interface with the basic functionality of an image gallery application is also developed to implement the three search modes. Results show that the application is able to retrieve and display images in a collection as thumbnail previews with high retrieval accuracy and medium relevance and the computational requirements for subsequent searches were significantly reduced through the incorporation of text-based image retrieval as one of the search modes. All in all, this study introduces a simple and convenient way of offline image searches on desktop computers and provides a stepping stone to future content-based image retrieval systems built for similar purposes.
Novel Hybrid Approach to Visual Concept Detection Using Image Annotation (CSCJournals)
Millions of images are being uploaded to the Internet without proper descriptions (tags). Image retrieval based on image tagging is much faster than the Content Based Image Retrieval (CBIR) approach, but it requires the entire image collection to be manually annotated with proper tags. This takes a great deal of human effort and time and hence is not feasible for huge image collections. An efficient method is necessary for automatically tagging such vast collections of images. We propose a novel image tagging method which automatically tags any image with its concept. Our approach involves manual tagging of a small exemplar image set and low-level feature extraction from all the images, and is hence called a hybrid approach. It can be used to tag a large image dataset starting from a small manually tagged image dataset. The experiments are performed on Wang's Corel dataset. In a comparative study, it is found that the proposed concept detection system based on this tagging approach has a much lower time complexity in the classification step and yields a significant improvement in accuracy compared to the other tagging approaches found in the literature. This approach may be used as a faster alternative to the typical Content Based Image Retrieval (CBIR) approach for domain-specific applications.
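The propagation from a small manually tagged exemplar set to the rest of the collection can be illustrated with a 1-nearest-neighbor rule over low-level features; the paper's actual classifier may differ, so treat this as a minimal sketch of the hybrid idea.

```python
import math

def propagate_tag(feat, exemplars):
    """Assign the tag of the visually nearest manually tagged exemplar.
    `exemplars` is a list of (feature_vector, tag) pairs."""
    nearest = min(exemplars, key=lambda e: math.dist(feat, e[0]))
    return nearest[1]

# Two hand-tagged exemplars stand in for the small manually annotated set.
exemplars = [([1.0, 0.0], "beach"), ([0.0, 1.0], "forest")]
tag = propagate_tag([0.9, 0.1], exemplars)
```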
Tag based image retrieval (TBIR) using automatic image annotation (eSAT Journals)
In recent years, social networking sites have become popular venues for digitized images, which make up a major portion of the databases that search engines struggle to search. We present an efficient image retrieval technique which achieves high retrieval efficiency. Most images are annotated manually, so the visual content and tags may be mismatched; this leads to poor performance in Tag Based Image Retrieval (TBIR). Automatic Image Annotation (AIA) analyzes missing and noisy tags and refines them to increase the performance of TBIR. AIA can be achieved using the Tag Completion algorithm. The images retrieved by TBIR are ranked based on the relevance of the tags and the visual content of the images; the relevance can be evaluated using a Content Based Image Retrieval (CBIR) technique. Based on the ranks, the images are indexed in the tag matrix. Thus the images that match the search query can be retrieved in an optimal way.
Keywords: Image Retrieval, Automatic Image Annotation, Tag Based Image Retrieval (TBIR), Tag Completion Algorithm, Content Based Image Retrieval (CBIR), Tag Matrix
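A crude version of tag completion, where an image inherits tags from its visually nearest neighbor, can be sketched as below. The real Tag Completion algorithm optimizes the whole tag matrix jointly, so this is only a stand-in for the idea of repairing missing tags from visual similarity.

```python
import math

def complete_tags(images):
    """Each image gains the tags of its nearest neighbor in feature space.
    New tag sets are computed first, so earlier updates cannot leak into
    later neighbor lookups within the same pass."""
    completed = []
    for img in images:
        others = [o for o in images if o is not img]
        nearest = min(others, key=lambda o: math.dist(img["feat"], o["feat"]))
        completed.append(set(img["tags"]) | set(nearest["tags"]))
    for img, tags in zip(images, completed):
        img["tags"] = tags
    return images

images = complete_tags([
    {"feat": [0.0, 0.0], "tags": {"cat"}},
    {"feat": [0.1, 0.0], "tags": set()},   # missing tag, visually near the first
    {"feat": [5.0, 5.0], "tags": {"dog"}},
])
```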
Global Descriptor Attributes Based Content Based Image Retrieval of Query Images (IJERA Editor)
The need for efficient content-based image retrieval systems has increased hugely. Efficient and effective image retrieval techniques are desired because of the explosive growth of digital images. Content-based image retrieval (CBIR) is a promising approach because of its automatic indexing and retrieval based on semantic features and visual appearance. In the proposed system we investigate a method for describing the contents of images that characterizes images by global descriptor attributes: global features are extracted to make the system more efficient, using color features (color expectancy, color variance, and skewness) and a texture correlation feature.
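The three color features the abstract names (expectancy, variance, skewness) are the classic first three color moments, computable per color channel as below; the toy three-pixel channel is illustrative only.

```python
def color_moments(channel):
    """First three moments of one color channel: mean (expectancy),
    standard deviation, and a signed cube-root skewness."""
    n = len(channel)
    mean = sum(channel) / n
    std = (sum((v - mean) ** 2 for v in channel) / n) ** 0.5
    third = sum((v - mean) ** 3 for v in channel) / n
    skew = abs(third) ** (1 / 3) * (1 if third >= 0 else -1)
    return mean, std, skew

# A channel with symmetric values has zero skewness.
mean, std, skew = color_moments([10, 20, 30])
```

Concatenating the three moments of each channel yields a compact 9-dimensional global descriptor per RGB image, which is what makes moment-based features cheap to index.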
A Comparative Study of Content Based Image Retrieval Trends and Approaches (CSCJournals)
Content Based Image Retrieval (CBIR) is an important step in addressing image storage and management problems. Improvements in imaging technology along with the growth of the Internet have led to a huge amount of digital multimedia in recent decades, and various methods, algorithms, and systems have been proposed to solve the resulting problems. Such studies introduced the indexing and retrieval concepts that have further evolved into content-based image retrieval. CBIR systems often analyze image content via so-called low-level features for indexing and retrieval, such as color, texture, and shape. In order to achieve significantly better semantic performance, recent systems seek to combine low-level features with high-level features that carry perceptual information for humans. The purpose of this review is to identify the set of methods that have been used for CBIR, to discuss some of the key contributions of the current decade related to image retrieval, and to outline the main challenges involved in adapting existing image retrieval techniques to build useful systems that can handle real-world data. Across the various CBIR approaches, accurate, repeatable, quantitative data must be efficiently extracted in order to improve retrieval accuracy. In this paper, various CBIR approaches and available algorithms are reviewed; comparative results of the various techniques are presented, and their advantages, disadvantages, and limitations are discussed.
Image retrieval is one of the major innovations in the development of imaging. Image mining is used to extract new information from general collections of images. CBIR is the latest method, in which target images are retrieved on the basis of specific features of a specified image. Images can be retrieved quickly if they are clustered in an accurate and structured manner. In this paper, we combine the theory of CBIR with an analysis of the features of CBIR systems.
A Review of Feature Extraction Techniques for CBIR based on SVM (IJEEE)
With the advancement of multimedia technologies, users are no longer satisfied with conventional retrieval techniques, so content-based image retrieval systems were introduced. CBIR is an application for retrieving or searching digital images in large databases. The term "content" refers to the color, shape, texture, and all other information that is extracted from the image itself. This paper reviews CBIR systems that use SVM-classifier-based algorithms for the feature extraction phase.
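Once features are extracted, an SVM's linear decision function can rank images by their signed margin; the weight vector below is hypothetical, standing in for a model trained on relevance-labeled examples.

```python
def svm_score(w, b, x):
    """Linear SVM decision value f(x) = w.x + b; a larger value means
    the feature vector x lies further on the 'relevant' side."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

w, b = [1.0, -1.0], 0.0        # hypothetical weights from a trained SVM
images = {"a": [0.9, 0.2], "b": [0.1, 0.8]}
ranked = sorted(images, key=lambda k: svm_score(w, b, images[k]), reverse=True)
```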
Dynamic Two-Stage Image Retrieval from Large Multimodal Databases (Konstantinos Zagoris)
Content-based image retrieval (CBIR) with global features is notoriously noisy, especially for image queries with low percentages of relevant images in a collection. Moreover, CBIR typically ranks the whole collection, which is inefficient for large databases. We experiment with a method for image retrieval from multimodal databases, which improves both the effectiveness and efficiency of traditional CBIR by exploring secondary modalities. We perform retrieval in a two-stage fashion: first rank by a secondary modality, and then perform CBIR only on the top-K items. Thus, effectiveness is improved by performing CBIR on a ‘better’ subset. Using a relatively ‘cheap’ first stage, efficiency is also improved via the fewer CBIR operations performed. Our main novelty is that K is dynamic, i.e. estimated per query to optimize a predefined effectiveness measure. We show that such dynamic two-stage setups can be significantly more effective and robust than similar setups with static thresholds previously proposed.
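The two-stage scheme can be sketched as below. In the paper, K is estimated per query to optimize a predefined effectiveness measure; here it is fixed for brevity, and the score fields are illustrative.

```python
def two_stage_retrieval(collection, text_score, visual_score, k):
    """Stage 1: rank everything by the cheap secondary (text) modality
    and keep only the top K. Stage 2: run the expensive CBIR ranking
    on that shortlist alone."""
    shortlist = sorted(collection, key=text_score, reverse=True)[:k]
    return sorted(shortlist, key=visual_score, reverse=True)

docs = [
    {"id": 1, "text": 0.9, "visual": 0.2},
    {"id": 2, "text": 0.8, "visual": 0.9},
    {"id": 3, "text": 0.1, "visual": 1.0},  # visually best, but filtered out in stage 1
]
result = two_stage_retrieval(docs, lambda d: d["text"], lambda d: d["visual"], k=2)
```

The example also shows the scheme's trade-off: document 3 is visually the best match but never reaches stage 2, which is why choosing K dynamically per query matters.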
Design and Development of an Algorithm for Image Clustering In Textile Image ...IJCSEA Journal
All textile industries aim to produce competitive materials and the competition enhancement depends mainly on designs and quality of the dresses produced by each industry. Every day, a vast amount of textile images are being generated such as images of shirts, jeans, t-shirts and sarees. A principal driver of innovation is World Wide Web, unleashing publication at the scale of tens and millions of content creators. Images play an important role as a picture is worth thousand words in the field of textile design and marketing. A retrieving of images needs special concepts such as image annotation, context, and image content and image values. This research work aimed at studying the image mining process in detail and analyzes the methods for retrieval. The textile images analyze various methods for clustering the images and developing an algorithm for the same. The retrieval method considered is based on relevance feedback, scalable method, edge histogram and color layout. The image clustering algorithm is designed based on color descriptors and k-means clustering algorithm. A software prototype to prove the proposed algorithm has been developed using net beans integrated development environment and found successful.
Content Based Video Retrieval Using Integrated Feature Extraction and Persona...IJERD Editor
Traditional video retrieval methods fail to meet technical challenges due to large and rapid growth of
multimedia data, demanding effective retrieval systems. In the last decade Content Based Video Retrieval
(CBVR) has become more and more popular. The amount of lecture video data on the Worldwide Web (WWW)
is growing rapidly. Therefore, a more efficient method for video retrieval in WWW or within large lecture video
archives is urgently needed. This paper presents an implementation of automated video indexing and video
search in large videodatabase. First of all, we apply automatic video segmentation and key-frame detection to
extract the frames from video. At next, we extract textual keywords by applying on video i.e. Optical Character
Recognition (OCR) technology on key-frames and Automatic Speech Recognition (ASR) on audio tracks of that
video. At next, we also extractingcolour, texture and edge detector features from different method. At last, we
integrate all the keywords and features which has extracted from above techniques for searching
purpose.Finallysearch similarity measure is applied to retrieve the best matchingcorresponding videos are
presented as output from database. Additionally we are providing Re-ranking of results as per users interest in
original result.
Due to recent development in technology, there is an increase in the usage of digital cameras, smartphones, and Internet. The shared and stored multimedia data are growing, and to search or to retrieve a relevant image from an archive is a challenging research problem. The fundamental need of any image retrieval model is to search and arrange the images that are in a visual semantic re- lationship with the query given by the user Content Based Image Retrieval Project.
http://takeoffprojects.com/content-based-image-retrieval-project
We are providing you with some of the greatest ideas for building Final Year projects with proper guidance and assistance
International Journal of Engineering Research and Applications (IJERA) is an open access online peer reviewed international journal that publishes research and review articles in the fields of Computer Science, Neural Networks, Electrical Engineering, Software Engineering, Information Technology, Mechanical Engineering, Chemical Engineering, Plastic Engineering, Food Technology, Textile Engineering, Nano Technology & science, Power Electronics, Electronics & Communication Engineering, Computational mathematics, Image processing, Civil Engineering, Structural Engineering, Environmental Engineering, VLSI Testing & Low Power VLSI Design etc.
Content-based Image Retrieval System for an Image Gallery Search Application IJECEIAES
Content-based image retrieval is a process framework that applies computer vision techniques for searching and managing large image collections more efficiently. With the growth of large digital image collections triggered by rapid advances in electronic storage capacity and computing power, there is a growing need for devices and computer systems to support efficient browsing, searching, and retrieval for image collections. Hence, the aim of this project is to develop a content-based image retrieval system that can be implemented in an image gallery desktop application to allow efficient browsing through three different search modes: retrieval by image query, retrieval by facial recognition, and retrieval by text or tags. In this project, the MPEG-7-like Powered Localized Color and Edge Directivity Descriptor is used to extract the feature vectors of the image database and the facial recognition system is built around the Eigenfaces concept. A graphical user interface with the basic functionality of an image gallery application is also developed to implement the three search modes. Results show that the application is able to retrieve and display images in a collection as thumbnail previews with high retrieval accuracy and medium relevance and the computational requirements for subsequent searches were significantly reduced through the incorporation of text-based image retrieval as one of the search modes. All in all, this study introduces a simple and convenient way of offline image searches on desktop computers and provides a stepping stone to future content-based image retrieval systems built for similar purposes.
Novel Hybrid Approach to Visual Concept Detection Using Image Annotation (CSCJournals)
Millions of images are being uploaded on the internet without proper description (tags) about these images. Image retrieval based on image tagging approach is much faster than Content Based Image Retrieval (CBIR) approach but requires an entire image collection to be manually annotated with proper tags. This requires a lot of human efforts and time, and hence not feasible for huge image collections. An efficient method is necessary for automatically tagging such a vast collection of images. We propose a novel image tagging method, which automatically tags any image with its concept. Our unique approach to solve this problem involves manual tagging of small exemplar image set and low-level feature extraction of all the images, hence called a hybrid approach. This approach can be used to tag a large image dataset from manually tagged small image dataset. The experiments are performed on Wang's Corel Dataset. In the comparative study, it is found that, the proposed concept detection system based on this novel tagging approach has much less time complexity of classification step, and results in significant improvement in accuracy as compared to the other tagging approaches found in the literature. This approach may be used as faster alternative to the typical Content Based Image Retrieval (CBIR) approach for domain specific applications.
Tag Based Image Retrieval (TBIR) Using Automatic Image Annotation (eSAT Journals)
Abstract In recent days, several social networking sites are more popular with digitized images. It comprises the major portion of the databases which makes the search engines to face difficulty in searching. We present a proficient image retrieval technique, which achieves eminent retrieval efficiency. Most of the images are annotated manually, thus the visual content and tags may be mismatched. This leads to poor performance in Tag Based Image Retrieval (TBIR). Automatic Image Annotation (AIA) analyzes the missing and noisy tags and over-refines it to increase the performance of TBIR. AIA can be achieved using the Tag Completion algorithm. The images retrieved from the TBIR are ranked based on the relevancy of the tags and visual content of the images. The relevancy can be evaluated using Content Based Image Retrieval (CBIR) technique. Based on the ranks, the images are indexed in the Tag matrix. Thus the images that match the search query can be retrieved in an optimal way. Keywords: Image Retrieval, Automatic Image Annotation, Tag Based Image Retrieval (TBIR), Tag Completion Algorithm, Content Based Image Retrieval (CBIR), Tag Matrix
Global Descriptor Attributes Based Content Based Image Retrieval of Query Images (IJERA Editor)
The need for efficient content-based image retrieval system has increased hugely. Efficient and effective retrieval techniques of images are desired because of the explosive growth of digital images. Content based image retrieval (CBIR) is a promising approach because of its automatic indexing retrieval based on their semantic features and visual appearance. In this proposed system we investigate method for describing the contents of images which characterizes images by global descriptor attributes, where global features are extracted to make system more efficient by using color features which are color expectancy, color variance, skewness and texture feature correlation.
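The global color moments named in the abstract above (expectancy, variance, skewness) are standard first-, second-, and third-order statistics per color channel. A minimal sketch, assuming toy pixel data and a sign-preserving cube-root skewness; the paper's exact formulation may differ:

```python
# Hedged sketch: global color moments (mean, variance, skewness) per channel.
# The pixel list below is illustrative toy data, not from the paper.

def color_moments(channel):
    """Return (mean, variance, skewness) of a flat list of pixel values."""
    n = len(channel)
    mean = sum(channel) / n
    var = sum((p - mean) ** 2 for p in channel) / n
    # Skewness here is the sign-preserving cube root of the third central moment.
    third = sum((p - mean) ** 3 for p in channel) / n
    skew = (abs(third) ** (1 / 3)) * (1 if third >= 0 else -1)
    return mean, var, skew

red = [10, 10, 10, 200, 200, 200]
print(color_moments(red))  # (105.0, 9025.0, 0.0)
```

Concatenating the three moments over the three channels yields a compact nine-number global descriptor, which is why such features keep indexing cheap.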
A Comparative Study of Content Based Image Retrieval Trends and Approaches (CSCJournals)
Content Based Image Retrieval (CBIR) is an important step in addressing image storage and management problems. Latest image technology improvements along with the Internet growth have led to a huge amount of digital multimedia during the recent decades. Various methods, algorithms and systems have been proposed to solve these problems. Such studies revealed the indexing and retrieval concepts, which have further evolved to Content-Based Image Retrieval. CBIR systems often analyze image content via the so-called low-level features for indexing and retrieval, such as color, texture and shape. In order to achieve significantly higher semantic performance, recent systems seek to combine low-level with high-level features that contain perceptual information for human. Purpose of this review is to identify the set of methods that have been used for CBR and also to discuss some of the key contributions in the current decade related to image retrieval and main challenges involved in the adaptation of existing image retrieval techniques to build useful systems that can handle real-world data. By making use of various CBIR approaches accurate, repeatable, quantitative data must be efficiently extracted in order to improve the retrieval accuracy of content-based image retrieval systems. In this paper, various approaches of CBIR and available algorithms are reviewed. Comparative results of various techniques are presented and their advantages, disadvantages and limitations are discussed.
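The low-level color indexing the survey above describes is classically done with normalized histograms compared by histogram intersection. A small sketch under assumed toy data and bin count:

```python
# Hedged sketch of the classic low-level CBIR pipeline: index images by a
# normalized color histogram, rank the database by histogram intersection.
# Pixel values and the 4-bin quantization are illustrative assumptions.

def histogram(pixels, bins=4, max_val=256):
    h = [0] * bins
    for p in pixels:
        h[p * bins // max_val] += 1
    total = sum(h)
    return [c / total for c in h]  # normalize so images of any size compare

def intersection(h1, h2):
    # 1.0 = identical normalized histograms, 0.0 = disjoint.
    return sum(min(a, b) for a, b in zip(h1, h2))

query = histogram([0, 10, 20, 200, 210, 220])
db = {"a": histogram([5, 15, 25, 205, 215, 225]),
      "b": histogram([120, 130, 140, 150, 160, 170])}
ranked = sorted(db, key=lambda k: intersection(query, db[k]), reverse=True)
print(ranked)  # ['a', 'b']
```

The semantic gap the survey discusses is visible even here: two images with the same color distribution rank as identical regardless of what they depict.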
Image retrieval is a major innovation in the handling of digital images. Image mining is used to extract new information from large collections of images. CBIR is a recent method in which target images are retrieved on the basis of specific features of a query image. Retrieval is fast when the images are clustered in an accurate and structured manner. In this paper, we combine the theory of CBIR with an analysis of the features used by CBIR systems.
A Review of Feature Extraction Techniques for CBIR based on SVM (IJEEE)
As multimedia technologies advance, users are no longer satisfied with conventional retrieval techniques, so Content Based Image Retrieval (CBIR) systems were introduced. CBIR is an application for searching and retrieving digital images from a large database. The term "content" refers to the colour, shape, texture, and all other information that can be extracted from the image itself. This paper reviews CBIR systems that use SVM-classifier-based algorithms in the feature extraction phase.
Dynamic Two-Stage Image Retrieval from Large Multimodal Databases (Konstantinos Zagoris)
Content-based image retrieval (CBIR) with global features is notoriously noisy, especially for image queries with low percentages of relevant images in a collection. Moreover, CBIR typically ranks the whole collection, which is inefficient for large databases. We experiment with a method for image retrieval from multimodal databases, which improves both the effectiveness and efficiency of traditional CBIR by exploring secondary modalities. We perform retrieval in a two-stage fashion: first rank by a secondary modality, and then perform CBIR only on the top-K items. Thus, effectiveness is improved by performing CBIR on a ‘better’ subset. Using a relatively ‘cheap’ first stage, efficiency is also improved via the fewer CBIR operations performed. Our main novelty is that K is dynamic, i.e. estimated per query to optimize a predefined effectiveness measure. We show that such dynamic two-stage setups can be significantly more effective and robust than similar setups with static thresholds previously proposed.
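The two-stage scheme above can be sketched as follows. This is an illustrative outline, not the paper's implementation: the paper's contribution is estimating K per query, which is stubbed here with a fixed constant, and the scoring functions are assumed toy callables:

```python
# Hedged sketch: rank the whole collection by a cheap secondary modality
# (e.g. a text score), then run expensive CBIR only on the top-K items.

def two_stage(items, text_score, cbir_score, k=3):
    # Stage 1: cheap text ranking over the whole collection.
    stage1 = sorted(items, key=text_score, reverse=True)
    head, tail = stage1[:k], stage1[k:]
    # Stage 2: expensive visual re-ranking restricted to the top-K only.
    head = sorted(head, key=cbir_score, reverse=True)
    return head + tail

docs = [
    {"id": 1, "text": 0.9, "visual": 0.2},
    {"id": 2, "text": 0.8, "visual": 0.9},
    {"id": 3, "text": 0.1, "visual": 0.99},
]
result = two_stage(docs, lambda d: d["text"], lambda d: d["visual"], k=2)
print([d["id"] for d in result])  # [2, 1, 3]
```

Note how item 3, despite the best visual score, never reaches stage two: the quality of the final ranking hinges on K, which is exactly why a per-query dynamic K matters.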
Content-Based Image Retrieval (CBIR) systems employ colour as the primary feature, with texture and shape as secondary features. In this project, a simple image retrieval system is implemented.
In this project, we propose a Content Based Image Retrieval (CBIR) system used to retrieve relevant images from a large database. Textile images motivated the development of this CBIR system, which establishes an efficient combination of color, shape, and texture features. A textile image collection is used as the dataset, and the images in the database are loaded. Each image is passed to a feature extraction stage, which transforms the input image into a set of features covering color, texture, and shape. The texture feature is extracted using the Gray Level Co-occurrence Matrix (GLCM), the color feature is obtained in the HSI color space, and the shape feature is extracted with the Sobel operator. These features are used to compute the similarity between images and are combined so that retrieval accuracy and recall rate are enhanced. A Support Vector Machine (SVM) classifier then classifies the features of a query image by separating the color, shape, and texture groups. Finally, the relevant images are retrieved from the large database and the retrieval efficiency is plotted. The software used is MATLAB 7.10 (matrix laboratory).
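The GLCM texture step in the pipeline above can be illustrated in a few lines. A minimal sketch assuming a tiny two-level toy image and a single horizontal offset; real systems use more gray levels, several offsets, normalization, and additional Haralick statistics:

```python
# Hedged sketch of a GLCM texture feature: count co-occurrences of gray levels
# for horizontally adjacent pixels, then compute the contrast statistic.
# The 2x3 two-level image is illustrative toy data.

def glcm(img, levels):
    """Co-occurrence matrix for offset (0, 1): each pixel vs its right neighbor."""
    m = [[0] * levels for _ in range(levels)]
    for row in img:
        for a, b in zip(row, row[1:]):
            m[a][b] += 1
    return m

def contrast(m):
    # Weighs co-occurrences by squared gray-level difference: high for edges.
    return sum(((i - j) ** 2) * v
               for i, row in enumerate(m) for j, v in enumerate(row))

img = [[0, 0, 1],
       [0, 1, 1]]
print(contrast(glcm(img, levels=2)))  # 2
```

Other GLCM statistics (energy, homogeneity, correlation) follow the same pattern: a weighted sum over the co-occurrence matrix with a different weight per cell.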
Image Retrieval and Re-ranking Techniques - A Survey (sipij)
There is a huge amount of research work focusing on the searching, retrieval, and re-ranking of images in image databases. The diverse and scattered work in this domain needs to be collected and organized for easy and quick reference. In this context, this paper gives a brief overview of various image retrieval and re-ranking techniques. Starting with an introduction to existing systems, the paper proceeds through the core architecture of an image harvesting and retrieval system to the different re-ranking techniques. These techniques are discussed in terms of approaches, methodologies, and findings, and are listed in tabular form for quick review.
Mining of Images Using Retrieval Techniques (eSAT Journals)
Abstract: Today's world is digital, and the use of social websites has become a part of our day-to-day life. These websites have acquired great popularity because of their user-friendly features and their content. Internet and mobile networks are now used everywhere, and so are images. Because of the evolution of hardware as well as software, it is possible to store large amounts of multimedia data; in short, it has become easy to store huge numbers of images using image processing techniques. As the number of images and databases increases day by day, there is a need for new image retrieval techniques. Image mining, derived from data mining, is a method to extract information from digital images. The purpose of this paper is to review the various image mining techniques used in applications such as image retrieval, matching, and pattern recognition, as given by different researchers. Information or knowledge can be extracted from an image by using image mining techniques. This paper presents a survey of image retrieval techniques. Image retrieval and data mining have wide applications in image processing, pattern recognition, image matching, image mining, feature extraction, computer vision, etc. Keywords: Image, Image Mining, Feature Extraction, Image Retrieval, CBIR.
During the past few years, people have been substantially attracted to Content-Based Image Retrieval (CBIR) because of its varied multimedia applications; CBIR is one of the most popular research areas in digital image processing. The goal of CBIR is to extract visual features of an image such as color, texture, or shape. An attempt is made to develop a Sketch Based Image Retrieval (SBIR) system that makes use of features such as the shape or form of an object [1]. This paper introduces the creation and design of an SBIR system using the extraction techniques of Biased Maximum Margin Analysis (BMMA) and Semi-Supervised Biased Maximum Margin Analysis (Semi-BMMA) [2]. Building on existing methods, we design a task-specific descriptor that can handle the information gap between a sketch and a colored image. The result of SBIR includes a set of positive (relevant) and negative (irrelevant) images, and an iterative process acts on the positive set for optimal extraction of the object. By adding a Laplacian regularizer to BMMA, Semi-BMMA integrates the information of unlabelled samples [2], resulting in a better, refined set of retrieved objects. SBIR has several applications, such as digital libraries, crime prevention, and photo sharing sites; an important one is matching a forensic image to a gallery of mug shot images.
APPLICATIONS OF SPATIAL FEATURES IN CBIR: A SURVEY (cscpconf)
With advances in computer technology and the World Wide Web, there has been an explosion in the amount and complexity of multimedia data that are generated, stored, transmitted, analyzed, and accessed. In order to extract useful information from this huge amount of data, many content based image retrieval (CBIR) systems have been developed in the last decade. A typical CBIR system captures image features that represent image properties such as color, texture, or shape of objects in the query image and tries to retrieve images from the database with similar features. Retrieval efficiency and accuracy are the important issues in designing a content based image retrieval system. Shape and spatial features are quite easy and simple to derive, and effective. Researchers are moving towards finding spatial features and exploring the scope of implementing these features in the image retrieval framework to reduce the semantic gap. This survey paper focuses on a detailed review of the different methods, and their evaluation techniques, used in recent works based on spatial features in CBIR systems. Finally, several recommendations for future research directions are suggested based on recent technologies.
A Survey On: Content Based Image Retrieval Systems Using Clustering Technique... (IJMIT JOURNAL)
Content-based image retrieval (CBIR) is a relatively new but widely adopted method for finding images in vast, unannotated image databases. As networks and multimedia technologies become more popular, users are not satisfied with traditional information retrieval techniques, so CBIR is becoming a source of exact and fast retrieval. In recent years, a variety of techniques have been developed to improve the performance of CBIR. Data clustering is an unsupervised method for extracting hidden patterns from huge data sets; with large data sets there is also the possibility of high dimensionality, and achieving both accuracy and efficiency for high-dimensional data sets with an enormous number of samples is challenging. In this paper, clustering techniques are discussed and analyzed. We also propose a method, HDK, that uses more than one clustering technique to improve the performance of CBIR: it combines hierarchical and divide-and-conquer K-Means clustering with equivalency and compatibility relation concepts to improve the performance of K-Means on high-dimensional datasets. It also uses features such as color, texture, and shape for an accurate and effective retrieval system.
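The K-Means building block behind the hierarchical and divide-and-conquer scheme above can be sketched minimally. The toy 2-D features and fixed initial centers are assumptions; real CBIR descriptors are high-dimensional, which is exactly the paper's concern:

```python
# Minimal K-Means sketch for clustering CBIR feature vectors: alternate an
# assignment step (nearest center) with an update step (cluster mean).

def kmeans(points, centers, iters=10):
    for _ in range(iters):
        # Assignment step: each point joins its nearest center (squared distance).
        groups = [[] for _ in centers]
        for p in points:
            d = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centers]
            groups[d.index(min(d))].append(p)
        # Update step: move each center to the mean of its cluster.
        centers = [
            tuple(sum(x) / len(g) for x in zip(*g)) if g else c
            for g, c in zip(groups, centers)
        ]
    return centers

pts = [(0, 0), (0, 1), (10, 10), (10, 11)]
print(sorted(kmeans(pts, centers=[(0, 0), (10, 10)])))
# [(0.0, 0.5), (10.0, 10.5)]
```

A divide-and-conquer variant would split the dataset, run this routine on each chunk, and then cluster the resulting centers, which is what makes the approach tractable for enormous sample counts.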
Liang Content Based Image Retrieval Using A Combination Of Visual Features An... (Kalle)
Image retrieval technology has been developed for more than twenty years. However, the current image retrieval techniques cannot achieve a satisfactory recall and precision. To improve the effectiveness and efficiency of an image retrieval system, a novel content-based image retrieval method with a combination of image segmentation and eye tracking data is proposed in this paper. In the method, eye tracking data is collected by a non-intrusive table mounted eye tracker at a sampling rate of 120 Hz, and the corresponding fixation data is used to locate the human’s Regions of Interest (hROIs) on the segmentation result from the JSEG algorithm. The hROIs are treated as important informative segments/objects and used in the image matching. In addition, the relative gaze duration of each hROI is used to weigh the similarity measure for image retrieval. The similarity measure proposed in this paper is based on a retrieval strategy emphasizing the most important regions. Experiments on 7346 Hemera color images annotated manually show that the retrieval results from our proposed approach compare favorably with conventional content-based image retrieval methods, especially when the important regions are difficult to be located based on visual features.
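The gaze-duration weighting described above can be sketched as a weighted sum over matched regions. The region pairs, durations, and the trivial per-region similarity function are illustrative assumptions, not the paper's actual hROI matcher:

```python
# Hedged sketch: weigh each region's similarity by the relative fixation
# duration it attracted, so heavily-viewed regions dominate the match score.

def gaze_weighted_similarity(region_pairs, durations, sim):
    """region_pairs: list of (query_region, db_region); durations: ms per region."""
    total = sum(durations)
    return sum((d / total) * sim(q, r)
               for (q, r), d in zip(region_pairs, durations))

sim = lambda a, b: 1.0 if a == b else 0.0  # trivial stand-in per-region similarity
pairs = [("sky", "sky"), ("camel", "horse")]
print(gaze_weighted_similarity(pairs, durations=[300, 900], sim=sim))  # 0.25
```

With uniform weights the same pairs would score 0.5; the long fixation on the mismatched region pulls the score down, which is the intended behavior.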
Blignaut Visual Span And Other Parameters For The Generation Of Heatmaps (Kalle)
Although heat maps are commonly provided by eye-tracking and visualization tools, they have some disadvantages and caution must be taken when using them to draw conclusions on eye tracking results. It is motivated here that visual span is an essential component of visualizations of eye-tracking data and an algorithm is proposed to allow the analyst to set the visual span as a parameter prior to generation of a heat map.
Although the ideas are not novel, the algorithm also indicates how transparency of the heat map can be achieved and how the color gradient can be generated to represent the probability for an object to be observed within the defined visual span. The optional addition of contour lines provides a way to visualize separate intervals in the continuous color map.
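The idea of parameterizing the heat map by visual span can be sketched as follows: each fixation contributes a Gaussian whose sigma is the visual span, so the map reflects the probability of an object being observed rather than just hit. Grid size, fixation data, and the duration weighting are illustrative assumptions:

```python
import math

# Hedged sketch: accumulate a duration-weighted Gaussian per fixation,
# with sigma set by the analyst-chosen visual span (in pixels here).

def heatmap(fixations, width, height, span):
    grid = [[0.0] * width for _ in range(height)]
    for fx, fy, duration in fixations:
        for y in range(height):
            for x in range(width):
                d2 = (x - fx) ** 2 + (y - fy) ** 2
                grid[y][x] += duration * math.exp(-d2 / (2 * span ** 2))
    return grid

g = heatmap([(1, 1, 100)], width=3, height=3, span=1.0)
print(g[1][1] > g[0][0])  # True: intensity peaks at the fixation point
```

Mapping the accumulated values through a color gradient, with an alpha channel for transparency, then gives the overlay the paper discusses; a larger span spreads each fixation's contribution over a wider area.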
Zhang Eye Movement As An Interaction Mechanism For Relevance Feedback In A Co... (Kalle)
Relevance feedback (RF) mechanisms are widely adopted in Content-Based Image Retrieval (CBIR) systems to improve image retrieval performance. However, there exist some intrinsic problems: (1) the semantic gap between high-level concepts and low-level features and (2) the subjectivity of human perception of visual contents. The primary focus of this paper is to evaluate the possibility of inferring the relevance of images based on eye movement data. In total, 882 images from 101 categories are viewed by 10 subjects to test the usefulness of implicit RF, where the relevance of each image is known beforehand. A set of measures based on fixations are thoroughly evaluated, including fixation duration, fixation count, and the number of revisits. Finally, the paper proposes a decision tree to predict the user’s input during the image searching tasks. The prediction precision of the decision tree is over 87%, which sheds light on a promising integration of natural eye movement into CBIR systems in the future.
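Extracting the fixation measures the paper evaluates can be sketched from a raw fixation log. The log format, thresholds, and the single-rule classifier below are illustrative stand-ins for the paper's learned decision tree:

```python
# Hedged sketch: derive per-image fixation duration, count, and revisits
# from a fixation log, then apply a toy relevance rule.

def fixation_features(fixations):
    """fixations: list of (image_id, duration_ms) in viewing order."""
    feats = {}
    for img, dur in fixations:
        f = feats.setdefault(img, {"duration": 0, "count": 0, "revisits": 0})
        f["duration"] += dur
        f["count"] += 1
    # A revisit = returning to an image after having looked elsewhere.
    seen, last = set(), None
    for img, _ in fixations:
        if img != last:
            if img in seen:
                feats[img]["revisits"] += 1
            seen.add(img)
            last = img
    return feats

def looks_relevant(f):
    return f["duration"] > 500 or f["revisits"] >= 1  # toy thresholds

log = [("a", 300), ("b", 200), ("a", 400)]
feats = fixation_features(log)
print(looks_relevant(feats["a"]), looks_relevant(feats["b"]))  # True False
```

In the paper these features feed a trained decision tree rather than hand-set thresholds, but the feature extraction step is the same shape.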
Yamamoto Development Of Eye Tracking Pen Display Based On Stereo Bright Pupil... (Kalle)
The intuitive user interfaces of PCs and PDAs, such as pen display and touch panel, have become widely used in recent times. In this study, we have developed an eye-tracking pen display based on the stereo bright pupil technique. First, the bright pupil camera was developed by examining the arrangement of cameras and LEDs for pen display. Next, the gaze estimation method was proposed for the stereo bright pupil camera, which enables one-point calibration. Then, the prototype of the eye-tracking pen display was developed. The accuracy of the system was approximately 0.7° on average, which is sufficient for human interaction support. We also developed an eye-tracking tabletop as an application of the proposed stereo bright pupil technique.
Wastlund What You See Is Where You Go Testing A Gaze Driven Power Wheelchair... (Kalle)
Individuals with severe multiple disabilities have little or no opportunity to express their own wishes, make choices and move independently. Because of this, the objective of this work has been to develop a prototype for a gaze-driven device to manoeuvre powered wheelchairs or other moving platforms. The prototype has the same capabilities as a normal powered wheelchair, with two exceptions. Firstly, the prototype is controlled by eye movements instead of by a normal joystick. Secondly, the prototype is equipped with a sensor that stops all motion when the machine approaches an obstacle. The prototype has been evaluated in a preliminary clinical test with two users. Both users clearly communicated that they appreciated and had mastered the ability to control a powered wheelchair with their eye movements.
Vinnikov Contingency Evaluation Of Gaze Contingent Displays For Real Time Vis... (Kalle)
The visual field is the area of space that can be seen when an observer fixates a given point. Many visual capabilities vary with position in the visual field and many diseases result in changes in the visual field. With current technology, it is possible to build very complex real-time visual field simulations that employ gaze-contingent displays. Nevertheless, there are still no established techniques to evaluate such systems. We have developed a method to evaluate a system’s contingency by employing visual blind spot localization as well as foveal fixation. During the experiment, gaze-contingent and static conditions were compared. There was a strong correlation between predicted results and gaze-contingent trials. This evaluation method can also be used with patient populations and for the evaluation of gaze-contingent display systems, when there is need to evaluate a visual field outside of the foveal region.
Urbina Pies With EYEs: The Limits Of Hierarchical Pie Menus In Gaze Control (Kalle)
Pie menus offer several features which are advantageous especially for gaze control. Although the optimal number of slices per pie and of depth layers has already been established for manual control, these values may differ in gaze control due to differences in spatial accuracy and cognitive processing. Therefore, we investigated the layout limits for hierarchical pie menus in gaze control. Our user study indicates that providing six slices in multiple depth layers guarantees fast and accurate selections. Moreover, we compared two different methods of selecting a slice. Novices performed well with both, but selecting via selection borders produced better performance for experts than the standard dwell-time selection.
Urbina Alternatives To Single Character Entry And Dwell Time Selection On Eye... (Kalle)
Eye typing could provide motor-disabled people a reliable method of communication, given that the text entry speed of current interfaces can be increased to allow for fluent communication. There are two reasons for the relatively slow text entry: dwell-time selection requires waiting a certain time, and single-character entry limits the maximum entry speed. We adopted a typing interface based on hierarchical pie menus, pEYEwrite [Urbina and Huckauf 2007], and included bigram text entry with one single pie iteration, introducing three different bigram-building strategies. Moreover, we combined dwell-time selection with selection by borders, providing an alternative selection method and extra functionality. In a longitudinal study, we compared participants' performance during character-by-character text entry with bigram entry and with text entry with bigrams derived by word prediction. Data showed large advantages of the new entry methods over single-character text entry in speed and accuracy. Participants preferred selecting by borders, which allowed them faster selections than the dwell-time method.
Tien Measuring Situation Awareness Of Surgeons In Laparoscopic Training (Kalle)
The study of surgeons’ eye movements is an innovative way of assessing skill and situation awareness, in that a comparison of eye movement strategies between expert surgeons and novices may show differences that can be used in training. Our preliminary study compared eye movements of 4 experts and 4 novices performing a simulated gall bladder removal task on a dummy patient with an audible heartbeat and simulated vital signs displayed on a secondary monitor. We used a head-mounted Locarna PT-Mini eye tracker to record fixation locations during the operation. The results showed that novices concentrated so hard on the surgical display that they were hardly able to look at the patient’s vital signs, even when the heart rate audibly changed during the procedure. In comparison, experts glanced occasionally at the vitals monitor, thus being able to observe the patient’s condition.
Takemura Estimating 3D Point Of Regard And Visualizing Gaze Trajectories Und... (Kalle)
The portability of an eye tracking system encourages us to develop a technique for estimating the 3D point-of-regard. Unlike conventional methods, which estimate the position in the 2D image coordinates of the mounted camera, such a technique can represent richer gaze information for a human moving in a larger area. In this paper, we propose a method for estimating the 3D point-of-regard and a technique for visualizing gaze trajectories under natural head movements for a head-mounted device. We employ a visual SLAM technique to estimate head configuration and extract environmental information. Even in cases where the head moves dynamically, the proposed method can obtain the 3D point-of-regard. Additionally, gaze trajectories are appropriately overlaid on the scene camera image.
Stevenson Eye Tracking With The Adaptive Optics Scanning Laser Ophthalmoscope (Kalle)
Recent advances in high magnification retinal imaging have allowed for visualization of individual retinal photoreceptors, but these systems also suffer from distortions due to fixational eye motion. Algorithms developed to remove these distortions have the added benefit of providing arc-second-level resolution of the eye movements that produce them. The system also allows for visualization of targets on the retina, allowing for absolute retinal position measures to the level of individual cones. This paper will describe the process used to remove the eye movement artifacts and present analysis of their spectral characteristics. We find a roughly 1/f amplitude spectrum similar to that reported by Findlay (1971), with no evidence for a distinct tremor component.
Stellmach Advanced Gaze Visualizations For Three Dimensional Virtual Environm... (Kalle)
Gaze visualizations represent an effective way of gaining fast insights into eye tracking data. Current approaches do not adequately support eye tracking studies for three-dimensional (3D) virtual environments. Hence, we propose a set of advanced gaze visualization techniques for supporting gaze behavior analysis in such environments. Similar to commonly used gaze visualizations for two-dimensional stimuli (e.g., images and websites), we contribute advanced 3D scan paths and 3D attentional maps. In addition, we introduce a models-of-interest timeline depicting viewed models, which can be used for displaying scan paths in a selected time segment. A prototype toolkit is also discussed which combines an implementation of our proposed techniques. Their potential for facilitating eye tracking studies in virtual environments was supported by a user study among eye tracking and visualization experts.
Skovsgaard Small Target Selection With Gaze Alone (Kalle)
Accessing the smallest targets in mainstream interfaces using gaze alone is difficult, but interface tools that effectively increase the size of selectable objects can help. In this paper, we propose a conceptual framework to organize existing tools and guide the development of new tools. We designed a discrete zoom tool and conducted a proof-of-concept experiment to test the potential of the framework and the tool. Our tool was as fast as and more accurate than the currently available two-step magnification tool. Our framework shows potential to guide the design, development, and testing of zoom tools to facilitate the accessibility of mainstream interfaces for gaze users.
San Agustin Evaluation Of A Low Cost Open Source Gaze Tracker (Kalle)
This paper presents a low-cost gaze tracking system that is based on a webcam mounted close to the user’s eye. The performance of the gaze tracker was evaluated in an eye-typing task using two different typing applications. Participants could type between 3.56 and 6.78 words per minute, depending on the typing system used. A pilot study to assess the usability of the system was also carried out in the home of a user with severe motor impairments. The user successfully typed on a wall-projected interface using his eye movements.
Ryan Match Moving For Area Based Analysis Of Eye Movements In Natural Tasks (Kalle)
Analysis of recordings made by a wearable eye tracker is complicated by video stream synchronization, pupil coordinate mapping, eye movement analysis, and tracking of dynamic Areas Of Interest (AOIs) within the scene. In this paper, a semi-automatic system is developed to help automate these processes. Synchronization is accomplished via side-by-side video playback control. A deformable eye template and calibration dot marker allow reliable initialization via simple drag and drop, as well as a user-friendly way to correct the algorithm when it fails. Specifically, drift may be corrected by nudging the detected pupil center to the appropriate coordinates. In a case study, the impact of surrogate nature views on physiological health and perceived well-being is examined via analysis of gaze over images of nature. A match-moving methodology was developed to track AOIs for this particular application but is applicable toward similar future studies.
Rosengrant Gaze Scribing In Physics Problem Solving (Kalle)
Eye-tracking has been widely used for research purposes in fields such as linguistics and marketing. However, there are many possibilities of how eye-trackers could be used in other disciplines like physics. A part of physics education research deals with the differences between novices and experts, specifically how each group solves problems. Though there has been a great deal of research about these differences, there has been no research that focuses on noticing exactly where experts and novices look while solving the problems. Thus, to complement the past research, I have created a new technique called gaze scribing. Subjects wear a head-mounted eye-tracker while solving electrical circuit problems on a graphics monitor. I monitor the scan patterns of the subjects and combine those with videotapes of their work while solving the problems. This new technique has yielded new information and elaborated on previous studies.
Qvarfordt Understanding The Benefits Of Gaze Enhanced Visual Search (Kalle)
In certain applications such as radiology and imagery analysis, it is important to minimize errors. In this paper we evaluate a structured inspection method that uses eye tracking information as a feedback mechanism to the image inspector. Our two-phase method starts with a free viewing phase during which gaze data is collected. During the next phase, we either segment the image, mask previously seen areas of the image, or combine the two techniques, and repeat the search. We compare the different methods proposed for the second search phase by evaluating the inspection method using true positive and false negative rates, and subjective workload. Results show that gaze-blocked configurations reduced the subjective workload, and that gaze-blocking without segmentation showed the largest increase in true positive identifications and the largest decrease in false negative identifications of previously unseen objects.
Prats Interpretation Of Geometric Shapes An Eye Movement Study
This paper describes a study that seeks to explore the correlation between eye movements and the interpretation of geometric shapes. This study is intended to inform the development of an eye tracking interface for computational tools to support and enhance the natural interaction required in creative design. A common criticism of computational design tools is that they do not enable manipulation of designed shapes according to all perceived features. Instead the manipulations afforded are limited by formal structures of shapes. This research examines the potential for eye movement data to be used to recognise and make available for manipulation the perceived features in shapes. The objective of this study was to analyse eye movement data with the intention of recognising moments in which an interpretation of shape is made. Results suggest that fixation duration and saccade amplitude prove to be consistent indicators of shape interpretation.
Porta Ce Cursor A Contextual Eye Cursor For General Pointing In Windows Envir...
Eye gaze interaction for disabled people is often dealt with by designing ad-hoc interfaces, in which the big size of their elements compensates for both the inaccuracy of eye trackers and the instability of the human eye. Unless solutions for reliable eye cursor control are employed, gaze pointing in ordinary graphical operating environments is a very difficult task. In this paper we present an eye-driven cursor for MS Windows which behaves differently according to the “context”. When the user’s gaze is perceived within the desktop or a folder, the cursor can be discretely shifted from one icon to another. Within an application window or where there are no icons, on the contrary, the cursor can be continuously and precisely moved. Shifts in the four directions (up, down, left, right) occur through dedicated buttons. To increase user awareness of the currently pointed spot on the screen while continuously moving the cursor, a replica of the spot is provided within the active direction button, resulting in improved pointing performance.
Pontillo Semanti Code Using Content Similarity And Database Driven Matching T...
Laboratory eyetrackers, constrained to a fixed display and static (or accurately tracked) observer, facilitate automated analysis of fixation data. Development of wearable eyetrackers has extended environments and tasks that can be studied at the expense of automated analysis. Wearable eyetrackers provide 2D point-of-regard (POR) in scene-camera coordinates, but the researcher is typically interested in some high-level semantic property (e.g., object identity, region, or material) surrounding individual fixation points. The synthesis of POR into fixations and semantic information remains a labor-intensive manual task, limiting the application of wearable eyetracking.
We describe a system that segments POR videos into fixations and allows users to train a database-driven, object-recognition system. A correctly trained library results in a very accurate and semi-automated translation of raw POR data into a sequence of objects, regions or materials.
Park Quantification Of Aesthetic Viewing Using Eye Tracking Technology The In...
The purpose of this study is to explore how the viewers’ previous training is related to their aesthetic viewing in various interactions with the form and the context, in relation to apparel design. Berlyne’s two types of exploratory behavior, diversive and specific, provided a theoretical framework to this study. Twenty female subjects (mean age=21, SD=1.089) participated. Twenty model images, posed by a male and a female model, were shown on an eye-tracker screen for 10 seconds each. The findings of this study verified Berlyne’s concepts of visual exploration. One of the different findings from Berlyne’s theory was that the untrained viewers’ visual attention tended to be more significantly focused on peripheral areas of visual interest, compared to the trained viewers, while there was no significant difference on the central, foremost areas of visual interest between the two groups. The overall aesthetic viewing patterns were also identified.
Figure 1: Implicit Relevance Feedback for the new Ranking Method in web-based CBIR.
prove the ranking provided by the search in a CBIR environment. The aim is to capture the user's gaze fixations in order to identify the characteristics of the images s/he declares to be of her/his interest. This will allow the tool to automatically retrieve further relevant images. The tool may also be able to discover, in an unsupervised way, the characteristics of the images of potential user interest. Indeed, it is able to derive the characteristics of the images of user interest by considering the images which mainly captured the user's attention, e.g., by taking into account the user's visual activity over the analyzed images. In the former case the tool learns how to select further relevant images, whereas in the latter case it could also be able to reclassify the images already examined by the user, suggesting to her/him to reconsider more deeply some potentially relevant images. Although the proposed system has been tested only on Google Images to improve the precision of the retrieval, it may be applied to improve the precision of the retrieval of any document on the basis of the images featuring the documents.

Figure 2 shows the general architecture of the proposed implicit relevance feedback, where we point out the system's ability to rearrange the images initially retrieved from a web-based CBIR (e.g., Google Images) without any user supervision, i.e., only on the basis of the user's gaze fixations. A fine tuning of the characteristics to be possessed by the images may be carried out by the system, on the basis of the user's agreement, for a better rearrangement of the images or for extracting relevant images from other datasets. In detail, the re-ranking mechanism is composed of the following steps:

• First Image Retrieval. The user enters some keywords in the used CBIR and observes the results. During this phase, the eye tracker stores the user's eye movements and the gaze fixations on the thumbnails of the retrieved images that most captured her/his attention;

• Features Extraction. One of the crucial points in CBIR is the choice of the low-level features used to compare the image under test with the queried image. The combination of features determines the effectiveness of the retrieval. The extracted features can be related to the entire image (global features) or to a portion of it (local features). Local features extraction is more complex, because it requires a first step for the detection of the important regions of the image, e.g., by clustering algorithms or object recognition, but it permits a considerable reduction of the computational complexity of the search algorithms. In our case the detection is simplified by the eye tracker, which allows us to identify the regions of major interest. The local features considered for describing image content are the Contrast C, Correlation Cr, Energy E, Homogeneity H, the Gabor filter maps G-Maps (24 maps: 6 scales and 4 orientations), and two novel features that describe the:

– Brightness, computed as rbright = (µ3 · 255²) / (µ + 10);

– Smoothness, computed as rsmooth = (1/(µ2 + 1) + E + µ4 + H).

The above features are based on the moments of the histogram H of the gray levels. The nth moment of the histogram of gray levels is given by

µn = Σ_{i=0}^{L−1} (xi − µ)^n · p(xi)

where p(xi) is the probability of finding a pixel of the image with gray level xi (given by the histogram H), L is the number of gray levels, and µ the average value. Therefore, in the proposed system, the images returned by the CBIR and the file containing the data taken by the eye tracker are processed in order to identify the most relevant images and their features. Each image is then represented by a feature vector F = [C, Cr, E, H, G-Maps, rbright, rsmooth].

• Re-Ranking. The values of the extracted features that should be possessed by the images to best fit the user's interest are then processed to produce a new ranking of the images initially retrieved. In detail, we compute a similarity score (which represents a sort of implicit relevance feedback) between the most relevant images, detected at the previous step, and the images retrieved at the first step (see fig. 3). The metric to evaluate the similarity is based on the concept of distance, measured between the feature vector Frel (normalized between 0 and 1) of the most salient images (extracted at the previous step) and the feature vector Fret (normalized between 0 and 1) of the images initially retrieved (at step 1).
Figure 2: System Architecture.
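The histogram moments and the two novel brightness/smoothness features described above can be sketched in pure Python. The closed forms of rbright and rsmooth follow our reading of the text and should be treated as assumptions, as should all helper names; the GLCM-based Energy E and Homogeneity H are taken here as precomputed inputs.

```python
# Sketch of the histogram-based features. The rbright and rsmooth formulas
# are assumptions reconstructed from the text, not verified against the
# original paper.

def histogram(gray_pixels, levels=256):
    """Normalized gray-level histogram p(x_i) of a flat list of pixels."""
    counts = [0] * levels
    for g in gray_pixels:
        counts[g] += 1
    total = float(len(gray_pixels))
    return [c / total for c in counts]

def mean_level(p):
    """Average gray level mu of the histogram."""
    return sum(i * pi for i, pi in enumerate(p))

def moment(p, n):
    """n-th central moment: mu_n = sum_{i=0}^{L-1} (x_i - mu)^n * p(x_i)."""
    mu = mean_level(p)
    return sum(((i - mu) ** n) * pi for i, pi in enumerate(p))

def r_bright(p):
    """Brightness: mu_3 * 255^2 / (mu + 10) (assumed form)."""
    return moment(p, 3) * 255 ** 2 / (mean_level(p) + 10)

def r_smooth(p, energy, homogeneity):
    """Smoothness: 1/(mu_2 + 1) + E + mu_4 + H (assumed form); E and H are
    the GLCM Energy and Homogeneity, computed elsewhere."""
    return 1.0 / (moment(p, 2) + 1.0) + energy + moment(p, 4) + homogeneity
```

For a bimodal patch of pure black and white pixels the third moment vanishes by symmetry, so rbright is zero there; skewed histograms give nonzero values.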
The images are re-ranked by using this similarity score, computed as:

f(IRel, IRet) = Σ_{i=1}^{N} wi · Ωi(f^i_rel, f^i_ret),  with w1 + w2 + ... + wN = 1   (1)

where IRel and IRet are respectively the relevant image detected at the previous step and the image initially retrieved, and f^i_rel, f^i_ret are the ith of the N features of the vector F of the images IRel and IRet. Ωi is the fitness function related to the features f^i_rel, f^i_ret and is computed as:

Ωi = e^(−(1/2) · (f^i_ret − f^i_rel)²)   (2)

Finally, the retrieved images are ordered, hence re-ranked, according to the decreasing values of the similarity score f.
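The re-ranking step of Eqs. (1)–(2) can be sketched as follows. The min–max normalization to [0, 1] and the uniform choice of the weights wi are illustrative assumptions (the text only requires the weights to sum to 1), and the Gaussian-shaped fitness follows the reconstruction of Eq. (2) given here.

```python
import math

def normalize(vectors):
    """Min-max normalize each feature column to [0, 1] across all vectors."""
    cols = list(zip(*vectors))
    lo = [min(c) for c in cols]
    hi = [max(c) for c in cols]
    return [[(v - l) / (h - l) if h > l else 0.0
             for v, l, h in zip(vec, lo, hi)]
            for vec in vectors]

def fitness(f_ret_i, f_rel_i):
    """Omega_i = exp(-(1/2) * (f_ret^i - f_rel^i)^2), as in Eq. (2)."""
    return math.exp(-0.5 * (f_ret_i - f_rel_i) ** 2)

def similarity(F_rel, F_ret, weights=None):
    """f(I_Rel, I_Ret) = sum_i w_i * Omega_i(...), Eq. (1), with sum(w) = 1."""
    if weights is None:
        weights = [1.0 / len(F_rel)] * len(F_rel)  # uniform weights (assumption)
    return sum(w * fitness(ret_i, rel_i)
               for w, rel_i, ret_i in zip(weights, F_rel, F_ret))

def re_rank(F_rel, retrieved):
    """Order (name, F_ret) pairs by decreasing similarity to F_rel."""
    return sorted(retrieved, key=lambda it: similarity(F_rel, it[1]),
                  reverse=True)
```

An image whose normalized feature vector coincides with the salient-image vector Frel gets the maximum score of 1, and scores decrease monotonically with the per-feature distance.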
The relevance feedback detected by the eye tracker could be improved by taking into account the ranking carried out by other methods, e.g., by the ones which model the user behavior during the phase of image analysis from how the user operates the mouse and keyboard.

Figure 3: The eye tracker with the implicit relevance feedback produces an image input for the CBIR system.
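Selecting the "most relevant images" from the exported gaze data can be sketched as below. The fixation record layout (x, y, duration) and the thumbnail bounding boxes are illustrative assumptions; the actual Tobii Studio export format is richer than this.

```python
# Hypothetical sketch: accumulate fixation time per retrieved thumbnail and
# pick the most fixated ones as the "most relevant images". Field names and
# the screen layout are assumptions, not the Tobii Studio export format.

def dwell_times(fixations, thumbnails):
    """Sum fixation durations falling inside each thumbnail's bounding box.

    fixations: iterable of (x, y, duration) tuples in screen coordinates;
    thumbnails: dict mapping name -> (x0, y0, x1, y1) bounding box.
    """
    dwell = {name: 0.0 for name in thumbnails}
    for x, y, duration in fixations:
        for name, (x0, y0, x1, y1) in thumbnails.items():
            if x0 <= x < x1 and y0 <= y < y1:
                dwell[name] += duration
                break
    return dwell

def most_relevant(fixations, thumbnails, k=1):
    """Names of the k thumbnails with the longest total fixation time."""
    dwell = dwell_times(fixations, thumbnails)
    return sorted(dwell, key=dwell.get, reverse=True)[:k]
```

The feature vectors of the images returned by `most_relevant` would then serve as Frel in the re-ranking step.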
3 User Interface and Experimental Results

The system has been implemented by integrating the functionality of Tobii Studio with Matlab 7.5, which is responsible for processing the output provided by the eye tracker. Tobii Studio makes it possible to record a web browsing session by setting appropriate parameters, such as the URL and the initial size of the browser window. By default the web browsing is set to http://images.google.com/ as homepage, whereas the window size and resolution are set to the entire screen and the maximum resolution allowed by the monitor. After a proper training phase of the instrument, the user is authorized to start regular recording sessions, which terminate by pressing the F10 key on the keyboard. At the end of the session the user should confirm the export, in textual form, of the two files related to fixations and events needed for the computation of the relevance feedback. Thus, the information representing the gaze fixations and the one related to the images, which are merged in the same picture, are actually separated into two files.

To evaluate the effectiveness of the proposed system in increasing the precision of the information retrieval carried out by Google Images, we show below how the system significantly rearranges the collection of images proposed by Google in response to the word “pyramid”, and we evaluate the performance increase as perceived by a set of 30 users. Indeed, such a collection is proposed without any knowledge of the user's interest, by merging images of “pyramid” where the subject is either a monument or a geometric solid (see fig. 4).

Figure 4: Google Ranking for the “Pyramid” Keyword.

With the eye tracker we may gain insight into the user's interests by discovering, for example, that s/he is more interested in the pyramids as monuments, since the most fixated images are related to the Egyptian pyramids (as shown by the heatmap in fig. 5). With this information at hand it is relatively easy for the system to discover, after the recording session, the images relevant for the user, following the processing procedure pointed out in the previous section.

Figure 5: Gaze Fixations on the Images retrieved by Google using the “Pyramid” Keyword.

Fig. 6 shows the collection of the images as re-proposed by our system. The new ranking correctly suggests a sequence that favors the pyramids more similar to those observed, and hence actually requested, by the user. The user's will was caught with an implicit relevance feedback, by taking into account that s/he was particularly attracted by a picture with the Sphinx in the foreground and the pyramid in the background. The proposed system was thus able to discover meaningful information from how the perception process was carried out by the user. Indeed, in the new re-proposed ranking, the top two places are occupied by images with the pyramid and the Sphinx.

Figure 6: New Pyramid Images Ranking according to the Eye Tracker feedback given by the user.

Finally, we tested the performance of the proposed system on a set of 30 users. In detail, after the re-ranking each user was requested to say whether the first five retrieved images were more or less relevant to the inserted word with respect to the ones obtained by Google Images. The results are reported in table 1, where we can see that 86.6% of the users were more satisfied after the re-ranking, 6.7% of the users were indifferent, and 6.7% were less satisfied.

        Less Satisfied   Indifferent   More Satisfied
Users          2              2              26
%            6.7            6.7            86.6

Table 1: Qualitative Performance Evaluation on a set of 30 Users

4 Conclusions and Future Work

The proposed model shows that the use of an eye tracker to detect an implicit feedback may greatly improve the performance of a search in a CBIR system. Future developments will concern the possibility of considering not only the first image but also the next ones in order of importance, to obtain a more refined ranking. Moreover, we are currently working on the possibility of using visual attention for image indexing, thus taking into account the real contents of images. A comparison with other web-based CBIRs and tests on a wider set of users, in order to provide quantitative results, will be carried out in future works.