Dynamic Two-Stage Image Retrieval from Large Multimodal Databases

873 views
815 views

Published on

Content-based image retrieval (CBIR) with global features is notoriously noisy, especially for image queries with low percentages of relevant images in a collection. Moreover, CBIR typically ranks the whole collection, which is inefficient for large databases. We experiment with a method for image retrieval from multimodal databases, which improves both the effectiveness and efficiency of traditional CBIR by exploring secondary modalities. We perform retrieval in a two-stage fashion: first rank by a secondary modality, and then perform CBIR only on the top-K items. Thus, effectiveness is improved by performing CBIR on a ‘better’ subset. Using a relatively ‘cheap’ first stage, efficiency is also improved via the fewer CBIR operations performed. Our main novelty is that K is dynamic, i.e. estimated per query to optimize a predefined effectiveness measure. We show that such dynamic two-stage setups can be significantly more effective and robust than similar setups with static thresholds previously proposed.

Published in: Technology, Business
0 Comments
1 Like
Statistics
Notes
  • Be the first to comment

No Downloads
Views
Total views
873
On SlideShare
0
From Embeds
0
Number of Embeds
364
Actions
Shares
0
Downloads
23
Comments
0
Likes
1
Embeds 0
No embeds

No notes for slide

Dynamic Two-Stage Image Retrieval from Large Multimodal Databases

  1. 1. Dynamic Two-Stage Image Retrieval from Large Multimodal Databases Avi Arampatzis Konstantinos Zagoris Savvas Chatzichristofis Department of Electrical and Computer Engineering Democritus University of Thrace Xanthi 67100, Greece ECIR 2011
  2. 2. Dynamic Two-Stage Image Retrieval from Large Multimodal Databases Avi Arampatzis, Konstantinos Zagoris, Savvas Chatzichristofis 2 Outline1. Content-based Image Retrieval (CBIR): global vs local features, scalability2. A problem of global features: The Red Tomato / Red Pie-Chart Problem3. Multimodal Information Collections4. Two-stage Image Retrieval from Multimedia Databases related work our main contributions5. Experiments on the ImageCLEF 2010 Wikipedia Test Collection
  3. 3. Dynamic Two-Stage Image Retrieval from Large Multimodal Databases Avi Arampatzis, Konstantinos Zagoris, Savvas Chatzichristofis 3 Content-based Image Retrieval (CBIR) In CBIR, images are represented by either local features: ¦ computed at multiple image points; capable of recognizing objects ¦ computationally expensive, e.g. due to high dimensionality global features: ¦ generalize images with single vectors capturing color, texture, shape, etc. ¦ computationally cheap (relatively) ¥ thus, more popular CBIR with either global or local features does not scale up well to large databases efficiency-wise
  4. 4. Dynamic Two-Stage Image Retrieval from Large Multimodal Databases Avi Arampatzis, Konstantinos Zagoris, Savvas Chatzichristofis 4 CBIR with Global Features notoriously noisy for image queries of low generality (generality = the fraction of relevant images in a collection) In text retrieval: documents matching no query keywords are not retrieved In image retrieval: CBIR will rank the whole collection ¦ The Red Tomato / Red Pie-Chart Problem [next]
  5. 5. Dynamic Two-Stage Image Retrieval from Large Multimodal Databases Avi Arampatzis, Konstantinos Zagoris, Savvas Chatzichristofis 5 The Red Tomato / Red Pie-Chart Problem Image query: Collection items: If the query has a low generality early ranks may be dominated by spurious results (such as the pie-chart) possibly ranked even before red tomatoes on non-white backgrounds
  6. 6. Dynamic Two-Stage Image Retrieval from Large Multimodal Databases Avi Arampatzis, Konstantinos Zagoris, Savvas Chatzichristofis 7 Contemporary Information Collections Large Multimodal (= provide different manners of retrieval )A good example is Wikipedia: Large: 3,605,426 articles (English version alone), “millions of images”, etc. Multimodal: Topics are covered in several languages. Topics may include non-textual media (image, sound, video, etc.) annotated in a variety of metadata fields (caption, description, etc.) Thus, there are several ways to get to the same info/topic.
  7. 7. Dynamic Two-Stage Image Retrieval from Large Multimodal Databases Avi Arampatzis, Konstantinos Zagoris, Savvas Chatzichristofis 8 Image Retrieval from Multimodal DatabasesIn image retrieval systems, users are assumed to target visual similarity. Thus, Image is the primary medium. image modalities are most important than others Modalities from other media are secondary. but they can still help with the problems of CBIR: ¦ low generality (Red Tomato/Pie-Chart) problem (esp. for global feats.) ¦ speedTraditionally, fusion has been used, but: weighing of media is not trivial; usually requires training data not theoretically sound: influence of textual scores may worsen the visual quality
  8. 8. Dynamic Two-Stage Image Retrieval from Large Multimodal Databases Avi Arampatzis, Konstantinos Zagoris, Savvas Chatzichristofis 9 Two-stage Image Retrieval from Multimedia DatabasesKey idea for improving CBIR effectiveness:Before CBIR, raise query generality by reducing collection size via filtering.Assumptions: Query is expressed in the primary medium i.e. image, accompanied by a query in a secondary medium, e.g. text.Two stage retrieval: Rank the collection by the secondary medium/query, e.g. text. Draw a rank threshold K. Re-rank only the top-K items with CBIR.Using a ‘cheaper’ secondary medium, we also improve speed.
  9. 9. Dynamic Two-Stage Image Retrieval from Large Multimodal Databases Avi Arampatzis, Konstantinos Zagoris, Savvas Chatzichristofis 10 Previous Related WorkBest-results re-ranking by visual content has been seen before, but for other purposes, e.g. result clustering, diversity, . . . using external info (e.g. sets of images), or training data, . . . They all used global features a static predefined threshold for all queriesEffectiveness results have been mixed,while some didn’t provide a comparative evaluation.
  10. 10. Dynamic Two-Stage Image Retrieval from Large Multimodal Databases Avi Arampatzis, Konstantinos Zagoris, Savvas Chatzichristofis 11 Main ContributionsIn view of the related literature, our main contributions are: dynamic thresholds per query no external information no training data extensive evaluation of thresholding types and levels
  11. 11. Dynamic Two-Stage Image Retrieval from Large Multimodal Databases Avi Arampatzis, Konstantinos Zagoris, Savvas Chatzichristofis 12 The ImageCLEF 2010 Wikipedia Test Collection 237,434 items Primary medium: image ¦ Heterogeneous: color natural images, graphics, greyscale images, etc. ¦ variety of image sizes ¦ It’s a large benchmark image database for today’s standards. + noisy and incomplete user-supplied annotations + wikipedia articles containing the images Annotations articles come in 3 languages: En, Fr, De. 70 test topics Visual part: ¦ 1 or more example images Textual part: ¦ 3 titles fields, one per language (En, Fr, De) Topics are assessed by visual similarity.
  12. 12. Dynamic Two-Stage Image Retrieval from Large Multimodal Databases Avi Arampatzis, Konstantinos Zagoris, Savvas Chatzichristofis 13 Indexing and Retrieval text: only English annotations + English query Lemur Toolkit V4.11 / Indri V2.11 ¦ tf.idf retrieval model (works better with our thresholding methods) ¦ default settings + Krovetz stemmer image: Joint Composite Descriptor (JCD) ¦ captures color and texture information ¦ developed for color natural images Spatial Color Distribution (SpCD) ¦ captures color and its spatial distribution ¦ more suitable for colored graphics (fewer colors, less texture) JCD SpCD are found to be more effective than MPEG-7 descriptors.
  13. 13. Dynamic Two-Stage Image Retrieval from Large Multimodal Databases Avi Arampatzis, Konstantinos Zagoris, Savvas Chatzichristofis 14 Thresholding and Re-ranking Static Thresholding: a fixed pre-selected rank threshold K for all topics K 25, 50, 100, 250, 500, 1000. Dynamic Thresholding: a variable rank threshold per topic Score-Distributional Threshold Optimization (SDTO) [Arampatzis et.al., ”Where to Stop Reading a Ranked-list?”, SIGIR 2009] Two types: ¦ Threshold on Precision: g 0.990, 0.950, 0.800, 0.500, 0.330, 0.100 ¦ Threshold on prel: θ 0.990, 0.950, 0.800, 0.500, 0.330, 0.100
  14. 14. Dynamic Two-Stage Image Retrieval from Large Multimodal Databases Avi Arampatzis, Konstantinos Zagoris, Savvas Chatzichristofis 15 Initial Experiments: Setting a Baseline item scoring by MAP P@10 P@20 bpref JCD1 .0058 .0486 .0479 .0352 maxi JCDi .0072 .0614 .0614 .0387 maxi JCDi + maxi SpCDi .0112 .0871 .0886 .0415 tf.idf (text-only) .1293 .3614 .3314 .1806 Image-only runs provide very weak baselines. We chose the text-only run as a baseline for the statistical significance testing. This makes sense also from an efficiency point of view: if using a secondary text modality for image retrieval is more effective than current CBIR methods, then there is no reason at all for using computationally costly CBIR methods.
  15. 15. Dynamic Two-Stage Image Retrieval from Large Multimodal Databases Avi Arampatzis, Konstantinos Zagoris, Savvas Chatzichristofis 16 Results threshold K€ MAP P@10 JCD1 P@20 bpref MAP maxi JCDi + P@10 maxi SpCDi P@20 bpref text-only — .1293 .3614 .3314 .1806 .1293 .3614 .3314 .1806 25 25 .1162™ .3957 - .3457˜ .1641™ .1168 - .3943 - .3436 ˜ .1659™ 50 50 .1144™ .3829 - .3579˜ .1608™ .1154 - .3986 - .3557 - .1648™ 100 100 .1138 - .3786 - .3471 - .1609™ .1133 - .3900 - .3486 - .1623 - K 250 250 .1081™ .3414 - .3164 - .1644 - .1092™ .3771 - .3564 - .1664 - 500 500 .0968™ .3200 - .3007 - .1575™ .0999™ .3557 - .3250 - .1590™ 1000 1000 .0865™ .2871™ .2729™ .1493™ .0909™ .3329 - .3064 - .1511™ .9900 49 .1364 - .4214˜ .3550 - .1902˜ .1385˜ .4371˜œ .3743˜ .1921˜ .9500 68 .1352 - .4171˜ .3586 - .1912˜ .1386˜ .4500˜ œ .3836˜ .1932˜ .8000 95 .1318 - .4000 - .3536 - .1892 - .1365 - .4443˜œ .3871˜ .1924 - g .5000 151 .1196 - .3814 - .3393 - .1808 - .1226 - .4043 - .3550 - .1813 - .3333 237 .1085™ .3500 - .3000 - .1707 - .1121™ .3857 - .3364 - .1734 - .1000 711 .0864™ .2871™ .2621™ .1461™ .0909™ .3357 - .2964 - .1487™ .9900 42 .1342 - .4043 - .3414 - .1865 - .1375˜ .4371˜œ .3700˜ .1897˜ .9500 51 .1371 - .4214˜ .3586 - .1903˜ .1417˜œ .4500˜œ .3864˜œ .1924˜œ .8000 81 .1384˜ .4229˜ .3614 - .1921˜ .1427˜ œ .4629˜ œ .3871˜œ .1961˜ œ θ .5000 91 .1367 - .4057 - .3571 - .1919˜ .1397˜ .4400˜œ .3829˜ .1937˜ .3333 109 .1375 - .4129 - .3636 - .1933˜ .1404˜ .4500˜œ .3907˜œ .1949˜œ .1000 130 .1314 - .4100 - .3629 - .1866 - .1370 - .4371˜ .3843˜ .1922˜ image-only — .0058™ .0486™ .0479™ .0352™ .0112™ .0871™ .0886™ .0415™
  16. 16. Dynamic Two-Stage Image Retrieval from Large Multimodal Databases Avi Arampatzis, Konstantinos Zagoris, Savvas Chatzichristofis 17 Results Static thresholding improves initial Precision at the cost of MAP and bpref. Dynamic thresholding on Precision or prel does not have this drawback. The level of static and precision thresholds influences greatly the effectiveness; unsuitable choices (e.g. too loose) lead to a degraded performance. Prel thresholds are much more robust in this respect. Better CBIR at the second stage leads to overall improvements, but the thresholding type seems more important: While the two CBIR methods vary greatly in performance (the best has almost double the effectiveness of the other), static thresholding is not influenced much by this choice. Dynamic methods benefit more from improved CBIR.
  17. 17. Dynamic Two-Stage Image Retrieval from Large Multimodal Databases Avi Arampatzis, Konstantinos Zagoris, Savvas Chatzichristofis 18 Results
  18. 18. Dynamic Two-Stage Image Retrieval from Large Multimodal Databases Avi Arampatzis, Konstantinos Zagoris, Savvas Chatzichristofis 20 Conclusions Two-stage retrieval with dynamic thresholding: more effective robust than static thresholding practically insensitive to a wide range of choices for the optimization measure beats significantly the text-only several image-only baselines Two-stage retrieval, irrespective of thresholding type: efficiency benefit: ¦ it cuts down greatly on expensive image operations ¦ on average, only 2 to 5 in 10,000 images had to be scored
  19. 19. Dynamic Two-Stage Image Retrieval from Large Multimodal Databases Avi Arampatzis, Konstantinos Zagoris, Savvas Chatzichristofis 21 Further Research Compare two-stage retrieval to fusion. [Did this; check ECIR11 poster] Both methods work better than text-only image-only; two-stage is slightly better than fusion, but the difference is non-significant. Both methods are robust in different ways: ¦ Fusion provides less variability across topics but it is sensitive to the weighing parameter of the contributing media. ¦ Two-stage provides a much lower sensitivity to its thresholding parameter but has a higher variability across topics. Try two-stage retrieval with other type of image descriptors, e.g. the visual codebook approach (TOP-SURF). Generalization to multi-stage retrieval, where rankings for the media are successively being thresholded and re-ranked with respect to a media hierarchy.
  20. 20. Dynamic Two-Stage Image Retrieval from Large Multimodal Databases Avi Arampatzis, Konstantinos Zagoris, Savvas Chatzichristofis 22 avi@ee.duth.gr Thank you !
  21. 21. http://www.mmretrieval.net

×