A Unified Framework for Retrieving 
Diverse Social Images 
MAIA ZAHARIEVA1,2 AND PATRICK SCHWAB1 
1Multimedia Information Systems, Faculty of Computer Science, University of Vienna, Austria 
2Interactive Media Systems, Institute of Software Technology and Interactive Systems, 
Vienna University of Technology, Austria 
INTRODUCTION 
The large number of publicly shared images leads 
to a large amount of visually similar data. In 
traditional search engines these images receive 
similar rankings, because the only ranking criterion is 
an image’s relevance to the query. This, in turn, 
produces redundant search results, and users 
often have to browse through several pages of 
similar images to get an overview of the underlying 
dataset. 
This is where the MediaEval 2014 Retrieving 
Diverse Social Images Task [1] and, subsequently, 
this project come in: we aim to reduce 
redundancy in search results by introducing 
diversity as a ranking criterion. 
EXPERIMENTS 
RANK  FEATURE    COMPARISON METRIC
1     CM3x3      Euclidean
2     TFIDF      Spearman
      SIFT       Euclidean
3     CM         χ2
      LBP        χ2
      LBP3x3     χ2
4     HOG        cosine
      GLRLM      χ2
      GLRLM3x3   χ2
5     CSD        cosine
      CN         correlation
6     CN3x3      Euclidean
RANK  FEATURE    COMPARISON METRIC
1     CM         χ2
      CM3x3      Euclidean
2     HOG        cosine
3     GLRLM3x3   χ2
      LBP3x3     χ2
4     GLRLM      Euclidean
      LBP        χ2
      SIFT       Euclidean
5     CSD        cosine
      CN         Euclidean
      CN3x3      Euclidean
6     TFIDF      Euclidean
RESULTS 
RUN         RELEVANCE RANKING   IMAGE CLUSTERING
run1 (V)    CM3x3               SIFT
run2 (T)    TFIDF               TFIDF
run3 (VT)   TFIDF               CSD
run5 (V)    CM3x3               CSD
Table 3: The features used in the runs. 
V – visual, T – textual, VT – visual and textual 
RUN CR@20 P@20 F1@20 
run1 0.4426 0.7600 0.5552 
run2 0.4132 0.7250 0.5188 
run3 0.4484 0.7567 0.5559 
run5 0.4369 0.7617 0.5499 
Table 4: Evaluation results on the development dataset. 
RUN CR@20 P@20 F1@20 
run1 0.3901 0.6646 0.4863 
run2 0.3909 0.6809 0.4888 
run3 0.3982 0.6732 0.4949 
run5 0.3915 0.6752 0.4897 
Table 5: Evaluation results on the test dataset. 
Table 1: Feature-metric combination ranking for relevance ranking. 
Table 2: Feature-metric combination ranking for image clustering. 
• We submitted four different runs for the final 
evaluation (Table 3 shows the configurations). 
• The combination of visual and textual 
information (run3) achieved the best F1@20 and 
CR@20 on both the development and the test 
dataset (Tables 4 and 5). 
• However, the cluster recall (CR) remained 
relatively low due to the large number of 
irrelevant images. 
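
The three measures reported in Tables 4 and 5 can be sketched as follows. This is a minimal illustration that assumes the usual definitions used for this task: P@N is the fraction of relevant images among the top N results, CR@N is the fraction of ground-truth clusters represented among the top N results, and F1@N is their harmonic mean. The function names and the toy ground truth are hypothetical and not taken from the official evaluation tool.

```python
from typing import Dict, List


def precision_at(ranked: List[str], cluster_of: Dict[str, int], n: int = 20) -> float:
    """Fraction of the top-n results that are relevant.

    `cluster_of` maps every relevant image id to its ground-truth cluster,
    so an image is relevant exactly when it appears as a key.
    """
    return sum(1 for image_id in ranked[:n] if image_id in cluster_of) / n


def cluster_recall_at(ranked: List[str], cluster_of: Dict[str, int], n: int = 20) -> float:
    """Fraction of ground-truth clusters represented in the top-n results."""
    total_clusters = len(set(cluster_of.values()))
    found = {cluster_of[i] for i in ranked[:n] if i in cluster_of}
    return len(found) / total_clusters


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and cluster recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Toy query: "x" is irrelevant, "d" is relevant but was not retrieved.
ranked = ["a", "b", "x", "c"]
cluster_of = {"a": 0, "b": 0, "c": 1, "d": 2}
p = precision_at(ranked, cluster_of, n=4)
cr = cluster_recall_at(ranked, cluster_of, n=4)
f1 = f1_score(p, cr)
```

The reported F1@20 values are not the harmonic mean of the reported P@20 and CR@20 columns, presumably because each measure is averaged over all queries separately.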
We conducted a thorough evaluation of the 
employed feature-metric combinations at the two 
main stages of our approach: relevance ranking 
and image clustering. 
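
Three of the comparison metrics listed in the tables can be sketched in a few lines. This is an illustration under stated assumptions, not the evaluated implementation: the χ2 distance exists in several variants and the symmetric histogram form below is only one common choice, cosine is expressed as a distance (one minus the cosine similarity), and the Spearman and correlation metrics are omitted.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def euclidean(a: Vector, b: Vector) -> float:
    """Euclidean (L2) distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_distance(a: Vector, b: Vector) -> float:
    """One minus the cosine similarity of the two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    # Convention chosen for this sketch: a zero vector is maximally distant.
    return 1.0 if norm == 0 else 1.0 - dot / norm


def chi_squared(a: Vector, b: Vector) -> float:
    """Symmetric chi-squared distance for non-negative histograms.

    Bins where both histograms are empty contribute nothing.
    """
    return 0.5 * sum((x - y) ** 2 / (x + y) for x, y in zip(a, b) if x + y > 0)


# Two toy histograms with no overlapping mass.
h1 = [1.0, 0.0]
h2 = [0.0, 1.0]
d_euclidean = euclidean(h1, h2)
d_cosine = cosine_distance(h1, h2)
d_chi2 = chi_squared(h1, h2)
```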
REFERENCES 
[1] B. Ionescu, A. Popescu, M. Lupu, A. L. 
Gînscâ, and H. Müller. Retrieving diverse 
social images at MediaEval 2014: Challenges, 
datasets, and evaluation. In MediaEval 2014 
Multimedia Challenge Workshop, 2014. 
APPROACH 
We employ a multi-stage approach for the retrieval 
of diverse social images. The workflow passes through the 
following three main stages: 
Relevance ranking of input images. 
In this stage we estimate how relevant each image 
is to a given query. We compare every 
image to a set of representative reference images by 
computing the distance between their corresponding 
feature vectors. For the given dataset, we use 
the provided Wikipedia images as reference 
images. Each image is then assigned the smallest 
distance to any of the reference images as its 
relevance score, so a lower score indicates higher relevance. 
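
The relevance ranking stage can be sketched as follows. This is a minimal illustration, assuming feature vectors are plain lists of floats and using the Euclidean distance as the comparison metric; the function names and the toy vectors are hypothetical.

```python
import math
from typing import Callable, Dict, List, Sequence

Vector = Sequence[float]


def euclidean(a: Vector, b: Vector) -> float:
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def relevance_scores(
    candidates: Dict[str, Vector],
    references: List[Vector],
    distance: Callable[[Vector, Vector], float] = euclidean,
) -> Dict[str, float]:
    """Map each candidate image id to its smallest distance to any reference."""
    return {
        image_id: min(distance(features, ref) for ref in references)
        for image_id, features in candidates.items()
    }


def rank_by_relevance(scores: Dict[str, float]) -> List[str]:
    """Order image ids from most relevant (smallest distance) to least."""
    return sorted(scores, key=lambda image_id: scores[image_id])


# Toy 2-D feature vectors (hypothetical values, for illustration only).
candidates = {"a": [0.0, 0.5], "b": [3.0, 5.0], "c": [1.0, 1.0]}
references = [[0.0, 1.0], [3.0, 3.0]]
scores = relevance_scores(candidates, references)
ranking = rank_by_relevance(scores)
```

Passing a different `distance` function swaps the comparison metric without touching the ranking logic.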
Image clustering for diversification. 
The image clustering stage aims to identify 
groups of similar images that can be used 
to diversify the final retrieval results. The framework 
does not prescribe a particular clustering algorithm. For our purposes, 
Adaptive Hierarchical Clustering (AHC) showed 
the best performance. Note that 
the features and the distance metric used in this 
stage can, and ideally should, differ from the 
ones employed in the relevance ranking stage. 
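
Since the clustering algorithm is interchangeable, the stage can be illustrated with a plain agglomerative clustering. The sketch below is a stand-in and not the AHC variant named above: it assumes average linkage, a fixed distance threshold as the stopping rule, and Euclidean distance, all of which are choices made only for this example. It is written for readability rather than speed.

```python
import math
from typing import Callable, List, Sequence

Vector = Sequence[float]


def euclidean(a: Vector, b: Vector) -> float:
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def agglomerative_clusters(
    features: List[Vector],
    threshold: float,
    distance: Callable[[Vector, Vector], float] = euclidean,
) -> List[List[int]]:
    """Repeatedly merge the two closest clusters (average linkage) until the
    closest pair is farther apart than `threshold`.

    Returns clusters as lists of indices into `features`.
    """
    clusters = [[i] for i in range(len(features))]

    def linkage(c1: List[int], c2: List[int]) -> float:
        total = sum(distance(features[i], features[j]) for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        best = None  # (distance, index of first cluster, index of second)
        for p in range(len(clusters)):
            for q in range(p + 1, len(clusters)):
                d = linkage(clusters[p], clusters[q])
                if best is None or d < best[0]:
                    best = (d, p, q)
        if best[0] > threshold:
            break
        _, p, q = best
        clusters[p] = clusters[p] + clusters[q]
        del clusters[q]
    return clusters


# Two tight pairs and one outlier (hypothetical 2-D features).
points = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0], [10.0, 0.0]]
clusters = agglomerative_clusters(points, threshold=1.0)
```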
Final image selection. 
The final stage combines the results of the 
previous steps to produce a result set that is both 
relevant and diverse with respect to the initial image 
set. We use a round-robin algorithm that 
selects the most relevant images from alternating 
clusters. 
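
A round-robin selection over clusters can be sketched as follows. The sketch assumes that a smaller relevance score means a more relevant image (a distance, as in the relevance ranking stage) and that clusters are visited in order of their most relevant member; the visiting order and the function name are assumptions made for this example.

```python
from typing import Dict, List


def round_robin_select(
    clusters: List[List[str]],
    relevance: Dict[str, float],
    k: int,
) -> List[str]:
    """Pick up to k images, taking the most relevant remaining image from
    each cluster in turn. Smaller relevance score means more relevant."""
    # Sort the images inside each cluster from most to least relevant.
    queues = [sorted(c, key=lambda i: relevance[i]) for c in clusters if c]
    # Assumption: visit clusters in order of their most relevant image.
    queues.sort(key=lambda q: relevance[q[0]])

    selected: List[str] = []
    position = 0  # rank inside each cluster taken in the current round
    while len(selected) < k and any(position < len(q) for q in queues):
        for q in queues:
            if position < len(q) and len(selected) < k:
                selected.append(q[position])
        position += 1
    return selected


# Hypothetical relevance scores and clusters.
relevance = {"a": 0.1, "b": 0.2, "c": 0.3, "d": 0.4, "e": 0.5}
clusters = [["a", "b", "c"], ["d"], ["e"]]
top4 = round_robin_select(clusters, relevance, k=4)
```

In this toy example a pure relevance ranking would return a, b, c, d, all but one from the same cluster, whereas the round-robin selection covers all three clusters within the first three results.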
CONCLUSION 
Although there are significant differences in the 
performance of single features, the top-performing 
features prove to be highly interchangeable. 
The results indicate that, for the given datasets, 
the crucial part of the process is not so much 
the diversification as the assessment of 
image relevance. 
In general, the achieved results highlight the 
limitations of the available textual (and visual) 
information in assessing image relevance. A 
possible way to improve the results is to 
exploit the occasionally available GPS data and 
to employ external resources as an additional source of 
information. 
Figure 1: Schematic overview of the proposed approach, showing its three 
stages: relevance ranking, image clustering, and final image selection. 
Images from: http://google.com/search?q=eiffel+tower, 
accessed July 2nd 2014 
ACKNOWLEDGMENT 
This work has been partly funded by the Vienna 
Science and Technology Fund (WWTF) through 
project ICT12-010.
