Abstract: This paper provides an overview of the Retrieving Diverse Social Images task that is organized as part of the MediaEval 2017 Benchmarking Initiative for Multimedia Evaluation. The task addresses the challenge of visual diversification of image retrieval results, where images, metadata, user tagging profiles, and content and text models are available for processing. We present the task challenges, the employed dataset and ground truth information, the required runs, and the considered evaluation metrics.
MediaEval 2017 Retrieving Diverse Social Images Task (Overview)
Maia Zaharieva (TUW, Austria)
Bogdan Ionescu (UPB, Romania)
Alexandru Lucian Gînscǎ (CEA LIST, France)
Rodrygo L.T. Santos (UFMG, Brazil)
Henning Müller (HES-SO in Sierre, Switzerland)
Bogdan Boteanu (UPB, Romania)
September 13-15, 2017, Dublin, Ireland
The Retrieving Diverse Social Images Task
Dataset and Evaluation
Discussion and Perspectives
Diversity Task: Objective & Motivation
Objective: image search result diversification in the context of
social photo retrieval.
Why diversify search results?
- to respond to the needs of different users;
- to tackle queries with unclear information needs;
- to widen the pool of possible results (increase performance);
- to reduce the number/redundancy of the returned items.
Diversity Task: Definition
For each query, participants receive a ranked list of photos retrieved
from Flickr using its default “relevance” algorithm.
Query = general-purpose, multi-topic term
e.g.: autumn colors, bee on a flower, home office, snow in
the city, holding hands, ...
Goal of the task: refine the results by providing a ranked list of up
to 50 photos (summary) that are considered to be both relevant and
diverse representations of the query.
relevant: a common photo representation of the query topics (all at once);
bad-quality photos (e.g., severely blurred, out of focus) are not
considered relevant in this scenario;
diverse: depicting different visual characteristics of the query topics and
subtopics with a certain degree of complementarity, i.e., most of the
perceived visual information is different from one photo to another.
Dataset: General Information & Resources
query text formulation;
ranked list of Creative Commons photos from Flickr*
(up to 300 photos per query);
metadata from Flickr (e.g., tags, description, views, comments,
date and time the photo was taken, username, user id, etc.);
visual, text & user annotation credibility descriptors;
semantic vectors for general English terms computed on top of
the English Wikipedia (wikiset);
relevance and diversity ground truth.
Development set: 110 queries, 32,340 photos.
Test set: 84 queries, 24,986 photos.
Dataset: Provided Descriptors
General purpose visual descriptors:
e.g., Auto Color Correlogram, Color and Edge Directivity
Descriptor, Pyramid of Histograms of Orientation Gradients, etc;
Convolutional Neural Network based descriptors:
Caffe framework based;
General purpose text descriptors:
e.g., term frequency information, document frequency
information and their ratio, i.e., TF-IDF;
User annotation credibility descriptors (providing an automatic
estimate of the quality of users' tag-image content relationships):
e.g., a measure of user image relevance, the total number of images a
user shared, the percentage of images with faces.
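The TF/DF/TF-IDF quantities named above can be illustrated with a small sketch; the task ships these text descriptors precomputed, and the toy documents and plain `tf * log(N/df)` weighting below are invented for illustration only.

```python
import math

# Toy corpus; the real descriptors are computed over Flickr metadata.
docs = [
    "autumn colors in the park",
    "autumn leaves and colors",
    "home office desk",
]
tokenized = [d.split() for d in docs]

def tf(term, doc):
    # term frequency: occurrences of the term in one document
    return doc.count(term)

def df(term):
    # document frequency: number of documents containing the term
    return sum(1 for doc in tokenized if term in doc)

def tf_idf(term, doc):
    # classic TF-IDF weighting: tf * log(N / df)
    return tf(term, doc) * math.log(len(tokenized) / df(term))

print(tf_idf("autumn", tokenized[0]))  # log(3/2): term in 2 of 3 docs
print(tf_idf("office", tokenized[2]))  # log(3/1): term in 1 of 3 docs
```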
Dataset: Ground Truth - annotations
Relevance and diversity annotations were carried out as follows:
Development set: relevance: 8 annotators + 1 master (3 annotations/query); diversity: 1 annotation/query.
Test set: relevance: 8 annotators + 1 master (3 annotations/query); diversity: 12 annotators (3 annotations/query).
Relevance annotations were fused via lenient majority voting.
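One possible reading of the lenient majority vote, sketched below; the exact fusion and tie-breaking rule is not spelled out in the slides, so this sketch assumes "don't know" votes are neutral and that ties count as relevant, which is what would make the vote lenient.

```python
# ASSUMPTION: votes are True / False / None ("don't know"); an image is
# kept as relevant when relevant votes are not outnumbered (lenient).

def lenient_majority(votes):
    """Fuse the per-image relevance votes of several annotators."""
    yes = sum(1 for v in votes if v is True)
    no = sum(1 for v in votes if v is False)
    # lenient tie-breaking: a tie keeps the image as relevant
    return yes >= no and yes > 0

print(lenient_majority([True, False, None]))   # tie -> kept as relevant
print(lenient_majority([False, False, True]))  # outvoted -> not relevant
```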
Evaluation: Run Specification
Participants are required to submit up to 5 runs:
run 1: automated using visual information only;
run 2: automated using textual information only;
run 3: automated, using fused textual-visual information, with no
resources other than those provided by the organizers;
run 4: everything allowed, e.g., human-based or hybrid human-machine
approaches, including data from external sources (e.g., the
Internet) or pre-trained models obtained from external datasets
related to this task;
run 5: everything allowed.
Evaluation: Official Metrics
Cluster Recall* @ X = Nc/N (CR@X),
where X is the cutoff point, N is the total number of ground-truth
clusters for the current query (N ≤ 25), and Nc is the number of
different clusters represented among the top X ranked images;
*cluster recall is computed over the relevant images only.
Precision @ X = R/X (P@X),
where R is the number of relevant images among the top X results;
F1-measure @ X = harmonic mean of CR and P (F1@X)
Metrics are reported for different values of X (5, 10, 20, 30, 40 & 50),
per topic as well as overall (average).
Official ranking metric: F1@20.
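The three metrics can be sketched as follows; the data layout and function names are illustrative, not those of the official evaluation tool.

```python
# Minimal sketch of the official metrics (P@X, CR@X, F1@X).

def precision_at(ranked, relevant, x):
    """P@X = R/X, where R counts relevant images among the top X."""
    return sum(1 for img in ranked[:x] if img in relevant) / x

def cluster_recall_at(ranked, clusters, x):
    """CR@X = Nc/N: fraction of ground-truth clusters covered in the
    top X; only relevant images carry a cluster label."""
    covered = {clusters[img] for img in ranked[:x] if img in clusters}
    return len(covered) / len(set(clusters.values()))

def f1_at(ranked, relevant, clusters, x):
    """F1@X = harmonic mean of P@X and CR@X."""
    p = precision_at(ranked, relevant, x)
    cr = cluster_recall_at(ranked, clusters, x)
    return 2 * p * cr / (p + cr) if p + cr else 0.0

# Toy query: 6 returned photos; 4 are relevant, spread over 3 clusters.
ranked = ["a", "b", "c", "d", "e", "f"]
clusters = {"a": 1, "b": 1, "d": 2, "f": 3}  # relevant image -> cluster id
relevant = set(clusters)
print(precision_at(ranked, relevant, 5))       # -> 0.6 (3 of 5 relevant)
print(cluster_recall_at(ranked, clusters, 5))  # 2 of 3 clusters covered
```

Note how the toy run scores well on precision but loses cluster recall because cluster 3 only appears at rank 6, outside the cutoff.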
Participants: Basic Statistics
- 22 respondents were interested in the task;
- 14 teams registered (1 team is organizer related);
- 6 teams finished the task, including 1 organizer related team;
- 29 runs were submitted;
- 5 teams are represented at the workshop.
this year: mainly classification/clustering (& fusion), re-ranking,
relevance feedback, and neural-network-based approaches;
best run (F1@20): improved relevance via text + neural-network-based
clustering, using fused visual-text information (team NLE).
approaches are getting very complex (read: diverse);
Creative Commons resources on Flickr remain limited;
the provided descriptors were very well received (employed by all
participants as provided).
Bogdan Boteanu, UPB, Romania & Mihai Lupu, Vienna University of Technology
Alberto Ueda, Bruno Laporais, Felipe Moraes, Lucas Chaves, Jordan
Silva, Marlon Dias, Rafael Glater
Catalin Mitrea, Mihai Dogariu, Liviu Stefan, Gabriel Petrescu, Alexandru
Toma, Alina Banica, Andreea Roxana, Mihaela Radu, Bogdan Guliman,