
MediaEval 2017 Retrieving Diverse Social Images Task (Overview)


Presenter: Bogdan Boteanu, University Politehnica of Bucharest, Romania

Paper: http://ceur-ws.org/Vol-1984/Mediaeval_2017_paper_1.pd

Video: https://youtu.be/jXA3xvHqVWg

Authors: Maia Zaharieva, Bogdan Ionescu, Alexandru Lucian Gînscă, Rodrygo L.T. Santos, Henning Müller

Abstract: This paper provides an overview of the Retrieving Diverse Social Images task that is organized as part of the MediaEval 2017 Benchmarking Initiative for Multimedia Evaluation. The task addresses the challenge of visual diversification of image retrieval results, where images, metadata, user tagging profiles, and content and text models are available for processing. We present the task challenges, the employed dataset and ground truth information, the required runs, and the considered evaluation metrics.


MediaEval 2017 Retrieving Diverse Social Images Task (Overview)

  1. Retrieving Diverse Social Images Task - task overview - 2017. Maia Zaharieva (TUW, Austria), Bogdan Ionescu (UPB, Romania), Alexandru Lucian Gînscă (CEA LIST, France), Rodrygo L.T. Santos (UFMG, Brazil), Henning Müller (HES-SO in Sierre, Switzerland), Bogdan Boteanu (UPB, Romania). September 13-15, Dublin, Ireland.
  2. Outline:
     - The Retrieving Diverse Social Images Task
     - Dataset and Evaluation
     - Participants
     - Results
     - Discussion and Perspectives
  3. Diversity Task: Objective & Motivation
     Objective: image search result diversification in the context of social photo retrieval.
     Why diversify search results?
     - to respond to the needs of different users;
     - as a method of tackling queries with unclear information needs;
     - to widen the pool of possible results (increase performance);
     - to reduce the number/redundancy of the returned items; ...
  4. Diversity Task: Objective & Motivation #2
  5. Diversity Task: Objective & Motivation #2
  6. Diversity Task: Objective & Motivation #3
  7. Diversity Task: Definition
     For each query, participants receive a ranked list of photos retrieved from Flickr using its default "relevance" algorithm.
     Query = a general-purpose, multi-topic term, e.g.: autumn colors, bee on a flower, home office, snow in the city, holding hands, ...
     Goal of the task: refine the results by providing a ranked list of up to 50 photos (a summary) that are both relevant and diverse representations of the query (a generic re-ranking sketch is given after the slide list).
     - relevant: a common photo representation of the query topics (all at once); bad-quality photos (e.g., severely blurred, out of focus) are not considered relevant in this scenario;
     - diverse: depicting different visual characteristics of the query topics and subtopics with a certain degree of complementarity, i.e., most of the perceived visual information differs from one photo to another.
  8. Dataset: General Information & Resources
     Provided information:
     - query text formulation;
     - ranked list of Creative Commons photos from Flickr (up to 300 photos per query);
     - metadata from Flickr (e.g., tags, description, views, comments, date-time the photo was taken, username, userid, etc.);
     - visual, text & user annotation credibility descriptors;
     - semantic vectors for general English terms computed on top of the English Wikipedia (wikiset); see the sketch after the slide list;
     - relevance and diversity ground truth.
     Photos: development set: 110 queries, 32,340 photos; test set: 84 queries, 24,986 photos.
  9. Dataset: Provided Descriptors
     - General-purpose visual descriptors: e.g., Auto Color Correlogram, Color and Edge Directivity Descriptor, Pyramid of Histograms of Orientation Gradients, etc.;
     - Convolutional Neural Network based descriptors: Caffe framework based;
     - General-purpose text descriptors: e.g., term frequency information, document frequency information and their ratio, i.e., TF-IDF (a small TF-IDF sketch follows after the slide list);
     - User annotation credibility descriptors (an automatic estimation of the quality of users' tag-image content relationships): e.g., a measure of user image relevance, the total number of images a user shared, the percentage of images with faces.
  10. Dataset: Basic Statistics
      | Statistic                               | devset (design the methods) | testset (final benchmarking) |
      |-----------------------------------------|-----------------------------|------------------------------|
      | #queries                                | 110                         | 84                           |
      | #images                                 | 32,340                      | 24,986                       |
      | #img. per query (min - average - max)   | 141 - 295 - 300             | 299 - 300 - 300              |
      | % relevant img.                         | 53                          | 57.4                         |
      | avg. #clusters per query                | 17                          | 14                           |
      | avg. #img. per cluster                  | 9                           | 14                           |
  11. Dataset: Ground Truth (annotations)
      Relevance and diversity annotations were carried out by expert annotators:
      - devset: relevance: 8 annotators + 1 master (3 annotations/query); diversity: 1 annotation/query;
      - testset: relevance: 8 annotators + 1 master (3 annotations/query); diversity: 12 annotators (3 annotations/query);
      - lenient majority voting is used to consolidate the relevance annotations (a small voting sketch follows after the slide list).
  12. Evaluation: Run Specification
      Participants are required to submit up to 5 runs.
      Required runs:
      - run 1: automated, using visual information only;
      - run 2: automated, using textual information only;
      - run 3: automated, using textual-visual fusion without resources other than those provided by the organizers.
      General runs:
      - run 4: everything allowed, e.g., human-based or hybrid human-machine approaches, including data from external sources (e.g., the Internet) or pre-trained models obtained from external datasets related to this task;
      - run 5: everything allowed.
  13. Evaluation: Official Metrics
      - Cluster Recall @ X: CR@X = Nc / N, where X is the cutoff point, N is the total number of ground-truth clusters for the current query (N <= 25), and Nc is the number of different clusters represented among the top X ranked images; cluster recall is computed over the relevant images only.
      - Precision @ X: P@X = R / X, where R is the number of relevant images among the top X.
      - F1-measure @ X: F1@X = the harmonic mean of CR@X and P@X.
      Metrics are reported for X = 5, 10, 20, 30, 40 and 50, per topic as well as overall (average). The official ranking uses F1@20. (A direct transcription of these definitions into code is given after the slide list.)
  14. Participants: Basic Statistics
      - Survey: 22 respondents were interested in the task.
      - Registration: 14 teams registered (1 team is organizer-related).
      - Run submission: 6 teams finished the task, including 1 organizer-related team; 29 runs were submitted.
      - Workshop participation: 5 teams are represented at the workshop.
  15. Participants: Submitted Runs (29). Runs 1-3 are the required runs, runs 4 and 5 the general runs; results are the best per team; * marks the organizer-related team.
      | Team        | Country     | 1 (visual) | 2 (text) | 3 (vis-text) | 4                 | 5                 | P@20   | CR@20  | F1@20  |
      |-------------|-------------|------------|----------|--------------|-------------------|-------------------|--------|--------|--------|
      | NLE         | France      | ✓          | ✓        | ✓            | visual-text       | visual-text       | 0.793  | 0.679  | 0.705  |
      | MultiBrazil | Brazil      | ✓          | ✓        | ✓            | visual-text-cred. | visual-text-cred. | 0.7208 | 0.6524 | 0.6634 |
      | UMONS       | Belgium     | ✓          | ✓        | ✓            | visual-text-cred. | visual-cred.      | 0.8071 | 0.5856 | 0.6554 |
      | CFM         | China       | ✓          | ✓        | ✓            | text-cred.        | text-cred.        | 0.6881 | 0.6671 | 0.6533 |
      | tud-mmc     | Netherlands | ✓          | ✓        | ✓            | text-intent       | ✗                 | 0.7262 | 0.6142 | 0.6462 |
      | Flickr      |             |            |          |              |                   |                   | 0.6595 | 0.5831 | 0.5922 |
      | LAPI*       | Romania     | ✓          | ✓        | ✓            | visual            | cred.             | 0.633  | 0.6045 | 0.5777 |
  16. Results: P vs. CR @20 (all runs, testset). [Scatter plot of P@20 vs. CR@20 for the initial Flickr results and all team runs: CFM, LAPI, MultiBrazil, NLE, tud-mmc, UMONS.]
  17. Results: Best Team Runs (F1@X). [Line plot of F1@X for X = 5, 10, 20, 30, 40, 50: Flickr Initial, CFM_run5_text_cred.txt, LAPI_HC_PSRF_Run5.txt, run3VisualTextual_MultiBrasil.txt, NLE_run3_CMRF_MMR.txt, tudmmc_run4_tudmmc_intent.txt, UMONS_run5_visual_user_G.txt.]
  18. Results: Best Team Runs (Cluster Recall@X). [Line plot of CR@X for X = 5, 10, 20, 30, 40, 50, same runs as above.]
  19. Results: Visual Results – Flickr Initial Results (query: Truck Camper)
  20. Results: Visual Results – Flickr Initial Results (query: Truck Camper): CR@20=0.35, P@20=0.3, F1@20=0.32
  21. Results: Visual Results #2 – Best run by F1@20 (query: Truck Camper)
  22. Results: Visual Results #2 – Best run by F1@20 (query: Truck Camper): CR@20=0.68, P@20=0.8, F1@20=0.74
  23. Results: Visual Results #3 – Lowest run (query: Truck Camper)
  24. Results: Visual Results #3 – Lowest run (query: Truck Camper): CR@20=0.5, P@20=0.5, F1@20=0.5
  25. Brief Discussion
      Methods:
      - this year, mainly classification/clustering (& fusion), re-ranking, relevance feedback, and neural-network-based approaches;
      - best run by F1@20: improving relevance (text) + neural-network-based clustering, using visual-text information (team NLE); a generic clustering-based re-ranking sketch is given after the slide list.
      Dataset:
      - getting very complex (read: diverse);
      - still low resources for Creative Commons content on Flickr;
      - the descriptors were very well received (employed by all participants as provided).
  26. Acknowledgements
      Task auxiliaries: Bogdan Boteanu, UPB, Romania & Mihai Lupu, Vienna University of Technology, Austria.
      Task supporters: Alberto Ueda, Bruno Laporais, Felipe Moraes, Lucas Chaves, Jordan Silva, Marlon Dias, Rafael Glater, Catalin Mitrea, Mihai Dogariu, Liviu Stefan, Gabriel Petrescu, Alexandru Toma, Alina Banica, Andreea Roxana, Mihaela Radu, Bogdan Guliman, Sebastian Moraru.
  27. Questions & Answers. Thank you!
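
The sketch below, referenced from slide 7, shows one generic way to build a ranked list that trades off relevance against redundancy: a greedy, MMR-style re-ranking. It is illustrative only, not the method of any participating team; `relevance` and `features` are assumed precomputed lookups keyed by photo id, and the trade-off weight is arbitrary.

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity between two descriptor vectors."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def diversify(candidates, relevance, features, k=50, trade_off=0.7):
    """Greedy MMR-style re-ranking: at each step pick the photo with the
    best mix of relevance and dissimilarity to what is already selected."""
    selected, remaining = [], list(candidates)
    while remaining and len(selected) < k:
        def mmr(photo):
            rel = relevance[photo]
            if not selected:
                return rel
            redundancy = max(cosine(features[photo], features[s]) for s in selected)
            return trade_off * rel - (1.0 - trade_off) * redundancy
        best = max(remaining, key=mmr)
        selected.append(best)
        remaining.remove(best)
    return selected
```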
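
For slide 8, a minimal sketch of one way the provided wikiset semantic vectors could be used: represent a photo by the average vector of its Flickr tags. The dict-of-arrays format and the 300-dimensional fallback size are assumptions made for illustration; the actual wikiset file format is not described in the slides.

```python
import numpy as np

def photo_semantic_vector(tags, word_vectors, dim=300):
    """Average the semantic vectors of a photo's tags.

    `word_vectors`: assumed dict mapping an English term to a NumPy array
    (e.g., parsed from the provided wikiset resource); `dim` is an assumed
    dimensionality used only for the all-unknown-tags fallback.
    """
    known = [word_vectors[t.lower()] for t in tags if t.lower() in word_vectors]
    if not known:
        return np.zeros(dim)
    return np.mean(known, axis=0)
```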
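
For the text descriptors on slide 9, a textbook TF-IDF computation over per-photo tag lists. This is a plain formulation of "term frequency, document frequency and their ratio", not necessarily the organizers' exact recipe.

```python
import math
from collections import Counter

def tf_idf(documents):
    """TF-IDF weights for a list of documents, each a list of terms
    (e.g., the tags of one photo). Returns one {term: weight} dict per photo."""
    n_docs = len(documents)
    doc_freq = Counter(term for doc in documents for term in set(doc))
    weights = []
    for doc in documents:
        counts = Counter(doc)
        total = len(doc) or 1  # guard against photos with no tags
        weights.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights
```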
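
For the relevance ground truth on slide 11, a small sketch of lenient majority voting over the per-query annotations. "Lenient" is interpreted here as ignoring undecided votes and counting a tie as relevant; the organizers' exact rule may differ.

```python
def lenient_majority_vote(labels):
    """Consolidate per-annotator relevance labels for one photo.

    `labels`: e.g. [True, False, None] with True = relevant,
    False = not relevant, None = don't know / missing.
    """
    votes = [label for label in labels if label is not None]
    if not votes:
        return False                      # no usable annotation
    return sum(votes) * 2 >= len(votes)   # lenient: ties count as relevant
```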
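
The official metrics from slide 13, transcribed directly into code. `relevant` is the set of ground-truth relevant photo ids, `cluster_of` maps a relevant photo id to its ground-truth cluster, and `n_clusters` is N for the query; the container names are illustrative.

```python
def precision_at(ranked, relevant, x):
    """P@X = R / X, where R is the number of relevant images in the top X."""
    return sum(1 for photo in ranked[:x] if photo in relevant) / x

def cluster_recall_at(ranked, relevant, cluster_of, n_clusters, x):
    """CR@X = Nc / N, counting clusters only over relevant images in the top X."""
    covered = {cluster_of[photo] for photo in ranked[:x]
               if photo in relevant and photo in cluster_of}
    return len(covered) / n_clusters

def f1_at(ranked, relevant, cluster_of, n_clusters, x=20):
    """F1@X = harmonic mean of P@X and CR@X (the official ranking uses X = 20)."""
    p = precision_at(ranked, relevant, x)
    cr = cluster_recall_at(ranked, relevant, cluster_of, n_clusters, x)
    return 0.0 if p + cr == 0 else 2 * p * cr / (p + cr)
```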
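
Finally, for the clustering-based family of methods mentioned on slide 25, a generic round-robin re-ranking sketch: cluster the candidates on any of the provided descriptors, then repeatedly take the best-ranked unused photo from each cluster. This sketches the general idea only; it is not the NLE system or any other submitted run, and the cluster count is arbitrary.

```python
import numpy as np
from sklearn.cluster import KMeans

def cluster_rerank(ranked, features, n_clusters=20, k=50):
    """Cluster the candidate photos, then interleave clusters in rank order."""
    X = np.array([features[photo] for photo in ranked])
    km = KMeans(n_clusters=min(n_clusters, len(ranked)), n_init=10, random_state=0)
    labels = km.fit_predict(X)

    buckets = {}                              # cluster id -> photos, best rank first
    for photo, label in zip(ranked, labels):
        buckets.setdefault(label, []).append(photo)

    result, target = [], min(k, len(ranked))
    while len(result) < target:
        for label in sorted(buckets):         # one pick per cluster per round
            if buckets[label] and len(result) < target:
                result.append(buckets[label].pop(0))
    return result
```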
