TASK OVERVIEW
RETRIEVING DIVERSE SOCIAL IMAGES
Bogdan Ionescu (UPB, Romania)
Alexandru Lucian Gînscă (CEA LIST, France)
Maia Zaharieva (TUW & UW, Austria)
Mihai Lupu (TUW, Austria)
Henning Müller (HES-SO, Switzerland)
MediaEval 2016 Workshop, October 20-21, Hilversum, Netherlands
UNIVERSITY POLITEHNICA OF BUCHAREST
WHY CARE ABOUT DIVERSIFYING
IMAGE SEARCH RESULTS?
GOAL OF THE TASK
For each query, participants receive a list of photos retrieved from
Flickr and ranked with Flickr's default "relevance" algorithm.
Goal: refine the results by providing a ranked list of up to 50 photos
that are both relevant (1) and diverse (2) representations of the query.
(1) relevant: a common representation of the query concepts
(2) diverse: depicting different visual characteristics of the query topics and
subtopics with a certain degree of complementarity, i.e., most of the
perceived visual information differs from one photo to another.
CORE CHALLENGE
QUERY = a general-purpose, multi-topic term
e.g.: accordion player, blanket on sofa, construction works,
dancing on the street, drinking water, dog on a leash,
sand castles, sailing boat, three wheeled car, …
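The task does not prescribe a diversification method, but the refinement step can be pictured as a greedy reranking that trades off relevance against redundancy, in the spirit of Maximal Marginal Relevance (MMR). A minimal Python sketch, where `candidates`, `sim`, and the weight `lam` are hypothetical placeholders for whatever a participant system provides:

```python
# Illustrative MMR-style greedy diversification -- a sketch, not the task's
# prescribed method. `candidates` maps photo id -> relevance score (e.g. from
# the initial Flickr rank); `sim(a, b)` is any pairwise visual/textual
# similarity in [0, 1]. Both are placeholders.

def diversify(candidates, sim, k=50, lam=0.7):
    selected = []
    pool = dict(candidates)
    while pool and len(selected) < k:
        # score each remaining photo: relevance minus redundancy w.r.t. picks
        def mmr(pid):
            redundancy = max((sim(pid, s) for s in selected), default=0.0)
            return lam * pool[pid] - (1 - lam) * redundancy
        best = max(pool, key=mmr)
        selected.append(best)
        del pool[best]
    return selected

# Toy usage with a tag-overlap similarity (purely illustrative):
tags = {"a": {"bridge", "winter"}, "b": {"bridge", "winter"}, "c": {"bridge", "river"}}
jaccard = lambda x, y: len(tags[x] & tags[y]) / len(tags[x] | tags[y])
print(diversify({"a": 1.0, "b": 0.9, "c": 0.8}, jaccard, k=2))  # ['a', 'c']
```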
DATASETS
(Photo by Roman Kraft)
THE BASICS
Photos:
Development: 70 queries; 20,757 photos in total
Test: 64 queries; 18,717 photos in total
Available metadata for each photo/query:
query formulation
initial Flickr ranking
title, tags, description
views and user information
ADDITIONAL RESOURCES
Visual-based descriptors: CNN features (Caffe framework)
Text-based descriptors: TF-IDF, SOLR indexes
User annotation credibility descriptors: provide an estimate of the quality of
tag-image content relationships, using visual- and text-based content analysis
Wikiset: semantic vectors for general English terms
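As an illustration of the kind of text-based descriptor listed above, TF-IDF vectors over a photo's title, tags, and description can be built with scikit-learn. A sketch under assumed field names; the released descriptor files follow the organizers' own schema and pipeline:

```python
# Minimal TF-IDF text descriptors from photo metadata.
# Field names below are hypothetical, not the dataset's actual schema.
from sklearn.feature_extraction.text import TfidfVectorizer

photos = [
    {"id": "p1", "title": "sand castle", "tags": "beach summer sand",
     "description": "castle on the beach"},
    {"id": "p2", "title": "sailing boat", "tags": "sea regatta",
     "description": "boats at the regatta"},
]

# Concatenate the textual fields into one document per photo
docs = [" ".join((p["title"], p["tags"], p["description"])) for p in photos]
vectorizer = TfidfVectorizer(lowercase=True, stop_words="english")
tfidf = vectorizer.fit_transform(docs)  # one sparse TF-IDF row per photo
print(tfidf.shape)
```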
SOME STATISTICS

                                    Development Dataset     Test Dataset
# queries                           70                      64
# images                            20,757                  18,717
# images / query
  (min - mean (std) - max)          176 - 297 (19) - 300    141 - 292 (29) - 300
# relevant images / query
  (min - mean (std) - max)          9 - 191 (76) - 300      10 - 146 (82) - 298
# clusters / query
  (min - mean (std) - max)          5 - 18 (6) - 25         4 - 16 (6) - 25
# images / cluster
  (min - mean (std) - max)          1 - 11 (14) - 179       1 - 9 (10) - 100
EVALUATION
(Photo by John-Mark Kuznietsov)
RUN SUBMISSION
Required runs:
run 1: automated, using visual information only
run 2: automated, using textual information only
run 3: automated, using textual-visual fusion and no resources other than
those provided by the organizers
General runs:
runs 4 & 5: everything allowed, e.g. human-based, hybrid human-machine,
using external resources, etc.
OFFICIAL METRICS
Precision @ X = R/X (P@X)
where X is the cutoff point and R the number of relevant images in the top X
Cluster Recall @ X = Nc/N (CR@X)
where N is the total number of clusters for the current query and Nc
is the number of different clusters represented in the top X images
F1@X: harmonic mean of CR@X and P@X
Metrics are reported for X = {5, 10, 20, 30, 40, 50}
Official ranking: F1@20
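The metrics above translate directly into code. A minimal sketch, assuming ground truth as a set of relevant photo ids and a photo-to-cluster mapping (illustrative structures, not the official evaluation tool):

```python
# P@X, CR@X, and F1@X for one query.
#   relevant   -- set of photo ids judged relevant for the query
#   cluster_of -- dict mapping each relevant photo id to its cluster id

def precision_at(ranking, relevant, x):
    """P@X = R/X: fraction of relevant images among the top X results."""
    return sum(1 for pid in ranking[:x] if pid in relevant) / x

def cluster_recall_at(ranking, relevant, cluster_of, x):
    """CR@X = Nc/N: fraction of the query's clusters covered in the top X."""
    n = len(set(cluster_of.values()))
    nc = len({cluster_of[pid] for pid in ranking[:x] if pid in relevant})
    return nc / n

def f1_at(ranking, relevant, cluster_of, x):
    """F1@X: harmonic mean of P@X and CR@X."""
    p = precision_at(ranking, relevant, x)
    cr = cluster_recall_at(ranking, relevant, cluster_of, x)
    return 2 * p * cr / (p + cr) if p + cr else 0.0

# Toy query: 4 returned photos, 3 relevant, spread over 2 of 3 clusters
ranking = ["a", "b", "c", "d"]
relevant = {"a", "b", "d"}
cluster_of = {"a": 1, "b": 1, "d": 2, "e": 3}
for x in (5, 10, 20):  # the official cutoffs also include 30, 40, 50
    print(x, round(f1_at(ranking, relevant, cluster_of, x), 4))
```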
Flickr Baseline Results
Development data: P@20 = 0.6979, CR@20 = 0.3117, F1@20 = 0.4674
Test data: P@20 = 0.5531, CR@20 = 0.3609, F1@20 = 0.4122
[Histograms: number of queries per CR@20 and P@20 bin, on development and test data.]
BENCHMARK RESULTS 2016
(Photo by Andrew Branch)
PARTICIPANTS
Survey: 13 respondents were interested in the task, 8 of them very interested
Registration: 14 teams registered from 10 different countries
Run submission: 6 teams (incl. 2 organizer-related teams) finished the task
Workshop: 5 teams participating
SUBMITTED RUNS (29)

                          Required Runs                      General Runs
Team            Country   1 (visual) 2 (text) 3 (vis-text)   4                     5
IMS*            Austria   ✓          ✓        ✓              ✓ (visual-text)       ✗
LAPI*           Romania   ✓          ✓        ✓              ✓ (credibility)       ✓ (visual-text-credibility)
RECOD           Brazil    ✓          ✓        ✓              ✓ (visual-text)       ✓ (visual-text)
UNED            Spain     ✓          ✓        ✓              ✓ (text-human)        ✓ (visual-text)
UPMC            France    ✓          ✓        ✓              ✓ (text-credibility)  ✓ (visual-text-credibility)
USS-ENIS-REGIM  Tunisia   ✓          ✓        ✓              ✓ (visual)            ✓ (visual-text-credibility)
*organizer-related team
OFFICIAL RANKING (F1@20)

Team             Best Run                         P@20    CR@20   F1@20
UPMC             run 3 (visual-text)              0.6961  0.4938  0.5532
LAPI*            run 4 (credibility)              0.5484  0.4374  0.4638
UNED             run 4 (text-human)               0.5734  0.4252  0.4597
IMS*             run 3 (visual-text)              0.5430  0.4130  0.4471
RECOD            run 5 (visual-text)              0.5156  0.4065  0.4379
Flickr Baseline                                   0.5531  0.3609  0.4122
USS-ENIS-REGIM   run 5 (visual-text-credibility)  0.4180  0.3538  0.3637
[Scatter plots: P@20 (0.4-0.7) vs. CR@20 (0.3-0.5) for each team's best run and the Flickr baseline, with the Flickr baseline, UPMC, LAPI, and UNED highlighted on successive slides.]
[Line plots: CR@X and P@X at X = 5, 10, 20, 30, 40, 50 for each team and the Flickr baseline.]
Top 20 Flickr results: "hanging bridge"
P@20 = 0.20, CR@20 = 0.25, F1@20 = 0.22

Best achieved result: "hanging bridge"
P@20 = 0.95, CR@20 = 0.75, F1@20 = 0.84
Clusters covered: bottom up view, mid of the nature, facing a hanging bridge,
starting point, winter view, colourful bridge
LESSONS LEARNED
The dataset is becoming very complex and challenging
Different queries favour different approaches
Potential subjectivity in the annotation process
Creative Commons (CC) content on Flickr is still scarce
Acknowledgments
WWTF Project ICT12-010: Maia Zaharieva, Vienna University of Technology, Austria.
Task auxiliaries: Adrian Popescu, CEA LIST, France & Bogdan Boteanu, UPB, Romania.
Task supporters: Gabi Constantin, Lukas Diem, Ivan Eggel, Laura Fluerătoru, Ciprian Ionașcu,
Corina Macovei, Cătălin Mitrea, Irina Emilia Nicolae, Mihai Gabriel Petrescu, Andrei Purică.
Thank You!
and …
… please share media online using Creative Commons licenses
(Photo by Mario Salvo)