Ghent and Cardiff University at the 2012 Placing Task
1. Ghent and Cardiff University at the 2012 Placing Task (UG-CU)
Olivier Van Laere, Bart Dhoedt
Department of Information Technology (INTEC)
Ghent University, Belgium
Steven Schockaert, Jonathan A. Quinn, Frank C. Langbein
School of Computer Science & Informatics
Cardiff University, United Kingdom
Department of Information Technology – Broadband Communication Networks (IBCN)
MediaEval2012 Workshop, October 4-5, 2012, Pisa, Italy
2. Lessons from last year
Using a prior in our language models that includes
information from the user’s home location
significantly boosts the results
Clear need for a feature selection technique
tailored to this task
E.g. the WISTUD approach from 2011
3.–7. Lessons from last year
[Example photos from this year's training data, each shown with a single tag: italy, sicily, sea, pisa, leaningtower]
8. Lessons from last year
Using a prior in our language models that includes
information from the user’s home location
significantly boosts the results
Clear need for a feature selection technique
tailored to this task
E.g. the WISTUD approach from 2011
Need for handling videos without tags:
43.4% of the test data, compared to 16.1% last year
We try to georeference the item using the 10,000-area
clustering, and fall back to the 2500- and 500-area
clusterings in the absence of textual information
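The multi-level fallback described above can be sketched roughly as follows; the `classifiers` mapping and its toy behaviour are purely illustrative stand-ins, not the actual UG-CU implementation:

```python
# Illustrative sketch of the multi-level fallback; the classifier
# objects here are toy stand-ins, not the actual UG-CU models.

def georeference(item_tags, classifiers, levels=(10000, 2500, 500)):
    """Try the finest clustering first, falling back to coarser
    clusterings when no area can be assigned at that level."""
    for k in levels:
        area = classifiers[k](item_tags)
        if area is not None:
            return k, area
    return None, None

# Toy classifiers: the finest level fails, a coarser one succeeds.
classifiers = {
    10000: lambda tags: None,
    2500:  lambda tags: "area_42" if tags else None,
    500:   lambda tags: "area_7",
}

print(georeference(["pisa", "leaningtower"], classifiers))  # prints: (2500, 'area_42')
```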
9. Data
~2.1M of the original ~3M task training photos
Run 2: extracted SIFT features, to the extent the
images were still available on Flickr
Run 5: ~17.1M Flickr photos
Crawled in 2011, geotag accuracy 16 (~street level)
Gazetteer: the Google Geocoding API
Used to reverse-geocode the “home” field from the
user’s profile
10. Approach – two steps
Data clustered into 500, 2500 and 10,000 areas
Feature vocabulary selection for each of those clusterings
Language models are used to select the most likely
area to contain the given test video, based on
textual information
A similarity search, using the textual information, is
then used to select a location within this area, based
on the most similar training items
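The area-selection step can be illustrated with a smoothed unigram language model; the additive smoothing, the tag counts, and all numbers below are illustrative assumptions, not the system's actual parameters or data:

```python
import math

def most_likely_area(tags, area_models, vocab_size, mu=1.0):
    """Pick the area whose smoothed unigram language model gives the
    test item's tags the highest log-likelihood. Additive smoothing
    with mu is an illustrative choice, not the system's actual one."""
    best_area, best_score = None, float("-inf")
    for area, counts in area_models.items():
        total = sum(counts.values())
        score = 0.0
        for t in tags:
            p = (counts.get(t, 0) + mu) / (total + mu * vocab_size)
            score += math.log(p)
        if score > best_score:
            best_area, best_score = area, score
    return best_area

# Toy tag counts per area (invented numbers).
areas = {
    "pisa":   {"pisa": 40, "leaningtower": 25, "italy": 30},
    "sicily": {"sicily": 50, "sea": 35, "italy": 20},
}
print(most_likely_area(["leaningtower", "italy"], areas, vocab_size=6))  # prints: pisa
```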
11. Approach – main differences
Adopted feature selection method from WISTUD
In case a video has no tags, use:
The textual home location from the user, and the video
title and description, treated as Flickr tags
In case there is no textual information at all, default
to London
If available and considered reliable, include visual
similarity
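A minimal sketch of the tag fallback, with hypothetical field names (`home_location`, `title`, `description`); the real metadata schema may differ:

```python
# Hypothetical field names; the real metadata schema may differ.

def effective_tags(video):
    """If the video has tags, use them; otherwise fall back to the
    user's textual home location plus the title and description,
    treated as if they were Flickr tags."""
    if video.get("tags"):
        return video["tags"]
    words = []
    for field in ("home_location", "title", "description"):
        words += video.get(field, "").lower().split()
    return words  # if still empty, the system defaults to London
```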
12. Approach – similarity search
Instead of returning the location of the most similar
training item (using the Jaccard index), we consider
3 candidate locations:
Most similar training photo
Home location of the owner (if allowed and available)
Visually most similar training photo
We choose the location that minimizes a certain score
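A rough sketch of such a score, assuming a Jaccard-weighted sum of distances to the most similar training photos; the exponent follows the λ = 5 mentioned in the editor's notes, but the exact combination is our guess, not the authors' published formula:

```python
import math

def jaccard(a, b):
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if (a or b) else 0.0

def dist(p, q):
    # Straight-line distance on raw coordinates; a real system would
    # use geodesic distance on latitude/longitude.
    return math.hypot(p[0] - q[0], p[1] - q[1])

def pick_location(candidates, similar, test_tags, lam=5):
    """Among the candidate locations, pick the one minimising a
    Jaccard-weighted sum of distances to the most similar training
    photos (a guess at the score, not the published definition)."""
    def score(p):
        return sum(jaccard(s["tags"], test_tags) ** lam * dist(p, s["loc"])
                   for s in similar)
    return min(candidates, key=score)

similar = [{"tags": ["pisa", "tower"], "loc": (0.0, 0.0)},
           {"tags": ["sea"],           "loc": (10.0, 10.0)}]
print(pick_location([(0.0, 0.0), (10.0, 10.0)], similar, ["pisa", "tower"]))  # prints: (0.0, 0.0)
```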
13. Results and discussion - dev
2011 test  1km     10km    100km   1000km  10,000km
run1       23.28%  44.62%  62.46%  75.00%  97.38%
run2       24.20%  51.49%  72.62%  85.62%  97.85%
run3       23.62%  49.84%  70.30%  84.14%  97.83%
run4        0.04%   0.11%   0.92%  11.67%  81.02%
run5       48.01%  65.98%  76.85%  87.38%  98.43%

2012 dev   1km     10km    100km   1000km  10,000km
run1       24.18%  53.13%  72.71%  85.15%  98.19%
run2       24.65%  54.25%  75.05%  86.82%  98.34%
run3       24.59%  54.25%  75.01%  86.82%  98.34%
run4        0.58%   2.69%   5.82%  21.45%  92.07%
run5       47.52%  66.04%  76.83%  86.65%  97.66%
18. Results and discussion - test
2012 test  1km     10km    100km   1000km  10,000km
run1       10.98%  28.10%  41.54%  57.91%  89.41%
run2       11.36%  29.65%  47.18%  61.19%  89.98%
run3       11.36%  29.65%  47.18%  61.19%  89.98%
run4        0.10%   0.74%   2.56%  21.21%  91.37%
run5       20.61%  34.24%  47.42%  59.47%  89.74%
22. Conclusions
Using textual home locations, and the title and
description of the video, we can considerably improve
the results
SIFT features may help in some particular cases, but
the computational cost seems hard to justify for this task
There seems to be scope for improving the results with
feature selection techniques tailored to this task
Witnessed by replacing our chi-squared based method
with the approach from WISTUD 2011
23. Questions ?
Olivier Van Laere
Olivier.VanLaere@intec.ugent.be
www.ibcn.intec.ugent.be
INTEC Broadband Communication Networks (IBCN)
Department of Information Technology (INTEC)
Ghent University - IBBT
Editor's Notes
To give an idea of this year's training data, here are a couple of examples.
Photo contents, per slide: Sicily; the sea; Pisa; Pisa (leaning tower).
where S contains the 10 most similar photos from the chosen cluster in terms of the Jaccard index, dist(p,s) is the straight-line distance between p and the location of photo s, jaccard(s,x) is the Jaccard similarity between s and the test video x and λ = 5.
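The score itself is not written out on the slide; given the ingredients listed in the note above, one plausible form (our reconstruction, not taken from the paper) is:

```latex
\operatorname{score}(p) \;=\; \sum_{s \in S} \operatorname{jaccard}(s, x)^{\lambda} \cdot \operatorname{dist}(p, s),
\qquad \lambda = 5
```

under which the candidate location p minimizing this Jaccard-weighted sum of distances to the most similar training photos would be chosen.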
Development data: 2011 test = 2012 dev. By adopting these changes, we manage to increase the results for our first run.
Development data: 2011 test = 2012 dev. To the extent that the results of run 1 are in the same range as run 2, which means we can achieve results similar to using a gazetteer, but without actually using it.
Interesting to note is that there is a small but visible difference between run 2 and run 3. Run 2 uses SIFT features, run 3 does not. There was a difference at the sub-kilometer threshold for about 6 videos: landmarks.
Please note the minor difference in the results of run 5, while the approach is quite different: 17.1M training items instead of 10M, no multilevel clustering and no Dempster-Shafer combination, just the 500-area clustering plus a lot of training data for the similarity search.
Also note that last year, run 5 differed significantly from the others
The results clearly still benefit from using a gazetteer
Using the visual features has not made any difference at all. This shows the difficulty of combining visual similarity with the textual information: it was hard to determine a reliable visual match, so we adopted a very cautious acceptance criterion, apparently sidelining the visual information in these cases.
Run 5 still clearly outperforms run 1
But it is noteworthy that already at the 100km threshold, run 2 catches up, and it even outperforms run 5 at the 1000km threshold, with only 2.1M training items vs 17.1M. The main difference here is that run 5 only uses a single clustering and no fallback.