Ghent and Cardiff University at the 2012 Placing Task
1. Ghent and Cardiff University at the 2012 Placing Task (UG-CU)
Olivier Van Laere, Bart Dhoedt
Department of Information Technology (INTEC)
Ghent University, Belgium
Steven Schockaert, Jonathan A. Quinn, Frank C. Langbein
School of Computer Science & Informatics
Cardiff University, United Kingdom
Department of Information Technology – Broadband Communication Networks (IBCN)
MediaEval2012 Workshop, October 4-5, 2012, Pisa, Italy
2. Lessons from last year
Using a prior in our language models that includes
information from the user’s home location
significantly boosts the results
Clear need for a feature selection technique
tailored to this task
E.g. the WISTUD approach from 2011
3.–7. Lessons from last year
[Example photos from this year's training data, each shown with a single tag: italy, sicily, sea, pisa, leaningtower]
8. Lessons from last year
Using a prior in our language models that includes
information from the user’s home location
significantly boosts the results
Clear need for a feature selection technique
tailored to this task
E.g. the WISTUD approach from 2011
Need for handling videos without tags:
43.4% of the test data, compared to 16.1% last year
We try to georeference the item using the 10,000-area
clustering, and fall back to the 2500- and 500-area
clusterings in the absence of textual information
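The multi-level fallback described above can be sketched roughly as follows; the `classifiers` mapping and its toy behaviour are purely illustrative stand-ins, not the actual UG-CU implementation:

```python
# Illustrative sketch of the multi-level fallback; the classifier
# objects here are toy stand-ins, not the actual UG-CU models.

def georeference(item_tags, classifiers, levels=(10000, 2500, 500)):
    """Try the finest clustering first, falling back to coarser
    clusterings when no area can be assigned at that level."""
    for k in levels:
        area = classifiers[k](item_tags)
        if area is not None:
            return k, area
    return None, None

# Toy classifiers: the finest level fails, a coarser one succeeds.
classifiers = {
    10000: lambda tags: None,
    2500:  lambda tags: "area_42" if tags else None,
    500:   lambda tags: "area_7",
}

print(georeference(["pisa", "leaningtower"], classifiers))  # prints: (2500, 'area_42')
```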
9. Data
~2.1M of the original ~3M task training photos
Run 2: extracted SIFT features, to the extent the
images were still available on Flickr
Run 5: ~17.1M Flickr photos
Crawled in 2011, geotag accuracy 16 (~street level)
Gazetteer: the Google Geocoding API
Used to reverse-geocode the “home” field from the
user’s profile
10. Approach – two steps
Data clustered into 500, 2500 and 10,000 areas
Feature vocabulary selection for each of those clusterings
Language models are used to select the most likely
area to contain the given test video, based on
textual information
A similarity search, using the textual information, is
then used to select a location within this area, based
on the most similar training items
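The area-selection step can be illustrated with a smoothed unigram language model; the additive smoothing, the tag counts, and all numbers below are illustrative assumptions, not the system's actual parameters or data:

```python
import math

def most_likely_area(tags, area_models, vocab_size, mu=1.0):
    """Pick the area whose smoothed unigram language model gives the
    test item's tags the highest log-likelihood. Additive smoothing
    with mu is an illustrative choice, not the system's actual one."""
    best_area, best_score = None, float("-inf")
    for area, counts in area_models.items():
        total = sum(counts.values())
        score = 0.0
        for t in tags:
            p = (counts.get(t, 0) + mu) / (total + mu * vocab_size)
            score += math.log(p)
        if score > best_score:
            best_area, best_score = area, score
    return best_area

# Toy tag counts per area (invented numbers).
areas = {
    "pisa":   {"pisa": 40, "leaningtower": 25, "italy": 30},
    "sicily": {"sicily": 50, "sea": 35, "italy": 20},
}
print(most_likely_area(["leaningtower", "italy"], areas, vocab_size=6))  # prints: pisa
```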
11. Approach – main differences
Adopted feature selection method from WISTUD
In case a video has no tags, use:
The textual home location from the user, and the video
title and description, treated as Flickr tags
In case there is no textual information at all, default
to London
If available and considered reliable, include visual
similarity
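A minimal sketch of the tag fallback, with hypothetical field names (`home_location`, `title`, `description`); the real metadata schema may differ:

```python
# Hypothetical field names; the real metadata schema may differ.

def effective_tags(video):
    """If the video has tags, use them; otherwise fall back to the
    user's textual home location plus the title and description,
    treated as if they were Flickr tags."""
    if video.get("tags"):
        return video["tags"]
    words = []
    for field in ("home_location", "title", "description"):
        words += video.get(field, "").lower().split()
    return words  # if still empty, the system defaults to London
```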
12. Approach – similarity search
Instead of returning the location of the most similar
training item (using the Jaccard index), we consider
3 candidate locations:
Most similar training photo
Home location of the owner (if allowed and available)
Visually most similar training photo
We choose the location that minimizes a certain score
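A rough sketch of such a score, assuming a Jaccard-weighted sum of distances to the most similar training photos; the exponent follows the λ = 5 mentioned in the editor's notes, but the exact combination is our guess, not the authors' published formula:

```python
import math

def jaccard(a, b):
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if (a or b) else 0.0

def dist(p, q):
    # Straight-line distance on raw coordinates; a real system would
    # use geodesic distance on latitude/longitude.
    return math.hypot(p[0] - q[0], p[1] - q[1])

def pick_location(candidates, similar, test_tags, lam=5):
    """Among the candidate locations, pick the one minimising a
    Jaccard-weighted sum of distances to the most similar training
    photos (a guess at the score, not the published definition)."""
    def score(p):
        return sum(jaccard(s["tags"], test_tags) ** lam * dist(p, s["loc"])
                   for s in similar)
    return min(candidates, key=score)

similar = [{"tags": ["pisa", "tower"], "loc": (0.0, 0.0)},
           {"tags": ["sea"],           "loc": (10.0, 10.0)}]
print(pick_location([(0.0, 0.0), (10.0, 10.0)], similar, ["pisa", "tower"]))  # prints: (0.0, 0.0)
```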
13. Results and discussion - dev
2011 test  1km     10km    100km   1000km  10,000km
run1       23.28%  44.62%  62.46%  75.00%  97.38%
run2       24.20%  51.49%  72.62%  85.62%  97.85%
run3       23.62%  49.84%  70.30%  84.14%  97.83%
run4        0.04%   0.11%   0.92%  11.67%  81.02%
run5       48.01%  65.98%  76.85%  87.38%  98.43%

2012 dev   1km     10km    100km   1000km  10,000km
run1       24.18%  53.13%  72.71%  85.15%  98.19%
run2       24.65%  54.25%  75.05%  86.82%  98.34%
run3       24.59%  54.25%  75.01%  86.82%  98.34%
run4        0.58%   2.69%   5.82%  21.45%  92.07%
run5       47.52%  66.04%  76.83%  86.65%  97.66%
18. Results and discussion - test
2012 test  1km     10km    100km   1000km  10,000km
run1       10.98%  28.10%  41.54%  57.91%  89.41%
run2       11.36%  29.65%  47.18%  61.19%  89.98%
run3       11.36%  29.65%  47.18%  61.19%  89.98%
run4        0.10%   0.74%   2.56%  21.21%  91.37%
run5       20.61%  34.24%  47.42%  59.47%  89.74%
22. Conclusions
Using textual home locations, and the title and
description of the video, we can considerably improve
the results
SIFT features may help in some particular cases, but
the computational cost seems hard to justify for this task
There seems to be scope for improving the results with
feature selection techniques tailored to this task
Witnessed by replacing our chi-squared based method
with the approach from WISTUD 2011
23. Questions ?
Olivier Van Laere
Olivier.VanLaere@intec.ugent.be
www.ibcn.intec.ugent.be
INTEC Broadband Communication Networks (IBCN)
Department of Information Technology (INTEC)
Ghent University - IBBT
Editor's Notes
To give an idea of this year's training data, here are a couple of examples.
Photo contents, per slide: Sicily; the sea; Pisa; Pisa (leaning tower).
where S contains the 10 most similar photos from the chosen cluster in terms of the Jaccard index, dist(p,s) is the straight-line distance between p and the location of photo s, jaccard(s,x) is the Jaccard similarity between s and the test video x and λ = 5.
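The score itself is not written out on the slide; given the ingredients listed in the note above, one plausible form (our reconstruction, not taken from the paper) is:

```latex
\operatorname{score}(p) \;=\; \sum_{s \in S} \operatorname{jaccard}(s, x)^{\lambda} \cdot \operatorname{dist}(p, s),
\qquad \lambda = 5
```

under which the candidate location p minimizing this Jaccard-weighted sum of distances to the most similar training photos would be chosen.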
Development data: 2011 test = 2012 dev. By adopting these changes, we manage to increase the results for our first run.
Development data: 2011 test = 2012 dev. To the extent that the results of run 1 are in the same range as run 2, which means we can achieve results similar to using a gazetteer, but without actually using it.
Interesting to note is that there is a small but visible difference between run 2 and run 3. Run 2 uses SIFT features, run 3 does not. There was a difference at the sub-kilometer threshold for about 6 videos: landmarks.
Please note the minor difference in the results of run 5, while the approach is quite different: 17.1M training items instead of 10M, no multilevel clustering and no Dempster-Shafer combination, just the 500-area clustering plus a lot of training data for the similarity search.
Also note that last year, run 5 differed significantly from the others
The results clearly still benefit from using a gazetteer
Using the visual features has not made any difference at all. This shows the difficulty of combining visual similarity with the textual information: it was hard to determine a reliable visual match, so we adopted a very cautious acceptance criterion, apparently sidelining the visual information in these cases.
Run 5 still clearly outperforms run 1
But it is noteworthy that already at the 100km threshold, run 2 catches up, and it even outperforms run 5 at the 1000km threshold, with only 2.1M training items vs 17.1M. The main difference here is that run 5 only uses a single clustering and no fallback.