1.
In-depth Exploration of Geotagging Performance using Sampling Strategies on YFCC100M
George Kordopatis-Zilos, Symeon Papadopoulos, Yiannis Kompatsiaris
Information Technologies Institute, Thessaloniki, Greece
MMCommons Workshop, October 16, 2016 @ Amsterdam, NL
2.
Where is it?
Depicted landmark
Eiffel Tower
Location
Paris, Tennessee
The keyword “Tennessee” is very important to correctly place the photo.
Source (Wikipedia): http://en.wikipedia.org/wiki/Eiffel_Tower_(Paris,_Tennessee)
3.
Motivation
Evaluating multimedia retrieval systems
• What do we evaluate?
• How?
• What decisions do we make based on it?
[Diagram: MM system (black box) → Test collection → Comparison to ground truth → Evaluation measure → Decision]
4.
Problem Formulation
• Test collection creation → evaluation bias
• Performance reduced to a single measure → misses many nuances of performance
• Test problem: Geotagging = predicting the
geographic location of a multimedia item
based on its content
5.
Example: Evaluating geotagging
• Test collection #1: 1M images, 700K located in the US
• Assume we use P@1km as the evaluation measure
• System 1: almost perfect precision in the US (100%), very poor for the rest of the world (10%) → P@1km = 0.7×100% + 0.3×10% = 73%
• System 2: approximately the same precision all over the world (65%) → P@1km = 65%
• Test collection #2: 1M images, 500K depicting cats and puppies on a white background
• Then, for 50% of the collection, any prediction is essentially random.
6.
Multimedia Geotagging
• Problem of estimating the geographic location of a
multimedia item (e.g. Flickr image + metadata)
• Variety of approaches:
• Text-based: use the text metadata (tags)
• Gazetteer-based
• Statistical methods (associations between tags & locations)
• Visual
• Similarity-based (find the most similar items and use their location)
• Model-based (learn visual model of an area)
• Hybrid
• Combine text and visual
7.
Language Model
• Most likely cell: c_j = argmax_i Σ_{k=1}^{N} p(t_k | c_i)
• Tag-cell probability: p(t|c) = N_u / N_t, where N_u is the number of users that used tag t inside cell c and N_t is the total number of users that used tag t
We will refer to this as:
Base LM (or Basic)
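To make the Base LM concrete, here is a minimal sketch in Python; the (user, tags, cell) data layout and function names are assumptions for illustration, while the user-based counts follow the tag-cell probability p(t|c) = N_u / N_t above:

```python
from collections import defaultdict

def train_base_lm(items):
    """Estimate tag-cell probabilities p(t|c) from geotagged items.

    `items` is a hypothetical iterable of (user_id, tags, cell_id)
    tuples. Counts are over distinct users, matching the user-based
    tag-cell probability p(t|c) = N_u / N_t from the slide.
    """
    users_per_tag_cell = defaultdict(set)  # (tag, cell) -> users who used tag in cell
    users_per_tag = defaultdict(set)       # tag -> all users who used tag
    for user, tags, cell in items:
        for tag in set(tags):
            users_per_tag_cell[(tag, cell)].add(user)
            users_per_tag[tag].add(user)
    return {(tag, cell): len(u) / len(users_per_tag[tag])
            for (tag, cell), u in users_per_tag_cell.items()}

def most_likely_cell(tags, tag_cell_prob, cells):
    """Pick argmax over candidate cells of the summed tag-cell probabilities."""
    return max(cells, key=lambda c: sum(tag_cell_prob.get((t, c), 0.0)
                                        for t in tags))
```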
8.
Language Model Extensions
• Feature selection
• Discard tags that do not provide any geographical cues
• Selection criterion: locality > 0
• Feature weighting
• More importance to tags with geographic information
• Linear combination of locality and spatial entropy
• Multiple grids
• Consider two grids, fine and coarse: if the estimate from the fine grid falls within the cell estimated on the coarse grid, use the fine-grid estimate (see the sketch after this list)
• Similarity Search
• Within the selected cell, use the lat/lon of the most similar item to refine the location estimate
We will refer to this as:
Full LM (or Full)
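The multiple-grids extension can be illustrated with a short sketch; the (row, col) cell representation and the fine-to-coarse ratio are assumptions, not the paper's exact implementation:

```python
def combine_grids(fine_cell, coarse_cell, ratio=10):
    """Multiple-grid heuristic: keep the fine-grid estimate only when
    it falls inside the coarse-grid estimate; otherwise fall back to
    the coarse one.

    Cells are assumed to be (row, col) indices, and `ratio` the
    (hypothetical) number of fine cells per coarse-cell side.
    """
    fine_row, fine_col = fine_cell
    if (fine_row // ratio, fine_col // ratio) == coarse_cell:
        return 'fine', fine_cell
    return 'coarse', coarse_cell
```

For example, combine_grids((42, 17), (4, 1)) keeps the fine estimate, since fine cell (42, 17) maps into coarse cell (4, 1).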
9.
MediaEval Placing Task
• Benchmarking activity in the context of MediaEval
• Dataset:
• Flickr images and videos (different each year)
• Training and test set
• Also possible to test systems that use external data
Edition   Training Set   Test Set
2015      4,695,149      949,889
2014      5,025,000      510,000
2013      8,539,050      262,000
10.
Proposed Evaluation Framework
• Initial (reference) test collection Dref
• Sampling function f: Dref → Dtest
• Performance volatility
• p(D): performance score achieved in collection D
• In our case, we consider two such measures:
• P@1km
• Median distance error
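Both measures are straightforward to compute from predicted and ground-truth coordinates; a minimal sketch, assuming parallel lists of (lat, lon) pairs and the standard haversine great-circle distance:

```python
import math
import statistics

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points."""
    r = 6371.0  # mean Earth radius in km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = math.radians(lat2 - lat1)
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def evaluate(predictions, ground_truth):
    """Compute P@1km and median distance error over a collection.

    `predictions` and `ground_truth` are assumed to be parallel
    lists of (lat, lon) pairs.
    """
    errors = [haversine_km(p[0], p[1], g[0], g[1])
              for p, g in zip(predictions, ground_truth)]
    p_at_1km = sum(e <= 1.0 for e in errors) / len(errors)
    return p_at_1km, statistics.median(errors)
```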
11.
Sampling Strategies
A variety of strategies for sampling the Placing Task collection:
• Geographical Uniform Sampling
• User Uniform Sampling
• Text-based Sampling
• Text Diversity Sampling
• Geographically Focused Sampling
• Ambiguity-based Sampling
• Visual Sampling
12.
Uniform Sampling
• Geographic Uniform Sampling
• Divide the earth's surface into square areas of approximately the same size (~10×10 km)
• Select N items from each area (N=median of items/area)
• User Uniform Sampling
• Select only one item per user
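A sketch of both uniform sampling strategies, assuming items are dicts with 'user', 'lat', and 'lon' keys; the fixed-degree grid below is a crude stand-in for the ~10×10 km equal-area cells:

```python
import random
from collections import defaultdict
from statistics import median

def user_uniform_sample(items, seed=0):
    """User Uniform Sampling: keep exactly one (random) item per user."""
    rng = random.Random(seed)
    by_user = defaultdict(list)
    for item in items:
        by_user[item['user']].append(item)
    return [rng.choice(group) for group in by_user.values()]

def geo_uniform_sample(items, seed=0):
    """Geographic Uniform Sampling: keep at most N items per cell,
    with N = median number of items per non-empty cell."""
    rng = random.Random(seed)
    by_cell = defaultdict(list)
    for item in items:
        # ~0.09 degrees of latitude is roughly 10 km; a crude stand-in
        # for the equal-area ~10x10 km cells described on the slide.
        cell = (int(item['lat'] // 0.09), int(item['lon'] // 0.09))
        by_cell[cell].append(item)
    n = int(median(len(g) for g in by_cell.values()))
    return [it for g in by_cell.values()
            for it in rng.sample(g, min(n, len(g)))]
```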
13.
Text Sampling
• Text-based Sampling
• Select only items with more than M terms (M: median
of terms/item)
• Text Diversity Sampling
• Represent items using bag-of-words
• Use MinHash to generate a binary code per BoW vector
• Select one item per code (bucket) B
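A toy sketch of the two text strategies; the one-bit-per-hash MinHash code below (essentially 1-bit minwise hashing) stands in for whatever signature length the actual pipeline used, and the 'terms' field is an assumption:

```python
import hashlib

def minhash_code(terms, num_hashes=16):
    """Binary code from MinHash: for each of `num_hashes` hash
    functions, take the minimum hash over the terms and keep its
    lowest bit. Items with similar term sets tend to share codes."""
    bits = []
    for i in range(num_hashes):
        min_h = min(int(hashlib.md5(f"{i}:{t}".encode()).hexdigest(), 16)
                    for t in terms)
        bits.append(min_h & 1)
    return tuple(bits)

def text_diversity_sample(items):
    """Text Diversity Sampling: keep one item per MinHash bucket."""
    buckets = {}
    for item in items:
        if item['terms']:
            buckets.setdefault(minhash_code(item['terms']), item)
    return list(buckets.values())

def text_based_sample(items, m):
    """Text-based Sampling: keep items with more than M terms."""
    return [it for it in items if len(it['terms']) > m]
```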
14.
Other Sampling Strategies
• Geographically Focused Sampling
• Pick items from a selected place (continent/country)
• Ambiguity-based Sampling
• Select the set of items that are associated with
ambiguous place names (or the complementary set)
• Ambiguity defined with the help of entropy
• Visual Sampling
• Select only items associated with a given visual concept
• Select only items associated with concepts related to
buildings
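Ambiguity-based sampling can be sketched by scoring each place name with the Shannon entropy of its cell distribution; the data layout and the entropy threshold are assumptions for illustration:

```python
import math
from collections import Counter

def name_entropy(cells):
    """Shannon entropy of the cell distribution of a place name.

    `cells` is assumed to be a list of grid cells in which items
    tagged with the name occur; high entropy means the name is spread
    over many areas, i.e. geographically ambiguous."""
    counts = Counter(cells)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total)
                for c in counts.values())

def ambiguity_sample(items, name_cells, threshold=1.0, ambiguous=True):
    """Keep items whose place names exceed (ambiguous=True) or fall
    below (ambiguous=False) a hypothetical entropy threshold."""
    keep = []
    for item in items:
        h = max((name_entropy(name_cells[n]) for n in item['place_names']
                 if n in name_cells), default=0.0)
        if (h > threshold) == ambiguous:
            keep.append(item)
    return keep
```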
15.
Experiments - Setup
• Placing Task 2015 dataset: 949,889 images (subset
of YFCC100M)
• Test four variants of Language Model method:
• Basic-PT: Base LM method trained on the PT dataset (≈4.7M geotagged images released by the task organizers)
• Full-PT: Full LM method trained on the PT dataset
• Basic-Y: Base LM method trained on the YFCC dataset (≈40M geotagged images of YFCC100M)
• Full-Y: Full LM method trained on the YFCC dataset
25.
Thank you!
Data/Code:
• https://github.com/MKLab-ITI/multimedia-geotagging/
Get in touch:
• George Kordopatis-Zilos: georgekordopatis@iti.gr
• Symeon Papadopoulos: papadop@iti.gr / @sympap