This paper describes our Georeferencing approaches, experiments, and results at the MediaEval 2014 Placing Task evaluation. The task consists of predicting the most probable geographical coordinates of Flickr images and videos using its visual, audio and metadata associated features. Our approaches used only Flickr users textual metadata annotations and tagsets. We used four approaches for this task: 1) an approach based on Geographical Knowledge Bases (GeoKB), 2) the Hiemstra Language Model (HLM) approach with Re-Ranking, 3) a combination of the GeoKB and the HLM (GeoFusion). 4) a combination of the GeoFusion with a HLM model derived from the English Wikipedia georeferenced pages. The HLM approach with Re-Ranking showed the best performance within 10m to 1km distances. The GeoFusion approaches achieved the best results within the margin of errors from 10km to 5000km.
http://ceur-ws.org/Vol-1263/mediaeval2014_submission_77.pdf
Advancing Spatio-temporal Analysis of Ecological Data Examples in R.pdfSabrina Green
Similar to TALP-UPC at MediaEval 2014 Placing Task: Combining Geographical Knowledge Bases and Language Models for Large-Scale Textual Georeferencing (20)
TALP-UPC at MediaEval 2014 Placing Task: Combining Geographical Knowledge Bases and Language Models for Large-Scale Textual Georeferencing
1. TALP-UPC at MediaEval 2014 Placing Task
TALP-UPC at MediaEval 2014 Placing Task:
Combining Geographical Knowledge Bases and
Language Models for Large-Scale Textual
Georeferencing
Daniel Ferr´es and Horacio Rodr´ıguez
TALP Research Center
Computer Science Department
Universitat Polit`ecnica de Catalunya
Barcelona, Spain
17th October 2014
2. TALP-UPC at MediaEval 2014 Placing Task
Introduction
Textual Approaches: we use only textual metadata to
predict
Title
Description
User Tags (keywords)
Approaches.
1) GeoKB. Geographical Knowledge Heuristics
2) Hiemstra Language Model with Re-Ranking (HLM-RR)
3) GeoFusion: combines results from approaches GeoKB and
HLM-RR
4) GeoFusion+GeoWiki: combines results from GeoKB,
HLM-RR, and a HLM model of English Wikipedia
Georeferenced pages.
3. TALP-UPC at MediaEval 2014 Placing Task
Geographical Knowledge Approach
System Architecture
Place Name Detection with Geonames Gazetteer.
Filtering Geographically ambiguous words with Multilingual
Stopwords Lists and the English Dictionary.
Geographical Focus Detection with Toponym Disambiguation
Heuristics.
4. TALP-UPC at MediaEval 2014 Placing Task
Geographical Focus Detection Heuristics
Geographical Knowledge Heuristics (high priority):
GH1) select the most populated place that is not a state,
country or continent and has its state apearing in the text,
GH2) otherwise select the most populated place that is not a
state, country or continent and has its country apearing in the
text,
GH3) otherwise select the most populated state that has its
country apearing in the text
otherwise try Population rules
Population Heuristics (low priority):
PH1) if a place exists select the most populated place that is
not a country, state or a continent,
PH2) otherwise if exists a state, select the most populated one,
PH3) otherwise select the most populated country,
PH4) otherwise select the most populated continent.
5. TALP-UPC at MediaEval 2014 Placing Task
Information Retrieval with Re-Ranking Approach
Terrier1 IR software (version 3.0) with the Hiemstra Language
Modelling (HLM) weighting model.
Indexing: coordinates pairs (document identifier) and tagsets
(document text).
Retrieval: topic composed by Metadata fields (title,
description and user tags).
Re-Ranking the IR results (1000 top documents).
Get the subset of top-weighted coordinate pairs.
Recalculate the weights of the coordinates of the subset by
adding the weights of neighbours up to a X distance in Kms.
Get the top-weighted coordinate pair.
1
Terrier. http://terrier.org
6. TALP-UPC at MediaEval 2014 Placing Task
GeoFusion Approach
Combines the results of the GeoKB approach and the IR
approach with Re-Ranking
Geographical Knowledge Rules (GH1, GH2 and GH3) if
activated are selected in priority order to get the prediction.
Otherwise, the prediction comes from the IR Re-Ranking
approach.
7. TALP-UPC at MediaEval 2014 Placing Task
GeoFusion+GeoWiki Approach
Combines the results of the GeoKB approach, the HLM with
Re-Ranking approach, and a HLM model of the English
Wikipedia georeferenced pages (857,574 pages).
Geographical Knowledge Rules (GH1, GH2 and GH3) if
activated are selected in priority order to get the prediction.
Otherwise, the prediction comes from the HLM Re-Ranking
approach.
The HLM model of the georeferenced Wikipedia is used if the
score of the HLM Re-Ranking approach does not achieves a
threshold.
8. TALP-UPC at MediaEval 2014 Placing Task
Training Corpora for IR
Corpora used
Placing Task 2014 Training dataset: (5,025,000
photos/videos). (after a filtering process we get a set of
3,057,718 coordinates pairs with related metadata)
English Wikipedia georeferenced pages2 ( a set of 857,574
pages with associated coordinates)
2
http://de.wikipedia.org/wiki/Wikipedia:
WikiProjekt_Georeferenzierung/Hauptseite/Wikipedia-World/en
10. TALP-UPC at MediaEval 2014 Placing Task
Results II
Table : Percentage of correctly georeferenced photos/videos within
certain amount of kilometers and median error for each run.
Margin run1 run3 run4 run5
10m 0.29 0.08 0.23 0.23
100m 4.12 0.80 3.00 3.00
1km 16.54 10.71 15.90 15.90
10km 34.34 33.89 38.52 38.53
100km 51.06 42.35 52.47 52.47
1000km 64.67 52.54 65.87 65.86
5000km 78.63 69.84 79.29 79.28
Median Error (kms) 83.98 602.21 64.36 64.41
11. TALP-UPC at MediaEval 2014 Placing Task
Conclusions
The HLM approach with Re-Ranking showed the best
performance within 10m to 1km distances.
GeoFusion and GeoFusion+GeoWiki outperform the other
approaches within the margin of errors from 10km to 5000km..
GeoKB rules achieved 81.17% of accuracy predicting up to
100km.
Difficult cases: few textual information.