4B_3: Automatically generating keywords for georeferenced images (Presentation Transcript)
Automatically generating keywords for georeferenced images Ross Purves (1), Alistair Edwardes (1), Xin Fan (2), Mark Hall (2) and Martin Tomko (1); 1 University of Zurich, 2 University of Sheffield, 3 Cardiff University
What I want to talk about… How are images indexed? How do we describe images? How can we (semi-)automatically generate (geographically related) keywords that describe image content?
Image indexing Image indexing is crucial, since image search is almost always based on keywords. Keywords are assigned in four main ways:
Manual annotation:
- Terms freely chosen by indexers (these might include tags nowadays)
- Terms selected from a controlled vocabulary
Automatic annotation:
- Terms extracted from text thought to be related to an image
- Terms assigned to an image on the basis of content-based techniques
…but… Image search (based on such annotation) often does not meet user expectations
The semantic gap… “…the lack of coincidence between the information that one can extract from the visual data and the interpretation that the same data have for a user in a given situation.” (Smeulders et al., 2000)
…in other words Search fails because annotations are not the same as search terms… …unlike text search, we are using a proxy for content. The challenge is therefore to develop methods that better match user expectations -> and are therefore more universal. Taken up by the ESP Game / Google's Image Labeler
How should we describe these pictures? (Images: Javier Corripio; Google Street View)
Theory of image description Panofsky-Shatford facet matrix – Shatford (1986)
The where facet In the Tripod project, we are especially interested in describing images based on where they were taken. We suppose that, in the medium term, all images will be georeferenced. The Panofsky-Shatford matrix suggests some ways we might describe images. I'm going to concentrate on where/generic of (Martin will talk tomorrow about one element of the where/specific of)
Where/ generic of The generic of represents a kind of place Kinds may relate to basic levels Basic levels (e.g. Rosch, 1977) are terms used in natural language which are informative and summative (e.g. table vs. furniture or square table) Basic levels are probably very good indexing terms For example: Mountains, valleys, desert, ravines Street, pavement, house Empirical research has explored what these basic levels are in human subject experiments and we explored them in UGC/VGI How can we generate terms which relate to these kinds automatically?
Basic process Start with an image associated with coordinates and (sometimes) a direction:
1. Identify potential visible area
2. Query spatial data within the visible area for candidate keywords
3. Rank and filter candidate keywords to generate the final keyword list
Identifying visible area Camera parameters extracted from EXIF. A content-based check for buildings, combined with landcover data, determines the urban or rural case – this controls the range of the viewshed. If no direction information is available, a 360° viewshed is generated; otherwise a sector is defined on the basis of camera parameters
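The sector construction described above can be sketched as follows. This is a minimal illustration, not the project's implementation: the function name, the default field of view, the viewing range and the crude metres-to-degrees conversion are all assumptions, and a real system would derive range from the urban/rural check and the field of view from the EXIF focal length.

```python
import math

def viewshed_sector(lon, lat, bearing_deg=None, fov_deg=60.0,
                    range_m=5000.0, steps=36):
    """Approximate the potentially visible area as a polygon.

    If a camera bearing is known, a sector of width fov_deg is built
    around it; with no direction information, a full 360-degree circle
    of the same range is returned instead.
    """
    deg_per_m = 1.0 / 111_320.0  # rough metres-to-degrees near the equator
    r = range_m * deg_per_m
    if bearing_deg is None:
        start, end = 0.0, 360.0
        pts = []                  # circle: no apex vertex needed
    else:
        start = bearing_deg - fov_deg / 2.0
        end = bearing_deg + fov_deg / 2.0
        pts = [(lon, lat)]        # sector: camera position is the apex
    for i in range(steps + 1):
        a = math.radians(start + (end - start) * i / steps)
        # bearing 0 = north, so sin gives the east offset, cos the north
        pts.append((lon + r * math.sin(a), lat + r * math.cos(a)))
    return pts
```

The resulting vertex list can then be passed to any spatial query engine that accepts polygons.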
Basic keywording process Identify available data in the region. Query data within the viewshed for data classes. Map data classes to (potentially multiple) concepts and remove duplicates. Expand concepts to multiple (potentially multi-lingual) candidate keywords. Rank candidate keywords according to area, probabilistic salience and web salience
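The class-to-concept mapping and keyword expansion can be sketched as below. The mapping tables here are hypothetical placeholders: the slides mention a concept ontology and multiple data providers but give no concrete class names, so both dictionaries and the function name are assumptions for illustration only.

```python
# Hypothetical class-to-concept table; in practice this would come from
# the concept ontology and each provider's classification scheme.
CLASS_TO_CONCEPTS = {
    "osm:natural=water": ["lake", "reservoir"],
    "nma:woodland": ["forest", "woodland"],
    "osm:landuse=residential": ["settlement"],
}

# Hypothetical concept expansion, potentially multi-lingual.
CONCEPT_TO_KEYWORDS = {
    "lake": ["lake", "See"],
    "reservoir": ["reservoir"],
    "forest": ["forest", "Wald"],
    "woodland": ["woodland"],
    "settlement": ["settlement", "village"],
}

def candidate_keywords(data_classes):
    """Map data classes to concepts, de-duplicate, expand to keywords."""
    concepts = []
    for cls in data_classes:
        for c in CLASS_TO_CONCEPTS.get(cls, []):
            if c not in concepts:          # remove duplicate concepts
                concepts.append(c)
    keywords = []
    for c in concepts:
        for kw in CONCEPT_TO_KEYWORDS.get(c, [c]):
            if kw not in keywords:         # keep first-seen order
                keywords.append(kw)
    return keywords
```

The output is the long candidate list that the ranking and filtering stage then reduces.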
Final filtering and ranking Ranking and filtering based on spatial extent, descriptive salience and web salience. Spatial extent: concepts covering a large area are typically important (but this favours landcover-related concepts). Descriptive salience: weight concepts that are rare with respect to their surroundings higher than common ones (e.g. a village shop is more salient than one on a high street). Web salience: query the web with keywords and local toponyms to find common combinations
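A minimal sketch of how the three scores might be combined into one ranking. The slides do not state how the scores are weighted or normalised, so the equal weighting, the [0, 1] score range and the function name are all assumptions.

```python
def rank_keywords(candidates):
    """Rank candidate keywords by a combined salience score.

    candidates: list of (keyword, area_score, descriptive_salience,
    web_salience) tuples, each score assumed normalised to [0, 1].
    Equal weights are an assumption, not the project's actual scheme.
    """
    scored = [(kw, (area + desc + web) / 3.0)
              for kw, area, desc, web in candidates]
    scored.sort(key=lambda t: -t[1])       # best combined score first
    return [kw for kw, _ in scored]
```

Truncating the returned list at a threshold or fixed length then yields the final keyword set.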
Keywords: Reservoir, Forest, Moors, Woodland
Keywords: Lake, Meadows, Settlement, Reservoir, Forest
Are we any good overall? Initial results (with an interim solution): about 40% of keywords were rated good or very good for a set of 20 images, assessed by ~70 users
Where are we? We can (semi) automatically annotate images based on their location, with keywords related to geography Concept ontology allows us to switch data providers in and out easily Filtering and ranking reduces initial long list of candidate concepts to set of keywords ~40% of keywords are good/very good
Some lessons learned Many devices are still relatively error-prone (particularly with respect to direction). OpenStreetMap has very rich (but sometimes esoteric) attribution – its richness in urban areas is very beneficial. National Mapping Agency data is essential in rural areas. General classes (e.g. Building) are difficult to use (they yield either too many or too general keywords)
Acknowledgements I’d like to gratefully acknowledge contributors to Geograph British Isles, see http://www.geograph.org.uk/credits/2007-02-24, whose work is made available under the following Creative Commons Attribution-ShareAlike 2.5 Licence (http://creativecommons.org/licenses/by-sa/2.5/). Much of the research reported here was part of the project TRIPOD supported by the European Commission under contract 045335.