Querying Incomplete Geospatial Information in RDF

In this paper, we propose the problem of implementing an efficient query processing system for incomplete temporal and geospatial information in RDFi as a challenge to the SSTD community.

  • Ordnance Survey is Great Britain's national mapping authority. It offers digital and paper map products for a wide range of business and outdoor uses.
  • GADM is a spatial database of the location of the world's administrative areas for use in GIS and similar software.
  • NUTS is a hierarchical system defined by Eurostat, the statistical office of the European Union, for dividing the economic territory of the EU into 4 levels.
  • Chaudhuri (VLDB’88)
      o Framework for temporal relationships in a database employing a graph model (limited to definite information)
  • The knowledge representation language Telos (1991)
      o Preliminary Prolog implementation by M. Koubarakis and T. Topaloglou
      o The most efficient implementation of Telos (ConceptBase) does not consider incomplete information
  • Foundations of temporal constraint databases (Koubarakis, PhD thesis 1994)
      o Database models for (indefinite) temporal constraint databases
  • SQL+i (1996)
      o Temporal RDBMS for modeling and querying indeterminate temporal facts
      o Representation and reasoning employing constraint networks
  • Later system (1997)
      o Querying of temporal knowledge bases
      o Limited query language (no disjunctive expressions)


  • 1. Querying Incomplete Geospatial Information in RDF Charalampos Nikolaou and Manolis Koubarakis Department of Informatics and Telecommunications National and Kapodistrian University of Athens International Symposium on Spatial and Temporal Databases (SSTD) 2013 August 23, 2013
  • 2. Motivation • Increased interest in publishing geospatial datasets as linked data (i.e., encoded in RDF and with semantic links to other datasets) • Geospatial information might be: o Quantitative (e.g., exact geometric information) o Qualitative (e.g., topological relations) ... and express knowledge that is o Complete o Incomplete (or indefinite)
  • 3. Ordnance Survey (UK) 73,546,231 triples
  • 4. Global Administrative Areas (GADM) 9,896,532 triples
  • 5. Nomenclature of Territorial Units for Statistics (NUTS) 316,246 triples
  • 6. Linked Geospatial Data [Figure: Linked Open Data cloud diagram, as of September 2011, ~ 62 billion triples. Legend: Media, Geographic, Publications, User-generated content, Government, Cross-domain, Life sciences.]
  • 7. Question How do we manage (represent, store, query) this data efficiently?
  • 8. Challenges: Theory ① RDF extensions for representing and querying incomplete qualitative and quantitative geospatial information
      • GeoSPARQL
          o Standard OGC query language for RDF data with geospatial information
          o Topological relations can be expressed/queried, but no reasoning is offered
      • We proposed RDFi
          o Can work with any topological/temporal constraint language with/without constant symbols (e.g., RCC-5, RCC-8, IA)
          o Formal semantics and algorithm for computing certain answers
          o Preliminary complexity results for various constraint languages
      • No published algorithm for query processing when considering RCC-8 and constants
  • 9. RDFi by example
      [Figure labels: West Greece, Olympia]
        gag:Region rdfs:subClassOf geo:Feature.
        gag:WestGreece rdf:type gag:Region.
        gag:Municipality rdfs:subClassOf geo:Feature.
        gag:OlympiaMuni rdf:type gag:Municipality.
        noa:Hotspot rdfs:subClassOf geo:Feature.
        noa:hotspot rdf:type noa:Hotspot.
        noa:Fire rdfs:subClassOf geo:Feature.
        noa:fire rdf:type noa:Fire.
        gag:OlympiaMuni geo:hasGeometry ex:oGeo.
        ex:oGeo rdf:type sf:Polygon.
        ex:oGeo geo:asWKT "POLYGON((..))"^^geo:wktLiteral.
        noa:hotspot geo:hasGeometry ex:rec.
        ex:rec geo:asWKT "POLYGON((..))"^^geo:wktLiteral.
        gag:WestGreece geo:sfContains gag:OlympiaMuni.
        noa:hotspot geo:sfContains noa:fire.
  • 10. RDFi by example (cont’d)
      Query: Find fires inside the region of West Greece.
      [Figure labels: West Greece, Olympia]
      GeoSPARQL query:
        CERTAIN SELECT ?f
        WHERE { ?f rdf:type noa:Fire.
                gag:WestGreece geo:sfContains ?f. }
  • 11. RDFi by example (cont’d)
      Query: Find fires inside the region of West Greece.
      [Figure labels: West Greece, Olympia; two edges labeled "contains"]
      GeoSPARQL query:
        CERTAIN SELECT ?f
        WHERE { ?f rdf:type noa:Fire.
                gag:WestGreece geo:sfContains ?f. }
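The example query returns noa:fire as a certain answer even though no triple states that West Greece contains the fire. One way to see why is sketched below in Python. It assumes axis-aligned bounding boxes with made-up coordinates standing in for the elided WKT polygons, and it treats sfContains as transitive. All function names are hypothetical, and the sketch illustrates the chain of inferences only; it is not the RDFi query-answering algorithm.

```python
# Hypothetical bounding boxes (xmin, ymin, xmax, ymax).
# The real WKT polygons are elided in the slides; these values are invented.
GEOMETRY = {
    "gag:OlympiaMuni": (0.0, 0.0, 10.0, 10.0),
    "noa:hotspot": (2.0, 2.0, 4.0, 4.0),
}

# Qualitative containment facts stated explicitly in the slide-9 triples.
STATED_CONTAINS = {
    ("gag:WestGreece", "gag:OlympiaMuni"),
    ("noa:hotspot", "noa:fire"),
}

FIRES = {"noa:fire"}  # resources typed noa:Fire


def box_contains(outer, inner):
    """True if the box `inner` lies inside the box `outer`."""
    return (outer[0] <= inner[0] and outer[1] <= inner[1]
            and outer[2] >= inner[2] and outer[3] >= inner[3])


def contains_closure(stated, geometry):
    """Combine stated facts with geometry-derived ones, then close transitively."""
    facts = set(stated)
    # Quantitative step: derive containment between regions with known geometry.
    for a, box_a in geometry.items():
        for b, box_b in geometry.items():
            if a != b and box_contains(box_a, box_b):
                facts.add((a, b))
    # Qualitative step: containment is transitive.
    changed = True
    while changed:
        changed = False
        for a, b in list(facts):
            for c, d in list(facts):
                if b == c and a != d and (a, d) not in facts:
                    facts.add((a, d))
                    changed = True
    return facts


def fires_certainly_inside(region):
    """Fires whose containment in `region` follows from the known facts."""
    closure = contains_closure(STATED_CONTAINS, GEOMETRY)
    return sorted(f for f in FIRES if (region, f) in closure)


print(fires_certainly_inside("gag:WestGreece"))  # prints ['noa:fire']
```

The fire has no geometry of its own, so the answer depends on mixing a geometric test (Olympia contains the hotspot) with qualitative triples on both sides of it.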
  • 12. Challenges: Theory ② Efficient computation of the entailment relation Φ⊨Θ • where Φ and Θ are quantifier-free first-order formulas of a constraint language expressing the topological relations of various frameworks (RCC-8, DE-9IM, etc.)
  • 13. Challenges: Theory ③ Computing entailment is equivalent to checking consistency of formulas with constraint networks • Constraint networks: o Spatial relations among regions o Regions might be constant ones (exact geometric information) or identified by a URI • Most recent results considered basic and complete RCC-5 networks with polygonal regions • For RCC-8, deciding consistency is NP-complete • No published algorithm for checking consistency • Are there tractable cases?
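The reduction from entailment to consistency checking rests on a standard fact of first-order logic, written here with the symbols Φ and Θ from slide 12:

```latex
\[
  \Phi \models \Theta
  \quad\Longleftrightarrow\quad
  \Phi \wedge \neg\Theta \ \text{is unsatisfiable}
\]
```

Since Θ is quantifier-free, its negation is again a quantifier-free formula over the same constraint language, so deciding entailment comes down to consistency checks of this kind.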
  • 14. Challenges: Practice ④ Scale to billions of triples
      • Reasoners from QSR scale only up to hundreds of regions with complex spatial relations
      How do they perform in our case?
      • Setting:
          o Real linked geospatial datasets
          o No constants
          o Only base RCC-8 relations
          o Evaluation of consistency checking using the well-known path-consistency algorithm
  • 15. Experimental evaluation
      • Computation of the complete constraint network
      • Running time: O(n³)
      • Memory requirements: O(n²)
      n ≈ thousands to millions
      [Figure residue: annotation "after one day"; labels "hundreds of regions" (once) and "thousands of regions" (three times)]
      Setup: Intel Xeon E5620, 2.4 GHz, 12MB L3, 48GB RAM, RAID 5, Ubuntu 12.04
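The cubic running time and quadratic memory quoted for path consistency come from refining the relation between every pair of regions through every third region, over a complete n×n network. The Python sketch below is a minimal illustration of that loop. It assumes the three-relation point algebra (<, =, >) in place of RCC-8, so that the composition table stays short enough to check by hand; the function names and the dense matrix are illustrative choices, not the authors' implementation.

```python
from itertools import product

BASE = frozenset("<=>")  # base relations of the point algebra (stand-in for RCC-8)

# Composition table: COMP[(a, b)] lists the relations possible between x and z
# when x a y and y b z hold.
COMP = {
    ("<", "<"): "<", ("<", "="): "<", ("<", ">"): "<=>",
    ("=", "<"): "<", ("=", "="): "=", ("=", ">"): ">",
    (">", "<"): "<=>", (">", "="): ">", (">", ">"): ">",
}
CONVERSE = {"<": ">", "=": "=", ">": "<"}


def compose(r, s):
    """Compose two (possibly disjunctive) relations given as sets of base relations."""
    out = set()
    for a, b in product(r, s):
        out |= set(COMP[(a, b)])
    return frozenset(out)


def path_consistency(n, constraints):
    """Return the refined n x n network, or None if an empty relation appears.

    `constraints` maps (i, j) to a string of allowed base relations.
    """
    # Dense matrix: this is the O(n^2) memory cost.
    net = [[BASE for _ in range(n)] for _ in range(n)]
    for i in range(n):
        net[i][i] = frozenset("=")
    for (i, j), rel in constraints.items():
        net[i][j] &= frozenset(rel)
        net[j][i] &= frozenset(CONVERSE[b] for b in rel)
    changed = True
    while changed:
        changed = False
        # Each pass inspects all n^3 triples (k, i, j).
        for k, i, j in product(range(n), repeat=3):
            refined = net[i][j] & compose(net[i][k], net[k][j])
            if refined != net[i][j]:
                if not refined:
                    return None  # inconsistent network
                net[i][j] = refined
                changed = True
    return net


net = path_consistency(3, {(0, 1): "<", (1, 2): "<"})
print(sorted(net[0][2]))  # prints ['<']
print(path_consistency(3, {(0, 1): "<", (1, 2): "<", (2, 0): "<"}))  # prints None
```

With n in the thousands to millions, the dense matrix alone becomes prohibitive, which is what motivates exploiting network structure.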
  • 16. Network structure • We have started working on algorithms taking into account the structure of these networks: o Node degrees fit a power-law distribution o Network is sparse
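Both structural properties named on this slide can be measured directly from the list of explicitly stated relations, without materializing the complete network. The Python sketch below shows one way to do so, assuming the network is given as a list of distinct undirected edges; the function names and the toy edge list are hypothetical.

```python
from collections import Counter


def degree_histogram(edges):
    """Map each degree to the number of nodes that have it."""
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    return Counter(degree.values())


def density(edges, n):
    """Fraction of the n*(n-1)/2 possible undirected edges that are present."""
    return len(edges) / (n * (n - 1) / 2) if n > 1 else 0.0


# Toy network with invented region names.
edges = [("a", "b"), ("a", "c"), ("a", "d"), ("d", "e")]
hist = degree_histogram(edges)
print(sorted(hist.items()))  # prints [(1, 3), (2, 1), (3, 1)]
print(density(edges, 5))     # prints 0.4
```

A power-law fit would be checked on the histogram (for instance on a log-log plot), and a density close to zero indicates a sparse network.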
  • 17. Network structure (cont’d)
      • Edges of three kinds:
          o non-tangential proper part
          o externally connected
          o equals
      • Reflect networks composed of components with hierarchical structure
          o R-tree extensions (Papadias, Kalnis, Mamoulis, AAAI’99)
      • Parallel algorithms combined with backward-chaining techniques for lazy query processing
          o Graph partitioning
          o Path compression data structures and indexes
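As one possible reading of the graph partitioning and path compression items, the Python sketch below groups regions into connected components with a union-find structure that compresses paths on lookup. Under the slide-14 setting (no constants), regions in different components share no stated constraint, so each component could in principle be checked separately. The class, function, and region names are hypothetical, and this is not presented as the authors' design.

```python
class DisjointSet:
    """Union-find over arbitrary hashable items, with path compression."""

    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        # First pass: locate the root of x's tree.
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        # Second pass: point every node on the path directly at the root.
        while self.parent[x] != root:
            self.parent[x], x = root, self.parent[x]
        return root

    def union(self, a, b):
        root_a, root_b = self.find(a), self.find(b)
        if root_a != root_b:
            self.parent[root_b] = root_a


def components(edges):
    """Partition regions into connected components.

    `edges` holds (region, relation, region) triples; the relation label
    (NTPP, EC or EQ here) does not affect connectivity.
    """
    ds = DisjointSet()
    for u, _relation, v in edges:
        ds.union(u, v)
    groups = {}
    for node in list(ds.parent):
        groups.setdefault(ds.find(node), set()).add(node)
    return sorted(sorted(group) for group in groups.values())


# Toy network with invented region URIs.
edges = [
    ("ex:a", "NTPP", "ex:b"),
    ("ex:b", "EC", "ex:c"),
    ("ex:d", "EQ", "ex:e"),
]
print(components(edges))  # prints [['ex:a', 'ex:b', 'ex:c'], ['ex:d', 'ex:e']]
```

Each component is a candidate unit of work for a parallel or lazy consistency check.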
  • 18. Related work: Spatial
      • Qualitative spatial reasoning
          - Efficient algorithms for consistency checking of constraint networks (complex spatial relations, small number of regions)
          - Does not consider query processing
      • Description logic reasoners
          - PelletSpatial: RCC-8 reasoning (cannot handle disjunctions)
          - RacerPro: RCC-8 reasoning
  • 19. Related work: Temporal • Chaudhuri (VLDB’88) • The knowledge representation language Telos (TOIS’90) • Foundations of temporal constraint databases (Koubarakis, PhD thesis, ‘94) • Qualitative temporal reasoning community (since 80s) • SQL+i system (BNCOD‘96) • Later system (IEEE’97) • Hurtado and Vaisman (2006)
  • 20. Conclusions • What’s the CHALLENGE? Implementing an efficient query processing system for incomplete geospatial information in RDFi • The desired system should: o reason about qualitative and quantitative spatial information that might be incomplete o be scalable to billions of triples in the most useful cases
  • 21. Thank you
  • 22. Dataset characteristics