Outline™  Motivation™  The Model stRDF™  The Query Language stSPARQL™  The Scalable Geospatial RDF Store Strabon™  Ex...
Motivation™  Need for modeling of™  Geospatial and temporal information™  Product metadata™  Product content™  Need t...
TELEIOS Project™  TELEIOS EU project™  Development of a Virtual Earth Observatory™  Accessible to expert and non-expert...
The Data Model stRDF™  stRDF extends RDF with:™  Spatial literals encoded by Boolean combinations oflinear constraints™...
stRDF: An example
stRDF: An exampleex:BurntArea1 rdf:type noa:BurntArea.ex:BurntArea1 noa:hasID "1"^^xsd:decimal.ex:BurntArea1 noa:hasArea "...
stRDF: An example (GML)ex:BurntArea1 rdf:type noa:BurntArea.ex:BurntArea1 noa:hasID "1"^^xsd:decimal.ex:BurntArea1 noa:has...
stSPARQL: An example™  Find all burnt forests within 10km of a citySELECT ?BA ?BAGEOWHERE {?R rdf:type noa:Region .?R str...
stSPARQL: Geospatial SPARQL 1.1™  We define a SPARQL extension function for each functiondefined in the OpenGIS Simple Fe...
stSPARQL: Geospatial SPARQL 1.1™  Spatial terms™  Constants (e.g., "POLYGON((38.16 23.7, ...)) " ^^strdf:WKT)™  Variabl...
Strabon: A Scalable GeospatialRDF Store™  Storing: stRDF™  Querying: stSPARQL, GeoSPARQL™  Storage Backend: PostGIS, Mo...
Strabon: A Scalable GeospatialRDF StorestRDFgraphsstSPARQL/GeoSPARQLqueriesWKT GMLQuery EngineParserOptimizerEvaluatorTran...
Query EvaluationSesame’s Native Store (Baseline)™  Index (Geometries): None(geometries treated as ordinary literals)™  Q...
Experimental Evaluation™  Workload based on geospatial linked datasetsq  150 million triplesq  Test queries based on lo...
Datasets: CharacteristicsDBpedia GeoNames LGD Pachube SwissEx CLC GADMSize 7.1GB 2.1GB 6.6GB 828KB 33MB 14GB 146MBTriples ...
LinkedGeoData™  Q1: Find the names of the points of interest located in thecity of Rotterdam.SELECT ?poiNameWHERE {?poi r...
LinkedGeoData/GADM™  Q2: Find the points of interest located in Luxembourg.SELECT ?labelWHERE {?poi rdf:type lgd:Amenity ...
Response Time for Queries Q1, Q2050100150200250cold caches warm caches cold caches warm cachesQ1 Q2ResponseTime(seconds)Wo...
Experimental Evaluation™  Workload based on a synthetic datasetq  Dataset based on OpenStreetMapsq  10 million triples ...
Sample Data“14”“POINT(2.00 3.00)”osm:idgeordf:hasGeometry1osm:Noderdf:typeosm:Tag osm:Tag1“1”osm:Node23rdf:type“1”osm:keyo...
Sample stSPARQL QuerySELECT *WHERE {?tag geordf:key "1" .?node geordf:hasTag ?tag .?node geo:hasGeography1 ?geo .FILTER (s...
Experimental Results: 10 million0.1110100101102103104105106responsetime[sec]number of Nodes in query regionResponse Time (...
Experimental Results: 500 million110100100010000103104105106107responsetime[sec]number of Nodes in query regionResponse Ti...
Other Extensions to RDF
The Framework RDFi™  Extension of RDF with incomplete information™  New kind of literals (e-literals) for each datatype™...
Challenges™ Query processing in face of linked geospatialdata™  Query optimization for spatial data™  Federation™  Eff...
Conclusions™  stRDF/stSPARQL™  Model and query language for the representation andquerying of geospatial data™  Strabon...
Thanks™ Strabonhttp://strabon.di.uoa.gr/™ NOA Application Demohttp://test.strabon.di.uoa.gr/NOA/™ TELEIOS EU Projecthtt...
Building Scalable Semantic Geospatial RDF Stores
Building Scalable Semantic Geospatial RDF Stores
Building Scalable Semantic Geospatial RDF Stores
Upcoming SlideShare
Loading in …5
×

Building Scalable Semantic Geospatial RDF Stores

1,007 views

Published on

Dagstuhl

Published in: Education
0 Comments
0 Likes
Statistics
Notes
  • Be the first to comment

  • Be the first to like this

No Downloads
Views
Total views
1,007
On SlideShare
0
From Embeds
0
Number of Embeds
3
Actions
Shares
0
Downloads
0
Comments
0
Likes
0
Embeds 0
No embeds

No notes for slide

Building Scalable Semantic Geospatial RDF Stores

  1. 1. Outline™  Motivation™  The Model stRDF™  The Query Language stSPARQL™  The Scalable Geospatial RDF Store Strabon™  Experimental Evaluation™  Other Extensions to RDF™  Conclusions
  2. 2. Motivation™  Need for modeling of™  Geospatial and temporal information™  Product metadata™  Product content™  Need to link to other data sources™  GIS data™  Other information on the Web
  3. 3. TELEIOS Project™  TELEIOS EU project™  Development of a Virtual Earth Observatory™  Accessible to expert and non-expert users™  Scalable (Storing/Querying PBs)™  Use Cases™  DLR: A Virtual Observatory for TerraSAR-X data™  NOA: Real-time fire monitoring based on continuousacquisitions of EO images and geospatial data™  Technologies™  NKUA: Geospatial extensions for RDF and SPARQL™  CWI: Array extensions to SQL
  4. 4. The Data Model stRDF™  stRDF extends RDF with:™  Spatial literals encoded by Boolean combinations oflinear constraints™  New Datatype for spatial literals (strdf:geometry)™  Valid time of triples encoded by Boolean combinationsof temporal constraints™  stRDF (most recent version, TELEIOS)™  Spatial literals encoded in Well-Known Text/GML(OGC standards)™  Valid time of triples ignored for the time being
  5. 5. stRDF: An example
  6. 6. stRDF: An exampleex:BurntArea1 rdf:type noa:BurntArea.ex:BurntArea1 noa:hasID "1"^^xsd:decimal.ex:BurntArea1 noa:hasArea "23.7636"^^xsd:double.ex:BurntArea1 strdf:geometry "POLYGON(( 38.16 23.7, 38.1823.7, 38.18 23.8, 38.16 23.8, 38.16 23.7));<http://spatialreference.org/ref/epsg/4121/>"^^strdf:WKT .Spatial Data TypeWell-Known TextSpatial Literal
  7. 7. stRDF: An example (GML)ex:BurntArea1 rdf:type noa:BurntArea.ex:BurntArea1 noa:hasID "1"^^xsd:decimal.ex:BurntArea1 noa:hasArea "23.7636"^^xsd:double.ex:BurntArea1 strdf:geometry "<gml:PolygonsrsName=http://www.opengis.net/def/crs/EPSG/0/4121><gml:outerBoundaryIs><gml:LinearRing><gml:coordinates>38.16,23.70 38.18,23.70 38.18,23.80 38.16,23.8038.16,23.70</gml:coordinates></gml:LinearRing></gml:outerBoundaryIs></gml:Polygon>"^^strdf:GML .Spatial Data TypeGMLSpatial Literal
  8. 8. stSPARQL: An example™  Find all burnt forests within 10km of a citySELECT ?BA ?BAGEOWHERE {?R rdf:type noa:Region .?R strdf:geometry ?RGEO .?R noa:hasCorineLandCoverUse ?F .?F rdfs:subClassOf clc:Forests .?CITY rdf:type dbpedia:City .?CITY strdf:geometry ?CGEO .?BA rdf:type noa:BurntArea .?BA strdf:geometry ?BAGEO .FILTER(strdf:Intersect(?RGEO,?BAGEO) &&strdf:Distance(?BAGEO,?CGEO) < 0.02) }SpatialFunctions
  9. 9. stSPARQL: Geospatial SPARQL 1.1™  We define a SPARQL extension function for each functiondefined in the OpenGIS Simple Features Access standard™  Basic functions™  Get a property of a geometry (e.g., strdf:srid)™  Get the desired representation of a geometry (e.g., strdf:AsText)™  Test whether a certain condition holds (e.g., strdf:IsEmpty,strdf:IsSimple)™  Functions for testing topological spatial relationships(e.g., strdf:equals, strdf:intersects)™  Spatial analysis functions™  Construct new geometric objects from existing geometric objects(e.g., strdf:buffer, strdf:intersection, strdf:convexHull)™  Spatial metric functions (e.g., strdf:distance, strdf:area)™  We define spatial aggregate functions (e.g., strdf:union,strdf:extent)
  10. 10. stSPARQL: Geospatial SPARQL 1.1™  Spatial terms™  Constants (e.g., "POLYGON((38.16 23.7, ...)) " ^^strdf:WKT)™  Variables (e.g., ?GEO)™  Results of set operations (e.g., strdf:intersection, strdf:union,)™  Results of geometric operations (e.g., strdf:boundary, strdf:buffer)™  Select clause™  Construction of new geometries (e.g., strdf:buffer(?geo, 0.1))™  Spatial aggregate functions (e.g., strdf:extent(?geo))™  Metric functions (e.g., strdf:area(?geo))™  Filter clause™  Functions for testing topological spatial relationships between spatial terms (e.g., strdf:contains(?G1, strdf:union(?G2, ?G3)))™  Numeric expressions involving spatial metric functions (e.g., strdf:area(?G1) ≤ 2*strdf:area(?G2)+1)™  Boolean combinations™  Having clause™  Boolean expressions involving spatial aggregate functions and spatial metric functions or functionstesting for topological relationships between spatial terms (e.g., strdf:area(strdf:union(?geo) ) > 1 )™  Updates™  Similar to the OGC standard GeoSPARQL
  11. 11. Strabon: A Scalable GeospatialRDF Store™  Storing: stRDF™  Querying: stSPARQL, GeoSPARQL™  Storage Backend: PostGIS, MonetDB™  Extends: Sesame 2.6.3™  Open Source: http://strabon.di.uoa.gr/™  EU Projects: Semsorgrid4Env and TELEIOS
  12. 12. Strabon: A Scalable GeospatialRDF StorestRDFgraphsstSPARQL/GeoSPARQLqueriesWKT GMLQuery EngineParserOptimizerEvaluatorTransactionManagerStorage ManagerRepositorySAILRDBMSStrabonPostGISGeneralDB
  13. 13. Query EvaluationSesame’s Native Store (Baseline)™  Index (Geometries): None(geometries treated as ordinary literals)™  Query Evaluation1.  Triple pattern evaluation2.  Evaluation of spatial functions in Sesame using JTS libraryStrabon + PostGIS™  Index (Geometries): R-Tree™  Query EvaluationPostGIS optimizerEvaluation of spatial functions in PostGISTriple pattern evaluation might come before/after evaluation of spatialfunctions
  14. 14. Experimental Evaluation™  Workload based on geospatial linked datasetsq  150 million triplesq  Test queries based on logs from Dbpedia,LinkedGeoData1.  Detailed evaluation is coming up[ISWC paper submission under preparation]
  15. 15. Datasets: CharacteristicsDBpedia GeoNames LGD Pachube SwissEx CLC GADMSize 7.1GB 2.1GB 6.6GB 828KB 33MB 14GB 146MBTriples 58,722,893 17,688,602 46,296,978 6,333 277,919 19,711,926 255SpatialTerms386,205 1,262,356 5,414,032 101 687 2,190,214 51DistinctSpatialTerms375,087 1,099,964 5,035,981 70 623 2,190,214 51Points 375,087 1,099,964 3,205,015 70 623 - -Linestrings - - 353,714 - - - -Polygons - - 1,704,650 - - 2,190,214 51(79,831 vertices)
  16. 16. LinkedGeoData™  Q1: Find the names of the points of interest located in thecity of Rotterdam.SELECT ?poiNameWHERE {?poi rdf:type lgd:Amenity .?poi rdfs:label ?poiName .?poi geo:geometry ?poiGeo .FILTER (strdf:inside(?poiGeo, "POLYGON((3.9111328125 51.7401123046875, 3.9111328125 52.05322265625,4.6087646484375 52.05322265625, 4.6087646484375 51.7401123046875,3.9111328125 51.7401123046875))"^^strdf:WKT)) }
  17. 17. LinkedGeoData/GADM™  Q2: Find the points of interest located in Luxembourg.SELECT ?labelWHERE {?poi rdf:type lgd:Amenity .?poi rdfs:label ?label .?poi geo:geometry ?poiGeo .gadm:LUXEMBOURG gadm:hasGeometry ?luxGeo .FILTER (strdf:inside(?luxGeo, ?poiGeo)) }
  18. 18. Response Time for Queries Q1, Q2050100150200250cold caches warm caches cold caches warm cachesQ1 Q2ResponseTime(seconds)Workload based on Geospatial Linked DatasetsStrabonSesame
  19. 19. Experimental Evaluation™  Workload based on a synthetic datasetq  Dataset based on OpenStreetMapsq  10 million triples (2 GB) up to 1 billion triples (110 GB)q  Triples with spatial literals: 1.4 and 140 million triples™  Response time of queries with various thematic andspatial selectivities
  20. 20. Sample Data“14”“POINT(2.00 3.00)”osm:idgeordf:hasGeometry1osm:Noderdf:typeosm:Tag osm:Tag1“1”osm:Node23rdf:type“1”osm:keyosm:valueosm:hasTag
  21. 21. Sample stSPARQL QuerySELECT *WHERE {?tag geordf:key "1" .?node geordf:hasTag ?tag .?node geo:hasGeography1 ?geo .FILTER (strdf:inside(?geo,"POLYGON((-1 -1, 0.056568542 -1,0.056568542 0.056568542, -1 0.056568542, -1 -1))"^^http://strdf.di.uoa.gr/ontology#WKT))}
  22. 22. Experimental Results: 10 million0.1110100101102103104105106responsetime[sec]number of Nodes in query regionResponse Time (cold caches)(postgres) key = 1(postgres) key = 1024(sesame) key = 1(sesame) key = 1024
  23. 23. Experimental Results: 500 million110100100010000103104105106107responsetime[sec]number of Nodes in query regionResponse Time (cold caches)(postgres) key = 1(postgres) key = 1024(sesame) key = 1(sesame) key = 1024
  24. 24. Other Extensions to RDF
  25. 25. The Framework RDFi™  Extension of RDF with incomplete information™  New kind of literals (e-literals) for each datatype™  Property values that exist but are unknown or partially known™  Partial knowledge: captured by constraints(appropriate constraint language L)™  RDF graphs extended to RDFi databases: pair (G, φ)G: RDF graph with e-literalsφ: quantifier-free formula of L™  Formal semantics for RDFi and SPARQL query evaluation™  Representation System: CONSTRUCT with AUF graph patterns™  Certain Answer: semantics, algorithms, computational complexity when L isa language of spatial topological constraints™  Implementation in the context of Strabon has started with L being PCL(topological constraints between variables and polygon constants)[conference submission]
  26. 26. Challenges™ Query processing in face of linked geospatialdata™  Query optimization for spatial data™  Federation™  Efficient query processing with incompleteinformation
  27. 27. Conclusions™  stRDF/stSPARQL™  Model and query language for the representation andquerying of geospatial data™  Strabon™  Scalable Geospatial RDF Store™  RDFi™  Framework for representation and querying of RDFdata with incomplete information
  28. 28. Thanks™ Strabonhttp://strabon.di.uoa.gr/™ NOA Application Demohttp://test.strabon.di.uoa.gr/NOA/™ TELEIOS EU Projecthttp://www.earthobservatory.eu/

×