Query Processing (cont'd) • Deviate from the evaluation strategy of Sesame for SPARQL extension functions • Push the evalua...
Outline (current section: Experimental Evaluation)
Experimental Evaluation • Goal: Evaluate the performance of Strabon vs other systems • Real workload based on geospatial lin...
Strabon vs other systems • Strabon over PostgreSQL (Strabon-PG) • Strabon over System X (Strabon-X) • Implementation over R...
Real world workload: Data. Datasets: DBpedia, GeoNames, LGD, Pachube, SwissEx, CLC, GADM. Size ...
Real world workload: Queries. Columns: Commonly Used, Spatial Selection, Spatial Join. Query 1 X Q...
Real world workload: Results. Columns: Cache State, System, Q1, Q2, Q3, Q4, Q5, Q6, Q7, Q8. Cold Na...
Synthetic Workload • Workload based on a synthetic dataset • Dataset based on OpenStreetMaps • 10 million triples (2 GB) up...
Synthetic Workload: Sample Data
Synthetic Workload: Sample stSPARQL Query SELECT * WHERE { ?tag geordf:key "1" . ?node geordf:hasTag ?tag . ?nod...
Real-world Workload: 100 million triples – warm caches [plot; y-axis: response time (sec)] ...
Real-world Workload: 100 million triples – cold caches [plots; x-axis: number of nodes in query region] ...
Strabon-PG: 500 million triples [plot: tags comparison; y-axis: response time (sec)]
Real-world Workload: Query Plans
Findings • Strabon over PostgreSQL outperforms other systems in case of warm caches • Results in case of cold caches mixed • ...
System, Language, Index, Geometries, CRS support, Geospatial ...
Outline (current section: Related Work & Conclusions)
Future Work • Use even larger datasets • Test with 1 billion triples successful • Implement the temporal part of stSPARQL ...
Thanks! Any Questions? Strabon: Manolis Koubarakis, Kostis Kyzirakos, Manos Karpathiotakis, Charalampos Nikolaou, ...
Strabon: A Semantic Geospatial Database System


We present a new version of the data model stRDF and the query language stSPARQL for the representation and querying of geospatial data. The new versions of stRDF and stSPARQL use OGC standards to represent geometries where the original version of stSPARQL used linear constraints. In this sense stSPARQL is a subset of the recent standard GeoSPARQL proposed by OGC. We discuss the implementation of the system Strabon which is a storage and query evaluation module for stRDF/stSPARQL and the corresponding subset of GeoSPARQL. We study the performance of Strabon experimentally and show that it scales to very large data volumes.

Published in: Education
  • Hello, I am Kostis Kyzirakos, and I will be presenting the work done with my colleagues Manos Karpathiotakis and Manolis Koubarakis on the development of a geospatial RDF store nicknamed Strabon.
  • Our extension of RDF for the representation of geospatial and temporal information is stRDF. stRDF extends RDF with a new kind of literals, called spatial literals, which are expressions of Boolean combinations of linear constraints. stRDF also defines a new datatype for spatial literals [which is a supertype of the respective datatypes for WKT and GML (standardized formats of the Open Geospatial Consortium for the encoding of geometries)]. stRDF also adds a fourth component to a triple to represent the valid time of triples. (???WHAT EXPRESSIONS CAN BE WRITTEN???) Constraints: theoretical representation. In the most recent version of stRDF, to be compliant with geospatial industry standards, spatial literals are encoded in the WKT/GML formats, which are widely adopted standards of the Open Geospatial Consortium. The reason for moving to OGC standards was to be realistic: no one had any spatial data expressed with linear constraints! But all the nice work of the theoretical model remained valid. We only had to change the encoding used for spatial literals. (We are currently working on the valid-time feature, so this presentation does not consider time anymore.) We will describe our model through an example.
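The spatial literal shown on the Burnt Area Products slide packs a WKT geometry and a CRS URI into a single string, separated by a semicolon. The following is a minimal Python sketch of splitting such a lexical form, assuming exactly that layout. The names `SpatialLiteral` and `parse_strdf_wkt` are hypothetical and are not part of Strabon.

```python
from typing import NamedTuple, Optional


class SpatialLiteral(NamedTuple):
    wkt: str                 # the geometry in Well-Known Text
    crs_uri: Optional[str]   # CRS URI if one was given, else None


def parse_strdf_wkt(lexical: str) -> SpatialLiteral:
    """Split 'WKT;<crs-uri>' into its two parts (assumed layout, see lead-in)."""
    wkt, sep, rest = lexical.partition(";")
    wkt = wkt.strip()
    if not wkt:
        raise ValueError("missing WKT geometry")
    crs_uri = None
    if sep:
        rest = rest.strip()
        # The CRS part is expected to be a URI wrapped in angle brackets.
        if not (rest.startswith("<") and rest.endswith(">")):
            raise ValueError("CRS must be written as <uri>")
        crs_uri = rest[1:-1]
    return SpatialLiteral(wkt, crs_uri)


literal = parse_strdf_wkt(
    "POINT(393801.42 4198827.92);<http://www.opengis.net/def/crs/EPSG/0/2100>"
)
```

The sketch leaves `crs_uri` as `None` when no CRS is given rather than guessing a default.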
  • Filter expressions are... stSPARQL also supports the construction of new spatial terms and spatial aggregate functions in SELECT clauses. (Spatial aggregate functions are implemented in Strabon itself; we do not use the PostGIS implementation, in order to overcome issues with transformations between coordinate reference systems, something that PostGIS does not take into account.) The OpenGIS Simple Feature Access specification defines a standard SQL schema that supports storage, retrieval, query and update of collections of spatial features using SQL.
  • Strabon extends the well-known RDF store Sesame, allowing it to manage both thematic and spatial data expressed in stRDF. When PostGIS is used as the relational backend of Strabon, Strabon uses an R-tree-over-GiST (Generalised Search Tree) index offered by PostGIS as a spatial index. Strabon is implemented as a layer that is included in Sesame's software stack in a transparent way, so that it does not affect Sesame's range of functionalities while benefiting from new versions of Sesame. Strabon 3.0 uses Sesame 2.6.3 and comprises three modules: the storage manager, the query engine and the underlying database. We have extended Sesame as follows: (1) we modified the storage scheme, optimizer, and evaluator to take geometries into account and not treat them as plain literals; (2) we added a new RDBMS abstraction layer to the Storage Manager so that any spatially-enabled database can be used with Strabon. Currently, we have been using PostGIS and MonetDB.
  • “Decomposition Storage Model”
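To make the storage scheme on the slide concrete, here is a small Python sketch of dictionary encoding combined with one table per predicate, using the four example triples from the slide. It is a simplification under stated assumptions: a single dictionary stands in for the separate uri_values, label_values, geo_values and datatype_values tables, and `store_triples` is a hypothetical name.

```python
from collections import defaultdict


def store_triples(triples):
    """Dictionary-encode terms and group (subject, object) rows by predicate."""
    term_ids = {}                         # term -> integer ID
    predicate_tables = defaultdict(list)  # predicate ID -> [(subject ID, object ID)]

    def term_id(term):
        # Assign the next free ID the first time a term is seen.
        if term not in term_ids:
            term_ids[term] = len(term_ids) + 1
        return term_ids[term]

    for subject, predicate, obj in triples:
        s_id, p_id, o_id = term_id(subject), term_id(predicate), term_id(obj)
        predicate_tables[p_id].append((s_id, o_id))
    return term_ids, dict(predicate_tables)


TRIPLES = [
    ("noa:ba_15", "rdf:type", "noa:BurntArea"),
    ("noa:ba_15", "noa:hasGeometry", "noa:ba_15g"),
    ("noa:ba_15g", "rdf:type", "noa:Geometry"),
    ("noa:ba_15g", "noa:hasSerialization", '"MULTIPOLYGON(...)"^^strdf:WKT'),
]

term_ids, tables = store_triples(TRIPLES)
```

With this ordering, rdf:type receives ID 2, noa:hasGeometry ID 4 and noa:hasSerialization ID 7, which lines up with the table names type_2, hasgeom_4 and hasserial_7 on the slide.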
  • The query engine works as follows. First, the parser generates an abstract syntax tree. Then, this tree is mapped to the internal algebra of Sesame, resulting in a query tree. At this stage, the query tree is a plain translation of the initial query to the algebra of Sesame. The query tree is then processed by the optimizer, which progressively restructures and enriches it, implementing the various optimization techniques of Strabon. Afterwards, the query tree is passed to the evaluator to produce the corresponding SQL query that will be evaluated by PostgreSQL. After the SQL query has been posed, the evaluator receives the results and performs any post-processing actions needed. The final step involves formatting the results. Besides the standard formats offered by RDF stores, Strabon offers KML and GeoJSON encodings, which are widely used in the mapping industry.
  • Query processing in Strabon is performed by the query engine, which consists of a parser, an optimizer, an evaluator and a transaction manager. The parser and the transaction manager are identical to the ones in Sesame. The optimizer and the evaluator have been implemented by modifying the corresponding components of Sesame. We now discuss how the optimizer works. First, it applies all the Sesame optimizations that deal with the standard SPARQL part of an stSPARQL query (e.g., it pushes down FILTERs to minimize intermediate results). Then, two optimizations specific to stSPARQL are applied. The first optimization has to do with the extension functions of stSPARQL. By default, Sesame evaluates these after all bindings for the variables present in the query are retrieved. In Strabon, we modify the behaviour of the Sesame optimizer to incorporate all extension functions present in the SELECT and FILTER clauses of an stSPARQL query into the query tree prior to its transformation to SQL. In this way, these extension functions are evaluated using PostGIS spatial functions instead of relying on external libraries that would add an unneeded post-processing cost. The second optimization makes the underlying DBMS aware of the existence of spatial joins in stSPARQL queries so that they are evaluated efficiently. Let us consider the query of Example 2. The first three triple patterns of the query are related to the rest via the topological function strdf:intersects. The query tree produced by the Sesame optimizer fails to deal with this spatial join appropriately. It generates a Cartesian product for the third and the fifth triple pattern of the query, and the evaluation of the spatial predicate strdf:intersects is wrongly postponed until after the calculation of this Cartesian product. Using a query graph as an intermediate representation of the query, we identify such spatial joins and modify the query tree with appropriate nodes so that Cartesian products are avoided. For the query of Example 2, the modified query tree results in a SQL query that contains a θ-join where θ is the spatial function ST_Intersects.
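The difference between filtering a Cartesian product and evaluating a spatial join with index support can be sketched in a few lines of Python. This is an illustration only: geometries are reduced to bounding boxes `(xmin, ymin, xmax, ymax)`, and a uniform grid stands in for the R-tree-over-GiST index of PostGIS. All function names are hypothetical.

```python
from collections import defaultdict
from itertools import product


def intersects(a, b):
    """Bounding-box intersection test (boundaries count as touching)."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def cartesian_then_filter(left, right):
    """Postponed predicate: test every pair of the Cartesian product."""
    pairs, tested = [], 0
    for (i, a), (j, b) in product(enumerate(left), enumerate(right)):
        tested += 1
        if intersects(a, b):
            pairs.append((i, j))
    return sorted(pairs), tested


def grid_cells(box, cell):
    """All grid cells that a box overlaps."""
    for cx in range(int(box[0] // cell), int(box[2] // cell) + 1):
        for cy in range(int(box[1] // cell), int(box[3] // cell) + 1):
            yield cx, cy


def index_join(left, right, cell=1.0):
    """Spatial join: only test pairs that share at least one grid cell."""
    index = defaultdict(set)
    for j, b in enumerate(right):
        for c in grid_cells(b, cell):
            index[c].add(j)
    pairs, tested = [], 0
    for i, a in enumerate(left):
        candidates = set()
        for c in grid_cells(a, cell):
            candidates |= index.get(c, set())
        for j in candidates:
            tested += 1
            if intersects(a, right[j]):
                pairs.append((i, j))
    return sorted(pairs), tested


burnt = [(0, 0, 1, 1), (10, 10, 11, 11)]
forests = [(0.5, 0.5, 2, 2), (20, 20, 21, 21), (10.5, 10.5, 12, 12)]
slow_pairs, slow_tested = cartesian_then_filter(burnt, forests)
fast_pairs, fast_tested = index_join(burnt, forests)
```

Both functions return the same pairs, but the index-assisted join evaluates the predicate on far fewer candidates, which is the effect the second optimization aims for.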
  • We designed a simple benchmark
  • This section presents a detailed evaluation of the system Strabon using two different workloads: a workload based on linked data and a synthetic workload. For both workloads, we compare the response time of Strabon on top of PostgreSQL (called Strabon PG from now on) with our closest competitor, the implementation in [3], with the naive, baseline implementation described in Section 4, and with the RDF store Parliament. To identify potential benefits from using a different DBMS as a relational backend for Strabon, we also executed the SQL queries produced by Strabon in a proprietary spatially-enabled DBMS (which we will call System X, and Strabon X the resulting combination). [3] presents an implementation which enhances the RDF-3X triple store with the ability to perform spatial selections using an R-tree index. The implementation of [3] is not a complete system like Strabon and does not support a full-fledged query language such as stSPARQL. In addition, the only way to load data into the system is a generator that was designed especially for the experiments of [3], so it cannot be used to load other datasets. Moreover, the geospatial indexing support of this implementation is limited to spatial selections. Spatial selections are pushed down in the query tree (i.e., they are evaluated before other operators). Parliament is an RDF storage engine recently enhanced with GeoSPARQL processing capabilities [2], which is coupled with Jena (an RDF query processor) to provide a complete RDF system.
  • We have used a number of linked geospatial datasets. I would like to stress the heterogeneity of the datasets. We have included datasets from many application domains (e.g., sensors, land use, administrative areas, gazetteer). They also vary significantly in terms of size and complexity of the spatial geometries. DBpedia. GeoNames. LGD (LinkedGeoData, OpenStreetMaps). Pachube ("patch-bay"): connects people to devices, applications, and the Internet of Things. As a web-based service built to manage the world's real-time data, Pachube gives people the power to share, collaborate, and make use of information generated from the world around them. SwissEx (Swiss Experiment): http://www.swiss-experiment.ch A platform to enable real-time environmental experiments through wireless sensor networks and a common, modern, generic cyber-infrastructure. This infrastructure will be used to enable interdisciplinary environmental research, allowing scientists to work efficiently and collaboratively to find the key mechanisms in the triggering of natural hazards and to efficiently distribute the information to increase public awareness. CLC (Corine Land Cover). GADM (Global Administrative Areas): http://www.gadm.org/ GADM is a spatial database of the location of the world's administrative areas (or administrative boundaries) for use in GIS and similar software. Administrative areas in this database are countries and lower level subdivisions such as provinces, departments, bibhag, bundeslander, daerah istimewa, fivondronana, krong, landsvæðun, opština, sous-préfectures, counties, and thana. GADM describes where these administrative areas are (the "spatial features"), and for each area it provides some attributes, such as the name and variant names.
  • Queries 1,2: no spatial dimension
  • Cold caches vs warm caches. Green is the winning color. Red is the losing color. The various versions of Strabon are the winning ones. There are no cases where Parliament is the winning system. There are many interesting things we found in this experiment. Parliament is the reference implementation of GeoSPARQL. In queries Q1 and Q2, the baseline implementation outperforms all other systems, as these queries do not include any spatial function. As mentioned earlier, the native store of Sesame outperforms Sesame implementations on top of a DBMS. All other systems produce comparable response times for these non-spatial queries. In all other queries the DBMS-based implementations outperform Parliament and the naive implementation. Strabon PG outperforms the naive implementation since it incorporates the two stSPARQL-specific optimizations discussed in Section 4. Thus, spatial operations are evaluated by PostGIS using a spatial index, instead of being evaluated after all the results have been retrieved. The naive implementation and Parliament fail to execute Q3, as this query involves a spatial join that is very expensive for systems using a naive approach. The only exception in the behavior of the naive implementation is Q4 in the case of warm caches, where the non-spatial part of the query produces very few results and the file blocks needed for query evaluation are cached in main memory. In this case, the non-spatial part of the query is executed rapidly, while the cost of evaluating the spatial function over the results thus far is not significant. All queries except Q8 are executed significantly faster when run using Strabon on warm caches. Q8 involves many triple patterns and spatial functions, which result in the production of a large number of intermediate results. As these do not fit in the system's cache, the response time is unaffected by the cache contents. System X decides to ignore the spatial index in queries Q3 and Q6-Q8 and to evaluate any spatial predicate exhaustively over the results of a thematic join. In queries Q6-Q8, it also uses Cartesian products in the query plan, as it considers their evaluation more profitable. These decisions are correct in the case of Q8, where it outperforms all other systems significantly, but very costly in the cases of Q3, Q6 and Q7.
  • For the synthetic dataset, we have run experiments with 1 billion triples using Strabon. The naive implementation could not evaluate queries on such a large dataset, so we present graphs up to 500 million triples.
  • The data of the synthetic dataset follows a general version of the schema of Open Street Map. Each node has a spatial extent (the location of the node) and is placed uniformly on a grid with step 0.001 (min 0, max 10.24). In addition, each node is assigned a number of tags, each of which consists of a key-value pair of strings. Every node is tagged with key 1, every second node with key 2, every fourth node with key 4, and so on up to key 1024. By modifying the step of the grid, we produce datasets of arbitrary size.
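The tagging rule above can be sketched as a small Python generator. This is not the actual generator used for the experiments: the row-major node numbering, the choice that node i carries key k when i is divisible by k, and the names `tags_for` and `generate_nodes` are all assumptions made for illustration.

```python
KEYS = [2 ** e for e in range(11)]  # 1, 2, 4, ..., 1024


def tags_for(node_index):
    """Keys carried by a node: key k is assigned to every k-th node (assumed rule)."""
    return [k for k in KEYS if node_index % k == 0]


def generate_nodes(per_side, extent=10.24):
    """Yield (index, (x, y), keys) for a per_side x per_side uniform grid.

    The grid step is extent / per_side, so a smaller step gives a larger dataset.
    """
    step = extent / per_side
    for row in range(per_side):
        for col in range(per_side):
            index = row * per_side + col
            yield index, (col * step, row * step), tags_for(index)


nodes = list(generate_nodes(per_side=32))  # 1024 nodes for a quick check
with_key = {k: sum(1 for _, _, keys in nodes if k in keys) for k in KEYS}
```

On this small grid, `with_key[1]` counts every node, `with_key[2]` half of them, `with_key[4]` a quarter, and so on, mirroring the halving described in the note.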
  • In this query THIS_VALUE (value "1") allows us to select a known fraction of the generated nodes. Since every node is tagged with key 1, this triple pattern will produce a binding for every node of the dataset. By altering this value to "2", we select half the nodes of the dataset. By further changing it to "4", we select half the nodes we previously did, and so on. The extent of the POLYGON in the filter expression defines how selective the spatial predicate is. We modify the size of the polygon in order to select 10, 100, 1000, etc. spatial nodes.
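The two selectivity knobs described above can be worked through in Python. The sketch assumes the grid of step 0.001 starting at the origin and uses axis-aligned rectangles padded by half a step so that no node sits exactly on the boundary. The polygons used in the actual experiments are not given in these notes, and all function names here are hypothetical.

```python
def thematic_fraction(key):
    """Fraction of nodes selected by a key: 1 for key 1, 1/2 for key 2, and so on."""
    if not (1 <= key <= 1024) or key & (key - 1):
        raise ValueError("key must be a power of two between 1 and 1024")
    return 1.0 / key


def rectangle_bounds(cols, rows, step=0.001):
    """Bounds (xmin, ymin, xmax, ymax) that enclose cols x rows grid nodes."""
    half = step / 2
    return (-half, -half, (cols - 1) * step + half, (rows - 1) * step + half)


def bounds_to_wkt(bounds):
    """Closed WKT polygon ring for the given bounds."""
    xmin, ymin, xmax, ymax = bounds
    return (f"POLYGON(({xmin} {ymin}, {xmax} {ymin}, {xmax} {ymax}, "
            f"{xmin} {ymax}, {xmin} {ymin}))")


def count_nodes(bounds, step=0.001, per_side=64):
    """Brute-force count of grid nodes inside the bounds (for checking only)."""
    xmin, ymin, xmax, ymax = bounds
    return sum(
        1
        for i in range(per_side)
        for j in range(per_side)
        if xmin <= i * step <= xmax and ymin <= j * step <= ymax
    )


region_10 = rectangle_bounds(10, 1)
region_100 = rectangle_bounds(10, 10)
region_1000 = rectangle_bounds(40, 25)
polygon_100 = bounds_to_wkt(region_100)
```

A query with key "4" over `region_100` would then be expected to match roughly a quarter of the 100 nodes in the region.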
  • After consulting the query logs of PostgreSQL, we noticed that the vast majority of the SQL queries that were posed and derived from the query template adhered to one of the query plans of Figures 4 and 5. According to plan A, query evaluation starts by evaluating the thematic selection over table key using the appropriate index. The results are retrieved and joined with the hasTag and hasGeography predicate tables using appropriate indices. Finally, the spatial selection is evaluated by scanning the spatial index and the results are joined with the intermediate results of the previous operations. On the contrary, plan B starts with the evaluation of the spatial selection, leaving the application of the thematic selection for the end. Unfortunately, in the current version of PostGIS spatial selectivities are not computed properly. The functions that estimate the selectivity of a spatial selection/join return a constant number regardless of the actual selectivity of the operator. Thus, only the thematic selectivity affects the choice of a query plan. To state it in terms of our query template, altering the value PARAM_A between 1 and 1024 was the only factor influencing the selection of a query plan.
  • In this case, as previously discussed, PostgreSQL does not select a good plan, and the response times are even higher than when PARAM_A is equal to 1 (Figure 6a), even though the latter case produces significantly more intermediate results.
  • Two different query plans: (1) evaluation of the thematic selection first; (2) evaluation of the spatial selection first. In Postgres, the functions that estimate the selectivity of a spatial selection or a spatial join return a constant number. In other words, the spatial selectivity of a query did not dynamically affect the decision whether the spatial selection should be the first operation to execute. Therefore, only the thematic selectivity affected the choice of a query plan. To state it in terms of our query template, altering the value for the key between 1 and 1024 was the only factor influencing the query plan.
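A toy model shows why a constant spatial estimate leaves the key as the only factor in the plan choice. This is not PostgreSQL's cost model: the decision rule and the constant `ASSUMED_SPATIAL_ESTIMATE` are hypothetical values chosen only to illustrate the argument.

```python
ASSUMED_SPATIAL_ESTIMATE = 0.01  # hypothetical constant returned by the estimator


def choose_plan(key, true_spatial_selectivity):
    """Pick the plan whose first selection is estimated to be more selective.

    true_spatial_selectivity is accepted but deliberately unused: the planner
    only sees the constant estimate, never the real selectivity.
    """
    thematic_estimate = 1.0 / key
    if thematic_estimate <= ASSUMED_SPATIAL_ESTIMATE:
        return "A"  # thematic selection first
    return "B"      # spatial selection first


# Same key, very different polygons: the chosen plan does not change.
plans_for_key_1 = {choose_plan(1, s) for s in (1e-6, 1e-3, 0.5)}
plans_for_key_1024 = {choose_plan(1024, s) for s in (1e-6, 1e-3, 0.5)}
```

Under this rule a tiny polygon combined with key 1 still gets the spatial-first plan B for the wrong reason (a fixed estimate), while a huge polygon combined with key 1024 still gets plan A.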
  • So, what I would like you to take away from this slide is that Strabon not only performs well but is also one of the richest systems available at the moment.
  • Strabon: A Semantic Geospatial Database System

    1. Strabon: A Semantic Geospatial DBMS. Kostis Kyzirakos, Manos Karpathiotakis, and Manolis Koubarakis. Dept. of Informatics and Telecommunications, National and Kapodistrian University of Athens, Greece
    2. Outline: Introduction • The data model stRDF • The query language stSPARQL and a comparison to GeoSPARQL • The system Strabon for stSPARQL and GeoSPARQL • Experimental Evaluation • Related Work & Conclusions
    3. Main idea: How do we represent and query geospatial information in the Semantic Web? • Develop appropriate vocabularies and ontologies • Extend RDF to take into account the geospatial dimension • Extend SPARQL to query the new kinds of data • Use Open Geospatial Consortium (OGC) and other geospatial industry standards
    4. Example
    5. National Observatory of Athens: Fire Products Ontology
    6. Outline (current section: The data model stRDF)
    7. The Data Model stRDF. stRDF extends RDF with: • Spatial literals encoded by Boolean combinations of linear constraints • New datatype for spatial literals (strdf:geometry) • Valid time of triples encoded by Boolean combinations of temporal constraints. stRDF (most recent version): • Spatial literals encoded in Well-Known Text/GML (OGC standards) • Valid time of triples ignored for the time being
    8. Burnt Area Products
        @prefix strdf: <http://strdf.di.uoa.gr/ontology#> .
        @prefix noa: <http://teleios.di.uoa.gr/ontologies/noaOntology.owl#> .
        noa:ba_15 rdf:type noa:BurntArea ;
                  noa:hasGeometry noa:ba_15g .
        noa:ba_15g noa:hasSerialization "MULTIPOLYGON(((393801.42 4198827.92, ..., 393008 424131)));<http://www.opengis.net/def/crs/EPSG/0/2100>"^^strdf:WKT .
        (Callouts on the slide: the quoted string is the spatial literal; strdf:WKT is the spatial data type.)
    9. The stRDF Data Model
        strdf:geometry rdf:type rdfs:Datatype ; rdfs:subClassOf rdfs:Literal .
        strdf:WKT rdf:type rdfs:Datatype ; rdfs:subClassOf strdf:geometry .
        strdf:GML rdf:type rdfs:Datatype ; rdfs:subClassOf strdf:geometry .
    10. Outline (current section: The query language stSPARQL and a comparison to GeoSPARQL)
    11. stSPARQL: Geospatial SPARQL 1.1. We define a SPARQL extension function for each function defined in the OpenGIS Simple Features Access standard.
        • Basic functions: get a property of a geometry (e.g., strdf:srid); get the desired representation of a geometry (e.g., strdf:AsText); test whether a certain condition holds (e.g., strdf:IsEmpty, strdf:IsSimple)
        • Functions for testing topological spatial relationships (e.g., strdf:equals, strdf:intersects): OGC Simple Features Access, Egenhofer, RCC-8
        • Spatial analysis functions: construct new geometric objects from existing geometric objects (e.g., strdf:buffer, strdf:intersection, strdf:convexHull); spatial metric functions (e.g., strdf:distance, strdf:area)
        • Spatial aggregate functions (e.g., strdf:union, strdf:extent)
    12. stSPARQL: Geospatial SPARQL 1.1
        • Select clause: construction of new geometries (e.g., strdf:buffer(?geo, 0.1)); spatial aggregate functions (e.g., strdf:union(?geo)); metric functions (e.g., strdf:area(?geo))
        • Filter clause: functions for testing topological relationships between spatial terms (e.g., strdf:contains(?G1, strdf:union(?G2, ?G3))); numeric expressions involving spatial metric functions (e.g., strdf:area(?G1) ≤ 2*strdf:area(?G2)); Boolean combinations
        • Having clause: Boolean expressions involving spatial aggregate functions and spatial metric functions or functions testing for topological relationships between spatial terms (e.g., strdf:area(strdf:union(?geo))>1)
        • Updates
    13. stSPARQL: An example (1/2). Find coniferous forests that have been affected by fires.
        SELECT ?forest ?burntArea
        WHERE {
          ?burntArea rdf:type noa:BurntArea ;
                     noa:hasGeometry [ noa:hasSerialization ?baGeom ] .
          ?forest rdf:type noa:Region ;
                  clc:hasLandCover noa:coniferousForest ;
                  clc:hasGeometry [ clc:hasSerialization ?fGeom ] .
          FILTER(strdf:intersects(?baGeom, ?fGeom))
        }
        (Callout on the slide: strdf:intersects is a spatial function.)
    14. stSPARQL: An example (2/2). Isolate the parts of the burnt areas that lie in coniferous forests.
        SELECT ?burntArea (strdf:intersection(?baGeom, strdf:union(?fGeom)) AS ?burntForest)
        WHERE {
          ?burntArea rdf:type noa:BurntArea ;
                     noa:hasGeometry [ noa:hasSerialization ?baGeom ] .
          ?forest rdf:type noa:Region ;
                  clc:hasLandCover noa:coniferousForest ;
                  clc:hasGeometry [ clc:hasSerialization ?fGeom ] .
          FILTER(strdf:intersects(?baGeom, ?fGeom))
        }
        GROUP BY ?burntArea ?baGeom
        (Callouts on the slide: strdf:union is a spatial aggregate; strdf:intersects is a spatial function.)
    15. The OGC Standard GeoSPARQL
        • Components
          - Core
          - Topology Vocabulary Extension (parameters: relation family)
          - Geometry Extension (parameters: serialization, version)
          - Geometry Topology Extension (parameters: serialization, version, relation family)
          - Query Rewrite Extension (parameters: serialization, version, relation family)
          - RDFS Entailment Extension (parameters: serialization, version, relation family)
        • Parameters
          - Serialization: WKT, GML
          - Relation family: Simple Features, RCC-8, Egenhofer
    16. 16.  Introduction The data model stRDF The system The query language stSPARQL and a Strabon for comparison to GeoSPARQL stSPARQL and GeoSPARQL The system Strabon for stSPARQL and GeoSPARQL Experimental Evaluation strabon.di.uoa.gr Related Work & Conclusions
    17. Strabon Architecture
        • Input: stRDF graphs and stSPARQL/GeoSPARQL queries; geometries in WKT and GML
        • Strabon, built on Sesame v2.6.3
          - Query Engine: Parser, Optimizer, Evaluator, Transaction Manager
          - Storage Manager: Repository, SAIL, RDBMS
        • GeneralDB
        • PostGIS
    18. Storage Scheme
        Example triples:
          noa:ba_15 rdf:type noa:BurntArea .
          noa:ba_15 noa:hasGeometry noa:ba_15g .
          noa:ba_15g rdf:type noa:Geometry .
          noa:ba_15g noa:hasSerialization "MULTIPOLYGON(...)"^^strdf:WKT .
        Dictionary tables (ID | VALUE):
          uri_values: 1 | noa:ba_15; 2 | rdf:type; 3 | noa:BurntArea; 4 | noa:hasGeometry; 5 | noa:ba_15g; 6 | noa:Geometry; 7 | noa:hasSerialization
          label_values: 8 | "MULTIPOLYGON(...)"^^strdf:WKT
          geo_values: 8 | "MULTIPOLYGON(...)"^^strdf:WKT
          datatype_values: 8 | strdf:WKT
        Per-predicate tables (SUBJECT | OBJECT):
          type_2: 1 | 3; 5 | 6
          hasgeom_4: 1 | 5
          hasserial_7: 5 | 8
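The storage scheme on this slide combines dictionary encoding (every term gets an integer ID) with one table per predicate. The following is a minimal Python sketch of that idea using the four example triples from the slide. It assumes IDs are handed out in order of first appearance and keeps all terms in a single dictionary, whereas the slide splits values across uri_values, label_values, geo_values, and datatype_values; the function name `encode` is made up for illustration.

```python
from collections import defaultdict

# The four example triples from the slide (literal abbreviated as on the slide).
triples = [
    ("noa:ba_15", "rdf:type", "noa:BurntArea"),
    ("noa:ba_15", "noa:hasGeometry", "noa:ba_15g"),
    ("noa:ba_15g", "rdf:type", "noa:Geometry"),
    ("noa:ba_15g", "noa:hasSerialization", '"MULTIPOLYGON(...)"^^strdf:WKT'),
]


def encode(triples):
    """Dictionary-encode terms and split triples into one table per predicate."""
    ids = {}                               # term -> integer ID (the dictionary)
    predicate_tables = defaultdict(list)   # predicate ID -> [(subject ID, object ID)]

    def term_id(term):
        # Assumption: IDs are assigned in order of first appearance, from 1.
        if term not in ids:
            ids[term] = len(ids) + 1
        return ids[term]

    for s, p, o in triples:
        s_id, p_id, o_id = term_id(s), term_id(p), term_id(o)
        predicate_tables[p_id].append((s_id, o_id))
    return ids, dict(predicate_tables)


ids, tables = encode(triples)
print(ids)
print(tables)
```

With this numbering rdf:type receives ID 2, noa:hasGeometry ID 4, and noa:hasSerialization ID 7, which lines up with the table names type_2, hasgeom_4, and hasserial_7 on the slide.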
    19. Query Processing
        Pipeline: stSPARQL/GeoSPARQL query → Strabon Query Engine (Parser, Optimizer, Evaluator, Transaction Manager) → GeneralDB → PostGIS
        • Parser generates abstract syntax tree
        • Abstract syntax tree mapped to internal algebra of Sesame
        • Standard optimizations performed
        • Evaluator produces corresponding SQL query
        • DBMS evaluates the SQL query
        • Post-processing
    20. Query Processing (cont’d)
        • Deviate from the evaluation strategy of Sesame for SPARQL extension functions
        • Push the evaluation of extension functions to underlying DBMS
          - Spatial predicates evaluated by PostGIS
          - Spatial joins now affect query plan
          - Avoid Cartesian products
        • Results may be returned in well-known industry formats
          - KML/KMZ
          - GeoJSON
          - GML
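To make the pushdown idea concrete, here is a hedged Python sketch that turns a spatial filter into a single SQL query so that the spatial predicate becomes a join condition inside the DBMS. It is not the SQL that Strabon generates. `ST_Intersects`, `ST_Contains`, and `ST_Equals` are real PostGIS functions, but the mapping table, the function `spatial_join_sql`, the table names `noa_hasserial` and `clc_hasserial`, and the assumption that geo_values(ID, VALUE) stores a geometry in VALUE are all illustrative.

```python
# Assumed mapping from stSPARQL extension functions to PostGIS functions.
# The PostGIS names exist; the mapping itself is illustrative.
POSTGIS_FUNCTION = {
    "strdf:intersects": "ST_Intersects",
    "strdf:contains": "ST_Contains",
    "strdf:equals": "ST_Equals",
}


def spatial_join_sql(function: str, left_table: str, right_table: str) -> str:
    """Build one SQL query that evaluates a spatial filter inside the DBMS.

    left_table and right_table are per-predicate tables (SUBJECT, OBJECT)
    whose OBJECT column references geo_values(ID, VALUE).
    """
    predicate = POSTGIS_FUNCTION[function]
    return "\n".join([
        "SELECT l.SUBJECT, r.SUBJECT",
        f"FROM {left_table} l",
        "JOIN geo_values lg ON lg.ID = l.OBJECT",
        # The spatial predicate is the join condition, so the DBMS planner
        # can choose a spatial join instead of a Cartesian product.
        f"JOIN geo_values rg ON {predicate}(lg.VALUE, rg.VALUE)",
        f"JOIN {right_table} r ON r.OBJECT = rg.ID",
    ])


# Hypothetical per-predicate table names for the two hasSerialization predicates.
sql = spatial_join_sql("strdf:intersects", "noa_hasserial", "clc_hasserial")
print(sql)
```

The alternative, evaluating the extension function in the query engine after fetching both sides, would force the engine to enumerate all pairs first, which is the Cartesian product the slide says to avoid.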
    21. 21.  Introduction The data model Experimental stRDF The query language Evaluation stSPARQL and a comparison to GeoSPARQL The system Strabon for stSPARQL and GeoSPARQL Experimental Evaluation Related Work & Conclusions
    22. Experimental Evaluation
        • Goal: Evaluate the performance of Strabon vs other systems
        • Real workload based on geospatial linked datasets
          - 150 million triples
        • Synthetic workload
          - Half a billion triples
    23. Strabon vs other systems
        • Strabon over PostgreSQL (Strabon-PG)
        • Strabon over System X (Strabon-X)
        • Implementation over RDF-3X (Brodt et al.)
        • Parliament (BBN Technologies)
        • Naive implementation over Sesame
    24. Real world workload: Data
        Dataset | DBpedia | GeoNames | LGD | Pachube | SwissEx | CLC | GADM
        Size | 7.1 GB | 2.1 GB | 6.6 GB | 828 KB | 33 MB | 14 GB | 146 MB
        Triples | 58,722,893 | 17,688,602 | 46,296,978 | 6,333 | 277,919 | 19,711,926 | 255
        Spatial terms | 386,205 | 1,262,356 | 5,414,032 | 101 | 687 | 2,190,214 | 51
        Distinct spatial terms | 375,087 | 1,099,964 | 5,035,981 | 70 | 623 | 2,190,214 | 51
        Points | 375,087 | 1,099,964 | 3,205,015 | 70 | 623 | - | -
        Linestrings | - | - | 353,714 | - | - | - |
        Polygons | - | - | 1,704,650 | - | - | 2,190,214 | 51
        Slide annotation next to the geometry rows: 1 - 79,831 vertices
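    25. Real world workload: Queries
        Query characteristics (columns: Commonly Used, Spatial Selection, Spatial Join; the column of each mark is not preserved in this transcript):
          Query 1: X
          Query 2: X
          Query 3: X
          Query 4: X
          Query 5: X
          Query 6: X X
          Query 7: X X
          Query 8: X
        Queries with spatial joins (columns: Points, Lines, Polygons; the column of each mark is not preserved in this transcript):
          Query 5: X X X X
          Query 6: X X X
          Query 7: X
          Query 8: X X X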
    26. Real world workload: Results
        Response times in seconds.
        Cache state | System | Q1 | Q2 | Q3 | Q4 | Q5 | Q6 | Q7 | Q8
        Cold | Naive | 0.08 | 1.65 | >8h | 28.88 | 89 | 170 | 844 | 1.699
        Cold | Strabon-PG | 2.01 | 6.79 | 41.39 | 10.11 | 78.69 | 60.25 | 9.23 | 702.55
        Cold | Strabon-X | 1.74 | 3.05 | 1623 | 46.52 | 12.57 | 2409 | >8h | 57.83
        Cold | Parliament | 2.12 | 6.46 | >8h | 229.72 | 1130 | 872 | 3627 | 3786
        Warm | Naive | 0.01 | 0.03 | >8h | 0.79 | 43.07 | 88 | 708 | 1712
        Warm | Strabon-PG | 0.01 | 0.81 | 0.96 | 1.66 | 38.74 | 1.22 | 2.92 | 648.1
        Warm | Strabon-X | 0.01 | 0.26 | 1604.9 | 35.59 | 0.18 | 3196 | >8h | 44.72
        Warm | Parliament | 0.01 | 0.04 | >8h | 10.91 | 358.92 | 483.29 | 2771 | 3502
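One way to read the warm-cache rows is as per-query ratios between two systems. The short Python sketch below computes how many times faster (or slower) Strabon over PostgreSQL was than Parliament, using only the warm-cache numbers from this slide. Q3 is left out because Parliament did not finish within 8 hours; the variable names are illustrative.

```python
# Warm-cache response times in seconds, copied from the results slide.
# Q3 is omitted because Parliament did not finish within 8 hours.
strabon_pg = {"Q1": 0.01, "Q2": 0.81, "Q4": 1.66, "Q5": 38.74,
              "Q6": 1.22, "Q7": 2.92, "Q8": 648.1}
parliament = {"Q1": 0.01, "Q2": 0.04, "Q4": 10.91, "Q5": 358.92,
              "Q6": 483.29, "Q7": 2771, "Q8": 3502}

# A ratio above 1 means Strabon-PG answered faster than Parliament;
# a ratio below 1 means it was slower.
speedup = {q: parliament[q] / strabon_pg[q] for q in strabon_pg}

for q, ratio in speedup.items():
    print(f"{q}: {ratio:.2f}x")
```

The ratios are not uniform: on Q2 the ratio is below 1, so Strabon-PG was slower there, while the queries with spatial joins (Q5 to Q8) show the largest gaps in its favour.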
    27. Synthetic Workload
        • Workload based on a synthetic dataset
          - Dataset based on OpenStreetMap
          - 10 million triples (2 GB) up to half a billion triples (50 GB)
          - Implemented custom bulk loader
          - Triples with spatial literals: 1 up to 46 million triples
        • Response time of queries with various thematic and spatial selectivities
    28. Synthetic Workload: Sample Data
    29. Synthetic Workload: Sample stSPARQL Query
        SELECT *
        WHERE {
          ?tag geordf:key "1" .
          ?node geordf:hasTag ?tag .
          ?node geo:hasGeography1 ?geo .
          FILTER (strdf:inside(?geo, "POLYGON((-1 -1, 0.056568542 -1, 0.056568542 0.056568542, -1 0.056568542, -1 -1))"^^strdf:WKT ))
        }
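The synthetic workload varies two knobs: the tag key (thematic selectivity) and the size of the query rectangle (spatial selectivity). The Python sketch below shows one way such queries could be generated from a template. The template text is copied from the sample query on this slide; the helper names `rectangle_wkt` and `build_query` are hypothetical, and this is not the generator used in the experiments.

```python
# Query template copied from the sample query; doubled braces are literal
# braces for str.format.
QUERY_TEMPLATE = """SELECT *
WHERE {{
  ?tag geordf:key "{tag}" .
  ?node geordf:hasTag ?tag .
  ?node geo:hasGeography1 ?geo .
  FILTER (strdf:inside(?geo, "{wkt}"^^strdf:WKT))
}}"""


def rectangle_wkt(xmin, ymin, xmax, ymax) -> str:
    """Return a closed WKT polygon ring for an axis-aligned rectangle."""
    corners = [(xmin, ymin), (xmax, ymin), (xmax, ymax), (xmin, ymax), (xmin, ymin)]
    return "POLYGON((" + ", ".join(f"{x} {y}" for x, y in corners) + "))"


def build_query(tag, xmin, ymin, xmax, ymax) -> str:
    # A larger rectangle selects more nodes (lower spatial selectivity);
    # a different tag key changes the thematic selectivity.
    return QUERY_TEMPLATE.format(tag=tag, wkt=rectangle_wkt(xmin, ymin, xmax, ymax))


# Reproduce the rectangle of the sample query for the two tags shown in the plots.
sample_wkt = rectangle_wkt(-1, -1, 0.056568542, 0.056568542)
queries = [build_query(tag, -1, -1, 0.056568542, 0.056568542) for tag in ("1", "1024")]
print(queries[0])
```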
    30. Real-world Workload: 100 million triples, warm caches
        [Plots: response time (sec) vs. number of nodes in query region; panels: Tag 1, Tag 1024]
    31. Real-world Workload: 100 million triples, cold caches
        [Plots: response time vs. number of nodes in query region; panels: Tag 1, Tag 1024]
    32. Strabon-PG: 500 million triples, tags comparison
        [Plot: response time (sec)]
    33. Real-world Workload: Query Plans
    34. Findings
        • Strabon over PostgreSQL outperforms other systems in case of warm caches
        • Results in case of cold caches are mixed
        • The PostgreSQL optimizer needs to take spatial selectivity into account
          - PostGIS 2.0 moves towards it
    35. System | Language | Index | Geometries | CRS support | Geospatial function support
        Strabon | stSPARQL/GeoSPARQL* | R-tree-over-GiST | WKT / GML support | Yes | OGC-SFA, Egenhofer, RCC-8
        Parliament | GeoSPARQL* | R-Tree | WKT / GML support | Yes | OGC-SFA, Egenhofer, RCC-8
        Oracle 12c | GeoSPARQL | R-Tree, Quadtree | WKT | Yes | OGC-SFA
        Brodt et al. (RDF-3X) | SPARQL | R-Tree | WKT support | No | OGC-SFA
        Perry | SPARQL-ST | R-Tree | GeoRSS GML | Yes | RCC-8
        AllegroGraph | Extended SPARQL | Distribution sweeping technique | 2D point geometries | Partial | Buffer, Bounding Box, Distance
        OWLIM | Extended SPARQL | Custom | 2D point geometries | No | Point-in-polygon, Buffer, Distance
        Virtuoso | SPARQL | R-Tree | 2D point geometries | Yes | SQL/MM (subset)
        uSeekM | SPARQL | R-tree-over-GiST | WKT support | No | OGC-SFA
    36. 36.  Introduction The data model stRDF The query Related Work & language stSPARQL and a comparison to Conclusions GeoSPARQL The system Strabon for stSPARQL and GeoSPARQL Experimental Evaluation Related Work & Conclusions
    37. Future Work
        • Use even larger datasets
          - Test with 1 billion triples successful
        • Implement the temporal part of stSPARQL
        • Develop a benchmark for geospatial RDF stores
          - Consider more systems: GeoSPARQL implementation of Oracle, Virtuoso
        • stSPARQL query processing in MonetDB
        • Go distributed!
          - Federated queries
    38. Thanks! Any Questions?
        • Strabon
          - Manolis Koubarakis, Kostis Kyzirakos, Manos Karpathiotakis, Charalampos Nikolaou, Giorgos Garbis, Konstantina Bereta, Kallirroi Dogani, Stella Giannakopoulou and Panayiotis Smeros
          - Web site: http://strabon.di.uoa.gr
          - Mercurial repository: http://hg.strabon.di.uoa.gr
          - Trac: http://bug.strabon.di.uoa.gr
          - Mailing list: http://cgi.di.uoa.gr/~mailman/listinfo/strabon-users
        • Real Time Fire Monitoring Service, National Observatory of Athens
          - http://papos.space.noa.gr/fend_static
        • Greek Linked Open Data
          - http://www.linkedopendata.gr
        • TELEIOS EU Project
          - http://www.earthobservatory.eu
        • SemsorGrid4Env EU Project
          - http://www.semsorgrid4env.eu