December 9, 2021
Three Years of the ExtremeEarth Project
Online Workshop
Antonis Troumpoukis (NCSR-D)
Federating Big Linked Geospatial Data
Outline
• Semagrow federated query processor
• KOBE benchmarking engine
• ExtremeEarth Use-cases
Federated query processors
• Systems that seamlessly integrate data from multiple
remote dataset servers.
• Receive a query, issue the necessary subqueries to the
remote servers, combine the results accordingly, and
present the result to the user.
• Used extensively in Linked Data: many data providers
publish their datasets through SPARQL endpoints.
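The receive, subquery, combine cycle described above can be illustrated with a minimal sketch. It is not Semagrow code: the in-memory `ENDPOINTS` dictionary, the predicate names, and the `subquery` helper are hypothetical stand-ins for remote SPARQL endpoints and the requests sent to them.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)

# Hypothetical in-memory stand-ins for two remote dataset servers.
ENDPOINTS: Dict[str, List[Triple]] = {
    "crops": [("field1", "hasCrop", "potato"), ("field2", "hasCrop", "wheat")],
    "snow": [("field1", "coveredBy", "snow")],
}


def subquery(endpoint: str, predicate: str) -> List[Triple]:
    """Stand-in for sending one subquery to one remote server."""
    return [t for t in ENDPOINTS[endpoint] if t[1] == predicate]


def federated_query() -> List[str]:
    """Return subjects that have a crop and are covered by snow."""
    crops = subquery("crops", "hasCrop")    # subquery to the first server
    snow = subquery("snow", "coveredBy")    # subquery to the second server
    snowy = {s for s, _, _ in snow}
    # Combine the partial results with a join on the shared subject.
    return sorted(s for s, _, _ in crops if s in snowy)


print(federated_query())
```

The client sees one query and one result list; the split into per-server subqueries and the join stay inside the federation layer.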
Semagrow federated query processor
• Semagrow is an open source dynamic data integration system:
o makes the most of all public data, regardless of their size, update rate, and schema.
o presents to client applications a single, unified SPARQL endpoint that federates multiple data sources.
o manages both syntactic and semantic heterogeneity.
• The federated data sources may serve data that use different vocabularies and codelists
o Semagrow dynamically transforms responses from the different data sources to match the vocabularies used in the
query.
• The federated data sources may offer SPARQL, GeoSPARQL, SQL, or CQL (CassandraQL) APIs.
o Semagrow processes SPARQL queries and appropriately re-writes the sub-queries for each data source.
o Semagrow fills in the missing expressivity, e.g. arbitrary joins for CQL sources
History
• Originally developed in FP7 SemaGrow:
o SPARQL endpoint federation engine [1]
o Multi-threaded application, deployed directly on server or VM
o Dynamic vocabulary mapping
• Extended in H2020 BigDataEurope:
o Containerization and ability to deploy on Cloud infrastructures
o Re-engineered architecture to allow executor plugins that manage syntactic heterogeneity [2]
o Limited support for big linked geospatial data [3]
▪ The federation could include only one geospatial data source
• During ExtremeEarth, we have developed a new version of Semagrow. Now, Semagrow is the first
federation engine to be able to federate multiple geospatial data sources.
[1] A. Charalambidis, A. Troumpoukis, S. Konstantopoulos: SemaGrow: optimizing federated SPARQL queries. In SEMANTICS 2015, Vienna, Austria, September 15-17, 2015
[2] S. Konstantopoulos, A. Charalambidis, G. Mouchakis, A. Troumpoukis, J. Jakobitsch, V. Karkaletsis: Semantic Web Technologies and Big Data Infrastructures: SPARQL Federated
Querying of Heterogeneous Big Data Stores. In ISWC 2016 (Posters & Demos), Kobe, Japan, October 17-21, 2016
[3] A. Davvetas, I. Klampanos, S. Andronopoulos, G. Mouchakis, S. Konstantopoulos, A. Ikonomopoulos, V. Karkaletsis: Big Data Processing and Semantic Web Technologies for Decision
Making in Hazardous Substance Dispersion Emergencies. In ISWC 2017 (Posters, Demos & Industry Tracks), Vienna, Austria, October 21-25, 2017
The Semagrow architecture
• Source selector: identifies which of the federated sources refer to which parts of the query.
• Query planner: constructs an efficient query execution plan.
• Query executor: evaluates the query execution plan and returns the results to the client.
Source Selector
● Identifies which of the federated sources refer to which parts of the query
● Should exclude as many redundant sources as possible, but without removing any necessary sources
● Combines two mechanisms:
○ Thematic data:
■ Extended with all the state-of-the-art source selection methods
● Predicate and Class metadata
● ASK queries (and cache)
● URI-Prefix-based source selection
○ Geospatial data:
■ A novel approach that targets geospatial data sources [4]
● Annotate all federated data sources with a bounding polygon
● Use such a summary to filter out sources that refer to irrelevant areas
[4] A. Troumpoukis, S. Konstantopoulos, N. Prokopaki-Kostopoulou: A Geospatial Source Selector for Federated GeoSPARQL Querying, To be submitted in SWJ.
Geospatial source selection
Method
• Each data source is tagged with a bounding polygon that contains all geometries of the source.
• For each triple pattern of the form ?x geo:asWKT ?y, prune the set of sources returned by the wrapped
source selectors, using the pattern's relevant geospatial filters and the bounding polygons of those sources.
o Geospatial selections: geospatial filters with one free variable
▪ ?s geo:asWKT ?x .
FILTER (geof:sfIntersects(?x, KNOWN_WKT))
▪ If the bounding polygon of a candidate source for the pattern is disjoint
from KNOWN_WKT, then the source is irrelevant.
o Geospatial joins: geospatial filters with two free variables
▪ ?s1 geo:asWKT ?x .
?s2 geo:asWKT ?y .
FILTER (geof:sfWithin(?x, ?y))
▪ If the bounding polygon of a candidate source for the first pattern is
disjoint from the bounding polygons of all candidate sources for the
second pattern, then the former source is irrelevant.
• We consider standard spatial relations (all apart from disjoint) and within-distance comparisons.
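The two pruning rules can be sketched as follows. For brevity the sketch assumes axis-aligned bounding boxes instead of the bounding polygons used in the method, and it covers only the non-distance relations; the source names and coordinates are made up for illustration.

```python
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (min_x, min_y, max_x, max_y)


def disjoint(a: Box, b: Box) -> bool:
    """True if the two boxes share no point."""
    return a[2] < b[0] or b[2] < a[0] or a[3] < b[1] or b[3] < a[1]


def prune_selection(candidates: Dict[str, Box], known: Box) -> List[str]:
    """Geospatial selection: keep sources whose summary is not disjoint from the known geometry."""
    return sorted(s for s, box in candidates.items() if not disjoint(box, known))


def prune_join(left: Dict[str, Box], right: Dict[str, Box]) -> List[str]:
    """Geospatial join: keep a left source only if it overlaps at least one right source."""
    return sorted(
        s for s, box in left.items()
        if any(not disjoint(box, other) for other in right.values())
    )


# Hypothetical source summaries.
crops = {"crops_west": (0.0, 0.0, 10.0, 10.0), "crops_east": (20.0, 0.0, 30.0, 10.0)}
snow = {"snow_north": (5.0, 5.0, 15.0, 15.0)}

print(prune_selection(crops, (2.0, 2.0, 4.0, 4.0)))
print(prune_join(crops, snow))
```

Pruning is conservative: overlapping summaries do not guarantee matching geometries, but disjoint summaries guarantee that none exist, so no necessary source is dropped.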
Query Planner
• Constructs an efficient query execution plan
• Uses endpoint statistics and dynamic-programming to find an optimal query plan.
• Federated geospatial joins: Bind join strategy with filter-pushdown optimization
o Reduction of the communication cost
o Geospatial functions are evaluated faster in the sources (spatial index)
• Evaluation of complex thematic queries:
o Examples:
▪ Subqueries (inner SELECT queries)
▪ ORDER BY, LIMIT 1
▪ FILTER NOT EXISTS
o Such queries appear in the Use-cases (Data-Validation of Land Usage Data)
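A bind join with filter pushdown, as mentioned above for federated geospatial joins, can be sketched like this. The point geometries, the two in-memory sources, and the `right_subquery` helper are illustrative assumptions; in a real federation the right-hand call would be a remote query carrying the bound value and the filter.

```python
from math import hypot
from typing import List, Tuple

Point = Tuple[float, float]

# Hypothetical contents of two federated sources (points instead of full geometries).
LEFT_SOURCE: List[Tuple[str, Point]] = [("field1", (0.0, 0.0)), ("field2", (50.0, 50.0))]
RIGHT_SOURCE: List[Tuple[str, Point]] = [("snow1", (3.0, 4.0)), ("snow2", (100.0, 100.0))]


def right_subquery(bound: Point, max_dist: float) -> List[Tuple[str, Point]]:
    """Stand-in for a remote subquery with the filter pushed down.

    The distance filter runs at the source, so only matching rows are returned.
    """
    return [
        (name, p) for name, p in RIGHT_SOURCE
        if hypot(p[0] - bound[0], p[1] - bound[1]) <= max_dist
    ]


def bind_join(max_dist: float) -> List[Tuple[str, str]]:
    """For each left binding, ask the right source only for rows matching that binding."""
    results = []
    for left_name, left_geom in LEFT_SOURCE:
        for right_name, _ in right_subquery(left_geom, max_dist):
            results.append((left_name, right_name))
    return results


print(bind_join(10.0))
```

Without pushdown the federator would fetch every right-hand row and filter locally; with pushdown the unmatched rows never cross the network.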
Query Executor
• Evaluates the query execution plan and returns the results to the client.
o provides a mechanism for issuing queries to the remote endpoints
o provides an implementation of all geospatial operators that may appear in the plan
▪ GeoSPARQL, stSPARQL functions
• PostGIS connector:
o Semagrow allows executor plugins for non-SPARQL endpoints
o a plugin for communicating directly with PostGIS databases with shapefile data that contain
geometric shapes exclusively.
• Optimization of federated geospatial within-distance joins [5]
o Insert additional geospatial filters in the source queries
o Filter out shapes that are too far away, using the spatial index of the source
[5] A. Troumpoukis, S. Konstantopoulos, N. Prokopaki-Kostopoulou: A Geospatial Join Optimization for Federated GeoSPARQL Querying, To be submitted in ESWC2022.
Geospatial join optimization
Method
• Situation: bind join with filter pushdown optimization. Example:
?s1 geo:asWKT ?x .
?s2 geo:asWKT ?y .
FILTER (geof:distance(?x, ?y, uom:metre) < 10).
• Such queries appear frequently in the use cases
• The within-distance operation is computationally expensive: It cannot be
answered from the spatial index.
• Intuition:
o ?x is bound from the left part of the federated join, so during query execution the filter looks like this:
FILTER (geof:distance(KNOWN_WKT, ?y, uom:metre) < 10).
o To help the federated endpoint evaluate the remote query, we can add the filter
FILTER (geof:sfIntersects(?y, CONSTRUCTED_BOX))
where CONSTRUCTED_BOX is the bounding box of the buffer of size D around KNOWN_WKT, and D is the distance threshold (10 metres in this example).
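One way to compute such a box is sketched below, under the assumption of planar coordinates expressed in the same unit as D (for longitude/latitude data the threshold would first have to be converted). In that setting the bounding box of the buffer is simply the bounding box of the geometry's vertices grown by D on every side. The function names are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float]
Box = Tuple[float, float, float, float]  # (min_x, min_y, max_x, max_y)


def constructed_box(vertices: List[Point], d: float) -> Box:
    """Bounding box of the buffer of size d around a geometry given by its vertices."""
    xs = [x for x, _ in vertices]
    ys = [y for _, y in vertices]
    return (min(xs) - d, min(ys) - d, max(xs) + d, max(ys) + d)


def box_to_wkt(box: Box) -> str:
    """Serialise the box as a closed WKT polygon ring."""
    min_x, min_y, max_x, max_y = box
    return (
        f"POLYGON(({min_x} {min_y}, {max_x} {min_y}, "
        f"{max_x} {max_y}, {min_x} {max_y}, {min_x} {min_y}))"
    )


box = constructed_box([(0.0, 0.0), (4.0, 2.0)], 10.0)
print(box)
print(box_to_wkt(box))
```

Any shape within distance D of the known geometry must intersect this box, so the extra filter never removes a valid answer; it only lets the source discard far-away shapes through its spatial index before the exact distance test.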
Semagrow in ExtremeEarth
• Integration within Hopsworks
• Provides an extra layer over big linked
geospatial data store Strabo2
• Can be used to combine the data stored
in Strabo2 with additional external
geospatial endpoints.
History
• During the benchmarking activities of the FP7 SemaGrow project, we needed a
framework to help us conduct experiments.
• Originally developed in H2020 BigDataEurope:
o Docker Containerization to abstract from the installation intricacies of each system
• During ExtremeEarth we explored this idea even further…
The KOBE Open Benchmarking Engine
• KOBE is a framework for benchmarking federated query engines.
• Features:
o Automation of the various tasks:
deployment, initialization of dataset servers and federation engines,
experiment execution
o Reproducibility in different environments:
each component in its own Docker container
o Declarative specifications:
formalism that hides from the user the details of provisioning and
orchestrating
o Simulating real-life scenarios:
network delays (dataset server latency limitations)
o Results presentation:
collection of logs and visualization in a WebUI
o Extensibility:
supports the integration of new benchmarks, new federators and
new remote dataset servers
The KOBE Open Benchmarking Engine (cont.)
• Re-engineered KOBE into 3 subsystems (Deployment, Networking, Logging)
• Technologies used: Docker, Kubernetes for orchestration, Istio for simulating delays, EFK stack for logs
• Command line interface for control, Kibana dashboards for viewing the results
[6] C. Kostopoulos, G. Mouchakis, A. Troumpoukis, N. Prokopaki-Kostopoulou, A. Charalambidis, S. Konstantopoulos: KOBE: Cloud-Native Open Benchmarking Engine for
Federated Query Processors. In ESWC 2021: 664-679
[7] C. Kostopoulos, G. Mouchakis, N. Prokopaki-Kostopoulou, A. Troumpoukis, A. Charalambidis, S. Konstantopoulos: KOBE: Cloud-native Open Benchmarking Engine for
Federated Query Processors. In ISWC (Demos/Industry) 2020: 325-330
The KOBE Open Benchmarking Engine (cont.)
• Dataset servers: Virtuoso, Strabo2.
• Federation engines: Semagrow, FedX.
• Benchmarks: Fedbench, LargeRDFBench, OPFbench, Geographica2, Geofedbench.
• Detailed Documentation (step by step instructions for getting started, using and extending KOBE).
https://semagrow.github.io/kobe/ (publicly available)
Combining Snow-cover data with Crop-type data
for Food Security
Datasets
• 3 data layers that cover Austria:
o Administrative, Snow cover, Crop type data
o each layer is partitioned geospatially
• Each dataset contains a single thematic layer and
refers to a specific polygonal area.
• 4.5 million triples, ~4GB of data in N-triples
• 34 GeoSPARQL endpoints.
We envisage that Austrian state governments publish crop datasets
for their own area of responsibility; and a further (different) entity
publishes a snow cover dataset that ignores state boundaries and
publishes its datasets according to a geographical grid.
Example: All snow-covered crops within a specific
area of interest (shown in red) appear only in 2 of the
total 12 datasets.
Combining Snow-cover data with Crop-type data
for Food Security
Queries
Q1 municipalities intersecting a given polygon
Q2 snow-covered potato fields intersecting a given polygon
Q3 potato fields within 5km from snow and intersecting a given polygon
Q4 snow area within 5km from a given municipality
Q5 potato fields within a given municipality
Q6 snow-covered potato fields within given municipality
Q7 potato fields within 5km from snow and within a given municipality
Combining Snow-cover data with Crop-type data
for Food Security
Experimental results
Query processing time per source selection method
Query  #layers  geo-poly  geo-appr  them
Q1     1        0.200     0.205     0.180
Q2     2        0.985     0.525     0.755
Q3     2        5.245     1.215     1.810
Q4     2        8.785     7.940     9.025
Q5     2        0.605     0.445     0.520
Q6     3        15.535    n/a       n/a
Q7     3        39.670    n/a       n/a
• them uses no geospatial metadata; geo-poly and geo-appr use geospatial metadata,
and geo-poly has more precise boundaries than geo-appr.
• Q1 is the easiest (1 data layer); them is the fastest.
• Q6-Q7 are the most difficult (3 data layers);
only geo-poly can evaluate these queries.
• Q2-Q5 are of intermediate difficulty (2 data layers);
geo-appr is preferred (geo-poly spends too much
time in source selection, them spends more
time in planning and execution).
Validating Land-Usage Data
Datasets
• Austrian Land Parcel Identification System (INVEKOS)
o crop parcels in Austria and the owners' self-declaration about the
crops grown in each parcel
• Land Use and Cover Area Survey (LUCAS)
o agro-environmental and soil data by field observation of
geographically referenced points
• Task: Validate crop-type data of INVEKOS using LUCAS
• Crop-type map provided by UNITN
• 14.1 million triples, ~4GB of data in N-triples format
Validating Land-Usage Data
Queries
Q1 (positive validation): given a LUCAS instance, return the closest
INVEKOS instance if it is within 10 meters and their crop types match
Q2 (negative validation): given a LUCAS instance, return the closest
INVEKOS instance if it is within 10 meters and their crop types do not match
Q3 (irrelevant): given a LUCAS instance, return it if there is
no INVEKOS instance within 10 meters
Example: 3 ground observations located in
the roads adjacent to field parcels, used for
crop-type validation of the field dataset. 2
of them (the green ones) provide a positive
and the other one provides a negative
validation.
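The decision logic behind the three validation queries can be summarised in a small sketch. It is an illustration only: parcels are reduced to a single representative point (real INVEKOS parcels are polygons), coordinates are assumed to be planar and in metres, and the `validate` function and sample data are hypothetical.

```python
from math import hypot
from typing import List, Tuple

Point = Tuple[float, float]
Parcel = Tuple[str, Point, str]  # (parcel id, representative point, declared crop)


def validate(obs_point: Point, obs_crop: str, parcels: List[Parcel],
             max_dist: float = 10.0) -> str:
    """Classify one ground observation as positive, negative, or irrelevant."""
    nearby = []
    for _, (px, py), crop in parcels:
        dist = hypot(px - obs_point[0], py - obs_point[1])
        if dist <= max_dist:
            nearby.append((dist, crop))
    if not nearby:
        return "irrelevant"          # Q3: no parcel within the threshold
    _, closest_crop = min(nearby)    # closest parcel within the threshold
    # Q1 if the crop types match, Q2 otherwise.
    return "positive" if closest_crop == obs_crop else "negative"


parcels = [("p1", (0.0, 0.0), "potato"), ("p2", (100.0, 0.0), "wheat")]
print(validate((3.0, 4.0), "potato", parcels))
print(validate((3.0, 4.0), "wheat", parcels))
print(validate((50.0, 50.0), "potato", parcels))
```

The "closest within 10 meters" condition is what makes these queries need within-distance joins together with ordering and a limit of one result.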
Validating Land-Usage Data
Experimental results
• PostGIS: all data in a single PostGIS database. semagrow-std: Semagrow without the within-distance optimization.
semagrow-opt: Semagrow with the optimization.
• Semagrow without the optimization is slower than, but comparable to, standalone PostGIS.
• Optimized Semagrow is faster by more than an order of magnitude (roughly 30 to 65 times on average query time).
Query execution time
Query  #queries  PostGIS total  PostGIS average  semagrow-std total  semagrow-std average  semagrow-opt total  semagrow-opt average
Q1     2488      54 hours       78.6 sec         83 hours            120 sec               106 mins            2.6 sec
Q2     2488      54 hours       78.4 sec         82 hours            119 sec               99 mins             2.4 sec
Q3     2488      54 hours       78.6 sec         81 hours            117 sec               74 mins             1.8 sec
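The speedup implied by the average execution times reported above can be checked with a few lines of arithmetic. The numbers are copied from the results; the dictionary layout and the `speedup` helper are only a convenience for this sketch.

```python
# Average execution time per query, in seconds, as reported in the results above.
AVERAGE_SECONDS = {
    "Q1": {"PostGIS": 78.6, "semagrow-std": 120.0, "semagrow-opt": 2.6},
    "Q2": {"PostGIS": 78.4, "semagrow-std": 119.0, "semagrow-opt": 2.4},
    "Q3": {"PostGIS": 78.6, "semagrow-std": 117.0, "semagrow-opt": 1.8},
}


def speedup(query: str, baseline: str) -> float:
    """How many times faster semagrow-opt is than the given baseline."""
    times = AVERAGE_SECONDS[query]
    return times[baseline] / times["semagrow-opt"]


for query in AVERAGE_SECONDS:
    for baseline in ("PostGIS", "semagrow-std"):
        print(query, baseline, round(speedup(query, baseline), 1))
```

The ratios range from about 30 (Q1 against PostGIS) to 65 (Q3 against semagrow-std).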
Combining EE data with public endpoints
Datasets and Queries
• Semagrow endpoint in the Hopsworks-TEP infrastructure.
o Endpoint 1:
▪ Strabo2 endpoint already deployed in the Hopsworks-TEP infrastructure
▪ Contains ExtremeEarth (EE) data
o Endpoint 2:
▪ Public Strabon endpoint
▪ Contains GADM data for Germany
• Federated query for the demo (“Query1” for a specific administrative region)
o Regions where precipitation in Quarter 2 of 2021 was more than 15% below
normal rainfall, that are equipped with irrigation, and that intersect the state of Brandenburg
o Semagrow operates as follows:
▪ Retrieve the WKT of the state of Brandenburg from Endpoint 2
▪ Retrieve from Endpoint 1 all relevant EE data found within that WKT
o Returns 447 results.
Conclusions
• We have developed a new version of Semagrow. Semagrow is now the first federation engine
able to federate multiple big linked geospatial data sources.
• We have developed a new version of the KOBE benchmarking engine, which is a useful tool for
benchmarking federated query processors.
• We have applied Semagrow to several exercises and use cases from the ExtremeEarth project
(Land-usage data validation, Combination of snow-cover and crop-type data for food security, etc.)
Thank you!

More Related Content

What's hot

Sky Arrays - ArrayDB in action for Sky View Factor Computation
Sky Arrays - ArrayDB in action for Sky View Factor ComputationSky Arrays - ArrayDB in action for Sky View Factor Computation
Sky Arrays - ArrayDB in action for Sky View Factor Computation
EUDAT
 
Opening the Path to Technical Excellence
Opening the Path to Technical ExcellenceOpening the Path to Technical Excellence
Opening the Path to Technical Excellence
NETWAYS
 
Clustering
ClusteringClustering
Clustering
Anjan Goswami
 

What's hot (20)

Enabling Access to Big Geospatial Data with LocationTech and Apache projects
Enabling Access to Big Geospatial Data with LocationTech and Apache projectsEnabling Access to Big Geospatial Data with LocationTech and Apache projects
Enabling Access to Big Geospatial Data with LocationTech and Apache projects
 
GeoSpatially enabling your Spark and Accumulo clusters with LocationTech
GeoSpatially enabling your Spark and Accumulo clusters with LocationTechGeoSpatially enabling your Spark and Accumulo clusters with LocationTech
GeoSpatially enabling your Spark and Accumulo clusters with LocationTech
 
Project Matsu: Elastic Clouds for Disaster Relief
Project Matsu: Elastic Clouds for Disaster ReliefProject Matsu: Elastic Clouds for Disaster Relief
Project Matsu: Elastic Clouds for Disaster Relief
 
Q4 2016 GeoTrellis Presentation
Q4 2016 GeoTrellis PresentationQ4 2016 GeoTrellis Presentation
Q4 2016 GeoTrellis Presentation
 
GeoMesa LocationTech DC
GeoMesa LocationTech DCGeoMesa LocationTech DC
GeoMesa LocationTech DC
 
Sky Arrays - ArrayDB in action for Sky View Factor Computation
Sky Arrays - ArrayDB in action for Sky View Factor ComputationSky Arrays - ArrayDB in action for Sky View Factor Computation
Sky Arrays - ArrayDB in action for Sky View Factor Computation
 
HP - Jerome Rolia - Hadoop World 2010
HP - Jerome Rolia - Hadoop World 2010HP - Jerome Rolia - Hadoop World 2010
HP - Jerome Rolia - Hadoop World 2010
 
Bioclouds CAMDA (Robert Grossman) 09-v9p
Bioclouds CAMDA (Robert Grossman) 09-v9pBioclouds CAMDA (Robert Grossman) 09-v9p
Bioclouds CAMDA (Robert Grossman) 09-v9p
 
OCC Overview OMG Clouds Meeting 07-13-09 v3
OCC Overview OMG Clouds Meeting 07-13-09 v3OCC Overview OMG Clouds Meeting 07-13-09 v3
OCC Overview OMG Clouds Meeting 07-13-09 v3
 
BigDataEurope 1st SC5 Workshop, Project Teleios & LEO, by M. Koubarakis, Univ...
BigDataEurope 1st SC5 Workshop, Project Teleios & LEO, by M. Koubarakis, Univ...BigDataEurope 1st SC5 Workshop, Project Teleios & LEO, by M. Koubarakis, Univ...
BigDataEurope 1st SC5 Workshop, Project Teleios & LEO, by M. Koubarakis, Univ...
 
CourboSpark: Decision Tree for Time-series on Spark
CourboSpark: Decision Tree for Time-series on SparkCourboSpark: Decision Tree for Time-series on Spark
CourboSpark: Decision Tree for Time-series on Spark
 
Deep Learning on Aerial Imagery: What does it look like on a map?
Deep Learning on Aerial Imagery: What does it look like on a map?Deep Learning on Aerial Imagery: What does it look like on a map?
Deep Learning on Aerial Imagery: What does it look like on a map?
 
Opening the Path to Technical Excellence
Opening the Path to Technical ExcellenceOpening the Path to Technical Excellence
Opening the Path to Technical Excellence
 
SmartMet Server OSGeo
SmartMet Server OSGeoSmartMet Server OSGeo
SmartMet Server OSGeo
 
Accelerating the Experimental Feedback Loop: Data Streams and the Advanced Ph...
Accelerating the Experimental Feedback Loop: Data Streams and the Advanced Ph...Accelerating the Experimental Feedback Loop: Data Streams and the Advanced Ph...
Accelerating the Experimental Feedback Loop: Data Streams and the Advanced Ph...
 
Clustering
ClusteringClustering
Clustering
 
State of the Map US 2018: Analytic Support to Mapping Contributors
State of the Map US 2018: Analytic Support to Mapping ContributorsState of the Map US 2018: Analytic Support to Mapping Contributors
State of the Map US 2018: Analytic Support to Mapping Contributors
 
Working with OpenStreetMap using Apache Spark and Geotrellis
Working with OpenStreetMap using Apache Spark and GeotrellisWorking with OpenStreetMap using Apache Spark and Geotrellis
Working with OpenStreetMap using Apache Spark and Geotrellis
 
MapReduce Algorithm Design
MapReduce Algorithm DesignMapReduce Algorithm Design
MapReduce Algorithm Design
 
A Scalable Dataflow Implementation of Curran's Approximation Algorithm
A Scalable Dataflow Implementation of Curran's Approximation AlgorithmA Scalable Dataflow Implementation of Curran's Approximation Algorithm
A Scalable Dataflow Implementation of Curran's Approximation Algorithm
 

Similar to Big Linked Data Federation - ExtremeEarth Open Workshop

OREChem Services and Workflows
OREChem Services and WorkflowsOREChem Services and Workflows
OREChem Services and Workflows
marpierc
 

Similar to Big Linked Data Federation - ExtremeEarth Open Workshop (20)

ICWE2017 BigDataEurope
ICWE2017 BigDataEuropeICWE2017 BigDataEurope
ICWE2017 BigDataEurope
 
RDF-Gen: Generating RDF from streaming and archival data
RDF-Gen: Generating RDF from streaming and archival dataRDF-Gen: Generating RDF from streaming and archival data
RDF-Gen: Generating RDF from streaming and archival data
 
Lodstats: The Data Web Census Dataset. Kobe, Japan, 2016
Lodstats: The Data Web Census Dataset. Kobe, Japan, 2016Lodstats: The Data Web Census Dataset. Kobe, Japan, 2016
Lodstats: The Data Web Census Dataset. Kobe, Japan, 2016
 
BDE SC3.3 Workshop - BDE Platform: Technical overview
 BDE SC3.3 Workshop -  BDE Platform: Technical overview BDE SC3.3 Workshop -  BDE Platform: Technical overview
BDE SC3.3 Workshop - BDE Platform: Technical overview
 
CLIM Program: Remote Sensing Workshop, The Earth System Grid Federation as a ...
CLIM Program: Remote Sensing Workshop, The Earth System Grid Federation as a ...CLIM Program: Remote Sensing Workshop, The Earth System Grid Federation as a ...
CLIM Program: Remote Sensing Workshop, The Earth System Grid Federation as a ...
 
SERENE 2014 School: Daniel varro serene2014_school
SERENE 2014 School: Daniel varro serene2014_schoolSERENE 2014 School: Daniel varro serene2014_school
SERENE 2014 School: Daniel varro serene2014_school
 
SERENE 2014 School: Incremental Model Queries over the Cloud
SERENE 2014 School: Incremental Model Queries over the CloudSERENE 2014 School: Incremental Model Queries over the Cloud
SERENE 2014 School: Incremental Model Queries over the Cloud
 
OREChem Services and Workflows
OREChem Services and WorkflowsOREChem Services and Workflows
OREChem Services and Workflows
 
What is a Data Commons and Why Should You Care?
What is a Data Commons and Why Should You Care? What is a Data Commons and Why Should You Care?
What is a Data Commons and Why Should You Care?
 
WMS Performance Shootout 2011
WMS Performance Shootout 2011WMS Performance Shootout 2011
WMS Performance Shootout 2011
 
Experiences In Building Globus Genomics Using Galaxy, Globus Online and AWS
Experiences In Building Globus Genomics Using Galaxy, Globus Online and AWSExperiences In Building Globus Genomics Using Galaxy, Globus Online and AWS
Experiences In Building Globus Genomics Using Galaxy, Globus Online and AWS
 
Geo Analytics Canada Overview - May 2020
Geo Analytics Canada Overview - May 2020Geo Analytics Canada Overview - May 2020
Geo Analytics Canada Overview - May 2020
 
RAMSES: Robust Analytic Models for Science at Extreme Scales
RAMSES: Robust Analytic Models for Science at Extreme ScalesRAMSES: Robust Analytic Models for Science at Extreme Scales
RAMSES: Robust Analytic Models for Science at Extreme Scales
 
Scientific
Scientific Scientific
Scientific
 
Compressing and Sparsifying LLM in GenAI Applications
Compressing and Sparsifying LLM in GenAI ApplicationsCompressing and Sparsifying LLM in GenAI Applications
Compressing and Sparsifying LLM in GenAI Applications
 
WSO2 Machine Learner - Product Overview
WSO2 Machine Learner - Product OverviewWSO2 Machine Learner - Product Overview
WSO2 Machine Learner - Product Overview
 
Integrating PostGIS in Web Applications
Integrating PostGIS in Web ApplicationsIntegrating PostGIS in Web Applications
Integrating PostGIS in Web Applications
 
DEMETER at OGC Agriculture Session
DEMETER at OGC Agriculture SessionDEMETER at OGC Agriculture Session
DEMETER at OGC Agriculture Session
 
01-06 OCRE Test Suite - Fernandes.pdf
01-06 OCRE Test Suite - Fernandes.pdf01-06 OCRE Test Suite - Fernandes.pdf
01-06 OCRE Test Suite - Fernandes.pdf
 
Analysis Ready Data workshop - OGC presentation
Analysis Ready Data workshop - OGC presentation Analysis Ready Data workshop - OGC presentation
Analysis Ready Data workshop - OGC presentation
 

More from ExtremeEarth

More from ExtremeEarth (13)

Polar Use Case - ExtremeEarth Open Workshop
Polar Use Case  - ExtremeEarth Open WorkshopPolar Use Case  - ExtremeEarth Open Workshop
Polar Use Case - ExtremeEarth Open Workshop
 
ExtremeEarth Open Workshop - Introduction
ExtremeEarth Open Workshop - IntroductionExtremeEarth Open Workshop - Introduction
ExtremeEarth Open Workshop - Introduction
 
Food Security Use Case - ExtremeEarth Open Workshop
Food Security Use Case - ExtremeEarth Open WorkshopFood Security Use Case - ExtremeEarth Open Workshop
Food Security Use Case - ExtremeEarth Open Workshop
 
Artificial Intelligence and Big Data Technologies for Copernicus Data: the Ex...
Artificial Intelligence and Big Data Technologies for Copernicus Data: the Ex...Artificial Intelligence and Big Data Technologies for Copernicus Data: the Ex...
Artificial Intelligence and Big Data Technologies for Copernicus Data: the Ex...
 
ExtremeEarth Data Science Pipeline for Linked Earth Observation Data
ExtremeEarth Data Science Pipeline for Linked Earth Observation DataExtremeEarth Data Science Pipeline for Linked Earth Observation Data
ExtremeEarth Data Science Pipeline for Linked Earth Observation Data
 
Artificial Intelligence in the Earth Observation Domain: Current European Res...
Artificial Intelligence in the Earth Observation Domain: Current European Res...Artificial Intelligence in the Earth Observation Domain: Current European Res...
Artificial Intelligence in the Earth Observation Domain: Current European Res...
 
Snow Monitoring for Water Availability and Irrigation
Snow Monitoring for Water Availability and IrrigationSnow Monitoring for Water Availability and Irrigation
Snow Monitoring for Water Availability and Irrigation
 
Polar Use Case in ExtremeEarth-phiweek19
Polar Use Case in ExtremeEarth-phiweek19Polar Use Case in ExtremeEarth-phiweek19
Polar Use Case in ExtremeEarth-phiweek19
 
The ExtremeEarth infrastructure-phiweek19
The ExtremeEarth infrastructure-phiweek19The ExtremeEarth infrastructure-phiweek19
The ExtremeEarth infrastructure-phiweek19
 
Scalable Deep Learning in ExtremeEarth-phiweek19
Scalable Deep Learning in ExtremeEarth-phiweek19Scalable Deep Learning in ExtremeEarth-phiweek19
Scalable Deep Learning in ExtremeEarth-phiweek19
 
Food security use case in ExtremeEarth-phiweek19
Food security use case in ExtremeEarth-phiweek19Food security use case in ExtremeEarth-phiweek19
Food security use case in ExtremeEarth-phiweek19
 
Copernicus and AI workshop 2020
Copernicus and AI workshop 2020Copernicus and AI workshop 2020
Copernicus and AI workshop 2020
 
LPS19 ExtremeEarth Project
LPS19 ExtremeEarth ProjectLPS19 ExtremeEarth Project
LPS19 ExtremeEarth Project
 

Recently uploaded

edited gordis ebook sixth edition david d.pdf
edited gordis ebook sixth edition david d.pdfedited gordis ebook sixth edition david d.pdf
edited gordis ebook sixth edition david d.pdf
great91
 
1:1原版定制利物浦大学毕业证(Liverpool毕业证)成绩单学位证书留信学历认证
1:1原版定制利物浦大学毕业证(Liverpool毕业证)成绩单学位证书留信学历认证1:1原版定制利物浦大学毕业证(Liverpool毕业证)成绩单学位证书留信学历认证
1:1原版定制利物浦大学毕业证(Liverpool毕业证)成绩单学位证书留信学历认证
ppy8zfkfm
 
一比一原版西悉尼大学毕业证成绩单如何办理
一比一原版西悉尼大学毕业证成绩单如何办理一比一原版西悉尼大学毕业证成绩单如何办理
一比一原版西悉尼大学毕业证成绩单如何办理
pyhepag
 
Data Analytics for Digital Marketing Lecture for Advanced Digital & Social Me...
Data Analytics for Digital Marketing Lecture for Advanced Digital & Social Me...Data Analytics for Digital Marketing Lecture for Advanced Digital & Social Me...
Data Analytics for Digital Marketing Lecture for Advanced Digital & Social Me...
Valters Lauzums
 
一比一原版纽卡斯尔大学毕业证成绩单如何办理
一比一原版纽卡斯尔大学毕业证成绩单如何办理一比一原版纽卡斯尔大学毕业证成绩单如何办理
一比一原版纽卡斯尔大学毕业证成绩单如何办理
cyebo
 
Toko Jual Viagra Asli Di Malang 081229400522 COD Obat Kuat Viagra Malang
Toko Jual Viagra Asli Di Malang 081229400522 COD Obat Kuat Viagra MalangToko Jual Viagra Asli Di Malang 081229400522 COD Obat Kuat Viagra Malang
Toko Jual Viagra Asli Di Malang 081229400522 COD Obat Kuat Viagra Malang
adet6151
 
Toko Jual Viagra Asli Di Salatiga 081229400522 Obat Kuat Viagra
Toko Jual Viagra Asli Di Salatiga 081229400522 Obat Kuat ViagraToko Jual Viagra Asli Di Salatiga 081229400522 Obat Kuat Viagra
Toko Jual Viagra Asli Di Salatiga 081229400522 Obat Kuat Viagra
adet6151
 
NO1 Best Kala Jadu Expert Specialist In Germany Kala Jadu Expert Specialist I...
NO1 Best Kala Jadu Expert Specialist In Germany Kala Jadu Expert Specialist I...NO1 Best Kala Jadu Expert Specialist In Germany Kala Jadu Expert Specialist I...
NO1 Best Kala Jadu Expert Specialist In Germany Kala Jadu Expert Specialist I...
Amil baba
 
如何办理哥伦比亚大学毕业证(Columbia毕业证)成绩单原版一比一
如何办理哥伦比亚大学毕业证(Columbia毕业证)成绩单原版一比一如何办理哥伦比亚大学毕业证(Columbia毕业证)成绩单原版一比一
如何办理哥伦比亚大学毕业证(Columbia毕业证)成绩单原版一比一
fztigerwe
 
Exploratory Data Analysis - Dilip S.pptx
Exploratory Data Analysis - Dilip S.pptxExploratory Data Analysis - Dilip S.pptx
Exploratory Data Analysis - Dilip S.pptx
DilipVasan
 

Recently uploaded (20)

ℂall Girls Balbir Nagar ℂall Now Chhaya ☎ 9899900591 WhatsApp Number 24/7
ℂall Girls Balbir Nagar ℂall Now Chhaya ☎ 9899900591 WhatsApp  Number 24/7ℂall Girls Balbir Nagar ℂall Now Chhaya ☎ 9899900591 WhatsApp  Number 24/7
ℂall Girls Balbir Nagar ℂall Now Chhaya ☎ 9899900591 WhatsApp Number 24/7
 
edited gordis ebook sixth edition david d.pdf
edited gordis ebook sixth edition david d.pdfedited gordis ebook sixth edition david d.pdf
edited gordis ebook sixth edition david d.pdf
 
社内勉強会資料  Mamba - A new era or ephemeral
社内勉強会資料   Mamba - A new era or ephemeral社内勉強会資料   Mamba - A new era or ephemeral
社内勉強会資料  Mamba - A new era or ephemeral
 
1:1原版定制利物浦大学毕业证(Liverpool毕业证)成绩单学位证书留信学历认证
1:1原版定制利物浦大学毕业证(Liverpool毕业证)成绩单学位证书留信学历认证1:1原版定制利物浦大学毕业证(Liverpool毕业证)成绩单学位证书留信学历认证
1:1原版定制利物浦大学毕业证(Liverpool毕业证)成绩单学位证书留信学历认证
 
一比一原版西悉尼大学毕业证成绩单如何办理
一比一原版西悉尼大学毕业证成绩单如何办理一比一原版西悉尼大学毕业证成绩单如何办理
一比一原版西悉尼大学毕业证成绩单如何办理
 
Aggregations - The Elasticsearch "GROUP BY"
Aggregations - The Elasticsearch "GROUP BY"Aggregations - The Elasticsearch "GROUP BY"
Aggregations - The Elasticsearch "GROUP BY"
 
Pre-ProductionImproveddsfjgndflghtgg.pptx
Pre-ProductionImproveddsfjgndflghtgg.pptxPre-ProductionImproveddsfjgndflghtgg.pptx
Pre-ProductionImproveddsfjgndflghtgg.pptx
 
Data Analytics for Digital Marketing Lecture for Advanced Digital & Social Me...
Data Analytics for Digital Marketing Lecture for Advanced Digital & Social Me...Data Analytics for Digital Marketing Lecture for Advanced Digital & Social Me...
Data Analytics for Digital Marketing Lecture for Advanced Digital & Social Me...
 
basics of data science with application areas.pdf

Big Linked Data Federation - ExtremeEarth Open Workshop

  • 1. December 9, 2021 Three Years of the ExtremeEarth Project Online Workshop Antonis Troumpoukis (NCSR-D) Federating Big Linked Geospatial Data
  • 2. Outline • Semagrow federated query processor • KOBE benchmarking engine • ExtremeEarth Use-cases
  • 3. Outline • Semagrow federated query processor • KOBE benchmarking engine • ExtremeEarth Use-cases
  • 4. Federated query processors
      • Systems that seamlessly integrate data from multiple remote dataset servers.
      • They receive a query, issue the necessary subqueries to the remote servers, combine the results, and present the combined result to the user.
      • Widely used in Linked Data, where many data providers publish their datasets through SPARQL endpoints.
  • 5. Semagrow federated query processor
      • Semagrow is an open-source dynamic data integration system that:
          o makes the most of all public data, regardless of size, update rate, and schema.
          o presents to client applications a single, unified SPARQL endpoint that federates multiple data sources.
          o manages both syntactic and semantic heterogeneity.
      • The federated data sources may serve data that use different vocabularies and codelists.
          o Semagrow dynamically transforms responses from the different data sources to match the vocabularies used in the query.
      • The federated data sources may offer SPARQL, GeoSPARQL, SQL, or CQL (CassandraQL) APIs.
          o Semagrow processes SPARQL queries and appropriately rewrites the subqueries for each data source.
          o Semagrow fills in the missing expressivity, e.g. arbitrary joins for CQL sources.
  • 6. History
      • Originally developed in FP7 SemaGrow:
          o SPARQL endpoint federation engine [1]
          o Multi-threaded application, deployed directly on a server or VM
          o Dynamic vocabulary mapping
      • Extended in H2020 BigDataEurope:
          o Containerization and ability to deploy on Cloud infrastructures
          o Re-engineered architecture to allow executor plugins that manage syntactic heterogeneity [2]
          o Limited support for big linked geospatial data [3]
              ▪ The federation could include only one geospatial data source
      • During ExtremeEarth, we developed a new version of Semagrow. Semagrow is now the first federation engine able to federate multiple geospatial data sources.
      [1] A. Charalambidis, A. Troumpoukis, S. Konstantopoulos: SemaGrow: optimizing federated SPARQL queries. In SEMANTICS 2015, Vienna, Austria, September 15-17, 2015
      [2] S. Konstantopoulos, A. Charalambidis, G. Mouchakis, A. Troumpoukis, J. Jakobitsch, V. Karkaletsis: Semantic Web Technologies and Big Data Infrastructures: SPARQL Federated Querying of Heterogeneous Big Data Stores. In ISWC 2016 (Posters & Demos), Kobe, Japan, October 17-21, 2016
      [3] A. Davvetas, I. Klampanos, S. Andronopoulos, G. Mouchakis, S. Konstantopoulos, A. Ikonomopoulos, V. Karkaletsis: Big Data Processing and Semantic Web Technologies for Decision Making in Hazardous Substance Dispersion Emergencies. In ISWC 2017 (Posters, Demos & Industry Tracks), Vienna, Austria, October 21-25, 2017
  • 7. The Semagrow architecture
      • Source selector: identifies which of the federated sources are relevant to which parts of the query.
      • Query planner: constructs an efficient query execution plan.
      • Query executor: evaluates the query execution plan and returns the results to the client.
  • 8. Source Selector
      • Identifies which of the federated sources are relevant to which parts of the query.
      • Should exclude as many redundant sources as possible, without removing any necessary sources.
      • Combines two mechanisms:
          o Thematic data:
              ▪ Extended with all the state-of-the-art source selection methods: predicate and class metadata, ASK queries (and cache), URI-prefix-based source selection
          o Geospatial data:
              ▪ A novel approach that targets geospatial data sources [4]: annotate all federated data sources with a bounding polygon, then use this summary to filter out sources that refer to irrelevant areas
      [4] A. Troumpoukis, S. Konstantopoulos, N. Prokopaki-Kostopoulou: A Geospatial Source Selector for Federated GeoSPARQL Querying. To be submitted to SWJ.
  • 9. Geospatial source selection: Method
      • Each data source is tagged with a bounding polygon that contains all geometries of the source.
      • For each triple pattern of the form ?x geo:asWKT ?y, prune the set of sources returned by the wrapped source selectors, using the geospatial filters that apply to the pattern and the bounding polygons of the candidate sources.
          o Geospatial selections: geospatial filters with one free variable
              ▪ ?s geo:asWKT ?x. FILTER (geof:sfIntersects(?x, KNOWN_WKT))
              ▪ If the bounding polygon of a candidate source for the pattern is disjoint from KNOWN_WKT, then the source is irrelevant.
          o Geospatial joins: geospatial filters with two free variables
              ▪ ?s1 geo:asWKT ?x. ?s2 geo:asWKT ?y. FILTER (geof:sfWithin(?x, ?y))
              ▪ If the bounding polygon of a candidate source for the first pattern is disjoint from the bounding polygons of all candidate sources for the second pattern, then the former source is irrelevant.
      • We consider the standard spatial relations (all apart from disjoint) and within-distance comparisons.
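The two pruning rules of the geospatial source selector can be sketched as follows. This is only an illustration under simplifying assumptions, not Semagrow's implementation: source summaries are axis-aligned boxes rather than bounding polygons, and the names `Box`, `prune_selection`, and `prune_join`, as well as the example sources, are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned box standing in for a source's bounding polygon (assumption)."""
    min_x: float
    min_y: float
    max_x: float
    max_y: float

    def disjoint(self, other: "Box") -> bool:
        # Two boxes are disjoint when they are separated along at least one axis.
        return (self.max_x < other.min_x or other.max_x < self.min_x
                or self.max_y < other.min_y or other.max_y < self.min_y)


def prune_selection(candidates: dict, known: Box) -> list:
    """Geospatial selection: drop sources whose summary is disjoint from the known geometry."""
    return [name for name, box in candidates.items() if not box.disjoint(known)]


def prune_join(left: dict, right: dict) -> list:
    """Geospatial join: drop a left-hand source that is disjoint from every right-hand source."""
    return [name for name, box in left.items()
            if any(not box.disjoint(other) for other in right.values())]


# Hypothetical federation: two crop sources and one snow source.
crop_sources = {"crops_west": Box(0, 0, 10, 10), "crops_east": Box(20, 0, 30, 10)}
snow_sources = {"snow_grid_7": Box(25, 5, 40, 20)}
area_of_interest = Box(8, 8, 12, 12)

selected = prune_selection(crop_sources, area_of_interest)
joined = prune_join(crop_sources, snow_sources)
```

In this sketch only `crops_west` survives the selection filter and only `crops_east` survives the join filter. A coarser summary (a box instead of a polygon) never removes a relevant source; it only prunes less.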
  • 10. Query Planner
      • Constructs an efficient query execution plan.
      • Uses endpoint statistics and dynamic programming to find an optimal query plan.
      • Federated geospatial joins: bind join strategy with filter-pushdown optimization
          o Reduces the communication cost
          o Geospatial functions are evaluated faster at the sources (spatial index)
      • Evaluation of complex thematic queries:
          o Examples:
              ▪ Subqueries (inner SELECT queries)
              ▪ ORDER BY, LIMIT 1
              ▪ FILTER NOT EXISTS
          o Such queries appear in the use cases (data validation of land usage data)
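The communication saving of a bind join with filter pushdown can be illustrated with a toy simulation. It is a sketch under assumptions: the remote endpoint is a Python list, geometries are planar points, and `remote_query`, `bind_join_pushdown`, and `bind_join_local_filter` are hypothetical names, not Semagrow functions.

```python
from math import hypot


def remote_query(rows, predicate=None):
    """Simulated remote endpoint: applies the filter at the source and returns the matches."""
    return [row for row in rows if predicate is None or predicate(row)]


def bind_join_pushdown(left_rows, right_rows, max_dist):
    """Bind join where the distance filter travels with each bound subquery."""
    results, transferred = [], 0
    for left_id, (lx, ly) in left_rows:
        matches = remote_query(
            right_rows,
            lambda row: hypot(row[1][0] - lx, row[1][1] - ly) < max_dist)
        transferred += len(matches)
        results += [(left_id, right_id) for right_id, _ in matches]
    return results, transferred


def bind_join_local_filter(left_rows, right_rows, max_dist):
    """Same join, but the filter is evaluated by the federator after fetching all rows."""
    results, transferred = [], 0
    for left_id, (lx, ly) in left_rows:
        fetched = remote_query(right_rows)
        transferred += len(fetched)
        results += [(left_id, right_id) for right_id, (rx, ry) in fetched
                    if hypot(rx - lx, ry - ly) < max_dist]
    return results, transferred


# Hypothetical data: (identifier, planar point).
left = [("a", (0, 0)), ("b", (10, 0))]
right = [("p", (1, 0)), ("q", (9, 0)), ("r", (50, 50))]

pushed, pushed_rows = bind_join_pushdown(left, right, 2)
local, local_rows = bind_join_local_filter(left, right, 2)
```

Both strategies return the same pairs, but the pushdown variant transfers 2 rows where the local-filter variant transfers 6.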
  • 11. Query Executor
      • Evaluates the query execution plan and returns the results to the client.
          o Provides a mechanism for issuing queries to the remote endpoints
          o Provides an implementation of all geospatial operators that may appear in the plan
              ▪ GeoSPARQL, stSPARQL functions
      • PostGIS connector:
          o Semagrow allows executor plugins for non-SPARQL endpoints
          o A plugin communicates directly with PostGIS databases holding shapefile data that contain geometric shapes exclusively.
      • Optimization of federated geospatial within-distance joins [5]
          o Insert additional geospatial filters in the source queries
          o Filter out shapes that are "too far away" using the spatial index of the source
      [5] A. Troumpoukis, S. Konstantopoulos, N. Prokopaki-Kostopoulou: A Geospatial Join Optimization for Federated GeoSPARQL Querying. To be submitted to ESWC 2022.
  • 12. Geospatial join optimization: Method
      • Situation: bind join with filter-pushdown optimization. Example:
        ?s1 geo:asWKT ?x . ?s2 geo:asWKT ?y . FILTER (geof:distance(?x, ?y, uom:metre) < 10).
      • Such queries appear frequently in the use cases.
      • The within-distance operation is computationally expensive: it cannot be answered from the spatial index.
      • Intuition:
          o ?x is bound from the left part of the federated join, so during query execution the filter looks like this:
            FILTER (geof:distance(KNOWN_WKT, ?y, uom:metre) < 10).
          o To help the federated endpoint evaluate the remote query, we can add the filter
            FILTER (geof:sfIntersects(?y, CONSTRUCTED_BOX))
            where CONSTRUCTED_BOX is the bounding box of the buffer of size D around KNOWN_WKT, and D is the distance threshold (10 metres in the example).
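The extra filter is safe because anything within distance D of a geometry must lie inside the bounding box of its buffer, so the box test can discard candidates cheaply before the exact distance is computed. The sketch below illustrates this under assumptions: coordinates are planar and already in metres, the known geometry is a set of points, and all function names are hypothetical.

```python
from math import hypot


def buffer_bbox(points, d):
    """Bounding box of the buffer of size d: the geometry's own box grown by d on every side."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return (min(xs) - d, min(ys) - d, max(xs) + d, max(ys) + d)


def box_to_wkt(box):
    """Serialise the box as a WKT polygon, usable as CONSTRUCTED_BOX in the added filter."""
    min_x, min_y, max_x, max_y = box
    return (f"POLYGON(({min_x} {min_y}, {max_x} {min_y}, "
            f"{max_x} {max_y}, {min_x} {max_y}, {min_x} {min_y}))")


def in_box(point, box):
    x, y = point
    return box[0] <= x <= box[2] and box[1] <= y <= box[3]


def distance(point, points):
    """Distance from a point to the nearest point of the known geometry."""
    return min(hypot(point[0] - x, point[1] - y) for x, y in points)


known = [(0, 0), (4, 0)]   # stands in for KNOWN_WKT
d = 10                     # distance threshold D
candidates = {"near": (5, 3), "corner": (13, 9), "far": (100, 100)}

box = buffer_bbox(known, d)
# Cheap test first (the kind a spatial index can answer), exact distance second.
prefiltered = [name for name, p in candidates.items() if in_box(p, box)]
exact = [name for name in prefiltered if distance(candidates[name], known) < d]
brute_force = [name for name, p in candidates.items() if distance(p, known) < d]
```

The `corner` candidate shows that the box is a necessary but not sufficient condition: it passes the box test yet fails the exact distance check, which is why the original distance filter stays in the query.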
  • 13. Semagrow in ExtremeEarth
      • Integration within Hopsworks
      • Provides an extra layer over the big linked geospatial data store Strabo2
      • Can be used to combine the data stored in Strabo2 with additional external geospatial endpoints.
  • 14. Outline • Semagrow federated query processor • KOBE benchmarking engine • ExtremeEarth Use-cases
  • 15. History
      • During the benchmarking activities of the FP7 SemaGrow project, we needed a framework to help us conduct experiments.
      • Originally developed in H2020 BigDataEurope:
          o Docker containerization to abstract from the installation intricacies of each system
      • During ExtremeEarth we explored this idea even further…
  • 16. The KOBE Open Benchmarking Engine
      • KOBE is a framework for benchmarking federated query engines.
      • Features:
          o Automation of the various tasks: deployment, initialization of dataset servers and federation engines, experiment execution
          o Reproducibility in different environments: each component runs in its own Docker container
          o Declarative specifications: a formalism that hides the details of provisioning and orchestration from the user
          o Simulating real-life scenarios: network delays (dataset server latency limitations)
          o Results presentation: collection of logs and visualization in a WebUI
          o Extensibility: supports the integration of new benchmarks, new federators, and new remote dataset servers
  • 17. The KOBE Open Benchmarking Engine (cont.)
      • Re-engineered KOBE into 3 subsystems (Deployment, Networking, Logging)
      • Technologies used: Docker, Kubernetes for orchestration, Istio for simulating delays, EFK stack for logs
      • Command line interface for control, Kibana dashboards for viewing the results
      [6] C. Kostopoulos, G. Mouchakis, A. Troumpoukis, N. Prokopaki-Kostopoulou, A. Charalambidis, S. Konstantopoulos: KOBE: Cloud-Native Open Benchmarking Engine for Federated Query Processors. In ESWC 2021: 664-679
      [7] C. Kostopoulos, G. Mouchakis, N. Prokopaki-Kostopoulou, A. Troumpoukis, A. Charalambidis, S. Konstantopoulos: KOBE: Cloud-native Open Benchmarking Engine for Federated Query Processors. In ISWC (Demos/Industry) 2020: 325-330
  • 18. The KOBE Open Benchmarking Engine (cont.)
      • Dataset servers: Virtuoso, Strabo2
      • Federation engines: Semagrow, FedX
      • Benchmarks: FedBench, LargeRDFBench, OPFbench, Geographica2, Geofedbench
      • Detailed documentation (step-by-step instructions for getting started with, using, and extending KOBE), publicly available at https://semagrow.github.io/kobe/
  • 19. Outline • Semagrow federated query processor • KOBE benchmarking engine • ExtremeEarth Use-cases
  • 20. Combining Snow-cover data with Crop-type data for Food Security: Datasets
      • 3 data layers that cover Austria:
          o Administrative, snow cover, and crop type data
          o Each layer is partitioned geospatially
      • Each dataset contains a single thematic layer and refers to a specific polygonal area.
      • 4.5 million triples, ~4GB of data in N-Triples
      • 34 GeoSPARQL endpoints. We envisage that Austrian state governments publish crop datasets for their own area of responsibility, and that a further (different) entity publishes a snow cover dataset that ignores state boundaries and is partitioned according to a geographical grid.
      • Example: all snow-covered crops within a specific area of interest (shown in red) appear in only 2 of the 12 datasets.
  • 21. Combining Snow-cover data with Crop-type data for Food Security: Queries
      Q1  municipalities intersecting a given polygon
      Q2  snow-covered potato fields intersecting a given polygon
      Q3  potato fields within 5km from snow and intersecting a given polygon
      Q4  snow area within 5km from a given municipality
      Q5  potato fields within a given municipality
      Q6  snow-covered potato fields within a given municipality
      Q7  potato fields within 5km from snow and within a given municipality
  • 22. Combining Snow-cover data with Crop-type data for Food Security: Experimental results
      Query processing time:
      query  #layers  geo-poly  geo-appr  them
      Q1     1        0.200     0.205     0.180
      Q2     2        0.985     0.525     0.755
      Q3     2        5.245     1.215     1.810
      Q4     2        8.785     7.940     9.025
      Q5     2        0.605     0.445     0.520
      Q6     3        15.535    n/a       n/a
      Q7     3        39.670    n/a       n/a
      • them uses no geospatial metadata; geo-poly and geo-appr use geospatial metadata, and geo-poly has more precise boundaries than geo-appr.
      • Q1 is the easiest (1 data layer); them is the fastest.
      • Q6-Q7 are the most difficult (3 data layers); only geo-poly can evaluate these queries.
      • Q2-Q5 are in between (2 data layers); geo-appr is preferred (geo-poly spends too much time in source selection, while them spends more time in planning and execution).
  • 23. Validating Land-Usage Data: Datasets
      • Austrian Land Parcel Identification System (INVEKOS)
          o crop parcels in Austria and the owners' self-declaration about the crops grown in each parcel
      • Land Use and Cover Area Survey (LUCAS)
          o agro-environmental and soil data collected by field observation of geographically referenced points
      • Task: validate the crop-type data of INVEKOS using LUCAS
      • Crop-type map provided by UNITN
      • 14.1 million triples, ~4GB of data in N-Triples format
  • 24. Validating Land-Usage Data: Queries
      Q1  (positive validation) given a LUCAS instance, return the closest INVEKOS instance if it is within 10 meters and their crop types match
      Q2  (negative validation) given a LUCAS instance, return the closest INVEKOS instance if it is within 10 meters and their crop types do not match
      Q3  (irrelevant) given a LUCAS instance, return it if there is no INVEKOS instance within 10 meters
      • Example: 3 ground observations located on the roads adjacent to field parcels, used for crop-type validation of the field dataset. 2 of them (the green ones) provide a positive validation and the other one provides a negative validation.
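The three validation outcomes can be written as one classification rule. The sketch below is illustrative only: it reduces each INVEKOS parcel to a single planar point (real parcels are polygons), assumes coordinates in metres, and uses the hypothetical function name `validate` and made-up sample data.

```python
from math import hypot


def validate(lucas, parcels, max_dist=10.0):
    """Classify a LUCAS observation against the closest parcel within max_dist.

    lucas   -- (x, y, crop_type) of the ground observation
    parcels -- iterable of (x, y, crop_type), one representative point per parcel
    """
    lx, ly, lucas_crop = lucas
    nearby = []
    for px, py, parcel_crop in parcels:
        dist = hypot(px - lx, py - ly)
        if dist < max_dist:
            nearby.append((dist, parcel_crop))
    if not nearby:
        return "irrelevant"           # Q3: no parcel close enough
    _, closest_crop = min(nearby)     # closest parcel within the threshold
    if closest_crop == lucas_crop:
        return "positive"             # Q1: crop types match
    return "negative"                 # Q2: crop types differ


# Made-up sample data for illustration.
parcels = [(0, 0, "potato"), (30, 0, "maize")]

match = validate((3, 4, "potato"), parcels)
mismatch = validate((3, 4, "maize"), parcels)
no_parcel = validate((100, 100, "potato"), parcels)
```

Each LUCAS observation falls into exactly one of the three classes, which is why the three queries are each run over the same set of observations.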
  • 25. Validating Land-Usage Data: Experimental results
      • PostGIS: all data in a single PostGIS database. semagrow-std: Semagrow without the within-distance optimization. semagrow-opt: Semagrow with the optimization.
      • Semagrow without the optimization is slower than, but comparable to, standalone PostGIS.
      • Optimized Semagrow is roughly 45 to 65 times faster on average than unoptimized Semagrow, i.e. close to two orders of magnitude.
      Query execution time (total / average):
      query  #queries  PostGIS              semagrow-std        semagrow-opt
      Q1     2488      54 hours / 78.6 sec  83 hours / 120 sec  106 mins / 2.6 sec
      Q2     2488      54 hours / 78.4 sec  82 hours / 119 sec  99 mins / 2.4 sec
      Q3     2488      54 hours / 78.6 sec  81 hours / 117 sec  74 mins / 1.8 sec
  • 26. Combining EE data with public endpoints: Datasets and Queries
      • Semagrow endpoint in the Hopsworks-TEP infrastructure.
          o Endpoint 1:
              ▪ Strabo2 endpoint already deployed in the Hopsworks-TEP infrastructure
              ▪ Contains ExtremeEarth (EE) data
          o Endpoint 2:
              ▪ Public Strabon endpoint
              ▪ Contains GADM of Germany
      • Federated query for the demo ("Query1" for a specific administrative region)
          o Regions where precipitation in Quarter 2 of 2021 was more than 15% below the normal rainfall, that are equipped with irrigation, and that intersect the state of Brandenburg
          o Semagrow operates as follows:
              ▪ Retrieve the WKT of the state of Brandenburg from Endpoint 2
              ▪ Retrieve all relevant EE data found within that WKT from Endpoint 1
          o Returns 447 results.
  • 27. Conclusions
      • We have developed a new version of Semagrow. Semagrow is now the first federation engine able to federate multiple big linked geospatial data sources.
      • We have developed a new version of the KOBE benchmarking engine, which is a useful tool for benchmarking federated query processors.
      • We have applied Semagrow to several exercises and use cases from the ExtremeEarth project (land-usage data validation, combination of snow-cover and crop-type data for food security, etc.)