SPgen: A Benchmark Generator for Spatial Link Discovery Tools

SAVETA SPgen ISWC 2018 – Monterey CA
SPgen: A Benchmark Generator for Spatial
Link Discovery Tools
T. Saveta1, I. Fundulaki1, G. Flouris1, and A.-C. Ngonga-Ngomo2
1 Institute of Computer Science - FORTH, Greece
2 University of Paderborn, Germany
ISWC 2018 – Monterey CA
ISWC 2018 – Monterey CA

SAVETA SPgen ISWC 2018 – Monterey CASAVETA SPgen ISWC 2018 – Monterey CA
LOD Cloud and Geospatial Data
UK Postcodes
LinkedGeoData
GeoWordNet
Ocean Drilling - Forams
GeoNames Semantic Web
Pleiades
Ocean Drilling –
Janus Age Models
Weather Stations
OpenMobileNetwork
GeoLinkedData
EARTh
Linked Crowdsourced Data
greek-administrative-
geography
The Getty
Thesaurus of
Geographic
Names
Linked Logainm
FAO geopolitical ontology
NUTS (GeoVocab)
NUTS (GeoVocab)
AgreementMaker
Strabon
OntoIdea
Silk
RADON

Essential to determine the effectiveness and the efficiency of the
proposed techniques and tools
Compare them
Assess their performance
«Push» the systems to get better
Benchmarking Link Discovery Systems for Geospatial Data
Develop benchmarks for geo-spatial link discovery
systems

Organized into test cases each addressing different kind of
requirements
Datasets
The raw material of the benchmarks.These are the source and
the target dataset that will be matched together to find the links
Gold Standard
The “correct answer sheet” used to judge the completeness and
soundness of the link discovery algorithms
Metrics
The performance metric(s) that determine the systems behavior
and performance
Link Discovery Benchmark Ingredients

Only a limited number of benchmarks target the problem of
linking geo-spatial entities
Propose: Spatial Benchmark Generator (SPgen)
Differs from the classical, mostly string-based
approaches
Generic, schema agnostic and choke-point based
Test the performance of link discovery systems that
deal with the DE-9IM (Dimensionally Extended nine-
Intersection Model) - standard to describe topological
relations between two geometries.
Is integrated into HOBBIT platform
Spatial Benchmark Generator (SPgen)

Source Dataset
Traces (lists of (longitude, latitude)) represented as LineStrings in
the Well-known text (WKT) format
Target Dataset
Traces represented as LineStrings or Polygons in the WKT format
Gold Standard
Stores pairs of matched source and target traces
Generated using RADON to ensure completeness and
correctness
SPgen Overview
Given aWKT Geometry s as source and a DE-9IM Relation r, we generate a
WKT Geometry t as target such as their Intersection Matrix follows the
definition of Relation r.

Choke Points
CP1: Scalability
• Large datasets
• Large number of points for each trace
CP2: Output Quality Metric
• Precision
• Recall
• F-measure
CP3:Time Performance
SPgen Choke Points & Key Performance Indicators

SPgen: Source Dataset Generation
Input Dataset
(Traces)
Source Data
Generation
Target Data
Generation
(JTS Extension)
Test Case
Generation
Parameters
Source
Data
Target
Data
Gold
Standard
Generation
Gold
Standard

TomTom Data Generator
GPSTrace Generator
• Trace: a list of (longitude, latitude) pairs recorded by one
device (phone, car, etc.)
Spaten (Spatio-temporal and textual Big Data Generator)
Did not use their data generator but the 163.845 already
generated traces that came from 9.460 users and 38.800.019
GPSTraces
The generator can produce 14 users per day using the
free version of Google Maps API –2500 requests per day
Data Generators

SPgen: Target Dataset Generation
Input Dataset
(Traces)
Source Data
Generation
Target Data
Generation
(JTS Extension)
Test Case
Generation
Parameters
Source
Data
Target
Data
Gold
Standard
Generation
Gold
Standard

DE-9IM Topological Relations between LineStrings
Equals Disjoint Touches
Contains (Within)
& Covers (Covered By)
Intersects Crosses
Overlaps

DE-9IM Topological Relations between a LineString and a Polygon
Disjoint
TouchesIntersects
• Equals and Overlaps cannot hold between a LineString and a Polygon
Crosses Contains (Within)
& Covers (Covered By)

Disjoint example (LineString/LineString)

SPgen: Gold Standard Generation
Input Dataset
(Traces)
Source Data
Generation
Target Data
Generation
(JTS Extension)
Test Case
Generation
Parameters
Source
Data
Target
Data
Gold
Standard
Generation
Gold
Standard

Systems to test
Spatial Link Discovery systems: RADON, Silk, AML, OntoIdea
Tasks: For each data generator (TomTom/Spaten) we divided the
tasks into 4 subtasks, systems were asked to match the geometries
considering a given relation
SLL & LLL: LineStrings/LineStrings, Small dataset of 200
instances and Large dataset of 2K instances
SLP & LLP: LineStrings/Polygons, Small dataset of 200
instances Large dataset of 2K instances
Experiments for SPgen

Aims
Scalability
• Due to a limitation of Silk, we had to select traces no larger than 64
KB
Output Quality Metric
• Precision, Recall and F-measure (all were equal to 1.0)
Time Performance to show:
• The rate between the time performance and the size of the datasets
• The rate between the time performance and the different data
generators
• The time diﬀerence between the topological relations
• The time diﬀerence between the geometries
Experimental set up: Platform time limit = 75 minutes
Experiments for SPgen

Experiments for SPgen (LineStrings/LineStrings) [SLL]
Time rate (in ms) while increasing population for each topological relation for RADON, Silk, AML and OntoIdea.
*Platform time limit = 75 mins

Experiments for SPgen (LineStrings/LineStrings) [LLL]
Time rate (in ms) while increasing population for each topological relation for RADON, Silk, AML and OntoIdea.

Experiments for SPgen (LineStrings/Polygons) [SLP]
Time rate (in ms) while increasing population for each topological relation for RADON, Silk, AML.

Experiments for SPgen (LineStrings/Polygons) [LLP]
Time rate (in ms) while increasing population for each topological relation for RADON, Silk, AML.

RADON
Addressed all the tasks
Best performance for the LineString/PolygonTasks
Improvements in Touches and Intersects relations for the LineString/LineStringTasks
AML
Performs extremely well in most cases
Improvements in LineString/PolygonTasks and in Disjoint relations
Silk
Improvements in Touches, Intersects and Overlaps relations and for the
LineString/LineString and in Disjoint relation for the LineString/PolygonTasks
OntoIdea
Handles small datasets efficiently, this does not hold when it comes to larger datasets
Did not support LineString/PolygonTasks in this version
Experiments General Remarks

All the systems needed
More time forTomTom than Spaten, due to the smaller number
of points per trace in the latter
More time the LineString/LineString Tasks for Touches, Intersects
and Crosses relations
Approximately the same time for both LineString/LineString and
LineString/PolygonTasks for Disjoint relation
Less time for the LineString/LineString Tasks for Contains, Within,
Covers and Covered By relations
Depending on the test case we can choose the appropriate
system
Experiments General Remarks

Conclusions
SPgen: checks whether spatial link discovery systems can
identify DE-9IM topological relations between LineStrings and
Polygons
Experiments with RADON and Silk, AML and OntoIdea
Choke points: Scalability, Quality measure output andTime
performance
Next
Implement DE-9IM relations for all possible combinations of
different geometries (e.g. Polygon/Polygon, combination with
Points, LineStrings and Polygons, etc.)
More data generators – already added DEBS dataset
More system – already added Strabon
Concluding Remarks & Future Work

Thank you!
Questions?

This work was supported by grands from the EU H2020 Framework Programme
provided for the project HOBBIT (GA no. 688227).

SPgen: A Benchmark Generator for Spatial Link Discovery Tools

Recommended

Recommended

More Related Content

Similar to SPgen: A Benchmark Generator for Spatial Link Discovery Tools

Similar to SPgen: A Benchmark Generator for Spatial Link Discovery Tools (20)

More from Holistic Benchmarking of Big Linked Data

More from Holistic Benchmarking of Big Linked Data (20)

Recently uploaded

Recently uploaded (20)

SPgen: A Benchmark Generator for Spatial Link Discovery Tools