Interlinking Big Linked Geospatial Data
George Papadakis
ExtremeEarth Online Workshop 9/12/2021
Geospatial Interlinking in action
Mandilaras et al., “Ice monitoring with ExtremeEarth”, LASCAR workshop 2020, co-located with ESWC
Detected links
Two different types:
1. Proximity relations (such as dbp:near) with a distance threshold
• e.g., find all cities from S that are less than 1km away from any river in T
2. Topological relations according to the Dimensionally Extended 9-Intersection Model (DE9IM)
• Equals
• Disjoint
• Touches
• Contains
• Covers
• Intersects
• Within
• CoveredBy
• Crosses
• Overlaps
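As a rough illustration of the first link type (proximity with a distance threshold), the sketch below finds all cities that lie less than 1 km from any river. It assumes planar coordinates in metres, cities as points and rivers as polylines with at least two vertices; the names `near_links`, `cities` and `rivers` are hypothetical and not part of any tool mentioned in the slides. It is a brute-force check over all pairs, so it only shows the semantics of the relation, not an efficient method.

```python
import math

def point_segment_distance(p, a, b):
    """Euclidean distance from point p to the segment a-b (planar coordinates)."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:  # degenerate segment: a and b coincide
        return math.hypot(px - ax, py - ay)
    # Projection parameter of p onto the supporting line, clamped to the segment.
    t = ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))

def point_linestring_distance(p, line):
    """Distance from p to a polyline given as a list of at least two vertices."""
    return min(point_segment_distance(p, a, b) for a, b in zip(line, line[1:]))

def near_links(cities, rivers, threshold_m=1000.0):
    """Return (city, river) pairs whose distance is below the threshold."""
    links = []
    for city, p in cities.items():
        for river, line in rivers.items():
            if point_linestring_distance(p, line) < threshold_m:
                links.append((city, river))
    return links

# Hypothetical toy data, coordinates in metres.
cities = {"A": (500.0, 300.0), "B": (5000.0, 4000.0)}
rivers = {"R": [(0.0, 0.0), (1000.0, 0.0), (2000.0, 500.0)]}
print(near_links(cities, rivers))  # [('A', 'R')]
```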
Geospatial Interlinking Example
Three topological relations:
1. LineString g1 touches Polygon g3
2. LineString g1 intersects LineString g2
3. Polygon g3 contains Polygon g4
Challenges:
1. quadratic time complexity, O(n²)
2. time-consuming topological relations
GIA.nt: Geospatial Interlinking At large
Goes beyond existing Filtering methods in two ways:
1. Redundant pairs are inherently removed
2. Space tiling depends only on the source dataset →
the target dataset is read from the disk
(>50% lower memory footprint)
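The two Filtering properties above can be sketched as follows. This is a minimal illustration, not the GIA.nt implementation: geometries are reduced to their MBRs `(minx, miny, maxx, maxy)`, the tile size is assumed to be the average width and height of the source MBRs (and assumed non-zero), and the function names are hypothetical. Only the source is indexed in memory; the target is consumed as a stream, and a per-target set ensures that a pair sharing several tiles is reported once.

```python
from collections import defaultdict
from math import floor

def tiles_of(mbr, dx, dy):
    """Yield the (col, row) tiles that an MBR (minx, miny, maxx, maxy) touches."""
    minx, miny, maxx, maxy = mbr
    for i in range(floor(minx / dx), floor(maxx / dx) + 1):
        for j in range(floor(miny / dy), floor(maxy / dy) + 1):
            yield (i, j)

def mbrs_intersect(a, b):
    """True if two MBRs share at least one point."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]

def build_index(source):
    """Index source MBRs in a grid whose tile size depends on the source only."""
    dx = sum(m[2] - m[0] for m in source.values()) / len(source)
    dy = sum(m[3] - m[1] for m in source.values()) / len(source)
    index = defaultdict(list)
    for sid, mbr in source.items():
        for tile in tiles_of(mbr, dx, dy):
            index[tile].append(sid)
    return index, dx, dy

def candidates(source, index, dx, dy, target_stream):
    """Consume target MBRs one at a time and yield each candidate pair once."""
    for tid, t_mbr in target_stream:
        seen = set()  # a source sharing several tiles with this target is reported once
        for tile in tiles_of(t_mbr, dx, dy):
            for sid in index.get(tile, ()):
                if sid not in seen and mbrs_intersect(source[sid], t_mbr):
                    seen.add(sid)
                    yield (sid, tid)

# Hypothetical toy data.
source = {"s1": (0, 0, 2, 2), "s2": (5, 5, 7, 7)}
targets = [("t1", (1, 1, 3, 3)), ("t2", (10, 10, 11, 11))]
index, dx, dy = build_index(source)
print(list(candidates(source, index, dx, dy, iter(targets))))  # [('s1', 't1')]
```

In this toy run, `s1` and `t1` share four tiles but the pair is emitted only once, and `t2` never touches an occupied tile, so it produces no candidates.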
Introduces Holistic Verification
• based on the Intersection Matrix →
80% lower run-time
G. Papadakis, G. Mandilaras, N. Mamoulis, M. Koubarakis: Progressive, Holistic Geospatial Interlinking. WWW 2021
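One way to picture verification based on the Intersection Matrix: if the DE-9IM matrix of a pair is computed once, every topological relation can be read off it by pattern matching instead of running one geometric test per relation. The sketch below uses the standard DE-9IM patterns; it assumes the matrix is already available as a 9-character string (how it is computed is outside this example), and the helper names `matches` and `relations` are hypothetical.

```python
# Standard DE-9IM patterns for the relations that do not depend on dimension.
PATTERNS = {
    "equals":     ["T*F**FFF*"],
    "disjoint":   ["FF*FF****"],
    "touches":    ["FT*******", "F**T*****", "F***T****"],
    "contains":   ["T*****FF*"],
    "covers":     ["T*****FF*", "*T****FF*", "***T**FF*", "****T*FF*"],
    "within":     ["T*F**F***"],
    "covered_by": ["T*F**F***", "*TF**F***", "**FT*F***", "**F*TF***"],
}

def matches(im, pattern):
    """True if a 9-character intersection matrix satisfies a DE-9IM pattern."""
    for cell, p in zip(im, pattern):
        if p == "*":
            continue
        if p == "T":
            if cell == "F":  # T accepts any non-empty intersection (0, 1 or 2)
                return False
        elif cell != p:
            return False
    return True

def relations(im, dim_a, dim_b):
    """Derive every relation that holds from a single intersection matrix.

    dim_a and dim_b are the geometry dimensions: 0 point, 1 line, 2 area.
    """
    if len(im) != 9:
        raise ValueError("intersection matrix must have 9 characters")
    found = {name for name, pats in PATTERNS.items()
             if any(matches(im, p) for p in pats)}
    if "disjoint" not in found:
        found.add("intersects")
    # Crosses and Overlaps use different patterns depending on the dimensions.
    if dim_a < dim_b:
        crosses = matches(im, "T*T******")
    elif dim_a > dim_b:
        crosses = matches(im, "T*****T**")
    else:
        crosses = dim_a == 1 and matches(im, "0********")
    if crosses:
        found.add("crosses")
    if dim_a == dim_b:
        pattern = "1*T***T**" if dim_a == 1 else "T*T***T**"
        if matches(im, pattern):
            found.add("overlaps")
    return sorted(found)

print(relations("212FF1FF2", 2, 2))  # ['contains', 'covers', 'intersects']
print(relations("FF1F00212", 1, 2))  # ['intersects', 'touches']
```

The first matrix corresponds to a polygon that strictly contains another polygon, the second to a line whose endpoint lies on a polygon boundary.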
Progressive Geospatial Interlinking
Same Filtering as GIA.nt.
Introduces Scheduling:
• Priority queue with top-BU weighted candidate pairs, where BU is
the available budget and weight is determined by:
• Co-occurrence Frequency (CF): #common tiles
• Jaccard Similarity (JS): normalized CF
• Pearson’s χ² test (χ²): degree to which s and t occur
independently in tiles
Verification processes the pairs of the queue in decreasing weight.
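The Scheduling step can be sketched as below. The formulas are one plausible reading of the slide and should be treated as assumptions: CF counts the tiles shared by s and t, JS divides CF by the size of the union of their tile sets, and χ² is computed from the 2×2 contingency table of tile occupancy over `n_tiles` tiles. A bounded min-heap keeps only the top-BU pairs; `top_bu` and the toy data are hypothetical.

```python
import heapq

def cf(tiles_s, tiles_t):
    """Co-occurrence Frequency: number of tiles shared by s and t."""
    return len(tiles_s & tiles_t)

def js(tiles_s, tiles_t):
    """Jaccard Similarity: CF normalised by the size of the tile-set union."""
    common = cf(tiles_s, tiles_t)
    union = len(tiles_s) + len(tiles_t) - common
    return common / union if union else 0.0

def chi2(tiles_s, tiles_t, n_tiles):
    """Pearson's chi-squared statistic on the 2x2 tile-occupancy table."""
    n11 = cf(tiles_s, tiles_t)   # tiles with both s and t
    n12 = len(tiles_s) - n11     # tiles with s only
    n21 = len(tiles_t) - n11     # tiles with t only
    n22 = n_tiles - n11 - n12 - n21
    denom = (n11 + n12) * (n21 + n22) * (n11 + n21) * (n12 + n22)
    if denom == 0:
        return 0.0
    return n_tiles * (n11 * n22 - n12 * n21) ** 2 / denom

def top_bu(candidates, weight, budget):
    """Keep the `budget` heaviest pairs and return them in decreasing weight."""
    if budget <= 0:
        return []
    heap = []  # min-heap: the lightest retained pair sits at heap[0]
    for pair, (tiles_s, tiles_t) in candidates:
        w = weight(tiles_s, tiles_t)
        if len(heap) < budget:
            heapq.heappush(heap, (w, pair))
        elif w > heap[0][0]:
            heapq.heapreplace(heap, (w, pair))
    return [pair for _, pair in sorted(heap, reverse=True)]

# Hypothetical candidate pairs with the tile ids of each geometry.
cands = [
    (("s1", "t1"), ({1, 2, 3}, {2, 3, 4})),
    (("s1", "t2"), ({1, 2, 3}, {3})),
    (("s2", "t3"), ({5, 6}, {5, 6})),
]
print(top_bu(cands, js, budget=2))  # [('s2', 't3'), ('s1', 't1')]
```

Verification would then process the returned list from front to back, so the pairs most likely to be topologically related are checked first.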
Dynamic Progressive Geospatial Interlinking
New weighting schemes:
• POINTS: smaller geometries processed first → higher time efficiency
• MBR: higher overlap in Minimum Bounding Rectangles first → higher effectiveness
• composite weights → higher effectiveness, more deterministic behavior
New scheduling:
• instead of a static processing order of geometry pairs, the processing order is updated dynamically, as
more topologically related pairs are detected
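A minimal sketch of dynamic scheduling follows. The update rule is an assumption made for illustration, since the slide does not specify it: whenever a verified pair turns out to be related, every remaining candidate that shares its source or target geometry gets a fixed `boost`. The queue uses lazy updates, so outdated heap entries are skipped when popped. The names `dynamic_order`, `verify` and `boost` are hypothetical.

```python
import heapq

def dynamic_order(weights, verify, budget, boost=1.0):
    """Verify up to `budget` pairs, re-prioritising after every detected link."""
    current = dict(weights)  # pair -> latest weight of each unprocessed pair
    heap = [(-w, pair) for pair, w in current.items()]
    heapq.heapify(heap)
    processed, links = [], []
    while heap and len(processed) < budget:
        neg_w, pair = heapq.heappop(heap)
        if pair not in current or -neg_w != current[pair]:
            continue  # stale entry: the pair was processed or re-weighted
        del current[pair]
        processed.append(pair)
        if verify(pair):
            links.append(pair)
            s, t = pair
            # Assumed rule: promote candidates sharing a geometry with the new link.
            for other in list(current):
                if other[0] == s or other[1] == t:
                    current[other] += boost
                    heapq.heappush(heap, (-current[other], other))
    return processed, links

# Hypothetical initial weights and ground truth.
weights = {("s1", "t1"): 3.0, ("s1", "t2"): 1.0, ("s2", "t3"): 2.0}
related = {("s1", "t1"), ("s1", "t2")}
processed, links = dynamic_order(weights, related.__contains__, budget=2, boost=1.5)
print(processed)  # [('s1', 't1'), ('s1', 't2')]
```

With a static order the second verification would go to `('s2', 't3')`; here the link found for `s1` promotes `('s1', 't2')`, so both related pairs fit in a budget of two.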
JedAI-spatial
Publicly available at: https://github.com/GeoLinker/GeoLinker
Solution space:
Model-view-controller architecture
JedAI-spatial – Part B
• Common three-stage pipeline for the
state-of-the-art parallel joins:
o GeoSpark, i.e., Apache Sedona
o Spatial Spark
o Magellan
o Location Spark
o Parallel GIA.nt
Scalability Analysis over D1
(|S|=2.3M, |T|=5.8M, |C|=6.3M)
Approximate Geospatial Interlinking
Goal:
• Improve Progressive Geospatial Interlinking in two ways:
1. Use comprehensive evidence to discard candidate pairs in a principled way
2. Reduce the memory requirements
Approach:
1. Filtering → as in (Progressive) GIA.nt
2. Supervised Filtering
o Classify candidate pairs into “likely related pairs” & “unlikely related pairs” using a
feature vector
3. Verification → as in (Progressive) GIA.nt
Challenges:
• Avoid any human intervention
• Address class imbalance
• Define generic, effective & efficient features
• Minimize the feature and the training set → simple & efficient classification models
Approximate Geospatial Interlinking – Solution overview
• Contrastive, self-supervised learning
• 4 categories of features
1. Area-based
2. Boundary-based
3. Grid-based
4. Candidate-based
• 2 sub-categories in each case:
o Atomic features
o Composite features
Experimental results:
• Undersampling necessary
• All 31 features are important
• Just 1,000 labelled instances suffice
• Parallelization for higher scalability
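Since most candidate pairs are unrelated, the training set for Supervised Filtering is heavily imbalanced, which is why undersampling appears among the results. The sketch below shows plain random undersampling: every class is cut down to the size of the smallest one. It is a generic illustration under the assumption of a non-empty labelled set; the function name `undersample` is hypothetical, and how the labels are obtained without human intervention is not covered here.

```python
import random

def undersample(instances, labels, seed=0):
    """Randomly drop instances so that every class has as many as the smallest one."""
    rng = random.Random(seed)  # fixed seed for reproducible sampling
    by_label = {}
    for x, y in zip(instances, labels):
        by_label.setdefault(y, []).append(x)
    n = min(len(xs) for xs in by_label.values())
    balanced = [(x, y) for y, xs in by_label.items() for x in rng.sample(xs, n)]
    rng.shuffle(balanced)  # avoid blocks of identical labels
    return balanced

# Hypothetical imbalanced set: 5 related pairs (label 1) vs 95 unrelated (label 0).
instances = [f"pair{i}" for i in range(100)]
labels = [1] * 5 + [0] * 95
balanced = undersample(instances, labels)
print(len(balanced), sum(y for _, y in balanced))  # 10 5
```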
Proactive Geospatial Interlinking
• Motivation:
o Most geometry pairs are disjoint
o Progressive Geospatial Interlinking maximizes throughput, but
has no way to determine a priori the maximum number of
Verifications, BU, for a desired recall level
▪ Low BU leads to low recall
▪ High BU leads to low precision
• Solution:
o Terminate Geospatial Interlinking automatically as soon as recall exceeds a desired level →
minimize the time required for processing voluminous datasets
• Algorithms:
o Extrapolation-based
o Heuristics-based
▪ Precision-threshold
▪ Qualifying distance threshold
o Convergence-based
Convergence-based Algorithm
Based on:
• Trilateral weighting scheme (JS + CF + MBR) → fully deterministic approach
• Fine-grained MBRs → fewer candidate pairs (see next slide)
• Massive parallelization on Apache Spark
• Batch-oriented operation
o Terminate as soon as batch precision falls below a threshold for n consecutive batches
• Experiment with ~300M geometries in progress.
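The batch-oriented termination rule can be sketched as follows. This is a sequential illustration only, assuming the candidate pairs are already ranked by weight; the parameter names `batch_size`, `min_precision` and `patience` (the n consecutive batches) are hypothetical, and the Spark parallelization is left out.

```python
def verify_until_converged(ranked_pairs, verify, batch_size, min_precision, patience):
    """Verify ranked pairs batch by batch; stop after `patience` consecutive
    batches whose precision is below `min_precision`."""
    links, verified, low_streak = [], 0, 0
    for start in range(0, len(ranked_pairs), batch_size):
        batch = ranked_pairs[start:start + batch_size]
        hits = [p for p in batch if verify(p)]
        links.extend(hits)
        verified += len(batch)
        precision = len(hits) / len(batch)  # share of related pairs in this batch
        low_streak = low_streak + 1 if precision < min_precision else 0
        if low_streak >= patience:
            break
    return links, verified

# Hypothetical ranking of 20 pairs (ids 0..19) and the truly related ones.
related = {0, 1, 2, 3, 5, 12}
links, verified = verify_until_converged(
    list(range(20)), related.__contains__,
    batch_size=4, min_precision=0.5, patience=2)
print(links, verified)  # [0, 1, 2, 3, 5] 12
```

In this toy run the process stops after 12 of 20 verifications and misses pair 12, which shows the trade-off: a stricter threshold or smaller `patience` saves more time but risks lower recall.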
[Figure: two precision plots]
Fine-grained MBR
Decompose large geometries into smaller geometry segments
As a result:
• further filter superfluous verifications
o Verifications in D4: 66,379,979
o Verifications in D4 using fine-grained MBRs: 45,209,855 →
31% fewer candidate pairs
• accelerate Verification
o Instead of verifying two big geometries, verify only
the intersecting segments
o Task: Discover all topological relations by verifying
the fewest intersecting segments
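The filtering effect of fine-grained MBRs can be illustrated with the sketch below. It assumes geometries are given as vertex sequences with at least two points (for example a LineString or a polygon ring) and splits them into consecutive chunks of at most `max_points` vertices, each with its own MBR; the actual decomposition strategy is not described in the slides, and all function names are hypothetical. A pair stays a candidate only if at least one pair of segment MBRs intersects.

```python
def mbr(points):
    """Minimum Bounding Rectangle (minx, miny, maxx, maxy) of a vertex list."""
    xs, ys = zip(*points)
    return (min(xs), min(ys), max(xs), max(ys))

def mbrs_intersect(a, b):
    """True if two MBRs share at least one point."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]

def segment_mbrs(points, max_points=3):
    """MBRs of consecutive chunks; neighbouring chunks share one vertex."""
    step = max_points - 1  # requires max_points >= 2
    return [mbr(points[i:i + max_points]) for i in range(0, len(points) - 1, step)]

def is_candidate(points_a, points_b, max_points=3):
    """Coarse MBR test first, then the fine-grained test on segment MBRs."""
    if not mbrs_intersect(mbr(points_a), mbr(points_b)):
        return False
    segs_b = segment_mbrs(points_b, max_points)
    return any(mbrs_intersect(a, b)
               for a in segment_mbrs(points_a, max_points) for b in segs_b)

# Hypothetical L-shaped line and a short line inside its bounding rectangle.
l_shape = [(0, 0), (10, 0), (10, 10)]
short = [(2, 4), (4, 6)]
print(mbrs_intersect(mbr(l_shape), mbr(short)))  # True
print(is_candidate(l_shape, short, max_points=2))  # False
```

The coarse MBR of the L-shaped line covers the whole square, so the pair passes ordinary Filtering, but neither of its two segment MBRs reaches the short line, so the verification is skipped.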
Preliminary experiments
Thank you!

Customer Service Analytics - Make Sense of All Your Data.pptx
 
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfMarket Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
 
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
 
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
 
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service AmravatiVIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
 
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
 
Call Girls In Mahipalpur O9654467111 Escorts Service
Call Girls In Mahipalpur O9654467111  Escorts ServiceCall Girls In Mahipalpur O9654467111  Escorts Service
Call Girls In Mahipalpur O9654467111 Escorts Service
 
Aminabad Call Girl Agent 9548273370 , Call Girls Service Lucknow
Aminabad Call Girl Agent 9548273370 , Call Girls Service LucknowAminabad Call Girl Agent 9548273370 , Call Girls Service Lucknow
Aminabad Call Girl Agent 9548273370 , Call Girls Service Lucknow
 
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /WhatsappsBeautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsapps
 
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM TRACKING WITH GOOGLE ANALYTICS.pptx
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM  TRACKING WITH GOOGLE ANALYTICS.pptxEMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM  TRACKING WITH GOOGLE ANALYTICS.pptx
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM TRACKING WITH GOOGLE ANALYTICS.pptx
 
Digi Khata Problem along complete plan.pptx
Digi Khata Problem along complete plan.pptxDigi Khata Problem along complete plan.pptx
Digi Khata Problem along complete plan.pptx
 
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
 
Unveiling Insights: The Role of a Data Analyst
Unveiling Insights: The Role of a Data AnalystUnveiling Insights: The Role of a Data Analyst
Unveiling Insights: The Role of a Data Analyst
 
Predicting Employee Churn: A Data-Driven Approach Project Presentation
Predicting Employee Churn: A Data-Driven Approach Project PresentationPredicting Employee Churn: A Data-Driven Approach Project Presentation
Predicting Employee Churn: A Data-Driven Approach Project Presentation
 
代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改
代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改
代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改
 
RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998
 
定制英国白金汉大学毕业证(UCB毕业证书) 成绩单原版一比一
定制英国白金汉大学毕业证(UCB毕业证书)																			成绩单原版一比一定制英国白金汉大学毕业证(UCB毕业证书)																			成绩单原版一比一
定制英国白金汉大学毕业证(UCB毕业证书) 成绩单原版一比一
 
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.pptdokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
 

Big Linked Data Interlinking - ExtremeEarth Open Workshop

  • 1. Interlinking Big Linked Geospatial Data George Papadakis ExtremeEarth Online Workshop 9/12/2021
  • 2. Geospatial Interlinking in action
    Mandilaras et al., “Ice monitoring with ExtremeEarth”, LASCAR workshop 2020, co-located with ESWC
  • 3. Detected links
    Two different types:
    1. Proximity relations (such as dbp:near) with a distance threshold
       • e.g., find all cities from S that are less than 1 km away from any river in T
    2. Topological relations according to the Dimensionally Extended 9-Intersection Model (DE-9IM)
       • Equals
       • Disjoint
       • Touches
       • Contains
       • Covers
       • Intersects
       • Within
       • CoveredBy
       • Crosses
       • Overlaps
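Each topological relation listed on this slide corresponds to one or more patterns over the nine-character DE-9IM intersection matrix. The following is a minimal sketch, not the authors' implementation: it matches a matrix string against the widely used patterns (as found in JTS and PostGIS) for the eight relations whose definition does not depend on geometry dimension. Crosses and Overlaps are left out because their patterns vary with the dimensions of the two geometries. The names `PATTERNS`, `matches` and `relations` are illustrative.

```python
# Patterns over the DE-9IM string (II, IB, IE, BI, BB, BE, EI, EB, EE).
# 'T' = any non-empty intersection (0, 1 or 2), 'F' = empty, '*' = don't care.
PATTERNS = {
    "equals":     ["T*F**FFF*"],
    "disjoint":   ["FF*FF****"],
    "touches":    ["FT*******", "F**T*****", "F***T****"],
    "contains":   ["T*****FF*"],
    "covers":     ["T*****FF*", "*T****FF*", "***T**FF*", "****T*FF*"],
    "intersects": ["T********", "*T*******", "***T*****", "****T****"],
    "within":     ["T*F**F***"],
    "coveredby":  ["T*F**F***", "*TF**F***", "**FT*F***", "**F*TF***"],
}


def matches(matrix: str, pattern: str) -> bool:
    """Return True if a 9-character DE-9IM matrix satisfies the pattern."""
    for cell, want in zip(matrix, pattern):
        if want == "*":
            continue
        if want == "T":
            if cell == "F":
                return False
        elif cell != want:
            return False
    return True


def relations(matrix: str) -> set:
    """Derive every dimension-independent relation from one matrix."""
    if len(matrix) != 9:
        raise ValueError("a DE-9IM matrix has exactly 9 cells")
    return {
        name
        for name, patterns in PATTERNS.items()
        if any(matches(matrix, p) for p in patterns)
    }


# A polygon that strictly contains another polygon has matrix 212FF1FF2.
print(sorted(relations("212FF1FF2")))
```

Because all relations are read off a single matrix, the geometry pair only needs to be examined once, regardless of how many relation types are of interest.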
  • 4. Geospatial Interlinking Example
    Three topological relations:
    1. LineString g1 touches Polygon g3
    2. LineString g1 intersects LineString g2
    3. Polygon g3 contains Polygon g4
    Challenges:
    1. quadratic time complexity, O(n²)
    2. time-consuming topological relations
  • 5. GIA.nt: Geospatial Interlinking At large
    Goes beyond existing Filtering methods in two ways:
    1. Redundant pairs are inherently removed
    2. Space tiling depends only on the source dataset → the target dataset is read from the disk (>50% lower memory footprint)
    Introduces Holistic Verification
    • based on the Intersection Matrix → 80% lower run-time
    G. Papadakis, G. Mandilaras, N. Mamoulis, M. Koubarakis: Progressive, Holistic Geospatial Interlinking. WWW 2021
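The Filtering idea on this slide can be sketched roughly as follows: only the source MBRs are indexed in a grid, each target MBR is then probed against that grid one at a time, and a per-target set ensures a source appearing in several shared tiles is reported once. This is a simplified illustration, not the GIA.nt code. Using the average source MBR width and height as the tile size is an assumption made here, and all function names are hypothetical. MBRs are `(min_x, min_y, max_x, max_y)` tuples.

```python
from collections import defaultdict
from math import floor


def tile_range(mbr, dx, dy):
    """Yield the (column, row) ids of every tile an MBR overlaps."""
    min_x, min_y, max_x, max_y = mbr
    for i in range(floor(min_x / dx), floor(max_x / dx) + 1):
        for j in range(floor(min_y / dy), floor(max_y / dy) + 1):
            yield (i, j)


def build_index(source_mbrs):
    """Index source MBRs only; tile size is derived from the source (assumption)."""
    n = len(source_mbrs)
    dx = sum(m[2] - m[0] for m in source_mbrs) / n or 1.0
    dy = sum(m[3] - m[1] for m in source_mbrs) / n or 1.0
    index = defaultdict(list)
    for sid, mbr in enumerate(source_mbrs):
        for tile in tile_range(mbr, dx, dy):
            index[tile].append(sid)
    return index, dx, dy


def mbrs_intersect(a, b):
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def candidates(index, dx, dy, source_mbrs, target_mbr):
    """Return source ids whose MBR intersects the target MBR, each at most once."""
    checked = set()
    result = []
    for tile in tile_range(target_mbr, dx, dy):
        for sid in index.get(tile, ()):
            if sid in checked:
                continue  # already examined via another shared tile
            checked.add(sid)
            if mbrs_intersect(source_mbrs[sid], target_mbr):
                result.append(sid)
    return sorted(result)


sources = [(0, 0, 2, 2), (5, 5, 6, 6), (1, 1, 4, 4)]
index, dx, dy = build_index(sources)
print(candidates(index, dx, dy, sources, (1.5, 1.5, 3, 3)))
```

In a real run the target MBRs would be streamed from disk rather than held in a list, so only the source index stays in memory.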
  • 6. Progressive Geospatial Interlinking
    Same Filtering as GIA.nt.
    Introduces Scheduling:
    • Priority queue with top-BU weighted candidate pairs, where BU is the available budget and weight is determined by:
      • Co-occurrence Frequency (CF): #common tiles
      • Jaccard Similarity (JS): normalized CF
      • Pearson’s χ² test (χ²): degree to which s and t occur independently in tiles
    Verification processes the pairs of the queue in decreasing order of weight.
    G. Papadakis, G. Mandilaras, N. Mamoulis, M. Koubarakis: Progressive, Holistic Geospatial Interlinking. WWW 2021
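To make the Scheduling step more concrete, here is a hedged sketch of the three weights and of a bounded priority queue that keeps only the top-BU pairs. It assumes each geometry is represented by the set of tile ids it overlaps. The χ² function uses the standard Pearson formula for a 2×2 contingency table over all tiles, which may differ in detail from the paper's formulation. All names are illustrative.

```python
import heapq


def cf(tiles_s, tiles_t):
    """Co-occurrence Frequency: number of tiles shared by s and t."""
    return len(tiles_s & tiles_t)


def js(tiles_s, tiles_t):
    """Jaccard Similarity: CF normalized by the size of the tile union."""
    common = cf(tiles_s, tiles_t)
    union = len(tiles_s) + len(tiles_t) - common
    return common / union if union else 0.0


def chi2(tiles_s, tiles_t, total_tiles):
    """Pearson's chi-squared statistic for the 2x2 table (tile has s?, tile has t?)."""
    a = cf(tiles_s, tiles_t)           # tiles with both s and t
    b = len(tiles_s) - a               # tiles with s only
    c = len(tiles_t) - a               # tiles with t only
    d = total_tiles - a - b - c        # tiles with neither
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return total_tiles * (a * d - b * c) ** 2 / denom if denom else 0.0


def top_budget(pairs, weight, budget):
    """Keep the `budget` highest-weighted pairs, returned in decreasing weight."""
    if budget <= 0:
        return []
    heap = []  # min-heap: the weakest retained pair sits at heap[0]
    for s, t in pairs:
        w = weight(s, t)
        if len(heap) < budget:
            heapq.heappush(heap, (w, s, t))
        elif w > heap[0][0]:
            heapq.heapreplace(heap, (w, s, t))
    return sorted(heap, reverse=True)


tiles = {"s1": {1, 2, 3}, "t1": {2, 3}, "t2": {3, 9}}
pairs = [("s1", "t1"), ("s1", "t2")]
print(top_budget(pairs, lambda s, t: cf(tiles[s], tiles[t]), budget=1))
```

Keeping a min-heap of fixed size means memory stays proportional to BU rather than to the number of candidate pairs.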
  • 7. Dynamic Progressive Geospatial Interlinking
    New weighting schemes:
    • POINTS: smaller geometries processed first → higher time efficiency
    • MBR: higher overlap in Minimum Bounding Rectangles first → higher effectiveness
    • composite weights → higher effectiveness, more deterministic behavior
    New scheduling:
    • instead of a static processing order of geometry pairs, the processing order is updated dynamically, as more topologically related pairs are detected
  • 8. JedAI-spatial
    Publicly available at: https://github.com/GeoLinker/GeoLinker
    Solution space:
    Model-view-controller architecture
  • 9. JedAI-spatial – Part B
    • Common three-stage pipeline for the state-of-the-art parallel joins:
      o GeoSpark, i.e., Apache Sedona
      o Spatial Spark
      o Magellan
      o Location Spark
      o Parallel GIA.nt
    Scalability Analysis over D1 (|S|=2.3M, |T|=5.8M, |C|=6.3M)
  • 10. Approximate Geospatial Interlinking
    Goal:
    • Improve Progressive Geospatial Interlinking in two ways:
      1. Use comprehensive evidence to discard candidate pairs in a principled way
      2. Reduce the memory requirements
    Approach:
    1. Filtering → as in (Progressive) GIA.nt
    2. Supervised Filtering
       o Classify candidate pairs into “likely related pairs” & “unlikely related pairs” using a feature vector
    3. Verification → as in (Progressive) GIA.nt
    Challenges:
    • Avoid any human intervention
    • Address class imbalance
    • Define generic, effective & efficient features
    • Minimize the feature set and the training set → simple & efficient classification models
  • 11. Approximate Geospatial Interlinking – Solution overview
    • Contrastive, self-supervised learning
    • 4 categories of features
      1. Area-based
      2. Boundary-based
      3. Grid-based
      4. Candidate-based
    • 2 sub-categories in each case:
      o Atomic features
      o Composite features
    Experimental results:
    • Undersampling necessary
    • All 31 features are important
    • Just 1,000 labelled instances suffice
    • Parallelization for higher scalability
  • 12. Proactive Geospatial Interlinking
    • Motivation:
      o Most geometry pairs are disjoint
      o Progressive Geospatial Interlinking maximizes throughput, but has no way to determine a priori the maximum number of Verifications, BU, for a desired recall level
        ▪ Low BU leads to low recall
        ▪ High BU leads to low precision
    • Solution:
      o Terminate Geospatial Interlinking automatically as soon as recall exceeds a desired level → minimize the time required for processing voluminous datasets
    • Algorithms:
      o Extrapolation-based
      o Heuristics-based
        ▪ Precision-threshold
        ▪ Qualifying distance threshold
      o Convergence-based
  • 13. Convergence-based Algorithm
    Based on:
    • Trilateral weighting scheme (JS + CF + MBR) → fully deterministic approach
    • Fine-grained MBRs → fewer candidate pairs (see next slide)
    • Massive parallelization on Apache Spark
    • Batch-oriented operation
      o Terminate as soon as batch precision falls below a threshold for n consecutive batches
    • Experiment with ~300M geometries in progress.
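The batch-oriented stopping rule on this slide can be illustrated with a short sketch. It assumes the candidate pairs have already been ordered by weight and split into batches, and that `verify` returns True for a topologically related pair. The parameter names `min_precision` and `patience` (the slide's n) are hypothetical, and the Spark parallelization is not modelled.

```python
def run_until_converged(batches, verify, min_precision, patience):
    """Verify batches in order; stop once `patience` consecutive batches
    each have precision below `min_precision`.

    Returns the related pairs found and the number of batches processed.
    """
    links = []
    low_streak = 0
    processed = 0
    for batch in batches:
        processed += 1
        related = [pair for pair in batch if verify(pair)]
        links.extend(related)
        precision = len(related) / len(batch) if batch else 0.0
        # A batch at or above the threshold resets the streak.
        low_streak = low_streak + 1 if precision < min_precision else 0
        if low_streak >= patience:
            break
    return links, processed


# Toy input: each "pair" is just a flag saying whether it is related.
toy_batches = [
    [True, True, False],
    [True, False, False],
    [False, False, False],
    [False, False, False],
    [True, True, True],  # never reached: the run stops after batch 4
]
links, processed = run_until_converged(toy_batches, bool, 0.2, 2)
print(len(links), processed)
```

Requiring several consecutive low-precision batches, rather than a single one, guards against stopping on one unlucky batch.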
  • 14. Fine-grained MBR
    Decompose large geometries into smaller geometry segments
    As a result:
    • further filter superfluous verifications
      o Verifications in D4: 66,379,979
      o Verifications in D4 using fine-grained MBRs: 45,209,855 → 31% fewer candidate pairs
    • accelerate Verification
      o Instead of verifying two big geometries, verify only the intersecting segments
      o Task: Discover all topological relations by verifying the minimum number of intersecting segments