SlideShare a Scribd company logo
Instance Matching Benchmarks in the
Era of Linked Data
Tzanina Saveta,
Evangelia Daskalaki, Irini Fundulaki,Giorgos Flouris
Institute of Computer Science – FORTH , Greece
Benchmarking Link Discovery Systems for Geo-Spatial Data 1
Linked Data –The LOD Cloud
27-Oct-17 Benchmarking Link Discovery Systems for Geo-Spatial Data 2*Adapted from Suchanek & Weikum tutorial@SIGMOD 2013
Same entity can be
described in
different sources
Instance Matching: the cornerstone for Linked Data
Benchmarking Link Discovery Systems for Geo-Spatial Data 3
Data Acquisition
Data Integration
Open/Social data
Problem: How can we automatically
recognize multiple mentions of the
same entity across or within sources?
Data Evolution
Solution: Instance Matching
Instance Matching in the LOD
27-Oct-17 Benchmarking Link Discovery Systems for Geo-Spatial Data 4
Sets
RDF/OWL triples
*Adapted from Suchanek & Weikum tutorial@SIGMOD 2013
Many sources to
match
Rich semantics
Value
Structure
Logical variations
Instance MatchingTechniques andTools
• People interconnect their dataset with existing ones.
– These links are often manually curated (or semi-
automatically generated).
• Size and number of datasets is huge, so it is vital to
automatically detect additional links that is making the graph
more dense
• Instance matching research has led to the development of a
number of systems.
Benchmarking Link Discovery Systems for Geo-Spatial Data 5
1. How to compare these?
2. How can we assess their performance?
3. How can we push the systems to get better?
The systems must be benchmarked!
Ingredients of an Instance Matching Benchmark
• It is organized in test cases each addressing different kind of
requirements
• Datasets
– The raw material of the benchmarks.These are the source and the
target dataset that will be matched together to find the instances that
refer to the same real world entity
• Gold Standard (GroundTruth / Reference Alignment)
– The “correct answer sheet” used to judge the completeness and
soundness of the instance matching algorithms
• Metrics/Key Performance Indicators
– The performance metric(s) that determine the systems behavior and
performance
Benchmarking Link Discovery Systems for Geo-Spatial Data 6
Datasets
Distinguish between Real and Synthetic datasets :
– Real Datasets:
• Realistic conditions for heterogeneity problems
• Realistic distributions
• Error prone Reference Alignment
– Synthetic Datasets:
• Fully controlled test conditions
• Accurate Gold Standards
• Unrealistic distributions
Benchmarking Link Discovery Systems for Geo-Spatial Data 7
DataVariations in Datasets
Benchmarking Link Discovery Systems for Geo-Spatial Data 8
Variations
Value
- Name style abbreviation
-Typographical errors
- Change format
(date/gender/number)
- Synonym Change
- Multilingualism
Structural
-Change property depth
-Delete/Add property
-Split property values
-Transformation of
object to data type
property
-Transformation of data
to object type property
Logical
-Delete/Modify Class
Assertions
-Invert property
assertions
-Change property
hierarchy
-Assert disjoint classes
Gold Standards and Metrics
• Metrics/Key Performance Indicators
– Precision/Recall/F-measure
Benchmarking Link Discovery Systems for Geo-Spatial Data 9
Gold
Standard
Result
Set
False
Negative (FN)
True
Positive (TP)
False
Positive (FP)
Recall r =TP / (TP + FN)
Precision p =TP / (TP + FP)
F-measure f = 2 * p * r / (p + r)
Design Criteria for Instance Matching Benchmarks
Benchmarking Link Discovery Systems for Geo-Spatial Data 10
Systematic
Procedure
matching tasks are reproducible and the execution is
comparable
Availability related to the availability of the benchmark in time
Quality precise evaluation rules and high quality ontologies
Equity no system privileged during the evaluation process
Dissemination how many systems have used this benchmark to be
evaluated with
Volume how many instances did the datasets contain
Gold Standard existence of gold standard and it’s accuracy
Benchmarking
• Instance matching techniques have, until recently, been
benchmarked in an ad-hoc way
• There does not exist a standard way of benchmarking the
performance of the systems, when it comes to Linked Data
• On the other hand, Instance Matching benchmarks have been
mainly driven forward by the Ontology Alignment Evaluation
Initiative (OAEI)
– organizes annual campaign for ontology matching since 2005
– hosts independent benchmarks
• In 2009, OAEI introduced the Instance Matching (IM)Track
– focuses on the evaluation of different instance matching
techniques and tools for Linked Data
Benchmarking Link Discovery Systems for Geo-Spatial Data 11
Benchmark Frameworks
• Benchmark Generators
• Synthetic Benchmarks
• Real Benchmarks
Benchmarking Link Discovery Systems for Geo-Spatial Data 12
SWING
[FMN+11]
SPIMBENCH
[SDF+15]
LANCE
[SDFF+15]
OAEI IIMB 2009
[EFH+09]
OAEI IIMB 2010
[EFM+10]
OAEI Persons-
Restaurants
2010 [EFM+10]
ONTOBI 2010
[Z10]
OAEI IIMB 2011
[EHH+11]
Sandbox
2012 [AEE+12]
OAEI IIMB 2012
[AEE+12]
OAEI RDFT
2013 [GDE+13]
ID-REC Task
2014 [DEE14]
SPIMBENCH
2015 [CDE+15]
Author Task
2015 [CDE+15]
ARS
(OAEI 2009) [EFH+09]
DI
(OAEI 2010) [EFM+10]
DI-NYT
(OAEI 2011) [EHH+11]
Comparison of synthetic Benchmarks
Benchmarking Link Discovery Systems for Geo-Spatial Data 13
Comparison of real benchmarks
Benchmarking Link Discovery Systems for Geo-Spatial Data 14
Wrapping Up
Benchmarking Link Discovery Systems for Geo-Spatial Data 15
Which benchmarks included multilingual datasets?
OAEI RDFT
2013 (French-
English)
ID-REC 2014
(English- Italian)
Author Task
(English –
Italian)
Wrapping Up: Benchmarks
Benchmarking Link Discovery Systems for Geo-Spatial Data 16
Which benchmarks included value variations into the test
cases?
OAEI IIMB
2009
OAEI IIMB
2010
OAEI Persons-
Restaurants
2010
ONTOBI
OAEI IIMB
2011
Sandbox 2012
OAEI IIMB
2012
OAEI RDFT
2013
ID-REC 2014
SPIMBENCH
2015
Author Task
2015
ARS
DI 2010 DI 2011
Wrapping up: Benchmarks
Which benchmarks included structural variations into the test
cases?
OAEI IIMB
2009
OAEI IIMB
2010
OAEI Persons-
Restaurants
2010
ONTOBI
OAEI IIMB
2011
OAEI IIMB
2012
OAEI RDFT
2013
ID-REC 2014
SPIMBENCH
2015
Author Task
2015
ARS DI 2010
DI 2011
Wrapping up: Benchmarks
Which benchmarks included logical variations into the test
cases?
OAEI IIMB
2009
OAEI IIMB
2010
OAEI IIMB
2011
OAEI IIMB
2012
SPIMBENCH
2015
Wrapping up: Benchmarks
Which benchmarks included combination of the variations into
the test cases?
IIMB 2009 IIMB 2010 IIMB 2011
IIMB 2012 RDFT 2013 ID-REC 2014
SPIMBENCH
2015
Author Task
2015
Wrapping up: Benchmarks
Which benchmarks are more voluminous?
SPIMBENCH
2015
ARS
DI 2011
Wrapping up: Benchmarks
Which benchmarks included both combination of the variations
and was voluminous at the same time?
SPIMBENCH 2015
Open Issues
• Issue 1:
Only one benchmark that tackles both, combination of
variations and scalability issues
• Issue 2 :
Not enough IM benchmark using the full expressiveness of
RDF/OWL language
Wrapping Up: Systems for Benchmarks
Outcomes as far as systems are concerned:
• Systems can handle the value variations, the
structural variation, and the simple logical variations
separately
• More work needed for complex variations
(combination of value, structural, and logical)
• More work needed for structural variations
• Enhancement of systems to cope with the clustering
of the mappings (1-n mappings)
Conclusion
• Many instance matching benchmarks have been
proposed
• Each of them answering to some of the needs of
instance matching systems
• It is time now to start creating benchmarks that will
“show the way to the future” and make them better
25
This work was supported by grands from the EU H2020 Framework Programme
provided for the project HOBBIT (GA no. 688227).

More Related Content

Similar to Instance Matching Benchmarks in the ERA of Linked Data - ISWC2017

Connected development data
Connected development dataConnected development data
Connected development data
Rob Worthington
 
Ontologies for Emergency & Disaster Management
Ontologies for Emergency & Disaster Management Ontologies for Emergency & Disaster Management
Ontologies for Emergency & Disaster Management
Stephane Fellah
 
Geospatial Ontologies and GeoSPARQL Services
Geospatial Ontologies and GeoSPARQL ServicesGeospatial Ontologies and GeoSPARQL Services
Geospatial Ontologies and GeoSPARQL Services
Stephane Fellah
 
SPIMBENCH: A Scalable, Schema-Aware Instance Matching Benchmark for the Seman...
SPIMBENCH: A Scalable, Schema-Aware Instance Matching Benchmark for the Seman...SPIMBENCH: A Scalable, Schema-Aware Instance Matching Benchmark for the Seman...
SPIMBENCH: A Scalable, Schema-Aware Instance Matching Benchmark for the Seman...
Graph-TA
 
SPIMBENCH: A scalable, Schema-Aware Instance Matching Benchmark for the Seman...
SPIMBENCH: A scalable, Schema-Aware Instance Matching Benchmark for the Seman...SPIMBENCH: A scalable, Schema-Aware Instance Matching Benchmark for the Seman...
SPIMBENCH: A scalable, Schema-Aware Instance Matching Benchmark for the Seman...
Ioan Toma
 
SPIMBENCH: A scalable, Schema-Aware Instance Matching Benchmark for the Seman...
SPIMBENCH: A scalable, Schema-Aware Instance Matching Benchmark for the Seman...SPIMBENCH: A scalable, Schema-Aware Instance Matching Benchmark for the Seman...
SPIMBENCH: A scalable, Schema-Aware Instance Matching Benchmark for the Seman...
LDBC council
 
Semantic Similarity and Selection of Resources Published According to Linked ...
Semantic Similarity and Selection of Resources Published According to Linked ...Semantic Similarity and Selection of Resources Published According to Linked ...
Semantic Similarity and Selection of Resources Published According to Linked ...
Riccardo Albertoni
 
A Framework for Ontology Usage Analysis
A Framework for Ontology Usage AnalysisA Framework for Ontology Usage Analysis
A Framework for Ontology Usage Analysis
Jamshaid Ashraf
 
SSSW2015 Data Workflow Tutorial
SSSW2015 Data Workflow TutorialSSSW2015 Data Workflow Tutorial
SSSW2015 Data Workflow Tutorial
SSSW
 
An empirical performance evaluation of relational keyword search systems
An empirical performance evaluation of relational keyword search systemsAn empirical performance evaluation of relational keyword search systems
An empirical performance evaluation of relational keyword search systems
Browse Jobs
 
Semantics as a service at EMBL-EBI
Semantics as a service at EMBL-EBISemantics as a service at EMBL-EBI
Semantics as a service at EMBL-EBI
Simon Jupp
 
Resource Description Framework Approach to Data Publication and Federation
Resource Description Framework Approach to Data Publication and FederationResource Description Framework Approach to Data Publication and Federation
Resource Description Framework Approach to Data Publication and Federation
Pistoia Alliance
 
Core Geospatial Ontologies
Core Geospatial OntologiesCore Geospatial Ontologies
Core Geospatial Ontologies
Stephane Fellah
 
Delivering a Linked Data warehouse and realising the power of graphs
Delivering a Linked Data warehouse and realising the power of graphsDelivering a Linked Data warehouse and realising the power of graphs
Delivering a Linked Data warehouse and realising the power of graphs
Ben Gardner
 
RDA, Data Citation, and PIDs for DataOne
RDA, Data Citation, and PIDs for DataOneRDA, Data Citation, and PIDs for DataOne
RDA, Data Citation, and PIDs for DataOne
Research Data Alliance
 
Visionbi Quality Gates
Visionbi Quality GatesVisionbi Quality Gates
Visionbi Quality Gates
Ram Yonish
 
Globe seminar
Globe seminarGlobe seminar
Globe seminar
Xavier Ochoa
 
Semantics-enhanced Geoscience Interoperability, Analytics, and Applications
Semantics-enhanced Geoscience Interoperability, Analytics, and ApplicationsSemantics-enhanced Geoscience Interoperability, Analytics, and Applications
Semantics-enhanced Geoscience Interoperability, Analytics, and Applications
Artificial Intelligence Institute at UofSC
 
Discovering Related Data Sources in Data Portals
Discovering Related Data Sources in Data PortalsDiscovering Related Data Sources in Data Portals
Discovering Related Data Sources in Data Portals
Peter Haase
 
ISWC 2014 Tutorial - Instance Matching Benchmarks for Linked Data
ISWC 2014 Tutorial - Instance Matching Benchmarks for Linked DataISWC 2014 Tutorial - Instance Matching Benchmarks for Linked Data
ISWC 2014 Tutorial - Instance Matching Benchmarks for Linked Data
Evangelia Daskalaki
 

Similar to Instance Matching Benchmarks in the ERA of Linked Data - ISWC2017 (20)

Connected development data
Connected development dataConnected development data
Connected development data
 
Ontologies for Emergency & Disaster Management
Ontologies for Emergency & Disaster Management Ontologies for Emergency & Disaster Management
Ontologies for Emergency & Disaster Management
 
Geospatial Ontologies and GeoSPARQL Services
Geospatial Ontologies and GeoSPARQL ServicesGeospatial Ontologies and GeoSPARQL Services
Geospatial Ontologies and GeoSPARQL Services
 
SPIMBENCH: A Scalable, Schema-Aware Instance Matching Benchmark for the Seman...
SPIMBENCH: A Scalable, Schema-Aware Instance Matching Benchmark for the Seman...SPIMBENCH: A Scalable, Schema-Aware Instance Matching Benchmark for the Seman...
SPIMBENCH: A Scalable, Schema-Aware Instance Matching Benchmark for the Seman...
 
SPIMBENCH: A scalable, Schema-Aware Instance Matching Benchmark for the Seman...
SPIMBENCH: A scalable, Schema-Aware Instance Matching Benchmark for the Seman...SPIMBENCH: A scalable, Schema-Aware Instance Matching Benchmark for the Seman...
SPIMBENCH: A scalable, Schema-Aware Instance Matching Benchmark for the Seman...
 
SPIMBENCH: A scalable, Schema-Aware Instance Matching Benchmark for the Seman...
SPIMBENCH: A scalable, Schema-Aware Instance Matching Benchmark for the Seman...SPIMBENCH: A scalable, Schema-Aware Instance Matching Benchmark for the Seman...
SPIMBENCH: A scalable, Schema-Aware Instance Matching Benchmark for the Seman...
 
Semantic Similarity and Selection of Resources Published According to Linked ...
Semantic Similarity and Selection of Resources Published According to Linked ...Semantic Similarity and Selection of Resources Published According to Linked ...
Semantic Similarity and Selection of Resources Published According to Linked ...
 
A Framework for Ontology Usage Analysis
A Framework for Ontology Usage AnalysisA Framework for Ontology Usage Analysis
A Framework for Ontology Usage Analysis
 
SSSW2015 Data Workflow Tutorial
SSSW2015 Data Workflow TutorialSSSW2015 Data Workflow Tutorial
SSSW2015 Data Workflow Tutorial
 
An empirical performance evaluation of relational keyword search systems
An empirical performance evaluation of relational keyword search systemsAn empirical performance evaluation of relational keyword search systems
An empirical performance evaluation of relational keyword search systems
 
Semantics as a service at EMBL-EBI
Semantics as a service at EMBL-EBISemantics as a service at EMBL-EBI
Semantics as a service at EMBL-EBI
 
Resource Description Framework Approach to Data Publication and Federation
Resource Description Framework Approach to Data Publication and FederationResource Description Framework Approach to Data Publication and Federation
Resource Description Framework Approach to Data Publication and Federation
 
Core Geospatial Ontologies
Core Geospatial OntologiesCore Geospatial Ontologies
Core Geospatial Ontologies
 
Delivering a Linked Data warehouse and realising the power of graphs
Delivering a Linked Data warehouse and realising the power of graphsDelivering a Linked Data warehouse and realising the power of graphs
Delivering a Linked Data warehouse and realising the power of graphs
 
RDA, Data Citation, and PIDs for DataOne
RDA, Data Citation, and PIDs for DataOneRDA, Data Citation, and PIDs for DataOne
RDA, Data Citation, and PIDs for DataOne
 
Visionbi Quality Gates
Visionbi Quality GatesVisionbi Quality Gates
Visionbi Quality Gates
 
Globe seminar
Globe seminarGlobe seminar
Globe seminar
 
Semantics-enhanced Geoscience Interoperability, Analytics, and Applications
Semantics-enhanced Geoscience Interoperability, Analytics, and ApplicationsSemantics-enhanced Geoscience Interoperability, Analytics, and Applications
Semantics-enhanced Geoscience Interoperability, Analytics, and Applications
 
Discovering Related Data Sources in Data Portals
Discovering Related Data Sources in Data PortalsDiscovering Related Data Sources in Data Portals
Discovering Related Data Sources in Data Portals
 
ISWC 2014 Tutorial - Instance Matching Benchmarks for Linked Data
ISWC 2014 Tutorial - Instance Matching Benchmarks for Linked DataISWC 2014 Tutorial - Instance Matching Benchmarks for Linked Data
ISWC 2014 Tutorial - Instance Matching Benchmarks for Linked Data
 

More from Holistic Benchmarking of Big Linked Data

EARL: Joint Entity and Relation Linking for Question Answering over Knowledge...
EARL: Joint Entity and Relation Linking for Question Answering over Knowledge...EARL: Joint Entity and Relation Linking for Question Answering over Knowledge...
EARL: Joint Entity and Relation Linking for Question Answering over Knowledge...
Holistic Benchmarking of Big Linked Data
 
Benchmarking Big Linked Data: The case of the HOBBIT Project
Benchmarking Big Linked Data: The case of the HOBBIT ProjectBenchmarking Big Linked Data: The case of the HOBBIT Project
Benchmarking Big Linked Data: The case of the HOBBIT Project
Holistic Benchmarking of Big Linked Data
 
Assessing Linked Data Versioning Systems: The Semantic Publishing Versioning ...
Assessing Linked Data Versioning Systems: The Semantic Publishing Versioning ...Assessing Linked Data Versioning Systems: The Semantic Publishing Versioning ...
Assessing Linked Data Versioning Systems: The Semantic Publishing Versioning ...
Holistic Benchmarking of Big Linked Data
 
The DEBS Grand Challenge 2018
The DEBS Grand Challenge 2018The DEBS Grand Challenge 2018
The DEBS Grand Challenge 2018
Holistic Benchmarking of Big Linked Data
 
Benchmarking of distributed linked data streaming systems
Benchmarking of distributed linked data streaming systemsBenchmarking of distributed linked data streaming systems
Benchmarking of distributed linked data streaming systems
Holistic Benchmarking of Big Linked Data
 
SQCFramework: SPARQL Query Containment Benchmarks Generation Framework
SQCFramework: SPARQL Query Containment Benchmarks Generation FrameworkSQCFramework: SPARQL Query Containment Benchmarks Generation Framework
SQCFramework: SPARQL Query Containment Benchmarks Generation Framework
Holistic Benchmarking of Big Linked Data
 
LargeRDFBench: A billion triples benchmark for SPARQL endpoint federation
LargeRDFBench: A billion triples benchmark for SPARQL endpoint federationLargeRDFBench: A billion triples benchmark for SPARQL endpoint federation
LargeRDFBench: A billion triples benchmark for SPARQL endpoint federation
Holistic Benchmarking of Big Linked Data
 
The DEBS Grand Challenge 2017
The DEBS Grand Challenge 2017The DEBS Grand Challenge 2017
The DEBS Grand Challenge 2017
Holistic Benchmarking of Big Linked Data
 
4th Natural Language Interface over the Web of Data (NLIWoD) workshop and QAL...
4th Natural Language Interface over the Web of Data (NLIWoD) workshop and QAL...4th Natural Language Interface over the Web of Data (NLIWoD) workshop and QAL...
4th Natural Language Interface over the Web of Data (NLIWoD) workshop and QAL...
Holistic Benchmarking of Big Linked Data
 
Scalable Link Discovery for Modern Data-Driven Applications (poster)
Scalable Link Discovery for Modern Data-Driven Applications (poster)Scalable Link Discovery for Modern Data-Driven Applications (poster)
Scalable Link Discovery for Modern Data-Driven Applications (poster)
Holistic Benchmarking of Big Linked Data
 
An Evaluation of Models for Runtime Approximation in Link Discovery
An Evaluation of Models for Runtime Approximation in Link DiscoveryAn Evaluation of Models for Runtime Approximation in Link Discovery
An Evaluation of Models for Runtime Approximation in Link Discovery
Holistic Benchmarking of Big Linked Data
 
Scalable Link Discovery for Modern Data-Driven Applications
Scalable Link Discovery for Modern Data-Driven ApplicationsScalable Link Discovery for Modern Data-Driven Applications
Scalable Link Discovery for Modern Data-Driven Applications
Holistic Benchmarking of Big Linked Data
 
Extending LargeRDFBench for Multi-Source Data at Scale for SPARQL Endpoint F...
 Extending LargeRDFBench for Multi-Source Data at Scale for SPARQL Endpoint F... Extending LargeRDFBench for Multi-Source Data at Scale for SPARQL Endpoint F...
Extending LargeRDFBench for Multi-Source Data at Scale for SPARQL Endpoint F...
Holistic Benchmarking of Big Linked Data
 
SPgen: A Benchmark Generator for Spatial Link Discovery Tools
SPgen: A Benchmark Generator for Spatial Link Discovery ToolsSPgen: A Benchmark Generator for Spatial Link Discovery Tools
SPgen: A Benchmark Generator for Spatial Link Discovery Tools
Holistic Benchmarking of Big Linked Data
 
Introducing the HOBBIT platform into the Ontology Alignment Evaluation Campaign
Introducing the HOBBIT platform into the Ontology Alignment Evaluation CampaignIntroducing the HOBBIT platform into the Ontology Alignment Evaluation Campaign
Introducing the HOBBIT platform into the Ontology Alignment Evaluation Campaign
Holistic Benchmarking of Big Linked Data
 
OKE2018 Challenge @ ESWC2018
OKE2018 Challenge @ ESWC2018OKE2018 Challenge @ ESWC2018
OKE2018 Challenge @ ESWC2018
Holistic Benchmarking of Big Linked Data
 
MOCHA 2018 Challenge @ ESWC2018
MOCHA 2018 Challenge @ ESWC2018MOCHA 2018 Challenge @ ESWC2018
MOCHA 2018 Challenge @ ESWC2018
Holistic Benchmarking of Big Linked Data
 
Dynamic planning for link discovery - ESWC 2018
Dynamic planning for link discovery - ESWC 2018Dynamic planning for link discovery - ESWC 2018
Dynamic planning for link discovery - ESWC 2018
Holistic Benchmarking of Big Linked Data
 
Hobbit project overview presented at EBDVF 2017
Hobbit project overview presented at EBDVF 2017Hobbit project overview presented at EBDVF 2017
Hobbit project overview presented at EBDVF 2017
Holistic Benchmarking of Big Linked Data
 
Leopard ISWC Semantic Web Challenge 2017 (poster)
Leopard ISWC Semantic Web Challenge 2017 (poster)Leopard ISWC Semantic Web Challenge 2017 (poster)
Leopard ISWC Semantic Web Challenge 2017 (poster)
Holistic Benchmarking of Big Linked Data
 

More from Holistic Benchmarking of Big Linked Data (20)

EARL: Joint Entity and Relation Linking for Question Answering over Knowledge...
EARL: Joint Entity and Relation Linking for Question Answering over Knowledge...EARL: Joint Entity and Relation Linking for Question Answering over Knowledge...
EARL: Joint Entity and Relation Linking for Question Answering over Knowledge...
 
Benchmarking Big Linked Data: The case of the HOBBIT Project
Benchmarking Big Linked Data: The case of the HOBBIT ProjectBenchmarking Big Linked Data: The case of the HOBBIT Project
Benchmarking Big Linked Data: The case of the HOBBIT Project
 
Assessing Linked Data Versioning Systems: The Semantic Publishing Versioning ...
Assessing Linked Data Versioning Systems: The Semantic Publishing Versioning ...Assessing Linked Data Versioning Systems: The Semantic Publishing Versioning ...
Assessing Linked Data Versioning Systems: The Semantic Publishing Versioning ...
 
The DEBS Grand Challenge 2018
The DEBS Grand Challenge 2018The DEBS Grand Challenge 2018
The DEBS Grand Challenge 2018
 
Benchmarking of distributed linked data streaming systems
Benchmarking of distributed linked data streaming systemsBenchmarking of distributed linked data streaming systems
Benchmarking of distributed linked data streaming systems
 
SQCFramework: SPARQL Query Containment Benchmarks Generation Framework
SQCFramework: SPARQL Query Containment Benchmarks Generation FrameworkSQCFramework: SPARQL Query Containment Benchmarks Generation Framework
SQCFramework: SPARQL Query Containment Benchmarks Generation Framework
 
LargeRDFBench: A billion triples benchmark for SPARQL endpoint federation
LargeRDFBench: A billion triples benchmark for SPARQL endpoint federationLargeRDFBench: A billion triples benchmark for SPARQL endpoint federation
LargeRDFBench: A billion triples benchmark for SPARQL endpoint federation
 
The DEBS Grand Challenge 2017
The DEBS Grand Challenge 2017The DEBS Grand Challenge 2017
The DEBS Grand Challenge 2017
 
4th Natural Language Interface over the Web of Data (NLIWoD) workshop and QAL...
4th Natural Language Interface over the Web of Data (NLIWoD) workshop and QAL...4th Natural Language Interface over the Web of Data (NLIWoD) workshop and QAL...
4th Natural Language Interface over the Web of Data (NLIWoD) workshop and QAL...
 
Scalable Link Discovery for Modern Data-Driven Applications (poster)
Scalable Link Discovery for Modern Data-Driven Applications (poster)Scalable Link Discovery for Modern Data-Driven Applications (poster)
Scalable Link Discovery for Modern Data-Driven Applications (poster)
 
An Evaluation of Models for Runtime Approximation in Link Discovery
An Evaluation of Models for Runtime Approximation in Link DiscoveryAn Evaluation of Models for Runtime Approximation in Link Discovery
An Evaluation of Models for Runtime Approximation in Link Discovery
 
Scalable Link Discovery for Modern Data-Driven Applications
Scalable Link Discovery for Modern Data-Driven ApplicationsScalable Link Discovery for Modern Data-Driven Applications
Scalable Link Discovery for Modern Data-Driven Applications
 
Extending LargeRDFBench for Multi-Source Data at Scale for SPARQL Endpoint F...
 Extending LargeRDFBench for Multi-Source Data at Scale for SPARQL Endpoint F... Extending LargeRDFBench for Multi-Source Data at Scale for SPARQL Endpoint F...
Extending LargeRDFBench for Multi-Source Data at Scale for SPARQL Endpoint F...
 
SPgen: A Benchmark Generator for Spatial Link Discovery Tools
SPgen: A Benchmark Generator for Spatial Link Discovery ToolsSPgen: A Benchmark Generator for Spatial Link Discovery Tools
SPgen: A Benchmark Generator for Spatial Link Discovery Tools
 
Introducing the HOBBIT platform into the Ontology Alignment Evaluation Campaign
Introducing the HOBBIT platform into the Ontology Alignment Evaluation CampaignIntroducing the HOBBIT platform into the Ontology Alignment Evaluation Campaign
Introducing the HOBBIT platform into the Ontology Alignment Evaluation Campaign
 
OKE2018 Challenge @ ESWC2018
OKE2018 Challenge @ ESWC2018OKE2018 Challenge @ ESWC2018
OKE2018 Challenge @ ESWC2018
 
MOCHA 2018 Challenge @ ESWC2018
MOCHA 2018 Challenge @ ESWC2018MOCHA 2018 Challenge @ ESWC2018
MOCHA 2018 Challenge @ ESWC2018
 
Dynamic planning for link discovery - ESWC 2018
Dynamic planning for link discovery - ESWC 2018Dynamic planning for link discovery - ESWC 2018
Dynamic planning for link discovery - ESWC 2018
 
Hobbit project overview presented at EBDVF 2017
Hobbit project overview presented at EBDVF 2017Hobbit project overview presented at EBDVF 2017
Hobbit project overview presented at EBDVF 2017
 
Leopard ISWC Semantic Web Challenge 2017 (poster)
Leopard ISWC Semantic Web Challenge 2017 (poster)Leopard ISWC Semantic Web Challenge 2017 (poster)
Leopard ISWC Semantic Web Challenge 2017 (poster)
 

Recently uploaded

Authoring a personal GPT for your research and practice: How we created the Q...
Authoring a personal GPT for your research and practice: How we created the Q...Authoring a personal GPT for your research and practice: How we created the Q...
Authoring a personal GPT for your research and practice: How we created the Q...
Leonel Morgado
 
Sharlene Leurig - Enabling Onsite Water Use with Net Zero Water
Sharlene Leurig - Enabling Onsite Water Use with Net Zero WaterSharlene Leurig - Enabling Onsite Water Use with Net Zero Water
Sharlene Leurig - Enabling Onsite Water Use with Net Zero Water
Texas Alliance of Groundwater Districts
 
Eukaryotic Transcription Presentation.pptx
Eukaryotic Transcription Presentation.pptxEukaryotic Transcription Presentation.pptx
Eukaryotic Transcription Presentation.pptx
RitabrataSarkar3
 
Medical Orthopedic PowerPoint Templates.pptx
Medical Orthopedic PowerPoint Templates.pptxMedical Orthopedic PowerPoint Templates.pptx
Medical Orthopedic PowerPoint Templates.pptx
terusbelajar5
 
ESR spectroscopy in liquid food and beverages.pptx
ESR spectroscopy in liquid food and beverages.pptxESR spectroscopy in liquid food and beverages.pptx
ESR spectroscopy in liquid food and beverages.pptx
PRIYANKA PATEL
 
Randomised Optimisation Algorithms in DAPHNE
Randomised Optimisation Algorithms in DAPHNERandomised Optimisation Algorithms in DAPHNE
Randomised Optimisation Algorithms in DAPHNE
University of Maribor
 
Unlocking the mysteries of reproduction: Exploring fecundity and gonadosomati...
Unlocking the mysteries of reproduction: Exploring fecundity and gonadosomati...Unlocking the mysteries of reproduction: Exploring fecundity and gonadosomati...
Unlocking the mysteries of reproduction: Exploring fecundity and gonadosomati...
AbdullaAlAsif1
 
Equivariant neural networks and representation theory
Equivariant neural networks and representation theoryEquivariant neural networks and representation theory
Equivariant neural networks and representation theory
Daniel Tubbenhauer
 
Direct Seeded Rice - Climate Smart Agriculture
Direct Seeded Rice - Climate Smart AgricultureDirect Seeded Rice - Climate Smart Agriculture
Direct Seeded Rice - Climate Smart Agriculture
International Food Policy Research Institute- South Asia Office
 
Bob Reedy - Nitrate in Texas Groundwater.pdf
Bob Reedy - Nitrate in Texas Groundwater.pdfBob Reedy - Nitrate in Texas Groundwater.pdf
Bob Reedy - Nitrate in Texas Groundwater.pdf
Texas Alliance of Groundwater Districts
 
Oedema_types_causes_pathophysiology.pptx
Oedema_types_causes_pathophysiology.pptxOedema_types_causes_pathophysiology.pptx
Oedema_types_causes_pathophysiology.pptx
muralinath2
 
aziz sancar nobel prize winner: from mardin to nobel
aziz sancar nobel prize winner: from mardin to nobelaziz sancar nobel prize winner: from mardin to nobel
aziz sancar nobel prize winner: from mardin to nobel
İsa Badur
 
Remote Sensing and Computational, Evolutionary, Supercomputing, and Intellige...
Remote Sensing and Computational, Evolutionary, Supercomputing, and Intellige...Remote Sensing and Computational, Evolutionary, Supercomputing, and Intellige...
Remote Sensing and Computational, Evolutionary, Supercomputing, and Intellige...
University of Maribor
 
Basics of crystallography, crystal systems, classes and different forms
Basics of crystallography, crystal systems, classes and different formsBasics of crystallography, crystal systems, classes and different forms
Basics of crystallography, crystal systems, classes and different forms
MaheshaNanjegowda
 
在线办理(salfor毕业证书)索尔福德大学毕业证毕业完成信一模一样
在线办理(salfor毕业证书)索尔福德大学毕业证毕业完成信一模一样在线办理(salfor毕业证书)索尔福德大学毕业证毕业完成信一模一样
在线办理(salfor毕业证书)索尔福德大学毕业证毕业完成信一模一样
vluwdy49
 
GBSN - Biochemistry (Unit 6) Chemistry of Proteins
GBSN - Biochemistry (Unit 6) Chemistry of ProteinsGBSN - Biochemistry (Unit 6) Chemistry of Proteins
GBSN - Biochemistry (Unit 6) Chemistry of Proteins
Areesha Ahmad
 
waterlessdyeingtechnolgyusing carbon dioxide chemicalspdf
waterlessdyeingtechnolgyusing carbon dioxide chemicalspdfwaterlessdyeingtechnolgyusing carbon dioxide chemicalspdf
waterlessdyeingtechnolgyusing carbon dioxide chemicalspdf
LengamoLAppostilic
 
The use of Nauplii and metanauplii artemia in aquaculture (brine shrimp).pptx
The use of Nauplii and metanauplii artemia in aquaculture (brine shrimp).pptxThe use of Nauplii and metanauplii artemia in aquaculture (brine shrimp).pptx
The use of Nauplii and metanauplii artemia in aquaculture (brine shrimp).pptx
MAGOTI ERNEST
 
EWOCS-I: The catalog of X-ray sources in Westerlund 1 from the Extended Weste...
EWOCS-I: The catalog of X-ray sources in Westerlund 1 from the Extended Weste...EWOCS-I: The catalog of X-ray sources in Westerlund 1 from the Extended Weste...
EWOCS-I: The catalog of X-ray sources in Westerlund 1 from the Extended Weste...
Sérgio Sacani
 
SAR of Medicinal Chemistry 1st by dk.pdf
SAR of Medicinal Chemistry 1st by dk.pdfSAR of Medicinal Chemistry 1st by dk.pdf
SAR of Medicinal Chemistry 1st by dk.pdf
KrushnaDarade1
 

Recently uploaded (20)

Authoring a personal GPT for your research and practice: How we created the Q...
Authoring a personal GPT for your research and practice: How we created the Q...Authoring a personal GPT for your research and practice: How we created the Q...
Authoring a personal GPT for your research and practice: How we created the Q...
 
Sharlene Leurig - Enabling Onsite Water Use with Net Zero Water
Sharlene Leurig - Enabling Onsite Water Use with Net Zero WaterSharlene Leurig - Enabling Onsite Water Use with Net Zero Water
Sharlene Leurig - Enabling Onsite Water Use with Net Zero Water
 
Eukaryotic Transcription Presentation.pptx
Eukaryotic Transcription Presentation.pptxEukaryotic Transcription Presentation.pptx
Eukaryotic Transcription Presentation.pptx
 
Medical Orthopedic PowerPoint Templates.pptx
Medical Orthopedic PowerPoint Templates.pptxMedical Orthopedic PowerPoint Templates.pptx
Medical Orthopedic PowerPoint Templates.pptx
 
ESR spectroscopy in liquid food and beverages.pptx
ESR spectroscopy in liquid food and beverages.pptxESR spectroscopy in liquid food and beverages.pptx
ESR spectroscopy in liquid food and beverages.pptx
 
Randomised Optimisation Algorithms in DAPHNE
Randomised Optimisation Algorithms in DAPHNERandomised Optimisation Algorithms in DAPHNE
Randomised Optimisation Algorithms in DAPHNE
 
Unlocking the mysteries of reproduction: Exploring fecundity and gonadosomati...
Unlocking the mysteries of reproduction: Exploring fecundity and gonadosomati...Unlocking the mysteries of reproduction: Exploring fecundity and gonadosomati...
Unlocking the mysteries of reproduction: Exploring fecundity and gonadosomati...
 
Equivariant neural networks and representation theory
Equivariant neural networks and representation theoryEquivariant neural networks and representation theory
Equivariant neural networks and representation theory
 
Direct Seeded Rice - Climate Smart Agriculture
Direct Seeded Rice - Climate Smart AgricultureDirect Seeded Rice - Climate Smart Agriculture
Direct Seeded Rice - Climate Smart Agriculture
 
Bob Reedy - Nitrate in Texas Groundwater.pdf
Bob Reedy - Nitrate in Texas Groundwater.pdfBob Reedy - Nitrate in Texas Groundwater.pdf
Bob Reedy - Nitrate in Texas Groundwater.pdf
 
Oedema_types_causes_pathophysiology.pptx
Oedema_types_causes_pathophysiology.pptxOedema_types_causes_pathophysiology.pptx
Oedema_types_causes_pathophysiology.pptx
 
aziz sancar nobel prize winner: from mardin to nobel
aziz sancar nobel prize winner: from mardin to nobelaziz sancar nobel prize winner: from mardin to nobel
aziz sancar nobel prize winner: from mardin to nobel
 
Remote Sensing and Computational, Evolutionary, Supercomputing, and Intellige...
Remote Sensing and Computational, Evolutionary, Supercomputing, and Intellige...Remote Sensing and Computational, Evolutionary, Supercomputing, and Intellige...
Remote Sensing and Computational, Evolutionary, Supercomputing, and Intellige...
 
Basics of crystallography, crystal systems, classes and different forms
Basics of crystallography, crystal systems, classes and different formsBasics of crystallography, crystal systems, classes and different forms
Basics of crystallography, crystal systems, classes and different forms
 
在线办理(salfor毕业证书)索尔福德大学毕业证毕业完成信一模一样
在线办理(salfor毕业证书)索尔福德大学毕业证毕业完成信一模一样在线办理(salfor毕业证书)索尔福德大学毕业证毕业完成信一模一样
在线办理(salfor毕业证书)索尔福德大学毕业证毕业完成信一模一样
 
GBSN - Biochemistry (Unit 6) Chemistry of Proteins
GBSN - Biochemistry (Unit 6) Chemistry of ProteinsGBSN - Biochemistry (Unit 6) Chemistry of Proteins
GBSN - Biochemistry (Unit 6) Chemistry of Proteins
 
waterlessdyeingtechnolgyusing carbon dioxide chemicalspdf
waterlessdyeingtechnolgyusing carbon dioxide chemicalspdfwaterlessdyeingtechnolgyusing carbon dioxide chemicalspdf
waterlessdyeingtechnolgyusing carbon dioxide chemicalspdf
 
The use of Nauplii and metanauplii artemia in aquaculture (brine shrimp).pptx
The use of Nauplii and metanauplii artemia in aquaculture (brine shrimp).pptxThe use of Nauplii and metanauplii artemia in aquaculture (brine shrimp).pptx
The use of Nauplii and metanauplii artemia in aquaculture (brine shrimp).pptx
 
EWOCS-I: The catalog of X-ray sources in Westerlund 1 from the Extended Weste...
EWOCS-I: The catalog of X-ray sources in Westerlund 1 from the Extended Weste...EWOCS-I: The catalog of X-ray sources in Westerlund 1 from the Extended Weste...
EWOCS-I: The catalog of X-ray sources in Westerlund 1 from the Extended Weste...
 
SAR of Medicinal Chemistry 1st by dk.pdf
SAR of Medicinal Chemistry 1st by dk.pdfSAR of Medicinal Chemistry 1st by dk.pdf
SAR of Medicinal Chemistry 1st by dk.pdf
 

Instance Matching Benchmarks in the ERA of Linked Data - ISWC2017

  • 1. Instance Matching Benchmarks in the Era of Linked Data Tzanina Saveta, Evangelia Daskalaki, Irini Fundulaki,Giorgos Flouris Institute of Computer Science – FORTH , Greece Benchmarking Link Discovery Systems for Geo-Spatial Data 1
  • 2. Linked Data –The LOD Cloud 27-Oct-17 Benchmarking Link Discovery Systems for Geo-Spatial Data 2*Adapted from Suchanek & Weikum tutorial@SIGMOD 2013 Same entity can be described in different sources
  • 3. Instance Matching: the cornerstone for Linked Data Benchmarking Link Discovery Systems for Geo-Spatial Data 3 Data Acquisition Data Integration Open/Social data Problem: How can we automatically recognize multiple mentions of the same entity across or within sources? Data Evolution Solution: Instance Matching
  • 4. Instance Matching in the LOD 27-Oct-17 Benchmarking Link Discovery Systems for Geo-Spatial Data 4 Sets RDF/OWL triples *Adapted from Suchanek & Weikum tutorial@SIGMOD 2013 Many sources to match Rich semantics Value Structure Logical variations
  • 5. Instance MatchingTechniques andTools • People interconnect their dataset with existing ones. – These links are often manually curated (or semi- automatically generated). • Size and number of datasets is huge, so it is vital to automatically detect additional links that is making the graph more dense • Instance matching research has led to the development of a number of systems. Benchmarking Link Discovery Systems for Geo-Spatial Data 5 1. How to compare these? 2. How can we assess their performance? 3. How can we push the systems to get better? The systems must be benchmarked!
  • 6. Ingredients of an Instance Matching Benchmark • It is organized in test cases each addressing different kind of requirements • Datasets – The raw material of the benchmarks.These are the source and the target dataset that will be matched together to find the instances that refer to the same real world entity • Gold Standard (GroundTruth / Reference Alignment) – The “correct answer sheet” used to judge the completeness and soundness of the instance matching algorithms • Metrics/Key Performance Indicators – The performance metric(s) that determine the systems behavior and performance Benchmarking Link Discovery Systems for Geo-Spatial Data 6
  • 7. Datasets Distinguish between Real and Synthetic datasets : – Real Datasets: • Realistic conditions for heterogeneity problems • Realistic distributions • Error prone Reference Alignment – Synthetic Datasets: • Fully controlled test conditions • Accurate Gold Standards • Unrealistic distributions Benchmarking Link Discovery Systems for Geo-Spatial Data 7
  • 8. DataVariations in Datasets Benchmarking Link Discovery Systems for Geo-Spatial Data 8 Variations Value - Name style abbreviation -Typographical errors - Change format (date/gender/number) - Synonym Change - Multilingualism Structural -Change property depth -Delete/Add property -Split property values -Transformation of object to data type property -Transformation of data to object type property Logical -Delete/Modify Class Assertions -Invert property assertions -Change property hierarchy -Assert disjoint classes
  • 9. Gold Standards and Metrics • Metrics/Key Performance Indicators – Precision/Recall/F-measure Benchmarking Link Discovery Systems for Geo-Spatial Data 9 Gold Standard Result Set False Negative (FN) True Positive (TP) False Positive (FP) Recall r =TP / (TP + FN) Precision p =TP / (TP + FP) F-measure f = 2 * p * r / (p + r)
  • 10. Design Criteria for Instance Matching Benchmarks Benchmarking Link Discovery Systems for Geo-Spatial Data 10 Systematic Procedure matching tasks are reproducible and the execution is comparable Availability related to the availability of the benchmark in time Quality precise evaluation rules and high quality ontologies Equity no system privileged during the evaluation process Dissemination how many systems have used this benchmark to be evaluated with Volume how many instances did the datasets contain Gold Standard existence of gold standard and it’s accuracy
  • 11. Benchmarking • Instance matching techniques have, until recently, been benchmarked in an ad-hoc way • There does not exist a standard way of benchmarking the performance of the systems, when it comes to Linked Data • On the other hand, Instance Matching benchmarks have been mainly driven forward by the Ontology Alignment Evaluation Initiative (OAEI) – organizes annual campaign for ontology matching since 2005 – hosts independent benchmarks • In 2009, OAEI introduced the Instance Matching (IM)Track – focuses on the evaluation of different instance matching techniques and tools for Linked Data Benchmarking Link Discovery Systems for Geo-Spatial Data 11
  • 12. Benchmark Frameworks • Benchmark Generators • Synthetic Benchmarks • Real Benchmarks Benchmarking Link Discovery Systems for Geo-Spatial Data 12 SWING [FMN+11] SPIMBENCH [SDF+15] LANCE [SDFF+15] OAEI IIMB 2009 [EFH+09] OAEI IIMB 2010 [EFM+10] OAEI Persons- Restaurants 2010 [EFM+10] ONTOBI 2010 [Z10] OAEI IIMB 2011 [EHH+11] Sandbox 2012 [AEE+12] OAEI IIMB 2012 [AEE+12] OAEI RDFT 2013 [GDE+13] ID-REC Task 2014 [DEE14] SPIMBENCH 2015 [CDE+15] Author Task 2015 [CDE+15] ARS (OAEI 2009) [EFH+09] DI (OAEI 2010) [EFM+10] DI-NYT (OAEI 2011) [EHH+11]
  • 13. Comparison of synthetic Benchmarks Benchmarking Link Discovery Systems for Geo-Spatial Data 13
  • 14. Comparison of real benchmarks Benchmarking Link Discovery Systems for Geo-Spatial Data 14
  • 15. Wrapping Up Benchmarking Link Discovery Systems for Geo-Spatial Data 15 Which benchmarks included multilingual datasets? OAEI RDFT 2013 (French- English) ID-REC 2014 (English- Italian) Author Task (English – Italian)
  • 16. Wrapping Up: Benchmarks Benchmarking Link Discovery Systems for Geo-Spatial Data 16 Which benchmarks included value variations into the test cases? OAEI IIMB 2009 OAEI IIMB 2010 OAEI Persons- Restaurants 2010 ONTOBI OAEI IIMB 2011 Sandbox 2012 OAEI IIMB 2012 OAEI RDFT 2013 ID-REC 2014 SPIMBENCH 2015 Author Task 2015 ARS DI 2010 DI 2011
  • 17. Wrapping up: Benchmarks Which benchmarks included structural variations into the test cases? OAEI IIMB 2009 OAEI IIMB 2010 OAEI Persons- Restaurants 2010 ONTOBI OAEI IIMB 2011 OAEI IIMB 2012 OAEI RDFT 2013 ID-REC 2014 SPIMBENCH 2015 Author Task 2015 ARS DI 2010 DI 2011
  • 18. Wrapping up: Benchmarks Which benchmarks included logical variations into the test cases? OAEI IIMB 2009 OAEI IIMB 2010 OAEI IIMB 2011 OAEI IIMB 2012 SPIMBENCH 2015
  • 19. Wrapping up: Benchmarks Which benchmarks included combination of the variations into the test cases? IIMB 2009 IIMB 2010 IIMB 2011 IIMB 2012 RDFT 2013 ID-REC 2014 SPIMBENCH 2015 Author Task 2015
  • 20. Wrapping up: Benchmarks Which benchmarks are more voluminous? SPIMBENCH 2015 ARS DI 2011
  • 21. Wrapping up: Benchmarks Which benchmarks included both combination of the variations and was voluminous at the same time? SPIMBENCH 2015
  • 22. Open Issues • Issue 1: Only one benchmark that tackles both, combination of variations and scalability issues • Issue 2 : Not enough IM benchmark using the full expressiveness of RDF/OWL language
  • 23. Wrapping Up: Systems for Benchmarks Outcomes as far as systems are concerned: • Systems can handle the value variations, the structural variation, and the simple logical variations separately • More work needed for complex variations (combination of value, structural, and logical) • More work needed for structural variations • Enhancement of systems to cope with the clustering of the mappings (1-n mappings)
  • 24. Conclusion • Many instance matching benchmarks have been proposed • Each of them answering to some of the needs of instance matching systems • It is time now to start creating benchmarks that will “show the way to the future” and make them better
  • 25. 25 This work was supported by grands from the EU H2020 Framework Programme provided for the project HOBBIT (GA no. 688227).