SPIMBENCH:
A Scalable, Schema-Aware
Instance Matching Benchmark
for the Semantic Publishing Domain
T. Saveta¹, E. Daskalaki¹, G. Flouris¹, I. Fundulaki¹,
M. Herschel², A.-C. Ngonga Ngomo³
¹ FORTH-ICS, ² University of Stuttgart, ³ University of Leipzig
Semantic Publishing Instance Matching Benchmark (SPIMBENCH) 2
Instance Matching in Linked Data
Data acquisition
Data evolution
Data integration
Open/social data
How can we automatically recognize
multiple mentions of the same entity
across or within sources?
= Instance Matching
Benchmarking
Instance matching research has led to the development of
various systems and algorithms.
How to compare these?
How can we assess their performance?
How can we push the systems to get better?
These systems need to be benchmarked
SPIMBENCH
• Based on the Semantic Publishing Benchmark (SPB) of the Linked
Data Benchmark Council (LDBC)
• Synthetic benchmark for the Semantic Publishing Domain
• Value-based, structure-based and semantics-aware
transformations [FMN+11, FLM08]
• Deterministic, scalable data generation on the order of
billions of triples
• Weighted gold standard
Instance Matching Benchmark Ingredients [FLM08]
A benchmark is made up of:
• Datasets
• Gold Standard
• Test Cases
• Metrics
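To make the role of the gold standard and the metrics concrete, here is a minimal sketch of scoring a matcher's output against a gold standard. The slide does not say which metrics are used; precision, recall and F-measure are assumed here because they are the customary choice for matching tasks. The function name `evaluate` and the `src:`/`tgt:` identifiers are hypothetical.

```python
def evaluate(found, gold):
    """Return (precision, recall, f1) for predicted matches against a gold standard.

    Both arguments are collections of (source_uri, target_uri) pairs.
    """
    found, gold = set(found), set(gold)
    true_positives = len(found & gold)
    precision = true_positives / len(found) if found else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical gold standard with four matches; the matcher finds two of them
# and reports one wrong pair.
gold = {
    ("src:a", "tgt:a"),
    ("src:b", "tgt:b"),
    ("src:c", "tgt:c"),
    ("src:d", "tgt:d"),
}
found = {("src:a", "tgt:a"), ("src:b", "tgt:b"), ("src:c", "tgt:x")}

precision, recall, f1 = evaluate(found, gold)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```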
SPIMBENCH Model
Value & Structure Based Transformations
Value: Mainly typographical errors and the use of
different data formats [FMN+11].
– Blank Character Addition/Deletion
– Random Character Addition/Deletion/Modification
– Token Addition/Deletion/Shuffle
– Multi-linguality (65 supported languages)
– Date Format
– Change Number
– Synonym/Antonym
– Abbreviation
– Stem of a Word
Structure: Changes that occur to the properties.
– Property Addition/Deletion
– Property Aggregation/Extraction
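A few of the transformations listed above can be sketched as small functions. This is an illustration only, not the SPIMBENCH generator: the function names, the dictionary representation of an entity and the property names are all assumptions. A seeded `random.Random` is used so that the same seed always yields the same transformed data, in the spirit of the deterministic generation mentioned earlier.

```python
import random


def add_blank(value, rng):
    """Value-based: insert one blank character at a random position."""
    pos = rng.randrange(len(value) + 1)
    return value[:pos] + " " + value[pos:]


def modify_char(value, rng):
    """Value-based: overwrite one random character with a random lowercase letter."""
    if not value:
        return value
    pos = rng.randrange(len(value))
    return value[:pos] + rng.choice("abcdefghijklmnopqrstuvwxyz") + value[pos + 1:]


def shuffle_tokens(value, rng):
    """Value-based: reorder the whitespace-separated tokens."""
    tokens = value.split()
    rng.shuffle(tokens)
    return " ".join(tokens)


def delete_property(entity, prop):
    """Structure-based: drop one property from the entity."""
    return {k: v for k, v in entity.items() if k != prop}


def aggregate_properties(entity, props, new_prop):
    """Structure-based: merge several properties into a single new one."""
    merged = " ".join(str(entity[p]) for p in props if p in entity)
    result = {k: v for k, v in entity.items() if k not in props}
    result[new_prop] = merged
    return result


# Hypothetical entity; property names are made up for the example.
entity = {
    "title": "Semantic Publishing Benchmark",
    "firstName": "Ada",
    "lastName": "Lovelace",
}

rng = random.Random(42)  # fixed seed: reruns produce identical output
with_blank = add_blank(entity["title"], rng)
with_typo = modify_char(entity["title"], rng)
shuffled = shuffle_tokens(entity["title"], rng)
aggregated = aggregate_properties(entity, ["firstName", "lastName"], "fullName")
without_title = delete_property(entity, "title")
```

Property extraction would be the inverse of `aggregate_properties`: splitting one value back into several properties.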
Semantics-Aware Transformations
Test whether matching systems consider schema information to discover
instance matches.
• Instance (in)equality constructs
– owl:sameAs, owl:differentFrom
• Equivalence of classes and properties
– owl:equivalentClass, owl:equivalentProperty
• Disjointness of classes and properties
– owl:disjointWith, owl:propertyDisjointWith
• RDFS hierarchies
– rdfs:subClassOf, rdfs:subPropertyOf
• Property constraints
– owl:FunctionalProperty, owl:InverseFunctionalProperty
• Complex class definitions
– owl:unionOf, owl:intersectionOf
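Two of these constructs lend themselves to a short illustration of how schema information can drive matching. The sketch below is an assumption-laden toy, not part of SPIMBENCH or of any matcher: triples are plain tuples, and the `ex:` names are hypothetical. It uses the standard OWL reading that two subjects sharing a value of an inverse functional property denote the same individual, and that instances of disjoint classes cannot be the same individual.

```python
from itertools import combinations


def infer_same_as(triples, inverse_functional):
    """Return pairs of subjects that share a value of an inverse functional property."""
    subjects_by_key = {}
    for s, p, o in triples:
        if p in inverse_functional:
            subjects_by_key.setdefault((p, o), set()).add(s)
    pairs = set()
    for subjects in subjects_by_key.values():
        for a, b in combinations(sorted(subjects), 2):
            pairs.add((a, b))
    return pairs


def violates_disjointness(a, b, types, disjoint):
    """True if a and b have types declared disjoint, so they cannot match."""
    return any(
        frozenset((ta, tb)) in disjoint
        for ta in types.get(a, ())
        for tb in types.get(b, ())
    )


# Hypothetical data: ex:isbn is assumed to be an owl:InverseFunctionalProperty,
# and ex:Book is assumed to be owl:disjointWith ex:Person.
triples = [
    ("ex:work1", "ex:isbn", "978-0"),
    ("ex:work2", "ex:isbn", "978-0"),
    ("ex:work3", "ex:isbn", "978-1"),
]
types = {
    "ex:work1": {"ex:Book"},
    "ex:work2": {"ex:Book"},
    "ex:work3": {"ex:Person"},
}
disjoint = {frozenset(("ex:Book", "ex:Person"))}

candidates = infer_same_as(triples, {"ex:isbn"})
ruled_out = violates_disjointness("ex:work1", "ex:work3", types, disjoint)
```

A matcher that ignores the schema would have to rely on value similarity alone for both decisions.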
SPIMBENCH Model
Weighted Gold Standard
• Detailed gold standard (GS): kept for debugging
• Final GS: contains only the URIs that we consider a match,
together with their similarity
[Figure: gold standard schema. Classes: spimbench:Match, owl:Thing, and spimbench:Transformation with spimbench:ValueTransf, spimbench:StructureTransf and spimbench:SemanticsAwareTransf; individual transformations spimbench:VT1 … spimbench:VTi, spimbench:ST1 … spimbench:STi, spimbench:SAT1 … spimbench:SATi. Properties: spimbench:source, spimbench:target, spimbench:weight (xsd:string), spimbench:onProperty (rdf:Property), spimbench:transformation. Edge types in the legend: rdfs:subPropertyOf, rdfs:subClassOf, rdf:type.]
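The relation between the detailed and the final gold standard can be sketched as a simple projection. The field names below mirror the property names shown on the slide; everything else is an assumption made for illustration: the class and function names, the `src:`/`tgt:`/`ex:` identifiers, the weight values, and the premise that all detailed entries for one pair carry the same weight. How SPIMBENCH actually computes the weight is not stated here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DetailedMatch:
    """One entry of the detailed gold standard (one applied transformation)."""
    source: str
    target: str
    weight: str          # the slide gives xsd:string as the value type
    transformation: str  # e.g. spimbench:VT1, spimbench:ST1, spimbench:SAT1
    on_property: str     # the property the transformation was applied to


def final_gold_standard(detailed):
    """Project detailed entries to unique (source, target, weight) triples."""
    final = {}
    for match in detailed:
        # Assumption: every entry for a given pair carries the same weight.
        final[(match.source, match.target)] = match.weight
    return [(s, t, w) for (s, t), w in sorted(final.items())]


# Hypothetical detailed gold standard: the first pair was produced by two
# transformations, so it appears twice in the detailed view.
detailed = [
    DetailedMatch("src:e1", "tgt:e1", "0.8", "spimbench:VT1", "ex:title"),
    DetailedMatch("src:e1", "tgt:e1", "0.8", "spimbench:ST1", "ex:author"),
    DetailedMatch("src:e2", "tgt:e2", "0.6", "spimbench:SAT1", "ex:type"),
]

final = final_gold_standard(detailed)
```

The detailed view keeps which transformation touched which property, which is what makes it useful for debugging a matcher's misses.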
Scalability Experiments (1/2)
• Scalability experiments for datasets of up to 500M triples
• 1000 triples ~ 36 entities
• Data generation together with data transformation is linear in the
number of triples
• Transformation overhead is negligible for value-based, structure-based,
semantics-aware and simple combinations
• Overhead for complex combinations is one order of magnitude higher
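As a quick worked example of the ratio above, the number of entities in a dataset can be estimated from its triple count. The function name is hypothetical and the result is only as precise as the approximate "1000 triples ~ 36 entities" figure.

```python
TRIPLES_PER_BATCH = 1000
ENTITIES_PER_BATCH = 36  # approximate ratio quoted on the slide


def estimate_entities(triples):
    """Estimate the number of entities for a given number of triples."""
    return triples * ENTITIES_PER_BATCH // TRIPLES_PER_BATCH


# Largest dataset in the scalability experiments: 500M triples.
largest = estimate_entities(500_000_000)
print(f"{largest:,} entities")
```

For the 500M-triple dataset this gives roughly 18 million entities.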
Scalability Experiments (2/2)
Performance of LogMap [JG11]
[Figures: performance of LogMap for 10K, 25K and 50K triples]
Conclusions
• Schema-aware variations
– Complex class definitions
– Property constraints
– Equivalence, Disjointness, etc.
• Combination of transformations
• Scalable data generation on the order of billions of triples
– Uses sampling
• Weighted gold standard
– Final gold standard
– Detailed gold standard for debugging reasons
Future Work
• SPIMBENCH will be used as one of the Ontology
Alignment Evaluation Initiative [OAEI]
benchmarks for 2015.
• Domain-independent instance matching test
case generator.
• Definition of more sophisticated metrics that
take into account the
difficulty (weight) of each match.
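One possible shape for such a metric, offered purely as a hypothetical sketch, is a recall in which each gold-standard match counts in proportion to its difficulty. The deck does not define this metric. The assumptions here are that the gold-standard weight is a similarity in [0, 1], that difficulty is taken as 1 minus that similarity, and that the function name and identifiers are made up.

```python
def weighted_recall(found, gold_similarity):
    """Recall in which harder gold matches contribute more to the score.

    found: set of (source_uri, target_uri) pairs reported by a matcher.
    gold_similarity: dict mapping each gold pair to a similarity in [0, 1].
    """
    # Assumption: the less similar a matched pair is, the harder it is to find.
    difficulty = {pair: 1.0 - sim for pair, sim in gold_similarity.items()}
    total = sum(difficulty.values())
    if total == 0:
        return 0.0
    hit = sum(d for pair, d in difficulty.items() if pair in found)
    return hit / total


# Hypothetical gold standard: one easy, one medium and one hard match.
gold_similarity = {
    ("src:a", "tgt:a"): 0.9,
    ("src:b", "tgt:b"): 0.5,
    ("src:c", "tgt:c"): 0.2,
}
found = {("src:a", "tgt:a"), ("src:b", "tgt:b")}

score = weighted_recall(found, gold_similarity)
plain_recall = len(found & set(gold_similarity)) / len(gold_similarity)
```

In this example the matcher misses only the hardest pair, so its weighted recall comes out lower than its plain recall.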
Acknowledgments
This work was partially supported by the ongoing FP7
European Project LDBC (Linked Data Benchmark Council)
(317548) and was done in collaboration with I. Fundulaki,
M. Herschel (University of Stuttgart), G. Flouris,
E. Daskalaki and A.-C. Ngonga Ngomo (University of
Leipzig).
References
[FLM08] A. Ferrara, D. Lorusso, S. Montanelli and G. Varese. Towards a Benchmark for Instance Matching. In OM, 2008.
[FMN+11] A. Ferrara, S. Montanelli, J. Noessner and H. Stuckenschmidt. Benchmarking Matching Applications on the Semantic Web. In ESWC, 2011.
[NV13] M. Nickel and V. Tresp. Tensor Factorization for Multi-relational Learning. Machine Learning and Knowledge Discovery in Databases. Springer Berlin Heidelberg, 2013. 617-621.
[J11] J. M. Joyce. Kullback-Leibler Divergence. International Encyclopedia of Statistical Science. Springer Berlin Heidelberg, 2011. 720-722.
[JG11] E. Jimenez-Ruiz and B. C. Grau. LogMap: Logic-based and scalable ontology matching. In ISWC, 2011.
[FT04] B. Fuglede and F. Topsoe. Jensen-Shannon divergence and Hilbert space embedding. In IEEE International Symposium on Information Theory, 2004.
[OAEI] Ontology Alignment Evaluation Initiative, available at http://oaei.ontologymatching.org/
Thank you!
Questions?

More Related Content

Viewers also liked

Abaques Lecko - Enseignements du benchmark 2013
Abaques Lecko - Enseignements du benchmark 2013 Abaques Lecko - Enseignements du benchmark 2013
Abaques Lecko - Enseignements du benchmark 2013 Lecko
 
Benchmark Sites Agences Com. Sensible
Benchmark Sites Agences Com. SensibleBenchmark Sites Agences Com. Sensible
Benchmark Sites Agences Com. SensibleBioforce
 
Rapport supref benchmarking 2016
Rapport supref benchmarking 2016Rapport supref benchmarking 2016
Rapport supref benchmarking 2016Amaury Baot
 

Viewers also liked (7)

Abaques Lecko - Enseignements du benchmark 2013
Abaques Lecko - Enseignements du benchmark 2013 Abaques Lecko - Enseignements du benchmark 2013
Abaques Lecko - Enseignements du benchmark 2013
 
Benchmark Sites Agences Com. Sensible
Benchmark Sites Agences Com. SensibleBenchmark Sites Agences Com. Sensible
Benchmark Sites Agences Com. Sensible
 
Rapport supref benchmarking 2016
Rapport supref benchmarking 2016Rapport supref benchmarking 2016
Rapport supref benchmarking 2016
 
Stratégie Marketing Adidas
Stratégie Marketing Adidas Stratégie Marketing Adidas
Stratégie Marketing Adidas
 
Denotation connotation
Denotation connotationDenotation connotation
Denotation connotation
 
Benchmark football
Benchmark footballBenchmark football
Benchmark football
 
Marketing mix
Marketing mixMarketing mix
Marketing mix
 

Similar to SPIMBENCH: A Scalable, Schema-Aware Instance Matching Benchmark for the Semantic Publishing Domain

Deep Learning for Information Retrieval: Models, Progress, & Opportunities
Deep Learning for Information Retrieval: Models, Progress, & OpportunitiesDeep Learning for Information Retrieval: Models, Progress, & Opportunities
Deep Learning for Information Retrieval: Models, Progress, & OpportunitiesMatthew Lease
 
Link Discovery Tutorial Part III: Benchmarking for Instance Matching Systems
Link Discovery Tutorial Part III: Benchmarking for Instance Matching SystemsLink Discovery Tutorial Part III: Benchmarking for Instance Matching Systems
Link Discovery Tutorial Part III: Benchmarking for Instance Matching SystemsHolistic Benchmarking of Big Linked Data
 
Data Integration Ontology Mapping
Data Integration Ontology MappingData Integration Ontology Mapping
Data Integration Ontology MappingPradeep B Pillai
 
Ontologies and semantic web
Ontologies and semantic webOntologies and semantic web
Ontologies and semantic webStanley Wang
 
Knowledge Representation on the Web
Knowledge Representation on the WebKnowledge Representation on the Web
Knowledge Representation on the WebRinke Hoekstra
 
ISWC 2014 Tutorial - Instance Matching Benchmarks for Linked Data
ISWC 2014 Tutorial - Instance Matching Benchmarks for Linked DataISWC 2014 Tutorial - Instance Matching Benchmarks for Linked Data
ISWC 2014 Tutorial - Instance Matching Benchmarks for Linked DataEvangelia Daskalaki
 
Sergey Nikolenko and Anton Alekseev User Profiling in Text-Based Recommende...
Sergey Nikolenko and  Anton Alekseev  User Profiling in Text-Based Recommende...Sergey Nikolenko and  Anton Alekseev  User Profiling in Text-Based Recommende...
Sergey Nikolenko and Anton Alekseev User Profiling in Text-Based Recommende...AIST
 
The Rhetoric of Research Objects
The Rhetoric of Research ObjectsThe Rhetoric of Research Objects
The Rhetoric of Research ObjectsCarole Goble
 
Profile-based Dataset Recommendation for RDF Data Linking
Profile-based Dataset Recommendation for RDF Data Linking  Profile-based Dataset Recommendation for RDF Data Linking
Profile-based Dataset Recommendation for RDF Data Linking Mohamed BEN ELLEFI
 
SemFacet paper
SemFacet paperSemFacet paper
SemFacet paperDBOnto
 
Sem facet paper
Sem facet paperSem facet paper
Sem facet paperDBOnto
 
Semantics as a service at EMBL-EBI
Semantics as a service at EMBL-EBISemantics as a service at EMBL-EBI
Semantics as a service at EMBL-EBISimon Jupp
 
BioSamples Database Linked Data, SWAT4LS Tutorial
BioSamples Database Linked Data, SWAT4LS TutorialBioSamples Database Linked Data, SWAT4LS Tutorial
BioSamples Database Linked Data, SWAT4LS TutorialRothamsted Research, UK
 
A Framework for Ontology Usage Analysis
A Framework for Ontology Usage AnalysisA Framework for Ontology Usage Analysis
A Framework for Ontology Usage AnalysisJamshaid Ashraf
 
How well does your Instance Matching system perform? Experimental evaluation ...
How well does your Instance Matching system perform? Experimental evaluation ...How well does your Instance Matching system perform? Experimental evaluation ...
How well does your Instance Matching system perform? Experimental evaluation ...Holistic Benchmarking of Big Linked Data
 
Effective Semantics for Engineering NLP Systems
Effective Semantics for Engineering NLP SystemsEffective Semantics for Engineering NLP Systems
Effective Semantics for Engineering NLP SystemsAndre Freitas
 

Similar to SPIMBENCH: A Scalable, Schema-Aware Instance Matching Benchmark for the Semantic Publishing Domain (20)

Instance Matching Benchmarks in the ERA of Linked Data - ISWC2017
Instance Matching Benchmarks in the ERA of Linked Data - ISWC2017Instance Matching Benchmarks in the ERA of Linked Data - ISWC2017
Instance Matching Benchmarks in the ERA of Linked Data - ISWC2017
 
Deep Learning for Information Retrieval: Models, Progress, & Opportunities
Deep Learning for Information Retrieval: Models, Progress, & OpportunitiesDeep Learning for Information Retrieval: Models, Progress, & Opportunities
Deep Learning for Information Retrieval: Models, Progress, & Opportunities
 
Link Discovery Tutorial Part III: Benchmarking for Instance Matching Systems
Link Discovery Tutorial Part III: Benchmarking for Instance Matching SystemsLink Discovery Tutorial Part III: Benchmarking for Instance Matching Systems
Link Discovery Tutorial Part III: Benchmarking for Instance Matching Systems
 
Presentation at MTSR 2012
Presentation at MTSR 2012Presentation at MTSR 2012
Presentation at MTSR 2012
 
Data Integration Ontology Mapping
Data Integration Ontology MappingData Integration Ontology Mapping
Data Integration Ontology Mapping
 
Ontologies and semantic web
Ontologies and semantic webOntologies and semantic web
Ontologies and semantic web
 
Knowledge Representation on the Web
Knowledge Representation on the WebKnowledge Representation on the Web
Knowledge Representation on the Web
 
ISWC 2014 Tutorial - Instance Matching Benchmarks for Linked Data
ISWC 2014 Tutorial - Instance Matching Benchmarks for Linked DataISWC 2014 Tutorial - Instance Matching Benchmarks for Linked Data
ISWC 2014 Tutorial - Instance Matching Benchmarks for Linked Data
 
Sergey Nikolenko and Anton Alekseev User Profiling in Text-Based Recommende...
Sergey Nikolenko and  Anton Alekseev  User Profiling in Text-Based Recommende...Sergey Nikolenko and  Anton Alekseev  User Profiling in Text-Based Recommende...
Sergey Nikolenko and Anton Alekseev User Profiling in Text-Based Recommende...
 
The Rhetoric of Research Objects
The Rhetoric of Research ObjectsThe Rhetoric of Research Objects
The Rhetoric of Research Objects
 
Profile-based Dataset Recommendation for RDF Data Linking
Profile-based Dataset Recommendation for RDF Data Linking  Profile-based Dataset Recommendation for RDF Data Linking
Profile-based Dataset Recommendation for RDF Data Linking
 
SemFacet paper
SemFacet paperSemFacet paper
SemFacet paper
 
Sem facet paper
Sem facet paperSem facet paper
Sem facet paper
 
ACL-IJCNLP 2015
ACL-IJCNLP 2015ACL-IJCNLP 2015
ACL-IJCNLP 2015
 
Semantics as a service at EMBL-EBI
Semantics as a service at EMBL-EBISemantics as a service at EMBL-EBI
Semantics as a service at EMBL-EBI
 
BioSamples Database Linked Data, SWAT4LS Tutorial
BioSamples Database Linked Data, SWAT4LS TutorialBioSamples Database Linked Data, SWAT4LS Tutorial
BioSamples Database Linked Data, SWAT4LS Tutorial
 
A Framework for Ontology Usage Analysis
A Framework for Ontology Usage AnalysisA Framework for Ontology Usage Analysis
A Framework for Ontology Usage Analysis
 
How well does your Instance Matching system perform? Experimental evaluation ...
How well does your Instance Matching system perform? Experimental evaluation ...How well does your Instance Matching system perform? Experimental evaluation ...
How well does your Instance Matching system perform? Experimental evaluation ...
 
Effective Semantics for Engineering NLP Systems
Effective Semantics for Engineering NLP SystemsEffective Semantics for Engineering NLP Systems
Effective Semantics for Engineering NLP Systems
 
20110728 datalift-rpi-troy
20110728 datalift-rpi-troy20110728 datalift-rpi-troy
20110728 datalift-rpi-troy
 

More from Graph-TA

Computing on Event-sourced Graphs
Computing on Event-sourced GraphsComputing on Event-sourced Graphs
Computing on Event-sourced GraphsGraph-TA
 
Using Evolutionary Computing for Feature-driven Graph generation
Using Evolutionary Computing for Feature-driven Graph generationUsing Evolutionary Computing for Feature-driven Graph generation
Using Evolutionary Computing for Feature-driven Graph generationGraph-TA
 
Reactive Databases for Big Data applications
Reactive Databases for Big Data applicationsReactive Databases for Big Data applications
Reactive Databases for Big Data applicationsGraph-TA
 
The scarcity of crossing dependencies: a direct outcome of a specific constra...
The scarcity of crossing dependencies: a direct outcome of a specific constra...The scarcity of crossing dependencies: a direct outcome of a specific constra...
The scarcity of crossing dependencies: a direct outcome of a specific constra...Graph-TA
 
Holistic Benchmarking of Big Linked Data: HOBBIT
Holistic Benchmarking of Big Linked Data: HOBBITHolistic Benchmarking of Big Linked Data: HOBBIT
Holistic Benchmarking of Big Linked Data: HOBBITGraph-TA
 
Identifiability in Dynamic Casual Networks
Identifiability in Dynamic Casual NetworksIdentifiability in Dynamic Casual Networks
Identifiability in Dynamic Casual NetworksGraph-TA
 
Polyglot Graph Databases using OCL as pivot
Polyglot Graph Databases using OCL as pivotPolyglot Graph Databases using OCL as pivot
Polyglot Graph Databases using OCL as pivotGraph-TA
 
Benchmarking Versioning for Big Linked Data
Benchmarking Versioning for Big Linked DataBenchmarking Versioning for Big Linked Data
Benchmarking Versioning for Big Linked DataGraph-TA
 
Synthetic Data Generation using exponential random Graph modeling
Synthetic Data Generation using exponential random Graph modelingSynthetic Data Generation using exponential random Graph modeling
Synthetic Data Generation using exponential random Graph modelingGraph-TA
 
Use of Graphs for Cloud Service Selection in Multi-Cloud Environments
Use of Graphs for Cloud Service Selection in Multi-Cloud EnvironmentsUse of Graphs for Cloud Service Selection in Multi-Cloud Environments
Use of Graphs for Cloud Service Selection in Multi-Cloud EnvironmentsGraph-TA
 
Graphalytics: A big data benchmark for graph-processing platforms
Graphalytics: A big data benchmark for graph-processing platformsGraphalytics: A big data benchmark for graph-processing platforms
Graphalytics: A big data benchmark for graph-processing platformsGraph-TA
 
Modelling the Clustering Coefficient of a Random graph
Modelling the Clustering Coefficient of a Random graphModelling the Clustering Coefficient of a Random graph
Modelling the Clustering Coefficient of a Random graphGraph-TA
 
RDF Graph Data Management in Oracle Database and NoSQL Platforms
RDF Graph Data Management in Oracle Database and NoSQL PlatformsRDF Graph Data Management in Oracle Database and NoSQL Platforms
RDF Graph Data Management in Oracle Database and NoSQL PlatformsGraph-TA
 
GRAPHITE — An Extensible Graph Traversal Framework for RDBMS
GRAPHITE — An Extensible Graph Traversal Framework for RDBMSGRAPHITE — An Extensible Graph Traversal Framework for RDBMS
GRAPHITE — An Extensible Graph Traversal Framework for RDBMSGraph-TA
 
On the Discovery of Novel Drug-Target Interactions from Dense SubGraphs
On the Discovery of Novel Drug-Target Interactions from Dense SubGraphsOn the Discovery of Novel Drug-Target Interactions from Dense SubGraphs
On the Discovery of Novel Drug-Target Interactions from Dense SubGraphsGraph-TA
 
Graphalytics: A big data benchmark for graph processing platforms
Graphalytics: A big data benchmark for graph processing platformsGraphalytics: A big data benchmark for graph processing platforms
Graphalytics: A big data benchmark for graph processing platformsGraph-TA
 
Autograph: an evolving lightweight graph tool
Autograph: an evolving lightweight graph toolAutograph: an evolving lightweight graph tool
Autograph: an evolving lightweight graph toolGraph-TA
 
Understanding Graph Structure in Knowledge Bases
Understanding Graph Structure in Knowledge BasesUnderstanding Graph Structure in Knowledge Bases
Understanding Graph Structure in Knowledge BasesGraph-TA
 
Finding patterns of chronic disease and medication prescriptions from a large...
Finding patterns of chronic disease and medication prescriptions from a large...Finding patterns of chronic disease and medication prescriptions from a large...
Finding patterns of chronic disease and medication prescriptions from a large...Graph-TA
 
Recent Updates on IBM System G — GraphBIG and Temporal Data
Recent Updates on IBM System G — GraphBIG and Temporal DataRecent Updates on IBM System G — GraphBIG and Temporal Data
Recent Updates on IBM System G — GraphBIG and Temporal DataGraph-TA
 

More from Graph-TA (20)

Computing on Event-sourced Graphs
Computing on Event-sourced GraphsComputing on Event-sourced Graphs
Computing on Event-sourced Graphs
 
Using Evolutionary Computing for Feature-driven Graph generation
Using Evolutionary Computing for Feature-driven Graph generationUsing Evolutionary Computing for Feature-driven Graph generation
Using Evolutionary Computing for Feature-driven Graph generation
 
Reactive Databases for Big Data applications
Reactive Databases for Big Data applicationsReactive Databases for Big Data applications
Reactive Databases for Big Data applications
 
The scarcity of crossing dependencies: a direct outcome of a specific constra...
The scarcity of crossing dependencies: a direct outcome of a specific constra...The scarcity of crossing dependencies: a direct outcome of a specific constra...
The scarcity of crossing dependencies: a direct outcome of a specific constra...
 
Holistic Benchmarking of Big Linked Data: HOBBIT
Holistic Benchmarking of Big Linked Data: HOBBITHolistic Benchmarking of Big Linked Data: HOBBIT
Holistic Benchmarking of Big Linked Data: HOBBIT
 
Identifiability in Dynamic Casual Networks
Identifiability in Dynamic Casual NetworksIdentifiability in Dynamic Casual Networks
Identifiability in Dynamic Casual Networks
 
Polyglot Graph Databases using OCL as pivot
Polyglot Graph Databases using OCL as pivotPolyglot Graph Databases using OCL as pivot
Polyglot Graph Databases using OCL as pivot
 
Benchmarking Versioning for Big Linked Data
Benchmarking Versioning for Big Linked DataBenchmarking Versioning for Big Linked Data
Benchmarking Versioning for Big Linked Data
 
Synthetic Data Generation using exponential random Graph modeling
Synthetic Data Generation using exponential random Graph modelingSynthetic Data Generation using exponential random Graph modeling
Synthetic Data Generation using exponential random Graph modeling
 
Use of Graphs for Cloud Service Selection in Multi-Cloud Environments
Use of Graphs for Cloud Service Selection in Multi-Cloud EnvironmentsUse of Graphs for Cloud Service Selection in Multi-Cloud Environments
Use of Graphs for Cloud Service Selection in Multi-Cloud Environments
 
Graphalytics: A big data benchmark for graph-processing platforms
Graphalytics: A big data benchmark for graph-processing platformsGraphalytics: A big data benchmark for graph-processing platforms
Graphalytics: A big data benchmark for graph-processing platforms
 
Modelling the Clustering Coefficient of a Random graph
Modelling the Clustering Coefficient of a Random graphModelling the Clustering Coefficient of a Random graph
Modelling the Clustering Coefficient of a Random graph
 
RDF Graph Data Management in Oracle Database and NoSQL Platforms
RDF Graph Data Management in Oracle Database and NoSQL PlatformsRDF Graph Data Management in Oracle Database and NoSQL Platforms
RDF Graph Data Management in Oracle Database and NoSQL Platforms
 
GRAPHITE — An Extensible Graph Traversal Framework for RDBMS
GRAPHITE — An Extensible Graph Traversal Framework for RDBMSGRAPHITE — An Extensible Graph Traversal Framework for RDBMS
GRAPHITE — An Extensible Graph Traversal Framework for RDBMS
 
On the Discovery of Novel Drug-Target Interactions from Dense SubGraphs
On the Discovery of Novel Drug-Target Interactions from Dense SubGraphsOn the Discovery of Novel Drug-Target Interactions from Dense SubGraphs
On the Discovery of Novel Drug-Target Interactions from Dense SubGraphs
 
Graphalytics: A big data benchmark for graph processing platforms
Graphalytics: A big data benchmark for graph processing platformsGraphalytics: A big data benchmark for graph processing platforms
Graphalytics: A big data benchmark for graph processing platforms
 
Autograph: an evolving lightweight graph tool
Autograph: an evolving lightweight graph toolAutograph: an evolving lightweight graph tool
Autograph: an evolving lightweight graph tool
 
Understanding Graph Structure in Knowledge Bases
Understanding Graph Structure in Knowledge BasesUnderstanding Graph Structure in Knowledge Bases
Understanding Graph Structure in Knowledge Bases
 
Finding patterns of chronic disease and medication prescriptions from a large...
Finding patterns of chronic disease and medication prescriptions from a large...Finding patterns of chronic disease and medication prescriptions from a large...
Finding patterns of chronic disease and medication prescriptions from a large...
 
Recent Updates on IBM System G — GraphBIG and Temporal Data
Recent Updates on IBM System G — GraphBIG and Temporal DataRecent Updates on IBM System G — GraphBIG and Temporal Data
Recent Updates on IBM System G — GraphBIG and Temporal Data
 

Recently uploaded

My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersThousandEyes
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsMemoori
 
Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptxLBM Solutions
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubKalema Edgar
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 
Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions
 
Build your next Gen AI Breakthrough - April 2024
Build your next Gen AI Breakthrough - April 2024Build your next Gen AI Breakthrough - April 2024
Build your next Gen AI Breakthrough - April 2024Neo4j
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Enterprise Knowledge
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitecturePixlogix Infotech
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):comworks
 
Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksSoftradix Technologies
 
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024BookNet Canada
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhisoniya singh
 

Recently uploaded (20)

My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial Buildings
 
Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptx
 

SPIMBENCH: A Scalable, Schema-Aware Instance Matching Benchmark for the Semantic Publishing Domain

  • 1. SPIMBENCH: A Scalable, Schema-Aware Instance Matching Benchmark for the Semantic Publishing Domain. T. Saveta¹, E. Daskalaki¹, G. Flouris¹, I. Fundulaki¹, M. Herschel², A.-C. Ngonga Ngomo³ (¹FORTH-ICS, ²University of Stuttgart, ³University of Leipzig)
  • 2. Instance Matching in Linked Data. Data acquisition, data evolution, data integration, open/social data: how can we automatically recognize multiple mentions of the same entity across or within sources? = Instance Matching
  • 3. Benchmarking. Instance matching research has led to the development of various systems and algorithms. How can we compare them? How can we assess their performance? How can we push the systems to get better? These systems need to be benchmarked.
  • 4. SPIMBENCH
      • Based on the Semantic Publishing Benchmark (SPB) of the Linked Data Benchmark Council (LDBC)
      • Synthetic benchmark for the semantic publishing domain
      • Value-based, structure-based and semantics-aware transformations [FMN+11, FLM08]
      • Deterministic, scalable data generation on the order of a billion triples
      • Weighted gold standard
  • 5. Instance Matching Benchmark Ingredients [FLM08]: datasets, gold standard, test cases, metrics
  • 6. SPIMBENCH Model [figure]
  • 7. Value & Structure Based Transformations
      • Value: mainly typographical errors and the use of different data formats [FMN+11]
          – Blank character addition/deletion
          – Random character addition/deletion/modification
          – Token addition/deletion/shuffle
          – Multi-linguality (65 supported languages)
          – Date format
          – Change number
          – Synonym/antonym
          – Abbreviation
          – Stem of a word
      • Structure: changes that occur to the properties
          – Property addition/deletion
          – Property aggregation/extraction
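A few of the value-based transformations listed above can be illustrated with a minimal Python sketch. This is not the SPIMBENCH implementation: the function names and the use of a seeded `random.Random` to keep the output reproducible are assumptions made for illustration only.

```python
import random


def delete_random_char(value: str, rng: random.Random) -> str:
    """Drop one character at a random position (simulates a typo)."""
    if not value:
        return value
    i = rng.randrange(len(value))
    return value[:i] + value[i + 1:]


def add_blank(value: str, rng: random.Random) -> str:
    """Insert one blank character at a random position."""
    i = rng.randrange(len(value) + 1)
    return value[:i] + " " + value[i:]


def shuffle_tokens(value: str, rng: random.Random) -> str:
    """Reorder the whitespace-separated tokens of the value."""
    tokens = value.split()
    rng.shuffle(tokens)
    return " ".join(tokens)


# Hypothetical source literal; the same seed always yields the same target value.
title = "Semantic Publishing Benchmark"
rng = random.Random(42)
for transformation in (delete_random_char, add_blank, shuffle_tokens):
    print(transformation.__name__, "->", repr(transformation(title, rng)))
```

Seeding the generator means that re-running the sketch with the same seed reproduces the same target values, which is the property a deterministic generator needs.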
  • 8. Semantics-Aware Transformations. Test whether matching systems consider schema information to discover instance matches.
      • Instance (in)equality constructs: owl:sameAs, owl:differentFrom
      • Equivalence of classes and properties: owl:equivalentClass, owl:equivalentProperty
      • Disjointness of classes and properties: owl:disjointWith, owl:propertyDisjointWith
      • RDFS hierarchies: rdfs:subClassOf, rdfs:subPropertyOf
      • Property constraints: owl:FunctionalProperty, owl:InverseFunctionalProperty
      • Complex class definitions: owl:unionOf, owl:intersectionOf
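As a rough illustration of one semantics-aware transformation, the sketch below retypes an instance with a class declared equivalent in the schema, so that only a matcher that reads the owl:equivalentClass axiom sees the two descriptions as compatible. Triples are plain Python tuples, and the `ex:` names and the function name are hypothetical; the slides do not show how SPIMBENCH applies this transformation.

```python
RDF_TYPE = "rdf:type"
OWL_EQUIVALENT_CLASS = "owl:equivalentClass"


def retype_with_equivalent_class(triples, schema):
    """Replace each rdf:type object with its declared equivalent class, if any."""
    equivalents = {s: o for s, p, o in schema if p == OWL_EQUIVALENT_CLASS}
    return [
        (s, p, equivalents.get(o, o)) if p == RDF_TYPE else (s, p, o)
        for s, p, o in triples
    ]


# Hypothetical schema axiom and source instance.
schema = [("ex:NewsItem", OWL_EQUIVALENT_CLASS, "ex:Article")]
source = [
    ("ex:cw1", RDF_TYPE, "ex:NewsItem"),
    ("ex:cw1", "ex:title", "Hello"),
]
target = retype_with_equivalent_class(source, schema)
print(target)
```

A matcher that compares only rdf:type values would treat `ex:NewsItem` and `ex:Article` as different classes; one that uses the schema axiom would not.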
  • 9. SPIMBENCH Model [figure]
  • 10. Weighted Gold Standard
      • Detailed GS for debugging reasons
      • Final GS: contains only URIs that we consider a match and their similarity
      • Gold standard schema [figure]. Terms shown: spimbench:Match, owl:Thing, spimbench:Transformation, spimbench:ValueTransf, spimbench:StructureTransf, spimbench:SemanticsAwareTransf, spimbench:VT1 … spimbench:VTi, spimbench:ST1 … spimbench:STi, spimbench:SAT1 … spimbench:SATi, spimbench:source, spimbench:target, spimbench:weight, spimbench:onProperty, spimbench:transformation, rdf:Property, xsd:string; edge types: rdfs:subPropertyOf, rdfs:subClassOf, rdf:type
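The relation between the detailed and the final gold standard can be sketched as a simple projection. The field names below mirror the spimbench:source, spimbench:target, spimbench:weight, spimbench:transformation and spimbench:onProperty terms from the slide, but the Python types (for example a float weight) and the `ex:` values are assumptions, not the benchmark's actual serialization.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DetailedMatch:
    """One entry of the detailed gold standard (assumed shape)."""
    source: str          # URI of the source instance
    target: str          # URI of the transformed target instance
    weight: float        # similarity score attached to the match
    transformation: str  # which transformation produced the target
    on_property: str     # property the transformation was applied to


def to_final(detailed):
    """Keep only the matched URIs and their weight, as in the final gold standard."""
    return [(m.source, m.target, m.weight) for m in detailed]


# Hypothetical entry for illustration.
detailed = [DetailedMatch("ex:s1", "ex:t1", 0.8, "spimbench:VT1", "ex:title")]
print(to_final(detailed))
```

Keeping the transformation and property in the detailed record is what makes it useful for debugging: a missed match can be traced back to the change that caused it.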
  • 11. Scalability Experiments (1/2)
      • Scalability experiments for datasets of up to 500M triples
      • 1000 triples ≈ 36 entities
      • Data generation together with data transformation is linear in the number of triples
      • Transformation overhead is negligible for value-based, structure-based, semantics-aware and simple combinations
      • Overhead for complex combinations is higher by one order of magnitude
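The stated ratio of roughly 36 entities per 1000 triples gives a quick way to estimate dataset sizes. The helper below is a back-of-the-envelope sketch (the function name is illustrative) and assumes the ratio holds uniformly at every scale.

```python
ENTITIES_PER_1000_TRIPLES = 36  # ratio reported on the slide


def estimate_entities(triples: int) -> int:
    """Estimate the number of entities for a dataset of the given triple count."""
    return triples * ENTITIES_PER_1000_TRIPLES // 1000


print(estimate_entities(500_000_000))  # largest dataset in the experiments
```

Under that assumption, the 500M-triple dataset corresponds to about 18 million entities.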
  • 12. Scalability Experiments (2/2) [figure]
  • 13. Performance of LogMap [JG11] [figures: performance of LogMap for 10K, 25K and 50K triples]
  • 14. Conclusions
      • Schema-aware variations
          – Complex class definitions
          – Property constraints
          – Equivalence, disjointness, etc.
      • Combination of transformations
      • Scalable data generation on the order of a billion triples
          – Uses sampling
      • Weighted gold standard
          – Final gold standard
          – Detailed gold standard for debugging reasons
  • 15. Future Work
      • SPIMBENCH will be used as one of the Ontology Alignment Evaluation Initiative [OAEI] benchmarks for 2015.
      • Domain-independent instance matching test case generator.
      • Definition of more sophisticated metrics that take into account the difficulty (weight).
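The slides leave the weighted metrics undefined, so the following is only one possible reading, not the authors' metric: standard precision and recall alongside a weighted recall in which each gold-standard match contributes its weight instead of a count of one. Treating the weight as a difficulty score, and all names below, are assumptions.

```python
def precision_recall(found, gold):
    """Standard precision and recall over sets of (source, target) pairs."""
    found, gold_pairs = set(found), set(gold)
    correct = found & gold_pairs
    precision = len(correct) / len(found) if found else 0.0
    recall = len(correct) / len(gold_pairs) if gold_pairs else 0.0
    return precision, recall


def weighted_recall(found, gold):
    """Share of the total gold-standard weight recovered by the system (assumed metric)."""
    found = set(found)
    total = sum(gold.values())
    hit = sum(weight for pair, weight in gold.items() if pair in found)
    return hit / total if total else 0.0


# Hypothetical gold standard: pair -> weight (higher means a harder match).
gold = {("ex:s1", "ex:t1"): 1.0, ("ex:s2", "ex:t2"): 3.0}
found = {("ex:s1", "ex:t1"), ("ex:s3", "ex:t3")}

print(precision_recall(found, gold))  # (0.5, 0.5)
print(weighted_recall(found, gold))   # 0.25
```

In this toy case the system finds half of the matches but only the easy one, so the weighted recall (0.25) is lower than the plain recall (0.5).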
  • 16. Acknowledgments. This work was partially supported by the ongoing FP7 European Project LDBC (Linked Data Benchmark Council) (317548) and is done in collaboration with I. Fundulaki, M. Herschel (University of Stuttgart), G. Flouris, E. Daskalaki and A.-C. Ngonga Ngomo (University of Leipzig).
  • 17. References
      [FLM08] A. Ferrara, D. Lorusso, S. Montanelli and G. Varese. Towards a Benchmark for Instance Matching. In OM, 2008.
      [FMN+11] A. Ferrara, S. Montanelli, J. Noessner and H. Stuckenschmidt. Benchmarking Matching Applications on the Semantic Web. In ESWC, 2011.
      [NV13] M. Nickel and V. Tresp. Tensor Factorization for Multi-relational Learning. In Machine Learning and Knowledge Discovery in Databases. Springer Berlin Heidelberg, 2013, pp. 617-621.
      [J11] J. M. Joyce. Kullback-Leibler Divergence. In International Encyclopedia of Statistical Science. Springer Berlin Heidelberg, 2011, pp. 720-722.
      [JG11] E. Jimenez-Ruiz and B. C. Grau. LogMap: Logic-based and Scalable Ontology Matching. In ISWC, 2011.
      [FT04] B. Fuglede and F. Topsoe. Jensen-Shannon Divergence and Hilbert Space Embedding. In IEEE International Symposium on Information Theory, 2004.
      [OAEI] Ontology Alignment Evaluation Initiative, http://oaei.ontologymatching.org/

Editor's Notes

  1. We are currently working on a domain-independent instance matching test case generator for Linked Data, whose aim is to take any ontology and RDF dataset as source and produce a target dataset that implements the test cases discussed earlier. We are also studying how to define more sophisticated metrics that take into account the difficulty (weight) of the correctly identified matches, to be used in tandem with the standard precision and recall metrics. In addition, SPIMBENCH will be used as one of the OAEI benchmarks for 2015.
     Regarding the future development of the system, we will try to make SPIMBench completely independent of any domain. It will also be able to support more combinations of transformations in a more automated way. We will also need to revisit the metrics (precision, recall) so that they can take the weights into account as well.
     Wald method [ref] for sampling?? -> it also predicts how large the error will be depending on k.
     We looked at all the basic constructs of OWL Lite and OWL RL, and only the ones we implemented made sense; otherwise it would be very difficult for the systems, and so on.