SPIMBENCH:
A Scalable, Schema-Aware
Instance Matching Benchmark
for the Semantic Publishing Domain
T. Saveta (1), E. Daskalaki (1), G. Flouris (1), I. Fundulaki (1),
M. Herschel (2), A.-C. Ngonga Ngomo (3)
(1) FORTH-ICS, (2) University of Stuttgart, (3) University of Leipzig
Semantic Publishing Instance Matching Benchmark (SPIMBENCH) 2
Instance Matching in Linked Data
Data acquisition, data evolution, data integration, open/social data
How can we automatically recognize multiple mentions of the same entity
across or within sources?
= Instance Matching
Benchmarking
Instance matching research has led to the development of
various systems and algorithms.
How to compare these?
How can we assess their performance?
How can we push the systems to get better?
These systems need to be benchmarked
SPIMBENCH
• Based on Semantic Publishing Benchmark (SPB) of Linked
Data Benchmark Council (LDBC)
• Synthetic benchmark for the Semantic Publishing Domain
• Value-based, structure-based and semantics-aware
transformations [FMN+11, FLM08]
• Deterministic, scalable data generation on the order of
a billion triples
• Weighted gold standard
Instance Matching Benchmark Ingredients [FLM08]
Benchmark: Datasets, Gold Standard, Test Cases, Metrics
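To make the "Metrics" ingredient concrete, here is a minimal sketch of how a matcher's output could be scored against a gold standard. It assumes the usual precision, recall and F-measure over (source, target) pairs; the slides do not name the metrics, and the function name `evaluate` and all URIs below are hypothetical.

```python
from typing import Set, Tuple

Pair = Tuple[str, str]  # (source URI, target URI)


def evaluate(found: Set[Pair], gold: Set[Pair]) -> Tuple[float, float, float]:
    """Return (precision, recall, f1) of the found matches against the gold standard."""
    true_positives = len(found & gold)
    precision = true_positives / len(found) if found else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical URIs, for illustration only.
gold = {("src:a", "tgt:a"), ("src:b", "tgt:b"), ("src:c", "tgt:c"), ("src:d", "tgt:d")}
found = {("src:a", "tgt:a"), ("src:b", "tgt:b"), ("src:x", "tgt:y")}
precision, recall, f1 = evaluate(found, gold)
```

Here two of the three reported pairs are correct and two of the four gold pairs are found, so precision is 2/3 and recall is 1/2.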
SPIMBENCH Model
Value & Structure Based Transformations
Value: Mainly typographical errors and the use of
different data formats [FMN+11].
– Blank Character Addition/Deletion
– Random Character Addition/Deletion/Modification
– Token Addition/Deletion/Shuffle
– Multi-linguality (65 supported languages)
– Date Format
– Change Number
– Synonym/Antonym
– Abbreviation
– Stem of a Word
Structure: Changes that occur to the properties.
– Property Addition/Deletion
– Property Aggregation/Extraction
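A few of the value-based and structure-based transformations can be sketched as follows. This is an illustration only, not the SPIMBENCH generator: instances are simplified to property/value dictionaries, every function and property name is hypothetical, and a seeded random generator stands in for the deterministic generation mentioned earlier.

```python
import random
from typing import Dict, List


def add_blank(value: str, rng: random.Random) -> str:
    """Blank character addition: insert one space at a random position."""
    pos = rng.randint(0, len(value))
    return value[:pos] + " " + value[pos:]


def delete_random_char(value: str, rng: random.Random) -> str:
    """Random character deletion: drop one character, if there is one."""
    if not value:
        return value
    pos = rng.randrange(len(value))
    return value[:pos] + value[pos + 1:]


def shuffle_tokens(value: str, rng: random.Random) -> str:
    """Token shuffle: reorder the whitespace-separated tokens."""
    tokens = value.split()
    rng.shuffle(tokens)
    return " ".join(tokens)


def delete_property(instance: Dict[str, str], prop: str) -> Dict[str, str]:
    """Property deletion: return a copy of the instance without `prop`."""
    return {p: v for p, v in instance.items() if p != prop}


def aggregate_properties(instance: Dict[str, str], props: List[str],
                         new_prop: str, sep: str = " ") -> Dict[str, str]:
    """Property aggregation: merge the values of `props` into `new_prop`."""
    merged = sep.join(instance[p] for p in props if p in instance)
    result = {p: v for p, v in instance.items() if p not in props}
    result[new_prop] = merged
    return result


# Hypothetical source instance; property names are illustrative only.
source = {"title": "Semantic Publishing Benchmark",
          "firstName": "Ada", "lastName": "Lovelace"}
rng = random.Random(42)  # fixed seed: the same variations on every run

target = dict(source)
target["title"] = shuffle_tokens(add_blank(source["title"], rng), rng)
target = aggregate_properties(target, ["firstName", "lastName"], "name")
```

Chaining two value transformations and one structure transformation on the same instance, as in the last lines, corresponds to what the slides call a combination of transformations.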
Semantics-Aware Transformations
Test if matching systems consider schema information to discover
instance matches.
• Instance (in)equality constructs
– owl:sameAs, owl:differentFrom
• Equivalence of classes and properties
– owl:equivalentClass, owl:equivalentProperty
• Disjointness of classes and properties
– owl:disjointWith, owl:propertyDisjointWith
• RDFS hierarchies
– rdfs:subClassOf, rdfs:subPropertyOf
• Property constraints
– owl:FunctionalProperty, owl:InverseFunctionalProperty
• Complex class definitions
– owl:unionOf, owl:intersectionOf
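Two of these constructs can show why schema information matters to a matcher. The sketch below is a simplification over plain (subject, property, object) tuples rather than a real OWL reasoner, and all names and URIs are hypothetical. It uses the standard OWL readings: two subjects sharing a value of an owl:InverseFunctionalProperty denote the same individual, and instances of classes declared owl:disjointWith cannot be the same individual.

```python
from itertools import combinations
from typing import Dict, Iterable, Set, Tuple

Triple = Tuple[str, str, str]
Pair = Tuple[str, str]


def same_by_inverse_functional(triples: Iterable[Triple], ifp: str) -> Set[Pair]:
    """Pairs of subjects that share a value for the inverse functional property."""
    by_value: Dict[str, Set[str]] = {}
    for s, p, o in triples:
        if p == ifp:
            by_value.setdefault(o, set()).add(s)
    pairs: Set[Pair] = set()
    for subjects in by_value.values():
        for a, b in combinations(sorted(subjects), 2):
            pairs.add((a, b))
    return pairs


def violates_disjointness(a_types: Set[str], b_types: Set[str],
                          disjoint: Set[Pair]) -> bool:
    """True if the two instances have types that are declared disjoint."""
    return any((x, y) in disjoint or (y, x) in disjoint
               for x in a_types for y in b_types)


# Hypothetical data: ex:isbn is assumed to be declared inverse functional.
triples = [
    ("ex:w1", "ex:isbn", "urn:isbn:123"),
    ("ex:w2", "ex:isbn", "urn:isbn:123"),
    ("ex:w3", "ex:isbn", "urn:isbn:999"),
]
inferred_same = same_by_inverse_functional(triples, "ex:isbn")

disjoint_classes = {("ex:Person", "ex:Organisation")}
cannot_match = violates_disjointness({"ex:Person"}, {"ex:Organisation"},
                                     disjoint_classes)
```

A matcher that ignores the schema would have to rely on value similarity alone for ex:w1 and ex:w2, while a schema-aware one can infer the match directly.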
SPIMBENCH Model
Weighted Gold Standard
• Detailed gold standard (GS), for debugging purposes
• Final GS: contains only the URIs that we consider a match
and their similarity
[Figure: gold standard schema. Nodes: spimbench:Match, owl:Thing, spimbench:Transformation, spimbench:ValueTransf, spimbench:StructureTransf, spimbench:SemanticsAwareTransf, spimbench:VT1 … spimbench:VTi, spimbench:ST1 … spimbench:STi, spimbench:SAT1 … spimbench:SATi. Properties: spimbench:source, spimbench:target, spimbench:weight (xsd:string), spimbench:onProperty (rdf:Property), spimbench:transformation. Edge types: rdfs:subClassOf, rdfs:subPropertyOf, rdf:type.]
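As an illustration of the vocabulary in the diagram, a single gold standard entry could be modelled and serialized as below. This is a sketch under assumptions: the namespace IRI is a placeholder (the slides show only the `spimbench:` prefix), attaching spimbench:onProperty to the match node is a guess from the flattened diagram, and the example URIs and weight value are made up.

```python
from dataclasses import dataclass
from typing import List

# Assumed placeholder namespace; the slides only show the "spimbench:" prefix.
SPIMBENCH = "http://example.org/spimbench#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


@dataclass(frozen=True)
class Match:
    match_id: str
    source: str          # URI of the original instance
    target: str          # URI of the transformed instance
    weight: str          # the diagram types spimbench:weight as xsd:string
    transformation: str  # local name, e.g. "VT1"
    on_property: str     # URI of the property that was transformed (assumed)

    def to_ntriples(self) -> List[str]:
        """Serialize the match as N-Triples lines."""
        subject = f"<{SPIMBENCH}{self.match_id}>"
        return [
            f"{subject} <{RDF_TYPE}> <{SPIMBENCH}Match> .",
            f"{subject} <{SPIMBENCH}source> <{self.source}> .",
            f"{subject} <{SPIMBENCH}target> <{self.target}> .",
            f'{subject} <{SPIMBENCH}weight> "{self.weight}" .',
            f"{subject} <{SPIMBENCH}transformation> <{SPIMBENCH}{self.transformation}> .",
            f"{subject} <{SPIMBENCH}onProperty> <{self.on_property}> .",
        ]


# Hypothetical entry, for illustration only.
match = Match(
    match_id="match1",
    source="http://example.org/source/work1",
    target="http://example.org/target/work1",
    weight="0.7",
    transformation="VT1",
    on_property="http://example.org/vocab/title",
)
lines = match.to_ntriples()
```

The final gold standard described above would keep only the source, target and similarity, while the detailed one would also carry the transformation and the affected property.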
Scalability Experiments (1/2)
• Scalability experiments for datasets of up to 500M triples
• 1,000 triples ≈ 36 entities
• Data generation together with data transformation is linear in the
number of triples
• Transformation overhead is negligible for value-based, structure-based,
semantics-aware and simple combinations
• Overhead for complex combinations is one order of magnitude higher
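The triple-to-entity ratio gives a rough sense of dataset sizes. The snippet below simply extrapolates it to the largest dataset mentioned, assuming the ratio stays constant at every scale, which the slides do not state explicitly.

```python
ENTITIES_PER_1000_TRIPLES = 36  # ratio reported on the slide


def estimated_entities(triples: int) -> int:
    """Estimate the number of entities for a dataset of `triples` triples."""
    return triples * ENTITIES_PER_1000_TRIPLES // 1000


largest = estimated_entities(500_000_000)  # 18,000,000 entities
```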
Scalability Experiments (2/2)
Performance of LogMap [JG11]
[Figures: performance of LogMap for 10K, 25K and 50K triples]
Conclusions
• Schema-aware variations
– Complex class definitions
– Property constraints
– Equivalence, Disjointness, etc.
• Combination of transformations
• Scalable data generation on the order of a billion triples
– Uses sampling
• Weighted gold standard
– Final gold standard
– Detailed gold standard for debugging reasons
Future Work
• SPIMBENCH will be used as one of the Ontology
Alignment Evaluation Initiative [OAEI]
benchmarks for 2015.
• Domain-independent instance matching test
case generator.
• Definition of more sophisticated metrics that
take into account the
difficulty (weight).
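One possible shape for such a metric, purely as a sketch and not the authors' definition, is a recall in which each gold standard match counts in proportion to its weight. It assumes the weight is a non-negative number where a higher value means a harder match; the function name and the data are hypothetical.

```python
from typing import Dict, Set, Tuple

Pair = Tuple[str, str]


def weighted_recall(found: Set[Pair], gold_weights: Dict[Pair, float]) -> float:
    """Share of the total gold standard weight covered by the found matches."""
    total = sum(gold_weights.values())
    if total == 0:
        return 0.0
    covered = sum(w for pair, w in gold_weights.items() if pair in found)
    return covered / total


# Hypothetical gold standard: one easy match (0.2) and one hard match (0.8).
gold_weights = {("src:a", "tgt:a"): 0.2, ("src:b", "tgt:b"): 0.8}
only_easy = weighted_recall({("src:a", "tgt:a")}, gold_weights)
only_hard = weighted_recall({("src:b", "tgt:b")}, gold_weights)
```

Under plain recall both systems would score 0.5; with the weights, finding only the hard match scores higher than finding only the easy one.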
Acknowledgments
This work was partially supported by the ongoing FP7
European Project LDBC (Linked Data Benchmark Council)
(317548) and is done in collaboration with I. Fundulaki,
M. Herschel (University of Stuttgart), G. Flouris,
E. Daskalaki and A.-C. Ngonga Ngomo (University of
Leipzig).
References
[FLM08] A. Ferrara, D. Lorusso, S. Montanelli and G. Varese. Towards a Benchmark for Instance Matching. In OM, 2008.
[FMN+11] A. Ferrara, S. Montanelli, J. Noessner and H. Stuckenschmidt. Benchmarking Matching Applications on the Semantic Web. In ESWC, 2011.
[NV13] M. Nickel and V. Tresp. Tensor Factorization for Multi-relational Learning. In Machine Learning and Knowledge Discovery in Databases. Springer Berlin Heidelberg, 2013. 617-621.
[J11] J. M. Joyce. Kullback-Leibler Divergence. In International Encyclopedia of Statistical Science. Springer Berlin Heidelberg, 2011. 720-722.
[JG11] E. Jimenez-Ruiz and B. C. Grau. LogMap: Logic-based and scalable ontology matching. In ISWC, 2011.
[FT04] B. Fuglede and F. Topsoe. Jensen-Shannon divergence and Hilbert space embedding. In IEEE International Symposium on Information Theory, 2004.
[OAEI] Ontology Alignment Evaluation Initiative. http://oaei.ontologymatching.org/
Thank you!
Questions?

Enchancing adoption of Open Source Libraries. A case study on Albumentations.AIEnchancing adoption of Open Source Libraries. A case study on Albumentations.AI
Enchancing adoption of Open Source Libraries. A case study on Albumentations.AI
 
20 Comprehensive Checklist of Designing and Developing a Website
20 Comprehensive Checklist of Designing and Developing a Website20 Comprehensive Checklist of Designing and Developing a Website
20 Comprehensive Checklist of Designing and Developing a Website
 
Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!
Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!
Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!
 
Mind map of terminologies used in context of Generative AI
Mind map of terminologies used in context of Generative AIMind map of terminologies used in context of Generative AI
Mind map of terminologies used in context of Generative AI
 
20240607 QFM018 Elixir Reading List May 2024
20240607 QFM018 Elixir Reading List May 202420240607 QFM018 Elixir Reading List May 2024
20240607 QFM018 Elixir Reading List May 2024
 
RESUME BUILDER APPLICATION Project for students
RESUME BUILDER APPLICATION Project for studentsRESUME BUILDER APPLICATION Project for students
RESUME BUILDER APPLICATION Project for students
 
Climate Impact of Software Testing at Nordic Testing Days
Climate Impact of Software Testing at Nordic Testing DaysClimate Impact of Software Testing at Nordic Testing Days
Climate Impact of Software Testing at Nordic Testing Days
 
Monitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR EventsMonitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR Events
 
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...
 
National Security Agency - NSA mobile device best practices
National Security Agency - NSA mobile device best practicesNational Security Agency - NSA mobile device best practices
National Security Agency - NSA mobile device best practices
 
UiPath Test Automation using UiPath Test Suite series, part 6
UiPath Test Automation using UiPath Test Suite series, part 6UiPath Test Automation using UiPath Test Suite series, part 6
UiPath Test Automation using UiPath Test Suite series, part 6
 
Essentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FMEEssentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FME
 
Introducing Milvus Lite: Easy-to-Install, Easy-to-Use vector database for you...
Introducing Milvus Lite: Easy-to-Install, Easy-to-Use vector database for you...Introducing Milvus Lite: Easy-to-Install, Easy-to-Use vector database for you...
Introducing Milvus Lite: Easy-to-Install, Easy-to-Use vector database for you...
 
“I’m still / I’m still / Chaining from the Block”
“I’m still / I’m still / Chaining from the Block”“I’m still / I’m still / Chaining from the Block”
“I’m still / I’m still / Chaining from the Block”
 
A tale of scale & speed: How the US Navy is enabling software delivery from l...
A tale of scale & speed: How the US Navy is enabling software delivery from l...A tale of scale & speed: How the US Navy is enabling software delivery from l...
A tale of scale & speed: How the US Navy is enabling software delivery from l...
 
GraphSummit Singapore | Neo4j Product Vision & Roadmap - Q2 2024
GraphSummit Singapore | Neo4j Product Vision & Roadmap - Q2 2024GraphSummit Singapore | Neo4j Product Vision & Roadmap - Q2 2024
GraphSummit Singapore | Neo4j Product Vision & Roadmap - Q2 2024
 
Building RAG with self-deployed Milvus vector database and Snowpark Container...
Building RAG with self-deployed Milvus vector database and Snowpark Container...Building RAG with self-deployed Milvus vector database and Snowpark Container...
Building RAG with self-deployed Milvus vector database and Snowpark Container...
 
PCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase TeamPCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase Team
 
“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...
“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...
“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...
 
Generative AI Deep Dive: Advancing from Proof of Concept to Production
Generative AI Deep Dive: Advancing from Proof of Concept to ProductionGenerative AI Deep Dive: Advancing from Proof of Concept to Production
Generative AI Deep Dive: Advancing from Proof of Concept to Production
 

SPIMBENCH: A Scalable, Schema-Aware Instance Matching Benchmark for the Semantic Publishing Domain

  • 1. SPIMBENCH: A Scalable, Schema-Aware Instance Matching Benchmark for the Semantic Publishing Domain. T. Saveta¹, E. Daskalaki¹, G. Flouris¹, I. Fundulaki¹, M. Herschel², A.-C. Ngonga Ngomo³ (¹ FORTH-ICS, ² University of Stuttgart, ³ University of Leipzig)
  • 2. Instance Matching in Linked Data: Data acquisition, data evolution, data integration, open/social data. How can we automatically recognize multiple mentions of the same entity across or within sources? = Instance Matching
  • 3. Benchmarking: Instance matching research has led to the development of various systems and algorithms. How to compare these? How can we assess their performance? How can we push the systems to get better? These systems need to be benchmarked.
  • 4. SPIMBENCH: • Based on the Semantic Publishing Benchmark (SPB) of the Linked Data Benchmark Council (LDBC) • Synthetic benchmark for the Semantic Publishing Domain • Value-based, structure-based and semantics-aware transformations [FMN+11, FLM08] • Deterministic, scalable data generation on the order of a billion triples • Weighted gold standard
  • 5. Instance Matching Benchmark Ingredients [FLM08]: Benchmark: Datasets, Gold Standard, Test Cases, Metrics
  • 6. SPIMBENCH Model
  • 7. Value & Structure Based Transformations: Value: Mainly typographical errors and the use of different data formats [FMN+11]. Structure: Changes that occur to the properties (Property Addition/Deletion; Property Aggregation/Extraction). Transformations listed in the slide's table: Blank Character Addition/Deletion; Random Character Addition/Deletion/Modification; Token Addition/Deletion/Shuffle; Multi-linguality (65 supported languages); Date Format; Change Number; Synonym/Antonym; Abbreviation; Stem of a Word
  • 8. Semantics-Aware Transformations: Test if matching systems consider schema information to discover instance matches. • Instance (in)equality constructs: owl:sameAs, owl:differentFrom • Equivalence of classes and properties: owl:equivalentClass, owl:equivalentProperty • Disjointness of classes and properties: owl:disjointWith, owl:propertyDisjointWith • RDFS hierarchies: rdfs:subClassOf, rdfs:subPropertyOf • Property constraints: owl:FunctionalProperty, owl:InverseFunctionalProperty • Complex class definitions: owl:unionOf, owl:intersectionOf
  • 9. SPIMBENCH Model
  • 10. Weighted Gold Standard: • Detailed GS for debugging reasons • Final GS: Contains only URIs that we consider a match and their similarity. [Figure: gold standard schema. Nodes include spimbench:Match, owl:Thing, spimbench:Transformation, spimbench:ValueTransf, spimbench:StructureTransf, spimbench:SemanticsAwareTransf, spimbench:VT1 … spimbench:VTi, spimbench:ST1 … spimbench:STi, spimbench:SAT1 … spimbench:SATi, xsd:string and rdf:Property; terms shown include spimbench:source, spimbench:target, spimbench:weight, spimbench:onProperty and spimbench:transformation; the legend distinguishes rdfs:subPropertyOf, rdfs:subClassOf and rdf:type edges.]
  • 11. Scalability Experiments (1/2): • Scalability experiments for datasets up to 500M triples • 1000 triples ~ 36 entities • Data generation together with data transformation scales linearly with the number of triples • Transformation overhead is negligible for value-based, structure-based, semantics-aware and simple combinations • Overhead for complex combinations is higher by one order of magnitude
  • 12. Scalability Experiments (2/2)
  • 13. Performance of LogMap [JG11]: [Figure panels: performance of LogMap for 10K triples, 25K triples and 50K triples]
  • 14. Conclusions: • Schema-aware variations (complex class definitions; property constraints; equivalence, disjointness, etc.) • Combination of transformations • Scalable data generation on the order of a billion triples (uses sampling) • Weighted gold standard (final gold standard; detailed gold standard for debugging reasons)
  • 15. Future Work: • SPIMBENCH will be used as one of the Ontology Alignment Evaluation Initiative [OAEI] benchmarks for 2015. • Domain-independent instance matching test case generator. • Definition of more sophisticated metrics that take into account the difficulty (weight).
  • 16. Acknowledgments: This work was partially supported by the ongoing FP7 European Project LDBC (Linked Data Benchmark Council) (317548) and was done in collaboration with I. Fundulaki, M. Herschel (University of Stuttgart), G. Flouris, E. Daskalaki and A.-C. Ngonga Ngomo (University of Leipzig).
  • 17. References:
      [FLM08] A. Ferrara, D. Lorusso, S. Montanelli and G. Varese. Towards a Benchmark for Instance Matching. In OM, 2008.
      [FMN+11] A. Ferrara, S. Montanelli, J. Noessner and H. Stuckenschmidt. Benchmarking Matching Applications on the Semantic Web. In ESWC, 2011.
      [NV13] M. Nickel and V. Tresp. Tensor Factorization for Multi-relational Learning. Machine Learning and Knowledge Discovery in Databases. Springer Berlin Heidelberg, 2013. 617-621.
      [J11] J. M. Joyce. Kullback-Leibler Divergence. International Encyclopedia of Statistical Science. Springer Berlin Heidelberg, 2011. 720-722.
      [JG11] E. Jimenez-Ruiz and B. C. Grau. LogMap: Logic-based and scalable ontology matching. In ISWC, 2011.
      [FT04] B. Fuglede and F. Topsoe. Jensen-Shannon divergence and Hilbert space embedding. In IEEE International Symposium on Information Theory, 2004.
      [OAEI] Ontology Alignment Evaluation Initiative, available at http://oaei.ontologymatching.org/
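
The value-based transformations named on slide 7 (blank character addition, random character deletion, token shuffle) can be illustrated with a small Python sketch. This is not SPIMBENCH's implementation: the function names, the `seed` parameter and the way one transformation is picked are all assumptions made for illustration. The seeded generator only mirrors the slides' point that data generation is deterministic.

```python
import random


def add_blank_characters(value: str, rng: random.Random, count: int = 1) -> str:
    """Insert `count` extra spaces at random positions (hypothetical helper)."""
    chars = list(value)
    for _ in range(count):
        # randint is inclusive on both ends, so the space may also be appended.
        chars.insert(rng.randint(0, len(chars)), " ")
    return "".join(chars)


def delete_random_character(value: str, rng: random.Random) -> str:
    """Remove one randomly chosen character (hypothetical helper)."""
    if not value:
        return value
    i = rng.randrange(len(value))
    return value[:i] + value[i + 1:]


def shuffle_tokens(value: str, rng: random.Random) -> str:
    """Reorder the whitespace-separated tokens (hypothetical helper)."""
    tokens = value.split()
    rng.shuffle(tokens)
    return " ".join(tokens)


def transform(value: str, seed: int) -> str:
    """Apply one transformation, chosen and driven by a seeded generator.

    The same (value, seed) pair always yields the same output.
    """
    rng = random.Random(seed)
    step = rng.choice([add_blank_characters, delete_random_character, shuffle_tokens])
    return step(value, rng)
```

Calling `transform("Semantic Publishing Benchmark", seed=7)` twice returns the same string both times, which is what makes a generated target dataset and its gold standard reproducible.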

Editor's Notes

  1. We are currently working on a domain-independent instance matching test case generator for Linked Data, whose aim is to take any ontology and RDF dataset as source and produce a target dataset that will implement the test cases discussed earlier. We are also studying how we can define more sophisticated metrics that take into account the difficulty (weight) of the correctly identified matches, to be used in tandem with the standard precision and recall metrics. Also, SPIMBENCH will be used as one of the OAEI benchmarks for 2015. As for the future development of the system, we will try to make SPIMBench completely independent of any domain. It will also be able to support more combinations of transformations in a more automatic way. We also need to revisit the metrics (precision-recall) so that they can take the weights into account as well. Wald method [ref] for sampling?? -> it also predicts how large the error will be depending on k. We looked at all the basic constructs of OWL Lite and OWL RL, and the ones we implemented were the only ones that made sense; otherwise it would be too difficult for the systems, etc.
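
One way such a weight-aware metric could look, alongside standard precision and recall, is sketched below in Python. The notes do not define the metric, so this is purely a hypothetical illustration: the function name, the data layout and the assumption that each gold pair carries a non-negative difficulty score (how that score would be derived from spimbench:weight is left open) are not taken from SPIMBENCH.

```python
from __future__ import annotations


def weighted_scores(
    gold: dict[tuple[str, str], float],
    found: set[tuple[str, str]],
) -> tuple[float, float, float]:
    """Return (precision, recall, weighted_recall).

    gold  maps each (source URI, target URI) match to an assumed difficulty score.
    found is the set of (source URI, target URI) pairs reported by a system.
    """
    correct = {pair for pair in found if pair in gold}

    # Standard metrics: every correct match counts the same.
    precision = len(correct) / len(found) if found else 0.0
    recall = len(correct) / len(gold) if gold else 0.0

    # Hypothetical weighted recall: harder matches contribute more.
    total = sum(gold.values())
    weighted_recall = sum(gold[pair] for pair in correct) / total if total else 0.0

    return precision, recall, weighted_recall
```

With a gold standard of two matches scored 1.0 and 3.0, a system that finds only the easier one plus one wrong pair gets precision 0.5 and recall 0.5, but a weighted recall of 0.25, reflecting that it missed the harder match.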