Rule-Based Curation of Reference Datasets for Named Entity Recognition and Entity Linking.
HOBBIT presentation at ESWC 2017
This work was supported by grants from the EU H2020 Framework Programme provided for the project HOBBIT (GA no. 688227).
All That Glitters is not Gold
Rule-Based Curation of Reference Datasets for Named Entity Recognition and Entity Linking
Kunal Jha¹, Michael Röder¹, Axel-Cyrille Ngonga Ngomo¹,²
¹ AKSW, Leipzig University, Germany
² Data Science Group, University of Paderborn, Germany
30th May 2017 Jha et al. — All That Glitters is not Gold
Outline
- Motivation
- Rule set
- Error types
- Eaglet
- Evaluation
- Summary
Motivation
Annotation of texts against a knowledge base (KB), i.e., the A2KB task. Example from the KORE50 dataset:
"Bosch and Sharp are both home appliances producing companies."
- Named Entity Recognition: find the entity mentions "Bosch" and "Sharp".
- Entity Linking: link each mention to a KB resource, here dbr:Robert_Bosch_GmbH and dbr:Sharp_Corporation.
Evaluation: the annotations of a system are compared with a gold standard (System = ?). In the slide example, however, "Sharp" is linked to dbr:Sharp rather than to dbr:Sharp_Corporation.
How can we check our gold standards?
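The comparison against a gold standard can be made concrete with a small scoring function. The following is a minimal Python sketch of precision, recall and F1 over (start, end, URI) triples for the Bosch and Sharp example; the triple representation, the strict matching of span and URI, and the gold annotation with dbr:Sharp are assumptions for illustration, and benchmarking frameworks also support other matching schemes.

```python
from typing import Set, Tuple

# An annotation as (start offset, end offset, URI). Strict matching is assumed:
# a system annotation counts only if both its span and its URI equal a gold one.
Annotation = Tuple[int, int, str]


def f1(system: Set[Annotation], gold: Set[Annotation]) -> float:
    """F1 score of the system annotations against the gold standard."""
    tp = len(system & gold)
    precision = tp / len(system) if system else 0.0
    recall = tp / len(gold) if gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical annotations for "Bosch and Sharp are both home appliances producing companies."
system = {(0, 5, "dbr:Robert_Bosch_GmbH"), (10, 15, "dbr:Sharp_Corporation")}
gold_with_error = {(0, 5, "dbr:Robert_Bosch_GmbH"), (10, 15, "dbr:Sharp")}
gold_corrected = {(0, 5, "dbr:Robert_Bosch_GmbH"), (10, 15, "dbr:Sharp_Corporation")}

print(f1(system, gold_with_error))  # 0.5
print(f1(system, gold_corrected))   # 1.0
```

In this toy case the same system output loses half of its score only because the reference annotation is wrong, which is why the quality of the gold standard matters.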
Approach
- A set of annotation rules for gold standards
- Error types violating these rules
- A tool for the semi-automatic check of gold standards
Rule Set: Assumptions
A1: A single sentence has a linear structure.
Example: "Barack and Michelle Obama"
A2: The annotation should cover as many consecutive words as possible.
- Name the entity as precisely as possible.
Example: "legendary cryptanalyst Alan Turing"
A3: Each annotation should be linked to the most precise resource of the KB.
Example: "113th United States Congress" → dbr:113th_United_States_Congress, not dbr:United_States_Congress
A4: The annotated string should point to a specific entity.
A5: A set of entity types $T_A$ is given to define which entities can be found and which resources of the KB can be used for linking.
Example: $T_A$ = {dbo:Person, dbo:Place, dbo:Organisation}
R1 Dataset and documents
- Each dataset $D$ is a set of documents.
- Each document $d$ is an ordered set of words, $d = \{w_1, \ldots, w_n\}$.
R2 Words
- Each word $w_i \in d$ is a sequence of characters or digits that starts at the beginning of the document or after a whitespace, and ends at the end of the document or before a whitespace or punctuation character.
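Rule R2 translates directly into a word pattern. The following is a minimal Python sketch, assuming a hypothetical punctuation set (R2 does not enumerate one) and representing each word by its character offsets.

```python
import re
from typing import List, Tuple

# Assumed punctuation set; rule R2 does not list the punctuation characters.
PUNCTUATION = ".,;:!?()[]\"'"

# A word starts at the document start or directly after whitespace and runs
# until the next whitespace, punctuation character or the end of the document.
WORD = re.compile(r"(?:^|(?<=\s))[^\s" + re.escape(PUNCTUATION) + r"]+")


def words(doc: str) -> List[Tuple[int, int, str]]:
    """Return the words of doc as (start, end, text) character spans."""
    return [(m.start(), m.end(), m.group()) for m in WORD.finditer(doc)]


print([w for _, _, w in words("Müller_scored a hattrick against England.")])
# ['Müller_scored', 'a', 'hattrick', 'against', 'England']
```

Under this assumption "_" is neither whitespace nor punctuation, so "Müller_scored" forms a single word and an annotation of "Müller" alone cannot end at a word boundary.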
R3 Entities for annotation
- The annotation process relies on a set of entities $E = \{e \mid \tau(e) \cap T_A \neq \emptyset\}$, i.e., the entities whose types $\tau(e)$ include at least one type of $T_A$.
- $E$ might contain emerging entities (EEs).
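The set definition of $E$ is a simple filter over entity types. Below is a minimal Python sketch using the $T_A$ from the A5 example; the type sets assigned to the entities are illustrative assumptions, not values queried from DBpedia, and emerging entities are not modelled.

```python
from typing import Dict, Set

# Entity types T_A from the example of assumption A5.
T_A = {"dbo:Person", "dbo:Place", "dbo:Organisation"}


def entities_for_annotation(types: Dict[str, Set[str]], t_a: Set[str]) -> Set[str]:
    """E = {e | tau(e) ∩ T_A ≠ ∅}: entities sharing at least one type with T_A."""
    return {e for e, tau_e in types.items() if tau_e & t_a}


# Illustrative type sets tau(e); these are assumptions, not DBpedia data.
tau = {
    "dbr:Robert_Bosch_GmbH": {"dbo:Company", "dbo:Organisation"},
    "dbr:Alan_Turing": {"dbo:Scientist", "dbo:Person"},
    "dbr:December": {"dbo:TimePeriod"},
}

print(sorted(entities_for_annotation(tau, T_A)))
# ['dbr:Alan_Turing', 'dbr:Robert_Bosch_GmbH']
```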
R4 Annotation
- An annotation $a = (S_a, u_a)$ consists of
(a) $S_a$, a sequence of consecutive words, and
(b) $u_a$, a URI that links the sequence to an entity $e = \delta(u_a)$, where
i. $e$ is the most precise entity possible
ii. that represents $S_a$, as described in A3.
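R5 Annotation function
- The annotation function $\rho(d, K, E, T_A) = A$ returns, for a document $d$ and the KB $K$, a set of annotations $A = \{a_1, \ldots, a_m\}$ that
(a) links only to entities of $E$: $\delta(u_{a_i}) \in E$,
(b) contains no overlapping annotations: $\forall a_i, a_j \in A,\ i \neq j,\ (S_{a_i}, S_{a_j} \subset d): S_{a_i} \cap S_{a_j} = \emptyset$,
(c) $A$ has to be complete.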
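Conditions (a) and (b) of R5 can be checked mechanically, while completeness (c) cannot be decided from the annotations alone. The following Python sketch is a hypothetical checker: it represents $S_a$ as a half-open range of word indices and assumes by default that $\delta$ is the identity, i.e., that a URI directly identifies its entity.

```python
from dataclasses import dataclass
from itertools import combinations
from typing import Callable, List, Set, Tuple


@dataclass(frozen=True)
class Annotation:
    """a = (S_a, u_a), with S_a given as a half-open range of word indices."""
    start: int  # index of the first word of S_a
    end: int    # index after the last word of S_a
    uri: str    # u_a


def r5_violations(
    annotations: List[Annotation],
    entities: Set[str],
    delta: Callable[[str], str] = lambda uri: uri,  # assumed: a URI identifies its entity
) -> List[Tuple]:
    """Report violations of R5(a) (entity not in E) and R5(b) (overlapping annotations)."""
    problems: List[Tuple] = []
    for a in annotations:
        if delta(a.uri) not in entities:
            problems.append(("5(a)", a))
    for a, b in combinations(annotations, 2):
        # Two half-open word ranges intersect iff each starts before the other ends.
        if a.start < b.end and b.start < a.end:
            problems.append(("5(b)", a, b))
    return problems


# Hypothetical word indices for "... one Google car ...": one=0, Google=1, car=2.
google_car = Annotation(1, 3, "dbr:Google_driverless_car")
car = Annotation(2, 3, "dbr:Car")
E = {"dbr:Google_driverless_car", "dbr:Car"}
print(r5_violations([google_car, car], E))
```

Here both URIs belong to the assumed $E$, so only the overlap between "Google car" and "car" is reported.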
Error types
Positioning error (violates rules 2 and 4(a))
- "Müller_scored a hattrick against England." annotated with dbr:Thomas_Müller_(footballer) and dbr:England_national_football_team
- "[...], a performance space that opened in 2006 [...]" annotated with dbr:Man (the marked "man" lies inside the word "performance")
Examples from the DBpedia Spotlight and KORE50 datasets
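A positioning error can be detected by testing whether a marked character span starts at a word start and ends at a word end. This minimal Python sketch redefines the R2 word pattern so that it is self-contained; the punctuation set and the function name are assumptions.

```python
import re

# Word pattern following rule R2; the punctuation set is an assumption.
PUNCTUATION = ".,;:!?()[]\"'"
WORD = re.compile(r"(?:^|(?<=\s))[^\s" + re.escape(PUNCTUATION) + r"]+")


def is_positioning_error(doc: str, start: int, end: int) -> bool:
    """True if the character span [start, end) is not a sequence of whole words,
    i.e. it does not begin at a word start or does not end at a word end."""
    spans = [(m.start(), m.end()) for m in WORD.finditer(doc)]
    starts = {s for s, _ in spans}
    ends = {e for _, e in spans}
    return start not in starts or end not in ends


doc = "a performance space that opened in 2006"
i = doc.index("man")
print(is_positioning_error(doc, i, i + 3))  # True: "man" lies inside "performance"
```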
Overlapping error (violates rule 5(b))
"The only accident engineers said, was when one Google car was rear-ended while stopped at a traffic light." annotated with both dbr:Google_driverless_car and dbr:Car on overlapping spans
Example from the DBpedia Spotlight dataset
Combined marking (violates rule 4(b)i)
"In December 2012, [...]" annotated with dbr:December and dbr:2012
Long description error (violates rule 4(b)ii)
"The car is a project of Google, which has been working in secret but in plain view on vehicles that can drive themselves, [...]" annotated with dbr:Driverless_car
Example from the DBpedia Spotlight dataset
Missing entity (violates rule 5(c))
- Inconsistent marking
URI errors
- Outdated URI, e.g., dbr:People’s_Republic_of_China → dbr:China
- Disambiguation URI, e.g., dbr:Teresa
- Invalid URI, e.g., *null*
Example from the DBpedia Spotlight dataset
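The three URI error types can be told apart with lookups against the KB. Below is a hypothetical Python classifier; the lookup tables are assumptions passed in as plain Python collections, and for DBpedia they might be derived, for example, from the dbo:wikiPageRedirects and dbo:wikiPageDisambiguates properties.

```python
from typing import Dict, Optional, Set


def uri_error(
    uri: Optional[str],
    known: Set[str],
    redirects: Dict[str, str],
    disambiguations: Set[str],
) -> Optional[str]:
    """Classify a gold standard URI as 'invalid', 'outdated' or 'disambiguation'.

    Returns None if no URI error is detected.
    """
    if not uri:
        return "invalid"          # e.g. a *null* URI
    if uri in redirects:
        return "outdated"         # the KB redirects it to redirects[uri]
    if uri in disambiguations:
        return "disambiguation"   # the URI names a disambiguation page
    if uri not in known:
        return "invalid"          # the resource does not exist in the KB
    return None


# Hypothetical lookup tables built from the slide examples.
redirects = {"dbr:People’s_Republic_of_China": "dbr:China"}
disambiguations = {"dbr:Teresa"}
known = {"dbr:China", "dbr:Teresa"}
for u in [None, "dbr:People’s_Republic_of_China", "dbr:Teresa", "dbr:China"]:
    print(u, uri_error(u, known, redirects, disambiguations))
```

An outdated URI can also be repaired automatically by replacing it with its redirect target, here dbr:China.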
Evaluation
Errors found
Evaluation: Quality of identified errors
Only 4 datasets come with a set of types $T_A$
- 25 documents of ACE
- 25 documents of AIDA/CoNLL
- 30 documents of OKE 2015
URI errors have 0.94 accuracy.
Minor problems with the CM module, e.g., "Steve Pagani VIENNA" (example from the AIDA/CoNLL dataset)
Inter-annotator agreement in brackets
Evaluation
Influence on evaluation
Summary
NER/EL gold standards can contain severe errors.
For the semi-automatic check of gold standards, we developed
- a set of rules
- a tool (will be presented during the poster session)
We showed that the quality of gold standards has an impact on the evaluation results.
Kunal Jha¹, Michael Röder¹, Axel-Cyrille Ngonga Ngomo²
¹ AKSW, Leipzig University, Germany
² Data Science Group, University of Paderborn, Germany
roeder@informatik.uni-leipzig.de
https://github.com/aksw/eaglet
Thanks for your attention!
This work has been supported by the H2020 project HOBBIT (GA no. 688227) as well as the EuroStars projects DIESEL (project no. 01QE1512C) and QAMEL (project no. 01QE1549C).
Evaluation: Quality of identified errors
Completion module
- 10 annotation systems
- 5 of them have to “vote” for an annotation to suggest it to the user
- Missed entities were found: 74% for ACE2004, 92% for AIDA/CoNLL, 57% for OKE2015
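The voting step of the completion module can be sketched as a majority threshold over system outputs. In this minimal Python sketch the numbers (10 systems, 5 votes) come from the slide, while counting votes on identical character spans (ignoring the linked URIs) and the function name are assumptions.

```python
from collections import Counter
from typing import List, Set, Tuple

Span = Tuple[int, int]  # (start, end) character offsets of a marked entity


def suggest_missing_entities(
    system_outputs: List[Set[Span]],
    gold: Set[Span],
    min_votes: int = 5,
) -> Set[Span]:
    """Suggest spans marked by at least min_votes systems but absent from the gold standard."""
    # Each system votes at most once per span.
    votes = Counter(span for spans in system_outputs for span in set(spans))
    return {span for span, n in votes.items() if n >= min_votes and span not in gold}


outputs = [{(0, 5)}] * 6 + [{(10, 15)}] * 4  # 10 hypothetical systems
print(suggest_missing_entities(outputs, gold=set()))     # {(0, 5)}
print(suggest_missing_entities(outputs, gold={(0, 5)}))  # set()
```

Suggestions are only shown to the user, who decides whether the span is really a missing entity.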