Presented at the Alignment Track at ISWC 2017.
This work was supported by grants from the EU H2020 Framework Programme provided for the project HOBBIT (GA no. 688227).
The document discusses testing the use of the SERONTO ontology for semantic data integration of distributed ecological databases from ALTER-Net and LTER Europe. Five databases were independently mapped to SERONTO concepts and queries could be run across the integrated data without knowledge of the underlying database structures. However, the effort required for mapping was significant and maintaining reference lists will be crucial. More use cases are needed to fully evaluate SERONTO's potential for LTER data integration.
Entity Linking Combining Open Source Annotators (pruiz_)
Poster for the 2015 NAACL demo session: Ruiz, Pablo, Thierry Poibeau, and Frédérique Mélanie (2015). ELCO3: Entity Linking with Corpus Coherence Combining Open Source Annotators. In Proceedings of the Demonstrations at NAACL 2015, Denver, U.S.
In the domain of chemistry one of the greatest benefits to publishing research is that data can be shared. Unfortunately, the vast majority of chemical structure data associated with scientific publications remain locked up in document form, primarily in PDF files or trapped on webpages. Despite the explosive growth of online chemical databases and the overall maturity of cheminformatics platforms, many barriers stifle the exchange of chemical structures via publications. These challenges include incomplete support by accepted standards (especially InChI) for advanced stereochemistry, organometallic compounds and generic “Markush” representations, the difference between human-readable and computer-readable forms of data, and challenges with the computer representation of chemical structures. To address these obstacles to chemical structure sharing, US EPA National Center for Computational Toxicology scientists are using a combination of cheminformatics applications and online repositories to distribute chemical structure data associated with their publications. This presentation will describe how EPA-NCCT chemical structure data that is amenable to indexing and distribution are shared and highlight the benefit of open data sharing for modeling, data integration, and increasing research impact. This abstract does not reflect U.S. EPA policy.
Cheminformatics methods form an essential basis for providing analytical scientists with access to data, algorithms and workflows. There are an increasing number of free online databases (compound databases, spectral libraries, data repositories) and a rich collection of software approaches that can be used to support automated structure verification and elucidation, specifically for Nuclear Magnetic Resonance (NMR) and Mass Spectrometry (MS). This presentation will provide an overview of freely available data, tools, databases and approaches available to support chemical structure verification and elucidation and highlight some of the known issues regarding data quality and suggest approaches for resolving some of the issues. The importance of structure and spectral standards for data exchange will be discussed, especially with regard to how spectral data can be made openly available to the community via online tools and through scientific publishing. This work does not necessarily reflect U.S. EPA policy.
The US EPA’s National Center for Computational Toxicology (NCCT) has been both measuring and aggregating data to support our research efforts for over a decade. We have delivered these data via a number of publicly accessible websites, so-called dashboards, to provide transparent access to the outputs of the center. Since the inception of our research, software technologies have changed dramatically, as have expectations regarding the methods by which to access data. Our informatics efforts provide access to millions of dollars of high-throughput screening data available in open, downloadable formats, via web services and through a rich web interface. Similarly, we provide access to experimental and predicted data associated with ~760,000 substances to serve the environmental chemistry community, and open source code for predictive models. This presentation will provide an overview of the efforts of NCCT to provide transparent access to our research and data via our publications (and accompanying supplementary data), via our Open Data policies, and through our databases, software tools and web services. This abstract does not necessarily represent the views or policies of the U.S. Environmental Protection Agency.
BioSolr - Searching the stuff of life - Lucene/Solr Revolution 2015 (Charlie Hull)
BioSolr, funded by the BBSRC, is a collaboration between open source search experts Flax and the European Bioinformatics Institute (EBI), aiming to significantly advance the state of the art with regard to indexing and querying biomedical data with freely available open source software.
IRE2014: Filtering Tweets Related to an Entity (kartik179)
This document describes a project to filter tweets related to entities. The team used supervised machine learning, with features extracted from tweets, entity homepages, and Wikipedia pages, to train an SVM model that classifies tweets as related or unrelated to a given entity. Preprocessing removed user mentions, URLs, punctuation, and stop words before feature extraction. Tested on 61 entities, the model achieved an overall accuracy of 80%, with per-entity accuracy ranging from 40% to 96%.
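The preprocessing step named in the summary can be sketched as follows. This is a minimal illustration, not the authors' implementation: the stop-word list and regular expressions are assumptions chosen for the example.

```python
import re

# Illustrative stop-word list; the project's actual list is not given.
STOP_WORDS = {"the", "a", "an", "is", "to", "of", "and", "in", "for"}

def preprocess(tweet):
    """Strip user mentions, URLs, and punctuation, then drop stop words."""
    tweet = re.sub(r"@\w+", " ", tweet)          # remove user mentions
    tweet = re.sub(r"https?://\S+", " ", tweet)  # remove URLs
    tweet = re.sub(r"[^\w\s]", " ", tweet)       # remove punctuation
    tokens = tweet.lower().split()
    return [t for t in tokens if t not in STOP_WORDS]

print(preprocess("@fan1 The new iPhone is great! https://t.co/xyz"))
# → ['new', 'iphone', 'great']
```

The resulting tokens would then feed whatever feature extraction precedes the SVM.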
The Royal Society of Chemistry was pleased to contribute to the Open PHACTS project, a three-year project funded by the European Union's Innovative Medicines Initiative. For three years we developed our existing platforms, created new and innovative widgets and data platforms to handle chemistry data, extended existing chemistry ontologies and embraced semantic web open standards. As a result, RSC served as the centralized chemistry data hub for the project. With the conclusion of the Open PHACTS project we will report on our experiences from participating in the project and provide an overview of what tools, capabilities and data have been released to the community as a result, and how this may influence future projects. This will include the Open PHACTS open chemistry data dump, comprising the chemistry-related data in chemistry and semantic-web consumable formats, as well as some of the resulting chemistry software released to the community. The Open PHACTS project resulted in significant contributions to the chemistry community as well as the supporting pharmaceutical companies and biomedical community.
Ontologies for Emergency & Disaster Management (Stephane Fellah)
OGC meeting, March 2014
OGC OWS-10 Cross-Community Interoperability
Ontologies for Emergency & Disaster Management
(The application of geospatial linked data)
The document discusses efforts to improve geospatial data interoperability through the use of ontologies and semantic technologies. It outlines work done as part of OGC OWS-10 to build and test a core geospatial ontology, develop a semantic gazetteer service using GeoSPARQL, and demonstrate how semantic approaches can enable easier integration and sharing of geospatial knowledge across systems. Next steps include standardizing the ontologies and semantically enabling additional OGC web services and architectures.
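The gazetteer idea above can be illustrated with a toy example. A real semantic gazetteer answers GeoSPARQL queries against a triple store; the in-memory version below only shows the triple-pattern mechanism behind resolving a place name to a URI and geometry, and all URIs and coordinates are invented for the example.

```python
# Invented sample data in (subject, predicate, object) form.
TRIPLES = [
    ("ex:madison", "rdfs:label", "Madison"),
    ("ex:madison", "geo:asWKT", "POINT(-89.40 43.07)"),
    ("ex:monterey", "rdfs:label", "Monterey"),
    ("ex:monterey", "geo:asWKT", "POINT(-121.89 36.60)"),
]

def match(s=None, p=None, o=None):
    """Return all triples matching a pattern; None acts as a wildcard."""
    return [t for t in TRIPLES
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]

def gazetteer_lookup(name):
    """Resolve a place label to (uri, wkt_geometry), or None if unknown."""
    for uri, _, _ in match(p="rdfs:label", o=name):
        for _, _, wkt in match(s=uri, p="geo:asWKT"):
            return uri, wkt
    return None

print(gazetteer_lookup("Madison"))
# → ('ex:madison', 'POINT(-89.40 43.07)')
```

In the real service the same two-pattern query would be expressed in GeoSPARQL, which additionally offers spatial filter functions the toy version lacks.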
Semantic Similarity and Selection of Resources Published According to Linked ... (Riccardo Albertoni)
The position paper aims at discussing the potential of exploiting linked data best practice to provide metadata documenting domain specific resources created through verbose acquisition-processing pipelines. It argues that resource selection, namely the process engaged to choose a set of resources suitable for a given analysis/design purpose, must be supported by a deep comparison of their metadata. The semantic similarity proposed in our previous works is discussed for this purpose and the main issues to make it scale up to the web of data are introduced. Discussed issues contribute beyond the re-engineering of our similarity since they largely apply to every tool which is going to exploit information made available as linked data. A research plan and an exploratory phase facing the presented issues are described remarking the lessons we have learnt so far.
The document discusses data workflows and integrating open data from different sources. It defines a data workflow as a series of well-defined functional units where data is streamed between activities such as extraction, transformation, and delivery. The document outlines key steps in data workflows including extraction, integration, aggregation, and validation. It also discusses challenges around finding rules and ontologies, data quality, and maintaining workflows over time. Finally, it provides examples of data integration systems and relationships between global and source schemas.
An empirical performance evaluation of relational keyword search systems
The document presents an empirical performance evaluation of relational keyword search systems. It evaluates 7 relational keyword search techniques and finds that many existing techniques do not provide acceptable performance for realistic retrieval tasks. In particular, memory consumption prevents many search techniques from scaling to datasets with more than tens of thousands of vertices. The evaluation also explores how factors varied in previous studies have relatively little impact on performance. The work confirms that existing systems have unacceptable performance and underscores the need for standardization in evaluating these retrieval systems.
This document discusses using ontologies to make biological and biomedical data more interoperable and FAIR (Findable, Accessible, Interoperable, Reusable). It describes several ontology services and tools provided by EMBL-EBI to help with tasks like annotating data, mapping data to ontologies, searching and accessing ontologies, and publishing structured data. It also uses the example of the BioSamples database to illustrate challenges in working with large, heterogeneous datasets and how ontologies can help address issues like normalizing descriptions and attributes to enable better searching and data integration.
Resource Description Framework Approach to Data Publication and Federation (Pistoia Alliance)
Bob Stanley, CEO, IO Informatics, explains the utility of RDF as a standard way of defining and redefining data that could have utility in managing life science information.
This document discusses moving from a data-centric to a knowledge-centric approach for geospatial data and services. It proposes using core geospatial ontologies and semantic technologies like OWL and SPARQL to encode conceptual models, business rules, and formal semantics. This would allow automated reasoning and flexible integration of data as knowledge is shared through a semantic layer. Example applications like semantic gazetteers are presented to illustrate how existing services could be enhanced by adding semantics. The document outlines next steps like standardizing core ontologies and developing semantic profiles and services. The overall approach aims to reduce costs and burdens on users by making geospatial data and knowledge more accessible and interpretable through shared formal representations.
Delivering a Linked Data warehouse and realising the power of graphs (Ben Gardner)
Linklaters is one of the world’s leading global law firms. The firm has a wealth of high value information held within our systems however due to the nature of these systems it is not always easy to leverage this value. Our goal was to improve decision making across the firm by transforming access to and ability to query data. To do this we wanted a solution that would combine our information, was easy to extend in an iterative fashion and would leverage our existing investment in business intelligence. To achieve this we chose to create a graph based warehouse using Linked Data. Data from our SAP Business Warehouse was combined with flat file and XML feeds from our systems of record and transformed into RDF via ETL services that loaded it into a triple store. To provide simple integration with our existing environment a SPARQL to OData service was deployed creating an OData compliant endpoint. Finally a model driven, mobile friendly, user interface was created allowing users to query, review results and explore the underlying graph. This talk will describe the approach we took and the lessons learnt.
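The flat-file-to-RDF step in the pipeline above can be sketched roughly as follows. The namespace, predicate names, and row layout are invented for illustration; the firm's actual ETL services and vocabulary are not public.

```python
# Hypothetical namespace for the example; not the firm's real vocabulary.
EX = "http://example.org/firm/"

def escape(value):
    """Escape backslashes and quotes for an N-Triples literal."""
    return value.replace("\\", "\\\\").replace('"', '\\"')

def rows_to_ntriples(rows):
    """Convert tabular records into N-Triples lines, one per cell."""
    lines = []
    for row in rows:
        subject = f"<{EX}matter/{row['id']}>"
        for key, value in row.items():
            if key == "id":
                continue
            lines.append(f'{subject} <{EX}{key}> "{escape(str(value))}" .')
    return lines

rows = [{"id": "42", "client": "Acme Ltd", "practice": "Banking"}]
for line in rows_to_ntriples(rows):
    print(line)
```

Lines in this shape can be bulk-loaded into most triple stores, after which the data is queryable with SPARQL alongside the other sources.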
The document summarizes a presentation by Mark A. Parsons on opportunities and challenges for data sharing and citation. The presentation discusses how all of society's grand challenges require diverse data shared across boundaries, and the vision of the Research Data Alliance (RDA) to openly share data. RDA builds social and technical bridges to enable open data sharing through developing infrastructure, standards, and best practices. The presentation also covers specific RDA activities like developing data citation recommendations and engaging members globally.
The document discusses Vision.bi Quality Gates, a centralized data quality solution. It provides an overview of quality gates and how they can be used for integration and business intelligence projects. Quality gates help define tests to constantly monitor and improve data quality. The document outlines various test types like check sums, referential integrity checks, and execution flow tests. It demonstrates how quality gates provide alerts, ensure data integrity, and validate data warehouse cubes. Clients currently using Vision.bi are also listed.
This document discusses the GLOBE architecture for federating learning object repositories. It describes:
- GLOBE's use of LOM metadata standard and OAI-PMH protocol for metadata harvesting between repositories.
- The hybrid federated query and harvesting approach used to allow distributed and centralized searching of content.
- Key components of the GLOBE architecture including repositories, registry, harvester, validation services, and the ARIADNE tools for implementing repositories.
- Analysis of LOM usage in GLOBE repositories, including which elements are used most and quality issues around metadata completeness and consistency.
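The harvesting protocol named above can be sketched in terms of request construction. The endpoint URL below is a placeholder, and the `oai_lom` metadata prefix is an assumption (a common prefix for LOM records), not confirmed from the GLOBE deployment; the `verb`/`metadataPrefix`/`resumptionToken` parameters themselves are part of the OAI-PMH 2.0 protocol.

```python
from urllib.parse import urlencode

def list_records_url(base_url, metadata_prefix="oai_lom", resumption_token=None):
    """Build an OAI-PMH ListRecords request URL. Per the protocol, a
    resumptionToken replaces all other request arguments except the verb."""
    params = {"verb": "ListRecords"}
    if resumption_token:
        params["resumptionToken"] = resumption_token
    else:
        params["metadataPrefix"] = metadata_prefix
    return f"{base_url}?{urlencode(params)}"

print(list_records_url("http://repo.example.org/oai"))
# → http://repo.example.org/oai?verb=ListRecords&metadataPrefix=oai_lom
```

A harvester loops on such requests, following the resumptionToken returned in each response until the repository signals completion.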
Talk given by Prof. T. K. Prasad at the workshop on Semantics in Geospatial Architectures: Applications and Implementation. The workshop was held from October 28-29, 2013 at Pyle Center (702 Langdon Street, Madison, WI), University of Wisconsin-Madison.
With the continuously increasing number of datasets published in the Web of Data and forming part of the Linked Open Data Cloud, it becomes ever more essential to identify resources that correspond to the same real-world object, in order to interlink web resources and lay the basis for large-scale data integration. This requirement is apparent in a multitude of domains, ranging from science (marine research, biology, astronomy, pharmacology) to semantic publishing and the cultural domain. In this context, instance matching is of crucial importance.
It is therefore essential to develop, alongside instance and entity matching systems, benchmarks that determine the weak and strong points of those systems, as well as their overall quality, in order to help users decide which system to use for their needs. Hence, well-defined, good-quality benchmarks are important for comparing the performance of instance matching systems.
In this tutorial we aim at:
- Discussing the state-of-the-art instance matching benchmarks
- Presenting the benchmark design principles
- Providing an analysis of the performance results of instance matching systems for the presented benchmarks
- Presenting the research directions that should be pursued to create novel benchmarks answering the needs of the Linked Data paradigm.
Tutorial web page: http://www.ics.forth.gr/isl/BenchmarksTutorial/
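Benchmarks like those surveyed in the tutorial typically score an instance matching system by comparing its output pairs against a gold standard of known-equivalent instances, usually via precision, recall, and F1. The following sketch uses toy identifiers invented for the example.

```python
def score(system_pairs, gold_pairs):
    """Precision, recall, and F1 of matched pairs against a gold standard."""
    system, gold = set(system_pairs), set(gold_pairs)
    tp = len(system & gold)  # correctly found matches
    precision = tp / len(system) if system else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# Toy gold standard and system output (illustrative instance IDs).
gold = {("a1", "b1"), ("a2", "b2"), ("a3", "b3"), ("a4", "b4")}
found = {("a1", "b1"), ("a2", "b2"), ("a9", "b9")}
p, r, f = score(found, gold)
print(f"precision={p:.2f} recall={r:.2f} f1={f:.3f}")
```

The "weak and strong points" a benchmark exposes come from computing these scores per test case (e.g. per value, structural, or semantic transformation), not just overall.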
"EARL: Joint Entity and Relation Linking for Question Answering over Knowledge Graphs" as presented in Sthe 17th International Semantic Web Conference ISWC, 9th of October 2018, held in Monterey, California, USA
This work was supported by grants from the EU H2020 Framework Programme provided for the project HOBBIT (GA no. 688227).
"Benchmarking Big Linked Data: The case of the HOBBIT Project" as presented in the First International Workshop on Semantic Web Technologies for Health Data Management (SWH 2018), co-located with ISWC 2018, 9th October, 2018 held in Monterey, California, USA
This work was supported by grants from the EU H2020 Framework Programme provided for the project HOBBIT (GA no. 688227).
Similar to Instance Matching Benchmarks in the ERA of Linked Data - ISWC2017
Ontologies for Emergency & Disaster Management Stephane Fellah
Ogc meeting march 2014
OGC OWS-10 Cross-Community Interoperability
Ontologies for Emergency & Disaster Management
(The application of geospatial linked data)
The document discusses efforts to improve geospatial data interoperability through the use of ontologies and semantic technologies. It outlines work done as part of OGC OWS-10 to build and test a core geospatial ontology, develop a semantic gazetteer service using GeoSPARQL, and demonstrate how semantic approaches can enable easier integration and sharing of geospatial knowledge across systems. Next steps include standardizing the ontologies and semantically enabling additional OGC web services and architectures.
Semantic Similarity and Selection of Resources Published According to Linked ...Riccardo Albertoni
The position paper aims at discussing the potential of exploiting linked data best practice to provide metadata documenting domain specific resources created through verbose acquisition-processing pipelines. It argues that resource selection, namely the process engaged to choose a set of resources suitable for a given analysis/design purpose, must be supported by a deep comparison of their metadata. The semantic similarity proposed in our previous works is discussed for this purpose and the main issues to make it scale up to the web of data are introduced. Discussed issues contribute beyond the re-engineering of our similarity since they largely apply to every tool which is going to exploit information made available as linked data. A research plan and an exploratory phase facing the presented issues are described remarking the lessons we have learnt so far.
The document discusses data workflows and integrating open data from different sources. It defines a data workflow as a series of well-defined functional units where data is streamed between activities such as extraction, transformation, and delivery. The document outlines key steps in data workflows including extraction, integration, aggregation, and validation. It also discusses challenges around finding rules and ontologies, data quality, and maintaining workflows over time. Finally, it provides examples of data integration systems and relationships between global and source schemas.
An empirical performance evaluation of relational keyword search systemsBrowse Jobs
The document presents an empirical performance evaluation of relational keyword search systems. It evaluates 7 relational keyword search techniques and finds that many existing techniques do not provide acceptable performance for realistic retrieval tasks. In particular, memory consumption prevents many search techniques from scaling to datasets with more than tens of thousands of vertices. The evaluation also explores how factors varied in previous studies have relatively little impact on performance. The work confirms that existing systems have unacceptable performance and underscores the need for standardization in evaluating these retrieval systems.
This document discusses using ontologies to make biological and biomedical data more interoperable and FAIR (Findable, Accessible, Interoperable, Reusable). It describes several ontology services and tools provided by EMBL-EBI to help with tasks like annotating data, mapping data to ontologies, searching and accessing ontologies, and publishing structured data. It also uses the example of the BioSamples database to illustrate challenges in working with large, heterogeneous datasets and how ontologies can help address issues like normalizing descriptions and attributes to enable better searching and data integration.
Resource Description Framework Approach to Data Publication and FederationPistoia Alliance
Bob Stanley, CEO, IO Informatics, explains the utility to RDF as a standard way of defining and redefining data that could have utility in managing life science information.
This document discusses moving from a data-centric to a knowledge-centric approach for geospatial data and services. It proposes using core geospatial ontologies and semantic technologies like OWL and SPARQL to encode conceptual models, business rules, and formal semantics. This would allow automated reasoning and flexible integration of data as knowledge is shared through a semantic layer. Example applications like semantic gazetteers are presented to illustrate how existing services could be enhanced by adding semantics. The document outlines next steps like standardizing core ontologies and developing semantic profiles and services. The overall approach aims to reduce costs and burdens on users by making geospatial data and knowledge more accessible and interpretable through shared formal representations.
Delivering a Linked Data warehouse and realising the power of graphsBen Gardner
Linklaters is one of the world’s leading global law firms. The firm has a wealth of high value information held within our systems however due to the nature of these systems it is not always easy to leverage this value. Our goal was to improve decision making across the firm by transforming access to and ability to query data. To do this we wanted a solution that would combine our information, was easy to extend in an iterative fashion and would leverage our existing investment in business intelligence. To achieve this we chose to create a graph based warehouse using Linked Data. Data from our SAP Business Warehouse was combined with flat file and XML feeds from our systems of record and transformed into RDF via ETL services that loaded it into a triple store. To provide simple integration with our existing environment a SPARQL to OData service was deployed creating an OData compliant endpoint. Finally a model driven, mobile friendly, user interface was created allowing users to query, review results and explore the underlying graph. This talk will describe the approach we took and the lessons learnt.
The document summarizes a presentation by Mark A. Parsons on opportunities and challenges for data sharing and citation. The presentation discusses how all of society's grand challenges require diverse data shared across boundaries, and the vision of the Research Data Alliance (RDA) to openly share data. RDA builds social and technical bridges to enable open data sharing through developing infrastructure, standards, and best practices. The presentation also covers specific RDA activities like developing data citation recommendations and engaging members globally.
The document discusses Vision.bi Quality Gates, a centralized data quality solution. It provides an overview of quality gates and how they can be used for integration and business intelligence projects. Quality gates help define tests to constantly monitor and improve data quality. The document outlines various test types like check sums, referential integrity checks, and execution flow tests. It demonstrates how quality gates provide alerts, ensure data integrity, and validate data warehouse cubes. Clients currently using Vision.bi are also listed.
This document discusses the GLOBE architecture for federating learning object repositories. It describes:
- GLOBE's use of LOM metadata standard and OAI-PMH protocol for metadata harvesting between repositories.
- The hybrid federated query and harvesting approach used to allow distributed and centralized searching of content.
- Key components of the GLOBE architecture including repositories, registry, harvester, validation services, and the ARIADNE tools for implementing repositories.
- Analysis of LOM usage in GLOBE repositories, including which elements are used most and quality issues around metadata completeness and consistency.
Talk given by prof. T.K. Prasad at the workshop on Semantics in Geospatial Architectures: Applications and Implementation. The workshop was held from October 28-29, 2013 at Pyle Center (702 Langdon Street, Madison, WI), University of Wisconsin-Madison.
With the continuously increasing number of datasets published in the Web of Data and form part of the Linked Open Data Cloud, it becomes more and more essential to identify resources that correspond to the same real world object in order to interlink web resources and set the basis for large-scale data integration. This requirement becomes apparent in a multitude of domains ranging from science (marine research, biology, astronomy, pharmacology) to semantic publishing and cultural domains. In this context, instance matching is of crucial importance.
It is therefore essential to develop, alongside instance and entity matching systems, benchmarks that determine the weak and strong points of those systems, as well as their overall quality, in order to support users in deciding which system to use for their needs. Hence, well-defined, good-quality benchmarks are important for comparing the performance of the developed instance matching systems.
In this tutorial we aim at:
- Discussing the state-of-the-art instance matching benchmarks
- Presenting the benchmark design principles
- Providing an analysis of the performance results of instance matching systems for the presented benchmarks
- Presenting the research directions that should be pursued for the creation of novel benchmarks to answer the needs of the Linked Data paradigm.
Tutorial web page: http://www.ics.forth.gr/isl/BenchmarksTutorial/
Similar to Instance Matching Benchmarks in the ERA of Linked Data - ISWC2017 (20)
"EARL: Joint Entity and Relation Linking for Question Answering over Knowledge Graphs" as presented in the 17th International Semantic Web Conference ISWC, 9th of October 2018, held in Monterey, California, USA
This work was supported by grants from the EU H2020 Framework Programme provided for the project HOBBIT (GA no. 688227).
"Benchmarking Big Linked Data: The case of the HOBBIT Project" as presented in the First International Workshop on Semantic Web Technologies for Health Data Management (SWH 2018), co-located with ISWC 2018, 9th October, 2018 held in Monterey, California, USA
"Assessing Linked Data Versioning Systems: The Semantic Publishing Versioning Benchmark" as presented in SSWS 2018 co-located with the 17th International Semantic Web Conference ISWC, 9th of October 2018, held in Monterey, California, USA
"The DEBS Grand Challenge 2018" as presented in the 12th ACM International Conference on Distributed and Event-Based Systems (DEBS 2018), 25 - 29 June, 2018, held in Hamilton, New Zealand
"Benchmarking of distributed linked data streaming systems" as presented in the Stream Reasoning Workshop 2018, January 16-17, 2018, held by the Department of Informatics (DDIS group, University of Zurich) in Zurich, Switzerland
"SQCFramework: SPARQL Query Containment Benchmarks Generation Framework" as presented in the 9th International Conference on Knowledge Capture (K-CAP 2017), December 4th-6th, 2017, held in Austin, USA.
"LargeRDFBench: A billion triples benchmark for SPARQL endpoint federation" as presented in the 17th International Semantic Web Conference ISWC (journal track), 8th - 12th of October 2018, held in Monterey, California, USA
"The DEBS Grand Challenge 2017" as presented in the 11th ACM International Conference on Distributed and Event-Based Systems, 19 - 23 June, 2017, held in Barcelona, Spain
"4th Natural Language Interface over the Web of Data (NLIWoD) workshop and QALD-9 Question Answering over Linked Data Challenge" as presented in the 17th International Semantic Web Conference ISWC, 8th - 12th of October 2018, held in Monterey, California, USA
"Scalable Link Discovery for Modern Data-Driven Applications" poster presented at ECAI 2016, September 2016, held in The Hague, Netherlands.
"An Evaluation of Models for Runtime Approximation in Link Discovery" as presented in the IEEE/WIC/ACM WI, August 25th, 2017, held in Leipzig, Germany.
"Scalable Link Discovery for Modern Data-Driven Applications" as presented in the 15th International Semantic Web Conference ISWC, Doctoral Consortium, October 18th, 2016, held in Kobe, Japan
"Extending LargeRDFBench for Multi-Source Data at Scale for SPARQL Endpoint Federation" as presented in SSWS 2018 co-located with the 17th International Semantic Web Conference ISWC, 8th - 12th of October 2018, held in Monterey, California, USA
"SPgen: A Benchmark Generator for Spatial Link Discovery Tools" as presented in Ontology Matching (OM) hosted by the 17th International Semantic Web Conference ISWC, 8th - 12th of October 2018, held in Monterey, California, USA
"Introducing the HOBBIT platform into the Ontology Alignment Evaluation Campaign" was presented in Ontology Matching (OM) hosted by the 17th International Semantic Web Conference ISWC, 8th - 12th of October 2018, held in Monterey, California, USA
OKE Challenge was hosted by European Semantic Web Conference ESWC, 3-7 June 2018, held in Heraklion, Crete, Greece (Aldemar Knossos Royal & Royal Villa).
MOCHA Challenge was hosted by European Semantic Web Conference ESWC, 3-7 June 2018, held in Heraklion, Crete, Greece (Aldemar Knossos Royal & Royal Villa).
Paper presented at European Semantic Web Conference ESWC, 3-7 June 2018, held in Heraklion, Crete, Greece (Aldemar Knossos Royal & Royal Villa).
HOBBIT project overview presented at European Big Data Value Forum, 21-23 Nov 2017, held in Versailles, France (Palais des Congres).
Leopard ISWC Semantic Web Challenge 2017 at ISWC2017.
Instance Matching Benchmarks in the ERA of Linked Data - ISWC2017
1. Instance Matching Benchmarks in the Era of Linked Data
Tzanina Saveta, Evangelia Daskalaki, Irini Fundulaki, Giorgos Flouris
Institute of Computer Science – FORTH, Greece
Benchmarking Link Discovery Systems for Geo-Spatial Data 1
2. Linked Data – The LOD Cloud
The same entity can be described in different sources.
*Adapted from Suchanek & Weikum tutorial @ SIGMOD 2013
3. Instance Matching: The Cornerstone for Linked Data
Data Acquisition, Data Integration, Open/Social Data, Data Evolution
Problem: How can we automatically recognize multiple mentions of the same entity across or within sources?
Solution: Instance Matching
4. Instance Matching in the LOD
Many sources to match: sets of RDF/OWL triples
Rich semantics: value, structure, and logical variations
*Adapted from Suchanek & Weikum tutorial @ SIGMOD 2013
5. Instance Matching Techniques and Tools
• People interconnect their dataset with existing ones.
– These links are often manually curated (or semi-automatically generated).
• The size and number of datasets is huge, so it is vital to automatically detect additional links, making the graph denser.
• Instance matching research has led to the development of a number of systems.
1. How do we compare these systems?
2. How can we assess their performance?
3. How can we push the systems to get better?
The systems must be benchmarked!
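As a toy illustration of the matching step these systems automate, the sketch below links instances whose labels exceed a string-similarity threshold. The data, identifiers, and threshold are hypothetical; real instance matching systems use far richer similarity measures over values, structure, and semantics.

```python
# Toy instance matcher: links instances whose labels are sufficiently similar.
# All identifiers and labels below are hypothetical examples.
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Normalized string similarity in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def match_instances(source: dict, target: dict, threshold: float = 0.85):
    """Return (source_id, target_id) pairs whose labels exceed the threshold."""
    links = []
    for sid, slabel in source.items():
        for tid, tlabel in target.items():
            if similarity(slabel, tlabel) >= threshold:
                links.append((sid, tid))
    return links


source = {"s1": "Barack Obama", "s2": "Eiffel Tower"}
target = {"t1": "Barack H. Obama", "t2": "Acropolis"}
print(match_instances(source, target))  # only s1/t1 clear the threshold
```

Note the quadratic pairwise loop: at LOD scale, real systems prune the candidate space (blocking, indexing) before scoring pairs.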
6. Ingredients of an Instance Matching Benchmark
• A benchmark is organized in test cases, each addressing a different kind of requirement.
• Datasets
– The raw material of the benchmark: the source and the target dataset that will be matched to find the instances that refer to the same real-world entity.
• Gold Standard (Ground Truth / Reference Alignment)
– The “correct answer sheet” used to judge the completeness and soundness of the instance matching algorithms.
• Metrics / Key Performance Indicators
– The performance metric(s) that determine a system's behavior and performance.
7. Datasets
Distinguish between real and synthetic datasets:
– Real datasets:
• Realistic conditions for heterogeneity problems
• Realistic distributions
• Error-prone reference alignment
– Synthetic datasets:
• Fully controlled test conditions
• Accurate gold standards
• Unrealistic distributions
8. Data Variations in Datasets
– Value variations:
• Name-style abbreviation
• Typographical errors
• Format changes (date/gender/number)
• Synonym changes
• Multilingualism
– Structural variations:
• Change property depth
• Delete/add property
• Split property values
• Transformation of object to datatype property
• Transformation of datatype to object property
– Logical variations:
• Delete/modify class assertions
• Invert property assertions
• Change property hierarchy
• Assert disjoint classes
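To make the value variations concrete, here is a minimal sketch of how a synthetic benchmark generator might perturb a source instance into a target instance that still denotes the same entity. The variation functions and record layout are illustrative assumptions, not the actual generators used by IIMB or SPIMBENCH.

```python
# Hypothetical value-variation operators for a synthetic benchmark generator.
import random


def abbreviate_name(value: str) -> str:
    """Name-style abbreviation: 'John Smith' -> 'J. Smith'."""
    parts = value.split()
    if len(parts) < 2:
        return value
    return parts[0][0] + ". " + " ".join(parts[1:])


def add_typo(value: str, rng: random.Random) -> str:
    """Typographical error: swap two adjacent characters."""
    if len(value) < 2:
        return value
    i = rng.randrange(len(value) - 1)
    return value[:i] + value[i + 1] + value[i] + value[i + 2:]


def change_date_format(value: str) -> str:
    """Format change: 'YYYY-MM-DD' -> 'DD/MM/YYYY'."""
    y, m, d = value.split("-")
    return f"{d}/{m}/{y}"


rng = random.Random(42)
source = {"name": "John Smith", "birthDate": "1970-05-17"}
target = {
    "name": add_typo(abbreviate_name(source["name"]), rng),
    "birthDate": change_date_format(source["birthDate"]),
}
print(source, "->", target)
```

The generator records each (source, target) pair it produces, which is exactly what yields the accurate gold standards that slide 7 credits to synthetic datasets.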
9. Gold Standards and Metrics
The gold standard and the system's result set partition the links into true positives (TP, in both), false negatives (FN, in the gold standard only), and false positives (FP, in the result set only).
• Metrics / Key Performance Indicators:
Recall r = TP / (TP + FN)
Precision p = TP / (TP + FP)
F-measure f = 2 * p * r / (p + r)
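The three measures above can be computed directly by set operations on the gold standard and a system's result set; a minimal sketch with hypothetical links:

```python
# Precision, recall, and F-measure over sets of (source_id, target_id) links.
def evaluate(gold: set, result: set):
    tp = len(gold & result)   # true positives: correct links found
    fp = len(result - gold)   # false positives: wrong links returned
    fn = len(gold - result)   # false negatives: correct links missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_measure = (2 * precision * recall / (precision + recall)
                 if precision + recall else 0.0)
    return precision, recall, f_measure


gold = {("s1", "t1"), ("s2", "t2"), ("s3", "t3"), ("s4", "t4")}
result = {("s1", "t1"), ("s2", "t2"), ("s5", "t9")}
p, r, f = evaluate(gold, result)
print(f"precision={p:.2f} recall={r:.2f} f-measure={f:.2f}")
```

Here the system finds 2 of the 4 gold links and returns 1 spurious link, so precision is 2/3, recall is 1/2, and the F-measure is their harmonic mean, 4/7.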
10. Design Criteria for Instance Matching Benchmarks
• Systematic procedure: matching tasks are reproducible and their execution is comparable
• Availability: the benchmark remains available over time
• Quality: precise evaluation rules and high-quality ontologies
• Equity: no system is privileged during the evaluation process
• Dissemination: how many systems have been evaluated with the benchmark
• Volume: how many instances the datasets contain
• Gold standard: existence of a gold standard and its accuracy
11. Benchmarking
• Instance matching techniques have, until recently, been benchmarked in an ad hoc way.
• There is no standard way of benchmarking the performance of systems when it comes to Linked Data.
• Instance matching benchmarks have mainly been driven forward by the Ontology Alignment Evaluation Initiative (OAEI), which
– has organized an annual campaign for ontology matching since 2005
– hosts independent benchmarks
• In 2009, OAEI introduced the Instance Matching (IM) Track, which focuses on the evaluation of different instance matching techniques and tools for Linked Data.
13. Comparison of Synthetic Benchmarks
14. Comparison of Real Benchmarks
15. Wrapping Up
Which benchmarks included multilingual datasets?
OAEI RDFT 2013 (French-English), ID-REC 2014 (English-Italian), Author Task (English-Italian)
16. Wrapping Up: Benchmarks
Which benchmarks included value variations in the test cases?
OAEI IIMB 2009, OAEI IIMB 2010, OAEI Persons-Restaurants 2010, ONTOBI, OAEI IIMB 2011, Sandbox 2012, OAEI IIMB 2012, OAEI RDFT 2013, ID-REC 2014, SPIMBENCH 2015, Author Task 2015, ARS, DI 2010, DI 2011
17. Wrapping Up: Benchmarks
Which benchmarks included structural variations in the test cases?
OAEI IIMB 2009, OAEI IIMB 2010, OAEI Persons-Restaurants 2010, ONTOBI, OAEI IIMB 2011, OAEI IIMB 2012, OAEI RDFT 2013, ID-REC 2014, SPIMBENCH 2015, Author Task 2015, ARS, DI 2010, DI 2011
18. Wrapping Up: Benchmarks
Which benchmarks included logical variations in the test cases?
OAEI IIMB 2009, OAEI IIMB 2010, OAEI IIMB 2011, OAEI IIMB 2012, SPIMBENCH 2015
19. Wrapping Up: Benchmarks
Which benchmarks included a combination of the variations in the test cases?
IIMB 2009, IIMB 2010, IIMB 2011, IIMB 2012, RDFT 2013, ID-REC 2014, SPIMBENCH 2015, Author Task 2015
21. Wrapping Up: Benchmarks
Which benchmarks included a combination of the variations and were voluminous at the same time?
SPIMBENCH 2015
22. Open Issues
• Issue 1: Only one benchmark tackles both combinations of variations and scalability.
• Issue 2: Not enough IM benchmarks use the full expressiveness of the RDF/OWL languages.
23. Wrapping Up: Systems for Benchmarks
Outcomes as far as systems are concerned:
• Systems can handle value variations, structural variations, and simple logical variations separately.
• More work is needed for complex variations (combinations of value, structural, and logical variations).
• More work is needed for structural variations.
• Systems need to be enhanced to cope with the clustering of mappings (1-n mappings).
24. Conclusion
• Many instance matching benchmarks have been proposed.
• Each of them answers some of the needs of instance matching systems.
• It is now time to start creating benchmarks that will “show the way to the future” and make the systems better.
25. This work was supported by grants from the EU H2020 Framework Programme provided for the project HOBBIT (GA no. 688227).