The slides present a benchmark of RDF stores with real-world datasets and queries from the EU Publications Office (PO). The study compares the performance of four commercial triple stores: Stardog 4.3 EE, GraphDB 8.0.3 EE, Oracle 12.2c and Virtuoso 7.2.4.2 with respect to the following requirements: bulk loading, scalability, stability and query execution.
Reasoning with Big Knowledge Graphs: Choices, Pitfalls and Proven Recipes (Ontotext)
This presentation will provide a brief introduction to logical reasoning and an overview of the most popular semantic schema and ontology languages: RDFS and the profiles of OWL 2.
While automatic reasoning has always inspired the imagination, numerous projects have failed to deliver on their promises. The typical pitfalls related to ontologies and symbolic reasoning fall into three categories:
- Over-engineered ontologies. The selected ontology language and modeling patterns can be too expressive. This can make the results of inference hard to understand and verify, which in turn makes the KG hard to evolve and maintain. It can also impose performance penalties far greater than the benefits.
- Inappropriate reasoning support. Many inference algorithms and implementation approaches work well with taxonomies and conceptual models of a few thousand concepts, but cannot cope with KGs of millions of entities.
- Inappropriate data layer architecture. One such example is reasoning over a virtual KG, which is often infeasible.
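The scale issue behind the second pitfall is easy to picture: even the simplest RDFS entailments (subclass transitivity plus type propagation) are a fixed-point computation over the whole graph. A minimal forward-chaining sketch in Python, with invented example triples:

```python
# Minimal forward-chaining materialization of two RDFS rules:
#   rdfs11: (A subClassOf B) & (B subClassOf C) => (A subClassOf C)
#   rdfs9:  (x type A) & (A subClassOf B)       => (x type B)
# Toy data; real stores apply many more rules over millions of triples.

SUBCLASS, TYPE = "rdfs:subClassOf", "rdf:type"

def materialize(triples):
    """Apply the two rules until no new triple is derived (fixed point)."""
    closure = set(triples)
    changed = True
    while changed:
        changed = False
        new = set()
        sub = {(s, o) for s, p, o in closure if p == SUBCLASS}
        # rdfs11: subclass transitivity
        for a, b in sub:
            for b2, c in sub:
                if b == b2 and (a, SUBCLASS, c) not in closure:
                    new.add((a, SUBCLASS, c))
        # rdfs9: type propagation along subclass edges
        for s, p, o in closure:
            if p == TYPE:
                for a, b in sub:
                    if o == a and (s, TYPE, b) not in closure:
                        new.add((s, TYPE, b))
        if new:
            closure |= new
            changed = True
    return closure

graph = {
    ("ex:Poodle", SUBCLASS, "ex:Dog"),
    ("ex:Dog", SUBCLASS, "ex:Animal"),
    ("ex:rex", TYPE, "ex:Poodle"),
}
inferred = materialize(graph)
print(("ex:rex", TYPE, "ex:Animal") in inferred)  # True
```

The naive nested loops make the cost visible: each pass re-scans the graph, which is why production reasoners rely on indexed, incremental rule evaluation.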
Semantics for integrated laboratory analytical processes - The Allotrope Pers... (OSTHUS)
The software environment currently found in the analytical community consists of a patchwork of incompatible software and proprietary, non-standardized file formats, further complicated by incomplete, inconsistent and potentially inaccurate metadata. To overcome these issues, the Allotrope Foundation is developing a comprehensive and innovative framework consisting of metadata dictionaries, data standards, and class libraries for managing analytical data throughout its lifecycle. The talk describes how laboratory data and their semantic metadata descriptions are brought together to ease the management of the vast amounts of data that underpin almost every aspect of drug discovery and development.
[Webinar] FactForge Debuts: Trump World Data and Instant Ranking of Industry ... (Ontotext)
This webinar continues a series demonstrating how linked open data and semantic tagging of news can be used for comprehensive media monitoring and for market and business intelligence. The platform for the demonstrations is FactForge: a hub for news and data about people, organizations, and locations (POL). FactForge embodies a big knowledge graph (BKG) of more than 1 billion facts that supports various analytical queries, including tracing suspicious patterns of company control and media monitoring of people, including companies owned by them, their subsidiaries, etc.
Allotrope foundation vanderwall_and_little_bio_it_world_2016 (OSTHUS)
Allotrope Foundation is building a framework (a software toolkit) to embed a set of federated, public, non-proprietary standards for analytical data in software used throughout the entire analytical chemistry data lifecycle. The framework serves as a basis for providing controlled vocabularies and taxonomies for a variety of pharmaceutical and biotech R&D applications, and offers extended capabilities to build business rules and other analytics on top of the standardized vocabularies, giving companies enhanced abilities to classify and manage their data. Legacy systems can be maintained more easily, and new technologies including cloud databases, Big Data analytics, and reasoning engines can be employed to give researchers unprecedented access to important contextualized data, because the foundational class structure is common and highly extensible to new and expanding domains.

We will briefly describe some of the current data integration and management challenges facing the industry, e.g. utilization of legacy data warehouses, the creation of new data lakes, integration of existing semantic models, and cloud-scale applications, and how the Allotrope Framework provides a semantic basis for improved metadata and master data management through modularized semantic models that capture the most pertinent entities, attributes and relationships needed to represent the plethora of laboratory data. We will provide an update on the rapid progress of development and the release of Allotrope Framework 1.0, including the Allotrope Data Format (for data and semantically described metadata), Allotrope Taxonomies, and the first release of APIs (application programming interfaces), and how Allotrope Member companies have begun to integrate these into their internal environments.
We will then discuss some of the potential extensions of this framework, which, in the future, could enable state-of-the-art data integration and analytics capabilities for various applications.
Semantics for Integrated Analytical Laboratory Processes – the Allotrope Pers... (OSTHUS)
The software environment currently found in the analytical community consists of a patchwork of incompatible software, proprietary and non-standardized file formats, which is further complicated by incomplete, inconsistent and potentially inaccurate metadata. To overcome these issues, Allotrope Foundation is developing a comprehensive and innovative framework consisting of metadata dictionaries, data standards, and class libraries for managing analytical data throughout its life cycle. In this talk we describe how laboratory data and semantic metadata descriptions are brought together to ease the management of a vast amount of data that underpins almost every aspect of drug discovery and development.
The OKE Challenge was hosted at the European Semantic Web Conference (ESWC), held 3-7 June 2018 in Heraklion, Crete, Greece (Aldemar Knossos Royal & Royal Villa).
This work was supported by grants from the EU H2020 Framework Programme provided for the project HOBBIT (GA no. 688227).
The use of R statistical package in controlled infrastructure. The case of Cl... (Adrian Olszewski)
Facts and myths about the use of the R statistical package in controlled, validated environments, using the example of Clinical Research in the pharmaceutical industry. This is the first part, constituting the introduction; technical details will be presented in Part II.
This document was presented at a conference organized by Polish National Group of the International Society for Clinical Biostatistics.
ICIC 2013 Conference Proceedings Nicolas Lalyre Syngenta (Dr. Haxel Consult)
Do indexing systems of bibliographic databases meet today’s user needs and expectations?
Nicolas Lalyre (Syngenta, Switzerland)
Gerhard Fischer (Syngenta, Switzerland)
The ever-increasing complexity of search requests and volume of documents, the need for value-added information delivery, and short timelines require appropriate indexing systems for effective information retrieval and evaluation. Indexing systems were developed decades ago, and the question is whether their development has kept pace with today's needs and expectations. The increased use of full-text databases might be an alarming sign for bibliographic database producers to update their indexing systems.
Some indexing systems lack transparency, as they are disclosed only in part or not at all, which may hinder their usability and further development.
FAIR Workflows: A step closer to the Scientific Paper of the Future (dgarijo)
Keynote presented at the Computational and Autonomous Workflows workshop (CAW-2021) at the Oak Ridge National Laboratory. The keynote gives an overview of the different aspects to take into account when aiming to create FAIR workflows and associated resources.
Making the data available with AMBIT cheminformatics platform (Nina Jeliazkova)
Recent developments in AMBIT (http://ambit.sf.net), presented by Dr. Nickolay Kochev at the EBI Industry workshop on Molecular Informatics Open Source Software (MIOSS), May 18-19.
http://www.ebi.ac.uk/industry/workshops
SWIMing VoCamp 2016 - ifcOWL overview and current state (Pieter Pauwels)
Presentation at the 2016 SWIMing VoCamp on 22-23 March 2016 in Dublin (http://phaedrus.scss.tcd.ie/buildviz/workshop/vocamp/march2016/): "ifcOWL overview and current state".
FOOPS!: An Ontology Pitfall Scanner for the FAIR principles (dgarijo)
Slides presented at the DBpedia Day at the SEMANTiCS conference in 2021. FOOPS! (available at https://w3id.org/foops) is a validator based on the FAIR principles that guides users in conforming their ontologies to them. For each principle, FOOPS! runs a series of tests and reports errors, suggestions, and ways to conform to best practices.
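The test-per-principle idea can be pictured with a toy checker. The check names and metadata fields below are hypothetical illustrations, not FOOPS!'s actual tests or API:

```python
# Toy FAIR-style pitfall scanner: run a list of named checks against
# ontology metadata and collect pass/fail results with suggestions.
# All check names and metadata fields here are illustrative only.

def check_license(meta):
    return bool(meta.get("license")), "Declare a license (e.g. with dcterms:license)."

def check_version_iri(meta):
    return bool(meta.get("versionIRI")), "Provide an owl:versionIRI."

def check_resolvable_ns(meta):
    ns = meta.get("namespace", "")
    return ns.startswith("http"), "Use a resolvable HTTP(S) namespace."

CHECKS = {
    "has_license": check_license,
    "has_version_iri": check_version_iri,
    "resolvable_namespace": check_resolvable_ns,
}

def scan(meta):
    """Run every check; attach a suggestion only when a check fails."""
    report = {}
    for name, check in CHECKS.items():
        ok, suggestion = check(meta)
        report[name] = {"ok": ok, "suggestion": None if ok else suggestion}
    return report

report = scan({"namespace": "https://example.org/onto#", "license": "CC-BY-4.0"})
```

Here the hypothetical ontology passes the license and namespace checks but fails the version-IRI check, so the report carries a concrete suggestion for that one failure.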
ICIC 2017: Tutorial - Digging bioactive chemistry out of patents using open r... (Dr. Haxel Consult)
Christopher Southan (The IUPHAR/BPS Guide to PHARMACOLOGY, UK)
While the raison d'être of patents is Intellectual Property (IP), there is a growing awareness of the scientific value of their data content. This is particularly so in medicinal chemistry and associated bioactivity domains, where disclosed compounds and associated data not only exceed those published in papers several-fold and surface years earlier, but are also, paradoxically, completely open (i.e. no paywalls). Scientists have traditionally extracted their own relationships or used commercial sources, but the last few years have seen a “big bang” in patent extractions submitted to open databases, including nearly 20 million structures now in PubChem.
This tutorial will:
- Outline the statistics of patent chemistry in various open sources
- Introduce a spectrum of open resources and tools
- Enable an understanding of target identification, bioactivity and SAR extraction from patents, and of connecting these relationships to papers
- Cover aspects of medicinal chemistry patent mining
- Include hands-on exercises using open-source antimalarial research as examples
The focus will be on public databases and patent office portals, since these can be transparently demonstrated. However, the essential complementarity with commercial resources will be touched on. Those engaged in Competitive Intelligence will also find the material relevant.
ICIC 2017: Freeware and public databases: Towards a Wiki Drug Discovery? (Dr. Haxel Consult)
Fernando Huerta (RISE Bioscience & Materials, SE)
Alexander Minidis (Collaborative Drug Discovery - CDD VAULT, Sweden)
How much information do scientists need to design new potential drugs?
A thorough overview of public scientific information sources (open access), and of methods to collect, process, analyse and visualize this information, will be presented. A direct application of such freely available information, in conjunction with freeware, will be described in relation to the efforts of the scientific community to find effective medicines for the Zika virus.
SOMEF: a metadata extraction framework from software documentation (dgarijo)
Presentation given at the Council of Software Registries in March 2021. SOMEF is a Python package for automatically extracting over 25 metadata categories from a README file. The output is then exported in JSON, or in JSON-LD using the CodeMeta representation.
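The general idea of mining metadata categories out of a README can be sketched with a header-based toy extractor. This is an illustrative sketch only, not SOMEF's actual implementation or API; the header-to-category mapping and the sample README are invented:

```python
# Toy metadata extractor: map README section headers to metadata
# categories and emit a JSON document. Illustrative only; the real
# SOMEF package covers 25+ categories with far richer techniques.
import json
import re

CATEGORY_HEADERS = {      # hypothetical header -> category mapping
    "installation": "installation",
    "usage": "usage",
    "license": "license",
    "citation": "citation",
}

def extract(readme_text):
    """Assign each non-empty line to the category of its last seen header."""
    sections, current = {}, None
    for line in readme_text.splitlines():
        m = re.match(r"#+\s*(.+)", line)
        if m:
            current = CATEGORY_HEADERS.get(m.group(1).strip().lower())
        elif current and line.strip():
            sections.setdefault(current, []).append(line.strip())
    return {cat: " ".join(body) for cat, body in sections.items()}

readme = """# mytool
A toy project.
## Installation
pip install mytool
## License
MIT
"""
metadata = extract(readme)
print(json.dumps(metadata))
```

Text under unmapped headers (like the project title here) is simply dropped; a real extractor would classify such excerpts rather than rely on exact header matches.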
On chemical structures, substances, nanomaterials and measurements (Nina Jeliazkova)
"On chemical structures, substances, nanomaterials and measurements"
Nina Jeliazkova, Ideaconsult
This talk attempts to highlight how I came to recognize the fundamental role of measurements, coming from the realm of data modelling and data analysis. Besides retaining data provenance, it provides insights into how we go beyond chemical structures and address the challenges of representing the identity of chemical substances and nanomaterials (with examples from the latest developments of the AMBIT web services and the OpenTox API). Finally, supporting the vision of a distributed, open, web-like approach to recording subtle experimental details is essential, not only for the chemists and biologists in the labs, but for all of us using, modelling, storing and querying the data.
Presented July 14, 2014 in Cambridge, UK
Defining the Future for Open Notebook Science – A Memorial Symposium Celebrating the Work of Jean-Claude Bradley
http://inmemoriamjcb.wikispaces.com/Jean-Claude+Bradley+Memorial+Symposium
The concept of the new STN platform
Jan Baur (FIZ Karlsruhe, Germany)
A new STN platform is being designed and developed to turn the needs and priorities of today's patent experts into a state-of-the-art search and analysis system that works the way users do.
This presentation describes how the project-oriented workflow concept of the new STN leads to improved workflow continuity and efficiency, and how the new platform allows users to interact simultaneously with their queries and search results. A variety of innovative functional aspects of the new STN that improve efficiency, such as instant analysis and refinement capabilities, and vastly increased power for text and chemical structure searches, will be showcased.
This presentation was made during the tutorial "A practical introduction to Linked Data: lift tour data", organised by IGN (France), EURECOM, UPM, and Ordnance Survey within the INSPIRE conference in Istanbul, June 23-27.
Revolutionizing Laboratory Instrument Data for the Pharmaceutical Industry:... (OSTHUS)
The Allotrope Foundation is a consortium of major pharmaceutical companies and a partner network whose goal is to address challenges in the pharmaceutical industry by providing a set of public, non-proprietary standards for using and integrating analytical laboratory data. Current challenges in data management within the pharmaceutical industry often center around inconsistent or incomplete data and metadata and proprietary data formats. Because of a lack of standardization, several operations (e.g. integration of instruments and applications, transfer of methods or results, archiving for regulatory purposes) require unnecessary effort. Further, higher-level aggregations of data, e.g. regulatory filings, that are derived from multiple sources of laboratory data are costly to create. These unnecessary costs impact operations within a company’s laboratories, between partnering companies, and between a company and contract research organizations (CROs). Finally, the accelerating transition of laboratories from hybrid (paper + electronic) to purely electronic data streams, coupled with ever-increasing regulatory scrutiny of electronic data management practices, further requires a comprehensive solution. This talk will discuss how the Allotrope Foundation is providing a new framework for data standards through collaboration between numerous stakeholders.
The MOCHA Challenge was hosted at the European Semantic Web Conference (ESWC), held 3-7 June 2018 in Heraklion, Crete, Greece (Aldemar Knossos Royal & Royal Villa).
This work was supported by grants from the EU H2020 Framework Programme provided for the project HOBBIT (GA no. 688227).
FAIRDOM - FAIR Asset management and sharing experiences in Systems and Synthe... (Carole Goble)
Over the past 5 years we have seen a change in expectations for the management of all the outcomes of research – that is, the “assets” of data, models, codes, SOPs and so forth. Don’t stop reading. Data management isn’t likely to win anyone a Nobel prize. But publications should be supported and accompanied by data, methods, procedures, etc. to assure reproducibility of results. Funding agencies expect data (and increasingly software) management, retention and access plans as part of the proposal process for projects to be funded. Journals are raising their expectations of the availability of data and codes for pre- and post-publication. The multi-component, multi-disciplinary nature of Systems Biology demands the interlinking and exchange of assets and the systematic recording of metadata for their interpretation.
The FAIR Guiding Principles for scientific data management and stewardship (http://www.nature.com/articles/sdata201618) have been an effective rallying cry for EU and USA research infrastructures. The FAIRDOM (Findable, Accessible, Interoperable, Reusable Data, Operations and Models) Initiative has 8 years of experience of asset sharing and data infrastructure, ranging across European programmes (the SysMO and EraSysAPP ERANets), national initiatives (de.NBI, the German Virtual Liver Network, UK SynBio centres) and PIs' labs. It aims to support Systems and Synthetic Biology researchers with data and model management, with an emphasis on standards smuggled in by stealth and sensitivity to asset sharing and credit anxiety.
This talk will use the FAIRDOM Initiative to discuss the FAIR management of data, SOPs, and models for Sys Bio, highlighting the challenges of and approaches to sharing, credit, citation and asset infrastructures in practice. I'll also highlight recent experiments in affecting sharing using behavioural interventions.
http://www.fair-dom.org
http://www.fairdomhub.org
http://www.seek4science.org
Presented at COMBINE 2016, Newcastle, 19 September.
http://co.mbine.org/events/COMBINE_2016
EURISCO needs and priorities, at CGIAR ICT-KM Workshop, IPGRI, Rome (2005) (Dag Endresen)
European genebanks, EURISCO and NGB. Overview of needs and priorities. CGIAR ICT-KM training workshop on information interoperability, 14th June 2005, IPGRI Rome Italy. Dag Endresen (Nordic Gene Bank).
Presentation of the paper "Primers or Reminders? The Effects of Existing Review Comments on Code Review" published at ICSE 2020.
Authors:
Davide Spadini, Gül Calikli, Alberto Bacchelli
Link to the paper: https://research.tudelft.nl/en/publications/primers-or-reminders-the-effects-of-existing-review-comments-on-c
This slide deck provides an overview of the WSO2 Big Data platform and discusses some of its customer case studies and applications. It covers Big Data in general, real-time analytics with WSO2 CEP, batch analytics with WSO2 BAM, and new products such as predictive analytics with WSO2 Machine Learner. For more information, please reach us through architecture@wso2.org.
BDE SC1 Workshop 3 - Open PHACTS Pilot (Kiera McNeice)BigData_Europe
Overview of Open PHACTS, the BDE Pilot project in SC1, presented at BDE SC1 Workshop 3, 13 December, 2017.
https://www.big-data-europe.eu/the-final-big-data-europe-workshop/
Fairification experience clarifying the semantics of data matricesPistoia Alliance
This webinar presents the Statistics Ontology, STATO which is a semantic framework to support the creation of standardized analysis reports to help with review of results in the form of data matrices. STATO includes a hierarchy of classes and a vocabulary for annotating statistical methods used in life, natural and biomedical sciences investigations, text mining and statistical analyses.
WSO2 Machine Learner takes data one step further, pairing data gathering and analytics with predictive intelligence: this helps you understand not just the present, but to predict scenarios and generate solutions for the future.
Ontology-based data access: why it is so cool!Josef Hardi
A brief introduction to ontology-based data access (OBDA) and its core implementation. I also presented a recent simple benchmark comparing -ontop- and Semantika, two of the most readily available OBDA frameworks, in terms of query performance (with details in the appendix section). The slides were presented at the Friday Research Meeting of the Stanford Center for Biomedical Informatics Research (BMIR).
License: Creative Commons by Attribution 3.0
Querying a Complex Web-Based KB for Cultural Heritage PreservationEster Giallonardo
Monitoring the status of cultural heritage for its preservation is a complex activity that can be performed by human experts with the support of technology. This complexity is due to the need to consider several related aspects: parts of buildings, materials, damage, environmental conditions, risk factors, etc. Querying this complex knowledge base can benefit from a very expressive ontology and efficient technology support to reduce response times, especially when implicit knowledge needs to be inferred in Web applications. We adopt the Cidoc4HeriwarD ontology as a benchmark to analyse two different technologies for storing and querying semantic data: an open-source one based on relational databases and a commercial solution with a native triple store.
Similar to Benchmarking Commercial RDF Stores with Publications Office Dataset (20)
What do a consumer goods manufacturer and a credit insurance group have in common? Both are subject to a variety of risks which, if not detected, may dramatically impact their operations and bottom lines. Delve into the challenges of putting together a semantic, technology-based business solution that monitors and reacts to a large amount of consumer feedback in real time, providing insights on consumer product quality. Hear how this approach assists credit risk analysts in the early detection of signals and events affecting companies’ solvency to anticipate default risks of targeted companies. Walk through this journey to solve real-world problems with business intelligence solutions based on semantic data and technologies.
Slides of my PhD presentation @ Eurecom, presenting our work on publishing and consuming geo-spatial data and government data using Semantic Web technologies.
Slides of the talk during Terracognita 2014 in RIVA del GARDA, where the authors presented the description of ontologies for geometries, coordinate reference systems and publication of French Administrative Units on the Web. Paper can be downloaded at http://event.cwi.nl/terracognita2014/terra2014_1.pdf.
Slides of the paper presented at #COLD2014 available at http://ceur-ws.org/Vol-1264/cold2014_AtemezingT.pdf, on building a Linked-data Visualization Wizard.
Information Content based Ranking Metric for Linked Open VocabulariesGhislain Atemezing
This talk was presented in Leipzig during the SEMANTiCS 2014 Conference in September. It gives an overview of how Information Content Theory metrics can be applied to the Semantic Web, and especially to vocabularies. The results of the proposed ranking metrics can be applied in three areas: (i) vocabulary life-cycle management, (ii) semantic web visualizations and (iii) the interlinking process.
Harmonizing services for LOD vocabularies: a case studyGhislain Atemezing
This presentation describes a solution for aligning well-known services with the aim of managing and harmonizing vocabularies' metadata, with a special use case on prefix.cc.
The Internet of Things (IoT) is a revolutionary concept that connects everyday objects and devices to the internet, enabling them to communicate, collect, and exchange data. Imagine a world where your refrigerator notifies you when you’re running low on groceries, or streetlights adjust their brightness based on traffic patterns – that’s the power of IoT. In essence, IoT transforms ordinary objects into smart, interconnected devices, creating a network of endless possibilities.
Here is a blog on the role of electrical and electronics engineers in IoT. Let's dig in!
For more such content visit: https://nttftrg.com/
Using recycled concrete aggregates (RCA) for pavements is crucial to achieving sustainability. Implementing RCA for new pavement can minimize carbon footprint, conserve natural resources, reduce harmful emissions, and lower life cycle costs. Compared to natural aggregate (NA) pavement, RCA pavement has been the subject of fewer comprehensive studies and sustainability assessments.
6th International Conference on Machine Learning & Applications (CMLA 2024)ClaraZara1
6th International Conference on Machine Learning & Applications (CMLA 2024) will provide an excellent international forum for sharing knowledge and results in the theory, methodology and applications of Machine Learning.
Immunizing Image Classifiers Against Localized Adversary Attacksgerogepatton
This paper addresses the vulnerability of deep learning models, particularly convolutional neural networks (CNNs), to adversarial attacks and presents a proactive training technique designed to counter them. We introduce a novel volumization algorithm, which transforms 2D images into 3D volumetric representations. When combined with 3D convolution and deep curriculum learning optimization (CLO), it significantly improves the immunity of models against localized universal attacks by up to 40%. We evaluate our proposed approach using contemporary CNN architectures and the modified Canadian Institute for Advanced Research (CIFAR-10 and CIFAR-100) and ImageNet Large Scale Visual Recognition Challenge (ILSVRC12) datasets, showcasing accuracy improvements over previous techniques. The results indicate that the combination of the volumetric input and curriculum learning holds significant promise for mitigating adversarial attacks without necessitating adversarial training.
NUMERICAL SIMULATIONS OF HEAT AND MASS TRANSFER IN CONDENSING HEAT EXCHANGERS...ssuser7dcef0
Power plants release a large amount of water vapor into the atmosphere through the stack. The flue gas can be a potential source of much-needed cooling water for a power plant. If a power plant could recover and reuse a portion of this moisture, it could reduce its total cooling water intake requirement. One of the most practical ways to recover water from flue gas is to use a condensing heat exchanger. The power plant could also recover latent heat due to condensation, as well as sensible heat due to lowering the flue gas exit temperature. Additionally, harmful acids released from the stack can be reduced in a condensing heat exchanger by acid condensation. Condensation of vapors in flue gas is a complicated phenomenon, since heat and mass transfer of water vapor and various acids occur simultaneously in the presence of non-condensable gases such as nitrogen and oxygen. The design of a condenser depends on knowledge and understanding of the heat and mass transfer processes. A computer program for numerical simulations of water (H2O) and sulfuric acid (H2SO4) condensation in a flue gas condensing heat exchanger was developed using MATLAB. Governing equations based on mass and energy balances for the system were derived to predict variables such as flue gas exit temperature, cooling water outlet temperature, mole fractions and condensation rates of water and sulfuric acid vapors. The equations were solved using an iterative solution technique with calculations of heat and mass transfer coefficients and physical properties.
Saudi Arabia stands as a titan in the global energy landscape, renowned for its abundant oil and gas resources. It's the largest exporter of petroleum and holds some of the world's most significant reserves. Let's delve into the top 10 oil and gas projects shaping Saudi Arabia's energy future in 2024.
Hybrid optimization of pumped hydro system and solar- Engr. Abdul-Azeez.pdffxintegritypublishin
Advancements in technology unveil a myriad of electrical and electronic breakthroughs geared towards efficiently harnessing limited resources to meet human energy demands. The optimization of hybrid solar PV panels and pumped hydro energy supply systems plays a pivotal role in utilizing natural resources effectively. This initiative not only benefits humanity but also fosters environmental sustainability. The study investigated the design optimization of these hybrid systems, focusing on understanding solar radiation patterns, identifying geographical influences on solar radiation, formulating a mathematical model for system optimization, and determining the optimal configuration of PV panels and pumped hydro storage. Through a comparative analysis approach and eight weeks of data collection, the study addressed key research questions related to solar radiation patterns and optimal system design. The findings highlighted regions with heightened solar radiation levels, showcasing substantial potential for power generation and emphasizing the system's efficiency. Optimizing system design significantly boosted power generation, promoted renewable energy utilization, and enhanced energy storage capacity. The study underscored the benefits of optimizing hybrid solar PV panels and pumped hydro energy supply systems for sustainable energy usage. Optimizing the design of solar PV panels and pumped hydro energy supply systems as examined across diverse climatic conditions in a developing country, not only enhances power generation but also improves the integration of renewable energy sources and boosts energy storage capacities, particularly beneficial for less economically prosperous regions. Additionally, the study provides valuable insights for advancing energy research in economically viable areas. 
Recommendations included conducting site-specific assessments, utilizing advanced modeling tools, implementing regular maintenance protocols, and enhancing communication among system components.
Sachpazis:Terzaghi Bearing Capacity Estimation in simple terms with Calculati...Dr.Costas Sachpazis
Terzaghi's soil bearing capacity theory, developed by Karl Terzaghi, is a fundamental principle in geotechnical engineering used to determine the bearing capacity of shallow foundations. This theory provides a method to calculate the ultimate bearing capacity of soil, which is the maximum load per unit area that the soil can support without undergoing shear failure. The Calculation HTML Code included.
We have compiled the most important slides from each speaker's presentation. This year’s compilation, available for free, captures the key insights and contributions shared during the DfMAy 2024 conference.
Pile Foundation by Venkatesh Taduvai (Sub Geotechnical Engineering II)-conver...
Benchmarking Commercial RDF Stores with Publications Office Dataset
1. Mondeca in a nutshell | Benchmark Context | Publications Office of the EU Datasets | Benchmark Configuration | Query Analysis | Benchmark Results | Conclusion
Benchmarking Commercial RDF Stores with Publications Office Dataset
Ghislain Auguste Atemezing, Ph.D. 1
1. Mondeca, 35 Boulevard de Strasbourg, 75010, Paris, France
Twitter: @gatemezing
Web: http://www.mondeca.com
Benchmark material: https://github.com/gatemezing/posb
4th June 2018
Ghislain Auguste Atemezing, Ph.D MONDECA QuWeDa2018 @ ESWC 2018 04th June, 2018 1 / 24
Agenda
1 Mondeca in a nutshell
Who we are
Why do clients come to us
2 Benchmark Context
3 Publications Office of the EU Datasets
Data Workflow & Use cases
Ontology
Datasets
Requirements
4 Benchmark Configuration
Experimental set up
5 Query Analysis
Instantaneous Queries
Analytical Queries
Read/Write queries
6 Benchmark Results
7 Conclusion
8 Acknowledgements
Who we are
Mondeca in a nutshell
Located in Paris, France
Leading French semantic technology solution provider since 1999
SME: agile and flat structure
Our solution: Smart Content Factory combines data management + content annotation + semantic search
Major clients in publishing (e.g., Turner, AP, NPR), the insurance domain, the consumer goods industry, national governments and international organizations
Why do clients come to us
Mondeca in a nutshell
Why Benchmarking PO datasets?
1 To match the current and planned use cases of the Publications Office of the European Union (OP) against the current state of the art of RDF stores
2 To analyze in depth both the functional requirements and the documentation of 7 commercial RDF stores: Virtuoso, GraphDB, Neo4j, Stardog, Oracle, Blazegraph and Marklogic
3 To document and motivate the choice of a given RDF store based on key requirements defined internally after interviews
The end goal of the study is to identify the RDF store(s) that best match the OP's planned use cases and requirements in terms of scalability, stability and reliability.
Bench Context - OP
Publications Office of the European Union (OP)
publishes the daily Official Journal of the European Union in 23 official EU languages (24 when Irish is required)
produces and disseminates legal and general publications in a variety of paper and electronic formats
Online services
EUR-Lex 1: provides free access to European Union law
EU Bookshop: the online library and bookshop of publications from the institutions and other bodies of the EU
EU Open Data Portal: the single point of access to data from the institutions and other bodies of the European Union
Eurovoc: a multilingual, multidisciplinary thesaurus covering the activities of the EU
Whoiswho: the official directory of the EU
CORDIS: repository and portal for EU-funded research projects
1. http://eur-lex.europa.eu/
Data Workflow & Use cases
CELLAR is the semantic (RDF) repository at the OP, with the ODP store powering Linked Data applications.
Current RDF usage / wish-list
Volume: approximately 730 million triples
The RDF store has grown by 500 million triples in 2 years
OP foresees a volume of at least 1.5 billion triples within the next 2 years
Wish: handle 10x today's volume (ca. 7 billion triples)
OP receives 100k to 200k SPARQL queries per day, with strong growth; the target architecture must handle at least 2 million queries/day
Search via the browse-by-subject tab: http://publications.europa.eu/en/browse-by-subject
example: https://goo.gl/Yci9Nz
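For scale, the wish-list numbers above translate into sustained query rates; a quick back-of-the-envelope calculation (the daily volumes come from the slide, the per-second conversion is mine):

```python
# Convert the OP's daily SPARQL query volumes into sustained queries/second.
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

current_low, current_high = 100_000, 200_000  # queries/day today
target = 2_000_000                            # queries/day minimum target

print(f"current: {current_low / SECONDS_PER_DAY:.1f}-{current_high / SECONDS_PER_DAY:.1f} q/s")
print(f"target:  {target / SECONDS_PER_DAY:.1f} q/s sustained")
```

So the target architecture must sustain roughly 23 queries/second around the clock, before accounting for traffic peaks.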
Ontology
OP Dataset - Ontology
Common Data Model (CDM)
CDM is the ontology used to generate the RDF dataset at the OP
CDM is based on the FRBR model to represent work, expression and manifestation
Instances in the PROD dataset
Dataset with 187 instantiated classes, covering 61% of CDM
4,958,220 blank nodes
Top 3 classes: cdm:item (4.77%), cdm:expression (4.52%) and cdm:manifestation (2.30%)
CDM ontology statistics
Metric                   Number
Class                       308
Object Property             803
Data Property               690
SubClassOf                  615
SubObjectProperty           485
InverseObjectProperty       248
SubDataProperty             405
DL Expressivity       ALHOIQ(D)
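Statistics like those in the table above are normally gathered with OWL tooling or a SPARQL endpoint; purely as an illustration, here is a minimal stdlib-only sketch that counts OWL declarations in an N-Triples serialization (a real CDM analysis would use a proper RDF parser; the sample triples are made up):

```python
from collections import Counter

OWL = "http://www.w3.org/2002/07/owl#"
RDF_TYPE = "<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>"

def count_declarations(ntriples: str) -> Counter:
    """Count rdf:type declarations per OWL construct in an N-Triples string."""
    counts = Counter()
    for line in ntriples.splitlines():
        parts = line.strip().split(None, 2)  # subject, predicate, rest
        if len(parts) == 3 and parts[1] == RDF_TYPE:
            obj = parts[2].rstrip(" .")      # drop the trailing " ."
            if obj.startswith(f"<{OWL}"):
                counts[obj[len(OWL) + 1:-1]] += 1  # keep the local name
    return counts

# Hypothetical mini-ontology, not real CDM triples.
sample = """\
<http://ex.org/Work> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/2002/07/owl#Class> .
<http://ex.org/Expression> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/2002/07/owl#Class> .
<http://ex.org/realizes> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/2002/07/owl#ObjectProperty> .
"""
print(count_declarations(sample))  # Counter({'Class': 2, 'ObjectProperty': 1})
```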
Datasets
OP Dataset - Explicit knowledge
The values in the tables are explicit triples in the knowledge base.
Top five instances by class in the PROD dataset
Class                #Instances   Percentage
cdm:item             34,747,955         4.77
cdm:expression       32,898,325         4.52
cdm:manifestation    16,768,690         2.30
cdm:work              7,771,103         1.06
cdm:resource_legal    7,674,632         1.05

Size of dump datasets
Dataset name            Disk size   #Files      #Triples   RDF format
Normalized (.zip)          226 GB    2,195   727,442,978   NQUADS
Non-normalized (.tgz)       12 GB       64   728,163,464   NQUADS
NAL dataset                282 MB       72       402,926   RDF/XML
Requirements
RDF Stores Killer Requirements
Blazegraph v2.1.4 Open Edition
Poor results in the early stage of the benchmark: (i) too slow in loading data (90h 43min, almost 4 days!)
Too many time-outs (15) in the first test on queries from category 1
No support at all on repeated requests from them to improve results or validate our configuration file
Neo4j
All loading tests aborted after 40h 27min [2]
Need to port the code (ad-hoc RDF import) for each Neo4j upgrade: blueprints, tinkerpop, gremlin
Too much maintenance on this stack
2. A work in progress with Neo4j engineers to improve our ad-hoc RDF import loader
Experimental set up
Bench Configuration
Hardware Server
CPU: Intel(R) Xeon(R) CPU E5-2620 v3 @ 2.40GHz, 6C/12T
RAM: 128 GB; Disk capacity: 4 TB SATA
Operating System: CentOS 7, 64 bits, with Java 1.8.0 running
Marklogic FO
CPU: Intel(R) Xeon(R) E3 1245 v5 4C/8T @ 3.5GHz
RAM: 64 GB; Disk storage: 3 × 500 GB SSD
Tools for benchmark
JENA qparse tool to validate all the queries
Open-source tool SPARQL Query Benchmarker 3, used with 20 runs per category to warm up the server and 5 runs for the actual benchmark
3. https://github.com/rvesse/sparql-query-bm
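The warm-up/measured-run protocol behind the SPARQL Query Benchmarker can be pictured with a small timing harness. This is a sketch only: `run_query` is a hypothetical stand-in for an HTTP call to the store's SPARQL endpoint, and the run counts mirror the 20 warm-up / 5 measured runs above:

```python
import time
from statistics import mean

def benchmark_mix(run_query, queries, warmup_runs=20, measured_runs=5):
    """Time a query mix: discard warm-up runs, average only the measured ones."""
    for _ in range(warmup_runs):          # warm caches, JIT, buffer pools
        for q in queries:
            run_query(q)
    mix_times = []
    for _ in range(measured_runs):        # only these runs are reported
        start = time.perf_counter()
        for q in queries:
            run_query(q)
        mix_times.append(time.perf_counter() - start)
    avg = mean(mix_times)
    # QMpH (query mixes per hour) is the headline metric of the tool.
    return {"avg_mix_seconds": avg, "qmph": 3600.0 / avg if avg > 0 else float("inf")}

# Usage with a dummy executor (a real run would POST each query over the SPARQL Protocol):
stats = benchmark_mix(lambda q: None, ["SELECT ...", "DESCRIBE ..."], warmup_runs=2, measured_runs=3)
print(stats)
```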
Experimental set up
Triple stores setup
Virtuoso
NumberOfBuffers = 5450000 and MaxDirtyBuffers = 4000000
Stardog
Set Java heap size = 16GB and MaxDirectMemorySize = 8GB.
Deactivation of the strict parsing option, SL option by default
GraphDB
Set entity index size to 500000000 with entity predicate list enabled,
Disabling the content index.
Oracle
pga_aggregate_limit = 64GB and pga_aggregate_target = 32G
Instantaneous Queries
Query Analysis
FIGURE – Queries of Category 1
20 instantaneous queries
Query form   #Total
SELECT           16
DESCRIBE          3
CONSTRUCT         1
Analytical Queries
Query Analysis
FIGURE – Queries of Category 2
Analytical queries
24 queries
Query form: SELECT
Read/Write queries
Query Analysis
All the queries were gathered and developed by OP's metadata teams.
The queries were originally optimized for Virtuoso.
The results of the quantitative benchmark are therefore probably biased in favor of the incumbent triple store.
To remove this bias, we asked the other vendors to provide us with queries optimized for their engines.
We present the results of the quantitative study, which is part of a broader study covering 66 functional requirements.
Bulk loading
PROD dataset (727 million triples): ranking order -> Virtuoso (3.8h), Stardog (4.59h), Marklogic (5.83h), Oracle (23.07h) and GraphDB (35.64h). Oracle later optimized down to 8h!
2 billion triples 4: ranking order -> Virtuoso (13.01h), Stardog (13.30h), GraphDB (17.46h), Marklogic (27.96h) and Oracle (43.7h). Oracle later optimized down to 32h!
5 billion triples: ranking order -> Virtuoso (36.10h), GraphDB (44.14h), Marklogic (169.95h), Stardog (unsuccessful), Oracle (N/A)
4. Generated by postfixing resources of type publications.europa.eu/resource/cellar
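To put the loading times in perspective, they can be converted into throughput. A quick calculation from the PROD figures above (727,442,978 triples; hours as reported on the slide):

```python
# Approximate bulk-loading throughput on the PROD dataset.
TRIPLES = 727_442_978
hours = {"Virtuoso": 3.8, "Stardog": 4.59, "Marklogic": 5.83,
         "Oracle": 23.07, "GraphDB": 35.64}

for store, h in hours.items():
    print(f"{store:9s} {TRIPLES / (h * 3600):>10,.0f} triples/s")
```

Virtuoso's 3.8h run corresponds to roughly 53,000 triples/second, almost an order of magnitude faster than the slowest loader.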
Results Category 1 - time out = 60s
Virtuoso is faster than all the other triple stores.
No time-outs with Virtuoso; Marklogic (1 time-out), Oracle (2), Stardog (2), GraphDB (4) and Blazegraph (15).
Stardog performs poorly compared to GraphDB and Oracle.
Blazegraph was removed after this test.
Marklogic is NOT consistent in multithreading.
Results Category 2 - time out = 600s
Virtuoso is faster than all the other triple stores.
No time-outs with Virtuoso, GraphDB and Marklogic.
1 timed-out query (Q10) with Oracle.
4 timed-out queries (Q15, Q16, Q19, Q22) with Stardog.
Bench analytical queries ranking
RDF Store          #Time-outs   Rank
Virtuoso                    0      1
GraphDB EE RDFS+            0      2
GraphDB EE                  0      3
Oracle 12c                  1      4
Marklogic                   0      5
Stardog                     4      6
Results Category 3 - time out = 10s
1 CONSTRUCT, 1 DELETE/INSERT and 3 INSERT queries.
Virtuoso is fastest, followed by Marklogic and GraphDB.
Oracle performs worst in single-thread mode.
Stardog and Oracle scores are significantly lower than those of Marklogic and GraphDB.
Results Category 3 - time out = 10s (continued)
Oracle performs better in the multithreaded scenario. Why? Possibly index/disk calibration.
Stardog is constant in the magnitude of QMpH.
GraphDB and Marklogic show significant changes from 5 clients to 20 clients.
Stability Test
Stress test on the triple stores using instantaneous queries (category 1).
The test starts with the number of parallel clients set to 128.
Each client completes a run of the query mix in parallel.
The number of parallel clients is then multiplied by 2 and the process is repeated.
This continues until either the maximum runtime (180 min) or the maximum number of threads is reached.
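The doubling procedure above can be sketched as a loop. Illustrative only: `run_mix_with_clients` is a hypothetical stand-in for launching the parallel benchmark clients and collecting their error counts:

```python
import time

def stress_test(run_mix_with_clients, start_clients=128, max_clients=256,
                max_runtime_s=180 * 60):
    """Double the number of parallel clients until the runtime or thread cap is hit."""
    deadline = time.monotonic() + max_runtime_s
    clients = start_clients
    results = []
    while clients <= max_clients and time.monotonic() < deadline:
        # Each level records e.g. error counts and latencies for that client count.
        results.append((clients, run_mix_with_clients(clients)))
        clients *= 2
    return results

# Dummy runner: pretend each level reports an error count.
levels = stress_test(lambda n: {"errors": 0}, start_clients=128,
                     max_clients=256, max_runtime_s=60)
print([c for c, _ in levels])  # [128, 256]
```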
Result of the Stress Test
Stardog and Oracle finished at the limit of parallel threads.
Virtuoso and GraphDB completed the test after 180 min, reaching 256 parallel threads.
GraphDB shows fewer errors than Virtuoso.
GraphDB appears to be the most stable, followed in order by Stardog, Oracle and Virtuoso.
What We learned
Lessons learned
3 out of the 5 RDF stores come close to the key requirements (robustness, scalability, reliability and stability)
None of the RDF stores perfectly matches OP's business cases
When pushed to their limits, all of the RDF stores require extensive support from the vendors (e.g., the cases of Oracle 12c and GraphDB)
Conclusions
We have presented a quantitative comparison of 5 commercial RDF stores (Virtuoso, GraphDB, Oracle, Stardog and Marklogic) based on OP datasets and requirements.
The results show that Virtuoso and Stardog are the fastest at bulk loading.
Virtuoso outperforms GraphDB, Stardog and Oracle, in that order, in query-based performance.
GraphDB is the winner of the stability test performed in this benchmark.
This study gives an overview of the current state of RDF store performance with respect to the OP's dataset.
This work can be partly reused to assess enterprise RDF stores.
We plan to obtain query rewrites from all the store vendors and evaluate the results.
We also plan to run the same benchmark on AWS Neptune 5.
We plan to compare this work more closely with state-of-the-art benchmarking, possibly using the IGUANA framework.
5. http://blog.mondeca.com/2018/02/09/requetes-sparql-avec-neptune/
Acknowledgements
We would like to thank the RDF teams at Ontotext, Stardog Union, Oracle, Marklogic and OpenLink.