Benchmarking Commercial RDF Stores with Publications Office Dataset

The slides present a benchmark of RDF stores with real-world datasets and queries from the EU Publications Office (PO). The study compares the performance of four commercial triple stores: Stardog 4.3 EE, GraphDB 8.0.3 EE, Oracle 12.2c and Virtuoso 7.2.4.2 with respect to the following requirements: bulk loading, scalability, stability and query execution.

Mondeca in a nutshell Benchmark Context Publications Office of the EU Datasets Benchmark Configuration Query Analysis Benchmark Results Conclu
Benchmarking Commercial RDF Stores with Publications Office Dataset
Ghislain Auguste Atemezing, Ph.D.
Mondeca, 35 Boulevard de Strasbourg, 75010 Paris, France
Twitter: @gatemezing
Web: http://www.mondeca.com
Benchmark material: https://github.com/gatemezing/posb
04th June, 2018
Ghislain Auguste Atemezing, Ph.D MONDECA QuWeDa2018 @ ESWC 2018 04th June, 2018 1 / 24
Agenda
1 Mondeca in a nutshell
Who we are
Why do clients come to us
2 Benchmark Context
3 Publications Office of the EU Datasets
Data Workflow & Use cases
Ontology
Datasets
Requirements
4 Benchmark Configuration
Experimental set up
5 Query Analysis
Instantaneous Queries
Analytical Queries
Read/Write queries
6 Benchmark Results
7 Conclusion
8 Acknowledgements
Who we are
Mondeca in a nutshell
Located in Paris, France
Leading French semantic technology solution provider since 1999
SMA : agile and flat structure
Our solution: Smart Content Factory, which combines data management, content annotation and semantic search.
Major clients in publishing (e.g., Turner, AP, NPR), insurance, the goods industry, national governments and international organizations
Why do clients come to us
Mondeca in a nutshell
Why Benchmarking PO datasets?
1 To match the current and planned use cases of the Publications Office of the European Union (OP) against the current state of the art of RDF stores.
2 To analyze in depth both the functional requirements and the documentation of 7 commercial RDF stores: Virtuoso, GraphDB, Neo4j, Stardog, Oracle, Blazegraph and MarkLogic.
3 To document and motivate the choice of a given RDF store based on key requirements defined internally after interviews.
The end goal of the study is to identify the RDF store(s) that will best match the OP's planned use cases and requirements in terms of scalability, stability and reliability.
Benchmark Context - OP
Publications Office of the European Union
publishes the daily Official Journal of the European Union in 23 official EU languages (24 when Irish is required).
produces and disseminates legal and general publications in a variety of paper and electronic formats.
Online services
EUR-Lex (http://eur-lex.europa.eu/): provides free access to European Union law.
EU Bookshop: the online library and bookshop of publications from the institutions and other bodies of the EU.
EU Open Data Portal: the single point of access to data from the institutions and other bodies of the European Union.
EuroVoc: a multilingual, multidisciplinary thesaurus covering the activities of the EU.
Whoiswho: the official directory of the EU.
CORDIS: repository and portal for EU-funded research projects.
CCNA: Routing and Switching FundamentalsCCNA: Routing and Switching Fundamentals
CCNA: Routing and Switching Fundamentals
 
Microstrip Bandpass Filter Design using EDA Tolol such as keysight ADS and An...
Microstrip Bandpass Filter Design using EDA Tolol such as keysight ADS and An...Microstrip Bandpass Filter Design using EDA Tolol such as keysight ADS and An...
Microstrip Bandpass Filter Design using EDA Tolol such as keysight ADS and An...
 
biofilm fouling of the membrane present in aquaculture
biofilm fouling of the membrane present in aquaculturebiofilm fouling of the membrane present in aquaculture
biofilm fouling of the membrane present in aquaculture
 
INTERACTIVE AQUATIC MUSEUM AT BAGH IBN QASIM CLIFTON KARACHI
INTERACTIVE AQUATIC MUSEUM AT BAGH IBN QASIM CLIFTON KARACHIINTERACTIVE AQUATIC MUSEUM AT BAGH IBN QASIM CLIFTON KARACHI
INTERACTIVE AQUATIC MUSEUM AT BAGH IBN QASIM CLIFTON KARACHI
 
Eversendai - HSE Performance Management Systems-R1.pptx
Eversendai - HSE Performance Management Systems-R1.pptxEversendai - HSE Performance Management Systems-R1.pptx
Eversendai - HSE Performance Management Systems-R1.pptx
 
self introduction sri balaji
self introduction sri balajiself introduction sri balaji
self introduction sri balaji
 
【文凭定制】坎特伯雷大学毕业证学历认证
【文凭定制】坎特伯雷大学毕业证学历认证【文凭定制】坎特伯雷大学毕业证学历认证
【文凭定制】坎特伯雷大学毕业证学历认证
 
Metrology Measurements and All units PPT
Metrology Measurements and  All units PPTMetrology Measurements and  All units PPT
Metrology Measurements and All units PPT
 
Introduction and replication to DragonflyDB
Introduction and replication to DragonflyDBIntroduction and replication to DragonflyDB
Introduction and replication to DragonflyDB
 
Deluck Technical Works Company Profile.pdf
Deluck Technical Works Company Profile.pdfDeluck Technical Works Company Profile.pdf
Deluck Technical Works Company Profile.pdf
 
MAXIMUM POWER POINT TRACKING ALGORITHMS APPLIED TO WIND-SOLAR HYBRID SYSTEM
MAXIMUM POWER POINT TRACKING ALGORITHMS APPLIED TO WIND-SOLAR HYBRID SYSTEMMAXIMUM POWER POINT TRACKING ALGORITHMS APPLIED TO WIND-SOLAR HYBRID SYSTEM
MAXIMUM POWER POINT TRACKING ALGORITHMS APPLIED TO WIND-SOLAR HYBRID SYSTEM
 
chap. 3. lipid deterioration oil and fat processign
chap. 3. lipid deterioration oil and fat processignchap. 3. lipid deterioration oil and fat processign
chap. 3. lipid deterioration oil and fat processign
 
Architectural Preservation - Heritage, focused on Saudi Arabia
Architectural Preservation - Heritage, focused on Saudi ArabiaArchitectural Preservation - Heritage, focused on Saudi Arabia
Architectural Preservation - Heritage, focused on Saudi Arabia
 
Microstrip Bandpass Filter Design using EDA Tolol such as keysight ADS and An...
Microstrip Bandpass Filter Design using EDA Tolol such as keysight ADS and An...Microstrip Bandpass Filter Design using EDA Tolol such as keysight ADS and An...
Microstrip Bandpass Filter Design using EDA Tolol such as keysight ADS and An...
 
S. Kim, NeurIPS 2023, MLILAB, KAISTAI
S. Kim,  NeurIPS 2023,  MLILAB,  KAISTAIS. Kim,  NeurIPS 2023,  MLILAB,  KAISTAI
S. Kim, NeurIPS 2023, MLILAB, KAISTAI
 

Benchmarking Commercial RDF Stores with Publications Office Dataset

  • 1. Benchmarking Commercial RDF Stores with Publications Office Dataset. Ghislain Auguste Atemezing, Ph.D., Mondeca, 35 Boulevard de Strasbourg, 75010 Paris, France. Twitter: @gatemezing. Web: http://www.mondeca.com. Benchmark material: https://github.com/gatemezing/posb. Presented at QuWeDa 2018 @ ESWC 2018, 4 June 2018.
  • 2. Agenda
      1. Mondeca in a nutshell: who we are; why clients come to us
      2. Benchmark context
      3. Publications Office of the EU datasets: data workflow and use cases; ontology; datasets; requirements
      4. Benchmark configuration: experimental setup
      5. Query analysis: instantaneous queries; analytical queries; read/write queries
      6. Benchmark results
      7. Conclusion
      8. Acknowledgements
  • 3. Who we are: Mondeca in a nutshell
      - Located in Paris, France
      - Leading French semantic technology solution provider since 1999
      - SMA: agile and flat structure
      - Our solution, Smart Content Factory, combines data management, content annotation and semantic search.
      - Major clients in publishing (e.g., Turner, AP, NPR), the insurance domain, the goods industry, national governments and international organizations
  • 4. Why do clients come to us
  • 5. Why benchmark PO datasets?
      1. To match the current and planned use cases of the Publications Office of the European Union (OP) against the current state of the art of RDF stores.
      2. To analyze in depth both the functional requirements and the documentation of 7 commercial RDF stores: Virtuoso, GraphDB, Neo4j, Stardog, Oracle, Blazegraph and Marklogic.
      3. To document and motivate the choice of a given RDF store based on key requirements defined internally after interviews.
      The end goal of the study is to identify the RDF store(s) that best match the OP's planned use cases and requirements in terms of scalability, stability and reliability.
  • 6. Benchmark context: OP
      The Publications Office of the European Union
      - publishes the daily Official Journal of the European Union in 23 official EU languages (24 when Irish is required);
      - produces and disseminates legal and general publications in a variety of paper and electronic formats.
      Online services
      - EUR-Lex (http://eur-lex.europa.eu/): provides free access to European Union law.
      - EU Bookshop: the online library and bookshop of publications from the institutions and other bodies of the EU.
      - EU Open Data Portal: the single point of access to data from the institutions and other bodies of the European Union.
      - Eurovoc: a multilingual, multidisciplinary thesaurus covering the activities of the EU.
      - Whoiswho: the official directory of the EU.
      - CORDIS: repository and portal for EU-funded research projects.
  • 7. Data workflow and use cases
      CELLAR RDF is the semantic repository at OP, with the ODP store featuring Linked Data applications.
      Current RDF usage and wish list
      - Volume: approximately 730 million triples.
      - The RDF store grew by 500 million triples over 2 years.
      - OP foresees a volume of at least 1.5 billion triples in the next 2 years.
      - Wish: handle 10x today's volume (ca. 7 billion triples).
      - OP receives 100k to 200k SPARQL queries per day, with strong growth.
      - The target architecture must handle at least 2 million queries per day.
      Search via the "browse by subject" tab: http://publications.europa.eu/en/browse-by-subject (example: https://goo.gl/Yci9Nz)
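To put the query-volume figures above into perspective, the following minimal sketch converts queries per day into an average rate per second. It only uses the numbers quoted on the slide; the function name is illustrative, and the result is a daily average that says nothing about peak load, which the slides do not report.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def average_qps(queries_per_day: int) -> float:
    """Average queries per second for a given daily query volume."""
    return queries_per_day / SECONDS_PER_DAY


# Figures quoted on the slide: current load range and the target minimum.
volumes = {
    "current (low)": 100_000,
    "current (high)": 200_000,
    "target minimum": 2_000_000,
}

for label, per_day in volumes.items():
    print(f"{label:15s} {per_day:>9,d} queries/day = {average_qps(per_day):6.2f} queries/s on average")
```

The target of 2 million queries per day therefore corresponds to roughly 23 queries per second on average, ten times the upper bound of the current load.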
  • 8. Ontology: OP dataset
      Common Data Model (CDM)
      - CDM is the ontology used to generate the RDF dataset at OOPCE.
      - CDM is based on the FRBR model to represent work, expression and manifestation.
      Instances in the PROD dataset
      - 187 instantiated classes, covering 61% of CDM
      - 4,958,220 blank nodes
      - Top 3 classes: cdm:item (4.77%), cdm:expression (4.52%) and cdm:manifestation (2.30%)
      CDM ontology statistics
        Metric               Number
        Class                308
        Object Property      803
        Data Property        690
        SubClassOf           615
        SubObjectProp.       485
        InverseObjectProp.   248
        SubDataProperty      405
        DL Expressivity      ALHOIQ(D)
  • 9. Datasets: OP dataset, explicit knowledge
      The values in the tables are explicit triples in the knowledge base.
      Top five classes by number of instances in the PROD dataset
        Class                 #Instances    Percentage
        cdm:item              34,747,955    4.77
        cdm:expression        32,898,325    4.52
        cdm:manifestation     16,768,690    2.30
        cdm:work               7,771,103    1.06
        cdm:resource_legal     7,674,632    1.05
      Size of dump datasets
        Dataset name            Disk size   #Files   #Triples      RDF format
        Normalized (.zip)       226 GB      2,195    727,442,978   NQUADS
        Non normalized (.tgz)   12 GB       64       728,163,464   NQUADS
        NAL dataset             282 MB      72       402,926       RDF/XML
  • 10. Requirements: RDF stores killer requirements
      Blazegraph v2.1.4 Open Edition
      - Poor results in the early stage of the benchmark: data loading was too slow (90h 43min, almost 4 days).
      - Too many timeouts (15) in the first test on category 1 queries.
      - No support at all from the vendor, despite repeated requests to improve results or validate our configuration file.
      Neo4J
      - All loading tests aborted after 40h 27min. (Work is in progress with Neo4J engineers to improve our ad-hoc RDF import loader.)
      - The code (ad-hoc importRDF) must be ported for each Neo4j upgrade: blueprints, tinkerpop, gremlin.
      - Too much maintenance on this stack.
  • 11. Experimental setup: benchmark configuration
      Hardware server
      - CPU: Intel(R) Xeon(R) CPU E5-2620 v3 @ 2.40GHz, 6C/12T
      - RAM: 128 GB; disk capacity: 4 TB SATA
      - Operating system: CentOS 7, 64 bits, running Java 1.8.0
      Marklogic FO
      - CPU: Intel(R) Xeon(R) E3 1245 v5, 4C/8T @ 3.5GHz
      - RAM: 64 GB; disk storage: 3 x 500 GB SSD
      Tools for the benchmark
      - Jena qparse tool to validate all the queries
      - The open-source tool SPARQL Query Benchmarker (https://github.com/rvesse/sparql-query-bm), used with 20 runs per category to warm up the server and 5 runs for the actual benchmark
  • 12. Experimental setup: triple stores setup
      - Virtuoso: NumberOfBuffers = 5450000 and MaxDirtyBuffers = 4000000.
      - Stardog: Java heap size = 16GB and MaxDirectMemorySize = 8GB. Strict parsing option deactivated; SL option by default.
      - GraphDB: entity index size set to 500000000, with the entity predicate list enabled and the content index disabled.
      - Oracle: pga_aggregate_limit = 64GB and pga_aggregate_target = 32G.
  • 13. Query analysis: instantaneous queries
      Figure: Queries of Category 1
      20 instantaneous queries
        Query form   #Total
        SELECT       16
        DESCRIBE     3
        CONSTRUCT    1
  • 14. Query analysis: analytical queries
      Figure: Queries of Category 2
      - 24 analytical queries
      - Query form: SELECT
  • 15. Query analysis: read/write queries
      - All the queries were gathered and developed by OP's metadata teams.
      - The queries were originally optimized for Virtuoso, so the results of the quantitative benchmark are probably biased in favor of the current triple store.
      - To remove the bias, we asked the other vendors to provide us with queries optimized for their engines.
      - We present the results of the quantitative study, which is part of a broader study containing 66 functional requirements.
  • 16. Bulk loading (ranking order, loading times in hours)
      - PROD dataset (727 million triples): Virtuoso (3.8), Stardog (4.59), Marklogic (5.83), Oracle (23.07) and GraphDB (35.64). Oracle was later optimized to 8h.
      - 2 billion triples: Virtuoso (13.01), Stardog (13.30), GraphDB (17.46), Marklogic (27.96) and Oracle (43.7). Oracle was later optimized to 32h.
      - 5 billion triples: Virtuoso (36.10), GraphDB (44.14), Marklogic (169.95), Stardog (unsuccessful), Oracle (N/A).
      The 2 billion dataset was generated by postfixing resources of type publications.europa.eu/resource/cellar.
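The loading times above become easier to compare when expressed as throughput. The sketch below is an illustration only: it assumes the reported figures are decimal hours (the slides do not say whether they are decimal hours or hours.minutes) and uses the triple count of the normalized PROD dump from the dataset table. The helper name is hypothetical.

```python
# Triple count of the normalized PROD dump, from the "Size of dump datasets" table.
PROD_TRIPLES = 727_442_978

# PROD bulk-loading times as reported on the slide.
# Assumption: values are decimal hours.
load_hours = {
    "Virtuoso": 3.8,
    "Stardog": 4.59,
    "Marklogic": 5.83,
    "Oracle": 23.07,
    "GraphDB": 35.64,
}


def triples_per_second(triples: int, hours: float) -> float:
    """Average loading throughput in triples per second."""
    return triples / (hours * 3600.0)


throughput = {
    store: triples_per_second(PROD_TRIPLES, hours)
    for store, hours in load_hours.items()
}

# Print stores from fastest to slowest loader.
for store, rate in sorted(throughput.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{store:10s} {rate:10,.0f} triples/s")
```

Under that assumption, the fastest PROD load (Virtuoso) averages a little over 53,000 triples per second.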
  • 17. Results, category 1 (timeout = 60s)
      - Virtuoso is faster than all the other triple stores.
      - No timeouts with Virtuoso. Marklogic (1 timeout), Oracle (2 timeouts), Stardog (2 timeouts), GraphDB (4 timeouts) and Blazegraph (15 timeouts).
      - Stardog performs poorly compared to GraphDB and Oracle.
      - Blazegraph was removed after this test.
      - Marklogic is not constant in multithreading.
  • 18. Results, category 2 (timeout = 600s)
      - Virtuoso is faster than all the other triple stores.
      - No timeouts with Virtuoso, GraphDB and Marklogic.
      - 1 timed-out query (Q10) with Oracle.
      - 4 timed-out queries (Q15, Q16, Q19, Q22) with Stardog.
      Analytical queries ranking
        RDF store          #Timeouts   Rank
        Virtuoso           0           1
        Stardog            4           6
        GraphDB EE         0           3
        GraphDB EE RDFS+   0           2
        Marklogic          0           5
        Oracle 12c         1           4
  • 19. Results, category 3 (timeout = 10s)
      - Query mix: 1 CONSTRUCT, 1 DELETE/INSERT and 3 INSERT IN queries.
      - Virtuoso is the fastest, followed by Marklogic and GraphDB.
      - Oracle performs worse in the single-thread scenario.
      - Stardog and Oracle scores are significantly lower than those of Marklogic and GraphDB.
  • 20. Results, category 3 (timeout = 10s), continued
      - Oracle performs better in the multithread scenario. Why? Possibly index or disk calibration.
      - Stardog stays constant in order of magnitude of QMpH.
      - GraphDB and Marklogic change significantly from 5 clients to 20 clients.
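QMpH is commonly read as "query mixes per hour": the number of complete runs of a query mix that a store sustains in one hour. The slides do not spell out the formula, so the following is a minimal sketch under that assumption; the function name and the sample numbers are hypothetical and are not benchmark results.

```python
def qmph(mixes_completed: int, wall_clock_seconds: float) -> float:
    """Query mixes per hour, extrapolated from a measured run.

    mixes_completed:    number of full query-mix runs that finished
    wall_clock_seconds: elapsed wall-clock time for those runs
    """
    if wall_clock_seconds <= 0:
        raise ValueError("wall_clock_seconds must be positive")
    return mixes_completed * 3600.0 / wall_clock_seconds


# Hypothetical example: 5 measured runs of a mix completed in 90 seconds.
print(qmph(5, 90.0))
```

With several parallel clients, `mixes_completed` would count the mixes finished by all clients over the same wall-clock interval, which is why QMpH can rise with the number of clients when a store parallelizes well.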
  • 21. Stability test
      Stress test on the triple stores using the instantaneous queries (category 1).
      - The test starts with 128 parallel clients.
      - Each client completes a run of the query mix in parallel.
      - The number of parallel clients is then multiplied by 2 and the process is repeated.
      - This repeats until either the maximum runtime (180 min) or the maximum number of threads is reached.
      Stress test results
      - Stardog and Oracle finished by reaching the limit of parallel threads.
      - Virtuoso and GraphDB completed the test after 180 min, reaching 256 parallel threads.
      - GraphDB shows fewer errors than Virtuoso.
      - GraphDB is likely the most stable, followed in order by Stardog, Oracle and Virtuoso.
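The ramp-up procedure described above can be sketched as a simple loop. This is not the benchmark tool's implementation: `run_mix` stands in for one client executing the category 1 query mix, the default `max_clients` value is a placeholder because the slides do not state the thread limit, and the runtime limit is only checked between rounds.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Tuple


def _succeeds(run_mix: Callable[[], None]) -> bool:
    """Run one query mix and report whether it finished without an exception."""
    try:
        run_mix()
        return True
    except Exception:
        return False


def stress_test(
    run_mix: Callable[[], None],
    start_clients: int = 128,           # starting point given on the slide
    max_clients: int = 1024,            # hypothetical limit, not stated on the slide
    max_runtime_s: float = 180 * 60,    # 180 minutes, as on the slide
) -> List[Tuple[int, int]]:
    """Double the number of parallel clients each round.

    Returns one (clients, errors) pair per completed round.
    """
    results: List[Tuple[int, int]] = []
    clients = start_clients
    started = time.monotonic()
    while clients <= max_clients and time.monotonic() - started < max_runtime_s:
        # Every client runs the query mix once, all clients in parallel.
        with ThreadPoolExecutor(max_workers=clients) as pool:
            outcomes = list(pool.map(lambda _: _succeeds(run_mix), range(clients)))
        results.append((clients, outcomes.count(False)))
        clients *= 2
    return results


# Tiny dry run with a no-op mix, so the sketch can be executed without a store.
print(stress_test(lambda: None, start_clients=2, max_clients=8))
```

A store that ends the loop because `clients` exceeds `max_clients` corresponds to "finished with the limit of the parallel threads", while a store that ends it on the runtime check corresponds to completing the test after 180 minutes.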
  • 22. Lessons learned
      - 3 out of the 5 RDF stores come close to the key requirements (robustness, scalability, reliability and stability).
      - None of the RDF stores perfectly matches OP's business cases.
      - When pushed to their limits, all of the RDF stores require extensive support from the vendors (e.g., Oracle 12c and GraphDB).
  • 23. Conclusions
      - We have presented a quantitative comparison of 5 commercial RDF stores (Virtuoso, GraphDB, Oracle, Stardog and Marklogic) based on OP datasets and requirements.
      - The results show that Virtuoso and Stardog are the fastest at bulk loading.
      - Virtuoso outperforms, in order, GraphDB, Stardog and Oracle in query performance.
      - GraphDB is the winner of the stability test performed in this benchmark.
      - This study gives an overview of the current state of RDF store performance with respect to PO's dataset.
      - This work can be partly used to assess enterprise RDF stores.
      - We plan to get query rewrites from all the store vendors and evaluate the results.
      - We also plan to perform the same benchmark on AWS Neptune (http://blog.mondeca.com/2018/02/09/requetes-sparql-avec-neptune/).
      - We plan to better compare this work with state-of-the-art benchmarking, possibly using the IGUANA framework.
  • 24. Acknowledgements: We would like to thank the RDF teams at Ontotext, Stardog Union, Oracle, Marklogic and OpenLink.