Trying Not to Die Benchmarking using LITMUS
Harsh Thakkar¹, Yashwant Keswani², Mohnish Dubey¹, Jens Lehmann¹˒³, Sören Auer⁴
¹ University of Bonn, Bonn, Germany
² DA-IICT, Gandhinagar, India
³ Fraunhofer IAIS, St. Augustin, Germany
⁴ TIB, Hannover, Germany
Semantics 2017 - Amsterdam - Nederland - September 13
Outline
● Motivation
● Problem Statement
● State of the Art
● Approach - LITMUS Benchmark Suite
● Challenges
● Evaluation Plan
● Next Steps
Motivation
● Ocean of Data: real and synthetic datasets
○ LOD Cloud 2017: http://lod-cloud.net/versions/2017-02-20/lod.png
● Sea of Tools: key-value stores, graph stores, document-oriented stores, RDF stores (e.g. RDF-3X), wide-column stores
Motivation
• Domain-specific applications, i.e. perspectives
• Choice overload!
• Vendors
• Researchers
• Users
https://steemit.com/philosophy/@l0k1/subjectivity-and-truth-how-blockchains-model-consensus-building
Benchmarking
● Tedious!
● Needs domain-specific expertise
● Lack of standardization (single focus)
○ Open software, system configuration settings, etc.
● Near-zero reusability
● Guaranteeing a fair benchmark is difficult!
● Choosing the right performance metrics is cumbersome and subjective
● Visualising benchmark results
[6] http://2.bp.blogspot.com/-TkUb0TPN7IA/VewUHm_jVaI/AAAAAAAABgM/vZILnZNJv5A/s1600/2012-10-16-subjective-objective.jpg
Problem Statement
“How can diverse cross-domain DMSs be benchmarked in an automated, established*, standard# environment?”
State of the Art
[Table: benchmark efforts compared by the DMS types they cover (relational, RDF, graph), grouped into single-domain and cross-domain benchmarks]
Benchmark efforts:
● TPC [H, C, E, DS] [13]
● XGDBench [6]
● HPC [7]
● Graph 500 [12]
● DBPSB [11]
● LUBM [9]
● IGUANA [19]
● WatDiv [1]
● SP2Bench [20]
● BSBM [4]
● Pandora*
● Graphium [8]
● LDBC [2]
● HOBBIT**
* http://pandora.ldc.usb.ve/
** https://project-hobbit.eu/
LITMUS Benchmark Suite
The LITMUS architecture
● Data Facet (F1): Dataset 1, Dataset 2, Dataset 3, ..., Dataset N; data integration module
● Query Facet (F2): Queryset 1, Queryset 2, Queryset 3, ..., Queryset M; query conversion module
● System Facet (F3): system configuration & integration module; RDF stores, graph stores, relational DBs, wide-column stores, key-value stores
● Benchmarking Core: Controller & Tester, Profiler, Analyzer
● User Interface (F4): User
Thakkar, Harsh. "Towards an Open Extensible Framework for Empirical Benchmarking of Data Management Solutions: LITMUS." ESWC, 2017.
Challenges
● Core challenges in developing such an open, extensible, FAIR framework?
○ C1 - Data Conversion
○ C2 - Query Translation
○ C3 - Key Performance Indicators (KPIs)
http://media.thinkadvisor.com/lifehealthpro/article/2015/02/24/challenge.jpg
C1 - Data Conversion
● Different data models
○ RDF Graph
○ Property Graph
● Conversion is needed to conduct a fair benchmark
● A DMS works best with its natively supported data model
● Lots of data: real and synthetic
RQ1 - What are the methods to convert RDF into the property graph data model?
RDF Data Model
● RDF is a triple-based graph model, where:
○ Subject: URI or blank node
○ Predicate: URI (a property)
○ Object: URI, literal, or blank node
URI (IRI) = Uniform Resource Identifier, analogous to an ISBN for books
Literals = data values
Blank nodes = descriptions of entities that don't need to be named
@prefix ex: <http://example.org/> .
ex:Person ex:speaker ex:Event .
ex:Person ex:name "Harsh" .
ex:Person ex:place ex:Bonn .
ex:Person ex:age "27" .
ex:Event ex:name "Graph Day" .
ex:Event ex:year "2017" .
[Figure: graph with nodes ex:Person, ex:Event, ex:Bonn, ex:AMS and literals "Harsh", "27", "Semantics", "2017", "30", connected by ex:speaker, ex:name, ex:place, ex:age, ex:year, ex:stime; labels: representation, interpretation]
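To make the triple model concrete, the example triples from this slide can be held as plain (subject, predicate, object) tuples. This is only a minimal sketch in Python: the `Literal` wrapper, the `ex()` helper, and the `describe()` function are illustrative names, and the trailing slash on the namespace is an assumption.

```python
from collections import defaultdict
from dataclasses import dataclass

EX = "http://example.org/"  # assumed namespace; the trailing slash is our choice


def ex(local_name):
    """Expand a prefixed name such as ex:Person into a full URI string."""
    return EX + local_name


@dataclass(frozen=True)
class Literal:
    """Marks an object as a data value rather than a URI."""
    value: str


# The example triples from the slide: (subject, predicate, object).
TRIPLES = [
    (ex("Person"), ex("speaker"), ex("Event")),
    (ex("Person"), ex("name"), Literal("Harsh")),
    (ex("Person"), ex("place"), ex("Bonn")),
    (ex("Person"), ex("age"), Literal("27")),
    (ex("Event"), ex("name"), Literal("Graph Day")),
    (ex("Event"), ex("year"), Literal("2017")),
]


def describe(triples):
    """Group (predicate, object) pairs by subject, i.e. each node's outgoing edges."""
    by_subject = defaultdict(list)
    for subject, predicate, obj in triples:
        by_subject[subject].append((predicate, obj))
    return dict(by_subject)


if __name__ == "__main__":
    for subject, edges in describe(TRIPLES).items():
        print(subject)
        for predicate, obj in edges:
            print("   ", predicate, "->", obj)
```

Every fact, including a simple attribute such as a name, costs one triple in this representation.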
RDF Graphs (RDFGs)
● Edge-labelled, directed multi-graphs (with entity URIs, blank nodes, and literals)
● Going from information to knowledge using OWL (DLs) and ontologies (RDFS, RDFa, etc.)
● Bulky
○ Everything is node-edge-node (edges don't have properties)
○ More relationships per node → more triples in total!
Property Graph Data Model
● Edge-labelled, directed, attributed multi-graph
● Vertices and edges both have properties
● Main components:
○ Vertices, edges (source, destination), properties (key-value pairs), labels (strings)
● Super neat (compact), super cute
● Easier to add weighted, reified edges
● Query languages - Cypher, Gremlin, PGQL, etc.
[Figure: a Person vertex (Name: Harsh, Age: 27, Place: Bonn) and an Event vertex (Name: Semantics, Year: 2017, Place: AMS), joined by an edge with properties Role: speaker, Time: 30]
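The talk example can be written down as a tiny in-memory property graph. The sketch below is illustrative only: the `Vertex` and `Edge` classes, the numeric ids, the edge label `participates_in`, and the Person-to-Event direction are assumptions, as is the reading that Role and Time sit on the edge.

```python
from dataclasses import dataclass, field


@dataclass
class Vertex:
    id: int
    label: str
    properties: dict = field(default_factory=dict)


@dataclass
class Edge:
    id: int
    label: str
    src: int  # id of the source vertex
    dst: int  # id of the destination vertex
    properties: dict = field(default_factory=dict)


# Two vertices carry their attributes as key-value pairs.
person = Vertex(1, "Person", {"name": "Harsh", "age": 27, "place": "Bonn"})
event = Vertex(2, "Event", {"name": "Semantics", "year": 2017, "place": "AMS"})

# The edge has properties of its own, which a plain RDF triple cannot carry.
edge = Edge(10, "participates_in", person.id, event.id,
            {"role": "speaker", "time": 30})

if __name__ == "__main__":
    print(person)
    print(event)
    print(edge)
```

Two vertices and one edge are enough here because attributes live inside the elements instead of being separate node-edge-node statements.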
Mapping RDF → PG
● Initial Results:
○ Intra-conversion of graph data models (mapping problem)
○ PoC implementation ready (see GitHub)
● Work in progress:
○ Conversion of properties, blank nodes, etc.
○ Using e.g. Reification, Singleton Property, Hypergraphs, etc.
○ Use case: DBpedia 2016-10 (mapping from .owl & data)
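One simple way to approach the mapping is sketched below: triples whose object is a literal become vertex properties, and triples whose object is a URI become edges. This is a naive illustration under those assumptions, not the LITMUS proof-of-concept; it ignores blank nodes, edge properties, and the reification-style techniques listed above. The function name `rdf_to_pg` and the tuple layout are hypothetical.

```python
def rdf_to_pg(triples):
    """Convert (subject, predicate, object, object_is_literal) tuples
    into a (vertices, edges) pair.

    vertices maps a URI to its property dict (values kept as lists because
    RDF predicates may be multi-valued); edges is a list of (src, label, dst).
    """
    vertices = {}
    edges = []
    for subject, predicate, obj, object_is_literal in triples:
        vertices.setdefault(subject, {})
        if object_is_literal:
            # Literal object: fold the triple into the subject vertex.
            vertices[subject].setdefault(predicate, []).append(obj)
        else:
            # URI object: make sure the target vertex exists, then add an edge.
            vertices.setdefault(obj, {})
            edges.append((subject, predicate, obj))
    return vertices, edges


TRIPLES = [
    ("ex:Person", "ex:speaker", "ex:Event", False),
    ("ex:Person", "ex:name", "Harsh", True),
    ("ex:Person", "ex:place", "ex:Bonn", False),
    ("ex:Person", "ex:age", "27", True),
    ("ex:Event", "ex:name", "Graph Day", True),
    ("ex:Event", "ex:year", "2017", True),
]

if __name__ == "__main__":
    vertices, edges = rdf_to_pg(TRIPLES)
    print(len(TRIPLES), "triples ->", len(vertices), "vertices,", len(edges), "edges")
```

For the six example triples this yields three vertices and two edges, with the four literal-valued triples absorbed as properties.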
C2 - Query Translation
● Yes, we are linguistically diverse, and so are DMSs!
● Each speaks its own dialect:
○ SPARQL, Cypher, Gremlin, etc.
● RDF - SPARQL (W3C ‘08)
● Graph - ??
http://cdn2.wpbeginner.com/wp-content/uploads/2015/02/multilingual-wordpress.jpg
Gremlin Traversal Language
http://www.datastax.com/wp-content/uploads/2015/09/many-to-many-mapping.png
http://www.datastax.com/wp-content/uploads/2015/09/gtm-dataflow.png
Gremlin’s Multi-Graph Query Language (GQL) support
Contd…
Multi-DMS & platform support
https://tinkerpop.apache.org/images/oltp-and-olap.png
RQ2 - What are the semantics-preserving methods/approaches for translating SPARQL queries to a graph query language such as Gremlin?
https://opinionessoftheworld.files.wordpress.com/2013/04/game-of-thrones-daenerys-dragon.jpg
Gremlinator
Me
SPARQL → Gremlin
● C2: Gremlinator - the SPARQL-Gremlin translator (addressing RQ2)
○ Formalizing Gremlin traversals in graph algebra [DEXA ‘17]
○ A novel translation mechanism that maps SPARQL queries to Gremlin pattern matching traversals [Planned submission - EDBT’18]
○ Nested queries are still a challenge (e.g. UNION)
Talk@Graph Day 2017
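The idea of mapping SPARQL triple patterns to Gremlin pattern matching traversals can be illustrated with a toy string generator. This is a hedged sketch, not Gremlinator's algorithm: the function `translate_bgp`, the pattern tuple layout, and the explicit "edge"/"property" tag are assumptions made for illustration. It only emits standard TinkerPop steps (`match`, `as`, `out`, `values`, `has`, `select`) and handles a flat basic graph pattern.

```python
def translate_bgp(patterns, projection):
    """Render triple patterns as a Gremlin match() traversal string.

    Each pattern is (subject_var, kind, key, obj), where kind is "edge" or
    "property" and names starting with "?" are variables. Objects of edge
    patterns are assumed to be variables.
    """
    steps = []
    for subject_var, kind, key, obj in patterns:
        source = subject_var.lstrip("?")
        if kind == "edge":
            # ?s :key ?o  ->  follow an outgoing edge and bind its target.
            target = obj.lstrip("?")
            steps.append(f"__.as('{source}').out('{key}').as('{target}')")
        elif kind == "property" and obj.startswith("?"):
            # ?s :key ?v  ->  bind the property value to a variable.
            target = obj.lstrip("?")
            steps.append(f"__.as('{source}').values('{key}').as('{target}')")
        elif kind == "property":
            # ?s :key "literal"  ->  filter on the property value.
            steps.append(f"__.as('{source}').has('{key}', '{obj}')")
        else:
            raise ValueError(f"unknown pattern kind: {kind!r}")
    selected = ", ".join(f"'{name.lstrip('?')}'" for name in projection)
    return f"g.V().match({', '.join(steps)}).select({selected})"


# SELECT ?e ?y WHERE { ?p :speaker ?e . ?p :name "Harsh" . ?e :year ?y }
QUERY = [
    ("?p", "edge", "speaker", "?e"),
    ("?p", "property", "name", "Harsh"),
    ("?e", "property", "year", "?y"),
]

if __name__ == "__main__":
    print(translate_bgp(QUERY, ["?e", "?y"]))
```

Each triple pattern becomes one `match()` clause, and shared variable names play the role of SPARQL joins. Constructs such as UNION do not fit this flat scheme, which is consistent with nested queries being listed as an open challenge.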
C3 - Metrics/KPIs
RQ3 - What are the strengths and the limitations of the existing KPIs, and to what extent do they reflect the performance of a DMS?
[Figure labels: type of data (RDF graph, property graph; real, synthetic); type of query (linear, star-shaped, snowflake); query response time; precision, recall; DMS index size; DMS configuration]
[11] https://www.tutorialspoint.com/computer_fundamentals/images/primary_memory.jpg
[12] http://s.hswstatic.com/gif/microprocessor-250x150.jpg
Selection of KPIs
● CPU and Memory specific metrics:
○ Perf tool - LITMUS v0.1 (supported)
■ TLB, LLC, instructions, L1 cache, page faults, etc. (18 supported currently)
● Dataset specific metrics:
○ |V|, |E|, Eccentricity, Clustering coefficient, Centrality, etc (in progress)
● Query specific metrics:
○ Type, Length, Response time, Precision, Recall, F1, etc (planned)
● DMS specific:
○ Load time, index time, index size (supported)
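Two of the query-specific metrics listed above, response time and precision/recall/F1, are easy to state in code. The sketch below is a generic illustration and not the LITMUS implementation; `time_query` and `precision_recall_f1` are hypothetical helper names, and `run_query` stands for any zero-argument callable that executes a query against a DMS.

```python
import statistics
import time


def time_query(run_query, repetitions=5):
    """Return (mean, standard deviation) of wall-clock response time in seconds.

    repetitions must be at least 2 so that a standard deviation exists.
    """
    samples = []
    for _ in range(repetitions):
        start = time.perf_counter()
        run_query()
        samples.append(time.perf_counter() - start)
    return statistics.mean(samples), statistics.stdev(samples)


def precision_recall_f1(returned, expected):
    """Compare a returned result set against the expected (gold) result set."""
    returned, expected = set(returned), set(expected)
    true_positives = len(returned & expected)
    precision = true_positives / len(returned) if returned else 0.0
    recall = true_positives / len(expected) if expected else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


if __name__ == "__main__":
    mean, stdev = time_query(lambda: sum(range(100_000)))
    print(f"mean={mean:.6f}s stdev={stdev:.6f}s")
    print(precision_recall_f1({"a", "b", "c"}, {"b", "c", "d", "e"}))
```

Repeating each query and reporting the spread, rather than a single run, is one common way to reduce the influence of caching and background load on the measurement.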
Back in the bigger picture
[Figure: the LITMUS architecture, annotated with challenges C1, C2, and C3]
The LITMUS Test*
PLOTS
FILES
*Please visit our Poster & Demo for Hands on experience & more details in the paper!
Evaluation
● RQs: Publications
● Framework: Continuous integration (v0.1 released, v0.2 planned for Dec ‘17)
○ Reproducing third-party benchmarks
○ Gathering user and expert feedback
○ Going live @Industry:
■ Gremlinator - Apache TinkerPop
■ Further collaboration… Adoption by other projects - LDBC, HOBBIT! :-)
Next Steps
● Framework - LITMUS v0.2 launch (Dec ‘17 - planned)
● DMS module - Adding two more DMSs each
● Dataset module - RDF → PG (Dec ‘17)
● Query module - Integrating Gremlinator
● GUI: Aesthetic GUI (maybe?)
Acknowledgements
Funding: H2020 WDAqua ITN (GA: 642795)
Supervisors & Mentors:
● Prof. Dr. Sören Auer (TIB, DE)
● Prof. Dr. Jens Lehmann (UBO, DE)
● Prof. Dr. Maria-Esther Vidal (TIB, DE)
● Dr. Marko Rodriguez (DataStax & Apache, USA)
Resources
http://wdaqua.eu/
https://github.com/LITMUS-Benchmark-Suite/sparql-to-gremlin
Code : https://github.com/LITMUS-Benchmark-Suite/
Web : https://litmus-benchmark-suite.github.io
Docker : https://hub.docker.com/r/litmusbenchmarksuite/litmus/
LITMUS Benchmark Suite
THANK YOU !
Harsh Thakkar
University of Bonn
Twitter: @harsh9t
LinkedIn: thakkarharsh
E-mail: harsh9t@gmail.com
Questions? Comments?
Insults? Injuries?
EXTRA STUFF
Experiments*
Northwind dataset
● PG - Vertices: 3209, Edges: 6177
● RDF - Triples: 33033
BSBM 1M dataset
● PG - Vertices: 92737, Edges: 238309
● RDF - Triples: 1000313
● CPU: Intel® Xeon® CPU E5-2660 v3 (20 cores @ 2.60 GHz)
● RAM: 128 GB DDR3
● Storage: 512 GB SSD
● OS: Linux 4.2-generic (x86_64)
● DMSs: OpenLink Virtuoso v7.2.4, Apache TinkerGraph-Gremlin v3.2.3

Full-RAG: A modern architecture for hyper-personalizationFull-RAG: A modern architecture for hyper-personalization
Full-RAG: A modern architecture for hyper-personalization
Zilliz
 
20240605 QFM017 Machine Intelligence Reading List May 2024
20240605 QFM017 Machine Intelligence Reading List May 202420240605 QFM017 Machine Intelligence Reading List May 2024
20240605 QFM017 Machine Intelligence Reading List May 2024
Matthew Sinclair
 
Infrastructure Challenges in Scaling RAG with Custom AI models
Infrastructure Challenges in Scaling RAG with Custom AI modelsInfrastructure Challenges in Scaling RAG with Custom AI models
Infrastructure Challenges in Scaling RAG with Custom AI models
Zilliz
 
AI 101: An Introduction to the Basics and Impact of Artificial Intelligence
AI 101: An Introduction to the Basics and Impact of Artificial IntelligenceAI 101: An Introduction to the Basics and Impact of Artificial Intelligence
AI 101: An Introduction to the Basics and Impact of Artificial Intelligence
IndexBug
 

Recently uploaded (20)

RESUME BUILDER APPLICATION Project for students
RESUME BUILDER APPLICATION Project for studentsRESUME BUILDER APPLICATION Project for students
RESUME BUILDER APPLICATION Project for students
 
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdfUnlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
 
Best 20 SEO Techniques To Improve Website Visibility In SERP
Best 20 SEO Techniques To Improve Website Visibility In SERPBest 20 SEO Techniques To Improve Website Visibility In SERP
Best 20 SEO Techniques To Improve Website Visibility In SERP
 
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with SlackLet's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
 
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAUHCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
 
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...
 
Removing Uninteresting Bytes in Software Fuzzing
Removing Uninteresting Bytes in Software FuzzingRemoving Uninteresting Bytes in Software Fuzzing
Removing Uninteresting Bytes in Software Fuzzing
 
HCL Notes and Domino License Cost Reduction in the World of DLAU
HCL Notes and Domino License Cost Reduction in the World of DLAUHCL Notes and Domino License Cost Reduction in the World of DLAU
HCL Notes and Domino License Cost Reduction in the World of DLAU
 
Programming Foundation Models with DSPy - Meetup Slides
Programming Foundation Models with DSPy - Meetup SlidesProgramming Foundation Models with DSPy - Meetup Slides
Programming Foundation Models with DSPy - Meetup Slides
 
TrustArc Webinar - 2024 Global Privacy Survey
TrustArc Webinar - 2024 Global Privacy SurveyTrustArc Webinar - 2024 Global Privacy Survey
TrustArc Webinar - 2024 Global Privacy Survey
 
UiPath Test Automation using UiPath Test Suite series, part 6
UiPath Test Automation using UiPath Test Suite series, part 6UiPath Test Automation using UiPath Test Suite series, part 6
UiPath Test Automation using UiPath Test Suite series, part 6
 
“I’m still / I’m still / Chaining from the Block”
“I’m still / I’m still / Chaining from the Block”“I’m still / I’m still / Chaining from the Block”
“I’m still / I’m still / Chaining from the Block”
 
Video Streaming: Then, Now, and in the Future
Video Streaming: Then, Now, and in the FutureVideo Streaming: Then, Now, and in the Future
Video Streaming: Then, Now, and in the Future
 
Introduction to CHERI technology - Cybersecurity
Introduction to CHERI technology - CybersecurityIntroduction to CHERI technology - Cybersecurity
Introduction to CHERI technology - Cybersecurity
 
Building Production Ready Search Pipelines with Spark and Milvus
Building Production Ready Search Pipelines with Spark and MilvusBuilding Production Ready Search Pipelines with Spark and Milvus
Building Production Ready Search Pipelines with Spark and Milvus
 
Essentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FMEEssentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FME
 
Full-RAG: A modern architecture for hyper-personalization
Full-RAG: A modern architecture for hyper-personalizationFull-RAG: A modern architecture for hyper-personalization
Full-RAG: A modern architecture for hyper-personalization
 
20240605 QFM017 Machine Intelligence Reading List May 2024
20240605 QFM017 Machine Intelligence Reading List May 202420240605 QFM017 Machine Intelligence Reading List May 2024
20240605 QFM017 Machine Intelligence Reading List May 2024
 
Infrastructure Challenges in Scaling RAG with Custom AI models
Infrastructure Challenges in Scaling RAG with Custom AI modelsInfrastructure Challenges in Scaling RAG with Custom AI models
Infrastructure Challenges in Scaling RAG with Custom AI models
 
AI 101: An Introduction to the Basics and Impact of Artificial Intelligence
AI 101: An Introduction to the Basics and Impact of Artificial IntelligenceAI 101: An Introduction to the Basics and Impact of Artificial Intelligence
AI 101: An Introduction to the Basics and Impact of Artificial Intelligence
 

Semantics 2017 - Trying Not to Die Benchmarking using LITMUS

  • 1. Trying Not to Die Benchmarking using LITMUS. Harsh Thakkar (1), Yashwant Keswani (2), Mohnish Dubey (1), Jens Lehmann (1,3), Sören Auer (4). (1) University of Bonn, Bonn, Germany; (2) DA-IICT, Gandhinagar, India; (3) Fraunhofer IAIS, St. Augustin, Germany; (4) TIB, Hannover, Germany. Semantics 2017, Amsterdam, Netherlands, September 13.
  • 2. Semantics 2017 - Amsterdam - Nederland - September 13 - Trying Not to Die Benchmarking... - Harsh Thakkar - University of Bonn. Outline ● Motivation ● Problem Statement ● State of the Art ● Approach - LITMUS Benchmark Suite ● Challenges ● Evaluation Plan ● Next Steps
  • 3. Motivation. Ocean of Data (real and synthetic; LOD Cloud 2017, http://lod-cloud.net/versions/2017-02-20/lod.png) + Sea of Tools (K-V stores, graph stores, document-oriented stores, RDF stores, wide column stores; e.g. RDF-3X).
  • 4. Motivation • Domain-specific applications, i.e. perspectives • Choice overload! • Vendors • Researchers • Users. Image: https://steemit.com/philosophy/@l0k1/subjectivity-and-truth-how-blockchains-model-consensus-building
  • 5. Benchmarking ● Tedious! ● Needs domain-specific expertise ● Lack of standardization (single focus) ○ Open software, system configuration settings, etc. ● Near-zero reusability ● Guaranteeing a fair benchmark is difficult! ● Choosing the right performance metrics is cumbersome and subjective ● Visualising benchmark results. Image: [6] http://2.bp.blogspot.com/-TkUb0TPN7IA/VewUHm_jVaI/AAAAAAAABgM/vZILnZNJv5A/s1600/2012-10-16-subjective-objective.jpg
  • 6. Problem Statement “How can diverse cross-domain DMSs be benchmarked in an automated established * standard # environment?”
  • 7. State of the Art. Benchmark efforts for relational, RDF, and graph DMSs: TPC [H,C,E,DS] [13], XGDBench [6], HPC [7], Graph 500 [12], DBPSB [11], LUBM [9], IGUANA [19], WatDiv [1], SP2Bench [20], BSBM [4], Pandora*, Graphium [8], LDBC [2], HOBBIT**. The slide groups these into single-domain and cross-domain benchmarks. *http://pandora.ldc.usb.ve/ **https://project-hobbit.eu/
  • 8. LITMUS Benchmark Suite
  • 9. The LITMUS architecture (figure). Data Facet (F1): Dataset 1, Dataset 2, Dataset 3, ..., Dataset N, with a data integration module. Query Facet (F2): Queryset 1, Queryset 2, Queryset 3, ..., Queryset M, with a query conversion module. System Facet (F3): system configuration & integration module for RDF stores, graph stores, relational DBs, wide column stores, and key value stores. Benchmarking Core: Controller & Tester, Profiler, Analyzer. User Interface (F4): User. Source: Thakkar, Harsh. "Towards an Open Extensible Framework for Empirical Benchmarking of Data Management Solutions: LITMUS." ESWC, 2017.
  • 10. Challenges ● Core challenges in developing such an open, extensible, FAIR framework? ○ C1 - Data Conversion ○ C2 - Query Translation ○ C3 - Key Performance Indicators (KPIs). Image: http://media.thinkadvisor.com/lifehealthpro/article/2015/02/24/challenge.jpg
  • 11. C1 - Data Conversion ● Different data models ○ RDF graph ○ Property graph ● To conduct a fair benchmark, conversion is needed ● A DMS performs best on its natively supported data model. Figure labels: lots of data (real, synthetic) as RDF graph and as property graph. RQ1 - What are the methods to convert RDF into the Property Graph data model?
  • 12. RDF Data Model ● RDF is a triple-based graph model, where: ○ Subject: URI or blank node ○ Predicate: URI (a property) ○ Object: URI, literal, or blank node. URI (IRI) = Uniform Resource Identifier, analogous to an ISBN for books; literals = data values; blank nodes = descriptions of entities that don't need to be named. Representation (@prefix ex: <http://example.org>): ex:Person ex:speaker ex:Event; ex:Person ex:name “Harsh”; ex:Person ex:place ex:Bonn; ex:Person ex:age “27”; ex:Event ex:name “Graph Day”; ex:Event ex:Year “2017”. Interpretation (figure): a graph with nodes ex:Person and ex:Event linked by ex:speaker (ex:time “30”); ex:Person has ex:name “Harsh”, ex:age “27”, ex:place ex:Bonn; ex:Event has ex:name “Semantics”, ex:year “2017”, ex:place ex:AMS.
  • 13. RDF Graphs (RDFGs) ● Edge-labelled, directed multi-graphs (with entity URIs, blank nodes, literals) ● Going from information to knowledge using OWL (DLs) and ontologies (RDFS, RDFa, etc.) ● Bulky ○ Everything is a node-edge-node (edges don't have properties) ○ More relationships per node → more triples in total!
  • 14. Property Graph Data Model ● Edge-labelled, directed, attributed multi-graph ● Vertices and edges both have properties ● Main components: ○ Vertices, edges (source, destination), properties (key-value pairs), labels (strings) ● Super neat (compact), super cute ● Easier to add weighted, reified edges ● Query languages - CYPHER, Gremlin, PGQL, etc. Figure: a Person vertex (Name: Harsh, Age: 27, Place: Bonn) connected to an Event vertex (Name: Semantics, Year: 2017, Place: AMS) by an edge (Role: speaker, Time: 30).
  • 15. Mapping RDF → PG ● Initial Results: ○ Intra-conversion of graph data models (mapping problem) ○ PoC implementation ready (see GitHub) ● Work in progress: ○ Conversion of properties, blank nodes, etc. ○ Using e.g. Reification, Singleton Property, Hypergraphs, etc. ○ Use case: DBpedia 2016-10 (mapping from .owl & data)
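The mapping idea can be illustrated with a deliberately naive Python sketch. It is not the PoC implementation mentioned on the slide; it only assumes one simple rule set: a triple whose object is a literal becomes a vertex property, and a triple whose object is a URI becomes a labelled edge. The `PropertyGraph` class, the `rdf_to_pg` function, and the convention of marking literals with double quotes are all hypothetical names chosen for this example. Blank nodes, multi-valued properties, and edge properties (the hard cases listed under "work in progress") are not handled.

```python
from dataclasses import dataclass, field


@dataclass
class PropertyGraph:
    # vertex id -> {property key: value}
    vertices: dict = field(default_factory=dict)
    # list of (source id, edge label, target id)
    edges: list = field(default_factory=list)


def is_literal(term):
    # Convention of this sketch only: literals are wrapped in double quotes.
    return term.startswith('"') and term.endswith('"')


def rdf_to_pg(triples):
    """Naive RDF -> property graph mapping (assumed rules, see lead-in)."""
    pg = PropertyGraph()
    for s, p, o in triples:
        pg.vertices.setdefault(s, {})
        if is_literal(o):
            # Literal object: store as a property on the subject vertex.
            # A second value for the same key would overwrite the first.
            pg.vertices[s][p] = o.strip('"')
        else:
            # URI object: create the target vertex and a labelled edge.
            pg.vertices.setdefault(o, {})
            pg.edges.append((s, p, o))
    return pg


# Triples adapted from the RDF Data Model slide.
triples = [
    ("ex:Person", "ex:speaker", "ex:Event"),
    ("ex:Person", "ex:name", '"Harsh"'),
    ("ex:Person", "ex:place", "ex:Bonn"),
    ("ex:Person", "ex:age", '"27"'),
    ("ex:Event", "ex:name", '"Graph Day"'),
    ("ex:Event", "ex:year", '"2017"'),
]

pg = rdf_to_pg(triples)
print(len(triples), "triples ->", len(pg.vertices), "vertices,", len(pg.edges), "edges")
```

Under these rules the six triples collapse into three vertices and two edges, which is the compactness argument made on the Property Graph slide.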
  • 16. C2 - Query Translation ● Yes, we are linguistically diverse, and so are DMSs! ● And with different dialects, too: ○ SPARQL, CYPHER, Gremlin, etc. ● RDF - SPARQL (W3C ‘08) ● Graph - ?? Image: http://cdn2.wpbeginner.com/wp-content/uploads/2015/02/multilingual-wordpress.jpg
  • 17. Gremlin Traversal Language. Gremlin’s Multi-Graph Query Language (GQL) support. Images: http://www.datastax.com/wp-content/uploads/2015/09/many-to-many-mapping.png http://www.datastax.com/wp-content/uploads/2015/09/gtm-dataflow.png
  • 18. Contd… Multi-DMS & platform support. Image: https://tinkerpop.apache.org/images/oltp-and-olap.png RQ2 - What are the semantics-preserving methods/approaches for translating SPARQL queries to a graph query language such as Gremlin?
  • 19. Image labelled "Gremlinator" and "Me": https://opinionessoftheworld.files.wordpress.com/2013/04/game-of-thrones-daenerys-dragon.jpg
  • 20. SPARQL → Gremlin (addressing RQ2) ● C2: Gremlinator - the SPARQL-Gremlin translator ○ Formalizing Gremlin traversals in graph algebra [DEXA ‘17] ○ A novel translation mechanism that maps SPARQL queries to Gremlin pattern matching traversals [planned submission - EDBT ’18] ○ Nested queries are still a challenge (i.e. UNION). Talk @ Graph Day 2017.
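To give a feel for what "mapping SPARQL queries to Gremlin pattern matching traversals" can look like, here is a toy Python sketch that renders a list of triple patterns as a Gremlin `match()` traversal string. It is not Gremlinator's algorithm: the three rewriting rules, the function names `pattern_to_step` and `translate`, and the tuple encoding of triple patterns are assumptions made for this example, and only basic graph patterns are covered (no UNION, FILTER, or OPTIONAL).

```python
def pattern_to_step(subject, predicate, obj, edge_labels):
    """Render one triple pattern as a clause of a Gremlin match() step.

    Assumed toy rules:
      ?s <edge label> ?o    -> traverse an outgoing edge
      ?s <property key> ?o  -> bind the property value to a variable
      ?s <property key> lit -> filter on the property value
    """
    s = subject.lstrip("?")
    if obj.startswith("?"):
        o = obj.lstrip("?")
        if predicate in edge_labels:
            return f"__.as('{s}').out('{predicate}').as('{o}')"
        return f"__.as('{s}').values('{predicate}').as('{o}')"
    return f"__.as('{s}').has('{predicate}', '{obj}')"


def translate(select_vars, patterns, edge_labels):
    """Build a Gremlin traversal string from SELECT variables and patterns."""
    steps = ", ".join(pattern_to_step(s, p, o, edge_labels) for s, p, o in patterns)
    projection = ", ".join(f"'{v.lstrip('?')}'" for v in select_vars)
    return f"g.V().match({steps}).select({projection})"


# Rough equivalent of:
#   SELECT ?n WHERE { ?p speaker ?e . ?e name "Semantics" . ?p name ?n }
patterns = [
    ("?p", "speaker", "?e"),
    ("?e", "name", "Semantics"),
    ("?p", "name", "?n"),
]
query = translate(["?n"], patterns, edge_labels={"speaker"})
print(query)
```

The translator needs to know which predicates are edge labels and which are property keys, because the property graph model distinguishes the two while RDF does not; here that knowledge is passed in explicitly as `edge_labels`.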
  • 21. C3 - Metrics/KPIs. RQ3 - What are the strengths and the limitations of the existing KPIs, and to what extent do they reflect the performance of a DMS? Figure labels: type of data (RDF graph, property graph; real, synthetic); type of query (linear, star-shaped, snowflake); DMS configuration; DMS index size; query response time; precision, recall. Images: [11] https://www.tutorialspoint.com/computer_fundamentals/images/primary_memory.jpg [12] http://s.hswstatic.com/gif/microprocessor-250x150.jpg
  • 22. Selection of KPIs ● CPU- and memory-specific metrics: ○ Perf tool - LITMUS v0.1 (supported) ■ TLB, LLC, instructions, L1 cache, page faults, etc. (18 supported currently) ● Dataset-specific metrics: ○ |V|, |E|, eccentricity, clustering coefficient, centrality, etc. (in progress) ● Query-specific metrics: ○ Type, length, response time, precision, recall, F1, etc. (planned) ● DMS-specific metrics: ○ Load time, index time, index size (supported)
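The query-specific correctness metrics listed above (precision, recall, F1) have standard definitions that can be sketched in a few lines of Python. This is an illustration only, not LITMUS code: the function name `precision_recall_f1` and the idea of comparing a DMS's returned result set against a reference result set are assumptions of this example.

```python
def precision_recall_f1(returned, expected):
    """Compare the results a DMS returned with a reference result set."""
    returned, expected = set(returned), set(expected)
    true_positives = len(returned & expected)
    # Guard against empty sets to avoid division by zero.
    precision = true_positives / len(returned) if returned else 0.0
    recall = true_positives / len(expected) if expected else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical result sets for one query.
returned = {"a", "b", "c"}
expected = {"b", "c", "d", "e"}
precision, recall, f1 = precision_recall_f1(returned, expected)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```

Such a check matters most when queries are translated between languages, since a translated query that runs fast but returns a different result set is not a fair comparison.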
  • 23. Back in the bigger picture: the LITMUS architecture figure (Data Facet F1, Query Facet F2, System Facet F3, User Interface F4, Benchmarking Core) annotated with the challenges C1, C2, and C3.
  • 24. The LITMUS Test* (outputs: plots and files). *Please visit our Poster & Demo for hands-on experience; more details are in the paper!
  • 25. Evaluation ● RQs: Publications ● Framework: Continuous integration (v0.1 released, v0.2 planned Dec ‘17) ○ Reproducing third-party benchmarks ○ Gathering feedback from users and experts ○ Going live @Industry: ■ Gremlinator - Apache TinkerPop ■ Further collaboration… Adoption by other projects - LDBC, HOBBIT! :-)
  • 26. Next Steps ● Framework - LITMUS v0.2 launch (Dec ‘17 - planned) ● DMS module - Adding two more DMSs each ● Dataset module - RDF → PG (Dec ‘17) ● Query module - Integrating Gremlinator ● GUI - Aesthetic GUI (maybe?)
  • 27. Acknowledgements. Funding: H2020 WDAqua ITN (GA: 642795). Supervisors & Mentors: Prof. Dr. Soeren Auer (TIB, DE); Prof. Dr. Jens Lehmann (UBO, DE); Prof. Dr. Maria-Esther Vidal (TIB, DE); Dr. Marko Rodriguez (DataStax & Apache, USA).
  • 28. Resources. LITMUS Benchmark Suite - Code: https://github.com/LITMUS-Benchmark-Suite/ Web: https://litmus-benchmark-suite.github.io Docker: https://hub.docker.com/r/litmusbenchmarksuite/litmus/ SPARQL-Gremlin translator: https://github.com/LITMUS-Benchmark-Suite/sparql-to-gremlin Project: http://wdaqua.eu/
  • 29. THANK YOU ! Harsh Thakkar University of Bonn Twitter: @harsh9t LinkedIn: thakkarharsh E-mail: harsh9t@gmail.com Questions? Comments? Insults? Injuries?
  • 30. EXTRA STUFF
  • 31. Experiments* Northwind dataset ● PG - Vertices: 3209, Edges: 6177 ● RDF - Triples: 33033. BSBM 1M dataset ● PG - Vertices: 92737, Edges: 238309 ● RDF - Triples: 1000313. Setup: CPU: Intel® Xeon® CPU E5-2660 v3 (20 cores @2.60GHz), RAM: 128 GB DDR3, HDD: 512 GB SSD, OS: Linux 4.2-generic (x86_64). Systems: Openlink Virtuoso v7.2.4, Apache TinkerGraph-Gremlin v3.2.3.