Knowledge graphs have become popular over the past decade and frequently rely on the Resource Description Framework (RDF) or property graphs as their data model. We present Gremlinator, the first translator from SPARQL (the W3C standardised query language for RDF) to Gremlin (a popular property graph traversal language). Gremlinator translates SPARQL queries into Gremlin path traversals for executing graph pattern matching queries over graph databases.
This allows a user who is well versed in SPARQL to access and query a wide variety of Graph Data Management Systems (DMSs) without the steep learning curve of adapting to a new Graph Query Language (GQL). Gremlin is a traversal language that is agnostic to the underlying graph computing system (covering both OLTP graph databases and OLAP graph processors), which makes it a desirable choice for supporting interoperability when querying Graph DMSs.
1. SPARQL querying of Graph Databases
using Gremlin: Gremlinator
Harsh Thakkar
University of Bonn
Graph Day 2017 - San Francisco - USA - June 17
2. Graph Day 2017 - San Francisco - USA - June 17 Gremlinator - Harsh Thakkar - University of Bonn
WDAqua ITN - http://wdaqua.eu/
● Answering Questions using Web Data (WDAqua) is an EU H2020 Marie Skłodowska-Curie Actions Innovative Training Network (ITN)
● 15 PhD students across Europe
● Advancing SotA in data-driven Question Answering
● Reusable and extensible components for QA
3.
Outline
● #Motivation 1 - LITMUS Benchmark Suite
● #Motivation 2 - RDF graphs vs. Property graphs
● Gremlinator
5.
Motivation #1 - Benchmarking
[Figure: an "Ocean of Data" (real and synthetic datasets; LOD Cloud 2017, http://lod-cloud.net/versions/2017-02-20/lod.png) plus a "Sea of Tools" (K-V stores, graph stores, doc-oriented stores, RDF stores, wide column stores; labels include RDF-3X).]
6.
http://mattturck.com/big-data-landscape-2016-v18-final/
7.
Contd…
• Domain-specific applications, i.e. perspectives
https://steemit.com/philosophy/@l0k1/subjectivity-and-truth-how-blockchains-model-consensus-building
8.
Benchmark… Manually?
Phewww.. that’s tedious!
Fairness?
Standardization? Reusability?
Extensibility?
9.
LITMUS Benchmark Suite
https://kerniebemp.files.wordpress.com/2013/03/rubegoldberg.jpg
10.
The LITMUS architecture
[Figure: block diagram of LITMUS. Facets: Data Facet (F1), Query Facet (F2), System Facet (F3), User Interface (F4). Inputs: Dataset 1 ... Dataset N and Queryset 1 ... Queryset M. Modules: Data integration module, Query conversion module, System configuration & integration module. Components: Benchmarking Core, Controller & Tester, Profiler, Analyzer. Target systems: RDF stores, graph stores, relational DBs, wide column stores, key-value stores. Actor: User.]
Thakkar, Harsh. "Towards an Open Extensible Framework for Empirical Benchmarking of Data Management Solutions: LITMUS." European Semantic Web Conference. Springer, 2017.
11.
Challenges
• Core challenges in developing such an open, extensible FAIR framework?
• Data Representation
• Query Conversion
• Key Performance Indicators (KPIs)
http://media.thinkadvisor.com/lifehealthpro/article/2015/02/24/challenge.jpg
12.
Query Translation
• Yes, we are linguistically diverse, and so are DMSs!
• That too with different dialects:
• SPARQL, CYPHER, Gremlin, etc.
• Many, many graph-based DMSs!
• Is there one GQL to cover all Graph DMSs*?
http://cdn2.wpbeginner.com/wp-content/uploads/2015/02/multilingual-wordpress.jpg
13.
Gremlin
http://www.datastax.com/wp-content/uploads/2015/09/many-to-many-mapping.png
http://www.datastax.com/wp-content/uploads/2015/09/gtm-dataflow.png
Gremlin’s Multi-Graph Query Language (GQL) support
14.
Contd…
Multi-DMS & platform support
https://tinkerpop.apache.org/images/oltp-and-olap.png
# Motivation 1
15.
https://opinionessoftheworld.files.wordpress.com/2013/04/game-of-thrones-daenerys-dragon.jpg
[Figure labels: Gremlinator, Me]
16.
Outline
● WDAqua ITN
● #Motivation 1 - LITMUS Benchmark Suite
● #Motivation 2 - RDF graphs vs. Property graphs
● Gremlinator
17.
Do I need a Graph Database? If so, what kind? (Juan Sequeda)
Earlier today
18.
Motivation #2 - Graph Data Models
● Graphs are intuitive formalisms
● Represent complex natural and man-made networks (genes, social networks)
● Data is increasingly (and highly) connected
● Relationships within the data are an integral part of it
● Index-free, schema-free
● Mathematical foundation
● RDF and Property Graph are the two most popular graph data models
19.
RDF Data Model
● RDF = Resource Description Framework
● W3C Recommendation for data modeling and encoding machine-readable content on the Web
● Data Model:
○ encodes structured information
○ universal, machine-readable interchange format (Serializations - .nt, .xml, .ttl, etc)
○ data is structured in the form of graphs (RDF graphs)
20.
RDF Data Model
● RDF is a triple-based graph model, where:
○ Subject: URI, Blank node
○ Predicate: URI -> property
○ Object: URI, Literal, Blank node
URI* = Uniform Resource Identifier, analogous to an ISBN for books
Literals = data values
Blank nodes = descriptions of entities that don’t need to be named
* more generally, IRIs
[Figure: the example below drawn as a graph, with the two views labelled "representation" and "interpretation". Nodes: ex:Person, ex:Event, ex:Bonn, ex:SF. Literals: "Harsh", "27", "Graph Day", "2017", "40". Edge labels: ex:Speaker, ex:name, ex:age, ex:place, ex:year, ex:stime.]
@prefix ex: <http://example.org/> .
ex:Person ex:Speaker ex:Event .
ex:Person ex:name "Harsh" .
ex:Person ex:place ex:Bonn .
ex:Person ex:age "27" .
ex:Event ex:name "Graph Day" .
ex:Event ex:year "2017" .
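The triples on this slide can be held in memory as plain tuples. The following is a minimal Python sketch, not tied to any RDF library: the `Literal` wrapper and the `objects` and `literal_edges` helpers are names made up for illustration, and the `ex:` terms are kept in their abbreviated form.

```python
from typing import NamedTuple


class Literal(NamedTuple):
    """A data value; anything that is not a Literal is treated as a URI here."""
    value: str


# The six triples from the slide as (subject, predicate, object) tuples.
TRIPLES = [
    ("ex:Person", "ex:Speaker", "ex:Event"),
    ("ex:Person", "ex:name", Literal("Harsh")),
    ("ex:Person", "ex:place", "ex:Bonn"),
    ("ex:Person", "ex:age", Literal("27")),
    ("ex:Event", "ex:name", Literal("Graph Day")),
    ("ex:Event", "ex:year", Literal("2017")),
]


def objects(subject, predicate):
    """Return every object o for which (subject, predicate, o) is in the graph."""
    return [o for s, p, o in TRIPLES if s == subject and p == predicate]


def literal_edges():
    """Triples whose object is a data value rather than another node."""
    return [t for t in TRIPLES if isinstance(t[2], Literal)]


print(objects("ex:Person", "ex:name"))
print(len(literal_edges()), "of", len(TRIPLES), "triples point at literals")
```

Every fact, including a simple attribute such as a name, costs one full triple in this representation.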
21.
RDF Graphs (RDFGs)
● Edge-labelled, directed multi-graphs (with entity URIs, blank nodes and literals)
● Going from information to knowledge using OWL (DLs) and ontologies (RDFS, RDFa, etc.)
● Bulky
○ Everything is a node-edge-node (edges don't have properties)
○ More relationships per node -> more triples in total!
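Because an RDF edge is just a predicate, attaching a value to the relationship itself takes extra triples. The sketch below uses the standard RDF reification vocabulary (rdf:Statement, rdf:subject, rdf:predicate, rdf:object) to illustrate this. The helper name `reify`, the blank node label `_:s1` and the `ex:time` property are assumptions made for this example.

```python
def reify(triple, statement_id, extra):
    """Describe `triple` as an rdf:Statement node and hang extra properties on it."""
    s, p, o = triple
    described = [
        (statement_id, "rdf:type", "rdf:Statement"),
        (statement_id, "rdf:subject", s),
        (statement_id, "rdf:predicate", p),
        (statement_id, "rdf:object", o),
    ]
    # Each "edge property" becomes one more triple about the statement node.
    described += [(statement_id, key, value) for key, value in extra.items()]
    return described


plain = ("ex:Person", "ex:Speaker", "ex:Event")
with_time = reify(plain, "_:s1", {"ex:time": "40"})

for t in with_time:
    print(t)
```

One relationship with one attribute grows from a single triple to five, which is one way to read the "bulky" remark above.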
22.
Example - Modern Graph feat. RDFG
23.
SPARQL - The RDF Query Language
● SPARQL Protocol And RDF Query Language
● Declarative Graph Pattern Matching (GPM) query language for RDF graphs
● De facto query language of RDF stores, W3C standard since Jan 2008
● High importance in the semantic web: querying knowledge graphs, QA
○ SPARQL is to the Semantic Web (and the web in general) what SQL is to relational DBs
● Main components:
○ Graph patterns (WHERE) [BGPs {s p o .}, GGPs {...}, CGPs {ops}]
○ Prefixes - abbreviate URIs
○ Query result from range of variables (SELECT)
24.
Ex: SPARQL query
PREFIX a: <http://abc.com/prty>
SELECT DISTINCT ?x
WHERE { ?x a:Created ?y . }
The WHERE clause holds a single BGP, i.e. the pattern to be matched.
Output:
?x
<http://abc.com/node/4>
<http://abc.com/node/2>
<http://abc.com/node/5>
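How a BGP is matched can be sketched in a few lines of Python. This is a toy nested-loop evaluator, not how an RDF store actually executes SPARQL. The triples are hypothetical (only the three subject nodes come from the slide's output) and the function names are invented for illustration.

```python
CREATED = "a:Created"

# Hypothetical data: only the three subjects are taken from the slide's output.
TRIPLES = [
    ("<http://abc.com/node/4>", CREATED, "<http://abc.com/node/3>"),
    ("<http://abc.com/node/2>", CREATED, "<http://abc.com/node/3>"),
    ("<http://abc.com/node/5>", CREATED, "<http://abc.com/node/1>"),
    ("<http://abc.com/node/4>", CREATED, "<http://abc.com/node/1>"),
    ("<http://abc.com/node/1>", "a:Knows", "<http://abc.com/node/2>"),
]


def match_pattern(pattern, triple, binding):
    """Extend `binding` so that `pattern` equals `triple`, or return None."""
    extended = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):      # variable: bind it or check the old binding
            if extended.setdefault(term, value) != value:
                return None
        elif term != value:           # constant: must match exactly
            return None
    return extended


def evaluate_bgp(patterns, triples):
    """Join the triple patterns of a BGP left to right (nested-loop join)."""
    bindings = [{}]
    for pattern in patterns:
        next_bindings = []
        for binding in bindings:
            for triple in triples:
                extended = match_pattern(pattern, triple, binding)
                if extended is not None:
                    next_bindings.append(extended)
        bindings = next_bindings
    return bindings


def select_distinct(variable, bindings):
    """Project one variable and drop duplicates, keeping first-seen order."""
    return list(dict.fromkeys(b[variable] for b in bindings))


result = select_distinct("?x", evaluate_bgp([("?x", CREATED, "?y")], TRIPLES))
print(result)
```

Adding a second pattern to the list passed to `evaluate_bgp` turns the match into a join on the shared variable, which is where the join cost of larger SPARQL queries comes from.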
25.
Property Graph Data Model
● Edge-labelled, directed, attributed multi-graph
● Vertices and edges both have properties
● Main components:
○ Vertices, edges (source, destination), properties (key-value pairs), labels (strings)
● Super neat (compact), super cute
● Easier to add weighted, reified edges
● Query languages - CYPHER, Gremlin, PGQL, etc.
[Figure: example property graph. Vertex Person (Name: Harsh, Age: 27, From: Bonn, DE); vertex Event (Name: Graph Day, Year: 2017, Place: SF, USA); edge between them (Role: speaker, Time: 40).]
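The components listed above map naturally onto two small record types. The following Python sketch is only an illustration of the data model, not the storage layout of any graph database. The class names, the numeric ids and the edge label `participates_in` are assumptions; the slide shows only the edge's properties.

```python
from dataclasses import dataclass, field


@dataclass
class Vertex:
    id: int
    label: str
    properties: dict = field(default_factory=dict)


@dataclass
class Edge:
    id: int
    label: str
    src: int          # id of the source vertex
    dst: int          # id of the destination vertex
    properties: dict = field(default_factory=dict)


person = Vertex(1, "Person", {"Name": "Harsh", "Age": 27, "From": "Bonn, DE"})
event = Vertex(2, "Event", {"Name": "Graph Day", "Year": 2017, "Place": "SF, USA"})
# The edge label "participates_in" is hypothetical; the slide only shows its properties.
talk = Edge(3, "participates_in", person.id, event.id, {"Role": "speaker", "Time": 40})

vertices = {v.id: v for v in (person, event)}
edges = [talk]


def out_neighbours(vertex_id, label):
    """Follow outgoing edges with the given label and return the target vertices."""
    return [vertices[e.dst] for e in edges if e.src == vertex_id and e.label == label]


print(out_neighbours(person.id, "participates_in")[0].properties["Name"])
print(talk.properties)
```

The whole example fits in three elements (two vertices and one edge), with the role and time stored directly on the edge.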
26.
Example - PG viz
RDF
PG
27.
Gremlin - a PG Query Language
● A graph traversal language and machine (like Java and the JVM)
● Offers both declarative (GPM) and imperative (traversal) constructs
● Supports both OLTP (graph DBs) and OLAP (graph processors)
● Popular for querying property graphs
More info: see slides #13, #14
28.
Ex: Gremlin query
1. g.V().in("Created").dedup()
// imperative
2. g.V().match(__.as('x').out('Created').as('y')).select('x').dedup()
// declarative - GPM
Equivalent SPARQL:
PREFIX a: <http://abc.com/prty>
SELECT DISTINCT ?x
WHERE { ?x a:Created ?y . }
Output:
==>x:v[4]
==>x:v[2]
==>x:v[5]
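The difference between the two Gremlin styles can be mimicked in plain Python. This is a sketch over a hypothetical edge list, not TinkerPop's execution model: `imperative` walks vertices and steps over incoming edges, while `declarative` builds variable bindings for the pattern and projects one of them.

```python
# Hypothetical edge list (source, label, target); ids 2, 4 and 5 mirror the slide output.
EDGES = [
    (4, "Created", 3),
    (2, "Created", 3),
    (5, "Created", 1),
    (4, "Created", 1),
    (1, "Knows", 2),
]
VERTICES = [1, 2, 3, 4, 5]


def dedup(items):
    """Drop repeats while keeping first-seen order."""
    return list(dict.fromkeys(items))


def imperative():
    """Like g.V().in("Created").dedup(): visit every vertex, step over incoming edges."""
    reached = []
    for v in VERTICES:
        reached += [src for src, label, dst in EDGES if dst == v and label == "Created"]
    return dedup(reached)


def declarative():
    """Like match(as('x').out('Created').as('y')).select('x').dedup(): bind, then project."""
    bindings = [{"x": src, "y": dst} for src, label, dst in EDGES if label == "Created"]
    return dedup(b["x"] for b in bindings)


print(sorted(imperative()), sorted(declarative()))
```

Both functions return the same set of vertices; only the order in which they are discovered differs.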
29.
SPARQL vs. Gremlin Queries
SPARQL:
● Linear, standard graph pattern queries (exact match)
● Fewer joins - less complexity - better performance
● Benefit from macro indices
○ SPO, POS, etc.
Gremlin:
● Multi-hop, neighbourhood, star/snowflake, traversal queries (fuzzy match)
● Everything is a path - better performance
● Benefit from micro-indices
Motivation #2 - So why not benefit from Gremlin for complex/multi-hop/path queries?
30.
SPARQL vs. Gremlin Queries
SPARQL:
● Lacks looping, branching and graph analytical queries (OLAP)
● Supports federated querying
Gremlin:
● Supports looping, branching and graph analytical queries (OLAP)
● No federated querying
Motivation #2 - So why not benefit from Gremlin for complex/multi-hop/path queries?
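As an illustration of what "looping" buys in a traversal language, the Python sketch below repeats an out-step a fixed number of times and, separately, loops until nothing new is reached. It loosely mirrors the idea behind Gremlin's repeat step but is not TinkerPop code. The graph and the function names are hypothetical.

```python
# Hypothetical directed graph as an adjacency list.
GRAPH = {
    "a": ["b", "c"],
    "b": ["d"],
    "c": ["d"],
    "d": ["e"],
    "e": [],
}


def repeat_out(start, times):
    """Apply the out-step `times` times, keeping one traverser per path."""
    frontier = [start]
    for _ in range(times):
        frontier = [nxt for v in frontier for nxt in GRAPH[v]]
    return frontier


def reachable(start):
    """Loop until no new vertex appears: everything reachable from `start`."""
    seen, frontier = set(), [start]
    while frontier:
        v = frontier.pop()
        for nxt in GRAPH[v]:
            if nxt not in seen:
                seen.add(nxt)
                frontier.append(nxt)
    return seen


print(repeat_out("a", 2))
print(sorted(reachable("a")))
```

`repeat_out("a", 2)` yields "d" twice because two distinct paths lead there, which matches the "everything is a path" view of traversals.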
32.
Outline
● WDAqua ITN
● #Motivation 1 - LITMUS Benchmark Suite
● #Motivation 2 - RDF graphs vs. Property graphs
● Gremlinator
33.
SPARQL ➞ Gremlin analogy
SPARQL:
● Query based on graph pattern [declarative]
Gremlin:
● Provides GPM construct via .match() steps [declarative]
[Figure: property graph of the project history. Person (Name: Daniel Kuppitz, ….) -Implements-> Software (Name: SPARQL-Gremlin 0.1, Lang: Java) <-Extends- Person (Name: Harsh Thakkar, ….). Edge properties: Year: 2015, Year: 2016. Captions: Inception, Result.]
34.
Contd…
SPARQL and Gremlin language constructs and corresponding operators
35.
Gremlinator architecture
36.
BGP (q) ➞ SST* ( )
Subject   Predicate          Object      Operation
URI       URI [rdfs:label]   "literal"   Lv/Le ⇒ .hasLabel("x")
URI       URI                "literal"   Pv/Pe ⇒ .has(P, "val")
URI       URI                URI         E ⇒ .in("x")/.out("x")
Traversals ⇒ Property value, Label value, vertex
[Figure: the Person/Event example property graph. Person (Name: Harsh, Age: 27, From: Bonn, DE); Event (Name: Graph Day, Year: 2017, Place: SF, USA); edge (Role: speaker, Time: 40).]
* Rodriguez, Marko A., and Peter Neubauer. "The graph traversal pattern." arXiv preprint arXiv:1004.1001 (2010).
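The three rows of the mapping can be read as a small case analysis on the predicate and object of a triple pattern. The Python sketch below is one simplified reading of that table and not Gremlinator's actual code. It treats any quoted object as a literal, treats every other object (including a variable) as the URI case, and always emits .out(), leaving the choice between .in() and .out() aside. The function names and the `v:name "marko"` pattern are made up for illustration.

```python
def local_name(term):
    """Strip the prefix from a prefixed name such as 'e:knows'."""
    return term.split(":", 1)[1]


def is_literal(term):
    """Assumption of this sketch: literals are written with double quotes."""
    return term.startswith('"') and term.endswith('"')


def translate_pattern(subject, predicate, obj):
    """Return a Gremlin step (as text) for one triple pattern, following the three rows."""
    if is_literal(obj):
        if local_name(predicate) == "label":
            return f".hasLabel({obj})"                      # row 1: label access
        return f'.has("{local_name(predicate)}", {obj})'    # row 2: property access
    return f'.out("{local_name(predicate)}")'               # row 3: edge traversal


print(translate_pattern("?a", "rdfs:label", '"person"'))
print(translate_pattern("?a", "v:name", '"marko"'))
print(translate_pattern("?a", "e:knows", "?b"))
```

The subject is accepted but unused here; in a full translation it would decide which traverser the step is attached to.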
37.
CGP (Q) ➞ Traversal ( )
Mapping corresponding Gremlin operators
38.
Ex. 1: TinkerPop Modern Graph
Select only those persons who are aged 30 or younger and have collectively created a software.
SELECT ?a ?b ?c WHERE {
?a v:label "person" .
?a e:knows ?b .
?a e:created ?c .
?b e:created ?c .
?a v:age ?d .
FILTER (?d <= 30)
}
39.
Contd…
[MatchStartStep@[a], HasStep([~label.eq(person)]), MatchEndStep]
[MatchStartStep@[a], VertexStep(OUT,[knows],vertex)@[b], MatchEndStep]
[MatchStartStep@[a], VertexStep(OUT,[created],vertex)@[c], MatchEndStep]
[MatchStartStep@[b], VertexStep(OUT,[created],vertex)@[c], MatchEndStep]
[MatchStartStep@[a], PropertiesStep([age],value)@[d], MatchEndStep]
[WhereTraversalStep([WhereStartStep(d), IsStep(leq(30))]), MatchEndStep]
BGPs
BGP (q) ➞ SST* ( )
41.
Output
{a=v[2], b=v[4], c=v[3]}
Contd…
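To see what the example query asks for, it can be evaluated by hand over the TinkerPop "modern" toy graph. The Python sketch below hard-codes that graph and walks the triple patterns as nested loops. It is a hand-written reading of the query, not the traversal Gremlinator produces, and vertices are keyed by name rather than by the v[...] ids printed on the slide.

```python
# TinkerPop "modern" toy graph, keyed by vertex name instead of numeric id.
VERTICES = {
    "marko": {"label": "person", "age": 29},
    "vadas": {"label": "person", "age": 27},
    "josh": {"label": "person", "age": 32},
    "peter": {"label": "person", "age": 35},
    "lop": {"label": "software"},
    "ripple": {"label": "software"},
}
EDGES = [
    ("marko", "knows", "vadas"),
    ("marko", "knows", "josh"),
    ("marko", "created", "lop"),
    ("josh", "created", "ripple"),
    ("josh", "created", "lop"),
    ("peter", "created", "lop"),
]


def out(vertex, label):
    """Targets of the outgoing edges of `vertex` that carry `label`."""
    return [dst for src, lbl, dst in EDGES if src == vertex and lbl == label]


def run_query():
    results = []
    for a, props in VERTICES.items():
        if props["label"] != "person":          # ?a v:label "person"
            continue
        for b in out(a, "knows"):               # ?a e:knows ?b
            for c in out(a, "created"):         # ?a e:created ?c
                if c not in out(b, "created"):  # ?b e:created ?c
                    continue
                d = props["age"]                # ?a v:age ?d
                if d <= 30:                     # FILTER (?d <= 30)
                    results.append({"a": a, "b": b, "c": c})
    return results


print(run_query())
```

Each commented line corresponds to one triple pattern of the query, in the same order as the match steps listed in the translation.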
42.
Experiments*
Northwind dataset
● PG - Vertices: 3209, Edges: 6177
● RDF - Triples: 33033
BSBM 1M dataset
● PG - Vertices: 92737, Edges: 238309
● RDF - Triples: 1000313
* Detailed work submitted to CIKM 2017
CPU: Intel® Xeon® CPU E5-2660 v3 (20 cores @2.60GHz),
RAM: 128 GB DDR3, HDD: 512 GB SSD, OS: Linux 4.2-generic (x86_64)
Openlink Virtuoso v7.2.4, Apache TinkerGraph-Gremlin v3.2.3
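The dataset sizes above allow a rough size comparison between the two representations. The short Python calculation below divides the number of RDF triples by the number of property graph elements (vertices plus edges). This ratio is our own back-of-the-envelope figure rather than a metric from the talk, and it depends on how properties were mapped during conversion.

```python
# Sizes copied from the slide.
DATASETS = {
    "Northwind": {"vertices": 3209, "edges": 6177, "triples": 33033},
    "BSBM 1M": {"vertices": 92737, "edges": 238309, "triples": 1000313},
}


def triples_per_element(stats):
    """RDF triples divided by the number of property graph vertices plus edges."""
    return stats["triples"] / (stats["vertices"] + stats["edges"])


ratios = {name: round(triples_per_element(s), 2) for name, s in DATASETS.items()}
print(ratios)
```

For both datasets the RDF version needs roughly three to three and a half triples per property graph element.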
43.
Contd…
● Queries
44.
First Results
Comparison of executing SPARQL queries on Virtuoso and the corresponding translated Gremlin pattern matching traversals on TinkerGraph against the BSBM 1M dataset (in secs, warm cache, 10 queries).
13x - 17x faster* w.r.t. SPARQL
45.
Limitation
● SPARQL queries with:
○ Nested Union(s)
○ Optional(s) - v3.3 [fixed]
46.
Next Step(s)
● Address the limitations
● Robustness (any PG)
● Deploy @Apache TinkerPop :)
47.
Acknowledgements
Funding: H2020 WDAqua ITN (GA: 642795)
Supervisors & Mentors: Prof. Dr. Soeren Auer, Prof. Dr. Jens Lehmann, Prof. Dr. Maria-Esther Vidal, Dr. Marko Rodriguez
48. PhD Symposium - ESWC 2017 - Slovenia - Portoroz - May 29 LITMUS Benchmark Suite - Harsh Thakkar - University of Bonn
Resources
http://wdaqua.eu/
https://github.com/LITMUS-Benchmark-Suite/sparql-to-gremlin
Code: https://github.com/LITMUS-Benchmark-Suite/
Web: https://litmus-benchmark-suite.github.io
Docker: https://hub.docker.com/r/litmusbenchmarksuite/litmus/
LITMUS Benchmark Suite
49. THANK YOU GRAPH DAY ‘17 SF!!!
Harsh Thakkar
University of Bonn
Twitter: @harsh9t
LinkedIn: thakkarharsh
E-mail: harsh9t@gmail.com
Questions? Comments?
Insults? Injuries?