Introduction to Property Graphs and the Gremlin Traversal Language
Harsh Thakkar
University of Bonn, Germany
February 15, 2019 - CHARUSAT - India
On the ubiquity of graphs and their applications in the real world
Introduction to Property Graphs and Gremlin ⦿ Harsh Thakkar ⦿ University of Bonn ⦿ CHARUSAT ⦿ Gujarat ⦿ India ⦿ February 2019
Outline
● WHY graphs?
● WHERE are the graphs?
● WHAT exactly are graphs?
○ Graph Data Models
■ RDF & Property graphs
○ Graph Query Languages
■ SPARQL & Gremlin
● Traversing graphs
WDAqua ITN - http://wdaqua.eu/
● Answering Questions using Web Data (WDAqua) is an EU H2020 Marie
Skłodowska-Curie Actions Innovative Training Network (ITN)
● 15 PhD students across Europe
● Advancing the state of the art in data-driven Question Answering (QA)
● Reusable and extensible components for QA
WHY Graphs?
The RDBMSs . . .
Limitation of RDBMS:
Modelling the real world uncertainty
CHAOS
Relational vs. Graph
In which department does Alice study?
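A minimal sketch of the contrast this question is meant to illustrate, in plain Python rather than SQL or Gremlin. All data here is hypothetical (the ids, the second student, the department names, and the "studies_in" edge label are invented for illustration): the relational version resolves the answer through a join table, while the graph version follows one labelled edge from the Alice vertex.

```python
# Hypothetical data: ids, names, and the "studies_in" label are assumptions.

# Relational view: two entity tables plus a join table.
students = [{"id": 1, "name": "Alice"}, {"id": 2, "name": "Bob"}]
departments = [{"id": 10, "name": "Computer Science"}, {"id": 20, "name": "Physics"}]
enrolments = [{"student_id": 1, "dept_id": 10}, {"student_id": 2, "dept_id": 20}]


def department_relational(student_name):
    """Answer the question with two key lookups (the equivalent of two joins)."""
    student_id = next(s["id"] for s in students if s["name"] == student_name)
    dept_id = next(e["dept_id"] for e in enrolments if e["student_id"] == student_id)
    return next(d["name"] for d in departments if d["id"] == dept_id)


# Graph view: each vertex stores its outgoing edges, grouped by edge label.
graph = {
    "Alice": {"studies_in": ["Computer Science"]},
    "Bob": {"studies_in": ["Physics"]},
}


def department_graph(student_name):
    """Answer the question by following one labelled edge."""
    return graph[student_name]["studies_in"][0]


print(department_relational("Alice"))
print(department_graph("Alice"))
```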
The rise of the Graphs
● Complex real world relationships
● Intuitive
● Schema-free
● Mathematically sound
The Graph DBMSs
Source: http://connected-data.london/wp-content/uploads/2017/07/Connected-Data-landscape-2017-v1.0-April2017.jpg
So WHERE are the Graphs?
Airport connection
A map of 3,275 global airports
and all of their connecting flight
routes, designed by Martin
Grandjean. Each bubble represents
an individual airport, and the bubble
size represents its number of flight
routes (37,153 routes in total), based
on OpenFlights.org data.
Social network of Saddam Hussein, built
in 2003. Figure by C. Wilson, from
"Searching for Saddam", a five-part series
on how the US military used social
networking and intelligence to
capture the Iraqi dictator (2010).
Image source: https://www.nature.com/articles/nrg1272
A protein-protein interaction network for
yeast. A network of interactions between
proteins in the single-celled organism
Saccharomyces cerevisiae (baker's yeast),
as determined using, primarily, two-hybrid
screen experiments. From Jeong et al.
Copyright Macmillan Publishers Ltd.
Neuronal network
White matter tracts within a
human brain, as visualized
by MRI tractography.
In a neuronal network, the
neurons are the nodes and the
synapses are the links between
them. These networks are
usually studied using graph
theory and machine learning.
The Graph Data Model
Graph Data Model
● Graphs are intuitive formalisms
● Represent complex natural and man-made networks (gene networks, social networks)
● Data is increasingly, and highly, connected
● Relationships within the data are an integral part of it
● Index-free, schema-free
● Mathematical foundation
● RDF and Property Graphs are the two most popular graph data models
Graph Elements
● Vertices / Nodes
● Edges / Relationships
● Labels
● Attributes / Properties (key-value pairs)
Example labels: “person”, “created”
Example properties: age: 27, name: Harsh
Graph Types
● Undirected
● Directed
● Labeled
● Attributed
● Multigraph
● A combination of the above
○ Labelled directed attributed multigraphs (Property graph)
○ Labelled directed multigraphs (edge-labelled or simple graphs, e.g. RDF)
RDF Data Model
● RDF = Resource Description Framework
● W3C Recommendation for modelling data and encoding
machine-readable content on the Web
● Data Model:
○ encodes structured information
○ universal, machine-readable interchange format (Serializations - .nt, .xml,
.ttl, etc)
○ data is structured in the form of graphs (RDF graphs)
RDF Data Model
● RDF is a triple-based graph model, where each triple has a:
○ Subject: URI or blank node
○ Predicate: URI (the property)
○ Object: URI, literal, or blank node
[Figure: the example as an RDF graph (interpretation). Nodes: ex:Person, ex:Event, ex:Bonn, ex:GJ,IN and the literals “Harsh”, “27”, “EXPERT TALK”, “2019”, “60”. Edge labels: ex:name, ex:age, ex:place, ex:year, ex:Speaker, ex:time.]
URI = Uniform Resource Identifier (generalized by IRIs), analogous
to an ISBN for books
Literals = data values
Blank nodes = descriptions of entities that don’t need
to be named
Representation (triples):
@prefix ex: <http://example.org>
ex:Person ex:Speaker ex:Event
ex:Person ex:name “Harsh”
ex:Person ex:place ex:Bonn
ex:Person ex:age “27”
ex:Event ex:name “EXPERT TALK”
ex:Event ex:year “2019”
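The triple representation above can be sketched in plain Python as a list of (subject, predicate, object) tuples. This is only an illustration of the data model, not an RDF library: the helper name `objects` is invented here, literals are kept as plain strings, and the year predicate is written in lower case as `ex:year`.

```python
# Each triple is (subject, predicate, object), copied from the slide's example.
triples = [
    ("ex:Person", "ex:Speaker", "ex:Event"),
    ("ex:Person", "ex:name", "Harsh"),
    ("ex:Person", "ex:place", "ex:Bonn"),
    ("ex:Person", "ex:age", "27"),
    ("ex:Event", "ex:name", "EXPERT TALK"),
    ("ex:Event", "ex:year", "2019"),
]


def objects(subject, predicate):
    """Return every object o such that (subject, predicate, o) is in the graph."""
    return [o for s, p, o in triples if s == subject and p == predicate]


print(objects("ex:Person", "ex:name"))
print(objects("ex:Person", "ex:Speaker"))
```

Every fact, including each attribute value, costs one triple, which is the "everything is node-edge-node" point made about RDF graphs.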
RDF Graphs (RDFGs)
● Edge-labelled, directed multigraphs (with entity URIs, blank nodes, and
literals)
● Going from information to knowledge using OWL (DLs) and ontologies
(RDFS, RDFa, etc.)
● Bulky
○ Everything is node-edge-node (edges don't have properties)
○ More relationships per node -> more triples in total!
Example - Modern Graph feat. RDFG
Property Graph Data Model
● Edge-labelled, directed, attributed, multigraph
● Vertices and edges both have properties
● Main components:
○ Vertices, edges (source, destination), properties (key-value pairs), labels
(strings)
● Super neat (compact), super cute (readable)
● Easier to add weighted, reified edges
● Query languages - Cypher, Gremlin, PGQL, etc.
Example: Person {Name: Harsh, Age: 27, From: Bonn, DE} -[Role: speaker, Time: 60]-> Event {Name: EXPERT TALK, Year: 2019, Place: GJ, IN}
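A minimal sketch of the Person/Event example as a property graph in plain Python dictionaries. The vertex ids (1, 2) and the edge label "speaker" are assumptions made for illustration; the labels "Person" and "Event" and all property values come from the slide. The point it shows is that the edge itself carries properties (Role, Time), which in RDF would need extra triples.

```python
# Vertices: id -> label plus key-value properties. Ids are assumed.
vertices = {
    1: {"label": "Person", "properties": {"Name": "Harsh", "Age": 27, "From": "Bonn, DE"}},
    2: {"label": "Event", "properties": {"Name": "EXPERT TALK", "Year": 2019, "Place": "GJ, IN"}},
}

# Edges: source, destination, label, and their own key-value properties.
# The edge label "speaker" is an assumption; the slide only gives the properties.
edges = [
    {"src": 1, "dst": 2, "label": "speaker", "properties": {"Role": "speaker", "Time": 60}},
]


def out_edges(vertex_id, label=None):
    """Return the outgoing edges of a vertex, optionally filtered by edge label."""
    return [
        e for e in edges
        if e["src"] == vertex_id and (label is None or e["label"] == label)
    ]


for edge in out_edges(1):
    target = vertices[edge["dst"]]
    print(target["properties"]["Name"], edge["properties"]["Time"])
```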
Example - PG viz
RDF
PG
Graph Query Languages
• Yes, we are linguistically diverse, and
so are the databases!
• And with different dialects, too:
• SPARQL, Cypher, Gremlin, etc.
• RDF - SPARQL (W3C ‘08)
• Graph - ??
http://cdn2.wpbeginner.com/wp-content/uploads/2015/02/multilingual-wordpress.jpg
SPARQL - The RDF Query Language
● SPARQL Protocol And RDF Query Language
● Declarative graph pattern matching (GPM) query language for RDF graphs
● De facto query language of RDF stores; W3C standard since January 2008
● High importance in the Semantic Web: querying knowledge graphs, question answering (QA)
○ SPARQL is to the Semantic Web (and the web in general) what SQL is to relational databases
● Main components:
○ Graph patterns (WHERE) [BGPs {s p o .}, GGPs {...}, CGPs{ops}]
○ Prefixes - abbreviate URIs
○ Query result projected from the bound variables (SELECT)
Example: SPARQL query
PREFIX a: <http://abc.com/prty>
SELECT DISTINCT ?x
WHERE { ?x a:Created ?y . }
The WHERE clause is a BGP: the pattern to match against the graph.
Output:
?x
<http://abc.com/node/4>
<http://abc.com/node/2>
<http://abc.com/node/5>
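A minimal sketch of how matching a single triple pattern works, in plain Python rather than a real SPARQL engine. The triples below are hypothetical (the slide does not show the underlying data) and use shortened node names; `match_pattern` and `select_distinct` are illustrative helper names. Terms starting with "?" are variables, everything else must match exactly.

```python
# Hypothetical data chosen so that nodes 4, 2 and 5 each have a "Created" edge.
triples = [
    ("node/4", "Created", "node/1"),
    ("node/4", "Created", "node/3"),
    ("node/2", "Created", "node/3"),
    ("node/5", "Created", "node/6"),
    ("node/2", "Knows", "node/5"),
]


def match_pattern(pattern, data):
    """Yield one variable binding (a dict) per triple that matches the pattern."""
    for triple in data:
        binding = {}
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                # A variable binds to the value, or must agree with an earlier binding.
                if binding.setdefault(term, value) != value:
                    break
            elif term != value:
                break
        else:
            yield binding


def select_distinct(variable, pattern, data):
    """Project one variable and drop duplicates, keeping first-seen order."""
    seen = []
    for binding in match_pattern(pattern, data):
        if binding[variable] not in seen:
            seen.append(binding[variable])
    return seen


print(select_distinct("?x", ("?x", "Created", "?y"), triples))
```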
Gremlin - a PG Query Language
● A graph traversal language and machine (analogous to Java and the JVM)
● Offers both declarative (GPM) and imperative (traversal) constructs
● Supports both OLTP: Graph DBs and OLAP: Graph processors
● Popular for querying Property graphs.
Gremlin
http://www.datastax.com/wp-content/uploads/2015/09/many-to-many-mapping.png
http://www.datastax.com/wp-content/uploads/2015/09/gtm-dataflow.png
Gremlin’s Multi-Graph Query Language (GQL) support
Contd…
Multi-DMS & platform support
https://tinkerpop.apache.org/images/oltp-and-olap.png
Example: Gremlin query
1. g.V().in("Created").dedup()
// imperative
2. g.V().match(
__.as('x').out('Created').as('y')
).select('x').dedup()
// declarative - GPM
Equivalent SPARQL query:
PREFIX a: <http://abc.com/prty>
SELECT DISTINCT ?x
WHERE { ?x a:Created ?y . }
Output:
==>x:v[4]
==>x:v[2]
==>x:v[5]
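A minimal sketch of what the imperative query (1) does, written in plain Python instead of Gremlin. The edge list is hypothetical, since the slide does not show the underlying graph, and `in_step` and `dedup` are illustrative stand-ins for the Gremlin steps of the same name, not the TinkerPop implementation. Starting from every vertex, the traversal walks "Created" edges backwards to their source vertex and then removes duplicates.

```python
# Hypothetical graph: (source, label, destination) edges over vertices 1..6.
vertices = [1, 2, 3, 4, 5, 6]
edges = [
    (4, "Created", 1),
    (4, "Created", 3),
    (2, "Created", 3),
    (5, "Created", 6),
    (2, "Knows", 5),
]


def in_step(current, label):
    """For each current vertex, emit the source of every incoming edge with this label."""
    for vertex in current:
        for src, edge_label, dst in edges:
            if dst == vertex and edge_label == label:
                yield src


def dedup(current):
    """Emit each vertex once, in first-seen order."""
    seen = set()
    for vertex in current:
        if vertex not in seen:
            seen.add(vertex)
            yield vertex


# Rough analogue of g.V().in("Created").dedup()
result = list(dedup(in_step(vertices, "Created")))
print(result)
```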
SPARQL vs. Gremlin Queries
SPARQL:
● Linear, standard graph pattern
queries (exact match)
● Fewer joins - less complexity -
better performance
● Benefits from macro-indices
○ SPO, POS, etc.
Gremlin:
● Multi-hop, neighbourhood,
star/snowflake, traversal queries
(fuzzy match)
● Everything is a path - better
performance
● Benefits from micro-indices
SPARQL vs. Gremlin Queries
SPARQL:
● Lacks looping, branching, and
graph analytical queries (OLAP)
● Supports federated querying
Gremlin:
● Supports looping, branching, and
graph analytical queries (OLAP)
● No federated querying
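A minimal sketch of the looping idea, in plain Python. It loosely mirrors what a Gremlin traversal such as `repeat(out()).times(2)` expresses: follow outgoing edges a fixed number of times. The adjacency data and the function name `repeat_out` are hypothetical and only serve as an illustration.

```python
# Hypothetical adjacency list: vertex -> vertices reachable over one outgoing edge.
adjacency = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}


def repeat_out(start, times):
    """Follow outgoing edges `times` times, keeping one entry per path taken."""
    frontier = [start]
    for _ in range(times):
        frontier = [neighbour for vertex in frontier for neighbour in adjacency[vertex]]
    return frontier


print(repeat_out("a", 1))
print(repeat_out("a", 2))
```

The second call reaches "d" twice, once per path, which is why traversals like the earlier example often end in a deduplication step.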
TRAVERSING using GREMLIN
TinkerPop Graph Computing
● Graph (data)
● Traverser (object, iterator)
● Traversal (a series of steps, functions)
Example: Person {Name: Harsh, Age: 27, From: Bonn, DE} -[Role: speaker, Time: 60]-> Event {Name: EXPERT TALK, Year: 2019, Place: GJ, IN}
http://tinkerpop.apache.org/docs/current/reference/
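A minimal sketch of the three concepts (graph, traverser, traversal) in plain Python, using the Person/Event example. It is not how TinkerPop is implemented: the vertex ids, the edge label "speaker", and the helpers `out`, `values`, and `run` are assumptions for illustration. A traverser is an object that remembers where it currently is and the path it took; a traversal is a list of steps, each of which maps a stream of traversers to a new stream.

```python
from dataclasses import dataclass, field

# Graph (data). Vertex ids and the edge label "speaker" are assumed.
graph = {
    "vertices": {
        1: {"label": "Person", "Name": "Harsh"},
        2: {"label": "Event", "Name": "EXPERT TALK"},
    },
    "edges": [(1, "speaker", 2)],
}


@dataclass
class Traverser:
    """Tracks the current location in the graph and the path walked so far."""
    current: object
    path: list = field(default_factory=list)


def out(label):
    """Step: move each traverser along its outgoing edges with the given label."""
    def step(traversers):
        for t in traversers:
            for src, edge_label, dst in graph["edges"]:
                if src == t.current and edge_label == label:
                    yield Traverser(dst, t.path + [t.current])
    return step


def values(key):
    """Step: replace each traverser's vertex with one of its property values."""
    def step(traversers):
        for t in traversers:
            yield Traverser(graph["vertices"][t.current][key], t.path + [t.current])
    return step


def run(start_ids, steps):
    """Traversal: feed traversers through the series of steps, in order."""
    traversers = [Traverser(v) for v in start_ids]
    for step in steps:
        traversers = step(traversers)
    return list(traversers)


result = run([1], [out("speaker"), values("Name")])
print([(t.current, t.path) for t in result])
```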
Gremlin steps
Vertex steps (flatMap)
http://tinkerpop.apache.org/docs/current/reference/
MovieLens data
● The MovieLens data set (https://grouplens.org/datasets/movielens/) gives you
movies, reviewers, and their ratings
● Provides movie recommendation data in several sizes (by number of ratings):
○ 100K
○ 1M
○ 10M
○ 20M
MovieLens graph
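Vertex labels and properties:
○ user - uid: int, gender: m/f, zipcode: int, age: int
○ occupation - uid: int, name: string
○ movie - uid: int, name: string, year: int
○ genre - uid: int, name: string
Edge labels and properties:
○ occupation (user -> occupation)
○ rated (user -> movie) - stars: int, time: int
○ genre (movie -> genre)
Graph size:
|V| = 9983
|E| = 1012657
|user| = 6040
|movie| = 3883
|occupation| = 42
|genre| = 18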
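A quick arithmetic check of the figures above, as a small Python sketch: the per-label vertex counts should add up to |V|. The dictionary simply restates the numbers from the slide; nothing here is read from the data set itself.

```python
# Vertex counts per label, as stated on the slide.
vertex_counts = {"user": 6040, "movie": 3883, "occupation": 42, "genre": 18}

total_vertices = sum(vertex_counts.values())
total_edges = 1012657  # |E| as stated on the slide

print(total_vertices)  # 9983, matching |V|
print(round(total_edges / total_vertices, 1))  # average number of edges per vertex
```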
MovieLens graph instance data
https://s3.amazonaws.com/dev.assets.neo4j.com/wp-content/uploads/2011/09/movie-graph.png
Demo time!
https://github.com/harsh9t/Hands-on-with-MovieLens
Reference material
● Practical Guide to Gremlin -
https://kelvinlawrence.net/book/Gremlin-Graph-Guide.html
● Official reference documentation -
http://tinkerpop.apache.org/docs/current/reference/
● Courses -
https://academy.datastax.com/resources/getting-started-tinkerpop-and-gremlin
Questions?
Harsh Thakkar
SDA Lab, University of Bonn
Twitter: @harsh9t
LinkedIn: thakkarharsh
E-mail: harsh9t@gmail.com