Introduction to Property graphs and
the Gremlin traversal language
Harsh Thakkar
University of Bonn, Germany
February 15, 2019 - CHARUSAT - India
On the ubiquity of graphs and their applications in the real world
Introduction to Property Graphs and Gremlin ⦿ Harsh Thakkar ⦿ University of Bonn ⦿ CHARUSAT ⦿ Gujarat ⦿ India ⦿ February 2019
Outline
● WHY graphs?
● WHERE are the graphs?
● WHAT exactly are graphs?
○ Graph Data Models
■ RDF & Property graphs
○ Graph Query Languages
■ SPARQL & Gremlin
● Traversing graphs
WDAqua ITN - http://wdaqua.eu/
● Answering Questions using Web Data (WDAqua) is an EU H2020 Marie
Skłodowska-Curie Actions Innovative Training Network (ITN)
● 15 PhD students across Europe
● Advancing SotA in data-driven Question Answering
● Reusable and extensible components for QA
WHY Graphs?
The RDBMSs . . .
Limitation of RDBMS:
Modelling the real world uncertainty
CHAOS
Relational vs. Graph
In which department does Alice study?
The rise of the Graphs
● Complex real world relationships
● Intuitive
● Schema-free
● Mathematically sound
Graph DBMSs
Source: http://connected-data.london/wp-content/uploads/2017/07/Connected-Data-landscape-2017-v1.0-April2017.jpg
So WHERE are the Graphs?
Airport connection
A map of 3,275 airports worldwide
and all of the flight routes
connecting them. Designed by Martin
Grandjean. Each bubble represents
an individual airport, and the bubble
size represents its number of flight
routes (37,153 routes in total), based
on OpenFlights.org data.
Social network of Saddam Hussein, built
in 2003, made by C. Wilson.
From "Searching for Saddam", a five-part series
on how the US military used social
networking and intelligence agencies to
capture the Iraqi dictator, 2010.
Image source: https://www.nature.com/articles/nrg1272
A protein-protein interaction network for
yeast. A network of interactions between
proteins in the single-celled organism
Saccharomyces cerevisiae (baker's yeast),
as determined using, primarily, two-hybrid
screen experiments. From Jeong et al.
Copyright Macmillan Publishers Ltd.
Neuronal network
White matter tracts within a
human brain, as visualized
by MRI tractography.
In a neuronal network, the
neurons are the nodes and the
synapses are the links between
them. These networks are
usually studied using Graph
theory and machine learning.
The Graph Data Model
Graph Data Model
● Graphs are intuitive formalisms
● Represent complex natural and man-made networks (Genes, SNs)
● Data is increasingly (and highly) connected
● Relationships within the data are an integral part of it
● Index-free, schema-free
● Mathematical foundation
● RDF and Property Graph are the two most popular graph data models
Graph Elements
● Vertices / Nodes
● Edges / Relationships
● Labels
● Attributes / Properties (key-value pairs)
Example: a vertex labelled “person” with properties name: Harsh and age: 27, and an edge labelled “created”
Graph Types
● Undirected
● Directed
● Labeled
● Attributed
● Multigraph
● A combination of the above
○ Labelled directed attributed multigraphs (Property graph)
○ Labelled directed multigraphs (edge-labeled or simple graph - RDF )
RDF Data Model
● RDF = Resource Description Framework
● W3C Recommendation for data modeling and encoding machine
readable content on the Web
● Data Model:
○ encodes structured information
○ universal, machine-readable interchange format (Serializations - .nt, .xml,
.ttl, etc)
○ data is structured in the form of graphs (RDF graphs)
RDF Data Model
● RDF is a triple-based graph model, where each triple has a:
○ Subject: URI or blank node
○ Predicate: URI (a property)
○ Object: URI, literal or blank node
URI* = Uniform Resource Identifier, analogous
to ISBN for books
Literals = data values
Blank nodes = descriptions of entities that don’t need
to be named.
* IRIs in current RDF
Representation (triples):
@prefix ex: <http://example.org/> .
ex:Person ex:Speaker ex:Event .
ex:Person ex:name "Harsh" .
ex:Person ex:place ex:Bonn .
ex:Person ex:age "27" .
ex:Event ex:name "EXPERT TALK" .
ex:Event ex:year "2019" .
Interpretation (graph):
[Figure: the example drawn as a graph with nodes ex:Person, ex:Event, ex:Bonn and ex:GJ,IN, literals “Harsh”, “27”, “EXPERT TALK”, “2019” and “60”, and edge labels ex:Speaker, ex:name, ex:age, ex:place, ex:year and ex:time]
RDF Graphs (RDFGs)
● Edge-labelled, directed multigraphs (with entity URIs, blank nodes and
literals)
● Going from information to knowledge using OWL (description logics) and ontologies
(RDFS, RDFa, etc.)
● Bulky
○ Everything is a node-edge-node triple (edges don’t have properties)
○ More relationships per node -> more triples in total!
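The "bulky" point can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the triples are the six from the RDF example slide written as tuples, and `ex:talk1` together with the predicates `ex:speaker`, `ex:event` and `ex:time` in the `reified` list are hypothetical names invented here to show how a value such as "time: 60" would need an extra node, since an RDF edge cannot carry key-value pairs itself.

```python
# The six example triples, written as (subject, predicate, object) tuples.
triples = [
    ("ex:Person", "ex:Speaker", "ex:Event"),
    ("ex:Person", "ex:name", "Harsh"),
    ("ex:Person", "ex:place", "ex:Bonn"),
    ("ex:Person", "ex:age", "27"),
    ("ex:Event", "ex:name", "EXPERT TALK"),
    ("ex:Event", "ex:year", "2019"),
]


def describe(node, triples):
    """Collect every predicate/object pair whose subject is `node`."""
    return {p: o for s, p, o in triples if s == node}


# Hypothetical reification: one extra node (ex:talk1) stands for the
# relationship so that a value can be attached to it.
reified = [
    ("ex:talk1", "ex:speaker", "ex:Person"),
    ("ex:talk1", "ex:event", "ex:Event"),
    ("ex:talk1", "ex:time", "60"),
]

print(describe("ex:Event", triples))
print(len(triples), "triples before,", len(triples + reified), "after")
```

Every attribute costs one triple, and attaching data to a relationship costs several more, which is the growth the slide refers to.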
Example - Modern Graph feat. RDFG
Property Graph Data Model
● Edge-labelled, directed, attributed, multigraph
● Vertices and edges both have properties
● Main components:
○ Vertices, edges (source, destination), properties (key-value pairs), labels
(strings)
● Super neat (compact), super cute (readable)
● Easier to add weighted, reified edges
● Query languages: Cypher, Gremlin, PGQL, etc.
Example:
Person vertex: Name: Harsh, Age: 27, From: Bonn, DE
Event vertex: Name: EXPERT TALK, Year: 2019, Place: GJ, IN
Edge from Person to Event: Role: speaker, Time: 60
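A minimal Python sketch of the property graph model, assuming the Person/Event example from this slide. The class names `Vertex` and `Edge` and the edge label `speaks_at` are illustrative choices made here, not part of any graph database API; the point is only that vertices and edges both carry a label and a dictionary of key-value properties.

```python
from dataclasses import dataclass, field


@dataclass
class Vertex:
    label: str
    properties: dict = field(default_factory=dict)


@dataclass
class Edge:
    label: str
    src: Vertex  # source vertex
    dst: Vertex  # destination vertex
    properties: dict = field(default_factory=dict)


person = Vertex("Person", {"name": "Harsh", "age": 27, "from": "Bonn, DE"})
event = Vertex("Event", {"name": "EXPERT TALK", "year": 2019, "place": "GJ, IN"})

# "speaks_at" is a hypothetical edge label chosen for this sketch.
# The role and time live on the edge itself; no extra node is needed.
speaks_at = Edge("speaks_at", person, event, {"role": "speaker", "time": 60})

print(speaks_at.src.properties["name"], "->", speaks_at.dst.properties["name"])
print(speaks_at.properties)
```

Compared with a triple-based encoding, the whole example fits in two vertices and one edge.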
Example - PG viz
[Figure: two visualisations, labelled RDF and PG]
Graph Query Languages
• Yes we are linguistically diverse and
so are the databases!
• That too with different dialects:
• SPARQL, CYPHER, Gremlin, etc
• RDF - SPARQL (W3C ‘08)
• Graph - ??
http://cdn2.wpbeginner.com/wp-content/uploads/2015/02/multilingual-wordpress.jpg
SPARQL - The RDF Query Language
● SPARQL Protocol And RDF Query Language
● Graph Pattern Matching (GPM) QL for RDF graphs (declarative)
● De facto QL of RDF stores, W3C standard since Jan 2008.
● High importance in the Semantic Web: querying knowledge graphs, QA
○ SPARQL is to the Semantic Web (and the web in general) what SQL is to relational DBs
● Main components:
○ Graph patterns (WHERE) [BGPs {s p o .}, GGPs {...}, CGPs {ops}]
○ Prefixes: abbreviate URIs
○ Query result as a projection over variables (SELECT)
Ex: SPARQL query
PREFIX a: <http://abc.com/prty>
SELECT DISTINCT ?x
WHERE { ?x a:Created ?y . }
BGP (matching pattern): { ?x a:Created ?y . }
Output:
?x
<http://abc.com/node/4>
<http://abc.com/node/2>
<http://abc.com/node/5>
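What matching a basic graph pattern means can be sketched in plain Python. This is a toy matcher for a single triple pattern, not how a SPARQL engine is implemented; the `triples` list is hypothetical data (with shortened node names) chosen so that the result mirrors the output shown on the slide, and `match_pattern` and `select_distinct` are names invented for this illustration.

```python
# Hypothetical toy data as (subject, predicate, object) tuples.
triples = [
    ("node/4", "a:Created", "node/3"),
    ("node/4", "a:Created", "node/1"),
    ("node/2", "a:Created", "node/3"),
    ("node/5", "a:Created", "node/6"),
    ("node/1", "a:Knows", "node/2"),
]


def match_pattern(pattern, triples):
    """Yield one variable binding (a dict) per triple that fits the pattern.

    Terms starting with "?" are variables; all other terms must match exactly.
    """
    for triple in triples:
        binding = {}
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                # A variable used twice must bind to the same value.
                if binding.setdefault(term, value) != value:
                    break
            elif term != value:
                break
        else:
            yield binding


def select_distinct(var, pattern, triples):
    """Project one variable and drop duplicates, keeping first-seen order."""
    seen = []
    for binding in match_pattern(pattern, triples):
        if binding[var] not in seen:
            seen.append(binding[var])
    return seen


# Rough analogue of: SELECT DISTINCT ?x WHERE { ?x a:Created ?y . }
print(select_distinct("?x", ("?x", "a:Created", "?y"), triples))
```

The WHERE clause corresponds to `match_pattern`, and SELECT DISTINCT to the projection and de-duplication in `select_distinct`.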
Gremlin - a PG Query Language
● A Graph Traversal Language and Machine (like Java - JVM)
● Offers both declarative (GPM) and imperative (traversal) constructs
● Supports both OLTP: Graph DBs and OLAP: Graph processors
● Popular for querying Property graphs.
Gremlin
http://www.datastax.com/wp-content/uploads/2015/09/many-to-many-mapping.png
http://www.datastax.com/wp-content/uploads/2015/09/gtm-dataflow.png
Gremlin’s Multi-Graph Query Language (GQL) support
Contd…
Multi-DMS & platform support
https://tinkerpop.apache.org/images/oltp-and-olap.png
Ex: Gremlin query
1. g.V().in('Created').dedup()
// imperative
2. g.V().match(
__.as('x').out('Created')
.as('y')).select('x').dedup()
// declarative - GPM
Equivalent SPARQL:
PREFIX a: <http://abc.com/prty>
SELECT DISTINCT ?x
WHERE { ?x a:Created ?y . }
Output:
==>x:v[4]
==>x:v[2]
==>x:v[5]
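The imperative query can be read as a pipeline of steps, each consuming and producing a stream of traversers. The Python sketch below mimics that reading with generators; it is not the TinkerPop API. The vertex ids and edges are hypothetical toy data picked so that the result matches the slide's output, and `V`, `in_` and `dedup` are stand-ins named after the Gremlin steps they imitate.

```python
# Hypothetical toy graph: vertex ids and (source, label, destination) edges.
vertices = [1, 2, 3, 4, 5, 6]
edges = [
    (4, "Created", 3),
    (2, "Created", 3),
    (5, "Created", 6),
    (4, "Created", 6),
    (1, "Knows", 2),
]


def V():
    """Start step: emit every vertex."""
    return iter(vertices)


def in_(traversers, label):
    """For each vertex, move to the sources of its incoming `label` edges."""
    for v in traversers:
        for src, lbl, dst in edges:
            if dst == v and lbl == label:
                yield src


def dedup(traversers):
    """Drop traversers that arrive at an already-seen vertex."""
    seen = set()
    for v in traversers:
        if v not in seen:
            seen.add(v)
            yield v


# Rough analogue of g.V().in('Created').dedup()
result = list(dedup(in_(V(), "Created")))
print(result)
```

Each step only knows about the stream it receives, which is why steps compose freely into longer traversals.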
SPARQL vs. Gremlin Queries
SPARQL:
● Linear, standard graph pattern
queries (exact match)
● Fewer joins - less complexity -
better performance
● Benefit from macro-indices
○ SPO, POS, etc.
Gremlin:
● Multi-hop, neighbourhood,
star/snowflake, traversal queries
(fuzzy match)
● Everything is a path - better
performance
● Benefit from micro-indices
SPARQL vs. Gremlin Queries
SPARQL:
● Lacks looping, branching and
graph analytical queries (OLAP)
● Supports federated querying
Gremlin:
● Supports looping, branching and
graph analytical queries (OLAP)
● No federated querying
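Looping is what makes multi-hop queries natural in a traversal language. The plain Python sketch below imitates the idea behind Gremlin's `repeat(out()).times(n)` on a hypothetical adjacency list; the graph and the function name `repeat_out` are assumptions made for illustration only.

```python
# Hypothetical toy graph as an adjacency list of outgoing edges.
out_edges = {
    "a": ["b", "c"],
    "b": ["d"],
    "c": ["d"],
    "d": ["e"],
    "e": [],
}


def repeat_out(start, times):
    """Return the vertices reached after exactly `times` outgoing hops.

    One entry is kept per path, so a vertex reachable along two paths
    appears twice, as it would with one traverser per path.
    """
    frontier = [start]
    for _ in range(times):
        frontier = [dst for v in frontier for dst in out_edges[v]]
    return frontier


print(repeat_out("a", 1))  # one hop from "a"
print(repeat_out("a", 2))  # two hops: "d" is reached via "b" and via "c"
```

The number of hops is an ordinary parameter here, whereas a fixed graph pattern has to spell out each hop explicitly.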
TRAVERSING using GREMLIN
TinkerPop Graph Computing
● Graph (data)
● Traverser (object, iterator)
● Traversal (a series of steps, functions)
Example graph: a Person vertex (Name: Harsh, Age: 27, From: Bonn, DE) connected to an Event vertex (Name: EXPERT TALK, Year: 2019, Place: GJ, IN) by an edge (Role: speaker, Time: 60)
http://tinkerpop.apache.org/docs/current/reference/
Gremlin steps
Vertex steps (flatMap)
http://tinkerpop.apache.org/docs/current/reference/
Gremlin steps
● map/sideEffect steps: addE(), addV(), group(), groupCount(), pageRank(), …
● map steps: properties(), values(), sum(), count(), label(), max(), mean(), order(), fold(), match(), …
● branch/flatMap steps: unfold(), local(), optional(), …
● Filter steps: has(), hasLabel(), is(), not(), or(), and(), range(), where(), …
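The difference between the step categories can be sketched with Python generators: a filter step keeps or drops each traverser, a flatMap step turns one traverser into zero or more, and a map step turns one traverser into exactly one value. The functions `has_label`, `out` and `values` below are simplified stand-ins named after the Gremlin steps, and the two-vertex graph is hypothetical; none of this is the TinkerPop API.

```python
# Hypothetical toy graph: vertex id -> properties, and outgoing adjacency.
vertices = {
    1: {"label": "person", "name": "Harsh", "age": 27},
    2: {"label": "event", "name": "EXPERT TALK", "year": 2019},
}
out_edges = {1: [2], 2: []}


def has_label(traversers, label):
    """Filter step: keep only vertices carrying the given label."""
    return (v for v in traversers if vertices[v]["label"] == label)


def out(traversers):
    """flatMap step: each vertex expands to all of its out-neighbours."""
    return (w for v in traversers for w in out_edges[v])


def values(traversers, key):
    """Map step (simplified): each vertex becomes one property value."""
    return (vertices[v][key] for v in traversers)


# Rough analogue of g.V().hasLabel('person').out().values('name')
names = list(values(out(has_label(vertices, "person")), "name"))
print(names)
```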
MovieLens data
● The MovieLens data set (https://grouplens.org/datasets/movielens/ ) gives you
movies, reviewers, and their ratings
● Provides movie recommendation data in a variety of sizes (of ratings):
○ 100K
○ 1M
○ 10M
○ 20M
MovieLens graph
Vertex labels and properties:
○ user: uid: int, gender: m/f, zipcode: int, age: int
○ occupation: uid: int, name: string
○ movie: uid: int, name: string, year: int
○ genre: uid: int, name: string
Edge labels and properties:
○ occupation
○ rated: stars: int, time: int
○ genre
Statistics:
|V| = 9983
|E| = 1012657
|user| = 6040
|movie| = 3883
|occupation| = 42
|genre| = 18
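To get a feel for the kind of question this schema supports, here is a small Python sketch that averages the `stars` property of `rated` edges per movie. The ratings are made-up toy values, not taken from the MovieLens files, and `average_stars` is a hypothetical helper written for this illustration.

```python
from collections import defaultdict

# Hypothetical "rated" edges: (user uid, movie uid, stars).
rated = [
    (1, 10, 5),
    (2, 10, 3),
    (1, 11, 4),
    (3, 11, 2),
    (3, 12, 5),
]


def average_stars(rated_edges):
    """Group rated edges by movie uid and average their stars."""
    stars_by_movie = defaultdict(list)
    for _user, movie, stars in rated_edges:
        stars_by_movie[movie].append(stars)
    return {movie: sum(s) / len(s) for movie, s in stars_by_movie.items()}


print(average_stars(rated))
```

In Gremlin the same idea would be a traversal over `rated` edges followed by grouping and a mean; the sketch only shows the shape of the computation.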
MovieLens graph instance data
https://s3.amazonaws.com/dev.assets.neo4j.com/wp-content/uploads/2011/09/movie-graph.png
https://sd.keepcalm-o-matic.co.uk/i-w600/keep-calm-it-is-demo-time.jpg
https://github.com/harsh9t/Hands-on-with-MovieLens
Reference material
● Practical Guide to Gremlin -
https://kelvinlawrence.net/book/Gremlin-Graph-Guide.html
● Official reference documentation -
http://tinkerpop.apache.org/docs/current/reference/
● Courses -
https://academy.datastax.com/resources/getting-started-tinkerpop-and-gremlin
Questions?
Harsh Thakkar
SDA Lab, University of Bonn
Twitter: @harsh9t
LinkedIn: thakkarharsh
E-mail: harsh9t@gmail.com
