Successfully reported this slideshow.
We use your LinkedIn profile and activity data to personalize ads and to show you more relevant ads. You can change your ad preferences anytime.
Gremlin’s Graph Traversal Machinery
Dr. Marko A. Rodriguez
Director of Engineering at DataStax, Inc.
Project Management Co...
f : X ! X
The function f is a process that maps a structure of type X to a structure of type X.
f(x1) = x2
The function f maps the object x1 (from the set of X) to the object x2 (from the set of X).
x1 2 X x2 2 X
f(x)
A step is a function.
f(x)
Assume that this step rotates an X by 90°.
90°
90°
The algorithm of the step is a “black box.”
90°
A traverser wraps a value of type V.
class Traverser<V> {
V value;
}
class Traverser<V> {
V value;
}
90°
The step maps an integer traverser to an integer traverser.
class Traverser<V> {
V value;
}
class Traverser<V> {
V val...
90°
A traverser of with a rotation of 0° becomes a traverser with a rotation of 90°.
Traverser(0) Traverser(90)
class Trav...
90°
Both the input and output traversers are of type Traverser<Integer>.
90°
T[N] ! T[N]
2 T[N]
2 T[N]
A stream of input traversers…
90°
…yields a stream of output traversers.
90°
A traverser can have a bulk which denotes how many V values it represents.
90°
class Traverser<V> {
V value;
long bulk;
}
...
4
4
Bulking groups identical traversers to reduce the number of evaluations of a step.
90°
class Traverser<V> {
V value;
l...
A variegated stream of input traversers yields a variegated stream of output traversers.
90°
1
2
1 1
1
2
Bulking can reduce the size of the stream.
90°
class Traverser<V> {
V value;
long bulk;
}
class Traverser<V> {...
If the order of the stream does not matter…
90°
…then reordering can increase the likelihood of bulking.
90°
1
21
total bulk = 4
total count = 4
total bulk = 4
total coun...
90°
180°
270°
360°
A traversal is a list of one or more steps.
90° 90° 90°
90°
f : X ! Y
Different functions can yield different mappings between different domains and ranges.
h : Y ! Z
The output of f can be the input to h because the range of f is the domain of h (i.e. Y).
Y y = f(x)
Z z = h(y)
y = f(x)
z = h(y)
y = f(x)
z = h(y)
f(x) = y
h(y) = z
f(x) = y
h(y) = z
f(x) = yh(y) = z
f(x) h = z
f h = zx
f h = zx · ·
x · f · h = z
h(f(x)) = z
readable
unreadable
≣
z = x · f · h
z = x.f().h().next()
z = x · f · h
stream/iterator/traversalhead/start
90° 225°
315°
Different types of steps exist and their various compositions yield query expressivity.
90° 225°
315°
Steps process traverser streams/flows and their composition is a traversal/iterator.
= . (). ().next()90° 225°
repeat( ).times(2)
180°
Anonymous traversals can serve as step arguments.
90°
traversal with a single step
90°
repeat( ).times(2)
180°
Some functions serve as step-modulators instead of steps.
90°
…groupCount().by(out().count())
…sel...
repeat( ).times(2)
≣
90° 90°
180°
During optimization, traversal strategies may rewrite a traversal to a more efficient,
sem...
repeat( ).until(0°)90°
Continuously process a traverser until some condition is met.
repeat( ).until(0°)
4 loops
2 loops
1 loop
90° 180° 360°
90°do while(x != )
repeat().until() provides do/while-semantics.
...
until(0°).repeat( )
0 loops
2 loops
1 loop
90° 180° 0°
90°dowhile(x != )
90°
≣
until().repeat() provides while/do-semantic...
until(0°).repeat( )
3
Even if the traversers in the input stream can not be bulked, the output traversers may be able to b...
filter(x)map(x) sideEffect(x)flatMap(x)
one-to-one one-to-many one-to-(one-or-none) one-to-same
m
filter(x)map(x) sideEffect(x)flatMap(x)
one-to-one one-to-many one-to-(one-or-none) one-to-same
out(“knows”)
has(“name”,”g...
values(“age”)
filter(x)map(x) sideEffect(x)flatMap(x)
one-to-one one-to-many one-to-(one-or-none) one-to-same
out(“knows”)...
T[V [ E] ! T[N+
]
values(“age”)
filter(x)map(x) sideEffect(x)flatMap(x)
one-to-one one-to-many one-to-(one-or-none) one-to...
T[V [ E] ! T[N+
]
values(“age”)
out(“knows”)
has(“name”,”gremlin”)
groupCount()
T[V [ E] ! ; [ T[V [ E]
T[V ] ! T[V ]⇤
V()...
T[V [ E] ! T[N+
]values(“age”)
out(“knows”)
has(“name”,”gremlin”)
groupCount()
T[V ] ! T[V ]⇤
V()
T[?] ! T[map[?, N+
]]
T[...
values(“age”)
out(“knows”)
has(“name”,”gremlin”)
groupCount()
T[V ] ! T[V ]⇤
V() T[G] ! T[V ]⇤
g.V().has(“name”,”gremlin”)...
g.V().has(“name”,”gremlin”).
out(“knows”).values(“age”).
groupCount()
What is the distribution of ages of the people that ...
g.V().has(“name”,”gremlin”).
out(“knows”).values(“age”).
groupCount()
one graph to many vertices
(flatMap)
…
g.V().has(“name”,”gremlin”).
out(“knows”).values(“age”).
groupCount()
one graph to many vertices
(flatMap)
one vertex
to th...
g.V().has(“name”,”gremlin”).
out(“knows”).values(“age”).
groupCount()
one graph to many vertices
(flatMap)
one vertex
to th...
g.V().has(“name”,”gremlin”).
out(“knows”).values(“age”).
groupCount()
one graph to many vertices
(flatMap)
one vertex
to th...
g.V().has(“name”,”gremlin”).
out(“knows”).values(“age”).
groupCount()
one graph to many vertices
(flatMap)
one vertex
to th...
The Gremlin Traversal Language
The Gremlin Traversal Machine
a b c
a b c
Traversal creation via
step composition
Step parameterization via
traversal and constant nesting
a().b().c()
a...
class Traverser<V> {
V value;
long bulk;
}
class Step<S,E> {
Traverser<E> processNextStart();
}
f(x)
class Traversal<S,E> ...
1
Bytecode
Gremlin-Java
g.V(1).
repeat(out(“knows”)).times(2).
groupCount().by(“age”)
[[V, 1]
[repeat, [[out, knows]]]
[ti...
1
Bytecode
Gremlin-Python
g.V(1).
repeat(out(‘knows’)).times(2).
groupCount().by(‘age’)
[[V, 1]
[repeat, [[out, knows]]]
[...
1
Bytecode
Gremlin-Python
g.V(1).
repeat(out(‘knows’)).times(2).
groupCount().by(‘age’)
[[V, 1]
[repeat, [[out, knows]]]
[...
LanguageMachine
Gremlin-Python
Gremlin-Groovy
Gremlin-Java
Gremlin-JavaScriptGremlin-Ruby
Gremlin-Scala
Gremlin-Clojure
By...
Python 2.7.2 (default, Oct 11 2012, 20:14:37)
[GCC 4.2.1 Compatible Apple Clang 4.0 (tags/Apple/clang-418.0.60)] on darwin...
Python 2.7.2 (default, Oct 11 2012, 20:14:37)
[GCC 4.2.1 Compatible Apple Clang 4.0 (tags/Apple/clang-418.0.60)] on darwin...
Python 2.7.2 (default, Oct 11 2012, 20:14:37)
[GCC 4.2.1 Compatible Apple Clang 4.0 (tags/Apple/clang-418.0.60)] on darwin...
Python 2.7.2 (default, Oct 11 2012, 20:14:37)
[GCC 4.2.1 Compatible Apple Clang 4.0 (tags/Apple/clang-418.0.60)] on darwin...
Cypher
Bytecode
Distinct query languages (not only Gremlin language variants) can generate bytecode for
evaluation by any ...
SELECT DISTINCT ?c
WHERE {
?a v:label "person" .
?a e:created ?b .
?b v:name ?c .
?a v:age ?d .
FILTER (?d > 30)
}
[
[V],
...
Graph
Database
OLTP
Graph
Processor
OLAP
Bytecode
Bytecode
Bytecode
TinkerGraph
DSE Graph
Titan
Neo4j
OrientDB
IBM Graph
…...
Graph
Database
OLTP
Graph
Processor
OLAP
Gremlin Traversal Machine
Traversal
Traversal
Traversal
TraversalStrategies Trave...
Most OLTP graph systems have a traversal strategy that combines
[V,has*]-sequences into a single global index-based flatMap...
Most OLAP graph systems have a traversal strategy that bypasses Traversal semantics
and implements reducers using the nati...
Physical Machine
DataProgram Traversal
Heap/DiskMemory Memory
Memory/Graph System
Physical Machine
Java
Virtual Machine
by...
Stakeholders
Language Providers
Gremlin Language Variant
Distinct Query Language
Application Developers Graph System Provi...
Stakeholders
Application Developers
One query language for
all OLTP/OLAP systems.
GremlinG = (V, E)
Real-time and analytic...
Stakeholders
Application Developers
One query language for
all OLTP/OLAP systems.
No vendor lock-in.
Apache TinkerPop as t...
Stakeholders
Application Developers
One query language for
all OLTP/OLAP systems.
No vendor lock-in.
Gremlin is embedded i...
Stakeholders
Language Providers
Gremlin Language Variant
Distinct Query Language
Easy to generate bytecode.
GraphTraversal...
Stakeholders
Language Providers
Gremlin Language Variant
Distinct Query Language
Easy to generate bytecode.
Bytecode execu...
Graph
Database
OLTP
Graph
Processor
OLAP
Stakeholders
Language Providers
Gremlin Language Variant
Distinct Query Language
...
Easy to implement core
interfaces.
Graph System Providers
Stakeholders
OLAP Provider
OLTP Provider
Vertex
Edge
Graph
Trans...
Provider supports all
provided languages.
Easy to implement core
interfaces.
Graph System Providers
Stakeholders
OLAP Prov...
OLTP providers can leverage
existing OLAP systems.
Provider supports all
provided languages.
Easy to implement core
interf...
Stakeholders
Language Providers
Gremlin Language Variant
Distinct Query Language
Application Developers Graph System Provi...
Thank you.
http://tinkerpop.apache.org
http://www.datastax.com/products/datastax-enterprise-graph
Upcoming SlideShare
Loading in …5
×

Gremlin's Graph Traversal Machinery

3,481 views

Published on

This talk argues that the future of data query/analytic languages will be all about embedding the language into the native programming language of the developer. As an example of this style, the Gremlin graph traversal language is presented. Gremlin can be represented in any programming language that supports function composition and function nesting. The language representation is then compiled to Gremlin bytecode to ultimately be executed by the/a Gremlin graph traversal machine. This enables both the Gremlin language and machine to be agnostic to the execution language.

Published in: Technology
  • Be the first to comment

Gremlin's Graph Traversal Machinery

  1. 1. Gremlin’s Graph Traversal Machinery Dr. Marko A. Rodriguez Director of Engineering at DataStax, Inc. Project Management Committee, Apache TinkerPop http://tinkerpop.apache.org
  2. 2. f : X ! X The function f is a process that maps a structure of type X to a structure of type X.
  3. 3. f(x1) = x2 The function f maps the object x1 (from the set of X) to the object x2 (from the set of X). x1 2 X x2 2 X
  4. 4. f(x) A step is a function.
  5. 5. f(x) Assume that this step rotates an X by 90°. 90°
  6. 6. 90° The algorithm of the step is a “black box.”
  7. 7. 90° A traverser wraps a value of type V. class Traverser<V> { V value; } class Traverser<V> { V value; }
  8. 8. 90° The step maps an integer traverser to an integer traverser. class Traverser<V> { V value; } class Traverser<V> { V value; } Traverser<Integer> Traverser<Integer>
  9. 9. 90° A traverser of with a rotation of 0° becomes a traverser with a rotation of 90°. Traverser(0) Traverser(90) class Traverser<V> { V value; } class Traverser<V> { V value; }
  10. 10. 90°
  11. 11. Both the input and output traversers are of type Traverser<Integer>. 90° T[N] ! T[N] 2 T[N] 2 T[N]
  12. 12. A stream of input traversers… 90°
  13. 13. …yields a stream of output traversers. 90°
  14. 14. A traverser can have a bulk which denotes how many V values it represents. 90° class Traverser<V> { V value; long bulk; } class Traverser<V> { V value; long bulk; }
  15. 15. 4 4 Bulking groups identical traversers to reduce the number of evaluations of a step. 90° class Traverser<V> { V value; long bulk; } class Traverser<V> { V value; long bulk; }
  16. 16. A variegated stream of input traversers yields a variegated stream of output traversers. 90°
  17. 17. 1 2 1 1 1 2 Bulking can reduce the size of the stream. 90° class Traverser<V> { V value; long bulk; } class Traverser<V> { V value; long bulk; }
  18. 18. If the order of the stream does not matter… 90°
  19. 19. …then reordering can increase the likelihood of bulking. 90° 1 21 total bulk = 4 total count = 4 total bulk = 4 total count = 3
  20. 20. 90° 180° 270° 360° A traversal is a list of one or more steps. 90° 90° 90° 90°
  21. 21. f : X ! Y Different functions can yield different mappings between different domains and ranges. h : Y ! Z
  22. 22. The output of f can be the input to h because the range of f is the domain of h (i.e. Y). Y y = f(x) Z z = h(y)
  23. 23. y = f(x) z = h(y)
  24. 24. y = f(x) z = h(y)
  25. 25. f(x) = y h(y) = z
  26. 26. f(x) = y h(y) = z
  27. 27. f(x) = yh(y) = z
  28. 28. f(x) h = z
  29. 29. f h = zx
  30. 30. f h = zx · ·
  31. 31. x · f · h = z h(f(x)) = z readable unreadable ≣
  32. 32. z = x · f · h
  33. 33. z = x.f().h().next() z = x · f · h stream/iterator/traversalhead/start
  34. 34. 90° 225° 315° Different types of steps exist and their various compositions yield query expressivity.
  35. 35. 90° 225° 315° Steps process traverser streams/flows and their composition is a traversal/iterator. = . (). ().next()90° 225°
  36. 36. repeat( ).times(2) 180° Anonymous traversals can serve as step arguments. 90° traversal with a single step 90°
  37. 37. repeat( ).times(2) 180° Some functions serve as step-modulators instead of steps. 90° …groupCount().by(out().count()) …select(“a”,”b”).by(“name”) …addE(“knows”).from(“a”).to(“b”) …order().by(“age”,decr)
  38. 38. repeat( ).times(2) ≣ 90° 90° 180° During optimization, traversal strategies may rewrite a traversal to a more efficient, semantically equivalent form. 90° RepeatUnrollStrategy interface TraversalStrategy { void apply(Traversal traversal); Set<TraversalStrategy> applyPrior(); Set<TraversalStrategy> applyPost(); }
  39. 39. repeat( ).until(0°)90° Continuously process a traverser until some condition is met.
  40. 40. repeat( ).until(0°) 4 loops 2 loops 1 loop 90° 180° 360° 90°do while(x != ) repeat().until() provides do/while-semantics. 90° ≣
  41. 41. until(0°).repeat( ) 0 loops 2 loops 1 loop 90° 180° 0° 90°dowhile(x != ) 90° ≣ until().repeat() provides while/do-semantics.
  42. 42. until(0°).repeat( ) 3 Even if the traversers in the input stream can not be bulked, the output traversers may be able to be. 90°
  43. 43. filter(x)map(x) sideEffect(x)flatMap(x) one-to-one one-to-many one-to-(one-or-none) one-to-same m
  44. 44. filter(x)map(x) sideEffect(x)flatMap(x) one-to-one one-to-many one-to-(one-or-none) one-to-same out(“knows”) has(“name”,”gremlin”) groupCount(“m”) where(“a”,eq(“b”)) select(“a”,”b”) path() mean() sum() count() groupCount() many-to-one(reducers) order() values(“age”) values(“name”) and(x,y) or(x,y) coin(0.5) sample(10) simplePath() dedup() store(“m”) tree(“m”) subgraph(“m”) in(“created”) group(“m”) m label() id() * A collection of examples. Not the full step library. match(x,y,z) properties() outE(“knows”) V()
  45. 45. values(“age”) filter(x)map(x) sideEffect(x)flatMap(x) one-to-one one-to-many one-to-(one-or-none) one-to-same out(“knows”) has(“name”,”gremlin”) path() groupCount() simplePath() m label() outE(“knows”) group(“m”) V()
  46. 46. T[V [ E] ! T[N+ ] values(“age”) filter(x)map(x) sideEffect(x)flatMap(x) one-to-one one-to-many one-to-(one-or-none) one-to-same out(“knows”) has(“name”,”gremlin”) path() groupCount() simplePath() m label() outE(“knows”) T[?] ! T[path] T[V [ E] ! T[string] T[?] ! T[?] T[V [ E] ! ; [ T[V [ E] T[?] ! ; [ T[?] T[V ] ! T[V ]⇤ T[V ] ! T[E]⇤ group(“m”) V()T[?] ! T[map[?, N+ ]] T[G] ! T[V ]⇤
  47. 47. T[V [ E] ! T[N+ ] values(“age”) out(“knows”) has(“name”,”gremlin”) groupCount() T[V [ E] ! ; [ T[V [ E] T[V ] ! T[V ]⇤ V()T[?] ! T[map[?, N+ ]] T[G] ! T[V ]⇤ g.V().has(“name”,”gremlin”). out(“knows”).values(“age”). groupCount()
  48. 48. T[V [ E] ! T[N+ ]values(“age”) out(“knows”) has(“name”,”gremlin”) groupCount() T[V ] ! T[V ]⇤ V() T[?] ! T[map[?, N+ ]] T[G] ! T[V ]⇤ g.V().has(“name”,”gremlin”). out(“knows”).values(“age”). groupCount() Steps can be composed if their respective domains and ranges match. T[V [ E] ! ; [ T[V [ E]
  49. 49. values(“age”) out(“knows”) has(“name”,”gremlin”) groupCount() T[V ] ! T[V ]⇤ V() T[G] ! T[V ]⇤ g.V().has(“name”,”gremlin”). out(“knows”).values(“age”). groupCount() T[V ] ! ; [ T[V ] T[V ] ! T[N+ ] T[N+ ] ! T[map[N+ , N+ ]]
  50. 50. g.V().has(“name”,”gremlin”). out(“knows”).values(“age”). groupCount() What is the distribution of ages of the people that Gremlin knows?
  51. 51. g.V().has(“name”,”gremlin”). out(“knows”).values(“age”). groupCount() one graph to many vertices (flatMap) …
  52. 52. g.V().has(“name”,”gremlin”). out(“knows”).values(“age”). groupCount() one graph to many vertices (flatMap) one vertex to that vertex or no vertex (filter) ? …
  53. 53. g.V().has(“name”,”gremlin”). out(“knows”).values(“age”). groupCount() one graph to many vertices (flatMap) one vertex to that vertex or no vertex (filter) one vertex to many friend vertices (flatMap) ? … name=gremlin
  54. 54. g.V().has(“name”,”gremlin”). out(“knows”).values(“age”). groupCount() one graph to many vertices (flatMap) one vertex to that vertex or no vertex (filter) one vertex to many friend vertices (flatMap) one vertex to one age value (map) ? … 37 name=gremlin
  55. 55. g.V().has(“name”,”gremlin”). out(“knows”).values(“age”). groupCount() one graph to many vertices (flatMap) one vertex to that vertex or no vertex (filter) one vertex to many friend vertices (flatMap) one vertex to one age value (map) many age values to an age distribution (map — reducer) ? … 37 [37:2, 41:1, 24:1, 35:4]37 37 24 35 35 35 35 41 name=gremlin
  56. 56. The Gremlin Traversal Language The Gremlin Traversal Machine
  57. 57. a b c a b c Traversal creation via step composition Step parameterization via traversal and constant nesting a().b().c() a(b().c()).d(x)d(x) function com position function nesting fluent m ethods m ethod argum ents Any language that supports function composition and function nesting can host Gremlin. Gremlin Traversal Language
  58. 58. class Traverser<V> { V value; long bulk; } class Step<S,E> { Traverser<E> processNextStart(); } f(x) class Traversal<S,E> implements Iterator<E> { E next(); Traverser<E> nextTraverser(); } The fundamental constructs of Gremlin’s machinery. Gremlin Traversal Machine interface TraversalStrategy { void apply(Traversal traversal); Set<TraversalStrategy> applyPrior(); Set<TraversalStrategy> applyPost(); } a db c a de ≣
  59. 59. 1 Bytecode Gremlin-Java g.V(1). repeat(out(“knows”)).times(2). groupCount().by(“age”) [[V, 1] [repeat, [[out, knows]]] [times, 2] [groupCount] [by, age]] Traversal GraphStep GroupCountStep RepeatStep VertexStep GraphStep GroupCountStepVertexStep VertexStep Traversal Strategies GraphStep GroupCountStepVertexStep VertexStep translates compiles optimizes executes [29:2, 30:1, 31:1, 35:10]
  60. 60. 1 Bytecode Gremlin-Python g.V(1). repeat(out(‘knows’)).times(2). groupCount().by(‘age’) [[V, 1] [repeat, [[out, knows]]] [times, 2] [groupCount] [by, age]] Traversal GraphStep GroupCountStep RepeatStep VertexStep GraphStep GroupCountStepVertexStep VertexStep Traversal Strategies GraphStep GroupCountStepVertexStep VertexStep translates compiles optimizes executes [29:2, 30:1, 31:1, 35:10]
  61. 61. 1 Bytecode Gremlin-Python g.V(1). repeat(out(‘knows’)).times(2). groupCount().by(‘age’) [[V, 1] [repeat, [[out, knows]]] [times, 2] [groupCount] [by, age]] Traversal GraphStep GroupCountStep RepeatStep VertexStep GraphStep GroupCountStepVertexStep VertexStep Traversal Strategies GraphStep GroupCountStepVertexStep VertexStep translates compiles optimizes executes [29:2, 30:1, 31:1, 35:10] Gremlin language variant Language agnostic bytecode Execution engine assembly Execution engine optimization Execution engine evaluation LanguageMachine
  62. 62. LanguageMachine Gremlin-Python Gremlin-Groovy Gremlin-Java Gremlin-JavaScriptGremlin-Ruby Gremlin-Scala Gremlin-Clojure Bytecode Java-based Implementation ? ?-based Implementation ? ?
  63. 63. Python 2.7.2 (default, Oct 11 2012, 20:14:37) [GCC 4.2.1 Compatible Apple Clang 4.0 (tags/Apple/clang-418.0.60)] on darwin Type "help", "copyright", "credits" or "license" for more information. >>> from gremlin_python.structure.graph import Graph >>> from gremlin_python.driver.driver_remote_connection import DriverRemoteConnection Gremlin-Python CPython
  64. 64. Python 2.7.2 (default, Oct 11 2012, 20:14:37) [GCC 4.2.1 Compatible Apple Clang 4.0 (tags/Apple/clang-418.0.60)] on darwin Type "help", "copyright", "credits" or "license" for more information. >>> from gremlin_python.structure.graph import Graph >>> from gremlin_python.driver.driver_remote_connection import DriverRemoteConnection >>> graph = Graph() >>> g = graph.traversal().withRemote(DriverRemoteConnection('ws://localhost:8182','g')) Gremlin-Python DriverRemoteConnection CPython
  65. 65. Python 2.7.2 (default, Oct 11 2012, 20:14:37) [GCC 4.2.1 Compatible Apple Clang 4.0 (tags/Apple/clang-418.0.60)] on darwin Type "help", "copyright", "credits" or "license" for more information. >>> from gremlin_python.structure.graph import Graph >>> from gremlin_python.driver.driver_remote_connection import DriverRemoteConnection >>> graph = Graph() >>> g = graph.traversal().withRemote(DriverRemoteConnection('ws://localhost:8182','g')) # nested traversal with Python slicing and attribute interception extensions >>> g.V().hasLabel("person").repeat(both()).times(2).name[0:2].toList() [u'marko', u'marko'] Gremlin-Python Bytecode DriverRemoteConnection Gremlin Traversal Machine CPython JVM
  66. 66. Python 2.7.2 (default, Oct 11 2012, 20:14:37) [GCC 4.2.1 Compatible Apple Clang 4.0 (tags/Apple/clang-418.0.60)] on darwin Type "help", "copyright", "credits" or "license" for more information. >>> from gremlin_python.structure.graph import Graph >>> from gremlin_python.driver.driver_remote_connection import DriverRemoteConnection >>> graph = Graph() >>> g = graph.traversal().withRemote(DriverRemoteConnection('ws://localhost:8182','g')) # nested traversal with Python slicing and attribute interception extensions >>> g.V().hasLabel("person").repeat(both()).times(2).name[0:2].toList() [u'marko', u'marko'] # a complex, nested multi-line traversal >>> g.V().match( ... as_(“a”).out("created").as_(“b”), ... as_(“b”).in_(“created").as_(“c”), ... as_(“a”).out("knows").as_(“c”)). ... select("c"). ... union(in_(“knows"),out("created")). ... name.toList() [u'ripple', u'marko', u'lop'] >>> Gremlin-Python Bytecode DriverRemoteConnection Gremlin Traversal Machine CPython JVM
  67. 67. Cypher Bytecode Distinct query languages (not only Gremlin language variants) can generate bytecode for evaluation by any OLTP/OLAP TinkerPop-enabled graph system. Gremlin Traversal Machine
  68. 68. SELECT DISTINCT ?c WHERE { ?a v:label "person" . ?a e:created ?b . ?b v:name ?c . ?a v:age ?d . FILTER (?d > 30) } [ [V], [match, [[as, a], [hasLabel, person]], [[as, a], [out, created], [as, b]], [[as, a], [has, age, gt(30)]]], [select, b], [by, name], [dedup] ] Bytecode
  69. 69. Graph Database OLTP Graph Processor OLAP Bytecode Bytecode Bytecode TinkerGraph DSE Graph Titan Neo4j OrientDB IBM Graph … TinkerComputer Spark Giraph Hadoop Fulgora … Gremlin Traversal Machine Traversal Traversal Traversal Gremlin traversals can be executed against OLTP graph databases and OLAP graph processors. That means that if, e.g., SPARQL compiles to bytecode, it can execute both OLTP and OLAP.
  70. 70. Graph Database OLTP Graph Processor OLAP Gremlin Traversal Machine Traversal Traversal Traversal TraversalStrategies TraversalStrategies optimizes Graph system providers (OLTP/OLAP) typically have custom strategies registered with the Gremlin traversal machine that take advantage of unique, provider-specific features.
  71. 71. Most OLTP graph systems have a traversal strategy that combines [V,has*]-sequences into a single global index-based flatMap-step. g.V().has(“name”,”gremlin”). out(“knows”).values(“age”). groupCount() one graph to many vertices using index lookup (flatMap) GraphStepStrategy one graph to many vertices (flatMap) one vertex to that vertex or no vertex (filter) ? … compiles optimizes name=gremlin DataStax Enterprise Graph
  72. 72. Most OLAP graph systems have a traversal strategy that bypasses Traversal semantics and implements reducers using the native API of the system. g.V().count() one graph to long (map — reducer) rdd.count() 12,146,934 compiles one graph to many vertices (flatMap) many vertices to long (map — reducer) … 12,146,934 optimizes SparkInterceptorStrategy …
  73. 73. Physical Machine DataProgram Traversal Heap/DiskMemory Memory Memory/Graph System Physical Machine Java Virtual Machine bytecode steps DataProgram Memory/DiskMemory Physical Machine instructions Java Virtual Machine Gremlin Traversal Machine From the physical computing machine to the Gremlin traversal machine.
  74. 74. Stakeholders Language Providers Gremlin Language Variant Distinct Query Language Application Developers Graph System Providers OLAP Provider OLTP Provider
  75. 75. Stakeholders Application Developers One query language for all OLTP/OLAP systems. GremlinG = (V, E) Real-time and analytic queries are represented in Gremlin. Graph Database OLTP Graph Processor OLAP
  76. 76. Stakeholders Application Developers One query language for all OLTP/OLAP systems. No vendor lock-in. Apache TinkerPop as the JDBC for graphs. DataStax Enterprise Graph
  77. 77. Stakeholders Application Developers One query language for all OLTP/OLAP systems. No vendor lock-in. Gremlin is embedded in the developer’s language. Iterator<String> result = g.V().hasLabel(“person”). order().by(“age”). limit(10).values(“name”) vs. ResultSet result = statement.executeQuery( “SELECT name FROM People n” + “ ORDER BY age n” + “ LIMIT 10”) Grem lin-Java SQL in Java No “fat strings.” The developer writes their graph database/processor queries in their native programming language.
  78. 78. Stakeholders Language Providers Gremlin Language Variant Distinct Query Language Easy to generate bytecode. GraphTraversal.getMethods() .findAll { GraphTraversal.class == it.returnType } .collect { it.name } .unique() .each { pythonClass.append( """ def ${it}(self, *args): self.bytecode.add_step(“${it}”, *args) return self “””)} Gremlin-Python’s source code is programmatically generated using Java reflection.
  79. 79. Stakeholders Language Providers Gremlin Language Variant Distinct Query Language Easy to generate bytecode. Bytecode executes against TinkerPop-enabled systems. Language providers write a translator for the Gremlin traversal machine, not a particular graph database/processor. DataStax Enterprise Graph
  80. 80. Graph Database OLTP Graph Processor OLAP Stakeholders Language Providers Gremlin Language Variant Distinct Query Language Easy to generate bytecode. Bytecode executes against TinkerPop-enabled systems. Provider can focus on design, not evaluation. Gremlin Traversal Machine The language designer does not have to concern themselves with OLTP or OLAP execution. They simply generate bytecode and the Gremlin traversal machine handles the rest.
  81. 81. Easy to implement core interfaces. Graph System Providers Stakeholders OLAP Provider OLTP Provider Vertex Edge Graph Transaction TraversalStrategy Property key=value ? ? ?
  82. 82. Provider supports all provided languages. Easy to implement core interfaces. Graph System Providers Stakeholders OLAP Provider OLTP Provider The provider automatically supports all query languages that have compilers that generate Gremlin bytecode.
  83. 83. OLTP providers can leverage existing OLAP systems. Provider supports all provided languages. Easy to implement core interfaces. Graph System Providers Stakeholders OLAP Provider OLTP Provider DSE Graph leverages SparkGraphComputer for OLAP processing. DataStax Enterprise Graph
  84. 84. Stakeholders Language Providers Gremlin Language Variant Distinct Query Language Application Developers Graph System Providers OLAP Provider OLTP Provider One query language for all OLTP/OLAP systems. No vendor lock-in. Gremlin is embedded in the developer’s language. Easy to generate bytecode. Bytecode executes against TinkerPop-enabled systems. Provider can focus on design, not evaluation. Easy to implement core interfaces. Provider supports all provided languages. OLTP providers can leverage existing OLAP systems.
  85. 85. Thank you. http://tinkerpop.apache.org http://www.datastax.com/products/datastax-enterprise-graph

×