Successfully reported this slideshow.

The Path-o-Logical Gremlin

13

Share

Upcoming SlideShare
Gremlin
Gremlin
Loading in …3
×
1 of 47
1 of 47

The Path-o-Logical Gremlin

13

Share

Download to read offline

Gremlin is a graph traversal language that connects to various graph databases/frameworks.

* Neo4j [http://neo4j.org]
* OrientDB [http://orientechnologies.com]
* DEX [http://www.sparsity-technologies.com/dex]
* OpenRDF Sail [http://openrdf.org]
* JUNG [http://jung.sourceforge.net]

This lecture addresses the state of Gremlin as of the 0.9 (April 16, 2011).

Gremlin is a graph traversal language that connects to various graph databases/frameworks.

* Neo4j [http://neo4j.org]
* OrientDB [http://orientechnologies.com]
* DEX [http://www.sparsity-technologies.com/dex]
* OpenRDF Sail [http://openrdf.org]
* JUNG [http://jung.sourceforge.net]

This lecture addresses the state of Gremlin as of the 0.9 (April 16, 2011).

More Related Content

Related Books

Free with a 14 day trial from Scribd

See all

Related Audiobooks

Free with a 14 day trial from Scribd

See all

The Path-o-Logical Gremlin

  1. 1. The Path-o-Logical Gremlin Marko A. Rodriguez Graph Systems Architect http://markorodriguez.com The 2nd International Workshop on Graph Data Management (GDM’11) Hannover, Germany – April 16, 2011 GremlinG = (V, E) April 16, 2011
  2. 2. Abstract Gremlin is a graph traversal language that connects to various graph databases/frameworks. • Neo4j [http://neo4j.org] • OrientDB [http://orientechnologies.com] • DEX [http://www.sparsity-technologies.com/dex] • OpenRDF Sail [http://openrdf.org] • JUNG [http://jung.sourceforge.net] This lecture addresses the state of Gremlin as of the 0.9 (April 16th, 2011).
  3. 3. TinkerPop Productions 1 1 TinkerPop (http://tinkerpop.com), Marko A. Rodriguez (http://markorodriguez.com), Peter Neubauer (http://www.linkedin.com/in/neubauer), Joshua Shinavier (http://fortytwo.net/), Pavel Yaskevich (https://github.com/xedin), Darrick Wiebe (http://ofallpossibleworlds.wordpress. com/), Stephen Mallette (http://stephen.genoprime.com/), Alex Averbuch (http://se.linkedin. com/in/alexaverbuch)
  4. 4. Blueprints –> Pipes –> Groovy –> Gremlin To understand Gremlin, its important to understand Groovy, Blueprints, and Pipes. Bl ue s pr ip e in P ts
  5. 5. Gremlin is a Domain Specific Language • Groovy is a superset of Java and as such, natively exposes the full JDK to Gremlin. • Gremlin 0.7+ uses Groovy as its host language.2 • Gremlin 0.7+ takes advantage of Groovy’s meta-programming, dynamic typing, and closure properties. 2 Groovy available at http://groovy.codehaus.org/.
  6. 6. Gremlin is for Property Graphs name = "lop" lang = "java" weight = 0.4 3 name = "marko" age = 29 created weight = 0.2 9 1 created 8 created 12 7 weight = 1.0 weight = 0.5 weight = 0.4 6 knows knows 11 name = "peter" age = 35 name = "josh" 4 age = 32 2 10 name = "vadas" age = 27 weight = 1.0 created 5 name = "ripple" lang = "java"
  7. 7. Gremlin is for Property Graphs PROPERTIES key/value VERTEX EDGE EDGE LABEL ID name = "marko" name = "lop" age = 29 weight = 0.4 lang = "java" 1 9 created 3
  8. 8. Gremlin is for Blueprints-enabled Graph Databases • Blueprints can be seen as the JDBC for property graph databases.3 • Provides a collection of interfaces for graph database providers to implement. • Provides tests to ensure the operational semantics of any implementation are correct. • Provides support for GraphML and other “helper” utilities. Blueprints 3 Blueprints is available at http://blueprints.tinkerpop.com.
  9. 9. A Blueprints Detour - Basic Example Graph graph = new Neo4jGraph("/tmp/neo4j"); Vertex a = graph.addVertex(null); Vertex b = graph.addVertex(null); a.setProperty("name","marko"); b.setProperty("name","peter"); Edge e = graph.addEdge(null, a, b, "knows"); e.setProperty("since", 2007); graph.shutdown(); 0 knows 1 since=2007 name=marko name=peter
  10. 10. A Blueprints Detour - Sub-Interfaces • TransactionalGraph extends Graph For transaction-based graph databases.4 • IndexableGraph extends Graph For graph databases that support the indexing of properties.5 4 Graph Transactions https://github.com/tinkerpop/blueprints/wiki/Graph-Transactions. 5 Graph Indices https://github.com/tinkerpop/blueprints/wiki/Graph-Indices.
  11. 11. A Blueprints Detour - Implementations TinkerGraph
  12. 12. A Blueprints Detour - Implementations • TinkerGraph implements IndexableGraph • Neo4jGraph implements TransactionalGraph, IndexableGraph6 • OrientGraph implements TransactionalGraph, IndexableGraph7 • DexGraph implements TransactionalGraph, IndexableGraph8 • RexsterGraph implements IndexableGraph: Binds to Rexster REST API.9 • SailGraph implements TransactionalGraph: Turns any Sail-based RDF store into a Blueprints Graph.10 6 Neo4j is available at http://neo4j.org. 7 OrientDB is available at http://www.orientechnologies.com. 8 DEX is available at http://sparsity-technologies.com/dex. 9 Rexster is available at http://rexster.tinkerpop.com. 10 Sail is available at http://openrdf.org.
  13. 13. A Blueprints Detour - Ouplementations JUNG Java Universal Network/Graph Framework
  14. 14. A Blueprints Detour - Ouplementations 11 • GraphSail: Turns any IndexableGraph into an RDF store. • GraphJung: Turns any Graph into a JUNG graph.12 13 11 GraphSail https://github.com/tinkerpop/blueprints/wiki/Sail-Ouplementation. 12 JUNG is available at http://jung.sourceforge.net/. 13 GraphJung https://github.com/tinkerpop/blueprints/wiki/JUNG-Ouplementation.
  15. 15. Gremlin Compiles Down to Pipes • Pipes is a data flow framework for evaluating lazy graph traversals.14 • A Pipe extends Iterator, Iterable and can be chained together to create processing pipelines. • A Pipe can be embedded into another Pipe to allow for nested processing. Pipes 14 Pipes is available at http://pipes.tinkerpop.com.
  16. 16. A Pipes Detour - Chained Iterators • This Pipeline takes objects of type A and turns them into objects of type D. D D A A A Pipe1 B Pipe2 C Pipe3 D D A D A Pipeline Pipe<A,D> pipeline = new Pipeline<A,D>(Pipe1<A,B>, Pipe2<B,C>, Pipe3<C,D>)
  17. 17. A Pipes Detour - Simple Example “What are the names of the people that marko knows?” B name=peter knows A knows C name=pavel name=marko created created D name=gremlin
  18. 18. A Pipes Detour - Simple Example Pipe<Vertex,Edge> pipe1 = new OutEdgesPipe("knows"); Pipe<Edge,Vertex> pipe2 = new InVertexPipe(); Pipe<Vertex,String> pipe3 = new PropertyPipe<String>("name"); Pipe<Vertex,String> pipeline = new Pipeline(pipe1,pipe2,pipe3); pipeline.setStarts(new SingleIterator<Vertex>(graph.getVertex("A")); InVertexPipe() OutEdgesPipe("knows") PropertyPipe("name") B name=peter knows A knows C name=pavel name=marko created created D name=gremlin HINT: The benefit of Gremlin is that this Java verbosity is reduced to g.v(‘A’).outE(‘knows’).inV.name.
  19. 19. A Pipes Detour - Pipes Library [ GRAPHS ] [ SIDEEFFECTS ] [ FILTERS ] OutEdgesPipe AggregatorPipe AndFilterPipe InEdgesPipe GroupCountPipe BackFilterPipe OutVertexPipe CountPipe CollectionFilterPipe InVertexPip SideEffectCapPipe ComparisonFilterPipe IdFilterPipe DuplicateFilterPipe IdPipe [ UTILITIES ] FutureFilterPipe LabelFilterPipe GatherPipe ObjectFilterPipe LabelPipe PathPipe OrFilterPipe PropertyFilterPipe ScatterPipe RandomFilterPipe PropertyPipe Pipeline RangeFilterPipe ...
  20. 20. A Pipes Detour - Creating Pipes public class NumCharsPipe extends AbstractPipe<String,Integer> { public Integer processNextStart() { String word = this.starts.next(); return word.length(); } } When extending the base class AbstractPipe<S,E> all that is required is an implementation of processNextStart().
  21. 21. Now onto Gremlin proper...
  22. 22. The Gremlin Architecture Neo4j OrientDB DEX
  23. 23. The Many Ways of Using Gremlin • Gremlin has a REPL to be run from the shell. • Gremlin can be natively integrated into any Groovy class. • Gremlin can be interacted with indirectly through Java, via Groovy. • Gremlin has a JSR 223 ScriptEngine as well.
  24. 24. The Seamless Nature of Gremlin/Groovy/Java Simply Gremlin.load() and add Gremlin to your Groovy class. // a Groovy class class GraphAlgorithms { static { Gremlin.load(); } public static Map<Vertex, Integer> eigenvectorRank(Graph g) { Map<Vertex,Integer> m = [:]; int c = 0; g.V.outE.inV.groupCount(m).loop(3) {c++ < 1000} >> -1; return m; } } Writing software that mixes Groovy and Java is simple...dead simple. // a Java class public class GraphFramework { public static void main(String[] args) { System.out.println(GraphAlgorithms.eigenvectorRank(new TinkerGraph("/tmp/tinker")); } }
  25. 25. Loading a Toy Graph marko$ ./gremlin.sh name = "lop" lang = "java" ,,,/ weight = 0.4 3 name = "marko" (o o) age = 29 created weight = 0.2 -----oOOo-(_)-oOOo----- 9 1 created gremlin> g = TinkerGraphFactory.createTinkerGraph() 8 created 7 weight = 1.0 12 ==>tinkergraph[vertices:6 edges:6] weight = 0.5 weight = 0.4 6 gremlin> knows knows 11 name = "peter" age = 35 4 name = "josh" age = 32 15 2 10 name = "vadas" age = 27 weight = 1.0 created 5 name = "ripple" lang = "java" 15 Load up the toy 6 vertex/6 edge graph that comes hardcoded with Blueprints.
  26. 26. Basic Gremlin Traversals - Part 1 name = "lop" gremlin> g.v(1) lang = "java" ==>v[1] weight = 0.4 3 name = "marko" age = 29 gremlin> g.v(1).map() created 1 9 ==>name=marko created 7 8 created 12 ==>age=29 6 weight = 0.5 knows gremlin> g.v(1).outE knows 11 weight = 1.0 ==>e[7][1-knows->2] name = "josh" 4 2 age = 32 ==>e[9][1-created->3] name = "vadas" 10 ==>e[8][1-knows->4] age = 27 gremlin> created 5 16 16 Look at the neighborhood around marko.
  27. 27. Basic Gremlin Traversals - Part 2 gremlin> g.v(1) name = "lop" lang = "java" ==>v[1] 3 gremlin> g.v(1).outE(‘created’).inV name = "marko" age = 29 created ==>v[3] 9 1 created gremlin> g.v(1).outE(‘created’).inV.name 8 created ==>lop 12 7 6 gremlin> knows knows 11 17 4 2 10 created 5 17 What has marko created?
  28. 28. Basic Gremlin Traversals - Part 3 name = "lop" gremlin> g.v(1) lang = "java" ==>v[1] name = "marko" 3 gremlin> g.v(1).outE(‘knows’).inV age = 29 created 9 ==>v[2] 1 created 8 created ==>v[4] 12 7 6 gremlin> g.v(1).outE(‘knows’).inV.name knows knows 11 ==>vadas 4 name = "josh" ==>josh age = 32 2 gremlin> g.v(1).outE(‘knows’). 10 name = "vadas" age = 27 inV{it.age < 30}.name created ==>vadas gremlin> 5 18 18 Who does marko know? Who does marko know that is under 30 years of age?
  29. 29. Basic Gremlin Traversals - Part 4 codeveloper gremlin> g.v(1) ==>v[1] gremlin> g.v(1).outE(‘created’).inV lop codeveloper ==>v[3] created gremlin> g.v(1).outE(‘created’).inV codeveloper created .inE(‘created’).outV marko ==>v[1] peter ==>v[4] created ==>v[6] knows knows gremlin> g.v(1).outE(‘created’).inV .inE(‘created’).outV.except([g.v(1)]) josh ==>v[4] vadas ==>v[6] gremlin> g.v(1).outE(‘created’).inV created .inE(‘created’).outV.except([g.v(1)]).name ==>josh ripple ==>peter gremlin> 19 19 Who are marko’s codevelopers? That is, people who have created the same software as him, but are not him (marko can’t be a codeveloper of himeself).
  30. 30. Basic Gremlin Traversals - Part 5 gremlin> Gremlin.addStep(‘codeveloper’) ==>null lop gremlin> c = {_{x = it}.outE(‘created’).inV codeveloper .inE(‘created’).outV{!x.equals(it)}} created ==>groovysh_evaluate$_run_closure1@69e94001 created marko gremlin> [Vertex, Pipe].each{ it.metaClass.codeveloper = created { Gremlin.compose(delegate, c())}} codeveloper peter ==>interface com.tinkerpop.blueprints.pgm.Vertex knows knows ==>interface com.tinkerpop.pipes.Pipe gremlin> g.v(1).codeveloper ==>v[4] josh vadas ==>v[6] gremlin> g.v(1).codeveloper.codeveloper created ==>v[1] ==>v[6] ==>v[1] ripple ==>v[4] 20 20 I don’t want to talk in terms of outE, inV, etc. Given my domain model, I want to talk in terms of higher-order, abstract adjacencies — e.g. codeveloper.
  31. 31. Abstract/Derived/Inferred Adjacency outE in outE V loop(2){..} back(3) {i t. sa la } ry } friend_of_a_friend_who_earns_less_than_friend_at_work • outE, inV, etc. is low-level graph speak (the domain is the graph). • codeveloper is high-level domain speak (the domain is software development).21 21 In this way, Gremlin can be seen as a DSL (domain-specific language) for creating DSL’s for your graph applications. Gremlin’s domain is “the graph.” Build languages for your domain on top of Gremlin (e.g. “software development domain”).
  32. 32. Developer's FOAF Import Graph Friend-Of-A-Friend Graph Developer Imports Friends at Work Graph Graph You need not make Software Imports Friendship derived graphs explicit. Graph Graph You can, at runtime, compute them. Moreover, generate them locally, not Developer Created globally (e.g. ``Marko's Employment Graph friends from work Graph relations"). This concept is related to automated reasoning and whether reasoned relations are inserted into the explicit graph or Explicit Graph computed at query time.
  33. 33. Lets explore more complex traversals...
  34. 34. Grateful Dead Concert Graph The Grateful Dead were an American band that was born out of the San Francisco, California psychedelic movement of the 1960s. The band played music together from 1965 to 1995 and is well known for concert performances containing extended improvisations and long and unique set lists. [http://arxiv.org/abs/0807.2466]
  35. 35. Grateful Dead Concert Graph Schema • vertices song vertices ∗ type: always song for song vertices. ∗ name: the name of the song. ∗ performances: the number of times the song was played in concert. ∗ song type: whether the song is a cover song or an original. artist vertices ∗ type: always artist for artist vertices. ∗ name: the name of the artist. • edges followed by (song → song): if the tail song was followed by the head song in concert. ∗ weight: the number of times these two songs were paired in concert. sung by (song → artist): if the tail song was primarily sung by the head artist. written by (song → artist): if the tail song was written by the head artist.
  36. 36. A Subset of the Grateful Dead Concert Graph type="artist" type="artist" name="Hunter" name="Garcia" type="song" name="Scarlet.." 7 5 written_by 1 sung_by weight=239 followed_by type="song" name="Fire on.." sung_by sung_by written_by 2 type="artist" weight=1 name="Lesh" type="song" followed_by name="Pass.." 6 written_by 3 sung_by followed_by type="song" weight=2 name="Terrap.." 4
  37. 37. Loading the GraphML Representation gremlin> g = new TinkerGraph() ==>tinkergraph[vertices:0 edges:0] gremlin> GraphMLReader.inputGraph(g, new FileInputStream(‘data/graph-example-2.xml’)) ==>null gremlin> g.V.name ==>OTHER ONE JAM ==>Hunter ==>HIDEAWAY ==>A MIND TO GIVE UP LIVIN ==>TANGLED UP IN BLUE ... 22 22 Load the GraphML (http://graphml.graphdrawing.org/) representation of the Grateful Dead graph, iterate through its vertices and get the name property of each vertex.
  38. 38. Becoming gremlin> v = g.idx(T.v)[[name:‘DARK STAR’]] >> 1 ==>v[89] gremlin> v.outE(‘sung_by’).inV.name ==>Garcia 23 23 Use the default vertex index (T.v) and find all vertices index by the key/value pair name/DARK STAR. Who sung Dark Star?
  39. 39. Changeling gremlin> v.outE(‘followed_by’).inV.name ==>EYES OF THE WORLD ==>TRUCKING ==>SING ME BACK HOME ==>MORNING DEW ... gremlin> v.outE(‘followed_by’).transform{[it.inV.name >> 1, it.weight]} ==>[EYES OF THE WORLD, 9] ==>[TRUCKING, 1] ==>[SING ME BACK HOME, 1] ==>[MORNING DEW, 11] ... 24 24 What followed Dark Star in concert? How many times did each song follow Dark Start in concert?
  40. 40. Back Stabber gremlin> v.outE(‘followed_by’).inV[[name:‘BERTHA’]] ==>v[4] gremlin> v.outE(‘followed_by’).inV[[name:‘BERTHA’]].name ==>BERTHA gremlin> v.outE(‘followed_by’).inV[[name:‘BERTHA’]].name.back(1) ==>v[4] gremlin> v.outE(‘followed_by’).inV[[name:‘BERTHA’]].name.back(3) ==>e[7029][89-followed_by->4] gremlin> v.outE(‘followed_by’).inV[[name:‘BERTHA’]].name.back(3).weight ==>1 25 25 As you walk a path, its possible to go back to where you were n-steps ago.
  41. 41. Loopy Lou gremlin> m = [:]; c = 0 ==>0 gremlin> g.V.outE(‘followed_by’).inV.groupCount(m).loop(3){c++ < 1000} ... gremlin> m.sort{a,b -> b.value <=> a.value} ==>v[13]=762 ==>v[21]=668 ==>v[50]=661 ==>v[153]=630 ==>v[96]=609 ==>v[26]=586 ... 26 26 Emanate from each vertex the followed by path. Index each vertex by how many times its been traversed over in the map m. Do this for 1000 times. The final filter{false} will ensure nothing is outputted.
  42. 42. Path-o-Logical gremlin> g.v(89).outE.inV.name ==>EYES OF THE WORLD ==>TRUCKING ==>SING ME BACK HOME ==>MORNING DEW ==>HES GONE ... gremlin> g.v(89).outE.inV.name.paths ==>[v[89], e[7021][89-followed_by->83], v[83], EYES OF THE WORLD] ==>[v[89], e[7022][89-followed_by->21], v[21], TRUCKING] ==>[v[89], e[7023][89-followed_by->206], v[206], SING ME BACK HOME] ==>[v[89], e[7006][89-followed_by->127], v[127], MORNING DEW] ==>[v[89], e[7024][89-followed_by->49], v[49], HES GONE] ... 27 27 What are the paths that are out.inV.name emanating from Dark Star (v[89])?
  43. 43. Paths of Length < 4 Between Dark Star and Drums gremlin> g.v(89).outE.inV.loop(2){it.loops < 4 & !(it.object.equals(g.v(96)))}[[id:‘96’]].paths ==>[v[89], e[7014][89-followed_by->96], v[96]] ==>[v[89], e[7021][89-followed_by->83], v[83], e[1418][83-followed_by->96], v[96]] ==>[v[89], e[7022][89-followed_by->21], v[21], e[6320][21-followed_by->96], v[96]] ==>[v[89], e[7006][89-followed_by->127], v[127], e[6599][127-followed_by->96], v[96]] ==>[v[89], e[7024][89-followed_by->49], v[49], e[6277][49-followed_by->96], v[96]] ==>[v[89], e[7025][89-followed_by->129], v[129], e[5751][129-followed_by->96], v[96]] ==>[v[89], e[7026][89-followed_by->149], v[149], e[2017][149-followed_by->96], v[96]] ==>[v[89], e[7027][89-followed_by->148], v[148], e[1937][148-followed_by->96], v[96]] ==>[v[89], e[7028][89-followed_by->130], v[130], e[1378][130-followed_by->96], v[96]] ==>[v[89], e[7019][89-followed_by->39], v[39], e[6804][39-followed_by->96], v[96]] ==>[v[89], e[7034][89-followed_by->91], v[91], e[925][91-followed_by->96], v[96]] ==>[v[89], e[7035][89-followed_by->70], v[70], e[181][70-followed_by->96], v[96]] ==>[v[89], e[7017][89-followed_by->57], v[57], e[2551][57-followed_by->96], v[96]] ... 28 28 Get the adjacent vertices to Dark Star (v[89]). Loop on outE.inV while the amount of loops is less than 4 and the current path hasn’t reached Drums (v[96]). Return the paths traversed.
  44. 44. Generic Steps Gremlin is founded on a collection of atomic “steps.” The syntax is generally seen as step.step.step. There are 3 categories of steps. • transform: map the input to some output. S → T outE, inV, paths, copySplit, fairMerge, . . . • filter: either output the input or not. S → (S ∪ ∅) back, except, uniqueObject, andf, orf, . . . • sideEffect: output the input and yield a side-effect. S → S aggregate, groupCount, . . .
  45. 45. Integrating the JDK (Java API) • Groovy is the host language for Gremlin. Thus, the JDK is natively available in a path expression. gremlin> v.outE(‘followed_by’).inV{it.name.matches(‘D.*’)}.name ==>DEAL ==>DRUMS • Examples below are not with respect to the Grateful Dead schema presented previously. They help to further articulate the point at hand. gremlin> x.outE.inV.transform{JSONParser.get(it.uri).reviews}.mean() ... gremlin> y.outE(‘rated’).filter{TextAnalysis.isElated(it.review)}.inV ...
  46. 46. In the End, Gremlin is Just Pipes gremlin> (g.v(89).outE(‘followed_by’).inV.name).toString() ==>[OutEdgesPipe<followed_by>, InVertexPipe, PropertyPipe<name>] 29 29 Gremlin is a domain specific language for constructing lazy evaluation, data flow Pipes (http: //pipes.tinkerpop.com). In the end, what is evaluated is a pipeline process in pure Java over a Blueprints-enabled property graph (http://blueprints.tinkerpop.com).
  47. 47. Credits • Marko A. Rodriguez: designed, developed, and documented Gremlin. • Pavel Yaskevich: heavy development on earlier versions of Gremlin. • Darrick Wiebe: inspired Groovy Gremlin and develops on Pipes and Blueprints. • Peter Neubauer: promoter and evangelist of Gremlin. • Ketrina Yim: designed the Gremlin logo. Gremlin G = (V, E) http://gremlin.tinkerpop.com http://groups.google.com/group/gremlin-users http://tinkerpop.com

×