Intro to Graph Databases Using Tinkerpop, TitanDB, and Gremlin

A quick overview of the history, motivation, and uses of graph modeling and graph databases in various industries. Covers a brief introduction to graph databases with an emphasis on the Tinkerpop stack and Gremlin query language. These concepts are then solidified through a hands-on lab modeling a blog engine using Titan and Gremlin.

Transcript

  • 1. INTRO TO GRAPH DATABASES Using Tinkerpop, TitanDB, and Gremlin { “email” : “calebjones@gmail.com”, “website” : “http://calebjones.info”, “twitter” : “@JonesWCaleb” }
  • 2. Overview • Why Graphs? • Order to complexity • Use cases – major players • Graphs & Adjacency Matrices • Tinkerpop Framework • Blueprints, Frames, Pipes, Furnace, Gremlin, Rexster • Titan using Cassandra • Blog Application (lab) • Traversals using Gremlin
  • 3. WHY GRAPHS?
  • 4. Warren Weaver, "Science and Complexity" (1948) • 17th - 19th century: problems of simplicity • How one element interacts with another • First half of 20th century: problems of disorganized complexity • Many elements operating in a system without regard to how they interact with each other • Predicted: problems of organized complexity • Many elements operating in a system, taking into account how they interact with each other • Would require computational power far beyond what was then available (ENIAC dates from 1946)
  • 5. Organisms
  • 6. Knowledge Classification
  • 7. Organizational Hierarchy
  • 8. Neurology
  • 9. Order to Complexity • Trees describe order • Linear (simple lineage) • Categorized • Single dimensional • Symmetrical • Hierarchical • Convergent modeling • Networks describe complexity • Non-linear (multi-lineage) • Multi-categorical • Multi-dimensional • Asymmetrical • Decentralized • Divergent modeling
  • 10. Types of Networks
  • 11. Types of Networks
  • 12. Types of Networks
  • 13. Types of Networks
  • 14. Types of Networks
  • 15. Types of Networks
  • 16. Types of Networks
  • 17. Types of Networks
  • 18. Types of Networks
  • 19. Types of Networks • Neuron network of a mouse • Millennium Simulation (2005): largest astronomical simulation ever on the structure and evolution of galaxies in the universe, with 25 TB of data and 20 million galaxies
  • 20. Use Cases • Recommendation engines (avoid relational N-JOIN or self-JOIN) • Ranking/credibility (Google’s PageRank) • Path finding (shortest, longest, mutual friends) • Social (friendship, following, key connectors)
  • 21. Graphs • Node/Vertex: An entity that can have zero or more edges connected to it. • Edge: An entity that connects two nodes. May be directed or undirected.
  • 22. Adjacency Matrix • If graph is undirected, the adjacency matrix is symmetric • Thus, transposition of matrix is the same graph
  • 23. Adjacency Matrix • Some graphs have different ‘types’ or dimensions of edges
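The symmetry claim for undirected graphs can be checked directly. The following is a minimal Python sketch, not code from the deck: the function names and the small example graph are illustrative assumptions, and storing one matrix per edge type is just one possible way to represent typed edges.

```python
def adjacency_matrix(n, edges, directed=False):
    """Build an n x n 0/1 matrix from (u, v) pairs; mirror entries when undirected."""
    m = [[0] * n for _ in range(n)]
    for u, v in edges:
        m[u][v] = 1
        if not directed:
            m[v][u] = 1
    return m


def transpose(m):
    """Swap rows and columns."""
    return [list(row) for row in zip(*m)]


def typed_adjacency(n, typed_edges):
    """Assumed representation for typed edges: one directed matrix per edge label."""
    layers = {}
    for u, v, label in typed_edges:
        if label not in layers:
            layers[label] = [[0] * n for _ in range(n)]
        layers[label][u][v] = 1
    return layers


edges = [(0, 1), (1, 2), (2, 3)]  # hypothetical 4-node path graph
undirected = adjacency_matrix(4, edges)
directed = adjacency_matrix(4, edges, directed=True)
layers = typed_adjacency(3, [(0, 1, "knows"), (1, 2, "sibling")])

print(undirected == transpose(undirected))  # True: symmetric
print(directed == transpose(directed))      # False: direction breaks symmetry
```

Transposing a directed matrix reverses every edge, which is why only the undirected case is left unchanged.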
  • 24. Property Graphs
      Vertices: {id: 1, name: Ivan}, {id: 2, name: Bob}, {id: 3, name: Eve}, {id: 4, name: Alice}
      Edges: {id: E1, type: cousin, separation: 1}, {id: E2, type: knows, since: 2013-09-01}, {id: E3, type: knows, since: 2013-09-01}, {id: E4, type: sibling, twins: true}
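A property graph like the one on this slide can be sketched as two dictionaries of records. This is an illustrative Python sketch only: the class and method names are assumptions, and because the transcript does not say which vertices each edge connects, the endpoints below are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Vertex:
    id: int
    properties: dict = field(default_factory=dict)


@dataclass
class Edge:
    id: str
    label: str
    out_v: int  # id of the vertex the edge leaves
    in_v: int   # id of the vertex the edge enters
    properties: dict = field(default_factory=dict)


class PropertyGraph:
    def __init__(self):
        self.vertices = {}
        self.edges = {}

    def add_vertex(self, vid, **props):
        self.vertices[vid] = Vertex(vid, props)
        return self.vertices[vid]

    def add_edge(self, eid, out_v, in_v, label, **props):
        if out_v not in self.vertices or in_v not in self.vertices:
            raise KeyError("both endpoints must exist before adding an edge")
        self.edges[eid] = Edge(eid, label, out_v, in_v, props)
        return self.edges[eid]

    def out_edges(self, vid, label=None):
        """Edges leaving vid, optionally restricted to one label."""
        return [e for e in self.edges.values()
                if e.out_v == vid and (label is None or e.label == label)]


g = PropertyGraph()
for vid, name in [(1, "Ivan"), (2, "Bob"), (3, "Eve"), (4, "Alice")]:
    g.add_vertex(vid, name=name)

# Edge endpoints are assumed for illustration; only ids, types and properties come from the slide.
g.add_edge("E1", 2, 1, "cousin", separation=1)
g.add_edge("E2", 3, 2, "knows", since="2013-09-01")
g.add_edge("E3", 3, 4, "knows", since="2013-09-01")
g.add_edge("E4", 3, 4, "sibling", twins=True)

known = [g.vertices[e.in_v].properties["name"] for e in g.out_edges(3, "knows")]
print(known)  # ['Bob', 'Alice']
```

Both vertices and edges carry their own property maps, which is what distinguishes a property graph from a plain adjacency structure.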
  • 25. Traversals • Breadth-first • 3, 2, 4, 1 • Depth-first • 3, 2, 1, 4 • Breadth-first and depth-first search can be combined. • Filtering • Ability to filter/sort paths in traversal • Aggregating • Ability to aggregate/count properties as traversal occurs and affect traversal with result of aggregation (e.g. power-grid load distribution) • Backtracking • Leave a marker in the traversal and come back to it when certain criteria are met in a lower step
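The two visiting orders listed on this slide can be reproduced with a short Python sketch. The adjacency list is an assumption: the slide's figure is not in the transcript, so the edges 3-2, 3-4 and 2-1 were chosen because they are consistent with the orders given, starting from node 3.

```python
from collections import deque

# Assumed undirected graph, consistent with the orders shown on the slide.
graph = {1: [2], 2: [3, 1], 3: [2, 4], 4: [3]}


def bfs(graph, start):
    """Visit nodes level by level using a FIFO queue."""
    order, seen, queue = [], {start}, deque([start])
    while queue:
        node = queue.popleft()
        order.append(node)
        for nxt in graph[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return order


def dfs(graph, start, seen=None):
    """Follow each branch to its end before backing up (recursive)."""
    seen = set() if seen is None else seen
    seen.add(start)
    order = [start]
    for nxt in graph[start]:
        if nxt not in seen:
            order.extend(dfs(graph, nxt, seen))
    return order


print(bfs(graph, 3))  # [3, 2, 4, 1]
print(dfs(graph, 3))  # [3, 2, 1, 4]
```

The only structural difference is the frontier: breadth-first uses a queue, depth-first uses the call stack.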
  • 26. TINKERPOP Graph Framework
  • 27. Tinkerpop • A comprehensive, open-source graph framework (http://www.tinkerpop.com/)
      Blueprints: Property graph model that is DB agnostic. A kind of JDBC for graphs.
      Pipes: Data flow API for processing graphs. Underlying component for graph traversals.
      Gremlin: DSL for traversing property graphs. Implements JSR-223.
      Frames: Maps between domain objects and the graph’s nodes and edges. Like ORM for graphs.
      Furnace: Collection of common graph analysis algorithms for property graphs.
      Rexster: Exposes any Blueprints graph via a uniform RESTful API.
  • 28. Tinkerpop Stack • Different components all build on each other • Provides abstraction from HTTP layer, to object mapping layer, to traversal scripting, to pluggable graph API • Blueprints underpins the stack, making it all DB agnostic • Blueprints implementations: • Neo4j, Sail, OrientDB, Dex • Implemented by 3rd parties: Accumulo, ArangoDB, Bitsy, FluxGraph, FoundationDB, InfiniteGraph, MongoDB, Oracle NoSQL, TitanDB
  • 29. Tinkerpop - Rexster • Provides REST and binary (RexPro - grizzly) protocols • Flexible extension model (e.g. ad-hoc Gremlin queries) • Server-side stored procedures (Gremlin) • Browser-based interface (Dog House) • Command-line tool for interacting with API • Pluggable security • SPARQL plugin to work against Sail graphs (OpenRDF) • More information: https://github.com/tinkerpop/rexster/wiki
  • 30. Tinkerpop - Furnace • Collection of industry-standard algorithms for traversing or analyzing graphs. • Network generators (by clique or degree distribution) • Search: A*, Breadth-first, Depth-first • Shortest path • Bellman-Ford (like Dijkstra’s but can handle negative edge weights) • PageRank • Degree Distribution • More information: https://github.com/tinkerpop/furnace/wiki
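To illustrate why Bellman-Ford tolerates negative edge weights where Dijkstra's algorithm does not, here is a minimal textbook-style Python sketch. It is not Furnace's implementation or API; the function name and the example edges are assumptions made for illustration.

```python
def bellman_ford(num_nodes, edges, source):
    """Single-source shortest paths over (u, v, weight) edges.

    Relaxes every edge up to num_nodes - 1 times, then does one more pass
    to detect a negative-weight cycle reachable from the source.
    """
    dist = [float("inf")] * num_nodes
    dist[source] = 0
    for _ in range(num_nodes - 1):
        changed = False
        for u, v, w in edges:
            if dist[u] + w < dist[v]:
                dist[v] = dist[u] + w
                changed = True
        if not changed:  # distances already stable, stop early
            break
    for u, v, w in edges:
        if dist[u] + w < dist[v]:
            raise ValueError("graph contains a negative-weight cycle")
    return dist


# Hypothetical graph: the edge 2 -> 1 has a negative weight.
edges = [(0, 1, 4), (0, 2, 5), (2, 1, -3), (1, 3, 2)]
print(bellman_ford(4, edges, 0))  # [0, 2, 5, 4]
```

Node 1 is reached more cheaply through the negative edge (5 - 3 = 2) than directly (4), a case a greedy settle-once strategy can get wrong.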
  • 31. Tinkerpop - Frames More Information: https://github.com/tinkerpop/frames/wiki
  • 32. Tinkerpop - Pipes • Dataflow framework using process graphs. • Each computational step becomes a node, and an edge is a communication channel between steps. • Pipes are then chained and nested. • Custom pipes can be created. • Pipe types: • Transform – emit transformation of object • Dozens of different types of transforms • Filter – decide whether to include/exclude object in traversal • ~20 different types of filters • sideEffect – include object but produce side-effect from it • ~15 different types of sideEffects (e.g. group, count, table, tree) • Branch – decide which step to take next in traversal • Several different branching options
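The pipe types above can be mimicked with lazy Python generators. This is a conceptual sketch and not the Pipes API: `transform`, `filter_pipe`, `side_effect` and `pipeline` are hypothetical helper names, and the sample records reuse names and ages from the TinkerGraph toy data shown later in the deck.

```python
def transform(fn):
    """Emit fn(item) for every incoming item."""
    def pipe(items):
        for item in items:
            yield fn(item)
    return pipe


def filter_pipe(pred):
    """Pass along only the items for which pred(item) is true."""
    def pipe(items):
        for item in items:
            if pred(item):
                yield item
    return pipe


def side_effect(fn):
    """Pass every item through unchanged, calling fn(item) on the way."""
    def pipe(items):
        for item in items:
            fn(item)
            yield item
    return pipe


def pipeline(*pipes):
    """Chain pipes so the output of each feeds the next."""
    def run(items):
        for p in pipes:
            items = p(items)
        return items
    return run


people = [
    {"name": "marko", "age": 29},
    {"name": "vadas", "age": 27},
    {"name": "josh", "age": 32},
]
seen = []  # filled in by the side-effect step
older_names = pipeline(
    filter_pipe(lambda p: p["age"] > 28),
    side_effect(lambda p: seen.append(p["name"])),
    transform(lambda p: p["name"].upper()),
)
result = list(older_names(people))
print(result)  # ['MARKO', 'JOSH']
print(seen)    # ['marko', 'josh']
```

Because each step is a generator, nothing is computed until the result is consumed, which resembles how chained steps pull objects through one at a time.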
  • 33. Tinkerpop - Blueprints • Like JDBC but for graphs. • Common API for Property Graphs which are very flexible • Foundational component for Pipes, Gremlin, Frames, Furnace, and Rexster • Supports transactions (if underlying DB engine does) • Multi-threaded transactions supported • Format readers/writers (GML, GraphML, GraphSON) • More Information: https://github.com/tinkerpop/blueprints/wiki
  • 34. Tinkerpop - Gremlin • Graph traversal scripting language. • Works against Blueprints API and is “compiled” into Pipes data-flows. • Both native Java and Groovy (JSR-223) supported. • Step library (https://github.com/tinkerpop/gremlin/wiki/Gremlin-Steps) • Transform – emit transformation of object • Dozens of different types of transforms • Filter – decide whether to include/exclude object in traversal • ~20 different types of filters • sideEffect – include object but produce side-effect from it • ~15 different types of sideEffects (e.g. group, count, table, tree) • Branch – decide which step to take next in traversal • Several different branching options
  • 35. SQL → Gremlin (secret decoder ring)
      Get all users
        SQL: select * from users
        Gremlin: g.V('type', 'user').map()
      Get user names
        SQL: select name from users
        Gremlin: g.V('type', 'user').name
      Get user names/ages
        SQL: select name, age from users
        Gremlin: g.V('type', 'user').transform({ ['name' : it.getProperty('name'), 'age' : it.getProperty('age')] })
      Get distinct user ages
        SQL: select distinct(age) from users
        Gremlin: g.V('type', 'user').age.dedup()
      Get oldest user
        SQL: select max(age) from users
        Gremlin: g.V('type', 'user').age.max()
  • 36. SQL → Gremlin (secret decoder ring)
      Select by equality
        SQL: select * from users where age = 35
        Gremlin: g.V('type', 'user').has('age', 35).map()
      Select by comparison
        SQL: select * from users where age > 21
        Gremlin: g.V('type', 'user').has('age', T.gt, 21).map()
      Select by multiple criteria
        SQL: select * from users where sex = 'M' and age > 25
        Gremlin: g.V('type', 'user').has('age', T.gt, 25).has('sex', 'M').map()
      Order by age (switch 'a' and 'b' to do asc)
        SQL: select * from users order by age desc
        Gremlin: g.V('type', 'user').order({ it.b.getProperty('age') <=> it.a.getProperty('age') }).map()
      Paging
        SQL: select * from users order by age desc limit 5 offset 5
        Gremlin: g.V('type', 'user').order({ it.b.getProperty('age') <=> it.a.getProperty('age') })[5..<10].map()
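The paging row maps `limit 5 offset 5` to the half-open range `[5..<10]`, that is, offset up to offset + limit. A small Python sketch of the same filter, sort and slice sequence is shown here; the `select` helper and the generated user records are hypothetical and exist only to make the arithmetic concrete.

```python
def select(rows, where=None, order_by=None, descending=False, limit=None, offset=0):
    """Filter, sort, then slice, mirroring WHERE / ORDER BY / LIMIT-OFFSET."""
    result = [r for r in rows if where is None or where(r)]
    if order_by is not None:
        result = sorted(result, key=lambda r: r[order_by], reverse=descending)
    end = None if limit is None else offset + limit
    return result[offset:end]


# Hypothetical data: 12 users aged 20..31, alternating sex.
users = [{"name": f"user{i}", "age": 20 + i, "sex": "M" if i % 2 else "F"}
         for i in range(12)]

# order by age desc limit 5 offset 5  ->  positions 5..9 of the sorted list
page = select(users, order_by="age", descending=True, limit=5, offset=5)
print([u["age"] for u in page])  # [26, 25, 24, 23, 22]

# where sex = 'M' and age > 25
men_over_25 = select(users, where=lambda u: u["sex"] == "M" and u["age"] > 25)
print([u["age"] for u in men_over_25])  # [27, 29, 31]
```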
  • 37. SQL → Gremlin (secret decoder ring)
      Join
        SQL: select users.* from users inner join groups on users.gId = groups.id where groups.name = 'devs'
        Gremlin: g.V('type', 'groups').has('name', 'devs').in('inGroup').map()
      Join-on-join-on-join …
        SQL: SELECT TOP (5) [t14].[ProductName] FROM (SELECT COUNT(*) AS [value], [t13].[ProductName] FROM [customers] AS [t0] CROSS APPLY (SELECT [t9].[ProductName] FROM [orders] AS [t1] CROSS JOIN [order details] AS [t2] INNER JOIN [products] AS [t3] ON [t3].[ProductID] = [t2].[ProductID] CROSS JOIN [order details] AS [t4] INNER JOIN [orders] AS [t5] ON [t5].[OrderID] = [t4].[OrderID] LEFT JOIN [customers] AS [t6] ON [t6].[CustomerID] = [t5].[CustomerID] CROSS JOIN ([orders] AS [t7] CROSS JOIN [order details] AS [t8] INNER JOIN [products] AS [t9] ON [t9].[ProductID] = [t8].[ProductID]) WHERE NOT EXISTS(SELECT NULL AS [EMPTY] FROM [orders] AS [t10] CROSS JOIN [order details] AS [t11] INNER JOIN [products] AS [t12] ON [t12].[ProductID] = [t11].[ProductID] WHERE [t9].[ProductID] = [t12].[ProductID] AND [t10].[CustomerID] = [t0].[CustomerID] AND [t11].[OrderID] = [t10].[OrderID]) AND [t6].[CustomerID] <> [t0].[CustomerID] AND [t1].[CustomerID] = [t0].[CustomerID] AND [t2].[OrderID] = [t1].[OrderID] AND [t4].[ProductID] = [t3].[ProductID] AND [t7].[CustomerID] = [t6].[CustomerID] AND [t8].[OrderID] = [t7].[OrderID]) AS [t13] WHERE [t0].[CustomerID] = N'ALFKI' GROUP BY [t13].[ProductName]) AS [t14] ORDER BY [t14].[value] DESC
        Gremlin: g.V('customerId','ALFKI').as('customer').out('ordered').out('contains').out('is').as('products').in('is').in('contains').in('ordered').except('customer').out('ordered').out('contains').out('is').except('products').groupCount().cap().orderMap(T.decr)[0..<5].productName
  • 38. Gremlin Resources • Tinkerpop resources • https://github.com/tinkerpop/gremlin/wiki/Basic-Graph-Traversals • https://github.com/tinkerpop/gremlin/wiki/Gremlin-Steps • https://github.com/tinkerpop/gremlin/wiki/Using-Gremlin-through-Java • https://groups.google.com/forum/#!forum/gremlin-users • https://github.com/tinkerpop/gremlin/wiki/SPARQL-vs.-Gremlin • http://markorodriguez.com/2011/08/03/on-the-nature-of-pipes/ • http://sql2gremlin.com/ • http://gremlindocs.com/ • Groovy • http://groovy.codehaus.org/Beginners+Tutorial • http://groovy.codehaus.org/Collections • Misc • http://www.fromdev.com/2013/09/Gremlin-Example-Query-Snippets-Graph-DB.html • http://markorodriguez.com/2011/06/15/graph-pattern-matching-with-gremlin-1-1/
  • 39. GREMLIN Demo Dataset Lab
  • 40. Tinkerpop - Gremlin
      gremlin> g = TinkerGraphFactory.createTinkerGraph()
      ==>tinkergraph[vertices:6 edges:6]
      gremlin> g.V.count()
      ==>6
      gremlin> g.E.count()
      ==>6
      gremlin> g.v(1)
      ==>v[1]
      gremlin> g.v(1).map
      ==>{age=29, name=marko}
      gremlin> g.v(1).outE
      ==>e[7][1-knows->2]
      ==>e[8][1-knows->4]
      ==>e[9][1-created->3]
      gremlin> g.v(1).outE('knows')
      ==>e[7][1-knows->2]
      ==>e[8][1-knows->4]
      gremlin> g.v(1).outE('knows').map
      ==>{weight=0.5}
      ==>{weight=1.0}
  • 41. Tinkerpop - Gremlin
      // get vertices known by marko
      gremlin> g.v(1).outE('knows').inV
      ==>v[2]
      ==>v[4]
      // get properties of vertices known by marko
      gremlin> g.v(1).outE('knows').inV.map
      ==>{age=27, name=vadas}
      ==>{age=32, name=josh}
      // filter by those older than 30
      gremlin> g.v(1).outE('knows').inV.filter{it.age > 30}.map
      ==>{age=32, name=josh}
      // just get name
      gremlin> g.v(1).outE('knows').inV.filter{it.age > 30}.name
      ==>josh
      // find nodes who 'know' someone older than 30
      gremlin> g.V.as('x').outE('knows').inV.has('age', T.gt, 30).back('x').map
      ==>{age=29, name=marko}
  • 42. Tinkerpop - Gremlin
      // find edges with weight > .5
      gremlin> g.E.filter{it.weight > 0.5}
      ==>e[10][4-created->5]
      ==>e[8][1-knows->4]
      // find edges w/ weight > .5 from marko
      gremlin> g.E.filter{it.weight > 0.5}.as('x').outV.has('name', T.eq, 'marko').back('x')
      ==>e[8][1-knows->4]
      // find nodes 'created' by other nodes
      gremlin> g.V.as('x').inE('created').back('x').map
      ==>{name=lop, lang=java}
      ==>{name=ripple, lang=java}
      gremlin> g.E.filter{it.label == 'created'}.inV.dedup().map
      ==>{name=lop, lang=java}
      ==>{name=ripple, lang=java}
      // find nodes 'created' by more than 1 node
      gremlin> g.E.filter{it.label == 'created'}.inV.groupCount().cap()
      ==>{v[3]=3, v[5]=1}
      // find nodes 'created' by marko's friends
      gremlin> g.v(1).outE('knows').inV.outE('created').inV.map
      ==>{name=ripple, lang=java}
      ==>{name=lop, lang=java}
  • 43. Tinkerpop - Gremlin
      // add some new nodes
      gremlin> g.addVertex([name:'bob',age:'60'])
      ==>v[0]
      gremlin> g.addVertex([name:'eve',age:'40'])
      ==>v[7]
      gremlin> g.addVertex([name:'timmy',age:'5'])
      ==>v[8]
      // add some edges
      gremlin> g.addEdge(g.v(0), g.v(7), 'friend')
      ==>e[13][0-friend->7]
      gremlin> g.addEdge(g.v(0), g.v(8), 'child')
      ==>e[14][0-child->8]
      gremlin> g.V.filter{it.name == 'bob'}.outE('child').as('x').inV.filter{it.name == 'timmy'}.back('x')
      ==>e[14][0-child->8]
      gremlin> g.removeEdge(g.e(14))
      ==>null
      gremlin> g.V.filter{it.name == 'bob'}.outE('child').as('x').inV.filter{it.name == 'timmy'}.back('x')
      // no results
  • 44. Tinkerpop - Gremlin
      // previously
      gremlin> g.addVertex([name:'bob',age:'60'])
      ==>v[0]
      gremlin> g.addVertex([name:'eve',age:'40'])
      ==>v[7]
      gremlin> g.addEdge(g.v(0), g.v(7), 'friend')
      ==>e[13][0-friend->7]
      // query for edge
      gremlin> g.v(0).outE
      ==>e[13][0-friend->7]
      // remove vertex (auto removes orphaned edge)
      gremlin> g.removeVertex(g.v(7))
      ==>null
      gremlin> g.v(0).outE
      // no results
      gremlin> g.e(13)
      ==>null
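The "auto removes orphaned edge" behaviour can be pictured with a toy in-memory graph. This Python sketch is an assumption about how such a cascade might be implemented, not TinkerGraph's actual code; the `Graph` class and its methods are hypothetical, while the ids and labels follow the console session above.

```python
class Graph:
    def __init__(self):
        self.vertices = {}  # vertex id -> property dict
        self.edges = {}     # edge id -> (out vertex id, label, in vertex id)

    def add_vertex(self, vid, **props):
        self.vertices[vid] = props

    def add_edge(self, eid, out_v, in_v, label):
        self.edges[eid] = (out_v, label, in_v)

    def out_edges(self, vid):
        """Ids of edges leaving vid."""
        return [eid for eid, (out_v, _, _) in self.edges.items() if out_v == vid]

    def remove_vertex(self, vid):
        """Delete the vertex and every edge that touches it."""
        del self.vertices[vid]
        incident = [eid for eid, (out_v, _, in_v) in self.edges.items()
                    if vid in (out_v, in_v)]
        for eid in incident:
            del self.edges[eid]


g = Graph()
g.add_vertex(0, name="bob", age="60")
g.add_vertex(7, name="eve", age="40")
g.add_edge(13, 0, 7, "friend")
print(g.out_edges(0))   # [13]

g.remove_vertex(7)
print(g.out_edges(0))   # []
print(g.edges.get(13))  # None
```

Collecting the incident edge ids before deleting avoids mutating the dictionary while iterating over it.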
  • 45. TITAN A Distributed Graph Database
  • 46. Titan Graph Database • Optimized to work against billions of nodes and edges • Theoretical limitation of 2^60 edges and half as many nodes • Works with several different distributed DBs including Cassandra and HBase • Supports many concurrent users doing complex graph traversals simultaneously • Native integration with Tinkerpop stack • Supports integration with search technologies such as Lucene and Elasticsearch • Created by Thinkaurelius (http://thinkaurelius.com/)
  • 47. Titan Distributed Architecture • TitanDB can integrate with distributed architectures in a few different ways
      Native: • Connects remotely to cluster (or local) • Can scale size as far as cluster can • Native Titan API • Possible processing bottleneck
      Remote: • Put Rexster in front to allow RESTful access • Connects remotely to cluster • Can scale size as far as cluster can • Possible processing bottleneck
      Embedded: • TitanDB and Rexster run on each node in the cluster • Can run on same JVM • Considerable performance/scalability improvement
  • 48. Titan Indexing • Standard index • Internal to Titan • Very fast but only supports exact matches • External index • Use indexing engine external to Titan (Lucene or Elasticsearch) • Supports range queries • Lucene • Limited to only one machine (small-sized datasets) • Also has a richer set of search features (than Elasticsearch) • Elasticsearch • Distributed • Not as feature-filled as Lucene
  • 49. Distributed Titan Limitations/Gotchas • Limitations which are present but which are scheduled to be remedied • Property indexes must be created before property is ever used • Unable to drop indices • Types cannot be changed once created • Gotchas • Multiple graphs on same backend requires specific configurations per graph • Ghost vertices – certain concurrency circumstances can leave traces of vertices. Recommendation is to allow this and periodically clean them up
  • 50. Titan Graph Database - Gremlin graph vertices edges properties G = (V , E , λ)
  • 51. Titan Graph Database - Gremlin graph vertices edges properties G = (V , E , λ)
  • 52. Titan Graph Database - Gremlin graph vertices edges properties G = (V , E , λ) Application
  • 53. Titan Graph Database - Gremlin graph vertices edges properties G = (V , E , λ) Application
  • 54. Titan Graph Database - Gremlin graph vertices edges properties G = (V , E , λ) Application
  • 55. DATA MODELING EXAMPLE A Blogging Application
  • 56. “Bloggie Blog” Requirements • Create users, posts, and comments • Retrieve all posts for a user • Retrieve posts by time range • Retrieve all comments for a user • Retrieve all comments for a post, sorted by vote • Retrieve the top N posts, sorted by vote • User can only vote *once* on a post or comment
  • 57. Get Cassandra & Titan • https://github.com/thinkaurelius/titan/wiki/Downloads (0.3.2 stable)
      $ $TITAN_LOCATION/bin/gremlin.sh
               \,,,/
               (o o)
      -----oOOo-(_)-oOOo-----
      gremlin> g = new TinkerGraph();
      ==>tinkergraph[vertices:0 edges:0]
      gremlin>
  • 58. Modeling Entities (User, Post, Comment) • There’s no one way to model this. • General rules to follow: • 1-N relationships can be modeled as one node with N edges pointing to other nodes • 1-1 relationships can be modeled as a simple edge between two nodes • M-N relationships are just more edges • It is important to categorize the different types of edges, since many different types of edges will connect to a single node • Don’t shy away from attaching properties to edges. Remember that edges are just as queryable as nodes. • A common practice is to model “actions” as edges and “actors”/“artifacts” as nodes • Denormalize to minimize traversals
  • 59. Users, Posts, Comments
  • 60. Retrieve User’s Posts • Let’s create a user and post • Link them together • Retrieve the user and their posts
      gremlin> g.addVertex([type: 'user', email: 'bob@test.com', name: 'Robert', password: 'asdf'])
      ==>v[0]
      gremlin> g.addVertex([type: 'post', guid: '21EC2020-3AEA-1069-A2DD-08002B30309D', title: 'Hello World', text: 'My first post!', userDisplayName: 'Bob'])
      ==>v[1]
      gremlin> g.addEdge(g.v(0), g.v(1), 'postAuthor')
      ==>e[3][0-postAuthor->1]
      gremlin> g.V.has('type', 'post').as('posts').inE('postAuthor').outV.has('email', 'bob@test.com').back('posts').map()
      ==>{guid=21EC2020-3AEA-1069-A2DD-08002B30309D, text=My first post!, title=Hello World, userDisplayName=Bob, type=post}
  • 61. Retrieve Posts by Time Range • Add timestamp property to post • Query by range
      gremlin> g.V.has('guid', '21EC2020-3AEA-1069-A2DD-08002B30309D').has('type', 'post').sideEffect({it.createTimestamp = 1383726500});
      ==>v[1]
      gremlin> g.V.has('createTimestamp', T.gt, 1383726400).has('createTimestamp', T.lt, 1383726600).map()
      ==>{guid=21EC2020-3AEA-1069-A2DD-08002B30309D, createTimestamp=1383726500, text=My first post!, title=Hello World, userDisplayName=Bob, type=post}
  • 62. Retrieve All User’s Comments • Add comment • Link to author and to post
      gremlin> g.addVertex([type: 'comment', guid: '3F2504E0-4F89-11D3-9A0C-0305E82C3301', text: 'I like it!', userDisplayName: 'Sally', createTimestamp: 1383736500])
      ==>v[4]
      gremlin> g.addEdge(g.v(1), g.v(4), 'postComment')
      ==>e[5][1-postComment->4]
      gremlin> g.addVertex([type: 'user', email: 'sally@test.com', name: 'Sally', password: 'qwerty'])
      ==>v[6]
      gremlin> g.addEdge(g.v(6), g.v(4), 'commentAuthor')
      ==>e[7][6-commentAuthor->4]
      gremlin> g.V.has('type', 'comment').as('comments').inE('commentAuthor').outV.has('email', 'sally@test.com').back('comments').map()
      ==>{guid=3F2504E0-4F89-11D3-9A0C-0305E82C3301, createTimestamp=1383736500, text=I like it!, userDisplayName=Sally, type=comment}
  • 63. Retrieve top N posts by vote • Create “postVote” edge and aggregated votes count in post • Query and sort by votes
      gremlin> g.addEdge(g.v(6), g.v(1), 'postVote', [date: 1383726600])
      ==>e[8][6-postVote->1]
      gremlin> g.V.has('type', 'post').has('guid', '21EC2020-3AEA-1069-A2DD-08002B30309D').sideEffect({it.votes = 1})
      ==>v[1]
      gremlin> g.addVertex([type: 'post', guid: '21EC2020-3AEA-1069-A2DD-08002B30309E', createTimestamp: 1383726600, title: 'Learning Gremlin', text: 'Gremlin is neat.', userDisplayName: 'Bob', votes: 2])
      ==>v[9]
      gremlin> g.V('type', 'post').order({it.b.getProperty('votes') <=> it.a.getProperty('votes')}).transform({['title' : it.getProperty('title'), 'votes' : it.getProperty('votes')]})[0..5]
      ==>{title=Learning Gremlin, votes=2}
      ==>{title=Hello World, votes=1}
  • 64. Retrieve Post Comments Sorted by Vote • Similar to post votes
      gremlin> g.addEdge(g.v(0), g.v(4), 'commentVote', [date: 1383726700])
      ==>e[10][0-commentVote->4]
      gremlin> g.V.has('type', 'comment').has('guid', '3F2504E0-4F89-11D3-9A0C-0305E82C3301').sideEffect({it.votes = 1})
      ==>v[4]
      gremlin> g.addVertex([type: 'comment', guid: '3F2504E0-4F89-11D3-9A0C-0305E82C3302', text: 'Thanks.', userDisplayName: 'Bob', createTimestamp: 1383736500])
      ==>v[11]
      gremlin> g.addEdge(g.v(1), g.v(11), 'postComment')
      gremlin> g.addEdge(g.v(0), g.v(11), 'commentAuthor')
      gremlin> g.v(1).outE('postComment').inV.order({it.b.getProperty('votes') <=> it.a.getProperty('votes')}).map()
      ==>{guid=3F2504E0-4F89-11D3-9A0C-0305E82C3301, createTimestamp=1383736500, text=I like it!, votes=1, userDisplayName=Sally, type=comment}
      ==>{guid=3F2504E0-4F89-11D3-9A0C-0305E82C3302, createTimestamp=1383736500, text=Thanks., userDisplayName=Bob, type=comment}
  • 65. User Can Only Vote Once • Could enforce using external unique indexes • Or do 2-step incrementing in gremlin (small chance of dups)
      gremlin> user = g.v(0); post = g.v(1);
               if (post.inE('postVote').outV.has('email', user.email).count() == 0) {
                 g.addEdge(user, post, 'postVote', [date: new Date().getTime()]);
                 if (post.getProperty('votes') != null) { post.votes++; } else { post.votes = 1; }
               }
      ==>1
      gremlin> // same command above
      ==>null
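The "small chance of dups" comes from the gap between checking for an existing vote and recording the new one. The Python sketch below illustrates one way to close that gap inside a single process by doing both steps under a lock; `VoteLedger` is a hypothetical class, not part of Titan or Gremlin, and a distributed deployment would still need something like the unique index mentioned on the slide.

```python
import threading


class VoteLedger:
    """Records at most one vote per (user, post) pair and keeps a running count."""

    def __init__(self):
        self._votes = set()   # (user_id, post_id) pairs already recorded
        self.counts = {}      # post_id -> aggregated vote count
        self._lock = threading.Lock()

    def vote(self, user_id, post_id):
        """Return the new count, or None if this user already voted on the post."""
        key = (user_id, post_id)
        with self._lock:  # check and increment happen as one atomic step
            if key in self._votes:
                return None
            self._votes.add(key)
            self.counts[post_id] = self.counts.get(post_id, 0) + 1
            return self.counts[post_id]


ledger = VoteLedger()
first = ledger.vote(0, 1)
repeat = ledger.vote(0, 1)
other = ledger.vote(6, 1)
print(first)   # 1
print(repeat)  # None
print(other)   # 2
```

As in the slide's example, the second identical vote yields no result, while a vote from a different user increments the aggregate.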
  • 66. Graph Visualization
  • 67. Areas Not Covered • Map/Reduce • Gremlin has its own built-in M/R API • Indexing • Titan currently has a limitation requiring that all indexes be created up front • Integration with other backends • HBase, Oracle Berkeley DB, Hazelcast, Persistit • Detailed full-text search through external indexes • Graph analytics engine (Faunus) • Deep dive into the Gremlin query language and Groovy • Seriously, there’s a TON there.
  • 68. References • http://sql2gremlin.com/ • http://www.tinkerpopbook.com/ • http://www.tinkerpop.com/ • https://github.com/thinkaurelius/titan/wiki/Getting-Started • https://groups.google.com/forum/#!forum/gremlin-users • https://groups.google.com/forum/#!forum/aureliusgraphs • http://thinkaurelius.com/
  • 69. THANK YOU { “email” : “calebjones@gmail.com”, “website” : “http://calebjones.info”, “twitter” : “@JonesWCaleb” }