Enabling Multimodel Graphs with Apache TinkerPop


Graphs are everywhere, but in a modern data stack, they are not the only tool in the toolbox. With Apache TinkerPop, adding graph capability on top of your existing data platform is not as daunting as it sounds. We will do a deep dive on writing Traversal Strategies to optimize performance of the underlying graph database. We will investigate how various TinkerPop systems offer unique possibilities in a multimodel approach to graph processing. We will discuss how using Gremlin frees you from vendor lock-in and enables you to swap out your graph database as your requirements evolve. Presented at Graph Day Texas, January 14, 2017.

Published in: Data & Analytics

  1. Enabling Multimodel Graphs with Apache TinkerPop™. Jason Plurad (@pluradj), IBM Open Technology, Apache TinkerPop. January 14, 2017, Graph Day Texas. #ddtx17 #gdtx17
  2. Agenda: Apache TinkerPop; Multimodel Graphs; Graph Traversal Strategies; Provider Optimizations; On the Horizon
  3. Apache TinkerPop™: Open source graph computing framework
  4. Apache TinkerPop
     § Open source, vendor-agnostic graph computing framework
     § Gremlin graph traversal language
     Apache TinkerPop™. Maintainer: Apache Software Foundation. License: Apache. Latest Release: 3.2.3, October 2016.
  5. Graph System Integration
  6. Multimodel Graphs: Polyglot persistence
  7. Multimodel Database
     § Graphs often are not alone in a data application
     § Multimodel: combining capabilities of different database types
     § Choose the right tool for the job
     § Use graphs for highly connected data
     § Single persistence layer
     OrientDB®. Maintainer: OrientDB. License: Apache. Latest Release: 2.2.14, December 2016.
  8. Multimodel Platform
     § Graphs often are not alone in a data application
     § Multimodel: combining capabilities of different database types
     § Choose the right tool for the job
     § Use graphs for highly connected data
     § Take advantage of existing storage architectures
     DataStax Enterprise Graph. Maintainer: DataStax. License: Commercial. Latest Release: 5.0.5, December 2016.
  9. Graph Traversal Strategies: Optimizing a Gremlin traversal
  10. Gremlin Machine: Everything Is a Traversal
      § Traversal
      § Step
      § Traverser
      § Traversal Source
      § Traversal Strategy
  11. explain()
      § Details on how a traversal is compiled into a final execution plan
  12. withStrategies() / withoutStrategies()
      § Add specific traversal strategies to a traversal source, or remove them from it
  13. Traversal Strategy Types
      1. Decoration
      2. Optimization
      3. Provider Optimization
      4. Finalization
      5. Verification
  14. Decoration
      § Application-level feature that can be embedded into the traversal logic
      § Event: raise events for graph mutations
      § Partition: use partition names to restrict element reads/writes
      § Sack: use a sack to store data that gets updated as traversers split/merge
      § Subgraph: restrict element reads based on traversals
  15. Finalization
      § Enforce final adjustment, cleanup, or analysis required before executing the traversal
      § MatchAlgorithm: used in match() step to reorder execution plan
        – CountMatchAlgorithm: largest traversal reduction goes first (default)
        – GreedyMatchAlgorithm: traversers drain in order
      § Profile: injects profile steps into traversal to measure runtime/counts
  16. Verification
      § Prevent traversals that are not legal for the application or traversal engine
      § LambdaRestriction: do not allow use of lambdas
      § ReadOnly: do not allow graph mutations
      § StandardVerification: vertex computing steps must be executed by a graph computer; reducing barrier steps cannot immediately follow repeat steps
  17. Optimization
      § A more efficient way to express the traversal using TinkerPop steps only
      § AdjacentToIncident: replace out().count() with outE().count()
      § IncidentToAdjacent: replace outE().inV() with out()
      § Connective: rewrites binary conjunction (and/or steps)
      § FilterRanking: reorders filter and order steps to prioritize steps that will keep traversers small and bulkable
      § InlineFilter: removes parent filters when child traversals are pure filters
      § PathRetraction: traversers shed unneeded path information, reducing path footprint, increasing likelihood of bulking
  18. Provider Optimizations: Graph system-specific graph traversals
  19. Sqlg
      § Implementation of Apache TinkerPop over RDBMS
        – PostgreSQL
        – HSQLDB (HyperSQL Database)
        – H2 Database Engine
      § Optimizes Gremlin by reducing the number of calls to the RDBMS
      § Analyzes the steps and, where possible, combines them into a single SqlgGraphStepCompiled or SqlgVertexStepCompiled
      Sqlg. Maintainer: Pieter Martin. License: MIT. Latest Release: 1.3.2, November 2016.
  20. Sqlg
  21. TitanDB
      § Scalable graph database distributed on multi-machine clusters
      § Pluggable storage backends
        – Apache Cassandra®
        – Apache HBase®
      § Pluggable index backends
        – Apache Solr™
        – Elasticsearch™
      TitanDB™. Maintainer: DataStax. License: Apache. Latest Release: 1.0, November 2015.
  22. TitanDB
  23. TitanDB + ScyllaDB storage backend
      § Scylla is a drop-in replacement for Apache Cassandra 2.1
        – Higher throughput, lower latency
        – C++ implementation, I/O scheduler
      § Scylla on IBM Compose (beta)
      § Titan 1.0 compatibility starting with Scylla 1.3
      ScyllaDB™. Maintainer: ScyllaDB. License: AGPL. Latest Release: 1.5, December 2016.
  24. IBM Graph
      § Fully managed, Apache TinkerPop-compatible OLTP graph database
      § Focus on your data, not on install and operations
      § #sleepMore
      IBM Graph. Maintainer: IBM. License: Commercial. Latest Release: GA, July 2016.
  25. On the Horizon: More Apache TinkerPop-enabled providers in development
  26. Unipop
      § Data federation and virtualization engine
        – Elasticsearch®
        – JDBC
      § Models your data as a "virtual" graph
      § Uses Gremlin as graph query language
      Unipop. Maintainer: Sean Barzilay, Ran Magen. License: Apache. Latest Release: 0.2, September 2016.
  27. Apache S2Graph (incubating)
      § A graph database designed for distributed and scalable management of highly interconnected data at web scale
      § Built with Apache HBase, Scala
      § S2Graph powers 20+ services in production at Kakao (mobile messaging app)
      § Apache TinkerPop support coming soon [JIRA S2GRAPH-72]
      Apache S2Graph (incubating). Maintainer: Apache Software Foundation. License: Apache. Latest Release: 0.1, October 2016.
  28. HGraphDB
      § Apache HBase as an Apache TinkerPop Graph Database
      § Allows user-supplied ids
      § Integration with Apache Giraph for OLAP
      HGraphDB. Maintainer: Robert Yokota. License: Apache. Latest Release: 0.4.12, January 2017.
  29. JanusGraph
      § Fork of TitanDB code base
      § Scalable graph database distributed on multi-machine clusters with pluggable storage and indexing
      § Vendor-neutral, open community with open governance
      JanusGraph™. Maintainer: Linux Foundation. License: Apache. First Release: planned 1Q 2017.
  30. Acknowledgements
      § The Crew from Aurelius
      § The Apache Software Foundation
      § The Linux Foundation
      § Ketrina Yim
  31. Thank you!
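
The Optimization and Verification slides describe strategies that rewrite or reject a traversal before it runs, and the explain() slide describes viewing the resulting plan. The sketch below is a toy model of that idea in plain Python, not TinkerPop code: a traversal is just a list of step names, each strategy is a function over that list, and the `explain` helper here is a hypothetical stand-in that records the plan after each strategy. The two rewrite rules come from the slides; the order in which the strategies are applied and the set of mutating steps are assumptions made for illustration.

```python
from typing import Callable, List, Tuple

Steps = List[str]
Strategy = Callable[[Steps], Steps]

# Assumption for this toy model: these step names count as graph mutations.
MUTATING_STEPS = {"addV", "addE", "property", "drop"}


def incident_to_adjacent(steps: Steps) -> Steps:
    """Rewrite outE().inV() as out(), as listed on the Optimization slide."""
    result: Steps = []
    i = 0
    while i < len(steps):
        if steps[i:i + 2] == ["outE", "inV"]:
            result.append("out")
            i += 2  # consume both steps of the matched pair
        else:
            result.append(steps[i])
            i += 1
    return result


def adjacent_to_incident(steps: Steps) -> Steps:
    """Rewrite out().count() as outE().count(): counting edges avoids
    touching the adjacent vertices."""
    result = list(steps)
    for i in range(len(result) - 1):
        if result[i] == "out" and result[i + 1] == "count":
            result[i] = "outE"
    return result


def read_only(steps: Steps) -> Steps:
    """Verification-style check: reject the traversal if it mutates the graph."""
    for step in steps:
        if step in MUTATING_STEPS:
            raise ValueError(f"mutating step not allowed: {step}()")
    return steps


# Optimization strategies run before verification, following the order of
# strategy types on the slides. The relative order of the two optimizations
# is an assumption of this sketch.
STRATEGIES: List[Tuple[str, Strategy]] = [
    ("IncidentToAdjacent", incident_to_adjacent),
    ("AdjacentToIncident", adjacent_to_incident),
    ("ReadOnly", read_only),
]


def explain(steps: Steps, strategies: List[Tuple[str, Strategy]]) -> List[Tuple[str, Steps]]:
    """Hypothetical helper: return the plan after each strategy is applied."""
    trace = [("original", list(steps))]
    current = list(steps)
    for name, strategy in strategies:
        current = strategy(current)
        trace.append((name, list(current)))
    return trace


if __name__ == "__main__":
    # Toy equivalent of g.V().outE().inV().count()
    for name, plan in explain(["V", "outE", "inV", "count"], STRATEGIES):
        print(f"{name:20} {plan}")
```

Running the module prints one line per strategy, which makes it easy to see which strategy changed the plan and which left it alone. A traversal containing a step such as `drop` raises `ValueError` at the `ReadOnly` stage instead of producing a final plan.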