Cassandra Summit - What's New In Apache TinkerPop?


  1. What’s New in Apache TinkerPop?
     Open Source Graph Computing Framework
     http://tinkerpop.incubator.apache.org/
     Stephen Mallette - @spmallette
     © 2015. All Rights Reserved.
  2. [no slide text]
  3. By Andrea Mann from London, United Kingdom (Flickr Uploaded by Hohum) [CC BY 2.0 (http://creativecommons.org/licenses/by/2.0)], via Wikimedia Commons
  4. [no slide text]
  5. Georgius Agricola, De re metallica, 1556
  6. “Woman at spinning wheel with man carding”, Smithfield Decretals (British Library, Royal 10 E. IV, fol. 147v), c. 1340
     “Carding, Spinning and Weaving” by Giovanni Boccaccio, from De claris mulieribus, 15th century
  7. London, British Library, Royal 18 E.iii (15th century) [Public domain], via Wikimedia Commons
  8. [Public domain], via Wikimedia Commons
  9. By Unknown. Photo credit: Yale University Art Gallery. In the Public Domain. [Public domain], via Wikimedia Commons
     [Public domain], via Wikimedia Commons
  10. By Dogcow (Own work) [CC BY-SA 3.0 (http://creativecommons.org/licenses/by-sa/3.0) or GFDL (http://www.gnu.org/copyleft/fdl.html)], via Wikimedia Commons
  11. By Adam Schuster (Flickr: Proto IBM) [CC BY 2.0 (http://creativecommons.org/licenses/by/2.0)], via Wikimedia Commons
      By Arnold Reinhold [CC BY-SA 2.5 (http://creativecommons.org/licenses/by-sa/2.5)], via Wikimedia Commons
  12. [no slide text]
  13. Graph Data Structure
      vertex: label: person, name: Stephen
      vertex: label: book, title: Connections
      vertex: label: person, name: James
      edge: label: bought
      edge: label: wrote
  14. The TinkerPop Stack
      TinkerPop 2.0
      TinkerPop 3.0
  15. The TinkerPop Stack
  16. Gremlin in TinkerPop3
      is NOT “just ”
      It is advised to not use ƛ expressions
      supports BOTH imperative and declarative querying
  17. $ bin/gremlin.sh

               \,,,/
               (o o)
      -----oOOo-(3)-oOOo-----
      plugin activated: tinkerpop.server
      plugin activated: tinkerpop.utilities
      plugin activated: tinkerpop.tinkergraph
      gremlin>
  18. $ bin/gremlin.sh

               \,,,/
               (o o)
      -----oOOo-(3)-oOOo-----
      plugin activated: tinkerpop.server
      plugin activated: tinkerpop.utilities
      plugin activated: tinkerpop.tinkergraph
      gremlin> graph = GraphFactory.open("graph.properties")
      ==>tinkergraph[vertices:0 edges:0]
      gremlin>
  19. $ bin/gremlin.sh

               \,,,/
               (o o)
      -----oOOo-(3)-oOOo-----
      plugin activated: tinkerpop.server
      plugin activated: tinkerpop.utilities
      plugin activated: tinkerpop.tinkergraph
      gremlin> graph = GraphFactory.open("graph.properties")
      ==>tinkergraph[vertices:0 edges:0]
      gremlin> graph.io(gryo()).readGraph('data.kryo')
      ==>null
      gremlin> graph
      ==>tinkergraph[vertices:1933 edges:4125]
      gremlin>
      [Schema diagram labels: discussion, person, response; wrote, hasResponse, participatesIn, hasRoot]
  20. $ bin/gremlin.sh

               \,,,/
               (o o)
      -----oOOo-(3)-oOOo-----
      plugin activated: tinkerpop.server
      plugin activated: tinkerpop.utilities
      plugin activated: tinkerpop.tinkergraph
      gremlin> graph = GraphFactory.open("graph.properties")
      ==>tinkergraph[vertices:0 edges:0]
      gremlin> graph.io(gryo()).readGraph('data.kryo')
      ==>null
      gremlin> graph
      ==>tinkergraph[vertices:1933 edges:4125]
      gremlin> g = graph.traversal()
      ==>graphtraversalsource[tinkergraph[vertices:1933 edges:4125], standard]
      gremlin>
  21. gremlin> g.V(4608)
      ==>v[4608]
      [Diagram: person vertex 4608, annotated g.V(4608)]
      “Find the vertex with id 4608”
  22. gremlin> g.V(4608).values('userName')
      ==>Renlit
      [Diagram: person vertex 4608, annotated g.V(4608); its userName property "Renlit", annotated .values('userName')]
      “Get the value of the ‘userName’ property on vertex 4608”
  23. gremlin> g.V(4608).out('wrote')
      ==>v[354560]
      ==>v[640768]
      ...
      ==>v[466432]
      [Diagram: person vertex 4608, annotated g.V(4608); wrote edge to response, annotated .out('wrote')]
      “Find the responses posted by ‘Renlit’”
  24. gremlin> g.V(4608).out('wrote').count()
      ==>67
      [Diagram: person vertex 4608, annotated g.V(4608); wrote edge to response, annotated .out('wrote'); result 67, annotated .count()]
      “Find the number of responses posted by ‘Renlit’”
  25. gremlin> t = g.V(4608).out('wrote').count();null
      ==>null
      gremlin> t.strategies.toList()
      ==>ConjunctionStrategy
      ==>IncidentToAdjacentStrategy
      ==>AdjacentToIncidentStrategy
      ==>IdentityRemovalStrategy
      ==>DedupBijectionStrategy
      ==>MatchPredicateStrategy
      ==>RangeByIsCountStrategy
      ==>TinkerGraphStepStrategy
      ==>ProfileStrategy
      ==>EngineDependentStrategy
      ==>ComputerVerificationStrategy
      ==>StandardVerificationStrategy
  26. Strategy Application
      Original Query: g.V(4608).out('wrote').count()
      t.strategies.toList(): AdjacentToIncidentStrategy
      Post-Strategies: g.V(4608).outE('wrote').count()
      Other strategies shown: ConjunctionStrategy, IncidentToAdjacentStrategy, IdentityRemovalStrategy, DedupBijectionStrategy, MatchPredicateStrategy, RangeByIsCountStrategy, TinkerGraphStepStrategy, ProfileStrategy, EngineDependentStrategy, ComputerVerificationStrategy, StandardVerificationStrategy
  27. gremlin> g.V(4608).as('a').out('wrote').out('hasResponse').in('wrote')
                        .where(neq('a')).groupCount().next()
      ==>v[5376]=4
      ==>v[2304]=2
      ==>v[5888]=7
      ...
      ==>v[10496]=1
      [Diagram: person 4608, wrote edges to responses, hasResponse edges to further responses, wrote edges back to persons; annotated step by step with g.V(4608), .as('a'), .out('wrote'), .out('hasResponse'), .in('wrote'), .where(neq('a')), .groupCount()]
      “Get a distribution over the authors who replied to ‘Renlit’”
  28. gremlin> g.V(4608).out('wrote').values('responseLevel').groupCount()
      ==>[1:11, 2:19, 3:22, 4:9, 5:3, 6:3]
      gremlin>
      [Diagram: person 4608, annotated g.V(4608); wrote edges to responses, annotated .out('wrote'); responseLevel property, annotated .values('responseLevel').groupCount()]
      “Get a distribution over the ‘responseLevel’ value for posts by ‘Renlit’”
  29. gremlin> g.V().has('type','response').values('responseLevel').groupCount()
      ==>[1:358, 2:796, 3:445, 4:150, 5:57, 6:13, 7:4, 8:1]
      gremlin>
      [Diagram: response vertices with type "response", annotated g.V() and .has('type','response'); responseLevel property, annotated .values('responseLevel') and .groupCount()]
      “Get a distribution over the ‘responseLevel’ for all posts in the graph”
  30. gremlin> g.V(4608).out('wrote').values('responseLevel').groupCount()
      ==>[1:11, 2:19, 3:22, 4:9, 5:3, 6:3]
      gremlin> g.V().has('type','response').values('responseLevel').groupCount()
      ==>[1:358, 2:796, 3:445, 4:150, 5:57, 6:13, 7:4, 8:1]
      gremlin>
      [Side-by-side comparison of the two queries:
       g.V(4608).out('wrote').values('responseLevel').groupCount()
       g.V().has('type','response').values('responseLevel').groupCount()]
  31. gremlin> :install org.apache.tinkerpop hadoop-gremlin 3.0.0-incubating
      ==>Loaded: [org.apache.tinkerpop, hadoop-gremlin, 3.0.0-incubating] - restart the console to use [tinkerpop.hadoop]
      gremlin> :exit
      ...
      $ bin/gremlin.sh

               \,,,/
               (o o)
      -----oOOo-(3)-oOOo-----
      plugin activated: tinkerpop.server
      plugin activated: tinkerpop.utilities
      plugin activated: tinkerpop.tinkergraph
      gremlin> :plugin use tinkerpop.hadoop
      ==>tinkerpop.hadoop activated
      gremlin> hdfs.copyFromLocal('data.kryo', 'data.kryo')
      ==>null
      gremlin> hdfs.ls()
      ==>rw-r--r-- smallette supergroup 5782840 data.kryo
      gremlin>
  32. gremlin> graph = GraphFactory.open('conf/hadoop/data-gryo.properties')
      ==>hadoopgraph[gryoinputformat->gryooutputformat]
      gremlin> g = graph.traversal(computer(SparkGraphComputer))
      ==>graphtraversalsource[hadoopgraph[gryoinputformat->gryooutputformat],sparkgraphcomputer]
  33. gremlin> graph = GraphFactory.open('conf/hadoop/data-gryo.properties')
      ==>hadoopgraph[gryoinputformat->gryooutputformat]
      gremlin> g = graph.traversal(computer(SparkGraphComputer))
      ==>graphtraversalsource[hadoopgraph[gryoinputformat->gryooutputformat],sparkgraphcomputer]
      gremlin> g.V(4608).out('wrote').values('responseLevel').groupCount()
      ==>[1:11, 2:19, 3:22, 4:9, 5:3, 6:3]
      gremlin> g.V().has('type','response').values('responseLevel').groupCount()
      ==>[1:358, 2:796, 3:445, 4:150, 5:57, 6:13, 7:4, 8:1]
      Any Graph System: Neo4j, Titan, Sqlg, BlueMix, Hadoop, Giraph, Spark, OrientDB, ...
  34. gremlin> :plugin use tinkerpop.gephi
      ==>tinkerpop.gephi activated
      gremlin> :remote connect tinkerpop.gephi
      ==>Connection to Gephi - http://localhost:8080/workspace0 with stepDelay:1000, startRGBColor:[0.0, 1.0, 0.5], colorToFade:g, colorFadeRate:0.7, startSize:20.0,sizeDecrementRate:0.33
  35. gremlin> :plugin use tinkerpop.gephi
      ==>tinkerpop.gephi activated
      gremlin> :remote connect tinkerpop.gephi
      ==>Connection to Gephi - http://localhost:8080/workspace0 with stepDelay:1000, startRGBColor:[0.0, 1.0, 0.5], colorToFade:g, colorFadeRate:0.7, startSize:20.0,sizeDecrementRate:0.33
      gremlin> :> graph
      ==>tinkergraph[vertices:1933 edges:4125]
  36. gremlin> :> graph
      ==>tinkergraph[vertices:1933 edges:4125]
  37. gremlin> g.V(10240).values('userName')
      ==>Naya
      gremlin> g.V(5888).values('userName')
      ==>Loret
  38. gremlin> subGraph = g.V(10240,5888).repeat(__.outE().subgraph('subGraph').inV())
                           .times(10)
                           .cap('subGraph').next()
      ==>tinkergraph[vertices:1152 edges:1343]
      gremlin> :> subGraph
      [Image labels: Naya, Loret]
  39. gremlin> :remote config visualTraversal subGraph svg
      ==>Connection to Gephi - http://localhost:8080/workspace0 with stepDelay:1000, startRGBColor:[0.0, 1.0, 0.5], colorToFade:g, colorFadeRate:0.7, startSize:20.0,sizeDecrementRate:0.33
      gremlin> svg
      ==>graphtraversalsource[tinkergraph[vertices:1152 edges:1343], standard]
      gremlin> svg.strategies.toList()
      ==>ConjunctionStrategy
      ==>IncidentToAdjacentStrategy
      ==>AdjacentToIncidentStrategy
      ==>IdentityRemovalStrategy
      ==>FilterRankingStrategy
      ==>MatchPredicateStrategy
      ==>RangeByIsCountStrategy
      ==>TinkerGraphStepStrategy
      ==>EngineDependentStrategy
      ==>GephiTraversalVisualizationStrategy
      ==>ProfileStrategy
      ==>ComputerVerificationStrategy
  40-45. gremlin> :> svg.V(10240).as('x').out('wrote').out('hasResponse').in('wrote')
                           .where(neq('x')).groupCount()
         ==>[v[5888]:4]
  46. Takeaways
      If you have connected data, use a Graph DB
      If you use a Graph DB, consider
      If you use , get started with Gremlin Console
  47. Acknowledgements
      Ketrina Yim @KetrinaYim
        Artist behind Gremlin and his friends
      Joe Lee http://jml3designz.com/
        Graphic designer providing support on this presentation
      Apache TinkerPop http://tinkerpop.incubator.apache.org/
        The TinkerPop Community
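
The console traversals on slides 21 to 28 can be imitated on a toy graph to see what each step does. The following is a minimal Python sketch, not TinkerPop code: the vertex ids, user names and edges are invented for illustration, and the helper functions `out`, `in_` and `values` are hypothetical stand-ins for the Gremlin steps of the same name.

```python
from collections import Counter

# Hypothetical toy data; none of it comes from the dataset in the slides.
VERTICES = {
    1: {"label": "person", "userName": "alice"},
    2: {"label": "person", "userName": "bob"},
    10: {"label": "response", "responseLevel": 1},
    11: {"label": "response", "responseLevel": 2},
    12: {"label": "response", "responseLevel": 2},
}
# Each edge is (out_vertex, edge_label, in_vertex).
EDGES = [
    (1, "wrote", 10),
    (1, "wrote", 12),
    (2, "wrote", 11),
    (10, "hasResponse", 11),  # response 11 replies to response 10
    (11, "hasResponse", 12),  # response 12 replies to response 11
]


def out(vertex_ids, label):
    """Follow outgoing edges with the given label (like Gremlin's out())."""
    return [dst for vid in vertex_ids
            for (src, lbl, dst) in EDGES if src == vid and lbl == label]


def in_(vertex_ids, label):
    """Follow incoming edges with the given label (like Gremlin's in())."""
    return [src for vid in vertex_ids
            for (src, lbl, dst) in EDGES if dst == vid and lbl == label]


def values(vertex_ids, key):
    """Read one property from each vertex (like Gremlin's values())."""
    return [VERTICES[vid][key] for vid in vertex_ids]


# g.V(1).out('wrote')
written = out([1], "wrote")
# g.V(1).out('wrote').count()
count = len(written)
# g.V(1).out('wrote').values('responseLevel').groupCount()
levels = Counter(values(written, "responseLevel"))
# g.V(1).as('a').out('wrote').out('hasResponse').in('wrote')
#      .where(neq('a')).groupCount()
repliers = Counter(v for v in in_(out(written, "hasResponse"), "wrote") if v != 1)

print(written, count, dict(levels), dict(repliers))
```

Each helper takes a list of vertex ids and returns a new list, which is roughly how a traversal passes its current set of traversers from one step to the next.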

Editor's Notes

  • Recognize him? James Burke is a science historian. In 1978, he developed and presented a television documentary called Connections. He’s written several books and has developed several documentary sequels. In his work, he presents an alternative view of history that drops the conventional linear and isolated account of history we’re used to. He instead demonstrates how seemingly unrelated events, chance meetings among unlikely fellows, footnotes in the works of geniuses intermingled with the ongoing chaos of human existence as it pertained to war, famine, sickness, etc. linked together to form the innovations and inventions we take for granted today. There was a particular chapter in his book where he used this view of history to explain the origins of modern day programming.
  • “Renaissance Gremlin” - Ketrina Yim
  • The story begins with the waterwheel, an invention that came about around 2000 years ago. It also demonstrates the early desire of humans to “automate”.
    Its use expanded rapidly in the 12th century, helping to usher in a Medieval Industrial Revolution. It is shown here crushing ore, but it reached a wide variety of industries, including tanning mills, saw mills, etc.
  • The textile industry might have benefited the most from this innovation, where fulling mills helped increase linen production. This was especially true when coupled with the European debut of the horizontal loom and the spinning wheel. By the 14th century, this produced a linen boom in Europe. As there was so much linen available, there was also lots of discarded linen, which is a very useful raw material in the production of high quality paper (which incidentally also made use of the same technology as the fulling mill).
  • There was paper everywhere, but the Black Death saw to it that much of the literate community was not around to write on it. As a result, there was a massive demand for scribes. They were expensive and slow, which created a demand for “automated writing”. That demand was answered by Johannes Gutenberg in the 1400s with his invention of the printing press with movable type. The printing press spread across Europe very quickly and eventually held its strongest foothold in Venice, Italy.
  • In all of Venice, the busiest printer was Aldus Manutius, who took a different approach to printing in that he focused on small, inexpensive, pocket-style books. He was also quite interested in printing translated versions of the Greek Classics and, as fortune would have it, many of his workers were Greek refugees who came to Italy after the Fall of Constantinople to the Turks. As a result of this work, the Renaissance was seeded with Greek philosophy and science.
  • With this renewed interest in Greek science came interest in the pneumatic and hydraulic machinations of Hero of Alexandria. This led to moving toys, complex clocks, water gardens, self-playing organs, a mechanical duck that could “digest” food, and other interesting baubles. This interest in automata helped to solve a particularly vexing problem in the French silk industry, where costly errors were occurring in the very manual process of getting patterns properly woven.
  • It was the son of an organ maker, Basile Bouchon, who came up with the initial solution in 1725. He encoded the patterns onto paper with holes that the machine would interpret in order to establish the appropriate weaving pattern. The idea didn’t immediately catch on, and the innovation was improved upon several times by different individuals until, in 1801, Joseph Marie Jacquard ended up acquiring most of the credit for what we know today as the Jacquard loom.
  • The concept of telling machines what to do via “paper with holes” spread to use in engineering fields and was used in tabulating the 1890 US census by Herman Hollerith and then later to program computers. As the input and output devices for computers evolved so did the programming languages giving us LISP, COBOL, C, Java and eventually the Gremlin programming language.
  • The point of this history lesson was to show how adding connections among historical moments presents a new way to look at facts, yielding a fresh analysis. Looking at data in a different way sometimes presents new opportunities for understanding. So to that end, what if this concept of connections within history was made more generic? Rather than just connecting historical points, what if the connections applied to something as general as an “entity”? In this way any data would fit this model, yielding the opportunity to see how any one entity relates to any other in the set. What if developers looked at their data in this way and treated the connections in their data as high value? What kinds of interesting things would be uncovered? Assuming one went that far in placing high value on connections, one would want their database to treat those connections as first-class citizens. A database that does that is a graph database.
  • A graph has vertices and edges. A vertex is a domain object or entity, and an edge is the connection or relationship between two vertices. A vertex or an edge can have a label and an arbitrary set of attributes (key-value pairs). This structure provides a flexible and real-world way to represent data.
  • The TinkerPop Stack provides a foundation for developing applications over property graphs. TP2 separated the different components of TinkerPop into different projects. In TP3 there is only one project but the essence of each of the original TP2 projects is still present.
  • The TinkerPop Stack provides a graph abstraction layer over different graph databases/processors. The Gremlin query language operates over that for interacting with the graph data. Gremlin Server provides a way to remotely execute those traversals or to centralize their functionality as a service.
  • How does one get started with TinkerPop? The Gremlin Console, of course! Note that TP3 has a plugin system that makes it possible to extend the Console. The Gremlin Console is an invaluable tool as it allows for instant feedback while coding. Use it to load data, administer a graph, perform ad-hoc analysis, or work through tough bits of a complex traversal.
  • GraphFactory demonstrates TinkerPop as a graph database abstraction layer. Another “getting started” note: use TinkerGraph!
  • The data is a subset taken from a collaborative study between Pearson Global Higher Education and Columbia University. Together they studied how social interactions and knowledge construction unfold in online courses.

    Baker-Stein, Marni; York, Sean; and Dashew, Brian (2014) "Visualizing Knowledge Networks in Online Courses," Internet Learning: Vol. 3: Iss. 2, Article 8.
    Available at: http://digitalcommons.apus.edu/internetlearning/vol3/iss2/8
  • The TraversalSource provides context to a Gremlin traversal. This context defines the GraphComputer to utilize when executing the traversal and the TraversalStrategy implementations to be applied.
  • AdjacentToIncidentStrategy is an example of an optimization strategy but there are many other types of strategies and use cases for them. Decorative strategies provide features at the application level (e.g. ReadOnlyStrategy). Implementers of the core APIs for graph databases may use strategies to showcase the underlying capabilities of their systems by writing strategies to take advantage of indices or other meta-data they have about their graph to improve performance.
  • TinkerPop 3 was built from the ground up with the idea that it would support both OLTP and OLAP. OLTP, or local, traversals are ones that start with one vertex or a small subset of vertices and traverse away from there. OLAP traversals need to touch the entire graph to execute. In recent years, distributed processing frameworks like Hadoop and Spark have come along, and their features can be applied to these OLAP-type queries.
  • Initialize the Hadoop Plugin and make the data available.
  • Get a HadoopGraph instance then specify that it should use the SparkGraphComputer.
  • So, what do those two queries look like when you execute them with Spark? They are identical. Gremlin is not only an abstraction layer over graph databases but also over graph processing frameworks. Just as the same Gremlin script will run on Neo4j or Titan, the same Gremlin will also run over Spark or Giraph.
  • Everyone likes to visualize their graph data when they first get started. TinkerPop provides visualization support through integration with Gephi - a graph visualization application.
  • The use of :submit (shorthanded by :>) will submit the “graph” to the currently active “remote” in the console. In this case, the current remote is Gephi, so the graph instance will be streamed there.
  • ……….and you get a hairball!
  • …..but there’s two interesting bits, perhaps….
  • ...if these two interesting bits are subgraphed out, then it’s easier to understand without all the additional noise
  • It is also possible to visualize the execution of a Traversal.
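
The note on AdjacentToIncidentStrategy describes a strategy as something that rewrites a traversal before it runs, and slide 26 shows `out('wrote').count()` becoming `outE('wrote').count()`. The Python sketch below illustrates only that idea. It assumes a traversal can be modelled as a list of `(step, argument)` tuples; the function names `adjacent_to_incident` and `render` are hypothetical, and this is not how TinkerPop represents traversals or strategies internally.

```python
# A traversal is modelled as a list of (step_name, argument) tuples.
# This is an illustrative assumption, not TinkerPop's internal model.

def adjacent_to_incident(steps):
    """Rewrite out(label) to outE(label) when the next step is count().

    Counting adjacent vertices gives the same number as counting the
    incident edges, so the vertices themselves never need to be fetched.
    """
    rewritten = []
    for i, (name, arg) in enumerate(steps):
        next_is_count = i + 1 < len(steps) and steps[i + 1][0] == "count"
        if name == "out" and next_is_count:
            rewritten.append(("outE", arg))
        else:
            rewritten.append((name, arg))
    return rewritten


def render(steps):
    """Print the step list in Gremlin-like notation."""
    parts = []
    for name, arg in steps:
        parts.append(f"{name}({arg!r})" if arg is not None else f"{name}()")
    return "g." + ".".join(parts)


original = [("V", 4608), ("out", "wrote"), ("count", None)]
optimized = adjacent_to_incident(original)

print(render(original))   # the query as written
print(render(optimized))  # the query after the rewrite
```

A traversal that does not end in `count()` passes through unchanged, which mirrors the point that a strategy should only rewrite steps when the result stays the same.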
