Recognize him? James Burke is a science historian. In 1978, he developed and presented a television documentary called Connections. He has written several books and developed several documentary sequels. In his work, he presents an alternative view of history that drops the conventional linear, isolated account we’re used to. He instead demonstrates how seemingly unrelated events, chance meetings among unlikely fellows, and footnotes in the works of geniuses, intermingled with the ongoing chaos of human existence (war, famine, sickness, etc.), linked together to form the innovations and inventions we take for granted today. In one particular chapter of his book, he used this view of history to explain the origins of modern-day programming.
“Renaissance Gremlin” - Ketrina Yim
The story begins with the waterwheel, an invention that came about around 2000 years ago. It also demonstrates the early desire of humans to “automate”. Its usage expanded rapidly in the 12th century, helping to usher in a Medieval Industrial Revolution. It is shown here crushing ore, but it reached a wide variety of industries, including tanning mills, saw mills, etc.
The textile industry might have benefited the most from this innovation, where fulling mills helped increase linen production. This was especially true when coupled with the European debut of the horizontal loom and the spinning wheel. By the 14th century, this produced a linen boom in Europe. With so much linen available, there was also lots of discarded linen, which is a very useful raw material in the production of high-quality paper (which incidentally also made use of the same technology as the fulling mill).
There was paper everywhere, but the Black Death saw to it that much of the literate community was not around to write on it. As a result, there was a massive demand for scribes. They were expensive and slow, which created a demand for “automated writing”. That demand was answered by Johannes Gutenberg in the 1400s with his invention of the printing press with movable type. The printing press spread across Europe very quickly and eventually held its strongest foothold in Venice, Italy.
In all of Venice, the busiest printer was Aldus Manutius, who took a different approach to printing in that he focused on small, inexpensive, pocket-style books. He was also quite interested in printing translated versions of the Greek Classics and, as fortune would have it, many of his workers were Greek refugees who came to Italy after the Fall of Constantinople to the Turks. As a result of this work, the Renaissance was seeded with Greek philosophy and science.
With this renewed interest in Greek science came interest in the pneumatic and hydraulic machinations of Hero of Alexandria. This led to moving toys, complex clocks, water gardens, self-playing organs, a mechanical duck that could “digest” food, and other interesting baubles. This interest in automata helped to solve a particularly vexing problem in the French silk industry, where costly errors were occurring in the very manual process of getting patterns properly woven.
It was the son of an organ maker, Basile Bouchon, who came up with the initial solution in 1725. He encoded the patterns onto paper with holes that the machine would interpret in order to establish the appropriate weaving pattern. The idea didn’t immediately catch on, and the innovation was improved upon several times by different individuals until, in 1801, Joseph Marie Jacquard ended up acquiring most of the credit for what we know today as the Jacquard loom.
The concept of telling machines what to do via “paper with holes” spread to engineering fields. Herman Hollerith used it to tabulate the 1890 US census, and it was later used to program computers. As the input and output devices for computers evolved, so did the programming languages, giving us LISP, COBOL, C, Java and eventually the Gremlin programming language.
The point of this history lesson was to show how adding connections among historical moments presents a new way to look at facts, yielding a fresh analysis. Looking at data in a different way sometimes presents new opportunities for understanding. So to that end, what if this concept of connections within history was made more generic? Rather than just connecting historical points, what if the connections applied to something as general as an “entity”? In this way, any data would fit this model, yielding the opportunity to see how any one entity relates to any other in the set. What if developers looked at their data in this way and treated the connections in their data as high value? What kinds of interesting things would be uncovered? Assuming one went that far in placing high value on connections, one would want their database to treat those connections as first-class citizens. A database that does that is a graph database.
A graph has vertices and edges. A vertex is a domain object or entity, and an edge is the connection or relationship between two vertices. Both vertices and edges can have a label and an arbitrary set of attributes (key-value pairs). This structure provides a flexible and real-world way to represent data.
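The vertex/edge/label/property structure described here can be sketched in a few lines of plain Python. This is only an illustration of the data model, not TinkerPop's API: the class names `Vertex`, `Edge`, and `PropertyGraph` and the method names `add_vertex`, `add_edge`, and `out` are hypothetical names chosen for this example.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class Vertex:
    # A domain object: an id, a label, and arbitrary key-value properties.
    vertex_id: int
    label: str
    properties: Dict[str, Any] = field(default_factory=dict)


@dataclass
class Edge:
    # A directed, labeled relationship from out_vertex to in_vertex.
    label: str
    out_vertex: Vertex
    in_vertex: Vertex
    properties: Dict[str, Any] = field(default_factory=dict)


class PropertyGraph:
    """Hypothetical in-memory property graph, for illustration only."""

    def __init__(self) -> None:
        self.vertices: Dict[int, Vertex] = {}
        self.edges: List[Edge] = []

    def add_vertex(self, vertex_id: int, label: str, **properties: Any) -> Vertex:
        vertex = Vertex(vertex_id, label, dict(properties))
        self.vertices[vertex_id] = vertex
        return vertex

    def add_edge(self, label: str, out_vertex: Vertex, in_vertex: Vertex,
                 **properties: Any) -> Edge:
        edge = Edge(label, out_vertex, in_vertex, dict(properties))
        self.edges.append(edge)
        return edge

    def out(self, vertex: Vertex, edge_label: Optional[str] = None) -> List[Vertex]:
        # Follow outgoing edges (optionally filtered by label) to adjacent vertices.
        return [e.in_vertex for e in self.edges
                if e.out_vertex is vertex
                and (edge_label is None or e.label == edge_label)]


graph = PropertyGraph()
burke = graph.add_vertex(1, "person", name="James Burke")
series = graph.add_vertex(2, "documentary", title="Connections", year=1978)
graph.add_edge("presented", burke, series)

presented_titles = [v.properties["title"] for v in graph.out(burke, "presented")]
print(presented_titles)  # ['Connections']
```

Because the relationship is stored as its own object, asking "what did this person present?" is a walk along edges rather than a join.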
The TinkerPop Stack provides a foundation for developing applications over property graphs. TP2 separated the different components of TinkerPop into different projects. In TP3 there is only one project but the essence of each of the original TP2 projects is still present.
The TinkerPop Stack provides a graph abstraction layer over different graph databases/processors. The Gremlin query language operates over that layer to interact with the graph data. Gremlin Server provides a way to remotely execute Gremlin traversals or to centralize their functionality as a service.
How does one get started with TinkerPop? The Gremlin Console, of course! Note that TP3 has a plugin system that makes it possible to extend the Console. The Gremlin Console is an invaluable tool as it allows for instant feedback while coding. Use it to load data, administer a graph, perform ad-hoc analysis, or work through tough bits of a complex traversal.
GraphFactory demonstrates TinkerPop as a graph database abstraction layer. Another “getting started” note: use TinkerGraph!
The data is a subset taken from a collaborative study between Pearson Global Higher Education and Columbia University. Together they studied how social interactions and knowledge construction unfold in online courses.
Baker-Stein, Marni; York, Sean; and Dashew, Brian (2014) "Visualizing Knowledge Networks in Online Courses," Internet Learning: Vol. 3: Iss. 2, Article 8. Available at: http://digitalcommons.apus.edu/internetlearning/vol3/iss2/8
The TraversalSource provides context to a Gremlin traversal. This context defines the GraphComputer to utilize when executing the traversal and the TraversalStrategy implementations to be applied.
AdjacentToIncidentStrategy is an example of an optimization strategy, but there are many other types of strategies and use cases for them. Decorative strategies provide features at the application level (e.g. ReadOnlyStrategy). Implementers of the core APIs for graph databases may write strategies that showcase the underlying capabilities of their systems, taking advantage of indices or other metadata they have about their graph to improve performance.
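To give a feel for what an optimization strategy does, here is a deliberately simplified Python sketch. It assumes a traversal can be modeled as a plain list of step names, which is not how TinkerPop represents traversals, and the function name `adjacent_to_incident` is hypothetical. The rewrite it performs, replacing an adjacent-vertex step with the cheaper incident-edge step when only a count follows, is the kind of transformation AdjacentToIncidentStrategy is meant to apply; the real strategy covers more cases.

```python
from typing import List

# Adjacent-vertex steps and their incident-edge counterparts.
ADJACENT_TO_INCIDENT = {"out": "outE", "in": "inE", "both": "bothE"}


def adjacent_to_incident(steps: List[str]) -> List[str]:
    """Rewrite e.g. out -> outE when the very next step is count.

    Counting adjacent vertices gives the same number as counting the
    incident edges, so there is no need to resolve the vertices.
    """
    rewritten = list(steps)
    for i in range(len(rewritten) - 1):
        if rewritten[i] in ADJACENT_TO_INCIDENT and rewritten[i + 1] == "count":
            rewritten[i] = ADJACENT_TO_INCIDENT[rewritten[i]]
    return rewritten


optimized = adjacent_to_incident(["V", "out", "count"])
untouched = adjacent_to_incident(["V", "out", "values"])
print(optimized)  # ['V', 'outE', 'count']
print(untouched)  # ['V', 'out', 'values']
```

The second call is left alone because the vertices themselves are needed by the following step.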
TinkerPop 3 was built from the ground up with the idea that it would provide support for both OLTP and OLAP. OLTP, or local, traversals are ones that start with one vertex or a small subset of vertices and traverse away from there. OLAP traversals need to touch the entire graph to execute. In recent years, distributed processing frameworks like Hadoop and Spark have come along, and their features can be applied to these OLAP-type queries.
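The difference between the two access patterns can be illustrated with a toy adjacency map in plain Python. The graph data and the function names `local_out_neighbors` and `global_in_degrees` are made up for this sketch; nothing here uses TinkerPop, Hadoop, or Spark.

```python
from typing import Dict, List

# Hypothetical toy graph: each vertex maps to the vertices it points to.
adjacency: Dict[str, List[str]] = {
    "a": ["b", "c"],
    "b": ["c"],
    "c": [],
    "d": ["a"],
    "e": [],
}


def local_out_neighbors(graph: Dict[str, List[str]], start: str) -> List[str]:
    # OLTP-style: start at one known vertex and read only its own edges.
    return list(graph[start])


def global_in_degrees(graph: Dict[str, List[str]]) -> Dict[str, int]:
    # OLAP-style: the answer depends on every edge, so the whole graph is scanned.
    in_degrees = {vertex: 0 for vertex in graph}
    for targets in graph.values():
        for target in targets:
            in_degrees[target] += 1
    return in_degrees


neighbors_of_a = local_out_neighbors(adjacency, "a")
in_degrees = global_in_degrees(adjacency)
print(neighbors_of_a)  # ['b', 'c']
print(in_degrees)      # {'a': 1, 'b': 1, 'c': 2, 'd': 0, 'e': 0}
```

The local lookup costs the same no matter how large the graph grows, while the global scan grows with the graph, which is why it benefits from being distributed.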
Initialize the Hadoop Plugin and make the data available.
Get a HadoopGraph instance then specify that it should use the SparkGraphComputer.
So, what do those two queries look like when you execute them with Spark? They are identical. Gremlin is an abstraction layer not only over graph databases but also over graph processing frameworks: just as the same Gremlin script runs on Neo4j and on Titan, the same Gremlin also runs over Spark or Giraph.
Everyone likes to visualize their graph data when they first get started. TinkerPop provides visualization support through integration with Gephi - a graph visualization application.
The use of :submit (shorthanded by :>) will submit the “graph” to the currently active “remote” in the console. In this case, the current remote is Gephi, so the graph instance will be streamed there.
……….and you get a hairball!
…..but there are two interesting bits, perhaps….
...if these two interesting bits are subgraphed out, then it’s easier to understand without all the additional noise
It is also possible to visualize the execution of a Traversal.
Cassandra Summit - What's New In Apache TinkerPop?
“Get a distribution over the ‘responseLevel’ for all posts in the graph”
==>[1:358, 2:796, 3:445, 4:150, 5:57, 6:13, 7:4, 8:1]
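A distribution like this is a group-and-count: bucket every post by its ‘responseLevel’ and count the members of each bucket, which is what Gremlin's groupCount step is for. The Python sketch below shows the same computation on a handful of made-up posts; the sample data and the `response_level_distribution` function name are hypothetical and are not taken from the study's dataset.

```python
from collections import Counter
from typing import Dict, List

# Hypothetical sample posts; the real graph holds far more.
posts: List[Dict[str, int]] = [
    {"id": 1, "responseLevel": 1},
    {"id": 2, "responseLevel": 2},
    {"id": 3, "responseLevel": 2},
    {"id": 4, "responseLevel": 3},
    {"id": 5, "responseLevel": 1},
    {"id": 6, "responseLevel": 2},
]


def response_level_distribution(items: List[Dict[str, int]]) -> Dict[int, int]:
    # Count how many posts fall at each responseLevel, ordered by level.
    counts = Counter(item["responseLevel"] for item in items)
    return dict(sorted(counts.items()))


distribution = response_level_distribution(posts)
print(distribution)  # {1: 2, 2: 3, 3: 1}
```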