Memoirs of a Graph Addict:
         Despair to Redemption
                  Marko A. Rodriguez
                Graph Systems Architect
             http://markorodriguez.com
             http://twitter.com/twarko




Winter Whirlwind Tour – Chicago to Malmö – January 10-14, 2011

                     January 8, 2011
Abstract


A graph database provides a means of linking together objects using direct
references. In other words, in order to determine if one object is adjacent
to another, no index lookup is required. In contrast to relational databases,
in a graph database, there is no notion of a join operation as the graph is
already an explicitly joined structure. Given a graph, problems are solved
using graph traversals–that is, directed walks over the objects and relations
that compose the graph. This lecture has three primary points of
discussion. The first is a description of graph database technology. The
second, a memoir of the speaker’s applied and theoretical work with
graphs. The third and final point, a review of an open source graph
processing stack currently being developed by AT&T Interactive and its
collaborators.
For 10 years now, I’ve dealt with a painful graph addiction...
                Let me share my story with you.
Outline



• Graph Structures

• Graph Databases

• Graph Applications

• TinkerPop Product Suite
Graph Data Structure Pieces: Part 1




 id          vertex (thing, object, dot)

                                           }


                                               element
             edge (relation, join, line)
Single-Relational Graph


                      marko                   peter

                                                                  neotech




                                 tinkerpop

                                                                   neo4j




                      gremlin                 blueprints




In single-relational graphs, things are related. Unfortunately, not a very useful structure
for most domain modeling situations. Relatedness is too generic—all edges have the
same meaning.
Graph Data Structure Pieces: Part 2




 id          vertex (thing, object, dot)

                                           }


                                               element
label        edge (relation, join, line)
Multi-Relational Graph
                                     knows


                    marko                knows               peter
                                                                          member
                                                                                    neotech
                            member               member

                                                                                    created

                                     tinkerpop

                                                                                     neo4j
                               created             created
                                                                          imports

                     gremlin             imports             blueprints




By adding labels to the edges, it's possible to denote the type of relation that exists
between any two vertices. Now it's possible to denote different types of things and the
different ways in which they relate to one another.
Graph Data Structure Pieces: Part 3



    id        vertex (thing, object, dot)

                                            }



                                                element
   label      edge (relation, join, line)


key=value     property (key/value, attribute)

key1=value1
key2=value2   property map
Property Graph
                                      knows


                     marko                knows            peter
                                                                        member
                                                                                    neotech
                             member               member

                                                                                    created

                                      tinkerpop

                       date=2009                       date=2009                     neo4j
                                created             created
                                                                        imports    lang=java
                                                                                  use=graphdb
                      gremlin             imports          blueprints
                                              lang=java
                     lang=java
                                               use=api
                    use=traverse




Allow elements to have key/value properties. In particular, very useful for further
specifying the meaning of an edge. “When did TinkerPop create Gremlin?”
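A minimal sketch of the property graph above, assuming a plain in-memory representation. The class names `Vertex` and `Edge`, the helper `created_date`, and the assignment of `date=2009` to both `created` edges are illustrative assumptions read off the diagram, not part of any TinkerPop API.

```python
from dataclasses import dataclass, field


@dataclass
class Vertex:
    id: str
    properties: dict = field(default_factory=dict)  # key/value property map


@dataclass
class Edge:
    out_v: Vertex  # tail vertex
    label: str     # type of relation
    in_v: Vertex   # head vertex
    properties: dict = field(default_factory=dict)  # key/value property map


# A fragment of the diagram (property placement is an assumption).
tinkerpop = Vertex("tinkerpop")
gremlin = Vertex("gremlin", {"lang": "java", "use": "traverse"})
blueprints = Vertex("blueprints", {"lang": "java", "use": "api"})

edges = [
    Edge(tinkerpop, "created", gremlin, {"date": 2009}),
    Edge(tinkerpop, "created", blueprints, {"date": 2009}),
    Edge(gremlin, "imports", blueprints),
]


def created_date(creator, creation):
    """Answer 'When did <creator> create <creation>?' from edge properties."""
    for e in edges:
        if e.label == "created" and e.out_v is creator and e.in_v is creation:
            return e.properties.get("date")
    return None


answer = created_date(tinkerpop, gremlin)
print(answer)
```

The edge property is what answers the question: without `date` on the `created` edge, the graph could only say that TinkerPop created Gremlin, not when.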
Numerous Graph Types

[Figure: a collage of graph types, each with a small example. Types shown: simple,
multi, weighted (0.2), half-edge, pseudo, hyper, directed, undirected, semantic
(hired), vertex-labeled (a), edge-labeled (knows), vertex-attributed (name=emil,
type=person), edge-attributed (created=2-01-09, modified=2-11-09), and resource
description framework (http://ex.com/123).]




Rodriguez, M.A., Neubauer, P., “Constructions from Dots and Lines,” Bulletin of the American Society for Information Science
and Technology, 36(6), pp. 35-41, 2010. [http://arxiv.org/abs/1006.2361]
Property Graph as a Rich Structure
[Figure: a diagram of the operations that turn a property graph into other graph types.
Graph types shown: weighted graph, property graph, labeled graph, semantic graph,
directed graph, rdf graph, multi-graph, simple graph, undirected graph.
Operations shown on the arrows: add weight attribute; remove attributes; remove edge
labels; make labels URIs; remove directionality; remove loops, directionality, and
multiple edges; no op.]





A fun related thought: Rodriguez, M.A., “Mapping Semantic Networks to Undirected Networks,” International Journal of
Applied Mathematics and Computer Sciences, 4(1), pp. 39–42, 2009. [http://arxiv.org/abs/0804.0277]
Graph Algorithms in Single-Relational Graphs

• Most graph algorithms are designed for single-relational graphs.1
       Geodesic: shortest path, eccentricity, diameter, closeness centrality,
       betweenness centrality, etc.
       Eigenvector: spreading activation, pagerank, eigenvector centrality,
       etc.
       Assortative: scalar, assortative, etc.




  1
   Excellent book reviewing numerous graph algorithms: Brandes U., Erlebach, T., “Network Analysis:
Methodological Foundations,” Springer, 2005.
Graph Algorithms in Multi-Relational+ Graphs
• Most real-world software systems require multi-relational+ graphs. E.g.:
  Who are the most central coauthors when all I know is wrote?

                                 coauthor
                                                         coauthor


                                                                wrote
                       wrote     wrote        wrote     wrote                  wrote




• A key concept when evaluating graph algorithms over multi-relational+
  graphs is implicit adjacency/path descriptions/virtual edges/etc.2
   2
    Rodriguez M.A., Shinavier, J., “Exposing Multi-Relational Networks to Single-Relational Network Analysis
Algorithms,” Journal of Informetrics, 4(1), pp. 29–41, 2009. [http://arxiv.org/abs/0806.2274]
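A hedged sketch of the implicit-adjacency idea, assuming the only explicit edges are `wrote` edges from authors to papers. The author and paper names, and the function names `derive_coauthor_edges` and `degree_centrality`, are hypothetical. Degree centrality stands in for "most central"; any single-relational algorithm could be run on the derived edges instead.

```python
from collections import defaultdict
from itertools import combinations

# Explicit multi-relational data: (author, paper) pairs for the "wrote" label.
wrote = [
    ("alice", "p1"), ("bob", "p1"),
    ("bob", "p2"), ("carol", "p2"),
    ("alice", "p3"),
]


def derive_coauthor_edges(wrote_edges):
    """Derive implicit, undirected coauthor edges from the path
    author -wrote-> paper <-wrote- author."""
    authors_of = defaultdict(set)
    for author, paper in wrote_edges:
        authors_of[paper].add(author)
    coauthor = set()
    for authors in authors_of.values():
        for a, b in combinations(sorted(authors), 2):
            coauthor.add((a, b))
    return coauthor


def degree_centrality(pairs):
    """Count how many derived edges touch each vertex."""
    degree = defaultdict(int)
    for a, b in pairs:
        degree[a] += 1
        degree[b] += 1
    return dict(degree)


coauthor = derive_coauthor_edges(wrote)
centrality = degree_centrality(coauthor)
print(coauthor, centrality)
```

The derived `coauthor` edges form a single-relational graph, so the algorithms from the previous slide apply to it unchanged.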
Outline



• Graph Structures

• Graph Databases

• Graph Applications

• TinkerPop Product Suite
The Simplicity of a Graph

• A graph is a simple data structure.

• A graph states that something is related to something else (the foundation
  of any other data structure).3

• It is possible to model a graph in various types of databases.4
       Relational database: MySQL, Oracle, PostgreSQL
       JSON document database: MongoDB, CouchDB
       XML document database: MarkLogic, eXist-db
       etc.
   3
     A graph can be used to represent other data structures. This point becomes convenient when looking
beyond using graphs for typical, real-world domain models (e.g. friends, favorites, etc.), and seeing their
applicability in other areas such as modeling code (e.g. http://arxiv.org/abs/0802.3492), indices, etc.
   4
     For the sake of diagram clarity, the examples to follow are with respect to a single-relational, directed
graph. Note that it is possible to model multi-relational graphs in these types of database as well.
Representing a Graph in a Relational Database

outV | inV
-----+-----
  A  |  B
  A  |  C
  C  |  D
  D  |  A

[Figure: directed graph with edges A → B, A → C, C → D, D → A.]
Representing a Graph in a JSON Database

{
    "A" : {
      "outE"   : ["B", "C"]
    },
    "B" : {
      "outE"   : []
    },
    "C" : {
      "outE"   : ["D"]
    },
    "D" : {
      "outE"   : ["A"]
    }
}

[Figure: directed graph with edges A → B, A → C, C → D, D → A.]
Representing a Graph in an XML Database

<graphml>
  <graph>
    <node id="A" />
    <node id="B" />
    <node id="C" />
    <node id="D" />
    <edge source="A" target="B" />
    <edge source="A" target="C" />
    <edge source="C" target="D" />
    <edge source="D" target="A" />
  </graph>
</graphml>

[Figure: directed graph with edges A → B, A → C, C → D, D → A.]
Defining a Graph Database




“If any database can represent a graph, then what
              is a graph database?”
Defining a Graph Database




A graph database is any storage system that
       provides index-free adjacency.
Defining a Graph Database by Example

            Toy Graph                Gremlin
                                     (stuntman)

        B               E



A


        C               D
Graph Databases and Index-Free Adjacency
                                     B                    E



                     A


                                     C                    D


• Our gremlin is at vertex A.
• In a graph database, vertex A has direct references to its adjacent vertices.
• The cost to move from A to B and C is constant with respect to the size of the
  graph. It depends only on the number of edges emanating from vertex A (local).
Graph Databases and Index-Free Adjacency


                   B                E



        A


                   C                D



             The Graph (explicit)
Non-Graph Databases and Index-Based Adjacency



                                        B    E



      A        B   C                A
     B,C       E   D,E

                         D      E
                                        C    D



• Our gremlin is at vertex A.
Non-Graph Databases and Index-Based Adjacency


                                                       B                 E



      A         B     C                   A
      B,C        E   D,E

                           D       E
                                                       C                 D



• In a non-graph database, the gremlin needs to look at an index to determine what
  is adjacent to A.
• O(log n) time cost to move to B and C. It depends on the total number of
  vertices and edges in the database (global).
Non-Graph Databases and Index-Based Adjacency


[Figure: left, “The Index (explicit)” with entries A: B,C; B: E; C: D,E; D: (none);
E: (none). Right, “The Graph (implicit)” with vertices A, B, C, D, and E.]
Index-Free Adjacency
• While any database can implicitly represent a graph, only a
  graph database makes the graph structure explicit.5

• In a graph database, each vertex serves as a “mini index”
  of its adjacent elements.6

• Thus, as the graph grows in size, the cost of a local step
  remains the same.7
   5
      Please see http://markorodriguez.com/Blarko/Entries/2010/3/29_MySQL_vs._Neo4j_on_a_Large-Scale_Graph_Traversal.html
for some performance characteristics of graph traversals in a
relational database (MySQL) and a graph database (Neo4j).
    6
      Each vertex can be interpreted as a “parent node” in an index with its children being its adjacent
elements. In this sense, traversing a graph is analogous in many ways to traversing an index, albeit the
graph is not an acyclic connected graph (tree). (a vision espoused by Craig Taverner)
    7
      A graph, in many ways, is like a distributed index.
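A minimal sketch contrasting the two adjacency styles, assuming the toy graph's edges are A→B, A→C, B→E, C→D, and C→E (read off the index diagram). The names `Vertex`, `neighbors_index_free`, and `neighbors_index_based` are hypothetical. A sorted edge list with binary search stands in for a database index, which is a simplification of what a real storage engine does.

```python
from bisect import bisect_left

EDGES = sorted([("A", "B"), ("A", "C"), ("B", "E"), ("C", "D"), ("C", "E")])


class Vertex:
    def __init__(self, name):
        self.name = name
        self.out = []  # direct references to adjacent vertices


# Index-free: each vertex holds references to its neighbors.
vertices = {name: Vertex(name) for name in "ABCDE"}
for tail, head in EDGES:
    vertices[tail].out.append(vertices[head])


def neighbors_index_free(vertex):
    """Cost depends only on the number of edges leaving this vertex."""
    return [v.name for v in vertex.out]


def neighbors_index_based(sorted_edges, name):
    """Binary-search a global sorted edge list: O(log n) in the total
    number of edges before the first neighbor is found."""
    i = bisect_left(sorted_edges, (name,))
    result = []
    while i < len(sorted_edges) and sorted_edges[i][0] == name:
        result.append(sorted_edges[i][1])
        i += 1
    return result


print(neighbors_index_free(vertices["A"]))
print(neighbors_index_based(EDGES, "A"))
```

Both functions return the same neighbors. The difference is that the second one pays a lookup cost that grows with the whole edge list, while the first one never touches anything outside the vertex it starts from.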
Graph Query = Graph Traversal

• Graph databases are optimized for graph-theoretic operations
  (e.g. graph traversals).

• Graph databases are not optimized for set-theoretic
  operations (e.g. union, intersection, theta-join).

• The graph traversal pattern:8
       Given some root set of elements, traverse in X fashion
       to yield some side-effect and/or destination.
  8
    Rodriguez, M.A., Neubauer, P., “The Graph Traversal Pattern,” Graph Data Management: Techniques
and Applications, eds. S. Sakr, E. Pardede, IGI Global, 2011. http://arxiv.org/abs/1004.1001
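The traversal pattern can be sketched as follows, assuming an adjacency-list graph with the same toy edges as before (A→B, A→C, B→E, C→D, C→E). The function `traverse` and its parameters are hypothetical names; the "X fashion" is supplied as a `step` function, and the side-effect here is a count of how often each vertex is reached.

```python
from collections import Counter

GRAPH = {"A": ["B", "C"], "B": ["E"], "C": ["D", "E"], "D": [], "E": []}


def traverse(roots, step, hops, visit_counts=None):
    """Walk `hops` steps out from `roots`, using `step` to choose where to go.
    Returns the destinations; optionally records visits as a side-effect."""
    frontier = list(roots)
    for _ in range(hops):
        next_frontier = []
        for v in frontier:
            for w in step(v):
                next_frontier.append(w)
                if visit_counts is not None:
                    visit_counts[w] += 1
        frontier = next_frontier
    return frontier


counts = Counter()
destinations = traverse(["A"], lambda v: GRAPH[v], hops=2, visit_counts=counts)
print(destinations, counts)
```

Keeping duplicates in the frontier is a deliberate choice: E is reachable by two distinct paths from A, and the count of 2 is the kind of path-based side-effect a traversal can yield.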
Outline



• Graph Structures

• Graph Databases

• Graph Applications

• TinkerPop Product Suite
Adventures in Graphlandia


My graph disease first started in 2001 and it’s only progressed since...


• Collective decision making: graph-based voting.

• Eudaemonic engine: graph-based recommendation.

• Universal computer: graph-based computing.
Collective Decision Making: Fall of the Modern World




            The year is 2014.
Oil production has dropped significantly. Any reserves that are left are too
expensive to purchase. Nations cannot transport food.9

Regions with poor agriculture yield famine.




  9
      Peak oil available at http://en.wikipedia.org/wiki/Peak_oil.
People are in shock, fear, and panic over the fall of
the modern world.



The world sees a 75% drop in human population.
The technology and knowledge of the modern world
still exists.

The social infrastructure doesn’t... A few rise to
create a new world order.10



  10
    Watkins, J.H., M.A. Rodriguez, “A Survey of Web-Based Collective Decision Making Systems,” Studies
in Computational Intelligence: Evolution of the Web in Artificial Intelligence Environments, eds. R. Nayak,
N. Ichalkaranje, and L.C. Jain, pp. 245-279, 2008. [http://escholarship.org/uc/item/04h3h1cr]
Collective Decision Making: Rise of the Machines

Four strong, brave men begin the journey to stability. Decisions need to be
made regarding how to determine and execute social goals. The distributed
collective of TinkerPop is created.

• Marko Rodriguez (former USA)

• Peter Neubauer (former Sweden)

• Josh Shinavier (former China)

• Pavel Yaskevich (former Belarus)

[Figure: four vertices labeled marko, peter, josh, and pavel.]
Collective Decision Making: Rise of the Machines

[Figure: the vertices marko, peter, josh, and pavel shown under two aggregation
schemes: Direct Democracy and Dynamically Distributed Democracy.]



Two examples will be presented for the same decision making scenario. One using direct
democracy as the aggregation algorithm and one using dynamically distributed
democracy as the aggregation algorithm.11
  11
   Rodriguez, M.A., Watkins, J.H., “Revisiting the Age of Enlightenment from a Collective Decision
Making Systems Perspective,” First Monday, 14(8), 2009. [http://arxiv.org/abs/0901.3929]
Collective Decision Making: Direct Democracy

• “What percentage of our crop yield should we store as reserves?”

• The outcome is represented as a real value in [0, 1].

• Each individual has their opinion of the situation.
    Marko (80% should be stored.)
    Peter (50% should be stored.)
    Josh (80% should be stored.)
    Pavel (90% should be stored.)

[Figure: vertices annotated with opinions: marko 0.8, peter 0.5, josh 0.8, pavel 0.9.]
Collective Decision Making: Direct Democracy

• In a direct democracy, everyone voices their opinion.

• The average of all voiced opinions is the final decision (even in binary
  decisions).

• For our society of 4, a pure direct democracy would yield
  (0.8 + 0.5 + 0.8 + 0.9)/4 = 0.75.

[Figure: vertices annotated with opinions: marko 0.8, peter 0.5, josh 0.8, pavel 0.9.]
Collective Decision Making: Direct Democracy

• If an individual abstains from participation, then their opinion is not
  considered.

• Assume only Peter and Pavel are there to participate. Marko and Josh are out
  hunting.

• For our society of 4 (with 2 voters), a pure direct democracy would yield
  (0.5 + 0.9)/2 = 0.7.
  |0.75 − 0.7| = 0.05 error.

[Figure: vertices annotated with opinions: marko 0.8, peter 0.5, josh 0.8, pavel 0.9.]
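The arithmetic on the direct democracy slides can be checked with a short sketch. The function name `direct_democracy` is hypothetical; the opinions are the ones stated on the slides.

```python
opinions = {"marko": 0.8, "peter": 0.5, "josh": 0.8, "pavel": 0.9}


def direct_democracy(opinions, participants):
    """Average the opinions of those who actually participate."""
    voiced = [opinions[p] for p in participants]
    return sum(voiced) / len(voiced)


full = direct_democracy(opinions, opinions.keys())        # everyone votes
partial = direct_democracy(opinions, ["peter", "pavel"])  # two abstain
error = abs(full - partial)
print(round(full, 2), round(partial, 2), round(error, 2))
```

The error is the gap between what the whole society would have decided and what the two remaining voters decide on their own.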
Collective Decision Making: Representative Democracy




• Thomas Paine stated that when populations are small “some convenient
  tree will afford them a State house”, but as the population increases it
  becomes a necessity for representatives to “act in the same manner as
  the whole body would act were they present.”12 13
  12
   Paine, T., “Common Sense,” 1776.
  13
   The role of the representative as an expert vs. a model is argued at length in Pitkin, H.F., “The
Concept of Representation,” University of California Press, 1972.
Collective Decision Making: DDD

• Dynamically distributed democracy (DDD) strikes a balance between
  direct and representative democracy.

• An individual is at least a representative of themselves.

• An individual can also wield the power of those that abstain from
  participation.

• Dynamically distributing representative power is the purpose of the
  algorithm.
Collective Decision Making: DDD

• Peter believes that Josh and Marko are good decision makers.

• When Peter abstains, Marko and Josh wield his social power in equal parts (0.5).

• Like a friendship graph, but the edges denote “trust.”
    “I believe that X has identical values to me and will behave as I do.”
    “I believe that X is more expert than I and should make decisions.”

[Figure: trust edges peter → marko (0.5) and peter → josh (0.5). pavel has no edges.]
Collective Decision Making: DDD

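• Marko believes Josh is the key to humanity.

• Josh prefers people closer to his eastern home of former China.

• Pavel is of the former Soviet Union, and simply has no faith in anyone.

[Figure: trust graph with edges peter → marko (0.5), peter → josh (0.5),
marko → josh (1.0), josh → peter (0.25), josh → pavel (0.75). pavel has no
outgoing edges.]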
Collective Decision Making: DDD
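[Figure: trust graph with edges peter → marko (0.5), peter → josh (0.5),
marko → josh (1.0), josh → peter (0.25), josh → pavel (0.75). pavel has no
outgoing edges.]
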
This is the trust-based social graph. Individuals can add/remove
outgoing edges from their vertex as they please. When decisions are
required, the current snapshot of the graph is used to compute the
collective decision.
Collective Decision Making: DDD

• In a dynamically distributed democracy, everyone can voice their opinion.

• The weighted average of all voiced opinions is the final decision.

• For our society of 4, a pure direct democracy would yield
  (0.8 + 0.5 + 0.8 + 0.9)/4 = 0.75.

• When everyone participates, it's a direct democracy.

[Figure: the trust graph with edge weights 0.5, 0.5, 1.0, 0.25, and 0.75.]
Collective Decision Making: DDD

• Assume Marko and Josh go hunting, again. By abstaining, they diffuse their
  vote power over their outgoing edges.

• By participating, Peter and Pavel aggregate vote power through their
  incoming edges.

• This diffusion process continues until all power has aggregated at
  participating individuals.

[Figure: the trust graph with every individual starting at vote power 1.0.
Opinions: marko 0.8, peter 0.5, josh 0.8, pavel 0.9.]
Collective Decision Making: DDD

• Note that Marko fully trusts Josh's decision-making abilities.

• However, given that Josh is not participating, Marko is implicitly stating
  that he trusts Josh’s decision in choosing decision makers.

• Thus, Josh serves to route Marko’s power.

[Figure: an intermediate state of the diffusion. peter holds vote power 1.25,
pavel holds 1.75, and josh holds 1.0 that is still to be routed.]
Collective Decision Making: DDD

• In the end, Peter and Pavel have aggregated all the energy in the graph
  (albeit, to different degrees).

• Now a weighted direct democracy is used to calculate the collective decision.

• The collective vote is
  ((1.5·0.5)+(2.5·0.9))/4 = 0.75.
  |0.75 − 0.75| = 0.0 error.

[Figure: the final state of the diffusion. peter holds vote power 1.5 and
pavel holds 2.5.]
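The diffusion described on the preceding slides can be sketched as below. This is an illustrative reading of the slides, not the published algorithm: the function name `ddd`, the round-based loop, and the `max_rounds` cap (which guards against trust cycles among abstainers) are assumptions. The trust weights and opinions are taken from the slides, and the final sum is divided by the society size, as the slide does.

```python
opinions = {"marko": 0.8, "peter": 0.5, "josh": 0.8, "pavel": 0.9}

# Outgoing trust edges; weights for each person sum to 1 (pavel trusts no one).
trust = {
    "marko": {"josh": 1.0},
    "peter": {"marko": 0.5, "josh": 0.5},
    "josh": {"peter": 0.25, "pavel": 0.75},
    "pavel": {},
}


def ddd(opinions, trust, participants, max_rounds=100, eps=1e-12):
    """Diffuse the vote power of abstainers over their trust edges, then
    take a power-weighted average of the participants' opinions."""
    power = {person: 1.0 for person in opinions}
    for _ in range(max_rounds):
        moved = False
        for person in opinions:
            if person in participants or power[person] <= eps:
                continue
            out_edges = trust.get(person, {})
            if not out_edges:
                continue  # an abstainer who trusts no one keeps their power
            amount, power[person] = power[person], 0.0
            for target, weight in out_edges.items():
                power[target] += amount * weight
            moved = True
        if not moved:
            break
    decision = sum(power[p] * opinions[p] for p in participants) / len(opinions)
    return decision, power


decision, power = ddd(opinions, trust, participants={"peter", "pavel"})
print(round(decision, 2), power)
```

With Marko and Josh abstaining, Peter ends with power 1.5 and Pavel with 2.5, which reproduces the slide's collective vote of 0.75. If all four participate, no power moves and the result is the plain direct-democracy average.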
Collective Decision Making: DDD




                                                                                     0.20
                                                                       correct decisions
                                                      0.00 0.05 0.10 0.15 0.95
                                                                                                  direct democracy
                                                                                                  dynamically distributed democracy




                                                                         0.80
                                                         proportion oferror
                                                               0.65
                                                                                                  dynamically distributed democracy
                                                                                                  direct democracy




                                                      0.50
                                                                                             100 90 80 70 60 50 40 30 20 10
                                                                                            100 90 80 70 60 50 40 30 20 10                  0
                                                                                                                                            0
                                                                                                         percentage of active citizens
                                                                                                        percentage of active citizens (n)

                                                          Fig. 5. The relationship between k and evote for direct democracy (gray
                                                                                                        k
                                                          line) and dynamically distributed democracy (black line). The plot provides
                                                          the proportion of identical, correct decisions over a simulation that was run
                                                • As participation wanes, dynamically 6. A visualization
                                                          with 1000 artificially generated networks composed of 100 citizens each.
                                                                                                               Fig.
                                                  distributed democracy is able to1, andcolor denotes th       citizen’s
                                                                                                               is        purple is 0.5.
                                                     As previously stated, let x ∈ [0, 1]n denote 14 political Reingold layout.
                                                                                                  the
                                                  simulate direct democracy. xi is the
                                                  tendency of each citizen in this population, where
                                                      tendency of citizen i and, for the purpose of simulation, is
determined from a uniform distribution. Assume that every citizen in a
population of n citizens uses some social network-based system to create
links to those individuals that they believe reflect their tendency the best.
In practice, these links may point to a close friend, a relative, or some
public figure whose political tendencies resonate with the individual. In
other words, representatives are any citizens, not political candidates that
serve in public office. Let A ∈ [0, 1]^{n×n} denote the link matrix
representing the network, where the weight of an edge, for the purpose of
simulation, is denoted

\[
A_{i,j} =
\begin{cases}
1 - |x_i - x_j| & \text{if link exists} \\
0 & \text{otherwise.}
\end{cases}
\]

 14
    Rodriguez, M.A., Steinbock, D.J., “A Social Network for Societal-Scale Decision-Making
Systems,” Proceedings of the Computational Social and Organizational Science Conference, 2004.
[http://arxiv.org/abs/cs/0412047]
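The link-weight rule in the excerpt can be sketched in a few lines of Python. This is only an illustration under stated assumptions: x is taken to hold each citizen's tendency in [0, 1], and the `links` list, the seed, and the function name `link_matrix` are hypothetical and do not come from the cited paper.

```python
import random

def link_matrix(x, links):
    """Build A where A[i][j] = 1 - |x[i] - x[j]| if i links to j, else 0."""
    n = len(x)
    A = [[0.0] * n for _ in range(n)]
    for i, j in links:
        A[i][j] = 1.0 - abs(x[i] - x[j])
    return A

random.seed(0)                              # fixed seed so the sketch is repeatable
n = 4
x = [random.random() for _ in range(n)]     # tendencies drawn uniformly from [0, 1]
links = [(0, 1), (1, 2), (2, 0), (3, 0)]    # hypothetical "i trusts j" links
A = link_matrix(x, links)
```

The closer two linked citizens' tendencies are, the closer the edge weight is to 1. Unlinked pairs keep a weight of 0.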
Collective Decision Making: Techno-Government
• In this model of decision making, there is no governmental body.

• Power is determined when a decision is needed.

• How are bills created? Wikilegislature?15

• What about different types of trust (e.g. “Marko trusts Josh in
  engineering decisions only.”) — Hint: Multi-relational+ graphs. Tagging
  legislature and tagging trust.16
  15
     Turoff, M., Roxanne-Hiltz, S., Bieber, M., Rana, A., “Collaborative Discourse Structures in Computer
Mediated Group Communications”, Hawaii International Conference on Systems Science (HICSS), 1998.
[http://web.njit.edu/~turoff/Papers/CDSCMC/CDSCMC.htm]
  16
     Rodriguez, M.A., “Social Decision Making with Multi-Relational Networks and Grammar-Based
Particle Swarms,” Hawaii International Conference on Systems Science (HICSS), pp. 39–49, 2007.
[http://arxiv.org/abs/cs/0609034]
“The founders of modern democracies provided a moral heritage that
   remains highly regarded in societies today. However, it should be
   remembered that it is the ideals that are valuable, not the specific
   implementation of the systems that protect and support them. If
   there is another implementation of government that better realizes
   these ideals, then, by the rights of man, it must be enacted.”17

                                                                 – Michael Scott



  17
    Rodriguez, M.A., Watkins, J.H., “Revisiting the Age of Enlightenment from a Collective Decision
Making Systems Perspective,” First Monday, 14(8), University of Illinois at Chicago Library, 2009.
[http://arxiv.org/abs/0901.3929]
Eudaemonic Engine: Seeking Virtue through Circuitry




            The year is 2018.
Human life on earth has stabilized.
Humans no longer struggle to survive. They
struggle for eudaemonia. They seek the “good
daemon” within...
Eudaemonic Engine: Aristotle

• Being virtuous is repeatedly choosing correctly.
• Habitual correct behavior leads to eudaemonia – complete engagement in the world
  (a complete sense of engagement/acceptance).18 19
• Can systems aid individuals in choosing correctly – in all aspects of life?




                                      Aristotle                          David L. Norton

 18
      Aristotle, “Nicomachean Ethics”, 350 B.C.
 19
      Mihaly Csikszentmihalyi, “Flow: The Psychology of Optimal Experience”, Harper Perennial, 1990.
Eudaemonic Engine: Resource Modeling
    But if the development of character is the moral objective, it is obvious that
    [...] the choices of vocation and avocations to pursue, of friends to cultivate, of
    books to read are moral for they clearly influence such development.20

• Web services are continuing to build richer models of humans, resources,
  and the relationships between them.

• There exists an increasing reliance on such services to aid in decision
  making: correct books (Amazon.com), correct movies (NetFlix.com),
  correct music (Pandora), correct occupation (Monster.com), correct
  friends (PointsCommuns.com), correct life partner (Match.com), etc.21
  20
       David L. Norton, “Democracy and Moral Development: A Politics of Virtue”, University of California Press, 1991.
  21
       Rodriguez, M.A., Watkins, J., “Faith in the Algorithm, Part 2: Computational Eudaemonics,” Proceedings of the
International Conference on Knowledge-Based and Intelligent Information & Engineering Systems, 5712, pp. 813–820, 2009.
[http://arxiv.org/abs/0904.0027]
Eudaemonic Engine: Mapping Person to Resource

                                            movie


                             watch
                                            article
                              read
                                                      time
                 person       listen        music

                              meet
                                             friend
                              eat


                                             food




Map an individual to actions on resources. However, how do we
model/expose the resources of the world?
Model
Eudaemonic Engine: The Web of Data
[Figure: cloud diagram of the interlinked data sets on the Web of Data (e.g. dbpedia, freebase, geonames, musicbrainz, uniprot, pubmed).]
Eudaemonic Engine: URIs of the Web of Data
http://dbpedia.org/resource/The_Fountainhead




                                  FLICKR
                                             http://www4.wiwiss.fu-berlin.de/flickrwrappr/photos/Ayn_Rand


                                              foaf:depiction


                                                 flickr:Ayn_Rand


                                                 dbpprop:hasPhotoCollection

                                                                    dbpedia:Ayn_Rand




                                   DBPEDIA
                                                                                       dbpedia:Book
                                                   dbpedia:author



                                              dbpedia:Fountain_Head         rdf:type
Eudaemonic Engine: Datasets on the Web of Data
data set           domain       data set           domain       data set             domain
audioscrobbler     music        govtrack           government   pubguide             books
bbclatertotp       music        homologene         biology      qdos                 social
bbcplaycountdata   music        ibm                computer     rae2001              computer
bbcprogrammes      media        ieee               computer     rdfbookmashup        books
budapestbme        computer     interpro           biology      rdfohloh             social
chebi              biology      jamendo            music        resex                computer
crunchbase         business     laascnrs           computer     riese                government
dailymed           medical      libris             books        semanticweborg       computer
dblpberlin         computer     lingvoj            reference    semwebcentral        social
dblphannover       computer     linkedct           medical      siocsites            social
dblprkbexplorer    computer     linkedmdb          movie        surgeradio           music
dbpedia            general      magnatune          music        swconferencecorpus   computer
doapspace          social       musicbrainz        music        taxonomy             reference
drugbank           medical      myspacewrapper     social       umbel                general
eurecom            computer     opencalais         reference    uniref               biology
eurostat           government   opencyc            general      unists               biology
flickrexporter     images       openguides         reference    uscensusdata         government
flickrwrappr       images       pdb                biology      virtuososponger      reference
foafprofiles       social       pfam               biology      w3cwordnet           reference
freebase           general      pisa               computer     wikicompany          business
geneid             biology      prodom             biology      worldfactbook        government
geneontology       biology      projectgutenberg   books        yago                 general
geonames           geographic   prosite            biology      ...
Eudaemonic Engine: Transforms Development
A new application development paradigm emerges. No longer do data and application
providers need to be the same entity (left). With the Web of Data, it is possible for
developers to write applications that utilize data that they do not maintain (right).22

                      Application 1   Application 2   Application 3   Application 1     Application 2      Application 3


                                                                            processes    processes      processes

                       processes       processes       processes




                                                                      Web of Data

                       structures      structures      structures
                                                                           structures    structures      structures



                        127.0.0.1      127.0.0.2       127.0.0.3        127.0.0.1        127.0.0.2           127.0.0.3




  22
       Rodriguez, M.A., “A Reflection on the Structure and Process of the Web of Data,” Bulletin of the American Society for
Information Science and Technology, 35(6), pp. 38–43, 2009. [http://arxiv.org/abs/0908.0373]
Now that there is a rich structure, what is the
process?
Process
Eudaemonic Engine: Diffusion Processes on Graphs


A graph diffusion process will be used to determine the solution to one’s
problems.

• Graph traversing can be seen as a diffusion process over a graph.

• “Energy” moves over a graph and reverberates in regions where there
  is recurrence (i.e. cycles).

• At some t in the future, the vertices with the greatest flow are the
  solution to the problem.
Eudaemonic Engine: Diffusion Processes on Graphs
Implementing a diffusion process is easy when the edges of the
graph are unlabeled.

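// Groovy sketch; assumes every vertex offers getAdjacentVertices()
def flow = new HashMap<Vertex,Integer>()   // vertex -> number of arrivals
def current = [startVertex]
def steps = 10

for (int i = 0; i < steps; i++) {
  // step every walker to its neighbours; flatten the per-vertex lists into one
  current = current.collect{ it.getAdjacentVertices() }.flatten()
  // count each arrival (a vertex absent from the map starts at 0)
  current.each{ flow[it] = (flow[it] ?: 0) + 1 }
}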
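To try the loop above without a graph database, here is a minimal self-contained sketch in Python. The adjacency list `graph` and the function name `diffuse` are assumptions made for illustration. The logic is the same as on the slide: at every step each walker moves to all neighbours of its vertex, and every arrival is counted.

```python
from collections import Counter

# Hypothetical toy graph as an adjacency list: a -> b -> c -> a is a cycle,
# and d is a dead end reachable from b.
graph = {"a": ["b"], "b": ["c", "d"], "c": ["a"], "d": []}

def diffuse(graph, start, steps):
    """Count how many walkers arrive at each vertex over the given steps."""
    flow = Counter()
    current = [start]
    for _ in range(steps):
        # every walker moves to all neighbours of the vertex it sits on
        current = [nbr for v in current for nbr in graph[v]]
        flow.update(current)
    return flow

flow = diffuse(graph, "a", 10)
```

Walkers that reach d disappear because d has no outgoing edges, while the cycle keeps feeding a, b, and c on every lap.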
Eudaemonic Engine: Diffusion on a Property Graph?

                                                           likes                            emil


                                                                                                      likes
                                                                        linked
                                 24
                                                                       process
                                                                                           knows              True Blood

                        likes                                 wrote              wrote
                                           likes
                                                                                                      likes

             jen                knows              marko                knows              peter


                                                                                         occupation
         occupation     likes              likes   wrote           occupation



         intelligence           The Wire           gremlin            tagged              graphs




With different types of things being related by different types of relations,
you need to specify legal paths for the energy to flow over.
Eudaemonic Engine: Diffusion on a Property Graph


• Problem statement = Start vertices + path expression.

• Problem solution = Highest energy vertices at t.23 24 25




  23
     Examples presented next are basic due to the simplicity of the toy graph example used. In such cases,
queries as opposed to energy diffusions are best. In general, the purpose of an energy diffusion is to
expose recurrence/feedback in the graph. For the more technically inclined, think of it as determining the
eigenvector of the graph defined by the path expression.
  24
     Rodriguez, M.A., “Grammar-Based Random Walkers in Semantic Networks,” Knowledge-Based Systems,
21(7), pp. 727–739, 2008. [http://arxiv.org/abs/0803.4355]
  25
     Rodriguez, M.A., Neubauer, P., “A Path Algebra for Multi-Relational Graphs,” 2nd International
Workshop on Graph Data Management (GDM11), 2010. [http://arxiv.org/abs/1011.0390]
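One way to picture "start vertices + path expression" is the following Python sketch. It is not the path algebra of the cited papers: the edge list, the (direction, label) encoding of a path expression, and the names `step` and `diffuse` are all assumptions chosen for illustration.

```python
from collections import Counter

# Hypothetical labeled edges (tail, label, head); not the exact graph in the figure.
EDGES = [
    ("marko", "knows", "jen"),
    ("marko", "knows", "peter"),
    ("jen", "likes", "The Wire"),
    ("peter", "likes", "The Wire"),
    ("peter", "likes", "True Blood"),
    ("marko", "wrote", "gremlin"),
]

def step(energy, direction, label):
    """Push each vertex's energy across edges with the given label and direction."""
    moved = Counter()
    for tail, edge_label, head in EDGES:
        if edge_label != label:
            continue
        source, target = (tail, head) if direction == "out" else (head, tail)
        if energy[source] > 0:
            moved[target] += energy[source]
    return moved

def diffuse(start_vertices, path):
    """Problem statement = start vertices + path expression."""
    energy = Counter(start_vertices)
    for direction, label in path:
        energy = step(energy, direction, label)
    return energy

# "What do the people I know like?"
result = diffuse(["marko"], [("out", "knows"), ("out", "likes")])
ranking = result.most_common()   # highest-energy vertices first
```

Only edges named in the path expression carry energy, so the "wrote" edge is never followed here. The highest-energy vertices in `ranking` are the solution.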
Eudaemonic Engine: Friend Recommendation
“Who are my friends’ friends that are not me or my friends?”26
 26
      marko.outE[[label:'knows']].inV.aggregate(x).outE[[label:'knows']].inV{!x.contains(it)}
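The same question can be written out with plain sets, which makes the "not me or my friends" exclusion explicit. This is a sketch in Python rather than Gremlin so that it runs on its own. The `KNOWS` edge list and the function names are hypothetical and are not meant to reproduce the figure's exact topology.

```python
# Hypothetical "knows" edges (tail, head).
KNOWS = [("marko", "jen"), ("marko", "peter"), ("jen", "marko"), ("peter", "emil")]

def known_by(person):
    """Everyone reachable over one outgoing 'knows' edge."""
    return {head for tail, head in KNOWS if tail == person}

def recommend_friends(me):
    friends = known_by(me)
    friends_of_friends = set()
    for friend in friends:
        friends_of_friends |= known_by(friend)
    # drop people already known, and drop the asker
    return friends_of_friends - friends - {me}

suggestions = recommend_friends("marko")
```

Here jen knows marko, so marko turns up among his own friends' friends and has to be removed along with jen and peter.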
Eudaemonic Engine: Product Recommendation
“Who likes what I like? Of those things they like, what else do they like
that I don’t already like?”27
 27
      marko.outE[[label:'likes']].inV.aggregate(x).inE[[label:'likes']].outV.outE[[label:'likes']].inV{!x.contains(it)}
Eudaemonic Engine: Product Recommendation 2
“Who likes what I like and what do they like? What do the people I know
like? Of those things liked, what do I not already like?”
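The slide gives no query for this question, so the following Python sketch is only one possible reading of it: two paths (people who like what I like, and people I know) feed one score table, and anything already liked is filtered out at the end. The edge list and all function names are assumptions.

```python
from collections import Counter

# Hypothetical edges (tail, label, head).
EDGES = [
    ("marko", "knows", "peter"),
    ("marko", "likes", "The Wire"),
    ("jen", "likes", "The Wire"), ("jen", "likes", "True Blood"),
    ("peter", "likes", "gremlin"), ("peter", "likes", "True Blood"),
]

def out(vertex, label):
    """Heads of outgoing edges with the given label."""
    return [h for t, lab, h in EDGES if t == vertex and lab == label]

def into(vertex, label):
    """Tails of incoming edges with the given label."""
    return [t for t, lab, h in EDGES if h == vertex and lab == label]

def recommend(me):
    mine = set(out(me, "likes"))
    scores = Counter()
    # Path 1: things liked by people who like what I like.
    for thing in mine:
        for peer in into(thing, "likes"):
            if peer != me:
                scores.update(out(peer, "likes"))
    # Path 2: things liked by the people I know.
    for friend in out(me, "knows"):
        scores.update(out(friend, "likes"))
    # Keep only what I do not already like.
    return Counter({t: s for t, s in scores.items() if t not in mine})

scores = recommend("marko")
```

A resource reached over both paths collects score from each, which is how merging paths changes the ranking.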
Eudaemonic Engine: Recommendation

• Different paths through a domain model expose different types of
  recommendations.

• Individual path preferences allow for an ecosystem of traversals (different
  problems can be solved over the same domain model).28 29 30



  28
     Rodriguez, M.A., Allen, D.W., Shinavier, J., Ebersole, G., “A Recommender System to Support the
Scholarly Communication Process,” 2009. [http://arxiv.org/abs/0905.1594]
  29
     Rodriguez, M.A., “Problem-Solving using Graph Traversals: Searching, Scoring, Ranking, and
Recommendation,” Technical Talk Seminar, AT&T Interactive, 2010.
[http://slidesha.re/bOCy4Q]
  30
     Traversal Patterns with Gremlin available at
https://github.com/tinkerpop/gremlin/wiki/Traversal-Patterns.
Universal Computer: A Single Computational Substrate




             The year is 2023.
Life is good. Humans flourish. Virtuous men’s minds are filled
with wonderfully creative ideas. Inventions proliferate.
Advances in computer network technology yield a
new model of computing.


Computer networks are no longer the bottleneck for
speed. Accessing local and remote data is no longer
considered “different.” The distinction between
RAM, disk drive, and Web disappears.
Universal Computer: A Computational Substrate



On the Web...

• Represent data.

• Represent code.

• Represent virtual machines.
Universal Computer: Represent Data



• URIs form an infinite universal address space.

• A URI can denote a datum.
    http://markorodriguez.com#self (Marko)
    http://sws.geonames.org/4887398/about.rdf (Chicago)
    http://data.nytimes.com/N38395718310308503251 (Malmö)

• RDF (Resource Description Framework) is a data model for linking URIs
  into a multi-relational graph.
Universal Computer: Represent Data
                                                                                            127.0.0.2
        127.0.0.1

                        atti:marko           atti:bestFriend              nm:puppy

                             atti:hasFur                                      atti:hasFur
             atti:numberOfLegs                                 atti:numberOfLegs


          "2"^^xsd:integer    "false"^^xsd:boolean        "4"^^xsd:integer      "true"^^xsd:boolean




• The concept of atti:marko and the properties atti:numberOfLegs, atti:hasFur,
  and atti:bestFriend are maintained by an AT&Ti graph server.
• The concept of nm:puppy is maintained by a New Mexico graph server.
• The data types xsd:integer and xsd:boolean are maintained by an XML standards
  organization.
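The figure can be mimicked with two in-memory triple lists standing in for the two servers. This is a rough Python sketch under stated assumptions: the `SERVERS` dictionary and the `objects` helper are hypothetical, and a real deployment would resolve URIs over the network rather than scan local lists.

```python
# Hypothetical in-memory stand-ins for the two graph servers in the figure.
SERVERS = {
    "127.0.0.1": [  # AT&Ti server
        ("atti:marko", "atti:numberOfLegs", 2),
        ("atti:marko", "atti:hasFur", False),
        ("atti:marko", "atti:bestFriend", "nm:puppy"),
    ],
    "127.0.0.2": [  # New Mexico server
        ("nm:puppy", "atti:numberOfLegs", 4),
        ("nm:puppy", "atti:hasFur", True),
    ],
}

def objects(subject, predicate):
    """Return all objects of matching triples, whichever server holds them."""
    return [o
            for triples in SERVERS.values()
            for s, p, o in triples
            if s == subject and p == predicate]

# Follow a link that starts on one server and ends on the other.
best_friend = objects("atti:marko", "atti:bestFriend")[0]
legs = objects(best_friend, "atti:numberOfLegs")
```

The traversal crosses from one server to the other without the caller needing to know where each triple lives.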
Universal Computer: Represent Code

• Computing is a series of instructions — add, write, branch, goto...

• The URI address space and RDF glue can be seen as a computational
  medium.31


                                            _:123           rdf:type       atti:Add


                             atti:left-op           atti:right-op
                                                                        rdfs:subClassOf



                       "3"^^xsd:int                      "7"^^xsd:int   atti:Instruction


  31
     Rodriguez, M.A., “General-Purpose Computing on a Semantic Network Substrate,” Emergent Web
Intelligence: Advanced Semantic Technologies, eds. R. Chbeir, A. Hassanien, A. Abraham, and Y. Badr, pp.
57–104, 2010. [http://arxiv.org/abs/0704.3395]
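To make the idea concrete, the add instruction from the figure can be stored as triples and evaluated by a tiny interpreter. This Python sketch is an illustration only, not the design of the cited work: the `value` and `execute` helpers are hypothetical, and the subclass link is written with the RDF Schema name rdfs:subClassOf.

```python
# The instruction from the figure, written as (subject, predicate, object) triples.
TRIPLES = [
    ("_:123", "rdf:type", "atti:Add"),
    ("_:123", "atti:left-op", 3),
    ("_:123", "atti:right-op", 7),
    ("atti:Add", "rdfs:subClassOf", "atti:Instruction"),
]

def value(subject, predicate):
    """Return the single object of (subject, predicate), or None if absent."""
    for s, p, o in TRIPLES:
        if s == subject and p == predicate:
            return o
    return None

def execute(node):
    """Evaluate an instruction node; only atti:Add is handled in this sketch."""
    kind = value(node, "rdf:type")
    if value(kind, "rdfs:subClassOf") != "atti:Instruction":
        raise ValueError(f"{node} is not an instruction")
    if kind == "atti:Add":
        return value(node, "atti:left-op") + value(node, "atti:right-op")
    raise NotImplementedError(kind)

result = execute("_:123")
```

The interpreter reads its operands from the same graph that holds ordinary data, which is the point of treating RDF as a computational medium.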
Universal Computer: Represent Code

                                  atti:marko             atti:bestFriend       nm:puppy

                                 atti:hasMethod                                atti:isHappy
         Method

                                    atti:pet                               "false"^^xsd:boolean
               atti:args
                                   atti:block


                _:1234              _:2345
                                                  atti:inst
                  rdf:_1

                                                  _:3456
          "animal"^^xsd:string
                                           // make animal happy




Represent methods and their instructions attached to objects/classes.
Universal Computer: Represent Virtual Machines


     Virtual Machine

        atti:VM                            atti:marko     atti:bestFriend       nm:puppy

                                         atti:hasMethod                         atti:isHappy
        rdf:type

        _:6789     atti:pc   _:3456         atti:pet                        "false"^^xsd:boolean

                                           atti:block
                             atti:inst

                                            _:2345


                                   write "true"^^xsd:boolean



Represent not only code, but the machines that execute it.
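A minimal Python sketch of this idea keeps the data, the instruction, and the machine state in one triple store. The figure only shows a program counter (atti:pc) pointing at an instruction that writes "true", so the instruction encoding below (atti:Write, atti:subject, atti:predicate, atti:value) and the `get` and `step` helpers are hypothetical.

```python
# One store holds data, an instruction, and the virtual machine's state.
store = {
    ("nm:puppy", "atti:isHappy", False),
    ("_:6789", "rdf:type", "atti:VM"),
    ("_:6789", "atti:pc", "_:3456"),              # program counter
    # Hypothetical encoding of the figure's `write "true"` instruction:
    ("_:3456", "rdf:type", "atti:Write"),
    ("_:3456", "atti:subject", "nm:puppy"),
    ("_:3456", "atti:predicate", "atti:isHappy"),
    ("_:3456", "atti:value", True),
}

def get(subject, predicate):
    """Return one object for (subject, predicate), or None if absent."""
    for s, p, o in store:
        if s == subject and p == predicate:
            return o
    return None

def step(vm):
    """Execute the instruction under the program counter; False if halted."""
    inst = get(vm, "atti:pc")
    if inst is None:
        return False
    if get(inst, "rdf:type") == "atti:Write":
        s = get(inst, "atti:subject")
        p = get(inst, "atti:predicate")
        v = get(inst, "atti:value")
        for old in [t for t in store if t[0] == s and t[1] == p]:
            store.discard(old)                    # overwrite the previous value
        store.add((s, p, v))
    store.discard((vm, "atti:pc", inst))          # no next instruction here, so halt
    return True

ran = step("_:6789")
```

Because the program counter is itself a triple, any process with access to the store could pick up the machine and continue executing it.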
Universal Computer: Represent Virtual Machines
[Figure: RDF virtual machine (RVM) ontology diagram. Visible classes: RVM, Fhat, Frame, FrameVariable, Block, Instruction, OperandStack, ReturnStack, BlockStack, rdfs:Resource, xsd:boolean, xsd:string. Visible properties: methodReuse, halt, programLocation, operandTop, returnTop, blockTop, currentFrame, hasFrame, forFrame, hasSymbol, hasValue, fromBlock, rdf:first, rdf:rest, rdf:li.]

NenoFhat Project (circa 2006): http://neno.lanl.gov.
Global Data Structure

[Figure: layered diagram. Labels, top to bottom: Data, API, Program, Machine Architecture, and Virtual Machine State (the global data structure); read/write arrows; Virtual Machine Processes; Physical Machines (127.0.0.1, 127.0.0.2, 127.0.0.3, ..., 127.0.0.4); Physics; My Belief in Reality.]
Universal Computer: A Ramification

• Data, APIs, code, machine architectures, and virtual machines are within
  the same global URI address space.
    Code can be physically distributed across computers. For example,
    an add instruction on 127.0.0.1 references a branch instruction on
    127.0.0.2.
    Hardware machines can be added or removed without altering the
    state of computation — only the speed.
    No developer concept of RAM-based memory addresses — the only
    address space is the space of all URIs.
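
A minimal sketch of the idea that code lives in one URI address space, assuming a single in-memory map stands in for that space. The Instruction class, the run method, and the example URIs are hypothetical and are not taken from the RVM specification.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: instructions are addressed by URI, not by a RAM address.
class UriAddressSpaceSketch {

    // An instruction names its successor by URI; a null successor means halt.
    static class Instruction {
        final String opcode;
        final String nextUri;

        Instruction(String opcode, String nextUri) {
            this.opcode = opcode;
            this.nextUri = nextUri;
        }
    }

    // Follow successor URIs through the shared space, recording each URI visited.
    static List<String> run(Map<String, Instruction> space, String startUri) {
        List<String> trace = new ArrayList<>();
        String uri = startUri;
        while (uri != null) {
            Instruction instruction = space.get(uri);
            if (instruction == null) {
                throw new IllegalStateException("unresolvable URI: " + uri);
            }
            trace.add(uri);
            uri = instruction.nextUri;
        }
        return trace;
    }

    public static void main(String[] args) {
        Map<String, Instruction> space = new HashMap<>();
        // The add instruction is hosted on one machine, the branch on another.
        space.put("http://127.0.0.1/add", new Instruction("add", "http://127.0.0.2/branch"));
        space.put("http://127.0.0.2/branch", new Instruction("branch", null));
        System.out.println(run(space, "http://127.0.0.1/add"));
    }
}
```

In this sketch, where an instruction is hosted only shows up in its URI; the interpreter loop never sees a machine-local address.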
Universal Computer: Another Ramification

• Reflection down to the machine level.32
      Most languages support the manipulation of code at runtime. In this
      model, the virtual machine can be altered at runtime.
      Code can rewrite the virtual machine that is evaluating the
      code. (i.e. create lots of bugs.)




 32
   Rodriguez, M.A., The RDF Virtual Machine, LA-UR-08-03925, in review, 2009. [http://arxiv.org/abs/0802.3492]
The year is 2030.
Man learns to encode themselves into the URI
address space...33 34




  33
    Egan, G., “Permutation City,” Eos Publisher, 1995.
  34
    Rodriguez, M.A., “From the Signal to the Symbol: Structure and Process in Artificial Intelligence,”
Center for Nonlinear Studies Post Doctorate Seminar, Los Alamos National Laboratory, Los Alamos, New
Mexico, 2008. [http://slidesha.re/hdqRn2]
This is the TinkerPop...
TinkerPop Productions
• Blueprints: Data Models and their Implementations
  [http://blueprints.tinkerpop.com]
• Pipes: A Data Flow Framework using Process Graphs
  [http://pipes.tinkerpop.com]
• Gremlin: A Graph-Based Programming Language
  [http://gremlin.tinkerpop.com]
• Rexster: A RESTful Graph Shell
  [http://rexster.tinkerpop.com]35




 35
    Please see http://engineering.attinteractive.com/2010/12/a-graph-processing-stack/ for
a short review of these products.
Also TinkerPop’s homepage at: http://tinkerpop.com
Blueprints: A Property Graph Model Interface

                                    Blueprints

• Blueprints is like the JDBC of the graph database community.

• Provides a Java-based interface API for the property graph data model.
       Graph, Vertex, Edge, Index.

• Connectors to TinkerGraph, Neo4j, OrientDB, Sails (e.g. AllegroGraph,
  HyperSail, etc.), and soon InfiniteGraph. Into the future, hope to support
  InfoGrid, Sones, DEX, and HyperGraphDB.36
  36
    HyperGraphDB makes use of an n-ary graph structure known as a hypergraph. Blueprints, in its current
form, only supports the more common binary graph.
Creating a Neo4jGraph in Blueprints
// create a graph
Graph graph = new Neo4jGraph("/tmp/neo4j");

// add two vertices
Vertex a = graph.addVertex(null);
a.setProperty("name","marko");
Vertex b = graph.addVertex(null);
b.setProperty("name","peter");

// join the two vertices by a knows relation
Edge e = graph.addEdge(null,a,b,"knows");
e.setProperty("since","2007");


                 0          knows          1
                          since=2007
             name=marko                name=peter
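
For readers without Neo4j and Blueprints on the classpath, the same two-vertex graph can be sketched in plain Java. TinyPropertyGraph and its nested classes are hypothetical stand-ins written for this illustration; they are not the Blueprints API, and the separate id counters for vertices and edges are an assumption.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory stand-in for a property graph; this is not the Blueprints API.
class TinyPropertyGraph {

    // Vertices and edges are both elements: an id plus key/value properties.
    static class Element {
        final int id;
        final Map<String, Object> properties = new LinkedHashMap<>();

        Element(int id) {
            this.id = id;
        }
    }

    static class Vertex extends Element {
        // Direct references to incident edges, so adjacency needs no index lookup.
        final List<Edge> outEdges = new ArrayList<>();
        final List<Edge> inEdges = new ArrayList<>();

        Vertex(int id) {
            super(id);
        }
    }

    static class Edge extends Element {
        final Vertex outVertex; // tail of the edge
        final Vertex inVertex;  // head of the edge
        final String label;

        Edge(int id, Vertex outVertex, Vertex inVertex, String label) {
            super(id);
            this.outVertex = outVertex;
            this.inVertex = inVertex;
            this.label = label;
        }
    }

    private int nextVertexId = 0;
    private int nextEdgeId = 0;

    Vertex addVertex() {
        return new Vertex(nextVertexId++);
    }

    Edge addEdge(Vertex out, Vertex in, String label) {
        Edge edge = new Edge(nextEdgeId++, out, in, label);
        out.outEdges.add(edge);
        in.inEdges.add(edge);
        return edge;
    }

    public static void main(String[] args) {
        TinyPropertyGraph graph = new TinyPropertyGraph();
        Vertex a = graph.addVertex();
        a.properties.put("name", "marko");
        Vertex b = graph.addVertex();
        b.properties.put("name", "peter");
        Edge e = graph.addEdge(a, b, "knows");
        e.properties.put("since", "2007");

        // Walk from marko to whoever he knows by following object references.
        for (Edge edge : a.outEdges) {
            System.out.println(a.properties.get("name") + " " + edge.label + " "
                    + edge.inVertex.properties.get("name"));
        }
    }
}
```

Each vertex holds references to its own edges, which is the index-free adjacency the talk describes: reaching a neighbor is a pointer hop rather than a lookup.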
Handy Features of Blueprints
• Supports automatic transactions
    graph.setTransactionMode(AUTOMATIC -or- MANUAL)
    In automatic mode, every manipulation of the graph is wrapped in a
    transaction and committed.

• Supports automatic indices
    graph.createIndex(AUTOMATIC -or- MANUAL)
    In automatic mode, elements are added or removed from an index as
    their properties are manipulated.

• Utility Suite
    Blueprints Sail makes a graphdb into a traversal-based RDF store.
    GraphML Reader/Writer library.
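
The automatic-index behavior described above can be sketched as follows. This is an illustration under stated assumptions, not Blueprints code: AutoIndexSketch, setProperty, removeProperty, and get are hypothetical names, and elements are reduced to integer ids.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch of an automatic index: the index is maintained as a side
// effect of property manipulation, so callers never update it by hand.
class AutoIndexSketch {

    // element id -> (property key -> value)
    private final Map<Integer, Map<String, Object>> properties = new HashMap<>();
    // property key -> (value -> ids of elements currently holding that value)
    private final Map<String, Map<Object, Set<Integer>>> index = new HashMap<>();

    void setProperty(int elementId, String key, Object value) {
        Object old = properties.computeIfAbsent(elementId, id -> new HashMap<>()).put(key, value);
        if (old != null) {
            unindex(elementId, key, old); // drop the stale index entry
        }
        index.computeIfAbsent(key, k -> new HashMap<>())
             .computeIfAbsent(value, v -> new TreeSet<>())
             .add(elementId);
    }

    void removeProperty(int elementId, String key) {
        Map<String, Object> props = properties.get(elementId);
        Object old = (props == null) ? null : props.remove(key);
        if (old != null) {
            unindex(elementId, key, old);
        }
    }

    // Look up element ids by key/value; returns a read-only view.
    Set<Integer> get(String key, Object value) {
        Map<Object, Set<Integer>> byValue = index.get(key);
        if (byValue == null || !byValue.containsKey(value)) {
            return Collections.emptySet();
        }
        return Collections.unmodifiableSet(byValue.get(value));
    }

    private void unindex(int elementId, String key, Object value) {
        Map<Object, Set<Integer>> byValue = index.get(key);
        if (byValue != null && byValue.containsKey(value)) {
            byValue.get(value).remove(elementId);
        }
    }

    public static void main(String[] args) {
        AutoIndexSketch graph = new AutoIndexSketch();
        graph.setProperty(0, "name", "marko");
        graph.setProperty(1, "name", "peter");
        System.out.println(graph.get("name", "marko"));
        graph.removeProperty(0, "name");
        System.out.println(graph.get("name", "marko"));
    }
}
```

A manual index, by contrast, would expose the put and remove operations on the index itself and leave the bookkeeping to the caller.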
Pipes: A Data Flow Framework using Process Graphs


                                   Pipes

• Lazy data flow with support for Blueprints-based graph processing.

• Provides a collection of “pipes” (implement Iterable and Iterator)
  that are connected together to form processing pipelines.
    Filters: ComparisonFilterPipe, RandomFilterPipe, etc.
    Traversal: VertexEdgePipe, EdgeVertexPipe, PropertyPipe, etc.
    Splitting/Merging: CopySplitPipe, RobinMergePipe, etc.
    Logic: OrFilterPipe, AndFilterPipe, etc.
Pipes: Chained Iterators


This pipeline takes objects of type A and turns them into objects of type D
through a sequence of processing pipes...37

                                                                                                      D
                                                                                              D
              A
                      A   A    Pipe1        B       Pipe2       C       Pipe3        D            D
          A                                                                               D
                  A
                                                   Pipeline



Pipe<A,D> pipeline =
   new Pipeline<A,D>(Pipe1<A,B>, Pipe2<B,C>, Pipe3<C,D>)
 37
      Though not discussed, splitting and merging is allowed as well (branching pipelines).
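
The chaining shown on this slide can be sketched with plain java.util.Iterator objects. The pipe helper below is hypothetical and stands in for the Pipes classes; it only illustrates that each stage wraps the previous one and does no work until an element is pulled from the end of the chain.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.function.Function;

// Hypothetical sketch of a lazy A -> B -> C -> D pipeline built from wrapped iterators.
class ChainedIteratorsSketch {

    // Wrap an upstream iterator; the step runs only when next() is called downstream.
    static <S, E> Iterator<E> pipe(final Iterator<S> starts, final Function<S, E> step) {
        return new Iterator<E>() {
            @Override
            public boolean hasNext() {
                return starts.hasNext();
            }

            @Override
            public E next() {
                return step.apply(starts.next());
            }
        };
    }

    public static void main(String[] args) {
        Iterator<String> a = Arrays.asList("a", "bcd").iterator();
        Iterator<Integer> b = pipe(a, String::length);         // A -> B
        Iterator<Integer> c = pipe(b, n -> n * n);             // B -> C
        Iterator<String> d = pipe(c, n -> "squared=" + n);     // C -> D
        while (d.hasNext()) {
            System.out.println(d.next());
        }
    }
}
```

Because nothing is computed until the final iterator is consumed, a traversal that only needs the first few results never touches the rest of the graph.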
Pipes: A Simple Example

“What are the names of the people that marko knows?”

                                    B       name=peter
                        knows



                A       knows       C       name=pavel

          name=marko
                                  created
                        created
                                    D       name=gremlin
Pipes: A Simple Example
Pipe<Vertex,Edge> pipe1 = new VertexEdgePipe(Step.OUT_EDGES);
// Filter.NOT_EQUAL filters out edges whose label is not "knows", so only "knows" edges pass
Pipe<Edge,Edge> pipe2 = new LabelFilterPipe("knows",Filter.NOT_EQUAL);
Pipe<Edge,Vertex> pipe3 = new EdgeVertexPipe(Step.IN_VERTEX);
Pipe<Vertex,String> pipe4 = new PropertyPipe<String>("name");

Pipe<Vertex,String> pipeline = new Pipeline(pipe1,pipe2,pipe3,pipe4);
pipeline.setStarts(new SingleIterator<Vertex>(graph.getVertex("A")));


Pipes: A Simple Example
for(String name : pipeline) {
  System.out.println(name);
}





peter
pavel
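
The four steps of this example (outgoing edges, label filter, head vertex, name property) can also be mirrored in self-contained Java. The classes below are hypothetical stand-ins, not Pipes or Blueprints types, and the second created edge (from C to D) is an assumption read off the diagram.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical stand-in classes; only the shape of the traversal mirrors the slides.
class KnowsTraversalSketch {

    static class Vertex {
        final String id;
        final String name;
        final List<Edge> outEdges = new ArrayList<>();

        Vertex(String id, String name) {
            this.id = id;
            this.name = name;
        }
    }

    static class Edge {
        final String label;
        final Vertex inVertex;

        Edge(String label, Vertex inVertex) {
            this.label = label;
            this.inVertex = inVertex;
        }
    }

    static void link(Vertex out, String label, Vertex in) {
        out.outEdges.add(new Edge(label, in));
    }

    // "What are the names of the people that the start vertex knows?"
    static List<String> namesKnownBy(Vertex start) {
        return start.outEdges.stream()                       // outgoing edges
                .filter(edge -> edge.label.equals("knows"))  // keep only "knows" edges
                .map(edge -> edge.inVertex)                  // step to the head vertex
                .map(vertex -> vertex.name)                  // read the name property
                .collect(Collectors.toList());
    }

    // Builds the example graph and returns vertex A.
    static Vertex sampleGraph() {
        Vertex a = new Vertex("A", "marko");
        Vertex b = new Vertex("B", "peter");
        Vertex c = new Vertex("C", "pavel");
        Vertex d = new Vertex("D", "gremlin");
        link(a, "knows", b);
        link(a, "knows", c);
        link(a, "created", d);
        link(c, "created", d); // assumed from the diagram
        return a;
    }

    public static void main(String[] args) {
        System.out.println(namesKnownBy(sampleGraph())); // prints [peter, pavel]
    }
}
```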
Pipes: A Simple Example
                            EdgeVertexPipe(IN_VERTEX)


VertexEdgePipe(OUT_EDGES)
                                                   PropertyPipe("name")

                                             B     name=peter
                          knows



           A              knows              C     name=pavel

     name=marko
                                         created
                          created
                                             D     name=gremlin



                  LabelFilterPipe("knows")
Pipes: Library of Generally Useful Pipes

[ FILTERS ]
AndFilterPipe, CollectionFilterPipe, ComparisonFilterPipe, DuplicateFilterPipe, FutureFilterPipe, ObjectFilterPipe, OrFilterPipe, RandomFilterPipe, RangeFilterPipe

[ SPLITS ]
CopySplitPipe, RobinSplitPipe

[ MERGES ]
ExhaustiveMergePipe, RobinMergePipe

[ GRAPHS ]
EdgeVertexPipe, IdFilterPipe, IdPipe, LabelFilterPipe, LabelPipe, PropertyFilterPipe, PropertyPipe, VertexEdgePipe

[ SIDEEFFECTS ]
AggregatorPipe, CountCombinePipe, CountPipe, KeyCombinePipe, SideEffectCapPipe

[ UTILITIES ]
DynamicStartsPipe, GatherPipe, PathPipe, PrintStreamPipe, ProductPipe, ScatterPipe, TypeCastPipe, Pipeline, ...
Pipes: Easy to Create New Pipes


public class NumCharsPipe extends AbstractPipe<String,Integer> {
  public Integer processNextStart() {
    String word = this.starts.next();
    return word.length();
  }
}


When extending the base class AbstractPipe<S,E> all that is required is
an implementation of processNextStart().
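
To show why a single method is enough, here is a self-contained sketch of how a base class might derive hasNext() and next() from processNextStart(). SketchPipe is a hypothetical stand-in for AbstractPipe, and the sketch assumes that exhaustion is signalled by a NoSuchElementException from the upstream iterator.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.NoSuchElementException;

class AbstractPipeSketch {

    // Hypothetical base class: subclasses only say how to produce the next result.
    abstract static class SketchPipe<S, E> implements Iterator<E> {
        protected Iterator<S> starts;
        private E nextEnd;
        private boolean available = false;

        void setStarts(Iterator<S> starts) {
            this.starts = starts;
        }

        // Pull from starts and return the next result, or let
        // NoSuchElementException propagate when the input is exhausted.
        protected abstract E processNextStart();

        @Override
        public boolean hasNext() {
            if (available) {
                return true;
            }
            try {
                nextEnd = processNextStart();
                available = true;
                return true;
            } catch (NoSuchElementException e) {
                return false;
            }
        }

        @Override
        public E next() {
            if (!hasNext()) {
                throw new NoSuchElementException();
            }
            available = false;
            return nextEnd;
        }
    }

    // Same logic as the NumCharsPipe on the slide.
    static class NumCharsPipe extends SketchPipe<String, Integer> {
        @Override
        protected Integer processNextStart() {
            return starts.next().length();
        }
    }

    // A filtering pipe keeps pulling until an element passes.
    static class EvenFilterPipe extends SketchPipe<Integer, Integer> {
        @Override
        protected Integer processNextStart() {
            while (true) {
                Integer n = starts.next();
                if (n % 2 == 0) {
                    return n;
                }
            }
        }
    }

    public static void main(String[] args) {
        NumCharsPipe numChars = new NumCharsPipe();
        numChars.setStarts(Arrays.asList("gremlin", "graphs", "blueprints", "pipes").iterator());
        EvenFilterPipe evens = new EvenFilterPipe();
        evens.setStarts(numChars); // a pipe is itself an iterator, so pipes chain
        while (evens.hasNext()) {
            System.out.println(evens.next());
        }
    }
}
```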
Pipes: Easy to Create New Pipes
Most of my projects are composed of lots of application specific Pipes. That is, Pipes that are specific to my domain model and yield useful jumps in the graph. For example, SameLikesPipe<Vertex,Vertex>.

From these domain specific Pipes, complex algorithms are created through the piecing together of those Pipes. For example, RecommenderPipe<Vertex,Map>.

[Figure: three layers labeled com.tinkerpop.pipes, domain specific, and complex traversal algorithms.]
Gremlin: A Graph-Based Programming Language



                                            Gremlin         G = (V, E)



• A graph traversal language that uses Groovy as its host language.

• Compiles Gremlin syntax down to Pipes (implements JSR 223).38



  38
     At the time of this presentation, Gremlin’s most recent stable release is 0.6, which is a standalone
language. To increase the flexibility of the language, 0.7-SNAPSHOT+ boasts the use of Groovy as the host
language.
Gremlin: Easily Compose Graph Related Pipes
Pipes is verbose...

Pipe<Vertex,Edge> pipe1 = new VertexEdgePipe(Step.OUT_EDGES);
Pipe<Edge,Edge> pipe2 = new LabelFilterPipe("knows",Filter.NOT_EQUAL);
Pipe<Edge,Vertex> pipe3 = new EdgeVertexPipe(Step.IN_VERTEX);
Pipe<Vertex,String> pipe4 = new PropertyPipe<String>("name");

Pipe<Vertex,String> pipeline = new Pipeline(pipe1,pipe2,pipe3,pipe4);
pipeline.setStarts(new SingleIterator<Vertex>(graph.getVertex("A")));

...relative to Gremlin.

          g.v('A').outE[[label:'knows']].inV.name
Gremlin: The Simple Example
                                inV

           outE                                    name
                                      B    name=peter
                     knows
g.v('A')

      A               knows           C    name=pavel

name=marko
                                 created
                      created
                                      D    name=gremlin



                  [[label:'knows']]
Gremlin: Defining a Step
“Who likes the same things that I like?”

Vertex.metaClass.same_likes =
  { _().outE[[label:'likes']].inV.inE[[label:'likes']].outV }


                              B     likes    E

                      likes         likes

               A              C     likes    F

                      likes         likes


                              D     likes    G
Gremlin: Defining a Step
gremlin> g.v('A').same_likes
==>v[E]
==>v[F]
==>v[F]
==>v[G]


Gremlin: Defining a Step
gremlin> m = g.v('A').same_likes.group_count >> 1
gremlin> m
==>v[E]=1
==>v[F]=2
==>v[G]=1

v[F] is most similar, in terms of likes, to v[A].39




 39
    For a thorough review of such traversal patterns, please see: Rodriguez, M.A., “Problem-
Solving using Graph Traversals: Searching, Scoring, Ranking, and Recommendation,” July 2010.
[http://slidesha.re/bOCy4Q]
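
The same ranking can be sketched in plain Java: walk from a start vertex to the things it likes, walk back to everyone else who likes those things, then count how often each vertex is reached. The map-based graph, the method names, and the edge set are assumptions chosen to reproduce the counts on the slide. Excluding the start vertex from its own results is also an assumption, made because the slide's output does not list v[A].

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the same_likes step followed by a group count.
class SameLikesSketch {

    // likes.get(x) lists the things that x likes (outgoing "likes" edges).
    static List<String> sameLikes(Map<String, List<String>> likes, String start) {
        List<String> reached = new ArrayList<>();
        for (String thing : likes.getOrDefault(start, Collections.emptyList())) {
            for (Map.Entry<String, List<String>> entry : likes.entrySet()) {
                boolean isOther = !entry.getKey().equals(start);
                if (isOther && entry.getValue().contains(thing)) {
                    reached.add(entry.getKey()); // one result per shared like
                }
            }
        }
        return reached;
    }

    // Count how many times each vertex was reached.
    static Map<String, Integer> groupCount(List<String> reached) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String vertex : reached) {
            counts.merge(vertex, 1, Integer::sum);
        }
        return counts;
    }

    // Assumed edge set: A likes B and D; E likes B; F likes B, C, and D; G likes D.
    static Map<String, List<String>> sampleLikes() {
        Map<String, List<String>> likes = new LinkedHashMap<>();
        likes.put("A", List.of("B", "D"));
        likes.put("E", List.of("B"));
        likes.put("F", List.of("B", "C", "D"));
        likes.put("G", List.of("D"));
        return likes;
    }

    public static void main(String[] args) {
        List<String> reached = sameLikes(sampleLikes(), "A");
        System.out.println(reached);             // prints [E, F, F, G]
        System.out.println(groupCount(reached)); // prints {E=1, F=2, G=1}
    }
}
```

The vertex with the highest count shares the most likes with the start vertex, which is the basis of the recommendation pattern.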
Rexster: A RESTful Graph Shell

                                reXster
• Allows Blueprints graphs to be exposed through a RESTful API (HTTP).

• All communication is via JSON.

• Supports stored traversals written in raw Pipes or Gremlin.

• Supports ad hoc traversals represented in Gremlin.

• Provides “helper classes” for performing search-, score-, and rank-based
  traversal algorithms—in concert, support for recommendation.
Rexster: URI Patterns
• http://localhost/graph/vertices: all the vertices in the graph.

• http://localhost/graph/vertices/1: vertex with id 1 in the graph.

• http://localhost/graph/vertices/1/outE: outgoing edges of vertex with id 1.

{ "results": {
     "_type":"vertex",
     "_id":"1",
     "name":"aaron",
     "type":"person"
  },
  "query_time":0.1537 }
Typical TinkerPop Graph Stack
       GET http://{host}/{resource}




       Neo4j    NativeStore   TinkerGraph
Conclusion

• Property graphs are convenient structures for modeling the real world.

• Graph databases provide index-free adjacency to ensure speedy
  traversal over graphs.

• The graph is such a general data structure that it can be used for
  numerous applications.

• TinkerPop provides a database-agnostic stack of technologies for
  working with property graphs.
Acknowledgements

• Research collaborators: Daniel Steinbock (Stanford), Jennifer H.
  Watkins (LANL), Alberto Pepe (Harvard), Joshua Shinavier (RPI), Johan
  Bollen (LANL), Herbert Van de Sompel (LANL).

• TinkerPop contributors: Pavel Yaskevich (Riptano), Stephen Mallette
  (Independent), Darrick Wiebe (Independent), Alex Averbuch (Swedish
  Institute of CS), Peter Neubauer (Neo4j).

• Others: Emil Eifrem (Neo4j), Luca Garulli (Orient Technologies), Aaron
  Patterson (AT&Ti).
http://tinkerpop.spreadshirt.com

Memoirs of a Graph Addict: Despair to Redemption

  • 1. Memoirs of a Graph Addict: Despair to Redemption Marko A. Rodriguez Graph Systems Architect http://markorodriguez.com http://twitter.com/twarko Winter Whirlwind Tour – Chicago to Malmö – January 10-14, 2011 January 8, 2011
  • 2. Abstract A graph database provides a means of linking together objects using direct references. In other words, in order to determine if one object is adjacent to another, no index lookup is required. In contrast to relational databases, in a graph database, there is no notion of a join operation as the graph is already an explicitly joined structure. Given a graph, problems are solved using graph traversals–that is, directed walks over the objects and relations that compose the graph. This lecture has three primary points of discussion. The first is a description of graph database technology. The second, a memoir of the speaker’s applied and theoretical work with graphs. The third and final point, a review of an open source graph processing stack currently being developed by AT&T Interactive and its collaborators.
  • 3.
  • 4.
  • 5.
  • 6. For 10 years now, I’ve dealt with a painful graph addiction... Let me share my story with you.
  • 7. Outline • Graph Structures • Graph Databases • Graph Applications • TinkerPop Product Suite
  • 8. Outline • Graph Structures • Graph Databases • Graph Applications • TinkerPop Product Suite
  • 9. Graph Data Structure Pieces: Part 1 id vertex (thing, object, dot) } element edge (relation, join, line)
  • 10. Single-Relational Graph [Figure: the vertices marko, peter, neotech, tinkerpop, neo4j, gremlin, and blueprints joined by unlabeled edges.] In single-relational graphs, things are related. Unfortunately, this is not a very useful structure for most domain modeling situations. Relatedness is too generic: all edges have the same meaning.
  • 11. Graph Data Structure Pieces: Part 2 id vertex (thing, object, dot) } element label edge (relation, join, line)
  • 12. Multi-Relational Graph [Figure: the same vertices, now joined by edges labeled knows, member, created, and imports.] By adding labels to the edges, it's possible to denote the type of relation that exists between any two vertices. Now it's possible to denote different types of things and the different ways in which they relate to one another.
  • 13. Graph Data Structure Pieces: Part 3 id vertex (thing, object, dot) } element label edge (relation, join, line) key=value property (key/value, attribute) key1=value1 key2=value2 property map
  • 14. Property Graph [Figure: the same labeled graph with key/value properties attached to its elements, such as date=2009, lang=java, use=graphdb, use=api, and use=traverse.] Elements are allowed to have key/value properties. This is particularly useful for further specifying the meaning of an edge. “When did TinkerPop create Gremlin?”
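The three pieces introduced so far (vertices, labeled edges, and key/value properties) can be sketched in a few lines of Python. This is only an illustration, not the Blueprints API: the class names, the `add_edge` and `when_created` helpers, and the choice to put `date=2009` on the tinkerpop-created-gremlin edge are assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)
class Vertex:
    """A vertex holds its own properties and direct references to its edges."""
    id: str
    properties: dict = field(default_factory=dict)
    out_edges: list = field(default_factory=list)
    in_edges: list = field(default_factory=list)


@dataclass(eq=False)
class Edge:
    """A labeled, directed edge that can also carry key/value properties."""
    label: str
    out_v: Vertex
    in_v: Vertex
    properties: dict = field(default_factory=dict)


def add_edge(out_v, label, in_v, **properties):
    """Create an edge and register it on both of its vertices (hypothetical helper)."""
    edge = Edge(label, out_v, in_v, dict(properties))
    out_v.out_edges.append(edge)
    in_v.in_edges.append(edge)
    return edge


# Assumed toy data, loosely following the slide's figure.
tinkerpop = Vertex("tinkerpop")
gremlin = Vertex("gremlin", {"lang": "java", "use": "traverse"})
add_edge(tinkerpop, "created", gremlin, date=2009)


def when_created(creator, thing):
    """Answer "When did X create Y?" by reading the edge property."""
    for edge in creator.out_edges:
        if edge.label == "created" and edge.in_v is thing:
            return edge.properties.get("date")
    return None


answer = when_created(tinkerpop, gremlin)
```

The question on the slide is answered from the edge's property map rather than from either vertex, which is the point of allowing properties on edges.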
  • 15. Numerous Graph Types [Figure: examples of graph types, including vertex-labeled, edge-labeled, vertex-attributed, edge-attributed, weighted, directed, undirected, simple, multi, pseudo, half-edge, hyper, semantic, and resource description framework graphs.] Rodriguez, M.A., Neubauer, P., “Constructions from Dots and Lines,” Bulletin of the American Society for Information Science and Technology, 36(6), pp. 35-41, 2010. [http://arxiv.org/abs/1006.2361]
  • 16. Property Graph as a Rich Structure [Figure: a property graph is turned into other graph types (weighted, labeled, semantic, directed, RDF, multi-, simple, and undirected graphs) by operations such as adding a weight attribute, removing attributes, removing edge labels, making labels URIs, removing directionality, and removing loops, directionality, and multiple edges; several of the arrows are no-ops.] A fun related thought: Rodriguez, M.A., “Mapping Semantic Networks to Undirected Networks,” International Journal of Applied Mathematics and Computer Sciences, 4(1), pp. 39–42, 2009. [http://arxiv.org/abs/0804.0277]
  • 17. Graph Algorithms in Single-Relational Graphs • Most graph algorithms are designed for single-relational graphs.1 Geodesic: shortest path, eccentricity, diameter, closeness centrality, betweenness centrality, etc. Eigenvector: spreading activation, pagerank, eigenvector centrality, etc. Assortative: scalar, assortative, etc. 1 Excellent book reviewing numerous graph algorithms: Brandes U., Erlebach, T., “Network Analysis: Methodological Foundations,” Springer, 2005.
  • 18. Graph Algorithms in Multi-Relational+ Graphs • Most real-world software systems require multi-relational+ graphs. E.g.: Who are the most central coauthors when all I know is wrote? coauthor coauthor wrote wrote wrote wrote wrote wrote • A key concept when evaluating graph algorithms over multi-relational+ graphs is implicit adjacency/path descriptions/virtual edges/etc.2 2 Rodriguez M.A., Shinavier, J., “Exposing Multi-Relational Networks to Single-Relational Network Analysis Algorithms,” Journal of Informetrics, 4(1), pp. 29–41, 2009. [http://arxiv.org/abs/0806.2274]
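As a rough illustration of implicit adjacency, the sketch below derives coauthor edges from nothing but wrote edges and then applies a single-relational measure (degree) to the derived graph. The author and paper names are invented for the example, and degree stands in for whichever centrality measure is actually wanted.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical multi-relational data: only "wrote" edges are stored.
wrote = [
    ("ann", "p1"), ("bob", "p1"),
    ("bob", "p2"), ("cat", "p2"),
    ("cat", "p3"), ("dan", "p3"),
]


def coauthor_edges(wrote_edges):
    """Derive the implicit coauthor relation: two authors of the same paper."""
    authors_of = defaultdict(set)
    for author, paper in wrote_edges:
        authors_of[paper].add(author)
    edges = set()
    for authors in authors_of.values():
        for a, b in combinations(sorted(authors), 2):
            edges.add((a, b))
    return edges


def degree(edges):
    """Degree centrality on the derived single-relational, undirected graph."""
    counts = defaultdict(int)
    for a, b in edges:
        counts[a] += 1
        counts[b] += 1
    return dict(counts)


coauthors = coauthor_edges(wrote)
centrality = degree(coauthors)
```

The coauthor edges never exist in storage; they are a path description (wrote, then wrote in reverse) evaluated on demand.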
  • 19. Outline • Graph Structures • Graph Databases • Graph Applications • TinkerPop Product Suite
  • 20. The Simplicity of a Graph • A graph is a simple data structure. • A graph states that something is related to something else (the foundation of any other data structure).3 • It is possible to model a graph in various types of databases.4 Relational database: MySQL, Oracle, PostgreSQL JSON document database: MongoDB, CouchDB XML document database: MarkLogic, eXist-db etc. 3 A graph can be used to represent other data structures. This point becomes convenient when looking beyond using graphs for typical, real-world domain models (e.g. friends, favorites, etc.), and seeing their applicability in other areas such as modeling code (e.g. http://arxiv.org/abs/0802.3492), indices, etc. 4 For the sake of diagram clarity, the examples to follow are with respect to a single-relational, directed graph. Note that it is possible to model multi-relational graphs in these types of database as well.
  • 21. Representing a Graph in a Relational Database
        outV | inV
        -----+----
        A    | B
        A    | C
        C    | D
        D    | A
  • 22. Representing a Graph in a JSON Database { "A" : { "outE" : ["B", "C"] }, "B" : { "outE" : [] }, "C" : { "outE" : ["D"] }, "D" : { "outE" : ["A"] } }
  • 23. Representing a Graph in an XML Database <graphml> <graph> <node id="A" /> <node id="B" /> <node id="C" /> <node id="D" /> <edge source="A" target="B" /> <edge source="A" target="C" /> <edge source="C" target="D" /> <edge source="D" target="A" /> </graph> </graphml>
  • 24. Defining a Graph Database “If any database can represent a graph, then what is a graph database?”
  • 25. Defining a Graph Database A graph database is any storage system that provides index-free adjacency.
  • 26. Defining a Graph Database by Example Toy Graph Gremlin (stuntman) B E A C D
  • 27. Graph Databases and Index-Free Adjacency [Figure: toy graph with vertices A, B, C, D, and E; the gremlin sits at A.] • Our gremlin is at vertex A. • In a graph database, vertex A has direct references to its adjacent vertices. • Constant time cost to move from A to B and C. The cost depends only on the number of edges emanating from vertex A (local).
  • 28. Graph Databases and Index-Free Adjacency B E A C D The Graph (explicit)
  • 29. Graph Databases and Index-Free Adjacency B E A C D The Graph (explicit)
  • 30. Non-Graph Databases and Index-Based Adjacency B E A B C A B,C E D,E D E C D • Our gremlin is at vertex A.
  • 31. Non-Graph Databases and Index-Based Adjacency [Figure: the toy graph alongside an adjacency index.] • In a non-graph database, the gremlin needs to look at an index to determine what is adjacent to A. • log(n) time cost to move to B and C. The cost depends on the total number of vertices and edges in the database (global).
  • 32. Non-Graph Databases and Index-Based Adjacency B E A B C A B,C E D,E D E C D The Index (explicit) The Graph (implicit)
  • 33. Non-Graph Databases and Index-Based Adjacency B E A B C A B,C E D,E D E C D The Index (explicit) The Graph (implicit)
  • 34. Index-Free Adjacency • While any database can implicitly represent a graph, only a graph database makes the graph structure explicit.5 • In a graph database, each vertex serves as a “mini index” of its adjacent elements.6 • Thus, as the graph grows in size, the cost of a local step remains the same.7 5 Please see http://markorodriguez.com/Blarko/Entries/2010/3/29_MySQL_vs._Neo4j_on_a_Large-Scale_Graph_Traversal.html for some performance characteristics of graph traversals in a relational database (MySQL) and a graph database (Neo4j). 6 Each vertex can be interpreted as a “parent node” in an index with its children being its adjacent elements. In this sense, traversing a graph is analogous in many ways to traversing an index, although the graph is not an acyclic connected graph (tree); this is a vision espoused by Craig Taverner. 7 A graph, in many ways, is like a distributed index.
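The contrast between the two kinds of adjacency can be sketched as follows, using the four-vertex graph from the relational, JSON, and XML slides. The sorted list searched with `bisect` is a stand-in for a database index, and the `Vertex` class is a stand-in for a graph database record; neither is meant as a description of how any particular product is implemented.

```python
import bisect

# The example graph used in the relational/JSON/XML slides.
edges = [("A", "B"), ("A", "C"), ("C", "D"), ("D", "A")]

# Index-based adjacency: one global, sorted edge table searched by binary
# search. Lookup cost grows with log(n) of the table size.
edge_table = sorted(edges)


def neighbors_by_index(vertex_id):
    position = bisect.bisect_left(edge_table, (vertex_id, ""))
    found = []
    while position < len(edge_table) and edge_table[position][0] == vertex_id:
        found.append(edge_table[position][1])
        position += 1
    return found


# Index-free adjacency: every vertex holds direct references to its
# neighbors, so a step costs only as much as that vertex has edges.
class Vertex:
    def __init__(self, name):
        self.name = name
        self.out = []


vertices = {name: Vertex(name) for name in "ABCD"}
for source, target in edges:
    vertices[source].out.append(vertices[target])


def neighbors_direct(vertex):
    return [neighbor.name for neighbor in vertex.out]


a = vertices["A"]  # the dict is only used to pick a starting vertex
```

Both functions return the same neighbors; the difference is that `neighbors_by_index` consults a structure whose size depends on the whole graph, while `neighbors_direct` only touches the vertex it is standing on.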
  • 35. Graph Query = Graph Traversal • Graph databases are optimized for graph-theoretic operations (e.g. graph traversals). • Graph databases are not optimized for set-theoretic operations (e.g. union, intersection, theta-join). • The graph traversal pattern:8 Given some root set of elements, traverse in X fashion to yield some side-effect and/or destination. 8 Rodriguez, M.A., Neubauer, P., “The Graph Traversal Pattern,” Graph Data Management: Techniques and Applications, eds. S. Sakr, E. Pardede, IGI Global, 2011. http://arxiv.org/abs/1004.1001
  • 36. Outline • Graph Structures • Graph Databases • Graph Applications • TinkerPop Product Suite
  • 37. Adventures in Graphlandia My graph disease first started in 2001 and it’s only progressed since... • Collective decision making: graph-based voting. • Eudaemonic engine: graph-based recommendation. • Universal computer: graph-based computing.
  • 38. Collective Decision Making: Fall of the Modern World The year is 2014.
  • 39. Oil production has dropped significantly. Any reserves that are left are too expensive to purchase. Nations cannot transport food.9 Regions with poor agriculture yield famine. 9 Peak oil available at http://en.wikipedia.org/wiki/Peak_oil.
  • 40. People are in shock, fear, and panic over the fall of the modern world. The world sees a 75% drop in human population.
  • 41. The technology and knowledge of the modern world still exists. The social infrastructure doesn’t....A few rise to create a new world order.10 10 Watkins, J.H., M.A. Rodriguez, “A Survey of Web-Based Collective Decision Making Systems,” Studies in Computational Intelligence: Evolution of the Web in Artificial Intelligence Environments, eds. R. Nayak, N. Ichalkaranje, and L.C. Jain, pp. 245-279, 2008. [http://escholarship.org/uc/item/04h3h1cr]
  • 42. Collective Decision Making: Rise of the Machines Four strong, brave men begin the journey to stability. Decisions need to be made regarding how to determine and execute social goals. The distributed collective of TinkerPop is created. • Marko Rodriguez (former USA) • Peter Neubauer (former Sweden) • Josh Shinavier (former China) • Pavel Yaskevich (former Belarus)
  • 43. Collective Decision Making: Rise of the Machines [Figure: marko, peter, josh, and pavel under two headings, Direct Democracy and Dynamically Distributed Democracy.] Two examples will be presented for the same decision making scenario: one using direct democracy as the aggregation algorithm and one using dynamically distributed democracy as the aggregation algorithm.11 11 Rodriguez, M.A., Watkins, J.H., “Revisiting the Age of Enlightenment from a Collective Decision Making Systems Perspective,” First Monday, 14(8), 2009. [http://arxiv.org/abs/0901.3929]
  • 44. Collective Decision Making: Direct Democracy • “What percentage of our crop yield should we store as reserves?” • The outcome is represented as a real value in [0, 1]. • Each individual has their opinion of the situation. Marko (80% should be stored.) Peter (50% should be stored.) Josh (80% should be stored.) Pavel (90% should be stored.)
  • 45. Collective Decision Making: Direct Democracy • In a direct democracy, everyone voices their opinion. • The average of all voiced opinions is the final decision (even in binary decisions). • For our society of 4, a pure direct democracy would yield (0.8 + 0.5 + 0.8 + 0.9)/4 = 0.75.
  • 46. Collective Decision Making: Direct Democracy • If an individual abstains from participation, then their opinion is not considered. • Assume only Peter and Pavel are there to participate. Marko and Josh are out hunting. • For our society of 4 (with 2 voters), a pure direct democracy would yield (0.5 + 0.9)/2 = 0.7. |0.75 − 0.7| = 0.05 error.
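The direct democracy arithmetic from the last three slides can be reproduced with a short sketch. The opinions are the ones stated on the slides; the function name and data layout are assumptions made for the example.

```python
# Opinions as stated on the slides: fraction of the crop yield to store.
opinions = {"marko": 0.8, "peter": 0.5, "josh": 0.8, "pavel": 0.9}


def direct_democracy(opinions, participants):
    """Average the opinions of those who actually vote; abstainers are ignored."""
    votes = [opinions[citizen] for citizen in participants]
    return sum(votes) / len(votes)


everyone = direct_democracy(opinions, opinions.keys())          # about 0.75
only_present = direct_democracy(opinions, ["peter", "pavel"])   # about 0.7
error = abs(everyone - only_present)                            # about 0.05
```

The error is the distance between what the full society would have decided and what the two remaining voters decide on their own.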
  • 47. Collective Decision Making: Representative Democracy • Thomas Paine stated that when populations are small “some convenient tree will afford them a State house”, but as the population increases it becomes a necessity for representatives to “act in the same manner as the whole body would act were they present.”12 13 12 Paine, T., “Common Sense,” 1776. 13 The role of the representative as an expert vs. a model is argued at length in Pitkin, H.F., “The Concept of Representation,” University of California Press, 1972.
  • 48. Collective Decision Making: DDD • Dynamically distributed democracy (DDD) strikes a balance between direct and representative democracy. • An individual is at least a representative of themselves. • An individual can also wield the power of those that abstain from participation. • Dynamically distributing representative power is the purpose of the algorithm.
  • 49. Collective Decision Making: DDD • Peter believes that Josh and Marko are good decision makers. • When Peter abstains, Marko and Josh wield his social power in equal parts (0.5). • Like a friendship graph, but the edges denote “trust.” “I believe that X has identical values to me and will behave as I do.” “I believe that X is more expert than I and should make decisions.”
  • 50. Collective Decision Making: DDD • Marko believes Josh is the key to humanity. • Josh prefers people closer to his eastern home of former China. • Pavel is of the former Soviet Union, and simply has no faith in anyone.
  • 51. Collective Decision Making: DDD [Figure: trust graph with weighted edges peter→marko 0.5, peter→josh 0.5, marko→josh 1.0, josh→peter 0.25, and josh→pavel 0.75; pavel has no outgoing edges.] This is the trust-based social graph. Individuals can add/remove outgoing edges from their vertex as they please. When decisions are required, the current snapshot of the graph is used to compute the collective decision.
  • 52. Collective Decision Making: DDD • In a dynamically distributed democracy, everyone can voice their opinion. • The weighted average of all voiced opinions is the final decision. • For our society of 4, a pure direct democracy would yield (0.8 + 0.5 + 0.8 + 0.9)/4 = 0.75. • When everyone participates, it’s a direct democracy.
  • 53. Collective Decision Making: DDD [Figure: every individual starts with a vote power of 1.0.] • Assume Marko and Josh go hunting, again. By abstaining, they diffuse their vote power over their outgoing edges. • By participating, Peter and Pavel aggregate vote power through their incoming edges. • This diffusion process continues until all power has aggregated at participating individuals.
  • 54. Collective Decision Making: DDD [Figure: after one diffusion step, Peter holds 1.25, Josh 1.0, and Pavel 1.75.] • Note that Marko fully trusts Josh’s decision making abilities. • However, given that Josh is not participating, Marko is implicitly stating that he trusts Josh’s decision in choosing decision makers. • Thus, Josh serves to route Marko’s power.
  • 55. Collective Decision Making: DDD [Figure: Peter ends with a vote power of 1.5 and Pavel with 2.5.] • In the end, Peter and Pavel have aggregated all the energy in the graph (albeit, to different degrees). • Now a weighted direct democracy is used to calculate the collective decision. • The collective vote is ((1.5·0.5)+(2.5·0.9))/4 = 0.75. |0.75 − 0.75| = 0.0 error.
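The diffusion walked through on the DDD slides can be sketched as a small simulation. This is one possible reading of the procedure, not the published algorithm: the trust weights are read off the slides' figures, each abstainer's outgoing weights are assumed to sum to 1, and the `max_rounds` cap and the handling of abstainers without outgoing edges (their power simply stays put) are choices made for the example.

```python
opinions = {"marko": 0.8, "peter": 0.5, "josh": 0.8, "pavel": 0.9}

# Trust edges (truster -> {trusted: weight}) as read from the slides' figure.
trust = {
    "peter": {"marko": 0.5, "josh": 0.5},
    "marko": {"josh": 1.0},
    "josh": {"peter": 0.25, "pavel": 0.75},
    "pavel": {},
}


def dynamically_distributed_democracy(opinions, trust, participants, max_rounds=100):
    """Diffuse abstainers' vote power along trust edges, then take a weighted vote."""
    power = {citizen: 1.0 for citizen in opinions}
    for _ in range(max_rounds):  # cap guards against cycles of abstainers
        next_power = dict.fromkeys(power, 0.0)
        moved = False
        for citizen, held in power.items():
            edges = trust.get(citizen, {})
            if citizen in participants or not edges or held == 0.0:
                next_power[citizen] += held  # power stays where it is
                continue
            for trusted, weight in edges.items():
                next_power[trusted] += held * weight
            moved = True
        power = next_power
        if not moved:
            break
    total = sum(power[c] for c in participants)
    decision = sum(power[c] * opinions[c] for c in participants) / total
    return decision, power


decision, power = dynamically_distributed_democracy(
    opinions, trust, participants={"peter", "pavel"}
)
```

With Marko and Josh abstaining, the first round leaves Peter with 1.25 and Pavel with 1.75, and the second round ends with 1.5 and 2.5, matching the intermediate and final figures on the slides.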
  • 56. Collective Decision Making: DDD [Figure: two plots over the percentage of active citizens (100 down to 0), showing the proportion of correct decisions and the error for direct democracy and for dynamically distributed democracy. The plots come from a simulation run with 1000 artificially generated networks composed of 100 citizens each, where xi ∈ [0, 1] is the political tendency of citizen i and the weight of an existing link between citizens i and j is 1 − |xi − xj| (0 if no link exists).] • As participation wanes, dynamically distributed democracy is able to simulate direct democracy.14 14 Rodriguez, M.A., Steinbock, D.J., “A Social Network for Societal-Scale Decision-Making Systems,” Proceedings of the Computational Social and Organizational Science Conference, 2004. [http://arxiv.org/abs/cs/0412047]
  • 57. Collective Decision Making: Techno-Government • In this model of decision making, there is no governmental body. • Power is determined when a decision is needed. • How are bills created? Wikilegislature?15 • What about different types of trust (e.g. “Marko trusts Josh in engineering decisions only.”) — Hint: Multi-relational+ graphs. Tagging legislature and tagging trust.16 15 Turoff, M., Roxanne-Hiltz, S., Bieber, M., Rana, A., “Collaborative Discourse Structures in Computer Mediated Group Communications”, Hawaii International Conference on Systems Science (HICSS), 1998. [http://web.njit.edu/~turoff/Papers/CDSCMC/CDSCMC.htm] 16 Rodriguez, M.A., “Social Decision Making with Multi-Relational Networks and Grammar-Based Particle Swarms,” Hawaii International Conference on Systems Science (HICSS), pp. 39–49, 2007. [http://arxiv.org/abs/cs/0609034]
  • 58. “The founders of modern democracies provided a moral heritage that remains highly regarded in societies today. However, it should be remembered that it is the ideals that are valuable, not the specific implementation of the systems that protect and support them. If there is another implementation of government that better realizes these ideals, then, by the rights of man, it must be enacted.”17 – Michael Scott 17 Rodriguez, M.A., Watkins, J.H., “Revisiting the Age of Enlightenment from a Collective Decision Making Systems Perspective,” First Monday, 14(8), University of Illinois at Chicago Library, 2009. [http://arxiv.org/abs/0901.3929]
  • 59. Eudaemonic Engine: Seeking Virtue through Circuitry The year is 2018.
  • 60. Human life on earth has stabilized.
  • 61. Humans no longer struggle to survive. They struggle for eudaemonia. They seek the “good daemon” within...
  • 62. Eudaemonic Engine: Aristotle • Being virtuous is repeatedly choosing correctly. • Habitual correct behavior leads to eudaemonia – complete engagement in the world (a complete sense of engagement/acceptance).18 19 • Can systems aid individuals in choosing correctly – in all aspects of life? [Portraits: Aristotle, David L. Norton] 18 Aristotle, “Nicomachean Ethics”, 350 B.C. 19 Mihaly Csikszentmihalyi, “Flow: The Psychology of Optimal Experience”, Harper Perennial, 1990.
  • 63. Eudaemonic Engine: Resource Modeling “But if the development of character is the moral objective, it is obvious that [...] the choices of vocation and avocations to pursue, of friends to cultivate, of books to read are moral for they clearly influence such development.”20 • Web services are continuing to build richer models of humans, resources, and the relationships between them. • There exists an increasing reliance on such services to aid in decision making: correct books (Amazon.com), correct movies (NetFlix.com), correct music (Pandora), correct occupation (Monster.com), correct friends (PointsCommuns.com), correct life partner (Match.com), etc.21 20 David L. Norton, “Democracy and Moral Development: A Politics of Virtue”, University of California Press, 1991. 21 Rodriguez, M.A., Watkins, J., “Faith in the Algorithm, Part 2: Computational Eudaemonics,” Proceedings of the International Conference on Knowledge-Based and Intelligent Information & Engineering Systems, 5712, pp. 813–820, 2009. [http://arxiv.org/abs/0904.0027]
  • 64. Eudaemonic Engine: Mapping Person to Resource movie watch article read time person listen music meet friend eat food Map an individual to actions on resources. However, how do we model/expose the resources of the world?
  • 65. Model
  • 66. Eudaemonic Engine: The Web of Data [Figure: the Web of Data cloud, a graph of interlinked data sets including dbpedia, freebase, geonames, musicbrainz, uniprot, pubmed, drugbank, and others.]
  • 67. Eudaemonic Engine: URIs of the Web of Data http://dbpedia.org/resource/The Fountainhead FLICKR http://www4.wiwiss.fu-berlin.de/flickrwrappr/photos/Ayn_Rand foaf:depiction flickr:Ayn_Rand dbpprop:hasPhotoCollection dbpedia:Ayn_Rand DBPEDIA dbpedia:Book dbpedia:author dbpedia:Fountain_Head rdf:type
  • 68. Eudaemonic Engine: Datasets on the Web of Data
        data set          domain      | data set          domain      | data set            domain
        audioscrobbler    music       | govtrack          government  | pubguide            books
        bbclatertotp      music       | homologene        biology     | qdos                social
        bbcplaycountdata  music       | ibm               computer    | rae2001             computer
        bbcprogrammes     media       | ieee              computer    | rdfbookmashup       books
        budapestbme       computer    | interpro          biology     | rdfohloh            social
        chebi             biology     | jamendo           music       | resex               computer
        crunchbase        business    | laascnrs          computer    | riese               government
        dailymed          medical     | libris            books       | semanticweborg      computer
        dblpberlin        computer    | lingvoj           reference   | semwebcentral       social
        dblphannover      computer    | linkedct          medical     | siocsites           social
        dblprkbexplorer   computer    | linkedmdb         movie       | surgeradio          music
        dbpedia           general     | magnatune         music       | swconferencecorpus  computer
        doapspace         social      | musicbrainz       music       | taxonomy            reference
        drugbank          medical     | myspacewrapper    social      | umbel               general
        eurecom           computer    | opencalais        reference   | uniref              biology
        eurostat          government  | opencyc           general     | unists              biology
        flickrexporter    images      | openguides        reference   | uscensusdata        government
        flickrwrappr      images      | pdb               biology     | virtuososponger     reference
        foafprofiles      social      | pfam              biology     | w3cwordnet          reference
        freebase          general     | pisa              computer    | wikicompany         business
        geneid            biology     | prodom            biology     | worldfactbook       government
        geneontology      biology     | projectgutenberg  books       | yago                general
        geonames          geographic  | prosite           biology     | ...
  • 69. Eudaemonic Engine: Transforms Development A new application development paradigm emerges. No longer do data and application providers need to be the same entity (left). With the Web of Data, it's possible for developers to write applications that utilize data that they do not maintain (right).22 [Figure: left, Applications 1-3 each run processes over their own structures on 127.0.0.1, 127.0.0.2, and 127.0.0.3; right, the same applications run processes over a shared Web of Data whose structures are hosted on 127.0.0.1, 127.0.0.2, and 127.0.0.3.] 22 Rodriguez, M.A., “A Reflection on the Structure and Process of the Web of Data,” Bulletin of the American Society for Information Science and Technology, 35(6), pp. 38–43, 2009. [http://arxiv.org/abs/0908.0373]
  • 70. Now that there is a rich structure, what is the process?
  • 72. Eudaemonic Engine: Diffusion Processes on Graphs A graph diffusion process will be used to determine the solution to one’s problems. • Graph traversing can be seen as a diffusion process over a graph. • “Energy” moves over a graph and reverberates in regions where there is recurrence (i.e. cycles). • At some t in the future, the vertices with the greatest flow are the solution to the problem.
  • 73. Eudaemonic Engine: Diffusion Processes on Graphs
  • 74. Eudaemonic Engine: Diffusion Processes on Graphs
  • 75. Eudaemonic Engine: Diffusion Processes on Graphs
  • 76. Eudaemonic Engine: Diffusion Processes on Graphs
  • 77. Eudaemonic Engine: Diffusion Processes on Graphs
  • 78. Implementing a diffusion process is easy when the edges of the graph are unlabeled.
        flow = new HashMap<Vertex,Integer>(); current = [startVertex]; steps = 10
        for (int i = 0; i < steps; i++) {
          // step to every adjacent vertex; flatten the per-vertex lists into one list
          current = current.collect{ it.getAdjacentVertices() }.flatten()
          // count each arrival; a vertex not seen before starts at 0
          current.each{ flow[it] = (flow[it] ?: 0) + 1 }
        }
  • 79. Eudaemonic Engine: Diffusion on a Property Graph? likes emil likes linked 24 process knows True Blood likes wrote wrote likes likes jen knows marko knows peter occupation occupation likes likes wrote occupation intelligence The Wire gremlin tagged graphs With different types of things being related by different types of relations, you need to specify legal paths for the energy to flow over.
  • 80. Eudaemonic Engine: Diffusion on a Property Graph • Problem statement = Start vertices + path expression. • Problem solution = Highest energy vertices at t.23 24 25 23 Examples presented next are basic due to the simplicity of the toy graph example used. In such cases, queries as opposed to energy diffusions are best. In general, the purpose of an energy diffusion is to expose recurrence/feedback in the graph. For the more technically inclined, think of it as determining the eigenvector of the graph defined by the path expression. 24 Rodriguez, M.A., “Grammar-Based Random Walkers in Semantic Networks,” Knowledge-Based Systems, 21(7), pp. 727–739, 2008. [http://arxiv.org/abs/0803.4355] 25 Rodriguez, M.A., Neubauer, P., “A Path Algebra for Multi-Relational Graphs,” 2nd International Workshop on Graph Data Management (GDM11), 2010. [http://arxiv.org/abs/1011.0390]
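A minimal sketch of "start vertices + path expression" follows. The path expression is modeled as a list of (label, direction) steps, and energy is counted as the number of legal paths arriving at a vertex. The edge list is invented for the example (it borrows names from the toy graph but is not a faithful copy of the figure), and the `diffuse` function is hypothetical rather than part of Gremlin.

```python
from collections import Counter

# Hypothetical labeled edges: (out vertex, label, in vertex).
edges = [
    ("marko", "likes", "The Wire"),
    ("jen", "likes", "The Wire"),
    ("jen", "likes", "True Blood"),
    ("peter", "likes", "The Wire"),
    ("peter", "likes", "True Blood"),
    ("peter", "likes", "24"),
    ("emil", "likes", "24"),
    ("marko", "knows", "peter"),
]


def diffuse(edges, start, path):
    """Push energy from the start vertices along the legal path only.

    Each step is (label, direction); "out" follows an edge from tail to
    head and "in" follows it backwards. Energy at a vertex equals the
    number of legal paths that reach it.
    """
    energy = Counter({vertex: 1 for vertex in start})
    for label, direction in path:
        next_energy = Counter()
        for out_v, edge_label, in_v in edges:
            if edge_label != label:
                continue
            source, target = (out_v, in_v) if direction == "out" else (in_v, out_v)
            if energy[source]:
                next_energy[target] += energy[source]
        energy = next_energy
    return energy


# "Who likes what I like, and what do they like?"
path = [("likes", "out"), ("likes", "in"), ("likes", "out")]
energy = diffuse(edges, start=["marko"], path=path)
ranked = energy.most_common()
```

In this toy data, The Wire collects the most energy because every path leads back to it; filtering out what the start vertex already likes would leave True Blood as the top suggestion. The knows edge carries no energy because the path expression never names it.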
  • 81. Eudaemonic Engine: Friend Recommendation likes emil likes linked 24 process knows True Blood likes wrote wrote likes likes jen knows marko knows peter occupation occupation likes likes wrote occupation intelligence The Wire gremlin tagged graphs “Who are my friends’ friends that are not me or my friends?”26 26 marko.outE[[label:'knows']].inV.aggregate(x).outE.inV{!x.contains(it)}
  • 82. Eudaemonic Engine: Product Recommendation likes emil likes linked 24 process knows True Blood likes wrote wrote likes likes jen knows marko knows peter occupation occupation likes likes wrote occupation intelligence The Wire gremlin tagged graphs “Who likes what I like? Of those things they like, what else do they like that I don’t already like?”27 27 marko.outE[[label:'likes']].inV.aggregate(x).inE[[label:'likes']].outV.outE[[label:'likes']].inV{!x.contains(it)}
  • 83. Eudaemonic Engine: Product Recommendation 2 likes emil likes linked 24 process knows True Blood likes wrote wrote likes likes jen knows marko knows peter occupation occupation likes likes wrote occupation intelligence The Wire gremlin tagged graphs “Who likes what I like and what do they like? What do the people I know like? Of those things liked, what do I not already like?”
  • 84. Eudaemonic Engine: Recommendation • Different paths through a domain model expose different types of recommendations. • Individual path preferences allow for an ecosystem of traversals (different problems can be solved over the same domain model).28 29 30 28 Rodriguez, M.A., Allen, D.W., Shinavier, J., Ebersole, G., “A Recommender System to Support the Scholarly Communication Process,” 2009. [http://arxiv.org/abs/0905.1594] 29 Rodriguez, M.A., “Problem-Solving using Graph Traversals: Searching, Scoring, Ranking, and Recommendation,” Technical Talk Seminar, AT&T Interactive, 2010. [http://slidesha.re/bOCy4Q] 30 Traversal Patterns with Gremlin available at https://github.com/tinkerpop/gremlin/wiki/ Traversal-Patterns.
  • 85. Universal Computer: A Single Computational Substrate The year is 2023.
  • 86. Life is good. Humans flourish. Virtuous men’s minds are filled with wonderfully creative ideas. Inventions proliferate.
  • 87. Advances in computer network technology yield a new model of computing. Computer networks are no longer the bottleneck for speed. Accessing local and remote data is no longer considered “different.” The distinction between RAM, disk drive, and Web disappears.
  • 88. Universal Computer: A Computational Substrate On the Web... • Represent data. • Represent code. • Represent virtual machines.
  • 89. Universal Computer: Represent Data • URIs form an infinite universal address space. • A URI can denote a datum. http://markorodriguez.com#self (Marko) http://sws.geonames.org/4887398/about.rdf (Chicago) http://data.nytimes.com/N38395718310308503251 (Malmö) • RDF (Resource Description Framework) is a data model for linking URIs into a multi-relational graph.
  • 90. Universal Computer: Represent Data [Figure: RDF graph spread over servers 127.0.0.1 and 127.0.0.2, in which atti:marko is linked to nm:puppy by atti:bestFriend and both carry atti:numberOfLegs and atti:hasFur properties; literals shown: "2"^^xsd:integer, "false"^^xsd:boolean, "4"^^xsd:integer, "true"^^xsd:boolean] • The concept of atti:marko and the properties atti:numberOfLegs, atti:hasFur, and atti:bestFriend are maintained by the AT&Ti graph server. • The concept of nm:puppy is maintained by a New Mexico graph server. • The data types xsd:integer and xsd:boolean are maintained by an XML standards organization.
  • 91. Universal Computer: Represent Code • Computing is a series of instructions: add, write, branch, goto... • The URI address space and RDF glue can be seen as a computational medium.31 [Figure: blank node _:123 with rdf:type atti:Add and atti:left-op and atti:right-op edges to the literals "3"^^xsd:int and "7"^^xsd:int; atti:Add is linked to atti:Instruction by rdf:subClassOf] 31 Rodriguez, M.A., “General-Purpose Computing on a Semantic Network Substrate,” Emergent Web Intelligence: Advanced Semantic Technologies, eds. R. Chbeir, A. Hassanien, A. Abraham, and Y. Badr, pp. 57–104, 2010. [http://arxiv.org/abs/0704.3395]
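To make the idea of an instruction stored as graph data concrete, here is a minimal Java sketch that holds the add instruction from this slide as subject-predicate-object triples and evaluates it by looking up its operands. The class TripleAdd, its Triple type, and the evaluate method are hypothetical names for illustration; this is not the RDF virtual machine described in the cited work, and the literals are stored as plain strings with their xsd:int datatype dropped.

```java
import java.util.List;
import java.util.NoSuchElementException;

public class TripleAdd {
    // A bare subject-predicate-object statement (hypothetical stand-in for an RDF triple).
    public static final class Triple {
        public final String s, p, o;
        public Triple(String s, String p, String o) {
            this.s = s;
            this.p = p;
            this.o = o;
        }
    }

    // Returns the object of the first triple matching the given subject and predicate.
    static String object(List<Triple> graph, String subject, String predicate) {
        for (Triple t : graph) {
            if (t.s.equals(subject) && t.p.equals(predicate)) {
                return t.o;
            }
        }
        throw new NoSuchElementException(subject + " " + predicate);
    }

    // Evaluates an instruction node; only atti:Add is understood in this sketch.
    public static int evaluate(List<Triple> graph, String instruction) {
        String type = object(graph, instruction, "rdf:type");
        if (!type.equals("atti:Add")) {
            throw new IllegalArgumentException("unsupported instruction type: " + type);
        }
        int left = Integer.parseInt(object(graph, instruction, "atti:left-op"));
        int right = Integer.parseInt(object(graph, instruction, "atti:right-op"));
        return left + right;
    }

    public static void main(String[] args) {
        List<Triple> graph = List.of(
            new Triple("_:123", "rdf:type", "atti:Add"),
            new Triple("_:123", "atti:left-op", "3"),
            new Triple("_:123", "atti:right-op", "7"));
        System.out.println(evaluate(graph, "_:123")); // prints 10
    }
}
```

Because the instruction is just data in the graph, the evaluator needs nothing but the identifier of the instruction node to find the operands.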
  • 92. Universal Computer: Represent Code [Figure: RDF graph with nodes atti:marko, nm:puppy, a Method node atti:pet, blank nodes _:1234, _:2345, and _:3456, and the literals "false"^^xsd:boolean and "animal"^^xsd:string, joined by atti:bestFriend, atti:hasMethod, atti:isHappy, atti:args, atti:block, atti:inst, and rdf:1 edges; the instruction block is annotated "// make animal happy"] Represent methods and their instructions attached to objects/classes.
  • 93. Universal Computer: Represent Virtual Machines [Figure: the previous graph extended with a Virtual Machine node _:6789 of rdf:type atti:VM and an atti:pc edge; other labels shown: atti:marko, atti:bestFriend, nm:puppy, atti:hasMethod, atti:isHappy, atti:pet, atti:block, atti:inst, _:2345, _:3456, write, "false"^^xsd:boolean, "true"^^xsd:boolean] Represent not only code, but the machines that execute it.
  • 94. Universal Computer: Represent Virtual Machines [Figure: class diagram of the RDF virtual machine (RVM) Fhat, with classes such as Frame, Block, Instruction, and Variable, stack structures built from rdf:first and rdf:rest, and properties including methodReuse, halt, programLocation, operandTop, returnTop, currentFrame, hasFrame, blockTop, forFrame, fromBlock, hasSymbol, and hasValue] NenoFhat Project (circa 2006): http://neno.lanl.gov.
  • 95. [Figure: layered diagram. Global Data Structure: data, machine architecture, API, program, virtual machine state. Virtual Machine Processes: read/write access to the global data structure. Physical Machines: 127.0.0.1, 127.0.0.2, 127.0.0.3, 127.0.0.4. Physics. My Belief in Reality.]
  • 96. Universal Computer: A Ramification • Data, APIs, code, machine architectures, and virtual machines are within the same global URI address space. Code can be physically distributed across computers. For example, an add instruction on 127.0.0.1 references a branch instruction on 127.0.0.2. Hardware machines can be added or removed without altering the state of computation, only the speed. There is no developer concept of RAM-based memory addresses; the only address space is the space of all URIs.
  • 97. Universal Computer: Another Ramification • Reflection down to the machine level.32 Most languages support the manipulation of code at runtime. In this model, the virtual machine can be altered at runtime. Code can rewrite the virtual machine that is evaluating the code. (i.e. create lots of bugs.) 32 Rodriguez, M.A., The RDF Virtual Machine, LA-UR-08-03925, in review, 2009. [http://arxiv.org/abs/0802.3492]
  • 98. The year is 2030.
  • 99. Man learns to encode himself into the URI address space...33 34 33 Egan, G., “Permutation City,” Eos Publisher, 1995. 34 Rodriguez, M.A., “From the Signal to the Symbol: Structure and Process in Artificial Intelligence,” Center for Nonlinear Studies Post Doctorate Seminar, Los Alamos National Laboratory, Los Alamos, New Mexico, 2008. [http://slidesha.re/hdqRn2]
  • 100. Outline • Graph Structures • Graph Databases • Graph Applications • TinkerPop Product Suite
  • 101. This is the TinkerPop...
  • 102. TinkerPop Productions • Blueprints: Data Models and their Implementations [http://blueprints.tinkerpop.com] • Pipes: A Data Flow Framework using Process Graphs [http://pipes.tinkerpop.com] • Gremlin: A Graph-Based Programming Language [http://gremlin.tinkerpop.com] • Rexster: A RESTful Graph Shell [http://rexster.tinkerpop.com]35 35 Please see http://engineering.attinteractive.com/2010/12/a-graph-processing-stack/ for a short review of these products. Also TinkerPop’s homepage at: http://tinkerpop.com
  • 103. Blueprints: A Property Graph Model Interface • Blueprints is like the JDBC of the graph database community. • Provides a Java-based interface API for the property graph data model: Graph, Vertex, Edge, Index. • Connectors to TinkerGraph, Neo4j, OrientDB, Sails (e.g. AllegroGraph, HyperSail, etc.), and soon InfiniteGraph. In the future, we hope to support InfoGrid, Sones, DEX, and HyperGraphDB.36 36 HyperGraphDB makes use of an n-ary graph structure known as a hypergraph. Blueprints, in its current form, only supports the more common binary graph.
  • 104. Creating a Neo4jGraph in Blueprints
      // create a graph
      Graph graph = new Neo4jGraph("/tmp/neo4j");
      // add two vertices
      Vertex a = graph.addVertex(null);
      a.setProperty("name","marko");
      Vertex b = graph.addVertex(null);
      b.setProperty("name","peter");
      // join the two vertices by a knows relation
      Edge e = graph.addEdge(null,a,b,"knows");
      e.setProperty("since","2007");
    [Figure: vertex 0 (name=marko) joined to vertex 1 (name=peter) by a knows edge with since=2007]
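For readers without a Neo4j installation, the property graph model behind this slide can be sketched as a tiny in-memory structure. Everything below is an assumption made for illustration: TinyGraph and its nested Vertex and Edge classes are hypothetical and are not the Blueprints interfaces. The point is only that each vertex holds direct references to its outgoing edges, so moving to an adjacent vertex needs no index lookup.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class TinyGraph {
    public static final class Vertex {
        public final int id;
        public final Map<String, Object> props = new HashMap<>();
        public final List<Edge> out = new ArrayList<>(); // direct references to outgoing edges
        Vertex(int id) {
            this.id = id;
        }
    }

    public static final class Edge {
        public final Vertex outV; // tail of the edge
        public final Vertex inV;  // head of the edge
        public final String label;
        public final Map<String, Object> props = new HashMap<>();
        Edge(Vertex outV, Vertex inV, String label) {
            this.outV = outV;
            this.inV = inV;
            this.label = label;
        }
    }

    private final List<Vertex> vertices = new ArrayList<>();

    // Creates a vertex whose id is its insertion position (0, 1, 2, ...).
    public Vertex addVertex() {
        Vertex v = new Vertex(vertices.size());
        vertices.add(v);
        return v;
    }

    // Creates a labeled edge and registers it on the tail vertex.
    public Edge addEdge(Vertex outV, Vertex inV, String label) {
        Edge e = new Edge(outV, inV, label);
        outV.out.add(e);
        return e;
    }

    public static void main(String[] args) {
        TinyGraph graph = new TinyGraph();
        Vertex a = graph.addVertex();
        a.props.put("name", "marko");
        Vertex b = graph.addVertex();
        b.props.put("name", "peter");
        Edge e = graph.addEdge(a, b, "knows");
        e.props.put("since", "2007");
        // Follow the reference from marko to the vertex he knows.
        System.out.println(a.out.get(0).inV.props.get("name")); // prints peter
    }
}
```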
  • 105. Handy Features of Blueprints • Supports automatic transactions: graph.setTransactionMode(AUTOMATIC -or- MANUAL). In automatic mode, every manipulation of the graph is wrapped in a transaction and committed. • Supports automatic indices: graph.createIndex(AUTOMATIC -or- MANUAL). In automatic mode, elements are added to or removed from an index as their properties are manipulated. • Utility suite: Blueprints Sail turns a graph database into a traversal-based RDF store; GraphML reader/writer library.
  • 106. Pipes: A Data Flow Framework using Process Graphs • Lazy data flow with support for Blueprints-based graph processing. • Provides a collection of “pipes” (each implements Iterable and Iterator) that are connected together to form processing pipelines. Filters: ComparisonFilterPipe, RandomFilterPipe, etc. Traversal: VertexEdgePipe, EdgeVertexPipe, PropertyPipe, etc. Splitting/Merging: CopySplitPipe, RobinMergePipe, etc. Logic: OrFilterPipe, AndFilterPipe, etc.
  • 107. Pipes: Chained Iterators This pipeline takes objects of type A and turns them into objects of type D through a sequence of processing pipes...37 [Figure: a Pipeline in which objects of type A flow through Pipe1 (A to B), Pipe2 (B to C), and Pipe3 (C to D) and emerge as objects of type D] Pipe<A,D> pipeline = new Pipeline<A,D>(Pipe1<A,B>, Pipe2<B,C>, Pipe3<C,D>) 37 Though not discussed, splitting and merging are allowed as well (branching pipelines).
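The chained-iterator idea can be sketched with nothing but java.util.Iterator. The class MiniPipes and its pipe helper are hypothetical names, not the Pipes API; each stage wraps the previous iterator and transforms an element only when the consumer asks for the next one, which is what makes the data flow lazy.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Function;

public class MiniPipes {
    // Wraps an iterator so that each element is transformed on demand.
    public static <S, E> Iterator<E> pipe(Iterator<S> starts, Function<S, E> step) {
        return new Iterator<E>() {
            @Override
            public boolean hasNext() {
                return starts.hasNext();
            }

            @Override
            public E next() {
                return step.apply(starts.next());
            }
        };
    }

    // Three chained stages: trim the word, take its length, then square the length.
    public static List<Integer> run(List<String> words) {
        Iterator<String> trimmed = pipe(words.iterator(), String::trim);
        Iterator<Integer> lengths = pipe(trimmed, String::length);
        Iterator<Integer> squares = pipe(lengths, n -> n * n);
        List<Integer> results = new ArrayList<>();
        while (squares.hasNext()) {
            results.add(squares.next()); // each call pulls one element through all three stages
        }
        return results;
    }

    public static void main(String[] args) {
        System.out.println(run(List.of(" ab ", "cde"))); // prints [4, 9]
    }
}
```

Nothing is computed until the final loop pulls on the last iterator, so a consumer that stops early never pays for the remaining elements.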
  • 108. Pipes: A Simple Example “What are the names of the people that marko knows?” [Figure: example graph in which vertex A (name=marko) has knows edges to B (name=peter) and C (name=pavel), and two created edges lead to D (name=gremlin)]
  • 109. Pipes: A Simple Example
      Pipe<Vertex,Edge> pipe1 = new VertexEdgePipe(Step.OUT_EDGES);
      // drops every edge whose label is not "knows"
      Pipe<Edge,Edge> pipe2 = new LabelFilterPipe("knows",Filter.NOT_EQUAL);
      Pipe<Edge,Vertex> pipe3 = new EdgeVertexPipe(Step.IN_VERTEX);
      Pipe<Vertex,String> pipe4 = new PropertyPipe<String>("name");
      Pipe<Vertex,String> pipeline = new Pipeline(pipe1,pipe2,pipe3,pipe4);
      pipeline.setStarts(new SingleIterator<Vertex>(graph.getVertex("A")));
    [Figure: the example graph from slide 108]
  • 110. Pipes: A Simple Example
      for(String name : pipeline) {
        System.out.println(name);
      }
    Output:
      peter
      pavel
    [Figure: the example graph from slide 108]
  • 111–114. Pipes: A Simple Example [Figure, repeated on slides 111 to 114: the example graph from slide 108 annotated with the pipeline stages VertexEdgePipe(OUT_EDGES), LabelFilterPipe("knows"), EdgeVertexPipe(IN_VERTEX), and PropertyPipe("name")]
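The four stages of the example above can also be traced in library-free Java. This is a sketch under assumptions: KnowsNames and its Edge class are hypothetical, the graph is held as a plain edge list rather than a Blueprints graph, and only one of the two created edges from the figure is included because the source vertex of the other is not recoverable from the transcript.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class KnowsNames {
    // A directed, labeled edge between two vertex ids (hypothetical layout).
    public static final class Edge {
        public final String out, label, in;
        public Edge(String out, String label, String in) {
            this.out = out;
            this.label = label;
            this.in = in;
        }
    }

    public static List<String> namesKnownBy(String start, List<Edge> edges, Map<String, String> names) {
        List<String> result = new ArrayList<>();
        for (Edge e : edges) {
            if (!e.out.equals(start)) {
                continue; // stage 1, VertexEdgePipe(OUT_EDGES): only edges leaving the start vertex
            }
            if (!e.label.equals("knows")) {
                continue; // stage 2, LabelFilterPipe("knows"): drop edges with any other label
            }
            String head = e.in;          // stage 3, EdgeVertexPipe(IN_VERTEX): move to the edge's head
            result.add(names.get(head)); // stage 4, PropertyPipe("name"): read the name property
        }
        return result;
    }

    public static void main(String[] args) {
        List<Edge> edges = List.of(
            new Edge("A", "knows", "B"),
            new Edge("A", "knows", "C"),
            new Edge("A", "created", "D"));
        Map<String, String> names = Map.of("A", "marko", "B", "peter", "C", "pavel", "D", "gremlin");
        System.out.println(namesKnownBy("A", edges, names)); // prints [peter, pavel]
    }
}
```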
  • 115. Pipes: Library of Generally Useful Pipes
    [ MERGES ] ExhaustiveMergePipe, RobinMergePipe
    [ SPLITS ] CopySplitPipe, RobinSplitPipe
    [ GRAPHS ] EdgeVertexPipe, IdFilterPipe, IdPipe, LabelFilterPipe, LabelPipe, PropertyFilterPipe, PropertyPipe, VertexEdgePipe
    [ SIDEEFFECTS ] AggregatorPipe, CountCombinePipe, CountPipe, KeyCombinePipe, SideEffectCapPipe
    [ FILTERS ] AndFilterPipe, CollectionFilterPipe, ComparisonFilterPipe, DuplicateFilterPipe, FutureFilterPipe, ObjectFilterPipe, OrFilterPipe, RandomFilterPipe, RangeFilterPipe
    [ UTILITIES ] DynamicStartsPipe, GatherPipe, PathPipe, PrintStreamPipe, ProductPipe, ScatterPipe, TypeCastPipe, Pipeline, ...
  • 116. Pipes: Easy to Create New Pipes
      public class NumCharsPipe extends AbstractPipe<String,Integer> {
        public Integer processNextStart() {
          String word = this.starts.next();
          return word.length();
        }
      }
    When extending the base class AbstractPipe<S,E>, all that is required is an implementation of processNextStart().
  • 117. Pipes: Easy to Create New Pipes Most of my projects are composed of lots of application-specific Pipes, that is, Pipes that are specific to my domain model and yield useful jumps in the graph. For example, the domain-specific SameLikesPipe<Vertex,Vertex>. From these domain-specific Pipes, complex traversal algorithms are created by piecing those Pipes together. For example, RecommenderPipe<Vertex,Map>.
  • 118. Gremlin: A Graph-Based Programming Language • A graph traversal language that uses Groovy as its host language. • Compiles Gremlin syntax down to Pipes (implements JSR 223).38 38 At the time of this presentation, Gremlin’s most recent stable release is 0.6, which is a standalone language. To increase the flexibility of the language, 0.7-SNAPSHOT+ uses Groovy as the host language.
  • 119. Gremlin: Easily Compose Graph Related Pipes Pipes is verbose...
      Pipe<Vertex,Edge> pipe1 = new VertexEdgePipe(Step.OUT_EDGES);
      Pipe<Edge,Edge> pipe2 = new LabelFilterPipe("knows",Filter.NOT_EQUAL);
      Pipe<Edge,Vertex> pipe3 = new EdgeVertexPipe(Step.IN_VERTEX);
      Pipe<Vertex,String> pipe4 = new PropertyPipe<String>("name");
      Pipe<Vertex,String> pipeline = new Pipeline(pipe1,pipe2,pipe3,pipe4);
      pipeline.setStarts(new SingleIterator<Vertex>(graph.getVertex("A")));
    ...relative to Gremlin.
      g.v('A').outE[[label:'knows']].inV.name
  • 120. Gremlin: The Simple Example [Figure: the example graph from slide 108 annotated with the Gremlin steps g.v('A'), outE, [[label:'knows']], inV, and name]
  • 121. Gremlin: Defining a Step “Who likes the same things that I like?” Vertex.metaClass.same_likes = { _().outE[[label:'likes']].inV.inE[[label:'likes']].outV } [Figure: graph of likes edges among vertices A, B, C, D, E, F, and G]
  • 122. Gremlin: Defining a Step gremlin> g.v('A').same_likes ==>v[E] ==>v[F] ==>v[F] ==>v[G] [Figure: graph of likes edges among vertices A, B, C, D, E, F, and G]
  • 123. Gremlin: Defining a Step gremlin> m = g.v('A').same_likes.group_count >> 1 gremlin> m ==>v[E]=1 ==>v[F]=2 ==>v[G]=1 v[F] is most similar, in terms of likes, to v[A].39 39 For a thorough review of such traversal patterns, please see: Rodriguez, M.A., “Problem-Solving using Graph Traversals: Searching, Scoring, Ranking, and Recommendation,” July 2010. [http://slidesha.re/bOCy4Q]
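The counting performed by the same_likes step followed by group_count can be reproduced in plain Java as a rough check of the numbers above. The class SameLikes, the map-of-sets layout, and the exact likes edges are assumptions: the transcript does not preserve which vertex likes which, so the data below is simply one assignment that yields the counts shown on the slide. The starting vertex is excluded from the result, as it is in the slide's output.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

public class SameLikes {
    // Counts, for every other vertex, how many liked things it shares with the start vertex.
    public static Map<String, Integer> sameLikes(Map<String, Set<String>> likes, String me) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String thing : likes.getOrDefault(me, Set.of())) {
            for (Map.Entry<String, Set<String>> other : likes.entrySet()) {
                // One path me -> thing <- other adds one to the count for that other vertex.
                if (!other.getKey().equals(me) && other.getValue().contains(thing)) {
                    counts.merge(other.getKey(), 1, Integer::sum);
                }
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Assumed edges: A likes B, C, D; E likes B; F likes C and D; G likes D.
        Map<String, Set<String>> likes = Map.of(
            "A", Set.of("B", "C", "D"),
            "E", Set.of("B"),
            "F", Set.of("C", "D"),
            "G", Set.of("D"));
        System.out.println(sameLikes(likes, "A")); // prints {E=1, F=2, G=1}
    }
}
```

The highest count identifies the vertex most similar to the start vertex in terms of shared likes, here F.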
  • 124. Rexster: A RESTful Graph Shell • Allows Blueprints graphs to be exposed through a RESTful API (HTTP). • All communication is via JSON. • Supports stored traversals written in raw Pipes or Gremlin. • Supports ad hoc traversals represented in Gremlin. • Provides “helper classes” for performing search-, score-, and rank-based traversal algorithms that, in concert, support recommendation.
  • 125. Rexster: URI Patterns • http://localhost/graph/vertices: all the vertices in the graph • http://localhost/graph/vertices/1: vertex with id 1 in the graph. • http://localhost/graph/vertices/1/outE: outgoing edges of vertex with id 1. { "results": { "_type":"vertex", "_id":"1", "name":"aaron", "type":"person" }, "query_time":0.1537 }
  • 126. Typical TinkerPop Graph Stack [Figure: stack diagram showing an HTTP request, GET http://{host}/{resource}, together with the graph back ends Neo4j, NativeStore, and TinkerGraph]
  • 127. Conclusion • Property graphs are convenient structures for modeling the real world. • Graph databases provide index-free adjacency to ensure speedy traversal over graphs. • The graph is such a general data structure that it can be used for numerous applications. • TinkerPop provides a database-agnostic stack of technologies for working with property graphs.
  • 128. Acknowledgements • Research collaborators: Daniel Steinbock (Stanford), Jennifer H. Watkins (LANL), Alberto Pepe (Harvard), Joshua Shinavier (RPI), Johan Bollen (LANL), Herbert Van de Sompel (LANL). • TinkerPop contributors: Pavel Yaskevich (Riptano), Stephen Mallette (Independent), Darrick Wiebe (Independent), Alex Averbuch (Swedish Institute of CS), Peter Neubauer (Neo4j). • Others: Emil Eifrem (Neo4j), Luca Garulli (Orient Technologies), Aaron Patterson (AT&Ti).