Graph Databases: Trends in the Web of Data
                              Marko A. Rodriguez
                            Graph Systems Architect
                             http://markorodriguez.com
                             http://twitter.com/twarko
                           http://slideshare.com/slidarko




KRDB Trends in the Web of Data School - Brixen/Bressanone, Italy - September 18, 2010
Abstract
Relational databases are perhaps the most commonly used data management systems. In
relational databases, data is modeled as a collection of disparate tables. In order to unify
the data within these tables, a join operation is used. This operation is expensive as the
amount of data grows. For information retrieval operations that do not make use of
extensive joins, relational databases are an excellent tool. However, when an excessive
number of joins is required, the relational database model breaks down. In contrast,
graph databases maintain one single data structure: a graph. A graph contains a set of
vertices (i.e. nodes, dots) and a set of edges (i.e. links, lines). These elements make
direct reference to one another, and as such, there is no notion of a join operation. The
direct references between graph elements make the joining of data explicit within the
structure of the graph. The benefit of this model is that traversing (i.e. moving between
the elements of a graph in an intelligent, direct manner) is very efficient and yields a style
of problem-solving called the graph traversal pattern. This session will discuss graph
databases, the graph traversal programming pattern, and their use in solving real-world
problems.
Outline

• Graph Structures, Algorithms, and Algebras

• Graph Databases and the Property Graph

• TinkerPop Open-Source Graph Product Suite

• Real-Time, Real-World Use Cases for Graphs
[Figure: Difficulty Chart. Axes: difficulty (vertical) and time (horizontal). Topics plotted: graphs, algebra, databases, indices, data models, software, algorithms, real-world, conclusion.]
G = (V, E)
A Vertex




There once was a vertex i ∈ V named tenderlove.
Two Vertices




And then came along another vertex j ∈ V named sixwing.
Thus, i, j ∈ V.
A Directed Edge




Our tenderlove extended a relationship to sixwing. Thus,
(i, j) ∈ E.
The Single-Relational, Directed Graph




More vertices join, create edges and, in turn, the graph grows...
The Single-Relational, Directed Graph as a Matrix

A single-relational graph defined as

                         G = (V, E ⊆ (V × V))

can be represented as the adjacency matrix $A \in \{0,1\}^{n \times n}$ with n = |V|, where

$$A_{i,j} = \begin{cases} 1 & \text{if } (i,j) \in E \\ 0 & \text{otherwise.} \end{cases}$$
The Single-Relational, Directed Graph as a Matrix



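[Figure: the directed graph G drawn beside its adjacency matrix A.]

$$A = \begin{pmatrix} 0 & 1 & 1 & 0 \\ 1 & 0 & 0 & 1 \\ 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \end{pmatrix}$$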
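As a minimal sketch of the definition above, the following Python builds the 0/1 adjacency matrix from an edge set. The vertex numbering 0..3 and the helper name `adjacency_matrix` are assumptions made for illustration; the edge set is simply read off the matrix shown on this slide.

```python
def adjacency_matrix(n, edges):
    """Return the n x n 0/1 matrix A with A[i][j] = 1 iff (i, j) is in edges."""
    A = [[0] * n for _ in range(n)]
    for i, j in edges:
        A[i][j] = 1
    return A

# Edge set read off the slide's matrix (vertices numbered 0..3 by assumption).
E = {(0, 1), (0, 2), (1, 0), (1, 3), (2, 0), (3, 1)}
A = adjacency_matrix(4, E)

for row in A:
    print(*row)
```

Row i lists the out-edges of vertex i, so reading a row answers "who does i point to?" and reading a column answers "who points to j?".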
The Single-Relational, Directed Graph

• All vertices are homogeneous in meaning: all vertices denote the same
  type of object (e.g. people, webpages, etc.). [1]

• All edges are homogeneous in meaning: all edges denote the same type
  of relationship (e.g. friendship, works with, etc.). [2]

   [1] This is not completely true. All n-partite single-relational graphs allow for the division of the vertex set
into n subsets, where $V = \bigcup_{i=1}^{n} A_i$ and $A_i \cap A_j = \emptyset$ for $i \neq j$. Thus, it's possible to implicitly type the vertices.
   [2] This is not completely true. There exists an injective, information-preserving function that maps any
multi-relational graph to a single-relational graph, where edge types are denoted by topological structures.
Thus, at a “higher level,” it is possible to create a heterogeneous set of relationships.
Rodriguez, M.A., “Mapping Semantic Networks to Undirected Networks,” International Journal of Applied
Mathematics and Computer Sciences, 5(1), pp. 39–42, 2009. [http://arxiv.org/abs/0804.0277]
Applications of Single-Relational Graphs

• Social: define how people interact (collaborators, friends, kin).

• Biological: define how biological components interact (protein, food
  chains, gene regulation).

• Transportation: define how cities are joined by air and road routes.

• Dependency: define how software modules, data sets, functions depend
  on each other.

• Technology: define the connectivity of Internet routers, web pages, etc.

• Language: define the relationships between words.
The Limitations of Single-Relational Graph Modeling




[Figure: three separate graphs labeled Friendship Graph, Favorite Graph, and Works-For Graph.]


Unfortunately, single-relational graphs are independent of each other. This
is because G = (V, E)—there is only a single edge set E (i.e. a single type
of relation).
Numerous Algorithms for Single-Relational Graphs
We would like a more flexible graph modeling construct, but unfortunately,
most of our graph algorithms were designed for single-relational graphs. [3]

• Geodesic: diameter, radius, eccentricity, closeness, betweenness, etc.

• Spectral: random walks, PageRank, eigenvector centrality, spreading
  activation, etc.

• Assortativity: scalar, categorical, hierarchical, etc.

• Others: ... [4]
   [3] For a fine book on graph analysis algorithms, please see:
Brandes, U., Erlebach, T., “Network Analysis: Methodological Foundations,” edited book, Springer, 2005.
   [4] One of the purposes of this presentation is to advocate for local graph analysis algorithms (i.e. priors-based,
relative) vs. global graph analysis algorithms. Most popular graph analysis algorithms are global in that
they require an analysis of the whole graph (or a large portion of a graph) to yield results. Local analysis
algorithms are dependent on sub-graphs of the whole and, in effect, can boast faster running times.
How do we solve this?


A multi-relational graph and a path
algebra.
G = (V, E)
A Directed Edge
A Directed, Labeled Edge




[Figure: a directed edge labeled friend from tenderlove to sixwing.]

Let's specify the type of relationship that exists between
tenderlove and sixwing. Thus, $(i, j) \in E_{\text{friend}}$.
Growing a Multi-Relational Graph




[Figure: two friend edges, one in each direction, between tenderlove and sixwing.]

Let's make the friendship relationship symmetric. Thus,
$(j, i) \in E_{\text{friend}}$.
Growing a Multi-Relational Graph




[Figure: the graph with marko added; all four edges are labeled friend.]

Let's add marko to the mix: k ∈ V. This graph is still
single-relational. There is only one type of relation.
Growing a Multi-Relational Graph




[Figure: the same graph with one favorite edge added alongside the four friend edges.]

Let's add an $(i, l) \in E_{\text{favorite}}$. Now there are multiple types of
relationships: $E_{\text{friend}}$ and $E_{\text{favorite}}$ (2 edge sets).
The Multi-Relational, Directed Graph

• At this point, there is a multi-relational, directed graph: G = (V, E),
  where $E = \{E_1, E_2, \ldots, E_m \subseteq (V \times V)\}$. [5]

• Vertices can denote different types of objects (e.g. people, places). [6]

• Edges can denote different types of relationships (e.g. friend, favorite). [7]

   [5] Another representation is G ⊆ (V × Ω × V), where Ω ⊆ Σ∗ is the set of legal edge labels.
   [6] Vertex types can be determined by the domain and range specification of the respective edge
relation/label/predicate. Or, another way, by means of an explicit typing relation such as ⟨a, type, b⟩.
   [7] Edge types are determined by the label that accompanies the edge.
The Multi-Relational, Directed RDF Graph

• This is the data model of the Web of Data: the RDF data model.

• The RDF data model’s vertex set is split into URIs (U), literals (L), and
  blank/anonymous nodes (B), such that:

                          G ⊆ ((U ∪ B) × U × (U ∪ B ∪ L)). [8]

   [8] Named graphs are a popular extension to the RDF data model. There are various serializations such as
TriX and TriG. However, for the sake of brevity, this presentation will not discuss named graphs.
The Multi-Relational, Directed Graph as a Tensor

A three-way tensor can be used to represent a multi-relational graph. If

$$G = (V, E = \{E_1, E_2, \ldots, E_m \subseteq (V \times V)\})$$

is a multi-relational graph, then $A \in \{0,1\}^{n \times n \times m}$ and

$$A^{k}_{i,j} = \begin{cases} 1 & \text{if } (i,j) \in E_k,\ 1 \le k \le m \\ 0 & \text{otherwise.} \end{cases}$$

Thus, each edge set in E represents an adjacency matrix and the
combination of m adjacency matrices forms a 3-way tensor.
The Multi-Relational, Directed Graph as a Tensor



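[Figure: the multi-relational graph G (two friend edges and one favorite edge) beside its tensor A, drawn as stacked slices labeled answers, friend, and favorite. The slice shown in front is:]

$$\begin{pmatrix} 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{pmatrix}$$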
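The tensor construction can be sketched in a few lines of Python: one n × n slice per edge label, stacked into a list. The labeled edges, the vertex numbering, and the function name `adjacency_tensor` below are illustrative assumptions, not data taken from the slides.

```python
def adjacency_tensor(n, labeled_edges, labels):
    """Return T where T[k][i][j] = 1 iff (i, labels[k], j) is an edge."""
    index = {label: k for k, label in enumerate(labels)}
    T = [[[0] * n for _ in range(n)] for _ in labels]
    for i, label, j in labeled_edges:
        T[index[label]][i][j] = 1
    return T

# Hypothetical multi-relational graph on 4 vertices.
labels = ["friend", "favorite"]
edges = [(0, "friend", 1), (1, "friend", 0), (1, "favorite", 3)]

T = adjacency_tensor(4, edges, labels)
favorite = T[labels.index("favorite")]   # one slice = one adjacency matrix
```

Each slice `T[k]` is an ordinary single-relational adjacency matrix, which is what lets single-relational algorithms be applied slice by slice.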
Multi-Relational Graph Algorithms




“Can we evaluate single-relational graph analysis algorithms
on a multi-relational graph?”
The Meaning of Edge Meanings

[Figure: two vertices, one with five incoming “loves” edges and one with five incoming “hates” edges.]




• Multi-relationally: tenderlove is more liked than marko.

• Single-relationally: tenderlove and marko simply have the same
  in-degree.
    Given, say, degree centrality, tenderlove and marko are equal as
    they have the same number of relationships. The edge labels do not
    affect the output of the degree-centrality algorithm.
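A small sketch of the point above: once labels are dropped, in-degree cannot tell the two vertices apart. The source vertices (`fan0`, `foe0`, ...) are made-up names; only tenderlove, marko, and the two labels come from the slide.

```python
from collections import Counter

# Hypothetical labeled edges (tail, label, head).
edges = [(f"fan{n}", "loves", "tenderlove") for n in range(5)]
edges += [(f"foe{n}", "hates", "marko") for n in range(5)]

# Single-relational view: labels are dropped, only edge counts remain.
in_degree = Counter(head for _, _, head in edges)

# Multi-relational view: count incoming edges per (vertex, label) pair.
in_degree_by_label = Counter((head, label) for _, label, head in edges)

print(in_degree["tenderlove"], in_degree["marko"])
print(in_degree_by_label[("tenderlove", "loves")],
      in_degree_by_label[("marko", "loves")])
```

The first line printed shows equal in-degrees; the second shows that only the label-aware count separates "liked" from "disliked".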
What Do You Mean By “Central?”
[Figure: a multi-relational graph around the question “What is your favorite bookstore?”, with edges labeled answer, answer_for, answer_by, question_by, favorite, and friend.]



Let's focus specifically on centrality. What is the most central vertex in a
multi-relational graph? Who is the most central friend in the graph: by friendship, by
question answering, by favorites, etc.?
Primary Eigenvector


“What does the primary eigenvector of a multi-relational
graph mean?” [9][10][11]

   [9] We will use the primary eigenvector for the following argument. Note that the same argument applies
for all known single-relational graph algorithms (i.e. geodesic, spectral, community detection, etc.).
  [10] Technical details are left aside such as outgoing edge probability distributions and the irreducibility of
the graph.
  [11] The popular PageRank vector is defined as the primary eigenvector of a low-probability fully connected
graph combined with the original graph (i.e. both graphs maintain the same V).
Primary Eigenvector: Ignoring Edge Labels

• If π = Bπ, where $B \in \mathbb{N}_{+}^{|V| \times |V|}$ is the adjacency matrix formed by
  merging the edge sets in E, then edge labels are ignored: all edges are
  treated equally.

• In this “ignoring labels”-model, there is only one primary eigenvector for
  the graph—one definition of centrality.

• With a heterogeneous set of vertices connected by a heterogeneous set of
  edges, what does this type of centrality mean?
Primary Eigenvector: Isolating Subgraphs
• Are there other primary eigenvectors in the multi-relational graph?

• You can ignore certain edge sets and calculate the primary eigenvector
  (e.g. pull out the single-relational “friend” graph).
        $\pi = A_{\text{friend}}\,\pi$, where $A_{\text{friend}} \in \{0,1\}^{|V| \times |V|}$ is the adjacency matrix
        formed by the edge set $E_{\text{friend}}$.

• Thus, you can isolate subgraphs (i.e. adjacency matrices) of the
  multi-relational graph and calculate the primary eigenvector for those
  subgraphs.

• In this “isolation” model, there are m definitions of centrality, one for
  each isolated subgraph. [12]
 [12] Remember, $A \in \{0,1\}^{n \times n \times m}$.
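The isolation model can be illustrated with a short power-iteration sketch. It is not the presentation's implementation: the graph data and function names are assumptions, and the vector is rescaled at every step, so it converges to the dominant eigenvector (π proportional to Aπ) rather than requiring A to be stochastic, which is one of the technical details the slides set aside.

```python
def isolate(n, labeled_edges, label):
    """Adjacency matrix of the single-relational subgraph for one edge label."""
    A = [[0] * n for _ in range(n)]
    for i, lab, j in labeled_edges:
        if lab == label:
            A[i][j] = 1
    return A

def primary_eigenvector(A, iterations=200):
    """Power iteration: repeatedly apply pi <- A pi and rescale to sum to 1."""
    n = len(A)
    pi = [1.0 / n] * n
    for _ in range(iterations):
        nxt = [sum(A[i][j] * pi[j] for j in range(n)) for i in range(n)]
        total = sum(nxt)
        pi = [x / total for x in nxt]
    return pi

# Hypothetical data: a symmetric friend triangle 0-1-2, vertex 3 friends with 0,
# plus favorite edges that the isolation step ignores.
friend_pairs = [(0, 1), (1, 2), (0, 2), (0, 3)]
edges = [(i, "friend", j) for i, j in friend_pairs]
edges += [(j, "friend", i) for i, j in friend_pairs]
edges += [(3, "favorite", 1), (2, "favorite", 1)]

pi_friend = primary_eigenvector(isolate(4, edges, "friend"))
print([round(x, 3) for x in pi_friend])
```

Calling `isolate` with a different label yields a different matrix and therefore a different centrality vector, which is the sense in which there are m definitions of centrality.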
Ultimately what we want is...
Primary Eigenvector: Turing Completeness
• What about using paths through the graph, not simply explicit one-step
  edges?

• What about determining centrality for a relation that isn’t explicit in E
  (i.e. $A^{k} \notin A$)? In general, what about π = Xπ, where X is a derived
  adjacency matrix of the multi-relational graph?
         For example, if I know who everyone’s friends are, then I know (i.e. can
         infer, derive, compute) who everyone’s friends-of-a-friend (FOAF) are.
         What about the primary eigenvector of the derived FOAF graph?

• In the end, you want a Turing-complete framework: you want complete
  control (universal computability) over how π moves through the
  multi-relational graph structure. [13]
 [13] These ideas are expounded upon at great length throughout this presentation.
A Path Algebra for Evaluating
Single-Relational Algorithms on Multi-Relational Graphs
• There exists a multi-relational graph algebra for mapping single-relational
  graph analysis algorithms to the multi-relational domain. [14]

• The algebra works on a tensor representation of a multi-relational graph.

• In this framework and given the running example, there are as many
  primary eigenvectors as there are abstract path definitions.
  [14] * Rodriguez, M.A., Shinavier, J., “Exposing Multi-Relational Networks to Single-Relational Network
Analysis Algorithms,” Journal of Informetrics, 4(1), pp. 29–41, doi:10.1016/j.joi.2009.06.004, 2009.
[http://arxiv.org/abs/0806.2274]
* Rodriguez, M.A., “Grammar-Based Random Walkers in Semantic Networks,” Knowledge-Based Systems,
21(7), pp. 727–739, doi:10.1016/j.knosys.2008.03.030, 2008. [http://arxiv.org/abs/0803.4355]
* Rodriguez, M.A., Watkins, J., “Grammar-Based Geodesics in Semantic Networks,” Knowledge-Based
Systems, in press, doi:10.1016/j.knosys.2010.05.009, 2010.
The Operations of the Multi-Relational Path Algebra

• A · B: ordinary matrix multiplication determines the number of (A, B)-paths
  between vertices.
• $A^{\top}$: matrix transpose inverts path directionality.
• A ◦ B: Hadamard (entry-wise) multiplication applies a filter to selectively
  exclude paths.
• n(A): not generates the complement of a $\{0,1\}^{n \times n}$ matrix.
• c(A): clip generates a $\{0,1\}^{n \times n}$ matrix from an $\mathbb{R}_{+}^{n \times n}$ matrix.
• $v^{\pm}(A)$: vertex generates a $\{0,1\}^{n \times n}$ matrix from an $\mathbb{R}_{+}^{n \times n}$ matrix, where
  only certain rows or columns contain non-zero values.
• xA: scalar multiplication weights the entries of a matrix.
• A + B: matrix addition merges paths.
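Most of these operations are one-liners over nested lists. The sketch below is an illustrative pure-Python rendering, not the authors' code: the function names are assumptions, matrices are assumed square, and the vertex filter v± is left out because its exact definition is not given here. The example derives a friend-of-a-friend (no self) matrix from a hypothetical three-person friend chain.

```python
def mul(A, B):
    """A · B: entry (i, j) counts the (A, B)-paths from i to j."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def transpose(A):
    """Invert path directionality."""
    return [list(col) for col in zip(*A)]

def hadamard(A, B):
    """A ◦ B: entry-wise product, used as a filter."""
    return [[a * b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def complement(A):
    """n(A): flip the entries of a 0/1 matrix."""
    return [[1 - a for a in row] for row in A]

def clip(A):
    """c(A): map every positive entry to 1."""
    return [[1 if a > 0 else 0 for a in row] for row in A]

def scale(x, A):
    """xA: weight every entry."""
    return [[x * a for a in row] for row in A]

def add(A, B):
    """A + B: merge paths."""
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def identity(n):
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]

# Hypothetical symmetric friend chain 0 - 1 - 2.
friend = [[0, 1, 0],
          [1, 0, 1],
          [0, 1, 0]]

foaf = mul(friend, friend)                               # two-step friend paths
foaf_no_self = hadamard(foaf, complement(identity(3)))   # drop i -> i paths
```

Here `foaf[1][1]` is 2 because vertex 1 reaches itself through either friend; the Hadamard filter with n(I) removes exactly those self-paths.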
Primary Eigenvectors in a Multi-Relational Graph
• Friend: $A_{\text{friend}}\,\pi$
• FOAF: $(A_{\text{friend}} \cdot A_{\text{friend}})\,\pi \equiv A_{\text{friend}}^{2}\,\pi$
• FOAF (no self): $(A_{\text{friend}}^{2} \circ n(I))\,\pi$ [15]
• FOAF (no friends nor self): $(A_{\text{friend}}^{2} \circ n(A_{\text{friend}}) \circ n(I))\,\pi$
• Co-Worker: $((A_{\text{works\_at}} \cdot A_{\text{works\_at}}^{\top}) \circ n(I))\,\pi$
• Friend-or-CoWorker: $(0.65\,A_{\text{friend}} + 0.35\,((A_{\text{works\_at}} \cdot A_{\text{works\_at}}^{\top}) \circ n(I)))\,\pi$
• ...and more. [16]
  [15] $I \in \{0,1\}^{|V| \times |V|}$ with $I_{i,i} = 1$: the identity matrix.
  [16] Note, again, that the examples are with respect to determining the primary eigenvector of the derived
adjacency matrix. The same argument holds for all other single-relational graph analysis algorithms. In
general, the path algebra provides a means of creating “higher-order” (i.e. semantically-rich) single-relational
graphs from a single multi-relational graph. Thus, these derived matrices can be subjected to standard
single-relational graph analysis algorithms.
Deriving “Semantically Rich” Adjacency Matrices

[Figure: the tensor A (slices labeled answers, friend, favorite) is united (∪) with the derived matrix $A_{\text{friend}} \cdot A_{\text{friend}} \circ n(I)$, giving a new tensor A with an additional slice labeled friend-of-a-friend (no self).]

$A_{\text{friend}}^{2} \circ n(I)$: friend-of-a-friend (no self)



Use the multi-relational graph to generate explicit edges that were implicitly defined as
paths. Those new explicit edges can then be memoized [17] and re-used (a time vs. space
tradeoff), also known as path reuse.
 [17] Memoization Wikipedia entry: http://en.wikipedia.org/wiki/Memoization.
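Path reuse can be sketched with Python's standard `functools.lru_cache`: the derived edges are computed once and then served from memory. The friend edges and the function name `foaf` are assumptions chosen for illustration.

```python
from functools import lru_cache

# Hypothetical friend edges, stored as an immutable set so results can be cached.
FRIEND = frozenset({(0, 1), (1, 0), (1, 2), (2, 1)})

@lru_cache(maxsize=None)
def foaf(edges):
    """Derive explicit friend-of-a-friend edges (no self) from two-step paths."""
    return frozenset((i, k)
                     for i, j in edges
                     for j2, k in edges
                     if j == j2 and i != k)

first = foaf(FRIEND)    # computed by walking every two-step path
second = foaf(FRIEND)   # served from the cache: the derived edges are reused
```

The cache trades memory for time, and it is only valid while the underlying friend edges stay unchanged; a real system would need to invalidate derived edges on update.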
Benefits, Drawbacks, and Future of the Path Algebra
• Benefit: Provides a set of theorems for deriving equivalences and thus,
  provides the foundation for graph traversal engine optimizers. [18] Serves a
  similar purpose as the relational algebra for relational databases. [19]

• Drawback: The algebra is represented in matrix form and thus,
  operationally, works globally over the graph. [20]

• Future: A non-matrix-based, ring theoretic model of graph traversal
  that supports +, −, and · on individual vertices and edges. The Gremlin
  [http://gremlin.tinkerpop.com] graph traversal engine presented
  later provides the implementation before a fully-developed theory.
  [18] Rodriguez, M.A., Shinavier, J., “Exposing Multi-Relational Networks to Single-Relational Network Analysis
Algorithms,” Journal of Informetrics, 4(1), pp. 29–41, 2009. [http://arxiv.org/abs/0806.2274]
  [19] Codd, E.F., “A Relational Model of Data for Large Shared Data Banks,” Communications of the ACM,
13(6), pp. 377–387, doi:10.1145/362384.362685, 1970.
  [20] It is possible to represent local traversals using vertex filters at the expense of clumsy notation.
Outline

• Graph Structures, Algorithms, and Algebras

• Graph Databases and the Property Graph

• TinkerPop Open-Source Graph Product Suite

• Real-Time, Real-World Use Cases for Graphs
The Simplicity of a Graph

• A graph is a simple data structure.

• A graph states that something is related to something else (the foundation
  of any other data structure). [21]

• It is possible to model a graph in various types of databases. [22]
       Relational database: MySQL, Oracle, PostgreSQL
       JSON document database: MongoDB, CouchDB
       XML document database: MarkLogic, eXist-db
       etc.
  [21] A graph can be used to represent other data structures. This point becomes convenient when looking
beyond using graphs for typical, real-world domain models (e.g. friends, favorites, etc.), and seeing their
applicability in other areas such as modeling code (e.g. http://arxiv.org/abs/0802.3492), indices, etc.
  [22] For the sake of diagram clarity, the examples to follow are with respect to a single-relational, directed
graph. Note that it is possible to model multi-relational graphs in these types of databases as well.
Representing a Graph in a Relational Database

outV | inV
-----+-----
  A  |  B
  A  |  C
  C  |  D
  D  |  A

[Figure: the directed graph with edges A→B, A→C, C→D, and D→A.]
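To make the relational encoding concrete, here is a small sketch using Python's built-in `sqlite3` module with the edge table from this slide. The table name `edge` is an assumption. Note how a two-step traversal already needs a self-join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE edge (outV TEXT, inV TEXT)")
conn.executemany("INSERT INTO edge VALUES (?, ?)",
                 [("A", "B"), ("A", "C"), ("C", "D"), ("D", "A")])

# One hop from A: a single lookup on the edge table.
one_hop = [row[0] for row in conn.execute(
    "SELECT inV FROM edge WHERE outV = ? ORDER BY inV", ("A",))]

# Two hops from A: every extra step adds another self-join.
two_hops = [row[0] for row in conn.execute(
    "SELECT e2.inV FROM edge e1 JOIN edge e2 ON e1.inV = e2.outV "
    "WHERE e1.outV = ? ORDER BY e2.inV", ("A",))]

print(one_hop, two_hops)
```

Each additional traversal step adds one more join over the whole edge table, which is the cost the abstract refers to when it says joins become expensive as data grows.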
Representing a Graph in a JSON Database

{
    "A" : {
      "outE" : ["B", "C"]
    },
    "B" : {
      "outE" : []
    },
    "C" : {
      "outE" : ["D"]
    },
    "D" : {
      "outE" : ["A"]
    }
}
Representing a Graph in an XML Database

<graphml>
  <graph>
    <node id="A" />
    <node id="B" />
    <node id="C" />
    <node id="D" />
    <edge source="A" target="B" />
    <edge source="A" target="C" />
    <edge source="C" target="D" />
    <edge source="D" target="A" />
  </graph>
</graphml>
Defining a Graph Database



“If any database can represent a graph, then what
              is a graph database?”
Defining a Graph Database



   A graph database is any storage system that
        provides index-free adjacency. [23][24]

  [23] There is no “official” definition of what makes a database a graph database. The one provided is my
definition (respective of the influence of my collaborators in this area). However, hopefully the following
argument will convince you that this is a necessary definition. Given that any database can model a graph,
such a definition would not provide strict enough bounds to yield a formal concept (i.e. ).
  [24] There is adjacency between the elements of an index, but if the index is not the primary data structure
of concern (to the developer), then there is indirect/implicit adjacency, not direct/explicit adjacency. A
graph database exposes the graph as an explicit data structure (not an implicit data structure).
Defining a Graph Database by Example

[Figure: a toy graph with vertices A, B, C, D, and E, and a gremlin (the stuntman) that will walk it.]
Graph Databases and Index-Free Adjacency
[Figure: the toy graph with the gremlin on vertex A, which is adjacent to B and C.]

• Our gremlin is at vertex A.
• In a graph database, vertex A has direct references to its adjacent vertices.
• Constant time cost to move from A to B and C. It is dependent upon the number
  of edges emanating from vertex A (local).
Graph Databases and Index-Free Adjacency


[Figure: the toy graph, labeled “The Graph (explicit)”.]
Non-Graph Databases and Index-Based Adjacency



[Figure: an index with entries A → B,C; B → E; C → D,E (and keys D and E), drawn beside the toy graph.]

• Our gremlin is at vertex A.
• In a non-graph database, the gremlin needs to look at an index to determine what
  is adjacent to A.
• $\log_2(n)$ time cost to move to B and C. It is dependent upon the total number of
  vertices and edges in the database (global).
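The contrast between the two lookups can be sketched as follows. This is an in-memory analogy, not how any particular database is built: a sorted list searched with `bisect` stands in for the global index, and Python object references stand in for index-free adjacency. The edges are the ones listed in the index drawn on the slide.

```python
from bisect import bisect_left

# Toy graph edges taken from the index drawn on the slide.
EDGES = [("A", "B"), ("A", "C"), ("B", "E"), ("C", "D"), ("C", "E")]

# Index-based adjacency: one global sorted edge list searched by binary search,
# so the lookup cost grows with the total number of edges (log2 n).
INDEX = sorted(EDGES)

def neighbors_via_index(v):
    i = bisect_left(INDEX, (v, ""))
    out = []
    while i < len(INDEX) and INDEX[i][0] == v:
        out.append(INDEX[i][1])
        i += 1
    return out

# Index-free adjacency: each vertex object holds direct references to its
# neighbors, so a step touches only that vertex's own edges.
class Vertex:
    def __init__(self, name):
        self.name = name
        self.out = []

vertices = {name: Vertex(name) for name in "ABCDE"}
for tail, head in EDGES:
    vertices[tail].out.append(vertices[head])

def neighbors_direct(vertex):
    return [w.name for w in vertex.out]

a = vertices["A"]   # found once (e.g. via an index), then walked without one
print(neighbors_via_index("A"), neighbors_direct(a))
```

Both functions return the same neighbors; the difference is that the second never consults a structure whose size depends on the whole graph once the starting vertex is in hand.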
Non-Graph Databases and Index-Based Adjacency


[Figure: the same index and graph, labeled “The Index (explicit)” and “The Graph (implicit)”.]
Index-Free Adjacency
• While any database can implicitly represent a graph, only a
  graph database makes the graph structure explicit. [25]

• In a graph database, each vertex serves as a “mini index”
  of its adjacent elements. [26]

• Thus, as the graph grows in size, the cost of a local step
  remains the same. [27]
  [25] Please see http://markorodriguez.com/Blarko/Entries/2010/3/29_MySQL_vs._Neo4j_on_a_Large-Scale_Graph_Traversal.html
for some performance characteristics of graph traversals in a
relational database (MySQL) and a graph database (Neo4j).
  [26] Each vertex can be interpreted as a “parent node” in an index with its children being its adjacent
elements. In this sense, traversing a graph is analogous in many ways to traversing an index, albeit the
graph is not an acyclic connected graph (tree). (A vision espoused by Craig Taverner.)
  [27] A graph, in many ways, is like a distributed index.
Graph Databases Do Make Use of Indices



[Figure: two layers, an “Index of Vertices (by id)” on top and “The Graph” below it.]




• There is more to the graph than the explicit graph structure.

• Indices index the vertices by their properties (e.g. ids, name, latitude). [28]
  [28] Graph databases can be used to create index structures. In fact, in the early days of Neo4j, Neo4j used
its own graph structure to index the properties of its vertices: a graph indexing a graph. A thought iterated
many times over by Craig Taverner, who is interested in graph databases for geo-spatial indexing/analysis.
The Patterns of a Relational Database



• In a relational database, operations are conceptualized set-
  theoretically with the joining of tuple structures being the
  means by which normalized/separated data is associated.
The Pattern of a Graph Database



• In a graph database, operations are conceptualized graph-theoretically
  with paths over edges being the means by which
  non-adjacent/separated vertices are associated. [29]

  [29] Rodriguez, M.A., Neubauer, P., “The Graph Traversal Pattern,” ATTi and NeoTechnology Technical
Report, currently in review, 2010. [http://arxiv.org/abs/1004.1001]
What About Triple/Quad Stores?

• In a triple/quad store, operations are conceptualized set-
  theoretically.
    pattern matching (e.g. SPARQL): ?pattern
    inferencing (e.g. RDFS, OWL): ?pattern ⟹ triples.

• In many implementations, the triple/quad store makes use
  of indices that combine subjects (?s), predicates (?p), and
  objects (?o).
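A rough sketch of how such combined indices answer a triple pattern is given below. The triples, the two index orderings chosen, and the `match` function are all illustrative assumptions; real stores typically keep more orderings and persist them on disk.

```python
from collections import defaultdict

# Hypothetical triples (subject, predicate, object).
TRIPLES = [
    ("marko", "friend", "peter"),
    ("marko", "friend", "josh"),
    ("peter", "friend", "josh"),
    ("marko", "favorite", "book"),
]

# Two of the possible index orderings: (s, p) -> objects and (p, o) -> subjects.
SP_O = defaultdict(set)
PO_S = defaultdict(set)
for s, p, o in TRIPLES:
    SP_O[(s, p)].add(o)
    PO_S[(p, o)].add(s)

def match(s, p, o):
    """Answer one triple pattern; None plays the role of a ?variable."""
    if s is not None and p is not None and o is None:
        return {(s, p, x) for x in SP_O.get((s, p), ())}
    if s is None and p is not None and o is not None:
        return {(x, p, o) for x in PO_S.get((p, o), ())}
    # Fallback for patterns without a matching index: scan every triple.
    return {t for t in TRIPLES
            if all(q is None or q == v for q, v in zip((s, p, o), t))}

marko_friends = match("marko", "friend", None)   # ?o for a fixed (s, p)
josh_admirers = match(None, "friend", "josh")    # ?s for a fixed (p, o)
```

The result of a pattern is a set of matching triples, which is why operations over such a store are naturally thought of set-theoretically.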
Triple/Quad Stores, Graph Theory, and the Web of Data

• The triple/quad store rides an interesting boundary between
  a relational and a graph database, though it is seen more
  set-theoretically. This is because, I believe, RDF/Web of Data
  is not presented/taught in terms of graphs and graph-theoretic
  operations.
Graph Databases and the Web of Data

• In theory and ignoring performance, index and index-free models have the
  same expressivity and allow for the same manipulations. But such theory
  does not determine intention and the mental ruts that any approach
  engrains.

• Can the graph traversal pattern become a staple in the Web of
  Data?
    Formulate SPARQL pattern matching in terms of traversing.
    Formulate inference in terms of traversing.
    Take advantage of graph theoretic models of data processing.
Outline

• Graph Structures, Algorithms, and Algebras

• Graph Databases and the Property Graph

• TinkerPop Open-Source Graph Product Suite

• Real-Time, Real-World Use Cases for Graphs
TinkerPop: Making Stuff for the Fun of It
• Open source software group started in 2008 focusing on graph data
  structures, graph query engines, graph-based programming languages,
  and, in general, tools and techniques for working with graphs.
  [http://tinkerpop.com] [http://github.com/tinkerpop]
    Current members: Marko A. Rodriguez (ATTi), Peter Neubauer
    (NeoTechnology), Joshua Shinavier (Rensselaer Polytechnic Institute),
    and Pavel Yaskevich (“I am no one from nowhere”).
TinkerPop Productions

• Blueprints: Data Models and their Implementations
  [http://blueprints.tinkerpop.com]
• Pipes: A Data Flow Framework using Process Graphs
  [http://pipes.tinkerpop.com]
• Gremlin: A Graph-Based Programming Language
  [http://gremlin.tinkerpop.com]
• Rexster: A RESTful Graph Shell
  [http://rexster.tinkerpop.com]
     Wreckster: A Ruby API for Rexster
     [http://github.com/tenderlove/wreckster]


There are other TinkerPop products (e.g. Ripple, LoPSideD, TwitLogic, etc.), but for the
purpose of this presentation, only the above will be discussed.
Blueprints: Data Models and their Implementations

                                    Blueprints

• Blueprints is like the JDBC of the graph database community.

• Provides a Java-based interface API for the property graph data model.
        Graph, Vertex, Edge, Index.

• Provides implementations of the interfaces for TinkerGraph, Neo4j,
  OrientDB, Sails (e.g. AllegroSail, Neo4jSail), and soon (hopefully)
  others such as InfiniteGraph, InfoGrid, Sones, and HyperGraphDB.30
  30
    HyperGraphDB makes use of an n-ary graph structure known as a hypergraph. Blueprints, in its current
form, only supports the more common binary graph.
Pipes: A Data Flow Framework using Process Graphs


                                   Pipes

• A dataflow framework with support for Blueprints-based graph processing.

• Provides a collection of “pipes” (implement Iterable and Iterator)
  that are connected together to form processing pipelines.
    Filters: ComparisonFilterPipe, RandomFilterPipe, etc.
    Traversal: VertexEdgePipe, EdgeVertexPipe, PropertyPipe, etc.
    Splitting/Merging: CopySplitPipe, RobinMergePipe, etc.
    Logic: OrPipe, AndPipe, etc.
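
The idea of chaining lazy steps into a processing pipeline can be sketched with Python generators. This is only an analogy under stated assumptions: the function names below are hypothetical and are not the Pipes classes listed above, and the three edges are taken from the Gremlin session shown later in the deck (1-friend-2, 1-friend-3, 1-favorite-4).

```python
# Edges as (out_vertex, label, in_vertex); taken from the deck's session output.
EDGES = [(1, "friend", 2), (1, "friend", 3), (1, "favorite", 4)]


def vertex_to_out_edges(edges, vertices):
    """Traversal step: emit every edge leaving each incoming vertex."""
    for v in vertices:
        for e in edges:
            if e[0] == v:
                yield e


def label_filter(label, edge_stream):
    """Filter step: let through only the edges with the given label."""
    for e in edge_stream:
        if e[1] == label:
            yield e


def edge_to_in_vertex(edge_stream):
    """Traversal step: emit the head (in) vertex of each incoming edge."""
    for e in edge_stream:
        yield e[2]


def friends_of(vertex):
    """Compose the three steps; nothing is computed until the result is iterated."""
    return edge_to_in_vertex(
        label_filter("friend", vertex_to_out_edges(EDGES, iter([vertex])))
    )


if __name__ == "__main__":
    print(list(friends_of(1)))
```

Each step consumes an iterator and is itself an iterator, so steps can be recombined freely, which is the property the pipeline model relies on.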
Gremlin: A Graph-Based Programming Language


                             Gremlin    G = (V, E)



• A Turing-complete, graph-based programming language that compiles
  Gremlin syntax down to Pipes (implements JSR 223).

• Supports various language constructs: :=, foreach, while, repeat,
  if/else, function and path definitions, etc.
    ./outE[@label='friend']/inV
    ./outE[@label='friend']/inV/outE[@label='friend']/inV[g:except($_)]
    g:key('name','Aaron Patterson')[0]/outE[@label='favorite']/inV/@name
Rexster: A RESTful Graph Shell

                                reXster
• Allows Blueprints graphs to be exposed through a RESTful API (HTTP).

• Supports stored traversals written in raw Pipes or Gremlin.

• Supports adhoc traversals represented in Gremlin.

• Provides “helper classes” for performing search-, score-, and rank-based
  traversal algorithms that, in concert, support recommendation.

• Aaron Patterson (ATTi) maintains the Ruby connector Wreckster.
Typical TinkerPop Graph Stack
[Figure: layered stack diagram. The recoverable labels are the request
GET http://{host}/{resource} and the stores Neo4j, NativeStore, and TinkerGraph.]
Using Graphs in Real-Time Systems
• Most popular graph algorithms require global graph analysis.
        Such algorithms compute a score, a vector, etc. given the structure
        of the whole graph. Moreover, many of these algorithms have large
        running times: $O(|V| + |E|)$, $O(|V| \log |V|)$, $O(|V|^2)$, etc.

• Many real-world situations can make use of local graph analysis.31
        Search for x starting from y.
        Score x given its local neighborhood.
        Rank x relative to y.
        Recommend vertices to user x.
  31
     Many web applications are “ego-centric” in that they operate with respect to a particular user (the user
logged in). In such scenarios, local graph analysis algorithms are not only prudent but also beneficial,
in that they are faster than global graph analysis algorithms. Many of the local analysis algorithms discussed
run in the sub-second range (for graphs with “natural” statistics).
Applications of Graph Databases and Traversal Engines:
            Searching, Scoring, and Ranking
• Searching: given a power multi-set of vertices ($\hat{\mathcal{P}}(V)$) and a path
  description ($\Psi$), return the vertices at the end of that path.32
    $\hat{\mathcal{P}}(V) \times \Psi \to \hat{\mathcal{P}}(V)$

• Scoring: given some vertices and a path description, return a score.
    $\hat{\mathcal{P}}(V) \times \Psi \to \mathbb{R}$

• Ranking: given some vertices and a path description, return a map of
  scored vertices.
    $\hat{\mathcal{P}}(V) \times \Psi \to (V \times \mathbb{R})$
  32
   Use cases need not be with respect to vertices only. Edges can be searched, scored, and ranked as well.
However, in order to express the ideas as simply as possible, all discussion is with respect to vertices.
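
The three signatures above can be sketched in a few lines of Python. This is a minimal sketch under several assumptions: a path description is reduced to a list of edge labels, the multi-set of vertices is a plain list that may contain repeats, the score is simply the number of paths found (an arbitrary choice), and the edge list is reconstructed from the session output on the later slides. Where that output leaves an edge's source ambiguous (for example, which friend links to vertex 5), the choice made here is an assumption. All function names are hypothetical.

```python
from collections import Counter

# Edges as (out_vertex, label, in_vertex); partly assumed, see the note above.
EDGES = [
    (1, "friend", 2), (1, "friend", 3), (1, "favorite", 4),
    (2, "friend", 1), (2, "friend", 5), (3, "friend", 1),
    (2, "favorite", 6), (3, "favorite", 6),
]


def search(edges, starts, path):
    """Searching: multi-set of vertices x path -> multi-set of end vertices."""
    frontier = list(starts)
    for label in path:
        frontier = [t for v in frontier for (s, l, t) in edges if s == v and l == label]
    return frontier


def score(edges, starts, path):
    """Scoring: multi-set of vertices x path -> a real number (here, the path count)."""
    return float(len(search(edges, starts, path)))


def rank(edges, starts, path):
    """Ranking: multi-set of vertices x path -> map from vertex to score."""
    return dict(Counter(search(edges, starts, path)))


if __name__ == "__main__":
    print(sorted(search(EDGES, [1], ["friend", "friend"])))
    print(score(EDGES, [1], ["friend", "friend"]))
    print(rank(EDGES, [1], ["friend", "favorite"]))
```

Because the result of a search is a multi-set, a vertex reached by several paths appears several times, and that repetition is what the ranking counts.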
Applications of Graph Databases and Traversal Engines:
                   Recommendation
• Recommendation: searching, scoring, and ranking can all be used as
  components of a recommendation. Thus, recommendation is founded on
  these more basic ideas.
       Recommendation aids the user by allowing them to make “jumps” through
       the data. Items that are not explicitly connected, are connected implicitly through
       recommendation (through some abstract path Ψ).

• The act of recommending can be seen as an attempt to increase the
  density of the graph around a user’s vertex. For example, recommending
  user $i \in V$ places to visit $U \subset V$ will hopefully lead to edges of the form
  $\langle i, \text{visited}, j \rangle : \forall j \in U$.33
  33
   A standard metric for recommendation quality is how well it predicts the user’s future behavior.
That is, does it predict an edge?
There Is More Than “People Who Like X Also Like Y .”
• A system need not be limited to one type of recommendation. With graph-based
  methods, there are as many recommendations as there are abstract paths.
• Use recommendation to aid the user in solving problems (i.e. computationally
  derive solutions for which your data set is primed). Examples below are with respect
  to problem-solving in the scholarly community.34
     Recommend articles to read. (articles)
     Recommend collaborators to work on an idea/article with. (people)
     Recommend a venue to submit the article to. (venues)
     Recommend referees to an editor to review the article. (people)35
     Recommend scholars to talk to and concepts to talk to them about at the venue.
     (people and tags)
  34
     Rodriguez, M.A., Allen, D.W., Shinavier, J., Ebersole, G., “A Recommender System to Support the
Scholarly Communication Process,” KRS-2009-02, 2009. [http://arxiv.org/abs/0905.1594]
  35
     Rodriguez, M.A., Bollen, J., “An Algorithm to Determine Peer-Reviewers,” Conference on Information
and Knowledge Management (CIKM), pp. 319–328, doi:10.1145/1458082.1458127, 2008. [http:
//arxiv.org/abs/cs/0605112]
Real-Time, Domain-Specific, Graph-Based,
                  Problem-Solving Engine



[Figure: Graph Data Set + Library of Path/Traversal Expressions ($\Psi_1, \Psi_2, \ldots, \Psi_n$)
= Real-Time, Domain-Specific, Graph-Based Problem-Solving Engine]


Your domain model (i.e. graph dataset) determines what traversals you can design,
develop, and deploy. Together, these determine which types of problems you can solve
automatically/computationally for yourself and your users.
Applicable in Various, Seemingly Diverse Areas
 • Applications to a techno-social government (i.e. collective decision making systems).36

[Figure: two overlaid plots comparing direct democracy and dynamically distributed
democracy against the percentage of active citizens (x-axis, 100 down to 0). The y-axis
labels read "proportion of correct decisions" and "error".]

Fig. 5. The relationship between $k$ and $e_k^{vote}$ for direct democracy (gray line) and
dynamically distributed democracy (black line). The plot provides the proportion of
identical, correct decisions over a simulation that was run with 1000 artificially
generated networks composed of 100 citizens each.

Fig. 6. A visualization of a network of [caption truncated]

Excerpt from the paper page shown on the slide: As previously stated, let $x \in [0, 1]^n$
denote the political tendency of each citizen in this population, where $x_i$ is the
tendency of citizen $i$ and, for the purpose of simulation, is determined from a uniform
distribution. Assume that every citizen in a population of $n$ citizens uses some social
network-based system to create links to those individuals that they believe reflect their
tendency the best. In practice, these links may point to a close friend, a relative, or
some public figure whose political tendencies resonate with the individual. In other
words, representatives are any citizens, not political candidates that serve in public
office. Let $A \in [0, 1]^{n \times n}$ denote the link matrix representing the network, where the
weight of an edge, for the purpose of simulation, is $1 - |x_i - x_j|$ if the link exists
[excerpt truncated].

   36
* Rodriguez, M.A., Watkins, J.H., “Revisiting the Age of Enlightenment from a Collective Decision Making Systems
Perspective,” First Monday, 14(8), 2009. [http://arxiv.org/abs/0901.3929]
* Rodriguez, M.A., “Social Decision Making with Multi-Relational Networks and Grammar-Based Particle Swarms,” Hawaii
International Conference on Systems Science (HICSS), pp. 39–49, 2007. [http://arxiv.org/abs/cs/0609034]
* Rodriguez, M.A., Steinbock, D.J., “A Social Network for Societal-Scale Decision-Making Systems,” Proceedings of the North
American Association for Computational Social and Organizational Science Conference, 2004. [http://arxiv.org/abs/cs/0412047]
A detour into the property graph
         data model...
Property Graphs and Graph Databases

• Most graph databases support a graph data model known as a property
  graph.

• A property graph is a directed, attributed, multi-relational graph.
  In other words, vertices and edges are equipped with a collection of
  key/value pairs.37




  37
     Rodriguez, M.A., Neubauer, P., “Constructions from Dots and Lines,” Bulletin of the American Society
for Information Science and Technology, American Society for Information Science and Technology, 2010.
[http://arxiv.org/abs/1006.2361]
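
To make the definition concrete, here is a minimal in-memory sketch of a directed, attributed, multi-relational graph. The class and method names are hypothetical and are not the Blueprints API; the property values are borrowed from the deck's example figures.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class Vertex:
    vertex_id: int
    properties: Dict[str, Any] = field(default_factory=dict)  # key/value pairs


@dataclass
class Edge:
    edge_id: int
    out_v: int   # tail vertex id (the edge is directed out_v -> in_v)
    label: str   # the relation type, e.g. "friend" or "favorite"
    in_v: int    # head vertex id
    properties: Dict[str, Any] = field(default_factory=dict)  # key/value pairs


class PropertyGraph:
    """Hypothetical container: vertices and edges keyed by id."""

    def __init__(self) -> None:
        self.vertices: Dict[int, Vertex] = {}
        self.edges: Dict[int, Edge] = {}

    def add_vertex(self, vertex_id: int, **properties: Any) -> Vertex:
        vertex = Vertex(vertex_id, dict(properties))
        self.vertices[vertex_id] = vertex
        return vertex

    def add_edge(self, edge_id: int, out_v: int, label: str, in_v: int,
                 **properties: Any) -> Edge:
        edge = Edge(edge_id, out_v, label, in_v, dict(properties))
        self.edges[edge_id] = edge
        return edge

    def out_edges(self, vertex_id: int, label: Optional[str] = None) -> List[Edge]:
        """Outgoing edges of a vertex, optionally restricted to one label."""
        return [e for e in self.edges.values()
                if e.out_v == vertex_id and (label is None or e.label == label)]


def build_example() -> PropertyGraph:
    g = PropertyGraph()
    g.add_vertex(1)
    g.add_vertex(3, name="marko", location="Santa Fe", gender="male")
    g.add_vertex(4, lat=11111, long=22222)
    g.add_edge(11, 1, "friend", 3)
    g.add_edge(12, 1, "favorite", 4, created_at=123456)
    return g


if __name__ == "__main__":
    graph = build_example()
    print([e.in_v for e in graph.out_edges(1, "friend")])
```

Edge labels carry the multi-relational part, the out/in ids carry direction, and the two `properties` dictionaries carry the attributes.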
From a Multi-Relational Graph...




[Figure: a multi-relational graph whose edges are labeled friend or favorite.]
...to a Property Graph
[Figure: the same graph with key/value pairs attached. Vertex properties shown:
name=marko, location=Santa Fe, gender=male; lat=11111, long=22222; name=sixwing,
location=West Hollywood, gender=male. Edges keep their friend and favorite labels,
and some carry created_at=123456 or created_at=234567.]
Why the Property Graph Model?
• Standard single-relational graphs do not provide enough modeling flexibility for use in
  real-world situations.38
• Multi-relational graphs do and the Web of Data (RDF) world demonstrates this to be
  the case in practice.

• Property graphs are perhaps more practical because not every datum needs to be
  “related” (e.g. age, name, etc.). Thus, the edge and key/value model is a convenient
  dichotomy.39
• Property graphs provide finer-granularity on the meaning of an edge as the key/values
  of an edge add extra information beyond the edge label.


  38
      This is not completely true: researchers use the single-relational graph all the time. However, in most
data rich applications, it's limiting to work with a single edge type and a homogeneous population of vertices.
   39
      RDF has a similar argument in that literals can only be the object of a triple. However, in practice, when
represented in a graph database, each literal is denoted by a single literal vertex and is thus traversable
like any other vertex.
Graph Type Morphisms
[Figure: diagram of morphisms between graph types.
  Graph types shown: weighted graph, property graph, labeled graph, semantic graph,
  directed graph, rdf graph, multi-graph, simple graph, undirected graph.
  Operations shown on the arrows: add weight attribute; remove attributes; remove edge
  labels; make labels URIs; remove directionality; remove loops, directionality, and
  multiple edges; no op.]
Toy Graph Dataset
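[Figure: the toy property graph, with six vertices and edges labeled friend or favorite.
  Vertex 1: no properties shown
  Vertex 2: name=sixwing, location=West Hollywood, gender=male
  Vertex 3: name=marko, location=Santa Fe, gender=male
  Vertex 4: lat=11111, long=22222
  Vertex 5: name=charlie
  Vertex 6: name=Bryce Canyon
  Two favorite edges carry created_at=123456 and created_at=234567.]
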
We will use the toy-graph above to demonstrate Gremlin (to introduce the syntax).
Dataset Schema in Neo4j
Neo4j [http://neo4j.org] is a “schema-less” database. However, ultimately, data is
represented according to some schema whether that schema be explicit in the database, in
the code interacting with the database, or in the developer’s head.40 Please note the
schema diagrammed below is a non-standard convention.41


[Schema diagram:
  Person: name=string, location=string, gender=string, type=Person
  Place:  name=string, lat=double, long=double, type=Place
  Edge labels: friend (drawn at Person), favorite (drawn between Person and Place)]

  40
   A better term for “schema-less” might have been “dynamic schema.”
  41
   For expressive, standardized graph-based schema languages, refer to RDFS [http://www.w3.org/TR/
rdf-schema/] and OWL [http://www.w3.org/TR/owl-features/] of the Web of Data community.
Dataset Schema in MySQL
CREATE TABLE friend (
   outV INT NOT NULL,
   inV INT NOT NULL);
CREATE INDEX friend_outV_index USING BTREE ON friend (outV);
CREATE INDEX friend_inV_index USING BTREE ON friend (inV);

CREATE TABLE favorite (
   outV INT NOT NULL,
   inV INT NOT NULL);
CREATE INDEX favorite_outV_index USING BTREE ON favorite (outV);
CREATE INDEX favorite_inV_index USING BTREE ON favorite (inV);

CREATE TABLE metadata (
   vertex INT NOT NULL,
   _key VARCHAR(100) NOT NULL,
   _value VARCHAR(100),
   PRIMARY KEY (vertex, _key));
CREATE INDEX metadata_vertex_index USING BTREE ON metadata (vertex);
CREATE INDEX metadata_key_index USING BTREE ON metadata (_key);
CREATE INDEX metadata_value_index USING BTREE ON metadata (_value);
Basic Gremlin

gremlin> (1 + 2) * 4 div 5
==>2.4
gremlin> "marko" + " a. " + "rodriguez"
==>marko a. rodriguez
gremlin> func ex:add-one($x)
            $x + 1
         end
gremlin> foreach $y in g:list(1,2,3,4)
            g:print(ex:add-one($y))
         end
2
3
4
5
Searching Example: Friends

gremlin> $_g := neo4j:open('/data/mygraph')
gremlin> $_ := g:id-v(1)
==>v[1]
gremlin> .
==>v[1]
gremlin> ./outE
==>e[10][1-friend-2]
==>e[11][1-friend-3]
==>e[12][1-favorite-4]
gremlin> ./outE[@label='friend']/inV/@name
==>sixwing
==>marko
gremlin> ./outE[@label='friend']/inV/@gender
==>male
==>male
gremlin> ./outE[@label='friend']
            /inV[@location='Santa Fe']/@name
==>marko
Searching Example: Friends in SPARQL
The name of tenderlove’s friends...

SELECT ?y WHERE {
  ex:tenderlove ex:friend ?x .
  ?x ex:name ?y }

The gender of tenderlove’s friends...

SELECT ?y WHERE {
  ex:tenderlove ex:friend ?x .
  ?x ex:gender ?y }

The name of tenderlove’s friends who live in Santa Fe...

SELECT ?y WHERE {
  ex:tenderlove ex:friend ?x .
  ?x ex:livesIn ex:SantaFe .
  ?x ex:name ?y }
Searching Example: FOAF (No Friends, No Self)

gremlin> .
==>v[1]
gremlin> ./outE[@label='friend']/inV
            /outE[@label='friend']/inV
==>v[1]
==>v[1]
==>v[5]
gremlin> (./outE[@label='friend']
            /inV)[g:assign($x)]
              /outE[@label='friend']
                /inV[g:except($_)][g:except($x)]
                  /@name
==>charlie
Searching Example: FOAF (No Friends, No Self)
                      in SPARQL


The name of tenderlove’s friends’ friends who are not him or his friends.

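SELECT ?z WHERE {
  ex:tenderlove ex:friend ?x .
  ?x ex:friend ?y .
  ?y ex:name ?z .
  # Exclude tenderlove himself and anyone who is already his friend (SPARQL 1.1).
  FILTER ( ?y != ex:tenderlove )
  FILTER NOT EXISTS { ex:tenderlove ex:friend ?y } }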
Searching Example: Friend’s Favorites

gremlin> .
==>v[1]
gremlin> ./outE[@label='friend']/inV
            /outE[@label='favorite']/inV
==>v[6]
==>v[6]
gremlin> ./outE[@label='friend']/inV
            /outE[@label='favorite' and @created_at > 234500]
              /inV/@name
==>Bryce Canyon
Loading Identical Data into MySQL and Neo4j

On my laptop, 10,000,000 edges are created between 100,000 vertices.
Edges are assigned at random: 50% favorite-edges and 50% friend-edges.
This is a dense, relatively unnatural graph in which everyone is heavily
connected.42




  42
     The largest Neo4j instance that I know of contained 100,030,002 (100 million) vertices, 3,041,030,000
(3 billion) edges, and 140,120,000 (140 million) properties. This was deployed on Amazon EC2 and was
yielding FOAF traversals, on average, in ∼50ms (again, index-free traversal). Figures provided by Todd
Stavish (Stav.ish Consulting [http://blog.stavi.sh/]).
Play Query



“What do my friends’ friends
        favorite?”
Querying Random Vertices with Repeats
mysql> SELECT count(favorite.inV) FROM friend as fa, friend as fb, favorite
   WHERE fa.outV=XXX AND fa.inV=fb.outV AND fb.inV=favorite.outV;
29.72 sec -- vertex 110752
0.330 sec -- vertex 110752 REPEAT
10.10 sec -- vertex 145893
11.64 sec -- vertex 126993
0.250 sec -- vertex 126993 REPEAT
14.37 sec -- vertex 136442
6.990 sec -- vertex 154837
0.240 sec -- vertex 154837 REPEAT

gremlin> g:count(g:id(XXX)/outE[@label='friend']/inV
   /outE[@label='friend']/inV/outE[@label='favorite']/inV)
3.646 sec -- vertex 110752
0.350 sec -- vertex 110752 REPEAT
0.756 sec -- vertex 145893
3.251 sec -- vertex 126993
0.211 sec -- vertex 126993 REPEAT
1.462 sec -- vertex 136442
1.875 sec -- vertex 154837
0.268 sec -- vertex 154837 REPEAT
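
The two queries above compute the same count by different means. The sketch below reproduces that equivalence on a tiny, hypothetical edge list: once with the same three-way join run in SQLite (used here as a stand-in for MySQL, so the BTREE index clauses are omitted), and once by walking adjacency lists. The rows and function names are assumptions for illustration, and the sketch says nothing about relative performance.

```python
import sqlite3
from collections import defaultdict

# Hypothetical (outV, inV) rows for each edge type, for illustration only.
FRIEND = [(1, 2), (1, 3), (2, 1), (2, 5), (3, 1)]
FAVORITE = [(1, 4), (2, 6), (3, 6), (5, 6)]


def count_with_join(start):
    """Count friends' friends' favorites with the join used on the slide."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE friend (outV INT NOT NULL, inV INT NOT NULL)")
    conn.execute("CREATE TABLE favorite (outV INT NOT NULL, inV INT NOT NULL)")
    conn.executemany("INSERT INTO friend VALUES (?, ?)", FRIEND)
    conn.executemany("INSERT INTO favorite VALUES (?, ?)", FAVORITE)
    (count,) = conn.execute(
        "SELECT count(favorite.inV) FROM friend AS fa, friend AS fb, favorite "
        "WHERE fa.outV = ? AND fa.inV = fb.outV AND fb.inV = favorite.outV",
        (start,),
    ).fetchone()
    conn.close()
    return count


def count_with_traversal(start):
    """Count the same paths by stepping friend -> friend -> favorite."""
    friends = defaultdict(list)
    favorites = defaultdict(list)
    for out_v, in_v in FRIEND:
        friends[out_v].append(in_v)
    for out_v, in_v in FAVORITE:
        favorites[out_v].append(in_v)
    return sum(len(favorites[fof]) for f in friends[start] for fof in friends[f])


if __name__ == "__main__":
    print(count_with_join(1), count_with_traversal(1))
```

In the join version, each step is resolved by matching column values across tables; in the traversal version, each step follows references already held by the current vertex.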
Web of Data Detour
A Traversal Detour Through the Web of Data
[Figure: the Linked Open Data cloud diagram, as of July 2009. Nodes are datasets (e.g. DBpedia, Freebase, Geonames, MusicBrainz, FOAF profiles, DBLP, UniProt, PubMed); links between nodes indicate that one dataset references another.]

Image produced by Richard Cyganiak and Anja Jentzsch. [http://linkeddata.org/]
Defining the Web of Data

• The Web of Data is similar to the Web of Documents (the Web of common knowledge), but
  instead of referencing documents (e.g. HTML, images, etc.) with the URI address
  space, individual data items are referenced.[43][44]
       http://markorodriguez.com, foaf:fundedBy, http://atti.com
       http://markorodriguez.com, foaf:name, Marko Rodriguez
       http://markorodriguez.com, foaf:age, 30
       http://markorodriguez.com, foaf:knows, http://tenderlovemaking.com
• In graph theoretic terms, the Web of Data is a multi-relational graph defined as
  G ⊆ (U ∪ B) × U × (U ∪ B ∪ L), where U is the set of all URIs, B is the set of
  all blank/anonymous nodes, and L is the set of all literals.
  [43] The Web of Data is also known as the Linked Data Web, the Giant Global Graph, the Semantic Web,
the RDF graph, etc.
  [44] * Rodriguez, M.A., “Interpretations of the Web of Data,” Data Management in the Semantic Web, eds.
H. Jin and Z. Lv, Nova Publishing, in press, 2010. [http://arxiv.org/abs/0905.3378]
* Rodriguez, M.A., “A Graph Analysis of the Linked Data Cloud,” Technical Report, KRS-2009-01, 2009.
[http://arxiv.org/abs/0903.0194]
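The set definition above can be made concrete with a small sketch. The Python below is only an illustration, not part of the talk: the class names `URI`, `BlankNode`, and `Literal` and the function `is_valid_statement` are hypothetical, and the four statements are the ones listed on this slide (prefixed names such as foaf:name are kept as written rather than expanded).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class URI:
    value: str


@dataclass(frozen=True)
class BlankNode:
    label: str


@dataclass(frozen=True)
class Literal:
    value: object


def is_valid_statement(subject, predicate, obj):
    """True if (subject, predicate, obj) lies in (U ∪ B) × U × (U ∪ B ∪ L)."""
    return (
        isinstance(subject, (URI, BlankNode))
        and isinstance(predicate, URI)
        and isinstance(obj, (URI, BlankNode, Literal))
    )


marko = URI("http://markorodriguez.com")

# The graph G is a set of statements (triples).
graph = {
    (marko, URI("foaf:fundedBy"), URI("http://atti.com")),
    (marko, URI("foaf:name"), Literal("Marko Rodriguez")),
    (marko, URI("foaf:age"), Literal(30)),
    (marko, URI("foaf:knows"), URI("http://tenderlovemaking.com")),
}

all_valid = all(is_valid_statement(*statement) for statement in graph)

# A literal may only appear in the object position, so this one is rejected.
literal_as_subject = is_valid_statement(
    Literal("Marko Rodriguez"), URI("foaf:name"), marko
)
```

Note that literals are excluded from the subject position and that only URIs may label an edge, which is exactly what the three factors of the product express.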
Some of the Datasets on the Web of Data
data set           domain       data set           domain       data set             domain
audioscrobbler     music        govtrack           government   pubguide             books
bbclatertotp       music        homologene         biology      qdos                 social
bbcplaycountdata   music        ibm                computer     rae2001              computer
bbcprogrammes      media        ieee               computer     rdfbookmashup        books
budapestbme        computer     interpro           biology      rdfohloh             social
chebi              biology      jamendo            music        resex                computer
crunchbase         business     laascnrs           computer     riese                government
dailymed           medical      libris             books        semanticweborg       computer
dblpberlin         computer     lingvoj            reference    semwebcentral        social
dblphannover       computer     linkedct           medical      siocsites            social
dblprkbexplorer    computer     linkedmdb          movie        surgeradio           music
dbpedia            general      magnatune          music        swconferencecorpus   computer
doapspace          social       musicbrainz        music        taxonomy             reference
drugbank           medical      myspacewrapper     social       umbel                general
eurecom            computer     opencalais         reference    uniref               biology
eurostat           government   opencyc            general      unists               biology
flickrexporter     images       openguides         reference    uscensusdata         government
flickrwrappr       images       pdb                biology      virtuososponger      reference
foafprofiles       social       pfam               biology      w3cwordnet           reference
freebase           general      pisa               computer     wikicompany          business
geneid             biology      prodom             biology      worldfactbook        government
geneontology       biology      projectgutenberg   books        yago                 general
geonames           geographic   prosite            biology      ...
Web of Data Dataset Dependencies
[Figure: graph of the dependencies among the Web of Data datasets listed on the previous slide, drawn twice (one cluster enlarged, then the full graph). Node labels include dbpedia, freebase, geonames, musicbrainz, uniprot, pubmed, foafprofiles, and dblpberlin.]
Web of Data Transforms Development Paradigm
A new application development paradigm emerges. Data providers and application
providers no longer need to be the same entity (left). With the Web of Data, it is possible for
developers to write applications that use data they do not maintain (right).[45]

[Figure: two architectures side by side. Left: Applications 1, 2, and 3 each keep their own processes and their own structures on their own host (127.0.0.1, 127.0.0.2, 127.0.0.3). Right: Applications 1, 2, and 3 keep their own processes, which operate over a shared Web of Data whose structures are hosted at 127.0.0.1, 127.0.0.2, and 127.0.0.3.]

  [45] Rodriguez, M.A., “A Reflection on the Structure and Process of the Web of Data,”
Bulletin of the American Society for Information Science and Technology, 35(6), pp. 38–43,
doi:10.1002/bult.2009.1720350611, 2009. [http://arxiv.org/abs/0908.0373]
Extending our Knowledge of Bryce Canyon National Park
gremlin> $h := lds:open()
gremlin> $_ := g:id-v($h, 'http://dbpedia.org/resource/Bryce_Canyon_National_Park')
==>v[http://dbpedia.org/resource/Bryce_Canyon_National_Park]
gremlin> ./outE
==>e[dbpedia:Bryce_Canyon_National_Park - dbpprop:reference -> http://www.nps.gov/brca/]
==>e[dbpedia:Bryce_Canyon_National_Park - dbpprop:iucnCategory -> II@en]
==>e[dbpedia:Bryce_Canyon_National_Park - dbpedia-owl:numberOfVisitors -> 1012563^^xsd:integer]
==>e[dbpedia:Bryce_Canyon_National_Park - skos:subject -> dbpedia:Category:Colorado_Plateau]
==>e[dbpedia:Bryce_Canyon_National_Park - dbpprop:visitationNum -> 1012563^^xsd:int]
==>e[dbpedia:Bryce_Canyon_National_Park - dbpedia-owl:abstract -> Bryce Canyon National Park is a national
park located in southwestern Utah in the United States...@en]
==>e[dbpedia:Bryce_Canyon_National_Park - dbpprop:area -> 35835.0^^http://dbpedia.org/datatype/acre]
==>e[dbpedia:Bryce_Canyon_National_Park - rdf:type -> dbpedia-owl:ProtectedArea]
==>e[dbpedia:Bryce_Canyon_National_Park - dbpedia-owl:location -> dbpedia:Garfield_County%2C_Utah]
==>e[dbpedia:Bryce_Canyon_National_Park - dbpprop:nearestCity -> dbpedia:Panguitch%2C_Utah]
==>e[dbpedia:Bryce_Canyon_National_Park - dbpprop:established -> 1928-09-15^^xsd:date]
...


  [46] Linked Data Sail (LDS) was developed by Joshua Shinavier (RPI and TinkerPop) and connects to
Gremlin through Gremlin’s native support for Sail (i.e. for RDF graphs). LDS caches the traversed parts
of the Web of Data in any quad-store (e.g. MemoryStore, AllegroGraph, HyperGraphSail, Neo4jSail, etc.).
Augmenting Traversals with the Web of Data
Let’s extend our query over the Web of Data, and perhaps incorporate the results into our
searching, scoring, ranking, and recommendation.

gremlin> $visits := ./outE[@label='dbpprop:visitationNum']/inV/@value
==>1012563
gremlin> $acreage := ./outE[@label='dbpprop:area']/inV/@value
==>35835.0

### imagine wrapping traversals in Gremlin functions:
###       func lds:acreage($h, $v) and func lds:visitors($h, $v)

gremlin> ./outE[@label='friend']/inV/outE[@label='favorite']
   /inV[lds:acreage($h, .) < 1000000 and lds:visitors($h, .) < 2000000]/@name
==>Bryce Canyon

Thus: which favorites of tenderlove’s friends are small in acreage and visitation?[47]
  [47] In Gremlin, it is possible to have multiple graphs open in parallel and thus to mix and match data from
each graph as desired. Hence, as the example above demonstrates, it is possible to mix Web of Data RDF
graph data and Blueprints property graph data.
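The same idea, a local friend/favorite traversal filtered by facts fetched from elsewhere, can be sketched outside Gremlin. The Python below is an illustration under stated assumptions: the dictionaries stand in for the local property graph and for the Web of Data, the friend names and the place "Big Park" are invented, and only the Bryce Canyon figures and the two thresholds come from the session above.

```python
# Local property-graph data (hypothetical toy values, not from the slides).
friends = {"tenderlove": ["alice", "bob"]}
favorites = {"alice": ["Bryce Canyon"], "bob": ["Big Park"]}

# Stand-in for facts looked up on the Web of Data. The Bryce Canyon numbers
# are the ones returned in the session above; "Big Park" is invented.
web_of_data = {
    "Bryce Canyon": {"acreage": 35835.0, "visitors": 1012563},
    "Big Park": {"acreage": 2500000.0, "visitors": 3000000},
}


def acreage(place):
    """Plays the role of lds:acreage: look up the area of a place."""
    return web_of_data[place]["acreage"]


def visitors(place):
    """Plays the role of lds:visitors: look up the visitation of a place."""
    return web_of_data[place]["visitors"]


def small_favorites_of_friends(person, max_acreage=1000000, max_visitors=2000000):
    """Follow friend, then favorite, and keep places under both thresholds."""
    results = []
    for friend in friends.get(person, []):
        for place in favorites.get(friend, []):
            if acreage(place) < max_acreage and visitors(place) < max_visitors:
                results.append(place)
    return results


result = small_favorites_of_friends("tenderlove")
```

The local graph decides which places are candidates, and the external data decides which candidates survive the filter.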
Using the Web of Data for Music Recommendation

Yet another aside: using only data from the Web of Data to recommend musicians/bands
with a simplistic, edge-boolean spreading activation algorithm.[48]

gremlin> $_ := g:id('http://dbpedia.../Grateful_Dead')
==>v[http://dbpedia.../Grateful_Dead]
gremlin> lds:spreading-activation(.)
==>Jerry Garcia Acoustic Band
==>BK3
==>Phil Lesh and Friends
==>Old and In the Way
==>RatDog
==>The Dead
==>Heart of Gold Band
==>Legion of Mary
==>The Tubes
==>Bob Dylan
==>New Riders of the Purple Sage
==>Bruce Hornsby
==>Donna Jean Godchaux
==>Kingfish
==>Jerry Garcia Band
==>Donna Jean Godchaux Band
==>The Other Ones
==>Bobby and the Midnites
==>Furthur
==>Rhythm Devils

  [48] For interesting, deeper ideas in this space, see: Clark, A., “Associative Engines:
Connectionism, Concepts, and Representational Change,” MIT Press, 1993.
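The slide does not show how lds:spreading-activation is implemented, so the following Python is only a generic sketch of the technique, not the function used in the talk. It assumes one reading of "edge-boolean": a yes/no predicate on the edge label decides whether energy may flow along an edge. The graph, the labels, the `steps` and `decay` parameters, and all names are hypothetical.

```python
from collections import defaultdict


def spreading_activation(edges, seed, allow_edge, steps=2, decay=0.5):
    """Rank vertices by the energy that reaches them from the seed vertex."""
    # Build an undirected adjacency list from the edges the predicate allows.
    neighbors = defaultdict(list)
    for source, label, target in edges:
        if allow_edge(label):
            neighbors[source].append(target)
            neighbors[target].append(source)

    energy = defaultdict(float)
    frontier = {seed: 1.0}
    for _ in range(steps):
        next_frontier = defaultdict(float)
        for vertex, amount in frontier.items():
            adjacent = neighbors[vertex]
            if not adjacent:
                continue
            # Split the decayed energy evenly among the adjacent vertices.
            share = decay * amount / len(adjacent)
            for neighbor in adjacent:
                next_frontier[neighbor] += share
        for vertex, amount in next_frontier.items():
            energy[vertex] += amount
        frontier = next_frontier

    energy.pop(seed, None)  # never recommend the seed itself
    return sorted(energy, key=lambda vertex: (-energy[vertex], vertex))


# Hypothetical toy graph: bands connected through shared members and a genre.
edges = [
    ("band_a", "member", "person_x"),
    ("band_b", "member", "person_x"),
    ("band_c", "member", "person_y"),
    ("band_a", "member", "person_y"),
    ("band_b", "member", "person_y"),
    ("band_a", "genre", "rock"),
    ("band_d", "genre", "rock"),
]

ranking = spreading_activation(edges, "band_a", lambda label: label == "member")
bands = [vertex for vertex in ranking if vertex.startswith("band_")]
```

Here band_b shares two members with the seed and band_c only one, so band_b collects more energy; band_d is reachable only through a genre edge, which the predicate switches off.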
Another View of the TinkerPop Stack

[Figure: a local dataset connected to the Web of Data through owl:sameAs links; Web of Data resources are retrieved with GET http://{host}/{resource}.]
Recommendation
Extending the Schema for Some Richer Examples
For the last part of this presentation, on recommendation, we will extend
the data schema to include tags (a place can be tagged with a tag). This
will allow for some richer examples.[49][50]

  Vertex   Properties
  Person   type=Person, name=string, location=string, gender=string
  Place    type=Place, name=string, lat=double, long=double
  Tag      type=Tag, name=string

  Edge       From     To
  friend     Person   Person
  favorite   Person   Place
  tagged     Place    Tag

  [49] Please note that (1) “place” can be item/thing/book/music/etc., (2) “favorite” can be
likes/purchased/visited/etc., and (3) “tag” can be category/etc. A particular use case is presented, but with
a little imagination, application to other schemas is, of course, plausible.
  [50] The following examples use experimental syntax that may differ slightly from the official Gremlin 0.5 release.
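To make the schema concrete, here is a minimal property-graph sketch in Python. It is not Blueprints or any TinkerPop API: the `PropertyGraph` class, the `SCHEMA` set, and the sample vertices are hypothetical, the edge directions follow the traversals used elsewhere in this deck (a person favorites a place, a place is tagged with a tag), and the coordinates are rough values for illustration only.

```python
# Allowed (source type, edge label, target type) combinations in the schema.
SCHEMA = {
    ("Person", "friend", "Person"),
    ("Person", "favorite", "Place"),
    ("Place", "tagged", "Tag"),
}


class PropertyGraph:
    """Vertices carry key/value properties; edges are labeled and directed."""

    def __init__(self):
        self.vertices = {}  # vertex id -> property dict
        self.edges = []     # (out vertex id, label, in vertex id)

    def add_vertex(self, vertex_id, **properties):
        self.vertices[vertex_id] = properties

    def add_edge(self, out_id, label, in_id):
        key = (self.vertices[out_id]["type"], label, self.vertices[in_id]["type"])
        if key not in SCHEMA:
            raise ValueError(f"edge not allowed by schema: {key}")
        self.edges.append((out_id, label, in_id))


graph = PropertyGraph()
# Sample data: names and values are placeholders, coordinates are approximate.
graph.add_vertex(1, type="Person", name="marko", location="somewhere", gender="male")
graph.add_vertex(2, type="Person", name="tenderlove", location="somewhere", gender="male")
graph.add_vertex(3, type="Place", name="Bryce Canyon", lat=37.6, long=-112.2)
graph.add_vertex(4, type="Tag", name="canyon")

graph.add_edge(1, "friend", 2)
graph.add_edge(2, "favorite", 3)
graph.add_edge(3, "tagged", 4)
```

Checking each edge against `SCHEMA` is a design choice of this sketch; a schema-free property graph would simply accept any labeled edge.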
Recommendation Example: Friend Finder
• Open Friendship Triangles: (V × Ψ) → (V × ℕ⁺) (people) [51]
  1. Create the return map (i.e. V × ℕ⁺).
  2. Determine who my friends are.
  3. Determine who my friends’ friends are...
  4. ...that are not already my friends or me (weighted by the number of overlapping
     friends: more overlaps, more traversers at that user vertex).
  5. Sort the return map by the number of traversers at those user/people vertices.


$m := g:map()
(./outE[@label='friend']/inV)[g:assign($x)]
   /outE[@label='friend']/inV
     /.[g:except($x)][g:except($_)][g:op-value('+',$m,.,1)]
g:sort($m,'value',true)
  [51] R_x ◦ A^friend · A^friend ◦ n(A^friend) ◦ n(I), where x is the user/person vertex. The in-degree
centrality vector of the derived adjacency matrix determines the resultant V rank.
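The five steps above can also be sketched in plain Python. This is an illustration, not the Gremlin implementation: the function name `recommend_friends` and the toy friend edges are hypothetical, and a `Counter` plays the role of the return map $m.

```python
from collections import Counter


def recommend_friends(friend_edges, me):
    """Rank non-friends by how many of my friends point to them."""
    # Index the directed friend edges: person -> set of friends.
    friends_of = {}
    for person, friend in friend_edges:
        friends_of.setdefault(person, set()).add(friend)

    counts = Counter()                                    # step 1: return map
    my_friends = friends_of.get(me, set())                # step 2: my friends
    for friend in my_friends:
        for candidate in friends_of.get(friend, set()):   # step 3: their friends
            # Step 4: skip myself and people who are already my friends.
            if candidate != me and candidate not in my_friends:
                counts[candidate] += 1
    return counts.most_common()                           # step 5: sort by count


# Hypothetical toy data: (person, friend) pairs.
friend_edges = [
    ("me", "a"), ("me", "b"), ("me", "c"),
    ("a", "d"), ("b", "d"), ("c", "d"),
    ("a", "e"), ("a", "b"), ("b", "me"),
]

recommendations = recommend_friends(friend_edges, "me")
```

Each increment corresponds to one traverser arriving at a candidate vertex, so a candidate who closes three open triangles ("d") outranks one who closes a single triangle ("e").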
Graph Databases: Trends in the Web of Data
Graph Databases: Trends in the Web of Data
Graph Databases: Trends in the Web of Data
Graph Databases: Trends in the Web of Data
Graph Databases: Trends in the Web of Data
Graph Databases: Trends in the Web of Data
Graph Databases: Trends in the Web of Data
Graph Databases: Trends in the Web of Data
Graph Databases: Trends in the Web of Data
Graph Databases: Trends in the Web of Data
Graph Databases: Trends in the Web of Data
Graph Databases: Trends in the Web of Data
Graph Databases: Trends in the Web of Data
Graph Databases: Trends in the Web of Data
Graph Databases: Trends in the Web of Data
Graph Databases: Trends in the Web of Data
Graph Databases: Trends in the Web of Data

Graph Databases: Trends in the Web of Data

  • 1. Graph Databases: Trends in the Web of Data Marko A. Rodriguez Graph Systems Architect http://markorodriguez.com http://twitter.com/twarko http://slideshare.com/slidarko KRDB Trends in the Web of Data School - Brixen/Bressanone, Italy - September 18, 2010
  • 2. Abstract Relational databases are perhaps the most commonly used data management systems. In relational databases, data is modeled as a collection of disparate tables. In order to unify the data within these tables, a join operation is used. This operation is expensive as the amount of data grows. For information retrieval operations that do not make use of extensive joins, relational databases are an excellent tool. However, when an excessive amount of joins are required, the relational database model breaks down. In contrast, graph databases maintain one single data structure—a graph. A graph contains a set of vertices (i.e. nodes, dots) and a set of edges (i.e. links, lines). These elements make direct reference to one another, and as such, there is no notion of a join operation. The direct references between graph elements make the joining of data explicit within the structure of the graph. The benefit of this model is that traversing (i.e. moving between the elements of a graph in an intelligent, direct manner) is very efficient and yields a style of problem-solving called the graph traversal pattern. This session will discuss graph databases, the graph traversal programming pattern, and their use in solving real-world problems.
  • 3. Outline • Graph Structures, Algorithms, and Algebras • Graph Databases and the Property Graph • TinkerPop Open-Source Graph Product Suite • Real-Time, Real-World Use Cases for Graphs
  • 4. Difficulty Chart [Figure: difficulty plotted against time for the parts of the talk: graphs, algebra, databases, indices, data models, software, algorithms, real-world, conclusion.]
  • 7. G = (V, E)
  • 8. A Vertex There once was a vertex i ∈ V named tenderlove.
  • 9. Two Vertices And then came along another vertex j ∈ V named sixwing. Thus, i, j ∈ V .
  • 10. A Directed Edge Our tenderlove extended a relationship to sixwing. Thus, (i, j) ∈ E.
  • 11. The Single-Relational, Directed Graph More vertices join, create edges and, in turn, the graph grows...
  • 12. The Single-Relational, Directed Graph as a Matrix A single-relational graph defined as $G = (V, E \subseteq (V \times V))$ can be represented as the adjacency matrix $A \in \{0, 1\}^{n \times n}$, where $A_{i,j} = 1$ if $(i, j) \in E$ and $A_{i,j} = 0$ otherwise.
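  • 13. The Single-Relational, Directed Graph as a Matrix [Figure: a four-vertex graph G beside its adjacency matrix A.] $A = \begin{pmatrix} 0 & 1 & 1 & 0 \\ 1 & 0 & 0 & 1 \\ 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \end{pmatrix}$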
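The definition of the adjacency matrix on the two slides above can be tried out in a few lines of Python. This is only a sketch: the edge set below is an assumption, chosen so that it reproduces the sixteen entries listed on the slide when they are read row by row, and the vertices are simply numbered 0 to 3.

```python
def adjacency_matrix(n, edges):
    """Return an n x n 0/1 matrix with A[i][j] = 1 iff (i, j) is in edges."""
    A = [[0] * n for _ in range(n)]
    for i, j in edges:
        A[i][j] = 1
    return A

# Assumed edge set (not stated on the slide): one pair per non-zero entry.
edges = {(0, 1), (0, 2), (1, 0), (1, 3), (2, 0), (3, 1)}
A = adjacency_matrix(4, edges)

for row in A:
    print(row)
```

Each row lists the out-going edges of one vertex, so the row sums are out-degrees and the column sums are in-degrees.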
  • 14. The Single-Relational, Directed Graph • All vertices are homogeneous in meaning: all vertices denote the same type of object (e.g. people, webpages, etc.).1 • All edges are homogeneous in meaning: all edges denote the same type of relationship (e.g. friendship, works with, etc.).2 1 This is not completely true. All n-partite single-relational graphs allow for the division of the vertex set into n subsets, where $V = \bigcup_{i}^{n} A_i : A_i \cap A_j = \emptyset$. Thus, it is possible to implicitly type the vertices. 2 This is not completely true. There exists an injective, information-preserving function that maps any multi-relational graph to a single-relational graph, where edge types are denoted by topological structures. Thus, at a “higher-level,” it is possible to create a heterogeneous set of relationships. Rodriguez, M.A., “Mapping Semantic Networks to Undirected Networks,” International Journal of Applied Mathematics and Computer Sciences, 5(1), pp. 39–42, 2009. [http://arxiv.org/abs/0804.0277]
  • 15. Applications of Single-Relational Graphs • Social: define how people interact (collaborators, friends, kin). • Biological: define how biological components interact (protein, food chains, gene regulation). • Transportation: define how cities are joined by air and road routes. • Dependency: define how software modules, data sets, functions depend on each other. • Technology: define the connectivity of Internet routers, web pages, etc. • Language: define the relationships between words.
  • 16. The Limitations of Single-Relational Graph Modeling [Figure: three separate graphs titled Friendship Graph, Favorite Graph, and Works-For Graph.] Unfortunately, single-relational graphs are independent of each other. This is because G = (V, E): there is only a single edge set E (i.e. a single type of relation).
  • 17. Numerous Algorithms for Single-Relational Graphs We would like a more flexible graph modeling construct, but unfortunately, most of our graph algorithms were designed for single-relational graphs.3 • Geodesic: diameter, radius, eccentricity, closeness, betweenness, etc. • Spectral: random walks, PageRank, eigenvector centrality, spreading activation, etc. • Assortativity: scalar, categorical, hierarchical, etc. • Others: ...4 3 For a fine book on graph analysis algorithms, please see: Brandes, U., Erlebach T., “Network Analysis: Methodological Foundations,” edited book, Springer, 2005. 4 One of the purposes of this presentation is to advocate for local graph analysis algorithms (i.e. priors-based, relative) vs. global graph analysis algorithms. Most popular graph analysis algorithms are global in that they require an analysis of the whole graph (or a large portion of a graph) to yield results. Local analysis algorithms depend only on sub-graphs of the whole and, in effect, can boast faster running times.
  • 18. How do we solve this? A multi-relational graph and a path algebra.
  • 19. G = (V, E)
  • 21. A Directed, Labeled Edge Let's specify the type of relationship that exists between tenderlove and sixwing. Thus, $(i, j) \in E_{\text{friend}}$.
  • 22. Growing a Multi-Relational Graph Let's make the friendship relationship symmetric. Thus, $(j, i) \in E_{\text{friend}}$.
  • 23. Growing a Multi-Relational Graph Let's add marko to the mix: $k \in V$. This graph is still single-relational. There is only one type of relation.
  • 24. Growing a Multi-Relational Graph Let's add an $(i, l) \in E_{\text{favorite}}$. Now there are multiple types of relationships: $E_{\text{friend}}$ and $E_{\text{favorite}}$ (2 edge sets).
  • 25. The Multi-Relational, Directed Graph • At this point, there is a multi-relational, directed graph: $G = (V, E)$, where $E = \{E_0, E_1, \ldots, E_m \subseteq (V \times V)\}$.5 • Vertices can denote different types of objects (e.g. people, places).6 • Edges can denote different types of relationships (e.g. friend, favorite).7 5 Another representation is $G \subseteq (V \times \Omega \times V)$, where $\Omega \subseteq \Sigma^*$ is the set of legal edge labels. 6 Vertex types can be determined by the domain and range specification of the respective edge relation/label/predicate. Or, another way, by means of an explicit typing relation such as $\langle a, \text{type}, b \rangle$. 7 Edge types are determined by the label that accompanies the edge.
  • 26. The Multi-Relational, Directed RDF Graph • This is the data model of the Web of Data: the RDF data model. • The RDF data model’s vertex set is split into URIs ($U$), literals ($L$), and blank/anonymous nodes ($B$), such that: $G \subseteq ((U \cup B) \times U \times (U \cup B \cup L))$.8 8 Named graphs are a popular extension to the RDF data model. There are various serializations such as TriX and TriG. However, for the sake of brevity, this presentation will not discuss named graphs.
  • 27. The Multi-Relational, Directed Graph as a Tensor A three-way tensor can be used to represent a multi-relational graph. If $G = (V, E = \{E_0, E_1, \ldots, E_m \subseteq (V \times V)\})$ is a multi-relational graph, then $A \in \{0, 1\}^{n \times n \times m}$ and $A^k_{i,j} = 1$ if $(i, j) \in E_k : 1 \le k \le m$, and $A^k_{i,j} = 0$ otherwise. Thus, each edge set in E represents an adjacency matrix and the combination of m adjacency matrices forms a 3-way tensor.
  • 28. The Multi-Relational, Directed Graph as a Tensor [Figure: the multi-relational graph G beside its tensor A, drawn as a stack of adjacency matrices, one slice per edge label (friend, favorite, answers).]
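As a rough illustration of the tensor view, each edge label can be given its own 0/1 adjacency matrix, and the collection of these slices plays the role of the 3-way tensor. The sketch below assumes a tiny graph loosely based on the running example; the vertex name `bookstore` and the exact friend edges involving marko are made up for illustration.

```python
def tensor_slices(vertices, triples):
    """Build one n x n 0/1 adjacency matrix per edge label.

    `triples` holds (tail, label, head) entries; the result maps each
    label to its slice of the tensor.
    """
    index = {v: i for i, v in enumerate(vertices)}
    n = len(vertices)
    slices = {}
    for tail, label, head in triples:
        matrix = slices.setdefault(label, [[0] * n for _ in range(n)])
        matrix[index[tail]][index[head]] = 1
    return slices

# Hypothetical data: 'bookstore' and marko's friendships are assumptions.
vertices = ["tenderlove", "sixwing", "marko", "bookstore"]
triples = [
    ("tenderlove", "friend", "sixwing"),
    ("sixwing", "friend", "tenderlove"),
    ("tenderlove", "friend", "marko"),
    ("marko", "friend", "tenderlove"),
    ("tenderlove", "favorite", "bookstore"),
]

A = tensor_slices(vertices, triples)
for label, matrix in A.items():
    print(label, matrix)
```

A dictionary of slices is used here rather than a dense three-dimensional array because most label slices of a real graph are extremely sparse.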
  • 29. Multi-Relational Graph Algorithms “Can we evaluate single-relational graph analysis algorithms on a multi-relational graph?”
  • 30. The Meaning of Edge Meanings [Figure: tenderlove and marko, each with incoming edges labeled loves and hates.] • Multi-relationally: tenderlove is more liked than marko. • Single-relationally: tenderlove and marko simply have the same in-degree. Given, let's say, degree-centrality, tenderlove and marko are equal as they have the same number of relationships. The edge labels do not affect the output of the degree-centrality algorithm.
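The point of the slide above can be checked with a small in-degree count. The sketch assumes one particular split of loves and hates edges (three loves and two hates for tenderlove, the reverse for marko); the source vertices u1 to u5 are hypothetical.

```python
from collections import Counter

# Assumed distribution of labeled edges (tail, label, head).
edges = (
    [(u, "loves", "tenderlove") for u in ("u1", "u2", "u3")]
    + [(u, "hates", "tenderlove") for u in ("u4", "u5")]
    + [(u, "loves", "marko") for u in ("u1", "u2")]
    + [(u, "hates", "marko") for u in ("u3", "u4", "u5")]
)

def in_degree(edges, label=None):
    """Count incoming edges per vertex; restrict to one label if given."""
    return Counter(
        head for _, lab, head in edges if label is None or lab == label
    )

single_relational = in_degree(edges)         # labels ignored
loves_only = in_degree(edges, "loves")       # label-aware

print(single_relational["tenderlove"], single_relational["marko"])
print(loves_only["tenderlove"], loves_only["marko"])
```

Ignoring labels makes the two vertices indistinguishable, while restricting the count to a single label separates them.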
  • 31. What Do You Mean By “Central?” [Figure: a multi-relational graph around the question “What is your favorite bookstore?”, with edge labels friend, favorite, question_by, answer_by, and answer_for.] Let's focus specifically on centrality. What is the most central vertex in a multi-relational graph? Who is the most central friend in the graph: by friendship, by question answering, by favorites, etc.?
  • 32. Primary Eigenvector “What does the primary eigenvector of a multi-relational graph mean?” 9 10 11 9 We will use the primary eigenvector for the following argument. Note that the same argument applies for all known single-relational graph algorithms (i.e. geodesic, spectral, community detection, etc.). 10 Technical details are left aside such as outgoing edge probability distributions and the irreducibility of the graph. 11 The popular PageRank vector is defined as the primary eigenvector of a low-probability fully connected graph combined with the original graph (i.e. both graphs maintain the same V ).
  • 33. Primary Eigenvector: Ignoring Edge Labels • If $\pi = B\pi$, where $B \in \mathbb{N}_+^{|V| \times |V|}$ is the adjacency matrix formed by merging the edge sets in E, then edge labels are ignored and all edges are treated equally. • In this “ignoring labels”-model, there is only one primary eigenvector for the graph, and so only one definition of centrality. • With a heterogeneous set of vertices connected by a heterogeneous set of edges, what does this type of centrality mean?
  • 34. Primary Eigenvector: Isolating Subgraphs • Are there other primary eigenvectors in the multi-relational graph? • You can ignore certain edge sets and calculate the primary eigenvector (e.g. pull out the single-relational “friend”-graph): $\pi = A^{\text{friend}} \pi$, where $A^{\text{friend}} \in \{0, 1\}^{|V| \times |V|}$ is the adjacency matrix formed by the edge set $E_{\text{friend}}$. • Thus, you can isolate subgraphs (i.e. adjacency matrices) of the multi-relational graph and calculate the primary eigenvector for those subgraphs. • In this “isolation”-model, there are m definitions of centrality, one for each isolated subgraph.12 12 Remember, $A \in \{0, 1\}^{n \times n \times m}$.
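One common way to approximate a primary eigenvector is power iteration, sketched below for an isolated friend slice. This is an illustration under assumptions rather than the method used in the talk: the four-vertex friendship graph is made up, it is deliberately connected and non-bipartite so that the iteration converges, and the vector is rescaled to sum to 1 after every step (so the fixed point satisfies π ∝ Aπ rather than π = Aπ exactly).

```python
def primary_eigenvector(A, iterations=200):
    """Approximate the dominant eigenvector of A by power iteration.

    The vector is renormalized to sum to 1 after every multiplication.
    """
    n = len(A)
    pi = [1.0 / n] * n
    for _ in range(iterations):
        nxt = [sum(A[i][j] * pi[j] for j in range(n)) for i in range(n)]
        total = sum(nxt)
        pi = [x / total for x in nxt]
    return pi

# Hypothetical symmetric friend slice: a triangle 0-1-2 plus vertex 3
# attached to vertex 0.
A_friend = [
    [0, 1, 1, 1],
    [1, 0, 1, 0],
    [1, 0 + 1, 0, 0],
    [1, 0, 0, 0],
]

pi = primary_eigenvector(A_friend)
print([round(x, 4) for x in pi])
```

Running the same function on a different slice, or on a derived matrix, gives a different centrality vector, which is the sense in which the isolation model yields one definition of centrality per subgraph.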
  • 35. Ultimately what we want is...
  • 36. Primary Eigenvector: Turing Completeness • What about using paths through the graph, not simply explicit one-step edges? • What about determining centrality for a relation that isn’t explicit in E (i.e. $A^k \notin A$)? In general, what about $\pi = X\pi$, where X is a derived adjacency matrix of the multi-relational graph? For example, if I know who everyone’s friends are, then I know (i.e. can infer, derive, compute) who everyone’s friends-of-a-friend (FOAF) are. What about the primary eigenvector of the derived FOAF graph? • In the end, you want a Turing-complete framework: you want complete control (universal computability) over how $\pi$ moves through the multi-relational graph structure.13 13 These ideas are expounded upon at great length throughout this presentation.
  • 37. A Path Algebra for Evaluating Single-Relational Algorithms on Multi-Relational Graphs • There exists a multi-relational graph algebra for mapping single-relational graph analysis algorithms to the multi-relational domain.14 • The algebra works on a tensor representation of a multi-relational graph. • In this framework and given the running example, there are as many primary eigenvectors as there are abstract path definitions. 14 * Rodriguez, M.A., Shinavier, J., “Exposing Multi-Relational Networks to Single-Relational Network Analysis Algorithms,” Journal of Informetrics, 4(1), pp. 29–41, doi:10.1016/j.joi.2009.06.004, 2009. [http://arxiv.org/abs/0806.2274] * Rodriguez, M.A., “Grammar-Based Random Walkers in Semantic Networks,” Knowledge-Based Systems, 21(7), pp. 727–739, doi:10.1016/j.knosys.2008.03.030, 2008. [http://arxiv.org/abs/0803.4355] * Rodriguez, M.A., Watkins, J., “Grammar-Based Geodesics in Semantic Networks,” Knowledge-Based Systems, in press, doi:10.1016/j.knosys.2010.05.009, 2010.
  • 38. The Operations of the Multi-Relational Path Algebra • $A \cdot B$: ordinary matrix multiplication determines the number of (A, B)-paths between vertices. • $A^\top$: matrix transpose inverts path directionality. • $A \circ B$: Hadamard, entry-wise multiplication applies a filter to selectively exclude paths. • $n(A)$: not generates the complement of a $\{0, 1\}^{n \times n}$ matrix. • $c(A)$: clip generates a $\{0, 1\}^{n \times n}$ matrix from a $\mathbb{R}_+^{n \times n}$ matrix. • $v^{\pm}(A)$: vertex generates a $\{0, 1\}^{n \times n}$ matrix from a $\mathbb{R}_+^{n \times n}$ matrix, where only certain rows or columns contain non-zero values. • $xA$: scalar multiplication weights the entries of a matrix. • $A + B$: matrix addition merges paths.
  • 39. Primary Eigenvectors in a Multi-Relational Graph • Friend: $A^{\text{friend}} \pi$ • FOAF: $A^{\text{friend}} \cdot A^{\text{friend}} \pi \equiv (A^{\text{friend}})^2 \pi$ • FOAF (no self): $\left((A^{\text{friend}})^2 \circ n(I)\right) \pi$ 15 • FOAF (no friends nor self): $\left((A^{\text{friend}})^2 \circ n(A^{\text{friend}}) \circ n(I)\right) \pi$ • Co-Worker: $\left(A^{\text{works at}} \cdot (A^{\text{works at}})^\top \circ n(I)\right) \pi$ • Friend-or-CoWorker: $\left(0.65 A^{\text{friend}} + 0.35 \left(A^{\text{works at}} \cdot (A^{\text{works at}})^\top \circ n(I)\right)\right) \pi$ • ...and more.16 15 $I \in \{0, 1\}^{|V| \times |V|} : I_{i,i} = 1$, the identity matrix. 16 Note, again, that the examples are with respect to determining the primary eigenvector of the derived adjacency matrix. The same argument holds for all other single-relational graph analysis algorithms. In general, the path algebra provides a means of creating “higher-order” (i.e. semantically-rich) single-relational graphs from a single multi-relational graph. Thus, these derived matrices can be subjected to standard single-relational graph analysis algorithms.
  • 40. Deriving “Semantically Rich” Adjacency Matrices [Figure: the tensor slices (friend, favorite, answers) are combined into the derived friend-of-a-friend (no self) matrix, $A^{\text{friend}} \cdot A^{\text{friend}} \circ n(I)$.] Use the multi-relational graph to generate explicit edges that were implicitly defined as paths. Those new explicit edges can then be memoized17 and re-used (time vs. space tradeoff), aka path reuse. 17 Memoization Wikipedia entry: http://en.wikipedia.org/wiki/Memoization.
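The derived matrices described on the preceding slides can be reproduced with a handful of list-based helpers. The following is a minimal sketch, not the authors' implementation: the four-vertex graph (three people and one workplace) is invented, the helper names are hypothetical, the co-worker expression assumes a transpose on the second factor, and the matrix product is assumed to bind more tightly than the Hadamard product.

```python
def multiply(A, B):
    """Ordinary matrix product: counts (A, B)-paths between vertices."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def transpose(A):
    """Invert path directionality."""
    return [list(row) for row in zip(*A)]

def hadamard(A, B):
    """Entry-wise product: keeps a path only where the filter is non-zero."""
    return [[a * b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def complement(A):
    """n(A): complement of a 0/1 matrix."""
    return [[1 - a for a in row] for row in A]

def clip(A):
    """c(A): map every positive entry to 1."""
    return [[1 if a > 0 else 0 for a in row] for row in A]

def identity(n):
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]

# Hypothetical graph: people 0, 1, 2 and workplace 3.
# friend: 0 <-> 1 and 1 <-> 2; works at: 0 -> 3 and 2 -> 3.
friend = [[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 0], [0, 0, 0, 0]]
works_at = [[0, 0, 0, 1], [0, 0, 0, 0], [0, 0, 0, 1], [0, 0, 0, 0]]
no_self = complement(identity(4))

foaf = multiply(friend, friend)
foaf_no_self = hadamard(foaf, no_self)
foaf_no_friends_nor_self = hadamard(hadamard(foaf, complement(friend)), no_self)
co_worker = hadamard(multiply(works_at, transpose(works_at)), no_self)

for name, matrix in [("FOAF (no self)", foaf_no_self), ("co-worker", co_worker)]:
    print(name, matrix)
```

Any of the derived matrices can then be handed to an ordinary single-relational algorithm, and storing them corresponds to the memoization (path reuse) mentioned on the slide.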
  • 41. Benefits, Drawbacks, and Future of the Path Algebra • Benefit: Provides a set of theorems for deriving equivalences and thus, provides the foundation for graph traversal engine optimizers.18 Serves a similar purpose as the relational algebra for relational databases.19 • Drawback: The algebra is represented in matrix form and thus, operationally, works globally over the graph.20 • Future: A non-matrix-based, ring theoretic model of graph traversal that supports +, −, and · on individual vertices and edges. The Gremlin [http://gremlin.tinkerpop.com] graph traversal engine presented later provides the implementation before a fully-developed theory. 18 Rodriguez M.A., Shinavier, J., “Exposing Multi-Relational Networks to Single-Relational Network Analysis Algorithms,” Journal of Informetrics, 4(1), pp. 29–41, 2009. [http://arxiv.org/abs/0806.2274] 19 Codd, E.F., “A Relational Model of Data for Large Shared Data Banks,” Communications of the ACM, 13(6), pp. 377–387, doi:10.1145/362384.362685, 1970. 20 It is possible to represent local traversals using vertex filters at the expense of clumsy notation.
  • 42. Outline • Graph Structures, Algorithms, and Algebras • Graph Databases and the Property Graph • TinkerPop Open-Source Graph Product Suite • Real-Time, Real-World Use Cases for Graphs
  • 44. The Simplicity of a Graph • A graph is a simple data structure. • A graph states that something is related to something else (the foundation of any other data structure).21 • It is possible to model a graph in various types of databases.22 Relational database: MySQL, Oracle, PostgreSQL; JSON document database: MongoDB, CouchDB; XML document database: MarkLogic, eXist-db; etc. 21 A graph can be used to represent other data structures. This point becomes convenient when looking beyond using graphs for typical, real-world domain models (e.g. friends, favorites, etc.), and seeing their applicability in other areas such as modeling code (e.g. http://arxiv.org/abs/0802.3492), indices, etc. 22 For the sake of diagram clarity, the examples to follow are with respect to a single-relational, directed graph. Note that it is possible to model multi-relational graphs in these types of databases as well.
  • 45. Representing a Graph in a Relational Database
      outV | inV
      -----+----
      A    | B
      A    | C
      C    | D
      D    | A
    [Figure: the corresponding graph over vertices A, B, C, D.]
  • 46. Representing a Graph in a JSON Database
      {
        "A" : { "outE" : ["B", "C"] },
        "B" : { "outE" : [] },
        "C" : { "outE" : ["D"] },
        "D" : { "outE" : ["A"] }
      }
    [Figure: the corresponding graph over vertices A, B, C, D.]
  • 47. Representing a Graph in an XML Database
      <graphml>
        <graph>
          <node id="A" />
          <node id="B" />
          <node id="C" />
          <node id="D" />
          <edge source="A" target="B" />
          <edge source="A" target="C" />
          <edge source="C" target="D" />
          <edge source="D" target="A" />
        </graph>
      </graphml>
    [Figure: the corresponding graph over vertices A, B, C, D.]
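The three representations above describe the same four edges, which a short Python check can confirm. This is a sketch using only the standard library; the quoting in the JSON text and the namespace-free GraphML are assumptions made so that both strings parse.

```python
import json
import xml.etree.ElementTree as ET

# Relational form: one (outV, inV) row per edge.
rows = [("A", "B"), ("A", "C"), ("C", "D"), ("D", "A")]

# JSON document form: each vertex lists the heads of its outgoing edges.
json_doc = (
    '{"A": {"outE": ["B", "C"]}, "B": {"outE": []}, '
    '"C": {"outE": ["D"]}, "D": {"outE": ["A"]}}'
)

# XML form: simplified GraphML without namespaces.
graphml_doc = """<graphml><graph>
  <node id="A"/><node id="B"/><node id="C"/><node id="D"/>
  <edge source="A" target="B"/><edge source="A" target="C"/>
  <edge source="C" target="D"/><edge source="D" target="A"/>
</graph></graphml>"""

def edges_from_rows(rows):
    return set(rows)

def edges_from_json(text):
    doc = json.loads(text)
    return {(tail, head) for tail, record in doc.items()
            for head in record["outE"]}

def edges_from_graphml(text):
    root = ET.fromstring(text)
    return {(e.get("source"), e.get("target")) for e in root.iter("edge")}

print(edges_from_rows(rows) == edges_from_json(json_doc)
      == edges_from_graphml(graphml_doc))
```

All three stores can hold the graph; what differs is how adjacency has to be looked up at query time.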
  • 48. Defining a Graph Database “If any database can represent a graph, then what is a graph database?”
  • 49. Defining a Graph Database A graph database is any storage system that provides index-free adjacency.23 24 23 There is no “official” definition of what makes a database a graph database. The one provided is my definition (respective of the influence of my collaborators in this area). However, hopefully the following argument will convince you that this is a necessary definition. Given that any database can model a graph, such a definition would not provide strict enough bounds to yield a formal concept (i.e. ). 24 There is adjacency between the elements of an index, but if the index is not the primary data structure of concern (to the developer), then there is indirect/implicit adjacency, not direct/explicit adjacency. A graph database exposes the graph as an explicit data structure (not an implicit data structure).
  • 50. Defining a Graph Database by Example [Figure: a toy graph over vertices A, B, C, D, E, and a gremlin (stuntman) that will walk it.]
  • 51. Graph Databases and Index-Free Adjacency • Our gremlin is at vertex A. • In a graph database, vertex A has direct references to its adjacent vertices. • Moving from A to B and C has a constant time cost with respect to the size of the graph. The cost depends only on the number of edges emanating from vertex A (local).
  • 52. Graph Databases and Index-Free Adjacency [Figure: the toy graph over vertices A to E, labeled “The Graph (explicit)”.]
  • 54. Non-Graph Databases and Index-Based Adjacency [Figure: an index of adjacency records beside the toy graph over vertices A to E.] • Our gremlin is at vertex A.
  • 55. Non-Graph Databases and Index-Based Adjacency • In a non-graph database, the gremlin needs to look at an index to determine what is adjacent to A. • $\log_2(n)$ time cost to move to B and C. It is dependent upon the total number of vertices and edges in the database (global).
  • 56. Non-Graph Databases and Index-Based Adjacency [Figure: the same index and graph, labeled “The Index (explicit)” and “The Graph (implicit)”.]
  • 58. Index-Free Adjacency • While any database can implicitly represent a graph, only a graph database makes the graph structure explicit.25 • In a graph database, each vertex serves as a “mini index” of its adjacent elements.26 • Thus, as the graph grows in size, the cost of a local step remains the same.27 25 Please see http://markorodriguez.com/Blarko/Entries/2010/3/29_MySQL_vs._Neo4j_on_a_Large-Scale_Graph_Traversal.html for some performance characteristics of graph traversals in a relational database (MySQL) and a graph database (Neo4j). 26 Each vertex can be interpreted as a “parent node” in an index with its children being its adjacent elements. In this sense, traversing a graph is analogous in many ways to traversing an index, albeit the graph is not an acyclic connected graph (tree). (a vision espoused by Craig Taverner) 27 A graph, in many ways, is like a distributed index.
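The contrast between the two adjacency models can be sketched in Python. In the index-free version each vertex object holds direct references to its neighbors; in the index-based version a single sorted edge list is searched with a binary search, whose cost grows with the logarithm of the total number of edges. Only the edges from A to B and C come from the slides; the remaining edges and all names are assumptions.

```python
import bisect

class Vertex:
    """A vertex that stores direct references to its adjacent vertices."""
    def __init__(self, name):
        self.name = name
        self.out = []

# Index-free adjacency: stepping from A touches only A's own references.
a, b, c, d, e = (Vertex(n) for n in "ABCDE")
a.out.extend([b, c])
c.out.append(d)      # hypothetical edge
d.out.append(e)      # hypothetical edge

# Index-based adjacency: one global, sorted list of (tail, head) records.
edge_index = sorted([("A", "B"), ("A", "C"), ("C", "D"), ("D", "E")])

def neighbors_via_index(index, tail):
    """Binary-search the global index, then scan the matching records."""
    position = bisect.bisect_left(index, (tail, ""))
    heads = []
    while position < len(index) and index[position][0] == tail:
        heads.append(index[position][1])
        position += 1
    return heads

print([v.name for v in a.out])
print(neighbors_via_index(edge_index, "A"))
```

Both calls return the same neighbors; the difference is that the second has to consult a structure whose size depends on the whole graph.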
  • 59. Graph Databases Do Make Use of Indices [Figure: an index of vertices (by id) pointing into the graph over vertices A to E.] • There is more to the graph than the explicit graph structure. • Indices index the vertices by their properties (e.g. ids, name, latitude).28 28 Graph databases can be used to create index structures. In fact, in the early days of Neo4j, Neo4j used its own graph structure to index the properties of its vertices: a graph indexing a graph. A thought iterated many times over by Craig Taverner who is interested in graph databases for geo-spatial indexing/analysis.
  • 60. The Patterns of a Relational Database • In a relational database, operations are conceptualized set-theoretically with the joining of tuple structures being the means by which normalized/separated data is associated.
  • 61. The Patterns of a Graph Database • In a graph database, operations are conceptualized graph-theoretically with paths over edges being the means by which non-adjacent/separated vertices are associated.29 29 Rodriguez, M.A., Neubauer, P., “The Graph Traversal Pattern,” ATTi and NeoTechnology Technical Report, currently in review, 2010. [http://arxiv.org/abs/1004.1001]
  • 62. What About Triple/Quad Stores? • In a triple/quad store, operations are conceptualized set-theoretically. Pattern matching (e.g. SPARQL): ?pattern. Inferencing (e.g. RDFS, OWL): ?pattern ⇒ triples. • In many implementations, triple/quad stores make use of indices that combine subjects (?s), predicates (?p), and objects (?o).
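A set-theoretic reading of pattern matching can be illustrated with plain Python sets. This is a toy sketch, not SPARQL and not any triple store's API: `None` stands in for a variable such as ?s, the triples are hypothetical, and the two-pattern query is evaluated as a join on the shared variable.

```python
# Hypothetical triples (subject, predicate, object).
triples = {
    ("tenderlove", "friend", "sixwing"),
    ("sixwing", "friend", "tenderlove"),
    ("tenderlove", "favorite", "bookstore"),
}

def match(triples, s=None, p=None, o=None):
    """Return the subset of triples that agrees with every bound position."""
    return {
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    }

def friend_favorites(triples):
    """?x friend ?y . ?y favorite ?z, evaluated as a join on ?y."""
    return {
        (x, z)
        for x, _, y in match(triples, p="friend")
        for y2, _, z in match(triples, p="favorite")
        if y == y2
    }

print(match(triples, s="tenderlove"))
print(friend_favorites(triples))
```

The same question phrased as a traversal would start at a vertex, step over its friend edges, and then step over favorite edges, without building the two intermediate sets.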
  • 63. Triple/Quad Stores, Graph Theory, and the Web of Data • The triple/quad store rides an interesting boundary between a relational and graph database, though it is seen more set-theoretically. This is because, I believe, RDF/Web of Data is not presented/taught in terms of graphs and graph theoretic operations.
  • 64. Graph Databases and the Web of Data • In theory and ignoring performance, index and index-free models have the same expressivity and allow for the same manipulations. But such theory does not determine intention and the mental ruts that any approach engrains. • Can the graph traversal pattern become a staple in the Web of Data? Formulate SPARQL pattern matching in terms of traversing. Formulate inference in terms of traversing. Take advantage of graph theoretic models of data processing.
  • 65. Outline • Graph Structures, Algorithms, and Algebras • Graph Databases and the Property Graph • TinkerPop Open-Source Graph Product Suite • Real-Time, Real-World Use Cases for Graphs
  • 67. TinkerPop: Making Stuff for the Fun of It • Open source software group started in 2008 focusing on graph data structures, graph query engines, graph-based programming languages, and, in general, tools and techniques for working with graphs. [http://tinkerpop.com] [http://github.com/tinkerpop] Current members: Marko A. Rodriguez (ATTi), Peter Neubauer (NeoTechnology), Joshua Shinavier (Rensselaer Polytechnic Institute), and Pavel Yaskevich (“I am no one from nowhere”).
  • 68. TinkerPop Productions • Blueprints: Data Models and their Implementations [http://blueprints.tinkerpop.com] • Pipes: A Data Flow Framework using Process Graphs [http://pipes.tinkerpop.com] • Gremlin: A Graph-Based Programming Language [http://gremlin.tinkerpop.com] • Rexster: A RESTful Graph Shell [http://rexster.tinkerpop.com] Wreckster: A Ruby API for Rexster [http://github.com/tenderlove/wreckster] There are other TinkerPop products (e.g. Ripple, LoPSideD, TwitLogic, etc.), but for the purpose of this presentation, only the above will be discussed.
  • 69. Blueprints: Data Models and their Implementations • Blueprints is like the JDBC of the graph database community. • Provides a Java-based interface API for the property graph data model: Graph, Vertex, Edge, Index. • Provides implementations of the interfaces for TinkerGraph, Neo4j, OrientDB, Sails (e.g. AllegroSail, Neo4jSail), and soon (hopefully) others such as InfiniteGraph, InfoGrid, Sones, and HyperGraphDB.30 30 HyperGraphDB makes use of an n-ary graph structure known as a hypergraph. Blueprints, in its current form, only supports the more common binary graph.
  • 70. Pipes: A Data Flow Framework using Process Graphs Pipes • A dataflow framework with support for Blueprints-based graph processing. • Provides a collection of “pipes” (implement Iterable and Iterator) that are connected together to form processing pipelines. Filters: ComparisonFilterPipe, RandomFilterPipe, etc. Traversal: VertexEdgePipe, EdgeVertexPipe, PropertyPipe, etc. Splitting/Merging: CopySplitPipe, RobinMergePipe, etc. Logic: OrPipe, AndPipe, etc.
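The idea of chaining small iterator-based steps into a pipeline can be imitated with Python generators. Pipes itself is a Java library, so the following is only an analogy: the function names echo the pipe names on the slide but are hypothetical, as are the graph layout (a dictionary of labeled out-edges) and the vertex name `bookstore`.

```python
# Hypothetical graph: vertex -> list of (edge label, head vertex).
graph = {
    "tenderlove": [("friend", "sixwing"), ("friend", "marko"),
                   ("favorite", "bookstore")],
    "sixwing": [("friend", "tenderlove")],
}

def vertex_edge_pipe(graph, vertices):
    """Emit the outgoing edges of every incoming vertex."""
    for v in vertices:
        for label, head in graph.get(v, []):
            yield (v, label, head)

def label_filter_pipe(edges, label):
    """Let through only the edges that carry the given label."""
    for edge in edges:
        if edge[1] == label:
            yield edge

def edge_vertex_pipe(edges):
    """Emit the head vertex of every incoming edge."""
    for _, _, head in edges:
        yield head

# Compose the three steps into one lazy pipeline.
pipeline = edge_vertex_pipe(
    label_filter_pipe(vertex_edge_pipe(graph, ["tenderlove"]), "friend")
)
friends = list(pipeline)
print(friends)
```

Because every step is lazy, nothing is computed until the end of the pipeline is consumed, which is what lets long traversals stop early.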
  • 71. Gremlin: A Graph-Based Programming Language • A Turing-complete, graph-based programming language that compiles Gremlin syntax down to Pipes (implements JSR 223). • Supports various language constructs: :=, foreach, while, repeat, if/else, function and path definitions, etc.
      ./outE[@label='friend']/inV
      ./outE[@label='friend']/inV/outE[@label='friend']/inV[g:except($ )]
      g:key('name','Aaron Patterson')[0]/outE[@label='favorite']/inV/@name
  • 72. Rexster: A RESTful Graph Shell • Allows Blueprints graphs to be exposed through a RESTful API (HTTP). • Supports stored traversals written in raw Pipes or Gremlin. • Supports ad hoc traversals represented in Gremlin. • Provides “helper classes” for performing search-, score-, and rank-based traversal algorithms which, in concert, support recommendation. • Aaron Patterson (ATTi) maintains the Ruby connector Wreckster.
  • 73. Typical TinkerPop Graph Stack GET http://{host}/{resource} Neo4j NativeStore TinkerGraph
  • 74. Outline • Graph Structures, Algorithms, and Algebras • Graph Databases and the Property Graph • TinkerPop Open-Source Graph Product Suite • Real-Time, Real-World Use Cases for Graphs
  • 76. Using Graphs in Real-Time Systems • Most popular graph algorithms require global graph analysis. Such algorithms compute a score, a vector, etc. given the structure of the whole graph. Moreover, many of these algorithms have large running times: $O(|V| + |E|)$, $O(|V| \log |V|)$, $O(|V|^2)$, etc. • Many real-world situations can make use of local graph analysis.31 Search for x starting from y. Score x given its local neighborhood. Rank x relative to y. Recommend vertices to user x. 31 Many web applications are “ego-centric” in that they are with respect to a particular user (the user logged in). In such scenarios, local graph analysis algorithms are not only prudent to use, but also, beneficial in that they are faster than global graph analysis algorithms. Many of the local analysis algorithms discussed run in the sub-second range (for graphs with “natural” statistics).
  • 77. Applications of Graph Databases and Traversal Engines: Searching, Scoring, and Ranking • Searching: given a power multi-set of vertices ($\hat{\mathcal{P}}(V)$) and a path description ($\Psi$), return the vertices at the end of that path.32 $\hat{\mathcal{P}}(V) \times \Psi \to \hat{\mathcal{P}}(V)$ • Scoring: given some vertices and a path description, return a score. $\hat{\mathcal{P}}(V) \times \Psi \to \mathbb{R}$ • Ranking: given some vertices and a path description, return a map of scored vertices. $\hat{\mathcal{P}}(V) \times \Psi \to (V \times \mathbb{R})$ 32 Use cases need not be with respect to vertices only. Edges can be searched, scored, and ranked as well. However, in order to express the ideas as simply as possible, all discussion is with respect to vertices.
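The three signatures above can be made concrete with a small sketch. Several things here are assumptions: a multi-set of vertices is modeled as a `collections.Counter`, a path description Ψ is reduced to a list of edge labels to follow in order, the score is simply the number of paths found, and the graph and the bookstore names are invented.

```python
from collections import Counter

# Hypothetical graph: vertex -> list of (edge label, head vertex).
graph = {
    "marko": [("friend", "tenderlove"), ("friend", "sixwing")],
    "tenderlove": [("favorite", "bookstore_a"), ("favorite", "bookstore_b")],
    "sixwing": [("favorite", "bookstore_a")],
}

def search(graph, start, path):
    """Multi-set of vertices x path -> multi-set of vertices at the path's end."""
    current = Counter(start)
    for label in path:
        following = Counter()
        for vertex, count in current.items():
            for edge_label, head in graph.get(vertex, []):
                if edge_label == label:
                    following[head] += count
        current = following
    return current

def score(graph, start, path):
    """Multi-set of vertices x path -> a real number (here: the path count)."""
    return float(sum(search(graph, start, path).values()))

def rank(graph, start, path):
    """Multi-set of vertices x path -> map from end vertices to scores."""
    ends = search(graph, start, path)
    total = sum(ends.values())
    return {v: count / total for v, count in ends.items()} if total else {}

path = ["friend", "favorite"]
print(search(graph, ["marko"], path))
print(score(graph, ["marko"], path))
print(rank(graph, ["marko"], path))
```

Read as a recommendation, the ranking says which favorites of marko's friends are reached by the most paths, and the work done depends only on marko's neighborhood rather than on the whole graph.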
  • 78. Applications of Graph Databases and Traversal Engines: Recommendation • Recommendation: searching, scoring, and ranking can all be used as components of a recommendation. Thus, recommendation is founded on these more basic ideas. Recommendation aids the user by allowing them to make “jumps” through the data. Items that are not explicitly connected are connected implicitly through recommendation (through some abstract path $\Psi$). • The act of recommending can be seen as an attempt to increase the density of the graph around a user’s vertex. For example, recommending user $i \in V$ places to visit $U \subset V$ will hopefully lead to edges of the form $\langle i, \text{visited}, j \rangle : \forall j \in U$.33 33 A standard metric for recommendation quality is how well it predicts the user’s future behavior. That is, does it predict an edge.
  • 79. There Is More Than “People Who Like X Also Like Y .” • A system need not be limited to one type of recommendation. With graph-based methods, there are as many recommendations as there are abstract paths. • Use recommendation to aid the user in solving problems (i.e. computationally derive solutions for which your data set is primed). Examples below are with respect to problem-solving in the scholarly community.34 Recommend articles to read. (articles) Recommend collaborators to work on an idea/article with. (people) Recommend a venue to submit the article to. (venues) Recommend referees to an editor to review the article. (people)35 Recommend scholars to talk to and concepts to talk to them about at the venue. (people and tags) 34 Rodriguez, M.A., Allen, D.W., Shinavier, J., Ebersole, G., “A Recommender System to Support the Scholarly Communication Process,” KRS-2009-02, 2009. [http://arxiv.org/abs/0905.1594] 35 Rodriguez, M.A., Bollen, J., “An Algorithm to Determine Peer-Reviewers,” Conference on Information and Knowledge Management (CIKM), pp. 319–328, doi:10.1145/1458082.1458127, 2008. [http://arxiv.org/abs/cs/0605112]
  • 80. Real-Time, Domain-Specific, Graph-Based, Problem-Solving Engine [Figure: a library of path/traversal expressions (Ψ1, Ψ2, Ψ3, Ψ4, Ψ5, ..., Ψn) + a graph data set = a real-time, domain-specific, graph-based problem-solving engine.] Your domain model (i.e. graph dataset) determines what traversals you can design, develop, and deploy. Together, these determine which types of problems you can solve automatically/computationally for yourself and your users.
  • 81. Applicable in Various, Seemingly Diverse Areas • Applications to a techno-social government (i.e. collective decision making systems).36 [Figure: two plots against the percentage of active citizens (n), comparing direct democracy with dynamically distributed democracy: the proportion of correct decisions and the error.] 36 * Rodriguez, M.A., Watkins, J.H., “Revisiting the Age of Enlightenment from a Collective Decision Making Systems Perspective,” First Monday, 14(8), 2009. [http://arxiv.org/abs/0901.3929] * Rodriguez, M.A., “Social Decision Making with Multi-Relational Networks and Grammar-Based Particle Swarms,” Hawaii International Conference on Systems Science (HICSS), pp. 39–49, 2007. [http://arxiv.org/abs/cs/0609034] * Rodriguez, M.A., Steinbock, D.J., “A Social Network for Societal-Scale Decision-Making Systems,” Proceedings of the North American Association for Computational Social and Organizational Science Conference, 2004. [http://arxiv.org/abs/cs/0412047]
  • 82. A detour into the property graph data model...
  • 83. Property Graphs and Graph Databases • Most graph databases support a graph data model known as a property graph. • A property graph is a directed, attributed, multi-relational graph. In other words, vertices and edges are equipped with a collection of key/value pairs.37 37 Rodriguez, M.A., Neubauer, P., “Constructions from Dots and Lines,” Bulletin of the American Society for Information Science and Technology, American Society for Information Science and Technology, 2010. [http://arxiv.org/abs/1006.2361]
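The definition above can be sketched as a small data structure. This is a minimal illustration in Python, not the API of any graph database: the class names `Vertex`, `Edge`, and `PropertyGraph`, the vertex and edge ids, and the choice of which edge carries which `created_at` value are all assumptions made for the example. The property values are taken from the slides.

```python
from dataclasses import dataclass, field


@dataclass
class Vertex:
    id: int
    properties: dict = field(default_factory=dict)  # key/value pairs on the vertex


@dataclass
class Edge:
    id: int
    out_v: int   # tail vertex id (the edge leaves this vertex): directed
    in_v: int    # head vertex id (the edge arrives at this vertex)
    label: str   # relation type: multi-relational
    properties: dict = field(default_factory=dict)  # key/value pairs: attributed


class PropertyGraph:
    """Hypothetical in-memory property graph, for illustration only."""

    def __init__(self):
        self.vertices = {}
        self.edges = {}

    def add_vertex(self, vid, **props):
        self.vertices[vid] = Vertex(vid, dict(props))
        return self.vertices[vid]

    def add_edge(self, eid, out_v, label, in_v, **props):
        self.edges[eid] = Edge(eid, out_v, in_v, label, dict(props))
        return self.edges[eid]

    def out_edges(self, vid, label=None):
        # Outgoing edges of a vertex, optionally restricted to one label.
        return [e for e in self.edges.values()
                if e.out_v == vid and (label is None or e.label == label)]


g = PropertyGraph()
g.add_vertex(1, name="marko", location="Santa Fe", gender="male")
g.add_vertex(2, name="sixwing", location="West Hollywood", gender="male")
g.add_edge(10, 1, "friend", 2, created_at=123456)   # ids are assumed
g.add_edge(11, 2, "friend", 1, created_at=234567)
```

Both vertices and edges carry a property dictionary, which is the only difference between this sketch and a plain labeled, directed multigraph.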
  • 84. From a Multi-Relational Graph... friend friend favorite friend friend
  • 85. ...to a Property Graph name=marko location=Santa Fe lat=11111 gender=male long=22222 created_at=123456 friend friend favorite name=sixwing location=West Hollywood gender=male created_at=234567 friend friend created_at=234567
  • 86. Why the Property Graph Model? • Standard single-relational graphs do not provide enough modeling flexibility for use in real-world situations.38 • Multi-relational graphs do and the Web of Data (RDF) world demonstrates this to be the case in practice. • Property graphs are perhaps more practical because not every datum needs to be “related” (e.g. age, name, etc.). Thus, the edge and key/value model is a convenient dichotomy.39 • Property graphs provide finer granularity on the meaning of an edge as the key/values of an edge add extra information beyond the edge label. 38 This is not completely true: researchers use the single-relational graph all the time. However, in most data-rich applications, it is limiting to work with a single edge type and a homogeneous population of vertices. 39 RDF has a similar argument in that literals can only be the object of a triple. However, in practice, when represented in a graph database, there is a single literal vertex denoting that literal and thus, it is traversable like any other vertex.
  • 87. Graph Type Morphisms weighted graph add weight attribute property graph remove attributes remove attributes no op labeled graph no op semantic graph no op directed graph remove edge labels remove edge labels make labels URIs no op remove directionality rdf graph multi-graph remove loops, directionality, and multiple edges simple graph no op undirected graph
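Some of the morphisms named on this slide can be sketched as plain functions over an edge list. This is an illustrative Python sketch under assumptions: the edge tuples, the vertex ids, and the function names are hypothetical, and only three of the arrows in the figure are covered (remove attributes, remove edge labels, and remove loops, directionality, and multiple edges).

```python
# Hypothetical property-graph edges: (out vertex, label, in vertex, properties).
property_edges = [
    (1, "friend", 2, {"created_at": 123456}),
    (2, "friend", 1, {"created_at": 234567}),
    (1, "favorite", 2, {}),
    (2, "friend", 2, {}),  # a loop, included only to show that it is dropped
]


def remove_attributes(edges):
    """Property graph -> labeled graph: drop the key/value pairs."""
    return [(out_v, label, in_v) for out_v, label, in_v, _props in edges]


def remove_edge_labels(edges):
    """Labeled graph -> directed (multi-)graph: drop the edge labels."""
    return [(out_v, in_v) for out_v, _label, in_v in edges]


def to_simple_graph(edges):
    """Drop loops, directionality, and multiple edges."""
    return {frozenset((out_v, in_v)) for out_v, in_v in edges if out_v != in_v}


labeled = remove_attributes(property_edges)
directed = remove_edge_labels(labeled)
simple = to_simple_graph(directed)
```

Each step only discards information, which is why the reverse direction in the figure needs something added (for example a weight attribute).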
  • 88. Toy Graph Dataset lat=11111 long=22222 name=marko created_at=123456 4 name=sixwing location=West Hollywood location=Santa Fe gender=male favorite gender=male friend friend 1 2 3 favorite created_at=234567 friend favorite 6 name=Bryce Canyon favorite 5 name=charlie We will use the toy-graph above to demonstrate Gremlin (to introduce the syntax).
  • 89. Dataset Schema in Neo4j Neo4j [http://neo4j.org] is a “schema-less” database. However, ultimately, data is represented according to some schema whether that schema be explicit in the database, in the code interacting with the database, or in the developer’s head.40 Please note the schema diagrammed below is a non-standard convention.41 name=string name=string location=string lat=double gender=string long=double type=Person type=Place Person Place friend favorite 40 A better term for “schema-less” might have been “dynamic schema.” 41 For expressive, standardized graph-based schema languages, refer to RDFS [http://www.w3.org/TR/ rdf-schema/] and OWL [http://www.w3.org/TR/owl-features/] of the Web of Data community.
  • 90. Dataset Schema in MySQL
      CREATE TABLE friend (
        outV INT NOT NULL,
        inV INT NOT NULL);
      CREATE INDEX friend_outV_index USING BTREE ON friend (outV);
      CREATE INDEX friend_inV_index USING BTREE ON friend (inV);
      CREATE TABLE favorite (
        outV INT NOT NULL,
        inV INT NOT NULL);
      CREATE INDEX favorite_outV_index USING BTREE ON favorite (outV);
      CREATE INDEX favorite_inV_index USING BTREE ON favorite (inV);
      CREATE TABLE metadata (
        vertex INT NOT NULL,
        _key VARCHAR(100) NOT NULL,
        _value VARCHAR(100),
        PRIMARY KEY (vertex, _key));
      CREATE INDEX metadata_vertex_index USING BTREE ON metadata (vertex);
      CREATE INDEX metadata_key_index USING BTREE ON metadata (_key);
      CREATE INDEX metadata_value_index USING BTREE ON metadata (_value);
  • 91. Basic Gremlin
      gremlin> (1 + 2) * 4 div 5
      ==>2.4
      gremlin> ‘marko ’ + ‘a. ’ + ‘rodriguez’
      ==>marko a. rodriguez
      gremlin> func ex:add-one($x)
                 $x + 1
               end
      gremlin> foreach $y in g:list(1,2,3,4)
                 g:print(ex:add-one($y))
               end
      2
      3
      4
      5
  • 92. Searching Example: Friends
      gremlin> $_g := neo4j:open(‘/data/mygraph’)
      gremlin> $_ := g:id-v(1)
      ==>v[1]
      gremlin> .
      ==>v[1]
      gremlin> ./outE
      ==>e[10][1-friend-2]
      ==>e[11][1-friend-3]
      ==>e[12][1-favorite-4]
      gremlin> ./outE[@label=‘friend’]/inV/@name
      ==>sixwing
      ==>marko
      gremlin> ./outE[@label=‘friend’]/inV/@gender
      ==>male
      ==>male
      gremlin> ./outE[@label=‘friend’]/inV[@location=‘Santa Fe’]/@name
      ==>marko
      [Figure: the toy graph dataset from slide 88.]
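The same three lookups can be sketched in Python over an adjacency structure. This is a hedged illustration, not Gremlin or Neo4j code: the `OUT` and `PROPS` dictionaries and the `out` helper are hypothetical, and the assignment of properties to vertex ids 2 and 3 is an assumption chosen to agree with the output order shown in the session above.

```python
# Outgoing edges per vertex: each vertex refers directly to its neighbours,
# so no global index or join is consulted while traversing.
OUT = {
    1: [("friend", 2), ("friend", 3), ("favorite", 4)],
}
# Assumed property assignment (the slide does not fix which id is which).
PROPS = {
    2: {"name": "sixwing", "location": "West Hollywood", "gender": "male"},
    3: {"name": "marko", "location": "Santa Fe", "gender": "male"},
}


def out(vertex_ids, label):
    """Equivalent in spirit to ./outE[@label=label]/inV for each vertex."""
    return [in_v for v in vertex_ids
            for (lbl, in_v) in OUT.get(v, [])
            if lbl == label]


friends = out([1], "friend")
names = [PROPS[v]["name"] for v in friends]
genders = [PROPS[v]["gender"] for v in friends]
in_santa_fe = [PROPS[v]["name"] for v in friends
               if PROPS[v].get("location") == "Santa Fe"]
```

Each path step in the Gremlin expression corresponds to one list comprehension here: select edges by label, move to the head vertex, then read or filter on a property.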
  • 93. Searching Example: Friends in SPARQL The name of tenderlove’s friends... SELECT ?y WHERE { ex:tenderlove ex:friend ?x . ?x ex:name ?y } The gender of tenderlove’s friends... SELECT ?y WHERE { ex:tenderlove ex:friend ?x . ?x ex:gender ?y } The name of tenderlove’s friends who live in Santa Fe... SELECT ?y WHERE { ex:tenderlove ex:friend ?x . ?x ex:livesIn ex:SantaFe . ?x ex:name ?y }
  • 94. Searching Example: FOAF (No Friends, No Self)
      gremlin> .
      ==>v[1]
      gremlin> ./outE[@label=‘friend’]/inV/outE[@label=‘friend’]/inV
      ==>v[1]
      ==>v[1]
      ==>v[5]
      gremlin> (./outE[@label=‘friend’]/inV)[g:assign($x)]
                 /outE[@label=‘friend’]/inV[g:except($_)][g:except($x)]/@name
      ==>charlie
      [Figure: the toy graph dataset from slide 88.]
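A Python sketch of the friend-of-a-friend traversal with both exclusions. The adjacency data is hypothetical: it is wired so that the unfiltered step yields v[1], v[1], v[5] as in the session above, but which friend links to vertex 5 is an assumption.

```python
OUT = {
    1: [("friend", 2), ("friend", 3), ("favorite", 4)],
    2: [("friend", 1), ("favorite", 6)],
    3: [("friend", 1), ("friend", 5), ("favorite", 6)],  # 3 -> 5 is assumed
}
NAMES = {5: "charlie", 6: "Bryce Canyon"}


def out(vertex_ids, label):
    """Follow all outgoing edges with the given label from each vertex."""
    return [in_v for v in vertex_ids
            for (lbl, in_v) in OUT.get(v, [])
            if lbl == label]


start = 1
friends = out([start], "friend")            # plays the role of $x
friends_of_friends = out(friends, "friend")

# g:except($_) drops the start vertex, g:except($x) drops existing friends.
candidates = [v for v in friends_of_friends
              if v != start and v not in friends]
candidate_names = [NAMES[v] for v in candidates]
```

Without the two filters the traversal walks straight back to the start vertex, which is why the unfiltered result contains v[1] twice.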
  • 95. Searching Example: FOAF (No Friends, No Self) in SPARQL The name of tenderlove’s friends’ friends who are not him or his friends. SELECT ?z WHERE { ex:tenderlove ex:friend ?x . ?x ex:friend ?y . ?y ex:name ?z . FILTER ( ?y != ex:tenderlove && ?x != ?y ) }
  • 96. Searching Example: Friend’s Favorites
      gremlin> .
      ==>v[1]
      gremlin> ./outE[@label=‘friend’]/inV/outE[@label=‘favorite’]/inV
      ==>v[6]
      ==>v[6]
      gremlin> ./outE[@label=‘friend’]/inV
                 /outE[@label=‘favorite’ and @created_at > 234500]/inV/@name
      ==>Bryce Canyon
      [Figure: the toy graph dataset from slide 88.]
  • 97. Loading Identical Data into MySQL and Neo4j On my laptop. 10,000,000 edges are created between 100,000 vertices. Random assignment with 50% favorite-edges and 50% friend-edges. This is a dense, relatively unnatural graph—everyone is heavily connected.42 42 The largest Neo4j instance that I know of contained 100,030,002 (100 million) vertices, 3,041,030,000 (3 billion) edges, and 140,120,000 (140 million) properties. This was deployed on Amazon EC2 and was yielding FOAF traversals, on average, in ∼50ms (again, index-free traversal). Figures provided by Todd Stavish (Stav.ish Consulting [http://blog.stavi.sh/]).
  • 98. Play Query “What do my friends’ friends favorite?”
  • 99. Querying Random Vertices with Repeats
      mysql> SELECT count(favorite.inV) FROM friend as fa, friend as fb, favorite
             WHERE fa.outV=XXX AND fa.inV=fb.outV AND fb.inV=favorite.outV;
      29.72 sec -- vertex 110752
      0.330 sec -- vertex 110752 REPEAT
      10.10 sec -- vertex 145893
      11.64 sec -- vertex 126993
      0.250 sec -- vertex 126993 REPEAT
      14.37 sec -- vertex 136442
      6.990 sec -- vertex 154837
      0.240 sec -- vertex 154837 REPEAT
      gremlin> g:count(g:id(XXX)/outE[@label=‘friend’]/inV
                 /outE[@label=‘friend’]/inV/outE[@label=‘favorite’]/inV)
      3.646 sec -- vertex 110752
      0.350 sec -- vertex 110752 REPEAT
      0.756 sec -- vertex 145893
      3.251 sec -- vertex 126993
      0.211 sec -- vertex 126993 REPEAT
      1.462 sec -- vertex 136442
      1.875 sec -- vertex 154837
      0.268 sec -- vertex 154837 REPEAT
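To make the two query styles concrete, the sketch below runs the slide's SQL self-join and an adjacency-list traversal over the same tiny edge set and checks that they count the same paths. It is an illustration only: it uses Python's built-in `sqlite3` rather than MySQL, the edge data is hypothetical, the indexes from the schema slide are omitted, and it makes no attempt to reproduce the timings above.

```python
import sqlite3
from collections import defaultdict

# Hypothetical edges as (outV, inV) pairs.
friend = [(1, 2), (1, 3), (2, 1), (3, 1), (3, 5)]
favorite = [(1, 4), (2, 6), (3, 6), (5, 6)]

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE friend (outV INT NOT NULL, inV INT NOT NULL);
    CREATE TABLE favorite (outV INT NOT NULL, inV INT NOT NULL);
""")
conn.executemany("INSERT INTO friend VALUES (?, ?)", friend)
conn.executemany("INSERT INTO favorite VALUES (?, ?)", favorite)


def count_by_join(start):
    """'What do my friends' friends favorite?' as a three-table join."""
    row = conn.execute(
        "SELECT count(favorite.inV) "
        "FROM friend AS fa, friend AS fb, favorite "
        "WHERE fa.outV = ? AND fa.inV = fb.outV AND fb.inV = favorite.outV",
        (start,),
    ).fetchone()
    return row[0]


def adjacency(pairs):
    adj = defaultdict(list)
    for out_v, in_v in pairs:
        adj[out_v].append(in_v)
    return adj


friend_adj = adjacency(friend)
favorite_adj = adjacency(favorite)


def count_by_traversal(start):
    """The same question, answered by walking from vertex to vertex."""
    return sum(len(favorite_adj[c])
               for b in friend_adj[start]
               for c in friend_adj[b])
```

The join has to match rows across tables at every hop, while the traversal only touches the edges hanging off the vertices it actually visits.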
  • 100. Web of Data Detour
  • 101. A Traversal Detour Through the Web of Data ECS South- Sem- Wiki- BBC Surge ampton LIBRIS Web- company Playcount Radio Central RDF Data ohloh Resex Doap- Buda- Music- space Semantic ReSIST brainz Audio- pest Eurécom Project Flickr Web.org MySpace Scrobbler QDOS SW BME Wiki exporter Wrapper Conference IRIT Corpus Toulouse RAE National BBC BBC Crunch 2001 Science FOAF SIOC ACM BBC Music Later + John Base Revyu Foundation Jamendo Peel profiles Sites TOTP Open- Guides DBLP flickr RKB Project Pub Geo- Euro- wrappr Explorer Guten- Virtuoso Guide names stat Pisa CORDIS berg Sponger eprints BBC Programmes Open Calais RKB riese World Linked ECS Magna- Fact- MDB IEEE New- South- tune book ampton castle RDF Book DBpedia Mashup Linked GeoData lingvoj Freebase LAAS- US CiteSeer Census CNRS W3C DBLP Data IBM WordNet Hannover UniRef GEO UMBEL Species DBLP Gov- Track Berlin Reactome LinkedCT UniParc Open Taxonomy Cyc Yago Drug PROSITE Daily Bank Med Pub GeneID Chem Homolo KEGG UniProt Gene Pfam ProDom Disea- CAS Gene some ChEBI Ontology Symbol OMIM Inter Pro UniSTS PDB HGNC MGI PubMed As of July 2009 Image produced by Richard Cyganiak and Anja Jentzsch. [http://linkeddata.org/]
  • 102. Defining the Web of Data • The Web of Data is similar to the Web of Documents (of common knowledge), but instead of referencing documents (e.g. HTML, images, etc.) with the URI address space, individual data are referenced.43 44
      http://markorodriguez.com, foaf:fundedBy, http://atti.com
      http://markorodriguez.com, foaf:name, Marko Rodriguez
      http://markorodriguez.com, foaf:age, 30
      http://markorodriguez.com, foaf:knows, http://tenderlovemaking.com
    • In graph theoretic terms, the Web of Data is a multi-relational graph defined as G ⊆ (U ∪ B) × U × (U ∪ B ∪ L), where U is the set of all URIs, B is the set of all blank/anonymous nodes, and L is the set of all literals. 43 The Web of Data is also known as the Linked Data Web, the Giant Global Graph, the Semantic Web, the RDF graph, etc. 44 * Rodriguez, M.A., “Interpretations of the Web of Data,” Data Management in the Semantic Web, eds. H. Jin and Z. Lv, Nova Publishing, in press, 2010. [http://arxiv.org/abs/0905.3378] * Rodriguez, M.A., “A Graph Analysis of the Linked Data Cloud,” Technical Report, KRS-2009-01, 2009. [http://arxiv.org/abs/0903.0194]
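The set definition G ⊆ (U ∪ B) × U × (U ∪ B ∪ L) can be read as a type constraint on each position of a triple. The Python sketch below encodes that constraint; the classes `URI`, `BNode`, and `Literal` and the function `is_valid_triple` are hypothetical names for this example, not part of any RDF library, and prefixed names such as `foaf:name` are treated as URIs for simplicity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class URI:        # an element of U
    value: str


@dataclass(frozen=True)
class BNode:      # an element of B (blank/anonymous node)
    label: str


@dataclass(frozen=True)
class Literal:    # an element of L
    value: object


def is_valid_triple(subject, predicate, obj):
    """Check membership in (U ∪ B) × U × (U ∪ B ∪ L)."""
    return (isinstance(subject, (URI, BNode))
            and isinstance(predicate, URI)
            and isinstance(obj, (URI, BNode, Literal)))


marko = URI("http://markorodriguez.com")
G = {
    (marko, URI("foaf:fundedBy"), URI("http://atti.com")),
    (marko, URI("foaf:name"), Literal("Marko Rodriguez")),
    (marko, URI("foaf:age"), Literal(30)),
    (marko, URI("foaf:knows"), URI("http://tenderlovemaking.com")),
}

# Literals may only appear in the object position.
bad = (Literal("Marko Rodriguez"), URI("foaf:name"), marko)
```

The predicate position is what makes the graph multi-relational: every edge carries a URI naming its relation type.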
  • 103. Some of the Datasets on the Web of Data data set domain data set domain data set domain audioscrobbler music govtrack government pubguide books bbclatertotp music homologene biology qdos social bbcplaycountdata music ibm computer rae2001 computer bbcprogrammes media ieee computer rdfbookmashup books budapestbme computer interpro biology rdfohloh social chebi biology jamendo music resex computer crunchbase business laascnrs computer riese government dailymed medical libris books semanticweborg computer dblpberlin computer lingvoj reference semwebcentral social dblphannover computer linkedct medical siocsites social dblprkbexplorer computer linkedmdb movie surgeradio music dbpedia general magnatune music swconferencecorpus computer doapspace social musicbrainz music taxonomy reference drugbank medical myspacewrapper social umbel general eurecom computer opencalais reference uniref biology eurostat government opencyc general unists biology flickrexporter images openguides reference uscensusdata government flickrwrappr images pdb biology virtuososponger reference foafprofiles social pfam biology w3cwordnet reference freebase general pisa computer wikicompany business geneid biology prodom biology worldfactbook government geneontology biology projectgutenberg books yago general geonames geographic prosite biology ...
  • 104. Web of Data Dataset Dependencies homologenekegg projectgutenberg symbol libris cas bbcjohnpeel unists diseasome dailymed w3cwordnet chebi hgnc pubchem eurostat mgi omim wikicompany geospecies geneid reactome drugbank worldfactbook magnatune pubmed opencyc uniparc freebase linkedct uniprot taxonomy interpro uniref geneontologypdb umbel yago pfam dbpedia bbclatertotp govtrack prosite prodom flickrwrappropencalais uscensusdata surgeradio lingvoj linkedmdb virtuososponger homologenekegg projectgutenberg rdfbookmashup symbol libris swconferencecorpus geonames musicbrainz myspacewrapper dblpberlin pubguide cas bbcjohnpeel revyu unists jamendo diseasome dailymed w3cwordnet chebi rdfohloh hgnc bbcplaycountdata pubchem eurostat mgi omim wikicompany geospecies semanticweborg siocsites riese geneid foafprofiles reactome drugbank worldfactbook audioscrobbler bbcprogrammes magnatune dblphannover openguides pubmed opencyc uniparc crunchbase freebase linkedct uniprot taxonomy doapspace interpro uniref geneontology pdb umbel yago pfam dbpedia bbclatertotp govtrack flickrexporter budapestbme qdos prosite prodom flickrwrappropencalais semwebcentral uscensusdata eurecom ecssouthampton dblprkbexplorer surgeradio newcastle lingvoj linkedmdb pisa rae2001 virtuososponger acm eprints irittoulouse rdfbookmashup laascnrs citeseer swconferencecorpus geonames musicbrainz myspacewrapper ieee resex dblpberlin pubguide ibm revyu jamendo rdfohloh bbcplaycountdata semanticweborg siocsites riese foafprofiles openguides audioscrobbler bbcprogrammes dblphannover crunchbase doapspace flickrexporter
  • 105. Web of Data Transforms Development Paradigm A new application development paradigm emerges. No longer do data and application providers need to be the same entity (left). With the Web of Data, it is possible for developers to write applications that utilize data that they do not maintain (right).45 [Figure: on the left, Applications 1, 2, and 3 each process structures held on their own host (127.0.0.1, 127.0.0.2, 127.0.0.3); on the right, the same applications process structures on those hosts through a shared Web of Data layer.] 45 Rodriguez, M.A., “A Reflection on the Structure and Process of the Web of Data,” Bulletin of the American Society for Information Science and Technology, 35(6), pp. 38–43, doi:10.1002/bult.2009.1720350611, 2009. [http://arxiv.org/abs/0908.0373]
  • 106. Extending our Knowledge of Bryce Canyon National Park
      gremlin> $h := lds:open()
      gremlin> $_ := g:id-v($h, ‘http://dbpedia.org/resource/Bryce_Canyon_National_Park’)
      ==>v[http://dbpedia.org/resource/Bryce_Canyon_National_Park]
      gremlin> ./outE
      ==>e[dbpedia:Bryce_Canyon_National_Park - dbpprop:reference - http://www.nps.gov/brca/]
      ==>e[dbpedia:Bryce_Canyon_National_Park - dbpprop:iucnCategory - II@en]
      ==>e[dbpedia:Bryce_Canyon_National_Park - dbpedia-owl:numberOfVisitors - 1012563^^xsd:integer]
      ==>e[dbpedia:Bryce_Canyon_National_Park - skos:subject - dbpedia:Category:Colorado_Plateau]
      ==>e[dbpedia:Bryce_Canyon_National_Park - dbpprop:visitationNum - 1012563^^xsd:int]
      ==>e[dbpedia:Bryce_Canyon_National_Park - dbpedia-owl:abstract - Bryce Canyon National Park is a national park located in southwestern Utah in the United States...@en]
      ==>e[dbpedia:Bryce_Canyon_National_Park - dbpprop:area - 35835.0^^http://dbpedia.org/datatype/acre]
      ==>e[dbpedia:Bryce_Canyon_National_Park - rdf:type - dbpedia-owl:ProtectedArea]
      ==>e[dbpedia:Bryce_Canyon_National_Park - dbpedia-owl:location - dbpedia:Garfield_County%2C_Utah]
      ==>e[dbpedia:Bryce_Canyon_National_Park - dbpprop:nearestCity - dbpedia:Panguitch%2C_Utah]
      ==>e[dbpedia:Bryce_Canyon_National_Park - dbpprop:established - 1928-09-15^^xsd:date]
      ... 46
    46 Linked Data Sail (LDS) was developed by Joshua Shinavier (RPI and TinkerPop) and connects to Gremlin through Gremlin’s native support for Sail (i.e. for RDF graphs). LDS caches the traversed aspects of the Web of Data into any quad-store (e.g. MemoryStore, AllegroGraph, HyperGraphSail, Neo4jSail, etc.).
  • 107. Augmenting Traversals with the Web of Data Let’s extend our query over the Web of Data and perhaps incorporate that data into our searching, scoring, ranking, and recommendation.
      gremlin> $visits := ./outE[@label=‘dbpprop:visitationNum’]/inV/@value
      ==>1012563
      gremlin> $acreage := ./outE[@label=‘dbpprop:area’]/inV/@value
      ==>35835.0
      ### imagine wrapping traversals in Gremlin functions:
      ### func lds:acreage($h, $v) and func lds:visitors($h, $v)
      gremlin> ./outE[@label=‘friend’]/inV/outE[@label=‘favorite’]
                 /inV[lds:acreage($h, .) < 1000000 and lds:visitors($h, .) < 2000000]/@name
      ==>Bryce Canyon
    Thus, what do tenderlove’s friends favorite that are small in acreage and visitation?47 47 In Gremlin, it is possible to have multiple graphs open in parallel and thus mix and match data from each graph as desired. Hence, as the example above demonstrates, it is possible to mix Web of Data RDF graph data and Blueprints property graph data.
  • 108. Using the Web of Data for Music Recommendation Yet another aside: Using only Web of Data data to recommend musicians/bands with a simplistic, edge-boolean spreading activation algorithm.48
      gremlin> $_ := g:id(‘http://dbpedia.../Grateful_Dead’)
      ==>v[http://dbpedia.../Grateful_Dead]
      gremlin> lds:spreading-activation(.)
      ==>Jerry Garcia Acoustic Band
      ==>BK3
      ==>Phil Lesh and Friends
      ==>Old and In the Way
      ==>RatDog
      ==>The Dead
      ==>Heart of Gold Band
      ==>Legion of Mary
      ==>The Tubes
      ==>Bob Dylan
      ==>New Riders of the Purple Sage
      ==>Bruce Hornsby
      ==>Donna Jean Godchaux
      ==>Kingfish
      ==>Jerry Garcia Band
      ==>Donna Jean Godchaux Band
      ==>The Other Ones
      ==>Bobby and the Midnites
      ==>Furthur
      ==>Rhythm Devils
    48 Please read the following for interesting, deeper ideas in this space: Clark, A., “Associative Engines: Connectionism, Concepts, and Representational Change,” MIT Press, 1993.
  • 109. Another View of the TinkerPop Stack GET http://{host}/{resource} Local Dataset Web of Data owl:sameAs
  • 111. Extending the Schema for Some Richer Examples For the last part of this presentation on recommendation, we will extend the data schema to include tags (a place can be tagged with a tag). This will allow for some richer examples.49 50 [Schema figure: Person (name=string, location=string, gender=string, type=Person), Place (name=string, lat=double, long=double, type=Place), Tag (name=string, type=Tag); edge labels: friend, favorite, tagged.] 49 Please note that 1.) “place” can be item/thing/book/music/etc. 2.) “favorite” can be likes/purchased/visited/etc. 3.) “tag” can be category/etc. A particular use case is presented, but with a little imagination, application to other schemas is, of course, plausible. 50 The following examples use experimental syntax that may differ slightly from the official Gremlin 0.5 release.
  • 112. Recommendation Example: Friend Finder • Open Friendship Triangles: (V × Ψ) → (V × N+)51 (people)
      1. Create return map (i.e. V × N+).
      2. Determine who my friends are.
      3. Determine who my friends’ friends are...
      4. ...that are not already my friends or me (weighted by the number of overlapping friends: more overlaps, more traversers at that user vertex).
      5. Sort return map by number of traversers at those user/people vertices.
      $m := g:map()
      (./outE[@label=‘friend’]/inV)[g:assign($x)]
        /outE[@label=‘friend’]/inV
        /.[g:except($x)][g:except($_)][g:op-value(‘+’,$m,.,1)]
      g:sort($m,‘value’,true)
    51 R_x ◦ (A^friend · A^friend) ◦ n(A^friend) ◦ n(I), where x is the user/person vertex. The in-degree centrality vector of the derived adjacency matrix determines the resultant V rank.