SlideShare a Scribd company logo
1 of 80
Download to read offline
Outline
                                  Introduction
                    The Realization of Graphs
                             Graph Traversals
                                    Conclusion




               The Graph Traversal Pattern
           (Marko A. Rodriguez, Peter Neubauer)


            Igor Bogicevic (igor.bogicevic@sbgenomics.com)




                                   September 5, 2010




Igor Bogicevic (igor.bogicevic@sbgenomics.com)   The Graph Traversal Pattern(Marko A. Rodriguez, Peter Neubauer)
Outline
                                               Introduction
                                 The Realization of Graphs
                                          Graph Traversals
                                                 Conclusion




Introduction


The Realization of Graphs
   Brief Introductory
   The Indices of Relational Tables
   The Graph as an Index


Graph Traversals
   Definition of Graph Traversals
   Traversing for Recommendation
   Content-Based Recommendation
   Collaborative Filtering-Based Recommendation
   Traversing Endogenous Indices


Conclusion




             Igor Bogicevic (igor.bogicevic@sbgenomics.com)   The Graph Traversal Pattern(Marko A. Rodriguez, Peter Neubauer)
Outline
                                                 Introduction
                                   The Realization of Graphs
                                            Graph Traversals
                                                   Conclusion


Introduction


       In the most common sense of the term, a graph is an ordered pair G = (V , E )
       comprising a set V of vertices or nodes together with a set E of edges or lines,
       which are 2-element subsets of V (i.e, an edge is related with two vertices, and
       the relation is represented as unordered pair of the vertices with respect to the
       particular edge). To avoid ambiguity, this type of graph may be described
       precisely as undirected and simple.




               Igor Bogicevic (igor.bogicevic@sbgenomics.com)   The Graph Traversal Pattern(Marko A. Rodriguez, Peter Neubauer)
Outline
                                                 Introduction
                                   The Realization of Graphs
                                            Graph Traversals
                                                   Conclusion


Introduction


       In the most common sense of the term, a graph is an ordered pair G = (V , E )
       comprising a set V of vertices or nodes together with a set E of edges or lines,
       which are 2-element subsets of V (i.e, an edge is related with two vertices, and
       the relation is represented as unordered pair of the vertices with respect to the
       particular edge). To avoid ambiguity, this type of graph may be described
       precisely as undirected and simple.
       For directed graphs, E ⊆ (V × V ) and for undirected graphs, E ⊆ {V × V }. That
       is, E is a subset of all ordered or unordered permutations of V element pairings.




               Igor Bogicevic (igor.bogicevic@sbgenomics.com)   The Graph Traversal Pattern(Marko A. Rodriguez, Peter Neubauer)
Outline
                                                 Introduction
                                   The Realization of Graphs
                                            Graph Traversals
                                                   Conclusion


Introduction


       In the most common sense of the term, a graph is an ordered pair G = (V , E )
       comprising a set V of vertices or nodes together with a set E of edges or lines,
       which are 2-element subsets of V (i.e, an edge is related with two vertices, and
       the relation is represented as unordered pair of the vertices with respect to the
       particular edge). To avoid ambiguity, this type of graph may be described
       precisely as undirected and simple.
       For directed graphs, E ⊆ (V × V ) and for undirected graphs, E ⊆ {V × V }. That
       is, E is a subset of all ordered or unordered permutations of V element pairings.
       In different contexts it may be useful to define the term graph with different
       degrees of generality. Whenever it is necessary to draw a strict distinction, the
       following terms are used. Most commonly, in modern texts in graph theory, unless
       stated otherwise, graph means ”undirected simple finite graph”




               Igor Bogicevic (igor.bogicevic@sbgenomics.com)   The Graph Traversal Pattern(Marko A. Rodriguez, Peter Neubauer)
Outline
                                                 Introduction
                                   The Realization of Graphs
                                            Graph Traversals
                                                   Conclusion


Introduction


       In the most common sense of the term, a graph is an ordered pair G = (V , E )
       comprising a set V of vertices or nodes together with a set E of edges or lines,
       which are 2-element subsets of V (i.e, an edge is related with two vertices, and
       the relation is represented as unordered pair of the vertices with respect to the
       particular edge). To avoid ambiguity, this type of graph may be described
       precisely as undirected and simple.
       For directed graphs, E ⊆ (V × V ) and for undirected graphs, E ⊆ {V × V }. That
       is, E is a subset of all ordered or unordered permutations of V element pairings.
       In different contexts it may be useful to define the term graph with different
       degrees of generality. Whenever it is necessary to draw a strict distinction, the
       following terms are used. Most commonly, in modern texts in graph theory, unless
       stated otherwise, graph means ”undirected simple finite graph”
       We can talk about various other types of graphs: Mixed graphs, Multigraphs,
       Weighted graphs, etc.




               Igor Bogicevic (igor.bogicevic@sbgenomics.com)   The Graph Traversal Pattern(Marko A. Rodriguez, Peter Neubauer)
Outline
                                               Introduction   Brief Introductory
                                 The Realization of Graphs    The Indices of Relational Tables
                                          Graph Traversals    The Graph as an Index
                                                 Conclusion


Brief Introductory


        Relational databases have been around since the late 1960s (Edgar F. Codd, A
        relational model of data for large shared data banks) and are todays most
        predominate data management tool. Relational databases maintain a collection
        of tables. Each table can be defined by a set of rows and a set of columns.
        Semantically, rows denote objects and columns denote properties/attributes.




             Igor Bogicevic (igor.bogicevic@sbgenomics.com)   The Graph Traversal Pattern(Marko A. Rodriguez, Peter Neubauer)
Outline
                                                Introduction   Brief Introductory
                                  The Realization of Graphs    The Indices of Relational Tables
                                           Graph Traversals    The Graph as an Index
                                                  Conclusion


Brief Introductory


        Relational databases have been around since the late 1960s (Edgar F. Codd, A
        relational model of data for large shared data banks) and are todays most
        predominate data management tool. Relational databases maintain a collection
        of tables. Each table can be defined by a set of rows and a set of columns.
        Semantically, rows denote objects and columns denote properties/attributes.
        In contrast, graph databases do not store data in disparate tables. Instead there
        is a single data structure—the graph. Moreover, there is no concept of a “join”
        operation as every vertex and edge has a direct reference to its adjacent vertex or
        edge. The data structure is already “joined” by the edges that are defined.




              Igor Bogicevic (igor.bogicevic@sbgenomics.com)   The Graph Traversal Pattern(Marko A. Rodriguez, Peter Neubauer)
Outline
                                                Introduction   Brief Introductory
                                  The Realization of Graphs    The Indices of Relational Tables
                                           Graph Traversals    The Graph as an Index
                                                  Conclusion


Brief Introductory


        Relational databases have been around since the late 1960s (Edgar F. Codd, A
        relational model of data for large shared data banks) and are todays most
        predominate data management tool. Relational databases maintain a collection
        of tables. Each table can be defined by a set of rows and a set of columns.
        Semantically, rows denote objects and columns denote properties/attributes.
        In contrast, graph databases do not store data in disparate tables. Instead there
        is a single data structure—the graph. Moreover, there is no concept of a “join”
        operation as every vertex and edge has a direct reference to its adjacent vertex or
        edge. The data structure is already “joined” by the edges that are defined.
        Main drawback - it’s hard to shard a graph.




              Igor Bogicevic (igor.bogicevic@sbgenomics.com)   The Graph Traversal Pattern(Marko A. Rodriguez, Peter Neubauer)
Outline
                                                Introduction   Brief Introductory
                                  The Realization of Graphs    The Indices of Relational Tables
                                           Graph Traversals    The Graph as an Index
                                                  Conclusion


Brief Introductory


        Relational databases have been around since the late 1960s (Edgar F. Codd, A
        relational model of data for large shared data banks) and are todays most
        predominate data management tool. Relational databases maintain a collection
        of tables. Each table can be defined by a set of rows and a set of columns.
        Semantically, rows denote objects and columns denote properties/attributes.
        In contrast, graph databases do not store data in disparate tables. Instead there
        is a single data structure—the graph. Moreover, there is no concept of a “join”
        operation as every vertex and edge has a direct reference to its adjacent vertex or
        edge. The data structure is already “joined” by the edges that are defined.
        Main drawback - it’s hard to shard a graph.
        Main benefit - constant time cost for retrieving an adjacent vertex or edge. In
        other words, regardless of the size of the graph as a whole, the cost of a local
        read operation at a vertex or edge remains constant.




              Igor Bogicevic (igor.bogicevic@sbgenomics.com)   The Graph Traversal Pattern(Marko A. Rodriguez, Peter Neubauer)
Outline
                                                Introduction   Brief Introductory
                                  The Realization of Graphs    The Indices of Relational Tables
                                           Graph Traversals    The Graph as an Index
                                                  Conclusion


The Indices of Relational Tables




        Direct data access is inefficient - for n elements, a linear scan runs in O(n)




              Igor Bogicevic (igor.bogicevic@sbgenomics.com)   The Graph Traversal Pattern(Marko A. Rodriguez, Peter Neubauer)
Outline
                                                Introduction   Brief Introductory
                                  The Realization of Graphs    The Indices of Relational Tables
                                           Graph Traversals    The Graph as an Index
                                                  Conclusion


The Indices of Relational Tables




        Direct data access is inefficient - for n elements, a linear scan runs in O(n)
        Binary search tree dramatically reduces the overhead - searching such indices
        takes O(log2 n)




              Igor Bogicevic (igor.bogicevic@sbgenomics.com)   The Graph Traversal Pattern(Marko A. Rodriguez, Peter Neubauer)
Outline
                                                Introduction   Brief Introductory
                                  The Realization of Graphs    The Indices of Relational Tables
                                           Graph Traversals    The Graph as an Index
                                                  Conclusion


The Indices of Relational Tables




        Direct data access is inefficient - for n elements, a linear scan runs in O(n)
        Binary search tree dramatically reduces the overhead - searching such indices
        takes O(log2 n)
        In effect, the join operation forms a graph that is dynamically constructed as one
        table is linked to another table. While having the benefit of being able to
        dynamically construct graphs, the limitation is that this graph is not explicit in
        the relational structure, but instead must be inferred through a series of
        index-intensive operations.




              Igor Bogicevic (igor.bogicevic@sbgenomics.com)   The Graph Traversal Pattern(Marko A. Rodriguez, Peter Neubauer)
Outline
                                                Introduction   Brief Introductory
                                  The Realization of Graphs    The Indices of Relational Tables
                                           Graph Traversals    The Graph as an Index
                                                  Conclusion


The Indices of Relational Tables


        person.identifier index                   person.name index                          friend.person_a index




                        identifier    name                                                 person_a      person_b
                           1      Alberto Pepe                                                1             2
                             2             ...                                                    1         3
                             3             ...                                                    1         4
                             4             ...                                                    ...       ...



                        Figure: A table representation of people and their friends.




              Igor Bogicevic (igor.bogicevic@sbgenomics.com)   The Graph Traversal Pattern(Marko A. Rodriguez, Peter Neubauer)
Outline
                                               Introduction   Brief Introductory
                                 The Realization of Graphs    The Indices of Relational Tables
                                          Graph Traversals    The Graph as an Index
                                                 Conclusion


The Graph as an Index
       A single-relational graph maintains a set of edges, where all the edges are
       homogeneous in meaning. This means that separate relationships (i.e. friendship,
       kinship, etc.) are stored as separate graphs.




             Igor Bogicevic (igor.bogicevic@sbgenomics.com)   The Graph Traversal Pattern(Marko A. Rodriguez, Peter Neubauer)
Outline
                                               Introduction   Brief Introductory
                                 The Realization of Graphs    The Indices of Relational Tables
                                          Graph Traversals    The Graph as an Index
                                                 Conclusion


The Graph as an Index
       A single-relational graph maintains a set of edges, where all the edges are
       homogeneous in meaning. This means that separate relationships (i.e. friendship,
       kinship, etc.) are stored as separate graphs.
       Formally, a property graph can be defined as G = (V , E , λ, µ), where edges are
       directed (i.e. E ⊆ (V × V )), edges are labeled (i.e. λ : E → Σ), and properties
       are a map from elements and keys to values (i.e. µ : (V ∪ E ) × R → S).




             Igor Bogicevic (igor.bogicevic@sbgenomics.com)   The Graph Traversal Pattern(Marko A. Rodriguez, Peter Neubauer)
Outline
                                               Introduction   Brief Introductory
                                 The Realization of Graphs    The Indices of Relational Tables
                                          Graph Traversals    The Graph as an Index
                                                 Conclusion


The Graph as an Index
       A single-relational graph maintains a set of edges, where all the edges are
       homogeneous in meaning. This means that separate relationships (i.e. friendship,
       kinship, etc.) are stored as separate graphs.
       Formally, a property graph can be defined as G = (V , E , λ, µ), where edges are
       directed (i.e. E ⊆ (V × V )), edges are labeled (i.e. λ : E → Σ), and properties
       are a map from elements and keys to values (i.e. µ : (V ∪ E ) × R → S).
       In the property graph model, it is common for the properties of the vertices (and
       sometimes edges) to be indexed using a tree structure analogous, in many ways,
       to those used by relational databases. This index can be represented by some
       external indexing system or endogenous to the graph as an embedded tree.




             Igor Bogicevic (igor.bogicevic@sbgenomics.com)   The Graph Traversal Pattern(Marko A. Rodriguez, Peter Neubauer)
Outline
                                               Introduction   Brief Introductory
                                 The Realization of Graphs    The Indices of Relational Tables
                                          Graph Traversals    The Graph as an Index
                                                 Conclusion


The Graph as an Index
       A single-relational graph maintains a set of edges, where all the edges are
       homogeneous in meaning. This means that separate relationships (i.e. friendship,
       kinship, etc.) are stored as separate graphs.
       Formally, a property graph can be defined as G = (V , E , λ, µ), where edges are
       directed (i.e. E ⊆ (V × V )), edges are labeled (i.e. λ : E → Σ), and properties
       are a map from elements and keys to values (i.e. µ : (V ∪ E ) × R → S).
       In the property graph model, it is common for the properties of the vertices (and
       sometimes edges) to be indexed using a tree structure analogous, in many ways,
       to those used by relational databases. This index can be represented by some
       external indexing system or endogenous to the graph as an embedded tree.
       The domain model defines how the elements of the problem space are related.




             Igor Bogicevic (igor.bogicevic@sbgenomics.com)   The Graph Traversal Pattern(Marko A. Rodriguez, Peter Neubauer)
Outline
                                               Introduction   Brief Introductory
                                 The Realization of Graphs    The Indices of Relational Tables
                                          Graph Traversals    The Graph as an Index
                                                 Conclusion


The Graph as an Index
       A single-relational graph maintains a set of edges, where all the edges are
       homogeneous in meaning. This means that separate relationships (i.e. friendship,
       kinship, etc.) are stored as separate graphs.
       Formally, a property graph can be defined as G = (V , E , λ, µ), where edges are
       directed (i.e. E ⊆ (V × V )), edges are labeled (i.e. λ : E → Σ), and properties
       are a map from elements and keys to values (i.e. µ : (V ∪ E ) × R → S).
       In the property graph model, it is common for the properties of the vertices (and
       sometimes edges) to be indexed using a tree structure analogous, in many ways,
       to those used by relational databases. This index can be represented by some
       external indexing system or endogenous to the graph as an embedded tree.
       The domain model defines how the elements of the problem space are related.
       The domain model partitions elements using semantics defined by the domain
       modeler. Thus, in many ways, a graph can be seen as an indexing structure.




             Igor Bogicevic (igor.bogicevic@sbgenomics.com)   The Graph Traversal Pattern(Marko A. Rodriguez, Peter Neubauer)
Outline
                                               Introduction   Brief Introductory
                                 The Realization of Graphs    The Indices of Relational Tables
                                          Graph Traversals    The Graph as an Index
                                                 Conclusion


The Graph as an Index
       A single-relational graph maintains a set of edges, where all the edges are
       homogeneous in meaning. This means that separate relationships (i.e. friendship,
       kinship, etc.) are stored as separate graphs.
       Formally, a property graph can be defined as G = (V , E , λ, µ), where edges are
       directed (i.e. E ⊆ (V × V )), edges are labeled (i.e. λ : E → Σ), and properties
       are a map from elements and keys to values (i.e. µ : (V ∪ E ) × R → S).
       In the property graph model, it is common for the properties of the vertices (and
       sometimes edges) to be indexed using a tree structure analogous, in many ways,
       to those used by relational databases. This index can be represented by some
       external indexing system or endogenous to the graph as an embedded tree.
       The domain model defines how the elements of the problem space are related.
       The domain model partitions elements using semantics defined by the domain
       modeler. Thus, in many ways, a graph can be seen as an indexing structure.
       In a graph database, there is no explicit join operation because vertices maintain
       direct references to their adjacent edges. In many ways, the edges of the graph
       serve as explicit, “hard-wired” join structures, not computed in run time as joins.




             Igor Bogicevic (igor.bogicevic@sbgenomics.com)   The Graph Traversal Pattern(Marko A. Rodriguez, Peter Neubauer)
Outline
                                               Introduction   Brief Introductory
                                 The Realization of Graphs    The Indices of Relational Tables
                                          Graph Traversals    The Graph as an Index
                                                 Conclusion


The Graph as an Index
       A single-relational graph maintains a set of edges, where all the edges are
       homogeneous in meaning. This means that separate relationships (i.e. friendship,
       kinship, etc.) are stored as separate graphs.
       Formally, a property graph can be defined as G = (V , E , λ, µ), where edges are
       directed (i.e. E ⊆ (V × V )), edges are labeled (i.e. λ : E → Σ), and properties
       are a map from elements and keys to values (i.e. µ : (V ∪ E ) × R → S).
       In the property graph model, it is common for the properties of the vertices (and
       sometimes edges) to be indexed using a tree structure analogous, in many ways,
       to those used by relational databases. This index can be represented by some
       external indexing system or endogenous to the graph as an embedded tree.
       The domain model defines how the elements of the problem space are related.
       The domain model partitions elements using semantics defined by the domain
       modeler. Thus, in many ways, a graph can be seen as an indexing structure.
       In a graph database, there is no explicit join operation because vertices maintain
       direct references to their adjacent edges. In many ways, the edges of the graph
       serve as explicit, “hard-wired” join structures, not computed in run time as joins.
       The act of traversing over an edge is the act of joining. However, what makes
       this more efficient in a graph database is that traversing from one vertex to
       another is a constant time operation. Thus, traversal time is defined solely by the
       number of elements touched by the traversal.
             Igor Bogicevic (igor.bogicevic@sbgenomics.com)   The Graph Traversal Pattern(Marko A. Rodriguez, Peter Neubauer)
Outline
                                                 Introduction   Brief Introductory
                                   The Realization of Graphs    The Indices of Relational Tables
                                            Graph Traversals    The Graph as an Index
                                                   Conclusion


The Graph as an Index


                                                                                                   name=...

                                                                                                      2
                vertex.name index                                             friend
                                            name=Alberto Pepe                                      name=...

                                                                1                friend               3

                                                                                            ...      ...
                                                                             friend                name=...

                                                                                                      4


   Figure: A graph representation of people and their friends. Given the tree-nature of the
   vertex.name index, it is possible, and many times useful to model the index endogenous to the
   graph.




               Igor Bogicevic (igor.bogicevic@sbgenomics.com)   The Graph Traversal Pattern(Marko A. Rodriguez, Peter Neubauer)
Outline   Definition of Graph Traversals
                                               Introduction   Traversing for Recommendation
                                 The Realization of Graphs    Content-Based Recommendation
                                          Graph Traversals    Collaborative Filtering-Based Recommendation
                                                 Conclusion   Traversing Endogenous Indices


Definition of Graph Traversals



       Graph traversal refers to visiting elements (i.e. vertices and edges) in a graph in
       some algorithmic fashion.




             Igor Bogicevic (igor.bogicevic@sbgenomics.com)   The Graph Traversal Pattern(Marko A. Rodriguez, Peter Neubauer)
Outline   Definition of Graph Traversals
                                               Introduction   Traversing for Recommendation
                                 The Realization of Graphs    Content-Based Recommendation
                                          Graph Traversals    Collaborative Filtering-Based Recommendation
                                                 Conclusion   Traversing Endogenous Indices


Definition of Graph Traversals



       Graph traversal refers to visiting elements (i.e. vertices and edges) in a graph in
       some algorithmic fashion.
       The most primitive, read-based operation on a graph is a single step traversal
       from element i to element j, where i, j ∈ (V ∪ E ).




             Igor Bogicevic (igor.bogicevic@sbgenomics.com)   The Graph Traversal Pattern(Marko A. Rodriguez, Peter Neubauer)
Outline   Definition of Graph Traversals
                                               Introduction   Traversing for Recommendation
                                 The Realization of Graphs    Content-Based Recommendation
                                          Graph Traversals    Collaborative Filtering-Based Recommendation
                                                 Conclusion   Traversing Endogenous Indices


Definition of Graph Traversals



       Graph traversal refers to visiting elements (i.e. vertices and edges) in a graph in
       some algorithmic fashion.
       The most primitive, read-based operation on a graph is a single step traversal
       from element i to element j, where i, j ∈ (V ∪ E ).
       These operations are defined over power multiset domains and ranges - power
                              ˆ
       multiset of A, denoted P(A), is the infinite set of all subsets of multisets of A.




             Igor Bogicevic (igor.bogicevic@sbgenomics.com)   The Graph Traversal Pattern(Marko A. Rodriguez, Peter Neubauer)
Outline   Definition of Graph Traversals
                                               Introduction   Traversing for Recommendation
                                 The Realization of Graphs    Content-Based Recommendation
                                          Graph Traversals    Collaborative Filtering-Based Recommendation
                                                 Conclusion   Traversing Endogenous Indices


Definition of Graph Traversals



       Graph traversal refers to visiting elements (i.e. vertices and edges) in a graph in
       some algorithmic fashion.
       The most primitive, read-based operation on a graph is a single step traversal
       from element i to element j, where i, j ∈ (V ∪ E ).
       These operations are defined over power multiset domains and ranges - power
                                 ˆ
       multiset of A, denoted P(A), is the infinite set of all subsets of multisets of A.
       Let’s list single step traversals:
           eout : P(V ) → P(E ): traverse to the outgoing edges of the vertices.
                   ˆ       ˆ
           ein : P(V ) → P(E ): traverse to the incoming edges to the vertices.
                  ˆ       ˆ
           vout : P(E ) → P(V ): traverse to the outgoing (i.e. tail) vertices of the edges.
                   ˆ       ˆ
           vin : P(E ) → P(V ): traverse the incoming (i.e. head) vertices of the edges.
                  ˆ      ˆ
             : P(V ∪ E ) × R → P(S): get the element property values for key r ∈ R.
                ˆ                 ˆ




             Igor Bogicevic (igor.bogicevic@sbgenomics.com)   The Graph Traversal Pattern(Marko A. Rodriguez, Peter Neubauer)
Outline   Definition of Graph Traversals
                                               Introduction   Traversing for Recommendation
                                 The Realization of Graphs    Content-Based Recommendation
                                          Graph Traversals    Collaborative Filtering-Based Recommendation
                                                 Conclusion   Traversing Endogenous Indices


Definition of Graph Traversals
       When edges are labeled and elements have properties, it is desirable to constrain
       the traversal to edges of a particular label or elements with particular properties.
       These operations are known as filters and are abstractly defined in the following
       itemization:
           elab± : P(E ) × Σ → P(E ): allow (or filter) all edges with the label σ ∈ Σ.
                    ˆ           ˆ
            p± : P(V ∪ E ) × R × S → P(V ∪ E ): allow (or filter) all elements with the property
                  ˆ                     ˆ
           s ∈ S for key r ∈ R.
              ± : P(V ∪ E ) × (V ∪ E ) → P(V ∪ E ): allow (or filter) all elements that are the
                  ˆ                       ˆ
           provided element.




             Igor Bogicevic (igor.bogicevic@sbgenomics.com)   The Graph Traversal Pattern(Marko A. Rodriguez, Peter Neubauer)
Outline   Definition of Graph Traversals
                                                 Introduction   Traversing for Recommendation
                                   The Realization of Graphs    Content-Based Recommendation
                                            Graph Traversals    Collaborative Filtering-Based Recommendation
                                                   Conclusion   Traversing Endogenous Indices


Definition of Graph Traversals
       When edges are labeled and elements have properties, it is desirable to constrain
       the traversal to edges of a particular label or elements with particular properties.
       These operations are known as filters and are abstractly defined in the following
       itemization:
           elab± : P(E ) × Σ → P(E ): allow (or filter) all edges with the label σ ∈ Σ.
                    ˆ           ˆ
            p± : P(V ∪ E ) × R × S → P(V ∪ E ): allow (or filter) all elements with the property
                  ˆ                     ˆ
           s ∈ S for key r ∈ R.
              ± : P(V ∪ E ) × (V ∪ E ) → P(V ∪ E ): allow (or filter) all elements that are the
                  ˆ                       ˆ
           provided element.
       Through function composition, we can define graph traversals of arbitrary length,
       i.e. Alberto Pepe’s friends:
                                         ˆ       ˆ
                                     f : P(V ) → P(S),
       where
                                   f (i) = (vin (elab+ (eout (i), friend)) , name) ,
       then f (i) will return the names of Alberto Pepe’s friends. Through function
       currying and composition, the previous definition can be represented more clearly
       with the following function rule,
                                      “                         ”
                               f (i) = name ◦ vin ◦ elab+ ◦ eout (i).
                                                     friend




               Igor Bogicevic (igor.bogicevic@sbgenomics.com)   The Graph Traversal Pattern(Marko A. Rodriguez, Peter Neubauer)
Outline      Definition of Graph Traversals
                                              Introduction      Traversing for Recommendation
                                The Realization of Graphs       Content-Based Recommendation
                                         Graph Traversals       Collaborative Filtering-Based Recommendation
                                                Conclusion      Traversing Endogenous Indices


Definition of Graph Traversals


                                                                                  name=...

                                                                                    2
                                                             friend
                          name=Alberto Pepe                                       name=...

                                               1               friend               3


                                                                                  name=...
                                         eout                friend
                                                                                    4

                                              efriend
                                               lab+
                                                                        vin             name

                            Figure: A single path along along the f traversal.




            Igor Bogicevic (igor.bogicevic@sbgenomics.com)      The Graph Traversal Pattern(Marko A. Rodriguez, Peter Neubauer)
Outline   Definition of Graph Traversals
                                               Introduction   Traversing for Recommendation
                                 The Realization of Graphs    Content-Based Recommendation
                                          Graph Traversals    Collaborative Filtering-Based Recommendation
                                                 Conclusion   Traversing Endogenous Indices


Traversing for Recommendation
       Recommendation systems are designed to help people deal with the problem of
       information overload by filtering information in the system that doesn’t pertain to
       the person.




             Igor Bogicevic (igor.bogicevic@sbgenomics.com)   The Graph Traversal Pattern(Marko A. Rodriguez, Peter Neubauer)
Outline   Definition of Graph Traversals
                                               Introduction   Traversing for Recommendation
                                 The Realization of Graphs    Content-Based Recommendation
                                          Graph Traversals    Collaborative Filtering-Based Recommendation
                                                 Conclusion   Traversing Endogenous Indices


Traversing for Recommendation
       Recommendation systems are designed to help people deal with the problem of
       information overload by filtering information in the system that doesn’t pertain to
       the person.
       Content-based recommendation deals with recommending resources that share
       characteristics (i.e. content) with a set of resources.




             Igor Bogicevic (igor.bogicevic@sbgenomics.com)   The Graph Traversal Pattern(Marko A. Rodriguez, Peter Neubauer)
Outline   Definition of Graph Traversals
                                               Introduction   Traversing for Recommendation
                                 The Realization of Graphs    Content-Based Recommendation
                                          Graph Traversals    Collaborative Filtering-Based Recommendation
                                                 Conclusion   Traversing Endogenous Indices


Traversing for Recommendation
       Recommendation systems are designed to help people deal with the problem of
       information overload by filtering information in the system that doesn’t pertain to
       the person.
       Content-based recommendation deals with recommending resources that share
       characteristics (i.e. content) with a set of resources.
       On the other side collaborative filtering-based recommendation in concerned with
       determining the similarity of resources based upon the similarity of the taste of
       the people modeled within the system.




             Igor Bogicevic (igor.bogicevic@sbgenomics.com)   The Graph Traversal Pattern(Marko A. Rodriguez, Peter Neubauer)
Outline   Definition of Graph Traversals
                                               Introduction   Traversing for Recommendation
                                 The Realization of Graphs    Content-Based Recommendation
                                          Graph Traversals    Collaborative Filtering-Based Recommendation
                                                 Conclusion   Traversing Endogenous Indices


Traversing for Recommendation
       Recommendation systems are designed to help people deal with the problem of
       information overload by filtering information in the system that doesn’t pertain to
       the person.
       Content-based recommendation deals with recommending resources that share
       characteristics (i.e. content) with a set of resources.
       On the other side collaborative filtering-based recommendation in concerned with
       determining the similarity of resources based upon the similarity of the taste of
       the people modeled within the system.
       Graph databases provide a good paradigm for dealing with both recommendation
       techniques.




             Igor Bogicevic (igor.bogicevic@sbgenomics.com)   The Graph Traversal Pattern(Marko A. Rodriguez, Peter Neubauer)
Outline   Definition of Graph Traversals
                                               Introduction   Traversing for Recommendation
                                 The Realization of Graphs    Content-Based Recommendation
                                          Graph Traversals    Collaborative Filtering-Based Recommendation
                                                 Conclusion   Traversing Endogenous Indices


Traversing for Recommendation
       Recommendation systems are designed to help people deal with the problem of
       information overload by filtering information in the system that doesn’t pertain to
       the person.
       Content-based recommendation deals with recommending resources that share
       characteristics (i.e. content) with a set of resources.
       On the other side collaborative filtering-based recommendation in concerned with
       determining the similarity of resources based upon the similarity of the taste of
       the people modeled within the system.
       Graph databases provide a good paradigm for dealing with both recommendation
       techniques.
       Furthermore, using the graph traversal pattern, there exists a single graph data
       structure that can be traversed in different ways to expose different types of
       recommendations—generally, different types of relationships between vertices.
       Being able to mix and match the types of traversals executed alters the semantics
       of the final rankings and conveniently allows for hybrid recommendation
       algorithms to emerge.




             Igor Bogicevic (igor.bogicevic@sbgenomics.com)   The Graph Traversal Pattern(Marko A. Rodriguez, Peter Neubauer)
Outline   Definition of Graph Traversals
                                               Introduction   Traversing for Recommendation
                                 The Realization of Graphs    Content-Based Recommendation
                                          Graph Traversals    Collaborative Filtering-Based Recommendation
                                                 Conclusion   Traversing Endogenous Indices


Traversing for Recommendation
       Recommendation systems are designed to help people deal with the problem of
       information overload by filtering information in the system that doesn’t pertain to
       the person.
       Content-based recommendation deals with recommending resources that share
       characteristics (i.e. content) with a set of resources.
       On the other side collaborative filtering-based recommendation in concerned with
       determining the similarity of resources based upon the similarity of the taste of
       the people modeled within the system.
       Graph databases provide a good paradigm for dealing with both recommendation
       techniques.
       Furthermore, using the graph traversal pattern, there exists a single graph data
       structure that can be traversed in different ways to expose different types of
       recommendations—generally, different types of relationships between vertices.
       Being able to mix and match the types of traversals executed alters the semantics
       of the final rankings and conveniently allows for hybrid recommendation
       algorithms to emerge.
       Following figure presents a toy graph data set, where there exist a set of people,
       resources, and features related to each other by likes- and feature-labeled
       edges.

             Igor Bogicevic (igor.bogicevic@sbgenomics.com)   The Graph Traversal Pattern(Marko A. Rodriguez, Peter Neubauer)
Outline     Definition of Graph Traversals
                                                 Introduction     Traversing for Recommendation
                                   The Realization of Graphs      Content-Based Recommendation
                                            Graph Traversals      Collaborative Filtering-Based Recommendation
                                                   Conclusion     Traversing Endogenous Indices


Traversing for Recommendation


                                                                  feature                          8
                                                                                               f


                                                    r 2           likes        p 6
                              likes
                                                                  likes


                      p 1              likes        r 3           likes        p 7

                                                                                   feature
                               likes

                                                    r 4         feature
                                                                               f
                                                                                   5


   Figure: A graph data structure containing people (p), their liked resources (r), and each resource’s
   features (f).




               Igor Bogicevic (igor.bogicevic@sbgenomics.com)     The Graph Traversal Pattern(Marko A. Rodriguez, Peter Neubauer)
Outline   Definition of Graph Traversals
                                              Introduction   Traversing for Recommendation
                                The Realization of Graphs    Content-Based Recommendation
                                         Graph Traversals    Collaborative Filtering-Based Recommendation
                                                Conclusion   Traversing Endogenous Indices


Content-Based Recommendation

      In order to identify resources that that are similar in features (i.e. content-based
      recommendation) to a resource, traverse to all resources that share the same
                                                                         ˆ         ˆ
      features. This is accomplished with the following function, f : P(V ) → P(V ),
      where                “                                              ”
                    f (i) = i − ◦ vout ◦ elab+ ◦ ein ◦ vin ◦ elab+ ◦ eout (i).
                                          feature             feature




            Igor Bogicevic (igor.bogicevic@sbgenomics.com)   The Graph Traversal Pattern(Marko A. Rodriguez, Peter Neubauer)
Outline   Definition of Graph Traversals
                                              Introduction   Traversing for Recommendation
                                The Realization of Graphs    Content-Based Recommendation
                                         Graph Traversals    Collaborative Filtering-Based Recommendation
                                                Conclusion   Traversing Endogenous Indices


Content-Based Recommendation

      In order to identify resources that that are similar in features (i.e. content-based
      recommendation) to a resource, traverse to all resources that share the same
                                                                         ˆ         ˆ
      features. This is accomplished with the following function, f : P(V ) → P(V ),
      where                “                                              ”
                    f (i) = i − ◦ vout ◦ elab+ ◦ ein ◦ vin ◦ elab+ ◦ eout (i).
                                          feature             feature


      Assuming i = 3, function f states, traverse to the outgoing edges of resource
      vertex 3, only allow feature-labeled edges, and then traverse to the incoming
      vertices of those feature-labeled edges.




            Igor Bogicevic (igor.bogicevic@sbgenomics.com)   The Graph Traversal Pattern(Marko A. Rodriguez, Peter Neubauer)
Outline   Definition of Graph Traversals
                                              Introduction   Traversing for Recommendation
                                The Realization of Graphs    Content-Based Recommendation
                                         Graph Traversals    Collaborative Filtering-Based Recommendation
                                                Conclusion   Traversing Endogenous Indices


Content-Based Recommendation

      In order to identify resources that that are similar in features (i.e. content-based
      recommendation) to a resource, traverse to all resources that share the same
                                                                         ˆ         ˆ
      features. This is accomplished with the following function, f : P(V ) → P(V ),
      where                “                                              ”
                    f (i) = i − ◦ vout ◦ elab+ ◦ ein ◦ vin ◦ elab+ ◦ eout (i).
                                          feature             feature


      Assuming i = 3, function f states, traverse to the outgoing edges of resource
      vertex 3, only allow feature-labeled edges, and then traverse to the incoming
      vertices of those feature-labeled edges.
      At this point, the traverser is at feature vertex 8. Next, traverse to the incoming
      edges of feature vertex 8, only allow feature-labeled edges, and then traverse to
      the outgoing vertices of these feature-labeled edges and at that point, the
      traverser is at resource vertices 3 and 2.




            Igor Bogicevic (igor.bogicevic@sbgenomics.com)   The Graph Traversal Pattern(Marko A. Rodriguez, Peter Neubauer)
Outline   Definition of Graph Traversals
                                              Introduction   Traversing for Recommendation
                                The Realization of Graphs    Content-Based Recommendation
                                         Graph Traversals    Collaborative Filtering-Based Recommendation
                                                Conclusion   Traversing Endogenous Indices


Content-Based Recommendation

      In order to identify resources that that are similar in features (i.e. content-based
      recommendation) to a resource, traverse to all resources that share the same
                                                                         ˆ         ˆ
      features. This is accomplished with the following function, f : P(V ) → P(V ),
      where                “                                              ”
                    f (i) = i − ◦ vout ◦ elab+ ◦ ein ◦ vin ◦ elab+ ◦ eout (i).
                                          feature             feature


      Assuming i = 3, function f states, traverse to the outgoing edges of resource
      vertex 3, only allow feature-labeled edges, and then traverse to the incoming
      vertices of those feature-labeled edges.
      At this point, the traverser is at feature vertex 8. Next, traverse to the incoming
      edges of feature vertex 8, only allow feature-labeled edges, and then traverse to
      the outgoing vertices of these feature-labeled edges and at that point, the
      traverser is at resource vertices 3 and 2.
      However, since we are trying to identify those resources similar in content to
      vertex 3, we need to filter out vertex 3. This is accomplished by the last stage of
      the function composition. Thus, given the toy graph data set, vertex 2 is similar
      to vertex 3 in content.




            Igor Bogicevic (igor.bogicevic@sbgenomics.com)   The Graph Traversal Pattern(Marko A. Rodriguez, Peter Neubauer)
Outline   Definition of Graph Traversals
                                              Introduction   Traversing for Recommendation
                                The Realization of Graphs    Content-Based Recommendation
                                         Graph Traversals    Collaborative Filtering-Based Recommendation
                                                Conclusion   Traversing Endogenous Indices


Content-Based Recommendation

      In order to identify resources that that are similar in features (i.e. content-based
      recommendation) to a resource, traverse to all resources that share the same
                                                                         ˆ         ˆ
      features. This is accomplished with the following function, f : P(V ) → P(V ),
      where                “                                              ”
                    f (i) = i − ◦ vout ◦ elab+ ◦ ein ◦ vin ◦ elab+ ◦ eout (i).
                                          feature             feature


      Assuming i = 3, function f states, traverse to the outgoing edges of resource
      vertex 3, only allow feature-labeled edges, and then traverse to the incoming
      vertices of those feature-labeled edges.
      At this point, the traverser is at feature vertex 8. Next, traverse to the incoming
      edges of feature vertex 8, only allow feature-labeled edges, and then traverse to
      the outgoing vertices of these feature-labeled edges and at that point, the
      traverser is at resource vertices 3 and 2.
      However, since we are trying to identify those resources similar in content to
      vertex 3, we need to filter out vertex 3. This is accomplished by the last stage of
      the function composition. Thus, given the toy graph data set, vertex 2 is similar
      to vertex 3 in content.
      Following diagram illustrates such traversal:


            Igor Bogicevic (igor.bogicevic@sbgenomics.com)   The Graph Traversal Pattern(Marko A. Rodriguez, Peter Neubauer)
Outline     Definition of Graph Traversals
                                                 Introduction     Traversing for Recommendation
                                   The Realization of Graphs      Content-Based Recommendation
                                            Graph Traversals      Collaborative Filtering-Based Recommendation
                                                   Conclusion     Traversing Endogenous Indices


Content-Based Recommendation

                                                       efeature
                                                        lab+
                                                                      ein
                               vout
                                                      feature                     8
                                                                              f


                                       r 2                 i
                                                            −
                                                                                         vin
                                       r 3

                                                                feature


                                           eout
                                                         efeature
                                                          lab+

   Figure: A traversal that identifies resources that are similar in content to a set of resources based
   upon shared features.




               Igor Bogicevic (igor.bogicevic@sbgenomics.com)     The Graph Traversal Pattern(Marko A. Rodriguez, Peter Neubauer)
Outline   Definition of Graph Traversals
                                             Introduction   Traversing for Recommendation
                               The Realization of Graphs    Content-Based Recommendation
                                        Graph Traversals    Collaborative Filtering-Based Recommendation
                                               Conclusion   Traversing Endogenous Indices


Content-Based Recommendation


      It’s simple to extend content-based recommendation to problems such as: “Given
      what person i likes, what other resources have similar features?”




           Igor Bogicevic (igor.bogicevic@sbgenomics.com)   The Graph Traversal Pattern(Marko A. Rodriguez, Peter Neubauer)
Outline   Definition of Graph Traversals
                                              Introduction   Traversing for Recommendation
                                The Realization of Graphs    Content-Based Recommendation
                                         Graph Traversals    Collaborative Filtering-Based Recommendation
                                                Conclusion   Traversing Endogenous Indices


Content-Based Recommendation


      It’s simple to extend content-based recommendation to problems such as: “Given
      what person i likes, what other resources have similar features?”
      Such a problem is solved using the previous function f defined above combined
      with a new composition that finds all the resources that person i likes. Thus, if
          ˆ        ˆ
      g : P(V ) → P(V ), where
                                       “                 ”
                                               likes
                               g (i) = vin ◦ elab+ ◦ eout (i),

      then to determine those resources similar in features to the resources that person
      vertex 7 likes, compose function f and g : (f ◦ g ).




            Igor Bogicevic (igor.bogicevic@sbgenomics.com)   The Graph Traversal Pattern(Marko A. Rodriguez, Peter Neubauer)
Outline   Definition of Graph Traversals
                                              Introduction   Traversing for Recommendation
                                The Realization of Graphs    Content-Based Recommendation
                                         Graph Traversals    Collaborative Filtering-Based Recommendation
                                                Conclusion   Traversing Endogenous Indices


Content-Based Recommendation


      It’s simple to extend content-based recommendation to problems such as: “Given
      what person i likes, what other resources have similar features?”
      Such a problem is solved using the previous function f defined above combined
      with a new composition that finds all the resources that person i likes. Thus, if
          ˆ        ˆ
      g : P(V ) → P(V ), where
                                       “                 ”
                                               likes
                               g (i) = vin ◦ elab+ ◦ eout (i),

      then to determine those resources similar in features to the resources that person
      vertex 7 likes, compose function f and g : (f ◦ g ).
      Those resources that share more features in common will be returned more by
      f ◦ g.




            Igor Bogicevic (igor.bogicevic@sbgenomics.com)   The Graph Traversal Pattern(Marko A. Rodriguez, Peter Neubauer)
Outline   Definition of Graph Traversals
                                              Introduction   Traversing for Recommendation
                                The Realization of Graphs    Content-Based Recommendation
                                         Graph Traversals    Collaborative Filtering-Based Recommendation
                                                Conclusion   Traversing Endogenous Indices


Content-Based Recommendation


      It’s simple to extend content-based recommendation to problems such as: “Given
      what person i likes, what other resources have similar features?”
      Such a problem is solved using the previous function f defined above combined
      with a new composition that finds all the resources that person i likes. Thus, if
          ˆ        ˆ
      g : P(V ) → P(V ), where
                                       “                 ”
                                               likes
                               g (i) = vin ◦ elab+ ◦ eout (i),

      then to determine those resources similar in features to the resources that person
      vertex 7 likes, compose function f and g : (f ◦ g ).
      Those resources that share more features in common will be returned more by
      f ◦ g.
      Using this approach, function can return similar results so, if necessary,
      deduplication has to be handled with a separate mechanism.




            Igor Bogicevic (igor.bogicevic@sbgenomics.com)   The Graph Traversal Pattern(Marko A. Rodriguez, Peter Neubauer)
Outline   Definition of Graph Traversals
                                               Introduction   Traversing for Recommendation
                                 The Realization of Graphs    Content-Based Recommendation
                                          Graph Traversals    Collaborative Filtering-Based Recommendation
                                                 Conclusion   Traversing Endogenous Indices


Collaborative Filtering-Based Recommendation




       With collaborative filtering, the objective is to identify a set of resources that
       have a high probability of being liked by a person based upon identifying other
       people in the system that enjoy similar likes.




             Igor Bogicevic (igor.bogicevic@sbgenomics.com)   The Graph Traversal Pattern(Marko A. Rodriguez, Peter Neubauer)
Outline   Definition of Graph Traversals
                                               Introduction   Traversing for Recommendation
                                 The Realization of Graphs    Content-Based Recommendation
                                          Graph Traversals    Collaborative Filtering-Based Recommendation
                                                 Conclusion   Traversing Endogenous Indices


Collaborative Filtering-Based Recommendation




       With collaborative filtering, the objective is to identify a set of resources that
       have a high probability of being liked by a person based upon identifying other
       people in the system that enjoy similar likes.
       In example, if person a and person b share 90% of their liked resources in
       common, then the remaining 10% they don’t share in common are candidates for
       recommendation.




             Igor Bogicevic (igor.bogicevic@sbgenomics.com)   The Graph Traversal Pattern(Marko A. Rodriguez, Peter Neubauer)
Outline   Definition of Graph Traversals
                                               Introduction   Traversing for Recommendation
                                 The Realization of Graphs    Content-Based Recommendation
                                          Graph Traversals    Collaborative Filtering-Based Recommendation
                                                 Conclusion   Traversing Endogenous Indices


Collaborative Filtering-Based Recommendation




       With collaborative filtering, the objective is to identify a set of resources that
       have a high probability of being liked by a person based upon identifying other
       people in the system that enjoy similar likes.
       In example, if person a and person b share 90% of their liked resources in
       common, then the remaining 10% they don’t share in common are candidates for
       recommendation.
       For simplicity, this traversal is broken into two components f and g .




             Igor Bogicevic (igor.bogicevic@sbgenomics.com)   The Graph Traversal Pattern(Marko A. Rodriguez, Peter Neubauer)
Outline   Definition of Graph Traversals
                                               Introduction   Traversing for Recommendation
                                 The Realization of Graphs    Content-Based Recommendation
                                          Graph Traversals    Collaborative Filtering-Based Recommendation
                                                 Conclusion   Traversing Endogenous Indices


Collaborative Filtering-Based Recommendation
                              ˆ       ˆ
       First component is f : P(V ) → P(V ) where
                            “                                            ”
                    f (i) = i − ◦ vout ◦ elab+ ◦ ein ◦ vin ◦ elab+ ◦ eout (i).
                                          like                like


       Function f traverses to all those people vertices that like the same resources as
       person vertex i and who themselves are not vertex i (as a person is obviously
       similar to themselves and thus, doesn’t contribute anything to the computation).
       The more resources liked that a person shares in common with i, the more
       traversers will be located at that person’s vertex. In other words, if person i and
       person j share 10 liked resources in common, then f (i) will return person j 10
       times.




             Igor Bogicevic (igor.bogicevic@sbgenomics.com)   The Graph Traversal Pattern(Marko A. Rodriguez, Peter Neubauer)
Outline   Definition of Graph Traversals
                                               Introduction   Traversing for Recommendation
                                 The Realization of Graphs    Content-Based Recommendation
                                          Graph Traversals    Collaborative Filtering-Based Recommendation
                                                 Conclusion   Traversing Endogenous Indices


Collaborative Filtering-Based Recommendation
                              ˆ       ˆ
       First component is f : P(V ) → P(V ) where
                            “                                            ”
                    f (i) = i − ◦ vout ◦ elab+ ◦ ein ◦ vin ◦ elab+ ◦ eout (i).
                                          like                like


       Function f traverses to all those people vertices that like the same resources as
       person vertex i and who themselves are not vertex i (as a person is obviously
       similar to themselves and thus, doesn’t contribute anything to the computation).
       The more resources liked that a person shares in common with i, the more
       traversers will be located at that person’s vertex. In other words, if person i and
       person j share 10 liked resources in common, then f (i) will return person j 10
       times.
       Next, function g is defined as
                                                  “                  ”
                                                          like
                                           g (j) = vin ◦ elab+ ◦ eout (j).

       Function g traverses to all the resources liked by vertex j. In composition,
       (g ◦ f )(i) determines all those resources that are liked by those people that have
       similar tastes to vertex i. If person j likes 10 resources in common with person i,
       then the resources that person j likes will be returned at least 10 times by g ◦ f
       (perhaps more if a path exists to those resources from another person vertex as
       well).

             Igor Bogicevic (igor.bogicevic@sbgenomics.com)   The Graph Traversal Pattern(Marko A. Rodriguez, Peter Neubauer)
Outline       Definition of Graph Traversals
                                                Introduction       Traversing for Recommendation
                                  The Realization of Graphs        Content-Based Recommendation
                                           Graph Traversals        Collaborative Filtering-Based Recommendation
                                                  Conclusion       Traversing Endogenous Indices


Collaborative Filtering-Based Recommendation
       Following figure diagrams a function path starting from vertex 7. Only one legal
       path is presented for the sake of diagram clarity

                                                               g        likes
                                              j
                                               −
                                                                   vin elab+ eout                      i
                                                                                                        −

                                                           r 2              likes             p 6
                                                                   elikes
                                                                    lab+
                                  likes                ein                                             vout
                                                                                    likes

                                                                                                        f
                        p 1                 likes          r 3              likes             p 7
                                    likes
                                                                      vin                   eout
                                                           r 4                elikes
                                                                               lab+

       Figure: A traversal that identifies resources that are similar in content to a resource based
       upon shared features.


              Igor Bogicevic (igor.bogicevic@sbgenomics.com)       The Graph Traversal Pattern(Marko A. Rodriguez, Peter Neubauer)
Outline   Definition of Graph Traversals
                                               Introduction   Traversing for Recommendation
                                 The Realization of Graphs    Content-Based Recommendation
                                          Graph Traversals    Collaborative Filtering-Based Recommendation
                                                 Conclusion   Traversing Endogenous Indices


Traversing Endogenous Indices

       A graph is a general-purpose data structure. A graph can be used to model lists,
       maps, trees, etc. As such, a graph can model an index.




             Igor Bogicevic (igor.bogicevic@sbgenomics.com)   The Graph Traversal Pattern(Marko A. Rodriguez, Peter Neubauer)
Outline   Definition of Graph Traversals
                                               Introduction   Traversing for Recommendation
                                 The Realization of Graphs    Content-Based Recommendation
                                          Graph Traversals    Collaborative Filtering-Based Recommendation
                                                 Conclusion   Traversing Endogenous Indices


Traversing Endogenous Indices

       A graph is a general-purpose data structure. A graph can be used to model lists,
       maps, trees, etc. As such, a graph can model an index.
       It was previously assumed that a graph database makes use of an external
       indexing system to index the properties of its vertices and edges with assumption
       that specialized indexing systems are better suited for special-purpose queries
       such as those involving full-text search.




             Igor Bogicevic (igor.bogicevic@sbgenomics.com)   The Graph Traversal Pattern(Marko A. Rodriguez, Peter Neubauer)
Outline   Definition of Graph Traversals
                                               Introduction   Traversing for Recommendation
                                 The Realization of Graphs    Content-Based Recommendation
                                          Graph Traversals    Collaborative Filtering-Based Recommendation
                                                 Conclusion   Traversing Endogenous Indices


Traversing Endogenous Indices

       A graph is a general-purpose data structure. A graph can be used to model lists,
       maps, trees, etc. As such, a graph can model an index.
       It was previously assumed that a graph database makes use of an external
       indexing system to index the properties of its vertices and edges with assumption
       that specialized indexing systems are better suited for special-purpose queries
       such as those involving full-text search.
       In many cases, there is nothing that prevents the representation of an index
       within the graph itself—vertices and edges can be indexed by other vertices and
       edges and given the nature of how vertices and edges directly reference each
       other in a graph database, index look-up speeds are comparable.




             Igor Bogicevic (igor.bogicevic@sbgenomics.com)   The Graph Traversal Pattern(Marko A. Rodriguez, Peter Neubauer)
Outline   Definition of Graph Traversals
                                               Introduction   Traversing for Recommendation
                                 The Realization of Graphs    Content-Based Recommendation
                                          Graph Traversals    Collaborative Filtering-Based Recommendation
                                                 Conclusion   Traversing Endogenous Indices


Traversing Endogenous Indices

       A graph is a general-purpose data structure. A graph can be used to model lists,
       maps, trees, etc. As such, a graph can model an index.
       It was previously assumed that a graph database makes use of an external
       indexing system to index the properties of its vertices and edges with assumption
       that specialized indexing systems are better suited for special-purpose queries
       such as those involving full-text search.
       In many cases, there is nothing that prevents the representation of an index
       within the graph itself—vertices and edges can be indexed by other vertices and
       edges and given the nature of how vertices and edges directly reference each
       other in a graph database, index look-up speeds are comparable.
       Endogenous indices afford graph databases a great flexibility in modeling a
       domain. Not only can objects and their relationships be modeled (e.g. people and
       their friendships), but also the indices that partition the objects into meaningful
       subsets (e.g. people within a 2D region of space).




             Igor Bogicevic (igor.bogicevic@sbgenomics.com)   The Graph Traversal Pattern(Marko A. Rodriguez, Peter Neubauer)
Outline   Definition of Graph Traversals
                                               Introduction   Traversing for Recommendation
                                 The Realization of Graphs    Content-Based Recommendation
                                          Graph Traversals    Collaborative Filtering-Based Recommendation
                                                 Conclusion   Traversing Endogenous Indices


Traversing Endogenous Indices

       A graph is a general-purpose data structure. A graph can be used to model lists,
       maps, trees, etc. As such, a graph can model an index.
       It was previously assumed that a graph database makes use of an external
       indexing system to index the properties of its vertices and edges with assumption
       that specialized indexing systems are better suited for special-purpose queries
       such as those involving full-text search.
       In many cases, there is nothing that prevents the representation of an index
       within the graph itself—vertices and edges can be indexed by other vertices and
       edges and given the nature of how vertices and edges directly reference each
       other in a graph database, index look-up speeds are comparable.
       Endogenous indices afford graph databases a great flexibility in modeling a
       domain. Not only can objects and their relationships be modeled (e.g. people and
       their friendships), but also the indices that partition the objects into meaningful
       subsets (e.g. people within a 2D region of space).
       The domain of spatial analysis makes use of advanced indexing structures such as
       the quadtree.



             Igor Bogicevic (igor.bogicevic@sbgenomics.com)   The Graph Traversal Pattern(Marko A. Rodriguez, Peter Neubauer)
Outline   Definition of Graph Traversals
                                               Introduction   Traversing for Recommendation
                                 The Realization of Graphs    Content-Based Recommendation
                                          Graph Traversals    Collaborative Filtering-Based Recommendation
                                                 Conclusion   Traversing Endogenous Indices


Traversing Endogenous Indices

       Quadtrees partition a two-dimensional plane into rectangular boxes based upon
       the spatial density of the points being indexed. Following figure diagrams how
       space is partitioned as the density of points increases within a region of the index.:




       Figure: A quadtree partition of a plane. This figure is an adaptation of a public domain
       image provided courtesy of David Eppstein.




             Igor Bogicevic (igor.bogicevic@sbgenomics.com)   The Graph Traversal Pattern(Marko A. Rodriguez, Peter Neubauer)
Outline                            Definition of Graph Traversals
                                                Introduction                            Traversing for Recommendation
                                  The Realization of Graphs                             Content-Based Recommendation
                                           Graph Traversals                             Collaborative Filtering-Based Recommendation
                                                  Conclusion                            Traversing Endogenous Indices


Traversing Endogenous Indices
       In order to demonstrate how a quadtree index can be represented and traversed, a
       toy graph data set is presented. This data set is diagrammed in following Figure:
                                                                                type=quad
                                                                                  bl=[0,0]
                                                                               tr=[100,100]
                                                                                    1
                                                                     sub                            sub


                                               type=quad                                    type=quad                    type=quad
                                                 bl=[0,0]    2                      3        bl=[50,25]           4       bl=[50,0]
                                               tr=[50,100]                                   tr=[75,50]                 tr=[100,100]
                                                                     type=quad               type=quad
                                                                        bl=[0,0]              bl=[50,50]
                                                                      tr=[50,50]            tr=[100,100]
                                               type=quad                                                                 type=quad
                                                 bl=[0,50]   5             6        9           7                 8        bl=[50,0]
                                               tr=[50,100]                                                               tr=[100,50]
                                                                               type=quad
                                                   [0,100] 1                    bl=[50,25]                              [100,100]
                                                                                tr=[62,37]


                                                                                                     g        f

                                                                 a   b
                                                                                                e
                                                             5                          7


                                                                                   9                                       bl=[25,20]
                                                                                        d           3     h
                                                                           c                                               tr=[90,45]


                                                                                                                  i
                                                      [0,0] 6                      2 4                                8 [100,0]




       Figure: A quadtree index of a space that contains points of interest. The index is composed
       of the vertices 1-9 and the points of interest are the vertices a-i. While not diagrammed for
       the sake of clarity, all edges are labeled sub (meaning subsumes) and each point of interest
       vertex has an associated bottom-left (bl) property, top-right (tr) property, and a type
       property which is equal to “poi.”

              Igor Bogicevic (igor.bogicevic@sbgenomics.com)                            The Graph Traversal Pattern(Marko A. Rodriguez, Peter Neubauer)
Outline   Definition of Graph Traversals
                                               Introduction   Traversing for Recommendation
                                 The Realization of Graphs    Content-Based Recommendation
                                          Graph Traversals    Collaborative Filtering-Based Recommendation
                                                 Conclusion   Traversing Endogenous Indices


Traversing Endogenous Indices
       The top half of Figure represents a quadtree index (vertices 1-9). This quadtree
       index is partitioning “points of interest” (vertices a-i) located within the
       diagrammed plane.




             Igor Bogicevic (igor.bogicevic@sbgenomics.com)   The Graph Traversal Pattern(Marko A. Rodriguez, Peter Neubauer)
Outline   Definition of Graph Traversals
                                               Introduction   Traversing for Recommendation
                                 The Realization of Graphs    Content-Based Recommendation
                                          Graph Traversals    Collaborative Filtering-Based Recommendation
                                                 Conclusion   Traversing Endogenous Indices


Traversing Endogenous Indices
       The top half of Figure represents a quadtree index (vertices 1-9). This quadtree
       index is partitioning “points of interest” (vertices a-i) located within the
       diagrammed plane.
       All vertices maintain three properties—bottom-left (bl), top-right (tr), and type.
       For a quadtree vertex, these properties identify the two corner points defining a
       rectangular bounding box (i.e. the region that the quadtree vertex is indexing)
       and the vertex type which is equal to “quad”.




             Igor Bogicevic (igor.bogicevic@sbgenomics.com)   The Graph Traversal Pattern(Marko A. Rodriguez, Peter Neubauer)
Outline   Definition of Graph Traversals
                                               Introduction   Traversing for Recommendation
                                 The Realization of Graphs    Content-Based Recommendation
                                          Graph Traversals    Collaborative Filtering-Based Recommendation
                                                 Conclusion   Traversing Endogenous Indices


Traversing Endogenous Indices
       The top half of Figure represents a quadtree index (vertices 1-9). This quadtree
       index is partitioning “points of interest” (vertices a-i) located within the
       diagrammed plane.
       All vertices maintain three properties—bottom-left (bl), top-right (tr), and type.
       For a quadtree vertex, these properties identify the two corner points defining a
       rectangular bounding box (i.e. the region that the quadtree vertex is indexing)
       and the vertex type which is equal to “quad”.
       For a point of interest vertex, these properties denote the region of space that the
       point of interest exists within and the vertex type which is equal to “poi.”




             Igor Bogicevic (igor.bogicevic@sbgenomics.com)   The Graph Traversal Pattern(Marko A. Rodriguez, Peter Neubauer)
Outline   Definition of Graph Traversals
                                               Introduction   Traversing for Recommendation
                                 The Realization of Graphs    Content-Based Recommendation
                                          Graph Traversals    Collaborative Filtering-Based Recommendation
                                                 Conclusion   Traversing Endogenous Indices


Traversing Endogenous Indices
       The top half of Figure represents a quadtree index (vertices 1-9). This quadtree
       index is partitioning “points of interest” (vertices a-i) located within the
       diagrammed plane.
       All vertices maintain three properties—bottom-left (bl), top-right (tr), and type.
       For a quadtree vertex, these properties identify the two corner points defining a
       rectangular bounding box (i.e. the region that the quadtree vertex is indexing)
       and the vertex type which is equal to “quad”.
       For a point of interest vertex, these properties denote the region of space that the
       point of interest exists within and the vertex type which is equal to “poi.”
       Quadtree vertex 1 denotes the entire region of space being indexed. This region
       is defined by its bottom-left (bl) and top-right (tr) corner points—namely [0, 0]
       and [100, 100], where blx = 0, bly = 0, trx = 100, and try = 100.




             Igor Bogicevic (igor.bogicevic@sbgenomics.com)   The Graph Traversal Pattern(Marko A. Rodriguez, Peter Neubauer)
Outline   Definition of Graph Traversals
                                               Introduction   Traversing for Recommendation
                                 The Realization of Graphs    Content-Based Recommendation
                                          Graph Traversals    Collaborative Filtering-Based Recommendation
                                                 Conclusion   Traversing Endogenous Indices


Traversing Endogenous Indices
       The top half of Figure represents a quadtree index (vertices 1-9). This quadtree
       index is partitioning “points of interest” (vertices a-i) located within the
       diagrammed plane.
       All vertices maintain three properties—bottom-left (bl), top-right (tr), and type.
       For a quadtree vertex, these properties identify the two corner points defining a
       rectangular bounding box (i.e. the region that the quadtree vertex is indexing)
       and the vertex type which is equal to “quad”.
       For a point of interest vertex, these properties denote the region of space that the
       point of interest exists within and the vertex type which is equal to “poi.”
       Quadtree vertex 1 denotes the entire region of space being indexed. This region
       is defined by its bottom-left (bl) and top-right (tr) corner points—namely [0, 0]
       and [100, 100], where blx = 0, bly = 0, trx = 100, and try = 100.
       Within the region defined by vertex 1, there are 8 other defined regions that
       partition that space into smaller spaces (vertices 2-9).




             Igor Bogicevic (igor.bogicevic@sbgenomics.com)   The Graph Traversal Pattern(Marko A. Rodriguez, Peter Neubauer)
Outline   Definition of Graph Traversals
                                               Introduction   Traversing for Recommendation
                                 The Realization of Graphs    Content-Based Recommendation
                                          Graph Traversals    Collaborative Filtering-Based Recommendation
                                                 Conclusion   Traversing Endogenous Indices


Traversing Endogenous Indices
       The top half of Figure represents a quadtree index (vertices 1-9). This quadtree
       index is partitioning “points of interest” (vertices a-i) located within the
       diagrammed plane.
       All vertices maintain three properties—bottom-left (bl), top-right (tr), and type.
       For a quadtree vertex, these properties identify the two corner points defining a
       rectangular bounding box (i.e. the region that the quadtree vertex is indexing)
       and the vertex type which is equal to “quad”.
       For a point of interest vertex, these properties denote the region of space that the
       point of interest exists within and the vertex type which is equal to “poi.”
       Quadtree vertex 1 denotes the entire region of space being indexed. This region
       is defined by its bottom-left (bl) and top-right (tr) corner points—namely [0, 0]
       and [100, 100], where blx = 0, bly = 0, trx = 100, and try = 100.
       Within the region defined by vertex 1, there are 8 other defined regions that
       partition that space into smaller spaces (vertices 2-9).
       When one vertex subsumes another vertex by a directed edge labeled sub
       (i.e. subsumes), the outgoing (i.e. tail) vertex is subsuming the space that is
       defined by the incoming (i.e. head) vertex.




             Igor Bogicevic (igor.bogicevic@sbgenomics.com)   The Graph Traversal Pattern(Marko A. Rodriguez, Peter Neubauer)
Outline   Definition of Graph Traversals
                                               Introduction   Traversing for Recommendation
                                 The Realization of Graphs    Content-Based Recommendation
                                          Graph Traversals    Collaborative Filtering-Based Recommendation
                                                 Conclusion   Traversing Endogenous Indices


Traversing Endogenous Indices
       The top half of Figure represents a quadtree index (vertices 1-9). This quadtree
       index is partitioning “points of interest” (vertices a-i) located within the
       diagrammed plane.
       All vertices maintain three properties—bottom-left (bl), top-right (tr), and type.
       For a quadtree vertex, these properties identify the two corner points defining a
       rectangular bounding box (i.e. the region that the quadtree vertex is indexing)
       and the vertex type which is equal to “quad”.
       For a point of interest vertex, these properties denote the region of space that the
       point of interest exists within and the vertex type which is equal to “poi.”
       Quadtree vertex 1 denotes the entire region of space being indexed. This region
       is defined by its bottom-left (bl) and top-right (tr) corner points—namely [0, 0]
       and [100, 100], where blx = 0, bly = 0, trx = 100, and try = 100.
       Within the region defined by vertex 1, there are 8 other defined regions that
       partition that space into smaller spaces (vertices 2-9).
       When one vertex subsumes another vertex by a directed edge labeled sub
       (i.e. subsumes), the outgoing (i.e. tail) vertex is subsuming the space that is
       defined by the incoming (i.e. head) vertex.
       Given these properties and edges, identifying point of interest vertices within a
       region of space is simply a matter of traversing the quadtree index in a
       directed/algorithmic fashion.
             Igor Bogicevic (igor.bogicevic@sbgenomics.com)   The Graph Traversal Pattern(Marko A. Rodriguez, Peter Neubauer)
Outline   Definition of Graph Traversals
                                               Introduction   Traversing for Recommendation
                                 The Realization of Graphs    Content-Based Recommendation
                                          Graph Traversals    Collaborative Filtering-Based Recommendation
                                                 Conclusion   Traversing Endogenous Indices


Traversing Endogenous Indices

       In Figure the shaded region represents the spatial query: “Which points of
       interest are within the rectangular region defined by the corner points
       bl = [25, 20] and tr = [90, 45]?”




             Igor Bogicevic (igor.bogicevic@sbgenomics.com)   The Graph Traversal Pattern(Marko A. Rodriguez, Peter Neubauer)
Outline   Definition of Graph Traversals
                                               Introduction   Traversing for Recommendation
                                 The Realization of Graphs    Content-Based Recommendation
                                          Graph Traversals    Collaborative Filtering-Based Recommendation
                                                 Conclusion   Traversing Endogenous Indices


Traversing Endogenous Indices

       In Figure the shaded region represents the spatial query: “Which points of
       interest are within the rectangular region defined by the corner points
       bl = [25, 20] and tr = [90, 45]?”
       In order to locate all the points of interest in this region, iteratively execute the
       following traversal starting from the root of the quadtree index (i.e. vertex 1).
                                        ˆ         ˆ
       The function is defined as f : P(V ) → P(V ), where
                         try ≥20             bly ≤45
                       “                                                           ”
               f (i) = p+        ◦ trx ≥25 ◦ p+
                                   p+                ◦ blx ≤90 ◦ vin ◦ elab+ ◦ eout (i).
                                                       p+
                                                                        sub




             Igor Bogicevic (igor.bogicevic@sbgenomics.com)   The Graph Traversal Pattern(Marko A. Rodriguez, Peter Neubauer)
Outline   Definition of Graph Traversals
                                               Introduction   Traversing for Recommendation
                                 The Realization of Graphs    Content-Based Recommendation
                                          Graph Traversals    Collaborative Filtering-Based Recommendation
                                                 Conclusion   Traversing Endogenous Indices


Traversing Endogenous Indices

       In Figure the shaded region represents the spatial query: “Which points of
       interest are within the rectangular region defined by the corner points
       bl = [25, 20] and tr = [90, 45]?”
       In order to locate all the points of interest in this region, iteratively execute the
       following traversal starting from the root of the quadtree index (i.e. vertex 1).
                                        ˆ         ˆ
       The function is defined as f : P(V ) → P(V ), where
                         try ≥20             bly ≤45
                       “                                                           ”
               f (i) = p+        ◦ trx ≥25 ◦ p+
                                   p+                ◦ blx ≤90 ◦ vin ◦ elab+ ◦ eout (i).
                                                       p+
                                                                        sub


       The defining aspect of f is the set of 4 p+ filters that determine whether the
       current vertex is overlapping or within the query rectangle. Those vertices not
       overlapping or within the query rectangle are not traversed to. Thus, as the
       traversal iterates, fewer and fewer paths are examined and the resulting point of
       interest vertices within the query rectangle are converged upon.




             Igor Bogicevic (igor.bogicevic@sbgenomics.com)   The Graph Traversal Pattern(Marko A. Rodriguez, Peter Neubauer)
The Graph Traversal Pattern
The Graph Traversal Pattern
The Graph Traversal Pattern
The Graph Traversal Pattern
The Graph Traversal Pattern
The Graph Traversal Pattern
The Graph Traversal Pattern
The Graph Traversal Pattern
The Graph Traversal Pattern
The Graph Traversal Pattern
The Graph Traversal Pattern

More Related Content

What's hot

Graphs In Data Structure
Graphs In Data StructureGraphs In Data Structure
Graphs In Data StructureAnuj Modi
 
Gremlin: A Graph-Based Programming Language
Gremlin: A Graph-Based Programming LanguageGremlin: A Graph-Based Programming Language
Gremlin: A Graph-Based Programming LanguageMarko Rodriguez
 
08 Exponential Random Graph Models (ERGM)
08 Exponential Random Graph Models (ERGM)08 Exponential Random Graph Models (ERGM)
08 Exponential Random Graph Models (ERGM)dnac
 
Discrete-Chapter 11 Graphs Part I
Discrete-Chapter 11 Graphs Part IDiscrete-Chapter 11 Graphs Part I
Discrete-Chapter 11 Graphs Part IWongyos Keardsri
 
Statistical inference of network structure
Statistical inference of network structureStatistical inference of network structure
Statistical inference of network structureTiago Peixoto
 
Discrete-Chapter 11 Graphs Part III
Discrete-Chapter 11 Graphs Part IIIDiscrete-Chapter 11 Graphs Part III
Discrete-Chapter 11 Graphs Part IIIWongyos Keardsri
 
Tn 110 lecture 8
Tn 110 lecture 8Tn 110 lecture 8
Tn 110 lecture 8ITNet
 

What's hot (10)

Graphs In Data Structure
Graphs In Data StructureGraphs In Data Structure
Graphs In Data Structure
 
Gremlin: A Graph-Based Programming Language
Gremlin: A Graph-Based Programming LanguageGremlin: A Graph-Based Programming Language
Gremlin: A Graph-Based Programming Language
 
08 Exponential Random Graph Models (ERGM)
08 Exponential Random Graph Models (ERGM)08 Exponential Random Graph Models (ERGM)
08 Exponential Random Graph Models (ERGM)
 
Lec14 eigenface and fisherface
Lec14 eigenface and fisherfaceLec14 eigenface and fisherface
Lec14 eigenface and fisherface
 
Graphs data structures
Graphs data structuresGraphs data structures
Graphs data structures
 
Discrete-Chapter 11 Graphs Part I
Discrete-Chapter 11 Graphs Part IDiscrete-Chapter 11 Graphs Part I
Discrete-Chapter 11 Graphs Part I
 
Statistical inference of network structure
Statistical inference of network structureStatistical inference of network structure
Statistical inference of network structure
 
Discrete-Chapter 11 Graphs Part III
Discrete-Chapter 11 Graphs Part IIIDiscrete-Chapter 11 Graphs Part III
Discrete-Chapter 11 Graphs Part III
 
Graphs
GraphsGraphs
Graphs
 
Tn 110 lecture 8
Tn 110 lecture 8Tn 110 lecture 8
Tn 110 lecture 8
 

Similar to The Graph Traversal Pattern

Graph Introduction.ppt
Graph Introduction.pptGraph Introduction.ppt
Graph Introduction.pptFaruk Hossen
 
Slides Chapter10.1 10.2
Slides Chapter10.1 10.2Slides Chapter10.1 10.2
Slides Chapter10.1 10.2showslidedump
 
FREQUENT SUBGRAPH MINING ALGORITHMS - A SURVEY AND FRAMEWORK FOR CLASSIFICATION
FREQUENT SUBGRAPH MINING ALGORITHMS - A SURVEY AND FRAMEWORK FOR CLASSIFICATIONFREQUENT SUBGRAPH MINING ALGORITHMS - A SURVEY AND FRAMEWORK FOR CLASSIFICATION
FREQUENT SUBGRAPH MINING ALGORITHMS - A SURVEY AND FRAMEWORK FOR CLASSIFICATIONcscpconf
 
Graph theory concepts complex networks presents-rouhollah nabati
Graph theory concepts   complex networks presents-rouhollah nabatiGraph theory concepts   complex networks presents-rouhollah nabati
Graph theory concepts complex networks presents-rouhollah nabatinabati
 
Skiena algorithm 2007 lecture10 graph data strctures
Skiena algorithm 2007 lecture10 graph data strcturesSkiena algorithm 2007 lecture10 graph data strctures
Skiena algorithm 2007 lecture10 graph data strctureszukun
 
Graph representation
Graph representationGraph representation
Graph representationDEEPIKA T
 
Semantic Data Management in Graph Databases
Semantic Data Management in Graph DatabasesSemantic Data Management in Graph Databases
Semantic Data Management in Graph DatabasesMaribel Acosta Deibe
 
Grpahs in Data Structure
Grpahs in Data StructureGrpahs in Data Structure
Grpahs in Data StructureAvichalVishnoi
 
Fallsem2015 16 cp4194-13-oct-2015_rm01_graphs
Fallsem2015 16 cp4194-13-oct-2015_rm01_graphsFallsem2015 16 cp4194-13-oct-2015_rm01_graphs
Fallsem2015 16 cp4194-13-oct-2015_rm01_graphsSnehilKeshari
 
Graph terminology and algorithm and tree.pptx
Graph terminology and algorithm and tree.pptxGraph terminology and algorithm and tree.pptx
Graph terminology and algorithm and tree.pptxasimshahzad8611
 

Similar to The Graph Traversal Pattern (20)

Graph Introduction.ppt
Graph Introduction.pptGraph Introduction.ppt
Graph Introduction.ppt
 
Chap 6 Graph.ppt
Chap 6 Graph.pptChap 6 Graph.ppt
Chap 6 Graph.ppt
 
Slides Chapter10.1 10.2
Slides Chapter10.1 10.2Slides Chapter10.1 10.2
Slides Chapter10.1 10.2
 
FREQUENT SUBGRAPH MINING ALGORITHMS - A SURVEY AND FRAMEWORK FOR CLASSIFICATION
FREQUENT SUBGRAPH MINING ALGORITHMS - A SURVEY AND FRAMEWORK FOR CLASSIFICATIONFREQUENT SUBGRAPH MINING ALGORITHMS - A SURVEY AND FRAMEWORK FOR CLASSIFICATION
FREQUENT SUBGRAPH MINING ALGORITHMS - A SURVEY AND FRAMEWORK FOR CLASSIFICATION
 
Graph theory concepts complex networks presents-rouhollah nabati
Graph theory concepts   complex networks presents-rouhollah nabatiGraph theory concepts   complex networks presents-rouhollah nabati
Graph theory concepts complex networks presents-rouhollah nabati
 
Skiena algorithm 2007 lecture10 graph data strctures
Skiena algorithm 2007 lecture10 graph data strcturesSkiena algorithm 2007 lecture10 graph data strctures
Skiena algorithm 2007 lecture10 graph data strctures
 
Graph
GraphGraph
Graph
 
Graphs
GraphsGraphs
Graphs
 
Graph representation
Graph representationGraph representation
Graph representation
 
Semantic Data Management in Graph Databases
Semantic Data Management in Graph DatabasesSemantic Data Management in Graph Databases
Semantic Data Management in Graph Databases
 
09_DS_MCA_Graphs.pdf
09_DS_MCA_Graphs.pdf09_DS_MCA_Graphs.pdf
09_DS_MCA_Graphs.pdf
 
Grpahs in Data Structure
Grpahs in Data StructureGrpahs in Data Structure
Grpahs in Data Structure
 
Fallsem2015 16 cp4194-13-oct-2015_rm01_graphs
Fallsem2015 16 cp4194-13-oct-2015_rm01_graphsFallsem2015 16 cp4194-13-oct-2015_rm01_graphs
Fallsem2015 16 cp4194-13-oct-2015_rm01_graphs
 
Graph terminology and algorithm and tree.pptx
Graph terminology and algorithm and tree.pptxGraph terminology and algorithm and tree.pptx
Graph terminology and algorithm and tree.pptx
 
05 graph
05 graph05 graph
05 graph
 
Dsa.PPT
Dsa.PPTDsa.PPT
Dsa.PPT
 
chapter6.PPT
chapter6.PPTchapter6.PPT
chapter6.PPT
 
Graph
GraphGraph
Graph
 
Graph
GraphGraph
Graph
 
graph theory
graph theorygraph theory
graph theory
 

Recently uploaded

Training state-of-the-art general text embedding
Training state-of-the-art general text embeddingTraining state-of-the-art general text embedding
Training state-of-the-art general text embeddingZilliz
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyAlfredo García Lavilla
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...Fwdays
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machinePadma Pradeep
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Patryk Bandurski
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Wonjun Hwang
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Manik S Magar
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024The Digital Insurer
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsRizwan Syed
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Enterprise Knowledge
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsMiki Katsuragi
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubKalema Edgar
 

Recently uploaded (20)

Training state-of-the-art general text embedding
Training state-of-the-art general text embeddingTraining state-of-the-art general text embedding
Training state-of-the-art general text embedding
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easy
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptxE-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machine
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering Tips
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 

The Graph Traversal Pattern

  • 1. Outline Introduction The Realization of Graphs Graph Traversals Conclusion The Graph Traversal Pattern (Marko A. Rodriguez, Peter Neubauer) Igor Bogicevic (igor.bogicevic@sbgenomics.com) September 5, 2010 Igor Bogicevic (igor.bogicevic@sbgenomics.com) The Graph Traversal Pattern(Marko A. Rodriguez, Peter Neubauer)
  • 2. Outline Introduction The Realization of Graphs Graph Traversals Conclusion Introduction The Realization of Graphs Brief Introductory The Indices of Relational Tables The Graph as an Index Graph Traversals Definition of Graph Traversals Traversing for Recommendation Content-Based Recommendation Collaborative Filtering-Based Recommendation Traversing Endogenous Indices Conclusion Igor Bogicevic (igor.bogicevic@sbgenomics.com) The Graph Traversal Pattern(Marko A. Rodriguez, Peter Neubauer)
  • 3. Outline Introduction The Realization of Graphs Graph Traversals Conclusion Introduction In the most common sense of the term, a graph is an ordered pair G = (V , E ) comprising a set V of vertices or nodes together with a set E of edges or lines, which are 2-element subsets of V (i.e, an edge is related with two vertices, and the relation is represented as unordered pair of the vertices with respect to the particular edge). To avoid ambiguity, this type of graph may be described precisely as undirected and simple. Igor Bogicevic (igor.bogicevic@sbgenomics.com) The Graph Traversal Pattern(Marko A. Rodriguez, Peter Neubauer)
  • 4. Outline Introduction The Realization of Graphs Graph Traversals Conclusion Introduction In the most common sense of the term, a graph is an ordered pair G = (V , E ) comprising a set V of vertices or nodes together with a set E of edges or lines, which are 2-element subsets of V (i.e, an edge is related with two vertices, and the relation is represented as unordered pair of the vertices with respect to the particular edge). To avoid ambiguity, this type of graph may be described precisely as undirected and simple. For directed graphs, E ⊆ (V × V ) and for undirected graphs, E ⊆ {V × V }. That is, E is a subset of all ordered or unordered permutations of V element pairings. Igor Bogicevic (igor.bogicevic@sbgenomics.com) The Graph Traversal Pattern(Marko A. Rodriguez, Peter Neubauer)
  • 5. Outline Introduction The Realization of Graphs Graph Traversals Conclusion Introduction In the most common sense of the term, a graph is an ordered pair G = (V , E ) comprising a set V of vertices or nodes together with a set E of edges or lines, which are 2-element subsets of V (i.e, an edge is related with two vertices, and the relation is represented as unordered pair of the vertices with respect to the particular edge). To avoid ambiguity, this type of graph may be described precisely as undirected and simple. For directed graphs, E ⊆ (V × V ) and for undirected graphs, E ⊆ {V × V }. That is, E is a subset of all ordered or unordered permutations of V element pairings. In different contexts it may be useful to define the term graph with different degrees of generality. Whenever it is necessary to draw a strict distinction, the following terms are used. Most commonly, in modern texts in graph theory, unless stated otherwise, graph means ”undirected simple finite graph” Igor Bogicevic (igor.bogicevic@sbgenomics.com) The Graph Traversal Pattern(Marko A. Rodriguez, Peter Neubauer)
  • 6. Outline Introduction The Realization of Graphs Graph Traversals Conclusion Introduction In the most common sense of the term, a graph is an ordered pair G = (V , E ) comprising a set V of vertices or nodes together with a set E of edges or lines, which are 2-element subsets of V (i.e, an edge is related with two vertices, and the relation is represented as unordered pair of the vertices with respect to the particular edge). To avoid ambiguity, this type of graph may be described precisely as undirected and simple. For directed graphs, E ⊆ (V × V ) and for undirected graphs, E ⊆ {V × V }. That is, E is a subset of all ordered or unordered permutations of V element pairings. In different contexts it may be useful to define the term graph with different degrees of generality. Whenever it is necessary to draw a strict distinction, the following terms are used. Most commonly, in modern texts in graph theory, unless stated otherwise, graph means ”undirected simple finite graph” We can talk about various other types of graphs: Mixed graphs, Multigraphs, Weighted graphs, etc. Igor Bogicevic (igor.bogicevic@sbgenomics.com) The Graph Traversal Pattern(Marko A. Rodriguez, Peter Neubauer)
  • 7. Outline Introduction Brief Introductory The Realization of Graphs The Indices of Relational Tables Graph Traversals The Graph as an Index Conclusion Brief Introductory Relational databases have been around since the late 1960s (Edgar F. Codd, A relational model of data for large shared data banks) and are todays most predominate data management tool. Relational databases maintain a collection of tables. Each table can be defined by a set of rows and a set of columns. Semantically, rows denote objects and columns denote properties/attributes. Igor Bogicevic (igor.bogicevic@sbgenomics.com) The Graph Traversal Pattern(Marko A. Rodriguez, Peter Neubauer)
  • 8. Outline Introduction Brief Introductory The Realization of Graphs The Indices of Relational Tables Graph Traversals The Graph as an Index Conclusion Brief Introductory Relational databases have been around since the late 1960s (Edgar F. Codd, A relational model of data for large shared data banks) and are todays most predominate data management tool. Relational databases maintain a collection of tables. Each table can be defined by a set of rows and a set of columns. Semantically, rows denote objects and columns denote properties/attributes. In contrast, graph databases do not store data in disparate tables. Instead there is a single data structure—the graph. Moreover, there is no concept of a “join” operation as every vertex and edge has a direct reference to its adjacent vertex or edge. The data structure is already “joined” by the edges that are defined. Igor Bogicevic (igor.bogicevic@sbgenomics.com) The Graph Traversal Pattern(Marko A. Rodriguez, Peter Neubauer)
  • 9. Outline Introduction Brief Introductory The Realization of Graphs The Indices of Relational Tables Graph Traversals The Graph as an Index Conclusion Brief Introductory Relational databases have been around since the late 1960s (Edgar F. Codd, A relational model of data for large shared data banks) and are todays most predominate data management tool. Relational databases maintain a collection of tables. Each table can be defined by a set of rows and a set of columns. Semantically, rows denote objects and columns denote properties/attributes. In contrast, graph databases do not store data in disparate tables. Instead there is a single data structure—the graph. Moreover, there is no concept of a “join” operation as every vertex and edge has a direct reference to its adjacent vertex or edge. The data structure is already “joined” by the edges that are defined. Main drawback - it’s hard to shard a graph. Igor Bogicevic (igor.bogicevic@sbgenomics.com) The Graph Traversal Pattern(Marko A. Rodriguez, Peter Neubauer)
  • 10. Outline Introduction Brief Introductory The Realization of Graphs The Indices of Relational Tables Graph Traversals The Graph as an Index Conclusion Brief Introductory Relational databases have been around since the late 1960s (Edgar F. Codd, A relational model of data for large shared data banks) and are todays most predominate data management tool. Relational databases maintain a collection of tables. Each table can be defined by a set of rows and a set of columns. Semantically, rows denote objects and columns denote properties/attributes. In contrast, graph databases do not store data in disparate tables. Instead there is a single data structure—the graph. Moreover, there is no concept of a “join” operation as every vertex and edge has a direct reference to its adjacent vertex or edge. The data structure is already “joined” by the edges that are defined. Main drawback - it’s hard to shard a graph. Main benefit - constant time cost for retrieving an adjacent vertex or edge. In other words, regardless of the size of the graph as a whole, the cost of a local read operation at a vertex or edge remains constant. Igor Bogicevic (igor.bogicevic@sbgenomics.com) The Graph Traversal Pattern(Marko A. Rodriguez, Peter Neubauer)
  • 11. Outline Introduction Brief Introductory The Realization of Graphs The Indices of Relational Tables Graph Traversals The Graph as an Index Conclusion The Indices of Relational Tables Direct data access is inefficient - for n elements, a linear scan runs in O(n) Igor Bogicevic (igor.bogicevic@sbgenomics.com) The Graph Traversal Pattern(Marko A. Rodriguez, Peter Neubauer)
  • 12. Outline Introduction Brief Introductory The Realization of Graphs The Indices of Relational Tables Graph Traversals The Graph as an Index Conclusion The Indices of Relational Tables Direct data access is inefficient - for n elements, a linear scan runs in O(n) Binary search tree dramatically reduces the overhead - searching such indices takes O(log2 n) Igor Bogicevic (igor.bogicevic@sbgenomics.com) The Graph Traversal Pattern(Marko A. Rodriguez, Peter Neubauer)
  • 13. Outline Introduction Brief Introductory The Realization of Graphs The Indices of Relational Tables Graph Traversals The Graph as an Index Conclusion The Indices of Relational Tables Direct data access is inefficient - for n elements, a linear scan runs in O(n) Binary search tree dramatically reduces the overhead - searching such indices takes O(log2 n) In effect, the join operation forms a graph that is dynamically constructed as one table is linked to another table. While having the benefit of being able to dynamically construct graphs, the limitation is that this graph is not explicit in the relational structure, but instead must be inferred through a series of index-intensive operations. Igor Bogicevic (igor.bogicevic@sbgenomics.com) The Graph Traversal Pattern(Marko A. Rodriguez, Peter Neubauer)
  • 14. Outline Introduction Brief Introductory The Realization of Graphs The Indices of Relational Tables Graph Traversals The Graph as an Index Conclusion The Indices of Relational Tables person.identifier index person.name index friend.person_a index identifier name person_a person_b 1 Alberto Pepe 1 2 2 ... 1 3 3 ... 1 4 4 ... ... ... Figure: A table representation of people and their friends. Igor Bogicevic (igor.bogicevic@sbgenomics.com) The Graph Traversal Pattern(Marko A. Rodriguez, Peter Neubauer)
  • 15. Outline Introduction Brief Introductory The Realization of Graphs The Indices of Relational Tables Graph Traversals The Graph as an Index Conclusion The Graph as an Index A single-relational graph maintains a set of edges, where all the edges are homogeneous in meaning. This means that separate relationships (i.e. friendship, kinship, etc.) are stored as separate graphs. Igor Bogicevic (igor.bogicevic@sbgenomics.com) The Graph Traversal Pattern(Marko A. Rodriguez, Peter Neubauer)
  • 16. Outline Introduction Brief Introductory The Realization of Graphs The Indices of Relational Tables Graph Traversals The Graph as an Index Conclusion The Graph as an Index A single-relational graph maintains a set of edges, where all the edges are homogeneous in meaning. This means that separate relationships (i.e. friendship, kinship, etc.) are stored as separate graphs. Formally, a property graph can be defined as G = (V , E , λ, µ), where edges are directed (i.e. E ⊆ (V × V )), edges are labeled (i.e. λ : E → Σ), and properties are a map from elements and keys to values (i.e. µ : (V ∪ E ) × R → S). Igor Bogicevic (igor.bogicevic@sbgenomics.com) The Graph Traversal Pattern(Marko A. Rodriguez, Peter Neubauer)
  • 17. Outline Introduction Brief Introductory The Realization of Graphs The Indices of Relational Tables Graph Traversals The Graph as an Index Conclusion The Graph as an Index A single-relational graph maintains a set of edges, where all the edges are homogeneous in meaning. This means that separate relationships (i.e. friendship, kinship, etc.) are stored as separate graphs. Formally, a property graph can be defined as G = (V , E , λ, µ), where edges are directed (i.e. E ⊆ (V × V )), edges are labeled (i.e. λ : E → Σ), and properties are a map from elements and keys to values (i.e. µ : (V ∪ E ) × R → S). In the property graph model, it is common for the properties of the vertices (and sometimes edges) to be indexed using a tree structure analogous, in many ways, to those used by relational databases. This index can be represented by some external indexing system or endogenous to the graph as an embedded tree. Igor Bogicevic (igor.bogicevic@sbgenomics.com) The Graph Traversal Pattern(Marko A. Rodriguez, Peter Neubauer)
  • 18. Outline Introduction Brief Introductory The Realization of Graphs The Indices of Relational Tables Graph Traversals The Graph as an Index Conclusion The Graph as an Index A single-relational graph maintains a set of edges, where all the edges are homogeneous in meaning. This means that separate relationships (i.e. friendship, kinship, etc.) are stored as separate graphs. Formally, a property graph can be defined as G = (V , E , λ, µ), where edges are directed (i.e. E ⊆ (V × V )), edges are labeled (i.e. λ : E → Σ), and properties are a map from elements and keys to values (i.e. µ : (V ∪ E ) × R → S). In the property graph model, it is common for the properties of the vertices (and sometimes edges) to be indexed using a tree structure analogous, in many ways, to those used by relational databases. This index can be represented by some external indexing system or endogenous to the graph as an embedded tree. The domain model defines how the elements of the problem space are related. Igor Bogicevic (igor.bogicevic@sbgenomics.com) The Graph Traversal Pattern(Marko A. Rodriguez, Peter Neubauer)
  • 19. Outline Introduction Brief Introductory The Realization of Graphs The Indices of Relational Tables Graph Traversals The Graph as an Index Conclusion The Graph as an Index A single-relational graph maintains a set of edges, where all the edges are homogeneous in meaning. This means that separate relationships (i.e. friendship, kinship, etc.) are stored as separate graphs. Formally, a property graph can be defined as G = (V , E , λ, µ), where edges are directed (i.e. E ⊆ (V × V )), edges are labeled (i.e. λ : E → Σ), and properties are a map from elements and keys to values (i.e. µ : (V ∪ E ) × R → S). In the property graph model, it is common for the properties of the vertices (and sometimes edges) to be indexed using a tree structure analogous, in many ways, to those used by relational databases. This index can be represented by some external indexing system or endogenous to the graph as an embedded tree. The domain model defines how the elements of the problem space are related. The domain model partitions elements using semantics defined by the domain modeler. Thus, in many ways, a graph can be seen as an indexing structure. Igor Bogicevic (igor.bogicevic@sbgenomics.com) The Graph Traversal Pattern(Marko A. Rodriguez, Peter Neubauer)
  • 20. Outline Introduction Brief Introductory The Realization of Graphs The Indices of Relational Tables Graph Traversals The Graph as an Index Conclusion The Graph as an Index A single-relational graph maintains a set of edges, where all the edges are homogeneous in meaning. This means that separate relationships (i.e. friendship, kinship, etc.) are stored as separate graphs. Formally, a property graph can be defined as G = (V , E , λ, µ), where edges are directed (i.e. E ⊆ (V × V )), edges are labeled (i.e. λ : E → Σ), and properties are a map from elements and keys to values (i.e. µ : (V ∪ E ) × R → S). In the property graph model, it is common for the properties of the vertices (and sometimes edges) to be indexed using a tree structure analogous, in many ways, to those used by relational databases. This index can be represented by some external indexing system or endogenous to the graph as an embedded tree. The domain model defines how the elements of the problem space are related. The domain model partitions elements using semantics defined by the domain modeler. Thus, in many ways, a graph can be seen as an indexing structure. In a graph database, there is no explicit join operation because vertices maintain direct references to their adjacent edges. In many ways, the edges of the graph serve as explicit, “hard-wired” join structures, not computed in run time as joins. Igor Bogicevic (igor.bogicevic@sbgenomics.com) The Graph Traversal Pattern(Marko A. Rodriguez, Peter Neubauer)
  • 21. Outline Introduction Brief Introductory The Realization of Graphs The Indices of Relational Tables Graph Traversals The Graph as an Index Conclusion The Graph as an Index A single-relational graph maintains a set of edges, where all the edges are homogeneous in meaning. This means that separate relationships (i.e. friendship, kinship, etc.) are stored as separate graphs. Formally, a property graph can be defined as G = (V , E , λ, µ), where edges are directed (i.e. E ⊆ (V × V )), edges are labeled (i.e. λ : E → Σ), and properties are a map from elements and keys to values (i.e. µ : (V ∪ E ) × R → S). In the property graph model, it is common for the properties of the vertices (and sometimes edges) to be indexed using a tree structure analogous, in many ways, to those used by relational databases. This index can be represented by some external indexing system or endogenous to the graph as an embedded tree. The domain model defines how the elements of the problem space are related. The domain model partitions elements using semantics defined by the domain modeler. Thus, in many ways, a graph can be seen as an indexing structure. In a graph database, there is no explicit join operation because vertices maintain direct references to their adjacent edges. In many ways, the edges of the graph serve as explicit, “hard-wired” join structures, not computed in run time as joins. The act of traversing over an edge is the act of joining. However, what makes this more efficient in a graph database is that traversing from one vertex to another is a constant time operation. Thus, traversal time is defined solely by the number of elements touched by the traversal. Igor Bogicevic (igor.bogicevic@sbgenomics.com) The Graph Traversal Pattern(Marko A. Rodriguez, Peter Neubauer)
  • 22. Outline Introduction Brief Introductory The Realization of Graphs The Indices of Relational Tables Graph Traversals The Graph as an Index Conclusion The Graph as an Index name=... 2 vertex.name index friend name=Alberto Pepe name=... 1 friend 3 ... ... friend name=... 4 Figure: A graph representation of people and their friends. Given the tree-nature of the vertex.name index, it is possible, and many times useful to model the index endogenous to the graph. Igor Bogicevic (igor.bogicevic@sbgenomics.com) The Graph Traversal Pattern(Marko A. Rodriguez, Peter Neubauer)
  • 23. Outline Definition of Graph Traversals Introduction Traversing for Recommendation The Realization of Graphs Content-Based Recommendation Graph Traversals Collaborative Filtering-Based Recommendation Conclusion Traversing Endogenous Indices Definition of Graph Traversals Graph traversal refers to visiting elements (i.e. vertices and edges) in a graph in some algorithmic fashion. Igor Bogicevic (igor.bogicevic@sbgenomics.com) The Graph Traversal Pattern(Marko A. Rodriguez, Peter Neubauer)
  • 24. Outline Definition of Graph Traversals Introduction Traversing for Recommendation The Realization of Graphs Content-Based Recommendation Graph Traversals Collaborative Filtering-Based Recommendation Conclusion Traversing Endogenous Indices Definition of Graph Traversals Graph traversal refers to visiting elements (i.e. vertices and edges) in a graph in some algorithmic fashion. The most primitive, read-based operation on a graph is a single step traversal from element i to element j, where i, j ∈ (V ∪ E ). Igor Bogicevic (igor.bogicevic@sbgenomics.com) The Graph Traversal Pattern(Marko A. Rodriguez, Peter Neubauer)
  • 25. Outline Definition of Graph Traversals Introduction Traversing for Recommendation The Realization of Graphs Content-Based Recommendation Graph Traversals Collaborative Filtering-Based Recommendation Conclusion Traversing Endogenous Indices Definition of Graph Traversals Graph traversal refers to visiting elements (i.e. vertices and edges) in a graph in some algorithmic fashion. The most primitive, read-based operation on a graph is a single step traversal from element i to element j, where i, j ∈ (V ∪ E ). These operations are defined over power multiset domains and ranges - power ˆ multiset of A, denoted P(A), is the infinite set of all subsets of multisets of A. Igor Bogicevic (igor.bogicevic@sbgenomics.com) The Graph Traversal Pattern(Marko A. Rodriguez, Peter Neubauer)
  • 26. Outline Definition of Graph Traversals Introduction Traversing for Recommendation The Realization of Graphs Content-Based Recommendation Graph Traversals Collaborative Filtering-Based Recommendation Conclusion Traversing Endogenous Indices Definition of Graph Traversals Graph traversal refers to visiting elements (i.e. vertices and edges) in a graph in some algorithmic fashion. The most primitive, read-based operation on a graph is a single step traversal from element i to element j, where i, j ∈ (V ∪ E ). These operations are defined over power multiset domains and ranges - power ˆ multiset of A, denoted P(A), is the infinite set of all subsets of multisets of A. Let’s list single step traversals: eout : P(V ) → P(E ): traverse to the outgoing edges of the vertices. ˆ ˆ ein : P(V ) → P(E ): traverse to the incoming edges to the vertices. ˆ ˆ vout : P(E ) → P(V ): traverse to the outgoing (i.e. tail) vertices of the edges. ˆ ˆ vin : P(E ) → P(V ): traverse the incoming (i.e. head) vertices of the edges. ˆ ˆ : P(V ∪ E ) × R → P(S): get the element property values for key r ∈ R. ˆ ˆ Igor Bogicevic (igor.bogicevic@sbgenomics.com) The Graph Traversal Pattern(Marko A. Rodriguez, Peter Neubauer)
  • 27. Outline Definition of Graph Traversals Introduction Traversing for Recommendation The Realization of Graphs Content-Based Recommendation Graph Traversals Collaborative Filtering-Based Recommendation Conclusion Traversing Endogenous Indices Definition of Graph Traversals When edges are labeled and elements have properties, it is desirable to constrain the traversal to edges of a particular label or elements with particular properties. These operations are known as filters and are abstractly defined in the following itemization: elab± : P(E ) × Σ → P(E ): allow (or filter) all edges with the label σ ∈ Σ. ˆ ˆ p± : P(V ∪ E ) × R × S → P(V ∪ E ): allow (or filter) all elements with the property ˆ ˆ s ∈ S for key r ∈ R. ± : P(V ∪ E ) × (V ∪ E ) → P(V ∪ E ): allow (or filter) all elements that are the ˆ ˆ provided element. Igor Bogicevic (igor.bogicevic@sbgenomics.com) The Graph Traversal Pattern(Marko A. Rodriguez, Peter Neubauer)
  • 28. Outline Definition of Graph Traversals Introduction Traversing for Recommendation The Realization of Graphs Content-Based Recommendation Graph Traversals Collaborative Filtering-Based Recommendation Conclusion Traversing Endogenous Indices Definition of Graph Traversals When edges are labeled and elements have properties, it is desirable to constrain the traversal to edges of a particular label or elements with particular properties. These operations are known as filters and are abstractly defined in the following itemization: elab± : P(E ) × Σ → P(E ): allow (or filter) all edges with the label σ ∈ Σ. ˆ ˆ p± : P(V ∪ E ) × R × S → P(V ∪ E ): allow (or filter) all elements with the property ˆ ˆ s ∈ S for key r ∈ R. ± : P(V ∪ E ) × (V ∪ E ) → P(V ∪ E ): allow (or filter) all elements that are the ˆ ˆ provided element. Through function composition, we can define graph traversals of arbitrary length, i.e. Alberto Pepe’s friends: ˆ ˆ f : P(V ) → P(S), where f (i) = (vin (elab+ (eout (i), friend)) , name) , then f (i) will return the names of Alberto Pepe’s friends. Through function currying and composition, the previous definition can be represented more clearly with the following function rule, “ ” f (i) = name ◦ vin ◦ elab+ ◦ eout (i). friend Igor Bogicevic (igor.bogicevic@sbgenomics.com) The Graph Traversal Pattern(Marko A. Rodriguez, Peter Neubauer)
  • 29. Outline Definition of Graph Traversals Introduction Traversing for Recommendation The Realization of Graphs Content-Based Recommendation Graph Traversals Collaborative Filtering-Based Recommendation Conclusion Traversing Endogenous Indices Definition of Graph Traversals name=... 2 friend name=Alberto Pepe name=... 1 friend 3 name=... eout friend 4 efriend lab+ vin name Figure: A single path along along the f traversal. Igor Bogicevic (igor.bogicevic@sbgenomics.com) The Graph Traversal Pattern(Marko A. Rodriguez, Peter Neubauer)
  • 30. Outline Definition of Graph Traversals Introduction Traversing for Recommendation The Realization of Graphs Content-Based Recommendation Graph Traversals Collaborative Filtering-Based Recommendation Conclusion Traversing Endogenous Indices Traversing for Recommendation Recommendation systems are designed to help people deal with the problem of information overload by filtering information in the system that doesn’t pertain to the person. Igor Bogicevic (igor.bogicevic@sbgenomics.com) The Graph Traversal Pattern(Marko A. Rodriguez, Peter Neubauer)
  • 31. Outline Definition of Graph Traversals Introduction Traversing for Recommendation The Realization of Graphs Content-Based Recommendation Graph Traversals Collaborative Filtering-Based Recommendation Conclusion Traversing Endogenous Indices Traversing for Recommendation Recommendation systems are designed to help people deal with the problem of information overload by filtering information in the system that doesn’t pertain to the person. Content-based recommendation deals with recommending resources that share characteristics (i.e. content) with a set of resources. Igor Bogicevic (igor.bogicevic@sbgenomics.com) The Graph Traversal Pattern(Marko A. Rodriguez, Peter Neubauer)
  • 32. Outline Definition of Graph Traversals Introduction Traversing for Recommendation The Realization of Graphs Content-Based Recommendation Graph Traversals Collaborative Filtering-Based Recommendation Conclusion Traversing Endogenous Indices Traversing for Recommendation Recommendation systems are designed to help people deal with the problem of information overload by filtering information in the system that doesn’t pertain to the person. Content-based recommendation deals with recommending resources that share characteristics (i.e. content) with a set of resources. On the other side collaborative filtering-based recommendation in concerned with determining the similarity of resources based upon the similarity of the taste of the people modeled within the system. Igor Bogicevic (igor.bogicevic@sbgenomics.com) The Graph Traversal Pattern(Marko A. Rodriguez, Peter Neubauer)
  • 33. Outline Definition of Graph Traversals Introduction Traversing for Recommendation The Realization of Graphs Content-Based Recommendation Graph Traversals Collaborative Filtering-Based Recommendation Conclusion Traversing Endogenous Indices Traversing for Recommendation Recommendation systems are designed to help people deal with the problem of information overload by filtering information in the system that doesn’t pertain to the person. Content-based recommendation deals with recommending resources that share characteristics (i.e. content) with a set of resources. On the other side collaborative filtering-based recommendation in concerned with determining the similarity of resources based upon the similarity of the taste of the people modeled within the system. Graph databases provide a good paradigm for dealing with both recommendation techniques. Igor Bogicevic (igor.bogicevic@sbgenomics.com) The Graph Traversal Pattern(Marko A. Rodriguez, Peter Neubauer)
  • 34. Outline Definition of Graph Traversals Introduction Traversing for Recommendation The Realization of Graphs Content-Based Recommendation Graph Traversals Collaborative Filtering-Based Recommendation Conclusion Traversing Endogenous Indices Traversing for Recommendation Recommendation systems are designed to help people deal with the problem of information overload by filtering information in the system that doesn’t pertain to the person. Content-based recommendation deals with recommending resources that share characteristics (i.e. content) with a set of resources. On the other side collaborative filtering-based recommendation in concerned with determining the similarity of resources based upon the similarity of the taste of the people modeled within the system. Graph databases provide a good paradigm for dealing with both recommendation techniques. Furthermore, using the graph traversal pattern, there exists a single graph data structure that can be traversed in different ways to expose different types of recommendations—generally, different types of relationships between vertices. Being able to mix and match the types of traversals executed alters the semantics of the final rankings and conveniently allows for hybrid recommendation algorithms to emerge. Igor Bogicevic (igor.bogicevic@sbgenomics.com) The Graph Traversal Pattern(Marko A. Rodriguez, Peter Neubauer)
  • 35. Outline Definition of Graph Traversals Introduction Traversing for Recommendation The Realization of Graphs Content-Based Recommendation Graph Traversals Collaborative Filtering-Based Recommendation Conclusion Traversing Endogenous Indices Traversing for Recommendation Recommendation systems are designed to help people deal with the problem of information overload by filtering information in the system that doesn’t pertain to the person. Content-based recommendation deals with recommending resources that share characteristics (i.e. content) with a set of resources. On the other side collaborative filtering-based recommendation in concerned with determining the similarity of resources based upon the similarity of the taste of the people modeled within the system. Graph databases provide a good paradigm for dealing with both recommendation techniques. Furthermore, using the graph traversal pattern, there exists a single graph data structure that can be traversed in different ways to expose different types of recommendations—generally, different types of relationships between vertices. Being able to mix and match the types of traversals executed alters the semantics of the final rankings and conveniently allows for hybrid recommendation algorithms to emerge. Following figure presents a toy graph data set, where there exist a set of people, resources, and features related to each other by likes- and feature-labeled edges. Igor Bogicevic (igor.bogicevic@sbgenomics.com) The Graph Traversal Pattern(Marko A. Rodriguez, Peter Neubauer)
  • 36. Outline Definition of Graph Traversals Introduction Traversing for Recommendation The Realization of Graphs Content-Based Recommendation Graph Traversals Collaborative Filtering-Based Recommendation Conclusion Traversing Endogenous Indices Traversing for Recommendation feature 8 f r 2 likes p 6 likes likes p 1 likes r 3 likes p 7 feature likes r 4 feature f 5 Figure: A graph data structure containing people (p), their liked resources (r), and each resource’s features (f). Igor Bogicevic (igor.bogicevic@sbgenomics.com) The Graph Traversal Pattern(Marko A. Rodriguez, Peter Neubauer)
  • 37. Outline Definition of Graph Traversals Introduction Traversing for Recommendation The Realization of Graphs Content-Based Recommendation Graph Traversals Collaborative Filtering-Based Recommendation Conclusion Traversing Endogenous Indices Content-Based Recommendation In order to identify resources that that are similar in features (i.e. content-based recommendation) to a resource, traverse to all resources that share the same ˆ ˆ features. This is accomplished with the following function, f : P(V ) → P(V ), where “ ” f (i) = i − ◦ vout ◦ elab+ ◦ ein ◦ vin ◦ elab+ ◦ eout (i). feature feature Igor Bogicevic (igor.bogicevic@sbgenomics.com) The Graph Traversal Pattern(Marko A. Rodriguez, Peter Neubauer)
  • 38. Outline Definition of Graph Traversals Introduction Traversing for Recommendation The Realization of Graphs Content-Based Recommendation Graph Traversals Collaborative Filtering-Based Recommendation Conclusion Traversing Endogenous Indices Content-Based Recommendation In order to identify resources that that are similar in features (i.e. content-based recommendation) to a resource, traverse to all resources that share the same ˆ ˆ features. This is accomplished with the following function, f : P(V ) → P(V ), where “ ” f (i) = i − ◦ vout ◦ elab+ ◦ ein ◦ vin ◦ elab+ ◦ eout (i). feature feature Assuming i = 3, function f states, traverse to the outgoing edges of resource vertex 3, only allow feature-labeled edges, and then traverse to the incoming vertices of those feature-labeled edges. Igor Bogicevic (igor.bogicevic@sbgenomics.com) The Graph Traversal Pattern(Marko A. Rodriguez, Peter Neubauer)
  • 39. Outline Definition of Graph Traversals Introduction Traversing for Recommendation The Realization of Graphs Content-Based Recommendation Graph Traversals Collaborative Filtering-Based Recommendation Conclusion Traversing Endogenous Indices Content-Based Recommendation In order to identify resources that that are similar in features (i.e. content-based recommendation) to a resource, traverse to all resources that share the same ˆ ˆ features. This is accomplished with the following function, f : P(V ) → P(V ), where “ ” f (i) = i − ◦ vout ◦ elab+ ◦ ein ◦ vin ◦ elab+ ◦ eout (i). feature feature Assuming i = 3, function f states, traverse to the outgoing edges of resource vertex 3, only allow feature-labeled edges, and then traverse to the incoming vertices of those feature-labeled edges. At this point, the traverser is at feature vertex 8. Next, traverse to the incoming edges of feature vertex 8, only allow feature-labeled edges, and then traverse to the outgoing vertices of these feature-labeled edges and at that point, the traverser is at resource vertices 3 and 2. Igor Bogicevic (igor.bogicevic@sbgenomics.com) The Graph Traversal Pattern(Marko A. Rodriguez, Peter Neubauer)
  • 40. Outline Definition of Graph Traversals Introduction Traversing for Recommendation The Realization of Graphs Content-Based Recommendation Graph Traversals Collaborative Filtering-Based Recommendation Conclusion Traversing Endogenous Indices Content-Based Recommendation In order to identify resources that that are similar in features (i.e. content-based recommendation) to a resource, traverse to all resources that share the same ˆ ˆ features. This is accomplished with the following function, f : P(V ) → P(V ), where “ ” f (i) = i − ◦ vout ◦ elab+ ◦ ein ◦ vin ◦ elab+ ◦ eout (i). feature feature Assuming i = 3, function f states, traverse to the outgoing edges of resource vertex 3, only allow feature-labeled edges, and then traverse to the incoming vertices of those feature-labeled edges. At this point, the traverser is at feature vertex 8. Next, traverse to the incoming edges of feature vertex 8, only allow feature-labeled edges, and then traverse to the outgoing vertices of these feature-labeled edges and at that point, the traverser is at resource vertices 3 and 2. However, since we are trying to identify those resources similar in content to vertex 3, we need to filter out vertex 3. This is accomplished by the last stage of the function composition. Thus, given the toy graph data set, vertex 2 is similar to vertex 3 in content. Igor Bogicevic (igor.bogicevic@sbgenomics.com) The Graph Traversal Pattern(Marko A. Rodriguez, Peter Neubauer)
  • 41. Outline Definition of Graph Traversals Introduction Traversing for Recommendation The Realization of Graphs Content-Based Recommendation Graph Traversals Collaborative Filtering-Based Recommendation Conclusion Traversing Endogenous Indices Content-Based Recommendation In order to identify resources that that are similar in features (i.e. content-based recommendation) to a resource, traverse to all resources that share the same ˆ ˆ features. This is accomplished with the following function, f : P(V ) → P(V ), where “ ” f (i) = i − ◦ vout ◦ elab+ ◦ ein ◦ vin ◦ elab+ ◦ eout (i). feature feature Assuming i = 3, function f states, traverse to the outgoing edges of resource vertex 3, only allow feature-labeled edges, and then traverse to the incoming vertices of those feature-labeled edges. At this point, the traverser is at feature vertex 8. Next, traverse to the incoming edges of feature vertex 8, only allow feature-labeled edges, and then traverse to the outgoing vertices of these feature-labeled edges and at that point, the traverser is at resource vertices 3 and 2. However, since we are trying to identify those resources similar in content to vertex 3, we need to filter out vertex 3. This is accomplished by the last stage of the function composition. Thus, given the toy graph data set, vertex 2 is similar to vertex 3 in content. Following diagram illustrates such traversal: Igor Bogicevic (igor.bogicevic@sbgenomics.com) The Graph Traversal Pattern(Marko A. Rodriguez, Peter Neubauer)
  • 42. Outline Definition of Graph Traversals Introduction Traversing for Recommendation The Realization of Graphs Content-Based Recommendation Graph Traversals Collaborative Filtering-Based Recommendation Conclusion Traversing Endogenous Indices Content-Based Recommendation efeature lab+ ein vout feature 8 f r 2 i − vin r 3 feature eout efeature lab+ Figure: A traversal that identifies resources that are similar in content to a set of resources based upon shared features. Igor Bogicevic (igor.bogicevic@sbgenomics.com) The Graph Traversal Pattern(Marko A. Rodriguez, Peter Neubauer)
  • 43. Outline Definition of Graph Traversals Introduction Traversing for Recommendation The Realization of Graphs Content-Based Recommendation Graph Traversals Collaborative Filtering-Based Recommendation Conclusion Traversing Endogenous Indices Content-Based Recommendation It’s simple to extend content-based recommendation to problems such as: “Given what person i likes, what other resources have similar features?” Igor Bogicevic (igor.bogicevic@sbgenomics.com) The Graph Traversal Pattern(Marko A. Rodriguez, Peter Neubauer)
  • 44. Outline Definition of Graph Traversals Introduction Traversing for Recommendation The Realization of Graphs Content-Based Recommendation Graph Traversals Collaborative Filtering-Based Recommendation Conclusion Traversing Endogenous Indices Content-Based Recommendation It’s simple to extend content-based recommendation to problems such as: “Given what person i likes, what other resources have similar features?” Such a problem is solved using the previous function f defined above combined with a new composition that finds all the resources that person i likes. Thus, if ˆ ˆ g : P(V ) → P(V ), where “ ” likes g (i) = vin ◦ elab+ ◦ eout (i), then to determine those resources similar in features to the resources that person vertex 7 likes, compose function f and g : (f ◦ g ). Igor Bogicevic (igor.bogicevic@sbgenomics.com) The Graph Traversal Pattern(Marko A. Rodriguez, Peter Neubauer)
  • 45. Outline Definition of Graph Traversals Introduction Traversing for Recommendation The Realization of Graphs Content-Based Recommendation Graph Traversals Collaborative Filtering-Based Recommendation Conclusion Traversing Endogenous Indices Content-Based Recommendation It’s simple to extend content-based recommendation to problems such as: “Given what person i likes, what other resources have similar features?” Such a problem is solved using the previous function f defined above combined with a new composition that finds all the resources that person i likes. Thus, if ˆ ˆ g : P(V ) → P(V ), where “ ” likes g (i) = vin ◦ elab+ ◦ eout (i), then to determine those resources similar in features to the resources that person vertex 7 likes, compose function f and g : (f ◦ g ). Those resources that share more features in common will be returned more by f ◦ g. Igor Bogicevic (igor.bogicevic@sbgenomics.com) The Graph Traversal Pattern(Marko A. Rodriguez, Peter Neubauer)
  • 46. Outline Definition of Graph Traversals Introduction Traversing for Recommendation The Realization of Graphs Content-Based Recommendation Graph Traversals Collaborative Filtering-Based Recommendation Conclusion Traversing Endogenous Indices Content-Based Recommendation It’s simple to extend content-based recommendation to problems such as: “Given what person i likes, what other resources have similar features?” Such a problem is solved using the previous function f defined above combined with a new composition that finds all the resources that person i likes. Thus, if ˆ ˆ g : P(V ) → P(V ), where “ ” likes g (i) = vin ◦ elab+ ◦ eout (i), then to determine those resources similar in features to the resources that person vertex 7 likes, compose function f and g : (f ◦ g ). Those resources that share more features in common will be returned more by f ◦ g. Using this approach, function can return similar results so, if necessary, deduplication has to be handled with a separate mechanism. Igor Bogicevic (igor.bogicevic@sbgenomics.com) The Graph Traversal Pattern(Marko A. Rodriguez, Peter Neubauer)
  • 47. Outline Definition of Graph Traversals Introduction Traversing for Recommendation The Realization of Graphs Content-Based Recommendation Graph Traversals Collaborative Filtering-Based Recommendation Conclusion Traversing Endogenous Indices Collaborative Filtering-Based Recommendation With collaborative filtering, the objective is to identify a set of resources that have a high probability of being liked by a person based upon identifying other people in the system that enjoy similar likes. Igor Bogicevic (igor.bogicevic@sbgenomics.com) The Graph Traversal Pattern(Marko A. Rodriguez, Peter Neubauer)
  • 48. Outline Definition of Graph Traversals Introduction Traversing for Recommendation The Realization of Graphs Content-Based Recommendation Graph Traversals Collaborative Filtering-Based Recommendation Conclusion Traversing Endogenous Indices Collaborative Filtering-Based Recommendation With collaborative filtering, the objective is to identify a set of resources that have a high probability of being liked by a person based upon identifying other people in the system that enjoy similar likes. In example, if person a and person b share 90% of their liked resources in common, then the remaining 10% they don’t share in common are candidates for recommendation. Igor Bogicevic (igor.bogicevic@sbgenomics.com) The Graph Traversal Pattern(Marko A. Rodriguez, Peter Neubauer)
  • 49. Outline Definition of Graph Traversals Introduction Traversing for Recommendation The Realization of Graphs Content-Based Recommendation Graph Traversals Collaborative Filtering-Based Recommendation Conclusion Traversing Endogenous Indices Collaborative Filtering-Based Recommendation With collaborative filtering, the objective is to identify a set of resources that have a high probability of being liked by a person based upon identifying other people in the system that enjoy similar likes. In example, if person a and person b share 90% of their liked resources in common, then the remaining 10% they don’t share in common are candidates for recommendation. For simplicity, this traversal is broken into two components f and g . Igor Bogicevic (igor.bogicevic@sbgenomics.com) The Graph Traversal Pattern(Marko A. Rodriguez, Peter Neubauer)
  • 50. Outline Definition of Graph Traversals Introduction Traversing for Recommendation The Realization of Graphs Content-Based Recommendation Graph Traversals Collaborative Filtering-Based Recommendation Conclusion Traversing Endogenous Indices Collaborative Filtering-Based Recommendation ˆ ˆ First component is f : P(V ) → P(V ) where “ ” f (i) = i − ◦ vout ◦ elab+ ◦ ein ◦ vin ◦ elab+ ◦ eout (i). like like Function f traverses to all those people vertices that like the same resources as person vertex i and who themselves are not vertex i (as a person is obviously similar to themselves and thus, doesn’t contribute anything to the computation). The more resources liked that a person shares in common with i, the more traversers will be located at that person’s vertex. In other words, if person i and person j share 10 liked resources in common, then f (i) will return person j 10 times. Igor Bogicevic (igor.bogicevic@sbgenomics.com) The Graph Traversal Pattern(Marko A. Rodriguez, Peter Neubauer)
  • 51. Outline Definition of Graph Traversals Introduction Traversing for Recommendation The Realization of Graphs Content-Based Recommendation Graph Traversals Collaborative Filtering-Based Recommendation Conclusion Traversing Endogenous Indices Collaborative Filtering-Based Recommendation ˆ ˆ First component is f : P(V ) → P(V ) where “ ” f (i) = i − ◦ vout ◦ elab+ ◦ ein ◦ vin ◦ elab+ ◦ eout (i). like like Function f traverses to all those people vertices that like the same resources as person vertex i and who themselves are not vertex i (as a person is obviously similar to themselves and thus, doesn’t contribute anything to the computation). The more resources liked that a person shares in common with i, the more traversers will be located at that person’s vertex. In other words, if person i and person j share 10 liked resources in common, then f (i) will return person j 10 times. Next, function g is defined as “ ” like g (j) = vin ◦ elab+ ◦ eout (j). Function g traverses to all the resources liked by vertex j. In composition, (g ◦ f )(i) determines all those resources that are liked by those people that have similar tastes to vertex i. If person j likes 10 resources in common with person i, then the resources that person j likes will be returned at least 10 times by g ◦ f (perhaps more if a path exists to those resources from another person vertex as well). Igor Bogicevic (igor.bogicevic@sbgenomics.com) The Graph Traversal Pattern(Marko A. Rodriguez, Peter Neubauer)
  • 52. Outline Definition of Graph Traversals Introduction Traversing for Recommendation The Realization of Graphs Content-Based Recommendation Graph Traversals Collaborative Filtering-Based Recommendation Conclusion Traversing Endogenous Indices Collaborative Filtering-Based Recommendation Following figure diagrams a function path starting from vertex 7. Only one legal path is presented for the sake of diagram clarity g likes j − vin elab+ eout i − r 2 likes p 6 elikes lab+ likes ein vout likes f p 1 likes r 3 likes p 7 likes vin eout r 4 elikes lab+ Figure: A traversal that identifies resources that are similar in content to a resource based upon shared features. Igor Bogicevic (igor.bogicevic@sbgenomics.com) The Graph Traversal Pattern(Marko A. Rodriguez, Peter Neubauer)
  • 53. Outline Definition of Graph Traversals Introduction Traversing for Recommendation The Realization of Graphs Content-Based Recommendation Graph Traversals Collaborative Filtering-Based Recommendation Conclusion Traversing Endogenous Indices Traversing Endogenous Indices A graph is a general-purpose data structure. A graph can be used to model lists, maps, trees, etc. As such, a graph can model an index. Igor Bogicevic (igor.bogicevic@sbgenomics.com) The Graph Traversal Pattern(Marko A. Rodriguez, Peter Neubauer)
  • 54. Outline Definition of Graph Traversals Introduction Traversing for Recommendation The Realization of Graphs Content-Based Recommendation Graph Traversals Collaborative Filtering-Based Recommendation Conclusion Traversing Endogenous Indices Traversing Endogenous Indices A graph is a general-purpose data structure. A graph can be used to model lists, maps, trees, etc. As such, a graph can model an index. It was previously assumed that a graph database makes use of an external indexing system to index the properties of its vertices and edges with assumption that specialized indexing systems are better suited for special-purpose queries such as those involving full-text search. Igor Bogicevic (igor.bogicevic@sbgenomics.com) The Graph Traversal Pattern(Marko A. Rodriguez, Peter Neubauer)
  • 55. Outline Definition of Graph Traversals Introduction Traversing for Recommendation The Realization of Graphs Content-Based Recommendation Graph Traversals Collaborative Filtering-Based Recommendation Conclusion Traversing Endogenous Indices Traversing Endogenous Indices A graph is a general-purpose data structure. A graph can be used to model lists, maps, trees, etc. As such, a graph can model an index. It was previously assumed that a graph database makes use of an external indexing system to index the properties of its vertices and edges with assumption that specialized indexing systems are better suited for special-purpose queries such as those involving full-text search. In many cases, there is nothing that prevents the representation of an index within the graph itself—vertices and edges can be indexed by other vertices and edges and given the nature of how vertices and edges directly reference each other in a graph database, index look-up speeds are comparable. Igor Bogicevic (igor.bogicevic@sbgenomics.com) The Graph Traversal Pattern(Marko A. Rodriguez, Peter Neubauer)
  • 56. Outline Definition of Graph Traversals Introduction Traversing for Recommendation The Realization of Graphs Content-Based Recommendation Graph Traversals Collaborative Filtering-Based Recommendation Conclusion Traversing Endogenous Indices Traversing Endogenous Indices A graph is a general-purpose data structure. A graph can be used to model lists, maps, trees, etc. As such, a graph can model an index. It was previously assumed that a graph database makes use of an external indexing system to index the properties of its vertices and edges with assumption that specialized indexing systems are better suited for special-purpose queries such as those involving full-text search. In many cases, there is nothing that prevents the representation of an index within the graph itself—vertices and edges can be indexed by other vertices and edges and given the nature of how vertices and edges directly reference each other in a graph database, index look-up speeds are comparable. Endogenous indices afford graph databases a great flexibility in modeling a domain. Not only can objects and their relationships be modeled (e.g. people and their friendships), but also the indices that partition the objects into meaningful subsets (e.g. people within a 2D region of space). Igor Bogicevic (igor.bogicevic@sbgenomics.com) The Graph Traversal Pattern(Marko A. Rodriguez, Peter Neubauer)
  • 57. Outline Definition of Graph Traversals Introduction Traversing for Recommendation The Realization of Graphs Content-Based Recommendation Graph Traversals Collaborative Filtering-Based Recommendation Conclusion Traversing Endogenous Indices Traversing Endogenous Indices A graph is a general-purpose data structure. A graph can be used to model lists, maps, trees, etc. As such, a graph can model an index. It was previously assumed that a graph database makes use of an external indexing system to index the properties of its vertices and edges with assumption that specialized indexing systems are better suited for special-purpose queries such as those involving full-text search. In many cases, there is nothing that prevents the representation of an index within the graph itself—vertices and edges can be indexed by other vertices and edges and given the nature of how vertices and edges directly reference each other in a graph database, index look-up speeds are comparable. Endogenous indices afford graph databases a great flexibility in modeling a domain. Not only can objects and their relationships be modeled (e.g. people and their friendships), but also the indices that partition the objects into meaningful subsets (e.g. people within a 2D region of space). The domain of spatial analysis makes use of advanced indexing structures such as the quadtree. Igor Bogicevic (igor.bogicevic@sbgenomics.com) The Graph Traversal Pattern(Marko A. Rodriguez, Peter Neubauer)
  • 58. Outline Definition of Graph Traversals Introduction Traversing for Recommendation The Realization of Graphs Content-Based Recommendation Graph Traversals Collaborative Filtering-Based Recommendation Conclusion Traversing Endogenous Indices Traversing Endogenous Indices Quadtrees partition a two-dimensional plane into rectangular boxes based upon the spatial density of the points being indexed. Following figure diagrams how space is partitioned as the density of points increases within a region of the index.: Figure: A quadtree partition of a plane. This figure is an adaptation of a public domain image provided courtesy of David Eppstein. Igor Bogicevic (igor.bogicevic@sbgenomics.com) The Graph Traversal Pattern(Marko A. Rodriguez, Peter Neubauer)
  • 59. Outline Definition of Graph Traversals Introduction Traversing for Recommendation The Realization of Graphs Content-Based Recommendation Graph Traversals Collaborative Filtering-Based Recommendation Conclusion Traversing Endogenous Indices Traversing Endogenous Indices In order to demonstrate how a quadtree index can be represented and traversed, a toy graph data set is presented. This data set is diagrammed in following Figure: type=quad bl=[0,0] tr=[100,100] 1 sub sub type=quad type=quad type=quad bl=[0,0] 2 3 bl=[50,25] 4 bl=[50,0] tr=[50,100] tr=[75,50] tr=[100,100] type=quad type=quad bl=[0,0] bl=[50,50] tr=[50,50] tr=[100,100] type=quad type=quad bl=[0,50] 5 6 9 7 8 bl=[50,0] tr=[50,100] tr=[100,50] type=quad [0,100] 1 bl=[50,25] [100,100] tr=[62,37] g f a b e 5 7 9 bl=[25,20] d 3 h c tr=[90,45] i [0,0] 6 2 4 8 [100,0] Figure: A quadtree index of a space that contains points of interest. The index is composed of the vertices 1-9 and the points of interest are the vertices a-i. While not diagrammed for the sake of clarity, all edges are labeled sub (meaning subsumes) and each point of interest vertex has an associated bottom-left (bl) property, top-right (tr) property, and a type property which is equal to “poi.” Igor Bogicevic (igor.bogicevic@sbgenomics.com) The Graph Traversal Pattern(Marko A. Rodriguez, Peter Neubauer)
  • 60. Outline Definition of Graph Traversals Introduction Traversing for Recommendation The Realization of Graphs Content-Based Recommendation Graph Traversals Collaborative Filtering-Based Recommendation Conclusion Traversing Endogenous Indices Traversing Endogenous Indices The top half of Figure represents a quadtree index (vertices 1-9). This quadtree index is partitioning “points of interest” (vertices a-i) located within the diagrammed plane. Igor Bogicevic (igor.bogicevic@sbgenomics.com) The Graph Traversal Pattern(Marko A. Rodriguez, Peter Neubauer)
  • 61. Outline Definition of Graph Traversals Introduction Traversing for Recommendation The Realization of Graphs Content-Based Recommendation Graph Traversals Collaborative Filtering-Based Recommendation Conclusion Traversing Endogenous Indices Traversing Endogenous Indices The top half of Figure represents a quadtree index (vertices 1-9). This quadtree index is partitioning “points of interest” (vertices a-i) located within the diagrammed plane. All vertices maintain three properties—bottom-left (bl), top-right (tr), and type. For a quadtree vertex, these properties identify the two corner points defining a rectangular bounding box (i.e. the region that the quadtree vertex is indexing) and the vertex type which is equal to “quad”. Igor Bogicevic (igor.bogicevic@sbgenomics.com) The Graph Traversal Pattern(Marko A. Rodriguez, Peter Neubauer)
  • 62. Outline Definition of Graph Traversals Introduction Traversing for Recommendation The Realization of Graphs Content-Based Recommendation Graph Traversals Collaborative Filtering-Based Recommendation Conclusion Traversing Endogenous Indices Traversing Endogenous Indices The top half of Figure represents a quadtree index (vertices 1-9). This quadtree index is partitioning “points of interest” (vertices a-i) located within the diagrammed plane. All vertices maintain three properties—bottom-left (bl), top-right (tr), and type. For a quadtree vertex, these properties identify the two corner points defining a rectangular bounding box (i.e. the region that the quadtree vertex is indexing) and the vertex type which is equal to “quad”. For a point of interest vertex, these properties denote the region of space that the point of interest exists within and the vertex type which is equal to “poi.” Igor Bogicevic (igor.bogicevic@sbgenomics.com) The Graph Traversal Pattern(Marko A. Rodriguez, Peter Neubauer)
  • 63. Outline Definition of Graph Traversals Introduction Traversing for Recommendation The Realization of Graphs Content-Based Recommendation Graph Traversals Collaborative Filtering-Based Recommendation Conclusion Traversing Endogenous Indices Traversing Endogenous Indices The top half of Figure represents a quadtree index (vertices 1-9). This quadtree index is partitioning “points of interest” (vertices a-i) located within the diagrammed plane. All vertices maintain three properties—bottom-left (bl), top-right (tr), and type. For a quadtree vertex, these properties identify the two corner points defining a rectangular bounding box (i.e. the region that the quadtree vertex is indexing) and the vertex type which is equal to “quad”. For a point of interest vertex, these properties denote the region of space that the point of interest exists within and the vertex type which is equal to “poi.” Quadtree vertex 1 denotes the entire region of space being indexed. This region is defined by its bottom-left (bl) and top-right (tr) corner points—namely [0, 0] and [100, 100], where blx = 0, bly = 0, trx = 100, and try = 100. Igor Bogicevic (igor.bogicevic@sbgenomics.com) The Graph Traversal Pattern(Marko A. Rodriguez, Peter Neubauer)
  • 64. Outline Definition of Graph Traversals Introduction Traversing for Recommendation The Realization of Graphs Content-Based Recommendation Graph Traversals Collaborative Filtering-Based Recommendation Conclusion Traversing Endogenous Indices Traversing Endogenous Indices The top half of Figure represents a quadtree index (vertices 1-9). This quadtree index is partitioning “points of interest” (vertices a-i) located within the diagrammed plane. All vertices maintain three properties—bottom-left (bl), top-right (tr), and type. For a quadtree vertex, these properties identify the two corner points defining a rectangular bounding box (i.e. the region that the quadtree vertex is indexing) and the vertex type which is equal to “quad”. For a point of interest vertex, these properties denote the region of space that the point of interest exists within and the vertex type which is equal to “poi.” Quadtree vertex 1 denotes the entire region of space being indexed. This region is defined by its bottom-left (bl) and top-right (tr) corner points—namely [0, 0] and [100, 100], where blx = 0, bly = 0, trx = 100, and try = 100. Within the region defined by vertex 1, there are 8 other defined regions that partition that space into smaller spaces (vertices 2-9). Igor Bogicevic (igor.bogicevic@sbgenomics.com) The Graph Traversal Pattern(Marko A. Rodriguez, Peter Neubauer)
  • 65. Outline Definition of Graph Traversals Introduction Traversing for Recommendation The Realization of Graphs Content-Based Recommendation Graph Traversals Collaborative Filtering-Based Recommendation Conclusion Traversing Endogenous Indices Traversing Endogenous Indices The top half of Figure represents a quadtree index (vertices 1-9). This quadtree index is partitioning “points of interest” (vertices a-i) located within the diagrammed plane. All vertices maintain three properties—bottom-left (bl), top-right (tr), and type. For a quadtree vertex, these properties identify the two corner points defining a rectangular bounding box (i.e. the region that the quadtree vertex is indexing) and the vertex type which is equal to “quad”. For a point of interest vertex, these properties denote the region of space that the point of interest exists within and the vertex type which is equal to “poi.” Quadtree vertex 1 denotes the entire region of space being indexed. This region is defined by its bottom-left (bl) and top-right (tr) corner points—namely [0, 0] and [100, 100], where blx = 0, bly = 0, trx = 100, and try = 100. Within the region defined by vertex 1, there are 8 other defined regions that partition that space into smaller spaces (vertices 2-9). When one vertex subsumes another vertex by a directed edge labeled sub (i.e. subsumes), the outgoing (i.e. tail) vertex is subsuming the space that is defined by the incoming (i.e. head) vertex. Igor Bogicevic (igor.bogicevic@sbgenomics.com) The Graph Traversal Pattern(Marko A. Rodriguez, Peter Neubauer)
  • 66. Outline Definition of Graph Traversals Introduction Traversing for Recommendation The Realization of Graphs Content-Based Recommendation Graph Traversals Collaborative Filtering-Based Recommendation Conclusion Traversing Endogenous Indices Traversing Endogenous Indices The top half of Figure represents a quadtree index (vertices 1-9). This quadtree index is partitioning “points of interest” (vertices a-i) located within the diagrammed plane. All vertices maintain three properties—bottom-left (bl), top-right (tr), and type. For a quadtree vertex, these properties identify the two corner points defining a rectangular bounding box (i.e. the region that the quadtree vertex is indexing) and the vertex type which is equal to “quad”. For a point of interest vertex, these properties denote the region of space that the point of interest exists within and the vertex type which is equal to “poi.” Quadtree vertex 1 denotes the entire region of space being indexed. This region is defined by its bottom-left (bl) and top-right (tr) corner points—namely [0, 0] and [100, 100], where blx = 0, bly = 0, trx = 100, and try = 100. Within the region defined by vertex 1, there are 8 other defined regions that partition that space into smaller spaces (vertices 2-9). When one vertex subsumes another vertex by a directed edge labeled sub (i.e. subsumes), the outgoing (i.e. tail) vertex is subsuming the space that is defined by the incoming (i.e. head) vertex. Given these properties and edges, identifying point of interest vertices within a region of space is simply a matter of traversing the quadtree index in a directed/algorithmic fashion. Igor Bogicevic (igor.bogicevic@sbgenomics.com) The Graph Traversal Pattern(Marko A. Rodriguez, Peter Neubauer)
  • 67. Outline Definition of Graph Traversals Introduction Traversing for Recommendation The Realization of Graphs Content-Based Recommendation Graph Traversals Collaborative Filtering-Based Recommendation Conclusion Traversing Endogenous Indices Traversing Endogenous Indices In Figure the shaded region represents the spatial query: “Which points of interest are within the rectangular region defined by the corner points bl = [25, 20] and tr = [90, 45]?” Igor Bogicevic (igor.bogicevic@sbgenomics.com) The Graph Traversal Pattern(Marko A. Rodriguez, Peter Neubauer)
  • 68. Outline Definition of Graph Traversals Introduction Traversing for Recommendation The Realization of Graphs Content-Based Recommendation Graph Traversals Collaborative Filtering-Based Recommendation Conclusion Traversing Endogenous Indices Traversing Endogenous Indices In Figure the shaded region represents the spatial query: “Which points of interest are within the rectangular region defined by the corner points bl = [25, 20] and tr = [90, 45]?” In order to locate all the points of interest in this region, iteratively execute the following traversal starting from the root of the quadtree index (i.e. vertex 1). ˆ ˆ The function is defined as f : P(V ) → P(V ), where try ≥20 bly ≤45 “ ” f (i) = p+ ◦ trx ≥25 ◦ p+ p+ ◦ blx ≤90 ◦ vin ◦ elab+ ◦ eout (i). p+ sub Igor Bogicevic (igor.bogicevic@sbgenomics.com) The Graph Traversal Pattern(Marko A. Rodriguez, Peter Neubauer)
  • 69. Outline Definition of Graph Traversals Introduction Traversing for Recommendation The Realization of Graphs Content-Based Recommendation Graph Traversals Collaborative Filtering-Based Recommendation Conclusion Traversing Endogenous Indices Traversing Endogenous Indices In Figure the shaded region represents the spatial query: “Which points of interest are within the rectangular region defined by the corner points bl = [25, 20] and tr = [90, 45]?” In order to locate all the points of interest in this region, iteratively execute the following traversal starting from the root of the quadtree index (i.e. vertex 1). ˆ ˆ The function is defined as f : P(V ) → P(V ), where try ≥20 bly ≤45 “ ” f (i) = p+ ◦ trx ≥25 ◦ p+ p+ ◦ blx ≤90 ◦ vin ◦ elab+ ◦ eout (i). p+ sub The defining aspect of f is the set of 4 p+ filters that determine whether the current vertex is overlapping or within the query rectangle. Those vertices not overlapping or within the query rectangle are not traversed to. Thus, as the traversal iterates, fewer and fewer paths are examined and the resulting point of interest vertices within the query rectangle are converged upon. Igor Bogicevic (igor.bogicevic@sbgenomics.com) The Graph Traversal Pattern(Marko A. Rodriguez, Peter Neubauer)