Graphs, Edges &
                               Nodes
                             Untangling the social web.




Wednesday, March 9, 2011
What’s a graph?


Graph

[Figure: an example graph with numeric labels, built up over several slides.]

Simple




                       At most one edge between any pair of nodes.



Multigraph




                           Multiple edges between vertices allowed.



Pseudograph




                           Self-loops (and multiple edges) are permitted.



G = (V, E)
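
A minimal sketch of G = (V, E) in Python, assuming an undirected graph stored as a vertex set plus a list of vertex pairs. The function name `classify` and the sample edges are illustrative and not from the talk; the sketch only checks the three definitions given on the preceding slides.

```python
from collections import Counter

def classify(vertices, edges):
    """Classify an undirected graph G = (V, E) as simple, multigraph or pseudograph."""
    # Every edge must join vertices that are actually in V.
    assert all(u in vertices and v in vertices for u, v in edges)
    has_loop = any(u == v for u, v in edges)
    # Undirected: (u, v) and (v, u) are the same edge, so count unordered pairs.
    counts = Counter(frozenset((u, v)) for u, v in edges)
    has_parallel = any(n > 1 for n in counts.values())
    if has_loop:
        return "pseudograph"
    if has_parallel:
        return "multigraph"
    return "simple"

V = {1, 2, 3, 4}
print(classify(V, [(1, 2), (1, 3), (1, 4), (3, 4)]))  # simple
print(classify(V, [(1, 2), (2, 1), (3, 4)]))          # multigraph
print(classify(V, [(1, 1), (1, 2)]))                  # pseudograph
```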




What’s a node?
                                  vertex
                                   point
                                 junction
                                0-simplex



What’s an edge?
                                   arc
                                 branch
                                   line
                                   link
                                1-simplex


Directed

Undirected
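
One way to see the difference between the two edge types, sketched in Python. The helper `neighbors` and the sample names are assumptions made for illustration: a directed edge (u, v) can only be walked from u to v, while an undirected edge can be walked either way.

```python
def neighbors(edges, node, directed):
    """Return the set of nodes reachable from `node` over a single edge."""
    out = set()
    for u, v in edges:
        if u == node:
            out.add(v)
        # In an undirected graph an edge can be walked in either direction.
        if not directed and v == node:
            out.add(u)
    return out

edges = [("alice", "bob"), ("carol", "alice")]
print(neighbors(edges, "alice", directed=True))           # {'bob'}
print(sorted(neighbors(edges, "alice", directed=False)))  # ['bob', 'carol']
```

A "follows" relationship is naturally directed; a "friends" relationship is naturally undirected.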

Data Structures


[Figure: a finite simple graph on four vertices, labeled 1, 2, 3 and 4.]

                      vertices
                      1   2   3   4
                  1   0   1   1   1
       vertices   2   1   0   0   0
                  3   1   0   0   1
                  4   1   0   1   0

                      Adjacency Matrix
                         (2d array)
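
A short Python sketch of how such a matrix can be built. The edge list below is read off the matrix on the slide (1-2, 1-3, 1-4, 3-4); the function name `adjacency_matrix` is an assumption for illustration.

```python
def adjacency_matrix(n, edges):
    """Build an n x n adjacency matrix for an undirected graph on vertices 1..n."""
    matrix = [[0] * n for _ in range(n)]
    for u, v in edges:
        # Vertices are labeled from 1, list indices start at 0.
        matrix[u - 1][v - 1] = 1
        matrix[v - 1][u - 1] = 1  # undirected, so the matrix is symmetric
    return matrix

EDGES = [(1, 2), (1, 3), (1, 4), (3, 4)]
for row in adjacency_matrix(4, EDGES):
    print(row)
```

Checking whether two vertices are adjacent is a single lookup, but the matrix always takes n * n cells no matter how few edges exist.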


[1, 2, 3, 4]
    1 -> 2 -> 3 -> 4
    2 -> 1
    3 -> 1 -> 4
    4 -> 1 -> 3

         Array entries (vertices) point to singly linked lists.
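
The same four-vertex graph as an adjacency list, sketched in Python. This uses a dict of Python lists in place of the array of singly linked lists on the slide, which is an assumption made for brevity; the function name is illustrative.

```python
def adjacency_list(n, edges):
    """Map each vertex 1..n to the list of its neighbors (undirected graph)."""
    adj = {v: [] for v in range(1, n + 1)}
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)  # undirected: record the edge on both endpoints
    return adj

EDGES = [(1, 2), (1, 3), (1, 4), (3, 4)]
for vertex, nbrs in adjacency_list(4, EDGES).items():
    print(vertex, "->", nbrs)
```

Storage grows with the number of vertices plus edges, which tends to suit sparse graphs such as social networks.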




Visualizations


You are here.




(Graph does not include Justin Bieber)



Social Graphs


User-based item recommendations




[Figure: people (me, friends) linked to items, built up over several slides.]

                           Recommend items to me that are popular amongst my friends.

2-step path on homogeneous bipartite
                                           graph.
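
A minimal sketch of that 2-step walk (me -> friend -> item) in Python. The function `recommend`, the `friends` and `likes` mappings and all sample names are assumptions for illustration; popularity is taken here to mean the number of friends linked to an item.

```python
from collections import Counter

def recommend(me, friends, likes, top_n=3):
    """Rank items reachable by the 2-step path me -> friend -> item."""
    mine = likes.get(me, set())
    counts = Counter()
    for friend in friends.get(me, set()):
        for item in likes.get(friend, set()):
            if item not in mine:  # skip what I already have
                counts[item] += 1
    return [item for item, _ in counts.most_common(top_n)]

friends = {"me": {"ann", "bob", "cat"}}
likes = {
    "me": {"book"},
    "ann": {"book", "film", "game"},
    "bob": {"film", "game"},
    "cat": {"film"},
}
print(recommend("me", friends, likes))  # ['film', 'game']
```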




Strong Connection Problem (SCP)




There are many of these ‘fundamental’
                  graph units:

                      -    tripartite graphs (user/asset/tag)
                      -    folksonomies
                      -    multicolor-multiparity graph
                      -    etc.




Graph Storage
                              Engines


Neo4j
    “An embedded, disk-based, fully transactional Java persistence engine that
            stores data structured in graphs rather than in tables.”


                             http://neo4j.org


HyperGraphDB
“A general purpose, extensible, portable, distributed, embeddable, open-source
    data storage mechanism. It is a graph database designed specifically for
               artificial intelligence and semantic web projects.”


                            http://kobrix.org/hgdb.jsp


Special Purpose
                           Storage Engines


FlockDB
                  “FlockDB is a database that stores graph data, but it isn't a database
                optimized for graph-traversal operations. Instead, it's optimized for very
                large adjacency lists, fast reads and writes, and page-able set arithmetic
                                                 queries.”



  http://engineering.twitter.com/2010/05/introducing-flockdb.html


Redis
       “Redis is an advanced key-value store. [...] the dataset is not volatile, and values
       can be strings, exactly like in memcached, but also lists, sets, and ordered sets. All
       this data types can be manipulated with atomic operations to push/pop elements,
       add/remove elements, perform server side union, intersection, difference between
       sets, etc.”



                           http://code.google.com/p/redis

A Redis Friends/
                 Followers Example


Redis makes you think in terms of data structures,
                and operations on those structures.




Set:
         Finite (for our cases) collection of objects in which
         order has no significance and multiplicity is generally
         ignored.
                           S = { Alice, Bob, Carol }

        List:
          Finite (for our cases) collection of objects in which
          order *is* significant and multiplicity is allowed.

                             L = [ X, Y, X, Z, Q]
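
The two definitions map directly onto Python's built-in `set` and `list`, shown here as a small sketch. The variable names are illustrative; the values are the ones from the slide, with one deliberate duplicate added to the set input.

```python
# A set ignores order and multiplicity.
followers = set()
for name in ["Alice", "Bob", "Carol", "Alice"]:
    followers.add(name)  # adding "Alice" a second time has no effect

# A list keeps both order and duplicates.
timeline = []
for post in ["X", "Y", "X", "Z", "Q"]:
    timeline.append(post)

print(len(followers))       # 3
print(timeline)             # ['X', 'Y', 'X', 'Z', 'Q']
print(timeline.count("X"))  # 2
```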


Store a username under a key (SET writes a plain string, not a set)


                            SET uid:1000:username jperras


                           Command    Key          Value




Use sets for denoting my followers/people
                                  I follow.




Adding a new follower

                            SADD uid:1000:following 1001
                            SADD uid:1001:followers 1000


                           Command     Key         Value
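
The two SADD calls above can be modeled without a server, as in the Python sketch below. The in-memory `store` dict stands in for Redis and the helpers `sadd` and `follow` are assumptions for illustration, not a Redis client API; only the key layout is taken from the slide.

```python
from collections import defaultdict

store = defaultdict(set)  # key -> set of members, standing in for Redis

def sadd(key, member):
    """Mimic SADD: return 1 if the member was newly added, else 0."""
    if member in store[key]:
        return 0
    store[key].add(member)
    return 1

def follow(follower_id, followee_id):
    # Two writes per edge, so each direction can be read with one lookup.
    sadd(f"uid:{follower_id}:following", followee_id)
    sadd(f"uid:{followee_id}:followers", follower_id)

follow(1000, 1001)
print(store["uid:1000:following"])  # {1001}
print(store["uid:1001:followers"])  # {1000}
```

Keeping both sets means a directed edge is stored twice, trading write cost for cheap reads in either direction.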




Posting Updates

                      // Assumes the phpredis extension and a Redis server on localhost.
                      $r = new Redis();
                      $r->connect('127.0.0.1', 6379);
                      $postid = $r->incr("global:nextPostId");
                      $post = $User['id'] ."|". time() ."|". $status;
                      $r->set("post:$postid", $post);
                      $followers = $r->sMembers("uid:".$User['id'].":followers");

                      if ($followers === false) $followers = array();
                      $followers[] = $User['id']; /* Add the post to our own posts too */

                      /* Fan out: prepend the post id to each follower's list */
                      foreach ($followers as $fid) {
                          $r->lPush("uid:$fid:posts", $postid);
                      }
                      # Push the post on the timeline, and trim the timeline to the
                      # newest 1000 elements (indices 0 through 999).
                      $r->lPush("global:timeline", $postid);
                      $r->lTrim("global:timeline", 0, 999);




Common followers? - Set intersections!



                           SINTER uid:1000:followers uid:1001:followers


                           Command       Key 1                Key 2
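
A sketch of what the intersection computes, in plain Python. It assumes follower sets keyed as `uid:<id>:followers`, as in the SADD example; the `sinter` helper and the sample ids are illustrative and not a Redis client API.

```python
followers = {
    "uid:1000:followers": {1001, 1002, 1003},
    "uid:1001:followers": {1000, 1002, 1003, 1004},
}

def sinter(store, *keys):
    """Mimic SINTER: intersect the sets stored at the given keys."""
    # A missing key is treated as an empty set, so the result is empty too.
    sets = [store.get(key, set()) for key in keys]
    return set.intersection(*sets) if sets else set()

common = sinter(followers, "uid:1000:followers", "uid:1001:followers")
print(sorted(common))  # [1002, 1003]
```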




A MySQL Example
                           (simplified)




# Mutual Friends
                           select f.friend_id
                               from friends f
                               join friends m
                                   on m.user_id = f.friend_id
                                   and m.friend_id = f.user_id
                               where f.user_id = 1234;

                           # Following (for directed graphs): users 1234 follows
                           # who do not follow back
                           select f.friend_id
                               from friends f
                               left join friends m
                                   on m.user_id = f.friend_id
                                   and m.friend_id = f.user_id
                               where f.user_id = 1234
                                   and m.user_id is null;

                           # Followers (for directed graphs): users who follow 1234
                           # without being followed back. Select from f, because
                           # every column of m is NULL on the rows that remain.
                           select f.user_id
                               from friends f
                               left join friends m
                                   on m.user_id = f.friend_id
                                   and m.friend_id = f.user_id
                               where f.friend_id = 1234
                                   and m.user_id is null;
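
To try these queries without a MySQL server, here is a sketch using Python's built-in `sqlite3` module. The two-column `friends` schema and the sample rows are assumptions, since the slide does not show the table definition. The followers query here selects `f.user_id`, because the columns of `m` are NULL on the rows the filter keeps.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE friends (user_id INTEGER, friend_id INTEGER)")
# A row (u, f) means "u follows f"; a mutual friendship is a pair of rows.
conn.executemany(
    "INSERT INTO friends VALUES (?, ?)",
    [(1234, 1), (1, 1234),  # 1234 and 1 follow each other
     (1234, 2),             # 1234 follows 2, not followed back
     (3, 1234)],            # 3 follows 1234, not followed back
)

MUTUAL = """
    SELECT f.friend_id FROM friends f
    JOIN friends m ON m.user_id = f.friend_id AND m.friend_id = f.user_id
    WHERE f.user_id = ?"""

FOLLOWING_ONLY = """
    SELECT f.friend_id FROM friends f
    LEFT JOIN friends m ON m.user_id = f.friend_id AND m.friend_id = f.user_id
    WHERE f.user_id = ? AND m.user_id IS NULL"""

FOLLOWERS_ONLY = """
    SELECT f.user_id FROM friends f
    LEFT JOIN friends m ON m.user_id = f.friend_id AND m.friend_id = f.user_id
    WHERE f.friend_id = ? AND m.user_id IS NULL"""

def ids(query, user_id):
    """Run a query for one user and return the matching ids, sorted."""
    return sorted(row[0] for row in conn.execute(query, (user_id,)))

print(ids(MUTUAL, 1234))          # [1]
print(ids(FOLLOWING_ONLY, 1234))  # [2]
print(ids(FOLLOWERS_ONLY, 1234))  # [3]
```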




        Not too bad.




Relational databases can work for the simplest
          cases, but are often not the best fit for
                 graph operations and algorithms.




Graphs and graph databases are only
                            going to become more useful.




However, graph algorithms are hard.

                                So don’t write your own.

        And make sure you use a persistent storage engine
            that is best suited for the type of queries
                      you will be performing.




Resources
                           The Algorithm Design Manual,
                           Steven S. Skiena
                           Programming Collective
                           Intelligence, Toby Segaran
                           Introduction to Algorithms,
                           Cormen, Leiserson, Rivest


@jperras

Photo Credits


                           Graph of the internet, circa 2003:
                           http://www.duniacyber.com/freebies/education/what-is-internet-lookslike/
                           (built from partial troll of public servers using traceroute)

                           My real friends for letting me use their Facebook profile images.




References

                           Large Scale Graph Algorithms (class lectures), Yuri Lifshits, Steklov Institute of
                           Mathematics at St. Petersburg

                           http://mathworld.wolfram.com/Set.html

                           Programming Collective Intelligence, Toby Segaran

                           The Algorithm Design Manual, Steven S. Skiena



