Graphs, Edges & Nodes - Untangling the Social Web


Many of the most popular web applications today deal with highly organized and structured data that represent entities, and the relationships between these entities. LinkedIn can tell you how many degrees of separation there are between yourself and the CEO of Samsung, Facebook can figure out people that you might already know, Digg can recommend article submissions that you might like, and LastFM suggests music based on your current listening habits.

We’ll take a look at the basic theory behind how some of these features can be implemented (no computer science degree required!), and then dig in to a few practical implementations using PHP and a relational database, as well as with Redis. Lastly, we’ll take a quick look at the current landscape of graph-based datastores that simplify many of these operations.

Published in: Technology


  1. Graphs, Edges & Nodes: Untangling the social web.
  2. What’s a graph?
  3. Graph
  4. Graph
  5. Graph
  6. Graph (figure with numeric labels)
  7. Graph (larger figure with numeric labels)
  8. Simple: At most one edge between any pair of nodes.
  9. Multigraph: Multiple edges between vertices allowed.
  10. Pseudograph: Self-loops are permitted.
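The three definitions above can be checked mechanically. A minimal sketch, assuming an undirected graph given as a list of two-element edge arrays; `classifyGraph` is a hypothetical helper, not part of any library:

```php
<?php
// Hypothetical helper: classify an undirected edge list using the slide
// definitions. A self-loop makes it a pseudograph; otherwise a repeated
// vertex pair makes it a multigraph; otherwise it is simple.
function classifyGraph(array $edges): string
{
    $seen = [];
    $hasParallel = false;
    foreach ($edges as [$u, $v]) {
        if ($u === $v) {
            return 'pseudograph'; // self-loop found
        }
        // Undirected: normalise the pair so (a, b) and (b, a) share one key.
        $key = $u < $v ? "$u|$v" : "$v|$u";
        if (isset($seen[$key])) {
            $hasParallel = true; // second edge between the same two nodes
        }
        $seen[$key] = true;
    }
    return $hasParallel ? 'multigraph' : 'simple';
}

echo classifyGraph([['a', 'b'], ['b', 'c']]), "\n"; // simple
echo classifyGraph([['a', 'b'], ['b', 'a']]), "\n"; // multigraph
echo classifyGraph([['a', 'b'], ['c', 'c']]), "\n"; // pseudograph
```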
  11. G = (V, E): a graph G consists of a set of vertices V and a set of edges E.
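One common way to hold G = (V, E) in memory is an adjacency list that maps each vertex to its neighbours. A sketch with hypothetical sample vertices, assuming undirected edges; `adjacencyList` is a hypothetical helper:

```php
<?php
// Hypothetical sample data: V is the vertex set, E the undirected edge set.
$V = ['alice', 'bob', 'carol', 'dave'];
$E = [['alice', 'bob'], ['bob', 'carol'], ['alice', 'carol']];

// Hypothetical helper: build vertex => list of neighbouring vertices.
function adjacencyList(array $V, array $E): array
{
    $adj = array_fill_keys($V, []); // every vertex appears, even if isolated
    foreach ($E as [$u, $v]) {
        $adj[$u][] = $v; // undirected: record the edge in both directions
        $adj[$v][] = $u;
    }
    return $adj;
}

$adj = adjacencyList($V, $E);
echo implode(', ', $adj['alice']), "\n"; // bob, carol
echo count($adj['dave']), "\n";          // 0 (dave is an isolated vertex)
```

An adjacency list only stores edges that exist, which suits sparse social graphs where each person is linked to a tiny fraction of all users.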
  12. What’s a node? Also known as a vertex, point, junction or 0-simplex.
  13. What’s an edge? Also known as an arc, branch, line, link or 1-simplex.
  14. Directed
  15. Undirected
  16. Undirected
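Whether an edge is directed decides if it is recorded on one side or on both, and that changes what a traversal can reach. The degrees-of-separation feature mentioned in the description above can be approximated by a breadth-first search. A sketch assuming an in-memory adjacency list; `addEdge`, `degreesOfSeparation` and the sample names are hypothetical:

```php
<?php
// Hypothetical helper. Directed ("a follows b"): stored on a's side only.
// Undirected ("a is friends with b"): stored on both sides.
function addEdge(array &$adj, string $u, string $v, bool $directed): void
{
    $adj[$u][] = $v;
    if (!$directed) {
        $adj[$v][] = $u;
    }
}

// Hypothetical helper: breadth-first search returning the number of hops
// from $start to $goal, or -1 if $goal cannot be reached.
function degreesOfSeparation(array $adj, string $start, string $goal): int
{
    $dist = [$start => 0];
    $queue = new SplQueue();
    $queue->enqueue($start);
    while (!$queue->isEmpty()) {
        $node = $queue->dequeue();
        if ($node === $goal) {
            return $dist[$node];
        }
        foreach ($adj[$node] ?? [] as $next) {
            if (!isset($dist[$next])) { // first visit is the shortest path
                $dist[$next] = $dist[$node] + 1;
                $queue->enqueue($next);
            }
        }
    }
    return -1;
}

$friends = [];
addEdge($friends, 'you', 'ann', false);
addEdge($friends, 'ann', 'raj', false);
addEdge($friends, 'raj', 'ceo', false);
echo degreesOfSeparation($friends, 'you', 'ceo'), "\n"; // 3
echo degreesOfSeparation($friends, 'ceo', 'you'), "\n"; // 3 (undirected)

$follows = [];
addEdge($follows, 'you', 'ceo', true);
echo degreesOfSeparation($follows, 'ceo', 'you'), "\n"; // -1 (no path back)
```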
  17. Visualizations
  18. You are here.
  19. (Graph does not include Justin Bieber)
  20. Social Graphs
  21-26. Find the band that is most often co-listened with the given one. (Slides 22-26 build up a diagram of People and Bands.)
  27. Basically, most kinds of simple content/co-occurrence similarity.
  28. That’s a 2-step path on a bipartite graph. There are many of these ‘fundamental’ graph units:
       - tripartite
       - folksonomies (tripartite 3-graph + 2-step path)
       - multicolor-multiparity graph
       - etc.
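The co-listening query walks the 2-step path band, then person, then other band, and counts how often each other band is reached. A minimal in-memory sketch with hypothetical listening data and a hypothetical `mostCoListened` helper; a real service would run the same walk against its own storage:

```php
<?php
// Hypothetical bipartite graph: each person is linked to the bands they play.
$listens = [
    'ann'  => ['Radiohead', 'Muse', 'Blur'],
    'bob'  => ['Radiohead', 'Muse'],
    'cara' => ['Radiohead', 'Coldplay'],
    'dan'  => ['Muse', 'Coldplay'],
];

// Hypothetical helper: count every 2-step path $band -> person -> $other
// and return the band reached most often, or null if there is none.
function mostCoListened(array $listens, string $band): ?string
{
    $counts = [];
    foreach ($listens as $bands) {
        if (!in_array($band, $bands, true)) {
            continue; // step 1: only people who listen to $band
        }
        foreach ($bands as $other) { // step 2: their other bands
            if ($other !== $band) {
                $counts[$other] = ($counts[$other] ?? 0) + 1;
            }
        }
    }
    if ($counts === []) {
        return null;
    }
    arsort($counts); // highest co-listen count first
    return array_key_first($counts);
}

echo mostCoListened($listens, 'Radiohead'), "\n"; // Muse
```

Ties are broken arbitrarily here; a production version would usually also normalise by band popularity so that globally popular bands do not dominate every answer.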
  29. Graph Storage Engines
  30. Neo4j “An embedded, disk-based, fully transactional Java persistence engine that stores data structured in graphs rather than in tables.” http://neo4j.org
  31. HypergraphDB “A general purpose, extensible, portable, distributed, embeddable, open-source data storage mechanism. It is a graph database designed specifically for artificial intelligence and semantic web projects.” http://kobrix.org/hgdb.jsp
  32. Special Purpose Storage Engines
  33. FlockDB “FlockDB is a database that stores graph data, but it isn't a database optimized for graph-traversal operations. Instead, it's optimized for very large adjacency lists, fast reads and writes, and page-able set arithmetic queries.” http://engineering.twitter.com/2010/05/introducing-flockdb.html
  34. Redis “Redis is an advanced key-value store. [...] the dataset is not volatile, and values can be strings, exactly like in memcached, but also lists, sets, and ordered sets. All this data types can be manipulated with atomic operations to push/pop elements, add/remove elements, perform server side union, intersection, difference between sets, etc.” http://code.google.com/p/redis
  35. A Redis Friends/Followers Example
  36. Redis makes you think in terms of data structures and operations on those structures.
  37. Set: Finite (for our cases) collection of objects in which order has no significance and multiplicity is generally ignored.
      S = { Alice, Bob, Carol }
      List: Finite (for our cases) collection of objects in which order *is* significant and multiplicity is allowed.
      L = [ X, Y, X, Z, Q ]
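PHP has no built-in set type, but using the members as array keys gives set-like behaviour (a repeated member collapses), while a plain indexed array behaves like the list above. A small illustration, assuming the slide's sample members plus one deliberate duplicate:

```php
<?php
// List: a plain indexed PHP array keeps order and duplicates.
$L = ['X', 'Y', 'X', 'Z', 'Q'];

// Set: members are stored as array keys; assigning an existing key again
// leaves the set unchanged, so the duplicate 'Bob' collapses.
$S = [];
foreach (['Alice', 'Bob', 'Carol', 'Bob'] as $member) {
    $S[$member] = true;
}

echo count($L), "\n";        // 5
echo count($S), "\n";        // 3
var_dump(isset($S['Bob']));  // membership test: bool(true)
```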
  38. Store a user's fields as plain string keys (SET writes a string value, not a set member):
      SET uid:1000:username jperras
      SET uid:1000:password bazinga!
  39. Use sets for denoting my followers/people I follow.
      uid:1000:followers => Set of uids of all the users following me
      uid:1000:following => Set of uids of all the users I follow
  40. Adding a new follower (user 1000 follows user 1001):
      SADD uid:1000:following 1001
      SADD uid:1001:followers 1000
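Following takes two writes because each relationship is stored on both sides, so that both "who do I follow" and "who follows me" are single set reads. The sketch below models the two Redis sets per user as PHP arrays keyed by uid; it is an in-memory stand-in, not a Redis client, and `follow` is a hypothetical helper:

```php
<?php
// In-memory stand-ins for uid:N:following and uid:N:followers.
// Each inner array is used as a set via its keys.
$following = [];
$followers = [];

// Hypothetical helper mirroring the two SADD calls: $uid follows $target.
function follow(array &$following, array &$followers, int $uid, int $target): void
{
    $following[$uid][$target] = true; // SADD uid:$uid:following $target
    $followers[$target][$uid] = true; // SADD uid:$target:followers $uid
}

follow($following, $followers, 1000, 1001);
follow($following, $followers, 1002, 1001);
follow($following, $followers, 1000, 1001); // repeated add: no duplicate

echo count($followers[1001]), "\n";                    // 2
echo implode(',', array_keys($following[1000])), "\n"; // 1001
```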
  41. Posting Updates
      // Assumes the phpredis extension; $User and $status come from the app.
      $r = new Redis();
      $r->connect('127.0.0.1', 6379);

      // Allocate a new post id and store the post as "uid|timestamp|status".
      $postid = $r->incr("global:nextPostId");
      $post = $User['id'] . "|" . time() . "|" . $status;
      $r->set("post:$postid", $post);

      // Fan the post id out to every follower's list of posts.
      $followers = $r->sMembers("uid:" . $User['id'] . ":followers");
      if ($followers === false) $followers = [];
      $followers[] = $User['id']; /* Add the post to our own posts too */
      foreach ($followers as $fid) {
          $r->lPush("uid:$fid:posts", $postid);
      }

      # Push the post on the timeline, and trim the timeline to the
      # newest 1000 elements (LTRIM's stop index is inclusive).
      $r->lPush("global:timeline", $postid);
      $r->lTrim("global:timeline", 0, 999);
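Posting this way is fan-out on write: each post id is copied into every follower's list when it is posted, which keeps reading a timeline cheap. Because Redis's LTRIM stop index is inclusive, keeping the newest N entries means trimming to 0..N-1. A sketch of the same push-then-trim behaviour on plain PHP arrays, assuming a hypothetical `fanOut` helper and a small cap of 3 for readability:

```php
<?php
// Hypothetical helper: push $postId onto the front of each follower's
// timeline (like LPUSH), then keep only the newest $max entries
// (like LTRIM key 0 $max-1).
function fanOut(array &$timelines, array $followerIds, int $postId, int $max): void
{
    foreach ($followerIds as $fid) {
        $timelines[$fid] = $timelines[$fid] ?? [];
        array_unshift($timelines[$fid], $postId);                  // newest first
        $timelines[$fid] = array_slice($timelines[$fid], 0, $max); // cap length
    }
}

$timelines = [];
for ($postId = 1; $postId <= 5; $postId++) {
    fanOut($timelines, [1000, 1001], $postId, 3);
}
echo implode(',', $timelines[1000]), "\n"; // 5,4,3
```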
  42. Common followers? - Set intersections!
      SINTER uid:1000:followers uid:1001:followers
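SINTER returns the members present in every set it is given. On in-memory PHP arrays the same operation is `array_intersect`; the follower lists below are hypothetical sample data:

```php
<?php
// Hypothetical follower sets for two users.
$followersOf = [
    1000 => [1001, 1002, 1003],
    1001 => [1002, 1003, 1004],
];

// array_intersect keeps values found in both arrays (preserving the first
// array's keys), so array_values re-indexes the result from 0.
$common = array_values(array_intersect($followersOf[1000], $followersOf[1001]));
echo implode(',', $common), "\n"; // 1002,1003
```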
  43. Let’s compare that to MySQL
  44. Can be Painful
  45. Even More Pain
  46. Relational databases can work for the simplest of cases, but fail horribly at nearly all graph-related operations/algorithms.
  47. Graphs and graph-databases are only going to be more and more useful.
  48. However, graph algorithms are hard. So don’t write your own. And make sure you use a persistent storage engine that is best suited for the type of queries you will be performing.
  49. Resources
  50. Resources
      The Algorithm Design Manual, Steve S. Skiena
      Programming Collective Intelligence, Toby Segaran
      Introduction to Algorithms, Cormen, Leiserson, Rivest
  51. @jperras
  52. Photo Credits
      Graph of the internet, circa 2003: http://www.duniacyber.com/freebies/education/what-is-internet-lookslike/ (built from partial troll of public servers using traceroute)
      My real friends for letting me use their Facebook profile images.
  53. References
      Large Scale Graph Algorithms (class lectures), Yuri Lifshits, Steklov Institute of Mathematics at St. Petersburg
      http://mathworld.wolfram.com/Set.html
      Programming Collective Intelligence, Toby Segaran
      The Algorithm Design Manual, Steve S. Skiena
