Graphs, Edges & Nodes - Untangling the Social Web

Many of the most popular web applications today deal with highly organized and structured data that represent entities, and the relationships between these entities. LinkedIn can tell you how many degrees of separation there are between yourself and the CEO of Samsung, Facebook can figure out people that you might already know, Digg can recommend article submissions that you might like, and LastFM suggests music based on your current listening habits.

We’ll take a look at the basic theory behind how some of these features can be implemented (no computer science degree required!), and then dig into a few practical implementations using PHP and a relational database, as well as with Redis. Lastly, we’ll take a quick look at the current landscape of graph-based datastores that simplify many of these operations.

Upload Details

Uploaded via as Apple Keynote

Usage Rights

© All Rights Reserved

  • Start with some definitions.
  • Collection of points - e.g. users (Twitter/Facebook), songs (iTunes).
  • Add relationships between data points.
  • Some relations are not symmetric - e.g. `friend` is symmetric, while following/follower is asymmetric.
  • Your relationships might have a weight - e.g. the number of Scrabulous games two people have played together.
  • Data points can also have weight - e.g. `reputation` score on social news sites like Digg, Reddit.
  • Simple graph - at most one edge between any pair of vertices.
  • Self-loops are allowed, e.g. if your application needs the ability for you to be your own ‘follower’ or ‘friend’.
  • Notation that you might see - G is the ‘name’ of the graph, and is composed of ‘V’ vertices (nodes) and ‘E’ edges.
  • "Vertex" is a synonym for a node of a graph, i.e., one of the points on which the graph is defined and which may be connected by graph edges.
  • An edge is an ordered (or unordered) pair of nodes. Different types of edges: directed and undirected. In geometry, a simplex (plural simplexes or simplices) is a generalization of the notion of a triangle or tetrahedron to arbitrary dimension. Specifically, an n-simplex is an n-dimensional polytope with n + 1 vertices, of which the simplex is the convex hull. For example, a 2-simplex is a triangle, a 3-simplex is a tetrahedron, and a 4-simplex is a pentachoron. A single point may be considered a 0-simplex, and a line segment may be viewed as a 1-simplex. A simplex may be defined as the smallest convex set which contains the given vertices.
  • A directed edge is an ordered pair of nodes. The terms "arc", "branch", "line", "link" and "1-simplex" are sometimes used instead of "edge".
  • Edge highlight on next slide.
  • An unordered pair of nodes joined by a line is said to form an (undirected) edge.
  • Partial map of the internet, culled in 2003 using traceroute.
  • Graph visualizations have also become quite important - displaying information on billions of points and edges in a useful manner is quite difficult. The graph is projected inside a 3D sphere using a special kind of space based on hyperbolic geometry. This is a non-Euclidean space, which has the useful distorting property of making elements at the center of the display much larger than those on the periphery. Hyperbolic space projection is commonly known as “focus+context” in the field of information visualization, and has been used to display all kinds of data that can be represented as large graphs in either two or three dimensions.
  • This is a graph representation of the similarity relationships derived from the Last.fm database. The circles (vertices) in the left-hand figure are bands, musicians, composers, or whatever else you will find in the Music section of the site. Lines (edges) connect similar artists. Vertex sizes vary according to the popularity of the artists. Vertex colors correspond to musical genres, identified by tags attached to the artists by the users of Last.fm.
  • You are already a part of, and use, several social graphs.
  • Twitter is one giant graph (users, followers, following) plus a timeline attached to each user.
  • LinkedIn is another giant graph. It’s basically in their name!
  • Me
  • I’m the center of the world.
  • Relationships with my friends.
  • My friends also have relationships between themselves.
  • Let’s get rid of the pictures for a second.
  • My friends also have friends, and those friends can be friends with my other immediate friends.
  • Important problems: max-intersection + strongest connection problem.
  • From Twitter - solves their problems.
  • Sets are great when the order of your data doesn’t matter, and when you know that the objects need to be unique. Example: USERS. Lists are best for things that need to be displayed in a given order, e.g. POST TIMELINE.

Graphs, Edges & Nodes - Untangling the Social Web: Presentation Transcript

  • Graphs, Edges & Nodes Untangling the social web.
  • What’s a graph?
  • Graph (a series of diagrams: a collection of points, then edges between them, then directed edges, then numeric weights on the edges, then numeric weights on the nodes as well)
  • Simple: At most one edge between any pair of nodes.
  • Multigraph: Multiple edges between vertices allowed.
  • Pseudograph: Self-loops are permitted.
  • G = (V, E)
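The definitions above can be sketched in a few lines. This is only an illustration, written in Python so that it runs without any datastore; the function name `classify` and the sample vertices and edges are assumptions, not part of the talk. It holds G = (V, E) as a vertex set and an edge list, and reports which of the three graph types the edge list describes.

```python
from collections import Counter

def classify(vertices, edges):
    """Classify an undirected graph G = (V, E) as simple, multigraph or pseudograph."""
    for u, v in edges:
        if u not in vertices or v not in vertices:
            raise ValueError(f"edge ({u}, {v}) uses a vertex that is not in V")
    # A self-loop joins a vertex to itself; the slides call such a graph a pseudograph.
    if any(u == v for u, v in edges):
        return "pseudograph"
    # Edges are undirected here, so (a, b) and (b, a) count as the same edge.
    counts = Counter(frozenset(edge) for edge in edges)
    if any(n > 1 for n in counts.values()):
        return "multigraph"
    return "simple"

V = {"alice", "bob", "carol"}
E = [("alice", "bob"), ("bob", "carol")]
print(classify(V, E))
```

Adding `("bob", "alice")` to `E` would turn the same graph into a multigraph, and adding `("carol", "carol")` would make it a pseudograph.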
  • What’s a node? Also called: vertex, point, junction, 0-simplex.
  • What’s an edge? Also called: arc, branch, line, link, 1-simplex.
  • Directed
  • Undirected
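To make the directed/undirected distinction concrete, here is a small sketch under the assumption that the graph is stored as an adjacency map (node to set of neighbours). The helper `build_adjacency` and the sample names are hypothetical. A directed edge is recorded in one direction only (as with "follows"), while an undirected edge is recorded in both (as with "friends").

```python
def build_adjacency(edges, directed):
    """Return {node: set of neighbours} for a list of (source, target) pairs."""
    adjacency = {}
    for source, target in edges:
        adjacency.setdefault(source, set()).add(target)
        # Make sure the target exists as a node even if it has no outgoing edges.
        adjacency.setdefault(target, set())
        if not directed:
            # An undirected edge is reachable from both endpoints.
            adjacency[target].add(source)
    return adjacency

edges = [("alice", "bob"), ("bob", "carol")]
follows = build_adjacency(edges, directed=True)    # ordered pairs
friends = build_adjacency(edges, directed=False)   # unordered pairs

print(sorted(follows["bob"]))
print(sorted(friends["bob"]))
```

With the same edge list, `follows["bob"]` contains only carol, while `friends["bob"]` contains both alice and carol.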
  • Visualizations
  • You are here.
  • (Graph does not include Justin Bieber)
  • Social Graphs
  • People / Bands (diagram built up over several slides): Find the band that is most often co-listened with the given one.
  • Basically, most kinds of simple content/co-occurrence similarity.
  • That’s a 2-step path on a bipartite graph. There are many of these ‘fundamental’ graph units:
      - tripartite
      - folksonomies (tripartite 3-graph + 2-step path)
      - multicolor-multiparity graph
      - etc.
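The co-listening question can be sketched as a 2-step walk over a people/bands bipartite graph: step from the given band to the people who listen to it, then from those people to their other bands, and count. The function name `most_co_listened` and the listening data below are made up for illustration.

```python
from collections import Counter

def most_co_listened(listens, band):
    """listens maps person -> set of bands.

    Return the band most often listened to by people who also listen
    to `band`, or None if nobody listens to it alongside anything else.
    """
    counts = Counter()
    for bands in listens.values():
        if band in bands:                  # step 1: band -> its listeners
            counts.update(bands - {band})  # step 2: listeners -> their other bands
    if not counts:
        return None
    return counts.most_common(1)[0][0]

listens = {
    "alice": {"Radiohead", "Portishead", "Massive Attack"},
    "bob": {"Radiohead", "Portishead"},
    "carol": {"Radiohead", "Muse"},
    "dave": {"Muse", "Massive Attack"},
}
print(most_co_listened(listens, "Radiohead"))  # Portishead
```

This counts raw co-occurrences only; it does not normalize for how popular each band is overall, which a real recommender would likely need.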
  • Graph Storage Engines
  • Neo4j “An embedded, disk-based, fully transactional Java persistence engine that stores data structured in graphs rather than in tables.” http://neo4j.org
  • HypergraphDB “A general purpose, extensible, portable, distributed, embeddable, open-source data storage mechanism. It is a graph database designed specifically for artificial intelligence and semantic web projects.” http://kobrix.org/hgdb.jsp
  • Special Purpose Storage Engines
  • FlockDB “FlockDB is a database that stores graph data, but it isn't a database optimized for graph-traversal operations. Instead, it's optimized for very large adjacency lists, fast reads and writes, and page-able set arithmetic queries.” http://engineering.twitter.com/2010/05/introducing-flockdb.html
  • Redis “Redis is an advanced key-value store. [...] the dataset is not volatile, and values can be strings, exactly like in memcached, but also lists, sets, and ordered sets. All this data types can be manipulated with atomic operations to push/pop elements, add/remove elements, perform server side union, intersection, difference between sets, etc.” http://code.google.com/p/redis
  • A Redis Friends/ Followers Example
  • Redis makes you think in terms of datastructures, and operations on those structures.
  • Set: Finite (for our cases) collection of objects in which order has no significance and multiplicity is generally ignored.
    S = { Alice, Bob, Carol }
    List: Finite (for our cases) collection of objects in which order *is* significant and multiplicity is allowed.
    L = [ X, Y, X, Z, Q ]
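As a quick illustration of the two definitions, Python's built-in `set` and `list` behave the same way; they stand in for the Redis structures here. The uids and post ids are invented.

```python
# A set ignores duplicates and has no meaningful order: good for followers.
followers = set()
for uid in [1001, 1002, 1001]:
    followers.add(uid)           # adding 1001 a second time changes nothing

# A list keeps order and duplicates: good for a timeline, newest first.
timeline = []
for post_id in [7, 8, 8]:
    timeline.insert(0, post_id)  # push onto the head of the list

print(len(followers), timeline)  # 2 [8, 8, 7]
```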
  • Store a user's fields (SET here is the Redis string command, not a set operation)
    SET uid:1000:username jperras
    SET uid:1000:password bazinga!
  • Use sets for denoting my followers/people I follow.
    uid:1000:followers => Set of uids of all the users who follow user 1000
    uid:1000:following => Set of uids of all the users that user 1000 follows
  • Adding a new follower (user 1000 starts following user 1001)
    SADD uid:1000:following 1001
    SADD uid:1001:followers 1000
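A follow relationship touches two sets, one on each side, and the two must stay in sync. The sketch below mimics that with an in-memory dictionary of sets in place of Redis; the `store` dictionary and the `follow` helper are assumptions for illustration, with key names following the `uid:<id>:followers` / `uid:<id>:following` pattern from the slides.

```python
from collections import defaultdict

# Stand-in for Redis: key -> set of uids.
store = defaultdict(set)

def follow(follower_uid, followed_uid):
    """Record that follower_uid now follows followed_uid (both directions)."""
    store[f"uid:{follower_uid}:following"].add(followed_uid)
    store[f"uid:{followed_uid}:followers"].add(follower_uid)

follow(1000, 1001)
follow(1000, 1001)  # following twice is harmless, as sets ignore duplicates

print(store["uid:1000:following"], store["uid:1001:followers"])
```

Against a real Redis server the two writes are separate commands, so wrapping them in a transaction (MULTI/EXEC) would be one way to keep both sets consistent.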
  • Posting Updates
    // Assumes the phpredis extension and a local Redis server;
    // the original slide used an older client with a generic push() method.
    $r = new Redis();
    $r->connect('127.0.0.1', 6379);
    $postid = $r->incr("global:nextPostId");
    $post = $User['id'] . "|" . time() . "|" . $status;
    $r->set("post:$postid", $post);
    // Fan the post id out to every follower's list of posts.
    $followers = $r->sMembers("uid:" . $User['id'] . ":followers");
    if ($followers === false) $followers = array();
    $followers[] = $User['id']; // Add the post to our own posts too
    foreach ($followers as $fid) {
        $r->lPush("uid:$fid:posts", $postid);
    }
    // Push the post on the global timeline, and trim the timeline
    // to the newest 1000 elements (indexes 0 through 999).
    $r->lPush("global:timeline", $postid);
    $r->lTrim("global:timeline", 0, 999);
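The posting flow is a "fan-out on write": the post is stored once, and its id is pushed onto the list of every follower (and the author). Below is a rough in-memory sketch of the same steps, with Python containers standing in for the Redis keys. All names (`post_update`, `timelines`, `MAX_GLOBAL`) are hypothetical.

```python
import time
from collections import defaultdict

followers = defaultdict(set)    # uid -> set of follower uids
timelines = defaultdict(list)   # uid -> post ids, newest first
posts = {}                      # post id -> (author uid, timestamp, status)
global_timeline = []            # newest first, capped at MAX_GLOBAL entries
MAX_GLOBAL = 1000
next_post_id = 0

def post_update(uid, status):
    """Store a post and push its id to the author's and followers' timelines."""
    global next_post_id
    next_post_id += 1                        # like INCR global:nextPostId
    posts[next_post_id] = (uid, time.time(), status)
    for fid in followers[uid] | {uid}:       # followers plus the author
        timelines[fid].insert(0, next_post_id)   # like LPUSH
    global_timeline.insert(0, next_post_id)
    del global_timeline[MAX_GLOBAL:]         # like LTRIM 0 999
    return next_post_id

followers[1001].add(1000)                    # user 1000 follows user 1001
first = post_update(1001, "hello graph")
print(timelines[1000], timelines[1001])
```

The trade-off of this design is that writes cost one push per follower, in exchange for reading any user's timeline from a single list.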
  • Common followers? - Set intersections!
    SINTER uid:1000:followers uid:1001:followers
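Set intersection and its siblings map directly onto Python's set operators, which makes it easy to see what the server-side commands compute. The follower data below is invented; the comments name the Redis command that corresponds to each operator.

```python
followers = {
    1000: {1001, 1002, 1003},
    1001: {1000, 1002, 1004},
}
following = {
    1000: {1001, 1005},
}

common = followers[1000] & followers[1001]     # SINTER: follow both users
either = followers[1000] | followers[1001]     # SUNION: follow at least one
only_1000 = followers[1000] - followers[1001]  # SDIFF: follow 1000 but not 1001
mutual = followers[1000] & following[1000]     # SINTER: 1000 and they follow each other

print(common, mutual)  # {1002} {1001}
```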
  • Let’s compare that to MySQL
  • Can be Painful
  • Even More Pain
  • Relational databases can work for the simplest of cases, but fail horribly at nearly all graph-related operations/algorithms.
  • Graphs and graph-databases are only going to be more and more useful.
  • However, graph algorithms are hard. So don’t write your own. And make sure you use a persistent storage engine that is best suited for the type of queries you will be performing.
  • Resources
      - The Algorithm Design Manual, Steven S. Skiena
      - Programming Collective Intelligence, Toby Segaran
      - Introduction to Algorithms, Cormen, Leiserson, Rivest
  • @jperras
  • Photo Credits
      - Graph of the internet, circa 2003: http://www.duniacyber.com/freebies/education/what-is-internet-lookslike/ (built from a partial trawl of public servers using traceroute)
      - Thanks to my real friends for letting me use their Facebook profile images.
  • References
      - Large Scale Graph Algorithms (class lectures), Yuri Lifshits, Steklov Institute of Mathematics at St. Petersburg
      - http://mathworld.wolfram.com/Set.html
      - Programming Collective Intelligence, Toby Segaran
      - The Algorithm Design Manual, Steven S. Skiena