3.
Networks/Graphs
Nodes/vertices: people, organizations, webpages, computers
Edges: represent connections between pairs of nodes
4.
Distance
Length of the shortest path between two nodes
6.
Breadth-first Search
Iteratively explore nodes one layer at a time
7.
def bfs_distances(neighbors, u0):
    # neighbors maps each node to the set of its adjacent nodes
    # initialize distances (None means "not reached yet")
    dist = {u: None for u in neighbors}
    dist[u0] = 0
    d = 0
    periphery = {u0}
    while periphery:
        # find unvisited nodes one step away from the periphery
        next_level = set()
        for u in periphery:
            next_level |= {w for w in neighbors[u] if dist[w] is None}
        # update distances
        d += 1
        for u in next_level:
            dist[u] = d
        # update periphery
        periphery = next_level
    return dist
8.
BFS @ scale (undirected network)
Input: edge list, starting node u0
Output: distance to all nodes from u0
9.
BFS @ scale (undirected network)
Input: edge list, distances (u, d)
1. Join distances with edge list
2. For each (u, d, w), output (w, d+1) [also output (u0, 0)]
3. Group by w, and output min d
Each pass extends the known distances by one layer; repeat until no distance changes.
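The three steps above can be sketched in plain Python, with lists and dicts standing in for the distributed tables. This is only an illustration under assumptions: the function names `bfs_round` and `bfs_at_scale` are hypothetical, the edge list is expanded to both directions because the network is undirected, and the loop stops when a round leaves the distances unchanged.

```python
def bfs_round(directed_edges, dist, u0):
    """One pass of steps 1-3 over in-memory stand-ins for the tables."""
    # 1. join distances (u, d) with the edge list (u, w) on u
    joined = [(u, dist[u], w) for (u, w) in directed_edges if u in dist]
    # 2. each (u, d, w) proposes distance d + 1 for w; also re-emit (u0, 0)
    proposals = [(w, d + 1) for (_, d, w) in joined] + [(u0, 0)]
    # 3. group by w and keep the minimum proposed distance
    best = {}
    for w, d in proposals:
        if w not in best or d < best[w]:
            best[w] = d
    return best


def bfs_at_scale(undirected_edges, u0):
    """Repeat rounds until the distance table stops changing."""
    # an undirected edge {u, w} is stored in both directions
    directed = list(undirected_edges) + [(w, u) for (u, w) in undirected_edges]
    dist = {u0: 0}
    while True:
        new_dist = bfs_round(directed, dist, u0)
        if new_dist == dist:
            return dist
        dist = new_dist


distances = bfs_at_scale([(1, 2), (2, 3), (3, 4), (1, 5), (6, 7)], u0=1)
```

Nodes that cannot be reached from u0 (6 and 7 in this example) never enter the distance table.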
10.
Connected Components (undirected network)
Input: edge list
Output: list of nodes for each component
11.
Connected Graph
There is a path between every pair of nodes
13.
Connected Component
A connected subset of nodes that is not contained in any larger connected subset
14.
Connected Components (undirected network)
1. Select a node u0 that has not yet been assigned
2. BFS starting from u0
3. Record nodes reached by BFS
Repeat until every node has been assigned to a component.
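A minimal in-memory sketch of this procedure follows. The function name `connected_components` and the optional `nodes` argument (for listing isolated nodes that appear in no edge) are assumptions made for illustration.

```python
from collections import defaultdict


def connected_components(edges, nodes=()):
    """Return a list of node sets, one per connected component."""
    neighbors = defaultdict(set)
    for u, w in edges:
        neighbors[u].add(w)
        neighbors[w].add(u)
    for u in nodes:
        neighbors.setdefault(u, set())  # isolated nodes have no edges

    assigned = set()
    components = []
    for u0 in list(neighbors):
        # 1. select a node that has not yet been assigned
        if u0 in assigned:
            continue
        # 2. BFS starting from u0, one layer at a time
        reached = {u0}
        periphery = {u0}
        while periphery:
            next_level = set()
            for u in periphery:
                next_level |= neighbors[u] - reached
            reached |= next_level
            periphery = next_level
        # 3. record the nodes reached by this BFS as one component
        assigned |= reached
        components.append(reached)
    return components


components = connected_components([(1, 2), (2, 3), (4, 5)], nodes=[6])
```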
16.
Consider the global human social network, with an edge between every pair of friends.
Is this network connected?
No: there are people with no (living) friends, who are hence isolated from the rest of the network.
17.
Consider the global human social network, with an edge between every pair of friends.
Is there a “giant” connected component?
20.
Consider the global human social network, with an edge between every pair of friends.
Is there more than one “giant” component?
No: it is unlikely to have two large disconnected sets of people.
Historically it was more likely, e.g., pre-Columbian America & Eurasia.
21.
Consider the global human social network, with an edge between every pair of friends.
On average, how far are people from one another?
22.
The Small-world Experiment
Stanley Milgram, 1967
296 people were randomly selected in Omaha and Wichita.
Packages were sent to the selected individuals with instructions to forward them to a particular stockbroker in Boston through a chain of people they knew on a first-name basis.
23.
The Small-world Experiment
Stanley Milgram, 1967
Of the 296 packages, 232 did not reach the target.
Of the 64 that did arrive, the average path length was 6.
“Six degrees of separation”
24.
Small-world phenomenon
Is “six degrees” big or small?
25.
Small-world phenomenon
Navigational vs. topological
26.
The Anatomy of the Facebook Social Graph
J. Ugander, B. Karrer, L. Backstrom, C. Marlow
721 million users, 69 billion edges
5 degrees of separation
27.
Edge list degree distribution (undirected network)
Input: edge list
Output: degree distribution
28.
Degree of node u
Number of edges incident on u
[Figure: example network with nodes numbered 1 to 7]
29.
Edge list degree distribution (undirected network)
Map
    input: (u, w)
    output: (u, w), key := u
    output: (w, u), key := w
Reduce
    input: u, {w1, …, wk}
    output: u, k
30.
Edge list degree distribution (undirected network)
Map
    input: u, k
    identity, key := k
Reduce
    input: k, {u1, …, um}
    output: k, m
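The two map/reduce rounds can be imitated in a single process as shown below. This is a sketch, not a real MapReduce job: the tiny `map_reduce` driver and all function names are hypothetical, and it assumes each undirected edge appears exactly once in the edge list.

```python
from collections import defaultdict


def map_reduce(records, mapper, reducer):
    """Toy single-process driver: map, group values by key, then reduce."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    return [out for key, values in groups.items() for out in reducer(key, values)]


# Round 1: edge list -> (node, degree)
def edge_mapper(edge):
    u, w = edge
    yield u, w  # key := u
    yield w, u  # key := w


def degree_reducer(u, ws):
    yield u, len(ws)  # k = number of neighbors collected for u


# Round 2: (node, degree) -> (degree, number of nodes with that degree)
def degree_mapper(record):
    u, k = record
    yield k, u  # identity on the record, key := k


def count_reducer(k, us):
    yield k, len(us)  # m = number of nodes with degree k


def degree_distribution(edges):
    degrees = map_reduce(edges, edge_mapper, degree_reducer)
    return dict(map_reduce(degrees, degree_mapper, count_reducer))


distribution = degree_distribution([(1, 2), (1, 3), (1, 4), (2, 3)])
```

In this example node 1 has degree 3, nodes 2 and 3 have degree 2, and node 4 has degree 1.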
32.
An email network of 130M users
Edges indicate reciprocated communication
[Figure: log-log plot]
36.
Counting Triangles (undirected network)
Input: adjacency list
Output: number of triangles incident on each node
37.
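Counting Triangles
In memory

# neighbors maps each node to the set of its adjacent nodes
triangles = {}
for u in neighbors:
    triangles[u] = 0
    for w in neighbors[u]:
        # every common neighbor of u and w closes a triangle at u
        triangles[u] += len(neighbors[w] & neighbors[u])
    # each triangle at u was counted twice, once per other corner
    triangles[u] //= 2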
38.
Counting Triangles @ scale
Every node needs to know to which nodes it is connected and to which nodes its neighbors are connected.
39.
Counting Triangles @ scale
Map
    input: u, {w1, …, wk}
    foreach wi: output wi, (u, {w1, …, wk})
Reduce
    In-memory triangle count
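One way to read this map/reduce step is sketched below. The function names are hypothetical, and the sketch assumes an undirected adjacency list, so the reducer for node v can rebuild v's own neighbor set from the senders of the messages it receives.

```python
from collections import defaultdict


def triangle_mapper(u, neighbor_set):
    # send u's full neighbor set to each of u's neighbors
    for w in neighbor_set:
        yield w, (u, neighbor_set)


def triangle_reducer(v, messages):
    # v's own neighbors are exactly the nodes that sent it a message
    own = {u for u, _ in messages}
    # in-memory count: common neighbors of v and each neighbor u
    shared = sum(len(own & nbrs) for _, nbrs in messages)
    # each triangle at v is seen twice, once per other corner
    return v, shared // 2


def count_triangles(adjacency):
    groups = defaultdict(list)
    for u, nbrs in adjacency.items():
        for key, value in triangle_mapper(u, frozenset(nbrs)):
            groups[key].append(value)
    counts = dict(triangle_reducer(v, msgs) for v, msgs in groups.items())
    # nodes without neighbors receive no messages and have no triangles
    return {u: counts.get(u, 0) for u in adjacency}


triangles = count_triangles({1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3}, 5: set()})
```

Nodes 1, 2, and 3 form the only triangle in this example, so each of them gets a count of 1.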
40.
Homophily
The tendency of individuals to associate with similar others
“Birds of a feather flock together”
41.
Birds of a Feather: Homophily in Social Networks
McPherson, Smith-Lovin, Cook
Race, sex, age, religion, education, occupation, social class, behaviors, attitudes, abilities, aspirations
44.
Computing Homophily
Input: edge list, race of each individual
Output: distribution of race among friends
[Table: rows White, Black, Latino, Asian by columns White, Black, Latino, Asian]
47.
Computing Homophily
1. Join edges (u, v) by u with demographics (w, race) by w, giving (u, v, urace)
2. Join edges (u, v, urace) by v with demographics (w, race) by w, giving (u, v, urace, vrace)
3. Group edges (u, v, urace, vrace) by sorted([urace, vrace])
4. Count edges in each group
5. Normalize the table
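The five steps can be sketched with in-memory joins as follows. The function name `homophily_table` is hypothetical, and the slide does not say how the table is normalized; this sketch assumes each count is divided by the total number of edges, so the entries sum to 1.

```python
from collections import Counter


def homophily_table(edges, race):
    """edges: list of (u, v); race: dict mapping each individual to a label."""
    # 1. join edges with demographics on u -> (u, v, urace)
    with_urace = [(u, v, race[u]) for (u, v) in edges if u in race]
    # 2. join again on v -> (u, v, urace, vrace)
    with_both = [(u, v, ur, race[v]) for (u, v, ur) in with_urace if v in race]
    # 3-4. group by the sorted pair of labels and count edges per group
    counts = Counter(tuple(sorted([ur, vr])) for (_, _, ur, vr) in with_both)
    # 5. normalize (assumption: divide by the total number of edges)
    total = sum(counts.values())
    return {pair: n / total for pair, n in counts.items()}


race = {1: "White", 2: "White", 3: "Black", 4: "Black"}
table = homophily_table([(1, 2), (2, 3), (3, 4), (1, 4)], race)
```

Sorting the two labels makes the pair unordered, so an edge between a White and a Black individual lands in the same group whichever endpoint is listed first.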
48.
How do ideas and products spread through society?
50.
Computing the structure of diffusion
Input: Twitter network; time-stamped “adoptions” of 1B URLs
Output: distribution of cascade structures
51.
Computing the structure of diffusion
We assume v influenced u to adopt link t if v is the last of u’s contacts to adopt before u.
[Figure: example nodes labeled 2, 3, 5, 9]
52.
Computing the structure of diffusion
Draw a labeled edge from v to u.
[Figure: edge labeled t between nodes 3 and 5]
53.
Computing the structure of diffusion
Group edges by their labels (URL)
54.
Computing the structure of diffusion
Compute the connected components for each forest corresponding to a URL
55.
Computing the structure of diffusion
Definition. Two (rooted) trees are isomorphic if they are identical under a relabeling of the vertices.
56.
Example canonical names: x, (x), (x, x), ((x)), (x, (x))
Basis. The canonical name c(T) for the one-node tree T is x.
Induction. If T has more than one node, let T1, …, Tk denote the subtrees of the root, indexed such that c(T1) ≤ c(T2) ≤ ··· ≤ c(Tk) under the lexicographic order. Then the canonical name for T is (c(T1), …, c(Tk)).
Aho et al. [1974]
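The recursive definition translates into a short function, sketched here. The representation of a tree as a dict from node to list of children is an assumption, as is the symbol order used for comparing names: the slide does not fix one, so `_RANK` is chosen to make x sort before (x), matching the example names above.

```python
from collections import Counter

# Assumed symbol order for lexicographic comparison: "x" before "(".
_RANK = {ch: i for i, ch in enumerate("x(), ")}


def _name_key(name):
    return [_RANK[ch] for ch in name]


def canonical_name(children, root):
    """children maps a node to the list of its child nodes (leaves may be absent)."""
    kids = children.get(root, [])
    if not kids:
        return "x"  # basis: the one-node tree
    # induction: name the subtrees, sort the names, wrap in parentheses
    names = sorted((canonical_name(children, c) for c in kids), key=_name_key)
    return "(" + ", ".join(names) + ")"


# Two differently labeled trees with the same shape
tree_a = {1: [2, 3], 3: [4]}
tree_b = {"a": ["b", "c"], "b": ["d"]}
name_a = canonical_name(tree_a, 1)
name_b = canonical_name(tree_b, "a")

# Counting trees of each type then reduces to counting names
shape_counts = Counter([name_a, name_b, canonical_name({}, "solo")])
```

Because isomorphic trees get the same name, a plain counter over names is enough to tally cascade shapes.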
57.
Computing the structure of diffusion
Compute the canonical name for each tree in the URL forests
58.
Computing the structure of diffusion
Count the number of trees of each type
59.
Computing the structure of diffusion
We assume v influenced u to adopt link t if v is the last of u’s contacts to adopt before u.
[Figure: example nodes labeled 2, 3, 5, 9]
60.
Computing the structure of diffusion
Draw a labeled edge from v to u.
[Figure: edge labeled t between nodes 3 and 5]
63.
Computing the structure of diffusion
1. Join adoptions (link, u, uts) by u with edges (u, w) by u, giving (link, u, uts, w)
2. Join (link, u, uts, w) by (link, w) with adoptions (link, w, wts) by (link, w), giving (link, u, uts, w, wts)
3. Group (link, u, uts, w, wts) by (link, u)
4. Output unique “parent” edge (link, u, uts, w, wts) for each group
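These four steps can be sketched in memory as below. Several details are assumptions: the function name `parent_edges` is hypothetical, an edge (u, w) is read as "w is a contact of u", each user is taken to adopt a given link at most once, and the unique parent is the contact with the latest adoption time strictly before u's own, following the influence rule stated earlier in the deck.

```python
from collections import defaultdict


def parent_edges(adoptions, edges):
    """adoptions: list of (link, u, uts); edges: list of (u, w)."""
    contacts = defaultdict(list)
    for u, w in edges:
        contacts[u].append(w)
    # 1. join adoptions with edges on u -> (link, u, uts, w)
    step1 = [(link, u, uts, w)
             for (link, u, uts) in adoptions
             for w in contacts.get(u, [])]
    # 2. join with adoptions on (link, w) -> (link, u, uts, w, wts)
    adopted_at = {(link, w): wts for (link, w, wts) in adoptions}
    step2 = [(link, u, uts, w, adopted_at[(link, w)])
             for (link, u, uts, w) in step1
             if (link, w) in adopted_at]
    # 3. group by (link, u)
    groups = defaultdict(list)
    for row in step2:
        groups[(row[0], row[1])].append(row)
    # 4. keep the contact who adopted last among those before u
    parents = []
    for rows in groups.values():
        earlier = [r for r in rows if r[4] < r[2]]
        if earlier:
            parents.append(max(earlier, key=lambda r: r[4]))
    return parents


adoptions = [("t", "a", 2), ("t", "b", 3), ("t", "c", 5), ("t", "d", 9)]
edges = [("c", "a"), ("c", "b"), ("c", "d"), ("d", "c")]
parents = parent_edges(adoptions, edges)
```

Here user c adopts at time 5 with contacts adopting at times 2, 3, and 9, so the parent of c is b (time 3). Users with no earlier-adopting contact get no parent edge and become roots of their cascades.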