Computational Social Science, Lecture 06: Networks, Part II

  1. 1. Networks, Part II. Sharad Goel, Columbia University. Computational Social Science: Lecture 6, March 1, 2013
  2. 2. Corporate E-mail Communication [Adamic & Adar, 2004], via Easley & Kleinberg
  3. 3. Networks/Graphs. Nodes/vertices: people, organizations, webpages, computers. Edges: represent connections between pairs of nodes
  4. 4. Distance: length of the shortest path between two nodes
  5. 5. Distance: length of the shortest path between two nodes
  6. 6. Breadth-first Search: iteratively explore nodes one layer at a time
  7. 7. Breadth-first search, in code:
      # initialize distances (None marks a node that has not been reached)
      dist = {}
      for u in G:
          dist[u] = None
      dist[u0] = 0
      d = 0
      periphery = {u0}
      while len(periphery) > 0:
          # find unvisited nodes one step away from the periphery
          next_level = set()
          for u in periphery:
              next_level |= {w for w in neighbors[u] if dist[w] is None}
          # update distances
          d += 1
          for u in next_level:
              dist[u] = d
          # update periphery
          periphery = next_level
  8. 8. BFS @ scale (undirected network). Input: edge list, starting node u0. Output: distance to all nodes from u0
  9. 9. BFS @ scale (undirected network). Input: edge list, distances (u, d)
      1. Join distances with edge list
      2. For each (u, d, w), output (w, d+1) [also output (u0, 0)]
      3. Group by w, and output min d
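The join / emit / group-by-min round on slide 9 can be sketched in plain Python. This is only an in-memory illustration of the data flow, not a distributed implementation; the function names `bfs_step` and `bfs_at_scale` are hypothetical, and repeating the round until the distances stop changing is an assumption about how the rounds are chained.

```python
from collections import defaultdict

def bfs_step(neighbors, dist, source):
    """One round: join distances with edges, emit (w, d + 1), keep min per node."""
    # Steps 1-2: for every known (u, d) and every edge (u, w), emit (w, d + 1).
    # The source is always re-emitted with distance 0.
    candidates = [(source, 0)]
    for u, d in dist.items():
        for w in neighbors[u]:
            candidates.append((w, d + 1))
    # Step 3: group by node and keep the minimum distance.
    new_dist = {}
    for w, d in candidates:
        if w not in new_dist or d < new_dist[w]:
            new_dist[w] = d
    return new_dist

def bfs_at_scale(edges, source):
    """Repeat rounds until no distance changes (assumed stopping rule)."""
    # An undirected edge (u, w) is usable in both directions.
    neighbors = defaultdict(set)
    for u, w in edges:
        neighbors[u].add(w)
        neighbors[w].add(u)
    dist = {source: 0}
    while True:
        new_dist = bfs_step(neighbors, dist, source)
        if new_dist == dist:
            return dist
        dist = new_dist

edges = [("a", "b"), ("b", "c"), ("c", "d"), ("a", "c")]
print(bfs_at_scale(edges, "a"))
```

Nodes that cannot be reached from the source never receive a distance and are simply absent from the result.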
  10. 10. Connected Components (undirected network). Input: edge list. Output: list of nodes for each component
  11. 11. Connected Graph: there is a path between every pair of nodes
  12. 12. Connected Graph: there is a path between every pair of nodes
  13. 13. Connected Component: a connected subset of nodes that is not contained in any larger connected subset
  14. 14. Connected Components (undirected network)
      1. Select a node u0 that has not yet been assigned
      2. BFS starting from u0
      3. Record nodes reached by BFS
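A minimal sketch of the three steps on slide 14, repeated until every node has been assigned to a component (the repetition is implied by the slide rather than stated). The function name `connected_components` and the optional `nodes` argument for isolated nodes are hypothetical, and node labels are assumed to be sortable so the output order is deterministic.

```python
from collections import defaultdict, deque

def connected_components(edges, nodes=()):
    """Return a list of node sets, one per connected component."""
    neighbors = defaultdict(set)
    for u, w in edges:
        neighbors[u].add(w)
        neighbors[w].add(u)
    # Isolated nodes never appear in the edge list, so accept them separately.
    all_nodes = set(nodes) | set(neighbors)

    assigned = set()
    components = []
    for u0 in sorted(all_nodes):
        # Step 1: select a node that has not yet been assigned.
        if u0 in assigned:
            continue
        # Step 2: BFS starting from u0.
        component = {u0}
        queue = deque([u0])
        while queue:
            u = queue.popleft()
            for w in neighbors[u]:
                if w not in component:
                    component.add(w)
                    queue.append(w)
        # Step 3: record the nodes reached by the BFS.
        assigned |= component
        components.append(component)
    return components

print(connected_components([(1, 2), (2, 3), (4, 5)], nodes=[6]))
```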
  15. 15. Consider the global human social network, with an edge between every pair of friends. Is this network connected?
  16. 16. Consider the global human social network, with an edge between every pair of friends. Is this network connected? No: there are people with no (living) friends, who are hence isolated from the rest of the network
  17. 17. Consider the global human social network, with an edge between every pair of friends. Is there a “giant” connected component?
  18. 18. Consider the global human social network, with an edge between every pair of friends. Is there more than one “giant” component?
  19. 19. Consider the global human social network, with an edge between every pair of friends. Is there more than one “giant” component? No: unlikely to have two large disconnected sets of people
  20. 20. Consider the global human social network, with an edge between every pair of friends. Is there more than one “giant” component? No: unlikely to have two large disconnected sets of people. Historically it was more likely, e.g., pre-Columbian America & Eurasia
  21. 21. Consider the global human social network, with an edge between every pair of friends. On average, how far are people from one another?
  22. 22. The Small-world Experiment (Stanley Milgram, 1967). 296 people were randomly selected in Omaha and Wichita. Packages were sent to the selected individuals with instructions to forward them to a particular stockbroker in Boston through a chain of people they knew on a first-name basis.
  23. 23. The Small-world Experiment (Stanley Milgram, 1967). Of the 296 packages, 232 did not reach the target. Of the 64 that did arrive, the average path length was 6: “six degrees of separation”
  24. 24. Small-world phenomenon: is “six degrees” big or small?
  25. 25. Small-world phenomenon: navigational vs. topological
  26. 26. The Anatomy of the Facebook Social Graph (J. Ugander, B. Karrer, L. Backstrom, C. Marlow): 721 million users, 69 billion edges, 5 degrees of separation
  27. 27. Edge list → degree distribution (undirected network). Input: edge list. Output: degree distribution
  28. 28. [Figure: example graph with nodes labeled 1 to 7] Degree of node u: # of edges incident on u
  29. 29. Edge list → degree distribution (undirected network)
      Map: input (u, w); output (u, w) with key := u, and (w, u) with key := w
      Reduce: input u, {w1, …, wk}; output u, k
  30. 30. Edge list → degree distribution (undirected network)
      Map: input u, k; identity, key := k
      Reduce: input k, {u1, …, um}; output k, m
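The two map/reduce passes on slides 29 and 30 can be imitated in memory as follows. This is an illustrative sketch only: the helper names `group_by_key` and `degree_distribution` are hypothetical, and the edge list is assumed to contain each undirected edge once, with no self-loops.

```python
from collections import defaultdict

def group_by_key(pairs):
    """Stand-in for the shuffle phase: collect values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def degree_distribution(edges):
    # Pass 1, map: emit each edge once under each endpoint.
    mapped = []
    for u, w in edges:
        mapped.append((u, w))
        mapped.append((w, u))
    # Pass 1, reduce: the degree of u is the number of neighbors grouped under u.
    degrees = {u: len(ws) for u, ws in group_by_key(mapped).items()}
    # Pass 2, map: identity, re-keyed by degree k.
    by_degree = group_by_key((k, u) for u, k in degrees.items())
    # Pass 2, reduce: count the nodes that share each degree.
    return {k: len(us) for k, us in by_degree.items()}

print(degree_distribution([(1, 2), (1, 3), (1, 4), (2, 3)]))
```

For the four edges in the example, node 1 has degree 3, nodes 2 and 3 have degree 2, and node 4 has degree 1, so the result maps 3 to 1, 2 to 2, and 1 to 1.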
  31. 31. An email network of 130M users. Edges indicate reciprocated communication
  32. 32. An email network of 130M users. Edges indicate reciprocated communication (log-log plot)
  33. 33. Clustering
  34. 34. Clustering
  35. 35. Triadic closure
      1. Opportunity
      2. Incentive
      3. Commonality
  36. 36. Counting Triangles (undirected network). Input: adjacency list. Output: number of triangles incident on each node
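  37. 37. Counting Triangles, in memory:
      triangles = {}
      for u in nodes:
          triangles[u] = 0
          for w in neighbors[u]:
              # every common neighbor of u and w closes a triangle through u
              triangles[u] += len(neighbors[w] & neighbors[u])
          # each triangle at u is seen from both of its other two corners
          triangles[u] = triangles[u] // 2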
  38. 38. Counting Triangles @ scale: every node needs to know to which nodes it is connected and to which nodes its neighbors are connected
  39. 39. Counting Triangles @ scale
      Map: input u {w1, …, wk}; for each wi, output wi, u {w1, …, wk}
      Reduce: in-memory triangle count
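A sketch of the map and reduce steps on slide 39, run in memory to show what each reducer sees. The names `map_phase` and `reduce_phase` are hypothetical, and the adjacency list is assumed to be a dict from each node to the set of its neighbors, symmetric and without self-loops.

```python
from collections import defaultdict

def map_phase(adjacency):
    """For input u {w1, ..., wk}, send u and its neighbor set to every wi."""
    for u, nbrs in adjacency.items():
        for w in nbrs:
            yield w, (u, nbrs)

def reduce_phase(messages):
    """Each node receives one message per neighbor and counts triangles locally."""
    grouped = defaultdict(list)
    for key, value in messages:
        grouped[key].append(value)
    triangles = {}
    for node, received in grouped.items():
        # The senders are exactly this node's neighbors.
        own_neighbors = {u for u, _ in received}
        # A sender's neighbor that is also our neighbor closes a triangle;
        # each triangle is found twice, once from each of its other two corners.
        twice = sum(len(own_neighbors & nbrs) for _, nbrs in received)
        triangles[node] = twice // 2
    return triangles

adjacency = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3}}
print(reduce_phase(map_phase(adjacency)))
```

Nodes with no neighbors receive no messages and therefore do not appear in the output of this sketch.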
  40. 40. Homophily: the tendency of individuals to associate with similar others; “birds of a feather flock together”
  41. 41. Birds of a Feather: Homophily in Social Networks (McPherson, Smith-Lovin, Cook): race, sex, age, religion, education, occupation, social class, behaviors, attitudes, abilities, aspirations
  42. 42. Homophily
      1. Preference
      2. Influence
      3. Opportunity
  43. 43. Fantasy Football
  44. 44. Computing Homophily. Input: edge list, race of each individual. Output: distribution of race among friends [Table: rows and columns White, Black, Latino, Asian]
  45. 45. Computing Homophily
      1. Join edges (u, v) by u, demographics (w, race) by w
      2. Join edges (u, v, urace) by v, demographics (w, race) by w
  46. 46. Computing Homophily
      1. Join edges (u, v) by u, demographics (w, race) by w
      2. Join edges (u, v, urace) by v, demographics (w, race) by w
      3. Group edges (u, v, urace, vrace) by sorted([urace, vrace])
  47. 47. Computing Homophily
      1. Join edges (u, v) by u, demographics (w, race) by w
      2. Join edges (u, v, urace) by v, demographics (w, race) by w
      3. Group edges (u, v, urace, vrace) by sorted([urace, vrace])
      4. Count edges in each group
      5. Normalize the table
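The five steps on slide 47 might look like this in memory. The slide does not say how the table is normalized, so dividing each count by the total number of edges is an assumption here; normalizing each row separately would be another reasonable reading. The function name `homophily_table` and the placeholder group labels are hypothetical, and every node in the edge list is assumed to have a demographic record.

```python
from collections import Counter

def homophily_table(edges, race):
    """Share of edges falling in each unordered pair of groups."""
    pair_counts = Counter()
    for u, v in edges:
        urace = race[u]  # step 1: join edges with demographics on u
        vrace = race[v]  # step 2: join the result with demographics on v
        # steps 3-4: group by the sorted pair of groups and count edges
        pair_counts[tuple(sorted([urace, vrace]))] += 1
    total = sum(pair_counts.values())
    # step 5: normalize (assumed here: divide by the total number of edges)
    return {pair: count / total for pair, count in pair_counts.items()}

race = {"a": "group_1", "b": "group_1", "c": "group_2", "d": "group_2"}
edges = [("a", "b"), ("a", "c"), ("c", "d"), ("b", "d")]
print(homophily_table(edges, race))
```

Sorting the pair means that an edge between groups 1 and 2 lands in the same cell regardless of which endpoint is listed first.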
  48. 48. How do ideas and products spread through society?
  49. 49. The structure of diffusion [Figure: cascade structures with frequencies 93%, 5%, 1%, 0.3%, 0.3%]
  50. 50. Computing the structure of diffusion. Input: Twitter network; time-stamped “adoptions” of 1B URLs. Output: distribution of cascade structures
  51. 51. Computing the structure of diffusion: we assume v influenced u to adopt link t if v is the last of u’s contacts to adopt before u [Figure labels: 2, 3, 5, 9]
  52. 52. Computing the structure of diffusion: draw a labeled edge from v to u [Figure labels: 3, t, 5]
  53. 53. Computing the structure of diffusion: group edges by their labels (URL)
  54. 54. Computing the structure of diffusion: compute the connected components for each forest corresponding to a URL
  55. 55. Computing the structure of diffusion. Definition. Two (rooted) trees are isomorphic if they are identical under a relabeling of the vertices.
  56. 56. Canonical names of the example trees: x, (x), (x, x), ((x)), (x, (x))
      Basis. The canonical name c(T) for the one-node tree T is x.
      Induction. If T has more than one node, let T1, …, Tk denote the subtrees of the root, indexed such that c(T1) ≤ c(T2) ≤ ··· ≤ c(Tk) under the lexicographic order. Then the canonical name for T is (c(T1), …, c(Tk)). [Aho et al., 1974]
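The basis and induction steps translate into a short recursive function. This is a sketch under stated assumptions: the tree is given as a hypothetical `children` dict from each node to a list of its children, and names are ordered by length and then alphabetically, because Python's default string order places "(" before "x" and would not reproduce the slide's example name (x, (x)). Any fixed total order on names yields a valid canonical form.

```python
from collections import Counter

def canonical_name(children, root):
    """Canonical name of the rooted tree below `root`."""
    kids = children.get(root, [])
    # Basis: a one-node tree is named x.
    if not kids:
        return "x"
    # Induction: name the subtrees, sort the names, wrap them in parentheses.
    names = sorted(
        (canonical_name(children, kid) for kid in kids),
        key=lambda name: (len(name), name),
    )
    return "(" + ", ".join(names) + ")"

# Two trees with different labels and child order but the same shape.
tree_a = {"r": ["a", "b"], "b": ["c"]}
tree_b = {1: [2, 3], 2: [4]}
print(canonical_name(tree_a, "r"), canonical_name(tree_b, 1))

# Counting shapes then reduces to counting equal names.
shape_counts = Counter(
    [canonical_name(tree_a, "r"), canonical_name(tree_b, 1), canonical_name({}, "solo")]
)
```

Because the function is recursive, very deep cascades could exceed Python's default recursion limit; an explicit stack would avoid that.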
  57. 57. Computing the structure of diffusion: compute the canonical name for each tree in the URL forests
  58. 58. Computing the structure of diffusion: count the number of trees of each type
  59. 59. Computing the structure of diffusion: we assume v influenced u to adopt link t if v is the last of u’s contacts to adopt before u [Figure labels: 2, 3, 5, 9]
  60. 60. Computing the structure of diffusion: draw a labeled edge from v to u [Figure labels: 3, t, 5]
  61. 61. Computing the structure of diffusion
      1. Join adoptions (link, u, uts) by u, edges (u, w) by u
      2. Join (link, u, uts, w) by (link, w), adoptions (link, w, wts) by (link, w)
  62. 62. Computing the structure of diffusion
      1. Join adoptions (link, u, uts) by u, edges (u, w) by u
      2. Join (link, u, uts, w) by (link, w), adoptions (link, w, wts) by (link, w)
      3. Group (link, u, uts, w, wts) by (link, u)
  63. 63. Computing the structure of diffusion
      1. Join adoptions (link, u, uts) by u, edges (u, w) by u
      2. Join (link, u, uts, w) by (link, w), adoptions (link, w, wts) by (link, w)
      3. Group (link, u, uts, w, wts) by (link, u)
      4. Output unique “parent” edge (link, u, uts, w, wts) for each group
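The four steps on slide 63 can be sketched in memory as below, using the rule stated earlier in the lecture that the parent is the last of u's contacts to adopt before u. Several details are assumptions: an edge (u, w) is read as "w is a contact of u", each user adopts a given link at most once, and ties in timestamps are broken by the larger user label. The function name `parent_edges` is hypothetical.

```python
from collections import defaultdict

def parent_edges(adoptions, edges):
    """Map (link, u) to (w, wts), the contact assumed to have influenced u."""
    adopted_at = {(link, user): ts for link, user, ts in adoptions}
    contacts = defaultdict(set)
    for u, w in edges:
        contacts[u].add(w)

    parents = {}
    for link, u, uts in adoptions:
        # Steps 1-2: join u's adoption with its contacts, then with each
        # contact's adoption of the same link, keeping only earlier adopters.
        earlier = [
            (adopted_at[(link, w)], w)
            for w in contacts[u]
            if (link, w) in adopted_at and adopted_at[(link, w)] < uts
        ]
        # Steps 3-4: within the (link, u) group keep the single latest adopter.
        if earlier:
            wts, w = max(earlier)
            parents[(link, u)] = (w, wts)
    return parents

adoptions = [("t", "a", 2), ("t", "b", 3), ("t", "u", 5), ("t", "c", 9)]
edges = [("u", "a"), ("u", "b"), ("u", "c")]
print(parent_edges(adoptions, edges))
```

Adopters with no earlier-adopting contact get no parent edge and so become the roots of the cascade trees.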
