Your SlideShare is downloading. ×
0
Computational Social Science, Lecture 08: Counting Fast, Part II
Computational Social Science, Lecture 08: Counting Fast, Part II
Computational Social Science, Lecture 08: Counting Fast, Part II
Computational Social Science, Lecture 08: Counting Fast, Part II
Computational Social Science, Lecture 08: Counting Fast, Part II
Computational Social Science, Lecture 08: Counting Fast, Part II
Computational Social Science, Lecture 08: Counting Fast, Part II
Computational Social Science, Lecture 08: Counting Fast, Part II
Computational Social Science, Lecture 08: Counting Fast, Part II
Computational Social Science, Lecture 08: Counting Fast, Part II
Computational Social Science, Lecture 08: Counting Fast, Part II
Computational Social Science, Lecture 08: Counting Fast, Part II
Computational Social Science, Lecture 08: Counting Fast, Part II
Computational Social Science, Lecture 08: Counting Fast, Part II
Computational Social Science, Lecture 08: Counting Fast, Part II
Computational Social Science, Lecture 08: Counting Fast, Part II
Computational Social Science, Lecture 08: Counting Fast, Part II
Computational Social Science, Lecture 08: Counting Fast, Part II
Computational Social Science, Lecture 08: Counting Fast, Part II
Computational Social Science, Lecture 08: Counting Fast, Part II
Computational Social Science, Lecture 08: Counting Fast, Part II
Computational Social Science, Lecture 08: Counting Fast, Part II
Computational Social Science, Lecture 08: Counting Fast, Part II
Computational Social Science, Lecture 08: Counting Fast, Part II
Computational Social Science, Lecture 08: Counting Fast, Part II
Computational Social Science, Lecture 08: Counting Fast, Part II
Computational Social Science, Lecture 08: Counting Fast, Part II
Computational Social Science, Lecture 08: Counting Fast, Part II
Computational Social Science, Lecture 08: Counting Fast, Part II
Computational Social Science, Lecture 08: Counting Fast, Part II
Computational Social Science, Lecture 08: Counting Fast, Part II
Computational Social Science, Lecture 08: Counting Fast, Part II
Computational Social Science, Lecture 08: Counting Fast, Part II
Computational Social Science, Lecture 08: Counting Fast, Part II
Computational Social Science, Lecture 08: Counting Fast, Part II
Computational Social Science, Lecture 08: Counting Fast, Part II
Computational Social Science, Lecture 08: Counting Fast, Part II
Computational Social Science, Lecture 08: Counting Fast, Part II
Computational Social Science, Lecture 08: Counting Fast, Part II
Upcoming SlideShare
Loading in...5
×

Thanks for flagging this SlideShare!

Oops! An error has occurred.

×
Saving this for later? Get the SlideShare app to save on your phone or tablet. Read anywhere, anytime – even offline.
Text the download link to your phone
Standard text messaging rates apply

Computational Social Science, Lecture 08: Counting Fast, Part II

1,154

Published on

Published in: Education
0 Comments
1 Like
Statistics
Notes
  • Be the first to comment

No Downloads
Views
Total Views
1,154
On Slideshare
0
From Embeds
0
Number of Embeds
2
Actions
Shares
0
Downloads
0
Comments
0
Likes
1
Embeds 0
No embeds

Report content
Flagged as inappropriate Flag as inappropriate
Flag as inappropriate

Select your reason for flagging this presentation as inappropriate.

Cancel
No notes for slide

Transcript

  1. Counting Fast (Part II) Sergei Vassilvitskii Columbia University Computational Social Science March 8, 2013Thursday, March 14, 13
  2. Last time Counting fast: – Quadratic time doesn’t scale – Sorting is slightly more than linear – Hashing allows you to do membership queries in constant time 2 Sergei VassilvitskiiThursday, March 14, 13
  3. Today Counting on Networks: – Large Graphs: Internet, Facebook, Twitter – Recommendation Graphs: Netflix, Amazon, etc. 3 Sergei VassilvitskiiThursday, March 14, 13
  4. Friends & Followers Given a network: – When do people become friends? – What factors influence this? 4 Sergei VassilvitskiiThursday, March 14, 13
  5. Friends & Followers Given a network: – When do people become friends? – What factors influence this? Products: – People You May Know (PYMK). Reconnect people, help new users 5 Sergei VassilvitskiiThursday, March 14, 13
  6. Friends & Followers Given a network: – When do people become friends? – What factors influence this? Products: – People You May Know (PYMK). Reconnect people, help new users – Twitter’s who to follow? 6 Sergei VassilvitskiiThursday, March 14, 13
  7. Friends & Followers Given a network: – When do people become friends? – What factors influence this? Products: – People You May Know (PYMK). Reconnect people, help new users – Twitter’s who to follow? Recommendations: – Netflix, Amazon, etc. (Future lectures) 7 Sergei VassilvitskiiThursday, March 14, 13
  8. Triadic Closure Likely to become friends with: – People in similar groups – Friends of friends 8 Sergei VassilvitskiiThursday, March 14, 13
  9. Defining Tight Knit Circles Looking for tight-knit circles: – People whose friends are friends themselves Why? – Network Cohesion: Tightly knit communities foster more trust, social norms. [Coleman ’88, Portes ’88] – Structural Holes: Individuals benefit form bridging [Burt ’04, ’07] 9 Sergei VassilvitskiiThursday, March 14, 13
  10. Clustering Coefficient vs. 10 Sergei VassilvitskiiThursday, March 14, 13
  11. Clustering Coefficient cc ( ) = 0.5 cc ( ) = 0.1 vs. Given an undirected graph - For each node, it’s the fraction of v’s neighbors who are neighbors themselves - Identical to the number of triangles containing the node 11 Sergei VassilvitskiiThursday, March 14, 13
  12. How to Count Triangles Sequential Version: foreach v in V foreach u,w in Adjacency(v) if (u,w) in E Triangles[v]++ v Triangles[v]=0 12 Sergei VassilvitskiiThursday, March 14, 13
  13. How to Count Triangles Sequential Version: foreach v in V foreach u,w in Adjacency(v) if (u,w) in E Triangles[v]++ v Triangles[v]=1 w u 13 Sergei VassilvitskiiThursday, March 14, 13
  14. How to Count Triangles Sequential Version: foreach v in V foreach u,w in Adjacency(v) if (u,w) in E Triangles[v]++ v Triangles[v]=1 w u 14 Sergei VassilvitskiiThursday, March 14, 13
  15. How to Count Triangles Sequential Version: foreach v in V foreach u,w in Adjacency(v) if (u,w) in E Triangles[v]++ Running time: – For each vertex, look at all pairs of neighbors – Number of pairs ~ quadratic in the degree of the vertex – What happens if the degree is very large? 15 Sergei VassilvitskiiThursday, March 14, 13
  16. Parallel Version But use 1,000 machines! – Quadratic algorithms still don’t scale – Simple parallelization: process each vertex separately Naive parallelization does not help with data skew – Some nodes will have very high degree – Example. 3.2 Million followers, must generate 10 Trillion (10^13) potential edges to check. – Even if generating 100M edges per second this is 100K seconds ~ 27 hours for one vertex! 16 Sergei VassilvitskiiThursday, March 14, 13
  17. “Just 5 more minutes” On the LiveJournal Graph (5M nodes, 70M edges) – 80% of vertices are done after 5 min – 99% done after 35 min 17 Sergei VassilvitskiiThursday, March 14, 13
  18. Adapting the Algorithm Approach 1: Dealing with skew directly – currently every triangle counted 3 times (once per vertex) – Running time quadratic in the degree of the vertex – Idea: Count each once, from the perspective of lowest degree vertex – Does this heuristic work? 18 Sergei VassilvitskiiThursday, March 14, 13
  19. How to Count Triangles Better Idea [Schank ’07] – Only pivot on nodes who have smaller degrees than both neighbors. – Neighbors of high degree nodes tend to have small degrees 19 Sergei VassilvitskiiThursday, March 14, 13
  20. How to Count Triangles Better foreach v in V foreach u in Adjacency(v) with deg(u) > deg(v): foreach w in Adjacency(v) with deg(w) > deg(v): if (u,w) is an edge: Triangles[v]++ Triangles[w]++ Triangles[u]++ 20 Sergei VassilvitskiiThursday, March 14, 13
  21. Does it make a difference? 21 Sergei VassilvitskiiThursday, March 14, 13
  22. Why does it help? Look at two different kinds of nodes: – Few friends: • OK to be quadratic on small instances – Lots of friends • Only care about number of friends with even more friends! • Cannot have too many (can make this formal) 22 Sergei VassilvitskiiThursday, March 14, 13
  23. Break 23 Sergei VassilvitskiiThursday, March 14, 13
  24. Working in Parallel MapReduce (review): Map: – Decide how to group the data for computation Reduce: – Given the grouping, perform the computation 24 Sergei VassilvitskiiThursday, March 14, 13
  25. Building People You May Know Friendships are undirected: – If Alice knows Bob, Bob knows Alice – Data stored as a list of all edges – Find all friends of friends – Score the possible pairs 25 Sergei VassilvitskiiThursday, March 14, 13
  26. Data Suppose you have edges and degrees of each vertex: Joe 56 Mary 78 Alice 398 Bob 198 Dan 983 Justin 11,985,234 ... An alternate view may be data stored as adjacency list: Joe 56 Mary 78 Don 99 Bill 1 Alice 398 Kate 55 Bob 198 Mary 78 ... 26 Sergei VassilvitskiiThursday, March 14, 13
  27. Previous Algorithm Adjacency list input. – Map: • For each node and its neighbors, output all paths through the node – Reduce: • none – Map: [ | ] – Output: – Map: [ | ] – Output: None 27 Sergei VassilvitskiiThursday, March 14, 13
  28. How to Count Triangles Better Idea [Schank ’07] – Only pivot on nodes who have smaller degrees than both neighbors. – Neighbors of high degree nodes tend to have small degrees 28 Sergei VassilvitskiiThursday, March 14, 13
  29. Want to compute all open triads Data Needed: – Central node – Neighbors that have higher degree 29 Sergei VassilvitskiiThursday, March 14, 13
  30. Want to compute all open triads Data Needed: – Central node – Neighbors that have higher degree 30 Sergei VassilvitskiiThursday, March 14, 13
  31. Want to compute all open triads Data Needed: – Central node – Neighbors that have higher degree 31 Sergei VassilvitskiiThursday, March 14, 13
  32. Want to compute all open triads Data Needed: – Central node – Neighbors that have higher degree – Orient each edge to point to a node of higher degree, breaking ties arbitrarily but consistently 32 Sergei VassilvitskiiThursday, March 14, 13
  33. Want to compute all open triads Map: – Orient each edge to point to a node of higher degree, breaking ties arbitrarily but consistently – Given: Joe 56 Mary 78 – Output: <Key = Joe, Value = Mary> – Given: Alice 398 Bob 198 – Output: <Key = Bob, Value = Alice> map(key, value): split = value.split() if split[3] > split[1] or (split[3] == split[1] and split[0] < split[2]): emit(split[0], split[2]) if split[3] < split[1] or (split[3] == split[1] and split[0] > split[2]): emit(split[2], split[0]) 33 Sergei VassilvitskiiThursday, March 14, 13
  34. Want to compute all open triads Aggregate (Shuffle): – Collect all values with same key (nodes with higher degree) Computation: – Generate all 2-paths (friend of a friend relationships): 34 Sergei VassilvitskiiThursday, March 14, 13
  35. Want to compute all open triads Aggregate (Shuffle): – Collect all values with same key (nodes with higher degree) Computation: – Generate all 2-paths (friend of a friend relationships): – Generate all 2-paths: , , 35 Sergei VassilvitskiiThursday, March 14, 13
  36. Want to compute all open triads Aggregate (Shuffle): – Collect all values with same key (nodes with higher degree) Computation: – Generate all 2-paths (friend of a friend relationships) – Given: key= Joe, value={Mary, Justin, Alice} – Output: • (key = Joe, Value = (Mary, Justin)) • (key = Joe, Value = (Mary, Alice)) • (key = Joe, Value = (Justin, Alice)) reduce(key, values): for friend1 : values for friend2 : values emit(key, (friend1, friend2)) 36 Sergei VassilvitskiiThursday, March 14, 13
  37. Comparing Algorithms Edgelist MapOnly Algorithm: – MapOnly – Output from some nodes is quadratic Edge at a time Algroithm: – Map & Reduce – More balanced output from each node 37 Sergei VassilvitskiiThursday, March 14, 13
  38. Scoring Some suggestions are better than others: – Some people are already friends! – Or they used to be friends... – Connected through a friend with 1000s of friends – Connected through multiple friends – ... 38 Sergei VassilvitskiiThursday, March 14, 13
  39. Spring Break! 39 Sergei VassilvitskiiThursday, March 14, 13

×