
- 1. Counting Fast (Part II). Sergei Vassilvitskii, Columbia University. Computational Social Science, March 8, 2013. (Slide footer: Thursday, March 14, 13)
- 2. Last time. Counting fast:
  - Quadratic time doesn’t scale
  - Sorting is slightly more than linear
  - Hashing allows you to do membership queries in constant time
- 3. Today. Counting on Networks:
  - Large Graphs: Internet, Facebook, Twitter
  - Recommendation Graphs: Netflix, Amazon, etc.
- 7. Friends & Followers
  - Given a network:
    - When do people become friends?
    - What factors influence this?
  - Products:
    - People You May Know (PYMK): reconnect people, help new users
    - Twitter’s Who to Follow
  - Recommendations:
    - Netflix, Amazon, etc. (future lectures)
- 8. Triadic Closure. Likely to become friends with:
  - People in similar groups
  - Friends of friends
- 9. Defining Tight-Knit Circles
  - Looking for tight-knit circles:
    - People whose friends are friends themselves
  - Why?
    - Network Cohesion: tightly knit communities foster more trust and social norms. [Coleman ’88, Portes ’88]
    - Structural Holes: individuals benefit from bridging. [Burt ’04, ’07]
- 10. Clustering Coefficient [figure comparing two example nodes; image not recoverable]
- 11. Clustering Coefficient [figure: one example node with cc = 0.5 vs. another with cc = 0.1]. Given an undirected graph:
  - For each node v, cc(v) is the fraction of pairs of v’s neighbors who are neighbors themselves
  - Equivalent to counting the triangles containing v: cc(v) is that count divided by the number of pairs of v’s neighbors
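
The definition on slide 11 can be sketched in a few lines of Python. This is a minimal illustration, not code from the lecture: the toy graph, the function name `clustering_coefficient`, and the convention that a node with fewer than two neighbors gets 0 are all assumptions made here.

```python
from itertools import combinations

def clustering_coefficient(adj, v):
    """Fraction of pairs of v's neighbors that are themselves adjacent."""
    neighbors = adj[v]
    k = len(neighbors)
    if k < 2:
        return 0.0  # assumed convention: no neighbor pairs means cc = 0
    # Each adjacent pair of neighbors closes exactly one triangle through v.
    triangles = sum(1 for u, w in combinations(neighbors, 2) if w in adj[u])
    return triangles / (k * (k - 1) / 2)

# Hypothetical toy graph: v has 4 neighbors, and 3 of the 6 neighbor pairs are edges.
edges = [("v", "a"), ("v", "b"), ("v", "c"), ("v", "d"),
         ("a", "b"), ("b", "c"), ("c", "d")]
adj = {}
for x, y in edges:
    adj.setdefault(x, set()).add(y)
    adj.setdefault(y, set()).add(x)

print(clustering_coefficient(adj, "v"))  # 3 closed pairs out of 6 -> 0.5
```

The numerator is exactly the triangle count for the node, which is why the rest of the lecture concentrates on counting triangles.
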
- 12. How to Count Triangles. Sequential Version:

      foreach v in V
        foreach u,w in Adjacency(v)
          if (u,w) in E
            Triangles[v]++

  [Figure: node v, Triangles[v] = 0]
- 13. How to Count Triangles. Same sequential version as slide 12. [Figure: node v with neighbors w and u, Triangles[v] = 1]
- 15. How to Count Triangles. Sequential Version:

      foreach v in V
        foreach u,w in Adjacency(v)
          if (u,w) in E
            Triangles[v]++

  Running time:
  - For each vertex, look at all pairs of neighbors
  - Number of pairs is quadratic in the degree of the vertex
  - What happens if the degree is very large?
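
A runnable Python sketch of the sequential version, under the assumption that the graph fits in memory as a dictionary of neighbor sets. The helper `build_adjacency`, the `pairs_checked` counter, and the four-node example graph are illustrative additions, not part of the slides.

```python
from itertools import combinations

def build_adjacency(edges):
    """Undirected edge list -> dict mapping each node to its set of neighbors."""
    adj = {}
    for x, y in edges:
        adj.setdefault(x, set()).add(y)
        adj.setdefault(y, set()).add(x)
    return adj

def count_triangles_naive(adj):
    """For every vertex, test every pair of its neighbors for an edge."""
    triangles = {v: 0 for v in adj}
    pairs_checked = 0
    for v in adj:
        for u, w in combinations(adj[v], 2):   # d*(d-1)/2 pairs for degree d
            pairs_checked += 1
            if w in adj[u]:                    # constant-time hash lookup
                triangles[v] += 1
    return triangles, pairs_checked

# Hypothetical example: triangle a-b-c plus a pendant node d attached to c.
adj = build_adjacency([("a", "b"), ("b", "c"), ("a", "c"), ("c", "d")])
triangles, pairs_checked = count_triangles_naive(adj)
print(triangles)       # a, b, c are each in one triangle; d is in none
print(pairs_checked)   # 1 + 1 + 3 + 0 = 5 neighbor pairs examined
```

Every triangle is found three times here, once from each of its corners, and the work per vertex grows with the square of its degree.
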
- 16. Parallel Version
  - But use 1,000 machines!
    - Quadratic algorithms still don’t scale
    - Simple parallelization: process each vertex separately
  - Naive parallelization does not help with data skew
    - Some nodes will have very high degree
    - Example: a vertex with 3.2 million followers must generate about 10 trillion (10^13) potential edges to check
    - Even at 100M edges per second this takes 100K seconds, roughly 28 hours, for one vertex!
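
The arithmetic behind the skew example can be checked directly. The sketch below assumes the slide's 10^13 is the squared degree rounded down; the variable names are illustrative.

```python
followers = 3_200_000
rate = 100_000_000                      # potential edges generated per second

ordered_pairs = followers ** 2          # 10,240,000,000,000, i.e. about 10^13
unordered_pairs = followers * (followers - 1) // 2   # about 5 * 10^12

# Using the slide's rounded figure of 10^13 potential edges:
seconds_slide = 10 ** 13 / rate         # 100,000 seconds
hours_slide = seconds_slide / 3600

print(ordered_pairs, unordered_pairs)
print(seconds_slide, round(hours_slide, 1))   # 100000.0 seconds, 27.8 hours
```

Counting each unordered pair once halves the figure, but the order of magnitude, and the conclusion that one vertex can dominate the whole job, stays the same.
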
- 17. “Just 5 more minutes”. On the LiveJournal graph (5M nodes, 70M edges):
  - 80% of vertices are done after 5 min
  - 99% are done after 35 min
- 18. Adapting the Algorithm. Approach 1: dealing with skew directly
  - Currently every triangle is counted 3 times (once per vertex)
  - Running time is quadratic in the degree of the vertex
  - Idea: count each triangle once, from the perspective of its lowest-degree vertex
  - Does this heuristic work?
- 19. How to Count Triangles Better. Idea [Schank ’07]:
  - Only pivot on nodes that have smaller degrees than both neighbors
  - Neighbors of high-degree nodes tend to have small degrees
- 20. How to Count Triangles Better

      foreach v in V
        foreach u in Adjacency(v) with deg(u) > deg(v):
          foreach w in Adjacency(v) with deg(w) > deg(v), w after u:
            if (u,w) is an edge:
              Triangles[v]++
              Triangles[w]++
              Triangles[u]++

  Each unordered pair {u,w} is visited once, and degree ties are broken arbitrarily but consistently, so every triangle is counted exactly once.
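
One way to turn the degree-ordered pivot into runnable Python is sketched below. It assumes an in-memory dictionary of neighbor sets and breaks degree ties by node name; the `rank` tuple and the function name are choices made for this illustration, not taken from the slides.

```python
def count_triangles_by_degree(adj):
    """Count each triangle once, pivoting on its lowest-ranked vertex."""
    # Rank = (degree, node): comparing tuples breaks degree ties consistently.
    rank = {v: (len(adj[v]), v) for v in adj}
    # For each node keep only the neighbors that rank strictly higher.
    higher = {v: sorted((u for u in adj[v] if rank[u] > rank[v]), key=rank.get)
              for v in adj}
    triangles = {v: 0 for v in adj}
    for v in adj:
        hv = higher[v]
        for i, u in enumerate(hv):
            for w in hv[i + 1:]:          # each unordered pair {u, w} once
                if w in adj[u]:
                    triangles[v] += 1
                    triangles[u] += 1
                    triangles[w] += 1
    return triangles

# Hypothetical example: triangle a-b-c plus a pendant node d attached to c.
adj = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"}, "d": {"c"}}
print(count_triangles_by_degree(adj))
```

On this graph the only pivot that does any work is `a`, the lowest-ranked corner of the triangle; the high-degree node `c` has no higher-ranked neighbors and examines no pairs at all.
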
- 21. Does it make a difference?
- 22. Why does it help? Look at two different kinds of nodes:
  - Few friends:
    - OK to be quadratic on small instances
  - Lots of friends:
    - Only care about the number of friends with even more friends!
    - Cannot have too many (can make this formal)
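
One standard way to make the "cannot have too many" claim formal (an argument supplied here, not spelled out on the slide): if a node has h neighbors ranked at least as high as itself, each of those neighbors has degree at least h, so h * h is at most the sum of all degrees, 2m, and h is at most sqrt(2m) for a graph with m edges. The sketch below checks this on a hypothetical star graph and compares the number of neighbor pairs each approach examines.

```python
import math

def pairs_examined(adj):
    """Neighbor pairs examined by the naive and the degree-ordered pivots."""
    rank = {v: (len(adj[v]), v) for v in adj}
    naive = ordered = max_higher = 0
    for v in adj:
        d = len(adj[v])
        h = sum(1 for u in adj[v] if rank[u] > rank[v])  # higher-ranked neighbors
        naive += d * (d - 1) // 2
        ordered += h * (h - 1) // 2
        max_higher = max(max_higher, h)
    return naive, ordered, max_higher

# Hypothetical worst case for the naive algorithm: one hub with 1000 leaves.
leaves = [f"leaf{i}" for i in range(1000)]
adj = {"hub": set(leaves)}
for leaf in leaves:
    adj[leaf] = {"hub"}
num_edges = len(leaves)

naive, ordered, max_higher = pairs_examined(adj)
print(naive, ordered)                              # 499500 pairs vs. 0 pairs
print(max_higher, math.sqrt(2 * num_edges))        # 1 is well under sqrt(2m)
```
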
- 23. Break
- 24. Working in Parallel. MapReduce (review):
  - Map:
    - Decide how to group the data for computation
  - Reduce:
    - Given the grouping, perform the computation
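
The map, group, reduce pattern can be imitated on one machine with a few lines of Python. This is a toy simulator for intuition only, not the API of Hadoop or any real framework; `run_mapreduce`, `degree_mapper`, and `degree_reducer` are hypothetical names, and the example job (computing vertex degrees from an edge list) is chosen here for illustration.

```python
from collections import defaultdict

def run_mapreduce(records, mapper, reducer):
    """Toy single-process stand-in for map -> shuffle -> reduce."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):   # map: choose the grouping key
            groups[key].append(value)       # shuffle: collect values per key
    output = []
    for key in sorted(groups):
        output.extend(reducer(key, groups[key]))  # reduce: compute per group
    return output

def degree_mapper(edge):
    u, v = edge
    yield u, 1
    yield v, 1

def degree_reducer(node, ones):
    yield node, sum(ones)

edges = [("Joe", "Mary"), ("Alice", "Bob"), ("Alice", "Mary")]
print(run_mapreduce(edges, degree_mapper, degree_reducer))
```
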
- 25. Building People You May Know
  - Friendships are undirected: if Alice knows Bob, Bob knows Alice
  - Data stored as a list of all edges
  - Find all friends of friends
  - Score the possible pairs
- 26. Data

  Suppose you have edges and degrees of each vertex:

      Joe 56 Mary 78
      Alice 398 Bob 198
      Dan 983 Justin 11,985,234
      ...

  An alternate view may be data stored as adjacency list:

      Joe 56 Mary 78 Don 99 Bill 1
      Alice 398 Kate 55 Bob 198 Mary 78
      ...
- 27. Previous Algorithm. Adjacency list input.
  - Map:
    - For each node and its neighbors, output all paths through the node
  - Reduce:
    - None
  - [Two worked Map examples shown as figures; the records are not recoverable. The second example produces no output.]
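
A minimal sketch of the map-only approach. It assumes each adjacency record is a single line of alternating names and degrees, with the first name being the central node (an interpretation of the sample data on slide 26, not something the slides state explicitly); `map_adjacency_record` is a hypothetical name.

```python
from itertools import combinations

def map_adjacency_record(line):
    """Emit every 2-path through the central node of one adjacency record."""
    tokens = line.split()
    names = tokens[0::2]               # skip the degree that follows each name
    node, neighbors = names[0], names[1:]
    for a, b in combinations(neighbors, 2):
        yield node, (a, b)

paths = list(map_adjacency_record("Joe 56 Mary 78 Don 99 Bill 1"))
print(paths)

# A hypothetical hub with 1000 neighbors produces 1000 * 999 / 2 paths on its own.
hub_record = "hub 1000 " + " ".join(f"n{i} 1" for i in range(1000))
print(sum(1 for _ in map_adjacency_record(hub_record)))
```

No reduce step is needed because each record already contains everything required, but a single high-degree record makes one mapper emit a quadratic amount of output.
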
- 28. How to Count Triangles Better. Idea [Schank ’07]:
  - Only pivot on nodes that have smaller degrees than both neighbors
  - Neighbors of high-degree nodes tend to have small degrees
- 29. Want to compute all open triads. Data needed:
  - Central node
  - Neighbors that have higher degree
- 32. Want to compute all open triads. Data needed:
  - Central node
  - Neighbors that have higher degree
  - Orient each edge to point to a node of higher degree, breaking ties arbitrarily but consistently
- 33. Want to compute all open triads. Map:
  - Orient each edge to point to a node of higher degree, breaking ties arbitrarily but consistently
  - Given: Joe 56 Mary 78. Output: <Key = Joe, Value = Mary>
  - Given: Alice 398 Bob 198. Output: <Key = Bob, Value = Alice>

  Mapper pseudocode:

      map(key, value):
        # value is one edge record: name1 degree1 name2 degree2
        split = value.split()
        deg1, deg2 = int(split[1]), int(split[3])   # compare degrees as numbers, not strings
        if deg2 > deg1 or (deg2 == deg1 and split[0] < split[2]):
          emit(split[0], split[2])
        if deg2 < deg1 or (deg2 == deg1 and split[0] > split[2]):
          emit(split[2], split[0])
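
A runnable Python version of the orientation step, written as a plain function rather than against any particular MapReduce framework. The name `orient_edge` and the handling of thousands separators in the degree field are assumptions added here.

```python
def orient_edge(line):
    """Return (key, value) with key the lower-ranked endpoint of the edge.

    Assumed record layout: 'name1 degree1 name2 degree2'.
    """
    name_a, deg_a, name_b, deg_b = line.split()
    # Parse degrees as integers; as strings, "11,985,234" would sort below "983".
    deg_a = int(deg_a.replace(",", ""))
    deg_b = int(deg_b.replace(",", ""))
    # Ties on degree are broken by name, which is arbitrary but consistent.
    if (deg_a, name_a) < (deg_b, name_b):
        return name_a, name_b
    return name_b, name_a

print(orient_edge("Joe 56 Mary 78"))              # ('Joe', 'Mary')
print(orient_edge("Alice 398 Bob 198"))           # ('Bob', 'Alice')
print(orient_edge("Dan 983 Justin 11,985,234"))   # ('Dan', 'Justin')
```

Because the tie-break depends only on the two endpoints, the same edge is oriented the same way no matter which endpoint appears first in the record.
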
- 34. Want to compute all open triads
  - Aggregate (Shuffle):
    - Collect all values with the same key (the values are the nodes with higher degree)
  - Computation:
    - Generate all 2-paths (friend-of-a-friend relationships)
- 36. Want to compute all open triads
  - Aggregate (Shuffle):
    - Collect all values with the same key (the values are the nodes with higher degree)
  - Computation:
    - Generate all 2-paths (friend-of-a-friend relationships)
    - Given: key = Joe, value = {Mary, Justin, Alice}
    - Output:
      - (key = Joe, value = (Mary, Justin))
      - (key = Joe, value = (Mary, Alice))
      - (key = Joe, value = (Justin, Alice))

  Reducer pseudocode:

      reduce(key, values):
        # emit each unordered pair once, never pairing a friend with itself
        for i, friend1 in enumerate(values):
          for friend2 in values[i+1:]:
            emit(key, (friend1, friend2))
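
The shuffle and reduce steps can be sketched in plain Python as follows. The input is assumed to be a list of already oriented (key, value) pairs; `shuffle` and `reduce_two_paths` are hypothetical helper names, and a real job would leave the grouping to the framework.

```python
from collections import defaultdict
from itertools import combinations

def shuffle(oriented_edges):
    """Group the higher-ranked neighbors under each key, preserving input order."""
    groups = defaultdict(list)
    for key, value in oriented_edges:
        groups[key].append(value)
    return groups

def reduce_two_paths(key, values):
    """Emit each unordered pair of the key's neighbors exactly once."""
    for friend1, friend2 in combinations(values, 2):
        yield key, (friend1, friend2)

oriented = [("Joe", "Mary"), ("Joe", "Justin"), ("Bob", "Alice"), ("Joe", "Alice")]
groups = shuffle(oriented)
for path in reduce_two_paths("Joe", groups["Joe"]):
    print(path)
```

A key with a single value, such as `Bob` here, produces no output, since one neighbor cannot form a 2-path.
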
- 37. Comparing Algorithms
  - Edgelist MapOnly Algorithm:
    - MapOnly
    - Output from some nodes is quadratic
  - Edge-at-a-time Algorithm:
    - Map & Reduce
    - More balanced output from each node
- 38. Scoring. Some suggestions are better than others:
  - Some people are already friends!
  - Or they used to be friends...
  - Connected through a friend with 1000s of friends
  - Connected through multiple friends
  - ...
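
A deliberately simple scoring sketch that covers two of the points above: it drops pairs who are already friends and ranks the rest by how many friends connect them. The slides do not specify a scoring function, so `score_candidates` and the sample data are assumptions; down-weighting connections through very high-degree friends would be a natural extension but is not shown.

```python
from collections import Counter

def score_candidates(two_paths, friendships):
    """Rank non-friend pairs by the number of friends that connect them."""
    friends = {frozenset(edge) for edge in friendships}
    scores = Counter()
    for center, (a, b) in two_paths:
        pair = frozenset((a, b))
        if pair in friends:
            continue                       # already friends: nothing to suggest
        scores[tuple(sorted(pair))] += 1   # one more connecting friend
    return scores.most_common()

# Hypothetical 2-paths in the (key, (friend1, friend2)) form used on slide 36.
two_paths = [
    ("Joe", ("Mary", "Alice")),
    ("Bob", ("Mary", "Alice")),
    ("Joe", ("Mary", "Justin")),
    ("Joe", ("Justin", "Alice")),
]
friendships = [("Justin", "Alice")]
print(score_candidates(two_paths, friendships))
```
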
- 39. Spring Break!
