Published in:
Education

No Downloads

Total Views

1,154

On Slideshare

0

From Embeds

0

Number of Embeds

2

Shares

0

Downloads

0

Comments

0

Likes

1

No embeds

No notes for slide

- Counting Fast (Part II)
  Sergei Vassilvitskii
  Columbia University
  Computational Social Science
  March 8, 2013
  Thursday, March 14, 13
- Last time
  Counting fast:
  – Quadratic time doesn’t scale
  – Sorting is slightly more than linear
  – Hashing allows you to do membership queries in constant time
- Today
  Counting on Networks:
  – Large Graphs: Internet, Facebook, Twitter
  – Recommendation Graphs: Netflix, Amazon, etc.
- Friends & Followers
  Given a network:
  – When do people become friends?
  – What factors influence this?
  Products:
  – People You May Know (PYMK): reconnect people, help new users
  – Twitter’s Who to Follow
  Recommendations:
  – Netflix, Amazon, etc. (Future lectures)
- Triadic Closure
  Likely to become friends with:
  – People in similar groups
  – Friends of friends
- Defining Tight Knit Circles
  Looking for tight-knit circles:
  – People whose friends are friends themselves
  Why?
  – Network Cohesion: Tightly knit communities foster more trust, social norms. [Coleman ’88, Portes ’88]
  – Structural Holes: Individuals benefit from bridging [Burt ’04, ’07]
- Clustering Coefficient
  [Figure: two example nodes compared, one with cc = 0.5 and one with cc = 0.1]
  Given an undirected graph:
  – For each node v, it’s the fraction of pairs of v’s neighbors who are neighbors themselves
  – Equal to the number of triangles containing v divided by the number of pairs of v’s neighbors, deg(v)(deg(v) - 1)/2
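The definition on this slide can be sketched in a few lines of Python. This is a minimal illustration, not code from the lecture: the function name, the dict-of-sets graph representation, and the toy graph are all assumptions, as is the convention that a node with fewer than two neighbors gets a coefficient of 0.

```python
from itertools import combinations


def clustering_coefficient(adj, v):
    """Fraction of pairs of v's neighbors that are themselves adjacent.

    adj is assumed to map each node to the set of its neighbors
    (undirected graph, so every edge appears in both directions).
    """
    neighbors = adj[v]
    k = len(neighbors)
    if k < 2:
        return 0.0  # assumed convention: no neighbor pairs means coefficient 0
    linked_pairs = sum(1 for u, w in combinations(neighbors, 2) if w in adj[u])
    return linked_pairs / (k * (k - 1) / 2)


# Hypothetical toy graph: v has 4 neighbors, and 3 of the 6 neighbor pairs
# (a-b, b-c, c-d) are connected.
toy = {
    "v": {"a", "b", "c", "d"},
    "a": {"v", "b"},
    "b": {"v", "a", "c"},
    "c": {"v", "b", "d"},
    "d": {"v", "c"},
}
print(clustering_coefficient(toy, "v"))
```

Each linked pair of neighbors closes exactly one triangle through v, which is why the numerator is also the number of triangles containing v.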
- How to Count Triangles
  Sequential Version:
      foreach v in V
        foreach u,w in Adjacency(v)
          if (u,w) in E
            Triangles[v]++
  [Figure: worked example on a node v; Triangles[v] starts at 0 and becomes 1 once its neighbors u and w are found to be adjacent]
  Running time:
  – For each vertex, look at all pairs of neighbors
  – Number of pairs ~ quadratic in the degree of the vertex
  – What happens if the degree is very large?
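A runnable version of the sequential pseudocode might look like the following. It is a sketch under assumptions: the graph is a dict mapping each node to a set of neighbors, and the function name and toy graph are made up for illustration.

```python
from itertools import combinations


def count_triangles_naive(adj):
    """For every node, count the pairs of its neighbors that share an edge."""
    triangles = {v: 0 for v in adj}
    for v, neighbors in adj.items():
        # All unordered pairs of neighbors: quadratic in deg(v).
        for u, w in combinations(neighbors, 2):
            if w in adj[u]:  # set lookup, i.e. the constant-time hashing check
                triangles[v] += 1
    return triangles


# Hypothetical toy graph: one triangle a-b-c, plus d hanging off c.
toy = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c"},
}
print(count_triangles_naive(toy))
```

Every triangle is discovered three times here, once from each of its corners, which is the redundancy the later slides set out to remove.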
- Parallel Version
  But use 1,000 machines!
  – Quadratic algorithms still don’t scale
  – Simple parallelization: process each vertex separately
  Naive parallelization does not help with data skew:
  – Some nodes will have very high degree
  – Example: a node with 3.2 million followers must generate about 10 trillion (10^13) potential edges to check
  – Even when generating 100M edges per second, this takes 100K seconds ~ 27 hours for one vertex!
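The arithmetic behind the skew example can be checked directly. The variable names below are illustrative; the inputs (3.2 million followers, 100M edges per second) are the ones quoted on the slide.

```python
followers = 3_200_000
rate_per_second = 100_000_000  # 100M candidate edges per second

# Squaring the degree gives the slide's "about 10 trillion" figure;
# counting each unordered pair once would give roughly half of that.
ordered_pairs = followers ** 2
unordered_pairs = followers * (followers - 1) // 2

# Using the slide's rounded figure of 10^13 candidate edges:
seconds = 10 ** 13 / rate_per_second
hours = seconds / 3600

print(ordered_pairs, unordered_pairs)
print(seconds, round(hours, 1))
```

With the rounded 10^13 figure this comes to 100,000 seconds, a little under 28 hours, in line with the estimate on the slide. Either way it is more than a day of work for a single vertex.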
- “Just 5 more minutes”
  On the LiveJournal Graph (5M nodes, 70M edges):
  – 80% of vertices are done after 5 min
  – 99% done after 35 min
- Adapting the Algorithm
  Approach 1: Dealing with skew directly
  – Currently every triangle is counted 3 times (once per vertex)
  – Running time is quadratic in the degree of the vertex
  – Idea: Count each triangle once, from the perspective of its lowest-degree vertex
  – Does this heuristic work?
- How to Count Triangles Better
  Idea [Schank ’07]:
  – Only pivot on nodes that have smaller degrees than both neighbors
  – Neighbors of high degree nodes tend to have small degrees
- How to Count Triangles Better
      foreach v in V
        foreach u in Adjacency(v) with deg(u) > deg(v):
          foreach w in Adjacency(v) with deg(w) > deg(v):
            if (u,w) is an edge:
              Triangles[v]++
              Triangles[w]++
              Triangles[u]++
  Note: visit each unordered pair {u,w} once, and break degree ties consistently (e.g. by node name), so that every triangle is found exactly once.
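One way to make the degree-ordered pseudocode runnable is sketched below. The function name, the dict-of-sets representation, and the tie-breaking rule (compare by degree, then by node name) are assumptions for illustration; any consistent tie-break would do.

```python
from itertools import combinations


def count_triangles_by_rank(adj):
    """Find each triangle once, pivoting on its lowest-ranked corner."""

    def rank(x):
        # Degree first, node name as an assumed tie-breaker.
        return (len(adj[x]), x)

    triangles = {v: 0 for v in adj}
    for v in adj:
        # Keep only neighbors ranked above v; hubs keep very few of these.
        higher = [u for u in adj[v] if rank(u) > rank(v)]
        for u, w in combinations(higher, 2):
            if w in adj[u]:
                # The triangle is seen only from v, so credit all three corners.
                for corner in (v, u, w):
                    triangles[corner] += 1
    return triangles


# Hypothetical toy graph: one triangle a-b-c, plus d hanging off c.
toy = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c"},
}
print(count_triangles_by_rank(toy))
```

The per-node counts match the naive version, but the quadratic step now runs over the higher-ranked neighbors only, rather than over all of them.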
- Does it make a difference?
- Why does it help?
  Look at two different kinds of nodes:
  – Few friends:
    • OK to be quadratic on small instances
  – Lots of friends:
    • Only care about number of friends with even more friends!
    • Cannot have too many (can make this formal)
- Break
- Working in Parallel
  MapReduce (review):
  Map:
  – Decide how to group the data for computation
  Reduce:
  – Given the grouping, perform the computation
- Building People You May Know
  Friendships are undirected:
  – If Alice knows Bob, Bob knows Alice
  – Data stored as a list of all edges
  – Find all friends of friends
  – Score the possible pairs
- Data
  Suppose you have edges and degrees of each vertex:
      Joe 56 Mary 78
      Alice 398 Bob 198
      Dan 983 Justin 11,985,234
      ...
  An alternate view may be data stored as adjacency list:
      Joe 56 Mary 78 Don 99 Bill 1
      Alice 398 Kate 55 Bob 198 Mary 78
      ...
- Previous Algorithm
  Adjacency list input.
  – Map:
    • For each node and its neighbors, output all paths through the node
  – Reduce:
    • none
  [Figure: two example map inputs of the form [node | neighbors] with their outputs; the second produces no output]
- How to Count Triangles Better
  Idea [Schank ’07]:
  – Only pivot on nodes that have smaller degrees than both neighbors
  – Neighbors of high degree nodes tend to have small degrees
- Want to compute all open triads
  Data Needed:
  – Central node
  – Neighbors that have higher degree
  – Orient each edge to point to a node of higher degree, breaking ties arbitrarily but consistently
- Want to compute all open triads
  Map:
  – Orient each edge to point to a node of higher degree, breaking ties arbitrarily but consistently
  – Given: Joe 56 Mary 78
  – Output: <Key = Joe, Value = Mary>
  – Given: Alice 398 Bob 198
  – Output: <Key = Bob, Value = Alice>
      map(key, value):
        # value is one edge: "name1 degree1 name2 degree2"
        split = value.split()
        deg1, deg2 = int(split[1]), int(split[3])  # compare degrees as numbers, not strings
        if deg2 > deg1 or (deg2 == deg1 and split[0] < split[2]):
          emit(split[0], split[2])
        if deg2 < deg1 or (deg2 == deg1 and split[0] > split[2]):
          emit(split[2], split[0])
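The map step can be tried out on the example rows from the Data slide. The sketch below is a plain Python function rather than a real MapReduce mapper: `orient_edge` is a made-up name, it returns the key/value pair instead of calling an `emit`, and it strips thousands separators because one of the sample degrees is written as 11,985,234.

```python
def orient_edge(line):
    """Turn 'name1 deg1 name2 deg2' into (lower-ranked node, higher-ranked node)."""
    name1, deg1, name2, deg2 = line.split()
    # Parse degrees as integers; comparing the raw strings would sort
    # "983" above "11,985,234".
    d1 = int(deg1.replace(",", ""))
    d2 = int(deg2.replace(",", ""))
    # Ties on degree are broken by name, arbitrarily but consistently.
    if (d1, name1) < (d2, name2):
        return (name1, name2)
    return (name2, name1)


for row in ["Joe 56 Mary 78", "Alice 398 Bob 198", "Dan 983 Justin 11,985,234"]:
    print(orient_edge(row))
```

The key is always the endpoint with the smaller degree, so after the shuffle each node receives only its higher-degree neighbors.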
- Want to compute all open triads
  Aggregate (Shuffle):
  – Collect all values with same key (nodes with higher degree)
  Computation:
  – Generate all 2-paths (friend of a friend relationships)
  – Given: key = Joe, value = {Mary, Justin, Alice}
  – Output:
    • (key = Joe, Value = (Mary, Justin))
    • (key = Joe, Value = (Mary, Alice))
    • (key = Joe, Value = (Justin, Alice))
      reduce(key, values):
        # emit each unordered pair once, skipping (friend, friend) and mirrored pairs
        for i in 0 .. len(values) - 1
          for j in i + 1 .. len(values) - 1
            emit(key, (values[i], values[j]))
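The shuffle and reduce steps can be imitated in memory to see what comes out. This is only a sketch: `shuffle` and `two_paths` are hypothetical helper names, and a real MapReduce framework would do the grouping itself.

```python
from collections import defaultdict
from itertools import combinations


def shuffle(pairs):
    """Group the mapper's (key, value) pairs by key, as the shuffle phase would."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def two_paths(key, values):
    """Reduce step: every unordered pair of higher-degree neighbors is a 2-path."""
    return [(key, pair) for pair in combinations(values, 2)]


# Oriented edges as the map step might emit them (illustrative data).
mapped = [("Joe", "Mary"), ("Bob", "Alice"), ("Joe", "Justin"), ("Joe", "Alice")]
for key, values in shuffle(mapped).items():
    print(two_paths(key, values))
```

For Joe this yields the three pairs listed on the slide. Bob has a single higher-degree neighbor and therefore produces no 2-paths.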
- Comparing Algorithms
  Edgelist MapOnly Algorithm:
  – MapOnly
  – Output from some nodes is quadratic
  Edge at a time Algorithm:
  – Map & Reduce
  – More balanced output from each node
- Scoring
  Some suggestions are better than others:
  – Some people are already friends!
  – Or they used to be friends...
  – Connected through a friend with 1000s of friends
  – Connected through multiple friends
  – ...
- Spring Break!
