Vertex neighborhoods, low conductance cuts, and good seeds for local community methods

My talk from KDD2012 about vertex neighborhoods and low conductance cuts. See the paper here: http://arxiv.org/abs/1112.0031 and http://dl.acm.org/citation.cfm?id=2339628

Transcript

  1. Vertex Neighborhoods, Low Conductance Cuts, and Good Seeds for Local Community Methods. David F. Gleich (Purdue), C. Seshadhri (Sandia Livermore). KDD2012 · David Gleich · Purdue
  2. Neighborhoods are good communities
  3. Vertex neighborhoods are good conductance communities
  4. A vertex neighborhood is a $(4-4\kappa)/(3-2\kappa)$-conductance community
  5. A vertex neighborhood is a $(4-4\kappa)/(3-2\kappa)$-conductance community, where $\kappa$ is the clustering coefficient and the graph has a heavy-tailed degree distribution
  6. A vertex neighborhood is a $(4-4\kappa)/(3-2\kappa)$-conductance community, where $\kappa$ is the clustering coefficient and the graph has a heavy-tailed degree distribution
  7. A vertex neighborhood is a “good” conductance community in a graph with a heavy-tailed degree distribution and a large clustering coefficient.
  8. Our contributions: (1) The previous theorem and its proof, which shows that good communities are expected and easy to find in modern networks with heavy-tailed degrees and large clustering. (2) An empirical evaluation of neighborhood communities, which shows that vertex neighborhoods are the “backbone” of the network community profile.
  9. Formal background for the theorem: (1) vertex neighborhoods, (2) low conductance cuts, (3) clustering coefficients.
  10. Vertex neighborhoods. The set of a vertex and all its neighbors, also called an “egonet”. Prior research studied egonets of social networks from the “structural holes” perspective [Burt95, Kleinberg08]. Egonets have been used for anomaly detection [Akoglu10], as community seeds [Huang11, Schaeffer11], and for overlapping communities [Schaeffer07, Rees10].
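A minimal sketch of extracting a vertex neighborhood, assuming the graph is simple, undirected, and stored as a Python dict mapping each vertex to the set of its neighbors. The function names `egonet` and `egonet_edges` are hypothetical, chosen for illustration; they are not from the authors' code.

```python
def egonet(adj, v):
    """Vertex neighborhood N(v): the vertex v together with all of its neighbors."""
    return {v} | adj[v]


def egonet_edges(adj, v):
    """Edges with both endpoints in N(v), each reported once as (a, b) with a < b."""
    nodes = egonet(adj, v)
    return {(a, b) for a in nodes for b in adj[a] if b in nodes and a < b}


if __name__ == "__main__":
    # Toy graph (assumed): triangle 0-1-2 with a pendant vertex 3 attached to 2.
    adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
    print(sorted(egonet(adj, 2)))        # [0, 1, 2, 3]
    print(sorted(egonet_edges(adj, 0)))  # [(0, 1), (0, 2), (1, 2)]
```

The dict-of-sets representation keeps neighbor lookups and set unions cheap, which the later sketches also rely on.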
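  11. Conductance communities. Conductance is one of the most important community scores [Schaeffer07]. The conductance of a set of vertices $S$ is the ratio of edges leaving the set to total edges in the set: $\phi(S) = \frac{\mathrm{cut}(S)}{\min(\mathrm{vol}(S), \mathrm{vol}(\bar{S}))}$, where $\mathrm{cut}(S)$ counts the edges leaving the set and $\mathrm{vol}(S)$, the sum of the degrees of the vertices in $S$, counts the total edges in the set. Equivalently, for the side with the smaller volume, it’s the probability that a random edge in the set leaves the set. Small conductance ⇔ good community. Example: $\mathrm{cut}(S) = 7$, $\mathrm{vol}(S) = 33$, $\mathrm{vol}(\bar{S}) = 11$, so $\phi(S) = 7/\min(33, 11) = 7/11$.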
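To make the definition concrete, here is a small sketch that computes $\mathrm{cut}(S)$, $\mathrm{vol}(S)$, and $\phi(S)$ for a dict-of-sets graph. The helper names are hypothetical, and returning 1.0 when the denominator is zero (an empty set, or a set holding every edge) is an assumed convention, not part of the definition above.

```python
def volume(adj, S):
    """vol(S): sum of the degrees of the vertices in S."""
    return sum(len(adj[u]) for u in S)


def cut_size(adj, S):
    """cut(S): number of edges with exactly one endpoint in S."""
    return sum(1 for u in S for w in adj[u] if w not in S)


def conductance(adj, S):
    """phi(S) = cut(S) / min(vol(S), vol(complement of S)).

    Assumption: if the denominator is 0, return 1.0 (the worst possible score).
    """
    S = set(S)
    vol_s = volume(adj, S)
    vol_rest = volume(adj, adj) - vol_s  # iterating a dict yields its vertices
    denom = min(vol_s, vol_rest)
    return cut_size(adj, S) / denom if denom > 0 else 1.0


if __name__ == "__main__":
    # Toy graph (assumed): triangles {0,1,2} and {3,4,5} joined by edge 2-3.
    adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3},
           3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
    print(conductance(adj, {0, 1, 2}))  # cut = 1, vol = 7 on each side, so 1/7
```

Because vol counts degrees, every edge inside $S$ contributes 2 to $\mathrm{vol}(S)$, and conductance never exceeds 1.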
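  12. Clustering coefficients. A wedge is a path of two edges; its middle vertex is the center of the wedge, and the wedge is closed when its two endpoints are also adjacent. Global clustering coefficient $\kappa = \frac{\text{number of closed wedges}}{\text{number of wedges}}$, the probability that a random wedge is closed.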
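The global clustering coefficient can be computed by enumerating every wedge directly. This is a minimal sketch, assuming the dict-of-sets graph used above; it runs in time proportional to the number of wedges, and returning 0.0 for a graph with no wedges is an assumed convention.

```python
from itertools import combinations


def global_clustering(adj):
    """kappa = (# closed wedges) / (# wedges) for a dict-of-sets graph.

    A wedge is a pair of neighbors (a, b) of a center vertex; it is closed
    when a and b are adjacent. Each triangle yields 3 closed wedges.
    """
    wedges = closed = 0
    for nbrs in adj.values():
        for a, b in combinations(nbrs, 2):
            wedges += 1
            if b in adj[a]:
                closed += 1
    return closed / wedges if wedges else 0.0


if __name__ == "__main__":
    # Toy graph (assumed): triangle 0-1-2 plus pendant 3 gives 5 wedges, 3 closed.
    adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
    print(global_clustering(adj))  # 0.6
```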
  13. Simple version of the theorem. If the global clustering coefficient is 1, then the graph is a disjoint union of cliques, and vertex neighborhoods are optimal communities!
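The extreme case can be checked numerically. This sketch builds a disjoint union of cliques and confirms that $\kappa = 1$ and that every vertex neighborhood has conductance 0. The graph sizes and helper names are illustrative assumptions.

```python
from itertools import combinations


def disjoint_cliques(sizes):
    """Dict-of-sets graph made of disjoint cliques with the given sizes."""
    adj, start = {}, 0
    for k in sizes:
        members = range(start, start + k)
        for u in members:
            adj[u] = {w for w in members if w != u}
        start += k
    return adj


def global_clustering(adj):
    """Fraction of wedges (pairs of neighbors of a center) that are closed."""
    wedges = closed = 0
    for nbrs in adj.values():
        for a, b in combinations(nbrs, 2):
            wedges += 1
            if b in adj[a]:
                closed += 1
    return closed / wedges if wedges else 0.0


def egonet_conductance(adj, v):
    """phi(N(v)); 1.0 when the min volume is 0 (assumption of this sketch)."""
    S = {v} | adj[v]
    vol_s = sum(len(adj[u]) for u in S)
    vol_rest = sum(len(nbrs) for nbrs in adj.values()) - vol_s
    cut = sum(1 for u in S for w in adj[u] if w not in S)
    denom = min(vol_s, vol_rest)
    return cut / denom if denom > 0 else 1.0


if __name__ == "__main__":
    adj = disjoint_cliques([3, 4, 5])
    print(global_clustering(adj))                     # 1.0
    print({egonet_conductance(adj, v) for v in adj})  # {0.0}
```

Each egonet is exactly one whole clique, so no edge leaves it. With at least two cliques, the complement has positive volume and the conductance is a clean 0.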
  14. Theorem. Condition: let graph G have clustering coefficient $\kappa$ and have vertex degrees bounded by a power-law function with exponent $\gamma$ less than 3. Theorem: then there exists a vertex neighborhood with conductance $\phi \le 4(1-\kappa)/(3-2\kappa)$. [Figure: log-log plot of probability against degree, with the degree distribution bounded between $\alpha_1 n/d^{\gamma}$ and $\alpha_2 n/d^{\gamma}$.]
  15. Proof sketch. (1) Large clustering coefficient ⇒ many wedges are closed. (2) Heavy-tailed degree distribution ⇒ a few vertices have a very large degree. (3) Large degree $d$ ⇒ $d(d-1)/2 = O(d^2)$ wedges centered at the vertex ⇒ the high-degree vertices center “most” of the wedges. Thus, there must exist a vertex whose neighborhood has a high edge density ⇒ “good” conductance. Use the probabilistic method to formalize. [Figure: CDF of the number of wedges vs. degree, degree on a log scale from $10^0$ to $10^4$.]
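Step (3) can be illustrated from a degree sequence alone. This sketch computes the cumulative share of wedges centered at vertices of degree at most $d$, which is the quantity in the CDF plot. The toy degree sequence is an assumption chosen to exaggerate the effect.

```python
from collections import Counter


def wedge_cdf(degrees):
    """Cumulative share of all wedges centered at vertices of degree <= d.

    A vertex of degree d centers d*(d-1)/2 wedges, so in a heavy-tailed
    graph the curve is dominated by the few highest-degree vertices.
    Returns (d, fraction) pairs sorted by degree, or [] if there are no wedges.
    """
    by_degree = Counter()
    for d in degrees:
        by_degree[d] += d * (d - 1) // 2
    total = sum(by_degree.values())
    if total == 0:
        return []
    cdf, running = [], 0
    for d in sorted(by_degree):
        running += by_degree[d]
        cdf.append((d, running / total))
    return cdf


if __name__ == "__main__":
    # Toy degree sequence (assumed): one hub of degree 1000, 1000 vertices of degree 2.
    for d, frac in wedge_cdf([1000] + [2] * 1000):
        print(d, round(frac, 4))
```

In this example the single hub centers 499500 of the 500500 wedges, about 99.8%, even though it is one vertex out of 1001.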
  16. Confession! The theory is weak: the bound is $\phi(S) \le 4(1-\kappa)/(3-2\kappa)$, and it is useless unless $\kappa > 1/2$ (at $\kappa = 1/2$ it equals 1, the largest possible conductance). Typical ranges: collaboration networks $\kappa \sim [0.1, 0.5]$; social networks $\kappa \sim [0.05, 0.1]$; tech. networks $\kappa \sim [0.005, 0.05]$.

      Graph               Verts     Edges  Avg. Deg.  Max Deg.      κ      C̄
      ca-AstroPh          17903    196972       22.0       504  0.318  0.633
      email-Enron         33696    180811       10.7      1383  0.085  0.509
      cond-mat-2005       36458    171735        9.4       278  0.243  0.657
      arxiv               86376    517563       12.0      1253  0.560  0.678
      dblp               226413    716460        6.3       238  0.383  0.635
      hollywood-2009    1069126  56306653      105.3     11467  0.310  0.766
      fb-Penn94           41536   1362220       65.6      4410  0.098  0.212
      fb-A-oneyear      1138557   4404989        7.7       695  0.038  0.060
      fb-A              3097165  23667394       15.3      4915  0.048  0.097
      soc-LiveJournal1  4843953  42845684       17.7     20333  0.118  0.274
      oregon2-010526      11461     32730        5.7      2432  0.037  0.352
      p2p-Gnutella25      22663     54693        4.8        66  0.005  0.005
      as-22july06         22963     48436        4.2      2390  0.011  0.230
      itdk0304           190914    607610        6.4      1071  0.061  0.158

      (κ: global clustering coefficient; C̄: average local clustering coefficient.)
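The claim that the bound is only informative for $\kappa > 1/2$ follows from $4 - 4\kappa < 3 - 2\kappa \iff \kappa > 1/2$. This sketch simply re-evaluates the theorem's formula on the table's $\kappa$ column; it computes no new conductances.

```python
def neighborhood_bound(kappa):
    """The theorem's upper bound on the best vertex-neighborhood conductance."""
    return 4 * (1 - kappa) / (3 - 2 * kappa)


# Global clustering coefficients (kappa column) copied from the table above.
KAPPA = {
    "ca-AstroPh": 0.318, "email-Enron": 0.085, "cond-mat-2005": 0.243,
    "arxiv": 0.560, "dblp": 0.383, "hollywood-2009": 0.310,
    "fb-Penn94": 0.098, "fb-A-oneyear": 0.038, "fb-A": 0.048,
    "soc-LiveJournal1": 0.118, "oregon2-010526": 0.037,
    "p2p-Gnutella25": 0.005, "as-22july06": 0.011, "itdk0304": 0.061,
}

# Conductance never exceeds 1, so the bound says something only when it is < 1.
informative = {g: round(neighborhood_bound(k), 3)
               for g, k in KAPPA.items() if neighborhood_bound(k) < 1}

if __name__ == "__main__":
    print(neighborhood_bound(0.5))  # 1.0, the break-even point
    print(informative)              # only arxiv (kappa = 0.560) qualifies
```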
  17. We view this theory as “intuition for the truth”.
  18. Empirical evaluation using network community profiles. The network community profile (NCP) plots, for each community size, the minimum conductance of any community of that size. [Figure: NCPs of fb-A-oneyear and soc-LiveJournal1; x-axis community size, y-axis minimum conductance, both on log scales.] The canonical shape was found by Leskovec, Lang, Dasgupta, and Mahoney, and it holds for a variety of approximations to conductance.
  19. Empirical evaluation using network community profiles. [Figure: NCPs of fb-A-oneyear and soc-LiveJournal1; x-axis community size (degree + 1), y-axis minimum conductance for any neighborhood of the given size.] The “egonet community profile” shows the same shape and takes 3 seconds to compute (1.1M vertices, 4M edges). The Fiedler community computed from the normalized Laplacian is a neighborhood! Facebook data from Wilson et al. 2009.
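A minimal sketch of the egonet community profile, assuming a dict-of-sets graph. For each neighborhood size $|N(v)| = \deg(v) + 1$ it keeps the smallest conductance seen. Skipping egonets whose conductance is undefined is an assumption of this sketch, and it makes no claim to match the slide's 3-second timing.

```python
def egonet_profile(adj):
    """For each size |N(v)| = deg(v) + 1, the minimum conductance over all
    vertex neighborhoods of that size.

    Egonets with min(vol(S), vol(complement)) == 0 are skipped (assumption).
    """
    total_vol = sum(len(nbrs) for nbrs in adj.values())
    best = {}
    for v, nbrs in adj.items():
        S = {v} | nbrs
        vol_s = sum(len(adj[u]) for u in S)
        denom = min(vol_s, total_vol - vol_s)
        if denom == 0:
            continue
        cut = sum(1 for u in S for w in adj[u] if w not in S)
        size = len(S)  # equals deg(v) + 1 in a simple graph
        best[size] = min(best.get(size, float("inf")), cut / denom)
    return dict(sorted(best.items()))


if __name__ == "__main__":
    # Toy graph (assumed): triangles {0,1,2} and {3,4,5} joined by edge 2-3.
    adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3},
           3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
    for size, phi in egonet_profile(adj).items():
        print(size, round(phi, 4))
```

Plotting the returned pairs on log-log axes gives the kind of curve shown on the slide. It needs only one pass over the vertices, versus many community-finder runs for a full NCP.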
  20. Not just one graph. [Figure: neighborhood NCPs for arXiv (86k vertices, 500k edges) and soc-LiveJournal (5M vertices, 42M edges).] 15 more graphs available at www.cs.purdue.edu/~dgleich/codes/neighborhoods
  21. Filling in the network community profile. [Figure: minimum conductance for any neighborhood of the given size vs. community size (degree + 1).] We are missing a region of the NCP when we just look at neighborhoods.
  22. Personalized PageRank communities [Andersen06]. To find the canonical NCP structure, Leskovec et al. used a personalized PageRank-based community finder. These methods start with a single seed vertex and then expand the community based on the solution of a personalized PageRank problem. The resulting community satisfies a local Cheeger inequality. This needs to run thousands of times for an NCP.
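The PPR-based finder can be sketched in two steps: an approximate personalized PageRank vector computed by a push procedure in the style of [Andersen06], then a sweep over vertices sorted by $p(u)/\deg(u)$ that keeps the prefix with minimum conductance. This is not the code used by Leskovec et al. or by the authors. The parameter values `alpha=0.15` and `eps=1e-4` are arbitrary assumptions, and the graph must have no isolated vertices.

```python
from collections import deque


def ppr_push(adj, seed, alpha=0.15, eps=1e-4):
    """Approximate personalized PageRank from `seed` via lazy-walk pushes.

    Assumes every vertex has at least one neighbor. Returns a sparse dict p.
    """
    p, r = {}, {seed: 1.0}
    queue = deque([seed])
    while queue:
        u = queue.popleft()
        du = len(adj[u])
        ru = r.get(u, 0.0)
        if ru < eps * du:
            continue                       # residual already below threshold
        p[u] = p.get(u, 0.0) + alpha * ru  # keep an alpha share at u
        r[u] = (1 - alpha) * ru / 2        # lazy walk: half of the rest stays
        share = (1 - alpha) * ru / (2 * du)
        for w in adj[u]:                   # other half is spread over neighbors
            old = r.get(w, 0.0)
            r[w] = old + share
            if old < eps * len(adj[w]) <= r[w]:
                queue.append(w)            # w just crossed the push threshold
        if r[u] >= eps * du:
            queue.append(u)
    return p


def sweep_cut(adj, p):
    """Sort by p[u]/deg(u); return the minimum-conductance prefix and its score."""
    total_vol = sum(len(nbrs) for nbrs in adj.values())
    order = sorted(p, key=lambda u: p[u] / len(adj[u]), reverse=True)
    in_set, vol_s, cut = set(), 0, 0
    best_phi, best_k = float("inf"), 0
    for k, u in enumerate(order, start=1):
        inside = sum(1 for w in adj[u] if w in in_set)
        cut += len(adj[u]) - 2 * inside    # edges into the set stop being cut
        vol_s += len(adj[u])
        in_set.add(u)
        denom = min(vol_s, total_vol - vol_s)
        if denom > 0 and cut / denom < best_phi:
            best_phi, best_k = cut / denom, k
    return set(order[:best_k]), best_phi


if __name__ == "__main__":
    # Toy graph (assumed): triangles {0,1,2} and {3,4,5} joined by edge 2-3.
    adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3},
           3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
    print(sweep_cut(adj, ppr_push(adj, seed=0)))
```

On the toy graph the sweep should recover the seed's triangle with conductance 1/7. The push loop only touches vertices near the seed, which is what makes running it thousands of times feasible.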
  23. Filling in the network community profile. [Figure: minimum conductance for any community of the given size vs. community size.] 7807 seconds. This region fills in when using the PPR method (like now!).
  24. Vertex Neighborhoods, Low Conductance Cuts, and Good Seeds for Local Community Methods
  25. Am I a good seed? Locally minimal communities: “My conductance is the best locally.” That is, $\phi(N(v)) \le \phi(N(w))$ for all $w$ adjacent to $v$. In Zachary’s Karate Club network, there are four locally minimal communities: the two leaders and two peripheral nodes.
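A minimal sketch of finding locally minimal communities, assuming a dict-of-sets graph. It computes every egonet conductance once, then keeps each vertex whose value is no larger than any neighbor's (ties allowed, matching the $\le$ above). The helper names and the 1.0 convention for undefined conductance are assumptions.

```python
def egonet_conductance(adj, v, total_vol):
    """phi(N(v)); returns 1.0 (worst score) if the denominator is 0 (assumption)."""
    S = {v} | adj[v]
    vol_s = sum(len(adj[u]) for u in S)
    denom = min(vol_s, total_vol - vol_s)
    cut = sum(1 for u in S for w in adj[u] if w not in S)
    return cut / denom if denom > 0 else 1.0


def locally_minimal_seeds(adj):
    """Vertices v with phi(N(v)) <= phi(N(w)) for every neighbor w of v."""
    total_vol = sum(len(nbrs) for nbrs in adj.values())
    phi = {v: egonet_conductance(adj, v, total_vol) for v in adj}
    return [v for v in adj if all(phi[v] <= phi[w] for w in adj[v])]


if __name__ == "__main__":
    # Toy graph (assumed): triangles {0,1,2} and {3,4,5} joined by edge 2-3.
    adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3},
           3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
    print(locally_minimal_seeds(adj))  # bridge endpoints 2 and 3 are excluded
```

To compare against the Karate Club example, one could convert `networkx.karate_club_graph()` with `{v: set(G[v]) for v in G}`. The exact count may depend on which version of the network is used.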
  26. Locally minimal communities capture extremal neighborhoods. [Figure: egonet community profiles of fb-A-oneyear and soc-LiveJournal1.] Red dots are the conductance and size of a locally minimal community; these communities usually number about 1% of the vertices. The red circles (the best local minima) find the extremes in the egonet profile.
  27. Filling in the NCP: growing locally minimal communities. [Figure: full NCP, locally minimal NCP, and original egonet profile vs. community size.] Growing only locally minimal communities takes 283 seconds vs. 7807 seconds for the full NCP.
  28. Filling in the NCP: growing locally minimal communities. arXiv (86k vertices, 500k edges). [Figure: full NCP, locally minimal NCP, and original egonet profile vs. community size.] Growing only locally minimal communities takes 143 seconds vs. 2211 seconds for the full NCP.
  29. Recap. A theorem relating clustering, heavy-tailed degrees, and low-conductance cuts of vertex neighborhoods. An empirical evaluation of vertex neighborhoods. More on k-cores in the paper. ⇒ Many communities are easy to find! ⇒ Explains the success of community detection? Acknowledgements: David supported by NSF CAREER award 1149756-CCF. Sesh supported by the Sandia LDRD program (project 158477) and the applied mathematics program at the Dept. of Energy. Code and results available online: www.cs.purdue.edu/~dgleich/codes/neighborhoods
  30. Two words on computing. Egonet conductances can be computed by just counting the triangles at each node, with linear complexity in |E| in a power-law graph. It’s possible to do this in MapReduce too.
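The triangle trick works because the edges inside $N(v)$ are the $d_v$ edges at $v$ plus one edge per triangle at $v$, so $\mathrm{cut}(N(v)) = \mathrm{vol}(N(v)) - 2(d_v + t_v)$ with $\mathrm{vol}(N(v)) = d_v + \sum_{u \sim v} d_u$. The sketch below assumes a dict-of-sets graph and uses a plain set-intersection triangle count, not the linear-time or MapReduce counting the slide refers to. The 1.0 convention for whole-graph egonets is an assumption.

```python
def triangles_per_vertex(adj):
    """t[v] = number of triangles containing v (simple set-intersection count).

    Each triangle {v, u, w} is seen twice from v (via u and via w), hence // 2.
    """
    return {v: sum(len(adj[u] & adj[v]) for u in adj[v]) // 2 for v in adj}


def egonet_conductances(adj):
    """phi(N(v)) for every v using only degrees and triangle counts."""
    deg = {v: len(nbrs) for v, nbrs in adj.items()}
    total_vol = sum(deg.values())
    tri = triangles_per_vertex(adj)
    phi = {}
    for v in adj:
        vol = deg[v] + sum(deg[u] for u in adj[v])
        cut = vol - 2 * (deg[v] + tri[v])  # internal edges: d_v + t_v
        denom = min(vol, total_vol - vol)
        phi[v] = cut / denom if denom > 0 else 1.0  # assumption for whole-graph egonets
    return phi


if __name__ == "__main__":
    # Toy graph (assumed): triangles {0,1,2} and {3,4,5} joined by edge 2-3.
    adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3},
           3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
    print(egonet_conductances(adj))
```

Once degrees and per-vertex triangle counts are known, each egonet's conductance costs one pass over the vertex's neighbor degrees, so no egonet has to be materialized.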
  31. Filling in the NCP: growing k-cores. arXiv (86k vertices, 500k edges). [Figure: full NCP, locally minimal NCP, original egonets, PPR-grown k-cores, and k-cores vs. community size.] Growing only locally minimal communities and k-cores takes 143 seconds vs. 2211 seconds for the full NCP.
  32. Clustering coefficients. Global clustering coefficient $\kappa = \frac{\text{number of closed wedges}}{\text{number of wedges}}$, the probability that a random wedge is closed. Local clustering coefficient $C_v = \frac{\text{number of closed wedges centered at } v}{\text{number of wedges centered at } v}$.
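The local coefficient can be sketched the same way, assuming a dict-of-sets graph: the closed wedges centered at $v$ are exactly the triangles containing $v$. Defining $C_v = 0$ for vertices of degree below 2 is an assumed convention.

```python
def local_clustering(adj, v):
    """C_v = closed wedges centered at v / wedges centered at v (0.0 if deg < 2)."""
    d = len(adj[v])
    if d < 2:
        return 0.0
    closed = sum(len(adj[u] & adj[v]) for u in adj[v]) // 2  # triangles at v
    return closed / (d * (d - 1) / 2)


def average_local_clustering(adj):
    """Mean of C_v over all vertices."""
    return sum(local_clustering(adj, v) for v in adj) / len(adj)


if __name__ == "__main__":
    # Toy graph (assumed): triangle 0-1-2 plus pendant 3.
    adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
    print([round(local_clustering(adj, v), 3) for v in adj])
    print(round(average_local_clustering(adj), 4))
```

On this toy graph the average local coefficient is 7/12 (about 0.583), while the global coefficient is 3/5. Averaging per vertex and counting wedges globally weight vertices differently, which is why the table's two clustering columns can disagree sharply.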
