Fast Graphlet Decomposition: Theory, Algorithms, and Applications

From social science to biology, graphlets have found numerous applications and serve as building blocks of network analysis. In social science, graphlet analysis (typically known as the k-subgraph census) is widely adopted in sociometric studies. Much of this work focuses on triadic tendencies as important structural features of social networks (e.g., transitivity or triadic closure) and on triadic configurations as the basis for various social network theories (e.g., social balance, strength of weak ties, stability of ties, or trust). In biology, graphlets have been widely used for protein function prediction, network alignment, and phylogeny, to name a few. More recently, there has been increased interest in the role of graphlet analysis in computer networking (e.g., web spam detection, analysis of peer-to-peer protocols and Internet AS graphs), chemoinformatics, and image segmentation, among others.

While graphlet counting and discovery have had tremendous success and impact in domains from social science to biology, a fast and efficient approach for computing the frequencies of these patterns has been lacking. The main contribution of this work is a fast, efficient, and parallel framework and a family of algorithms for counting k-node graphlets that take only a fraction of the time required by current methods. The proposed graphlet counting algorithm leverages a number of combinatorial arguments for different graphlets. For each edge, we count a few graphlets, and from these counts together with the combinatorial arguments, we obtain the exact counts of the others in constant time. Furthermore, we show a number of important machine learning tasks that rely on this approach, including graph anomaly detection, as well as using graphlets as features for improving community detection, role discovery, graph classification, and relational learning.
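
The per-edge idea can be illustrated for 3-node graphlets with a minimal sketch. It assumes a simple undirected graph whose nodes are labeled 0 to n-1 and whose edges are each listed once; the function name `count_3node_graphlets` and the output keys are illustrative and are not taken from the authors' code. Only the triangles incident to each edge are searched for; the other per-edge counts follow from the degrees in constant time.

```python
from math import comb


def count_3node_graphlets(n, edges):
    """Count all induced 3-node graphlets of a simple undirected graph.

    Assumptions (illustrative, not the authors' implementation):
    nodes are 0..n-1, each undirected edge appears once, no self-loops.
    """
    adj = {v: set() for v in range(n)}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)

    tri = star = one_edge = 0
    for u, v in edges:
        # The only search step: triangles incident to edge (u, v).
        t = len(adj[u] & adj[v])
        du, dv = len(adj[u]), len(adj[v])
        tri += t
        # Nodes adjacent to exactly one endpoint form a 2-star with the edge.
        star += (du - 1 - t) + (dv - 1 - t)
        # Nodes adjacent to neither endpoint leave the edge isolated.
        one_edge += n - (du + dv - t)

    tri //= 3    # each triangle is seen from its 3 edges
    star //= 2   # each 2-star is seen from its 2 edges
    independent = comb(n, 3) - tri - star - one_edge
    return {"triangle": tri, "2-star": star,
            "1-edge": one_edge, "independent": independent}
```

For example, `count_3node_graphlets(5, [(0, 1), (1, 2), (0, 2), (2, 3)])` describes a triangle with one pendant node and one isolated node, and the four counts sum to the number of 3-node subsets, C(5, 3) = 10.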

Published in: Data & Analytics


  1. Jennifer Neville (Purdue University), Ryan A. Rossi (PARC), Nick Duffield (Texas A&M University), Ted Willke (Intel Research Labs)
  2. Graph Mining: Social Network, Internet (AS), Biological, Political Blogs
  3. Figure: 2-node, 3-node, and 4-node graphlets, connected and disconnected. References: Network Motifs: Simple Building Blocks of Complex Networks [Milo et al., Science 2002]; The Structure and Function of Complex Networks [Newman, SIAM Review 2003]
  4. • Small k-vertex induced subgraphs
  5. • Small k-vertex induced subgraphs • Motifs: occur in real-world networks with frequencies significantly higher than in randomly generated networks
  6. • Small k-vertex induced subgraphs • Motifs: occur in real-world networks with frequencies significantly higher than in randomly generated networks • Applied to food web, genetic, neural, web, and other networks (found distinct graphlets in each case)
  7. AISTATS 2009
  8. AISTATS 2009, Bioinformatics 2006
  9. • Biological networks (network alignment, protein function prediction) • Social networks (triad analysis, community detection, exp. random models) • Computer networks • Internet AS • Cyber security (spam detection) • Ecology ...
  10. Ex: Given an input graph G: How many triangles are in G? How many 4-node cliques are in G? How many 4-node cycles are in G?
  11. Ex: Given an input graph G: How many triangles are in G? How many 4-node cliques are in G? How many 4-node cycles are in G? In practice, we would like to count all k-vertex graphlets.
  12. • Enumerate all possible graphlets
  13. • Enumerate all possible graphlets: exhaustive enumeration is too expensive
  14. • Enumerate all possible graphlets: exhaustive enumeration is too expensive • Count graphlets for each node and combine all node counts [Shervashidze et al., AISTATS 2009]
  15. • Enumerate all possible graphlets: exhaustive enumeration is too expensive • Count graphlets for each node and combine all node counts: still expensive for relatively large k [Shervashidze et al., AISTATS 2009]
  16. • Enumerate all possible graphlets: exhaustive enumeration is too expensive • Count graphlets for each node and combine all node counts: still expensive for relatively large k [Shervashidze et al., AISTATS 2009] • Other recent work counts only connected graphlets of size k=4 [Marcus & Shavitt, Computer Networks 2012]: not practical, since it scales only to small graphs with a few hundred/thousand nodes/edges, taking 2400 secs for a graph with 26K nodes
  17. Most work focused on graphlets of k=3 nodes. In this work, we focus on graphlets of k=3,4 nodes. References: Efficient Graphlet Counting for Large Networks [Ahmed et al., ICDM 2015]; Graphlet Decomposition: Framework, Algorithms, and Applications [Ahmed et al., KAIS Journal 2016 (to appear)]
  18. Searching Edge Neighborhoods: ① For each edge do. Figure: an edge (u, v) and its neighboring nodes v1, v2, v3, v4, v6, v7
  19. Searching Edge Neighborhoods: ① For each edge, count all 3-node graphlets (triangle, 2-star, 1-edge, independent) ② Merge counts from all edges. Figure: an edge (u, v) and its neighboring nodes v1, v2, v3, v4, v6, v7
  20. Searching Edge Neighborhoods: ① For each edge, count all 3-node graphlets (triangle, 2-star, 1-edge, independent) ② Merge counts from all edges • We only need to find/count triangles • Use equations to get counts of the others in O(1). Figure: an edge (u, v) and its neighboring nodes v1, v2, v3, v4, v6, v7
  21. Edge-centric, Parallel, Memory-efficient Framework
  22. How to count all 4-node graphlets? 4-clique, 4-cycle, 4-chordal-cycle, tailed-triangle, 4-path, 3-star, 4-node-triangle, 4-node-2star, 4-node-2edge, 4-node-1edge, independent
  23. Step 1: Searching edge neighborhoods. For each edge, find the triangles. Step 2: Count 4-node graphlets. For each edge, count 4-node cliques and 4-node cycles only. Step 3: Count 4-node graphlets. For each edge, use combinatorial relationships to compute the counts of the other graphlets in constant time. Step 4: Merge counts from all edges.
  24. 4-Node Graphlet Transition Diagram (± 1 edge)
  25. 4-Node Graphlet Transition Diagram (± 1 edge). Count cliques & cycles ONLY. Use relationships & transitions to count all other graphlets in constant time. Figure labels: 4-cliques; 4-cycles; maximum no. of triangles incident to an edge; maximum no. of stars incident to an edge
  26. Relationship between 4-cliques & 4-chordal-cycles. Figure: an edge e with its incident triangles T, relating the no. of 4-chordal-cycles to the no. of 4-cliques. Proof in Lemma 1, Ahmed et al., ICDM 2015
  27. Relationship between 4-cliques & 4-chordal-cycles. Figure: an edge e with its incident triangles T, relating the no. of 4-chordal-cycles to the no. of 4-cliques. Proof in Lemma 1, Ahmed et al., ICDM 2015
  28. Experiments & Results
  29. • Shared-memory implementation • Tested on graphs with over a billion edges • Largest systematic investigation, on 300+ networks: social, web, technological, biological, co-authorship, infrastructure…; Facebook 100 networks from a variety of US schools; dense graphs from the DIMACS challenge; large collections of biological and chemical graphs. Details in the paper. Data/code online.
  30. Comparison to RAGE [Marcus & Shavitt, J. Computer Networks 2011]. Figure: time in seconds, ours vs. RAGE, on the Facebook100 networks from US schools
  31. Table: |V|, |E|, and time in seconds for ours vs. RAGE. Baseline (RAGE) did not finish for most graphs. We take ~45 mins for soc-orkut (117M edges). We take ~40 secs for ca-dblp (15M edges). Most graphlet counts are on the order of 10^6 to 10^15.
  32. Table: |V|, |E|, and time in seconds for ours vs. RAGE. Baseline (RAGE) did not finish for most graphs. We take ~4.5 secs for web-google (4.3M edges). We take ~4 secs for inf-road-usa (29M edges). Most graphlet counts are on the order of 10^6 to 10^15.
  33. Strong scaling results (Intel Xeon 3.10 GHz E5-2687W server, 16 cores). Figure: speedup vs. number of processing units (1 to 16) for socfb-Texas, socfb-OR, socfb-UCLA, socfb-Berkeley13, socfb-MIT, socfb-Penn94, tech-internet-as, tech-WHOIS, web-it-2004, and web-spam
  34. Applications
  35. Collection of graphs (e.g., protein graphs). Each protein is represented by a graph. A binary label represents the function of the protein (Label 1: Enzyme, Label 0: Non-Enzyme).
  36. Collection of graphs (e.g., protein graphs) with labels Enzyme (Label 1) and Non-Enzyme (Label 0). Assume we know the labels of a few graphs. How to predict the labels of the unlabeled graphs?
  37. Figure: Graphs (protein graphs, some labeled 1 or 0, others unlabeled), Graphlet Feature Extraction, Features, Model Learning, Predict Labels of Unlabeled Graphs
  38. • D&D: 1178 protein graphs, binary labeled as enzymes vs. non-enzymes • MUTAG: 188 mutagenic compounds, binary labeled (whether or not they have a mutagenic effect on the Gram-negative bacterium) • 10-fold cross-validation, Support Vector Machine • Used 2-, 3-, and 4-node graphlets as features
  39. Previous work in machine learning & biological networks: Shervashidze et al. [AISTATS 2009]. Feature extraction time: D&D, 2 hours 45 mins; MUTAG, 4.73 secs
  40. Ranking by graphlet counts. Links are colored/weighted by stars of size 4 nodes. Nodes are colored/weighted by triangle counts. Figure labels: Leukemia, Colon cancer, Deafness
  41. • Local Graphlet Decomposition: role discovery, relational learning, multi-label classification
  42. • Unbiased Estimation of Graphlet Counts. Figure: estimation of counts of 4-vertex cliques; x/y (axis range about 0.85 to 1.15) vs. sample size (10^4 to 10^5) for soc-orkut-dir and soc-flickr
  43. • Framework & Algorithms: one of the first parallel approaches for graphlet counting; on average 460x faster than current methods; edge-centric computations (only requires access to the edge neighborhood); time- and space-efficient; sampling/estimation methods; local/global counting • Applications: large-scale graph comparison, classification, and anomaly detection; visual analytics and real-time graphlet mining
  44. Code: http://nesreenahmed.com/graphlets and https://github.com/nkahmed/PGD. Data: http://networkrepository.com. Email us for questions.
  45. Thank you! Questions? nesreen.k.ahmed@intel.com, http://nesreenahmed.com
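
The relationship between 4-cliques and 4-chordal-cycles on slides 26 and 27 lost its equation in extraction. A combinatorial argument consistent with those slides is as follows: for an edge e = (u, v) with T_e the set of nodes that close a triangle with e, every pair {w, r} drawn from T_e induces either a 4-clique (if w and r are adjacent) or a 4-chordal-cycle with e as its chord (if they are not). Hence the chordal-cycle count at e equals C(|T_e|, 2) minus the clique count at e. The sketch below applies this identity; the function name and data layout are assumptions for illustration, not the authors' PGD code, and the cited Lemma 1 may be stated in different notation.

```python
from itertools import combinations
from math import comb


def cliques_and_chordal_cycles(edges):
    """Return (number of 4-cliques, number of 4-chordal-cycles).

    Illustrative sketch: assumes a simple undirected graph with each
    edge listed once and no self-loops.
    """
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    clique_hits = 0   # sum over edges of 4-cliques containing the edge
    chordal = 0       # each chordal cycle is found once, at its chord
    for u, v in edges:
        tri = adj[u] & adj[v]                      # T_e
        # Explicit search: pairs of triangle nodes that are adjacent.
        c_e = sum(1 for w, r in combinations(tri, 2) if r in adj[w])
        clique_hits += c_e
        # Derived in constant time from |T_e| and the clique count.
        chordal += comb(len(tri), 2) - c_e

    # A 4-clique has 6 edges, so it was counted 6 times.
    return clique_hits // 6, chordal
```

On the complete graph K4 this yields one 4-clique and no chordal cycles; removing any single edge from K4 yields no 4-clique and one chordal cycle.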

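
Slide 42 reports unbiased estimation of 4-vertex clique counts, but the transcript does not describe the estimator. As a hedged illustration only, the sketch below uses one standard approach: keep each edge independently with probability p, count the 4-cliques incident to each kept edge exactly, and rescale by 1/p. This is an assumption about how such an estimator can be built, not the method used in the talk; `estimate_4cliques`, `p`, and `seed` are hypothetical names.

```python
import random
from itertools import combinations


def estimate_4cliques(edges, p, seed=None):
    """Estimate the number of 4-cliques by independent edge sampling.

    Illustrative sketch: each edge is kept with probability p, its
    incident 4-cliques are counted against the full graph, and the sum
    is scaled by 1/p (and by 1/6, since a 4-clique has 6 edges).
    """
    if not 0 < p <= 1:
        raise ValueError("p must be in (0, 1]")
    rng = random.Random(seed)

    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    total = 0
    for u, v in edges:
        if rng.random() >= p:       # edge not sampled
            continue
        tri = adj[u] & adj[v]
        total += sum(1 for w, r in combinations(tri, 2) if r in adj[w])

    # Each edge is included with probability p, so dividing by p makes
    # the expected value equal to the exact count.
    return total / (6 * p)
```

With `p=1.0` every edge is kept and the function returns the exact count as a float; smaller values of `p` trade accuracy for less work.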