Bump Hunting in the Dark - ICDE15 presentation

Presentation of our work "Local discrepancy maximization on graphs" at the IEEE International Conference on Data Engineering (ICDE) 2015 in Seoul, Korea.

  1. Bump Hunting in the Dark: Local Discrepancy Maximization on Graphs. Aristides Gionis, Michael Mathioudakis, Antti Ukkonen. April 16th, 2015, Seoul, Korea. ICDE 2015.
  2. Find the Bump! Here it is! Undirected graph (computer network, online social network, etc.) with regular nodes and special nodes (failure reported / virus detected! posted message about topic X).
  3. Input: graph, special & non-special nodes. Find the connected subgraph with max linear discrepancy: score = α × #special - #non-special. Examples for α = 1: score = 18 - 3 = 15; score = 1 - 0 = 1; score = 0 - 1 = -1.
  4. Input: graph, special & non-special nodes. Find the connected subgraph with max linear discrepancy: score = α × #special - #non-special. For a fixed fraction of special nodes: subgraph size ↑, score ↑. For a fixed subgraph size: fraction of special nodes ↑, score ↑.
  5. Local access: special nodes are provided as input; build the graph via the get-neighbors function. How much of the graph do we need? (Infinite graph?)
  6. Approach. First, retrieve a minimal part of the graph (necessary when we have only local access): use the get-neighbors function to expand around the special nodes. Then, solve the problem on the retrieved subgraph.
  7. In What Follows: Unrestricted Access (Complexity; Connection to Steiner Trees; Special Case: Graph is Tree; Algorithms); Local Access (Algorithms to retrieve (part of) the graph); Experiments; Future Work.
  8. Unrestricted Access: the problem is NP-hard (reduction from SETCOVER).
  9. Unrestricted Access: the problem is a special case of PRIZECOLLECTINGSTEINERTREE (PCST).
     Input: graph; positive weight on terminal nodes; non-negative weight on edges.
     Output: tree minimizing Σ(missed terminal node weights) + Σ(edge weights).
     Mapping: terminal nodes = special nodes, weight = α+1; edges: weight = 1.
     Max discrepancy is an instance of min PCST weight.
     Goemans-Williamson algorithm: O(|V||E|) & O(|V|^2 log|V|), (2 - 1/(n - 1))-approximation.
     The min-approximation does not translate to our max-problem.
  10. In What Follows: Unrestricted Access (Complexity; Connection to Steiner Trees; Special Case: Graph is Tree; Algorithms); Local Access (Algorithms to retrieve (part of) the graph); Experiments; Future Work.
  11. Special Case: graph is a tree. Optimal linear recursive algorithm. The optimal in the graph is either the optimal with root (built from the optimal with root of each subtree) or the optimal without root (in a subtree).
  12. Algorithms: idea for general-case heuristics. Generate a spanning tree for the graph, then solve the problem on the spanning tree.
      Spanning trees: BFS-trees from each special node; min weight spanning tree with random weights, w(u,v) in [0,1]; min weight spanning tree with 'smart' weights, w(u,v) = 2 - 1{u is special} - 1{v is special}.
      Alternative: GW for PCST.
  13. In What Follows: Unrestricted Access (Complexity; Connection to Steiner Trees; Special Case: Graph is Tree; Algorithms); Local Access (Algorithms to retrieve (part of) the graph); Experiments; Future Work.
  14. Local Access: start with special nodes; call get-neighbors(node) to expand. Retrieve the entire graph? Expansion strategies: full expansion (return a subgraph that contains the optimal solution); oblivious expansion; adaptive expansion.
  15. Full Expansion: retrieve a graph that contains the optimal solution with calls to get-neighbors(); maintain a frontier to expand in each iteration.
      Frontier condition: unexpanded nodes for which the min distance from a reachable special node is less than (α+1) × #reachable_special_nodes.
      Initially:
        1. frontier = special nodes
      Loop:
        2. call get-neighbors() on frontier nodes
        3. update frontier
        4. go to 2 if frontier not empty
  16. Full Expansion: expensive in practice; we look for heuristics.
  17. Oblivious Expansion: expand (α+1) times around each special node; not guaranteed to cover the optimal solution. (Figure, α = 1: special nodes a and b; the region retrieved by ObliviousExpansion; the region additionally retrieved by Full Expansion; the optimal solution.)
  18. Adaptive Expansion. Idea: while expanding, estimate heuristically how close we are to the optimal. For each connected component of the retrieved graph, solve the problem on spanning trees rooted at frontier nodes; obtain solutions with and w/o roots. Maintain a heuristic upper bound of the optimal solution in the entire graph and a lower bound of the optimal solution from the retrieved graph; terminate when upper bound ≅ lower bound. (Figure labels: connected components of retrieved graph; tree of positive discrepancy; tree of negative discrepancy.)
  19. In What Follows: Unrestricted Access (Complexity; Connection to Steiner Trees; Special Case: Graph is Tree; Algorithms); Local Access (Algorithms to retrieve (part of) the graph); Experiments; Future Work.
  20. Datasets. All graphs used in the experiments are undirected and their sizes are reported in Table I. Slide labels: synthetic (Geo, BA, Grid) and real (Livejournal, Patents, Pokec).
      Table I. DATASET STATISTICS (NUMBERS ARE ROUNDED).
      Dataset      |V|         |E|
      Geo          1 · 10^6    4 · 10^6
      BA           1 · 10^6    10 · 10^6
      Grid         4 · 10^6    8 · 10^6
      Livejournal  4.3 · 10^6  69 · 10^6
      Patents      2 · 10^6    16.5 · 10^6
      Pokec        1.4 · 10^6  30.6 · 10^6
  21. Input: graphs from previous datasets. Special nodes: planted spheres; k spheres of radius ρ; s special nodes inside spheres; z special nodes outside spheres.
  22. Algorithms.
      Expansion: Full (...not!), Oblivious, Adaptive.
      Optimization: BFS, random-ST, smart-ST, PCST.
      Measures: cost (get-neighbors calls), graph size, running time vs graph size, accuracy (Jaccard coeff), discrepancy.
  23. Expansion. Table II shows the cost (number of API calls) as well as the size (number of edges) of the retrieved graph G_X.
      Table II. EXPANSION TABLE (AVERAGES OF 20 RUNS)
                           ObliviousExp.     AdaptiveExp.
      dataset      s   k   cost    size      cost    size
      Grid         20  2   302     888       2783    7950
      Grid         60  1   261     784       534     1604
      Geo          20  2   452     2578      4833    30883
      Geo          60  1   418     2452      578     3991
      BA           20  2   3943    243227    114     6032
      BA           60  1   4477    271870    135     7407
      Patents      20  2   605     3076      13436   25544
      Patents      60  1   620     3126      5907    13009
      Pokec        20  2   3884    217592    161     7249
      Pokec        60  1   4343    240544    116     5146
      Livejournal  20  2   3703    348933    234     13540
      Livejournal  60  1   4667    394023    129     7087
  24. Optimization. Figure 6. Running times of the different algorithms (BFS-Tree, Random-ST, PCST-Tree, Smart-ST) as a function of expansion size (number of edges). Axes: running time (in sec) against expansion size (#edges). We can see that in comparison to PCST-Tree, Smart-ST scales to inputs that are up to two orders of magnitude larger.
  25. Optimization.
      Table III. ACCURACY, AVERAGES OF 20 RUNS
                           ObliviousExpansion                          AdaptiveExpansion
      dataset      s   k   BFS-Tree   Random-ST  PCST-Tree  Smart-ST   BFS-Tree   Random-ST  PCST-Tree  Smart-ST
      Grid         20  2   0.88 (0)   0.81 (0)   0.93 (0)   0.93 (0)   0.88 (0)   0.85 (0)   0.93 (0)   0.93 (0)
      Grid         60  1   1.00 (0)   0.94 (0)   1.00 (0)   1.00 (0)   0.99 (0)   0.98 (0)   1.00 (0)   1.00 (0)
      Geo          20  2   1.00 (0)   0.95 (0)   1.00 (0)   1.00 (0)   1.00 (0)   0.98 (0)   1.00 (0)   1.00 (0)
      Geo          60  1   1.00 (0)   0.96 (0)   1.00 (0)   1.00 (0)   0.99 (0)   0.98 (0)   0.99 (0)   0.99 (0)
      BA           20  2   0.47 (12)  0.18 (12)  NaN (20)   0.46 (0)   0.46 (0)   0.44 (0)   0.46 (0)   0.45 (0)
      BA           60  1   NaN (20)   NaN (20)   NaN (20)   0.77 (0)   0.76 (0)   0.76 (0)   0.77 (3)   0.76 (0)
      Patents      20  2   0.92 (0)   0.86 (0)   0.91 (0)   0.90 (0)   0.72 (0)   0.74 (0)   0.77 (3)   0.74 (0)
      Patents      60  1   0.89 (0)   0.76 (0)   0.89 (0)   0.89 (0)   0.74 (0)   0.73 (0)   0.74 (0)   0.74 (0)
      Pokec        20  2   0.53 (2)   0.13 (3)   NaN (20)   0.46 (0)   0.43 (0)   0.41 (0)   0.42 (2)   0.40 (0)
      Pokec        60  1   0.74 (6)   0.09 (6)   NaN (20)   0.61 (0)   0.48 (0)   0.46 (0)   0.45 (1)   0.45 (0)
      Livejournal  20  2   0.62 (5)   0.19 (5)   NaN (20)   0.54 (0)   0.56 (0)   0.53 (0)   0.58 (5)   0.56 (0)
      Livejournal  60  1   0.88 (12)  0.26 (9)   NaN (20)   0.68 (0)   0.65 (0)   0.62 (0)   0.62 (1)   0.62 (0)

      Table IV. DISCREPANCY, AVERAGES OF 20 RUNS
                           ObliviousExpansion                          AdaptiveExpansion
      dataset      s   k   BFS-Tree   Random-ST  PCST-Tree  Smart-ST   BFS-Tree   Random-ST  PCST-Tree  Smart-ST
      Grid         20  2   14.5 (0)   11.8 (0)   16.8 (0)   16.7 (0)   14.8 (0)   13.8 (0)   16.4 (0)   16.3 (0)
      Grid         60  1   41.0 (0)   36.9 (0)   41.0 (0)   41.0 (0)   40.5 (0)   38.9 (0)   40.9 (0)   40.9 (0)
      Geo          20  2   19.9 (0)   18.4 (0)   20.0 (0)   20.0 (0)   19.9 (0)   19.2 (0)   20.0 (0)   20.0 (0)
      Geo          60  1   22.0 (0)   20.6 (0)   22.0 (0)   22.0 (0)   21.8 (0)   21.6 (0)   21.8 (0)   21.8 (0)
      BA           20  2   15.0 (12)  2.8 (12)   NaN (20)   15.2 (0)   15.6 (0)   14.4 (0)   14.4 (0)   15.0 (0)
      BA           60  1   NaN (20)   NaN (20)   NaN (20)   36.1 (0)   37.4 (0)   35.3 (0)   35.9 (3)   35.5 (0)
      Patents      20  2   17.4 (0)   15.8 (0)   17.7 (0)   17.6 (0)   14.9 (0)   13.8 (0)   15.8 (3)   14.8 (0)
      Patents      60  1   40.0 (0)   31.1 (0)   40.8 (0)   40.6 (0)   33.0 (0)   32.2 (0)   33.2 (0)   33.3 (0)
      Pokec        20  2   11.6 (2)   2.6 (3)    NaN (20)   11.8 (0)   8.6 (0)    8.0 (0)    8.2 (2)    8.0 (0)
      Pokec        60  1   36.6 (6)   4.7 (6)    NaN (20)   28.6 (0)   20.9 (0)   17.4 (0)   18.3 (1)   18.5 (0)
      Livejournal  20  2   14.3 (5)   3.5 (5)    NaN (20)   13.8 (0)   11.8 (0)   9.8 (0)    10.8 (5)   10.2 (0)
      Livejournal  60  1   45.6 (12)  12.0 (9)   NaN (20)   31.1 (0)   29.8 (0)   25.6 (0)   26.8 (1)   27.4 (0)
  26. In What Follows: Unrestricted Access (Complexity; Connection to Steiner Trees; Special Case: Graph is Tree; Algorithms); Local Access (Algorithms to retrieve (part of) the graph); Experiments; Future Work.
  27. Future Work: Approximation Guarantee / Tighter expansions; Distributed Setting; Unknown Scale; Application on Real Data (TKDE Extension).
  28. The End
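
The recursion on slide 11 (graph is a tree) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `discrepancy` and `max_discrepancy_subtree`, the edge-list input, and the arbitrary choice of root are all hypothetical. Each node contributes α if it is special and -1 otherwise; the best subtree containing a given node keeps only those child subtrees whose value is positive, and the overall optimum is the best value over all nodes, which covers both the "with root" and "without root" cases.

```python
def discrepancy(nodes, special, alpha=1.0):
    """Linear discrepancy from slide 3: alpha * #special - #non-special."""
    n_special = sum(1 for v in nodes if v in special)
    return alpha * n_special - (len(nodes) - n_special)


def max_discrepancy_subtree(edges, special, alpha=1.0):
    """Return (score, node set) of a max-discrepancy connected subtree.

    Assumes `edges` describes a single tree with at least one edge
    (hypothetical input format chosen for this sketch).
    """
    adj = {}
    for u, v in edges:
        adj.setdefault(u, []).append(v)
        adj.setdefault(v, []).append(u)

    # Root the tree at an arbitrary node and record a DFS pre-order.
    root = next(iter(adj))
    parent = {root: None}
    order, stack = [], [root]
    while stack:
        u = stack.pop()
        order.append(u)
        for w in adj[u]:
            if w not in parent:
                parent[w] = u
                stack.append(w)

    # with_root[u]: best score of a connected subtree that contains u
    # and otherwise lies inside the subtree rooted at u.
    with_root = {}
    for u in reversed(order):  # children are processed before parents
        gain = alpha if u in special else -1.0
        for w in adj[u]:
            if parent.get(w) == u:
                gain += max(0.0, with_root[w])
        with_root[u] = gain

    # The optimum is rooted at whichever node has the largest value.
    best = max(with_root, key=with_root.get)
    chosen, stack = set(), [best]
    while stack:
        u = stack.pop()
        chosen.add(u)
        stack.extend(w for w in adj[u]
                     if parent.get(w) == u and with_root[w] > 0)
    return with_root[best], chosen
```

Each node and edge is visited a constant number of times, so the sketch runs in time linear in the size of the tree, in line with the "linear" claim on the slide. The same routine could be applied to any spanning tree produced by the heuristics on slide 12.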

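Oblivious expansion (slide 17) can be sketched as a breadth-first retrieval driven by the get-neighbors function. This is a hedged sketch rather than the authors' code: `oblivious_expansion`, the callable `get_neighbors`, and the returned `(edges, cost)` pair are illustrative names, α is assumed to be a non-negative integer, and "expand (α+1) times" is interpreted here as α+1 rounds of calling get-neighbors on the current frontier, starting from the special nodes.

```python
def oblivious_expansion(special, get_neighbors, alpha=1):
    """Retrieve the graph around the special nodes in alpha + 1 rounds.

    `get_neighbors(node)` is assumed to return an iterable of neighbours.
    Returns the retrieved undirected edges and the number of
    get_neighbors calls (the "cost" measure on slide 22).
    """
    expanded = set()          # nodes already passed to get_neighbors
    edges = set()             # retrieved edges, stored as frozensets
    frontier = set(special)
    cost = 0
    for _ in range(alpha + 1):
        next_frontier = set()
        for u in frontier:
            if u in expanded:
                continue      # never pay for the same node twice
            expanded.add(u)
            cost += 1
            for w in get_neighbors(u):
                edges.add(frozenset((u, w)))
                if w not in expanded:
                    next_frontier.add(w)
        frontier = next_frontier
    return edges, cost
```

Running all special nodes in one multi-source search retrieves the same region as expanding around each special node separately, while avoiding repeated calls on nodes that lie near several special nodes.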