Successfully reported this slideshow.
Upcoming SlideShare
×

# Bump Hunting in the Dark - ICDE15 presentation

194 views

Published on

Presentation of our work "Local discrepancy maximization on graphs" at the International IEEE Conference on Data Engineering (ICDE) 2015 in Seoul, Korea.

Published in: Science
• Full Name
Comment goes here.

Are you sure you want to Yes No
• Be the first to comment

• Be the first to like this

### Bump Hunting in the Dark - ICDE15 presentation

1. 1. Bump  Hun(ng  in  the  Dark   Local  Discrepancy  Maximiza(on  on  Graphs   Aris(des  Gionis   Michael  Mathioudakis   An>  Ukkonen   April  16th,  2015   Seoul,  Korea   ICDE  2015
2. 2. Find the Bump! Here it is! regular  nodes   2   undirected  graph   computer  network   online  social  network   etc.   special  nodes   failure  reported  /  virus  detected!   posted  message  about  topic  X
3. 3. for  α  =  1   score  =  18  –  3  =  15   score  =  1  –  0  =  1   score  =  0  –  1  =  -­‐1   graph,  special  &  non-­‐special  nodes     ﬁnd  connected  subgraph  with     max  linear  discrepancy   score    =    α  x  #special  -­‐  #non-­‐special   3
4. 4. ﬁxed  frac(on  of  special  nodes   subgraph  size  ì    score  ì ﬁxed  subgraph  size   frac(on  of  special  nodes  ì    score  ì   graph,  special  &  non-­‐special  nodes     ﬁnd  connected  subgraph  with     max  linear  discrepancy   score    =    α  x  #special  -­‐  #non-­‐special   4
5. 5. local  access   special  nodes  are  provided  as  input   build  graph  via  get-­‐neighbors  func(on     how  much  of  the  graph  do  we  need?   (inﬁnite  graph?)   5
6. 6. Approach     ﬁrst   retrieve  a  minimal  part  of  the  graph   necessary  when  we  have  only  local  access   use  get-neighbors func(on  to   expand  around  the  special  nodes     then   solve  problem  on  retrieved  subgraph   6
7. 7. In  What  Follows     •  Unrestricted  Access   –  Complexity     –  Connec(on  to  Steiner  Trees   –  Special  Case:  Graph  is  Tree   –  Algorithms   •  Local  Access   –  Algorithms  to  retrieve  (part  of)  the  graph   •  Experiments   •  Future  Work   7
8. 8. Unrestricted  Access   the  problem  is…   NP-­‐hard   (reduc(on  from  SETCOVER)   8
9. 9. Unrestricted  Access   the  problem  is…   a  special  case  of  PRIZECOLLECTINGSTEINERTREE     input:  graph  posi(ve  weight  on  terminal  nodes          non-­‐nega(ve  weight  on  edges   output:  tree  min  Σ(missed  terminal  node  weights)  +  Σ  (edge  weights)   9   terminal  nodes   special  nodes:  weight  =  α+1   edges:    weight  =  1   max  discrepancy          is  instance  of          min  PCST  weight   GoemansWilliamson  algorithm,  O(|V||E|)  &  O(|V|2log|V|)   (2  -­‐  1/(n  -­‐  1))-­‐approxima(on   min-­‐approxima(on  does  not  translate  to  our  max-­‐problem
10. 10. In  What  Follows     •  Unrestricted  Access   –  Complexity     –  Connec(on  to  Steiner  Trees   –  Special  Case:  Graph  is  Tree   –  Algorithms   •  Local  Access   –  Algorithms  to  retrieve  (part  of)  the  graph   •  Experiments   •  Future  Work   10
11. 11. Special  Case   graph  is  a  tree   op(mal  linear  recursive  algorithm   op(mal  in  graph   op(mal  with  root   op(mal  with  root   of  subtree   or   11   op(mal  without  root   (in  subtree)
12. 12. Algorithms   idea  for  general  case  heuris(cs   generate  spanning  tree  for  graph   solve  problem  on  spanning  tree     BFS-­‐trees  from  each  special  node   min  weight  spanning  tree   random  weights     w(u,v) in [0,1] ‘smart’  weights   w(u,v) = 2 - 1{u is special} - 1{v is special} GW  for  PCST   12
13. 13. In  What  Follows     •  Unrestricted  Access   –  Complexity     –  Connec(on  to  Steiner  Trees   –  Special  Case:  Graph  is  Tree   –  Algorithms   •  Local  Access   –  Algorithms  to  retrieve  (part  of)  the  graph   •  Experiments   •  Future  Work   13
14. 14. Local  Access   start  with  special  nodes   call  get-neighbors(node)to  expand   retrieve  en(re  graph?     expansion  strategies   full  expansion   return  a  subgraph  that  contains  the  op(mal  solu(on   oblivious  expansion,  adap(ve  expansion     14
15. 15. Full  Expansion   retrieve  graph  that  contains  op(mal  solu(on   with  calls  to    get-neighbors()   maintain  fron(er  to  expand  in  each  itera(on     fron(er  condi(on   unexpanded  nodes  for  which   min distance from reachable special node less than (α+1) x #reachable_special_nodes 1.  fron(er  =  special  nodes   2.  call  get-­‐neighbors()  on  fron(er  nodes   3.  update  fron(er   4.  go  to  .2.  if  fron(er  not  empty   ini(ally:   loop:   15
16. 16. Full  Expansion       expensive  in  prac(ce   we  look  for  heuris(cs   16
17. 17. Oblivious  Expansion   expand  (α+1)  (mes  around  each  special  node   not  guaranteed  to  cover  op(mal  solu(on      a   b 17   retrieved  by  ObliviousExpansion   addi(onally  retrieved  by  Full  Expansion   α  =  1   special  node   solution solution op(mal
18. 18. Adap(ve  Expansion     idea  :  while  expanding,  es(mate  heurisHcally  how  close  we  are  to  op(mal   for  each  component,  solve  problem  on  spanning  trees  rooted  at  fron(er  nodes   obtain  solu(ons  with  and  w/o  roots   maintain  heuris(c   upper  bound  of  op(mal  solu(on  in  en(re  graph   lower  bound  of  op(mal  solu(on  from  retrieved  graph   terminate  when  upper  bound    ≅    lower  bound     18   connected  components   of  retrieved  graph   tree  of   posi(ve   discrepancy   tree  of   nega(ve   discrepancy
19. 19. In  What  Follows     •  Unrestricted  Access   –  Complexity     –  Connec(on  to  Steiner  Trees   –  Special  Case:  Graph  is  Tree   –  Algorithms   •  Local  Access   –  Algorithms  to  retrieve  (part  of)  the  graph   •  Experiments   •  Future  Work   19
20. 20. Datasets   Table I DATASET STATISTICS (NUMBERS ARE ROUNDED). Dataset |V | |E| Geo 1 · 106 4 · 106 BA 1 · 106 10 · 106 Grid 4 · 106 8 · 106 Livejournal 4.3 · 106 69 · 106 Patents 2 · 106 16.5 · 106 Pokec 1.4 · 106 30.6 · 106 All graphs used in the experiments are undirected and their sizes are reported in Table I. N s q s g g i F synthe(c   real   20
21. 21. Input   graphs  from  previous  datasets     special  nodes:  planted  spheres   k  spheres   radius  ρ   s  special  nodes  inside  spheres   z  special  nodes  outside  spheres     21
22. 22. Algorithms     expansion   Full  (...not!)   Oblivious   Adap(ve     op(miza(on   BFS   random-­‐ST   smart-­‐ST   PCST     measure   cost  (get-neighbors  calls)   graph  size           running  (me  vs  graph  size   accuracy  (Jaccard  coeﬀ)   discrepancy   22
23. 23. Expansion   Table II EXPANSION TABLE (AVERAGES OF 20 RUNS) ObliviousExp. AdaptiveExp. dataset s k cost size cost size Grid 20 2 302 888 2783 7950 Grid 60 1 261 784 534 1604 Geo 20 2 452 2578 4833 30883 Geo 60 1 418 2452 578 3991 BA 20 2 3943 243227 114 6032 BA 60 1 4477 271870 135 7407 Patents 20 2 605 3076 13436 25544 Patents 60 1 620 3126 5907 13009 Pokec 20 2 3884 217592 161 7249 Pokec 60 1 4343 240544 116 5146 Livejournal 20 2 3703 348933 234 13540 Livejournal 60 1 4667 394023 129 7087 generate Q, and in interest of presentation, here we report what we consider to be representative results. Table II shows the cost (number of API calls) as well as the size (number of edges) of the retrieved graph G . 1e+01 1e-02 1e-01 1e+00 1e+01 1e+02 Figure 6. Running times of size (number of edges). We                                                                                                                                                                               23
24. 24. Op(miza(on   10 ES OF 20 RUNS) Exp. AdaptiveExp. size cost size 888 2783 7950 784 534 1604 2578 4833 30883 2452 578 3991 3227 114 6032 1870 135 7407 3076 13436 25544 3126 5907 13009 7592 161 7249 0544 116 5146 8933 234 13540 4023 129 7087 sentation, here we report ve results. r of API calls) as well the retrieved graph GX . s that for Grid, Geo, on results in fewer API Running time (in sec) 1e+01 1e+03 1e+05 1e-02 1e-01 1e+00 1e+01 1e+02 expansion size (#edges) BFS-Tree Random-ST PCST-Tree Smart-ST Figure 6. Running times of the different algorithms as a function of expansion size (number of edges). We can see that in comparison to PCST-Tree Smart- ST scales to inputs that are up to two orders of magnitude larger. 24
25. 25. Op(miza(on   11 Table III ACCURACY, AVERAGES OF 20 RUNS ObliviousExpansion AdaptiveExpansion dataset s k BFS-Tree Random-ST PCST-Tree Smart-ST BFS-Tree Random-ST PCST-Tree Smart-ST Grid 20 2 0.88 (0) 0.81 (0) 0.93 (0) 0.93 (0) 0.88 (0) 0.85 (0) 0.93 (0) 0.93 (0) Grid 60 1 1.00 (0) 0.94 (0) 1.00 (0) 1.00 (0) 0.99 (0) 0.98 (0) 1.00 (0) 1.00 (0) Geo 20 2 1.00 (0) 0.95 (0) 1.00 (0) 1.00 (0) 1.00 (0) 0.98 (0) 1.00 (0) 1.00 (0) Geo 60 1 1.00 (0) 0.96 (0) 1.00 (0) 1.00 (0) 0.99 (0) 0.98 (0) 0.99 (0) 0.99 (0) BA 20 2 0.47 (12) 0.18 (12) NaN (20) 0.46 (0) 0.46 (0) 0.44 (0) 0.46 (0) 0.45 (0) BA 60 1 NaN (20) NaN (20) NaN (20) 0.77 (0) 0.76 (0) 0.76 (0) 0.77 (3) 0.76 (0) Patents 20 2 0.92 (0) 0.86 (0) 0.91 (0) 0.90 (0) 0.72 (0) 0.74 (0) 0.77 (3) 0.74 (0) Patents 60 1 0.89 (0) 0.76 (0) 0.89 (0) 0.89 (0) 0.74 (0) 0.73 (0) 0.74 (0) 0.74 (0) Pokec 20 2 0.53 (2) 0.13 (3) NaN (20) 0.46 (0) 0.43 (0) 0.41 (0) 0.42 (2) 0.40 (0) Pokec 60 1 0.74 (6) 0.09 (6) NaN (20) 0.61 (0) 0.48 (0) 0.46 (0) 0.45 (1) 0.45 (0) Livejournal 20 2 0.62 (5) 0.19 (5) NaN (20) 0.54 (0) 0.56 (0) 0.53 (0) 0.58 (5) 0.56 (0) Livejournal 60 1 0.88 (12) 0.26 (9) NaN (20) 0.68 (0) 0.65 (0) 0.62 (0) 0.62 (1) 0.62 (0) Table IV DISCREPANCY, AVERAGES OF 20 RUNS ObliviousExpansion AdaptiveExpansion dataset s k BFS-Tree Random-ST PCST-Tree Smart-ST BFS-Tree Random-ST PCST-Tree Smart-ST Grid 20 2 14.5 (0) 11.8 (0) 16.8 (0) 16.7 (0) 14.8 (0) 13.8 (0) 16.4 (0) 16.3 (0) Grid 60 1 41.0 (0) 36.9 (0) 41.0 (0) 41.0 (0) 40.5 (0) 38.9 (0) 40.9 (0) 40.9 (0) Geo 20 2 19.9 (0) 18.4 (0) 20.0 (0) 20.0 (0) 19.9 (0) 19.2 (0) 20.0 (0) 20.0 (0) Geo 60 1 22.0 (0) 20.6 (0) 22.0 (0) 22.0 (0) 21.8 (0) 21.6 (0) 21.8 (0) 21.8 (0) BA 20 2 15.0 (12) 2.8 (12) NaN (20) 15.2 (0) 15.6 (0) 14.4 (0) 14.4 (0) 15.0 (0) BA 60 1 NaN (20) NaN (20) NaN (20) 36.1 (0) 37.4 (0) 35.3 (0) 35.9 (3) 35.5 (0) Patents 20 2 17.4 (0) 15.8 (0) 17.7 (0) 17.6 (0) 14.9 (0) 13.8 (0) 15.8 (3) 14.8 (0) Patents 60 1 40.0 (0) 31.1 (0) 40.8 (0) 40.6 (0) 33.0 (0) 32.2 (0) 33.2 (0) 33.3 (0) Pokec 20 2 11.6 (2) 2.6 (3) NaN (20) 11.8 (0) 8.6 (0) 8.0 (0) 8.2 (2) 8.0 (0) Pokec 60 1 36.6 (6) 4.7 (6) NaN (20) 28.6 (0) 20.9 (0) 17.4 (0) 18.3 (1) 18.5 (0) Livejournal 20 2 14.3 (5) 3.5 (5) NaN (20) 13.8 (0) 11.8 (0) 9.8 (0) 10.8 (5) 10.2 (0) Livejournal 60 1 45.6 (12) 12.0 (9) NaN (20) 31.1 (0) 29.8 (0) 25.6 (0) 26.8 (1) 27.4 (0) ObliviousExpansion for denser graphs. We also note that once a subgraph has been discovered in the 25
26. 26. In  What  Follows     •  Unrestricted  Access   –  Complexity     –  Connec(on  to  Steiner  Trees   –  Special  Case:  Graph  is  Tree   –  Algorithms   •  Local  Access   –  Algorithms  to  retrieve  (part  of)  the  graph   •  Experiments   •  Future  Work   26
27. 27. Future  Work     Approxima(on  Guarantee  /  Tighter  expansions   Distributed  Se>ng   Unknown  Scale   Applica(on  on  Real  Data   (TKDE  Extension)   27
28. 28. The End 28