
Link Prediction in the Real World

Slides from a Guest Lecture on the Design and Analysis of Graph Algorithms.


  1. Link Prediction in the Real World
     Applications: Law Enforcement, Financial Fraud, Drug Discovery
     Guest Lecture by Balaji Ganesan
  2. Graph Primitives: Breadth-First Search
     Design and Analysis of Algorithms I
     Prof. Tim Roughgarden, Columbia University (formerly at Stanford University)
     www.algorithmsilluminated.org
  3. Overview and Example (Tim Roughgarden)
     Breadth-First Search (BFS)
     -- explores nodes in "layers"
     -- can compute shortest paths
     -- computes connected components of an undirected graph
     Running time: O(m+n) [linear time]
  4. The Code (Tim Roughgarden)
     BFS (graph G, start vertex s)
     [all nodes initially unexplored]
     -- mark s as explored
     -- let Q = queue data structure (FIFO), initialized with s
     -- while Q is not empty:
        -- remove the first node of Q, call it v
        -- for each edge (v, w):
           -- if w unexplored:
              -- mark w as explored
              -- add w to Q (at the end)      [O(1) time]
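The BFS pseudocode on the slide translates almost line-for-line into Python. A minimal sketch, assuming the graph is given as an adjacency dict mapping each node to a list of neighbors:

```python
from collections import deque

def bfs(adj, s):
    """Explore the nodes reachable from s in FIFO (layer) order."""
    explored = {s}            # mark s as explored
    q = deque([s])            # Q = FIFO queue, initialized with s
    order = []
    while q:                  # while Q is not empty
        v = q.popleft()       # remove the first node of Q
        order.append(v)
        for w in adj[v]:      # for each edge (v, w)
            if w not in explored:
                explored.add(w)
                q.append(w)   # add w to the end of Q
    return order

adj = {1: [2], 2: [1, 3], 3: [2], 4: []}
print(bfs(adj, 1))  # [1, 2, 3] -- node 4 is unreachable from 1
```

`deque.popleft()` and `append()` are both O(1), matching the O(1)-per-operation note on the slide.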
  5. Basic BFS Properties (Tim Roughgarden)
     Claim #1: at the end of BFS, v explored <==> G has a path from s to v.
     Reason: special case of the generic graph-search algorithm.
     Claim #2: running time of the main while loop = O(ns + ms), where
        ns = # of nodes reachable from s
        ms = # of edges reachable from s
     Reason: by inspection of the code.
  6. Application: Shortest Paths (Tim Roughgarden)
     Goal: compute dist(v), the fewest # of edges on a path from s to v.
     Extra code: initialize dist(v) = 0 if v = s, +∞ if v ≠ s.
     When considering edge (v, w):
     -- if w unexplored, then set dist(w) = dist(v) + 1
     Claim: at termination, dist(v) = i <==> v is in the ith layer
     (i.e., the shortest s-v path has i edges).
     Proof idea: every layer-i node w is added to Q by a layer-(i-1) node v via the edge (v, w).
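The shortest-path extension above is one extra line inside the BFS loop. A minimal sketch (adjacency-dict representation assumed; unreachable nodes are simply absent from the result, standing in for dist = +∞):

```python
from collections import deque

def bfs_dist(adj, s):
    """dist[v] = fewest # of edges on an s-v path."""
    dist = {s: 0}                      # dist(s) = 0
    q = deque([s])
    while q:
        v = q.popleft()
        for w in adj[v]:
            if w not in dist:          # w unexplored
                dist[w] = dist[v] + 1  # layer(w) = layer(v) + 1
                q.append(w)
    return dist

adj = {'s': ['a', 'b'], 'a': ['c'], 'b': ['c'], 'c': [], 'd': []}
print(bfs_dist(adj, 's'))  # {'s': 0, 'a': 1, 'b': 1, 'c': 2}
```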
  7. Single-Source Shortest Paths (Tim Roughgarden)
     Input: directed graph G = (V, E), with m = |E| and n = |V|
     • each edge e has a non-negative length le
     • source vertex s
     Output: for each v ∈ V, compute L(v) := length of a shortest s-v path in G
     Length of a path = sum of its edge lengths (e.g., path length = 6 in the slide's example)
     Assumptions:
     1. [for convenience] s has a directed path to every other vertex of G
     2. [important] every edge length le is non-negative
  8. Why Another Shortest-Path Algorithm? (Tim Roughgarden)
     Question: doesn't BFS already compute shortest paths in linear time?
     Answer: yes, IF le = 1 for every edge e.
     Question: why not just replace each edge e by a directed path of le unit-length edges?
     Answer: this blows up the graph too much.
     Solution: Dijkstra's shortest-path algorithm.
  9. Dijkstra's Algorithm (Tim Roughgarden)
     Initialize:
     • X = {s}           [vertices processed so far]
     • A[s] = 0          [computed shortest-path distances]
     • B[s] = empty path [computed shortest paths -- this array only to help explanation!]
     Main loop (while X ≠ V, i.e., need to grow X by one node):
     • among all edges (v, w) with v ∈ X and w ∉ X, pick the one minimizing
       A[v] + lvw        [call it (v*, w*); A[v] was already computed in an earlier iteration]
     • add w* to X
     • set A[w*] = A[v*] + lv*w*
     • set B[w*] = B[v*] + (v*, w*)
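The slide states the straightforward version, which rescans all crossing edges on every iteration; in practice the minimizing edge is found with a heap. A heap-based sketch, assuming the graph is a dict mapping each node to (neighbor, length) pairs (the B array of paths is omitted):

```python
import heapq

def dijkstra(adj, s):
    """Return A: computed shortest-path distances from s."""
    A = {}                             # vertices in A are the set X
    heap = [(0, s)]                    # candidate (distance, node) pairs
    while heap:
        d, v = heapq.heappop(heap)
        if v in A:                     # stale entry: v already processed
            continue
        A[v] = d                       # add v to X with distance d
        for w, lvw in adj[v]:
            if w not in A:
                heapq.heappush(heap, (d + lvw, w))  # score A[v] + lvw
    return A

adj = {'s': [('v', 1), ('w', 4)], 'v': [('w', 2), ('t', 6)],
       'w': [('t', 3)], 't': []}
print(dijkstra(adj, 's'))  # {'s': 0, 'v': 1, 'w': 3, 't': 6}
```

With a binary heap this runs in O(m log n), versus O(mn) for the naive main loop.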
  10. Single-source shortest path implementation in networkx:

      def single_source_shortest_path(G, source, cutoff=None):
          if source not in G:
              raise nx.NodeNotFound("Source {} not in G".format(source))

          def join(p1, p2):
              return p1 + p2
          if cutoff is None:
              cutoff = float('inf')
          # list of nodes to check at next level
          nextlevel = {source: 1}
          # paths dictionary (paths to key from source)
          paths = {source: [source]}
          return dict(_single_shortest_path(G.adj, nextlevel, paths, cutoff, join))

      def _single_shortest_path(adj, firstlevel, paths, cutoff, join):
          level = 0  # the current level
          nextlevel = firstlevel
          while nextlevel and cutoff > level:
              thislevel = nextlevel
              nextlevel = {}
              for v in thislevel:
                  for w in adj[v]:
                      if w not in paths:
                          paths[w] = join(paths[v], [w])
                          nextlevel[w] = 1
              level += 1
          return paths

      https://networkx.github.io/documentation/stable/_modules/networkx/algorithms/shortest_paths/unweighted.html#single_source_shortest_path
  11. All-pairs shortest path lengths in parallel (from the P-GNN repository):

      import random
      import multiprocessing as mp

      # find distances of shortest paths in parallel
      def all_pairs_shortest_path_length_parallel(graph, cutoff=None, num_workers=4):
          nodes = list(graph.nodes)
          random.shuffle(nodes)
          if len(nodes) < 50:
              num_workers = int(num_workers / 4)
          elif len(nodes) < 400:
              num_workers = int(num_workers / 2)
          pool = mp.Pool(processes=num_workers)
          # single_source_shortest_path_length_range and merge_dicts are
          # helpers defined elsewhere in the same file (utils.py)
          results = [pool.apply_async(single_source_shortest_path_length_range,
                                      args=(graph,
                                            nodes[int(len(nodes) / num_workers * i):
                                                  int(len(nodes) / num_workers * (i + 1))],
                                            cutoff))
                     for i in range(num_workers)]
          output = [p.get() for p in results]
          dists_dict = merge_dicts(output)
          pool.close()
          pool.join()
          return dists_dict

      https://github.com/JiaxuanYou/P-GNN/blob/master/utils.py
  12. Connected components implementation in networkx:

      def connected_components(G):
          seen = set()
          for v in G:
              if v not in seen:
                  c = set(_plain_bfs(G, v))
                  yield c
                  seen.update(c)

      def _plain_bfs(G, source):
          """A fast BFS node generator"""
          G_adj = G.adj
          seen = set()
          nextlevel = {source}
          while nextlevel:
              thislevel = nextlevel
              nextlevel = set()
              for v in thislevel:
                  if v not in seen:
                      yield v
                      seen.add(v)
                      nextlevel.update(G_adj[v])

      https://networkx.github.io/documentation/stable/_modules/networkx/algorithms/components/connected.html#connected_components
  13. Link prediction using heuristic methods
      The Adamic-Adar algorithm was introduced in 2003 by Lada Adamic and Eytan Adar to predict links in a social network. It is computed using the following formula:

         A(u, v) = Σ_{w ∈ N(u) ∩ N(v)} 1 / log |N(w)|

      where N(u) is the set of nodes adjacent to u. A value of 0 indicates that two nodes are not close, while higher values indicate that the nodes are closer. The Neo4j Graph Algorithms library contains a function to calculate this closeness between two nodes.

      • Adamic, Lada A., and Eytan Adar. "Friends and neighbors on the web." Social Networks 25, no. 3 (2003): 211-230.
      • https://neo4j.com/docs/graph-algorithms/current/labs-algorithms/adamic-adar/
  14. Advancements in Graph Neural Networks: PGNNs, Pretraining, and OGB (Jure Leskovec)
      Includes joint work with W. Hu, J. You, M. Fey, Y. Dong, B. Liu, M. Catasta, K. Xu, S. Jegelka, M. Zitnik, P. Liang, V. Pande
  15. Graph Neural Networks (Jure Leskovec, Stanford University)
      [Figure: an input graph and the computation graph rooted at a target node]
      Each node defines a computation graph
      § Each edge in this graph is a transformation/aggregation function
      Scarselli et al. 2005. The Graph Neural Network Model. IEEE Transactions on Neural Networks.
  16. Graph Neural Networks
      [Figure: neighbor aggregation with small neural networks at each step of the computation graph]
      Intuition: Nodes aggregate information from their neighbors using neural networks
      Inductive Representation Learning on Large Graphs. W. Hamilton, R. Ying, J. Leskovec. NIPS, 2017.
  17. Three Consequences of GNNs
      1) The GNN does two things:
      § Learns how to "borrow" feature information from nearby nodes to enrich the target node
      § Each node can have a different computation graph, and the network is also able to capture/learn its structure
  18. Three Consequences of GNNs
      2) Computation graphs can be chosen:
      § Aggregation does not need to happen across all neighbors
      § Neighbors can be strategically chosen/sampled
      § Leads to big gains in practice!
  19. Three Consequences of GNNs
      3) We understand GNN failure cases:
      § GNNs fail to distinguish isomorphic nodes
      § Nodes with identical rooted subtrees will be classified in the same class (in the absence of differentiating node features)
  20. Key Insight: Anchors
      Idea: Nodes need to know "where" in the network they are
      Solution:
      § Anchor: a randomly selected node
      § Anchor-set: a randomly selected subset of nodes; an anchor is a size-1 anchor-set
      § (1) Randomly choose many anchor-sets
      § (2) A given node can then use its distances to these anchor-sets to understand its location/position in the network
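One way to make step (1) concrete is the Bourgain-style sampling used in the P-GNN paper: choose anchor-sets whose sizes decrease geometrically, with several independent copies of each size. A sketch under those assumptions (the copies-per-size parameter c is illustrative, not taken from the slides):

```python
import math
import random

def choose_anchor_sets(nodes, c=2, seed=0):
    """Sample c random anchor-sets of size ~ n/2^i for i = 1..log2(n)."""
    rng = random.Random(seed)
    n = len(nodes)
    anchor_sets = []
    for i in range(1, int(math.log2(n)) + 1):
        size = max(1, n // 2 ** i)        # geometrically shrinking sizes
        for _ in range(c):
            anchor_sets.append(rng.sample(nodes, size))
    return anchor_sets

sets_ = choose_anchor_sets(list(range(16)), c=2)
print([len(s) for s in sets_])  # [8, 8, 4, 4, 2, 2, 1, 1]
```

Large anchor-sets are likely to be close to any given node, while the size-1 sets pin down its position precisely; using both gives low-distortion position estimates.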
  21. Overview of Position-aware GNN
      § (a) Randomly select anchor-sets
      § (b) Compute pairwise node distances
      § (c) Compute anchor-set messages
      § (d) Transform messages to node embeddings
      [Figure: anchor-set selection on the left; embedding computation for a single node on the right, where one message per anchor-set is aggregated and transformed into the next layer's input and the output embedding]
  22. Position-aware GNN Framework
      Step (b): Compute pairwise node distances
      § Position-based similarities: for example, shortest path, personalized PageRank, ...
      § For fast computation, we use the q-hop shortest path distance d_sp^q(v_i, v_j):

         s(v_i, v_j) = 1 / (d_sp^q(v_i, v_j) + 1)

      [Figure: an example graph and its matrix of pairwise similarities s(v_i, v_j)]
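The q-hop distance can be computed with a BFS that simply stops expanding after q hops; nodes farther than q hops away then get similarity 0, consistent with the truncation described in the P-GNN paper. A minimal sketch:

```python
from collections import deque

def qhop_similarity(adj, s, q):
    """s(s, u) = 1 / (d_sp^q(s, u) + 1); 0 for nodes beyond q hops."""
    dist = {s: 0}
    queue = deque([s])
    while queue:
        v = queue.popleft()
        if dist[v] == q:              # do not expand beyond q hops
            continue
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return {u: 1.0 / (dist[u] + 1) if u in dist else 0.0 for u in adj}

adj = {1: [2], 2: [1, 3], 3: [2, 4], 4: [3]}
print(qhop_similarity(adj, 1, q=2))  # {1: 1.0, 2: 0.5, 3: 0.333..., 4: 0.0}
```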
  23. Link Prediction
      (You, Jiaxuan, Rex Ying, and Jure Leskovec. "Position-aware Graph Neural Networks." International Conference on Machine Learning (ICML). 2019.)

      Attribute embedding:      h_u^a = Embed(x_1, x_2, ..., x_k)                    [x_i = attributes of u]
      Neighbor aggregation:     h_u^n = Aggregate_n({h_i^a : i ∈ N(u)})              [N(u) = neighbors of u]
      Position w.r.t. anchors:  h_u^p = Aggregate_p({dist(u, j) * h_j^a : j ∈ A(u)}) [A(u) = anchors of u]
      Final embedding:          h_u = Combine(h_u^a, h_u^n, h_u^p)

      where Embed(...), Aggregate_n(...), Aggregate_p(...) and Combine(...) are trainable functions.

      Learning objective for the link prediction task:

         min_θ  L( d_e(f_θ(u, G), f_θ(v, G)) − d_G(u, v) )

      In other words, train an embedding space where linked nodes are projected to the same subspace.
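As a purely illustrative toy, with untrained mean-pooling and concatenation standing in for the trainable Aggregate and Combine functions (and made-up attribute embeddings standing in for Embed), the three-part embedding above can be sketched as:

```python
def mean(vecs):
    """Elementwise mean of a list of equal-length vectors."""
    return [sum(xs) / len(vecs) for xs in zip(*vecs)]

# hypothetical attribute embeddings h^a (stand-ins for Embed's output)
h_a = {'u': [1.0, 0.0], 'v1': [0.0, 1.0], 'v2': [1.0, 1.0], 'j1': [0.5, 0.5]}
neighbors = {'u': ['v1', 'v2']}       # N(u)
anchors = {'u': [('j1', 0.5)]}        # A(u) as (anchor, dist(u, j)) pairs

# h_u^n = Aggregate_n({h_i^a : i in N(u)}), here mean pooling
h_n = mean([h_a[i] for i in neighbors['u']])
# h_u^p = Aggregate_p({dist(u, j) * h_j^a : j in A(u)})
h_p = mean([[d * x for x in h_a[j]] for j, d in anchors['u']])
# h_u = Combine(h_u^a, h_u^n, h_u^p), here simple concatenation
h_u = h_a['u'] + h_n + h_p
print(h_u)  # [1.0, 0.0, 0.5, 1.0, 0.25, 0.25]
```

In the real model each of these stages is a learned function optimized end-to-end with the link prediction objective.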
  24. https://arxiv.org/abs/2003.04732
  25. University Collaboration from our Group
      § EDBT 2020 (arXiv), EYRE 2019 (arXiv), ICADABAI 2019, AMLD 2020
      § IBM Summer Internship 2018, IBM Summer Internship 2019, IBM US Summer Internship 2019
      § IBM Global Remote Mentorship 2019 (Saqlain Mustaq, PES University), IBM Global Remote Mentorship 2020
      § IBM Hackathon, IIM Ahmedabad 2019
  26. Collaborate with us on research!
      Global Remote Mentorship: http://connecttobuild.in/
      Summer Internships: https://www.ibm.com/in-en/employment/internships.html
      Balaji Ganesan
      http://researcher.watson.ibm.com/researcher/view.php?person=in-bganesa1
      @balajinix
