A multithreaded method for network alignment

My talk from SC about network alignment.

  1. 1. A multithreaded algorithm for network alignment. David F. Gleich (Computer Science, Purdue University), with Arif Khan and Alex Pothen (Purdue University, Computer Science) and Mahantesh Halappanavar (Pacific Northwest National Labs). [Figure: graphs A and B joined by the candidate-match edges L, with one overlap highlighted.] Work supported by DOE CSCAPES Institute grant (DE-FC02-08ER25864), NSF CAREER grant 1149756-CCF, and the Center for Adaptive Super Computing Software Multithreaded Architectures (CASS-MT) at PNNL. PNNL is operated by Battelle Memorial Institute under contract DE-AC06-76RL01830.
  2. 2. Network alignment. What is the best way of matching graph A to B using only edges in L? Find a 1-1 matching between vertices with as many overlaps as possible. [Figure: graphs A and B joined by the candidate-match edges L, with one overlap highlighted.]
  3. 3. Network alignment … is NP-hard … has no approximation algorithm. Applications: computer vision, ontology matching, database matching, bioinformatics. objective = α matching + β overlap.
  4. 4. Network alignment. [Figure 2, from Sharan and Ideker: The NetworkBLAST local network alignment algorithm. Given two input networks, a network alignment graph is constructed. Nodes in this graph correspond to pairs of sequence-similar proteins, one from each species, and edges correspond to conserved interactions. A search algorithm identifies highly similar subnetworks that follow a prespecified interaction pattern.] From Sharan and Ideker, Modeling cellular machinery through biological network comparison. Nat. Biotechnol. 24, 4 (Apr. 2006), 427–433.
  5. 5. Our contribution: multi-threaded network alignment via a new, multi-threaded approximation algorithm for max-weight bipartite matching with linear complexity. High performance C++ implementations: 415 sec. down to 10 sec., 40-times faster on 16 cores of a Xeon E5-2670 (C++ ~ 3, complexity ~ 2, threading ~ 8) ... enabling interactive computation! www.cs.purdue.edu/~dgleich/codes/netalignmc
  6. 6. … the best methods in a recent survey … (Bayati, Gleich, et al. TKDE forthcoming). Belief Propagation: use a probabilistic relaxation and iteratively find the probability that an edge is in the matching, given the probabilities of its neighboring edges. Klau’s Matching Relaxation: iteratively improve an upper bound on the solution via a subgradient method applied to the Lagrangian.
  7. 7. Each iteration involves (a) matrix-vector-ish computations with a sparse matrix, e.g. sparse matrix-vector products in a semi-ring, dot-products, axpy, etc., and (b) a bipartite max-weight matching using a different weight vector at each iteration. No “convergence”; 100-1000 iterations. Let x[i] be the score for each pair-wise match in L:
     for i=1 to ...
       update x[i] to y[i]
       compute a max-weight match with y
       update y[i] to x[i] (using match in MR)
  8. 8. The methods. Each iteration involves matrix-vector-ish computations with a sparse matrix (e.g. sparse matrix-vector products in a semi-ring, dot-products, axpy, etc.) and a bipartite max-weight matching using a different weight vector at each iteration. Belief Propagation, Listing 2: a belief-propagation message passing procedure for network alignment. See the text for a description of othermax and round heuristic.
     y^(0) = 0, z^(0) = 0, d^(0) = 0, S^(k) = 0
     for k = 1 to n_iter
       F = bound_{0,β}[βS + S^(k)T]                      Step 1: compute F
       d = αw + Fe                                       Step 2: compute d
       y^(k) = d − othermaxcol(z^(k−1))                  Step 3: othermax
       z^(k) = d − othermaxrow(y^(k−1))
       S^(k) = diag(y^(k) + z^(k) − d)S − F              Step 4: update S
       (y^(k), z^(k), S^(k)) ← γ^k (y^(k), z^(k), S^(k))
           + (1 − γ^k)(y^(k−1), z^(k−1), S^(k−1))        Step 5: damping
       round heuristic (y^(k))                           Step 6: matching
       round heuristic (z^(k))                           Step 6: matching
     end
     return y^(k) or z^(k) with the largest objective value
     Paper excerpt shown beside the listing: “… interpretation, the weight vectors are usually called messages as they communicate the “beliefs” of each “agent.” In this particular problem, the neighborhood of an agent represents all of the other edges in graph L incident on the same vertex in graph A (1st vector), all edges in L incident on the same vertex in graph B (2nd vector), or the edges in L that are …”
  9. 9. The NEW methods. Each iteration involves matrix-vector-ish computations with a sparse matrix (e.g. sparse matrix-vector products in a semi-ring, dot-products, axpy, etc.), now marked “Parallel”. The slide repeats the Belief Propagation procedure of Listing 2 from the previous slide with one change: Step 6 is labelled “approx matching”, because approximate bipartite max-weight matching is used here instead of the exact matching in the round heuristic.
  10. 10. Approximation doesn’t hurt the belief propagation algorithm. Setup: randomly perturb one power-law graph to get A and B; generate L from the true match plus random edges; the amount of randomness in L is measured as the average expected degree. [Plots: fraction of correct match vs. expected degree of noise in L (p ⋅ n), for MR, BP, ApproxMR and ApproxBP; BP and ApproxBP are indistinguishable.] Paper caption: “Fig. 2. Alignment with a power-law graph shows the large effect that approximate rounding can have on solutions from Klau’s method (MR). With that method, using exact rounding will yield the identity matching for all problems (bottom figure), whereas using the approximation results in over a …”
  11. 11. The methods (in more detail). The matching is incidental to the BP method, but integral to Klau’s MR method.
     Belief Propagation:
       for i=1 to ...
         update x[i] to y[i]
         compute a max-weight match with y
         save y if it is the best result so far
     Klau’s Matching Relaxation:
       for i=1 to ...
         update x[i] to y[i]
         compute a max-weight match with y
         update y[i] to x[i] based on the match
  12. 12. On real-world problems, ApproxMR isn’t so different. On a protein-protein alignment problem, there is little difference with exact vs. approximate matching. [Plot: Overlap (0 to 400) vs. Weight (0 to 600) for BP, AppBP, MR and AppMR; upper bound on overlap 381; upper bound on max weight matching 671.551.]
  13. 13. Algorithmic analysis. Exact runtime: matrix + matching, with matrix ≪ matching: O(|EL| + |S|) + O(|EL| N log N). Our approx. runtime: matrix + approx. matching: O(|EL| + |S|) + O(|EL|). Algorithmic parameters: |EL| is the number of edges in L; |S| is the number of potential overlaps.
  14. 14. A local dominating edge method for bipartite matching. A locally dominating edge is an edge heavier than all neighboring edges. For bipartite graphs: work on the smaller side only. The method guarantees a ½ approximation and a maximal matching. Based on work by Preis (1999), Manne and Bisseling (2008), and Halappanavar et al (2012).
  15. 15. A local dominating edge method for bipartite matching. A locally dominating edge is an edge heavier than all neighboring edges. For bipartite graphs: work on the smaller side only.
     Queue all vertices
     Until queue is empty:
       In parallel over vertices:
         Match to heavy edge and, if there’s a conflict, check the winner and find an alternative for the loser
       Add endpoint of non-dominating edges to the queue
  16. 16. A local dominating edge method for bipartite matching. A locally dominating edge is an edge heavier than all neighboring edges. For bipartite graphs: work on the smaller side only. Implementation notes: a customized first iteration (with all vertices); use OpenMP locks to update choices; use sync_and_fetch_add (as written on the slide; likely GCC’s __sync_fetch_and_add builtin) for queue updates.
  17. 17. Remaining multi-threading procedures are straightforward. Standard OpenMP for matrix computations: use schedule(dynamic) to handle skew. We can batch the matching procedures in the BP method for additional parallelism:
     for i=1 to ...
       update x[i] to y[i]
       save y[i] in a buffer
       when the buffer is full
         compute max-weight match for all in buffer and save the best
  18. 18. Real-world data sets. Table II: for each problem in our bioinformatics and ontology sets, we report the number of vertices in graph A and B, the number of edges in the graph L, and the number of nonzeros in S.
     Problem        |VA|      |VB|      |EL|         |S|
     dmela-scere    9,459     5,696     34,582       6,860
     homo-musm      3,247     9,695     15,810       12,180
     lcsh-wiki      297,266   205,948   4,971,629    1,785,310
     lcsh-rameau    154,974   342,684   20,883,500   4,929,272
     Algorithmic parameters: |EL| is the number of edges in L; |S| is the number of potential overlaps. Our approx. runtime: matrix + approx. matching: O(|EL| + |S|) + O(|EL|).
     Paper excerpt shown beside the table: “… We experimented with an initialization algorithm tailored for bipartite graphs by spawning threads only from one of the vertex sets VA or VB to identify locally dominant edges. If the thread is responsible for matching a vertex in VA, then it has to check the adjacency sets of the vertices in VB that are adjacent to it in order to determine if the …”
  19. 19. Performance evaluation. (2x4)-10 core Intel E7-8870, 2.4 GHz (80 cores), 16 GB memory/proc (128 GB). Scaling study: (1) thread binding, scattered vs. compact; (2) memory binding, interleaved vs. bind. [Diagram: eight CPUs, each with its own local memory.]
  20. 20. Scaling BP with no batching. lcsh-rameau, 400 iterations, scatter and interleave: 1450 seconds for 1 thread, 115 seconds for 40 threads. [Plot: Speedup (0 to 25) vs. Threads (0 to 80).]
  21. 21. Scaling BP with no batching. [Plot: Speedup (0 to 25) vs. Threads (0 to 80) for four configurations: compact and interleave, compact and membind, scatter and interleave, scatter and membind.]
  22. 22. Scaling. [Three plots of Speedup (0 to 25) vs. Threads (0 to 80), each for compact and interleave, compact and membind, scatter and interleave, scatter and membind: BP with no batching; Klau’s MR method; BP with batch=20.] In all cases, we get a speedup of around 12-15 on 40 cores with scatter threads and interleaved memory.
  23. 23. Summary & Conclusions. Tailored algorithm for approx. max-weight bipartite matching. Algorithmic improvement in network alignment methods. Multi-threaded C++ code for network alignment: 415 seconds -> 10 seconds (40-times overall speedup). For large problems, interactive network alignment is possible. Future work: memory control, improved methods. Code and data available! www.cs.purdue.edu/~dgleich/codes/netalignmc. Work supported by DOE CSCAPES Institute grant (DE-FC02-08ER25864), NSF CAREER grant 1149756-CCF, and the Center for Adaptive Super Computing Software Multithreaded Architectures (CASS-MT) at PNNL. PNNL is operated by Battelle Memorial Institute under contract DE-AC06-76RL01830.
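
The objective on slide 3, α times the matching weight plus β times the overlap, can be made concrete with a small sketch. This is illustrative Python, not the C++ code from the talk; the function name `alignment_objective`, the dictionary-based data layout, and the toy graphs are all assumptions chosen for readability. It assumes simple undirected graphs with each edge of A listed once.

```python
def alignment_objective(edges_a, edges_b, weights_l, matching, alpha=1.0, beta=1.0):
    """Score a matching as alpha * weight + beta * overlap.

    edges_a, edges_b: undirected edges (pairs) of graphs A and B, each listed once.
    weights_l: dict mapping candidate pairs (a, b) in L to a weight.
    matching: dict a -> b; must be one-to-one and use only pairs in L.
    """
    if len(set(matching.values())) != len(matching):
        raise ValueError("matching is not one-to-one")
    for pair in matching.items():
        if pair not in weights_l:
            raise ValueError(f"{pair} is not an edge of L")

    # Matching term: total weight of the chosen edges of L.
    weight = sum(weights_l[pair] for pair in matching.items())

    # Overlap term: edges (u, v) of A whose images (m(u), m(v)) form an edge of B.
    b_edges = {frozenset(e) for e in edges_b}
    overlap = 0
    for u, v in edges_a:
        if u in matching and v in matching:
            if frozenset((matching[u], matching[v])) in b_edges:
                overlap += 1

    return alpha * weight + beta * overlap, weight, overlap


# Toy example (hypothetical data): two paths on three vertices.
A = [("a1", "a2"), ("a2", "a3")]
B = [("b1", "b2"), ("b2", "b3")]
L = {("a1", "b1"): 1, ("a2", "b2"): 1, ("a3", "b3"): 1,
     ("a1", "b2"): 2, ("a2", "b1"): 2}

identity = {"a1": "b1", "a2": "b2", "a3": "b3"}
swapped = {"a1": "b2", "a2": "b1", "a3": "b3"}
print(alignment_objective(A, B, L, identity, alpha=1, beta=2))  # (7, 3, 2)
print(alignment_objective(A, B, L, swapped, alpha=1, beta=2))   # (7, 5, 1)
```

In this toy case the heavier matching loses one overlap, so with β = 2 the two matchings tie. That is the trade-off the α and β weights control.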

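The queue-based procedure on slides 14 to 16 can be sketched serially as follows. This is a minimal single-threaded Python illustration of the locally dominant edge idea, not the authors' OpenMP implementation: it omits the locks, atomic queue updates, customized first iteration, and smaller-side optimization mentioned in the slides. The function name `locally_dominant_matching` and the `(a, b, weight)` input format are assumptions, and vertex labels on each side are assumed to be mutually comparable so that ties between equal weights break consistently.

```python
from collections import defaultdict, deque


def locally_dominant_matching(weighted_edges):
    """Half-approximate max-weight bipartite matching via locally dominant edges.

    weighted_edges: iterable of (a, b, w) with a on the left side, b on the
    right side, and w the edge weight; each (a, b) pair appears at most once.
    Returns (matching dict a -> b, total matched weight).
    """
    adj = defaultdict(list)
    weight = {}
    for a, b, w in weighted_edges:
        adj[("A", a)].append((w, ("B", b)))
        adj[("B", b)].append((w, ("A", a)))
        weight[(a, b)] = w

    mate = {}       # matched vertex -> partner
    candidate = {}  # vertex -> heaviest free neighbor when last examined

    def best_free_neighbor(v):
        free = [(w, u) for w, u in adj[v] if u not in mate]
        return max(free)[1] if free else None

    queue = deque(adj)  # queue all vertices
    while queue:
        v = queue.popleft()
        if v in mate:
            continue
        u = best_free_neighbor(v)
        candidate[v] = u
        # The edge (v, u) is locally dominant only if u also prefers v.
        if u is None or candidate.get(u) != v:
            continue
        mate[v], mate[u] = u, v
        # Free neighbors that were pointing at v or u lost; re-queue them
        # so they look for an alternative.
        for end in (v, u):
            for _, x in adj[end]:
                if x not in mate and candidate.get(x) == end:
                    queue.append(x)

    matching = {v[1]: u[1] for v, u in mate.items() if v[0] == "A"}
    total = sum(weight[pair] for pair in matching.items())
    return matching, total


# Toy example (hypothetical data): the heaviest edge a1-b2 is taken first,
# which forces a2-b1. The result has weight 6, while the optimum
# (a1-b1, a2-b2) has weight 7.
edges = [("a1", "b1", 3), ("a1", "b2", 5), ("a2", "b2", 4), ("a2", "b1", 1)]
print(locally_dominant_matching(edges))  # ({'a1': 'b2', 'a2': 'b1'}, 6)
```

The example shows the price of the approximation: the result is maximal and within a factor of ½ of the optimum, but not optimal. Each vertex is only re-examined when its current candidate is matched to someone else.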