My talk from SC about network alignment.

- 1. A multithreaded algorithm for network alignment
  - David F. Gleich, Arif Khan, Alex Pothen (Purdue University, Computer Science)
  - Mahantesh Halappanavar (Pacific Northwest National Labs)
  - Work supported by DOE CSCAPES Institute grant (DE-FC02-08ER25864), NSF CAREER grant 1149756-CCF, and the Center for Adaptive Super Computing Software Multithreaded Architectures (CASS-MT) at PNNL. PNNL is operated by Battelle Memorial Institute under contract DE-AC06-76RL01830.
  - [Figure: graphs A and B joined by the bipartite graph L, with one overlap highlighted.]
- 2. Network alignment
  - What is the best way of matching graph A to B using only edges in L?
  - Find a 1-1 matching between vertices with as many overlaps as possible.
  - [Figure: graphs A and B joined by the bipartite graph L, with one overlap highlighted.]
- 3. Network alignment
  - … is NP-hard
  - … has no approximation algorithm
  - Applications: computer vision, ontology matching, database matching, bioinformatics
  - objective = α · matching + β · overlap
- 4. Network alignment
  - [Figure 2: The NetworkBLAST local network alignment algorithm. Given two input networks, a network alignment graph is constructed. Nodes in this graph correspond to pairs of sequence-similar proteins, one from each species, and edges correspond to conserved interactions. A search algorithm identifies highly similar subnetworks that follow a prespecified interaction pattern. Adapted from Sharan and Ideker.]
  - From Sharan and Ideker, Modeling cellular machinery through biological network comparison. Nat. Biotechnol. 24, 4 (Apr. 2006), 427–433.
- 5. Our contribution
  - Multi-threaded network alignment via a new, multi-threaded approximation algorithm for max-weight bipartite matching, a procedure with linear complexity.
  - High performance C++ implementations: 415 sec. down to 10 sec., 40-times faster on 16 cores of a Xeon E5-2670 (approximate factors: C++ ~ 3, complexity ~ 2, threading ~ 8).
  - www.cs.purdue.edu/~dgleich/codes/netalignmc
  - ... enabling interactive computation!
- 6. … the best methods in a recent survey … (Bayati, Gleich, et al. TKDE forthcoming)
  - Belief Propagation: use a probabilistic relaxation and iteratively find the probability that an edge is in the matching, given the probabilities of its neighboring edges.
  - Klau's Matching Relaxation: iteratively improve an upper bound on the solution via a sub-gradient method applied to the Lagrangian.
- 7. Each iteration involves
  - Matrix-vector-ish computations with a sparse matrix, e.g. sparse matrix-vector products in a semi-ring, dot-products, axpy, etc.
  - Bipartite max-weight matching using a different weight vector at each iteration.
  - No "convergence": 100-1000 iterations.
  - Pseudocode. Let x[i] be the score for each pair-wise match in L. For i = 1 to ...:
    - update x[i] to y[i]
    - compute a max-weight match with y
    - update y[i] to x[i] (using match in MR)
- 8. The methods
  - Each iteration involves matrix-vector-ish computations with a sparse matrix (e.g. sparse matrix-vector products in a semi-ring, dot-products, axpy, etc.) and a bipartite max-weight matching using a different weight vector at each iteration.
  - Belief Propagation. Listing 2: a belief-propagation message passing procedure for network alignment. See the text for a description of othermax and round heuristic.
    - Initialize $y^{(0)} = 0$, $z^{(0)} = 0$, $d^{(0)} = 0$, $S^{(k)} = 0$.
    - For $k = 1$ to $n_{\text{iter}}$:
      - Step 1, compute F: $F = \mathrm{bound}_{0,\beta}\left[\beta S + S^{(k)T}\right]$
      - Step 2, compute d: $d = \alpha w + F e$
      - Step 3, othermax: $y^{(k)} = d - \mathrm{othermaxcol}(z^{(k-1)})$ and $z^{(k)} = d - \mathrm{othermaxrow}(y^{(k-1)})$
      - Step 4, update S: $S^{(k)} = \mathrm{diag}(y^{(k)} + z^{(k)} - d)\, S - F$
      - Step 5, damping: $(y^{(k)}, z^{(k)}, S^{(k)}) \leftarrow \gamma^{k} (y^{(k)}, z^{(k)}, S^{(k)}) + (1 - \gamma^{k}) (y^{(k-1)}, z^{(k-1)}, S^{(k-1)})$
      - Step 6, matching: round heuristic on $y^{(k)}$ and round heuristic on $z^{(k)}$
    - Return $y^{(k)}$ or $z^{(k)}$ with the largest objective value.
  - Paper excerpt shown beside the listing (cropped on the slide): "... interpretation, the weight vectors are usually called messages as they communicate the 'beliefs' of each 'agent.' In this particular problem, the neighborhood of an agent represents all of the other edges in graph L incident on the same vertex in graph A (1st vector), all edges in L incident on the same vertex in graph B (2nd vector), or the edges in L that are ..."
- 9. The NEW methods
  - Each iteration involves matrix-vector-ish computations with a sparse matrix (e.g. sparse matrix-vector products in a semi-ring, dot-products, axpy, etc.). The slide annotates these with "Parallel".
  - Approximate bipartite max-weight matching is used here instead!
  - Belief Propagation: the same Listing 2 as on slide 8 (a belief-propagation message passing procedure for network alignment), with Step 6 relabeled "approx matching".
- 10. Approximation doesn't hurt the belief propagation algorithm
  - Setup: randomly perturb one power-law graph to get A and B. Generate L from the true match plus random edges. The amount of randomness in L is measured in average expected degree.
  - [Plots (two panels): fraction of correct match vs. expected degree of noise in L (p ⋅ n), for MR, ApproxMR, BP, and ApproxBP. Annotation: BP and ApproxBP are indistinguishable.]
  - [Caption from the paper: Fig. 2. Alignment with a power-law graph shows the large effect that approximate rounding can have on solutions from Klau's method (MR). With that method, using exact rounding will yield the identity matching for all problems (bottom figure), whereas using the approximation results in over a ...]
- 11. The methods (in more detail)
  - Belief Propagation. For i = 1 to ...:
    - update x[i] to y[i]
    - compute a max-weight match with y
    - save y if it is the best result so far
  - Klau's Matching Relaxation. For i = 1 to ...:
    - update x[i] to y[i]
    - compute a max-weight match with y
    - update y[i] to x[i] based on the match
  - The matching is incidental to the BP method, but integral to Klau's MR method.
- 12. On real-world problems, ApproxMR isn't so different
  - On a protein-protein alignment problem, there is little difference with exact vs. approximate matching.
  - [Plot: overlap vs. weight for BP, AppBP, MR, and AppMR. Upper bound on overlap: 381. Upper bound on matching (max weight): 671.551.]
- 13. Algorithmic analysis
  - Exact runtime: matrix + matching, with matrix ≪ matching: $O(|E_L| + |S|) + O(|E_L| N \log N)$
  - Our approx. runtime: matrix + approx. matching: $O(|E_L| + |S|) + O(|E_L|)$
  - Algorithmic parameters: $|E_L|$ is the number of edges in L; $|S|$ is the number of potential overlaps.
- 14. A local dominating edge method for bipartite matching
  - A locally dominating edge is an edge heavier than all neighboring edges.
  - For bipartite graphs: work on the smaller side only.
  - The method guarantees a ½ approximation and a maximal matching.
  - Based on work by Preis (1999), Manne and Bisseling (2008), and Halappanavar et al (2012).
- 15. A local dominating edge method for bipartite matching
  - Queue all vertices.
  - Until the queue is empty, in parallel over vertices:
    - Match to the heavy edge, and if there's a conflict, check the winner and find an alternative for the loser.
    - Add the endpoints of non-dominating edges to the queue.
- 16. A local dominating edge method for bipartite matching
  - Customized first iteration (with all vertices).
  - Use OpenMP locks to update choices.
  - Use sync_and_fetch_add for queue updates.
- 17. Remaining multi-threading procedures are straightforward
  - Standard OpenMP for matrix computations: use dynamic scheduling, schedule(dynamic), to handle skew.
  - We can batch the matching procedures in the BP method for additional parallelism. For i = 1 to ...:
    - update x[i] to y[i]
    - save y[i] in a buffer
    - when the buffer is full, compute a max-weight match for all y in the buffer and save the best
- 18. Real-world data sets
  - Table II: for each problem in our bioinformatics and ontology sets, we report the number of vertices in graph A and B, the number of edges in the graph L, and the number of nonzeros in S.
    - dmela-scere: $|V_A|$ = 9,459; $|V_B|$ = 5,696; $|E_L|$ = 34,582; $|S|$ = 6,860
    - homo-musm: $|V_A|$ = 3,247; $|V_B|$ = 9,695; $|E_L|$ = 15,810; $|S|$ = 12,180
    - lcsh-wiki: $|V_A|$ = 297,266; $|V_B|$ = 205,948; $|E_L|$ = 4,971,629; $|S|$ = 1,785,310
    - lcsh-rameau: $|V_A|$ = 154,974; $|V_B|$ = 342,684; $|E_L|$ = 20,883,500; $|S|$ = 4,929,272
  - Algorithmic parameters: $|E_L|$ is the number of edges in L; $|S|$ is the number of potential overlaps.
  - Our approx. runtime: matrix + approx. matching: $O(|E_L| + |S|) + O(|E_L|)$
  - Paper excerpt shown on the slide (cropped): "... order to match vertices. We experimented with an initialization algorithm tailored for bipartite graphs by spawning threads only from one of the vertex sets $V_A$ or $V_B$ to identify locally dominant edges. If the thread is responsible for matching a vertex in $V_A$, then it has to check the adjacency sets of the vertices in $V_B$ that are adjacent to it in order to determine if the ..."
- 19. Performance evaluation
  - Machine: (2x4)-10 core Intel E7-8870, 2.4 GHz (80 cores); 16 GB memory/proc (128 GB).
  - Scaling study:
    - Thread binding: scattered vs. compact
    - Memory binding: interleaved vs. bind
  - [Diagram: eight CPUs, each with its own memory bank.]
- 20. Scaling BP with no batching
  - lcsh-rameau, 400 iterations, scatter and interleave.
  - 1450 seconds for 1 thread; 115 seconds for 40 threads.
  - [Plot: speedup vs. threads (0 to 80).]
- 21. Scaling BP with no batching
  - [Plot: speedup vs. threads (0 to 80) for four configurations: compact and interleave, compact and membind, scatter and interleave, scatter and membind.]
- 22. Scaling
  - [Plots (three panels): BP with no batching, Klau's MR method, and BP with batch=20. Each shows speedup vs. threads (0 to 80) for compact and interleave, compact and membind, scatter and interleave, and scatter and membind.]
  - In all cases, we get a speedup of around 12-15 on 40 cores with scatter threads and interleaved memory.
- 23. Summary & Conclusions
  - Tailored algorithm for approx. max-weight bipartite matching
  - Algorithmic improvement in network alignment methods
  - Multi-threaded C++ code for network alignment: 415 seconds -> 10 seconds (40-times overall speedup)
  - For large problems, interactive network alignment is possible.
  - Future work: memory control, improved methods.
  - Code and data available! www.cs.purdue.edu/~dgleich/codes/netalignmc
  - Work supported by DOE CSCAPES Institute grant (DE-FC02-08ER25864), NSF CAREER grant 1149756-CCF, and the Center for Adaptive Super Computing Software Multithreaded Architectures (CASS-MT) at PNNL. PNNL is operated by Battelle Memorial Institute under contract DE-AC06-76RL01830.
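
The objective on slide 3 (α · matching + β · overlap) can be made concrete with a small sketch. It is written in Python for brevity, not taken from the talk's C++ code, and the function name, argument layout, and toy graphs are all assumptions made for illustration. It assumes an overlap is a pair of matched edges of L whose endpoints are adjacent in both A and B, which is how slide 2 describes it.

```python
from itertools import combinations


def alignment_objective(matching, weights, edges_a, edges_b, alpha=1.0, beta=1.0):
    """Score a 1-1 matching between vertices of A and vertices of B.

    matching: dict mapping a vertex of A to its matched vertex of B
    weights:  dict mapping (a, b) pairs (the edges of L) to their weight
    edges_a, edges_b: iterables of undirected edges (u, v) in A and in B
    Returns (objective, matching weight, number of overlaps).
    """
    set_a = {frozenset(e) for e in edges_a}
    set_b = {frozenset(e) for e in edges_b}

    # Matching term: total weight of the edges of L used by the matching.
    match_weight = sum(weights[(a, b)] for a, b in matching.items())

    # Overlap term: pairs of matched edges whose endpoints are adjacent
    # in A and whose images are adjacent in B.
    overlap = sum(
        1
        for (a1, b1), (a2, b2) in combinations(matching.items(), 2)
        if frozenset((a1, a2)) in set_a and frozenset((b1, b2)) in set_b
    )
    return alpha * match_weight + beta * overlap, match_weight, overlap


# Hypothetical toy instance: a path in A, a star in B, unit weights on L.
edges_a = [("a1", "a2"), ("a2", "a3")]
edges_b = [("b1", "b2"), ("b1", "b3")]
matching = {"a1": "b1", "a2": "b2", "a3": "b3"}
weights = {(a, b): 1.0 for a, b in matching.items()}
result = alignment_objective(matching, weights, edges_a, edges_b, alpha=1.0, beta=2.0)
print(result)
```

The quadratic loop over matched pairs is only suitable for tiny examples; the talk's methods work with the sparse matrix S of potential overlaps instead.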
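
The queue-based procedure on slides 14 and 15 can be sketched sequentially as follows. This is a minimal single-threaded illustration in Python, not the authors' multithreaded C++ code: it omits the OpenMP locks, the customized first iteration, and the "smaller side only" optimization, and the name `locally_dominant_matching` and the input format are assumptions. Each vertex points at its heaviest free neighbor; when two vertices point at each other, their edge is locally dominant and is matched, and any vertex that was pointing at a newly matched vertex picks an alternative.

```python
from collections import deque


def locally_dominant_matching(edges):
    """Half-approximate max-weight bipartite matching (sequential sketch).

    edges: iterable of (a, b, weight), with a on side A and b on side B.
    Returns a dict mapping each matched A vertex to its B partner.
    """
    adj = {}
    for a, b, w in edges:
        u, v = ("A", a), ("B", b)  # tag sides so labels cannot collide
        adj.setdefault(u, {})[v] = w
        adj.setdefault(v, {})[u] = w

    mate = {}       # matched partner of each matched vertex
    candidate = {}  # heaviest free neighbor each vertex currently points at
    queue = deque() # newly matched vertices whose neighbors need rechecking

    def try_match(u):
        free = [(w, v) for v, w in adj[u].items() if v not in mate]
        v = max(free)[1] if free else None  # ties broken by vertex label
        candidate[u] = v
        if v is not None and candidate.get(v) == u:
            # u and v point at each other: the edge is locally dominant.
            mate[u], mate[v] = v, u
            queue.extend((u, v))

    for u in adj:  # first pass over all vertices
        if u not in mate:
            try_match(u)
    while queue:   # vertices that lost their candidate look for another
        u = queue.popleft()
        for x in adj[u]:
            if x not in mate and candidate.get(x) == u:
                try_match(x)

    return {u[1]: v[1] for u, v in mate.items() if u[0] == "A"}


# Hypothetical 2x2 instance: the heaviest edge (0, 0) is matched first.
example = [(0, 0, 3.0), (0, 1, 2.0), (1, 0, 2.5), (1, 1, 1.0)]
matched = locally_dominant_matching(example)
print(matched)
```

On this instance the sketch returns a matching of weight 4.0 while the optimum is 4.5, which is consistent with the ½-approximation guarantee stated on slide 14.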
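
The batching idea on slide 17 (buffer several BP iterates, then round all of them at once and keep the best) might look like the sketch below. It only illustrates the control flow: the talk's implementation uses OpenMP in C++, whereas this uses a Python thread pool, and `best_of_batched_rounding`, `round_and_score`, and the toy scoring function are hypothetical names introduced here.

```python
from concurrent.futures import ThreadPoolExecutor


def best_of_batched_rounding(iterates, round_and_score, batch_size=20, workers=4):
    """Round buffered iterates in batches and keep the best result.

    iterates:        iterable of weight vectors y, one per BP iteration
                     (each y must not be mutated after it is yielded)
    round_and_score: callable y -> (score, solution); stands in for the
                     max-weight matching followed by objective evaluation
    """
    best_score, best_solution = float("-inf"), None
    buffer = []

    def flush():
        nonlocal best_score, best_solution
        # All matchings in the buffer are independent, so they can run
        # concurrently; this is the extra parallelism batching exposes.
        with ThreadPoolExecutor(max_workers=workers) as pool:
            for score, solution in pool.map(round_and_score, buffer):
                if score > best_score:
                    best_score, best_solution = score, solution
        buffer.clear()

    for y in iterates:
        buffer.append(y)  # save y in the buffer
        if len(buffer) == batch_size:
            flush()       # buffer full: round everything in it
    if buffer:
        flush()           # round whatever is left at the end
    return best_score, best_solution


# Toy stand-in for rounding: score an iterate by the sum of its entries.
toy_iterates = [[1, 2], [5, 2], [3, 3]]
best = best_of_batched_rounding(
    toy_iterates, lambda y: (sum(y), tuple(y)), batch_size=2
)
print(best)
```

As slide 11 notes, the matching is incidental to BP, so deferring it this way does not change the iterates; the same trick does not carry over directly to Klau's MR method, where each update depends on the match.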
