• Share
  • Email
  • Embed
  • Like
  • Save
  • Private Content
Iterative methods for network alignment
 

Iterative methods for network alignment

on

  • 409 views

 

Statistics

Views

Total Views
409
Views on SlideShare
409
Embed Views
0

Actions

Likes
0
Downloads
4
Comments
1

0 Embeds 0

No embeds

Accessibility

Categories

Upload Details

Uploaded via as Adobe PDF

Usage Rights

© All Rights Reserved

Report content

Flagged as inappropriate Flag as inappropriate
Flag as inappropriate

Select your reason for flagging this presentation as inappropriate.

Cancel

11 of 1 previous next

  • Full Name Full Name Comment goes here.
    Are you sure you want to
    Your message goes here
    Processing…
  • great
    Are you sure you want to
    Your message goes here
    Processing…
Post Comment
Edit your comment

    Iterative methods for network alignment Iterative methods for network alignment Presentation Transcript

    • Iterative Methods for Network Alignment David F. Gleich Computer Science Purdue University with Arif Khan, Alex Pothen ! Purdue University, Computer ScienceWork supported by DOE CSCAPES Institute grant (DE-FC02-08ER25864), NSF CAREER grant 1149756-CCF, Mahantesh Halappanavar!and the Center for Adaptive Super Computing Software Pacific Northwest National LabsMultithreaded Architectures (CASS-MT) at PNNL. Mohsen Bayati, Amin Saberi!Stanford’s CADS grant from the Library of Congress.PNNL is operated by Battelle Memorial Institute under Stanford Universitycontract DE-AC06-76RL01830 Ying Wang Google 1
    • Network alignment"What is the best way of matching "graph A to B? w v s r t u A B 2
    • the Figure 2. The NetworkBLAST local network alignment algorithm. Given two inputs) orodes lem Network alignment" networks, a network alignment graph is constructed. Nodes in this graph correspond to pairs of sequence-similar proteins, one from each species, and edges correspond to conserved interactions. A search algorithm identifies highly similar subnetworks that follow a prespecified interaction pattern. Adapted from Sharan and Ideker.30n the ent;nied ped lem net- one oneplest ying einsome the be-d as aphever, ap- From Sharan and Ideker, Modeling cellular machinery through biologicalrked network comparison. Nat. Biotechnol. 24, 4 (Apr. 2006), 427–433. , we Figure 3. Performance comparison of computational approaches. 3mon-
    • Gleich (Purdue) Network alignment"d F. Gleich (Purdue) Motivation DavidINFORMS (Purdue) F. Gleich Seminar Network alignment 8 / 40 Motivation INFORMS Semin INFORMS Sem H/Wikipedia: Simple alignment famm 40 60 80 100 mm 120 40 60 80 100 mm40 40 6040 80 j 100 r s60 60 t80 40 t 80 A L B A LCSH 297,266 vertices, 248,230 edges B Wikipedia 205,948 vertices, 382,353 edges L links 4,971,629 edges 60 4
    • Sometimes small procedure on the labels of the two datahe size of these datasets is reported in Table 6.5. For this aome from a text-matching becomes big … Table 6.5 size of non-Web datasets. The product graph is never formed explicitly Dataset Size Nonzeros LCSH-2 59,849 227,464 WC-3 70,509 403,960 Product graph 4,219,893,141 91,886,357,440 periment, we do not investigate all the issues involved in using … Ananth has some better techniques to work with these large problems … d problem and focus on the performance of the inner-outer al nking context. Without any parameter optimization (i.e., usi ), the inner-outer scheme shows a significant performance advd in Table 6.6. 5
    • Network alignment"What is the best way of matching "graph A to B using only edges in L? w v s r wtu t u A L B 6
    • Network alignment"Matching? 1-1 relationship"Best? highest weight and overlap w v Overlap s r wtu t u A L B 7
    • Network alignment"… is NP-hard"… has no approximation algorithm w vr Overlap s •  Computer Vision •  Ontology matching •  Database matching wtu •  Bioinformatics t u A L Bobjective = α matching + βoverlap 8
    • Network alignment"via mathematical programming maximize ↵wT x + 2 xT Sx w v s subject to Ax  e, xi 2 {0, 1} r Overlap Let xi be an indicator over edges in L wtu t u If A is the node-edge incidence matrix for L, then x is a 1-1 matching A L B Find a 1-1 matching between vertices with as many overlaps as possible. 9
    • Network alignment"via mathematical programming maximize ↵wT x + 2 xT Sx w v s subject to Ax  e, xi 2 {0, 1} r Overlap Let xi be an indicator over edges in L wtu t u Let Sij = 1 when xi and xj overlap, then xTSx is twice the overlapped count. A L B Find a 1-1 matching between vertices with as many overlaps as possible. 10
    • Our contributionsA new belief propagation method (Bayati et al. 2009, 2013)"Outperformed state-of-the-art PageRank and optimization-based heuristic methodsHigh performance C++ implementations (Khan et al. 2012)"40 times faster (C++ ~ 3, complexity ~ 2, threading ~ 8)"5 million edge alignments ~ 10 sec"www.cs.purdue.edu/~dgleich/codes/netalignmc 11
    • Iterative methods " for network alignment Each iteration involves Let x[i] be the score forMatrix-vector-ish computations each pair-wise match in Lwith a sparse matrix, e.g. sparsematrix vector products in a semi- for i=1 to ...ring, dot-products, axpy, etc. update x[i] to y[i]Bipartite max-weight matching compute ausing a different weight vector at max-weight match with yeach iteration update y[i] to x[i]" (using match in MR)No “convergence” "100-1000 iterations 13
    • Open question 1!Any sort of property of "these methods beyond ... (i) Principled derivation and "(ii) “David and Ananth say they work”? 14
    • David F. Gleich (Purdue) Algorithms INFORMS Seminar 25 / 40 Belief propagation methodsBelief propagation: Our algorithm mm 40 60 80 100 120Summary History … Construct a probability … BP used for computing model where the most 40 marginal probabilities and likely state is the solution! maximum aposterori … Locally update information probability … Like a generalized dynamic … Wildly successful at solving program 60 satisfiability problems … Convergent algorithm for … It works max-weight matching … Most likely, it won’t 80 converge 15 Bayati et al. 2005;
    • David F. Gleich (Purdue) Algorithms INFORM Belief propagation for The neighbor operation used to define the left-hand vector x@fi is implicitly defined b NetAlign factor graph: Loopy BP the set of variables used on the right-hand side of the equation. In words, the functio network alignment node fi (gi0 ) enforces the matching constraint at i (i0 ) Another type of function nodes check the validity of squares. For each square ii0 ⇤ j mm define a function node hii0 jj 0 : {0, 1}40|+|S| ! R: 60 |EL 80 100 ( 1 xii0 jj 0 = xii0 xjj 0 Functions Variables hii0 jj 0 x@hii0 jj0 = for all (ii0 , jj 0 ) 2 VS . A B 0 otherwise ƒ1 11 0 In other words, 1 ii0 jj 0 40 h ƒ2 guarantees that xii0 jj 0 = 1 if and only if xii0 = xjj 0 = 1. 10 12 0 The edges of the factor graph are simply0 connecting each g0 function node to the variab 1 22nodes it acts on. For example each fi is connected to all variable nodes ii0 2 EL and eac 2 20 0 0 0 0 0hii0 jj 0 is connected to ii , jj and ii jj in EL 23 0 [ V . Thereforeg2 factor graph is bipartit the S g0 Figure 3 shows an example of a graph pair A, B and their factor-graph representation a 3described above. 60 30 110 220 Now define the following probability distribution h110 220 2 3 n m 1 4Y Y Y T T p(xL , xS ) = fi (x@fi ) gj (x@gj ) hijrs (x@hijrs )5 e↵w xL + 2 1|S| xS (4 Z i=1 80 j=1 ijrs2VSwhere Z is just a normalization term to make p(xL , xS ) a probability distribution. I 16 Note It’s pretty hairy to put all the stuff I should put here on a single slide. Most of it is in the papparticular, The rest is just “turning the crank” with standard tricks in BP algorithms. 2 3
    • Algorithms INFORMS Seminar 26 / 40 M !j { = s} = Y i Mj0 ! { = s}60 80 100 120 j0 2{N( )j} j variable tells function j what it thinks about being in state s. This is just the product of what all the other functions tell about being in state s. i Mj! { = s} = m xim mns y:all possible choices 17 for variables 0 2N(j)
    • variable tells function j what it thinks about being in state s. This is just the product of what all the other functions tell about being in state s. i Mj! { = s} = m xim m ns y:all possible choices for variables 0 2N(j) 2 3es j Y 6 0 7 4ƒj (y) M 0 !j { = y 0 }5 0 2{N(j) } function j tells variable what it thinks about being in state s. This means that wel have to locally maxamize ƒj among all possible choices. Note y = s always (too cumbersome to include in notation.) 18
    • Belief propagation for network alignment For t 1, the messages in iteration t are obtained from the messages initeration t 1 recursively. In particular for all ii 0 2 EL ✓ h i ◆+ (t) (t 1) mii 0 !fi = ↵wii 0 max mki 0 !g 0 k 6=i i X ✓ ◆ (t 1) + min , max(0, + mjj 0 !h 0 ) . (1) 2 2 ii jj 0 ii 0 jj 0 2VS (t)The update rule for mii 0 !g 0 is similar, and i ✓ h i ◆+ ✓ h i ◆+ (t) (t 1) (t 1) mii 0 !h 0 0 = ↵wii 0 max mki 0 !g 0 max mik 0 !fi ii jj k6=i i k 0 6=i 0 X ✓ ◆ (t 1) + min , max(0, mkk 0 !h 0 0 + ) . (2) 0 0 2 ii kk 2 kk 6=jj ii 0 kk 0 2VS
    • Synthetic evaluation of network alignment 1 0.8 fraction correct 0.6 0.4 MR 0.2 BP BPSC IsoRank 0 10 15 20 0 5 10 15 20degree of noise in L (p ⋅ n) expected degree of noise in L (p ⋅ n)
    • Open question 2!When could we hope to solve suchsynthetic problems in asymptoticregimes? 21
    • Does it work? LCSH – Library of Congress subject headings Network Alignment :25 Rameau – French National Library subject headingsTable IV. The alignment results for LCSH and Rameau. The first set of results shows the statistics of the known Manually matchedalignment and the results from the max-weight matching algorithm. Next we show results from our algorithms forthree objective parameters. The columns are: objective parameters, algorithms, matching weight, matching edgeoverlap, time, total correct, recall, precision, and matching triangle overlap. Obj. Alg. Weight Overlap Time (s) Correct Rec. Prec. Triangles Sol. 36332.42 39847 — 57645 100% 100% 2073 MWM 93279.0 16990 29.6 29098 50.5% 23.3% 350 ↵ = 1, =1 BP" MP 84622.0 46400 23522.0 32585 56.5% 27.6% 1515 BP++" MP++ 85810.1 46942 27115.6 32857 57.0% 27.4% 1548 MR MR 87588.6 48367 33366.9 33225 57.6% 27.0% 1617 ↵ = 1, =2 BP" MP 81752.6 46569 23427.1 31724 55.0% 27.6% 1483 BP++" MP++ 84615.7 46656 26673.1 31952 55.4% 26.7% 1531 MR MR 85438.4 48934 56961.6 32303 56.0% 26.3% 1604 ↵ = 0, =1 BP" MP 60617.9 45247 14284.8 24794 43.0% 23.2% 1467 BP++" MP++ 60502.8 41592 13979.5 24498 42.5% 23.0% 1484 MR MR 65994.2 46163 10384.4 25455 44.2% 21.5% 1602 22 protein-protein interaction networks and ontologies. In the future, we envision applications of these techniques in mapping large social network structure.
    • Open question 3!How can we evaluate alignments? "What are possible null-models? 23
    • David F. Gleich (Purdue) Results INFORMS Seminar 35 / 40Matching results: A little too hot! mm 40 60 LCSH WC 80 100 120 Science fiction television series Science fiction television programs Turing test Turing test Machine learning Machine learning Hot tubs Hot dog 40 60 80 24
    • Higher-order " } This proposal is for match- network alignment using tensor ing triangles methods: k in 0 j Triangle jg this i k i 0 0 k euris- using A L B istics If xi , xj , and xk are algo- indicators associated with 25volves the edges (i, i0 ), (j, j 0 ), and
    • Network alignment" A L B This proposal is for programming via mathematical match- ing triangles using tensor methods:n maximize ↵wT x + 2 xT Sx j Triangle j0 subject to Ax  e, xi 2 {0, 1}s i k 0 i 0 k -g A L Bs If xi , xj , and xk are - indicators1-1 matching between vertices with Find a associated withs theas many overlaps0 ), and edges (i, i0 ), (j, j as possible. 26o (k, k 0 ), then we want to
    • Triangle alignment" A L B This proposal is for programming via mathematical match- ing triangles using tensor methods:n ↵wT x + 2 xT Sx X j Triangle j0 maximize + Tijk xi xj xks i k0 i0 ijk k subject to Ax  e, xi 2 {0, 1} -g A L Bs If xi , xj , and xk are - indicators1-1 matching between vertices with Find a associated withs edges (i, i0 ), (j, j and triangles as possible. theas many overlaps0 ), and 27o (k, k 0 ), then we want to
    • Tensor eigenvalues" A L B This proposal is for match- and a power method ing triangles using tensor methods: P maximize ijk Tijk xi xj xkn subject to kxk2 = 1 j Triangle j0s i k0 i0 [x(next) ]i = ⇢ · ( X Tijk xj xk + xi ) k - jk where 𝜌 ensures the 2-normg SSHOPM method due to " A L B Kolda and Mayos IfHuman,protein interaction networks 48,228 triangles xi xj , and xk are - indicatorsinteraction networks with triangles Yeast protein associated nonzeros257,978 The tensor T has ~100,000,000,000s the We work with it i0 ), (j, j 0 ), and edges (i, implicitly 28o (k, k 0 ), then we want to
    • Synthetic evaluation of network alignment 1 1 0.8 0.8 fraction correctfraction correct 0.6 0.6 0.4 0.4 0.2 Eigen MR Teigen 0.2 BP Iso 0 BPSC 0 5 10 15 20 expected degree of noise in L (p n) IsoRank 0 10 15 20 0 5 10 15 20degree of noise in L (p ⋅ n) expected degree of noise in L (p ⋅ n)
    • Open question 4!When do we need triangles? 30