The Structure of Level-k Phylogenetic Networks

855 views

Published on

A presentation on the combinatorics of level-k networks (CPM 2009, Lille)

Published in: Education, Technology, Business
0 Comments
0 Likes
Statistics
Notes
  • Be the first to comment

  • Be the first to like this

No Downloads
Views
Total views
855
On SlideShare
0
From Embeds
0
Number of Embeds
2
Actions
Shares
0
Downloads
10
Comments
0
Likes
0
Embeds 0
No embeds

No notes for slide

The Structure of Level-k Phylogenetic Networks

  1. 1. Lille – 24/06/2009 – CPM'09 The Structure of Level-k Phylogenetic Networks Philippe Gambette in collaboration with Vincent Berry, Christophe Paul
  2. 2. Outline • Phylogenetic networks • Decomposition of level-k networks • Construction of level-k generators • Number of level-k generators • Simulated level-k networks
  3. 3. Outline • Phylogenetic networks • Decomposition of level-k networks • Construction of level-k generators • Number of level-k generators • Simulated level-k networks
  4. 4. Phylogenetic networks level-2 split network T-Rex network Level-2 reticulogram SplitsTree minimum synthesis spanning network diagram HorizStory median network Network TCS
  5. 5. Phylogenetic networks level-2 split network T-Rex network Level-2 reticulogram SplitsTree minimum synthesis spanning network diagram HorizStory median network Network TCS
  6. 6. Abstract or explicit networks An explicit phylogenetic network is a phyogenetic network where all reticulations can be interpreted as precise biological events. An abstract network reflects some phylogenetic signals rather than explicitly displaying biological reticulation events. Doolittle : Uprooting the Tree of Life, Scientific American (Feb..2000)
  7. 7. Abstract or explicit networks An explicit phylogenetic network is a phyogenetic network where all reticulations can be interpreted as precise biological events. An abstract network reflects some phylogenetic signals rather than explicitly displaying biological reticulation events. Abstract phylogenetic network of mushroom species, www.splitstree.org
  8. 8. Hierarchy of network subclasses http://www.lirmm.fr/~gambette/RePhylogeneticNetworks.php
  9. 9. Level-k phylogenetic networks level 1 = = level 0 http://www.lirmm.fr/~gambette/RePhylogeneticNetworks.php
  10. 10. Level-k phylogenetic networks Motivation to generalize “galled trees” (= level-1) : Arenas, Valiente, Posada : Characterization of Phylogenetic Reticulate Networks based on the Coalescent with Recombination, Molecular Biology and Evolution, to appear.
  11. 11. Level-k phylogenetic networks A level-k phylogenetic network N on a set X of n taxa is a multidigraph in which: - exactly one vertex has indegree 0 and outdegree 2: the root, - all other vertices have either: - indegree 1 and outdegree 2: split vertices, - indegree 2 and outdegree ≤ 1: reticulation vertices, - or indegree 1 and outdegree 0: leaves labeled by X, - any blob has at most k reticulation vertices. N r r2 r4 All arcs are r1 r3 oriented downwards a b c d e f g h i j k
  12. 12. Level-k phylogenetic networks A level-k phylogenetic network N on a set X of n taxa is a multidigraph in which: - exactly one vertex has indegree 0 and outdegree 2: the root, - all other vertices have either: - indegree 1 and outdegree 2: split vertices, - indegree 2 and outdegree ≤ 1: reticulation vertices, - or indegree 1 and outdegree 0: leaves labeled by X, - any blob has at most k reticulation vertices. N r A blob is a maximal induced connected subgraph with no cut r2 arc. r4 r1 r3 A cut arc is an arc a b c d e f g h i j k which disconnects the graph.
  13. 13. Level-k phylogenetic networks A level-k phylogenetic network N on a set X of n taxa is a multidigraph in which: - exactly one vertex has indegree 0 and outdegree 2: the root, - all other vertices have either: - indegree 1 and outdegree 2: split vertices, - indegree 2 and outdegree ≤ 1: reticulation vertices, - or indegree 1 and outdegree 0: leaves labeled by X, - any blob has at most k reticulation vertices. N r A blob is a maximal induced connected r2 subgraph with no cut r4 arc. r1 r3 A cut arc is an arc a b c d e f g h i j k which disconnects the graph. N has level 2.
  14. 14. Outline • Phylogenetic networks • Decomposition of level-k networks • Construction of level-k generators • Number of level-k generators • Simulated level-k networks
  15. 15. Decomposition of level-k networks We formalize the decomposition into blobs: a b c d e f g h i j k a b c d e f g h i j k N, a level-k network. N decomposed as a tree of simple graph patterns: generators. Generators were introduced by van Iersel & al (Recomb 2008) for the restricted class of simple level-k networks.
  16. 16. Level-k generators A level-k generator is a level-k network with no cut arc. G0 G1 2a 2b 2c 2d The sides of the generator are: - its arcs - its reticulation vertices of outdegree 0
  17. 17. Decomposition theorem of level-k networks N is a level-k network iff there exists a sequence (lj)jϵ[1,r] of r locations (arcs or reticulation vertices of outdegree 0) and a sequence (Gj)jϵ[0,r] of generators of level at most k, such that: - N = Attachk(lr,Gr,Attachk(... Attachk(l2,G2,Attachk(l1,G1,G0))...)), - or N = Attachk(lr,Gr,Attachk(... Attachk(l2,G2,SplitRootk(G1,G0))...)).
  18. 18. Decomposition theorem of level-k networks N is a level-k network iff there exists a sequence (lj)jϵ[1,r] of r locations (arcs or reticulation vertices of outdegree 0) and a sequence (Gj)jϵ[0,r] of generators of level at most k, such that: - N = Attachk(lr,Gr,Attachk(... Attachk(l2,G2,Attachk(l1,G1,G0))...)), - or N = Attachk(lr,Gr,Attachk(... Attachk(l2,G2,SplitRootk(G1,G0))...)). SplitRootk(G1,G0) G0 G1 G0 G1
  19. 19. Decomposition theorem of level-k networks N is a level-k network iff there exists a sequence (lj)jϵ[1,r] of r locations (arcs or reticulation vertices of outdegree 0) and a sequence (Gj)jϵ[0,r] of generators of level at most k, such that: - N = Attachk(lr,Gr,Attachk(... Attachk(l2,G2,Attachk(l1,G1,G0))...)), - or N = Attachk(lr,Gr,Attachk(... Attachk(l2,G2,SplitRootk(G1,G0))...)). li is an arc of N Attachk(li,Gi,N) N Gi li Gi
  20. 20. Decomposition theorem of level-k networks N is a level-k network iff there exists a sequence (lj)jϵ[1,r] of r locations (arcs or reticulation vertices of outdegree 0) and a sequence (Gj)jϵ[0,r] of generators of level at most k, such that: - N = Attachk(lr,Gr,Attachk(... Attachk(l2,G2,Attachk(l1,G1,G0))...)), - or N = Attachk(lr,Gr,Attachk(... Attachk(l2,G2,SplitRootk(G1,G0))...)). li is a reticulation vertex of N Attachk(li,Gi,N) N Gi li Gi
  21. 21. Decomposition theorem of level-k networks N is a level-k network iff there exists a sequence (lj)jϵ[1,r] of r locations (arcs or reticulation vertices of outdegree 0) and a sequence (Gj)jϵ[0,r] of generators of level at most k, such that: - N = Attachk(lr,Gr,Attachk(... Attachk(l2,G2,Attachk(l1,G1,G0))...)), - or N = Attachk(lr,Gr,Attachk(... Attachk(l2,G2,SplitRootk(G1,G0))...)). This decomposition is not unique!
  22. 22. Decomposition theorem of level-k networks Unique “graph-labeled tree” decomposition: ρ 1 1 N 3 2 h1 h2 1 h3 h4 a b c d e f g h i a b c d e f g h i Possible applications: - exhaustive generation of level-k networks - counting of level-k networks
  23. 23. Outline • Phylogenetic networks • Decomposition of level-k networks • Construction of level-k generators • Number of level-k generators • Simulated level-k networks
  24. 24. Construction of the generators Van Iersel & al build the 4 level-2 generators by a case analysis, generalized by Steven Kelk into an exponential algorithm to find all 65 level-3 generators.
  25. 25. Construction of the generators Van Iersel & al give a simple case analysis for level-2. We give rules to build level-(k+1) from level-k generators.
  26. 26. Construction of the generators Construction rules of level-k+1 generators from level k-generators: N e2 e1 h1 h2 Rule R1 : h1 h2 h1 h2 h1 h2 h1 h2 h3 h3 h3 h3 R1(N,h1,h2) R1(N,h1,e2) R1(N,e2,e2) R1(N,e1,e2)
  27. 27. Construction of the generators Construction rules of level-k+1 generators from level k-generators: N e2 e1 h1 h2 Rule R2 : h1 h3 h2 h3 h h3 h2 h1 h1 2 R2(N,h1,e2) R2(N,e1,e1) R2(N,e2,e1)
  28. 28. Construction of the generators Problem! Some of the level-k+1 generators obtained from level-k generators are isomorphic! h1 h2 h1 h2 h3 h3 R1(2b,h1,e2) R1(2b,h2,e1)
  29. 29. Construction of the generators Problem! Some of the level-k+1 generators obtained from level-k generators are isomorphic! h1 h2 h1 h2 h3 h3 R1(N,h1,e2) R1(N,h2,e1) → difficult to count!
  30. 30. Outline • Phylogenetic networks • Decomposition of level-k networks • Construction of level-k generators • Number of level-k generators • Simulated level-k networks
  31. 31. Upper bound R1 and R2 can be applied at most on all pairs of sides A level-k generator has at most 5k slides: gk+1 < 50 k² gk Upper bound: gk < k!² 50k Theoretical corollary: There is a polynomial algorithm to build the set of level- k+1 generators from the set of level-k generators. Practical corollary: g4 < 28350 → it is possible to enumerate all level-4 generators.
  32. 32. Number of level-k generators It is possible to enumerate all level-4 generators. Isomorphism of graphs of bounded valence: polynomial (Luks, FOCS 1980) Practical algorithm? Simple backtracking exponential algorithm sufficient for level 4 : go through both graphs from their root in parallel and identify their vertices: O(n2n-h) → g4 = 1993 → g5 > 71000 http://www.lirmm.fr/~gambette/ProgramGenerators
  33. 33. Number of level-k generators
  34. 34. Lower bound Lower bound: gk ≥ 2k-1 There is an exponential number of generators! Idea: Code every number between 0 and 2k-1-1 by a level-k generator.
  35. 35. Lower bound Lower bound: gk ≥ 2k-1 There is an exponential number of generators! Idea: Code every number between 0 and 2k-1-1 by a level-k generator. 0 1 k=1 0 1
  36. 36. Lower bound Lower bound: gk ≥ 2k-1 There is an exponential number of generators! Idea: Code every number between 0 and 2k-1-1 by a level-k generator. 0 1 2 3 k=2 0 0 1 1 0 1 0 1
  37. 37. Outline • Phylogenetic networks • Decomposition of level-k networks • Construction of level-k generators • Number of level-k generators • Simulated level-k networks
  38. 38. Simulated level-k networks Simulate 1000 phylogenetic networks using the coalescent model with recombination. Arenas, Valiente, Posada 2008 Program Recodon How many are level-1,2,3... networks? nb of networks nb of networks 1000 1000 900 900 800 800 700 n=10,r=1 700 n=50,r=1 n=10,r=2 n=50,r=2 600 600 n=10,r=4 n=50,r=4 500 n=10,r=8 500 n=50,r=8 n=10,r=16 n=50,r=16 400 400 n=10,r=32 n=50,r=32 300 300 200 200 100 100 0 0 12 24 36 12 24 36 0 4 8 16 20 28 32 40 44 48 52 56 60 64 68 72 76 80 84 88 92 96 100 0 4 8 16 20 28 32 40 44 48 52 56 60 64 68 72 76 80 84 88 92 96 100 level level http://www.lirmm.fr/~gambette/ProgramGenerators
  39. 39. Simulated level-k networks Simulate 1000 phylogenetic networks using the coalescent model with recombination. Link between level and number of reticulations: level 9 5 0 number of reticulations 0 5 9 http://www.lirmm.fr/~gambette/ProgramGenerators
  40. 40. Summary on the level parameter Advantages: • natural structure for all explicit phylogenetic networks • global tree-structure used algorithmically • finite graph patterns to represent blobs: generators Limits: • number of generators exponential in the level • complex structure of generators • when recombination is not local, level doesn't help
  41. 41. Questions? Thank you for your attention! TreeCloud T-Rex TreeCloud Split network and reticulogram of the 50 most frequent words in CPM titles, hyperlex cooccurrence distance. TreeCloud available at http://treecloud.org - Slides available at http://www.lirmm.fr/~gambette/RePresentationsENG.php

×