Overview of the approaches I co-developed to reconstruct species trees and gene trees in the presence of gene duplications, losses and transfers, or of incomplete lineage sorting. Includes PHYLDOG, ALE, MP-EST*, and RevBayes.
2. Collaborators
• Lyon collaborators:
• Adrián Arellano Davín
• Gergely Szöllősi (Budapest)
• Eric Tannier
• Vincent Daubin
• Thomas Bigot
• Magali Semeria
• Manolo Gouy
• Laurent Duret
• Austin collaborators:
• Siavash Mirarab
• Md. Shamsuzzoha Bayzid
• Tandy Warnow
• RevBayes collaborators:
• Sebastian Hoehna
• Michael Landis
• Tracy Heath
• Fredrik Ronquist
• Brian Moore
• John Huelsenbeck
• …
3. To study genome evolution:
1. One species tree
2. Thousands of gene trees
[Figure: species tree of species A, B, C and D drawn along a time axis, with a discrete character (a, a, b, a) and a continuous character (0.1, 0.2, 0.2, 0.4) observed at the tips]
8. Why our current pipeline can be improved
• Gene alignments:
  • Error-prone
  • Short
  • Point estimates
• Gene trees:
  • Based on alignments
  • Point estimates
• Species trees:
  • Based on gene trees
19. [Figure: gene trees evolving within a species tree of species A, B, C and D drawn along a time axis, under duplication (D), duplication and loss (DL), lateral gene transfer (LGT), and incomplete lineage sorting (ILS)]
DL: Boussau et al., Genome Research 2013
DL+T: Szöllősi et al., PNAS 2013
ILS: Mirarab et al., Science 2014
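The D and DL scenarios can be made concrete with the classical parsimony (LCA) reconciliation, sketched minimally below. Each gene-tree node is mapped to the lowest common ancestor of its species. A node mapped to the same species-tree node as one of its children counts as a duplication, and each skipped species-tree level counts as a loss. This textbook technique is swapped in for illustration only and is not PHYLDOG's probabilistic reconciliation. The nested-tuple trees, the hypothetical `Species_copy` gene naming, and the function names are assumptions, and both trees must be binary.

```python
def index_species_tree(tree):
    """Map every species-tree node (nested tuples, string leaves) to its parent and depth."""
    parent, depth = {}, {}
    stack = [(tree, None, 0)]
    while stack:
        node, par, d = stack.pop()
        parent[node], depth[node] = par, d
        if isinstance(node, tuple):
            for child in node:
                stack.append((child, node, d + 1))
    return parent, depth


def lca(a, b, parent, depth):
    """Lowest common ancestor of two species-tree nodes."""
    while depth[a] > depth[b]:
        a = parent[a]
    while depth[b] > depth[a]:
        b = parent[b]
    while a != b:
        a, b = parent[a], parent[b]
    return a


def reconcile(gene_tree, species_tree):
    """LCA reconciliation of a binary gene tree; returns (duplications, losses)."""
    parent, depth = index_species_tree(species_tree)
    counts = {"D": 0, "L": 0}

    def mapping(node):
        if not isinstance(node, tuple):
            return node.split("_")[0]  # hypothetical "Species_copy" gene names
        left, right = mapping(node[0]), mapping(node[1])
        m = lca(left, right, parent, depth)
        if m == left or m == right:  # duplication node
            counts["D"] += 1
            counts["L"] += (depth[left] - depth[m]) + (depth[right] - depth[m])
        else:  # speciation node: each skipped level is a loss
            counts["L"] += (depth[left] - depth[m] - 1) + (depth[right] - depth[m] - 1)
        return m

    mapping(gene_tree)
    return counts["D"], counts["L"]


species = (("A", "B"), "C")
print(reconcile(((("A_1", "B_1"), ("A_2", "B_2")), "C_1"), species))  # (1, 0)
print(reconcile((("A_1", "C_1"), "B_1"), species))                    # (1, 3)
```

The second call shows why parsimony is fragile. A single misplaced leaf, which could be an alignment or tree-building error, is explained by one duplication and three losses. This is one motivation for reconstructing gene trees and the species tree jointly.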
21. PHYLDOG
Input: all gene families (thousands of alignments)
Output: rooted species tree, rooted gene trees, and numbers of duplications and losses
Joint reconstruction of the species tree, gene trees, and numbers of duplications and losses
Probabilistic models:
• sequence evolution
• gene family evolution
[Figure: species tree of species A, B, C and D drawn along a time axis, with duplications labelled D1 to D6 and losses labelled L1 to L6]
Boussau et al., Genome Research 2013
26. Gene transfers and the quixotic pursuit of the tree of life (TOL)
Doolittle WF, Science 1999
“The monistic concept of a single universal tree appears […] increasingly obsolete. […] [It is] no longer the most scientifically productive position to hold […] [It] accounts for only a minority of observations from genomes.”
Bapteste, O’Malley, Beiko, Ereshefsky, Gogarten, Franklin-Hall, Lapointe, Dupré, Dagan, Boucher, Martin, Biology Direct 2009.
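35. Using transfers to date clades
[Figure: species tree drawn along a time axis]
Because we can identify gene transfers, we have information for ordering the nodes of a species tree. A transfer can only occur between lineages that coexist, so the donor and recipient branches must overlap in time.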
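Each transfer rules out some node orderings because donor and recipient branches must overlap in time. The sketch below enumerates every ranking of the internal nodes of a small species tree that respects both the tree and a list of transfers. The tree, the node names, the branch-naming convention (a branch is named after the node at its lower end), and the brute-force enumeration are all assumptions for illustration. This is not the method of Szöllősi et al., which weighs transfers probabilistically rather than treating them as hard constraints.

```python
from itertools import permutations

# Hypothetical species tree ((A,B)X,(C,(D,E)Z)Y)R, stored as child -> parent.
PARENT = {"A": "X", "B": "X", "C": "Y", "D": "Z", "E": "Z",
          "X": "R", "Y": "R", "Z": "Y"}
INTERNAL = ["R", "X", "Y", "Z"]


def compatible_orderings(parent, internal, transfers):
    """Return every ordering of internal nodes (oldest first) allowed by the
    tree and by the transfers.

    Leaves are contemporaneous (age 0). Each transfer is a (donor, recipient)
    pair, where a branch is named after the node at its lower end.
    """
    allowed = []
    for order in permutations(internal):
        age = dict.fromkeys(parent, 0)  # leaves stay at age 0
        age.update({node: len(order) - i for i, node in enumerate(order)})
        # A parent must be older than each of its children.
        if any(age[p] <= age[c] for c, p in parent.items()):
            continue
        # Donor and recipient branches must overlap in time.
        if all(age[parent[d]] > age[r] and age[parent[r]] > age[d]
               for d, r in transfers):
            allowed.append(order)
    return allowed


print(len(compatible_orderings(PARENT, INTERNAL, [])))       # 3
print(compatible_orderings(PARENT, INTERNAL, [("A", "Z")]))
# [('R', 'X', 'Y', 'Z'), ('R', 'Y', 'X', 'Z')]
print(compatible_orderings(PARENT, INTERNAL, [("A", "Z"), ("X", "C")]))
# [('R', 'Y', 'X', 'Z')]
```

The topology alone allows three relative orders of X, Y and Z. One transfer from the branch above A into the branch above Z forces X to be older than Z. A second transfer from the branch above X into the branch above C fixes the order completely.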
36. Bayesian species tree inference accounting for DTL events
• STRALE:
  • A Bayesian probabilistic method that can interpret thousands of gene trees in terms of:
    • speciation events
    • duplication events (D)
    • transfer events (T)
    • loss events (L)
  • A method able to estimate the DTL rates
  • A method able to reconstruct the species tree
  • A method able to order the nodes of the species tree
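To make D, T and L events concrete, here is a minimal Gillespie-style simulation of gene copies along a single species-tree branch. Each copy duplicates, transfers a copy to another lineage, or is lost, at constant per-copy rates. The rate values, the single-branch setting, the untracked recipient lineage, and the function name are assumptions for illustration. This is not STRALE's model or code.

```python
import random


def simulate_branch(length, dup_rate, transfer_rate, loss_rate, copies=1, seed=None):
    """Gillespie simulation of D, T and L events on one species-tree branch.

    Returns the number of surviving copies and a count of each event type.
    A transfer sends a new copy to another (untracked) lineage, so it does not
    change the copy number on this branch.
    """
    rng = random.Random(seed)
    events = {"D": 0, "T": 0, "L": 0}
    per_copy = dup_rate + transfer_rate + loss_rate
    t = 0.0
    while copies > 0 and per_copy > 0:
        t += rng.expovariate(copies * per_copy)  # waiting time to the next event
        if t > length:
            break
        u = rng.random() * per_copy  # choose the event type in proportion to its rate
        if u < dup_rate:
            events["D"] += 1
            copies += 1
        elif u < dup_rate + transfer_rate:
            events["T"] += 1
        else:
            events["L"] += 1
            copies -= 1
    return copies, events


# Hypothetical rates per copy per time unit.
print(simulate_branch(1.0, dup_rate=0.5, transfer_rate=0.3, loss_rate=0.4, seed=1))
```

Inference runs in the opposite direction. A method of this kind must find the D, T and L rates and the species tree that make the observed gene trees most probable, while summing over the many event histories that could have produced them.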
40. Better gene trees, fewer transfers
[Figure: usual approach vs. ALE (+DTL), compared by RF distance to the real tree and by the number of transfer events per family]
Szöllősi et al., Syst. Biol. 2013
Better ancestral genomes: go see Adrián Arellano Davín’s poster on reconstructing ancestral genomes across the tree of life!
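The comparison scores gene trees by their Robinson-Foulds (RF) distance to the real tree, which counts the bipartitions (splits) present in one tree but not in the other. The minimal sketch below works on unrooted trees written as nested tuples over the same leaf set. The encoding, the function names, and the use of the raw, unnormalized count are assumptions and may differ from the measure plotted in Szöllősi et al., Syst. Biol. 2013.

```python
def bipartitions(tree):
    """Non-trivial splits of a tree, read as unrooted.

    Each split is stored as the side that excludes a fixed reference leaf.
    """
    clades = []

    def walk(node):
        if isinstance(node, str):
            clade = frozenset([node])
        else:
            clade = frozenset().union(*(walk(child) for child in node))
        clades.append(clade)
        return clade

    all_leaves = walk(tree)
    ref = min(all_leaves)
    splits = set()
    for clade in clades:
        side = clade if ref not in clade else all_leaves - clade
        if 1 < len(side) < len(all_leaves) - 1:  # both sides need >= 2 leaves
            splits.add(side)
    return splits


def rf_distance(t1, t2):
    """Number of splits found in exactly one of the two trees."""
    return len(bipartitions(t1) ^ bipartitions(t2))


t1 = ((("A", "B"), "C"), ("D", "E"))
t2 = ((("A", "C"), "B"), ("D", "E"))
t3 = ((("D", "E"), "C"), ("A", "B"))  # same unrooted tree as t1
print(rf_distance(t1, t2))  # 2
print(rf_distance(t1, t3))  # 0
```

Storing each split as the side without a fixed reference leaf makes the comparison independent of where a tree happens to be rooted.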
50. RevBayes
• Collaborative effort
• Model-based phylogenetics
• Many models of sequence evolution
• Models for dating
• Models for phylogeography
• Models for continuous traits
• Models for gene tree/species tree inference
• http://revbayes.net
• Sebastian Hoehna
• Michael Landis
• Tracy Heath
• Fredrik Ronquist
• Nicolas Lartillot
• Brian Moore
• John Huelsenbeck
• …
51. Conclusions
• We develop methods for gene tree and species tree inference
• Improved gene trees and species trees in the presence of:
  • duplications and losses
  • transfers
  • incomplete lineage sorting
• Parallel algorithms applicable to genome-scale data