Bastien Boussau
LBBE, CNRS, Université de Lyon
Models of gene duplication, transfer and loss to study genome evolution
Collaborators
Lyon collaborators:
• Adrián Arellano Davín
• Gergely Szöllősi (Budapest)
• Vincent Daubin
• Eric Tannier
• Thomas Bigot
• Magali Semeria
• Manolo Gouy
• Laurent Duret
• Nicolas Lartillot
Austin/Illinois collaborators:
• Siavash Mirarab
• Md. Shamsuzzoha Bayzid
• Tandy Warnow
RevBayes collaborators:
• Sebastian Hoehna
• Michael Landis
• Tracy Heath
• Fredrik Ronquist
• Brian Moore
• John Huelsenbeck
• …
Plan
1. Gene duplications and losses
• Mammalian genomes
2. Gene duplications, losses and transfers
• Fungi and Cyanobacteria
3. A fast approach to dealing with incomplete
lineage sorting
• Birds
4. 2 vignettes
To study genome evolution:
1. One species tree
2. Thousands of gene trees
[Figure: a species tree for species A, B, C and D along a time axis, with a discrete character (a, a, b, a) and a continuous character (0.1, 0.2, 0.2, 0.4) at the tips]
Why our current pipeline can be improved
• Gene alignments:
  • Error prone (genes are short)
  • Point estimates
• Gene trees:
  • Based on alignments
  • Point estimates
• Species trees:
  • Based on gene trees
[Figure: gene trees evolving inside a species tree of A, B, C and D along a time axis, illustrating duplication (D), duplication and loss (DL), lateral gene transfer (LGT) and incomplete lineage sorting (ILS)]
DL: Boussau et al., Genome Research 2013
DL+T: Szöllősi et al., PNAS 2013
ILS: Mirarab et al., Science 2014
PHYLDOG
Input: all gene families (thousands of alignments)
Output: rooted species tree, rooted gene trees, and numbers of duplications and losses
Joint reconstruction of the species tree, gene trees, and numbers of duplications and losses
[Figure: species tree of A, B, C and D with duplications D1 to D6 and losses L1 to L6 placed on its branches]
Boussau et al., Genome Research 2013
Probabilistic models:
• sequence evolution
• gene family evolution
Boussau et al., Genome Research 2013
PHYLDOG: a model of gene duplication and loss
Assumptions
• Genes evolve along the species tree:
  • birth events: duplications (rate of duplication)
  • death events: losses (rate of loss)
• Each gene family evolves independently of other gene families
• Each gene copy evolves independently of other copies
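These assumptions describe a linear birth-death process that runs independently in each gene family along each branch of the species tree. The sketch below simulates the copy number of one family along a single branch and compares the simulated mean with the analytical one. Function names and rate values are hypothetical, and PHYLDOG itself computes likelihoods under this kind of model rather than simulating it.

```python
import math
import random


def simulate_copies(n0, dup_rate, loss_rate, branch_length, rng):
    """Gene copy number at the end of one species-tree branch.

    Linear birth-death process: each copy independently duplicates at
    dup_rate and is lost at loss_rate (both per copy, per unit time).
    """
    n, t = n0, 0.0
    per_copy = dup_rate + loss_rate
    while n > 0 and per_copy > 0:
        t += rng.expovariate(n * per_copy)   # waiting time to the next event
        if t > branch_length:
            break
        if rng.random() < dup_rate / per_copy:
            n += 1                           # duplication (birth)
        else:
            n -= 1                           # loss (death)
    return n


def expected_copies(n0, dup_rate, loss_rate, branch_length):
    """Analytical mean copy number of the linear birth-death process."""
    return n0 * math.exp((dup_rate - loss_rate) * branch_length)


rng = random.Random(1)
sims = [simulate_copies(1, 0.5, 0.3, 1.0, rng) for _ in range(20000)]
print(sum(sims) / len(sims), expected_copies(1, 0.5, 0.3, 1.0))
```

Because every copy behaves independently, the same function can be applied branch by branch, feeding the copy number at the end of a parent branch into each child branch.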
Study of mammalian genome evolution
• Challenging but well-studied phylogeny
• 36 mammalian genomes available in Ensembl v. 57
• About 7,000 gene families
• Correction for poorly sequenced genomes
PHYLDOG finds a good species tree
[Figure: species tree of the 36 mammals inferred by PHYLDOG, with Laurasiatheria, Afrotheria, Xenarthra, Marsupials, Primates and Glires highlighted. Species: Sus scrofa, Felis catus, Ornithorhynchus anatinus, Oryctolagus cuniculus, Loxodonta africana, Mus musculus, Gorilla gorilla, Dipodomys ordii, Monodelphis domestica, Vicugna pacos, Macaca mulatta, Tupaia belangeri, Procavia capensis, Spermophilus tridecemlineatus, Pongo pygmaeus, Tursiops truncatus, Microcebus murinus, Callithrix jacchus, Equus caballus, Erinaceus europaeus, Tarsius syrichta, Choloepus hoffmanni, Ochotona princeps, Cavia porcellus, Pan troglodytes, Bos taurus, Rattus norvegicus, Homo sapiens, Otolemur garnettii, Dasypus novemcinctus, Echinops telfairi, Pteropus vampyrus, Macropus eugenii, Canis familiaris, Sorex araneus, Myotis lucifugus]
Quality of the gene trees
Comparison between:
• PhyML (used for the PhylomeDB and Homolens databases)
• TreeBeST (used for the Ensembl-Compara database)
• PHYLDOG
Two approaches:
• Looking at ancestral genome sizes
• Assessing how well one can recover ancestral syntenies using reconstructed gene trees (Bérard et al., Bioinformatics 2012)
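One way to see why gene tree errors affect ancestral genome sizes: under LCA (parsimony) reconciliation, a single misplaced gene can force a spurious duplication near the root, which inflates the number of genes inferred in ancestral genomes. This is a swapped-in illustration, not PHYLDOG's probabilistic reconciliation; trees are nested tuples, and the gene-naming convention (species name, underscore, copy number) is hypothetical.

```python
def collect_clades(tree, acc):
    """Append every clade of a nested-tuple tree (as a frozenset of leaves) to acc."""
    if isinstance(tree, str):
        clade = frozenset([tree])
    else:
        clade = frozenset().union(*(collect_clades(c, acc) for c in tree))
    acc.append(clade)
    return clade


def count_duplications(gene_tree, species_tree, species_of):
    """Count duplications implied by LCA reconciliation of a rooted gene tree."""
    sp_clades = []
    collect_clades(species_tree, sp_clades)

    def lca(species):
        # smallest species-tree clade containing all the given species
        return min((c for c in sp_clades if species <= c), key=len)

    duplications = 0

    def walk(node):
        nonlocal duplications
        if isinstance(node, str):
            return lca(frozenset([species_of(node)]))
        child_maps = [walk(c) for c in node]
        here = lca(frozenset().union(*child_maps))
        if here in child_maps:      # same species node as a child: duplication
            duplications += 1
        return here

    walk(gene_tree)
    return duplications


def species_of_gene(gene):
    return gene.split("_")[0]       # hypothetical naming: "human_1" -> "human"


species = (("human", "mouse"), "platypus")
correct = (("human_1", "mouse_1"), "platypus_1")
wrong = (("human_1", "platypus_1"), "mouse_1")
print(count_duplications(correct, species, species_of_gene))  # 0
print(count_duplications(wrong, species, species_of_gene))    # 1
```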
[Figure: ancestral genome sizes along the mammalian species tree, reconstructed from PHYLDOG, TreeBeST and PhyML gene trees]
PHYLDOG: better trees for better ancestral genomes
An example gene family
[Figure: one gene family reconstructed by TreeBeST and by PHYLDOG, with genes from Homo sapiens, Pongo pygmaeus, Callithrix jacchus, Mus musculus, Cavia porcellus, Oryctolagus cuniculus, Spermophilus tridecemlineatus, Canis familiaris, Bos taurus, Equus caballus, Monodelphis domestica and Ornithorhynchus anatinus]
Boussau et al., Genome Research 2013
Recent improvements to PHYLDOG
• Easier installation using CMake or a virtual machine
• Better algorithms for gene tree inference
• Better algorithm for the starting species tree
• Faster computations using the Phylogenetic Likelihood Library (PLL, A. Stamatakis group)
• Python scripts to help run the program
2. Gene duplications, losses and transfers
• Fungi and Cyanobacteria
Gene transfers and the quixotic pursuit of the tree of life (TOL)
Doolittle WF, Science 1999
“The monistic concept of a single universal tree appears […] increasingly obsolete. […] [It is] no longer the most scientifically productive position to hold […] [It] accounts for only a minority of observations from genomes.”
Bapteste, O’Malley, Beiko, Ereshefsky, Gogarten, Franklin-Hall, Lapointe, Dupré, Dagan, Boucher, Martin, Biology Direct 2009.
exODT: a model of gene duplication, transfer, and loss
Assumptions
• Genes evolve along the species tree:
  • birth events: duplications (rate of duplication) and transfers (rate of receiving a gene)
  • death events: losses (rate of loss)
• Each gene family evolves independently of other gene families
• Each gene copy evolves independently of other copies
• Transfers can go through unsampled or extinct species
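The event structure of such a model can be sketched as competing exponential clocks. Two simplifying assumptions are ours, not taken from exODT: the transfer rate is attached to each donor copy, and the recipient is drawn uniformly among the branches alive at that time (the slide phrases transfers as a rate of receiving a gene, which may be parameterized differently). All names and rate values are hypothetical.

```python
import random

EVENTS = ("duplication", "transfer", "loss")


def next_event(n_copies, rates, rng):
    """Waiting time and type of the next event among n_copies gene copies.

    Each copy independently undergoes duplication, transfer and loss at
    constant per-copy rates, so the waiting time is exponential with the
    summed rate and the event type is drawn in proportion to its rate.
    """
    weights = [rates[e] for e in EVENTS]
    wait = rng.expovariate(n_copies * sum(weights))
    kind = rng.choices(EVENTS, weights=weights)[0]
    return wait, kind


def pick_recipient(donor, contemporaneous, rng):
    """Recipient branch drawn uniformly among the other branches alive at the
    time of the transfer; these may include unsampled or extinct lineages."""
    candidates = [b for b in contemporaneous if b != donor]
    return rng.choice(candidates)


rng = random.Random(0)
rates = {"duplication": 0.2, "transfer": 0.5, "loss": 0.3}
kinds = [next_event(4, rates, rng)[1] for _ in range(10000)]
print(kinds.count("transfer") / len(kinds))            # close to 0.5
print(pick_recipient("b1", ["b1", "b2", "extinct_b3"], rng))
```

Including extinct or unsampled branches in `contemporaneous` is what allows a transfer to pass through lineages that are absent from the data set.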
Szöllősi et al., Syst. Biol. 2013a
Better gene trees, fewer transfers
[Figure: the usual approach compared with ALE + DTL, for the Robinson-Foulds (RF) distance to the real tree and for the number of transfer events per family]
Szöllősi et al., Syst. Biol. 2013b
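The RF distance used in this comparison counts the bipartitions found in one tree but not in the other. A minimal sketch for unrooted comparison of two trees on the same leaf set, with trees written as nested tuples (a representation chosen here for illustration):

```python
def leaf_set(tree):
    """Leaves of a nested-tuple tree."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaf_set(c) for c in tree))


def bipartitions(tree):
    """Non-trivial bipartitions, each stored as the side that does not contain
    a fixed reference leaf, so the position of the root does not matter."""
    taxa = leaf_set(tree)
    ref = min(taxa)
    splits = set()

    def walk(node):
        below = leaf_set(node)
        if 2 <= len(below) <= len(taxa) - 2:
            splits.add(below if ref not in below else taxa - below)
        if not isinstance(node, str):
            for child in node:
                walk(child)

    walk(tree)
    return splits


def robinson_foulds(t1, t2):
    """Number of bipartitions present in exactly one of the two trees."""
    return len(bipartitions(t1) ^ bipartitions(t2))


true_tree = ((("A", "B"), "C"), ("D", "E"))
estimate = ((("A", "C"), "B"), ("D", "E"))
print(robinson_foulds(true_tree, estimate))  # 2
```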
Application to real data: Cyanobacteria and Fungi
Cyanobacteria
• > 2.4 billion years old
• 40 species
• 1,200 to 4,500 protein-coding genes
• 7,410 gene families
• Transfers are expected
Fungi (Dikarya)
• ~1 billion years old
• 28 species
• 5,200 to 10,000 protein-coding genes
• 11,387 gene families
• Transfers should be less frequent
Both cases:
• fixed species tree; gene trees inferred using the duplication, transfer and loss model
Szöllősi et al., under review
Cyanobacteria: 0.18 transfers per gene
Szöllősi et al., under review
Fungi: 0.07 transfers per gene
Szöllősi et al., under review
Comparing transfer rates
• Cyanobacteria and Fungi differ in their age:
  we can compare normalized numbers of events, T/(T+D), where T and D are the numbers of inferred transfers and duplications
• The Cyanobacteria and Fungi data sets differ in their number of species:
  we can perform rarefaction studies
Szöllősi et al., under review
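Both corrections can be sketched in a few lines. The counts in the usage lines are made up for illustration, and `statistic` is a hypothetical callback standing in for re-running the full reconciliation on a species subset, which is the expensive step not shown here.

```python
import random


def transfer_fraction(n_transfers, n_duplications):
    """Normalized transfer count T/(T+D): the share of gene-birth events that
    are transfers. Both event types accumulate with time, so the ratio is
    easier to compare across clades of different ages than raw counts."""
    total = n_transfers + n_duplications
    return n_transfers / total if total else float("nan")


def rarefy(species, k, n_reps, statistic, rng):
    """Recompute a statistic on n_reps random subsets of k species, e.g. to
    bring a 40-species data set down to the size of a 28-species one."""
    return [statistic(rng.sample(species, k)) for _ in range(n_reps)]


cyano_like = [f"species_{i}" for i in range(40)]          # hypothetical labels
print(transfer_fraction(30, 70))                          # hypothetical counts
print(rarefy(cyano_like, 28, 3, len, random.Random(0)))   # [28, 28, 28]
```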
Comparing transfer rates: similar transfer rates in Fungi and Cyanobacteria
Szöllősi et al., under review
Using transfers to date clades
[Figure: species tree along a time axis, with a transfer between two branches; the relative order of two nodes is marked "?"]
Because a gene can only be transferred between lineages that exist at the same time, the transfers we identify carry information for ordering the nodes of a species tree
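The idea can be made concrete with a small brute-force sketch (our own illustration, not the STRALE algorithm): enumerate the possible orderings of the internal nodes of a fixed species tree and keep those in which every transfer connects two branches that coexist. The tree, node names and transfer are hypothetical.

```python
from itertools import permutations


def consistent_orders(parent, internal, transfers):
    """Oldest-first orderings of internal nodes compatible with the tree and
    with a list of transfers.

    parent: child -> parent map (the root has no entry); leaves are extant
    and younger than every internal node. A branch is named by its child
    node; a transfer (donor, recipient) needs the two branches to coexist,
    i.e. each branch's child must be younger than the other branch's parent.
    """
    orders = []
    for perm in permutations(internal):            # perm[0] is the youngest
        rank = {node: i for i, node in enumerate(perm)}

        def age(node):
            return rank.get(node, -1)              # leaves get rank -1

        if any(age(p) <= age(c) for c, p in parent.items()):
            continue                               # a parent must predate its child
        if all(age(rec) < age(parent[don]) and age(don) < age(parent[rec])
               for don, rec in transfers):
            orders.append(perm[::-1])              # report oldest first
    return orders


# ((a1,a2)X, ((b1,b2)Y, (c1,c2)Z)W)R
parent = {"a1": "X", "a2": "X", "b1": "Y", "b2": "Y", "c1": "Z", "c2": "Z",
          "X": "R", "W": "R", "Y": "W", "Z": "W"}
internal = ["R", "X", "W", "Y", "Z"]
print(len(consistent_orders(parent, internal, [])))             # 8
print(len(consistent_orders(parent, internal, [("a1", "Y")])))  # 5
```

A transfer from the branch leading to a1 into the branch leading to Y forces node X to be older than node Y, which removes three of the eight orderings allowed by the topology alone.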
Bayesian species tree inference
accounting for DTL events
• STRALE:
• A Bayesian probabilistic method that can interpret thousands of
gene trees in terms of:
• speciation events
• duplication events (D)
• transfer events (T)
• loss events (L)
• A method able to estimate the DTL rates
• A method able to reconstruct the species tree
• A method able to order the nodes of the species tree
Simulation to test the species tree reconstruction
• 20 species
• 200 gene families
[Figure: simulated and inferred species trees, with the 20 species labelled 0 to 19]
Conclusion on DTL models
• The use of DTL models shows that the number of gene
transfers has so far been overestimated
• DTL models can be used to study genome evolution
and in particular rates of gene transfer
• DTL models can be used to date the nodes of a species
phylogeny
• DTL models should provide a powerful tool to infer an
accurate account of the history of life
3. A fast approach to dealing with incomplete lineage sorting
• Birds
The multispecies coalescent
Rannala and Yang, Genetics 2003
• Divergence times in the species tree
• Divergence times in the gene trees
• Effective population sizes in the species tree
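The smallest case already shows why gene trees can disagree with the species tree under this model. For three species ((A,B),C) whose internal branch is t coalescent units long, the A and B lineages fail to coalesce on that branch with probability exp(-t); if they fail, each of the three pairs is equally likely to coalesce first, so the gene tree matches the species tree with probability 1 - (2/3)exp(-t). A small check of that formula by simulation (function names are ours):

```python
import math
import random


def p_match(t):
    """Probability that a gene tree matches species tree ((A,B),C) when the
    internal branch is t coalescent units long."""
    return 1.0 - (2.0 / 3.0) * math.exp(-t)


def simulate_match(t, rng):
    """Draw one gene tree under the multispecies coalescent for ((A,B),C)."""
    if rng.expovariate(1.0) < t:     # A and B coalesce within the internal branch
        return True
    # otherwise three lineages reach the ancestral population and the first
    # coalescing pair is uniform among AB, AC and BC
    return rng.randrange(3) == 0


rng = random.Random(42)
t = 0.5
freq = sum(simulate_match(t, rng) for _ in range(50000)) / 50000
print(round(p_match(t), 3), round(freq, 3))
```

Short internal branches (small t) push the matching probability towards 1/3, which is where incomplete lineage sorting makes single gene trees unreliable guides to the species tree.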
Faster alternatives to the multispecies coalescent use fixed gene trees
For example, MP-EST (Liu, Yu and Edwards, 2010)
Input: fixed gene trees
Output: species tree with branch lengths in coalescent units
It has been shown to be statistically consistent under one notable assumption: the gene trees are correct.
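Branch lengths in coalescent units can be read from gene tree frequencies by inverting the three-taxon formula P(match) = 1 - (2/3)exp(-t). MP-EST maximizes a pseudo-likelihood built from rooted triplets across all triples of species; the sketch below handles only a single triplet, where the maximum-likelihood estimate has a closed form, and it inherits the same assumption that the input gene trees are correct.

```python
import math


def triplet_branch_length(n_match, n_total):
    """Maximum-likelihood internal branch length (coalescent units) for one
    rooted triplet, from the number of gene trees that match it.

    Assumes P(match) = 1 - (2/3) exp(-t), with the two discordant topologies
    equally frequent, as under the multispecies coalescent.
    """
    p = n_match / n_total
    if p <= 1.0 / 3.0:
        return 0.0                   # no more concordance than chance
    if p >= 1.0:
        return math.inf              # every gene tree agrees
    return -math.log(1.5 * (1.0 - p))


print(round(triplet_branch_length(750, 1000), 3))   # 0.981
```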
Errors in gene trees decrease the accuracy of estimated species trees
Mirarab, Bayzid and Warnow, Syst. Biol. 2014
Statistical binning
[Figure: gene trees grouped into bins, with the resulting trees analysed by MP-EST]
Mirarab et al., Science 2014
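The first step of statistical binning, grouping genes whose trees have no strongly supported conflict, can be sketched as a greedy colouring of an incompatibility graph. This is a simplification under our own assumptions: each gene is given as a list of bipartitions with bootstrap support, the threshold of 75 is only a default chosen here, and the published method aims for bins of similar size and then re-estimates one supergene tree from the concatenated alignment of each bin.

```python
def compatible(s1, s2, taxa):
    """Two bipartitions (each given as one side, a frozenset) are compatible
    if at least one of the four pairwise intersections is empty."""
    c1, c2 = taxa - s1, taxa - s2
    return not (s1 & s2) or not (s1 & c2) or not (c1 & s2) or not (c1 & c2)


def statistical_bins(genes, taxa, threshold=75):
    """Greedily group genes whose well-supported branches do not conflict.

    genes: name -> list of (split, support); only splits with support
    >= threshold can create conflicts. Genes with the most conflicts are
    placed first, a common greedy colouring heuristic.
    """
    strong = {g: [s for s, sup in splits if sup >= threshold]
              for g, splits in genes.items()}

    def ok(g, h):
        return all(compatible(s, t, taxa) for s in strong[g] for t in strong[h])

    conflicts = {g: sum(not ok(g, h) for h in strong if h != g) for g in strong}
    bins = []
    for g in sorted(strong, key=lambda name: -conflicts[name]):
        for b in bins:
            if all(ok(g, h) for h in b):
                b.append(g)
                break
        else:
            bins.append([g])
    return bins


taxa = frozenset("ABCD")
genes = {
    "g1": [(frozenset("AB"), 90)],
    "g2": [(frozenset("AC"), 95)],
    "g3": [(frozenset("AB"), 80), (frozenset("AC"), 40)],  # weak conflict ignored
}
print(statistical_bins(genes, taxa))   # [['g2'], ['g1', 'g3']]
```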
Statistical binning improves species tree inference
Mirarab et al., Science 2014
Statistical binning also improves the estimation of the gene tree distribution
Mirarab et al., Science 2014
Statistical binning and birds
Jarvis et al., Science 2014
Improving statistical binning: weighted statistical binning
Practice: weighted and unweighted binning have about the same accuracy
Theory: weighted statistical binning can be shown to be statistically consistent; unweighted statistical binning is not.
Mirarab et al., PLoS One, accepted
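Weighting only changes how supergene trees are handed to the summary method: each supergene tree counts once per gene in its bin instead of once per bin. A sketch of building that input list, with hypothetical names and Newick strings used as opaque placeholders:

```python
def summary_input(bins, supergene_trees, weighted=True):
    """Trees to pass to a summary method such as MP-EST.

    bins: list of lists of gene names; supergene_trees: one tree per bin.
    Weighted binning repeats each supergene tree once per gene in its bin;
    unweighted binning uses each supergene tree once.
    """
    trees = []
    for genes_in_bin, tree in zip(bins, supergene_trees):
        trees.extend([tree] * (len(genes_in_bin) if weighted else 1))
    return trees


bins = [["g1", "g3", "g4"], ["g2"]]
trees = ["((A,B),C);", "((A,C),B);"]
print(len(summary_input(bins, trees)), len(summary_input(bins, trees, weighted=False)))  # 4 2
```

With weighting, the summary method sees as many trees as there are genes, which is what keeps the gene tree distribution it receives proportional to the original one.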
4. 2 vignettes
RevBayes
• R-like language
• Model-based phylogenetics
• Many models of sequence evolution
• Models for dating
• Models for phylogeography
• Models for continuous traits
• Models for gene tree/species tree inference
• http://revbayes.net
• Sebastian Hoehna
• Michael Landis
• Tracy Heath
• Fredrik Ronquist
• Nicolas Lartillot
• Brian Moore
• John Huelsenbeck
• …
One more thing...
Conclusions
• We develop methods for gene tree and species
tree inference
• Improvement of gene trees and species trees in the
presence of:
• duplications and losses,
• transfers,
• incomplete lineage sorting
• Parallel algorithms applicable to genome-scale data
• We study the evolution of life, ancient and recent
Thanks!

Earliest Galaxies in the JADES Origins Field: Luminosity Function and Cosmic ...Earliest Galaxies in the JADES Origins Field: Luminosity Function and Cosmic ...
Earliest Galaxies in the JADES Origins Field: Luminosity Function and Cosmic ...
 
GBSN - Biochemistry (Unit 5) Chemistry of Lipids
GBSN - Biochemistry (Unit 5) Chemistry of LipidsGBSN - Biochemistry (Unit 5) Chemistry of Lipids
GBSN - Biochemistry (Unit 5) Chemistry of Lipids
 
in vitro propagation of plants lecture note.pptx
in vitro propagation of plants lecture note.pptxin vitro propagation of plants lecture note.pptx
in vitro propagation of plants lecture note.pptx
 
insect taxonomy importance systematics and classification
insect taxonomy importance systematics and classificationinsect taxonomy importance systematics and classification
insect taxonomy importance systematics and classification
 
NuGOweek 2024 Ghent - programme - final version
NuGOweek 2024 Ghent - programme - final versionNuGOweek 2024 Ghent - programme - final version
NuGOweek 2024 Ghent - programme - final version
 
Hemostasis_importance& clinical significance.pptx
Hemostasis_importance& clinical significance.pptxHemostasis_importance& clinical significance.pptx
Hemostasis_importance& clinical significance.pptx
 
Hemoglobin metabolism_pathophysiology.pptx
Hemoglobin metabolism_pathophysiology.pptxHemoglobin metabolism_pathophysiology.pptx
Hemoglobin metabolism_pathophysiology.pptx
 
4. An Overview of Sugarcane White Leaf Disease in Vietnam.pdf
4. An Overview of Sugarcane White Leaf Disease in Vietnam.pdf4. An Overview of Sugarcane White Leaf Disease in Vietnam.pdf
4. An Overview of Sugarcane White Leaf Disease in Vietnam.pdf
 
role of pramana in research.pptx in science
role of pramana in research.pptx in sciencerole of pramana in research.pptx in science
role of pramana in research.pptx in science
 
Seminar of U.V. Spectroscopy by SAMIR PANDA
 Seminar of U.V. Spectroscopy by SAMIR PANDA Seminar of U.V. Spectroscopy by SAMIR PANDA
Seminar of U.V. Spectroscopy by SAMIR PANDA
 

Models of gene duplication, transfer and loss to study genome evolution

  • 1. Bastien Boussau LBBE, CNRS, Université de Lyon Models of gene duplication, transfer and loss to study genome evolution
  • 2. Collaborators Lyon collaborators: • Adrián Arellano Davín • Gergely Szöllősi (Budapest) • Vincent Daubin • Eric Tannier • Thomas Bigot • Magali Semeria • Manolo Gouy • Laurent Duret • Nicolas Lartillot Austin/Illinois collaborators: • Siavash Mirarab • Md. Shamsuzzoha Bayzid • Tandy Warnow RevBayes collaborators: • Sebastian Hoehna • Michael Landis • Tracy Heath • Fredrik Ronquist • Brian Moore • John Huelsenbeck • …
  • 3. Plan 1. Gene duplications and losses • Mammalian genomes 2. Gene duplications, losses and transfers • Fungi and Cyanobacteria 3. A fast approach to dealing with incomplete lineage sorting • Birds 4. 2 vignettes
  • 4. To study genome evolution: 1. One species tree. 2. Thousands of gene trees. [Figure: a species tree of four species (A, B, C, D) drawn along a time axis, with a discrete character (a, a, b, a) and a continuous character (0.1, 0.2, 0.2, 0.4) observed at the tips.]
  • 6. Why our current pipeline can be improved
  • 7. Why our current pipeline can be improved • Gene alignments: • Error prone (genes are short) • Point estimates
  • 8. Why our current pipeline can be improved • Gene trees: • Based on alignments • Point estimates • Gene alignments: • Error prone (genes are short) • Point estimates
  • 9. Why our current pipeline can be improved • Gene trees: • Based on alignments • Point estimates • Species trees: • Based on gene trees • Gene alignments: • Error prone (genes are short) • Point estimates
  • 11-20. [Figure, built up over these slides: a species tree of species A, B, C, D along a time axis, with gene-tree events labelled D (duplication), DL (duplication and loss), LGT (lateral gene transfer) and ILS (incomplete lineage sorting).] DL: Boussau et al., Genome Research 2013. DL+T: Szöllősi et al., PNAS 2013. ILS: Mirarab et al., Science 2014.
  • 21. PHYLDOG: joint reconstruction of the species tree, gene trees, and numbers of duplications and losses. Input: all gene families (thousands of alignments). Output: a rooted species tree, numbers of duplications (D1 to D6) and losses (L1 to L6) on its branches, and rooted gene trees. [Figure: species tree of species A, B, C, D along a time axis, with branches annotated D1-D6 and L1-L6.] Boussau et al., Genome Research 2013
  • 22. PHYLDOG (continued). Probabilistic models of: • sequence evolution • gene family evolution. Boussau et al., Genome Research 2013
  • 23. PHYLDOG: a model of gene duplication and loss. Assumptions: • Genes evolve along the species tree: • birth events: • duplications (rate of duplication) • death events: • losses (rate of loss) • Each gene family is independent of other gene families • Each gene copy is independent of other copies
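The birth and death assumptions above describe, on each branch, a linear birth-death process for the copies of one gene family. A minimal Python sketch, assuming a single constant duplication rate and a single constant loss rate: it gives the classical closed forms for the mean number of copies and for the probability that a copy leaves no descendant (the kind of quantity a duplication-loss likelihood typically needs, since lineages can vanish before the present), plus a Gillespie simulation to check them. The function names are hypothetical and this is not PHYLDOG's code.

```python
import math
import random

# Hypothetical helper names; a sketch of a linear birth-death process,
# not code taken from PHYLDOG.


def expected_copies(n0: int, t: float, dup: float, loss: float) -> float:
    """Mean number of copies after time t when starting from n0 copies."""
    return n0 * math.exp((dup - loss) * t)


def extinction_prob(t: float, dup: float, loss: float) -> float:
    """Probability that one gene copy leaves no descendant after time t."""
    if math.isclose(dup, loss):
        # Critical case (equal rates): closed form dup*t / (1 + dup*t).
        return dup * t / (1.0 + dup * t)
    e = math.exp((dup - loss) * t)
    return loss * (e - 1.0) / (dup * e - loss)


def simulate_copies(n: int, t: float, dup: float, loss: float,
                    rng: random.Random) -> int:
    """Gillespie simulation: each of the n copies independently duplicates at
    rate `dup` and is lost at rate `loss`, during a time t."""
    while n > 0 and dup + loss > 0:
        wait = rng.expovariate(n * (dup + loss))  # time to the next event
        if wait > t:
            break
        t -= wait
        n += 1 if rng.random() < dup / (dup + loss) else -1
    return n


# Monte Carlo check of both formulas on one branch of length 1.
rng = random.Random(1)
draws = [simulate_copies(1, 1.0, 0.4, 0.2, rng) for _ in range(20000)]
print("mean copies:", sum(draws) / len(draws), "vs", expected_copies(1, 1.0, 0.4, 0.2))
print("P(extinct):", draws.count(0) / len(draws), "vs", extinction_prob(1.0, 0.4, 0.2))
```

Simulating along a whole species tree amounts to running `simulate_copies` on each branch and handing every surviving copy to both daughter branches at each speciation.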
  • 24. Study of mammalian genome evolution • Challenging but well-studied phylogeny • 36 mammalian genomes available in Ensembl v. 57 • About 7000 gene families • Correction for poorly sequenced genomes
  • 25. PHYLDOG finds a good species tree [Figure: species tree inferred by PHYLDOG for 36 mammals (Sus scrofa, Felis catus, Ornithorhynchus anatinus, Oryctolagus cuniculus, Loxodonta africana, Mus musculus, Gorilla gorilla, Dipodomys ordii, Monodelphis domestica, Vicugna pacos, Macaca mulatta, Tupaia belangeri, Procavia capensis, Spermophilus tridecemlineatus, Pongo pygmaeus, Tursiops truncatus, Microcebus murinus, Callithrix jacchus, Equus caballus, Erinaceus europaeus, Tarsius syrichta, Choloepus hoffmanni, Ochotona princeps, Cavia porcellus, Pan troglodytes, Bos taurus, Rattus norvegicus, Homo sapiens, Otolemur garnettii, Dasypus novemcinctus, Echinops telfairi, Pteropus vampyrus, Macropus eugenii, Canis familiaris, Sorex araneus, Myotis lucifugus), with the clades Laurasiatheria, Afrotheria, Xenarthra, Marsupials, Primates and Glires labelled.]
  • 26. Quality of the gene trees. Comparison between: • PhyML (used for the PhylomeDB and Homolens databases) • TreeBeST (used for the Ensembl-Compara database) • PHYLDOG. Two approaches: • Looking at ancestral genome sizes • Assessing how well one can recover ancestral syntenies using reconstructed gene trees (Bérard et al., Bioinformatics 2012)
  • 27. PHYLDOG: better trees for better ancestral genomes [Figure: ancestral genome sizes (axes from 0 to 10000) estimated at internal nodes of the mammalian species tree from PHYLDOG, TreeBeST and PhyML gene trees.]
  • 28. An example gene family [Figure: the same gene family as reconstructed by TreeBeST and by PHYLDOG, with copies from Ornithorhynchus anatinus, Mus musculus, Cavia porcellus, Oryctolagus cuniculus, Canis familiaris, Bos taurus, Homo sapiens, Pongo pygmaeus, Equus caballus, Callithrix jacchus, Monodelphis domestica and Spermophilus tridecemlineatus.] Boussau et al., Genome Research 2013
  • 29. Recent improvements to PHYLDOG • Easier installation using CMake or a virtual machine • Better algorithms for gene tree inference • Better algorithm for the starting species tree • Faster computations using the Phylogenetic Likelihood Library (PLL, A. Stamatakis group) • Python scripts to help run the program
  • 30. Plan 1. Gene duplications and losses • Mammalian genomes 2. Gene duplications, losses and transfers • Fungi and Cyanobacteria 3. A fast approach to dealing with incomplete lineage sorting • Birds 4. 2 vignettes
  • 31. [Figure: species tree of species A, B, C, D along a time axis.] DL: Boussau et al., Genome Research 2013. DL+T: Szöllősi et al., PNAS 2013. ILS: Mirarab et al., Science 2014.
  • 32. [Figure: species tree of species A, B, C, D along a time axis, with gene-tree events labelled D, DL, LGT and ILS.] DL: Boussau et al., Genome Research 2013. DL+T: Szöllősi et al., PNAS 2013. ILS: Mirarab et al., Science 2014.
  • 33. Gene transfers and the quixotic pursuit of the tree of life (TOL). Doolittle WF, Science 1999
  • 35. Gene transfers and the quixotic pursuit of the TOL. Doolittle WF, Science 1999. “The monistic concept of a single universal tree appears […] increasingly obsolete. […][It is] no longer the most scientifically productive position to hold[…][It] accounts for only a minority of observations from genomes.” Bapteste, O’Malley, Beiko, Ereshefsky, Gogarten, Franklin-Hall, Lapointe, Dupré, Dagan, Boucher, Martin, Biology Direct 2009.
  • 36. exODT: a model of gene duplication, transfer, and loss. Assumptions: • Genes evolve along the species tree: • birth events: • duplications (rate of duplication) • transfers (rate of receiving a gene) • death events: • losses (rate of loss) • Each gene family is independent of other gene families • Each gene copy is independent of other copies • Transfers can go through unsampled/extinct species
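These assumptions can be illustrated with a forward simulation of one gene family on a dated species tree. A minimal Python sketch under our own simplifying assumptions, not the exODT implementation: rates are constant over the tree, a transfer adds a copy to a recipient drawn uniformly among the other sampled branches alive at that moment (the donor keeps its copy), and, unlike exODT, transfers through unsampled or extinct species are not modelled. `Species` and `simulate_dtl` are hypothetical names.

```python
from __future__ import annotations

import math
import random
from dataclasses import dataclass, field


@dataclass
class Species:
    """Hypothetical branch type. `end` is the time (root = 0, forward in time)
    at which the branch ends: a speciation, or the present for tips."""
    name: str
    end: float
    children: list[Species] = field(default_factory=list)


def simulate_dtl(root: Species, dup: float, transfer: float, loss: float,
                 rng: random.Random) -> dict[str, int]:
    """Start with one copy on the root branch; return copy numbers at the tips.
    Per copy: duplication at rate `dup`, transfer at rate `transfer`,
    loss at rate `loss`. Transfers to unsampled lineages are ignored here."""
    alive = {root.name: root}
    copies = {root.name: 1}
    tips: dict[str, int] = {}
    t = 0.0
    per_copy = dup + transfer + loss
    while alive:
        next_split = min(sp.end for sp in alive.values())
        rate = sum(copies.values()) * per_copy
        wait = rng.expovariate(rate) if rate > 0 else math.inf
        if t + wait >= next_split:
            # Speciation (or present): each child inherits all copies.
            t = next_split
            for name in [n for n, sp in alive.items() if sp.end == next_split]:
                sp, n = alive.pop(name), copies.pop(name)
                if not sp.children:
                    tips[name] = n
                for child in sp.children:
                    alive[child.name], copies[child.name] = child, n
            continue
        t += wait
        # Pick the affected copy uniformly, i.e. a branch weighted by its copies.
        name = rng.choices(list(copies), weights=list(copies.values()))[0]
        u = rng.random() * per_copy
        if u < dup:
            copies[name] += 1
        elif u < dup + transfer:
            recipients = [other for other in alive if other != name]
            if recipients:  # no contemporaneous branch: nothing happens
                copies[rng.choice(recipients)] += 1
        else:
            copies[name] -= 1
    return tips


tree = Species("root", 0.5, [
    Species("AB", 1.0, [Species("A", 2.0), Species("B", 2.0)]),
    Species("CD", 1.5, [Species("C", 2.0), Species("D", 2.0)]),
])
print(simulate_dtl(tree, dup=0.2, transfer=0.3, loss=0.3, rng=random.Random(7)))
```

Because the donor keeps its copy, a transfer behaves like a duplication whose new copy lands on another branch; this is why both are listed above as birth events.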
  • 37. exODT: a model of gene duplication, transfer, and loss. Szöllősi et al., Syst. Biol. 2013a
  • 39. Better gene trees, fewer transfers [Figure: Robinson-Foulds (RF) distance to the real tree, usual approach vs. ALE + DTL.] Szöllősi et al., Syst. Biol. 2013b
  • 40. Better gene trees, fewer transfers [Figure: two panels comparing the usual approach with ALE + DTL: transfer events per family, and RF distance to the real tree.] Szöllősi et al., Syst. Biol. 2013b
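The RF distance used on these slides counts the bipartitions (branches) present in one tree but not in the other. A minimal Python sketch, assuming binary trees written as nested tuples with unique leaf labels (for gene trees, gene copy identifiers rather than species names); the helper names are hypothetical and the distance is left unnormalized.

```python
from __future__ import annotations

# Hypothetical helper names; unnormalized Robinson-Foulds distance.


def splits(tree) -> set[frozenset[str]]:
    """Non-trivial bipartitions of a tree given as nested tuples of leaves.
    Each split is stored as the side excluding a fixed reference leaf, so the
    rooting implied by the nesting does not matter."""
    clusters: list[frozenset[str]] = []

    def walk(node) -> frozenset[str]:
        if isinstance(node, str):
            return frozenset([node])
        leaves = frozenset().union(*(walk(child) for child in node))
        clusters.append(leaves)
        return leaves

    taxa = walk(tree)
    ref = min(taxa)
    result = set()
    for cluster in clusters:
        side = cluster if ref not in cluster else taxa - cluster
        if 2 <= len(side) <= len(taxa) - 2:  # drop trivial splits
            result.add(side)
    return result


def rf_distance(t1, t2) -> int:
    """Number of bipartitions found in exactly one of the two trees."""
    return len(splits(t1) ^ splits(t2))


true_tree = ((("A", "B"), "C"), ("D", "E"))
print(rf_distance(true_tree, ((("A", "C"), "B"), ("D", "E"))))
```

Here the inferred tree groups A with C instead of A with B, so one split is missing and one is extra, giving a distance of 2.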
  • 41. Application to real data: Cyanobacteria and Fungi. Cyanobacteria: • > 2.4 billion years old • 40 species • 1,200 to 4,500 protein-coding genes • 7,410 gene families. Fungi (Dikarya): • ~1 billion years old • 28 species • 5,200 to 10,000 protein-coding genes • 11,387 gene families. Both cases: • fixed species tree, gene trees inferred using the Duplication, Transfer and Loss model. Szöllősi et al., under review
  • 42. Application to real data: Cyanobacteria and Fungi (continued). Cyanobacteria: transfers are expected. Fungi: transfers should be less frequent. Szöllősi et al., under review
  • 45. Cyanobacteria: 0.18 transfers per gene. Szöllősi et al., under review
  • 46. Fungi Szöllősi et al., under review
  • 48. Fungi: 0.07 transfers per gene. Szöllősi et al., under review
  • 49. Comparing transfer rates • Cyanobacteria and Fungi differ in their age, so raw event counts are not directly comparable; since both transfers (T) and duplications (D) accumulate with time, we can compare normalized numbers of events: T/(T+D) • The Cyanobacteria and Fungi data sets differ in their number of species: we can perform rarefaction studies. Szöllősi et al., under review
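Both corrections are simple to express. A minimal Python sketch with hypothetical function names: `transfer_proportion` computes T/(T+D), and `rarefied_species_sets` draws random species subsets of a common size, each of which would then be re-analysed. The counts in the usage line are made up for illustration and are not results from the study.

```python
import random

# Hypothetical helper names; counts below are illustrative only.


def transfer_proportion(n_transfers: float, n_duplications: float) -> float:
    """T / (T + D): both counts grow with the time spanned by the tree,
    so their ratio can be compared between clades of different ages."""
    total = n_transfers + n_duplications
    if total == 0:
        raise ValueError("no duplications or transfers to compare")
    return n_transfers / total


def rarefied_species_sets(species: list[str], k: int, n_reps: int,
                          rng: random.Random) -> list[list[str]]:
    """Draw n_reps random subsets of k species (without replacement), so that
    a larger data set can be analysed at the size of a smaller one."""
    if not 0 < k <= len(species):
        raise ValueError("k must be between 1 and the number of species")
    return [sorted(rng.sample(species, k)) for _ in range(n_reps)]


print(transfer_proportion(30, 70))  # made-up counts -> 0.3
cyano = [f"cyano_{i}" for i in range(40)]  # placeholder names for 40 species
subsets = rarefied_species_sets(cyano, 28, 5, random.Random(0))
print(len(subsets), len(subsets[0]))
```

Subsampling the 40 Cyanobacteria down to 28 species matches the size of the Fungi data set, so that differences in taxon sampling do not drive the comparison.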
  • 50. Comparing transfer rates Szöllősi et al., under review
  • 51. Similar transfer rates in Fungi and Cyanobacteria Szöllősi et al., under review
  • 52. Using transfers to date clades? [Figure: species tree drawn along a time axis.]
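  • 58. Using transfers to date clades? [Figure: species tree drawn along a time axis.] A gene can only be transferred between lineages that exist at the same time, so because we can identify gene transfers, we have information for ordering the nodes of a species tree.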
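The ordering argument can be illustrated by scoring candidate node orderings. A minimal Python sketch, assuming node times are given (root at time 0, present at the largest time) and that each branch is named after its child node; the tree, times and transfer are hypothetical, and this is not STRALE's algorithm, which is Bayesian. It only counts how many inferred transfers a candidate ordering can accommodate.

```python
from __future__ import annotations

# Hypothetical names and data; a sketch, not STRALE.


def coexist(times: dict[str, float], parent: dict[str, str],
            donor: str, recipient: str) -> bool:
    """Branches are named after their child node and span
    (times[parent[node]], times[node]). A transfer needs the donor and
    recipient branches to overlap in time."""
    start = max(times[parent[donor]], times[parent[recipient]])
    end = min(times[donor], times[recipient])
    return start < end


def count_compatible(times: dict[str, float], parent: dict[str, str],
                     transfers: list[tuple[str, str]]) -> int:
    """Number of (donor, recipient) transfers consistent with the node times."""
    return sum(coexist(times, parent, d, r) for d, r in transfers)


# Species tree ((A,B)x,(C,D)y)root; leaves are at the present (time 3).
parent = {"x": "root", "y": "root", "A": "x", "B": "x", "C": "y", "D": "y"}
leaves = {"A": 3.0, "B": 3.0, "C": 3.0, "D": 3.0}
x_first = {"root": 0.0, "x": 1.0, "y": 2.0, **leaves}  # x older than y
y_first = {"root": 0.0, "x": 2.0, "y": 1.0, **leaves}  # y older than x
transfers = [("A", "y")]  # hypothetical transfer from branch A into branch y
print(count_compatible(x_first, parent, transfers),
      count_compatible(y_first, parent, transfers))
```

A transfer from branch A into branch y is only possible if node x is older than node y, so the first ordering accommodates it and the second does not.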
  • 59. Bayesian species tree inference accounting for DTL events • STRALE: • A Bayesian probabilistic method that can interpret thousands of gene trees in terms of: • speciation events • duplication events (D) • transfer events (T) • loss events (L) • A method able to estimate the DTL rates • A method able to reconstruct the species tree • A method able to order the nodes of the species tree
  • 60. Simulation to test the species tree reconstruction • 20 species • 200 gene families [Figure: simulated vs. inferred species trees.]
  • 61. Conclusion on DTL models • The use of DTL models shows that the number of gene transfers has so far been overestimated • DTL models can be used to study genome evolution and in particular rates of gene transfer • DTL models can be used to date the nodes of a species phylogeny • DTL models should provide a powerful tool to infer an accurate account of the history of life
  • 62. Plan 1. Gene duplications and losses • Mammalian genomes 2. Gene duplications, losses and transfers • Fungi and Cyanobacteria 3. A fast approach to dealing with incomplete lineage sorting • Birds 4. 2 vignettes
  • 63. [Figure: species tree of species A, B, C, D along a time axis.] DL: Boussau et al., Genome Research 2013. DL+T: Szöllősi et al., PNAS 2013. ILS: Mirarab et al., Science 2014.
  • 64. [Figure: species tree of species A, B, C, D along a time axis, with gene-tree events labelled D, DL, LGT and ILS.] DL: Boussau et al., Genome Research 2013. DL+T: Szöllősi et al., PNAS 2013. ILS: Mirarab et al., Science 2014.
  • 65. The multispecies coalescent (Rannala and Yang, Genetics 2003) • Divergence times in the species tree • Divergence times in the gene trees • Effective population sizes in the species tree
  • 66. Faster alternatives to the multispecies coalescent use fixed gene trees. E.g.: MP-EST (Liu, Yu and Edwards, 2010). Input: fixed gene trees. Output: species tree with branch lengths in coalescent units. Has been shown to be consistent, under one notable assumption: gene trees are correct.
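As we understand it, MP-EST summarizes fixed gene trees by their rooted triplets and fits the species tree with a pseudo-likelihood built on the triplet probabilities of the multispecies coalescent. A minimal Python sketch of the two ingredients, not MP-EST itself: counting rooted triplets in gene trees, and the closed-form link between triplet frequencies and an internal branch length x (in coalescent units) for a single species triplet ((A,B),C). The gene trees and helper names are hypothetical. It also shows why gene tree errors matter: every misplaced triplet shifts the estimate of x.

```python
from __future__ import annotations

import math
from collections import Counter

# Hypothetical helper names and data; a sketch, not MP-EST.


def clades(tree) -> list[frozenset[str]]:
    """Leaf sets of all internal clades of a rooted tree (nested tuples)."""
    out: list[frozenset[str]] = []

    def walk(node) -> frozenset[str]:
        if isinstance(node, str):
            return frozenset([node])
        leaves = frozenset().union(*(walk(child) for child in node))
        out.append(leaves)
        return leaves

    walk(tree)
    return out


def rooted_triplet(tree, a: str, b: str, c: str) -> frozenset[str] | None:
    """The pair among {a, b, c} that are closest relatives, or None."""
    cl = clades(tree)
    for x, y, z in ((a, b, c), (a, c, b), (b, c, a)):
        if any(x in s and y in s and z not in s for s in cl):
            return frozenset((x, y))
    return None


def triplet_probabilities(x: float) -> tuple[float, float, float]:
    """P(AB|C), P(AC|B), P(BC|A) for species triplet ((A,B),C)."""
    minor = math.exp(-x) / 3.0
    return 1.0 - 2.0 * minor, minor, minor


def internal_branch_length(p_match: float) -> float:
    """Inverse of the first probability; valid for 1/3 < p_match < 1."""
    return -math.log(1.5 * (1.0 - p_match))


gene_trees = [
    ((("A", "B"), "C"), "D"),
    ((("A", "B"), "D"), "C"),
    ((("A", "C"), "B"), "D"),
    (("A", "B"), ("C", "D")),
]
counts = Counter(rooted_triplet(g, "A", "B", "C") for g in gene_trees)
p_ab = counts[frozenset(("A", "B"))] / len(gene_trees)
print(p_ab, internal_branch_length(p_ab))
```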
  • 67. Errors in gene trees decrease the accuracy of estimated species trees Mirarab, Bayzid and Warnow, Syst. Biol 2014
  • 69-71. Statistical binning [Figure: diagram of statistical binning, labelled MP-EST.] Mirarab et al., Science 2014
  • 72. Statistical binning improves species tree inference. Mirarab et al., Science 2014
  • 73. Statistical binning also improves the estimation of the gene tree distribution. Mirarab et al., Science 2014
  • 74. Statistical binning and birds. Jarvis et al., Science 2014
  • 75. Improving statistical binning: weighted statistical binning. Mirarab et al., PLoS One, accepted
  • 76. Improving statistical binning: weighted statistical binning. Practice: weighted binning and unweighted binning have about the same accuracy. Theory: weighted statistical binning can be shown to be consistent; unweighted statistical binning is not. Mirarab et al., PLoS One, accepted
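The binning step can be sketched as follows. This is a minimal Python sketch, assuming the rule as we read it from Mirarab et al. (2014): two genes go to different bins when their trees contain incompatible branches that both have support at or above a threshold. The original method uses a balanced graph colouring heuristic, replaced here by a simple greedy one. The genes in each bin would then be concatenated into a supergene; in weighted binning, each supergene tree is used once per gene in its bin. Names and supports are hypothetical.

```python
from __future__ import annotations

# Hypothetical helper names; the greedy colouring stands in for the
# balanced colouring of the original method.


def compatible(s1: frozenset[str], s2: frozenset[str], taxa: frozenset[str]) -> bool:
    """Two bipartitions (each given by one side) are compatible if at least
    one of the four intersections of their sides is empty."""
    c1, c2 = taxa - s1, taxa - s2
    return not (s1 & s2 and s1 & c2 and c1 & s2 and c1 & c2)


def conflict(gene1: list, gene2: list, taxa: frozenset[str], threshold: float) -> bool:
    """Genes are lists of (split, support) pairs. Two genes conflict if a
    branch of each, both with support >= threshold, are incompatible."""
    strong1 = [s for s, support in gene1 if support >= threshold]
    strong2 = [s for s, support in gene2 if support >= threshold]
    return any(not compatible(a, b, taxa) for a in strong1 for b in strong2)


def greedy_bins(genes: dict[str, list], taxa: frozenset[str],
                threshold: float) -> list[list[str]]:
    """Each gene joins the smallest existing bin holding no gene it
    conflicts with, or opens a new bin."""
    bins: list[list[str]] = []
    for name, gene in genes.items():
        candidates = [b for b in bins
                      if not any(conflict(gene, genes[o], taxa, threshold) for o in b)]
        if candidates:
            min(candidates, key=len).append(name)
        else:
            bins.append([name])
    return bins


def weighted_inputs(bins: list[list[str]]) -> list[int]:
    """Weighted binning: bin i's supergene tree is used once per gene in bin i."""
    return [i for i, b in enumerate(bins) for _ in b]


taxa = frozenset("ABCDE")
ab, ac, de = frozenset("AB"), frozenset("AC"), frozenset("DE")
genes = {  # made-up splits and bootstrap supports
    "g1": [(ab, 95), (de, 90)],
    "g2": [(ac, 98), (de, 40)],
    "g3": [(ac, 30), (de, 85)],
    "g4": [(ab, 80)],
}
bins = greedy_bins(genes, taxa, threshold=75)
print(bins, weighted_inputs(bins))
```

Here g2 ends up alone because its well-supported AC branch conflicts with the AB branch of g1 and g4, while g3's AC branch is too weakly supported to matter.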
  • 77. Plan 1. Gene duplications and losses • Mammalian genomes 2. Gene duplications, losses and transfers • Fungi and Cyanobacteria 3. A fast approach to dealing with incomplete lineage sorting • Birds 4. 2 vignettes
  • 78. RevBayes • R-like language • Model-based phylogenetics • Many models of sequence evolution • Models for dating • Models for phylogeography • Models for continuous traits • Models for gene tree/species tree inference • http://revbayes.net • Sebastian Hoehna • Michael Landis • Tracy Heath • Fredrik Ronquist • Nicolas Lartillot • Brian Moore • John Huelsenbeck • …
  • 82. Conclusions • We develop methods for gene tree and species tree inference • Improvement of gene trees and species trees in the presence of: • duplications and losses, • transfers, • incomplete lineage sorting • Parallel algorithms applicable to genome-scale data • We study the evolution of life, ancient and recent
  • 83. RevBayes collaborators: • Sebastian Hoehna • Michael Landis • Tracy Heath • Fredrik Ronquist • Brian Moore • John Huelsenbeck • … Lyon collaborators: • Adrián Arellano Davín • Gergely Szöllősi (Budapest) • Vincent Daubin • Eric Tannier • Thomas Bigot • Magali Semeria • Manolo Gouy • Laurent Duret • Nicolas Lartillot Austin/Illinois collaborators: • Siavash Mirarab • Md. Shamsuzzoha Bayzid • Tandy Warnow Thanks!