LBBE, CNRS, Université de Lyon
Evolutionary genomics
Bastien Boussau
boussau@gmail.com
@bastounette
Chance and necessity
… …
Evolution
Function
…ATCGACATCAGCATCAGCACTAC…
Evolution in our genomes
Brown, Sanger, Kitai. Biochem. J. 1955.
Genomes as Documents of Evolutionary History
What information can we extract from genome sequences?
1. Species phylogeny
2. Phylogeography
3. Diversification history
4. Ancestral lifestyles
5. Selective pressures in extant species
6. Application to cell lineages
Genome evolution
Genome sequence: …ACTCGATCGCATCGACTCCTCCAGC…
Processes:
• point mutations: …ACTCGTTCGCATCGACTCCTCCAGC…
• insertions/deletions: …ACTCGATCGCATCGAAAACTCCTCCAGC…
• duplications/losses: …ACTCGATCGCATCGACTCCTTCCTCCAGC…
• rearrangements: …ACTCGATCGCAAGCTCTCCTCCAGC…
Processes: population genetics, molecular machinery, species phylogeny, environment
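The four kinds of events listed on these slides can be sketched as simple string operations. The snippet below is a minimal illustration, not a model of real mutational processes: the reference and mutated sequences are copied from the slides, while the coordinates, the inserted fragment, and all function names are assumptions inferred by comparing those sequences. The rearrangement is treated as a plain reversal of a segment, as the slide sequence suggests, without complementing the bases.

```python
# Reference sequence copied from the slides.
REFERENCE = "ACTCGATCGCATCGACTCCTCCAGC"


def point_mutation(seq, pos, new_base):
    """Replace the base at index pos with new_base."""
    return seq[:pos] + new_base + seq[pos + 1:]


def insertion(seq, pos, fragment):
    """Insert fragment before index pos."""
    return seq[:pos] + fragment + seq[pos:]


def deletion(seq, start, end):
    """Remove seq[start:end]."""
    return seq[:start] + seq[end:]


def tandem_duplication(seq, start, end):
    """Copy seq[start:end] and paste the copy right after the original."""
    return seq[:end] + seq[start:end] + seq[end:]


def reversal(seq, start, end):
    """Reverse seq[start:end] (no complementing; an assumption of this sketch)."""
    return seq[:start] + seq[start:end][::-1] + seq[end:]


# Coordinates are assumptions, chosen so that the results match the slides.
examples = {
    "point mutation": point_mutation(REFERENCE, 5, "T"),
    "insertion": insertion(REFERENCE, 15, "AAA"),
    "duplication": tandem_duplication(REFERENCE, 16, 20),
    "rearrangement": reversal(REFERENCE, 11, 15),
}

for name, mutated in examples.items():
    print(f"{name:15s} {mutated}")
```

Deleting the inserted fragment again with `deletion(examples["insertion"], 15, 18)` gives back the reference, which is why insertions and deletions are usually modelled together.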
Using genomes for statistical inference
Genome sequence: …ACTCGATCGCATCGACTCCTCCAGC…
Processes: point mutations, insertions/deletions, duplications/losses, rearrangements
Processes: population genetics, molecular machinery, species phylogeny, environment
Inferential statistics
Boussau and Daubin, Trends Ecol. Evol. 2010
• Using computers
• Probabilistic models (e.g. models of sequence evolution)
• “What I cannot create, I do not understand.” (Feynman, 1988)
• “What I cannot simulate, I do not understand.”
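To make the "simulate to understand" idea concrete, here is a minimal sketch of simulating sequence evolution along one branch. It assumes the Jukes-Cantor (JC69) model, which is one of the simplest probabilistic models of substitution and is chosen here only for illustration; the slides do not say which model is meant. The function names and the branch lengths are hypothetical; the ancestral sequence is the one shown on the opening slides.

```python
import math
import random

NUCLEOTIDES = "ACGT"


def jc69_prob_same(branch_length):
    """Probability that a site shows the same base at both ends of a branch.

    branch_length is measured in expected substitutions per site (JC69).
    """
    return 0.25 + 0.75 * math.exp(-4.0 * branch_length / 3.0)


def evolve(sequence, branch_length, rng):
    """Return a descendant of sequence after evolving along one branch."""
    p_same = jc69_prob_same(branch_length)
    descendant = []
    for base in sequence:
        if rng.random() < p_same:
            descendant.append(base)
        else:
            # Under JC69 the three other bases are equally likely.
            descendant.append(rng.choice([b for b in NUCLEOTIDES if b != base]))
    return "".join(descendant)


rng = random.Random(42)  # fixed seed so that the run is repeatable
ancestor = "ATCGACATCAGCATCAGCACTAC"
for branch_length in (0.0, 0.1, 1.0):
    print(branch_length, evolve(ancestor, branch_length, rng))
```

Inference runs this logic in reverse: given observed sequences, it asks which branch lengths and trees make them probable under the model.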
What information can we extract from genome sequences?
1. Species phylogeny
2. Phylogeography
3. Diversification history
4. Ancestral lifestyles
5. Selective pressures in extant species
6. Application to cell lineages
1 Inferring the phylogeny
Models:
• Modelling events of substitution
• In some cases, modelling insertions and deletions
• In some cases, modelling allele sorting
• In some cases, modelling gene duplications, losses and transfers
• In some cases, modelling hybridization
• Dates of speciation can also be inferred with a model of rate evolution
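As a small worked example of what modelling substitutions buys, the sketch below compares the raw proportion of differing sites between two aligned sequences with the distance corrected under the Jukes-Cantor (JC69) model, which accounts for multiple substitutions hitting the same site. JC69 is an assumption chosen for simplicity, and the function names are hypothetical; the two sequences are the reference and point-mutated sequences from the genome evolution slides.

```python
import math


def p_distance(seq_a, seq_b):
    """Proportion of aligned, ungapped sites at which two sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(a, b) for a, b in zip(seq_a, seq_b) if a != "-" and b != "-"]
    if not pairs:
        raise ValueError("no ungapped sites to compare")
    return sum(a != b for a, b in pairs) / len(pairs)


def jc69_distance(seq_a, seq_b):
    """Expected substitutions per site under JC69, given the observed differences."""
    p = p_distance(seq_a, seq_b)
    if p >= 0.75:
        return math.inf  # saturated: the correction is undefined
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


reference = "ACTCGATCGCATCGACTCCTCCAGC"
mutated = "ACTCGTTCGCATCGACTCCTCCAGC"
print("p-distance:", p_distance(reference, mutated))
print("JC69 distance:", jc69_distance(reference, mutated))
```

The corrected distance is always at least as large as the raw proportion, and the gap widens as sequences diverge.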
1 The phylogeny of life
Williams et al., Nature 2013
Improvements in:
• probabilistic models
• data available
1 The origin of viral strains
Boussau, Guéguen, Gouy, Evolutionary Bioinformatics 2009.
The HIV N strain originated through recombination between a human virus and a chimpanzee virus
Trees shown for: first 1353 sites; remaining ~1200 sites
Contagion, Steven Soderbergh, 2011
1 Forensic analyses
Scaduto et al., PNAS 2010
The problem:
• CC01 is an HIV-positive male, accused by several partners of hiding his seropositivity and infecting them in the process ==> trial
• 1 accused male, 6 partners, all seropositive
• HIV sequences available from each of them
• How can we tell whether CC01 likely infected his partners?
Use the HIV sequences to build a phylogenetic tree
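To show how sequences turn into a tree, here is a minimal sketch using UPGMA (average-linkage clustering on pairwise differences). This is a deliberately simple stand-in, not the method of Scaduto et al., and real forensic analyses rely on probabilistic models, many sequences per person, and local control samples. The sequences and the labels other than CC01 are invented toy data, not data from the study.

```python
from itertools import combinations


def hamming(seq_a, seq_b):
    """Number of aligned positions at which two sequences differ."""
    return sum(a != b for a, b in zip(seq_a, seq_b))


def upgma(sequences):
    """Cluster sequences by average pairwise distance; return nested tuples."""
    # Each key is a subtree (a label or a pair of subtrees);
    # each value lists the leaf labels contained in that subtree.
    clusters = {name: [name] for name in sequences}

    def average_distance(pair):
        left, right = pair
        total = sum(
            hamming(sequences[i], sequences[j])
            for i in clusters[left]
            for j in clusters[right]
        )
        return total / (len(clusters[left]) * len(clusters[right]))

    while len(clusters) > 1:
        left, right = min(combinations(clusters, 2), key=average_distance)
        members = clusters.pop(left) + clusters.pop(right)
        clusters[(left, right)] = members
    return next(iter(clusters))


def leaves(tree):
    """Set of leaf labels below a node."""
    if isinstance(tree, str):
        return {tree}
    return leaves(tree[0]) | leaves(tree[1])


# Invented toy sequences (hypothetical, for illustration only).
toy_sequences = {
    "CC01": "ACGTACGTAC",
    "partner1": "ACGTACGTAT",
    "partner2": "ACGTACGAAC",
    "control1": "TTGTACCTGC",
    "control2": "TTGAACCTGC",
}

tree = upgma(toy_sequences)
print(tree)
```

In this toy data set the sequences from CC01 and the two partners group together, apart from the unrelated controls. Clustering alone says nothing about the direction of transmission, which is the harder question in a forensic setting.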
1 Forensic analyses
Scaduto et al., PNAS 2010
Evidence used to establish that CC01 had infected his partners
1 Inferring the phylogeny
Purposes:
• Inferring the species phylogeny
• Reconstructing the evolutionary history of infectious agents
• Reconstructing transmission histories (e.g. forensic analyses)
2 Phylogeography
Question:
• How did these organisms get to be where they are?
Mus musculus, GBIF database
2 Phylogeography
Faria et al., Science 2014
Models:
• Add spatial information at the leaves
• Use discrete models or continuous models to reconstruct ancestral ranges
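The idea of attaching locations to the leaves and reconstructing ancestral ranges can be sketched with Fitch parsimony on a fixed tree. This is a simpler technique swapped in for illustration; the discrete and continuous models mentioned on the slide are probabilistic. The tree, the leaf names, and the area labels below are all hypothetical.

```python
def fitch(tree, leaf_states):
    """Bottom-up Fitch pass.

    Returns (set of candidate states at this node,
             minimum number of state changes in the subtree below it).
    """
    if isinstance(tree, str):
        return {leaf_states[tree]}, 0
    left_set, left_cost = fitch(tree[0], leaf_states)
    right_set, right_cost = fitch(tree[1], leaf_states)
    shared = left_set & right_set
    if shared:
        # The children agree on at least one state: no extra change needed.
        return shared, left_cost + right_cost
    # The children disagree: keep all candidates and count one change.
    return left_set | right_set, left_cost + right_cost + 1


# Hypothetical tree (nested pairs) and hypothetical sampling areas.
toy_tree = ((("A", "B"), "C"), ("D", "E"))
toy_areas = {"A": "area1", "B": "area1", "C": "area2", "D": "area1", "E": "area3"}

root_candidates, changes = fitch(toy_tree, toy_areas)
print("candidate areas at the root:", root_candidates)
print("minimum number of dispersal events:", changes)
```

Probabilistic models replace this counting of changes with rates of movement between areas, which allows uncertainty in the ancestral range to be quantified.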
2 Phylogeography
Landis et al., Syst. Biol. 2014
2 HIV phylogeography
Faria et al., Science 2014
2 Phylogeography
Purposes:
• Inferring the species geographical range through time
• Reconstructing the evolutionary spread of infectious agents
• Investigating plate tectonics
3 Diversification history
How did species diversify? Were there bursts of speciation, or mass extinctions?
How many species/individuals through time?
Models:
• Modelling events of speciation, and events of extinction
• In some cases, the rates of these events can depend on other parameters
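A model of speciation and extinction events can be simulated directly. The sketch below assumes a constant-rate birth-death process, the simplest such model, in which every lineage speciates and goes extinct at fixed rates; the function name, parameter names, and rate values are hypothetical choices for illustration.

```python
import random


def simulate_birth_death(birth_rate, death_rate, max_time, rng, start=1):
    """Simulate lineage counts through time; return a list of (time, count)."""
    time, count = 0.0, start
    history = [(time, count)]
    while count > 0:
        total_rate = count * (birth_rate + death_rate)
        if total_rate <= 0:
            break  # no event can ever happen
        # Waiting time to the next event, whichever lineage it hits.
        time += rng.expovariate(total_rate)
        if time > max_time:
            break
        # The event is a speciation with probability birth / (birth + death).
        if rng.random() < birth_rate / (birth_rate + death_rate):
            count += 1
        else:
            count -= 1
        history.append((time, count))
    return history


rng = random.Random(2014)  # fixed seed so that the run is repeatable
history = simulate_birth_death(birth_rate=1.0, death_rate=0.5, max_time=4.0, rng=rng)
print("events simulated:", len(history) - 1)
print("lineages at the end:", history[-1][1])
```

Making `birth_rate` or `death_rate` a function of time or of a trait would give the kind of parameter-dependent model mentioned in the last bullet.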
3 Species of birds through time
Jetz et al., Nature 2012
3 Speciations of birds across the globe
Jetz et al., Nature 2012
3 Phylodynamics of HCV in Egypt
Drummond et al., MBE 2005
Huge increase in number of viruses coincides with the extensive use of an antischistosomiasis treatment from 1920 to 1980
3 Diversification history
Purposes:
• Inferring the number of species/individuals through time
• Finding major radiation/extinction events
• Reconstructing past epidemics
4 Ancestral lifestyles
How did ancestral species live? What temperature did they like most? How long did they live?
Models:
• Correlating molecular evolution with phenotypic traits
4 Inferring growth temperature across the tree of life
Idea: reconstruct ancestral sequences in silico, and predict ancestral growth temperatures
Usual model: all branches evolve according to the same model
Better model: different models for different branches
4 Inferring growth temperature across the tree of life
Boussau et al., Nature 2008
Late Heavy Bombardment 3.8 Bya?
4 Joint inference of rates, dates, and traits
Lartillot and Delsuc, Evolution 2012
4 Ancestral lifestyles
Purposes:
• Inferring characteristics of ancient organisms
• Inferring characteristics of ancient environments
• Finding/using correlations between phenotype evolution and genotype evolution
5 Selective pressures in extant species
What sites in the genome of species X are important?
Models:
• Usual models of sequence evolution
• Models of insertion-deletion
• Hidden Markov Models that run along the genome
5 Conservation across species indicates function
Brown, Sanger, Kitai. Biochem. J. 1955.
The UCSC genome browser uses conservation across 100 vertebrates to detect functional regions
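The principle that conservation indicates function can be sketched with a per-column score on a multiple alignment. The snippet below uses the fraction of sequences sharing the most common base, followed by a sliding window, which is a crude stand-in for the phylogeny-aware models used in practice. The alignment is invented toy data and the function names are hypothetical.

```python
from collections import Counter


def column_conservation(alignment):
    """Per column, the fraction of sequences carrying the most common character."""
    length = len(alignment[0])
    if any(len(row) != length for row in alignment):
        raise ValueError("all aligned sequences must have the same length")
    scores = []
    for column in zip(*alignment):
        most_common_count = Counter(column).most_common(1)[0][1]
        scores.append(most_common_count / len(column))
    return scores


def conserved_windows(scores, window, threshold):
    """Start indices of windows whose mean conservation reaches the threshold."""
    return [
        start
        for start in range(len(scores) - window + 1)
        if sum(scores[start:start + window]) / window >= threshold
    ]


# Invented toy alignment: four species, eight sites.
toy_alignment = [
    "ACGTTGCA",
    "ACGTAGCA",
    "ACGTCGTA",
    "ACGTGGAA",
]

scores = column_conservation(toy_alignment)
print("per-site conservation:", scores)
print("fully conserved 3-site windows start at:", conserved_windows(scores, 3, 1.0))
```

A score like this ignores how closely related the species are; weighting by the phylogeny is what separates conservation due to selection from conservation due to recent common ancestry.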
5 How to best predict cancer-causing mutations?
Gnad et al., BMC Genomics 2013
Comparison across 12 methods
5 Selective pressures in extant species
Purposes:
• Screening a genome for new functional elements
• Evaluating the severity of candidate mutations (e.g. genetic disease, cancer)
• Finding sites to target in a pest that needs controlling
6 Application to cell lineages
As cells divide by mitosis, mutations accumulate.
==> phylogenetic approaches can be used to address developmental questions:
• How similar is development across individuals?
• Do the first cells produced during development contribute equally to the adult organism?
• …
Models:
• Usual models of sequence evolution
• Models of microsatellite evolution
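Why accumulated mutations record the cell lineage can be shown with a small deterministic sketch. It assumes synchronous divisions, a fixed number of new mutations per daughter cell, and that every mutation is unique and never lost; these are strong simplifications chosen for illustration, and the naming scheme for cells is hypothetical.

```python
def grow_lineage(generations, mutations_per_division):
    """Return {cell_name: frozenset of mutation ids} after synchronous divisions.

    Cell names record ancestry: the zygote is "z", and each division
    appends "0" or "1" to the name of the mother cell.
    """
    cells = {"z": frozenset()}
    next_mutation_id = 0
    for _generation in range(generations):
        daughters = {}
        for name, inherited in cells.items():
            for suffix in "01":
                mutations = set(inherited)
                for _new in range(mutations_per_division):
                    mutations.add(next_mutation_id)  # each mutation is unique
                    next_mutation_id += 1
                daughters[name + suffix] = frozenset(mutations)
        cells = daughters
    return cells


def shared_mutations(cells, name_a, name_b):
    """Number of mutations two cells have in common."""
    return len(cells[name_a] & cells[name_b])


cells = grow_lineage(generations=3, mutations_per_division=2)
print("cells:", sorted(cells))
print("sisters share:", shared_mutations(cells, "z000", "z001"))
print("cousins share:", shared_mutations(cells, "z000", "z011"))
print("descendants of different first daughters share:",
      shared_mutations(cells, "z000", "z100"))
```

The more recent the common ancestor of two cells, the more mutations they share, which is the signal that phylogenetic methods use to reconstruct the lineage tree from sequenced adult cells.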
6 Application to cell lineages
Behjati et al., Nature 2014
Contributions of early embryonic cells to adult tail cell populations
Purposes:
• Learning about development
• Learning about cancer evolution (e.g. using phylogeography to understand cancer spread)
Conclusions
• Genomes contain a lot of information about their history and about how they work
• The comparative approach is a powerful way to learn about the function of a stretch of sequence
• Thanks to probabilistic models, one can exploit the huge amount of information in genomes to ask a large number of interesting questions
Slides available on SlideShare: http://www.slideshare.net/boussau
