The document discusses the role of genomes as records of evolutionary history, outlining various applications of genomic data in inferring phylogenies, phylogeography, diversification, and understanding ancestral lifestyles. It emphasizes the use of probabilistic models and computational tools to extract meaningful insights from genome sequences, including the impact on species evolution and forensic analyses. Overall, it highlights the potential of genomic information to address diverse scientific questions related to evolution and genetics.
Introduction to LBBE, CNRS, Université de Lyon and the themes of chance, necessity, evolution, and function in genomes.
Discussion on evolution documented in genomes with citations from Brown et al. on biochemical aspects.
Genomes interpreted as documents of evolutionary history, extracting information like phylogeny and diversification.
Genome evolution discussed with focus on mutations, processes like insertions, deletions, and their implications.
Use of statistical models for inferring phylogenetic relationships and understanding evolutionary processes.
Models for inferring species phylogeny through genetic sequencing and the application in understanding viral strains.
Exploration of geographic distribution and historical movement of species based on genomic data.
Investigating speciation, extinction events, and the diversification of species through time.
Exploration of ancient organisms and environments based on molecular evolution and correlating traits.
Identifying crucial genomic sites and conservation across species indicating functions, relevant to disease.
Using phylogenetic methods to understand cell lineage development and implications for cancer evolution.
Summation of insights gained from genomes, historical information, and the importance of comparative approaches.
What information can we extract
from genome sequences?
1. Species phylogeny
2. Phylogeography
3. Diversification history
4. Ancestral lifestyles
5. Selective pressures in extant species
6. Application to cell lineages
Genome evolution
Genome sequence
point mutations, insertions/deletions, duplications/losses, rearrangements
Processes
population genetics, molecular machinery, species phylogeny, environment
…ACTCGATCGCATCGACTCCTCCAGC…
Using genomes for
statistical inference
Inferential statistics
Boussau and Daubin, TREE 2010
• Using computers
• Probabilistic models (e.g. models of sequence evolution)
• “What I cannot create, I do not understand.” (Feynman, 1988)
• “What I cannot simulate, I do not understand.”
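The remark about simulation can be made concrete with a minimal sketch, assuming the Jukes-Cantor (JC69) substitution model, which the slides do not name. The function names, the branch length of 0.1 substitutions per site, the sequence length and the random seed are all illustrative assumptions. The sketch evolves a random sequence along a single branch and compares the observed fraction of changed sites with the model's expectation.

```python
import math
import random

BASES = "ACGT"

def jc69_change_probability(branch_length):
    """Probability that a site shows a different base after `branch_length`
    expected substitutions per site, under the JC69 model (assumed here)."""
    return 0.75 * (1.0 - math.exp(-4.0 * branch_length / 3.0))

def evolve(sequence, branch_length, rng):
    """Return a descendant of `sequence` evolved along one branch under JC69."""
    p_change = jc69_change_probability(branch_length)
    descendant = []
    for base in sequence:
        if rng.random() < p_change:
            # Under JC69 the three other bases are equally likely.
            base = rng.choice([b for b in BASES if b != base])
        descendant.append(base)
    return "".join(descendant)

# Hypothetical example: a random 10,000-site ancestor and one descendant.
rng = random.Random(42)
ancestor = "".join(rng.choice(BASES) for _ in range(10_000))
child = evolve(ancestor, 0.1, rng)
observed = sum(a != b for a, b in zip(ancestor, child)) / len(ancestor)
print(f"expected fraction of differing sites: {jc69_change_probability(0.1):.3f}")
print(f"observed fraction of differing sites: {observed:.3f}")
```

Simulating from a model in this way is a common sanity check: if inference on simulated data does not recover the parameters used to generate it, the model or the method is not yet understood.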
1 Inferring the phylogeny
Models:
• Modelling events of substitution
• In some cases, modelling insertions and deletions
• In some cases, modelling allele sorting
• In some cases, modelling gene duplications, losses and transfers
• In some cases, modelling hybridization
• Dates of speciation can also be inferred with a model of rate evolution
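As an illustration of what modelling substitutions buys, here is a minimal sketch of the simplest model-based distance between two aligned sequences, assuming the Jukes-Cantor (JC69) model. This is an assumption for the example; the slides do not specify a model, and the toy sequences are hypothetical. The correction accounts for multiple substitutions at the same site, so the estimated distance exceeds the raw fraction of differing sites.

```python
import math

def p_distance(seq_a, seq_b):
    """Fraction of aligned sites at which two equal-length sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(a != b for a, b in zip(seq_a, seq_b)) / len(seq_a)

def jc69_distance(seq_a, seq_b):
    """Expected substitutions per site separating two sequences under JC69."""
    p = p_distance(seq_a, seq_b)
    if p >= 0.75:
        # The sequences are saturated: the JC69 estimate is undefined/infinite.
        return math.inf
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)

# Hypothetical toy alignment: the two sequences differ at the last site only.
seq_x = "ACGTACGTAC"
seq_y = "ACGTACGTAT"
print(f"raw difference: {p_distance(seq_x, seq_y):.3f}")
print(f"JC69 distance:  {jc69_distance(seq_x, seq_y):.3f}")
```

Pairwise distances of this kind can feed a distance-based tree-building method; likelihood and Bayesian methods instead use the substitution model directly on the alignment.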
1 The origin of viral strains
Boussau, Guéguen, Gouy, Evolutionary Bioinformatics 2009.
The N HIV strain originated through a recombination between
a human and a chimpanzee virus
Figure panels: first 1353 sites; ~1200 remaining sites
1 Forensic analyses
Scaduto et al., PNAS 2010
The problem:
• CC01 is an HIV-positive male, accused by several partners of hiding
his seropositivity and infecting them in the process, which led to a trial
• 1 accused male, 6 partners, all seropositive
• HIV sequences available from each of them
• How can we tell whether CC01 likely infected his partners?
Use the HIV sequences to build a phylogenetic tree.
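Once a tree is available, one question that can be asked of it is whether a given set of sequences forms a clade. The sketch below is a toy illustration only: the tree, the labels (`CC01_a`, `partner_1`, `control_1`, ...) and the nested-tuple representation are hypothetical, and it does not reproduce the analysis of Scaduto et al. It assumes a rooted tree and tests whether a group of leaves is monophyletic.

```python
def leaf_set(tree):
    """Set of leaf labels below a node; a leaf is a string, a node is a tuple."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaf_set(child) for child in tree))

def clades(tree):
    """Yield the leaf set of every subtree (clade) of a rooted tree."""
    yield leaf_set(tree)
    if not isinstance(tree, str):
        for child in tree:
            yield from clades(child)

def is_monophyletic(tree, group):
    """True if `group` is exactly the set of leaves below some node."""
    return frozenset(group) in set(clades(tree))

# Hypothetical rooted tree: two unrelated controls, and two sequences from
# the accused with the partners' sequences nested among them.
toy_tree = (("control_1", "control_2"),
            ("CC01_a", (("partner_1", "partner_2"), "CC01_b")))

case = {"CC01_a", "CC01_b", "partner_1", "partner_2"}
print(is_monophyletic(toy_tree, case))                      # case sequences cluster together
print(is_monophyletic(toy_tree, {"CC01_a", "CC01_b"}))      # accused's sequences alone
print(is_monophyletic(toy_tree, {"partner_1", "partner_2"}))
```

In this toy tree the accused's sequences do not form a clade on their own because the partners' sequences are nested within them, which is the kind of pattern a tree can reveal and a simple similarity score cannot.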
1 Inferring the phylogeny
Purposes:
• Inferring the species phylogeny
• Reconstructing the evolutionary history of infectious agents
• Reconstructing transmission histories (e.g. forensic analyses)
2 Phylogeography
Faria et al., Science 2014
Models:
• Add spatial information at the leaves
• Use discrete models or continuous models to reconstruct
ancestral ranges
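To show what reconstructing an ancestral range from locations at the leaves means in the discrete case, here is a minimal sketch using Fitch parsimony. This is a deliberately simpler stand-in for the probabilistic discrete models mentioned above, not the method of Faria et al.; the tree, leaf names and locations ("north", "south") are hypothetical, and a rooted binary tree is assumed.

```python
def fitch(tree, leaf_location):
    """Fitch parsimony on a rooted binary tree given as nested 2-tuples.

    Returns (candidate locations at the root, minimum number of
    location changes needed on the tree)."""
    if isinstance(tree, str):
        return {leaf_location[tree]}, 0
    left, right = tree
    left_set, left_changes = fitch(left, leaf_location)
    right_set, right_changes = fitch(right, leaf_location)
    common = left_set & right_set
    if common:
        # The children agree on at least one location: no extra change.
        return common, left_changes + right_changes
    # The children disagree: keep both options and count one change.
    return left_set | right_set, left_changes + right_changes + 1

# Hypothetical sampling locations attached to the leaves.
locations = {"a": "north", "b": "north", "c": "south", "d": "north"}
toy_tree = ((("a", "b"), "c"), "d")

root_locations, n_changes = fitch(toy_tree, locations)
print(root_locations, n_changes)
```

Probabilistic models go further by using branch lengths and by quantifying the uncertainty of each ancestral location, which parsimony does not do.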
2 Phylogeography
Purposes:
• Inferring the species geographical range through time
• Reconstructing the evolutionary spread of infectious agents
• Investigating plate tectonics
3 Diversification history
How did species diversify? Were there bursts of speciation, or
mass extinctions?
How many species/individuals through time?
Models:
• Modelling events of speciation, and events of extinction
• In some cases, speciation and extinction rates can depend on other parameters
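Modelling speciation and extinction events can be sketched as a constant-rate birth-death process, simulated here with a Gillespie-style algorithm. The function name, rates, time horizon and seed are illustrative assumptions, and constant rates are the simplest possible case rather than what any cited study used.

```python
import random

def simulate_birth_death(birth_rate, death_rate, t_max, rng, n_start=1):
    """Return [(time, number_of_lineages), ...] for one trajectory of a
    constant-rate birth-death process, stopped at t_max or at extinction."""
    if birth_rate < 0 or death_rate < 0 or birth_rate + death_rate <= 0:
        raise ValueError("rates must be non-negative and not both zero")
    t, n = 0.0, n_start
    history = [(t, n)]
    while n > 0:
        # Waiting time to the next event among n independent lineages.
        t += rng.expovariate(n * (birth_rate + death_rate))
        if t > t_max:
            break
        if rng.random() < birth_rate / (birth_rate + death_rate):
            n += 1  # speciation
        else:
            n -= 1  # extinction
        history.append((t, n))
    return history

# Hypothetical parameters: speciation twice as frequent as extinction.
rng = random.Random(1)
trajectory = simulate_birth_death(birth_rate=1.0, death_rate=0.5, t_max=5.0, rng=rng)
print(f"{len(trajectory) - 1} events; {trajectory[-1][1]} lineages after the last event")
```

Under this model the expected number of lineages at time t is n_start · exp((birth_rate − death_rate) · t); letting the rates vary through time or depend on other parameters gives the richer models used to detect radiations and mass extinctions.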
3 Phylodynamics of HCV in Egypt
Drummond et al., MBE 2005
The huge increase in the number of viruses coincides with the extensive use of
an anti-schistosomiasis treatment from 1920 to 1980
3 Diversification history
Purposes:
• Inferring the number of species/individuals through time
• Finding major radiation/extinction events
• Reconstructing past epidemics
4 Ancestral lifestyles
How did ancestral species live? What temperature did they
like most? How long did they live?
Models:
• Correlating molecular evolution with phenotypic traits
4 Inferring growth temperature across the tree of life
Boussau et al., Nature 2008
Idea: reconstruct ancestral sequences in silico,
and predict ancestral growth temperatures
Usual model: all branches evolve according to the same model
Better model: different models for different branches
Late Heavy Bombardment 3.8 Bya?
4 Ancestral lifestyles
Purposes:
• Inferring characteristics of ancient organisms
• Inferring characteristics of ancient environments
• Finding/using correlations between phenotype evolution and
genotype evolution
5 Selective pressures in extant species
What sites in the genome of species X are important?
Models:
• Usual models of sequence evolution
• Models of insertion-deletion
• Hidden Markov Models that run along the genome
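A Hidden Markov Model running along the genome can be sketched with two hidden states, "conserved" and "neutral", decoded with the Viterbi algorithm. Everything specific here is an assumption for illustration: the two-state design, the observation coding ("same" when all aligned species agree at a column, "diff" otherwise) and every probability value. Real tools use richer, phylogeny-aware emission models.

```python
import math

STATES = ("conserved", "neutral")

def viterbi(observations, start, trans, emit):
    """Most probable hidden-state path for `observations`, in log space.
    All probabilities supplied must be strictly positive."""
    if not observations:
        return []
    scores = [{s: math.log(start[s]) + math.log(emit[s][observations[0]])
               for s in STATES}]
    back = []
    for obs in observations[1:]:
        prev = scores[-1]
        current, pointers = {}, {}
        for s in STATES:
            # Best predecessor state for ending in state s at this column.
            best = max(STATES, key=lambda r: prev[r] + math.log(trans[r][s]))
            current[s] = prev[best] + math.log(trans[best][s]) + math.log(emit[s][obs])
            pointers[s] = best
        scores.append(current)
        back.append(pointers)
    # Trace back from the best final state.
    state = max(STATES, key=lambda s: scores[-1][s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return path[::-1]

# Hypothetical parameters: states are sticky, conserved columns rarely differ.
start = {"conserved": 0.5, "neutral": 0.5}
trans = {"conserved": {"conserved": 0.9, "neutral": 0.1},
         "neutral": {"conserved": 0.1, "neutral": 0.9}}
emit = {"conserved": {"same": 0.95, "diff": 0.05},
        "neutral": {"same": 0.4, "diff": 0.6}}

columns = ["same"] * 8 + ["diff"] * 8
print(viterbi(columns, start, trans, emit))
```

The transition probabilities are what make the model run along the genome: because states tend to persist, a single variable column inside a long conserved stretch does not by itself end the conserved segment.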
5 Conservation across species indicates function
The UCSC genome browser uses conservation across 100 vertebrates to
detect functional regions
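The idea that conservation indicates function can be illustrated with a very crude per-column score: the fraction of sequences sharing the most common base. This is a minimal sketch on a hypothetical four-sequence alignment, not the scoring method used by the genome browser, and it ignores the phylogeny relating the sequences.

```python
from collections import Counter

def column_conservation(alignment):
    """Fraction of sequences sharing the most common character, per column."""
    n_sequences = len(alignment)
    return [Counter(column).most_common(1)[0][1] / n_sequences
            for column in zip(*alignment)]

# Hypothetical toy alignment of four species.
toy_alignment = [
    "ACGTAC",
    "ACGAAC",
    "ACGCTC",
    "ACGGTC",
]
scores = column_conservation(toy_alignment)
print(scores)
```

Columns scoring close to 1 are candidates for being under purifying selection, although with closely related species high identity may simply reflect too little time for substitutions to accumulate, which is why model-based scores take the tree into account.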
5 How to best predict cancer-causing mutations?
Gnad et al., BMC Genomics 2013
Comparison across 12 methods
5 Selective pressures in extant species
Purposes:
• Screening a genome for new functional elements
• Evaluating the severity of candidate mutations (e.g. genetic
disease, cancer)
• Finding sites to target in a pest that needs controlling
6 Application to cell lineages
As cells divide by mitosis, mutations accumulate.
Phylogenetic approaches can therefore be used to address developmental
questions:
• How similar is development across individuals?
• Do the first cells produced during development contribute equally to
the adult organism?
• …
Models:
• Usual models of sequence evolution
• Models of microsatellite evolution
6 Application to cell lineages
Purposes:
• Learning about development
• Learning about cancer evolution (e.g. using phylogeography to
understand cancer spread)
Conclusions
• Genomes contain a lot of information about their history and about
how they work
• The comparative approach is a powerful way to learn about the
function of a stretch of sequence
• Thanks to probabilistic models, one can exploit the huge amount of
information in genomes to ask a large number of interesting
questions
Slides available on SlideShare: http://www.slideshare.net/boussau