2. “Nothing in biology makes sense
except in the light of evolution.”
Theodosius Dobzhansky (1973)
TIGR
3. Topics of Discussion
• Introduction to phylogenomics
• Uses of evolutionary analysis in genomics
– Selection of species
– Functional prediction
– Gene duplication
– Gene loss
– Genome rearrangements
– Lateral transfer
– Uncultured species
– Specialization
5. Uses of Phylogenomics
• Selection of species
• Functional prediction
• Gene duplication
• Intragenomic movement
• Gene loss
• Lateral transfer
• Genome rearrangements
• Uncultured species
6. Strain Selection and Evolution
• Increasing phylogenetic representation
• Determining relatedness to model organism
• Understanding major evolutionary transitions
• Identifying taxa with unusual (high or low) rates of evolution
• Identifying source of DNA from uncultured species
• Species naming and type strains (e.g., see Ward et al. 2001)
8. S. pombe Genome Analysis
Eukaryotes vs. Prokaryotes
[Figure: tree of Archaea, Bacteria, and Eukaryotes. Eukaryote taxa shown: S. pombe, S. cerevisiae, Encephalitozoon, Worm, Fly, Humans, Dictyostelium, Arabidopsis, Chlamydomonas, Phytophthora, Tetrahymena, Plasmodium, Trypanosoma, Euglena, Naegleria, Trichomonas, Giardia]
9. Single vs. Multi-celled
[Figure: eukaryote tree with taxa labeled by group]
• Fungi: S. pombe, S. cerevisiae
• Microsporidia: Encephalitozoon
• Animals: Worm, Fly, Humans
• Dictyostelia: Dictyostelium
• Plants: Arabidopsis, Chlamydomonas
• Heterokonts: Phytophthora
• Ciliates: Tetrahymena
• Apicomplexa: Plasmodium
• Kinetoplastids: Trypanosoma
• Euglenas: Euglena
• Acrasidae: Naegleria
• Parabasalia: Trichomonas
• Diplomonads: Giardia
10. Uses of Phylogenomics
• Selection of species
• Functional prediction
• Gene duplication
• Intragenomic movement
• Gene loss
• Lateral transfer
• Genome rearrangements
• Uncultured species
11. Predicting Function
• Identification of motifs
• Homology/similarity based methods
– Highest hit, top hit, HMMs, threading
• Evolutionary methods
– Phylogenetic trees
– dS/dN
– Phylogenetic profiles
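As a rough illustration of the last item, a phylogenetic profile records which genomes contain a given gene family, and families with matching profiles become candidates for a shared function. The sketch below is a minimal version of that idea; the genome names, family names, and the exact-match grouping rule are all hypothetical choices made for the example, not taken from the slides.

```python
from collections import defaultdict

def build_profiles(presence, genomes):
    """Map each gene family to a tuple of 0/1 flags, one per genome in `genomes`."""
    return {
        fam: tuple(1 if g in present_in else 0 for g in genomes)
        for fam, present_in in presence.items()
    }

def group_by_profile(profiles):
    """Group families that share an identical presence/absence profile."""
    groups = defaultdict(list)
    for fam, profile in profiles.items():
        groups[profile].append(fam)
    # Keep only profiles shared by at least two families.
    return {p: sorted(fams) for p, fams in groups.items() if len(fams) > 1}

# Hypothetical toy data: which genomes contain each gene family.
genomes = ["genomeA", "genomeB", "genomeC", "genomeD"]
presence = {
    "famX": {"genomeA", "genomeC"},
    "famY": {"genomeA", "genomeC"},
    "famZ": {"genomeA", "genomeB", "genomeC", "genomeD"},
    "famW": {"genomeB"},
}

profiles = build_profiles(presence, genomes)
shared = group_by_profile(profiles)
print(shared)
```

Here famX and famY share a profile and would be flagged as candidates for a functional link. A real analysis would likely allow near matches and account for how closely related the genomes are.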
12. [Figure: phylogenetic trees of the MutS family, panels A-D. Leaves are labeled gene.species (e.g., MSH2.Yeast, MutS.Bacsu, MutS2.Metth). Subfamily labels: MutS1, MutS2, MSH1, MSH2, MSH3, MSH4, MSH5, MSH6. Recoverable function labels include: All MMR (Bacteria), All MMR in Nucleus, MMR of Mismatches and Small Loops in Nucleus, MMR of Large Loops in Nucleus, Mitochondria, Segregation and Crossover]
15. Uses of Phylogenomics
• Selection of species
• Functional prediction
• Gene duplication
• Gene loss
• Lateral transfer
• Genome rearrangements
• Uncultured species
16. Why Duplications Are Useful to Identify
• Allows division into orthologs and paralogs
• Improves functional predictions
• Helps identify mechanisms of duplication
• Can be used to study mutation processes in different parts of a genome
• Lineage-specific duplications may be indicative of species-specific adaptations
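One common way to make the ortholog/paralog division concrete is the species-overlap heuristic: an internal node of a gene tree is treated as a duplication if the same species appears on both sides of it, and as a speciation otherwise. The sketch below assumes a rooted binary gene tree written as nested tuples, with leaves named gene.species as in the MutS tree labels. The toy topology is illustrative only and is not the tree from the slides.

```python
def species_of(leaf):
    """Extract the species tag from a 'gene.species' leaf label."""
    return leaf.rsplit(".", 1)[1]

def classify(tree, events):
    """Return (leaves, species) under `tree`; append one event per internal node."""
    if isinstance(tree, str):
        return [tree], {species_of(tree)}
    left_leaves, left_sp = classify(tree[0], events)
    right_leaves, right_sp = classify(tree[1], events)
    # Species overlap between the two child clades implies a duplication.
    kind = "duplication" if left_sp & right_sp else "speciation"
    events.append((kind, left_leaves, right_leaves))
    return left_leaves + right_leaves, left_sp | right_sp

def ortholog_paralog_pairs(tree):
    """Split all leaf pairs by the event type at their last common ancestor."""
    events = []
    classify(tree, events)
    orthologs, paralogs = set(), set()
    for kind, left, right in events:
        target = paralogs if kind == "duplication" else orthologs
        for a in left:
            for b in right:
                target.add(frozenset((a, b)))
    return orthologs, paralogs

# Toy rooted gene tree (hypothetical topology).
tree = (("MSH2.Yeast", "MSH2.Human"), ("MSH6.Yeast", "MSH6.Human"))
orthologs, paralogs = ortholog_paralog_pairs(tree)
print(sorted(sorted(p) for p in orthologs))
print(sorted(sorted(p) for p in paralogs))
```

In this toy tree the root is a duplication, so every MSH2/MSH6 pair is paralogous, while the yeast and human copies within each subfamily are orthologs. The heuristic depends on a correctly rooted tree and can be misled by gene loss.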
17. Lineage-Specific Duplications in Wolbachia wMel
Annotations of the duplicated genes (alphabetical; many entries repeat, and hypothetical and conserved hypothetical proteins dominate the list):
• ankyrin repeat domain protein
• conserved domain protein
• conserved hypothetical protein
• DNA mismatch repair protein MutL (mutL)
• DNA repair protein RadC, putative
• DnaJ domain protein
• exopolysaccharide synthesis protein ExoD-related protein
• HNH endonuclease family protein
• hypothetical protein
• major facilitator family transporter
• membrane protein, putative
• MutL family protein
• Na+/H+ antiporter family protein; Na+/H+ antiporter, putative
• permease, putative
• portal protein
• prophage LambdaW1: DNA methylase; terminase large subunit, putative
• prophage LambdaW2: ankyrin repeat domain protein; baseplate assembly proteins J, V, W, putative; minor tail protein Z, putative; site-specific recombinase, resolvase family
• prophage LambdaW4: ankyrin repeat domain protein; DNA methylase; portal protein; terminase large subunit, putative
• prophage LambdaW5: ankyrin repeat domain protein; baseplate assembly proteins J, V, W, putative; minor tail protein Z, putative; site-specific recombinase, resolvase family
• regulatory protein RepA, putative
• reverse transcriptase, putative
• sodium/alanine symporter family protein
• TenA/THI-4 family protein
• transcriptional regulator
• translation elongation factor Tu (tuf)
• transposase (IS4 family; IS5 family; putative)
• type IV secretion system protein VirB4, putative
• UDP-N-acetylglucosamine pyrophosphorylase-related protein
Several entries carry a status flag: FRAMESHIFT, POINT MUTATION, truncation, interruption-C, interruption-N, or degenerate.
18. MutL Duplication in Wolbachia wMel
ORF01096 DNA mismatch repair protein MutL (mutL)
ORF00446 MutL family protein
21. Uses of Phylogenomics
• Selection of species
• Functional prediction
• Gene duplication
• Intragenomic movement
• Gene loss
• Lateral transfer
• Genome rearrangements
• Uncultured species
22. X-files
Eisen et al. 2000. Genome Biology 1(6): 11.1-11.9
Also see Tillier and Collins. 2000. Nature Genetics 26(2): 195-7, and Suyama and Bork. 2001. Trends in Genetics 17: 10-13.
23. C. trachomatis vs C. pneumoniae Dot Plot
[Figure: dot plot of C. trachomatis MoPn against C. pneumoniae AR39, with the origin and terminus marked. Read et al. 2000]
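To show how such a dot plot can be read numerically, the sketch below classifies ortholog pairs by where they fall: on the main diagonal (same position in both genomes) or on the anti-diagonal (mirrored across the origin), which together form the X-shaped pattern described by Eisen et al. 2000. It assumes positions are measured from the replication origin on circular genomes. The genome lengths, coordinates, and tolerance are hypothetical values chosen for the example.

```python
from collections import Counter

def circular_gap(d):
    """Distance of d from the nearest integer, for fractional circular positions."""
    d = abs(d) % 1.0
    return min(d, 1.0 - d)

def classify_dot(x, y, len_x, len_y, tol=0.02):
    """Classify one ortholog pair by its location in the dot plot.

    x, y are positions measured from the origin in genomes of length
    len_x and len_y; tol is an arbitrary tolerance in genome fractions.
    """
    fx, fy = x / len_x, y / len_y
    if circular_gap(fx - fy) <= tol:
        return "diagonal"        # same distance and same side of the origin
    if circular_gap(fx + fy) <= tol:
        return "anti-diagonal"   # same distance, opposite side of the origin
    return "off"

# Hypothetical ortholog positions in two genomes of 1,000,000 bp each.
pairs = [(100000, 100000), (200000, 800000), (300000, 500000)]
counts = Counter(classify_dot(x, y, 1000000, 1000000) for x, y in pairs)
print(counts)
```

A high share of anti-diagonal points would be consistent with inversions that are symmetric around the origin or terminus, though a visual check of the plot remains the more direct test.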
26. Uses of Phylogenomics
• Selection of species
• Functional prediction
• Gene duplication
• Intragenomic movement
• Gene loss
• Lateral transfer
• Genome rearrangements
• Uncultured species
27. Most ‘Evidence’ for Gene Transfer has Alternative Explanations
• Unusual distribution
– Other causes: sampling bias
– Always occurs: not if recipient already has gene
• Unusual GC/codons
– Other causes: selection
– Always occurs: not if donor/recipient similar; not if it occurred long ago
• High hit to "distant" species
– Other causes: selection, rate variation, gene loss
– Always occurs: usually
• Incongruent trees
– Other causes: bad trees, missed paralogs
– Always occurs: usually
• Correlation of above with neighbors
– Other causes: selection
– Always occurs: only if genes keep order after transfer
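The "unusual GC" observation can be sketched as a simple screen: compute the GC fraction of each gene and flag those far from the genome-wide mean. This is a minimal sketch under stated assumptions. The z-score cutoff and the toy sequences are hypothetical, and a flagged gene is only a candidate, since selection can produce the same signal and old transfers or transfers between similar genomes will not stand out.

```python
from statistics import mean, pstdev

def gc_fraction(seq):
    """Fraction of G and C among unambiguous bases in a DNA sequence."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    return (seq.count("G") + seq.count("C")) / acgt if acgt else 0.0

def gc_outliers(genes, z_cutoff=2.0):
    """Return names of genes whose GC fraction deviates from the mean by more than z_cutoff standard deviations."""
    gc = {name: gc_fraction(seq) for name, seq in genes.items()}
    values = list(gc.values())
    mu, sd = mean(values), pstdev(values)
    if sd == 0:
        return []
    return sorted(name for name, v in gc.items() if abs(v - mu) / sd > z_cutoff)

# Hypothetical toy genome: nine genes at 50% GC and one at 100% GC.
genes = {f"gene{i}": "ATGC" * 5 for i in range(1, 10)}
genes["candidate"] = "GCGC" * 5

flagged = gc_outliers(genes)
print(flagged)
```

Codon usage can be screened the same way by swapping the per-gene statistic. Either screen is best combined with the other lines of evidence in the table rather than used alone.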
29. Mitochondrial Genome Integration into A. thaliana chrII
[Figure, panels A-D: the mitochondrial genome and its insertion point in A. thaliana chromosome II. Recoverable labels: Mitochondrial Genome, Insertion Point, Chromosome II, Alternative, Possible Mitochondrial Form. Axis ticks run from 0 to 4E+05 and from 3.2E+06 to 3.6E+06. Lin et al., 1999]
30. Number of pBVTs Depends on # of Genomes Analyzed
[Figure: chart with x-axis "Number of protein sets" (1, 2, 3, 4, 5, Other) and y-axis ticks from 0 to 1800. Legend entries: Fruit fly, C. elegans, Arabidopsis, Yeast, Parasites. Salzberg et al. 2001]
32. Uses of Phylogenomics
• Selection of species
• Functional prediction
• Gene duplication
• Intragenomic movement
• Gene loss
• Lateral transfer
• Genome rearrangements
• Uncultured species
41. C. pneumoniae Paralogs by Position
[Figure: scatter plot of paralog pairs. x-axis: Query Orf Position; y-axis: Subject Orf Position; both run from 0 to 1250000]
42. C. pneumoniae Paralogs - Lineage Specific
[Figure: scatter plot of lineage-specific paralog pairs. x-axis: Query Orf Position; y-axis: Subject Orf Position; both run from 0 to 1250000]