This document summarizes a talk on phylogenomics and the diversity and diversification of microbes. The talk discussed several topics:
1) It introduced phylogenomics and described how limited sampling of microbial diversity has been from rRNA studies.
2) It discussed examples of mechanisms of novelty originating from within genomes, such as UV repair mechanisms in H. volcanii compared to E. coli.
3) It outlined how phylogenomic methods can be used to infer the likely functions of genes of interest by identifying homologs, aligning sequences, constructing gene trees, and overlaying known functions onto the tree. However, these methods may not work as well for very recently evolved functions.
4) It
4. Phylogenomics of Novelty
Mechanisms of Origin of New Functions
Variation in Mechanisms: Patterns, Causes and Effects
5. Phylogenomics of Novelty
Mechanisms of Origin of New Functions
Variation in Mechanisms: Patterns, Causes and Effects
Species Evolution
6. Phylogenomics of Novelty
Mechanisms of Origin of New Functions
Variation in Mechanisms: Patterns, Causes and Effects
Species Evolution
7. Outline
• Introduction
• Phylogenomic Stories
– Within genome invention of novelty
– Stealing novelty
– Communities of microbes
– Community service and knowing what we don't know
9. rRNA Tree of Life
Figure from Barton, Eisen et al.
"Evolution", CSHL Press.
Based on tree from Pace NR, 2003.
10. Limited Sampling of RRR Studies
Figure from Barton, Eisen et al.
"Evolution", CSHL Press.
Based on tree from Pace NR, 2003.
11. Limited Sampling of RRR Studies
Haloferax
Methanococcus
Chlorobium
Deinococcus
Thermotoga
Figure from Barton, Eisen et al.
"Evolution", CSHL Press.
Based on tree from Pace NR, 2003.
15. TIGR Genome Projects
Haloferax
Methanococcus
Chlorobium
Deinococcus
Thermotoga
Figure from Barton, Eisen et al.
"Evolution", CSHL Press.
Based on tree from Pace NR, 2003.
23. From Eisen et al. 1997 Nature Medicine 3: 1076-1078.
24. Blast Search of H. pylori "MutS"
• Blast search pulls up Syn. sp MutS#2 with a much more significant p value than other MutS homologs
• Based on this, TIGR predicted this species had mismatch repair
• Assumes functional constancy
Based on Eisen et al. 1997 Nature Medicine 3: 1076-1078.
25. Predicting Function
• Identification of motifs
– Short regions of sequence similarity that are indicative of general activity
– e.g., ATP binding
• Homology/similarity based methods
– Gene sequence is searched against a database of other sequences
– If significantly similar genes are found, their functional information is used
• Problem
– Genes frequently have similarity to hundreds of motifs and multiple genes, not all with the same function
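As a rough illustration of the motif-based approach, the sketch below scans a protein sequence for the Walker A (P-loop) pattern, a short motif commonly associated with ATP/GTP binding. The pattern used is the commonly cited consensus [AG]-x(4)-G-K-[ST]; the function name and the toy sequence are hypothetical and not taken from the talk. A hit indicates only a general activity, which is the limitation noted on the slide.

```python
import re

# Commonly cited Walker A (P-loop) consensus: [AG]-x(4)-G-K-[ST].
WALKER_A = re.compile(r"[AG].{4}GK[ST]")

def find_motif(protein_seq, pattern=WALKER_A):
    """Return (start, matched_text) for each non-overlapping motif hit."""
    return [(m.start(), m.group()) for m in pattern.finditer(protein_seq.upper())]

# Hypothetical toy sequence containing one P-loop-like stretch.
toy_seq = "MKTLLDGPNMAGKSTYLRQ"
hits = find_motif(toy_seq)
print(hits)
```

A match says the protein may bind a nucleotide; it does not say which pathway the protein works in, so motif hits are best combined with the homology and tree-based evidence discussed in the following slides.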
27. Overlaying Functions onto Tree
[Figure: gene tree of MutS homologs with the MutS1, MutS2, MSH1, MSH2, MSH3, MSH4, MSH5, and MSH6 groups labeled.]
Based on Eisen, 1998 Nucl Acids Res 26: 4291-4300.
29. Evolutionary Functional Prediction
[Figure: flowchart of the method, illustrated with Example A (genes 1A, 2A, 3A, 1B, 2B, 3B from Species 1, 2, and 3) and Example B (genes 1 to 6). Steps, in order:]
• Choose gene(s) of interest
• Identify homologs
• Align sequences
• Calculate gene tree (duplication events may be inferred)
• Overlay known functions onto tree
• Infer likely function of gene(s) of interest (may be ambiguous)
• Actual evolution (assumed to be unknown)
Based on Eisen, 1998 Genome Res 8: 163-167.
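The last two steps of the flowchart (overlay known functions, then infer the function of the gene of interest) can be sketched in a few lines. This is a simplified stand-in, not the method from the cited paper: it assumes the gene tree has already been built, represents it as nested tuples, and predicts from the smallest clade that contains the query plus at least one annotated gene. The gene labels follow Example A; the function labels are hypothetical.

```python
def leaves(node):
    """Return all leaf names under a node (a leaf is a string, a clade is a tuple)."""
    if isinstance(node, str):
        return [node]
    return [name for child in node for name in leaves(child)]

def infer_function(tree, query, known):
    """Predict the query's function from the smallest clade that contains the
    query and at least one annotated gene. Returns None if no gene is
    annotated, or 'ambiguous' if the annotated genes in that clade disagree."""
    if query not in leaves(tree):
        raise ValueError(f"{query} is not in the tree")
    node, best = tree, None
    while not isinstance(node, str):
        funcs = {known[g] for g in leaves(node) if g != query and g in known}
        if funcs:
            best = funcs  # smaller (deeper) clades overwrite larger ones
        # Descend into the child clade that contains the query.
        node = next(child for child in node if query in leaves(child))
    if best is None:
        return None
    return next(iter(best)) if len(best) == 1 else "ambiguous"

# Example A: a duplication produced the A and B subfamilies.
tree = ((("1A", "2A"), "3A"), (("1B", "2B"), "3B"))
known = {"1A": "function A", "3A": "function A",
         "1B": "function B", "3B": "function B"}
prediction = infer_function(tree, "2A", known)
print(prediction)
```

Because the prediction comes from the query's own subfamily rather than from the single most similar sequence, a gene is not assigned the function of a paralog simply because that paralog happens to be the top database hit.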
31. RIPPING
Galagan et al. Genome sequence reveals significant underrepresentation of recently duplicated genes.
[Figure: example duplicated DNA sequences before and after RIP, showing C-G to T-A mutations and CH3 methylation marks.]
FIGURE 12.30. RIPPING. "The repeat-induced point mutation (RIP) process in Neurospora crassa. Duplications that occur during the vegetative phase are detected by RIP during the sexual cycle after fertilization but before the DNA synthesis and nuclear fusion (karyogamy). Duplicated sequences that are longer than ~400 bp (or ~1 kb for unlinked duplications as shown) and sharing greater than ~80% nucleotide identity are detected. Numerous C-G to T-A point mutations are introduced into both copies (unmutated C-G pairs are shown in blue; mutations are shown in red letters; only a small number of base pairs are shown for clarity). RIP-mutated sequences are frequent targets for methylation, which results in transcriptional silencing in Neurospora. In contrast to mammals and plants, methylation is not limited to symmetric sites."
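The detection rule and mutation pattern in the legend can be mimicked with a toy simulation. This is only a sketch under stated assumptions: the two thresholds are the approximate values quoted in the legend for linked duplications, while the per-base mutation rate, the function names, and the restriction to C-to-T changes on one strand are simplifications introduced here for illustration.

```python
import random

MIN_LENGTH = 400      # approximate bp threshold quoted in the legend
MIN_IDENTITY = 0.80   # approximate nucleotide identity threshold quoted in the legend

def identity(a, b):
    """Fraction of identical positions between two equal-length sequences."""
    return sum(x == y for x, y in zip(a, b)) / len(a)

def is_rip_target(copy1, copy2):
    """True if a pair of equal-length copies passes both thresholds."""
    return (len(copy1) == len(copy2)
            and len(copy1) > MIN_LENGTH
            and identity(copy1, copy2) > MIN_IDENTITY)

def rip_mutate(seq, rate, rng):
    """Change each C to T with probability `rate` (hypothetical parameter);
    this is the one-strand view of a C-G to T-A transition."""
    return "".join("T" if base == "C" and rng.random() < rate else base
                   for base in seq)

rng = random.Random(0)
original = "CATGTACAGCA" * 40   # 440 bp toy repeat built from the slide's example
duplicate = original
mutated = original
if is_rip_target(original, duplicate):
    mutated = rip_mutate(original, rate=0.3, rng=rng)
print(identity(original, mutated))
```

Running the mutation step on both copies of a duplicated gene quickly degrades them, which is consistent with the slide's point that recently duplicated genes are underrepresented in the Neurospora genome.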
35. Tetrahymena Genome Processing
• Analogous to RIPPING and heterochromatin silencing
• Targets new/foreign DNA, not duplicated DNA
• Does not limit diversification by duplication
Eisen et al. 2006. PLoS Biology.
36. Phylogenomics of Novelty II
Sometimes, it is easier to steal, borrow, or
coopt functions rather than evolve them
anew
37. rRNA Tree of Life
Bacteria
Archaea
Eukaryotes
Figure from Barton, Eisen et al.
"Evolution", CSHL Press.
Based on tree from Pace NR, 2003.
39. Network of Life
Bacteria
Archaea
Eukaryotes
Figure from Barton, Eisen et al.
"Evolution", CSHL Press.
Based on tree from Pace NR, 2003.
40. articles
[Slide shows the first page of a journal article; the recovered text follows.]
Arabidopsis thaliana
* Authorship of this paper should be cited as 'The Arabidopsis Genome Initiative'. A full list of contributors appears at the end of this paper

The flowering plant Arabidopsis thaliana is an important model system for identifying genes and determining their functions. Here we report the analysis of the genomic sequence of Arabidopsis. The sequenced regions cover 115.4 megabases of the 125-megabase genome and extend into centromeric regions. The evolution of Arabidopsis involved a whole-genome duplication, followed by subsequent gene loss and extensive local gene duplications, giving rise to a dynamic genome enriched by lateral gene transfer from a cyanobacterial-like ancestor of the plastid. The genome contains 25,498 genes encoding proteins from 11,000 families, similar to the functional diversity of Drosophila and Caenorhabditis elegans, the other sequenced multicellular eukaryotes. Arabidopsis has many families of new proteins but also lacks several common protein families, indicating that the sets of common proteins have undergone differential expansion and contraction in the three multicellular eukaryotes. This is the first complete genome sequence of a plant and provides the foundations for more comprehensive comparison of conserved processes in all eukaryotes, identifying a wide range of plant-specific gene functions and establishing rapid systematic ways to identify genes for crop improvement.

The plant and animal kingdoms evolved independently from unicellular eukaryotes and represent highly contrasting life forms. The genome sequences of C. elegans1 and Drosophila2 reveal that metazoans share a great deal of genetic information required for developmental and physiological processes, but these genome sequences represent a limited survey of multicellular organisms. Flowering plants have unique organizational and physiological properties in addition to ancestral features conserved between plants and animals. The genome sequence of a plant provides a means for understanding the genetic basis of differences between plants and other eukaryotes, and provides the foundation for detailed functional characterization of plant genes.

Arabidopsis thaliana has many advantages for genome analysis, including a short generation time, small size, large number of offspring, and a relatively small nuclear genome. These advantages promoted the growth of a scientific community that has investigated the biological processes of Arabidopsis and has characterized many genes3. To support these activities, an international collaboration (the Arabidopsis Genome Initiative, AGI) began sequencing the genome in 1996. The sequences of chromosomes 2 and 4 have been reported4,5, and the accompanying Letters describe the sequences of chromosomes 1 (ref. 6), 3 (ref. 7) and 5 (ref. 8).

Here we report analysis of the completed Arabidopsis genome

biologists, but will also affect agricultural science, evolutionary biology, bioinformatics, combinatorial chemistry, functional and comparative genomics, and molecular medicine.

Overview of sequencing strategy
We used large-insert bacterial artificial chromosome (BAC), phage (P1) and transformation-competent artificial chromosome (TAC) libraries9-12 as the primary substrates for sequencing. Early stages of genome sequencing used 79 cosmid clones. Physical maps of the genome of accession Columbia were assembled by restriction fragment 'fingerprint' analysis of BAC clones13, by hybridization14 or polymerase chain reaction (PCR)15 of sequence-tagged sites and by hybridization and Southern blotting16. The resulting maps were integrated (http://nucleus/cshl.org/arabmaps/) with the genetic map and provided a foundation for assembling sets of contigs into sequence-ready tiling paths. End sequence (http://www.tigr.org/tdb/at/abe/bac_end_search.html) of 47,788 BAC clones was used to extend contigs from BACS anchored by marker content and to integrate contigs.

Ten contigs representing the chromosome arms and centromeric heterochromatin were assembled from 1,569 BAC, TAC, cosmid and P1 clones (average insert size 100 kilobases (kb)). Twenty-two PCR products were amplified directly from genomic DNA and
41. Correlated gain/loss of genes
• Microbial genes are lost rapidly when not maintained by selection
• Genes can be acquired by lateral transfer
• Frequently, gain and loss occur for entire pathways/processes
• Thus, we might be able to use correlated presence/absence information to identify genes with similar functions
42. Non-Homology Predictions: Phylogenetic Profiling
• Step 1: Search all genes in organisms of interest against all other genomes
• Ask: Yes or No, is each gene found in each other species?
• Cluster genes by distribution patterns (profiles)
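The profiling steps above can be sketched as follows. The example assumes the homology searches (Step 1) have already been run and summarized as a mapping from each gene to the set of genomes in which a homolog was found; the gene names, genome names, and function names are hypothetical. Clustering here is the simplest possible kind, grouping genes with identical profiles, whereas real analyses typically allow near matches.

```python
from collections import defaultdict

def build_profiles(presence):
    """presence maps gene -> set of genomes in which a homolog was found.
    Returns gene -> tuple of 0/1 values over a fixed, sorted genome order."""
    genomes = sorted({g for found in presence.values() for g in found})
    return {gene: tuple(int(g in found) for g in genomes)
            for gene, found in presence.items()}

def cluster_by_profile(profiles):
    """Group genes that share an identical presence/absence profile."""
    clusters = defaultdict(list)
    for gene, profile in profiles.items():
        clusters[profile].append(gene)
    return [sorted(genes) for genes in clusters.values() if len(genes) > 1]

# Hypothetical search results for four genes across three genomes.
presence = {
    "geneA": {"sp1", "sp3"},
    "geneB": {"sp1", "sp3"},
    "geneC": {"sp2"},
    "geneD": {"sp1", "sp2", "sp3"},
}
profiles = build_profiles(presence)
clusters = cluster_by_profile(profiles)
print(clusters)
```

Genes that land in the same cluster are candidates for participating in the same pathway or process, even when they share no sequence similarity with each other.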
43. Carboxydothermus hydrogenoformans
• Isolated from a Russian hot spring
• Thermophile (grows at 80°C)
• Anaerobic
• Grows very efficiently on CO (carbon monoxide)
• Produces hydrogen gas
• Low GC Gram positive (Firmicute)
• Genome determined (Wu et al. 2005 PLoS Genetics 1: e65.)
48. Mutualistic Genome Evolution
• Compare and contrast different types of mutualistic symbioses
• Diverse hosts, symbionts, biology, ages
• Organelles, chemosymbioses, photosynthetic symbioses, nutritional symbioses
• What are the rules & patterns?
49. Glassy Winged Sharpshooter
• Obligate xylem feeder
• Can transmit Pierce's disease agent
• Potential bioterror agent
• Needs to get amino acids and other nutrients from symbionts, as aphids do
67. How can we best use metagenomic data?
• Many possible uses including:
– Improvements on rRNA based phylotyping and species diversity measurements
– Adding functional information on top of phylogenetic/species diversity information
• Most/all possible uses either require or are improved with phylogenetic analysis
70. Shotgun Sequencing Allows Use of Other Markers
[Figure: bar chart titled "Sargasso Phylotypes". Y-axis: Weighted % of Clones (0 to 0.5). X-axis: Major Phylogenetic Group (Alphaproteobacteria, Betaproteobacteria, Gammaproteobacteria, Epsilonproteobacteria, Deltaproteobacteria, Cyanobacteria, Firmicutes, Actinobacteria, Chlorobi, CFB, Chloroflexi, Spirochaetes, Fusobacteria, Deinococcus-Thermus, Euryarchaeota, Crenarchaeota). Marker genes plotted: EFG, EFTu, rRNA, RecA, RpoB, HSP70.]
Venter et al., Science 304: 66-74. 2004
87. As of 2002
• At least 40 phyla of bacteria
[Figure: tree of bacterial phyla: Proteobacteria, TM6, OS-K, Acidobacteria, Termite Group, OP8, Nitrospira, Bacteroides, Chlorobi, Fibrobacteres, Marine Group A, WS3, Gemmimonas, Firmicutes, Fusobacteria, Actinobacteria, OP9, Cyanobacteria, Synergistes, Deferribacteres, Chrysiogenetes, NKB19, Verrucomicrobia, Chlamydia, OP3, Planctomycetes, Spirochaetes, Coprothermobacter, OP10, Thermomicrobia, Chloroflexi, TM7, Deinococcus-Thermus, Dictyoglomus, Aquificae, Thermodesulfobacteria, Thermotogae, OP1, OP11.]
Based on Hugenholtz, 2002
88. As of 2002
• At least 40 phyla of bacteria
• Genome sequences are mostly from three phyla
[Figure: tree of bacterial phyla, as on slide 87.]
Based on Hugenholtz, 2002
89. As of 2002
• At least 40 phyla of bacteria
• Genome sequences are mostly from three phyla
• Some other phyla are only sparsely sampled
[Figure: tree of bacterial phyla, as on slide 87.]
Based on Hugenholtz, 2002
90. As of 2002
• At least 40 phyla of bacteria
• Genome sequences are mostly from three phyla
• Some other phyla are only sparsely sampled
[Figure: tree of bacterial phyla, as on slide 87.]
Based on Hugenholtz, 2002
91. Need for Tree Guidance Well Established
• Common approach within some eukaryotic groups
• Many small projects funded to fill in some bacterial or archaeal gaps
• Phylogenetic gaps in bacterial and archaeal projects commonly lamented in literature
92. NSF-funded Tree of Life Project
• A genome from each of eight phyla
Eisen, Ward, Robb, Nelson, et al.
• At least 40 phyla of bacteria
• Genome sequences are mostly from three phyla
• Some other phyla are only sparsely sampled
• Solution I: sequence more phyla
[Figure: tree of bacterial phyla, as on slide 87.]
94. NSF-funded Tree of Life Project
• A genome from each of eight phyla
Eisen & Ward, PIs
• At least 40 phyla of bacteria
• Genome sequences are mostly from three phyla
• Some other phyla are only sparsely sampled
• Still highly biased in terms of the tree
[Figure: tree of bacterial phyla, as on slide 87.]
95. NSF-funded Tree of Life Project
• A genome from each of eight phyla
Eisen & Ward, PIs
• At least 40 phyla of bacteria
• Genome sequences are mostly from three phyla
• Some other phyla are only sparsely sampled
• Same trend in Archaea
[Figure: tree of bacterial phyla, as on slide 87.]
96. NSF-funded Tree of Life Project
• A genome from each of eight phyla
Eisen & Ward, PIs
• At least 40 phyla of bacteria
• Genome sequences are mostly from three phyla
• Some other phyla are only sparsely sampled
• Same trend in Eukaryotes
[Figure: tree of bacterial phyla, as on slide 87.]
97. NSF-funded Tree of Life Project
• A genome from each of eight phyla
Eisen & Ward, PIs
• At least 40 phyla of bacteria
• Genome sequences are mostly from three phyla
• Some other phyla are only sparsely sampled
• Same trend in Viruses
[Figure: tree of bacterial phyla, as on slide 87.]
98. GEBA
• A genomic encyclopedia of bacteria and archaea
Eisen & Ward, PIs
• At least 40 phyla of bacteria
• Genome sequences are mostly from three phyla
• Some other phyla are only sparsely sampled
• Solution: Really Fill in the Tree
[Figure: tree of bacterial phyla, as on slide 87.]
100. GEBA Pilot Project: Components
• Project overview (Phil Hugenholtz, Nikos Kyrpides, Jonathan Eisen, Eddy Rubin, Jim Bristow)
• Project management (David Bruce, Eileen Dalin, Lynne Goodwin)
• Culture collection and DNA prep (DSMZ, Hans-Peter Klenk)
• Sequencing and closure (Eileen Dalin, Susan Lucas, Alla Lapidus, Mat Nolan, Alex Copeland, Cliff Han, Feng Chen, Jan-Fang Cheng)
• Annotation and data release (Nikos Kyrpides, Victor Markowitz, et al.)
• Analysis (Dongying Wu, Kostas Mavrommatis, Martin Wu, Victor Kunin, Neil Rawlings, Ian Paulsen, Patrick Chain, Patrik D'Haeseleer, Sean Hooper, Iain Anderson, Amrita Pati, Natalia N. Ivanova, Athanasios Lykidis, Adam Zemla)
• Adopt a microbe education project (Cheryl Kerfeld)
• Outreach (David Gilbert)
• $$$ (DOE, Eddy Rubin, Jim Bristow)
101. GEBA Pilot Project Overview
• Identify major branches in rRNA tree for which no genomes are available
• Identify those with a cultured representative in DSMZ
• DSMZ grew > 200 of these and prepped DNA
• Sequence and finish 100+ (covering breadth of bacterial/archaeal diversity)
• Annotate, analyze, release data
• Assess benefits of tree guided sequencing
• 1st paper: Wu et al. in Nature, Dec 2009
102. Network of Life
Bacteria
Archaea
Eukaryotes
Figure from Barton, Eisen et al.
"Evolution", CSHL Press.
Based on tree from Pace NR, 2003.
103. GEBA Lesson 1:
The rRNA Tree of Life is a Useful Tool
for Identifying Phylogenetically Novel
From Wu et al. 2009 Nature 462, 1056-1060
104. GEBA Lesson 2:
The rRNA Tree of Life is not perfect ...
16s WGT, 23S
Badger et al. 2005 Int J System Evol Microbiol 55: 1021-1026.
105. GEBA Lesson 3:
Phylogeny driven genome selection (and phylogenetics) improves genome annotation
• Took 56 GEBA genomes and compared results vs. 56 randomly sampled new genomes
• Better definition of protein family sequence "patterns"
• Greatly improves "comparative" and "evolutionary" based predictions
• Conversion of hypotheticals into conserved hypotheticals
• Linking distantly related members of protein families
• Improved non-homology prediction
109. Network of Life
Bacteria
Archaea
Eukaryotes
Figure from Barton, Eisen et al.
"Evolution", CSHL Press.
Based on tree from Pace NR, 2003.
110. Protein Family Rarefaction Curves
• Take data set of multiple complete genomes
• Identify all protein families using MCL
• Plot # of genomes vs. # of protein families
118. Structural Novelty
• Of the 17000 protein families in the GEBA56, 1800 are novel in sequence (Wu)
• Structural modeling suggests many are structurally novel too (D'Haeseleer)
• 372 being crystallized by the PSI (Kerfeld)
120. Shotgun Sequencing Allows Use of Other Markers
[Figure: bar chart titled "Sargasso Phylotypes". Y-axis: Weighted % of Clones (0 to 0.5). X-axis: Major Phylogenetic Group (Alphaproteobacteria, Betaproteobacteria, Gammaproteobacteria, Epsilonproteobacteria, Deltaproteobacteria, Cyanobacteria, Firmicutes, Actinobacteria, Chlorobi, CFB, Chloroflexi, Spirochaetes, Fusobacteria, Deinococcus-Thermus, Euryarchaeota, Crenarchaeota). Marker genes plotted: EFG, EFTu, rRNA, RecA, RpoB, HSP70.]
Venter et al., Science 304: 66-74. 2004
Editor's Notes
It has been less than 10 years since the first genome was determined
Genome sizes estimated from careful cytospectrophotometry in the 1970s. 180 Mb = Drosophila size.
MAC chromosome copy # exception: rDNA @ ~9,000 copies per MAC (by quantitative DNA hybridization)
Chromosome #s:
 MIC: Direct microscopic observations (1950s)
 Quantitative measurements in stained pulsed-field gels (1980s)
Cbs = chromosome breakage site
IES = internally eliminated segment
Functional prediction using a gene tree is just like predicting the biology of a species using a species tree
Extension of rRNA analysis to uncultured organisms using PCR
This is a tree of an rRNA gene that was found on a large DNA fragment isolated from Monterey Bay. This rRNA gene groups in a tree with genes from members of the gamma Proteobacteria, a group that includes E. coli as well as many environmental bacteria. This rRNA phylotype has been found to be a dominant species in many ocean ecosystems.
Phylogenetic analysis of rRNAs led to the discovery of archaea
clone from the Sargasso Sea. This shows that this