1. Lecture 13:
EVE 161:
Microbial Phylogenomics
Lecture #13:
Era III: Genome Sequencing and
Phylogenomic Analysis
UC Davis, Winter 2014
Instructor: Jonathan Eisen
Slides for UC Davis EVE161 Course Taught by Jonathan Eisen Winter 2014
2. Where we are going and where we have been
• Previous lecture:
  - 12: Guest Lecture
• Current lecture:
  - 13: Genome Sequencing III
• Next lecture:
  - 14: Metagenomics
4. Phylogenomics I: Major Evolutionary Transitions
• Analysis of the S. pombe genome by Wood et al. 2002
• Compared the genomes of eukaryotes to those of prokaryotes
• "Are there genes found in all eukaryotes with no obvious homologs in any prokaryote?"
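The question above amounts to a filter over gene-family presence/absence profiles. A minimal Python sketch, assuming a hypothetical table that maps each gene family to the set of genomes in which a homolog was detected; the family names, the genome lists, and the function `group_specific` are illustrative placeholders, not data or methods from Wood et al. 2002:

```python
# Hypothetical presence/absence data: gene family -> genomes with a detected homolog.
# Every name here is an illustrative placeholder, not a result from the paper.
EUKARYOTES = {"S. pombe", "S. cerevisiae", "Worm", "Fly", "Humans", "Arabidopsis"}
PROKARYOTES = {"E. coli", "B. subtilis", "M. jannaschii"}

profiles = {
    "fam_euk_only": set(EUKARYOTES),
    "fam_universal": EUKARYOTES | PROKARYOTES,
    "fam_animal_only": {"Worm", "Fly", "Humans"},
}

def group_specific(profiles, ingroup, outgroup):
    """Return families present in every ingroup genome and in no outgroup genome."""
    return sorted(
        family
        for family, genomes in profiles.items()
        if ingroup <= genomes and not (genomes & outgroup)
    )

print(group_specific(profiles, EUKARYOTES, PROKARYOTES))
```

The same function would answer a multicellular-versus-unicellular question by passing different ingroup and outgroup sets. In practice the hard part is the homolog detection that fills the table, which this sketch takes as given.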
5. Evolutionary Model
[Figure: tree with three groups. Eukaryotes: S. pombe, S. cerevisiae, Encephalitozoon, Worm, Fly, Humans, Dictyostelium, Arabidopsis, Chlamydomonas, Phytophthora, Tetrahymena, Plasmodium, Trypanosoma, Euglena, Naegleria, Trichomonas, Giardia. Other groups: Archaea, Bacteria.]
6. Eukaryotic-Specific Genes
• >200 genes found, including:
  - Cytoskeleton components: tubulin, ankyrin, myosin
  - Protein degradation: ubiquitin, proteases
  - Chromatin and DNA packaging
• Of the 200, many had no known function: they could encode novel eukaryote-wide processes
7. Multi- vs. Single-Cellular Eukaryotes
• Further analysis of the S. pombe genome
• Compared multi-cellular vs. single-cellular eukaryotes (animals and plants vs. yeast)
• "Are there genes in all multi-cellular and not in any single-cellular?"
• Found only 3
• Concluded that the genetic basis of multi-cellularity was likely to be gene regulation and not invention of new genes
8. Multiple Origins of Multicellularity
10. Endosymbiont Evolution
• Compared to free-living relatives:
  - Smaller genomes
  - Lower GC content
  - Higher pIs
  - Higher rates of sequence evolution
• Baumannia shows ALL of these
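Two of the trends listed above, genome size and GC content, can be read directly off a sequence. A minimal Python sketch, assuming plain DNA strings; the two toy sequences are invented for illustration and are not Baumannia or any real genome:

```python
def gc_content(seq):
    """Fraction of unambiguous bases (A, C, G, T) that are G or C."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    total = gc + at
    return gc / total if total else 0.0

# Toy sequences invented for illustration; real genomes would be read from FASTA files.
free_living = "ATGCGCGGCTAGCCGGATCGCGTACG"
endosymbiont = "ATGAATTTAAAATCTATTAAT"

for name, seq in [("free-living", free_living), ("endosymbiont", endosymbiont)]:
    print(f"{name}: length={len(seq)} GC={gc_content(seq):.2f}")
```

Ambiguous bases such as N are left out of the denominator here, which is one common convention; other tools may count them differently.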
11. Uses of Whole Genome Trees
13. Variation Between Endosymbionts and Free Living
• Repair hypothesis
• Population genetics hypothesis
14. Variation Between Endosymbionts and Free Living
• Repair hypothesis
• Population genetics hypothesis
• PopGen explanations favored
17. Variation Among Endosymbionts
• Repair hypothesis
• Population genetics hypothesis
18. Variation Among Endosymbionts
• Repair hypothesis
• Population genetics hypothesis
• Repair explanations favored
28. Steps in Lateral Gene Transfer (LGT)
[Figure: lineages A, B, C, D]
29. Steps in Lateral Gene Transfer (LGT)
[Figure: lineages A, B, C, D]
• 1: Gene acquires host features
30. Steps in Lateral Gene Transfer (LGT)
[Figure: lineages A, B, C, D]
• 1: Gene acquires host features
• 2: Transfer
31. Steps in Lateral Gene Transfer (LGT)
[Figure: lineages A, B, C, D]
• 1: Gene acquires host features
• 2: Transfer
• 3-5: Integration, selection, spread
32. Steps in Lateral Gene Transfer (LGT)
[Figure: lineages A, B, C, D]
• 1: Gene acquires host features
• 2: Transfer
• 3-5: Integration, selection, spread
• 6: Amelioration
37. How to Infer Gene Transfers
• Unusual distribution patterns
• Unusual nucleotide composition
• High sequence similarity to supposedly distantly related species
• Unusual gene trees
• Observe transfer events
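The "unusual nucleotide composition" criterion can be sketched as an outlier test. The Python example below is a minimal illustration under stated assumptions: GC fraction is used as the only composition measure, the z-score cutoff of 2.0 is an arbitrary choice, and the gene names and sequences are synthetic. Real analyses often use richer statistics such as codon usage or oligonucleotide frequencies.

```python
from statistics import mean, pstdev

def gc_fraction(seq):
    """Fraction of bases in seq that are G or C."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)

def composition_outliers(genes, z_cutoff=2.0):
    """Return names of genes whose GC fraction lies more than z_cutoff
    population standard deviations from the mean over all genes."""
    gc = {name: gc_fraction(seq) for name, seq in genes.items()}
    mu = mean(gc.values())
    sigma = pstdev(gc.values())
    if sigma == 0:
        return []
    return sorted(name for name, value in gc.items() if abs(value - mu) / sigma > z_cutoff)

# Synthetic genes: nine with GC fractions of 0.4 to 0.6 and one AT-rich candidate.
gc_pairs = [5, 5, 4, 6, 5, 5, 4, 6, 5]
genes = {f"gene{i}": "GC" * k + "AT" * (10 - k) for i, k in enumerate(gc_pairs)}
genes["candidate_lgt"] = "GC" * 1 + "AT" * 9

print(composition_outliers(genes))
```

A flagged gene is only a candidate: composition can be unusual for reasons other than transfer, and amelioration gradually erases the signal in older transfers, which is why composition is one of several lines of evidence in the list above.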
38. Case Study I: Aphids
Fig. 1 Coloration and carotenoids in the pea aphid. Typical green (A) and red (B) aphid clones, (C) 5AY, a green mutant clone arising from the red clone 5A. (D) Profiles of carotenoids in red (5A, LSR1), mutant red→green (5AY, two samples), and green (8-10-1, 7-2-1) pea aphid clones. Torulene and a related red compound are restricted to red clones; the mutant 5AY clone lacks these and displays an elevation in their predicted precursor, γ-carotene.
39. Case Study I: Aphids
Table 1 Genes in the A. pisum genome with closest homology to carotenoid biosynthetic enzymes, including scaffold of origin and matching EST sequences.
Similar color indicates that the gene is on the same scaffold. The 3' end of scaffold NW_001925130 overlaps with the 5' end of NW_001923501 for 5400 base
pairs, and PCR demonstrated continuity of these scaffolds. Pink row is the gene corresponding to torR and conferring red color (see text). Protein length, amino
acids; ESTs are those present in GenBank, mostly from clone LSR1.
40. Case Study I: Aphids
Fig. 2 Phylogenetic relations of inferred carotenoid biosynthetic enzymes from the pea aphid genome. (A) Carotenoid desaturases and (B) carotenoid cyclase-carotenoid synthases. Sequences are from aphids, bacteria, plants, and fungi; no homologs were detectable in other sequenced animal genomes. Bootstrap support greater than 50% is indicated on branches.
41. Case Study II: GEBA
42. Tree of Life
Figure from Barton, Eisen et al. "Evolution", CSHL Press, based on Baldauf et al. tree
43. Genomes Poorly Sampled
Figure from Barton, Eisen et al. "Evolution", CSHL Press, based on Baldauf et al. tree
44. TIGR Tree of Life Project
Figure from Barton, Eisen et al. "Evolution", CSHL Press, based on Baldauf et al. tree
46. Genomes Still Poorly Sampled
Figure from Barton, Eisen et al. "Evolution", CSHL Press, based on Baldauf et al. tree
47. Genomic Encyclopedia of Bacteria & Archaea
Wu et al. 2009 Nature 462, 1056-1060
Figure from Barton, Eisen et al. "Evolution", CSHL Press, based on Baldauf et al. tree
48. Genomic Encyclopedia of Bacteria & Archaea
Wu et al. 2009 Nature 462, 1056-1060
Figure from Barton, Eisen et al. "Evolution", CSHL Press, based on Baldauf et al. tree
49. GEBA Lesson 1: rRNA utility in IDing novel genomes
From Wu et al. 2009 Nature 462, 1056-1060
50. GEBA Lesson 2: rRNA Tree is not perfect
[Figure: trees compared, labeled "16S" and "WGT, 23S"]
Badger et al. 2005 Int J Syst Evol Microbiol 55: 1021-1026.
51. GEBA Lesson 3: Phylogenetic sampling improves annotation
• Took 56 GEBA genomes and compared results vs. 56 randomly sampled new genomes
• Better definition of protein family sequence "patterns"
• Greatly improves "comparative" and "evolutionary" based predictions
• Conversion of hypotheticals into conserved hypotheticals
• Linking distantly related members of protein families
• Improved non-homology prediction
52. GEBA Lesson 4: Metadata Important
53. GEBA Lesson 5: Improves discovery of new genetic diversity
54. Protein Family Rarefaction Curves
• Take data set of multiple complete genomes
• Identify all protein families using MCL
• Plot # of genomes vs. # of protein families
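The three steps above can be sketched in Python. This is a minimal illustration, not the procedure used in Wu et al. 2009: it assumes the clustering step (MCL in the slide) has already been run, so each genome is represented as a set of hypothetical protein family IDs, and it assumes the curve is smoothed by averaging over random genome orderings. The function name `rarefaction_curve`, the permutation count, and the toy data are all illustrative.

```python
import random

def rarefaction_curve(genome_families, n_permutations=100, seed=0):
    """Mean number of distinct protein families seen after sampling 1..N genomes,
    averaged over random genome orderings."""
    rng = random.Random(seed)
    genomes = list(genome_families)
    totals = [0] * len(genomes)
    for _ in range(n_permutations):
        rng.shuffle(genomes)
        seen = set()
        for i, genome in enumerate(genomes):
            seen |= genome_families[genome]
            totals[i] += len(seen)
    return [total / n_permutations for total in totals]

# Hypothetical family assignments; in practice these would come from a clustering step.
genome_families = {
    "genome_A": {"f1", "f2", "f3"},
    "genome_B": {"f2", "f3", "f4"},
    "genome_C": {"f1", "f5"},
    "genome_D": {"f6", "f7", "f8"},
}

for n_genomes, n_families in enumerate(rarefaction_curve(genome_families), start=1):
    print(n_genomes, round(n_families, 2))
```

Plotting the returned list against 1..N gives the curve. A curve that keeps climbing suggests that additional genomes are still contributing new protein families.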
55. Wu et al. 2009 Nature 462, 1056-1060
56. Wu et al. 2009 Nature 462, 1056-1060
57. Wu et al. 2009 Nature 462, 1056-1060
58. Wu et al. 2009 Nature 462, 1056-1060
59. Wu et al. 2009 Nature 462, 1056-1060
60. Synapomorphies exist
Wu et al. 2009 Nature 462, 1056-1060
61. Phylogenetic Distribution Novelty: Bacterial Actin Related Protein
[Figure: phylogenetic tree of actin and actin-related proteins from eukaryotes, with clades labeled ACTIN, ARP1, ARP2, ARP3, ARP4, ARP5, ARP6, ARP7, and ARP10, plus BARP (H. ochraceum gi227395998). Bootstrap values on branches; scale bar 0.5.]
Haliangium ochraceum DSM 14365
Patrik D'haeseleer, Adam Zemla, Victor Kunin
See also Guljamow et al. 2007 Current Biology.
63. Haloarchaeal GEBA-like
Lynch et al. (2012) PLoS ONE 7(7): e41389. doi:10.1371/journal.pone.0041389
64. The Dark Matter of Biology
From Wu et al. 2009 Nature 462, 1056-1060
65. GEBA Uncultured
[Table: Number of SAGs from Candidate Phyla (SAR406, OP3, OP1, OD1) by site]
• Site A: Hydrothermal vent
• Site B: Gold Mine
• Site C: Tropical gyres (Mesopelagic)
• Site D: Tropical gyres (Photic zone)
Sample collections at 4 additional sites are underway.
Phil Hugenholtz
66. JGI Dark Matter Project
[Figure: single-cell genomics workflow, phylogenetic tree, and highlighted findings.
Sample types: brackish/freshwater, hydrothermal, sediment, seawater, bioreactor. Site codes: TG, HSM, GBS, HOT, SAK, ETL, EPR, TA, GOM.
Workflow: environmental samples (n=9); isolation of single cells (n=9,600); whole genome amplification (n=3,300); SSU rRNA gene based identification (n=2,000); genome sequencing, assembly and QC (n=201); draft genomes (n=201).
Archaea: Korarchaeota, Cren Thermoprotei, Thaumarchaeota, Cren MCG, Cren pISA7, Cren C2, Aigarchaeota, Nanoarchaea, Micrarchaea, pMC2A384 (Diapherotrites), DSEG (Aenigmarchaea), Nanohaloarchaea, Euryarchaeota.
Bacteria: OP11 (Microgenomates), OD1 (Parcubacteria), SR1, BH1, TM7, GN02 (Gracilibacteria), Bacteroidetes, OP1 (Acetothermia), Deinococcus-Thermus, ZB3, Fibrobacteres, TG3, Spirochaetes, WWE1 (Cloacamonetes), Proteobacteria, Firmicutes, Tenericutes, Fusobacteria, Chrysiogenetes, Chlorobi, SAR406 (Marinimicrobia).
Highlighted findings: UGA recoded for Gly (Gracilibacteria); HGT from Eukaryotes (Nanoarchaea); oxidoreductase; stringent response (Diapherotrites, Nanoarchaea); sigma factor (Diapherotrites, Nanoarchaea); archaeal type purine synthesis (Microgenomates).]
Woyke et al. Nature 2013.
70. A Genomic Encyclopedia of Microbes (GEM)
Figure from Barton, Eisen et al. "Evolution", CSHL Press, based on Baldauf et al. tree
71. A Genomic Encyclopedia of Microbes (GEM)
Figure from Barton, Eisen et al. "Evolution", CSHL Press, based on Baldauf et al. tree
72. GEBA Lesson 6: Improves analysis of metagenomic data
73. Sargasso Phylotypes: GEBA Project improves metagenomic analysis
[Figure: bar chart. Y-axis: Weighted % of Clones (0.000 to 0.500). X-axis: Major Phylogenetic Group (Alphaproteobacteria, Betaproteobacteria, Gammaproteobacteria, Epsilonproteobacteria, Deltaproteobacteria, Cyanobacteria, Firmicutes, Actinobacteria, Chlorobi, CFB, Chloroflexi, Spirochaetes, Fusobacteria, Deinococcus-Thermus, Euryarchaeota, Crenarchaeota). Markers: EFG, EFTu, HSP70, RecA, RpoB, rRNA. Label: Other Markers.]
Venter et al., Science 304: 66-74. 2004
74. Sargasso Phylotypes: But not a lot
[Figure: bar chart. Y-axis: Weighted % of Clones (0.000 to 0.500). X-axis: Major Phylogenetic Group (Alphaproteobacteria, Betaproteobacteria, Gammaproteobacteria, Epsilonproteobacteria, Deltaproteobacteria, Cyanobacteria, Firmicutes, Actinobacteria, Chlorobi, CFB, Chloroflexi, Spirochaetes, Fusobacteria, Deinococcus-Thermus, Euryarchaeota, Crenarchaeota). Markers: EFG, EFTu, rRNA. Label: Other Markers.]
Venter et al., Science 304: 66-74. 2004
75. rRNA Tree of Life
Bacteria
Archaea
Eukaryotes
Figure from Barton, Eisen et al. "Evolution", CSHL Press. 2007.
Based on tree from Pace 1997 Science 276:734-740
76. PD: Genomes
From Wu et al. 2009 Nature 462, 1056-1060
77. PD: Genomes + GEBA
From Wu et al. 2009 Nature 462, 1056-1060
78. PD: Isolates
From Wu et al. 2009 Nature 462, 1056-1060
79. PD: All
From Wu et al. 2009 Nature 462, 1056-1060
80. Uncultured Lineages: Technical Approaches
• Get into culture
• Enrichment cultures
• If abundant in low diversity ecosystems
• Flow sorting
• Microbeads
• Microfluidic sorting
• Single cell amplification
81. GEBA Uncultured
[Table: Number of SAGs from Candidate Phyla (SAR406, OP3, OP1, OD1) by site]
• Site A: Hydrothermal vent
• Site B: Gold Mine
• Site C: Tropical gyres (Mesopelagic)
• Site D: Tropical gyres (Photic zone)
Sample collections at 4 additional sites are underway.
Phil Hugenholtz