The document discusses Jonathan Eisen's talk at the 2010 Lake Arrowhead Microbial Genomes conference. It covers various topics discussed at the conference, including the importance of history and sequencing more phyla to fill gaps in the bacterial and archaeal phylogenetic trees. Key quotes and concepts from previous years' conferences are also presented.
Jonathan Eisen talk at Lake Arrowhead Microbial Genomics Mtg #LAMG10
1. The Importance of History
(and other obsessions)
Jonathan A. Eisen
UC Davis
Talk for Lake Arrowhead Microbial
Genomes 2010 (#LAMG10)
Wednesday, September 15, 2010
10. Homework
• Do blastp search with other famous people
associated with Lake Arrowhead Meeting
• JEFFREYHMILLER
• SARAHPALIN and her relationship to fungi
B. fuckeliana
• see http://phylogenomics.blogspot.com/2008/09/tracing-evolutionary-history-of-sarah.html
19. Quotes 2004
• Space-time continuum of genes and genomes
• Gene sequences are the wormhole that allows
one to tunnel into the past
• The human mind can conceive of things with no
basis in physical reality
• Thoughts can go faster than the speed of light
21. Quotes 2006
• The human guts are a real milieu of stuff
• You better kiss everybody
• Microbes not only have a lot of sex, they have a
lot of weird sex
• This is how you do metagenomics on 50
dollars, and that’s Canadian dollars
22. Quotes 2008
• Antibiotics do not kill things, they corrupt them
• There comes a point in life when you have to bring
chemists into the picture
• The rectal swabs are here in tan color
• And there's Jeffrey Dahmer
• We are the environment. We live the phenotype.
• If I have time I will tell you about a dream
• A paper came out next year
23. Quotes 2010
• We have been using this word for many years without actually realizing it
was correct
• "Another thing you need to know" (pause) "Actually you don't NEED to
know any of this"
• I have been influenced by Fisher Price throughout my life
• Don't take that away from us
• It takes 1000 nanobiologists to make one microbiologist
• I am going to wrap up as I hear the crickets chirping
• And we will bring out the unused cheese from yesterday
• In an engineering sense, the vagina is a simple plug flow reactor
• This is going to be ironic coming from someone who studies circumcision
• A little bit about time, but I am going to spend a lot less time on time than
on space
24. Keywords I remember from 2010
• Penis
• Vagina
• Anthrax
• Acne
• Ulcer (multiple kinds)
• Global warming
• Antibiotic resistance
• Virulence
27. rRNA Tree of Life
Bacteria
Archaea
Eukaryotes
Figure from Barton, Eisen et al.
“Evolution”, CSHL Press.
Based on tree from Pace NR, 2003.
28. 2002
• At least 40 phyla of bacteria
[Figure: tree of bacterial phyla, based on Hugenholtz, 2002. Phyla shown: Proteobacteria, TM6, OS-K, Acidobacteria, Termite Group, OP8, Nitrospira, Bacteroides, Chlorobi, Fibrobacteres, Marine Group A, WS3, Gemmimonas, Firmicutes, Fusobacteria, Actinobacteria, OP9, Cyanobacteria, Synergistes, Deferribacteres, Chrysiogenetes, NKB19, Verrucomicrobia, Chlamydia, OP3, Planctomycetes, Spirochaetes, Coprothermobacter, OP10, Thermomicrobia, Chloroflexi, TM7, Deinococcus-Thermus, Dictyoglomus, Aquificae, Thermodesulfobacteria, Thermotogae, OP1, OP11]
29. 2002
• At least 40 phyla of bacteria
• Genome sequences are mostly from three phyla
[Figure: tree of bacterial phyla, based on Hugenholtz, 2002]
30. 2002
• At least 40 phyla of bacteria
• Genome sequences are mostly from three phyla
• Some other phyla are only sparsely sampled
[Figure: tree of bacterial phyla, based on Hugenholtz, 2002]
31. 2002
• At least 40 phyla of bacteria
• Genome sequences are mostly from three phyla
• Some other phyla are only sparsely sampled
[Figure: tree of bacterial phyla, based on Hugenholtz, 2002]
32. Why Increase Phylogenetic Coverage?
• Common approach within some eukaryotic
groups (FGP, NHGRI, etc)
• Many successful small projects to fill in
bacterial or archaeal gaps
• Phylogenetic gaps in bacterial and archaeal
projects commonly lamented in literature
• Many potential benefits
33. NSF-funded Tree of Life Project
• A genome from each of eight phyla
• At least 40 phyla of bacteria
• Genome sequences are mostly from three phyla
• Some other phyla are only sparsely sampled
• Solution I: sequence more phyla
Eisen & Ward, PIs
[Figure: tree of bacterial phyla]
35. NSF-funded Tree of Life Project
• A genome from each of eight phyla
• At least 40 phyla of bacteria
• Genome sequences are mostly from three phyla
• Some other phyla are only sparsely sampled
• Still highly biased in terms of the tree
Eisen & Ward, PIs
[Figure: tree of bacterial phyla]
37. NSF-funded Tree of Life Project
• A genome from each of eight phyla
• At least 40 phyla of bacteria
• Genome sequences are mostly from three phyla
• Some other phyla are only sparsely sampled
• Same trend in Archaea
Eisen & Ward, PIs
[Figure: tree of bacterial phyla]
38. NSF-funded Tree of Life Project
• A genome from each of eight phyla
• At least 40 phyla of bacteria
• Genome sequences are mostly from three phyla
• Some other phyla are only sparsely sampled
• Same trend in Eukaryotes
Eisen & Ward, PIs
[Figure: tree of bacterial phyla]
39. NSF-funded Tree of Life Project
• A genome from each of eight phyla
• At least 40 phyla of bacteria
• Genome sequences are mostly from three phyla
• Some other phyla are only sparsely sampled
• Same trend in Viruses
Eisen & Ward, PIs
[Figure: tree of bacterial phyla]
40. GEBA
• A genomic encyclopedia of bacteria and archaea
• At least 40 phyla of bacteria
• Genome sequences are mostly from three phyla
• Some other phyla are only sparsely sampled
• Solution: Really Fill in the Tree
Eisen & Ward, PIs
[Figure: tree of bacterial phyla]
41. GEBA Pilot Project Overview
• Identify major branches in rRNA tree for
which no genomes are available
• Identify those with a cultured representative in
DSMZ
• DSMZ grew > 200 of these and prepped DNA
• Sequence and finish 100+ (covering breadth of
bacterial/archaeal diversity)
• Annotate, analyze, release data
• Assess benefits of tree guided sequencing
• 1st paper Wu et al in Nature Dec 2009
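The first two selection steps above can be sketched as a simple filter. This is only an illustration under stated assumptions, not the project's actual pipeline: the `Strain` record, the branch labels, and the `select_targets` helper are all hypothetical, and "major branch" is reduced to a plain string label.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Strain:
    name: str
    branch: str        # hypothetical label for a major branch of the rRNA tree
    has_genome: bool   # a genome sequence is already available
    in_culture: bool   # a cultured representative is held by a collection

def select_targets(strains):
    """Pick one cultured strain from each branch that has no genome yet."""
    covered = {s.branch for s in strains if s.has_genome}
    targets = {}
    for s in strains:
        if s.branch not in covered and s.in_culture:
            # keep the first cultured candidate seen for each uncovered branch
            targets.setdefault(s.branch, s.name)
    return targets

# Toy data (invented for illustration only)
strains = [
    Strain("strain_1", "branch_X", has_genome=True, in_culture=True),
    Strain("strain_2", "branch_X", has_genome=False, in_culture=True),
    Strain("strain_3", "branch_Y", has_genome=False, in_culture=False),
    Strain("strain_4", "branch_Y", has_genome=False, in_culture=True),
    Strain("strain_5", "branch_Z", has_genome=False, in_culture=False),
]
targets = select_targets(strains)
print(targets)
```

In this toy input only branch_Y qualifies: branch_X already has a genome and branch_Z has no cultured representative.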
42. GEBA Pilot Project: Components
• Project overview (Phil Hugenholtz, Nikos Kyrpides, Jonathan Eisen,
Eddy Rubin, Jim Bristow, Tanya Woyke)
• Project management (David Bruce, Eileen Dalin, Lynne Goodwin)
• Culture collection and DNA prep (DSMZ, Hans-Peter Klenk)
• Sequencing and closure (Eileen Dalin, Susan Lucas, Alla Lapidus, Mat
Nolan, Alex Copeland, Cliff Han, Feng Chen, Jan-Fang Cheng)
• Annotation and data release (Nikos Kyrpides, Victor Markowitz, et al)
• Analysis (Dongying Wu, Kostas Mavrommatis, Martin Wu, Victor
Kunin, Neil Rawlings, Ian Paulsen, Patrick Chain, Patrik D’Haeseleer,
Sean Hooper, Iain Anderson, Amrita Pati, Natalia N. Ivanova,
Athanasios Lykidis, Adam Zemla)
• Adopt a microbe education project (Cheryl Kerfeld)
• Outreach (David Gilbert)
• $$$ (DOE, DSMZ, GBMF)
43. GEBA and Openness
• All data released as quickly as
possible w/ no restrictions to
IMG-GEBA; Genbank, etc
• Data also available in
Biotorrents (http://
biotorrents.net)
• Individual genome reports
published in OA “Standards in
Genome Sciences (SIGS)”
• 1st GEBA paper in Nature freely
available and published using
Creative Commons License
44. GEBA Lesson 1
rRNA Tree is Useful for Identifying
Phylogenetically Novel Organisms
45. rRNA Tree of Life
Bacteria
Archaea
Eukaryotes
Figure from Barton, Eisen et al.
“Evolution”, CSHL Press.
Based on tree from Pace NR, 2003.
46. Network of Life?
Bacteria
Archaea
Eukaryotes
Figure from Barton, Eisen et al.
“Evolution”, CSHL Press.
Based on tree from Pace NR, 2003.
47. Compare PD in rRNA and WGT
48. PD of rRNA, Genome Trees Similar
From Wu et al. 2009 Nature 462, 1056-1060
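A minimal sketch of how a PD comparison between two trees could be computed, assuming PD here means phylogenetic diversity measured as total branch length and WGT means a whole-genome tree. The tree encoding (a child-to-parent map with branch lengths), the function name, and the toy trees are illustrative assumptions, not taken from the paper.

```python
def phylogenetic_diversity(parent, taxa):
    """Sum the branch lengths on the union of root-ward paths from `taxa`.

    `parent` maps each node to (parent_node, branch_length); the root has
    no entry. Each branch is counted once even if shared by several taxa.
    """
    counted = set()
    total = 0.0
    for node in taxa:
        while node in parent and node not in counted:
            counted.add(node)
            up, length = parent[node]
            total += length
            node = up
    return total

# Two toy trees over the same taxa (invented numbers)
rrna_tree = {"A": ("n1", 1.0), "B": ("n1", 1.0), "n1": ("root", 0.5), "C": ("root", 2.0)}
genome_tree = {"A": ("n1", 2.0), "B": ("n1", 2.0), "n1": ("root", 1.0), "C": ("root", 4.0)}

subset = ["A", "B"]
everything = ["A", "B", "C"]
for label, tree in (("rRNA", rrna_tree), ("WGT", genome_tree)):
    share = phylogenetic_diversity(tree, subset) / phylogenetic_diversity(tree, everything)
    print(label, round(share, 3))
```

Comparing the share of total PD that a genome set contributes in each tree, rather than raw branch lengths, keeps the comparison independent of each tree's scale.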
49. GEBA Lesson 2
Phylogeny-driven genome selection
helps discover new genetic diversity
50. Network of Life?
Bacteria
Archaea
Eukaryotes
Figure from Barton, Eisen et al.
“Evolution”, CSHL Press.
Based on tree from Pace NR, 2003.
51. Protein Family Rarefaction
Curves
• Take data set of multiple complete genomes
• Identify all protein families using MCL
• Plot # of genomes vs. # of protein families
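The three steps above can be sketched roughly as follows. The sketch assumes the clustering step (for example with MCL) has already been run and that its output was converted into a mapping from genome name to the set of protein family IDs found in that genome. Averaging over random genome orders is an assumption added here to smooth the curve; the slides do not say how the genomes were ordered.

```python
import random
from statistics import mean

def rarefaction_curve(genome_families, n_permutations=100, seed=0):
    """Mean number of distinct protein families after adding 1..N genomes."""
    rng = random.Random(seed)
    genomes = list(genome_families)
    counts = [[] for _ in genomes]
    for _ in range(n_permutations):
        order = genomes[:]
        rng.shuffle(order)
        seen = set()
        for i, genome in enumerate(order):
            seen |= genome_families[genome]   # add this genome's families
            counts[i].append(len(seen))       # cumulative family count
    return [mean(c) for c in counts]

# Toy input (invented): family IDs per genome
families = {
    "genome_A": {"fam1", "fam2", "fam3"},
    "genome_B": {"fam1", "fam2", "fam4"},
    "genome_C": {"fam5", "fam6", "fam7"},
}
curve = rarefaction_curve(families)
for n, value in enumerate(curve, start=1):
    print(n, value)
```

Plotting the returned list against 1..N gives the curve: a set of phylogenetically diverse genomes should keep climbing, while a set of close relatives flattens early.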
60. Most/All Functional Prediction Improves
w/ Better Phylogenetic Sampling
• Took 56 GEBA genomes and compared results vs. 56
randomly sampled new genomes
• Better definition of protein family sequence “patterns”
• Greatly improves “comparative” and “evolutionary”
based predictions
• Conversion of hypothetical into conserved hypotheticals
• Linking distantly related members of protein families
• Improved non-homology prediction
Kostas Mavrommatis, Natalia Ivanova, Thanos Lykidis, Nikos Kyrpides, Iain Anderson
61. GEBA Lesson 4
Metadata and individual genome
papers important
62. SIGS
http://standardsingenomics.org/
63. GEBA Lesson 5
Phylogeny-driven genome selection
improves analysis of metagenome data
64. Uses of phylogenetic classification in metagenomics
• Assigning reads to phylogenetic groups using multiple genes
• Phylogenetic binning
• Phylogenetic ecology
- especially important if no reference genomes
[Figure: "Sargasso Phylotypes" bar charts. x-axis: Major Phylogenetic Group; y-axis: Weighted % of Clones; one series per marker gene (frr, tsf, pgk, rplL, rplF, rplP, rplT, rplE, infC, rpsI, rplS, rplA, rplB, rplK, rplC, rpsJ, rplN, rplD, rplM, rpsE, rpsS, rpsB, rpsK, rpsC, rpoB, rpsM, pyrG, nusA, dnaG, rpmA, smpB)]
65. Uses of phylogenetic classification in metagenomics
• Assigning reads to phylogenetic groups using multiple genes
• Phylogenetic binning
• Phylogenetic ecology
- especially important if no reference genomes
Limited by poor genomic sampling in past
[Figure: "Sargasso Phylotypes" bar charts. x-axis: Major Phylogenetic Group; y-axis: Weighted % of Clones; one series per marker gene (frr, tsf, pgk, rplL, rplF, rplP, rplT, rplE, infC, rpsI, rplS, rplA, rplB, rplK, rplC, rpsJ, rplN, rplD, rplM, rpsE, rpsS, rpsB, rpsK, rpsC, rpoB, rpsM, pyrG, nusA, dnaG, rpmA, smpB)]
66. Metagenomic Analysis Improves
w/ Phylogenetic Sampling
• Small but real improvements in
–Gene identification / confirmation
–Functional prediction
–Binning
–Phylogenetic classification
67. Metagenomic Analysis Improves
w/ Phylogenetic Sampling
• Small but real improvements in
–Gene identification / confirmation
–Functional prediction
–Binning
–Phylogenetic classification
• But not a lot ...
68. GEBA Future 1
Need to adapt genomic and
metagenomic methods to make use of
GEBA data
69. Phylogenetic Binning Using AMPHORA
AMPHORA - each read on its own tree
Improves with better phylogenetic methods
[Figure: bar chart by phylogenetic group, scale 0 to 0.7, one series per marker gene (dnaG, frr, infC, nusA, pgk, pyrG, rplA, rplB, rplC, rplD, rplE, rplF, rplK, rplL, rplM, rplN, rplP, rplS, rplT, rpmA, rpoB, rpsB, rpsC, rpsE, rpsI, rpsJ, rpsK, rpsM, rpsS, smpB, tsf)]
70. Improving Phylogeny for
Metagenomic Reads
• Examples using reference trees
– AMPHORA (Wu and Eisen)
– PPlacer (Erik Matsen)
– FastTree (Morgan Price)
• Variants
– Use concatenated alignment of markers not just
individual genes (Steven Kembel)
– Apply to OTU identification not just classification
(Thomas Sharpton)
– CoBinning: look for linkage among fragments/genes
(Aaron Darling)
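The concatenated-alignment variant listed above can be illustrated with a small sketch. It assumes each marker gene has already been aligned separately and is stored as a mapping from taxon (or read) name to aligned sequence. The function name and the gap-padding rule for taxa missing a marker are assumptions made for illustration, not the method used by the people named on the slide.

```python
def concatenate_alignments(marker_alignments):
    """Join per-marker alignments into one alignment per taxon.

    `marker_alignments` maps marker name -> {taxon: aligned sequence}.
    A taxon missing from a marker gets gaps of that marker's width.
    """
    taxa = sorted({t for aln in marker_alignments.values() for t in aln})
    parts = {t: [] for t in taxa}
    for marker in sorted(marker_alignments):      # fixed marker order
        aln = marker_alignments[marker]
        widths = {len(seq) for seq in aln.values()}
        if len(widths) != 1:
            raise ValueError(f"{marker}: sequences are not aligned to equal length")
        width = widths.pop()
        for t in taxa:
            parts[t].append(aln.get(t, "-" * width))
    return {t: "".join(chunks) for t, chunks in parts.items()}

# Toy alignments (invented sequences)
alignments = {
    "rpoB": {"taxon1": "MKV-", "taxon2": "MRVL"},
    "rplA": {"taxon1": "AG", "read_x": "A-"},
}
supermatrix = concatenate_alignments(alignments)
for name, seq in supermatrix.items():
    print(name, seq)
```

A short metagenomic read usually covers only one marker, so most of its row in the concatenated alignment is gaps; whether that helps or hurts placement depends on the tree-building method used afterwards.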