Gets better with more markers, but we do not have many sequences for these markers. We can get them from genomes. The more diverse the genomes, the better the marker set will be.
Selecting phylogenetically diverse genomes increases the probability that one will find new protein families
Jonathan Eisen talk on 1$ Genome - Presentation Transcript
The 1$ Bacterial Genome:
Advances in Bioinformatics
Jonathan A. Eisen
U. C. Davis Genome Center
The 1$ Bacterial Genome:
Oh $^#^ - We’re $&#$
Jonathan A. Eisen
U. C. Davis Genome Center
The 1$ Bacterial Genome:
Informatics, GEBA and me
Jonathan A. Eisen
U. C. Davis Genome Center
Outline
• GEBA - The JGI Genomic Encyclopedia of
Bacteria and Archaea
• Insights into the 1$ genome from the GEBA
project
• Additional insights into the 1$ genome
GEBA: The Genomic Encyclopedia of
Bacteria and Archaea
Run by JGI
$$ from DOE
Work by many
As of 2002
• At least 40 phyla of bacteria
[Figure: tree of bacterial phyla, based on Hugenholtz, 2002. Phyla shown: Proteobacteria, TM6, OS-K, Acidobacteria, Termite Group, OP8, Nitrospira, Bacteroides, Chlorobi, Fibrobacteres, Marine Group A, WS3, Gemmimonas, Firmicutes, Fusobacteria, Actinobacteria, OP9, Cyanobacteria, Synergistes, Deferribacteres, Chrysiogenetes, NKB19, Verrucomicrobia, Chlamydia, OP3, Planctomycetes, Spirochaetes, Coprothermobacter, OP10, Thermomicrobia, Chloroflexi, TM7, Deinococcus-Thermus, Dictyoglomus, Aquificae, Thermodesulfobacteria, Thermotogae, OP1, OP11]
As of 2002
• At least 40 phyla of bacteria
• Genome sequences are mostly from three phyla
[Figure: tree of bacterial phyla, based on Hugenholtz, 2002]
As of 2002
• At least 40 phyla of bacteria
• Genome sequences are mostly from three phyla
• Some other phyla are only sparsely sampled
[Figure: tree of bacterial phyla, based on Hugenholtz, 2002]
As of 2002
• At least 40 phyla of bacteria
• Genome sequences are mostly from three phyla
• Some other phyla are only sparsely sampled
• Same trend in Archaea
[Figure: tree of bacterial phyla, based on Hugenholtz, 2002]
Need for Tree Guidance Well Established
• Common approach within some eukaryotic groups
– NHGRI animal projects
– FGI at Whitehead
– Plant LSP
• Phylogenetic gaps in bacterial and archaeal projects
commonly lamented in literature, conversations, etc
• Many small projects funded to fill in some gaps
– DOE/TIGR Sequencing
– Multiple CSP projects
– Multiple NSF/USDA projects
– Private projects (e.g., Integrated Genomics, Diversa)
• At least 40 phyla of bacteria
• Genome sequences are mostly from three phyla
• Some other phyla are only sparsely sampled
• Solution I: sequence more phyla
• NSF-funded Tree of Life Project
• A genome from each of eight phyla
Eisen, Ward, Badger, Wu, Wu, et al.
[Figure: tree of bacterial phyla]
• At least 100 phyla of bacteria
• Genome sequences are mostly from three phyla
• Most phyla with cultured species are sparsely sampled
• Lineages with no cultured taxa even more poorly sampled
• Solution - use tree to really fill gaps
[Figure: tree of bacterial phyla, with well sampled phyla marked]
http://www.jgi.doe.gov/programs/GEBA/pilot.html
GEBA Pilot Project Overview
• Select 200 organisms using rRNA tree as a
guide
• Develop high throughput pipeline for strain
growth and DNA preparation
• Sequence and finish 100 genomes
• Annotate, analyze, release data
• Assess benefits of tree guided sequencing
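One way to picture "using the rRNA tree as a guide" is a greedy pick of the organisms that add the most new branch length to what has already been sequenced. The sketch below is only an illustration under that assumption: the slides do not describe the actual GEBA selection procedure, and the toy tree, the names A to D, and the function names are all hypothetical.

```python
# Minimal sketch of tree-guided selection (hypothetical, not the GEBA pipeline).
# The tree is stored as child -> (parent, branch_length); "root" has no entry.

def root_path(leaf, parent):
    """Return the set of edges (named by their child node) from leaf up to the root."""
    edges = set()
    node = leaf
    while node in parent:
        edges.add(node)
        node = parent[node][0]
    return edges

def greedy_pd_selection(leaves, parent, k, already_sequenced=()):
    """Pick k leaves, each time taking the one that adds the most uncovered branch length."""
    covered = set()
    for leaf in already_sequenced:
        covered |= root_path(leaf, parent)
    candidates = [leaf for leaf in leaves if leaf not in already_sequenced]
    chosen = []
    for _ in range(min(k, len(candidates))):
        # Gain = total length of branches on this leaf's root path not yet covered.
        best = max(
            candidates,
            key=lambda leaf: sum(parent[e][1] for e in root_path(leaf, parent) - covered),
        )
        chosen.append(best)
        covered |= root_path(best, parent)
        candidates.remove(best)
    return chosen

# Toy tree (assumed data): clade X is shallow, clade Y sits on a long branch.
toy_parent = {
    "X": ("root", 0.1), "Y": ("root", 0.5),
    "A": ("X", 0.1), "B": ("X", 0.1),
    "C": ("Y", 0.3), "D": ("Y", 0.2),
}
picks = greedy_pd_selection(["A", "B", "C", "D"], toy_parent, k=2, already_sequenced=["A"])
print(picks)
```

With organism A already sequenced, both picks come from the unsampled clade Y rather than from A's close relative B, which is the behavior a tree-guided target list is after.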
GEBA Pilot Target List
[Figure: bar chart of # of Genomes (axis 0 to 35) by Phyla; bars are labeled B: for bacterial phyla and A: for archaeal phyla]
Why Increase Taxonomic Coverage?
• Gene discovery
• Annotation, functional prediction
• Metagenomic analysis
• Mechanisms of diversification
• Species phylogeny and classification
Phylogenetic Metagenomics
Non-Homology Predictions:
Phylogenetic Profiling
• Step 1: Search all genes in
organisms of interest against all
other genomes
• Ask: Yes or No, is each gene
found in each other species
• Cluster genes by distribution
patterns (profiles)
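The steps above can be sketched in a few lines. This is a minimal illustration, assuming the search results are already available as a mapping from each gene to the set of genomes where a match was found; the gene and species names are made up, and grouping only identical profiles is a simplification (real analyses would likely also group near-identical profiles).

```python
from collections import defaultdict

def build_profiles(hits, genomes):
    """Turn gene -> set of genomes with a hit into gene -> tuple of 0/1 (No/Yes) per genome."""
    return {
        gene: tuple(int(genome in found) for genome in genomes)
        for gene, found in hits.items()
    }

def cluster_by_profile(profiles):
    """Group genes that share exactly the same presence/absence profile."""
    clusters = defaultdict(list)
    for gene, profile in sorted(profiles.items()):
        clusters[profile].append(gene)
    return dict(clusters)

# Hypothetical search results: which other species contain a match to each gene.
genomes = ["sp1", "sp2", "sp3", "sp4"]
hits = {
    "geneA": {"sp1", "sp3"},
    "geneB": {"sp1", "sp3"},
    "geneC": {"sp2"},
    "geneD": {"sp1", "sp2", "sp3", "sp4"},
}

profiles = build_profiles(hits, genomes)
clusters = cluster_by_profile(profiles)
for profile, genes in clusters.items():
    print(profile, genes)
```

Here geneA and geneB end up in the same cluster because they are present in exactly the same species, which is the kind of shared distribution pattern that suggests a functional link without relying on sequence homology between the two genes.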
GEBA Lesson 1
Tree of Life is a Useful Guide
rRNA Tree of Life
GEBA Lesson 2
We have still only scratched the
surface of microbial diversity
First Bacterial Actin Related Protein -
Haliangium ochraceum DSM 14365
First found by V. Kunin, Structure Analysis by Patrik D. et al
GEBA Lesson 3
Need Experiments from Across the
Tree of Life too
As of 2002
• At least 40 phyla of bacteria
[Figure: tree of bacterial phyla, based on Hugenholtz, 2002]
As of 2002
• At least 40 phyla of bacteria
• Experimental studies are mostly from three phyla
[Figure: tree of bacterial phyla, based on Hugenholtz, 2002]
As of 2002
• At least 40 phyla of bacteria
• Experimental studies are mostly from three phyla
• Some studies in other phyla
[Figure: tree of bacterial phyla, based on Hugenholtz, 2002]
Need experimental studies from across the tree too
[Figure: tree of bacterial phyla]
GEBA Lesson 4
The Importance of Project
Management
GEBA Project Flowchart
[Figure: flowchart in three phases, with "OK?" checkpoints between steps]
• Project Initiation: GEBA Sequencing Proposal and Scientific and Technical Review (1); Negotiate Scope of Work; Receive Starting Material (1)
• Sequencing: Draft Sequencing and Assembly (1); Finish Sequencing and Assembly (2)
• Annotation: IMG (1); IMG-ER (1); Gene-QA (1); Draft Annotation (3); Finish Annotation (3); Shotgun Genome GenBank Submission (1); Complete Genome GenBank Submission (1)
Footnotes: 1 PGF, 2 LANL, 3 ORNL
David Bruce, Lynne Goodwin et al
GEBA Lesson 5
The Importance of Culture
(Collections that is)
GEBA Biggest Challenge:
Getting DNA
• Getting quality DNA is biggest bottleneck
• Solution: Beg, Borrow and Steal
• DSMZ offered to do for free
• ATCC is doing a small number for a fee
• In discussions with PCC and other
collections
Quantification gel of the genomic DNA isolated from Microorganisms
Conexibacter woesei (DSM 14684T)
[Figure: gel image, lanes 1 to 16]
Lane 1: c(λ-Marker)= 15 ng
Lane 2: c(λ-Marker)= 30 ng
Lane 3: c(λ-Marker)= 50 ng
Lane 4: DNA Molecular Weight Marker II (Roche 236250)
Lane 5: DSM 13279, Collinsella stercoris
Lane 6: DSM 43043, Intrasporangium calvum
Lane 7: DSM 18053, Dyadobacter fermentans
Lane 8: DSM 20476, Slackia heliotrinireducens
Lane 9: DSM 18081, Patulibacter minatonensis
Lane 10: DSM 14684, Conexibacter woesei
Lane 11: DSM 11002, Dethiosulfovibrio peptidovorans
Lane 12: DSM 11551, Halogeometricum borinquense
Lane 13: DNA Molecular Weight Marker II (Roche 236250)
Lane 14: c(λ-Marker)= 125 ng
Lane 15: c(λ-Marker)= 250 ng
Lane 16: c(λ-Marker)= 500 ng
Conexibacter woesei (DSM 14684T) was taken from the German Collection of Microorganisms
and Cell Cultures (DSMZ). The genomic DNA was isolated using the Qiagen Genomic 500 DNA
Kit (Qiagen 10262). The genomic DNA was 10-250 kb in size as determined by Pulsed Field Gel
Electrophoresis (PFGE). The bulk of DNA had a size of 50-250 kb (see attached PFGE image).
The DNA concentration is 500 ng/µl as estimated from the gel. Spectrophotometric measurements
yielded a DNA concentration of 450 µg/ml; 300 µl of genomic DNA are shipped (150 µg).
Related Lesson 1
METADATA ROCKS
SIGS
• The Genomic Standards Consortium
• The GSC is an open-membership working
body which formed in September 2005.
• The goal of this international community is to
promote mechanisms that standardize the
description of genomes and the exchange and
integration of genomic data.
• See
http://gensc.org/gc_wiki/index.php/Main_Page
Related Lesson 2
Completeness Matters
Completeness
• Final quality of genome sequence influences what one can
do with the data
• Why completeness (closed, high quality) is important
– Gene presence/absence
– Gene order
– Genome rearrangements
– Identifying islands
• See “The Value of Complete Microbial Genome
Sequencing (You Get What You Pay For).” Fraser et al. J.
Bact. 2002.
Additional Lessons
• Computational methods need to be more
automated
• Need to limit analyses to subsets of all
available data
• Need for people to help interpret and study
data is increasing not decreasing
• Sequence is just the beginning
• Need to train more students