1.
Microbial Ecology
Indoor Microbial Ecology
(DNA Sequencing Focus)
Indoor Air 2011
Workshop on Microbiomes of the Built Environment
Jonathan A. Eisen, Ph.D.
University of California, Davis
DOE Joint Genome Institute
Twitter: @phylogenomics
2.
Outline
• Introduction
• Sequencing in microbial studies
• Sequencing technologies
• Current and future issues
6.
A Field Guide to Microbes
• What should be included
• Catalog of types of organism
• Functional diversity
• Biogeography (space and time)
• Niche information
• Means for identification
• “Natural” locations
• “Non-natural” (i.e., built) locations
7.
Microbial Ecology
• Much more than just a field guide
• Interactions of microbes with each other,
with macroorganisms, and with the
environment
• Mechanisms and rules of such
interactions
• Can be applied to any environment(s)
including built ones
8.
I: Sequencing and Microbes
• Sequencing is useful as a tool in studies
of microbial ecology for many reasons
• It is complementary to other means of
study
9.
Era I: rRNA Tree of Life
• Appearance of microbes not
informative (enough)
• rRNA Tree of Life identified two major
groups of organisms w/o nuclei
• rRNA powerful for many reasons,
though not perfect
[Figure: rRNA tree of life with three branches: Bacteria, Archaea, Eukaryotes]
Barton, Eisen et al. “Evolution”, CSHL Press. 2007.
Based on tree from Pace 1997 Science 276:734-740
11.
Great Plate Count Anomaly
Culturing Count and Microscope Count
12.
Great Plate Count Anomaly
Culturing Count <<<< Microscope Count
13.
Great Plate Count Anomaly
DNA
Culturing Count <<<< Microscope Count
14.
PCR & phylogenetic analysis of rRNA
DNA extraction → PCR (makes lots of copies of the rRNA genes in sample) → Sequence rRNA genes → Sequence alignment = Data matrix → Phylogenetic tree
Sequences:
rRNA1 5’...ACACACATAGGTGGAGCTAGCGATCGATCGA... 3’
rRNA2 5’...TACAGTATAGGTGGAGCTAGCGACGATCGA... 3’
rRNA3 5’...ACGGCAAAATAGGTGGATTCTAGCGATATAGA... 3’
rRNA4 5’...ACGGCCCGATAGGTGGATTCTAGCGCCATAGA... 3’
Sequence alignment = Data matrix:
rRNA1 A C A C A C
rRNA2 T A C A G T
rRNA3 C A C T G T
rRNA4 C A C A G T
E. coli A G A C A G
Humans T A T A G T
Yeast T A C A G T
[Figure: phylogenetic tree of rRNA1 to rRNA4 together with E. coli, Humans, and Yeast]
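The "sequence alignment = data matrix" step can be sketched in a few lines of Python. This is a minimal illustration, assuming the six-column matrix shown on the slide and using a plain count of mismatched columns (Hamming distance) as a stand-in for a real phylogenetic distance. The names `alignment`, `hamming`, and `closest` are illustrative and not from the talk.

```python
from itertools import combinations

# Aligned columns as they appear in the slide's data matrix.
alignment = {
    "rRNA1": "ACACAC",
    "rRNA2": "TACAGT",
    "rRNA3": "CACTGT",
    "rRNA4": "CACAGT",
    "E. coli": "AGACAG",
    "Humans": "TATAGT",
    "Yeast": "TACAGT",
}

def hamming(a, b):
    """Count alignment columns at which two sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b))

# Pairwise distance table: the usual input to a distance-based tree method.
distances = {
    (s, t): hamming(alignment[s], alignment[t])
    for s, t in combinations(alignment, 2)
}

def closest(name):
    """Return the other sequence with the fewest mismatches to `name`."""
    others = (n for n in alignment if n != name)
    return min(others, key=lambda n: hamming(alignment[name], alignment[n]))

for pair, d in sorted(distances.items(), key=lambda item: item[1]):
    print(pair, d)
```

A real analysis would use full-length alignments and a substitution model rather than six columns and raw mismatch counts; the point here is only that an alignment turns sequences into a matrix that can be compared column by column.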
15.
Era II: rRNA in environment
The Hidden Majority (Hugenholtz 2002)
Richness estimates (Bohannan and Hughes 2003)
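As one concrete example of a richness estimate, the sketch below computes the bias-corrected Chao1 estimator from per-phylotype clone counts. The slide does not say which estimator was shown, so the choice of Chao1 is an assumption, and the counts are made-up numbers for illustration only.

```python
def chao1(otu_counts):
    """Bias-corrected Chao1 richness estimate from per-taxon counts."""
    observed = [c for c in otu_counts if c > 0]
    s_obs = len(observed)                    # taxa actually seen
    f1 = sum(1 for c in observed if c == 1)  # singletons
    f2 = sum(1 for c in observed if c == 2)  # doubletons
    # Many singletons relative to doubletons implies many unseen taxa.
    return s_obs + f1 * (f1 - 1) / (2 * (f2 + 1))

# Hypothetical clone counts per rRNA phylotype (not data from the talk).
counts = [10, 4, 2, 2, 1, 1, 1]
print(chao1(counts))
```

The estimate exceeds the observed count whenever there are at least two singletons, which is the sense in which rRNA surveys point to a "hidden majority".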
16.
Era III: Genome Sequencing
Genomes Online
Fleischmann et al. 1995 Science 269:496-512
18.
Era IV: Genomes in Environment
shotgun sequence
Metagenomics
19.
Metagenomic Phylotyping
Sargasso Phylotypes
[Figure: bar chart of Weighted % of Clones (0 to 0.5) by Major Phylogenetic Group, one series per marker gene: EFG, EFTu, rRNA, RecA, RpoB, HSP70]
Groups: Alphaproteobacteria, Betaproteobacteria, Gammaproteobacteria, Epsilonproteobacteria, Deltaproteobacteria, Cyanobacteria, Firmicutes, Actinobacteria, Chlorobi, CFB, Chloroflexi, Spirochaetes, Fusobacteria, Deinococcus-Thermus, Euryarchaeota, Crenarchaeota
Venter et al., Science 304: 66. 2004
27.
What’s Coming?
• Sequencing
• Speed up; cost down
• Mini-sequencers with massive capacity
• Automation of sample processing
• Portable and remote systems
• Massive databases
• Computational changes
• Clusters vs. RAM
• Cloud computing
• GPU acceleration
28.
Beyond Sequencing
• Array methods should not be ignored
• Bad gene array
• Phylochips
• High throughput/low cost approaches to
characterizing other macromolecules
• Proteomics
• Metabolomics
• Transcriptomics
29.
Challenge 1: Data overload
• Major current issue is massive size of
sequence data sets
• Creates many new challenges not widely
anticipated
• Data transfer and storage
• RAM limits for some processes
• Databases overstretched
30.
Solutions?
• Throw away data (analogous to CERN)
• New algorithms to limit RAM needs
• Complete automation of algorithms
• Distributed data (e.g., Biotorrents)
• Emphasis on standards and metadata
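One simple way to limit RAM needs is to stream sequence records instead of loading a whole data set at once. The sketch below is a minimal illustration of that idea, assuming FASTA-formatted input; `read_fasta` and `summarize` are hypothetical helper names, and the two-read input is invented for the demo.

```python
import io

def read_fasta(handle):
    """Yield (header, sequence) pairs one record at a time."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)

def summarize(handle):
    """Return (number of reads, total bases) without storing the reads."""
    n_reads = total_bases = 0
    for _, seq in read_fasta(handle):
        n_reads += 1
        total_bases += len(seq)
    return n_reads, total_bases

# Tiny in-memory stand-in for a large sequence file.
demo = io.StringIO(">read1\nACGT\nAC\n>read2\nGGG\n")
print(summarize(demo))
```

Because only one record is held in memory at a time, the same code works unchanged on a file far larger than available RAM, at the cost of a single sequential pass.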
31.
Challenge 2: Short reads
• Some specific challenges come from
short reads
• Key step in analysis of mixed
communities is “binning”
• Binning methods perform poorly on
short reads
• nucleotide composition
• BLAST hits
• phylogenetic analysis
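Binning by nucleotide composition can be sketched as comparing k-mer frequency profiles. The example below is a minimal illustration, assuming tetranucleotides (k = 4) and Euclidean distance; both choices, the function names, and the test sequences are assumptions for the demo rather than the method used in the talk.

```python
from collections import Counter
from math import sqrt

def kmer_profile(seq, k=4):
    """Return relative frequencies of the k-mers in a sequence."""
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts.values())
    if total == 0:
        return {}  # sequence shorter than k: no composition signal at all
    return {kmer: n / total for kmer, n in counts.items()}

def profile_distance(p, q):
    """Euclidean distance between two k-mer frequency profiles."""
    keys = set(p) | set(q)
    return sqrt(sum((p.get(key, 0.0) - q.get(key, 0.0)) ** 2 for key in keys))

# Invented sequences: a longer contig and a short read.
contig = "ACGTACGTTTGACCAGTACGATCGATCGGGCTA" * 10
short_read = "ACGTACGTTTGA"

print(len(kmer_profile(contig)), "distinct 4-mers in the contig")
print(len(kmer_profile(short_read)), "distinct 4-mers in the short read")
print(profile_distance(kmer_profile(contig), kmer_profile(short_read)))
```

A read of length L contains only L - k + 1 k-mers, so a short read samples very few of the 256 possible tetranucleotides and its profile is noisy, which is one reason composition-based binning degrades on short reads.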
32.
Solutions
• Longer reads
• More full-length reference data
• Reference is annotated
• Reads are used to count
• New algorithms
• Phylogeny w/ short reads
• Cobinning/combining data
• New markers
• Better HMM searches
33.
Challenge 3: Real time
• New sequencing and array technologies
allow almost real time data collection
• Analysis generally not done in real time
• e.g., metagenome annotation can take
weeks to months
• e.g., phylogenetics bottleneck
• systems not set up for rapid, open sharing
of results
34.
Solutions?
• New automated high throughput
methods
• Must be updated continuously to deal with
new data types
• Need to be tested and verified
• Rapid sharing of results
• PLoS Currents
35.
Challenge 4: Reference Data
• Microbial diversity woefully
undersampled
• Greatly limits ability to
• Identify new organisms from DNA fragments
• Determine if organisms are out of “place” in
some way compared to natural diversity
• Perform reliable attribution/matching
• Understand EIDs (emerging infectious diseases)
• Know what is “normal”
36.
Solution?
• Systematic efforts to sample diversity
• Some decent efforts in this regard in
terms of diversity of known Category
ABC pathogens
• Much more needed
38.
Genomic Diversity of Isolates
Bacteria
Archaea
Eukaryotes
Figure from Barton, Eisen et al.
“Evolution”, CSHL Press.
Based on tree from Pace NR, 2003.
39.
Gene tree ≠ Genome tree
16S, WGT, 23S
Badger et al. 2005 Int J System Evol Microbiol 55: 1021-1026.
40.
Phylogenetic Diversity
• Phylogenetic diversity poorly sampled
• GEBA project at DOE-JGI correcting this
42.
Challenge 5: Knowledge
• Data collection is of course not enough
• Need to be able to turn the data into
knowledge
• This is difficult to automate
43.
Solutions
• More curators
• Populate databases with experimental
information, not more predictions
• Bioinformatics expansion
• Better linking with ecology, building
science, etc.
44.
Acknowledgements
• $$$
• Sloan Foundation
• DOE
• NSF
• GBMF
• DARPA
• People, places
• DOE JGI: Eddy Rubin, Phil Hugenholtz et al.
• UC Davis: Aaron Darling, Dongying Wu
• Other: Jessica Green, Katie Pollard, Martin
Wu, Tom Slezak, Jack Gilbert
Editor's Notes
Send it out for sequencing, do an alignment with your gene, and BLAST it (search for other organisms with a similar sequence).