This document summarizes Jonathan Eisen's talk at Fresno State University about phylogeny-driven approaches to genomics and metagenomics. The talk focused on 3 main topics:
1) Eisen's early career work using rRNA phylogeny to study microbial diversity and evolution, including his work on halophiles and using rRNA to study microbes in the environment.
2) The revolution brought about by using high-throughput sequencing of rRNA genes (phylotyping) to deeply characterize microbial communities from various environments and uncover the rare biosphere.
3) How next-generation sequencing has further expanded the field by allowing more samples to be analyzed at a finer scale, revealing patterns of beta-diversity across spatial
25. PCR and phylogenetic analysis of rRNA genes
Workflow: DNA extraction -> PCR -> Sequence rRNA genes -> Sequence alignment = Data matrix -> Phylogenetic tree
PCR: Makes lots of copies of the rRNA genes in sample
rRNA1 5’...ACACACATAGGTGGAGCTAGCGATCGATCGA... 3’
rRNA2 5’..TACAGTATAGGTGGAGCTAGCGACGATCGA... 3’
rRNA3 5’...ACGGCAAAATAGGTGGATTCTAGCGATATAGA... 3’
rRNA4 5’...ACGGCCCGATAGGTGGATTCTAGCGCCATAGA... 3’
Data matrix:
E. coli A G A C A G
Humans T A T A G T
rRNA2 T A C A G T
rRNA1 A C A C A C
rRNA3 C A C T G T
rRNA4 C A C A G T
Yeast T A C A G T
Tree tips: E. coli, Humans, Yeast, rRNA1, rRNA2, rRNA3, rRNA4
26. Phylotyping
Repeats the slide 25 diagram (PCR and phylogenetic analysis of rRNA genes: DNA extraction, PCR, sequencing of rRNA genes, sequence alignment = data matrix, phylogenetic tree), now labeled "Phylotyping".
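The step from "sequence alignment = data matrix" to a tree can be sketched with a pairwise distance calculation. The rows below are the rRNA3, rRNA4, and Yeast rows shown on the slide. The choice of a simple p-distance (fraction of differing columns) is an assumption made for illustration; the talk does not say which distance or tree-building method was used.

```python
from itertools import combinations

# Rows copied from the slide's small data matrix (six aligned columns each).
alignment = {
    "rRNA3": "CACTGT",
    "rRNA4": "CACAGT",
    "Yeast": "TACAGT",
}

def p_distance(a: str, b: str) -> float:
    """Fraction of aligned columns at which two sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    mismatches = sum(x != y for x, y in zip(a, b))
    return mismatches / len(a)

def distance_matrix(rows):
    """Pairwise p-distances for every pair of rows, keyed by (name, name)."""
    return {
        (m, n): p_distance(rows[m], rows[n])
        for m, n in combinations(rows, 2)
    }

distances = distance_matrix(alignment)
for pair, d in distances.items():
    print(pair, round(d, 3))
```

A distance-based tree method (for example neighbor joining) would take a matrix like this as input; real analyses use full-length alignments and a substitution model rather than six columns.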
30. Approaching to NGS
1953: Discovery of DNA structure (Cold Spring Harb. Symp. Quant. Biol. 1953;18:123-31)
1977: Sanger sequencing method by F. Sanger (PNAS, 1977, 74: 560-564)
1983: PCR by K. Mullis (Cold Spring Harb Symp Quant Biol. 1986;51 Pt 1:263-73)
1993: Development of pyrosequencing (Anal. Biochem., 1993, 208: 171-175; Science, 1998, 281: 363-365)
1998: Single molecule emulsion PCR
Human Genome Project (Nature, 2001, 409: 860–92; Science, 2001, 291: 1304–1351)
2000: Founded 454 Life Science
2005: 454 GS20 sequencer (first NGS sequencer)
1998: Founded Solexa
2006: Solexa Genome Analyzer (first short-read NGS sequencer)
2008: GS FLX sequencer (NGS with 400-500 bp read length)
2010: Hi-Seq2000 (200Gbp per Flow Cell)
2006: Illumina acquires Solexa (Illumina enters the NGS business)
2007: ABI SOLiD (short-read sequencer based upon ligation)
2007: Roche acquires 454 Life Sciences (Roche enters the NGS business)
2008: NGS Human Genome sequencing (first Human Genome sequencing based upon NGS technology)
From Slideshare presentation of Cosentino Cristian
http://www.slideshare.net/cosentia/high-throughput-equencing
Miseq, Roche Jr, Ion Torrent, PacBio, Oxford
Sequencing Has Gone Crazy
31. Phylotyping Revolution
• More PCR products
• Deeper sequencing
• The rare biosphere
• Relative abundance estimates
• More samples (with barcoding)
• Time series
• Spatially diverse sampling
• Fine scale sampling
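Two of the bullets above, barcoding and relative abundance estimates, can be illustrated with a small sketch. Everything in it is hypothetical: the 4-base barcodes, the sample names, and the toy reads are invented for illustration, and real pipelines use longer, error-tolerant barcodes and dedicated tools.

```python
from collections import Counter, defaultdict

# Hypothetical 4-base sample barcodes (assumption, not from the talk).
BARCODES = {"ACGT": "sample_A", "TGCA": "sample_B"}

def demultiplex(reads, barcodes=BARCODES):
    """Assign each read to a sample by its leading barcode and strip the barcode.

    Reads whose prefix matches no barcode are dropped.
    """
    by_sample = defaultdict(list)
    for read in reads:
        for code, sample in barcodes.items():
            if read.startswith(code):
                by_sample[sample].append(read[len(code):])
                break
    return dict(by_sample)

def relative_abundance(sequences):
    """Fraction of a sample's reads represented by each distinct sequence."""
    counts = Counter(sequences)
    total = sum(counts.values())
    return {seq: n / total for seq, n in counts.items()}

# Toy pooled run: three reads from sample_A, one from sample_B, one unassigned.
reads = ["ACGTCACTGT", "ACGTCACTGT", "ACGTTACAGT", "TGCACACAGT", "GGGGCACAGT"]
samples = demultiplex(reads)
abundances = {name: relative_abundance(seqs) for name, seqs in samples.items()}
```

Pooling many barcoded samples in one run is what makes time series and fine-scale spatial sampling affordable; the per-sample fractions are only relative estimates, since PCR and sequencing can bias them.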
32. Beta-Diversity
[...] a broader range of Proteobacteria, but yielded similar results (Fig. S1 and Tables S2 and S3).
Across all samples, we identified 4,931 quality Nitrosomonadales sequences, which grouped into 176 OTUs (operational taxonomic units) using an arbitrary 99% sequence similarity cutoff. This cutoff retained a high amount of sequence diversity, but minimized the chance of including diversity because of sequencing or PCR errors. Most (95%) of the sequences appear closely related either to the marine Nitrosospira-like clade, known to be abundant in estuarine sediments (e.g., ref. 19) or to marine bacterium C-17, classified as Nitrosomonas (20) (Fig. S2).
Pairwise community similarity between the samples was calculated based on the presence or absence of each OTU using a rarefied Sørensen’s index (4). Community similarity using this incidence index was highly correlated with the abundance-based Sørensen index (Mantel test: ρ = 0.9239; P = 0.0001) (21).
A plot of community similarity versus geographic distance for [...]
[...]somonadales community similarity. Geographic distance contributed the largest partial regression coefficient (b = 0.40, P < 0.0001), with sediment moisture, nitrate concentration, plant cover, salinity, and air and water temperature contributing to smaller, but significant, partial regression coefficients (b = 0.09–0.17, P < 0.05) (Table 1). Because salt marsh bacteria may be dispersing through ocean currents, we also used a global ocean circulation model (23), as applied previously (24), to estimate relative dispersal times of hypothetical microbial cells between [...]
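The incidence-based Sørensen index mentioned in the excerpt can be sketched in a few lines. This is the plain index, 2 x shared / (size of A + size of B), computed on presence or absence of OTUs; the rarefaction step the paper applies is not reproduced here, and the OTU names are invented for illustration.

```python
def sorensen_similarity(otus_a, otus_b):
    """Incidence-based Sørensen similarity between two samples.

    Each argument is an iterable of OTU identifiers present in a sample.
    Returns 2 * |shared| / (|A| + |B|), a value between 0 and 1.
    """
    a, b = set(otus_a), set(otus_b)
    if not a and not b:
        raise ValueError("at least one sample must contain an OTU")
    return 2 * len(a & b) / (len(a) + len(b))

# Hypothetical samples: two OTUs shared, three and four OTUs present.
sample_1 = {"otu1", "otu2", "otu3"}
sample_2 = {"otu2", "otu3", "otu4", "otu5"}
similarity = sorensen_similarity(sample_1, sample_2)
```

Because the index depends on how many OTUs are detected, unequal sequencing depth can distort it, which is the usual motivation for rarefying samples to a common depth first.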
Fig. 1. The 13 marshes sampled (see Table S1 for details). Marshes compared with one another within regions are circled. (Inset) The arrangement of sampling points within marshes. Six points were sampled along a 100-m transect, and a seventh point was sampled ∼1 km away. Two marshes in the Northeast United States (outlined stars) were sampled more intensively, along four 100-m transects in a grid pattern.
Fig. 2. Distance-decay curves for the Nitrosomonadales communities. The dashed, blue line denotes the least-squares linear regression across all spatial scales. The solid lines denote separate regressions within each of the three spatial scales: within marshes, regional (across marshes within regions circled in Fig. 1), and continental (across regions). The slopes of all lines (except the solid light blue line) are significantly less than zero. The slopes of the solid red lines are significantly different from the slope of the all scale (blue dashed) line.
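A distance-decay curve of the kind described in the Fig. 2 caption is a least-squares line fitted to community similarity against geographic distance. The sketch below fits such a line to invented numbers; it does not reproduce the paper's data, any transformation the authors may have applied, or the significance tests on the slopes.

```python
def least_squares_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two paired observations")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical pairwise comparisons: distance between samples (km)
# and the community similarity of each pair.
distance_km = [0.0, 1.0, 2.0, 3.0]
similarity = [0.9, 0.7, 0.5, 0.3]
slope, intercept = least_squares_line(distance_km, similarity)
```

A negative slope means similarity falls with distance. Fitting the same line separately to within-marsh, regional, and continental pairs is how slopes at different spatial scales would be compared.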
ECOLOGY
Drivers of bacterial β-diversity depend on spatial scale
Jennifer B. H. Martiny (a,1), Jonathan A. Eisen (b), Kevin Penn (c), Steven D. Allison (a,d), and M. Claire Horner-Devine (e)
(a) Department of Ecology and Evolutionary Biology, and (d) Department of Earth System Science, University of California, Irvine, CA 92697; (b) Department of Evolution and Ecology, University of California Davis Genome Center, Davis, CA 95616; (c) Center for Marine Biotechnology and Biomedicine, The Scripps Institution of Oceanography, University of California at San Diego, La Jolla, CA 92093; and (e) School of Aquatic and Fishery Sciences, University of Washington, [...]
[...] community composition) yield insights into the maintenance of biodiversity. These studies are still relatively rare for microorganisms, however, and thus our understanding of the mechanisms underlying microbial diversity (most of the tree of life) remains limited.
β-Diversity, and therefore distance-decay patterns, could be driven solely by differences in environmental conditions across space, a hypothesis summed up by microbiologists as, “everything is everywhere, the environment selects” (10). Under this model, a distance-decay curve is observed because environmental variables tend to be spatially autocorrelated, and organisms with differing niche preferences are selected from the available pool of taxa as the environment changes with distance.
Dispersal limitation can also give rise to β-diversity, as it permits historical contingencies to influence present-day biogeographic patterns. For example, neutral niche models, in which an organism’s abundance is not influenced by its environmental preferences, predict a distance-decay curve (8, 11). On relatively short time scales, stochastic births and deaths contribute to a heterogeneous distribution of taxa (ecological drift). On longer time scales, stochastic genetic processes allow for taxon diversification across the landscape (evolutionary drift). If dispersal is limiting, then current environmental or biotic conditions will not fully explain the distance-decay curve, and thus geographic distance will be correlated with community similarity even after controlling for other factors (2).
For macroorganisms, the relative contribution of environmental factors or dispersal limitation to β-diversity depends on [...]
vary by spatial scale? Because most bac
and hardy, we predicted that dispers
primarily across continents, resulting
microbial “provinces” (15). At the sam
environmental factors would contrib
decay at all scales, resulting in the steep
scale as reported in plant and animal c
Results and Discussion
We characterized AOB community co
Sanger sequencing of 16S rRNA gene
primer sets. Here we focus on the resu
sequences from the order Nitrosomo
primers specific for AOB within the β-
The second primer set (18) generate
Author contributions: J.B.H.M. and M.C.H.-D. designe
M.C.H.-D. performed research; J.B.H.M., S.D.A., and M
and M.C.H.-D. wrote the paper.
The authors declare no conflict of interest.
This article is a PNAS Direct Submission.
Freely available online through the PNAS open acces
Data deposition: The sequences reported in this pap
Bank database (accession nos. HQ271472–HQ276885
1
To whom correspondence should be addressed. E-m
This article contains supporting information online at
1073/pnas.1016308108/-/DCSupplemental.
7850–7854 | PNAS | May 10, 2011 | vol. 108 | no. 19 www.pnas.org
34. The Built Environment
ORIGINAL ARTICLE
Architectural design influences the diversity and
structure of the built environment microbiome
Steven W Kembel1, Evan Jones1, Jeff Kline1,2, Dale Northcutt1,2, Jason Stenson1,2, Ann M Womack1, Brendan JM Bohannan1, G Z Brown1,2 and Jessica L Green1,3
1 Biology and the Built Environment Center, Institute of Ecology and Evolution, Department of Biology, University of Oregon, Eugene, OR, USA; 2 Energy Studies in Buildings Laboratory, Department of Architecture, University of Oregon, Eugene, OR, USA and 3 Santa Fe Institute, Santa Fe, NM, USA
Buildings are complex ecosystems that house trillions of microorganisms interacting with each
other, with humans and with their environment. Understanding the ecological and evolutionary
processes that determine the diversity and composition of the built environment microbiome—the
community of microorganisms that live indoors—is important for understanding the relationship
between building design, biodiversity and human health. In this study, we used high-throughput
sequencing of the bacterial 16S rRNA gene to quantify relationships between building attributes and
airborne bacterial communities at a health-care facility. We quantified airborne bacterial community
structure and environmental conditions in patient rooms exposed to mechanical or window
ventilation and in outdoor air. The phylogenetic diversity of airborne bacterial communities was
lower indoors than outdoors, and mechanically ventilated rooms contained less diverse microbial
communities than did window-ventilated rooms. Bacterial communities in indoor environments
contained many taxa that are absent or rare outdoors, including taxa closely related to potential
human pathogens. Building attributes, specifically the source of ventilation air, airflow rates, relative
humidity and temperature, were correlated with the diversity and composition of indoor bacterial
communities. The relative abundance of bacteria closely related to human pathogens was higher
indoors than outdoors, and higher in rooms with lower airflow rates and lower relative humidity.
The observed relationship between building design and airborne bacterial diversity suggests that
we can manage indoor environments, altering through building design and operation the community
of microbial species that potentially colonize the human microbiome during our time indoors.
The ISME Journal advance online publication, 26 January 2012; doi:10.1038/ismej.2011.211
Subject Category: microbial population and community ecology
Keywords: aeromicrobiology; bacteria; built environment microbiome; community ecology; dispersal;
environmental filtering
Introduction microbiome—includes human pathogens and com-
mensals interacting with each other and with their
The ISME Journal (2012), 1–11
& 2012 International Society for Microbial Ecology All rights reserved 1751-7362/12
www.nature.com/ismej
Microbial Biogeography of Public Restroom Surfaces
Gilberto E. Flores (1), Scott T. Bates (1), Dan Knights (2), Christian L. Lauber (1), Jesse Stombaugh (3), Rob Knight (3,4), Noah Fierer (1,5)*
1 Cooperative Institute for Research in Environmental Science, University of Colorado, Boulder, Colorado, United States of America, 2 Department of Computer Science,
University of Colorado, Boulder, Colorado, United States of America, 3 Department of Chemistry and Biochemistry, University of Colorado, Boulder, Colorado, United
States of America, 4 Howard Hughes Medical Institute, University of Colorado, Boulder, Colorado, United States of America, 5 Department of Ecology and Evolutionary
Biology, University of Colorado, Boulder, Colorado, United States of America
Abstract
We spend the majority of our lives indoors where we are constantly exposed to bacteria residing on surfaces. However, the
diversity of these surface-associated communities is largely unknown. We explored the biogeographical patterns exhibited
by bacteria across ten surfaces within each of twelve public restrooms. Using high-throughput barcoded pyrosequencing of
the 16S rRNA gene, we identified 19 bacterial phyla across all surfaces. Most sequences belonged to four phyla:
Actinobacteria, Bacteroidetes, Firmicutes and Proteobacteria. The communities clustered into three general categories: those
found on surfaces associated with toilets, those on the restroom floor, and those found on surfaces routinely touched with
hands. On toilet surfaces, gut-associated taxa were more prevalent, suggesting fecal contamination of these surfaces. Floor
surfaces were the most diverse of all communities and contained several taxa commonly found in soils. Skin-associated
bacteria, especially the Propionibacteriaceae, dominated surfaces routinely touched with our hands. Certain taxa were more
common in female than in male restrooms as vagina-associated Lactobacillaceae were widely distributed in female
restrooms, likely from urine contamination. Use of the SourceTracker algorithm confirmed many of our taxonomic
observations as human skin was the primary source of bacteria on restroom surfaces. Overall, these results demonstrate that
restroom surfaces host relatively diverse microbial communities dominated by human-associated bacteria with clear
linkages between communities on or in different body sites and those communities found on restroom surfaces. More
generally, this work is relevant to the public health field as we show that human-associated microbes are commonly found
on restroom surfaces suggesting that bacterial pathogens could readily be transmitted between individuals by the touching
of surfaces. Furthermore, we demonstrate that we can use high-throughput analyses of bacterial communities to determine
sources of bacteria on indoor surfaces, an approach which could be used to track pathogen transmission and test the
efficacy of hygiene practices.
Citation: Flores GE, Bates ST, Knights D, Lauber CL, Stombaugh J, et al. (2011) Microbial Biogeography of Public Restroom Surfaces. PLoS ONE 6(11): e28132.
doi:10.1371/journal.pone.0028132
Editor: Mark R. Liles, Auburn University, United States of America
Received September 12, 2011; Accepted November 1, 2011; Published November 23, 2011
Copyright: © 2011 Flores et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits
unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Funding: This work was supported with funding from the Alfred P. Sloan Foundation and their Indoor Environment program, and in part by the National
Institutes of Health and the Howard Hughes Medical Institute. The funders had no role in study design, data collection and analysis, decision to publish, or
preparation of the manuscript.
Competing Interests: The authors have declared that no competing interests exist.
* E-mail: noah.fierer@colorado.edu
Introduction
More than ever, individuals across the globe spend a large
portion of their lives indoors, yet relatively little is known about the
microbial diversity of indoor environments. Of the studies that
have examined microorganisms associated with indoor environ-
ments, most have relied upon cultivation-based techniques to
detect organisms residing on a variety of household surfaces [1–5].
Not surprisingly, these studies have identified surfaces in kitchens
and restrooms as being hot spots of bacterial contamination.
Because several pathogenic bacteria are known to survive on
surfaces for extended periods of time [6–8], these studies are of
obvious importance in preventing the spread of human disease.
However, it is now widely recognized that the majority of
communities and revealed a greater diversity of bacteria on
indoor surfaces than captured using cultivation-based techniques
[10–13]. Most of the organisms identified in these studies are
related to human commensals suggesting that the organisms are
not actively growing on the surfaces but rather were deposited
directly (i.e. touching) or indirectly (e.g. shedding of skin cells) by
humans. Despite these efforts, we still have an incomplete
understanding of bacterial communities associated with indoor
environments because limitations of traditional 16S rRNA gene
cloning and sequencing techniques have made replicate sampling
and in-depth characterizations of the communities prohibitive.
With the advent of high-throughput sequencing techniques, we
can now investigate indoor microbial communities at an
unprecedented depth and begin to understand the relationship
the stall in), they were likely dispersed manually after women used
the toilet. Coupling these observations with those of the
distribution of gut-associated bacteria indicate that routine use of
toilets results in the dispersal of urine- and fecal-associated bacteria
throughout the restroom. While these results are not unexpected,
they do highlight the importance of hand-hygiene when using
public restrooms since these surfaces could also be potential
vehicles for the transmission of human pathogens. Unfortunately,
previous studies have documented that college students (who are
likely the most frequent users of the studied restrooms) are not
always the most diligent of hand-washers [42,43].
Results of SourceTracker analysis support the taxonomic
patterns highlighted above, indicating that human skin was the
primary source of bacteria on all public restroom surfaces
examined, while the human gut was an important source on or
around the toilet, and urine was an important source in women’s
restrooms (Figure 4, Table S4). Contrary to expectations (see
above), soil was not identified by the SourceTracker algorithm as
being a major source of bacteria on any of the surfaces, including
floors (Figure 4). Although the floor samples contained family-level
taxa that are common in soil, the SourceTracker algorithm
probably underestimates the relative importance of sources, like
Figure 3. Cartoon illustrations of the relative abundance of discriminating taxa on public restroom surfaces. Light blue indicates low
abundance while dark blue indicates high abundance of taxa. (A) Although skin-associated taxa (Propionibacteriaceae, Corynebacteriaceae,
Staphylococcaceae and Streptococcaceae) were abundant on all surfaces, they were relatively more abundant on surfaces routinely touched with
hands. (B) Gut-associated taxa (Clostridiales, Clostridiales group XI, Ruminococcaceae, Lachnospiraceae, Prevotellaceae and Bacteroidaceae) were most
abundant on toilet surfaces. (C) Although soil-associated taxa (Rhodobacteraceae, Rhizobiales, Microbacteriaceae and Nocardioidaceae) were in low
abundance on all restroom surfaces, they were relatively more abundant on the floor of the restrooms we surveyed. Figure not drawn to scale.
doi:10.1371/journal.pone.0028132.g003
Bacteria of Public Restrooms
high diversity of floor communities is likely due to the frequency of
contact with the bottom of shoes, which would track in a diversity
of microorganisms from a variety of sources including soil, which is
known to be a highly-diverse microbial habitat [27,39]. Indeed,
bacteria commonly associated with soil (e.g. Rhodobacteraceae,
Rhizobiales, Microbacteriaceae and Nocardioidaceae) were, on average,
related differences in the relative abundances of s
some surfaces (Figure 1B, Table S2). Most notably
were clearly more abundant on certain surfaces
restrooms than male restrooms (Figure 1B). Some
family are the most common, and often most abun
found in the vagina of healthy reproductive age w
Figure 2. Relationship between bacterial communities associated with ten public restroom surfaces. Communities were
PCoA of the unweighted UniFrac distance matrix. Each point represents a single sample. Note that the floor (triangles) and toilet (as
form clusters distinct from surfaces touched with hands.
doi:10.1371/journal.pone.0028132.g002
time, the
un to take
of outside
om plants
ours after
ere shut
ortion of
e human
ck to pre-
which
26 Janu-
Journal,
hanically
had lower
y than ones with open win-
ility of fresh air translated
tions of microbes associ-
an body, and consequently,
pathogens. Although this
hat having natural airflow
Green says answering that
clinical data; she’s hoping
they move around. But to quantify those con-
tributions, Peccia’s team has had to develop
new methods to collect airborne bacteria and
extract their DNA, as the microbes are much
less abundant in air than on surfaces.
In one recent study, they used air filters
to sample airborne particles and microbes
in a classroom during 4 days during which
pant in indoor microbial
ecology research, Peccia
thinks that the field has
yet to gel. And the Sloan
Foundation’s Olsiewski
shares some of his con-
cern. “Everybody’s gen-
erating vast amounts of
data,” she says, but looking across data sets
can be difficult because groups choose dif-
ferent analytical tools. With Sloan support,
though, a data archive and integrated analyt-
ical tools are in the works.
To foster collaborations between micro-
biologists, architects, and building scientists,
the foundation also sponsored a symposium
[Figure: stacked bar chart. Y axis: average contribution (%), 0 to 100. X axis: restroom surfaces (door in, door out, stall in, stall out, faucet handles, soap dispenser, toilet seat, toilet flush handle, toilet floor, sink floor). Sources: soil, water, mouth, urine, gut, skin.]
Bathroom biogeography. By swabbing different surfaces in public restrooms, researchers determined that microbes vary in where they come from depending on the surface (chart).
February 9, 2012
43. Helicobacter pylori genome sequenced 1997
“The ability of H. pylori to perform mismatch
repair is suggested by the presence of methyl
transferases, mutS and uvrD. However,
orthologues of MutH and MutL were not
identified.”
45. Blast Search of H. pylori “MutS”
Score E
Sequences producing significant alignments: (bits) Value
sp|P73625|MUTS_SYNY3 DNA MISMATCH REPAIR PROTEIN 117 3e-25
sp|P74926|MUTS_THEMA DNA MISMATCH REPAIR PROTEIN 69 1e-10
sp|P44834|MUTS_HAEIN DNA MISMATCH REPAIR PROTEIN 64 3e-09
sp|P10339|MUTS_SALTY DNA MISMATCH REPAIR PROTEIN 62 2e-08
sp|O66652|MUTS_AQUAE DNA MISMATCH REPAIR PROTEIN 57 4e-07
sp|P23909|MUTS_ECOLI DNA MISMATCH REPAIR PROTEIN 57 4e-07
• Blast search pulls up Syn. sp MutS#2 with a much higher score (and
much lower E value) than other MutS homologs
• Based on this TIGR predicted this species had mismatch
repair
Based on Eisen et al. 1997 Nature Medicine 3: 1076-1078.
46. Tree of MutS Family
[Figure: phylogenetic tree of the MutS protein family. Taxon labels: Aquae, Trepa, Fly, Xenla, Rat, Mouse, Human, Yeast, Neucr, Arath, Borbu, Strpy, Bacsu, Synsp, Ecoli, Neigo, Thema, Theaq, Deira, Chltr, Spombe, Celeg, Metth, Helpy, mSaco. Several species appear on more than one branch.]
Based on Eisen, 1998 Nucl Acids Res 26: 4291-4300.
48. Overlaying Functions onto Tree
[Figure: the MutS family tree from slide 46 with subfamily names overlaid on the clades: MSH4, MSH5, MutS2, MutS1, MSH1, MSH3, MSH6, MSH2.]
Based on Eisen, 1998 Nucl Acids Res 26: 4291-4300.
49. Functional Prediction Using Tree
[Figure: the MutS family tree from slide 46 with known functions overlaid on each subfamily:]
• MSH1 - Mitochondrial Repair
• MSH2 - Eukaryotic Nuclear Mismatch and Loop Repair
• MSH3 - Nuclear Repair of Loops
• MSH4 - Meiotic Crossing Over
• MSH5 - Meiotic Crossing Over
• MSH6 - Nuclear Repair of Mismatches
• MutS1 - Bacterial Mismatch and Loop Repair
• MutS2 - Unknown Functions
Based on Eisen, 1998 Nucl Acids Res 26: 4291-4300.
50.
51. PHYLOGENETIC PREDICTION OF GENE FUNCTION
[Figure: flowchart of the method, illustrated with two examples (Example A: genes 1 to 6; Example B: genes 1A to 3A and 1B to 3B from Species 1, 2 and 3). The actual evolution is assumed to be unknown, possible duplication events are marked on the gene trees, and one prediction is labeled ambiguous.]
METHOD
• CHOOSE GENE(S) OF INTEREST
• IDENTIFY HOMOLOGS
• ALIGN SEQUENCES
• CALCULATE GENE TREE
• OVERLAY KNOWN FUNCTIONS ONTO TREE
• INFER LIKELY FUNCTION OF GENE(S) OF INTEREST
Based on Eisen, 1998 Genome Res 8: 163-167.
Phylogenomic Functional Prediction
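A minimal Python sketch of the last two steps of the method (overlay known functions onto the tree, then infer the likely function of the gene of interest). The nested-tuple tree encoding, the function labels, and the decision rule (use the smallest clade around the query that contains annotated genes, and report "ambiguous" if those genes disagree) are illustrative assumptions, not the procedure from Eisen 1998.

```python
def leaves(tree):
    """Return the leaf names of a tree given as nested tuples of strings."""
    if isinstance(tree, str):
        return [tree]
    names = []
    for child in tree:
        names.extend(leaves(child))
    return names


def predict(tree, query, known):
    """Infer a function for `query` from annotated genes in its nearest clade.

    `known` maps gene name -> function label (hypothetical labels here).
    Returns a label, "ambiguous" if the nearest annotated clade is mixed,
    or None if the query is absent or no annotated gene is found.
    """
    if isinstance(tree, str):
        return None
    for child in tree:
        if query in leaves(child):
            # Try the smaller clade first; fall back to this one if it
            # holds no annotated genes.
            result = predict(child, query, known)
            if result is not None:
                return result
            break
    else:
        return None  # query is not in this subtree
    functions = {known[g] for g in leaves(tree) if g != query and g in known}
    if not functions:
        return None
    return functions.pop() if len(functions) == 1 else "ambiguous"


# Gene names follow Example B on the slide; the annotations are made up.
gene_tree = ((("1A", "2A"), "3A"), (("1B", "2B"), "3B"))
known_functions = {"1A": "function A", "3A": "function A", "1B": "function B"}

print(predict(gene_tree, "2A", known_functions))
print(predict(gene_tree, "3B", known_functions))
print(predict((("1A", "1B"), "Q"), "Q", known_functions))
```

The point of the design is that the prediction follows the tree rather than the top similarity hit: a query gene takes its function from the subfamily it groups with, and a clade with mixed annotations is flagged instead of guessed.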
52. If you can’t beat them, use their data
Fleischmann et al.
1995
67. Genome Sequences Have
Revolutionized Microbiology
• Predictions of metabolic processes
• Better vaccine and drug design
• New insights into mechanisms of evolution
• Genomes serve as template for functional
studies
• New enzymes and materials for engineering
and synthetic biology
69. Phylogenetic Prediction of Function
• Many powerful and automated similarity based
methods for assigning genes to protein families
• COGs
• PFAM HMM searches
• Some limitations of similarity based methods can be
overcome by phylogenetic approaches
• Automated methods now available
• Sean Eddy
• Steven Brenner
• Kimmen Sjölander
• But …
70. Carboxydothermus hydrogenoformans
• Isolated from a Russian hot spring
• Thermophile (grows at 80°C)
• Anaerobic
• Grows very efficiently on CO (Carbon
Monoxide)
• Produces hydrogen gas
• Low GC Gram positive (Firmicute)
• Genome determined (Wu et al. 2005
PLoS Genetics 1: e65)
73. Non-Homology Predictions: Phylogenetic Profiling
• Step 1: Search all genes in organisms of interest against all other genomes
• Ask: Yes or No, is each gene found in each other species
• Cluster genes by distribution patterns (profiles)
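The three steps above can be sketched in a few lines of Python. The gene names, genome names, and presence/absence calls below are invented for illustration, and grouping genes with identical profiles is the simplest possible clustering; real analyses would start from search results and usually allow near-matching profiles.

```python
from collections import defaultdict

# Hypothetical search results: gene -> set of genomes in which it was found.
hits = {
    "geneA": {"sp1", "sp2", "sp4"},
    "geneB": {"sp1", "sp2", "sp4"},
    "geneC": {"sp3"},
    "geneD": {"sp1", "sp2", "sp3", "sp4"},
}
genomes = ["sp1", "sp2", "sp3", "sp4"]


def profile(gene_hits, genome_order):
    """Yes/no answer for each genome: is the gene found there?"""
    return tuple(g in gene_hits for g in genome_order)


def cluster_by_profile(hit_table, genome_order):
    """Group genes that share an identical presence/absence profile."""
    clusters = defaultdict(list)
    for gene in sorted(hit_table):
        clusters[profile(hit_table[gene], genome_order)].append(gene)
    return dict(clusters)


clusters = cluster_by_profile(hits, genomes)
for prof, genes in clusters.items():
    print("".join("1" if present else "0" for present in prof), genes)
```

Genes that land in the same cluster (here geneA and geneB) share a distribution pattern across genomes, which is the signal this non-homology approach uses to suggest a functional link.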
77. Ed Delong on SAR86
own transducer of light stimuli [for example,
Htr (22, 23)]. Although sequence analysis of
proteorhodopsin shows moderate statistical
support for a specific relationship with sen-
the kinetics of its photochemical reaction cy-
cle. The transport rhodopsins (bacteriorho-
dopsins and halorhodopsins) are character-
ized by cyclic photochemical reaction se-
From Beja et al. Science 289: 1902–1906. doi:
78. Proteorhodopsin
Fig. 1. (A) Phylogenetic tree of bacterial 16S rRNA gene sequences, including that encoded on the
130-kb bacterioplankton BAC clone (EBAC31A08) (16). (B) Phylogenetic analysis of proteorhodop-
sin with archaeal (BR, HR, and SR prefixes) and Neurospora crassa (NOP1 prefix) rhodopsins (16).
Nomenclature: Name_Species.abbreviation_Genbank.gi (HR, halorhodopsin; SR, sensory rhodopsin;
BR, bacteriorhodopsin). Halsod, Halorubrum sodomense; Halhal, Halobacterium salinarum (halo-
bium); Halval, Haloarcula vallismortis; Natpha, Natronomonas pharaonis; Halsp, Halobacterium sp;
Neucra, Neurospora crassa.
From Beja et al. Science 289: 1902–1906. doi:
100. Commonly Used Binning Methods
Did not Work Well
• Assembly
–Only Baumannia generated good contigs
• Depth of coverage
–Everything else 0-1X coverage
• Nucleotide composition
–No detectable peaks in any vector we looked at
109. Shotgun Metagenomics
Community structure and metabolism
through reconstruction of microbial
genomes from the environment
Gene W. Tyson (1), Jarrod Chapman (3,4), Philip Hugenholtz (1), Eric E. Allen (1), Rachna J. Ram (1), Paul M. Richardson (4), Victor V. Solovyev (4), Edward M. Rubin (4), Daniel S. Rokhsar (3,4) & Jillian F. Banfield (1,2)
(1) Department of Environmental Science, Policy and Management, (2) Department of Earth and Planetary Sciences, and (3) Department of Physics, University of California, Berkeley, California 94720, USA
(4) Joint Genome Institute, Walnut Creek, California 94598, USA
Microbial communities are vital in the functioning of all ecosystems; however, most microorganisms are uncultivated, and their
roles in natural systems are unclear. Here, using random shotgun sequencing of DNA from a natural acidophilic biofilm, we report
reconstruction of near-complete genomes of Leptospirillum group II and Ferroplasma type II, and partial recovery of three other
genomes. This was possible because the biofilm was dominated by a small number of species populations and the frequency of
genomic rearrangements and gene insertions or deletions was relatively low. Because each sequence read came from a different
individual, we could determine that single-nucleotide polymorphisms are the predominant form of heterogeneity at the strain level.
The Leptospirillum group II genome had remarkably few nucleotide polymorphisms, despite the existence of low-abundance
variants. The Ferroplasma type II genome seems to be a composite from three ancestral strains that have undergone homologous
recombination to form a large population of mosaic genomes. Analysis of the gene complement for each organism revealed the
pathways for carbon and nitrogen fixation and energy generation, and provided insights into survival strategies in an extreme
environment.
The study of microbial evolution and ecology has been revolutionized by DNA sequencing and analysis [1–3]. However, isolates have
been the main source of sequence data, and only a small fraction of
microorganisms have been cultivated [4–6]. Consequently, focus has
shifted towards the analysis of uncultivated microorganisms via
cloning of conserved genes [5] and genome fragments directly from
the environment [7–9].
fluorescence in situ hybridization (FISH) revealed that all biofilms
contained mixtures of bacteria (Leptospirillum, Sulfobacillus and, in
a few cases, Acidimicrobium) and archaea (Ferroplasma and other
members of the Thermoplasmatales). The genome of one of these
archaea, Ferroplasma acidarmanus fer1, isolated from the Richmond
mine, has been sequenced previously (http://www.jgi.doe.gov/JGI_
articles
Environmental Genome Shotgun
Sequencing of the Sargasso Sea
J. Craig Venter (1)*, Karin Remington (1), John F. Heidelberg (3), Aaron L. Halpern (2), Doug Rusch (2), Jonathan A. Eisen (3), Dongying Wu (3), Ian Paulsen (3), Karen E. Nelson (3), William Nelson (3), Derrick E. Fouts (3), Samuel Levy (2), Anthony H. Knap (6), Michael W. Lomas (6), Ken Nealson (5), Owen White (3), Jeremy Peterson (3), Jeff Hoffman (1), Rachel Parsons (6), Holly Baden-Tillson (1), Cynthia Pfannkoch (1), Yu-Hui Rogers (4), Hamilton O. Smith (1)
chlorococcus, tha
photosynthetic bio
Surface water
were collected ab
from three sites o
February 2003. A
lected aboard the S
station S” in May
are indicated on F
S1; sampling prot
one expedition to
was extracted from
genomic libraries w
2 to 6 kb were m
prepared plasmid
RESEARCH ARTICLE
110. Venter et al., Science 304: 66. 2004
rRNA Phylotyping in Sargasso
112. Sargasso Phylotypes
[Figure: bar chart. Y axis: weighted % of clones (0.000 to 0.500). X axis: major phylogenetic group (Alphaproteobacteria, Betaproteobacteria, Gammaproteobacteria, Epsilonproteobacteria, Deltaproteobacteria, Cyanobacteria, Firmicutes, Actinobacteria, Chlorobi, CFB, Chloroflexi, Spirochaetes, Fusobacteria, Deinococcus-Thermus, Euryarchaeota, Crenarchaeota). Series, one per marker gene: EFG, EFTu, HSP70, RecA, RpoB, rRNA.]
Phylotyping in Sargasso Data
Venter et al., Science 304: 66. 2004
123. GEBA Pilot Project Overview
• Identify major branches in rRNA tree for which
no genomes are available
• Identify those with a cultured representative in
DSMZ
• DSMZ grew > 200 of these and prepped DNA
• Sequence and finish 200+
• Annotate, analyze, release data
• Assess benefits of tree guided sequencing
• 1st paper Wu et al in Nature Dec 2009
124. GEBA Pilot Project: Components
• Project overview (Phil Hugenholtz, Nikos Kyrpides, Jonathan
Eisen, Eddy Rubin, Jim Bristow)
• Project management (David Bruce, Eileen Dalin, Lynne
Goodwin)
• Culture collection and DNA prep (DSMZ, Hans-Peter Klenk)
• Sequencing and closure (Eileen Dalin, Susan Lucas, Alla
Lapidus, Mat Nolan, Alex Copeland, Cliff Han, Feng Chen,
Jan-Fang Cheng)
• Annotation and data release (Nikos Kyrpides, Victor
Markowitz, et al)
• Analysis (Dongying Wu, Kostas Mavrommatis, Martin Wu,
Victor Kunin, Neil Rawlings, Ian Paulsen, Patrick Chain,
Patrik D’Haeseleer, Sean Hooper, Iain Anderson, Amrita Pati,
Natalia N. Ivanova, Athanasios Lykidis, Adam Zemla)
• Adopt a microbe education project (Cheryl Kerfeld)
• Outreach (David Gilbert)
• $$$ (DOE, Eddy Rubin, Jim Bristow)
126. Lesson 1: rRNA PD IDs novel lineages
From Wu et al. 2009 Nature 462, 1056-1060
127. Lesson 2: rRNA Tree is not perfect
Badger et al. 2005 Int J System Evol Microbiol 55: 1021-1026.
16s WGT, 23S
128. Lesson 3: Improves annotation
• Took 56 GEBA genomes and compared results vs. 56
randomly sampled new genomes
• Better definition of protein family sequence “patterns”
• Greatly improves “comparative” and “evolutionary”
based predictions
• Conversion of hypothetical into conserved hypotheticals
• Linking distantly related members of protein families
• Improved non-homology prediction
136. Lesson 5: Improves metagenomics
[Figure: Sargasso Phylotypes bar chart, the same chart as on slide 112: weighted % of clones per major phylogenetic group for EFG, EFTu, HSP70, RecA, RpoB and rRNA.]
Venter et al., Science 304: 66-74. 2004
GEBA Project improves metagenomic analysis
140. Phylotyping
[Figure: Sargasso Phylotypes bar chart, the same chart as on slide 112: weighted % of clones per major phylogenetic group for EFG, EFTu, HSP70, RecA, RpoB and rRNA.]
Venter et al., Science 304: 66-74. 2004
GEBA Project improves metagenomic analysis
141. Phylotyping
[Figure: Sargasso Phylotypes bar chart, the same chart as on slide 112: weighted % of clones per major phylogenetic group for EFG, EFTu, HSP70, RecA, RpoB and rRNA.]
But not a lot
Venter et al., Science 304: 66-74. 2004
142. Phylotyping
[Figure: Sargasso Phylotypes bar chart, the same chart as on slide 112: weighted % of clones per major phylogenetic group for EFG, EFTu, HSP70, RecA, RpoB and rRNA.]
Venter et al., Science 304: 66-74. 2004
GEBA Project improves phylogenomics analysis
143. Phylotyping
[Figure: Sargasso Phylotypes bar chart, the same chart as on slide 112: weighted % of clones per major phylogenetic group for EFG, EFTu, HSP70, RecA, RpoB and rRNA.]
But not a lot
Venter et al., Science 304: 66-74. 2004
144. Future Needs I:
• Need to adapt genomic and metagenomic
methods to make better use of data
145. Improving Metagenomic Analysis
• Methods
• More automation
• Better phylogenetic methods for short reads
and large data sets
• Improved tools for using distantly related
genomes in metagenomic analysis
• Data sets
• Rebuild protein family models
• New phylogenetic markers
• Need better reference phylogenies, including
HGT
• More simulations
146. WATERS
chimeric sequences generated during PCR, identifying
closely related sets of sequences (also known as opera-
tional taxonomic units or OTUs), removing redundant
sequences above a certain percent identity cutoff, assign-
ing putative taxonomic identifiers to each sequence or
representative of a group, inferring a phylogenetic tree of
Figure 1 Overview of WATERS. Schema of WATERS where white
boxes indicate "behind the scenes" analyses that are performed in WA-
TERS. Quality control files are generated for white boxes, but not oth-
erwise routinely analyzed. Black arrows indicate that metadata (e.g.,
sample type) has been overlaid on the data for downstream interpre-
tation. Colored boxes indicate different types of results files that are
generated for the user for further use and biological interpretation.
Colors indicate different types of WATERS actors from Fig. 2 which
were used: green, Diversity metrics, WriteGraphCoordinates, Diversity
graphs; blue, Taxonomy, BuildTree, Rename Trees, Save Trees; Create-
Unifrac; yellow, CreateOtuTable, CreateCytoscape, CreateOTUFile;
white, remaining unnamed actors.
[Figure 1 box labels: Align; Check chimeras; Cluster; Build Tree; Assign Taxonomy; Tree w/ Taxonomy; Diversity statistics & graphs; Unifrac files; Cytoscape network; OTU table]
Hartman et al 2010. W.A.T.E.R.S.: a Workflow for the Alignment, Taxonomy, and Ecology of Ribosomal Sequences. BMC Bioinformatics 2010, 11:317 doi:10.1186/1471-2105-11-317
all of these bioinformatics steps together in one package.
To this end, we have built an automated, user-friendly,
workflow-based system called WATERS: a Workflow for
the Alignment, Taxonomy, and Ecology of Ribosomal
Sequences (Fig. 1). In addition to being automated and
simple to use, because WATERS is executed in the Kepler
scientific workflow system (Fig. 2) it also has the advan-
tage that it keeps track of the data lineage and provenance
of data products [23,24].
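One of the steps such a workflow automates is grouping closely related sequences into OTUs at a percent identity cutoff. The sketch below shows that single step in Python under simplifying assumptions: the sequences are already aligned to equal length, clustering is greedy against the first member (seed) of each OTU, and the 97% cutoff and sequence names are illustrative. It is not how WATERS itself is implemented; WATERS wires existing tools together as Kepler actors.

```python
def percent_identity(a, b):
    """Percent of aligned columns at which two equal-length sequences match."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = matches = 0
    for x, y in zip(a, b):
        if x == "-" and y == "-":
            continue  # skip columns that are gaps in both sequences
        compared += 1
        matches += x == y
    return 100.0 * matches / compared if compared else 0.0


def greedy_otus(seqs, cutoff=97.0):
    """Put each sequence in the first OTU whose seed it matches at >= cutoff."""
    otus = []  # list of (seed_id, [member_ids]) in order of creation
    for seq_id, seq in seqs.items():
        for seed_id, members in otus:
            if percent_identity(seq, seqs[seed_id]) >= cutoff:
                members.append(seq_id)
                break
        else:
            otus.append((seq_id, [seq_id]))  # no match: start a new OTU
    return otus


# Toy aligned sequences (hypothetical): s2 is 98% identical to s1, s3 is 90%.
aligned = {
    "s1": "A" * 100,
    "s2": "A" * 98 + "CC",
    "s3": "A" * 90 + "C" * 10,
}
otus = greedy_otus(aligned)
for seed, members in otus:
    print(seed, members)
```

Because the result depends on input order and on the cutoff, a workflow that records which settings and inputs produced each OTU table makes runs easier to compare.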
Automation
The primary motivation in building WATERS was to
minimize the technical, bioinformatics challenges that
arise when performing DNA sequence clustering, phylo-
therefore, to invest a large amount of time and effort to
get to that list of microbes. But now that current efforts
are significantly more advanced and often require com-
parison of dozens of factors and variables with datasets of
thousands of sequences, it is not practically feasible to
process these large collections "by hand", and hugely inef-
ficient if instead automated methods can be successfully
employed.
Broadening the user base
A second motivation and perspective is that by minimizing
the technical difficulty of 16S rDNA analysis through
the use of WATERS, we aim to make the analysis of these
datasets more widely available and allow individuals with
Figure 2 Screenshot of WATERS in Kepler software. Key features: the library of actors un-collapsed and displayed on the left-hand side, the input
and output paths where the user declares the location of their input files and desired location for the results files. Each green box is an individual Kepler
actor that performs a single action on the data stream. The connectors (black arrows) direct and hook up the actors in a defined sequence. Double-
clicking on any actor or connector allows it to be manipulated and re-arranged.
147. Zorro - Automated Masking
[Figure: line chart. Y axis: distance to true tree (0.0 to 9.0). X axis: sequence length (200, 400, 800, 1600, 3200). Series: no masking, zorro, gblocks.]
Wu M, Chatterji S, Eisen JA (2012) Accounting For Alignment Uncertainty in Phylogenomics. PLoS ONE 7(1): e30288. doi:10.1371/journal.pone.0030288
148. Kembel Correction
Kembel SW, Wu M, Eisen JA, Green JL (2012) Incorporating 16S Gene Copy Number Information Improves Estimates
of Microbial Diversity and Abundance. PLoS Comput Biol 8(10): e1002743. doi:10.1371/journal.pcbi.1002743
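As a rough illustration of why 16S copy number matters for abundance estimates, the Python sketch below divides each taxon's read count by its 16S gene copy number and renormalizes. This is one simple reading of the idea in the paper's title, not the procedure from Kembel et al. 2012; the taxon names, read counts, and copy numbers are invented, and the copy numbers are assumed to be known in advance.

```python
def correct_for_copy_number(read_counts, copy_number):
    """Convert 16S read counts to relative abundances of organisms.

    Each taxon's reads are divided by its 16S copy number (assumed known
    here), then the results are rescaled to sum to 1.
    """
    corrected = {t: read_counts[t] / copy_number[t] for t in read_counts}
    total = sum(corrected.values())
    return {t: value / total for t, value in corrected.items()}


# Hypothetical community: equal read counts, unequal copy numbers.
reads = {"taxonA": 100, "taxonB": 100}
copies = {"taxonA": 1, "taxonB": 4}

abundance = correct_for_copy_number(reads, copies)
print(abundance)
```

In this toy case the two taxa look equally abundant from raw reads, but after correction taxonA accounts for most of the organisms because each of its cells contributes fewer 16S copies.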
149. alignment used to build the profile, resulting in a multiple
sequence alignment of full-length reference sequences and
PD versus PID clustering, 2) to explore overlap betw
clusters and recognized taxonomic designations, and
Figure 1. PhylOTU Workflow. Computational processes are represented as squares and databases are represented as cylinders in
workflow of PhylOTU. See Results section for details.
doi:10.1371/journal.pcbi.1001061.g001
Sharpton TJ, Riesenfeld SJ, Kembel SW, Ladau J, O'Dwyer JP, Green JL, Eisen JA, Pollard KS. (2011)
PhylOTU: A High-Throughput Procedure Quantifies Microbial Community Diversity and Resolves Novel
Taxa from Metagenomic Data. PLoS Comput Biol 7(1): e1001061. doi:10.1371/journal.pcbi.1001061
PhylOTU
151. Kembel Combiner
typically used as a qualitative measure because duplicate s
quences are usually removed from the tree. However, the
test may be used in a semiquantitative manner if all clone
even those with identical or near-identical sequences, are i
cluded in the tree (13).
Here we describe a quantitative version of UniFrac that w
call “weighted UniFrac.” We show that weighted UniFrac b
haves similarly to the FST test in situations where both a
FIG. 1. Calculation of the unweighted and the weighted UniFrac
measures. Squares and circles represent sequences from two different
environments. (a) In unweighted UniFrac, the distance between the
circle and square communities is calculated as the fraction of the
branch length that has descendants from either the square or the circle
environment (black) but not both (gray). (b) In weighted UniFrac,
branch lengths are weighted by the relative abundance of sequences in
the square and circle communities; square sequences are weighted
twice as much as circle sequences because there are twice as many total
circle sequences in the data set. The width of branches is proportional
to the degree to which each branch is weighted in the calculations, and
gray branches have no weight. Branches 1 and 2 have heavy weights
since the descendants are biased toward the square and circles, respectively.
Branch 3 contributes no value since it has an equal contribution
from circle and square sequences after normalization.
Kembel SW, Eisen JA, Pollard KS, Green JL (2011) The Phylogenetic Diversity of Metagenomes. PLoS
ONE 6(8): e23214. doi:10.1371/journal.pone.0023214
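The unweighted and weighted UniFrac calculations described in the figure caption above can be illustrated with a minimal Python sketch. It assumes a toy rooted tree stored as a child-to-parent dictionary, two non-empty communities (A and B, standing in for squares and circles), and the raw, non-normalized form of the weighted measure. The function names, the tree, and the counts are hypothetical and are not taken from the slides or from any UniFrac software.

```python
from collections import defaultdict


def branch_counts(parent, leaf_counts):
    """For every branch (named by its child node), sum the sequence counts
    of communities A and B over all leaves below that branch."""
    counts = defaultdict(lambda: [0, 0])
    for leaf, (a, b) in leaf_counts.items():
        node = leaf
        while node in parent:  # walk up the tree; the root has no parent entry
            counts[node][0] += a
            counts[node][1] += b
            node = parent[node][0]
    return counts


def unifrac(parent, leaf_counts):
    """Return (unweighted, weighted) UniFrac between communities A and B.

    parent maps node -> (parent node, branch length);
    leaf_counts maps leaf -> (count in A, count in B).
    Assumes both communities contain at least one sequence.
    """
    counts = branch_counts(parent, leaf_counts)
    total_a = sum(a for a, _ in leaf_counts.values())
    total_b = sum(b for _, b in leaf_counts.values())
    unique = observed = weighted = 0.0
    for node, (a, b) in counts.items():
        length = parent[node][1]
        if a or b:
            observed += length  # branch has descendants in A or B
        if (a > 0) != (b > 0):
            unique += length    # descendants in exactly one community
        # weight each branch by the difference in relative abundance
        weighted += length * abs(a / total_a - b / total_b)
    return unique / observed, weighted


# Hypothetical toy tree: root -> X (length 1) -> {s1, m}; root -> c1 (length 2)
toy_parent = {
    "X": ("root", 1.0),
    "s1": ("X", 1.0),
    "m": ("X", 1.0),
    "c1": ("root", 2.0),
}
# (count in A, count in B) for each leaf
toy_counts = {"s1": (2, 0), "m": (1, 1), "c1": (0, 1)}

unweighted, weighted = unifrac(toy_parent, toy_counts)
print(unweighted, weighted)
```

In this toy tree the branches to s1 and c1 lead to only one community, so they count toward the unweighted numerator, while every branch contributes to the weighted sum in proportion to how unevenly the two communities sit below it.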
152. NMF in Metagenomes
Characterizing the niche-space distributions of components
Sites
North American East Coast_GS005_Embayment
North American East Coast_GS002_Coastal
North American East Coast_GS003_Coastal
North American East Coast_GS007_Coastal
North American East Coast_GS004_Coastal
North American East Coast_GS013_Coastal
North American East Coast_GS008_Coastal
North American East Coast_GS011_Estuary
North American East Coast_GS009_Coastal
Eastern Tropical Pacific_GS021_Coastal
North American East Coast_GS006_Estuary
North American East Coast_GS014_Coastal
Polynesia Archipelagos_GS051_Coral Reef Atoll
Galapagos Islands_GS036_Coastal
Galapagos Islands_GS028_Coastal
Indian Ocean_GS117a_Coastal sample
Galapagos Islands_GS031_Coastal upwelling
Galapagos Islands_GS029_Coastal
Galapagos Islands_GS030_Warm Seep
Galapagos Islands_GS035_Coastal
Sargasso Sea_GS001c_Open Ocean
Eastern Tropical Pacific_GS022_Open Ocean
Galapagos Islands_GS027_Coastal
Indian Ocean_GS149_Harbor
Indian Ocean_GS123_Open Ocean
Caribbean Sea_GS016_Coastal Sea
Indian Ocean_GS148_Fringing Reef
Indian Ocean_GS113_Open Ocean
Indian Ocean_GS112a_Open Ocean
Caribbean Sea_GS017_Open Ocean
Indian Ocean_GS121_Open Ocean
Indian Ocean_GS122a_Open Ocean
Galapagos Islands_GS034_Coastal
Caribbean Sea_GS018_Open Ocean
Indian Ocean_GS108a_Lagoon Reef
Indian Ocean_GS110a_Open Ocean
Eastern Tropical Pacific_GS023_Open Ocean
Indian Ocean_GS114_Open Ocean
Caribbean Sea_GS019_Coastal
Caribbean Sea_GS015_Coastal
Indian Ocean_GS119_Open Ocean
Galapagos Islands_GS026_Open Ocean
Polynesia Archipelagos_GS049_Coastal
Indian Ocean_GS120_Open Ocean
Polynesia Archipelagos_GS048a_Coral Reef
Panels: (a) columns Component 1 to Component 5; (b) site-by-site matrix; (c) columns Salinity, SampleDepth, Chlorophyll, Temperature, Insolation, WaterDepth.
Scale ticks: 0.1 to 0.6 and 0.2 to 1.0.
Legend (General): High, Medium, Low, NA.
Legend (Water depth): >4000m, 2000-4000m, 900-2000m, 100-200m, 20-100m, 0-20m.
Figure 3: a) Niche-space distributions for our five components (H^T); b) the site-similarity matrix (Ĥ^T Ĥ); c) environmental variables for the sites. The matrices are aligned so that the same row corresponds to the same site in each matrix. Sites are ordered by applying spectral reordering to the similarity matrix (see Materials and Methods). Rows are aligned across the three matrices.
Functional biogeography of ocean microbes revealed through non-negative matrix factorization. Jiang et al. PLoS One.
w/ Weitz, Dushoff, Langille, Neches, Levin, etc
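As a rough illustration of the site-similarity matrix named in the figure caption above, the following Python sketch builds Ĥ^T Ĥ from a small component-by-site matrix H. It assumes that Ĥ is H with each site's column scaled to unit length, which makes each entry a cosine similarity between two sites. That normalization, the function name, and the example values are assumptions for illustration and are not taken from the slides or the paper.

```python
import math


def site_similarity(H):
    """Return the sites x sites matrix H_hat^T H_hat.

    H is a list of component rows, each holding one non-negative weight per
    site. H_hat is H with every site column scaled to unit Euclidean length
    (an assumption made for this sketch).
    """
    n_components = len(H)
    n_sites = len(H[0])
    # Euclidean norm of each site's column of component weights
    norms = [
        math.sqrt(sum(H[c][s] ** 2 for c in range(n_components)))
        for s in range(n_sites)
    ]
    # Scale each column; an all-zero column stays zero
    H_hat = [
        [H[c][s] / norms[s] if norms[s] else 0.0 for s in range(n_sites)]
        for c in range(n_components)
    ]
    # Entry (i, j) is the dot product of the columns for sites i and j
    return [
        [
            sum(H_hat[c][i] * H_hat[c][j] for c in range(n_components))
            for j in range(n_sites)
        ]
        for i in range(n_sites)
    ]


# Hypothetical weights: 2 components (rows) x 3 sites (columns)
example_H = [
    [1.0, 2.0, 0.0],
    [0.0, 0.0, 3.0],
]
for row in site_similarity(example_H):
    print(row)
```

Sites whose component weights point in the same direction score near 1 regardless of magnitude, and sites that share no component score 0, which is the kind of block structure a reordering of the similarity matrix would bring out.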
156. Future Needs II:
• We have still only scratched the surface of microbial diversity
157. rRNA Tree of Life
Figure from Barton, Eisen et al. “Evolution”, CSHL Press. 2007.
Based on tree from Pace 1997 Science 276:734-740
Archaea
Eukaryotes
Bacteria