Bayesian Taxonomic Assignment for the Next-Generation Metagenomics

Talk by Jonathan Eisen about Phylosift, metagenomics, and related topics; for DHS forensics annual meeting


Published in: Technology


Transcript

  • 1. “Bayesian Taxonomic Assignment for the Next-Generation Metagenomics” Jonathan A. Eisen August 7, 2013 DHS Meeting Wednesday, August 7, 13
  • 2. Shotgun Metagenomics
  • 3. Shotgun Metagenomics
  • 4. Shotgun Metagenomics DNA
  • 5. Shotgun Metagenomics DNA Sequence
  • 6. Shotgun Metagenomics DNA Sequence ?????
  • 7. Shotgun Metagenomics DNA Sequence Who is there? What are they doing?
  • 8. Shotgun Metagenomics DNA Sequence
  • 9. Shotgun Metagenomics • Which communities are most similar / different? • What accounts for the differences? • Natural vs. unnatural • Community level signatures (of events, stability, biogeography, etc)
  • 10. Our Approach - Phylogeny: Phylogeny of sequences can reveal details about history, taxonomy, function, and ecology
  • 11. rRNA Phylotyping. Workflow: DNA extraction -> PCR (makes lots of copies of the rRNA genes in sample) -> sequence rRNA genes -> sequence alignment = data matrix -> phylogenetic tree. Example sequences: rRNA1 5'...ACACACATAGGTGGAGCTAGCGATCGATCGA...3'; rRNA2 5'..TACAGTATAGGTGGAGCTAGCGACGATCGA...3'; rRNA3 5'...ACGGCAAAATAGGTGGATTCTAGCGATATAGA...3'; rRNA4 5'...ACGGCCCGATAGGTGGATTCTAGCGCCATAGA...3'. [Figure: alignment data matrix and phylogenetic tree placing rRNA1 to rRNA4 alongside E. coli, Humans, and Yeast.]
  • 12. Uses of Phylogeny in Metagenomics • Taxonomic assessment • Phylogenetic OTUs • Phylogenetic taxonomy assignment • Phylogenetic binning • Sample comparisons and hypothesis testing • Alpha diversity (i.e., PD) • Beta diversity • Trait evolution • Dispersal • Functional predictions • Rates of evolution • Convergence
  • 13. Venter et al., Science 304: 66. 2004 rRNA Phylotyping - Sargasso Metagenome
  • 14. Venter et al., Science 304: 66. 2004 RecA Phylotyping - Sargasso Metagenome
  • 15. rRNA Phylotyping - Sargasso Metagenome. [Bar chart: Sargasso Phylotypes. Y-axis: Weighted % of Clones (0 to 0.500). X-axis: Major Phylogenetic Group (Alphaproteobacteria, Betaproteobacteria, Gammaproteobacteria, Epsilonproteobacteria, Deltaproteobacteria, Cyanobacteria, Firmicutes, Actinobacteria, Chlorobi, CFB, Chloroflexi, Spirochaetes, Fusobacteria, Deinococcus-Thermus, Euryarchaeota, Crenarchaeota). Series: EFG, EFTu, HSP70, RecA, RpoB, rRNA.] Venter et al., Science 304: 66. 2004
  • 16. GOS 1 GOS 2 GOS 3 GOS 4 GOS 5 Phylogenetic ID of Novel Lineages Wu et al PLoS One 2011
  • 17. Wu et al. 2006 PLoS Biology 4: e188. Baumannia makes vitamins and cofactors Sulcia makes amino acids Phylogenetic Binning
  • 18. Phylogenetic Functional Prediction Venter et al., Science 304: 66. 2004
  • 19. Sequencing Revolution • More Samples • Deeper sequencing • The rare biosphere • Relative abundance estimates • More samples (with barcoding) • Times series • Spatially diverse sampling • Fine scale sampling
  • 20. http://phylosift.wordpress.com PhyloSift Supported by DHS Grant
  • 21. Acknowledgements. Jonathan Eisen; Erick Matsen (FHCRC); Todd Treangen (BNBI, NBACC); Holly Bik; Tiffanie Nelson; Mark Brown; Aaron Darling; Guillaume Jospin. Students and other staff: Eric Lowe, John Zhang, David Coil. Open source community: BLAST, LAST, HMMER, Infernal, pplacer, Krona, metAMOS, Bioperl, Bio::Phylo, JSON, etc. PhyloSift is open source software. Website: http://phylosift.wordpress.org Code: http://github.com/gjospin/phylosift Supported by DHS Grant
  • 22. Analysis & Summary PhyloSift
  • 23. Analysis & Summary •Metagenomic reads •Contigs •Genes PhyloSift
  • 24. Analysis & Summary Searching inputs against reference family DB PhyloSift
  • 25. Analysis & Summary Align to reference HMMs for each family PhyloSift
  • 26. Analysis & Summary Place reads into reference phylogeny using pplacer PhyloSift
  • 27. Analysis & Summary Summarize results & additional analyses PhyloSift
  • 28. Output 1: Taxonomy
  • 29. Taxonomic summary plots in Krona (Ondov et al 2011) Taxonomic Summaries (via Krona)
  • 30.
  • 31.
  • 32. Tree Reconciliation in PhyloSift
  • 33. Tree Reconciliation in PhyloSift Environmental Sequences Named Taxa
  • 34. Output 2: Phylogenetic Tree of Reads
  • 35. PhyloSift Tree Browsing Darling et al Submitted Placement tree from 2 week old infant gut data
  • 36. Output 3: Edge PCA. Edge PCA for exploratory data analysis (Matsen and Evans 2013). Given E edges and S samples: (1) for each edge, calculate the difference in placement mass on either side of the edge; (2) this results in an E x S matrix; (3) calculate the E x E covariance matrix; (4) calculate the eigenvectors and eigenvalues of the covariance matrix. Eigenvector: each value indicates how "important" an edge is in explaining differences among the S samples. Example calculating a matrix entry for an edge: with mass=5 on one side and mass=2 on the other, this edge gets 5-2=3.
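    The four steps on slide 36 can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the matrix values are made up (apart from the 5-2=3 entry taken from the slide), the function name edge_pca is hypothetical, and this is not the pplacer/guppy implementation used by PhyloSift.

```python
import numpy as np

# Hypothetical toy data: rows are edges (E = 4), columns are samples (S = 3).
# Each entry is the placement mass on one side of the edge minus the mass on
# the other side; entry [0, 0] uses the slide's example, 5 - 2 = 3.
edge_diffs = np.array([
    [3.0, 1.0, -2.0],
    [0.5, 0.5, 0.5],    # identical in every sample, so it explains nothing
    [-1.0, 2.0, 4.0],
    [0.0, 0.0, 0.0],    # no mass difference at all
])

def edge_pca(matrix):
    """Return eigenvalues and eigenvectors of the E x E edge covariance matrix,
    sorted from largest to smallest eigenvalue."""
    # Center each edge (row) across samples before computing covariance.
    centered = matrix - matrix.mean(axis=1, keepdims=True)
    n_samples = matrix.shape[1]
    covariance = centered @ centered.T / (n_samples - 1)   # E x E
    # eigh is appropriate because the covariance matrix is symmetric.
    eigenvalues, eigenvectors = np.linalg.eigh(covariance)
    order = np.argsort(eigenvalues)[::-1]
    return eigenvalues[order], eigenvectors[:, order]

eigenvalues, eigenvectors = edge_pca(edge_diffs)
# Column 0 of eigenvectors holds one loading per edge for the first component.
print(eigenvalues)
print(eigenvectors[:, 0])
```

    In this toy case the two edges that do not vary across samples get a loading of zero on the first component, which matches the slide's reading of eigenvector entries as edge "importance".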
  • 37. Edge PCA: Identify lineages that explain most variation among samples Matsen and Evans 2013, Darling et al Submitted. Edge PCA
  • 38. QIIME and Edge PCA on 110 fecal metagenomes from Yatsunenko et al 2012 Nature. Sequenced with 454, to about 150Mbp/metagenome Darling et al Submitted. Edge PCA vs. UNIFRAC PCA
  • 39. Output 4: Forensics
  • 40. Output 4: Forensics
  • 41. Analysis & Summary •Metagenomic reads •Contigs •Genes PhyloSift
  • 42. Analysis & Summary •Metagenomic reads •Contigs •Genes PhyloSift Challenge - Short Non Overlapping Reads
  • 43. Analysis & Summary Searching inputs against reference family DB PhyloSift
  • 44. Markers • PMPROK – Dongying Wu’s Bac/Arch markers • Eukaryotic Orthologs – Parfrey 2011 paper • 16S/18S rRNA • Mitochondria - protein-coding genes • Viral Markers – Markov clustering on genomes • Codon Subtrees – finer scale taxonomy • Extended Markers – plastids, gene families
  • 45. PMPROK Genes
  • 46. Analysis & Summary PhyloSift Challenges: •Limited ref. genomes •Limited markers, families Searching inputs against reference family DB
  • 47. Improving I: More Markers
      Phylogenetic group       Genome Number   Gene Number   Marker Candidates
      Archaea                  62              145415        106
      Actinobacteria           63              267783        136
      Alphaproteobacteria      94              347287        121
      Betaproteobacteria       56              266362        311
      Gammaproteobacteria      126             483632        118
      Deltaproteobacteria      25              102115        206
      Epsilonproteobacteria    18              33416         455
      Bacteriodes              25              71531         286
      Chlamydiae               13              13823         560
      Chloroflexi              10              33577         323
      Cyanobacteria            36              124080        590
      Firmicutes               106             312309        87
      Spirochaetes             18              38832         176
      Thermi                   5               14160         974
      Thermotogae              9               17037         684
      Wu et al. PLOS One 2013. In press.
  • 48. Improving II: More Families. [Workflow diagram, panels A to C: Representative Genomes -> Extract Protein Annotation -> All v. All BLAST -> Homology Clustering (MCL) -> SFams; Align & Build HMMs -> HMMs; New Genomes -> Extract Protein Annotation -> Screen for Homologs.] Figure 1, Sharpton et al. 2013
  • 49. Improving III: Filling in the Tree Figure from Barton, Eisen et al. “Evolution”, CSHL Press based on Baldauf et al Tree
  • 50. Genomic Encyclopedia of Bacteria & Archaea Wu et al. 2009 Nature 462, 1056-1060 Figure from Barton, Eisen et al. “Evolution”, CSHL Press based on Baldauf et al Tree
  • 51. Genomic Encyclopedia of Bacteria & Archaea Wu et al. 2009 Nature 462, 1056-1060 Figure from Barton, Eisen et al. “Evolution”, CSHL Press based on Baldauf et al Tree
  • 52. Family Diversity vs. PD Wu et al. 2009 Nature 462, 1056-1060
  • 53. The Dark Matter of Biology From Wu et al. 2009 Nature 462, 1056-1060
  • 54. GEBA Uncultured (Phil Hugenholtz). Number of SAGs from Candidate Phyla:
      Site                                      OD1   OP11   OP3   SAR406
      Site A: Hydrothermal vent                 4     1      -     -
      Site B: Gold Mine                         6     13     2     -
      Site C: Tropical gyres (Mesopelagic)      -     -      -     2
      Site D: Tropical gyres (Photic zone)      1     -      -     -
      Sample collections at 4 additional sites are underway.
  • 55-58. JGI Dark Matter Project: environmental samples (n=9) -> isolation of single cells (n=9,600) -> whole genome amplification (n=3,300) -> SSU rRNA gene based identification (n=2,000) -> genome sequencing, assembly and QC (n=201) -> draft genomes (n=201). Sample types: seawater, brackish/freshwater, hydrothermal, sediment, bioreactor. [Figure: phylogenetic tree of BACTERIA and ARCHAEA including candidate phyla such as WS3 (Latescibacteria), OP3 (Omnitrophica), NKB19 (Hydrogenedentes), EM19 (Calescamantes), CD12 (Aerophobetes), OP8 (Aminicenantes), OP9 (Atribacteria), WWE1 (Cloacamonetes), OP1 (Acetothermia), GN02 (Gracilibacteria), OD1 (Parcubacteria), OP11 (Microgenomates), DSEG (Aenigmarchaea), and pMC2A384 (Diapherotrites). Callout panels: archaeal toxins (Nanoarchaea); lytic murein transglycosylase; stringent response (Diapherotrites, Nanoarchaea); sigma factor (Diapherotrites, Nanoarchaea); oxidoreductase; HGT from Eukaryotes (Nanoarchaea); murein (peptidoglycan); archaeal type purine synthesis (Microgenomates); UGA recoded for Gly (Gracilibacteria).] Woyke et al. Nature 2013.
  • 59. A Genomic Encyclopedia of Microbes (GEM) Figure from Barton, Eisen et al. “Evolution”, CSHL Press based on Baldauf et al Tree
  • 60. A Genomic Encyclopedia of Microbes (GEM) Figure from Barton, Eisen et al. “Evolution”, CSHL Press based on Baldauf et al Tree
  • 61. Analysis Summary Align to reference HMMs for each family PhyloSift
  • 62. Analysis Summary Align to reference HMMs for each family PhyloSift Challenge: How to align?
  • 63. Zorro - Automated Masking. [Plot: Distance to True Tree (0.0 to 9.0) versus Sequence Length (200, 400, 800, 1600, 3200) for no masking, zorro, and gblocks.] Wu M, Chatterji S, Eisen JA (2012) Accounting For Alignment Uncertainty in Phylogenomics. PLoS ONE 7(1): e30288. doi:10.1371/journal.pone.0030288
  • 64. Analysis Summary Place reads into reference phylogeny using pplacer PhyloSift
  • 65. Analysis Summary Place reads into reference phylogeny using pplacer PhyloSift Challenges: •Trees from short reads •Probabilistic methods
  • 66. Improving IV: Better Reference Tree Lang et al. 2013
  • 67. Analysis Summary Summarize results additional analyses PhyloSift
  • 68. PhyloSift DB Update. [Workflow diagram; labels in the order extracted:] Amino Acid Tree. Run PhyloSift (search + align). Execute dbupdate mode. A taxa set is selected with a maxPD cutoff of 0.02 and a new tree is inferred. EBI Genomes. Infer Updated Tree. Add new sequences to marker packages. JGI Genomes. Private Genomes. NCBI Genomes. Nucleotide Tree. Prune Tree. Update reference sequences with new data. New sequences added at 0.25 PD for amino acid tree; higher PD threshold enables more aggressive searches of reference database, since LAST searching is faster with fewer sequences. Reconcile NCBI taxonomy IDs with phylogenetic topologies, for both amino acid tree and codon subtrees. Tree Reconciliation. Package Markers. Users' local marker databases are automatically scanned each time PhyloSift is run and any new updates are automatically downloaded if available. Automated Download to PhyloSift Users. Prune Tree. A taxa set is selected with a maxPD cutoff of 0.01 and a new tree is inferred.
  • 69. Improving VI: Other Methods • PhylOTU • Kembel all markers • Kembel copy # correction
  • 70. Kembel Correction Kembel SW, Wu M, Eisen JA, Green JL (2012) Incorporating 16S Gene Copy Number Information Improves Estimates of Microbial Diversity and Abundance. PLoS Comput Biol 8(10): e1002743. doi:10.1371/journal.pcbi.1002743
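    Slide 70 gives only the citation, so the following is a minimal sketch of the general idea behind copy-number correction, not the method from the paper: read counts are divided by each taxon's 16S copy number and then renormalized. The taxon names, counts, and copy numbers are hypothetical, and the copy numbers are assumed to be known in advance, whereas in practice they would have to be estimated.

```python
def correct_for_copy_number(read_counts, copy_numbers):
    """Convert 16S read counts into copy-number-corrected relative abundances.

    read_counts:  dict mapping taxon -> number of 16S reads observed
    copy_numbers: dict mapping taxon -> assumed 16S gene copies per genome
    """
    # Dividing by copy number approximates the number of genomes (cells).
    corrected = {taxon: read_counts[taxon] / copy_numbers[taxon]
                 for taxon in read_counts}
    total = sum(corrected.values())
    return {taxon: value / total for taxon, value in corrected.items()}

# Hypothetical example: taxon_A carries 7 copies of the 16S gene, taxon_B 1.
reads = {"taxon_A": 700, "taxon_B": 300}
copies = {"taxon_A": 7, "taxon_B": 1}

abundances = correct_for_copy_number(reads, copies)
print(abundances)
```

    Here taxon_A accounts for 70% of the reads but only a quarter of the corrected abundance, which is the kind of shift the slide's "copy # correction" refers to.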
  • 71. PhylOTU. [Clipped article text:] "... alignment used to build the profile, resulting in a multiple sequence alignment of full-length reference sequences and metagenomic reads. The final step of the alignment process is a ..." "... PD versus PID clustering, 2) to explore overlap betw... clusters and recognized taxonomic designations, and ... the accuracy of PhylOTU clusters from shotgun re..." Figure 1. PhylOTU Workflow. Computational processes are represented as squares and databases are represented as cylinders in ... workflow of PhylOTU. See Results section for details. doi:10.1371/journal.pcbi.1001061.g001 Sharpton TJ, Riesenfeld SJ, Kembel SW, Ladau J, O'Dwyer JP, Green JL, Eisen JA, Pollard KS. (2011) PhylOTU: A High-Throughput Procedure Quantifies Microbial Community Diversity and Resolves Novel Taxa from Metagenomic Data. PLoS Comput Biol 7(1): e1001061. doi:10.1371/journal.pcbi.1001061
  • 72. Kembel Combiner. [Clipped article text:] "... typically used as a qualitative measure because duplicate sequences are usually removed from the tree. However, the ... test may be used in a semiquantitative manner if all clones, even those with identical or near-identical sequences, are included in the tree (13). Here we describe a quantitative version of UniFrac that we call "weighted UniFrac." We show that weighted UniFrac behaves similarly to the FST test in situations where both a..." FIG. 1. Calculation of the unweighted and the weighted UniFrac measures. Squares and circles represent sequences from two different environments. (a) In unweighted UniFrac, the distance between the circle and square communities is calculated as the fraction of the branch length that has descendants from either the square or the circle environment (black) but not both (gray). (b) In weighted UniFrac, branch lengths are weighted by the relative abundance of sequences in the square and circle communities; square sequences are weighted twice as much as circle sequences because there are twice as many total circle sequences in the data set. The width of branches is proportional to the degree to which each branch is weighted in the calculations, and gray branches have no weight. Branches 1 and 2 have heavy weights since the descendants are biased toward the square and circles, respectively. Branch 3 contributes no value since it has an equal contribution from circle and square sequences after normalization. Kembel SW, Eisen JA, Pollard KS, Green JL (2011) The Phylogenetic Diversity of Metagenomes. PLoS ONE 6(8): e23214. doi:10.1371/journal.pone.0023214
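    The two measures described in the figure caption on slide 72 can be sketched on a toy tree. Everything here is an illustrative assumption: the tree is given as a hypothetical list of (branch length, descendant leaves) pairs, the community counts are invented, and the weighted value is the raw, unnormalized sum over branches rather than any particular tool's output.

```python
# Hypothetical tree: each branch is (length, set of leaf sequences below it).
branches = [
    (1.0, {"s1"}),
    (1.0, {"s2"}),
    (0.5, {"s1", "s2"}),
    (2.0, {"s3"}),
]

# Hypothetical sequence counts per community; community_b has twice the reads.
community_a = {"s1": 1, "s3": 1}
community_b = {"s2": 2, "s3": 2}

def unweighted_unifrac(branches, a, b):
    """Fraction of branch length leading to descendants from only one community."""
    unique = shared = 0.0
    for length, leaves in branches:
        in_a = any(a.get(leaf, 0) > 0 for leaf in leaves)
        in_b = any(b.get(leaf, 0) > 0 for leaf in leaves)
        if in_a and in_b:
            shared += length
        elif in_a or in_b:
            unique += length
    return unique / (unique + shared)

def weighted_unifrac(branches, a, b):
    """Sum of branch lengths weighted by the difference in relative abundance."""
    total_a = sum(a.values())
    total_b = sum(b.values())
    distance = 0.0
    for length, leaves in branches:
        frac_a = sum(a.get(leaf, 0) for leaf in leaves) / total_a
        frac_b = sum(b.get(leaf, 0) for leaf in leaves) / total_b
        distance += length * abs(frac_a - frac_b)
    return distance

unweighted = unweighted_unifrac(branches, community_a, community_b)
weighted = weighted_unifrac(branches, community_a, community_b)
print(unweighted, weighted)
```

    As in the caption, a branch whose descendants are equally represented in both communities after normalization (here the s3 branch and the internal branch above s1 and s2) contributes nothing to the weighted measure.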
  • 73. NMF in Metagenomes: Characterizing the niche-space distributions of components. [Figure: heatmaps with one row per GOS sampling site (for example North American East Coast, Eastern Tropical Pacific, Polynesia Archipelagos, Galapagos Islands, Indian Ocean, Sargasso Sea, and Caribbean Sea sites); columns for Component 1 to Component 5; environmental variables Salinity, Sample Depth, Chlorophyll, Temperature, Insolation, and Water Depth, binned as High / Medium / Low / NA, with water depth bins from 0-20m to 4000m.] Figure 3: a) Niche-space distributions for our five components (H^T); b) the site-similarity matrix (Ĥ^T Ĥ); c) environmental variables for the sites. The matrices are aligned so that the same row corresponds to the same site in each matrix. Sites are ordered by applying spectral reordering to the similarity matrix (see Materials and Methods). Functional biogeography of ocean microbes revealed through non-negative matrix factorization. Jiang et al. In press PLoS One. Comes out 9/18. w/ Weitz, Dushoff, Langille, Neches, Levin, etc
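    For readers unfamiliar with the factorization named on slide 73, here is a minimal NumPy sketch of non-negative matrix factorization using the standard multiplicative updates, followed by a site-similarity matrix of the form Ĥ^T Ĥ. The data matrix, the choice of k, and the assumption that Ĥ means H with each site's column scaled to unit length are all hypothetical; the slide does not say which algorithm or normalization the paper used.

```python
import numpy as np

def nmf(V, k, n_iter=500, seed=0):
    """Factor a non-negative matrix V (features x sites) as W @ H with W, H >= 0,
    using multiplicative updates that reduce the squared reconstruction error."""
    rng = np.random.default_rng(seed)
    W = rng.random((V.shape[0], k)) + 1e-3
    H = rng.random((k, V.shape[1])) + 1e-3
    eps = 1e-12  # guards against division by zero
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

# Hypothetical abundance matrix: 4 features (rows) measured at 3 sites (columns).
V = np.array([
    [1.0, 2.0, 0.0],
    [2.0, 4.0, 0.0],
    [0.0, 1.0, 3.0],
    [0.0, 3.0, 9.0],
])

W, H = nmf(V, k=2)

# Assumed normalization: scale each site's component profile to unit length,
# so that H_hat.T @ H_hat holds cosine similarities between sites.
H_hat = H / (np.linalg.norm(H, axis=0, keepdims=True) + 1e-12)
site_similarity = H_hat.T @ H_hat

print(np.round(H.T, 3))            # one row per site, one column per component
print(np.round(site_similarity, 3))
```

    The printed H.T corresponds to the layout of panel (a) in the caption (sites as rows, components as columns), and site_similarity to panel (b).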
  • 74. Other Uses of PhyloSift • Integration with other tools (e.g., QIIME) • LGT detection • Contamination screening • Synthetic Biology Orders
  • 75. w 68: PhyloSift DB Update workflow diagram, repeated verbatim from slide 68.
  • 76. Improving VII: More Samples
  • 77. The Built Environment
    ORIGINAL ARTICLE. Architectural design influences the diversity and structure of the built environment microbiome. Steven W Kembel1, Evan Jones1, Jeff Kline1,2, Dale Northcutt1,2, Jason Stenson1,2, Ann M Womack1, Brendan JM Bohannan1, G Z Brown1,2 and Jessica L Green1,3. 1 Biology and the Built Environment Center, Institute of Ecology and Evolution, Department of Biology, University of Oregon, Eugene, OR, USA; 2 Energy Studies in Buildings Laboratory, Department of Architecture, University of Oregon, Eugene, OR, USA and 3 Santa Fe Institute, Santa Fe, NM, USA.
    Buildings are complex ecosystems that house trillions of microorganisms interacting with each other, with humans and with their environment. Understanding the ecological and evolutionary processes that determine the diversity and composition of the built environment microbiome (the community of microorganisms that live indoors) is important for understanding the relationship between building design, biodiversity and human health. In this study, we used high-throughput sequencing of the bacterial 16S rRNA gene to quantify relationships between building attributes and airborne bacterial communities at a health-care facility. We quantified airborne bacterial community structure and environmental conditions in patient rooms exposed to mechanical or window ventilation and in outdoor air. The phylogenetic diversity of airborne bacterial communities was lower indoors than outdoors, and mechanically ventilated rooms contained less diverse microbial communities than did window-ventilated rooms. Bacterial communities in indoor environments contained many taxa that are absent or rare outdoors, including taxa closely related to potential human pathogens. Building attributes, specifically the source of ventilation air, airflow rates, relative humidity and temperature, were correlated with the diversity and composition of indoor bacterial communities. The relative abundance of bacteria closely related to human pathogens was higher indoors than outdoors, and higher in rooms with lower airflow rates and lower relative humidity. The observed relationship between building design and airborne bacterial diversity suggests that we can manage indoor environments, altering through building design and operation the community of microbial species that potentially colonize the human microbiome during our time indoors.
    The ISME Journal advance online publication, 26 January 2012; doi:10.1038/ismej.2011.211. Subject Category: microbial population and community ecology. Keywords: aeromicrobiology; bacteria; built environment microbiome; community ecology; dispersal; environmental filtering.
    Introduction [clipped in the slide image]: Humans spend up to 90% of their lives indoors (Klepeis et al., 2001). Consequently, the way we ... microbiome ... includes human pathogens and commensals interacting with each other and with their environment (Eames et al., 2009). There have been few attempts to comprehensively survey the built ... The ISME Journal (2012), 1-11. 2012 International Society for Microbial Ecology. All rights reserved 1751-7362/12 www.nature.com/ismej
    Microbial Biogeography of Public Restroom Surfaces. Gilberto E. Flores1, Scott T. Bates1, Dan Knights2, Christian L. Lauber1, Jesse Stombaugh3, Rob Knight3,4, Noah Fierer1,5*. 1 Cooperative Institute for Research in Environmental Science, University of Colorado, Boulder, Colorado, United States of America, 2 Department of Computer Science, University of Colorado, Boulder, Colorado, United States of America, 3 Department of Chemistry and Biochemistry, University of Colorado, Boulder, Colorado, United States of America, 4 Howard Hughes Medical Institute, University of Colorado, Boulder, Colorado, United States of America, 5 Department of Ecology and Evolutionary Biology, University of Colorado, Boulder, Colorado, United States of America.
    Abstract: We spend the majority of our lives indoors where we are constantly exposed to bacteria residing on surfaces. However, the diversity of these surface-associated communities is largely unknown. We explored the biogeographical patterns exhibited by bacteria across ten surfaces within each of twelve public restrooms. Using high-throughput barcoded pyrosequencing of the 16S rRNA gene, we identified 19 bacterial phyla across all surfaces. Most sequences belonged to four phyla: Actinobacteria, Bacteroidetes, Firmicutes and Proteobacteria. The communities clustered into three general categories: those found on surfaces associated with toilets, those on the restroom floor, and those found on surfaces routinely touched with hands. On toilet surfaces, gut-associated taxa were more prevalent, suggesting fecal contamination of these surfaces. Floor surfaces were the most diverse of all communities and contained several taxa commonly found in soils. Skin-associated bacteria, especially the Propionibacteriaceae, dominated surfaces routinely touched with our hands. Certain taxa were more common in female than in male restrooms as vagina-associated Lactobacillaceae were widely distributed in female restrooms, likely from urine contamination. Use of the SourceTracker algorithm confirmed many of our taxonomic observations as human skin was the primary source of bacteria on restroom surfaces. Overall, these results demonstrate that restroom surfaces host relatively diverse microbial communities dominated by human-associated bacteria with clear linkages between communities on or in different body sites and those communities found on restroom surfaces. More generally, this work is relevant to the public health field as we show that human-associated microbes are commonly found on restroom surfaces suggesting that bacterial pathogens could readily be transmitted between individuals by the touching of surfaces. Furthermore, we demonstrate that we can use high-throughput analyses of bacterial communities to determine sources of bacteria on indoor surfaces, an approach which could be used to track pathogen transmission and test the efficacy of hygiene practices.
    Citation: Flores GE, Bates ST, Knights D, Lauber CL, Stombaugh J, et al. (2011) Microbial Biogeography of Public Restroom Surfaces. PLoS ONE 6(11): e28132. doi:10.1371/journal.pone.0028132. Editor: Mark R. Liles, Auburn University, United States of America. Received September 12, 2011; Accepted November 1, 2011; Published November 23, 2011. Copyright: © 2011 Flores et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. Funding: This work was supported with funding from the Alfred P. Sloan Foundation and their Indoor Environment program, and in part by the National Institutes of Health and the Howard Hughes Medical Institute. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript. Competing Interests: The authors have declared that no competing interests exist.
* E-mail: noah.fierer@colorado.edu Introduction More than ever, individuals across the globe spend a large portion of their lives indoors, yet relatively little is known about the microbial diversity of indoor environments. Of the studies that have examined microorganisms associated with indoor environ- ments, most have relied upon cultivation-based techniques to detect organisms residing on a variety of household surfaces [1–5]. Not surprisingly, these studies have identified surfaces in kitchens and restrooms as being hot spots of bacterial contamination. Because several pathogenic bacteria are known to survive on surfaces for extended periods of time [6–8], these studies are of obvious importance in preventing the spread of human disease. However, it is now widely recognized that the majority of microorganisms cannot be readily cultivated [9] and thus, the overall diversity of microorganisms associated with indoor communities and revealed a greater diversity of bacteria on indoor surfaces than captured using cultivation-based techniques [10–13]. Most of the organisms identified in these studies are related to human commensals suggesting that the organisms are not actively growing on the surfaces but rather were deposited directly (i.e. touching) or indirectly (e.g. shedding of skin cells) by humans. Despite these efforts, we still have an incomplete understanding of bacterial communities associated with indoor environments because limitations of traditional 16 S rRNA gene cloning and sequencing techniques have made replicate sampling and in-depth characterizations of the communities prohibitive. With the advent of high-throughput sequencing techniques, we can now investigate indoor microbial communities at an unprecedented depth and begin to understand the relationship between humans, microbes and the built environment. In order to begin to comprehensively describe the microbial the stall in), they were likely dispersed manually after women used the toilet. 
Coupling these observations with those of the distribution of gut-associated bacteria indicate that routine use of toilets results in the dispersal of urine- and fecal-associated bacteria throughout the restroom. While these results are not unexpected, they do highlight the importance of hand-hygiene when using public restrooms since these surfaces could also be potential vehicles for the transmission of human pathogens. Unfortunately, previous studies have documented that college students (who are likely the most frequent users of the studied restrooms) are not always the most diligent of hand-washers [42,43]. Results of SourceTracker analysis support the taxonomic patterns highlighted above, indicating that human skin was the primary source of bacteria on all public restroom surfaces examined, while the human gut was an important source on or around the toilet, and urine was an important source in women’s restrooms (Figure 4, Table S4). Contrary to expectations (see above), soil was not identified by the SourceTracker algorithm as being a major source of bacteria on any of the surfaces, including floors (Figure 4). Although the floor samples contained family-level taxa that are common in soil, the SourceTracker algorithm probably underestimates the relative importance of sources, like Figure 3. Cartoon illustrations of the relative abundance of discriminating taxa on public restroom surfaces. Light blue indicates low abundance while dark blue indicates high abundance of taxa. (A) Although skin-associated taxa (Propionibacteriaceae, Corynebacteriaceae, Staphylococcaceae and Streptococcaceae) were abundant on all surfaces, they were relatively more abundant on surfaces routinely touched with hands. (B) Gut-associated taxa (Clostridiales, Clostridiales group XI, Ruminococcaceae, Lachnospiraceae, Prevotellaceae and Bacteroidaceae) were most abundant on toilet surfaces. 
(C) Although soil-associated taxa (Rhodobacteraceae, Rhizobiales, Microbacteriaceae and Nocardioidaceae) were in low abundance on all restroom surfaces, they were relatively more abundant on the floor of the restrooms we surveyed. Figure not drawn to scale. doi:10.1371/journal.pone.0028132.g003 [...] high diversity of floor communities is likely due to the frequency of contact with the bottom of shoes, which would track in a diversity of microorganisms from a variety of sources including soil, which is known to be a highly-diverse microbial habitat [27,39]. Indeed, bacteria commonly associated with soil (e.g. Rhodobacteraceae, Rhizobiales, Microbacteriaceae and Nocardioidaceae) were, on average, more abundant on floor surfaces (Figure 3C, Table S2). Interestingly, some of the toilet flush handles harbored bacterial [...] related differences in the relative abundances of [...] some surfaces (Figure 1B, Table S2). Most notably [...] were clearly more abundant on certain surfaces [...] restrooms than male restrooms (Figure 1B). Some [...] family are the most common, and often most [...] found in the vagina of healthy reproductive age [...] and are relatively less abundant in male urine [...] analysis of female urine samples collected as part [...] Figure 2. Relationship between bacterial communities associated with ten public restroom surfaces. Communities were [...] PCoA of the unweighted UniFrac distance matrix. Each point represents a single sample. Note that the floor (triangles) and toilet [...] form clusters distinct from surfaces touched with hands. doi:10.1371/journal.pone.0028132.g002
[...] they move around. But to quantify those contributions, Peccia’s team has had to develop new methods to collect airborne bacteria and extract their DNA, as the microbes are much less abundant in air than on surfaces. In one recent study, they used air filters to sample airborne particles and microbes in a classroom during 4 days during which students were present and 4 days during which the room was vacant. They measured [...] in indoor microbial ecology research, Peccia thinks that the field has yet to gel. And the Sloan Foundation’s Olsiewski shares some of his concern. “Everybody’s generating vast amounts of data,” she says, but looking across data sets can be difficult because groups choose different analytical tools. With Sloan support, though, a data archive and integrated analytical tools are in the works. To foster collaborations between microbiologists, architects, and building scientists, the foundation also sponsored a symposium on the microbiome of the built environment at the 2011 Indoor Air conference in Austin, [...] [Chart: average contribution (%) of six sources (soil, water, mouth, urine, gut, skin) to each surface: door in, door out, stall in, stall out, faucet handles, soap dispenser, toilet seat, toilet flush handle, toilet floor, sink floor.] Bathroom biogeography. By swabbing different surfaces in public restrooms, researchers determined that microbes vary in where they come from depending on the surface (chart).
  • 78. Citizen Science
  • 79. Crowdfunding/Crowdsourcing
  • 80. Acknowledgements: Jonathan Eisen, Erick Matsen (FHCRC), Todd Treangen (BNBI, NBACC), Holly Bik, Tiffanie Nelson, Mark Brown, Aaron Darling, Guillaume Jospin. Students and other staff: Eric Lowe, John Zhang, David Coil. Open source community: BLAST, LAST, HMMER, Infernal, pplacer, Krona, metAMOS, Bioperl, Bio::Phylo, JSON, etc. PhyloSift is open source software (website: http://phylosift.wordpress.org and code: http://github.com/gjospin/phylosift). Supported by DHS Grant.
