Towards better tools for fungal environmental metagenomics
Jason Stajich
Plant Pathology & Microbiology
http://lab.stajich.org http://fungalgenomes.org http://fungidb.org
twitter: hyphaltip, stajichlab, fungalgenomes
Acknowledgements
• Peng Liu, Brad Cavinder, Sofia Robb, Jinfeng Chen, Anastasia Gioti, Steven Ahrendt, Divya Sain, Yizhou Wang, Yi Zhou, Raghu Ramamurthy, Edward Liaw, Greg Gu, Daniel Borcherding
• Sapphire Ear, Erum Khan, Lorena Rivera, Carlos Rojas, Megna Tiwari, Jessica De Anda, Annie Nguyen, Ramy Wissa
• IIGB Computational Core
• Univ of Colorado, Boulder: Rob Knight, Daniel McDonald, Noah Fierer, Scott Bates, Jon Leff
• Marine Biological Laboratory: Mitch Sogin, Sue Huse
• Argonne National Lab: Folker Meyer
• Henrik Nilsson, Keith Seifert
Molecular Ecology of microbes
• What microbes live where?
• Molecular techniques improve on culture-based methods by reducing the bias toward fast-growing or culturable organisms.
• Many efforts to examine bacterial and archaeal diversity with sequencing have developed important standards, e.g. the Human Microbiome Project.
• Efforts are under way to improve methods for studying fungi in the environment.
[Figure: dated phylogeny of the Fungi and related eukaryotes. Tips: Plantae, Amoebozoa, Choanozoa, Metazoa, Microsporidia, Rozella, Chytridiomycota, Blastocladiomycota, Mucoromycotina, Entomophthoromycotina, Zoopagomycotina, Kickxellomycotina, Glomeromycota, Basidiomycota (Pucciniomycotina, Ustilaginomycotina, Agaricomycotina), Ascomycota (Taphrinomycotina, Saccharomycotina, Pezizomycotina). Annotated transitions: multicellular with differentiated tissues; loss of flagellum; mitotic sporangia to mitotic conidia; regular septa; meiotic sporangia to external meiospores. Time axis: 1500 to 0 millions of years. Stajich et al. Current Biol 2009]
Fungi interact with many organisms
• Endophytes: Betsy Arnold, doi: 10.3389/fpls.2011.00100
• Mycorrhiza: F. Martin, doi: 10.1016/j.pbi.2009.05.007
How many species of Fungi are there?
• 1.5 million, based on a fungus to plant ratio of 6:1. D. L. Hawksworth, "The fungal dimension of biodiversity: magnitude, significance, and conservation" (Presidential address 1990). Mycol. Res. 95 (6): 641-655 (1991). About 69,000 species were known at the time.
• Don't forget the endophytes... and the soil...
• As many as 5.1 million, based on high-throughput sequencing methods. Meredith Blackwell, "The Fungi: 1, 2, 3 … 5.1 million species?" American Journal of Botany 98(3): 426–438. 2011. DOI: 10.3732/ajb.1000298
• Upwards of 6M species - Lee Taylor (pers. comm.)
• "Thus, the Fungi is likely equaled only by the Insecta with respect to eukaryote ..."
[Figure: title pages and abstracts of Hawksworth 1991 and Blackwell 2011]
Microbial Ecology is not just outside
• Most humans spend the majority of their lives indoors
• What are the organisms that live in the built environment?
• Are there beneficial organisms that influence the overall composition of communities?
• How does the composition change when environmental conditions change (moisture, temperature, food sources)?
Microbial Ecology in simple terms
• Collecting what's there (sampling and PCR amplifying) [LAB]
• Put labels on things by matching to knowns (BLAST or other approach to see what matches in a database) [COMPUTER]
• See what is different (compare communities) [COMPUTER]
http://xkcd.com/1133/
Sampling and amplifying
• Total DNA extracted from a sample - soil, plant tissue, swab
• PCR with primers designed to amplify a conserved locus
• Sequencing: Sanger sequencing -> Next Generation Sequencing
Metagenomics - Amplicon
• Amplify a targeted locus for sequencing.
• Works best if there are universal primers which can amplify from all the species of interest
• For Bacteria the most successful locus has been the Ribosomal Small Subunit gene (16S)
  • Primers that work well to amplify most groups of Bacteria and Archaea
• Other loci are useful markers, sometimes giving better species resolution (phylogenetics), or capturing community functional diversity by targeting a protein coding gene
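The idea of a "universal" primer pair can be checked in silico before going to the bench. The sketch below is only an illustration under stated assumptions: the function names are hypothetical, the primers and template are toy sequences (not real 16S or ITS primers), and matching is exact against the IUPAC degenerate codes with no mismatches allowed.

```python
import re

# IUPAC nucleotide codes mapped to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def primer_regex(primer):
    """Turn a degenerate primer into a regular expression."""
    return "".join(IUPAC[base] for base in primer.upper())


def reverse_complement(seq):
    """Reverse complement, keeping degenerate codes consistent."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def in_silico_amplicon(template, forward, reverse):
    """Return the region from the forward primer site through the
    reverse primer site (both included), or None if either is missing."""
    template = template.upper()
    fwd = re.search(primer_regex(forward), template)
    if fwd is None:
        return None
    # The reverse primer anneals to the opposite strand, so search the
    # top strand for its reverse complement, downstream of the forward site.
    downstream = template[fwd.end():]
    rev = re.search(primer_regex(reverse_complement(reverse)), downstream)
    if rev is None:
        return None
    return template[fwd.start():fwd.end() + rev.end()]


# Toy example (hypothetical primers and template).
print(in_silico_amplicon("TTGTGCAACGTACGTCCGAATT", "GTRCA", "TTYGG"))
# -> GTGCAACGTACGTCCGAA
```

Running every reference sequence of a group through such a check gives a rough estimate of how many taxa a primer pair would miss; real primer evaluation also has to tolerate mismatches.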
Universal primers
Development of one of the first primer sets and amplified regions of the rRNA small subunit gene: Woese, Pace
Barcoding for multiplexing samples
http://www.hmpdacc.org/doc/HMP_MDG_454_16S_Protocol.pdf
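Multiplexing works because each sample's amplicons carry a short, sample-specific barcode that can be read back after sequencing. A minimal sketch of that demultiplexing step follows, assuming the barcode sits at the very start of each read; the function name, barcodes, and sample names are hypothetical and are not taken from the HMP protocol.

```python
def demultiplex(reads, barcode_to_sample, max_mismatches=0):
    """Assign reads to samples by their leading barcode.

    Returns (bins, unassigned): bins maps sample name to a list of reads
    with the barcode trimmed off; unassigned holds reads that matched no
    barcode, or more than one, within the mismatch budget.
    """
    bins = {sample: [] for sample in barcode_to_sample.values()}
    unassigned = []
    for read in reads:
        hits = []
        for barcode, sample in barcode_to_sample.items():
            prefix = read[:len(barcode)]
            if len(prefix) < len(barcode):
                continue  # read is shorter than the barcode
            mismatches = sum(a != b for a, b in zip(prefix, barcode))
            if mismatches <= max_mismatches:
                hits.append((barcode, sample))
        if len(hits) == 1:
            barcode, sample = hits[0]
            bins[sample].append(read[len(barcode):])
        else:
            unassigned.append(read)
    return bins, unassigned


# Toy example with hypothetical 4-base barcodes.
barcodes = {"ACGT": "soil", "TGCA": "dust"}
reads = ["ACGTGGGG", "TGCACCCC", "ACGAGGTT", "NNNNAAAA"]
bins, unassigned = demultiplex(reads, barcodes)
print(bins)        # {'soil': ['GGGG'], 'dust': ['CCCC']}
print(unassigned)  # ['ACGAGGTT', 'NNNNAAAA']
```

Allowing `max_mismatches=1` rescues the read with a single barcode error, at the cost of needing barcodes that differ from each other at several positions.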
Fungal Markers for molecular ecology
• Needs to be universally amplifying across all groups
• Ribosomal RNA (rRNA)
  • Small Subunit and Large Subunit genes
  • Internal Transcribed Spacer 1 and 2
• Protein coding genes
  • EF1alpha, RPB1, RPB2 (Fungal Tree of Life project)
There's a data storm coming
• 320k curated sequences
• Roche-454: 1M sequences per run
• Illumina HiSeq: 2-3 billion sequences per run (10-14 days)
• Illumina MiSeq: 3-5M reads (1 day)
• IonTorrent: 4-8M reads (2 hrs)
Fungal-specific Challenges
• Alignment of ITS
• Establishment of a reference tree
  • Unalignable sequence into tree with LSU
• Naming and Curation of datasets
ITS is most useful as a barcode sequence

Nuclear ribosomal internal transcribed spacer (ITS) region as a universal DNA barcode marker for Fungi
Conrad L. Schoch, Keith A. Seifert, Sabine Huhndorf, Vincent Robert, John L. Spouge, C. André Levesque, Wen Chen, and Fungal Barcoding Consortium
National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20892; Biodiversity (Mycology and Microbiology), Agriculture and Agri-Food Canada, Ottawa, ON, Canada K1A 0C6; Department of Botany, The Field Museum, Chicago, IL 60605; and Centraalbureau voor Schimmelcultures Fungal Biodiversity Centre (CBS-KNAW), 3508 AD, Utrecht, The Netherlands
Edited by Daniel H. Janzen, University of Pennsylvania, Philadelphia, PA, and approved February 24, 2012 (received for review October 18, 2011)

Abstract: Six DNA regions were evaluated as potential DNA barcodes for Fungi, the second largest kingdom of eukaryotic life, by a multinational, multilaboratory consortium. The region of the mitochondrial cytochrome c oxidase subunit 1 used as the animal barcode was excluded as a potential marker, because it is difficult to amplify in fungi, often includes large introns, and can be insufficiently variable. Three subunits from the nuclear ribosomal RNA cistron were compared together with regions of three representative protein-coding genes (largest subunit of RNA polymerase II, second largest subunit of RNA polymerase II, and minichromosome maintenance protein). Although the protein-coding gene regions often had a higher percent of correct identification compared with ribosomal markers, low PCR amplification and sequencing success eliminated them as candidates for a universal fungal barcode. Among the regions of the ribosomal cistron, the internal transcribed spacer (ITS) region has the highest probability of successful identification for the broadest range of fungi, with the most clearly defined barcode gap between inter- and intraspecific variation. The nuclear ribosomal large subunit, a popular phylogenetic marker in certain groups, had superior species resolution in some taxonomic groups, such as the early diverging lineages and the ascomycete yeasts, but was otherwise slightly inferior to the ITS. The nuclear ribosomal small subunit has poor species-level resolution in fungi. ITS will be formally proposed for adoption as the primary fungal barcode marker to the Consortium for the Barcode of Life, with the possibility that supplementary barcodes may be developed for particular narrowly circumscribed taxonomic groups.

Introduction (excerpt visible on the slide): ... intron of the trnK gene. This system sets a precedent for reconsidering CO1 as the default fungal barcode. CO1 functions reasonably well as a barcode in some fungal genera, such as Penicillium, with reliable primers and adequate species resolution (67% in this young lineage) (9); however, results in the few other groups examined experimentally are inconsistent, and cloning is often required (10). The degenerate primers applicable to many Ascomycota (11) are difficult to assess, because amplification failures may not reflect priming mismatches. Extreme length variation occurs because of multiple introns (9, 12–14), which are not consistently present in a species. Multiple copies of different lengths and variable sequences occur, with identical sequences sometimes shared by several species (11). Some fungal clades, such as Neocallimastigomycota (an early diverging lineage of obligately anaerobic, zoosporic gut fungi), lack mitochondria (15). Finally, because most fungi are microscopic and inconspicuous and many are unculturable, robust, universal primers must be available to detect a truly representative profile. This availability seems impossible with CO1. The nuclear rRNA cistron has been used for fungal diagnostics and phylogenetics for more than 20 y (16), and its components are most frequently discussed as alternatives to CO1 (13, 17). The eukaryotic rRNA cistron consists of the 18S, 5.8S, and 28S rRNA genes transcribed as a unit by RNA polymerase I. Posttranscriptional processes split the cistron, removing two internal transcribed spacers. These two spacers, including the 5.8S ...
Solutions
• ITS is hard to align across diverse taxa, but LSU is not.
• A marker containing both sequences would be useful for both phylogenetic placement and barcoding.
• ITS + LSU amplicon proposed: primer testing with Illumina is under way. The amplicon is a bit too large for current chemistry but could work in the near future.
[Figure labels: 5.8S, LSU]
Putting a name on it
• Most sequences will not have identified names
• Grouping all observed sequences together to define OTU clusters even if no name can be assigned
• Curated ITS databases - UNITE project
  • ~300,000 sequences in UNITE, ~200,000 of which are full length (SSU + ITS + LSU)
  • 50% are identified to a species level (18,000 distinct Latin binomials)
UNITE project (H. Nilsson): http://unite.ut.ee
Soil Clone Group 1 - highly abundant, uncultured organism
Porter et al. 2008
What's in a name? Would a mold by any other name smell as sweet?
• "One fungus, one name" is eliminating dual nomenclature (naming of sexual and asexual forms separately)
• How to name species from molecular data alone?
• Name by close relatives on the tree?
• Use marker loci that contain both ITS and LSU to better place the sequence in a tree.
• Proposal to name species in the Botanical code directly from sequence
• Good old fashioned microbiology
[Figure: phylogenetic tree whose tips are almost all GenBank records labeled "Uncultured fungus clone ..." (18S rRNA gene, ITS1, 5.8S, ITS2), with only a few named relatives such as Fibulobasidium murrhardtense, Trichosporonales sp., and uncultured Tremellales, Rhodotorula, and Agaricomycotina clones. Hibbett and Taylor 2013]
From barcodes to organisms - low throughput but effective
• Dilution to Extinction (d2e): 'High throughput' isolation from global dust samples
• Sarea resinae, Cryptocoryneum rilstonei
Keith Seifert
Community comparisons
• Pie charts show how taxonomic composition differs across treatments
• 16S community composition varies with smoking and COPD status. Erb-Downward et al. 2011.
Comparing communities
• Taxonomic diversity varies across ant worker type and time of year
Tools - QIIME: Quantitative Insights Into Microbial Ecology
• For amplicon based datasets (16S, 18S, ITS)
  • Alpha diversity - phylogenetic diversity, Chao, number of observed species
  • Generate species diversity plots to assess community diversity
  • Beta diversity - UniFrac distance, Bray-Curtis, Jaccard
  • The phylogenetic metrics (phylogenetic diversity, UniFrac) need a reference phylogenetic tree, which is unavailable for fungal ITS
• Support for shotgun metagenomics
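Several of the diversity metrics named on this slide need no tree and are simple functions of an OTU count table. The sketch below is a plain-Python illustration, not QIIME's implementation; the function names and the toy counts are assumptions. It uses the bias-corrected form of the Chao1 estimator, S_obs + F1(F1 - 1) / (2(F2 + 1)), where F1 and F2 are the numbers of singleton and doubleton OTUs.

```python
def observed_otus(counts):
    """Alpha diversity: number of OTUs seen at least once in a sample."""
    return sum(1 for c in counts if c > 0)


def chao1(counts):
    """Alpha diversity: bias-corrected Chao1 richness estimate."""
    singletons = sum(1 for c in counts if c == 1)
    doubletons = sum(1 for c in counts if c == 2)
    return observed_otus(counts) + singletons * (singletons - 1) / (2 * (doubletons + 1))


def bray_curtis(a, b):
    """Beta diversity: abundance-based dissimilarity between two samples."""
    total = sum(a) + sum(b)
    if total == 0:
        return 0.0
    return sum(abs(x - y) for x, y in zip(a, b)) / total


def jaccard_distance(a, b):
    """Beta diversity: presence/absence dissimilarity between two samples."""
    present_a = {i for i, c in enumerate(a) if c > 0}
    present_b = {i for i, c in enumerate(b) if c > 0}
    union = present_a | present_b
    if not union:
        return 0.0
    return 1 - len(present_a & present_b) / len(union)


# Toy OTU counts (hypothetical), same OTU order in both samples.
dust = [10, 1, 1, 2, 0]
soil = [5, 0, 1, 0, 4]
print(observed_otus(dust))      # 4
print(chao1(dust))              # 4.5
print(bray_curtis(dust, soil))  # 0.5
print(jaccard_distance(dust, soil))
```

In practice the counts are first rarefied to a common sequencing depth so that samples with more reads do not look more diverse for that reason alone.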
Approaches to clustering sequences
• De novo clustering
  • Requires all-vs-all searches, very expensive
• Known Knowns - "Closed reference"
  • Match sequences to a database of representative known sequences
  • Fast, but throws out unknowns
• Known Knowns and Known Unknowns - "Open reference"
  • Match to known set and de novo cluster the remainder
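The difference between the closed-reference and open-reference strategies can be sketched in a few lines. This is a toy illustration under stated assumptions, not how QIIME does it: `identity` is a hypothetical position-by-position comparison standing in for a real alignment-based search, the de novo step is a simple greedy centroid scheme, and the sequences and threshold are invented.

```python
def identity(a, b):
    """Fraction of matching positions; a toy stand-in for alignment identity."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 0.0
    return sum(x == y for x, y in zip(a, b)) / longest


def open_reference_otus(reads, reference, threshold=0.97):
    """Assign reads to reference OTUs, then cluster the leftovers de novo."""
    otus = {name: [] for name in reference}
    centroids = {}  # de novo OTU name -> centroid sequence
    for read in reads:
        # Step 1 (closed reference): best hit among the known sequences.
        best_name, best_score = None, 0.0
        for name, seq in reference.items():
            score = identity(read, seq)
            if score > best_score:
                best_name, best_score = name, score
        if best_name is not None and best_score >= threshold:
            otus[best_name].append(read)
            continue
        # Step 2 (de novo): join an existing new cluster or start one.
        for name, seq in centroids.items():
            if identity(read, seq) >= threshold:
                otus[name].append(read)
                break
        else:
            name = f"denovo{len(centroids)}"
            centroids[name] = read
            otus[name] = [read]
    return otus


def closed_reference_otus(reads, reference, threshold=0.97):
    """Closed reference is the same search with the unknowns thrown away."""
    otus = open_reference_otus(reads, reference, threshold)
    return {name: members for name, members in otus.items() if name in reference}


# Toy example (hypothetical 10-base sequences, 90% threshold).
reference = {"refA": "AAAAAAAAAA", "refB": "CCCCCCCCCC"}
reads = ["AAAAAAAAAT", "GGGGGGGGGG", "GGGGGGGGGT", "CCCCCCCCCC", "TTTTTTTTTT"]
print(open_reference_otus(reads, reference, threshold=0.9))
```

Only the reads that miss the reference go through the expensive clustering step, which is why open reference keeps most of the speed of closed reference while retaining the unknowns.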
QIIME on fungal data
• New (Dec 2012) Fungal ITS reference database from UNITE incorporated as a QIIME resource
• Can use it to match against the known set (closed reference) or match and cluster unknowns (open reference)
• One dataset of indoor dust samples from the Kerry Kinney (UT Austin) group
• A second indoor dataset (Amend et al.)
QIIME taxonomic distribution for samples
Greg Gu
A previously published indoor mycobiome
• Amend et al PNAS 2010 "Indoor fungal composition is geographically patterned and more diverse in temperate zones than in the tropics."
• Sequencing dust from houses and office buildings
• 72 samples of fungi from 6 continents. Sampled ITS2 region and the D1-D2 region of LSU with 454-FLX
• A primary finding was increasing species diversity with increasing latitude
PCA of normalized counts – Painted by rRNA type (ITS, 28S). MG-RAST tools
PCA of normalized counts – Painted by sampled country. MG-RAST tools
PCA of normalized counts – Painted by sampled elevation. MG-RAST tools
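The three plots above are the same ordination colored by different sample metadata. The slides do not say how MG-RAST normalizes counts or computes the PCA, so the sketch below is only an assumption-laden illustration: counts are normalized to relative abundance, and the first principal component is found by power iteration in plain Python. All function names and the toy counts are hypothetical.

```python
def relative_abundance(counts):
    """Normalize one sample's counts so that they sum to 1."""
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * len(counts)


def first_principal_component(rows, iterations=200):
    """Return (loadings, scores) for PC1 of a samples-by-taxa table."""
    n, m = len(rows), len(rows[0])
    # Center each taxon (column) on its mean across samples.
    means = [sum(row[j] for row in rows) / n for j in range(m)]
    centered = [[row[j] - means[j] for j in range(m)] for row in rows]
    # Power iteration on X^T X from a fixed, deterministic start vector.
    vec = [1.0 / (j + 1) for j in range(m)]
    for _ in range(iterations):
        scores = [sum(x * v for x, v in zip(row, vec)) for row in centered]
        new = [sum(centered[i][j] * scores[i] for i in range(n)) for j in range(m)]
        norm = sum(v * v for v in new) ** 0.5
        if norm == 0:
            break  # no variance left to explain
        vec = [v / norm for v in new]
    scores = [sum(x * v for x, v in zip(row, vec)) for row in centered]
    return vec, scores


# Toy table (hypothetical): rows are samples, columns are taxa.
counts = [
    [90, 10, 0],   # sample A
    [80, 20, 0],   # sample B
    [5, 5, 90],    # sample C
    [0, 10, 90],   # sample D
]
table = [relative_abundance(row) for row in counts]
loadings, scores = first_principal_component(table)
for name, score in zip("ABCD", scores):
    print(name, round(score, 3))
```

Samples A and B land on one side of the axis and C and D on the other; "painting" the points by country, elevation, or rRNA type then shows which metadata variable lines up with that separation. The sign of a principal component is arbitrary, so only the grouping is meaningful.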
Metagenomics - shotgun approach
• For non-amplicon based studies of community composition
• Will be the future approach for community studies as sequencing depth increases
• Metatranscriptomics for studying what is expressed
• Supported in QIIME and MG-RAST, but limited by the diversity of genome/protein sequences which can be matched.
Summary
• Fungal microbial ecology is embracing high-throughput sequencing technologies for community studies
• Limitations arise from the lack of curated sequences and the properties of the marker loci used
• Building new databases and tools to help with the analyses will improve utility
• Improvements in sequencing chemistry (read length x depth) make establishing best practices a moving target
• Deeper studies will improve our understanding of fungal diversity and the role of fungi in different ecosystems - 1000 genomes project can help provide anchor representatives of this diversity.