This document discusses phylogeny-driven approaches to studying microbial diversity using ribosomal RNA gene sequences. It provides background on how advances in sequencing technology and appreciation of microbial diversity have enabled microbiome research. The document outlines several uses of phylogeny in microbiome studies, including constructing species phylogenies using rRNA sequences and assigning taxonomy to environmental sequences via rRNA phylotyping. It describes challenges with analyzing large rRNA datasets and introduces an automated pipeline called STAP that generates high-quality multiple sequence alignments and phylogenetic trees to classify sequences and analyze species diversity in a manner that scales to large datasets.
This document provides an overview of a tutorial on analyzing microbiome data using 16S rRNA gene sequencing and metagenomics. The morning session covers the basics of 16S analysis including sample collection, PCR amplification of the 16S gene, clustering sequences into OTUs, assigning taxonomy, and calculating alpha and beta diversity. The assumptions and limitations of 16S analysis are also discussed. The afternoon session introduces metagenomics and compares it to 16S analysis. It covers taxonomic and functional profiling from metagenomic data as well as tools like PICRUSt for predicting gene functions. The document concludes by discussing the value of multi-omics approaches that integrate different types of microbiome data.
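The alpha and beta diversity calculations mentioned above can be illustrated with a short sketch. It makes two assumptions. First, each sample is a vector of OTU counts, and all samples use the same OTU order. Second, it uses two common metrics: Shannon diversity for alpha diversity and Bray-Curtis dissimilarity for beta diversity. The tutorial itself may use other metrics or tools, and the function names are hypothetical.

```python
import math


def shannon(counts):
    """Shannon diversity H' = -sum(p_i * ln p_i) over the non-zero OTU counts of one sample."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two samples' OTU count vectors (0 = identical)."""
    if len(a) != len(b):
        raise ValueError("samples must use the same OTU order")
    shared = sum(min(x, y) for x, y in zip(a, b))  # counts shared by both samples
    return 1 - 2 * shared / (sum(a) + sum(b))
```

Consider two samples, `[10, 10, 0]` and `[0, 10, 10]`. Each has a Shannon diversity of ln 2, and their Bray-Curtis dissimilarity is 0.5.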
Tom Delmont: From the Terragenome Project to Global Metagenomic Comparisons: ... (GigaScience, BGI Hong Kong)
This document discusses challenges in comparing metagenomic data from different environments and studies. It argues that when exploring a new environment, multiple methodological approaches should be used to capture natural and methodological variations. When performing global comparisons, methodological variations should be considered for all environments. Defining ecosystems precisely at the microorganism level is important. The author's vision is for projects like the Earth Microbiome Project to use flexible experimental designs informed by different experts to best represent microbial communities.
Targeted RNA Sequencing, Urban Metagenomics, and Astronaut Genomics (QIAGEN)
This document discusses targeted RNA sequencing and metagenomics projects including:
1. Using targeted RNA panels to profile gene expression in acute lymphoblastic leukemia patients to identify chemo-resistant clones hiding at low frequencies.
2. Conducting the first city-scale metagenomic profile of the New York City subway system, finding many bacterial species including those associated with skin.
3. Ongoing plans to conduct metropolitan-scale metagenomic profiling in several major cities around the world to better understand urban microbiomes and human-microbe interactions.
This document provides an introduction to metagenomics. It defines metagenomics as the study of microbial communities directly in their natural environments using modern genomics techniques. The document outlines the historical context and basic purpose of metagenomics. It describes some of the applications of metagenomics, such as understanding the human microbiome, bioremediation, bioenergy production, and smart farming. Finally, it introduces some basic concepts in metagenomics analysis including binning, OTUs, alpha and beta diversity measurements, and challenges around estimating diversity from samples.
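Estimating diversity from a sample is hard because rare taxa are easily missed. Richness estimators such as Chao1 use the number of rare OTUs in a sample to infer how many OTUs went unobserved. The sketch below uses the bias-corrected form, S_obs + F1(F1 - 1) / (2(F2 + 1)), where F1 is the number of singleton OTUs and F2 the number of doubletons. It is an illustration only and is not necessarily the estimator the document discusses.

```python
def chao1(counts):
    """Bias-corrected Chao1 richness estimate from the per-OTU counts of one sample."""
    observed = [c for c in counts if c > 0]
    s_obs = len(observed)
    f1 = sum(1 for c in observed if c == 1)  # singletons: OTUs seen exactly once
    f2 = sum(1 for c in observed if c == 2)  # doubletons: OTUs seen exactly twice
    return s_obs + f1 * (f1 - 1) / (2 * (f2 + 1))
```

Take the counts `[1, 1, 1, 2, 2, 5, 10]`. They contain 7 observed OTUs, 3 singletons and 2 doubletons, so the estimate is 7 + 6/6 = 8.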
Metagenomics is a vast field that studies the genetic material of environmental samples. Binning is a bioinformatics technique that groups sequences (reads or assembled contigs) into bins intended to represent individual genomes or taxa, which supports genomic analysis of environmental samples.
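Many binning tools rely on one observation: tetranucleotide (4-mer) frequencies tend to be similar across fragments of the same genome. The sketch below is minimal. It assumes contigs are plain strings and compares profiles with a simple Euclidean distance. Real tools typically also merge reverse-complement k-mers and add coverage information. The function names are hypothetical.

```python
from itertools import product

# All 256 possible tetranucleotides in a fixed order
TETRAMERS = ["".join(p) for p in product("ACGT", repeat=4)]


def tetranucleotide_profile(seq):
    """Return normalized frequencies of the 256 tetranucleotides in a contig."""
    seq = seq.upper()
    counts = dict.fromkeys(TETRAMERS, 0)
    for i in range(len(seq) - 3):
        kmer = seq[i:i + 4]
        if kmer in counts:  # skip windows containing N or other ambiguity codes
            counts[kmer] += 1
    total = sum(counts.values()) or 1  # avoid division by zero on very short contigs
    return [counts[k] / total for k in TETRAMERS]


def profile_distance(p, q):
    """Euclidean distance between two profiles; smaller suggests the same source genome."""
    return sum((x - y) ** 2 for x, y in zip(p, q)) ** 0.5
```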
Microbial Metagenomics Drives a New Cyberinfrastructure (Larry Smarr)
06.03.03
Invited Talk
School of Biological Sciences
University of California, Irvine
Title: Microbial Metagenomics Drives a New Cyberinfrastructure
Irvine, CA
The document describes a seminar on high-throughput sequencing bioinformatics. It discusses analyzing microbiome samples using 16S rRNA sequencing and tools like Mothur and QIIME. It provides an overview of analyzing 16S sequences, including quality filtering, OTU clustering, classification, and diversity analysis. It also outlines running a Mothur tutorial to analyze a mock microbiome dataset from 21 samples using the Mothur MiSeq standard operating procedure.
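The OTU clustering step can be sketched as a greedy, abundance-ordered assignment of sequences to centroids at a 97% identity threshold. This is a simplified stand-in. It assumes the sequences are already aligned to equal length. It does not reproduce the clustering algorithms that Mothur or QIIME use, and the helper names are hypothetical.

```python
def identity(a, b):
    """Fraction of matching columns between two equal-length aligned sequences,
    ignoring columns where both sequences have a gap."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    cols = [(x, y) for x, y in zip(a, b) if not (x == "-" and y == "-")]
    if not cols:
        return 1.0
    return sum(x == y for x, y in cols) / len(cols)


def greedy_otus(seqs, abundances, threshold=0.97):
    """Assign each sequence to an OTU; the most abundant sequences seed OTUs first."""
    order = sorted(range(len(seqs)), key=lambda i: abundances[i], reverse=True)
    centroids = []   # indices of the sequences that seed each OTU
    assignment = {}  # sequence index -> OTU number
    for i in order:
        for otu, c in enumerate(centroids):
            if identity(seqs[i], seqs[c]) >= threshold:
                assignment[i] = otu
                break
        else:  # no centroid is close enough, so this sequence starts a new OTU
            assignment[i] = len(centroids)
            centroids.append(i)
    return assignment
```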
I. The document outlines a proteogenomics course at EMBL-EBI, discussing integrating proteomics and genomics data.
II. It discusses what proteogenomics is, using multi-omics approaches to correlate genomic and proteomic sequence events like mutations and modifications.
III. The talk will cover integrating proteomics data into Ensembl and the UCSC Genome Browser via track hubs, as well as tools for proteogenomics analysis.
The document summarizes research that screened metagenomic libraries from Puerto Rican forests for protease activity. Culture-independent metagenomic techniques were used to study the genetics of uncultured microbes. Two libraries containing 14,000 and 600,000 clones were screened, identifying 20 potential protease-producing clones, which are undergoing further analysis. Proteases have important applications in industrial biotechnology.
Metagenomics is a set of techniques used to study microbial communities through the direct collection and analysis of environmental DNA. It allows researchers to study millions of microbial organisms and genetic fragments simultaneously, without needing to culture individual microbes in the lab. The main steps are sampling an environment, filtering out particles by size, and extracting and sequencing DNA fragments. Two common sequencing methods are shotgun sequencing and high-throughput sequencing on platforms such as Illumina or SOLiD. Projects like MetaHIT use metagenomics to study the human gut microbiome and its role in health and disease. Potential applications include contributions to earth sciences, life sciences, biomedicine, bioenergy, biotechnology, and microbial forensics.
Metagenomics is the study of microbial communities directly in their natural environments, without isolating individual species in the lab. It involves sequencing DNA from environmental samples and analyzing the resulting metagenomes. Key points are that metagenomics can identify uncultivable microbes, bypassing the need for culture, and that it has advanced our understanding of microbial ecology, evolution, and diversity. The rumen, home to a complex microbial community important for ruminant digestion, is an important target of metagenomic study. Next-generation sequencing techniques now make microbial systems more accessible to exploration through metagenomics.
10.02.19
Invited talk
Symposium #1816, Managing the Exaflood: Enhancing the Value of Networked Data for Science and Society
Title: Advancing the Metagenomics Revolution
San Diego, CA
Metagenomics as a tool for biodiversity and health (Alberto Dávila)
- The document discusses several studies that used metagenomics to analyze microbial communities and identify novel genes.
- One study analyzed anoxygenic photosynthetic bacteria in coastal Brazilian waters, finding high abundance linked to upwelling and light availability. Novel polyketide synthase and nonribosomal peptide synthetase genes were also identified.
- Another study identified 243 secondary metabolite gene clusters in lake microbial genomes from Germany.
- A third study found novel beta-lactamase genes in Brazilian hospital sewage and estimated their relative abundances, finding they grouped with Firmicutes and Bacteroidetes.
Building a Community Cyberinfrastructure to Support Marine Microbial Ecology ... (Larry Smarr)
06.09.15
Invited Talk
2006 Synthetic Biology Symposium
Aliso Creek Inn
Title: Building a Community Cyberinfrastructure to Support Marine Microbial Ecology Metagenomics
Laguna Beach, CA
Creating a Cyberinfrastructure for Advanced Marine Microbial Ecology Research... (Larry Smarr)
The document discusses the creation of the Cyberinfrastructure for Advanced Marine Microbial Ecology Research and Analysis (CAMERA) project. CAMERA aims to provide metagenomic sequencing and analysis of marine microbes at high speeds. It will include data from the Sorcerer II expedition and other projects. The document outlines how CAMERA will utilize Calit2's infrastructure including high-performance computing resources and optical networks to enable remote interactive analysis of large-scale genomic and environmental data sets.
A metagenome is the entire genetic content of the microorganisms present at a specific site and time. Metagenomic data can be analyzed using two approaches: 1) amplicon (16S rRNA gene) analysis and 2) whole-genome (shotgun) metagenomic analysis. Here we focus on 16S rRNA amplicon analysis using the Mothur pipeline.
PROKARYOTIC TRANSCRIPTOMICS AND METAGENOMICS (Lubna MRL)
After billions of years of evolution, prokaryotes have developed a huge diversity of regulatory mechanisms, many of which are probably still uncharacterized. Now that whole-transcriptome analysis can be used to study the RNA of bacteria and archaea, a new set of unexpected RNA-based regulatory strategies may be revealed.
Metagenomics, together with in vitro evolution and high-throughput screening technologies, provides industry with an unprecedented chance to bring biomolecules into industrial application.
Microbiome studies using 16S ribosomal DNA PCR: some cautionary tales (jennomics)
Presentation at a workshop conducted by the UC Davis Bioinformatics Core Facility: Using the Linux Command Line for Analysis of High Throughput Sequence Data, September 15-19, 2014
Viral Metagenomics (CABBIO 20150629 Buenos Aires) (bedutilh)
This is a one-hour lecture about metagenomics, focusing on the discovery of viruses and unknown sequence elements. It is part of a one-day workshop on metagenome assembly of crAssphage, a bacteriophage found in the human gut. The hands-on workflow can be found at http://tbb.bio.uu.nl/dutilh/CABBIO/ and should be doable in one afternoon with supervision. There is also an IPython notebook on this topic: https://github.com/linsalrob/CrAPy
The document discusses metagenomics analysis tools and challenges. It summarizes several metagenome analysis portals that provide computational analysis and public sample databases. It also discusses the rapid growth of metagenomic data being produced, challenges around quality control, feature identification, characterization and presentation of metagenomic data, and the need for standardized metadata and data formats. The future directions highlighted include studying strain variation, expanding metadata capture and standards, and developing improved assembly, binning and analysis methods.
This document discusses the potential for metagenomics to provide novel enzymes and biocatalysts for various industrial applications. It outlines how different industries, such as chemicals, pharmaceuticals, and detergents, are interested in accessing new enzymes from uncultured microbes. The document also discusses challenges in finding suitable enzymes and describes screening methods used to identify candidate enzymes from metagenomic libraries for specific industrial transformations and processes.
Shotgun metagenomic sequencing allows researchers to comprehensively sample all genes of the organisms present in a complex sample without culturing. This provides insights into bacterial diversity, abundance, and uncultured microbes. Bioinformatics pipelines guide the analysis to characterize communities. Typical steps include quality filtering, assembly, binning, gene finding, fingerprinting, and phylogeny/diversity modeling. Metagenomics has applications in antibiotic and drug discovery, bioremediation, agriculture, human microbiome mapping, and more. Tools such as QIIME, Mothur, MEGAN, and MG-RAST facilitate large-scale metagenomic analysis.
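Quality filtering is usually the first step in such a pipeline, and it often trims reads where base quality drops off. The sketch below uses a minimal sliding window. It assumes FASTQ quality strings in Phred+33 encoding. The window size (4 bases) and threshold (Q20) are illustrative defaults, not values taken from the source.

```python
def phred_scores(quality_string, offset=33):
    """Decode a FASTQ quality string (Phred+33 by default) into integer scores."""
    return [ord(ch) - offset for ch in quality_string]


def trim_3prime(seq, qual, min_q=20, window=4):
    """Cut the read at the start of the first window whose mean quality falls below min_q."""
    scores = phred_scores(qual)
    for start in range(0, max(len(scores) - window + 1, 0)):
        win = scores[start:start + window]
        if sum(win) / window < min_q:
            return seq[:start], qual[:start]
    return seq, qual  # no low-quality window found: keep the whole read
```

Cutting at the start of the failing window is a conservative choice. It can remove a few good bases along with the bad ones.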
[2013.09.27] extracting genomes from metagenomes (Mads Albertsen)
This document summarizes a presentation on extracting genomes from metagenomes. It discusses why genomes are needed, how they can be obtained through culturing, single cell genomics, and metagenomics. Metagenomics involves sequencing all DNA from an environmental sample to study the collective genomes of microbial communities. While it provides abundance and functional information, it does not yield full genomes due to microdiversity within populations. Methods for binning sequences into genomes using genomic signatures and using multiple related samples are described. An example of obtaining a near-complete genome of a Candidatus Saccharimonas bacterium from activated sludge metagenomes is provided. Obtaining genomes through metagenomics enables comprehensive studies of ecosystem function.
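Binning with multiple related samples relies on one idea: contigs from the same genome should rise and fall in coverage together across samples. The sketch below greedily groups contigs by the Pearson correlation of their coverage profiles. The 0.95 cutoff and the dictionary layout are assumptions. It is a simplification, not the workflow described in the presentation.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length coverage vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # a flat profile carries no co-variation signal
    return cov / (sx * sy)


def bin_by_coverage(coverage, min_r=0.95):
    """Greedily group contigs whose coverage profiles across samples are highly correlated.

    coverage: dict mapping contig name -> list of mean coverage, one value per sample.
    """
    bins = []
    for contig, profile in coverage.items():
        for b in bins:
            if pearson(profile, coverage[b[0]]) >= min_r:  # compare with the bin's first contig
                b.append(contig)
                break
        else:
            bins.append([contig])
    return bins
```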
Managing environmental- molecular- and associated meta-data: The Micro B3 Inf... (Renzo Kottmann)
A 5-minute lightning talk about the approach the Micro B3 Information System takes to deliver integrated environmental and molecular data with associated metadata. Presented at the Biodiversity Informatics Horizon 2013 conference (see http://conference.lifewatch.unisalento.it/index.php/EBIC/BIH2013).
The Ocean Sampling Day's Metagenome Analysis: Standards, Pipelines and First ... (Renzo Kottmann)
Overview of Ocean Sampling Day and the status of its metagenomic analysis. The presentation was delivered at the final BioMedBridges Symposium: http://www.biomedbridges.eu/news/symposium-open-bridges-life-science-data
Metagenomics Session: http://www.biomedbridges.eu/news/workshop-metagenomics-bridging-between-environment-and-life
This study demonstrates the utility of using Next Generation Sequencing (NGS) technology and DNA analysis to identify and analyze closely related insect species and populations. The researchers sequenced DNA from two mitochondrial genes and a nuclear gene from individuals of two closely related fly species, Bactrocera philippinensis and B. occipitalis. They obtained overlapping sequences from these genes that could be assembled into full gene sequences. Their goal is to ultimately sequence the entire genome of multiple individuals to better characterize populations and species through comparative genomic analysis. DNA-based methods provide advantages over traditional taxonomy by requiring less material and being consistent across life stages.
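Assembling overlapping sequences into a full gene sequence depends on finding where the end of one read matches the start of the next. The sketch below uses exact matching only. The minimum overlap is an assumed parameter. Practical assemblers tolerate sequencing errors, which this sketch does not.

```python
def merge_overlap(left, right, min_overlap=10):
    """Join two reads if a suffix of `left` exactly matches a prefix of `right`.

    Returns the merged sequence, or None if no overlap of at least min_overlap exists.
    """
    max_len = min(len(left), len(right))
    # Try the longest possible overlap first; never test k = 0 (that would match trivially)
    for k in range(max_len, max(min_overlap, 1) - 1, -1):
        if left[-k:] == right[:k]:
            return left + right[k:]
    return None
```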
[2013.12.02] Mads Albertsen: Extracting Genomes from Metagenomes (Mads Albertsen)
This document summarizes the process of extracting genomes from metagenomes. It discusses how metagenomics involves sequencing the collective DNA from an environmental sample to determine community composition and functional potential. Full genomes typically cannot be assembled from metagenomic data because of high microbial diversity within samples and limitations in separating individual genomes (binning). Methods described to improve binning include reducing diversity through short-term enrichments and using multiple related samples. Binned genomes are validated by checking for essential single-copy genes and by confirming bins with in situ techniques such as fluorescence in situ hybridization (FISH).
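Validation with essential single-copy genes can be summarized as two percentages. Completeness measures how many expected markers are present in a bin. Contamination measures how many extra copies of those markers appear. The sketch below is simplified, in the spirit of tools such as CheckM, and assumes a flat marker list. The gene names in the example are illustrative only.

```python
def bin_quality(marker_counts, expected_markers):
    """Estimate completeness and contamination (%) of a genome bin.

    marker_counts: dict gene name -> copies found in the bin.
    expected_markers: genes expected exactly once per genome.
    """
    n = len(expected_markers)
    found = sum(1 for g in expected_markers if marker_counts.get(g, 0) >= 1)
    # Each copy beyond the first suggests DNA from another genome in the bin
    extra = sum(max(marker_counts.get(g, 0) - 1, 0) for g in expected_markers)
    return 100.0 * found / n, 100.0 * extra / n
```

Suppose a bin contains `{"rpoB": 1, "recA": 2, "gyrB": 1}` and is checked against the four markers rpoB, recA, gyrB and rplB. It would score 75% complete and 25% contaminated.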
The diversity of microbial species in a metagenomic study is commonly assessed using 16S rRNA gene sequencing. With rapid developments in sequencing technologies, the focus has shifted from full-length gene sequencing towards sequencing the hypervariable regions of the 16S rRNA gene. 16S Classifier was therefore developed, using a machine learning method (Random Forest), for fast and accurate taxonomic classification of short 16S rRNA hypervariable-region sequences. It achieved precision values of up to 0.91 on training datasets and up to 0.98 on the test dataset. On real metagenomic datasets, it showed up to 99.7% accuracy at the phylum level and up to 99.0% accuracy at the genus level. 16S Classifier is freely available at http://metagenomics.iiserb.ac.in/16Sclassifier and http://metabiosys.iiserb.ac.in/16Sclassifier.
This document discusses the use of 16S ribosomal RNA (rRNA) gene sequencing for bacterial identification and phylogenetic analysis. It explains that the 16S rRNA gene is highly conserved, making it useful for comparing distantly related organisms. The document outlines the 16S rRNA gene sequencing process, including PCR amplification with primers targeting conserved regions and sequencing of the variable regions. It also discusses methods built on the 16S rRNA gene, such as T-RFLP profiling and ribotyping, for studying microbial communities.
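Primers that target conserved regions are often degenerate: they use IUPAC codes such as M (A or C) at variable positions. The sketch below checks where such a primer matches a sequence exactly. The example uses the widely cited 515F primer GTGCCAGCMGCCGCGGTAA. The sketch searches only the strand given and allows no mismatches, which is a simplifying assumption.

```python
# IUPAC nucleotide codes and the bases each one allows
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def primer_matches(primer, target):
    """True if every target base is allowed by the IUPAC code at that primer position."""
    return len(primer) == len(target) and all(
        base in IUPAC[code] for code, base in zip(primer.upper(), target.upper())
    )


def find_primer(primer, sequence):
    """Return the start positions where the degenerate primer matches the sequence exactly."""
    k = len(primer)
    return [i for i in range(len(sequence) - k + 1)
            if primer_matches(primer, sequence[i:i + k])]
```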
New Generation Sequencing Technologies: an overview (Paolo Dametto)
The document provides a history of DNA sequencing technologies. It begins with the discovery of DNA's structure in 1953 and the development of recombinant DNA technology in the 1970s. First-generation Sanger sequencing produces reads of up to roughly 1,000 bases but has low throughput, which made sequencing the human genome a slow and expensive effort. Next-generation sequencing (NGS) platforms introduced since 2005 have dramatically reduced costs while increasing throughput. NGS methods now enable large-scale genome and transcriptome analysis. They include:
- Roche/454 pyrosequencing
- Illumina/Solexa sequencing by synthesis
- SOLiD sequencing by ligation
- Pacific Biosciences single-molecule real-time sequencing
This document discusses the evolution of metagenomics from culturing microorganisms to direct high-throughput sequencing using next-generation sequencing (NGS) technologies. It describes how early metagenomics relied on cloning environmental DNA into libraries for Sanger sequencing, but NGS allows direct sequencing without cloning. NGS produces large volumes of sequence data at low cost, enabling assembly of large DNA fragments and reliable annotation of genes and pathways. The future of metagenomics involves comprehensively cataloging human and environmental microbiomes using NGS and exploiting microbial diversity for biotechnology applications like enzymes, antibiotics, and probiotics.
Bacterial Identification by 16s rRNA Sequencing (Rakesh Kumar)
Bacteria are the most abundant life forms on Earth, with a single gram of soil containing 40 million bacterial cells. Because bacteria are so numerous and diverse, most bacterial species have yet to be identified. Sequencing the 16S rRNA gene is a common technique for identifying bacterial species. The process involves isolating bacteria from a sample, extracting DNA, amplifying and sequencing the 16S rRNA gene, and comparing the sequence against databases to find matches. 16S rRNA gene sequencing provides more accurate identification of bacteria than phenotypic methods.
This document discusses the potentials and pitfalls of metagenomics. It begins with an introduction to metagenomics and its history. It describes some of the early applications of metagenomics including exploration of microbial communities and identification of specific functions. Potential pitfalls of metagenomics are then outlined, including issues related to DNA extraction, sequencing depth, and biases. The major pitfall discussed is the incompleteness of databases for assigning taxonomy and functions. The document concludes by describing some of the potentials of metagenomics, including hunting for novel antibiotic resistance genes using functional metagenomics and extracting genomes from metagenomes through reducing microdiversity and binning sequences from multiple related samples.
Novel Computational Approaches to Investigate Microbial Diversity (Qingpeng "Q.P." Zhang)
The document summarizes novel computational approaches to investigate microbial diversity using metagenomic sequencing data. It discusses challenges with traditional assembly-based and mapping-based approaches for diversity analysis due to short read lengths and high diversity. It then introduces the concept of informative genomic segments (IGS) as a new unit to represent genomes and perform diversity analysis without the need for assembly or references. The document outlines using k-mer counting and abundance profiles across samples to estimate sequencing coverage and size of genomic regions represented by IGSs from raw reads. It proposes replacing traditional species-sample abundance tables with IGS-sample tables for scalable metagenomic diversity analysis.
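The k-mer counting step can be sketched as follows. The code counts every k-mer in each sample's reads and builds a k-mer × sample abundance table. This illustrates only the general idea and is not the IGS method itself. The function names, sample names and reads are hypothetical. For simplicity, the sketch does not merge a k-mer with its reverse complement.

```python
from collections import Counter
from typing import Dict, List


def count_kmers(reads: List[str], k: int) -> Counter:
    """Count every overlapping k-mer in a list of reads."""
    counts: Counter = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def abundance_table(samples: Dict[str, List[str]], k: int) -> Dict[str, Dict[str, int]]:
    """Build table[kmer][sample] = count, with 0 where a k-mer is absent."""
    per_sample = {name: count_kmers(reads, k) for name, reads in samples.items()}
    all_kmers = sorted(set().union(*per_sample.values()))
    # Counter returns 0 for missing keys without inserting them.
    return {km: {name: per_sample[name][km] for name in samples} for km in all_kmers}


samples = {
    "s1": ["ACGTAC", "CGTACG"],
    "s2": ["TTACGT"],
}
table = abundance_table(samples, k=4)
print(table["ACGT"])  # {'s1': 1, 's2': 1}
```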
The Next, Next Generation of Sequencing - From Semiconductor to Single Molecule (Justin Johnson)
The document discusses next generation sequencing technologies and challenges. It describes EdgeBio's sequencing platforms including Illumina, Ion Torrent, SOLiD, and PacBio machines. It highlights challenges such as experimental design considerations, flexibility with standards, sample preparation difficulties, and differences between platforms regarding read length, error rates, and yield. Overall the document provides an overview of sequencing technologies and issues researchers may face.
This document discusses next generation sequencing technologies. It provides details on several massively parallel sequencing platforms and describes their advantages over traditional Sanger sequencing such as higher throughput, lower costs, and ability to process millions of reads in parallel. It then outlines several applications of next generation sequencing like mutation discovery, transcriptome analysis, metagenomics, epigenetics research and discovery of non-coding RNAs.
Metagenomics is the study of genetic material recovered directly from environmental samples. It provides a new approach to studying microbes that are not easily cultured in a laboratory and enables investigation of microbial communities in their natural habitats. Metagenomics involves directly extracting DNA from samples, sequencing it, and analyzing the genetic information obtained from entire communities of organisms simultaneously. This provides insights into uncultured microbes and their roles in various environments.
This document discusses the field of metagenomics, which involves directly extracting and sequencing genetic material from environmental samples without culturing individual microbial species. It provides a brief history of metagenomics from early microbiologists in the 17th century to recent large-scale sequencing projects. Methods of metagenomic analysis like sequence-driven and function-driven approaches are described. Applications to studying uncultured symbiotic microbes, extreme environments, and the human gut microbiome are also summarized.
Tools for Metagenomics with 16S/ITS and Whole Genome Shotgun Sequences (Surya Saha)
Presented at the Cornell Symbiosis symposium. Workflows for processing amplicon-based 16S/ITS sequences as well as whole genome shotgun sequences are described. The slides include a short description of each tool and links to it.
DISCLAIMER: This is a small subset of tools out there. No disrespect to methods not mentioned.
Next-generation sequencing techniques such as Illumina and 454 pyrosequencing were discussed for applications including microbial genome sequencing and metagenomic profiling of microbial communities, based on targeted gene markers or shotgun sequencing. Key steps include library preparation, sequencing, and downstream bioinformatics analysis of the sequencing data for tasks like genome assembly, gene annotation, and taxonomic classification of microbes.
Metagenomics is the study of metagenomes, genetic material recovered directly from environmental samples. The broad field was referred to as environmental genomics, ecogenomics or community genomics. Recent studies use "shotgun" Sanger sequencing or next generation sequencing (NGS) to get largely unbiased samples of all genes from all the members of the sampled communities.
Next Gen Sequencing (NGS) Technology Overview (Dominic Suciu)
Next generation sequencing (NGS) provides several new technologies for DNA sequencing that have significantly increased throughput and reduced costs compared to previous methods. NGS technologies include Roche/454, Illumina, ABI SOLiD, Ion Torrent, and PacBio. These technologies have various applications including whole genome sequencing, detection of genetic mutations associated with diseases, RNA sequencing to study gene expression, and ChIP sequencing to identify DNA-binding sites. NGS is revolutionizing genomic research by allowing comprehensive study of genomes, transcriptomes, and gene regulation.
Metagenomics is the study of genetic material recovered directly from environmental samples without culturing. This field enables research on uncultured organisms and microbial communities. There are three main metagenomic approaches: biochemical, whole genome shotgun sequencing, and 16S rRNA sequencing. Metagenomics is being applied to study human microbiomes, discover new genes and enzymes, monitor environmental impacts, and characterize uncultured microbes. Future directions include identifying more novel products from uncultured bacteria and improving culture methods and bioinformatics tools.
Phylogeny-driven approaches to microbial & microbiome studies: talk by Jonath... (Jonathan Eisen)
This document summarizes an automated phylogenetic tree-based small subunit rRNA taxonomy and alignment pipeline (STAP) developed by the authors. STAP generates high-quality multiple sequence alignments and phylogenetic trees from rRNA gene sequences in a fully automated manner, allowing for phylogenetic analysis of large datasets. It combines existing tools like BLAST, CLUSTALW and PHYML with new programs for automated alignment, masking, and tree parsing. STAP yields results comparable to manual analysis but with increased speed and capacity needed to analyze the large volumes of rRNA data now being generated.
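Masking usually removes unreliable alignment columns before tree building. The sketch below drops every column whose fraction of gap characters exceeds a cutoff. This criterion and the 0.5 default are assumptions for illustration. They are not necessarily what STAP's masking program does.

```python
from typing import List


def mask_alignment(rows: List[str], max_gap_fraction: float = 0.5) -> List[str]:
    """Drop columns in which the fraction of '-' characters exceeds the cutoff."""
    if len({len(r) for r in rows}) != 1:
        raise ValueError("need one or more aligned rows of equal length")
    n = len(rows)
    keep = [
        j for j in range(len(rows[0]))
        if sum(r[j] == "-" for r in rows) / n <= max_gap_fraction
    ]
    return ["".join(r[j] for j in keep) for r in rows]


aln = [
    "AC-GT-A",
    "ACTGT-A",
    "A--GTCA",
]
print(mask_alignment(aln))  # ['ACGTA', 'ACGTA', 'A-GTA']
```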
This document discusses the analysis of microbial communities through sequencing of the 16S rRNA gene. It presents WATERS, a workflow system that automates and bundles various software tools for analyzing 16S rRNA sequence data. The goals of WATERS are to simplify the analysis process for users without specialized bioinformatics expertise and to facilitate reproducibility through tracking of data provenance. WATERS guides users through the typical sequence analysis steps of alignment, chimera filtering, OTU clustering, taxonomy assignment, phylogeny tree building, and ecological analyses and visualization. By integrating existing tools into a single automated workflow, WATERS aims to reduce the effort required for 16S rRNA data analysis and allow researchers to focus on biological interpretation of results.
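OTU clustering is often done greedily. Each sequence joins the first existing centroid that it matches above an identity threshold; otherwise it founds a new OTU. The sketch below assumes pre-aligned, equal-length sequences and a simple identity measure. The 0.97 default reflects the commonly used 97% cutoff. Neither the code nor its parameters come from WATERS, which wraps existing tools for this step.

```python
from typing import Dict, List


def identity(a: str, b: str) -> float:
    """Fraction of identical positions between two equal-length, pre-aligned sequences."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def greedy_otus(seqs: List[str], threshold: float = 0.97) -> Dict[str, List[str]]:
    """Assign each sequence to the first centroid it matches at >= threshold,
    otherwise start a new OTU. Input order matters (often sorted by abundance)."""
    otus: Dict[str, List[str]] = {}
    for s in seqs:
        for centroid in otus:
            if identity(s, centroid) >= threshold:
                otus[centroid].append(s)
                break
        else:  # no centroid was close enough
            otus[s] = [s]
    return otus


reads = ["ACGTACGTAC", "ACGTACGTAA", "TTGTACCTAC", "TTGTACCTAA"]
print(greedy_otus(reads, threshold=0.9))
```

With 10-base toy reads, one mismatch corresponds to 90% identity. The example therefore lowers the threshold to 0.9, which yields two OTUs of two reads each.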
Processing Amplicon Sequence Data for the Analysis of Microbial Communities (Martin Hartmann)
This document provides an overview of next-generation sequencing (NGS) technologies and their usefulness for analyzing microorganisms associated with plants. It discusses how NGS methods allow addressing previously impossible questions about the composition, function, and interactions of microbial communities in environments like the rhizosphere and phyllosphere. While powerful, NGS platforms have limitations that can introduce errors or biases, but methods exist to overcome these issues. The review highlights applications of NGS in metagenomic studies of plant-associated microbiomes and how these new techniques are transforming the field.
This document provides an overview of bioinformatics and genomics. It begins with an acknowledgement and abstract section. The introduction defines bioinformatics and its role in analyzing genetic sequences and biological data through computational methods. Major research areas of bioinformatics discussed include sequence analysis, genome annotation, evolutionary biology, measuring biodiversity, gene expression analysis, protein analysis, cancer mutation analysis, and protein structure prediction. Comparative genomics and modeling biological systems are also summarized. The document concludes with a definition of genomics as the study of genomes through sequencing efforts and mapping genetic interactions.
Shotgun metagenomic sequencing allows researchers to sample comprehensively all genes of the organisms present in a complex sample without relying on cultivation. This approach provides insights into bacterial diversity, abundance, and unculturable microbes. Bioinformatics pipelines support an optimized whole genome shotgun sequencing approach by performing tasks like quality control, assembly, binning, gene finding, fingerprinting, and phylogenetic analysis. These tasks are used to study community diversity from fragmented metagenomic data. Metagenomics has applications in fields like drug discovery, bioremediation, agriculture, and understanding the human microbiome.
This document provides an overview of bioinformatics and discusses key concepts like:
- Bioinformatics combines biology, computer science, and information technology to analyze large amounts of biological data.
- High-throughput DNA sequencing has generated vast genomic data that requires bioinformatics tools and databases accessible via the internet to analyze and share.
- Popular sequence alignment tools like BLAST, FASTA, and ClustalW are used to search databases and compare sequences, helping researchers analyze genes and genomes.
The flood of next-gen sequencing data is changing the landscape of computational biology, increasing the need for more robust infrastructures, tools, and visualization techniques.
Knowing Your NGS Downstream: Functional Predictions (Golden Helix Inc)
Next-Generation Sequencing analysis workflows typically lead to a list of candidate variants that may or may not be associated with the phenotype of interest. Any given analysis may result in tens, hundreds, or even thousands of genetic variants which must be screened and prioritized for experimental validation before a causal variant may be identified. To assist with this screening process, the field of bioinformatics has developed numerous algorithms to predict the functional consequences of genetic variants. Algorithms like SIFT and PolyPhen-2 are firmly established in the field and are cited frequently. Other tools, like MutationAssessor and FATHMM are newer and perhaps not known as well.
This presentation will review several of the functional prediction tools that are currently available to help researchers determine the functional consequences of genetic alterations. The biological principles underlying functional predictions will be discussed, together with an overview of the methodology used by each predictive algorithm. Finally, we will discuss how these predictions can be accessed and used within the Golden Helix SNP & Variation Suite (SVS) software.
Overview of the commonly used sequencing platforms, bioinformatic search tool... (OECD Environment)
24 June 2019: This OECD seminar presented and discussed the potential use of genome sequence, bioinformatic tools and databases in a regulatory decision process for microbial pesticides.
This document discusses computational methods and challenges for genome assembly using next-generation sequencing data. It describes the four main stages of genome assembly as preprocessing filtering, graph construction, graph simplification, and postprocessing filtering. Each stage processes the data from the previous stage to build the assembly graph and reduce complexity, though some assemblers delay filtering steps.
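The graph-construction stage can be made concrete with a minimal de Bruijn graph builder. The de Bruijn graph is one common kind of assembly graph; overlap graphs are another. The sketch also includes a naive walk that extends a contig while the path is unambiguous. The function names are hypothetical, and the sketch does not model the filtering and simplification stages described above.

```python
from collections import defaultdict
from typing import Dict, List, Set


def de_bruijn(reads: List[str], k: int) -> Dict[str, Set[str]]:
    """Nodes are (k-1)-mers; each k-mer in a read adds an edge
    from its prefix (k-1)-mer to its suffix (k-1)-mer."""
    graph: Dict[str, Set[str]] = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return dict(graph)


def walk_unambiguous(graph: Dict[str, Set[str]], start: str) -> str:
    """Extend a contig from `start` while each node has exactly one successor."""
    contig, node, seen = start, start, {start}
    while len(graph.get(node, ())) == 1:
        (nxt,) = graph[node]
        if nxt in seen:  # stop on cycles
            break
        contig += nxt[-1]
        seen.add(nxt)
        node = nxt
    return contig


reads = ["ACGTAC", "GTACTT"]
g = de_bruijn(reads, k=4)
print(walk_unambiguous(g, "ACG"))  # ACGTACTT
```

The walk stops at any branching node. The point of the simplification stage is to resolve such branches, for example those caused by sequencing errors.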
A full picture of -omics cellular networks of regulation brings researchers closer to a realistic and reliable understanding of complex conditions. For more information, please visit: http://tbioinfopb.pine-biotech.com/
T-Bioinfo is a comprehensive bioinformatics platform that allows the user to navigate NGS, Mass-Spec and Structural Biology data analysis pipelines using consistent interface. Analysis and integration of such data allows for better and faster discovery and optimization of personalized and precision treatment of complex diseases and understanding of medical conditions. For more information, go to pine-biotech.com
Next-generation DNA sequencing technologies have significantly impacted genetics research. Three major platforms - Roche/454, Illumina Genome Analyzer, and Applied Biosystems SOLiD - utilize massively parallel sequencing to generate large amounts of sequence data. Roche/454 uses emulsion PCR to amplify DNA fragments on beads and pyrosequencing to determine sequences. Illumina performs bridge amplification on a flow cell to generate DNA clusters then sequences by synthesis. Applied Biosystems SOLiD uses ligation-based sequencing. These new methods have enabled genome-wide studies and applications such as ancient DNA sequencing and metagenomics that were previously difficult or impossible.
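Pyrosequencing output can be pictured as a flowgram. Nucleotides are flowed over the template one at a time in a fixed cyclic order, and the light signal for each flow is roughly proportional to the number of bases incorporated. The sketch below decodes such a signal by rounding. The flow order "TACG" and the signal values are assumptions for illustration. Real base callers model noise far more carefully.

```python
from typing import List


def decode_flowgram(signals: List[float], flow_order: str = "TACG") -> str:
    """Flow i delivers nucleotide flow_order[i % len(flow_order)]; the rounded
    signal is taken as the number of that base incorporated (0 = none)."""
    bases = []
    for i, signal in enumerate(signals):
        n = max(0, round(signal))
        bases.append(flow_order[i % len(flow_order)] * n)
    return "".join(bases)


# Hypothetical signals for 8 flows (T, A, C, G, T, A, C, G).
print(decode_flowgram([1.02, 0.05, 2.10, 0.97, 0.02, 2.95, 0.1, 1.05]))  # TCCGAAAG
```

This decoding scheme explains why homopolymers are a known weak point of pyrosequencing. A run of identical bases is read from a single flow, and the signal becomes noisier as the run grows, so run-length miscalls are a characteristic error.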
INTEGRALL is a freely available database containing over 4,800 sequences related to integrons, integrases, and gene cassettes. It provides scientists with easy access to sequence data, molecular arrangements, and genetic contexts of integrons. The database aims to organize 20 years of integron data in one place and facilitate understanding of integrons' role in bacterial adaptation and interactions. It currently includes sequences from a diverse range of bacteria and environments. Over half of gene cassettes encode antibiotic resistance genes.
Computational genomics uses computational and statistical analysis to understand biology from genome sequences and related data. It involves analyzing whole genomes to understand how DNA controls organisms' molecular biology. The field emerged in the late 1990s with available complete genomes. It has contributed to discoveries like predicting gene locations, signaling networks, and genome evolution mechanisms. The first computer model of an organism was of Mycoplasma genitalium incorporating over 1,900 parameters. Computational genomics addresses problems like data storage, pattern matching, and structure prediction to analyze vast genomic data from databases.
Whole genome sequencing of bacteria & analysis (drelamuruganvet)
This document discusses the history and advancements of whole genome sequencing of bacteria. It begins with early sequencing methods like Sanger sequencing and describes the development of next generation sequencing technologies like 454 sequencing, Illumina sequencing, and third generation single molecule sequencing. The document then discusses genome assembly, annotation, and various applications of bacterial genome sequencing like identification of genes and SNPs, comparative genomics, and metagenomics. Important databases for bacterial genomic data are also listed.
This document discusses various bioinformatics tools and their functions. It provides details on sequence alignment and search tools like CLUSTAL Omega, CLUSTALW, BLAST, and FASTA. It explains that CLUSTAL Omega can align a large number of sequences quickly and accurately using progressive alignment. CLUSTALW performs multiple sequence alignment in three steps: pairwise alignment, guide tree construction, and multiple alignment following the guide tree. BLAST can identify unknown sequences by comparing them to known sequences. FASTA uses short exact matches to find similar regions between sequences. Expasy provides access to databases for proteomics, genomics, and other areas. MASCOT searches peptide mass fingerprinting and shotgun proteomics datasets.
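The first CLUSTALW step, pairwise alignment, can be illustrated with a Needleman-Wunsch global alignment score. This sketch uses a linear gap penalty and simple match/mismatch scores chosen for illustration. CLUSTALW itself uses substitution matrices with gap opening and extension penalties, and it derives distances from the pairwise alignments to build the guide tree.

```python
def nw_score(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -2) -> int:
    """Needleman-Wunsch global alignment score with a linear gap penalty,
    keeping only one previous row of the dynamic-programming matrix."""
    prev = [j * gap for j in range(len(b) + 1)]  # aligning "" against b[:j]
    for i in range(1, len(a) + 1):
        curr = [i * gap] + [0] * len(b)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr[j] = max(diag, prev[j] + gap, curr[j - 1] + gap)
        prev = curr
    return prev[-1]


print(nw_score("ACGT", "AGT"))  # 1  (ACGT aligned to A-GT)
```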
WHAT IS BIOINFORMATICS?
Computational Biology/Bioinformatics is the application of computer science and allied technologies to answer biologists' questions about the mysteries of life. It has evolved to serve as the bridge between:
Observations (data) in diverse biologically-related disciplines and
The derivations of understanding (information)
APPLICATIONS OF BIOINFORMATICS
Computer Aided Drug Design
Microarray Bioinformatics
Proteomics
Genomics
Biological Databases
Phylogenetics
Systems Biology
Marco Brandizi and Keywan Hassani-Pak, Rothamsted Research, Invited Presentation at SWAT4HCLS 2022.
The FAIR data principles have become a driving force in the life sciences and other scientific domains. They help researchers share their data and unlock its full potential for integrating information and making novel discoveries. Knowledge graphs are an increasingly popular paradigm for modelling data according to these principles, and technologies such as graph databases are emerging as a complement to approaches like linked data. All of this applies to the agronomy, farming and food domains as well. How advanced is the adoption of sound data management policies in these domains, and how does it compare with other life sciences? In this presentation, we will talk about our practical experience, focusing on KnetMiner. KnetMiner is a gene and molecular biology discovery platform that builds and publishes knowledge graphs according to the FAIR principles, using a mix of linked data standards for the life sciences and recent graph database and API technologies. We welcome questions and discussion from the audience about similar experiences.
Innovations in Sequencing & Bioinformatics
Talk for Healthy Central Valley Together Research Workshop
Jonathan A. Eisen, University of California, Davis
January 31, 2024 linktr.ee/jonathaneisen
Talk by Jonathan Eisen for LAMG2022 meeting (Jonathan Eisen)
The document discusses the history of the Lake Arrowhead Microbial Genomes (LAMG) conference. It reveals that LAMG2020 was cancelled due to a secret plan by organizers who formed an "anti-karyote society" that hates eukaryotes. The meeting was to be renamed the "Big, Large, Enormous" meeting of the Lake Arrowhead Big Large Enormous Anti-Karyote Society. The document also hints that several past LAMG speakers have made cryptic comments indicating involvement in a conspiracy surrounding the conference.
Thoughts on UC Davis' COVID Current Actions (Jonathan Eisen)
Slides I used for a presentation to Chancellor May's leadership council about the current state of UC Davis' response to COVID and how it could be improved
Phylogenetic and Phylogenomic Approaches to the Study of Microbes and Microbi... (Jonathan Eisen)
The document discusses Jonathan Eisen's work as a microbiology professor at UC Davis. It provides an overview of his research topics, which include microbial phylogenomics and evolvability, phylogenetic methods and tools, and using phylogenomics to study microbial communities and interactions between microbes and hosts under stress. The document also acknowledges collaborators and funding sources for Eisen's research over the years.
This document summarizes a class on detecting, quantifying, and tracking variations of SARS-CoV-2 RNA from COVID-19 samples. It discusses using quantitative RT-PCR (qRT-PCR) to detect and measure viral RNA levels in samples. Sequencing is used to identify variations in the viral genome over time, and online tools like Nextstrain allow viewing the evolution and global transmission of variants. Genotyping assays are also described that can rapidly screen samples for known single nucleotide variations during PCR.
The document discusses the benefits of exercise for mental health. Regular physical activity can help reduce anxiety and depression and improve mood and cognitive functioning. Exercise causes chemical changes in the brain that may help protect against mental illness and improve symptoms.
EVE198 Winter2020 Class 8 - COVID RNA Detection (Jonathan Eisen)
This document summarizes a class on SARS-CoV-2 RNA detection, quantification, and variation. It discusses how qRT-PCR is used to detect and quantify the virus by amplifying and detecting viral RNA. It also covers sequencing to identify variants, how variants evolve over time, and genotyping assays that can screen samples for known single nucleotide variations. Nextstrain and other online tools are presented that use sequencing data to analyze viral phylogenies, track variant distributions globally, and visualize genetic variations across the SARS-CoV-2 genome.
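The quantification idea behind qRT-PCR can be sketched with cycle threshold (Ct) values. If each cycle multiplies the product by (1 + efficiency), then a sample that crosses the detection threshold ΔCt cycles earlier started with about (1 + efficiency)^ΔCt times more template. The Ct values below are hypothetical. Perfect doubling (efficiency = 1.0) is an idealising assumption; real assays rely on standard curves and measured efficiencies.

```python
def fold_difference(ct_a: float, ct_b: float, efficiency: float = 1.0) -> float:
    """Relative starting template of sample A versus sample B from Ct values.

    Each cycle multiplies the product by (1 + efficiency); efficiency=1.0
    means perfect doubling, so a lower Ct implies more starting template.
    """
    return (1.0 + efficiency) ** (ct_b - ct_a)


# Hypothetical Ct values: sample A crosses the threshold 3 cycles before sample B.
print(fold_difference(22.0, 25.0))  # 8.0
```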
The document discusses the benefits of exercise for mental health. Regular physical activity can help reduce anxiety and depression and improve mood and cognitive function. Exercise causes chemical changes in the brain that may help protect against mental illness and improve symptoms for those who already suffer from conditions like depression and anxiety.
EVE198 Winter2020 Class 5 - COVID Vaccines (Jonathan Eisen)
The document discusses a class on COVID-19 vaccines. It covers topics like vaccine development, current candidates, delivery challenges, and comparisons between vaccines. Moderna and Pfizer mRNA vaccines are highlighted as being similar but having some differences in mRNA region, nanoparticle structure/synthesis, dosage amount, and storage temperature requirements. Other vaccines discussed include Novavax using spike protein nanoparticles, and AstraZeneca and Johnson & Johnson using DNA for spike protein delivered by a modified virus.
EVE198 Winter2020 Class 9 - COVID Transmission (Jonathan Eisen)
This document discusses modes of SARS-CoV-2 transmission including droplets, aerosols, and surfaces. It emphasizes that surfaces are not as big a risk as initially thought. It provides guidance on limiting transmission from different modes such as distancing, masks, washing hands, cleaning surfaces, and improving ventilation. The focus in 2021 is on droplets and aerosols rather than surfaces.
EVE198 Fall2020 "Covid Mass Testing" Class 8 Vaccines (Jonathan Eisen)
This document discusses a class on vaccines for COVID-19. It covers topics like vaccine development, current candidate vaccines, challenges with vaccine distribution, and how vaccines are being assessed for safety, effectiveness, costs and production feasibility. Over 100 vaccine candidates are in development using platforms like DNA, RNA, viral vectors and inactivated viruses. Efforts like Operation Warp Speed are coordinating development of nucleic acid, viral vector and protein subunit vaccines. Distribution challenges include vaccine production, storage and logistics, number of doses required, and overcoming vaccine nationalism and hesitancy.
EVE198 Fall2020 "Covid Mass Testing" Class 2: Viruses, COVID and Testing (Jonathan Eisen)
EVE198 Fall2020 "Covid Mass Testing" Class 1 Introduction (Jonathan Eisen)
This presentation gives a brief overview of the structural and functional attributes of nucleotides and of the structure and function of genetic material, along with the effects of UV rays and pH on them.
ESR spectroscopy in liquid food and beverages.pptx (PRIYANKA PATEL)
With a growing population, people increasingly rely on packaged foodstuffs, and packaging requires that food be preserved. There are various methods for treating food to preserve it, and irradiation is one of them. It is the most common and least harmful method of food preservation, as it does not alter the essential micronutrients in food. Although irradiated food does not harm human health, its quality still needs to be assessed so that consumers receive the necessary information about it. ESR spectroscopy is the most sophisticated way to investigate food quality and the free radicals induced during food processing. The ESR spin-trapping technique is useful for detecting highly unstable radicals in food, and it is the main method used to assess the antioxidant capacity of liquid foods and beverages.
ANOMALOUS SECONDARY GROWTH IN DICOT ROOTS.pptx (RASHMI M G)
This presentation covers abnormal or anomalous secondary growth in plants. It defines secondary growth as an increase in plant girth due to the vascular cambium or cork cambium. Anomalous secondary growth does not follow the normal pattern of a single vascular cambium producing xylem internally and phloem externally.
BREEDING METHODS FOR DISEASE RESISTANCE.pptx (RASHMI M G)
Plant breeding for disease resistance is a strategy for reducing crop losses caused by disease. Plants have an innate immune system that allows them to recognize pathogens and resist them. However, breeding for long-lasting resistance often involves combining multiple resistance genes.
Nucleophilic Addition of carbonyl compounds.pptx (SSR02)
Nucleophilic addition is the most important reaction of carbonyls. Not just aldehydes and ketones, but also carboxylic acid derivatives in general.
Carbonyls undergo addition reactions with a large range of nucleophiles.
Comparing the relative basicity of the nucleophile and the product is extremely helpful in determining how reversible the addition reaction is. Reactions with Grignards and hydrides are irreversible. Reactions with weak bases like halides and carboxylates generally don’t happen.
Electronic effects (inductive effects, electron donation) have a large impact on reactivity.
Large groups adjacent to the carbonyl will slow the rate of reaction.
Neutral nucleophiles can also add to carbonyls, although their additions are generally slower and more reversible. Acid catalysis is sometimes employed to increase the rate of addition.
The use of Nauplii and metanauplii artemia in aquaculture (brine shrimp).pptx (MAGOTI ERNEST)
Although Artemia has been known to man for centuries, its use as a food for the culture of larval organisms apparently began only in the 1930s, when several investigators found that it made an excellent food for newly hatched fish larvae (Litvinenko et al., 2023). As aquaculture developed in the 1960s and ’70s, the use of Artemia also became more widespread, due both to its convenience and to its nutritional value for larval organisms (Arenas-Pardo et al., 2024). Artemia dormant cysts can be stored for long periods in cans and then used as an off-the-shelf food requiring only 24 h of incubation. This makes them the most convenient and least labor-intensive live food available for aquaculture (Sorgeloos & Roubach, 2021). The nutritional value of Artemia, especially for marine organisms, is not constant but varies both geographically and temporally. During the last decade, however, both the causes of this variability and methods to improve poor-quality Artemia have been identified (Loufi et al., 2024).
Brine shrimp (Artemia spp.) are used in marine aquaculture worldwide. Annually, more than 2,000 metric tons of dry cysts are used for the cultivation of fish, crustacean, and shellfish larvae. Brine shrimp are important to aquaculture because newly hatched brine shrimp nauplii (larvae) provide a food source for many fish fry (Mozanzadeh et al., 2021). The culture and harvesting of brine shrimp eggs represent another aspect of the aquaculture industry. Nauplii and metanauplii of Artemia, commonly known as brine shrimp, play a crucial role in aquaculture because of their nutritional value and their suitability as live feed for many aquatic species, particularly in larval stages (Sorgeloos & Roubach, 2021).
The debris of the ‘last major merger’ is dynamically young (Sérgio Sacani)
The Milky Way’s (MW) inner stellar halo contains an [Fe/H]-rich component with highly eccentric orbits, often referred to as the ‘last major merger.’ Hypotheses for the origin of this component include Gaia-Sausage/Enceladus (GSE), where the progenitor collided with the MW proto-disc 8–11 Gyr ago, and the Virgo Radial Merger (VRM), where the progenitor collided with the MW disc within the last 3 Gyr. These two scenarios make different predictions about observable structure in local phase space, because the morphology of debris depends on how long it has had to phase mix. The recently identified phase-space folds in Gaia DR3 have positive caustic velocities, which makes them fundamentally different from the phase-mixed chevrons found in simulations at late times. Roughly 20 per cent of the stars in the prograde local stellar halo are associated with the observed caustics. Based on a simple phase-mixing model, the observed number of caustics is consistent with a merger that occurred 1–2 Gyr ago. We also compare the observed phase-space distribution to FIRE-2 Latte simulations of GSE-like mergers, using a quantitative measurement of phase mixing (2D causticality). The observed local phase-space distribution best matches the simulated data 1–2 Gyr after collision, and certainly not later than 3 Gyr. This is further evidence that the progenitor of the ‘last major merger’ did not collide with the MW proto-disc at early times, as is thought for the GSE, but instead collided with the MW disc within the last few Gyr, consistent with the body of work surrounding the VRM.
Comparing Evolved Extractive Text Summary Scores of Bidirectional Encoder Rep... (University of Maribor)
Slides from:
11th International Conference on Electrical, Electronics and Computer Engineering (IcETRAN), Niš, 3-6 June 2024
Track: Artificial Intelligence
https://www.etran.rs/2024/en/home-english/
When I was asked to give a companion lecture in support of ‘The Philosophy of Science’ (https://shorturl.at/4pUXz), I decided not to walk through the many methodologies in detail, in order of use. Instead, I chose to employ a long-standing, and ongoing, scientific development as an exemplar: the ever-evolving story of Thermodynamics as scientific investigation at its best.
Conducted over a period of more than 200 years, Thermodynamics R&D, and its application, benefitted from the highest levels of professionalism, collaboration, and technical thoroughness. The progressive advance of technology made new layers of application, methodology, and practice possible. In turn, measurement and modelling accuracy have continually improved at both micro and macro levels.
Perhaps most importantly, Thermodynamics rapidly became a primary tool in the advance of applied science, engineering and technology, spanning micro-tech to aerospace and cosmology. I can think of no better story to illustrate the breadth of scientific methodologies and applications at their best.
This PowerPoint presentation, generated from an MS Word document, covers the major details of the micronucleus test, including its significance and the assays used to conduct it. The test detects micronucleus formation in the cells of nearly every multicellular organism. Micronuclei form during chromosome separation at anaphase.
Deep Behavioral Phenotyping in Systems Neuroscience for Functional Atlasing a... (Ana Luísa Pinho)
Functional Magnetic Resonance Imaging (fMRI) provides a means to characterize brain activations in response to behavior. However, cognitive neuroscience has been limited to group-level effects related to the performance of specific tasks. Obtaining the functional profile of elementary cognitive mechanisms requires combining brain responses to many tasks. Yet, to date, neither structural atlases nor parcellation-based activations fully account for cognitive function, and both still have several limitations. Moreover, they generally do not adapt to individual characteristics. In this talk, I will give an account of deep-behavioral phenotyping strategies, namely data-driven methods applied to large task-fMRI datasets, that optimize functional brain-data collection and improve inference of effects of interest related to mental processes. Key to this approach is the use of fast multi-functional paradigms that are rich in features that can be well parametrized. These paradigms consequently facilitate the creation of psycho-physiological constructs to be modelled with imaging data. Particular emphasis will be given to music stimuli for studying high-order cognitive mechanisms, because of their ecological nature and their capacity to elicit complex behavior composed of discrete entities. I will also discuss how deep-behavioral phenotyping and individualized models applied to neuroimaging data can better account for the subject-specific organization of domain-general cognitive systems in the human brain. Finally, the accumulation of functional brain signatures makes it possible to clarify relationships among tasks and to create a univocal link between brain systems and mental functions through (1) the development of ontologies proposing an organization of cognitive processes and (2) brain-network taxonomies describing functional specialization.
To this end, tools to improve commensurability in cognitive science are necessary, such as public repositories, ontology-based platforms and automated meta-analysis tools. I will thus discuss some brain-atlasing resources currently under development, and their applicability in cognitive as well as clinical neuroscience.
Deep Behavioral Phenotyping in Systems Neuroscience for Functional Atlasing a...
Talk by J. Eisen for NZ Computational Genomics meeting
1. Phylogeny-driven approaches to the study of microbial diversity
September 3, 2015
Queenstown Computational Genomics Conference
Jonathan A. Eisen (@phylogenomics)
University of California, Davis
3. microBIOME or microbiOME
• microbi-OME
  • collection of genomes of microbes from a community (emphasis on OME)
• micro-BIOME
  • a community of microbes (emphasis on BIOME)
• see http://tinyurl.com/definemicrobiome
14. Woese: Classification of Cultured Taxa by rRNA
Isolate Ribosomes → rRNA
Taxa Characters
S ACUGCACCUAUCGUUCG
R ACUCCACCUAUCGUUCG
E ACUCCAGCUAUCGAUCG
F ACUCCAGGUAUCGAUCG
C ACCCCAGCUCUCGCUCG
W ACCCCAGCUCUGGCUCG
Taxa Characters
S ACUGCACCUAUCGUUCG
E ACUCCAGCUAUCGAUCG
C ACCCCAGCUCUCGCUCG
Eukaryotes, Bacteria, ????? / Archaebacteria / Archaea
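The comparison on this slide can be illustrated by counting the positions at which each pair of aligned sequences differs. This is a toy sketch only, not Woese's actual procedure (real analyses use evolutionary models and tree-building methods); the sequences are copied from the slide, and the function names are hypothetical.

```python
from itertools import combinations

# Aligned rRNA sequences copied from the slide.
ALIGNMENT = {
    "S": "ACUGCACCUAUCGUUCG",
    "R": "ACUCCACCUAUCGUUCG",
    "E": "ACUCCAGCUAUCGAUCG",
    "F": "ACUCCAGGUAUCGAUCG",
    "C": "ACCCCAGCUCUCGCUCG",
    "W": "ACCCCAGCUCUGGCUCG",
}


def hamming(a: str, b: str) -> int:
    """Number of aligned positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b))


def distance_matrix(seqs: dict[str, str]) -> dict[tuple[str, str], int]:
    """Symmetric matrix of pairwise differences, keyed by (taxon, taxon)."""
    dist = {}
    for a, b in combinations(seqs, 2):
        d = hamming(seqs[a], seqs[b])
        dist[(a, b)] = dist[(b, a)] = d
    return dist


def nearest_neighbors(seqs: dict[str, str]) -> dict[str, str]:
    """Map each taxon to the other taxon with the fewest differences."""
    dist = distance_matrix(seqs)
    return {
        a: min((b for b in seqs if b != a), key=lambda b: dist[(a, b)])
        for a in seqs
    }


if __name__ == "__main__":
    print(nearest_neighbors(ALIGNMENT))
```

On the slide's six sequences, each taxon's closest relative is its partner in one of three pairs (S/R, E/F, C/W), each pair differing at a single position; that three-way split is the kind of grouping the slide's tree summarizes.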
15. Woese: Classification of Cultured Taxa by rRNA PCR
Isolate DNA → PCR → rRNA
Taxa Characters: same alignment as slide 14 (S, R, E, F, C, W; subset S, E, C)
Eukaryotes, Bacteria, Archaea
21. Approaching NGS
1953: Discovery of DNA structure (Cold Spring Harb. Symp. Quant. Biol. 1953;18:123-31)
1977: Sanger sequencing method by F. Sanger (PNAS, 1977, 74: 5463-5467)
1983: PCR by K. Mullis (Cold Spring Harb Symp Quant Biol. 1986;51 Pt 1:263-73)
1993: Development of pyrosequencing (Anal. Biochem., 1993, 208: 171-175; Science, 1998, 281: 363-365)
1998: Single molecule emulsion PCR
1998: Solexa founded
2000: 454 Life Sciences founded
2001: Human Genome Project (Nature, 2001, 409: 860–92; Science, 2001, 291: 1304–1351)
2005: 454 GS20 sequencer (first NGS sequencer)
2006: Solexa Genome Analyzer (first short-read NGS sequencer)
2006: Illumina acquires Solexa (Illumina enters the NGS business)
2007: ABI SOLiD (short-read sequencer based upon ligation)
2007: Roche acquires 454 Life Sciences (Roche enters the NGS business)
2008: GS FLX sequencer (NGS with 400-500 bp read length)
2008: NGS Human Genome sequencing (first human genome sequencing based upon NGS technology)
2010: HiSeq 2000 (200 Gbp per flow cell)
From Slideshare presentation of Cosentino Cristian
http://www.slideshare.net/cosentia/high-throughput-equencing
MiSeq, Roche Jr, Ion Torrent, PacBio, Oxford
Automation is Critical
AAATCGCTAGCGC
CGGCGAGCTAGC
CGAGCGATCGAGC
CGAGCATCGAGTA
22. STAP (for rRNA)
An Automated Phylogenetic Tree-Based Small Subunit rRNA Taxonomy and Alignment Pipeline (STAP)
Dongying Wu1*, Amber Hartman1,6, Naomi Ward4,5, Jonathan A. Eisen1,2,3
1 UC Davis Genome Center, University of California Davis, Davis, California, United States of America
2 Section of Evolution and Ecology, College of Biological Sciences, University of California Davis, Davis, California, United States of America
3 Department of Medical Microbiology and Immunology, School of Medicine, University of California Davis, Davis, California, United States of America
4 Department of Molecular Biology, University of Wyoming, Laramie, Wyoming, United States of America
5 Center of Marine Biotechnology, Baltimore, Maryland, United States of America
6 The Johns Hopkins University, Department of Biology, Baltimore, Maryland, United States of America
Abstract
Comparative analysis of small-subunit ribosomal RNA (ss-rRNA) gene sequences forms the basis for much of what we know about the phylogenetic diversity of both cultured and uncultured microorganisms. As sequencing costs continue to decline and throughput increases, sequences of ss-rRNA genes are being obtained at an ever-increasing rate. This increasing flow of data has opened many new windows into microbial diversity and evolution, and at the same time has created significant methodological challenges. Those processes which commonly require time-consuming human intervention, such as the preparation of multiple sequence alignments, simply cannot keep up with the flood of incoming data. Fully automated methods of analysis are needed. Notably, existing automated methods avoid one or more steps that, though computationally costly or difficult, we consider to be important. In particular, we regard both the building of multiple sequence alignments and the performance of high quality phylogenetic analysis to be necessary. We describe here our fully-automated ss-rRNA taxonomy and alignment pipeline (STAP). It generates both high-quality multiple sequence alignments and phylogenetic trees, and thus can be used for multiple purposes including phylogenetically-based taxonomic assignments and analysis of species diversity in environmental samples. The pipeline combines publicly-available packages (PHYML, BLASTN and CLUSTALW) with our automatic alignment, masking, and tree-parsing programs. Most importantly, this automated process yields results comparable to those achievable by manual analysis, yet offers speed and capacity that are unattainable by manual efforts.
Citation: Wu D, Hartman A, Ward N, Eisen JA (2008) An Automated Phylogenetic Tree-Based Small Subunit rRNA Taxonomy and Alignment Pipeline (STAP). PLoS ONE 3(7): e2566. doi:10.1371/journal.pone.0002566
multiple alignment and phylogeny was deemed unfeasible. However, we believe this can compromise the value of the results. For example, the delineation of OTUs has also been automated via tools that do not make use of alignments or phylogenetic trees (e.g., Greengenes). This is usually done by carrying out pairwise comparisons of sequences and then clustering sequences that have better than some cutoff threshold of similarity with each other. This approach can be powerful (and reasonably efficient) but it too has limitations. In particular, since multiple sequence alignments are not used, one cannot carry out standard phylogenetic analyses. In addition, without multiple sequence alignments one might end up comparing and contrasting different regions of a sequence depending on what it is paired with.
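A minimal sketch of threshold-based OTU delineation, assuming the input sequences are already aligned to a common length (which sidesteps the different-regions problem just described). The greedy, seed-based strategy shown is one common way to apply a cutoff; it is not presented as the method Greengenes uses, and the function names are hypothetical.

```python
def percent_identity(a: str, b: str) -> float:
    """Fraction of aligned columns that match, ignoring columns gapped in both."""
    pairs = [(x, y) for x, y in zip(a, b) if not (x == "-" and y == "-")]
    if not pairs:
        return 0.0
    return sum(x == y for x, y in pairs) / len(pairs)


def greedy_otus(seqs: dict[str, str], cutoff: float = 0.97) -> list[list[str]]:
    """Group sequence IDs into OTUs.

    Each sequence joins the first existing OTU whose seed (first member) it
    matches at or above the identity cutoff; otherwise it seeds a new OTU.
    Input order therefore determines which sequences become seeds.
    """
    otus: list[list[str]] = []
    for name, seq in seqs.items():
        for otu in otus:
            if percent_identity(seq, seqs[otu[0]]) >= cutoff:
                otu.append(name)
                break
        else:  # no seed was similar enough
            otus.append([name])
    return otus
```

With a cutoff of 0.94, the six 17-column sequences from the Woese slide fall into three two-member OTUs (each pair differs at one position, about 94% identity); at 0.97 every sequence becomes its own OTU. Because seeds are taken in input order, reordering the input can change the result.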
The limitations of avoiding multiple sequence alignments and phylogenetic analysis are readily apparent in tools to classify sequences. For example, the Ribosomal Database Project’s Classifier program [29] focuses on composition characteristics of each sequence (e.g., oligonucleotide frequency) and assigns taxonomy based upon clustering genes by their composition. Though this is fast and completely automatable, it can be misled in cases where distantly related sequences have converged on similar composition, something known to be a major problem in ss-rRNA sequences [30]. Other taxonomy assignment systems focus primarily on the similarity of sequences. The simplest of these is
classification tools it does have some limitations. For example, the generation of new alignments for each sequence is both computationally costly and does not take advantage of available curated alignments that make use of ss-rRNA secondary structure to guide the primary sequence alignment. Perhaps most importantly, however, the tool is not fully automated. In addition, it does not generate multiple sequence alignments for all sequences in a dataset, which would be necessary for doing many analyses.
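Composition-based classification, as described above for the RDP Classifier, can be sketched by comparing k-mer (oligonucleotide) count profiles. This is a simplified stand-in: the RDP Classifier itself uses a naive Bayesian model over 8-nucleotide words, whereas this sketch just assigns a query to the reference whose k-mer composition is most similar by cosine similarity. The reference labels and function names are hypothetical.

```python
from collections import Counter
from math import sqrt


def kmer_profile(seq: str, k: int = 3) -> Counter:
    """Count overlapping k-mers (oligonucleotides), ignoring alignment gaps."""
    seq = seq.upper().replace("-", "")
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def cosine(p: Counter, q: Counter) -> float:
    """Cosine similarity between two k-mer count vectors."""
    dot = sum(p[w] * q[w] for w in p.keys() & q.keys())
    norm = sqrt(sum(v * v for v in p.values())) * sqrt(sum(v * v for v in q.values()))
    return dot / norm if norm else 0.0


def classify_by_composition(query: str, references: dict[str, str], k: int = 3) -> str:
    """Label of the reference whose k-mer composition is most similar to the query."""
    qp = kmer_profile(query, k)
    return max(references, key=lambda label: cosine(qp, kmer_profile(references[label], k)))
```

The weakness noted in the text is easy to reproduce: with k = 1, cosine(kmer_profile("AACCGGTT", 1), kmer_profile("TTGGCCAA", 1)) is 1.0 even though the two sequences differ at every position, because only composition, not order or homology, is compared.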
Automated methods for analyzing rRNA sequences are also available at the web sites for multiple rRNA-centric databases, such as Greengenes and the Ribosomal Database Project (RDPII). Though these and other web sites offer diverse powerful tools, they do have some limitations. For example, not all provide multiple sequence alignments as output and few use phylogenetic approaches for taxonomy assignments or other analyses. More importantly, all provide only web-based interfaces and their integrated software (e.g., alignment and taxonomy assignment) cannot be locally installed by the user. Therefore, the user cannot take advantage of the speed and computing power of parallel processing such as is available on Linux clusters, or locally alter and potentially tailor these programs to their individual computing needs (Table 1).
Given the limited automated tools that are available for
Table 1. Comparison of STAP’s computational abilities relative to existing commonly used ss-rRNA analysis tools.
Feature                                  STAP          ARB       Greengenes   RDP
Installed where?                         Locally       Locally   Web only     Web only
User interface                           Command line  GUI       Web portal   Web portal
Parallel processing                      YES           NO        NO           NO
Manual curation for taxonomy assignment  NO            YES       NO           NO
Manual curation for alignment            NO            YES       NO*          NO
Open source                              YES**         NO        NO           NO
Processing speed                         Fast          Slow      Medium       Medium
It is important to note that STAP is the only software that runs on the command line and can take advantage of parallel processing on Linux clusters and, further, is more amenable to downstream code manipulation.
* Note: Greengenes alignment output is compatible with upload into ARB and downstream manual alignment.
** The STAP program itself is open source; the programs it depends on are freely available but not open source.
doi:10.1371/journal.pone.0002566.t001
ss-rRNA Taxonomy Pipeline
Figure 1. A flow chart of the STAP pipeline.
doi:10.1371/journal.pone.0002566.g001
STAP database, and the query sequence is aligned to them using the CLUSTALW profile alignment algorithm [40] as described above for domain assignment. By adapting the profile alignment algorithm, the alignments from the STAP database remain intact, while gaps are inserted and nucleotides are trimmed for the query sequence according to the profile defined by the previous alignments from the databases. Thus the accuracy and quality of the alignment generated at this step depend heavily on the quality of the Bacterial/Archaeal ss-rRNA alignments from the Greengenes project or the Eukaryotic ss-rRNA alignments from the RDPII project.
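The key property of this step, that reference rows stay fixed while the query is gapped and trimmed to fit them, can be sketched as a projection. The sketch assumes a profile aligner (CLUSTALW in STAP) has already decided which reference column each query residue occupies; the helper name and its column-list input format are hypothetical, and the alignment algorithm itself is not reimplemented.

```python
from typing import Optional


def project_onto_profile(query: str, columns: list[Optional[int]], profile_len: int) -> str:
    """Place query residues into a fixed-width reference alignment.

    columns[i] is the profile column assigned to query[i], or None when that
    residue is an insertion relative to the profile. Insertions are trimmed and
    unfilled columns become gaps, so the reference rows never change width.
    """
    if len(columns) != len(query):
        raise ValueError("need one column assignment per query residue")
    row = ["-"] * profile_len
    last = -1
    for residue, col in zip(query, columns):
        if col is None:
            continue  # insertion relative to the profile: trimmed
        if not (last < col < profile_len):
            raise ValueError("column assignments must increase and stay within the profile")
        row[col] = residue
        last = col
    return "".join(row)
```

For example, project_onto_profile("ACGTT", [0, 1, None, 3, 4], 6) returns "AC-TT-": the G is an insertion relative to the profile and is trimmed, and the profile columns the query does not reach are filled with gaps.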
Phylogenetic analysis using multiple sequence alignments rests on the assumption that the residues (nucleotides or amino acids) at the same position in every sequence in the alignment are homologous. Thus, columns in the alignment for which “positional homology” cannot be robustly determined must be excluded from subsequent analyses. This process of evaluating homology and eliminating questionable columns, known as masking, typically requires time-consuming, skillful, human intervention. We designed an automated masking method for ss-rRNA alignments, thus eliminating this bottleneck in high-throughput processing.
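An automated mask can be sketched by scoring each column according to how far its residues spread around the column consensus and dropping the most variable columns. In this minimal sketch each nucleotide is a one-hot vector over A, C, G and U (gaps are zero vectors); that identity-style scoring and the 0.5 cutoff are assumptions standing in for the IUB matrix and whatever threshold STAP actually applies, and the function names are hypothetical.

```python
from math import sqrt

# Identity-style scoring over four nucleotides: an assumption standing in for
# the IUB matrix. Gaps and ambiguity codes map to the zero vector.
ALPHABET = "ACGU"


def residue_vector(ch: str) -> list[float]:
    """One-hot vector for a nucleotide (T is treated as U)."""
    ch = ch.upper().replace("T", "U")
    return [1.0 if ch == base else 0.0 for base in ALPHABET]


def column_scores(alignment: list[str]) -> list[float]:
    """Mean Euclidean distance of each residue from its column's consensus.

    0.0 means a perfectly conserved column; larger values mean more variation.
    """
    n = len(alignment)
    dims = range(len(ALPHABET))
    scores = []
    for column in zip(*alignment):
        vectors = [residue_vector(ch) for ch in column]
        consensus = [sum(v[d] for v in vectors) / n for d in dims]
        distances = [sqrt(sum((v[d] - consensus[d]) ** 2 for d in dims)) for v in vectors]
        scores.append(sum(distances) / n)
    return scores


def mask_alignment(alignment: list[str], max_score: float = 0.5) -> list[str]:
    """Keep only the columns whose score is at or below max_score."""
    keep = [i for i, s in enumerate(column_scores(alignment)) if s <= max_score]
    return ["".join(seq[i] for i in keep) for seq in alignment]
```

On the alignment ["ACGU", "ACGA", "AC-C"], the three residues in the last column all differ, so that column scores about 0.82 and is masked, leaving ["ACG", "ACG", "AC-"]. A production mask would usually also penalize gap-rich columns explicitly.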
First, an alignment score is calculated for each aligned column by a method similar to that used in the CLUSTALX package [42]. Specifically, an R-dimensional sequence space representing all the possible nucleotide character states is defined. Then for each aligned column, the nucleotide populating that column in each of the aligned sequences is assigned a score in each of the R dimensions (Sr) according to the IUB matrix [42]. The consensus “nucleotide” for each column (X) also has R dimensions, with the
Figure 2. Domain assignment. In Step 1, STAP assigns a domain to each query sequence based on its position in a maximum likelihood tree of representative ss-rRNA sequences. Because the tree illustrated here is not rooted, domain assignment would not be accurate and
Dongying Wu, Amber Hartman, Naomi Ward
23. Finding Metagenomic OTUs
Figure 1. PhylOTU Workflow. Computational processes are represented as squares and databases are represented as cylinders in this generalized workflow of PhylOTU. See Results section for details.
doi:10.1371/journal.pcbi.1001061.g001
Sharpton TJ, Riesenfeld SJ, Kembel SW, Ladau J, O'Dwyer JP, Green JL, Eisen JA, Pollard KS. (2011) PhylOTU: A High-Throughput Procedure Quantifies Microbial Community Diversity and Resolves Novel Taxa from Metagenomic Data. PLoS Comput Biol 7(1): e1001061. doi:10.1371/journal.pcbi.1001061
PhylOTU
Tom Sharpton
Katie Pollard
Jessica Green
25. rRNA PCR: Community Comparisons
Isolate DNA → PCR → rRNA
Taxa Characters
B1 ACTGCACCTATCGTTCG
B2 ACTCCACCTATCGTTCG
E1 ACTCCAGCTATCGATCG
E2 ACTCCAGGTATCGATCG
A1 ACCCCAGCTCTCGCTCG
A2 ACCCCAGCTCTGGCTCG
New1 ACCCCAGCTCTGCCTCG
New2 ACTGCACCTATCGTTCG
New3 ACCCCAGCTCTCGCTCG
New4 AGGGGAGCTCTCGCTCG
Archaea, Eukaryotes, Bacteria
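Assigning the new environmental sequences to groups can be sketched by finding each one's closest reference sequence and tallying the results. This is a toy nearest-reference (similarity-based) assignment, not a phylogenetic placement; it assumes the prefixes B, E and A stand for Bacteria, Eukaryotes and Archaea, the sequences are copied from the slide, and the function names are hypothetical.

```python
from collections import Counter

# Aligned reference and environmental sequences copied from the slide.
REFERENCES = {
    "B1": "ACTGCACCTATCGTTCG",
    "B2": "ACTCCACCTATCGTTCG",
    "E1": "ACTCCAGCTATCGATCG",
    "E2": "ACTCCAGGTATCGATCG",
    "A1": "ACCCCAGCTCTCGCTCG",
    "A2": "ACCCCAGCTCTGGCTCG",
}
NEW = {
    "New1": "ACCCCAGCTCTGCCTCG",
    "New2": "ACTGCACCTATCGTTCG",
    "New3": "ACCCCAGCTCTCGCTCG",
    "New4": "AGGGGAGCTCTCGCTCG",
}


def hamming(a: str, b: str) -> int:
    """Count aligned positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b))


def closest_reference(query: str, refs: dict[str, str]) -> tuple[str, int]:
    """Return (name, number of differences) for the most similar reference."""
    best = min(refs, key=lambda name: hamming(query, refs[name]))
    return best, hamming(query, refs[best])


def group_counts(queries: dict[str, str], refs: dict[str, str]) -> Counter:
    """Tally queries by the group prefix (first letter) of their closest reference."""
    return Counter(closest_reference(seq, refs)[0][0] for seq in queries.values())


if __name__ == "__main__":
    for name, seq in NEW.items():
        print(name, *closest_reference(seq, REFERENCES))
    print(group_counts(NEW, REFERENCES))
```

Here New2 and New3 match B1 and A1 exactly, New1 is one difference from A2, and New4 is four differences from its closest reference (A1), giving a tally of three A and one B. New4's distance is a reminder that a nearest match can force a divergent, possibly novel sequence into a known group.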
26. rRNA PCR: Community Comparisons
Isolate DNA → PCR → rRNA
Taxa Characters: same alignment as slide 25 (B1, B2, E1, E2, A1, A2, New1, New2, New3, New4)
27. Hartman et al. BMC Bioinformatics 2010, 11:317
http://www.biomedcentral.com/1471-2105/11/317
Open Access Software
Introducing W.A.T.E.R.S.: a Workflow for the Alignment, Taxonomy, and Ecology of Ribosomal Sequences
Amber L Hartman†1,3, Sean Riddle†2, Timothy McPhillips2, Bertram Ludäscher2 and Jonathan A Eisen*1
Abstract
Background: For more than two decades microbiologists have used a highly conserved microbial gene as a phylogenetic marker for bacteria and archaea. The small-subunit ribosomal RNA gene, also known as 16S rRNA, is encoded by ribosomal DNA, 16S rDNA, and has provided a powerful comparative tool to microbial ecologists. Over time, the microbial ecology field has matured from small-scale studies in a select number of environments to massive collections of sequence data that are paired with dozens of corresponding collection variables. As the complexity of data and tool sets has grown, the need for flexible automation and maintenance of the core processes of 16S rDNA sequence analysis has increased correspondingly.
Results: We present WATERS, an integrated approach for 16S rDNA analysis that bundles a suite of publicly available 16S rDNA analysis software tools into a single software package. The "toolkit" includes sequence alignment, chimera removal, OTU determination, taxonomy assignment, phylogenetic tree construction as well as a host of ecological analysis and visualization tools. WATERS employs a flexible, collection-oriented 'workflow' approach using the open-source Kepler system as a platform.
Conclusions: By packaging available software tools into a single automated workflow, WATERS simplifies 16S rDNA analyses, especially for those without specialized bioinformatics or programming expertise. In addition, WATERS, like some of the newer comprehensive rRNA analysis tools, allows researchers to minimize the time dedicated to carrying out tedious informatics steps and to focus their attention instead on the biological interpretation of the results. One advantage of WATERS over other comprehensive tools is that the use of the Kepler workflow system facilitates result interpretation and reproducibility via a data provenance sub-system. Furthermore, new "actors" can be added to the workflow as desired and we see WATERS as an initial seed for a sizeable and growing repository of interoperable, easy-to-combine tools for asking increasingly complex microbial ecology questions.
Background
Microbial communities and how they are surveyed
Microbial communities abound in nature and are crucial for the success and diversity of ecosystems. There is no end in sight to the number of biological questions that can be asked about microbial diversity on earth. From animal and human guts to open ocean surfaces and deep sea hydrothermal vents, to anaerobic mud swamps or boiling thermal pools, to the tops of the rainforest canopy and the frozen Antarctic tundra, the composition of microbial communities is a source of natural history, intellectual curiosity, and reservoir of environmental health [1]. Microbial communities are also mediators of insight into global warming processes [2,3], agricultural success [4], pathogenicity [5,6], and even human obesity [7,8].
In the mid-1980s, researchers began to sequence ribosomal RNAs from environmental samples in order to characterize the types of microbes present in those samples (e.g., [9,10]). This general approach was revolutionized by the invention of the polymerase chain reaction (PCR), which made it relatively easy to clone and then
* Correspondence: jaeisen@ucdavis.edu
1 Department of Medical Microbiology and Immunology and the Department
of Evolution and Ecology, Genome Center, University of California Davis, One
Shields Avenue, Davis, CA, 95616, USA
† Contributed equally
Full list of author information is available at the end of the article
WATERS - Kepler Workflow for rRNA
Figure 1 Overview of WATERS. Schema of WATERS where white boxes indicate "behind the scenes" analyses that are performed in WA-
Figure boxes: Align, Check chimeras, Cluster, Build Tree, Assign Taxonomy, Tree w/ Taxonomy, Diversity statistics & graphs, Unifrac files, Cytoscape network, OTU table
Motivations
As outlined above, successfully processing microbial sequence collections is far from trivial. Each step is complex and usually requires significant bioinformatics expertise and time investment prior to the biological interpretation. In order to both increase efficiency and ensure that all best-practice tools are easily usable, we sought to create an "all-inclusive" method for performing all of these bioinformatics steps together in one package. To this end, we have built an automated, user-friendly, workflow-based system called WATERS: a Workflow for the Alignment, Taxonomy, and Ecology of Ribosomal Sequences (Fig. 1). In addition to being automated and simple to use, because WATERS is executed in the Kepler scientific workflow system (Fig. 2) it also has the advantage that it keeps track of the data lineage and provenance of data products [23,24].
Automation
The primary motivation in building WATERS was to minimize the technical, bioinformatics challenges that arise when performing DNA sequence clustering, phylogenetic tree, and statistical analyses by automating the 16S rDNA analysis workflow. We also hoped to exploit additional features that workflow-based approaches entail, such as optimized execution and data lineage tracking and browsing [23,25-27]. In the earlier days of 16S rDNA analysis, simply knowing which microbes were present and whether they were biologically novel was a noteworthy achievement. It was reasonable and expected, therefore, to invest a large amount of time and effort to get to that list of microbes. But now that current efforts are significantly more advanced and often require comparison of dozens of factors and variables with datasets of thousands of sequences, it is not practically feasible to process these large collections "by hand", and doing so is hugely inefficient when automated methods can be successfully employed instead.
Broadening the user base
A second motivation and perspective is that by minimizing the technical difficulty of 16S rDNA analysis through the use of WATERS, we aim to make the analysis of these datasets more widely available and allow individuals with
Figure 2 Screenshot of WATERS in Kepler software. Key features: the library of actors un-collapsed and displayed on the left-hand side, the input and output paths where the user declares the location of their input files and desired location for the results files. Each green box is an individual Kepler actor that performs a single action on the data stream. The connectors (black arrows) direct and hook up the actors in a defined sequence. Double-clicking on any actor or connector allows it to be manipulated and re-arranged.
default is 97% and 99%), and they are also generated for every metadata variable comparison that the user includes.
Data pruning
To assist in troubleshooting and quality control, WATERS returns to the user three fasta files of sequenc
Figure 3 Biologically similar results automatically produced by WATERS on published colonic microbiota samples. (A) Rarefaction curves similar to curves shown in Eckburg et al. Fig. 2; 70-72 indicate patient numbers, i.e., 3 different individuals. (B) Weighted Unifrac analysis based on phylogenetic tree and OTU data produced by WATERS, very similar to Eckburg et al. Fig. 3B. (C) Neighbor-joining phylogenetic tree (Quicktree) representing the sequences analyzed by WATERS, which is clearly similar to Fig. S1 in Eckburg et al.
[Figure 3 panels: (A) rarefaction curves; (B) PCA of samples labeled A to F and S, axes showing percent variation explained; (C) neighbor-joining tree with clades labeled Bacteroidetes, Bacteroidales, Deltaproteobacteria, Actinobacteria, Verrucomicrobia, Epsilonproteobacteria, Firmicutes, Clostridia, Clostridiales, Gammaproteobacteria, Cyanobacteria, Alphaproteobacteria, Fusobacteria, Bacilli and Mollicutes.]
Amber Hartman
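Rarefaction curves like those in Figure 3A show how many OTUs would be expected had fewer reads been sampled. A minimal sketch using the standard analytic rarefaction formula, E[S_n] = Σ_i (1 − C(N − N_i, n) / C(N, n)), where N_i is the number of reads in OTU i and N the total, follows; it is not WATERS' own implementation, and the function names are hypothetical.

```python
from math import comb


def expected_richness(counts: list[int], n: int) -> float:
    """Expected number of OTUs seen in a random subsample of n reads
    (drawn without replacement) from OTUs with the given read counts."""
    total = sum(counts)
    if not 0 <= n <= total:
        raise ValueError("subsample size must be between 0 and the total read count")
    denom = comb(total, n)
    # An OTU is missed only if all n reads come from the other total - c reads.
    return sum(1 - comb(total - c, n) / denom for c in counts if c > 0)


def rarefaction_curve(counts: list[int], step: int = 1) -> list[tuple[int, float]]:
    """(subsample size, expected OTUs) pairs from 0 reads up to the full sample."""
    total = sum(counts)
    return [(n, expected_richness(counts, n)) for n in range(0, total + 1, step)]
```

For OTU counts [5, 3, 2], expected richness is 1.0 OTU at one read and rises to all 3 OTUs at the full ten reads; a curve that is still climbing at the full sample size suggests more sequencing would reveal more OTUs.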
28. Tree from Woese (1987) Microbiological Reviews 51:221
rRNA Not Perfect
Nothing is Perfect
29. rRNA Phylogeny Copy # Correction
Kembel SW, Wu M, Eisen JA, Green JL (2012) Incorporating 16S Gene Copy Number Information Improves Estimates of Microbial Diversity and Abundance. PLoS Comput Biol 8(10): e1002743. doi:10.1371/journal.pcbi.1002743
Steven Kembel, Jessica Green, Martin Wu
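Because genomes carry different numbers of 16S rRNA gene copies, raw read counts over-represent taxa with many copies. A minimal sketch of the correction divides each taxon's read count by its copy number and renormalizes. Kembel et al. predict copy numbers from the phylogeny; here the copy numbers are simply supplied as hypothetical inputs, and the function name is hypothetical.

```python
def copy_number_corrected(read_counts: dict[str, int],
                          copy_numbers: dict[str, float]) -> dict[str, float]:
    """Relative abundances after dividing each taxon's 16S read count by its
    (assumed known) 16S gene copy number."""
    cells = {taxon: n / copy_numbers[taxon] for taxon, n in read_counts.items()}
    total = sum(cells.values())
    if total == 0:
        raise ValueError("no reads to normalize")
    return {taxon: v / total for taxon, v in cells.items()}
```

Two taxa with 100 reads each but 1 and 4 gene copies end up at 80% and 20% relative abundance, instead of the 50/50 split the raw counts suggest.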
30. Tree Complications 1
Isolate Ribosomes → rRNA
Taxa Characters: same alignment as slide 14 (S, R, E, F, C, W; subset S, E, C)
Euks, Bacteria, Arch (Arch labeled twice)
31. Tree Complications 2
Isolate Ribosomes → rRNA
Taxa Characters: same alignment as slide 14 (S, R, E, F, C, W; subset S, E, C)
Euks, Bacteria, Arch (Arch labeled twice)
32. Tree Complications 3
Isolate Ribosomes → rRNA
Taxa Characters: same alignment as slide 14 (S, R, E, F, C, W; subset S, E, C)
Euks, Bacteria, Arch (Arch labeled twice)
33. Automated Accurate Genome Tree
Lang JM, Darling AE, Eisen JA (2013) Phylogeny of
Bacterial and Archaeal Genomes Using Conserved
Genes: Supertrees and Supermatrices. PLoS ONE
8(4): e62510. doi:10.1371/journal.pone.0062510
Jenna Lang, Aaron Darling
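The supermatrix approach concatenates the alignments of many conserved genes into one combined alignment before a single tree is built. The sketch shows only that concatenation step, padding taxa that lack a gene with gaps; tree building and the supertree alternative compared by Lang et al. are not shown, and the function, gene and taxon names are hypothetical.

```python
def build_supermatrix(gene_alignments: dict[str, dict[str, str]]) -> dict[str, str]:
    """Concatenate per-gene alignments into one supermatrix.

    gene_alignments maps gene name -> {taxon: aligned sequence}. Genes are
    concatenated in sorted order; a taxon missing from a gene gets a run of
    gaps of that gene's width, so every row ends up the same length.
    """
    taxa = sorted({t for aln in gene_alignments.values() for t in aln})
    rows: dict[str, list[str]] = {t: [] for t in taxa}
    for gene in sorted(gene_alignments):
        aln = gene_alignments[gene]
        widths = {len(s) for s in aln.values()}
        if len(widths) != 1:
            raise ValueError(f"alignment for {gene} must be non-empty with equal-length rows")
        width = widths.pop()
        for t in taxa:
            rows[t].append(aln.get(t, "-" * width))
    return {t: "".join(parts) for t, parts in rows.items()}
```

For example, combining {"rpoB": {"t1": "MKV", "t2": "MRV"}, "recA": {"t1": "AL", "t3": "AI"}} yields {"t1": "ALMKV", "t2": "--MRV", "t3": "AI---"}.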
38. Culture Independent “Metagenomics”
DNA
Taxa Characters
B1 ACTGCACCTATCGTTCG
B2 ACTCCACCTATCGTTCG
E1 ACTCCAGCTATCGATCG
E2 ACTCCAGGTATCGATCG
A1 ACCCCAGCTCTCGCTCG
A2 ACCCCAGCTCTGGCTCG
New1 ACCCCAGCTCTGCCTCG
New2 AGGGGAGCTCTGCCTCG
New3 ACTCCAGCTATCGATCG
New4 ACTGCACCTATCGTTCG
RecA
http://genomebiology.com/2008/9/10/R151 Genome Biology 2008, Volume 9, Issue 10, Article R151 Wu and Eisen R151.7
Genome Biology 2008, 9:R151
sequences are not conserved at the nucleotide level [29]. As a result, the nr database does not actually contain many more protein marker sequences that can be used as references than those available from complete genome sequences.
Comparison of phylogeny-based and similarity-based phylotyping
Although our phylogeny-based phylotyping is fully automated, it still requires many more steps than, and is slower than, similarity-based phylotyping methods such as MEGAN [30]. Is it worth the trouble? Similarity-based phylotyping works by searching a query sequence against a reference database such as NCBI nr and deriving taxonomic information from the best matches or 'hits'. When species that are closely related to the query sequence exist in the reference database, similarity-based phylotyping can work well. However, if the reference database is a biased sample or if it contains no closely related species to the query, then the top hits returned could be misleading [31]. Furthermore, similarity-based methods require an arbitrary similarity cut-off value to define the top hits. Because individual bacterial genomes and proteins can evolve at very different rates, a universal cut-off that works under all conditions does not exist. As a result, the final results can be very subjective.
In contrast, our tree-based bracketing algorithm places the query sequence within the context of a phylogenetic tree and only assigns it to a taxonomic level if that level has adequate sampling (see Materials and methods [below] for details of the algorithm). With the well sampled species Prochlorococcus marinus, for example, our method can distinguish closely related organisms and make taxonomic identifications at the species level. Our reanalysis of the Sargasso Sea data placed 672 sequences (3.6% of the total) within a P. marinus clade. On the other hand, for sparsely sampled clades such as Aquifex, assignments will be made only at the phylum level. Thus, our phylogeny-based analysis is less susceptible to data sampling bias than a similarity based approach, and it makes
Figure 3. Major phylotypes identified in Sargasso Sea metagenomic data. The metagenomic data previously obtained from the Sargasso Sea was reanalyzed using AMPHORA and the 31 protein phylogenetic markers. The microbial diversity profiles obtained from individual markers are remarkably consistent. The breakdown of the phylotyping assignments by markers and major taxonomic groups is listed in Additional data file 5.
[Bar chart: relative abundance (0 to 0.8) of Alphaproteobacteria, Betaproteobacteria, Gammaproteobacteria, Deltaproteobacteria, Epsilonproteobacteria, unclassified Proteobacteria, Bacteroidetes, Chlamydiae, Cyanobacteria, Acidobacteria, Thermotogae, Fusobacteria, Actinobacteria, Aquificae, Planctomycetes, Spirochaetes, Firmicutes, Chloroflexi, Chlorobi and unclassified Bacteria, for each of the 31 markers: dnaG, frr, infC, nusA, pgk, pyrG, rplA, rplB, rplC, rplD, rplE, rplF, rplK, rplL, rplM, rplN, rplP, rplS, rplT, rpmA, rpoB, rpsB, rpsC, rpsE, rpsI, rpsJ, rpsK, rpsM, rpsS, smpB, tsf.]
RpoB, Rpl4, rRNA, Hsp70, EFTu
Many other genes better than rRNA
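The idea of assigning a sequence only to the taxonomic level that its phylogenetic neighborhood supports can be sketched as taking the longest taxonomic lineage shared by all reference sequences in the clade that brackets the query. This is a simplification, not AMPHORA's actual bracketing algorithm, and it assumes the bracketing clade has already been identified on a tree; the function name and the example lineages (including the placeholder genus "GenusX") are illustrative.

```python
def shared_lineage(lineages: list[list[str]]) -> list[str]:
    """Longest taxonomic prefix shared by every reference lineage.

    Each lineage is ordered from domain down toward species; the result is the
    most specific assignment all references in the bracketing clade support.
    """
    if not lineages:
        return []
    shared = []
    for ranks in zip(*lineages):  # walk down one rank at a time
        if all(r == ranks[0] for r in ranks):
            shared.append(ranks[0])
        else:
            break
    return shared
```

If every reference in the bracketing clade is Prochlorococcus marinus, the full lineage down to species is returned; if the clade mixes genera within Aquificae, only ["Bacteria", "Aquificae"] comes back, mirroring the phylum-level assignments described for sparsely sampled clades.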
40. Phylotyping w/ Protein Markers
AMPHORA
[Figure: relative abundance of major taxonomic groups in the Sargasso Sea data for each of the 31 AMPHORA protein markers (Wu and Eisen, Genome Biology 2008, 9:R151).]
Martin Wu
41. Phylogenetic ID of Novel Lineages
GOS 1, GOS 2, GOS 3, GOS 4, GOS 5
Dongying Wu
Wu D, Wu M, Halpern A, Rusch DB, Yooseph S, Frazier M, et al. (2011) Stalking the Fourth Domain in Metagenomic Data: Searching for, Discovering, and Interpreting Novel, Deep Branches in Marker Gene Phylogenetic Trees. PLoS ONE 6(3): e18011. doi:10.1371/journal.pone.0018011
42. Phylogenetic Diversity of Metagenomes
typically used as a qualitative measure because duplicate sequences are usually removed from the tree. However, the test may be used in a semiquantitative manner if all clones, even those with identical or near-identical sequences, are included in the tree (13).
Here we describe a quantitative version of UniFrac that we call “weighted UniFrac.” We show that weighted UniFrac behaves similarly to the FST test in situations where both a
FIG. 1. Calculation of the unweighted and the weighted UniFrac measures. Squares and circles represent sequences from two different environments. (a) In unweighted UniFrac, the distance between the circle and square communities is calculated as the fraction of the branch length that has descendants from either the square or the circle environment (black) but not both (gray). (b) In weighted UniFrac, branch lengths are weighted by the relative abundance of sequences in the square and circle communities; square sequences are weighted twice as much as circle sequences because there are twice as many total circle sequences in the data set. The width of branches is proportional to the degree to which each branch is weighted in the calculations, and gray branches have no weight. Branches 1 and 2 have heavy weights since the descendants are biased toward the square and circles, respectively. Branch 3 contributes no value since it has an equal contribution from circle and square sequences after normalization.
Kembel SW, Eisen JA, Pollard KS, Green JL (2011) The Phylogenetic Diversity of Metagenomes. PLoS ONE 6(8): e23214. doi:10.1371/journal.pone.0023214
Jessica Green, Steven Kembel, Katie Pollard
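The unweighted and weighted UniFrac calculations in the figure caption can be sketched on a small rooted tree. This sketch returns the unweighted fraction of observed branch length unique to one community, and the raw (unnormalized) weighted sum u = Σ b_i |A_i/A_T − B_i/B_T|, where b_i is a branch length and A_i, B_i are the numbers of sequences from each community below that branch; the tree encoding (child-to-parent and branch-length dictionaries) and the function names are assumptions.

```python
from collections import defaultdict


def branch_counts(parent: dict[str, str], leaf_counts: dict[str, int]) -> dict[str, int]:
    """Number of a community's sequences below each branch, keyed by the branch's child node."""
    totals: dict[str, int] = defaultdict(int)
    for leaf, count in leaf_counts.items():
        node = leaf
        while node in parent:  # climb to the root, crediting every branch on the way
            totals[node] += count
            node = parent[node]
    return totals


def unifrac(parent: dict[str, str], length: dict[str, float],
            counts_a: dict[str, int], counts_b: dict[str, int]) -> tuple[float, float]:
    """Return (unweighted UniFrac, raw weighted UniFrac) for two communities.

    parent maps each non-root node to its parent; length maps the same nodes to
    the length of the branch above them; counts_a and counts_b map leaves to
    sequence counts in each community.
    """
    below_a = branch_counts(parent, counts_a)
    below_b = branch_counts(parent, counts_b)
    total_a, total_b = sum(counts_a.values()), sum(counts_b.values())
    unique = observed = weighted = 0.0
    for node, b in length.items():
        a_i, b_i = below_a.get(node, 0), below_b.get(node, 0)
        if a_i or b_i:
            observed += b  # branch leads to sequences from at least one community
            if not (a_i and b_i):
                unique += b  # ...but only from one of them
        weighted += b * abs(a_i / total_a - b_i / total_b)
    return unique / observed, weighted
```

For a tree in which leaves L1 and L2 share a parent, L3 hangs directly off the root, all branches have length 1, community A has two sequences at L1 and community B one each at L2 and L3, three of the four branches are unique to one community, giving unweighted UniFrac 0.75 and raw weighted UniFrac 2.5.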
43. Phylosift
Input Sequences: each input sequence is scanned against both workflows (rRNA workflow and protein workflow; parallel option).
• LAST: fast candidate search (search input against references)
• hmmalign: multiple alignment (profile HMMs used to align candidates to reference alignment)
• Infernal: multiple alignment
• Branch labels: <600 bp, >600 bp
• pplacer: phylogenetic placement
Taxonomic Summaries: Krona plots, number of reads placed for each marker gene
Sample Analysis & Comparison: Edge PCA, tree visualization, Bayes factor tests
Aaron Darling (@koadman), Erik Matsen (@ematsen), Holly Bik (@hollybik), Guillaume Jospin (@guillaumejospin), Erik Lowe
Darling AE, Jospin G, Lowe E, Matsen FA IV, Bik HM, Eisen JA. (2014) PhyloSift: phylogenetic analysis of genomes and metagenomes. PeerJ 2:e243 http://dx.doi.org/10.7717/peerj.243
44. Edge PCA: Identify lineages that explain most variation among samples
Edge PCA - Matsen and Evans 2013
Output: Edge PCA
46. PHYLOGENETIC PREDICTION OF GENE FUNCTION
METHOD: CHOOSE GENE(S) OF INTEREST → IDENTIFY HOMOLOGS → ALIGN SEQUENCES → CALCULATE GENE TREE → OVERLAY KNOWN FUNCTIONS ONTO TREE → INFER LIKELY FUNCTION OF GENE(S) OF INTEREST
[Figure panels: EXAMPLE A and EXAMPLE B; genes 1 to 6 and 1A, 2A, 3A, 1B, 2B, 3B from Species 1, Species 2 and Species 3; ACTUAL EVOLUTION (ASSUMED TO BE UNKNOWN); labels "Duplication?", "Duplication" and "Ambiguous".]
Based on Eisen, 1998 Genome Res 8: 163-167.
Phylogenomics
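The final step, inferring a function from the tree, can be sketched with a deliberately simple rule: report the functions of annotated genes in the smallest clade that contains the gene of interest and at least one annotated gene, and treat more than one result as ambiguous. This is a stand-in for illustration, not a full character-reconstruction method; the nested-tuple tree format, gene names and function labels are hypothetical.

```python
from typing import Optional, Union

# A gene tree as nested tuples whose leaves are gene names (hypothetical format).
Tree = Union[str, tuple]


def leaves(tree: Tree) -> list[str]:
    """All leaf (gene) names in a clade."""
    if isinstance(tree, str):
        return [tree]
    return [leaf for child in tree for leaf in leaves(child)]


def path_to(tree: Tree, target: str) -> Optional[list[Tree]]:
    """Clades from the root down to the target leaf (inclusive), or None if absent."""
    if tree == target:
        return [tree]
    if isinstance(tree, str):
        return None
    for child in tree:
        sub = path_to(child, target)
        if sub is not None:
            return [tree] + sub
    return None


def infer_function(tree: Tree, query: str, known: dict[str, str]) -> set[str]:
    """Functions of annotated genes in the smallest clade containing the query
    and at least one annotated gene. More than one function means the
    prediction is ambiguous; an empty set means nothing is annotated."""
    path = path_to(tree, query)
    if path is None:
        raise ValueError(f"{query!r} is not in the tree")
    for clade in reversed(path[:-1]):  # from the query's parent up to the root
        functions = {known[g] for g in leaves(clade) if g in known and g != query}
        if functions:
            return functions
    return set()
```

In a tree where a duplication produced subfamilies A and B, ((("1A", "2A"), "3A"), (("1B", "2B"), "3B")), gene 2B is predicted to share the function of its subfamily B relatives; a query whose nearest annotated clade contains genes with two different functions gets both labels back, flagging an ambiguous prediction.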
47. Overlaying Functions onto Tree
[Figure: MutS family gene tree; leaves include Aquae, Trepa, Rat, Fly, Xenla, Mouse, Human, Yeast, Neucr, Arath, Borbu, Synsp, Neigo, Thema, Strpy, Bacsu, Ecoli, Theaq, Deira, Chltr, Spombe, Celeg, Metth, Helpy and mSaco; subfamilies labeled MSH4, MSH5, MutS2, MutS1, MSH1, MSH3, MSH6 and MSH2.]
Based on Eisen, 1998 Nucl Acids Res 26: 4291-4300.
53. Phylogenetic Prediction of Function
• Many powerful and automated similarity-based methods for assigning genes to protein families
• COGs
• Pfam HMM searches
• Some limitations of similarity-based methods can be overcome by phylogenetic approaches
• Automated methods now available
• Sean Eddy
• Steven Brenner
• Kimmen Sjölander
• But …
54. Carboxydothermus hydrogenoformans
• Isolated from a Russian hot spring
• Thermophile (grows at 80°C)
• Anaerobic
• Grows very efficiently on CO (carbon monoxide)
• Produces hydrogen gas
• Low-GC Gram-positive (Firmicutes)
• Genome sequenced (Wu et al. 2005 PLoS Genetics 1: e65)
57. Non-Homology Predictions: Phylogenetic Profiling
• Step 1: Search all genes in organisms of interest against all other genomes
• Ask: yes or no, is each gene found in each other species?
• Cluster genes by distribution patterns (profiles)
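The profiling steps can be sketched as: record a yes/no presence vector for each gene across a fixed list of genomes, then group genes whose vectors match (or nearly match). This minimal sketch assumes the homology search has already been run and its results summarized as "gene → set of genomes with a hit"; the greedy grouping rule, the `max_diff` tolerance, and all gene and genome names are illustrative choices, not a published clustering method.

```python
from typing import Dict, List, Set, Tuple

Profile = Tuple[int, ...]


def build_profiles(hits: Dict[str, Set[str]], genomes: List[str]) -> Dict[str, Profile]:
    """hits[gene] = genomes in which a homolog of gene was found (the yes/no step)."""
    return {gene: tuple(1 if g in found else 0 for g in genomes)
            for gene, found in hits.items()}


def hamming(p: Profile, q: Profile) -> int:
    """Number of genomes in which two profiles disagree."""
    return sum(a != b for a, b in zip(p, q))


def cluster_profiles(profiles: Dict[str, Profile], max_diff: int = 0) -> List[List[str]]:
    """Greedy single pass: a gene joins the first cluster whose founding profile
    is within max_diff mismatches; otherwise it starts a new cluster."""
    clusters: List[Tuple[Profile, List[str]]] = []
    for gene, prof in profiles.items():
        for founder, members in clusters:
            if hamming(founder, prof) <= max_diff:
                members.append(gene)
                break
        else:
            clusters.append((prof, [gene]))
    return [sorted(members) for _, members in clusters]


genomes = ["g1", "g2", "g3"]
hits = {"geneA": {"g1", "g3"}, "geneB": {"g1", "g3"}, "geneC": {"g2"}}
profiles = build_profiles(hits, genomes)
```

Genes that share a profile (here geneA and geneB) are candidates for acting in the same pathway or complex, without any sequence similarity between them.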
62. Hi-C Crosslinking & Sequencing
Beitel CW, Froenicke L, Lang JM, Korf IF, Michelmore RW, Eisen JA, Darling AE. (2014) Strain- and plasmid-level deconvolution of a synthetic metagenome by sequencing proximity ligation products. PeerJ 2:e415 http://dx.doi.org/10.7717/peerj.415
Table 1 Species alignment fractions. The number of reads aligning to each replicon present in the
synthetic microbial community are shown before and after filtering, along with the percent of total
constituted by each species. The GC content (“GC”) and restriction site counts (“#R.S.”) of each replicon,
species, and strain are shown. Bur1: B. thailandensis chromosome 1. Bur2: B. thailandensis chromosome
2. Lac0: L. brevis chromosome, Lac1: L. brevis plasmid 1, Lac2: L. brevis plasmid 2, Ped: P. pentosaceus,
K12: E. coli K12 DH10B, BL21: E. coli BL21. An expanded version of this table can be found in Table S2.
Sequence  Aligned reads  % of total  Filtered reads  % of aligned  Length (bp)  GC     #R.S.
Lac0      10,603,204     26.17%      10,269,562      96.85%        2,291,220    0.462  629
Lac1      145,718        0.36%       145,478         99.84%        13,413       0.386  3
Lac2      691,723        1.71%       665,825         96.26%        35,595       0.385  16
Lac       11,440,645     28.23%      11,080,865      96.86%        2,340,228    0.46   648
Ped       2,084,595      5.14%       2,022,870       97.04%        1,832,387    0.373  863
BL21      12,882,177     31.79%      2,676,458       20.78%        4,558,953    0.508  508
K12       9,693,726      23.92%      1,218,281       12.57%        4,686,137    0.507  568
E. coli   22,575,903     55.71%      3,894,739       17.25%        9,245,090    0.51   1076
Bur1      1,886,054      4.65%       1,797,745       95.32%        2,914,771    0.68   144
Bur2      2,536,569      6.26%       2,464,534       97.16%        3,809,201    0.672  225
Bur       4,422,623      10.91%      4,262,279       96.37%        6,723,972    0.68   369
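The two derived columns in Table 1 follow from the raw counts: "% of total" is a replicon's aligned reads divided by the sum over all replicons, and "% of aligned" is filtered reads divided by aligned reads. A short check, assuming the percentages are computed over the replicon-level rows only (the species rows Lac, E. coli, and Bur are sums of their replicons); the counts are copied from the table.

```python
# Read counts copied from Table 1 (replicon-level rows only).
aligned = {"Lac0": 10_603_204, "Lac1": 145_718, "Lac2": 691_723, "Ped": 2_084_595,
           "BL21": 12_882_177, "K12": 9_693_726, "Bur1": 1_886_054, "Bur2": 2_536_569}
filtered = {"Lac0": 10_269_562, "Lac1": 145_478, "Lac2": 665_825, "Ped": 2_022_870,
            "BL21": 2_676_458, "K12": 1_218_281, "Bur1": 1_797_745, "Bur2": 2_464_534}


def percent_of_total(counts):
    """Each entry as a percentage of the summed counts."""
    total = sum(counts.values())
    return {k: 100.0 * v / total for k, v in counts.items()}


def percent_retained(before, after):
    """Percentage of reads kept by filtering, per replicon."""
    return {k: 100.0 * after[k] / before[k] for k in before}


pct_total = percent_of_total(aligned)
pct_retained = percent_retained(aligned, filtered)
for name in aligned:
    print(f"{name:5s} {pct_total[name]:6.2f}% {pct_retained[name]:6.2f}%")
```

The same arithmetic makes the E. coli rows stand out: only about 21% (BL21) and 13% (K12) of aligned reads survive filtering, versus more than 95% for every other replicon.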
Figure 1 Hi-C insert distribution. The distribution of genomic distances between Hi-C read pairs is
shown for read pairs mapping to each chromosome. For each read pair the minimum path length on
the circular chromosome was calculated and read pairs separated by less than 1000 bp were discarded.
The 2.5 Mb range was divided into 100 bins of equal size and the number of read pairs in each bin
was recorded for each chromosome. Bin values for each chromosome were normalized to sum to 1 and
plotted.
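The Figure 1 procedure (minimum path length on a circular chromosome, discarding pairs closer than 1000 bp, 100 equal bins over 2.5 Mb, normalizing to sum to 1) can be sketched as follows. The function names and the input format, a list of (position, position) pairs for one chromosome, are assumptions; read alignment itself is out of scope.

```python
from typing import Iterable, List, Tuple


def circular_distance(a: int, b: int, genome_len: int) -> int:
    """Minimum path length between two positions on a circular chromosome."""
    d = abs(a - b) % genome_len
    return min(d, genome_len - d)


def insert_histogram(pairs: Iterable[Tuple[int, int]], genome_len: int,
                     min_dist: int = 1000, max_dist: int = 2_500_000,
                     n_bins: int = 100) -> List[float]:
    """Normalized histogram of Hi-C pair distances for one chromosome."""
    width = max_dist / n_bins
    counts = [0] * n_bins
    for a, b in pairs:
        d = circular_distance(a, b, genome_len)
        if d < min_dist or d >= max_dist:  # drop close pairs and out-of-range ones
            continue
        counts[int(d // width)] += 1
    total = sum(counts)
    # Normalize so the bins sum to 1 (all zeros if nothing survived filtering).
    return [c / total for c in counts] if total else [0.0] * n_bins


K12_LEN = 4_686_137  # E. coli K12 length from Table 1
hist = insert_histogram([(0, 500), (0, 30_000), (0, 30_500), (0, 60_000)], K12_LEN)
```

Measuring the distance around the circle avoids the edge effect the text describes for linear alignment: two reads near coordinates 0 and 4,686,137 are treated as close, not as megabases apart.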
E. coli K12 genome were distributed in a similar manner as previously reported (Fig. 1;
(Lieberman-Aiden et al., 2009)). We observed a minor depletion of alignments spanning
the linearization point of the E. coli K12 assembly (e.g., near coordinates 0 and 4686137)
due to edge effects induced by BWA treating the sequence as a linear chromosome rather
than circular.
10.7717/peerj.415 9/19
Figure 2 Metagenomic Hi-C associations. The log-scaled, normalized number of Hi-C read pairs
associating each genomic replicon in the synthetic community is shown as a heat map (see color scale,
blue to yellow: low to high normalized, log scaled association rates). Bur1: B. thailandensis chromosome
1. Bur2: B. thailandensis chromosome 2. Lac0: L. brevis chromosome, Lac1: L. brevis plasmid 1, Lac2:
L. brevis plasmid 2, Ped: P. pentosaceus, K12: E. coli K12 DH10B, BL21: E. coli BL21.
reference assemblies of the members of our synthetic microbial community with the same
alignment parameters as were used in the top ranked clustering (described above). We first
Figure 3 Contigs associated by Hi-C reads. A graph is drawn with nodes depicting contigs and edges
depicting associations between contigs as indicated by aligned Hi-C read pairs, with the count thereof
depicted by the weight of edges. Nodes are colored to reflect the species to which they belong (see legend)
with node size reflecting contig size. Contigs below 5 kb and edges with weights less than 5 were excluded.
Contig associations were normalized for variation in contig size.
typically represent the reads and variant sites as a variant graph wherein variant sites are
represented as nodes, and sequence reads define edges between variant sites observed in
the same read (or read pair). We reasoned that variant graphs constructed from Hi-C
data would have much greater connectivity (where connectivity is defined as the mean
path length between randomly sampled variant positions) than graphs constructed from
mate-pair sequencing data, simply because Hi-C inserts span megabase distances. Such
Figure 4 Hi-C contact maps for replicons of Lactobacillus brevis. Contact maps show the number of
Hi-C read pairs associating each region of the L. brevis genome. The L. brevis chromosome (Lac0, (A),
Chris Beitel (@datscimed), Aaron Darling (@koadman)
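The variant-graph argument in the Hi-C excerpt can be made concrete. Variant sites are nodes, each read (or read pair) links every variant site it covers, and connectivity is summarized as the mean shortest-path length between variant sites. A minimal sketch, assuming reads are given as tuples of variant-site coordinates; it averages over all reachable pairs rather than randomly sampled ones, and it separately counts pairs with no connecting path at all. Inputs and names are hypothetical.

```python
from collections import deque
from itertools import combinations
from typing import Dict, Iterable, Set, Tuple


def build_variant_graph(reads: Iterable[Tuple[int, ...]]) -> Dict[int, Set[int]]:
    """Nodes are variant sites; an edge joins sites observed in the same read (pair)."""
    graph: Dict[int, Set[int]] = {}
    for sites in reads:
        for s in sites:
            graph.setdefault(s, set())
        for a, b in combinations(sorted(set(sites)), 2):
            graph[a].add(b)
            graph[b].add(a)
    return graph


def bfs_distances(graph: Dict[int, Set[int]], start: int) -> Dict[int, int]:
    """Unweighted shortest-path lengths from start to every reachable node."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for v in graph[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def mean_path_length(graph: Dict[int, Set[int]]) -> Tuple[float, int]:
    """(mean shortest path over connected site pairs, number of unconnected pairs)."""
    nodes = sorted(graph)
    total, connected, unconnected = 0, 0, 0
    for i, a in enumerate(nodes):
        dist = bfs_distances(graph, a)
        for b in nodes[i + 1:]:
            if b in dist:
                total += dist[b]
                connected += 1
            else:
                unconnected += 1
    return (total / connected if connected else float("nan")), unconnected


short_inserts = [(1, 2), (2, 3), (3, 4)]      # each read links neighboring sites only
long_inserts = short_inserts + [(1, 4)]        # one long-range link added
```

Adding a single long-range link shortens the average path (from 10/6 to 8/6 in this toy chain), and a graph built from disjoint reads such as `[(1, 2), (3, 4)]` leaves most site pairs unconnected.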
63. Pink Berries
[Figure: pink berry consortium of PB-PSB1 (purple sulfur bacteria) and PB-SRB1 (sulfate-reducing bacteria), linked by sulfate and sulfide]
Wilbanks, E.G. et al. (2014). Environmental Microbiology
Lizzy Wilbanks (@lizzywilbanks)
64. Long Reads Help, A Lot
HiSeq & MiSeq: 100-250 bp
Moleculo (Illumina-based “synthetic long reads”): 2-20 kb
PacBio RSII (real-time single molecule sequencing; p4-c2, p5-c3): 2-20 kb
Data volumes shown: 295 Megabases, 474 Megabases, 61 Gigabases
Micky Kertesz, Tim Blauwcamp, Meredith Ashby, Cheryl Heiner
72. Wu et al. 2006 PLoS Biology 4: e188.
Baumannia makes vitamins and cofactors
Sulcia makes amino acids
Simple Symbioses
Phylogenetic Binning
Nancy Moran
Dongying Wu
73. Drosophila microbiome w/ Kopp Lab
Both natural surveys and laboratory
experiments indicate that host diet
plays a major role in shaping the
Drosophila bacterial microbiome.
Laboratory strains provide only a
limited model of natural host–microbe
interactions
Jenna Lang, Angus Chandler
74. Rice Microbiome w/ Sundar Lab
Edwards et al. 2015. Structure, variation,
and assembly of the root-associated
microbiomes of rice. PNAS
Supplementary Figures
Fig. S1. Map depicting soil collection locations for greenhouse experiment.
Fig. S2. Sampling and collection of the rhizocompartments. Roots are collected from rice plants and soil is shaken off the roots to leave ~1 mm of soil around the roots. The ~1 mm of soil
three separate rhizocompartments: the rhizosphere, rhizoplane,
and endosphere (Fig. 1A). Because the root microbiome has
been shown to correlate with the developmental stage of the
plant (10), the root-associated microbial communities were
sampled at 42 d (6 wk), when rice plants from all genotypes were
well-established in the soil but still in their vegetative phase of
growth. For our study, the rhizosphere compartment was com-
Fig. 1. Root-associated microbial communities are separable by rhizocompartment and soil type. (A) A representation of a rice root cross-section depicting the locations of the microbial communities sampled. (B) Within-sample diversity (α-diversity) measurements between rhizospheric compartments indicate a decreasing gradient in microbial diversity from the rhizosphere to the endosphere independent of soil type. Estimated species richness was calculated as $e^{\text{Shannon entropy}}$. The horizontal bars within boxes represent the median. The tops and bottoms of boxes represent the 75th and 25th percentiles, respectively. The upper and lower whiskers extend 1.5× the interquartile range from the upper edge and lower edge of the box, respectively. All outliers are plotted as individual points. (C) PCoA using the weighted UniFrac (WUF) metric indicates that the largest separation between microbial communities is spatial proximity to the root (PCo 1) and the second largest source of variation is soil type (PCo 2). (D) Histograms of phyla abundances in each compartment and soil. B, bulk soil; E, endosphere; P, rhizoplane; S, rhizosphere; Sac, Sacramento.
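The richness estimate in Fig. 1B, e raised to the Shannon entropy, is the effective number of equally abundant OTUs that would give the same entropy. A small sketch, assuming natural-log Shannon entropy computed from one sample's OTU counts; the function names and example counts are illustrative.

```python
import math
from typing import Sequence


def shannon_entropy(counts: Sequence[float]) -> float:
    """Shannon entropy (natural log) of one sample's OTU counts; zeros are skipped."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def effective_richness(counts: Sequence[float]) -> float:
    """e ** H: the number of equally abundant OTUs with the same entropy."""
    return math.exp(shannon_entropy(counts))


even = [10, 10, 10, 10]   # four equally abundant OTUs
skewed = [97, 1, 1, 1]    # four OTUs, one dominant
```

Both example samples contain four OTUs, but the dominated one has an effective richness close to 1, which is why this measure reflects evenness as well as the raw OTU count.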
2 of 10 | www.pnas.org/cgi/doi/10.1073/pnas.1414592112
gate the relationship between rice ge-
icrobiome, domesticated rice varieties
rated growing regions were tested. Six
spanning two species within the Oryza
2 d in the greenhouse before sampling.
a) cultivars M104, Nipponbare (both
ties), IR50, and 93-11 (both indica va-
gside two cultivars of African cultivated
g7102 (Glab B) and TOg7267 (Glab E).
ed that rice genotype accounted for
ariation between microbial communities
% of the variance, P < 0.001; Dataset
f the variance, P < 0.066; Dataset S5H);
ntations for clustering patterns of the
nt on the first two axes of unconstrained
ppendix, Fig. S10). We then used CAP
ffect of rice genotype on the microbial
g on rice cultivar and controlling for
and technical factors, we found that ge-
ice have a significant effect on root-
mmunities (5.1%, P = 0.005, WUF, Fig.
, UUF, SI Appendix, Fig. S11A). Ordi-
AP analysis revealed clustering patterns
only partially consistent with genetic
F and UUF metrics. The two japonica
her and the two O. glaberrima cultivars
ver, the indica cultivars were split, with
O. glaberrima cultivars and IR50 clus-
cultivars.
enotypic effect manifests in individual
eparated the whole dataset to focus on
vidually and conducted CAP analysis
and technical factors. The rhizosphere
eight sites were operated under two cultivation practices: organic
cultivation and a more conventional cultivation practice termed
“ecofarming” (see below). Because genotype explained the least
variance in the greenhouse data, we limited the analysis to one
cultivar, S102, a California temperate japonica variety that is
widely cultivated by commercial growers and is closely related to
M104 (26). Field samples were collected from vegetatively
growing rice plants in flooded fields and the previously defined
rhizocompartments were analyzed as before. Unfortunately,
collection of bulk soil controls for the field experiment was not
Fig. 3. Host plant genotype significantly affects microbial communities in the rhizospheric compartments. (A) Ordination of CAP analysis using the WUF metric constrained to rice genotype. (B) Within-sample diversity measurements of rhizosphere samples of each cultivar grown in each soil. Estimated species richness was calculated as $e^{\text{Shannon entropy}}$. The horizontal bars within boxes represent the median. The tops and bottoms of boxes represent the 75th and 25th percentiles, respectively. The upper and lower whiskers extend 1.5× the interquartile range from the upper edge and lower edge of the box, respectively. All outliers are plotted as individual points.
fields are too high to find representative soil that is unlikely to
be affected by nearby plants. Amplification and sequencing of
the field microbiome samples yielded 13,349,538 high-quality
sequences (median: 54,069 reads per sample; range: 12,535–
148,233 reads per sample; Dataset S13). The sequences were
clustered into OTUs using the same criteria as the greenhouse
experiment, yielding 222,691 microbial OTUs and 47,983 OTUs
with counts >5 across the field dataset.
We found that the microbial diversity of field rice plants is significantly influenced by the field site. α-Diversity measurements of the field rhizospheres indicated that the cultivation site significantly impacts microbial diversity (SI Appendix, Fig. S14A, P = 2.00E-16, ANOVA and Dataset S14). Unconstrained PCoA using both the WUF and UUF metrics showed that microbial communities separated by field site across the first axis (Fig. 4B, WUF and SI Appendix, Fig. S14B, UUF). PERMANOVA agreed with the unconstrained PCoA in that field site explained the largest proportion of variance between the microbial communities for field plants (30.4% of variance, P < 0.001, WUF, Dataset S5O and 26.6% of variance, P < 0.001, UUF, Dataset S5P). CAP analysis constrained to field site and controlled for rhizocompartment, cultivation practice, and technical factors (sequencing batch and biological replicate) agreed with the PERMANOVA results in that the field site explains the largest proportion of variance between the root-associated microbial communities in field plants (27.3%, P = 0.005, WUF, SI Appendix, Fig. S15A and 28.9%, P = 0.005, UUF, SI Appendix, Fig. S15E), suggesting that geographical factors may shape root-associated microbial communities.
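The unconstrained PCoA used with the WUF and UUF distances can be sketched as classical multidimensional scaling: square the distances, double-center them, and take the leading eigenvectors. This is a minimal NumPy version, assuming a symmetric distance matrix. UniFrac distances are not guaranteed to be Euclidean, so negative eigenvalues can appear; this sketch simply drops them, whereas published tools may apply a correction instead.

```python
import numpy as np


def pcoa(dist):
    """Classical PCoA. Returns (coordinates, fraction of variance per axis)."""
    D = np.asarray(dist, dtype=float)
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n      # centering matrix
    B = -0.5 * J @ (D ** 2) @ J               # double-centered squared distances
    eigvals, eigvecs = np.linalg.eigh(B)      # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    keep = eigvals > 1e-12                    # drop zero and negative axes
    coords = eigvecs[:, keep] * np.sqrt(eigvals[keep])
    explained = eigvals[keep] / eigvals[keep].sum()
    return coords, explained


# Three samples lying on a line at positions 0, 1, and 3 (hypothetical distances).
D = [[0, 1, 3],
     [1, 0, 2],
     [3, 2, 0]]
coords, explained = pcoa(D)
```

For these collinear points, a single axis (PCo 1) explains all of the variance and reproduces the input distances. The "% of variance" reported for PCo 1 and PCo 2 in ordination plots is the `explained` fraction.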
Rhizospheric Compartmentalization Is Retained in Field Plants. Similar to the greenhouse plants, the rhizospheric microbiomes of field plants are distinguishable by compartment. α-Diversity of the field plants again showed that the rhizosphere had the highest microbial diversity, whereas the endosphere had the least
S15). PCoA
the WUF a
compartmen
Appendix, F
separation i
ond largest
(20.76%, P
UUF, Data
biomes cons
trolled for f
agreed with
variance bet
compartmen
and 10.9%,
Taxonomi
overall sim
Chloroflexi,
microbiota.
endosphere
Proteobacteri
and Plancto
distribution
trend from t
Appendix, Fi
We again
OTUs in the
S16). We fo
endosphere c
representing
Fig. S17). Th
the genus A
and Alphap
terestingly, 1
found to b
greenhouse
OTUs were
sisted of tax
and Myxoco
bidopsis roo
Cultivation Pr
The rice fiel
practices, org
tion called
farming in th
are all perm
harvest fumi
itself does si
partments ov
a significant
the rhizocom
indicating th
affected diffe
the rhizosph
practice, with
zospheres th
Dataset S14)
crobial comm
tests; Datase
practices are
the WUF m
S14D). PER
Fig. 4. Root-associated microbiomes from field-grown plants are separable by cultivation site, rhizospheric compartment, and cultivation practice. (A)
Variation w/in Plant
Cultivation Site Effects
Rice Genotype Effects
and mitochondrial) reads to analyze microbial abundance in the endosphere over time (Fig. 6A). Using this technique, we confirmed the sterility of seedling roots before transplantation. We found that microbial penetrance into the endosphere occurred at or before 24 h after transplantation and that the proportion of microbial reads to organellar reads increased over the first 2 wk after transplantation (Fig. 6A). To further support the evidence for microbiome acquisition within the first 24 h, we sampled root endospheric microbiomes from sterilely germinated seedlings before transplanting into Davis field soil as well as immediately after transplantation and 24 h after transplantation (SI Appendix, Fig. S24). The root endospheres of sterilely germinated seedlings, as well as seedlings transplanted into Davis field soil for 1 min, both had a very low percentage of microbial reads compared with organellar reads (0.22% and 0.71%), with the differences not statistically significant (P = 0.1, Wilcoxon test). As before, endospheric microbial abundance increased significantly, by >10-fold after 24 h in field soil (3.95%, P = 0.05, Wilcoxon test). We conclude that brief soil contact does not strongly increase the proportion of microbial reads, and therefore the increase in microbial reads at 24 h is indicative of endophyte acquisition within 1 d after transplantation.
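The fold changes implied by the reported percentages, as simple ratios of the values quoted in this paragraph (not a recomputation from the underlying read data):

$$\frac{3.95\%}{0.22\%} \approx 18, \qquad \frac{3.95\%}{0.71\%} \approx 5.6$$

So the >10-fold increase at 24 h holds relative to the sterilely germinated seedlings; relative to the 1-min soil-contact control, the ratio is roughly 5.6-fold.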
α-Diversity significantly varied by rhizocompartment (P < 2E-16; Dataset S23) and there was a significant interaction between rhizocompartment and collection time (P = 0.042; Dataset S23); however, when each rhizocompartment was analyzed individually, the bulk soil was the only compartment that showed
(13 d) approach the endosphere and rhizoplane microbiome compositions for plants that have been grown in the greenhouse for 42 d.
There are slight shifts in the distribution of phyla over time; however, there are significant distinctions between the compartments starting as early as 24 h after transplantation into soil (Fig. 6D, SI Appendix, Figs. S24B and S26, and Dataset S24). Because each phylum consists of diverse OTUs that could exhibit very different behaviors during acquisition, we next examined the dynamics and colonization patterns of specific OTUs within the time-course experiment. The core set of 92 endosphere-enriched OTUs obtained from the previous greenhouse experiment (SI Appendix, Fig. S9C) was analyzed for relative abundances at different time points (Fig. 6E). Of the 92 core endosphere-enriched microbes present in the greenhouse experiment, 53 OTUs were detectable in the endosphere in the time-course experiment. The average abundance profile over time revealed a colonization pattern for the core endospheric microbiome. Relative abundance of the core endosphere-enriched microbiome peaks early (3 d) in the rhizosphere and then decreases back to a steady, low level for the remainder of the time points. Similarly, the rhizoplane profile shows an increase after 3 d with a peak at 8 d with a decline at 13 d. The endosphere generally follows the rhizoplane profile, except that relative abundance is still increasing at 13 d. These results suggest that the core endospheric microbes are first attracted to the rhizosphere and then locate to the rhizoplane, where they attach
Fig. 5. OTU coabundance network reveals modules of OTUs associated with methane cycling. (A) Subset of the entire network corresponding to 11
modules with methane cycling potential. Each node represents one OTU and an edge is drawn between OTUs if they share a Pearson correlation of
greater than or equal to 0.6. (B) Depiction of module 119 showing the relationship between methanogens, syntrophs, methanotrophs, and other
methane cycling taxonomies. Each node represents one OTU and is labeled by the presumed function of that OTU’s taxonomy in methane cycling. An
edge is drawn between two OTUs if they have a Pearson correlation of greater than or equal to 0.6. (C) Mean abundance profile for OTUs in module 119
across all rhizocompartments and field sites. The position along the x axis corresponds to a different field site. Error bars represent SE. The x and y axes
represent no particular scale.
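Fig. 5 draws an edge between two OTUs when their Pearson correlation is at least 0.6. A minimal sketch of that construction, assuming a samples × OTUs abundance matrix. The caption does not state how modules were delineated, so this sketch uses connected components of the thresholded graph as a simple stand-in; the OTU labels and abundances are hypothetical.

```python
import numpy as np


def coabundance_modules(abundance, otu_ids, threshold=0.6):
    """abundance: samples x OTUs. Returns modules as sorted lists of OTU ids."""
    corr = np.corrcoef(np.asarray(abundance, dtype=float), rowvar=False)
    n = len(otu_ids)
    # Edge between OTUs i and j when Pearson r >= threshold.
    neighbors = [[j for j in range(n) if j != i and corr[i, j] >= threshold]
                 for i in range(n)]
    seen, modules = set(), []
    for start in range(n):
        if start in seen:
            continue
        seen.add(start)
        stack, members = [start], []
        while stack:  # depth-first walk of one connected component
            u = stack.pop()
            members.append(otu_ids[u])
            for v in neighbors[u]:
                if v not in seen:
                    seen.add(v)
                    stack.append(v)
        modules.append(sorted(members))
    return modules


# Four samples (rows) x four OTUs (columns): a and b rise together, c and d fall together.
abundance = [[1, 2, 4, 8],
             [2, 4, 3, 6],
             [3, 6, 2, 4],
             [4, 8, 1, 2]]
modules = coabundance_modules(abundance, ["a", "b", "c", "d"])
```

Because only positive correlations above the threshold create edges, OTUs a and c, which are perfectly anticorrelated, end up in different modules.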
Function x Genotype
of magnitude greater than in any single plant species to date. Under controlled greenhouse conditions, the rhizocompartments described the largest source of variation in the microbial communities sampled (Dataset S5A). The pattern of separation between the microbial communities in each compartment is consistent with a spatial gradient from the bulk soil across the rhizosphere and rhizoplane into the endosphere (Fig. 1C). Similarly, microbial diversity patterns within samples hold the same pattern where there is a gradient in α-diversity from the rhizosphere to the endosphere (Fig. 1B). Enrichment and depletion of certain microbes across the rhizocompartments indicate that microbial colonization of rice roots is not a passive process and that plants have the ability to select for certain microbial consortia or that some microbes are better at filling the root colonizing niche. Similar to studies in Arabidopsis, we found that the relative abundance of Proteobacteria is increased in the endosphere compared with soil, and that the relative abundances of Acidobacteria and Gemmatimonadetes decrease from the soil to the endosphere (9–11), suggesting that the distribution of different bacterial phyla inside the roots might be similar for all land plants (Fig. 1D and Dataset S6). Under controlled greenhouse conditions, soil type described the second largest source of variation within the microbial communities of each sample. However, the soil source did not affect the pattern of separation between the rhizospheric compartments, suggesting that the rhizocompartments exert a recruitment effect on microbial consortia independent of the microbiome source.
By using differential OTU abundance analysis in the compartments, we observed that the rhizosphere serves an enrichment role for a subset of microbial OTUs relative to bulk soil (Fig. 2). Further, the majority of the OTUs enriched in the rhizosphere are simultaneously enriched in the rhizoplane and/or endosphere of rice roots (Fig. 2B and SI Appendix, Fig. S16B), consistent with a recruitment model in which factors produced by the root attract taxa that can colonize the endosphere. We found that the rhizoplane, although enriched for OTUs that are also
Time Series
75. Acknowledgements
DOE, JGI, Sloan, GBMF, NSF, DHS, DARPA
Aaron Darling, Lizzy Wilbanks, Jenna Lang, Russell Neches, Rob Knight, Jack Gilbert, Tanja Woyke, Rob Dunn, Katie Pollard, Jessica Green, Darlene Cavalier, Eddy Rubin, Wendy Brown, Dongying Wu, Phil Hugenholtz, DSMZ, Sundar, Srijak Bhatnagar, David Coil, Alex Alexiev, Hannah Holland-Moritz, Holly Bik, John Zhang, Holly Menninger, Guillaume Jospin, David Lang, Cassie Ettinger, Tim Harkins, Jennifer Gardy, Holly Ganz