Inferring microbial community function from taxonomic composition
Morgan G.I. Langille (1,*), Jesse R.R. Zaneveld (2), J Gregory Caporaso (3), Joshua Reyes (4), Dan Knights (5), Daniel McDonald (6), Rob Knight (5), Robert G. Beiko (1), Curtis Huttenhower (4)
(1) Faculty of Computer Science, Dalhousie University, Halifax, NS, Canada
(2) Dept. of Microbiology, Oregon State University, Corvallis, OR, USA
(3) Dept. of Computer Science, Northern Arizona University, Flagstaff, AZ, USA
(4) Dept. of Biostatistics, Harvard School of Public Health, Boston, MA, USA
(5) Dept. of Computer Science, University of Colorado, Boulder, CO, USA
(6) Biofrontiers Institute, University of Colorado, Boulder, CO, USA
(*) morgangilangille@gmail.com

  Abstract
  It is often most efficient to characterize microbial communities using taxonomic markers such as
  the 16S ribosomal small subunit rRNA gene. The 16S gene is typically used to describe the
  organisms or taxonomic units present in a sample, but data from such markers do not inherently
  reveal the molecular functions or ecological roles of members of a microbial community. We have
  developed and validated a novel computational method that takes a set of observed taxonomic
  abundances and infers abundance profiles of enzymes and pathways from multiple functional
  classification schemes (KEGG, PFAM, COG, etc.). We use ancestral state reconstruction to
  determine approximate genomic content, taking into account 16S copy number and known
  functional abundance profiles from all currently available microbial genomes. We have evaluated
  the accuracy of this inference for different groups of taxa and for different areas of biological
  function. Our method, implemented as the PI-CRUST software (Phylogenetic Investigation of
  Communities by Reconstruction of Unobserved STates), allows 16S-based studies to be extended
  to predict the functional capabilities of microbiomes, and allows expected and observed
  functions to be compared in shotgun metagenomic experiments.


   1. PI-CRUST Software Pipeline
   1.1 Starting Data Sources (used internally by PI-CRUST)
   •    The entire GreenGenes 16S reference tree.
   •    A functional “Trait Table” for all completed genomes (e.g. KEGG, PFAM, etc.). This contains
        the abundance of each functional category for each genome in the IMG database.
   •    16S copy number information for each completed genome in IMG (used to normalize OTU tables).
   •    A map from GreenGenes identifiers to IMG completed genomes (to link the information we have
        about completed genomes to tips in our reference tree).

   1.2 PI-CRUST: Genome Functional Predictions
   [Flow diagram, boxes in reading order]
   Inputs: 16S Copy Number (completed genomes only) & Genome Functional Table (completed genomes only); Reference 16S Tree (greengenes)
   Steps: Prune taxa with no genome information -> Infer ancestral genome traits -> Predict functional compositions
   Outputs: 16S Copy Number Predictions; Functional Trait Predictions
   Tree legend: Known functional composition (from sequenced genome); Inferred ancestral functional composition; Predicted functional composition (for unsequenced genome)

   1.3 User Input
   •    “OTU table”: the count of each OTU (labelled with its GreenGenes identifier) in each sample.

   1.4 PI-CRUST: Metagenome Functional Predictions
   [Flow diagram]
   OTU Table + 16S Copy Number Predictions -> Normalized OTU Table
   Normalized OTU Table + Functional Trait Predictions -> Metagenome Functional Predictions


   3. Genome Validation
   3.1 Method
   1)    Remove a single genome from our reference dataset (pretending it has not been sequenced).
   2)    Use PI-CRUST to predict the functional abundances for our “unknown” genome using only its 16S gene.
   3)    Compare the PI-CRUST predictions with the known functional abundances of that genome.
   4)    Repeat for all completed genomes (>2000).
   5)    Plot the distribution of accuracy values for each genome (3.2) or each functional group (3.3).

   3.2 PI-CRUST accuracy for completed genomes
   [Figure, two panels: “Using Various Ancestral State Reconstruction” and “Distance to nearest genome affects accuracy”. The second panel has the x-axis “16S phylogenetic distance to nearest species” and an annotated group “Endosymbionts & Reduced Genomes”.]
   Methods compared:
   •    “Random”: each functional abundance is drawn at random from that function's distribution across all genomes.
   •    “Nearest Neighbour”: the functional profile of the genome with the closest 16S distance is used.
   •    “PIC”: ancestral state reconstruction using least squares regression (APE R package).
   •    “WAGNER”: ancestral state reconstruction using Wagner parsimony (Count package).

   3.3 PI-CRUST accuracy for various functional groups
   [Figure: PI-CRUST Accuracy (for each SEED function)]
   The ability to predict functions from 16S varies with the functional class. Functions that are well
   conserved and evolve similarly to 16S, such as “RNA metabolism” and “Cell Division and Cell Cycle”,
   are predicted with higher accuracy. Groups that tend not to be inherited by vertical descent, such as
   “Phages, Prophages, Transposable Elements, Plasmids”, are not predicted as accurately.
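The hold-out procedure of 3.1 can be sketched with the “Nearest Neighbour” baseline from 3.2 standing in as the predictor. This is only an illustration under stated assumptions: the toy distance matrix, the toy trait table, the function names, and the use of Pearson correlation as the accuracy score are all hypothetical. The poster does not state which accuracy measure was used, and the ancestral state reconstruction methods themselves are not shown here.

```python
import numpy as np

def nearest_neighbour_predict(dist, traits, held_out):
    """Return the trait profile of the genome closest (by 16S distance) to the held-out one."""
    d = dist[held_out].astype(float).copy()
    d[held_out] = np.inf  # pretend the held-out genome has not been sequenced
    return traits[int(np.argmin(d))]

def leave_one_out_accuracy(dist, traits):
    """One accuracy value per genome (assumed score: Pearson r, not taken from the poster)."""
    scores = []
    for g in range(traits.shape[0]):
        predicted = nearest_neighbour_predict(dist, traits, g)
        scores.append(np.corrcoef(predicted, traits[g])[0, 1])
    return np.array(scores)

# Toy data (hypothetical): 4 genomes, pairwise 16S distances, 3 functional categories
dist = np.array([[0.00, 0.02, 0.30, 0.35],
                 [0.02, 0.00, 0.28, 0.33],
                 [0.30, 0.28, 0.00, 0.05],
                 [0.35, 0.33, 0.05, 0.00]])
traits = np.array([[5, 1, 0],
                   [4, 1, 0],
                   [0, 3, 6],
                   [0, 2, 7]])

accuracy = leave_one_out_accuracy(dist, traits)  # distribution to plot in step 5
```

Swapping `nearest_neighbour_predict` for another predictor with the same signature would give the per-method comparison described in 3.2.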


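The two steps of 1.4 can be read as a division followed by a matrix product. The sketch below assumes that normalization means dividing each OTU's count by its predicted 16S copy number, and that a sample's predicted function abundance is the sum of each OTU's predicted trait abundance weighted by its normalized abundance. All values and variable names are hypothetical and do not reflect PI-CRUST's actual code or file formats.

```python
import numpy as np

# Toy OTU table (hypothetical): rows = samples, columns = OTUs, values = 16S counts
otu_table = np.array([[10.0, 4.0, 0.0],
                      [ 2.0, 8.0, 6.0]])

# Predicted 16S copy number for each OTU (hypothetical values)
copy_number = np.array([2.0, 4.0, 1.0])

# Predicted trait table: rows = OTUs, columns = functional categories
trait_predictions = np.array([[1.0, 0.0],
                              [2.0, 1.0],
                              [0.0, 3.0]])

# Step 1: correct counts for 16S copy number (broadcast across samples)
normalized_otu = otu_table / copy_number

# Step 2: per-sample function abundances = normalized OTU abundances x trait table
metagenome = normalized_otu @ trait_predictions

print(normalized_otu)  # [[5. 1. 0.] [1. 2. 6.]]
print(metagenome)      # [[ 7.  1.] [ 5. 20.]]
```

Without step 1, an OTU carrying four 16S copies would contribute four times the functional abundance of an equally common single-copy organism.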
   2. Metagenome Validation
   2.1 Method
   1)    Obtain microbiome samples with both whole metagenomic and 16S sequencing.
   2)    Use PI-CRUST with the 16S data to predict functions for the samples.
   3)    Compare the PI-CRUST predictions with the functions observed from metagenomic sequencing.

   2.2 PI-CRUST accuracy on HMP samples
   [Figure: scatter plot with the axis “PI-CRUST predicted abundance based on 16S data”. Each point represents the predicted vs. observed relative abundance for a single KEGG category.]


   4. Concluding Remarks
   4.1 Discussion
   •    Genome content has been shown in the past to vary widely even in closely related species. However,
        this may not be typical for the majority of bacterial and archaeal species. Our ability to predict the
        functions encoded in an organism based solely on its 16S gene and knowledge from the thousands
        of completed genomes suggests that gene content often has good phylogenetic correlation with 16S.
   •    PI-CRUST allows 16S-only studies to be expanded to include information about functional
        abundances.
   •    Studies with full metagenomic sequencing can use PI-CRUST to identify functions that are observed
        but not expected based on their 16S profiles (i.e. the taxa that are present in the sample).

   4.2 Availability & Future Plans
   •    PI-CRUST is still under development but will be freely available under the GPL at:
        http://picrust.sourceforge.net
   •    Various methods of ancestral state reconstruction and confidence weighting are still being evaluated.
   •    Evaluation of PI-CRUST on other paired metagenomic and 16S datasets is underway.

   Acknowledgements
   •    MGIL is the recipient of an IHMC travel award funded by the NIH.
   •    MGIL and RGB are supported by a CIHR emerging team grant.
