Hans-Joachim Ruscheweyh: Pooling Metagenomes in MEGAN Based on Environmental Parameters
 


Hans-Joachim Ruscheweyh's talk from the 1st Earth Microbiome Project meeting in Shenzhen

    Presentation Transcript

    • Introduction MEGAN Metadata Pooling Datasets Summary & Conclusion Pooling metagenomes in MEGAN based on environmental parameters Hans-Joachim Ruscheweyh Center for Bioinformatics, Tuebingen University June 15, 2011 1 / 27 Hans-Joachim Ruscheweyh Pooling metagenomes
    • 1 Introduction: Metagenomics; Unculturable Microbes; Typical Metagenomic Samples; Pipeline
      2 MEGAN: MEGAN Introduction; Taxonomic & Functional Analysis; Comparison Analysis; PostgreSQL
      3 Metadata: What is Metadata?; Using Metadata to pool Datasets
      4 Pooling Datasets: Basic Idea; Combined Datasets; MetaData Analyzer
      5 Summary & Conclusion
    • Metagenomics: The study of the DNA of uncultured organisms. More than 99% of all microbes cannot be cultured. A genome is the entire genetic information of a single organism. A metagenome is the entire genetic information of an assemblage of organisms.
    • Typical Metagenomic Samples: human microbiome, soil samples, sea water samples, seabed samples, air samples, medical samples, ancient bones.
    • Metagenomic Pipeline (from: A primer on metagenomics; Wooley et al. (2010))
    • MEGAN Introduction: Interactive tool for metagenomic analysis. www-ab.informatik.uni-tuebingen.de/software/megan
    • Taxonomic Analysis: The tree reflects the NCBI taxonomy. Reads are compared against a reference database, e.g. NR. Reads are mapped onto the tree using the comparison results, based on the LCA algorithm.
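The LCA (lowest common ancestor) placement named on this slide can be illustrated with a minimal sketch. It is not MEGAN's implementation: the toy taxonomy, the `PARENT` table and the function names are assumptions made for illustration, ranks are heavily simplified, and any filtering of matches by score is ignored. The idea shown is only that a read whose matches hit several taxa is assigned to the deepest node that is an ancestor of all of them.

```python
from typing import Dict, List, Optional

# Hypothetical toy taxonomy as child -> parent links (the root has parent None).
# Ranks are heavily simplified; this is not the real NCBI taxonomy structure.
PARENT: Dict[str, Optional[str]] = {
    "root": None,
    "Bacteria": "root",
    "Proteobacteria": "Bacteria",
    "Gammaproteobacteria": "Proteobacteria",
    "Escherichia coli": "Gammaproteobacteria",
    "Vibrio cholerae": "Gammaproteobacteria",
    "Cyanobacteria": "Bacteria",
    "Synechococcus": "Cyanobacteria",
}


def path_to_root(taxon: str) -> List[str]:
    """Return the chain of nodes from the taxon up to the root (taxon first)."""
    path = []
    node: Optional[str] = taxon
    while node is not None:
        path.append(node)
        node = PARENT[node]
    return path


def lowest_common_ancestor(taxa: List[str]) -> str:
    """Return the deepest node that is an ancestor of (or equal to) every taxon."""
    if not taxa:
        raise ValueError("at least one taxon is required")
    # Reverse each path so that all of them start at the root.
    paths = [list(reversed(path_to_root(t))) for t in taxa]
    lca = paths[0][0]
    # Walk down level by level while all paths still agree.
    for level in zip(*paths):
        if all(node == level[0] for node in level):
            lca = level[0]
        else:
            break
    return lca
```

With this toy tree, a read matching both `Escherichia coli` and `Vibrio cholerae` would be placed on `Gammaproteobacteria`, while a read matching `Escherichia coli` and `Synechococcus` would only be placed on `Bacteria`.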
    • Functional Analysis - SEED: The tree contains the nodes of the SEED classification. Reads are mapped onto the SEED classification. www.theSEED.org
    • Functional Analysis - KEGG: KEGG: Kanehisa et al., Nucleic Acids Res. 38, D355-D360 (2010). http://www.genome.jp/kegg/
    • Comparing Datasets: Based on the (normalized) number of reads assigned to each node. Each color represents a dataset.
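The slide does not say how the read counts are normalized. One common choice, assumed here purely for illustration, is to scale every dataset to the same total number of assigned reads (by default, the total of the smallest dataset) so that per-node counts become comparable. The function name and the data layout are hypothetical.

```python
from typing import Dict, Optional

Counts = Dict[str, float]  # node name -> number of reads assigned to that node


def normalize_counts(
    counts_by_dataset: Dict[str, Counts],
    target_total: Optional[float] = None,
) -> Dict[str, Counts]:
    """Scale each dataset so that its counts sum to the same total.

    If no target is given, the smallest dataset total is used (an assumption,
    not necessarily what MEGAN does).
    """
    totals = {name: sum(c.values()) for name, c in counts_by_dataset.items()}
    if target_total is None:
        target_total = min(totals.values())
    normalized: Dict[str, Counts] = {}
    for name, counts in counts_by_dataset.items():
        # Guard against empty datasets to avoid dividing by zero.
        scale = target_total / totals[name] if totals[name] else 0.0
        normalized[name] = {node: n * scale for node, n in counts.items()}
    return normalized
```

For example, a dataset with 200 assigned reads compared against one with 100 would have each of its node counts halved before the comparison is drawn.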
    • DB Extension - PostgreSQL: MEGAN communicates with a PostgreSQL database. Many datasets are available in one database instance. Many users can operate on the same database instance. This avoids redundant copies of datasets that are often large. http://www.postgresql.org/
    • What is Metadata? Metadata are, for example, environmental parameters recorded together with the actual metagenomic sample, e.g. collection date, gender, health status, ...

          Dataset        Month     Salinity  Ammonia
          January_2PM    January   33.3      0.0
          January_10PM   January   34.2      0.0
          August_4AM     August    33.3      0.14
          August_10AM    August    32.1      0.06

      Datasets taken from: The taxonomic and functional diversity of microbes at a temperate coastal site: a 'multi-omic' study of the seasonal and diel temporal variation; Gilbert et al. (2010)
    • Month ∈ {Dec, Jan, Feb} → Winter: January_2PM, January_10PM
      Month ∈ {Jun, Jul, Aug} → Summer: August_4AM, August_10AM
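The grouping on this slide amounts to evaluating a boolean predicate on each dataset's metadata. A minimal sketch follows, using the four datasets and metadata values from the table in the talk. The `pool` function, the `RULES` mapping and the dictionary layout are hypothetical and do not reflect MEGAN's actual interface.

```python
from typing import Callable, Dict, List

Metadata = Dict[str, object]

# Metadata values as listed in the talk (Gilbert et al. (2010) datasets).
DATASETS: Dict[str, Metadata] = {
    "January_2PM": {"Month": "January", "Salinity": 33.3, "Ammonia": 0.0},
    "January_10PM": {"Month": "January", "Salinity": 34.2, "Ammonia": 0.0},
    "August_4AM": {"Month": "August", "Salinity": 33.3, "Ammonia": 0.14},
    "August_10AM": {"Month": "August", "Salinity": 32.1, "Ammonia": 0.06},
}

WINTER_MONTHS = {"December", "January", "February"}
SUMMER_MONTHS = {"June", "July", "August"}

# Hypothetical pooling rules: pool name -> boolean predicate on the metadata.
RULES: Dict[str, Callable[[Metadata], bool]] = {
    "Winter": lambda m: m["Month"] in WINTER_MONTHS,
    "Summer": lambda m: m["Month"] in SUMMER_MONTHS,
}


def pool(
    datasets: Dict[str, Metadata],
    rules: Dict[str, Callable[[Metadata], bool]],
) -> Dict[str, List[str]]:
    """Assign each dataset to every pool whose predicate it satisfies."""
    pools: Dict[str, List[str]] = {name: [] for name in rules}
    for dataset_name, metadata in datasets.items():
        for pool_name, predicate in rules.items():
            if predicate(metadata):
                pools[pool_name].append(dataset_name)
    return pools
```

Because a rule is an arbitrary predicate, compound conditions work the same way, for instance `lambda m: m["Salinity"] > 33 and m["Ammonia"] == 0.0`, which selects the two January datasets.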
    • Basic Idea: Create two new datasets (winter, summer) from the four BLAST files. Problems: this doubles the space consumption and is time-inefficient. Idea: use database technology to avoid redundancy and to save time and space.
    • Primary & Combined Datasets in the Database: A primary dataset is a dataset created from the original BLAST output and the reads file. A combined dataset is created from primary datasets, using references to the read and match data of the primary datasets and, optionally, also the classification data of the primary datasets. Hence, a combined dataset can be created in a time- and space-efficient way.
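The reference-based design described here can be sketched with a small relational schema. The talk uses PostgreSQL; the sketch below swaps in Python's built-in `sqlite3` module so that it runs without a server, and every table, column and function name is an assumption rather than MEGAN's actual schema. The point it illustrates is that a combined dataset stores only links to its primary datasets, so no read data is copied.

```python
import sqlite3
from typing import List


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with a hypothetical three-table schema."""
    con = sqlite3.connect(":memory:")
    con.executescript(
        """
        CREATE TABLE dataset (
            id INTEGER PRIMARY KEY,
            name TEXT UNIQUE NOT NULL,
            is_combined INTEGER NOT NULL
        );
        CREATE TABLE reads (
            id INTEGER PRIMARY KEY,
            dataset_id INTEGER NOT NULL REFERENCES dataset(id),
            sequence TEXT NOT NULL
        );
        -- A combined dataset is just a set of references to primary datasets.
        CREATE TABLE combined_member (
            combined_id INTEGER NOT NULL REFERENCES dataset(id),
            primary_id INTEGER NOT NULL REFERENCES dataset(id),
            PRIMARY KEY (combined_id, primary_id)
        );
        """
    )
    return con


def add_primary(con: sqlite3.Connection, name: str, sequences: List[str]) -> int:
    """Store a primary dataset together with its reads; return its id."""
    cur = con.execute(
        "INSERT INTO dataset (name, is_combined) VALUES (?, 0)", (name,)
    )
    dataset_id = cur.lastrowid
    con.executemany(
        "INSERT INTO reads (dataset_id, sequence) VALUES (?, ?)",
        [(dataset_id, seq) for seq in sequences],
    )
    return dataset_id


def add_combined(con: sqlite3.Connection, name: str, primary_ids: List[int]) -> int:
    """Create a combined dataset that only references primary datasets."""
    cur = con.execute(
        "INSERT INTO dataset (name, is_combined) VALUES (?, 1)", (name,)
    )
    combined_id = cur.lastrowid
    con.executemany(
        "INSERT INTO combined_member (combined_id, primary_id) VALUES (?, ?)",
        [(combined_id, pid) for pid in primary_ids],
    )
    return combined_id


def reads_of_combined(con: sqlite3.Connection, combined_id: int) -> List[str]:
    """Resolve the reads of a combined dataset through its references."""
    rows = con.execute(
        """
        SELECT r.sequence
        FROM combined_member AS m
        JOIN reads AS r ON r.dataset_id = m.primary_id
        WHERE m.combined_id = ?
        ORDER BY r.id
        """,
        (combined_id,),
    )
    return [row[0] for row in rows]
```

Creating a combined dataset in this sketch inserts one row per referenced primary dataset, regardless of how many reads those datasets contain, which is the kind of saving the slide refers to.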
    • Creating Combined Datasets in MEGAN (figure slides)
    • Analysis: Input: 8 primary datasets, altogether ~100,000 reads, ~4 million matches and ~4.5 GB of space. Loading these datasets into the database takes ~50 minutes. Three combined datasets (winter, spring, summer) are created. Their creation takes ~30 seconds and needs ~40 MB of additional space. Alternatively, combined datasets can be created on the fly. This takes less than a second and needs no additional space.
    • Comparing all Datasets (figure slide)
    • Comparing by Season (figure slide)
    • Summary & Conclusion: MEGAN communicates with a PostgreSQL database. This gives the user access to many datasets. Many users can work on the database simultaneously. Primary datasets can be pooled to create combined datasets. The MetaData Analyzer allows one to create combined datasets by applying boolean expressions to the assigned metadata. This technique is highly space- and time-efficient.
    • MEGAN v4 is freely available from www-ab.informatik.uni-tuebingen.de/software/megan. Integrative analysis of environmental sequences using MEGAN4; Daniel H. Huson, Suparna Mitra, Hans-Joachim Ruscheweyh, Nico Weber, Stephan C. Schuster; submitted 2011. Thanks go to Daniel Huson, Suparna Mitra, Nico Weber and Stephan Schuster. Thank you for your attention!