Introduction   MEGAN           Metadata         Pooling Datasets     Summary & Conclusion




           Pooling metagenomes in MEGAN based on
                   environmental parameters

                       Hans-Joachim Ruscheweyh

                   Center for Bioinformatics, Tuebingen University


                                 June 15, 2011




1 / 27            Hans-Joachim Ruscheweyh   Pooling metagenomes


         1     Introduction Metagenomics
                  Unculturable Microbes
                  Typical Metagenomic Samples
                  Pipeline
         2     MEGAN
                  MEGAN Introduction
                  Taxonomic & Functional Analysis
                  Comparison Analysis
                  PostgreSQL
         3     Metadata
                  What is Metadata?
                  Using Metadata to pool Datasets
         4     Pooling Datasets
                  Basic Idea
                  Combined Datasets
                  MetaData Analyzer
         5     Summary & Conclusion



 Metagenomics




               The study of DNA of uncultured organisms
               > 99% of all microbes cannot be cultured
               A genome is the entire genetic information of a single
               organism
               A metagenome is the entire genetic information of an
               assemblage of organisms







 Typical Metagenomic Samples



               Human microbiome
               Soil samples
               Sea water samples
               Seabed samples
               Air samples
               Medical samples
               Ancient bones







 Metagenomic Pipeline




         [Figure: metagenomic pipeline, from "A primer on metagenomics"; Wooley et al. (2010)]




 MEGAN Introduction




         Interactive tool for metagenomic analysis - www-ab.informatik.uni-tuebingen.de/software/megan




 Taxonomic Analysis

               The tree reflects the NCBI taxonomy
               Reads are compared against a reference database, e.g. NR
               Reads are mapped onto the tree using the comparison
               results, based on the LCA algorithm
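
The LCA (lowest common ancestor) placement described above can be illustrated with a minimal sketch. The toy taxonomy, the function names, and the input format are assumptions made for illustration; this is not MEGAN's code, and MEGAN applies additional filters to the matches that are left out here.

```python
# Toy taxonomy as child -> parent links (hypothetical subset, not the real NCBI tree).
PARENT = {
    "Bacteria": "root",
    "Proteobacteria": "Bacteria",
    "Gammaproteobacteria": "Proteobacteria",
    "Escherichia": "Gammaproteobacteria",
    "Vibrio": "Gammaproteobacteria",
    "Cyanobacteria": "Bacteria",
    "Synechococcus": "Cyanobacteria",
}


def path_to_root(taxon):
    """Return the lineage of a taxon as a list that starts at the root."""
    path = [taxon]
    while path[-1] != "root":
        path.append(PARENT[path[-1]])
    return path[::-1]


def lca(taxa):
    """Lowest common ancestor of all taxa that one read matched."""
    paths = [path_to_root(t) for t in taxa]
    ancestor = "root"
    # Walk down from the root for as long as all lineages agree.
    for level in zip(*paths):
        if len(set(level)) != 1:
            break
        ancestor = level[0]
    return ancestor


def assign_reads(hits_per_read):
    """Place each read on the LCA of its hits and count reads per node."""
    counts = {}
    for taxa in hits_per_read.values():
        node = lca(taxa)
        counts[node] = counts.get(node, 0) + 1
    return counts
```

A read that matches only one taxon stays on that taxon, while a read whose matches are spread over several taxa moves up the tree to the node they share.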






 Functional Analysis - SEED


               The tree contains the nodes of the SEED classification
               Reads are mapped onto the SEED classification


                                               www.theSEED.org







 Functional Analysis - KEGG




               KEGG: Kanehisa et al., Nucleic Acids Res. 38, D355-D360 (2010)
               http://www.genome.jp/kegg/







 Comparing Datasets


               Based on the (normalized) number of reads assigned to each
               node
               Each color represents a dataset
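
One way to make per-node read counts comparable across datasets of different size is to rescale every dataset to a common total. The sketch below assumes scaling to the smallest dataset by default; the slides do not say which normalization MEGAN uses, and the function name and data layout are hypothetical.

```python
def normalize_counts(datasets, target=None):
    """Scale per-node read counts so that every dataset sums to the same total.

    datasets maps a dataset name to a dict of node -> read count.
    """
    totals = {name: sum(counts.values()) for name, counts in datasets.items()}
    if target is None:
        # Assumption: use the size of the smallest dataset as the common total.
        target = min(totals.values())
    normalized = {}
    for name, counts in datasets.items():
        # An empty dataset stays empty instead of causing a division by zero.
        factor = target / totals[name] if totals[name] else 0.0
        normalized[name] = {node: n * factor for node, n in counts.items()}
    return normalized
```

After this step, a node with equal relative abundance in two datasets gets the same value in both, regardless of how many reads each dataset contains.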







 DB Extension - PostgreSQL



               MEGAN communicates with a PostgreSQL database
               Many datasets are available in one database instance
               Many users can operate on the same database instance
               This avoids redundant copies of often large datasets

               http://www.postgresql.org/







 What is Metadata?


          Metadata are, for example, environmental parameters recorded
          together with the actual metagenomic sample, e.g. collection
          date, gender, health status, ...

          Dataset         Month      Salinity   Ammonia
          January_2PM     January    33.3       0.0
          January_10PM    January    34.2       0.0
          August_4AM      August     33.3       0.14
          August_10AM     August     32.1       0.06

          Datasets taken from: The taxonomic and functional diversity of microbes at a temperate coastal site: a 'multi-omic'
          study of the seasonal and diel temporal variation; Gilbert et al. (2010)








          January_2PM, January_10PM    Month ∈ {Dec, Jan, Feb}    →  Winter
          August_4AM, August_10AM      Month ∈ {Jun, Jul, Aug}    →  Summer
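
The grouping shown above amounts to evaluating a predicate on each dataset's metadata. The following is a minimal sketch under that reading; the metadata values are the ones from the table in this talk, while the function names, the dictionary layout, and the use of full month names are assumptions made for illustration.

```python
# Months per season; full month names are used to match the metadata table.
SEASONS = {
    "Winter": {"December", "January", "February"},
    "Summer": {"June", "July", "August"},
}

# Metadata rows as listed in the talk (samples from Gilbert et al. 2010).
METADATA = {
    "January_2PM": {"Month": "January", "Salinity": 33.3, "Ammonia": 0.0},
    "January_10PM": {"Month": "January", "Salinity": 34.2, "Ammonia": 0.0},
    "August_4AM": {"Month": "August", "Salinity": 33.3, "Ammonia": 0.14},
    "August_10AM": {"Month": "August", "Salinity": 32.1, "Ammonia": 0.06},
}


def pool_by(metadata, predicate):
    """Return the names of all datasets whose metadata satisfy the predicate."""
    return sorted(name for name, row in metadata.items() if predicate(row))


def pool_by_season(metadata, seasons=SEASONS):
    """Group dataset names by season, based on the Month attribute."""
    return {
        season: pool_by(metadata, lambda row, m=months: row["Month"] in m)
        for season, months in seasons.items()
    }
```

Because `pool_by` accepts any predicate, the same helper also covers compound conditions, for example `pool_by(METADATA, lambda r: r["Month"] == "August" and r["Ammonia"] > 0.1)`.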







 Basic Idea



               Create two new datasets (winter, summer) from the four
               BLAST files
               Problems:
                   Doubles space consumption
                   Is time-consuming
               Idea:
                   Use database technology to avoid redundancy and to save
                   time and space







 Primary & Combined Datasets in the Database



               A primary dataset is a dataset created from the original
               BLAST output and the reads file
               A combined dataset is created from primary datasets
               A combined dataset is created by using:
                   References to read and match data of the primary datasets
                   Optionally also the classification data of the primary
                   datasets
               Hence, a combined dataset can be created in a time- and
               space-efficient way
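
The idea of building a combined dataset from references rather than copies can be sketched with a small relational schema. The tables, columns, and functions below are hypothetical and do not reflect MEGAN's actual database layout; Python's built-in `sqlite3` stands in for PostgreSQL so that the example is self-contained.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with a hypothetical, simplified schema."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE dataset (id INTEGER PRIMARY KEY, name TEXT UNIQUE,
                              is_combined INTEGER);
        CREATE TABLE reads (id INTEGER PRIMARY KEY,
                            dataset_id INTEGER REFERENCES dataset(id),
                            taxon TEXT);
        -- A combined dataset stores only references to its primary datasets.
        CREATE TABLE combined_member (combined_id INTEGER REFERENCES dataset(id),
                                      primary_id INTEGER REFERENCES dataset(id));
    """)
    return con


def add_primary(con, name, taxa):
    """Store a primary dataset together with its (already classified) reads."""
    cur = con.execute(
        "INSERT INTO dataset (name, is_combined) VALUES (?, 0)", (name,))
    ds_id = cur.lastrowid
    con.executemany("INSERT INTO reads (dataset_id, taxon) VALUES (?, ?)",
                    [(ds_id, t) for t in taxa])
    return ds_id


def add_combined(con, name, primary_ids):
    """Create a combined dataset; no read data is copied."""
    cur = con.execute(
        "INSERT INTO dataset (name, is_combined) VALUES (?, 1)", (name,))
    combined_id = cur.lastrowid
    con.executemany(
        "INSERT INTO combined_member (combined_id, primary_id) VALUES (?, ?)",
        [(combined_id, p) for p in primary_ids])
    return combined_id


def taxon_counts(con, combined_id):
    """Count reads per taxon by following the references to primary datasets."""
    rows = con.execute("""
        SELECT r.taxon, COUNT(*) FROM reads r
        JOIN combined_member m ON m.primary_id = r.dataset_id
        WHERE m.combined_id = ?
        GROUP BY r.taxon
    """, (combined_id,)).fetchall()
    return dict(rows)
```

In this sketch, creating a combined dataset adds one row per referenced primary dataset, so its cost does not depend on the number of reads and matches.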







 Creating Combined Datasets in MEGAN







 Analysis


               Input: 8 primary datasets; altogether ~100,000 reads, ~4
               million matches, ~4.5 GB of space
               Loading these datasets into the database takes ~50 minutes
               Three combined datasets (winter, spring, summer) are
               created
               Their creation takes ~30 seconds and needs ~40 MB of
               additional space
               Alternatively, combined datasets can be created on the fly;
               this takes less than a second and needs no additional
               space






 Comparing all Datasets







 Comparing by Season







 Summary & Conclusion


               MEGAN communicates with a PostgreSQL database
               This gives the user access to many datasets
               Many users can work on the database simultaneously
               Primary datasets can be pooled to create combined
               datasets
               The MetaData Analyzer allows one to create combined
               datasets by applying Boolean expressions to the
               assigned metadata
               This technique is highly space- and time-efficient








               MEGAN v4 is freely available from
               www-ab.informatik.uni-tuebingen.de/software/megan
               Integrative analysis of environmental sequences using
               MEGAN4; Daniel H. Huson, Suparna Mitra, Hans-Joachim
               Ruscheweyh, Nico Weber, Stephan C. Schuster; submitted
               2011
               Thanks go to Daniel Huson, Suparna Mitra, Nico Weber,
               Stephan Schuster




          Thank you for your attention!
Venice Juanillas at #ICG13: Rice Galaxy: an open resource for plant scienceVenice Juanillas at #ICG13: Rice Galaxy: an open resource for plant science
Venice Juanillas at #ICG13: Rice Galaxy: an open resource for plant science
 
Stefan Prost at #ICG13: Genome analyses show strong selection on coloration, ...
Stefan Prost at #ICG13: Genome analyses show strong selection on coloration, ...Stefan Prost at #ICG13: Genome analyses show strong selection on coloration, ...
Stefan Prost at #ICG13: Genome analyses show strong selection on coloration, ...
 
Lisa Johnson at #ICG13: Re-assembly, quality evaluation, and annotation of 67...
Lisa Johnson at #ICG13: Re-assembly, quality evaluation, and annotation of 67...Lisa Johnson at #ICG13: Re-assembly, quality evaluation, and annotation of 67...
Lisa Johnson at #ICG13: Re-assembly, quality evaluation, and annotation of 67...
 
Chris Armit at IDW2018: Democratising Data Publishing: A Global Perspective
Chris Armit at IDW2018: Democratising Data Publishing: A Global PerspectiveChris Armit at IDW2018: Democratising Data Publishing: A Global Perspective
Chris Armit at IDW2018: Democratising Data Publishing: A Global Perspective
 
EMBL OA Week: FAIR or unfair? Principled publishing for more Open & Democrati...
EMBL OA Week: FAIR or unfair? Principled publishing for more Open & Democrati...EMBL OA Week: FAIR or unfair? Principled publishing for more Open & Democrati...
EMBL OA Week: FAIR or unfair? Principled publishing for more Open & Democrati...
 
Reproducible method and benchmarking publishing for the data (and evidence) d...
Reproducible method and benchmarking publishing for the data (and evidence) d...Reproducible method and benchmarking publishing for the data (and evidence) d...
Reproducible method and benchmarking publishing for the data (and evidence) d...
 
Mary Ann Tuli: What MODs can learn from Journals – a GigaDB curator’s perspec...
Mary Ann Tuli: What MODs can learn from Journals – a GigaDB curator’s perspec...Mary Ann Tuli: What MODs can learn from Journals – a GigaDB curator’s perspec...
Mary Ann Tuli: What MODs can learn from Journals – a GigaDB curator’s perspec...
 

Hans-Joachim Ruscheweyh: Pooling Metagenomes in MEGAN Based on Environmental Parameters

  • 1. Introduction MEGAN Metadata Pooling Datasets Summary & Conclusion Pooling metagenomes in MEGAN based on environmental parameters Hans-Joachim Ruscheweyh Center for Bioinformatics, Tuebingen University June 15, 2011 1 / 27 Hans-Joachim Ruscheweyh Pooling metagenomes
  • 2. 1 Introduction Metagenomics
          Unculturable Microbes
          Typical Metagenomic Samples
          Pipeline
       2 MEGAN
          MEGAN Introduction
          Taxonomic & Functional Analysis
          Comparison Analysis
          PostgreSQL
       3 Metadata
          What is Metadata?
          Using Metadata to pool Datasets
       4 Pooling Datasets
          Basic Idea
          Combined Datasets
          MetaData Analyzer
       5 Summary & Conclusion
  • 4. Metagenomics
      - The study of DNA of uncultured organisms
      - More than 99% of all microbes cannot be cultured
      - A genome is the entire genetic information of a single organism
      - A metagenome is the entire genetic information of an assemblage of organisms
  • 5. Typical Metagenomic Samples
      - Human microbiome
      - Soil samples
      - Sea water samples
      - Seabed samples
      - Air samples
      - Medical samples
      - Ancient bones
  • 6. Metagenomic Pipeline
      - Source: A primer on metagenomics; Wooley et al. (2010)
  • 8. MEGAN Introduction
      - Interactive tool for metagenomic analysis
      - www-ab.informatik.uni-tuebingen.de/software/megan
  • 9. Taxonomic Analysis
      - The tree reflects the NCBI taxonomy
      - Reads are compared against a reference database, e.g. NR
      - Reads are mapped onto the tree from the comparison results, using the LCA (lowest common ancestor) algorithm
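The LCA mapping on the Taxonomic Analysis slide can be illustrated with a minimal sketch. It assumes a toy taxonomy stored as child-to-parent links and places a read at the deepest node shared by all taxa it matched. The `PARENT` table and both function names are hypothetical, and MEGAN's actual implementation is not shown in the slides and may filter matches before this step.

```python
# Toy taxonomy as child -> parent links (hypothetical subset, not the real NCBI tree).
PARENT = {
    "Bacteria": "root",
    "Proteobacteria": "Bacteria",
    "Gammaproteobacteria": "Proteobacteria",
    "Alphaproteobacteria": "Proteobacteria",
    "Escherichia coli": "Gammaproteobacteria",
    "Vibrio cholerae": "Gammaproteobacteria",
    "Pelagibacter ubique": "Alphaproteobacteria",
}


def path_to_root(taxon):
    """Return the list of taxa from `taxon` up to the root, inclusive."""
    path = [taxon]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path


def lowest_common_ancestor(taxa):
    """Return the deepest node that is an ancestor of (or equal to) every taxon."""
    paths = [path_to_root(t) for t in taxa]
    shared = set(paths[0]).intersection(*paths[1:])
    # Walking upward from a leaf, the first shared node is the deepest shared one.
    return next(node for node in paths[0] if node in shared)


# A read whose matches hit two species is placed on their common ancestor.
print(lowest_common_ancestor(["Escherichia coli", "Vibrio cholerae"]))
print(lowest_common_ancestor(["Escherichia coli", "Pelagibacter ubique"]))
```

The more distantly related the matched taxa are, the higher in the tree the read is placed, which is why ambiguous reads end up on coarse taxonomic ranks.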
  • 10. Functional Analysis - SEED
      - The tree contains the nodes of the SEED classification
      - Reads are mapped onto the SEED classification
      - www.theSEED.org
  • 11. Functional Analysis - KEGG
      - KEGG: Kanehisa et al., Nucleic Acids Res. 38, D355-D360 (2010)
      - http://www.genome.jp/kegg/
  • 12. Comparing Datasets
      - Based on the (normalized) number of reads assigned to each node
      - Each color denotes a dataset
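The slide does not say how the read counts are normalized. As a sketch under one common assumption, each dataset can be scaled to the size of the smallest dataset so that per-node counts become comparable. The function name and the counts below are made up for illustration and are not taken from MEGAN.

```python
def normalize_counts(datasets):
    """Scale each dataset's per-node read counts to the size of the smallest dataset."""
    totals = {name: sum(counts.values()) for name, counts in datasets.items()}
    target = min(totals.values())
    return {
        name: {node: count * target / totals[name] for node, count in counts.items()}
        for name, counts in datasets.items()
    }


# Made-up per-node read counts for two datasets of different size.
datasets = {
    "January_2PM": {"Bacteria": 800, "Archaea": 200},   # 1,000 reads in total
    "August_4AM": {"Bacteria": 3000, "Archaea": 1000},  # 4,000 reads in total
}

normalized = normalize_counts(datasets)
print(normalized["January_2PM"])
print(normalized["August_4AM"])
```

After scaling, both datasets sum to 1,000 reads, so a difference at a node reflects a difference in relative abundance rather than in sequencing depth.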
  • 13. DB Extension - PostgreSQL
      - MEGAN communicates with a PostgreSQL database
      - Many datasets are available in one database instance
      - Many users can operate on the same database instance
      - This avoids redundant copies of the often large datasets
      - http://www.postgresql.org/
  • 15. What is Metadata?
      - Metadata are, for example, environmental parameters recorded together with the actual metagenomic sample
      - e.g. collection date, gender, health status, ...

        Dataset        Month     Salinity   Ammonia
        January_2PM    January   33.3       0.0
        January_10PM   January   34.2       0.0
        August_4AM     August    33.3       0.14
        August_10AM    August    32.1       0.06

      - Datasets taken from: The taxonomic and functional diversity of microbes at a temperate coastal site: a 'multi-omic' study of the seasonal and diel temporal variation; Gilbert et al. (2010)
  • 16. Month ∈ {Dec, Jan, Feb}: January_2PM, January_10PM → Winter
       Month ∈ {Jun, Jul, Aug}: August_4AM, August_10AM → Summer
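The grouping rule on this slide can be sketched as a membership test on the Month attribute. The metadata values are the ones from the table on the preceding slide; the dictionary layout and the function name `pool_by_season` are assumptions made for illustration, not MEGAN's interface.

```python
# Metadata as shown on the "What is Metadata?" slide.
METADATA = {
    "January_2PM": {"Month": "January", "Salinity": 33.3, "Ammonia": 0.0},
    "January_10PM": {"Month": "January", "Salinity": 34.2, "Ammonia": 0.0},
    "August_4AM": {"Month": "August", "Salinity": 33.3, "Ammonia": 0.14},
    "August_10AM": {"Month": "August", "Salinity": 32.1, "Ammonia": 0.06},
}

# Season rules from the slide: a dataset belongs to a season if its month is in the set.
SEASONS = {
    "Winter": {"December", "January", "February"},
    "Summer": {"June", "July", "August"},
}


def pool_by_season(metadata, seasons):
    """Group dataset names by the season whose month set contains their Month value."""
    pools = {season: [] for season in seasons}
    for dataset, attributes in metadata.items():
        for season, months in seasons.items():
            if attributes["Month"] in months:
                pools[season].append(dataset)
    return pools


pools = pool_by_season(METADATA, SEASONS)
print(pools)
```

The same pattern extends to other attributes, for example pooling by a salinity threshold instead of by month.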
  • 18. Basic Idea
      - Create two new datasets (winter, summer) from the four BLAST files
      - Problems:
          - Doubles space consumption
          - Is time-inefficient
      - Idea: Use database technology to avoid redundancy and save time and space
  • 19. Primary & Combined Datasets in the Database
      - A primary dataset is a dataset created from the original BLAST output and the reads file
      - A combined dataset is created from primary datasets
      - A combined dataset is created by using:
          - References to the read and match data of the primary datasets
          - Optionally also the classification data of the primary datasets
      - Hence, a combined dataset can be created in a time- and space-efficient way
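The reference-based design can be sketched with a small relational schema. The talk uses PostgreSQL, but this example substitutes Python's built-in `sqlite3` module so it runs without a server. All table, column, and function names are hypothetical, since the slides do not show MEGAN's real schema. The point is that a combined dataset stores only links to its primary datasets, never copies of their reads.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dataset (
    id INTEGER PRIMARY KEY,
    name TEXT UNIQUE,
    is_combined INTEGER DEFAULT 0
);
CREATE TABLE reads (
    id INTEGER PRIMARY KEY,
    dataset_id INTEGER REFERENCES dataset(id),
    sequence TEXT
);
-- A combined dataset is just a set of references to primary datasets.
CREATE TABLE combined_member (
    combined_id INTEGER REFERENCES dataset(id),
    primary_id INTEGER REFERENCES dataset(id)
);
""")


def add_primary(name, sequences):
    """Store a primary dataset together with its reads; return its id."""
    cur = conn.execute("INSERT INTO dataset (name) VALUES (?)", (name,))
    conn.executemany(
        "INSERT INTO reads (dataset_id, sequence) VALUES (?, ?)",
        [(cur.lastrowid, seq) for seq in sequences],
    )
    return cur.lastrowid


def add_combined(name, primary_ids):
    """Create a combined dataset that only references existing primary datasets."""
    cur = conn.execute("INSERT INTO dataset (name, is_combined) VALUES (?, 1)", (name,))
    conn.executemany(
        "INSERT INTO combined_member VALUES (?, ?)",
        [(cur.lastrowid, pid) for pid in primary_ids],
    )
    return cur.lastrowid


def reads_of_combined(combined_id):
    """Resolve the references and return the reads of all member datasets."""
    rows = conn.execute(
        "SELECT r.sequence FROM combined_member m "
        "JOIN reads r ON r.dataset_id = m.primary_id "
        "WHERE m.combined_id = ? ORDER BY r.id",
        (combined_id,),
    )
    return [row[0] for row in rows]


jan_2pm = add_primary("January_2PM", ["ACGT", "GGCA"])
jan_10pm = add_primary("January_10PM", ["TTAG"])
winter = add_combined("Winter", [jan_2pm, jan_10pm])

print(reads_of_combined(winter))
read_rows = conn.execute("SELECT COUNT(*) FROM reads").fetchone()[0]
print(read_rows)  # creating "Winter" added no rows to the reads table
```

Creating the combined dataset inserts one row per member instead of one row per read, which is where the time and space savings come from in this sketch.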
  • 20. Creating Combined Datasets in MEGAN
  • 26. Analysis
      - Input: 8 primary datasets; altogether ~100,000 reads, ~4 million matches, ~4.5 GB of space
      - It takes ~50 minutes to load these datasets into the database
      - Three combined datasets (winter, spring, summer) are created
      - Their creation takes ~30 seconds and needs ~40 MB of additional space
      - Alternatively, combined datasets can be created on the fly; this takes less than a second and needs no additional space
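To put the figures on the Analysis slide in proportion, the short calculation below relates the cost of the three combined datasets to the cost of the primary datasets. It uses only the approximate numbers from the slide and assumes decimal units (1 GB = 1000 MB). Loading and combining are different operations, so the ratio is only an order-of-magnitude comparison.

```python
# Approximate figures quoted on the slide.
primary_space_gb = 4.5   # space used by the 8 primary datasets
load_minutes = 50        # time to load the primary datasets into the database
combined_space_mb = 40   # additional space for the three combined datasets
combined_seconds = 30    # time to create the three combined datasets

# Assumption: decimal units, 1 GB = 1000 MB.
extra_space_fraction = combined_space_mb / (primary_space_gb * 1000)
time_ratio = load_minutes * 60 / combined_seconds

print(f"Additional space: {extra_space_fraction:.1%} of the primary data")
print(f"Creation is about {time_ratio:.0f}x faster than the initial load")
```

Under these assumptions the combined datasets add less than 1% to the stored data, compared with roughly doubling it when pooled datasets are rebuilt from the BLAST files.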
  • 27. Comparing all Datasets
  • 28. Comparing by Season
  • 32. Summary & Conclusion
      - MEGAN communicates with a PostgreSQL database
      - This gives the user access to many datasets
      - Many users can work on the database simultaneously
      - Primary datasets can be pooled to create combined datasets
      - The MetaData Analyzer allows one to create combined datasets by applying boolean expressions to the assigned metadata
      - This technique is highly space- and time-efficient
  • 33. MEGAN v4 is freely available from www-ab.informatik.uni-tuebingen.de/software/megan
      - Integrative analysis of environmental sequences using MEGAN4; Daniel H. Huson, Suparna Mitra, Hans-Joachim Ruscheweyh, Nico Weber, Stephan C. Schuster; submitted 2011
      - Thanks go to Daniel Huson, Suparna Mitra, Nico Weber, Stephan Schuster
      - Thank you for your attention!