Introduction   MEGAN           Metadata         Pooling Datasets     Summary & Conclusion




           Pooling metagenomes in MEGAN based on
                   environmental parameters

                       Hans-Joachim Ruscheweyh

                   Center for Bioinformatics, Tuebingen University


                                 June 15, 2011




1 / 27            Hans-Joachim Ruscheweyh   Pooling metagenomes


         1     Introduction Metagenomics
                  Unculturable Microbes
                  Typical Metagenomic Samples
                  Pipeline
         2     MEGAN
                  MEGAN Introduction
                  Taxonomic & Functional Analysis
                  Comparison Analysis
                  PostgreSQL
         3     Metadata
                  What is Metadata?
                  Using Metadata to pool Datasets
         4     Pooling Datasets
                  Basic Idea
                  Combined Datasets
                  MetaData Analyzer
         5     Summary & Conclusion



 Metagenomics




               The study of DNA of uncultured organisms
               > 99% of all microbes cannot be cultured
               A genome is the entire genetic information of a single organism
               A metagenome is the entire genetic information of an assemblage of organisms







 Typical Metagenomic Samples



               Human microbiome
               Soil samples
               Sea water samples
               Seabed samples
               Air samples
               Medical samples
               Ancient bones







 Metagenomic Pipeline




         Figure source: A primer on metagenomics; Wooley et al. (2010)




 MEGAN Introduction




         Interactive tool for metagenomic analysis - www-ab.informatik.uni-tuebingen.de/software/megan




 Taxonomic Analysis

               Tree reflects the NCBI taxonomy
               Reads are compared against a reference database, e.g. NR
               Reads are mapped onto the tree using the comparison results, based on the LCA (lowest common ancestor) algorithm
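
The LCA idea can be sketched as follows. This is a minimal illustration, not MEGAN's implementation: the toy taxonomy, the read names, and the hit lists are hypothetical, and any filtering of hits by score that a real tool would apply is omitted. Each read is placed on the deepest taxonomy node that is an ancestor of all taxa it matched.

```python
# Toy taxonomy as child -> parent links (hypothetical, not the real NCBI tree).
PARENT = {
    "Bacteria": "root",
    "Proteobacteria": "Bacteria",
    "Gammaproteobacteria": "Proteobacteria",
    "Escherichia coli": "Gammaproteobacteria",
    "Vibrio cholerae": "Gammaproteobacteria",
    "Firmicutes": "Bacteria",
    "Bacillus subtilis": "Firmicutes",
}


def path_to_root(taxon):
    """Return the nodes from the root down to the given taxon."""
    path = [taxon]
    while path[-1] != "root":
        path.append(PARENT[path[-1]])
    return path[::-1]


def lowest_common_ancestor(taxa):
    """Return the deepest node shared by the root paths of all given taxa."""
    paths = [path_to_root(t) for t in taxa]
    lca = "root"
    # Walk down from the root level by level until the paths diverge.
    for nodes in zip(*paths):
        if len(set(nodes)) != 1:
            break
        lca = nodes[0]
    return lca


def assign_reads(hits_per_read):
    """Count how many reads are assigned to each taxonomy node."""
    counts = {}
    for taxa in hits_per_read.values():
        node = lowest_common_ancestor(taxa)
        counts[node] = counts.get(node, 0) + 1
    return counts


# Hypothetical comparison results: read -> taxa of its database matches.
hits = {
    "read1": ["Escherichia coli"],
    "read2": ["Escherichia coli", "Vibrio cholerae"],
    "read3": ["Escherichia coli", "Bacillus subtilis"],
}
assignment = assign_reads(hits)
print(assignment)
```

A read with a single, specific match stays at the species node, while a read matching distant taxa moves up towards the root.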






 Functional Analysis - SEED


               The tree contains the nodes of the SEED classification
               Reads are mapped onto the SEED classification


                                               www.theSEED.org







 Functional Analysis - KEGG




               KEGG: Kanehisa et al., Nucleic Acids Res. 38, D355-D360 (2010)
               http://www.genome.jp/kegg/







 Comparing Datasets


               Based on the (normalized) number of reads assigned to each node
               Each color represents a dataset
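
The slides do not say which normalization is applied. One common choice, sketched here as an assumption, is to scale every dataset to the same total number of reads so that per-node counts become comparable. The read counts below are made up for illustration.

```python
def normalize_counts(datasets, target=None):
    """Scale per-node read counts so that every dataset sums to `target`.

    If no target is given, the smallest dataset total is used.
    Assumes every dataset contains at least one read.
    """
    totals = {name: sum(counts.values()) for name, counts in datasets.items()}
    if target is None:
        target = min(totals.values())
    return {
        name: {node: count * target / totals[name] for node, count in counts.items()}
        for name, counts in datasets.items()
    }


# Hypothetical read counts per taxonomy node for two datasets.
raw = {
    "January_2PM": {"Bacteria": 600, "Archaea": 200, "Viruses": 200},
    "August_4AM": {"Bacteria": 300, "Archaea": 150, "Viruses": 50},
}
normalized = normalize_counts(raw)
for name, counts in normalized.items():
    print(name, counts)
```

After scaling, both datasets sum to 500 reads, so a difference at a node reflects a difference in composition rather than in sequencing depth.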







 DB Extension - PostgreSQL



               MEGAN communicates with a PostgreSQL database
               Many datasets are available in one database instance
               Many users can operate on the same database instance
               This avoids redundant copies of often large datasets

               http://www.postgresql.org/







 What is Metadata?


          Metadata are, for example, environmental parameters recorded
          together with the actual metagenomic sample, e.g. collection
          date, gender, health status, ...

          Dataset         Month      Salinity   Ammonia
          January_2PM     January    33.3       0.0
          January_10PM    January    34.2       0.0
          August_4AM      August     33.3       0.14
          August_10AM     August     32.1       0.06

          Datasets taken from: The taxonomic and functional diversity of microbes at a temperate coastal site: a 'multi-omic' study of the seasonal and diel temporal variation; Gilbert et al. (2010)








          Datasets                      Condition                   Combined dataset
          January_2PM, January_10PM     Month ∈ {Dec, Jan, Feb}     Winter
          August_4AM, August_10AM       Month ∈ {Jun, Jul, Aug}     Summer
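
The grouping shown on this slide can be sketched in a few lines. The metadata values come from the table on the previous slide; the data layout and the function name `pool_by_season` are assumptions made for illustration, and full month names are used instead of the abbreviations on the slide.

```python
# Month sets that define each season, as on the slide.
SEASONS = {
    "Winter": {"December", "January", "February"},
    "Summer": {"June", "July", "August"},
}

# Metadata per dataset, taken from the example table.
METADATA = {
    "January_2PM": {"Month": "January", "Salinity": 33.3, "Ammonia": 0.0},
    "January_10PM": {"Month": "January", "Salinity": 34.2, "Ammonia": 0.0},
    "August_4AM": {"Month": "August", "Salinity": 33.3, "Ammonia": 0.14},
    "August_10AM": {"Month": "August", "Salinity": 32.1, "Ammonia": 0.06},
}


def pool_by_season(metadata, seasons):
    """Group dataset names by the season their 'Month' value falls into."""
    pools = {season: [] for season in seasons}
    for dataset, attrs in metadata.items():
        for season, months in seasons.items():
            if attrs["Month"] in months:
                pools[season].append(dataset)
    return pools


pools = pool_by_season(METADATA, SEASONS)
print(pools)
```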







 Basic Idea



               Naive approach: create two new datasets (winter, summer) from the four
               BLAST files
               Problems:
                   Doubles space consumption
                   Is time-inefficient
               Idea:
                   Use database technology to avoid redundancy and save time
                   and space







 Primary & Combined Datasets in the Database



               A primary dataset is a dataset created from the original
               BLAST output and the reads file
               A combined dataset is created from primary datasets, using:
                   References to the read and match data of the primary datasets
                   Optionally, also the classification data of the primary datasets
               Hence, a combined dataset can be created efficiently in both time and space
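
A minimal sketch of the reference idea, under loud assumptions: it uses Python's built-in `sqlite3` as a stand-in for PostgreSQL so that it runs without a server, and the schema (`datasets`, `reads`, `combined_members`) and the read sequences are hypothetical, not MEGAN's actual database layout. Match and classification data are left out. The point is that a combined dataset stores only links to its primary datasets, so no read is copied.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE datasets (id INTEGER PRIMARY KEY, name TEXT UNIQUE, is_combined INTEGER);
CREATE TABLE reads (id INTEGER PRIMARY KEY,
                    dataset_id INTEGER REFERENCES datasets(id),
                    sequence TEXT);
-- A combined dataset is just a set of references to primary datasets.
CREATE TABLE combined_members (combined_id INTEGER REFERENCES datasets(id),
                               primary_id INTEGER REFERENCES datasets(id));
""")


def add_primary(name, sequences):
    """Store a primary dataset together with its reads."""
    cur = conn.execute("INSERT INTO datasets (name, is_combined) VALUES (?, 0)", (name,))
    dataset_id = cur.lastrowid
    conn.executemany("INSERT INTO reads (dataset_id, sequence) VALUES (?, ?)",
                     [(dataset_id, s) for s in sequences])
    return dataset_id


def add_combined(name, primary_ids):
    """Create a combined dataset that only references existing primary datasets."""
    cur = conn.execute("INSERT INTO datasets (name, is_combined) VALUES (?, 1)", (name,))
    combined_id = cur.lastrowid
    conn.executemany("INSERT INTO combined_members VALUES (?, ?)",
                     [(combined_id, p) for p in primary_ids])
    return combined_id


def reads_of_combined(combined_id):
    """Resolve the references and return the reads of all member datasets."""
    rows = conn.execute(
        "SELECT r.sequence FROM reads r "
        "JOIN combined_members m ON r.dataset_id = m.primary_id "
        "WHERE m.combined_id = ? ORDER BY r.id", (combined_id,))
    return [row[0] for row in rows]


# Hypothetical reads for two primary datasets.
jan_2pm = add_primary("January_2PM", ["ACGT", "GGCA"])
jan_10pm = add_primary("January_10PM", ["TTAG"])
winter = add_combined("Winter", [jan_2pm, jan_10pm])

winter_reads = reads_of_combined(winter)
read_rows = conn.execute("SELECT COUNT(*) FROM reads").fetchone()[0]
print(winter_reads, read_rows)
```

Creating `Winter` adds two small rows to `combined_members`, while the `reads` table still holds only the three original rows.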







 Creating Combined Datasets in MEGAN







 Analysis


               Input: 8 primary datasets; altogether ~100,000 reads, ~4 million matches, ~4.5 GB of space
               Loading these datasets into the database takes ~50 minutes
               Three combined datasets (winter, spring, summer) are created
               Their creation takes ~30 seconds and needs ~40 MB of additional space
               Alternatively, combined datasets can be created on the fly; this takes less than a second and needs no additional space






 Comparing all Datasets







 Comparing by Season







 Summary & Conclusion


               MEGAN communicates with a PostgreSQL database
               This gives the user access to many datasets
               Many users can work on the database simultaneously
               Primary datasets can be pooled to create combined datasets
               The MetaData Analyzer lets one create combined datasets by applying Boolean expressions to the assigned metadata
               This technique is highly space- and time-efficient
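
The slides do not show the expression syntax of the MetaData Analyzer, so the following is only a sketch of the underlying idea: a Boolean expression over metadata attributes selects the primary datasets that go into a combined dataset. Predicates are written as plain Python functions; the helper name `select` and the thresholds are hypothetical, and the metadata values are those of the example table.

```python
# Metadata per dataset, taken from the example table.
METADATA = {
    "January_2PM": {"Month": "January", "Salinity": 33.3, "Ammonia": 0.0},
    "January_10PM": {"Month": "January", "Salinity": 34.2, "Ammonia": 0.0},
    "August_4AM": {"Month": "August", "Salinity": 33.3, "Ammonia": 0.14},
    "August_10AM": {"Month": "August", "Salinity": 32.1, "Ammonia": 0.06},
}

WINTER_MONTHS = {"December", "January", "February"}


def select(metadata, predicate):
    """Return the names of all datasets whose metadata satisfy the predicate."""
    return [name for name, attrs in metadata.items() if predicate(attrs)]


# Conjunction: winter samples with salinity above a hypothetical threshold.
winter_high_salinity = select(
    METADATA, lambda a: a["Month"] in WINTER_MONTHS and a["Salinity"] > 34.0)

# Disjunction: samples with measurable ammonia or with low salinity.
ammonia_or_low_salinity = select(
    METADATA, lambda a: a["Ammonia"] > 0.0 or a["Salinity"] < 33.0)

print(winter_high_salinity)
print(ammonia_or_low_salinity)
```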








               MEGAN v4 is freely available from www-ab.informatik.uni-tuebingen.de/software/megan
               Daniel H. Huson, Suparna Mitra, Hans-Joachim Ruscheweyh, Nico Weber, Stephan C. Schuster: Integrative analysis of environmental sequences using MEGAN4; submitted 2011
               Thanks go to Daniel Huson, Suparna Mitra, Nico Weber, Stephan Schuster




          Thank you for your attention!
 
WebAssembly is Key to Better LLM Performance
WebAssembly is Key to Better LLM PerformanceWebAssembly is Key to Better LLM Performance
WebAssembly is Key to Better LLM Performance
 
Intro to Passkeys and the State of Passwordless.pptx
Intro to Passkeys and the State of Passwordless.pptxIntro to Passkeys and the State of Passwordless.pptx
Intro to Passkeys and the State of Passwordless.pptx
 
TopCryptoSupers 12thReport OrionX May2024
TopCryptoSupers 12thReport OrionX May2024TopCryptoSupers 12thReport OrionX May2024
TopCryptoSupers 12thReport OrionX May2024
 

Hans-Joachim Ruscheweyh: Pooling Metagenomes in MEGAN Based on Environmental Parameters

  • 1. Introduction MEGAN Metadata Pooling Datasets Summary & Conclusion Pooling metagenomes in MEGAN based on environmental parameters Hans-Joachim Ruscheweyh Center for Bioinformatics, Tuebingen University June 15, 2011 1 / 27 Hans-Joachim Ruscheweyh Pooling metagenomes
  • 2. 1 Introduction Metagenomics (Unculturable Microbes; Typical Metagenomic Samples; Pipeline). 2 MEGAN (MEGAN Introduction; Taxonomic & Functional Analysis; Comparison Analysis; PostgreSQL). 3 Metadata (What is Metadata?; Using Metadata to pool Datasets). 4 Pooling Datasets (Basic Idea; Combined Datasets; MetaData Analyzer). 5 Summary & Conclusion.
  • 4. Metagenomics: The study of DNA of uncultured organisms. > 99% of all microbes cannot be cultured. A genome is the entire genetic information of a single organism. A metagenome is the entire genetic information of an assemblage of organisms.
  • 5. Typical Metagenomic Samples: human microbiome; soil samples; sea water samples; seabed samples; air samples; medical samples; ancient bones.
  • 6. Metagenomic Pipeline (figure from: A primer on metagenomics; Wooley et al. (2010)).
  • 8. MEGAN Introduction: Interactive tool for metagenomic analysis - www-ab.informatik.uni-tuebingen.de/software/megan
  • 9. Taxonomic Analysis: The tree reflects the NCBI taxonomy. Reads are compared against a reference database, e.g. NR. Reads are mapped onto the tree using the comparison results, based on the lowest common ancestor (LCA) algorithm.
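The LCA step named on the taxonomic analysis slide can be illustrated with a minimal sketch. The toy taxonomy in `PARENT` and the function names are assumptions made for illustration only. This is not MEGAN's code, and MEGAN's actual read assignment applies additional filtering of the comparison results that is left out here.

```python
# Hypothetical, hand-made subset of a taxonomy: child -> parent.
PARENT = {
    "root": None,
    "Bacteria": "root",
    "Proteobacteria": "Bacteria",
    "Gammaproteobacteria": "Proteobacteria",
    "Escherichia coli": "Gammaproteobacteria",
    "Vibrio cholerae": "Gammaproteobacteria",
    "Alphaproteobacteria": "Proteobacteria",
    "Pelagibacter ubique": "Alphaproteobacteria",
}


def path_to_root(taxon):
    """Return the taxa on the path from `taxon` up to the root, starting at `taxon`."""
    path = []
    while taxon is not None:
        path.append(taxon)
        taxon = PARENT[taxon]
    return path


def lowest_common_ancestor(taxa):
    """Return the deepest node that is an ancestor of (or equal to) every taxon in `taxa`."""
    if not taxa:
        raise ValueError("need at least one taxon")
    common = set(path_to_root(taxa[0]))
    for taxon in taxa[1:]:
        common &= set(path_to_root(taxon))
    # Walking up from the first taxon, the first shared node is the deepest one.
    for node in path_to_root(taxa[0]):
        if node in common:
            return node
    raise ValueError("taxa do not share a root")


# A read whose matches hit two species of the same class is placed on that class.
read_placement = lowest_common_ancestor(["Escherichia coli", "Vibrio cholerae"])
```

A read whose matches also include `Pelagibacter ubique` would move up to `Proteobacteria`, which is how ambiguous reads end up on higher nodes of the tree.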
  • 10. Functional Analysis - SEED: The tree contains the nodes of the SEED classification. Reads are mapped onto the SEED classification. www.theSEED.org
  • 11. Functional Analysis - KEGG: Kanehisa et al., Nucleic Acids Res. 38, D355-D360 (2010). http://www.genome.jp/kegg/
  • 12. Comparing Datasets: Based on the (normalized) number of reads assigned to each node. Each color represents a dataset.
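The slide does not say how the read counts are normalized. One common choice, assumed here purely for illustration, is to scale every dataset down to the size of the smallest one so that per-node counts become comparable. The function name and the example counts are hypothetical.

```python
def normalize_counts(datasets):
    """Scale per-node read counts so every dataset sums to the smallest dataset's total.

    `datasets` maps a dataset name to a {node: read_count} dict.
    This is an assumed normalization scheme, not necessarily the one MEGAN uses.
    """
    totals = {name: sum(counts.values()) for name, counts in datasets.items()}
    if any(total == 0 for total in totals.values()):
        raise ValueError("cannot normalize a dataset without assigned reads")
    target = min(totals.values())
    return {
        name: {node: count * target / totals[name] for node, count in counts.items()}
        for name, counts in datasets.items()
    }


# Made-up counts: the second dataset has twice as many assigned reads as the first.
raw = {
    "winter": {"Bacteria": 60, "Archaea": 40},
    "summer": {"Bacteria": 150, "Archaea": 50},
}
normalized = normalize_counts(raw)
```

After scaling, both datasets sum to 100 reads, so a difference between two bars at the same node reflects composition rather than sequencing depth.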
  • 13. DB Extension - PostgreSQL: MEGAN communicates with a PostgreSQL database. Many datasets are available in one database instance. Many users can operate on the same database instance. This avoids redundant copies of the often large datasets. http://www.postgresql.org/
  • 15. What is Metadata? Metadata are, for example, environmental parameters recorded together with the actual metagenomic sample, e.g. collection date, gender, health status, ...

        Dataset        Month     Salinity   Ammonia
        January_2PM    January   33.3       0.0
        January_10PM   January   34.2       0.0
        August_4AM     August    33.3       0.14
        August_10AM    August    32.1       0.06

    Datasets taken from: The taxonomic and functional diversity of microbes at a temperate coastal site: a ’multi-omic’ study of the seasonal and diel temporal variation; Gilbert et al. (2010)
  • 16. Winter (Month ∈ {Dec, Jan, Feb}): January_2PM, January_10PM. Summer (Month ∈ {Jun, Jul, Aug}): August_4AM, August_10AM.
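The grouping on this slide amounts to evaluating a boolean expression over each dataset's metadata. A minimal sketch follows, using the four datasets and values from the metadata table. The `pool` helper and the predicate style are assumptions for illustration and are not the MetaData Analyzer's interface.

```python
# Metadata values as given in the talk's example table.
METADATA = {
    "January_2PM": {"Month": "January", "Salinity": 33.3, "Ammonia": 0.0},
    "January_10PM": {"Month": "January", "Salinity": 34.2, "Ammonia": 0.0},
    "August_4AM": {"Month": "August", "Salinity": 33.3, "Ammonia": 0.14},
    "August_10AM": {"Month": "August", "Salinity": 32.1, "Ammonia": 0.06},
}

WINTER_MONTHS = {"December", "January", "February"}
SUMMER_MONTHS = {"June", "July", "August"}


def pool(metadata, predicate):
    """Return the sorted names of all datasets whose metadata row satisfies `predicate`."""
    return sorted(name for name, row in metadata.items() if predicate(row))


winter = pool(METADATA, lambda row: row["Month"] in WINTER_MONTHS)
summer = pool(METADATA, lambda row: row["Month"] in SUMMER_MONTHS)

# Predicates can combine several parameters with boolean operators.
salty_summer = pool(
    METADATA,
    lambda row: row["Month"] in SUMMER_MONTHS and row["Salinity"] > 33.0,
)
```

Each resulting list names the primary datasets that would feed one pooled dataset.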
  • 18. Basic Idea: Create two new datasets (winter, summer) from the four BLAST files. Problems: this doubles space consumption and is time-inefficient. Idea: use database technology to avoid redundancy and save time and space.
  • 19. Primary & Combined Datasets in the Database: A primary dataset is a dataset created from the original BLAST output and the reads file. A combined dataset is created from primary datasets, using references to the read and match data of the primary datasets and, optionally, also the classification data of the primary datasets. Hence, a combined dataset can be created time- and space-efficiently.
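The idea of building a combined dataset from references rather than copies can be sketched with a small relational schema. Everything below is an assumption for illustration: the table and column names are hypothetical, and Python's built-in `sqlite3` stands in for the PostgreSQL instance so that the example is self-contained. It does not reproduce MEGAN's actual schema.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with a hypothetical three-table schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE datasets (id INTEGER PRIMARY KEY, name TEXT UNIQUE, kind TEXT);
        CREATE TABLE reads (
            id INTEGER PRIMARY KEY,
            dataset_id INTEGER REFERENCES datasets(id),
            sequence TEXT
        );
        -- A combined dataset stores only which primary datasets it refers to.
        CREATE TABLE combined_members (
            combined_id INTEGER REFERENCES datasets(id),
            primary_id INTEGER REFERENCES datasets(id)
        );
        """
    )
    return conn


def add_primary(conn, name, sequences):
    """Insert a primary dataset together with its reads and return its id."""
    cur = conn.execute("INSERT INTO datasets (name, kind) VALUES (?, 'primary')", (name,))
    dataset_id = cur.lastrowid
    conn.executemany(
        "INSERT INTO reads (dataset_id, sequence) VALUES (?, ?)",
        [(dataset_id, seq) for seq in sequences],
    )
    return dataset_id


def add_combined(conn, name, primary_ids):
    """Insert a combined dataset that references primaries; no read is copied."""
    cur = conn.execute("INSERT INTO datasets (name, kind) VALUES (?, 'combined')", (name,))
    combined_id = cur.lastrowid
    conn.executemany(
        "INSERT INTO combined_members (combined_id, primary_id) VALUES (?, ?)",
        [(combined_id, pid) for pid in primary_ids],
    )
    return combined_id


def reads_of_combined(conn, combined_id):
    """Resolve the references and return the reads of a combined dataset."""
    rows = conn.execute(
        "SELECT r.sequence FROM reads r "
        "JOIN combined_members m ON r.dataset_id = m.primary_id "
        "WHERE m.combined_id = ? ORDER BY r.id",
        (combined_id,),
    )
    return [row[0] for row in rows]


conn = build_demo_db()
jan_2pm = add_primary(conn, "January_2PM", ["ACGT", "GGCA"])
jan_10pm = add_primary(conn, "January_10PM", ["TTAG"])
winter = add_combined(conn, "winter", [jan_2pm, jan_10pm])
winter_reads = reads_of_combined(conn, winter)
total_reads_stored = conn.execute("SELECT COUNT(*) FROM reads").fetchone()[0]
```

Creating `winter` adds two small rows to `combined_members` while the `reads` table keeps its three rows, which is the space saving the slide describes.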
  • 20. Creating Combined Datasets in MEGAN
  • 26. Analysis: Input: 8 primary datasets, altogether ~100,000 reads, ~4 million matches, ~4.5 GB of space. It takes ~50 minutes to load these datasets into the database. Three combined datasets (winter, spring, summer) are created. Their creation takes ~30 seconds and needs ~40 MB of additional space. Alternatively, combined datasets can be created on-the-fly. This takes less than a second and needs no additional space.
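The approximate figures on this slide can be put in proportion with a few lines of arithmetic. The inputs are the rounded values from the slide. Treating 1 GB as 1000 MB is an assumption, and the variable names are illustrative.

```python
# Rounded values from the slide; 1 GB is taken as 1000 MB (assumption).
primary_space_mb = 4.5 * 1000
combined_space_mb = 40
load_seconds = 50 * 60
combine_seconds = 30

# Extra space for the three stored combined datasets, relative to the primary data.
space_overhead = combined_space_mb / primary_space_mb

# How much faster creating the combined datasets is than loading the primary data.
speedup = load_seconds / combine_seconds

print(f"space overhead: {space_overhead:.1%}")
print(f"creation is about {speedup:.0f}x faster than the initial load")
```

Under these assumptions the stored combined datasets cost under 1% extra space, compared with the doubling that the "Basic Idea" slide attributes to rebuilding pooled datasets from the BLAST files.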
  • 27. Comparing all Datasets (figure)
  • 28. Comparing by Season (figure)
  • 32. Summary & Conclusion: MEGAN communicates with a PostgreSQL database. This gives the user access to many datasets. Many users can work on the database simultaneously. Primary datasets can be pooled to create combined datasets. The MetaData Analyzer allows one to create combined datasets by applying boolean expressions to the assigned metadata. This technique is highly space- and time-efficient.
  • 33. MEGAN v4 is freely available from www-ab.informatik.uni-tuebingen.de/software/megan (reference: Integrative analysis of environmental sequences using MEGAN4, Daniel H. Huson, Suparna Mitra, Hans-Joachim Ruscheweyh, Nico Weber, Stephan C. Schuster; submitted 2011). Thanks go to Daniel Huson, Suparna Mitra, Nico Weber, Stephan Schuster. Thank you for your attention!