Functional Metagenome Analysis using Gene Ontology (MEGAN 4)

Talk at the SIG M3 meeting (ISMB 2009), Stockholm June 2009

Describes an approach for the functional classification of environmental sequences of a metagenomic data set.
http://www-ab.informatik.uni-tuebingen.de/software/megan/welcome.html

  1. Functional Classification of Environmental Reads using Gene Ontology
     Daniel C. Richter, Daniel H. Huson
     Dept. Algorithms in Bioinformatics, ZBIT Center for Bioinformatics, University of Tuebingen, Germany
     www-ab.informatik.uni-tuebingen.de
  2. Metagenomics - Workflow
     Environmental Sample → Sequencing (Sanger/NGS)
     • Who is out there? → Taxonomical Analysis
     • How many are there? → Quantitative Analysis
     • What are they doing? → Functional Analysis
     MEGAN Software
     Daniel Richter – University of Tuebingen, Functional Metagenome Analysis, Stockholm, 09/06/27
  4. MEGAN – Taxonomical Analysis
     Precomputation: Reads → BLAST against nr, nt, ...
     "Laptop Analysis" with MEGAN
     NCBI Taxonomy
     • >460,000 taxa
     • Taxonomical ranks: Kingdom, Phylum, Class, Order, ..., Species
     Huson et al., 2007, Genome Research
  5. Functional Metagenome Analysis
     Extension of MEGAN to classify reads according to their function
     • Input: BLASTX result file → homology-based approach
     • Structured and interactive overview of gene products
     Gene Ontology (http://www.geneontology.org)
     • widely used in biological databases, gene expression and annotation studies
     • >27,000 GO terms (cross-species), organized as a directed acyclic graph (DAG)
     • three structured vocabularies (ontologies): molecular function, biological process, cellular component
  6. Mapping BLAST Matches to GO Terms
     Example BLAST match headers:
       >gb|EAU86868.1| predicted protein [Coprinopsis cinerea okayama7#130]
       >emb|CAC86119.1| putative hexose-6-phosphate transporter [Listeria monocytogenes]
       >ref|ZP_00390013.1| Arabinose efflux permease [Bacillus anthracis str. A2012]
     ref2go map: RefSeqID → UniProt mapping → GO Terms (http://pir.georgetown.edu/)
     • entries of the form RefSeqID → GO Terms, ...
     • >3.5 million entries
     Example GO terms: GO:0044408, GO:0043581, GO:0032502
  7-11. Placing Reads onto GO Terms – LCA Approach
     Read → BLAST → matches M0 to M4 → ref2go map → GO Terms, grouped by ontology:
     • Molecular Function: protein binding
     • Biological Process: response to stress, signal transduction, cell communication
     • Cellular Component: nucleus, cell part, cytosol
     Placement: one GO term per ontology, initially unknown (? ? ?) and resolved one ontology at a time
     Tree shown for the placement: root, cellular process, cell communication, response to stress, signal transduction
  12. Benefits and Drawbacks of the LCA Algorithm
      Drawbacks:
      • loss of accuracy: the LCA is never more specific than the GO terms of the individual matches
      • might miss gene products of interest (losing the "big picture")
      • reads with many different BLAST matches (= many GO terms) are likely to be assigned to high-level GO terms
      Benefits:
      • complexity reduction facilitates analysis and visual inspection
      • memory efficient: need to store only three integers (GO IDs) per read
      • applicable to large data sets: 5 million reads, 760 GB BLAST output
      • loss of accuracy ≠ loss of correctness (avoids false positives)
      → balance between usability and accuracy
      Calculation example, "Full Approach": 1,000,000 reads × 50 BLAST matches per read × 10 GO terms per match → 500,000,000 GO IDs
      (with the LCA approach: 1,000,000 reads × 3 GO IDs per read = 3,000,000 GO IDs)
  13-19. GO Analyzer – Main Window (image-only slides)
  20. GO Analyzer – Main Window: Extract reads
  21. GO Analyzer – Path Highlighting
  22. GO Analyzer – GO Slims
      Gene Ontology provides subsets of GO terms (GO slims)
      → useful for a high-level view of the three ontologies
      http://www.geneontology.org/GO.slims.shtml
      Design your own metagenomic GO slim...
  23-26. GO Analyzer – Comparison View (image-only slides)
  27. GO Analyzer – Summary
      • New module of MEGAN 4 to conduct functional analyses on environmental reads
        "BLAST only once, perform taxonomical and functional analysis in one step"
      • Homology-based approach
      • Overview tool: visual and interactive exploration of gene products
      • Inspection, extraction and chart features
      • Comparative mode
      Installers for all operating systems will be available from:
      http://www-ab.informatik.uni-tuebingen.de/software/megan
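
The mapping step on slide 6 (BLAST match headers → ref2go map → GO terms) can be illustrated with a minimal Python sketch. It is not MEGAN's implementation: the table `REF2GO`, the helper names `accession` and `go_terms_for_read`, and in particular the pairing of accessions with GO IDs are assumptions made up for this example. Only the header lines and the three GO IDs are taken from the slide.

```python
import re

# Hypothetical excerpt of a ref2go-style table (accession -> GO IDs).
# The pairing of accessions and GO IDs below is invented for illustration.
REF2GO = {
    "CAC86119.1": ["GO:0043581"],
    "ZP_00390013.1": ["GO:0044408", "GO:0032502"],
}

# Matches headers of the form ">db|accession|description".
HEADER = re.compile(r"^>[^|]+\|([^|]+)\|")


def accession(header):
    """Return the accession from a BLAST match header, or None if it does not parse."""
    match = HEADER.match(header)
    return match.group(1) if match else None


def go_terms_for_read(headers, ref2go=REF2GO):
    """Collect the GO IDs of all matches of one read, in first-seen order."""
    seen = []
    for header in headers:
        # Accessions missing from the table simply contribute no GO IDs.
        for go_id in ref2go.get(accession(header), []):
            if go_id not in seen:
                seen.append(go_id)
    return seen


headers = [
    ">gb|EAU86868.1| predicted protein [Coprinopsis cinerea okayama7#130]",
    ">emb|CAC86119.1| putative hexose-6-phosphate transporter [Listeria monocytogenes]",
    ">ref|ZP_00390013.1| Arabinose efflux permease [Bacillus anthracis str. A2012]",
]
print(go_terms_for_read(headers))
```

A match whose accession is not in the table (here `EAU86868.1`) is skipped, so a read can end up with no GO terms at all.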

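The LCA placement from slides 7 to 11 can be sketched in Python under stated assumptions. The slides do not say how MEGAN resolves the LCA in a DAG, so this sketch picks the deepest term that is an ancestor of (or equal to) every input term, which is one common choice. The `PARENTS` table is a toy hierarchy built from the term names on the slides; its edges and the function names are illustrative, not the real Gene Ontology or MEGAN's code.

```python
# Toy is_a hierarchy (child -> parents). The edges are illustrative
# assumptions, not the real Gene Ontology.
PARENTS = {
    "molecular_function": [],
    "binding": ["molecular_function"],
    "protein binding": ["binding"],
    "biological_process": [],
    "cellular process": ["biological_process"],
    "cell communication": ["cellular process"],
    "signal transduction": ["cell communication"],
    "response to stress": ["biological_process"],
    "cellular_component": [],
    "cell part": ["cellular_component"],
    "nucleus": ["cell part"],
    "cytosol": ["cell part"],
}


def ancestors(term):
    """Return the term itself plus every term reachable via parent links."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(PARENTS[current])
    return seen


def depth(term):
    """Length of the longest path from the term up to a root."""
    parents = PARENTS[term]
    return 0 if not parents else 1 + max(depth(p) for p in parents)


def lca(terms):
    """Deepest term that is an ancestor of (or equal to) every input term."""
    if not terms:
        return None
    common = set.intersection(*(ancestors(t) for t in terms))
    if not common:
        return None
    # The term name breaks ties between equally deep candidates deterministically.
    return max(common, key=lambda t: (depth(t), t))


def place_read(terms_by_ontology):
    """Place one read on a single GO term per ontology."""
    return {ont: lca(terms) for ont, terms in terms_by_ontology.items() if terms}


read = {
    "molecular_function": ["protein binding"],
    "biological_process": ["response to stress", "signal transduction", "cell communication"],
    "cellular_component": ["nucleus", "cell part", "cytosol"],
}
print(place_read(read))
```

With this toy hierarchy the biological process terms only meet at the root, which illustrates the drawback noted on slide 12: reads with many different GO terms tend to land on high-level terms. The result for a real read depends on the actual GO graph.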