IPRStats: a Visualization Tool for InterProScan
Iddo Friedberg
Friedberg bosc2010 iprstats
Transcript of "Friedberg bosc2010 iprstats"

  1. IPRStats: a Visualization Tool for InterProScan. Iddo Friedberg, Microbiology and Computer Science & Software Engineering, Miami University. http://github.com/devrkel/IPRStats.git
  2. Microbes are Everywhere ● 10^30 prokaryotic cells on Earth (give or take a couple) ● Dominate the biosphere ● 90% of the cells in your body are prokaryotic (10^14) ● Found in the most hostile environments
  3. Microbes do (almost) Everything ● Nutrient reservoir: ● 4x10^10 tons carbon (rivaling plants) ● 1x10^10 tons nitrogen ● 1x10^9 tons phosphorus
  4. Of course there is health... ● Communicable diseases ● Heart disease ● Gastric cancer ● Irritable Bowel Syndrome
  5. ...and Wellness
  6. Microbial Genomics ● Phage phi-X174, 1978: 5.5 Kbp ● H. influenzae, 1995: 1.7 Mbp
  7. Classic microbial genomics
  8. Classic microbial genomics
  9. Classic microbial genomics
  10. Microbes live in Communities & only 1% can be cultured
  11. What is Metagenomics? • Culture-independent approach to studying microbial communities – < 1% of microbes can be cultured – DNA is isolated directly from an environmental sample and sequenced • Examining the genomic content of organisms in a community/environment to better understand: – Diversity of organisms – Their roles and interactions in the ecosystem
  12. Metagenomics is the Application of Genomics to Communities
  13. Some things we can learn using Metagenomics • Taxonomic content: taxon diversity in a habitat (using taxonomic markers) • Functional content: biological functions, qualitative and quantitative profiles • Coping with the environment: differences in functional content between habitats • Decomposing the biotic / abiotic elements in a habitat: metadata analysis
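The "functional content" and "differences in functional content between habitats" points on slide 13 can be made concrete with a small sketch: normalize each sample's annotation counts to relative frequencies, then subtract them feature by feature. This is an illustration only, not IPRStats code (comparative population analysis is listed as future work on slide 22). The function names and the placeholder counts are hypothetical, and a plain difference of proportions is just one simple choice of comparison.

```python
from collections import Counter


def relative_profile(counts):
    """Normalize raw annotation counts to relative frequencies that sum to 1."""
    total = sum(counts.values())
    if total == 0:
        return {}
    return {feature: n / total for feature, n in counts.items()}


def profile_difference(sample_a, sample_b):
    """Per-feature difference in relative frequency (sample A minus sample B).

    A feature absent from one sample is treated as frequency 0 in that sample.
    Keys are returned in sorted order.
    """
    prof_a = relative_profile(sample_a)
    prof_b = relative_profile(sample_b)
    features = set(prof_a) | set(prof_b)
    return {f: prof_a.get(f, 0.0) - prof_b.get(f, 0.0) for f in sorted(features)}


if __name__ == "__main__":
    # Hypothetical counts of InterPro entries annotated in two habitats
    # (placeholder accessions and numbers, not real data).
    soil = Counter({"IPR000001": 30, "IPR000002": 10})
    gut = Counter({"IPR000001": 10, "IPR000003": 10})
    for entry, diff in profile_difference(soil, gut).items():
        print(f"{entry}\t{diff:+.2f}")
```

Normalizing first keeps samples of very different sequencing depth comparable; raw counts would mostly reflect how much was sequenced.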
  14. A Metagenomic project ● Sequencing ● Assembly ● Diversity analysis ● Annotation ● Gene finding ● Function prediction ● Diversity analysis ● Comparative analysis
  15. A Metagenomic project ● Sequencing ● Assembly ● Diversity analysis ● Annotation ● Gene finding ● Function prediction ● Diversity analysis ● Comparative analysis
  16. A Metagenomic project ● Sequencing ● Assembly ● Annotation ● Gene finding ● Function prediction ● Diversity analysis ● Comparative analysis [callout: Population analysis tools]
  17. InterProScan ● Signature search against an integrated resource of domains and functional sites ● Easy to install, cluster-enabled (pleasantly parallel) ● Maintained by EBI ● Can annotate whole genomes ● Member databases: PIR, Pfam, TIGRFAMs, PANTHER, ProDom, PRINTS, ... ● Needs a visualization tool for population / metagenomic annotation
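The "pleasantly parallel" point on slide 17 follows from each protein being scanned independently, so a large input set can be split into chunks and each chunk run as a separate job. Below is a minimal sketch of that splitting step, assuming plain FASTA input. The helper names are hypothetical, and how the chunks are then handed to InterProScan or a cluster scheduler is deliberately left out.

```python
def read_fasta(lines):
    """Yield (header, sequence) tuples from an iterable of FASTA lines.

    Blank lines, and any lines before the first '>' header, are ignored.
    """
    header, seq = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(seq)
            header, seq = line[1:], []
        else:
            seq.append(line)
    if header is not None:
        yield header, "".join(seq)


def split_round_robin(records, n_chunks):
    """Distribute records across n_chunks lists in round-robin order."""
    if n_chunks < 1:
        raise ValueError("n_chunks must be >= 1")
    chunks = [[] for _ in range(n_chunks)]
    for i, record in enumerate(records):
        chunks[i % n_chunks].append(record)
    return chunks


def to_fasta(records, width=60):
    """Serialize (header, sequence) tuples back to FASTA text, wrapping sequences."""
    out = []
    for header, seq in records:
        out.append(">" + header)
        out.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(out) + "\n" if out else ""


if __name__ == "__main__":
    fasta = [">prot1", "MKVLA", ">prot2", "MSTNP", "KQ", ">prot3", "MA"]
    # Each chunk could be written to its own file and scanned as an independent job.
    for i, chunk in enumerate(split_round_robin(read_fasta(fasta), 2)):
        print(f"--- chunk {i} ---")
        print(to_fasta(chunk), end="")
```

Round-robin assignment keeps chunk sizes within one record of each other without first counting the input.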
  18. [Screenshot of the IPRStats window (menus: File, Help), annotated with: Open XML file; Python SAX parser; Charting; GUI: wxPython; Excel export: xlwt; Full databases; Aggregate queries; Resulting tables. Member databases shown: PFAM, PIR, GENE3D, HAMAP, PANTHER, PRINTS, PRODOM, PROFILE, PROSITE, SMART, SUPERFAMILY, TIGRFAMs.]
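Slide 18 names a Python SAX parser for reading InterProScan XML. SAX processes the file as a stream of start/end events, so memory use does not grow with file size, which matters for metagenomic outputs. The sketch below counts matches per member database and proteins per InterPro entry. It assumes an InterProScan 4-style layout (`<protein id>` containing `<interpro id>` containing `<match dbname>`), so the element and attribute names should be checked against real output; the handler, the sample, and its accessions are illustrative placeholders, not the IPRStats parser itself.

```python
import xml.sax
from collections import Counter, defaultdict

# Tiny hand-written sample in the ASSUMED InterProScan 4-style layout.
SAMPLE = b"""<?xml version="1.0"?>
<interpro_matches>
  <protein id="p1" length="100">
    <interpro id="IPR000001" name="example" type="Domain">
      <match id="PF00001" name="sigA" dbname="PFAM" status="T">
        <location start="1" end="50" score="1e-10" status="T" evidence="HMMPfam"/>
      </match>
      <match id="SM00001" name="sigB" dbname="SMART" status="T"/>
    </interpro>
  </protein>
  <protein id="p2" length="80">
    <interpro id="IPR000001" name="example" type="Domain">
      <match id="PF00001" name="sigA" dbname="PFAM" status="T"/>
    </interpro>
  </protein>
</interpro_matches>
"""


class IprMatchHandler(xml.sax.ContentHandler):
    """Tally <match> elements per member database and proteins per InterPro entry."""

    def __init__(self):
        super().__init__()
        self.db_counts = Counter()              # dbname -> number of <match> elements
        self.entry_proteins = defaultdict(set)  # InterPro id -> set of protein ids
        self._protein = None                    # protein currently being parsed
        self._interpro = None                   # InterPro entry currently being parsed

    def startElement(self, name, attrs):
        if name == "protein":
            self._protein = attrs.get("id")
        elif name == "interpro":
            self._interpro = attrs.get("id")
            if self._protein is not None and self._interpro:
                self.entry_proteins[self._interpro].add(self._protein)
        elif name == "match":
            self.db_counts[attrs.get("dbname", "UNKNOWN")] += 1

    def endElement(self, name):
        # Reset context when leaving an element so stray matches are not misattributed.
        if name == "protein":
            self._protein = None
        elif name == "interpro":
            self._interpro = None


def summarize(xml_bytes):
    """Parse an XML document held in memory and return (db_counts, entry_proteins)."""
    handler = IprMatchHandler()
    xml.sax.parseString(xml_bytes, handler)
    return handler.db_counts, handler.entry_proteins


if __name__ == "__main__":
    db_counts, entry_proteins = summarize(SAMPLE)
    print(dict(db_counts))
    print({entry: sorted(ids) for entry, ids in entry_proteins.items()})
```

For a real file, `xml.sax.parse(path, handler)` streams from disk instead of parsing an in-memory byte string.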
  19. IPRStats Architecture [diagram; labels: IPRStats (wx.Frame); standalone; importers: XML, IPS; Menu (wx.MenuBar); PropertiesDlg (wx.Dialog); Settings; Chart (wx.StaticBitmap); Table (wx.PyGridTableBase); StatsData; Results (sqlite or pytables); exporters: HTML, XLS (using xlwt), IPS]
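Slide 19 shows parsed results kept in a Results store backed by "sqlite or pytables", and slide 18 mentions aggregate queries and resulting tables. Below is a minimal sketch of the SQLite side using Python's built-in `sqlite3` module. The one-table schema, the column names, and the query are hypothetical illustrations, not IPRStats' actual schema.

```python
import sqlite3


def build_results(rows):
    """Load (protein, interpro, dbname, signature) tuples into an in-memory SQLite table."""
    conn = sqlite3.connect(":memory:")  # a file path would persist the results instead
    conn.execute(
        "CREATE TABLE matches ("
        "protein TEXT NOT NULL, interpro TEXT, dbname TEXT NOT NULL, signature TEXT NOT NULL)"
    )
    with conn:  # commits once, after all rows are inserted
        conn.executemany("INSERT INTO matches VALUES (?, ?, ?, ?)", rows)
    return conn


def proteins_per_database(conn):
    """Aggregate query: number of distinct proteins hit by each member database."""
    return conn.execute(
        "SELECT dbname, COUNT(DISTINCT protein) AS n FROM matches "
        "GROUP BY dbname ORDER BY n DESC, dbname"
    ).fetchall()


if __name__ == "__main__":
    rows = [
        ("p1", "IPR000001", "PFAM", "PF00001"),
        ("p1", "IPR000001", "SMART", "SM00001"),
        ("p2", "IPR000001", "PFAM", "PF00001"),
        ("p2", None, "PFAM", "PF00002"),  # signature not integrated into an InterPro entry
    ]
    conn = build_results(rows)
    print(proteins_per_database(conn))  # [('PFAM', 2), ('SMART', 1)]
```

Counting `DISTINCT protein` rather than rows keeps a protein with several hits from the same database from being counted more than once.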
  20. What is PyTables?
     - A package for creating data structures that can handle large amounts of data
     - Uses NumPy (in-memory) and HDF5 (on-disk storage) structures
     - Uses Numexpr (a JIT expression compiler) for evaluating expressions such as queries
     - In the context of IPRScan, it provides a way of accessing a huge table of data without requiring that all the data be in memory
     Pros:
     - HDF5 provides very fast, compact and efficient indexing
     - NumPy provides efficient in-memory storage
     - Minimizes disk and memory usage
     - Very fast read times compared to SQLite and MySQL
     Cons:
     - Large memory overhead (particularly noticeable for smaller datasets)
     - Many large, complex dependencies, including HDF5, NumPy, Numexpr and Cython
     - Slow write times (particularly important, since writing is the IPRStats bottleneck)
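The key point of slide 20, reading a huge table without holding it all in memory, is not specific to PyTables, which exposes it through lazy row iterators such as `Table.iterrows()` and `Table.where()`. To keep the example dependency-free, the sketch below swaps in the standard-library `sqlite3` module and pulls rows in fixed-size batches with `Cursor.fetchmany()`. The `matches` table and the function names are hypothetical.

```python
import sqlite3
from collections import Counter


def iter_rows(conn, sql, batch_size=1000):
    """Yield query results one row at a time, fetching batch_size rows per call.

    At most one batch of rows is held in Python memory at any moment.
    """
    cur = conn.execute(sql)
    try:
        while True:
            batch = cur.fetchmany(batch_size)
            if not batch:
                return
            yield from batch
    finally:
        cur.close()


def count_per_database(conn, batch_size=1000):
    """Tally matches per member database without materializing the whole table."""
    rows = iter_rows(conn, "SELECT dbname FROM matches", batch_size)
    return Counter(dbname for (dbname,) in rows)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")  # real data would live in a file on disk
    conn.execute("CREATE TABLE matches (protein TEXT, dbname TEXT)")
    conn.executemany(
        "INSERT INTO matches VALUES (?, ?)",
        [(f"p{i}", "PFAM" if i % 3 else "SMART") for i in range(10)],
    )
    print(count_per_database(conn, batch_size=4))
```

The trade-off mirrors the slide's: batching bounds memory use, at the cost of more round trips to the storage layer as `batch_size` shrinks.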
  21. Multiple graph formats: pie charts, bar graphs
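Both chart types on slide 21 start from the same per-category counts. Below is a small sketch of the data-preparation step for a pie chart, turning counts into percentages and slice angles. The rendering itself is omitted (slide 19 shows IPRStats displaying charts in a `wx.StaticBitmap`). The function name is hypothetical, and folding small categories into an "Other" slice is an added design choice, not something the slides describe.

```python
def pie_slices(counts, top_n=None):
    """Turn {label: count} into (label, percent, start_deg, sweep_deg) tuples.

    Slices are ordered largest first (ties broken by label). If top_n is given,
    every category beyond the top_n largest is merged into one "Other" slice.
    """
    items = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    if top_n is not None and len(items) > top_n:
        other = sum(n for _, n in items[top_n:])
        items = items[:top_n] + [("Other", other)]
    total = sum(n for _, n in items)
    if total <= 0:
        return []
    slices, start = [], 0.0
    for label, n in items:
        sweep = 360.0 * n / total
        slices.append((label, 100.0 * n / total, start, sweep))
        start += sweep
    return slices


if __name__ == "__main__":
    # Hypothetical match counts per member database.
    counts = {"PFAM": 50, "SMART": 25, "PRINTS": 15, "PROSITE": 10}
    for label, pct, start, sweep in pie_slices(counts, top_n=2):
        print(f"{label:8s} {pct:5.1f}%  start={start:6.1f}  sweep={sweep:6.1f}")
```

The same sorted (label, percent) pairs can drive a bar graph directly, which keeps the two chart types consistent with each other.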
  22. Conclusions & Future ● A lightweight, machine-independent visualization tool for InterProScan annotations ● License: AFL (Academic Free License) ● Todo: ● Comparative population analysis ● Large dataset handling ● More graphic options ● Anything else you like... – http://github.com/devrkel/IPRStats.git
  23. Thanks ● David Ream ● Han Wang ● Ian Fleming ● David Vincent ● Ryan Kelly ● EBI ● Miami University startup funding ● Miami University Undergraduate Summer Scholars Program
  24. The Friedberg Lab is Recruiting ● Graduate students ● Postdocs ● Catch me later, email me, or look at iddo-friedberg.net to learn more