Friedberg bosc2010 iprstats
Published in: Technology

  • 1. IPRStats: a Visualization Tool for InterProScan. Iddo Friedberg, Microbiology and Computer Science & Software Engineering, Miami University
  • 2. Microbes are Everywhere ● 10^30 prokaryotic cells on Earth (give or take a couple) ● Dominate the biosphere ● 90% of the cells in your body are prokaryotic (10^14) ● Found in the most hostile environments
  • 3. Microbes do (almost) Everything ● Nutrient reservoir: ● 4x10^10 tons carbon (rivaling plants) ● 1x10^10 tons nitrogen ● 1x10^9 tons phosphorus
  • 4. Of course there is health... ● Communicable diseases ● Heart disease ● Gastric cancer ● Irritable Bowel Syndrome
  • 5. ...and Wellness
  • 6. Microbial Genomics ● Phage phi-X174, 1978: 5.5 kbp ● H. influenzae, 1995: 1.7 Mbp
  • 7. Classic microbial genomics
  • 8. Classic microbial genomics
  • 9. Classic microbial genomics
  • 10. Microbes live in Communities & only 1% can be cultured
  • 11. What is Metagenomics? • Culture independent approach to study microbial communities – < 1% of microbes can be cultured – DNA directly isolated from environmental sample and sequenced • Examining genomic content of organisms in community/environment to better understand: – Diversity of organisms – Their roles and interactions in the ecosystem
  • 12. Metagenomics is the Application of Genomics to Communities
  • 13. Some things we can learn using Metagenomics ● Taxonomic content: taxon diversity in a habitat (using taxonomic markers) ● Functional content: biological functions, qualitative and quantitative profiles ● Coping with the environment: differences in functional content between habitats ● Decompose the biotic / abiotic elements in a habitat: metadata analysis
  • 14. A Metagenomic project ● Sequencing ● Assembly ● Diversity analysis ● Annotation ● Gene finding ● Function prediction ● Diversity analysis ● Comparative analysis
  • 15. A Metagenomic project ● Sequencing ● Assembly ● Diversity analysis ● Annotation ● Gene finding ● Function prediction ● Diversity analysis ● Comparative analysis
  • 16. A Metagenomic project ● Sequencing ● Assembly ● Annotation ● Gene finding ● Function prediction ● Diversity analysis ● Comparative analysis (callout: Population analysis tools)
  • 17. InterProScan ● Signature search against an integrated resource of domains and functional sites ● Easy to install, cluster-enabled (pleasantly parallel) ● Maintained by EBI ● Can annotate whole genomes ● PIR, Pfam, TIGRFAMs, PANTHER, ProDom, PRINTS,... ● Needs a visualization tool for population / metagenomic annotation
  • 18. IPRStats (diagram) ● Components: Open XML file, Python SAX Parser, Full Databases, Aggregate Queries, Resulting Tables, Charting ● GUI: wxPython ● Excel export: xlwt ● Menus: File, Help ● Databases: PFAM, PIR, GENE3D, HAMAP, PANTHER, PRINTS, PRODOM, PROFILE, PROSITE, SMART, SUPERFAMILY, TIGRFAMs
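
The "Python SAX Parser" step on slide 18 can be sketched as follows. This is a minimal illustration, not the IPRStats source: the `MatchCounter` class, the `count_matches` helper, and the sample document are hypothetical, and the element and attribute names (`match`, `dbname`, `id`) are an assumption about the InterProScan XML layout. Only the standard-library `xml.sax` module is used.

```python
import xml.sax
from collections import Counter


class MatchCounter(xml.sax.ContentHandler):
    """Count <match> elements per (database, signature) while streaming."""

    def __init__(self):
        super().__init__()
        self.counts = Counter()

    def startElement(self, name, attrs):
        # Assumed layout: <match id="PF00069" name="Pkinase" dbname="PFAM">
        if name == "match":
            db = attrs.get("dbname", "UNKNOWN")
            signature = attrs.get("id", "UNKNOWN")
            self.counts[(db, signature)] += 1


def count_matches(xml_bytes):
    """Parse InterProScan-style XML and return a Counter of matches."""
    handler = MatchCounter()
    xml.sax.parseString(xml_bytes, handler)
    return handler.counts


# Hypothetical sample input, made up for this sketch.
SAMPLE = b"""<interpro_matches>
  <protein id="seq1">
    <interpro id="IPR000719" name="Protein kinase domain">
      <match id="PF00069" name="Pkinase" dbname="PFAM"/>
      <match id="SM00220" name="S_TKc" dbname="SMART"/>
    </interpro>
  </protein>
  <protein id="seq2">
    <interpro id="IPR000719" name="Protein kinase domain">
      <match id="PF00069" name="Pkinase" dbname="PFAM"/>
    </interpro>
  </protein>
</interpro_matches>"""

if __name__ == "__main__":
    for (db, signature), n in sorted(count_matches(SAMPLE).items()):
        print(db, signature, n)
```

A SAX handler sees one element at a time, so memory use stays roughly constant regardless of how large the annotation file is, which suits population-scale InterProScan output.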
  • 19. IPRStats Architecture ● GUI classes: IPRStats (wx.Frame), Menu (wx.MenuBar), PropertiesDlg (wx.Dialog), Chart (wx.StaticBitmap), Table (wx.PyGridTableBase) ● importers: XML, IPS ● exporters: HTML, XLS (using xlwt) ● Settings ● StatsData ● Results (sqlite or pytables) ● Other labels in the diagram: standalone, IPS
  • 20. What is PyTables? - package for creating data structures that can handle large amounts of data - uses NumPy (for in memory) and HDF5 (for disk storage) structures - uses Numexpr (JIT compiler) for evaluating expressions (like queries) - in the context of IPRScan, it provides a way of accessing a huge table of data without requiring that all the data be in memory ● Pros: - HDF5 provides very fast, compact and efficient indexing - NumPy provides efficient in-memory storage - Minimizes disk and memory usage - Very fast read times compared to SQLite and MySQL ● Cons: - Large memory overhead (particularly in comparison to smaller datasets) - Many large, complex dependencies including HDF5, NumPy, Numexpr and Cython - Slow write times (particularly important since IPRStats bottlenecks with writing)
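
The "Results (sqlite or pytables)" store on slide 19, combined with the "Aggregate Queries" step on slide 18, might look roughly like the sketch below for the SQLite case. The table name `matches`, its columns, the helpers `build_results` and `top_signatures`, and the sample rows are all hypothetical and chosen only for illustration; the slides do not show the actual IPRStats schema.

```python
import sqlite3


def build_results(rows):
    """Load (protein, dbname, signature) rows into an in-memory SQLite table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE matches (protein TEXT, dbname TEXT, signature TEXT)"
    )
    conn.executemany("INSERT INTO matches VALUES (?, ?, ?)", rows)
    conn.commit()
    return conn


def top_signatures(conn, dbname, limit=10):
    """Return (signature, count) pairs for one database, most frequent first."""
    cur = conn.execute(
        "SELECT signature, COUNT(*) AS n FROM matches "
        "WHERE dbname = ? "
        "GROUP BY signature "
        "ORDER BY n DESC, signature ASC "
        "LIMIT ?",
        (dbname, limit),
    )
    return cur.fetchall()


# Hypothetical rows, e.g. as produced by an XML importer.
ROWS = [
    ("seq1", "PFAM", "PF00069"),
    ("seq1", "SMART", "SM00220"),
    ("seq2", "PFAM", "PF00069"),
    ("seq3", "PFAM", "PF00001"),
]

if __name__ == "__main__":
    conn = build_results(ROWS)
    for signature, n in top_signatures(conn, "PFAM"):
        print(signature, n)
```

The per-database counts returned by such a query are the kind of table that could feed a grid view, an export, or the pie and bar charts mentioned later in the deck.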
  • 21. Multiple graph formats Pie charts Bar graphs
  • 22. Conclusions & Future ● A lightweight, machine-independent visualization tool for InterProScan annotations ● License: AFL ● Todo: - Comparative population analysis - Large dataset handling - More graphic options - Anything else you like...
  • 23. Thanks ● David Ream ● Han Wang ● Ian Fleming ● David Vincent ● Ryan Kelly ● EBI ● Miami University startup funding ● Miami University Undergraduate Summer Scholars Program
  • 24. The Friedberg Lab is Recruiting ● Graduate students ● Postdocs ● Catch me later, email me, or look at to learn more