Friedberg bosc2010 iprstats
Transcript

  • 1. IPRStats: a Visualization Tool for InterProScan Iddo Friedberg Microbiology and Computer Science & Software Engineering Miami University http://github.com/devrkel/IPRStats.git
  • 2. Microbes are Everywhere ● 10^30 prokaryotic cells on Earth (give or take a couple) ● Dominate the biosphere ● 90% of the cells in your body are prokaryotic (10^14) ● Found in the most hostile environments
  • 3. Microbes do (Almost) Everything ● Nutrient reservoir: ● 4 × 10^10 tons carbon (rivaling plants) ● 1 × 10^10 tons nitrogen ● 1 × 10^9 tons phosphorus
  • 4. Of course there is health... ● Communicable diseases ● Heart disease ● Gastric cancer ● Irritable Bowel Syndrome
  • 5. ...and Wellness
  • 6. Microbial Genomics ● Phage phi-X174, 1978: 5.5 kbp ● H. influenzae, 1995: 1.7 Mbp
  • 7. Classic microbial genomics
  • 8. Classic microbial genomics
  • 9. Classic microbial genomics
  • 10. Microbes live in Communities & only 1% can be cultured
  • 11. What is Metagenomics? • Culture independent approach to study microbial communities – < 1% of microbes can be cultured – DNA directly isolated from environmental sample and sequenced • Examining genomic content of organisms in community/environment to better understand: – Diversity of organisms – Their roles and interactions in the ecosystem
  • 12. Metagenomics is the Application of Genomics to Communities
  • 13. Some things we can learn using Metagenomics ● Taxonomic content: taxon diversity in a habitat (using taxonomic markers) ● Functional content: biological functions, qualitative and quantitative profiles ● Coping with the environment: differences in functional content between habitats ● Decompose the biotic / abiotic elements in a habitat: metadata analysis
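A quantitative functional profile can be compared between habitats as in the following minimal sketch. It assumes each sample has already been reduced to one function label per annotated read. The labels (`func_A`, ...), the sample data, and the helper names `functional_profile` and `profile_difference` are hypothetical. Relative frequencies are used instead of raw counts so that samples with different sequencing depth remain comparable.

```python
from collections import Counter


def functional_profile(annotations):
    """Relative frequency of each function label within one sample."""
    counts = Counter(annotations)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


def profile_difference(profile_a, profile_b):
    """Per-function difference in relative frequency (A minus B).

    Labels missing from one habitat are treated as frequency 0.
    """
    labels = set(profile_a) | set(profile_b)
    return {label: profile_a.get(label, 0.0) - profile_b.get(label, 0.0)
            for label in sorted(labels)}


# Hypothetical per-read function labels from two habitats
soil = ["func_A", "func_A", "func_B", "func_C"]
ocean = ["func_A", "func_B", "func_B", "func_B"]

diff = profile_difference(functional_profile(soil), functional_profile(ocean))
for label, delta in diff.items():
    print(f"{label}\t{delta:+.2f}")
```

A positive value means the function is relatively more common in the first habitat.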
  • 14. A Metagenomic project ● Sequencing ● Assembly ● Diversity analysis ● Annotation ● Gene finding ● Function prediction ● Diversity analysis ● Comparative analysis
  • 15. A Metagenomic project ● Sequencing ● Assembly ● Diversity analysis ● Annotation ● Gene finding ● Function prediction ● Diversity analysis ● Comparative analysis
  • 16. A Metagenomic project ● Sequencing ● Assembly ● Annotation ● Gene finding ● Function prediction ● Diversity analysis ● Comparative analysis [callout: Population analysis tools]
  • 17. InterProScan ● Signature search against an integrated resource of domains and functional sites ● Easy to install, cluster-enabled (pleasantly parallel) ● Maintained by EBI ● Can annotate whole genomes ● PIR, Pfam, TIGRFAMs, PANTHER, ProDom, PRINTS, ... ● Needs a visualization tool for population / metagenomic annotation
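InterProScan is "pleasantly parallel" because each protein sequence is scanned independently. A large input can therefore be cut into chunks and run as separate jobs. The sketch below only illustrates that splitting step. The helper names (`read_fasta`, `split_into_chunks`, `to_fasta`) are hypothetical, and the command used to submit each chunk to InterProScan depends on your installation, so it is not shown.

```python
def read_fasta(text):
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header, seq_lines = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(seq_lines)
            header, seq_lines = line[1:], []
        else:
            seq_lines.append(line)
    if header is not None:
        yield header, "".join(seq_lines)


def split_into_chunks(records, n_chunks):
    """Deal records round-robin into n_chunks independent job inputs."""
    chunks = [[] for _ in range(n_chunks)]
    for i, record in enumerate(records):
        chunks[i % n_chunks].append(record)
    return chunks


def to_fasta(records):
    """Serialize (header, sequence) pairs back to FASTA text."""
    return "".join(f">{h}\n{s}\n" for h, s in records)


example = """>seq1
MKV
LLA
>seq2
MTTQ
>seq3
MAAR
"""

chunks = split_into_chunks(list(read_fasta(example)), 2)
for i, chunk in enumerate(chunks):
    print(f"--- chunk {i} ---")
    print(to_fasta(chunk), end="")
```

No record depends on any other record, so the per-chunk results can simply be concatenated afterwards.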
  • 18. [Screenshot of the IPRStats main window with callouts] ● Open XML file ● Python SAX parser ● Charting ● GUI: wxPython ● Excel export: xlwt ● Full databases ● Aggregate queries ● Resulting tables ● Member databases shown: PFAM, PIR, GENE3D, HAMAP, PANTHER, PRINTS, PRODOM, PROFILE, PROSITE, SMART, SUPERFAMILY, TIGRFAMs
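IPRStats reads InterProScan XML with a Python SAX parser. SAX streams events element by element and does not build the whole document tree in memory, which matters for large metagenomic result files. The following is a minimal sketch of that approach, not the IPRStats code. It assumes a simplified layout in which each signature hit is a `<match>` element with a `dbname` attribute, nested under `<protein>`. That layout is loosely modeled on InterProScan XML output and should be checked against the schema of your InterProScan version. The class name `MatchCounter` is hypothetical.

```python
import xml.sax
from collections import Counter


class MatchCounter(xml.sax.ContentHandler):
    """Count proteins and <match> elements per member database (dbname)."""

    def __init__(self):
        super().__init__()
        self.per_db = Counter()
        self.proteins = 0

    def startElement(self, name, attrs):
        # Called once per opening tag; no tree is kept in memory.
        if name == "protein":
            self.proteins += 1
        elif name == "match":
            self.per_db[attrs.get("dbname", "UNKNOWN")] += 1


sample = b"""<?xml version="1.0"?>
<interpro_matches>
  <protein id="P1">
    <interpro id="IPR_X" name="example">
      <match id="PF_X" name="a" dbname="PFAM"/>
      <match id="SM_X" name="b" dbname="SMART"/>
    </interpro>
  </protein>
  <protein id="P2">
    <interpro id="IPR_Y" name="example2">
      <match id="PF_Y" name="c" dbname="PFAM"/>
    </interpro>
  </protein>
</interpro_matches>"""

handler = MatchCounter()
xml.sax.parseString(sample, handler)
print(handler.proteins, dict(handler.per_db))
```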
  • 19. IPRStats Architecture [diagram] ● Standalone GUI: IPRStats (wx.Frame), Menu (wx.MenuBar), PropertiesDlg (wx.Dialog), Chart (wx.StaticBitmap), Table (wx.PyGridTableBase) ● Importers: XML, IPS ● Exporters: HTML, XLS (using xlwt), IPS ● Data: Settings, StatsData, Results (sqlite or pytables)
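The Results store ("sqlite or pytables") is what the aggregate queries run against. Below is a minimal SQLite sketch using Python's standard `sqlite3` module. The table name, column names, and rows are hypothetical and are not the IPRStats schema. The rows are inserted with a single `executemany` call inside one transaction, because batching writes is the usual way to reduce per-row write cost in SQLite.

```python
import sqlite3

# Hypothetical schema: one row per signature hit
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE results (
    protein_id TEXT, dbname TEXT, signature_id TEXT, interpro_id TEXT)""")

rows = [
    ("P1", "PFAM", "PF_X", "IPR_X"),
    ("P1", "SMART", "SM_X", "IPR_X"),
    ("P2", "PFAM", "PF_Y", "IPR_Y"),
    ("P3", "PFAM", "PF_X", "IPR_X"),
]
# Batch all inserts in one transaction instead of committing per row
conn.executemany("INSERT INTO results VALUES (?, ?, ?, ?)", rows)
conn.commit()

# Aggregate query: distinct proteins with hits in each member database
query = """SELECT dbname, COUNT(DISTINCT protein_id) AS n_proteins
           FROM results
           GROUP BY dbname
           ORDER BY n_proteins DESC, dbname"""
summary = conn.execute(query).fetchall()
for dbname, n in summary:
    print(dbname, n)
```

Counting distinct proteins rather than raw rows keeps a protein with several hits in one database from being counted more than once.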
  • 20. What is PyTables? - A package for creating data structures that can handle large amounts of data - Uses NumPy (in memory) and HDF5 (on-disk storage) structures - Uses Numexpr, a fast numerical expression evaluator, for evaluating expressions such as queries - In the context of IPRStats, it provides a way of accessing a huge table of data without requiring all of the data to be in memory. Pros: - HDF5 provides very fast, compact and efficient indexing - NumPy provides efficient in-memory storage - Minimizes disk and memory usage - Very fast read times compared to SQLite and MySQL. Cons: - Large memory overhead (particularly for smaller datasets) - Many large, complex dependencies, including HDF5, NumPy, Numexpr and Cython - Slow write times (particularly important because writing is the IPRStats bottleneck)
  • 21. Multiple graph formats ● Pie charts ● Bar graphs
  • 22. Conclusions & Future ● A lightweight, machine-independent visualization tool for InterProScan annotations ● License: Academic Free License (AFL) ● To do: ● Comparative population analysis ● Large dataset handling ● More graphic options ● Anything else you like... – http://github.com/devrkel/IPRStats.git
  • 23. Thanks ● David Ream ● Han Wang ● Ian Fleming ● David Vincent ● Ryan Kelly ● EBI ● Miami University startup funding ● Miami University Undergraduate Summer Scholars Program
  • 24. The Friedberg Lab is Recruiting ● Graduate students ● Postdocs ● Catch me later, email me, or look at iddo-friedberg.net to learn more