Biopython Project Update (BOSC 2012)


Highlights of the Biopython project for computational biology, 2011-2012: Artemis-like genome track comparison with GenomeDiagram, new formats for SeqIO, phylogenetics with Bio.Phylo, Bio.PDB improvements, and an update on Google Summer of Code (GSoC) projects.

Published in: Technology, Education

  1. Project Update
     Bioinformatics Open Source Conference (BOSC), July 14, 2012, Long Beach, California, USA
     Eric Talevich, Peter Cock, Brad Chapman, João Rodrigues, and Biopython contributors
  2. Hello, BOSC
     Biopython is a freely available Python library for biological computation, and a long-running, distributed collaboration to produce and maintain it [1].
     ● Supported by the Open Bioinformatics Foundation (OBF)
     ● "This is Python's Bio* library. There are several Bio* libraries like it, but this one is ours."
     [1] Cock, P.J.A., Antao, T., Chang, J.T., Chapman, B.A., Cox, C.J., Dalke, A., Friedberg, I., Hamelryck, T., Kauff, F., Wilczynski, B., de Hoon, M.J. (2009) Biopython: freely available Python tools for computational molecular biology and bioinformatics. Bioinformatics 25(11) 1422-3. doi:10.1093/bioinformatics/btp163
  3. Bio.Graphics (Biopython 1.59, February 2012)
     New features in...
     BasicChromosome:
     ● Draw simple sub-features on chromosome segments
     ● Show the position of genes, SNPs or other loci
     GenomeDiagram [2]:
     ● Cross-links between tracks
     ● Track-specific start/end positions for showing regions
     [2] Pritchard, L., White, J.A., Birch, P.R., Toth, I. (2006) GenomeDiagram: a python package for the visualization of large-scale genomic data. Bioinformatics 22(5) 616-7. doi:10.1093/bioinformatics/btk021
  4. BasicChromosome: Potato NB-LRRs
     Jupe et al. (2012) BMC Genomics
  5. GenomeDiagram: A tale of three phages
     Swanson et al. (2012) PLoS One (to appear)
  6. GenomeDiagram imitates Artemis Comparison Tool (ACT)
  7. SeqIO and AlignIO (Biopython 1.58, August 2011)
     ● SeqXML format [3]
     ● Read support for ABI chromatogram files (Wibowo A.)
     ● "phylip-relaxed" format (Connor McCoy, Brandon I.)
       ○ Relaxes the 10-character limit on taxon names
       ○ Space-delimited instead
       ○ Used in RAxML, PhyML, PAML, etc.
     [3] Schmitt et al. (2011) SeqXML and OrthoXML: standards for sequence and orthology information. Briefings in Bioinformatics 12(5): 485-488. doi:10.1093/bib/bbr025
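The "phylip-relaxed" format on slide 7 can be tried with a short sketch like the one below. It assumes a recent Biopython is installed; the taxon names and sequences are invented for illustration and are not from the talk.

```python
from io import StringIO

from Bio import AlignIO
from Bio.Align import MultipleSeqAlignment
from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord

# Hypothetical records whose names exceed the strict 10-character PHYLIP limit.
records = [
    SeqRecord(Seq("ACGTACGTAC"), id="Homo_sapiens_sample_01"),
    SeqRecord(Seq("ACGTTCGTAC"), id="Pan_troglodytes_sample_02"),
]
alignment = MultipleSeqAlignment(records)

# Write the alignment to an in-memory handle in relaxed PHYLIP format.
buf = StringIO()
AlignIO.write(alignment, buf, "phylip-relaxed")
text = buf.getvalue()
print(text)

# Read it back and check that the long names survived the round trip.
buf.seek(0)
roundtrip = AlignIO.read(buf, "phylip-relaxed")
ids = [record.id for record in roundtrip]
print(ids)
```

Because the relaxed format separates the name from the sequence with whitespace, names used with it should not contain spaces.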
  8. Bio.Phylo & pypaml
     ● PAML interop: wrappers, I/O, glue
       ○ Merged Brandon Invergo's pypaml as Bio.Phylo.PAML (Biopython 1.58, August 2011)
     ● Phylo.draw improvements
     ● RAxML wrapper (Biopython 1.60, June 2012)
     ● Paper in review [4]
     [4] Talevich, E., Invergo, B.M., Cock, P.J.A., Chapman, B.A. (2012) Bio.Phylo: a unified toolkit for processing, analysis and visualization of phylogenetic data in Biopython. BMC Bioinformatics 13:209. doi:10.1186/1471-2105-13-209
  9. Phylo.draw and matplotlib
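As a rough illustration of the Bio.Phylo workflow behind these slides, the sketch below parses a small tree and prints it as text. The Newick string is a made-up example, and `Phylo.draw_ascii` is used instead of `Phylo.draw` so that the snippet runs without matplotlib; with matplotlib installed, `Phylo.draw(tree)` would open a graphical plot instead.

```python
from io import StringIO

from Bio import Phylo

# Hypothetical four-taxon tree in Newick format, with branch lengths.
newick = "((A:1,B:2):1,(C:3,D:1):2);"
tree = Phylo.read(StringIO(newick), "newick")

# List the tip names in the order they appear in the tree.
names = [tip.name for tip in tree.get_terminals()]
print(names)

# Path length between two tips: A -> parent -> root -> parent -> C.
dist = tree.distance("A", "C")
print(dist)

# Plain-text rendering of the tree; needs no plotting library.
Phylo.draw_ascii(tree)
```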
  10. Bio.bgzf (Blocked GNU Zip Format)
     ● BGZF is a GZIP variant that compresses blocks of a fixed, known size
     ● Used in Next Generation Sequencing for efficient random access to compressed files
       ○ SAM + BGZF = BAM
     Bio.SeqIO can now index BGZF compressed sequence files. (Biopython 1.60, June 2012)
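The random access described on slide 10 can be sketched with Bio.bgzf as follows. This assumes a recent Biopython on Python 3; the file name and the record contents are hypothetical. Each position is recorded as a BGZF "virtual offset", which combines the start of a compressed block with an offset inside the decompressed block.

```python
import os
import tempfile

from Bio import bgzf

# Hypothetical tab-separated records, written as bytes.
lines = [f"read_{i}\tACGT\n".encode("ascii") for i in range(5)]
path = os.path.join(tempfile.mkdtemp(), "demo.bgz")

# Write the records, noting the virtual offset where each one starts.
offsets = []
writer = bgzf.BgzfWriter(path, "wb")
for line in lines:
    offsets.append(writer.tell())
    writer.write(line)
writer.close()

# Jump straight to the fourth record without reading the first three.
reader = bgzf.BgzfReader(path, "rb")
reader.seek(offsets[3])
fourth = reader.readline()
reader.close()
print(fourth)

# Split the virtual offset into (compressed block start, offset within block).
# The data here is tiny, so every record sits in the first block.
block_start, within_block = bgzf.split_virtual_offset(offsets[3])
print(block_start, within_block)
```

Storing virtual offsets like these in an index is what lets a tool fetch one record from a large compressed file by decompressing only the block that contains it.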
  11. TogoWS (Biopython 1.59, February 2012)
     ● TogoWS is an integrated web resource for bioinformatics databases and services
     ● Provided by the Database Center for Life Science in Japan
     ● Usage is similar to NCBI Entrez
  12. PyPy and Python 3
     Biopython:
     ● works well on PyPy 1.9 (excluding NumPy & C extensions)
     ● works on Python 3 (excluding some C extensions), but concerns remain about performance in default unicode mode.
       ○ Currently beta-level support.
  13. Bio.PDB
     ● mmCIF parser restored (Biopython 1.60, June 2012)
       ○ Lenna Peterson fixed a 4-year-old lex/yacc-related compilation issue
       ○ That was awesome
       ○ Now she's a GSoC student
       ○ Py3/PyPy/Jython compatibility in progress
     ● Merging GSoC results incrementally
       ○ Atom element names & weights (João Rodrigues, GSoC 2010)
       ○ Lots of feature branches remaining...
  14. Bio.PDB feature branches
     [Diagram labels: PDBParser, Bio.Struct, Mocapy++, Generic Features, InterfaceAnalysis, mmCIF Parser; timeline: GSoC 10, 11, 12, ...]
  15. Google Summer of Code (GSoC)
     In 2011, Biopython had three projects funded via the OBF:
     ● Mikael Trellet (Bio.PDB)
     ● Michele Silva (Bio.PDB, Mocapy++)
     ● Justinas Daugmaudis (Mocapy++)
     In 2012, we have two projects via the OBF:
     ● Wibowo Arindrarto (SearchIO)
     ● Lenna Peterson (Variants)
  16. GSoC 2011: Mikael Trellet
     Biomolecular interfaces in Bio.PDB
     Mentor: João Rodrigues
     ● Representation of protein-protein interfaces: SM(I)CRA
     ● Determining interfaces from PDB coordinates
     ● Analyses of these objects
  17. GSoC 2011: Michele Silva
     Python/Biopython bindings for Mocapy++
     Mentor: Thomas Hamelryck
     Michele Silva wrote a Python bridge for Mocapy++ and linked it to Bio.PDB to enable statistical analysis of protein structures.
     More-or-less ready to merge after the next Mocapy++ release.
  18. GSoC 2011: Justinas Daugmaudis
     Mocapy extensions in Python
     Mentor: Thomas Hamelryck
     Enhance Mocapy++ in a complementary way, developing a plugin system for Mocapy++ allowing users to easily write new nodes (probability distribution functions) in Python.
     He's finishing this as part of his master's thesis project with Thomas Hamelryck.
  19. GSoC 2012: Lenna Peterson
     Diff My DNA: Development of a Genomic Variant Toolkit for Biopython
     Mentors: Brad Chapman, James Casbon
     ● I/O for VCF, GVF formats
     ● internal schema for variant data
  20. GSoC 2012: Wibowo Arindrarto
     SearchIO implementation in Biopython
     Mentor: Peter Cock
     Unified, BioPerl-like API for search results from BLAST, HMMER, FASTA, etc.
  21. Thanks
     ● OBF
     ● BOSC organizers
     ● Biopython contributors
     ● Scientists like you
     Check us out:
     ● Website:
     ● Code: