Project Update
Bioinformatics Open Source Conference (BOSC)
July 14, 2012
Long Beach, California, USA
Eric Talevich, Peter Cock, Brad Chapman, João Rodrigues, and Biopython contributors

Hello, BOSC
Biopython is a freely available Python library for biological computation, and a long-running, distributed collaboration to produce and maintain it.
● Supported by the Open Bioinformatics Foundation (OBF)
● "This is Python's Bio* library. There are several Bio* libraries like it, but this one is ours."
● http://biopython.org/
_____
Cock, P.J.A., Antao, T., Chang, J.T., Chapman, B.A., Cox, C.J., Dalke, A., Friedberg, I., Hamelryck, T., Kauff, F., Wilczynski, B., de Hoon, M.J. (2009) Biopython: freely available Python tools for computational molecular biology and bioinformatics. Bioinformatics 25(11) 1422-3. doi:10.1093/bioinformatics/btp163

Bio.Graphics (Biopython 1.59, February 2012)
New features in...
BasicChromosome:
● Draw simple sub-features on chromosome segments
● Show the position of genes, SNPs or other loci
GenomeDiagram:
● Cross-links between tracks
● Track-specific start/end positions for showing regions
_____
Pritchard, L., White, J.A., Birch, P.R., Toth, I. (2006) GenomeDiagram: a python package for the visualization of large-scale genomic data. Bioinformatics 22(5) 616-7. doi:10.1093/bioinformatics/btk021

BasicChromosome: Potato NB-LRRs
Jupe et al. (2012) BMC Genomics

GenomeDiagram: A tale of three phages
Swanson et al. (2012) PLoS One (to appear)

SeqIO and AlignIO (Biopython 1.58, August 2011)
● SeqXML format
● Read support for ABI chromatogram files (Wibowo A.)
● "phylip-relaxed" format (Connor McCoy, Brandon I.)
  ○ Relaxes the 10-character limit on taxon names
  ○ Space-delimited instead
  ○ Used in RAxML, PhyML, PAML, etc.
_____
Schmitt et al. (2011) SeqXML and OrthoXML: standards for sequence and orthology information. Briefings in Bioinformatics 12(5): 485-488. doi:10.1093/bib/bbr025

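To make the "phylip-relaxed" idea concrete, here is a minimal sketch of reading a relaxed PHYLIP alignment using only the Python standard library. It is not Biopython's parser: the function name `parse_relaxed_phylip` is hypothetical, and it assumes the simplest layout (one taxon per line, no interleaving). The point is only that the name ends at the first whitespace rather than at column 10.

```python
def parse_relaxed_phylip(text):
    """Parse a sequential relaxed-PHYLIP alignment into {name: sequence}.

    Illustrative sketch only: assumes one line per taxon, no interleaving.
    """
    lines = [line for line in text.splitlines() if line.strip()]
    # Header line: number of taxa, then number of sites.
    n_taxa, n_sites = (int(x) for x in lines[0].split())
    records = {}
    for line in lines[1:1 + n_taxa]:
        # Relaxed PHYLIP: the name is everything before the first whitespace,
        # so it may be longer than the strict format's 10 characters.
        name, rest = line.split(None, 1)
        seq = "".join(rest.split())
        if len(seq) != n_sites:
            raise ValueError(f"{name}: expected {n_sites} sites, got {len(seq)}")
        records[name] = seq
    if len(records) != n_taxa:
        raise ValueError(f"expected {n_taxa} taxa, got {len(records)}")
    return records


example = """\
 2 12
Homo_sapiens_mitochondrion  ACGTACGTACGT
Pan_troglodytes             ACGTACGAACGT
"""

alignment = parse_relaxed_phylip(example)
for name, seq in alignment.items():
    print(name, seq)
```

In Biopython itself the format is selected by passing the format name "phylip-relaxed" to Bio.AlignIO.
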
Bio.Phylo & pypaml
● PAML interop: wrappers, I/O, glue
  ○ Merged Brandon Invergo’s pypaml as Bio.Phylo.PAML (Biopython 1.58, August 2011)
● Phylo.draw improvements
● RAxML wrapper (Biopython 1.60, June 2012)
● Paper in review
_____
Talevich, E., Invergo, B.M., Cock, P.J.A., Chapman, B.A. (2012) Bio.Phylo: a unified toolkit for processing, analysis and visualization of phylogenetic data in Biopython. BMC Bioinformatics, in review

Bio.bgzf (Blocked GNU Zip Format)
● BGZF is a GZIP variant that compresses blocks of a fixed, known size
● Used in Next Generation Sequencing for efficient random access to compressed files
  ○ SAM + BGZF = BAM
Bio.SeqIO can now index BGZF compressed sequence files. (Biopython 1.60, June 2012)

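Why blocked compression gives random access can be shown with a small standard-library sketch. This is not the Bio.bgzf API and does not write real BGZF files: the helper names `write_blocked` and `read_at` are hypothetical, the 64-byte block size is chosen only to keep the demo small, and block offsets are kept in a plain Python list, whereas real BGZF records each block's size inside its gzip header.

```python
import gzip
import io

BLOCK_SIZE = 64  # demo value (assumption); real BGZF blocks are far larger


def write_blocked(data, block_size=BLOCK_SIZE):
    """Compress data as concatenated gzip members, one per block.

    Returns (compressed_bytes, offsets), where offsets[i] is the position
    in the compressed stream at which block i starts.
    """
    out = io.BytesIO()
    offsets = []
    for start in range(0, len(data), block_size):
        offsets.append(out.tell())
        out.write(gzip.compress(data[start:start + block_size]))
    return out.getvalue(), offsets


def read_at(compressed, offsets, position, length, block_size=BLOCK_SIZE):
    """Read `length` bytes from uncompressed `position`,
    decompressing only the blocks that overlap the request."""
    result = b""
    block = position // block_size
    within = position % block_size
    while len(result) < length and block < len(offsets):
        start = offsets[block]
        end = offsets[block + 1] if block + 1 < len(offsets) else len(compressed)
        chunk = gzip.decompress(compressed[start:end])
        result += chunk[within:within + (length - len(result))]
        within = 0
        block += 1
    return result


data = bytes(range(256)) * 4
compressed, offsets = write_blocked(data)

# The concatenated members still form a valid gzip stream...
assert gzip.decompress(compressed) == data
# ...but a slice can be fetched by touching only two of the blocks.
print(read_at(compressed, offsets, 100, 50) == data[100:150])
```

Because every block is an independent gzip member, a reader that knows where a block starts can seek straight to it, which is what makes indexing compressed sequence files practical.
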
TogoWS (Biopython 1.59, February 2012)
● TogoWS is an integrated web resource for bioinformatics databases and services
● Provided by the Database Center for Life Science in Japan
● Usage is similar to NCBI Entrez
_____
http://togows.dbcls.jp/

PyPy and Python 3
Biopython:
● works well on PyPy 1.9 (excluding NumPy & C extensions)
● works on Python 3 (excluding some C extensions), but concerns remain about performance in default unicode mode.
  ○ Currently beta-level support.

Bio.PDB
● mmCIF parser restored (Biopython 1.60, June 2012)
  ○ Lenna Peterson fixed a 4-year-old lex/yacc-related compilation issue
  ○ That was awesome
  ○ Now she's a GSoC student
  ○ Py3/PyPy/Jython compatibility in progress
● Merging GSoC results incrementally
  ○ Atom element names & weights (João Rodrigues, GSoC 2010)
  ○ Lots of feature branches remaining...

Google Summer of Code (GSoC)
In 2011, Biopython had three projects funded via the OBF:
● Mikael Trellet (Bio.PDB)
● Michele Silva (Bio.PDB, Mocapy++)
● Justinas Daugmaudis (Mocapy++)
In 2012, we have two projects via the OBF:
● Wibowo Arindrarto (SearchIO)
● Lenna Peterson (Variants)
_____
http://biopython.org/wiki/Google_Summer_of_Code
http://www.open-bio.org/wiki/Google_Summer_of_Code
https://www.google-melange.com/

GSoC 2011: Mikael Trellet
Biomolecular interfaces in Bio.PDB
Mentor: João Rodrigues
● Representation of protein-protein interfaces: SM(I)CRA
● Determining interfaces from PDB coordinates
● Analyses of these objects
_____
http://biopython.org/wiki/GSoC2011_mtrellet

GSoC 2011: Michele Silva
Python/Biopython bindings for Mocapy++
Mentor: Thomas Hamelryck
Michele Silva wrote a Python bridge for Mocapy++ and linked it to Bio.PDB to enable statistical analysis of protein structures.
More-or-less ready to merge after the next Mocapy++ release.
_____
http://biopython.org/wiki/GSOC2011_Mocapy

GSoC 2011: Justinas Daugmaudis
Mocapy extensions in Python
Mentor: Thomas Hamelryck
Enhance Mocapy++ in a complementary way, developing a plugin system for Mocapy++ allowing users to easily write new nodes (probability distribution functions) in Python.
He's finishing this as part of his master's thesis project with Thomas Hamelryck.
_____
http://biopython.org/wiki/GSOC2011_MocapyExt

GSoC 2012: Lenna Peterson
Diff My DNA: Development of a Genomic Variant Toolkit for Biopython
Mentors: Brad Chapman, James Casbon
● I/O for VCF, GVF formats
● internal schema for variant data
_____
http://arklenna.tumblr.com/tagged/gsoc2012

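As a rough illustration of what "I/O plus an internal schema" can mean for variant data, the sketch below reads the first five fixed columns of VCF records (CHROM, POS, ID, REF, ALT) into a small record type. It is not the project's design: the `Variant` class and `parse_vcf` function are hypothetical names invented for this example, and the remaining columns (QUAL, FILTER, INFO, genotypes) are ignored.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, List


@dataclass
class Variant:
    """Hypothetical minimal schema for one variant record."""
    chrom: str
    pos: int          # 1-based position, as in VCF
    id: str           # "." when no identifier is given
    ref: str
    alts: List[str]   # one entry per alternate allele


def parse_vcf(lines: Iterable[str]) -> Iterator[Variant]:
    """Yield a Variant for each data line, skipping headers and blanks."""
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        chrom, pos, vid, ref, alt = fields[:5]
        # Multiple alternate alleles are comma-separated in the ALT column.
        yield Variant(chrom, int(pos), vid, ref, alt.split(","))


vcf_lines = [
    "##fileformat=VCFv4.1",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "20\t14370\trs6054257\tG\tA\t29\tPASS\tDP=14",
    "20\t1110696\t.\tA\tG,T\t67\tPASS\tDP=10",
]

variants = list(parse_vcf(vcf_lines))
for v in variants:
    print(v.chrom, v.pos, v.ref, v.alts)
```

Keeping the parsed record separate from the file format is what lets one schema sit behind several parsers, such as VCF and GVF.
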
GSoC 2012: Wibowo Arindrarto
SearchIO implementation in Biopython
Mentor: Peter Cock
Unified, BioPerl-like API for search results from BLAST, HMMer, FASTA, etc.
_____
http://biopython.org/wiki/SearchIO
http://bow.web.id/blog/tag/gsoc/

Thanks
● OBF
● BOSC organizers
● Biopython contributors
● Scientists like you
Check us out:
● Website: http://biopython.org
● Code: https://github.com/biopython/biopython