Project Update
Bioinformatics Open Source Conference (BOSC)
                 July 14, 2012
         Long Beach, California, USA

          Eric Talevich, Peter Cock,
       Brad Chapman, João Rodrigues,
          and Biopython contributors
Hello, BOSC
Biopython is a freely available Python library for biological
computation, and a long-running, distributed collaboration
to produce and maintain it [1].
 ● Supported by the Open Bioinformatics Foundation
    (OBF)
 ● "This is Python's Bio* library. There are several Bio*
    libraries like it, but this one is ours."
 ● http://biopython.org/
_____
[1] Cock, P.J.A., Antao, T., Chang, J.T., Chapman, B.A., Cox, C.J., Dalke, A.,
Friedberg, I., Hamelryck, T., Kauff, F., Wilczynski, B., de Hoon, M.J. (2009)
Biopython: freely available Python tools for computational molecular biology
and bioinformatics. Bioinformatics 25(11) 1422-3.
doi:10.1093/bioinformatics/btp163
Bio.Graphics (Biopython 1.59, February 2012)
New features in...
BasicChromosome:
 ● Draw simple sub-features on chromosome segments
 ● Show the position of genes, SNPs or other loci

GenomeDiagram [2]:
 ● Cross-links between tracks
 ● Track-specific start/end positions for showing regions

_____
[2] Pritchard, L., White, J.A., Birch, P.R., Toth, I. (2006) GenomeDiagram: a
python package for the visualization of large-scale genomic data.
Bioinformatics 22(5) 616-7.
doi:10.1093/bioinformatics/btk021
BasicChromosome: Potato NB-LRRs




Jupe et al. (2012) BMC Genomics
GenomeDiagram:
     A tale of three phages




Swanson et al. (2012) PLoS One (to appear)
GenomeDiagram imitates
Artemis Comparison Tool (ACT)
SeqIO and AlignIO
(Biopython 1.58, August 2011)

● SeqXML format [3]

● Read support for ABI chromatogram files (Wibowo A.)

● "phylip-relaxed" format (Connor McCoy, Brandon I.)
     ○ Relaxes the 10-character limit on taxon names
     ○ Space-delimited instead
     ○ Used in RAxML, PhyML, PAML, etc.
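
As an illustration of the "phylip-relaxed" bullet above, here is a minimal sketch that writes and re-reads a tiny alignment whose taxon names exceed 10 characters. The sequences and taxon names are hypothetical, and the code assumes a reasonably recent Biopython installation.

```python
from io import StringIO

from Bio import AlignIO
from Bio.Align import MultipleSeqAlignment
from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord

# Hypothetical taxon names, deliberately longer than 10 characters.
alignment = MultipleSeqAlignment(
    [
        SeqRecord(Seq("ACGTACGTAC"), id="Homo_sapiens_sample_1"),
        SeqRecord(Seq("ACGTTCGTAC"), id="Pan_troglodytes_sample_2"),
    ]
)

# Write the alignment in the relaxed PHYLIP variant to an in-memory handle.
handle = StringIO()
AlignIO.write(alignment, handle, "phylip-relaxed")
relaxed_text = handle.getvalue()

# Read it back and check that the full names survive the round trip.
round_trip = AlignIO.read(StringIO(relaxed_text), "phylip-relaxed")
round_trip_ids = [record.id for record in round_trip]
print(relaxed_text)
```

The same records written with the strict "phylip" format name would not keep their full identifiers, which is the limitation the relaxed variant addresses.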

_____
[3] Schmitt et al. (2011) SeqXML and OrthoXML: standards for sequence and
orthology information. Briefings in Bioinformatics 12(5): 485-488.
doi:10.1093/bib/bbr025
Bio.Phylo & pypaml

● PAML interop: wrappers, I/O, glue
  ○ Merged Brandon Invergo’s pypaml as
    Bio.Phylo.PAML (Biopython 1.58, August 2011)

● Phylo.draw improvements

● RAxML wrapper (Biopython 1.60, June 2012)

● Paper in review [4]
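
A small sketch of basic Bio.Phylo usage, for readers who have not seen the module. The Newick string is a made-up three-taxon tree; only the text-mode drawing is shown so the example runs without matplotlib.

```python
from io import StringIO

from Bio import Phylo

# Hypothetical three-taxon tree in Newick format.
newick = "((A:1.0,B:2.0):0.5,C:3.0);"
tree = Phylo.read(StringIO(newick), "newick")

# Inspect the tree: leaf names and the sum of all branch lengths.
leaf_names = [leaf.name for leaf in tree.get_terminals()]
total_length = tree.total_branch_length()

# Plain-text rendering; Phylo.draw(tree) gives a matplotlib figure instead.
Phylo.draw_ascii(tree)
```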

_____
[4] Talevich, E., Invergo, B.M., Cock, P.J.A., Chapman, B.A. (2012) Bio.Phylo:
a unified toolkit for processing, analysis and visualization of phylogenetic data
in Biopython. BMC Bioinformatics, in review
Phylo.draw and matplotlib
Bio.bgzf (Blocked GNU Zip Format)
● BGZF is a GZIP variant that compresses
  blocks of a fixed, known size
● Used in Next Generation Sequencing for
  efficient random access to compressed files
  ○ SAM + BGZF = BAM


Bio.SeqIO can now index BGZF compressed
sequence files. (Biopython 1.60, June 2012)
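
A minimal sketch of the indexing feature mentioned above, assuming a recent Biopython. It writes a tiny BGZF-compressed FASTA file to a temporary directory and then fetches one record by ID; the file name and sequences are hypothetical.

```python
import os
import tempfile

from Bio import SeqIO, bgzf

# Hypothetical two-record FASTA file.
fasta_text = ">seq1\nACGTACGT\n>seq2\nTTTTGGGGCCCC\n"

tmp_dir = tempfile.mkdtemp()
bgzf_path = os.path.join(tmp_dir, "example.fasta.bgz")

# Compress the text into BGZF blocks.
writer = bgzf.BgzfWriter(bgzf_path, "wb")
writer.write(fasta_text.encode("ascii"))
writer.close()

# Index the compressed file; records are decompressed only when requested.
records = SeqIO.index(bgzf_path, "fasta")
record_ids = sorted(records.keys())
seq2 = str(records["seq2"].seq)
records.close()
```

Because each BGZF block can be decompressed on its own, the index can jump to the block holding a record rather than reading the file from the start.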
TogoWS
(Biopython 1.59, February 2012)

● TogoWS is an integrated web resource for
    bioinformatics databases and services
●   Provided by the Database Center for Life Science in
    Japan
●   Usage is similar to NCBI Entrez

_____
http://togows.dbcls.jp/
PyPy and Python 3
Biopython:
● works well on PyPy 1.9
    (excluding NumPy & C extensions)
●   works on Python 3 (excluding some C
    extensions), but concerns remain about
    performance in default unicode mode.
    ○ Currently 'beta' level support.
Bio.PDB
● mmCIF parser restored (Biopython 1.60, June 2012)
  ○ Lenna Peterson fixed a 4-year-old lex/yacc-related
    compilation issue
  ○ That was awesome
  ○ Now she's a GSoC student
  ○ Py3/PyPy/Jython compatibility in progress

● Merging GSoC results incrementally
  ○ Atom element names & weights (João Rodrigues,
    GSoC 2010)
  ○ Lots of feature branches remaining...
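
To make the "atom element names & weights" item concrete, here is a hedged sketch that parses two hand-built ATOM records and reads each atom's element and mass. The coordinates and the `ATOM_FORMAT` helper are illustrative assumptions, not part of Bio.PDB.

```python
from io import StringIO

from Bio.PDB import PDBParser

# Hypothetical helper: fixed-column layout of a PDB ATOM record.
ATOM_FORMAT = (
    "ATOM  %5d %-4s %3s %1s%4d    %8.3f%8.3f%8.3f%6.2f%6.2f          %2s"
)

# Two made-up backbone atoms of an alanine residue.
atoms = [
    (1, " N  ", "ALA", "A", 1, 11.104, 6.134, -6.504, 1.00, 0.00, "N"),
    (2, " CA ", "ALA", "A", 1, 11.639, 6.071, -5.147, 1.00, 0.00, "C"),
]
pdb_text = "\n".join(ATOM_FORMAT % atom for atom in atoms) + "\n"

# Parse from an in-memory handle; QUIET suppresses parser warnings.
structure = PDBParser(QUIET=True).get_structure("demo", StringIO(pdb_text))

# Map atom name to (element symbol, atomic mass).
elements = {
    atom.get_name(): (atom.element, atom.mass)
    for atom in structure.get_atoms()
}
print(elements)
```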
Bio.PDB feature branches

[Figure: timeline of Bio.PDB feature branches across GSoC '10, '11, '12, ...
Labels: PDBParser, Bio.Struct, Mocapy++, Generic Features, InterfaceAnalysis,
mmCIF Parser]
Google Summer of Code (GSoC)
In 2011, Biopython had three projects funded via the OBF:
●   Mikael Trellet (Bio.PDB)
●   Michele Silva (Bio.PDB, Mocapy++)
●   Justinas Daugmaudis (Mocapy++)

In 2012, we have two projects via the OBF:
●   Wibowo Arindrarto (SearchIO)
●   Lenna Peterson (Variants)

_____
http://biopython.org/wiki/Google_Summer_of_Code
http://www.open-bio.org/wiki/Google_Summer_of_Code
https://www.google-melange.com/
GSoC 2011: Mikael Trellet
Biomolecular interfaces in Bio.PDB
Mentor: João Rodrigues

● Representation of protein-protein
    interfaces: SM(I)CRA
●   Determining interfaces from PDB coordinates
●   Analyses of these objects

_____
http://biopython.org/wiki/GSoC2011_mtrellet
GSoC 2011: Michele Silva
Python/Biopython bindings for Mocapy++
Mentor: Thomas Hamelryck

Michele Silva wrote a Python bridge for Mocapy++ and
linked it to Bio.PDB to enable statistical analysis of protein
structures.

More-or-less ready to merge after the next Mocapy++
release.
_____
http://biopython.org/wiki/GSOC2011_Mocapy
GSoC 2011: Justinas Daugmaudis
Mocapy extensions in Python
Mentor: Thomas Hamelryck

A complementary project: a plugin system for Mocapy++ that
lets users easily write new nodes (probability distribution
functions) in Python.

He's finishing this as part of his master's thesis project with
Thomas Hamelryck.
_____
http://biopython.org/wiki/GSOC2011_MocapyExt
GSoC 2012: Lenna Peterson
Diff My DNA: Development of a
Genomic Variant Toolkit for Biopython
Mentors: Brad Chapman, James Casbon

● I/O for VCF, GVF formats
● internal schema for variant data


_____
http://arklenna.tumblr.com/tagged/gsoc2012
GSoC 2012: Wibowo Arindrarto
SearchIO implementation in
Biopython
Mentor: Peter Cock

Unified, BioPerl-like API for
search results from BLAST,
HMMer, FASTA, etc.
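
A rough sketch of what a unified search-result API looks like in practice. It uses the Bio.SearchIO interface as found in later Biopython releases, which may differ from the in-progress GSoC code at the time of this talk. The BLAST tabular lines are hypothetical.

```python
from io import StringIO

from Bio import SearchIO

# Hypothetical BLAST tabular output (standard 12 columns).
blast_tab = (
    "query1\tsubjectA\t98.50\t200\t3\t0\t1\t200\t11\t210\t1e-100\t360\n"
    "query1\tsubjectB\t85.00\t180\t27\t0\t5\t184\t40\t219\t2e-50\t190\n"
)

# Walk the query -> hit -> HSP hierarchy shared by all supported formats.
summary = []
for qresult in SearchIO.parse(StringIO(blast_tab), "blast-tab"):
    for hit in qresult:
        for hsp in hit:
            summary.append((qresult.id, hit.id, hsp.evalue))

for row in summary:
    print(row)
```

Changing the format name passed to `SearchIO.parse` is meant to be the only difference when reading output from another supported search program.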


_____
http://biopython.org/wiki/SearchIO
http://bow.web.id/blog/tag/gsoc/
Thanks
●   OBF
●   BOSC organizers
●   Biopython contributors
●   Scientists like you

Check us out:
● Website: http://biopython.org
● Code: https://github.com/biopython/biopython
