Biopython: Overview, State of the Art and Outlook

  1. Biopython: Overview, State of the Art and Outlook. Sebastián Bassi (Twitter: @sbassi)
  2. A few words about Python: Python is a general-purpose, high-level, dynamic programming language. It supports multiple programming paradigms (object-oriented, imperative and functional programming). It features a fully dynamic type system and automatic memory management.
  3. Python features:
      ● Easy to learn
      ● Easy to read (looks like pseudocode)
      ● Interpreted (compiled to VM bytecode), so it is fast to program in
      ● High-level data structures (lists, dictionaries, sets and more)
      ● Multiplatform (from supercomputers to phones)
      ● "Batteries included" philosophy
      ● Extensive third-party libraries
      ● Free (as in freedom and as in beer)
      ● Strong community
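To make the "high-level data structures" point concrete, here is a short sketch (the sequence is only an example) that tallies bases with a dict, collects the distinct symbols in a set and ranks them with a sorted list, with no declarations or manual resizing:

```python
# Example sequence; any string of bases works.
sequence = "GATCGATGGGCCTATATAGGA"

# dict: map each base to the number of times it occurs
counts = {}
for base in sequence:
    counts[base] = counts.get(base, 0) + 1

# set: the distinct symbols present in the sequence
alphabet = set(sequence)

# list: bases ordered by decreasing frequency, ties broken alphabetically
ranked = sorted(counts, key=lambda b: (-counts[b], b))

print(counts)
print(sorted(alphabet))
print(ranked)
```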
  4. Read a file, load an array and sort it (VB)
      Dim i, j, Array_Used As Integer
      Dim MyArray() As String
      Dim InBuffer, Temp As String
      Array_Used = 0
      ReDim MyArray(50)
      'open a text file here . . .
      Do While Not EOF(file_no)
          Line Input #file_no, MyArray(Array_Used)
          Array_Used = Array_Used + 1
          If Array_Used = UBound(MyArray) Then
              ReDim Preserve MyArray(UBound(MyArray) + 50)
          End If
      Loop
      'simple bubble sort
      For i = Array_Used - 1 To 0 Step -1
          For j = 1 To i
              If MyArray(j - 1) > MyArray(j) Then
                  'swap
                  Temp = MyArray(j - 1)
                  MyArray(j - 1) = MyArray(j)
                  MyArray(j) = Temp
              End If
          Next
      Next
  5. Read a file, load an array and sort it (Python)
      # Open the file handle (FILENAME is a placeholder for the file path)
      with open(FILENAME) as file_object:
          # Read all lines and store them in a list
          lista = file_object.readlines()
      # Sort the list
      lista.sort()
  6. What can be done with Python?
      from pylab import *
      from data_helper import get_daily_data
      intc, msft = get_daily_data()
      delta1 = diff([0]
      # size in points ^2
      volume = (15*intc.volume[:-2]/intc.volume[0])**2
      close = 0.003*intc.close[:-2]/0.003*[:-2]
      scatter(delta1[:-1], delta1[1:], c=close, s=volume, alpha=0.75)
      ticks = arange(-0.06, 0.061, 0.02)
      xticks(ticks)
      yticks(ticks)
      xlabel(r'$\Delta_i$', fontsize=20)
      ylabel(r'$\Delta_{i+1}$', fontsize=20)
      title('Volume and percent change')
      grid(True)
      show()
  7. Robots (made in Argentina)
  8. Biopython: a set of freely available Python tools for bioinformatics and molecular biology. Features include:
      ● Parsing bioinformatics files into Python data structures
      ● A sequence class to store sequences, IDs and features
      ● Interfaces to popular bioinformatics programs (ClustalW, BLAST, Primer3 and more)
      ● Tools for common operations on DNA/protein sequences (translation, transcription, Tm, molecular weight)
      ● Code to deal with alignments
      ● Integration with other languages via BioCorba
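"Parsing bioinformatics files into Python data structures" is what Bio.SeqIO does for many formats. As a rough idea of what that means for the simplest case, here is a stdlib-only sketch of a FASTA reader; it is not Biopython's code, the `Record` type is a hypothetical stand-in for a sequence record, and it assumes the identifier is the first word after the ">".

```python
import io
from typing import Iterator, List, NamedTuple, Optional, TextIO


class Record(NamedTuple):
    """A minimal stand-in for a sequence record (not Biopython's SeqRecord)."""
    id: str
    description: str
    seq: str


def _make_record(header: str, chunks: List[str]) -> Record:
    # Assumption: the identifier is the first whitespace-separated word of the header.
    words = header.split(None, 1)
    return Record(words[0] if words else "", header, "".join(chunks))


def parse_fasta(handle: TextIO) -> Iterator[Record]:
    """Yield one Record per '>' header, joining the sequence lines that follow it."""
    header: Optional[str] = None
    chunks: List[str] = []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                yield _make_record(header, chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)  # sequence data; lines before the first header are ignored
    if header is not None:
        yield _make_record(header, chunks)


example = io.StringIO(">seq1 first example\nGATCGATGGG\nCCTATATAGGA\n>seq2\nACGT\n")
for record in parse_fasta(example):
    print(record.id, len(record.seq))
```

Because `parse_fasta` is a generator, records are produced one at a time, so large files never need to be loaded whole; Bio.SeqIO.parse follows the same iterator style.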
  9. Biopython in the lab (real-world usage)
  10. Contributions to Biopython
      Code:
      ● Tm function
      ● LCC function
      ● Two checksum functions in Bio.SeqUtils.CheckSum
      Other:
      ● Feedback
      ● Bug reporting
      ● Testing (BLAST, SFF files, BioSQL)
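To give a feel for two of the kinds of functions listed above, here is a stdlib-only sketch, not the Biopython source: a whole-sequence LCC (local composition complexity) computed as the base-4 Shannon entropy of the A/C/G/T composition, and a GCG-style checksum in which position weights cycle from 1 to 57, are multiplied by each character code and summed modulo 10000. The function names are hypothetical.

```python
import math


def lcc_simple(seq: str) -> float:
    """Local composition complexity of the whole sequence.

    Shannon entropy of the A/C/G/T frequencies in base 4, so the value
    lies between 0 (one repeated base) and 1 (all four bases equally
    frequent). Other characters only count toward the length.
    """
    seq = seq.upper()
    n = len(seq)
    if n == 0:
        return 0.0
    entropy = 0.0
    for base in "ACGT":
        count = seq.count(base)
        if count:
            p = count / n
            entropy -= p * math.log(p, 4)
    return entropy


def gcg_checksum(seq: str) -> int:
    """GCG-style checksum: weights cycle 1..57, times character codes, modulo 10000."""
    checksum = 0
    for position, char in enumerate(seq.upper()):
        weight = position % 57 + 1
        checksum += weight * ord(char)
    return checksum % 10000


print(round(lcc_simple("GATCGATGGGCCTATATAGGA"), 3))
print(gcg_checksum("GATCGATGGGCCTATATAGGA"))
```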
  11. Sequence class
      >>> from Bio.Seq import Seq
      >>> from Bio.Alphabet import IUPAC
      >>> seq_1 = Seq('GATCGATGGGCCTATATAGGA', IUPAC.unambiguous_dna)
      >>> rna_1 = seq_1.transcribe()
      >>> str(rna_1)
      'GAUCGAUGGGCCUAUAUAGGA'
      >>> rna_1.translate()
      Seq('DRWAYIG', IUPACProtein())
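To show roughly what `transcribe()` and `translate()` do under the hood, here is a stdlib-only sketch (not Biopython's implementation). It assumes the standard genetic code, marks stop codons with "*" and uses "X" for any codon it cannot look up; on the slide's sequence it gives the same result.

```python
BASES = "TCAG"
# Standard genetic code; codons are enumerated in TCAG order
# (TTT, TTC, TTA, TTG, TCT, ...) and '*' marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
CODON_TABLE = dict(zip(CODONS, AMINO_ACIDS))


def transcribe(dna: str) -> str:
    """Coding-strand DNA to mRNA: every T becomes U."""
    return dna.upper().replace("T", "U")


def translate(rna: str) -> str:
    """Translate codon by codon; 'X' for unknown codons, a trailing partial codon is dropped."""
    dna = rna.upper().replace("U", "T")  # look codons up in DNA form
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, usable, 3))


mrna = transcribe("GATCGATGGGCCTATATAGGA")
print(mrna)             # GAUCGAUGGGCCUAUAUAGGA
print(translate(mrna))  # DRWAYIG
```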
  12. Run a BLAST search
      from Bio.Blast import NCBIStandalone as BLAST
      # b_exe, b_db, f_in: placeholders for the blastall executable, database and query file
      r, e = BLAST.blastall(b_exe, 'blastn', b_db, f_in,
                            gap_open='3', gap_extend='2', wordsize=20,
                            expectation=1e-50, alignments=1, descriptions=1,
                            align_view='7', html='F')  # '7' = XML output, needed by NCBIXML
      Parse a BLAST result
      from Bio.Blast import NCBIXML
      for rec in NCBIXML.parse(r):
          for align in rec.alignments:
              for hsp in align.hsps:
                  print hsp.query_start, hsp.query_end
                  print hsp.sbjct_start, hsp.sbjct_end
                  if hsp.identities > 90:
                      print align.title
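NCBIXML reads BLAST's XML output. To show the structure it walks (Hit, then Hsp, with the coordinates and identity count as child elements), here is a stdlib sketch using xml.etree.ElementTree on a made-up, heavily abridged fragment. Note that `Hsp_identity` (exposed by NCBIXML as `hsp.identities`) is an absolute count of identical positions, not a percentage. Using `Hit_def` alone as the title is a simplification, and `hits_above` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Made-up, heavily abridged BLAST XML; real output contains many more elements.
BLAST_XML = """<BlastOutput>
  <BlastOutput_iterations>
    <Iteration>
      <Iteration_hits>
        <Hit>
          <Hit_def>example subject 1</Hit_def>
          <Hit_hsps>
            <Hsp>
              <Hsp_query-from>1</Hsp_query-from>
              <Hsp_query-to>100</Hsp_query-to>
              <Hsp_hit-from>501</Hsp_hit-from>
              <Hsp_hit-to>600</Hsp_hit-to>
              <Hsp_identity>98</Hsp_identity>
            </Hsp>
          </Hit_hsps>
        </Hit>
      </Iteration_hits>
    </Iteration>
  </BlastOutput_iterations>
</BlastOutput>"""


def hits_above(xml_text: str, min_identities: int):
    """Yield (title, query span, subject span) for HSPs with more than min_identities identities."""
    root = ET.fromstring(xml_text)
    for hit in root.iter("Hit"):
        title = hit.findtext("Hit_def", default="")
        for hsp in hit.iter("Hsp"):
            identities = int(hsp.findtext("Hsp_identity", default="0"))
            if identities > min_identities:
                query = (int(hsp.findtext("Hsp_query-from", default="0")),
                         int(hsp.findtext("Hsp_query-to", default="0")))
                subject = (int(hsp.findtext("Hsp_hit-from", default="0")),
                           int(hsp.findtext("Hsp_hit-to", default="0")))
                yield title, query, subject


for title, query, subject in hits_above(BLAST_XML, 90):
    print(title, query, subject)
```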
  13. Typical bioinformatics problems and Biopython (1/3)
      ● Problem: Sequence manipulation in batch. Tool: SeqRecord and SeqIO
      ● Problem: Filtering vector contamination. Tool: SeqRecord, SeqIO, NCBIXML and NCBIStandalone
      ● Problem: Searching for primers. Tool: Emboss.Applications
      ● Problem: Calculating melting temperature. Tool: SeqUtils
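SeqUtils implements more elaborate melting temperature calculations. As a deliberately simple, swapped-in illustration of the idea, the Wallace rule estimates Tm = 2(A+T) + 4(G+C) °C; it is only a rough guide for short oligonucleotides and ignores salt, primer concentration and nearest-neighbour effects. The function name is hypothetical.

```python
def tm_wallace(primer: str) -> float:
    """Wallace rule: Tm = 2*(A+T) + 4*(G+C), in degrees Celsius.

    A crude estimate for short oligonucleotides only; it ignores salt,
    primer concentration and nearest-neighbour stacking effects.
    """
    s = primer.upper()
    at = s.count("A") + s.count("T")
    gc = s.count("G") + s.count("C")
    return 2.0 * at + 4.0 * gc


print(tm_wallace("GATCGATGGGCC"))
```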
  14. Typical bioinformatics problems and Biopython (2/3)
      ● Problem: Introducing mutations with restrictions. Tool: Restriction and Data.CodonTable
      ● Problem: Extracting information from an alignment. Tool: Clustalw.MultipleAlignCL
      ● Problem: Getting a substitution matrix from an alignment. Tool: Align.AlignInfo and SubsMat
      ● Problem: Parsing structural data. Tool: PDB.PDBParser
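Align.AlignInfo and SubsMat derive substitution matrices from an alignment. The first step of that kind of analysis is counting which residues appear aligned with each other. Here is a plain-Python sketch of just that counting step (not the Biopython code), assuming an already aligned list of equal-length strings with "-" as the gap character; turning the counts into a log-odds matrix would additionally need expected frequencies, which are not shown. The function name is hypothetical.

```python
from collections import Counter
from itertools import combinations


def replacement_counts(aligned, gap="-"):
    """Count unordered residue pairs observed in the same alignment column.

    Every pair of sequences is compared column by column; columns where
    either sequence has a gap are skipped. Keys are sorted residue pairs,
    so ('A', 'G') and ('G', 'A') are counted together.
    """
    if len({len(s) for s in aligned}) > 1:
        raise ValueError("all aligned sequences must have the same length")
    counts = Counter()
    for s1, s2 in combinations(aligned, 2):
        for a, b in zip(s1.upper(), s2.upper()):
            if a != gap and b != gap:
                counts[tuple(sorted((a, b)))] += 1
    return counts


alignment = ["GATCGA", "GATTGA", "GA-CGG"]
for pair, n in sorted(replacement_counts(alignment).items()):
    print(pair, n)
```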
  15. Typical bioinformatics problems and Biopython (3/3)
      ● Problem: Calculating linkage disequilibrium. Tool: PopGen.GenePop
      ● Problem: Running SIMCOAL2. Tool: PopGen.SimCoal
      ● Problem: Data persistence (in a relational database). Tool: BioSQL
      ● Problem: Retrieving data from Entrez. Tool: Entrez.efetch
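Biopython's PopGen.GenePop module works with GenePop data. To make the quantity itself concrete, here is a standalone sketch of the textbook two-locus measure D = p_AB - p_A p_B and its normalized form r² = D² / (p_A(1 - p_A) p_B(1 - p_B)). It is not a wrapper around GenePop, and it assumes phased haplotype counts for two biallelic loci (alleles A/a and B/b); the function name is hypothetical.

```python
def linkage_disequilibrium(n_AB: int, n_Ab: int, n_aB: int, n_ab: int):
    """Return (D, r_squared) for two biallelic loci from haplotype counts.

    D = p_AB - p_A * p_B
    r^2 = D^2 / (p_A (1 - p_A) p_B (1 - p_B)), reported as 0.0 if a locus is monomorphic.
    """
    total = n_AB + n_Ab + n_aB + n_ab
    if total == 0:
        raise ValueError("no haplotypes counted")
    p_AB = n_AB / total
    p_A = (n_AB + n_Ab) / total  # frequency of allele A at the first locus
    p_B = (n_AB + n_aB) / total  # frequency of allele B at the second locus
    d = p_AB - p_A * p_B
    denom = p_A * (1 - p_A) * p_B * (1 - p_B)
    r_squared = d * d / denom if denom else 0.0
    return d, r_squared


print(linkage_disequilibrium(40, 10, 10, 40))
```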
  16. Outlook for Biopython
      Current version: 1.53 (December 2009)
      Planned for 1.54:
      ● Updated multiple sequence alignment object
      ● Bio.Phylo module
      ● Bio.SeqIO support for Standard Flowgram Format (SFF) files
      Next:
      ● Extending Bio.PDB (GSoC grant)
      ● Python 3 support
  17. Additional Resources
      Biopython website:
      Documentation:
      Cock PJ, et al. (2009) Biopython: freely available Python tools for computational molecular biology and bioinformatics. Bioinformatics 25(11): 1422-3. doi:10.1093/bioinformatics/btp163 PMID: 19304878
      Bassi S (2007) A Primer on Python for Life Science Researchers. PLoS Comput Biol 3(11): e199. doi:10.1371/journal.pcbi.0030199
      Book: “Python for Bioinformatics” and
      Mailing list: Users: Developers:
      Python in Argentina:
  18. Thank you!