Biopython

Loading...

Flash Player 9 (or above) is needed to view presentations.
We have detected that you do not have it on your computer. To install it, go here.

0 comments

Post a comment

    Post a comment
    Embed Video
    Edit your comment Cancel

    Favorites, Groups & Events

    Biopython - Presentation Transcript

    1. The 8th annual Bioinformatics Open Source Conference (BOSC 2007) 18 th July, Vienna, Austria Biopython Project Update Peter Cock, MOAC Doctoral Training Centre, University of Warwick, UK
    2. Talk Outline
      • What is python?
      • What is Biopython?
      • Short history
      • Project organisation
      • What can you do with it?
      • How can you contribute?
      • Acknowledgements
      The 8 th annual Bioinformatics Open Source Conference Biopython Project Update @ BOSC 2007, Vienna, Austria
    3. What is Python?
      • High level programming language
      • Object orientated
      • Open Source, free ($$$)
      • Cross platform: Linux, Windows, Mac OS X, …
      • Extensible in C, C++, …
      The 8 th annual Bioinformatics Open Source Conference Biopython Project Update @ BOSC 2007, Vienna, Austria
    4. What is Biopython?
      • Set of libraries for computational biology
      • Open Source, free ($$$)
      • Cross platform: Linux, Windows, Mac OS X, …
      • Sibling project to BioPerl, BioRuby, BioJava, …
      The 8 th annual Bioinformatics Open Source Conference Biopython Project Update @ BOSC 2007, Vienna, Austria
    5. Popularity by Google Hits
      • Python
      • Perl
      • Ruby
      • Java
      98 million 101 million 101 million 289 million
      • Biopython
      • BioPerl
      • BioRuby
      • BioJava
      252,000 610,000 122,000 185,000
      • Both Perl and Python are strong at text
      • Python may have the edge for numerical work (with the Numerical python libraries)
      The 8 th annual Bioinformatics Open Source Conference Biopython Project Update @ BOSC 2007, Vienna, Austria BioPerl 610,000
    6. Biopython history
      • 1999 : Started by Jeff Chang & Andrew Dalke
      • 2000 : Biopython 0.90, first release
      • 2001 : Biopython 1.00, “semi-complete”
      • 2002 : Biopython 1.10, “semi-stable”
      • 2003 : Biopython 1.20, 1.21, 1.22 and 1.23
      • 2004 : Biopython 1.24 and 1.30
      • 2005 : Biopython 1.40 and 1.41
      • 2006 : Biopython 1.42
      • 2007 : Biopython 1.43
      The 8 th annual Bioinformatics Open Source Conference Biopython Project Update @ BOSC 2007, Vienna, Austria
    7. Biopython Project Organisation
      • Releases:
        • No fixed schedule
        • Currently once or twice a year
        • Work from a stable CVS base
      • Bugs:
        • Online bugzilla
        • Some small changes handled on mailing list
      • Tests:
        • Many based on unittest python library
        • Also simple scripts where output is verified
      The 8 th annual Bioinformatics Open Source Conference Biopython Project Update @ BOSC 2007, Vienna, Austria
    8. What can you do with Biopython?
      • Read, write & manipulate sequences
      • Restriction enzymes
      • BLAST (local and online)
      • Web databases (e.g. NCBI’s EUtils)
      • Call command line tools (e.g. clustalw)
      • Clustering (Bio.Cluster)
      • Phylogenetics (Bio.Nexus)
      • Protein Structures (Bio.PDB)
      The 8 th annual Bioinformatics Open Source Conference Biopython Project Update @ BOSC 2007, Vienna, Austria
    9. Manipulating Sequences
      • Use Biopython’s Seq object, holds:
        • Sequence data (string like)
        • Alphabet (can include list of letters)
      • Alphabet allows type checking, preventing errors like appending DNA to Protein
      The 8 th annual Bioinformatics Open Source Conference Biopython Project Update @ BOSC 2007, Vienna, Austria
    10. Manipulating Sequences from Bio.Seq import Seq from Bio.Alphabet.IUPAC import unambiguous_dna my_dna=Seq('CTAAACATCCTTCAT', unambiguous_dna) print 'Original:' print my_dna print 'Reverse complement:' print my_dna.reverse_complement() Original: Seq('CTAAACATCCTTCAT', IUPACUnambiguousDNA()) Reverse complement: Seq('ATGAAGGATGTTTAG', IUPACUnambiguousDNA()) The 8 th annual Bioinformatics Open Source Conference Biopython Project Update @ BOSC 2007, Vienna, Austria
    11. Translating Sequences from Bio import Translate bact_trans=Translate.unambiguous_dna_by_id[11] print 'Forward translation' print bact_trans.translate(my_dna) print 'Reverse complement translation' print bact_trans.translate( my_dna.reverse_complement()) Forward translation Seq('LNILH', HasStopCodon(IUPACProtein(), '*')) Reverse complement translation Seq('MKDV*', HasStopCodon(IUPACProtein(), '*')) The 8 th annual Bioinformatics Open Source Conference Biopython Project Update @ BOSC 2007, Vienna, Austria
    12. Sequence Input/Output
      • Bio.SeqIO is new in Biopython 1.43
      • Inspired by BioPerl’s SeqIO
      • Works with SeqRecord objects (not format specific representations)
      • Builds on existing Biopython parsers
      The 8 th annual Bioinformatics Open Source Conference Biopython Project Update @ BOSC 2007, Vienna, Austria
    13. SeqIO – Sequence Input from Bio import SeqIO handle = open('ls_orchid.fasta') format = 'fasta' for rec in SeqIO.parse(handle, format) : print "%s, len %i" % (rec.id, len(rec.seq)) print rec.seq[:40].tostring() + "..." handle.close() gi|2765658|emb|Z78533.1|CIZ78533, len 740 CGTAACAAGGTTTCCGTAGGTGAACCTGCGGAAGGATCAT... gi|2765657|emb|Z78532.1|CCZ78532, len 753 CGTAACAAGGTTTCCGTAGGTGAACCTGCGGAAGGATCAT... ... The 8 th annual Bioinformatics Open Source Conference Biopython Project Update @ BOSC 2007, Vienna, Austria
    14. SeqIO – Sequence Input from Bio import SeqIO handle = open('ls_orchid.gbk') format = 'genbank' for rec in SeqIO.parse(handle, format) : print "%s, len %i" % (rec.id, len(rec.seq)) print rec.seq[:40].tostring() + "..." handle.close() The 8 th annual Bioinformatics Open Source Conference Biopython Project Update @ BOSC 2007, Vienna, Austria Z78533.1, len 740 CGTAACAAGGTTTCCGTAGGTGAACCTGCGGAAGGATCAT... Z78532.1, len 753 CGTAACAAGGTTTCCGTAGGTGAACCTGCGGAAGGATCAT... ...
    15. SeqIO – Extracting Data from Bio import SeqIO handle = open('ls_orchid.gbk') format = 'genbank' from sets import Set print Set([rec.annotations['organism'] for rec in SeqIO.parse(handle, format)]) handle.close() Set(['Cypripedium acaule', 'Paphiopedilum primulinum', 'Phragmipedium lindenii', 'Paphiopedilum papuanum', 'Paphiopedilum stonei', 'Paphiopedilum urbanianum', 'Paphiopedilum dianthum', ...]) The 8 th annual Bioinformatics Open Source Conference Biopython Project Update @ BOSC 2007, Vienna, Austria
    16. SeqIO – Filtering Output i_handle = open('ls_orchid.gbk') o_handle = open('small_orchid.faa', 'w') SeqIO.write([rec for rec in SeqIO.parse(i_handle, 'genbank') if len(rec.seq) < 600], o_handle, 'fasta') i_handle.close() o_handle.close() >Z78481.1 P.insigne 5.8S rRNA gene and ITS1 ... CGTAACAAGGTTTCCGTAGGTGAACCTGCGGAAGGATCATTGTT... >Z78480.1 P.gratrixianum 5.8S rRNA gene and ... CGTAACAAGGTTTCCGTAGGTGAACCTGCGGAAGGATCATTGTT... ... The 8 th annual Bioinformatics Open Source Conference Biopython Project Update @ BOSC 2007, Vienna, Austria
    17. 3D Structures
      • Bio.Nexus was added in Biopython 1.30 by Frank Kauff and Cymon Cox
      • Reads Nexus alignments and trees
      • Also parses Newick format trees
      The 8 th annual Bioinformatics Open Source Conference Biopython Project Update @ BOSC 2007, Vienna, Austria
    18. Newick Tree Parsing (Bovine:0.69395,(Gibbon:0.36079,(Orang:0.33636,(Gorilla:0.17147,(Chimp:0.19268, Human:0.11927):0.08386):0.06124):0.15057):0.54939,Mouse:1.21460):0.10; from Bio.Nexus.Trees import Tree tree_str = open(&quot;simple.tree&quot;).read() tree_obj = Tree(tree_str) print tree_obj tree a_tree = (Bovine,(Gibbon,(Orang,(Gorilla, (Chimp,Human)))),Mouse); The 8 th annual Bioinformatics Open Source Conference Biopython Project Update @ BOSC 2007, Vienna, Austria
    19. Newick Tree Parsing # taxon prev succ brlen blen (sum) support 0 - None [1,2,11] 0.0 0.0 - 1 Bovine 0 [] 0.69395 0.69395 - 2 - 0 [3,4] 0.54939 0.54939 - 3 Gibbon 2 [] 0.36079 0.91018 - 4 - 2 [5,6] 0.15057 0.69996 - 5 Orang 4 [] 0.33636 1.03632 - 6 - 4 [7,8] 0.06124 0.7612 - 7 Gorilla 6 [] 0.17147 0.93267 - 8 - 6 [9,10] 0.08386 0.84506 - 9 Chimp 8 [] 0.19268 1.03774 - 10 Human 8 [] 0.11927 0.96433 - 11 Mouse 0 [] 1.2146 1.2146 - Root: 0 tree_obj.display() The 8 th annual Bioinformatics Open Source Conference Biopython Project Update @ BOSC 2007, Vienna, Austria
    20. 3D Structures
      • Bio.PDB was added in Biopython 1.24 by Thomas Hamelryck
      • Reads PDB and CIF format files
      The 8 th annual Bioinformatics Open Source Conference Biopython Project Update @ BOSC 2007, Vienna, Austria
    21. Working with 3D Structures http://www.warwick.ac.uk/go/peter_cock/ python/protein_superposition/ This example (online) uses Bio.PDB to align 21 alternative X-Ray crystal structures for PDB structure 1JOY. Before: After: The 8 th annual Bioinformatics Open Source Conference Biopython Project Update @ BOSC 2007, Vienna, Austria
    22. Population Genetics (planned)
      • Tiago Antão (with Ralph Haygood) plans to start a Population Genetics module
      • See also:
      • PyPop: Python for Population Genetics Alex Lancaster et al. (2003) www.pypop.org
      The 8 th annual Bioinformatics Open Source Conference Biopython Project Update @ BOSC 2007, Vienna, Austria
    23. Areas for Improvement
      • Documentation!
      • I’m interested in sequences & alignments:
        • Seq objects – more like strings?
        • Alignment objects – more like arrays?
        • SeqIO – support for more formats
        • AlignIO? – alignment equivalent to SeqIO
      • Move from Numeric to NumPy
      • Move from CVS to SVN?
      The 8 th annual Bioinformatics Open Source Conference Biopython Project Update @ BOSC 2007, Vienna, Austria
    24. How can you Contribute?
      • Users:
        • Discussions on the mailing list
        • Report bugs
        • Documentation improvement
      • Coders:
        • Suggest bug fixes
        • New/extended test cases
        • Adopt modules with no current ‘owner’
        • New modules
      The 8 th annual Bioinformatics Open Source Conference Biopython Project Update @ BOSC 2007, Vienna, Austria
    25. Biopython Acknowledgements
      • Open Bioinformatics Foundation or O|B|F for web hosting, CVS servers, mailing list
      • Biopython developers, including: Jeff Chang, Andrew Dalke, Brad Chapman, Iddo Friedberg , Michiel de Hoon, Frank Kauff, Cymon Cox, Thomas Hamelryck, me
      • Contributors who report bugs & join in the mailing list discussions
      The 8 th annual Bioinformatics Open Source Conference Biopython Project Update @ BOSC 2007, Vienna, Austria
    26. Personal Acknowledgements
      • Everyone for listening
      • Open Bioinformatics Foundation or O|B|F for the BOSC 2007 invitation
      • Iddo Friedberg & Michiel de Hoon for their encouragement
      • The EPSRC for my PhD funding via the MOAC Doctoral Training Centre, Warwick http://www.warwick.ac.uk/go/moac
      The 8 th annual Bioinformatics Open Source Conference Biopython Project Update @ BOSC 2007, Vienna, Austria
    27. Questions? The 8 th annual Bioinformatics Open Source Conference Biopython Project Update @ BOSC 2007, Vienna, Austria

    + boscbosc, 3 years ago

    custom

    1697 views, 0 favs, 7 embeds more stats

    Title: Biopython Update
    Author: Peter Cock

    More info about this document

    © All Rights Reserved

    Go to text version

    • Total Views 1697
      • 1613 on SlideShare
      • 84 from embeds
    • Comments 0
    • Favorites 0
    • Downloads 31
    Most viewed embeds
    • 44 views on http://www.bioinformatica.info
    • 21 views on http://biobits.wordpress.com
    • 15 views on http://bioinformatica.info
    • 1 views on http://livebackgammon.info
    • 1 views on http://www.geoffblake.com

    more

    All embeds
    • 44 views on http://www.bioinformatica.info
    • 21 views on http://biobits.wordpress.com
    • 15 views on http://bioinformatica.info
    • 1 views on http://livebackgammon.info
    • 1 views on http://www.geoffblake.com
    • 1 views on http://www.mvyx.com
    • 1 views on http://www.o53.net

    less

    Flagged as inappropriate Flag as inappropriate
    Flag as inappropriate

    Select your reason for flagging this presentation as inappropriate. If needed, use the feedback form to let us know more details.

    Cancel
    File a copyright complaint
    Having problems? Go to our helpdesk?

    Categories

    Tags