Biopython Project Update
     Peter Cock, Plant Pathology, SCRI, Dundee, UK
10th Annual Bioinformatics Open Source Confere...
Contents

•  Brief introduction to Biopython & history
•  Releases since BOSC 2008
•  Current and future projects
•  CVS, ...
Biopython

•  Free, open source library for bioinformatics
•  Supported by Open Bioinformatics Foundation
•  Runs on Windo...
Biopython’s Ten Year History

1999 •  Started by Jeff Chang & Andrew Dalke
2000 •  First release, Biopython 0.90
2001 •  B...
Biopython Publication – Cock et al. 2009




          N.B. Open Access!
November 2008 – Biopython 1.49

•  Support for Python 2.6
•  Switched from “Numeric” to “NumPy”
   (important Numerical li...
April 2009 – Biopython 1.50

•  New Bio.Motif module for sequence motifs
   (to replace Bio.AliceAce and Bio.MEME)
•  Supp...
Biopython 1.50 includes GenomeDiagram
                      De novo assembly
                      of 42kb phage from
    ...
Reading a FASTA file with Bio.SeqIO
>FL3BO7415JACDX	
TTAATTTTATTTTGTCGGCTAAAGAGATTTTTAGCTAAACGTTCAATTGCTTTAGCTGAA	
GTACGAG...
Reading a FASTQ file with Bio.SeqIO
@FL3BO7415JACDX	
TTAATTTTATTTTGTCGGCTAAAGAGATTTTTAGCTAAACGTTCAATTGCTTTAGCTGAAGTACGAGCA...
June 2009 – Biopython 1.51 beta

•  Support for Illumina 1.3+ FASTQ files
   (in addition to Sanger FASTQ and older
   Sol...
Google Summer of Code Projects

•  Eric Talevich - Parsing and writing phyloXML

  •  Mentors Brad Chapman & Christian Zma...
Other Notable Active Projects

•  Brad Chapman – GFF parsing
•  Tiago Antão – Population genetics statistics
•  Peter Cock...
Distributed Development

•  Currently work from a stable branch in CVS
•  CVS master is mirrored to github.com
•  Several ...
Acknowledgements
•  Other Biopython contributors & developers!
•  Open Bioinformatics Foundation (OBF)
   supports Biopyth...
What next?

•  This afternoon’s “Birds of a Feather” session:
         Biopython Tutorial and/or Hackathon

•  Sign up to ...
Upcoming SlideShare
Loading in...5
×

Cock Biopython Bosc2009

3,052

Published on

Published in: Technology, Education
1 Comment
0 Likes
Statistics
Notes
  • See http://biopython.org/wiki/Documentation#Presentations which has links to this and other Biopython presentations as PDF files.
       Reply 
    Are you sure you want to  Yes  No
    Your message goes here
  • Be the first to like this

No Downloads
Views
Total Views
3,052
On Slideshare
0
From Embeds
0
Number of Embeds
0
Actions
Shares
0
Downloads
26
Comments
1
Likes
0
Embeds 0
No embeds

No notes for slide

Cock Biopython Bosc2009

  1. 1. Biopython Project Update Peter Cock, Plant Pathology, SCRI, Dundee, UK 10th Annual Bioinformatics Open Source Conference (BOSC) Stockholm, Sweden, 28 June 2009
  2. 2. Contents •  Brief introduction to Biopython & history •  Releases since BOSC 2008 •  Current and future projects •  CVS, git and github •  BoF hackathon and tutorial at BOSC 2009
  3. 3. Biopython •  Free, open source library for bioinformatics •  Supported by Open Bioinformatics Foundation •  Runs on Windows, Linux, Mac OS X, etc •  International team of volunteer developers •  Currently about three releases per year •  Extensive “Biopython Tutorial & Cookbook” •  See www.biopython.org for details
  4. 4. Biopython’s Ten Year History 1999 •  Started by Jeff Chang & Andrew Dalke 2000 •  First release, Biopython 0.90 2001 •  Biopython 1.00, “semi-complete” … •  Biopython 1.10, …, 1.41 2007 •  Biopython 1.43 (Bio.SeqIO), 1.44 2008 •  Biopython 1.45, 1.47, 1.48, 1.49 2009 •  Biopython 1.50, 1.51beta •  OA Publication, Cock et al.
  5. 5. Biopython Publication – Cock et al. 2009 N.B. Open Access!
  6. 6. November 2008 – Biopython 1.49 •  Support for Python 2.6 •  Switched from “Numeric” to “NumPy” (important Numerical library for Python) •  More biological methods on core Seq object
  7. 7. April 2009 – Biopython 1.50 •  New Bio.Motif module for sequence motifs (to replace Bio.AliceAce and Bio.MEME) •  Support for QUAL and FASTQ in Bio.SeqIO (important NextGen sequencing formats) •  Integration of GenomeDiagram for figures (Pritchard et al. 2006)
  8. 8. Biopython 1.50 includes GenomeDiagram De novo assembly of 42kb phage from Roche 454 data “Feature Track” showing ORFs Scale tick marks “Barchart Track” of read depth (~100, scale max 200)
  9. 9. Reading a FASTA file with Bio.SeqIO >FL3BO7415JACDX TTAATTTTATTTTGTCGGCTAAAGAGATTTTTAGCTAAACGTTCAATTGCTTTAGCTGAA GTACGAGCAGATACTCCAATCGCAATTGTTTCTTCATTTAAAATTAGCTCGTCGCCACCT TCAATTGGAAATTTATAATCACGATCTAACCAGATTGGTACATTATGTTTTGCAAATCTT GGATGATATTTAATGATGTACTCCATGAATAATGATTCACGTCTACGCGCTGGTTCTCTC ATCTTATTTATCGTTAAGCCA >FL3BO7415I7AFR ... from Bio import SeqIO for rec in SeqIO.parse(open("phage.fasta"), "fasta") : print rec.id, len(rec.seq), rec.seq[:10]+"..." FL3BO7415JACDX 261 TTAATTTTAT... FL3BO7415I7AFR FL3BO7415JCAY5 267 136 CATTAACTAA... TTTCTTTTCT... Focus on the FL3BO7415JB41R 208 CTCTTTTATG... FL3BO7415I6HKB FL3BO7415I63UC 268 219 GGTATTTGAA... AACATGTGAG... filename and ... format (“fasta”)…
  10. 10. Reading a FASTQ file with Bio.SeqIO @FL3BO7415JACDX TTAATTTTATTTTGTCGGCTAAAGAGATTTTTAGCTAAACGTTCAATTGCTTTAGCTGAAGTACGAGCAGATACTCCAATCGCAATTGTTTCTTC ATTTAAAATTAGCTCGTCGCCACCTTCAATTGGAAATTTATAATCACGATCTAACCAGATTGGTACATTATGTTTTGCAAATCTTGGATGATATT TAATGATGTACTCCATGAATAATGATTCACGTCTACGCGCTGGTTCTCTCATCTTATTTATCGTTAAGCCA + BBBB2262=1111FFGGGHHHHIIIIIIIIIIIIIIIIIIIIIIIFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFGGGFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFGGGGFFFFFFFFFFFFFFFFFGB BBCFFFFFFFFFFFFFFFFFFFFFFFGGGGGGGIIIIIIIGGGIIIGGGIIGGGG@AAAAA?===@@@??? @FL3BO7415I7AFR ... from Bio import SeqIO for rec in SeqIO.parse(open("phage.fastq"), "fastq") : print rec.id, len(rec.seq), rec.seq[:10]+"..." print rec.letter_annotations["phred_quality"][:10], "..." FL3BO7415JACDX 261 TTAATTTTAT... [33, 33, 33, 33, 17, 17, 21, 17, 28, FL3BO7415I7AFR 267 CATTAACTAA... 16] ... Just filename and [37, 37, 37, 37, 37, 37, 37, 37, 38, 38] ... FL3BO7415JCAY5 136 TTTCTTTTCT... [37, 37, 36, 36, 29, 29, 29, 29, 36, 37] ... format changed FL3BO7415JB41R 208 CTCTTTTATG... [37, 37, 37, 38, 38, 38, 38, 38, 37, 37] ... (“fasta” to “fastq”) FL3BO7415I6HKB 268 GGTATTTGAA... [37, 37, 37, 37, 34, 34, 34, 37, 37, 37] ... FL3BO7415I63UC 219 AACATGTGAG... [37, 37, 37, 37, 37, 37, 37, 37, 37, 37] ... ...
  11. 11. June 2009 – Biopython 1.51 beta •  Support for Illumina 1.3+ FASTQ files (in addition to Sanger FASTQ and older Solexa/Illumina FASTQ files) •  Faster parsing of UniProt/SwissProt files •  Bio.SeqIO now writes feature table in GenBank output Already being used at SCRI for genome annotation, e.g. with RAST and Artemis
  12. 12. Google Summer of Code Projects •  Eric Talevich - Parsing and writing phyloXML •  Mentors Brad Chapman & Christian Zmasek •  Nick Matzke - Biogeographical Phylogenetics •  Mentors Stephen Smith, Brad Chapman & David Kidd •  Hosted by NESCent Phyloinformatics Group •  Code development on github branches...
  13. 13. Other Notable Active Projects •  Brad Chapman – GFF parsing •  Tiago Antão – Population genetics statistics •  Peter Cock – Parsing Roche 454 SFF files (with Jose Blanca, co-author of sff_extract) •  Plus other ongoing refinements and documentation improvements
  14. 14. Distributed Development •  Currently work from a stable branch in CVS •  CVS master is mirrored to github.com •  Several sub-projects are being developed on github branches (in public) •  This is letting us get familiar with git & github •  Suggested plan is to switch from CVS to git summer 2009 (still hosted by OBF), continue to push to github for public collaboration
  15. 15. Acknowledgements •  Other Biopython contributors & developers! •  Open Bioinformatics Foundation (OBF) supports Biopython (and BioPerl etc) •  Society for General Microbiology (SGM) for my travel costs •  My Biopython work supported by: •  EPSRC funded PhD (MOAC DTC, University of Warwick, UK) •  SCRI (Scottish Crop Research Institute), who also paid my conference fees
  16. 16. What next? •  This afternoon’s “Birds of a Feather” session: Biopython Tutorial and/or Hackathon •  Sign up to our mailing list? •  Homepage www.biopython.org
  1. A particular slide catching your eye?

    Clipping is a handy way to collect important slides you want to go back to later.

×