Cock Biopython Bosc2009

bosc
Biopython Project Update
     Peter Cock, Plant Pathology, SCRI, Dundee, UK
10th Annual Bioinformatics Open Source Conference (BOSC)
           Stockholm, Sweden, 28 June 2009
Contents

•  Brief introduction to Biopython & history
•  Releases since BOSC 2008
•  Current and future projects
•  CVS, git and github
•  BoF hackathon and tutorial at BOSC 2009
Biopython

•  Free, open source library for bioinformatics
•  Supported by Open Bioinformatics Foundation
•  Runs on Windows, Linux, Mac OS X, etc
•  International team of volunteer developers
•  Currently about three releases per year
•  Extensive “Biopython Tutorial & Cookbook”
•  See www.biopython.org for details
Biopython’s Ten Year History

1999 •  Started by Jeff Chang & Andrew Dalke
2000 •  First release, Biopython 0.90
2001 •  Biopython 1.00, “semi-complete”
 …    •  Biopython 1.10, …, 1.41
2007 •  Biopython 1.43 (Bio.SeqIO), 1.44
2008 •  Biopython 1.45, 1.47, 1.48, 1.49
2009 •  Biopython 1.50, 1.51beta
      •  OA Publication, Cock et al.
Biopython Publication – Cock et al. 2009




          N.B. Open Access!
November 2008 – Biopython 1.49

•  Support for Python 2.6
•  Switched from “Numeric” to “NumPy”
   (important Numerical library for Python)
•  More biological methods on core Seq object
April 2009 – Biopython 1.50

•  New Bio.Motif module for sequence motifs
   (to replace Bio.AliceAce and Bio.MEME)
•  Support for QUAL and FASTQ in Bio.SeqIO
   (important NextGen sequencing formats)
•  Integration of GenomeDiagram for figures
   (Pritchard et al. 2006)
Biopython 1.50 includes GenomeDiagram
                      De novo assembly
                      of 42kb phage from
                      Roche 454 data
                         “Feature Track”
                         showing ORFs

                         Scale tick marks

                         “Barchart Track” of
                         read depth (~100,
                         scale max 200)
Reading a FASTA file with Bio.SeqIO
>FL3BO7415JACDX	
TTAATTTTATTTTGTCGGCTAAAGAGATTTTTAGCTAAACGTTCAATTGCTTTAGCTGAA	
GTACGAGCAGATACTCCAATCGCAATTGTTTCTTCATTTAAAATTAGCTCGTCGCCACCT	
TCAATTGGAAATTTATAATCACGATCTAACCAGATTGGTACATTATGTTTTGCAAATCTT	
GGATGATATTTAATGATGTACTCCATGAATAATGATTCACGTCTACGCGCTGGTTCTCTC	
ATCTTATTTATCGTTAAGCCA	
>FL3BO7415I7AFR	
...	



from Bio import SeqIO	
for rec in SeqIO.parse(open("phage.fasta"), "fasta") :	
    print rec.id, len(rec.seq), rec.seq[:10]+"..."	


FL3BO7415JACDX   261   TTAATTTTAT...	
FL3BO7415I7AFR
FL3BO7415JCAY5
                 267
                 136
                       CATTAACTAA...	
                       TTTCTTTTCT...	                           Focus on the
FL3BO7415JB41R   208   CTCTTTTATG...	
FL3BO7415I6HKB
FL3BO7415I63UC
                 268
                 219
                       GGTATTTGAA...	
                       AACATGTGAG...	
                                                                filename and
...	
                                                                format (“fasta”)…
Reading a FASTQ file with Bio.SeqIO
@FL3BO7415JACDX	
TTAATTTTATTTTGTCGGCTAAAGAGATTTTTAGCTAAACGTTCAATTGCTTTAGCTGAAGTACGAGCAGATACTCCAATCGCAATTGTTTCTTC
ATTTAAAATTAGCTCGTCGCCACCTTCAATTGGAAATTTATAATCACGATCTAACCAGATTGGTACATTATGTTTTGCAAATCTTGGATGATATT
TAATGATGTACTCCATGAATAATGATTCACGTCTACGCGCTGGTTCTCTCATCTTATTTATCGTTAAGCCA	
+	
BBBB2262=1111FFGGGHHHHIIIIIIIIIIIIIIIIIIIIIIIFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFGGGFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFGGGGFFFFFFFFFFFFFFFFFGB
BBCFFFFFFFFFFFFFFFFFFFFFFFGGGGGGGIIIIIIIGGGIIIGGGIIGGGG@AAAAA?===@@@???	
@FL3BO7415I7AFR	
...	


from Bio import SeqIO	
for rec in SeqIO.parse(open("phage.fastq"), "fastq") :	
    print rec.id, len(rec.seq), rec.seq[:10]+"..."	
    print rec.letter_annotations["phred_quality"][:10], "..."	

FL3BO7415JACDX 261 TTAATTTTAT...	
[33, 33, 33, 33, 17, 17, 21, 17, 28,
FL3BO7415I7AFR 267 CATTAACTAA...	
                                       16] ...	
                                                                   Just filename and
[37, 37, 37, 37, 37, 37, 37, 37, 38,   38] ...	
FL3BO7415JCAY5 136 TTTCTTTTCT...	
[37, 37, 36, 36, 29, 29, 29, 29, 36,   37] ...	
                                                                   format changed
FL3BO7415JB41R 208 CTCTTTTATG...	
[37, 37, 37, 38, 38, 38, 38, 38, 37,   37] ...	                    (“fasta” to “fastq”)
FL3BO7415I6HKB 268 GGTATTTGAA...	
[37, 37, 37, 37, 34, 34, 34, 37, 37,   37] ...	
FL3BO7415I63UC 219 AACATGTGAG...	
[37, 37, 37, 37, 37, 37, 37, 37, 37,   37] ...	
...
June 2009 – Biopython 1.51 beta

•  Support for Illumina 1.3+ FASTQ files
   (in addition to Sanger FASTQ and older
   Solexa/Illumina FASTQ files)
•  Faster parsing of UniProt/SwissProt files
•  Bio.SeqIO now writes feature table in
   GenBank output
                         Already being used at SCRI
                         for genome annotation, e.g.
                         with RAST and Artemis
Google Summer of Code Projects

•  Eric Talevich - Parsing and writing phyloXML

  •  Mentors Brad Chapman & Christian Zmasek

•  Nick Matzke - Biogeographical Phylogenetics

  •  Mentors Stephen Smith, Brad Chapman & David Kidd

•  Hosted by NESCent Phyloinformatics Group

•  Code development on github branches...
Other Notable Active Projects

•  Brad Chapman – GFF parsing
•  Tiago Antão – Population genetics statistics
•  Peter Cock – Parsing Roche 454 SFF files
   (with Jose Blanca, co-author of sff_extract)
•  Plus other ongoing refinements and
   documentation improvements
Distributed Development

•  Currently work from a stable branch in CVS
•  CVS master is mirrored to github.com
•  Several sub-projects are being developed on
   github branches (in public)
•  This is letting us get familiar with git & github
•  Suggested plan is to switch from CVS to git
   summer 2009 (still hosted by OBF), continue
   to push to github for public collaboration
Acknowledgements
•  Other Biopython contributors & developers!
•  Open Bioinformatics Foundation (OBF)
   supports Biopython (and BioPerl etc)
•  Society for General Microbiology
   (SGM) for my travel costs
•  My Biopython work supported by:
  •  EPSRC funded PhD
     (MOAC DTC, University of Warwick, UK)
  •  SCRI (Scottish Crop Research Institute),
     who also paid my conference fees
What next?

•  This afternoon’s “Birds of a Feather” session:
         Biopython Tutorial and/or Hackathon

•  Sign up to our mailing list?




•  Homepage www.biopython.org
1 of 16

Recommended

Antao Biopython Bosc2008 by
Antao Biopython Bosc2008Antao Biopython Bosc2008
Antao Biopython Bosc2008bosc_2008
1.3K views18 slides
E Talevich - Biopython project-update by
E Talevich - Biopython project-updateE Talevich - Biopython project-update
E Talevich - Biopython project-updateJan Aerts
1.5K views21 slides
Talk6 biopython bosc2011 by
Talk6 biopython bosc2011Talk6 biopython bosc2011
Talk6 biopython bosc2011Bioinformatics Open Source Conference
1.3K views14 slides
Biopython Project Update 2013 by
Biopython Project Update 2013Biopython Project Update 2013
Biopython Project Update 2013pjacock
3.1K views28 slides
Prins Bio Lib Bosc 2009 by
Prins Bio Lib Bosc 2009Prins Bio Lib Bosc 2009
Prins Bio Lib Bosc 2009bosc
906 views26 slides
Systematic integration of millions of peptidoform evidences into Ensembl and ... by
Systematic integration of millions of peptidoform evidences into Ensembl and ...Systematic integration of millions of peptidoform evidences into Ensembl and ...
Systematic integration of millions of peptidoform evidences into Ensembl and ...Yasset Perez-Riverol
275 views19 slides

More Related Content

Similar to Cock Biopython Bosc2009

BOSC 2008 Biopython by
BOSC 2008 BiopythonBOSC 2008 Biopython
BOSC 2008 Biopythontiago
895 views18 slides
Biopython by
BiopythonBiopython
Biopythonbosc
2.8K views27 slides
2015 bioinformatics bio_python by
2015 bioinformatics bio_python2015 bioinformatics bio_python
2015 bioinformatics bio_pythonProf. Wim Van Criekinge
2.6K views58 slides
Python 2 is dead! Drag your old code into the modern age by
Python 2 is dead! Drag your old code into the modern agePython 2 is dead! Drag your old code into the modern age
Python 2 is dead! Drag your old code into the modern ageBecky Smith
296 views39 slides
2016 bioinformatics i_bio_python_wimvancriekinge by
2016 bioinformatics i_bio_python_wimvancriekinge2016 bioinformatics i_bio_python_wimvancriekinge
2016 bioinformatics i_bio_python_wimvancriekingeProf. Wim Van Criekinge
3K views54 slides
Python Orientation by
Python OrientationPython Orientation
Python OrientationPavan Devarakonda
2.1K views240 slides

Similar to Cock Biopython Bosc2009(20)

BOSC 2008 Biopython by tiago
BOSC 2008 BiopythonBOSC 2008 Biopython
BOSC 2008 Biopython
tiago895 views
Biopython by bosc
BiopythonBiopython
Biopython
bosc2.8K views
Python 2 is dead! Drag your old code into the modern age by Becky Smith
Python 2 is dead! Drag your old code into the modern agePython 2 is dead! Drag your old code into the modern age
Python 2 is dead! Drag your old code into the modern age
Becky Smith296 views
그렇게 커미터가 된다: Python을 통해 오픈소스 생태계 가르치기 by Jeongkyu Shin
그렇게 커미터가 된다: Python을 통해 오픈소스 생태계 가르치기그렇게 커미터가 된다: Python을 통해 오픈소스 생태계 가르치기
그렇게 커미터가 된다: Python을 통해 오픈소스 생태계 가르치기
Jeongkyu Shin1.5K views
Python Evolution by Quintagroup
Python EvolutionPython Evolution
Python Evolution
Quintagroup1.9K views
BioRuby -- Bioinformatics Library by ngotogenome
BioRuby -- Bioinformatics LibraryBioRuby -- Bioinformatics Library
BioRuby -- Bioinformatics Library
ngotogenome2.7K views
Teaching with JupyterHub - lessons learned by Martin Christen
Teaching with JupyterHub - lessons learnedTeaching with JupyterHub - lessons learned
Teaching with JupyterHub - lessons learned
Martin Christen669 views
Bonnal bosc2010 bio_ruby by BOSC 2010
Bonnal bosc2010 bio_rubyBonnal bosc2010 bio_ruby
Bonnal bosc2010 bio_ruby
BOSC 2010806 views
Package a PyApp as a Flatpak Package: An HTTP Server for Example @ PyCon APAC... by Jian-Hong Pan
Package a PyApp as a Flatpak Package: An HTTP Server for Example @ PyCon APAC...Package a PyApp as a Flatpak Package: An HTTP Server for Example @ PyCon APAC...
Package a PyApp as a Flatpak Package: An HTTP Server for Example @ PyCon APAC...
Jian-Hong Pan86 views
PyCon Taiwan 2013 Tutorial by Justin Lin
PyCon Taiwan 2013 TutorialPyCon Taiwan 2013 Tutorial
PyCon Taiwan 2013 Tutorial
Justin Lin290.9K views
Pharo 7.0 and 8.0 alpha by Pharo
Pharo 7.0 and 8.0 alphaPharo 7.0 and 8.0 alpha
Pharo 7.0 and 8.0 alpha
Pharo768 views
Introduction Apache Kafka by Joe Stein
Introduction Apache KafkaIntroduction Apache Kafka
Introduction Apache Kafka
Joe Stein5.1K views
fastp: the FASTQ pre-processor by Hoffman Lab
fastp: the FASTQ pre-processorfastp: the FASTQ pre-processor
fastp: the FASTQ pre-processor
Hoffman Lab538 views
New Features of Python 3.10 by Gabor Guta
New Features of Python 3.10New Features of Python 3.10
New Features of Python 3.10
Gabor Guta92 views
Git 101, or, how to sanely manage your Koha customizations by Ian Walls
Git 101, or, how to sanely manage your Koha customizationsGit 101, or, how to sanely manage your Koha customizations
Git 101, or, how to sanely manage your Koha customizations
Ian Walls7.4K views

More from bosc

Swertz Molgenis Bosc2009 by
Swertz Molgenis Bosc2009Swertz Molgenis Bosc2009
Swertz Molgenis Bosc2009bosc
1K views34 slides
Bosc Intro 20090627 by
Bosc Intro 20090627Bosc Intro 20090627
Bosc Intro 20090627bosc
641 views7 slides
Software Patterns Panel Bosc2009 by
Software Patterns Panel Bosc2009Software Patterns Panel Bosc2009
Software Patterns Panel Bosc2009bosc
528 views3 slides
Schbath Rmes Bosc2009 by
Schbath Rmes Bosc2009Schbath Rmes Bosc2009
Schbath Rmes Bosc2009bosc
689 views27 slides
Kallio Chipster Bosc2009 by
Kallio Chipster Bosc2009Kallio Chipster Bosc2009
Kallio Chipster Bosc2009bosc
813 views16 slides
Welch Wordifier Bosc2009 by
Welch Wordifier Bosc2009Welch Wordifier Bosc2009
Welch Wordifier Bosc2009bosc
708 views21 slides

More from bosc(20)

Swertz Molgenis Bosc2009 by bosc
Swertz Molgenis Bosc2009Swertz Molgenis Bosc2009
Swertz Molgenis Bosc2009
bosc1K views
Bosc Intro 20090627 by bosc
Bosc Intro 20090627Bosc Intro 20090627
Bosc Intro 20090627
bosc641 views
Software Patterns Panel Bosc2009 by bosc
Software Patterns Panel Bosc2009Software Patterns Panel Bosc2009
Software Patterns Panel Bosc2009
bosc528 views
Schbath Rmes Bosc2009 by bosc
Schbath Rmes Bosc2009Schbath Rmes Bosc2009
Schbath Rmes Bosc2009
bosc689 views
Kallio Chipster Bosc2009 by bosc
Kallio Chipster Bosc2009Kallio Chipster Bosc2009
Kallio Chipster Bosc2009
bosc813 views
Welch Wordifier Bosc2009 by bosc
Welch Wordifier Bosc2009Welch Wordifier Bosc2009
Welch Wordifier Bosc2009
bosc708 views
Rice Emboss Bosc2009 by bosc
Rice Emboss Bosc2009Rice Emboss Bosc2009
Rice Emboss Bosc2009
bosc1K views
Prlic Bio Java Bosc2009 by bosc
Prlic Bio Java Bosc2009Prlic Bio Java Bosc2009
Prlic Bio Java Bosc2009
bosc658 views
Senger Soaplab Bosc2009 by bosc
Senger Soaplab Bosc2009Senger Soaplab Bosc2009
Senger Soaplab Bosc2009
bosc431 views
Hanmer Software Patterns Bosc2009 by bosc
Hanmer Software Patterns Bosc2009Hanmer Software Patterns Bosc2009
Hanmer Software Patterns Bosc2009
bosc1.8K views
Snell Psoda Bosc2009 by bosc
Snell Psoda Bosc2009Snell Psoda Bosc2009
Snell Psoda Bosc2009
bosc531 views
Procter Vamsas Bosc2009 by bosc
Procter Vamsas Bosc2009Procter Vamsas Bosc2009
Procter Vamsas Bosc2009
bosc853 views
Drablos Composite Motifs Bosc2009 by bosc
Drablos Composite Motifs Bosc2009Drablos Composite Motifs Bosc2009
Drablos Composite Motifs Bosc2009
bosc754 views
Fauteux Seeder Bosc2009 by bosc
Fauteux Seeder Bosc2009Fauteux Seeder Bosc2009
Fauteux Seeder Bosc2009
bosc614 views
Moeller Debian Bosc2009 by bosc
Moeller Debian Bosc2009Moeller Debian Bosc2009
Moeller Debian Bosc2009
bosc469 views
Wilczynski_BNFinder_BOSC2009 by bosc
Wilczynski_BNFinder_BOSC2009Wilczynski_BNFinder_BOSC2009
Wilczynski_BNFinder_BOSC2009
bosc1.7K views
Welsh_BioHDF_BOSC2009 by bosc
Welsh_BioHDF_BOSC2009Welsh_BioHDF_BOSC2009
Welsh_BioHDF_BOSC2009
bosc549 views
Varre_Biomanycores_BOSC2009 by bosc
Varre_Biomanycores_BOSC2009Varre_Biomanycores_BOSC2009
Varre_Biomanycores_BOSC2009
bosc771 views
Trelles_QnormBOSC2009 by bosc
Trelles_QnormBOSC2009Trelles_QnormBOSC2009
Trelles_QnormBOSC2009
bosc342 views
Rother_ModeRNA_BOSC2009 by bosc
Rother_ModeRNA_BOSC2009Rother_ModeRNA_BOSC2009
Rother_ModeRNA_BOSC2009
bosc562 views

Recently uploaded

Netmera Presentation.pdf by
Netmera Presentation.pdfNetmera Presentation.pdf
Netmera Presentation.pdfMustafa Kuğu
22 views50 slides
Innovation & Entrepreneurship strategies in Dairy Industry by
Innovation & Entrepreneurship strategies in Dairy IndustryInnovation & Entrepreneurship strategies in Dairy Industry
Innovation & Entrepreneurship strategies in Dairy IndustryPervaizDar1
35 views26 slides
PCCC23:日本AMD株式会社 テーマ1「AMD Instinct™ アクセラレーターの概要」 by
PCCC23:日本AMD株式会社 テーマ1「AMD Instinct™ アクセラレーターの概要」PCCC23:日本AMD株式会社 テーマ1「AMD Instinct™ アクセラレーターの概要」
PCCC23:日本AMD株式会社 テーマ1「AMD Instinct™ アクセラレーターの概要」PC Cluster Consortium
27 views68 slides
Adopting Karpenter for Cost and Simplicity at Grafana Labs.pdf by
Adopting Karpenter for Cost and Simplicity at Grafana Labs.pdfAdopting Karpenter for Cost and Simplicity at Grafana Labs.pdf
Adopting Karpenter for Cost and Simplicity at Grafana Labs.pdfMichaelOLeary82
13 views74 slides
Bronack Skills - Risk Management and SRE v1.0 12-3-2023.pdf by
Bronack Skills - Risk Management and SRE v1.0 12-3-2023.pdfBronack Skills - Risk Management and SRE v1.0 12-3-2023.pdf
Bronack Skills - Risk Management and SRE v1.0 12-3-2023.pdfThomasBronack
31 views31 slides
Discover Aura Workshop (12.5.23).pdf by
Discover Aura Workshop (12.5.23).pdfDiscover Aura Workshop (12.5.23).pdf
Discover Aura Workshop (12.5.23).pdfNeo4j
15 views55 slides

Recently uploaded(20)

Innovation & Entrepreneurship strategies in Dairy Industry by PervaizDar1
Innovation & Entrepreneurship strategies in Dairy IndustryInnovation & Entrepreneurship strategies in Dairy Industry
Innovation & Entrepreneurship strategies in Dairy Industry
PervaizDar135 views
PCCC23:日本AMD株式会社 テーマ1「AMD Instinct™ アクセラレーターの概要」 by PC Cluster Consortium
PCCC23:日本AMD株式会社 テーマ1「AMD Instinct™ アクセラレーターの概要」PCCC23:日本AMD株式会社 テーマ1「AMD Instinct™ アクセラレーターの概要」
PCCC23:日本AMD株式会社 テーマ1「AMD Instinct™ アクセラレーターの概要」
Adopting Karpenter for Cost and Simplicity at Grafana Labs.pdf by MichaelOLeary82
Adopting Karpenter for Cost and Simplicity at Grafana Labs.pdfAdopting Karpenter for Cost and Simplicity at Grafana Labs.pdf
Adopting Karpenter for Cost and Simplicity at Grafana Labs.pdf
MichaelOLeary8213 views
Bronack Skills - Risk Management and SRE v1.0 12-3-2023.pdf by ThomasBronack
Bronack Skills - Risk Management and SRE v1.0 12-3-2023.pdfBronack Skills - Risk Management and SRE v1.0 12-3-2023.pdf
Bronack Skills - Risk Management and SRE v1.0 12-3-2023.pdf
ThomasBronack31 views
Discover Aura Workshop (12.5.23).pdf by Neo4j
Discover Aura Workshop (12.5.23).pdfDiscover Aura Workshop (12.5.23).pdf
Discover Aura Workshop (12.5.23).pdf
Neo4j15 views
AIM102-S_Cognizant_CognizantCognitive by PhilipBasford
AIM102-S_Cognizant_CognizantCognitiveAIM102-S_Cognizant_CognizantCognitive
AIM102-S_Cognizant_CognizantCognitive
PhilipBasford21 views
The Coming AI Tsunami.pptx by johnhandby
The Coming AI Tsunami.pptxThe Coming AI Tsunami.pptx
The Coming AI Tsunami.pptx
johnhandby13 views
Zero to Cloud Hero: Crafting a Private Cloud from Scratch with XCP-ng, Xen Or... by ShapeBlue
Zero to Cloud Hero: Crafting a Private Cloud from Scratch with XCP-ng, Xen Or...Zero to Cloud Hero: Crafting a Private Cloud from Scratch with XCP-ng, Xen Or...
Zero to Cloud Hero: Crafting a Private Cloud from Scratch with XCP-ng, Xen Or...
ShapeBlue199 views
Mobile Core Solutions & Successful Cases.pdf by IPLOOK Networks
Mobile Core Solutions & Successful Cases.pdfMobile Core Solutions & Successful Cases.pdf
Mobile Core Solutions & Successful Cases.pdf
IPLOOK Networks14 views
Business Analyst Series 2023 - Week 4 Session 8 by DianaGray10
Business Analyst Series 2023 -  Week 4 Session 8Business Analyst Series 2023 -  Week 4 Session 8
Business Analyst Series 2023 - Week 4 Session 8
DianaGray10145 views
Deep Tech and the Amplified Organisation: Core Concepts by Holonomics
Deep Tech and the Amplified Organisation: Core ConceptsDeep Tech and the Amplified Organisation: Core Concepts
Deep Tech and the Amplified Organisation: Core Concepts
Holonomics17 views
PCCC23:日本AMD株式会社 テーマ2「AMD EPYC™ プロセッサーを用いたAIソリューション」 by PC Cluster Consortium
PCCC23:日本AMD株式会社 テーマ2「AMD EPYC™ プロセッサーを用いたAIソリューション」PCCC23:日本AMD株式会社 テーマ2「AMD EPYC™ プロセッサーを用いたAIソリューション」
PCCC23:日本AMD株式会社 テーマ2「AMD EPYC™ プロセッサーを用いたAIソリューション」
GDSC GLAU Info Session.pptx by gauriverrma4
GDSC GLAU Info Session.pptxGDSC GLAU Info Session.pptx
GDSC GLAU Info Session.pptx
gauriverrma415 views
The Power of Heat Decarbonisation Plans in the Built Environment by IES VE
The Power of Heat Decarbonisation Plans in the Built EnvironmentThe Power of Heat Decarbonisation Plans in the Built Environment
The Power of Heat Decarbonisation Plans in the Built Environment
IES VE84 views
Business Analyst Series 2023 - Week 4 Session 7 by DianaGray10
Business Analyst Series 2023 -  Week 4 Session 7Business Analyst Series 2023 -  Week 4 Session 7
Business Analyst Series 2023 - Week 4 Session 7
DianaGray10146 views
Webinar : Desperately Seeking Transformation - Part 2: Insights from leading... by The Digital Insurer
Webinar : Desperately Seeking Transformation - Part 2:  Insights from leading...Webinar : Desperately Seeking Transformation - Part 2:  Insights from leading...
Webinar : Desperately Seeking Transformation - Part 2: Insights from leading...
Enabling DPU Hardware Accelerators in XCP-ng Cloud Platform Environment - And... by ShapeBlue
Enabling DPU Hardware Accelerators in XCP-ng Cloud Platform Environment - And...Enabling DPU Hardware Accelerators in XCP-ng Cloud Platform Environment - And...
Enabling DPU Hardware Accelerators in XCP-ng Cloud Platform Environment - And...
ShapeBlue108 views

Cock Biopython Bosc2009

  • 1. Biopython Project Update Peter Cock, Plant Pathology, SCRI, Dundee, UK 10th Annual Bioinformatics Open Source Conference (BOSC) Stockholm, Sweden, 28 June 2009
  • 2. Contents •  Brief introduction to Biopython & history •  Releases since BOSC 2008 •  Current and future projects •  CVS, git and github •  BoF hackathon and tutorial at BOSC 2009
  • 3. Biopython •  Free, open source library for bioinformatics •  Supported by Open Bioinformatics Foundation •  Runs on Windows, Linux, Mac OS X, etc •  International team of volunteer developers •  Currently about three releases per year •  Extensive “Biopython Tutorial & Cookbook” •  See www.biopython.org for details
  • 4. Biopython’s Ten Year History 1999 •  Started by Jeff Chang & Andrew Dalke 2000 •  First release, Biopython 0.90 2001 •  Biopython 1.00, “semi-complete” … •  Biopython 1.10, …, 1.41 2007 •  Biopython 1.43 (Bio.SeqIO), 1.44 2008 •  Biopython 1.45, 1.47, 1.48, 1.49 2009 •  Biopython 1.50, 1.51beta •  OA Publication, Cock et al.
  • 5. Biopython Publication – Cock et al. 2009 N.B. Open Access!
  • 6. November 2008 – Biopython 1.49 •  Support for Python 2.6 •  Switched from “Numeric” to “NumPy” (important Numerical library for Python) •  More biological methods on core Seq object
  • 7. April 2009 – Biopython 1.50 •  New Bio.Motif module for sequence motifs (to replace Bio.AliceAce and Bio.MEME) •  Support for QUAL and FASTQ in Bio.SeqIO (important NextGen sequencing formats) •  Integration of GenomeDiagram for figures (Pritchard et al. 2006)
  • 8. Biopython 1.50 includes GenomeDiagram De novo assembly of 42kb phage from Roche 454 data “Feature Track” showing ORFs Scale tick marks “Barchart Track” of read depth (~100, scale max 200)
  • 9. Reading a FASTA file with Bio.SeqIO >FL3BO7415JACDX TTAATTTTATTTTGTCGGCTAAAGAGATTTTTAGCTAAACGTTCAATTGCTTTAGCTGAA GTACGAGCAGATACTCCAATCGCAATTGTTTCTTCATTTAAAATTAGCTCGTCGCCACCT TCAATTGGAAATTTATAATCACGATCTAACCAGATTGGTACATTATGTTTTGCAAATCTT GGATGATATTTAATGATGTACTCCATGAATAATGATTCACGTCTACGCGCTGGTTCTCTC ATCTTATTTATCGTTAAGCCA >FL3BO7415I7AFR ... from Bio import SeqIO for rec in SeqIO.parse(open("phage.fasta"), "fasta") : print rec.id, len(rec.seq), rec.seq[:10]+"..." FL3BO7415JACDX 261 TTAATTTTAT... FL3BO7415I7AFR FL3BO7415JCAY5 267 136 CATTAACTAA... TTTCTTTTCT... Focus on the FL3BO7415JB41R 208 CTCTTTTATG... FL3BO7415I6HKB FL3BO7415I63UC 268 219 GGTATTTGAA... AACATGTGAG... filename and ... format (“fasta”)…
  • 10. Reading a FASTQ file with Bio.SeqIO @FL3BO7415JACDX TTAATTTTATTTTGTCGGCTAAAGAGATTTTTAGCTAAACGTTCAATTGCTTTAGCTGAAGTACGAGCAGATACTCCAATCGCAATTGTTTCTTC ATTTAAAATTAGCTCGTCGCCACCTTCAATTGGAAATTTATAATCACGATCTAACCAGATTGGTACATTATGTTTTGCAAATCTTGGATGATATT TAATGATGTACTCCATGAATAATGATTCACGTCTACGCGCTGGTTCTCTCATCTTATTTATCGTTAAGCCA + BBBB2262=1111FFGGGHHHHIIIIIIIIIIIIIIIIIIIIIIIFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFGGGFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFGGGGFFFFFFFFFFFFFFFFFGB BBCFFFFFFFFFFFFFFFFFFFFFFFGGGGGGGIIIIIIIGGGIIIGGGIIGGGG@AAAAA?===@@@??? @FL3BO7415I7AFR ... from Bio import SeqIO for rec in SeqIO.parse(open("phage.fastq"), "fastq") : print rec.id, len(rec.seq), rec.seq[:10]+"..." print rec.letter_annotations["phred_quality"][:10], "..." FL3BO7415JACDX 261 TTAATTTTAT... [33, 33, 33, 33, 17, 17, 21, 17, 28, FL3BO7415I7AFR 267 CATTAACTAA... 16] ... Just filename and [37, 37, 37, 37, 37, 37, 37, 37, 38, 38] ... FL3BO7415JCAY5 136 TTTCTTTTCT... [37, 37, 36, 36, 29, 29, 29, 29, 36, 37] ... format changed FL3BO7415JB41R 208 CTCTTTTATG... [37, 37, 37, 38, 38, 38, 38, 38, 37, 37] ... (“fasta” to “fastq”) FL3BO7415I6HKB 268 GGTATTTGAA... [37, 37, 37, 37, 34, 34, 34, 37, 37, 37] ... FL3BO7415I63UC 219 AACATGTGAG... [37, 37, 37, 37, 37, 37, 37, 37, 37, 37] ... ...
  • 11. June 2009 – Biopython 1.51 beta •  Support for Illumina 1.3+ FASTQ files (in addition to Sanger FASTQ and older Solexa/Illumina FASTQ files) •  Faster parsing of UniProt/SwissProt files •  Bio.SeqIO now writes feature table in GenBank output Already being used at SCRI for genome annotation, e.g. with RAST and Artemis
  • 12. Google Summer of Code Projects •  Eric Talevich - Parsing and writing phyloXML •  Mentors Brad Chapman & Christian Zmasek •  Nick Matzke - Biogeographical Phylogenetics •  Mentors Stephen Smith, Brad Chapman & David Kidd •  Hosted by NESCent Phyloinformatics Group •  Code development on github branches...
  • 13. Other Notable Active Projects •  Brad Chapman – GFF parsing •  Tiago Antão – Population genetics statistics •  Peter Cock – Parsing Roche 454 SFF files (with Jose Blanca, co-author of sff_extract) •  Plus other ongoing refinements and documentation improvements
  • 14. Distributed Development •  Currently work from a stable branch in CVS •  CVS master is mirrored to github.com •  Several sub-projects are being developed on github branches (in public) •  This is letting us get familiar with git & github •  Suggested plan is to switch from CVS to git summer 2009 (still hosted by OBF), continue to push to github for public collaboration
  • 15. Acknowledgements •  Other Biopython contributors & developers! •  Open Bioinformatics Foundation (OBF) supports Biopython (and BioPerl etc) •  Society for General Microbiology (SGM) for my travel costs •  My Biopython work supported by: •  EPSRC funded PhD (MOAC DTC, University of Warwick, UK) •  SCRI (Scottish Crop Research Institute), who also paid my conference fees
  • 16. What next? •  This afternoon’s “Birds of a Feather” session: Biopython Tutorial and/or Hackathon •  Sign up to our mailing list? •  Homepage www.biopython.org