G09-Misc-EMBOSS

EMBOSS: New developments and extended data access (Peter Rice)

Published in: Technology

Transcript of "G09-Misc-EMBOSS"

  1. EMBOSS: European Molecular Biology Open Software Suite
     Open-Bio Project Update 2011
     Peter Rice (pmr@ebi.ac.uk), Alan Bleasby, Jon Ison, Mahmut Uludag, Michael Schuster
  2. A quick introduction
       • Open source package for sequence analysis
           • ANSI C source code
           • GPL licensed applications, LGPL libraries
           • 275+ applications
           • 150+ third party applications in 15 associated packages
               • MIRA, MEME, HMMER, PHYLIP, VIENNA, etc.
           • Project started 1996 at Sanger and Daresbury/HGMP
           • Now based at EBI
           • Release 1.0.0 15th July 2000
           • Release 6.4.0 15th July 2011
           • Funded by UK-BBSRC and EMBL-EBI
           • Originally funded by the Wellcome Trust
           • Additional funds from UK-MRC
     BOSC 2011: EMBOSS 17.07.11
  3. Who do we serve?
       • Expert software developers
           • Bioinformaticians
           • Computer scientists
       • Expert users
           • Biology research community
           • Industry
       • Scientific users
           • Biology research community
           • Industry
  4. EMBOSS command line interface
       • EMBOSS applications run from the command line
       • This is not the only interface
           • There are over 100 interfaces and packaged systems available
               • Web: wEMBOSS, Mobyle
               • GUI: Jemboss
               • Web Services: SoapLab
               • Workflows: Galaxy, Taverna, Pipeline Pilot
               • Windows: mEMBOSS
       • All applications have a command definition file (.acd)
           • Defines all inputs, outputs, and other options
           • Read at startup
           • Contains all command line options with descriptions
           • Template for any other interface
  5. EMBOSS Update
       • Release 6.4.0 as usual on 15th July 2011
       • New Website emboss.open-bio.org
       • Three open source books: users, developers, admin
           • Cambridge University Press
  6. Data sources for EMBOSS
       • Server definitions
           • One server, 100+ databases
           • server:dbname as the database name
       • Data access methods
           • Ensembl, DAS, BioMart, CHADO, SRS, Entrez, MRS
           • EBI REST and SOAP services
           • Data Resource Catalogue (DRCAT)
       • emboss.standard file for all installations
           • IF-ELSE-ENDIF to customize for SQL, AXIS2C, local setup
       • New applications
           • showserver, dbtell, servertell
  7. New data types: input and output
       • OBO ontology terms
       • NCBI Taxonomy
       • Data Resource Catalogue entries
       • Text
       • URL
       • Cross-references:
           • dbname and identifier
           • data content
  8. New query language
       • SRS-like syntax
           • id lists: dbname:{ida,idb,idc}
           • or operator: dbname-{id:h* | des:hemoglobin}
           • and operator: dbname-{id:h* & des:hemoglobin}
           • eor (exclusive or) operator: dbname-{id:h* ^ des:hemoglobin}
       • Compressed (20-fold) B+tree indexes
       • New indexing applications (obo, taxon, drcat)
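
The query forms listed on this slide can be illustrated with a small string-building sketch. This is not part of EMBOSS: the function names `id_list_query` and `field_query` and the `OPERATORS` table are hypothetical helpers, and the code only reproduces the textual patterns shown above (it does not run or validate a query).

```python
# Hypothetical helpers that assemble query strings in the SRS-like
# syntax shown on the slide. Nothing here calls EMBOSS itself.

# Operator symbols as they appear on the slide.
OPERATORS = {"or": "|", "and": "&", "eor": "^"}


def id_list_query(dbname, ids):
    """Build an id-list query such as dbname:{ida,idb,idc}."""
    if not ids:
        raise ValueError("at least one identifier is required")
    return dbname + ":{" + ",".join(ids) + "}"


def field_query(dbname, terms, operator):
    """Build a field query such as dbname-{id:h* | des:hemoglobin}.

    terms is a list of (field, pattern) pairs; operator is one of
    the keys of OPERATORS.
    """
    if operator not in OPERATORS:
        raise ValueError("unknown operator: " + operator)
    symbol = " " + OPERATORS[operator] + " "
    joined = symbol.join(field + ":" + pattern for field, pattern in terms)
    return dbname + "-{" + joined + "}"


if __name__ == "__main__":
    print(id_list_query("dbname", ["ida", "idb", "idc"]))
    print(field_query("dbname", [("id", "h*"), ("des", "hemoglobin")], "eor"))
```

Keeping the operator symbols in one table means the three combined forms differ only in the key passed to `field_query`.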
  9. EDAM ontology
       • EDAM defines topic, operation, data, format, identifier
           • ACD file application, inputs, outputs, parameters
           • DRCAT resources, queries, identifiers
           • SoapLab web services
           • Redefined EMBOSS program groups
       • OBO format ontology
           • 2835 terms
           • Available throughout EMBOSS as database EDAM:
       • New applications
           • EDAM namespace searches, relation queries
           • OBO ontology applications
           • GO, SO, and other OBO ontologies in release
  10. DRCAT Data Resource Catalogue
       • Public Data Resources
       • EDAM annotations
       • UniProt and EMBL/GenBank/DDBJ cross-references
       • Query prototypes
       • Example identifiers for testing
       • 662 entries
       • Available in EMBOSS as database DRCAT:
       • Applications:
           • Search by EDAM annotation
           • Search by 18 indexed fields
  11. Ontologies: NCBI Taxonomy
       • Parsers for “.dmp” files
       • Indexed by dbxtax
       • Navigation up, down, siblings (the usual suspects)
       • Automatic cross references from sequence data
           • EMBL source line
           • UniProt OX lines
           • BioMart mart name (organism name)
           • etc.
       • New applications
           • Search and retrieve from taxon hierarchy
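
As a rough illustration of what parsing a “.dmp” file and navigating up, down, and across siblings involves, here is a minimal sketch. It is not the EMBOSS parser or the dbxtax index: it assumes only the NCBI dump layout in which fields are separated by a tab, a vertical bar, and a tab, each line ends with a tab and a vertical bar, and the first two fields of nodes.dmp are the taxon id and its parent id. The function names and the three-column sample data are made up for the example; real nodes.dmp lines carry more columns.

```python
from collections import defaultdict


def parse_dmp_line(line):
    """Split one taxonomy dump line into a list of stripped fields."""
    line = line.rstrip("\n")
    if line.endswith("\t|"):
        line = line[:-2]  # drop the end-of-line terminator
    return [field.strip() for field in line.split("\t|\t")]


def load_nodes(lines):
    """Return (parent, children) maps built from nodes.dmp-style lines."""
    parent = {}
    children = defaultdict(list)
    for line in lines:
        if not line.strip():
            continue
        fields = parse_dmp_line(line)
        tax_id, parent_id = int(fields[0]), int(fields[1])
        parent[tax_id] = parent_id
        if tax_id != parent_id:  # the root is recorded as its own parent
            children[parent_id].append(tax_id)
    return parent, children


def lineage(parent, tax_id):
    """Navigate up: the path from tax_id to the root, inclusive."""
    path = [tax_id]
    while parent[path[-1]] != path[-1]:
        path.append(parent[path[-1]])
    return path


def siblings(parent, children, tax_id):
    """Other taxa that share the same parent as tax_id."""
    if parent[tax_id] == tax_id:  # the root has no siblings
        return []
    return [t for t in children[parent[tax_id]] if t != tax_id]


# Made-up sample in the dump layout (ids and ranks are illustrative only).
SAMPLE = [
    "1\t|\t1\t|\tno rank\t|\n",
    "10\t|\t1\t|\tkingdom\t|\n",
    "11\t|\t1\t|\tkingdom\t|\n",
    "100\t|\t10\t|\tspecies\t|\n",
]

if __name__ == "__main__":
    parent, children = load_nodes(SAMPLE)
    print(lineage(parent, 100))             # up
    print(children[1])                      # down
    print(siblings(parent, children, 10))   # siblings
```

Holding both a parent map and a children map in memory makes all three navigation directions simple lookups; an on-disk index would serve the same purpose for the full taxonomy.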
  12. Installation
       • Release size increased
           • EDAM, DRCAT, NCBI Taxonomy, GO, plus index files
           • Associated packages
               • AXIS2C (SOAP web service access)
               • MySQL (Ensembl)
               • PostgreSQL (FlyBase)
       • mEMBOSS for Windows
           • Enhanced QA testing
               • Standard test set adapted for use on Windows and Unix
  13. EMBOSS Interfaces and wrappers
       • Two releases this year
       • Too many for other projects to keep up
           • So we are obliged to help, starting with:
               • SoapLab2
               • Jemboss
               • Galaxy
               • Mobyle
               • … and anyone else who asks
       • Interface generation should be automated
           • Tested during development
           • Changes highlighted before release
  14. EMBOSS Future Plans
       • Further development this year
           • Mapped short reads
           • Reference sequences
           • Sequence variation
           • Genome browser data format support
       • Leaving EBI in December
           • … into the unknown
           • … still supporting EMBOSS and planning new developments
  15. The EMBOSS Team
       • Peter Rice
       • Alan Bleasby
       • Jon Ison
       • Mahmut Uludag
       • Michael Schuster
  16. Acknowledgements
       • EBI: Peter Rice, Alan Bleasby, Jon Ison, Mahmut Uludag, Michael Schuster, Martin Senger, Tom Oinn, Jaina Mistry, Rodrigo Lopez, Sharmilla Pillai, Hamish McWilliam, Syed Haider
       • RFCGR/HGMP: Alan Bleasby, Jon Ison, Tim Carver, Hugh Morgan, Claude Beazley, Lisa Mullan, Damian Counsell, Gary Williams, Val Curwen, Mark Faller, Sinead O’Leary, Thon deBoer, Martin Bishop
       • LION: Thomas Laurent, Bijay Jassal, Bren Vaughan, Thure Etzold
       • Sanger Institute: Ian Longden, Richard Bruskiewich, Simon Kelley
       • National bioinformatics service providers in: Norway, Spain, Italy, Netherlands, Germany, Belgium, Russia, China, Canada, Australia, Argentina
       • Others: Catherine Letondal, Don Gilbert, Rodger Staden, Bill Pearson, Webb Miller, Marie-Laetitia Denayer, Amandine Schurmann, Gabriele Weiler, Luke McCarthy, David Mathog, David Bauer, Henrikki Almusa, Thomas Siegmund, Scott Markel, Darryl Leon, Bastien Chevreux, Ivo Hofacker, Kristoffer Rapacki, Matus Kalas
       • Cambridge University Press, LION bioscience, IBM, Hewlett-Packard (Compaq), Apple, SGI, Sun, SciTegic, Microsoft Research
       • Open-Bio Foundation, SourceForge, ... and the British Antarctic Survey
       • http://emboss.open-bio.org
       • http://emboss.open-bio.org/wiki