EMBOSS: New developments and extended data access (Peter Rice)

Transcript

  • 1. EMBOSS: European Molecular Biology Open Software Suite. Open-Bio Project Update 2011. Peter Rice (pmr@ebi.ac.uk), Alan Bleasby, Jon Ison, Mahmut Uludag, Michael Schuster
  • 2. A quick introduction
    • Open source package for sequence analysis
      • ANSI C source code
      • GPL licensed applications, LGPL libraries
      • 275+ applications
      • 150+ third party applications in 15 associated packages
        • MIRA, MEME, HMMER, PHYLIP, VIENNA, etc.
      • Project started 1996 at Sanger and Daresbury/HGMP
      • Now based at EBI
      • Release 1.0.0 15th July 2000
      • Release 6.4.0 15th July 2011
      • Funded by UK-BBSRC and EMBL-EBI
      • Originally funded by the Wellcome Trust
      • Additional funds from UK-MRC
    BOSC 2011: EMBOSS 17.07.11
  • 3. Who do we serve?
    • Expert software developers
      • Bioinformaticians
      • Computer scientists
    • Expert users
      • Biology research community
      • Industry
    • Scientific users
      • Biology research community
      • Industry
  • 4. EMBOSS command line interface
    • EMBOSS applications run from the command line
    • This is not the only interface
      • There are over 100 interfaces and packaged systems available
        • Web: wEMBOSS, Mobyle
        • GUI: Jemboss
        • Web Services: SoapLab
        • Workflows: Galaxy, Taverna, Pipeline Pilot
        • Windows: mEMBOSS
    • All applications have a command definition file (.acd)
      • Defines all inputs, outputs, and other options
      • Read at startup
      • Contains all command line options with descriptions
      • Template for any other interface
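A minimal sketch of how a command definition can serve as a template for another interface. The ACD-like text below is a simplified, hypothetical fragment (the application name `demoapp` and its options are invented for illustration), and the parser handles only this reduced syntax, not the full ACD language.

```python
import re

# Simplified ACD-like text. Hypothetical example: real ACD files support
# many more data types, attributes, and sections than shown here.
ACD_TEXT = '''
application: demoapp [
  documentation: "Hypothetical demo application"
  groups: "Edit"
]

sequence: sequence [
  parameter: "Y"
  type: "any"
]

integer: window [
  standard: "Y"
  default: "10"
  information: "Window size"
]

outfile: outfile [
  parameter: "Y"
]
'''

# One block looks like:  datatype: name [ attribute: "value" ... ]
BLOCK_RE = re.compile(r'(\w+):\s*(\w+)\s*\[(.*?)\]', re.DOTALL)
ATTR_RE = re.compile(r'(\w+):\s*"([^"]*)"')


def parse_acd(text):
    """Return a list of (datatype, name, attributes) tuples in file order."""
    return [(dtype, name, dict(ATTR_RE.findall(body)))
            for dtype, name, body in BLOCK_RE.findall(text)]


def describe_options(blocks):
    """Build one help line per option, skipping the application header."""
    lines = []
    for dtype, name, attrs in blocks:
        if dtype == "application":
            continue
        line = f"-{name} <{dtype}>"
        if "information" in attrs:
            line += f"  {attrs['information']}"
        if "default" in attrs:
            line += f" (default {attrs['default']})"
        lines.append(line)
    return lines


if __name__ == "__main__":
    for help_line in describe_options(parse_acd(ACD_TEXT)):
        print(help_line)
```

The same parsed structure could equally drive a web form or a workflow tool definition, which is the sense in which a definition file acts as a template for other interfaces.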
  • 5. EMBOSS Update
    • Release 6.4.0 as usual on 15th July 2011
    • New Website emboss.open-bio.org
    • Three open source books: users, developers, admin
      • Cambridge University Press
  • 6. Data sources for EMBOSS
    • Server definitions
      • One server, 100+ databases
      • server:dbname as the database name
    • Data access methods
      • Ensembl, DAS, BioMart, CHADO, SRS, Entrez, MRS
      • EBI REST and SOAP services
      • Data Resource Catalogue (DRCAT)
    • emboss.standard file for all installations
      • IF-ELSE-ENDIF to customize for SQL, AXIS2C, local setup
    • New applications
      • showserver, dbtell, servertell
  • 7. New data types: input and output
    • OBO ontology terms
    • NCBI Taxonomy
    • Data Resource Catalogue entries
    • Text
    • URL
    • Cross-references:
      • dbname and identifier
      • data content
  • 8. New query language
    • SRS-like syntax
      • id lists: dbname:{ida,idb,idc}
      • or operator: dbname-{id:h* | des:hemoglobin}
      • and operator: dbname-{id:h* & des:hemoglobin}
      • eor (exclusive or) operator: dbname-{id:h* ^ des:hemoglobin}
    • Compressed (20-fold) b+tree indexes
    • New indexing applications (obo, taxon, drcat)
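The query forms on this slide can be illustrated with a small evaluator over an in-memory table. This is only a sketch of the semantics as they read from the slide: the records, the `demo` database name, and the function names are hypothetical, each query may contain at most one operator, and nothing here reflects how EMBOSS parses queries or uses its indexes.

```python
import re
from fnmatch import fnmatchcase

# Hypothetical in-memory "database": entry id -> description.
RECORDS = {
    "hba": "hemoglobin alpha",
    "hxk": "hexokinase",
    "glb": "extracellular hemoglobin",
    "ins": "insulin",
}

# dbname:{...} is an id list, dbname-{...} is a field query.
QUERY_RE = re.compile(r'^(\w+)([:-])\{(.*)\}$')

# Operators from the slide mapped to set operations.
OPERATORS = (
    ("|", set.union),                 # or
    ("&", set.intersection),          # and
    ("^", set.symmetric_difference),  # eor (exclusive or)
)


def match_term(term):
    """Return the ids matching a single 'field:pattern' term."""
    field, pattern = (part.strip() for part in term.split(":", 1))
    if field == "id":
        return {i for i in RECORDS if fnmatchcase(i, pattern)}
    if field == "des":
        # Match the pattern against each word of the description.
        return {i for i, text in RECORDS.items()
                if any(fnmatchcase(word, pattern) for word in text.split())}
    raise ValueError(f"unknown field: {field}")


def run_query(query):
    """Evaluate one query string and return the set of matching ids."""
    match = QUERY_RE.match(query.strip())
    if match is None:
        raise ValueError(f"unsupported query: {query}")
    _dbname, separator, body = match.groups()  # dbname is ignored in this sketch
    if separator == ":":
        wanted = {item.strip() for item in body.split(",")}
        return wanted & set(RECORDS)
    for symbol, combine in OPERATORS:
        if symbol in body:
            left, right = body.split(symbol, 1)
            return combine(match_term(left), match_term(right))
    return match_term(body)


if __name__ == "__main__":
    for q in ("demo:{hba,ins,zzz}",
              "demo-{id:h* | des:hemoglobin}",
              "demo-{id:h* & des:hemoglobin}",
              "demo-{id:h* ^ des:hemoglobin}"):
        print(q, "->", sorted(run_query(q)))
```

With these records, `id:h*` selects `hba` and `hxk`, and `des:hemoglobin` selects `hba` and `glb`, so the three operators give visibly different results.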
  • 9. EDAM ontology
    • EDAM defines topic, operation, data, format, identifier
      • ACD file application, inputs, outputs, parameters
      • DRCAT resources, queries, identifiers
      • SoapLab web services
      • Redefined EMBOSS program groups.
    • OBO format ontology
      • 2835 terms
      • Available throughout EMBOSS as database EDAM:
    • New applications
      • EDAM namespace searches, relation queries
      • OBO ontology applications
      • GO, SO, and other OBO ontologies in release
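As an illustration of namespace searches and relation queries over an OBO-format ontology, here is a minimal sketch that reads `[Term]` stanzas and follows `is_a` links. The term identifiers (`EX:...`) and names are invented placeholders, not real EDAM terms, and the parser covers only the handful of tags used here.

```python
# Hypothetical OBO-format fragment; the EX: identifiers are placeholders.
OBO_TEXT = """
format-version: 1.2

[Term]
id: EX:0000001
name: data
namespace: data

[Term]
id: EX:0000002
name: sequence
namespace: data
is_a: EX:0000001 ! data

[Term]
id: EX:0000003
name: protein sequence
namespace: data
is_a: EX:0000002 ! sequence

[Term]
id: EX:0000010
name: sequence alignment
namespace: operation
"""


def parse_obo(text):
    """Parse [Term] stanzas into a dict keyed by term id."""
    terms = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if line == "[Term]":
            current = {"is_a": []}
            continue
        if line.startswith("["):
            current = None  # some other stanza type, ignored in this sketch
            continue
        if not line or current is None or ":" not in line:
            continue  # blank line or header line outside any stanza
        tag, value = line.split(":", 1)
        value = value.split("!", 1)[0].strip()  # drop trailing "! comment"
        if tag == "is_a":
            current["is_a"].append(value)
        else:
            current[tag] = value
        if tag == "id":
            terms[value] = current
    return terms


def in_namespace(terms, namespace):
    """Namespace search: ids of all terms in the given namespace."""
    return [tid for tid, term in terms.items()
            if term.get("namespace") == namespace]


def ancestors(terms, term_id):
    """Relation query: every term reachable by following is_a links."""
    found = []
    stack = list(terms[term_id]["is_a"])
    while stack:
        parent = stack.pop()
        if parent not in found:
            found.append(parent)
            stack.extend(terms.get(parent, {"is_a": []})["is_a"])
    return found


if __name__ == "__main__":
    ontology = parse_obo(OBO_TEXT)
    print(in_namespace(ontology, "data"))
    print(ancestors(ontology, "EX:0000003"))
```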
  • 10. DRCAT Data Resource Catalogue
    • Public Data Resources
    • EDAM annotations
    • UniProt and EMBL/GenBank/DDBJ cross-references
    • Query prototypes
    • Example identifiers for testing
    • 662 entries
    • Available in EMBOSS as database DRCAT:
    • Applications:
      • Search by EDAM annotation
      • Search by 18 indexed fields
  • 11. Ontologies: NCBI Taxonomy
    • Parsers for “.dmp” files
    • Indexed by dbxtax
    • Navigation up, down, siblings (the usual suspects)
    • Automatic cross references from sequence data
      • EMBL source line
      • UniProt OX lines
      • BioMart mart name (organism name)
      • etc.
    • New applications
      • Search and retrieve from taxon hierarchy
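The up, down, and sibling navigation described above can be sketched over rows in the NCBI `nodes.dmp` layout, where fields are separated by tab, pipe, tab and each row ends with tab, pipe. The sample keeps only the first three columns (tax_id, parent tax_id, rank) and collapses the lineage above Hominidae to the root for brevity. The function names are hypothetical, and this is not how `dbxtax` builds or queries its index.

```python
from collections import defaultdict

# Abbreviated nodes.dmp-style rows: tax_id | parent tax_id | rank |
# Real files carry more columns; the lineage above 9604 is collapsed here.
NODES_DMP = (
    "1\t|\t1\t|\tno rank\t|\n"          # root (its own parent)
    "9604\t|\t1\t|\tfamily\t|\n"        # Hominidae
    "207598\t|\t9604\t|\tsubfamily\t|\n"  # Homininae
    "9605\t|\t207598\t|\tgenus\t|\n"    # Homo
    "9606\t|\t9605\t|\tspecies\t|\n"    # Homo sapiens
    "9596\t|\t207598\t|\tgenus\t|\n"    # Pan
    "9597\t|\t9596\t|\tspecies\t|\n"    # Pan paniscus
    "9598\t|\t9596\t|\tspecies\t|\n"    # Pan troglodytes
)


def parse_nodes(text):
    """Return (parent, children, rank) lookup tables keyed by tax_id."""
    parent, rank = {}, {}
    children = defaultdict(list)
    for line in text.splitlines():
        if not line.strip():
            continue
        if line.endswith("\t|"):
            line = line[:-2]  # drop the row terminator
        fields = line.split("\t|\t")
        tax_id, parent_id = int(fields[0]), int(fields[1])
        parent[tax_id] = parent_id
        rank[tax_id] = fields[2]
        if tax_id != parent_id:  # the root lists itself as parent
            children[parent_id].append(tax_id)
    return parent, dict(children), rank


def lineage(parent, tax_id):
    """Navigate up: ancestors from the immediate parent to the root."""
    path = []
    while parent[tax_id] != tax_id:
        tax_id = parent[tax_id]
        path.append(tax_id)
    return path


def descendants(children, tax_id):
    """Navigate down: direct children of a taxon."""
    return list(children.get(tax_id, []))


def siblings(parent, children, tax_id):
    """Other taxa sharing the same parent."""
    return [t for t in children.get(parent[tax_id], []) if t != tax_id]


if __name__ == "__main__":
    parent, children, rank = parse_nodes(NODES_DMP)
    print(lineage(parent, 9606))
    print(descendants(children, 9596))
    print(siblings(parent, children, 9605))
```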
  • 12. Installation
    • Release size increased
      • EDAM, DRCAT, NCBI Taxonomy, GO, plus index files
      • Associated packages
        • AXIS2C (SOAP web service access)
        • MySQL (Ensembl)
        • PostgreSQL (FlyBase)
    • mEMBOSS for Windows
      • Enhanced QA testing
        • Standard test set adapted for use on Windows and Unix
  • 13. EMBOSS Interfaces and wrappers
    • Two releases this year
    • Too many for other projects to keep up with
      • So we are obliged to help, starting with:
        • SoapLab2
        • Jemboss
        • Galaxy
        • Mobyle
        • … and anyone else who asks
    • Interface generation should be automated
      • Tested during development
      • Changes highlighted before release
  • 14. EMBOSS Future Plans
    • Further development this year
      • Mapped short reads
      • Reference sequences
      • Sequence variation
      • Genome browser data format support
    • Leaving EBI in December
      • … into the unknown
      • … still supporting EMBOSS and planning new developments
  • 15. The EMBOSS Team: Peter Rice, Alan Bleasby, Jon Ison, Mahmut Uludag, Michael Schuster
  • 16. Acknowledgements
    • EBI: Peter Rice, Alan Bleasby, Jon Ison, Mahmut Uludag, Michael Schuster, Martin Senger, Tom Oinn, Jaina Mistry, Rodrigo Lopez, Sharmilla Pillai, Hamish McWilliam, Syed Haider
    • RFCGR/HGMP: Alan Bleasby, Jon Ison, Tim Carver, Hugh Morgan, Claude Beazley, Lisa Mullan, Damian Counsell, Gary Williams, Val Curwen, Mark Faller, Sinead O’Leary, Thon deBoer, Martin Bishop
    • LION: Thomas Laurent, Bijay Jassal, Bren Vaughan, Thure Etzold
    • Sanger Institute: Ian Longden, Richard Bruskiewich, Simon Kelley
    • National bioinformatics service providers in: Norway, Spain, Italy, Netherlands, Germany, Belgium, Russia, China, Canada, Australia, Argentina
    • Others: Catherine Letondal, Don Gilbert, Rodger Staden, Bill Pearson, Webb Miller, Marie-Laetitia Denayer, Amandine Schurmann, Gabriele Weiler, Luke McCarthy, David Mathog, David Bauer, Henrikki Almusa, Thomas Siegmund, Scott Markel, Darryl Leon, Bastien Chevreux, Ivo Hofacker, Kristoffer Rapacki, Matus Kalas
    • Cambridge University Press, LION bioscience, IBM, Hewlett-Packard (Compaq), Apple, SGI, Sun, SciTegic, Microsoft Research
    • Open-Bio Foundation, Sourceforge, ... and the British Antarctic Survey
    • http://emboss.open-bio.org
    • http://emboss.open-bio.org/wiki