G09-Misc-EMBOSS

EMBOSS: New developments and extended data access (Peter Rice)

G09-Misc-EMBOSS Presentation Transcript

  • EMBOSS: European Molecular Biology Open Software Suite. Open-Bio Project Update 2011. Peter Rice (pmr@ebi.ac.uk), Alan Bleasby, Jon Ison, Mahmut Uludag, Michael Schuster
  • A quick introduction
    • Open source package for sequence analysis
      • ANSI C source code
      • GPL licensed applications, LGPL libraries
      • 275+ applications
      • 150+ third party applications in 15 associated packages
        • MIRA, MEME, HMMER, PHYLIP, VIENNA, etc.
      • Project started 1996 at Sanger and Daresbury/HGMP
      • Now based at EBI
      • Release 1.0.0 15th July 2000
      • Release 6.4.0 15th July 2011
      • Funded by UK-BBSRC and EMBL-EBI
      • Originally funded by the Wellcome Trust
      • Additional funds from UK-MRC
    BOSC 2011: EMBOSS 17.07.11
  • Who do we serve?
    • Expert software developers
      • Bioinformaticians
      • Computer scientists
    • Expert users
      • Biology research community
      • Industry
    • Scientific users
      • Biology research community
      • Industry
  • EMBOSS command line interface
    • EMBOSS applications run from the command line
    • This is not the only interface
      • There are over 100 interfaces and packaged systems available
        • Web: wEMBOSS, Mobyle
        • GUI: Jemboss
        • Web Services: SoapLab
        • Workflows: Galaxy, Taverna, Pipeline Pilot
        • Windows: mEMBOSS
    • All applications have a command definition file (.acd)
      • Defines all inputs, outputs, and other options
      • Read at startup
      • Contains all command line options with descriptions
      • Template for any other interface
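Because every option is declared up front, a wrapper can assemble a command line mechanically from option names and values. The following is a minimal Python sketch of that idea, not the way any existing EMBOSS interface is implemented. It assumes the `seqret` application with its `-sequence` and `-outseq` qualifiers and the global `-auto` flag; the helper name `build_command` and the file names are hypothetical, and nothing is executed.

```python
from typing import Dict, List


def build_command(app: str, options: Dict[str, str], auto: bool = True) -> List[str]:
    """Return an argv list for an EMBOSS application (hypothetical helper).

    Each option becomes '-name value'. With auto=True the '-auto' flag is
    appended so the program does not prompt for values that were not given.
    """
    argv = [app]
    for name, value in options.items():
        argv.extend([f"-{name}", value])
    if auto:
        argv.append("-auto")
    return argv


# File names are placeholders chosen for illustration.
cmd = build_command("seqret", {"sequence": "input.fasta", "outseq": "output.fasta"})
print(" ".join(cmd))
```

On a machine with EMBOSS installed, the resulting list could be passed to `subprocess.run(cmd, check=True)`.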
  • EMBOSS Update
    • Release 6.4.0 as usual on 15th July 2011
    • New Website emboss.open-bio.org
    • Three open source books: users, developers, admin
      • Cambridge University Press
  • Data sources for EMBOSS
    • Server definitions
      • One server, 100+ databases
      • server:dbname as the database name
    • Data access methods
      • Ensembl, DAS, BioMart, CHADO, SRS, Entrez, MRS
      • EBI REST and SOAP services
      • Data Resource Catalogue (DRCAT)
    • emboss.standard file for all installations
      • IF-ELSE-ENDIF to customize for SQL, AXIS2C, local setup
    • New applications
      • showserver, dbtell, servertell
  • New data types: input and output
    • OBO ontology terms
    • NCBI Taxonomy
    • Data Resource Catalogue entries
    • Text
    • URL
    • Cross-references:
      • dbname and identifier
      • data content
  • New query language
    • SRS-like syntax
      • id lists: dbname:{ida,idb,idc}
      • or operator: dbname-{id:h* | des:hemoglobin}
      • and operator: dbname-{id:h* & des:hemoglobin}
      • eor (exclusive or) operator: dbname-{id:h* ^ des:hemoglobin}
    • Compressed (20-fold) b+tree indexes
    • New indexing applications (obo, taxon, drcat)
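The query forms listed above are regular enough to generate from code. This is a small Python sketch that only builds the strings shown on the slide; the function names `id_list_query` and `field_query` are hypothetical, and the sketch makes no claim about how EMBOSS itself parses or validates queries.

```python
from typing import Iterable, Tuple

# Operator symbols as they appear in the slide's examples.
OPERATORS = {"or": "|", "and": "&", "eor": "^"}


def id_list_query(dbname: str, ids: Iterable[str]) -> str:
    """Build an id-list query of the form dbname:{ida,idb,idc}."""
    return dbname + ":{" + ",".join(ids) + "}"


def field_query(dbname: str, left: Tuple[str, str], op: str,
                right: Tuple[str, str]) -> str:
    """Build a two-term field query of the form dbname-{f1:v1 OP f2:v2}."""
    symbol = OPERATORS[op]
    body = f"{left[0]}:{left[1]} {symbol} {right[0]}:{right[1]}"
    return dbname + "-{" + body + "}"


print(id_list_query("dbname", ["ida", "idb", "idc"]))
for op in ("or", "and", "eor"):
    print(field_query("dbname", ("id", "h*"), op, ("des", "hemoglobin")))
```

When such a string is passed on a shell command line it would normally need quoting, since `{`, `|`, `&`, `*` and `^` can be interpreted by the shell.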
  • EDAM ontology
    • EDAM defines topic, operation, data, format, identifier
      • ACD file application, inputs, outputs, parameters
      • DRCAT resources, queries, identifiers
      • SoapLab web services
      • Redefined EMBOSS program groups.
    • OBO format ontology
      • 2835 terms
      • Available throughout EMBOSS as database EDAM:
    • New applications
      • EDAM namespace searches, relation queries
      • OBO ontology applications
      • GO, SO, and other OBO ontologies in release
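To illustrate what an OBO format ontology looks like to a program, here is a minimal Python sketch that reads `[Term]` stanzas and answers a simple relation query (which terms are `is_a` children of a given term). The sample text and the `EX:` identifiers are invented for illustration and are not real EDAM terms; the parser covers only the tag-value basics and is not the EMBOSS implementation.

```python
from typing import Dict, List, Optional

# Invented two-term ontology in OBO flat-file style (not real EDAM content).
SAMPLE_OBO = """\
format-version: 1.2

[Term]
id: EX:0001
name: example root
namespace: topic

[Term]
id: EX:0002
name: example child
namespace: topic
is_a: EX:0001 ! example root
"""


def parse_obo_terms(text: str) -> List[Dict[str, List[str]]]:
    """Collect tag/value pairs for every [Term] stanza in the text."""
    terms: List[Dict[str, List[str]]] = []
    current: Optional[Dict[str, List[str]]] = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("["):
            # A new stanza begins; only [Term] stanzas are collected.
            current = {} if line == "[Term]" else None
            if current is not None:
                terms.append(current)
        elif line and current is not None and ":" in line:
            tag, value = line.split(":", 1)
            # Drop a trailing '! comment', as used on is_a lines.
            value = value.split(" ! ", 1)[0].strip()
            current.setdefault(tag, []).append(value)
    return terms


terms = parse_obo_terms(SAMPLE_OBO)
children = [t["id"][0] for t in terms if "EX:0001" in t.get("is_a", [])]
print(children)
```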
  • DRCAT Data Resource Catalogue
    • Public Data Resources
    • EDAM annotations
    • UniProt and EMBL/GenBank/DDBJ cross-references
    • Query prototypes
    • Example identifiers for testing
    • 662 entries
    • Available in EMBOSS as database DRCAT:
    • Applications:
      • Search by EDAM annotation
      • Search by 18 indexed fields
  • Ontologies: NCBI Taxonomy
    • Parsers for “.dmp” files
    • Indexed by dbxtax
    • Navigation up, down, siblings (the usual suspects)
    • Automatic cross references from sequence data
      • EMBL source line
      • UniProt OX lines
      • BioMart mart name (organism name)
      • etc.
    • New applications
      • Search and retrieve from taxon hierarchy
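The navigation described above (up, down, siblings) can be sketched from the parent links in an NCBI Taxonomy `nodes.dmp` file, whose fields are separated by tab, bar, tab. The Python below is an illustrative sketch only: the sample rows use invented taxon ids and only the first three columns, the function names are hypothetical, and it does not reflect how `dbxtax` indexes the data.

```python
from typing import Dict, List

# Toy rows in nodes.dmp layout: tax_id | parent tax_id | rank |
# Ids are invented; as in the real file, the root is its own parent.
SAMPLE_NODES = (
    "1\t|\t1\t|\tno rank\t|\n"
    "10\t|\t1\t|\tgenus\t|\n"
    "11\t|\t10\t|\tspecies\t|\n"
    "12\t|\t10\t|\tspecies\t|\n"
)


def parse_nodes(text: str) -> Dict[int, int]:
    """Map each tax_id to its parent tax_id."""
    parent: Dict[int, int] = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        if line.endswith("\t|"):
            line = line[:-2]  # strip the row terminator
        fields = line.split("\t|\t")
        parent[int(fields[0])] = int(fields[1])
    return parent


def lineage(parent: Dict[int, int], tax_id: int) -> List[int]:
    """Walk up from tax_id to the root."""
    path = [tax_id]
    while parent[path[-1]] != path[-1]:
        path.append(parent[path[-1]])
    return path


def children(parent: Dict[int, int], tax_id: int) -> List[int]:
    """Direct descendants of tax_id."""
    return sorted(c for c, p in parent.items() if p == tax_id and c != tax_id)


def siblings(parent: Dict[int, int], tax_id: int) -> List[int]:
    """Other taxa sharing the same parent; the root has none."""
    if parent[tax_id] == tax_id:
        return []
    return [c for c in children(parent, parent[tax_id]) if c != tax_id]


nodes = parse_nodes(SAMPLE_NODES)
print(lineage(nodes, 11), children(nodes, 10), siblings(nodes, 11))
```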
  • Installation
    • Release size increased
      • EDAM, DRCAT, NCBI Taxonomy, GO, plus index files
      • Associated packages
        • AXIS2C (SOAP web service access)
        • MySQL (Ensembl)
        • PostgreSQL (FlyBase)
    • mEMBOSS for Windows
      • Enhanced QA testing
        • Standard test set adapted for use on Windows and Unix
  • EMBOSS Interfaces and wrappers
    • Two releases this year
    • Too many for other projects to keep up
      • So we are obliged to help, starting with:
        • SoapLab2
        • Jemboss
        • Galaxy
        • Mobyle
        • … and anyone else who asks
    • Interface generation should be automated
      • Tested during development
      • Changes highlighted before release
  • EMBOSS Future Plans
    • Further development this year
      • Mapped short reads
      • Reference sequences
      • Sequence variation
      • Genome browser data format support
    • Leaving EBI in December
      • … into the unknown
      • … still supporting EMBOSS and planning new developments
  • The EMBOSS Team: Peter Rice, Alan Bleasby, Jon Ison, Mahmut Uludag, Michael Schuster
  • Acknowledgements
    • EBI: Peter Rice, Alan Bleasby, Jon Ison, Mahmut Uludag, Michael Schuster, Martin Senger, Tom Oinn, Jaina Mistry, Rodrigo Lopez, Sharmilla Pillai, Hamish McWilliam, Syed Haider
    • RFCGR/HGMP: Alan Bleasby, Jon Ison, Tim Carver, Hugh Morgan, Claude Beazley, Lisa Mullan, Damian Counsell, Gary Williams, Val Curwen, Mark Faller, Sinead O’Leary, Thon deBoer, Martin Bishop
    • LION: Thomas Laurent, Bijay Jassal, Bren Vaughan, Thure Etzold
    • Sanger Institute: Ian Longden, Richard Bruskiewich, Simon Kelley
    • National bioinformatics service providers in: Norway, Spain, Italy, Netherlands, Germany, Belgium, Russia, China, Canada, Australia, Argentina
    • Others: Catherine Letondal, Don Gilbert, Rodger Staden, Bill Pearson, Webb Miller, Marie-Laetitia Denayer, Amandine Schurmann, Gabriele Weiler, Luke McCarthy, David Mathog, David Bauer, Henrikki Almusa, Thomas Siegmund, Scott Markel, Darryl Leon, Bastien Chevreux, Ivo Hofacker, Kristoffer Rapacki, Matus Kalas
    • Cambridge University Press, LION bioscience, IBM, Hewlett-Packard (Compaq), Apple, SGI, Sun, SciTegic, Microsoft Research
    • Open-Bio Foundation, SourceForge, ... and the British Antarctic Survey
    • http://emboss.open-bio.org
    • http://emboss.open-bio.org/wiki