DAS. Technical introduction to the Distributed Annotation System.
1. Primer for Predocs
17-19 January 2011
Rafael Jimenez
rafael@ebi.ac.uk
EnCORE presentation
2. Table of contents
• DAS
– Commands and queries
– Design principles
– Documentation
– Clients and servers
4. Query model
(DAS – Andy Jenkinson, 23.08.18)
Structured REST URL
– http://server/das/source/command?arguments
– servers, data sources, commands, parameters
Reference object
– e.g. “chromosome X”
Reference servers provide sequence
– http://server/das/source/sequence?segment=X:1,500
Annotation servers provide features
– http://server/das/source/features?segment=X:1,500
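The query model above is just URL construction: server, then `das`, then source, then command, then arguments. The sketch below assembles such URLs. The helper names `das_url` and `segment` are hypothetical, `server` and `source` are the slide's placeholders, and joining several arguments with "&" is an assumption of this sketch; check the specification for the separators a given server accepts.

```python
from typing import Optional
from urllib.parse import quote


def segment(ref: str, start: Optional[int] = None, stop: Optional[int] = None) -> str:
    """Format a segment argument: 'X' for a whole reference object, 'X:1,500' for a range."""
    if start is None or stop is None:
        return ref
    return f"{ref}:{start},{stop}"


def das_url(server: str, source: str, command: str, **arguments: str) -> str:
    """Assemble a URL of the form http://server/das/source/command?arguments."""
    url = f"{server.rstrip('/')}/das/{source}/{command}"
    if arguments:
        # Leave ':' and ',' unescaped so segments stay readable, as in the slide URLs.
        query = "&".join(f"{name}={quote(value, safe=':,')}" for name, value in arguments.items())
        url += "?" + query
    return url


print(das_url("http://server", "source", "features", segment=segment("X", 1, 500)))
# http://server/das/source/features?segment=X:1,500
print(das_url("http://server", "source", "sequence", segment=segment("X", 1, 500)))
# http://server/das/source/sequence?segment=X:1,500
```

The same reference object ("X:1,500") goes to a reference server with `sequence` and to annotation servers with `features`; only the command changes.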
5. Data model
Lightweight XML
http://server/das/source/features?segment=X:1,500
<SEGMENT id="X" start="1" stop="500">
  <FEATURE id="…">
    <TYPE id="…" category="…">…</TYPE>
    <METHOD id="…">…</METHOD>
    <START>…</START>
    <END>…</END>
  </FEATURE>
  <FEATURE id="…">
    …
  </FEATURE>
</SEGMENT>
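Because the format is lightweight XML, a client needs only a standard XML parser. The sketch below reads the element layout shown above with Python's `xml.etree.ElementTree`. The feature ids, types, methods and coordinates in `SAMPLE` are invented placeholders, and `Feature`/`parse_features` are hypothetical names. Real responses wrap `SEGMENT` in further elements, which is why the code searches with `iter` rather than assuming `SEGMENT` is the root.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import List

# Invented example values; only the element layout follows the slide.
SAMPLE = """
<SEGMENT id="X" start="1" stop="500">
  <FEATURE id="f1">
    <TYPE id="exon" category="transcription">exon</TYPE>
    <METHOD id="prediction">prediction</METHOD>
    <START>10</START>
    <END>120</END>
  </FEATURE>
  <FEATURE id="f2">
    <TYPE id="repeat" category="repeat">repeat</TYPE>
    <METHOD id="masking">masking</METHOD>
    <START>300</START>
    <END>350</END>
  </FEATURE>
</SEGMENT>
"""


@dataclass
class Feature:
    feature_id: str
    type_id: str
    category: str
    method_id: str
    start: int
    end: int


def parse_features(xml_text: str) -> List[Feature]:
    """Collect every FEATURE element, wherever it sits under the root element."""
    root = ET.fromstring(xml_text.strip())
    features = []
    for element in root.iter("FEATURE"):
        type_el = element.find("TYPE")
        method_el = element.find("METHOD")
        features.append(
            Feature(
                feature_id=element.get("id", ""),
                type_id=type_el.get("id", "") if type_el is not None else "",
                category=type_el.get("category", "") if type_el is not None else "",
                method_id=method_el.get("id", "") if method_el is not None else "",
                start=int(element.findtext("START", "0")),
                end=int(element.findtext("END", "0")),
            )
        )
    return features


for feature in parse_features(SAMPLE):
    print(feature.feature_id, feature.type_id, feature.start, feature.end)
# f1 exon 10 120
# f2 repeat 300 350
```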
6. DAS Annotation source - Protein Feature Request
[Figure: example response showing a non-positional feature and a positional feature]
http://www.ebi.ac.uk/das-srv/uniprot/das/uniprot/features?segment=Q12345
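A protein feature response mixes positional features (a domain spanning residues) with non-positional ones (annotations about the whole protein). The sketch below separates them, assuming the DAS 1.x convention that a non-positional feature reports `START` and `END` as 0. The segment id `EXAMPLE` and all feature ids and positions are invented; real answers from the UniProt URL above will differ.

```python
import xml.etree.ElementTree as ET

# Invented example; assumes non-positional features carry START = END = 0.
SAMPLE = """<SEGMENT id="EXAMPLE">
  <FEATURE id="kw1"><TYPE id="keyword">keyword</TYPE><START>0</START><END>0</END></FEATURE>
  <FEATURE id="dom1"><TYPE id="domain">domain</TYPE><START>5</START><END>60</END></FEATURE>
</SEGMENT>"""


def split_features(xml_text):
    """Return (positional ids, non-positional ids)."""
    positional, non_positional = [], []
    for feature in ET.fromstring(xml_text).iter("FEATURE"):
        # A missing START/END defaults to 0, so such features count as non-positional.
        start = int(feature.findtext("START", "0"))
        end = int(feature.findtext("END", "0"))
        target = non_positional if start == 0 and end == 0 else positional
        target.append(feature.get("id"))
    return positional, non_positional


print(split_features(SAMPLE))  # (['dom1'], ['kw1'])
```

A viewer would draw the positional list along the sequence and show the non-positional list as general annotations.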
7. DAS Reference source - Protein Sequence Request
http://www.ebi.ac.uk/das-srv/uniprot/das/uniprot/sequence?segment=Q12345
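A reference source answers the sequence request with the residues for the segment. The sketch below assumes a DAS 1.53-style response in which a `SEQUENCE` element (inside `DASSEQUENCE`) carries `id`, `start` and `stop` attributes and holds the residues as text, possibly wrapped over several lines. The accession and residues in `SAMPLE` are invented, and `parse_sequences` is a hypothetical name. The length check is an illustrative sanity test, not something the protocol requires of clients.

```python
import xml.etree.ElementTree as ET

# Invented response; the element names follow the assumed DAS 1.53 layout.
SAMPLE = """<DASSEQUENCE>
  <SEQUENCE id="EXAMPLE" start="1" stop="12" version="1">
    MKTAYIAK
    QRQI
  </SEQUENCE>
</DASSEQUENCE>"""


def parse_sequences(xml_text):
    """Map each segment id to its residues, with line breaks and spaces removed."""
    sequences = {}
    for seq in ET.fromstring(xml_text).iter("SEQUENCE"):
        residues = "".join(seq.text.split()) if seq.text else ""
        start, stop = int(seq.get("start")), int(seq.get("stop"))
        # The residue count should match the inclusive start..stop range.
        if len(residues) != stop - start + 1:
            raise ValueError(f"{seq.get('id')}: expected {stop - start + 1} residues, got {len(residues)}")
        sequences[seq.get("id")] = residues
    return sequences


print(parse_sequences(SAMPLE))  # {'EXAMPLE': 'MKTAYIAKQRQI'}
```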
8. More DAS Commands
• Alignment, Structure and Interaction
• More …
http://server/das/source/entry_points
– entry_points: lists the available entry points (“chromosomes | contigs | proteins | …”).
http://server/das/source/types
– types: provides a summary of the feature types for a segment.
http://server/das/source/stylesheet
– stylesheet: gives hints to the DAS client about how to display the
feature types; clients are free to ignore it.
http://server/das/sources
– sources: lists the data sources available on a DAS server. It replaces the
original, underspecified dsn command.
http://www.biodas.org/wiki/DAS1.6
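The types command returns its summary from the server. To show what that summary means, the sketch below computes the equivalent client-side by counting features per `TYPE` id in a features response. It does not reproduce the types response format. The sample response and the name `summarise_types` are invented.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Invented features response; only the TYPE ids matter here.
SAMPLE = """<SEGMENT id="X" start="1" stop="500">
  <FEATURE id="f1"><TYPE id="exon">exon</TYPE><START>10</START><END>120</END></FEATURE>
  <FEATURE id="f2"><TYPE id="exon">exon</TYPE><START>200</START><END>260</END></FEATURE>
  <FEATURE id="f3"><TYPE id="repeat">repeat</TYPE><START>300</START><END>350</END></FEATURE>
</SEGMENT>"""


def summarise_types(xml_text: str) -> Counter:
    """Count features per TYPE id, roughly the information a types request summarises."""
    root = ET.fromstring(xml_text)
    return Counter(type_el.get("id") for type_el in root.iter("TYPE"))


print(summarise_types(SAMPLE))  # Counter({'exon': 2, 'repeat': 1})
```

Asking the server with types avoids downloading every feature just to learn which kinds exist, which matters for large segments.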
10. DAS Design Principles
• Data remains distributed
– “live” data
– data providers retain responsibility
– good for changing data
– spreads resources
• Easy for data providers to implement
– simple protocol
– lots of data providers
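Because data remains distributed, the integration work happens in the client: it sends the same segment query to each provider at request time and merges whatever comes back. The sketch below shows that pattern with a thread pool. The fetch function is injected so the example runs offline, the provider URLs are hypothetical, and `gather`/`fake_fetch` are illustrative names, not part of any DAS client library.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Tuple

# A fetch function maps a query URL to a list of feature ids. In a real client it would
# perform the HTTP GET and parse the XML; here it is injected so the sketch runs offline.
Fetch = Callable[[str], List[str]]


def gather(urls: List[str], fetch: Fetch, max_workers: int = 4) -> Tuple[Dict[str, List[str]], Dict[str, str]]:
    """Query every source concurrently; an unreachable source is reported, not fatal."""
    results: Dict[str, List[str]] = {}
    errors: Dict[str, str] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {url: pool.submit(fetch, url) for url in urls}
        for url, future in futures.items():
            try:
                results[url] = future.result()
            except Exception as exc:  # one failing provider should not hide the others
                errors[url] = repr(exc)
    return results, errors


# Hypothetical providers answering the same segment query.
FAKE_DATA = {
    "http://a/das/s1/features?segment=X:1,500": ["f1", "f2"],
    "http://b/das/s2/features?segment=X:1,500": ["g1"],
}


def fake_fetch(url: str) -> List[str]:
    if url not in FAKE_DATA:
        raise ConnectionError(f"no answer from {url}")
    return FAKE_DATA[url]


ok, failed = gather(list(FAKE_DATA) + ["http://c/das/s3/features?segment=X:1,500"], fake_fetch)
print(sorted(ok), sorted(failed))
```

Each provider keeps serving its own live data, and a slow or unavailable provider affects only its own track.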
14. Versions of DAS
[Figure: timeline of DAS specifications from 2001 to 2011. DAS 1: versions 1.01, 1.53, 1.53E and 1.6; DAS/2: versions 2.0 and 2.1. Estimated numbers of available DAS sources over the period: ~8, ~250, ~380, ~650 and ~1300.]
19. List of DAS Clients
• Ensembl uses DAS to pull in genomic, gene and protein annotations. It also
provides data via DAS.
• GBrowse is a generic genome browser that both consumes and provides
DAS data.
• IGB is a desktop application for viewing genomic data.
• SPICE is an application for projecting protein annotations onto 3D structures.
• Dasty2 is a web-based viewer for protein annotations.
• Jalview is a multiple alignment editor.
• PeppeR is a graphical viewer for 3D electron microscopy data.
• DASMI is an integration portal for protein interaction data.
• DASher is a Java-based viewer for protein annotations.
• EpiC presents structure-function summaries for antibody design.
• STRAP is a STRucture-based sequence Alignment Program.
Well-formed hierarchical URL: each server has one or more sources, and each source implements one or more commands.
The sequence command provides sequence, and the features command provides sequence annotations.
The stylesheet command allows the server to govern how features are rendered by the client. It works by specifying the type and colour of glyph to use for each feature type. For instance, the DAS server of the COSMIC cancer mutation database specifies that substitutions should be drawn as crosses, whereas insertions are drawn as triangles.
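On the client side, honouring a stylesheet amounts to a lookup with a fallback, since clients may also ignore it. The sketch below models a stylesheet that has already been parsed into a plain dictionary keyed by feature type. The dictionary shape, the default glyph and colour, and the name `style_for` are assumptions of this sketch; only the COSMIC substitution/insertion example comes from the note above.

```python
from typing import Dict, Optional

# Hypothetical client default, used when the server gives no hint for a type.
DEFAULT_STYLE: Dict[str, str] = {"glyph": "box", "colour": "grey"}


def style_for(feature_type: str, stylesheet: Optional[Dict[str, Dict[str, str]]]) -> Dict[str, str]:
    """Merge the server's hint for this type over the client default (a new dict each time)."""
    if not stylesheet:
        return dict(DEFAULT_STYLE)
    return {**DEFAULT_STYLE, **stylesheet.get(feature_type, {})}


# COSMIC-like hints: substitutions as crosses, insertions as triangles.
cosmic_like = {"substitution": {"glyph": "cross"}, "insertion": {"glyph": "triangle"}}

print(style_for("substitution", cosmic_like))  # {'glyph': 'cross', 'colour': 'grey'}
print(style_for("deletion", cosmic_like))      # {'glyph': 'box', 'colour': 'grey'}
```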
Live data: warehouses allow fast access, but their data is often out of sync with the source database.
Providers remain responsible for their data, and clients are shielded from database changes.
Good for rapidly changing data, e.g. ENCODE (cf. warehouses).
Spreading resources makes a lot of sense given the topology of the network.
Intrinsically simple protocol, and a dumb server: all it has to do is access its data store and adapt the data to XML, and existing implementations make that easy.
Clever client: the client takes care of presenting the data.
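The "dumb server" side can be illustrated by the step of turning stored rows into the XML shown on the data model slide. The sketch below does that with `xml.etree.ElementTree`. The `(feature_id, type_id, start, end)` row shape, the sample rows and the name `features_xml` are hypothetical. The output omits the response envelope and other elements that a specification-compliant server would emit, and it handles only positional features.

```python
import xml.etree.ElementTree as ET
from typing import Iterable, Tuple

# (feature_id, type_id, start, end): a stand-in for whatever store the provider uses.
Row = Tuple[str, str, int, int]


def features_xml(ref: str, start: int, stop: int, rows: Iterable[Row]) -> str:
    """Serialise the rows overlapping ref:start,stop as a SEGMENT element."""
    segment = ET.Element("SEGMENT", id=ref, start=str(start), stop=str(stop))
    for feature_id, type_id, f_start, f_end in rows:
        # Skip features that lie entirely outside the requested range.
        if f_end < start or f_start > stop:
            continue
        feature = ET.SubElement(segment, "FEATURE", id=feature_id)
        ET.SubElement(feature, "TYPE", id=type_id).text = type_id
        ET.SubElement(feature, "START").text = str(f_start)
        ET.SubElement(feature, "END").text = str(f_end)
    return ET.tostring(segment, encoding="unicode")


rows = [("f1", "exon", 10, 120), ("f2", "repeat", 600, 650)]
print(features_xml("X", 1, 500, rows))
```

Everything about display is left to the client; the server only filters by range and serialises.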
Graphic representation of the evolution of "Versions of DAS". It gives a rough idea of when the different specifications were adopted and when DAS/2 started as an independent specification. It also shows an estimate of the number of available DAS sources per year for DAS 1 and DAS/2.
Integrating biological data of various types and developing suitable bioinformatics tools are critical objectives for enabling research at the systems level. The European Network of Excellence ENFIN is developing an infrastructure that connects databases and platforms, to support both the generation of new bioinformatics tools and the experimental validation of computational predictions. Beyond common standards for formatting individual datasets, sophisticated informatics platforms are needed to mine data across domains, sources, formats and types. The EnCORE project aims to integrate an extensive range of database resources and analysis tools from different disciplines in a computationally accessible and extensible manner, facilitating automated data retrieval and processing with a special focus on systems biology. The EnCORE platform is available as a collection of web services with a common standard format that is easy to integrate into workflow management software such as Taverna. EnCORE services are also accessible through EnVISION, a graphical web interface that provides detailed information such as molecular interactions, biological pathways and computational models of pathways.