DAS, the Distributed Annotation System

Proteomics Bioinformatics
WTAC
13-17 December 2010
Rafael Jimenez
rafael@ebi.ac.uk
EnCORE
presentation

Table of contents
• DAS
– What is it?
– Commands and queries
– Why should I use it?
– Documentation
– Clients and servers

DAS, The Distributed Annotation System
The Distributed Annotation System is…
– A network of biological data sources
– A Service Oriented Architecture (SOA)
– A RESTful web service
– An example of federation
• Uniform access to multiple repositories of biological data.
• Repositories distributed in different geographical locations.
The DAS Protocol is…
– An integration platform
– A client-server protocol
– An agreed standard for web services

23.08.18 5
DAS data types
• Genome sequence
• Sequence alignments
• Protein sequence
• Protein structure
• Protein-protein interaction
• Gel 2D
• EMAP
• 3DM
• Mass spectrometry
• Epigenetics
• Phenotype
• Functional genomics
• Structural genomics
Served by: reference servers, annotation servers, alignment servers

Dowell et al. The Distributed Annotation System.
BMC Bioinformatics. 2001; 2: 7. Published online 2001 October 10.
DAS, Architectural Overview

Service Oriented Architecture
[Diagram: service provider, service broker, service consumer and service contract, linked by Publish, Find and Interact]
DAS implementation
[Diagram: DAS Registry, DAS clients, a reference source, annotation sources and alignment sources, linked by the DAS protocol]

Example client behaviour
Standardization allows clients to connect to different
DAS sources without additional programming
Andy Jenkinson

DAS – Andy Jenkinson
Query model
Structured REST URL
– http://server/das/source/command?arguments
– servers, data sources, commands, parameters
Reference object
– e.g. “chromosome X”
Reference servers provide sequence
– http://server/das/source/sequence?segment=X:1,500
Annotation servers provide features
– http://server/das/source/features?segment=X:1,500
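The URL pattern above can be sketched in a few lines of Python. This is only an illustration of the structure shown on the slide: the helper names `das_url` and `segment` are made up for this example, and `server` and `source` are the slide's placeholders rather than real hosts.

```python
from urllib.parse import urlencode


def segment(reference, start=None, stop=None):
    """Format a segment argument such as "X" or "X:1,500"."""
    if start is None or stop is None:
        return reference
    return f"{reference}:{start},{stop}"


def das_url(server, source, command, **arguments):
    """Build http://server/das/source/command?arguments (hypothetical helper)."""
    url = f"http://{server}/das/{source}/{command}"
    if arguments:
        # safe=":," leaves the segment syntax (X:1,500) unescaped in the query.
        url += "?" + urlencode(arguments, safe=":,")
    return url


# A reference server is asked for sequence, an annotation server for features.
print(das_url("server", "source", "sequence", segment=segment("X", 1, 500)))
print(das_url("server", "source", "features", segment=segment("X", 1, 500)))
```

Only the command name changes between the two requests, which is what lets one client talk to both kinds of server.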

Data model
Lightweight XML
http://server/das/source/features?segment=X:1,500
<SEGMENT id="X" start="1" stop="500">
  <FEATURE id="…">
    <TYPE id="…" category="…">…</TYPE>
    <METHOD id="…">…</METHOD>
    <START>…</START>
    <END>…</END>
  </FEATURE>
  …
</SEGMENT>
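A client could read such a response with a standard XML parser. The sketch below uses Python's `xml.etree.ElementTree`; the sample document and every id and value in it are invented for illustration, and `parse_features` is a hypothetical helper, not part of any DAS library. It searches for FEATURE elements at any depth, so it should also cope if a server wraps the segment in outer elements.

```python
import xml.etree.ElementTree as ET

# Invented sample response; a real server fills in its own ids and values.
RESPONSE = """
<SEGMENT id="X" start="1" stop="500">
  <FEATURE id="feature1">
    <TYPE id="exon" category="transcription">exon</TYPE>
    <METHOD id="curated">manual curation</METHOD>
    <START>10</START>
    <END>90</END>
  </FEATURE>
  <FEATURE id="feature2">
    <TYPE id="repeat" category="similarity">repeat</TYPE>
    <METHOD id="predicted">prediction</METHOD>
    <START>120</START>
    <END>300</END>
  </FEATURE>
</SEGMENT>
"""


def parse_features(xml_text):
    """Return one dict per FEATURE element found in the response."""
    root = ET.fromstring(xml_text)
    features = []
    for feature in root.iter("FEATURE"):
        type_el = feature.find("TYPE")
        method_el = feature.find("METHOD")
        features.append({
            "id": feature.get("id"),
            "type": type_el.get("id") if type_el is not None else None,
            "category": type_el.get("category") if type_el is not None else None,
            "method": method_el.get("id") if method_el is not None else None,
            "start": int(feature.findtext("START")),
            "end": int(feature.findtext("END")),
        })
    return features


for feature in parse_features(RESPONSE):
    print(feature["id"], feature["type"], feature["start"], feature["end"])
```

Because every annotation server returns the same element names, the same parsing code can be reused for any source.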

DAS Annotation source - Protein Feature Request
Non-positional feature
Positional feature
http://www.ebi.ac.uk/das-srv/uniprot/das/uniprot/features?segment=Q12345

DAS Reference source - Protein Sequence Request
http://www.ebi.ac.uk/das-srv/uniprot/das/uniprot/sequence?segment=Q12345

More DAS Commands
• Alignment, Structure and Interaction
• More …
http://server/das/source/entry_points
– entry_points: lists the available reference objects (chromosomes, contigs, proteins, …).
http://server/das/source/types
– types: provides a summary of the feature types for a segment.
http://server/das/source/stylesheet
– stylesheet: gives the DAS client hints about how to display the feature types. Clients may ignore these hints.
http://server/das/sources
– sources: lists the sources available on one DAS server. Replaces the original, underspecified dsn command.
http://www.biodas.org/wiki/DAS1.6
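The URLs on this slide come in two shapes: most commands are addressed to one source, while sources is addressed to the server as a whole. A minimal sketch of that distinction follows; `command_url` and `SOURCE_COMMANDS` are hypothetical names, and only the commands whose URLs appear in this talk are covered (the alignment, structure and interaction commands are left out).

```python
# Commands that are addressed to a single data source.
SOURCE_COMMANDS = ("sequence", "features", "entry_points", "types", "stylesheet")


def command_url(server, command, source=None):
    """Return the URL for a DAS command (hypothetical helper)."""
    if command == "sources":
        # Server-level command: no source name in the path.
        return f"http://{server}/das/sources"
    if command not in SOURCE_COMMANDS:
        raise ValueError(f"command not covered by this sketch: {command}")
    if source is None:
        raise ValueError(f"the {command} command needs a source name")
    return f"http://{server}/das/{source}/{command}"


print(command_url("server", "sources"))
for name in SOURCE_COMMANDS:
    print(command_url("server", name, source="source"))
```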

DAS Design Principles
Data remains distributed
• “live” data
• data providers retain responsibility
• good for changing data
• spreads resources
Easy for data providers to implement
• simple protocol
• lots of data providers

DAS Design Principles
Principally for display
• should be responsive (fast)
• region-targeted queries
• lightweight infrastructure
Downsides
• Rigid data model
• Weak semantics

Tutorials
http://www.biodas.org/wiki/DASWorkshop2010

Versions of DAS
[Timeline, 2001 to 2011. DAS 1 releases: 1.01, 1.53, 1.53E, 1.6. DAS/2 releases: 2.0, 2.1. Source counts marked along the timeline: ~8, ~250, ~380, ~650, ~1300.]

DAS Specification 1.6
http://www.biodas.org/wiki/DAS1.6

List of DAS Servers

DAS Client libraries
• Bio::Das::Lite (Perl)
• Dasobert (Java)

List of DAS Clients
• Ensembl uses DAS to pull in genomic, gene and protein annotations. It also provides data via DAS.
• GBrowse is a generic genome browser, and is both a consumer and a provider of DAS.
• IGB is a desktop application for viewing genomic data.
• SPICE is an application for projecting protein annotations onto 3D structures.
• Dasty2 is a web-based viewer for protein annotations.
• Jalview is a multiple alignment editor.
• PeppeR is a graphical viewer for 3D electron microscopy data.
• DASMI is an integration portal for protein interaction data.
• DASher is a Java-based viewer for protein annotations.
• EpiC presents structure-function summaries for antibody design.
• STRAP is a STRucture-based sequence Alignment Program.

Protein sequence data
Dasty2

Genome sequence data
Ensembl

Protein structure data
Spice-Sisyphus

Protein-protein interaction data
iPfam

Sequence alignment data
Pfam

EMAP data
EMAP: The Edinburgh Mouse Atlas Project
Gene expression databases (EMAGE & GXD)

DAS reference server

EMAP - Ontology
DAS annotation servers

EMAGE

GXD

Thank you!
Questions?
Proteomics Services Team
