SADI for GMOD: Semantic Web Services for Model Organism Databases

SADI for GMOD is a collection of ready-made SADI services for accessing sequence feature data in RDF form. The services were developed as an add-on for the GMOD (Generic Model Organism Database) project, which is a popular toolkit for building model organism databases and their associated websites (e.g. FlyBase).

Presentation Transcript

  • SADI for GMOD: Semantic Web Services for Model Organism Databases
    Ben Vandervalk, Luke McCarthy, Edward Kawas, Mark Wilkinson
    James Hogg Research Centre, Heart + Lung Institute, University of British Columbia
    http://code.google.com/p/sadi/wiki/SADIforGMOD
  • Background
  • Background: Model Organism Databases
    • several organisms are studied extensively by biologists, e.g. yeast, mouse, fruit fly
    • each model organism has its own database:
      • sequences (DNA, RNA, protein)
      • sequence features (e.g. genes)
      • research publications
      • experimental results
      • biochemical pathways
      • phenotype images
      • evolutionary trees (for closely related species)
    All images were obtained from Wikipedia and are in the public domain.
  • Background: Sequence Features
    • sequence features (a.k.a. sequence annotations) are regions of a DNA or protein sequence with a certain type (e.g. gene)
    • in genome browsers, different types of sequence annotations are displayed in separate tracks
    [Figure: genome browser view of positions on a DNA sequence, with a promoter track, a gene track, and a transcript track. Source: Lincoln Stein, http://www.sequenceontology.org/gff3.shtml]
  • Background: Sequence Features
    Many types of biological data are represented as sequence features:
    • promoters
    • chromosome bands
    • genes
    • transcripts
    • CDSs
    • proteins
    • protein domains
    • transposons
    • non-coding RNAs
    • ESTs
    • many more...
    [Figure: autogenerated image from http://flybase.org/cgi-bin/gbrowse/dmel/]
  • Background: Distributed Annotation System (DAS)
    [Figure: a genome browser image (autogenerated from http://flybase.org/cgi-bin/gbrowse/dmel/) connected to four DAS servers; each server is queried by HTTP GET and returns DAS XML]
  • Background: Limitations of the Distributed Annotation System (DAS)
    • integrating data from DAS servers requires specialized software (“DAS clients”)
    • other types of data (e.g. biochemical pathways, experimental results) cannot be automatically integrated with sequence feature data
    • most bioinformatics analysis software (e.g. BLAST) does not speak DAS
  • SADI for GMOD: Semantic Web Services for Model Organism Databases
  • SADI for GMOD: Semantic Web Services for Model Organism Databases
    SADI (Semantic Automated Discovery and Integration)
    • Standard for Web services that consume/generate RDF
    • Motivation: automated integration of bioinformatics data and software
    GMOD (Generic Model Organism Database)
    • Toolkit for building a model organism database and website
    • Collection of related open source projects, e.g. Chado, GBrowse, Pathway Tools
    • Many sites use GMOD components: FlyBase, BeetleBase, DictyBase, etc.
  • SADI in a Nutshell
    • to invoke a SADI service:
      • HTTP POST an RDF document to the service URL
      • e.g. $ curl --data @input.rdf http://sadiframework.org/examples/hello
    • to get service metadata:
      • HTTP GET on service URL
      • returns an RDF document with service name, description, etc.
      • e.g. $ curl http://sadiframework.org/examples/hello
    • structure of input/output data is described in OWL
      • service provider specifies one input OWL class and one output OWL class
    • strengths of SADI
      • no framework-specific messaging formats or ontologies
      • supports batch processing of inputs
      • supports long-running services (asynchronous services)
    more info: http://sadiframework.org/
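    The two HTTP interactions described on this slide can be sketched in Python with only the standard library. This is a minimal sketch, not part of the original slides: the function names are hypothetical, and the `application/rdf+xml` content type is an assumption (the curl example on the slide does not set one explicitly).

```python
import urllib.request


def build_invocation(service_url: str, rdf_input: bytes,
                     content_type: str = "application/rdf+xml") -> urllib.request.Request:
    """Prepare (but do not send) the HTTP POST that invokes a SADI service.

    The content type is an assumption; adjust it to the serialization of
    the RDF document you are posting.
    """
    return urllib.request.Request(
        service_url,
        data=rdf_input,
        headers={"Content-Type": content_type},
        method="POST",
    )


def build_metadata_request(service_url: str) -> urllib.request.Request:
    """Prepare the HTTP GET that fetches the service's RDF description."""
    return urllib.request.Request(service_url, method="GET")


def send(request: urllib.request.Request, timeout: float = 30.0) -> bytes:
    """Send a prepared request and return the raw response body."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read()
```

    For example, `send(build_invocation("http://sadiframework.org/examples/hello", open("input.rdf", "rb").read()))` would mirror the first curl command, provided the example service is still reachable.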
  • SADI for GMOD Services
    • SADI services for accessing sequence feature data
    • implemented as Perl CGI scripts

    Service Name                     Input                 Relationship               Output
    get_feature_info                 database identifier   is about                   feature description
    get_features_overlapping_region  genomic coordinates   overlaps                   collection of feature descriptions
    get_sequence_for_region          genomic coordinates   is represented by          DNA, RNA, or amino acid sequence
    get_child_features               feature description   has part / derives into    collection of feature descriptions
    get_parent_features              feature description   is part of / derives from  collection of feature descriptions
  • SADI for GMOD: Structure of Service Input/Output RDF
    The input is sent by HTTP POST to get_feature_info, which returns the output.

    Input RDF (N3):

        @prefix lsrn: <http://purl.oclc.org/SADI/LSRN/> .
        @prefix GeneID: <http://lsrn.org/GeneID:> .

        GeneID:49962
            a lsrn:GeneID_Record;
            sio:SIO_000008 [                  # p = has attribute
                a lsrn:GeneID_Identifier;
                sio:SIO_000300 "49962"        # p = has value
            ] .

    Output RDF (N3):

        @prefix lsrn: <http://purl.oclc.org/SADI/LSRN/> .
        @prefix GeneID: <http://lsrn.org/GeneID:> .
        @prefix FlyBase: <http://flybase.org/cgi-bin/sadi.gmod/feature?id=> .
        @prefix GenBank: <http://lsrn.org/GB:> .

        # p = is about
        GeneID:49962 sio:SIO_000332 FlyBase:FBgn0040037 .

        # feature
        FlyBase:FBgn0040037
            a SO:SO_0000704;                  # o = gene
            range:position [
                a range:RangedSequencePosition;
                sio:SIO_000053                # p = has proper part
                    [ a range:StartPosition; sio:SIO_000300 26994 ];
                sio:SIO_000053                # p = has proper part
                    [ a range:EndPosition; sio:SIO_000300 32391 ];
                range:in_relation_to _:minus_strand_seq
            ] .

        _:minus_strand_seq
            sio:SIO_000011 [                  # p = represents
                a strand:MinusStrand;
                sio:SIO_000093 GenBank:AE014135   # p = is proper part of
            ] .

        # reference feature (chromosome)
        FlyBase:4                             # chromosome 4
            a SO:SO_0000105 .                 # o = chromosome arm
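    A small Python helper can generate the get_feature_info input document for an arbitrary gene identifier. This is a sketch under stated assumptions: the function and template names are hypothetical, the `sio:` prefix URI is taken from the SPARQL query later in this deck, and the digits-only check assumes Entrez Gene IDs are purely numeric.

```python
# Template for the input RDF (N3) of get_feature_info. The structure
# follows the slide; the explicit sio: prefix declaration is added so
# that the document stands alone.
INPUT_TEMPLATE = """\
@prefix lsrn: <http://purl.oclc.org/SADI/LSRN/> .
@prefix GeneID: <http://lsrn.org/GeneID:> .
@prefix sio: <http://semanticscience.org/resource/> .

GeneID:{gene_id}
    a lsrn:GeneID_Record ;
    sio:SIO_000008 [
        a lsrn:GeneID_Identifier ;
        sio:SIO_000300 "{gene_id}"
    ] .
"""


def gene_id_input(gene_id: str) -> str:
    """Return an N3 input document describing one GeneID record.

    Rejects anything that is not a plain ASCII digit string (an
    assumption about the identifier format), which also keeps
    unexpected characters out of the generated RDF.
    """
    if not (gene_id.isascii() and gene_id.isdigit()):
        raise ValueError(f"expected a numeric GeneID, got {gene_id!r}")
    return INPUT_TEMPLATE.format(gene_id=gene_id)
```

    Calling `gene_id_input("49962")` yields a document equivalent to the input shown on the slide, ready to be POSTed to the service.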
  • SADI for GMOD Demo
  • SADI Client Software
    • SHARE Query Engine: SPARQL Query => SADI Workflow
      http://biordf.net/cardioSHARE/query
    • SADI Taverna Plugin: Design SADI Workflows
      http://sadiframework.org/content/2010/05/03/sadi-taverna-plugin-tutorial/
  • Demo with SHARE Query Engine
    "What proteins are homologous to FlyBase protein FBpp0091047?"

    SPARQL Query:

        PREFIX FlyBase: <http://lsrn.org/FLYBASE:>
        PREFIX sio: <http://semanticscience.org/resource/>
        PREFIX sadi: <http://sadiframework.org/ontologies/properties.owl#>
        SELECT ?homolog
        WHERE {
            # SIO_000332 = is about
            FlyBase:FBpp0091047 sio:SIO_000332 ?protein .
            ?protein sadi:hasSequence ?sequence .
            # SIO_010302 = is homologous to
            ?protein sio:SIO_010302 ?homolog .
        }

    [Figure: the resulting SADI workflow]
  • Acknowledgements
    Team
    • Mark Wilkinson: Principal Investigator
    • Luke McCarthy: Lead Programmer, SADI & SHARE
    • Edward Kawas: Perl Programmer, SADI
    Funding
    • Microsoft Research
    http://sadiframework.org/
  • SADI Training Course
    “Web Publishing of Scientific Data and Services”
    October 22nd-23rd, 2011, University of British Columbia (next door!)
    Learn how to:
    => semantically describe service functionality in OWL
    => publish Semantic Web services using the SADI framework
    More info: http://sadiframework.org/training
  • Extra Slides
  • SADI for GMOD: Setting up the Services
    1. Load your GFF files into a Bio::DB::SeqFeature::Store database (mysql)
    2. Install SADI for GMOD dependencies with CPAN
    3. Download the SADI for GMOD tarball and unpack into cgi-bin
    4. Set DB connection parameters in cgi-bin/sadi.gmod/sadi.gmod.conf

        [GENERAL]
        db_adaptor = Bio::DB::SeqFeature::Store
        db_args = -adaptor DBI::mysql -dsn dbi:mysql:database=flybase
        base_url = http://flybase.org/cgi-bin/sadi.gmod/

    5. Configure Dbxref mappings in cgi-bin/sadi.gmod/dbxref.conf

        [DBXREF_TO_LSRN]
        SwissProt = UniProt
        UniProtKB = UniProt
        SwissProt/TrEMBL = UniProt
        ...

    6. Register the services in public SADI registry: http://sadiframework.org/registry
    more info: http://code.google.com/p/sadi/wiki/SADIforGMOD
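    Before deploying, the two configuration files from steps 4 and 5 can be sanity-checked with a short Python script. This is a sketch under assumptions: it treats the files as plain INI text readable by Python's `configparser` (the Perl services may parse them differently), the helper names are hypothetical, and the list of required keys is simply the three keys shown on the slide.

```python
import configparser

# Sample contents copied from the slide; only the mappings actually
# shown there are included.
SADI_GMOD_CONF = """\
[GENERAL]
db_adaptor = Bio::DB::SeqFeature::Store
db_args = -adaptor DBI::mysql -dsn dbi:mysql:database=flybase
base_url = http://flybase.org/cgi-bin/sadi.gmod/
"""

DBXREF_CONF = """\
[DBXREF_TO_LSRN]
SwissProt = UniProt
UniProtKB = UniProt
SwissProt/TrEMBL = UniProt
"""

# Assumed to be required because they appear in the slide's example.
REQUIRED_GENERAL_KEYS = ("db_adaptor", "db_args", "base_url")


def parse_conf(text: str) -> dict:
    """Parse INI-style text into {section: {key: value}}."""
    # Only '=' is a delimiter, because values contain ':' (e.g. DBI::mysql).
    parser = configparser.ConfigParser(interpolation=None, delimiters=("=",))
    parser.optionxform = str  # keep key case (SwissProt, UniProtKB, ...)
    parser.read_string(text)
    return {section: dict(parser[section]) for section in parser.sections()}


def missing_general_keys(conf: dict) -> list:
    """Return the assumed-required [GENERAL] keys that are absent or empty."""
    general = conf.get("GENERAL", {})
    return [key for key in REQUIRED_GENERAL_KEYS if not general.get(key)]


if __name__ == "__main__":
    print(missing_general_keys(parse_conf(SADI_GMOD_CONF)))
    print(parse_conf(DBXREF_CONF)["DBXREF_TO_LSRN"])
```

    Restricting the delimiter to `=` matters here: with the default settings `configparser` also splits on `:`, which appears inside several of the values.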