
SADI for GMOD: Semantic Web Services for Model Organism Databases


SADI for GMOD is a collection of ready-made SADI services for accessing sequence feature data in RDF form. The services were developed as an add-on for the GMOD (Generic Model Organism Database) project, which is a popular toolkit for building model organism databases and their associated websites (e.g. FlyBase).

Published in: Technology, Education

  1. SADI for GMOD: Semantic Web Services for Model Organism Databases
     Ben Vandervalk, Luke McCarthy, Edward Kawas, Mark Wilkinson
     James Hogg Research Centre, Heart + Lung Institute, University of British Columbia
  2. Background
  3. Background: Model Organism Databases
     • Several organisms are studied extensively by biologists, e.g. yeast, mouse, fruit fly
     • Each model organism has its own database, which holds:
       • sequences (DNA, RNA, protein)
       • sequence features (e.g. genes)
       • research publications
       • experimental results
       • biochemical pathways
       • phenotype images
       • evolutionary trees (for closely related species)
     All images were obtained from Wikipedia and are in the public domain.
  4. Background: Sequence Features
     • Sequence features (a.k.a. sequence annotations) are regions of a DNA or protein sequence with a certain type (e.g. gene)
     • In genome browsers, different types of sequence annotations are displayed in separate tracks
     Figure labels: position on DNA sequence; promoter track, gene track, transcript track (credit: Lincoln Stein).
  5. Background: Sequence Features
     Many types of biological data are represented as sequence features:
     • promoters
     • chromosome bands
     • genes
     • transcripts
     • CDSs
     • proteins
     • protein domains
     • transposons
     • non-coding RNAs
     • ESTs
     • many more...
     Figure: autogenerated image.
  6. Background: Distributed Annotation System (DAS)
     Figure: four DAS servers, each queried by HTTP GET and each returning DAS XML (autogenerated image).
  7. Background: Limitations of the Distributed Annotation System (DAS)
     • Integrating data from DAS servers requires specialized software (“DAS clients”)
     • Other types of data (e.g. biochemical pathways, experimental results) cannot be automatically integrated with sequence feature data
     • Most bioinformatics analysis software (e.g. BLAST) does not speak DAS
  8. SADI for GMOD: Semantic Web Services for Model Organism Databases
  9. SADI for GMOD: Semantic Web Services for Model Organism Databases
     SADI (Semantic Automated Discovery and Integration)
     • Standard for Web services that consume/generate RDF
     • Motivation: automated integration of bioinformatics data and software
     GMOD (Generic Model Organism Database)
     • Toolkit for building a model organism database and website
     • Collection of related open source projects, e.g. Chado, GBrowse, Pathway Tools
     • Many sites use GMOD components: FlyBase, BeetleBase, DictyBase, etc.
  10. SADI in a Nutshell
      • To invoke a SADI service:
        o HTTP POST an RDF document to the service URL
        o e.g. $ curl --data @input.rdf
      • To get service metadata:
        o HTTP GET on the service URL
        o returns an RDF document with the service name, description, etc.
        o e.g. $ curl
      • The structure of input/output data is described in OWL
        o the service provider specifies one input OWL class and one output OWL class
      • Strengths of SADI
        o no framework-specific messaging formats or ontologies
        o supports batch processing of inputs
        o supports long-running services (asynchronous services)
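The two HTTP interactions on the "SADI in a Nutshell" slide can be sketched in Python roughly as follows. This is a minimal sketch under stated assumptions, not the SADI client library: the service URL is a hypothetical placeholder (the URLs in the slide's curl examples were not preserved), the helper names are invented for illustration, and sending RDF/XML with an application/rdf+xml content type is an assumption about what a given service accepts. The functions only build the requests; nothing is sent over the network.

```python
import urllib.request

# Hypothetical placeholder URL; substitute the URL of a real SADI service.
SERVICE_URL = "http://example.org/cgi-bin/get_feature_info"


def build_invocation(service_url, rdf_document):
    """Build the HTTP POST that invokes a SADI service with an RDF input document.

    rdf_document is the raw bytes of the input RDF (the slide's input.rdf).
    The RDF/XML media type is an assumption, not something the slides state.
    """
    return urllib.request.Request(
        service_url,
        data=rdf_document,
        headers={
            "Content-Type": "application/rdf+xml",
            "Accept": "application/rdf+xml",
        },
        method="POST",
    )


def build_metadata_request(service_url):
    """Build the HTTP GET that asks a SADI service for its metadata document."""
    return urllib.request.Request(
        service_url,
        headers={"Accept": "application/rdf+xml"},
        method="GET",
    )
```

To actually call a service, pass either request to urllib.request.urlopen and read the response body, which should be an RDF document in both cases.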
  11. SADI for GMOD Services
      • SADI services for accessing sequence feature data
      • Implemented as Perl CGI scripts

      Service Name                      Input                  Relationship                 Output
      get_feature_info                  database identifier    is about                     feature description
      get_features_overlapping_region   genomic coordinates    overlaps                     collection of feature descriptions
      get_sequence_for_region           genomic coordinates    is represented by            DNA, RNA, or amino acid sequence
      get_child_features                feature description    has part / derives into      collection of feature descriptions
      get_parent_features               feature description    is part of / derives from    collection of feature descriptions
  12. SADI for GMOD: Structure of Service Input/Output RDF
      The input RDF is sent by HTTP POST to the get_feature_info service, which returns the output RDF.

      Input RDF (N3):

        @prefix lsrn: <> .
        @prefix GeneID: <> .

        a lsrn:GeneID_Record;
          sio:SIO_000008 [                    # p = has attribute
            a lsrn:GeneID_Identifier;
            sio:SIO_000300 "49962"            # p = has value
          ] .

      Output RDF (N3):

        @prefix lsrn: <> .
        @prefix GeneID: <> .
        @prefix FlyBase: < id=> .
        @prefix GenBank: <> .

        # p = is about
        GeneID:49962 sio:SIO_000332 FlyBase:FBgn0040037 .

        # feature
        FlyBase:FBgn0040037
          a SO:SO_0000704;                    # o = gene
          range:position [
            a range:RangedSequencePosition;
            sio:SIO_000053                    # p = has proper part
              [ a range:StartPosition; sio:SIO_000300 26994 ];
            sio:SIO_000053                    # p = has proper part
              [ a range:EndPosition; sio:SIO_000300 32391 ];
            range:in_relation_to _:minus_strand_seq
          ] .

        _:minus_strand_seq
          sio:SIO_000011 [                    # p = represents
            a strand:MinusStrand;
            sio:SIO_000093 GenBank:AE014135   # p = is proper part of
          ] .

        # reference feature (chromosome)
        FlyBase:4                             # chromosome 4
          a SO:SO_0000105 .                   # o = chromosome arm
  13. SADI for GMOD Demo
  14. SADI Client Software
      • SHARE Query Engine: SPARQL Query => SADI Workflow
      • SADI Taverna Plugin: Design SADI Workflows
        2010/05/03/sadi-taverna-plugin-tutorial/
  15. Demo with SHARE Query Engine (SPARQL Query => SADI Workflow)
      "What proteins are homologous to FlyBase protein FBpp0091047?"

        PREFIX FlyBase: <>
        PREFIX sio: <>
        PREFIX sadi: <>

        SELECT ?homolog
        WHERE {
          # SIO_000332 = is about
          FlyBase:FBpp0091047 sio:SIO_000332 ?protein .
          ?protein sadi:hasSequence ?sequence .
          # SIO_010302 = is homologous to
          ?protein sio:SIO_010302 ?homolog .
        }
  16. Acknowledgements
      • Team
        • Mark Wilkinson: Principal Investigator
        • Luke McCarthy: Lead Programmer, SADI & SHARE
        • Edward Kawas: Perl Programmer, SADI
      • Funding
        • Microsoft Research
  17. SADI Training Course: “Web Publishing of Scientific Data and Services”
      October 22nd-23rd, 2011, University of British Columbia (next door!)
      Learn how to:
      => semantically describe service functionality in OWL
      => publish Semantic Web services using the SADI framework
  18. Extra Slides
  19. SADI for GMOD: Setting up the Services
      1. Load your GFF files into a Bio::DB::SeqFeature::Store database (MySQL)
      2. Install the SADI for GMOD dependencies with CPAN
      3. Download the SADI for GMOD tarball and unpack it into cgi-bin
      4. Set DB connection parameters in cgi-bin/sadi.gmod/sadi.gmod.conf
           [GENERAL]
           db_adaptor = Bio::DB::SeqFeature::Store
           db_args = -adaptor DBI::mysql -dsn dbi:mysql:database=flybase
           base_url =
      5. Configure Dbxref mappings in cgi-bin/sadi.gmod/dbxref.conf
           [DBXREF_TO_LSRN]
           SwissProt = UniProt
           UniProtKB = UniProt
           SwissProt/TrEMBL = UniProt
           ...
      6. Register the services in the public SADI registry
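The two configuration files from the setup steps look like INI files, so their contents can be checked with a short Python sketch before deploying. Several things here are assumptions: that the files parse as standard INI (the services themselves are Perl CGI scripts and may read them differently), the helper names load_conf and to_lsrn (hypothetical), and the fallback that returns an unmapped database name unchanged (an illustrative choice, not documented behaviour). The base_url setting is left out because its value was not preserved in the slides.

```python
import configparser

# Contents of sadi.gmod.conf as shown on the slide (base_url omitted).
GENERAL_CONF = """\
[GENERAL]
db_adaptor = Bio::DB::SeqFeature::Store
db_args = -adaptor DBI::mysql -dsn dbi:mysql:database=flybase
"""

# Contents of dbxref.conf as shown on the slide (the slide elides further entries).
DBXREF_CONF = """\
[DBXREF_TO_LSRN]
SwissProt = UniProt
UniProtKB = UniProt
SwissProt/TrEMBL = UniProt
"""


def load_conf(text):
    """Parse INI-style text, keeping key case and treating only '=' as a delimiter."""
    parser = configparser.ConfigParser(delimiters=("=",), interpolation=None)
    parser.optionxform = str  # keep "SwissProt" rather than lowercasing to "swissprot"
    parser.read_string(text)
    return parser


def to_lsrn(dbxref_conf, database_name):
    """Map a Dbxref database name to its LSRN namespace.

    Falls back to the name unchanged when no mapping exists (an assumption).
    """
    return dbxref_conf["DBXREF_TO_LSRN"].get(database_name, fallback=database_name)
```

For example, to_lsrn(load_conf(DBXREF_CONF), "SwissProt/TrEMBL") looks up the mapping listed in dbxref.conf, while a name with no entry passes through as-is.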