
Talk1 ben sadi for_gmod_bosc_2011




  1. SADI for GMOD: Bringing Model Organism Databases onto the Semantic Web
     Ben Vandervalk, Luke McCarthy, Edward Kawas, Mark Wilkinson
     James Hogg Research Centre, Heart + Lung Institute, University of British Columbia
  2. SADI for GMOD: Background
     SADI (Semantic Automated Discovery and Integration)
     • Standard for Web services that consume/generate RDF
     • Motivation: automated integration of bioinformatics data and software
     GMOD (Generic Model Organism Database)
     • Toolkit for building a model organism database and website
     • Collection of related open source projects, e.g. Chado, GBrowse, Pathway Tools
     • Many sites use GMOD components: FlyBase, BeetleBase, DictyBase, etc.
  3. SADI in a Nutshell
     • to invoke a SADI service:
       o HTTP POST an RDF document to the service URI
       o e.g. $ curl --data-binary @input.rdf
     • to get service metadata:
       o HTTP GET on service URL
       o returns an RDF document with service name, description, etc.
       o e.g. $ curl
     • structure of input/output data is described in OWL
       o service provider specifies one input OWL class and one output OWL class
     • strengths of SADI
       o no framework-specific messaging formats or ontologies
       o supports batch processing of inputs
       o supports long-running services (asynchronous services)
     more info:
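The two HTTP interactions on this slide (POST to invoke, GET for metadata) can be sketched in Python with only the standard library. This is a minimal sketch, not the SADI project's client code: the service URL is a hypothetical placeholder (the real URL is not preserved in this transcript), and the `application/rdf+xml` content type is an assumption about how the input document is serialized.

```python
import urllib.request

# Hypothetical placeholder; the slide's actual service URL is not preserved here.
SERVICE_URL = "http://example.org/cgi-bin/sadi.gmod/get_feature_info"


def build_invoke_request(service_url: str, rdf_bytes: bytes) -> urllib.request.Request:
    """Build (but do not send) the HTTP POST that invokes a SADI service."""
    return urllib.request.Request(
        service_url,
        data=rdf_bytes,
        # Assumption: the input RDF document is serialized as RDF/XML.
        headers={"Content-Type": "application/rdf+xml"},
        method="POST",
    )


def build_metadata_request(service_url: str) -> urllib.request.Request:
    """Build (but do not send) the HTTP GET that fetches the service metadata."""
    return urllib.request.Request(service_url, method="GET")


def send(request: urllib.request.Request) -> bytes:
    """Send a prepared request and return the raw RDF response body."""
    with urllib.request.urlopen(request) as response:
        return response.read()
```

Building the request separately from sending it keeps the sketch easy to inspect offline; calling `send(build_invoke_request(SERVICE_URL, data))` against a real service would perform the POST.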
  4. SADI for GMOD
     • SADI services for accessing sequence feature data
     • implemented as Perl CGI scripts

     Service Name                      Input                  Relationship                 Output
     get_feature_info                  database identifier    is about                     feature description
     get_features_overlapping_region   genomic coordinates    overlaps                     collection of feature descriptions
     get_sequence_for_region           genomic coordinates    is represented by            DNA, RNA, or amino acid sequence
     get_child_features                feature description    has part / derives into      collection of feature descriptions
     get_parent_feature                feature description    is part of / derives from    collection of feature descriptions
  5. SADI for GMOD: Structure of Service Input/Output RDF

     Input RDF (N3)

       @prefix lsrn: <> .
       @prefix GeneID: <> .

       GeneID:49962
         a lsrn:GeneID_Record;
         sio:SIO_000008 [                  # p = 'has attribute'
           a lsrn:GeneID_Identifier;
           sio:SIO_000300 "49962"          # p = 'has value'
         ] .

     HTTP POST to get_feature_info

     Output RDF (N3)

       @prefix lsrn: <> .
       @prefix GeneID: <> .
       @prefix FlyBase: <id=> .
       @prefix GenBank: <> .

       # p = 'is about'
       GeneID:49962 sio:SIO_000332 FlyBase:FBgn0040037 .

       # feature
       FlyBase:FBgn0040037
         a SO:SO_0000704;                  # o = 'gene'
         range:position [
           a range:RangedSequencePosition;
           sio:SIO_000053                  # p = 'has proper part'
             [ a range:StartPosition; sio:SIO_000300 26994 ];
           sio:SIO_000053                  # p = 'has proper part'
             [ a range:EndPosition; sio:SIO_000300 32391 ];
           range:in_relation_to _:minus_strand_seq
         ] .

       _:minus_strand_seq
         sio:SIO_000011 [                  # p = 'represents'
           a strand:MinusStrand;
           sio:SIO_000093 GenBank:AE014135 # p = 'is proper part of'
         ] .

       # reference feature (chromosome)
       FlyBase:4                           # chromosome 4
         a SO:SO_0000105 .                 # o = 'chromosome arm'
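As an illustration of the input side, the small N3 document for get_feature_info can be generated from a gene identifier. This is only a sketch: the function name is made up for this example, and the three namespace URIs are hypothetical placeholders because the prefix URIs on the slide are not preserved in this transcript.

```python
def build_gene_input_n3(gene_id: str, lsrn_ns: str, geneid_ns: str, sio_ns: str) -> str:
    """Return an N3 document describing one GeneID record, shaped like the slide's input."""
    return (
        f"@prefix lsrn: <{lsrn_ns}> .\n"
        f"@prefix GeneID: <{geneid_ns}> .\n"
        f"@prefix sio: <{sio_ns}> .\n"
        "\n"
        f"GeneID:{gene_id}\n"
        "  a lsrn:GeneID_Record;\n"
        "  sio:SIO_000008 [                # p = 'has attribute'\n"
        "    a lsrn:GeneID_Identifier;\n"
        f'    sio:SIO_000300 "{gene_id}"    # p = \'has value\'\n'
        "  ] .\n"
    )


# Hypothetical namespace URIs, used only to make the example self-contained.
example_n3 = build_gene_input_n3(
    "49962",
    lsrn_ns="http://example.org/lsrn#",
    geneid_ns="http://example.org/geneid/",
    sio_ns="http://example.org/sio#",
)
```

The resulting string could then be posted to the service as the request body, provided the service accepts N3 input, which this sketch does not verify.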
  6. SADI for GMOD: Setting up the Services
     1. Load your GFF files into a Bio::DB::SeqFeature::Store database (mysql)
     2. Install SADI for GMOD dependencies with CPAN
     3. Download the SADI for GMOD tarball and unpack into cgi-bin
     4. Set DB connection parameters in cgi-bin/sadi.gmod/sadi.gmod.conf

          [GENERAL]
          db_adaptor = Bio::DB::SeqFeature::Store
          db_args = -adaptor DBI::mysql -dsn dbi:mysql:database=flybase
          base_url =

     5. Configure Dbxref mappings in cgi-bin/sadi.gmod/dbxref.conf

          [DBXREF_TO_LSRN]
          SwissProt = UniProt
          UniProtKB = UniProt
          SwissProt/TrEMBL = UniProt
          ...

     6. Register the services in public SADI registry:
     more info:
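To sanity-check the two configuration files from steps 4 and 5 before deploying, they can be read with Python's `configparser`, assuming they follow ordinary INI conventions (the services themselves are Perl and may parse the files differently). The `base_url` value below is a hypothetical placeholder, and `to_lsrn` with its pass-through fallback is an illustrative helper, not part of SADI for GMOD.

```python
import configparser

# Sample contents modelled on the slide; base_url is a hypothetical placeholder.
SADI_GMOD_CONF = """
[GENERAL]
db_adaptor = Bio::DB::SeqFeature::Store
db_args = -adaptor DBI::mysql -dsn dbi:mysql:database=flybase
base_url = http://example.org/cgi-bin/sadi.gmod
"""

DBXREF_CONF = """
[DBXREF_TO_LSRN]
SwissProt = UniProt
UniProtKB = UniProt
SwissProt/TrEMBL = UniProt
"""


def parse_conf(text: str) -> dict:
    """Parse INI-style text into {section: {key: value}}, preserving key case."""
    # Only '=' is treated as a delimiter so that '::' and ':' in values are left alone.
    parser = configparser.ConfigParser(delimiters=("=",), interpolation=None)
    parser.optionxform = str  # configparser lowercases keys unless told otherwise
    parser.read_string(text)
    return {section: dict(parser[section]) for section in parser.sections()}


def to_lsrn(db_name: str, mapping: dict) -> str:
    """Map a Dbxref database name to its LSRN namespace.

    Assumption: names without a mapping entry are passed through unchanged.
    """
    return mapping.get(db_name, db_name)


general = parse_conf(SADI_GMOD_CONF)["GENERAL"]
dbxref = parse_conf(DBXREF_CONF)["DBXREF_TO_LSRN"]
```

With the sample data, `to_lsrn("SwissProt/TrEMBL", dbxref)` resolves to the `UniProt` namespace, mirroring the mapping table in step 5.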
  7. SADI Client Software
     • SHARE Query Engine: SPARQL Query => SADI Workflow
     • SADI Taverna Plugin: Design SADI workflows
       2010/05/03/sadi-taverna-plugin-tutorial/
  8. Acknowledgements
     Team
       Mark Wilkinson: Principal Investigator
       Luke McCarthy: Lead Programmer, SADI & SHARE
       Edward Kawas: Perl Programmer, SADI
     Funding
       Microsoft Research
  9. Extra Slides
  10. Demo with SHARE Query Engine
      SPARQL Query => SADI Workflow

      "What proteins are homologous to FlyBase protein FBpp0288804?"

        PREFIX FlyBase: <>
        PREFIX sio: <>

        SELECT ?homolog
        WHERE {
          # SIO_000332 = 'is about'
          FlyBase:FBpp0288804 sio:SIO_000332 ?protein .
          # SIO_000205 = 'is represented by'
          ?protein sio:SIO_000205 ?sequence .
          # SIO_010302 = 'is homologous to'
          ?protein sio:SIO_010302 ?homolog .
        }

      online demo: