SADI for GMOD: Bringing Model Organism Databases onto the Semantic Web

Ben Vandervalk, Luke McCarthy, Edward Kawas, Mark Wilkinson
James Hogg Research Centre, Heart + Lung Institute
University of British Columbia
http://code.google.com/p/sadi/wiki/SADIforGMOD

SADI for GMOD: Background

SADI (Semantic Automated Discovery and Integration)
• Standard for Web services that consume/generate RDF
• Motivation: automated integration of bioinformatics data and software

GMOD (Generic Model Organism Database)
• Toolkit for building a model organism database and website
• Collection of related open source projects: e.g. Chado, GBrowse, Pathway Tools
• Many sites use GMOD components: FlyBase, BeetleBase, DictyBase, etc.

SADI in a Nutshell

• to invoke a SADI service:
    o HTTP POST an RDF document to the service URI
    o e.g. $ curl --data-binary @input.rdf http://sadiframework.org/examples/hello
• to get service metadata:
    o HTTP GET on the service URL
    o returns an RDF document with service name, description, etc.
    o e.g. $ curl http://sadiframework.org/examples/hello
• structure of input/output data is described in OWL
    o service provider specifies one input OWL class and one output OWL class
• strengths of SADI
    o no framework-specific messaging formats or ontologies
    o supports batch processing of inputs
    o supports long-running services (asynchronous services)

more info: http://sadiframework.org/
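
The two curl calls above can be sketched in Python with the standard library. This is a minimal sketch, not part of SADI itself: the helper names are hypothetical, and the `Content-Type`/`Accept` value of `application/rdf+xml` is an assumption, since the slide does not say which RDF serialization the service expects.

```python
import urllib.request

# Example service URL taken from the slide.
HELLO_SERVICE = "http://sadiframework.org/examples/hello"


def build_invoke_request(service_url, rdf_bytes, content_type="application/rdf+xml"):
    """Build (but do not send) the HTTP POST that invokes a SADI service."""
    # Assumption: the header values are illustrative; the slide only says
    # that an RDF document is POSTed to the service.
    headers = {"Content-Type": content_type, "Accept": content_type}
    return urllib.request.Request(
        service_url, data=rdf_bytes, headers=headers, method="POST"
    )


def build_metadata_request(service_url):
    """Build (but do not send) the HTTP GET that fetches service metadata."""
    return urllib.request.Request(service_url, method="GET")


def send(request, timeout=30):
    """Send a prepared request and return the raw response body as bytes."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read()
```

To mirror the curl examples, read `input.rdf` in binary mode, pass its bytes to `build_invoke_request(HELLO_SERVICE, ...)`, and hand the result to `send`. Keeping request construction separate from sending makes the method, body, and headers easy to inspect without network access.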

SADI for GMOD

• SADI services for accessing sequence feature data
• implemented as Perl CGI scripts

Service Name                     Input                 Relationship               Output
get_feature_info                 database identifier   is about                   feature description
get_features_overlapping_region  genomic coordinates   overlaps                   collection of feature descriptions
get_sequence_for_region          genomic coordinates   is represented by          DNA, RNA, or amino acid sequence
get_child_features               feature description   has part / derives into    collection of feature descriptions
get_parent_feature               feature description   is part of / derives from  collection of feature descriptions
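
Because each service declares one input and one output, the table can be treated as data and queried for services that could follow one another. The sketch below is only an illustration: the function names are hypothetical, and it matches on the plain-text labels from the table, whereas real SADI matching works on OWL classes.

```python
# (service name, input, relationship, output), transcribed from the table.
SERVICES = [
    ("get_feature_info", "database identifier", "is about",
     "feature description"),
    ("get_features_overlapping_region", "genomic coordinates", "overlaps",
     "collection of feature descriptions"),
    ("get_sequence_for_region", "genomic coordinates", "is represented by",
     "DNA, RNA, or amino acid sequence"),
    ("get_child_features", "feature description", "has part / derives into",
     "collection of feature descriptions"),
    ("get_parent_feature", "feature description", "is part of / derives from",
     "collection of feature descriptions"),
]


def services_accepting(input_kind):
    """Return the names of services whose input label equals input_kind."""
    return [name for name, inp, _rel, _out in SERVICES if inp == input_kind]


def next_services(service_name):
    """Return services whose input label equals the given service's output."""
    for name, _inp, _rel, out in SERVICES:
        if name == service_name:
            return services_accepting(out)
    raise KeyError(service_name)
```

For example, `next_services("get_feature_info")` lists the two services that take a feature description as input. Exact label matching is a simplification; it does not unpack a collection of feature descriptions into individual inputs.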

SADI for GMOD: Structure of Service Input/Output RDF

The input RDF is sent by HTTP POST to get_feature_info, which returns the output RDF.

Input RDF (N3)

    @prefix lsrn: <http://purl.oclc.org/SADI/LSRN/> .
    @prefix GeneID: <http://lsrn.org/GeneID:> .

    GeneID:49962
        a lsrn:GeneID_Record;
        sio:SIO_000008 [                      # p = has attribute
            a lsrn:GeneID_Identifier;
            sio:SIO_000300 "49962"            # p = has value
        ] .

Output RDF (N3)

    @prefix lsrn: <http://purl.oclc.org/SADI/LSRN/> .
    @prefix GeneID: <http://lsrn.org/GeneID:> .
    @prefix FlyBase: <http://flybase.org/cgi-bin/sadi.gmod/feature?id=> .
    @prefix GenBank: <http://lsrn.org/GB:> .

    # p = is about
    GeneID:49962 sio:SIO_000332 FlyBase:FBgn0040037 .

    # feature
    FlyBase:FBgn0040037
        a SO:SO_0000704;                      # o = gene
        range:position [
            a range:RangedSequencePosition;
            sio:SIO_000053                    # p = has proper part
                [ a range:StartPosition; sio:SIO_000300 26994];
            sio:SIO_000053                    # p = has proper part
                [ a range:EndPosition; sio:SIO_000300 32391];
            range:in_relation_to _:minus_strand_seq
        ] .

    _:minus_strand_seq
        sio:SIO_000011 [                      # p = represents
            a strand:MinusStrand;
            sio:SIO_000093 GenBank:AE014135   # p = is proper part of
        ] .

    # reference feature (chromosome)
    FlyBase:4                                 # chromosome 4
        a SO:SO_0000105 .                     # o = chromosome arm
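
The input document for get_feature_info has a fixed shape, so it can be produced from a template. The following is a minimal sketch under stated assumptions: the function name is hypothetical, and the `sio:` prefix declaration (`http://semanticscience.org/resource/`) is added here because the slide uses the prefix without declaring it.

```python
# N3 template for a GeneID input record, following the slide's input example.
# Assumption: the sio prefix declaration is not shown on the slide.
INPUT_TEMPLATE = """\
@prefix lsrn: <http://purl.oclc.org/SADI/LSRN/> .
@prefix GeneID: <http://lsrn.org/GeneID:> .
@prefix sio: <http://semanticscience.org/resource/> .

GeneID:{gene_id}
    a lsrn:GeneID_Record ;
    sio:SIO_000008 [                  # has attribute
        a lsrn:GeneID_Identifier ;
        sio:SIO_000300 "{gene_id}"    # has value
    ] .
"""


def build_geneid_input(gene_id):
    """Return an N3 input document for one numeric Entrez Gene ID."""
    gene_id = str(gene_id)
    # Reject anything that is not a plain ASCII number, so that no
    # unexpected characters end up inside the N3 document.
    if not (gene_id.isascii() and gene_id.isdigit()):
        raise ValueError(f"not a numeric GeneID: {gene_id!r}")
    return INPUT_TEMPLATE.format(gene_id=gene_id)
```

Calling `build_geneid_input(49962)` yields a document equivalent to the slide's input example, which could then be POSTed to the service. A template is enough for a single record; an RDF library would be a better fit for batches or for other input classes.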

SADI for GMOD: Setting up the Services

1. Load your GFF files into a Bio::DB::SeqFeature::Store database (mysql)
2. Install SADI for GMOD dependencies with CPAN
3. Download the SADI for GMOD tarball and unpack into cgi-bin
4. Set DB connection parameters in cgi-bin/sadi.gmod/sadi.gmod.conf

       [GENERAL]
       db_adaptor = Bio::DB::SeqFeature::Store
       db_args = -adaptor DBI::mysql -dsn dbi:mysql:database=flybase
       base_url = http://flybase.org/cgi-bin/sadi.gmod/

5. Configure Dbxref mappings in cgi-bin/sadi.gmod/dbxref.conf

       [DBXREF_TO_LSRN]
       SwissProt = UniProt
       UniProtKB = UniProt
       SwissProt/TrEMBL = UniProt
       ...

6. Register the services in the public SADI registry: http://sadiframework.org/registry

more info: http://code.google.com/p/sadi/wiki/SADIforGMOD
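
Both configuration files in steps 4 and 5 use an INI-style layout, which can be illustrated by reading them with Python's `configparser`. This is only a sketch of the file structure, not how the Perl services load their configuration. The helper names are hypothetical, and returning an unmapped database name unchanged is an assumption.

```python
import configparser

# Sample contents copied from the slide (the elided "..." entries are omitted).
SADI_GMOD_CONF = """\
[GENERAL]
db_adaptor = Bio::DB::SeqFeature::Store
db_args = -adaptor DBI::mysql -dsn dbi:mysql:database=flybase
base_url = http://flybase.org/cgi-bin/sadi.gmod/
"""

DBXREF_CONF = """\
[DBXREF_TO_LSRN]
SwissProt = UniProt
UniProtKB = UniProt
SwissProt/TrEMBL = UniProt
"""


def parse_conf(text):
    """Parse INI-style text into a {section: {key: value}} dictionary."""
    # Only "=" separates keys from values, because values such as
    # "Bio::DB::SeqFeature::Store" and URLs contain ":" themselves.
    parser = configparser.ConfigParser(delimiters=("=",), interpolation=None)
    parser.optionxform = str  # keep key case, e.g. "SwissProt"
    parser.read_string(text)
    return {section: dict(parser[section]) for section in parser.sections()}


def to_lsrn_namespace(dbxref_db, mapping):
    """Map a Dbxref database name to its LSRN name.

    Assumption: names without a mapping are returned unchanged.
    """
    return mapping.get(dbxref_db, dbxref_db)
```

With these helpers, `parse_conf(DBXREF_CONF)["DBXREF_TO_LSRN"]` gives the mapping table, and `to_lsrn_namespace("SwissProt/TrEMBL", mapping)` resolves to `UniProt`, showing how several source database names collapse onto one LSRN namespace.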