SADI in Perl - Protege Plugin Tutorial (fixed Aug 24, 2011)


IMPORTANT CORRECTION TO THIS SLIDESHOW WAS MADE August 24, 2011. How to use the Protege SADI plugin to generate SADI-compliant semantic web services. Created for the 2011 DBCLS BioHackathon. Credits to Mark Wilkinson, Benjamin Vandervalk, Luke McCarthy, Edward Kawas.


  1. 1. Creating a SADI Service in Perl<br />Using the Protege SADI plug-in<br />
  2. 2. Steps...<br />What data will you consume?<br />What data will you produce?<br />What ontologies will you use?<br />Model your input data against these ontologies<br />Model your output data against these ontologies<br />Create the OWL models for the input and output data in Protege<br />Use the SADI plugin to automatically generate the service code scaffold<br />Add your business logic<br />Deploy<br />Register with the SADI registry<br />
  3. 3. Step 1 & 2<br />My service: getDragonAllelesByGene<br />The service consumes record identifiers corresponding to Loci from the DragonDB (Antirrhinum majus) genome database, and returns record identifiers for every known allele of those Loci.<br />
  4. 4. Step3<br />What ontologies should I use?<br />(this will vary for every project, and you are free to use whatever ontology you wish with SADI!)<br />LSRN (Life Science Resource Names) is an ontology of database records and identifiers<br /><br />SIO (Semanticscience Integrated Ontology) is an “upper” ontology specifying how to represent scientific data, including database records.<br /><br />
  5. 5. Step4 & 5<br />Model your input and output data<br />SIO best practices suggest that your data should be modelled as attributes, each of which has a value and an optional unit. The identifier for any given record is an attribute of that record, where the value of that attribute is the ID number of the record.<br />The ontological type of Antirrhinum Locus IDs is “DragonDB_Locus_Identifier” according to the LSRN ontology<br />The ontological type of Antirrhinum Allele IDs is “DragonDB_Allele_Identifier” according to the LSRN ontology<br />Therefore... our data models look like this:<br />
  6. 6. Step4<br />Input Data Structure<br />This is the “subject” node of the RDF graph<br />rdf:type<br />has attribute (SIO:000008)<br />has value (SIO:000300)<br />CHO<br />'has attribute' some (DragonDB_Locus_Identifier and 'has value' some string)<br />
  7. 7. Step5<br />Output Data Structure<br />rdf:type<br />has_allele<br />has attribute (SIO:000008)<br />has value (SIO:000300)<br />Red is the incoming subject node (retained in the output as per SADI requirements!)<br />Green is the data added to that node<br />cho-1<br />has_allele some ('has attribute' some (DragonDB_Allele_Identifier and 'has value' some string))<br />
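The two data models can also be written out as plain subject-predicate-object triples. The following is a minimal sketch in core Perl (no RDF library), intended only to make the shape of the graphs concrete. All `ex:` names, the blank-node labels, and the `sio:` / `lsrn:` / `rdf:` abbreviations are placeholders assumed for illustration, not real URIs.

```perl
use strict;
use warnings;

# Hypothetical, abbreviated names: every "ex:" identifier is a placeholder.
# "sio:" and "lsrn:" merely stand in for the SIO and LSRN namespaces.
my $subject = 'ex:locus/CHO';

# Input graph: the subject node is typed as the input class and has an
# attribute node that is a DragonDB_Locus_Identifier with the value "CHO".
my @input = (
    [ $subject,     'rdf:type',       'ex:InputClass' ],
    [ $subject,     'sio:SIO_000008', '_:locus_id' ],                      # has attribute
    [ '_:locus_id', 'rdf:type',       'lsrn:DragonDB_Locus_Identifier' ],
    [ '_:locus_id', 'sio:SIO_000300', '"CHO"' ],                           # has value
);

# Output graph: the SAME subject node, now decorated with has_allele.
my @output = (
    [ $subject,          'ex:has_allele',  'ex:allele/cho-1' ],
    [ 'ex:allele/cho-1', 'sio:SIO_000008', '_:allele_id' ],                # has attribute
    [ '_:allele_id',     'rdf:type',       'lsrn:DragonDB_Allele_Identifier' ],
    [ '_:allele_id',     'sio:SIO_000300', '"cho-1"' ],                    # has value
);

# Return the subjects of all triples that use the given predicate.
sub subjects_of {
    my ($predicate, @triples) = @_;
    return map { $_->[0] } grep { $_->[1] eq $predicate } @triples;
}

# Print both graphs, one triple per line.
print join(' ', @$_), " .\n" for (@input, @output);
```

The point to notice is that the subject of `ex:has_allele` in the output is the very node that arrived in the input, which is what the "retained in the output" requirement on the slide refers to.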
  8. 8. Step6<br />Create the OWL Classes representing your Input and Output data models using Protege<br />
  9. 9. Step6<br />Start Protege and create a new ontology<br />The IRI that you choose MUST BE REAL AND RESOLVABLE!! SADI will look for your ontology at that address later, so choose this carefully from the start!<br />
  10. 10. Step6<br />Using Protege, import the ontologies you need<br />Click here and add the LSRN and SIO ontologies as imports<br />
  11. 11. Step6<br />
  12. 12. Step6<br />Create two new classes representing your Input and Output data (class names are arbitrary)<br />
  13. 13. Step6<br />If there are predicates you require that do not exist in any of the imported ontologies, create them now<br />(to maximize interoperability, always TRY to use predicates that already exist, or inherit from a predicate that already exists; however if you MUST make one of your own, then you’re free to do so)<br />
  14. 14. Step6<br />Now define your input and output classes<br />NOTE: you will have to use the Manchester Syntax Editor to do this, since the kinds of restrictions we need to make cannot be created using the Protege GUI (unfortunately)<br />Switch back to the “Classes” tab in Protege, then click here<br />
  15. 15. Step6<br />Define Input Class...<br />N.B. You must use Existential restrictions here, NEVER Universal!<br />i.e. never use “only”, always use “some”<br />
  16. 16. Step6<br />Define Output Class...<br />
  17. 17. Step6<br />DONE!<br />Now click the SADI tab...<br />
  18. 18. Step7<br />Use the SADI Plugin to write your service code<br />
  19. 19. Step7<br />On the SADI tab, fill in your service details:<br />Drag and drop your Input and Output classes onto the SADI panel to fill in those two slots.
  20. 20. “Service Provider” is a domain name that identifies you (NOT A URL! A DOMAIN NAME!!)
  21. 21. “Authoritative” is a small annotation indicating whether you are the “owner” of the data that the service provides, or a mirror or other redistributor of that data.
  22. 22. “Service Endpoint” is the public URL for your service. It is only required for asynchronous services behind proxies/redirects.
  23. 23. “Service Type” is optional. It is an rdf:type URI indicating the type of service (e.g.<br />Step7<br />Now on the bottom...<br />Is your service likely to respond slowly? If so, then it should be Asynchronous to avoid timeouts.
  24. 24. Select the “Perl” tab.
  25. 25. Choose a place for the plug-in to write the code to (you will edit this code shortly).
  26. 26. Click “Generate”.<br />Step7<br />Hurray!<br />
  27. 27. Step8<br />Edit code to add your business-logic<br />
  28. 28. #-----------------------------------------------------------------<br /># SERVICE IMPLEMENTATION PART<br />#-----------------------------------------------------------------<br />use RDF::Trine::Node::Resource;<br />use RDF::Trine::Node::Literal;<br />use RDF::Trine::Statement;<br />=head2 process_it<br /> Function: implements the business logic of a SADI service<br />Args : $inputs - ref to an array of RDF::Trine::Node::Resource<br /> $input_model - an RDF::Trine::Model containing the input RDF data<br /> $output_model - an RDF::Trine::Model containing the output RDF<br /> Returns : nothing (service output is stored in $output_model)<br />=cut<br />sub process_it {<br /> my ($self, $inputs, $input_model, $output_model) = @_;<br />foreach my $input (@$inputs) {<br /> # Log4perl 'easy mode' routines: TRACE, DEBUG, INFO, WARN, ERROR<br /> INFO(sprintf('processing input %s', $input->uri));<br /> # Your code goes here...<br /> # For a 'Hello, World!' example, see the SYNOPSIS section of<br /> # <br /> }<br />}<br /><br />Your code is here!<br />It uses RDF::Trine<br />The input data is parsed for you and each input “subject” node is placed into an arrayref<br />You access the input data via the subject node and calls to RDF::Trine to retrieve connected attribute nodes<br />Use the RDF::Trine add_statement method to add your output data to the $output_model<br />Done!<br />
  29. 29. Step8<br />For example...<br /> here I am just going to hard-code the<br /> output data for simplicity, but of course<br /> you would normally use a database call<br /> or algorithm to generate this...<br />
  30. 30. Step8<br />use RDF::Trine::Node::Resource;<br />use RDF::Trine::Node::Literal;<br />use RDF::Trine::Statement;<br />use RDF::SIO::Utils;<br />my $sadi = "";<br />my $lsrn = "";<br />my $sio = "";<br />I am going to use the RDF::SIO::Utils module from CPAN to help me build SIO-compliant data structures more easily...<br />I also like to define URI prefixes as variables to beautify my code. (NOTE that the trailing “/” or “#” on the prefix is omitted, since this helps us later when we want to use Perl string interpolation.)<br />
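To illustrate why omitting the trailing “/” or “#” helps with interpolation, here is a minimal sketch in core Perl. The `http://example.org/...` prefixes are hypothetical stand-ins chosen for this example; substitute the real namespace URIs of LSRN and of your own ontology.

```perl
use strict;
use warnings;

# Hypothetical prefixes, for illustration only.
my $lsrn = "http://example.org/lsrn";          # no trailing "/"
my $sadi = "http://example.org/my-ontology";   # no trailing "#"

# The separator is written inside the string, so it also marks where the
# variable name ends and the local name begins.
my $locus_type = "$lsrn/DragonDB_Locus_Identifier";
my $has_allele = "$sadi#has_allele";

# If the separator were stored in the prefix instead, braces would be needed,
# because "$lsrn_slashDragonDB_..." would be read as one long variable name.
my $lsrn_slash = "http://example.org/lsrn/";
my $same_type  = "${lsrn_slash}DragonDB_Locus_Identifier";

print "$locus_type\n$has_allele\n";
```

Both styles build the same URI; the slash-free prefix simply keeps the interpolated strings easier to read.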
  31. 31. Step8<br />sub process_it {<br /> my ($self, $inputs, $input_model, $output_model) = @_;<br /> my $sadi = "";<br /> my $lsrn = "";<br /> my $sio = "";<br /> my $SIO = RDF::SIO::Utils->new();<br /> foreach my $input (@$inputs) {<br /> my $loci = $SIO->getAttributesByType(<br /> model => $input_model,<br /> node => $input,<br /> attributeType => "$lsrn/DragonDB_Locus_Identifier" );<br /> my $locus_node = shift @$loci; # $loci comes back as an arrayref<br /> my ($locus, $null) = $SIO->getUnitValue(model => $input_model, node => $locus_node);<br />Put your prefixes here (the $sadi, $lsrn and $sio lines)<br />For each of the $inputs we pick up the DragonDB_Locus_Identifier attribute nodes, and for each of those (there should only be one, so simply shift it off the array) we get the value of that Identifier.<br />The “getUnitValue” function works on attributes that have only values, as is the case here, but also on attributes (like quantitative measurements) that have values and associated measurement units. In this case, $locus is the value, and $null will be undefined since there are no units.<br />$locus now contains the identifier of the locus for that input<br />
  32. 32. Step8<br /> # do your database lookup or run your algorithm on $locus here to set the value of $allele...<br /> my $allele = "cho-1"; # here we are just going to hard-code it...<br /> # make an output node to attach to the input subject node<br /> my $out_node = $SIO->Trine->iri("$allele");<br /> # decorate it with the output data values<br /> my $attribute = $SIO->addAttribute(<br /> model => $output_model, # add to output model<br /> node => $out_node,<br /> predicate => "$sio/SIO_000671", # has identifier<br /> attributeType => "$lsrn/DragonDB_Allele_Identifier",<br /> value => "cho-1",<br /> );<br /> # SADI outputs must be attached to the subject node with a meaningful predicate<br /> my $service_predicate = $SIO->Trine->iri("$sadi#has_allele");<br /> my $statement = $SIO->Trine->statement($input, $service_predicate, $out_node);<br /> $output_model->add_statement($statement); # add this to the output model<br /> # DONE!<br />This is the rest of your service code... You need to do nothing more!<br />
  33. 33. sub process_it {<br /> my ($self, $inputs, $input_model, $output_model) = @_;<br /> my $SIO = RDF::SIO::Utils->new();<br />foreach my $input (@$inputs) {<br /> my $loci = $SIO->getAttributesByType(<br /> model =>$input_model,<br /> node => $input,<br />attributeType =>"$lsrn/DragonDB_Locus_Identifier", );<br /> my $locus_node = shift @$loci;<br /> my ($locus, $unit) = $SIO->getUnitValue(model => $input_model, node => $locus_node);<br /> # do your database or algorithm on $locus here to set value of $allele...<br /> my $allele = "cho-1";<br /> my $out_node = $SIO->Trine->iri("$allele");<br /> my $attribute = $SIO->addAttribute(<br /> model => $output_model,<br /> node => $out_node,<br /> predicate => "$sio/SIO_000671", # has identifier<br />attributeType => "$lsrn/DragonDB_Allele_Identifier",<br /> value => "cho-1",<br /> );<br /> my $service_predicate = $SIO->Trine->iri("$sadi#has_allele");<br /> my $statement = $SIO->Trine->statement($input, $service_predicate, $out_node);<br /> $output_model->add_statement($statement);<br /> }<br />}<br />Step8<br />THIS IS YOUR SERVICE CODE<br />Bolded statements are the ones that you add to the auto-generated scaffold<br />
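In a real service the hard-coded `my $allele = "cho-1";` line would be replaced by a lookup. The following is a minimal sketch in core Perl of what that lookup might look like, assuming an in-memory hash as a stand-in for a DragonDB query. The names `%ALLELES_BY_LOCUS` and `alleles_for_locus` are hypothetical, and only the CHO / cho-1 pair comes from the slides.

```perl
use strict;
use warnings;

# Hypothetical in-memory stand-in for a database query.
# Only the CHO -> cho-1 pair is taken from the slides.
my %ALLELES_BY_LOCUS = (
    'CHO' => ['cho-1'],
);

# Return every known allele ID for a locus ID, or an empty list when the
# locus is undefined or unknown.
sub alleles_for_locus {
    my ($locus) = @_;
    return () unless defined $locus;
    return @{ $ALLELES_BY_LOCUS{$locus} || [] };
}

# Try one known and one unknown locus ID.
for my $locus ('CHO', 'UNKNOWN') {
    my @alleles = alleles_for_locus($locus);
    printf "%s: %s\n", $locus, @alleles ? join(', ', @alleles) : '(no alleles)';
}
```

Inside process_it, the statements that build $out_node and add it to $output_model could then sit in a loop over `alleles_for_locus($locus)`, so that a locus with several alleles yields several has_allele statements. The `defined` check matters because `shift @$loci` returns undef when an input has no matching attribute.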
  34. 34. Step9<br />Deploy!<br />Copy to cgi-bin on your server (make sure it is set to “executable”!)<br />Save your ontology and deploy it to the correct location such that SADI can find it<br />
  35. 35. Step9a<br />Test your service before registering it!!<br /><ul><li> Create a file called “data.rdf” with some sample input data:</li></ul><?xml version="1.0" encoding="utf-8"?><br /><rdf:RDF xmlns:rdf=""><br /><rdf:Description xmlns:ns1="" rdf:nodeID="r1313610791r0"><br /> <ns1:SIO_000300>CHO</ns1:SIO_000300><br /> <rdf:type rdf:resource=""/><br /> <rdf:type rdf:resource=""/><br /></rdf:Description><br /><rdf:Description xmlns:ns1="" rdf:about=""><br /> <rdf:type rdf:resource=""/><br /> <ns1:SIO_000008 rdf:nodeID="r1313610791r0"/><br /> <ns1:SIO_000671 rdf:nodeID="r1313610791r0"/><br /></rdf:Description><br /></rdf:RDF><br />(note the line in red!! The SADI spec requires input data to be typed according to the interface of the service provider!)<br /><ul><li> Then use an HTTP client like Unix ‘curl’ to send that data to your service:</li></ul> $ curl --data @data.rdf<br />
  36. 36. Step10<br />Register your service with SADI<br /><br />
  37. 37. Congratulations! Break out the champagne!<br />