This sequel to the talk "Comprehensive Self-Service Life Science Data Federation with SADI semantic Web services and HYDRA" is more technical, although relatively self-contained, and tailored to people interested in practical aspects of SADI: I go step-by-step through an example of SADI service creation, deployment and use.
A practical introduction to SADI semantic Web services and HYDRA query tool
1. A PRACTICAL INTRODUCTION TO
SADI SEMANTIC WEB SERVICES
AND HYDRA QUERY TOOL
Alexandre Riazanov, CTO
IPSNP Computing Inc
Oslo University, Sep 23, 2015
2. PLAN OF THE TALK
• A brief reminder of the previous episode: data federation with SADI and HYDRA.
• RDF and OWL as syntactic foundations of service I/O and functionality descriptions.
• Query execution with automatic service discovery and reasoning.
• Resource publishing process with SADI, with a detailed practical example (time permitting).
4. QUERY EXAMPLES
• Find the names of drugs that contain chemical category Y as active ingredients.
• Find documents mentioning enzyme activity X, extract info on protein mutations and visualize mutations on 3D structure.
• Annotate a DNA sequence X with molecular functions of proteins produced by the corresponding gene.
• Find patients with precondition X diagnosed with infections Y resulting from procedure Z.
• Find patients diagnosed with X while taking drug C.
5. HOW WE DO IT WITH HYDRA AND SADI
SEMANTIC WEB SERVICES
6. A HIGH LEVEL VIEW OF THE HYDRA APPROACH
- Given a SPARQL query, HYDRA analyses it using an intelligent logic-based algorithm (proprietary, unlike SADI itself).
- HYDRA requests descriptions of potentially useful services from the available SADI service registries.
- HYDRA processes the descriptions and figures out which services have to be invoked, on what data and in what order.
SPARQL is a W3C standard semantic query language, much more intuitive than SQL.
7. HOW IS THIS ALL POSSIBLE?
• Key ingredient: the SADI framework for Semantic Web services (Semantic Automated Discovery and Integration).
• SADI services are:
  • RESTful services
  • consuming and producing one format -- RDF,
  • with semantic descriptions (in OWL) fully defining their functionality.
8. DIGRESSION: RDF
• W3C RDF = Resource Description Framework
• Standardised graph-based data model and a few standard rendering formats.
• Nodes = objects (URIs) and data values like "abc"^^xsd:string or "123"^^xsd:integer.
• Edges: binary relations.
9. RDF EXAMPLES
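@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix mt: <http://localhost:8080/medical_terminology.owl#> .
<http://example.com/patient#1234> rdf:type mt:Patient .
<http://example.com/patient#1234> mt:has_mass _:hm .
_:hm rdf:type mt:Measurement .
_:hm mt:has_value "92.0"^^xsd:float .
_:hm mt:has_units mt:kg .
# A similar description in abbreviated Turtle syntax ("a" stands for rdf:type):
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix mt: <http://localhost:8080/medical_terminology.owl#> .
<http://example.com/patient#1234> a mt:Person ;
    mt:has_mass [ a mt:Measurement ;
                  mt:has_value "92.0"^^xsd:float ;
                  mt:has_units mt:kg ] .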
The original XML-
based rendering
format is also
popular.
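To make the graph model concrete, here is a minimal sketch that holds the first example as a set of (subject, predicate, object) triples and prints it in the line-per-triple N-Triples format. The helper names (`uri`, `literal`, `to_ntriples`) are made up for this illustration and no RDF library is used; a real service would rely on one (the Java SADI library uses Jena).

```python
# Illustrative only: an RDF graph as a plain set of (subject, predicate, object) tuples.
MT = "http://localhost:8080/medical_terminology.owl#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
XSD_FLOAT = "http://www.w3.org/2001/XMLSchema#float"


def uri(value):
    """Render a URI node in N-Triples syntax."""
    return f"<{value}>"


def literal(value, datatype):
    """Render a typed literal node in N-Triples syntax."""
    return f'"{value}"^^<{datatype}>'


patient = uri("http://example.com/patient#1234")
mass = "_:hm"  # blank node, as in the slide

graph = {
    (patient, uri(RDF_TYPE), uri(MT + "Patient")),
    (patient, uri(MT + "has_mass"), mass),
    (mass, uri(RDF_TYPE), uri(MT + "Measurement")),
    (mass, uri(MT + "has_value"), literal("92.0", XSD_FLOAT)),
    (mass, uri(MT + "has_units"), uri(MT + "kg")),
}


def to_ntriples(triples):
    """Serialise one triple per line; sorting only makes the output stable."""
    return "\n".join(f"{s} {p} {o} ." for s, p, o in sorted(triples))


print(to_ntriples(graph))
```

Each edge of the graph becomes exactly one line, which is why the abbreviated Turtle form and the XML form are just alternative renderings of the same five triples.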
10. DIGRESSION: OWL
• W3C OWL = Web Ontology Language
• Essentially, extends RDF with definitions and other axioms for classes (types of objects) and properties (binary relations).
• Most useful axiom types -- class and property hierarchies:
    Patient subClassOf Person
    loves subPropertyOf knows
• SADI reuses property restriction syntax:
    has_MRN exactly 1 string
11. SADI SERVICE I/O
• Input: RDF description of an input object.
• Output: another RDF graph providing more (computed or retrieved) info about the input object or linking it to other objects.
• Since all SADI services "talk the same language" (RDF), they are 100% syntactically interoperable:
  - output of one SADI service can be directly consumed by any other SADI service.
12. COMPLETE SEMANTIC DESCRIPTIONS
OF SERVICE FUNCTIONALITY
SADI services publish semantic descriptions of their I/O that completely define what the service expects and can accept as input, and what RDF assertions the service can output.
• Unique and extremely powerful property: it facilitates completely automatic discovery and orchestration of services.
14. SEMANTIC FUNCTIONALITY DESCRIPTION
• OWL syntax is repurposed to define what RDF graphs are acceptable as input, and what RDF graphs may be produced in the output.
• Input(computeBMI) =
    Person and (has_height exactly 1
      (Measurement and (has_value exactly 1 float)))
• Output(computeBMI) = has_BMI exactly 1 float
15. SERVICE INPUT CLASS
• Specifies what kind of objects (RDF descriptions) the service expects in the input. OWL syntax is convenient for such definitions.
• Almost always just an enumeration of attributes of the input objects the SADI service expects.
- If the input class is defined as
    Person and
    (has_height exactly 1
      (Measurement and
        (has_value exactly 1 float) and
        (has_units exactly 1 {m}))) and
    (has_mass exactly 1
      (Measurement and
        (has_value exactly 1 float) and
        (has_units exactly 1 {kg})))
... the service expects something like this in the input:
    patient1234 a Person;
        has_height [a Measurement;
            has_value "1.7"^^xsd:float;
            has_units m];
        has_mass [a Measurement;
            has_value "92.0"^^xsd:float;
            has_units kg]
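The input class can be read as a checklist over the input object. The sketch below spells that checklist out for computeBMI, assuming a simplified representation in which the RDF description is a nested Python dict (so "exactly 1" is modelled by a key holding a single value). The function names and the dict layout are illustrative assumptions; a real SADI service delegates this check to OWL tooling.

```python
def is_measurement(node, unit):
    """True if node looks like: Measurement with one float value and the given unit."""
    return (
        isinstance(node, dict)
        and node.get("type") == "Measurement"
        and isinstance(node.get("has_value"), float)
        and node.get("has_units") == unit
    )


def satisfies_compute_bmi_input(obj):
    """Checklist version of the input class: a Person with a height in m and a mass in kg."""
    return (
        "Person" in obj.get("types", ())
        and is_measurement(obj.get("has_height"), "m")
        and is_measurement(obj.get("has_mass"), "kg")
    )


patient1234 = {
    "types": {"Person"},
    "has_height": {"type": "Measurement", "has_value": 1.7, "has_units": "m"},
    "has_mass": {"type": "Measurement", "has_value": 92.0, "has_units": "kg"},
}

# The same object without has_mass no longer matches the input class.
incomplete = {k: v for k, v in patient1234.items() if k != "has_mass"}

print(satisfies_compute_bmi_input(patient1234))
print(satisfies_compute_bmi_input(incomplete))
```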
16. SERVICE OUTPUT CLASS
• A SADI service advertises itself by publishing its output class specifying what the service promises to produce as the output.
• The class must enumerate attributes that the service will add to the input object. This fully semantically defines what the service does!
- If the output class is defined as
    has_BMI exactly 1 float
... service clients can expect something like this in the output:
    patient1234 has_BMI "31.83"^^xsd:float
17. DIGRESSION: SPARQL
• W3C SPARQL - standard query language for the RDF data model.
• SPARQL clients are programs that execute SPARQL queries, typically on RDF triplestores.
PREFIX mt: <http://localhost:8080/medical_terminology.owl#>
SELECT ?mass
{
<http://example.com/patient#1234> a mt:Person ;
mt:has_mass [a mt:Measurement;
mt:has_value ?mass;
mt:has_units mt:kg] . }
• HYDRA is also a SPARQL client, but for virtual RDF DBs.
18. AUTOMATIC SERVICE DISCOVERY
• With the I/O descriptions, a sufficiently intelligent client can figure out that it can call the service if the client has to satisfy a query condition like this:
    patient1234 has_BMI ?bmi_value
• The query condition suggests that a service with has_BMI in the output may be useful if called on the object patient1234.
• To make the call, the client must have enough information about patient1234: according to the input class, has_height and has_mass must be attached to it and sent to the service.
19. QUERY, EXECUTION, ANSWERS
Query:
SELECT ?bmi_value
FROM <.......rdf> # seed data
{
    patient1234 a Person;
        has_BMI ?bmi_value
}
Execution: HYDRA
- seed data in the FROM clause describes the heights and weights of some people, including patient1234, using has_height and has_mass;
- since has_BMI is there, HYDRA looks for all services in the available registries that can attach has_BMI and finds computeBMI;
- patient1234 satisfies the input condition of computeBMI, so HYDRA calls it;
- computeBMI returns
    patient1234 has_BMI "32.3"
  so HYDRA can return an answer:
    ?bmi_value = "32.3"
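The execution steps above can be imitated in a few lines. This is only a toy sketch, not HYDRA's (proprietary) algorithm: the registry is a Python dict, each service is reduced to the set of properties it needs and adds, Measurement nodes are flattened to plain floats, and the seed values (1.7 m, 92.0 kg) are assumed from the earlier input-class example, so the number printed depends on that assumption.

```python
def compute_bmi_service(obj):
    """Stand-in for computeBMI: returns the property the service would attach."""
    return {"has_BMI": round(obj["has_mass"] / obj["has_height"] ** 2, 2)}


# Hypothetical registry: what each service needs on its input and adds in its output.
REGISTRY = {
    "computeBMI": {
        "needs": {"has_height", "has_mass"},
        "adds": {"has_BMI"},
        "call": compute_bmi_service,
    },
}

# Assumed seed data (the slide elides the actual file).
SEED = {"patient1234": {"has_height": 1.7, "has_mass": 92.0}}


def answer(subject, predicate, data):
    """Answer `subject predicate ?x`, calling a matching service if the value is not in the data."""
    known = dict(data[subject])
    if predicate in known:
        return known[predicate]
    for service in REGISTRY.values():
        # Discovery: the service must add the wanted property and its inputs must be available.
        if predicate in service["adds"] and service["needs"] <= set(known):
            known.update(service["call"](known))
            return known[predicate]
    return None


print(answer("patient1234", "has_BMI", SEED))  # 31.83 for the assumed seed values
```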
20. MULTIPLE SERVICES
• Suppose we don't know the patient's height/mass, but can retrieve them from a DB by the patient's medical record number (MRN).
• We write another SADI service, patientInfo:
    Output(patientInfo) =
      (has_height exactly 1
        (Measurement and
          (has_value exactly 1 float) and
          (has_units exactly 1 {m}))) and
      (has_mass exactly 1
        (Measurement and
          (has_value exactly 1 float) and
          (has_units exactly 1 {kg})))
    Input(patientInfo) =
      Person and (has_MRN exactly 1 string)
21. AUTOMATIC SERVICE COMPOSITION
• HYDRA can figure out automatically that the output of patientInfo can be submitted to computeBMI, and the composition of the services can solve the query
    SELECT ?bmi_value
    {
        ?patient a Person ;
            has_MRN "1234" ;
            has_BMI ?bmi_value
    }
(no has_height or has_mass anywhere!)
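One simple way to see why the composition is discoverable is naive forward chaining over the I/O descriptions: keep calling any service whose inputs are already known until the wanted property appears. The sketch below does exactly that. It is an illustration under simplifying assumptions (services reduced to sets of property names, a hypothetical `SERVICES` table) and says nothing about how HYDRA's own algorithm works.

```python
# Hypothetical summaries of the two services' input and output classes.
SERVICES = {
    "patientInfo": {"needs": {"has_MRN"}, "adds": {"has_height", "has_mass"}},
    "computeBMI": {"needs": {"has_height", "has_mass"}, "adds": {"has_BMI"}},
}


def plan(known, wanted):
    """Return an order of service calls that derives `wanted` from `known`, or None."""
    known = set(known)
    order = []
    progress = True
    while not wanted <= known and progress:
        progress = False
        for name, service in SERVICES.items():
            applicable = service["needs"] <= known
            useful = not service["adds"] <= known
            if name not in order and applicable and useful:
                order.append(name)
                known |= service["adds"]
                progress = True
    return order if wanted <= known else None


# The query only supplies has_MRN and asks for has_BMI.
print(plan({"has_MRN"}, {"has_BMI"}))  # ['patientInfo', 'computeBMI']
```

Forward chaining may call services that turn out to be unnecessary; a goal-directed planner would start from has_BMI and work backwards, but the resulting call order for this query is the same.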
22. INTELLIGENT (REASONING-ENABLED) QUERY
EXECUTION
- Some queries are too complex unless generality can be exploited:
  • For example, a query concerning all antibiotics requires generalisation; otherwise all types of antibiotics would have to be enumerated in the query.
- A much better way to do this is to import a classification of drugs and use it in query execution.
- HYDRA facilitates such reasoning, and even more complex reasoning with rules.
23. (TINY) REASONING EXAMPLE
Query defines ?patient as a Patient instead of Person:
    ?patient a Patient ; has_MRN "1234" ; ...
- HYDRA is still able to call patientInfo on the Patient instance, say patient1234, if there is an axiom Patient subClassOf Person. It infers patient1234 a Person, which can be used as input to patientInfo.
- The axiom can be included in the definition of Output(patientInfo), or specified separately.
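The inference used here is just a walk up the class hierarchy. A minimal sketch, assuming the hierarchy is given as a child-to-parent dict (single inheritance only, which is a simplification of OWL):

```python
# Assumed axiom from the slide: Patient subClassOf Person.
SUBCLASS_OF = {"Patient": "Person"}


def inferred_types(asserted):
    """Add every superclass reachable through subClassOf to the asserted types."""
    types = set(asserted)
    frontier = list(asserted)
    while frontier:
        parent = SUBCLASS_OF.get(frontier.pop())
        if parent is not None and parent not in types:
            types.add(parent)
            frontier.append(parent)
    return types


# patient1234 is only asserted to be a Patient, yet qualifies as a Person.
print("Person" in inferred_types({"Patient"}))
```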
24. RESOURCE PUBLISHING WITH SADI (1)
• Specify the source of data / software you want to publish with SADI.
• Model data semantically: find ontologies describing your domains and decide how your data will be expressed in the terms of these ontologies.
For example, a patient database and a BMI computation algorithm.
25. RESOURCE PUBLISHING WITH SADI (2)
• Define your services' I/O semantically: decide how to describe the operation of your services in the terms of the domain ontologies, i.e., what will be written in the input and output classes.
• Code the business logic of your services in Java, Perl or Python. If a service wraps a DB, convert the input RDF into a query and the query results back to RDF. The coding effort is usually tiny compared to the modelling.
• Overall development costs may be considerable, but this cost is well amortized because SADI services are highly reusable, due to their unprecedented degree of interoperability and discoverability.
26. PRACTICAL EXAMPLE (1)
- Specify the source of data / software you want to publish with SADI.
• Database (CSV file) containing patient MRN, name, height, weight, etc. We will use it to implement patientInfo.
• BMI computation algorithm:
    BMI = mass (kg) / (height (m))^2.
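As a worked instance of the formula, using the sample values from the earlier input example (92.0 kg, 1.7 m); the function name is arbitrary:

```python
def compute_bmi(mass_kg, height_m):
    """BMI = mass in kg divided by the square of height in m."""
    if height_m <= 0:
        raise ValueError("height must be positive")
    return mass_kg / height_m ** 2


bmi = compute_bmi(92.0, 1.7)  # 92.0 / 2.89
print(f'patient1234 has_BMI "{bmi:.2f}"^^xsd:float')  # ... "31.83"^^xsd:float
```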
27. PRACTICAL EXAMPLE (2)
- Model data semantically: find ontologies describing your domains and decide how your data will be expressed in the terms of these ontologies.
• Create ontology clinical_terms.owl in Protégé:
  • Classes: Person, Patient, Measurement, Units
  • Properties: has_BMI, has_MRN, has_height, has_mass, has_value, has_units.
  • Individuals: m, kg.
• RDF data sample:
    patient1234 a Patient;
        has_MRN "1234"^^xsd:string;
        has_height [a Measurement;
            has_value "1.7"^^xsd:float;
            has_units m];
        . . .
29. PRACTICAL EXAMPLE (4)
- Define your services' I/O semantically: decide how to describe the operation of your services in the terms of the domain ontologies, i.e., what will be written in the input and output classes.
• I/O ontologies: patientInfo.owl and computeBMI.owl, importing medical_terminology.owl
30. PRACTICAL EXAMPLE (5)
- Code the business logic of your services in Java, Perl or Python.
• There is a good open-source Java library for creating SADI services as Java Servlets.
• Skeleton code for a service is generated automatically; we just have to fill in the body of one method.
• The library takes care of all the HTTP connectivity issues, parses the input RDF to a simple abstract representation (Jena), and renders the output RDF.
• The compiled WAR file can be immediately deployed on a servlet container (Tomcat, Jetty, etc).
• SADI services take only 10-15 min to code (if the business logic is simple or already programmed).
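To give a feel for how small the business logic can be, here is a Python sketch of what patientInfo has to do: look a patient up by MRN in the CSV data and emit the height and mass in abbreviated Turtle. The column names, sample rows and function names are assumptions made for this illustration, and no SADI library is involved; in the Java library the equivalent logic would go into the generated method body.

```python
import csv
import io

# Hypothetical table layout; the slides only say the CSV holds MRN, name, height, weight, etc.
CSV_DATA = """mrn,name,height_m,mass_kg
1234,Jane Doe,1.7,92.0
5678,John Roe,1.8,75.0
"""


def load_patients(text):
    """Index the CSV rows by medical record number."""
    return {row["mrn"]: row for row in csv.DictReader(io.StringIO(text))}


def measurement(value, unit):
    """Render one Measurement blank node in abbreviated Turtle."""
    return f'[a Measurement; has_value "{float(value)}"^^xsd:float; has_units {unit}]'


def patient_info(patients, subject, mrn):
    """Return the Turtle that patientInfo would attach to `subject`, or "" if the MRN is unknown."""
    row = patients.get(mrn)
    if row is None:
        return ""
    return (
        f"{subject} has_height {measurement(row['height_m'], 'm')};\n"
        f"    has_mass {measurement(row['mass_kg'], 'kg')} ."
    )


print(patient_info(load_patients(CSV_DATA), "patient1234", "1234"))
```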
36. PRACTICAL EXAMPLE (11)
• Deploy the services:
    cp target/my-sadi-services.war /var/lib/tomcat7/webapps/
• Test service description availability (HTTP GET):
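A hedged sketch of such a check from Python's standard library. The service URL below is hypothetical (the real path depends on how the WAR and its servlets are named), and the Accept header is an assumption about the preferred RDF rendering. The network call is wrapped in a function so nothing is fetched until it is called explicitly.

```python
import urllib.request

# Hypothetical URL; adjust host, port and path to the actual deployment.
SERVICE_URL = "http://localhost:8080/my-sadi-services/computeBMI"


def description_request(url):
    """Build the HTTP GET request for a service description, asking for RDF/XML."""
    return urllib.request.Request(
        url, headers={"Accept": "application/rdf+xml"}, method="GET"
    )


def fetch_description(url, timeout=10):
    """Send the request and return (HTTP status, response body as text)."""
    with urllib.request.urlopen(description_request(url), timeout=timeout) as response:
        return response.status, response.read().decode("utf-8")


# Usage, once the servlet container is running:
#   status, body = fetch_description(SERVICE_URL)
#   print(status, body[:500])
```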
40. HYDRA PACKAGING
• Java API - can be embedded in something else.
• Command line application - convenient for small experiments.
• Web service (Java servlet) with
  - JSON-based protocol
  - Java client-side API.
43. THERE ARE NO OBSTACLES IN PRINCIPLE
TO SELF-SERVICE QUERYING BECAUSE ...
- HYDRA implements semantic querying:
  - users need not know how the source data is organised or accessed.
- HYDRA can apply concept hierarchies and rules:
  - syntactically simple queries for complex questions.
We just need an adequate user interface for building queries.
44. HYDRA QUERY COMPOSITION GUI PRINCIPLES
- Queries are rendered as highly readable graphs.
- A lot of query composition is done by entering keyphrases in English;
  - HYDRA GUI suggests (sub)graphs implementing a given keyphrase.
- Nodes can be deleted/added manually;
  - the system suggests possibilities (navigation).
66. THANK YOU!
Further materials/services are available on request:
• Live and recorded demos.
• Publications on previous (academic) case studies.
• Training/consulting.
• http://ipsnp.com/ (Canada) and http://ipsnp.co/ (UK)