A PRACTICAL INTRODUCTION TO
SADI SEMANTIC WEB SERVICES
AND HYDRA QUERY TOOL
Alexandre Riazanov, CTO
IPSNP Computing Inc
Oslo University, Sep 23, 2015
PLAN OF THE TALK
• A brief reminder of the previous episode: data
federation with SADI and HYDRA.
• RDF and OWL as syntactic foundations of service I/O
and functionality descriptions.
• Query execution with automatic service discovery and
reasoning.
• Resource publishing process with SADI, with a detailed
practical example (time permitting).
DATA FEDERATION: QUERYING MULTIPLE
HETEROGENEOUS SOURCES AS A SINGLE DB
QUERY EXAMPLES
• Find the names of drugs that contain chemical category Y as
active ingredients.
• Find documents mentioning enzyme activity X, extract info on
protein mutations and visualize mutations on 3D structure.
• Annotate a DNA sequence X with molecular functions of
proteins produced by the corresponding gene.
• Find patients with precondition X diagnosed with infections Y
resulting from procedure Z.
• Find patients diagnosed with X while taking drug C.
HOW WE DO IT WITH HYDRA AND SADI
SEMANTIC WEB SERVICES
A HIGH LEVEL VIEW OF THE HYDRA APPROACH
โ— Given a SPARQL query, HYDRA analyses it
by using an intelligent logic-based algorithm
(proprietary, unlike SADI itself).
โ— HYDRA requests descriptions of potentially
useful services from available SADI service
registries.
โ— HYDRA processes the descriptions and
figures out which services have to be
invoked, on what data and in what order.
SPARQL is a W3C
standard semantic
query language --
much more intuitive
than SQL.
HOW IS THIS ALL POSSIBLE?
• Key ingredient: the SADI framework for
Semantic Web services (Semantic Automated
Discovery and Integration).
• SADI services are:
  • RESTful services,
  • consuming and producing one format -- RDF,
  • with semantic descriptions (in OWL) fully defining
  their functionality.
DIGRESSION: RDF
• W3C RDF = Resource Description
Framework
• Standardised graph-based data model and a
few standard rendering formats.
• Nodes = objects (URIs) and data values like
"abc"^^xsd:string or "123"^^xsd:integer.
• Edges: binary relations.
RDF EXAMPLES
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix mt: <http://localhost:8080/medical_terminology.owl#> .
<http://example.com/patient#1234> rdf:type mt:Patient .
<http://example.com/patient#1234> mt:has_mass _:hm .
_:hm rdf:type mt:Measurement .
_:hm mt:has_value "92.0"^^xsd:float .
_:hm mt:has_units mt:kg .

@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix mt: <http://localhost:8080/medical_terminology.owl#> .
<http://example.com/patient#1234> a mt:Person ;
    mt:has_mass [a mt:Measurement;
                 mt:has_value "92.0"^^xsd:float;
                 mt:has_units mt:kg] .
The original XML-
based rendering
format is also
popular.
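As a rough illustration of the graph data model, the first example above can be held in plain Python as a set of (subject, predicate, object) tuples. This is only a sketch: the helper `objects` and the abbreviated `rdf:type` string are assumptions made for readability, not part of any RDF library.

```python
# Each RDF statement is one labelled edge: (subject, predicate, object).
# "_:hm" stands for the blank node of the mass measurement.
MT = "http://localhost:8080/medical_terminology.owl#"
PATIENT = "http://example.com/patient#1234"

triples = {
    (PATIENT, "rdf:type", MT + "Patient"),
    (PATIENT, MT + "has_mass", "_:hm"),
    ("_:hm", "rdf:type", MT + "Measurement"),
    ("_:hm", MT + "has_value", 92.0),
    ("_:hm", MT + "has_units", MT + "kg"),
}


def objects(graph, subject, predicate):
    """Return every object o such that (subject, predicate, o) is in the graph."""
    return [o for (s, p, o) in graph if s == subject and p == predicate]


# Follow two edges: patient -> measurement node -> numeric value.
mass_node = objects(triples, PATIENT, MT + "has_mass")[0]
mass_value = objects(triples, mass_node, MT + "has_value")[0]
print(mass_node, mass_value)
```

Both Turtle renderings on the slide describe a graph of this shape; the bracket syntax merely avoids naming the blank node.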
DIGRESSION: OWL
• W3C OWL = Web Ontology Language
• Essentially, extends RDF with definitions and other axioms
for classes (types of objects) and properties (binary
relations).
• Most useful axiom types -- class and property hierarchies:
Patient subClassOf Person
loves subPropertyOf knows
• SADI reuses property restriction syntax:
has_MRN exactly 1 string
SADI SERVICE I/O
• Input: RDF description of an input object.
• Output: another RDF graph providing more
(computed or retrieved) info about the input
object or linking it to other objects.
• Since all SADI services "talk the same
language" (RDF), they are 100% syntactically
interoperable:
– output of one SADI service can be directly
consumed by any other SADI service.
COMPLETE SEMANTIC DESCRIPTIONS
OF SERVICE FUNCTIONALITY
SADI services publish semantic descriptions of their
I/O that completely define what the service expects and
can accept as input, and what RDF assertions the service
can output.
• Unique and extremely powerful property: it facilitates
completely automatic discovery and orchestration of services.
Example: computeBMI service I/O
SEMANTIC FUNCTIONALITY DESCRIPTION
• OWL syntax is repurposed to define what RDF graphs are
acceptable as input, and what RDF graphs may be produced
in the output.
• Input(computeBMI) =
Person and (has_height exactly 1
(Measurement and (has_value exactly 1
float)))
• Output(computeBMI) = has_BMI exactly 1 float
SERVICE INPUT CLASS
• Specifies what kind of objects (RDF descriptions) the service
expects in the input. OWL syntax is convenient for such definitions.
• Almost always just an enumeration of attributes of the input objects
the SADI service expects.
● If the input class is defined as
Person and
  (has_height exactly 1
    (Measurement and
      (has_value exactly 1 float) and
      (has_units exactly 1 {m}))) and
  (has_mass exactly 1
    (Measurement and
      (has_value exactly 1 float) and
      (has_units exactly 1 {kg})))
… the service expects something
like this in the input:
patient1234 a Person;
  has_height [a Measurement;
              has_value "1.7"^^xsd:float;
              has_units m];
  has_mass [a Measurement;
            has_value "92.0"^^xsd:float;
            has_units kg]
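To make the role of the input class concrete, here is a minimal sketch of a check that a node in a graph carries the attributes listed above. It assumes a closed-world reading of "exactly 1" (count what is in the graph), which is simpler than real OWL semantics; the tuple representation and all function names are illustrative assumptions.

```python
# Graph as (subject, predicate, object) tuples; "_:h" and "_:m" are blank nodes.
graph = {
    ("patient1234", "a", "Person"),
    ("patient1234", "has_height", "_:h"),
    ("_:h", "a", "Measurement"),
    ("_:h", "has_value", 1.7),
    ("_:h", "has_units", "m"),
    ("patient1234", "has_mass", "_:m"),
    ("_:m", "a", "Measurement"),
    ("_:m", "has_value", 92.0),
    ("_:m", "has_units", "kg"),
}


def objects(g, subject, predicate):
    """Return every object attached to subject by predicate."""
    return [o for (s, p, o) in g if s == subject and p == predicate]


def is_measurement(g, node, units):
    """Measurement and (has_value exactly 1 float) and (has_units exactly 1 {units})."""
    values = objects(g, node, "has_value")
    return (
        "Measurement" in objects(g, node, "a")
        and len(values) == 1
        and isinstance(values[0], float)
        and objects(g, node, "has_units") == [units]
    )


def matches_compute_bmi_input(g, node):
    """Closed-world check of the computeBMI input class for one node."""
    heights = objects(g, node, "has_height")
    masses = objects(g, node, "has_mass")
    return (
        "Person" in objects(g, node, "a")
        and len(heights) == 1
        and is_measurement(g, heights[0], "m")
        and len(masses) == 1
        and is_measurement(g, masses[0], "kg")
    )


print(matches_compute_bmi_input(graph, "patient1234"))
```

Removing any one of the nine triples makes the check fail, which is the sense in which the input class enumerates the required attributes.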
SERVICE OUTPUT CLASS
• A SADI service advertises itself by publishing its
output class specifying what the service promises
to produce as the output.
• The class must enumerate attributes that the
service will add to the input object. This fully
semantically defines what the service does!
● If the output class is defined as
has_BMI exactly 1 float
… service clients can expect
something like this in the output:
patient1234 has_BMI "31.83"^^xsd:float
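The relationship between input and output can be sketched as a function from an input graph to the triples the service adds. This is an illustration in plain Python, not the SADI library API; the tuple representation, the function name and the rounding to two decimals are assumptions.

```python
def objects(g, subject, predicate):
    """Return every object attached to subject by predicate."""
    return [o for (s, p, o) in g if s == subject and p == predicate]


def compute_bmi_service(input_graph, node):
    """Read height (m) and mass (kg) of node and return the has_BMI triple."""
    height_node = objects(input_graph, node, "has_height")[0]
    mass_node = objects(input_graph, node, "has_mass")[0]
    height = objects(input_graph, height_node, "has_value")[0]
    mass = objects(input_graph, mass_node, "has_value")[0]
    bmi = mass / height ** 2  # BMI = mass (kg) / height (m) squared
    return {(node, "has_BMI", round(bmi, 2))}


# Input shaped like the example for the input class: 1.7 m, 92.0 kg.
input_graph = {
    ("patient1234", "a", "Person"),
    ("patient1234", "has_height", "_:h"),
    ("_:h", "has_value", 1.7),
    ("_:h", "has_units", "m"),
    ("patient1234", "has_mass", "_:m"),
    ("_:m", "has_value", 92.0),
    ("_:m", "has_units", "kg"),
}

output_graph = compute_bmi_service(input_graph, "patient1234")
print(output_graph)
```

The output contains only the new attribute attached to the same node, which matches the promise made by the output class.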
DIGRESSION: SPARQL
• W3C SPARQL - standard query language for
the RDF data model.
• SPARQL clients are programs that execute
SPARQL queries, typically on RDF triplestores.
PREFIX mt: <http://localhost:8080/medical_terminology.owl#>
SELECT ?mass
{
<http://example.com/patient#1234> a mt:Person ;
mt:has_mass [a mt:Measurement;
mt:has_value ?mass;
mt:has_units mt:kg] . }
• HYDRA is also a
SPARQL client,
but for virtual
RDF DBs.
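What a SPARQL client does with the query above can be approximated by matching each triple pattern against the data and carrying variable bindings along. The sketch below is a naive matcher written for illustration, with the blank node of the query replaced by a variable `?m`; it is not how a triplestore or HYDRA evaluates queries.

```python
def match_pattern(pattern, triple, binding):
    """Extend binding so that pattern matches triple, or return None."""
    new = dict(binding)
    for term, value in zip(pattern, triple):
        if isinstance(term, str) and term.startswith("?"):
            # Variable: bind it, or check it against its existing value.
            if new.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return new


def solve(patterns, graph, binding=None):
    """Yield every binding that satisfies all patterns (simple backtracking)."""
    binding = {} if binding is None else binding
    if not patterns:
        yield binding
        return
    first, rest = patterns[0], patterns[1:]
    for triple in graph:
        extended = match_pattern(first, triple, binding)
        if extended is not None:
            yield from solve(rest, graph, extended)


PATIENT = "http://example.com/patient#1234"
data = {
    (PATIENT, "a", "mt:Person"),
    (PATIENT, "mt:has_mass", "_:hm"),
    ("_:hm", "a", "mt:Measurement"),
    ("_:hm", "mt:has_value", 92.0),
    ("_:hm", "mt:has_units", "mt:kg"),
}
query = [
    (PATIENT, "a", "mt:Person"),
    (PATIENT, "mt:has_mass", "?m"),
    ("?m", "a", "mt:Measurement"),
    ("?m", "mt:has_value", "?mass"),
    ("?m", "mt:has_units", "mt:kg"),
]

answers = [b["?mass"] for b in solve(query, data)]
print(answers)
```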
AUTOMATIC SERVICE DISCOVERY
• With the I/O descriptions, a sufficiently intelligent
client can figure out that it can call the service if the
client has to satisfy a query condition like this:
patient1234 has_BMI ?bmi_value
• The query condition suggests that a service with
has_BMI in the output may be useful if called on the
object patient1234.
• To make the call, the client must have enough
information about patient1234: according to the input
class, has_height and has_mass must be attached to it
and sent to the service.
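The two discovery steps described here (find services whose output mentions the wanted property, then check that the input requirements are met) can be sketched with a toy registry. The registry format, the `lookupAllergies` entry and both function names are hypothetical; real SADI registries publish OWL class definitions rather than Python sets.

```python
# Toy registry: each service lists the properties it needs and the ones it adds.
REGISTRY = [
    {"name": "computeBMI",
     "inputs": {"has_height", "has_mass"},
     "outputs": {"has_BMI"}},
    {"name": "lookupAllergies",  # hypothetical service, for contrast only
     "inputs": {"has_MRN"},
     "outputs": {"has_allergy"}},
]


def discover(registry, wanted_property):
    """Names of services whose output class mentions the wanted property."""
    return [s["name"] for s in registry if wanted_property in s["outputs"]]


def can_call(service, known_properties):
    """True if every property required by the input class is already known."""
    return service["inputs"] <= set(known_properties)


candidates = discover(REGISTRY, "has_BMI")
ready = can_call(REGISTRY[0], {"has_height", "has_mass"})
print(candidates, ready)
```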
QUERY, EXECUTION, ANSWERS
Query:
SELECT ?bmi_value
FROM <.......rdf> # seed data
{
patient1234 a Person;
has_BMI ?bmi_value
}
Execution: HYDRA
โ— seed data in FROM clause describes the
heights and weights of some people, including
patient1234, using has_height and has_mass;
โ— since has_BMI is there, HYDRA looks for all
services in the available registries that can
attach has_BMI and finds computeBMI;
โ— patient1234 satisfies the input condition of
computeBMI, so HYDRA calls it;
โ— computeBMI returns
patient1234 has_BMI โ€œ32.3โ€
so HYDRA can return an an answer:
?bmi_value = โ€œ32.3โ€
MULTIPLE SERVICES
• Suppose we don't know the patient's height and mass, but can
retrieve them from a DB by the patient's medical record
number (MRN).
• We write another SADI service, patientInfo:
Output(patientInfo) =
  (has_height exactly 1
    (Measurement and
      (has_value exactly 1 float) and
      (has_units exactly 1 {m}))) and
  (has_mass exactly 1
    (Measurement and
      (has_value exactly 1 float) and
      (has_units exactly 1 {kg})))
Input(patientInfo) =
  Person and (has_MRN exactly 1 string)
AUTOMATIC SERVICE COMPOSITION
• HYDRA can figure out automatically that the output of
patientInfo can be submitted to computeBMI, and
the composition of the services can solve the query
SELECT ?bmi_value
{
?patient a Person ;
has_MRN "1234" ;
has_BMI ?bmi_value
}
(no has_height or has_mass anywhere!)
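One way to picture the composition step is naive backward chaining over service descriptions: start from the wanted property and work back until only known properties remain. HYDRA's actual algorithm is proprietary and not shown in the talk, so the sketch below is a stand-in technique; the registry format and the `plan` function are assumptions.

```python
REGISTRY = [
    {"name": "computeBMI",
     "inputs": {"has_height", "has_mass"},
     "outputs": {"has_BMI"}},
    {"name": "patientInfo",
     "inputs": {"has_MRN"},
     "outputs": {"has_height", "has_mass"}},
]


def plan(registry, known, goal, seen=frozenset()):
    """Return an ordered list of service names deriving goal from known, or None."""
    if goal in known:
        return []
    for service in registry:
        if goal not in service["outputs"] or service["name"] in seen:
            continue
        steps = []
        feasible = True
        for needed in sorted(service["inputs"]):
            sub = plan(registry, known, needed, seen | {service["name"]})
            if sub is None:
                feasible = False
                break
            # Keep the first occurrence of each prerequisite service.
            steps += [name for name in sub if name not in steps]
        if feasible:
            return steps + [service["name"]]
    return None


# The query supplies only has_MRN; has_BMI is what we want.
workflow = plan(REGISTRY, {"has_MRN"}, "has_BMI")
print(workflow)
```

With only has_MRN known, the plan calls patientInfo first and computeBMI second, which is the composition described on the slide.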
INTELLIGENT (REASONING-ENABLED) QUERY
EXECUTION
โ— Some queries are too complex unless generality
can be exploited:
โžข For example, query concerning all antibiotics
requires generalisation, otherwise all types of
antibiotics would have to be enumerated in the
query.
โ— Much better way to do this is to import a
classification of drugs and use it in query execution.
โ— HYDRA facilitates such reasoning and even more
complex reasoning with rules.
(TINY) REASONING EXAMPLE
Query defines ?patient as a Patient instead of Person:
?patient a Patient ; has_MRN "1234" ; ...
● HYDRA is still able to call patientInfo on the Patient
instance, say patient1234, if there is an axiom Patient
subClassOf Person. It infers patient1234 a Person, which
can be used as input to patientInfo.
● The axiom can be included in the definition of
Output(patientInfo), or specified separately.
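The inference used here is small enough to sketch directly: every type assertion is propagated up the class hierarchy. The dictionary encoding (one parent per class) and the function names are simplifying assumptions; a real OWL reasoner handles far more than subClassOf.

```python
# Axiom: Patient subClassOf Person (one parent per class, for simplicity).
SUBCLASS_OF = {"Patient": "Person"}


def superclasses(cls, axioms):
    """All classes reachable from cls by following subClassOf upwards."""
    result = []
    while cls in axioms:
        cls = axioms[cls]
        if cls in result:  # guard against cyclic axioms
            break
        result.append(cls)
    return result


def infer_types(graph, axioms):
    """Return the graph plus every type triple implied by the hierarchy."""
    inferred = set(graph)
    for (s, p, o) in graph:
        if p == "a":
            for sup in superclasses(o, axioms):
                inferred.add((s, "a", sup))
    return inferred


graph = {("patient1234", "a", "Patient"), ("patient1234", "has_MRN", "1234")}
closed = infer_types(graph, SUBCLASS_OF)
print(("patient1234", "a", "Person") in closed)
```

After this step patient1234 is also a Person with a has_MRN value, so it meets the input class of patientInfo.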
RESOURCE PUBLISHING WITH SADI (1)
• Specify the source of data / software you
want to publish with SADI.
• Model data semantically: find ontologies
describing your domains and decide how
your data will be expressed in the terms of
these ontologies.
For example, a
patient database
and a BMI
computation
algorithm.
RESOURCE PUBLISHING WITH SADI (2)
• Define your services' I/O semantically: decide how to describe
the operation of your services in the terms of the domain
ontologies, i.e., what will be written in the input and output classes.
• Code the business logic of your services in Java, Perl or
Python. If a service wraps a DB, convert the input RDF into a
query and the query results back to RDF. The coding effort is usually
tiny compared to the modelling.
• Overall development costs may be considerable, but this cost is
well amortized because SADI services are highly reusable, due
to their unprecedented degree of interoperability and
discoverability.
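The "input RDF to query, query results back to RDF" pattern for a DB-wrapping service can be sketched as follows. Everything concrete here is an assumption for illustration: the CSV column names, the sample row, the blank-node naming and the function `patient_info`. It does not use the SADI library.

```python
import csv
import io

# Hypothetical stand-in for the patient database; columns are invented.
PATIENTS_CSV = """mrn,name,height_m,mass_kg
1234,Example Patient,1.7,92.0
"""


def patient_info(node, mrn, csv_text=PATIENTS_CSV):
    """Look up a patient by MRN and return height/mass as triples about node."""
    for row in csv.DictReader(io.StringIO(csv_text)):
        if row["mrn"] == mrn:
            h, m = "_:h" + mrn, "_:m" + mrn  # blank nodes for the measurements
            return {
                (node, "has_height", h),
                (h, "a", "Measurement"),
                (h, "has_value", float(row["height_m"])),
                (h, "has_units", "m"),
                (node, "has_mass", m),
                (m, "a", "Measurement"),
                (m, "has_value", float(row["mass_kg"])),
                (m, "has_units", "kg"),
            }
    return set()  # unknown MRN: nothing to add


output = patient_info("patient1234", "1234")
print(len(output))
```

The MRN from the input graph becomes the lookup key, and each retrieved column becomes a triple shaped as the output class promises.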
PRACTICAL EXAMPLE (1)
โ— Specify the source of data / software you want to
publish with SADI.
โžขDatabase (CSV file) containing patient MRN, name,
height, weight, etc. We will use it to implement
patientInfo.
โžขBMI computation algorithm:
BMI = mass, kg / height, m ^2.
PRACTICAL EXAMPLE (2)
โ— Model data semantically: find ontologies describing your
domains and decide how your data will be expressed in the
terms of these ontologies.
โžข Create ontology clinical_terms.owl in Protรฉgรฉ:
โžข Classes: Person, Patient, Measurement, Units
โžข Properties: has_BMI, has_MRN, has_height, has_mass,
has_value, has_units.
โžข Individuals: m, kg.
โžข RDF data sample:
patient1234 a Patient;
has_MRN โ€œ1234โ€^^xsd:string;
has_height [a Measurement;
has_value โ€œ1.7"^^xsd:float;
has_units m];
. . .
PRACTICAL EXAMPLE (3)
Background ontology medical_terminology.owl
Deploy:
cp medical_terminology.owl /var/lib/tomcat7/webapps/ROOT/
URL: http://localhost:8080/medical_terminology.owl
PRACTICAL EXAMPLE (4)
โ— Define your services I/O semantically: decide how to describe the
operation of your services in the terms of the domain ontologies, i.e.,
what will be written in the input and output classes.
โžข I/O ontologies: patientInfo.owl and computeBMI.owl, importing
medical_terminology.owl
PRACTICAL EXAMPLE (5)
โ— Code the business logic of your services in Java, Perl or Python.
โžข There is a good open-source Java library for creating SADI
services as Java Servlets.
โžข A skeleton code for a service is generated automatically; we just
have to fill the body of one method.
โžข The library takes care of all the HTTP connectivity issues, parses
the input RDF to a simple abstract representation (Jena), and
renders the output RDF.
โžข The compiled WAR file can be immediately deployed on a servlet
container (Tomcat, Jetty, etc).
โžข SADI services take only 10-15 min to code (if the business logic is
simple or already programmed).
PRACTICAL EXAMPLE (6)
Edit pom.xml and run service skeleton creation plug-in:
PRACTICAL EXAMPLE (7)
Just add your business logic code in processInput():
PRACTICAL EXAMPLE (8)
Source database patientsDB.csv :
PRACTICAL EXAMPLE (9)
Finished processInput()
for service
patientInfo :
PRACTICAL EXAMPLE (10)
Finished processInput() for service
computeBMI :
PRACTICAL EXAMPLE (11)
• Deploy the services:
cp target/my-sadi-services.war /var/lib/tomcat7/webapps/
• Test service description availability (HTTP GET):
PRACTICAL EXAMPLE (12)
Test RDF for the services:
PRACTICAL EXAMPLE (13)
Service test runs with HTTP POST:
PRACTICAL EXAMPLE (14)
Running HYDRA command line application:
HYDRA PACKAGING
• Java API - can be embedded in something else.
• Command line application - convenient for small
experiments.
• Web service (Java servlet) with
– JSON-based protocol
– Java client-side API.
REMEMBER OUR BIG VISION?
BIGGER VISION:
SELF-SERVICE AD HOC QUERYING OF FEDERATED DATA
THERE ARE NO OBSTACLES IN PRINCIPLE
TO SELF-SERVICE QUERYING BECAUSE ...
โ— HYDRA implements semantic querying:
โ—‹ users need not know how the source data is organised or
accessed.
โ— HYDRA can apply concept hierarchies and rules:
โ—‹ syntactically simple queries for complex questions.
We just need an adequate user interface
for building queries.
HYDRA QUERY COMPOSITION GUI PRINCIPLES
โ— Queries are rendered as highly readable graphs.
โ— A lot of query composition is done by entering
keyphrases in English;
โ—‹ HYDRA GUI suggests (sub)graphs
implementing a given keyphrase.
โ— Nodes can be delete/added manually;
โ—‹ the system suggests possibilities (navigation).
HYDRA GUI SCREENSHOTS
READABLE QUERY DESCRIPTION
EMPTY CANVAS
SERVICE REGISTRY
Note that we added allPatients that enumerates all patients with their MRN.
KEYPHRASE INPUT
HYDRA GUI PROPOSES QUERY GRAPHS
THE USER CAN CONFIRM THE WHOLE GRAPH
OR SOME PARTS OF IT
ADDING MNEMONIC VARIABLE NAME
MNEMONIC VARIABLE NAME ADDED
MORE KEYPHRASE INPUT
HYDRA GUI PROPOSES GRAPH AUGMENTATIONS
VARIABLE NAME
VARIABLE NAME ADDED
MANUALLY ADDING RELATIONS
Numeric
comparison <
here, but could
be any kind of
relation.
EXTENDED GRAPH
SPECIFYING A DATA VALUE
EXTENDED GRAPH
The query is ready.
It finds all patients
with 20 < BMI < 30
and outputs their
BMI values and
MRNs.
HYDRA GUI GENERATES SPARQL
FROM QUERY GRAPHS
EXECUTING THE QUERY
ANSWERS
SAVING THE ANSWERS AS AN EXCEL
SPREADSHEET
THANK YOU!
Further materials/services are available on request:
• Live and recorded demos.
• Publications on previous (academic) case studies.
• Training/consulting.
• http://ipsnp.com/ (Canada) and http://ipsnp.co/ (UK)

A practical introduction to SADI semantic Web services and HYDRA query tool

  • 1. A PRACTICAL INTRODUCTION TO SADI SEMANTIC WEB SERVICES AND HYDRA QUERY TOOL Alexandre Riazanov, CTO IPSNP Computing Inc Oslo University, Sep 23, 2015
  • 2. PLAN OF THE TALK โ€ข A brief reminder of the previous episode: data federation with SADI and HYDRA. โ€ข RDF and OWL as syntactic foundations of service I/O and functionality descriptions. โ€ข Query execution with automatic service discovery and reasoning. โ€ข Resource publishing process with SADI, with a detailed practical example (time permitting).
  • 3. DATA FEDERATION: QUERYING MULTIPLE HETEROGENEOUS SOURCES AS A SINGLE DB
  • 4. QUERY EXAMPLES โ€ข Find the names of drugs that contain chemical category Y as active ingredients. โ€ข Find documents mentioning enzyme activity X, extract info on protein mutations and visualize mutations on 3D structure. โ€ข Annotate a DNA sequence X with molecular functions of proteins produced by the corresponding gene. โ€ข Find patients with precondition X diagnosed with infections Y resulting from procedure Z. โ€ข Find patients diagnosed with X while taking drug C.
  • 5. HOW WE DO IT WITH HYDRA AND SADI SEMANTIC WEB SERVICES
  • 6. A HIGH LEVEL VIEW OF THE HYDRA APPROACH โ— Given a SPARQL query, HYDRA analyses it by using an intelligent logic-based algorithm (proprietary, unlike SADI itself). โ— HYDRA requests descriptions of potentially useful services from available SADI service registries. โ— HYDRA processes the descriptions and figures out which services have to be invoked, on what data and in what order. SPARQL is a W3C standard semantic query language -- much more intuitive than SQL.
  • 7. HOW IS THIS ALL POSSIBLE? โ€ข Key ingredient: the SADI framework for Semantic Web services (Semantic Automated Discovery and Integration). โ€ข SADI services are: โ€ข RESTful services โ€ข consuming and producing one format -- RDF, โ€ข with semantic descriptions (in OWL) fully defining their functionality.
  • 8. DIGRESSION: RDF โ€ข W3C RDF = Resource Description Framework โ€ข Standartised graph-based data model and a few standard rendering formats. โ€ข Nodes = objects (URIs) and data values like โ€œabcโ€^^xsd:string or โ€œ123โ€^^xsd:integer. โ€ข Edges: binary relations.
  • 9. RDF EXAMPLES @prefix mt: <http://localhost:8080/medical_terminology.owl#> . <http://example.com/patient#1234> rdf:type mt:Patient . <http://example.com/patient#1234> mt:has_mass _:hm . _:hm rdf:type mt:Measurement . _:hm mt:has_value "92.0"^^xsd:float . _:hm mt:has_units mt:kg . @prefix mt: <http://localhost:8080/medical_terminology.owl#> . <http://example.com/patient#1234> a mt:Person ; mt:has_mass [a mt:Measurement; mt:has_value "92.0"^^xsd:float; mt:has_units mt:kg] . The original XML-based rendering format is also popular.
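To make the graph model concrete, the first RDF example above can be sketched as a set of (subject, predicate, object) triples. This is only an illustration using plain Python tuples rather than an RDF library; the names `MT`, `patient` and `objects_of` are assumptions introduced here, not part of the talk.

```python
# A minimal sketch: an RDF graph as a set of (subject, predicate, object) triples.
# Plain tuples stand in for a real RDF library; all helper names are illustrative.
MT = "http://localhost:8080/medical_terminology.owl#"
patient = "http://example.com/patient#1234"

graph = {
    (patient, "rdf:type", MT + "Patient"),
    (patient, MT + "has_mass", "_:hm"),       # _:hm is a blank node
    ("_:hm", "rdf:type", MT + "Measurement"),
    ("_:hm", MT + "has_value", 92.0),         # data value (xsd:float)
    ("_:hm", MT + "has_units", MT + "kg"),
}

def objects_of(graph, subject, predicate):
    """Return all nodes reachable from `subject` over edges labelled `predicate`."""
    return {o for s, p, o in graph if s == subject and p == predicate}

# Follow has_mass, then has_value, to reach the numeric mass.
mass_nodes = objects_of(graph, patient, MT + "has_mass")
mass_values = {
    value
    for node in mass_nodes
    for value in objects_of(graph, node, MT + "has_value")
}
print(mass_values)
```

Walking two edges from the patient node mirrors what the nested `[ ... ]` notation expresses in the abbreviated rendering.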
  • 10. DIGRESSION: OWL • W3C OWL = Web Ontology Language • Essentially, extends RDF with definitions and other axioms for classes (types of objects) and properties (binary relations). • Most useful axiom types: class and property hierarchies: Patient subClassOf Person loves subPropertyOf knows • SADI reuses property restriction syntax: has_MRN exactly 1 string
  • 11. SADI SERVICE I/O • Input: RDF description of an input object. • Output: another RDF graph providing more (computed or retrieved) info about the input object or linking it to other objects. • Since all SADI services "talk the same language" (RDF), they are 100% syntactically interoperable: – the output of one SADI service can be directly consumed by any other SADI service.
  • 12. COMPLETE SEMANTIC DESCRIPTIONS OF SERVICE FUNCTIONALITY SADI services publish semantic descriptions of their I/O that completely define what the service expects and can accept as input, and what RDF assertions the service can output. • Unique and extremely powerful property: it facilitates completely automatic discovery and orchestration of services.
  • 14. SEMANTIC FUNCTIONALITY DESCRIPTION • OWL syntax is repurposed to define what RDF graphs are acceptable as input, and what RDF graphs may be produced in the output. • Input(computeBMI) = Person and (has_height exactly 1 (Measurement and (has_value exactly 1 float))) • Output(computeBMI) = has_BMI exactly 1 float
  • 15. SERVICE INPUT CLASS • Specifies what kind of objects (RDF descriptions) the service expects in the input. OWL syntax is convenient for such definitions. • Almost always just an enumeration of attributes of the input objects the SADI service expects. ● If the input class is defined as Person and (has_height exactly 1 (Measurement and (has_value exactly 1 float) and (has_units exactly 1 {m})) and (has_mass exactly 1 (Measurement and (has_value exactly 1 float) and (has_units exactly 1 {kg})) … the service expects something like this in the input: patient1234 a Person; has_height [a Measurement; has_value "1.7"^^xsd:float; has_units m]; has_mass [a Measurement; has_value "92.0"^^xsd:float; has_units kg]
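The input class above can be read as a checklist of attributes. The sketch below tests that checklist against a nested-dictionary stand-in for the RDF description. It is a simplified, closed-world illustration (real OWL reasoning treats "exactly 1" under open-world semantics), and the dictionary layout and the function names `is_measurement` and `matches_compute_bmi_input` are assumptions made for this example.

```python
# Hypothetical, simplified check of the computeBMI input class.
# An object is a dict: "a" lists its types, other keys list property values.

def is_measurement(node, units):
    """True if node is a Measurement with exactly one float value in `units`."""
    return (
        "Measurement" in node.get("a", [])
        and len(node.get("has_value", [])) == 1
        and isinstance(node["has_value"][0], float)
        and node.get("has_units") == [units]
    )

def matches_compute_bmi_input(obj):
    """True if obj is a Person with exactly one height (m) and one mass (kg)."""
    heights = obj.get("has_height", [])
    masses = obj.get("has_mass", [])
    return (
        "Person" in obj.get("a", [])
        and len(heights) == 1 and is_measurement(heights[0], "m")
        and len(masses) == 1 and is_measurement(masses[0], "kg")
    )

patient1234 = {
    "a": ["Person"],
    "has_height": [{"a": ["Measurement"], "has_value": [1.7], "has_units": ["m"]}],
    "has_mass": [{"a": ["Measurement"], "has_value": [92.0], "has_units": ["kg"]}],
}
print(matches_compute_bmi_input(patient1234))
```

An object missing `has_mass`, or with the mass given in other units, fails the check, which is the situation in which a client would have to look for another service to supply the missing attribute.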
  • 16. SERVICE OUTPUT CLASS • A SADI service advertises itself by publishing its output class specifying what the service promises to produce as the output. • The class must enumerate attributes that the service will add to the input object. This fully semantically defines what the service does! ● If the output class is defined as has_BMI exactly 1 float … service clients can expect something like this in the output: patient1234 has_BMI "31.83"^^xsd:float
  • 17. DIGRESSION: SPARQL • W3C SPARQL - standard query language for the RDF data model. • SPARQL clients are programs that execute SPARQL queries, typically on RDF triplestores. PREFIX mt: <http://localhost:8080/medical_terminology.owl#> SELECT ?mass { <http://example.com/patient#1234> a mt:Person ; mt:has_mass [a mt:Measurement; mt:has_value ?mass; mt:has_units mt:kg] . } • HYDRA is also a SPARQL client, but for virtual RDF DBs.
  • 18. AUTOMATIC SERVICE DISCOVERY • With the I/O descriptions, a sufficiently intelligent client can figure out that it can call the service if the client has to satisfy a query condition like this: patient1234 has_BMI ?bmi_value • The query condition suggests that a service with has_BMI in the output may be useful if called on the object patient1234 • To make the call, the client must have enough information about patient1234 : according to the input class, has_height and has_mass must be attached to it and sent to the service.
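The discovery step described above can be sketched as a lookup over service descriptions. This is a toy model, not HYDRA's algorithm (which is proprietary): each service is reduced to the set of properties it needs and the set it adds. The registry layout, the `discover` function and the `computeAge` entry are hypothetical and exist only to give the lookup something to reject.

```python
# Toy registry: each service lists the properties it needs on the input
# object and the properties it attaches in the output.
REGISTRY = {
    "computeBMI": {"needs": {"has_height", "has_mass"}, "adds": {"has_BMI"}},
    # Hypothetical second service, included only for contrast.
    "computeAge": {"needs": {"has_birth_date"}, "adds": {"has_age"}},
}

def discover(registry, wanted_property, known_properties):
    """Return (services that can add the property, those callable right now)."""
    candidates = sorted(
        name for name, desc in registry.items() if wanted_property in desc["adds"]
    )
    callable_now = [
        name for name in candidates if registry[name]["needs"] <= known_properties
    ]
    return candidates, callable_now

# The query condition "patient1234 has_BMI ?bmi_value" asks for has_BMI;
# the client already knows the patient's height and mass.
candidates, callable_now = discover(REGISTRY, "has_BMI", {"has_height", "has_mass"})
print(candidates, callable_now)
```

If the client knew only the height, `computeBMI` would still be a candidate but would not be callable until the mass was found.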
  • 19. QUERY, EXECUTION, ANSWERS Query: FROM <.......rdf> # seed data SELECT ?bmi_value { patient1234 a Person; has_BMI ?bmi_value } Execution: HYDRA ● seed data in FROM clause describes the heights and weights of some people, including patient1234, using has_height and has_mass; ● since has_BMI is there, HYDRA looks for all services in the available registries that can attach has_BMI and finds computeBMI; ● patient1234 satisfies the input condition of computeBMI, so HYDRA calls it; ● computeBMI returns patient1234 has_BMI "32.3" so HYDRA can return an answer: ?bmi_value = "32.3"
  • 20. MULTIPLE SERVICES • Suppose we don't know the patient's height and mass, but can retrieve them from a DB by the patient's medical record number (MRN). • We write another SADI service, patientInfo : Output(patientInfo) = (has_height exactly 1 (Measurement and (has_value exactly 1 float) and (has_units exactly 1 {m})) and (has_mass exactly 1 (Measurement and (has_value exactly 1 float) and (has_units exactly 1 {kg})) Input(patientInfo) = Person and (has_MRN exactly 1 string)
  • 21. AUTOMATIC SERVICE COMPOSITION • HYDRA can figure out automatically that the output of patientInfo can be submitted to computeBMI, and the composition of the services can solve the query SELECT ?bmi_value { ?patient a Person ; has_MRN "1234" ; has_BMI ?bmi_value } (no has_height or has_mass anywhere!)
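One simple way to picture this composition is forward chaining over service descriptions: keep calling any service whose inputs are already available until the wanted property appears. The sketch below is an assumption-laden illustration, not HYDRA's actual (proprietary) algorithm, and the `SERVICES` layout and `plan` function are names invented here. Unlike a goal-directed planner, it may also schedule services that are not needed for the goal.

```python
# Service descriptions reduced to property sets, following the I/O classes
# given for patientInfo and computeBMI.
SERVICES = {
    "patientInfo": {"needs": {"has_MRN"}, "adds": {"has_height", "has_mass"}},
    "computeBMI": {"needs": {"has_height", "has_mass"}, "adds": {"has_BMI"}},
}

def plan(services, known_properties, goal):
    """Return an invocation order that derives `goal`, or None if impossible."""
    known = set(known_properties)
    order = []
    progress = True
    while goal not in known and progress:
        progress = False
        for name, desc in services.items():
            # Call each service at most once, as soon as its inputs are known.
            if name not in order and desc["needs"] <= known:
                order.append(name)
                known |= desc["adds"]
                progress = True
    return order if goal in known else None

# The query supplies only has_MRN and asks for has_BMI.
invocation_order = plan(SERVICES, {"has_MRN"}, "has_BMI")
print(invocation_order)
```

Starting from `has_MRN` alone, the plan calls `patientInfo` first and `computeBMI` second; with no known properties at all, no plan exists and the function returns `None`.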
  • 22. INTELLIGENT (REASONING-ENABLED) QUERY EXECUTION ● Some queries are too complex unless generality can be exploited: ➢ For example, a query concerning all antibiotics requires generalisation; otherwise all types of antibiotics would have to be enumerated in the query. ● A much better way to do this is to import a classification of drugs and use it in query execution. ● HYDRA facilitates such reasoning, and even more complex reasoning with rules.
  • 23. (TINY) REASONING EXAMPLE Query defines ?patient as a Patient instead of Person: ?patient a Patient ; has_MRN "1234" ; ... ● HYDRA is still able to call patientInfo on the Patient instance, say patient1234, if there is an axiom Patient subClassOf Person. It infers patient1234 a Person, which can be used as input to patientInfo. ● The axiom can be included in the definition of Output(patientInfo), or specified separately.
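The inference in this example amounts to following subClassOf axioms upwards from the asserted type. A minimal sketch, assuming a single-parent class hierarchy stored in a dictionary; the names `SUBCLASS_OF` and `inferred_types` are illustrative and do not reflect how HYDRA represents axioms.

```python
# Axioms of the form "child subClassOf parent", one parent per class.
SUBCLASS_OF = {"Patient": "Person"}

def inferred_types(asserted_types, subclass_of):
    """Return the asserted types plus every superclass reachable from them."""
    types = set(asserted_types)
    frontier = list(asserted_types)
    while frontier:
        cls = frontier.pop()
        parent = subclass_of.get(cls)
        if parent is not None and parent not in types:
            types.add(parent)
            frontier.append(parent)
    return types

# patient1234 is asserted to be a Patient; Person is inferred.
types_of_patient1234 = inferred_types({"Patient"}, SUBCLASS_OF)
print(sorted(types_of_patient1234))
```

With `Person` among the inferred types, `patient1234` meets the `Person` part of the patientInfo input class even though the query only says `Patient`.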
  • 24. RESOURCE PUBLISHING WITH SADI (1) • Specify the source of data / software you want to publish with SADI. • Model data semantically: find ontologies describing your domains and decide how your data will be expressed in the terms of these ontologies. For example, a patient database and a BMI computation algorithm.
  • 25. RESOURCE PUBLISHING WITH SADI (2) • Define your services' I/O semantically: decide how to describe the operation of your services in the terms of the domain ontologies, i.e., what will be written in the input and output classes. • Code the business logic of your services in Java, Perl or Python. If a service wraps a DB, convert the input RDF into a query and the query results back to RDF. The coding effort is usually tiny compared to the modelling. • Overall development costs may be considerable, but this cost is well amortized because SADI services are highly reusable, due to their unprecedented degree of interoperability and discoverability.
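The "input RDF to query, query results back to RDF" pattern can be sketched for a CSV-backed lookup keyed by MRN. This is not the SADI library API: the column names (`mrn`, `height_m`, `mass_kg`), the inline sample row and the function `patient_info` are all assumptions for illustration, and the output is a list of property/value pairs standing in for RDF.

```python
import csv
import io

# Assumed CSV layout with one illustrative row; a real source would differ.
SAMPLE_CSV = "mrn,height_m,mass_kg\n1234,1.7,92.0\n"

def patient_info(mrn, csv_text=SAMPLE_CSV):
    """Look up a patient by MRN and return the properties to attach to it."""
    for row in csv.DictReader(io.StringIO(csv_text)):
        if row["mrn"] == mrn:
            return [
                ("has_height", {"a": "Measurement",
                                "has_value": float(row["height_m"]),
                                "has_units": "m"}),
                ("has_mass", {"a": "Measurement",
                              "has_value": float(row["mass_kg"]),
                              "has_units": "kg"}),
            ]
    return []  # unknown MRN: nothing to add to the input object

# The MRN is read from the input description, used as the query key,
# and the matching row is turned back into measurement descriptions.
output_properties = patient_info("1234")
print(output_properties)
```

Returning an empty list for an unknown MRN keeps the service from asserting anything it cannot back up with data.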
  • 26. PRACTICAL EXAMPLE (1) ● Specify the source of data / software you want to publish with SADI. ➢ Database (CSV file) containing patient MRN, name, height, weight, etc. We will use it to implement patientInfo. ➢ BMI computation algorithm: BMI = mass (kg) / (height (m))^2.
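The BMI formula itself is a one-liner. The sketch below applies it to the sample values used earlier in the talk (92.0 kg, 1.7 m); the function name `compute_bmi` and the input validation are additions made here, not taken from the talk's Java code.

```python
def compute_bmi(mass_kg, height_m):
    """Body mass index: mass in kilograms divided by height in metres squared."""
    if height_m <= 0:
        raise ValueError("height must be positive")
    return mass_kg / height_m ** 2

bmi = compute_bmi(92.0, 1.7)
print(round(bmi, 2))  # 31.83
```

The result matches the has_BMI value "31.83" shown on the service output class slide.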
  • 27. PRACTICAL EXAMPLE (2) ● Model data semantically: find ontologies describing your domains and decide how your data will be expressed in the terms of these ontologies. ➢ Create ontology clinical_terms.owl in Protégé: ➢ Classes: Person, Patient, Measurement, Units ➢ Properties: has_BMI, has_MRN, has_height, has_mass, has_value, has_units. ➢ Individuals: m, kg. ➢ RDF data sample: patient1234 a Patient; has_MRN "1234"^^xsd:string; has_height [a Measurement; has_value "1.7"^^xsd:float; has_units m]; . . .
  • 28. PRACTICAL EXAMPLE (3) Background ontology medical_terminology.owl Deploy: cp medical_terminology.owl /var/lib/tomcat7/webapps/ROOT/ URL: http://localhost:8080/medical_terminology.owl
  • 29. PRACTICAL EXAMPLE (4) ● Define your services' I/O semantically: decide how to describe the operation of your services in the terms of the domain ontologies, i.e., what will be written in the input and output classes. ➢ I/O ontologies: patientInfo.owl and computeBMI.owl, importing medical_terminology.owl
  • 30. PRACTICAL EXAMPLE (5) ● Code the business logic of your services in Java, Perl or Python. ➢ There is a good open-source Java library for creating SADI services as Java Servlets. ➢ Skeleton code for a service is generated automatically; we just have to fill in the body of one method. ➢ The library takes care of all the HTTP connectivity issues, parses the input RDF to a simple abstract representation (Jena), and renders the output RDF. ➢ The compiled WAR file can be immediately deployed on a servlet container (Tomcat, Jetty, etc). ➢ SADI services take only 10-15 min to code (if the business logic is simple or already programmed).
  • 31. PRACTICAL EXAMPLE (6) Edit pom.xml and run service skeleton creation plug-in:
  • 32. PRACTICAL EXAMPLE (7) Just add your business logic code in processInput():
  • 33. PRACTICAL EXAMPLE (8) Source database patientsDB.csv :
  • 34. PRACTICAL EXAMPLE (9) Finished processInput() for service patientInfo :
  • 35. PRACTICAL EXAMPLE (10) Finished processInput() for service computeBMI :
  • 36. PRACTICAL EXAMPLE (11) • Deploy the services: COPY target/my-sadi-services.war TO /var/lib/tomcat7/webapps/ • Test service description availability (HTTP GET):
  • 37. PRACTICAL EXAMPLE (12) Test RDF for the services:
  • 38. PRACTICAL EXAMPLE (13) Service test runs with HTTP POST:
  • 39. PRACTICAL EXAMPLE (14) Running HYDRA command line application:
  • 40. HYDRA PACKAGING • Java API - can be embedded in something else. • Command line application - convenient for small experiments. • Web service (Java servlet) with – JSON-based protocol – Java client-side API.
  • 41. REMEMBER OUR BIG VISION?
  • 42. BIGGER VISION: SELF-SERVICE AD HOC QUERYING OF FEDERATED DATA
  • 43. THERE ARE NO FUNDAMENTAL OBSTACLES TO SELF-SERVICE QUERYING BECAUSE ... ● HYDRA implements semantic querying: ○ users need not know how the source data is organised or accessed. ● HYDRA can apply concept hierarchies and rules: ○ syntactically simple queries for complex questions. We just need an adequate user interface for building queries.
  • 44. HYDRA QUERY COMPOSITION GUI PRINCIPLES ● Queries are rendered as highly readable graphs. ● A lot of query composition is done by entering keyphrases in English; ○ HYDRA GUI suggests (sub)graphs implementing a given keyphrase. ● Nodes can be deleted/added manually; ○ the system suggests possibilities (navigation).
  • 48. SERVICE REGISTRY Note that we added allPatients that enumerates all patients with their MRN.
  • 50. HYDRA GUI PROPOSES QUERY GRAPHS
  • 51. THE USER CAN CONFIRM THE WHOLE GRAPH OR SOME PARTS OF IT
  • 55. HYDRA GUI PROPOSES GRAPH AUGMENTATIONS
  • 58. MANUALLY ADDING RELATIONS Numeric comparison < here, but could be any kind of relation.
  • 61. EXTENDED GRAPH The query is ready. It finds all patients with 20 < BMI < 30 and outputs their BMI values and MRNs.
  • 62. HYDRA GUI GENERATES SPARQL FROM QUERY GRAPHS
  • 65. SAVING THE ANSWERS AS AN EXCEL SPREADSHEET
  • 66. THANK YOU! Further materials/services are available on request: • Live and recorded demos. • Publications on previous (academic) case studies. • Training/consulting. • http://ipsnp.com/ (Canada) and http://ipsnp.co/ (UK)