3rd Proj. Update: Integrating SWI-Prolog for Semantic Reasoning in Bioclipse
3rd status report of degree project
Samuel Lampa, 2010-04-07
Project blog: http://saml.rilspace.com
How do biochemical questions
formulated as Prolog queries
compare to other solutions
available in Bioclipse in terms of
speed and expressiveness?
Compared Semantic Tools
● General RDF querying (via SPARQL)
● OWL-DL Reasoning (via SPARQL)
● General querying via Jena (via SPARQL)
● Access to RDF triples (both assertion and querying) via the
rdf( Subject, Predicate, Object ) method
● Complex wrapper/convenience methods can be built
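The rdf( Subject, Predicate, Object ) access pattern can be sketched as follows. This is only a minimal illustration, assuming SWI-Prolog's semweb library (rdf_db); the example.org URIs and the predicate names spectrum_of/2 and mols_with_spectrum/1 are made up for the example and are not part of NMRShiftDB or Bioclipse.

```prolog
:- use_module(library(semweb/rdf_db)).

% Assert two triples. The example.org URIs are hypothetical placeholders.
:- rdf_assert('http://example.org/mol1',
              'http://example.org/hasSpectrum',
              'http://example.org/spec1').
:- rdf_assert('http://example.org/mol2',
              'http://example.org/hasSpectrum',
              'http://example.org/spec2').

% A convenience predicate wrapping rdf/3 (hypothetical name).
spectrum_of(Mol, Spectrum) :-
    rdf(Mol, 'http://example.org/hasSpectrum', Spectrum).

% Collect all molecules that have a spectrum, sorted and without duplicates.
mols_with_spectrum(Mols) :-
    setof(Mol, Spectrum^spectrum_of(Mol, Spectrum), Mols).
```

The same rdf/3 call serves both directions: with Mol bound it looks up that molecule's spectrum, with Mol unbound it enumerates every molecule that has one.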
Use Case: NMRShiftDB
Interesting use case:
Querying NMRShiftDB data
– Rather shallow RDF graph
– Numeric (float value) interval
NMR Spectrum Similarity Search
What to test:
Given a spectrum,
represented as a list of shift
values, find spectra with
the same shifts (allowing
intensity variation within a limit).
Shift → “Dereferencing”
Prolog code
:- use_module(library(semweb/rdf_db)).
% Register RDF namespaces, for use in the convenience predicates at the end
:- rdf_register_ns(nmr, 'http://www.nmrshiftdb.org/onto#').
:- rdf_register_ns(xsd, 'http://www.w3.org/2001/XMLSchema#').
find_mol_with_peak_vals_near( SearchShiftVals, Mols ) :-
  % Pick the 'Mol's that match the pattern:
  %   list_peak_shifts_of_mol( Mol, MolShiftVals ), contains_list_elems_near( SearchShiftVals, MolShiftVals )
  % and collect them in 'Mols'.
  setof( Mol,
         MolShiftVals^( list_peak_shifts_of_mol( Mol, MolShiftVals ), % A Mol's shift values are collected
                        contains_list_elems_near( SearchShiftVals, MolShiftVals ) ), % and compared against the given SearchShiftVals
         Mols ). % 'Mols' holds every 'Mol' whose shift values match the SearchShiftVals
                 % (sorted, no duplicates; setof/3 fails if nothing matches).
% Given a 'Mol', give its shift values in list form, in 'ListOfPeaks'
list_peak_shifts_of_mol( Mol, ListOfPeaks ) :-
  has_spectrum( Mol, Spectrum ),
  findall( ShiftVal,
           ( has_peak( Spectrum, Peak ),
             has_shift_val( Peak, ShiftVal ) ),
           ListOfPeaks ).
% Compare two lists to see if List has near-matches for each of the values in the first list
contains_list_elems_near( [ElemHead|ElemTail], List ) :-
  member_close_to( ElemHead, List ),
  ( contains_list_elems_near( ElemTail, List );
    ElemTail == [] ).
% Recursive construct:
% Test first the end criterion:
member_close_to( X, [ Y | _ ] ) :-
  closeTo( X, Y ).
% but if the above doesn't succeed, then recursively continue with the tail of the list:
member_close_to( X, [ _ | Tail ] ) :-
  member_close_to( X, Tail ).
% Numerical near-match
closeTo( Val1, Val2 ) :-
  abs(Val1 - Val2) =< 0.3.
% Convenience accessor predicates
has_shift_val( Peak, ShiftVal ) :-
  rdf( Peak, nmr:hasShift, literal(type(xsd:decimal, ShiftValLiteral))),
  atom_number_create( ShiftValLiteral, ShiftVal ).
has_spectrum( Mol, Spectrum ) :-
  rdf( Mol, nmr:has_spectrum, Spectrum ).
has_peak( Spectrum, Peak ) :-
  rdf( Spectrum, nmr:has_peak, Peak ).
% Wrapper for the atom_number/2 predicate, which converts atoms (string constants) to numbers.
% The wrapper avoids exceptions on empty atoms, converting them to zero instead.
atom_number_create( Atom, Number ) :-
  ( atom_length( Atom, AtomLength ), AtomLength > 0 -> % IF atom is not empty
      atom_number( Atom, Number ) ;                    % THEN convert the atom to a numerical value
      atom_number( '0', Number ) ).                    % ELSE convert to a zero
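To try the matching logic without loading any RDF data, the sketch below restates it over plain Prolog facts. The mol_shifts/2 facts and their values are invented for illustration (they are not NMRShiftDB data), and the predicate names close_to/2, all_near/2 and matching_mols/2 are hypothetical; only the 0.3 tolerance is taken from closeTo/2 above.

```prolog
% Hypothetical shift lists, invented for this example.
mol_shifts(mol_a, [17.6, 18.3, 22.6, 26.5]).
mol_shifts(mol_b, [17.5, 40.1, 77.0]).
mol_shifts(mol_c, [18.2, 22.7, 31.9]).

% True if Val1 and Val2 differ by at most 0.3.
close_to(Val1, Val2) :-
    abs(Val1 - Val2) =< 0.3.

% True if every value in SearchVals has a near-match in Shifts.
% Note: unlike the slide code, this also succeeds for an empty SearchVals.
all_near(SearchVals, Shifts) :-
    forall(member(X, SearchVals),
           ( member(Y, Shifts), close_to(X, Y) )).

% Collect every molecule whose shift list near-matches all SearchVals.
matching_mols(SearchVals, Mols) :-
    findall(Mol,
            ( mol_shifts(Mol, Shifts), all_near(SearchVals, Shifts) ),
            Mols).
```

With these facts, the query `matching_mols([18.3, 22.6], Mols)` gives `Mols = [mol_a, mol_c]`, and `matching_mols([17.6], Mols)` gives `Mols = [mol_a, mol_b]`.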
“Expressivity”: SPARQL vs Prolog
Prolog predicate taking variables
How to change “input parameters”?
● SPARQL: Modify SPARQL query
● Prolog: Change input parameter
● Fewer lines of code
● Easier to understand the code
● Easier to change input parameters
● Easier to re-use existing logic
(call a method rather than cut and paste)
● Easier to change aspects of the execution logic
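As a rough illustration of the last two points, the sketch below turns the tolerance into an ordinary argument, so the same predicate can be called with different search values and different tolerances without editing its body. The shifts_of/2 facts and the names within/3 and matching_mols_within/3 are hypothetical and exist only for this example.

```prolog
% Hypothetical shift lists, invented for this example.
shifts_of(mol_a, [17.6, 18.3, 22.6, 26.5]).
shifts_of(mol_b, [17.5, 40.1, 77.0]).
shifts_of(mol_c, [18.2, 22.7, 31.9]).

% True if X and Y differ by at most Tol.
within(Tol, X, Y) :-
    abs(X - Y) =< Tol.

% Collect every molecule that has, for each search value,
% at least one shift within Tol of it.
matching_mols_within(Tol, SearchVals, Mols) :-
    findall(Mol,
            ( shifts_of(Mol, Shifts),
              forall(member(X, SearchVals),
                     ( member(Y, Shifts), within(Tol, X, Y) )) ),
            Mols).
```

Calling `matching_mols_within(0.3, [17.6], Mols)` gives `Mols = [mol_a, mol_b]`, while widening the tolerance with `matching_mols_within(1.0, [17.6], Mols)` gives `Mols = [mol_a, mol_b, mol_c]`; only the arguments change between the two calls.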
● Prolog is the fastest (in-memory only)
● Jena faster with disk-based than with
in-memory RDF store!
● Pellet with in-memory store is slow
● Pellet with disk-based store out of
Project plan from last
Planned final presentation: 28 April 2010 (BMC B7:101a)
Everybody is welcome!