2nd status report of degree project

        Integrating Blipkit/BioProlog
      for semantic reasoning in Bioclipse
            Samuel Lampa, 2010-01-25
       Project blog: http://saml.rilspace.com
Some background...
What is “Semantic Web”?

  “Enabling more powerful use of information”
  Main goals:
●   Data availability (on the web)
●   Machine-readability of data
●   Knowledge integration
●   Automatic “conclusion drawing”
    ●   “Reasoning”, using reasoners
This project compares two reasoners: Pellet and Blipkit
Research question

How do biochemical questions formulated as Prolog queries compare to other
solutions available in Bioclipse in terms of speed and expressiveness?
Semantic Reasoners

●   Pellet/Jena
    ●   Uses W3C languages
        – OWL (Class definitions)
        – RDF (Facts)
        – SPARQL (Querying)
●   Blipkit/BioProlog
    ●   Uses Prolog, with W3C languages “on top”
        – Class definitions, facts and queries either in
          W3C languages (“on top” of Prolog) or in pure
          Prolog
What is Prolog?

●   State facts and rules
●   Execute by running queries over these facts and rules

●   Unique features:
    ●   Backtracking
    ●   “Closed-world assumption”
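
The closed-world assumption and the "give me every answer" style of querying can be mimicked in a few lines of JavaScript. This is only a rough sketch: the fact base and the helper names `query` and `holds` are hypothetical, and the code does not implement Prolog's unification or backtracking. It only imitates what a user observes from the outside.

```javascript
// Hypothetical mini fact base; names and values are illustrative only.
const facts = [
  ["hasHBondDonorsCount", "substanceX", 3],
  ["hasHBondDonorsCount", "substanceY", 7],
];

// A query is a pattern; `undefined` plays the role of an unbound variable.
function query(pattern) {
  return facts.filter((fact) =>
    fact.length === pattern.length &&
    pattern.every((term, i) => term === undefined || term === fact[i])
  );
}

// Closed-world assumption: whatever cannot be found is treated as false.
function holds(pattern) {
  return query(pattern).length > 0;
}

console.log(holds(["hasHBondDonorsCount", "substanceX", 3])); // true
console.log(holds(["hasHBondDonorsCount", "substanceZ", 1])); // false
// Leaving positions unbound returns every matching fact, one per solution.
console.log(query(["hasHBondDonorsCount", undefined, undefined]).length); // 2
```

Nothing is stated about substanceZ, so the second call answers false rather than "unknown". That is the closed-world behaviour in miniature.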
Prolog code example
% === SOME FACTS ===

hasHBondDonorsCount( substanceX, 3 ). % “substance X has 3 H-bond donors”
% etc …

% === A RULE ("RULE OF FIVE" À LA PROLOG) ===

isDrugLike( Substance ) :-
  hasHBondDonorsCount( Substance, HBDonors ),
  HBDonors =< 5,
  hasHBondAcceptorsCount( Substance, HBAcceptors ),
  HBAcceptors =< 10,
  hasMolecularWeight( Substance, MW ),
  MW < 500.

% === QUERYING THE RULE ===

?- isDrugLike(substanceX).
true.
?- isDrugLike(X).
X = substanceX ;
X = substanceY.
Prolog code example (annotated)

●   Head: isDrugLike( Substance ), the part before “:-”
●   “:-” means implication (“if [body] then [head]”)
●   Body: the goals after “:-”
●   A comma means conjunction (“and”)
●   Capitalized terms are always variables
Prolog code example (the queries)

●   ?- isDrugLike(substanceX) tests a specific atom (“substanceX”)
●   ?- isDrugLike(X) submits a variable (“X”), which is bound in turn to every
    instance that satisfies the “isDrugLike” rule
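
For comparison, the same three conditions can be written in plain JavaScript. This is a sketch under stated assumptions: the property table and all numeric values are made up for illustration, and the function names merely echo the isDrugLike rule.

```javascript
// Hypothetical property table; the values are invented for illustration.
const substances = {
  substanceX: { hBondDonors: 3, hBondAcceptors: 6, molecularWeight: 320.4 },
  substanceY: { hBondDonors: 1, hBondAcceptors: 4, molecularWeight: 180.2 },
  substanceZ: { hBondDonors: 7, hBondAcceptors: 12, molecularWeight: 712.9 },
};

// Same three conditions as the body of the isDrugLike rule.
function isDrugLike(name) {
  const s = substances[name];
  if (s === undefined) return false; // unknown substance: not drug-like
  return s.hBondDonors <= 5 && s.hBondAcceptors <= 10 && s.molecularWeight < 500;
}

// Counterpart of the query ?- isDrugLike(X).
function allDrugLike() {
  return Object.keys(substances).filter((name) => isDrugLike(name));
}

console.log(isDrugLike("substanceX")); // true
console.log(allDrugLike().join(", ")); // substanceX, substanceY
```

Note the difference in shape: JavaScript needs one function to test a given substance and a second one to enumerate all matches, whereas the single Prolog rule serves both queries.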
Where are we now?
Project plan
What is done so far?

●   Integration of Blipkit in Bioclipse
    ●   Done: General purpose methods
    ●   Done: Found usage strategy for combined use of
        Bioclipse JS scripting and Prolog
●   Comparing Prolog and Pellet
    ●   Done: Simple performance testing
    ●   Now: Stuck on NMR spectrum similarity search
        – (No backtracking on arithmetic operators in
          SPARQL)
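
The slide does not spell out the similarity criterion. Assuming, purely for illustration, that a spectrum matches when every reference peak has a counterpart within a fixed tolerance, the search could be sketched as below. The peak lists, the tolerance values and the names `hasPeakNear` and `findSimilar` are all hypothetical and are not taken from the project.

```javascript
// Hypothetical peak lists (chemical shifts in ppm); the values are made up.
const spectra = {
  spectrumA: [17.6, 18.3, 22.6, 26.5],
  spectrumB: [17.5, 18.4, 22.7, 26.4],
  spectrumC: [30.1, 45.2, 128.9],
};

// True if some peak in `peaks` lies within `tolerance` ppm of `shift`.
function hasPeakNear(peaks, shift, tolerance) {
  return peaks.some((p) => Math.abs(p - shift) <= tolerance);
}

// Names of all spectra in which every reference peak has a nearby counterpart.
function findSimilar(referencePeaks, tolerance) {
  return Object.keys(spectra).filter((name) =>
    referencePeaks.every((shift) => hasPeakNear(spectra[name], shift, tolerance))
  );
}

console.log(findSimilar([17.6, 18.3, 22.6, 26.5], 0.3).join(", "));
// spectrumA, spectrumB
```

The numeric comparison inside `hasPeakNear` is the step that has to be tried against every candidate peak, which is where arithmetic and search over alternatives meet.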
What is left?
What remains to be done?

●   Integration of Prolog / Blipkit
    ●   Refinements?
●   Comparing Prolog and Pellet
    ●   NMR spectrum similarity search
         – Investigate use of OWL in querying
         – Other options? SWRL?
    ●   ChEMBL data
    ●   Toxicity data (opentox.org)
Example Bioclipse/Prolog script

blipkit.init();
blipkit.loadRDFToProlog("nmrshiftdata.100.rdf.xml");

// Define a “convenience Prolog method”. The Prolog source is built by
// concatenation, since a plain JavaScript string cannot span several lines.
blipkit.loadPrologCode(
  "hasPeak( Subject, Predicate ) :- " +
  "  rdf_db:rdf( Subject, " +
  "              'http://www.nmrshiftdb.org/onto#hasPeak', " +
  "              Predicate )."
);

// Call the convenience method (which in turn executes its
// “body”) and return all matching results as an array
var resultList =
  blipkit.queryProlog(["hasPeak", "10", "Subject", "Predicate"]);
Example Bioclipse/Prolog script (annotated)

●   The string passed to blipkit.loadPrologCode is a Prolog rule to load into
    the Prolog engine
●   In the array passed to blipkit.queryProlog:
    ●   "hasPeak" is the Prolog method to call
    ●   "10" limits the number of results
    ●   "Subject" and "Predicate" are Prolog variables
Current status of research question

●   Performance
    ●   Prolog has been faster so far. Exceptions?
●   Usability
    ●   Prolog is very convenient for iterative wrapping of complex logic.
        Can RDF/OWL/SPARQL replicate this?
●   Where do RDF/OWL/SPARQL excel?
Project plan
Thank you!
Project blog: http://saml.rilspace.com
Project plan – Current version
Project plan – Proposed version

2nd Proj. Update: Integrating SWI-Prolog for Semantic Reasoning in Bioclipse
