Evaluating scientific hypotheses using the SPARQL Inferencing Notation
Evaluating a hypothesis and its claims against experimental data is an essential scientific activity. However, this task is increasingly challenging given the ever-growing volume of publications and data sets. To address this challenge, we previously developed HyQue, a system for hypothesis formulation and evaluation. HyQue uses domain-specific rulesets to evaluate hypotheses based on well-understood scientific principles. However, because scientists may apply differing scientific premises when exploring a hypothesis, flexibility is required in both crafting and executing rulesets to evaluate hypotheses. Here, we report on an extension of HyQue that incorporates rules specified using the SPARQL Inferencing Notation (SPIN). Hypotheses, background knowledge, queries, results and now rulesets are represented and executed using Semantic Web technologies, enabling users to explicitly trace a hypothesis to its evaluation as Linked Data, including the data and rules used by HyQue. We demonstrate the use of HyQue to evaluate hypotheses concerning the yeast galactose gene system.

  • Can’t answer questions that require background knowledge
  • We represent a hypothesis as a collection of propositions
  • This is part of a hypothesis represented in N3 and used as input to HyQue. Note: binding between galactose and Gal3p does not return any results; there IS binding between Gal3p and Gal80p
  • The RDF representing the evaluation of the input hypothesis is linked to both the hypothesis AND the data used to evaluate the hypothesis
  • This is a screenshot of some HyQue data in Virtuoso, a triple store system that we use to store and access RDF
  • Transcript of "Evaluating scientific hypotheses using the SPARQL Inferencing Notation"

    1. Evaluating scientific hypotheses using the SPARQL Inferencing Notation. Alison Callahan and Michel Dumontier, Department of Biology, Carleton University. ESWC2012::HyQue-SPIN
    4. Uncovering all the evidence to support or refute a hypothesis is becoming increasingly difficult and requires a lot of digging around.
    5. Continuous growth in research outputs. Sources: http://www.nlm.nih.gov/bsd/stats/cit_added.html and http://homepages.cs.ncl.ac.uk/m.j.bell1/blog/?p=1515
    6. Semantic Web technologies for biological knowledge management and discovery
        • Capability to publish, link, retrieve and query decentralized data
        • A powerful integrative platform across data, ontologies and services
        • Formal knowledge representation allows for automated reasoning
        • Massive growth in dataset availability and, soon, in application development
    7. A rapidly growing web of linked data. “Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch. http://lod-cloud.net/”
    8. Bio2RDF covers the major biological databases
    9. BioPortal gives up-to-date access to bio-ontologies
    10. SADI provides access to Semantic Web Services. The Semantic Automated Discovery and Integration (SADI) framework (http://sadiframework.org) makes it easy to create Semantic Web Services using OWL classes as service inputs and outputs. ~700 bioinformatic services as of May 29, 2012. Mark Wilkinson, UBC; Michel Dumontier, Carleton University; Christopher Baker, UNB.
    11. HyQue is the Hypothesis query and evaluation system
        • A platform for knowledge discovery
        • Facilitates hypothesis formulation and evaluation
        • Leverages Semantic Web technologies to provide access to facts, expert knowledge and web services
        • Conforms to a simplified event-based model
        • Supports evaluation against positive and negative findings
        • Transparent and reproducible evidence prioritization
        • Provenance across all elements of hypothesis testing: trace a hypothesis to its evaluation, including the data and rules used
        Callahan A, Dumontier M, Shah NH. HyQue: evaluating hypotheses using Semantic Web technologies. J Biomed Semantics. 2011 May 17;2 Suppl 2:S3.
    12. HyQue
        • Background knowledge as OWL ontologies: hypotheses (HO), processes/events (GO), measurement values (SIO), units (UO), evidence (ECO), molecules (ChEBI), biopolymers (SO), etc.
        • Facts as RDF data: model organism data, including genes and their chromosomal location, proteins and their functions, localization and participation in interactions, complexes, pathways, biological processes, etc.
        • Evaluation rules defined using SPIN
            - Domain-specific rules: scores based on external knowledge
            - System rules: scores based on hypothesis structure
        Callahan A, Dumontier M. Evaluating scientific hypotheses using the SPARQL Inferencing Notation. Extended Semantic Web Conference (ESWC 2012). Heraklion, Crete. May 27-31, 2012.
    13. HyQue Architecture
    14. A HyQue hypothesis is a collection of propositions
        • proposition: “a statement expressing something true or false”
        • HyQue propositions specify events
        • complex propositions can be formulated using logical operators (AND, OR, XOR…) or decomposed using component relations
        HyQue hypothesis ≡ ‘proposition’ that ‘specifies’ only ‘event’
        HyQue hypothesis ≡ ‘proposition’ that ‘has component part’ only (‘proposition’ that ‘specifies’ only ‘event’)
    15. Event-based data model. HyQue events denote a phenomenon involving two objects: ‘agent’ and ‘target’. In addition, we can specify the context of this event (e.g. located in nucleus, or under some genetic background).
        Currently supported events:
            1. protein-protein binding
            2. protein-nucleic acid binding
            3. molecular activation
            4. molecular inhibition
            5. gene induction
            6. gene repression
            7. transport
        Event:
            • ‘has agent’ → agent
            • ‘has target’ → target
            • ‘is located in’ → location
            • ‘is negated’ → boolean
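The event model on this slide can be sketched as a small Python data structure. This is an illustration only, not HyQue's implementation: the class name `Event`, its field names and the `SUPPORTED_EVENT_TYPES` tuple are assumptions introduced here; only the seven event types and the four properties come from the slide.

```python
from dataclasses import dataclass
from typing import Optional

# Event types as listed on the slide; using plain strings as keys is an assumption.
SUPPORTED_EVENT_TYPES = (
    "protein-protein binding",
    "protein-nucleic acid binding",
    "molecular activation",
    "molecular inhibition",
    "gene induction",
    "gene repression",
    "transport",
)


@dataclass(frozen=True)
class Event:
    """Hypothetical in-memory mirror of a HyQue event."""
    event_type: str                 # one of SUPPORTED_EVENT_TYPES
    agent: str                      # 'has agent'
    target: str                     # 'has target'
    location: Optional[str] = None  # 'is located in' (optional context)
    is_negated: bool = False        # 'is negated'

    def __post_init__(self) -> None:
        # Reject event types that the slide does not list as supported.
        if self.event_type not in SUPPORTED_EVENT_TYPES:
            raise ValueError(f"unsupported event type: {self.event_type!r}")


# Example: "Gal4p induces the expression of GAL1"
e1 = Event("gene induction", agent="Gal4p", target="GAL1")
```

Making the class frozen keeps an event immutable once stated, which matches the idea that a proposition is a fixed statement to be evaluated.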
    16. Example Hypothesis
        • HyQue’s demonstrative knowledge base is focused on galactose metabolism and regulation. The paper describes a union of hypotheses:
          (Gal4p induces the expression of GAL1 AND Gal4p induces the expression of GAL7 AND Gal3p induces the expression of GAL2) OR (Gal4p induces the expression of GAL7 AND Gal80p induces the expression of GAL7 AND Gal80p does not inhibit the activity of Gal4p WHEN GAL3 is over-expressed)
    17. Users don’t need to know RDF to formulate hypotheses. User Interface with auto-completion: http://hyque.semanticscience.org
    18. Hypothesis RDF Representation (hypothesis → ‘has component part’ → proposition → ‘specifies’ → event)
            :h rdf:type hyque:Hypothesis ;
               hyque:has-component-part :p1 .
            :p1 rdf:type hyque:Proposition ;
                hyque:specifies :e1 .
            :e1 rdf:type hyque:Event .
    19. Event RDF representation. Event: gal4p positively regulates the expression of GAL1
            :e1 rdf:type hyque:event ;
                # positive regulation of gene expression
                rdf:type <http://bio2rdf.org/go:0010628> ;
                hyque:agent <http://bio2rdf.org/sgd:Gal4p> ;
                hyque:target <http://bio2rdf.org/sgd:GAL1> ;
                hyque:is_negated "0" ;
                ….
    20. Event: gal4p positively regulates the expression of GAL1. HyQue’s SPIN rules retrieve event data, and then score it and the overall hypothesis. HyQue currently contains 63 SPIN rules to evaluate hypotheses: 18 system rules and 45 domain-specific rules.
    21. Combination of system and domain rules to retrieve and score data, and add new triples. Event (induction), evaluated by the SPIN induction rule:
            :e1 a go:0010628 ;
                hyque:agent sgd:Gal4p ;
                hyque:target sgd:GAL1 ;
                hyque:is_negated "0" .
    22. SPIN System Rule: Link Hypothesis to Evaluation
            CONSTRUCT {
              ?this ‘has attribute’ ?hypothesisEval .
              ?hypothesisEval a ‘evaluation’ .
              ?hypothesisEval ‘obtained from’ ?propositionEval .
              ?hypothesisEval ‘has value’ ?hypothesisEvalScore .
            }
            WHERE {
              ?this ‘has component part’ ?proposition .
              ?proposition ‘has attribute’ ?propositionEval .
              BIND(:calculateHypothesisScore(?this) AS ?hypothesisEvalScore) .
              BIND(IRI(fn:concat(afn:namespace(?this), afn:localname(?this), "_", "evaluation")) AS ?hypothesisEval) .
            }
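The last BIND in the rule above builds the IRI of the evaluation node from the hypothesis IRI. A rough Python sketch of that string construction follows. It is an approximation under stated assumptions: the helper names `split_iri` and `evaluation_iri` are hypothetical, the example IRI is made up, and the split at the last `#` or `/` only approximates what `afn:namespace` and `afn:localname` do in the rule.

```python
from typing import Tuple


def split_iri(iri: str) -> Tuple[str, str]:
    """Split an IRI into (namespace, local name) at the last '#' or '/'.

    Assumption: this simple split stands in for afn:namespace / afn:localname.
    """
    cut = max(iri.rfind("#"), iri.rfind("/")) + 1
    return iri[:cut], iri[cut:]


def evaluation_iri(hypothesis_iri: str) -> str:
    """Mirror fn:concat(namespace, localname, "_", "evaluation") from the rule."""
    namespace, local_name = split_iri(hypothesis_iri)
    return namespace + local_name + "_" + "evaluation"


# Hypothetical hypothesis IRI, used only for illustration.
h_eval = evaluation_iri("http://example.org/hyque#h1")
```

Deriving the evaluation IRI from the hypothesis IRI gives every hypothesis a predictable evaluation node, so the link from hypothesis to evaluation can be followed without a lookup.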
    23. SPIN Domain Rule: Score experimental evidence of Gene Expression Induction Event
            SELECT ?induceEventScore
            WHERE {
              BIND (:calculateInduceAgentTypeScore(?arg1) AS ?agentTypeScore) .
              BIND (:calculateInduceAgentFunctionTypeScore(?arg1) AS ?agentFunctionTypeScore) .
              BIND (:calculateInduceTargetTypeScore(?arg1) AS ?targetTypeScore) .
              BIND (:calculateInduceLogicalOperatorScore(?arg1) AS ?logicalOperatorScore) .
              BIND (:calculateInduceEventLocationScore(?arg1) AS ?eventLocationScore) .
              BIND (:penalizeNegation(?arg1) AS ?negationScore) .
              BIND (5 AS ?maxScore) .
              BIND ((?agentTypeScore + ?agentFunctionTypeScore + ?targetTypeScore + ?logicalOperatorScore + ?eventLocationScore + ?negationScore) / ?maxScore AS ?induceEventScore) .
            }
    24. HyQue domain rules calculate a quantitative measure of evidence for an event. ‘induce’ rule (maximum score: 5):
        • Is event negated?
            - If yes, subtract 2
        • Is event of type ‘induce’ (GO:0010628)?
            - If yes, add 1; if no, subtract 1
        • Is agent of type ‘protein’ (CHEBI:36080) or ‘RNA’?
            - If yes, add 1; if of type ‘gene’, subtract 1
        • Is target of type ‘gene’ (SO:0000236)?
            - If yes, add 1; if no, subtract 1
        • Does agent have known ‘transcription factor activity’ (GO:0003700)?
            - If yes, add 1
        • Is event located in the ‘nucleus’ (GO:0005634)?
            - If yes, add 1; if no, subtract 1
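The point scheme on this slide can be written out as a short Python function. This is a sketch of the arithmetic only, not the SPIN rules themselves: the function name, its parameters and the boolean/string encoding of the checks are assumptions made for illustration, and the real rules obtain these facts by querying linked data.

```python
MAX_INDUCE_SCORE = 5  # maximum score stated on the slide


def score_induce_event(
    is_induce: bool,
    agent_type: str,
    target_is_gene: bool,
    agent_is_transcription_factor: bool,
    in_nucleus: bool,
    is_negated: bool = False,
) -> float:
    """Apply the 'induce' point scheme and normalize by the maximum score."""
    points = 0
    points += 1 if is_induce else -1       # event of type 'induce'?
    if agent_type in ("protein", "RNA"):    # agent of type protein or RNA?
        points += 1
    elif agent_type == "gene":
        points -= 1
    points += 1 if target_is_gene else -1   # target of type gene?
    if agent_is_transcription_factor:       # known transcription factor activity?
        points += 1
    points += 1 if in_nucleus else -1       # event located in the nucleus?
    if is_negated:                          # negation penalty
        points -= 2
    return points / MAX_INDUCE_SCORE


# Every check passes: 5 points out of 5.
full = score_induce_event(True, "protein", True, True, True)
# One check earns no point (no known transcription factor activity): 4 out of 5.
partial = score_induce_event(True, "protein", True, False, True)
```

Because most failed checks subtract a point instead of adding none, a poorly supported event can score below zero, which separates contradicted events from merely undocumented ones.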
    25. SPIN rule, outcome and score for a GAL gene induction event: 4/5 = 0.80
    26. Can customize rules to get more evidence, but at a cost if not found
        • calculateInhibitEventScore does not take into account the physical location of the event participants
        • Experimental evidence suggests that physical location in the context of an inhibition event is important
        • Inhibition of Gal4p activity by Gal80p is known to take place in the nucleus, yet this inhibition is interrupted when Gal80p is bound by Gal3p, which is typically found in the cytoplasm
        Hypothesis: (Gal4p induces the expression of GAL1 [e1] AND Gal3p induces the expression of GAL2 [e2] AND Gal4p induces the expression of GAL7) OR (Gal4p induces the expression of GAL7 AND Gal80p induces the expression of GAL7 AND Gal80p does not inhibit the activity of Gal4p WHEN GAL3 is over-expressed)
        Adding a new rule to consider location weakens the event due to lack of data (score 0.87 -> 0.78)
    27. Customization of rules and rulesets can generate different evidence-based evaluations
    28. Reproducible eScience: LOD for Hypothesis, Rules, Data and Evaluation
    29. Summary
        • HyQue is a system that facilitates the formulation and evaluation of scientific hypotheses against formalized knowledge on the Semantic Web.
        • This work focused on the development and incorporation of recursive SPIN rules to obtain and score events and multi-event hypotheses using OWL ontologies and RDF-based LOD.
    30. Future Directions
        • Collaborative, end user-centered environment to engineer, share, compare and evaluate hypotheses
        • Investigate alternative scoring systems
        • Structure knowledge beyond the GAL network
            - EU/US collaborations on disease-centered research hypotheses
            - Applications for clinical decision support
    31. Contact: michel_dumontier@carleton.ca. Website: http://dumontierlab.com. Presentations: http://slideshare.com/micheldumontier