Evaluating scientific hypotheses using the SPARQL Inferencing Notation

Evaluating a hypothesis and its claims against experimental data is an essential scientific activity. However, this task is increasingly challenging given the ever-growing volume of publications and data sets. Towards addressing this challenge, we previously developed HyQue, a system for hypothesis formulation and evaluation. HyQue uses domain-specific rulesets to evaluate hypotheses based on well-understood scientific principles. However, because scientists may apply differing scientific premises when exploring a hypothesis, flexibility is required in both crafting and executing rulesets to evaluate hypotheses. Here, we report on an extension of HyQue that incorporates rules specified using the SPARQL Inferencing Notation (SPIN). Hypotheses, background knowledge, queries, results and now rulesets are represented and executed using Semantic Web technologies, enabling users to explicitly trace a hypothesis to its evaluation as Linked Data, including the data and rules used by HyQue. We demonstrate the use of HyQue to evaluate hypotheses concerning the yeast galactose gene system.

Slide notes
  • Can’t answer questions that require background knowledge
  • We represent a hypothesis as a collection of propositions
  • This is part of a hypothesis represented in N3 and used as input to HyQue. Note: binding between galactose and Gal3p does not return any results; there IS binding between Gal3p and Gal80p.
  • The RDF representing the evaluation of the input hypothesis is linked to both the hypothesis AND the data used to evaluate the hypothesis
  • This is a screenshot of some HyQue data in Virtuoso, a triple store system that we use to store and access RDF

Transcript

  • 1. Evaluating scientific hypotheses using the SPARQL Inferencing Notation. Alison Callahan and Michel Dumontier, Department of Biology, Carleton University. ESWC2012::HyQue-SPIN
  • 4. Uncovering all the evidence to support/refute a hypothesis is becoming increasingly difficult and requires a lot of digging around
  • 5. Continuous growth in research outputs. Sources: http://www.nlm.nih.gov/bsd/stats/cit_added.html, http://homepages.cs.ncl.ac.uk/m.j.bell1/blog/?p=1515
  • 6. Semantic Web technologies for biological knowledge management and discovery • Capability to publish, link, retrieve and query decentralized data • A powerful integrative platform across data, ontologies and services • Formal knowledge representation allows for automated reasoning • Massive growth in dataset availability and, soon, in application development
  • 7. A rapidly growing web of linked data. Figure: “Linking Open Data cloud diagram,” by Richard Cyganiak and Anja Jentzsch, http://lod-cloud.net/
  • 8. Bio2RDF covers the major biological databases
  • 9. BioPortal gives up-to-date access to bio-ontologies
  • 10. SADI provides access to Semantic Web services. The Semantic Automated Discovery and Integration (SADI) framework makes it easy to create Semantic Web services using OWL classes as service inputs and outputs. http://sadiframework.org • ~700 bioinformatics services as of May 29, 2012 • Mark Wilkinson (UBC), Michel Dumontier (Carleton University), Christopher Baker (UNB)
  • 11. HyQue. HyQue is the Hypothesis query and evaluation system: • A platform for knowledge discovery • Facilitates hypothesis formulation and evaluation • Leverages Semantic Web technologies to provide access to facts, expert knowledge and web services • Conforms to a simplified event-based model • Supports evaluation against positive and negative findings • Transparent and reproducible evidence prioritization • Provenance across all elements of hypothesis testing: trace a hypothesis to its evaluation, including the data and rules used. Callahan A, Dumontier M, Shah NH. HyQue: evaluating hypotheses using Semantic Web technologies. J Biomed Semantics. 2011 May 17;2 Suppl 2:S3.
  • 12. HyQue • Background knowledge as OWL ontologies: hypotheses (HO), processes/events (GO), measurement values (SIO), units (UO), evidence (ECO), molecules (ChEBI), biopolymers (SO), etc. • Facts as RDF data: model organism data such as genes and their chromosomal locations, proteins and their functions, localization and participation in interactions, complexes, pathways, biological processes, etc. • Evaluation rules defined using SPIN: domain-specific rules (scores based on external knowledge) and system rules (scores based on hypothesis structure). Callahan A, Dumontier M. Evaluating scientific hypotheses using the SPARQL Inferencing Notation. Extended Semantic Web Conference (ESWC 2012), Heraklion, Crete, May 27-31, 2012.
  • 13. HyQue Architecture
  • 14. A HyQue hypothesis is a collection of propositions • proposition: “a statement expressing something true or false” • HyQue propositions specify events • complex propositions can be formulated using logical operators (AND, OR, XOR, …) or decomposed using component relations. HyQue hypothesis ≡ ‘proposition’ that ‘specifies’ only ‘event’; HyQue hypothesis ≡ ‘proposition’ that ‘has component part’ only (‘proposition’ that ‘specifies’ only ‘event’)
  • 15. Event-based data model. HyQue events denote a phenomenon involving two objects: ‘agent’ and ‘target’. In addition, we can specify the context of the event (e.g. located in the nucleus, or under some genetic background). Currently supported events: 1. protein-protein binding 2. protein-nucleic acid binding 3. molecular activation 4. molecular inhibition 5. gene induction 6. gene repression 7. transport. Event properties: ‘has agent’ (agent), ‘has target’ (target), ‘is located in’ (location), ‘is negated’ (boolean).
  • 16. Example hypothesis • HyQue’s demonstration knowledge base is focused on galactose metabolism and regulation. The paper describes a union of hypotheses: (Gal4p induces the expression of GAL1 AND Gal4p induces the expression of GAL7 AND Gal3p induces the expression of GAL2) OR (Gal4p induces the expression of GAL7 AND Gal80p induces the expression of GAL7 AND Gal80p does not inhibit the activity of Gal4p WHEN GAL3 is over-expressed)
  • 17. Users don’t need to know RDF to formulate hypotheses: a user interface with auto-completion, http://hyque.semanticscience.org
  • 18. Hypothesis RDF representation (hypothesis ‘has component part’ proposition; proposition ‘specifies’ event): :h rdf:type hyque:Hypothesis ; hyque:has-component-part :p1 . :p1 rdf:type hyque:Proposition ; hyque:specifies :e1 . :e1 rdf:type hyque:Event .
  • 19. Event RDF representation (event: Gal4p positively regulates the expression of GAL1; GO:0010628 is ‘positive regulation of gene expression’): :e1 rdf:type hyque:event ; rdf:type <http://bio2rdf.org/go:0010628> ; hyque:agent <http://bio2rdf.org/sgd:Gal4p> ; hyque:target <http://bio2rdf.org/sgd:GAL1> ; hyque:is_negated "0" ; …
  • 20. Event: Gal4p positively regulates the expression of GAL1. HyQue’s SPIN rules retrieve event data, then score the event and the overall hypothesis. HyQue currently contains 63 SPIN rules to evaluate hypotheses: 18 system rules and 45 domain-specific rules.
  • 21. Combination of system and domain rules to retrieve and score data, and to add new triples. Event (induction), evaluated by the SPIN induction rule: :e1 a go:0010628 ; hyque:agent sgd:Gal4p ; hyque:target sgd:GAL1 ; hyque:is_negated "0" .
  • 22. SPIN system rule: link hypothesis to evaluation. CONSTRUCT { ?this ‘has attribute’ ?hypothesisEval . ?hypothesisEval a ‘evaluation’ . ?hypothesisEval ‘obtained from’ ?propositionEval . ?hypothesisEval ‘has value’ ?hypothesisEvalScore . } WHERE { ?this ‘has component part’ ?proposition . ?proposition ‘has attribute’ ?propositionEval . BIND(:calculateHypothesisScore(?this) AS ?hypothesisEvalScore) . BIND(IRI(fn:concat(afn:namespace(?this), afn:localname(?this), "_", "evaluation")) AS ?hypothesisEval) . }
  • 23. SPIN domain rule: score experimental evidence of a gene expression induction event. SELECT ?induceEventScore WHERE { BIND (:calculateInduceAgentTypeScore(?arg1) AS ?agentTypeScore) . BIND (:calculateInduceAgentFunctionTypeScore(?arg1) AS ?agentFunctionTypeScore) . BIND (:calculateInduceTargetTypeScore(?arg1) AS ?targetTypeScore) . BIND (:calculateInduceLogicalOperatorScore(?arg1) AS ?logicalOperatorScore) . BIND (:calculateInduceEventLocationScore(?arg1) AS ?eventLocationScore) . BIND (:penalizeNegation(?arg1) AS ?negationScore) . BIND (5 AS ?maxScore) . BIND (((((((?agentTypeScore + ?agentFunctionTypeScore) + ?targetTypeScore) + ?logicalOperatorScore) + ?eventLocationScore) + ?negationScore) / ?maxScore) AS ?induceEventScore) . }
  • 24. HyQue domain rules CALCULATE a quantitative measure of evidence for an event. ‘induce’ rule (maximum score: 5): – Is the event negated? If yes, subtract 2. – Is the event of type ‘induce’ (GO:0010628)? If yes, add 1; if no, subtract 1. – Is the agent of type ‘protein’ (CHEBI:36080) or ‘RNA’? If yes, add 1; if of type ‘gene’, subtract 1. – Is the target of type ‘gene’ (SO:0000236)? If yes, add 1; if no, subtract 1. – Does the agent have known ‘transcription factor activity’ (GO:0003700)? If yes, add 1. – Is the event located in the ‘nucleus’ (GO:0005634)? If yes, add 1; if no, subtract 1.
  • 25. SPIN rule, outcome and score for a GAL gene induction event: 4/5 = 0.80
  • 26. Rules can be customized to draw on more evidence, but at a cost if that evidence is not found • calculateInhibitEventScore does not take into account the physical location of the event participants • Experimental evidence suggests that physical location is important in the context of an inhibition event • Inhibition of Gal4p activity by Gal80p is known to take place in the nucleus, yet this inhibition is interrupted when Gal80p is bound by Gal3p, which is typically found in the cytoplasm • Hypothesis: (Gal4p induces the expression of GAL1 [e1] AND Gal3p induces the expression of GAL2 [e2] AND Gal4p induces the expression of GAL7) OR (Gal4p induces the expression of GAL7 AND Gal80p induces the expression of GAL7 AND Gal80p does not inhibit the activity of Gal4p WHEN GAL3 is over-expressed) • Adding a new rule to consider location weakens the event due to lack of data (score 0.87 -> 0.78)
  • 27. Customization of rules and rulesets can generate different evidence-based evaluations
  • 28. Reproducible eScience: LOD for hypothesis, rules, data and evaluation
  • 29. Summary • HyQue is a system that facilitates the formulation and evaluation of scientific hypotheses against formalized knowledge on the Semantic Web. • This work focused on the development and incorporation of recursive SPIN rules to obtain and score events and multi-event hypotheses using OWL ontologies and RDF-based LOD.
  • 30. Future directions • Collaborative, end-user-centered environment to engineer, share, compare and evaluate hypotheses • Investigate alternative scoring systems • Structure knowledge beyond the GAL network: EU/US collaborations on disease-centered research hypotheses; applications for clinical decision support
  • 31. dumontierlab.com • michel_dumontier@carleton.ca • Website: http://dumontierlab.com • Presentations: http://slideshare.com/micheldumontier
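The event-based data model on slide 15 can be sketched as a plain Python data structure. This is a minimal illustration, assuming an event carries only the four properties listed on the slide (other context, such as genetic background, is not modelled); the names `EventType`, `Event` and `properties` are hypothetical and not part of HyQue.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, Optional, Union


class EventType(Enum):
    """The seven event types listed on slide 15."""
    PROTEIN_PROTEIN_BINDING = "protein-protein binding"
    PROTEIN_NUCLEIC_ACID_BINDING = "protein-nucleic acid binding"
    MOLECULAR_ACTIVATION = "molecular activation"
    MOLECULAR_INHIBITION = "molecular inhibition"
    GENE_INDUCTION = "gene induction"
    GENE_REPRESSION = "gene repression"
    TRANSPORT = "transport"


@dataclass(frozen=True)
class Event:
    """Hypothetical event record: an agent acting on a target, plus optional context."""
    event_type: EventType
    agent: str                      # value of 'has agent'
    target: str                     # value of 'has target'
    location: Optional[str] = None  # value of 'is located in', if known
    negated: bool = False           # value of 'is negated'

    def properties(self) -> Dict[str, Union[str, bool]]:
        """Map the slide's property names to values, omitting an unknown location."""
        props: Dict[str, Union[str, bool]] = {
            "has agent": self.agent,
            "has target": self.target,
            "is negated": self.negated,
        }
        if self.location is not None:
            props["is located in"] = self.location
        return props


induction = Event(EventType.GENE_INDUCTION, "Gal4p", "GAL1", location="nucleus")
print(induction.properties())
```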
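The example on slide 16 is a Boolean combination of propositions. The slides do not say how HyQue turns proposition scores into a hypothesis score, so the sketch below simply assumes a common fuzzy-logic convention (AND takes the minimum, OR the maximum). The classes `Prop`, `And`, `Or`, the function `evaluate`, and all numeric scores are hypothetical, chosen only to exercise the code.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class Prop:
    """A leaf proposition with an already-computed score."""
    label: str
    score: float


@dataclass
class And:
    parts: List["Node"]


@dataclass
class Or:
    parts: List["Node"]


Node = Union[Prop, And, Or]


def evaluate(node: Node) -> float:
    """Combine scores recursively.

    ASSUMPTION: AND -> min, OR -> max; this is not HyQue's documented rule.
    """
    if isinstance(node, Prop):
        return node.score
    scores = [evaluate(part) for part in node.parts]
    return min(scores) if isinstance(node, And) else max(scores)


# Structure follows slide 16; the numeric scores are made up.
hypothesis = Or([
    And([Prop("Gal4p induces GAL1", 0.8),
         Prop("Gal4p induces GAL7", 0.8),
         Prop("Gal3p induces GAL2", 0.4)]),
    And([Prop("Gal4p induces GAL7", 0.8),
         Prop("Gal80p induces GAL7", 0.2),
         Prop("Gal80p does not inhibit Gal4p when GAL3 is over-expressed", 0.6)]),
])
print(evaluate(hypothesis))
```

Swapping in a different combination rule (for example, averaging) only requires changing the last line of `evaluate`, which is one way the rule customization discussed on slides 26 and 27 could extend to the hypothesis level.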
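Slides 18 and 19 give the hypothesis, proposition and event chain in Turtle. As a standard-library-only illustration, the sketch below builds the same triples as Python tuples and prints them as N-Triples, using `hyque:Event` as on slide 18. The Bio2RDF IRIs come from slide 19; the base IRIs behind the `:` and `hyque:` prefixes are not shown on the slides, so `EX` and `HYQUE` are placeholder assumptions, as are the helper names.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union

# Placeholder namespaces: the slides do not show the IRIs behind ':' and 'hyque:'.
EX = "http://example.org/hypothesis/"
HYQUE = "http://example.org/hyque#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
BIO2RDF = "http://bio2rdf.org/"


@dataclass(frozen=True)
class Literal:
    """A plain RDF literal such as the "0" used for hyque:is_negated."""
    value: str


Term = Union[str, Literal]
Triple = Tuple[str, str, Term]


def hypothesis_triples() -> List[Triple]:
    """Triples for :h -> :p1 -> :e1 (Gal4p positively regulates expression of GAL1)."""
    h, p1, e1 = EX + "h", EX + "p1", EX + "e1"
    return [
        (h, RDF_TYPE, HYQUE + "Hypothesis"),
        (h, HYQUE + "has-component-part", p1),
        (p1, RDF_TYPE, HYQUE + "Proposition"),
        (p1, HYQUE + "specifies", e1),
        (e1, RDF_TYPE, HYQUE + "Event"),
        (e1, RDF_TYPE, BIO2RDF + "go:0010628"),  # positive regulation of gene expression
        (e1, HYQUE + "agent", BIO2RDF + "sgd:Gal4p"),
        (e1, HYQUE + "target", BIO2RDF + "sgd:GAL1"),
        (e1, HYQUE + "is_negated", Literal("0")),
    ]


def to_ntriples(triples: List[Triple]) -> str:
    """Write IRIs as <...> and literals as escaped quoted strings, one triple per line."""
    def term(t: Term) -> str:
        if isinstance(t, Literal):
            escaped = t.value.replace("\\", "\\\\").replace('"', '\\"')
            return f'"{escaped}"'
        return f"<{t}>"
    return "\n".join(f"{term(s)} {term(p)} {term(o)} ." for s, p, o in triples)


print(to_ntriples(hypothesis_triples()))
```

In practice an RDF library or the triple store itself would handle serialization; the hand-rolled writer here only keeps the example dependency-free.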
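The system rule on slide 22 can be read procedurally: for each proposition of a hypothesis that already has an evaluation, emit triples linking a new hypothesis evaluation to it. Because `afn:namespace` and `afn:localname` split an IRI into two parts, concatenating them normally reassembles the original IRI, so the evaluation IRI is the hypothesis IRI with `_evaluation` appended. The sketch below mimics the CONSTRUCT over an in-memory set of triples; the predicates are the slide's labels used as plain strings, and `score_fn` is a hypothetical stand-in for `:calculateHypothesisScore`, whose definition the slides do not show.

```python
from typing import Callable, Set, Tuple

Triple = Tuple[str, str, object]

# Predicate labels as written on the slide (stand-ins for real IRIs).
HAS_COMPONENT_PART = "has component part"
HAS_ATTRIBUTE = "has attribute"
OBTAINED_FROM = "obtained from"
HAS_VALUE = "has value"
RDF_TYPE = "rdf:type"
EVALUATION = "evaluation"


def evaluation_iri(hypothesis: str) -> str:
    """Equivalent of namespace(?this) + localname(?this) + "_" + "evaluation"."""
    return hypothesis + "_" + "evaluation"


def link_hypothesis_to_evaluation(graph: Set[Triple], hypothesis: str,
                                  score_fn: Callable[[str], float]) -> Set[Triple]:
    """Return the triples the CONSTRUCT template would produce for ?this = hypothesis."""
    constructed: Set[Triple] = set()
    hyp_eval = evaluation_iri(hypothesis)
    score = score_fn(hypothesis)
    # WHERE: ?this 'has component part' ?proposition .
    propositions = {o for s, p, o in graph if s == hypothesis and p == HAS_COMPONENT_PART}
    for proposition in propositions:
        # WHERE: ?proposition 'has attribute' ?propositionEval .
        for s, p, prop_eval in graph:
            if s == proposition and p == HAS_ATTRIBUTE:
                constructed |= {
                    (hypothesis, HAS_ATTRIBUTE, hyp_eval),
                    (hyp_eval, RDF_TYPE, EVALUATION),
                    (hyp_eval, OBTAINED_FROM, prop_eval),
                    (hyp_eval, HAS_VALUE, score),
                }
    return constructed


graph = {
    ("ex:h", HAS_COMPONENT_PART, "ex:p1"),
    ("ex:h", HAS_COMPONENT_PART, "ex:p2"),
    ("ex:p1", HAS_ATTRIBUTE, "ex:p1_evaluation"),
    ("ex:p2", HAS_ATTRIBUTE, "ex:p2_evaluation"),
}
result = link_hypothesis_to_evaluation(graph, "ex:h", lambda h: 0.5)
```

As with SPARQL CONSTRUCT, duplicate triples collapse (here via a set), and a hypothesis whose propositions have no evaluations yet produces no output.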
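The ‘induce’ rule on slide 24 is point arithmetic normalized by a maximum score of 5. The sketch below encodes that checklist directly. It follows the criteria as listed on slide 24; the SPIN function on slide 23 names a logical-operator score rather than an event-type check, which this sketch does not model. The `InduceEvent` fields and plain-text type labels are hypothetical stand-ins for the ontology classes cited on the slide.

```python
from dataclasses import dataclass, field
from typing import Optional, Set

MAX_SCORE = 5  # maximum score of the 'induce' rule (slide 24)


@dataclass
class InduceEvent:
    """Hypothetical flattened view of the data a SPIN rule would retrieve for one event."""
    event_type: str                   # e.g. "induce" (GO:0010628)
    agent_type: str                   # e.g. "protein", "RNA" or "gene"
    target_type: str                  # e.g. "gene"
    agent_functions: Set[str] = field(default_factory=set)
    location: Optional[str] = None    # e.g. "nucleus"
    negated: bool = False


def induce_score(ev: InduceEvent) -> float:
    """Apply the slide-24 checklist and divide the point total by MAX_SCORE."""
    points = 0
    if ev.negated:
        points -= 2                                   # negation penalty
    points += 1 if ev.event_type == "induce" else -1
    if ev.agent_type in ("protein", "RNA"):
        points += 1
    elif ev.agent_type == "gene":
        points -= 1                                   # other agent types add nothing
    points += 1 if ev.target_type == "gene" else -1
    if "transcription factor activity" in ev.agent_functions:
        points += 1                                   # no penalty when absent
    points += 1 if ev.location == "nucleus" else -1
    return points / MAX_SCORE


full = InduceEvent("induce", "protein", "gene",
                   {"transcription factor activity"}, "nucleus")
no_tf = InduceEvent("induce", "protein", "gene", set(), "nucleus")
print(induce_score(full), induce_score(no_tf))
```

An event meeting every criterion except known transcription factor activity scores 4/5 = 0.80, the value reported on slide 25, although the slides do not say which criterion that example missed. Taken literally, the checklist also allows negative scores (as low as -6/5); whether HyQue clamps them is not stated.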
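Slides 26 and 27 describe how adding a rule changes an evaluation: a location rule rewards supporting data but penalizes its absence, so an event lacking location data scores lower. The sketch below reproduces that effect with a generic ruleset in which each rule adds a reward when satisfied and subtracts a penalty otherwise, normalized by the best achievable total. The `Rule` type, the rules and the event are hypothetical, and the numbers do not reproduce the 0.87 and 0.78 scores reported on slide 26.

```python
from typing import Callable, List, NamedTuple, Optional


class Rule(NamedTuple):
    """A scoring rule; check returns True, False, or None when the data is missing."""
    name: str
    check: Callable[[dict], Optional[bool]]
    reward: int   # added when check returns True
    penalty: int  # subtracted when check returns False or None ("a cost if not found")


def score(event: dict, ruleset: List[Rule]) -> float:
    """Sum rule outcomes and divide by the maximum achievable total."""
    total = sum(r.reward if r.check(event) else -r.penalty for r in ruleset)
    return total / sum(r.reward for r in ruleset)


base_rules = [
    Rule("is inhibition", lambda e: e.get("type") == "inhibition", 1, 1),
    Rule("agent is protein", lambda e: e.get("agent_type") == "protein", 1, 1),
    Rule("target is protein", lambda e: e.get("target_type") == "protein", 1, 1),
]
# Hypothetical customization: also ask whether the event is located in the nucleus.
location_rule = Rule(
    "located in nucleus",
    lambda e: None if "location" not in e else e["location"] == "nucleus",
    1, 1,
)

event = {"type": "inhibition", "agent_type": "protein", "target_type": "protein"}
print(score(event, base_rules))                    # ruleset without the location rule
print(score(event, base_rules + [location_rule]))  # location data missing, so penalized
```

Because the same event is scored under two rulesets, the difference between the outputs is attributable to the added rule alone, which is the kind of comparison slide 27 refers to.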