Drug-discovery knowledge integration and analysis using OWL and reasoners

Tutorial presented at SWAT4LS 2012 on how to leverage the content of databases with the Web Ontology Language (OWL).

  1. Knowledge integration and analysis using OWL and reasoners
     SWAT4LS 2012 Tutorial, November 29th
     Samuel Croset & Dietrich Rebholz-Schumann
  2. Material
     • Files: http://bit.ly/WRkefF
     • Protégé: http://stanford.io/83YUJ4
     • Brain: http://bit.ly/TYGj4O
  3. Tutorial
     • Ask questions!
     • What is OWL?
     • Why is it particularly interesting for the life sciences?
     • How to use OWL?
     • What is OWL 2 EL?
     • How to integrate and query biomedical knowledge?
  4. Why learn OWL?
     “The scientist is not a person who gives the right answers, he’s one who asks the right questions” ― Claude Lévi-Strauss
     “Half of science is putting forth the right questions” ― Sir Francis Bacon
  5. Why learn OWL?
     “What are the human proteins that regulate blood coagulation?”
  6. Why learn OWL?
     [Diagram: the question “What are the human proteins that regulate blood coagulation?” surrounded by three resources: a classification (flat file), a database (SQL or RDF) and an ontology (OBO).]
  7. Why learn OWL?
     [Diagram: the question “What are the human proteins that regulate blood coagulation?” surrounded by a classification (flat file), a database (SQL or RDF) and an ontology (OBO), annotated with the open problems: How do I integrate the data? What does it even mean? What are the parts, what is composing it?]
  8. Why learn OWL?
     • Existing resources can already answer the question → but they need to interact.
     • Ontologies are not only labels for biological concepts (“blood coagulation”) → they help to formalize domain knowledge.
     • We want to mix traditional ontologies with other large-scale data.
     • We want an intuitive way to formulate the query, hiding the implementation.
  9. What is OWL?
     • The Semantic Web: RDF → URIs and triples → should improve interoperability over the Web
     • Need for shared schemas → ontologies
     • OWL → description logics and knowledge representation; decidable, with attractive and well-understood computational properties
     • OWL → Direct Semantics or RDF-Based Semantics
  10. What is OWL?
      • The relations between OWL, RDF, SPARQL, reasoning, etc. are confusing.
      • Here we deal with the Direct Semantics of OWL (no RDF) → it’s easier!
      • You get to use the reasoner a lot!
      • In OWL you build knowledge bases and ontologies.
  11. OWL and Life Sciences
      Advantages over RDF, SQL and flat files?
      • Formal language to represent classifications and ontologies
      • Machine reasoning
      • Large scale (OWL 2 EL)
      • Knowledge integration
      • Composition
      • Powerful query mechanism
  12. OWL 2 Terminology
      • It’s all about definitions!
      • Defining things based on the relations they have
      • Entities: elements used to refer to real-world objects
      • Expressions: combinations of entities to form complex descriptions from basic ones
      • Axioms: the basic statements that an OWL ontology expresses → pieces of knowledge
      http://www.w3.org/TR/owl2-primer/#Modeling_Knowledge:_Basic_Notions
  13. Entities
      • Classes: categories and terminology
        – Protein, Human, Drug, Chemical, P53, Binding site, etc. → pretty much everything in the life sciences
      • Individuals (objects): instances
        – Rex the dog, this mouse on the bench, you, etc.
      • Properties: relations between individuals
        – part of, regulates, perturbs, etc.
  14. Axioms
      • Statements, pieces of knowledge → they express what is true.
      • How classes and properties relate to each other:
        – All humans are mammals → Human is a subclass of Mammal
      • Our first OWL axiom: SubClassOf
  15. Ontology/Knowledge base
      • A set of axioms
      • Serialized as “.owl”:
          ObjectProperty: part-of
          Class: owl:Thing
          Class: Cell
          Class: Nucleus
              SubClassOf: part-of some Cell
  16. Terminology Summary
      [Diagram: example terms labelled by kind: Scientist, Person (Class); regulates, works in (Property); John, Paris (Individual).]
  17. Terminology Summary
      An ontology/knowledge base is a set of axioms:
      • John is a Scientist
      • Paris is a City
      • Scientist is a Person
      • John works in Paris
      [Diagram labels: Axiom, Class, Individual, Property]
  18. Terminology Summary
      Output in Manchester Syntax:
          Ontology: <brain.owl>
          ObjectProperty: part-of
          Class: owl:Thing
          Class: Cell
          Class: Nucleus
              SubClassOf: part-of some Cell
  19. Terminology Summary
      Output in RDF (Turtle):
          <brain.owl> rdf:type owl:Ontology .
          :part-of rdf:type owl:ObjectProperty .
          :Cell rdf:type owl:Class .
          :Nucleus rdf:type owl:Class ;
              rdfs:subClassOf [ rdf:type owl:Restriction ;
                                owl:onProperty :part-of ;
                                owl:someValuesFrom :Cell ] .
          owl:Thing rdf:type owl:Class .
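The Manchester and Turtle listings serialize the same axiom. A minimal Java sketch of that correspondence, using plain strings rather than an OWL library (the class `AxiomRendering` and its methods are hypothetical), renders `Nucleus SubClassOf part-of some Cell` both ways; in Turtle the existential restriction becomes an anonymous node of type `owl:Restriction`.

```java
// Hypothetical illustration: one axiom, two serializations.
// No OWL library is used; the strings mirror the Manchester and Turtle listings.
class AxiomRendering {

    // SubClassOf(sub, ObjectSomeValuesFrom(property, filler)) in Manchester syntax.
    static String toManchester(String sub, String property, String filler) {
        return "Class: " + sub + "\n"
             + "    SubClassOf: " + property + " some " + filler;
    }

    // The same axiom in Turtle: the restriction is a blank node.
    static String toTurtle(String sub, String property, String filler) {
        return ":" + sub + " rdf:type owl:Class ;\n"
             + "    rdfs:subClassOf [ rdf:type owl:Restriction ;\n"
             + "                      owl:onProperty :" + property + " ;\n"
             + "                      owl:someValuesFrom :" + filler + " ] .";
    }

    public static void main(String[] args) {
        System.out.println(toManchester("Nucleus", "part-of", "Cell"));
        System.out.println();
        System.out.println(toTurtle("Nucleus", "part-of", "Cell"));
    }
}
```

The Manchester form hides the blank node, which is one reason it is easier to read and write by hand.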
  20. Exercise 1 – Classes and axioms
      • Open the file “NCBI-taxonomy-mammals.owl” with a text editor. Can you understand what’s inside?
      • Now open the file with Protégé and go to the “Classes” tab. You can use the option “Render by label” in the “View” menu.
      • Can you recognize the classes? What do they describe?
      • Can you spot the axioms? What do they capture?
  21. Reasoner
      • A program that understands the axioms and can deduce new knowledge from them.
      • Used to classify the ontology.
      • Query engine for knowledge bases.
      • More or less fast depending on the number and type of axioms.
  22. Exercise 2 – Reasoning
      • In Protégé, go to the “DL Query” tab and retrieve all descendant classes of the class “Abrothrix” (or “NCBI_156196”).
      • What does this query mean?
  23. Comparison against MySQL
          SELECT s.*
          FROM species AS s, species AS t
          WHERE s.left_value BETWEEN t.left_value AND t.right_value
            AND t.common_name = 'abrothrix';
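The SQL query relies on a nested-set encoding: a depth-first walk gives every node a `left_value` and a `right_value`, and each descendant's interval lies inside its ancestor's. The Java sketch below builds that encoding over a toy taxonomy typed by hand (not read from NCBI-taxonomy-mammals.owl; the class name is hypothetical) and checks that the interval test returns the same classes as simply following subclass links down. Note the trade-off: the encoding only works for single-parent trees and must be recomputed whenever the tree changes, whereas the reasoner derives the hierarchy from the axioms themselves.

```java
import java.util.*;

// Toy taxonomy fragment for illustration only. Every node has a single
// parent, which the nested-set encoding requires.
class NestedSetDemo {
    static final Map<String, List<String>> CHILDREN = new LinkedHashMap<>();
    static final Map<String, int[]> INTERVAL = new LinkedHashMap<>();

    static {
        addChild("Rodentia", "Cricetidae");
        addChild("Cricetidae", "Abrothrix");
        addChild("Cricetidae", "Peromyscus");
        addChild("Abrothrix", "Abrothrix longipilis");
        addChild("Abrothrix", "Abrothrix olivaceus");
        number("Rodentia", 1);
    }

    static void addChild(String parent, String child) {
        CHILDREN.computeIfAbsent(parent, k -> new ArrayList<>()).add(child);
    }

    // Depth-first numbering: left_value on the way down, right_value on the way up.
    static int number(String node, int counter) {
        int left = counter++;
        for (String child : CHILDREN.getOrDefault(node, List.of())) {
            counter = number(child, counter);
        }
        int right = counter++;
        INTERVAL.put(node, new int[] {left, right});
        return counter;
    }

    // Mirrors the SQL: s.left_value BETWEEN t.left_value AND t.right_value (t itself included).
    static Set<String> descendantsByInterval(String t) {
        int[] bounds = INTERVAL.get(t);
        Set<String> result = new TreeSet<>();
        for (Map.Entry<String, int[]> e : INTERVAL.entrySet()) {
            int left = e.getValue()[0];
            if (left >= bounds[0] && left <= bounds[1]) result.add(e.getKey());
        }
        return result;
    }

    // Plain traversal of told subclass links, starting from t.
    static Set<String> descendantsByTraversal(String t) {
        Set<String> result = new TreeSet<>();
        Deque<String> todo = new ArrayDeque<>();
        todo.push(t);
        while (!todo.isEmpty()) {
            String node = todo.pop();
            if (result.add(node)) {
                for (String child : CHILDREN.getOrDefault(node, List.of())) todo.push(child);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Both print [Abrothrix, Abrothrix longipilis, Abrothrix olivaceus]
        System.out.println(descendantsByInterval("Abrothrix"));
        System.out.println(descendantsByTraversal("Abrothrix"));
    }
}
```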
  24. Constructs – Class expressions
      • Combining classes and properties to define more things (class expressions) → composition
      • Intersection: and
        – Mammal and Omnivore
      • Existential restriction: some
        – part-of some Cell
      [Illustration: cuneiform script (3000 BC), where the signs for “head” and “food” combine into “eat”.]
  25. Construct: and
      [Venn diagram: the classes Mammal and Omnivore overlap; the overlap is “Mammal and Omnivore”; dots stand for individuals.]
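Under the OWL semantics a class stands for a set of individuals, and `Mammal and Omnivore` stands for the intersection of the two sets. A minimal Java sketch with hypothetical individuals (the class name `IntersectionDemo` is also hypothetical):

```java
import java.util.*;

// Set semantics of "and", over hypothetical individuals only.
class IntersectionDemo {
    // The extension of "A and B" is the intersection of the extensions of A and B.
    static Set<String> and(Set<String> a, Set<String> b) {
        Set<String> result = new TreeSet<>(a);
        result.retainAll(b);
        return result;
    }

    public static void main(String[] args) {
        Set<String> mammal = Set.of("john", "pig1", "cow1");
        Set<String> omnivore = Set.of("john", "pig1", "crow1");
        System.out.println(and(mammal, omnivore)); // [john, pig1]
    }
}
```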
  26. Constructs & axioms
      Human SubClassOf Mammal and Omnivore
      [Venn diagram: Human is drawn inside the overlap “Mammal and Omnivore”; Pig also lies in that overlap, outside Human; dots stand for individuals.]
  27. Constructs & axioms
      Human SubClassOf Mammal and Omnivore
      This definition (Mammal and Omnivore) of the concept “Human” is partial.
      • According to our definition, every human must be at least a mammal and an omnivore.
      • But being a mammal and an omnivore does not necessarily make you human!
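The difference between a partial definition (SubClassOf) and a complete one (EquivalentClasses) can be sketched in a few lines of Java. This is a hand-coded illustration of the two entailments for this single axiom, not a reasoner; the class `DefinitionDemo` and the pig individual are hypothetical, mirroring the Pig on slide 26.

```java
import java.util.*;

// Hand-coded entailments for one axiom about Human; not a general reasoner.
class DefinitionDemo {
    // Types inferred for an individual from its asserted types, given either
    // Human SubClassOf (Mammal and Omnivore)        -> equivalent == false, or
    // Human EquivalentTo (Mammal and Omnivore)      -> equivalent == true.
    static Set<String> inferTypes(Set<String> asserted, boolean equivalent) {
        Set<String> types = new TreeSet<>(asserted);
        if (types.contains("Human")) {        // necessary conditions: hold in both cases
            types.add("Mammal");
            types.add("Omnivore");
        }
        if (equivalent && types.contains("Mammal") && types.contains("Omnivore")) {
            types.add("Human");               // sufficient conditions: only with EquivalentClasses
        }
        return types;
    }

    public static void main(String[] args) {
        Set<String> pig = Set.of("Mammal", "Omnivore");
        System.out.println(inferTypes(pig, false)); // [Mammal, Omnivore]
        System.out.println(inferTypes(pig, true));  // [Human, Mammal, Omnivore]
    }
}
```

With SubClassOf the pig stays outside Human; turning the axiom into EquivalentClasses would classify it as human, which is why the definition is kept partial.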
  28. Construct: some
      Existential restriction: a weird construct at first, but useful when dealing with incomplete knowledge.
      P some C: anything in this class is linked by P to at least one instance of C.
      [Diagram: individuals linked by part-of to instances of Cell make up the class “part-of some Cell”.]
  29. Constructs & axioms
      Nucleus SubClassOf part-of some Cell
      “Each nucleus must be part of a cell”
      [Diagram: Nucleus drawn inside “part-of some Cell”, each nucleus linked by part-of to a Cell.]
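Formally, an individual belongs to `part-of some Cell` exactly when it has at least one part-of link to an instance of Cell. The Java sketch below evaluates that condition over a small, fully listed set of hypothetical individuals (all names in it are hypothetical). This is a closed-world simplification: in OWL, the axiom `Nucleus SubClassOf part-of some Cell` also lets a reasoner conclude that such a cell exists even when it is never named.

```java
import java.util.*;

// Evaluating "property some filler" over a finite, fully listed interpretation.
class ExistentialDemo {
    // part-of as subject -> objects; hypothetical individuals.
    static final Map<String, Set<String>> PART_OF = Map.of(
        "nucleus1", Set.of("cell1"),
        "ribosome1", Set.of("cell2"),
        "tooth1", Set.of("jaw1"));
    static final Set<String> CELL = Set.of("cell1", "cell2");

    // Individuals with at least one link through `property` to a member of `filler`.
    static Set<String> some(Map<String, Set<String>> property, Set<String> filler) {
        Set<String> result = new TreeSet<>();
        for (Map.Entry<String, Set<String>> e : property.entrySet()) {
            for (String y : e.getValue()) {
                if (filler.contains(y)) {
                    result.add(e.getKey());
                    break;
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(some(PART_OF, CELL)); // [nucleus1, ribosome1]
    }
}
```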
  30. Exercise 3 – Implementing the axiom
      • Create a new project in Protégé.
      • Implement “Human SubClassOf Mammal and Omnivore”.
      • Run the reasoner and look at the class hierarchy. Does it make sense?
      • That’s the main role of the reasoner → classifying things based on their definitions.
      • “Conceptual Lego”: http://www.co-ode.org/resources/papers/ekaw2004.pdf
  31. OWL concepts
      • Class: basic block
      • Property: basic block
      • Constructor: used in class expressions
      • Class expression: class + property + constructor
      • Axiom: relations between these entities
  32. OWL concepts
      Axioms fall into three groups:
      • TBox (terminological axioms): SubClassOf, EquivalentClasses, DisjointClasses, …
      • RBox (relational axioms): SubObjectPropertyOf, EquivalentObjectProperties, ObjectPropertyChain, TransitiveObjectProperty, …
      • ABox (assertional axioms): ClassAssertion, …
  33. Real-life example: the Gene Ontology
      • Originally in the Open Biomedical Ontology (OBO) format, e.g. Nucleus part-of Cell
      • Moved to OWL → stronger semantics
      http://www.geneontology.org/GO.ontology-ext.relations.shtml
  34. GO constructs
      • Central pattern: A SubClassOf P some B
        – Nucleus SubClassOf part-of some Cell (in OBO: Nucleus part-of Cell)
      http://www.geneontology.org/GO.ontology-ext.relations.shtml
  35. GO – RBox
  36. GO – RBox: part-of (transitivity)
  37. Exercise 4 – Transitive property
      • Open the “gene_ontology.owl” file.
      • What are the things that are a biological_process and part_of some wound healing?
      • Look at the class “blood coagulation, common pathway”. Is it obvious why this class is in the results?
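Why a class can appear among the Exercise 4 answers without a direct part_of link is easiest to see on a toy fragment. The Java sketch below uses hypothetical class names and edges (not read from gene_ontology.owl) and ignores SubClassOf propagation; it only shows how declaring part_of transitive lets a two-step chain answer the query `part_of some wound_healing`.

```java
import java.util.*;

// Toy fragment: told "X SubClassOf part_of some Y" edges, assumed for illustration.
class TransitiveDemo {
    static final Map<String, Set<String>> PART_OF = Map.of(
        "coagulation_common_pathway", Set.of("blood_coagulation"),
        "blood_coagulation", Set.of("wound_healing"));

    // Every Y such that X SubClassOf part_of some Y follows when part_of is transitive.
    static Set<String> partOfClosure(String x) {
        Set<String> seen = new TreeSet<>();
        Deque<String> todo = new ArrayDeque<>(PART_OF.getOrDefault(x, Set.of()));
        while (!todo.isEmpty()) {
            String y = todo.pop();
            if (seen.add(y)) todo.addAll(PART_OF.getOrDefault(y, Set.of()));
        }
        return seen;
    }

    // Answers "part_of some target": every class whose closure reaches the target.
    static Set<String> partOfSome(String target) {
        Set<String> answers = new TreeSet<>();
        for (String x : PART_OF.keySet()) {
            if (partOfClosure(x).contains(target)) answers.add(x);
        }
        return answers;
    }

    public static void main(String[] args) {
        System.out.println(partOfSome("wound_healing")); // [blood_coagulation, coagulation_common_pathway]
    }
}
```

Without transitivity, only `blood_coagulation` would be returned.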
  38. GO – RBox: regulates (property chain)
  39. Exercise 5 – Chained properties
      • Look at the “regulates” property in Protégé.
      • What are the things that are a biological_process and regulates some mitotic cell cycle?
      • Look at the class “positive regulation of syncytial blastoderm mitotic cell cycle”.
      • Is it obvious why this class is in the results?
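The sketch below assumes the chain described in the GO relations documentation: if A regulates B and B is part_of C, then A regulates C (regulates o part_of SubPropertyOf regulates), together with a transitive part_of. It is a minimal Java illustration; the class names and edges are hypothetical, not taken from gene_ontology.owl.

```java
import java.util.*;

// Toy RBox sketch; names and edges are hypothetical.
class ChainDemo {
    static final Map<String, Set<String>> REGULATES =
            Map.of("regulation_of_process_X", Set.of("process_X"));
    static final Map<String, Set<String>> PART_OF =
            Map.of("process_X", Set.of("process_Y"), "process_Y", Set.of("process_Z"));

    // All classes reachable through one or more part_of links (part_of is transitive).
    static Set<String> partOfPlus(String c) {
        Set<String> seen = new TreeSet<>();
        Deque<String> todo = new ArrayDeque<>(PART_OF.getOrDefault(c, Set.of()));
        while (!todo.isEmpty()) {
            String next = todo.pop();
            if (seen.add(next)) todo.addAll(PART_OF.getOrDefault(next, Set.of()));
        }
        return seen;
    }

    // Assumed chain: A regulates B and B part_of C  =>  A regulates C.
    static Set<String> inferredRegulates(String a) {
        Set<String> result = new TreeSet<>();
        for (String b : REGULATES.getOrDefault(a, Set.of())) {
            result.add(b);                 // told regulates link
            result.addAll(partOfPlus(b));  // chain applied once per part_of step
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(inferredRegulates("regulation_of_process_X")); // [process_X, process_Y, process_Z]
    }
}
```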
  40. GO – RBox: positively/negatively regulates (sub-properties)
  41. Exercise 6 – Sub-properties
      • Look at the “positively_regulates” property in Protégé.
      • What are the things that are a biological_process and positively_regulates some mitotic cell cycle?
      • Are they different from the things that are a biological_process and regulates some mitotic cell cycle?
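Sub-properties work in one direction only: every positively_regulates link is also a regulates link, but not the reverse. A Java sketch with hypothetical classes, assuming `positively_regulates` and `negatively_regulates` are both declared SubPropertyOf `regulates`:

```java
import java.util.*;

// Toy sketch of sub-property reasoning; classes and edges are hypothetical.
class SubPropertyDemo {
    // Told edges: class -> (property -> targets).
    static final Map<String, Map<String, Set<String>>> EDGES = Map.of(
        "positive_regulation_of_P", Map.of("positively_regulates", Set.of("P")),
        "negative_regulation_of_P", Map.of("negatively_regulates", Set.of("P")),
        "regulation_of_P", Map.of("regulates", Set.of("P")));

    // Assumed RBox: direct super-property of each sub-property.
    static final Map<String, String> SUPER_PROPERTY = Map.of(
        "positively_regulates", "regulates",
        "negatively_regulates", "regulates");

    // True if p equals q or is a (transitive) sub-property of q.
    static boolean isSubPropertyOf(String p, String q) {
        for (String cur = p; cur != null; cur = SUPER_PROPERTY.get(cur)) {
            if (cur.equals(q)) return true;
        }
        return false;
    }

    // Classes C for which "C SubClassOf property some target" follows from the told edges.
    static Set<String> query(String property, String target) {
        Set<String> answers = new TreeSet<>();
        for (Map.Entry<String, Map<String, Set<String>>> cls : EDGES.entrySet()) {
            for (Map.Entry<String, Set<String>> edge : cls.getValue().entrySet()) {
                if (isSubPropertyOf(edge.getKey(), property) && edge.getValue().contains(target)) {
                    answers.add(cls.getKey());
                }
            }
        }
        return answers;
    }

    public static void main(String[] args) {
        System.out.println(query("regulates", "P"));
        System.out.println(query("positively_regulates", "P"));
    }
}
```

Here `query("regulates", "P")` returns all three classes, while `query("positively_regulates", "P")` returns only `positive_regulation_of_P`.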
  42. Exercise 7 – Verifying properties
      • Are we respecting the GO specifications?
  43. Summary GO
      • Concepts are defined using a single construct (A SubClassOf P some B).
      • Rich RBox
      • OWL helps to represent these relations and to abstract away the details.
  44. Knowledge integration
      • We would like to answer questions over all the different sources of knowledge.
      • “Thrombosis is a widespread condition and a leading cause of death in the UK.”
      • We would like to find a new protein target to treat thrombosis.
      • Here we would like to know “what are the human proteins that regulate blood coagulation?”
  45. Knowledge bases
      • Species: NCBI Taxonomy
      • Biological processes: Gene Ontology
      • Proteins: UniProt
  46. Exercise 8 – Integrating knowledge
      • Open the file “uniprot.owl”.
      • Do you understand its content?
      • Now open the file “integrated.owl”.
      • How would you formulate the question “what are the human proteins that regulate blood coagulation?” in OWL?
      • involved_in some (regulates some blood coagulation) and expressed_in some Homo sapiens
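The query is built compositionally: one existential restriction nested inside another, joined to a third by an intersection. A small Java evaluator makes that structure explicit. The protein and process names are hypothetical, the matching is purely structural over told edges, and a real reasoner would additionally use the class hierarchy, transitivity and property chains.

```java
import java.util.*;

// Structural evaluation of a nested class expression over hypothetical told edges.
class IntegratedQueryDemo {
    // Told edges: subject -> property -> objects (all names hypothetical).
    static final Map<String, Map<String, Set<String>>> EDGES = Map.of(
        "protein_A", Map.of("involved_in", Set.of("regulation_of_coagulation"),
                            "expressed_in", Set.of("Homo_sapiens")),
        "protein_B", Map.of("involved_in", Set.of("regulation_of_coagulation"),
                            "expressed_in", Set.of("Mus_musculus")),
        "protein_C", Map.of("involved_in", Set.of("blood_coagulation"),
                            "expressed_in", Set.of("Homo_sapiens")),
        "regulation_of_coagulation", Map.of("regulates", Set.of("blood_coagulation")));

    // A class expression is a test on a subject name.
    interface Expr { boolean holds(String c); }

    static Expr named(String name) { return c -> c.equals(name); }

    // "property some filler": at least one property link to something satisfying filler.
    static Expr some(String property, Expr filler) {
        return c -> EDGES.getOrDefault(c, Map.of())
                         .getOrDefault(property, Set.of())
                         .stream().anyMatch(filler::holds);
    }

    static Expr and(Expr a, Expr b) { return c -> a.holds(c) && b.holds(c); }

    static Set<String> answers(Expr query) {
        Set<String> result = new TreeSet<>();
        for (String c : EDGES.keySet()) if (query.holds(c)) result.add(c);
        return result;
    }

    public static void main(String[] args) {
        Expr query = and(
            some("involved_in", some("regulates", named("blood_coagulation"))),
            some("expressed_in", named("Homo_sapiens")));
        System.out.println(answers(query)); // [protein_A]
    }
}
```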
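  47. Implementation using Brain
          import java.util.List; // plus the Brain class from the Brain library (see “Material”)

          public class BloodCoagulationQuery {
              public static void main(String[] args) throws Exception {
                  Brain brain = new Brain();
                  brain.learn("data/gene_ontology.owl");
                  brain.learn("data/NCBI-taxonomy-mammals.owl");
                  brain.learn("data/uniprot.owl");
                  // GO_0007596 = blood coagulation, NCBI_9606 = Homo sapiens
                  String query = "involved_in some (regulates some GO_0007596) and "
                          + "expressed_in some NCBI_9606";
                  List<String> subClasses = brain.getSubClasses(query, false); // false: not only direct subclasses
                  System.out.println(subClasses);
                  brain.sleep(); // release the reasoner's resources
              }
          }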
  48. Large-scale implementation
      • OWL reasoning is computationally intensive → OWL 2 EL
      • Fewer axioms and constructs → easier for you to remember and easier for the reasoner to compute
      • Suited to the life sciences → lots of classes, few instances
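The tractability of OWL 2 EL comes from the fact that classification can be done by applying a few completion rules until nothing changes. The Java sketch below is a naive version of the standard EL completion rules for normalized axioms (no owl:Thing, owl:Nothing, role hierarchies or property chains). It is not how any particular reasoner is implemented, and `CellPart` and `MammalianOmnivore` are hypothetical class names. Because the inferred sets only grow and are bounded by the number of class names and properties, the loop stops after polynomially many passes.

```java
import java.util.*;

// Naive EL completion over normalized axioms only.
class ElSaturation {
    final Set<String> names = new LinkedHashSet<>();
    final List<String[]> subAx = new ArrayList<>();       // A SubClassOf B          -> {A, B}
    final List<String[]> conjAx = new ArrayList<>();      // A1 and A2 SubClassOf B  -> {A1, A2, B}
    final List<String[]> someRightAx = new ArrayList<>(); // A SubClassOf r some B   -> {A, r, B}
    final List<String[]> someLeftAx = new ArrayList<>();  // r some A SubClassOf B   -> {r, A, B}

    void addSub(String a, String b) { subAx.add(new String[] {a, b}); names.addAll(List.of(a, b)); }
    void addConj(String a1, String a2, String b) { conjAx.add(new String[] {a1, a2, b}); names.addAll(List.of(a1, a2, b)); }
    void addSomeRight(String a, String r, String b) { someRightAx.add(new String[] {a, r, b}); names.addAll(List.of(a, b)); }
    void addSomeLeft(String r, String a, String b) { someLeftAx.add(new String[] {r, a, b}); names.addAll(List.of(a, b)); }

    // For each class name A, every class name B such that A SubClassOf B is entailed.
    Map<String, Set<String>> saturate() {
        Map<String, Set<String>> s = new HashMap<>();
        Set<List<String>> links = new HashSet<>();            // (r, A, B): A SubClassOf r some B
        for (String a : names) s.put(a, new HashSet<>(Set.of(a)));
        boolean changed = true;
        while (changed) {
            changed = false;
            for (String a : names) {
                Set<String> sa = s.get(a);
                for (String x : new ArrayList<>(sa)) {
                    for (String[] ax : subAx)                 // CR1: X in S(A), X SubClassOf B
                        if (ax[0].equals(x)) changed |= sa.add(ax[1]);
                    for (String[] ax : someRightAx)           // CR3: X in S(A), X SubClassOf r some B
                        if (ax[0].equals(x)) changed |= links.add(List.of(ax[1], a, ax[2]));
                }
                for (String[] ax : conjAx)                    // CR2: A1 and A2 both in S(A)
                    if (sa.contains(ax[0]) && sa.contains(ax[1])) changed |= sa.add(ax[2]);
            }
            for (List<String> link : links)                   // CR4: A -r-> B, B' in S(B), r some B' SubClassOf C
                for (String[] ax : someLeftAx)
                    if (ax[0].equals(link.get(0)) && s.get(link.get(2)).contains(ax[1]))
                        changed |= s.get(link.get(1)).add(ax[2]);
        }
        return s;
    }

    public static void main(String[] args) {
        ElSaturation el = new ElSaturation();
        el.addSomeRight("Nucleus", "part-of", "Cell");           // Nucleus SubClassOf part-of some Cell
        el.addSomeLeft("part-of", "Cell", "CellPart");           // hypothetical class CellPart
        el.addSub("Human", "Mammal");
        el.addSub("Human", "Omnivore");
        el.addConj("Mammal", "Omnivore", "MammalianOmnivore");   // hypothetical class
        Map<String, Set<String>> s = el.saturate();
        System.out.println(new TreeSet<>(s.get("Nucleus"))); // [CellPart, Nucleus]
        System.out.println(new TreeSet<>(s.get("Human")));   // [Human, Mammal, MammalianOmnivore, Omnivore]
    }
}
```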
  49. [Figure: expressivity versus computational complexity, after http://www.w3.org/TR/owl2-profiles/. SPARQL: PSPACE (all constructs), NP (AND, FILTER, UNION), LOGSPACE/tractable (AND, FILTER); RDFS: PTIME; OWL 2 EL: PTIME; OWL 2: NP-hard.]
  50. Why learn OWL?
      [Diagram: the question “What are the human proteins that regulate blood coagulation?” surrounded by a classification (flat file), a database (SQL or RDF) and an ontology (OBO), annotated with the open problems: How do I integrate the data? What does it even mean? What are the parts, what is composing it?]
  51. Conclusion
      • Ask questions!
      • What is OWL?
      • Why is it particularly interesting for the life sciences?
      • How to use OWL?
      • What is OWL 2 EL?
      • How to integrate and query biomedical knowledge?
  52. Thank you!
      • croset@ebi.ac.uk
      • rebholz@ebi.ac.uk
