Hazard Estimation and Method Comparison with OWL-Encoded Toxicity Decision Trees

  1. Hazard Estimation and Method Comparison with OWL-Encoded Toxicity Decision Trees
     Leonid L. Chepelev, Dana Klassen, and Michel Dumontier
     Department of Biology, Institute of Biochemistry, School of Computer Science
     Carleton University, Ottawa, Canada
     An OWLED 2011 Paper
  2. Motivation
     • Machine learning approaches such as decision trees are commonly used in toxicity prediction.
     • However, complex trees can be difficult to interpret, and there is no explanation for the category obtained.
     • Moreover, many variant decision trees are being published, and they are difficult to compare.
     • Can we use OWL ontologies to formally represent and compare decision trees?
     • A simple toxicity decision tree: at each branching point, a rule is evaluated, and based on the outcome of this rule, either a final activity decision is made, or judgment is deferred to another node.
  3. Druglikeness: Lipinski’s Rule of Five
     • Rule of thumb for druglikeness (orally active in humans): 4 rules with multiples of 5
         • Mass of 500 Daltons or less
         • 5 hydrogen bond donors or fewer
         • 10 hydrogen bond acceptors or fewer
         • A partition coefficient (logP) value between -5 and 5
     • All of these conditions must be satisfied for a molecule to be considered druglike.
     • A molecule failing any of these would not be druglike.
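The four conditions above can be read as a single conjunction. The following is a minimal sketch of that reading, not the authors' code: the `Descriptors` type, its field names, and the example values are assumptions introduced for illustration only.

```python
from dataclasses import dataclass


@dataclass
class Descriptors:
    """Hypothetical container for the four descriptors used by the rule."""
    molecular_weight: float  # in Daltons
    h_bond_donors: int
    h_bond_acceptors: int
    log_p: float             # partition coefficient


def is_druglike(d: Descriptors) -> bool:
    """Return True only if every one of the four conditions holds."""
    return (
        d.molecular_weight <= 500.0
        and d.h_bond_donors <= 5
        and d.h_bond_acceptors <= 10
        and -5.0 <= d.log_p <= 5.0
    )


# Illustrative (made-up) descriptor values, not measured data.
small = Descriptors(molecular_weight=151.2, h_bond_donors=2,
                    h_bond_acceptors=2, log_p=0.5)
heavy = Descriptors(molecular_weight=650.0, h_bond_donors=2,
                    h_bond_acceptors=2, log_p=0.5)
print(is_druglike(small), is_druglike(heavy))
```

Because the rule is a pure conjunction, failing any single condition is enough to reject a molecule, which is what makes it easy to express as a small decision tree.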
  4. Chemical Data
     • Lipinski druglikeness dataset comprising 7000 compounds from the Human Metabolome Database (HMDB).
     • Attributes computed using the Chemistry Development Kit.
     • Tree built with the open source Weka, a collection of machine learning algorithms for data mining, with tools for data pre-processing, classification, regression, clustering, association rules, and visualization.
  5. Rule of Five Decision Tree
     • Correctly classified molecule counts are given in brackets.
     • 100% accuracy in ten-fold cross-validation.
  6. Formalization
     [Diagram: Substance II subClassOf Substance I; Substance III subClassOf Substance I]
     • Substance I is something that has a molecular weight.
     • Substance II is a kind of substance I that has a molecular weight <= 500.
     • Substance III is a kind of substance I that has a molecular weight > 500.
  7. Formalization
     [Diagram: Substance II subClassOf Substance I; both have attribute Molecular Weight, which has value; the other branch is labelled > 499.296759]
     • Every node in the decision tree represents an entity having an attribute or feature, whose value may be specified.
     • Substance I is something that has a molecular weight:
         ‘substance I’ equivalentClass
           ‘has attribute’ some ‘molecular weight’
     • Substance II is a kind of substance I with a specified molecular weight value:
         ‘substance II’ equivalentClass
           ‘substance I’
           and ‘has attribute’ some (‘molecular weight’
             and ‘has value’ double[<= 499.296759])
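The pattern on this slide (a child node is its parent plus a restriction on an attribute value) can be sketched as a small string template. This is only an illustration in Python that produces Manchester-style text; the function name `threshold_class` is an assumption, and the slides indicate the real ontology was built with the OWL API rather than by string formatting.

```python
ALLOWED_OPS = ("<=", "<", ">=", ">")


def threshold_class(parent: str, attribute: str, op: str, value: float) -> str:
    """Render a child-node class expression in Manchester-style syntax.

    The child is the parent class restricted to members whose attribute
    value satisfies the given comparison.
    """
    if op not in ALLOWED_OPS:
        raise ValueError(f"unsupported comparison: {op}")
    return (
        f"'{parent}' and 'has attribute' some "
        f"('{attribute}' and 'has value' double[{op} {value}])"
    )


# The two branches of the molecular weight split from the slide.
substance_ii = threshold_class("substance I", "molecular weight", "<=", 499.296759)
substance_iii = threshold_class("substance I", "molecular weight", ">", 499.296759)
print(substance_ii)
print(substance_iii)
```

Both branches share the same parent and attribute and differ only in the comparison, which mirrors how a single decision-tree split yields two sibling classes.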
  8. The Chemical Information Ontology (CHEMINF)
     • 100+ chemical descriptors
     • 50+ chemical qualities
     • Relates descriptors to their specifications and to the software that generated them (along with the running parameters and the algorithms that they implement)
     • Contributors: Nico Adams, Leonid Chepelev, Michel Dumontier, Janna Hastings, Egon Willighagen, Peter Murray-Rust, Christoph Steinbeck
     • http://semanticchemistry.googlecode.com
  9. A simple decision tree can be represented as a set of subsuming OWL classes
     • Methods: A Weka tree was trained and serialized into dot format. The Weka API was used to read the document, and the ontology was created using the OWL API.
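As a rough illustration of the dot-to-ontology step, the sketch below reads node and edge labels from a dot document and turns each edge into a subclass statement. It is a Python stand-in, not the Weka API and OWL API pipeline described on the slide. The sample dot text is hypothetical and only assumed to resemble a Weka tree export; the regular expressions and the output format are likewise assumptions.

```python
import re

# Hypothetical dot text, assumed to resemble a Weka decision tree export.
SAMPLE_DOT = '''digraph J48Tree {
N0 [label="molecular_weight" ]
N0->N1 [label="<= 500"]
N1 [label="druglike (10.0)" shape=box style=filled ]
N0->N2 [label="> 500"]
N2 [label="not_druglike (5.0)" shape=box style=filled ]
}'''

EDGE_RE = re.compile(r'^(N\d+)->(N\d+)\s*\[label="([^"]*)"')
NODE_RE = re.compile(r'^(N\d+)\s*\[label="([^"]*)"')


def parse_dot(text):
    """Return (labels, edges): node id -> label, and (parent, child, condition) triples."""
    labels = {}
    edges = []
    for raw in text.splitlines():
        line = raw.strip()
        edge = EDGE_RE.match(line)
        if edge:
            edges.append((edge.group(1), edge.group(2), edge.group(3)))
            continue
        node = NODE_RE.match(line)
        if node:
            labels[node.group(1)] = node.group(2)
    return labels, edges


def subclass_axioms(labels, edges):
    """Describe each child node as its parent restricted by the edge condition."""
    return [
        f"{child} subClassOf {parent} and ({labels[parent]} {cond})"
        for parent, child, cond in edges
    ]


labels, edges = parse_dot(SAMPLE_DOT)
for axiom in subclass_axioms(labels, edges):
    print(axiom)
```

Each edge carries the test applied at the parent node, so one statement per edge is enough to reproduce the subsumption hierarchy of the tree.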
  10. Each outcome may also be formalized in terms of the set of all attributes, obtained by drawing a path to the root
        ‘druglike molecule’ equivalentClass
          ‘molecule’
          and ‘has attribute’ some (‘molecular weight’ that ‘has value’ double[<= 500.0])
          and ‘has attribute’ some (‘hydrogen bond donor count’ that ‘has value’ int[<= 5])
          and ‘has attribute’ some (‘hydrogen bond acceptor count’ that ‘has value’ int[<= 10])
          and ‘has attribute’ some (‘partition coefficient’ that ‘has value’ double[<= 5.0, >= -5.0])
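Collecting the conditions along the path from the root to a leaf gives the conjunction that defines that outcome. A minimal sketch of this traversal follows; the nested-tuple tree encoding, the `outcome_definitions` name, and the two-level example tree are assumptions made for illustration.

```python
# A tree is either a leaf (an outcome name) or a pair:
# (attribute, [(condition, subtree), ...])
TREE = ("molecular weight", [
    ("<= 500", ("hydrogen bond donor count", [
        ("<= 5", "druglike"),
        ("> 5", "not druglike"),
    ])),
    ("> 500", "not druglike"),
])


def outcome_definitions(tree, path=()):
    """Map each outcome to the list of root-to-leaf condition paths reaching it."""
    if isinstance(tree, str):
        # Leaf: the accumulated path is one definition of this outcome.
        return {tree: [list(path)]}
    attribute, branches = tree
    result = {}
    for condition, subtree in branches:
        step = f"{attribute} {condition}"
        for outcome, paths in outcome_definitions(subtree, path + (step,)).items():
            result.setdefault(outcome, []).extend(paths)
    return result


for outcome, paths in outcome_definitions(TREE).items():
    for conditions in paths:
        print(outcome, "=", " and ".join(conditions))
```

An outcome reached by several leaves gets several paths, which would correspond to a disjunction of conjunctions in the class definition.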
  11. Large-scale decision trees
      • The Lipinski example is fairly trivial.
      • Can we create a new decision tree capable of classifying linked data?
      • Obtained 1400 chemicals from an EPA ToxCast carcinogenic toxicity dataset, labelled either toxic or non-toxic.
      • Computed 318 boolean features using the ToxTree API. http://toxtree.sourceforge.net/
      • Generated the decision tree using Weka.
      • Generated the OWL ontology using the OWL API.
      • Generated individuals using the CHESS specification and used descriptors specified in the CHEMINF ontology.
      • Classification using OWL API + Pellet; Protégé 4 and HermiT.
  12. A decision tree to predict carcinogenic toxicity
  13. Decision Tree to OWL Ontology
  14. Is acetaminophen toxic?
  15. From data to automated reasoning
      [Diagram: data, then linked data, then automated reasoning (realization) over the OWL-encoded toxicity tree]
  16. (no text on slide)
  17. (no text on slide)
  18. Path through Decision Tree kindly provided by reasoning about the OWL ontology
  19. Comparison of toxicity trees
      • Along with the standard Lipinski Rule of Five ontology, we generated a variant where MW <= 250.
      • Reasoning over the two ontologies, we see that the active compound class based on MW <= 250 is subsumed by the active compound class based on MW <= 500.
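The subsumption result above follows from interval containment: every value satisfying MW <= 250 also satisfies MW <= 500. The sketch below checks this for classes described as conjunctions of closed value ranges. It is a simplified stand-in for an OWL reasoner, and the `Interval` type and `is_subsumed_by` function are assumptions for illustration.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """Closed range of allowed values; unbounded sides default to infinity."""
    low: float = -math.inf
    high: float = math.inf

    def contains(self, other: "Interval") -> bool:
        return self.low <= other.low and other.high <= self.high


def is_subsumed_by(specific: dict, general: dict) -> bool:
    """True if every constraint of `general` is entailed by one in `specific`.

    Both arguments map attribute names to Interval constraints and are read
    as conjunctions. An attribute constrained in `general` but absent from
    `specific` blocks subsumption.
    """
    for attribute, general_range in general.items():
        specific_range = specific.get(attribute)
        if specific_range is None or not general_range.contains(specific_range):
            return False
    return True


standard = {"molecular weight": Interval(high=500.0)}
variant = {"molecular weight": Interval(high=250.0)}
print(is_subsumed_by(variant, standard))  # variant class within standard class
print(is_subsumed_by(standard, variant))  # the reverse does not hold
```

A full reasoner handles far more than numeric ranges, but for threshold-based trees the comparison largely reduces to this kind of range check on shared attributes.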
  20. Conclusion
      • Decision trees can be faithfully represented as OWL ontologies.
      • As formalized ontologies, we can automatically reason about them and use them to classify new chemicals (hence predict toxicity).
      • If we maintain the structure of the decision tree, we can get explanations that provide the set of attributes used in the decision making (unlike black-box counterparts).
      • We expect that trees generated with different but aligned vocabularies may now be comparable.
  21. Acknowledgements
      • CHEMINF Group: Leo Chepelev, Janna Hastings, Egon Willighagen, Nico Adams
      • Toxicity Group: Leo Chepelev, Dana Klassen
  22. dumontierlab.com
      michel_dumontier@carleton.ca
      Presentations: http://slideshare.com/micheldumontier
