Hazard Estimation and Method Comparison with OWL-Encoded Toxicity Decision Trees

Transcript

  • 1. Hazard Estimation and Method Comparison with OWL-Encoded Toxicity Decision Trees
    Leonid L. Chepelev, Dana Klassen, and Michel Dumontier
    Department of Biology, Institute of Biochemistry, School of Computer Science
    Carleton University, Ottawa, Canada
    An OWLED 2011 Paper
  • 2. Motivation
    Machine learning approaches such as decision trees are commonly used in toxicity prediction.
    However, complex trees can be difficult to interpret, and there is no explanation for the category obtained.
    Moreover, many variant decision trees are being published, and they are difficult to compare.
    Can we use OWL ontologies to formally represent and compare decision trees?
    A simple toxicity decision tree: at each branching point, a rule is evaluated, and based on the outcome of this rule, either a final activity decision is made, or judgment is deferred to another node.
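The branching behaviour described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the `Node` and `Leaf` types, the attribute names, and the toy tree are all assumptions made for the example. Each node evaluates a rule and either returns a final decision or defers to a child node, recording the path taken.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple, Union


@dataclass
class Leaf:
    label: str  # final activity decision


@dataclass
class Node:
    description: str  # human-readable form of the rule
    rule: Callable[[Dict[str, float]], bool]
    if_true: Union["Node", Leaf]
    if_false: Union["Node", Leaf]


def classify(tree: Union[Node, Leaf], sample: Dict[str, float]) -> Tuple[str, List[Tuple[str, bool]]]:
    """Walk the tree, returning the decision and the rules evaluated on the way."""
    path = []
    current = tree
    while isinstance(current, Node):
        outcome = current.rule(sample)
        path.append((current.description, outcome))
        # Defer judgment to the child selected by the rule outcome.
        current = current.if_true if outcome else current.if_false
    return current.label, path


# Hypothetical two-rule tree; thresholds are illustrative only.
toy_tree = Node(
    "molecular_weight <= 500",
    lambda s: s["molecular_weight"] <= 500,
    if_true=Node(
        "h_bond_donors <= 5",
        lambda s: s["h_bond_donors"] <= 5,
        if_true=Leaf("active"),
        if_false=Leaf("inactive"),
    ),
    if_false=Leaf("inactive"),
)

label, path = classify(toy_tree, {"molecular_weight": 300.0, "h_bond_donors": 2})
print(label, path)
```

Returning the path alongside the label is what makes the decision explainable: the caller sees which rules fired, not only the final category.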
  • 3. Druglikeness: Lipinski’s Rule of Five
    Rule of thumb for druglikeness (orally active in humans)
    (4 rules with multiples of 5)
    mass of 500 Daltons or less
    5 hydrogen bond donors or fewer
    10 hydrogen bond acceptors or fewer
    A partition coefficient (logP) value between -5 and 5
    All of these conditions must be satisfied for a molecule to be considered druglike.
    A molecule failing any of them would not be druglike.
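The four conditions above translate directly into a conjunction of threshold checks. The following is a minimal sketch using the thresholds as stated on the slide; the function names and the sample descriptor values are assumptions for illustration and are not measured data.

```python
def lipinski_failures(mass, h_donors, h_acceptors, logp):
    """Return the names of the rule-of-five conditions that are not met."""
    checks = [
        ("mass <= 500 Da", mass <= 500.0),
        ("H-bond donors <= 5", h_donors <= 5),
        ("H-bond acceptors <= 10", h_acceptors <= 10),
        ("-5 <= logP <= 5", -5.0 <= logp <= 5.0),
    ]
    return [name for name, passed in checks if not passed]


def is_druglike(mass, h_donors, h_acceptors, logp):
    """A molecule is druglike only if every condition is satisfied."""
    return not lipinski_failures(mass, h_donors, h_acceptors, logp)


# Illustrative descriptor values (hypothetical molecules).
print(is_druglike(180.0, 1, 4, 1.2))
print(lipinski_failures(620.0, 6, 4, 1.2))
```

Reporting the failed conditions, rather than a bare boolean, mirrors the goal of explaining why a category was assigned.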
  • 4. Chemical Data
    Lipinski drug-likeness dataset comprising 7000 compounds from the Human Metabolome Database (HMDB).
    Attributes computed using the Chemistry Development Kit.
    Tree built with open-source Weka, a collection of machine learning algorithms for data mining, with tools for data pre-processing, classification, regression, clustering, association rules, and visualization.
  • 5. Rule of Five Decision Tree
    Correctly classified molecule counts are given in brackets.
    100% accuracy in ten-fold cross validation.
  • 6. Formalization
    [Diagram: Substance II and Substance III are each subClassOf Substance I]
    Substance I is something that has a molecular weight
    Substance II is a kind of substance I that has a molecular weight <= 500
    Substance III is a kind of substance I that has a molecular weight > 500
  • 7. Formalization
    [Diagram: Substance II subClassOf Substance I; both ‘has attribute’ Molecular Weight; Molecular Weight ‘has value’]
    Every node in the decision tree represents an entity having an attribute or feature, whose value may be specified.
    Substance I is something that has a molecular weight:
    ‘substance I’ equivalentClass
    ‘has attribute’ some ‘molecular weight’
    Substance II is a kind of substance I with a specified molecular weight value:
    ‘substance II’ equivalentClass
    ‘substance I’
    and ‘has attribute’ some (‘molecular weight’
    and ‘has value’ double[<= 499.296759])
    [Diagram label: > 499.296759]
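To make the node-to-class pattern concrete, the sketch below builds class definitions as plain strings in the same informal notation the slide uses. It is only a string template, under the assumption that each branch adds one threshold on one attribute; the function name `node_axiom` and its parameters are hypothetical, and straight quotes stand in for the slide's typographic quotes. It does not call the OWL API.

```python
def node_axiom(class_name, parent, attribute, op, threshold, datatype="double"):
    """Render one decision-tree branch as an equivalentClass definition string."""
    return (
        f"'{class_name}' equivalentClass '{parent}' "
        f"and 'has attribute' some ('{attribute}' "
        f"and 'has value' {datatype}[{op} {threshold}])"
    )


# The two branches below a molecular-weight node, using the slide's threshold.
left = node_axiom("substance II", "substance I", "molecular weight", "<=", 499.296759)
right = node_axiom("substance III", "substance I", "molecular weight", ">", 499.296759)
print(left)
print(right)
```

Because both child classes are defined as the parent plus one extra restriction, a reasoner can infer that each is a subclass of the parent without that being asserted.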
  • 8. The Chemical Information Ontology (CHEMINF)
    100+ chemical descriptors
    50+ chemical qualities
    Relates descriptors to their specifications, the software that generated them (along with the running parameters, and the algorithms that they implement)
    Contributors: Nico Adams, Leonid Chepelev, Michel Dumontier, Janna Hastings, Egon Willighagen, Peter Murray-Rust, Christoph Steinbeck
    http://semanticchemistry.googlecode.com
  • 9. A simple decision tree can be represented as a set of subsuming OWL classes
    Methods: A Weka tree was trained and serialized into DOT format. The Weka API was used to read the document, and the OWL API to create the ontology.
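The slide describes reading the serialized tree with the Weka API and building the ontology with the OWL API, both Java libraries. As a language-neutral illustration of the first step only, the sketch below parses a small DOT fragment with regular expressions. The DOT text is hand-written and assumed to resemble a tree export (node lines and labelled edge lines); the real layout may differ, and the instance counts in the leaf labels are invented for the example.

```python
import re

# Hand-written sample; assumed to resemble an exported decision tree.
DOT_TEXT = """digraph Tree {
N0 [label="molecular weight" ]
N0->N1 [label="<= 500"]
N1 [label="druglike (10.0)" shape=box style=filled ]
N0->N2 [label="> 500"]
N2 [label="not druglike (5.0)" shape=box style=filled ]
}"""

NODE_RE = re.compile(r'^(N\d+) \[label="([^"]*)"')
EDGE_RE = re.compile(r'^(N\d+)->(N\d+) \[label="([^"]*)"')


def parse_dot(text):
    """Return ({node_id: label}, [(parent_id, child_id, edge_label), ...])."""
    nodes, edges = {}, []
    for raw in text.splitlines():
        line = raw.strip()
        edge = EDGE_RE.match(line)
        if edge:
            edges.append(edge.groups())
            continue
        node = NODE_RE.match(line)
        if node:
            nodes[node.group(1)] = node.group(2)
    return nodes, edges


nodes, edges = parse_dot(DOT_TEXT)
for parent, child, condition in edges:
    # Each edge becomes "child is a kind of parent, restricted by the condition".
    print(f"{child} subClassOf {parent} where {nodes[parent]} {condition}")
```

Each parsed edge carries exactly the information needed for one subclass definition: the parent node, the tested attribute, and the condition on its value.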
  • 10. Each outcome may also be formalized in terms of the set of all attributes as obtained by drawing a path to the root
    Druglike-molecule equivalentClass
    ‘molecule’
    and ‘has attribute’ some (‘molecular weight’ that ‘has value’ double[<= 500.0])
    and ‘has attribute’ some (‘hydrogen bond donor count’ that ‘has value’ int[<= 5])
    and ‘has attribute’ some (‘hydrogen bond acceptor count’ that ‘has value’ int[<= 10])
    and ‘has attribute’ some (‘partition coefficient’ that ‘has value’ double[<= 5.0, >= -5.0])
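The path-to-root construction can be sketched as follows: starting from a leaf, collect the condition on every edge up to the root, then join the conditions into one class expression. The `PARENT` table, node identifiers, and function names are assumptions for illustration; the thresholds are the ones shown on the slide, and the output is a plain string rather than an OWL API object.

```python
# child -> (parent, (attribute, datatype, facet)); a hypothetical four-level tree.
PARENT = {
    "N1": ("N0", ("molecular weight", "double", "<= 500.0")),
    "N2": ("N1", ("hydrogen bond donor count", "int", "<= 5")),
    "N3": ("N2", ("hydrogen bond acceptor count", "int", "<= 10")),
    "N4": ("N3", ("partition coefficient", "double", "<= 5.0, >= -5.0")),
}


def path_conditions(leaf, parent):
    """Collect edge conditions from the leaf up to the root, in root-first order."""
    conditions = []
    node = leaf
    while node in parent:
        node, condition = parent[node]
        conditions.append(condition)
    conditions.reverse()
    return conditions


def outcome_expression(name, base, conditions):
    """Join the base class and every path condition into one definition string."""
    parts = [f"'{base}'"]
    for attribute, datatype, facet in conditions:
        parts.append(
            f"'has attribute' some ('{attribute}' that 'has value' {datatype}[{facet}])"
        )
    return f"'{name}' equivalentClass " + "\n  and ".join(parts)


print(outcome_expression("druglike molecule", "molecule", path_conditions("N4", PARENT)))
```

Flattening the path this way trades the tree structure for a single self-contained definition of the outcome.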
  • 11. Large scale decision trees
    The Lipinski example is fairly trivial.
    Can we create a new decision tree capable of classifying linked data?
    Obtained 1400 chemicals from an EPA ToxCast carcinogenic toxicity dataset labelled either toxic or non-toxic
    Computed 318 Boolean features using the ToxTree API. http://toxtree.sourceforge.net/
    Generated the decision tree using Weka
    Generated the OWL ontology using the OWL API
    Generated individuals using the CHESS specification and used descriptors specified in the CHEMINF ontology.
    Classification using OWL API + Pellet; Protégé 4 and HermiT.
  • 12. A decision tree to predict carcinogenic toxicity
  • 13. Decision Tree to OWL Ontology
  • 14. Is acetaminophen toxic?
  • 15. From data to automated reasoning
    [Diagram: data, then linked data, then automated reasoning (realization) over the OWL-encoded toxicity tree]
  • 16. 16
  • 17. 17
  • 18. Path through Decision Tree kindly provided by reasoning about the OWL ontology
  • 19. Comparison of toxicity trees
    Along with the standard Lipinski Rule of Five ontology, we generated a variant where MW <= 250.
    Reasoning over the two ontologies, we see that the active compound (based on MW <= 250) is subsumed by the active compound based on MW <= 500.
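The subsumption result above follows from interval containment: every value satisfying MW <= 250 also satisfies MW <= 500. The sketch below checks this for classes described as one closed numeric interval per attribute. It is a deliberately simplified stand-in for what an OWL reasoner does, not a reasoner; the `Interval` type and function names are assumptions for illustration.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    low: float = -math.inf
    high: float = math.inf

    def is_within(self, other: "Interval") -> bool:
        """True if every value in this interval also lies in the other one."""
        return other.low <= self.low and self.high <= other.high


def is_subsumed_by(specific, general):
    """A class is subsumed if it satisfies every restriction of the general class."""
    return all(
        attribute in specific and specific[attribute].is_within(allowed)
        for attribute, allowed in general.items()
    )


standard = {"molecular weight": Interval(high=500.0)}
variant = {"molecular weight": Interval(high=250.0)}

print(is_subsumed_by(variant, standard))
print(is_subsumed_by(standard, variant))
```

The check is asymmetric, as expected: the stricter variant falls under the standard definition, but not the other way around.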
  • 20. Conclusion
    Decision trees can be faithfully represented as OWL ontologies
    Once a tree is formalized as an ontology, we can automatically reason about it and use it to classify new chemicals (and hence predict toxicity).
    If we maintain the structure of the decision tree, we can get explanations that provide the set of attributes used in the decision making (unlike black-box counterparts).
    We expect that trees generated with different but aligned vocabularies may now be comparable.
  • 21. Acknowledgements
    CHEMINF Group
    Leo Chepelev
    Janna Hastings
    Egon Willighagen
    Nico Adams
    Toxicity Group
    Leo Chepelev
    Dana Klassen
  • 22. dumontierlab.com
    michel_dumontier@carleton.ca
    Presentations: http://slideshare.com/micheldumontier
