Kboom phenoday-2016


Presentation on a Bayesian OWL ontology merging approach, from PhenoDay/Bio-Ontologies 2016

Published in: Science

  1. k-BOOM: A Bayesian approach to ontology structure inference, with applications in disease ontology construction
     Chris Mungall, Lawrence Berkeley Laboratory
     PhenoDay 2016
     @monarchinit @chrismungall
  2. Building a cohesive, complete disease ontology
     Objective
       • Combine existing disease classifications and lists into a unified, cohesive framework
       • Best of all worlds
       • Integrate data from multiple resources
     Challenges
       • Current resources were developed independently, from different perspectives
       • Mappings are imprecise
     [Figure: OMIM, Orphanet, DO, MESH, NCIT, Decipher, ICD, SNOMED; combined, coherent view]
  3. Disease classifications and why mappings are not enough
       • Given N disease lists
           – Where each provides cross-references (xrefs) to up to N-1 others
           – Up to N^2 - N sets of mappings
       • Even more with 3rd-party mappings
           – These are frequently
               • Inconsistent (directly or indirectly)
               • Different in meaning and level of specificity
               • Incomplete
               • Stale
               • Difficult to verify computationally
       • Fundamental issue
           – Xrefs lack semantics
           – Explicit semantics would enable computational checks
     [Figure: Ont1, Ont2, Ont3, Ont4, Ont5, Ont6]
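As a quick check of the mapping count on this slide: with N resources, each pointing at up to N-1 others, there are up to N(N-1) = N^2 - N directed sets of mappings. A minimal Python sketch, using the eight resource names from the slide 2 figure purely as sample input:

```python
from itertools import permutations

# Resource names taken from the slide 2 figure; used here only as sample input.
resources = ["OMIM", "Orphanet", "DO", "MESH", "NCIT", "Decipher", "ICD", "SNOMED"]

# Each ordered pair (source, target) is one potential set of xrefs.
mapping_sets = list(permutations(resources, 2))

n = len(resources)
assert len(mapping_sets) == n**2 - n
print(len(mapping_sets))  # 56
```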
  4. 4 disease resources plus mappings: Hemolytic anemia
     [Figure legend: DOID (blue), OMIM (brown), MESH (grey), ORDO/Orphanet (yellow); SubClassOf (solid line), Xref (dashed grey line)]
  5. Objective: Coherent OWL Ontology Merging (OOM)
       • Criteria for OOM
           – Merged
               • Combines multiple lists and classifications (terminologies and lists are treated as ‘degenerate’ ontologies), presented as a single ontology
               • Equivalent classes merged
           – Logically connected
               • OWL/Description Logic constructs, e.g. SubClassOf, EquivalentClass, SomeValuesFrom
               • Not xrefs
           – Coherent
               • Logically coherent: no unsatisfiable classes
               • Biologically coherent: makes biological and clinical sense
  6. Our previous approach, applied to phenotypes: L-DOOM (Logical Definition based OWL Ontology Merging)
       1. Assign logical definitions (OWL equivalence axioms) to classes in each ontology
            • Can be assigned manually or semi-automatically (Obol)
       2. Use reasoning to infer logical axioms
     [Figure: HP:0002180 Neurodegeneration Equiv (degenerate AND inheres-in SOME neuron); MP:0000876 Purkinje cell degeneration Equiv (degenerate AND inheres-in SOME Purkinje cell); CL:0000540 neuron; CL:0000121 Purkinje cell; inferred SubClassOf]
     Application to diseases?
       • Works well for compositional classes (e.g. many cancer terms)
       • Less well for genetic diseases, complex syndromes
     References
       • Mungall, C. J., Gkoutos, G., Smith, C., Haendel, M., Lewis, S., & Ashburner, M. (2010). Integrating phenotype ontologies across multiple species. Genome Biology, 11(1), R2. doi:10.1186/gb-2010-11-1-r2
       • Köhler, S., Doelken, S. C., Ruef, B. J., Bauer, S., Washington, N., Westerfield, M., … Mungall, C. J. (2013). Construction and accessibility of a cross-species phenotype ontology along with gene annotations for biomedical research. F1000Research, 1–12. doi:10.3410/f1000research.2-30.v1
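The two L-DOOM steps can be illustrated with the Purkinje cell example from this slide. The following is a minimal Python sketch, not the reasoner-based implementation: it assumes every logical definition has the single shape "quality AND inheres-in SOME bearer" and replaces OWL reasoning with a simple structural subsumption test. The class `LogicalDefinition`, both helper functions, and the direct Purkinje cell to neuron link are hypothetical simplifications introduced for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LogicalDefinition:
    """Represents 'quality AND inheres-in SOME bearer' (illustrative shape only)."""
    quality: str
    bearer: str


def superclasses(cls, parents):
    """Return cls plus everything reachable through asserted SubClassOf links."""
    seen, stack = set(), [cls]
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(parents.get(node, ()))
    return seen


def infer_subclass_axioms(definitions, parents):
    """Infer (A, B), meaning A SubClassOf B, when A's quality and bearer are subsumed by B's."""
    inferred = set()
    for a, def_a in definitions.items():
        for b, def_b in definitions.items():
            if a == b:
                continue
            quality_ok = def_b.quality in superclasses(def_a.quality, parents)
            bearer_ok = def_b.bearer in superclasses(def_a.bearer, parents)
            if quality_ok and bearer_ok:
                inferred.add((a, b))
    return inferred


# Asserted hierarchy; the direct Purkinje cell -> neuron link is a simplification.
parents = {"CL:0000121": {"CL:0000540"}}

# Step 1: logical definitions assigned to classes from two phenotype ontologies.
definitions = {
    "HP:0002180": LogicalDefinition("degenerate", "CL:0000540"),  # Neurodegeneration
    "MP:0000876": LogicalDefinition("degenerate", "CL:0000121"),  # Purkinje cell degeneration
}

# Step 2: infer cross-ontology SubClassOf axioms.
inferred = infer_subclass_axioms(definitions, parents)
print(inferred)  # {('MP:0000876', 'HP:0002180')}
```

A real pipeline would hand the definitions to an OWL reasoner, which also covers nested expressions and other constructs that this structural test ignores.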
  7. BOOM: Bayes OWL Ontology Merging
       • Probabilistic ontology OP = <A,H>
       • Finds the set of hypothetical axioms that maximises P(OP)
     [Diagram components: Ontology 1, Ontology 2, Ontology .., Ontology n; mapping tool; inter-ontology mappings; mapping curation; axiom weight estimator; weight curation; hypothetical logical axioms plus weights (H); Elk reasoner; merged coherent OWL ontology; merge equivalent classes; next iteration]
  8. Generating hypothetical logical axioms
     [Diagram: inter-ontology mappings; axiom weight estimator; hypothetical logical axioms plus weights (H)]
     E.g. OMIM:123 xref DOID:987
       • Pr(OMIM:123 ≡ DOID:987) = 0.3
       • Pr(OMIM:123 ⊂ DOID:987) = 0.4
       • Pr(OMIM:123 ⊃ DOID:987) = 0.1
     Domain rules (lexical, structural, …):
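One way to hold the weighted hypothetical axioms from this slide in code is a mapping from a class pair to a probability per candidate relation. The sketch below is an assumption-laden illustration: the relation names, the `complete_distribution` helper, and in particular the idea that the unstated 0.2 of probability mass belongs to a fourth "no_relation" outcome are not given on the slide.

```python
# Relation names are hypothetical labels for the three axiom types shown on the slide.
LOGICAL_RELATIONS = ("equivalent", "proper_subclass", "proper_superclass")
NO_RELATION = "no_relation"  # assumed fourth outcome; not stated on the slide


def complete_distribution(stated):
    """Return a full distribution, assigning leftover mass to NO_RELATION."""
    full = {rel: stated.get(rel, 0.0) for rel in LOGICAL_RELATIONS}
    leftover = 1.0 - sum(full.values())
    if leftover < -1e-9:
        raise ValueError("stated probabilities sum to more than 1")
    # Rounding hides floating-point noise such as 0.2000000000000001.
    full[NO_RELATION] = round(max(leftover, 0.0), 10)
    return full


# The xref OMIM:123 -> DOID:987 with the weights from the slide.
hypotheses = {
    ("OMIM:123", "DOID:987"): complete_distribution(
        {"equivalent": 0.3, "proper_subclass": 0.4, "proper_superclass": 0.1}
    )
}

for pair, distribution in hypotheses.items():
    print(pair, distribution)
```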
  9. k-BOOM: algorithm for finding the most likely merged ontology
     Goal: find values for H that maximise P
       • h_i: boolean representing the truth value of hypothetical axiom H_i
       • Problem: 2^N ontologies
     Steps
       1. Factorize the calculation by dividing the combined axioms into k modules (k-BOOM). Algorithm:
            i. Assert all hypothetical axioms to be true
            ii. Make a module from each equivalence clique
       2. Use a greedy algorithm; start with the most likely hypothetical axioms in O_k
       3. Test each configuration for satisfiability using an OWL reasoner (Elk) (unsat => Pr = 0); calculate the posterior probability
       4. Repeat until the number of tests exceeds a threshold
       5. Return the most likely configuration for O_k
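Steps 2 to 5 of the algorithm on this slide can be sketched for a single module as a best-first search over configurations. This is an illustrative Python sketch under stated assumptions, not the k-BOOM implementation: `is_coherent` is a toy stand-in for the Elk satisfiability test (it treats every asserted SubClassOf link as a proper subclass and rejects any cycle through one), the relation names are hypothetical, and the returned value is the unnormalised joint probability of the chosen axioms, not a normalised posterior.

```python
import heapq
from math import prod

EQUIV, SUB, SUPER, NONE = "equiv", "sub", "super", "none"  # hypothetical relation labels


def is_coherent(asserted_sub, config):
    """Toy stand-in for an OWL reasoner: reject cycles through a proper-subclass edge."""
    edges, strict = {}, []

    def add(child, parent, proper):
        edges.setdefault(child, set()).add(parent)
        if proper:
            strict.append((child, parent))

    for child, parent in asserted_sub:
        add(child, parent, True)  # assumption: asserted links are proper subclasses
    for (a, b), rel in config.items():
        if rel == EQUIV:
            add(a, b, False)
            add(b, a, False)
        elif rel == SUB:
            add(a, b, True)
        elif rel == SUPER:
            add(b, a, True)

    def reaches(src, dst):
        seen, stack = set(), [src]
        while stack:
            node = stack.pop()
            if node == dst:
                return True
            if node not in seen:
                seen.add(node)
                stack.extend(edges.get(node, ()))
        return False

    return not any(reaches(parent, child) for child, parent in strict)


def most_likely_configuration(hypotheses, asserted_sub, max_tests=100):
    """hypotheses: {(a, b): {relation: probability}}. Returns (config, probability) or None."""
    pairs = list(hypotheses)
    # Rank each pair's candidate relations from most to least probable.
    ranked = [sorted(hypotheses[p].items(), key=lambda kv: -kv[1]) for p in pairs]

    def score(idx):
        return prod(ranked[i][j][1] for i, j in enumerate(idx))

    start = (0,) * len(pairs)  # greedy start: the most likely relation for every pair
    heap, seen, tests = [(-score(start), start)], {start}, 0
    while heap and tests < max_tests:
        neg_p, idx = heapq.heappop(heap)  # most probable untested configuration
        config = {pairs[i]: ranked[i][j][0] for i, j in enumerate(idx)}
        tests += 1
        if -neg_p > 0 and is_coherent(asserted_sub, config):
            return config, -neg_p
        # Incoherent (Pr = 0): queue configurations that demote one pair by one rank.
        for i in range(len(idx)):
            if idx[i] + 1 < len(ranked[i]):
                nxt = idx[:i] + (idx[i] + 1,) + idx[i + 1:]
                if nxt not in seen:
                    seen.add(nxt)
                    heapq.heappush(heap, (-score(nxt), nxt))
    return None


# Toy module: ontology B asserts B:1 SubClassOf B:2; A:1 has xrefs to both.
asserted_sub = [("B:1", "B:2")]
hypotheses = {
    ("A:1", "B:1"): {EQUIV: 0.6, SUB: 0.3, NONE: 0.1},
    ("A:1", "B:2"): {EQUIV: 0.5, SUB: 0.4, NONE: 0.1},
}
result = most_likely_configuration(hypotheses, asserted_sub)
print(result)
```

In this toy module the most probable configuration makes A:1 equivalent to both B:1 and B:2, which contradicts B:1 being a proper subclass of B:2, so the search moves on to the next most probable configuration. Because configurations are popped in descending order of probability, the first coherent one found is also the most likely among those tested.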
  10. Probability-guided curator workflow: a little knowledge goes a long way
        • Run cycle
        • Examine results for modules with:
            – low posterior probability
            – low confidence (top-ranked solution has similar P to next ranked)
            – Pr(H_i = true) << threshold
        • Apply biological/clinical knowledge
        • Override auto-generated hypothetical axiom weights with curated ones
            – Feed issues back to source ontologies
        • Repeat
      [Figure: dialog between MonDO curator and external ontology curator]
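The "examine results" step of this workflow can be sketched as a triage function that ranks modules for curator attention. The Python below is a hypothetical illustration: the function name, the input shape (a list of posterior probabilities per module), and both cut-off values are assumptions, and only the first two criteria from the slide (low posterior, small gap between the top two solutions) are modelled.

```python
def triage(modules, min_posterior=0.5, min_margin=0.1):
    """Return (module_id, reasons) for modules needing review, lowest posterior first."""
    flagged = []
    for module_id, posteriors in modules.items():
        if not posteriors:
            continue  # nothing to rank for this module
        ranked = sorted(posteriors, reverse=True)
        top = ranked[0]
        # Gap between the best and second-best solution; a lone solution has no rival.
        margin = top - ranked[1] if len(ranked) > 1 else top
        reasons = []
        if top < min_posterior:
            reasons.append("low posterior")
        if margin < min_margin:
            reasons.append("low confidence")
        if reasons:
            flagged.append((top, module_id, reasons))
    flagged.sort()
    return [(module_id, reasons) for _, module_id, reasons in flagged]


# Made-up posteriors for three modules, for illustration only.
modules = {
    "module_a": [0.92, 0.05],
    "module_b": [0.41, 0.39, 0.20],
    "module_c": [0.52, 0.47],
}
for module_id, reasons in triage(modules):
    print(module_id, reasons)
```

Here `module_a` passes both checks, `module_b` is flagged for both reasons, and `module_c` is flagged only because its top two solutions are nearly tied.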
  11. Application: merging diseases into MonDO
      Columns: “Ontology”; Classes (before → after merge); SubClass axioms; Xrefs
      Inputs:
        • DOID: 6878 → 6012; 7082; 36656
        • MESH (D): 11314 → 4152; 19036
        • OMIM (D): 7783 → 7783; 0; 31242
        • Orphanet (D): 8740 → 4683; 15182; 20326
        • OMIA: 4833 → 4833; 3120; 355
        • DC: 209 → 208; 310; 316
        • Medic: 0; 8630; 3435
      Output:
        • MonDO: 39757 → 27617; 44837
      Held back: NCIT, SNOMED, ICD9, GARD
  12. Example module resolution: ITM2B amyloidosis
  13. Example failed resolution, due to ontology error
  14. Example failed resolution, due to MESH duplicates
  15. Evaluating results of disease merger
        • No gold standard for multiple ontology merger
            – Partial evaluation using held-back Orphanet NTBT/E calls:
                • 6977/7986 (87% agreement)
        • Ad-hoc evaluation by curator
            – Approach: use posterior probabilities to rank modules requiring attention
            – This is the killer-app feature
            – Iteratively refine curated probabilities
        • Results
            – Manual inspection and use of MonDO
            – Detection of errors in source ontologies
                • E.g. duplicates in MESH
                • Incorrect xrefs in DO, e.g. issues #164, #163, #156, #154, #151, #150, #149, #140, #135
  16. Next Steps
        • Integrate hypothetical axiom weight estimation into the Bayesian model
        • Apply Markov Chain Monte Carlo (MCMC) methods for estimating the most likely graph
            – E.g. Metropolis-Hastings
        • Integrate other knowledge
            – Logical definitions (phenotypes)
            – Molecular knowledge
        • Improve evaluation
            – Test k-BOOM on a task where we have a gold standard, e.g. neuroanatomy/Uberon
            – Formal comparison with EFO, MedGen, …
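To make the Metropolis-Hastings suggestion concrete, here is a generic Python sketch of the sampler applied to axiom configurations. It is not part of k-BOOM as presented; the state encoding (one relation index per hypothetical axiom), the `toy_weight` function, and the made-up priors are all assumptions, and the zero weight for one state merely imitates a reasoner reporting an incoherent configuration.

```python
import random


def metropolis_hastings(weight, start, n_choices, steps=5000, seed=0):
    """Sample configurations in proportion to weight(); return the best state seen."""
    rng = random.Random(seed)
    current, w_cur = start, weight(start)
    if w_cur <= 0:
        raise ValueError("start state must have positive weight")
    best, w_best = current, w_cur
    for _ in range(steps):
        # Symmetric proposal: redraw the relation of one randomly chosen axiom.
        i = rng.randrange(len(current))
        proposal = current[:i] + (rng.randrange(n_choices),) + current[i + 1:]
        w_new = weight(proposal)
        # Accept with probability min(1, w_new / w_cur); zero-weight states never pass.
        if w_new >= w_cur or rng.random() < w_new / w_cur:
            current, w_cur = proposal, w_new
        if w_cur > w_best:
            best, w_best = current, w_cur
    return best, w_best


# Made-up priors for two hypothetical axioms with three candidate relations each.
PRIORS = [(0.6, 0.3, 0.1), (0.5, 0.4, 0.1)]


def toy_weight(state):
    """Joint prior of a configuration; (0, 0) is treated as incoherent (weight 0)."""
    if state == (0, 0):
        return 0.0
    return PRIORS[0][state[0]] * PRIORS[1][state[1]]


best_state, best_weight = metropolis_hastings(toy_weight, start=(2, 2), n_choices=3)
print(best_state, best_weight)
```

With only nine states an exhaustive or best-first search is simpler; a sampler like this becomes interesting when a module has too many hypothetical axioms to enumerate.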
  17. Discussion
        • Retrospective merging vs prospective development
            – Better to work together from the outset (OBO model)
            – However, the current state of affairs is that expert knowledge is distributed across resources
            – We want to preserve that rather than reinvent it
            – Coherent merging of molecular knowledge with classical top-down knowledge will be required moving forward
  18. Implementation/Availability
        • Software
        • Paper
        • MonDO
            – disease-ontology
            – Both OWL ontology and axiom weight rules
  19. Acknowledgments
      k-BOOM
        • Ian Holmes
        • Sebastian Köhler
        • Jim Balhoff
        • Peter Robinson
        • Melissa Haendel
      Curation
        • Nicole Vasilevsky (MonDO, DC)
        • Sue Bello (DC)
        • Elvira Mitraka (DO)
        • Lynn Schriml (DO)
      Funding: NIH Office of Director: 1R24OD011883; NIH-UDP: HHSN268201300036C