Kboom phenoday-2016

k-BOOM
A Bayesian approach to ontology structure inference,
with applications in disease ontology construction
Chris Mungall
Lawrence Berkeley Laboratory
PhenoDay 2016
@monarchinit
@chrismungall

Building a cohesive, complete disease
ontology
Objective
• Combine existing disease
classifications and lists into
unified cohesive
framework
• Best of all worlds
• Integrate data from multiple
resources
Challenges
• Current resources
developed independently,
different perspectives
• Mappings are imprecise
[Figure: OMIM, Orphanet, DO, MESH, NCIT, Decipher, ICD and SNOMED feeding into a combined, coherent view]

Disease classifications and why
mappings are not enough
• Given N disease lists
– Where each provides cross-references
(xrefs) to up to N-1 others
– Up to (N^2)-N sets of mappings
• Even more with 3rd party mappings
– These are frequently
• Inconsistent (directly or indirectly)
• Different meanings and levels of specificity
• Incomplete
• Stale
• Difficult to computationally verify
• Fundamental issue
– Xrefs lack semantics
– Explicit semantics would enable
computational checks
[Figure: network of six ontologies, Ont1 to Ont6, connected by mappings]
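One kind of computational check that becomes possible once xrefs are given a meaning can be sketched as follows. This is a minimal illustration, not part of k-BOOM: it assumes every xref is read as an equivalence and that two classes from the same source are meant to be distinct. The class IDs and the function names are hypothetical.

```python
from collections import defaultdict

def xref_cliques(xrefs):
    """Group class IDs into connected components, reading each xref as an equivalence."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in xrefs:
        parent[find(a)] = find(b)

    groups = defaultdict(set)
    for node in list(parent):
        groups[find(node)].add(node)
    return list(groups.values())

def conflicting_cliques(xrefs):
    """Return the cliques that contain two or more classes from the same source."""
    conflicts = []
    for clique in xref_cliques(xrefs):
        prefixes = [class_id.split(":")[0] for class_id in clique]
        if len(prefixes) != len(set(prefixes)):
            conflicts.append(clique)
    return conflicts

# Hypothetical xrefs between three sources A, B and C.
xrefs = [("A:1", "B:1"), ("B:1", "C:1"), ("C:1", "A:2"), ("A:3", "B:2")]
print(conflicting_cliques(xrefs))
```

Here A:1 and A:2 end up in the same clique only indirectly, via B:1 and C:1, which is the kind of indirect inconsistency that is hard to spot by inspecting individual mappings.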

[Figure: hemolytic anemia in 4 disease resources plus mappings. Nodes: DOID (blue), OMIM (brown), MESH (grey), ORDO/Orphanet (yellow). Edges: SubClassOf (solid line), Xref (dashed grey line).]

Objective: Coherent OWL Ontology
Merging (OOM)
• Criteria for OOM
– Merged
• Combines multiple lists and classifications (terminologies
and lists treated as ‘degenerate’ ontologies), presented as a
single ontology
• Equivalent classes merged
– Logically Connected
• OWL/Description Logic constructs
– e.g. SubClassOf, EquivalentClass, SomeValuesFrom
• Not xrefs
– Coherent
• Logically coherent: no unsatisfiable classes
• Biologically coherent: makes biological and clinical sense

Our previous approach, applied to
phenotypes: L-DOOM
Logical Definition based OWL Ontology Merging
Mungall, C. J., Gkoutos, G., Smith, C., Haendel, M., Lewis, S., & Ashburner, M. (2010). Integrating phenotype ontologies across multiple species. Genome Biology, 11(1), R2.
doi:10.1186/gb-2010-11-1-r2
Köhler, S., Doelken, S. C., Ruef, B. J., Bauer, S., Washington, N., Westerfield, M., … Mungall, C. J. (2013). Construction and accessibility of a cross-species phenotype ontology
along with gene annotations for biomedical research. F1000Research, 1–12. doi:10.3410/f1000research.2-30.v1
Application to diseases?
• Works well for compositional classes (e.g. many cancer terms)
• Less well for genetic diseases, complex syndromes
1. Assign logical definitions (OWL equivalence axioms) to
classes in each ontology
• Can be assigned manually or semi-automatically (Obol)
– HP:0002180 Neurodegeneration ≡ degenerate AND inheres-in SOME neuron (CL:0000540)
– MP:0000876 Purkinje cell degeneration ≡ degenerate AND inheres-in SOME Purkinje cell (CL:0000121)
2. Use reasoning to infer logical axioms
– MP:0000876 Purkinje cell degeneration SubClassOf HP:0002180 Neurodegeneration
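The inference step can be illustrated with a toy structural check, which stands in for a real OWL reasoner. It is only a sketch: logical definitions are reduced to (quality, bearer) pairs, Purkinje cell is shown as a direct child of neuron for brevity, and the function names are hypothetical.

```python
# Toy is-a hierarchy (child -> parents). Assumption: Purkinje cell is shown as
# a direct subclass of neuron to keep the example short.
IS_A = {"CL:0000121": {"CL:0000540"}}

# Logical definitions of the form: quality AND inheres-in SOME bearer.
DEFINITIONS = {
    "HP:0002180": ("degenerate", "CL:0000540"),  # Neurodegeneration
    "MP:0000876": ("degenerate", "CL:0000121"),  # Purkinje cell degeneration
}

def is_subclass(child, parent):
    """True if child equals parent or reaches it through IS_A links."""
    if child == parent:
        return True
    return any(is_subclass(p, parent) for p in IS_A.get(child, ()))

def inferred_subclass_axioms(definitions):
    """Infer (sub, super) pairs: same quality, and the bearer of sub is subsumed."""
    axioms = []
    for sub, (sub_quality, sub_bearer) in definitions.items():
        for sup, (sup_quality, sup_bearer) in definitions.items():
            if sub != sup and sub_quality == sup_quality and is_subclass(sub_bearer, sup_bearer):
                axioms.append((sub, sup))
    return axioms

print(inferred_subclass_axioms(DEFINITIONS))
```

A real reasoner handles arbitrary class expressions; this check only covers the single genus-differentia pattern used on the slide.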

Probabilistic Ontology OP = <A,H>
BOOM Bayes OWL Ontology Merging:
Finds the set of hypothetical axioms that maximises P(OP)
[Figure: BOOM workflow. Components: Ontology 1 to Ontology n; mapping tool and mapping curation, giving inter-ontology mappings; Axiom Weight Estimator and weight curation, giving hypothetical logical axioms plus weights (H); Elk reasoner; merged coherent OWL ontology; equivalent classes are merged before the next iteration.]

Generating hypothetical logical axioms
[Figure: inter-ontology mappings pass through the Axiom Weight Estimator to give hypothetical logical axioms plus weights (H)]
E.g.: OMIM:123 xref DOID:987
Pr(OMIM:123 ≡ DOID:987) = 0.3
Pr(OMIM:123 ⊂ DOID:987) = 0.4
Pr(OMIM:123 ⊃ DOID:987) = 0.1
Domain rules
(lexical, structural, …):
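The three example weights sum to 0.8. One way to read this, stated here as an assumption rather than something the slide spells out, is that the remaining 0.2 is the probability that the two classes stand in none of these three relations. A small sketch with hypothetical names:

```python
RELATIONS = ("equivalent", "subclass", "superclass")

def complete_distribution(weights):
    """Add a 'none' state holding the probability mass not assigned to any relation.

    Assumption: the three relations are mutually exclusive, so their weights
    must not sum to more than 1.
    """
    total = sum(weights.get(relation, 0.0) for relation in RELATIONS)
    if total > 1.0 + 1e-9:
        raise ValueError("relation weights sum to more than 1")
    distribution = {relation: weights.get(relation, 0.0) for relation in RELATIONS}
    distribution["none"] = max(0.0, 1.0 - total)
    return distribution

# Weights for the OMIM:123 xref DOID:987 example.
dist = complete_distribution({"equivalent": 0.3, "subclass": 0.4, "superclass": 0.1})
print(dist)
```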

K-BOOM Algorithm for finding most
likely merged ontology
Goal: find the values for H that maximise P
– h_i: boolean representing the truth value of hypothetical axiom H_i
– Problem: 2^N possible ontologies
1. Factorize the calculation by dividing the combined
axioms into k modules (k-BOOM)
Algorithm:
i. Assert all hypothetical axioms to be true
ii. Make a module from each equivalence clique
2. Use a greedy algorithm; start with the
most likely hypothetical axioms in O_k
3. Test each configuration for satisfiability using an OWL
reasoner (Elk) (unsat => Pr=0); calculate the posterior probability
4. Repeat until the number of tests
exceeds a threshold
5. Return the most likely configuration for O_k
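Steps 2 to 5 can be sketched for a single small module as a bounded, most-likely-first search. This is not the kboom implementation: the Elk satisfiability test is replaced by a toy rule, the candidate list is built by full enumeration (only feasible for small modules), the posterior is normalised over the tested configurations only, and all names and weights are hypothetical.

```python
from itertools import product
from math import prod

def best_configuration(pairs, is_satisfiable, max_tests=1000):
    """Return (configuration, posterior) for the most likely satisfiable choice.

    pairs maps a class pair to a {relation: probability} distribution.
    """
    keys = list(pairs)
    candidates = []
    for choice in product(*(pairs[k].items() for k in keys)):
        config = {k: relation for k, (relation, _) in zip(keys, choice)}
        prior = prod(p for _, p in choice)
        candidates.append((prior, config))
    candidates.sort(key=lambda c: c[0], reverse=True)  # most likely first

    tested = []
    for prior, config in candidates[:max_tests]:
        # Unsatisfiable configurations get probability 0.
        tested.append((prior if is_satisfiable(config) else 0.0, config))

    total = sum(p for p, _ in tested)
    if total == 0.0:
        return None, 0.0
    best_p, best = max(tested, key=lambda t: t[0])
    return best, best_p / total

def no_double_equivalence(config):
    """Toy stand-in for a reasoner: one class cannot equal two distinct classes."""
    left = [a for (a, _), relation in config.items() if relation == "equivalent"]
    return len(left) == len(set(left))

# Hypothetical module: X:1 has xrefs to both Y:1 and Y:2.
pairs = {
    ("X:1", "Y:1"): {"equivalent": 0.7, "subclass": 0.2, "none": 0.1},
    ("X:1", "Y:2"): {"equivalent": 0.6, "subclass": 0.3, "none": 0.1},
}
best, posterior = best_configuration(pairs, no_double_equivalence)
print(best, round(posterior, 3))
```

The configuration with the highest prior (both pairs equivalent) is rejected as unsatisfiable, so the search falls back to the next most likely one.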

Probability guided curator workflow:
A little knowledge goes a long way
• Run cycle
• Examine results for modules
with:
– low posterior probability
– low confidence (top ranked
solution has similar P to next
ranked)
– Pr(H_i = true) << threshold
• Apply biological/clinical
knowledge
• Override auto-generated
hypothetical axiom weights with
curated ones
– Feed issues back to source
ontologies
• Repeat
[Figure: dialog between the Mondo curator and the external ontology curator]
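The triage step in this workflow can be sketched as a simple ranking. The thresholds, module names and function name below are hypothetical, and the input is assumed to be a list of posterior probabilities for the ranked solutions of each module.

```python
def modules_needing_review(results, min_posterior=0.5, min_margin=0.1):
    """Flag modules whose best solution is weak or barely beats the runner-up.

    results maps a module ID to the posterior probabilities of its solutions.
    Returns (module_id, top_posterior, reasons) tuples, weakest module first.
    """
    flagged = []
    for module_id, posteriors in results.items():
        if not posteriors:
            continue
        ranked = sorted(posteriors, reverse=True)
        top = ranked[0]
        runner_up = ranked[1] if len(ranked) > 1 else 0.0
        reasons = []
        if top < min_posterior:
            reasons.append("low posterior")
        if top - runner_up < min_margin:
            reasons.append("low confidence")
        if reasons:
            flagged.append((module_id, top, reasons))
    return sorted(flagged, key=lambda item: item[1])

# Hypothetical results for three modules.
results = {
    "module-1": [0.9, 0.05],
    "module-2": [0.36, 0.33, 0.2],
    "module-3": [0.55, 0.5],
}
for module_id, top, reasons in modules_needing_review(results):
    print(module_id, top, reasons)
```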

Application: merging diseases into
MonDO
https://github.com/monarch-initiative/monarch-disease-ontology
“Ontology”    Classes (before merge)  Classes (after merge)  SubClass axioms  Xrefs
Inputs:
DOID          6878                    6012                   7082             36656
MESH (D)      11314                   4152                   19036
OMIM (D)      7783                    7783                   0                31242
Orphanet (D)  8740                    4683                   15182            20326
OMIA          4833                    4833                   3120             355
DC            209                     208                    310              316
Medic         0                                              8630             3435
Output:
MonDO         39757                   27617                  44837
Held back: NCIT, SNOMED, ICD9, GARD

Example Module Resolution: ITM2B
amyloidosis

Example failed resolution – due to
ontology error
https://github.com/monarch-initiative/monarch-disease-ontology/issues/99
https://github.com/DiseaseOntology/HumanDiseaseOntology/issues/164

Example failed resolution – due to
MESH duplicates
https://github.com/monarch-initiative/monarch-disease-ontology/issues/81

Evaluating results of disease merger
• No gold standard for multiple ontology merger
– Partial evaluation using held-back Orphanet NTBT/E calls:
• 6977/7986 (87% agreement)
• Ad-hoc evaluation by curator
– Approach: use posterior probabilities to rank modules requiring
attention
– This is the killer-app feature
– Iteratively refine curated probabilities
• https://github.com/monarch-initiative/monarch-disease-ontology/issues/
• Results
– Manual inspection and use of MonDO
– Detection of errors in source ontologies
• E.g. duplicates in MESH
• Incorrect xrefs in DO, e.g.
– https://github.com/DiseaseOntology/HumanDiseaseOntology/issues - issues #164, #163,
#156, #154, #151, #150, #149, #140, #135

Next Steps
• Integrate hypothetical axiom weight estimation into
Bayesian model
• Apply Markov Chain Monte Carlo (MCMC) methods for
estimating most likely graph
– E.g. Metropolis-Hastings
• Integrate other knowledge
– Logical Definitions (Phenotypes)
– Molecular knowledge
• Improve Evaluation
– Test k-BOOM on task where we have gold standard, e.g.
neuroanatomy/uberon
– Formal comparison with EFO, MedGen, …
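As an illustration of what the MCMC step might look like, here is a minimal Metropolis-Hastings sketch over axiom configurations. It is not taken from kboom: the reasoner is replaced by a toy rule, the proposal changes one randomly chosen pair to a uniformly chosen relation, a zero-probability starting state is always abandoned (a shortcut, not strict Metropolis-Hastings), and all names and weights are hypothetical.

```python
import random
from math import prod

def metropolis_hastings(pairs, is_satisfiable, steps=2000, seed=0):
    """Random walk over configurations; returns the best (config, score) seen."""
    rng = random.Random(seed)
    keys = list(pairs)

    def score(config):
        # Unnormalised target: product of weights, or 0 if unsatisfiable.
        if not is_satisfiable(config):
            return 0.0
        return prod(pairs[k][config[k]] for k in keys)

    # Start from the most likely relation for each pair.
    current = {k: max(pairs[k], key=pairs[k].get) for k in keys}
    current_score = score(current)
    best, best_score = dict(current), current_score

    for _ in range(steps):
        key = rng.choice(keys)
        proposal = dict(current)
        proposal[key] = rng.choice(list(pairs[key]))  # symmetric proposal
        proposal_score = score(proposal)
        # Accept with probability min(1, proposal_score / current_score).
        if current_score == 0.0 or rng.random() < proposal_score / current_score:
            current, current_score = proposal, proposal_score
        if current_score > best_score:
            best, best_score = dict(current), current_score
    return best, best_score

def no_double_equivalence(config):
    """Toy stand-in for a reasoner: one class cannot equal two distinct classes."""
    left = [a for (a, _), relation in config.items() if relation == "equivalent"]
    return len(left) == len(set(left))

pairs = {
    ("X:1", "Y:1"): {"equivalent": 0.7, "subclass": 0.2, "none": 0.1},
    ("X:1", "Y:2"): {"equivalent": 0.6, "subclass": 0.3, "none": 0.1},
}
best, best_score = metropolis_hastings(pairs, no_double_equivalence)
print(best, best_score)
```

Unlike exhaustive or greedy search, a sampler of this kind never enumerates the whole configuration space, which is the attraction for large modules.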

Discussion
• Retrospective merging vs prospective
development
– Better to work together from outset (OBO model)
– However, current state of affairs is such that
expert knowledge is distributed across resources
– We want to preserve that rather than reinvent
– Coherent merging of molecular knowledge with
classical top-down knowledge will be required
moving forward

Implementation/Availability
• Software
– https://github.com/monarch-initiative/kboom
• Paper
– https://github.com/cmungall/kboom-paper
– http://biorxiv.org/content/early/2016/04/15/048843
• MonDO
– https://github.com/monarch-initiative/monarch-disease-ontology
– Both OWL ontology and axiom weight rules

Acknowledgments
k-BOOM
• Ian Holmes
• Sebastian Köhler
• Jim Balhoff
• Peter Robinson
• Melissa Haendel
Curation
• Nicole Vasilevsky (MonDO,
DC)
• Sue Bello (DC)
• Elvira Mitraka (DO)
• Lynn Schriml (DO)
FUNDING: NIH Office of Director: 1R24OD011883; NIH-UDP:
HHSN268201300036C
