Reasoning over multiple open bio-ontologies to make machines and humans happy
Chris Mungall
cjmungall@lbl.gov
@chrismungall
http://bit.ly/mungall-us2ts-2019
Biological data management is hard. There are many named things:
● Drugs: 10k
● Chemicals: 1-50m?
● Species: ~9 million
● Diseases and phenotypes: 10-50k/species
● Cells: 1000s+ types per species
● Experiments
● Raw data
● Genes: 20k/species
● Genetic variants: 3m (human)
There are many ways to categorize the things:
● Genes: 20k/species
● Gene Ontology: 45k functional descriptor classes
● Knowledge graph edges: ~7m
There are many ontologies to categorize the things
762 ontologies
How do we manage this?
MODULARITY
● OBO
● Rector Normalization
● Design Patterns
● Relation Ontology
● ROBOT
REASONING
● EL (ELK, Whelk)
● DL (HermiT, FaCT++)
Open Biological Ontologies (OBO)
http://obofoundry.org
1. Well-integrated modular ontologies (a SUBSET of BioPortal)
2. Provide a technical and sociotechnological framework for cooperation
3. Provide tools, best practices and infrastructure for forging new ontologies
4. Allow us to curate all of the things
@obofoundry
Assembling the Jigsaw
RECTOR NORMALIZATION
Rector 2003: Modularisation of domain ontologies implemented in description logics and related formalisms including OWL.
http://www.cs.man.ac.uk/~rector/papers/rector-modularisation-kcap-2003-distrib.pdf
Minimal Constructs Needed for Rector Normalization
● SomeValuesFrom
● IntersectionOf
● EquivalentTo
● SubClassOf
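A toy sketch of why these four constructs are enough, assuming a hypothetical mini-ontology: primitive classes are kept as single-parent trees, every other class is defined as `genus and (relation some filler)`, and the multiple-inheritance links are left for the machine to infer. The subsumption check below is a naive structural test written for this illustration, not a real OWL reasoner; it ignores relation hierarchies and only handles atomic genus and filler.

```python
# Hypothetical mini-ontology. Primitive classes form single-parent trees.
PRIMITIVE_PARENTS = {
    "cell": None,
    "neuron": "cell",
    "process": None,
    "secretion": "process",
    "GABA secretion": "secretion",
}

# Defined classes: 'genus and (relation some filler)' stored as a tuple.
DEFINITIONS = {
    "secretory cell": ("cell", "capable_of", "secretion"),
    "GABAergic neuron": ("neuron", "capable_of", "GABA secretion"),
}


def ancestors(cls):
    """Return cls plus all of its ancestors in the primitive trees."""
    result = set()
    while cls is not None:
        result.add(cls)
        cls = PRIMITIVE_PARENTS.get(cls)
    return result


def subsumes(sup_def, sub_def):
    """Structural subsumption for 'G and (R some F)' with atomic G and F.

    Relations are compared by name only (no sub-property reasoning).
    """
    sup_genus, sup_rel, sup_filler = sup_def
    sub_genus, sub_rel, sub_filler = sub_def
    return (
        sup_genus in ancestors(sub_genus)
        and sup_rel == sub_rel
        and sup_filler in ancestors(sub_filler)
    )


def infer_subclasses(definitions):
    """Infer SubClassOf links between defined classes; none are asserted."""
    inferred = set()
    for sub, sub_def in definitions.items():
        for sup, sup_def in definitions.items():
            if sub != sup and subsumes(sup_def, sub_def):
                inferred.add((sub, sup))
    return inferred


print(infer_subclasses(DEFINITIONS))
```

Nobody asserts that a GABAergic neuron is a secretory cell; the link falls out of the two definitions plus the primitive trees, which is the maintenance win normalization is after.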
OBO Relation Ontology: glue within and between ontologies
http://obofoundry.org/ontology/ro
Spatial Reasoning OWL Design Patterns
> spatially_disjoint_with.yaml
axiom:
  Text: (part-of some %s) DisjointWith (part-of some %s)
  Vars:
    - component1
    - component2
Ontology:
  (part-of some nucleus) DisjointWith (part-of some cytosol)
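A minimal sketch of how such a template expands, assuming the `%s` slots are filled in order from `Vars`. The dict layout and the `fill`/`quote` helpers are hypothetical illustrations, not the API of the design-pattern tooling itself; the second row is likewise an illustrative binding.

```python
import re

# Hypothetical in-memory form of the spatially_disjoint_with pattern:
# the %s slots in "text" are filled, in order, by the values of "vars".
PATTERN = {
    "text": "(part-of some %s) DisjointWith (part-of some %s)",
    "vars": ["component1", "component2"],
}


def quote(label):
    """Wrap multi-word labels in single quotes, as Manchester syntax requires."""
    return f"'{label}'" if re.search(r"\s", label) else label


def fill(pattern, bindings):
    """Instantiate the pattern for one row of variable bindings."""
    values = tuple(quote(bindings[var]) for var in pattern["vars"])
    return pattern["text"] % values


rows = [
    {"component1": "nucleus", "component2": "cytosol"},
    {"component1": "nuclear envelope", "component2": "cytosol"},
]
for row in rows:
    print(fill(PATTERN, row))
```

Curators then only maintain the rows (e.g. a spreadsheet of component pairs); the axiom shape lives in one place, so every generated axiom stays within the simple constructs a fast reasoner handles well.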
http://robot.obolibrary.org
Managing ontology release workflows with ODK and ROBOT
● Configure ontology repo with YAML
● Reasoning + QC checks via Travis CI
https://github.com/INCATools/ontology-development-kit
Reasoning detects annotation errors
Genes are often assigned functions automatically based on homology. This is error-prone. Previous errors include:
• Genes in chicken responsible for lactation
• Genes in slime mold responsible for dorsal fin development
Dorsal Fin SubClassOf Fin
Fin SubClassOf part-of some Vertebrate
Vertebrate SubClassOf Animal
(part-of some Animal) DisjointWith (part-of some Slime Mold)
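A hand-rolled sketch of the entailment these axioms support, assuming (as a simplification) that an annotation of a slime mold gene to dorsal fin development requires the class `'dorsal fin' and (part-of some <taxon>)` to be satisfiable. The two species placements are illustrative additions, and the check is specialised to exactly this axiom shape rather than being a general reasoner.

```python
# Axioms from the slide, plus two illustrative taxon placements (assumptions).
SUBCLASS_OF = {
    "dorsal fin": {"fin"},
    "vertebrate": {"animal"},
    "Danio rerio": {"vertebrate"},
    "Dictyostelium discoideum": {"slime mold"},
}
# Fin SubClassOf part-of some Vertebrate
PART_OF_SOME = {"fin": {"vertebrate"}}
# (part-of some Animal) DisjointWith (part-of some Slime Mold)
DISJOINT_PART_OF = [("animal", "slime mold")]


def superclasses(cls):
    """cls plus everything reachable via asserted SubClassOf links."""
    seen, stack = set(), [cls]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(SUBCLASS_OF.get(current, ()))
    return seen


def required_taxa(structure):
    """Taxa T for which 'structure SubClassOf part-of some T' is entailed."""
    taxa = set()
    for cls in superclasses(structure):
        for taxon in PART_OF_SOME.get(cls, ()):
            # part-of some X implies part-of some Y whenever X SubClassOf Y
            taxa |= superclasses(taxon)
    return taxa


def annotation_is_unsatisfiable(structure, taxon):
    """True if 'structure and (part-of some taxon)' is unsatisfiable here."""
    required = required_taxa(structure)
    annotated = superclasses(taxon)
    return any(
        (a in required and b in annotated) or (b in required and a in annotated)
        for a, b in DISJOINT_PART_OF
    )


print(annotation_is_unsatisfiable("dorsal fin", "Dictyostelium discoideum"))
print(annotation_is_unsatisfiable("dorsal fin", "Danio rerio"))
```

The slime mold annotation is flagged while the fish annotation passes, even though no axiom mentions either species and dorsal fins together.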
Exomiser + OwlSim
OWL reasoning used in clinical applications to diagnose patients
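Phenotype matching tools in this family compare terms through their inferred superclasses, which is where reasoning feeds in. A minimal sketch, assuming only a Jaccard (simJ) score over superclass sets and a hypothetical four-class hierarchy; OwlSim and Exomiser use richer measures (for example, information-content based ones), so this is not their scoring.

```python
# Hypothetical phenotype hierarchy (child -> parents), for illustration only.
PARENTS = {
    "atrial septal defect": {"abnormal heart morphology"},
    "abnormal heart morphology": {"abnormal cardiovascular morphology"},
    "abnormal vascular morphology": {"abnormal cardiovascular morphology"},
    "abnormal cardiovascular morphology": {"phenotypic abnormality"},
}


def ancestors(term):
    """term plus all of its superclasses (reflexive-transitive closure)."""
    seen, stack = set(), [term]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(PARENTS.get(current, ()))
    return seen


def sim_j(a, b):
    """Jaccard similarity of the two terms' superclass sets."""
    anc_a, anc_b = ancestors(a), ancestors(b)
    return len(anc_a & anc_b) / len(anc_a | anc_b)


print(sim_j("atrial septal defect", "abnormal vascular morphology"))  # 0.4
```

Two terms that share no asserted parent still score above zero because they meet higher up the inferred hierarchy, which is what lets a patient's phenotypes match a disease annotated with related but different terms.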
Challenges
● Machine Reasoning: SOLVED
● Human Reasoning about Machine Reasoning: STILL VERY HARD
Pop quiz: what OWL profile is this?
'DNA extent' EquivalentTo
'sequence molecular entity extent' and
('has part' only
('deoxyribonucleotide residue' or
(('chemical entity' or
'biological sequence entity') and
(not ('biological sequence unit')))))
Combining transitive properties and universal restrictions can take you strange places
Avoid going mad with complex nested boolean expressions
KEEP IT SIMPLE, SAPIENS
● DisjointClasses
● SomeValuesFrom
● IntersectionOf
Use with caution:
1. Only
2. Not
3. Cardinality
4. Levels of nesting requiring parentheses
Generally not needed for bio-ontology T-Box reasoning:
1. Data Properties
2. Keys
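The caution list can be turned into a simple review check. A minimal sketch of a hypothetical linter over Manchester-syntax class expressions: it flags the listed keywords plus `or` (union is not in the recommended set, so flagging it is an assumption) and any nesting deeper than one level of parentheses (the threshold is also an assumption). It is a text heuristic, not an OWL profile checker.

```python
import re

# Constructs the slide says to use with caution, as Manchester syntax keywords.
# 'or' is added because union is not in the recommended set (an assumption).
CAUTION = {"only", "not", "exactly", "min", "max", "or"}


def nesting_depth(expr):
    """Maximum depth of parentheses in the expression."""
    depth = deepest = 0
    for ch in expr:
        if ch == "(":
            depth += 1
            deepest = max(deepest, depth)
        elif ch == ")":
            depth -= 1
    return deepest


def lint(expr, max_nesting=1):
    """Return warnings for constructs that make axioms hard for humans."""
    # Blank out quoted labels so words inside names are not taken as keywords.
    unquoted = re.sub(r"'[^']*'", "''", expr)
    words = set(re.findall(r"[A-Za-z]+", unquoted))
    warnings = sorted(words & CAUTION)
    if nesting_depth(unquoted) > max_nesting:
        warnings.append("nesting")
    return warnings


dna_extent = (
    "'sequence molecular entity extent' and ('has part' only "
    "('deoxyribonucleotide residue' or (('chemical entity' or "
    "'biological sequence entity') and (not ('biological sequence unit')))))"
)
print(lint(dna_extent))  # ['not', 'only', 'or', 'nesting']
print(lint("cell and ('part of' some nucleus)"))  # []
```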
BIG BUCKET OF MIXED AXIOMS
1. I've giv'n her all she's got captain, an' I canna give her no more!
2. WEE BUCKET OF HARD AXIOMS / BIG BUCKET OF EASY AXIOMS
   Let me just shoogle these axioms aroond a wee bit
   HARD: Erythrocyte SubClassOf has_part exactly 0 nucleus
   ⇒
   HARD: Anucleate EquivalentTo has_part exactly 0 nucleus
   EASY: Erythrocyte SubClassOf Anucleate
3. Och aye that's just aboot right
4. Now I'll hand these over to ma pal the Elk, he's pure dead fast
5. I'm traveling at the speed of light, that's why they call me Mr Fahrenheit
THE END
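A sketch of the "shoogle" step shown for the erythrocyte, under stated assumptions: axioms are `(sub, sup)` string pairs, "hard" means the superclass expression uses a keyword outside OWL 2 EL (a regex approximation, not a profile checker), and the name for each hard expression (here `Anucleate`) is supplied by a curator. The hard bucket stays small enough for a DL reasoner; the easy bucket is what an EL reasoner such as ELK would get.

```python
import re

# Rough test for constructs outside OWL 2 EL (an approximation).
NON_EL = re.compile(r"\b(only|not|or|exactly|min|max)\b")


def is_hard(expr):
    """True if the class expression uses a non-EL keyword."""
    return bool(NON_EL.search(expr))


def shoogle(axioms, names):
    """Split (sub, sup) SubClassOf axioms into hard and easy buckets.

    `names` maps a hard superclass expression to a named class; each such
    axiom becomes a hard EquivalentTo plus an easy atomic SubClassOf.
    """
    hard, easy = [], []
    for sub, sup in axioms:
        if not is_hard(sup):
            easy.append(("SubClassOf", sub, sup))
        elif sup in names:
            definition = ("EquivalentTo", names[sup], sup)
            if definition not in hard:  # define each named class once
                hard.append(definition)
            easy.append(("SubClassOf", sub, names[sup]))
        else:
            hard.append(("SubClassOf", sub, sup))  # no name supplied: stays hard
    return hard, easy


axioms = [
    ("Erythrocyte", "has_part exactly 0 nucleus"),
    ("Erythrocyte", "Cell"),  # hypothetical easy axiom
]
hard, easy = shoogle(axioms, {"has_part exactly 0 nucleus": "Anucleate"})
print(hard)
print(easy)
```

The cardinality restriction is now stated once, on a named class, and every class that uses it only needs a plain SubClassOf that an EL reasoner handles quickly.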
What happens when the pieces don't fit together?
Making the pieces fit together: GO and ChEBI
• Some relationships didn't make sense
• e.g. nucleotide is-a carbohydrate
• Acids ⬄ conjugate bases
Hill, D. P., Adams, N., Bada, M., Batchelor, C., Berardini, T. Z., Dietze, H., … Lomax, J. (2013). Dovetailing biology and chemistry: integrating the Gene Ontology with the ChEBI chemical ontology. BMC Genomics, 14(1), 513.
GO + ChEBI + OWL reasoning
• Fixed many is-a relationships
• e.g. nucleotide is-a carbohydrate
• Acids ⬄ conjugate bases
Harold Drabkin
David Hill
Jane Lomax
Tanya Berardini
Janna Hastings
GO + ChEBI + Design Patterns
https://douroucouli.wordpress.com
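Once GO classes are defined in terms of ChEBI, the GO hierarchy follows from the ChEBI one, which is why a bad chemical is-a matters to biologists. A toy sketch, assuming simplified definitions of the form `'biosynthetic process' and ('has output' some C)` and a hypothetical three-class fragment (a simplification of GO-ChEBI logical definitions, not their actual content), shows nucleotide is-a carbohydrate propagating into GO.

```python
# Hypothetical fragment of ChEBI-style is_a links (child -> parents).
CHEBI_IS_A = {
    "ribonucleotide": {"nucleotide"},
    "nucleotide": {"carbohydrate"},  # the dubious link mentioned on the slide
}

# Simplified GO-style definitions:
# X EquivalentTo 'biosynthetic process' and ('has output' some chemical)
GO_DEFS = {
    "carbohydrate biosynthetic process": "carbohydrate",
    "nucleotide biosynthetic process": "nucleotide",
    "ribonucleotide biosynthetic process": "ribonucleotide",
}


def chebi_ancestors(chem, is_a):
    """chem plus all of its is_a ancestors."""
    seen, stack = set(), [chem]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(is_a.get(current, ()))
    return seen


def infer_go_is_a(go_defs, is_a):
    """A GO class falls under another when its output is a ChEBI descendant."""
    inferred = set()
    for go_sub, chem_sub in go_defs.items():
        for go_sup, chem_sup in go_defs.items():
            if go_sub != go_sup and chem_sup in chebi_ancestors(chem_sub, is_a):
                inferred.add((go_sub, go_sup))
    return inferred


with_bad_link = infer_go_is_a(GO_DEFS, CHEBI_IS_A)
without_bad_link = infer_go_is_a(GO_DEFS, {"ribonucleotide": {"nucleotide"}})
print(len(with_bad_link), len(without_bad_link))  # 3 1
```

Fixing the one chemical link removes two spurious GO links, which is the payoff of reasoning across both ontologies instead of patching each by hand.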
Conclusions
● Maintaining > ~100 classes benefits from reasoning
● Maintaining > ~10,000 classes: you will be in maintenance hell without reasoning
● Reasoning is dead easy for computers
● Reasoning can be hard for humans
○ Keep it simple
○ Use Design Patterns / Templates
○ Use software engineering paradigms
○ Avoid unnecessary complexity
● Sociotechnological aspects of reasoning are hardest
○ “I don’t like the entailments I get when I use your ontology”
http://bit.ly/mungall-us2ts-2019

US2TS: Reasoning over multiple open bio-ontologies to make machines and humans happy
