Causal Reasoning using the Relation Ontology
Chris Mungall
Lawrence Berkeley National Laboratory
cjmungall@lbl.gov
Outline
● The need for an ontology of relations
● Tour of the Relation Ontology
● Use in GO Causal Inference
● Causal Relations for Diseases
● Integrating multiple knowledge graphs
Why we need relationship types
Why we need relationship types
[Figure: a graph whose nodes are typed as person (Melania Trump, Donald Trump, Barack Obama, Michelle Obama, Vladimir Putin) or country (USA, Russia)]
Why we need relationship types for biological data
[Figure: a graph whose nodes are typed as gene (Gene A, Gene B, Gene C) or disease (Disease X, Disease Y)]
Why we need standardised relationship types for biological data
[Figure: the same relationship between gene A and gene B is labelled INTERACTS_WITH in database 1, "physically interacts with" in database 2, and "binds" in database 3]
Why we need standardised relationship types for biological data
[Figure: the same relationship between gene A and gene B is labelled "affects" in database 4, "regulates" in database 5, and PHOSPHORYLATES in database 6]
Why we need relationship types in biological ontologies
[Figure: MeSH example]
Application to linked data
{ "id": "ENSEMBL:ENSG000000001",
  "symbol": "…",
  "type": "gene",
  "has_part" : [
    { "id" : "ENSEMBL:ENST0000…",
      "type" : "transcript",
      "encodes" : {
        "id" : "ENSEMBL:ENSP....",
        "type" : "protein",
        …
{"@context": {
  "symbol" : "rdfs:label",
  "type" : "rdf:type",
  "gene" : "http://purl.obolibrary.org/obo/SO_0000704",
  "transcript" : "http://purl.obolibrary.org/obo/SO_0000673",
  "has_part" : ???,
  "encodes" : ???,
  …
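The `???` entries are the crux: without shared relation identifiers, `has_part` and `encodes` have no meaning that other resources can recognize. A minimal Python sketch makes the gap visible by flattening the nested record into triples and reporting keys the context cannot resolve. It is a simplified, JSON-LD-inspired expansion, not a real JSON-LD processor; the helper name `expand`, the gene symbol, and the transcript and protein identifiers are hypothetical, while the context entries are the ones on the slide with the `rdfs:` and `rdf:` prefixes written out.

```python
from typing import Any, Dict, List, Optional, Set, Tuple

# Context from the slide; has_part and encodes are deliberately left unmapped (None).
CONTEXT: Dict[str, Optional[str]] = {
    "symbol": "http://www.w3.org/2000/01/rdf-schema#label",
    "type": "http://www.w3.org/1999/02/22-rdf-syntax-ns#type",
    "gene": "http://purl.obolibrary.org/obo/SO_0000704",
    "transcript": "http://purl.obolibrary.org/obo/SO_0000673",
    "has_part": None,
    "encodes": None,
}

Triple = Tuple[str, str, str]


def expand(record: Dict[str, Any], context: Dict[str, Optional[str]]) -> Tuple[List[Triple], Set[str]]:
    """Flatten a nested record into (subject, predicate, object) triples.

    Keys are resolved through the context; keys without an IRI are kept verbatim
    and reported, because no other resource can interpret them.
    """
    triples: List[Triple] = []
    unmapped: Set[str] = set()

    def walk(node: Dict[str, Any]) -> None:
        subject = node["id"]
        for key, value in node.items():
            if key == "id":
                continue
            predicate = context.get(key)
            if predicate is None:
                unmapped.add(key)
                predicate = key
            for item in (value if isinstance(value, list) else [value]):
                if isinstance(item, dict):
                    # Nested object: link to it, then expand it recursively.
                    triples.append((subject, predicate, item["id"]))
                    walk(item)
                elif key == "type":
                    # Class names such as "gene" are themselves expanded via the context.
                    triples.append((subject, predicate, context.get(item) or item))
                else:
                    triples.append((subject, predicate, item))

    walk(record)
    return triples, unmapped


record = {
    "id": "ENSEMBL:ENSG000000001",
    "symbol": "GENE1",  # hypothetical symbol
    "type": "gene",
    "has_part": [
        {"id": "ex:transcript1", "type": "transcript",  # hypothetical IDs
         "encodes": {"id": "ex:protein1", "type": "protein"}},
    ],
}
triples, unmapped = expand(record, CONTEXT)
print(sorted(unmapped))  # ['encodes', 'has_part']
```

Replacing the two `None` values with shared relation IRIs, such as those RO provides, is what would let any consumer interpret the same record.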
Relations are the glue for integration
https://twitter.com/dhimmel/status/810996703901777920
OBO Relation Ontology
● An ontology of Relationship Types
◦ Hierarchically organized
● OWL provides mathematical-logical foundation
● Currently > 450 relations
◦ “Core” relations (e.g. part of)
◦ General purpose (e.g. has input)
◦ Domain-centric (e.g. phosphorylates)
● Originally used for relationships in ontologies
◦ Now used in Knowledge Graphs, Linked Data
http://obofoundry.org/ontology/ro.html
https://www.ebi.ac.uk/ols/ontologies/ro
OLS View
https://www.ebi.ac.uk/ols/ontologies/ro
[Figure: OLS view of an RO term, labelling its ID and URI]
OntoBee view
http://purl.obolibrary.org/obo/RO_0002211
Protégé View
Description Logics provide the basis for logical reasoning
● TBox
◦ Classes and class axioms
⚫e.g. nucleus SubClassOf organelle, part_of some cell
◦ (Most ontologies are TBox-centric)
● ABox
◦ Instances and instance-level axioms
⚫e.g. patient123 has_sequence genome567
◦ (Typically not asserted in ontologies)
● RBox
◦ Object Properties (aka Relations)
⚫e.g. part of is Transitive
◦ (RO is RBox-centric)
SubPropertyOf Axioms
[Figure: positively regulates SubPropertyOf regulates; negatively regulates SubPropertyOf regulates]
x positively regulates y ➔ x regulates y
606 SubPropertyOf axioms
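The SubPropertyOf entailment can be sketched in a few lines of Python. This is a minimal illustration, not how an OWL reasoner such as HermiT works internally: the hierarchy is a hypothetical dictionary holding only the two axioms drawn on the slide, and the function names are made up.

```python
from typing import Dict, Set, Tuple

Triple = Tuple[str, str, str]

# The two SubPropertyOf axioms from the slide: relation -> set of direct parents.
SUPER: Dict[str, Set[str]] = {
    "positively regulates": {"regulates"},
    "negatively regulates": {"regulates"},
}


def ancestors(relation: str) -> Set[str]:
    """All (transitive) superproperties of a relation."""
    found: Set[str] = set()
    stack = [relation]
    while stack:
        for parent in SUPER.get(stack.pop(), set()):
            if parent not in found:
                found.add(parent)
                stack.append(parent)
    return found


def infer_superproperties(triples: Set[Triple]) -> Set[Triple]:
    """If x R y and R SubPropertyOf S, add x S y."""
    inferred = set(triples)
    for subj, rel, obj in triples:
        for parent in ancestors(rel):
            inferred.add((subj, parent, obj))
    return inferred


facts = {("x", "positively regulates", "y")}
print(sorted(infer_superproperties(facts)))
# [('x', 'positively regulates', 'y'), ('x', 'regulates', 'y')]
```

A query for `regulates` therefore also finds data asserted with either more specific relation.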
InverseOf Axioms
[Figure: regulates InverseOf regulated by]
x regulates y ⬄ y regulated by x
105 InverseOf axioms
Note: relations often have an arbitrary canonical direction; the properties of the inverse are trivially inferred.
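A minimal Python sketch of the InverseOf equivalence: one canonical direction is stored and the other is derived. The function name `add_inverses` is hypothetical, and only the regulates / regulated by pair from the slide is included.

```python
from typing import Dict, Set, Tuple

Triple = Tuple[str, str, str]

# The InverseOf axiom from the slide, stored in its canonical direction.
INVERSES: Dict[str, str] = {"regulates": "regulated by"}


def add_inverses(triples: Set[Triple], inverses: Dict[str, str]) -> Set[Triple]:
    """x R y <=> y inv(R) x, applied in both directions."""
    lookup = dict(inverses)
    lookup.update({inv: rel for rel, inv in inverses.items()})  # InverseOf is symmetric
    result = set(triples)
    for subj, rel, obj in triples:
        inverse = lookup.get(rel)
        if inverse is not None:
            result.add((obj, inverse, subj))
    return result


print(sorted(add_inverses({("x", "regulates", "y")}, INVERSES)))
# [('x', 'regulates', 'y'), ('y', 'regulated by', 'x')]
```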
Domain and Range
[Figure: gene --expressed in--> material anatomical entity]
Domain: gene
Range: material anatomical entity
221 Domain/Range axioms
BFO and OBO Core used for constraints
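In OWL, domain and range axioms do not reject data by themselves: they let a reasoner infer the types of the subject and object, and a problem surfaces only when an inferred type contradicts another axiom. That is plausibly how upper-level ontologies such as BFO act as constraints here. A minimal Python sketch shows both steps, assuming a hypothetical disjointness axiom between gene and material anatomical entity and made-up entity names (`shh`, `limb bud`, `fgf8`); the function name is also hypothetical.

```python
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]

DOMAIN = {"expressed in": "gene"}
RANGE = {"expressed in": "material anatomical entity"}
# Hypothetical disjointness axiom standing in for upper-level (BFO-style) constraints.
DISJOINT = [frozenset({"gene", "material anatomical entity"})]


def infer_and_check(
    triples: List[Triple], asserted: Dict[str, Set[str]]
) -> Tuple[Dict[str, Set[str]], List[str]]:
    """Infer types from domain/range axioms, then report entities whose types clash."""
    types = {entity: set(classes) for entity, classes in asserted.items()}
    for subj, rel, obj in triples:
        if rel in DOMAIN:
            types.setdefault(subj, set()).add(DOMAIN[rel])
        if rel in RANGE:
            types.setdefault(obj, set()).add(RANGE[rel])
    clashes = [e for e, classes in types.items() if any(pair <= classes for pair in DISJOINT)]
    return types, clashes


# Well-formed statement: types are simply inferred, nothing clashes.
print(infer_and_check([("shh", "expressed in", "limb bud")], {}))
# Misused relation: the object is declared a gene, but the range also makes it an anatomical entity.
print(infer_and_check([("shh", "expressed in", "fgf8")], {"fgf8": {"gene"}})[1])  # ['fgf8']
```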
Characteristics
● Transitive
◦ x R y ∧ y R z ➔ x R z
◦ Examples: part of, develops from
● Symmetric
◦ x R y ➔ y R x
◦ Examples: adjacent to
● Reflexive
● Anti-symmetric
● Functional
129 axioms
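Transitivity and symmetry can be sketched as closure operations over the pairs of a single relation. This naive Python fixpoint is only an illustration and is not how OWL reasoners such as HermiT operate internally; the function names are hypothetical, and the "hand part of forelimb" pair is an added example alongside the finger/hand case used for RO Core.

```python
from typing import Set, Tuple

Pair = Tuple[str, str]


def transitive_closure(pairs: Set[Pair]) -> Set[Pair]:
    """Repeatedly apply x R y ∧ y R z ➔ x R z until nothing new is added."""
    closure = set(pairs)
    while True:
        new = {(x, z) for (x, y1) in closure for (y2, z) in closure if y1 == y2}
        if new <= closure:
            return closure
        closure |= new


def symmetric_closure(pairs: Set[Pair]) -> Set[Pair]:
    """Apply x R y ➔ y R x."""
    return set(pairs) | {(y, x) for (x, y) in pairs}


part_of = {("finger", "hand"), ("hand", "forelimb")}
print(("finger", "forelimb") in transitive_closure(part_of))  # True
print(sorted(symmetric_closure({("a", "b")})))  # [('a', 'b'), ('b', 'a')]
```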
SWRL (Semantic Web Rule Language)
● Examples:
◦ child_of(?x, ?y) ∧ has_brother(?y, ?z) ➔ has_uncle(?x, ?z)
◦ negatively_regulates(?x, ?y) ∧ negatively_regulates(?y, ?z) ➔ positively_regulates(?x, ?z)
18 SWRL rules
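A minimal naive forward-chaining sketch in Python applies the two example rules until no new facts appear. It assumes the uncle rule is written with has_brother(?y, ?z), the direction used in the property-chain form. The matcher and function names are hypothetical, and real SWRL engines also handle class atoms and built-ins, which are not modelled here.

```python
from typing import Dict, List, Optional, Set, Tuple

Atom = Tuple[str, str, str]  # (predicate, arg1, arg2); "?name" marks a variable
Rule = Tuple[List[Atom], Atom]  # (body atoms, head atom)

RULES: List[Rule] = [
    ([("child_of", "?x", "?y"), ("has_brother", "?y", "?z")], ("has_uncle", "?x", "?z")),
    (
        [("negatively_regulates", "?x", "?y"), ("negatively_regulates", "?y", "?z")],
        ("positively_regulates", "?x", "?z"),
    ),
]


def match(atom: Atom, fact: Atom, binding: Dict[str, str]) -> Optional[Dict[str, str]]:
    """Unify one body atom with one ground fact, extending the current variable binding."""
    if atom[0] != fact[0]:
        return None
    extended = dict(binding)
    for term, value in zip(atom[1:], fact[1:]):
        if term.startswith("?"):
            if extended.setdefault(term, value) != value:
                return None  # variable already bound to a different value
        elif term != value:
            return None  # constant does not match
    return extended


def forward_chain(facts: Set[Atom], rules: List[Rule]) -> Set[Atom]:
    """Apply every rule until no new facts appear (head variables must occur in the body)."""
    facts = set(facts)
    while True:
        new: Set[Atom] = set()
        for body, head in rules:
            bindings: List[Dict[str, str]] = [{}]
            for atom in body:
                bindings = [
                    b2 for b in bindings for fact in facts
                    if (b2 := match(atom, fact, b)) is not None
                ]
            for b in bindings:
                new.add((head[0], b[head[1]], b[head[2]]))
        if new <= facts:
            return facts
        facts |= new


facts = {("negatively_regulates", "A", "B"), ("negatively_regulates", "B", "C")}
print(("positively_regulates", "A", "C") in forward_chain(facts, RULES))  # True
```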
Property Chains
● A more compact way to write some SWRL rules
◦ Uses function composition symbol ‘•’
◦ Less expressive than SWRL
◦ Examples:
⚫child_of • has_brother ➔ has_uncle
⚫negatively regulates • negatively regulates ➔ positively regulates
139 Property Chain axioms
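Because a property chain is relational composition, it can be sketched without any variables. This minimal Python sketch uses hypothetical function names and made-up individuals, and performs a single application of the chain rather than iterating to a fixpoint as an OWL reasoner would.

```python
from functools import reduce
from typing import Dict, Iterable, Set, Tuple

Pair = Tuple[str, str]


def compose(r: Set[Pair], s: Set[Pair]) -> Set[Pair]:
    """Relational composition r • s: x (r • s) z iff x r y and y s z for some y."""
    successors: Dict[str, Set[str]] = {}
    for y, z in s:
        successors.setdefault(y, set()).add(z)
    return {(x, z) for x, y in r for z in successors.get(y, set())}


def apply_chain(
    relations: Dict[str, Set[Pair]], chain: Iterable[str], implied: str
) -> Dict[str, Set[Pair]]:
    """One application of the axiom chain[0] • chain[1] • ... ➔ implied (input is not modified)."""
    composed = reduce(compose, [relations.get(name, set()) for name in chain])
    result = {name: set(pairs) for name, pairs in relations.items()}
    result.setdefault(implied, set()).update(composed)
    return result


family = {"child_of": {("ann", "bob")}, "has_brother": {("bob", "carl")}}
print(apply_chain(family, ["child_of", "has_brother"], "has_uncle")["has_uncle"])  # {('ann', 'carl')}
```

Compared with a SWRL rule, no variable bindings are needed, which is what makes chains compact; the price is that only linear paths through binary relations can be expressed.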
RO Release Process
● All coordinated via GitHub
◦ Issues: https://github.com/oborel/obo-relations/issues
◦ All changes proposed via Pull Requests: https://github.com/oborel/obo-relations/pulls
◦ Validated by Travis-CI
◦ Merged by core editors
● All releases are vetted
◦ Automatically
⚫ HermiT OWL Reasoner
⚫ ROBOT Release Tool
⚫ Ontology Development Kit Docker (https://github.com/INCATools/ontology-development-kit)
◦ Manually
RO Core
● Generic: apply across
multiple domains
● E.g.
⚫every finger part of a hand
⚫every M phase part of a cell cycle
⚫Cambridge part of UK
General purpose and specific relations
RO for particular domains
● Ecology
◦ Biotic interaction relationships
● Anatomy
● Evolutionary Relationships
● Genome Features
● Causal activity models
● Disease causation
How RO is used
● Ontologies:
◦ Relationships between classes
◦ Widely used in OBO
● Knowledge Graphs:
⚫SPARQL endpoints
⚫Neo4J and other graph databases
⚫JSON-LD
⚫Relational Databases (e.g. GMOD/Chado)
Usage of RO in OBO
● Number of ontologies using each relation
Use of RO in Knowledge Graphs
● GO Causal Annotation Graphs
● Disease/Phenotype Graphs
GO’s initial attempts at causality
GO:0086094 positive regulation of ryanodine-sensitive calcium-release channel activity by adrenergic receptor signaling pathway involved in positive regulation of cardiac muscle contraction
GO:0086023 adenylate cyclase-activating adrenergic receptor signaling pathway involved in heart process
[Figure: the two GO terms above, connected by a subClassOf edge]
Mungall’s law[??*]: an inexpressive bio-database schema will be abused to the maximum extent possible in order for curators to express complexities of biology
[*] I have a feeling I’m not the first to express this
http://noctua.geneontology.org
GO Causal Activity Model (GO-CAM) RDF Graphs
Collaborative Editing!
RO axioms support inference across graphs of individuals
grk-2 Cele#59dc728000000288 acts_upstream_of_positive_effect G-protein coupled serotonin receptor activity#59dc728000000347
Arachne Reasoner: https://github.com/balhoff/arachne
Pathway 2 GO-CAM
[Figure: Reactome pathways in BioPAX Level 3 are converted to GO-CAM (OWL) by a converter]
https://github.com/geneontology/pathways2GO
Reaction -> Activity
[Figure: mapping rules convert the Reactome reaction "BMP2 binds to the receptor complex" into an activity annotated with GO:0005160]
Pathway -> Causal Activity Model
[Figure: causal activity flow for clathrin-mediated endocytosis]
OWL Reasoning
Infers more specific GO assignments using GO OWL axiomatization
Shortcut Relations and inference rules unify perspectives
[Figure: in the GO-CAM view (activity-centric, semantics on nodes), any GO kinase activity enabled by GeneProduct1 directly regulates any GO activity enabled by GeneProduct2; in the entity-centric view (SIF, CausalTab, ..), the same knowledge is the shortcut GeneProduct1 phosphorylates GeneProduct2]
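The shortcut rule can be sketched as a rewrite from the activity-centric view to the entity-centric one. This Python sketch is hypothetical throughout: the data layout, the function name, and the use of a flat set of GO class labels in place of a real subclass check against the GO hierarchy are all assumptions, and it covers only the phosphorylates shortcut.

```python
from typing import Iterable, List, NamedTuple, Set, Tuple


class Activity(NamedTuple):
    id: str
    go_class: str    # label of the activity's GO class (simplified to a string)
    enabled_by: str  # gene product that carries out the activity


# Assumption: a flat set stands in for "any subclass of GO kinase activity";
# a real implementation would query the GO class hierarchy instead.
KINASE_CLASSES: Set[str] = {"protein kinase activity", "protein serine/threonine kinase activity"}


def collapse_to_entities(
    activities: Iterable[Activity], directly_regulates: Iterable[Tuple[str, str]]
) -> List[Tuple[str, str, str]]:
    """Kinase activity enabled by GP1 directly regulates an activity enabled by GP2
    ==> GP1 phosphorylates GP2."""
    by_id = {a.id: a for a in activities}
    edges = []
    for upstream_id, downstream_id in directly_regulates:
        upstream, downstream = by_id[upstream_id], by_id[downstream_id]
        if upstream.go_class in KINASE_CLASSES:
            edges.append((upstream.enabled_by, "phosphorylates", downstream.enabled_by))
    return edges


model = [
    Activity("act1", "protein kinase activity", "GeneProduct1"),
    Activity("act2", "signaling receptor activity", "GeneProduct2"),  # hypothetical activity
]
print(collapse_to_entities(model, [("act1", "act2")]))
# [('GeneProduct1', 'phosphorylates', 'GeneProduct2')]
```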
GO-CAM site
http://geneontology.org/go-cam
Future Applications
● Boolean Modeling
● Causal Gene Set Enrichment
MONDO: Monarch Disease Ontology
● Unifies multiple disease resources
● Diseases as states
● Diseases have causal basis in
◦ disruption of a process
◦ dysfunction of a structure, causing
disruption of a biological process
● Diseases have features
◦ also causally linked
BioLink API
https://api.monarchinitiative.org/api/
Unifying multiple knowledge graphs
● KGs emerging as popular ML representation
◦ node embedding, NNs, link prediction
● Challenge
◦ combining different KGs together
● Different standards
◦ RO/OBO
◦ Wikidata
◦ SIO http://sio.semanticscience.org/
◦ Many KGs have no standards, ad-hoc relations
⚫ e.g. SemMedDB
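Combining KGs whose relations are ad hoc usually starts by mapping each source's predicate onto one shared vocabulary while keeping provenance. A minimal Python sketch, reusing the gene A / gene B interaction labels from the earlier database examples; the `CANONICAL` table and both function names are hypothetical, and treating `binds` as a physical interaction is an illustrative assumption, not an RO axiom.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Edge = Tuple[str, str, str]

# Hypothetical mapping from normalized source labels to a shared relation vocabulary.
CANONICAL: Dict[str, str] = {
    "interacts with": "interacts with",
    "physically interacts with": "physically interacts with",
    "binds": "physically interacts with",  # assumption made for illustration
}


def normalize(label: str) -> str:
    """'INTERACTS_WITH' -> 'interacts with'; also collapses extra whitespace."""
    return " ".join(label.replace("_", " ").lower().split())


def merge(sources: Dict[str, List[Edge]]) -> Tuple[Dict[Edge, Set[str]], List[Tuple[str, str]]]:
    """Merge edges from several graphs, keeping provenance and reporting unmapped labels."""
    merged: Dict[Edge, Set[str]] = defaultdict(set)
    unmapped: List[Tuple[str, str]] = []
    for source, edges in sources.items():
        for subj, label, obj in edges:
            relation = CANONICAL.get(normalize(label))
            if relation is None:
                unmapped.append((source, label))
                continue
            merged[(subj, relation, obj)].add(source)
    return dict(merged), unmapped


graphs = {
    "database 1": [("A", "INTERACTS_WITH", "B")],
    "database 2": [("A", "physically interacts with", "B")],
    "database 3": [("A", "binds", "B")],
    "database 6": [("A", "PHOSPHORYLATES", "B")],
}
edges, unmapped = merge(graphs)
print(unmapped)  # [('database 6', 'PHOSPHORYLATES')]
```

Labels the table cannot resolve are reported rather than guessed; mapping them to shared relation identifiers is the step a standard such as RO supplies.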
BioLink Model
https://biolink.github.io/biolink-model/
[Figure: BioLink Model classes, including Biological Entity, Organismal Entity, Molecular Entity, Genomic Entity, Chemical Substance, Organism, Anatomical Entity, Cell Type, Gross Anatomical Structure, Gene, Gene Family]
https://biolink.github.io/biolink-model/docs/PairwiseGeneToGeneInteraction.html
Translator and KGX
https://github.com/NCATS-Tangerine/kgx
Merged Knowledge Graphs
Conclusions
● Standardized relations required for
◦ ontologies
◦ knowledge graphs
◦ bioinformatics exchange formats
● RO provides
◦ Broad set of relations
◦ Different use cases
◦ OWL axiomatization enables inference
● Uses
◦ GO
◦ Disease and phenotype
Acknowledgments
● Relation Ontology
◦ Matt Brush
◦ David Osumi-Sutherland
◦ James Overton
◦ Jim Balhoff
◦ Suzanna Lewis
◦ Anne Thessen
◦ Mike Sinclair
◦ David Hill
◦ Kimberley Van Auken
◦ Larry Hunter
◦ Barry Smith
◦ Alan Ruttenberg
◦ Melissa Haendel
◦ Paul Thomas
● MONDO
◦ Nicole Vasilevsky
◦ Peter Robinson
◦ EBI curators
◦ GARD curators
◦ ClinGen curators
● BioLink
◦ Harold Solbrig
◦ Deepak Unni
◦ Seth Carbon
◦ Gregg Stuppe
◦ Laurent-Phillipe Albou
◦ Tim Putman
◦ Kent Shefchek
◦ Chris Bizon
◦ Michel Dumontier
◦ Lance Hannestad
◦ Richard Bruskiewich
