The document discusses collaboratively building a knowledge graph of life by connecting existing biological ontologies. It describes how ontologies can standardize and organize biological data by representing entities and their relationships in a graph. The challenges of integrating different ontology projects are addressed through initiatives like the Open Biological and Biomedical Ontologies (OBO) Foundry. The document outlines how ontologies can be formalized using OWL and connected using tools like the Ontology Development Kit to enable discovery across domains. Current efforts like the Gene Ontology, Biolink Model, and National Microbiome Data Collaborative are leveraging these techniques to create unified, semantically queryable knowledge graphs.
1. Collaboratively Creating the Knowledge Graph of Life
Chris Mungall
cjmungall@lbl.gov @chrismungall
JPM Graph Gang April 2021
2. Who am I and why am I here?
Education: University of Edinburgh (BSc & PhD, AI + Bioinformatics)
Current: Staff Scientist, Berkeley Lab, Environmental Genomics and Systems Biology
In between: Lots of hacking and occasional research papers
What I’m known for: Biological databases and ontologies
What I actually do: Write proposals, wrangle grants, and let others do the work
6. Biological data management is hard. We have many named things.
● Drugs: 10k
● Chemicals: 4tn?
● Species: ~9 million
● Diseases and phenotypes: 10-50k/species
● Cells: 10,000s+ types (per species)
● Experiments
● Raw data: ?? exabytes
● Genes: 20k per species
● Genetic variants: 3m in human alone
7. The things are interconnected
● Cirrhosis (MONDO:0005155)
● Liver (UBERON:0002107)
● Hepatocyte (CL:0000182)
● Ethanol (CHEBI:16236)
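Each of these things is named with a compact identifier (CURIE) of the form PREFIX:LOCAL_ID, which is what lets them be linked across ontologies. A minimal sketch of expanding CURIEs into full IRIs, assuming the usual OBO PURL pattern http://purl.obolibrary.org/obo/PREFIX_LOCALID; the small prefix set is an illustrative assumption, not a complete registry:

```python
# Minimal CURIE -> IRI expansion, assuming the OBO PURL convention
# http://purl.obolibrary.org/obo/{PREFIX}_{LOCAL_ID}.
OBO_BASE = "http://purl.obolibrary.org/obo/"

# Illustrative subset of prefixes handled here (an assumption, not a full registry).
OBO_PREFIXES = {"MONDO", "UBERON", "CL", "CHEBI", "GO"}


def expand_curie(curie: str) -> str:
    """Expand a CURIE such as 'CL:0000182' into a full IRI."""
    prefix, sep, local_id = curie.partition(":")
    if not sep or not local_id:
        raise ValueError(f"not a CURIE: {curie!r}")
    if prefix not in OBO_PREFIXES:
        raise KeyError(f"unknown prefix: {prefix}")
    return f"{OBO_BASE}{prefix}_{local_id}"


if __name__ == "__main__":
    for label, curie in [
        ("Cirrhosis", "MONDO:0005155"),
        ("Liver", "UBERON:0002107"),
        ("Hepatocyte", "CL:0000182"),
        ("Ethanol", "CHEBI:16236"),
    ]:
        print(label, curie, expand_curie(curie))
```

Keeping the prefix table explicit makes an unknown prefix fail loudly instead of silently minting a bad IRI.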
8. It’s hard to find and integrate the things
“I guess I have a lot of reading to do!”
9. Ontologies and Knowledge Graphs to the rescue!
“I can organize it all for you!”
“Ontolowhat?”
(genes, diseases, cell types)
10. What is an ontology anyway?
“??? How does this help me?”
“It is the branch of metaphysics dealing with the nature of being.”
“No, it’s a formal, explicit specification of a shared conceptualization.”
11. What is an ontology anyway?
“...better, I guess”: a graph (network) connecting all the things you care about
[Figure: example graph linking me, Victor, cat, human, mammal, pizza, cheese (“fromage”@fr) and food through edges labeled type, is a, has pet, likes, has part, has role, and depicted by]
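The graph on this slide can be written down as (subject, predicate, object) triples, and following "type" then "is a" edges recovers everything an entity is. A minimal plain-Python sketch; the exact wiring of the edges is one plausible reading of the figure and is an assumption, as are the helper names:

```python
from collections import defaultdict

# Hypothetical triples: one plausible reading of the slide's figure.
TRIPLES = [
    ("me", "type", "human"),
    ("human", "is a", "mammal"),
    ("me", "has pet", "Victor"),
    ("Victor", "type", "cat"),
    ("cat", "is a", "mammal"),
    ("me", "likes", "pizza"),
    ("pizza", "has part", "cheese"),
    ("cheese", "has role", "food"),
]


def build_index(triples):
    """Index triples as subject -> predicate -> set of objects."""
    index = defaultdict(lambda: defaultdict(set))
    for s, p, o in triples:
        index[s][p].add(o)
    return index


def ancestors(index, cls):
    """All superclasses reachable from cls by following 'is a' edges."""
    seen, stack = set(), [cls]
    while stack:
        for parent in index[stack.pop()]["is a"]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def classes_of(index, entity):
    """Direct types of an instance plus all their inferred superclasses."""
    direct = set(index[entity]["type"])
    result = set(direct)
    for cls in direct:
        result |= ancestors(index, cls)
    return result


if __name__ == "__main__":
    idx = build_index(TRIPLES)
    print(sorted(classes_of(idx, "Victor")))
```

Nothing says Victor is a mammal directly; that fact falls out of the cat "is a" mammal edge, which is the kind of inference an ontology makes cheap.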
12. Ontologies enable discovery
“This is fun, actually...”
Do owners of different kinds of pets like different kinds of food? What do those foods have in common?
[Figure: the example graph from slide 11, annotated with the finding “Lizard owners like spicy food [p=0.04]”]
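The slide's question is a graph traversal: pet, then owner, then liked food, then the food's parts. A toy sketch over in-memory triples; every fact below is invented for illustration, and a real knowledge graph would answer this with a query language such as SPARQL rather than Python loops:

```python
from collections import defaultdict

# Toy triples; every fact here is invented for illustration.
TRIPLES = [
    ("me", "has pet", "Victor"), ("Victor", "type", "cat"),
    ("me", "likes", "pizza"), ("pizza", "has part", "cheese"),
    ("alice", "has pet", "Iggy"), ("Iggy", "type", "lizard"),
    ("alice", "likes", "curry"), ("curry", "has part", "chili"),
]


def objects(triples, subject, predicate):
    """Objects o such that (subject, predicate, o) is asserted."""
    return {o for s, p, o in triples if s == subject and p == predicate}


def foods_by_pet_type(triples):
    """Map each pet type to the foods its owners like and those foods' parts."""
    result = defaultdict(lambda: {"foods": set(), "parts": set()})
    for owner, p, pet in triples:
        if p != "has pet":
            continue
        for pet_type in objects(triples, pet, "type"):
            for food in objects(triples, owner, "likes"):
                result[pet_type]["foods"].add(food)
                result[pet_type]["parts"] |= objects(triples, food, "has part")
    return dict(result)


if __name__ == "__main__":
    for pet_type, info in foods_by_pet_type(TRIPLES).items():
        print(pet_type, sorted(info["foods"]), sorted(info["parts"]))
```

Comparing the "parts" sets across pet types is the "what do those foods have in common?" step; a statistical test over many owners would be needed before claiming a result like the one on the slide.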
13. Some of the things you can do with ontologies
● Standardize, organize, & communicate data
● Filter & search for data
● Connect & harmonize data
● Infer knowledge & make suggestions
14. The Gene Ontology: An Ontology for Genes
● Genes: 20k/species
● Gene Ontology (GO): 45k ontology classes
Each gene can be categorized with multiple GO terms describing its roles.
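Because GO terms form a hierarchy, annotating a gene to a specific term implicitly annotates it to every ancestor term as well (GO's "true path rule"). A minimal sketch of that propagation over is_a links only; real GO propagation also follows relations such as part_of. The single is_a edge is the one shown on slide 27, and the gene names are hypothetical:

```python
from collections import defaultdict

# is_a edges, child -> parents. The single edge is the one from slide 27;
# a real GO release has tens of thousands of such edges.
IS_A = {
    "GO:0009250": {"GO:0000271"},  # glucan biosynthesis is_a polysaccharide biosynthesis
}

# Hypothetical direct annotations: gene -> GO terms.
DIRECT = {
    "geneA": {"GO:0009250"},
    "geneB": {"GO:0000271"},
}


def closure(term, is_a):
    """The term itself plus every ancestor reachable through is_a."""
    result, stack = {term}, [term]
    while stack:
        for parent in is_a.get(stack.pop(), ()):
            if parent not in result:
                result.add(parent)
                stack.append(parent)
    return result


def propagate(direct, is_a):
    """Invert annotations to term -> genes, propagating up the hierarchy."""
    genes_by_term = defaultdict(set)
    for gene, terms in direct.items():
        for term in terms:
            for t in closure(term, is_a):
                genes_by_term[t].add(gene)
    return dict(genes_by_term)


if __name__ == "__main__":
    for term, genes in sorted(propagate(DIRECT, IS_A).items()):
        print(term, sorted(genes))
```

Propagated term-to-gene counts like these are the input that enrichment-style GO analyses work from.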
15. The Gene Ontology is the work of many people
● Manual curation forms the
backbone of the GO
● AI can help, but not replace it!
16. Inferring GO classification based on evolutionary history of genes
id: GO:0043570
name: maintenance of DNA repeat elements
id: GO:0006915
name: apoptosis
id: GO:0016446
name: somatic hypermutation of immunoglobulin genes
17. Effects of space radiation on CSF molecular profiles
• Innate immune system overactivated
• Decreased nervous system development
  • axonal fasciculation, astrocyte & oligodendrocyte differentiation, synaptic plasticity, axonogenesis, …
• Many negative regulation processes impaired:
  • Neuron proliferation, differentiation and projection
  • Leukocyte proliferation and differentiation
  • Extrinsic and possibly intrinsic apoptotic signaling pathways
Goal: predict individual risks for behavior deficits and brain pathologies in astronauts
Workflow: proteomic data → GO analysis → predict
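The slide does not say which statistics the "GO analysis" step used. One common approach is over-representation analysis: for each GO term, a one-sided hypergeometric test asks whether the study set (e.g. altered proteins) contains more term-annotated genes than chance would predict. A minimal stdlib sketch, with no multiple-testing correction, which a real analysis across thousands of GO terms would need:

```python
from math import comb


def hypergeom_sf(k: int, N: int, K: int, n: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(N, K, n).

    N: genes in the background (e.g. all proteins measured)
    K: background genes annotated to the GO term
    n: genes in the study set (e.g. differentially abundant proteins)
    k: study-set genes annotated to the GO term
    """
    if not (0 <= K <= N and 0 <= n <= N):
        raise ValueError("need 0 <= K <= N and 0 <= n <= N")
    upper = min(n, K)
    # math.comb returns 0 when choosing more items than exist, so
    # impossible splits contribute nothing to the sum.
    numerator = sum(
        comb(K, i) * comb(N - K, n - i) for i in range(max(k, 0), upper + 1)
    )
    return numerator / comb(N, n)


if __name__ == "__main__":
    # Hypothetical numbers: 1000 proteins measured, 50 annotated to a term,
    # 40 altered proteins of which 8 carry that annotation.
    print(hypergeom_sf(8, N=1000, K=50, n=40))
```

Under these hypothetical numbers about 2 annotated proteins would be expected by chance, so observing 8 yields a small tail probability.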
18. GO is used by researchers… and in the clinic!
Transgenic replacement skin was tested for off-target mutations using GO (doi:10.1038/nature24487)
19. GO describes just one aspect of biology
● Drugs: 10k
● Chemicals: 4tn?
● Species: ~9 million
● Diseases and phenotypes: 10-50k/species
● Cells: 1000s+ core types (per species)
● Experiments
● Raw data: ?? exabytes
● Genes: 20k per species
● Gene functions
● Genetic variants: 3m in human alone
20. There are many ontologies to categorize the other things
Many biological ontologies!
Problems:
● Duplication
● Silos
● Lack of interoperability
21. We can build the universal ontological map of life...
...but how do we put the pieces together?
23. Open Biological Ontologies (OBO)
http://obofoundry.org
1. Well-integrated, modular ontologies (a SUBSET of BioPortal)
2. Provide a technical and sociotechnological framework for cooperation
3. Provide tools, best practices and infrastructure for forging new ontologies
4. Allow us to describe all of the things
@obofoundry
24. The OBO Foundry
• 160 active ontologies
  ○ Developed by different teams
• Millions of classes
• Wide variety
  ○ Specific
    ■ E.g. cephalopod
  ○ General
    ■ E.g. chemicals
http://obofoundry.org
27. The original bio-ontologies were silos
GO (Biological Processes): glucan biosynthesis (GO:0009250) is_a polysaccharide biosynthesis (GO:0000271)
CHEBI (Chemical Entities): glucan (CHEBI:37163) is_a polysaccharide (CHEBI:18154)
No reuse or connection between the two
28. OWL to the rescue!
● Modularity
● Tools + reasoning
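The silos on slide 27 can be connected by giving GO classes logical definitions that reuse CHEBI classes, then letting a reasoner infer the GO is_a link. A toy sketch of that idea: each GO class is defined as a genus plus one existential restriction (Manchester-style, roughly "biosynthetic process and has_output some glucan"). The relation name `has_output` and the definitions are assumptions for illustration, and the simple structural check below stands in for a real OWL reasoner such as Elk or HermiT:

```python
from dataclasses import dataclass

# CHEBI-side hierarchy (child -> single parent), as on slide 27.
# Assumes an acyclic, single-parent toy hierarchy.
CHEBI_IS_A = {"glucan": "polysaccharide"}


@dataclass(frozen=True)
class Definition:
    """Toy logical definition: genus and (relation some filler)."""
    genus: str
    relation: str
    filler: str


# Hypothetical GO definitions that reuse CHEBI classes as fillers.
GO_DEFS = {
    "glucan biosynthesis": Definition("biosynthetic process", "has_output", "glucan"),
    "polysaccharide biosynthesis": Definition(
        "biosynthetic process", "has_output", "polysaccharide"
    ),
}


def chebi_subsumes(parent: str, child: str) -> bool:
    """True if child is parent or one of its descendants in CHEBI_IS_A."""
    while child is not None:
        if child == parent:
            return True
        child = CHEBI_IS_A.get(child)
    return False


def infer_is_a(defs):
    """Infer (sub, super) pairs: same genus and relation, subsumed filler."""
    inferred = set()
    for sub, d1 in defs.items():
        for sup, d2 in defs.items():
            if (
                sub != sup
                and d1.genus == d2.genus
                and d1.relation == d2.relation
                and chebi_subsumes(d2.filler, d1.filler)
            ):
                inferred.add((sub, sup))
    return inferred


if __name__ == "__main__":
    print(infer_is_a(GO_DEFS))
```

The GO is_a link is never asserted; it follows from the CHEBI is_a link, so when CHEBI's hierarchy improves, GO's follows for free.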
32. ODK: ONTOLOGY DEVELOPMENT KIT
ODK container (kernel):
● ROBOT: ontology operations (command line)
● Make: workflows that chain operations together
● odk.py: seed an ontology project by creating a GitHub repository with workflows in place
● dosdp-tools: build ontologies rapidly from design pattern templates
● Reasoners container: includes Elk, HermiT, Konklude
● fastobo: validation of OBO format files (Rust)
Complements ODEs (Protégé)
https://github.com/INCATools/ontology-development-kit
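fastobo validates OBO-format files, which are organized as stanzas of `tag: value` lines like the GO terms on slide 16. A minimal sketch of reading `[Term]` stanzas in plain Python; it ignores most of the OBO 1.4 syntax (trailing modifiers, escapes, `!` comments, other stanza types) and is in no way a substitute for fastobo or a full parser:

```python
SAMPLE = """format-version: 1.4

[Term]
id: GO:0006915
name: apoptosis

[Term]
id: GO:0016446
name: somatic hypermutation of immunoglobulin genes
"""


def parse_obo_terms(text: str) -> list:
    """Collect [Term] stanzas as dicts mapping each tag to its list of values."""
    terms = []
    current = None  # the [Term] stanza being filled, or None outside one
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("[") and line.endswith("]"):
            # A new stanza starts; only [Term] stanzas are kept.
            current = {} if line == "[Term]" else None
            if current is not None:
                terms.append(current)
            continue
        if current is None or ":" not in line:
            continue  # header lines, or lines outside a [Term] stanza
        # Split on the first colon only, so "id: GO:0006915" keeps its CURIE.
        tag, _, value = line.partition(":")
        current.setdefault(tag.strip(), []).append(value.strip())
    return terms


if __name__ == "__main__":
    for term in parse_obo_terms(SAMPLE):
        print(term["id"][0], term["name"][0])
```

Values are stored as lists because OBO tags such as `is_a` and `synonym` may repeat within a stanza.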
33. ROBOT is an OBO tool
http://robot.obolibrary.org/
Standard command line operations
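ROBOT commands can be chained so that the output of one operation feeds the next, which is the building block the ODK's Make workflows string together. A small sketch that only assembles such a command line and does not run ROBOT; the file names are hypothetical, and the `merge` and `reason` commands with `--input`, `--reasoner` and `--output` follow ROBOT's documented chaining style:

```python
import shlex


def robot_release_command(edit_file: str, release_file: str, reasoner: str = "ELK") -> list:
    """Build argv for a chained ROBOT call: merge, then reason, then write output."""
    return [
        "robot",
        "merge", "--input", edit_file,
        "reason", "--reasoner", reasoner,
        "--output", release_file,
    ]


if __name__ == "__main__":
    argv = robot_release_command("my-ontology-edit.owl", "my-ontology.owl")
    # Print a shell-ready string; to execute, pass argv to subprocess.run(argv, check=True).
    print(shlex.join(argv))
```

Returning an argument list rather than a single string avoids shell-quoting problems when file names contain spaces.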
40. Leverage the Expertise Pyramid
● OWL experts: author OWL templates; create design patterns
● Ontology developers: implement OWL templates; test against design patterns
● Ontology users: consume pre-reasoned hierarchies
https://github.com/INCATools/dead_simple_owl_design_patterns
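Dead Simple OWL Design Patterns (DOSDP) describe a family of classes once, as templates with `%s` slots filled from named variables, and then generate many classes from the rows of a table. A minimal sketch of the fill step only; the pattern, variable names and rows are hypothetical, and real patterns are YAML files processed by dosdp-tools:

```python
# Hypothetical pattern: each template's %s slots are filled, in order, by the
# values of the listed variables (mimicking the DOSDP "text" + "vars" style).
PATTERN = {
    "name": {"text": "%s of %s", "vars": ["cell", "location"]},
    "equivalentTo": {
        "text": "'%s' and ('part of' some '%s')",
        "vars": ["cell", "location"],
    },
}

# Hypothetical filler table; in practice this would be read from a TSV file.
ROWS = [
    {"cell": "epithelial cell", "location": "kidney"},
    {"cell": "epithelial cell", "location": "lung"},
]


def fill(template: dict, row: dict) -> str:
    """Substitute one row's values into a template's %s slots."""
    return template["text"] % tuple(row[var] for var in template["vars"])


def generate(pattern: dict, rows: list) -> list:
    """Return one (label, logical definition) pair per table row."""
    return [(fill(pattern["name"], r), fill(pattern["equivalentTo"], r)) for r in rows]


if __name__ == "__main__":
    for label, definition in generate(PATTERN, ROWS):
        print(f"{label}: EquivalentTo {definition}")
```

This is the division of labor the pyramid describes: an OWL expert writes the pattern once, and developers only fill in table rows.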
41. Can we make semantic tools easier?
● Semantic engineer / ontologist: RDF, OWL, SPARQL, SHACL
● Developers, data scientists, scientists, clinicians, ...: Python, SQL, Mongo, JSON, Pandas, BigTable, Spark, scikit-learn, Excel, web portals
● Bridging the two: ???
42. LinkML: Linked Data Modeling Language
id: https://example.org/linkml/hello-world
title: Really basic LinkML model
name: hello-world
version: 0.0.1
prefixes:
  linkml: https://w3id.org/linkml/
  sdo: https://schema.org/
  foaf: http://xmlns.com/foaf/0.1/
  ex: https://example.org/linkml/hello-world/
default_prefix: ex
default_curi_maps:
  - semweb_context
imports:
  - linkml:types
classes:
  Person:
    description: Minimal information about a person
    class_uri: sdo:Person
    attributes:
      id:
        identifier: true
        slot_uri: sdo:taxID
      first_name:
        required: true
        slot_uri: sdo:givenName
        multivalued: true
      last_name:
        required: true
        slot_uri: sdo:familyName
      knows:
        range: Person
        multivalued: true
        slot_uri: foaf:knows
A LinkML schema (e.g. MyModel) can generate:
● Documentation
● OWL
● JSON Schema
● ShEx Schema
● Schema.py
● GraphQL Schema
● JSON-LD Context
● . . .
http://linkml.github.io
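To make the schema's constraints concrete: a Person record needs an `id` (an identifier slot, treated as required here), at least one `first_name` given as a list, and a `last_name`, while `knows` holds a list of other Person ids. A hand-written sketch that checks those rules against a JSON-like dict; the constraints are transcribed manually from the schema above, and this is not LinkML's generated validator or JSON Schema output:

```python
# Constraints copied by hand from the hello-world schema (an assumption of this
# sketch; LinkML would generate equivalent checks, e.g. via JSON Schema).
REQUIRED = {"id", "first_name", "last_name"}
MULTIVALUED = {"first_name", "knows"}


def validate_person(record: dict, known_ids: set) -> list:
    """Return human-readable problems; an empty list means the record passes."""
    problems = [
        f"missing required slot: {slot}" for slot in sorted(REQUIRED - set(record))
    ]
    for slot in sorted(MULTIVALUED & set(record)):
        if not isinstance(record[slot], list):
            problems.append(f"{slot} must be a list")
    knows = record.get("knows", [])
    if isinstance(knows, list):
        for other in knows:
            if other not in known_ids:
                problems.append(f"knows refers to unknown Person: {other}")
    return problems


if __name__ == "__main__":
    ada = {"id": "ex:1", "first_name": ["Ada"], "last_name": "Lovelace", "knows": ["ex:2"]}
    print(validate_person(ada, known_ids={"ex:1", "ex:2"}))
```

Generating checks like these from one schema, instead of hand-writing them per system, is the point of the "MyModel → many artifacts" picture.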
43. Biolink Model: Goals
The charge from NCATS:
● Create a Knowledge Graph Schema
● Encompass all biology from molecules through to clinical entities
● Get 20 different sites using the same data model
  ○ (oh, and only a handful of them use RDF/OWL)
● Do it quickly and break new ground in Translational Science
44. Biolink Model: Where we are (year 2 of 5)
● All participants using a common KG data model
● Early demonstrations of powerful federated queries
● LinkML advantages:
  ○ Edges are first-class citizens
  ○ Ontologies/OWL leveraged, but in the background
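"Edges are first-class citizens" means an association is a record in its own right, with its own identifier and attributes such as provenance, not just a link between two nodes. A hypothetical sketch: the field names are illustrative and are not copied from the Biolink Model, the `biolink:has_part` predicate CURIE is assumed, and the nodes are the liver and hepatocyte CURIEs from slide 7:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Edge:
    """An association with its own identity and metadata (field names hypothetical)."""
    id: str
    subject: str
    predicate: str
    object: str
    source: str = ""          # hypothetical: which resource asserted the edge
    publications: tuple = ()  # hypothetical: supporting references


def edges_from(edges: list, source: str) -> list:
    """Select edges by a property of the edge itself rather than of its endpoints."""
    return [e for e in edges if e.source == source]


if __name__ == "__main__":
    liver_has_hepatocyte = Edge(
        id="ex:edge1",
        subject="UBERON:0002107",  # liver
        predicate="biolink:has_part",
        object="CL:0000182",  # hepatocyte
        source="ex:curated",
    )
    print(edges_from([liver_has_hepatocyte], "ex:curated"))
```

Filtering on where an edge came from is only possible because the edge carries that information itself.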
45. National Microbiome Data Collaborative: Goal
● Make multi-omics microbiome data FAIR
  ○ Environments
  ○ Metagenomes
  ○ Metatranscriptomes
  ○ Metabolomics
  ○ Metaproteomics
● Leverage existing ontologies and standards
● Enable discovery in microbiome science
● Collaboration between multiple National Labs
46. National Microbiome Data Collaborative: Approach
● Formalize existing “checklist” standards
● Create a modular schema
● Leverage MIxS, ENVO, PROV
Why LinkML
● Developers like JSON + JSON-Schema
● Biologists like spreadsheets
● “Semantic enums” work well
● Needed something that worked with traditional technology (Mongo, Postgres)
● “Stealth semantics”
  ○ Everything has a URI
  ○ All JSON is transparently JSON-LD
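"All JSON is transparently JSON-LD" can be illustrated by attaching an `@context` that maps each plain JSON key to a URI: developers keep working with ordinary JSON, while RDF tooling can read the same document as linked data. A hand-built sketch reusing the `slot_uri` mappings from the hello-world schema on slide 42, with `foaf` expanded to its standard namespace and `id` treated as the node identifier (`@id`); LinkML's own JSON-LD context generator produces richer contexts than this:

```python
import json

# Hand-built context; term mappings copied from the hello-world schema.
CONTEXT = {
    "sdo": "https://schema.org/",
    "foaf": "http://xmlns.com/foaf/0.1/",
    "id": "@id",
    "first_name": "sdo:givenName",
    "last_name": "sdo:familyName",
    "knows": {"@id": "foaf:knows", "@type": "@id"},
}


def to_jsonld(record: dict) -> dict:
    """Attach the context to a plain JSON record; the record's own keys are unchanged."""
    return {"@context": CONTEXT, **record}


if __name__ == "__main__":
    person = {
        "id": "https://example.org/people/1",
        "first_name": ["Ada"],
        "last_name": "Lovelace",
    }
    print(json.dumps(to_jsonld(person), indent=2))
```

The record's shape is untouched, which is the "stealth" part: semantics ride along without changing what developers read and write.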
47. National Microbiome Data Collaborative: Where we are (year 2)
● Unified modular schema
● Easy for developers
  ○ System based mainly on JSON exchange
  ○ RDF can be leveraged
  ○ Currently Mongo + Postgres + TerminusDB
● Easy for biologists
  ○ Spreadsheets and validators created from the schema
● Everything has semantics
  ○ “On-the-fly” JSON-LD
  ○ Satisfies FAIR mandate
48. Take Homes
1. Building the graph of life requires collaboration, social engineering, and lots of curation.
2. OWL is a powerful framework, but it can be challenging to deploy effectively in an information system.
3. Integrating data into cohesive ontologies/KGs is hard, but the return on investment is high.
4. LinkML provides a unifying layer over tooling… but more hands on deck are required!
49. Some Links
● Open Bio Ontologies: http://obofoundry.org/resources
● ODK: https://github.com/INCATools/ontology-development-kit
● LinkML: https://linkml.github.io
● KG Hub: https://knowledge-graph-hub.github.io/
● GO: http://geneontology.org
● http://douroucouli.wordpress.com: My blog on all things OWL and Knowledge Graphs