Scaling up semantics
lessons learned from across the life sciences
Chris Mungall
Lawrence Berkeley National Laboratory
cjmungall@lbl.gov
https://bit.ly/mungall-swat-23
Providing Semantics for Systems Biology
✅
❌
● The language of biological
knowledge is text and images
● Digestion into semantic
models necessary to be
properly understood by
machines
The Gene Ontology: Semantic Systems Biology
✅
❌ ✅
✅
45k classes
750,000 experimentally supported
annotations manually curated from
140,000 publications
1bn computed annotations
geneontology.org
Semantic resources and life science applications
ontologies Semantic resources applications
Molecular biology
Translational
research and health
Microbiome science
multi-Omics integration
semantic models
We have always sought to model the world
https://en.wikipedia.org/wiki/Antikythera_mechanism
model of
Preservation and rediscovery of
knowledge
Modeling paradigms in the semantic solar system
Semantic schema planet
● Model of information
Knowledge Graph planet
● Model of things
Large Language Model
planet
● Model of words
Schemas
LLMs
KGs
Planet KG: currently hot
Knowledge Graph planet
● Model of things
● Temperature: hot
KGs
Reese J, KG-COVID-19
10.1101/2020.08.17.254839
Previous iterations of KG planet: Semantic Web
2001
RDF
Differences:
● Ad hoc IDs >> URIs
● Scale >> Correctness
● Property Graphs >> Triples
● Graphs, ML >> Reasoning
● Shiny databases >> Standards
Previous iterations: bio-ontologies 1999->
GO
Differences:
● All entities >> just terms
● Quantity >> Quality
Ashburner et al Gene Ontology:
tool for the unification of biology.
Nature Genetics 2000
The interconnected OBO KG 2000->present
15k citations/year
1 billion annotations
10-1000 billion
annotations
Millions of environmental
samples annotated
Diagnosing rare
disease patients
uses
Analyzing
single-cell
seq
You’re
welcome!
obofoundry.org
OBO and Wikidata
● Many more OBO ontologies to go!
○ Join us on #wikidata on OBO slack
○ http://obofoundry.org
● Next up:
○ Environment Ontology (ENVO)
○ H
○ Join
OBO KG in Ubergraph
https://icbo-conference.github.io/icbo2022/papers/ICBO-2022_paper_5005.pdf
Inter-ontology connections
in Ubergraph, grouped by
OBO ontology
https://ubergraph.apps.renci.org/sparql
https://github.com/INCATools/ubergraph
● Triplestore with interlinked subset of OBO
● Inference pre-entailed using Relation Graph
● Pre-categorized using Biolink
● Powerful federated queries with uniprot,
wikidata, others
Use of semantics in the connected KG
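Asserted:
● glucan biosynthesis (GO:0009250) ≡ biosynthesis (GO:0009058) ⊓ ∃has_output.glucan (CHEBI:37163)
● polysaccharide biosynthesis (GO:0000271) ≡ biosynthesis (GO:0009058) ⊓ ∃has_output.polysaccharide (CHEBI:18154)
● glucan (CHEBI:37163) ⊑ polysaccharide (CHEBI:18154)
Inferred by OWL reasoner:
● glucan biosynthesis (GO:0009250) ⊑ polysaccharide biosynthesis (GO:0000271)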
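The inference on this slide can be sketched in a few lines of Python. This is a toy illustration of genus-differentia classification only, assuming every definition has the shape "genus ⊓ ∃relation.filler"; it is not an OWL reasoner, and the names `Definition`, `is_subclass` and `infer_subsumptions` are made up for this example.

```python
from dataclasses import dataclass

# Asserted subclass edges (child -> parents), using the CURIEs from the slide.
SUBCLASS_OF = {
    "CHEBI:37163": {"CHEBI:18154"},  # glucan is a polysaccharide
}


@dataclass(frozen=True)
class Definition:
    """Equivalence axiom of the form: class = genus AND (relation some filler)."""
    genus: str
    relation: str
    filler: str


DEFINITIONS = {
    # glucan biosynthesis = biosynthesis and has_output some glucan
    "GO:0009250": Definition("GO:0009058", "has_output", "CHEBI:37163"),
    # polysaccharide biosynthesis = biosynthesis and has_output some polysaccharide
    "GO:0000271": Definition("GO:0009058", "has_output", "CHEBI:18154"),
}


def is_subclass(child: str, parent: str) -> bool:
    """Reflexive, transitive check over the asserted subclass edges."""
    if child == parent:
        return True
    return any(is_subclass(p, parent) for p in SUBCLASS_OF.get(child, ()))


def infer_subsumptions() -> set:
    """A is inferred under B when A's genus and filler are each subsumed by B's."""
    inferred = set()
    for a, def_a in DEFINITIONS.items():
        for b, def_b in DEFINITIONS.items():
            if (
                a != b
                and def_a.relation == def_b.relation
                and is_subclass(def_a.genus, def_b.genus)
                and is_subclass(def_a.filler, def_b.filler)
            ):
                inferred.add((a, b))
    return inferred


print(infer_subsumptions())
```

The only inferred pair is glucan biosynthesis under polysaccharide biosynthesis, which follows from the chemical hierarchy rather than from anything asserted in GO itself.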
Ontology Development Kit (ODK)
https://incatools.github.io/ontology-development-kit
● ODK makes it easy to create a new GitHub
repo following best practice + automation
● standardized, customizable and automatically
executable workflows, and packages all
required tooling in a single Docker image
Matentzoglu et al (2022) Ontology Development Kit: a toolkit for
building, maintaining and standardizing biomedical ontologies.
Database 10.1093/database/baac087
NEW: Ontology Access Kit (OAK). Python library for working with ontologies
https://incatools.github.io/ontology-access-kit
Gene Ontology
Causal Activity Models:
Systems KGs
http://rdf.geneontology.org/ - SPARQL endpoint
Biologist View RDF view
https://www.alliancegenome.org/gene/HGNC:11584
● Pathway views on
Alliance gene pages
Previous iterations of planet KG
Semantic
Nets
Knowledge Representation with graphs
didn’t start with the semantic web and
ontologies…
Previous iterations: 1950s KGs
Semantic
Nets
“The cat is bitten by the dog”
“The cat is on the mat”
Richens RH, "Preprogramming for
mechanical translation" Mechanical
Translation 3 (1), July 1956, 20–25
Differences:
● Emphasis on language
● Applications: Machine Translation
Previous iterations: KGs in the 3rd century CE
Porphyrian
Trees
Differences:
● Philosophical inquiry >> Computation
● Theology >> science or empiricism
Curse of the eternal return
RDF
KGs
Ontologies
Embrace CURIEs
● URIs are rarely used as identifiers outside RDF
frameworks
○ URIs are awkward to work with outside RDF tooling
● Solution: CURIEs
○ Prefixed IDs have been standard in bio-databases before RDF
■ GO:0005634
■ UniProtKB:P12345
○ Key: always make the prefixmap explicit
Hoyt, C. T., et al. (2022) Unifying the identification of biomedical entities with the
Bioregistry. Scientific Data, 10.1038/s41597-022-01807-3
https://bioregistry.io/
https://github.com/linkml/prefixmaps/
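As a minimal sketch of what "make the prefixmap explicit" means in practice, the snippet below expands and contracts CURIEs against a small hand-written prefix map. The two namespace URIs are the commonly used ones for GO and UniProt but should be treated as assumptions here; in real use the map would come from a registry such as Bioregistry or the prefixmaps package, whose APIs are not shown. The function names are made up for this example.

```python
# Explicit prefix map: the only place where CURIE prefixes acquire meaning.
PREFIX_MAP = {
    "GO": "http://purl.obolibrary.org/obo/GO_",
    "UniProtKB": "http://purl.uniprot.org/uniprot/",
}


def expand(curie: str, prefix_map: dict = PREFIX_MAP) -> str:
    """Turn a CURIE such as GO:0005634 into a full URI."""
    prefix, sep, local_id = curie.partition(":")
    if not sep or prefix not in prefix_map:
        raise ValueError(f"No expansion known for {curie!r}")
    return prefix_map[prefix] + local_id


def contract(uri: str, prefix_map: dict = PREFIX_MAP) -> str:
    """Turn a URI back into a CURIE, preferring the longest matching namespace."""
    by_length = sorted(prefix_map.items(), key=lambda kv: len(kv[1]), reverse=True)
    for prefix, namespace in by_length:
        if uri.startswith(namespace):
            return f"{prefix}:{uri[len(namespace):]}"
    raise ValueError(f"No prefix known for {uri!r}")


print(expand("GO:0005634"))
print(contract(expand("UniProtKB:P12345")))
```

Failing loudly on an unknown prefix is a deliberate choice: silently passing through unexpanded IDs is one way ad hoc identifiers creep into a KG.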
Can we have semantics in our KGs?
The story:
RDF and OWL provide semantics for the web
The reality:
OWL is great for terminology maintenance
OWL layering in RDF is problematic and ignored in non-RDF KGs
OWL is a poor fit for contingent knowledge common in the life sciences
Example: semantics of parthood
:Planet
:SolarSystem
:Galaxy
:partOf
:partOf
● Doesn’t capture quantification
Example: semantics of parthood in OWL
Class: Planet
SubClassOf: partOf some SolarSystem
Class: SolarSystem
SubClassOf: partOf some Galaxy
RDF/OWL Ontology Rendering
:Planet
:SolarSystem
:Galaxy
<blank>
<blank>
:partOf
owl:Restriction
rdfs:subClassOf
rdfs:subClassOf
owl:someValuesFrom
owl:someValuesFrom
rdf:type
rdf:type
owl:onProperty
owl:onProperty
● Complex monstrosity
● Misaligned with mental model
● Confuses developers
● Not amenable to
transitive/graph querying
○ E.g. what kinds of things are found
in galaxy?
OWLStar Property Graph conceptual model
:Planet
:SolarSystem
:Galaxy
:partOf
:partOf
Simple view (as used in Wikidata)
https://github.com/linkml/owlstar
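With the simple edge view, the question from the previous slide ("what kinds of things are found in a galaxy?") becomes an ordinary graph traversal. The sketch below assumes partOf is treated as transitive and that each edge is read as a direct class-to-class link; the function name `found_in` is hypothetical.

```python
from collections import defaultdict

# The simple view: one edge per statement, no blank nodes or restrictions.
EDGES = [
    ("Planet", "partOf", "SolarSystem"),
    ("SolarSystem", "partOf", "Galaxy"),
]


def found_in(whole: str, edges, predicate: str = "partOf") -> set:
    """Return every class reachable from `whole` by following predicate edges backwards."""
    incoming = defaultdict(set)
    for subject, pred, obj in edges:
        if pred == predicate:
            incoming[obj].add(subject)

    seen = set()
    stack = [whole]
    while stack:
        node = stack.pop()
        for part in incoming[node]:
            if part not in seen:
                seen.add(part)
                stack.append(part)
    return seen


print(sorted(found_in("Galaxy", EDGES)))  # ['Planet', 'SolarSystem']
```

The same traversal over the RDF/OWL rendering would have to hop through a blank node and three extra properties for every edge.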
OWLStar Property Graph conceptual model
:Planet
:SolarSystem
:Galaxy
owlstar:SomeValuesFrom
:partOf
:partOf
owlstar:interpretation
owlstar:interpretation
Simple view (as used in Wikidata)
Decorate with semantics
Assumes Property Graph
Decoration can be deferred
https://github.com/linkml/owlstar
OWLStar Property Graph in RDFStar
:Planet
:SolarSystem
:Galaxy
owlstar:SomeValuesFrom
:partOf
:partOf
owlstar:interpretation
owlstar:interpretation
<<:Planet :partOf :SolarSystem>> owlstar:interpretation owlstar:AllSomeInterpretation .
<<:SolarSystem :partOf :Galaxy>> owlstar:interpretation owlstar:AllSomeInterpretation .
https://github.com/linkml/owlstar
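The idea of decorating an edge with its interpretation, and expanding it to OWL only when needed, can be sketched as follows. This is an illustration of the concept, not the OWLStar implementation; the `Edge` class and `to_manchester` function are hypothetical, and only the all-some interpretation named on the slide is handled.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Edge:
    """A property-graph edge whose logical interpretation is an optional decoration."""
    subject: str
    predicate: str
    object: str
    interpretation: Optional[str] = None  # decoration can be deferred


def to_manchester(edge: Edge) -> Optional[str]:
    """Expand a decorated edge into an OWL axiom in Manchester syntax.

    Undecorated edges return None: they stay plain graph edges
    until someone decides what they mean.
    """
    if edge.interpretation == "AllSomeInterpretation":
        return (
            f"Class: {edge.subject}\n"
            f"    SubClassOf: {edge.predicate} some {edge.object}"
        )
    return None


edges = [
    Edge("Planet", "partOf", "SolarSystem", "AllSomeInterpretation"),
    Edge("SolarSystem", "partOf", "Galaxy"),  # semantics deferred
]

for edge in edges:
    print(to_manchester(edge))
```

The first edge expands to the same axiom shown on the "semantics of parthood in OWL" slide, while the second remains queryable as a plain edge.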
OWLStar Property Graph in OneGraph
:Planet
:SolarSystem
:Galaxy
owlstar:SomeValuesFrom
:partOf
:partOf
owlstar:interpretation
owlstar:interpretation
Coming Soon: OneGraph
layering
https://github.com/linkml/owlstar
Semantics of contingent knowledge in KGs
FOL/model theory is the basis of RDF and OWL
… yet is a poor fit for much biological knowledge
See:
Schulz S: Strengths and limitations of formal ontologies in the biomedical domain. PMC2904529
Rector A: On beyond Gruber: "Ontologies" in today's biomedical information systems and the limits of OWL. PMID:34384571
Schulz S: “Lmo-2 interacts with Elf-2”: On the Meaning of Common Statements in Biomedical Literature. KR-MED 2006
:lmo-2 :elf-2
A new old logic layered on Property Graphs
:lmo-2 :elf-2
Multimodal semantics:
Use whatever the use case demands
Defeasible reasoning, bayesian probabilities, …
Or simply defer
<< your
semantics
here>>
https://github.com/linkml/owlstar
BioLink Model: hierarchical categories for KGs
Unni D et al (2022). Biolink Model: A universal schema for
knowledge graphs in clinical, biomedical, and
translational science. Clin. Tran. Sci
https://w3id.org/biolink
Standards in NCATS Biomedical Translator
Planet Semantic Schema
Semantic schema planet
● Model of information
● Temperature: cool
Schemas
KGs
LLMs
Previous iteration: OWL misused as schema
language
Semantic schema planet
● Model of information
● Temperature: cool
Description
Logics
https://twitter.com/aaranged/status/1485670670134497281
Lessons:
● OWL is for modeling terminological knowledge
● OWL is not a language to describe data
Previous iteration: Frames 1990s-2000s
Semantic schema planet
● Model of information
● Temperature: cool
Frames
● OWL succeeded DAML+OIL
● DAML normally frame-like
Previous revolution: Frames 1970s-1990s
Semantic schema planet
● Model of information
● Temperature: cool
Frames
Lehmann, F. Semantic Networks. Computers Math. Applic. 23(2-5), 1-50, (1992)
https://doi.org/10.1016/0898-1221(92)90135-5
A Framework for Representing Knowledge: Minsky, M. MIT-AI
Laboratory Memo 306, June, 1974
We need semantic standards!
>1800 Databases
>1500 Standards
>1k Ontologies
~13.5m terms
Most data standards are underspecified
Most standards in FAIRsharing don’t have an associated schema or validator
If you’re lucky:
XML Schema (older)
JSON Schema (newer)
Bindings to ontology terms are frequently unclear
What about ShEx/SHACL standards?
Most data in the biosciences not shared as RDF
Some (e.g. BioPAX) are really XML standards
Lack of adherence to schemas limits meta-analyses
Metagenome standards analysis findings:
● Standards are underspecified
● Not machine actionable
● Public data has poor conformance
Data too noisy for environmental meta-analyses
Biosample metadata Actual
data!
LinkML: Linked Data
Modeling Language
https://linkml.io
https://github.com/linkml/linkml
Create data models in simple YAML files,
optionally annotated using ontologies
id: https://example.org/linkml/hello-world
title: Really basic LinkML model
name: hello-world
license: https://creativecommons.org/publicdomain/zero/1.0/
version: 0.0.1
prefixes:
  linkml: https://w3id.org/linkml/
  sdo: https://schema.org/
  ex: https://example.org/linkml/hello-world/
default_prefix: ex
default_curi_maps:
  - semweb_context
imports:
  - linkml:types
classes:
  Person:
    description: Minimal information about a person
    class_uri: sdo:Person
    attributes:
      id:
        identifier: true
        slot_uri: sdo:taxID
      first_name:
        required: true
        slot_uri: sdo:givenName
        multivalued: true
      last_name:
        required: true
        slot_uri: sdo:familyName
      knows:
        range: Person
        multivalued: true
        slot_uri: foaf:knows
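To make concrete what the `required`, `identifier` and `multivalued` declarations in the Person model buy you, here is a hand-rolled check in plain Python. It is only a sketch: the `PERSON` dict is a manual transcription of the attributes above, and `validate` is a hypothetical helper, not part of LinkML, which ships its own validators and generators.

```python
# Attribute rules transcribed by hand from the Person class in the model.
PERSON = {
    "id": {"identifier": True},
    "first_name": {"required": True, "multivalued": True},
    "last_name": {"required": True},
    "knows": {"multivalued": True},
}


def validate(instance: dict, attributes: dict = PERSON) -> list:
    """Return a list of human-readable problems; an empty list means conformant."""
    problems = []
    for name, rules in attributes.items():
        value = instance.get(name)
        if value is None:
            # Identifiers are implicitly required.
            if rules.get("required") or rules.get("identifier"):
                problems.append(f"{name}: missing required value")
            continue
        is_list = isinstance(value, list)
        if rules.get("multivalued") and not is_list:
            problems.append(f"{name}: expected a list")
        if not rules.get("multivalued") and is_list:
            problems.append(f"{name}: expected a single value")
    for name in instance:
        if name not in attributes:
            problems.append(f"{name}: unknown attribute")
    return problems


print(validate({"id": "17a3923", "first_name": ["Jill"], "last_name": "Jones"}))
print(validate({"id": "2", "first_name": "Jill"}))
```

The point is that once the rules live in a schema rather than in prose, conformance checks like this can be generated rather than written by hand.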
LinkML: Easy for all
https://linkml.io
https://github.com/linkml/linkml
Create data models in simple YAML files,
optionally annotated using ontologies
Biocurator
Data
Scientist
dct:creator
LinkML: Polyglot modeling
JSON-Schema
Python
Dataclasses
https://linkml.io
https://github.com/linkml/linkml
“Traditional”
Applications and
Infrastructure
SQL DDL
TSVs
Create data models in simple YAML files,
optionally annotated using ontologies
Compile to other
frameworks
Biocurator
Data
Scientist
dct:creator
LinkML: Bridges semantic frameworks
JSON-Schema
ShEx, SHACL
JSON-LD
Contexts
Python
Dataclasses
OWL
https://linkml.io
https://github.com/linkml/linkml
Semantic Web
Applications
And
Infrastructure
“Traditional”
Applications and
Infrastructure
SQL DDL
TSVs
Create data models in simple YAML files,
optionally annotated using ontologies
Compile to other
frameworks
Choose the right tools
for the job, no lock in
Biocurator
Data
Scientist
dct:creator
Export data to RDF and JSON-LD
>> Make the meaning of your schema
more explicit
>> Data integration hooks
Annotate schemas
with ontologies
Easy ontology support via value sets
Easy ontology support via value sets
LinkML Model (YAML)
classes:
  Person:
    attributes:
      age_in_years:
        range: integer
Model Serializations
SQL DDL
CREATE TABLE "Person" (
    age_in_years INTEGER
)
ShEx, SHACL
<Person> CLOSED {
    ( $<Person_tes> (
        <age_in_years> @linkml:Integer ?
      ) ;
    )
}
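The "compile to other frameworks" idea can be illustrated with a toy generator that turns the one-attribute Person model into the SQL DDL shown above. This is a sketch under simplifying assumptions (a flat class, a three-entry type map chosen for the example); `to_sql_ddl` and `TYPE_MAP` are hypothetical names and this is not how the LinkML generators are implemented.

```python
# Assumed mapping from model ranges to SQL column types, for illustration only.
TYPE_MAP = {"integer": "INTEGER", "string": "TEXT", "float": "REAL"}


def to_sql_ddl(class_name: str, attributes: dict) -> str:
    """Render one class as a CREATE TABLE statement, one column per attribute."""
    columns = [
        f"    {name} {TYPE_MAP[spec.get('range', 'string')]}"
        for name, spec in attributes.items()
    ]
    return f'CREATE TABLE "{class_name}" (\n' + ",\n".join(columns) + "\n)"


model = {"Person": {"age_in_years": {"range": "integer"}}}

for class_name, attributes in model.items():
    print(to_sql_ddl(class_name, attributes))
```

Because the model is plain data, adding another target (JSON Schema, ShEx) is a matter of writing another small renderer over the same dict.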
Instance data as JSON/YAML, RDF, or TSV
Python
from examples.basic import Person
# The dumpers are provided by the linkml-runtime package
from linkml_runtime.dumpers import json_dumper, yaml_dumper, rdf_dumper

# The first positional argument is the identifier (id)
sam = Person("1172438", first_name=["Samual", "J"], last_name="Snooter")
ann = Person("17a3923", first_name="Jill", last_name="Jones", knows=[sam.id])
print(json_dumper.dumps(ann))
print(yaml_dumper.dumps(ann))
print(rdf_dumper.dumps(ann, contexts="../examples/jsonld/basic.context.jsonld"))

JSON output
{
  "id": "17a3923",
  "first_name": [
    "Jill"
  ],
  "last_name": "Jones",
  "knows": [
    "1172438"
  ],
  "@type": "Person"
}

YAML output
id: 17a3923
first_name:
- Jill
last_name: Jones
knows:
- '1172438'

RDF output
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix sdo: <https://schema.org/> .

<https://example.org/linkml/hello-world/17a3923> a sdo:Person ;
    foaf:knows <https://example.org/linkml/hello-world/1172438> ;
    sdo:familyName "Jones" ;
    sdo:givenName "Jill" .
Ongoing: coordinating with HDMF group on HDF5 readers/writers
Ecosystem
LinkML-OWL
LinkML-Datalog
DataHarmonizer
docgen
codegen
schemagen
LinkML adoption
Environmental microbiome data
Open, active, helpful,
inclusive community
https://github.com/linkml/linkml/graphs/contributors
Planet Large Language Model
Large Language Model
planet
● Model of words
● Temperature: ultra
hot
LLMs
LLMs lack semantic understanding
Large Language Model
planet
● Model of words
● Temperature: ultra
hot
LLMs
OntoGPT: using LLMs with semantic structures
Ontology layer onto OpenAI GPT-3 API
● Integrates prompt engineering, semantic data models, ontology annotation, and
OWL translation
Components:
● SPIRES: Stochastic Parrot Interrogation and Recursive Extraction of Semantics
● HALO: Hallucinating Latent Ontologies
● Authoring KGs with copilot
https://github.com/monarch-initiative/ontogpt
SPIRES: Stochastic Parrot Interrogation and Recursive Extraction of Semantics
Input: LinkML Schema
● Nested/Compositional
● Annotated with ontology
metadata (e.g. FOODON)
Schema can be nested, e.g.
● Recipe has Step
● Step has utensils, actions,
input, output
● Input has foodtype, quantity
SPIRES: Stochastic Parrot Interrogation and Recursive Extraction of Semantics
Input: Unstructured text
● E.g. recipe, abstract
On medium heat melt the butter and sauté the onion and bell peppers.
Add the hamburger meat and cook until meat is well done.
Add the tomato sauce, salt, pepper and garlic powder.
Salt, pepper and garlic powder can be adjusted to your own tastes.
Cook noodles as directed.
Mix the sauce and noodles if you like, I keep them separated
INGREDIENTS
UNITS: US
1 small onion (chopped)
1 bell pepper (chopped)
2 tablespoons garlic powder
3 tablespoons butter
1 teaspoon salt
1 teaspoon pepper
2 (15 ounce) cans tomato sauce
1 (16 ounce) box spaghetti noodles
1 - 1 1⁄2 lb hamburger meat.
SPIRES: Stochastic Parrot Interrogation and Recursive Extraction of Semantics
Process:
● Recursively query GPT3
via prompts
● Annotate using OAK and
Bioportal
● (Map results to OWL)
Unstructured Text
+ Prompt
Extract from the above text the following key-value
pairs…
JSON/RDF/OWL
SPIRES: Stochastic Parrot Interrogation and Recursive Extraction of Semantics
Output:
● Nested JSON/RDF
● (OWL TBox model)
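The recursive, schema-driven extraction described on these slides can be sketched roughly as below. This is not the OntoGPT code: the `SCHEMA` dict stands in for a LinkML schema, `fake_llm` is a canned stand-in for a GPT-3 call so the example runs offline, the semicolon convention for multivalued fields is an assumption, and the ontology-annotation step (OAK, Bioportal) is left out entirely.

```python
from typing import Callable

# Stand-in for a nested schema: a list marks a multivalued field,
# and a range naming another class triggers recursion.
SCHEMA = {
    "Recipe": {"label": "string", "steps": ["Step"]},
    "Step": {"action": "string", "inputs": ["string"]},
}


def build_prompt(text: str, cls: str) -> str:
    """Ask for one key-value line per field of the class."""
    fields = "\n".join(f"{name}: <{name}>" for name in SCHEMA[cls])
    return f"{text}\n\nExtract from the above text the following key-value pairs:\n{fields}\n"


def parse_response(response: str) -> dict:
    """Parse 'key: value' lines from the completion."""
    pairs = {}
    for line in response.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            pairs[key.strip()] = value.strip()
    return pairs


def extract(text: str, cls: str, complete: Callable[[str], str]) -> dict:
    """Fill one class from text, re-querying the model for each nested object."""
    raw = parse_response(complete(build_prompt(text, cls)))
    result = {}
    for name, spec in SCHEMA[cls].items():
        value = raw.get(name, "")
        multivalued = isinstance(spec, list)
        range_ = spec[0] if multivalued else spec
        # Assumed convention: multivalued answers are separated by semicolons.
        chunks = [c.strip() for c in value.split(";") if c.strip()] if multivalued else [value]
        if range_ in SCHEMA:
            parsed = [extract(chunk, range_, complete) for chunk in chunks]
        else:
            parsed = chunks
        result[name] = parsed if multivalued else parsed[0]
    return result


def fake_llm(prompt: str) -> str:
    """Canned completions so the sketch is deterministic; a real run would call an LLM API."""
    if "steps:" in prompt:
        return "label: Spaghetti\nsteps: melt the butter; cook noodles"
    text = prompt.split("\n\n")[0]
    verb, _, rest = text.partition(" ")
    return f"action: {verb}\ninputs: {rest.replace('the ', '')}"


print(extract("On medium heat melt the butter... Cook noodles as directed.", "Recipe", fake_llm))
```

The design choice worth noting is that the schema, not the prompt author, decides what gets asked: each nested class produces its own smaller prompt over a smaller span of text.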
Standard schemas in life sciences, 1990s
https://web.archive.org/web/20080828214226/http://www.mged.org/Meetings/presentations/OMG/sld019.htm
The Life Sciences Research (LSR) group is a
consortium of people representing pharmaceutical
companies, academic institutions, software vendors
and hardware vendors from all over the world who are
working together within the Object Management
Group (OMG) to improve communication and
interoperability among computational resources in life
sciences research
Mission:
To improve the quality and utility of software and
information systems used in Life Sciences
Research through use of the Model Driven
Architecture (MDA), OMG technologies such as
UML and CORBA, and MDA-supported
technologies such as XML and EJB
Standard schemas in life sciences, 1990s
https://web.archive.org/web/20170922215834/http://www.omg.org/lsr/FAQ.html#why%20succeed
Conclusions
● Embrace change
● Don’t be wedded to any one technology stack
○ Why can’t BigTable, Neo4j, etc have semantics?
● We need to scale up semantics to cover ALL the data
● This requires an open, inclusive approach
Acknowledgments
● OBO and Ontologies
○ OBO Operations Group, James Overton, Nico Matentzoglu, Bjoern Peters
○ Gene Ontology Consortium, Ben Good, Seth Carbon, Laurent-Philippe Albou, Tremayne Mushayahama,
Dustin Ebert, Peter D’Eustachio, David Hill, Pascale Gaudet, Paul D Thomas
● Biolink, Translator, and KG-Hub:
○ NCATS Translator Data Modeling Group, Chris Bizon, Nomi Harris, Mike Bada, Matt Brush, Deepak Unni,
Sierra Moxon
○ KG-Hub group, Peter Robinson, Melissa Haendel, David Osumi-Sutherland, Kevin Schaper, Marcin Joachimiak,
Seth Carbon, Tim Putman, Luca Cappelletti, Justin Reese
● LinkML:
○ LinkML community and developer group, Harold Solbrig, Sierra Moxon, Mark Miller, Sujay Patil
● OntoGPT:
○ Harry Caufield, Harshad Hegde

More Related Content

Similar to Scaling up semantics; lessons learned across the life sciences

Standardized visualisation of differences between model versions
Standardized visualisation of differences between model versionsStandardized visualisation of differences between model versions
Standardized visualisation of differences between model versionsVasundra Touré
 
Reasoning with Big Knowledge Graphs: Choices, Pitfalls and Proven Recipes
Reasoning with Big Knowledge Graphs: Choices, Pitfalls and Proven RecipesReasoning with Big Knowledge Graphs: Choices, Pitfalls and Proven Recipes
Reasoning with Big Knowledge Graphs: Choices, Pitfalls and Proven RecipesOntotext
 
Real World Applications of OWL
Real World Applications of OWLReal World Applications of OWL
Real World Applications of OWLMichel Dumontier
 
Venkatesan bosc2010 onto-toolkit
Venkatesan bosc2010 onto-toolkitVenkatesan bosc2010 onto-toolkit
Venkatesan bosc2010 onto-toolkitBOSC 2010
 
Integrating Pathway Databases with Gene Ontology Causal Activity Models
Integrating Pathway Databases with Gene Ontology Causal Activity ModelsIntegrating Pathway Databases with Gene Ontology Causal Activity Models
Integrating Pathway Databases with Gene Ontology Causal Activity ModelsBenjamin Good
 
Pattern-based Ontology Engineering
Pattern-based Ontology EngineeringPattern-based Ontology Engineering
Pattern-based Ontology Engineeringkjanowicz
 
The Materials Project: An Electronic Structure Database for Community-Based M...
The Materials Project: An Electronic Structure Database for Community-Based M...The Materials Project: An Electronic Structure Database for Community-Based M...
The Materials Project: An Electronic Structure Database for Community-Based M...Anubhav Jain
 
Machine Learning for Molecules
Machine Learning for MoleculesMachine Learning for Molecules
Machine Learning for MoleculesIchigaku Takigawa
 
US2TS presentation on Gene Ontology
US2TS presentation on Gene OntologyUS2TS presentation on Gene Ontology
US2TS presentation on Gene OntologyChris Mungall
 
QALL-ME: Ontology and Semantic Web
QALL-ME: Ontology and Semantic WebQALL-ME: Ontology and Semantic Web
QALL-ME: Ontology and Semantic WebConstantin Orasan
 
2014 11-13-sbsm032-reproducible research
2014 11-13-sbsm032-reproducible research2014 11-13-sbsm032-reproducible research
2014 11-13-sbsm032-reproducible researchYannick Wurm
 
Collaboratively Creating the Knowledge Graph of Life
Collaboratively Creating the Knowledge Graph of LifeCollaboratively Creating the Knowledge Graph of Life
Collaboratively Creating the Knowledge Graph of LifeChris Mungall
 
Towards ubiquitous OWL computing: Simplifying programmatic authoring of and q...
Towards ubiquitous OWL computing: Simplifying programmatic authoring of and q...Towards ubiquitous OWL computing: Simplifying programmatic authoring of and q...
Towards ubiquitous OWL computing: Simplifying programmatic authoring of and q...Hilmar Lapp
 
Making Linked Data SPARQL with the InterMine Biological Data Warehouse
Making Linked Data SPARQL with the InterMine Biological Data WarehouseMaking Linked Data SPARQL with the InterMine Biological Data Warehouse
Making Linked Data SPARQL with the InterMine Biological Data WarehouseJustin Clark-Casey
 
Getty Vocabulary Program LOD: Ontologies and Semantic Representation
Getty Vocabulary Program LOD: Ontologies and Semantic RepresentationGetty Vocabulary Program LOD: Ontologies and Semantic Representation
Getty Vocabulary Program LOD: Ontologies and Semantic RepresentationVladimir Alexiev, PhD, PMP
 
Overview of Machine Learning for Molecules and Materials Workshop @ NIPS2017
Overview of Machine Learning for Molecules and Materials Workshop @ NIPS2017Overview of Machine Learning for Molecules and Materials Workshop @ NIPS2017
Overview of Machine Learning for Molecules and Materials Workshop @ NIPS2017Kenta Oono
 
Avogadro 2 and Open Chemistry
Avogadro 2 and Open ChemistryAvogadro 2 and Open Chemistry
Avogadro 2 and Open ChemistryMarcus Hanwell
 
20160818 Semantics and Linkage of Archived Catalogs
20160818 Semantics and Linkage of Archived Catalogs20160818 Semantics and Linkage of Archived Catalogs
20160818 Semantics and Linkage of Archived Catalogsandrea huang
 

Similar to Scaling up semantics; lessons learned across the life sciences (20)

Standardized visualisation of differences between model versions
Standardized visualisation of differences between model versionsStandardized visualisation of differences between model versions
Standardized visualisation of differences between model versions
 
Reasoning with Big Knowledge Graphs: Choices, Pitfalls and Proven Recipes
Reasoning with Big Knowledge Graphs: Choices, Pitfalls and Proven RecipesReasoning with Big Knowledge Graphs: Choices, Pitfalls and Proven Recipes
Reasoning with Big Knowledge Graphs: Choices, Pitfalls and Proven Recipes
 
Real World Applications of OWL
Real World Applications of OWLReal World Applications of OWL
Real World Applications of OWL
 
Venkatesan bosc2010 onto-toolkit
Venkatesan bosc2010 onto-toolkitVenkatesan bosc2010 onto-toolkit
Venkatesan bosc2010 onto-toolkit
 
Integrating Pathway Databases with Gene Ontology Causal Activity Models
Integrating Pathway Databases with Gene Ontology Causal Activity ModelsIntegrating Pathway Databases with Gene Ontology Causal Activity Models
Integrating Pathway Databases with Gene Ontology Causal Activity Models
 
Pattern-based Ontology Engineering
Pattern-based Ontology EngineeringPattern-based Ontology Engineering
Pattern-based Ontology Engineering
 
The Materials Project: An Electronic Structure Database for Community-Based M...
The Materials Project: An Electronic Structure Database for Community-Based M...The Materials Project: An Electronic Structure Database for Community-Based M...
The Materials Project: An Electronic Structure Database for Community-Based M...
 
Machine Learning for Molecules
Machine Learning for MoleculesMachine Learning for Molecules
Machine Learning for Molecules
 
US2TS presentation on Gene Ontology
US2TS presentation on Gene OntologyUS2TS presentation on Gene Ontology
US2TS presentation on Gene Ontology
 
QALL-ME: Ontology and Semantic Web
QALL-ME: Ontology and Semantic WebQALL-ME: Ontology and Semantic Web
QALL-ME: Ontology and Semantic Web
 
2014 11-13-sbsm032-reproducible research
2014 11-13-sbsm032-reproducible research2014 11-13-sbsm032-reproducible research
2014 11-13-sbsm032-reproducible research
 
Collaboratively Creating the Knowledge Graph of Life
Collaboratively Creating the Knowledge Graph of LifeCollaboratively Creating the Knowledge Graph of Life
Collaboratively Creating the Knowledge Graph of Life
 
Towards ubiquitous OWL computing: Simplifying programmatic authoring of and q...
Towards ubiquitous OWL computing: Simplifying programmatic authoring of and q...Towards ubiquitous OWL computing: Simplifying programmatic authoring of and q...
Towards ubiquitous OWL computing: Simplifying programmatic authoring of and q...
 
Chado introduction
Chado introductionChado introduction
Chado introduction
 
Making Linked Data SPARQL with the InterMine Biological Data Warehouse
Making Linked Data SPARQL with the InterMine Biological Data WarehouseMaking Linked Data SPARQL with the InterMine Biological Data Warehouse
Making Linked Data SPARQL with the InterMine Biological Data Warehouse
 
BioSD Tutorial 2014 Editition
BioSD Tutorial 2014 EdititionBioSD Tutorial 2014 Editition
BioSD Tutorial 2014 Editition
 
Getty Vocabulary Program LOD: Ontologies and Semantic Representation
Getty Vocabulary Program LOD: Ontologies and Semantic RepresentationGetty Vocabulary Program LOD: Ontologies and Semantic Representation
Getty Vocabulary Program LOD: Ontologies and Semantic Representation
 
Overview of Machine Learning for Molecules and Materials Workshop @ NIPS2017
Overview of Machine Learning for Molecules and Materials Workshop @ NIPS2017Overview of Machine Learning for Molecules and Materials Workshop @ NIPS2017
Overview of Machine Learning for Molecules and Materials Workshop @ NIPS2017
 
Avogadro 2 and Open Chemistry
Avogadro 2 and Open ChemistryAvogadro 2 and Open Chemistry
Avogadro 2 and Open Chemistry
 
20160818 Semantics and Linkage of Archived Catalogs
20160818 Semantics and Linkage of Archived Catalogs20160818 Semantics and Linkage of Archived Catalogs
20160818 Semantics and Linkage of Archived Catalogs
 

More from Chris Mungall

LinkML Intro July 2022.pptx PLEASE VIEW THIS ON ZENODO
LinkML Intro July 2022.pptx PLEASE VIEW THIS ON ZENODOLinkML Intro July 2022.pptx PLEASE VIEW THIS ON ZENODO
LinkML Intro July 2022.pptx PLEASE VIEW THIS ON ZENODOChris Mungall
 
Ontology Access Kit_ Workshop Intro Slides.pptx
Ontology Access Kit_ Workshop Intro Slides.pptxOntology Access Kit_ Workshop Intro Slides.pptx
Ontology Access Kit_ Workshop Intro Slides.pptxChris Mungall
 
LinkML Intro (for Monarch devs)
LinkML Intro (for Monarch devs)LinkML Intro (for Monarch devs)
LinkML Intro (for Monarch devs)Chris Mungall
 
LinkML presentation to Yosemite Group
LinkML presentation to Yosemite GroupLinkML presentation to Yosemite Group
LinkML presentation to Yosemite GroupChris Mungall
 
Experiences in the biosciences with the open biological ontologies foundry an...
Experiences in the biosciences with the open biological ontologies foundry an...Experiences in the biosciences with the open biological ontologies foundry an...
Experiences in the biosciences with the open biological ontologies foundry an...Chris Mungall
 
Representation of kidney structures in Uberon
Representation of kidney structures in UberonRepresentation of kidney structures in Uberon
Representation of kidney structures in UberonChris Mungall
 
SparqlProg (BioHackathon 2019)
SparqlProg (BioHackathon 2019)SparqlProg (BioHackathon 2019)
SparqlProg (BioHackathon 2019)Chris Mungall
 
Ontology Development Kit: Bio-Ontologies 2019
Ontology Development Kit: Bio-Ontologies 2019Ontology Development Kit: Bio-Ontologies 2019
Ontology Development Kit: Bio-Ontologies 2019Chris Mungall
 
US2TS: Reasoning over multiple open bio-ontologies to make machines and human...
US2TS: Reasoning over multiple open bio-ontologies to make machines and human...US2TS: Reasoning over multiple open bio-ontologies to make machines and human...
US2TS: Reasoning over multiple open bio-ontologies to make machines and human...Chris Mungall
 
Uberon: opening up to community contributions
Uberon: opening up to community contributionsUberon: opening up to community contributions
Uberon: opening up to community contributionsChris Mungall
 
Modeling exposure events and adverse outcome pathways using ontologies
Modeling exposure events and adverse outcome pathways using ontologiesModeling exposure events and adverse outcome pathways using ontologies
Modeling exposure events and adverse outcome pathways using ontologiesChris Mungall
 
Causal reasoning using the Relation Ontology
Causal reasoning using the Relation OntologyCausal reasoning using the Relation Ontology
Causal reasoning using the Relation OntologyChris Mungall
 
Computing on Phenotypes AMP 2015
Computing on Phenotypes AMP 2015Computing on Phenotypes AMP 2015
Computing on Phenotypes AMP 2015Chris Mungall
 
Mungall keynote-biocurator-2017
Mungall keynote-biocurator-2017Mungall keynote-biocurator-2017
Mungall keynote-biocurator-2017Chris Mungall
 
GIGA2 Structuring Phenotype Data
GIGA2 Structuring Phenotype DataGIGA2 Structuring Phenotype Data
GIGA2 Structuring Phenotype DataChris Mungall
 
Mapping Phenotype Ontologies for Obesity and Diabetes
Mapping Phenotype Ontologies for Obesity and DiabetesMapping Phenotype Ontologies for Obesity and Diabetes
Mapping Phenotype Ontologies for Obesity and DiabetesChris Mungall
 
Uberon EBI industry workshop
Uberon EBI industry workshopUberon EBI industry workshop
Uberon EBI industry workshopChris Mungall
 

Scaling up semantics; lessons learned across the life sciences

  • 1. Scaling up semantics lessons learned from across the life sciences Chris Mungall Lawrence Berkeley National Laboratory cjmungall@lbl.gov https://bit.ly/mungall-swat-23
  • 2. Providing Semantics for Systems Biology ● The language of biological knowledge is text and images ● Digestion into semantic models is necessary for this knowledge to be properly understood by machines
  • 3. The Gene Ontology: Semantic Systems Biology ● 45k classes ● 750,000 experimentally supported annotations manually curated from 140,000 publications ● 1bn computed annotations ● geneontology.org
  • 4. Semantic resources and life science applications ontologies Semantic resources applications Molecular biology Translational research and health Microbiome science multi-Omics integration semantic models
  • 5. We have always sought to model the world https://en.wikipedia.org/wiki/Antikythera_mechanism model of
  • 7. Modeling paradigms in the semantic solar system Schemas KGs LLMs
  • 8. Modeling paradigms in the semantic solar system Knowledge Graph planet ● Model of things Schemas LLMs KGs
  • 9. Modeling paradigms in the semantic solar system Semantic schema planet ● Model of information Knowledge Graph planet ● Model of things Schemas LLMs KGs
  • 10. Modeling paradigms in the semantic solar system Semantic schema planet ● Model of information Knowledge Graph planet ● Model of things Large Language Model planet ● Model of words Schemas LLMs KGs
  • 11. Planet KG: currently hot Knowledge Graph planet ● Model of things ● Temperature: hot KGs Reese J, KG-COVID-19 10.1101/2020.08.17.254839
  • 12. Previous iterations of KG planet: Semantic Web 2001 RDF Differences: ● Ad hoc IDs >> URIs ● Scale >> Correctness ● Property Graphs >> Triples ● Graphs, ML >> Reasoning ● Shiny databases >> Standards
  • 13. Previous iterations: bio-ontologies 1999-> GO Differences: ● All entities >> just terms ● Quantity >> Quality Ashburner et al Gene Ontology: tool for the unification of biology. Nature Genetics 2000
  • 14. The interconnected OBO KG 2000->present 15k citations/year 1 billion annotations 10-1000 billion annotations Millions of environmental samples annotated Diagnosing rare disease patients uses Analyzing single-cell seq You’re welcome! obofoundry.org
  • 17. OBO and Wikidata ● Many more OBO ontologies to go! ○ Join us on #wikidata on OBO slack ○ http://obofoundry.org ● Next up: ○ Environment Ontology (ENVO) ○ H ○ Join
  • 18. OBO KG in Ubergraph https://icbo-conference.github.io/icbo2022/papers/ICBO-2022_paper_5005.pdf Inter-ontology connections in Ubergraph, grouped by OBO ontology https://ubergraph.apps.renci.org/sparql https://github.com/INCATools/ubergraph ● Triplestore with interlinked subset of OBO ● Inference pre-entailed using Relation Graph ● Pre-categorized using Biolink ● Powerful federated queries with uniprot, wikidata, others
  • 19. Use of semantics in the connected KG
      glucan biosynthesis (GO:0009250) ≡ biosynthesis (GO:0009058) ⊓ ∃has_output.glucan (CHEBI:37163)
      polysaccharide biosynthesis (GO:0000271) ≡ biosynthesis (GO:0009058) ⊓ ∃has_output.polysaccharide (CHEBI:18154)
      glucan (CHEBI:37163) ⊑ polysaccharide (CHEBI:18154)
      Inferred by OWL reasoner: glucan biosynthesis (GO:0009250) ⊑ polysaccharide biosynthesis (GO:0000271)
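The inference on this slide can be sketched in a few lines of Python. This is a toy illustration only, not an OWL reasoner: it handles just the single pattern "genus ⊓ ∃has_output.filler", and the dictionaries and function names are hypothetical. The identifiers are the ones shown on the slide.

```python
# Toy illustration only: a real pipeline would run an OWL reasoner instead.
# Asserted chemical hierarchy (child -> parent), as shown on the slide.
CHEBI_PARENTS = {"CHEBI:37163": "CHEBI:18154"}  # glucan is_a polysaccharide

# Logical definitions of the form: class ≡ genus ⊓ ∃has_output.filler
DEFINITIONS = {
    "GO:0009250": ("GO:0009058", "CHEBI:37163"),  # glucan biosynthesis
    "GO:0000271": ("GO:0009058", "CHEBI:18154"),  # polysaccharide biosynthesis
}


def ancestors(term, parents):
    """Return the term followed by all of its is_a ancestors."""
    seen = [term]
    while term in parents:
        term = parents[term]
        seen.append(term)
    return seen


def infer_subclass_of(definitions, parents):
    """Infer A ⊑ B when A and B share a genus and A's filler is under B's filler."""
    inferred = []
    for a, (genus_a, filler_a) in definitions.items():
        for b, (genus_b, filler_b) in definitions.items():
            if a != b and genus_a == genus_b and filler_b in ancestors(filler_a, parents):
                inferred.append((a, b))
    return inferred


inferred = infer_subclass_of(DEFINITIONS, CHEBI_PARENTS)
print(inferred)
```

The point of the sketch is that the link between the two GO classes is never asserted by hand; it falls out of the chemical hierarchy plus the two definitions.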
  • 20. Ontology Development Kit (ODK) https://incatools.github.io/ontology-development-kit ● ODK makes it easy to create a new GitHub repo following best practice + automation ● Provides standardized, customizable, and automatically executable workflows, and packages all required tooling in a single Docker image Matentzoglu et al (2022) Ontology Development Kit: a toolkit for building, maintaining and standardizing biomedical ontologies. Database 10.1093/database/baac087 NEW: Ontology Access Kit (OAK). Python library for working with ontologies https://incatools.github.io/ontology-access-kit
  • 21. Gene Ontology Causal Activity Models: Systems KGs
  • 22. http://rdf.geneontology.org/ - SPARQL endpoint Biologist View RDF view
  • 24.
  • 25. Previous iterations of planet KG Semantic Nets Knowledge Representation with graphs didn’t start with the semantic web and ontologies…
  • 26. Previous iterations: 1950s KGs Semantic Nets “The cat is bitten by the dog” “The cat is on the mat” Richens RH, "Preprogramming for mechanical translation" Mechanical Translation 3 (1), July 1956, 20–25 Differences: ● Emphasis on language ● Applications: Machine Translation
  • 27. Previous iterations: KGs in the 3rd century CE Porphyrian Trees Differences: ● Philosophical inquiry >> Computation ● Theology >> science or empiricism
  • 28. Curse of the eternal return RDF KGs Ontologies
  • 29. Curse of the eternal return RDF KGs Ontologies
  • 30. Embrace CURIEs ● URIs are rarely used as identifiers outside RDF frameworks ○ URIs are awkward to work with outside RDF tooling ● Solution: CURIEs ○ Prefixed IDs were standard in bio-databases before RDF ■ GO:0005634 ■ UniProtKB:P12345 ○ Key: always make the prefix map explicit Hoyt, C. T., et al. (2022) Unifying the identification of biomedical entities with the Bioregistry. Scientific Data, s41597-022-01807-3 https://bioregistry.io/ https://github.com/linkml/prefixmaps/
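A minimal sketch of what "make the prefix map explicit" could look like in practice, using only the Python standard library. The function names are hypothetical, and the two URI bases are assumptions chosen for illustration; a real project would more likely take its prefix map from a registry such as the ones linked on the slide.

```python
# Explicit prefix map. The URI bases below are illustrative assumptions.
PREFIX_MAP = {
    "GO": "http://purl.obolibrary.org/obo/GO_",
    "UniProtKB": "http://purl.uniprot.org/uniprot/",
}


def expand(curie, prefix_map):
    """Expand a CURIE such as GO:0005634 into a full URI."""
    prefix, sep, local_id = curie.partition(":")
    if not sep or prefix not in prefix_map:
        raise ValueError(f"Unknown or missing prefix in CURIE: {curie!r}")
    return prefix_map[prefix] + local_id


def contract(uri, prefix_map):
    """Contract a URI back into a CURIE, preferring the longest matching base."""
    by_length = sorted(prefix_map.items(), key=lambda kv: len(kv[1]), reverse=True)
    for prefix, base in by_length:
        if uri.startswith(base):
            return f"{prefix}:{uri[len(base):]}"
    raise ValueError(f"No prefix registered for URI: {uri!r}")


uri = expand("GO:0005634", PREFIX_MAP)
print(uri)
print(contract(uri, PREFIX_MAP))
```

Passing the prefix map as an argument, rather than hiding it in a global default, keeps the expansion reproducible: two systems that exchange CURIEs can also exchange the map that gives them meaning.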
  • 31. Can we have semantics in our KGs? The story: ● RDF and OWL provide semantics for the web The reality: ● OWL is great for terminology maintenance ● OWL layering in RDF is problematic and ignored in non-RDF KGs ● OWL is a poor fit for contingent knowledge common in the life sciences
  • 32. Example: semantics of parthood :Planet :partOf :SolarSystem . :SolarSystem :partOf :Galaxy . ● Doesn’t capture quantification
  • 33. Example: semantics of parthood in OWL Class: Planet SubClassOf: partOf some SolarSystem Class: SolarSystem SubClassOf: partOf some Galaxy
  • 34. RDF/OWL Ontology Rendering :Planet :SolarSystem :Galaxy <blank> <blank> :partOf owl:Restriction rdfs:subClassOf rdfs:subClassOf owl:someValuesFrom owl:someValuesFrom rdf:type rdf:type owl:onProperty owl:onProperty ● Complex monstrosity ● Misaligned with mental model ● Confuses developers ● Not amenable to transitive/graph querying ○ E.g. what kinds of things are found in a galaxy?
  • 35. OWLStar Property Graph conceptual model :Planet :SolarSystem :Galaxy :partOf :partOf Simple view (as used in Wikidata) https://github.com/linkml/owlstar
  • 36. OWLStar Property Graph conceptual model :Planet :SolarSystem :Galaxy owlstar:SomeValuesFrom :partOf :partOf owlstar:interpretation owlstar:interpretation ● Simple view (as used in Wikidata) ● Decorate with semantics ● Assumes Property Graph ● Decoration can be deferred https://github.com/linkml/owlstar
  • 37. OWLStar Property Graph in RDFStar :Planet :SolarSystem :Galaxy owlstar:SomeValuesFrom :partOf :partOf owlstar:interpretation owlstar:interpretation <<:Planet :partOf :SolarSystem>> owlstar:interpretation owlstar:AllSomeInterpretation . <<:SolarSystem :partOf :Galaxy>> owlstar:interpretation owlstar:AllSomeInterpretation . https://github.com/linkml/owlstar
  • 38. OWLStar Property Graph in OneGraph :Planet :SolarSystem :Galaxy owlstar:SomeValuesFrom :partOf :partOf owlstar:interpretation owlstar:interpretation Coming Soon: OneGraph layering https://github.com/linkml/owlstar
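The parthood example from the preceding slides can be sketched as a small in-memory property graph. This is a hedged illustration of the idea, not the OWLStar implementation: the edge dictionaries, the function names, and the choice to treat partOf as transitive are all assumptions made for the sketch. Each edge stays a simple subject-predicate-object link that is easy to traverse, and the quantified reading is carried as an optional edge property that can be used or ignored.

```python
# Hypothetical in-memory property graph. Each edge carries an optional
# "interpretation" property, mirroring the owlstar:interpretation decoration.
EDGES = [
    {"subject": "Planet", "predicate": "partOf", "object": "SolarSystem",
     "interpretation": "AllSomeInterpretation"},
    {"subject": "SolarSystem", "predicate": "partOf", "object": "Galaxy",
     "interpretation": "AllSomeInterpretation"},
]


def found_in(target, edges, predicate="partOf"):
    """Return every node that reaches `target` via `predicate` edges.

    Assumes the predicate is transitive, which is a modeling choice made
    for this sketch rather than something the graph itself states.
    """
    result = set()
    frontier = [target]
    while frontier:
        current = frontier.pop()
        for edge in edges:
            if edge["predicate"] == predicate and edge["object"] == current:
                if edge["subject"] not in result:
                    result.add(edge["subject"])
                    frontier.append(edge["subject"])
    return result


def to_owl_axiom(edge):
    """Render a decorated edge as Manchester-style text; None if undecorated."""
    if edge.get("interpretation") == "AllSomeInterpretation":
        return (f"Class: {edge['subject']} "
                f"SubClassOf: {edge['predicate']} some {edge['object']}")
    return None


print(sorted(found_in("Galaxy", EDGES)))
print(to_owl_axiom(EDGES[0]))
```

The traversal answers "what kinds of things are found in a galaxy?" without touching blank nodes or restrictions, while `to_owl_axiom` shows that the OWL reading remains recoverable from the decoration when a use case needs it.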
  • 39. Semantics of contingent knowledge in KGs FOL/model theory is the basis of RDF and OWL … yet is a poor fit for much biological knowledge See: ● Schulz S: Strengths and limitations of formal ontologies in the biomedical domain PMC2904529 ● Rector A: On beyond Gruber: "Ontologies" in today's biomedical information systems and the limits of OWL PMID34384571 ● Schulz S: “Lmo-2 interacts with Elf-2” On the Meaning of Common Statements in Biomedical Literature, KR-MED 2006 :lmo-2 :elf-2
  • 40. A new old logic layered on Property Graphs :lmo-2 :elf-2 Multimodal semantics: Use whatever the use case demands ● Defeasible reasoning, Bayesian probabilities, … ● Or simply defer <<your semantics here>> https://github.com/linkml/owlstar
  • 41. BioLink Model: hierarchical categories for KGs Unni D et al (2022). Biolink Model: A universal schema for knowledge graphs in clinical, biomedical, and translational science. Clin. Tran. Sci https://w3id.org/biolink
  • 42. Standards in NCATS Biomedical Translator
  • 43. Standards in NCATS Biomedical Translator
  • 44. Planet Semantic Schema Semantic schema planet ● Model of information ● Temperature: cool Schemas KGs LLMs
  • 45. Previous iteration: OWL misused as schema language Semantic schema planet ● Model of information ● Temperature: cool Description Logics https://twitter.com/aaranged/status/1485670670134497281 Lessons: ● OWL is for modeling terminological knowledge ● OWL is not a language to describe data
  • 46. Previous iteration: Frames 1990s-2000s Semantic schema planet ● Model of information ● Temperature: cool Frames ● OWL succeeded DAML+OIL ● DAML normally frame-like
  • 47. Previous revolution: Frames 1970s-1990s Semantic schema planet ● Model of information ● Temperature: cool Frames Lehmann, F. Semantic Networks. Computers Math. Applic. 23(2-5), 1-50, (1992) https://doi.org/10.1016/0898-1221(92)90135-5 A Framework for Representing Knowledge: Minsky, M. MIT-AI Laboratory Memo 306, June, 1974
  • 48. We need semantic standards! >1800 Databases >1500 Standards >1k Ontologies ~13.5m terms
  • 49. Most data standards are underspecified ● Most standards in FAIRsharing don’t have an associated schema or validator ● If you’re lucky: XML Schema (older), JSON Schema (newer) ● Bindings to ontology terms are frequently unclear
  • 50. Most data standards are underspecified ● Most standards in FAIRsharing don’t have an associated schema or validator ● If you’re lucky: XML Schema (older), JSON Schema (newer) ● Bindings to ontology terms are frequently unclear What about ShEx/SHACL standards? ● Most data in the biosciences is not shared as RDF ● Some (e.g. BioPAX) are really XML standards
  • 51. Lack of adherence to schemas limits meta-analyses Metagenome standards analysis findings: ● Standards are underspecified ● Not machine actionable ● Public data has poor conformance Data too noisy for environmental meta-analyses Biosample metadata Actual data!
  • 52. LinkML: Linked Data Modeling Language https://linkml.io https://github.com/linkml/linkml Create data models in simple YAML files, optionally annotated using ontologies
      id: https://example.org/linkml/hello-world
      title: Really basic LinkML model
      name: hello-world
      version: 0.0.1
      prefixes:
        linkml: https://w3id.org/linkml/
        sdo: https://schema.org/
        ex: https://example.org/linkml/hello-world/
      default_prefix: ex
      default_curi_maps:
        - semweb_context
      imports:
        - linkml:types
      classes:
        Person:
          description: Minimal information about a person
          class_uri: sdo:Person
          attributes:
            id:
              identifier: true
              slot_uri: sdo:taxID
            first_name:
              required: true
              slot_uri: sdo:givenName
              multivalued: true
            last_name:
              required: true
              slot_uri: sdo:familyName
            knows:
              range: Person
              multivalued: true
              slot_uri: foaf:knows
  • 53. LinkML: Easy for all https://linkml.io https://github.com/linkml/linkml Create data models in simple YAML files, optionally annotated using ontologies Biocurator Data Scientist dct:creator
      id: https://example.org/linkml/hello-world
      title: Really basic LinkML model
      name: hello-world
      version: 0.0.1
      prefixes:
        linkml: https://w3id.org/linkml/
        sdo: https://schema.org/
        ex: https://example.org/linkml/hello-world/
      default_prefix: ex
      default_curi_maps:
        - semweb_context
      imports:
        - linkml:types
      classes:
        Person:
          description: Minimal information about a person
          class_uri: sdo:Person
          attributes:
            id:
              identifier: true
              slot_uri: sdo:taxID
            first_name:
              required: true
              slot_uri: sdo:givenName
              multivalued: true
            last_name:
              required: true
              slot_uri: sdo:familyName
            knows:
              range: Person
              multivalued: true
              slot_uri: foaf:knows
  • 54. LinkML: Polyglot modeling https://linkml.io https://github.com/linkml/linkml Create data models in simple YAML files, optionally annotated using ontologies Compile to other frameworks “Traditional” Applications and Infrastructure: JSON-Schema, Python Dataclasses, SQL DDL, TSVs Biocurator Data Scientist dct:creator
  • 55. LinkML: Bridges semantic frameworks https://linkml.io https://github.com/linkml/linkml Create data models in simple YAML files, optionally annotated using ontologies Compile to other frameworks Choose the right tools for the job, no lock in Semantic Web Applications and Infrastructure: ShEx, SHACL, JSON-LD Contexts, OWL “Traditional” Applications and Infrastructure: JSON-Schema, Python Dataclasses, SQL DDL, TSVs Biocurator Data Scientist dct:creator
  • 56. Annotate schemas with ontologies ● Export data to RDF and JSON-LD ● Make the meaning of your schema more explicit ● Data integration hooks
      id: https://example.org/linkml/hello-world
      title: Really basic LinkML model
      name: hello-world
      license: https://creativecommons.org/publicdomain/zero/1.0/
      version: 0.0.1
      prefixes:
        linkml: https://w3id.org/linkml/
        sdo: https://schema.org/
        ex: https://example.org/linkml/hello-world/
      default_prefix: ex
      default_curi_maps:
        - semweb_context
      imports:
        - linkml:types
      classes:
        Person:
          description: Minimal information about a person
          class_uri: sdo:Person
          attributes:
            id:
              identifier: true
              slot_uri: sdo:taxID
            first_name:
              required: true
              slot_uri: sdo:givenName
              multivalued: true
            last_name:
              required: true
              slot_uri: sdo:familyName
            knows:
              range: Person
              multivalued: true
              slot_uri: foaf:knows
  • 57. Easy ontology support via value sets
  • 58. Easy ontology support via value sets
  • 59. LinkML Model (YAML): Classes: Person: Slots: age_in_years: range: integer Model Serializations: SQL DDL: CREATE TABLE "Person" ( age_in_years INTEGER ) ShEx, SHACL: <Person> CLOSED { ( $<Person_tes> ( <age_in_years> @linkml:Integer ? ; }
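To make the "one model, many serializations" idea concrete, here is a toy compiler from a dictionary-shaped schema to SQL DDL. It is a sketch under stated assumptions and not LinkML's own generator: the `SCHEMA` layout, the `SQL_TYPES` mapping, and `to_sql_ddl` are hypothetical, and the sketch assumes that `age_in_years` is a slot of `Person`, which the flattened slide suggests but does not spell out.

```python
# Hypothetical mapping from schema ranges to SQL column types.
SQL_TYPES = {"integer": "INTEGER", "string": "TEXT", "float": "REAL"}

# Dictionary rendering of the slide's model (assumed structure).
SCHEMA = {
    "classes": {"Person": {"slots": ["age_in_years"]}},
    "slots": {"age_in_years": {"range": "integer"}},
}


def to_sql_ddl(schema):
    """Emit one CREATE TABLE statement per class in the schema."""
    statements = []
    for class_name, class_def in schema["classes"].items():
        columns = []
        for slot_name in class_def["slots"]:
            # Slots without an explicit range default to string in this sketch.
            slot_range = schema["slots"][slot_name].get("range", "string")
            columns.append(f"    {slot_name} {SQL_TYPES[slot_range]}")
        body = ",\n".join(columns)
        statements.append(f'CREATE TABLE "{class_name}" (\n{body}\n);')
    return "\n\n".join(statements)


print(to_sql_ddl(SCHEMA))
```

A second function over the same `SCHEMA` dictionary could emit JSON Schema or a shape expression; the design point is that the model is written once and each target is derived from it.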
  • 60. Instance data as JSON/YAML, RDF, or TSV
      Python:
        from examples.basic import Person
        from linkml_runtime.dumpers import json_dumper, yaml_dumper, rdf_dumper
        sam = Person("1172438", first_name=["Samual", "J"], last_name="Snooter")
        ann = Person("17a3923", first_name="Jill", last_name="Jones", knows=[sam.id])
        print(json_dumper.dumps(ann))
        print(yaml_dumper.dumps(ann))
        print(rdf_dumper.dumps(ann, contexts="../examples/jsonld/basic.context.jsonld"))
      JSON output:
        { "id": "17a3923", "first_name": [ "Jill" ], "last_name": "Jones", "knows": [ "1172438" ], "@type": "Person" }
      YAML output:
        id: 17a3923
        first_name:
        - Jill
        last_name: Jones
        knows:
        - '1172438'
      RDF output:
        @prefix foaf: <http://xmlns.com/foaf/0.1/> .
        @prefix sdo: <https://schema.org/> .
        <https://example.org/linkml/hello-world/17a3923> a sdo:Person ;
            foaf:knows <https://example.org/linkml/hello-world/1172438> ;
            sdo:familyName "Jones" ;
            sdo:givenName "Jill" .
      Ongoing: coordinating with HDMF group on HDF5 readers/writers
  • 64. Open, active, helpful, inclusive community https://github.com/linkml/linkml/graphs/contributors
  • 65. Planet Large Language Model Large Language Model planet ● Model of words ● Temperature: ultra hot LLMs
  • 66. LLMs lack semantic understanding Large Language Model planet ● Model of words ● Temperature: ultra hot LLMs
  • 67. OntoGPT: using LLMs with semantic structures Ontology layer onto OpenAI GPT-3 API ● Integrates prompt engineering, semantic data models, ontology annotation, and OWL translation Components: ● SPIRES: Stochastic Parrot Interrogation and Recursive Extraction of Semantics ● HALO: Hallucinating Latent Ontologies ● Authoring KGs with copilot https://github.com/monarch-initiative/ontogpt
  • 68. SPIRES: Stochastic Parrot Interrogation and Recursive Extraction of Semantics Input: LinkML Schema ● Nested/Compositional ● Annotated with ontology metadata (e.g. FOODON) Schema can be nested; e.g ● Recipe has Step ● Step has utensils, actions, input, output ● Input has foodtype, quantity
  • 69. SPIRES: Stochastic Parrot Interrogation and Recursive Extraction of Semantics Input: LinkML Schema ● Nested/Compositional ● Annotated with ontology metadata (e.g. FOODON) Input: Unstructured text ● E.g recipe, abstract On medium heat melt the butter and sauté the onion and bell peppers. Add the hamburger meat and cook until meat is well done. Add the tomato sauce, salt, pepper and garlic powder. Salt, pepper and garlic powder can be adjusted to your own tastes. Cook noodles as directed. Mix the sauce and noodles if you like, I keep them separated INGREDIENTS UNITS: US 1 small onion (chopped) 1 bell pepper (chopped) 2 tablespoons garlic powder 3 tablespoons butter 1 teaspoon salt 1 teaspoon pepper 2 (15 ounce) cans tomato sauce 1 (16 ounce) box spaghetti noodles 1 - 1 1⁄2 lb hamburger meat.
  • 70. SPIRES: Stochastic Parrot Interrogation and Recursive Extraction of Semantics Input: LinkML Schema ● Nested/Compositional ● Annotated with ontology metadata (e.g. FOODON) Input: Unstructured text ● E.g recipe, abstract Process: ● Recursively query GPT3 via prompts ● Annotate using OAK and Bioportal ● (Map results to OWL) Unstructured Text + Prompt Extract from the above text the following key-value pairs… JSON/RDF/OWL
  • 71. SPIRES: Stochastic Parrot Interrogation and Recursive Extraction of Semantics Input: LinkML Schema ● Nested/Compositional ● Annotated with ontology metadata (e.g. FOODON) Input: Unstructured text ● E.g recipe, abstract Process: ● Recursively query GPT3 via prompts ● Annotate using OAK and Bioportal ● (Map results to OWL) Output: ● Nested JSON/RDF ● (OWL TBox model)
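The prompt-and-parse step described on the SPIRES slides can be sketched without calling any model. This is an illustration under assumptions, not OntoGPT's implementation: `FIELDS`, `build_prompt`, `parse_response`, and the canned response string are all hypothetical, and the sketch covers a single non-recursive round with no ontology grounding.

```python
# Hypothetical flat schema: field name -> description shown to the model.
FIELDS = {
    "ingredients": "semicolon-separated list of ingredients",
    "actions": "semicolon-separated list of cooking actions",
}


def build_prompt(text, fields):
    """Append a key-value extraction instruction to the unstructured text."""
    lines = [text, "", "Extract from the above text the following key-value pairs:", ""]
    lines += [f"{name}: <{description}>" for name, description in fields.items()]
    return "\n".join(lines)


def parse_response(response, fields):
    """Parse 'key: v1; v2' lines, keeping only keys declared in the schema."""
    result = {}
    for line in response.splitlines():
        key, sep, value = line.partition(":")
        key = key.strip()
        if sep and key in fields:
            result[key] = [v.strip() for v in value.split(";") if v.strip()]
    return result


prompt = build_prompt("On medium heat melt the butter and cook the onion.", FIELDS)
# Canned stand-in for a model completion; no API call is made in this sketch.
canned_response = "ingredients: butter; onion\nactions: melt; cook\nnotes: ignored"
print(prompt)
print(parse_response(canned_response, FIELDS))
```

In a recursive version, a field whose range is itself a class would trigger another prompt over the extracted value, and each leaf string would then be matched to an ontology term; both steps are left out here.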
  • 72. Standard schemas in life sciences, 1990s https://web.archive.org/web/20080828214226/http://www.mged.org/Meetings/presentations/OMG/sld019.htm The Life Sciences Research (LSR) group is a consortium of people representing pharmaceutical companies, academic institutions, software vendors and hardware vendors from all over the world who are working together within the Object Management Group (OMG) to improve communication and interoperability among computational resources in life sciences research Mission: To improve the quality and utility of software and information systems used in Life Sciences Research through use of the Model Driven Architecture (MDA), OMG technologies such as UML and CORBA, and MDA-supported technologies such as XML and EJB
  • 73. Standard schemas in life sciences, 1990s https://web.archive.org/web/20170922215834/http://www.omg.org/lsr/FAQ.html#why%20succeed
  • 74. Conclusions ● Embrace change ● Don’t be wedded to any one technology stack ○ Why can’t BigTable, Neo4j, etc have semantics? ● We need to scale up semantics to cover ALL the data ● This requires an open, inclusive approach
  • 75. Acknowledgments ● OBO and Ontologies ○ OBO Operations Group, James Overton, Nico Matentzoglu, Bjoern Peters ○ Gene Ontology Consortium, Ben Good, Seth Carbon, Laurent-Philippe Albou, Tremayne Mushayaham, Dustin Ebert, Peter d’Eustachio, David Hill, Pascale Gaudet, Paul D Thomas ● Biolink, Translator, and KG-Hub: ○ NCATS Translator Data Modeling Group, Chris Bizon, Nomi Harris, Mike Bada, Matt Brush, Deepak Unni, Sierra Moxon ○ KG-Hub group, Peter Robison, Melissa Haendel, David Osumi-Sutherland, Kevin Shafer, Marcin Joachimiak, Seth Carbon, Tim Putman, Luca Cappalletti, Justin Reese ● LinkML: ○ LinkML community and developer group, Harold Solbrig, Sierra Moxon, Mark Miller, Sujay Patil ● OntoGPT: ○ Harry Caufield, Harshad Hegde

Editor's Notes

  1. In the life sciences we are generating massive amounts of new data and knowledge. The way we communicate this knowledge is through words and pictures.
  2. TODO: add glyphs for genes, drugs, diseases, samples
  3. TODO
  4. TODO: chart should only show KG TIMEPOINT: 5m
  5. It is limiting to see KGs purely as a branding exercise: there are genuine differences in emphasis and communities
  6. TIMEPOINT: 10m
  7. This paradigm now used throughout OBO
  8. TIMEPOINT: 15m
  9. TODO: emphasize circularity more in figure TIMEPOINT: 20m
  10. TODO: emphasize circularity more in figure opportunity
  11. TODO: tidy
  12. TIMEPOINT: 25m
  13. TIMEPOINT: 30m
  14. Should refer back to Elif’s talk. Figure 3: A conceptual overview of the Translator Consortium use case and implementation plan. Variants of the gene RHOBTB2 cause epilepsy, learning difficulties, and movement disorders, presumably due to an accumulation of RHOBTB2 protein. In order to learn more about the relationship between RHOBTB2 and neurological disorders, including potential drug treatment, investigators would typically need to review data and publications from multiple resources (a manual, time-consuming, and often challenging process due to incompatibilities in data models). In Translator, data is normalized to Biolink Model such that users can pose queries and rapidly receive harmonized answers derived from independent data sources.
  15. Figure 3: A conceptual overview of the Translator Consortium use case and implementation plan. Variants of the gene RHOBTB2 cause epilepsy, learning difficulties, and movement disorders, presumably due to an accumulation of RHOBTB2 protein. In order to learn more about the relationship between RHOBTB2 and neurological disorders, including potential drug treatment, investigators would typically need to review data and publications from multiple resources–a manual, time-consuming, and often challenging process due to incompatibilities in data models. In Translator, data is normalized to Biolink Model such that users can pose queries and rapidly receive harmonized answers derived from independent data sources.
  16. TIMEPOINT: 35m
  17. TODO: adjust text
  18. TODO: split
  19. TODO: split
  20. TIMEPOINT: 40m
  21. TIMEPOINT: 45m
  22. TODO
  23. TIMEPOINT: 50m
  24. TODO: emphasize iteration and imbuing LLMs with semantics
  25. TODO
  26. TODO
  27. TODO
  28. TODO
  29. TIMEPOINT: 55m
  30. TODO