Standardized biological knowledge graphs: The BioLink Model
Chris Mungall
2018-04-13
Challenge: making representations of biological knowledge interoperable
[Figure: logo cloud of biological databases and resources: OMIM, MGI, HGNC, FlyBase, ClinVar, CTD, DrugBank, UniProt, BGeeDb, GO, SGD, RGD, PomBase, Monarch, WormBase, PharmGKB, Reactome, GWAS catalog, CHEMBL, ENSEMBL, BioGrid, KEGG, Panther, ZFIN, XenBase, Animal QTLdb]
What do we mean by knowledge here?
● Data, sensu lato : collection of values in some organized form
○ Data, sensu stricto: Output of a data collection process
■ Instrumentation or observation; raw or processed; not altered by curation
■ Serves role as evidence
■ E.g. read count in RNAseq experiment OR examination of KO mouse
○ Metadata: Data about data (or more typically) datasets
■ May be curated at source, post-hoc, manually or automatically
■ E.g. details about an RNAseq experiment (factors, instrumentation, sample prep)
○ Knowledge: Propositional assertions inferred from data
■ Something you need evidence for
■ E.g.
● gene G is expressed in tissue T under condition C
● Knocking out G gives rise to phenotype P with high penetrance
● Many bio-”databases” are actually “knowledge bases” (by this definition)
● Usual caveats:
○ Other definitions available, divisions can be murky, this is a guide rather than dogma, etc
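The distinction drawn above can be sketched in code. This is a minimal illustration in Python, assuming hypothetical class names (`Datum`, `Assertion`) and made-up identifiers that are not part of any real model: data serves as evidence, and a knowledge assertion is a proposition that points back at its evidence.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class Datum:
    """Data sensu stricto: output of a data collection process (hypothetical shape)."""
    source: str   # e.g. an RNAseq run identifier
    measure: str  # what was measured
    value: float


@dataclass
class Assertion:
    """Knowledge: a proposition inferred from data, which needs evidence."""
    subject: str
    predicate: str
    object: str
    evidence: List[Datum] = field(default_factory=list)

    def is_supported(self) -> bool:
        # An assertion without any evidence is only a claim.
        return len(self.evidence) > 0


# Made-up example values, for illustration only.
read_count = Datum(source="rnaseq_run_1", measure="read_count:geneG@tissueT", value=1532.0)
claim = Assertion("gene:G", "expressed_in", "tissue:T", evidence=[read_count])
unsupported = Assertion("gene:G", "expressed_in", "tissue:X")
```

Here `claim.is_supported()` is true and `unsupported.is_supported()` is false; metadata (details about `rnaseq_run_1`) would live in a separate record describing the dataset.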
Solution: Standard schema/datamodel for all of biology?
Haven’t we been here before?
http://www.mged.org/Meetings/presentations/OMG/sld019.htm
Complexity and fluidity of biological knowledge vs schema rigidity
// hypothetical strawman schema
class Gene {
String: name
String: function
String: phenotype
Protein: product
Int: start
Int: end
String: chromosome
}
Bad assumption:
- Genes actually have multiple functions
- String representation rather than vocab
Bad assumption:
- Different builds?
- Should be inherited from generic seq feature
Bad assumption:
- Genes can have multiple products
- Products not necessarily proteins
- What about transcript, exon, ...
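One way to relax the bad assumptions in the strawman is sketched below in Python. All class and field names (`Term`, `Location`, `SeqFeature`, `GeneProduct`, `Gene`) and all identifiers are hypothetical and chosen only for illustration: functions and phenotypes become multi-valued vocabulary references, locations are per-build and inherited from a generic sequence feature, and products are multi-valued and not restricted to proteins.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class Term:
    """A reference to a vocabulary term rather than a free-text string."""
    id: str
    label: str


@dataclass(frozen=True)
class Location:
    """A placement on one genome build; a feature may have several."""
    build: str
    chromosome: str
    start: int
    end: int


@dataclass(frozen=True)
class GeneProduct:
    """A product of a gene; not necessarily a protein."""
    id: str
    kind: str  # e.g. "protein" or "ncRNA"


@dataclass
class SeqFeature:
    """Generic sequence feature; genes, transcripts and exons can share it."""
    id: str
    name: str
    locations: List[Location] = field(default_factory=list)


@dataclass
class Gene(SeqFeature):
    functions: List[Term] = field(default_factory=list)
    phenotypes: List[Term] = field(default_factory=list)
    products: List[GeneProduct] = field(default_factory=list)


# Made-up example values.
gene = Gene(
    id="GENE:1",
    name="geneG",
    locations=[Location("build_A", "chr1", 100, 900),
               Location("build_B", "chr1", 120, 920)],
    functions=[Term("VOCAB:0001", "kinase activity"), Term("VOCAB:0002", "binding")],
    products=[GeneProduct("PROD:1", "protein"), GeneProduct("PROD:2", "ncRNA")],
)
```

Even this version hard-codes choices (for example, which fields exist on `Gene`), which is the rigidity the slide is pointing at.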
The backwards evolution of schema languages
● 80s: ER, SQL DDL
○ Basis in FOL, formal algebra/calculus
● 90s: OO, UML, Description Logics
○ Rich polymorphism
● 00s: XML, SOAP
○ Can’t even...
● 10s: JSON and JSON-Schema
○ No polymorphism
○ Limited typing
○ Tree-based
○ Geared towards web-apps, not rich modeling
What works: Open-ended knowledge representation using RDF Graphs plus OWL
● RDF: minimal representation model for representing simple facts as edges
● OWL: encodes semantics about RDF graphs
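The two bullets above can be illustrated with a small Python sketch that does not use any RDF library. Facts are plain (subject, predicate, object) tuples, and one very small fragment of schema semantics (type propagation along `rdfs:subClassOf`, which is RDFS-level and far weaker than full OWL) is applied by a hand-written loop. The `ex:` identifiers are made up for the example.

```python
from typing import Set, Tuple

Triple = Tuple[str, str, str]

# Simple facts as edges; all ex: names are hypothetical.
graph: Set[Triple] = {
    ("ex:geneG", "rdf:type", "ex:ProteinCodingGene"),
    ("ex:ProteinCodingGene", "rdfs:subClassOf", "ex:Gene"),
    ("ex:Gene", "rdfs:subClassOf", "ex:GenomicEntity"),
    ("ex:geneG", "ex:expressed_in", "ex:tissueT"),
}


def infer_types(triples: Set[Triple]) -> Set[Triple]:
    """Propagate rdf:type up rdfs:subClassOf until nothing new is added."""
    inferred = set(triples)
    changed = True
    while changed:
        changed = False
        for s, p, o in list(inferred):
            if p != "rdf:type":
                continue
            for c, q, d in list(inferred):
                if q == "rdfs:subClassOf" and c == o:
                    new = (s, "rdf:type", d)
                    if new not in inferred:
                        inferred.add(new)
                        changed = True
    return inferred


closure = infer_types(graph)
```

After inference, `closure` also states that `ex:geneG` is an `ex:Gene` and an `ex:GenomicEntity`, although neither fact was asserted directly.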
Success of OWL: Bio-Ontologies
● One datamodel (OWL), covers rich variety of interconnected biology
● APIs, SPARQL, ...
http://obofoundry.org/ontology/uberon.html
Analogous approach in biological databases
● GMOD Chado
● Graph-like database layered over RDBMS
● Allowed flexibility and extensibility
● Large uptake by small MODs
Mungall, C. J., Emmert, D. B., et al. (2007) A. Bioinformatics, 23(13),
i337-346. http://doi.org/10.1093/bioinformatics/btm189
https://github.com/GMOD/Chado
Knowledge Graphs, the most pluripotent representation of data, are no longer as exotic or
experimental as they were 10 years ago. Goofaceamazonlink etc are all using them to some degree.
Challenge: too much flexibility
● With flexible schema-free graph-based representations, multiple ways of modeling things
● OWL provides semantic open-world biological constraints
○ All genes are located_on exactly 1 chromosome
● Software often needs more rigid closed-world information model constraints
○ Information System A: gene can be located on multiple contigs/scaffolds
○ Information System B: locational info not relevant
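The contrast between the two kinds of constraint can be sketched as a closed-world cardinality check in Python. The function name `violations`, the identifiers, and the three configurations are assumptions made for this illustration. Under an open-world reading, a gene with no `located_on` edge is merely incomplete; under a closed-world reading, the same gene is a validation error.

```python
from typing import Dict, List, Optional, Tuple

Edge = Tuple[str, str, str]


def violations(subjects: List[str], edges: List[Edge], predicate: str,
               min_count: int, max_count: Optional[int] = None) -> Dict[str, int]:
    """Closed-world check: map each offending subject to its edge count."""
    counts = {s: 0 for s in subjects}
    for s, p, _ in edges:
        if p == predicate and s in counts:
            counts[s] += 1
    return {s: n for s, n in counts.items()
            if n < min_count or (max_count is not None and n > max_count)}


# Made-up data.
genes = ["gene:A", "gene:B", "gene:C"]
edges = [
    ("gene:A", "located_on", "chr:1"),
    ("gene:B", "located_on", "scaffold:17"),
    ("gene:B", "located_on", "scaffold:42"),
]

strict = violations(genes, edges, "located_on", 1, 1)    # exactly one location
system_a = violations(genes, edges, "located_on", 1)     # one or more contigs/scaffolds
system_b = violations(genes, edges, "located_on", 0)     # location not relevant
```

The same graph fails the strict rule for two genes, fails System A's rule for one, and passes System B's rule, which is why a single hard-wired constraint set does not suit every consumer.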
BioLink Model Approach
● Define a powerful underlying metamodel
○ Mix aspects of closed-world UML and open-world OWL
○ Build for extensibility
○ Define exports: UML, SQL DDL, GraphQL, JSON-Schema, Java, ...
● Define core biological types (E)
○ Gene, disease, anatomical entity, ...
○ Cede detailed typology to ontologies
● Define core properties (R)
○ Id, name, synonym
○ Part-of, interacts-with, gives-rise-to
● Define taxonomy of relationships (extension of R)
○ Gene-gene-interaction, gene-tissue-expression
● Extensibility through use-case specific profiles
https://biolink.github.io/biolink-model
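The layering described above (entity types, core properties, and a taxonomy of relationships) can be sketched as a small model definition with inheritance. The structure below is a hypothetical simplification written as a Python dictionary; it is not the real BioLink YAML, and the root class names and the `condition` slot are assumptions for illustration.

```python
from typing import List

# Hypothetical, simplified model definition; not the actual BioLink source.
MODEL = {
    "classes": {
        # Entity types (E) share the core properties id, name, synonym.
        "entity": {"is_a": None, "slots": ["id", "name", "synonym"]},
        "gene": {"is_a": "entity", "slots": []},
        "disease": {"is_a": "entity", "slots": []},
        "anatomical entity": {"is_a": "entity", "slots": []},
        # Relationship types extend a generic relationship.
        "relationship": {"is_a": None, "slots": ["subject", "predicate", "object"]},
        "gene-gene-interaction": {"is_a": "relationship", "slots": []},
        "gene-tissue-expression": {"is_a": "relationship", "slots": ["condition"]},
    }
}


def ancestors(name: str) -> List[str]:
    """Return the is_a chain from the direct parent up to the root."""
    chain: List[str] = []
    parent = MODEL["classes"][name]["is_a"]
    while parent is not None:
        chain.append(parent)
        parent = MODEL["classes"][parent]["is_a"]
    return chain


def all_slots(name: str) -> List[str]:
    """Collect inherited slots first, then the class's own slots."""
    slots: List[str] = []
    for cls in reversed([name] + ancestors(name)):
        for slot in MODEL["classes"][cls]["slots"]:
            if slot not in slots:
                slots.append(slot)
    return slots
```

A generator for UML, SQL DDL or JSON-Schema would walk a structure like this; detailed typology (kinds of gene, kinds of disease) is left to ontologies rather than added as more classes.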
Browsing the model
● YAML source
● Autogenerated website docs: https://biolink.github.io/biolink-model
● OWL export
○ Protege
○ Bioportal
● JSON-Schema (lossy unless working in JSON-LD)
● GraphQL (lossy)
● UML Diagrams (lossy)
https://biolink.github.io/biolink-model
https://github.com/biolink/biolink-model/blob/master/biolink-model.yaml
https://github.com/biolink/biolink-model/tree/master/ontology
https://bioportal.bioontology.org/ontologies/BLM/
Entities
Relationships
Aka assertions, facts, propositions, reified triples, edges ...
Profiles
● Different projects require different views of the data
○ E.g. omission/inclusion of different fields
○ Denormalizations
○ Inlining vs referencing
● Metamodel supports remixing and mixins
● One core conceptual model
● Different serializations for different profiles
● Well-defined transforms
● Caveat: this part is not well documented yet
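A profile can be thought of as a well-defined transform over the one core model. The Python sketch below assumes a hypothetical `apply_profile` function and made-up records; it shows field omission/inclusion and the choice between inlining an entity and referencing it by identifier.

```python
from typing import Any, Dict, List

# Made-up core records.
ENTITIES: Dict[str, Dict[str, Any]] = {
    "GENE:1": {"id": "GENE:1", "name": "geneG", "synonym": ["G"]},
    "TISSUE:9": {"id": "TISSUE:9", "name": "tissueT", "synonym": []},
}
ASSOCIATION: Dict[str, Any] = {
    "subject": "GENE:1",
    "predicate": "expressed_in",
    "object": "TISSUE:9",
    "evidence": "rnaseq_run_1",
}


def apply_profile(assoc: Dict[str, Any], fields: List[str], inline: bool) -> Dict[str, Any]:
    """Project an association onto the chosen fields, optionally inlining entities."""
    out: Dict[str, Any] = {}
    for key in fields:
        value = assoc[key]
        if inline and value in ENTITIES:
            # Denormalize: replace the identifier with a copy of the entity record.
            value = dict(ENTITIES[value])
        out[key] = value
    return out


compact = apply_profile(ASSOCIATION, ["subject", "predicate", "object"], inline=False)
denormalized = apply_profile(ASSOCIATION, ["subject", "object", "evidence"], inline=True)
```

`compact` keeps identifiers only and drops the evidence field; `denormalized` embeds the gene and tissue records and drops the predicate. Both are views of the same underlying association.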
How do I use it? How do I get data?
● Data model is serialization neutral
○ Plus: Flexible
○ Minus: Additional layer of abstraction
● RDF/Turtle serialization
○ http://data.monarchinitiative.org/ttl/
○ Turtle conforms to association patterns
● Property graphs
○ http://neo4j.monarchinitiative.org/
● JSON
○ Challenge: lack of polymorphism
○ Available via generic model or specific models
○ API http://api.monarchinitiative.org/api/
○ Preview: https://data.monarchinitiative.org/json/
○ BDBags of JSON coming soon
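Serialization neutrality can be illustrated by rendering one association three ways. This Python sketch uses made-up `ex:` identifiers and hypothetical function names; the output shapes are illustrative only and are not the formats served by the Monarch endpoints listed above.

```python
import json
from typing import Any, Dict

# One association in a serialization-neutral form (made-up identifiers).
assoc: Dict[str, str] = {
    "subject": "ex:geneG",
    "predicate": "ex:expressed_in",
    "object": "ex:tissueT",
}


def to_turtle_like(a: Dict[str, str]) -> str:
    """Turtle-style triple line; prefix declarations are omitted."""
    return f'{a["subject"]} {a["predicate"]} {a["object"]} .'


def to_property_graph(a: Dict[str, str]) -> Dict[str, Any]:
    """Nodes plus one typed edge, as a property-graph loader might expect."""
    return {
        "nodes": [{"id": a["subject"]}, {"id": a["object"]}],
        "edges": [{"source": a["subject"], "target": a["object"], "type": a["predicate"]}],
    }


def to_json(a: Dict[str, str]) -> str:
    """Plain JSON document with stable key order."""
    return json.dumps(a, sort_keys=True)


turtle_line = to_turtle_like(assoc)
pg = to_property_graph(assoc)
json_doc = to_json(assoc)
```

The extra layer of abstraction is visible here: consumers never see `assoc` itself, only one of its renderings.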
What NOT to use the biolink-model for
● Raw data
● Metadata about a dataset
● ..
● However..
○ Underlying metamodel may be useful in providing flexible representations of these
○ Currently aligning with FHIR metamodel
How does this relate to KC7?
● One view: DC is about data sensu stricto, and metadata
○ Search = lightweight ontology (syns + subsumption) + metadata datamodels
○ “Knowledge bases” have their own specialized search interfaces developed by specialists
○ No role for a standard KM in DC
● Counterview
○ We’re not trying to compete with bio-KBs
○ We want to leverage knowledge to enhance data search
■ Analogous to how Google KG enhances Google search
○ Example:
■ Find TopMed studies relevant to my disease
● Exploit KG linkages between disease-phenotype, phenotype-variable, phenotype-gene
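The search example can be sketched as a short traversal over knowledge graph edges. Everything in this Python sketch is hypothetical: the edge list, the predicate names (`has_phenotype`, `measured_by`, `collected_in`) and the identifiers are invented to show the idea of hopping from a disease to phenotypes, from phenotypes to study variables, and from variables to studies.

```python
from typing import List, Set, Tuple

Edge = Tuple[str, str, str]

# Made-up knowledge graph edges.
EDGES: List[Edge] = [
    ("disease:D1", "has_phenotype", "phenotype:P1"),
    ("disease:D1", "has_phenotype", "phenotype:P2"),
    ("phenotype:P1", "measured_by", "variable:V1"),
    ("phenotype:P2", "measured_by", "variable:V2"),
    ("phenotype:P3", "measured_by", "variable:V3"),
    ("variable:V1", "collected_in", "study:S1"),
    ("variable:V2", "collected_in", "study:S2"),
    ("variable:V3", "collected_in", "study:S3"),
]


def follow(starts: Set[str], predicate: str) -> Set[str]:
    """Return all objects reachable from `starts` over one edge type."""
    return {o for s, p, o in EDGES if p == predicate and s in starts}


def studies_for_disease(disease: str) -> Set[str]:
    """Disease -> phenotypes -> variables -> studies."""
    phenotypes = follow({disease}, "has_phenotype")
    variables = follow(phenotypes, "measured_by")
    return follow(variables, "collected_in")


relevant = studies_for_disease("disease:D1")
```

Study S3 is not returned because its only variable measures a phenotype that is not linked to the disease; a metadata-only search over study descriptions would have no way to make that distinction.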

 
Direct Seeded Rice - Climate Smart Agriculture
Direct Seeded Rice - Climate Smart AgricultureDirect Seeded Rice - Climate Smart Agriculture
Direct Seeded Rice - Climate Smart Agriculture
International Food Policy Research Institute- South Asia Office
 
Basics of crystallography, crystal systems, classes and different forms
Basics of crystallography, crystal systems, classes and different formsBasics of crystallography, crystal systems, classes and different forms
Basics of crystallography, crystal systems, classes and different forms
MaheshaNanjegowda
 
Immersive Learning That Works: Research Grounding and Paths Forward
Immersive Learning That Works: Research Grounding and Paths ForwardImmersive Learning That Works: Research Grounding and Paths Forward
Immersive Learning That Works: Research Grounding and Paths Forward
Leonel Morgado
 


Introduction to the BioLink datamodel

  • 1. Standardized biological knowledge graphs: The BioLink Model Chris Mungall 2018-04-13
  • 2. Challenge: making representations of biological knowledge interoperable
    OMIM, MGI, HGNC, FlyBase, ClinVar, CTD, DrugBank, UniProt, BGeeDb, GO, SGD, RGD, PomBase, Monarch, WormBase, PharmGKB, Reactome, GWAS catalog, CHEMBL, ENSEMBL, BioGrid, KEGG, Panther, ZFIN, Xenbase, Animal QTLdb
  • 3. What do we mean by knowledge here?
    ● Data, sensu lato: collection of values in some organized form
      ○ Data, sensu stricto: output of a data collection process
        ■ Instrumentation or observation; raw or processed; not altered by curation
        ■ Serves role as evidence
        ■ E.g. read count in RNAseq experiment OR examination of KO mouse
      ○ Metadata: data about data or (more typically) datasets
        ■ May be curated at source, post-hoc, manually or automatically
        ■ E.g. details about an RNAseq experiment (factors, instrumentation, sample prep)
      ○ Knowledge: propositional assertions inferred from data
        ■ Something you need evidence for
        ■ E.g.
          ● Gene G is expressed in tissue T under condition C
          ● Knocking out G gives rise to phenotype P with high penetrance
    ● Many bio-“databases” are actually “knowledge bases” (by this definition)
    ● Usual caveats:
      ○ Other definitions available, divisions can be murky, this is a guide rather than dogma, etc.
  • 5. Haven’t we been here before? http://www.mged.org/Meetings/presentations/OMG/sld019.htm
  • 6. Haven’t we been here before? http://www.mged.org/Meetings/presentations/OMG/sld019.htm
  • 7. Complexity and fluidity of biological knowledge vs schema rigidity
    // hypothetical strawman schema
    class Gene {
      String: name
      String: function
      String: phenotype
      Protein: product
      Int: start
      Int: end
      String: chromosome
    }
    Bad assumption:
      - Genes actually have multiple functions
      - String representation rather than vocab
    Bad assumption:
      - Different builds?
      - Should be inherited from generic seq feature
    Bad assumption:
      - Genes can have multiple products
      - Products not necessarily genes
      - What about transcript, exon, ...
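
A minimal sketch of how the strawman could be loosened to address the bad assumptions listed on this slide. This is not the BioLink Model itself: the class names, field names, and identifiers below are hypothetical, and the sketch uses composition (a list of locations) where the slide suggests inheriting from a generic sequence feature.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SeqFeature:
    """Generic sequence feature; a location only makes sense within one build."""
    build: str        # genome build label (illustrative values below)
    chromosome: str
    start: int
    end: int


@dataclass
class Gene:
    """Hypothetical revision of the strawman Gene class."""
    id: str
    name: str
    # Multi-valued, and holding vocabulary term IDs rather than free text.
    function_ids: List[str] = field(default_factory=list)
    phenotype_ids: List[str] = field(default_factory=list)
    # A gene may have several products, referenced by ID.
    product_ids: List[str] = field(default_factory=list)
    # One location per build instead of a single start/end/chromosome.
    locations: List[SeqFeature] = field(default_factory=list)


gene = Gene(id="EX:gene1", name="example gene")
gene.function_ids += ["EX:function1", "EX:function2"]
gene.locations.append(SeqFeature("buildA", "chr1", 100, 900))
gene.locations.append(SeqFeature("buildB", "chr1", 120, 920))
```

Even this version hard-codes which fields a gene may have, which is the rigidity the slide is pointing at.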
  • 8. The backwards evolution of schema languages
    ● 80s: ER, SQL DDL
      ○ Basis in FOL, formal algebra/calculus
    ● 90s: OO, UML, Description Logics
      ○ Rich polymorphism
    ● 00s: XML, SOAP
      ○ Can’t even...
    ● 10s: JSON and JSON-Schema
      ○ No polymorphism
      ○ Limited typing
      ○ Tree-based
      ○ Geared towards web-apps, not rich modeling
  • 9. What works: Open-ended knowledge representation using RDF Graphs plus OWL
    ● RDF: minimal representation model for representing simple facts as edges
    ● OWL: encodes semantics about RDF graphs
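
The "simple facts as edges" idea can be sketched without any RDF library, assuming plain Python tuples stand in for RDF triples. All identifiers (the `EX:` prefix, predicate names) and both helper functions are hypothetical and for illustration only.

```python
from collections import defaultdict

# Each fact is one edge: (subject, predicate, object).
triples = [
    ("EX:geneG", "EX:expressed_in", "EX:tissueT"),
    ("EX:geneG", "EX:has_phenotype", "EX:phenotypeP"),
    ("EX:tissueT", "EX:part_of", "EX:organO"),
]


def index_by_subject(edges):
    """Group outgoing (predicate, object) pairs by subject node."""
    idx = defaultdict(list)
    for s, p, o in edges:
        idx[s].append((p, o))
    return idx


def objects(edges, subject, predicate):
    """Return every object linked from `subject` via `predicate`."""
    return [o for s, p, o in edges if s == subject and p == predicate]


# A new kind of fact needs no schema change: append another edge.
triples.append(("EX:geneG", "EX:interacts_with", "EX:geneH"))
```

What this sketch lacks is the second bullet: nothing here says what `EX:part_of` means, which is the role OWL plays over an RDF graph.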
  • 10. Success of OWL: Bio-Ontologies ● One datamodel (OWL), covers rich variety of interconnected biology ● APIs, SPARQL, ... http://obofoundry.org/ontology/uberon.html
  • 11. Analogous approach in biological databases
    ● GMOD Chado
    ● Graph-like database layered over RDBMS
    ● Allowed flexibility and extensibility
    ● Large uptake by small MODs
    Mungall, C. J., Emmert, D. B., et al. (2007) A. Bioinformatics, 23(13), i337-346. http://doi.org/10.1093/bioinformatics/btm189
    https://github.com/GMOD/Chado
  • 12. Knowledge Graphs, the most pluripotent representation of data, are no longer as exotic or experimental as they were 10 years ago. Goofaceamazonlink etc are all using them to some degree.
  • 13. Challenge: too much flexibility
    ● With flexible schema-free graph-based representations, multiple ways of modeling things
    ● OWL provides semantic open-world biological constraints
      ○ All genes are located_on exactly 1 chromosome
    ● Software often needs more rigid closed-world information model constraints
      ○ Information System A: gene can be located on multiple contigs/scaffolds
      ○ Information System B: locational info not relevant
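
A rough sketch of the closed-world side of this slide, assuming edges are plain tuples and each information system states its own cardinality rule for `located_on`. The function names, identifiers, and rule encoding are hypothetical. An OWL reasoner would treat the same data differently: under the open-world assumption a missing location is merely unknown, not a violation.

```python
def count_locations(edges, gene):
    """Count the located_on edges asserted for a gene."""
    return sum(1 for s, p, _ in edges if s == gene and p == "located_on")


def check_closed_world(edges, gene, min_count, max_count):
    """Closed-world check: only what is present in `edges` counts.

    `max_count=None` means no upper bound.
    """
    n = count_locations(edges, gene)
    return min_count <= n and (max_count is None or n <= max_count)


edges = [
    ("EX:gene1", "located_on", "EX:contig1"),
    ("EX:gene1", "located_on", "EX:contig2"),
]

# Information System A: a gene may sit on multiple contigs/scaffolds.
system_a_ok = check_closed_world(edges, "EX:gene1", 1, None)

# The "exactly 1" rule, applied as a closed-world check, rejects the same data.
exactly_one_ok = check_closed_world(edges, "EX:gene1", 1, 1)

# Information System B: locational info not relevant, so none is required.
system_b_ok = check_closed_world([], "EX:gene2", 0, None)
```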
  • 14. BioLink Model Approach
    ● Define a powerful underlying metamodel
      ○ Mix aspects of closed-world UML and open-world OWL
      ○ Build for extensibility
      ○ Define exports: UML, SQL DDL, GraphQL, JSON-Schema, Java, ...
    ● Define core biological types (E)
      ○ Gene, disease, anatomical entity, ...
      ○ Cede detailed typology to ontologies
    ● Define core properties (R)
      ○ Id, name, synonym
      ○ Part-of, interacts-with, gives-rise-to
    ● Define taxonomy of relationships (extension of R)
      ○ Gene-gene-interaction, gene-tissue-expression
    ● Extensibility through use-case specific profiles
    https://biolink.github.io/biolink-model
  • 15. Browsing the model
    ● YAML source
    ● Autogenerated website docs: https://biolink.github.io/biolink-model
    ● OWL export
      ○ Protégé
      ○ BioPortal
    ● JSON-Schema (lossy unless working in JSON-LD)
    ● GraphQL (lossy)
    ● UML Diagrams (lossy)
  • 21. Relationships Aka assertions, facts, propositions, reified triples, edges ...
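
A minimal sketch of what "reified triple" means here: the edge becomes an object with its own identifier, so evidence and provenance can hang off it. The class and field names are hypothetical and are not taken from the BioLink Model's actual slot definitions.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class Triple:
    """A bare edge: nothing can be said about the statement itself."""
    subject: str
    predicate: str
    object: str


@dataclass
class Association:
    """A reified edge: the statement has an ID and carries its own metadata."""
    id: str
    subject: str
    predicate: str
    object: str
    evidence: List[str] = field(default_factory=list)
    publications: List[str] = field(default_factory=list)

    def as_triple(self) -> Triple:
        """Drop the metadata and recover the plain edge."""
        return Triple(self.subject, self.predicate, self.object)


assoc = Association(
    id="EX:assoc1",
    subject="EX:geneG",
    predicate="EX:expressed_in",
    object="EX:tissueT",
    evidence=["EX:evidence1"],
)
```

Going from `Association` to `Triple` loses information, which is one reason exports to simpler formats can be lossy.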
  • 22. Profiles
    ● Different projects require different views of the data
      ○ E.g. omission/inclusion of different fields
      ○ Denormalizations
      ○ Inlining vs referencing
    ● Metamodel supports remixing and mixins
    ● One core conceptual model
    ● Different serializations for different profiles
    ● Well-defined transforms
    ● Caveat: this part is not well documented yet
  • 23. How do I use it? How do I get data?
    ● Data model is serialization neutral
      ○ Plus: Flexible
      ○ Negative: Additional layer of abstraction
    ● RDF/Turtle serialization
      ○ http://data.monarchinitiative.org/ttl/
      ○ Turtle conforms to association patterns
    ● Property graphs
      ○ http://neo4j.monarchinitiative.org/
    ● JSON
      ○ Challenge: lack of polymorphism
      ○ Available via generic model or specific models
      ○ API http://api.monarchinitiative.org/api/
      ○ Preview: https://data.monarchinitiative.org/json/
      ○ BDBags of JSON coming soon
  • 24. What NOT to use the biolink-model for
    ● Raw data
    ● Metadata about a dataset
    ● ...
    ● However:
      ○ Underlying metamodel may be useful in providing flexible representations of these
      ○ Currently aligning with FHIR metamodel
  • 25. How does this relate to KC7?
    ● One view: DC is about data sensu stricto, and metadata
      ○ Search = lightweight ontology (syns + subsumption) + metadata datamodels
      ○ “Knowledge bases” have their own specialized search interfaces developed by specialists
      ○ No role for a standard KM in DC
    ● Counterview
      ○ We’re not trying to compete with bio-KBs
      ○ We want to leverage knowledge to enhance data search
        ■ Analogous to how Google KG enhances Google search
      ○ Example:
        ■ Find TOPMed studies relevant to my disease
          ● Exploit KG linkages between disease-phenotype, phenotype-variable, phenotype-gene
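
The example on this slide can be sketched as a graph traversal, assuming the knowledge graph is an adjacency map and that study variables link on to studies. The node identifiers, the variable-to-study link, and the `reachable_studies` helper are all hypothetical; no real TOPMed or Monarch data is used.

```python
from collections import deque

# Adjacency map over disease-phenotype, phenotype-variable,
# phenotype-gene and (assumed) variable-study links.
graph = {
    "EX:disease1": ["EX:phenotype1", "EX:phenotype2"],
    "EX:phenotype1": ["EX:variable1"],
    "EX:phenotype2": ["EX:gene1"],
    "EX:variable1": ["EX:study1"],
    "EX:variable2": ["EX:study2"],  # not linked to disease1
}


def reachable_studies(graph, start, is_study):
    """Breadth-first search from `start`, collecting study nodes."""
    seen = {start}
    queue = deque([start])
    found = []
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, []):
            if nxt in seen:
                continue
            seen.add(nxt)
            if is_study(nxt):
                found.append(nxt)   # studies are endpoints, not expanded
            else:
                queue.append(nxt)
    return found


studies = reachable_studies(
    graph, "EX:disease1", lambda n: n.startswith("EX:study")
)
```

A metadata-only search would match a study only if its description names the disease; the traversal also finds studies that merely measure a related phenotype.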