The Linked Data Modeling Language:
A framework for describing and integrating
rich biomedical data
Chris Mungall
Lawrence Berkeley National Laboratory
@chrismungall
June 2022
Outline
● Structuring our data: we can do better
● Ontologies and vocabularies: necessary but not sufficient
● The LinkML framework
● Applications
Proliferation of entities, standards, and ontologies
>1800 Databases
>1500 Standards
>900 Ontologies
~13.5m terms
220m proteins
65bn genes
227m substances
>?? Data Commonses
Data Integration is a constant challenge
[Figure: omics data and phenotype / clinical data feeding into insights]
Common vocabularies are key
Open Biological Ontologies (OBO)
http://obofoundry.org
@obofoundry
1. Well-integrated, modular ontologies (a subset of BioPortal), e.g. GO, CHEBI, …
2. Provide a technical and sociotechnological framework for cooperation
3. Provide tools, best practices and infrastructure for forging new ontologies
4. Allow us to describe all of the things
Ontologies: Example uses
● Discovery and machine reasoning
● Text Mining: Bada et al. 2017, Gold-standard ontology-based anatomical annotation in the CRAFT Corpus
● Data Standardization: Malladi et al. 2015, Ontology application and use at the ENCODE DCC
Example: Uberon
Mungall et al. (2012). Genome Biology, 13(1), R5. doi:10.1186/gb-2012-13-1-r5
http://obofoundry.org/ontology/uberon
Uberon usage in standards
https://fairsharing.org/graph/1197
Note: this is missing links to HuBMAP, LINCS, MIxS, ENCODE, …
Uberon usage in standards
https://fairshake.cloud/metric/140
Common Fund Data Ecosystem (CFDE) FAIR rubric
Uberon (mis) usage in standards
https://fairshake.cloud/metric/140
Many standards are not machine-actionable
Many standards are specified in PDF or Excel
● Not machine-actionable
● No validators
● Unclear semantics
Lack of automatic validation or data submission assistance results in noise
[Figure: actual data from INSDC]
Challenge: ontologies still underused
Challenge: Terms are not enough
Incompatible schemas!
The common situation
Semantic Web building blocks
● URIs for identity, e.g. http://purl.uniprot.org/P12345 and http://schema.org/name
● Properties
● Triples: for connecting nodes into graphs
● Classes
● RDFS: Schemas
● OWL: Ontology
● Rule Languages
● Shape Languages
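The building blocks above can be illustrated with a minimal sketch in plain Python (no RDF library). CURIEs are expanded to URIs through a prefix map, and a graph is simply a set of (subject, predicate, object) triples. The prefix names, the `expand` and `objects` helpers, and the example statement are illustrative assumptions, not part of the talk.

```python
# Hypothetical prefix map; only the two base URIs come from the slide.
PREFIXES = {
    "sdo": "http://schema.org/",
    "uniprot": "http://purl.uniprot.org/",
}

def expand(curie: str) -> str:
    """Expand a CURIE such as 'sdo:name' into a full URI using PREFIXES."""
    prefix, local = curie.split(":", 1)
    return PREFIXES[prefix] + local

# A graph is a set of (subject, predicate, object) triples.
graph = {
    (expand("uniprot:P12345"), expand("sdo:name"), "example protein name"),
}

def objects(graph, subject, predicate):
    """Return every object stated for the given subject and predicate."""
    return [o for s, p, o in graph if s == subject and p == predicate]

print(objects(graph, expand("uniprot:P12345"), expand("sdo:name")))
```

Because identity is carried by URIs, two datasets that use the same URI for a property can be merged by taking the union of their triple sets.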
ISO-11179: Metadata Standards
Semantic tooling has still not permeated
[Figure: technology stacks by audience]
● Semantic web developer: RDF, OWL, SPARQL, SHACL, ShEx, Rules
● Developer, Data Scientist: Python, SQL, Mongo, JSON, Pandas, BigTable, SPARK, Scikit-learn
● Scientists, Clinicians, ..: Excel, Web Portals
● ???: ISO-11179, CDEs
Can we have a universal framework?
LinkML: The basics
THE STANDARD
A meta-datamodel for structuring your data
[Figure: metamodel diagram. A schema imports 0..* schemas and holds element definitions (Class, Slot); a Class has 0..* Slots, is_a 0..1, mixin 0..n; a Slot has range 0..1]
TOOLS
Pragmatic developer- and curator-friendly tools for working with data
● Validators
● Data Converters
● Compatibility tools
● Data entry
● Schema inference
LinkML Landscape
https://linkml.io
https://github.com/linkml/linkml
Create datamodels in simple YAML files, optionally annotated using ontologies
Compile to other frameworks
● Semantic Web applications and infrastructure: ShEx, SHACL, JSON-LD Contexts, OWL
● “Traditional” applications and infrastructure: JSON-Schema, Python Dataclasses, SQL DDL, TSVs
Choose the right tools for the job, no lock-in
[Figure labels: Biocurator, Data Scientist, dct:creator]
Use Case: Making FAIR standards
As a… DCC wrangler
I want to… design a data submission standard
So that…
● Experimentalists can easily submit to the DCC
● And… the DCC can integrate it in the context of other DCC data
● And… it is maximally “FAIR” for community reuse
First Step: Create your datamodel
Option A: Author YAML directly
YAML conformant to LinkML standard

# Metadata
id: https://example.org/linkml/hello-world
title: Really basic LinkML model
name: hello-world
version: 0.0.1

# Namespaces
prefixes:
  linkml: https://w3id.org/linkml/
  sdo: https://schema.org/
  ex: https://example.org/linkml/hello-world/
default_prefix: ex
default_curi_maps:
  - semweb_context

# Dependencies
imports:
  - linkml:types

# Actual Datamodel
classes:
  Person:
    description: Minimal information about a person
    class_uri: sdo:Person
    attributes:
      id:
        identifier: true
        slot_uri: sdo:taxID
      first_name:
        required: true
        slot_uri: sdo:givenName
        multivalued: true
      last_name:
        required: true
        slot_uri: sdo:familyName
      knows:
        range: Person
        multivalued: true
        slot_uri: foaf:knows
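One way to read the hello-world schema is as a specification for ordinary classes. The hand-written Python dataclass below is a sketch of what the `Person` class describes. It is not output from the LinkML generators, and the `__post_init__` checks are an assumed, simplified stand-in for real validation.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Person:
    id: str                    # identifier: true
    first_name: List[str]      # required: true, multivalued: true
    last_name: str             # required: true
    # range: Person, multivalued: true; held here as ids of other Person records
    knows: List[str] = field(default_factory=list)

    def __post_init__(self):
        # Simplified required-value checks (assumption, not LinkML behaviour).
        if not self.first_name:
            raise ValueError("first_name is required")
        if not self.last_name:
            raise ValueError("last_name is required")

alice = Person(id="P1", first_name=["Alice"], last_name="Smith")
bob = Person(id="P2", first_name=["Bob"], last_name="Jones", knows=[alice.id])
print(bob)
```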
First Step: Create your datamodel
Option B: Author using schemasheets
First Step: Create your datamodel
Option C: Get intelligent assistance from autoschema tools
[Figure: semi-structured datasources → autoschema / model enrichment framework → refine]
Tooling for submitters
Option A: Generate spreadsheet templates
[Figure: schema → empty sheet → populated sheet, checked by a Validator]
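The template-and-validate loop can be sketched with the Python standard library alone. This is an illustrative stand-in, not the LinkML spreadsheet tooling: the `SLOTS` table is hand-copied from the hello-world `Person` class (treating `id` as required is an assumption), and `make_template` and `validate_rows` are hypothetical helpers.

```python
import csv
import io

# Slot name -> required?; hand-copied from the Person class.
SLOTS = {"id": True, "first_name": True, "last_name": True, "knows": False}

def make_template() -> str:
    """Return an empty sheet: a TSV with one header row and no data."""
    buf = io.StringIO()
    csv.writer(buf, delimiter="\t", lineterminator="\n").writerow(SLOTS)
    return buf.getvalue()

def validate_rows(tsv_text: str) -> list:
    """Return one error message per missing required value."""
    errors = []
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    for line_no, row in enumerate(reader, start=2):  # line 1 is the header
        for slot, required in SLOTS.items():
            if required and not (row.get(slot) or "").strip():
                errors.append(f"line {line_no}: missing required value for {slot}")
    return errors

# A populated sheet: the second data row lacks first_name.
populated = make_template() + "P1\tAlice\tSmith\t\n" + "P2\t\tJones\tP1\n"
print(validate_rows(populated))
```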
Tooling for submitters
Option B: Use DataHarmonizer (Hsiao Lab)
https://github.com/cidgoh/DataHarmonizer
Tooling for submitters
Option A: Generate JSON-Schema
[Figure: schema → JSON-Schema; populated JSON checked by a JSON-Schema validator]
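To make the JSON-Schema route concrete, here is a hedged sketch. `PERSON_SCHEMA` is a hand-written approximation of a JSON Schema for the hello-world `Person` class, not actual generator output. The `check` function is a hypothetical helper covering only `required` and top-level `type`, where a real pipeline would use a full JSON Schema validator.

```python
import json

# Hand-written approximation of a JSON Schema for Person (assumption).
PERSON_SCHEMA = {
    "type": "object",
    "required": ["id", "first_name", "last_name"],
    "properties": {
        "id": {"type": "string"},
        "first_name": {"type": "array", "items": {"type": "string"}},
        "last_name": {"type": "string"},
        "knows": {"type": "array", "items": {"type": "string"}},
    },
}

JSON_TYPES = {"string": str, "array": list, "object": dict}

def check(instance: dict, schema: dict) -> list:
    """Check `required` and top-level `type` only; a tiny subset of JSON Schema."""
    errors = [f"missing: {name}" for name in schema["required"] if name not in instance]
    for name, value in instance.items():
        prop = schema["properties"].get(name)
        if prop and not isinstance(value, JSON_TYPES[prop["type"]]):
            errors.append(f"wrong type: {name}")
    return errors

# Populated JSON with two problems: no last_name, and first_name is not a list.
populated = json.loads('{"id": "P1", "first_name": "Alice", "knows": []}')
print(check(populated, PERSON_SCHEMA))
```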
Searchable documentation for your standard
https://cancerdhc.github.io/ccdhmodel/v1.1
Incorporating ontologies into standards
Standardizing descriptors
aka. column headers, data dictionary,
metadata elements, CDEs
● Tissue sampling site
● Person name
● Symptoms
● Vital status
● Heart rate
● age
● Datafile sha256
● Sources
● Assay
● …
Standardizing value sets
I.e. column headers, data dictionary,
metadata elements, CDEs
● Organ slim (uberon)
● Phenotypic abnormality (HPO)
● Vital status (PATO)
● Assay Type (OBI)
● …
Annotating schemas with vocabularies
● Export data to RDF and JSON-LD
● Make the meaning of your schema more explicit
● Data integration hooks

id: https://example.org/linkml/hello-world
title: Really basic LinkML model
name: hello-world
license: https://creativecommons.org/publicdomain/zero/1.0/
version: 0.0.1
prefixes:
  linkml: https://w3id.org/linkml/
  sdo: https://schema.org/
  ex: https://example.org/linkml/hello-world/
default_prefix: ex
default_curi_maps:
  - semweb_context
imports:
  - linkml:types
classes:
  Person:
    description: Minimal information about a person
    class_uri: sdo:Person
    attributes:
      id:
        identifier: true
        slot_uri: sdo:taxID
      first_name:
        required: true
        slot_uri: sdo:givenName
        multivalued: true
      last_name:
        required: true
        slot_uri: sdo:familyName
      knows:
        range: Person
        multivalued: true
        slot_uri: foaf:knows
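The `class_uri` and `slot_uri` annotations are what let plain JSON keys resolve to shared URIs. The sketch below shows the idea with a hand-written JSON-LD-style context, not one produced by LinkML. The FOAF namespace URI is an assumption (the schema does not declare `foaf` explicitly), and `expand_term` is a hypothetical helper covering only the simple cases used here.

```python
import json

# Hand-written context mirroring the class_uri / slot_uri annotations.
CONTEXT = {
    "sdo": "https://schema.org/",
    "foaf": "http://xmlns.com/foaf/0.1/",  # assumed FOAF namespace
    "id": "@id",
    "first_name": "sdo:givenName",
    "last_name": "sdo:familyName",
    "knows": {"@id": "foaf:knows", "@type": "@id"},
}

def expand_term(term: str) -> str:
    """Resolve a slot name to the full URI its context entry points at."""
    target = CONTEXT[term]
    curie = target["@id"] if isinstance(target, dict) else target
    prefix, local = curie.split(":", 1)
    return CONTEXT[prefix] + local

doc = {
    "@context": CONTEXT,
    "@type": "sdo:Person",
    "id": "https://example.org/linkml/hello-world/P1",
    "first_name": ["Alice"],
    "last_name": "Smith",
}
print(json.dumps(doc, indent=2))
print(expand_term("last_name"))
```

Two schemas that annotate different slot names with the same `slot_uri` expand to the same property URI, which is the integration hook.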
Easy ontology support via value sets
slots:
  ...
  gender:
    description: Person gender
    slot_uri: SDO:gender
    range: gender_enum

classes:
  Thing:
    description: The most generic type of item.
    class_uri: SDO:Thing
    slots:
      - identifier
      - url
      - name
  Person:
    is_a: Thing
    class_uri: SDO:Person
    description: A person (alive, dead, undead, or fictional).
    slots:
      - givenName
      - additionalName
      - gender

LinkML incorporates ISO/IEC 11179-3 meaning/data model
ISO/IEC 11179-3:2013(E)
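The `is_a` link in the snippet above means `Person` carries the slots of `Thing` as well as its own. A minimal sketch of that lookup, using a hand-copied dictionary rather than the LinkML runtime (`CLASSES` and `induced_slots` are hypothetical names):

```python
# Hand-copied from the snippet; only is_a and slots are modeled.
CLASSES = {
    "Thing": {"slots": ["identifier", "url", "name"]},
    "Person": {"is_a": "Thing", "slots": ["givenName", "additionalName", "gender"]},
}

def induced_slots(class_name: str) -> list:
    """Collect slots from the root ancestor down to the class itself."""
    cls = CLASSES[class_name]
    inherited = induced_slots(cls["is_a"]) if "is_a" in cls else []
    return inherited + [s for s in cls["slots"] if s not in inherited]

print(induced_slots("Person"))
```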
ISO/IEC 11179-3 divides enums into representation / meaning
[Figure from ISO/IEC 11179-3:2013(E) p. 101, annotated with "A value that can appear in the data" and "What a particular value means"]
Enumeration flavors
LinkML supports simple enums

enums:
  gender_enum:
    description: |-
      Gender of something, ...
    permissible_values:
      0: Male Gender
      1: Female Gender
      8: Mixed Gender
Enumeration flavors
LinkML supports meaning link

gender_enum_2:
  code_set: sdo:GenderType
  permissible_values:
    0:
      description: Male Gender
      meaning: sdo:Male
    1:
      description: Female Gender
      meaning: sdo:Female
    8:
      description: Mixed Gender
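The split between the code that appears in the data and what that code means can be sketched with a plain Python `Enum`. This is a hand-written mirror of `gender_enum_2`, not generated code. It assumes `sdo` expands to `https://schema.org/`, and `meaning_of` is a hypothetical helper.

```python
from enum import Enum

SDO = "https://schema.org/"  # assumed expansion of the sdo prefix

class GenderEnum2(Enum):
    """Representation: the codes as they appear in the data."""
    MALE = "0"
    FEMALE = "1"
    MIXED = "8"

# Meaning: code -> URI. Code "8" has no meaning in the snippet, hence None.
MEANING = {
    GenderEnum2.MALE: SDO + "Male",
    GenderEnum2.FEMALE: SDO + "Female",
    GenderEnum2.MIXED: None,
}

def meaning_of(code: str):
    """Validate a raw code and return the URI it denotes, if any."""
    return MEANING[GenderEnum2(code)]  # ValueError for codes outside the enum

print(meaning_of("1"))
```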
LinkML supports meanings drawn from conceptual domain

gender_enum_3:
  code_set: sdo:GenderType
  pv_formula: CODE
Other schema features
● Rich type system
● Inheritance / polymorphism
● Complex boolean and conditional constraints
● Developer support: bindings for Python, TypeScript
Use in cancer data harmonization
https://cancerdhc.github.io/ccdhmodel
Cancer Research Data Commons (CRDC) Harmonized Data Model
● Modeling team
● Terminology team
● Unified framework
Core concepts: Specimen, Subject, Observation
[Figure labels: Clinical Terminologies; OBO Ontologies (Uberon, CL, GO, …)]
Environmental microbiome data
https://microbiomedata.github.io/nmdc-schema/
Metadata standards to enable microbiome analysis
● Environmental sample data
● Omics data
● Community development model
Core concepts: Study, Environmental Sample, Workflow Analysis (genomic, metabolomic, ..), Data Object
Biological Knowledge Graphs
Biolink: Goals
The charge from NCATS:
● Create a Knowledge Graph Schema
● Encompass all biology from molecules through to clinical entities
● Get 20 different sites using the same data model
○ (oh: Only a handful of which use RDF/OWL)
● Do it quickly and break new ground in Translational Science
Biolink-Model: A schema for biological KGs
● Expressed in LinkML
● Integrates multiple Knowledge Graphs and Knowledge Providers
Biolink Model
https://biolink.github.io/biolink-model
Other adopters
Future Plans
Hardening and adoption
● Governance around metamodel standard
● Documentation and tutorials
● Coordinate with major data providers and communities
● Completion of roundtrip conversion to multiple frameworks
● Highly efficient data readers/writers
Tool ecosystem
● Web based tooling
● Integrate automated assistant features
● Change management
● Rule systems
Currently driven by community contributions
LinkML Summary
Challenges
● Authoring standards and data models is hard
● Adding semantics is harder
● Developing tools (UI, validators) is expensive
LinkML
● Designed to be easy to use
● Layer in semantics as you need them
● Leverage multiple tool stacks
● Increasing adoption
Acknowledgements
Person GitHub Institution
Harold Solbrig @hsolbrig JHU
Sujay Patil @sujaypatil96 LBNL
Sierra Moxon @sierra-moxon LBNL
Gaurav Vaidya @gaurav RENCI
Bill Duncan @wdduncan LBNL, UFL
Kevin Schaper @kevinschaper CU Anschutz
Joe Flack @joeflack4 JHU
Deepak Unni @deepakunni3 EMBL
Vincent Emonet @vemonet U Maastricht
Mark Miller @turbomam LBNL
Harshad Hegde @hrshdhgd LBNL
Dazhi Jiao @jiaola JHU
Matt Brush @mbrush CU Anschutz
Brian Furner @bfurner U Chicago
Tim Putman @putmantime CU Anschutz
Nico Matentzoglu @matentzn Semanticly
Ramona Walls @ramonawalls Critical Path Institute
Victoria Soesanto @victoriasoesanto CU Anschutz
Melissa Haendel @mellybelly CU Anschutz
U01HG009453 Intelligent Concept Assistant
HG010860-01 Phenomics First CEGS