LinkML
Linked (Open) Data Modeling Language
Yosemite Presentation April 2021
Harold Solbrig
Chris Mungall
These slides:
https://tinyurl.com/linkml-2021-april
1
2
Interoperability Roadmap
Healthcare Information Interoperability
Standardize the Standards
Crowdsource Translations
Incentivize
RDF as a Universal Information Representation
http://YosemiteProject.org/
3
Interoperability Roadmap
Background on Semantic Web
4
“For the semantic web to function, computers must have access to structured collections of information and sets of inference rules that they can use to conduct automated reasoning.”
“Traditional knowledge-representation systems typically have been centralized, requiring everyone to share exactly the same definition of common concepts such as "parent" or "vehicle." But central control is stifling, and increasing the size and scope of such a system rapidly becomes unmanageable.”
5
Vision of the Semantic Web: information → meaning
“The Semantic Web is not a separate Web but an extension of the current one, in which information is given well-defined meaning, better enabling computers and people to work in cooperation.”
RDF for machines | Prose for humans
Decentralized information networks | Centralized data repositories
Ontologies | Free text
Automatic Agents | Manual extraction / data wrangling
Digital Signatures | Unsigned
Identify with resolvable http URIs | Identify with strings
6
The Semantic Web 20 Years Later
Progress
- Web is ubiquitous
- URIs are used
- Agents abound
- Digital signatures and security have advanced
- Semantics are improving (schema.org)
(Not so much) progress
- Decentralization -- Web is decentralized, but aggregators dominate (Solid project)
- Semantics -- ontologies abound, but useful ontologies… not so much.
- RDF -- still an afterthought. Informal models (JSON) or formal schemas, but semantics are still largely textual
7
Biolink Model
What led to LinkML
The charge from NCATS:
● Create a Knowledge Graph Schema
● Encompass all biology from molecules through to clinical entities
● Get 20 different sites using the same data model
○ (oh: Only a handful of which use RDF/OWL)
● Do it quickly and break new ground in Translational Science
8
National Microbiome Data Collaborative
Goal
● Make multi-omics microbiome data FAIR
○ Environments
○ Metagenomes
○ Metatranscriptomes
○ Metabolomics
○ Metaproteomics
● Leverage existing ontologies and standards
● Enable discovery in microbiome science
9
Metamodeling silos
SKOS, JSON Schema, UML / OO, OWL, GraphQL, ProtoBuf, checklists, CDEs
10
LinkML Philosophy
● Simplicity: YAML source files managed in GitHub
● Multimodal
○ JSON, RDF, Property Graphs
○ Open and Closed World use cases
● Stealth Semantics
○ Let them have JSON and OO Python Data Classes
○ Shh, secretly it’s JSON-LD
● Be a parasite
○ Compiles down to other frameworks; we can then leverage their toolchains
■ JSON-Schema: validation of JSON
■ ShEx: validation of RDF graphs
■ GraphQL: APIs
■ OWL: reasoning, browsers/registries
■ JSON-LD Contexts
11
[Diagram labels: LinkML; SKOS, JSON Schema, UML / OO, OWL]
12
BiolinkML: The LinkML predecessor
https://github.com/biolink/biolinkml/
13
LinkML
“Goals”
Distributed, federated models
● Easy to create and maintain
● Available in multiple forms
● URL Addressable
● Integrated with GitHub idiom
Automatic tool generation
● Loaders / dumpers
● Format transformations
Baked in semantics
● Everything gets a URL
● Baked in RDF and Semantic links
○ Invisible except when necessary
● Semantic driven model transformation via RDF
○ JSON-LD and ShEx under the covers
○ JSON / YAML / CSV on the surface
14
The Yosemite vision
15
The Yosemite Vision of Data Translation
[Diagram, adapted from a graphic by David Booth: Source → Target, translate based on crowdsourced rules; steps numbered 1, 2, 3]
16
...was not without its problems
[Diagram, adapted from a graphic by David Booth: Source → Target, translate based on crowdsourced rules; steps numbered 1, 2, 3]
- Source doesn’t include formal (RDF) semantics. 3rd parties must create, validate and maintain these semantics
- RDF doesn’t lend itself to crowdsourcing
- Structural and semantic differences mean that both the source and target need to support not just semantics but shared semantics.
17
LinkML: Embed RDF semantics directly in the source and target models; augment the translation process with ontology and reasoners.
[Diagram labels: LinkML source model; LinkML target model; Ontology; Reasoners]
18
Introduction to LinkML
19
# Metadata
id: https://example.org/linkml/hello-world
title: Really basic LinkML model
name: hello-world
license: https://creativecommons.org/publicdomain/zero/1.0/
version: 0.0.1

# Namespaces
prefixes:
  linkml: https://w3id.org/linkml/
  sdo: https://schema.org/
  ex: https://example.org/linkml/hello-world/
default_prefix: ex
default_curi_maps:
  - semweb_context

# Dependencies
imports:
  - linkml:types

# Actual Model
classes:
  Person:
    description: Minimal information about a person
    class_uri: sdo:Person
    attributes:
      id:
        identifier: true
        slot_uri: sdo:taxID
      first_name:
        required: true
        slot_uri: sdo:givenName
        multivalued: true
      last_name:
        required: true
        slot_uri: sdo:familyName
      knows:
        range: Person
        multivalued: true
        slot_uri: foaf:knows

A sample LinkML Schema
20
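The slot_uri and class_uri values in the sample schema are CURIEs that resolve against the prefixes block. The following is a minimal sketch of that expansion in plain Python, not the LinkML implementation. The names `PREFIXES`, `SLOT_URIS`, `expand_curie`, and `slot_uri` are hypothetical, and the `foaf` entry is an assumption: the schema does not declare it and presumably picks it up through default_curi_maps. The fallback to default_prefix for a slot without a slot_uri is likewise an assumption made for illustration.

```python
# Prefix map copied from the sample schema. The foaf entry is an assumption:
# the schema relies on default_curi_maps for it rather than declaring it.
PREFIXES = {
    "linkml": "https://w3id.org/linkml/",
    "sdo": "https://schema.org/",
    "ex": "https://example.org/linkml/hello-world/",
    "foaf": "http://xmlns.com/foaf/0.1/",
}
DEFAULT_PREFIX = "ex"

# Slot name -> slot_uri exactly as written in the sample schema.
SLOT_URIS = {
    "id": "sdo:taxID",
    "first_name": "sdo:givenName",
    "last_name": "sdo:familyName",
    "knows": "foaf:knows",
}


def expand_curie(curie: str) -> str:
    """Expand a prefix:local CURIE to a full URI using PREFIXES."""
    prefix, sep, local = curie.partition(":")
    if not sep or prefix not in PREFIXES:
        raise ValueError(f"unknown prefix in {curie!r}")
    return PREFIXES[prefix] + local


def slot_uri(slot_name: str) -> str:
    """Full URI for a slot; slots without a slot_uri use the default prefix."""
    curie = SLOT_URIS.get(slot_name) or f"{DEFAULT_PREFIX}:{slot_name}"
    return expand_curie(curie)


for name in SLOT_URIS:
    print(f"{name:<10} -> {slot_uri(name)}")
```

Every slot ends up with a resolvable URI even though the modeler only ever typed short names.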
LinkML RDF is hidden in plain sight
21
LinkML parser allows different frameworks to be used in different contexts
[Diagram labels: MyModel; LinkML parser; Documentation, OWL, JSON Schema, ShEx Schema, Schema.py, GraphQL Schema, LinkML Schema, JSONLD Context, . . .]
22
LinkML automates the documentation process
[Diagram labels: Schema; LinkML parser; Documentation, OWL, JSON Schema, ShEx Schema, Schema.py, GraphQL Schema, LinkML Schema, JSONLD Context, . . .]
23
Sample model documentation output
https://hsolbrig.github.io/sample_model/docs
24
LinkML can generate a variety of conforming schemas
[Diagram labels: Schema; LinkML parser; OWL, JSON Schema, ShEx Schema, Schema.py, GraphQL Schema, Schema Source, JSONLD Context, . . .]
25
Shape Expressions (ShEx) Schema
BASE <https://example.org/linkml/hello-world/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX sdo: <https://schema.org/>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
<String> xsd:string
<Person> CLOSED {
    (  $<Person_tes> (  sdo:givenName @<String> + ;
          sdo:familyName @<String> ;
          foaf:knows @<Person> *
       ) ;
       rdf:type [ sdo:Person ]
    )
}

JSON Schema
"$id": "https://example.org/linkml/person",
"$schema": "http://json-schema.org/draft-07/schema#",
"definitions": {
   "Person": {
      "additionalProperties": false,
      "description": "Minimal information about a person",
      "properties": {
         "first_name": {
            "items": {
               "type": "string"
            },
            "type": "array"
         },
         "id": {
            "type": "string"
         },
         ...

GraphQL Schema
type Person
  {
    id: String!
    firstName: [String]!
    lastName: String!
    knows: [Person]
  }

Sample LinkML generated schemas
26
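To make the "compile down to other frameworks" idea concrete, here is a rough sketch of how a class description could be turned into a JSON Schema object definition like the one shown above. This is not the LinkML generator. The `PERSON` dictionary and the `class_to_json_schema` function are hypothetical, and treating identifier slots as required and object references as plain strings are simplifying assumptions.

```python
import json

# Hypothetical in-memory form of the Person class from the sample schema.
PERSON = {
    "description": "Minimal information about a person",
    "attributes": {
        "id": {"identifier": True},
        "first_name": {"required": True, "multivalued": True},
        "last_name": {"required": True},
        "knows": {"range": "Person", "multivalued": True},
    },
}


def class_to_json_schema(cls: dict) -> dict:
    """Build a JSON Schema object definition from a class description."""
    properties = {}
    required = []
    for name, attr in cls["attributes"].items():
        # Assumption: references to other objects travel as identifier strings.
        prop = {"type": "string"}
        if attr.get("multivalued"):
            prop = {"type": "array", "items": prop}
        properties[name] = prop
        # Assumption: identifier slots are treated as required.
        if attr.get("required") or attr.get("identifier"):
            required.append(name)
    return {
        "type": "object",
        # Closed-world validation: unknown properties are rejected.
        "additionalProperties": False,
        "description": cls["description"],
        "properties": properties,
        "required": sorted(required),
    }


schema = class_to_json_schema(PERSON)
print(json.dumps(schema, indent=2, sort_keys=True))
```

The same walk over the attributes could emit ShEx or GraphQL instead, which is why a single source model can feed several toolchains.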
LinkML can also emit OWL
27
LinkML can also be used to represent ontologies
28
OWL derived from
Biolink Model
LinkML models can be translated to JSON-LD context
[Diagram labels: Schema; LinkML parser; OWL, JSON Schema, ShEx Schema, Schema.py, GraphQL Schema, Schema Source, JSONLD Context, . . .]
29
{
   "@context": {
      "ex": "https://example.org/linkml/hello-world/",
      "foaf": "http://xmlns.com/foaf/0.1/",
      "linkml": "https://w3id.org/linkml/",
      "sdo": "https://schema.org/",
      "@vocab": "https://example.org/linkml/hello-world/",
      "first_name": {
         "@id": "sdo:givenName"
      },
      "id": "@id",
      "knows": {
         "@type": "@id",
         "@id": "foaf:knows"
      },
      "last_name": {
         "@id": "sdo:familyName"
      },
      "Person": {
         "@id": "sdo:Person"
      }
   }
}
A sample JSON-LD context in use
https://tinyurl.com/s6keujhm
30
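The context above is what lets ordinary JSON be read as RDF. As a rough illustration only, the sketch below applies a copy of that context to a plain JSON instance and lists the resulting triples. It covers just the features this context uses and is not a JSON-LD processor. The names `expand`, `to_triples`, and `ANN` are hypothetical, and resolving identifiers against the `@vocab` URI is an assumption made to keep the example small.

```python
# Copy of the sample context (the unused linkml prefix is omitted).
CONTEXT = {
    "ex": "https://example.org/linkml/hello-world/",
    "foaf": "http://xmlns.com/foaf/0.1/",
    "sdo": "https://schema.org/",
    "@vocab": "https://example.org/linkml/hello-world/",
    "first_name": {"@id": "sdo:givenName"},
    "id": "@id",
    "knows": {"@type": "@id", "@id": "foaf:knows"},
    "last_name": {"@id": "sdo:familyName"},
    "Person": {"@id": "sdo:Person"},
}
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def expand(term: str) -> str:
    """Expand a context term or CURIE to a full IRI."""
    entry = CONTEXT.get(term)
    if isinstance(entry, dict):
        term = entry["@id"]
    prefix, sep, local = term.partition(":")
    if sep and isinstance(CONTEXT.get(prefix), str):
        return CONTEXT[prefix] + local
    return CONTEXT["@vocab"] + term


def to_triples(instance: dict) -> list:
    """Turn one JSON object into (subject, predicate, object) tuples."""
    base = CONTEXT["@vocab"]  # assumption: identifiers resolve against @vocab
    subject = base + instance["id"]
    triples = []
    for key, value in instance.items():
        if key == "id":
            continue
        if key == "@type":
            triples.append((subject, RDF_TYPE, expand(value)))
            continue
        entry = CONTEXT.get(key)
        # "@type": "@id" in the context means the value is a reference.
        is_ref = isinstance(entry, dict) and entry.get("@type") == "@id"
        values = value if isinstance(value, list) else [value]
        for v in values:
            obj = base + v if is_ref else f'"{v}"'
            triples.append((subject, expand(key), obj))
    return triples


ANN = {
    "id": "17a3923",
    "first_name": ["Jill"],
    "last_name": "Jones",
    "knows": ["1172438"],
    "@type": "Person",
}
triples = to_triples(ANN)
for triple in triples:
    print(triple)
```

The developer sees `first_name` and `knows`; the triples use `sdo:givenName` and `foaf:knows`.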
LinkML can emit Python
[Diagram labels: Schema; LinkML parser; Documentation, OWL, JSON Schema, ShEx Schema, Schema.py, GraphQL Schema, Schema Source, JSONLD Context, . . .]
31
# Types
class String(str):
    type_class_uri = XSD.string
    type_class_curie = "xsd:string"
    type_name = "string"
    type_model_uri = EX.String

@dataclass
class Person(YAMLRoot):
    """
    Minimal information about a person
    """
    id: Union[str, PersonId] = None
    first_name: Union[str, List[str]] = None
    last_name: str = None
    knows: Optional[Union[Union[str, PersonId], List[Union[str, PersonId]]]] = empty_list()

    def __post_init__(self, *_: List[str], **kwargs: Dict[str, Any]):
        if self.id is None:
            raise ValueError("id must be supplied")
        if not isinstance(self.id, PersonId):
            self.id = PersonId(self.id)
        if self.first_name is None:
            raise ValueError("first_name must be supplied")
        elif not isinstance(self.first_name, list):
            self.first_name = [self.first_name]
        elif len(self.first_name) == 0:
            raise ValueError(f"first_name must be a non-empty list")
        self.first_name = [v if isinstance(v, str) else str(v) for v in self.first_name]
        ...

from examples.basic import Person
sam = Person("1172438", first_name=["Samual", "J"], last_name="Snooter")
print(sam)
# prints: Person(id='1172438', first_name=['Samual', 'J'], last_name='Snooter', knows=[])
fred = Person("a117", first_name="John")
# ...
# ValueError: last_name must be supplied

Using Python code emitted by LinkML
32
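The generated class above depends on the LinkML runtime (YAMLRoot, PersonId, empty_list). For readers who want to try the validation pattern without installing anything, the following is a standard-library approximation. It is a simplified stand-in written for this illustration, not the emitted code: identifiers are plain strings and the class name and checks only mirror what the slide shows.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Person:
    """Minimal information about a person (stdlib-only approximation)."""
    id: str = None
    first_name: Union[str, List[str]] = None
    last_name: str = None
    knows: List[str] = field(default_factory=list)

    def __post_init__(self):
        # Required-slot checks run as soon as the object is constructed.
        if self.id is None:
            raise ValueError("id must be supplied")
        if self.first_name is None:
            raise ValueError("first_name must be supplied")
        # A multivalued slot accepts a single value and wraps it in a list.
        if not isinstance(self.first_name, list):
            self.first_name = [self.first_name]
        if len(self.first_name) == 0:
            raise ValueError("first_name must be a non-empty list")
        self.first_name = [str(v) for v in self.first_name]
        if self.last_name is None:
            raise ValueError("last_name must be supplied")
        self.knows = [str(v) for v in self.knows]


sam = Person("1172438", first_name=["Samual", "J"], last_name="Snooter")
print(sam)

try:
    Person("a117", first_name="John")
    error = None
except ValueError as exc:
    error = str(exc)
print(error)
```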
The LinkML runtime can consume and create...
[Diagram labels: LinkML Runtime; Schema.py; instances: JSON, YAML, RDF, Tabular (CSV, TSV, Spreadsheet), FHIR, …]
33
Generated Python can be a gateway to anything...
[Diagram labels: LinkML Runtime; Schema.py; instances: JSON, YAML, RDF, Tabular (CSV, TSV, Spreadsheet), FHIR, …; Any Jupyter / Big Data / Pandas tool that supports]
34
Python
from examples.basic import Person
from linkml.dumpers import json_dumper, yaml_dumper, rdf_dumper

sam = Person("1172438", first_name=["Samual", "J"], last_name="Snooter")
ann = Person("17a3923", first_name="Jill", last_name="Jones", knows=[sam.id])
print(json_dumper.dumps(ann))
print(yaml_dumper.dumps(ann))
print(rdf_dumper.dumps(ann, contexts="../examples/jsonld/basic.context.jsonld"))

JSON output
{
  "id": "17a3923",
  "first_name": [
    "Jill"
  ],
  "last_name": "Jones",
  "knows": [
    "1172438"
  ],
  "@type": "Person"
}

YAML output
id: 17a3923
first_name:
- Jill
last_name: Jones
knows:
- '1172438'

RDF output (by way of JSON-LD)
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix sdo: <https://schema.org/> .
<https://example.org/linkml/hello-world/17a3923> a sdo:Person ;
    foaf:knows <https://example.org/linkml/hello-world/1172438> ;
    sdo:familyName "Jones" ;
    sdo:givenName "Jill" .

Objects can be exported as JSON, YAML, or RDF
35
Python code
from examples.basic import Person
from linkml.loaders import json_loader, rdf_loader, yaml_loader

fred = yaml_loader.load('input/fred.yaml', target_class=Person)
print(fred.first_name)
# prints: ['Fred', 'William']
harvey = json_loader.load('https://raw.githubusercontent.com/hsolbrig/linkml-enhanced-template/master/tests/input/harvey.json', target_class=Person)
print(harvey.last_name)
# prints: Mackerson
ann = rdf_loader.load('input/ann.xml', target_class=Person, fmt="xml")
print(ann.last_name)
# prints: Richardson

input/fred.yaml
id: 118-28-3199
first_name:
- Fred
- William
last_name: Phillips
knows:
- '1172438'
- '1172438'

http://example.org/.../harvey.json
{
  "id": "118-78-0697",
  "first_name": [
    "Harvey"
  ],
  "last_name": "Mackerson"
}

input/ann.xml
<?xml version="1.0" encoding="UTF-8"?>
<rdf:RDF
   xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
   xmlns:sdo="https://schema.org/"
>
  <rdf:Description rdf:about="https://peoples.r.us">
    <sdo:givenName>Ann</sdo:givenName>
    <rdf:type rdf:resource="https://schema.org/Person"/>
    <sdo:familyName>Richardson</sdo:familyName>
    <sdo:givenName>Elizabeth</sdo:givenName>
  </rdf:Description>
</rdf:RDF>

Objects can be imported from JSON, YAML, or RDF
36
LinkML and Enums
37
slots:
  ...
  gender:
    description: Person gender
    slot_uri: SDO:gender
    range: gender_enum
classes:
  Thing:
    description: The most generic type of item.
    class_uri: SDO:Thing
    slots:
      - identifier
      - url
      - name
  Person:
    is_a: Thing
    class_uri: SDO:Person
    description: A person (alive, dead, undead, or fictional).
    slots:
      - givenName
      - additionalName
      - gender
38
LinkML incorporates ISO/IEC 11179-3 meaning/data model
ISO/IEC 11179-3:2013(E)
ISO/IEC 11179-3:2013(E) p. 101
A value that can appear in the data
What a particular value means
39
ISO/IEC 11179-3 divides enums into representation / meaning
enums:
  gender_enum:
    description: |-
      Gender of something, ...
    permissible_values:
      0: Male Gender
      1: Female Gender
      8: Mixed Gender
Enumeration flavors
40
LinkML supports simple enums
Enumeration flavors
  gender_enum_2:
    code_set: sdo:GenderType
    permissible_values:
      0:
        description: Male Gender
        meaning: sdo:Male
      1:
        description: Female Gender
        meaning: sdo:Female
      8:
        description: Mixed Gender
41
LinkML supports meaning link
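The split between a value that appears in the data and what that value means can be sketched in a few lines of Python. This is an illustration of the idea behind gender_enum_2, not LinkML's generated enum code. `PermissibleValue`, `GENDER_ENUM_2`, and `meaning_of` are names invented for this sketch.

```python
from dataclasses import dataclass
from typing import Optional

SDO = "https://schema.org/"


@dataclass(frozen=True)
class PermissibleValue:
    text: str                      # the value that can appear in the data
    description: str               # human-readable explanation
    meaning: Optional[str] = None  # URI for what the value means, if linked


# Mirrors gender_enum_2: codes 0 and 1 carry a meaning link, 8 does not.
GENDER_ENUM_2 = {
    "0": PermissibleValue("0", "Male Gender", SDO + "Male"),
    "1": PermissibleValue("1", "Female Gender", SDO + "Female"),
    "8": PermissibleValue("8", "Mixed Gender"),
}


def meaning_of(code) -> Optional[str]:
    """Return the meaning URI for a code, or None when no link is given."""
    pv = GENDER_ENUM_2.get(str(code))
    if pv is None:
        raise ValueError(f"{code!r} is not a permissible value")
    return pv.meaning


for code in ("0", "1", "8"):
    print(code, "->", meaning_of(code))
```

Two datasets that use different codes for the same concept can be reconciled by comparing meanings rather than codes.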
  gender_enum_3:
    code_set: sdo:GenderType
    pv_formula: CODE
42
LinkML supports meanings drawn from a conceptual domain
Revisiting the Yosemite vision
43
LinkML and the Yosemite vision
LinkML:
- Embed RDF semantics directly in the Source and target models
- Augment the translation process with ontology and reasoners.
[Diagram labels: LinkML source model A; LinkML source model B; Ontology / Reasoners; Semantic representation of Model Content]
44
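One way to picture translation driven by embedded semantics: if both models attach URIs to their slots, a record can be carried from one model to the other by matching URIs instead of hand-written field mappings. The sketch below assumes two hypothetical slot maps, `SOURCE_SLOTS` and `TARGET_SLOTS`, and handles exact URI matches only. Related but non-identical terms would need the ontology and reasoner step, which is not shown here.

```python
# Hypothetical slot name -> URI maps for two models that share semantics.
SOURCE_SLOTS = {
    "first_name": "https://schema.org/givenName",
    "last_name": "https://schema.org/familyName",
    "shoe_size": "https://example.org/source/shoeSize",
}
TARGET_SLOTS = {
    "givenName": "https://schema.org/givenName",
    "familyName": "https://schema.org/familyName",
}


def translate(record: dict, source_slots: dict, target_slots: dict):
    """Rename record keys by matching slot URIs across the two models.

    Returns the translated record and the list of source slots that have
    no exact URI match in the target model.
    """
    uri_to_target = {uri: name for name, uri in target_slots.items()}
    translated, unmapped = {}, []
    for name, value in record.items():
        target_name = uri_to_target.get(source_slots.get(name))
        if target_name is None:
            unmapped.append(name)
        else:
            translated[target_name] = value
    return translated, unmapped


record = {"first_name": ["Jill"], "last_name": "Jones", "shoe_size": 7}
translated, unmapped = translate(record, SOURCE_SLOTS, TARGET_SLOTS)
print(translated)
print(unmapped)
```

The unmapped list is where crowdsourced rules or a reasoner would come in.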
LinkML is LinkML
45
The LinkML model is developed in LinkML
https://w3id.org/linkml/meta.yaml
https://w3id.org/linkml/SchemaDefinition
https://w3id.org/linkml/meta.context.jsonld
46
47
The Balancing Act
Information Modeling World
● Explicit structures
● Implicit semantics
● Closed World Assumption
● Classes are Primary
○ Attributes owned by classes
Ontology Modeling World
● Structures are dynamic
● Semantics front and center
● Open World Assumption
● Slots (Predicates) and Classes (Resources) are co-equals
Projects Using LinkML
48
Biolink Model
Biolink: Goals
The charge from NCATS:
● Create a Knowledge Graph Schema
● Encompass all biology from molecules through to clinical entities
● Get 20 different sites using the same data model
○ (oh: Only a handful of which use RDF/OWL)
● Do it quickly and break new ground in Translational Science
49
Biolink Model
Approach
● Build data model:
○ Main categories (gene, chemical, disease, …)
○ Predicates and associations
■ E.g. chemical treats disease, gene interacts with gene
○ Leverage ontologies
● Collaborative development
○ Domain-specific working groups
○ Anyone can make Pull Requests
Why LinkML?
● Validate using closed-world assumption
● Ontologies and semantics, but in the background
● Property graphs and edges as first-class citizens
50
Biolink Model
Where we are (year 2 of 5)
● All “Knowledge Providers” and “Autonomous Relay Agents” nominally using Biolink
● Validation dashboard in progress
● Early demonstrations of powerful federated queries
51
National Microbiome Data Collaborative
Goal
● Make multi-omics microbiome data FAIR
○ Environments
○ Metagenomes
○ Metatranscriptomes
○ Metabolomics
○ Metaproteomics
● Leverage existing ontologies and standards
● Enable discovery in microbiome science
52
National Microbiome Data Collaborative
Approach
● Formalize existing “checklist” standards
● Create modular schema
● Leverage MIxS, ENVO, PROV
Why LinkML
● Developers like JSON + JSON-Schema
● Biologists like spreadsheets
● “Semantic enums” work well
● Needed something that worked with traditional technology (Mongo, Postgres)
● “Stealth semantics”
○ Everything has a URI
○ All JSON is transparently JSON-LD
53
National Microbiome Data Collaborative
Where we are (year 2)
● Unified modular schema
● Heterogeneous data successfully integrated
○ Environmental
○ Multiple omics types
○ Functional annotation
○ MAG binning
● Ontologies like ENVO used as ‘slot-fillers’
● Easy for developers
○ System based mainly on JSON exchange
○ RDF can be leveraged
○ Currently Mongo + Postgres
○ Working on TerminusDB adapters
● Working with upstream standards providers to LinkML-ify checklists
○ Spreadsheets → computable artefacts
54
Other projects
● Center for Cancer Data Harmonization
○ Cancer sample and patient metadata
○ Omics data
● HOT Ecosystem
○ Health Open Terminologies
○ SKOS metamodel
● Genome Features
○ Formalization of GFF3 schema
○ Sequence Ontology
● Unified Chemistry Datamodel
○ Data model and ontology for chemistry
● Gene Ontology
○ Causal Activity Models
● CSOLink
○ A high level data model of computer systems
55
Help wanted: LinkML is still very much under construction
56
Inquire at monarchinit@gmail.com or w/ authors
Credits
57
Contributors
● Chris Mungall (Berkeley Lab)
● Deepak Unni (Berkeley Lab)
● Dazhi Jiao (Johns Hopkins University)
● Harold Solbrig (Johns Hopkins University)
● Richard Bruskiewich (Star Informatics)
● Jim Balhoff (RENCI)
● William Duncan (Berkeley Lab)
● Harshad Hegde (Berkeley Lab)
● Mark Miller (Berkeley Lab)
● Melissa Haendel (CU)
● Matthew Brush (OHSU)
● Sierra Moxon (Berkeley Lab)
● Donnie Winston (Polyneme)
58
Funding
LinkML project development was supported by funding from:
● NCATS Translator (OT2 TR003449)
● NIH Monarch (R24 OD011883)
● CD2H (U24 TR002306)
● CCDH
● FHIRCat (R56 EB028101)
● Phenomics First (RM1 HG010860)
● DOE National Microbiome Data Collaborative
59
Links and contact information
https://linkml.github.io/
https://github.com/linkml/
https://github.com/linkml/examples/ (Will be available shortly…)
solbrig@jhu.edu - Harold Solbrig
60
