Developed as a community effort, as
part of the NIH BD2K bioCADDIE
grant (1U24 AI117966-01)
DATS - Data Tag Suite:
model overview
Philippe Rocca-Serra,
Alejandra Gonzalez-Beltran, Susanna-Assunta Sansone
{philippe.rocca-serra,alejandra.gonzalez-beltran,susanna-assunta.sansone}@oerc.ox.ac.uk
Oxford e-Research Centre, Department of Engineering, University of Oxford, UK
NIH DCPPC - KC7, crosscut metadata model subgroup; April 20th, 2018
An activity of the NIH Data Commons’
Oxygen (1OT3OD025462-01) and
Phosphorus (1OT3OD025459-01) Teams in KC7
What is DATS?
JATS (Journal Article Tag Suite) underpins PubMed for literature indexing;
DATS (DAta Tag Suite) is the data model used to index data sources
(used by DataMed, but not limited to it)
TIP: click the GitHub octocat to go from this slide deck to the relevant
files/documents in the DATS specification(s)
Where do I find the documentation?
doi:10.1038/ng.3864 (2017)
doi:10.1038/sdata.2017.59 (2017)
bioCADDIE team - iterative development
Our community engagement: input, feedback and links
[Timeline figure; date marks: Mar15, Jun15, Aug15, Dec15, May16, Jun16, Sep16, Mar17, May17]
Phases 1 to 3: Design and development; Evaluation & iterative refinement; Continued evaluation & consolidation
Specification milestones: SOP and metadata strawman; Metadata specification V1.0 with JSON schema; DATS v1.1; DATS v2.0 (with access metadata, WG7); DATS v2.1 (schema.org JSON-LD); DATS v2.2
Community milestones: Use cases workshop; 1st DATS workshop; WG3 formed, telecons start, dissemination via [not recoverable]; 2nd DATS workshop; WG7 formed, telecons start; WG12 formed, telecons start
Participants: primarily metadata modelers; primarily implementers
What was DATS supposed to do and be?
❖ Enabling discoverability: find and access datasets
❖ Focusing on surfacing key metadata descriptors, such as
✧ information on, and relations between, authors, datasets, publications, funding sources, the nature of the biological signal and perturbation, etc.
✧ Not a perfect model for representing all experimental details, but capable enough to capture the essential descriptors
✧ Domain-specific levels of detail and metadata belong to the realm of specialized databases
❖ Better than just having keywords
✧ We aimed for maximum coverage of use cases with a minimal number of data elements and relations
The development process in a nutshell (v1.0, v1.1, v2.0, v2.1, v2.2)
Metadata elements identified by combining two complementary approaches:
USE CASES: top-down approach
SCHEMAS: bottom-up approach
bottom-up approach
Building DATS by alignment
(standing on the shoulders of giants)
❖ BioProject
❖ BioSample
❖ MINiML
❖ PRIDE-ml
❖ MAGE-TAB
❖ GA4GH metadata schema
❖ SRA xml
❖ ISA
❖ CDISC SDM / elements of the BRIDG model
❖ ……(full list in the DATS specification)
❖ DataCite
❖ RIF-CS
❖ W3C HCLS dataset descriptions
❖ (mapping of many models including DCAT, PROV, VOID, Dublin Core)
❖ Project Open Metadata (used by HealthData.gov; being added in this new iteration)
❖ schema.org
Building DATS from query cases
Convergence of elements extracted from competency questions and existing (generic and biomedical) data models (incl. DataCite, DCAT, schema.org, HCLS dataset, RIF-CS, ISA-Tab, SRA-xml etc.)
Adoption of elements extracted from [source logos not recoverable]
[Figure labels: core entities; extended entities]
Capturing the nature of a dataset
Database of Reference Knowledge
Storing knowledge about “The building blocks”
Archive of Experiments
Storing “The signal”
[acquisition, analysis, reverse engineering]
Defining boundaries for DATS
● Get all datasets on COPD where transcription profiling and spirometry were measured in cohorts of Southern Europeans
A query to retrieve datasets for further analysis.
● Get all genes whose expression in human lung tissue is elevated following exposure to diesel particulates
A query about findings for hypothesis generation.
DATS could represent the collection of statements as a dataset, but how the statements are actually structured is beyond the current scope of DATS.
Complementarity with Biolinks - @cmungall
Model general overview
DATS fundamentals
❖ What is the dataset about?
✧ Material, Data
❖ How was the dataset produced? Which information does it hold?
✧ Dataset / Data Type with its Dimension, Method/Technology, Instrument
❖ Where can a dataset be found?
✧ Dataset, Distribution, Access objects (links to License, Formats)
❖ When was the dataset produced, released etc.?
✧ Dates to specify the nature of an event {create, modify, start, end...} and its timestamp
❖ Who did the work, funded the research, hosts the resources etc.?
✧ Person, Organisation and their roles, Grant
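As a rough illustration of how the five questions line up with DATS objects, the following Python sketch groups the fields of a toy dataset record under what / how / where / when / who. The property names (`isAbout`, `types`, `distributions`, `dates`, `creators`) and every value are assumptions chosen for illustration; the DATS specification remains the reference for actual names and cardinalities.

```python
import json

# Toy record: every property name and value is illustrative, not normative.
dataset = {
    "title": "Example transcription profiling dataset",
    "isAbout": [{"name": "lung tissue"}],                              # what
    "types": [{"method": {"value": "transcription profiling"}}],       # how
    "distributions": [{"access": {"landingPage": "https://example.org/ds/1"}}],  # where
    "dates": [{"date": "2017-05-01", "type": {"value": "create"}}],    # when
    "creators": [{"fullName": "A. Researcher"}],                       # who
}

# Assumed mapping from each question on the slide to one top-level property.
QUESTIONS = {
    "what": "isAbout",
    "how": "types",
    "where": "distributions",
    "when": "dates",
    "who": "creators",
}


def answer(record, question):
    """Return the part of the record that answers one of the five questions."""
    return record.get(QUESTIONS[question], [])


if __name__ == "__main__":
    for question in QUESTIONS:
        print(question, "->", json.dumps(answer(dataset, question)))
```

A record that leaves a question unanswered simply yields an empty list, which makes gaps in the metadata easy to spot.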
Relevant use cases:
assembling synthetic cohorts
DATS objects highlight
Counting things (I):
tracking patient and
specimen relationships
Relationships between materials matter:
❖ Assessing sample / specimen origin and patient identity
❖ In the context of longitudinal studies, repeated measure designs, where samples are
collected or variables measured several times over the course of a study
Ease of use and compatibility with biomedical ontologies, owing to 'familiarity and awareness' of DO, GO, UBERON and the like.
Note: if the underlying model isn't rich enough (as observed when mapping from a broad range of resources), accurate mapping from a primary resource into DATS may prove difficult
Counting things (II):
groups and sizes
in the context of studies
For all datasets characterising “signal”, the ability to identify, list and characterise
study populations matters, as does the ability to capture descriptors for
‘treatment’ or ‘perturbations’
Dealing with
spatial and temporal
properties of a dataset
Tracking dataset spatial and temporal properties
Where & When
Query: “Get all datasets collected between 1945 and
1968 in University Hospitals from Japan and Korea”
Spatial information
❖ DATS uses the entity ‘Place’ to report geolocation information for a
Dataset (and other entities)
Place: entity with the following attributes
-name.
-description.
-coordinates.
-geometry. {values from geoJSON}
-postalAddress.
dats.Dataset spatialCoverage dats.Place
dats.Material spatialCoverage dats.Place
dats.Organization location dats.Place
dats.Activity location dats.Place
dats.Place relates to Feature in GeoJSON, GeoLocation in DataCite and Place in schema.org
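The "where and when" query can be sketched in Python over toy records. This is only an illustration under stated assumptions: the `dates` entries use the event types named in this deck (`start`, `end`) as plain strings, `postalAddress` is simplified to a country name, the coordinate order is assumed to be longitude then latitude, and the two datasets are invented. The "University Hospitals" part of the query is not modelled.

```python
from datetime import date


def make_place(name, country, longitude, latitude):
    """Build a Place-like dict with the attributes listed on the slide."""
    return {
        "name": name,
        "description": "",
        "coordinates": [longitude, latitude],  # assumed order: longitude, latitude
        "geometry": "Point",                   # a GeoJSON geometry type
        "postalAddress": country,              # simplified to a country name
    }


# Invented records, for illustration only.
datasets = [
    {
        "title": "toy dataset A",
        "dates": [
            {"date": "1950-03-01", "type": "start"},
            {"date": "1960-12-31", "type": "end"},
        ],
        "spatialCoverage": [make_place("Example University Hospital", "Japan", 139.7, 35.7)],
    },
    {
        "title": "toy dataset B",
        "dates": [
            {"date": "1990-01-01", "type": "start"},
            {"date": "1995-06-30", "type": "end"},
        ],
        "spatialCoverage": [make_place("Another Example Hospital", "Korea", 127.0, 37.5)],
    },
]


def collected_between(dataset, first_year, last_year):
    """True if all start/end dates fall within [first_year, last_year]."""
    years = [
        date.fromisoformat(entry["date"]).year
        for entry in dataset.get("dates", [])
        if entry["type"] in ("start", "end")
    ]
    return bool(years) and first_year <= min(years) and max(years) <= last_year


def located_in(dataset, countries):
    """True if any covered Place lies in one of the given countries."""
    return any(
        place["postalAddress"] in countries
        for place in dataset.get("spatialCoverage", [])
    )


def find(records, first_year, last_year, countries):
    """Return the titles of records matching both the time and place filters."""
    return [
        record["title"]
        for record in records
        if collected_between(record, first_year, last_year)
        and located_in(record, countries)
    ]


hits = find(datasets, 1945, 1968, {"Japan", "Korea"})
```

With these toy records only the first dataset satisfies both filters; the second matches on place but not on time.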
Measuring things:
Supporting the description of variables -
dimensions and their relation to datasets
Dimensions
1. DATS.Dimension: meant to be used to report what data
points are about in a dataset, their nature, their units.
2. DATS.Dimension should be typed (categorical, continuous)
3. DATS.Dimension is used from:
○ DATS.Material.characteristics.Dimension
○ DATS.DataAcquisition.measures.Dimension
Dimension: an example
{
  "@type": "Dimension",
  "identifier": {
    "identifier": "AQ5",
    "identifierSource": ""
  },
  "name": {
    "valueIRI": "",
    "value": "Current marital status"
  },
  "types": [{"value": "categorical", "valueIRI": ""}],
  "values": [
    "1 Married",
    "2 Widowed",
    "3 Separated",
    "4 Divorced",
    "5 Never married",
    "-9 Missing"
  ],
  "partOf": [
    "Dataset-33581-0001.json"
  ],
  "extraProperties": [
    {
      "category": "landingPage",
      "values": [
        "http://www.icpsr.umich.edu/icpsrweb/ICPSR/ssvd/studies/33581/datasets/0001/variables/AQ5"
      ]
    }
  ]
}
/json-instances/ICPSR-33581/Dimension-33581-0001-AQ5.json
Credits to Matthew Richardson / Sanda Ionescu (ICPSR)
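A consumer of such a Dimension might want the coded categories as a lookup table. The Python sketch below assumes, as in the ICPSR example, that each entry in `values` is a numeric code followed by a space and a label; the helper names `is_categorical` and `code_table` are hypothetical and not part of any DATS tooling.

```python
# Trimmed copy of the Dimension example from the slide.
dimension = {
    "@type": "Dimension",
    "name": {"value": "Current marital status", "valueIRI": ""},
    "types": [{"value": "categorical", "valueIRI": ""}],
    "values": [
        "1 Married",
        "2 Widowed",
        "3 Separated",
        "4 Divorced",
        "5 Never married",
        "-9 Missing",
    ],
}


def is_categorical(dim):
    """True if any declared type of the dimension is 'categorical'."""
    return any(t.get("value") == "categorical" for t in dim.get("types", []))


def code_table(dim):
    """Split each 'code label' string at the first space into {code: label}."""
    table = {}
    for entry in dim.get("values", []):
        code, _, label = entry.partition(" ")
        table[int(code)] = label
    return table


labels = code_table(dimension) if is_categorical(dimension) else {}
```

Splitting at the first space only keeps multi-word labels such as "Never married" intact.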
Dimensions
❖ Ongoing discussion to augment DATS.Dimension in order to:
✧ provide summary statistics (min, max, mode, median, mean...)
✧ link to group information
❖ Under development and evaluation
❖ Tightly tied to consent, access and terms-of-use issues
Tracking what the dataset is about
Note: alignment with biomedical ontologies (e.g. OBO Foundry)
Objects to support the description of variables
dimensions and their relation to datasets
A study schedules a data acquisition event, which measures a variable about some
material that is input to the event; the resulting dataset has dimensions as parts
Datasets can contain Datasets
Distinguish between results from measurement (output of data acquisitions)
and results from data transformations (output of data analysis)
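Nesting and provenance can be sketched together in Python. The example assumes a `hasPart` list for nested datasets and a `producedBy` object whose `@type` is either `DataAcquisition` or `DataAnalysis`; these names and the toy records are assumptions for illustration and should be checked against the DATS schemas.

```python
# Toy nested record: a collection holding one measured and one derived dataset.
root = {
    "title": "study collection",
    "hasPart": [
        {"title": "raw measurements", "producedBy": {"@type": "DataAcquisition"}},
        {"title": "normalised values", "producedBy": {"@type": "DataAnalysis"}},
    ],
}


def iter_datasets(dataset, depth=0):
    """Yield (depth, dataset) for a dataset and all datasets nested inside it."""
    yield depth, dataset
    for part in dataset.get("hasPart", []):
        yield from iter_datasets(part, depth + 1)


def titles_by_origin(dataset):
    """Group dataset titles by the type of activity that produced them."""
    groups = {}
    for _, item in iter_datasets(dataset):
        origin = item.get("producedBy", {}).get("@type", "unspecified")
        groups.setdefault(origin, []).append(item["title"])
    return groups


groups = titles_by_origin(root)
```

Grouping on the producing activity keeps measurement outputs and transformation outputs apart even when they sit in the same parent dataset.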
Condition of access:
licenses, access conditions
and distributions
How can the dataset be accessed?
Access entity:
❖ landing page; access URL
❖ methods of access (e.g. download, service)
❖ authorization requirements
❖ authentication requirements
How can the dataset be used? (licenses)
❖ The dataset should be associated with one or more licenses, which
determine the terms of use of the dataset
❖ Licenses are legal documents giving official permission to do something
with the resource
❖ DATS supports recording license identifiers, names, versions and creators.
More details about the licenses are expected to be retrieved from external
resources.
Where can the dataset be found?
What data standards does the dataset conform to?
How can the dataset be accessed?
DATS allows reporting on which reporting guidelines, formats/models and
terminologies the dataset complies with or uses
Distribution: an example
A distribution is a specific available form of a dataset
(e.g. the dataset in a specific format or specific endpoint)
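The slide's own example did not survive as text, so the following Python sketch is a stand-in rather than a reconstruction. It assumes a distribution carries `formats`, `licenses` and an `access` object with a landing page, access URL, access types, and authorization and authentication requirements; all property names are assumptions, and values are simplified to plain strings.

```python
# Invented distribution record; names and values are illustrative only.
distribution = {
    "formats": ["CSV"],
    "access": {
        "landingPage": "https://example.org/datasets/42",
        "accessURL": "https://example.org/datasets/42/download.csv",
        "types": ["download"],
        "authorizations": ["public"],
        "authentications": ["none"],
    },
    "licenses": [{"name": "CC BY", "version": "4.0"}],
}


def openly_downloadable(dist):
    """True if the distribution can be downloaded without authentication."""
    access = dist.get("access", {})
    return (
        "download" in access.get("types", [])
        and "none" in access.get("authentications", [])
        and bool(access.get("accessURL"))
    )


def license_labels(dist):
    """Return 'name version' strings for each license on the distribution."""
    return [
        f"{lic['name']} {lic.get('version', '')}".strip()
        for lic in dist.get("licenses", [])
    ]
```

Keeping access conditions on the distribution rather than the dataset allows one dataset to be offered both as an open download and through a controlled service.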
Relating datasets to databases and data
standards
“Housekeeping elements”:
Identification, publication, organization,
people, and grant
Object identification
primary identifier (0..1)
alternative and related identifiers (0..n)
https://biocaddie.org/group/working-group/working-group-2-data-identifiers-recommendation
Object identification: guidelines
master/examples/Uniprot-P77967.json
"identifier": {
  "identifier": "uniprot:P77967",
  "identifierSource": "uniprot"
},
"alternateIdentifiers": [
  {
    "identifier": "PIR:S74805",
    "identifierSource": "PIR"
  }
],
"relatedIdentifiers": [
  {
    "identifier": "PANTHER:PTHR11455:SF22",
    "identifierSource": "PANTHER",
    "relationType": "family and domain database"
  }
]
❖ identifier: the primary identifier of the dataset; can be a string, but ideally an IRI. The identifier source is the organization/namespace responsible for creating/hosting it (here using "compact URIs")
❖ alternateIdentifiers: identifiers of the dataset other than the primary one, and their sources
❖ relatedIdentifiers: identifiers of related resources, useful for cross-referencing other complementary resources
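Compact URIs of the form `prefix:local-id` can be taken apart with a few lines of Python. The sketch below uses the identifiers from the UniProt example; `split_curie` and `source_matches_prefix` are hypothetical helper names, and the case-insensitive comparison against `identifierSource` is an assumption about how one might sanity-check a record, not a rule from the DATS guidelines.

```python
def split_curie(curie):
    """Split 'prefix:local' at the first colon; the local part may contain colons."""
    prefix, sep, local_id = curie.partition(":")
    if not sep or not prefix or not local_id:
        raise ValueError(f"not a compact URI: {curie!r}")
    return prefix, local_id


def source_matches_prefix(entry):
    """Check that the CURIE prefix agrees with the declared identifier source."""
    prefix, _ = split_curie(entry["identifier"])
    return prefix.lower() == entry["identifierSource"].lower()


# Identifier entries taken from the example above.
entries = [
    {"identifier": "uniprot:P77967", "identifierSource": "uniprot"},
    {"identifier": "PIR:S74805", "identifierSource": "PIR"},
    {"identifier": "PANTHER:PTHR11455:SF22", "identifierSource": "PANTHER"},
]

all_consistent = all(source_matches_prefix(entry) for entry in entries)
```

Splitting at the first colon only matters for identifiers such as the PANTHER one, whose local part itself contains a colon.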
Tracking dataset producer identity
Who
(and acknowledging funders)
Tracking bibliographic information
(and acknowledging funders)
❖ Distinction between primary publication(s) and other citations
❖ For published work, a PubMed ID or DOI will suffice
✧ Rely on dedicated APIs to recover the publication metadata needed for
indexing/search, which can then be included in DATS automatically
"primaryPublications": [
  {
    "identifier": {
      "identifier": "https://www.ncbi.nlm.nih.gov/pubmed/7762914",
      "identifierSource": "pubmed"
    },
    "alternateIdentifiers": [
      {
        "identifier": "http://dx.doi.org/10.7326/0003-4819-123-1-199507010-00007",
        "identifierSource": "doi"
      }
    ]
  }
]
Tools to handle DATS documents
https://github.com/biocaddie/WG3-MetadataSpecifications
❖ Validators / Schema compliance testing
❖ DataMed Transformation Language
https://biocaddie.org/sites/default/files/d7/project/1869/biocaddie-ahm-ingestion-2017sep.pdf
[Jeff Grethe]
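To give a feel for what schema compliance testing involves, here is a deliberately small Python sketch that checks a record for a few required properties and their types. It is not the project's validator: the `REQUIRED` table is an assumed subset invented for illustration, and a real check would load the JSON schemas from the repository above and run them through a JSON Schema library.

```python
# Assumed subset of required properties and their expected Python types.
REQUIRED = {"identifier": dict, "title": str, "types": list}


def check(record, required=REQUIRED):
    """Return a list of human-readable problems; an empty list means it passed."""
    problems = []
    for key, expected in required.items():
        if key not in record:
            problems.append(f"missing property: {key}")
        elif not isinstance(record[key], expected):
            problems.append(
                f"{key}: expected {expected.__name__}, got {type(record[key]).__name__}"
            )
    return problems


good = {
    "identifier": {"identifier": "uniprot:P77967", "identifierSource": "uniprot"},
    "title": "toy dataset",
    "types": [],
}
bad = {"title": 42}

good_report = check(good)
bad_report = check(bad)
```

Returning a list of problems instead of raising on the first one lets an ingestion pipeline report everything wrong with a source record in a single pass.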
Lessons learned
❖ Identification of a Dataset:
➢ Identifying what is a dataset for a particular source is crucial
for setting up an indexing pipeline to DataMed
❖ Use of DATS Dimensions (a high-level representation of
quantitative or qualitative properties of an entity)
➢ E.g. in OMOP CDM there was a need to split a single entity
into its procedural (mapped to DATS.DataAcquisition) and
its variable information (mapped to DATS.Dimension)
❖ Documentation
➢ The available DATS documentation was useful; more would
be better
❖ Support infrastructure
➢ For the future, include more examples and validation
infrastructure
Serializations and use of schema.org
❖ DATS model in JSON schema, serialized as:
➢ JSON* format, and
➢ JSON-LD** with vocabulary from schema.org
■ serializations in other formats can also be done, as / if needed
❖ Benefits for DataMed and databases indexed by DataMed
➢ Increased visibility (to popular search engines), accessibility (via
common query interfaces) and possibly improved ranking
❖ Extending/influencing schema.org
➢ Submitted missing DATS core elements to their tracker
➢ Coordinating the extension of schema.org for life science via the
bioschemas.org initiative (of which ELIXIR is also part)
* JavaScript Object Notation
** JavaScript Object Notation for Linked Data
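The idea of a JSON-LD serialization with schema.org vocabulary can be illustrated with a hand-written mapping in Python. This is only a sketch: it does not use the DATS JSON-LD context files, the choice of which DATS-style fields map to the schema.org `Dataset` properties `name`, `description`, `identifier` and `keywords` is an assumption, and the input record is invented.

```python
import json


def to_schema_org_jsonld(record):
    """Map a few DATS-style fields onto a schema.org Dataset JSON-LD document."""
    doc = {"@context": "https://schema.org", "@type": "Dataset"}
    if "title" in record:
        doc["name"] = record["title"]
    if "description" in record:
        doc["description"] = record["description"]
    if "identifier" in record:
        doc["identifier"] = record["identifier"]["identifier"]
    if record.get("keywords"):
        doc["keywords"] = [kw["value"] for kw in record["keywords"]]
    return doc


# Invented input record, for illustration only.
record = {
    "title": "toy dataset",
    "description": "An invented record used to illustrate the mapping.",
    "identifier": {"identifier": "example:0001", "identifierSource": "example"},
    "keywords": [{"value": "metadata"}, {"value": "indexing"}],
}

jsonld = to_schema_org_jsonld(record)

if __name__ == "__main__":
    print(json.dumps(jsonld, indent=2))
```

Embedding a document like this in a dataset landing page is what lets general-purpose search engines pick up the metadata.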
Influence on schema.org evolution
https://developers.google.com/search/docs/data-types/datasets
Acknowledgements
(for the bioCADDIE phase of DATS)
doi:10.1038/sdata.2017.59
Work ongoing in the
DCPPC crosscut metadata model
subgroup
❖ Creating DATS examples: https://github.com/dcppc/data-stewards
✧ Oxygen team: https://github.com/dcppc/data-stewards/issues/12
✧ Phosphorus team: in progress
✧ ……
Questions
