Describing Datasets with the Health Care and Life Sciences Community Profile
Alasdair Gray
Department of Computer Science
Heriot-Watt University
www.macs.hw.ac.uk/~ajg33/
A.J.G.Gray@hw.ac.uk
@gray_alasdair
Michel Dumontier
Stanford University
M. Scott Marshall
Netherlands Cancer Institute
Materials released under CC-BY License
You are free to:
• Share — copy and redistribute the material in any medium or format
• Adapt — remix, transform, and build upon the material for any purpose,
even commercially.
The licensor cannot revoke these freedoms as long as you follow the license
terms.
Under the following terms:
• Attribution — You must give appropriate credit, provide a link to the
license, and indicate if changes were made. You may do so in any
reasonable manner, but not in any way that suggests the licensor endorses
you or your use.
4 December 2016 Alasdair Gray @gray_alasdair 2
Outline
• Dataset Descriptions
• Overview
• Use Cases
• Requirements
• Implementations
• Existing Vocabularies
• HCLS Community Profile
• Overview
• Example
• Modules
• Hands On
• Develop your own description
• Validation
• Overview
• Demonstration
• Hands On
• Validate your description
• RDF Dataset Statistics
• Overview
• SPARQL Queries
W3C HCLS Group
Dumontier M, Gray AJG, Marshall MS, et al. (2016) The health care and life sciences
community profile for dataset descriptions. PeerJ 4:e2331 https://doi.org/10.7717/peerj.2331
Use Cases
Overview of a couple of examples
FAIR Data Principles
Findable
• Global persistent identifier
• Rich metadata
• Store metadata in registries
Accessible
• Resolvable identifiers
• Metadata persists
• Machine and human access
Interoperable
• Open data format
• Modelled with FAIR compliant vocabularies
• Reference external data
Reusable
• Rich metadata
• Clear license
• Provenance
FAIR Data
Open PHACTS Drug Discovery Platform
[Figure: Open PHACTS core platform. Components: Linked Data API (RDF/XML, TTL, JSON), Semantic Workflow Engine, Data Cache (Triple Store), Domain Specific Services, Identity Resolution Service, Identifier Management Service. Example inputs: “Adenosine receptor 2a”, EC2.43.4, CS4532, P12374. Source labels: ChEMBL-RDF, ChEMBL v13, Chem2Bio2RDF, SD; version labels: v13, v12, v2 or v8.]
Which version of ChEMBL?
Open PHACTS Discovery Platform
Historic Use Case (~January 2012)
Open PHACTS v2.1
ChEMBL 20
http://tiny.cc/ops-datasets
Challenges
• Datasets available
• In many versions over time
• In different formats
• From many mirrors/registries
• Datasets build on each other
• Files do not carry metadata
• Registries
• Can be out-of-date
• Can contain conflicting information
Scientists require data provenance!
Requirements
Metadata Requirements
• What is the dataset about?
• Name and description
• Content themes
• Who produced the dataset?
• Publisher details
• Content author details
• When was the dataset created?
• Date of creation
• Date of publication
• Versioning
• Where can I get the dataset?
• Licence and rights
• Download URL
• SPARQL endpoint
• Web service
• How was the dataset created?
• Experimental methodology
• Post-processing
• Why was the dataset created?
• Motivation
HCLS Additional Requirements
Standard metadata requirements plus:
1. Resolvable identifiers for metadata
2. Descriptions of data identifiers
3. Data provenance
4. Data statistics
HCLS Implementations
RDF Platform
More coming…
Existing Vocabularies
Dublin Core Metadata Initiative
Widely used
Broadly applicable
• Documents
• Datasets
✗ Generic terms
✗ Not comprehensive
✗ No required properties
“Date: A point or period of time associated with an event in the lifecycle of the resource.”
VoID: Vocabulary of Interlinked Datasets
Metadata carried with data
• Directly embedded: void:inDataset
✗ No versioning
✗ No checklist of requisite fields
✗ Only for RDF data
DCAT: Data Catalog Vocabulary
Separates Dataset and Distribution
✗ No versioning
✗ No prescribed properties
Fixed with DCAT-AP
DCAT-AP doesn’t meet use case needs
HCLS Community Profile
http://www.w3.org/TR/hcls-dataset/
HCLS Dataset Descriptions
• 61 metadata properties
– 5 modules
• 18 vocabularies
– DCTerms
– DCAT
– VoID
– …
ChEMBL Summary Level Description
ChEMBL 17 Version Level Description
ChEMBL 17 DB Distribution
ChEMBL 17 RDF Distribution
Description Modules
Core Metadata (Title & description)
Element | Property | Value | Summary Level | Version Level | Distribution Level
Type declaration | rdf:type | dctypes:Dataset | MUST | MUST | SHOULD
Type declaration | rdf:type | void:Dataset or dcat:Distribution | MUST NOT | MUST NOT | MUST
Title | dct:title | rdf:langString | MUST | MUST | MUST
Alternative titles | dct:alternative | rdf:langString | MAY | MAY | MAY
Description | dct:description | rdf:langString | MUST | MUST | MUST
Core Metadata (Dates & contributors)
Element | Property | Value | Summary Level | Version Level | Distribution Level
Date created | dct:created | rdfs:Literal encoded using the relevant ISO 8601 Date and Time compliant string and typed using the appropriate XML Schema datatype | MUST NOT | SHOULD | SHOULD
Other dates | pav:createdOn or pav:authoredOn or pav:curatedOn | xsd:dateTime, xsd:date, xsd:gYearMonth, or xsd:gYear | MUST NOT | MAY | MAY
Creators | dct:creator | IRI | MUST NOT | MUST | MUST
Contributors | dct:contributor or pav:createdBy or pav:authoredBy or pav:curatedBy | IRI | MUST NOT | MAY | MAY
Date of issue | dct:issued | rdfs:Literal encoded using the relevant ISO 8601 Date and Time compliant string and typed using the appropriate XML Schema datatype | MUST NOT | SHOULD | SHOULD
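The date rows ask for an ISO 8601 string typed with the appropriate XML Schema datatype. A minimal sketch of that choice follows; the function name `typed_date_literal` is hypothetical, and the patterns are simplified assumptions (for example, time zone suffixes on plain dates are not handled).

```python
import re

# Simplified lexical patterns for the four datatypes named in the table.
# Assumption: time zones are only accepted on xsd:dateTime values.
_PATTERNS = [
    ("xsd:gYear", re.compile(r"\d{4}")),
    ("xsd:gYearMonth", re.compile(r"\d{4}-\d{2}")),
    ("xsd:date", re.compile(r"\d{4}-\d{2}-\d{2}")),
    ("xsd:dateTime", re.compile(
        r"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(\.\d+)?(Z|[+-]\d{2}:\d{2})?")),
]


def typed_date_literal(value):
    """Return a Turtle literal typed with the datatype matching the string."""
    for datatype, pattern in _PATTERNS:
        if pattern.fullmatch(value):
            return f'"{value}"^^{datatype}'
    raise ValueError(f"not a recognised ISO 8601 date string: {value!r}")


print(typed_date_literal("2013"))        # year only
print(typed_date_literal("2013-08"))     # year and month
print(typed_date_literal("2013-08-29"))  # full date
```

The sketch only checks the lexical shape of the string; it does not verify that the month or day is a real calendar value.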
Core Metadata (Publisher and licence)
Element | Property | Value | Summary Level | Version Level | Distribution Level
Publisher | dct:publisher | IRI | MUST | MUST | MUST
HTML page | foaf:page | IRI | SHOULD | SHOULD | SHOULD
Logo | schemaorg:logo | IRI | SHOULD | SHOULD | SHOULD
License | dct:license | IRI | MAY | SHOULD | MUST
Rights | dct:rights | rdf:langString | MAY | MAY | MAY
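As an illustration of the core metadata so far, the sketch below writes the properties marked MUST at the summary level (type, title, description, publisher) as a Turtle fragment. The helper names and the example.org IRIs are hypothetical, the description text is invented for the example, and prefix declarations are omitted.

```python
def turtle_string(text, lang):
    """Quote text as a Turtle language-tagged literal (newlines not handled)."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"@{lang}'


def summary_description(iri, title, description, publisher_iri, lang="en"):
    """Build a summary-level description holding only the MUST properties."""
    return "\n".join([
        f"<{iri}>",
        "    rdf:type dctypes:Dataset ;",
        f"    dct:title {turtle_string(title, lang)} ;",
        f"    dct:description {turtle_string(description, lang)} ;",
        f"    dct:publisher <{publisher_iri}> .",
    ])


# Hypothetical IRIs and text, for illustration only.
print(summary_description(
    "http://example.org/dataset/chembl",
    "ChEMBL",
    "A database of bioactive molecules.",
    "http://example.org/org/publisher",
))
```

No dct:created or dct:issued is emitted, because the tables mark dates as MUST NOT at the summary level.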
Core Metadata (Content description)
Element | Property | Value | Summary Level | Version Level | Distribution Level
Keywords | dcat:keyword | xsd:string | MAY | MAY | MAY
Language | dct:language | http://lexvo.org/id/iso639-3/{tag} | MUST NOT | SHOULD | SHOULD
References | dct:references | IRI | MAY | MAY | MAY
Concept descriptors | dcat:theme | IRI of type skos:Concept | MAY | MAY | MAY
Vocabulary used | void:vocabulary | IRI | MUST NOT | MUST NOT | SHOULD
Standards used | dct:conformsTo | IRI | MUST NOT | MAY | SHOULD
Citations | cito:citesAsAuthority | IRI | MAY | MAY | MAY
Related material | rdfs:seeAlso | IRI | MAY | MAY | MAY
Partitions | dct:hasPart | IRI | MAY | MAY | MUST NOT
Identifiers
Element | Property | Value | Summary Level | Version Level | Distribution Level
Preferred prefix | idot:preferredPrefix | xsd:string | MAY | MAY | MAY
Alternate prefix | idot:alternatePrefix | xsd:string | MAY | MAY | MAY
Identifier pattern | idot:identifierPattern | xsd:string | MUST NOT | MUST NOT | MAY
URI pattern | void:uriRegexPattern | xsd:string | MUST NOT | MUST NOT | MAY
File access pattern | idot:accessPattern | idot:AccessPattern | MUST NOT | MUST NOT | MAY
Example identifier | idot:exampleIdentifier | xsd:string | MUST NOT | MUST NOT | SHOULD
Example resource | void:exampleResource | IRI | MUST NOT | MUST NOT | SHOULD
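One sanity check a publisher might run on the identifier module is that the example identifier actually matches the declared identifier pattern. The sketch below assumes the pattern is a regular expression; the pattern and identifier values are hypothetical, and the profile itself does not mandate this check.

```python
import re


def example_matches_pattern(identifier_pattern, example_identifier):
    """True if the whole example identifier matches the declared pattern."""
    return re.fullmatch(identifier_pattern, example_identifier) is not None


# Hypothetical distribution-level values, for illustration only.
distribution = {
    "idot:identifierPattern": r"CHEMBL\d+",
    "idot:exampleIdentifier": "CHEMBL25",
}

print(example_matches_pattern(
    distribution["idot:identifierPattern"],
    distribution["idot:exampleIdentifier"],
))
```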
Provenance and Change (Versioning)
Element | Property | Value | Summary Level | Version Level | Distribution Level
Version identifier | pav:version | xsd:string | MUST NOT | MUST | SHOULD
Version linking | dct:isVersionOf | IRI | MUST NOT | MUST | MUST NOT
Version linking | pav:previousVersion | IRI | MUST NOT | SHOULD | SHOULD
Version linking | pav:hasCurrentVersion | IRI | MAY | MUST NOT | MUST NOT
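The versioning properties tie the summary-level and version-level descriptions together. A small sketch of those links as Turtle statements follows; the function name and the example.org IRIs are hypothetical, and prefix declarations are omitted.

```python
def version_statements(summary_iri, version_iri, version, previous_iri=None):
    """Return Turtle statements linking a version description to its summary."""
    statements = [
        # Summary level: MAY point at the current version.
        f"<{summary_iri}> pav:hasCurrentVersion <{version_iri}> .",
        # Version level: MUST carry a version identifier and link to the summary.
        f'<{version_iri}> pav:version "{version}" .',
        f"<{version_iri}> dct:isVersionOf <{summary_iri}> .",
    ]
    if previous_iri is not None:
        # Version level: SHOULD link to the previous version when there is one.
        statements.append(
            f"<{version_iri}> pav:previousVersion <{previous_iri}> .")
    return statements


for line in version_statements(
        "http://example.org/dataset/chembl",
        "http://example.org/dataset/chembl17",
        "17",
        previous_iri="http://example.org/dataset/chembl16"):
    print(line)
```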
Provenance and Change
Element | Property | Value | Summary Level | Version Level | Distribution Level
Data source provenance | dct:source or pav:retrievedFrom or prov:wasDerivedFrom | IRI | MUST NOT | SHOULD | SHOULD
Item listing | sio:has-data-item | IRI | MUST NOT | MUST NOT | MAY
Creation tool | pav:createdWith | IRI | MUST NOT | SHOULD | SHOULD
Update frequency | dct:accrualPeriodicity | IRI of type dctypes:Frequency | SHOULD | MUST NOT | MUST NOT
Availability and Distributions
Element | Property | Value | Summary Level | Version Level | Distribution Level
Distribution description | dcat:distribution | IRI of Distribution Level description | MUST NOT | SHOULD | MUST NOT
File format | dct:format | IRI or xsd:string | MUST NOT | MUST NOT | MUST
File directory | dcat:accessURL | IRI | MAY | MAY | MAY
File URL | dcat:downloadURL | IRI | MUST NOT | MUST NOT | SHOULD
Byte size | dcat:byteSize | xsd:decimal | MUST NOT | MUST NOT | SHOULD
RDF File URL | void:dataDump | IRI | MUST NOT | MUST NOT | SHOULD
SPARQL endpoint | void:sparqlEndpoint | IRI | SHOULD | SHOULD NOT | SHOULD NOT
Documentation | dcat:landingPage | IRI | MUST NOT | MAY | MAY
Linkset | void:subset | IRI | MUST NOT | MUST NOT | SHOULD
Statistics (RDF Core)
Element | Property | Value | Summary Level | Version Level | Distribution Level
# of triples | void:triples | xsd:integer | MUST NOT | MUST NOT | SHOULD
# of typed entities | void:entities | xsd:integer | MUST NOT | MUST NOT | SHOULD
# of subjects | void:distinctSubjects | xsd:integer | MUST NOT | MUST NOT | SHOULD
# of properties | void:properties | xsd:integer | MUST NOT | MUST NOT | SHOULD
# of objects | void:distinctObjects | xsd:integer | MUST NOT | MUST NOT | SHOULD
# of classes | void:classPartition | IRI | MUST NOT | MUST NOT | SHOULD
# of literals | void:classPartition | IRI | MUST NOT | MUST NOT | SHOULD
# of RDF graphs | void:classPartition | IRI | MUST NOT | MUST NOT | SHOULD
Statistics (RDF Complete)
Element | Property | Value | Summary Level | Version Level | Distribution Level
Class frequency | void:classPartition | IRI | MUST NOT | MUST NOT | MAY
Property frequency | void:propertyPartition | IRI | MUST NOT | MUST NOT | MAY
Property and subject types | void:propertyPartition | IRI | MUST NOT | MUST NOT | MAY
Property and object types | void:propertyPartition | IRI | MUST NOT | MUST NOT | MAY
Property and literals | void:propertyPartition | IRI | MUST NOT | MUST NOT | MAY
Property subject and object types | void:propertyPartition | IRI | MUST NOT | MUST NOT | MAY
Hands on
Create your own dataset description
If you don’t have your own, pick one of the following two:
• PHI-Base: http://www.phi-base.org/
• Guide to Pharmacology: http://www.guidetopharmacology.org/
Validation Service
Andrew Beveridge
Jacob Baungard Hansen
Johnny Val
Leif Gehrmann
Roisin Farmer
Sunil Khutan
Tomas Robertson
Example Constraint
• Shape
• A Dataset Summary
• MUST be declared to be of type dctypes:Dataset
• MUST have a dcterms:title as a language-tagged string
• MUST NOT have a dcterms:created date
Dates are associated with versions in HCLS
Example Validation
• Shape
• Data
Example Validation (Closed Shape)
• Shape
• Data
Shape Expressions (ShEx)
Shape
<Dataset> {
  rdf:type (dctypes:Dataset),
  dct:title rdf:langString,
  dct:alternative rdf:langString+,
  !dct:created .
}
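To make the difference between open and closed shapes concrete, here is a hand-written Python approximation of the <Dataset> shape used in this section. It is not a ShEx engine: the representation of a description (property mapped to a list of values, language-tagged strings as (text, language) pairs) and the function names are assumptions made for the sketch.

```python
# Properties mentioned by the <Dataset> shape in this section.
SHAPE_PROPERTIES = {"rdf:type", "dct:title", "dct:alternative", "dct:created"}


def is_lang_string(value):
    """A language-tagged string is modelled as a (text, language) pair."""
    return (isinstance(value, tuple) and len(value) == 2
            and all(isinstance(part, str) and part for part in value))


def check_dataset(description, closed=False):
    """Return a list of violations; an empty list means the data conforms."""
    errors = []
    if description.get("rdf:type") != ["dctypes:Dataset"]:
        errors.append("rdf:type must be dctypes:Dataset")
    titles = description.get("dct:title", [])
    if len(titles) != 1 or not is_lang_string(titles[0]):
        errors.append("dct:title must be one language-tagged string")
    alternatives = description.get("dct:alternative", [])
    if not alternatives or not all(is_lang_string(a) for a in alternatives):
        errors.append("dct:alternative must be one or more language-tagged strings")
    if "dct:created" in description:
        errors.append("dct:created must not be present")
    if closed:
        # A closed shape also rejects any property the shape does not mention.
        extra = sorted(set(description) - SHAPE_PROPERTIES)
        errors.extend(f"{p} is not allowed in a closed shape" for p in extra)
    return errors


data = {
    "rdf:type": ["dctypes:Dataset"],
    "dct:title": [("ChEMBL", "en")],
    "dct:alternative": [("ChEMBL database", "en")],
    "dct:publisher": ["http://example.org/org/publisher"],  # hypothetical IRI
}
print(check_dataset(data))               # open shape
print(check_dataset(data, closed=True))  # closed shape
```

Under the open reading the extra dct:publisher statement is ignored; under the closed reading it is reported.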
ShEx Validation
<Dataset> {
  rdf:type (dctypes:Dataset),
  dct:title rdf:langString,
  dct:alternative rdf:langString+,
  !dct:created .
}
Example data
Validator can’t warn of missing property
Requirement Levels
Shape
<Dataset> {
  `MUST` rdf:type (dctypes:Dataset),
  `MUST` dct:title rdf:langString,
  `MAY` dct:alternative rdf:langString+,
  `MUST` !dct:created .
}
Validator can warn of missing property
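One plausible reading of requirement levels is that the validator checks only the properties at or above a chosen level. The sketch below illustrates that idea for the annotated <Dataset> shape; it is an assumption about the behaviour, not a description of how Validata is implemented, and all names are hypothetical.

```python
# Requirement levels as annotated on the <Dataset> shape in this section.
SHAPE_LEVELS = {
    "rdf:type": "MUST",
    "dct:title": "MUST",
    "dct:alternative": "MAY",
}
FORBIDDEN = {"dct:created"}  # annotated `MUST` !dct:created
STRICTNESS = ["MUST", "SHOULD", "MAY"]  # validating at "MAY" checks everything


def validate(description, level="MUST"):
    """Report missing properties at or above the chosen requirement level."""
    checked = STRICTNESS[: STRICTNESS.index(level) + 1]
    missing = sorted(prop for prop, req in SHAPE_LEVELS.items()
                     if req in checked and prop not in description)
    forbidden = sorted(FORBIDDEN & set(description))
    return {"missing": missing, "forbidden": forbidden}


description = {"rdf:type": "dctypes:Dataset", "dct:title": "ChEMBL"}
print(validate(description, "MUST"))  # nothing missing at the MUST level
print(validate(description, "MAY"))   # dct:alternative is now reported
```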
http://www.w3.org/2015/03/ShExValidata/
Validata Demo
Validator Hands-On
http://www.w3.org/2015/03/ShExValidata/
• Try the ChEMBL example
• Play with Options
• Add resources
• Switch between open/closed shapes
• Change requirement level
• Check your own work
• Try descriptions from
• Riken Metadatabase
http://metadb.riken.jp/archives/HCLSProfile/HCLSProfile_MetaDB_ALL.ttl
• PHI-Base
https://www.dropbox.com/s/tghogwagn962tmi/Metadata.ttl?dl=0
RDF Dataset Statistics
Why provide rich dataset descriptions?
• Support
• Endpoint exploration
• Query writing
• Eliminates expensive exploratory queries
Generate with SPARQL queries
• Number of triples
• Subject-types related to Object-types
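The queries themselves are not reproduced in this transcript. As a rough stand-in, the sketch below computes the same kind of statistics in plain Python over a tiny in-memory list of triples. The sample data, the convention that literals start with a double quote, and the choice to count only non-literal objects for void:distinctObjects are all assumptions of the sketch.

```python
from collections import Counter

RDF_TYPE = "rdf:type"


def is_literal(term):
    # Assumption for this sketch: literals are written with a leading quote.
    return term.startswith('"')


def core_statistics(triples):
    """Counts comparable to the Statistics (RDF Core) properties."""
    triples = set(triples)  # an RDF graph is a set of triples
    return {
        # Roughly: SELECT (COUNT(*) AS ?triples) { ?s ?p ?o }
        "void:triples": len(triples),
        "void:entities": len({s for s, p, o in triples if p == RDF_TYPE}),
        "void:distinctSubjects": len({s for s, p, o in triples}),
        "void:properties": len({p for s, p, o in triples}),
        "void:distinctObjects": len(
            {o for s, p, o in triples if not is_literal(o)}),
    }


def type_links(triples):
    """Count triples per (subject type, property, object type)."""
    triples = set(triples)
    types = {}
    for s, p, o in triples:
        if p == RDF_TYPE:
            types.setdefault(s, set()).add(o)
    counts = Counter()
    for s, p, o in triples:
        if p == RDF_TYPE or is_literal(o):
            continue
        for subject_type in types.get(s, ()):
            for object_type in types.get(o, ()):
                counts[(subject_type, p, object_type)] += 1
    return counts


# Hypothetical sample data, for illustration only.
sample = [
    ("ex:aspirin", RDF_TYPE, "ex:Compound"),
    ("ex:cox1", RDF_TYPE, "ex:Target"),
    ("ex:aspirin", "ex:inhibits", "ex:cox1"),
    ("ex:aspirin", "rdfs:label", '"aspirin"'),
]
print(core_statistics(sample))
print(dict(type_links(sample)))
```

On a real endpoint these counts would come from SPARQL aggregate queries; publishing them in the description saves each user from running those queries again.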
Summary
FAIR Data Principles
Findable
• Global persistent identifier
• Rich metadata
• Store metadata in registries
Accessible
• Resolvable identifiers
• Metadata persists
• Machine and human access
Interoperable
• Open data format
• Modelled with FAIR compliant vocabularies
• Reference external data
Reusable
• Rich metadata
• Clear license
• Provenance
HCLS Dataset Descriptions
https://www.w3.org/TR/hcls-dataset/
Dumontier M, Gray AJG, Marshall MS, et al. (2016) The health care and life sciences
community profile for dataset descriptions. PeerJ 4:e2331 https://doi.org/10.7717/peerj.2331
A.J.G.Gray@hw.ac.uk @gray_alasdair

More Related Content

What's hot

Yosemite Project - Part 3 - Transformations for Integrating VA data with FHIR...
Yosemite Project - Part 3 - Transformations for Integrating VA data with FHIR...Yosemite Project - Part 3 - Transformations for Integrating VA data with FHIR...
Yosemite Project - Part 3 - Transformations for Integrating VA data with FHIR...
DATAVERSITY
 
WEBINAR: The Yosemite Project: An RDF Roadmap for Healthcare Information Inte...
WEBINAR: The Yosemite Project: An RDF Roadmap for Healthcare Information Inte...WEBINAR: The Yosemite Project: An RDF Roadmap for Healthcare Information Inte...
WEBINAR: The Yosemite Project: An RDF Roadmap for Healthcare Information Inte...
DATAVERSITY
 
Citations needed for the sum of all human knowledge: Wikidata as the missing ...
Citations needed for the sum of all human knowledge: Wikidata as the missing ...Citations needed for the sum of all human knowledge: Wikidata as the missing ...
Citations needed for the sum of all human knowledge: Wikidata as the missing ...
Dario Taraborelli
 
Verifiable, linked open knowledge that anyone can edit
Verifiable, linked open knowledge that anyone can editVerifiable, linked open knowledge that anyone can edit
Verifiable, linked open knowledge that anyone can edit
Dario Taraborelli
 
The DataTags System: Sharing Sensitive Data with Confidence
The DataTags System: Sharing Sensitive Data with ConfidenceThe DataTags System: Sharing Sensitive Data with Confidence
The DataTags System: Sharing Sensitive Data with Confidence
Merce Crosas
 
Jesse Xiao at CODATA2017: Updates to the GigaDB open access data publishing p...
Jesse Xiao at CODATA2017: Updates to the GigaDB open access data publishing p...Jesse Xiao at CODATA2017: Updates to the GigaDB open access data publishing p...
Jesse Xiao at CODATA2017: Updates to the GigaDB open access data publishing p...
GigaScience, BGI Hong Kong
 
Metadata as Linked Data for Research Data Repositories
Metadata as Linked Data for Research Data RepositoriesMetadata as Linked Data for Research Data Repositories
Metadata as Linked Data for Research Data Repositories
andrea huang
 
Wikidata: Verifiable, Linked Open Knowledge That Anyone Can Edit
Wikidata: Verifiable, Linked Open Knowledge That Anyone Can EditWikidata: Verifiable, Linked Open Knowledge That Anyone Can Edit
Wikidata: Verifiable, Linked Open Knowledge That Anyone Can Edit
Dario Taraborelli
 
ORCID: An Overview - Alice Meadows
ORCID: An Overview - Alice MeadowsORCID: An Overview - Alice Meadows
ORCID: An Overview - Alice Meadows
Crossref
 
Using Funding Data
Using Funding DataUsing Funding Data
Using Funding Data
Crossref
 
Data curation at Dryad Digital Repository: A former curator's perspective
Data curation at Dryad Digital Repository: A former curator's perspectiveData curation at Dryad Digital Repository: A former curator's perspective
Data curation at Dryad Digital Repository: A former curator's perspective
Jane Frazier
 
Linked Data for improved organization of research data
Linked Data  for improved organization  of research dataLinked Data  for improved organization  of research data
Linked Data for improved organization of research data
Samuel Lampa
 
Publishing and Consuming FAIR Data A Case in the Agri-Food Domain
Publishing and Consuming FAIR DataA Case in the Agri-Food DomainPublishing and Consuming FAIR DataA Case in the Agri-Food Domain
Publishing and Consuming FAIR Data A Case in the Agri-Food Domain
Rothamsted Research, UK
 
Introduction to CrossRef Text and Data Mining Webinar
Introduction to CrossRef Text and Data Mining WebinarIntroduction to CrossRef Text and Data Mining Webinar
Introduction to CrossRef Text and Data Mining Webinar
Crossref
 
HRGRN: enabling graph search and integrative analysis of Arabidopsis signalin...
HRGRN: enabling graph search and integrative analysis of Arabidopsis signalin...HRGRN: enabling graph search and integrative analysis of Arabidopsis signalin...
HRGRN: enabling graph search and integrative analysis of Arabidopsis signalin...
Araport
 
Your Work is Distinctive, What about Your Name?
Your Work is Distinctive, What about Your Name?Your Work is Distinctive, What about Your Name?
Your Work is Distinctive, What about Your Name?
ORCID, Inc
 
Federated Query Formulation and Processing through BioFed
Federated Query Formulation and Processing through BioFedFederated Query Formulation and Processing through BioFed
Federated Query Formulation and Processing through BioFed
Syed Muhammad Ali Hasnain
 
The ENCODE Portal REST API
The ENCODE Portal REST API The ENCODE Portal REST API
The ENCODE Portal REST API
ENCODE-DCC
 
Open Source Tools Facilitating Sharing/Protecting Privacy: Dataverse and Data...
Open Source Tools Facilitating Sharing/Protecting Privacy: Dataverse and Data...Open Source Tools Facilitating Sharing/Protecting Privacy: Dataverse and Data...
Open Source Tools Facilitating Sharing/Protecting Privacy: Dataverse and Data...
Merce Crosas
 
FundRef Webinar
FundRef WebinarFundRef Webinar
FundRef Webinar
Crossref
 

What's hot (20)

Yosemite Project - Part 3 - Transformations for Integrating VA data with FHIR...
Yosemite Project - Part 3 - Transformations for Integrating VA data with FHIR...Yosemite Project - Part 3 - Transformations for Integrating VA data with FHIR...
Yosemite Project - Part 3 - Transformations for Integrating VA data with FHIR...
 
WEBINAR: The Yosemite Project: An RDF Roadmap for Healthcare Information Inte...
WEBINAR: The Yosemite Project: An RDF Roadmap for Healthcare Information Inte...WEBINAR: The Yosemite Project: An RDF Roadmap for Healthcare Information Inte...
WEBINAR: The Yosemite Project: An RDF Roadmap for Healthcare Information Inte...
 
Citations needed for the sum of all human knowledge: Wikidata as the missing ...
Citations needed for the sum of all human knowledge: Wikidata as the missing ...Citations needed for the sum of all human knowledge: Wikidata as the missing ...
Citations needed for the sum of all human knowledge: Wikidata as the missing ...
 
Verifiable, linked open knowledge that anyone can edit
Verifiable, linked open knowledge that anyone can editVerifiable, linked open knowledge that anyone can edit
Verifiable, linked open knowledge that anyone can edit
 
The DataTags System: Sharing Sensitive Data with Confidence
The DataTags System: Sharing Sensitive Data with ConfidenceThe DataTags System: Sharing Sensitive Data with Confidence
The DataTags System: Sharing Sensitive Data with Confidence
 
Jesse Xiao at CODATA2017: Updates to the GigaDB open access data publishing p...
Jesse Xiao at CODATA2017: Updates to the GigaDB open access data publishing p...Jesse Xiao at CODATA2017: Updates to the GigaDB open access data publishing p...
Jesse Xiao at CODATA2017: Updates to the GigaDB open access data publishing p...
 
Metadata as Linked Data for Research Data Repositories
Metadata as Linked Data for Research Data RepositoriesMetadata as Linked Data for Research Data Repositories
Metadata as Linked Data for Research Data Repositories
 
Wikidata: Verifiable, Linked Open Knowledge That Anyone Can Edit
Wikidata: Verifiable, Linked Open Knowledge That Anyone Can EditWikidata: Verifiable, Linked Open Knowledge That Anyone Can Edit
Wikidata: Verifiable, Linked Open Knowledge That Anyone Can Edit
 
ORCID: An Overview - Alice Meadows
ORCID: An Overview - Alice MeadowsORCID: An Overview - Alice Meadows
ORCID: An Overview - Alice Meadows
 
Using Funding Data
Using Funding DataUsing Funding Data
Using Funding Data
 
Data curation at Dryad Digital Repository: A former curator's perspective
Data curation at Dryad Digital Repository: A former curator's perspectiveData curation at Dryad Digital Repository: A former curator's perspective
Data curation at Dryad Digital Repository: A former curator's perspective
 
Linked Data for improved organization of research data
Linked Data  for improved organization  of research dataLinked Data  for improved organization  of research data
Linked Data for improved organization of research data
 
Publishing and Consuming FAIR Data A Case in the Agri-Food Domain
Publishing and Consuming FAIR DataA Case in the Agri-Food DomainPublishing and Consuming FAIR DataA Case in the Agri-Food Domain
Publishing and Consuming FAIR Data A Case in the Agri-Food Domain
 
Introduction to CrossRef Text and Data Mining Webinar
Introduction to CrossRef Text and Data Mining WebinarIntroduction to CrossRef Text and Data Mining Webinar
Introduction to CrossRef Text and Data Mining Webinar
 
HRGRN: enabling graph search and integrative analysis of Arabidopsis signalin...
HRGRN: enabling graph search and integrative analysis of Arabidopsis signalin...HRGRN: enabling graph search and integrative analysis of Arabidopsis signalin...
HRGRN: enabling graph search and integrative analysis of Arabidopsis signalin...
 
Your Work is Distinctive, What about Your Name?
Your Work is Distinctive, What about Your Name?Your Work is Distinctive, What about Your Name?
Your Work is Distinctive, What about Your Name?
 
Federated Query Formulation and Processing through BioFed
Federated Query Formulation and Processing through BioFedFederated Query Formulation and Processing through BioFed
Federated Query Formulation and Processing through BioFed
 
The ENCODE Portal REST API
The ENCODE Portal REST API The ENCODE Portal REST API
The ENCODE Portal REST API
 
Open Source Tools Facilitating Sharing/Protecting Privacy: Dataverse and Data...
Open Source Tools Facilitating Sharing/Protecting Privacy: Dataverse and Data...Open Source Tools Facilitating Sharing/Protecting Privacy: Dataverse and Data...
Open Source Tools Facilitating Sharing/Protecting Privacy: Dataverse and Data...
 
FundRef Webinar
FundRef WebinarFundRef Webinar
FundRef Webinar
 

Similar to Tutorial: Describing Datasets with the Health Care and Life Sciences Community Profile

Dataset Descriptions in Open PHACTS and HCLS
Dataset Descriptions in Open PHACTS and HCLSDataset Descriptions in Open PHACTS and HCLS
Dataset Descriptions in Open PHACTS and HCLS
Alasdair Gray
 
Re-using Media on the Web: Media fragment re-mixing and playout
Re-using Media on the Web: Media fragment re-mixing and playoutRe-using Media on the Web: Media fragment re-mixing and playout
Re-using Media on the Web: Media fragment re-mixing and playout
MediaMixerCommunity
 
First Steps in Semantic Data Modelling and Search & Analytics in the Cloud
First Steps in Semantic Data Modelling and Search & Analytics in the CloudFirst Steps in Semantic Data Modelling and Search & Analytics in the Cloud
First Steps in Semantic Data Modelling and Search & Analytics in the Cloud
Ontotext
 
Semantic Web talk TEMPLATE
Semantic Web talk TEMPLATESemantic Web talk TEMPLATE
Semantic Web talk TEMPLATE
Oleksiy Pylypenko
 
Apache Any23 - Anything to Triples
Apache Any23 - Anything to TriplesApache Any23 - Anything to Triples
Apache Any23 - Anything to Triples
Michele Mostarda
 
SemanticWeb Nuts 'n Bolts
SemanticWeb Nuts 'n BoltsSemanticWeb Nuts 'n Bolts
SemanticWeb Nuts 'n BoltsRinke Hoekstra
 
20th Feb 2020 json-ld-rdf-im-proposal.pdf
20th Feb 2020 json-ld-rdf-im-proposal.pdf20th Feb 2020 json-ld-rdf-im-proposal.pdf
20th Feb 2020 json-ld-rdf-im-proposal.pdf
Michal Miklas
 
Ld4 l triannon
Ld4 l triannonLd4 l triannon
Ld4 l triannon
Naomi Dushay
 
Flexible metadata schemes for research data repositories - CLARIN Conference'21
Flexible metadata schemes for research data repositories - CLARIN Conference'21Flexible metadata schemes for research data repositories - CLARIN Conference'21
Flexible metadata schemes for research data repositories - CLARIN Conference'21
vty
 
Flexible metadata schemes for research data repositories - Clarin Conference...
Flexible metadata schemes for research data repositories  - Clarin Conference...Flexible metadata schemes for research data repositories  - Clarin Conference...
Flexible metadata schemes for research data repositories - Clarin Conference...
Vyacheslav Tykhonov
 
CTDA MODS and Islandora XML Forms
CTDA MODS and Islandora XML FormsCTDA MODS and Islandora XML Forms
CTDA MODS and Islandora XML Forms
University of Connecticut Libraries
 
Introduction to metadata cleansing using SPARQL update queries
Introduction to metadata cleansing using SPARQL update queriesIntroduction to metadata cleansing using SPARQL update queries
Introduction to metadata cleansing using SPARQL update queries
European Commission
 
The Web Data Commons Microdata, RDFa, and Microformat Dataset Series @ ISWC2014
The Web Data Commons Microdata, RDFa, and Microformat Dataset Series @ ISWC2014The Web Data Commons Microdata, RDFa, and Microformat Dataset Series @ ISWC2014
The Web Data Commons Microdata, RDFa, and Microformat Dataset Series @ ISWC2014Robert Meusel
 
Why I don't use Semantic Web technologies anymore, event if they still influe...
Why I don't use Semantic Web technologies anymore, event if they still influe...Why I don't use Semantic Web technologies anymore, event if they still influe...
Why I don't use Semantic Web technologies anymore, event if they still influe...
Gautier Poupeau
 
CMD2RDF
CMD2RDFCMD2RDF
Scaling up Linked Data
Scaling up Linked DataScaling up Linked Data
Scaling up Linked Data
EUCLID project
 
HDL - Towards A Harmonized Dataset Model for Open Data Portals
HDL - Towards A Harmonized Dataset Model for Open Data PortalsHDL - Towards A Harmonized Dataset Model for Open Data Portals
HDL - Towards A Harmonized Dataset Model for Open Data Portals
Ahmad Assaf
 
How to describe a dataset. Interoperability issues
How to describe a dataset. Interoperability issuesHow to describe a dataset. Interoperability issues
How to describe a dataset. Interoperability issues
Valeria Pesce
 
How to Describe a Dataset. Interoperability Issues, by Valeria Pesce
How to Describe a Dataset. Interoperability Issues, by Valeria PesceHow to Describe a Dataset. Interoperability Issues, by Valeria Pesce
How to Describe a Dataset. Interoperability Issues, by Valeria Pesce
AIMS (Agricultural Information Management Standards)
 
Doctoral Examination at the Karlsruhe Institute of Technology (08.07.2016)
Doctoral Examination at the Karlsruhe Institute of Technology (08.07.2016)Doctoral Examination at the Karlsruhe Institute of Technology (08.07.2016)
Doctoral Examination at the Karlsruhe Institute of Technology (08.07.2016)
Dr.-Ing. Thomas Hartmann
 

Tutorial: Describing Datasets with the Health Care and Life Sciences Community Profile

  • 1. Describing Datasets with the Health Care and Life Sciences Community Profile Alasdair Gray Department of Computer Science Heriot-Watt University www.macs.hw.ac.uk/~ajg33/ A.J.G.Gray@hw.ac.uk @gray_alasdair Michel Dumontier Stanford University M. Scott Marshall Netherlands Cancer Institute
  • 2. Materials released under CC-BY License You are free to: • Share — copy and redistribute the material in any medium or format • Adapt — remix, transform, and build upon the material for any purpose, even commercially. The licensor cannot revoke these freedoms as long as you follow the license terms. Under the following terms: • Attribution — You must give appropriate credit, provide a link to the license, and indicate if changes were made. You may do so in any reasonable manner, but not in any way that suggests the licensor endorses you or your use. 4 December 2016 Alasdair Gray @gray_alasdair 2
  • 3. Outline • Dataset Descriptions • Overview • Use Cases • Requirements • Implementations • Existing Vocabularies • HCLS Community Profile • Overview • Example • Modules • Hands On • Develop your own description • Validation • Overview • Demonstration • Hands On • Validate your description • RDF Dataset Statistics • Overview • SPARQL Queries
  • 4. W3C HCLS Group Dumontier M, Gray AJG, Marshall MS, et al. (2016) The health care and life sciences community profile for dataset descriptions. PeerJ 4:e2331 https://doi.org/10.7717/peerj.2331
  • 5. Use Cases Overview of a couple of examples
  • 6. FAIR Data Principles Findable • Global persistent identifier • Rich metadata • Store metadata in registries Accessible • Resolvable identifiers • Metadata persists • Machine and human access Interoperable • Open data format • Modelled with FAIR compliant vocabularies • Reference external data Reusable • Rich metadata • Clear license • Provenance
  • 7. FAIR Data
  • 8. Open PHACTS Drug Discovery Platform
  • 9. Open PHACTS Drug Discovery Platform
  • 10. Which version of ChEMBL? [Architecture diagram of the Open PHACTS core platform: Data Cache (Triple Store), Semantic Workflow Engine, Linked Data API (RDF/XML, TTL, JSON), Domain Specific Services, Identity Resolution Service, Identifier Management Service. Example identifiers: “Adenosine receptor 2a”, EC2.43.4, CS4532, P12374. Data source labels: ChEMBL-RDF, ChEMBL v13, Chem2Bio2RDF, SD. Version labels: v13, v12, v2 or v8.]
  • 11. Which version of ChEMBL? Open PHACTS Discovery Platform, Historic Use Case ~January 2012 [same architecture diagram as slide 10]. Open PHACTS v2.1: ChEMBL 20 http://tiny.cc/ops-datasets
  • 12. Challenges • Datasets available • In many versions over time • In different formats • From many mirrors/registries • Datasets build on each other • Files do not carry metadata • Registries • Can be out-of-date • Can contain conflicting information Scientists require data provenance!
  • 13. Requirements
  • 14. Metadata Requirements • What is the dataset about? • Name and description • Content themes • Who produced the dataset? • Publisher details • Content author details • When was the dataset created? • Date of creation • Date of publication • Versioning • Where can I get the dataset? • Licence and rights • Download URL • SPARQL endpoint • Web service • How was the dataset created? • Experimental methodology • Post-processing • Why was the dataset created? • Motivation
  • 15. HCLS Additional Requirements Standard metadata requirements plus: 1. Resolvable identifiers for metadata 2. Descriptions of data identifiers 3. Data provenance 4. Data statistics
  • 16. HCLS Implementations RDF Platform More coming…
  • 17. Existing Vocabularies
  • 18. Dublin Core Metadata Initiative Widely used Broadly applicable • Documents • Datasets ✗ Generic terms ✗ Not comprehensive ✗ No required properties “Date: A point or period of time associated with an event in the lifecycle of the resource.”
  • 19. VoID: Vocabulary of Interlinked Datasets Metadata carried with data • Directly embedded: void:inDataset ✗ No versioning ✗ No checklist of requisite fields ✗ Only for RDF data
  • 20. DCAT: Data Catalog ✓ Separates Dataset and Distribution ✗ No versioning ✗ No prescribed properties Fixed with DCAT-AP DCAT-AP doesn’t meet use case needs
  • 21. HCLS Community Profile http://www.w3.org/TR/hcls-dataset/
  • 22. HCLS Dataset Descriptions • 61 Metadata properties – 5 Modules • 18 vocabularies – DCTerms – DCAT – VoID – …
  • 23. ChEMBL Summary Level Description
  • 24. ChEMBL 17 Version Level Description
  • 25. ChEMBL 17 DB Distribution
  • 26. ChEMBL 17 DB Distribution
  • 27. ChEMBL 17 RDF Distribution
  • 28. ChEMBL 17 RDF Distribution
  • 29. ChEMBL 17 RDF Distribution
  • 30. Description Modules
  • 31. Core Metadata (Title & description)
      Element | Property | Value | Summary Level | Version Level | Distribution Level
      Type declaration | rdf:type | dctypes:Dataset | MUST | MUST | SHOULD
      Type declaration | rdf:type | void:Dataset or dcat:Distribution | MUST NOT | MUST NOT | MUST
      Title | dct:title | rdf:langString | MUST | MUST | MUST
      Alternative titles | dct:alternative | rdf:langString | MAY | MAY | MAY
      Description | dct:description | rdf:langString | MUST | MUST | MUST
  • 32. Core Metadata (Dates & contributors)
      Element | Property | Value | Summary Level | Version Level | Distribution Level
      Date created | dct:created | rdfs:Literal encoded using the relevant ISO 8601 Date and Time compliant string and typed using the appropriate XML Schema datatype | MUST NOT | SHOULD | SHOULD
      Other dates | pav:createdOn or pav:authoredOn or pav:curatedOn | xsd:dateTime, xsd:date, xsd:gYearMonth, or xsd:gYear | MUST NOT | MAY | MAY
      Creators | dct:creator | IRI | MUST NOT | MUST | MUST
      Contributors | dct:contributor or pav:createdBy or pav:authoredBy or pav:curatedBy | IRI | MUST NOT | MAY | MAY
      Date of issue | dct:issued | rdfs:Literal encoded using the relevant ISO 8601 Date and Time compliant string and typed using the appropriate XML Schema datatype | MUST NOT | SHOULD | SHOULD
  • 33. Core Metadata (Publisher and licence)
      Element | Property | Value | Summary Level | Version Level | Distribution Level
      Publisher | dct:publisher | IRI | MUST | MUST | MUST
      HTML page | foaf:page | IRI | SHOULD | SHOULD | SHOULD
      Logo | schemaorg:logo | IRI | SHOULD | SHOULD | SHOULD
      License | dct:license | IRI | MAY | SHOULD | MUST
      Rights | dct:rights | rdf:langString | MAY | MAY | MAY
  • 34. Core Metadata (Content description)
      Element | Property | Value | Summary Level | Version Level | Distribution Level
      Keywords | dcat:keyword | xsd:string | MAY | MAY | MAY
      Language | dct:language | http://lexvo.org/id/iso639-3/{tag} | MUST NOT | SHOULD | SHOULD
      References | dct:references | IRI | MAY | MAY | MAY
      Concept descriptors | dcat:theme | IRI of type skos:Concept | MAY | MAY | MAY
      Vocabulary used | void:vocabulary | IRI | MUST NOT | MUST NOT | SHOULD
      Standards used | dct:conformsTo | IRI | MUST NOT | MAY | SHOULD
      Citations | cito:citesAsAuthority | IRI | MAY | MAY | MAY
      Related material | rdfs:seeAlso | IRI | MAY | MAY | MAY
      Partitions | dct:hasPart | IRI | MAY | MAY | MUST NOT
  • 35. Identifiers
      Element | Property | Value | Summary Level | Version Level | Distribution Level
      Preferred prefix | idot:preferredPrefix | xsd:string | MAY | MAY | MAY
      Alternate prefix | idot:alternatePrefix | xsd:string | MAY | MAY | MAY
      Identifier pattern | idot:identifierPattern | xsd:string | MUST NOT | MUST NOT | MAY
      URI pattern | void:uriRegexPattern | xsd:string | MUST NOT | MUST NOT | MAY
      File access pattern | idot:accessPattern | idot:AccessPattern | MUST NOT | MUST NOT | MAY
      Example identifier | idot:exampleIdentifier | xsd:string | MUST NOT | MUST NOT | SHOULD
      Example resource | void:exampleResource | IRI | MUST NOT | MUST NOT | SHOULD
  • 36. Provenance and Change (Versioning)
      Element | Property | Value | Summary Level | Version Level | Distribution Level
      Version identifier | pav:version | xsd:string | MUST NOT | MUST | SHOULD
      Version linking | dct:isVersionOf | IRI | MUST NOT | MUST | MUST NOT
      Version linking | pav:previousVersion | IRI | MUST NOT | SHOULD | SHOULD
      Version linking | pav:hasCurrentVersion | IRI | MAY | MUST NOT | MUST NOT
  • 37. Provenance and Change
      Element | Property | Value | Summary Level | Version Level | Distribution Level
      Data source provenance | dct:source or pav:retrievedFrom or prov:wasDerivedFrom | IRI | MUST NOT | SHOULD | SHOULD
      Item listing | sio:has-data-item | IRI | MUST NOT | MUST NOT | MAY
      Creation tool | pav:createdWith | IRI | MUST NOT | SHOULD | SHOULD
      Update frequency | dct:accrualPeriodicity | IRI of type dctypes:Frequency | SHOULD | MUST NOT | MUST NOT
  • 38. Availability and Distributions
      Element | Property | Value | Summary Level | Version Level | Distribution Level
      Distribution description | dcat:distribution | IRI of Distribution Level description | MUST NOT | SHOULD | MUST NOT
      File format | dct:format | IRI or xsd:string | MUST NOT | MUST NOT | MUST
      File directory | dcat:accessURL | IRI | MAY | MAY | MAY
      File URL | dcat:downloadURL | IRI | MUST NOT | MUST NOT | SHOULD
      Byte size | dcat:byteSize | xsd:decimal | MUST NOT | MUST NOT | SHOULD
      RDF File URL | void:dataDump | IRI | MUST NOT | MUST NOT | SHOULD
      SPARQL endpoint | void:sparqlEndpoint | IRI | SHOULD | SHOULD NOT | SHOULD NOT
      Documentation | dcat:landingPage | IRI | MUST NOT | MAY | MAY
      Linkset | void:subset | IRI | MUST NOT | MUST NOT | SHOULD
  • 39. Statistics (RDF Core)
      Element | Property | Value | Summary Level | Version Level | Distribution Level
      # of triples | void:triples | xsd:integer | MUST NOT | MUST NOT | SHOULD
      # of typed entities | void:entities | xsd:integer | MUST NOT | MUST NOT | SHOULD
      # of subjects | void:distinctSubjects | xsd:integer | MUST NOT | MUST NOT | SHOULD
      # of properties | void:properties | xsd:integer | MUST NOT | MUST NOT | SHOULD
      # of objects | void:distinctObjects | xsd:integer | MUST NOT | MUST NOT | SHOULD
      # of classes | void:classPartition | IRI | MUST NOT | MUST NOT | SHOULD
      # of literals | void:classPartition | IRI | MUST NOT | MUST NOT | SHOULD
      # of RDF graphs | void:classPartition | IRI | MUST NOT | MUST NOT | SHOULD
  • 40. Statistics (RDF Complete)
      Element | Property | Value | Summary Level | Version Level | Distribution Level
      class frequency | void:classPartition | IRI | MUST NOT | MUST NOT | MAY
      property frequency | void:propertyPartition | IRI | MUST NOT | MUST NOT | MAY
      property and subject types | void:propertyPartition | IRI | MUST NOT | MUST NOT | MAY
      property and object types | void:propertyPartition | IRI | MUST NOT | MUST NOT | MAY
      property and literals | void:propertyPartition | IRI | MUST NOT | MUST NOT | MAY
      property subject and object types | void:propertyPartition | IRI | MUST NOT | MUST NOT | MAY
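The requirement levels in the module tables above can be read as a per-level checklist. The following is a minimal Python sketch of that reading, assuming a description is reduced to the list of property names it uses. Only a handful of rows from slides 31, 32, 33 and 36 are copied in; the names `REQUIREMENTS` and `check_description` are hypothetical and are not part of the profile or of any validator.

```python
# Requirement levels copied from a few rows of the module tables.
# The data layout and function name are illustrative assumptions.
LEVELS = ("summary", "version", "distribution")

REQUIREMENTS = {
    "dct:title":       ("MUST", "MUST", "MUST"),
    "dct:description": ("MUST", "MUST", "MUST"),
    "dct:publisher":   ("MUST", "MUST", "MUST"),
    "dct:created":     ("MUST NOT", "SHOULD", "SHOULD"),
    "dct:license":     ("MAY", "SHOULD", "MUST"),
    "pav:version":     ("MUST NOT", "MUST", "SHOULD"),
    "dct:isVersionOf": ("MUST NOT", "MUST", "MUST NOT"),
}


def check_description(properties, level):
    """Return (errors, warnings) for the property names used at one level."""
    column = LEVELS.index(level)
    present = set(properties)
    errors, warnings = [], []
    for prop, levels in REQUIREMENTS.items():
        requirement = levels[column]
        if requirement == "MUST" and prop not in present:
            errors.append(f"missing required {prop}")
        elif requirement == "MUST NOT" and prop in present:
            errors.append(f"forbidden {prop}")
        elif requirement == "SHOULD" and prop not in present:
            warnings.append(f"missing recommended {prop}")
    return errors, warnings


# A summary level description that wrongly carries a creation date.
summary = ["dct:title", "dct:description", "dct:publisher", "dct:created"]
summary_errors, summary_warnings = check_description(summary, "summary")
print(summary_errors)
print(summary_warnings)
```

Running the same check with `"version"` instead of `"summary"` shows how one property list can be acceptable at one level and rejected at another, which is the point of keeping the three levels separate.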
  • 41. Hands on Create your own dataset description If you don’t have your own, pick one of the following two
  • 44. Validation Service: Andrew Beveridge, Jacob Baungard Hansen, Johnny Val, Leif Gehrmann, Roisin Farmer, Sunil Khutan, Tomas Robertson
  • 45. Example Constraint • Shape: A Dataset Summary • MUST be declared to be of type dctypes:Dataset • MUST have a dcterms:title as a language typed string • MUST NOT have dcterms:created date Dates are associated with versions in HCLS
  • 46. Example Validation • Shape • Data
  • 47. Example Validation • Shape • Data
  • 48. Example Validation • Shape • Data
  • 49. Example Validation (Closed Shape) • Shape • Data
  • 50. <Dataset> { rdf:type (dctypes:Dataset), dct:title rdf:langString, dct:alternative rdf:langString+, !dct:created . } Shape 4 December 2016 51 <Dataset> rdf:langString . ✗ Alasdair Gray @gray_alasdair Shape Expressions (ShEx)
  • 51. <Dataset> { rdf:type (dctypes:Dataset), dct:title rdf:langString, dct:alternative rdf:langString+, !dct:created . } <Dataset> { rdf:type (dctypes:Dataset), dct:title rdf:langString, dct:alternative rdf:langString+, !dct:created . } <Dataset> { rdf:type (dctypes:Dataset), dct:title rdf:langString, dct:alternative rdf:langString+, !dct:created . } <Dataset> { rdf:type (dctypes:Dataset), dct:title rdf:langString, dct:alternative rdf:langString+, !dct:created . } <Dataset> { rdf:type (dctypes:Dataset), dct:title rdf:langString, dct:alternative rdf:langString+, !dct:created . } <Dataset> { rdf:type (dctypes:Dataset), dct:title rdf:langString, dct:alternative rdf:langString+, !dct:created . } 4 December 2016 52Alasdair Gray @gray_alasdair Validator can’t warn of missing property Example data ShEx Validation
  • 52. Requirement Levels • Shape: <Dataset> { `MUST` rdf:type (dctypes:Dataset), `MUST` dct:title rdf:langString, `MAY` dct:alternative rdf:langString+, `MUST` !dct:created . } • Validator can warn of missing property
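The effect of requirement levels and of open versus closed shapes can be sketched in a few lines of Python. This is an illustrative sketch only, not the ShEx validator's actual algorithm: the `CONSTRAINTS` table, the `validate` function, and the dictionary representation of a description are all hypothetical, and datatype checks such as rdf:langString are omitted.

```python
# Hypothetical constraint table mirroring the Dataset shape on the slide.
# Each entry maps a property to (requirement level, allowed values or None for "any").
CONSTRAINTS = {
    "rdf:type": ("MUST", {"dctypes:Dataset"}),
    "dct:title": ("MUST", None),
    "dct:alternative": ("MAY", None),
    "dct:created": ("MUST NOT", None),
}


def validate(description, closed=False):
    """Return (errors, warnings) for a description given as {property: [values]}."""
    errors, warnings = [], []
    for prop, (level, allowed) in CONSTRAINTS.items():
        values = description.get(prop, [])
        if level == "MUST NOT":
            if values:
                errors.append(f"{prop} must not be present")
            continue
        if not values:
            # Only MUST and SHOULD properties are reported when absent.
            if level == "MUST":
                errors.append(f"missing required property {prop}")
            elif level == "SHOULD":
                warnings.append(f"missing recommended property {prop}")
            continue
        if allowed is not None and not set(values) <= allowed:
            errors.append(f"unexpected value for {prop}")
    if closed:
        # A closed shape rejects any property the shape does not mention.
        for prop in description:
            if prop not in CONSTRAINTS:
                errors.append(f"{prop} is not allowed by the closed shape")
    return errors, warnings


good = {"rdf:type": ["dctypes:Dataset"], "dct:title": ["ChEMBL"]}
bad = {"rdf:type": ["dctypes:Dataset"], "dct:created": ["2016-12-04"]}
print(validate(good))
print(validate(bad))
```

Here `good` passes, while `bad` is reported both for the missing title and for carrying a creation date at the summary level.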
  • 54. Validator Hands-On
  • 55. http://www.w3.org/2015/03/ShExValidata/ • Try the ChEMBL example • Play with Options • Add resources • Switch between open/closed shapes • Change requirement level • Check your own work • Try descriptions from • Riken Metadatabase http://metadb.riken.jp/archives/HCLSProfile/HCLSProfile_MetaDB_ALL.ttl • PHI-Base https://www.dropbox.com/s/tghogwagn962tmi/Metadata.ttl?dl=0
  • 56. RDF Dataset Statistics
  • 57. Statistics (RDF Core)
      Element | Property | Value | Summary Level | Version Level | Distribution Level
      # of triples | void:triples | xsd:integer | MUST NOT | MUST NOT | SHOULD
      # of typed entities | void:entities | xsd:integer | MUST NOT | MUST NOT | SHOULD
      # of subjects | void:distinctSubjects | xsd:integer | MUST NOT | MUST NOT | SHOULD
      # of properties | void:properties | xsd:integer | MUST NOT | MUST NOT | SHOULD
      # of objects | void:distinctObjects | xsd:integer | MUST NOT | MUST NOT | SHOULD
      # of classes | void:classPartition | IRI | MUST NOT | MUST NOT | SHOULD
      # of literals | void:classPartition | IRI | MUST NOT | MUST NOT | SHOULD
      # of RDF graphs | void:classPartition | IRI | MUST NOT | MUST NOT | SHOULD
  • 58. Statistics (RDF Complete)
      Element | Property | Value | Summary Level | Version Level | Distribution Level
      class frequency | void:classPartition | IRI | MUST NOT | MUST NOT | MAY
      property frequency | void:propertyPartition | IRI | MUST NOT | MUST NOT | MAY
      property and subject types | void:propertyPartition | IRI | MUST NOT | MUST NOT | MAY
      property and object types | void:propertyPartition | IRI | MUST NOT | MUST NOT | MAY
      property and literals | void:propertyPartition | IRI | MUST NOT | MUST NOT | MAY
      property subject and object types | void:propertyPartition | IRI | MUST NOT | MUST NOT | MAY
  • 59. Why provide rich dataset descriptions? • Support • Endpoint exploration • Query writing • Eliminates expensive exploratory queries
  • 62. Generate with SPARQL queries • Number of triples • Subject-types related to Object-types
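The slide generates these statistics with SPARQL; for the triple count, a query along the lines of `SELECT (COUNT(*) AS ?triples) { ?s ?p ?o }` would be typical. To show what such queries count, the following is a Python stand-in over an in-memory list of triples rather than the SPARQL itself. The toy data, the prefixed names, and both function names are hypothetical.

```python
from collections import Counter

RDF_TYPE = "rdf:type"

# Hypothetical toy data: (subject, predicate, object) tuples using prefixed names.
TRIPLES = [
    (":aspirin", RDF_TYPE, ":Compound"),
    (":aspirin", ":hasTarget", ":cox1"),
    (":cox1", RDF_TYPE, ":Protein"),
    (":ibuprofen", RDF_TYPE, ":Compound"),
    (":ibuprofen", ":hasTarget", ":cox1"),
]


def core_statistics(triples):
    """Counts corresponding to the simple integer-valued VoID properties."""
    return {
        "void:triples": len(triples),
        # Typed entities: distinct subjects that have at least one rdf:type.
        "void:entities": len({s for s, p, _ in triples if p == RDF_TYPE}),
        "void:distinctSubjects": len({s for s, _, _ in triples}),
        "void:properties": len({p for _, p, _ in triples}),
        "void:distinctObjects": len({o for _, _, o in triples}),
    }


def subject_object_types(triples):
    """Count triples per (subject type, property, object type) combination."""
    types = {}
    for s, p, o in triples:
        if p == RDF_TYPE:
            types.setdefault(s, set()).add(o)
    counts = Counter()
    for s, p, o in triples:
        if p == RDF_TYPE:
            continue
        for s_type in types.get(s, ()):
            for o_type in types.get(o, ()):
                counts[(s_type, p, o_type)] += 1
    return counts


print(core_statistics(TRIPLES))
print(subject_object_types(TRIPLES))
```

Publishing counts like these in the distribution-level description lets a query writer see, for example, that compounds link to proteins via `:hasTarget` without running the exploratory query against the endpoint.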
  • 63. Summary
  • 64. FAIR Data Principles Findable • Global persistent identifier • Rich metadata • Store metadata in registries Accessible • Resolvable identifiers • Metadata persists • Machine and human access Interoperable • Open data format • Modelled with FAIR compliant vocabularies • Reference external data Reusable • Rich metadata • Clear license • Provenance
  • 65. HCLS Dataset Descriptions https://www.w3.org/TR/hcls-dataset/ Dumontier M, Gray AJG, Marshall MS, et al. (2016) The health care and life sciences community profile for dataset descriptions. PeerJ 4:e2331 https://doi.org/10.7717/peerj.2331 A.J.G.Gray@hw.ac.uk @gray_alasdair

Editor's Notes

  1. Large community buy-in: 27 authors; major data providers (EBI, RIKEN, SIB); wide range of use cases.
  2. OPS Explorer screenshot returning compound information. Open PHACTS as a use case: a data integration platform. One of many use cases gathered in HCLSIG.
  3. Key element is provenance of where the data has come from. Enabled by having detailed descriptions of the sources.
  4. Behind the scenes! ChemSpider: EBI SDF file, ChEMBL 13. Data Cache: Chem2Bio2RDF ChEMBL RDF file downloaded May 2011; Chem2Bio2RDF metadata webpages: ChEMBL 8; file contents: ChEMBL 2. Mapping Server: Kasabi ChEMBL RDF file, ChEMBL 12.
  6. We reuse several properties
  7. Summary level: time-unchanging information, e.g. name, description, publisher. Version level: version-specific information, e.g. version number, creator, etc. Distribution level: file-specific information, e.g. file location and format, number of triples. Reuse vocabularies: DCTerms, DCAT, VoID, FOAF, … Prescribed properties: MUST, SHOULD, MAY, MUST NOT for each level.
  8. 21 properties: 4 MUST, 4 SHOULD, 13 MAY.
  9. Link into data publishing pipeline via API. Not tied to HCLS, only a motivation. No existing tool meets these needs.
  10. Constraints form a graph pattern that data must comply with
  11. How do we validate that our example data conforms to a certain shape? Express the expected shape as ShEx. This is a toy example; what about for real?
  15. ShEx: concise, regex-based notation. W3C SHACL was not stable when the work was done. ShEx is an implementation of SHACL with extra features.
  16. Step through validation process
  17. Extended ShEx to allow arbitrary hierarchies. Toy example; what about for real?
  18. ShEx-validator has other dependencies too: Minimist (arguments parser), Promise (callbacks), Pegjs (parser generator), Mocha (test-driven development).
  19. Summary level: time-unchanging information, e.g. name, description, publisher. Version level: version-specific information, e.g. version number, creator, etc. Distribution level: file-specific information, e.g. file location and format, number of triples. Acknowledge W3C HCLS IG.