Describing Datasets with the W3C HCLS Standard
Melissa Haendel
Michel Dumontier
World Wide Web Consortium (W3C)
• The W3C is the main international standards organization for the World Wide Web
• The W3C is made up of over 400 member organizations that work together to develop standards for the World Wide Web
• The W3C has sophisticated development and community validation procedures for standards development
The Semantic Web is the new global web of knowledge
It involves standards for publishing, sharing, and querying facts, expert knowledge and services
It is a scalable approach to the discovery of independently formulated and distributed knowledge
Cyganiak and Jentzsch. http://lod-cloud.net/
Resource Description Framework
• Language to represent knowledge
– Logic-based formalism -> automated reasoning
– Graph-like properties -> data analysis
– Good for:
◦ Describing things in terms of type, attributes, relations
◦ Integrating data from different sources
◦ Sharing the data (W3C standard)
– Reusing what is available, developing what you need, and contributing back to the web of data
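As a rough illustration of the points above, the sketch below treats RDF statements as (subject, predicate, object) tuples in plain Python and integrates two sources by set union. It does not use an RDF library, and every `ex:` identifier is made up for the example.

```python
# Each RDF statement is a (subject, predicate, object) triple.
# All "ex:" names are hypothetical and exist only for this illustration.
source_a = {
    ("ex:dataset1", "rdf:type", "dctypes:Dataset"),
    ("ex:dataset1", "dct:title", '"Example dataset"'),
}
source_b = {
    ("ex:dataset1", "dct:publisher", "ex:lab42"),
    ("ex:dataset1", "rdf:type", "dctypes:Dataset"),  # same statement as in source_a
}

# Integration is a set union: shared identifiers line up and duplicates collapse.
merged = source_a | source_b


def describe(graph, subject):
    """Return {predicate: sorted list of objects} for one subject."""
    out = {}
    for s, p, o in graph:
        if s == subject:
            out.setdefault(p, []).append(o)
    return {p: sorted(objs) for p, objs in out.items()}


description = describe(merged, "ex:dataset1")
```

Because both sources use the same identifier for the dataset, the merged graph holds three distinct statements about it rather than four.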
Challenge: Working with Web Data
• Datasets often have inadequate descriptions, so we don’t know what they are about or how they were constructed
• Datasets change over time, but often don’t come with versioning information
• Datasets may have been constructed using other data, but it’s not clear which version of that data was used or whether it was modified
• Data may be available in a variety of formats
• There may be multiple copies of data from different providers, but it’s unclear if they are exact copies or derivatives
• The version of the standard or vocabulary used is often not indicated
• Data registries are not synchronized and can contain conflicting information
Key Use Cases for HCLS Dataset Description
1. Dataset Identification, Description, Licensing and Provenance
2. Dataset Discovery (via Catalog)
3. Exchange of Dataset Descriptions
4. Dataset Linking
5. Content Summary
6. Monitoring of Dataset Changes
Objectives
• Develop a guidance note for reusing existing vocabularies to describe datasets with RDF
– Mandatory, recommended, optional descriptors
– Identifiers
– Versioning
– Attribution
– Provenance
– Content summarization
• Recommend vocabulary-linked attributes and value sets
• Provide reference editor and validation
We compiled a list of metadata fields used across the community
and then surveyed over 20 vocabularies to see if they provided relevant metadata elements or value sets…
…to produce a big spreadsheet that maps metadata needs to existing vocabularies
Dublin Core Metadata Initiative
Widely used
Broadly applicable
– Documents
– Datasets
✗Generic terms
✗Not comprehensive
✗No required properties
“Date: A point or period of time
associated with an event in the
lifecycle of the resource.”
DCAT: Data Catalog Vocabulary
• Separates Dataset and Distribution
✗No versioning
✗No prescribed properties
No single vocabulary provides
all key metadata fields
http://tiny.cc/hcls-datadesc
Included Vocabularies
Three Component Metadata Model:
description – version – distribution
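A minimal sketch of how the three levels might link together, written as plain Python tuples rather than with an RDF library. The `ex:` identifiers and the helper name `three_level_description` are hypothetical, and the linking properties used here (`pav:hasCurrentVersion`, `dct:isVersionOf`, `dcat:distribution`) are one reading of the HCLS note, which remains the authority on which terms apply at each level.

```python
def three_level_description(name, version, fmt):
    """Return triples for the description, version, and distribution levels."""
    summary = f"ex:{name}"                       # version-independent description
    versioned = f"ex:{name}_{version}"           # one specific release
    distribution = f"ex:{name}_{version}_{fmt}"  # one file format of that release
    return [
        (summary, "rdf:type", "dctypes:Dataset"),
        (summary, "pav:hasCurrentVersion", versioned),
        (versioned, "rdf:type", "dctypes:Dataset"),
        (versioned, "dct:isVersionOf", summary),
        (versioned, "pav:version", f'"{version}"'),
        (versioned, "dcat:distribution", distribution),
        (distribution, "rdf:type", "dcat:Distribution"),
    ]


triples = three_level_description("mydata", "v1", "ttl")
for s, p, o in triples:
    print(s, p, o, ".")
```

Keeping the levels as separate resources means a new release only adds a new version node and its distributions, while the top-level description stays stable.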
Description
• Identifiers
• Title
• Description
• Homepage
• License
• Language
• Keywords
• Concepts and vocabularies used
• Standards
• Publication
Attribution
• Simple Model
– Individuals are related to roles using specific properties, e.g. dct:creator, pav:createdBy, pav:curatedBy
• Expandable Model
– Individuals are related to roles and dates via an associated object
– PROV, VIVO-ISF
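The two attribution models can be contrasted with a small sketch. The simple model uses the properties named on the slide; the expandable model is shown with PROV's `prov:qualifiedAttribution` and `prov:agent`, which is an assumption about how the associated object could be modelled. The `ex:` identifiers, including the `ex:inRole` and `ex:onDate` properties, are hypothetical placeholders.

```python
dataset = "ex:mydata_v1"  # hypothetical dataset identifier
person = "ex:alice"       # hypothetical contributor

# Simple model: one property per role, pointing straight at the individual.
simple = [
    (dataset, "dct:creator", person),
    (dataset, "pav:curatedBy", person),
]

# Expandable model: an intermediate node carries the agent, the role, and the date.
attribution = "ex:attribution1"  # hypothetical node for the associated object
expandable = [
    (dataset, "prov:qualifiedAttribution", attribution),
    (attribution, "rdf:type", "prov:Attribution"),
    (attribution, "prov:agent", person),
    (attribution, "ex:inRole", "ex:curator"),    # hypothetical role property and term
    (attribution, "ex:onDate", '"2015-01-01"'),  # hypothetical date property and value
]


def contributors(triples):
    """Collect every individual named by either model."""
    direct = {"dct:creator", "pav:createdBy", "pav:curatedBy"}
    return {o for _, p, o in triples if p in direct or p == "prov:agent"}
```

Both lists name the same contributor; the expandable form costs more triples but leaves room to attach further details to the contribution itself.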
Provenance and Change
• Version number
• Source
• Provenance: retrieved from, derived from, created with
• Frequency of change
Availability
• Format
• Download URL
• Landing page
• SPARQL endpoint
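The provenance and availability fields listed on these slides could be mapped to vocabulary terms along the following lines. The pairing of each field with a term is an assumption for illustration (the terms themselves come from PAV, Dublin Core Terms, DCAT, and VoID), and for brevity the sketch puts everything on one hypothetical subject, whereas the note assigns properties to specific levels.

```python
# Slide field -> vocabulary term; this pairing is an assumption for illustration.
FIELD_TO_TERM = {
    "version": "pav:version",
    "source": "dct:source",
    "retrieved_from": "pav:retrievedFrom",
    "derived_from": "pav:derivedFrom",
    "created_with": "pav:createdWith",
    "frequency": "dct:accrualPeriodicity",
    "format": "dct:format",
    "download_url": "dcat:downloadURL",
    "landing_page": "dcat:landingPage",
    "sparql_endpoint": "void:sparqlEndpoint",
}


def to_triples(subject, record):
    """Turn a {field: value} record into triples, skipping unknown fields."""
    return [
        (subject, FIELD_TO_TERM[field], value)
        for field, value in record.items()
        if field in FIELD_TO_TERM
    ]


record = {
    "version": '"1.0"',
    "derived_from": "ex:upstream_v3",                   # hypothetical source dataset
    "download_url": "<http://example.org/mydata.ttl>",  # placeholder URL
    "colour": "blue",                                   # not a known field, so it is dropped
}
triples = to_triples("ex:mydata_v1_ttl", record)
```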
VoID Editor
Tools to create the metadata
Tools to validate the metadata
New version using ShEx in development
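To give a feel for what validating a description involves, here is a small plain-Python check for missing properties. It is not ShEx and not the validator mentioned on the slide; the requirement levels in `REQUIREMENTS` are assumed for illustration only, and the HCLS note defines the real ones.

```python
# Assumed requirement levels for illustration only.
REQUIREMENTS = {
    "dct:title": "mandatory",
    "dct:description": "mandatory",
    "dct:license": "recommended",
    "dcat:keyword": "optional",
}


def check(subject, triples):
    """Report which mandatory and recommended properties a subject lacks."""
    present = {p for s, p, _ in triples if s == subject}
    report = {"errors": [], "warnings": []}
    for prop, level in REQUIREMENTS.items():
        if prop in present:
            continue
        if level == "mandatory":
            report["errors"].append(prop)
        elif level == "recommended":
            report["warnings"].append(prop)
    return report


triples = [("ex:mydata", "dct:title", '"My data"')]  # hypothetical description
report = check("ex:mydata", triples)
```

Missing mandatory properties are reported as errors and missing recommended ones as warnings, while optional properties are never flagged.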
HCLS:
http://www.w3.org/blog/hcls/
Mailing list: http://lists.w3.org/Archives/Public/public-semweb-lifesci/
Editors’ Draft:
http://tiny.cc/hcls-datadesc-ed
W3C Interest Group Note:
http://tiny.cc/hcls-datadesc
Special thanks to Alasdair Gray, Scott Marshall, Joachim Baran
Thanks to all other contributors to the HCLS note

More Related Content

What's hot

Libraries and Data Management
Libraries and Data ManagementLibraries and Data Management
Libraries and Data Management
University of California Curation Center
 
Correcting and Updating the Scholarly Record through CrossMark
Correcting and Updating the Scholarly Record through CrossMarkCorrecting and Updating the Scholarly Record through CrossMark
Correcting and Updating the Scholarly Record through CrossMark
Crossref
 
ORCID for Publishers
ORCID for PublishersORCID for Publishers
ORCID for Publishers
ORCID, Inc
 
Crossref for Ambassadors - Introductory webinar
Crossref for Ambassadors - Introductory webinarCrossref for Ambassadors - Introductory webinar
Crossref for Ambassadors - Introductory webinar
Vanessa Fairhurst
 
Steven McEachern - ADA, DDI (metadata standard) and the Data Lifecycle
Steven McEachern - ADA, DDI (metadata standard) and the Data LifecycleSteven McEachern - ADA, DDI (metadata standard) and the Data Lifecycle
Steven McEachern - ADA, DDI (metadata standard) and the Data Lifecycle
Steve Androulakis
 
ADA, DDI and the data lifecycle - Steve McEachern - 7 April 2017
ADA, DDI and the data lifecycle - Steve McEachern - 7 April 2017ADA, DDI and the data lifecycle - Steve McEachern - 7 April 2017
ADA, DDI and the data lifecycle - Steve McEachern - 7 April 2017
ARDC
 
Sharepoint taxonomy introduction us
Sharepoint taxonomy introduction   usSharepoint taxonomy introduction   us
Sharepoint taxonomy introduction us
QUONTRASOLUTIONS
 
ORCID for Universities & Research Organizations
ORCID for Universities & Research OrganizationsORCID for Universities & Research Organizations
ORCID for Universities & Research Organizations
ORCID, Inc
 
ORCID & Iam @ UM - Outreach
ORCID & Iam @ UM - OutreachORCID & Iam @ UM - Outreach
ORCID & Iam @ UM - Outreach
Elaine Westbrooks
 
How Portable Are the Metadata Standards for Scientific Data?
How Portable Are the Metadata Standards for Scientific Data?How Portable Are the Metadata Standards for Scientific Data?
How Portable Are the Metadata Standards for Scientific Data?
Jian Qin
 
ORCID for funding organizations
ORCID for funding organizationsORCID for funding organizations
ORCID for funding organizations
ORCID, Inc
 
Jisc UK ORCID Support: onboarding webinar
Jisc UK ORCID Support: onboarding webinarJisc UK ORCID Support: onboarding webinar
Jisc UK ORCID Support: onboarding webinar
Jisc
 
Rebecca Grant - DRI/ARA(I) Training: Introduction to EAD - Metadata and Metad...
Rebecca Grant - DRI/ARA(I) Training: Introduction to EAD - Metadata and Metad...Rebecca Grant - DRI/ARA(I) Training: Introduction to EAD - Metadata and Metad...
Rebecca Grant - DRI/ARA(I) Training: Introduction to EAD - Metadata and Metad...
dri_ireland
 
DSpace-CRIS 7: What is Coming? OR2020
DSpace-CRIS 7: What is Coming? OR2020DSpace-CRIS 7: What is Coming? OR2020
DSpace-CRIS 7: What is Coming? OR2020
4Science
 
FAIRsharing: how we assist with FAIRness
FAIRsharing: how we assist with FAIRnessFAIRsharing: how we assist with FAIRness
FAIRsharing: how we assist with FAIRness
Susanna-Assunta Sansone
 
Leverage DSpace for an enterprise, mission critical platform
Leverage DSpace for an enterprise, mission critical platformLeverage DSpace for an enterprise, mission critical platform
Leverage DSpace for an enterprise, mission critical platform
Andrea Bollini
 
Linked data 20171106
Linked data 20171106Linked data 20171106
Linked data 20171106
Synaptica, LLC
 
Meta data
Meta dataMeta data
Meta data
MoonFandrA
 
Introduction to Crossref - Crossref LIVE Bangkok
Introduction to Crossref - Crossref LIVE BangkokIntroduction to Crossref - Crossref LIVE Bangkok
Introduction to Crossref - Crossref LIVE Bangkok
Crossref
 
OAIS: What is it and Where is it Going? - Don Sawyer (2002)
OAIS: What is it and Where is it Going? - Don Sawyer (2002)OAIS: What is it and Where is it Going? - Don Sawyer (2002)
OAIS: What is it and Where is it Going? - Don Sawyer (2002)
faflrt
 

What's hot (20)

Libraries and Data Management
Libraries and Data ManagementLibraries and Data Management
Libraries and Data Management
 
Correcting and Updating the Scholarly Record through CrossMark
Correcting and Updating the Scholarly Record through CrossMarkCorrecting and Updating the Scholarly Record through CrossMark
Correcting and Updating the Scholarly Record through CrossMark
 
ORCID for Publishers
ORCID for PublishersORCID for Publishers
ORCID for Publishers
 
Crossref for Ambassadors - Introductory webinar
Crossref for Ambassadors - Introductory webinarCrossref for Ambassadors - Introductory webinar
Crossref for Ambassadors - Introductory webinar
 
Steven McEachern - ADA, DDI (metadata standard) and the Data Lifecycle
Steven McEachern - ADA, DDI (metadata standard) and the Data LifecycleSteven McEachern - ADA, DDI (metadata standard) and the Data Lifecycle
Steven McEachern - ADA, DDI (metadata standard) and the Data Lifecycle
 
ADA, DDI and the data lifecycle - Steve McEachern - 7 April 2017
ADA, DDI and the data lifecycle - Steve McEachern - 7 April 2017ADA, DDI and the data lifecycle - Steve McEachern - 7 April 2017
ADA, DDI and the data lifecycle - Steve McEachern - 7 April 2017
 
Sharepoint taxonomy introduction us
Sharepoint taxonomy introduction   usSharepoint taxonomy introduction   us
Sharepoint taxonomy introduction us
 
ORCID for Universities & Research Organizations
ORCID for Universities & Research OrganizationsORCID for Universities & Research Organizations
ORCID for Universities & Research Organizations
 
ORCID & Iam @ UM - Outreach
ORCID & Iam @ UM - OutreachORCID & Iam @ UM - Outreach
ORCID & Iam @ UM - Outreach
 
How Portable Are the Metadata Standards for Scientific Data?
How Portable Are the Metadata Standards for Scientific Data?How Portable Are the Metadata Standards for Scientific Data?
How Portable Are the Metadata Standards for Scientific Data?
 
ORCID for funding organizations
ORCID for funding organizationsORCID for funding organizations
ORCID for funding organizations
 
Jisc UK ORCID Support: onboarding webinar
Jisc UK ORCID Support: onboarding webinarJisc UK ORCID Support: onboarding webinar
Jisc UK ORCID Support: onboarding webinar
 
Rebecca Grant - DRI/ARA(I) Training: Introduction to EAD - Metadata and Metad...
Rebecca Grant - DRI/ARA(I) Training: Introduction to EAD - Metadata and Metad...Rebecca Grant - DRI/ARA(I) Training: Introduction to EAD - Metadata and Metad...
Rebecca Grant - DRI/ARA(I) Training: Introduction to EAD - Metadata and Metad...
 
DSpace-CRIS 7: What is Coming? OR2020
DSpace-CRIS 7: What is Coming? OR2020DSpace-CRIS 7: What is Coming? OR2020
DSpace-CRIS 7: What is Coming? OR2020
 
FAIRsharing: how we assist with FAIRness
FAIRsharing: how we assist with FAIRnessFAIRsharing: how we assist with FAIRness
FAIRsharing: how we assist with FAIRness
 
Leverage DSpace for an enterprise, mission critical platform
Leverage DSpace for an enterprise, mission critical platformLeverage DSpace for an enterprise, mission critical platform
Leverage DSpace for an enterprise, mission critical platform
 
Linked data 20171106
Linked data 20171106Linked data 20171106
Linked data 20171106
 
Meta data
Meta dataMeta data
Meta data
 
Introduction to Crossref - Crossref LIVE Bangkok
Introduction to Crossref - Crossref LIVE BangkokIntroduction to Crossref - Crossref LIVE Bangkok
Introduction to Crossref - Crossref LIVE Bangkok
 
OAIS: What is it and Where is it Going? - Don Sawyer (2002)
OAIS: What is it and Where is it Going? - Don Sawyer (2002)OAIS: What is it and Where is it Going? - Don Sawyer (2002)
OAIS: What is it and Where is it Going? - Don Sawyer (2002)
 

Viewers also liked

Electronic industries association
Electronic industries associationElectronic industries association
Electronic industries association
lindavargas33
 
Organismos que rigen el cableado estructurado
Organismos que rigen el cableado estructuradoOrganismos que rigen el cableado estructurado
Organismos que rigen el cableado estructurado
cristianvillada
 
Buliding a DCAT Merger (SemDev 2015)
Buliding a DCAT Merger (SemDev 2015)Buliding a DCAT Merger (SemDev 2015)
Buliding a DCAT Merger (SemDev 2015)
Pieter Heyvaert
 
How open is open? An evaluation rubric for public knowledgebases
How open is open?  An evaluation rubric for public knowledgebasesHow open is open?  An evaluation rubric for public knowledgebases
How open is open? An evaluation rubric for public knowledgebases
mhaendel
 
The Application of the Human Phenotype Ontology
The Application of the Human Phenotype Ontology The Application of the Human Phenotype Ontology
The Application of the Human Phenotype Ontology
mhaendel
 
Wikidata workshop for ISB Biocuration 2016
Wikidata workshop for ISB Biocuration 2016Wikidata workshop for ISB Biocuration 2016
Wikidata workshop for ISB Biocuration 2016
Benjamin Good
 
The Monarch Initiative: A semantic phenomics approach to disease discovery
The Monarch Initiative: A semantic phenomics approach to disease discoveryThe Monarch Initiative: A semantic phenomics approach to disease discovery
The Monarch Initiative: A semantic phenomics approach to disease discovery
mhaendel
 

Viewers also liked (7)

Electronic industries association
Electronic industries associationElectronic industries association
Electronic industries association
 
Organismos que rigen el cableado estructurado
Organismos que rigen el cableado estructuradoOrganismos que rigen el cableado estructurado
Organismos que rigen el cableado estructurado
 
Buliding a DCAT Merger (SemDev 2015)
Buliding a DCAT Merger (SemDev 2015)Buliding a DCAT Merger (SemDev 2015)
Buliding a DCAT Merger (SemDev 2015)
 
How open is open? An evaluation rubric for public knowledgebases
How open is open?  An evaluation rubric for public knowledgebasesHow open is open?  An evaluation rubric for public knowledgebases
How open is open? An evaluation rubric for public knowledgebases
 
The Application of the Human Phenotype Ontology
The Application of the Human Phenotype Ontology The Application of the Human Phenotype Ontology
The Application of the Human Phenotype Ontology
 
Wikidata workshop for ISB Biocuration 2016
Wikidata workshop for ISB Biocuration 2016Wikidata workshop for ISB Biocuration 2016
Wikidata workshop for ISB Biocuration 2016
 
The Monarch Initiative: A semantic phenomics approach to disease discovery
The Monarch Initiative: A semantic phenomics approach to disease discoveryThe Monarch Initiative: A semantic phenomics approach to disease discovery
The Monarch Initiative: A semantic phenomics approach to disease discovery
 

Similar to Dataset description using the W3C HCLS standard

Metadata lecture(9 17-14)
Metadata lecture(9 17-14)Metadata lecture(9 17-14)
Metadata lecture(9 17-14)
mhb120
 
A Data Citation Roadmap for Scholarly Data Repositories
A Data Citation Roadmap for Scholarly Data RepositoriesA Data Citation Roadmap for Scholarly Data Repositories
A Data Citation Roadmap for Scholarly Data Repositories
LIBER Europe
 
Metadata lecture 3, metadata schemes
Metadata lecture 3, metadata schemesMetadata lecture 3, metadata schemes
Metadata lecture 3, metadata schemes
Richard.Sapon-White
 
Learning Registry Overview Aug 2 2012
Learning Registry Overview Aug 2 2012Learning Registry Overview Aug 2 2012
Learning Registry Overview Aug 2 2012
Jeanne Kitchens
 
W3C HCLS Dataset Description Guidelines
W3C HCLS Dataset Description GuidelinesW3C HCLS Dataset Description Guidelines
W3C HCLS Dataset Description Guidelines
Michel Dumontier
 
Exploration of a Data Landscape using a Collaborative Linked Data Framework.
Exploration of a Data Landscape using a Collaborative Linked Data Framework.Exploration of a Data Landscape using a Collaborative Linked Data Framework.
Exploration of a Data Landscape using a Collaborative Linked Data Framework.
Laurent Alquier
 
A theory of Metadata enriching & filtering
A theory of  Metadata enriching & filteringA theory of  Metadata enriching & filtering
A theory of Metadata enriching & filtering
Cuerpo Academico 'Estudios de la Información'
 
Semantic web 101: Benefits for geologists
Semantic web 101: Benefits for geologistsSemantic web 101: Benefits for geologists
Semantic web 101: Benefits for geologists
dgarijo
 
W3C Library Linked Data Incubator Group - 2011
W3C Library Linked Data Incubator Group  - 2011W3C Library Linked Data Incubator Group  - 2011
W3C Library Linked Data Incubator Group - 2011
Antoine Isaac
 
The Scholix Framework and the OpenAIRE Scholexplorer Service (OpenAIRE webina...
The Scholix Framework and the OpenAIRE Scholexplorer Service (OpenAIRE webina...The Scholix Framework and the OpenAIRE Scholexplorer Service (OpenAIRE webina...
The Scholix Framework and the OpenAIRE Scholexplorer Service (OpenAIRE webina...
OpenAIRE
 
Linked Open Data in the World of Patents
Linked Open Data in the World of Patents Linked Open Data in the World of Patents
Linked Open Data in the World of Patents
Dr. Haxel Consult
 
Dataverse, Cloud Dataverse, and DataTags
Dataverse, Cloud Dataverse, and DataTagsDataverse, Cloud Dataverse, and DataTags
Dataverse, Cloud Dataverse, and DataTags
Merce Crosas
 
Buildvoc Introduction to linked data digital construction week 2018
Buildvoc Introduction to linked data digital construction week 2018Buildvoc Introduction to linked data digital construction week 2018
Buildvoc Introduction to linked data digital construction week 2018
Phil Stacey ICIOB
 
Perspectives on the Role of Trustworthy Repository Standards in Data Journal ...
Perspectives on the Role of Trustworthy Repository Standards in Data Journal ...Perspectives on the Role of Trustworthy Repository Standards in Data Journal ...
Perspectives on the Role of Trustworthy Repository Standards in Data Journal ...
The University of Edinburgh
 
Next Generation Repositories
Next Generation RepositoriesNext Generation Repositories
Next Generation Repositories
ukcorr
 
How to describe a dataset. Interoperability issues
How to describe a dataset. Interoperability issuesHow to describe a dataset. Interoperability issues
How to describe a dataset. Interoperability issues
Valeria Pesce
 
How to Describe a Dataset. Interoperability Issues, by Valeria Pesce
How to Describe a Dataset. Interoperability Issues, by Valeria PesceHow to Describe a Dataset. Interoperability Issues, by Valeria Pesce
How to Describe a Dataset. Interoperability Issues, by Valeria Pesce
AIMS (Agricultural Information Management Standards)
 
Linked Data, Library Users, and the Discovery Tools of the Future
Linked Data, Library Users, and the Discovery Tools of the FutureLinked Data, Library Users, and the Discovery Tools of the Future
Linked Data, Library Users, and the Discovery Tools of the Future
Emily Nimsakont
 
Understanding Data
Understanding Data Understanding Data
Understanding Data
Kingsley Uyi Idehen
 
Linked data HHS 2015
Linked data HHS 2015Linked data HHS 2015
Linked data HHS 2015
Cason Snow
 

Similar to Dataset description using the W3C HCLS standard (20)

Metadata lecture(9 17-14)
Metadata lecture(9 17-14)Metadata lecture(9 17-14)
Metadata lecture(9 17-14)
 
A Data Citation Roadmap for Scholarly Data Repositories
A Data Citation Roadmap for Scholarly Data RepositoriesA Data Citation Roadmap for Scholarly Data Repositories
A Data Citation Roadmap for Scholarly Data Repositories
 
Metadata lecture 3, metadata schemes
Metadata lecture 3, metadata schemesMetadata lecture 3, metadata schemes
Metadata lecture 3, metadata schemes
 
Learning Registry Overview Aug 2 2012
Learning Registry Overview Aug 2 2012Learning Registry Overview Aug 2 2012
Learning Registry Overview Aug 2 2012
 
W3C HCLS Dataset Description Guidelines
W3C HCLS Dataset Description GuidelinesW3C HCLS Dataset Description Guidelines
W3C HCLS Dataset Description Guidelines
 
Exploration of a Data Landscape using a Collaborative Linked Data Framework.
Exploration of a Data Landscape using a Collaborative Linked Data Framework.Exploration of a Data Landscape using a Collaborative Linked Data Framework.
Exploration of a Data Landscape using a Collaborative Linked Data Framework.
 
A theory of Metadata enriching & filtering
A theory of  Metadata enriching & filteringA theory of  Metadata enriching & filtering
A theory of Metadata enriching & filtering
 
Semantic web 101: Benefits for geologists
Semantic web 101: Benefits for geologistsSemantic web 101: Benefits for geologists
Semantic web 101: Benefits for geologists
 
W3C Library Linked Data Incubator Group - 2011
W3C Library Linked Data Incubator Group  - 2011W3C Library Linked Data Incubator Group  - 2011
W3C Library Linked Data Incubator Group - 2011
 
The Scholix Framework and the OpenAIRE Scholexplorer Service (OpenAIRE webina...
The Scholix Framework and the OpenAIRE Scholexplorer Service (OpenAIRE webina...The Scholix Framework and the OpenAIRE Scholexplorer Service (OpenAIRE webina...
The Scholix Framework and the OpenAIRE Scholexplorer Service (OpenAIRE webina...
 
Linked Open Data in the World of Patents
Linked Open Data in the World of Patents Linked Open Data in the World of Patents
Linked Open Data in the World of Patents
 
Dataverse, Cloud Dataverse, and DataTags
Dataverse, Cloud Dataverse, and DataTagsDataverse, Cloud Dataverse, and DataTags
Dataverse, Cloud Dataverse, and DataTags
 
Buildvoc Introduction to linked data digital construction week 2018
Buildvoc Introduction to linked data digital construction week 2018Buildvoc Introduction to linked data digital construction week 2018
Buildvoc Introduction to linked data digital construction week 2018
 
Perspectives on the Role of Trustworthy Repository Standards in Data Journal ...
Perspectives on the Role of Trustworthy Repository Standards in Data Journal ...Perspectives on the Role of Trustworthy Repository Standards in Data Journal ...
Perspectives on the Role of Trustworthy Repository Standards in Data Journal ...
 
Next Generation Repositories
Next Generation RepositoriesNext Generation Repositories
Next Generation Repositories
 
How to describe a dataset. Interoperability issues
How to describe a dataset. Interoperability issuesHow to describe a dataset. Interoperability issues
How to describe a dataset. Interoperability issues
 
How to Describe a Dataset. Interoperability Issues, by Valeria Pesce
How to Describe a Dataset. Interoperability Issues, by Valeria PesceHow to Describe a Dataset. Interoperability Issues, by Valeria Pesce
How to Describe a Dataset. Interoperability Issues, by Valeria Pesce
 
Linked Data, Library Users, and the Discovery Tools of the Future
Linked Data, Library Users, and the Discovery Tools of the FutureLinked Data, Library Users, and the Discovery Tools of the Future
Linked Data, Library Users, and the Discovery Tools of the Future
 
Understanding Data
Understanding Data Understanding Data
Understanding Data
 
Linked data HHS 2015
Linked data HHS 2015Linked data HHS 2015
Linked data HHS 2015
 

More from mhaendel

Patient-led deep phenotyping using a lay-friendly version of the Human Phenot...
Patient-led deep phenotyping using a lay-friendly version of the Human Phenot...Patient-led deep phenotyping using a lay-friendly version of the Human Phenot...
Patient-led deep phenotyping using a lay-friendly version of the Human Phenot...
mhaendel
 
Semantics for rare disease phenotyping, diagnostics, and discovery
Semantics for rare disease phenotyping, diagnostics, and discoverySemantics for rare disease phenotyping, diagnostics, and discovery
Semantics for rare disease phenotyping, diagnostics, and discovery
mhaendel
 
The Software and Data Licensing Solution: Not Your Dad’s UBMTA
The Software and Data Licensing Solution: Not Your Dad’s UBMTA The Software and Data Licensing Solution: Not Your Dad’s UBMTA
The Software and Data Licensing Solution: Not Your Dad’s UBMTA
mhaendel
 
Equivalence is in the (ID) of the beholder
Equivalence is in the (ID) of the beholderEquivalence is in the (ID) of the beholder
Equivalence is in the (ID) of the beholder
mhaendel
 
Building (and traveling) the data-brick road: A report from the front lines ...
Building (and traveling) the data-brick road:  A report from the front lines ...Building (and traveling) the data-brick road:  A report from the front lines ...
Building (and traveling) the data-brick road: A report from the front lines ...
mhaendel
 
GA4GH Monarch Driver Project Introduction
GA4GH Monarch Driver Project IntroductionGA4GH Monarch Driver Project Introduction
GA4GH Monarch Driver Project Introduction
mhaendel
 
GA4GH Phenotype Ontologies Task team update
GA4GH Phenotype Ontologies Task team updateGA4GH Phenotype Ontologies Task team update
GA4GH Phenotype Ontologies Task team update
mhaendel
 
Reusable data for biomedicine: A data licensing odyssey
Reusable data for biomedicine:  A data licensing odysseyReusable data for biomedicine:  A data licensing odyssey
Reusable data for biomedicine: A data licensing odyssey
mhaendel
 
Data Translator: an Open Science Data Platform for Mechanistic Disease Discovery
Data Translator: an Open Science Data Platform for Mechanistic Disease DiscoveryData Translator: an Open Science Data Platform for Mechanistic Disease Discovery
Data Translator: an Open Science Data Platform for Mechanistic Disease Discovery
mhaendel
 
Global phenotypic data sharing standards to maximize diagnostic discovery
Global phenotypic data sharing standards to maximize diagnostic discoveryGlobal phenotypic data sharing standards to maximize diagnostic discovery
Global phenotypic data sharing standards to maximize diagnostic discovery
mhaendel
 
Deep phenotyping to aid identification of coding & non-coding rare disease v...
Deep phenotyping to aid identification  of coding & non-coding rare disease v...Deep phenotyping to aid identification  of coding & non-coding rare disease v...
Deep phenotyping to aid identification of coding & non-coding rare disease v...
mhaendel
 
Science in the open, what does it take?
Science in the open, what does it take?Science in the open, what does it take?
Science in the open, what does it take?
mhaendel
 
Global Phenotypic Data Sharing Standards to Maximize Diagnostics and Mechanis...
Global Phenotypic Data Sharing Standards to Maximize Diagnostics and Mechanis...Global Phenotypic Data Sharing Standards to Maximize Diagnostics and Mechanis...
Global Phenotypic Data Sharing Standards to Maximize Diagnostics and Mechanis...
mhaendel
 
Phenopackets as applied to variant interpretation
Phenopackets as applied to variant interpretation Phenopackets as applied to variant interpretation
Phenopackets as applied to variant interpretation
mhaendel
 
Credit where credit is due: acknowledging all types of contributions
Credit where credit is due: acknowledging all types of contributionsCredit where credit is due: acknowledging all types of contributions
Credit where credit is due: acknowledging all types of contributions
mhaendel
 
Deep phenotyping for everyone
Deep phenotyping for everyoneDeep phenotyping for everyone
Deep phenotyping for everyone
mhaendel
 
Why the world needs phenopacketeers, and how to be one
Why the world needs phenopacketeers, and how to be oneWhy the world needs phenopacketeers, and how to be one
Why the world needs phenopacketeers, and how to be one
mhaendel
 
On the frontier of genotype-2-phenotype data integration
On the frontier of genotype-2-phenotype data integrationOn the frontier of genotype-2-phenotype data integration
On the frontier of genotype-2-phenotype data integration
mhaendel
 
Envisioning a world where everyone helps solve disease
Envisioning a world where everyone helps solve diseaseEnvisioning a world where everyone helps solve disease
Envisioning a world where everyone helps solve disease
mhaendel
 
Getting (and giving) credit for all that we do
Getting (and giving) credit for all that we doGetting (and giving) credit for all that we do
Getting (and giving) credit for all that we do
mhaendel
 

More from mhaendel (20)

Patient-led deep phenotyping using a lay-friendly version of the Human Phenot...
Patient-led deep phenotyping using a lay-friendly version of the Human Phenot...Patient-led deep phenotyping using a lay-friendly version of the Human Phenot...
Patient-led deep phenotyping using a lay-friendly version of the Human Phenot...
 
Semantics for rare disease phenotyping, diagnostics, and discovery
Semantics for rare disease phenotyping, diagnostics, and discoverySemantics for rare disease phenotyping, diagnostics, and discovery
Semantics for rare disease phenotyping, diagnostics, and discovery
 
The Software and Data Licensing Solution: Not Your Dad’s UBMTA
The Software and Data Licensing Solution: Not Your Dad’s UBMTA The Software and Data Licensing Solution: Not Your Dad’s UBMTA
The Software and Data Licensing Solution: Not Your Dad’s UBMTA
 
Equivalence is in the (ID) of the beholder
Equivalence is in the (ID) of the beholderEquivalence is in the (ID) of the beholder
Equivalence is in the (ID) of the beholder
 
Building (and traveling) the data-brick road: A report from the front lines ...
Building (and traveling) the data-brick road:  A report from the front lines ...Building (and traveling) the data-brick road:  A report from the front lines ...
Building (and traveling) the data-brick road: A report from the front lines ...
 
GA4GH Monarch Driver Project Introduction
GA4GH Monarch Driver Project IntroductionGA4GH Monarch Driver Project Introduction
GA4GH Monarch Driver Project Introduction
 
GA4GH Phenotype Ontologies Task team update
GA4GH Phenotype Ontologies Task team updateGA4GH Phenotype Ontologies Task team update
GA4GH Phenotype Ontologies Task team update
 
Reusable data for biomedicine: A data licensing odyssey
Reusable data for biomedicine:  A data licensing odysseyReusable data for biomedicine:  A data licensing odyssey
Reusable data for biomedicine: A data licensing odyssey
 
Dataset description using the W3C HCLS standard

  • 1. Describing Datasets with the W3C HCLS standard. Melissa Haendel, Michel Dumontier
  • 2. World Wide Web Consortium (W3C)
      • The W3C is the main international standards organization for the World Wide Web
      • The W3C is made up of over 400 member organizations for the purpose of working together in the development of standards for the World Wide Web
      • W3C has sophisticated development and community validation procedures for standards development
  • 3. The Semantic Web is the new global web of knowledge. It involves standards for publishing, sharing, and querying facts, expert knowledge and services. It is a scalable approach to the discovery of independently formulated and distributed knowledge. Cyganiak and Jentzsch. http://lod-cloud.net/
  • 4. Resource Description Framework
      • Language to represent knowledge
      • Logic-based formalism -> automated reasoning
      • Graph-like properties -> data analysis
      • Good for:
          – Describing in terms of type, attributes, relations
          – Integrating data from different sources
          – Sharing the data (W3C standard)
          – Reusing what is available, developing what you need, and contributing back to the web of data
  • 5. Challenge: Working with Web Data
      • Often have inadequate descriptions so we don’t know what they are about or how they were constructed
      • Datasets change over time, but often don’t come with versioning information
      • May have been constructed using other data, but it’s not clear which version of data was used or whether these were modified
      • Data may be available in a variety of formats
      • There may be multiple copies of data from different providers, but it’s unclear if they are exact copies or derivatives
      • Version of standard or vocabulary used not indicated
      • Data registries are not synchronized and can contain conflicting information
  • 6. Key Use Cases for HCLS Dataset description
      1. Dataset Identification, Description, Licensing and Provenance
      2. Dataset Discovery (via Catalog)
      3. Exchange of Dataset Descriptions
      4. Dataset Linking
      5. Content Summary
      6. Monitoring of Dataset Changes
  • 7. Objectives
      • Develop a guidance note for reusing existing vocabularies to describe datasets with RDF
          – Mandatory, recommended, optional descriptors
          – Identifiers
          – Versioning
          – Attribution
          – Provenance
          – Content summarization
      • Recommend vocabulary-linked attributes and value sets
      • Provide reference editor and validation
  • 8. We compiled a list of metadata fields used across the community and then surveyed over 20 vocabularies to see if they provided relevant metadata elements or value sets, to produce a big spreadsheet that maps metadata needs with existing vocabularies
  • 9. Dublin Core Metadata Initiative
      • Widely used
      • Broadly applicable
          – Documents
          – Datasets
      ✗ Generic terms
      ✗ Not comprehensive
      ✗ No required properties
      “Date: A point or period of time associated with an event in the lifecycle of the resource.”
  • 10. DCAT: Data Catalog
      • Separates Dataset and Distribution
      ✗ No versioning
      ✗ No prescribed properties
  • 11. No single vocabulary provides all key metadata fields
  • 13.
  • 15. Three Component Metadata Model: description – version – distribution
  • 16. Description
      • Identifiers
      • Title
      • Description
      • Homepage
      • License
      • Language
      • Keywords
      • Concepts and vocabularies used
      • Standards
      • Publication
  • 17. Attribution
      • Simple Model
          – Individuals are related to roles using specific properties, e.g. dct:creator, pav:createdBy, pav:curatedBy
      • Expandable Model
          – Individuals are related to roles and dates via associated object
          – PROV, VIVO-ISF
  • 18. Provenance and Change
      • Version number
      • Source
      • Provenance: retrieved from, derived from, created with
      • Frequency of change
  • 19. Availability
      • Format
      • Download URL
      • Landing page
      • SPARQL endpoint
  • 20. VoID Editor Tools to create the metadata
  • 21. Tools to validate the metadata. New version using ShEx in development
  • 22. HCLS: http://www.w3.org/blog/hcls/
      Mailing list: http://lists.w3.org/Archives/Public/public-semweb-lifesci/
      Editors’ Draft: http://tiny.cc/hcls-datadesc-ed
      W3C Interest Group Note: http://tiny.cc/hcls-datadesc
      Special thanks to Alasdair Gray, Scott Marshall, Joachim Baran
      Thanks to all other contributors to the HCLS note
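
The three-component model from slide 15 (description, version, distribution) can be sketched as plain subject-predicate-object triples. This is only an illustration under stated assumptions: the function names, the IRI layout (`<base>/<version>/distribution`) and the choice of `dct:isVersionOf`, `pav:version`, `pav:retrievedFrom`, `dcat:distribution`, `dct:format` and `dcat:downloadURL` are ours, not taken from the slides, which name only `dct:creator`, `pav:createdBy` and `pav:curatedBy`. Consult the HCLS note for the properties it actually prescribes at each level.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# A triple is (subject, predicate, object); predicates are written as
# prefixed names (CURIEs) and are not resolved or checked here.
Triple = Tuple[str, str, str]


def describe_dataset(base_iri: str, title: str, creator_iri: str,
                     version: str, source_iri: str,
                     download_url: str, media_type: str) -> List[Triple]:
    """Build a three-level description (hypothetical helper, not from the slides)."""
    summary = base_iri                              # description level
    versioned = f"{base_iri}/{version}"             # version level
    distribution = f"{versioned}/distribution"      # distribution level
    return [
        # Description level: what the dataset is and who made it.
        (summary, "dct:title", title),
        (summary, "dct:creator", creator_iri),
        # Version level: which release this is and where it came from.
        (versioned, "dct:isVersionOf", summary),
        (versioned, "pav:version", version),
        (versioned, "pav:retrievedFrom", source_iri),
        # Distribution level: a concrete file in a concrete format.
        (versioned, "dcat:distribution", distribution),
        (distribution, "dct:format", media_type),
        (distribution, "dcat:downloadURL", download_url),
    ]


def group_by_subject(triples: List[Triple]) -> Dict[str, List[Tuple[str, str]]]:
    """Group (predicate, object) pairs under their subject."""
    grouped = defaultdict(list)
    for subject, predicate, obj in triples:
        grouped[subject].append((predicate, obj))
    return dict(grouped)
```

Keeping the three levels as separate subjects means a new release adds a new version-level resource without touching the description-level metadata.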

Editor's Notes

  1. We reuse several properties
  2. Dataset description creator: generates an outline description through a web form and allows you to see the generated content.
  3. Given a dataset description, does it conform to the OPS guidelines? Generates error (red) and warning (orange) reports: error for MUST properties, warning for SHOULD properties, information for MAY properties.
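
The reporting rule in note 3 (error for MUST, warning for SHOULD, information for MAY) can be sketched as a small check over a dictionary of properties. This is a minimal sketch, not the validator shown in the slides: the `REQUIREMENTS` table and the `validate` function are hypothetical, and the requirement level assigned to each property here is an assumption made for illustration only.

```python
from typing import Dict, List, Tuple

# Hypothetical requirement levels; the real guidelines define their own.
REQUIREMENTS: Dict[str, str] = {
    "dct:title": "MUST",
    "dct:license": "SHOULD",
    "dcat:keyword": "MAY",
}

# Mapping from requirement level to report severity, as described in note 3.
SEVERITY: Dict[str, str] = {
    "MUST": "error",
    "SHOULD": "warning",
    "MAY": "information",
}


def validate(description: Dict[str, str]) -> List[Tuple[str, str]]:
    """Return (severity, property) for every listed property that is missing or empty."""
    report = []
    for prop, level in REQUIREMENTS.items():
        if not description.get(prop):
            report.append((SEVERITY[level], prop))
    return report


# A description with only a title yields one warning and one information entry.
print(validate({"dct:title": "Example dataset"}))
```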